Service Virtualization for tests

VCR-style HTTP
record & replay

Like Ruby's VCR — record the HTTP your tests make, replay it offline — but the tape is readable Markdown you can share across languages and review in a Git diff.

One native engine · seventeen language bindings · byte-identical tapes

get_person.md
## Interaction 0: GET /person/42

### Response body (200: application/json):
  {
-   "name": "Ada Lovelace"
+   "name": "Ada Lovelace",
+   "hairColor": "auburn"
  }
API drift shows up as an ordinary line-by-line diff in code review.

About

Service Virtualization for tests

Like Ruby's VCR, Servirtium records the HTTP conversations your tests make and replays them later without calling the real service — but the recording format is canonical, human-readable Markdown, so tapes can be shared across languages and API drift is reviewable in a Git diff.

For teams that love Agile, CI/CD, DevOps, or fast builds that devs can run on their workstations too. The name: SERvice VIRTualization IUM (inspired by Selenium).

Teams consuming vendor web-APIs

Eliminate the slow, flaky and costly parts of working against someone else's API in your test suite.

Vendors shipping example usages

Ship language-agnostic example scenarios — "out of credit at time of purchase", "increase an auction bid by a specified amount" — as Markdown tapes your clients can replay.

Why a web-service vendor would adopt this

Shipping Servirtium tapes is a developer-experience play. A vendor that hands clients ready-to-replay recordings lets those clients build and test against the API offline, for free, without sandbox keys or flaky network calls — so integrations land faster and churn less. Better DX wins evaluations and grows market share, which is exactly the incentive of a challenger trying to pull developers away from an incumbent.

Conversely, a vendor that is already dominant in its market has little reason to promote Servirtium: friction-free, portable integration makes it easier for customers to evaluate and switch to rivals. The vendors most likely to embrace shareable, Git-diffable tapes are the ones competing on openness and developer goodwill — not the ones relying on lock-in.

Servirtium VCR

One engine, many languages

servirtium-vcr is the current direction for Servirtium. Instead of a separate re-implementation per language, there is a single record/replay engine — Markdown parser/emitter, HTTP server, request matching, redaction, whole-tape normalization and drift detection — built once as a native shared library, with a thin FFI binding per language on top.

Because every language drives the same engine, cross-language compatibility is a build-time guarantee, not something each library re-derives and drifts on. One monorepo, seventeen bindings, one build.

| Language | Binding mechanism | In the monorepo |
|---|---|---|
| Go | cgo | go/ |
| Python | ctypes | python/ |
| Java | FFM / Panama (JDK 22+) | java/ |
| .NET | P/Invoke | dotnet/ |
| Rust | libloading | rust/ |
| Ruby | Fiddle | ruby/ |
| JavaScript & TypeScript | koffi (Node) | javascript/ |
| Dart (and Flutter) | dart:ffi | dart/ |
| PHP | ext-ffi | php/ |
| Haskell | foreign import ccall | haskell/ |
| Elixir | shares the Erlang NIF (BEAM) | elixir/ |
| Pharo (Smalltalk) | UnifiedFFI | pharo/ |
| Nim | importc (linked) | nim/ |
| Zig | extern C (linked) | zig/ |
| Lua | C extension (Lua 5.4) | lua/ |
| Erlang | C NIF (canonical, shared) | erlang/ |
| Gleam | shares the Erlang NIF (BEAM) | gleam/ |

The seventeen bindings build and test through one aeb run against the shared libservirtium_vcr.so. See the monorepo README for per-language quick-starts.

On the JVM, the Java binding also serves Kotlin, Scala, Clojure and Groovy — thin idiomatic layers (a Kotlin trailing-lambda DSL, a Groovy Closure DSL, Scala helpers, Clojure fns + with-open) over the same jar, with no second native FFI.

Looking for the independent http4k implementation or the earlier standalone per-language libraries? See Other & earlier implementations.

Markdown format

The tape file, up close

This is the tape file your tests commit and diff. One readable Markdown document captures the whole HTTP conversation — request, response, headers, bodies — so a change to the API shows up as an ordinary line-by-line diff in code review.

Each payload sits in a Markdown code fence, so the XML or JSON appears verbatim — pretty-printed and unescaped. Compare that with stuffing an XML body into a JSON (or YAML) cassette, where it has to be escaped into a single unreadable string with \" and \n everywhere. A fenced block keeps the payload exactly as it went over the wire, which is what makes the diffs legible.
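To make the escaping point concrete, here is a small sketch using only Python's standard json module. The cassette layout (a dict with a "body" key) is a made-up illustration rather than any particular tool's format, and the mood attribute is added to the example payload only so that it contains double quotes.

```python
import json

# XML payload modelled on the example tape; the 'mood' attribute is an
# addition for this sketch so that the body contains double quotes.
xml_body = '<hello>\n  <how-are-you mood="fine"/>\n</hello>'

# Hypothetical JSON cassette entry: the body collapses into one escaped string.
cassette_entry = json.dumps({"body": xml_body})

# Markdown tape style: the body sits verbatim between two code-fence lines.
fence = "`" * 3
tape_section = f"{fence}\n{xml_body}\n{fence}"

print(cassette_entry)   # a single line, with \n and \" escapes
print(tape_section)     # five lines, payload untouched
```

A one-attribute change to the XML would touch the whole cassette line in a diff, but only one line of the fenced block.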

Here's some raw Servirtium Markdown source:

example1.md

    ## Interaction 0: GET /path/to/resource

    ### Request headers recorded for playback:

    ```
    Header1: something
    Header2: something else
    ```

    ### Request body recorded for playback (application/xml):

    ```
    <hello>
      <how-are-you/>
    </hello>
    ```

    ### Response headers recorded for playback:

    ```
    Content-Type: text/plain
    Connection: keep-alive
    Set-Cookie: CookieName=CookieValue
    Header-X: abc-123
    ```

    ### Response body recorded for playback (200: text/plain):

    ```
    Mary had a little lamb
    ```

(actual source file: github.com/servirtium/site/examples/example1.md)

Drop that same file in a repo and a code portal renders it beautifully — here's what GitHub makes of it. The rendered view is the default, and the exact bytes are always one click away via the Raw button (or ?plain=1):

Interaction 0: GET /path/to/resource

Request headers recorded for playback:

    Header1: something
    Header2: something else

Request body recorded for playback (application/xml):

    <hello>
      <how-are-you/>
    </hello>

Response headers recorded for playback:

    Content-Type: text/plain
    Connection: keep-alive
    Set-Cookie: CookieName=CookieValue
    Header-X: abc-123

Response body recorded for playback (200: text/plain):

    Mary had a little lamb

That's the whole point: one file that's human-inspectable in raw form and that your code portal renders in a pretty way too. You get the readable rendered view for review and the exact raw bytes for diffing — no trade-off between the two. If your code portal is GitHub, then 'pretty' is true.

(rendered on GitHub: github.com/servirtium/site/examples/example1.md)

... and you'd be storing that in VCS, as you would your automated tests.

Markdown Syntax Explained

Multiple Interactions Catered For

  • Each interaction is denoted via a Level 2 Markdown Heading. e.g. ## Interaction N: <METHOD> <PATH-FROM-ROOT>
  • N starts as 0, and goes up depending on how many interactions there were in the conversation.
  • <METHOD> is GET or POST (or any standard HTTP or non-standard method/verb name).
  • <PATH-FROM-ROOT> is the path without the domain & port. e.g. /card/addTo.doIt

Request And Reply Details Per Interaction

Each interaction has four sections, each denoted by a Level 3 Markdown heading:

  1. The request headers going from the client to the HTTP server, denoted with a heading like so ### Request headers recorded for playback:
  2. The request body going from the client to the HTTP server (if applicable - GET does not use this), denoted with a heading like so ### Request body recorded for playback (<MIME-TYPE>):. And <MIME-TYPE> is something like application/json
  3. The response headers coming back from the HTTP server to the client, denoted with a heading like so ### Response headers recorded for playback:
  4. The response body coming back from the HTTP server to the client (some HTTP methods do not use this), denoted with a heading like so ### Response body recorded for playback (<STATUS-CODE>: <MIME-TYPE>):

Within each of those there is a single Markdown code block (three back-ticks) with the details of each. The lines in that block may be reformatted depending on the settings of the recorder. If binary, then there is a Base64 sequence instead (admittedly not so pretty on the eye).
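The heading and code-block rules above are regular enough to read with a few lines of code. The following is an illustrative sketch only, not the parser inside servirtium-vcr: the function name parse_tape and the dictionary keys are invented for this example, and it assumes text payloads that never contain a bare code-fence line.

```python
import re

# Heading shapes as described above. Illustrative only.
INTERACTION = re.compile(r"^## Interaction (\d+): (\S+) (\S+)$")
SECTION = re.compile(
    r"^### (Request|Response) (headers|body) recorded for playback(?: \((.*)\))?:$"
)
FENCE = "`" * 3

def parse_tape(markdown: str) -> list[dict]:
    """Turn tape text into one dict per interaction (hypothetical layout)."""
    interactions, section, in_block, block = [], None, False, []
    for line in markdown.splitlines():
        if in_block:
            if line.strip() == FENCE:          # closing fence ends the section
                interactions[-1][section] = "\n".join(block)
                section, in_block, block = None, False, []
            else:
                block.append(line)
            continue
        m = INTERACTION.match(line)
        if m:
            interactions.append(
                {"index": int(m.group(1)), "method": m.group(2), "path": m.group(3)}
            )
            continue
        m = SECTION.match(line)
        if m:
            section = f"{m.group(1).lower()}_{m.group(2)}"
            if m.group(3):                     # e.g. "200: text/plain"
                interactions[-1][section + "_info"] = m.group(3)
            continue
        if section and line.strip() == FENCE:  # opening fence of the section
            in_block = True
    return interactions

sample = "\n".join([
    "## Interaction 0: GET /path/to/resource", "",
    "### Request headers recorded for playback:", "",
    FENCE, "Header1: something", FENCE, "",
    "### Response body recorded for playback (200: text/plain):", "",
    FENCE, "Mary had a little lamb", FENCE,
])
tape = parse_tape(sample)
print(tape[0]["method"], tape[0]["path"], tape[0]["response_body"])
```

Base64 handling for binary bodies and recorder-specific reformatting are left out of this sketch.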

Recording and Playback

Recording a HTTP conversation

You'll write your test (say JUnit) and that will use a library (one your company may have written, or one from a vendor). For recording you swap the real service URL for that of a Servirtium middle-man server, which itself delegates to the real service. If that service is flaky, keep re-running the test manually until it passes, and commit the resulting Servirtium-style Markdown to source control. Best practice is to configure the same test to have two modes of operation: 'direct' and 'recording'. This is not a caching idea. It is deliberate: you are either explicitly recording while running a test, or not recording and going direct to the service.

Anyway, the recording ends up as Markdown (in the format described above) in a text file on your file system, which you'll commit to VCS alongside your tests.

Playback of HTTP conversations

Those same Markdown recordings are used in playback. Again this is an explicit mode: you're running in this mode, and it will fail if there are no recordings in the dir/file in source control.

Playback itself will fail if the headers/body sent by the client towards the service (through the Servirtium library) are not the same as they were when the recording was made. The test failing in this situation is deliberate: you're using it to guard against potential incompatibilities. Masking/redacting and other manipulations may need to be applied deliberately during recording to get rid of transient aspects that are not helpful in playback situations.

For example, any dates in the headers or the body that go from the client to the HTTP server could be swapped for some date in the future like "2099-01-01" or a date in the past like "1970-01-01".

The person designing the tests for recording and playback would work on the redactions/masking towards an "always passing" outcome, with no differences in the Markdown regardless of the number of times the same test is re-recorded.
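A minimal sketch of that kind of date masking, assuming ISO-8601 calendar dates and using only Python's standard re module. The function name redact_dates and the requestedOn field are hypothetical; the engine's own redaction settings are not shown here.

```python
import re

# Matches ISO-8601 calendar dates such as 2024-03-17.
# Other date formats would need their own patterns.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
FIXED_DATE = "2099-01-01"

def redact_dates(text: str) -> str:
    """Replace every ISO date with a fixed placeholder date."""
    return ISO_DATE.sub(FIXED_DATE, text)

# Two recordings made on different days end up identical after masking.
first = redact_dates('{"requestedOn": "2024-03-17", "id": 42}')
second = redact_dates('{"requestedOn": "2025-11-02", "id": 42}')
print(first)
print(first == second)
```

Because both runs mask to the same text, re-recording on a different day produces no diff in the tape.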

Note: How a difference in request-header or request-body expectation is logged in the test output needs to be part of the deliberate design of the tests themselves. This is easier said than done, and you can't catch assertion failures over HTTP.
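One possible way to make such a mismatch readable in test output is to build a line-by-line diff of the recorded and the actual request, and have the test assert on it after the HTTP call returns. This is a sketch under that assumption; describe_mismatch is a hypothetical helper, not a Servirtium API, and it relies only on Python's standard difflib.

```python
import difflib

def describe_mismatch(recorded: str, actual: str, interaction: int) -> str:
    """Return '' when the request matches the recording, else a unified diff."""
    if recorded == actual:
        return ""
    diff = difflib.unified_diff(
        recorded.splitlines(),
        actual.splitlines(),
        fromfile=f"recorded interaction {interaction}",
        tofile="request sent during playback",
        lineterm="",
    )
    return "\n".join(diff)

recorded = "Accept: application/json\nUser-Agent: client/1.0"
actual = "Accept: application/json\nUser-Agent: client/2.0"
report = describe_mismatch(recorded, actual, 0)
print(report)
```

A test could collect these reports during playback and fail in its teardown if any report is non-empty, which sidesteps the problem of not being able to raise an assertion across the HTTP boundary.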

Note2: this is a third mode of operation for the same test as in "Recording a HTTP conversation" above - "playback" mode meaning you have three modes of operation all in all.
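The three modes can be pictured as one switch that decides which base URL the code under test talks to. The sketch below is only an illustration: the SERVICE_TEST_MODE variable name and both URLs are placeholders invented here, and starting the Servirtium server in record or playback mode is left out.

```python
import os
from enum import Enum

class Mode(Enum):
    DIRECT = "direct"      # test talks straight to the real service
    RECORD = "record"      # middle-man delegates to the real service and records
    PLAYBACK = "playback"  # middle-man replays the committed Markdown

# Placeholder URLs for illustration only.
REAL_SERVICE_URL = "https://api.example.com"
SERVIRTIUM_URL = "http://localhost:61417"

def base_url_for(mode: Mode) -> str:
    """Pick the URL the client library under test should be pointed at."""
    return REAL_SERVICE_URL if mode is Mode.DIRECT else SERVIRTIUM_URL

def mode_from_env(env=os.environ) -> Mode:
    """Read the mode from an (assumed) environment variable; default to playback."""
    return Mode(env.get("SERVICE_TEST_MODE", "playback"))

print(base_url_for(mode_from_env({"SERVICE_TEST_MODE": "record"})))
```

Defaulting to playback keeps workstation and CI runs offline unless someone explicitly asks for a recording or a direct run.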

What it is

What is Servirtium?

Servirtium records HTTP conversations from your tests and replays them later without calling the real service. Unlike most VCR-style tools, the recording is canonical Markdown under source-control, so teams using different languages can share the same tapes. It aims to be a lingua franca for mock HTTP conversations:

  • libraries for many languages with interoperable record and playback capability;
  • agnostic about test-runner choices — use any unit test or spec framework;
  • "in process" for the testing tech in question (no process spawning);
  • web-API makers can ship Markdown conversations for known scenarios (with a sample test in their language);
  • tracking incompatibility with vendor or other-team web-APIs is easy (the key TCK concept).

How this differs from Ruby VCR

If you know VCR (or a Betamax-style port), you already know the workflow. Servirtium keeps the familiar record/replay loop and adds:

Familiar workflow

The same record-once, replay-forever loop you'd get from VCR.

Readable Markdown

Human-readable Markdown recordings, not YAML/JSON cassettes.

Language-neutral tapes

Record in one language, replay in another.

Git diffs as evidence

Tape diffs document API compatibility — or drift.

Shared core engine

One native engine driven by thin per-language bindings.

Cross-team negotiation

Record with a Ruby Test::Unit library, replay it from Java JUnit — useful when the publishing and consuming teams use different stacks.

In common with other Service Virtualization technologies

  • tests can leverage previously recorded HTTP conversations;
  • much faster than real services, which vary in speed and are typically slower than desired;
  • don't fail in unexpected ways, whereas real services can be flaky;
  • don't require per-person credentials, whereas real services often need API keys/tokens (even "sandbox" ones) that devs share against the rules;
  • exclusively for test automation: production deployments hook up the real services.

Wikipedia maintains a Comparison of API simulation tools and a page on service virtualization; the comparison table lacks the columns needed to differentiate.

The Markdown difference

  • Markdown renders well on GitHub and other code portals;
  • an XML payload in a Markdown code fence still looks like XML: verbatim and unescaped;
  • a JSON payload in a Markdown code fence still looks like JSON: verbatim and unescaped;
  • compare that to embedding the same body inside a JSON or YAML cassette, where it gets escaped into one unreadable string, whereas the fenced block keeps the payload (and its diffs) legible.

Technology Compatibility Kits

With Servirtium, the HTTP conversations invoked by running tests are recorded and replayed from the same Markdown format. Two teams could negotiate changes over time in those recordings. If that were a vendor and a client, the client could ask for (say) hair color in a /get-person/{id} web-API by sharing an example in Markdown — ideally in a Git repo, with a test/spec example of use. The vendor could likewise communicate forthcoming changes via Servirtium Markdown conversations in Git repos (private or public).

Related contract ideas / tools (prior art)

OpenAPI (formerly Swagger) and RAML are complementary to Servirtium, not competitive — they describe the shape of an API (its endpoints, parameters and schemas), while a Servirtium tape is a concrete, recorded instance of real traffic. OpenAPI says what can happen; a tape is a specific exchange that did happen, replayable offline and diffable in Git. The two pair naturally: use OpenAPI as the contract, and Servirtium tapes as the worked examples and regression guards against it — a tape can even confirm that a real response still validates against the OpenAPI schema you publish.

TypeSpec (Microsoft's API-design language) sits one level up again: you author the API in TypeSpec and compile it down to OpenAPI, JSON Schema or protobuf as a build step. That's a design-time concern; Servirtium is a test-time one — and the two slot into the same pipeline. A CI build can emit the OpenAPI/JSON Schema from TypeSpec, replay the Servirtium tapes for the relevant endpoints, and then assert that each recorded response still validates against the freshly-generated schema. If TypeSpec changes the contract, the tape replay (or a git diff of the tapes) flags exactly where real traffic no longer matches — turning "the spec drifted from reality" into a failing build rather than a production surprise.

Postman or Postwoman remain tools you use to explore and learn web-APIs.

Ruby's VCR and its Betamax-style ports are the established record/playback service-virtualization tools — the closest prior art to Servirtium. They lack the canonical Markdown data model and the "git diff" TCK angle (see How this differs from Ruby VCR above), but could readily gain a mode that reads and writes Servirtium Markdown.

Mountebank has long offered an advanced way to programmatically mock web-APIs (and other wire protocols) and co-evolve them toward business deliverables; we hope it gains a "dumber" mode supporting our Markdown format. Similarly there's WireMock, Pact ("contract tests" since 2013), Netflix's Polly.js (2019), LinkedIn's Flashback (2017), Specto Lab's Hoverfly (2015), CA's Lisa (since 2014 — note it does not co-locate recordings with prod & test source), and Karate (Intuit's Peter Thomas). Contract testing is the same field as this, too.

Architectural paradigms

Service-Oriented Architecture (SOA) and Micro-Services are both better with Servirtium.

TCKs

TCKs — what's that about?

Technology Compatibility Kit is best known as a 2004 source set that Sun released to allow (subject to license) other implementations of Java. For Servirtium, a previously recorded set of HTTP interactions would be stored in source-control adjacent to the tests they correspond to and used in TWO broad ways:

[Figure: TCKs. Build infrastructure running Servirtium automated tests. (1) Continuous use of prior Servirtium test recordings in playback mode: developers and test engineers running tests on their workstations before committing/pushing functional changes to prod code (with tests, obviously); push-triggered jobs running Servirtium tests for PR branches; Continuous Integration (CI) jobs. (2) Non-CI, intermittent jobs running Servirtium tests in record mode, with a deliberate job failure if the recording is different than before.]

Developers, test engineers and the CI-related automated jobs run service tests in playback mode thousands or millions of times a day, and those runs should always pass quickly. If they fail, that's something that can be fixed before commit/push and integration into trunk/master.

Because something could be incompatible versus the "real" service, an hourly or daily job in the same build infrastructure runs the same tests in record mode. Those tests could fail, in which case the job fails and a developer investigates. If the service is flaky, this job can be run with retries=10 mode (whatever that is for the test framework) and may be coerced into passing. Sometimes this is needed because the vendor's infrastructure for "sandbox" is not as reliable as their production service. If the test suite passes, one last check is made on the Servirtium recordings (Servirtese?) before that build completes. That check is git diff: if there are any differences, the job deliberately fails and a developer is asked to investigate.

tck-diff-check.sh

    #!/usr/bin/env bash
    set -e  # stop at the first failing step, so the "job could fail here" notes hold

    # Your regular compile and invocation
    # of pertinent service tests here:

    your_compile.sh # job could fail here
    your_test_suite_invocation.sh  # job could fail here, too

    # passed, so far, but one more job failure opportunity based on
    # differences in the recorded Servirtium Markdown (versus that committed)

    theDiff=$(git diff --unified=0 path/to/module/src/test/mocks/)
    if [[ -z "${theDiff}" ]]
    then
          echo " - No differences versus recorded interactions committed to Git, so that's good"
    else
          echo " - There SHOULD BE no differences versus last TCK"
          echo "   recording, but there were - see build log :-(  "
          echo "**** DIFF ****:${theDiff}."
          exit 1
    fi

Oh and yes, a Markdown representation of one or more HTTP interactions is the easiest for seeing differences in a changed recording. That is because XML and JSON payloads sit in Markdown code fences, where they stay pretty-printed and verbatim. There's none of the escaping you'd get from cramming an XML body into a JSON (or YAML) cassette field — so the diffs read cleanly.

git diff

    diff --git a/src/mocks/averageRainfallForEgyptFrom1980to1999Exists.md b/src/mocks/averageRainfallForEgyptFrom1980to1999Exists.md
    index 7133dd2..5c714cc 100644
    --- a/src/mocks/averageRainfallForEgyptFrom1980to1999Exists.md
    +++ b/src/mocks/averageRainfallForEgyptFrom1980to1999Exists.md
    @@ -21,6 +21,7 @@ Connection: keep-alive
     ```
     Content-Type: application/xml
    +Connection: keep-alive
     Access-Control-Allow-Origin: *
     Access-Control-Allow-Headers: X-Requested-With
     Access-Control-Allow-Methods: GET

In this case the World Bank's Climate API responded one day with an extra "Connection: keep-alive" header, and the developer investigating decided it was probably OK to assume that it would be a regular feature of the API. XML or JSON payload differences can be multi-line, of course. Committing the change (after a dev-workstation reproduction) means the same thing will not cause a TCK failure in the future.

History

Service Virtualization history

  1. Servirtium beta Released

    Autumn 2019

    Markdown record/playback syntax, a Java library, and examples: released. The key git-diff-leveraging "TCK" aspect (see above) was talked about too

    Servirtium-Java

  2. Service Virtualization the way it was

    2010 - 2019

    Centralized services (or a local daemon) that would record/playback (or manually stub/mock) HTTP services that stored recordings in JSON (or YAML) in source-control (or a centralized DB)

    Legacy SV technologies

  3. Before Service Virtualization

    1993 - 2010

    Your tests had to hit a shared 'integration' server's endpoints, where that always involved luck and some goodwill that the service was fast, online, consistent, and the version you wanted

    Before SV