Service Virtualization for tests
Like Ruby's VCR — record the HTTP your tests make, replay it offline — but the tape is readable Markdown you can share across languages and review in a Git diff.
One native engine · seventeen language bindings · byte-identical tapes
    ## Interaction 0: GET /person/42
    ### Response body (200: application/json):
      {
    -   "name": "Ada Lovelace"
    +   "name": "Ada Lovelace",
    +   "hairColor": "auburn"
      }
About
Like Ruby's VCR, Servirtium records the HTTP conversations your tests make and replays them later without calling the real service — but the recording format is canonical, human-readable Markdown, so tapes can be shared across languages and API drift is reviewable in a Git diff.
For teams that love Agile, CI/CD, DevOps, or fast builds that devs can run on their workstations too. The name: SERvice VIRTualization IUM (inspired by Selenium).
Eliminate the slow, flaky and costly parts of working against someone else's API in your test suite.
Ship language-agnostic example scenarios — "out of credit at time of purchase", "increase an auction bid by a specified amount" — as Markdown tapes your clients can replay.
Shipping Servirtium tapes is a developer-experience play. A vendor that hands clients ready-to-replay recordings lets those clients build and test against the API offline, for free, without sandbox keys or flaky network calls — so integrations land faster and churn less. Better DX wins evaluations and grows market share, which is exactly the incentive of a challenger trying to pull developers away from an incumbent.
Conversely, a vendor that is already dominant in its market has little reason to promote Servirtium: friction-free, portable integration makes it easier for customers to evaluate and switch to rivals. The vendors most likely to embrace shareable, Git-diffable tapes are the ones competing on openness and developer goodwill — not the ones relying on lock-in.
Servirtium VCR
servirtium-vcr is the current direction for Servirtium. Instead of a separate re-implementation per language, there is a single record/replay engine — Markdown parser/emitter, HTTP server, request matching, redaction, whole-tape normalization and drift detection — built once as a native shared library, with a thin FFI binding per language on top.
Because every language drives the same engine, cross-language compatibility is a build-time guarantee, not something each library re-derives and drifts on. One monorepo, seventeen bindings, one build.
| Language | Binding mechanism | In the monorepo |
|---|---|---|
| Go | cgo | go/ |
| Python | ctypes | python/ |
| Java | FFM / Panama (JDK 22+) | java/ |
| .NET | P/Invoke | dotnet/ |
| Rust | libloading | rust/ |
| Ruby | Fiddle | ruby/ |
| JavaScript & TypeScript | koffi (Node) | javascript/ |
| Dart (and Flutter) | dart:ffi | dart/ |
| PHP | ext-ffi | php/ |
| Haskell | foreign import ccall | haskell/ |
| Elixir | shares the Erlang NIF (BEAM) | elixir/ |
| Pharo (Smalltalk) | UnifiedFFI | pharo/ |
| Nim | importc (linked) | nim/ |
| Zig | extern C (linked) | zig/ |
| Lua | C extension (Lua 5.4) | lua/ |
| Erlang | C NIF (canonical, shared) | erlang/ |
| Gleam | shares the Erlang NIF (BEAM) | gleam/ |
The seventeen bindings build and test through one aeb run against the shared libservirtium_vcr.so. See the monorepo README for per-language quick-starts.
On the JVM, the Java binding also serves Kotlin, Scala, Clojure and Groovy — thin idiomatic layers (a Kotlin trailing-lambda DSL, a Groovy Closure DSL, Scala helpers, Clojure fns + with-open) over the same jar, with no second native FFI.
Looking for the independent http4k implementation or the earlier standalone per-language libraries? See Other & earlier implementations.
Markdown format
This is the tape file your tests commit and diff. One readable Markdown document captures the whole HTTP conversation — request, response, headers, bodies — so a change to the API shows up as an ordinary line-by-line diff in code review.
Each payload sits in a Markdown code fence, so the XML or JSON appears verbatim — pretty-printed and unescaped. Compare that with stuffing an XML body into a JSON (or YAML) cassette, where it has to be escaped into a single unreadable string with \" and \n everywhere. A fenced block keeps the payload exactly as it went over the wire, which is what makes the diffs legible.
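To make the escaping point concrete, here is a minimal Python sketch that stores the same small XML body two ways: as a string field in a JSON cassette, and verbatim between Markdown fences. The XML payload and the "body" field name are made up for illustration and are not taken from any real cassette format.

```python
import json

# A small pretty-printed XML payload (illustrative only).
xml_body = '<person id="42">\n  <name>Ada Lovelace</name>\n</person>'

# In a JSON cassette the body becomes one escaped, single-line string.
# The "body" key is a hypothetical field name.
as_json_field = json.dumps({"body": xml_body})

# In a Markdown tape the body sits verbatim between two fence lines.
fence = "`" * 3  # three back-ticks, built here so this example nests cleanly
as_markdown_fence = f"{fence}\n{xml_body}\n{fence}"

print(as_json_field)
print(as_markdown_fence)
```

The first print shows every quote and newline escaped onto a single line; the second shows the payload exactly as written, which is the form a line-by-line diff can work with.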
Here's some raw Servirtium markdown source — shown as inline SVG, so the text is real (selectable, searchable) and stays crisp at any zoom:
(actual source file: github.com/servirtium/site/examples/example1.md)
Drop that same file in a repo and a code portal renders it beautifully — here's what GitHub makes of it. The rendered view is the default, and the exact bytes are always one click away via the Raw button (or ?plain=1):
That's the whole point: one file that's human-inspectable in raw form and that your code portal renders in a pretty way too. You get the readable rendered view for review and the exact raw bytes for diffing, with no trade-off between the two. On GitHub, for one, the rendered view really is pretty.
(rendered on GitHub: github.com/servirtium/site/examples/example1.md)
... and you'd be storing that in VCS as you would your automated tests.
Each interaction starts with a Level 2 Markdown header:

    ## Interaction N: <METHOD> <PATH-FROM-ROOT>

N starts at 0 and increases with each further interaction in the conversation. <METHOD> is GET or POST (or any other standard or non-standard HTTP method/verb name). <PATH-FROM-ROOT> is the path without the domain and port, e.g. /card/addTo.doIt

Each interaction has four sections, each denoted by a Level 3 Markdown header:

    ### Request headers recorded for playback:
    ### Request body recorded for playback (<MIME-TYPE>):
    ### Response headers recorded for playback:
    ### Response body recorded for playback (<STATUS-CODE>: <MIME-TYPE>):

<MIME-TYPE> is something like application/json. Within each of those sections there is a single Markdown code block (three back-ticks) with the details. The lines in that block may be reformatted depending on the settings of the recorder. If the payload is binary, there is a Base64 sequence instead (admittedly not so pretty on the eye).
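As a rough illustration of how regular that layout is, the following Python sketch builds a one-interaction tape by hand and pulls out the interaction number, method, path, status code and MIME type with two regular expressions. It is not the servirtium-vcr parser. The tape content (headers and JSON bodies) is invented for the example; only the header wording comes from the description above.

```python
import re

FENCE = "`" * 3  # three back-ticks, built here so this example nests cleanly

# A hand-written tape following the layout described above (payloads are made up).
TAPE = "\n".join([
    "## Interaction 0: POST /card/addTo.doIt",
    "",
    "### Request headers recorded for playback:",
    "",
    FENCE, "Content-Type: application/json", FENCE,
    "",
    "### Request body recorded for playback (application/json):",
    "",
    FENCE, '{"card": "1234"}', FENCE,
    "",
    "### Response headers recorded for playback:",
    "",
    FENCE, "Content-Type: application/json", FENCE,
    "",
    "### Response body recorded for playback (200: application/json):",
    "",
    FENCE, '{"added": true}', FENCE,
    "",
])

INTERACTION = re.compile(r"^## Interaction (\d+): (\S+) (\S+)$")
RESPONSE_BODY = re.compile(
    r"^### Response body recorded for playback \((\d+): ([^)]+)\):$"
)

def summarize(tape: str) -> list[dict]:
    """Return one summary dict per interaction found in the tape text."""
    interactions = []
    in_fence = False
    for line in tape.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a payload block
            continue
        if in_fence:
            continue  # payload lines are never treated as headers
        if m := INTERACTION.match(line):
            interactions.append(
                {"index": int(m[1]), "method": m[2], "path": m[3]}
            )
        elif (m := RESPONSE_BODY.match(line)) and interactions:
            interactions[-1].update(status=int(m[1]), mime_type=m[2])
    return interactions

print(summarize(TAPE))
```

Skipping everything between fences matters: a recorded payload could itself contain a line that looks like a Markdown header.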
You'll write your test (say, in JUnit), and it will use a client library (one your company wrote, or one from a vendor). For recording, you swap the real service URL for that of a Servirtium middle-man server, which itself delegates to the real service. If the service is flaky, keep re-running the test manually until you get a clean run, then commit the resulting Servirtium-style Markdown to source control. Best practice is to configure the same test to have two modes of operation: 'direct' and 'recording'. This is not a caching idea. It is deliberate: you are explicitly recording while running a test, or explicitly not recording (and going direct to the service).
The recording ends up, in the Markdown format described above, in a text file on your file system, which you'll commit to VCS alongside your tests.
Those same Markdown recordings are used in playback. This is again an explicit mode: a test run in playback mode will fail if there are no recordings in the dir/file in source control.
Playback itself will fail if the headers/body sent by the client to the real service (through the Servirtium library) are not the same as they were when the recording was made. Masking/redacting and other deliberate manipulations may be needed during recording to get rid of transient aspects that are not helpful in playback. The test failing in this situation is deliberate: you're using it to guard against potential incompatibilities.
For example, any dates in the headers or the body that go from the client to the HTTP server could be swapped for some date in the future like "2099-01-01" or a date in the past like "1970-01-01".
Whoever designs the tests for recording and playback should work on the redactions/masking toward an "always passing" outcome, with no differences in the Markdown regardless of how many times the same test is re-recorded.
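The date-masking idea can be sketched in a few lines of Python. This is an illustration, not how servirtium-vcr implements redaction: the function name, the regular expression and the sample JSON bodies are all assumptions, and it only handles ISO-style YYYY-MM-DD dates.

```python
import re

# Matches ISO-8601 calendar dates such as 2024-05-17.
# Other date formats (e.g. HTTP Date headers) would need their own patterns.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def redact_dates(text: str, placeholder: str = "2099-01-01") -> str:
    """Swap every ISO date in the text for a fixed placeholder date."""
    return ISO_DATE.sub(placeholder, text)

# Two recordings of the same request, made on different days (made-up bodies).
monday = redact_dates('{"purchasedOn": "2024-05-13", "item": "book"}')
tuesday = redact_dates('{"purchasedOn": "2024-05-14", "item": "book"}')

# After redaction the two recordings are identical, so re-recording yields no diff.
print(monday == tuesday)
```

Applying the same redaction on every recording run is what keeps a re-recorded tape byte-for-byte equal to the committed one when nothing meaningful has changed.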
Note: How a difference in request-header or request-body expectation is logged in the test output needs to be part of the deliberate design of the tests themselves. This is easier said than done, and you can't catch assertion failures over HTTP.
Note 2: "playback" is a third mode of operation for the same test, alongside the 'direct' and 'recording' modes described above, so there are three modes of operation all in all.
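One way to wire the three modes into a test suite is to pick the base URL from a single setting. The Python sketch below assumes a hypothetical SERVICE_TEST_MODE environment variable and placeholder URLs; none of these names come from Servirtium itself, and starting the Servirtium server in record or playback mode is left out.

```python
import os

# Both URLs are placeholders for illustration.
REAL_SERVICE_URL = "https://api.example.com"   # hypothetical real service
SERVIRTIUM_URL = "http://localhost:61417"      # hypothetical local Servirtium server

MODES = ("direct", "record", "playback")

def base_url_for(mode: str) -> str:
    """Return the URL the client library under test should be pointed at."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {MODES}")
    # Only 'direct' talks to the real service itself; in 'record' and
    # 'playback' the client talks to the Servirtium server instead.
    return REAL_SERVICE_URL if mode == "direct" else SERVIRTIUM_URL

def current_mode() -> str:
    """Read the mode from a hypothetical environment variable, defaulting to playback."""
    return os.environ.get("SERVICE_TEST_MODE", "playback")
```

Defaulting to playback keeps everyday runs fast and offline, while a scheduled job can set the variable to record and a developer can set it to direct when debugging against the real service.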
What it is
Servirtium records HTTP conversations from your tests and replays them later without calling the real service. Unlike most VCR-style tools, the recording is canonical Markdown under source-control, so teams using different languages can share the same tapes. It aims to be a lingua franca for mock HTTP conversations.
If you know VCR (or a Betamax-style port), you already know the workflow. Servirtium keeps the familiar record/replay loop and adds:
The same record-once, replay-forever loop you'd get from VCR.
Human-readable Markdown recordings, not YAML/JSON cassettes.
Record in one language, replay in another.
Tape diffs document API compatibility — or drift.
One native engine driven by thin per-language bindings.
Record with a Ruby Test::Unit library, replay it from Java JUnit — useful when the publishing and consuming teams use different stacks.
Wikipedia maintains a Comparison of API simulation tools and a page on service virtualization; the comparison table lacks the columns needed to differentiate.
With Servirtium, the HTTP conversations invoked by running tests are recorded and replayed from the same Markdown format. Two teams could negotiate changes over time in those recordings. If that were a vendor and a client, the client could ask for (say) hair color in a /get-person/{id} web-API by sharing an example in Markdown — ideally in a Git repo, with a test/spec example of use. The vendor could likewise communicate forthcoming changes via Servirtium Markdown conversations in Git repos (private or public).
OpenAPI (formerly Swagger) and RAML are complementary to Servirtium, not competitive — they describe the shape of an API (its endpoints, parameters and schemas), while a Servirtium tape is a concrete, recorded instance of real traffic. OpenAPI says what can happen; a tape is a specific exchange that did happen, replayable offline and diffable in Git. The two pair naturally: use OpenAPI as the contract, and Servirtium tapes as the worked examples and regression guards against it — a tape can even confirm that a real response still validates against the OpenAPI schema you publish.
TypeSpec (Microsoft's API-design language) sits one level up again: you author the API in TypeSpec and compile it down to OpenAPI, JSON Schema or protobuf as a build step. That's a design-time concern; Servirtium is a test-time one — and the two slot into the same pipeline. A CI build can emit the OpenAPI/JSON Schema from TypeSpec, replay the Servirtium tapes for the relevant endpoints, and then assert that each recorded response still validates against the freshly-generated schema. If TypeSpec changes the contract, the tape replay (or a git diff of the tapes) flags exactly where real traffic no longer matches — turning "the spec drifted from reality" into a failing build rather than a production surprise.
Postman or Postwoman (since renamed Hoppscotch) remain tools you use to explore and learn web-APIs.
Ruby's VCR and its Betamax-style ports are the established record/playback service-virtualization tools — the closest prior art to Servirtium. They lack the canonical Markdown data model and the "git diff" TCK angle (see How this differs from Ruby VCR above), but could readily gain a mode that reads and writes Servirtium Markdown.
Mountebank has long offered an advanced way to programmatically mock web-APIs (and other wire protocols) and co-evolve them toward business deliverables; we hope it gains a "dumber" mode supporting our Markdown format. Similarly there's WireMock, Pact ("contract tests" since 2013), Netflix's Polly.js (2019), LinkedIn's Flashback (2017), Specto Lab's Hoverfly (2015), CA's Lisa (since 2014 — note it does not co-locate recordings with prod & test source), and Karate (Intuit's Peter Thomas). Contract testing is the same field as this, too.
Service-Oriented Architecture (SOA) and Micro-Services are both better with Servirtium.
TCKs
A Technology Compatibility Kit (TCK) is best known as a 2004 source set that Sun released to allow (subject to license) other implementations of Java. For Servirtium, a previously recorded set of HTTP interactions would be stored in source control adjacent to the tests they correspond to and used in two broad ways:
Developers, test engineers and the CI-related automated jobs run the service tests in playback mode thousands or millions of times a day, and the tests always pass quickly. If they fail, that's something that can be fixed before commit/push and integration into trunk/master.
Because something could be incompatible with the "real" service, an hourly or daily job in the same build infrastructure runs the same tests in record mode. Those tests could fail, in which case the job fails and a developer investigates. If the service is flaky, this job can be run with retries=10 (whatever that is for the test framework) and may be coerced into passing. Sometimes this is needed because the vendor's "sandbox" infrastructure is not as reliable as their production service. If the test suite passes, one last check is made on the Servirtium recordings (Servirtese?) before that build completes. That check is git diff: if there are any differences, the job deliberately fails and a developer is asked to investigate.
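The final git diff check in that scheduled job could look something like the Python sketch below. The function name and the injectable run parameter are choices made for this example; the git invocation relies on `git diff --quiet` exiting with 0 when there are no differences and 1 when there are. Note that plain git diff only sees changes to tracked files, so a brand-new, never-committed tape would need a separate check.

```python
import subprocess

def tapes_changed(tape_dir: str, run=subprocess.run) -> bool:
    """Return True when tracked files under tape_dir differ from what is committed.

    `run` defaults to subprocess.run and can be swapped out in tests.
    """
    completed = run(["git", "diff", "--quiet", "--", tape_dir], check=False)
    if completed.returncode == 0:
        return False  # re-recorded tapes are byte-identical to the committed ones
    if completed.returncode == 1:
        return True   # at least one tape changed during re-recording
    # Any other exit code means git itself failed (for example, not a repository).
    raise RuntimeError(f"git diff failed with exit code {completed.returncode}")
```

A CI step would call this after the record-mode test run and fail the job when it returns True, leaving the changed tapes in the workspace for a developer to inspect.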
Oh and yes, a Markdown representation of one or more HTTP interactions is the easiest for seeing differences in a changed recording. That is because XML and JSON payloads sit in Markdown code fences, where they stay pretty-printed and verbatim. There's none of the escaping you'd get from cramming an XML body into a JSON (or YAML) cassette field — so the diffs read cleanly.
In this case, the World Bank's Climate API responded one day with an extra "Keep Alive" header, and the developer investigating decided it was probably OK to assume that it would be a regular feature of the API. XML or JSON payload differences can be multi-line, of course. Committing the change (after a dev-workstation reproduction) means the same thing will not cause a TCK failure in the future.
History
Markdown record/playback syntax, a Java library, and examples were released. The key git-diff-leveraging "TCK" aspect (see the TCKs section) was talked about too.
Servirtium-Java
Centralized services (or a local daemon) that would record/playback (or manually stub/mock) HTTP services that stored recordings in JSON (or YAML) in source-control (or a centralized DB)
Legacy SV technologies
Your tests had to hit a shared 'integration' server's endpoints, where that always involved luck and some goodwill that the service was fast, online, consistent, and the version you wanted
Before SV
News
Implementation notes, demos, and writing on TCKs and service virtualization.