Case study
Integration testing → DB-per-test, through real layers
The tension
Most "integration tests" are not. They mock the repository, fake the database, and verify that one method calls another. That tests the wiring, not the system. Real bugs live in the seams between rules — and rules only interact when state has accumulated.
Why the naive solution fails
[Fact]
public async Task BuyingAStockReducesCash()
{
    // Arrange: the portfolio is a mock, so no real cash logic runs.
    var portfolio = new Mock<IPortfolio>();
    portfolio.Setup(p => p.Cash).Returns(10000);
    var service = new TradingService(portfolio.Object);

    // Act
    await service.BuyAsync("AAPL", quantity: 10, price: 150);

    // Assert: only checks that SetCash was called once with 8500.
    portfolio.Verify(p => p.SetCash(8500), Times.Once);
}
This test passes whether or not the system behaves correctly. It verifies only that TradingService calls SetCash(8500). If SetCash is broken, if portfolio weight cascades incorrectly, if a fee calculation is wrong, the test sees nothing. It tests the interface contract, not the system behaviour.
The design rule
Tests run against realistic state, through the production Facade and BusinessService code. Time and external vendors are mocked. The application layers are not.
The simulation that builds the baseline is exhaustive: real student personalities, weeks of activity, real trades, real progression — all driven through real Facade methods. The resulting Postgres state is captured as a pg_dump and checked into source control. Every test starts from the same realistic state, executed through the same code that runs in production.
What was actually built
This pattern is in production on a gamified financial-literacy platform. The artefacts are real and named:
Facade/DataGeneration/DayLoopSimulation.cs — 1,172 lines, 45 days of simulated activity. Every active student per day rolls a personality-based login probability (Casual 50%, Moderate 80%, Engaged 95%), then executes 1–3 weighted actions — quizzes, stock buys and sells, arcade, shop, XP allocations, real-estate purchases, social actions, arena, weekly quests. Every action calls the real IStudentFacade and IGamerPageFacade. Not mocks. Not stubs.
The simulation is anchored: students freeze at their first occurrence of OnboardingCompleted, FirstStockBought, FirstAgreementSigned, ReachedLevel5, and seven others. The dump captures every system state a test could need.
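The login roll and action selection described for DayLoopSimulation.cs can be sketched in a few lines. This is a minimal illustration, not the production file: the login probabilities come from the text, while the action names, their weights, the seed, and the helper names PickWeighted and SimulateDay are assumptions. In the real simulation each chosen action calls the production Facade; here it is only a label.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Login probabilities per personality, as stated in the case study.
var loginProbability = new Dictionary<string, double>
{
    ["Casual"] = 0.50,
    ["Moderate"] = 0.80,
    ["Engaged"] = 0.95,
};

// Hypothetical action weights: the real weighting is not given in the text.
var actionWeights = new (string Action, int Weight)[]
{
    ("Quiz", 4), ("StockBuy", 3), ("StockSell", 2), ("Shop", 1),
};

// Picks one action with probability proportional to its weight.
string PickWeighted(Random rng)
{
    int roll = rng.Next(actionWeights.Sum(a => a.Weight));
    foreach (var (action, weight) in actionWeights)
    {
        if (roll < weight) return action;
        roll -= weight;
    }
    throw new InvalidOperationException("Weights must be positive.");
}

// Returns the actions one student performs on one simulated day.
// The list is empty when the login roll fails.
List<string> SimulateDay(string personality, Random rng)
{
    var actions = new List<string>();
    if (rng.NextDouble() >= loginProbability[personality]) return actions;

    int count = rng.Next(1, 4); // 1 to 3 actions; the upper bound is exclusive
    for (int i = 0; i < count; i++) actions.Add(PickWeighted(rng));
    return actions;
}

var random = new Random(42); // fixed seed keeps the run reproducible
for (int day = 1; day <= 3; day++)
{
    var actions = SimulateDay("Moderate", random);
    Console.WriteLine($"Day {day}: {string.Join(", ", actions)}");
}
```

Seeding the random generator is one plausible way to keep a generated baseline reproducible between runs; the text does not say how the production simulation handles this.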
BusinessServices.Tests/Integration/Postgres/baseline.dump — the pg_dump file, checked into git alongside the test code.
BaselineFixture — an xUnit fixture that, per test, spins up a postgres:18-alpine container, runs pg_restore --no-owner --no-privileges /tmp/baseline.dump, wires a fresh DI container with the real Facade + BusinessServices stack, exposes CreateScope() for the test to consume.
LedgerInvariantTests — concrete invariants the simulation must preserve and tests must verify:
- XP allocations stored with negative sign — earnings positive, allocations negative.
- No Gamer has a negative raw cash balance — overspending must be impossible.
- Every StockOrder targets a stock with a current price.
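The three invariants above can each be read as a predicate over rows. The sketch below is only an illustration of that idea: it uses hypothetical in-memory rows where the real LedgerInvariantTests would presumably read from the restored Postgres baseline, and all field names and sample values are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for query results from the restored baseline.
var xpEntries = new List<(string Kind, int Amount)>
{
    ("Earning", 120), ("Allocation", -50), ("Earning", 30),
};
var gamers = new List<(string Name, decimal Cash)>
{
    ("ada", 8500m), ("bob", 0m),
};
var currentPrices = new Dictionary<string, decimal> { ["AAPL"] = 150m };
var stockOrders = new List<(string Gamer, string Symbol)> { ("ada", "AAPL") };

// Invariant 1: earnings are stored positive, allocations negative.
bool XpSignsAreConsistent() => xpEntries.All(e =>
    e.Kind == "Allocation" ? e.Amount < 0 : e.Amount > 0);

// Invariant 2: no gamer has a negative raw cash balance.
bool NoNegativeCash() => gamers.All(g => g.Cash >= 0m);

// Invariant 3: every stock order targets a stock with a current price.
bool EveryOrderHasCurrentPrice() =>
    stockOrders.All(o => currentPrices.ContainsKey(o.Symbol));

Console.WriteLine($"XP signs consistent: {XpSignsAreConsistent()}");
Console.WriteLine($"No negative cash:    {NoNegativeCash()}");
Console.WriteLine($"Orders have prices:  {EveryOrderHasCurrentPrice()}");
```

Written this way, the same predicates can be checked once against the baseline and again after any Facade action a test performs.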
Evidence
The test project has the dump in it. Not generated at runtime — checked into source control, alongside ~25 test files organized by Page, Action, and Invariant.

A real test, not a contrived example. [Trait("Category", "DockerRequired")] so it can be filtered out on machines without Docker. [SkippableFact] so it skips gracefully when the baseline does not contain the precondition the test needs.

The test reads page state, calls a Facade action, checks resulting state — all through the same code paths a real user would hit.
What does not live here
- Pure-function logic. xpForLevel(N) = 200 + (N - 1) × 50 doesn't need a database. Unit tests cover it.
- HTTP shape. Whether a controller maps a POST to the right Facade method is a wiring test.
- External vendors. TwelveData, Stripe, DeepL are stubbed at the boundary; integration tests don't reach the live vendor.
The pattern is for system behaviour under real state through real layers — not "every test must be slow."
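The xpForLevel formula from the list above shows why pure functions stay out of this pattern: it can be checked with no database, container, or fixture. A minimal sketch follows, assuming levels start at 1 (the text does not state the valid range) and using a C#-style name, XpForLevel, for the same formula.

```csharp
using System;

// Value of the formula from the text: 200 + (N - 1) * 50.
// Rejecting levels below 1 is an assumption, not something the text specifies.
static int XpForLevel(int level)
{
    if (level < 1)
        throw new ArgumentOutOfRangeException(nameof(level), "Levels start at 1.");
    return 200 + (level - 1) * 50;
}

Console.WriteLine(XpForLevel(1)); // 200
Console.WriteLine(XpForLevel(2)); // 250
Console.WriteLine(XpForLevel(5)); // 400
```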
What this proves Smallbox can do
Make a system safe to change before adding new work — by designing the testing pattern at the same time as the architecture. The simulation is itself a test: if 45 days of activity through the production Facade can run without invariant violations, the system is consistent.
The full simulation lives inside the gamified financial-literacy platform. The pattern generalizes to any system where business rules interact across multiple services.