Values

What quality of change means.

A backend is not good because it is elegant, recently written, or covered in tests. A backend is good when the next important change can happen with less guessing, less hidden coupling, and better evidence than the change before it. Everything below is a value in service of that one measure.

Each section is written in the same shape: the value, what it means in practice, the failure it prevents, and the place in our own running system where it can be checked. None of the anchors are aspirational. They are surfaces, files, or admin pages that exist today — including the ones where the value is partially leaking, because pretending otherwise would defeat the point.

Quality of change is the centre.

The job is not to make the system clever. The job is to make the next valuable change cheaper, safer, and better-grounded than the change before it. That framing governs every other choice on this page — what counts as evidence, which boundaries matter, which gates are worth building, which tests are worth keeping.

What it means. A healthy backend raises the floor for whoever touches it next, not just the person writing the current commit. A change that improves the system does at least one of four things: it reduces hidden coupling, narrows blast radius, exposes a previously invisible assumption, or adds a check the system will keep running on its own. A change that does none of those may still be useful, but it has not improved the system's ability to change.
What it prevents. Backend work drifting into a craft contest — elegant refactors with no downstream consequence, micro-optimisations on cold paths, the kind of activity that looks like progress but leaves the next change exactly as expensive as it was.
Where it shows up. The System Report ranks findings by their effect on the next valuable change, not by technical ugliness. A messy file with no business consequence ranks below a clean-looking flow that quietly blocks the work the team actually needs to ship next.

Evidence beats confidence.

A build log, a deploy message, a green test suite, an AI answer, a README, an internal doctrine file: each of those is a signal. None of them is the truth. The discipline is to hold signals and ground truth at different weights, and to widen the search when they disagree.

What it means. Memory routes you to where to look. Code, database state, deployed binaries, and production behaviour answer whether the thing you remember is still true. For any non-trivial claim — “does X exist?”, “is Y still wired up?”, “where is Z owned?” — the cost of grepping is small and the cost of being confidently wrong is large.
What it prevents. Confident inference against a map that no longer matches the system. The shape of failure: the doc, the memory, and the build log all say the change shipped; the running binary says it did not.
Where it shows up. A live example is documented in full on the “orientation, not oracle” page: a deploy that read as complete across three repositories while still serving yesterday's SDK from disk. The page shows the file-timestamp comparison that broke the tie.
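The tie-break described above can be sketched as a direct comparison between what the build produced and what sits on disk where the service reads from. This is a minimal sketch, not the check used in that incident: the file names and the helpers `artifact_fingerprint` and `deploy_matches_build` are hypothetical, and the sketch compares content hashes as well as reading modification times, on the assumption that a timestamp alone can mislead after a copy.

```python
import hashlib
import tempfile
from pathlib import Path


def artifact_fingerprint(path: Path) -> tuple[str, float]:
    """Return (sha256 hex digest, modification time) for a file on disk."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest, path.stat().st_mtime


def deploy_matches_build(built: Path, served: Path) -> bool:
    """True only when the served file has the same content as the built one."""
    return artifact_fingerprint(built)[0] == artifact_fingerprint(served)[0]


# Throwaway files standing in for a freshly built SDK and the copy being served.
with tempfile.TemporaryDirectory() as tmp:
    built = Path(tmp) / "sdk-built.js"
    served = Path(tmp) / "sdk-served.js"
    built.write_text("export const version = 'today';")
    served.write_text("export const version = 'yesterday';")

    # The logs may say "deployed"; the files say otherwise.
    stale = not deploy_matches_build(built, served)

    # Simulate the deploy actually copying the new file into place.
    served.write_bytes(built.read_bytes())
    fresh = deploy_matches_build(built, served)
```

The design point is that the check reads the files themselves rather than any message about them, so it cannot agree with a log that is wrong.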

Inspectability beats opacity.

A backend should emit enough evidence about itself that a human can answer “what is running, what just failed, and does the map still match the terrain?” without reading code. Inspectability is not logging volume. It is the property that the questions that matter have surfaces.

What it means. Each subsystem identifies itself by its real name. Startup emits a heartbeat. Failures land somewhere that a reviewer can scan in a few seconds, not a folder that no one opens. Names are full, not short — stock-observation-engine, not engine — because short names hide drift.
What it prevents. The class of failure where a subsystem stops working and nobody notices for a week because nothing alerted, nothing surfaced, and the dashboard kept rendering cached numbers. Opaque systems fail silently because that is the only direction available to them.
Where it shows up. The CompanyGraph admin has a Logs page that aggregates startup heartbeats from every subsystem: twelve services, each in a stable colour, deduplicated to one row per service. A reviewer scans the page; a missing colour is a missing subsystem. The page is pictured on the “orientation, not oracle” page; the wider set of surfaces (batch health, exceptions, vendor usage, and a live test board) is walked through on the “built to show its pulse” page.
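The deduplication and the "missing colour" check can be sketched in a few lines. This is an illustration under assumptions, not the Logs page implementation: the `Heartbeat` record, its fields, and both helper functions are hypothetical, and only three of the service names from this page are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    service: str       # full subsystem name, e.g. "stock-observation-engine"
    started_at: float  # seconds since epoch, reported at startup


def latest_per_service(heartbeats: list[Heartbeat]) -> dict[str, Heartbeat]:
    """Deduplicate to one row per service, keeping the most recent startup."""
    latest: dict[str, Heartbeat] = {}
    for hb in heartbeats:
        current = latest.get(hb.service)
        if current is None or hb.started_at > current.started_at:
            latest[hb.service] = hb
    return latest


def missing_services(expected: set[str], heartbeats: list[Heartbeat]) -> list[str]:
    """Services that were expected but never reported a heartbeat."""
    return sorted(expected - set(latest_per_service(heartbeats)))


EXPECTED = {"stock-service", "stock-observation-engine", "image-service"}
seen = [
    Heartbeat("stock-service", 100.0),
    Heartbeat("stock-service", 250.0),  # a restart: the newer heartbeat wins
    Heartbeat("image-service", 120.0),
]

rows = latest_per_service(seen)
absent = missing_services(EXPECTED, seen)
```

Listing the expected services explicitly is what turns silence into a finding: a subsystem that never starts produces no log line, so it can only be noticed against a list of what should be there.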

Coherent ownership beats scattered behaviour.

Each part of the system should own a kind of change. Where ownership is scattered, every change becomes political — three people, three intuitions, three places the rule might live. Where ownership is coherent, the question “where does this belong?” has one answer, and the answer points at one file.

What it means. Identity and truth live in one place per concept. Composition — the work of stitching several truths together for one product — lives in a separate seam from the truths themselves. A subsystem owns its content in-repo; the database row is a cache, not the authoring surface.
What it prevents. The slow failure mode where the same business value is computed two different ways in two different files, drifts apart over a year, and produces a bug that takes a week to track because both answers look reasonable in isolation.
Where it shows up. CompanyGraph is one product split into a composition backend and several extracted subsystems — stock-service, stock-observation-engine, content-management-system, image-service, email-service, and logger-service. A three-bucket rule decides what belongs where; references across a subsystem boundary go by id only, and contract drift surfaces as a compile error, not a runtime parse failure.
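The id-only rule can be sketched as follows. Every type and function name here is hypothetical and chosen for illustration; none of it is taken from the CompanyGraph code. Python has no compile step, so the closest analogue to a compile error is a static type checker such as mypy rejecting an `ImageId` passed where a `StockId` is expected, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import NewType

# Hypothetical id types: a subsystem hands out ids, not its internal objects.
StockId = NewType("StockId", str)
ImageId = NewType("ImageId", str)


@dataclass(frozen=True)
class Stock:  # truth owned by the stock subsystem
    id: StockId
    name: str


@dataclass(frozen=True)
class Image:  # truth owned by the image subsystem
    id: ImageId
    url: str


@dataclass(frozen=True)
class CompanyCard:  # owned by the composition layer
    stock_id: StockId  # reference by id only, never an embedded Stock
    image_id: ImageId


def render_card(
    card: CompanyCard,
    stocks: dict[StockId, Stock],
    images: dict[ImageId, Image],
) -> str:
    """Composition: stitch two truths together without owning either."""
    stock = stocks[card.stock_id]
    image = images[card.image_id]
    return f"{stock.name} <{image.url}>"


stocks = {StockId("s-1"): Stock(StockId("s-1"), "Acme")}
images = {ImageId("i-9"): Image(ImageId("i-9"), "https://example.invalid/acme.png")}
card = CompanyCard(stock_id=StockId("s-1"), image_id=ImageId("i-9"))
rendered = render_card(card, stocks, images)
```

Because the card holds ids rather than copies, there is only one place where a stock's name can be wrong.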

Reality-anchored verification beats green theatre.

A green test suite proves something in proportion to what its tests are anchored against. A test that mocks every input and asserts the mocks were called proves only that it agrees with itself. The question is not “are the tests green?” The question is “what reality independent of the test do these tests actually check?”

What it means. High-trust verification round-trips an entity through a real database, sends a real request to a running system, or compares output against a confirmed example. Heavy mocking is a regression alarm at best; it is not evidence the system is correct. A test that mirrors the implementation so closely that it would pass against any implementation that compiles gets cut, not kept.
What it prevents. The false-confidence pattern where the suite is large, green, and load-bearing in the team's reasoning — while the actual coverage against production behaviour is thin.
Where it shows up. The integration model on the verification surface builds the test database through the Facade the application itself uses, with mocked time and simulated users, then clones that baseline for each test. Tests run against a database the system produced, not against rows a test author invented.
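The baseline-and-clone shape can be sketched with SQLite from the Python standard library. This is a sketch under stated assumptions, not the integration model itself: the `users` table, `create_user`, `build_baseline`, and `clone` are hypothetical stand-ins, and an in-memory SQLite database stands in for whatever database the real system uses.

```python
import sqlite3


def create_user(conn: sqlite3.Connection, name: str) -> int:
    """Stand-in for the application's own write path (hypothetical)."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def build_baseline() -> sqlite3.Connection:
    """Build the baseline once, through the same write path the app uses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    create_user(conn, "simulated-user-1")
    create_user(conn, "simulated-user-2")
    return conn


def clone(baseline: sqlite3.Connection) -> sqlite3.Connection:
    """Give each test its own copy so tests cannot contaminate each other."""
    copy = sqlite3.connect(":memory:")
    baseline.backup(copy)  # copies every page of the baseline into the new db
    return copy


baseline = build_baseline()

# Test 1 writes through the app path and reads the row back from the database.
db1 = clone(baseline)
new_id = create_user(db1, "added-in-test")
round_tripped = db1.execute(
    "SELECT name FROM users WHERE id = ?", (new_id,)
).fetchone()[0]

# Test 2 starts from the untouched baseline, unaffected by test 1.
db2 = clone(baseline)
count_in_db2 = db2.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The assertion that matters is the read-back: the value comes out of the database, not out of the object the test just constructed.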

Gradual, reversible evolution beats big-bang confidence.

A large change that lands in one step is a bet that the assumption set is complete. It usually is not. The discipline is to evolve through stages where rollback, comparison, and learning are cheap, and to refuse the kind of step that buys speed by giving up reversibility.

What it means. Add the new path next to the old one. Read from both in parallel. Confirm the new path produces the same answer. Remove the old path only after the new one has survived real traffic for a defined window. Each intermediate state must be a state the system can live in safely, not a half-step that has to be finished before the lights go out.
What it prevents. The hero-deploy failure mode where a single migration carries six assumptions, one of them is wrong, and the rollback path was never exercised because the change was “simple enough not to need one”.
Where it shows up. The CompanyGraph subsystem extractions ran as named phases — each one shippable on its own, each one reversible. The bigger version of this shape is offered to clients as the Parallel Replacement Beta: build the candidate alongside the old system, prove it under comparison, transition only when parity is confirmed.
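The middle stages of that sequence (read from both, compare, keep serving the old answer) can be sketched like this. It is an illustration only: `ParallelRead` and the price dictionaries are hypothetical, and a minimum number of comparisons stands in for the "defined window" of real traffic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ParallelRead:
    old_path: Callable[[str], int]
    new_path: Callable[[str], int]
    mismatches: list = field(default_factory=list)
    comparisons: int = 0

    def read(self, key: str) -> int:
        """Serve the old answer; record any disagreement from the new path."""
        old_value = self.old_path(key)
        self.comparisons += 1
        try:
            new_value = self.new_path(key)
        except Exception as exc:  # a crash in the candidate must not reach callers
            self.mismatches.append((key, old_value, repr(exc)))
            return old_value
        if new_value != old_value:
            self.mismatches.append((key, old_value, new_value))
        return old_value  # the old path stays authoritative during comparison

    def parity_confirmed(self, minimum_comparisons: int) -> bool:
        """Safe to switch only after enough comparisons with zero mismatches."""
        return self.comparisons >= minimum_comparisons and not self.mismatches


OLD_PRICES = {"acme": 100, "globex": 250}
NEW_PRICES = {"acme": 100, "globex": 251}  # deliberate drift in the candidate

reader = ParallelRead(OLD_PRICES.__getitem__, NEW_PRICES.__getitem__)
served = [reader.read("acme"), reader.read("globex")]
ready_to_switch = reader.parity_confirmed(minimum_comparisons=2)
```

Every state in this sketch is one the system can stay in indefinitely: callers always get the old answer, and removing the comparison wrapper is the whole rollback.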

Honest coverage beats process theatre.

A gate is only as valuable as its honesty about its own limits. A pre-commit hook that refuses two kinds of layer violation and is silent on a third is useful only to the extent that the team knows which case it does not catch. A gate that claims full coverage when it has partial coverage is worse than no gate at all, because it teaches the team to stop watching.

What it means. Every gate states what it catches today and what it does not catch yet. The uncaught case is not buried in a tracker; it is named on the page that documents the gate, next to the rule the gate is meant to back. The shape of an honest gate is: this is the line, here is what enforces it, here is the hole, and here is the next iteration that closes it.
What it prevents. Process theatre — the slow corruption where the team trusts a green checkmark that does less than its name implies, and the gap between the rule and the enforcement widens silently.
Where it shows up. The code discipline page shows the pattern explicitly: a layer rule that the pre-commit gate enforces for two cases, a third case the gate does not enforce yet, and a live tuple leak in production that the gate's silence let through. The leak is named on the page, with the file and line, before it has been fixed.
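One way to keep a gate honest is to have it print its own gaps on every run. The sketch below assumes a gate with two enforced cases and one known hole, matching the shape described above; the rule names, the `GateRule` record, and `run_gate` are hypothetical and are not taken from the actual pre-commit hook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateRule:
    name: str
    enforced: bool
    note: str


# Hypothetical rule names: two enforced cases and one declared gap.
RULES = [
    GateRule("controller-imports-repository", True, "refused at commit time"),
    GateRule("view-imports-repository", True, "refused at commit time"),
    GateRule("raw-tuple-crosses-layer", False, "not detected yet"),
]


def run_gate(violations: list[str]) -> tuple[bool, list[str]]:
    """Return (passed, report). Only enforced rules can fail the gate."""
    enforced = {rule.name for rule in RULES if rule.enforced}
    blocked = [v for v in violations if v in enforced]
    report = [f"BLOCKED: {v}" for v in blocked]
    # The gap is printed on every run, so a pass never implies full coverage.
    report += [
        f"NOT COVERED: {rule.name} ({rule.note})"
        for rule in RULES
        if not rule.enforced
    ]
    return (not blocked, report)


passed_with_gap, gap_report = run_gate(["raw-tuple-crosses-layer"])
passed_with_block, block_report = run_gate(["view-imports-repository"])
```

In the first run the gate passes even though a violation exists, which is exactly the situation it cannot detect; the "NOT COVERED" line is what keeps that pass from being read as a clean bill of health.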

The shape these values produce.

Read together, these are not seven independent preferences. They describe one shape: a system that can be inspected, whose ownership is local, whose tests anchor against something other than themselves, whose changes are reversible and small enough to learn from, whose gates are honest about their coverage, and whose decisions are settled by evidence rather than confidence. Everything reduces to making the next valuable change cheaper, safer, and better-grounded than the change before it.

None of this is novel as a list. What makes it Smallbox-specific is that each value is paired with a place in the running system where it can be checked, and the failures of each value are named next to the rule rather than hidden. The published map is allowed to be one commit behind the running code; it is not allowed to claim coverage it does not have.

The point is not perfect systems. The point is systems that make their own drift visible — and engineers who change the map in the same commit as the code, every time the two get out of step.

This is the same lens we'd bring to your system.

The System Report is one application of these values to an inherited backend. The method pages walk through the rule, the live evidence, the gate, and the honest gap, on code that is running right now. Each is a small dose of the same thing.
