The four properties of safe change

When somebody says a change is "safe to make", they usually mean one of three things: "I read the code and I think this works", "the tests pass", or "we'll find out in production". None of those is enough on an inherited system, and the System Report does not let any of them count as evidence on its own.

A change is safe when it is observable, testable, reversible, and confirmable. Those four properties are the gate the report uses before recommending any work — and they are concrete enough that a non-engineer can hold a recommendation against them and ask, which one of the four is missing?

This article unpacks each one.

Why "the code looks fine" is not the standard

The common mistake on inherited systems is to read the code, satisfy yourself that the new behaviour is correct, and ship. This works on a system you wrote. It fails on a system somebody else wrote, because the code rarely contains the full meaning of the change. Some of the meaning lives in the database state, in the deployment, in a partner integration, in an admin tool, in an old timezone assumption, in the heads of two operators. A change that is correct in code can still break the business.

The four properties are designed to catch that.

Observable

You can see whether the change worked, in something other than the code. A log line, a metric, an admin view, a row count, a queue depth — somewhere outside the function that was changed. If the only place the change is visible is in the source file, the change is not observable.

The trap on inherited systems is that observability is patchy. The new feature ships with a Datadog dashboard; the legacy area it touches has none. A "successful" deploy that breaks an admin export goes unseen for three weeks. Before recommending a change in a low-observability area, the report asks: what evidence will tell us whether this change worked? If the answer is "the user will email us when something breaks", the recommendation is to add the observation hook before the change, not after.
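An observation hook can be very small. The sketch below is one possible shape for it in Python: a structured log line plus a counter, emitted every time a discount is applied. The logger name, the field names, and the record_discount helper are all hypothetical, and the in-process counter is only a stand-in for whatever metrics system the team already runs.

```python
import json
import logging
from collections import Counter

# Hypothetical logger name; use whatever the system already routes to its log store.
logger = logging.getLogger("discounts")

# Hypothetical in-process counter, standing in for a real metric or admin view.
discount_events = Counter()


def record_discount(order_id: str, rule: str, amount_cents: int, path: str) -> str:
    """Emit one structured log line per applied discount and count it by code path."""
    event = {
        "event": "discount_applied",
        "order_id": order_id,
        "rule": rule,
        "amount_cents": amount_cents,
        "path": path,  # e.g. "legacy" or "unified", so the two paths can be compared
    }
    line = json.dumps(event, sort_keys=True)
    logger.info(line)
    discount_events[path] += 1
    return line
```

The point of the sketch is the ordering: a hook like this ships and collects a baseline from the existing path first, so that when the change lands there is something outside the source file to compare against.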

Testable

There is a way to exercise the new behaviour against something other than itself. The phrase "other than itself" is what carries the weight. A test where the author wrote both the implementation and the assertion — usually with a heavy mock between them — proves only that the test agrees with itself. A test that round-trips a real entity through the production code into a real database, and back out, anchors against reality.

The test-trust classification draws this distinction in detail. The short version: a green test is not, on its own, evidence. A test is evidence to the degree that it anchors against an independent source — a real database, a real HTTP request, a real existing output, a business rule confirmed by the product owner.

When a change touches code that has no anchor against reality, the report's first move is usually a characterization test — a test that captures what the code does today, on real input, before any change is made. The pattern in production looks like the DB-per-test setup — every test starts from a frozen real-state dump, runs through the same Facade and BusinessService code that runs in production, and verifies behaviour the test author did not invent.
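A characterization test can be sketched in a few lines. The example below is an illustration under stated assumptions, not the report's actual harness: the discount rules, the table layout, and every function name are hypothetical, and a three-row in-memory SQLite database stands in for the frozen real-state dump. It records what the existing code does today, then reports every order on which a candidate implementation disagrees.

```python
import sqlite3


def legacy_discount_cents(total_cents: int, tier: str) -> int:
    """Stand-in for the inherited code (hypothetical rule): it truncates."""
    if tier == "partner":
        return total_cents * 15 // 100
    return total_cents * 5 // 100 if total_cents >= 10_000 else 0


def candidate_discount_cents(total_cents: int, tier: str) -> int:
    """Stand-in for the rewritten code: same rates, but rounds half up."""
    if tier == "partner":
        return (total_cents * 15 + 50) // 100
    return (total_cents * 5 + 50) // 100 if total_cents >= 10_000 else 0


def load_snapshot() -> sqlite3.Connection:
    """Build a tiny database; a real setup would restore a frozen production dump."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER, tier TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (id, total_cents, tier) VALUES (?, ?, ?)",
        [(1, 9_999, "retail"), (2, 10_000, "retail"), (3, 3_333, "partner")],
    )
    return conn


def characterize(conn: sqlite3.Connection, discount_fn) -> dict:
    """Record what discount_fn returns for every stored order."""
    rows = conn.execute("SELECT id, total_cents, tier FROM orders ORDER BY id")
    return {order_id: discount_fn(total, tier) for order_id, total, tier in rows}


def diff_against_baseline(baseline: dict, candidate: dict) -> dict:
    """Return {order_id: (old, new)} for every order where behaviour changed."""
    return {
        order_id: (old, candidate.get(order_id))
        for order_id, old in baseline.items()
        if candidate.get(order_id) != old
    }
```

With these stand-in rules, the baseline and the candidate agree on orders 1 and 2 and differ on order 3 (499 versus 500 cents). Nobody wrote that expectation by hand; it came from running the old code on stored input, which is what makes the test an anchor rather than an echo of the author's assumptions.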

Reversible

If the change goes wrong, you can undo it without throwing the whole system away. Reversibility on inherited systems is rarely free. It is built — sometimes with a feature flag, sometimes by leaving the old function in place and routing around it, sometimes with a forward-only migration that has a corresponding backfill, sometimes with a database column that is added before the old one is removed.

The mistake is treating reversibility as an afterthought. By the time you need to revert, the new path has accumulated state that the old path does not understand. A user signed up under the new flow; a payment ran through the new pricing rule; a row was written into a table that did not exist last week. Reverting now means writing a second migration to bring the new state back into the old shape — work that is expensive precisely when you do not have time for it.

The report names reversibility as a property the recommendation must carry. If the change cannot be reversed cheaply, the recommendation is either to make it reversible first, or to call it irreversible and require a much higher confidence bar before shipping.
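One of the cheaper ways to build reversibility is a flag that routes between the old path and the new one while both stay in the codebase. The following is a minimal sketch of that idea; the Features class, the flag name, and both discount functions are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical flag holder; real systems usually read flags from config or a service."""
    unified_discount: bool = False


def legacy_discount_cents(total_cents: int) -> int:
    """Old path, left in place and unmodified so it can be routed back to."""
    return total_cents * 5 // 100


def unified_discount_cents(total_cents: int) -> int:
    """New path; differs from the old one only in rounding (hypothetical)."""
    return (total_cents * 5 + 50) // 100


def discount_cents(total_cents: int, features: Features) -> int:
    """Single entry point: the flag decides which implementation runs."""
    if features.unified_discount:
        return unified_discount_cents(total_cents)
    return legacy_discount_cents(total_cents)
```

Turning the flag off restores the old code path, but it does not restore old state. Anything the new path has already written still has to be readable by the old path, which is why a flag alone is usually only part of a reversibility plan.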

Confirmable

Behaviour that is weird, surprising, or undocumented gets signed off by someone who knows the business before it changes. This is the property most often skipped, because confirmation feels slow. A new engineer reads a function, decides it does not make sense, deletes it, and finds out a month later that it was the rounding rule for a specific class of invoice that one customer cared about.

Confirmation is a person, not a test. A long-tenured operator, a product owner, a support lead, a finance person. The report's job is to identify which behaviours need confirmation before change — usually behaviours flagged in the five-kinds-of-weird-code classification as known business rule or accidental-but-relied-upon — and to name who needs to sign off, by role.

When confirmation is impossible — the original team is gone, the product owner is new, the customer who cared has churned — the report says so, and the recommendation either narrows the scope (only change the parts that are confirmable) or gates the change behind a long observation window.

What it looks like in practice

A finding in a System Report does not say "fix this". It says, "this is the change, and here is how each of the four properties is satisfied".

Recommendation. Extract OrderService ownership of the discount calculation.

Observable — adds a structured log on every applied discount; an admin view counts discount events per day; both ship before the extraction.

Testable — a characterization test captures the existing discount on a sample of 200 historical orders; the new path must produce the same result on each.

Reversible — the new path is gated behind Features.UnifiedDiscount. The old code stays in place for two release cycles.

Confirmable — the partner-pricing operator (Marie, Operations) reviews three sample discounts on the new path before the flag is enabled in production.

If any of the four cannot be filled in, the recommendation is incomplete and the report says so, in plain words: "We cannot make this confirmable today; the operator who would sign this off has not been hired yet." That is a finding in its own right, and often the precondition that determines whether the engagement should refactor, pause to build a safety net, or rewrite.
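The same check can be expressed as a small data structure. This is only an illustrative sketch, not the format the report itself uses: the Recommendation class and its field names are assumptions made for the example. Each property holds the evidence offered for it, and an empty property marks the recommendation as incomplete.

```python
from dataclasses import dataclass
from typing import List, Optional

PROPERTIES = ("observable", "testable", "reversible", "confirmable")


@dataclass
class Recommendation:
    """Hypothetical record: one recommendation plus the evidence for each property."""
    title: str
    observable: Optional[str] = None
    testable: Optional[str] = None
    reversible: Optional[str] = None
    confirmable: Optional[str] = None

    def missing_properties(self) -> List[str]:
        """Names of the properties that have no evidence written down."""
        return [name for name in PROPERTIES if not (getattr(self, name) or "").strip()]

    def is_complete(self) -> bool:
        return not self.missing_properties()
```

Applied to the example above with the confirmation step left blank, missing_properties() would return ["confirmable"], which is the question a non-engineer can ask of any recommendation: which one of the four is missing?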

Where this fits in a System Report

The four properties live in Part I.5 of the report's process. They are the standard the report holds itself to before any recommendation is written. The Change-Safety Plan, the Implementation Strategy, and the test-trust classification are all instruments for filling each property in.

The four properties also work as a tool you can apply on your own, before you hire anyone. Take any change you are considering. Ask the four questions. The one you cannot answer is the one to invest in next — usually the cheapest move you can make to widen the surface of changes the team can ship safely.

Articles describe the lens. The questions a System Report asks are how that lens is applied to your system.
