The five kinds of weird code in a legacy codebase

Every legacy codebase has weird code. The mistake teams make is treating it as one category — legacy cruft — and trying to clean it up on the same Friday afternoon. The result, every few years, is a quiet outage somewhere downstream of the change, traced back to a function that "looked unused" and turned out to be the only thing keeping a partner integration alive.

The System Report classifies weird behaviour into exactly five kinds. The classification matters because each kind has a different correct response, and three of the five amount to "do not delete this on a Friday".

The five categories

Every piece of weird code in an inherited system falls into one of these. The label is decided by evidence, not by feel.

Known business rule. The code looks weird because the business is weird. Customers who signed up before 2018 are billed monthly in advance; everyone else is billed in arrears. The pricing logic for that does not look elegant in a vacuum, but it is correct. The right action is to preserve and document, usually by adding a comment that names the rule and the owner, not by refactoring the surface.

Accidental but relied upon. The code does something the original author did not intend, and a customer or operator has come to depend on it. The export rounds to the nearest cent in a way the spec does not require, and a partner reconciles their books against that rounding every month. The right action is preserve, characterise with a test, ask product later. The trap is to "fix" the rounding and break the partner.
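A characterisation test for a case like this rounding might look like the minimal sketch below. `export_amount` is a hypothetical stand-in for the legacy export function, and the half-up rounding it uses is an assumption made for illustration. The point is that the test pins what the code does today, not what the spec says it should do.

```python
from decimal import Decimal, ROUND_HALF_UP


def export_amount(raw: Decimal) -> Decimal:
    """Hypothetical legacy export: rounds to the nearest cent, half-up."""
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def test_export_rounding_is_pinned() -> None:
    # Observed behaviour, recorded as-is. If this test fails, the change
    # under review has altered numbers a partner may reconcile against.
    observed = {
        Decimal("10.005"): Decimal("10.01"),
        Decimal("10.004"): Decimal("10.00"),
        Decimal("0.125"): Decimal("0.13"),
    }
    for raw, expected in observed.items():
        assert export_amount(raw) == expected
```

The expected values come from running the existing code, so the test documents the accident rather than endorsing it. Whether the rounding should change stays a question for product.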

Dead. The code has no callers, no traffic in the last 12 months, no recent edits, and the owner agrees it is gone. The right action is to remove it, but only after evidence is collected, not on intuition. "Looks unused" is not the same as "is unused".

Unknown. The code does something, but nobody on the team can explain why. Logs do not help; tests do not exist; the original author is gone. The right action is do not refactor blindly. The behaviour goes onto the focused-question list and either gets confirmed by the business or stays on the list. The correct posture toward unknown code is humility, not removal.

Unsafe. The code has known correctness or security problems and is exposed enough to matter. A SQL injection that has not been exploited yet. A race condition that fires once a year and corrupts a row when it does. A retry loop with no idempotency that double-bills under load. The right action is stabilise — with a characterisation test and a targeted fix — before implementation work touches the area.

These five are exhaustive. Every piece of weird code the report flags is labelled with one of them, with evidence attached.
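One way to keep the label, its evidence, and its response together is a small record type. The sketch below is illustrative only: `Label`, `RESPONSE`, and `Finding` are hypothetical names, not part of the System Report itself, and the file path in the usage lines is made up.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    KNOWN_BUSINESS_RULE = "known business rule"
    ACCIDENTAL_BUT_RELIED_UPON = "accidental but relied upon"
    DEAD = "dead"
    UNKNOWN = "unknown"
    UNSAFE = "unsafe"


# Each label carries exactly one recommended response, as described above.
RESPONSE = {
    Label.KNOWN_BUSINESS_RULE: "preserve and document",
    Label.ACCIDENTAL_BUT_RELIED_UPON: "preserve, characterise with a test, ask product later",
    Label.DEAD: "remove, after evidence is collected",
    Label.UNKNOWN: "do not refactor blindly; add to the focused-question list",
    Label.UNSAFE: "stabilise with a characterisation test and a targeted fix",
}


@dataclass(frozen=True)
class Finding:
    location: str
    label: Label
    evidence: tuple[str, ...]

    def __post_init__(self) -> None:
        # A label with no evidence attached is a guess, so refuse to record it.
        if not self.evidence:
            raise ValueError("a label needs at least one piece of evidence")

    @property
    def response(self) -> str:
        return RESPONSE[self.label]


# Hypothetical finding for the pre-2018 billing rule.
finding = Finding(
    location="billing/pricing.py",
    label=Label.KNOWN_BUSINESS_RULE,
    evidence=("product confirmed advance billing for pre-2018 customers",),
)
print(finding.response)
```

Making the evidence mandatory is a design choice in this sketch: it turns "labelled by feel" into an error at the moment the finding is written down.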

Why the labels matter

The same line of code can be in any of the five categories depending on what is true about it that the line itself does not say. Whether a function is dead or known business rule is not a property of the function; it is a property of the function plus the world around it. The label is what tells the team how to act.

A team without the labels falls into one of three failure modes.

The clean-it-up failure. Every weird thing is treated as cruft. Some of it is. Most of it is not. The clean-up changes behaviour the business depended on, and the bug is found two months later by an operator who is now angry.

The do-not-touch failure. Every weird thing is treated as load-bearing. Code that is genuinely dead stays in the system, and the team grows afraid of every file. New work slows because the surface area of "do not touch" grows faster than the surface area of "safe to change".

The guess-and-ship failure. The team makes the call on each piece of weird code by feel, in the moment, under deadline. The classification is not written down. Two engineers make different calls on similar code, and the inconsistency is invisible until the next regression.

Naming the five categories — and writing the label down per case — collapses all three failure modes into one routine: what is the evidence, what is the label, what is the correct response.

How the labels are decided

The label is a function of evidence, not of how the code looks. The report assigns it using a checklist that is short on purpose:

Are there callers? Static + dynamic — git grep, runtime traces, logs.
Is there recent traffic? Production logs, request counts, batch run history.
Is there a known owner? Code, product, or operations.
Is there a test? At what trust level?
Has the business confirmed the behaviour is intentional?
Has the area had an incident in the last 24 months?

The answers map to one of the five labels. No callers, no traffic, owner agrees → dead. Callers exist, behaviour is surprising, business confirms it is intentional → known business rule. Callers exist, behaviour is surprising, nobody can confirm → unknown. And so on. The rule is that a missing answer never produces a delete recommendation.

A real example, drawn from the kind of finding the batch service pattern surfaces: a small executor that runs once a week and writes a column nobody references. Looks dead. The traffic check confirms no readers. The owner check finds that the column feeds an annual partner reconciliation that runs in the third quarter. In other words it is not dead; it is a known business rule, exercised once a year. The label saves the column. Without the check, the executor disappears, and three months later the partner integration is missing a number nobody knows how to compute anymore.
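The mapping from checklist answers to labels can be sketched as a small function. This is a minimal illustration, not the report's actual procedure: `Evidence` and `classify` are hypothetical names, the questions about tests and incidents are left out for brevity, and the rules for "accidental but relied upon" and "unsafe", along with the order of the checks, are assumptions filling in the "and so on". The one property taken directly from the text is that an unanswered question (`None`) can never produce "dead".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    # None means "not answered yet", which is different from False.
    has_callers: Optional[bool] = None
    has_recent_traffic: Optional[bool] = None
    owner_agrees_unused: Optional[bool] = None
    business_confirms_intentional: Optional[bool] = None
    known_defect: Optional[bool] = None  # correctness or security problem


def classify(e: Evidence) -> str:
    # "dead" needs three explicit answers; a missing one falls through.
    if (
        e.has_callers is False
        and e.has_recent_traffic is False
        and e.owner_agrees_unused is True
    ):
        return "dead"
    # Assumption: a known defect outranks the remaining labels.
    if e.known_defect is True:
        return "unsafe"
    if e.business_confirms_intentional is True:
        return "known business rule"
    # Assumption: explicitly unintended, but something still uses it.
    if e.business_confirms_intentional is False and (
        e.has_callers or e.has_recent_traffic
    ):
        return "accidental but relied upon"
    return "unknown"


# The weekly executor: no callers, no readers, but the owner check finds
# the partner reconciliation and the business confirms the behaviour.
executor = Evidence(
    has_callers=False,
    has_recent_traffic=False,
    owner_agrees_unused=False,
    business_confirms_intentional=True,
)
print(classify(executor))

# The same code before anyone has asked the owner: not dead, just unknown.
print(classify(Evidence(has_callers=False, has_recent_traffic=False)))
```

Using `None` for "not asked" rather than defaulting to `False` is what keeps an incomplete checklist from sliding into a removal.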

The unsafe case in particular

The unsafe label deserves its own attention because the response to it is the most aggressive of the five. Unsafe code must be stabilised before any other work touches the area. Stabilisation means a characterisation test that captures the dangerous behaviour explicitly, a targeted fix that addresses only the unsafe part, and an observation hook that confirms the fix did what it claimed.
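The three parts of stabilisation can be sketched using the retry loop that double-bills. Everything here is hypothetical and simplified: `FlakyLedger` simulates a ledger that records a charge but loses the acknowledgement, `bill_with_retry` stands in for the legacy loop, and `IdempotentLedger` is one possible targeted fix, assuming each charge carries a unique key.

```python
import logging
from typing import Callable

logger = logging.getLogger("billing.stabilise")


class FlakyLedger:
    """Hypothetical ledger: records the charge, then loses the first ack."""

    def __init__(self) -> None:
        self.charges: list[tuple[str, int]] = []
        self._calls = 0

    def charge(self, key: str, cents: int) -> None:
        self._calls += 1
        self.charges.append((key, cents))  # the charge lands...
        if self._calls == 1:
            raise TimeoutError("ack lost")  # ...but the caller sees a failure


def bill_with_retry(
    charge: Callable[[str, int], None], key: str, cents: int, attempts: int = 3
) -> None:
    """Stand-in for the legacy loop: retries blindly on timeout."""
    for _ in range(attempts):
        try:
            charge(key, cents)
            return
        except TimeoutError:
            continue
    raise TimeoutError("all attempts failed")


class IdempotentLedger(FlakyLedger):
    """Targeted fix: a key that is already recorded is not charged again."""

    def __init__(self) -> None:
        super().__init__()
        self.suppressed = 0

    def charge(self, key: str, cents: int) -> None:
        if any(k == key for k, _ in self.charges):
            # Observation hook: make every suppressed duplicate visible.
            self.suppressed += 1
            logger.warning("duplicate charge suppressed: key=%s", key)
            return
        super().charge(key, cents)


def test_characterise_double_billing() -> None:
    ledger = FlakyLedger()
    bill_with_retry(ledger.charge, "invoice-42", 1999)
    assert len(ledger.charges) == 2  # pins the dangerous behaviour as it is


def test_fix_suppresses_duplicate() -> None:
    ledger = IdempotentLedger()
    bill_with_retry(ledger.charge, "invoice-42", 1999)
    assert len(ledger.charges) == 1
    assert ledger.suppressed == 1
```

The first test is expected to pass against the unfixed code; that is what makes it a characterisation of the danger rather than a wish. The retry loop itself is left untouched, which keeps the fix limited to the unsafe part.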

The trap with unsafe code is two-sided. Touching it without stabilising is risky. Not touching it is also risky — it is the area where the next incident is going to come from. The report does not let either become the default. It names the unsafe area, recommends the smallest stabilising change, and refuses to recommend feature work in the same area until the safety net is in place.

What this tells you

Three moves, in order.

Stop sorting weird code into one bucket. Every weird thing in the system is one of five things. Until the label is written down, every decision about it is a guess.

Bias toward humility on unknown code. On an inherited system, the right default for unknown behaviour is leave it alone and add it to the focused-question list. The team will be tempted to delete; the business will be glad they did not.

Treat unsafe code as the first cleanup, not the last. The unsafe label is the only one whose response is do this before anything else. Most teams do the opposite — feature work continues, and the unsafe code is left for "later". Later is when the incident happens.

Where this fits in a System Report

The classification lives in Pass 6B of the System Report process. It is the lens through which weird behaviour is read across all 12 passes, and the input that shapes characterisation testing priorities. Every weird thing the report names carries a label, evidence, and a recommended response.

If a piece of code in your system makes you uneasy, the right next move is not to clean it up. The right next move is to put a label on it. The labels are what make the rest of the work safe.

Articles describe the lens. The questions a System Report asks are how that lens is applied to your system.
