smallbox


Working with AI agents safely

Can I let Claude or Cursor touch this codebase?

How to use AI coding tools without creating false confidence

The risk with AI on a real codebase is not that it writes bad code. The risk is that it writes confident code about behaviour that does not exist.

A model given a function to refactor will produce a clean refactor with passing tests. The clean refactor sometimes preserves the original behaviour. The passing tests sometimes correspond to that behaviour. The diff is calm, the CI is green, the PR description is articulate. The team merges, and three weeks later an operator notices that a downstream report is showing different numbers, and the trail back to the AI session is no longer there.

This is not a failure of the tool. The tool did what it was asked. It is a failure of the frame. The System Report names the frame in writing — the AI Collaboration Strategy — and treats it as a precondition for any implementation work that will use AI. This article unpacks what that frame contains and why each piece of it is load-bearing.

What goes wrong by default

Without a frame, three failure modes recur. Each one is real, each one is common, and each one looks, for a while, like the tool working well.

Invented business rules. The model reads a function, decides what it must do, and writes code that does that. The word "must" is the trap. The model has no source for what the function actually does in production, so it infers. Sometimes the inference matches reality. Sometimes it is plausible and wrong. The output is fluent, structured, and incorrect, which makes it a much harder bug to spot than a syntactically broken one.

False-green tests. The model writes the implementation and the tests in the same session, against an expected value the model also chose. The tests pass because the model agrees with itself. This is the AI version of the author-wrote-both-sides trap — and at machine speed, a hundred such tests are produced in an afternoon, each one carrying the appearance of safety into the next decision.
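A minimal sketch of the difference, in Python. Everything here is hypothetical: the function `net_total`, the numbers, and the assumption that the legacy system rounded the discount half up while the refactor truncates it. The first check derives its expected value from the same formula as the implementation, so it agrees with itself. The second is pinned to a value assumed to have been recorded from the running system by a person.

```python
def net_total(gross_cents: int, discount_pct: int) -> int:
    """Hypothetical refactored function: truncates the discount to whole cents."""
    return gross_cents - (gross_cents * discount_pct) // 100


def self_agreeing_check() -> bool:
    # False green: the expected value is computed with the same formula as
    # the implementation, so this passes whatever the formula happens to be.
    expected = 1999 - (1999 * 15) // 100
    return net_total(1999, 15) == expected


# Assumed for illustration: captured from the legacy system before the
# refactor. The legacy code is assumed to round the 299.85 discount up to 300.
RECORDED_CASES = [((1999, 15), 1699)]


def recorded_output_check() -> bool:
    # Evidence from outside the session: fails if behaviour changed.
    return all(net_total(*args) == expected for args, expected in RECORDED_CASES)


print("self-agreeing check passes:", self_agreeing_check())      # True
print("recorded-output check passes:", recorded_output_check())  # False
```

Both checks exercise the same line of code. Only the second one can notice that the refactor changed a number an operator depends on.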

Drift across sessions. The model held part of the architecture in its working context during the session, and now that context is gone. The next session starts from the code as it stands and re-derives the architecture. The re-derivation is slightly different. The codebase accumulates small inconsistencies — same problem solved twice, naming conventions split, error handling patterns multiplying — none of which are individually obvious, all of which compound. The discipline that handles this — memory for orientation, the code for truth, the map updated in the same commit as the code that falsified it — is worked through on a live system on the orientation-not-oracle method page.

The frame the System Report uses exists to prevent each of these by structure, not by hoping the model behaves.

The frame

Five things, all named, all load-bearing. None is optional once AI is in scope.

Documented context, in the repo. The model does not know what the team knows. It knows what is on disk in the working directory. The team's architecture, naming conventions, layer boundaries, forbidden actions, business rules, and current open questions all need to live in committed documentation that the model reads at the start of every session. Convention by tribal knowledge fails; convention by CLAUDE.md survives.

Named operating modes. The model does very different things in discovery, plan, scaffold, test-first, implementation, and verification mode. The same prompt produces different output depending on which mode the session is in. Treating this as informal ("just write the code") collapses the modes into one and makes drift inevitable. The report's recommendation is to name the mode at the start of the session in writing, in the same chat, so both the human and the model agree what kind of work is happening.
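One way to make the mode explicit is to generate the opening message of a session rather than type it from memory. The sketch below assumes Python; the `Mode` enum mirrors the six modes named above, while `session_header` and its output format are illustrative choices, not something the report prescribes.

```python
from enum import Enum


class Mode(Enum):
    DISCOVERY = "discovery"
    PLAN = "plan"
    SCAFFOLD = "scaffold"
    TEST_FIRST = "test-first"
    IMPLEMENTATION = "implementation"
    VERIFICATION = "verification"


def session_header(mode: Mode, goal: str) -> str:
    """Build the first message of a session: the mode and the goal, in writing."""
    if not goal.strip():
        raise ValueError("a session needs a stated goal")
    return f"MODE: {mode.value}\nGOAL: {goal.strip()}"


print(session_header(Mode.DISCOVERY, "Map how invoice totals are computed; change no code."))
```

Refusing an empty goal is deliberate: a session with a mode but no goal has no defined end.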

Bounded sessions. A session has a start, a goal, and an end. The session is over when the goal is met or when the session has accumulated enough context that a fresh session would do better. Long sessions look productive and quietly degrade. The report's posture is that more context is not always better — past a certain point, accumulated context is noise the model has to filter, and the failure mode is invented detail.

Branch isolation and per-edit safety gates. Every AI session writes to a branch the human reviews. Every edit, especially destructive ones, passes through gates the human can deny. The session does not silently push, does not skip CI, does not commit --no-verify. The gates are not a sign of distrust; they are the part of the workflow that lets the speed exist at all. A model that cannot push without review is a model the team can run faster.
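A minimal sketch of one such gate, assuming the agent's shell commands pass through a Python function before they run. The function name `review_command`, the three verdicts, the protected branch names, and the list of subcommands that need approval are all assumptions for illustration. Real agent tools expose their own permission mechanisms, and this does not model any of them.

```python
import shlex

# Assumed names for branches the session must never commit to directly.
PROTECTED_BRANCHES = {"main", "master"}
# Assumed set of git subcommands that always go to the human first.
NEEDS_APPROVAL = {"push", "reset", "clean"}


def review_command(command: str, branch: str) -> str:
    """Return 'allow', 'ask' (human decides), or 'deny' for a proposed command."""
    args = shlex.split(command)
    if not args or args[0] != "git":
        return "ask"  # this sketch only understands git; the human decides the rest
    if "--no-verify" in args:
        return "deny"  # never skip hooks
    # Simplification: assumes the subcommand directly follows "git".
    subcommand = args[1] if len(args) > 1 else ""
    if subcommand in NEEDS_APPROVAL:
        return "ask"  # no silent pushes or destructive resets
    if subcommand == "commit" and branch in PROTECTED_BRANCHES:
        return "deny"  # AI work lands on a review branch
    return "allow"


print(review_command('git commit -m "wip"', "ai/refactor-totals"))
print(review_command("git commit --no-verify -m x", "ai/refactor-totals"))
print(review_command("git push origin ai/refactor-totals", "ai/refactor-totals"))
```

The gate is small on purpose. A rule the team can read in one screen is a rule the team will keep switched on.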

The same trust classification, applied to AI tests. AI-generated tests are subject to the same test-trust classification as human ones. A high-trust AI test — a real DB round-trip, a real HTTP request, a known-output diff — is fine. A low-trust AI test that mocks everything it touches is worse than no test, because it carries the appearance of safety. The team that does not classify AI tests treats coverage as a substitute for evidence, which is exactly the trap the model accelerates.
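As a rough illustration, a first pass over a test file can flag the tests that lean on mocks so a person classifies them before they count as evidence. The sketch below uses Python's standard `ast` module. The set of names treated as mock indicators and the two labels are assumptions, and the heuristic is deliberately crude: the absence of a mock does not make a test high-trust, it only means a human still has to decide.

```python
import ast

# Assumed indicators; a real project would tune this list.
MOCK_NAMES = {"Mock", "MagicMock", "patch", "mocker"}


def classify_tests(source: str) -> dict[str, str]:
    """Label each test_* function by whether it references a mock helper."""
    labels = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            # Collect bare names and attribute names, decorators included.
            used = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Name):
                    used.add(child.id)
                elif isinstance(child, ast.Attribute):
                    used.add(child.attr)
            if used & MOCK_NAMES:
                labels[node.name] = "low-trust candidate"
            else:
                labels[node.name] = "needs human classification"
    return labels


SAMPLE = '''
from unittest.mock import patch

@patch("billing.db")
def test_total_mocked(db):
    db.fetch.return_value = 10
    assert True

def test_total_roundtrip(conn):
    conn.execute("SELECT 1")
    assert True
'''

for name, label in sorted(classify_tests(SAMPLE).items()):
    print(f"{name}: {label}")
```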

What the human still owns

The frame is not a way to delegate judgement. Inside the frame, the model is not allowed to invent business rules, decide which behaviours are intentional, or choose which areas of the system are off-limits. Those are human decisions, and they have to remain human decisions because the model has no source for them.

The split that works in practice — and the one the report writes down before any AI work begins — is roughly:

  • Intent, business rules, structural decisions, final review — human.
  • Code, tests, refactors, implementation drafts, verification scripts, documentation — AI, inside the frame.
  • Boundaries between the two — written down in the AI Collaboration Strategy at the start of the engagement, not invented per session.

When a rule is unclear, the question goes back to the human, not into the code. The model does not get to fill in the unclear bit. That single rule prevents most instances of the invented-business-rules failure mode, because the model is permitted, and in fact required, to say "I do not know what this should do".
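In code, the same rule can be expressed as a lookup with no default. The sketch below is hypothetical throughout: the rule name, its value, and the `NeedsHumanDecision` exception are invented for illustration. The point is only that an undecided rule stops the work instead of being guessed.

```python
class NeedsHumanDecision(Exception):
    """Raised when code asks for a business rule nobody has decided yet."""


# Hypothetical register of rules a person has decided and written down.
DECIDED_RULES = {"refund_window_days": 30}


def rule_value(name: str) -> int:
    """Return a decided rule, or stop and send the question to a person."""
    try:
        return DECIDED_RULES[name]
    except KeyError:
        raise NeedsHumanDecision(
            f"No decision recorded for '{name}'. Ask the owner; do not guess."
        ) from None


print(rule_value("refund_window_days"))
try:
    rule_value("partial_refund_rounding")
except NeedsHumanDecision as question:
    print(question)
```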

What this looks like in production

The pattern is in production on the structured-data surfaces case. The team uses AI to enrich typed fields on entities — SystemFunction, BindingConstraint, StructuralDifferentiator, StructuralVulnerability — not to write paragraphs of prose. Each field is its own column in the database, validated independently. The prompt is versioned in source control, v1 through v7+, with each version preserved for at least two iterations because rollback is non-negotiable. Hard validation gates run before storage: length caps, banned phrases, required structure, a removal test.

What that pattern does, in the language of this article, is let the AI do what it is good at — short, structured outputs constrained by typed fields — and refuse to let it do what it is not good at: writing free-text prose at scale that nobody reviews. The constraints are the frame. The frame is what makes the AI useful.
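The hard validation gates can be sketched in a few lines of Python. This is an assumption-laden illustration, not the case's actual code: `FieldRule`, the specific cap, the banned phrase, and the "must end with a period" check (standing in for required structure) are all invented here. The removal test is left out because the article does not describe how it works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    max_chars: int
    banned_phrases: tuple[str, ...]
    must_end_with_period: bool = True  # stand-in for "required structure"


def validate(value: str, rule: FieldRule) -> list[str]:
    """Return the reasons a generated field value is rejected (empty list = store it)."""
    text = value.strip()
    if not text:
        return ["empty"]
    errors = []
    if len(text) > rule.max_chars:
        errors.append(f"longer than {rule.max_chars} characters")
    lowered = text.lower()
    for phrase in rule.banned_phrases:
        if phrase.lower() in lowered:
            errors.append(f"banned phrase: {phrase}")
    if rule.must_end_with_period and not text.endswith("."):
        errors.append("must end with a period")
    return errors


rule = FieldRule(max_chars=80, banned_phrases=("world-class",))
print(validate("Routes inbound orders to the warehouse queue.", rule))
print(validate("A world-class platform", rule))
```

A value that returns any error is never stored. Because each typed field has its own rule, one bad field fails on its own without taking the rest of the record with it.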

The opposite shape (ask the model nicely to describe ten thousand companies, and run the loop) produces output that drifts in tone, mixes true statements with plausible-sounding ones, repeats structural patterns, and quietly fails on edge cases. By the time anyone notices, the database is full and the volume hides the rot. The structured-data surfaces case writes that lesson down explicitly. It is not an argument against AI. It is an argument for the frame.

What this tells you

Three moves, in order.

Write the frame down before the first session. Documented context, named modes, bounded sessions, branch isolation, trust classification. The frame is short and goes in the repo. It costs an afternoon. It pays back the first time a session would otherwise have invented a business rule.

Treat AI-generated tests with more scepticism than human ones. The model writes tests faster than a human can review them, so each hour of review has to cover more tests. A team that adopts AI without raising the test-trust bar ends up with a suite that drifts towards low trust, and does so faster.

Refuse to delegate intent. If the rule is unclear, the question goes to a person. The model is welcome to write the code that follows the rule once the rule is decided. The order matters.

Where this fits in a System Report

The AI Collaboration Strategy is one of the artefacts the System Report produces when implementation will use AI. The strategy names the context-document set, the operating modes, the session boundaries, the human/AI responsibility split, and the per-edit safety gates. It is not generic. It is shaped against the specific codebase, the specific access boundary, and the specific risks the Business-Use Map puts at the top.

If AI is in your implementation plan, the cheapest investment is the frame. Without it, the speed shows up first and the cost shows up later. With it, both stay visible, and the team gets to keep the speed.

Articles describe the lens. The questions a System Report asks are how that lens is applied to your system.
