smallbox

What one capability unlocks

What has to stay yours when the model belongs to someone else?

Generate text with a model you don't own — and a record you do

A capable model is dangerous in a specific way: it's so good that it feels like the product. You send it text, it returns something better than you'd have written, and the temptation is to let the provider's endpoint simply be the feature. That works in a demo. It's the part you keep on your own side that decides whether it works the month after.

What the capability gives you

You're renting intelligence. A language model will generate, summarise, extract, classify, rewrite, and hold a conversation — fluently, across domains, and better every few months. It's strong, it's improving, and it is not yours. It runs on someone else's machines and changes on their schedule, without asking you. Renting it is the right call; almost nobody should train their own. The mistake isn't renting — it's renting and keeping no record of what you rented.

What stays yours, and what doesn't

The model, the weights, the inference — theirs, rightly rented. Five things have to stay yours, and together they're the difference between "we use AI" and "we can answer for what our AI did":

  • The prompt. The instruction is your product logic. It belongs in your code, versioned, not pasted into a console and forgotten.
  • The output. What the model actually said, stored — not just shown to the user and discarded.
  • The model version. Which model produced this, on which date.
  • The verdict. Whether that output passed your own check before you used it.
  • The gate. Whether this generation runs at all — a switch you control, not one buried in a vendor account.
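
The five items above can be sketched as one record plus one switch. This is a minimal illustration, not a description of any real system. `GenerationRecord`, `generate_with_record`, `PROMPT_VERSION`, and `GENERATION_ENABLED` are hypothetical names, and the provider call is passed in as a plain function so the sketch assumes nothing about any vendor's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical names throughout; the provider call is injected as a function.
PROMPT_VERSION = "describe-company/v3"   # the prompt lives in code, versioned
PROMPT_TEMPLATE = "Describe {name} in two sentences."
GENERATION_ENABLED = True                # the gate: a switch you control


@dataclass
class GenerationRecord:
    prompt_version: str
    prompt: str
    model: str        # which model produced this
    output: str       # what the model actually said
    passed: bool      # the verdict from your own check
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def generate_with_record(
    name: str,
    model: str,
    call_model: Callable[[str, str], str],
    check: Callable[[str], bool],
    enabled: bool = GENERATION_ENABLED,
) -> Optional[GenerationRecord]:
    """Return a record ready to store, or None when the gate is closed."""
    if not enabled:
        return None
    prompt = PROMPT_TEMPLATE.format(name=name)
    output = call_model(model, prompt)
    return GenerationRecord(PROMPT_VERSION, prompt, model, output, check(output))


# Usage with a stand-in for the rented model.
def fake_model(model: str, prompt: str) -> str:
    return "Acme makes anvils. It sells them worldwide."


record = generate_with_record(
    "Acme", "example-model-2024-01", fake_model, lambda text: len(text) < 200
)
```

Where the record is stored (a table, a log file) is left open here; the point is that all five fields exist on your side before the output is used.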

What breaks when it's hacked in

The failure mode is treating the endpoint as a function that always returns the right answer. Three concrete breaks follow from it.

You can't reproduce what happened. The provider will update the model, quietly. If you kept no record of the output and the model version, yesterday's behaviour is gone the day that happens, and you can't tell whether a change is your fault or theirs.
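
With stored records, "did the model change or did we?" becomes a question you can ask of your own data. A rough sketch, assuming each record is a dict with hypothetical `model` and `passed` keys:

```python
from collections import defaultdict


def pass_rate_by_model(records):
    """Group stored records by model version and return each version's pass rate."""
    totals = defaultdict(lambda: [0, 0])  # model -> [passed, total]
    for r in records:
        totals[r["model"]][0] += 1 if r["passed"] else 0
        totals[r["model"]][1] += 1
    return {model: passed / total for model, (passed, total) in totals.items()}


# Illustrative data only: two model versions, same prompt.
records = [
    {"model": "model-a", "passed": True},
    {"model": "model-a", "passed": True},
    {"model": "model-b", "passed": True},
    {"model": "model-b", "passed": False},
]
rates = pass_rate_by_model(records)
```

A drop that lines up with a new model version points at the provider; a drop within one version points at your own prompt or check.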

Failure hides instead of surfacing. The reflex when an output is wrong is to retry until it's right. That builds a loop that conceals the failure rate instead of showing it. The more honest design is to validate once, log the failure, and decide on repair deliberately — not to paper over it automatically.
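
The "validate once, log the failure" shape might look like the sketch below. `generate_once` and the `failures` list are hypothetical, and the model call and validator are passed in as plain functions rather than tied to any provider.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("generation")


def generate_once(
    prompt: str,
    call_model: Callable[[str], str],
    is_valid: Callable[[str], bool],
    failures: list,
) -> Optional[str]:
    """Call the model once, validate once, and record a failure instead of retrying."""
    output = call_model(prompt)
    if is_valid(output):
        return output
    # Keep the failed output so the failure rate stays visible.
    failures.append({"prompt": prompt, "output": output})
    logger.warning("generation failed validation: %r", output)
    return None


# Usage: a stand-in model that returns an empty string fails the check.
failures: list = []
result = generate_once(
    "Summarise the release notes.",
    lambda prompt: "",
    lambda text: bool(text.strip()),
    failures,
)
```

There is deliberately no loop here. What to do with a `None` result is a separate decision made by the caller.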

The prompt grows into a wall of pleading. Every bad output tempts you to add another "do not" line, until the prompt is a paragraph of rules the model may or may not honour. Rules that matter belong in your code, where they can be tested. A prompt teaches best by example, not by prohibition.
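
A rule that lives in code can be tested in a way a "do not" line in a prompt cannot. As an illustration only, here is a small checker for a generated description; the specific rules (`MAX_CHARS`, the banned phrases, the final period) are invented for the example.

```python
import re

# Hypothetical rules for a generated company description.
MAX_CHARS = 300
BANNED = re.compile(r"\b(world-class|leading provider)\b", re.IGNORECASE)


def violations(text: str) -> list[str]:
    """Return the rules a description breaks; an empty list means it passes."""
    found = []
    if len(text) > MAX_CHARS:
        found.append("too_long")
    if BANNED.search(text):
        found.append("marketing_phrase")
    if not text.strip().endswith("."):
        found.append("no_final_period")
    return found
```

Each rule is one line of a unit test, and the names of the broken rules can go straight into the stored verdict.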

Where it already runs

CompanyGraph generates company descriptions with Anthropic models and keeps that generation gated off until the output quality is validated, precisely because "the model is capable" is not the same claim as "the output is shippable". The gate is one of the five things above, exercised in production.

The translation service shows the one place a retry is earned. When a translation fails its check, it doesn't quietly try again and hope. It escalates on a fixed path (a cheaper model, then a stronger one, then one last fresh attempt), logs every step, and if it still fails, stops and flags the row for a human instead of looping forever. That doesn't contradict "validate once": it's escalation with a visible end, not a hidden retry that buries the failure rate. Either way the record is kept: what was asked, what came back, which model, and what the verdict was.
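
An escalation path with a visible end can be sketched as a fixed list walked exactly once. This is not the translation service's actual code: the model names are placeholders, and running the last fresh attempt on the stronger model is an assumption, since the text doesn't say which model it uses.

```python
from typing import Callable, Optional

# Hypothetical ladder: cheaper model, stronger model, then one fresh attempt.
# Which model the final attempt uses is an assumption made for this sketch.
LADDER = ["cheap-model", "strong-model", "strong-model"]


def translate_with_escalation(
    text: str,
    call_model: Callable[[str, str], str],
    is_valid: Callable[[str], bool],
    log: list,
) -> Optional[str]:
    """Walk the ladder once; return None (flag for a human) if every step fails."""
    for step, model in enumerate(LADDER, start=1):
        output = call_model(model, text)
        passed = is_valid(output)
        # Every step is logged, pass or fail.
        log.append({"step": step, "model": model, "output": output, "passed": passed})
        if passed:
            return output
    log.append({"step": "flagged_for_human", "input": text})
    return None


# Usage: a stand-in model that only succeeds on the stronger tier.
def fake_translate(model: str, text: str) -> str:
    return "Bonjour" if model == "strong-model" else ""


log: list = []
translated = translate_with_escalation("Hello", fake_translate, bool, log)
```

The loop is bounded by the length of `LADDER`, so it cannot run forever, and the log shows how often the first step fails rather than hiding it.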

Where it shows up in a build idea

Every build idea with AI in it rides on this — an image generator's prompt and result, a support chatbot's grounded answer, a tutor's explanation, a moderation decision. In each, the model call is the easy part and the record-plus-gate is the product. The capability note next to this one — what a payment provider should never be the only record of — makes the identical argument about Stripe, because it's the same discipline applied to a different rented service.


This way of using a model is part of the Foundation: the gate, the log, and the validation run in production on CompanyGraph's content generation today. The same record-keeping is what lets structured AI output beat prose at scale instead of becoming an unrepeatable party trick.

Articles describe the Foundation. The Foundation Map is the thing itself — accounts, admin, email, logging, and deployment, with one real workflow running through them.
