The promises are verified

Every studio says it works. If you have built an AI prototype, or commissioned one, you have already met the distance between "it works" and "it holds": the demo runs, the screens click through, and then the first real user does something the prototype never planned for. A claim that something works is not worth much on its own. The question worth asking, before you build a business on top of anything, is narrower: how would I check that it still works six months from now, without taking anyone's word for it?

The Foundation is built to answer that question by not asking for your trust.

It is a set of plain promises. Accounts and sign-in work. Email queues, retries, and keeps a history. Failures land somewhere an operator can see them. Each service deploys on its own. One real workflow runs from end to end. Promises like these are cheap to say. What makes one worth anything is that it is paired with a check you can point to — an automated test that runs the real code against a real database, recorded with the exact version of the code it ran against.

So the claim is never "we tested a lot." It is narrower, and more useful: this promise, verified — and here is the record of when.

What gets promised, and what proves it

Two of the promises are already verified this way, with recorded results.

Sign-in works, and stays where it should. Identity has twenty-nine checks, each tied to a behaviour you would actually care about. A sign-in link expires and cannot be used twice: an old or already-used link is refused instead of signing anyone in. The four different ways a sign-in can fail all return the identical response, so a stranger probing the form learns nothing from the difference. One customer's data cannot be read on another customer's request: the attempt returns a flat "not found," not a quiet leak. And the service comes back from a restart with its state intact. Twenty-nine checks, one behaviour each.
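To make two of those behaviours concrete, here is a minimal sketch of what such checks could look like. It is an illustration, not the Foundation's code: the failure-reason names, the status codes, the sample records, and every function name are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Response:
    status: int
    body: str


# Hypothetical names: the article says there are four failure modes but does not list them.
FAILURE_REASONS = ("unknown_address", "malformed_token", "disabled_account", "wrong_product")

# Hypothetical status codes and wording.
GENERIC_FAILURE = Response(status=401, body="Sign-in failed.")
NOT_FOUND = Response(status=404, body="Not found.")

# Hypothetical data: records are keyed by (customer, record id).
RECORDS: Dict[Tuple[str, str], str] = {
    ("acme", "invoice-1"): "Acme invoice",
    ("globex", "invoice-2"): "Globex invoice",
}


def sign_in_failure(reason: str) -> Response:
    """Map every known failure reason to one identical response."""
    if reason not in FAILURE_REASONS:
        raise ValueError(f"unknown failure reason: {reason}")
    # The reason belongs in the operator's log; the caller never sees it.
    return GENERIC_FAILURE


def read_record(requesting_customer: str, record_id: str) -> Response:
    """Look records up by (customer, id), so another customer's record is simply absent."""
    content = RECORDS.get((requesting_customer, record_id))
    if content is None:
        return NOT_FOUND
    return Response(status=200, body=content)


def failures_are_indistinguishable() -> bool:
    """Check: all failure reasons collapse to exactly one distinct response."""
    return len({sign_in_failure(reason) for reason in FAILURE_REASONS}) == 1
```

In this sketch, asking for another customer's record and asking for a record that does not exist produce the same answer, which is the point: the response carries no hint that the record is there at all.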

Email queues, retries, and keeps its history. The mail service has twenty-three. A queued message becomes a real rendered row, built from the actual template rather than a stand-in. A send that keeps failing walks a fixed retry schedule — minutes, then hours — and is finally marked abandoned rather than silently dropped. One product's mail cannot reach into another's. A message already translated into a customer's language is never overwritten behind their back.
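The retry behaviour is the kind of rule that is easy to state and easy to check. A minimal sketch, assuming a made-up schedule: the article says only "minutes, then hours," so the specific delays, the number of steps, and the names below are hypothetical.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Hypothetical schedule; only the shape (minutes, then hours) comes from the article.
RETRY_SCHEDULE = (
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=12),
)


def next_step(failed_attempts: int) -> Tuple[str, Optional[timedelta]]:
    """Decide what happens after the given number of failed send attempts.

    Returns ("retry", delay) while the schedule has steps left, and
    ("abandoned", None) once it is exhausted. The message is marked,
    never silently dropped.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts <= len(RETRY_SCHEDULE):
        return ("retry", RETRY_SCHEDULE[failed_attempts - 1])
    return ("abandoned", None)
```

A check against this rule asserts the whole walk: each failure moves one step along the fixed schedule, and the failure after the last step yields an explicit "abandoned" state that an operator can see.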

Underneath both sits a quieter layer, proven by running rather than by a test: every service reports in when it restarts, so a service that fell over announces its own return, and its errors land in one log an operator can actually read. That is how the promises "each service deploys on its own" and "failures are visible" are checked: not in a suite, but in the live behaviour of the system, every day.

One promise is deliberately not on the verified list yet: the whole workflow, end to end. The piece being built now is a single test that signs a user in the entire way — a real address typed in, a real account created, a real templated email queued, the link pulled back out of that email, and a session established at the end of it. That is the one that matters most, and it is the one still in progress. Saying so plainly is the point of the section after next.

The test most places skip: testing the tests

A green test suite is not the same as safety. That is the central warning on the inherited-systems side of this site: a suite can be entirely green and prove almost nothing, because the tests mock away the very things that could break, or check the shape of the code instead of what it does. A green checkmark is easy to manufacture. Evidence is not.

So the Foundation's tests are themselves tested. The method is blunt: we plant real bugs in the code on purpose — flip a comparison, drop a rule, weaken a check — and confirm the suite catches each one. A suite that stays green while the code is quietly broken is not a safety net; it is a decoration. On the identity service, nine of ten planted bugs were caught on the first run. The one that slipped through named the test that was missing — that test now exists and catches it.
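The planted-bug method can be sketched in a few lines. This is a toy version under stated assumptions, not the Foundation's tooling: the rule being tested, the two planted bugs, and all the names are invented for illustration.

```python
from typing import Callable, Dict, List

# A rule under test: may this request read this record? (Hypothetical example rule.)
Rule = Callable[[str, str], bool]
Check = Callable[[Rule], bool]


def can_read(request_customer: str, record_customer: str) -> bool:
    return request_customer == record_customer


# Planted bugs: deliberately broken copies of the rule.
def flipped_comparison(request_customer: str, record_customer: str) -> bool:
    return request_customer != record_customer


def dropped_rule(request_customer: str, record_customer: str) -> bool:
    return True


MUTANTS: Dict[str, Rule] = {
    "flipped_comparison": flipped_comparison,
    "dropped_rule": dropped_rule,
}

# A suite that only checks the happy path, and one that also checks the refusal.
WEAK_SUITE: List[Check] = [lambda rule: rule("acme", "acme") is True]
FULL_SUITE: List[Check] = WEAK_SUITE + [lambda rule: rule("acme", "globex") is False]


def suite_passes(rule: Rule, checks: List[Check]) -> bool:
    return all(check(rule) for check in checks)


def surviving_mutants(checks: List[Check], mutants: Dict[str, Rule]) -> List[str]:
    """Names of planted bugs that the suite failed to catch (it stayed green)."""
    return [name for name, mutant in mutants.items() if suite_passes(mutant, checks)]
```

Run against the weak suite, the dropped rule survives, and its name points at the missing test: nothing checks that a cross-customer read is refused. Add that check and both planted bugs are caught while the real rule still passes.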

This is the part that most "we test everything" claims never submit to, because it is the part that can embarrass you. It is also the only thing that turns a green run from a feeling into evidence.

What "verified" is allowed to mean

The word does real work here, so it is worth being exact about what it is allowed to mean.

A promise is called verified only when its check is green and recorded — stamped with the exact version of the code it passed against. A run from a working copy with uncommitted changes is never shown as a receipt; it cannot be reproduced, so it does not count. Numbers appear only attached to a named promise — "twenty-nine checks on sign-in," never a bare percentage or a score on a badge. A coverage figure tells you how much code a test touched, not whether it would notice anything going wrong, so it is not used here as proof of anything.
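Those rules are strict enough to write down as code. A minimal sketch, assuming a version-control system that can report a commit identifier and whether the working copy has uncommitted changes; the Receipt shape, the field names, and the sample commit string are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    promise: str        # the named promise, e.g. "sign-in"
    checks_passed: int  # a number, always attached to that promise
    commit: str         # exact version of the code the run passed against


def make_receipt(
    promise: str,
    passed: int,
    failed: int,
    commit: str,
    working_copy_dirty: bool,
) -> Optional[Receipt]:
    """Return a receipt only for a green run on a reproducible version of the code."""
    if failed > 0 or passed == 0:
        return None  # not green: nothing to record
    if working_copy_dirty:
        return None  # uncommitted changes: the run cannot be reproduced
    if not commit:
        return None  # no exact version to stamp the record with
    return Receipt(promise=promise, checks_passed=passed, commit=commit)
```

With Git, for instance, the commit could come from `git rev-parse HEAD` and the dirty flag from a non-empty `git status --porcelain`. The design choice worth noticing is that there is no field for a percentage or a score: a receipt can only say which promise, how many checks, and which version.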

And what is not finished is not dressed up as if it were. Today, identity and email are verified with recorded results, and both suites have survived the planted-bug test. The end-to-end workflow check is the next piece of work, not a current claim. The honest map of what is proven, and what is still being proven, lives with the Foundation itself.

The same proof, in your environment

There is a reason this matters beyond reassurance. The proof is not a marketing artifact that lives on our side of a wall — it is the same code, in the same repositories, so it arrives in your environment by construction. You get the checks, the records, and the heartbeat the day the Foundation lands, and the next developer you hire runs them on their first morning instead of guessing what is safe to touch.

It also gives the build a definition of done that is not a matter of opinion. The first workflow is finished when your named workflow — the one path your product actually turns on — is written as one of these checks, runs green in your environment, is recorded in your own logs, and is shown to you working. "Done" stops being a feeling on a call and becomes something you can point at.

Where this fits

This way of working is not new, and it is not theory. The same approach already runs in production on a gamified financial-literacy platform, where a forty-five-day simulation is frozen into a real database and every test runs the real application code against that captured state. The Foundation's own services are built the same way.

If what you have is not a fresh build but a system you inherited from someone else, the first question is a different one — there, the job is reading what the existing tests actually prove before trusting any of them. But if you are putting something new on a real base, this is what the base owes you in return: not a promise that it works, but a way to check.

See what the Foundation includes, and where each promise is checked.

Articles describe the Foundation. The Foundation Map is the thing itself — accounts, admin, email, logging, and deployment, with one real workflow running through them.
