smallbox

What one capability unlocks

Where should my users' uploaded files actually live?

Store the files anywhere — own the index and the rules

A user uploads a signed contract. A patient uploads a scan. Someone uploads a private document they would never post in public. Each of those is now a file your product is holding, and each carries a promise: this is mine, and only the right people see it. The moment you decide where that file lives, you are quietly deciding whether you can keep that promise — and the decision usually gets made by reflex, by dropping the file at a storage provider and moving on.

That reflex is half right. The trouble is that a stored file is not one thing. It is two things, and they have two different owners.

A file is the bytes, and the rule about who may see them

The first thing is the bytes themselves — the actual contents of the contract, the actual pixels of the scan. Keeping those safe and intact, durably, for years, is a genuinely hard job, and it is the wrong job to build yourself. This is exactly the kind of capability to rent: an object store or blob service does nothing but hold bytes and hand them back unchanged, at any scale, without you thinking about disks. Rent it and don't look back.

The second thing is the rule. Who is allowed to fetch this file? That is not a storage question — it is a product question, and only your product knows the answer. The signed contract belongs to one account and its lawyer; the scan belongs to one patient and their doctor. Whether a given request is allowed to retrieve a given file is a sentence about your product's permissions, and it has to be decided by the side that understands those permissions: you.

The reflex that fails is letting the storage provider answer both questions. It can hold the bytes perfectly. It cannot know that this contract is private to that account, because you never told it — it isn't built to. So when storage becomes the only place a file lives, the rule about who may see it quietly disappears, and a file behind a public or guessable address is, in practice, a public file. Nobody chose that; it's just what's left when the second owner is missing.

What that means for the address of every file

Put the distinction to work and one rule falls out of it immediately: a private file must never be reachable by guessing its address. If the only thing standing between a stranger and a medical scan is whether they can guess the right URL, the file is not private. It's unlisted, which is a different and much weaker thing. Sequential numbers, predictable names, a long random string with no check behind it: all of these are "unlisted." The first two fail the first time someone tries the next number along; the random string fails the first time the link is forwarded, logged, or shared, because anyone holding the address gets the file.

So a fetch goes through your product, not straight to the bucket. Your product holds the index — the list of which files exist, which account each one belongs to, and what kind of file it is — and on every request it asks its own question first: is this requester entitled to this file? Only on a yes does it reach into rented storage for the bytes. The provider holds the contents; your code holds the gate. The bytes can live anywhere; the rule lives at home.
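The flow above can be sketched in a few lines. This is a minimal illustration, not a description of any real service: `FileRecord`, `InMemoryBlobStore`, and `FileGate` are hypothetical names, the blob store is an in-memory stand-in for whatever storage you rent, and the entitlement rule is assumed to be "the requester's account owns the file," which a real product would replace with its own permissions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """One row of the index the product owns (hypothetical shape)."""
    file_id: str
    account_id: str   # which account the file belongs to
    kind: str         # what kind of file it is
    storage_key: str  # where the bytes sit in rented storage


class InMemoryBlobStore:
    """Stand-in for a rented object store: holds bytes, knows nothing about accounts."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class FileGate:
    """Owns the index and asks the entitlement question before touching storage."""

    def __init__(self, store: InMemoryBlobStore) -> None:
        self._store = store
        self._index = {}
        self.storage_reads = 0  # counts how often bytes were actually fetched

    def register(self, record: FileRecord, data: bytes) -> None:
        self._store.put(record.storage_key, data)
        self._index[record.file_id] = record

    def fetch(self, requester_account: str, file_id: str) -> Optional[bytes]:
        record = self._index.get(file_id)
        # Assumed rule: only the owning account is entitled to the file.
        if record is None or record.account_id != requester_account:
            return None
        self.storage_reads += 1
        return self._store.get(record.storage_key)


gate = FileGate(InMemoryBlobStore())
gate.register(FileRecord("f1", "acct-a", "contract", "blobs/01"), b"signed contract")
print(gate.fetch("acct-a", "f1"))  # the owner gets the bytes
print(gate.fetch("acct-b", "f1"))  # anyone else gets None; storage is never read
```

The point of the sketch is the ordering: the index lookup and the entitlement check both happen before the storage call, so a refused request never reaches the bytes at all.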

There's a small, telling refinement an evaluator will want to see. When the answer is no, the honest response is "not found," not "not allowed." "Not allowed" confirms the file exists, which already leaks something about a private document: that there is a contract here for this party to ask after. "Not found" tells an unentitled requester nothing at all. The gate doesn't just refuse the bytes; it refuses to admit they exist. That is a one-line decision in the access layer, and it's the difference between a system that keeps a secret and one that merely declines to share it.
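One way to picture that one-line decision, assuming the access layer speaks HTTP (the article does not say so): a missing file and a file the requester is not entitled to go through the same branch and produce the same response. `INDEX` and `read_status` are hypothetical names for illustration.

```python
from http import HTTPStatus

# Hypothetical index: file id -> owning account.
INDEX = {"contract-17": "acct-a"}


def read_status(requester_account: str, file_id: str):
    """Return (status, body) for a read, without revealing whether the file exists."""
    owner = INDEX.get(file_id)
    if owner is None or owner != requester_account:
        # One branch for both cases, so the response cannot say which one it was.
        return HTTPStatus.NOT_FOUND, "not found"
    return HTTPStatus.OK, "ok"


# An unentitled requester sees the same answer for a real file and a made-up one.
print(read_status("acct-b", "contract-17") == read_status("acct-b", "no-such-file"))
```

A "not allowed" design would need a second branch returning a different status for the unentitled case, and that difference is exactly the leak.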

Where this already runs, and what it lets you answer

This isn't a sketch. CompanyGraph runs a small image-service in production today holding around 1,400 uploaded files, backed up daily, and what comes back is byte-identical to what went in. That service keeps its own bytes rather than renting a blob store, which is exactly the "anywhere" in the title: where the bytes physically sit is a choice, and here they happen to sit at home. What isn't a choice is the other half, and the image-service owns it: which app a file belongs to, which category it sits in, and who is allowed to fetch it. A gated read there returns "not found" rather than "not allowed"; the system won't even concede that a file it's protecting exists. You can check that behaviour against a system that's live, which is the only reason to mention it.

That index — the catalogue of what exists and whose it is — earns its keep on a day the demo never shows. A user, or a regulator on their behalf, asks the question every product eventually gets: show me everything you hold for this account, or delete everything you hold for this account. If your only catalogue of "what this person uploaded" is the provider's bucket, you cannot answer cleanly. You're left scanning storage you don't fully understand, hoping nothing is missed — and "we think that's all of it" is not an answer you want to give to a deletion request. With the index at home, both questions are a query you already own: list the files for this account, then remove them. The rented storage holds the bytes; your index is what makes them findable, countable, and deletable as a set.
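As a rough sketch of "a query you already own," here is what the two requests might look like against an index kept in SQLite. The table layout, the `list_files` and `delete_account_files` helpers, and the `blobs` dictionary standing in for rented storage are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical index: one row per file, owned by the product.
index = sqlite3.connect(":memory:")
index.execute(
    "CREATE TABLE files (file_id TEXT PRIMARY KEY, account_id TEXT NOT NULL, "
    "kind TEXT NOT NULL, storage_key TEXT NOT NULL)"
)
index.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("f1", "acct-a", "contract", "blobs/01"),
        ("f2", "acct-a", "scan", "blobs/02"),
        ("f3", "acct-b", "scan", "blobs/03"),
    ],
)

# Stand-in for rented storage: keys and bytes, no notion of accounts.
blobs = {"blobs/01": b"...", "blobs/02": b"...", "blobs/03": b"..."}


def list_files(account_id):
    """Answer 'show me everything you hold for this account'."""
    rows = index.execute(
        "SELECT file_id, kind, storage_key FROM files "
        "WHERE account_id = ? ORDER BY file_id",
        (account_id,),
    )
    return rows.fetchall()


def delete_account_files(account_id):
    """Remove the bytes first, then the index rows; return how many files went."""
    rows = list_files(account_id)
    for _file_id, _kind, storage_key in rows:
        blobs.pop(storage_key, None)
    with index:  # commits the delete as one transaction
        index.execute("DELETE FROM files WHERE account_id = ?", (account_id,))
    return len(rows)
```

Deleting the bytes before the index rows is a deliberate choice in this sketch: if the storage step fails partway, the index still lists what remains, so the deletion can be retried instead of leaving orphaned bytes nobody can find.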

The part the demo will never show you

Storing files looks finished the moment the first upload succeeds and the first viewer loads it. One file in, one file out, on screen — done. That demo passes whether the design is sound or not, which is exactly why it's misleading. Every consequence in this note is invisible until the file is supposed to be private, or until someone asks you to account for all of a user's files at once. Neither happens with one upload and one viewer. Both arrive the moment real people start storing things they actually care about.

So the verdict is plain, and it's not "build your own storage" — building byte storage yourself would be the mistake. Rent the durable storage; that half is solved and worth paying for. Keep the other half: the index of what files exist and whose they are, and the access rule on each one, enforced by your code before any bytes move. A file system you rent the bytes for is fine. A file system you don't own the rules for isn't really yours.


Storing files this way is part of the foundation — the same split runs CompanyGraph's image-service in production today, the bytes one concern and the index and access rule another. Taking a payment splits the same way: the provider moves the money, the meaning of it stays home.

Articles describe the Foundation. The Foundation Map is the thing itself — accounts, admin, email, logging, and deployment, with one real workflow running through them.