smallbox

What you could build

What does a moderation pipeline on vision and text APIs really cover?

In content moderation, the review queue is the product

A post comes in. A photo with a caption, uploaded to a product you run. You send it through a classifier and it returns a number: 0.61. Sixty-one percent that this breaks your rules. Not 0.04, where you'd let it through without a thought. Not 0.98, where you'd block it and move on. Sixty-one — a maybe. What your product does with that number is the entire product. Everything before it is a commodity; everything after it is the part you'd be building a company around.

The reflex is to reach for a threshold. Block above 0.7, allow below, and the awkward middle disappears into one of the two buckets. That works right up until the first time it doesn't — the post you blocked that was fine, the post you allowed that wasn't, and the user, or eventually the regulator, who asks you to explain the call. A moderation product that only has a threshold has no answer to that question. The honest version of what you do with the 0.61 is: a human looks. The machine defers, and a person decides. That hand-off, and everything around it, is the work.

The classifier is the easy part

Strip the pipeline to what has to happen and the model's share of it shrinks fast. A piece of content arrives. You run it through a vision API (Azure's or AWS's content-safety service) to catch the categories an image model is good at: explicit imagery, graphic violence, the obvious. You run the text through an LLM classifier (a model from OpenAI or Anthropic) for the things that need language and context: harassment dressed as a joke, a scam in polite phrasing, a rule specific to your community that no general model was trained on. You get back scores. Then you route: clear-allow, clear-block, or the band in the middle that goes to a person. A webhook tells the originating system what was decided, so the post publishes, hides, or waits.
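
The routing step above can be sketched in a few lines. This is a minimal illustration, assuming both classifiers return a score between 0 and 1 and that the higher of the two drives the decision; the names `Route`, `ALLOW_BELOW`, and `BLOCK_AT_OR_ABOVE` and the cut-off values are hypothetical, not recommendations.

```python
from enum import Enum


class Route(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REVIEW = "review"


# Hypothetical cut-offs; real values are tuned per product.
ALLOW_BELOW = 0.20
BLOCK_AT_OR_ABOVE = 0.90


def route(vision_score: float, text_score: float) -> Route:
    """Route on the higher (worse) of the two classifier scores."""
    score = max(vision_score, text_score)
    if score >= BLOCK_AT_OR_ABOVE:
        return Route.BLOCK
    if score < ALLOW_BELOW:
        return Route.ALLOW
    # Everything in between is the band that goes to a person.
    return Route.REVIEW
```

Under these assumed cut-offs, the 0.61 from the opening lands in `Route.REVIEW`, which is the whole point of having a middle band.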

Exactly one of those steps is the impressive demo, and it's the cheapest to acquire. Vision moderation is a metered API call; LLM classification is a prompt and a metered API call. Both are commodities — the same call your competitor makes, priced by the same vendors, improving on the same curve whether or not you do anything. If the product were the classifier, there would be no product, because the classifier is for rent by anyone.

The product is the routing and what sits at the end of it. The clear-allow and clear-block paths resolve themselves. The middle band is where a moderation company actually lives — a queue with humans at the end of it.

What the queue actually has to carry

Call it a review queue and it sounds like a list. It isn't. It's where every consequential decision your product makes gets made, and that puts real demands on it.

It has to hold the uncertain cases and present them well — the content, the scores, the rule that might apply, the context a reviewer needs to judge without going hunting. A reviewer staring at a post with no idea why it was flagged makes slow, inconsistent calls.

It has to keep the deciders consistent. Two reviewers looking at the same borderline post should reach the same verdict most of the time, and when they don't, that disagreement is a signal you need to see — about an unclear rule, or a reviewer who's drifted. Consistency across people is not something the model gives you. It's something the queue, the guidelines beside it, and the way you measure agreement have to produce.
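
One common way to measure agreement between two reviewers is Cohen's kappa, which corrects raw agreement for the agreement you would expect by chance. The article does not name a metric, so treat this as one possible choice; the function name and the verdict labels are illustrative.

```python
from collections import Counter
from typing import List


def cohens_kappa(a: List[str], b: List[str]) -> float:
    """Chance-corrected agreement between two reviewers on the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty verdict lists")
    n = len(a)
    # Fraction of cases where both reviewers gave the same verdict.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected if each reviewer labelled at their own base rates.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical label throughout
    return (observed - expected) / (1 - expected)
```

A kappa near 1 means the reviewers agree far more than chance would explain; a value near 0 on borderline posts suggests an unclear rule or a reviewer who has drifted.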

It has to resolve inside a time bound. A post held for review is either not yet published or already published and not yet checked; either way a user is waiting, and "we'll get to it" is a different product from "decided within the hour." The service level on resolution is a feature, and a hard one, because it's a staffing and routing problem, not a code problem.
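
The code side of a resolution target is small, which is part of the argument. A minimal sketch, assuming a hypothetical one-hour target and a hypothetical `QueuedCase` record, that surfaces the cases already past the bound, oldest first:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Hypothetical target, echoing "decided within the hour".
RESOLUTION_SLA = timedelta(hours=1)


@dataclass
class QueuedCase:
    case_id: str
    enqueued_at: datetime


def overdue(cases: List[QueuedCase], now: datetime) -> List[QueuedCase]:
    """Return cases waiting longer than the target, oldest first."""
    late = [c for c in cases if now - c.enqueued_at > RESOLUTION_SLA]
    return sorted(late, key=lambda c: c.enqueued_at)
```

Detecting a breach is the easy half; having enough reviewers on shift to prevent one is the staffing problem the paragraph above describes.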

And it has to remember. Every decision — what was flagged, the scores, which reviewer ruled, what they decided, when — has to be written down and kept. Not because it's tidy, but because the day someone asks "why did you remove my post," or "why did you leave that one up," the audit trail is the only thing that can answer. A moderation product without a defensible record of its decisions isn't a lighter version of the product — it's the part that gets you in trouble, shipped on purpose.
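
As a rough sketch of what one entry in that record might hold, here is an immutable decision record serialized as one JSON line. The field names are assumptions drawn from the list above (what was flagged, the scores, the reviewer, the verdict, the time), not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DecisionRecord:
    content_id: str
    scores: Dict[str, float]    # per-classifier scores at decision time
    route: str                  # "allow", "block", or "review"
    reviewer_id: Optional[str]  # None when no human was involved
    verdict: str
    decided_at: str             # ISO 8601 timestamp, UTC


def to_log_line(record: DecisionRecord) -> str:
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Writing the scores alongside the human verdict is what later lets you answer both "why was this removed" and "what did the model say at the time."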

What it rides on

Named as modules, a real version needs:

  • Background jobs — classification runs out of band. You don't block an upload while two providers score it; you queue the work, call the APIs, handle the one that times out, and write the result when it lands.
  • A review queue with an operator UI — the screen where uncertain cases wait, where a reviewer sees the content and the scores and rules, and where the decision is recorded. This is the product surface, not a back-office afterthought.
  • Logging — the durable, queryable record of every decision and every classifier call, which is both how you tune the thresholds and how you answer the accountability question later.
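
The background-job item can be sketched with `asyncio`. The two provider functions below are stand-ins, not real SDK calls, and the timeout is deliberately tiny so the example runs quickly. Treating a timed-out provider as "needs review" is one possible policy, assumed here for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional

PROVIDER_TIMEOUT_S = 0.05  # hypothetical; real calls need a far larger budget


async def fake_vision_score(content_id: str) -> float:
    """Stand-in for a vision API call that answers quickly."""
    await asyncio.sleep(0.01)
    return 0.12


async def fake_text_score(content_id: str) -> float:
    """Stand-in for an LLM classifier call that hangs."""
    await asyncio.sleep(1.0)
    return 0.30


async def score_with_timeout(
    call: Callable[[str], Awaitable[float]], content_id: str
) -> Optional[float]:
    try:
        return await asyncio.wait_for(call(content_id), timeout=PROVIDER_TIMEOUT_S)
    except asyncio.TimeoutError:
        return None  # a missing score, not a passing score


async def classify(content_id: str) -> dict:
    # Both providers are called concurrently, outside the upload request.
    vision, text = await asyncio.gather(
        score_with_timeout(fake_vision_score, content_id),
        score_with_timeout(fake_text_score, content_id),
    )
    scores = {"vision": vision, "text": text}
    # Assumed policy: any missing score sends the case to a person.
    needs_review = any(s is None for s in scores.values())
    return {"content_id": content_id, "scores": scores, "needs_review": needs_review}
```

Running `asyncio.run(classify("post-1"))` with these stubs yields a vision score, a missing text score, and a case flagged for review rather than silently allowed.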

That list is not hypothetical. A background job runner, an operator admin built to do real work rather than just inspect rows, and a logger that records what happened all run in production today under CompanyGraph — the batch runner driving its pipelines, the admin its operators act from, the logging that captures events across the system. CompanyGraph runs no content-moderation product; it is checkable evidence that the parts a moderation pipeline leans on — the queue's job machinery, the operator surface, the decision record — are real and operated. The genuinely new work for moderation is the classification routing and the review workflow tuned to your rules; the machinery underneath the queue already exists.

What you own, and what you don't

The split here is unusually clean, which makes it a good one to get right.

The classifiers stay rented. The vision API and the LLM are someone else's models on someone else's schedule, and that's correct — building a content-safety model is a different company than building a moderation product. Treat them as swappable parts: a better or cheaper provider should be a configuration change, not a rewrite.
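
One way to keep a provider swap down to a configuration change is a small interface with a registry behind it. In this sketch the two stub classes stand in for real vendor adapters, and every name (`Classifier`, `REGISTRY`, `text_provider`) is hypothetical.

```python
from typing import Callable, Dict, Protocol


class Classifier(Protocol):
    def score(self, content: str) -> float: ...


class KeywordStub:
    """Stand-in adapter; a real one would wrap a vendor API call."""

    def score(self, content: str) -> float:
        return 0.9 if "scam" in content.lower() else 0.1


class AlwaysLowStub:
    """Second stand-in, to show the swap."""

    def score(self, content: str) -> float:
        return 0.0


# Maps a config value to an adapter factory.
REGISTRY: Dict[str, Callable[[], Classifier]] = {
    "keyword": KeywordStub,
    "always_low": AlwaysLowStub,
}


def build_classifier(config: dict) -> Classifier:
    """Pick the adapter named in config; callers only ever see score()."""
    return REGISTRY[config["text_provider"]]()
```

Changing `{"text_provider": "keyword"}` to `{"text_provider": "always_low"}` swaps the model without touching the routing, the queue, or the audit trail.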

What's yours is everything the rented models can't give you. Your rules — the specific lines your community draws, which no general model knows. Your routing thresholds — where the allow, block, and review bands sit, tuned to how much false-positive pain you can take versus false-negative risk. The queue and the reviewers' decisions. And the audit trail — the full record of what was decided and why. If the only trace of your moderation decisions lives in a provider's logs, you can't tune, can't explain a call, and can't move when a model changes under you. Let the providers be the record and you've outsourced your product's judgment to a vendor whose incentives are about their exposure, not yours.

This is the same shape, from the queue's side, as the moderation you'd run on the way into an image product. Owning the policy on what's allowed to be generated is the input bookend of a creation product; the review queue here is the same decision at scale, made about content that already exists. Same rule, same ownership line: the filter can be rented, the line can't.

The hard part

The named risk is not the integration. It's that classifiers are wrong in both directions, and the cost of each kind of wrong is different and unavoidable.

A false positive blocks something fine — a frustrated user, a legitimate post gone, a support ticket, and over time a community that learns your product punishes the innocent. A false negative lets through the thing you exist to stop, which is reputational, or legal, or worse. You cannot tune both to zero at once; pushing the thresholds to catch more bad content catches more good content with it, and loosening them to spare the good content lets more bad content through. The model hands you a dial between two kinds of harm, not a solution to either, and choosing where to set it is a judgment about your product that's never finished.
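
The trade-off can be made concrete by counting both kinds of error at different thresholds over a labelled sample. The six data points below are invented purely for illustration; in practice the labels would come from reviewer verdicts in your own audit trail.

```python
from typing import List, Tuple

# Invented toy data: (classifier score, whether the post truly broke a rule).
SAMPLE: List[Tuple[float, bool]] = [
    (0.05, False),
    (0.30, False),
    (0.55, True),
    (0.61, False),
    (0.72, True),
    (0.95, True),
]


def error_counts(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when blocking at score >= threshold."""
    false_pos = sum(1 for s, bad in scored if s >= threshold and not bad)
    false_neg = sum(1 for s, bad in scored if s < threshold and bad)
    return false_pos, false_neg
```

On this toy sample, blocking at 0.5 wrongly blocks one fine post and misses nothing, while blocking at 0.7 spares that post and lets one violation through. No single threshold brings both counts to zero.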

Which is exactly why the human-review queue is the product and not a feature bolted to the side: it's how you absorb the cases the dial can't resolve cleanly. But a queue is only as good as the people in it and the record it keeps. If the reviewers are inconsistent, you've moved the unreliability from the model to the bench and changed nothing. If decisions aren't logged in a form you can stand behind, the queue resolved the case and left you no way to defend it. So the hard part has three parts and none of them is the API call: the queue that holds and presents the uncertain cases, the consistency of the humans deciding in it, and the audit trail that proves, later, why each call was made. That is months of careful work. The classifiers are an afternoon.

There's a discipline that keeps this honest. When a case sits in the band where the model isn't sure, the temptation is to re-run it — another prompt, a second model, a higher temperature, until something crosses cleanly into allow or block — and skip the human. That feels like automation; it's the opposite. Re-rolling until a score looks decisive hides exactly the cases that most needed a person, and deletes the signal that your rules have a genuinely ambiguous edge. The discipline is the same one a support chatbot built to say "I don't know" honestly lives by: validate once, and when the result is uncertain, route it to a human rather than gambling for a confident-looking answer. An uncertain case sent to the queue is information you can act on. An uncertain case re-rolled until it passes is the same information, thrown away — until the day the call it papered over is the one you have to explain.
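
In code, the discipline shows up as the absence of a retry loop. A minimal sketch, assuming hypothetical cut-offs and a plain list standing in for the review queue: the classifier is called exactly once, and an uncertain score is recorded and handed to a person.

```python
from typing import Callable, List

# Hypothetical cut-offs, for illustration only.
ALLOW_BELOW = 0.20
BLOCK_AT_OR_ABOVE = 0.90


def moderate_once(
    content: str,
    classify: Callable[[str], float],
    review_queue: List[dict],
) -> str:
    score = classify(content)  # exactly one call; no re-roll on uncertainty
    if score >= BLOCK_AT_OR_ABOVE:
        return "block"
    if score < ALLOW_BELOW:
        return "allow"
    # Keep the uncertain score: it is the signal that a rule has an ambiguous edge.
    review_queue.append({"content": content, "score": score})
    return "review"
```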

The verdict

This is almost always a module behind a product that already has the content, not a standalone company. The buyer isn't shopping for "a moderation SaaS" in the abstract; they run a platform with users posting things, and they need a queue, a routing rule, and a defensible record wired into the product they already have. Sold as a separate company, you're asking someone to route the single most sensitive decision their product makes through a third party, and to trust that third party's audit trail when their own accountability is on the line. That is usually the wrong shape: the moderation belongs inside the product, owned by the team who owns the rules and answers for the calls.

Which is what makes the foundation the right place to find out cheaply. If the queue's machinery — jobs, an operator surface, logging — already runs, the new work is the classification routing and the review workflow tuned to your community. You can stand a real moderation path up behind a real product, watch how often the model defers, see whether your reviewers agree, and learn where your rules are genuinely unclear — without first building the boring half. The classifier was never the question. Whether your queue is consistent, fast, and defensible is — and the only way to learn that is to run real content through it and watch what the people at the end of it decide.


The jobs, the operator queue, and the logging this rides on — the parts that turn two classifier calls into a moderation decision you can defend — are the foundation. If you have a pointed version of an idea like this, that's exactly what one workflow is meant to prove.

