smallbox


Case study

Structured Data → Knowledge Surfaces

The tension

A system already contains value. It has entities, relationships, histories, classifications, statuses, logs, business rules. Most of that value is buried inside tables, admin screens, and code paths that nobody but the developers ever sees. The engineering question is not "should we add an AI layer." The engineering question is "what is already here, and what would become useful if it were surfaced?"

Why the naive solution fails

The shortcut is to ask the model nicely:

"Describe the company {NAME} in {FIELD} sector. Make it sound professional and informative. About 150 words."

Run this five thousand times. The output drifts in tone, mixes true statements with merely plausible-sounding ones, repeats structural patterns ("Founded in...", "Today, the company..."), and quietly fails on edge cases: small companies, foreign tickers, recent IPOs. By the time anyone notices, the database is full and the volume hides the rot.

The deeper failure is that the AI flattens what is unusual about each subject — the mechanism, the binding constraint, the strange thing that makes the business work — into generic industry boilerplate. The interesting structure was already in the system. Asking the model to imagine it is the wrong question.

The design rule

Find the structure first. AI enriches typed fields, not paragraphs. The system models entities and relationships with explicit types. Hand-written narrative does the heavy lifting where it matters. AI generates short, structurally-constrained, typed outputs — and only after curated input context is in place.

What was actually built

Four layers, in order.

1. Typed structural entities

Domain/IndustryConnection/IndustryConnection represents directional industry-to-industry relationships. Domain/IndustryConnection/ConnectionType enumerates exactly six kinds: InputSupply, Infrastructure, Tooling, Distribution, Demand, Regulation. Not free-text categories.
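
As a rough illustration of what "typed, not free-text" buys, here is a minimal Python sketch (the project itself appears to be .NET, so this is a translation of the idea, not the actual code). The six enum values come from the text above; the field names `source_industry` and `target_industry` and the helper `parse_connection` are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ConnectionType(Enum):
    """The six connection kinds named in the text."""
    INPUT_SUPPLY = "InputSupply"
    INFRASTRUCTURE = "Infrastructure"
    TOOLING = "Tooling"
    DISTRIBUTION = "Distribution"
    DEMAND = "Demand"
    REGULATION = "Regulation"


@dataclass(frozen=True)
class IndustryConnection:
    """A directional edge from one industry to another."""
    source_industry: str  # hypothetical field name
    target_industry: str  # hypothetical field name
    connection_type: ConnectionType


def parse_connection(source: str, target: str, kind: str) -> IndustryConnection:
    # ConnectionType(kind) raises ValueError for anything outside the six
    # values, so a free-text category cannot enter the model.
    return IndustryConnection(source, target, ConnectionType(kind))


def connections_of_type(rows, kind: ConnectionType):
    """Typed rows can be filtered without any string matching."""
    return [r for r in rows if r.connection_type is kind]
```

Because the type is a closed set, a label such as "supplier-ish" fails at the boundary instead of becoming a seventh, unplanned category.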

Domain/Stock/1Collation/StockCollation stores Pearson correlation, regression-slope gearing, and overlap-weeks reliability between every pair of stocks that have enough overlap. Abstractions/Enums/CollationRelationshipType enumerates nine relationship types — StrongAmplifier, StrongMirror, StrongDampened, ModerateAmplifier, and so on — derived from correlation strength × gearing intensity.
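
The three stored measures can be sketched in a few lines of Python. This is an assumption-laden illustration, not the StockCollation implementation: the weekly-return input format, the `min_overlap` default, every threshold in `classify`, and the tier name "Weak" are all invented for the example. Only the Strong and Moderate prefixes and the Amplifier, Mirror, and Dampened suffixes appear in the text.

```python
import math
from typing import Dict, Optional, Tuple


def collate(a: Dict[str, float], b: Dict[str, float],
            min_overlap: int = 26) -> Optional[Tuple[float, float, int]]:
    """Return (pearson_r, slope, overlap_weeks), or None if overlap is too thin.

    a and b map a week key (e.g. "2024-W01") to that week's return.
    slope is the least-squares slope of b regressed on a (the "gearing").
    """
    weeks = sorted(set(a) & set(b))
    n = len(weeks)
    if n < max(min_overlap, 2):
        return None
    xs = [a[w] for w in weeks]
    ys = [b[w] for w in weeks]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None  # a flat series has no defined correlation
    return sxy / math.sqrt(sxx * syy), sxy / sxx, n


def classify(r: float, slope: float) -> str:
    """Combine correlation strength with gearing intensity.

    All cut-offs are hypothetical, and the sign of the relationship
    is ignored to keep the sketch short.
    """
    strength = "Strong" if abs(r) >= 0.7 else "Moderate" if abs(r) >= 0.4 else "Weak"
    gearing = "Amplifier" if abs(slope) > 1.25 else "Dampened" if abs(slope) < 0.75 else "Mirror"
    return strength + gearing
```

Three strength tiers times three gearing tiers gives nine labels, which is consistent with the nine relationship types the enum is said to hold, though the real derivation may differ.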

The relationships are queryable. The classifications are typed. Nothing important is in a free-text blob.

Relationship classification admin

2. Hand-curated narrative

The Copper Supply Chain page on companygraph.me is not AI-generated. content-management-system/content/articles/supply-chains/supply-chain-copper.json contains 2,000+ words of hand-written structural commentary — declining ore grades, smelting concentration in countries that control processing, ten-to-fifteen-year mine timelines from discovery to production. The kind of writing that compounds: it stays right whether the model improves or not.

Copper Supply Chain page

3. AI enrichment of typed fields

For company-level descriptions, AI is used — but the output is structured, not prose.

BatchCoordinator/Executors/CyberneticSpineJsonParser.ApplySpine parses LLM output into eight individual typed columns on the CyberneticSpine entity: SpineAnchor, SpinePrimary, SpineSecondary, CompanyOneLiner, SystemFunction, BindingConstraint, StructuralDifferentiator, StructuralVulnerability.

ApplySpine source

The migration 20260420091656_AddCyberneticSpine.cs shows each spine field as its own text column. Not a JSONB blob. Queryable across companies, validated independently, comparable side-by-side. That separation is what makes the AI output useful: it can be sampled, audited, and replaced field-by-field as the prompt evolves.
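
A minimal sketch of the parsing step, written in Python for brevity (the real parser appears to be C#). It assumes the model returns a single JSON object keyed by the eight column names listed above; that output shape, the function name `apply_spine`, and the rule that every field must be a non-empty string are assumptions, not details confirmed by the source.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CyberneticSpine:
    # Field names mirror the eight columns named in the text.
    SpineAnchor: str
    SpinePrimary: str
    SpineSecondary: str
    CompanyOneLiner: str
    SystemFunction: str
    BindingConstraint: str
    StructuralDifferentiator: str
    StructuralVulnerability: str


def apply_spine(raw: str) -> CyberneticSpine:
    """Parse raw LLM output into eight separately typed fields.

    Raises ValueError if the output is not a JSON object or if any
    field is missing, non-string, or blank.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    values = {}
    for f in fields(CyberneticSpine):
        value = data.get(f.name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {f.name}")
        values[f.name] = value.strip()
    return CyberneticSpine(**values)
```

Rejecting the whole response when one field is missing keeps partial rows out of storage; a per-field retry would be a reasonable alternative.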

The prompt itself is versioned. v1 → v7+, each version a document in source control. A deprecated version is preserved for at least two more iterations because rollback is non-negotiable: a "better" prompt sometimes produces worse outputs in ways that only show up on companies you didn't sample.
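
The retention rule can be expressed as a small registry. This Python sketch is hypothetical throughout: the source keeps prompts as documents in source control, and the class `PromptRegistry`, its methods, and the integer version numbers are invented here only to make the "two more iterations" window concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# From the text: a deprecated prompt stays available for at least two more iterations.
RETAIN_ITERATIONS = 2


@dataclass
class PromptVersion:
    version: int
    text: str
    deprecated_at: Optional[int] = None  # version number that superseded it


class PromptRegistry:
    def __init__(self) -> None:
        self._versions: Dict[int, PromptVersion] = {}
        self.current: Optional[int] = None

    def publish(self, text: str) -> int:
        """Register a new prompt and mark the previous one as deprecated."""
        new = (self.current or 0) + 1
        if self.current is not None:
            self._versions[self.current].deprecated_at = new
        self._versions[new] = PromptVersion(new, text)
        self.current = new
        return new

    def rollback_targets(self) -> List[int]:
        """Deprecated versions still inside the retention window."""
        if self.current is None:
            return []
        return sorted(
            v.version
            for v in self._versions.values()
            if v.deprecated_at is not None
            and self.current - v.deprecated_at < RETAIN_ITERATIONS
        )
```

After four calls to `publish`, versions 2 and 3 remain rollback targets while version 1 has aged out of the window.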

Hard validation gates run before storage: length caps (no soft "around 150 words" — a hard cap or it fails), banned phrases checked literally, required structure (present-tense, mechanism-focused), and a removal test (the description must say something that would change if a key fact about the company changed; generic descriptions get rejected).
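
The four gates might look something like the following Python sketch. The specifics are assumptions: the 150-word cap, the banned-phrase list (borrowed from the repetitive patterns mentioned earlier), the past-tense regex, and above all the removal test, which is reduced here to a crude proxy that requires at least one company-specific fact to appear literally in the text.

```python
import re
from typing import List

MAX_WORDS = 150  # hypothetical hard cap
BANNED_PHRASES = ["founded in", "today, the company"]  # hypothetical list
# Crude proxy for "present tense": reject a few obvious past-tense markers.
PAST_TENSE_MARKERS = re.compile(r"\b(was|were|had been)\b", re.IGNORECASE)


def validate(description: str, key_facts: List[str]) -> List[str]:
    """Return the list of failed gates; an empty list means the text may be stored."""
    failures = []
    lowered = description.lower()

    # Gate 1: hard length cap, no tolerance band.
    if len(description.split()) > MAX_WORDS:
        failures.append("length")

    # Gate 2: banned phrases, matched as plain substrings.
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            failures.append(f"banned phrase: {phrase}")

    # Gate 3: required structure (tense check only in this sketch).
    if PAST_TENSE_MARKERS.search(description):
        failures.append("tense")

    # Gate 4: removal test proxy. If no key fact is mentioned, changing
    # a key fact could not change the description, so it is generic.
    if not any(fact.lower() in lowered for fact in key_facts):
        failures.append("removal test")

    return failures
```

A description like "A leading provider of innovative solutions across global markets." passes the first three gates and fails only the removal test, which is the kind of output the naive prompt tends to produce.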

4. Surfaces over the structure

The structured layer makes navigation possible. The supply-chain graph is one rendering of IndustryConnection rows.

Supply chain knowledge graph

The screener is another — typed observations and interpretations rendered as filterable pills, navigable by users who would never see the underlying tables.

CompanyGraph screener

Same data, different surface. Once the entities and types exist, surfaces are cheap.
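
To make "surfaces are cheap" concrete, here is a hedged Python sketch in which two small functions render the same typed rows two ways. The row shape, the function names, and the sample industries are illustrative assumptions; the real screener works on observations and interpretations, not on connection rows.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (source_industry, target_industry, connection_type): hypothetical row shape
Row = Tuple[str, str, str]


def as_graph(rows: List[Row]) -> Dict[str, List[Tuple[str, str]]]:
    """Surface 1: adjacency list for a graph view."""
    adjacency = defaultdict(list)
    for source, target, kind in rows:
        adjacency[source].append((target, kind))
    return dict(adjacency)


def as_filter_pills(rows: List[Row]) -> List[str]:
    """Surface 2: one filter pill per connection type, with a count."""
    counts = Counter(kind for _, _, kind in rows)
    return [f"{kind} ({n})" for kind, n in sorted(counts.items())]


rows = [
    ("Copper Mining", "Smelting", "InputSupply"),
    ("Smelting", "Cable Manufacturing", "InputSupply"),
    ("Grid Operators", "Cable Manufacturing", "Demand"),
]
graph = as_graph(rows)
pills = as_filter_pills(rows)
```

Neither function needs to know about the other, and neither touches the model layer. Each new surface is a new read over rows that already exist.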

What this proves Smallbox can do

Take the data already inside a system — entities, relationships, classifications, histories — and turn it into pages, graphs, search, internal linking, and explanations. Use AI for typed enrichment after the model is in place. Keep prose where prose belongs, in hand-written articles that compound over time.


A note on naming

This case used to be titled "AI-assisted content production." That framing put the tool in the headline and obscured the actual capability. The interesting engineering is the structured pipeline. The AI is one component inside it.

Want your product in this shape?
