Importing a spreadsheet means adopting a schema you don't control

A customer hands you a Google Sheet, or points you at an Airtable base, and asks you to "just pull the data in." It looks like the easiest integration on the list. The provider's API is clean, the rows come back as neat objects, and an import that reads them and writes them into your own tables takes an afternoon. Then it works, for weeks, and you stop thinking about it — which is exactly when it becomes the most fragile edge in the whole product. Because what you actually did that afternoon was adopt a schema you do not control, maintained by a person who will change it without telling you.

The columns are a promise nobody made

The trouble hides inside how good the API looks. Reading a spreadsheet is genuinely simple — that part is solved, and renting it from Google or Airtable is the right call. What's missing is the thing a real database gives you for free and a spreadsheet never does: a contract about shape. A database column is typed, named, and enforced; a spreadsheet column is a label a human typed once and is free to retype, move, or repurpose at any moment, for reasons that have nothing to do with your product. The header that says Price today is a convention, not a guarantee, and it's a convention the customer doesn't know you depend on.

So the import that read column C as the price will keep reading column C as the price — faithfully, every run — right up to the morning someone inserts a column to the left, or renames Price to List Price, or starts typing TBD into a cell your code expects to be a number. None of that is a mistake on their part. A spreadsheet is a human surface: it is supposed to be edited freely by people who treat it as theirs, because it is. The error is on your side, in having quietly assumed that a surface built for human flexibility would hold still like a schema you designed. It won't, and it was never going to.

What you own: the seam, the audit, and drift detection

The instinct under pressure is to treat the importer as plumbing — read, map, write, done. The durable version treats the seam itself as the product, and three things live there.

The first is a validation and mapping layer that stands between their columns and your tables and assumes the worst on every run. A renamed header will arrive. A text value in a number column will arrive. An empty cell where you assumed a value, a date in a format you didn't plan for, a stray row of notes someone left at the bottom — all of it will arrive, because the surface invites exactly that. The mapping layer's job is to name what it expects, check each incoming row against that, and refuse to write anything it doesn't understand. The point is not to parse cleverly; it's to make a bad row fail loudly and stop, rather than slide into your database wearing the right column's name.
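To make that concrete, here is a minimal sketch of such a mapping layer, not a finished implementation. It assumes rows arrive as dictionaries of header to cell text, and the headers (SKU, Price, Quantity), the Product type, and the RowRejected exception are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


class RowRejected(ValueError):
    """Raised when a row does not match the shape the mapping expects."""


@dataclass(frozen=True)
class Product:
    sku: str
    price: Decimal
    quantity: int


# Assumed headers for illustration; a real mapping would be per customer.
REQUIRED_HEADERS = ("SKU", "Price", "Quantity")


def map_row(row: dict[str, str]) -> Product:
    """Translate one row (header -> cell text) into a Product, or refuse it."""
    missing = [h for h in REQUIRED_HEADERS if h not in row]
    if missing:
        raise RowRejected(f"missing columns: {missing}")

    sku = row["SKU"].strip()
    if not sku:
        raise RowRejected("SKU is empty")

    try:
        price = Decimal(row["Price"].strip())
    except InvalidOperation:
        raise RowRejected(f"Price is not a number: {row['Price']!r}") from None
    # Decimal accepts "NaN" and "Infinity", so reject those explicitly.
    if not price.is_finite() or price < 0:
        raise RowRejected(f"Price is out of range: {row['Price']!r}")

    try:
        quantity = int(row["Quantity"].strip())
    except ValueError:
        raise RowRejected(
            f"Quantity is not a whole number: {row['Quantity']!r}"
        ) from None

    return Product(sku=sku, price=price, quantity=quantity)
```

Nothing here tries to guess what a TBD or a renamed header was meant to be. Every path either returns a fully typed record or raises with a reason a human can read, which is the only behaviour the seam needs from this layer.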

The second is an import audit: a record, every run, of what came in, what was accepted, what was rejected, and why. When a customer says "my data's wrong," the answer has to be a row you can read — forty-two records imported, three rejected on the fourteenth of the month because the Quantity column held text — not a shrug and a re-run. The provider's surface is theirs and opaque; the record of what you did with it is yours and answerable, and that record is the difference between debugging an import in seconds and re-running it blind.
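One possible shape for that record is sketched below, under stated assumptions: ImportAudit, Rejection, and run_import are hypothetical names, the validator is any function that raises ValueError on a bad row, and the write into your own tables is left out. In practice the audit would be persisted somewhere an operator can query it; here it only lives in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass
class Rejection:
    row_number: int  # row number as the customer sees it in the sheet
    reason: str


@dataclass
class ImportAudit:
    source: str
    started_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    received: int = 0
    accepted: int = 0
    rejections: list[Rejection] = field(default_factory=list)

    def summary(self) -> str:
        return (
            f"{self.source}: {self.received} received, "
            f"{self.accepted} accepted, {len(self.rejections)} rejected"
        )


def run_import(
    source: str,
    rows: Iterable[dict[str, str]],
    validate: Callable[[dict[str, str]], None],
) -> ImportAudit:
    """Validate every row and record what happened to each one."""
    audit = ImportAudit(source=source)
    # Start at 2 on the assumption that sheet row 1 holds the headers.
    for row_number, row in enumerate(rows, start=2):
        audit.received += 1
        try:
            validate(row)
        except ValueError as exc:
            audit.rejections.append(Rejection(row_number, str(exc)))
        else:
            # Writing the accepted row to your own tables would go here.
            audit.accepted += 1
    return audit


def quantity_must_be_whole(row: dict[str, str]) -> None:
    """Example validator: the Quantity cell has to parse as an integer."""
    try:
        int(row["Quantity"])
    except ValueError:
        raise ValueError(
            f"Quantity held text: {row['Quantity']!r}"
        ) from None
```

Calling run_import("orders sheet", rows, quantity_must_be_whole) returns an audit whose rejections list names the sheet row and the reason for each refusal, which is the row you read back to the customer instead of re-running blind.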

The third, and the one most importers skip entirely, is schema-drift detection. Don't just map the columns — watch them. Before an import trusts its mapping, it should confirm the shape it expects is still the shape that arrived: the columns it depends on are present, named what they were named, and carrying the kind of value they carried last time. When that check fails, the right move is to stop and flag it for a human, not to import against a layout that has silently moved. Drift detection is what converts a silent corruption into a visible alert — the difference between finding out at the seam and finding out from a customer six weeks later.
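A drift check along those lines might look like the sketch below. It assumes the importer keeps a small description of the shape it saw last time (the hypothetical ExpectedColumn) and compares it against the header row and a sample of data rows before mapping anything. The numeric test is deliberately crude, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedColumn:
    header: str
    numeric: bool  # True if every non-empty cell should parse as a number


def looks_numeric(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def detect_drift(
    expected: list[ExpectedColumn],
    headers: list[str],
    sample_rows: list[list[str]],
) -> list[str]:
    """Return human-readable problems; an empty list means no drift found."""
    problems: list[str] = []
    for column in expected:
        if column.header not in headers:
            problems.append(f"column {column.header!r} is missing or renamed")
            continue
        if not column.numeric:
            continue  # any text is acceptable in a text column
        index = headers.index(column.header)
        for row in sample_rows:
            value = row[index].strip() if index < len(row) else ""
            if value and not looks_numeric(value):
                problems.append(
                    f"column {column.header!r} expected numbers, "
                    f"found {value!r}"
                )
                break  # one example per column is enough to flag it
    return problems
```

The importer would run this first and proceed only when the returned list is empty. Anything else becomes a flagged run for a human, with the list itself as the explanation.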

What stays theirs

The reflex correction, having read all that, is to want to fix the spreadsheet: lock the columns, enforce types, make it behave. Resist it. The spreadsheet staying messy is not a bug to be engineered away; it's the correct division of the work. The sheet is the customer's working surface, the place they actually live and edit and make sense of their business. Owning it would mean owning a worse spreadsheet than the one they already like, and fighting them for control of a tool that is theirs by right. Leave the bytes (the rows, the layout, the freedom to change it) exactly where they are. What you take responsibility for is not the surface but the seam: the translation from their world into yours, and the discipline of letting that translation refuse what it doesn't understand.

What breaks when it's hacked in

There is one failure mode, and it is quiet, which is what makes it the dangerous one. The hacked-in version reads column C as the price by position and trusts it forever. It works in the demo, works in the first weeks, and keeps working in the literal sense — it never errors — long after it has stopped being correct. The day they insert a column, C becomes "notes," and your importer dutifully writes a wall of free text into the price field and reports success. Nobody gets an error. The numbers are simply wrong now, in your database, downstream of every report and calculation that reads them — and you discover it when a customer does, which is the worst possible reviewer and the latest possible moment. An importer that fails loudly on a shape it doesn't recognise costs you a flagged run and a five-minute fix. An importer that trusts positions costs you a silent data-quality incident with no obvious cause.
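The difference is small enough to show in a few lines. The sketch below uses made-up sheet contents and hypothetical function names, and compares a positional read with a lookup by header after someone inserts a Notes column.

```python
def price_by_position(row: list[str]) -> str:
    """The hacked-in version: column C (index 2) is the price, forever."""
    return row[2]


def price_by_header(headers: list[str], row: list[str]) -> str:
    """Look the column up by name and refuse to guess if it is gone."""
    if "Price" not in headers:
        raise LookupError("no 'Price' column found; refusing to guess")
    return row[headers.index("Price")]


# The sheet as it looked when the importer was written.
headers_before = ["SKU", "Name", "Price"]
row_before = ["A1", "Widget", "9.50"]

# The same sheet after the customer inserts a Notes column at C.
headers_after = ["SKU", "Name", "Notes", "Price"]
row_after = ["A1", "Widget", "call supplier first", "9.50"]

print(price_by_position(row_before))              # 9.50
print(price_by_position(row_after))               # call supplier first
print(price_by_header(headers_after, row_after))  # 9.50
```

The positional read never raises; it simply returns the wrong cell. The header lookup survives the inserted column, and when the header is renamed instead it raises, which is the loud failure the seam is supposed to produce.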

Where it shows up, and the verdict

This seam sits underneath a whole class of build ideas. It is the front door of the report generator that turns structured rows into branded documents, and a branded PDF built on quietly corrupted rows is a confidently wrong document, which is worse than no document at all. It rides the same foundation those ideas do: background jobs so a large import doesn't block a request and can retry when the provider hiccups, an admin where an operator sees a flagged run and the rejected rows, and logging that captures what drifted and when. Those modules aren't hypothetical. The batch runner, the admin, and the logger all run in production today under CompanyGraph, which is the evidence that the foundation under this seam is real. The importer on top of it is not something the studio has built; this note is about the shape it would have to take, not a claim that it exists.

The verdict is narrow and load-bearing: own the validation and mapping layer at the seam and the schema-drift detection, and never trust the columns. The spreadsheet is correctly external and correctly theirs — a human surface, expected to be messy, free to change. Everything that protects you from that freedom lives on your side of the seam, and an import built without it isn't an integration; it's a slow data-quality incident that hasn't surfaced yet. The other import seam — syncing from a system of record like a CRM — lands on the same ownership question from the structured side, because it's the same question every time: rent the read, own the translation and the proof of what you let through.

An importer built this way would sit on that foundation: the background jobs, admin, and logging that would catch a drifted column at the seam already run in CompanyGraph's production today. The next sensible step is to bring the spreadsheet you'd want to import as the first workflow, and let the seam get tested against real, messy rows.

