Natural language on a spreadsheet — useful, or quietly dangerous?

Some ideas in this set arrive with a verdict attached. This one arrives with a question mark, and the honest thing is to leave it there. A natural-language layer on top of a spreadsheet — ask in plain English, get back a formula, a chart, a cleaned-up column — is the kind of idea that sounds obviously good for about a minute and then becomes genuinely hard to judge. Smallbox has not built it, is not certain it should be built, and that uncertainty is exactly why it is worth thinking through slowly rather than pitching quickly. Most of this site is about being honest where the work is. Here, honesty means being open about where the answer is.

Why it pulls at you

The appeal is real and easy to feel. Spreadsheets are the most widely used data tool on earth, and most of the people in them cannot write the formula they actually need. They know what they want — "show me the months where returns went up but revenue didn't" — and they do not know how to express it in the spreadsheet's own language. A layer that turns the plain-English want into the working formula would meet an enormous number of people exactly where they are stuck. Stated that way, it sounds like an obvious win, and for a certain kind of small, bounded task it might genuinely be one.

The technology to attempt it is mostly rented and mostly ready. The same language model that generates prose can read a request and produce a formula; the spreadsheet's own API gives you the cells. Wiring "question in, formula out" is not the hard part. The hard part is what happens when the formula is wrong.

The reason it is hard to judge

A spreadsheet is not a chat window, and the difference is the whole problem. A chat assistant that answers confidently and wrongly wastes your time, and you usually notice, because you were reading the answer. A spreadsheet is where people keep the numbers they act on — budgets, inventory, payroll, the figures a decision gets made from — and an AI that confidently writes a wrong formula or quietly reshapes a column does not waste your time. It corrupts the thing you were relying on, invisibly, and you find out when a decision built on bad numbers goes wrong weeks later. The failure mode is not "unhelpful." It is "silently destructive," and that is a different and much higher bar.

This is why the idea resists a quick verdict. The same capability that makes it appealing, that it acts on your data, is the capability that makes it dangerous, and the two cannot be separated. You cannot have a tool that can usefully change your spreadsheet without having a tool that can just as easily damage it. So the question is not "is this useful," which it sometimes obviously is, but "can it be made safe enough that the usefulness is worth the risk," and that is genuinely open.

What would have to be true for it to work

If there is a version of this that is real, it almost certainly runs on two disciplines, and they are worth naming because they are the difference between a tool people trust and a tool that destroys data once and is deleted forever.

Suggest, don't apply. The AI proposes; the human disposes. It shows you the formula it would write and what it would do to your data before anything changes, in plain terms, and nothing happens to a single cell without a deliberate, visible step you took. The model's confidence is never enough to move your data on its own. This is the same instinct a careful product applies wherever a model's output has consequences — the honest "I don't know" and the discipline of not blindly trusting a generated answer are the chat-shaped version of the same rule; here the stakes are your actual numbers, so the rule has to be stricter, not looser.
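To make the rule concrete, here is a minimal sketch of a suggest-only gate, assuming the sheet is just a dictionary of cell references to contents. Smallbox has not built this; the names `Suggestion`, `preview`, and `apply` are hypothetical and exist only for illustration. The point it shows is narrow: the model's output is an inert object, and the only path to a changed cell runs through an explicit confirmation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """A proposed change (hypothetical type). It is inert until confirmed."""
    cell: str         # target cell, e.g. "D2"
    formula: str      # the formula the model proposes
    explanation: str  # plain-language description shown to the user


def preview(sheet: dict, s: Suggestion) -> str:
    """Describe what would change, without touching the sheet."""
    before = sheet.get(s.cell, "")
    return f"{s.cell}: {before!r} -> {s.formula!r} ({s.explanation})"


def apply(sheet: dict, s: Suggestion, confirmed: bool = False) -> dict:
    """Return a new sheet with the change, only if the user confirmed it."""
    if not confirmed:
        # The model's confidence is never enough on its own.
        raise PermissionError("suggestion not confirmed by the user")
    updated = dict(sheet)  # copy, so the original sheet is left untouched
    updated[s.cell] = s.formula
    return updated
```

In use, the interface would show `preview(sheet, suggestion)` first and call `apply(..., confirmed=True)` only from the handler of a visible "apply" action. Returning a copy rather than mutating in place is a design choice of this sketch, not a requirement of the idea.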

Provenance and reversibility. Every change the tool makes is recorded and undoable — you can see exactly what the AI did, when, in response to what, and step back from it cleanly. A spreadsheet AI without an undo trail is asking for a level of trust no careful person should give it. With one, a mistake is a nuisance you reverse rather than a loss you discover.
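A provenance trail can be sketched in a few lines, again assuming a dictionary-backed sheet. `Change` and `LoggedSheet` are hypothetical names, not part of any existing product or library. Each write records what was there before, what replaced it, which request caused it, and when, which is enough to step back cleanly.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One recorded edit (hypothetical type)."""
    cell: str
    before: Optional[str]  # None means the cell did not exist yet
    after: str
    prompt: str            # the request that led to this change
    at: datetime


class LoggedSheet:
    """A dictionary of cells where every write is recorded and undoable."""

    def __init__(self, cells=None):
        self.cells = dict(cells or {})
        self.log = []

    def write(self, cell: str, value: str, prompt: str) -> Change:
        change = Change(cell, self.cells.get(cell), value, prompt,
                        datetime.now(timezone.utc))
        self.cells[cell] = value
        self.log.append(change)
        return change

    def undo(self) -> Optional[Change]:
        """Reverse the most recent change; return it, or None if the log is empty."""
        if not self.log:
            return None
        change = self.log.pop()
        if change.before is None:
            self.cells.pop(change.cell, None)  # the cell was new, so remove it
        else:
            self.cells[change.cell] = change.before
        return change
```

This sketch drops an entry from the log when it is undone. A stricter version would keep it and record the undo as its own entry, so the trail shows reversals as well.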

Those two together do not guarantee the idea works. They are the minimum that makes it not reckless — and notice that both are about restraint, not reach. The interesting engineering here is not in making the model do more. It is in making it safe to be wrong.

What it would ride, and what stays open

The familiar parts are familiar. The model is rented. What you would own is the data, the provenance log, and the safety layer that sits between the model's suggestion and the user's cells — and the foundation carries the rest: accounts, billing, the jobs, the admin, the logging. None of that is the uncertain part. The uncertain part is not technical at all.

It is whether people want this enough to adopt it, trust it enough to let it near their real spreadsheets, and pay for it once the novelty wears off. That is not a question you can reason your way to from a desk. The market is genuinely unproven — there have been attempts, the category has not obviously broken open, and it is not clear yet whether the want is broad and deep or narrow and shallow. Writing confidently either way would be exactly the manufactured certainty this kind of piece is supposed to avoid.

The verdict, held open

So this one does not get a clean verdict, and forcing one would be dishonest. It is a maybe — interesting enough to be worth a careful look, risky enough that the careful look has to come before any commitment. The honest position is that nobody, including us, knows yet whether this is a real product, a thin feature, or a clever demo that does not survive contact with people's actual data.

Which is the precise situation a foundation is for. The reason to care that the boring parts already exist is not to save building time for its own sake. It is that having them makes an open question cheap to answer. You could put a deliberately narrow, suggest-only version in front of real users (one task, fully reversible, no ability to silently change anything) and learn the only thing that matters: whether people reach for it twice. That is how you resolve a question mark like this one: build the safe, narrow version on something that already carries the plumbing, watch what people actually do, and let the answer come from them rather than from a pitch. The foundation will not tell you whether this idea is good. It will make finding out affordable, which on an idea this uncertain is the whole game.

Articles describe the Foundation. The Foundation Map is the thing itself — accounts, admin, email, logging, and deployment, with one real workflow running through them.
