Field notes

How to make AI safe enough for ecommerce search


Ecommerce search is one of those things that sounds obvious right up until you actually think about it.

Here's the catch nobody mentions in the demo: customers don't search the way your catalogue is built. They search the way they talk. They show up wanting "towbar wiring for a 2018 Hilux", or a present for someone who kills every plant they've ever owned, or coffee that tastes fancy but not like a science experiment. These are real questions, half-formed and full of context a database column has never heard of. A language model is genuinely good at untangling that kind of messy intent. The trouble is that it's just as good at sounding completely certain when it should be saying nothing at all. That confidence is delightful over dinner and a real problem when it invents a compatibility answer and sends someone to the wrong SKU.

Don't let the model be the source of truth

The pattern that actually works is almost boring once you say it out loud. The AI understands the question. Verified data supplies the answer. That's it.

Let the model do what it's good at: reading the customer's words and working out the category, the use case, the constraints, the thing they're really trying to solve. Then make the recommendation come from data you'd stake the business on: the catalogue fields, the fitment tables, inventory, tags, metafields, compatibility rules, collection logic, all the unglamorous plumbing the business already uses to know what fits what. On one build, the stack was Algolia plus an LLM intent layer sitting over verified fitment data. A customer could type in plain English and the system would read it just as loosely, but the answer it handed back still had to pass through the hard data before it reached anyone. That's the line between a helpful shop assistant and a recommendation the support team quietly cleans up on Thursday.
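To make that split concrete, here is a minimal Python sketch. Everything in it is a stand-in: the `Intent` shape, the `FITMENT` and `CATALOGUE` tables, the SKUs and the year ranges are invented for illustration and are not taken from the build described above. The point is only that the model's output is a structured question, and the SKUs come from a lookup against verified rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """What the model extracted from the customer's words (hypothetical shape)."""
    category: str
    make: str
    model: str
    year: int


# Hypothetical verified fitment rows: (sku, make, model, first_year, last_year).
FITMENT = [
    ("TBW-100", "Toyota", "Hilux", 2015, 2020),
    ("TBW-200", "Ford", "Ranger", 2012, 2018),
]

# Hypothetical catalogue: sku -> category.
CATALOGUE = {"TBW-100": "towbar wiring", "TBW-200": "towbar wiring"}


def verified_matches(intent: Intent) -> list[str]:
    """Return only the SKUs the fitment table confirms for this vehicle."""
    return [
        sku
        for sku, make, model, first, last in FITMENT
        if CATALOGUE.get(sku) == intent.category
        and make == intent.make
        and model == intent.model
        and first <= intent.year <= last
    ]


# In a real build the Intent would come from the LLM layer; here it is hand-written.
intent = Intent("towbar wiring", "Toyota", "Hilux", 2018)
print(verified_matches(intent))
```

The model never names a SKU in this sketch. If the lookup comes back empty, there is simply nothing to recommend, which is the behaviour you want.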

It fails by sounding right

Bad AI search rarely fails the way you'd hope. It doesn't stammer or throw an error. It gives a smooth, plausible, confident answer, and then someone in support spends the afternoon unwinding it. The failure there isn't stupidity, it's fluency, which is exactly why the build needs refusal paths.

And I'd argue they're the whole point rather than some apologetic fallback. If the data isn't enough to answer, the system should say so plainly. If compatibility is genuinely uncertain, it should ask one good follow-up rather than guess. And if the customer actually needs a person, it should hand over cleanly, with the context already gathered so nobody starts from scratch. A system that knows when not to answer is worth far more than one that answers everything.
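Those outcomes can be sketched as a small decision function. This is an assumption-heavy illustration, not the logic of any particular build: the `Outcome` names, the required fields and the message wording are all made up here, and a real system would decide "needs a person" from more than a single flag.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    ANSWER = "answer"
    NO_DATA = "no_data"
    ASK_FOLLOW_UP = "ask_follow_up"
    HANDOFF = "handoff"


@dataclass
class Decision:
    outcome: Outcome
    message: str
    # Context gathered so far, so a human never starts from scratch.
    context: dict = field(default_factory=dict)


# Hypothetical: the fields a fitment lookup needs before it can be trusted.
REQUIRED = ("make", "model", "year")


def decide(known: dict, matches: list[str], wants_human: bool = False) -> Decision:
    """Pick one path: hand over, ask one follow-up, refuse, or answer."""
    if wants_human:
        return Decision(Outcome.HANDOFF, "Passing you to the team.", dict(known))
    missing = [name for name in REQUIRED if name not in known]
    if missing:
        # Ask for one thing at a time rather than guessing.
        question = f"What is the {missing[0]} of your vehicle?"
        return Decision(Outcome.ASK_FOLLOW_UP, question, dict(known))
    if not matches:
        message = "We don't have verified fitment data for that vehicle."
        return Decision(Outcome.NO_DATA, message, dict(known))
    return Decision(Outcome.ANSWER, f"Verified fit: {', '.join(matches)}", dict(known))


print(decide({"make": "Toyota", "model": "Hilux"}, []).message)
```

Every path carries the context forward, so a handover or a follow-up question doesn't throw away what the customer has already said.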

What it needs under the hood

  • Clean product data: titles, descriptions and attributes good enough for a machine to actually reason over.
  • Structured compatibility rules: especially for automotive, parts, technical goods, or anything where "close enough" isn't.
  • Search retrieval: the fast layer that finds candidate products before the model explains a thing.
  • Answer constraints: the rules that stop the model recommending something the data doesn't support.
  • Logging: so you can see what people asked, what came back, and where the catalogue is thin.

That last one quietly earns its keep. Search logs turn into product research without any extra effort, because when twenty people ask for the same thing and the system keeps coming up empty, that's not really a search problem—it's a hint about merchandising, or content, or a gap in the range you didn't know you had.
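As a rough sketch of that idea, assume each log entry records the query text and how many products came back. The field names `query` and `result_count` are my own placeholders, not a real logging schema. Counting the repeated misses is then a few lines of Python:

```python
from collections import Counter


def search_gaps(log: list[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Return zero-result queries seen at least min_count times, most frequent first."""
    misses = Counter(
        entry["query"].strip().lower()  # light normalisation so near-duplicates group
        for entry in log
        if entry["result_count"] == 0
    )
    return [(query, n) for query, n in misses.most_common() if n >= min_count]


# Hypothetical log entries for illustration only.
log = [
    {"query": "Hilux towbar wiring", "result_count": 3},
    {"query": "roof tent ladder", "result_count": 0},
    {"query": "Roof tent ladder ", "result_count": 0},
    {"query": "snorkel for 2009 Navara", "result_count": 0},
]
print(search_gaps(log))
```

The threshold is a judgment call. A real report would probably group queries by interpreted intent rather than raw text, but even this crude count shows where the catalogue is thin.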

What the customer feels

None of this should be the customer's problem. They don't care that there's a retrieval layer, a fitment table and a model reading intent underneath, and they shouldn't have to. The whole thing should feel like one search box that finally understands what they mean. They ask in their own words, they get a short answer and a recommendation with enough reasoning to trust it, and where there's genuine uncertainty they see the uncertainty rather than fake confidence.

The simplest way I can put it: use AI for the messy human question, and verified data for the answer. Get that split right and most of the danger goes away.

If this is your problem

This is a bespoke product-finder build, scoped to your catalogue and your data, from $5,000. To start it I need your product data, the rules that actually matter, and a clear line on what the system must never guess. If you'd rather test the thinking before committing to a build, a $2,500 roadmap sprint is the low-commitment way in. And the ongoing piece—the logging, the search-gap intelligence, the guardrails that keep it honest over time—lives in the $3,000/mo Intelligence Retainer. If your store sells parts, technical products, bundles, or anything customers struggle to describe, this is usually the build. Full pricing is at /pages/ai-implementation.

The honest goal here is small. Help someone buy the right thing, the first time, without leaving a mess for support to find later.