AI agencies have multiplied faster than AI models. In the last eighteen months, LinkedIn has sprouted a thousand "AI automation experts," most of whom have never shipped a system that lives in production past week six.
If you're an SME owner trying to pick one, the hype is your biggest enemy.
The good ones look the same as the bad ones in a pitch meeting. The difference shows up six months later, when one has a working system and the other has a folder of unfinished Notion pages.
Here's the checklist I'd use if I were hiring an AI agency, Faction AI included. Ten questions. Most "agencies" will fail at least half of them.
1. Can they show me a system running right now?
Not a demo video. Not a screenshot. A live system, in production, that their client is using today.
Ask them to name three live systems. If they can't, skip them.
2. Who owns the code and the prompts?
Some agencies build on top of their own proprietary platform — great for them, terrible for you if you ever need to leave. Ask who owns the deliverables on final payment. Custom code and prompts should become yours.
3. What happens when OpenAI / Anthropic change their API?
Model providers deprecate endpoints, change pricing, and rate-limit without notice. A serious agency will tell you how they handle this. A shallow one will look blank.
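What does "handling it" look like? Here's a minimal sketch of the shape of the answer: pinned model versions in one config, and one seam for all model calls with a fallback path. The model names, adapter callables, and ProviderError are hypothetical placeholders, not any real SDK.

```python
from typing import Callable

# Pin model versions in one config, never as floating aliases scattered
# through prompts and workflows. These names are hypothetical placeholders.
PINNED_MODELS = {"primary": "model-a-2025-05-01", "fallback": "model-b-2024-11-01"}

class ProviderError(Exception):
    """What a provider adapter raises on deprecation, rate limit, or outage."""

def complete(prompt: str,
             primary: Callable[[str, str], str],
             fallback: Callable[[str, str], str]) -> str:
    """One seam for every model call: try the pinned primary, degrade to
    the fallback instead of letting a provider change kill the workflow."""
    try:
        return primary(PINNED_MODELS["primary"], prompt)
    except ProviderError:
        return fallback(PINNED_MODELS["fallback"], prompt)
```

If an agency can't describe something like this, every provider change becomes an emergency rebuild billed to you.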
4. Do they understand your ops, or are they just selling a template?
If the first call is a demo of "our AI workflow builder," they're selling a product. If the first call is them asking how your business actually runs, they're building a solution.
One is cheaper. The other works.
5. What's their production error rate?
Real AI systems have error rates. Ask for their actual success rate on a live client — 24-hour, 7-day, whatever they track. "99%" with no unit is a red flag. "92% first-call resolution on inbound AI phone" with a specific metric is credible.
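A credible number is nothing more exotic than a rate over a trailing window, computed from logged outcomes. A minimal sketch, with made-up records for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical outcome log: one record per live interaction.
outcomes = [
    {"when": datetime(2025, 1, 6), "resolved": True},
    {"when": datetime(2025, 1, 7), "resolved": False},
    {"when": datetime(2025, 1, 8), "resolved": True},
]

def success_rate(records, days: int, now: datetime) -> float:
    """A rate with a unit: resolved interactions over a trailing window."""
    window = [r for r in records if now - r["when"] <= timedelta(days=days)]
    return sum(r["resolved"] for r in window) / len(window) if window else 0.0

now = datetime(2025, 1, 9)
print(f"{success_rate(outcomes, days=7, now=now):.0%} 7-day first-call resolution")
```

If they can't show you where numbers like this come from, they don't have them.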
6. How do they handle AI hallucinations?
If they say "we prompt-engineer them away," run. Hallucinations are a statistical property of LLMs — you mitigate, you don't eliminate.
Look for: retrieval grounding, human-in-the-loop checkpoints for high-stakes output, confidence thresholds, output validation. That's the real answer.
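Here's roughly what that answer looks like as code. A minimal sketch, assuming a hypothetical pipeline where each answer arrives with a confidence score and the IDs of the retrieved sources it was grounded on; the threshold is illustrative:

```python
CONFIDENCE_THRESHOLD = 0.85  # illustrative; tuned per use case in production

def gate(answer: str, confidence: float, source_ids: list[str]) -> str:
    """Route a model answer: auto-send only if it is grounded, passes
    validation, and clears the confidence bar. Otherwise a human sees it."""
    grounded = len(source_ids) > 0       # retrieval grounding present
    validated = bool(answer.strip())     # stand-in for real schema/claim checks
    if grounded and validated and confidence >= CONFIDENCE_THRESHOLD:
        return "send"
    return "hold_for_human_review"

# A confident answer with no sources behind it still never auto-sends:
print(gate("Your claim was approved.", confidence=0.97, source_ids=[]))
```

The exact checks vary by use case; the point is that mitigation is layered and mechanical, not a clever prompt.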
7. What's the minimum review cycle before something goes live?
Anything that touches a customer — email, call, document, decision — needs human review at first and potentially forever. If they're racing to put AI in front of your customers with no review layer, they're prioritising their own speed over your reputation.
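Concretely, a review layer can be as simple as a queue that customer-facing drafts cannot skip. A minimal sketch, with hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    channel: str          # "email", "call_summary", "quote", ...
    body: str
    approved: bool = False

review_queue: list[Draft] = []

def submit(draft: Draft) -> None:
    """Every AI draft lands here first; nothing goes straight to a customer."""
    review_queue.append(draft)

def approve(draft: Draft) -> str:
    """Only text a named human has approved reaches the send step."""
    draft.approved = True
    return draft.body

submit(Draft(channel="email", body="Hi Sam, your quote is attached."))
```

The queue can loosen over time as the system earns trust, but it should exist on day one.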
8. Can they explain their stack without buzzwords?
"We use Retell AI for voice, Clio for case management, Xero for billing, n8n to orchestrate, and Claude for reasoning" — that's a real stack.
"We leverage next-generation agentic workflows and autonomous AI-driven transformation" — that's a deck.
9. What does month-six look like?
Most AI projects launch fine and quietly die three months in because nobody's looking after them. Ask what ongoing support looks like. Is it a retainer? Billed hourly? Fire-and-forget?
A system without an owner is a time bomb.
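What does "an owner" mean in practice? At minimum, a scheduled check that compares live metrics against a floor and alerts a named human. A minimal sketch, with an illustrative threshold:

```python
ALERT_FLOOR = 0.90  # illustrative floor for the 7-day success rate

def daily_health_check(success_rate_7d: float, owner: str) -> str:
    """The cheapest possible version of 'someone is looking after it'."""
    if success_rate_7d < ALERT_FLOOR:
        return f"ALERT {owner}: 7-day success rate at {success_rate_7d:.0%}"
    return "ok"

print(daily_health_check(0.86, owner="ops@example.com"))
```

If their month-six answer doesn't include someone whose inbox that alert lands in, the system has no owner.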
10. How do they charge — and why?
Fixed-price per project sounds safe but incentivises shortcuts. Pure time-and-materials has no cap. The best arrangements I've seen: scoped build fee plus a modest ongoing retainer for monitoring, tuning, and improvements.
Whatever the structure, it should be transparent. If you can't understand their quote in two minutes, something's off.
The bottom line
Pick someone who ships, not someone who decks. Ask for receipts. Ask hard questions. The agencies worth hiring welcome them.
If we're the right fit, great. If we're not, you'll know what to ask the next person in the door.