TL;DR: Large language models — AI systems trained on vast amounts of text to predict and generate human-like responses — are genuinely useful at work, but most people use them badly. The gap isn't the technology; it's knowing which tool to pick, which features to turn on, and how much to trust what comes back.

What LLMs actually are

An LLM (large language model) is software that has been trained on an enormous volume of text — books, websites, documentation, conversations — until it can generate fluent, contextually relevant responses to almost any input. You type something in; it predicts what a useful response would look like.

That framing matters. LLMs are not databases. They don't look things up; they generate. Which means they can be wrong with complete confidence. A model that tells you the correct VAT threshold and one that invents a plausible-sounding figure will both answer in exactly the same tone. That's not a flaw to avoid — it's a baseline behaviour to design around.

The practical implication: LLMs are excellent at drafting, reasoning, and reformatting. They need human judgement on anything load-bearing.

How it works

Every LLM runs on what's called a context window — the amount of text it can hold in view at once, including your instructions, the conversation so far, and any documents you've uploaded. Think of it as a very capable temporary desk. Everything on the desk, it works with. Everything off the desk, it can't access.
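As a rough illustration of the desk idea, the sketch below estimates whether a set of inputs would fit in a context window. The four-characters-per-token ratio is only a common rule of thumb for English text, and the window size and function names are assumptions chosen for the example, not figures from any particular model.

```python
from __future__ import annotations

# ASSUMPTION: roughly 4 characters per token is a rule of thumb for English
# text. Real tokenizers differ by model and by language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_window(parts: dict[str, str], window_tokens: int) -> tuple[bool, int]:
    """Return (fits, estimated_total) for everything placed on the desk."""
    total = sum(estimate_tokens(text) for text in parts.values())
    return total <= window_tokens, total


if __name__ == "__main__":
    desk = {
        "instructions": "Summarise the key obligations in this contract.",
        "conversation": "Earlier messages in the chat go here.",
        "document": "Full text of the uploaded contract. " * 500,
    }
    # Hypothetical window size, for illustration only.
    ok, used = fits_in_window(desk, window_tokens=8_000)
    print(f"Estimated tokens: {used}, fits: {ok}")
```

The point is only that instructions, conversation history, and uploads all draw on the same budget. A long chat plus a long document can push earlier material off the desk.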

When you send a message, the model generates a response token by token — word fragment by word fragment — based on patterns from its training and whatever is currently on that desk. Newer models can also call tools: running a web search, executing a calculation in Python, reading a file. Those tools fetch information from outside the model's training and bring it onto the desk.
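To make "generating from patterns" concrete, here is a deliberately tiny toy: it counts which word follows which in some training text, then writes a sentence one word at a time by picking the most common follower. Real LLMs use neural networks over tokens rather than a word-pair table, so treat this as an analogy under that simplification. The training sentence and function names are invented for the example.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train(text: str) -> dict[str, Counter]:
    """Count which word follows which in the training text."""
    follows: dict[str, Counter] = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], start: str, max_words: int = 4) -> str:
    """Build a sentence one word at a time from the learned patterns."""
    out = [start]
    for _ in range(max_words - 1):
        options = follows.get(out[-1])
        if not options:
            break  # nothing learned about this word, so stop
        # Always pick the most frequent follower. There is no lookup of facts,
        # only "what usually comes next".
        out.append(options.most_common(1)[0][0])
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical training text, for illustration only.
    model = train("the report is late the report is ready the report is late")
    print(generate(model, "the"))
```

The toy writes "the report is late" because that pattern was most frequent in its training text, whether or not any report is actually late. That is the same reason a real model can sound confident and still be wrong.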

There are two broad model types worth knowing:

  • Standard models: fast, and good for drafting, summarising, and explaining. They cover most everyday tasks.
  • Thinking models (sometimes called reasoning models, such as Claude with extended thinking or OpenAI's o1): slower, more expensive, and better at multi-step problems. Use them for complex analysis, not quick replies.

Where it helps SMEs

1. First drafts of almost any document

Client letters, proposal sections, internal policies, job descriptions, meeting agendas — LLMs produce serviceable first drafts in seconds. The value isn't that the draft is perfect (it rarely is). It's that editing a draft is faster than writing from a blank page. A financial broker might draft a client-facing rate summary in two minutes rather than twenty. The human still reviews and signs off; the model handles the blank-page problem.

2. Summarising and extracting from documents

Upload a contract, a supplier agreement, or a stack of meeting notes. Ask the model to pull out the key obligations, flag any unusual clauses, or produce a one-page summary. This works best when the document is in the context window and you ask the model to work from what's there rather than recall from memory. A letting agent reviewing a dozen tenancy agreements can surface anomalies in minutes rather than hours.

3. Research and regulatory context

With web search turned on, modern LLMs can pull current information — HMRC guidance, FCA regulatory updates, planning policy changes — and synthesise it into plain English. Treat this as a starting point, not a legal opinion. Cross-check anything you'll act on. But for getting oriented in an unfamiliar area quickly, it's far faster than reading three government PDFs from scratch.

What to watch out for

Hallucination is the default risk, not the edge case. When a model doesn't know something — a specific company name, a recent policy change, a precise figure — it often generates a plausible-sounding answer anyway. The fix is to give it the source material (upload the document, turn on web search) and ask it to work from that, rather than from memory. Any fact that matters should be verified.
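One way to picture "work from the source, not from memory" is a prompt that carries the document with it. The sketch below is a minimal example. The instruction wording, the function name, and the sample clause are assumptions made for illustration, not a recommended standard or any platform's API.

```python
def build_grounded_prompt(question: str, source_name: str, source_text: str) -> str:
    """Wrap a question with its source so the model works from the text."""
    return (
        "Answer using only the document below.\n"
        "If the document does not contain the answer, say so instead of guessing.\n"
        "Quote the passage you relied on.\n\n"
        f"--- {source_name} ---\n"
        f"{source_text.strip()}\n"
        f"--- end of {source_name} ---\n\n"
        f"Question: {question.strip()}"
    )


if __name__ == "__main__":
    # Hypothetical clause, invented for the example.
    prompt = build_grounded_prompt(
        question="What is the notice period?",
        source_name="Supplier agreement",
        source_text="Either party may terminate with 30 days' notice.",
    )
    print(prompt)
```

Asking for a quoted passage makes verification quicker: you can check the quote against the document rather than rereading the whole thing.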

Free tiers cap what's possible. The most useful features (web search, file uploads, memory, longer context windows) are typically restricted or unavailable on free plans. Paid subscriptions cost around £20/month per person on most platforms. Teams running on free tiers are using a meaningfully weaker tool. It's worth treating a paid tier as a software cost, not a luxury.
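For budgeting, the arithmetic is simple. The sketch below uses the rough £20/month per person figure from above. The team size is a hypothetical input, and actual prices vary by platform and plan.

```python
def annual_subscription_cost(people: int, monthly_price_gbp: float = 20.0) -> float:
    """Yearly cost of one paid seat per person at a flat monthly price."""
    if people < 0 or monthly_price_gbp < 0:
        raise ValueError("people and monthly_price_gbp must be non-negative")
    return people * monthly_price_gbp * 12


if __name__ == "__main__":
    # Hypothetical five-person team at roughly £20/month each.
    print(f"£{annual_subscription_cost(5):,.0f} per year")
```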

Generic prompts produce generic output. "Write me a client email" produces a generic client email. "Write a client email to a mortgage broker who has just had an application declined, explaining the next steps, in a tone that's professional but not cold, and around 150 words" produces something you can actually use. The more specific the input, the more useful the output.
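A simple way to make specificity a habit is to fill in a short brief before prompting. The sketch below turns the four details from the example above (recipient, purpose, tone, length) into a prompt and refuses to build one if any detail is missing. The class and field names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class EmailBrief:
    """The specifics a useful prompt needs. Hypothetical structure."""

    recipient: str
    purpose: str
    tone: str
    word_count: int

    def __post_init__(self) -> None:
        # Force the writer to supply every detail before a prompt is built.
        for name in ("recipient", "purpose", "tone"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if self.word_count <= 0:
            raise ValueError("word_count must be positive")

    def to_prompt(self) -> str:
        return (
            f"Write a client email to {self.recipient}, {self.purpose}, "
            f"in a tone that's {self.tone}, and around {self.word_count} words."
        )


if __name__ == "__main__":
    brief = EmailBrief(
        recipient="a mortgage broker who has just had an application declined",
        purpose="explaining the next steps",
        tone="professional but not cold",
        word_count=150,
    )
    print(brief.to_prompt())
```

The same brief can be reused across a team, so everyone starts from the same level of detail rather than from "Write me a client email".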

Getting started

Pick one platform and pay for it for a month. Claude, ChatGPT, and Gemini are all capable — the right choice matters less than the decision to actually use one properly. Then do this:

  1. Turn on web search.
  2. Upload a real document from your business — a contract, a policy, a report.
  3. Ask it to summarise, extract, or explain something specific from that document.
  4. Edit the output rather than accepting it wholesale.
  5. Note what it got right, and what needed fixing.

Most teams find a rhythm after a few weeks. The people who get real value from LLMs are the ones who treat the output as a first draft written by a fast but sometimes overconfident colleague — useful, but worth checking before it goes anywhere important.


Source: Andrej Karpathy, "How I use LLMs", YouTube, 2026 (2.4M views). Synthesised and adapted for UK SME context by Faction AI.