Fable 5 vs Codex: Which AI Coding Agent, and When
Two coding agents launched within weeks of each other this spring. One from Anthropic, one from OpenAI. The benchmarks show the most dramatic capability separation the market has seen. And the Anthropic model was pulled by the US government three days after it shipped.
If you're trying to work out which AI coding tool your agency or developer should be using — or which one your automations are built on — here's what the comparison actually shows.
What Landed (Then Didn't)
Claude Fable 5 arrived on 9 June 2026. Anthropic's first Mythos-class model cleared for general release — a tier above Opus — built for long-horizon autonomous coding: multi-day tasks, sub-agent delegation, self-verification. It scored 95% on SWE-Bench Verified and 80.3% on SWE-Bench Pro, the hardest public coding benchmark. Nothing else is close on that second number.
Three days later it was gone. On 12 June, the US government issued an export-control directive requiring Anthropic to suspend all access to Fable 5 and Mythos 5 for foreign nationals — which meant, in practice, switching the models off for every customer worldwide. The stated reason was a reported jailbreak: "asking the model to read a specific codebase and fix any software flaws." Anthropic pushed back, noting that the capability "is widely available from other models (including OpenAI's GPT-5.5), and is used every day by the defenders who keep systems safe." The models came down anyway.
As of 24 June, Fable 5 remains suspended. Opus 4.8 is the Anthropic fallback. GPT-5.5 — which OpenAI shipped in April and which powers Codex, their agentic coding environment — has never been touched.
The Numbers Worth Keeping
On the hardest tasks, the gap is real and large. SWE-Bench Pro — end-to-end GitHub issue resolution on real-world tasks — sits at 80.3% for Fable 5 and 58.6% for GPT-5.5. FrontierCode Diamond (complex production diffs): 29.3% versus 6.3%. These aren't rounding errors.
On terminal and CLI workflows, the gap disappears. Terminal-Bench 2.1: Codex on GPT-5.5 scores 83.4%, Claude Code on Fable 5 scores 83.1%. Essentially level, with Codex a fraction ahead.
Pricing: Fable 5 costs $10/$50 per million tokens (input/output). GPT-5.5 is $5/$30. Both have 1M-token context windows.
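To make those list prices concrete, here is a minimal sketch of the per-job arithmetic. The prices are the ones quoted above. The `Pricing` class, the `job_cost` helper, and the example token counts (2M input, 200K output) are assumptions for illustration, not figures from either vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# List prices as quoted in the article.
FABLE_5 = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)
GPT_5_5 = Pricing(input_per_mtok=5.0, output_per_mtok=30.0)


def job_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the token cost in USD for a single job."""
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


# Hypothetical job size, chosen only to illustrate the calculation.
fable_cost = job_cost(FABLE_5, input_tokens=2_000_000, output_tokens=200_000)
gpt_cost = job_cost(GPT_5_5, input_tokens=2_000_000, output_tokens=200_000)
```

For that hypothetical job the token bill comes to $30 on Fable 5 and $16 on GPT-5.5. The ratio shifts with the input/output mix, because the input price differs by 2x while the output price differs by roughly 1.7x.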
One figure most comparisons underplay: for fact-critical work in legal, finance, or medical contexts, EdenAI's benchmarks put Fable 5's hallucination rate at 36%, against 85% for GPT-5.5. That gap matters if your AI is touching client data or compliance-adjacent outputs.
Real-world anchoring: Stripe used a Fable 5 agentic pipeline to scan, classify, and migrate a 50-million-line Ruby codebase autonomously, completing the task in a day. That is the use case Fable 5 was built for. Whether it is your use case is a separate question.
The Routing Logic
If Fable 5 returns — Anthropic says it intends to restore access — here is how the decision shapes up.
Reach for Fable 5 when the work is autonomous, long-horizon, and errors are expensive. Large codebase migrations. Complex refactors across many interconnected files. Any agentic task where the model needs to plan, delegate, and verify over many steps without a developer in the loop. The 22-point SWE-Bench Pro lead closes the apparent price gap quickly when failed runs cost engineering days.
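The claim that a benchmark lead can close a price gap can be sketched as an expected-cost calculation. This is a rough model under strong assumptions: it treats each model's SWE-Bench Pro score as the probability that a single run succeeds, which real workloads will not match exactly, and the per-run token costs ($30 and $16) and the $800 failure cost are hypothetical placeholders.

```python
def expected_cost(token_cost: float, success_rate: float, failure_cost: float) -> float:
    """Expected cost of one run: tokens plus the probability-weighted cost of a failure."""
    return token_cost + (1.0 - success_rate) * failure_cost


def break_even_failure_cost(cost_a: float, rate_a: float,
                            cost_b: float, rate_b: float) -> float:
    """Failure cost at which models A and B have the same expected cost.

    Solves cost_a + (1 - rate_a) * F == cost_b + (1 - rate_b) * F for F.
    """
    if rate_a == rate_b:
        raise ValueError("success rates are equal; there is no break-even point")
    return (cost_a - cost_b) / (rate_a - rate_b)


# Assumption: benchmark scores stand in for per-run success probability.
FABLE_RATE, GPT_RATE = 0.803, 0.586
# Assumption: hypothetical token cost per run in USD.
FABLE_TOKENS, GPT_TOKENS = 30.0, 16.0
# Assumption: hypothetical cost of one failed run (say, a lost engineering day).
FAILURE_COST = 800.0

fable_expected = expected_cost(FABLE_TOKENS, FABLE_RATE, FAILURE_COST)
gpt_expected = expected_cost(GPT_TOKENS, GPT_RATE, FAILURE_COST)
threshold = break_even_failure_cost(FABLE_TOKENS, FABLE_RATE, GPT_TOKENS, GPT_RATE)
```

Under these assumptions the break-even sits at roughly $65 per failed run. Once a failure costs more than that, the pricier model has the lower expected cost. If failures are cheap to catch and retry, the cheaper model wins.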
Reach for GPT-5.5 via Codex when the work is terminal-centric, involves GUI automation, or already lives in the OpenAI ecosystem. Codex supports computer use on macOS and Windows; integrations with GitHub, Slack, and Linear; and an in-app browser. For interactive sessions where a developer reviews each step, GPT-5.5 is competitive on the benchmarks that matter there, at half the input price and 60% of the output price.
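The two rules above can be written down as a small routing function. This is a sketch, not a recommendation engine. The `Task` fields, the model labels, and the tie-breaking are assumptions for illustration: the labels are not real API model IDs, the Fable 5 criteria are checked first, and everything else goes to the cheaper option.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration; these are not real API model IDs.
FABLE_5 = "fable-5"
OPUS_4_8 = "opus-4.8"
GPT_5_5_CODEX = "gpt-5.5-codex"


@dataclass(frozen=True)
class Task:
    autonomous: bool = False        # no developer reviewing each step
    long_horizon: bool = False      # many steps: plan, delegate, verify
    errors_expensive: bool = False  # a failed run costs engineering days
    terminal_centric: bool = False
    gui_automation: bool = False
    openai_ecosystem: bool = False


def route(task: Task, fable_available: bool) -> str:
    """Pick a model label for a task, following the article's two rules."""
    # Rule 1: autonomous, long-horizon work where errors are expensive.
    if task.autonomous and task.long_horizon and task.errors_expensive:
        # While Fable 5 is suspended, fall back to Opus 4.8.
        return FABLE_5 if fable_available else OPUS_4_8
    # Rule 2: terminal-centric, GUI automation, or OpenAI-ecosystem work.
    # Assumption: tasks matching neither rule also go to the cheaper option.
    return GPT_5_5_CODEX


migration = Task(autonomous=True, long_horizon=True, errors_expensive=True)
choice_today = route(migration, fable_available=False)
```

Passing availability in as an argument keeps the rule set unchanged when a model is suspended or restored.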
For most SMEs commissioning AI automation from an agency: neither of these decisions is yours to make directly. What matters is that your agency has a model-routing strategy and is not hard-coded to a single provider. The week Anthropic pivoted toward small businesses is worth reading alongside this — the direction of travel is clear, but the infrastructure risk is new.
One Layer You Didn't Plan For
Here is the part that does not appear in benchmark tables.
Each model leads on the work it was built for. But one was pulled by a government in its first week. If you run a UK business, or any non-US business, you are by definition a foreign national under US export controls. Anthropic's own statement was unambiguous: the directive ordered suspension for "any foreign national, whether inside or outside the United States." You were the target demographic.
This is not a reason to avoid Anthropic's tools. Fable 5 will almost certainly return. The export control was narrow, the jailbreak claim was disputed, and commercial pressure to restore access is substantial. But it is a reason to think about how you are building.
If your automation runs on a hard-coded model ID and that model disappears overnight, you have a single point of failure. The fix is not complex: build against an abstraction layer that can swap models without touching your workflow. Most well-built AI automations already do this. Most demo-to-production rushes do not.
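A minimal sketch of such an abstraction layer follows, assuming each provider is wrapped in a plain function that takes a prompt and returns text. Every name here (`FallbackClient`, `ModelUnavailable`, the stub backends, the model labels) is hypothetical. A real build would wrap the vendor SDK calls and map their "model unavailable" errors onto the one exception the workflow knows about.

```python
from typing import Callable, List, Sequence, Tuple


class ModelUnavailable(Exception):
    """Raised by a backend when its model cannot be reached."""


# A backend is any function that takes a prompt and returns the model's reply.
Backend = Callable[[str], str]


class FallbackClient:
    """Try backends in priority order and return the first successful reply."""

    def __init__(self, backends: Sequence[Tuple[str, Backend]]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends: List[Tuple[str, Backend]] = list(backends)

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (backend_name, reply) from the first backend that answers."""
        errors = []
        for name, backend in self._backends:
            try:
                return name, backend(prompt)
            except ModelUnavailable as exc:
                errors.append(f"{name}: {exc}")
        raise ModelUnavailable("all backends failed: " + "; ".join(errors))


# Stub backends standing in for real provider calls (hypothetical).
def suspended_backend(prompt: str) -> str:
    raise ModelUnavailable("access suspended")


def working_backend(prompt: str) -> str:
    return f"[stub reply to: {prompt}]"


client = FallbackClient([
    ("fable-5", suspended_backend),  # preferred, currently unavailable
    ("opus-4.8", working_backend),   # fallback
])
used_model, reply = client.complete("Refactor the billing module")
```

The workflow only ever calls `client.complete`, so replacing or reordering providers is a configuration change, not a rewrite. Returning the backend name alongside the reply lets you log which model actually served each request.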
The realistic cost guide covers what sensible AI automation actually runs for a growing business. The question after the Fable 5 episode is whether the build your agency delivers includes a model fallback — or whether it is praying the same endpoint stays available. Worth asking explicitly before the contract is signed.
