Brief · 28 May 2026

What changed

Anthropic shipped Claude Opus 4.8 — 88.6% on SWE-bench Verified (up from 87.6%) and the strongest computer-use model it has tested (84% on Online-Mind2Web, ahead of GPT-5.5) — while holding the price at $5 / $25 per million tokens, the same as 4.7.

One number

88.6%

SWE-bench Verified for Claude Opus 4.8, up from 87.6% on Opus 4.7

Still vapor

The headline 88.5% on BrowseComp is measured with a multi-agent orchestrator, not the base model — the single-model gain is +5.0 points. The flashiest agentic numbers bundle scaffolding you still have to build yourself.

Anthropic shipped Claude Opus 4.8 today, and the number that matters to anyone pricing an inference budget isn't the top-line benchmark — it's the price that didn't move. Opus 4.8 holds at $5 / $25 per million input/output tokens, the same as 4.7, while posting 88.6% on SWE-bench Verified (up from 87.6%) and 93.6% on GPQA Diamond. A frontier coding model that gets cheaper per unit of capability each release tightens the make-vs-buy line for anyone weighing a local 70B-class rig against an API subscription — the breakeven token volume just moved.
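To make the breakeven point concrete, here is a minimal sketch of the arithmetic. Only the $5 / $25 per-million-token prices come from the brief; the rig cost, amortization window, running cost, and output-token share are hypothetical placeholders you should replace with your own numbers. It also ignores any quality gap between a local 70B-class model and Opus 4.8, and any throughput limit on the rig, so read the result as a rough lower bound on the volume where local hardware starts to pay off.

```python
# Prices quoted in the brief for Opus 4.8 (USD per million tokens).
INPUT_PRICE_PER_MTOK = 5.0
OUTPUT_PRICE_PER_MTOK = 25.0


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """API cost in USD for a given number of input and output tokens."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def blended_price_per_mtok(output_share: float) -> float:
    """Average API price per million tokens when `output_share` of all
    tokens are output tokens (0.0 = all input, 1.0 = all output)."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return (
        (1.0 - output_share) * INPUT_PRICE_PER_MTOK
        + output_share * OUTPUT_PRICE_PER_MTOK
    )


def breakeven_mtok_per_month(
    rig_cost_usd: float,
    amortization_months: int,
    monthly_running_cost_usd: float,
    output_share: float,
) -> float:
    """Monthly volume (millions of tokens) at which the API bill equals
    the amortized monthly cost of a local rig."""
    monthly_rig_cost = rig_cost_usd / amortization_months + monthly_running_cost_usd
    return monthly_rig_cost / blended_price_per_mtok(output_share)


if __name__ == "__main__":
    # Hypothetical rig: $6,000 amortized over 24 months, $50/month in
    # power and upkeep, with 20% of traffic being output tokens.
    volume = breakeven_mtok_per_month(6000, 24, 50, 0.2)
    print(f"Breakeven: {volume:.1f}M tokens/month")  # Breakeven: 33.3M tokens/month
```

Under those placeholder numbers the rig costs $300 a month and the blended API price is $9 per million tokens, so the API is cheaper below roughly 33 million tokens a month. Because the price held at the 4.7 level, this raw-token figure is unchanged; what shifts is how much capability each of those tokens buys.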

The real movement is agentic. Opus 4.8 is now the strongest computer-use and browser-agent model Anthropic has tested — 84% on Online-Mind2Web, a clear step over both Opus 4.7 and GPT-5.5 — alongside an optional 2.5× fast mode and parallel-subagent workflows in Claude Code. If your rig is doing tool-calling or browser automation, the orchestration ceiling went up today.

What to watch

The eye-catching agentic figures bundle the harness with the model. The headline 88.5% on BrowseComp is measured with a multi-agent orchestrator; the single-model gain is the more honest +5.0 points. Independent breakdowns put the base-model jump in perspective. Treat the multi-agent numbers as a ceiling you have to engineer toward, not a default you get for free. The price hold is the durable story; the orchestrator records are a benchmark you build, not a model you buy.
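For a sense of what "a benchmark you build" involves, here is a minimal sketch of one generic orchestration pattern: fan a task out to several subagents in parallel, drop the ones that fail or time out, and take the majority answer. This is not Anthropic's harness or the setup behind the BrowseComp figure, neither of which the brief describes. The names `run_orchestrator` and `make_stub` are hypothetical, and the stub subagents stand in for real model calls.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, List, Optional

# A subagent is any async function that maps a task string to an answer.
Subagent = Callable[[str], Awaitable[str]]


async def run_orchestrator(
    task: str, subagents: List[Subagent], timeout_s: float = 30.0
) -> str:
    """Run all subagents concurrently and return the most common answer."""

    async def guarded(agent: Subagent) -> Optional[str]:
        # One failing or slow subagent should not sink the whole run.
        try:
            return await asyncio.wait_for(agent(task), timeout=timeout_s)
        except Exception:
            return None

    results = await asyncio.gather(*(guarded(a) for a in subagents))
    answers = [r for r in results if r is not None]
    if not answers:
        raise RuntimeError("all subagents failed or timed out")
    # Majority vote; ties go to the answer that appeared first.
    return Counter(answers).most_common(1)[0][0]


def make_stub(answer: str, delay_s: float = 0.0, fail: bool = False) -> Subagent:
    """Hypothetical stand-in for a real model call."""

    async def agent(task: str) -> str:
        await asyncio.sleep(delay_s)
        if fail:
            raise RuntimeError("simulated subagent failure")
        return answer

    return agent


if __name__ == "__main__":
    agents = [make_stub("A"), make_stub("B"), make_stub("A"), make_stub("C", fail=True)]
    print(asyncio.run(run_orchestrator("example task", agents)))  # A
```

Even this toy version shows where the engineering cost sits: timeouts, failure handling, and an aggregation rule are all decisions the harness makes, and each one moves the score independently of the model underneath.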
