Anthropic's most powerful AI model — 93.9% SWE-bench, 100% Cybench, autonomous zero-day discovery. Interactive benchmark tracker & comparison tool.
Last updated: April 18, 2026 | Announced: April 7, 2026
Claude Mythos Preview vs Claude Opus 4.6, GPT-5, and Gemini 3.1 Pro on major AI benchmarks.
Real software issue resolution. +13.1 points over Opus 4.6 (80.8%)
Perfect score on capture-the-flag security challenges
Math olympiad. Previous best ~42%. A massive leap.
Hard science & reasoning. +3.2 points over Opus 4.6
Scans source code, hypothesizes flaws, confirms with tests, and develops working proof-of-concept exploits. Found zero-days in every major OS (Linux, Windows, macOS, OpenBSD, FreeBSD) and every major browser (Chrome, Safari, Edge, Firefox). Some flaws were 10-20+ years old.
On Firefox 147's JS engine: creates working exploits ~84% of the time (vs ~15% for Opus 4.6). Dramatically better at turning vulnerabilities into actionable exploits.
Operates autonomously for extended tasks including reverse-engineering and chaining exploits. Solves simulated corporate network attacks that take skilled humans 10+ hours.
93.9% SWE-bench Verified, 77.8% SWE-bench Pro. Resolves real-world software issues autonomously with planning, tool use, and code execution.
97.6% on USAMO (math olympiad) — a massive jump from ~42% by previous models. Strong gains across all mathematical reasoning tasks.
~92.7% MMLU, saturating many existing benchmarks. Anthropic shifted evaluation focus to real-world tasks over static tests.
Based on leaked documents and reports (not officially confirmed by Anthropic):
Estimated ~10 trillion parameters. Likely Mixture-of-Experts with fewer active parameters per inference.
Internal development codename revealed in the March 2026 data leak.
Claude Mythos is the first AI model deliberately withheld from public release due to its offensive cybersecurity capabilities.
Project Glasswing is Anthropic's initiative for defensive use of Claude Mythos.
Estimate your Claude Mythos costs. Currently available only to Project Glasswing participants, who draw from a shared $100M credits pool.
| Model | Input $/MTok | Output $/MTok | Relative Cost |
|---|---|---|---|
| Claude Mythos | $25.00 | $125.00 | 5x base |
| Claude Opus 4.7 | $15.00 | $75.00 | 3x base |
| Claude Opus 4.6 | $5.00 | $25.00 | 1x (base) |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 0.6x base |
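The pricing table above can be turned into a quick cost estimator. A minimal sketch using the listed $/MTok rates; the model keys and token counts are illustrative, not official API identifiers:

```python
# Per-million-token prices (USD) copied from the pricing table above.
PRICES = {
    "claude-mythos":     {"input": 25.00, "output": 125.00},
    "claude-opus-4.7":   {"input": 15.00, "output": 75.00},
    "claude-opus-4.6":   {"input": 5.00,  "output": 25.00},
    "claude-sonnet-4.6": {"input": 3.00,  "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the published per-MTok rates."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] \
         + (output_tokens / 1_000_000) * p["output"]

# Example: a 50K-token prompt with a 10K-token response on Mythos.
print(estimate_cost("claude-mythos", 50_000, 10_000))  # → 2.5
```

At these rates a single long agentic session on Mythos costs 5x the same workload on Opus 4.6, matching the "Relative Cost" column.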
| Feature | Mythos Preview | Opus 4.7 | GPT-5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 93.9% | 74.2% | ~68% | ~62% |
| SWE-bench Pro | 77.8% | 53.4% | ~58% | ~54% |
| Cybench (CTF) | 100% | ~45% | ~40% | ~35% |
| GPQA Diamond | 94.5% | 91.3% | ~90% | ~89% |
| USAMO (Math) | 97.6% | ~50% | ~55% | ~48% |
| Terminal-Bench 2.0 | 82% | 65.4% | ~60% | ~58% |
| MMLU | 92.7% | 91.0% | 92.8% | 91.5% |
| Parameters (est.) | ~10T (MoE) | Undisclosed | Undisclosed | Undisclosed |
| Public Access | Restricted | Yes | Yes | Yes |
| Input $/MTok | $25.00 | $15.00 | $10.00 | $7.00 |
| Output $/MTok | $125.00 | $75.00 | $30.00 | $21.00 |
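The margins in the comparison table are easier to read as head-to-head deltas. A small sketch that computes Mythos's lead (in percentage points) over the best competing score per benchmark; values are copied from the table above, and competitor scores marked "~" there are approximate:

```python
# Benchmark scores (percent) from the comparison table above.
# Non-Mythos values flagged "~" in the table are approximate.
SCORES = {
    "SWE-bench Verified": {"Mythos": 93.9, "Opus 4.7": 74.2, "GPT-5": 68.0, "Gemini 3.1 Pro": 62.0},
    "Cybench (CTF)":      {"Mythos": 100.0, "Opus 4.7": 45.0, "GPT-5": 40.0, "Gemini 3.1 Pro": 35.0},
    "GPQA Diamond":       {"Mythos": 94.5, "Opus 4.7": 91.3, "GPT-5": 90.0, "Gemini 3.1 Pro": 89.0},
    "USAMO (Math)":       {"Mythos": 97.6, "Opus 4.7": 50.0, "GPT-5": 55.0, "Gemini 3.1 Pro": 48.0},
    "MMLU":               {"Mythos": 92.7, "Opus 4.7": 91.0, "GPT-5": 92.8, "Gemini 3.1 Pro": 91.5},
}

def lead_over_runner_up(benchmark: str) -> float:
    """Mythos's margin over the best non-Mythos score; negative means Mythos trails."""
    others = [s for m, s in SCORES[benchmark].items() if m != "Mythos"]
    return round(SCORES[benchmark]["Mythos"] - max(others), 1)

for name in SCORES:
    print(f"{name}: {lead_over_runner_up(name):+.1f} pts")
```

Note the output for MMLU comes out slightly negative (GPT-5's 92.8% edges Mythos's 92.7%), consistent with the point above that MMLU is effectively saturated.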
* GPT-5 and Gemini 3.1 Pro scores are approximate, based on available reports. Mythos scores are from Anthropic's system card. Values marked ~ are estimates where exact figures are not public.