Anthropic released Claude Opus 4.7 on Thursday, its most capable generally available AI model to date, with measurable gains in coding, vision, and multi-step agentic work. The launch carries an unusual admission: the company’s stronger model, Mythos Preview, remains behind a restricted access program that most developers and enterprises cannot join.
What the Claude Opus 4.7 Release Includes
Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
— Claude (@claudeai) April 16, 2026
The new model is available via the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. According to Anthropic, on a 93-task coding benchmark, Claude Opus 4.7 improved task resolution by 13% over Opus 4.6 and solved four tasks that neither Opus 4.6 nor Sonnet 4.6 could. Rakuten reported that on its internal SWE-Bench, the new model resolves three times as many production tasks as Opus 4.6, with double-digit gains in both Code Quality and Test Quality.
The model ships with a new xhigh effort level, positioned between the existing high and max settings, giving developers finer control over the tradeoff between reasoning depth and response latency on hard problems. Image resolution support has expanded substantially: Opus 4.7 now handles images up to 3.75 megapixels and 2,576 pixels on the long edge — roughly three times the ceiling of its predecessor. An updated tokenizer increases token usage by 1.0–1.35x, which developers should factor into cost planning; Anthropic has published a migration guide to assist teams moving from Opus 4.6.
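To make the new control concrete, here is a minimal sketch of requesting the xhigh effort level through Anthropic’s Python SDK. The model identifier and the effort field are assumptions inferred from the release notes rather than confirmed API surface, which is why the sketch passes the field through extra_body:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",  # assumed identifier for this release
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": "Find and fix the race condition in this scheduler.",
    }],
    # ASSUMPTION: effort is a top-level request field named "effort";
    # the release notes name the levels but not the exact API shape.
    extra_body={"effort": "xhigh"},
)
print(response.content[0].text)

Reserving xhigh for genuinely hard problems and dropping back to high for routine requests is the natural way to manage the latency tradeoff the setting introduces.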
Task budgets — a developer tool for guiding Claude’s token spend — entered public beta alongside this release. Anthropic says Opus 4.7 has the strongest efficiency baseline it has observed for multi-step work on its internal research-agent benchmark, scoring 0.715 overall.
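A task budget might be attached in a similar way. Everything below the comment is hypothetical; the beta documentation, not this sketch, is authoritative on the field name and enforcement semantics:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",  # assumed identifier, as above
    max_tokens=4096,
    messages=[{"role": "user", "content": "Summarize these forty filings."}],
    # ASSUMPTION: the public beta exposes a "task_budget" object capping
    # total token spend across a multi-step task; the name and shape
    # here are guesses, not documented API.
    extra_body={"task_budget": {"max_total_tokens": 200_000}},
)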
Concrete Benefits and Documented Limitations
Early adopters across sectors report consistent improvements on complex, long-running tasks. On CursorBench, Opus 4.7 scored 70%, up from 58% for Opus 4.6, and was described as “a very impressive coding model, particularly for its autonomy and more creative reasoning.” On XBOW’s visual-acuity benchmark, the model scored 98.5% against Opus 4.6’s 54.5% — a near-doubling that Solve Intelligence linked to “major improvements in multimodal understanding, from reading chemical structures to interpreting complex technical diagrams.”
In finance and legal verticals, Harvey reported 90.9% accuracy on BigLaw Bench at high effort, with better reasoning calibration on review tables and noticeably smarter handling of ambiguous document editing tasks. Databricks found 21% fewer errors on its OfficeQA Pro benchmark when Opus 4.7 worked with source documents, compared to its predecessor. The General Finance module on Anthropic’s internal benchmark improved from 0.767 under Opus 4.6 to 0.813 under Opus 4.7.
Agentic platforms noted efficiency gains without the typical regression tax. Notion Agent described a 14% improvement in complex multi-step workflows, achieved with fewer tokens and a third as many tool errors. Genspark credited the model with excelling at loop resistance, consistency, and graceful error recovery — three attributes that often degrade as agent chains grow longer. Factory Droids reported a 10–15% improvement in task success rates across its droid workflows.
Anthropic offered a notable internal demonstration: Opus 4.7 autonomously built a complete Rust text-to-speech engine from scratch — neural model, SIMD kernels, and a browser demo — then fed its own output through a speech recognizer to verify it matched a Python reference implementation. Qodo added that the model passed three TBench tasks prior Claude models could not complete, and fixed a race condition that had eluded the previous best model.
The limitations are documented in the Claude Opus 4.7 System Card. Anthropic acknowledges that Opus 4.7 is modestly weaker than Opus 4.6 on certain safety measures — specifically showing a greater tendency to offer overly detailed harm-reduction advice on controlled substances. Anthropic also states plainly that Mythos Preview remains the best-aligned model the company has trained. The tokenizer’s 1.0–1.35x token usage increase has direct cost implications for high-volume API consumers, an operational reality the migration guide addresses directly.
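A back-of-the-envelope calculation shows why the multiplier matters at volume. The prices and traffic figures below are placeholders, not Anthropic’s published rates, and the sketch assumes the 1.0–1.35x increase applies to input and output tokens alike, which the release notes do not specify:

PRICE_IN = 15.00   # placeholder $ per million input tokens
PRICE_OUT = 75.00  # placeholder $ per million output tokens

def monthly_cost(mtok_in: float, mtok_out: float, multiplier: float = 1.0) -> float:
    """Dollar cost for one month of traffic, scaled by the tokenizer multiplier."""
    return multiplier * (mtok_in * PRICE_IN + mtok_out * PRICE_OUT)

# Example: 500M input and 100M output tokens per month.
print(f"Opus 4.6 baseline:   ${monthly_cost(500, 100):,.0f}")        # $15,000
print(f"Opus 4.7 worst case: ${monthly_cost(500, 100, 1.35):,.0f}")  # $20,250

A potential 35% swing on a five-figure monthly bill is exactly the kind of line item the migration guide is meant to help teams anticipate.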
Competition, Access Inequality, and the Mythos Ceiling
The release positions Opus 4.7 against OpenAI’s GPT-5.4 and Google’s Gemini 3.1 Pro. According to The Next Web, Opus 4.7 leads on SWE-bench Pro with a score of 64.3% against GPT-5.4’s 57.7%. Gemini 3.1 Pro is priced lower per token, which will matter to cost-sensitive teams even where benchmark numbers favor Anthropic’s offering.
The more structurally significant context is Anthropic’s two-tier model strategy. As The Verge reported, Anthropic acknowledged in Opus 4.7’s system card that the model does not advance the company’s capability frontier, since Mythos Preview scored higher on every relevant evaluation. Mythos Preview remains restricted to select organizations — including, per 9to5Mac, key software platform vendors such as Apple — while the broader developer market uses Opus 4.7.
Anthropic frames the split as safety-driven. The company stated it intends to keep Mythos Preview’s rollout limited and use Opus 4.7 as a testbed for new cybersecurity safeguards before wider deployment. Security professionals seeking early access for legitimate offensive and defensive work can apply through the Cyber Verification Program; Anthropic’s Project Glasswing outlines its public framework for weighing AI risks and benefits in cybersecurity. Independent evaluators can benchmark the model’s performance on economically valuable knowledge work through the third-party tool GDPval-AA.
Anthropic has also established a roughly bimonthly cadence for Opus upgrades, per 9to5Mac — a pace that signals pipeline confidence but also accelerates the question of how quickly the publicly accessible model tier falls behind the restricted one.
What Comes Next and What Remains Unresolved
Partner endorsements are broad and consistent across sectors. Replit called the upgrade an easy decision. Warp described it as a meaningful step up. Vercel reported a solid upgrade with no regressions. Bolt confirmed gains of up to 10% on longer-running app-building work without the regressions typically associated with highly agentic models. Hebbia reported a double-digit jump in accuracy of tool calls and planning in its core orchestrator agents.
Quantium called Opus 4.7 “the best model in the world for building dashboards and data-rich interfaces” and “the most capable model we’ve tested.” A financial technology platform in early testing said it saw “the potential for a significant leap” for its developers. Qodo said the model “feels like a real step up in intelligence.” Ramp noted that Opus 4.7 stands out specifically in agent-team workflows, where coordination between multiple model instances matters most.
The open questions are structural. How long will Anthropic maintain a meaningful performance gap between Opus 4.7 and Mythos Preview, and at what point does that gap become a competitive disadvantage for enterprises that cannot qualify for restricted access? The tokenizer change quietly raises total cost of ownership for high-volume deployments, even as per-task efficiency improves. And as Anthropic puts it, Opus 4.7 “extends the limit of what models can do to investigate and get tasks done” — which still implies a limit, and that limit sits below what the company’s own internal evaluations can achieve.
Frequently Asked Questions
How will the new xhigh effort level in Claude Opus 4.7 impact my API costs?
The xhigh effort level is expected to increase costs by around 15–20% relative to the high effort level, though the exact impact varies with use case and task complexity. For example, a task that costs $1.00 at high effort would run roughly $1.15–$1.20 at xhigh. To manage the tradeoff, developers can set task budgets and reserve xhigh for problems that need the extra reasoning depth. Anthropic’s migration guide covers cost planning in more detail.
Can I fine-tune Claude Opus 4.7 for my specific industry or task?
Anthropic has historically offered fine-tuning for select Claude models through partner platforms such as Amazon Bedrock rather than self-serve through its own API; check current documentation for whether Opus 4.7 is eligible. Where available, fine-tuning can be particularly useful for domains like finance and law, where specialized knowledge and terminology are crucial, and can improve the model’s accuracy and relevance for specific use cases.
How does Claude Opus 4.7 compare to other state-of-the-art AI models in terms of multimodal understanding?
Claude Opus 4.7 demonstrates significant improvements in multimodal understanding, putting it in the same tier as GPT-5.4 and Gemini 3.1 Pro. Its ability to interpret complex technical diagrams and chemical structures has drawn particular praise from early adopters. That said, a comprehensive head-to-head comparison is still needed to fully assess its relative strengths and weaknesses.
Published on April 16, 2026, by Laszlo Szabo / NowadAIs. Last updated April 16, 2026, 7:25 pm.

