Meta Muse Spark AI Model Benchmarks: Competitive but Not Leading

A stylized visualization of the Meta Muse Spark AI model benchmarks, suggesting a significant but competitive performance.

On April 8, 2026, Meta launched Muse Spark, its first proprietary AI model built entirely from scratch by the newly formed Meta Superintelligence Labs. The release came almost exactly 10 months after Mark Zuckerberg overhauled Meta’s AI operations and installed 29-year-old Alexandr Wang, former co-founder and CEO of Scale AI, as the division’s head. Meta’s stock rose 9% on the day of the announcement.

Meta Muse Spark AI Model Benchmarks: What the Numbers Actually Say

Benchmarks list of Meta's Muse Spark

On the Artificial Analysis Intelligence Index, Muse Spark scores 52, nearly triple the company’s previous efforts and close to Google’s Gemini 3.1 Pro Preview, which scores 57. Meta claims the model required 58 million output tokens to complete the full Intelligence Index run, a measure of computational intensity that independent auditing firm Artificial Analysis tracked.
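To put the two Intelligence Index figures side by side, here is a minimal Python sketch of the arithmetic. It uses only the two scores quoted above. The function name `gap_and_ratio` and the dictionary layout are illustrative assumptions, not anything published by Meta or Artificial Analysis.

```python
# Intelligence Index scores as quoted in this article.
index_scores = {
    "Muse Spark": 52,
    "Gemini 3.1 Pro Preview": 57,
}


def gap_and_ratio(model, reference, scores):
    """Return (point gap to the reference, model score as a fraction of the reference)."""
    gap = scores[reference] - scores[model]
    ratio = scores[model] / scores[reference]
    return gap, ratio


gap, ratio = gap_and_ratio("Muse Spark", "Gemini 3.1 Pro Preview", index_scores)
print(f"Gap: {gap} points; Muse Spark reaches {ratio:.1%} of the reference score")
```

On these numbers the gap is 5 index points, which puts Muse Spark at roughly 91% of Gemini 3.1 Pro Preview's score.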

The model posts strong results in PhD-level reasoning: 89.5 on GPQA Diamond and 86.4 on CharXiv Reasoning. It scores 80.4 on MMMU Pro and 71.3 on Visual Factuality (SimpleVQA). On the notoriously difficult Humanity’s Last Exam, Muse Spark reaches 58%, while FrontierScience Research comes in at 38%.

Where the numbers soften: ARC AGI 2 lands at 42.5, a score Meta’s own data shows trailing both GPT-5.4 and Gemini 3.1 Pro Preview by a visible margin. CritPT, the physics research benchmark, sits at just 11%. A Meta executive told Axios directly that Muse Spark does not mark a new state of the art.

What Muse Spark Can Do, and Where It Still Falls Short

Introduction page of Muse Spark

Meta describes Muse Spark as a “natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration.” The model includes a “contemplating mode” that can orchestrate multiple agents simultaneously, and Meta says it delivers improved health responses, a domain where 1,000 physicians collaborated on curating training data.

Doris Xin, CEO of AI startup Disarray, told CNBC that based on the published benchmarks, Muse Spark appears to excel specifically in image and video processing. Planned use cases include Shopping Mode, Health Reasoning, and Interactive UI integrations across Meta’s apps. According to Mashable, Zuckerberg confirmed “Muse Spark now powers an updated version of Meta AI, which users can access online at meta.ai or in the Meta AI app,” with a rollout to Facebook, Instagram, and WhatsApp planned next.

The gaps are real and Meta does not hide them. The company acknowledges that Muse Spark’s ability to act across long-horizon software and office workflows is still being refined. Coding workflows remain a weak point, and Gizmodo noted the model is not yet challenging for the top position in most benchmark categories.

The Ecosystem Bet Muse Spark Is Quietly Upending

For context, the Llama family, released in 2023, reached 100 million downloads by Q3 of that year and accumulated 1.2 billion downloads across the ecosystem by early 2026. Developers described Llama as the LAMP stack of AI: foundational infrastructure that others built on top of. Self-hosting Llama models offered up to 88% cost reduction compared to proprietary API providers, making it indispensable for cost-sensitive deployments.

That open-source goodwill is now at stake. Meta’s decision to launch Muse Spark as a proprietary model, even while Axios reports an open-source version is planned, puts it in direct competition with the same developer community that built its ecosystem. US deployments account for 35% of global Llama usage, but by late 2025, Chinese models from Alibaba, DeepSeek, and Zhipu AI had grown to 41% of downloads on platforms like Hugging Face, compressing Meta’s dominance from below.

The financial logic is blunt. Meta reaches 3 billion people through its apps and describes a “27 billion brain budget,” the scale of AI inference required to power those interactions. Business Insider reports Meta invested $14 billion into Scale AI as part of the broader overhaul. Meta Superintelligence Labs describes Muse Spark as the first model in the Muse family aimed at realizing “superintelligence for personal use” (a digital extension of the self). The company now needs that model to translate the spend into a revenue line that open-source Llama never provided.

Wang framed the internal transformation in a post on X: “Nine months ago we rebuilt our AI stack from scratch. New infrastructure, new architecture, new data pipelines… This is step one. Bigger models are already in development with plans to open-source future versions.” Meta’s own announcement called it “the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts.”

Open Questions for Developers and Decision-Makers

The benchmark story is only partially written. Muse Spark’s performance in long-horizon agent tasks and complex coding workflows remains unproven at production scale, and it is exactly those workflows (enterprise automation, software development pipelines) where OpenAI and Anthropic’s Claude Opus 4.6 currently hold enterprise contracts.

For the 1.2 billion-download Llama ecosystem, the path forward is unclear. Llama 4 debuted to mixed reviews in 2025, and it is uncertain whether future Llama versions will continue at the same pace or play second fiddle to the proprietary Muse family. Developers who built cost structures around Llama’s 88% API savings have no direct replacement if the proprietary tier scales up.

The monetization question, raised bluntly by CNBC, has no clean answer yet. Wang called Muse Spark “the most powerful model that Meta has released,” but that bar was not especially high before this week. Whether the model can convert its multimodal strengths into paid enterprise or consumer products, and whether it can close the ARC AGI 2 gap against Gemini and GPT-5.4, will determine whether the $14 billion bet reads as a foundation or a sunk cost.

FAQ – Frequently Asked Questions

How will Muse Spark’s proprietary nature affect the open-source Llama community?

Meta’s decision to launch Muse Spark as a proprietary model may lead to a divergence in the Llama community, with some developers continuing to support the open-source Llama models and others migrating to Muse Spark for its improved performance. This could result in a fragmented ecosystem, with different models being used for different applications. However, Meta has announced plans to release an open-source version of Muse Spark in the future, which may help to mitigate this effect.

What are the potential implications of Muse Spark’s limitations in coding workflows?

Muse Spark’s weaknesses in coding workflows may limit its adoption in certain industries, such as software development, where AI models are used to assist with coding tasks. However, Meta is likely working to address these limitations in future updates, and the model’s strengths in image and video processing make it a strong candidate for applications in other areas, such as computer vision and multimedia analysis.

How will the rollout of Muse Spark to Facebook, Instagram, and WhatsApp change the user experience?

The integration of Muse Spark into Meta’s apps is expected to bring significant improvements to features such as content generation, image and video processing, and conversational AI. Users can expect to see more sophisticated and accurate AI-powered features, such as improved chatbots and more realistic image generation. The rollout is likely to be gradual, with some features being introduced in the coming weeks and months.

Laszlo Szabo / NowadAIs

Laszlo Szabo is an AI technology analyst with 6+ years covering artificial intelligence developments, specializing in large language models, ML benchmarking, and AI industry analysis.
