Date: 2025-04-06

In what Meta CEO Mark Zuckerberg called a “big upgrade,” Meta today unveiled its Llama 4 series of AI models, showcasing models that span extremes. Zuckerberg framed the launch with a clear objective: “Our goal is to build the world’s leading AI, open source it, and make it universally accessible so that everyone in the world benefits.” At one end is the openly available Llama 4 Scout, boasting what Zuckerberg calls an “industry-leading, nearly infinite, 10 million token context length” capable of analyzing the equivalent of 15,000 pages in one go. For comparison, OpenAI’s GPT-4o has a context window of 128,000 tokens.
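
To put those context figures side by side, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the function names are illustrative, and the tokens-per-page value is simply what the 15,000-page claim implies, not a figure Meta has published.

```python
# Figures quoted in the article.
SCOUT_CONTEXT_TOKENS = 10_000_000
GPT4O_CONTEXT_TOKENS = 128_000
PAGES_CLAIMED = 15_000


def tokens_per_page(context_tokens: int, pages: int) -> float:
    """Average tokens per page implied by a context size and a page count."""
    return context_tokens / pages


def context_ratio(larger: int, smaller: int) -> float:
    """How many times larger one context window is than another."""
    return larger / smaller


if __name__ == "__main__":
    per_page = tokens_per_page(SCOUT_CONTEXT_TOKENS, PAGES_CLAIMED)
    ratio = context_ratio(SCOUT_CONTEXT_TOKENS, GPT4O_CONTEXT_TOKENS)
    print(f"Implied tokens per page: {per_page:.0f}")
    print(f"Scout context vs. GPT-4o context: {ratio:.1f}x")
```

The implied density of several hundred tokens per page is plausible for dense text, and the ratio shows Scout's stated window is well over an order of magnitude larger than GPT-4o's.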

A reasoning model is in the works. “The Llama 4 Reasoning model is being developed as a specialized model focused on enhanced logical capabilities and problem-solving,” Zuckerberg said.

Meta is also previewing the roughly 2 trillion parameter Llama 4 Behemoth, positioned as a state-of-the-art “teacher” model. Zuckerberg teased it, stating, “This thing is massive – more than 2 trillion parameters. I’m not aware of anyone training a larger model out there. It is already the highest-performing base model in the world, and it is not even done training yet.” While Meta’s own announcements emphasize favorable, albeit sometimes selective, comparisons against rivals like GPT-4o, independent validation is quickly emerging: the other key open model, Llama 4 Maverick, has already debuted at the second spot on the LMArena Chatbot Arena leaderboard. Zuckerberg had previously expressed his belief that “open source AI is going to produce the leading models, and with Llama 4, this is starting to happen.”

Meta achieved its context window breakthroughs, particularly in Scout, through its “iRoPE architecture.” This approach interleaves attention layers that use no positional embeddings with layers that do, and it applies inference-time temperature scaling to attention to improve length generalization. The “i” in iRoPE stands for “interleaved” attention layers and also points to Meta’s long-term goal of supporting potentially “infinite” context length, in line with Zuckerberg’s description of Scout’s “nearly infinite, 10 million token context length.”
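
The two ideas in that description can be illustrated with a small Python sketch. This is not Meta's implementation: the article gives neither the interleaving ratio nor the scaling formula, so `nope_every`, `base_len`, and the logarithmic schedule in `length_scaled_temperature` are hypothetical placeholders chosen only to show the mechanism.

```python
import math
from typing import List


def layer_uses_rope(layer_idx: int, nope_every: int = 4) -> bool:
    """Hypothetical interleaving rule: every `nope_every`-th layer
    skips positional embeddings; the remaining layers use them."""
    return (layer_idx + 1) % nope_every != 0


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over attention scores.
    A temperature below 1 sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def length_scaled_temperature(seq_len: int, base_len: int = 8192) -> float:
    """Hypothetical schedule: keep temperature at 1 up to `base_len`,
    then lower it logarithmically as the sequence grows."""
    if seq_len <= base_len:
        return 1.0
    return 1.0 / (1.0 + math.log(seq_len / base_len))


if __name__ == "__main__":
    # Which of the first eight layers would use positional embeddings.
    print([layer_uses_rope(i) for i in range(8)])

    # One relevant key (score 2.0) among many distractors (score 0.0).
    n = 100_000
    scores = [2.0] + [0.0] * (n - 1)
    plain = softmax(scores)[0]
    sharpened = softmax(scores, length_scaled_temperature(n))[0]
    print(f"weight on relevant key, T=1:      {plain:.6f}")
    print(f"weight on relevant key, scaled T: {sharpened:.6f}")
```

The point of the demonstration is that with a fixed temperature, the weight on a single relevant token is diluted as the number of competing tokens grows; lowering the temperature at inference time counteracts some of that dilution.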

Parameter efficiency through mixture-of-experts

Both released Llama 4 models utilize a Mixture-of-Experts (MoE) approach, a key strategy for achieving high performance with computational efficiency. Llama 4 Maverick, for example, has 17 billion active parameters used per token, but draws from a vast pool of 400 billion total parameters spread across 128 experts plus a shared expert. As Zuckerberg noted for Maverick: “This one is 17 billion parameters with 128 experts.” In this architecture, each token activates only a fraction of the total parameters, improving inference efficiency and lowering latency compared to dense models of similar total size, as the company noted in an announcement.
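
A toy version of this routing can be sketched in Python as follows. The article states only that Maverick has 128 routed experts plus a shared expert; the top-1 routing, the sigmoid gate, the tiny hidden size, and the class name `ToyMoELayer` are all assumptions made for illustration, not details of Meta's architecture.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def make_expert(rng: random.Random, dim: int) -> Callable[[Vector], Vector]:
    """A toy expert: a single random dim x dim linear map."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


class ToyMoELayer:
    """Shared expert runs for every token; one routed expert is picked per token."""

    def __init__(self, num_experts: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.router = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
        self.experts = [make_expert(rng, dim) for _ in range(num_experts)]
        self.shared = make_expert(rng, dim)
        self.calls = [0] * num_experts  # how often each routed expert ran

    def forward(self, x: Vector) -> Tuple[Vector, int]:
        # Router scores: one logit per routed expert.
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        best = max(range(len(logits)), key=logits.__getitem__)
        gate = 1.0 / (1.0 + math.exp(-logits[best]))  # assumed sigmoid gate
        self.calls[best] += 1
        routed = self.experts[best](x)
        shared = self.shared(x)
        return [s + gate * r for s, r in zip(shared, routed)], best


if __name__ == "__main__":
    layer = ToyMoELayer(num_experts=128, dim=4)
    output, chosen = layer.forward([1.0, 0.5, -0.3, 2.0])
    print(f"routed expert index: {chosen}")
    print(f"routed experts executed for this token: {sum(layer.calls)} of 128")
```

Only the shared expert and one routed expert do any work for a given token, which is why the active parameter count stays far below the total even as the number of experts grows.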

Maverick and Scout: distinct design philosophies

Maverick’s high leaderboard ranking reflects Meta’s positioning of it as the flagship open model, designed as a powerful all-rounder competitive with top-tier proprietary systems. Zuckerberg dubbed it “the workhorse” and claimed, “It beats GPT-4o and Gemini 2.0 Flash on all benchmarks.” He added specific technical details: “It is smaller and more efficient than DeepSeek V3 but is still comparable on text.” Despite activating only 17 billion parameters per token via its 128-expert MoE design (out of 400B total), Meta claims Maverick outperforms key rivals on a range of coding, reasoning, and multilingual benchmarks. It supports a still-massive 1 million token context window and, like Scout, is “natively multi-modal,” capable of processing up to eight images alongside text for complex vision-language tasks. Zuckerberg emphasized its deployment ease: “designed to run on a single host for easy inference. This thing is a beast.”

Scout, meanwhile, prioritizes extreme context and efficiency. Its 16-expert MoE structure (17B active/109B total), detailed by Zuckerberg as “17 billion parameters with 16 experts,” combined with the iRoPE architecture allows it to handle its industry-first 10M token context. Zuckerberg highlighted its speed and accessibility: “It is extremely fast… and it is designed to run on a single GPU.” This makes it ideal for processing vast documents or codebases on limited hardware. He concluded Scout is “by far the highest-performing small model in its class.”
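
The active-versus-total figures quoted for the two models can be compared directly. The short Python sketch below uses only the parameter counts given in the article; the dictionary layout and function name are illustrative.

```python
# (active parameters in billions, total parameters in billions, routed experts)
MODELS = {
    "Llama 4 Scout": (17, 109, 16),
    "Llama 4 Maverick": (17, 400, 128),
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that are used for any single token."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (active_b, total_b, experts) in MODELS.items():
        share = active_fraction(active_b, total_b)
        print(f"{name}: {experts} experts, {share:.2%} of parameters active per token")
```

Both models spend the same 17 billion parameters per token, but Maverick draws them from a pool nearly four times larger, so a much smaller share of its weights is active at any one time.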

Native multimodality through early fusion

A core architectural differentiator for the Llama 4 series is its native multimodality, achieved through an “early fusion” strategy, which matches Zuckerberg’s description of both Scout and Maverick as “natively multi-modal.” Early fusion integrates text and vision processing into a unified backbone from the outset, in contrast to methods that append vision capabilities to predominantly text-trained models. By enabling joint pre-training across diverse datasets (text, images, and, significantly, video frames), Meta aims for a more fundamentally integrated cross-modal understanding.
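
Conceptually, early fusion means the backbone receives one sequence containing both image and text embeddings, rather than a text model with a vision module bolted on afterward. The Python sketch below illustrates only that idea; the embedding functions are toy stand-ins, and names such as `early_fusion` and the embedding size are hypothetical, not taken from Meta's code.

```python
from typing import List, Tuple

Vector = List[float]


def embed_text(tokens: List[str], dim: int) -> List[Vector]:
    """Toy text embedding: a deterministic vector per token (not a real embedding table)."""
    return [[float(len(tok) + i)] * dim for i, tok in enumerate(tokens)]


def embed_image(patches: List[List[float]], dim: int) -> List[Vector]:
    """Toy stand-in for a vision encoder: maps each patch to a dim-sized vector."""
    return [[sum(patch) / len(patch)] * dim for patch in patches]


def early_fusion(
    text_tokens: List[str],
    image_patches: List[List[float]],
    dim: int = 4,
) -> List[Tuple[str, Vector]]:
    """Build the single mixed sequence that a unified backbone would consume."""
    sequence = [("image", vec) for vec in embed_image(image_patches, dim)]
    sequence += [("text", vec) for vec in embed_text(text_tokens, dim)]
    return sequence


if __name__ == "__main__":
    patches = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    fused = early_fusion(["describe", "this"], patches)
    for modality, vector in fused:
        print(modality, vector)
```

Because every position in the fused sequence has the same dimensionality, the same attention layers can relate text tokens to image patches directly, which is the property the early-fusion design is meant to provide.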

The architecture incorporates an optimized vision encoder derived from MetaCLIP technology, reportedly tuned specifically for interaction with the language model components. Functionally, this translates to demonstrated capabilities such as handling multiple concurrent images (up to eight tested effectively) and sophisticated image grounding, particularly noted in Scout, which links textual queries precisely to visual regions. The ability to interpret temporal sequences from video frames further underscores the potential depth of this integrated approach. This represents a strategic architectural shift from prior Llama generations, potentially simplifying model development and enabling more nuanced multimodal applications than achievable with loosely coupled vision-text systems.

Raising the bar for open models

With the Llama 4 series, Meta significantly raises the bar for open-access AI models, delivering unprecedented context length, competitive performance through efficient architectures, and robust multimodal capabilities. Zuckerberg summarized the achievement: “Overall, Llama 4 is a milestone for Meta AI and for open source. For the first time, the best small, mid-sized, and potentially soon frontier models will be open source.” While Behemoth remains an internal benchmark for now, and another model, “Llama 4 Reasoning,” is expected next month, the availability of the powerful Maverick and ultra-long-context Scout provides developers with compelling new tools. As Zuckerberg concluded, “There’s a lot more to do, but the trajectory here is clear. We’ve got more model drops coming soon, so stay tuned out there.” These releases are poised to accelerate innovation across the AI landscape.

Users wanting to interact with the new capabilities can do so immediately, as Zuckerberg pointed out: “If you want to try Llama 4, you can use Meta AI in WhatsApp, Messenger, or Instagram Direct, or you can go to our website at meta.ai.”