The Third Generation of Apple’s Foundation Models and AFM Core Advanced

I just came back to my hotel room after a long day at Apple Park (I documented most of it in my Instagram stories, including a very cool shot), and, like everyone else here in Cupertino, I’m still processing the information overload from the past 12 hours. The MacStories team already covered iOS and iPadOS 27, plus Siri AI and Apple Intelligence, and we have more coming tomorrow.

Before I call it a day though, I wanted to link the first thing I read on my way back: Apple’s latest article on the Machine Learning blog about the new Apple Foundation Models that were announced today – three cloud-based models, and two on-device ones.

At the heart of this architecture is our third generation of Apple Foundation Models (AFM), a family of five foundation models custom-built in collaboration with Google. These span from on-device models to server-based models running on Private Cloud Compute.

The cloud models hosted on Private Cloud Compute are interesting: they’re not “Gemini” models; they’re Apple Foundation Models trained using proprietary data via RL, which were then “refined” with data from Google’s “frontier” models. I’d love to know the details, but in reading between the lines, I’d guess that AFM Cloud is based on Gemini 3.1 Flash-Lite, AFM Cloud Pro on Gemini 3.5 Flash, and ADM Cloud on Nano Banana Pro (Gemini 3 Pro Image).

And yet, at the moment I’m much more interested in the second local model that Apple announced earlier today: AFM Core Advanced. From Apple’s blog:

AFM 3 Core Advanced, our most powerful on-device model. It’s natively multimodal, enabling helpful features like expressive voices and higher-accuracy dictation. Built on cutting-edge Apple research, this 20-billion-parameter model uses a sparse architecture, activating just 1 to 4 billion parameters at a time depending on the request. AFM 3 Core Advanced is unlocked by and optimized for our most capable Apple silicon systems.
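To put those parameter counts in perspective, here is a rough back-of-the-envelope sketch of weight storage. The 20-billion total and the 1-to-4-billion active range come from Apple's post; the bits-per-weight values are assumptions for illustration only, since the quoted text doesn't state what precision the model ships at, and the helper `weights_gb` is a hypothetical name.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes).

    Ignores activations, KV cache, and any runtime overhead.
    """
    return params_billions * bits_per_weight / 8


# Assumed precisions (not stated in Apple's post): 16-bit and 4-bit weights.
for bits in (16, 4):
    full = weights_gb(20, bits)        # all 20B parameters
    low = weights_gb(1, bits)          # 1B active parameters
    high = weights_gb(4, bits)         # 4B active parameters
    print(f"{bits}-bit: full model {full:.1f} GB, active set {low:.1f} to {high:.1f} GB")
```

Under either assumed precision, the active set is a fraction of the full model, which is the gap a sparse design has to exploit.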

AFM Core Advanced won’t be available on every device that can otherwise run Apple Intelligence with the smaller AFM Core on-device model, which is what you’d expect with a jump from a 3B model to a 20B one. However, running a 20B multimodal model on phones with 12 GB of RAM is no joke. As it turns out, it’s been made possible by a technology Apple invented for sparse-architecture models, in which the model is stored in flash memory and the activated parameters are locked in per prompt:

One area of deep innovation is our most powerful on-device model, AFM 3 Core Advanced. Traditional large language models—whether dense or sparsely activated—require all weights to reside in active memory (DRAM), creating a massive footprint that limits scalability on consumer hardware. To break this barrier, AFM 3 Core Advanced introduces a novel sparsely activated architecture built on Instruction-Following Pruning (IFP), a technique developed by Apple researchers (see Figure 1).

Instead of forcing the entire model into DRAM, the full model is stored in flash memory (NAND). Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt. A lightweight, dense block selects a fixed set of experts during initial processing, periodically reselecting them during generation. To minimize data movement, the model relies on a high percentage of always-active “shared experts” alongside input-dependent “routed experts” swapped into DRAM only when needed.
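The per-prompt routing idea described above can be illustrated with a toy sketch. This is not Apple's implementation: the expert counts, the dot-product scoring, and every name here (`FLASH`, `SHARED`, `select_experts`, `Session`) are hypothetical stand-ins. It only shows the general shape of the technique, where routed experts are chosen once per prompt, copied from "flash" into "DRAM", and reused for every generated token alongside always-resident shared experts.

```python
import random

NUM_ROUTED = 16   # hypothetical number of routed experts stored in flash
TOP_K = 2         # hypothetical number of routed experts loaded per prompt
DIM = 8           # toy vector size


def make_expert(seed):
    """Create a toy expert: just a fixed random weight vector."""
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(DIM)]


# "Flash" holds every routed expert; shared experts are always resident.
FLASH = {i: make_expert(i) for i in range(NUM_ROUTED)}
SHARED = [make_expert(1000 + i) for i in range(2)]


def select_experts(embedding, k=TOP_K):
    """Stand-in for the lightweight dense block: score experts once per prompt."""
    scores = {i: sum(p * w for p, w in zip(embedding, FLASH[i])) for i in FLASH}
    return sorted(scores, key=scores.get, reverse=True)[:k]


class Session:
    def __init__(self, prompt_embedding):
        self.dram = {}    # routed experts currently loaded
        self.loads = 0    # number of flash-to-DRAM copies so far
        self.route(prompt_embedding)

    def route(self, embedding):
        """Pick experts for this prompt; load only the ones not already in DRAM."""
        chosen = select_experts(embedding)
        for i in list(self.dram):
            if i not in chosen:
                del self.dram[i]          # evict experts no longer selected
        for i in chosen:
            if i not in self.dram:
                self.dram[i] = FLASH[i]   # simulated flash-to-DRAM copy
                self.loads += 1

    def step(self, token_vec):
        """Process one token with shared plus loaded experts; no routing here."""
        active = SHARED + list(self.dram.values())
        return sum(sum(t * w for t, w in zip(token_vec, e)) for e in active)


rng = random.Random(0)
prompt = [rng.uniform(-1, 1) for _ in range(DIM)]
session = Session(prompt)
for _ in range(50):
    session.step([rng.uniform(-1, 1) for _ in range(DIM)])
print("flash loads after 50 tokens:", session.loads)
```

The point of the sketch is the load counter: it stays at `TOP_K` no matter how many tokens are generated, whereas token-level routing could trigger a fresh flash read at every step. Calling `route` again partway through generation would model the periodic reselection the post mentions.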

Like the server models, AFM Core Advanced was also introduced as an “Apple and Google collaboration”. (If I were to guess again, probably more Gemini distillation for an AFM model Apple had been working on for some time? Their paper on instruction-following pruning was originally submitted in January 2025.)

I’ve been covering the AFM family of models for a while now, and AFM Core Advanced is one of the most interesting on-device models I’ve read about recently, especially in the context of model size for mobile devices and built-in multimodality with support for text, images, and audio. I’m very keen to play around with this model and understand how it holds up in practice. I wonder if the new CLI (!) for AFM will let you test this one.


You can follow all of our WWDC coverage through our WWDC 2026 hub or subscribe to the dedicated WWDC 2026 RSS feed.
