Trying to Make Sense of the Rumored, Gemini-Powered Siri Overhaul

Quite the scoop from Mark Gurman yesterday on what Apple is planning for major Siri improvements in 2026:

Apple Inc. is planning to pay about $1 billion a year for an ultrapowerful 1.2 trillion parameter artificial intelligence model developed by Alphabet Inc.’s Google that would help run its long-promised overhaul of the Siri voice assistant, according to people with knowledge of the matter.

There is a lot to unpack here, and I have plenty of questions.

First, let’s backtrack a little. Gurman previously reported that Apple was targeting iOS 26.4 in spring 2026 for the Siri features delayed earlier this year, which also lines up with Apple’s public promise of “next year”. Those features covered on-screen awareness, in-app actions, and personal context based on App Intents; Apple announced them at WWDC 2024 and never shipped them (in the meantime, third-party developers have been updating their apps with App Intents in preparation for Apple Intelligence).
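
For readers who haven’t played with the framework, an App Intent is just a small Swift type that exposes an action from an app to the system. Here’s a minimal, hypothetical example; the intent’s name and behavior are made up for illustration, but the AppIntents protocol requirements are real:

```swift
import AppIntents

// A minimal, hypothetical App Intent of the kind developers have been shipping
// in preparation for Apple Intelligence. The intent name and behavior are
// invented for illustration.
struct OpenLatestDraftIntent: AppIntent {
    static var title: LocalizedStringResource = "Open Latest Draft"
    static var description = IntentDescription("Opens the most recently edited draft in the app.")

    // Siri, Shortcuts, or (eventually) Apple Intelligence calls perform() when the intent runs.
    func perform() async throws -> some IntentResult {
        // App-specific logic would go here.
        return .result()
    }
}
```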

In September, however, Gurman also reported that Apple was looking to launch a “world knowledge” version of Siri infused with web search capabilities to “rival OpenAI and Perplexity”. He wrote:

The company is working on a new system – dubbed internally as World Knowledge Answers – that will be integrated into the Siri voice assistant, according to people with knowledge of the matter. Apple has discussed also eventually adding the technology to its Safari web browser and Spotlight, which is used to search from the iPhone home screen.

And:

Apple’s new search experience will include an interface that makes use of text, photos, video and local points of interest, according to the people. It also will offer an AI-powered summarization system designed to make results more quickly digestible and more accurate than what’s offered by the current Siri.

In that September report, Gurman also said that those features were on track for a spring 2026 launch, therefore joining the other previously delayed Apple Intelligence features in the rumored Siri overhaul.

Back to yesterday’s report:

Under the arrangement, Google’s Gemini model will handle Siri’s summarizer and planner functions – the components that help the voice assistant synthesize information and decide how to execute complex tasks. Some Siri features will continue to use Apple’s in-house models.

I have no idea what Gurman means by “planner” and “complex tasks”. Surely we’re not looking at a reasoning model that thinks longer after a user query to plan each step of an answer, right? Apple appears to be testing a traditional “LLM Siri” chatbot app, but it sounds like it’s for internal testing only and won’t be released anytime soon. That is, in my opinion, a mistake: people clearly like chatbots, especially the ones that work.

Let me try to make sense of everything we’ve heard about this rumored Siri overhaul and Google partnership. For starters, this custom Gemini model hosted on Private Cloud Compute is not going to have all 1.2T parameters active at once; just like the Gemini 2.5 family of models, Google and Apple have likely gone with a mixture-of-experts architecture that activates only a subset of parameters per token to keep inference fast. This is the direction the industry has been going for the past year, including some fascinating research that Perplexity published this week on running large MoE models. For context, Kimi K2 (the model I recently covered here) is a 1T parameter model with 384 experts; only 32B parameters are activated at inference, with 8 experts selected per token. Put simply: without a MoE architecture, this Apple/Google model would be too costly for Apple to run at scale, and it would be slow for users.
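
To make the sparse-activation idea concrete, here’s a toy sketch of top-k expert routing. The numbers mirror the Kimi K2 figures above purely as an example; the types and scoring are invented for illustration, and nothing about the actual Apple/Google model’s routing is public.

```swift
// Toy mixture-of-experts routing: out of `expertCount` experts, only the top
// `activePerToken` (ranked by router score) run for a given token, so per-token
// compute tracks the handful of active experts, not the full parameter count.
struct MoERouter {
    let expertCount: Int
    let activePerToken: Int

    // Return the indices of the highest-scoring experts for one token.
    func selectExperts(routerScores: [Double]) -> [Int] {
        precondition(routerScores.count == expertCount)
        return routerScores.enumerated()
            .sorted { $0.element > $1.element }
            .prefix(activePerToken)
            .map { $0.offset }
    }
}

// Using the Kimi K2 numbers cited above: 384 experts, 8 active per token.
let router = MoERouter(expertCount: 384, activePerToken: 8)
let scores = (0..<384).map { _ in Double.random(in: 0...1) }
print(router.selectExperts(routerScores: scores)) // e.g. [212, 47, 305, ...]
```

The point is simply that only 8 of the 384 experts are touched for each token, which is how a model with over a trillion total parameters can still respond at interactive speeds.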

That said, a 1.2T model is a big model, which squares nicely with the idea that Apple is building an “answer engine” inside Siri. At that size, the model would have a lot of knowledge built in, and since it’d be Gemini, it would be able to reliably call Google Search to look up more specific questions or gather additional context from the web. The picture is a little clearer, then: Apple found a partner – with whom they have an existing deal – that can provide a large, custom model with state-of-the-art performance, pre-trained knowledge, and direct integration with web search. The last item is key here, I think. If Apple wanted to build an answer engine, Anthropic couldn’t be a candidate since they don’t have their own search index (they rely on Brave Search instead); Perplexity has one, but the company doesn’t train models; Mistral is like a worse version of Claude, and doesn’t have its own search index either. A Gemini model, without the Gemini brand, at 1.2 trillion parameters, running on Private Cloud Compute under Apple’s control for $1 billion/year sounds like a crazy good deal to me. The kind of deal a certain Ferrari board member would make.

But back to the model: which one is it? Gemini 3.0 is launching (very?) soon, and one would assume that, given the spring 2026 timing, a “Gemini 3.0 Flash” would exist by then. I could also see a scenario in which we never officially learn which version of Gemini this model is based on, if we hear anything from Apple about this partnership at all. But let’s assume it’s going to be Gemini 3.0 Flash, with the ability to provide answers, look up web results, summarize them, and support multiple modalities with text and images: what about the other delayed Siri features? Is Gemini going to be the model that steers Siri toward one App Intent instead of another when trying to make sense of different app domains in the semantic index that Apple announced in 2024? Is that architecture still in place? At the time, Apple’s vague wording suggested that a mix of local and cloud models would be in charge of parsing an on-device semantic index, depending on the complexity of the task. Is that what Gurman’s “planner function” is about?
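
For what it’s worth, here’s a purely speculative sketch of the kind of “planner” step that wording could describe: candidate App Intents pulled from an on-device semantic index get scored against the user’s request, and the best match wins. Every type and function below is invented for illustration; nothing about Apple’s actual pipeline is public.

```swift
// Speculative sketch: "planning" reduced to nearest-neighbor retrieval over
// App Intent embeddings stored in a hypothetical on-device semantic index.
struct IndexedIntent {
    let appName: String
    let intentName: String
    let embedding: [Double]
}

// Plain cosine similarity between two embedding vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    var dot = 0.0, normA = 0.0, normB = 0.0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

// Pick the intent whose embedding is closest to the embedded user request.
// A real planner would presumably also fill in parameters and decide whether
// to chain several intents together, which is where a larger model comes in.
func pickIntent(for queryEmbedding: [Double], from index: [IndexedIntent]) -> IndexedIntent? {
    index.max {
        cosineSimilarity($0.embedding, queryEmbedding) < cosineSimilarity($1.embedding, queryEmbedding)
    }
}
```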

I’d also point out another advantage of running Gemini on Private Cloud Compute: it’s natively multimodal, and it’s very good at image recognition, OCR, and audio transcription (which I started using months ago, and recently implemented as a skill in Claude). Gemini’s vision capabilities could probably come in handy for the on-screen awareness features that Siri was supposed to receive before Apple delayed them. Keep in mind that whenever you use Visual Intelligence and ask a question, the image has to go through ChatGPT; a Gemini-powered Siri would be able to answer on its own, without the ChatGPT extension.
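
As a rough illustration of what a multimodal request looks like, here’s a sketch that sends an image and a question to Google’s public Gemini REST API from Swift. The endpoint shape follows Google’s documented generativelanguage API, but the model name, error handling, and response parsing are simplified assumptions, and none of this reflects how Apple would actually wire Gemini into Private Cloud Compute.

```swift
import Foundation

// Hypothetical sketch: ask Gemini a question about a JPEG via the public REST API.
// Assumes the generativelanguage.googleapis.com endpoint and a model name that
// may differ from whatever Apple ends up running on Private Cloud Compute.
func askGeminiAboutImage(jpegData: Data, question: String, apiKey: String) async throws -> String {
    let url = URL(string:
        "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=\(apiKey)")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    // Gemini accepts interleaved text and inline image parts in a single request.
    let body: [String: Any] = [
        "contents": [[
            "parts": [
                ["text": question],
                ["inline_data": ["mime_type": "image/jpeg", "data": jpegData.base64EncodedString()]]
            ]
        ]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)

    // Pull the first text part out of the first candidate; real code would
    // decode a proper response model and handle errors.
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let candidates = json?["candidates"] as? [[String: Any]]
    let content = candidates?.first?["content"] as? [String: Any]
    let parts = content?["parts"] as? [[String: Any]]
    return parts?.first?["text"] as? String ?? ""
}
```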

Lastly, here’s a fascinating tidbit from the story: according to Gurman, the Apple Intelligence Foundation model running in the cloud today is a 150B one.

The custom Gemini system represents a major advance from the 150 billion parameter model used today for the cloud-based version of Apple Intelligence. The move would vastly expand the system’s power and its ability to process complex data and understand context.

I’ve long wondered about the size of that model, and it’s nice to finally have a number, albeit an unofficial one.

If everything Gurman has reported so far is accurate – and I have no reason to believe it isn’t – I still have many questions about the details of this future Siri, but at least I think I get what Apple is preparing. In iOS 26.4, Apple may unveil a much smarter Siri, powered by Gemini behind the scenes, that can answer a lot of questions on its own, search the web, summarize text and understand images, and maybe even power the long-delayed app integrations of Apple Intelligence. If these rumors are correct, it sounds like Siri’s new leadership is making all the right moves.
