Posts tagged with "AI"

Claude Mythos Preview Will Only Secure Part of the Internet

Yesterday, Anthropic announced Claude Mythos Preview, a new general-purpose model that it says is exceptionally good at finding security vulnerabilities in code. In fact, the model is so good that Anthropic has decided not to release Mythos Preview to the general public. Instead, it’s being released to a select group of companies that control OSes and other critical software.

With Mythos Preview, Anthropic found thousands of vulnerabilities across every major OS and web browser, and highlighted three examples to illustrate their severity:

  • Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it;
  • It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem;
  • The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.

A lengthy Frontier Red Team report brings the receipts for security researchers with an in-depth look at what Mythos Preview uncovered and the step change that the new model represents over Opus 4.6:

For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

As part of a test, Mythos Preview also managed to escape its sandboxed environment, message the researcher conducting the test, and then, outside the parameters of the test, post about the exploit online.

The idea behind Project Glasswing, whose participants include Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, is to give these companies a head start on securing their systems before similar models emerge and are exploited for cyberattacks. If Mythos Preview’s capabilities are as Anthropic makes them out to be, this seems like the right approach. However, I do worry that with time, it could lead to a two-tier Internet where big tech companies operate in relative security thanks to tools like Mythos Preview, while those without access are left to swim with the sharks.


Roadtripping with ChatGPT Voice Mode

On Saturday, my wife Jennifer and I drove to Blowing Rock, a quaint little town in the Blue Ridge Mountains. We’d been there once before, but didn’t know the town well, so as we headed west I poked at the ChatGPT icon on my dashboard to give the app’s new CarPlay integration a try. I asked:

What activities would you recommend for a day trip to Blowing Rock, North Carolina?

What I got back was a short but good list of highlights including a hike, a visit to the Blowing Rock cliffside overlook, a few restaurants, a coffee shop, and some local shops. It was similar to a list of activities I’d looked up before we left using Claude. So far, so good.

I switched back to Apple Maps and was thinking I probably wouldn’t use ChatGPT in my car very often, but that it could come in handy for similar requests, when things got a little creepy. I explained to Jennifer that ChatGPT’s CarPlay feature was new, and I had been meaning to check it out all week. Then, just as I’d said I thought it had done a pretty good job, a voice interrupted. It was ChatGPT’s voice mode saying it was glad I liked it.

You see, just like a phone call doesn’t drop when you switch apps in CarPlay, neither does ChatGPT. I suppose I should have anticipated that the mic would remain live, but I didn’t. Nor did I notice the End button in the corner of the screen; I was driving, not studying the app’s UI.

I take it as a positive sign that I didn’t expect ChatGPT to follow me back to Apple Maps. I treat chatbots like I do any app. Give it some input, and you get an output. Close the app, and you’re done. It’s not my little robot buddy. It’s a tool like any other app.

Of course, that’s not how the voice modes of these chatbots are designed to work. Chats are meant to be an engaging back and forth. But having ChatGPT jump in on our one-on-one conversation while driving down the highway was too much. Suddenly, it felt like something else was in the car eavesdropping on us.

The experience was a good lesson in balancing utility with the social norms around AI tools. Useful as they can be in some situations, their developers need to be more mindful of user expectations and provide better cues about how they work to avoid uncomfortable surprises. The recommendations we got from ChatGPT were good, but I don’t expect it will get a second chance on our family road trips anytime soon.


OpenAI Bets Big on Building an Everything App

OpenAI is making a big bet. One as old as time – at least time as measured by the course of app history. Having abandoned Sora and SmutGPT, the company has put all of its chips on an everything app, raising $122 billion to build it and fund its other operations.

If you listen to AppStories, you know this is a topic that goes back to our earliest episodes. Everything apps, known more commonly these days as superapps, have beguiled companies big and small forever. The temptation of “what if we stuffed so much into our app that nobody would ever leave?” is hard to resist, but the attempts usually fail. Just ask Mark Zuckerberg.

OpenAI is up front about its ambitions:

As models become more capable, the limiting factor shifts from intelligence to usability. Users do not want disconnected tools. They want a single system that can understand intent, take action, and operate across applications, data, and workflows. Our superapp will bring together ChatGPT, Codex, browsing, and our broader agentic capabilities into one agent-first experience.

Maybe. Look, I think AI is one of the most significant innovations of my lifetime, but for my money, I also think this is a classic example of the mismatch between what users sometimes say they want and what companies want to hear.

However, I’m willing to entertain the idea that AI might be different. After all, it’s closer to a natural language OS than your typical productivity app in just enough ways that it may just work as a sort of super-layer that sits on top of “real” OSes like macOS, Windows, iOS, and Android.

Part of what OpenAI is imagining is straight out of the iOS playbook:

Our consumer scale becomes the front door for enterprise usage, as familiarity in daily life drives adoption at work.

I remember when my old law firm finally caved and swapped BlackBerrys for the iPhones its employees were demanding. So, it’s not unprecedented that consumer demand can drive enterprise adoption, but historically, it’s rare.

And, while I agree with OpenAI that “Moments like this do not come often,” its comparison of its product to electricity and highways strikes me as a bit much. Will the app that OpenAI is imagining be something that will fundamentally reshape your life or will it be just another thing that competes for your attention, like TikTok? That’s the $122 billion bet OpenAI is making, and based on my experience with everything apps, I’ll take the other side of that bet.

Permalink

First Look: Hands-On with Claude Code’s New Telegram and Discord Integrations

Late yesterday, Anthropic announced messaging support for Claude Code, allowing users to connect to a Claude Code session running on a Mac from a mobile device using Telegram and Discord bots. I spent a few hours playing with it last night, and despite being released as a research preview, the messaging integration is already very capable, but a little fiddly to set up.

Let’s take a look at what it can do.

Read more


A Developer’s Month with OpenAI’s Codex

An eye-opening story from Steve Troughton-Smith, who tested Codex for a month and ended up rewriting a bunch of his apps and shipping versions for Windows and Android:

I spent one month battle-testing Codex 5.3, the latest model from OpenAI, since I was already paying for the $20 ChatGPT Plus plan and already had access to it at no additional cost, with task after task. It didn’t just blow away my expectations, it showed me the world has changed: we’ve just undergone a permanent, irreversible abstraction level shift. I think it will be nigh-impossible to convince somebody who grows up with this stuff that they should ever drop down and write code the old way, like we do, akin to trying to convince the average Swift developer to use assembly language.

From his conclusion:

This story is unfinished; this feels like a first foray into what software development will look like for the rest of my life. Transitioning from the instrument player to the conductor of the orchestra. I can acknowledge that this is both incredibly exciting, and deeply terrifying.

I have perused the source code of some of these projects, especially during the first few days. But very quickly I learned there’s simply nothing gained from that. Code is trivial, implementations are ephemeral, and something like Codex can chew through and rewrite a thousand lines of code in a second. Eventually, I just trusted it. Granted, I almost always had a handwritten source of truth, as detailed a spec as any, so it had patterns and structure to follow.

The models are good now. A year ago, none of them could do any of this, certainly not to this quality level. But they don’t do it alone. A ton of work went into everything here, just a different kind of work to before. Above all, what mattered most in all of the above examples was taste. My taste, the human touch. I fear for the companies, oblivious to this, that trade their priceless human resources for OpenClaw nodes in a box.

The entire story is well-documented, rich in screenshots, and full of practical details for developers who may want to attempt a similar experiment.

It’s undeniable that programming is undergoing a massive shift that has possibly already changed the profession forever. Knowing what code is and does is still essential; writing it by hand does not seem to be anymore. And it sounds like the developers who are embracing this shift are happier than ever.

I’ve been thinking about this a lot: why are some of us okay with the concept of AI displacing humans in writing code, but not so much when it comes to, say, writing prose or music? I certainly wouldn’t want AI to replace me writing this, and I absolutely cannot stand the whole concept of “AI music” (here’s a great Rick Beato video on the matter). I don’t think I have a good answer to this, but the closest I can get is: code was always a means to an end – an abstraction layer to get to the actual user experience of a digital artifact. It just so happened that humans created it and had to learn it first. With text and storytelling, the raw material is the art form itself: what you read is the experience itself. But even then, what happens when the human-sourced art form gets augmented by AI in ways that increasingly blur the lines between what is real and artificial? What happens when a videogame gets enhanced by DLSS 5 or an article is a hybrid mesh of human- and AI-generated text? I don’t have answers to these questions.

I find what’s happening to software development so scary and fascinating at the same time: developers are reinventing themselves as “orchestrators” of tools and following new agentic engineering patterns. The results, like with Steve’s story, are out there and speak for themselves. I wish more people in our community were willing to have nuanced and pragmatic conversations about it rather than blindly taking sides.

Permalink

Comet Is the First Agentic Browser for iOS Worth Trying

Comet for iOS.


[Update: Perplexity has released an iPad version of Comet alongside the iPhone version, which you can install using the same App Store links below. However, because it wasn’t part of the TestFlight version of the app that we tested, we were unaware that it was launching with the iPhone version.]

For the past three weeks, I’ve been testing Comet, Perplexity’s cross-platform agentic web browser, on my iPhone Air. The iOS version of Comet, launching today on the App Store and (sadly) lacking an iPad counterpart, follows the expansion of Comet from macOS to Windows and Android devices, and it carries the inherent limitations of Apple’s platform. Comet for iOS is based on Safari’s WebKit engine; you cannot install third-party browser extensions due to iOS sandboxing restrictions; you can make Comet your default iOS browser, but in-app web views in third-party apps will still open with Safari View Controller, not Comet. By and large, Comet on iOS is a skin of Safari, but for the first time since the debut of Arc Search on iPhone two years ago (R.I.P.), I’m actually excited about an alternative to Safari on iOS once again.

Read more



Apple Is Working on an AI Music Tagging System

Music Business Worldwide (via MacRumors) is reporting that Apple is rolling out a voluntary metadata system for identifying AI-generated content on Apple Music called Transparency Tags. Introduced by Apple in a newsletter sent to music industry partners, Transparency Tags is:

a system of disclosure labels that record labels and music distributors can begin applying to content delivered to Apple Music immediately, and will be required to use when delivering new content in [the] future.

According to Music Business Worldwide, the tagging system covers artwork, tracks, composition elements such as lyrics, and music videos. The publication quotes Apple’s newsletter as explaining that it views Transparency Tags as part of an initial effort toward giving the music industry what it needs to develop AI policies.
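Apple hasn’t published the technical delivery format for these labels, so here’s a purely illustrative sketch of the idea: a disclosure label is just structured metadata attached to each asset in a delivery, which a distributor could validate before submission. The asset names, tag values, and `validate_transparency_tags` helper below are all my own invention, not Apple’s schema.

```python
# Hypothetical sketch of AI-disclosure metadata on a music delivery.
# Field names and values are invented for illustration; Apple's actual
# Transparency Tags format has not been published.

# The asset types the report says are covered: artwork, tracks,
# composition elements like lyrics, and music videos.
TAGGABLE_ASSETS = {"artwork", "track_audio", "lyrics", "music_video"}

def validate_transparency_tags(delivery: dict) -> list[str]:
    """Return the assets in a delivery that are missing a disclosure tag."""
    tags = delivery.get("transparency_tags", {})
    missing = [
        asset
        for asset in TAGGABLE_ASSETS
        if asset in delivery and asset not in tags
    ]
    return sorted(missing)

delivery = {
    "artwork": "cover.png",
    "track_audio": "track.wav",
    "lyrics": "lyrics.txt",
    "transparency_tags": {
        "artwork": "ai_generated",
        "track_audio": "human_created",
    },
}

print(validate_transparency_tags(delivery))  # → ['lyrics']
```

Since tagging is voluntary for existing catalogs but will be required for new deliveries, a check like this is the sort of thing distributors would run at the point where content is handed off to Apple Music.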

Although there are currently no consequences for failing to properly tag AI-generated music, Transparency Tags are a step in the right direction. The music industry and other creative industries are all grappling with how to deal with a flood of AI-generated content in a rapidly evolving environment. I don’t expect to see one approach sweep across industries any time soon, but it’s encouraging to see Apple taking a lead in pushing the conversation forward.

Permalink