Federico is the founder and Editor-in-Chief of MacStories, where he writes about Apple with a focus on apps, developers, iPad, and iOS productivity. He founded MacStories in April 2009 and has been writing about Apple ever since. Federico is also the co-host of AppStories, a weekly podcast exploring the world of apps; Unwind, a fun exploration of media and more; and NPC: Next Portable Console, a show about portable gaming and the handheld revolution.
This week, Federico and John explain how they go about creating personal productivity tools with the assistance of AI and walk through some of what they have created.
On AppStories+, we talk about our Black Friday tech purchases.
While I’ve decided to use Wispr Flow as my cross-platform dictation app (I signed up for a year of Pro membership yesterday, in fact), there are times when I want to quickly dictate an idea into my iPhone without having to go through the dance of enabling Wispr’s software keyboard, launching the app, and going...
Great post by Allen Pike on the importance of a great app experience for modern LLMs, which I recently wrote about. He opens with this line, which is a new axiom I’m going to reuse extensively:
A model is only as useful as its applications.
And on ChatGPT for Mac specifically:
The app does a good job of following the platform conventions on Mac. That means buttons, text fields, and menus behave as they do in other Mac apps. While ChatGPT is imperfect on both Mac and web, both platforms have the finish you would expect from a daily-use tool.
[…]
It’s easier to get a polished app with native APIs, but at a certain scale separate apps make it hard to rapidly iterate a complex enterprise product while keeping it in sync on each platform, while also meeting your service and customer obligations. So for a consumer-facing app like ChatGPT or the no-modifier Copilot, it’s easier to go native. For companies that are, at their core, selling to enterprises, you get Electron apps.
I don’t hate Electron as much as others in our community, but I can’t deny that ChatGPT is one of the nicest AI apps for Mac I’ve used. The other is the recently updated BoltAI. And they’re both native Mac apps.
I’m not saying the new model isn’t an improvement on Sonnet 4.5, but I can’t say with confidence that the challenges I posed to it surfaced a meaningful difference in capabilities between the two.
This represents a growing problem for me. My favorite moments in AI are when a new model gives me the ability to do something that simply wasn’t possible before. In the past these have felt a lot more obvious, but today it’s often very difficult to find concrete examples that differentiate the new generation of models from their predecessors.
This is something that I’ve felt every few weeks (with each new model release from the major AI labs) over the past year: if you’re really plugged into this ecosystem, it can be hard to spot meaningful differences between major models on a release-by-release basis. That’s not to say that real progress in intelligence, knowledge, or tool-calling isn’t being made: benchmarks and evaluations performed by established organizations tell a clear story.

At the same time, it’s also worth keeping in mind that more companies these days may be optimizing their models for benchmarks to come out on top and, more importantly, that the vast majority of folks don’t have a suite of personal benchmarks to evaluate different models for their workflows. Simon Willison thinks that people who use AI for work should create personalized test suites, which is something I’m going to consider for prompts that I use frequently. I also feel like Ethan Mollick’s advice of picking a reasoning model and checking in every few months to reassess AI progress is probably the best strategy for most people who don’t want to tweak their AI workflows every other week.
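The idea of a personal test suite can be made concrete with a small harness: a handful of prompts you actually use, each paired with a pass/fail check, run against any model you want to compare. This is only a minimal sketch; `run_model`, the case names, and the checks are all placeholders for whatever provider SDK and prompts you actually use.

```python
# Minimal sketch of a personal prompt-benchmark suite.
# run_model is a placeholder: swap in a real API call from your
# provider's SDK to compare two models on identical prompts.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    name: str
    prompt: str
    check: Callable[[str], bool]  # did the response pass this case?

def run_suite(cases: list[Case], run_model: Callable[[str], str]) -> dict[str, bool]:
    """Run every case against one model and record pass/fail per case."""
    return {c.name: c.check(run_model(c.prompt)) for c in cases}

# Example cases drawn from prompts you actually use day to day.
cases = [
    Case("date-math", "What day of the week is 2025-03-01?",
         lambda r: "Saturday" in r),
    Case("extraction", "List the URLs in: see https://example.com now",
         lambda r: "https://example.com" in r),
]

if __name__ == "__main__":
    # Stub model so the sketch runs anywhere; replace with a real call.
    stub = lambda prompt: "Saturday - https://example.com"
    results = run_suite(cases, stub)
    print(f"{sum(results.values())}/{len(cases)} cases passed")
```

Running the same `cases` list against two different `run_model` callables gives you a release-by-release comparison grounded in your own workflows rather than public benchmarks.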
I love my iPad Pro, but, as you know, lately I’ve been wondering about what comes after iPadOS 26. We have much better multitasking now, and key workflow limitations such as file management, audio recording, and long-running background tasks have been addressed by Apple this year. But now that the user-facing system’s foundation has been “fixed”, what about the app ecosystem?
Over at Snazzy Labs, Quinn Nelson has been wondering the same, and I highly recommend watching his video:
Quinn makes a series of strong, cogent arguments with factual evidence that show how, despite multitasking and other iPadOS 26 improvements, using apps on an iPad Pro often falls short of what can be achieved with the same apps on a Mac. There is so much I could quote from this video, but I think his final thought sums it up best:
There are still days that I reach for my $750 MacBook Air because my $2,000 iPad Pro can’t do what I need it to. Seldom is the reverse true.
I’m so happy that Apple seems to be taking iPadOS more seriously than ever this year. But now I can’t help but wonder if the iPad’s problems run deeper than windowing when it comes to getting serious work done on it.
Stop me if you’ve heard this one before: I created a shortcut to quickly append content to my daily note so I don’t forget to save stuff I come across during the day, thoughts that pop into my head, or random things that John or Silvia send me. Right, we’ve been over this. What’s different...
The best kind of follow-up article isn’t one that clarifies a topic that someone got wrong (although I do love that, especially when that “someone” isn’t me); it’s one that provides more context to a story that was incomplete. My M5 iPad Pro review was an incomplete narrative. As you may recall, I was unable to test Apple’s claims of 3.5× improvements for local AI processing thanks to the new Neural Accelerators built into the M5’s GPU. It’s not that I didn’t believe Apple’s numbers. I simply couldn’t test them myself due to the early nature of the software and the timing of my embargo.
Well, I was finally able to test local AI performance with a pre-release version of MLX optimized for M5, and let me tell you: not only is the hype real, but the numbers I got from my extensive tests over the past two weeks actually exceed Apple’s claims.
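For readers curious what a measurement like this boils down to, the usual metric is tokens generated per second: time one generation pass, divide token count by elapsed time, and compare across chips or runtimes. The sketch below is only an illustration of that arithmetic; `generate_tokens` is a placeholder, not MLX’s actual API, and a real test would load a local model on-device.

```python
# Minimal sketch of a tokens-per-second measurement, the metric behind
# claims like "3.5x faster local inference". generate_tokens is a
# placeholder: substitute a real local-inference call (e.g., via MLX).

import time

def tokens_per_second(generate_tokens, prompt: str) -> float:
    """Time one generation pass and return throughput in tokens/sec."""
    start = time.perf_counter()
    tokens = generate_tokens(prompt)   # assumed to return generated tokens
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

if __name__ == "__main__":
    # Stub generator so the sketch runs anywhere; a real benchmark would
    # generate a few hundred tokens from a quantized on-device model.
    fake = lambda prompt: ["tok"] * 500
    print(f"{tokens_per_second(fake, 'hello'):.0f} tok/s")
```

Comparing the same prompt and token budget on an M4 versus an M5 machine is what turns a headline multiplier into a number you can verify yourself.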