
Posts tagged with "AI"

What Siri Isn’t: Perplexity’s Voice Assistant and the Potential of LLMs Integrated with iOS

Perplexity's voice assistant for iOS.


You’ve probably heard that Perplexity – a company whose web scraping tactics I generally despise, and the only AI bot we still block at MacStories – has rolled out an iOS version of their voice assistant that integrates with several native features of the operating system. Here’s their promo video in case you missed it:

This is a very clever idea. While other major LLMs’ voice modes are limited to having a conversation with the chatbot (with the kind of quality and conversation flow that, frankly, annihilates Siri), Perplexity put a different spin on it: they used native Apple APIs and frameworks to make conversations more actionable (some may even say “agentic”) and to integrate them with the Apple apps you use every day. I’ve seen a lot of people calling Perplexity’s voice assistant “what Siri should be” or arguing that Apple should consider Perplexity as an acquisition target because of this, and I thought I’d share some additional comments and notes after having played with their voice mode for a while.



How Federico Turns Voice Recordings into Searchable Obsidian Notes with Shortcuts, Hazel, and LLMs

Automation on the Mac is powerful because you have so many choices when building a workflow. Now, with large language models, you can do even more, which is the approach Federico took in his latest Automation Academy lesson for Club MacStories Plus and Premier members:

I built a hybrid automation to bridge spoken words and Markdown – a system that combines the non-deterministic nature of human language and messy voice recordings with the reliability of Shortcuts, the power of Hazel rules on macOS, and the flexibility of LLMs, which are ideal for processing natural language. The system revolves around a shortcut called Process Transcript that takes the raw transcript of a voice recording and turns it into a structured note in Obsidian, complete with a summary, action items, an embedded audio player, and an internal link to the full transcript.

It’s an amazing automation that takes his audio notes, transcribes them into text, structures the results in an Obsidian template that includes extracted tasks, and embeds the original audio file and transcript for reference. Along the way, Federico used Simon Willison’s llm CLI, Google Gemini 2.5 Pro, Hazel, Shortcuts, and other tools. It’s a great example of how to make the most of automation on the Mac.
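
To make the shape of that output concrete, here is a minimal Python sketch of how a structured note like the one described could be assembled. It is not Federico’s shortcut: the `ProcessedTranscript` class, the `render_note` function, the section headings, and the file names are all hypothetical. The only Obsidian-specific parts are the `![[...]]` embed syntax, the `[[...]]` internal link syntax, and the `- [ ]` task checkboxes.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessedTranscript:
    """Hypothetical container for what an LLM step might return."""
    title: str
    summary: str
    action_items: list[str] = field(default_factory=list)


def render_note(result: ProcessedTranscript, audio_file: str, transcript_note: str) -> str:
    """Build the Markdown body of an Obsidian note (illustrative layout only)."""
    lines = [
        f"# {result.title}",
        "",
        "## Summary",
        result.summary,
        "",
        "## Action Items",
    ]
    # Obsidian renders "- [ ]" list items as checkable tasks.
    lines += [f"- [ ] {item}" for item in result.action_items]
    lines += [
        "",
        "## Audio",
        f"![[{audio_file}]]",      # embed syntax: shows an inline audio player
        "",
        "## Transcript",
        f"[[{transcript_note}]]",  # internal link to the full transcript note
    ]
    return "\n".join(lines) + "\n"
```

Keeping the rendering step separate from the LLM call means the non-deterministic part (summarizing and extracting tasks) only has to return a few fields, while the note layout stays fixed and predictable.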


Automation Academy is just one of the many Club MacStories perks.


Automation Academy is just one of many perks that Club MacStories Plus and Club Premier members enjoy, including:

  • Weekly and monthly newsletters 
  • A sophisticated web app with search and filtering tools to navigate eight years of content
  • Customizable RSS feeds
  • Bonus columns
  • An early and ad-free version of our Internet culture and media podcast, MacStories Unwind
  • A vibrant Discord community of smart app and automation fans who trade a wealth of tips and discoveries every day
  • Live Discord audio events after Apple events and at other times of the year

On top of that, Club Premier members get AppStories+, an extended, ad-free version of our flagship podcast that we deliver early every week in high-bitrate audio.



Apple Is Using Differential Privacy to Improve Apple Intelligence

Apple has been using differential privacy for nearly ten years to collect its users’ data in a way that isn’t traceable back to an individual. As Apple explains in a recent post on its Machine Learning Research site:

This approach works by randomly polling participating devices for whether they’ve seen a particular fragment, and devices respond anonymously with a noisy signal. By noisy, we mean that devices may provide the true signal of whether a fragment was seen or a randomly selected signal for an alternative fragment or no matches at all. By calibrating how often devices send randomly selected responses, we ensure that hundreds of people using the same term are needed before the word can be discoverable.

The company has used the technique to analyze everything from the popularity of emoji to what words to suggest with QuickType.
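
The idea of a “noisy signal” can be illustrated with textbook randomized response, sketched below in Python. This is not Apple’s mechanism, which the quote describes only at a high level. The function names, the 50% truth probability, and the coin-flip fallback are all assumptions chosen for illustration. Each device answers truthfully only some of the time, so a single report reveals little, yet the known noise rate can be subtracted out across many reports.

```python
import random


def noisy_report(saw_fragment: bool, truth_probability: float, rng: random.Random) -> bool:
    """Answer truthfully with probability `truth_probability`, else flip a fair coin."""
    if rng.random() < truth_probability:
        return saw_fragment
    return rng.random() < 0.5


def estimate_true_rate(reports: list[bool], truth_probability: float) -> float:
    """Undo the known noise: observed = p * true_rate + (1 - p) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - truth_probability) * 0.5) / truth_probability
```

With a small number of reports the estimate is dominated by noise, which is the property the quote points to when it says many people must use the same term before it becomes discoverable.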

Now, Apple is using differential privacy to mine the data of users who have opted into sharing device analytics to improve Apple Intelligence. So far, the technique’s use has been limited to improving Genmoji, but in upcoming OS releases, it will be used for “Image Playground, Image Wand, Memories Creation and Writing Tools in Apple Intelligence, as well as in Visual Intelligence,” too.

The report explains that:

Building on our many years of experience using techniques like differential privacy, as well as new techniques like synthetic data generation, we are able to improve Apple Intelligence features while protecting user privacy for users who opt in to the device analytics program. These techniques allow Apple to understand overall trends, without learning information about any individual, like what prompts they use or the content of their emails. As we continue to advance the state of the art in machine learning and AI to enhance our product experiences, we remain committed to developing and implementing cutting-edge techniques to protect user privacy.

For Genmoji, this means collecting data on the most popular prompts used to create the emoji-like images. Apple explains that written content is more challenging but that it can use an LLM to generate synthetic data like emails. The synthetic data is then sent to the devices of users who have opted into device analytics to determine which data matches actual user data most closely and frequently, again using differential privacy to prevent individual device identification.
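
As a rough sketch of the on-device matching step, the Python below picks which synthetic message is closest to anything stored locally. The passage does not say how “matches most closely” is measured, so the similarity measure here is an assumption: a toy bag-of-words vector stands in for a real embedding, and every function name is hypothetical. In a scheme like the one described, only the chosen index would leave the device, and only through a noisy, privacy-preserving report.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def closest_synthetic(synthetic: list[str], local_messages: list[str]) -> int:
    """Return the index of the synthetic message most similar to any local message."""
    best_index, best_score = 0, -1.0
    for index, candidate in enumerate(synthetic):
        candidate_vec = embed(candidate)
        # Score a candidate by its best match among the local messages.
        score = max((cosine(candidate_vec, embed(m)) for m in local_messages), default=0.0)
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

The local messages never appear in the return value; the device contributes only a vote for one of the synthetic candidates it was sent.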

Using differential privacy to improve Apple Intelligence without directly scraping user data is clever, but it does make me wonder why something similar wasn’t used to build Apple’s large language models, which were trained on the contents of the Internet. Perhaps that’s not possible at the scale of an LLM, or maybe that initial model needs a level of precision that differential privacy doesn’t offer, but I think it’s fair to ask.


How Could Apple Use Open-Source AI Models?

Yesterday, Wayne Ma, reporting for The Information, published an outstanding story detailing the internal turmoil at Apple that led to the delay of the highly anticipated Siri AI features last month. From the article:

In November 2022, OpenAI released ChatGPT to a thunderous response from the tech industry and public. Within Giannandrea’s AI team, however, senior leaders didn’t respond with a sense of urgency, according to former engineers who were on the team at the time.

The reaction was different inside Federighi’s software engineering group. Senior leaders of the Intelligent Systems team immediately began sharing papers about LLMs and openly talking about how they could be used to improve the iPhone, said multiple former Apple employees.

Excitement began to build within the software engineering group after members of the Intelligent Systems team presented demos to Federighi showcasing what could be achieved on iPhones with AI. Using OpenAI’s models, the demos showed how AI could understand content on a user’s phone screen and enable more conversational speech for navigating apps and performing other tasks.

Assuming the details in this report are correct, I truly can’t imagine how one could possibly see the debut of ChatGPT two years ago and not feel a sense of urgency. Fortunately, other teams at Apple did, and it sounds like they’re the folks who have now been put in charge of the next generation of Siri and AI.

There are plenty of other details worth reading in the full story (especially the parts about what Rockwell’s team wanted to accomplish with Siri and AI on the Vision Pro), but one tidbit in particular stood out to me: Federighi has now given the green light to rely on third-party, open-source LLMs to build the next wave of AI features.

Federighi has already shaken things up. In a departure from previous policy, he has instructed Siri’s machine-learning engineers to do whatever it takes to build the best AI features, even if it means using open-source models from other companies in its software products as opposed to Apple’s own models, according to a person familiar with the matter.

“Using” open-source models from other companies doesn’t necessarily mean shipping consumer features in iOS powered by external LLMs. I’ve seen some people interpret this paragraph as Apple preparing to release a local Siri powered by Llama 4 or DeepSeek, and I think we should pay more attention to that “build the best AI features” (emphasis mine) line.

My read of this part is that Federighi might have instructed his team to use distillation to better train Apple’s in-house models as a way to accelerate the development of the delayed Siri features and put them back on the company’s roadmap. Given Tim Cook’s public appreciation for DeepSeek and this morning’s New York Times report that the delayed features may come this fall, I wouldn’t be shocked to learn that Federighi told Siri’s ML team to distill DeepSeek R1’s reasoning knowledge into a new variant of their ~3 billion parameter foundation model that runs on-device. Doing that wouldn’t mean that iOS 19’s Apple Intelligence would be “powered by DeepSeek”; it would just be a faster way for Apple to catch up without throwing away the foundation model they unveiled last year (which, supposedly, had a ~30% error rate).

In thinking about this possibility, I got curious and decided to check out the original paper that Apple published last year with details on how they trained the two versions of AFM (Apple Foundation Model): AFM-server and AFM-on-device. The latter would be the smaller, ~3 billion parameter model that gets downloaded on-device with Apple Intelligence. I’ll let you guess what Apple did to improve the performance of the smaller model:

For the on-device model, we found that knowledge distillation (Hinton et al., 2015) and structural pruning are effective ways to improve model performance and training efficiency. These two methods are complementary to each other and work in different ways. More specifically, before training AFM-on-device, we initialize it from a pruned 6.4B model (trained from scratch using the same recipe as AFM-server), using pruning masks that are learned through a method similar to what is described in (Wang et al., 2020; Xia et al., 2023).

Or, more simply:

AFM-server core training is conducted from scratch, while AFM-on-device is distilled and pruned from a larger model.
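
For readers unfamiliar with the term, here is a minimal Python sketch of the classic distillation loss from Hinton et al. (2015), the paper cited in the quote above. It is not Apple’s training recipe, and the function names and the temperature value are assumptions. The student is penalized for diverging from the teacher’s temperature-softened probability distribution rather than only from a single correct label.

```python
import math


def log_softmax(logits: list[float], temperature: float) -> list[float]:
    """Log-probabilities of temperature-scaled logits, computed stably."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(
    teacher_logits: list[float],
    student_logits: list[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2
```

A higher temperature flattens the teacher’s distribution, so the student also learns how the teacher ranks the wrong answers, which is where much of the transferred knowledge lives.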

If the distilled version of AFM-on-device that was tested until a few weeks ago produced a wrong output one third of the time, perhaps it would be a good idea to perform distillation again based on knowledge from other smarter and larger models? Say, using 250 Nvidia GB300 NVL72 servers?

(One last fun fact: per their paper, Apple trained AFM-server on 8192 TPUv4 chips for 6.3 trillion tokens; that setup still wouldn’t be as powerful as “only” 250 modern Nvidia servers today.)


A Peek Into LookUp’s Word of the Day Art and Why It Could Never Be AI-Generated

Yesterday, Vidit Bhargava, developer of the award-winning dictionary app LookUp, wrote on his blog about the way he hand-makes each piece of artwork that accompanies the app’s Word of the Day. While revealing that he has employed this practice every day for an astonishing 10 years, Vidit talked about how each image is made from scratch as an illustration or using photography that he shoots specifically for the design:

Each Word of the Day has been illustrated with care, crafting digital illustrations, picking the right typography that conveys the right emotion.

Some words contain images, these images are painstakingly shot, edited and crafted into a Word of the Day graphic by me.

I’ve noticed before that each Word of the Day image in LookUp seemed unique, but I assumed Vidit was using stock imagery and illustrations as a starting point each time. The revelation that he is creating almost all of these from scratch every single day was incredible and gave me a whole new level of respect for the developer.

The idea of AI-generated art (specifically art that is wholly generated from scratch by LLMs) is something that really sticks in my throat – never more so than with the recent rip-off of the beautiful, hand-drawn Studio Ghibli films by OpenAI. Conversely, Vidit’s work shows passion and originality.

To quote Vidit, “Real art takes time, effort and perseverance. The process is what makes it valuable.”

You can read the full blog post here.


Using Simon Willison’s LLM CLI to Process YouTube Transcripts in Shortcuts with Claude and Gemini

Video Processor.


I’ve been experimenting with different automations and command line utilities to handle audio and video transcripts lately. In particular, I’ve been working with Simon Willison’s LLM command line utility as a way to interact with cloud-based large language models (primarily Claude and Gemini) directly from the macOS terminal.

For those unfamiliar, Willison’s LLM CLI tool is a command line utility that lets you communicate with services like ChatGPT, Gemini, and Claude using shell commands and dedicated plugins. The llm command is extremely flexible when it comes to input and output; it supports multiple modalities like audio and video attachments for certain models, and it offers custom schemas to return structured output from an API. Even for someone like me – not exactly a Terminal power user – the different llm commands and options are easy to understand and tweak.

Today, I want to share a shortcut I created on my Mac that takes long transcripts of YouTube videos and:

  1. reformats them for clarity with proper paragraphs and punctuation, without altering the original text,
  2. extracts key points and highlights from the transcript, and
  3. organizes highlights by theme or idea.

I created this shortcut because I wanted a better system for linking to YouTube videos, along with interesting passages from them, on MacStories. Initially, I thought I could use an app I recently mentioned on AppStories and Connected to handle this sort of task: AI Actions by Sindre Sorhus. However, when I started experimenting with long transcripts (such as this one with 8,000 words from Theo about Electron), I immediately ran into limitations with native Shortcuts actions. Those actions were running out of memory and randomly stopping the shortcut.

I figured that invoking a shell script using macOS’ built-in ‘Run Shell Script’ action would be more reliable. Typically, Apple’s built-in system actions (especially on macOS) aren’t bound to the same memory constraints as third-party ones. My early tests indicated that I was right, which is why I decided to build the shortcut around Willison’s llm tool.
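
As an illustration of that approach, the Python sketch below wraps a call to the `llm` command and pipes the transcript through standard input. It is not the script from the shortcut. The helper names are hypothetical, and the model ID and system prompt are placeholders you would replace with whatever your installed `llm` plugins provide. The `-m` (model) and `-s` (system prompt) options and reading the prompt from stdin are standard `llm` usage.

```python
import subprocess


def build_llm_command(model: str, system_prompt: str) -> list[str]:
    """Assemble the argv list; the transcript itself is piped through stdin."""
    return ["llm", "-m", model, "-s", system_prompt]


def process_transcript(transcript: str, model: str, system_prompt: str) -> str:
    """Run the llm CLI on a transcript and return its text output.

    Assumes `llm` is installed, on PATH, and configured with an API key.
    """
    result = subprocess.run(
        build_llm_command(model, system_prompt),
        input=transcript,      # sent via stdin, not as a command-line argument
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError if llm exits non-zero
    )
    return result.stdout
```

Sending the transcript over stdin rather than as an argument sidesteps shell quoting problems and command-line length limits, which matters for transcripts that run to thousands of words.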



AI Adds a New Dimension to DEVONthink 4

DEVONthink is a difficult app to review because its flexibility means it can serve a wide variety of purposes. I’ve been using it for the past few weeks as an archive and research companion that houses thousands of plain text files, but the app is capable of effectively replacing your Mac’s file system, storing and cataloging all sorts of files. With lightning-fast search, tagging, and a plethora of other organization methods, DEVONthink 3 has a well-earned reputation as a premier tool for researchers working with lots of files. However, DEVONthink’s capabilities are so varied that it can also serve as a text editor, an RSS reader, a read-later app, and a lot more.

Today, DEVONtechnologies is releasing a public beta of DEVONthink 4, a big update with a focus on AI, but with other new features and refinements to existing capabilities, too. Which of these features matters most to you will depend in large measure on how you use the app. I’m going to focus on the new AI tools because those are the additions that have had the greatest impact on the way I use DEVONthink, but it’s worth keeping in mind that the app offers many other tools that may suit your needs better.



On Apple Allowing Third-Party Assistants on iOS

This is an interesting idea by Parker Ortolani: what if Apple allowed users to change their default assistant from Siri to something else?

I do not want to harp on the Siri situation, but I do have one suggestion that I think Apple should listen to. Because I suspect it is going to take quite some time for the company to get the new Siri out the door properly, they should do what was previously unthinkable. That is, open up iOS to third-party assistants. I do not say this lightly. I am one of those folks who does not want iOS to be torn open like Android, but I am willing to sign on when it makes good common sense. Right now it does.

And:

I do not use Gemini as my primary LLM generally, I prefer to use ChatGPT and Claude most of the time for research, coding, and writing. But Gemini has proved to be the best assistant out of them all. So while we wait for Siri to get good, give us the ability to use custom assistants at the system level. It does not have to be available to everyone, heck create a special intent that Google and these companies need to apply for if you want. But these apps with proper system level overlays would be a massive improvement over the existing version of Siri. I do not want to have to launch the app every single time.

As a fan of the progressive opening up of iOS that’s been happening in Europe thanks to our laws, I can only welcome such a proposal – especially when I consider the fact that long-pressing the side button on my expensive phone defaults to an assistant that can’t even tell which month it is. If Apple truly thinks that Siri helps users “find what they need and get things done quickly”, they should create an Assistant API and allow other companies to compete with them. Let iPhone users decide which assistant they prefer in 2025.

Some people may argue that other assistants, unlike Siri, won’t be able to access key features such as sending messages or integrating with core iOS system frameworks. My reply would be: perhaps having a more prominent placement on iOS would actually push third-party companies to integrate with the iOS APIs that do exist. For instance, there is nothing stopping OpenAI from integrating ChatGPT with the Reminders app; they have done exactly that with MapKit, and if they wanted, they could plug into HomeKit, HealthKit, and the dozens of other frameworks available to developers. And for those iOS features that don’t have an API for other companies to support…well, that’s for Apple to fix.

From my perspective, it always goes back to the same idea: I should be able to freely swap out software on my Apple pocket computer just like I can thanks to a safe, established system on my Apple desktop computer. (Arguably, that is also the perspective of, you know, the law in Europe.) Even Google – a company that would have all the reasons not to let people swap the Gemini assistant for anything else – lets folks decide which assistant they want to use on Android. And, as you can imagine, competition there is producing some really interesting results.

I’m convinced that, at this point, a lot of people despise Siri and would simply prefer pressing their assistant button to talk to ChatGPT or Claude – even if that meant losing access to reminders, timers, and whatever it is that Siri can reliably accomplish these days. (I certainly wouldn’t mind putting Claude on my iPhone and leaving Siri on the Watch for timers and HomeKit.) Whether it’s because of superior world knowledge, proper multilingual abilities (something that Siri still doesn’t support!), or longer contextual conversations, hundreds of millions of people have clearly expressed their preference for new types of digital assistance and conversations that go beyond the antiquated skillset of Siri.

If a new version of Siri isn’t going to be ready for some time, and if Apple does indeed want to make the best computers for AI, maybe it’s time to open up that part of iOS in a way that goes beyond the (buggy) ChatGPT integration with Siri.


App Store Vibes

Bryan Irace has an interesting take on the new generation of developer tools that have lowered the barrier to entry for new developers (and sometimes not even developers) when it comes to creating apps:

Recent criticism of Apple’s AI efforts has been juicy to say the least, but this shouldn’t distract us from continuing to criticize one of Apple’s most deserving targets: App Review. Especially now that there’s a perfectly good AI lens through which to do so.

It’s one thing for Apple’s AI product offerings to be non-competitive. Perhaps even worse is that as Apple stands still, software development is moving forward faster than ever before. Like it or not, LLMs—both through general chat interfaces and purpose-built developer tools—have meaningfully increased the rate at which new software can be produced. And they’ve done so both by making skilled developers more productive while also lowering the bar for less-experienced participants.

And:

I recently built a small iOS app for myself. I can install it on my phone directly from Xcode but it expires after seven days because I’m using a free Apple Developer account. I’m not trying to avoid paying Apple, but there’s enough friction involved in switching to a paid account that I simply haven’t been bothered. And I used to wrangle provisioning profiles for a living! I can’t imagine that I’m alone here, or that others with less tribal iOS development knowledge are going to have a higher tolerance for this. A friend asked me to send the app to them but that’d involve creating a TestFlight group, submitting a build to Apple, waiting for them to approve it, etc. Compare this to simply pushing to Cloudflare or Netlify and automatically having a URL you can send to a friend or share via Twitter. Or using tools like v0 or Replit, where hosting/distribution are already baked in.

Again, this isn’t new—but being able to build this much software this fast is new. App distribution friction has stayed constant while friction in all other stages of software development has largely evaporated. It’s the difference between inconvenient and untenable.

Perhaps “vibe coding” is the extreme version of this concept, but I think there’s something here. Creating small, low-stakes apps for personal projects or that you want to share with a small group of people is, objectively, getting easier. After reading Bryan’s post – which rightfully focuses on the distribution side of apps – I’m also wondering: what happens when the first big service comes along and figures out a way to bypass the App Store altogether (perhaps via the web?) to allow “anyone” to create apps, completely cutting out Apple and its App Review from the process?

In a way, this reminds me of blogging. Those who wanted to have an online writing space 30 years ago had to know some of the basics of hosting and HTML if they wanted to publish something for other people to read. Then Blogger came along and allowed anyone – regardless of their skill level – to be read. What if the same happened to mobile software? Should Apple and Google be ready for this possibility within the next few years?

I could see Google spin up a “Build with Gemini” initiative to let anyone create Android apps without any coding knowledge. I’m also reminded of this old Vision Pro rumor that claimed Apple’s Vision team was exploring the idea of letting people create “apps” with Siri.

If only the person in charge of that team went anywhere, right?
