Posts tagged with "featured"

My Latest Mac Automation Tool is a Tiny Game Controller

Source: 8BitDo.


I never expected my game controller obsession to pay automation dividends, but it did last week in the form of the tiny 16-button 8BitDo Micro. For the past week, I’ve used the Micro to dictate on my Mac, interact with AI chatbots, and record and edit podcasts. While the setup won’t replace a Stream Deck or Logitech Creative Console for every use case, it excels in areas where those devices don’t because it fits comfortably in the palm of your hand and costs a fraction of their price.

My experiments started when I read a story on Endless Mode by Nicole Carpenter, who explained how medical students turned to two tiny 8BitDo game controllers to help with their studies. The students were using an open-source flashcard app called Anki and ran into an issue while spending long hours with their flashcards:

The only problem is that using Anki from a computer isn’t too ergonomic. You’re hunched over a laptop, and your hands start cramping from hitting all the different buttons on your keyboard. If you’re studying thousands of cards a day, it becomes a real problem—and no one needs to make studying even more intense than it already is.

To relieve the strain on their hands, the med students turned to 8BitDo’s tiny Micro and Zero 2 controllers, using them as remote controls for the Anki app. The story didn’t explain how 8BitDo’s controllers worked with Anki, but as I read it, I thought to myself, “Surely this isn’t something that was built into the app,” which immediately drew me deeper into the world of 8BitDo controllers as study aids.


8BitDo markets the Micro’s other uses, but for some reason, it hasn’t spread much beyond the world of medical school students. Source: 8BitDo.

As I suspected, the 8BitDo Micro works just as well with any app that supports keyboard shortcuts as it does with Anki. What’s curious, though, is that even though medical students have been using the Micro and Zero 2 with Anki for several years and 8BitDo’s website includes a marketing image of someone using the Micro with Clip Studio Paint on an iPad, word of the Micro’s automation capabilities hasn’t spread much. That’s something I’d like to help change.
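Conceptually, there’s no magic here: the controller just presents itself as a keyboard, with each button assigned to a key chord, so the frontmost app sees ordinary shortcuts. A minimal sketch of that mapping layer in Python (the button names and chord assignments below are hypothetical, chosen purely for illustration; the real mapping is configured in 8BitDo’s own software):

```python
# Sketch of a controller-to-shortcut mapping layer. Each physical
# button resolves to an ordinary keyboard chord, which is why any app
# that supports keyboard shortcuts works with the Micro.

# Hypothetical assignments for a 16-button pad driving a podcast editor.
BUTTON_MAP = {
    "A": "cmd+shift+d",   # start/stop dictation
    "B": "cmd+r",         # start/stop recording
    "X": "cmd+t",         # insert chapter marker
    "Y": "cmd+z",         # undo
}

def shortcut_for(button: str) -> str:
    """Return the key chord for a button, or raise for unmapped buttons."""
    try:
        return BUTTON_MAP[button]
    except KeyError:
        raise ValueError(f"button {button!r} has no shortcut assigned")
```

The point of the sketch is that the app on the receiving end never knows a game controller is involved; it only ever sees the chord.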

Read more


Interview: Craig Federighi Opens Up About iPadOS, Its Multitasking Journey, and the iPad’s Essence

iPadOS 26. Source: Apple.


It’s a cool, sunny morning at Apple Park as I’m making my way along the iconic glass ring to meet with Apple’s SVP of Software Engineering, Craig Federighi, for a conversation about the iPad.

It’s the Wednesday after WWDC, and although there are still some developers and members of the press around Apple’s campus, it seems like employees have returned to their regular routines. Peek through the glass, and you’ll see engineers working at their stations, half-erased whiteboards, and an infinite supply of Studio Displays on wooden desks with rounded corners. Some guests are still taking pictures by the WWDC sign. There are fewer security dogs, but they’re obviously all good.

Despite the list of elaborate questions on my mind about iPadOS 26 and its new multitasking, the long history of iPad criticisms (including mine) over the years, and what makes an iPad different from a Mac, I can’t stop thinking about the simplest, most obvious question I could ask – one that harkens back to an old commercial about the company’s modular tablet:

In 2025, what even is an iPad according to Federighi?

Read more



Hands-On: How Apple’s New Speech APIs Outpace Whisper for Lightning-Fast Transcription

Late last Tuesday night, after watching F1: The Movie at the Steve Jobs Theater, I was driving back from dropping Federico off at his hotel when I got a text:

Can you pick me up?

It was from my son Finn, who had spent the evening nearby and was stalking me in Find My. Of course, I swung by and picked him up, and we headed back to our hotel in Cupertino.

On the way, Finn filled me in on a new class in Apple’s Speech framework called SpeechAnalyzer and its SpeechTranscriber module. Both the class and module are part of Apple’s OS betas that were released to developers last week at WWDC. My ears perked up immediately when he told me that he’d tested SpeechAnalyzer and SpeechTranscriber and was impressed with how fast and accurate they were.

It’s still early days for these technologies, but I’m here to tell you that their speed alone is a game changer for anyone who uses voice transcription to create text from lectures, podcasts, YouTube videos, and more. That’s something I do multiple times every week for AppStories, NPC, and Unwind, generating transcripts that I upload to YouTube because the site’s built-in transcription isn’t very good.

What’s frustrated me with other tools is how slow they are. Most are built on Whisper, OpenAI’s open-source speech-to-text model, which was released in 2022. It’s cheap at under a penny per one million tokens, but it isn’t fast, which is frustrating when you’re in the final steps of a YouTube workflow.

An SRT file generated by Yap.


I asked Finn what it would take to build a command line tool to transcribe video and audio files with SpeechAnalyzer and SpeechTranscriber. He figured it would only take about 10 minutes, and he wasn’t far off. In the end, it took me longer to get around to installing macOS Tahoe after WWDC than it took Finn to build Yap, a simple command line utility that takes audio and video files as input and outputs SRT- and TXT-formatted transcripts.

Yesterday, I finally took the Tahoe plunge and immediately installed Yap. I grabbed the 7GB 4K video version of AppStories episode 441, which is about 34 minutes long, and ran it through Yap. It took just 45 seconds to generate an SRT file. Here’s Yap ripping through nearly 20% of an episode of NPC in 10 seconds:


Next, I ran the same file through VidCap and MacWhisper, using its Large V2 and Large V3 Turbo models. Here’s how each app and model did:

App Transcription Time
Yap 0:45
MacWhisper (Large V3 Turbo) 1:41
VidCap 1:55
MacWhisper (Large V2) 3:55

All three transcription workflows had similar trouble with last names and words like “AppStories,” which LLMs tend to separate into two words instead of camel casing. That’s easily fixed by running a set of find-and-replace rules, although I’d love to feed those corrections back into the model itself for future transcriptions.
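Those cleanup rules are trivial to script. A minimal sketch in Python (the rule list below is illustrative, not the actual set of corrections used for these shows):

```python
import re

# Illustrative find-and-replace rules for common transcription slips.
# Speech models tend to split camel-cased names into two words, so each
# rule maps the split form back to the proper spelling.
RULES = [
    (re.compile(r"\bApp Stories\b"), "AppStories"),
    (re.compile(r"\bMac Stories\b"), "MacStories"),
    (re.compile(r"\bMac Whisper\b"), "MacWhisper"),
]

def clean_transcript(text: str) -> str:
    """Apply each rule in order to a transcript (SRT or plain text)."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Because SRT is plain text, the same function works on subtitle files and transcripts alike; the timestamps pass through untouched.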

Once transcribed, a video can be used to generate additional formats like outlines.


What stood out above all else was Yap’s speed. By harnessing SpeechAnalyzer and SpeechTranscriber on-device, the command line tool tore through the 7GB video file a full 2.2× faster than MacWhisper’s Large V3 Turbo model, with no noticeable difference in transcription quality.

At first blush, the difference between 0:45 and 1:41 may seem insignificant, and it arguably is, but those are the results for just one 34-minute video. Extrapolate that to running Yap against the hours of Apple Developer videos released on YouTube with the help of yt-dlp, and suddenly, you’re talking about a significant amount of time. Like all automation, picking up a 2.2× speed gain one video or audio clip at a time, multiple times each week, adds up quickly.
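The arithmetic behind that multiplier is simple: divide each tool’s wall-clock time by Yap’s. A quick sketch using the timings from the table above:

```python
# Speedups relative to Yap, computed from the timings above (in seconds).
TIMINGS = {
    "Yap": 45,                            # 0:45
    "MacWhisper (Large V3 Turbo)": 101,   # 1:41
    "VidCap": 115,                        # 1:55
    "MacWhisper (Large V2)": 235,         # 3:55
}

def speedup_vs_yap(app: str) -> float:
    """How many times longer an app takes than Yap on the same file."""
    return TIMINGS[app] / TIMINGS["Yap"]

for app, seconds in TIMINGS.items():
    print(f"{app}: {speedup_vs_yap(app):.1f}x Yap's time")
```

Against Large V2, the gap widens to more than 5×, which is where batch jobs really start to feel the difference.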

Whether you’re producing video for YouTube and need subtitles, generating transcripts to summarize lectures at school, or doing something else, SpeechAnalyzer and SpeechTranscriber – available across the iPhone, iPad, Mac, and Vision Pro – mark a significant leap forward in transcription speed without compromising on quality. I fully expect this combination to replace Whisper as the default model for transcription apps on Apple platforms.

To test Apple’s new model, install the macOS Tahoe beta, which currently requires an Apple developer account, and then install Yap from its GitHub page.


iOS 26, iPadOS 26, and Liquid Glass: The MacStories Overview

During today’s WWDC 2025 keynote, held in person at Apple Park and streamed online, Apple unveiled a considerable number of upgrades to iOS and iPadOS, including a brand-new design language called Liquid Glass. This new look, which spans all of Apple’s platforms, coupled with a massive upgrade for multitasking on the iPad and numerous other additions and updates, made for packed releases for iOS and iPadOS.

Let’s take a look at everything Apple showed today for Liquid Glass, iOS, and iPadOS.

Read more


macOS Tahoe: The MacStories Overview

At its WWDC 2025 keynote held earlier today, Apple officially announced the next version of macOS, macOS Tahoe. As per the company’s naming tradition over the past decade, this new release is once again named after a location in California. This year, however, Apple has decided to unify the version numbers across all its operating systems by aligning them with the upcoming year. That’s why macOS Tahoe carries the version number macOS 26, jumping directly up from last year’s macOS 15.

macOS 26 features the brand-new Liquid Glass design language, which Apple is also rolling out across iOS, iPadOS, visionOS, watchOS, and tvOS. But macOS Tahoe doesn’t stop there. In addition to the flashy new look, Apple has introduced many features, ranging from a supercharged new version of Spotlight and intelligent actions in Shortcuts to new Continuity and gaming-focused features for the Mac.

Here’s a recap of everything that Apple showed off today for macOS Tahoe.

Read more


Apple Intelligence Expands: Onscreen Visual Intelligence, Shortcuts, Third-Party Apps, and More

Source: Apple.


One of the big questions heading into today’s WWDC keynote was how Apple would address its AI efforts. After a splashy introduction last year followed by a staggered rollout and the eventual delay of the more personalized Siri, it was unclear how much focus the company would put on Apple Intelligence during its big announcement video.

Surprisingly, Apple came right out of the gate with a segment on Apple Intelligence, even going so far as to mention the fact that the more personalized Siri needed more time; it’s slated to be released “in the coming year.” But SVP of Software Engineering Craig Federighi also said that Apple Intelligence had progressed with more capable and efficient models and teased that more Apple Intelligence features would be revealed throughout the presentation. Rather than dedicating a significant portion of the keynote just to AI features, the company returned to a platform-centered structure for the rest of the video and mentioned Apple Intelligence as it related to each OS.

In its second year, Apple Intelligence is set to expand in more ways than one. Perhaps most excitingly, third-party developers will soon have access to Apple Intelligence’s on-device foundation model, enabling them to implement AI features in their apps that work offline in a privacy-respecting way. And because the framework is local, it will be available to developers at no additional cost with no API fees.

Read more


From the Creators of Shortcuts, Sky Extends AI Integration and Automation to Your Entire Mac

Sky for Mac.


Over the course of my career, I’ve had three distinct moments in which I saw a brand-new app and immediately felt it was going to change how I used my computer – and they were all about empowering people to do more with their devices.

I had that feeling the first time I tried Editorial, the scriptable Markdown text editor by Ole Zorn. I knew right away when two young developers told me about their automation app, Workflow, in 2014. And I couldn’t believe it when Apple showed that not only had they acquired Workflow, but they were going to integrate the renamed Shortcuts app system-wide on iOS and iPadOS.

Notably, the same two people – Ari Weinstein and Conrad Kramer – were involved with two of those three moments, first with Workflow, then with Shortcuts. And a couple of weeks ago, I found out that they were going to define my fourth moment, along with their co-founder Kim Beverett at Software Applications Incorporated, with the new app they’ve been working on in secret since 2023 and officially announced today.

For the past two weeks, I’ve been able to use Sky, the new app from the people behind Shortcuts who left Apple two years ago. As soon as I saw a demo, I felt the same way I did about Editorial, Workflow, and Shortcuts: I knew Sky was going to fundamentally change how I think about my macOS workflow and the role of automation in my everyday tasks.

Only this time, because of AI and LLMs, Sky is more intuitive than all those apps and requires a different approach, as I will explain in this exclusive preview story ahead of a full review of the app later this year.

Read more


Early Impressions of Claude Opus 4 and Using Tools with Extended Thinking

Claude Opus 4 and extended thinking with tools.


For the past two days, I’ve been testing an early access version of Claude Opus 4, the latest model by Anthropic that was just announced today. You can read more about the model in the official blog post and find additional documentation here. What follows is a series of initial thoughts and notes based on the 48 hours I spent with Claude Opus 4, which I tested in both the Claude app and Claude Code.

For starters, Anthropic describes Opus 4 as its most capable hybrid model with improvements in coding, writing, and reasoning. I don’t use AI for creative writing, but I have dabbled with “vibe coding” for a collection of personal Obsidian plugins (created and managed with Claude Code, following these tips by Harper Reed), and I’m especially interested in Claude’s integrations with Google Workspace and MCP servers. (My favorite solution for MCP at the moment is Zapier, which I’ve been using for a long time for web automations.) So I decided to focus my tests on reasoning with integrations and some light experiments with the upgraded Claude Code in the macOS Terminal.

Read more