

Posts tagged with "automation"

How I Used Claude to Build a Transcription Bot that Learns From Its Mistakes

Step 1: Transcribe with parakeet-mlx.

[Update: Due to the way parakeet-mlx handles transcript timeline synchronization, which can result in caption timing issues, this workflow has been reverted to use the Apple Speech framework. Otherwise, the workflow remains the same as described below.]

When I started transcribing AppStories and MacStories Unwind three years ago, it was something I had wanted to do for a long time, but the tools had always been either too inaccurate or too expensive. That changed with OpenAI’s Whisper, an open-source speech-to-text model that blew away the other readily available options.

Still, the results weren’t good enough to publish those transcripts anywhere. Instead, I kept them as text-searchable archives to make it easier to find and link to old episodes.

Since then, a cottage industry of apps has arisen around Whisper transcription. Some of those tools do a very good job with what is now an aging model, but I have never been satisfied with their accuracy or speed. However, when we began publishing our podcasts as videos, I knew it was finally time to start generating transcripts because as inaccurate as Whisper is, YouTube’s automatically generated transcripts are far worse.

VidCap in action.

My first stab at video transcription was to use apps like VidCap and MacWhisper. After a transcript was generated, I’d run it through MassReplaceIt, a Mac app that lets you create and apply a huge dictionary of spelling corrections using a bulk find-and-replace operation. As I found errors in AI transcriptions by manually skimming them, I’d add those corrections to my dictionary. As a result, the transcriptions improved over time, but it was a cumbersome process that relied on me spotting errors, and I didn’t have time to do more than scan through each transcript quickly.
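That dictionary-driven pass is easy to picture in code. Here's a minimal sketch of the idea in Python; the dictionary entries are made-up examples, and MassReplaceIt's actual dictionary format is its own:

```python
import re

# Hypothetical corrections dictionary; MassReplaceIt stores its own format.
CORRECTIONS = {
    "Mac Stories": "MacStories",
    "App Stories": "AppStories",
}

def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Apply every known misspelling fix in a single regex pass."""
    # Sort keys longest-first so overlapping entries don't clobber each other.
    pattern = re.compile(
        "|".join(re.escape(k) for k in sorted(corrections, key=len, reverse=True))
    )
    return pattern.sub(lambda m: corrections[m.group(0)], text)

print(apply_corrections("Welcome to App Stories on Mac Stories.", CORRECTIONS))
# Welcome to AppStories on MacStories.
```

The single compiled pattern matters once the dictionary grows large: one pass over the text beats re-scanning it once per entry.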

That’s why I was so enthusiastic about the speech APIs that Apple introduced last year at WWDC. The accuracy wasn’t any better than Whisper’s, and in some circumstances it was worse, but the framework was fast, which I appreciate given the many steps needed to get a YouTube video published.

The process was sped up considerably when Claude Skills were released. A skill can combine a script with instructions to create a hybrid automation with both the deterministic outcome of scripting and the fuzzy analysis of LLMs.
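Concretely, a skill is a folder containing a SKILL.md file – instructions in Markdown with a short metadata header – alongside any scripts it references. A hypothetical layout for a transcript-cleanup skill might look like this (the name, wording, and script path here are illustrative, not the author's actual skill):

```markdown
---
name: transcript-cleanup
description: Clean a podcast transcript by applying known spelling corrections, then flag likely new misspellings for review.
---

1. Run `scripts/clean_transcript.py` on the attached transcript to apply the corrections dictionary.
2. Read the cleaned output and flag any remaining words that look like phonetic misspellings of proper nouns.
3. For each flagged word, show the surrounding sentence and ask whether to correct it and whether to add it to the dictionary.
```

The script handles the deterministic half; the numbered instructions steer the model through the fuzzy half.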

Transcribing with yap.

I’d run yap, a command line tool that transcribes videos using Apple’s speech-to-text framework. Next, I’d open the Claude app, attach the resulting transcript, and invoke a skill that ran a script to replace known spelling errors. Then, Claude would analyze the text against its knowledge base, looking for other likely misspellings. When it found one, Claude would reply with some textual context, asking whether the proposed change should be made. After I responded, Claude would further improve my transcript, and I’d tell Claude which of its suggestions to add to the script’s dictionary, improving the results a little each time I used the skill.
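The "learning" step – folding confirmed fixes back into the dictionary so future runs catch them automatically – can be sketched like this, assuming the corrections live in a simple JSON file (the file name and example entry are hypothetical):

```python
import json
from pathlib import Path

# Hypothetical location for the skill's persisted corrections dictionary.
DICT_PATH = Path("corrections.json")

def add_corrections(new_entries: dict[str, str], path: Path = DICT_PATH) -> dict[str, str]:
    """Merge user-confirmed fixes into the saved dictionary and return the result."""
    corrections = json.loads(path.read_text()) if path.exists() else {}
    corrections.update(new_entries)
    path.write_text(json.dumps(corrections, indent=2, ensure_ascii=False))
    return corrections

# After confirming one of Claude's suggestions (made-up misspelling):
merged = add_corrections({"App Storys": "AppStories"})
```

Because the file is re-read on every run, corrections confirmed in one session apply deterministically to every transcript that follows.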

Over the holidays, I refined my skill further and moved it from the Claude app to the Terminal. The first change was to move to parakeet-mlx, an Apple silicon-optimized version of NVIDIA’s Parakeet model that was released last summer. Parakeet isn’t as fast as Apple’s speech APIs, but it’s more accurate, and crucially, its mistakes are closer to the right answers phonetically than the ones made by Apple’s tools. Consequently, Claude is more likely to find mistakes that aren’t in my dictionary of misspellings in its final review.

Managing the built-in corrections dictionary.

With Claude Opus 4.5’s assistance, I rebuilt the Python script at the heart of my Claude skill to run videos through parakeet-mlx, saving the results as either a .srt or .txt file (or both) in the same location as the original file but prepended with “CLEANED TRANSCRIPT.” Because Claude Code can run scripts and access local files from Terminal, the transition to the final fuzzy pass for errors is seamless. Claude asks permission to access the cleaned transcript file that the script creates and then generates a report with suggested changes.
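The output-naming convention is simple to sketch. Assuming the script builds its file names along these lines (the exact separator and naming in the real script may differ):

```python
from pathlib import Path

def cleaned_output_paths(video: Path, formats: tuple[str, ...] = ("srt", "txt")) -> list[Path]:
    """Build sibling output paths, next to the source video, prefixed with 'CLEANED TRANSCRIPT'."""
    return [video.with_name(f"CLEANED TRANSCRIPT {video.stem}.{ext}") for ext in formats]

paths = cleaned_output_paths(Path("/Videos/AppStories 123.mp4"))
# .srt and .txt files named "CLEANED TRANSCRIPT AppStories 123" in /Videos
```

Keeping the outputs beside the source video means Claude Code only needs permission for one folder to pick up where the script left off.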

A list of obscure words Claude suggested changing. Every one was correct.

The last step is for me to confirm which suggested changes should be made and which should be added to the dictionary of corrections. The whole process takes just a couple of minutes, and it’s worth the effort. For the last episode of AppStories, the script found and corrected 27 errors, many of which were misspellings of our names, our podcasts, and MacStories. The final pass by Claude caught seven more issues, ranging from a misspelling of the band name Deftones to Susvara, a model of headphones, and Bazzite, an open-source SteamOS project. Those are far from everyday words, but now, not only are their misspellings fixed in the latest episode of AppStories, they’re also in the dictionary, where they’ll always be corrected whether Claude’s analysis catches them or not.

Claude even figured out “goti” was a reference to GOTY (Game of the Year).

I’ve used this same pattern over and over again. I have Claude build me a reliable, deterministic script that helps me work more efficiently; then, I layer in a bit of generative analysis to improve the script in ways that would be impossible or incredibly complex to code deterministically. Here, that generative “extra” looks for spelling errors. Elsewhere, I use it to do things like rank items in a database based on a natural language prompt. It’s an additional pass that elevates the workflow beyond what was possible when I was using a find-and-replace app and, later, a simple dictionary check whose entries I added manually. The idea behind my transcription cleanup workflow has been the same since the beginning, but boy, have the tools improved the results since I first used Whisper three years ago.


Two Months with the Narwal Freo X10 Pro

In the depths of the pandemic, I bought an iRobot Roomba j7 vacuum. At the time, it was one of the nicer models iRobot offered, but it was expensive. It did a passable job in areas with few obstacles, but it filled up fast, had a hard time positioning itself on its base, and frequently got clogged with debris, requiring me to partially disassemble and clean it regularly. The experience was bad enough that I’d written off robot vacuums as nice-to-have appliances that weren’t a great value.

So, when Narwal contacted me to see if I wanted to test its new Freo X10 Pro, I was hesitant at first. However, I’d seen a couple of glowing early reviews online, so I thought I’d see whether the passage of time had been good to robot vacuums, and boy, has it. The Narwal Freo X10 Pro is not only an excellent vacuum cleaner but a mopping champ, too.

Read more


Sky Acquired by OpenAI

Source: OpenAI

Sky, the AI automation app that Federico previewed for MacStories readers in May, has been acquired by OpenAI.

Nick Turley, OpenAI’s Vice President and Head of ChatGPT, said of the deal in an OpenAI press release:

We’re building a future where ChatGPT doesn’t just respond to your prompts, it helps you get things done. Sky’s deep integration with the Mac accelerates our vision of bringing AI directly into the tools people use every day.

I’m not surprised by this development at all. OpenAI, Anthropic, and Perplexity have all been developing features similar to what Sky could do for a while now. In addition, Sam Altman was an investor in Software Applications Incorporated, the company behind Sky.

Ari Weinstein of Software Applications Incorporated, a co-founder of Workflow, the app that Apple later acquired and turned into Shortcuts, said of the acquisition:

We’ve always wanted computers to be more empowering, customizable, and intuitive. With LLMs, we can finally put the pieces together. That’s why we built Sky, an AI experience that floats over your desktop to help you think and create. We’re thrilled to join OpenAI to bring that vision to hundreds of millions of people.

It’s not entirely clear what will become of Sky at this point. OpenAI’s press release simply states that the company will be working on integrating Sky’s capabilities.


LLMs As Conduits for Data Portability Between Apps

One of the unsung benefits of modern LLMs – especially those with MCP support or proprietary app integrations – is their inherent ability to facilitate data transfer between apps and services that use different data formats.

This is something I’ve been pondering for the past few months, and the latest episode of Cortex – where Myke wished it were possible to move between task managers the way you can with email clients – was the push I needed to write something up. I’ve personally tackled multiple versions of this concept with different LLMs, and the end result was always the same: I didn’t have to write a single line of code to create import/export functionality that the two services I wanted to use didn’t support out of the box.
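As a toy illustration of the kind of throwaway converter an LLM writes on the fly, here's a sketch that translates between two hypothetical task-manager formats – a JSON export from "service A" into a line-based import format for "service B." Neither schema corresponds to a real app's export:

```python
import json

# Hypothetical export from "service A": a JSON list of tasks.
export_a = json.dumps([
    {"title": "Write show notes", "due": "2025-01-10", "done": False},
    {"title": "Publish episode", "due": "2025-01-11", "done": True},
])

def to_service_b(export_json: str) -> str:
    """Convert service A's JSON export into service B's line-based import format."""
    lines = []
    for task in json.loads(export_json):
        status = "x" if task["done"] else " "
        lines.append(f"[{status}] {task['title']} @{task['due']}")
    return "\n".join(lines)

print(to_service_b(export_a))
# [ ] Write show notes @2025-01-10
# [x] Publish episode @2025-01-11
```

The point isn't this particular mapping; it's that the LLM can infer both schemas from sample files and produce the glue code in one shot, which is what makes it a practical conduit between services with no shared import/export support.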

Read more


One Month with the Aqara G410 Video Doorbell

Last month, after an advance preview at CES back in January, Aqara released an update to its G4 smart video doorbell dubbed the Doorbell Camera Hub G410 Select. I had been keeping an eye out for this release ever since its announcement, and it happened to coincide with the passing of my existing smart doorbell from Netatmo. That was more than enough reason to purchase the G410, and over a month of daily use, I’ve been enjoying several of the camera’s excellent new features while also wishing for improvements in other areas.

Read more


Philips Hue Adds Flexibility to the Play Line with New Wall Washer Lights

For the past couple of weeks, I’ve been testing a pair of Philips Hue Play wall washer lights along with a Play HDMI sync box 8K that the company sent me to test. The wall washer lights are a new and interesting approach to accent lighting for the Hue Play line that I like a lot, but they also come with a premium price tag, so it’s worth taking a close look at what they offer.

Philips Hue’s Play wall washer lights. Source: Philips Hue.

I’ve been using Philips Hue Play lights for a while. I have two Play gradient light tubes in my office; one sits behind a shelf on my desk, providing a backlight to my work environment, while the other is on the top of a tall bookshelf, illuminating what would otherwise be a dark corner of the room. I typically set them to a natural light color using Adaptive Lighting in Apple’s Home app, but they can do fancy gradient colors, too, which can be a fun way to mix things up.

A more traditional Play wall washer setup than mine. Source: Philips Hue.

But the downside of tube lights is that they take up a lot of horizontal space. That’s where the new wall washer lights come in. They’re cylindrical with a vertical and angled slice taken out of one side, which is where the LEDs are located. Most notably, though, at around six inches tall by a little more than three inches wide, the wall washers work in a much wider variety of places than tube lights. That compact footprint has been perfect for fitting behind my TV, where I’ve already crammed gaming consoles, a Wi-Fi router, and other gear.

Other highlights of the Hue Play wall washers include:

  • ColorCast, Philips Hue’s term for the way the wall washers generate highly saturated multi-colored gradients,
  • 1035 lumens of light, which is impressive for such a small device, and
  • the ability to display white light in a wide 2000–6500 Kelvin range.

The Play wall washers require a Hue Bridge and are compatible with HomeKit, allowing you to use either the Hue app or the Home app to turn them on and off, dim them, and change their colors.

Read more


My Latest Mac Automation Tool is a Tiny Game Controller

Source: 8BitDo.

I never expected my game controller obsession to pay automation dividends, but it did last week in the form of the tiny 16-button 8BitDo Micro. For the past week, I’ve used the Micro to dictate on my Mac, interact with AI chatbots, and record and edit podcasts. While the setup won’t replace a Stream Deck or Logitech Creative Console for every use case, it excels in areas where those devices don’t because it fits comfortably in the palm of your hand and costs a fraction of their price.

My experiments started when I read a story on Endless Mode by Nicole Carpenter, who explained how medical students turned to two tiny 8BitDo game controllers to help with their studies. The students were using an open-source flashcard app called Anki and ran into an issue while spending long hours with their flashcards:

The only problem is that using Anki from a computer isn’t too ergonomic. You’re hunched over a laptop, and your hands start cramping from hitting all the different buttons on your keyboard. If you’re studying thousands of cards a day, it becomes a real problem—and no one needs to make studying even more intense than it already is.

To relieve the strain on their hands, the med students turned to 8BitDo’s tiny Micro and Zero 2 controllers, using them as remote controls for the Anki app. The story didn’t explain how 8BitDo’s controllers worked with Anki, but as I read it, I thought to myself, “Surely this isn’t something that was built into the app,” which immediately drew me deeper into the world of 8BitDo controllers as study aids.

8BitDo markets the Micro’s other uses, but for some reason, it hasn’t spread much beyond the world of medical school students. Source: 8BitDo.

As I suspected, the 8BitDo Micro works just as well with any app that supports keyboard shortcuts as it does with Anki. What’s curious, though, is that even though medical students have been using the Micro and Zero 2 with Anki for several years and 8BitDo’s website includes a marketing image of someone using the Micro with Clip Studio Paint on an iPad, word of the Micro’s automation capabilities hasn’t spread much. That’s something I’d like to help change.

Read more


Testing AirPods 4’s Beta Update and Improved Recording Quality for Voice Notes

Earlier today, I updated my AirPods 4’s firmware to the beta version, which Apple released yesterday. I was curious to play around with the software update for two reasons:

  1. AirPods are getting support for automatically pausing media playback when you fall asleep, and
  2. Apple is advertising improved “studio quality” recording on AirPods 4 and AirPods Pro 2 with this update.

I’ll cut to the chase: while I haven’t been able to test sleep detection yet since I don’t take naps during the day, I think Apple delivered on its promise of improved voice recordings with AirPods.

Read more


From the Creators of Shortcuts, Sky Extends AI Integration and Automation to Your Entire Mac

Sky for Mac.

Over the course of my career, I’ve had three distinct moments in which I saw a brand-new app and immediately felt it was going to change how I used my computer – and they were all about empowering people to do more with their devices.

I had that feeling the first time I tried Editorial, the scriptable Markdown text editor by Ole Zorn. I knew right away when two young developers told me about their automation app, Workflow, in 2014. And I couldn’t believe it when Apple showed that not only had they acquired Workflow, but they were going to integrate the renamed Shortcuts app system-wide on iOS and iPadOS.

Notably, the same two people – Ari Weinstein and Conrad Kramer – were involved with two of those three moments, first with Workflow, then with Shortcuts. And a couple of weeks ago, I found out that they were going to define my fourth moment, along with their co-founder Kim Beverett at Software Applications Incorporated, with the new app they’ve been working on in secret since 2023 and officially announced today.

For the past two weeks, I’ve been able to use Sky, the new app from the people behind Shortcuts who left Apple two years ago. As soon as I saw a demo, I felt the same way I did about Editorial, Workflow, and Shortcuts: I knew Sky was going to fundamentally change how I think about my macOS workflow and the role of automation in my everyday tasks.

Only this time, because of AI and LLMs, Sky is more intuitive than all those apps and requires a different approach, as I will explain in this exclusive preview story ahead of a full review of the app later this year.

Read more