
Posts tagged with "AI"

Google Releases Gemini for Mac

Google released a native Mac app for its Gemini chatbot today.

The app, which can be launched from your Applications folder, the Dock, the menu bar, or with a global hotkey, will be familiar to anyone who has used Gemini in a browser. The chatbot supports Gemini 3 in Fast and Thinking modes, as well as Pro mode, which uses Gemini 3.1 Pro. Gemini can also interact with files, the contents of a window, Google Drive, Photos, and NotebookLM. It’s multimodal, too, with support for the generation of text, images, video, and music. Dig a little deeper into Gemini’s menus and you’ll find support for Canvas, Deep Research, Guided Learning, and Personalized Intelligence.

A Gemini mini window is available from the menu bar and a global hotkey.

Even though I just downloaded the app a short time ago, my Gemini chat history was immediately available in the app. The history appears in the app’s sidebar along with a search field; My Stuff, which includes things like images and video generated in the past; and access to your account. The app is written in Swift, which was a pleasant surprise.

All my past prompts were immediately available in the new Gemini Mac app.

I’ve only just begun testing Gemini for Mac, but I can already tell that it’s a cut above my hand-crafted single-purpose Safari web app solution. All the same tools found on the web are here, but in a native wrapper, which I appreciate. If you use a Mac and Gemini, the new app is well worth giving a try.

Gemini for Mac is available as a free download from Google.


Introducing Apple Frames 4: A Revamped Shortcut, Support for Frame Colors, Proportional Scaling, and the Apple Frames CLI for Developers

Apple Frames 4.

Well, it’s been a minute.

Today, I’m very happy to introduce Apple Frames 4, a major update to my shortcut for framing screenshots taken on Apple devices with official Apple product bezels. Apple Frames 4 is a complete rethinking of the shortcut that is noticeably faster, updated to support all the latest Apple devices, and designed to support even more personalization options. For the first time ever, Apple Frames supports multiple colors for each device, allowing you to mix and match different colored bezels for each framed screenshot; it also supports proportional scaling when merging screenshots from different Apple devices.

But that’s not all. In addition to an updated shortcut, I’m also releasing the Apple Frames CLI, an open source command-line utility that lets developers and tinkerers automate the process of framing screenshots directly from the Mac’s Terminal. And there’s more: the Apple Frames CLI is also designed to work with AI agents, and it comes with a Claude Code/Codex skill that lets coding agents take care of framing dozens or even hundreds of screenshots in just a few seconds, from any folder on your Mac.

Apple Frames 4 is the result of an idea I had months ago that enabled me to remove more than 500 actions from the shortcut, going from over 800 steps down to ~300. I did all that work manually, but it was worth it; the improved shortcut is faster and vastly more reliable than before thanks to more intelligent logic that adapts to the growing ecosystem of Apple screen sizes and display resolutions.

Apple Frames 4 and the Apple Frames CLI represent a substantial step forward for screenshot automation, and I’ve been using both extensively for the past few weeks.

Let’s dive in.


Claude Mythos Preview Will Only Secure Part of the Internet

Yesterday, Anthropic announced Claude Mythos Preview, a new general-purpose model that it says is exceptionally good at finding security vulnerabilities in code. In fact, the model is so good that Anthropic has decided not to release Mythos Preview to the general public. Instead, it’s being released to a select group of companies that control OSes and other critical software.

Anthropic found thousands of vulnerabilities across every major OS and web browser with Mythos Preview, but used these three examples to illustrate their severity:

  • Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it;
  • It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem;
  • The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.

A lengthy Frontier Red Team report brings the receipts for security researchers with an in-depth look at what Mythos Preview uncovered and the step change that the new model represents over Opus 4.6:

For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

As part of a test, Mythos Preview also managed to escape its sandboxed environment, message the researcher conducting the test, and then, outside the parameters of the test, post about the exploit online.

The idea behind Project Glasswing, whose participants include Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, is to give them a head start at securing their systems before similar models emerge and are exploited for cyberattacks. If Mythos Preview’s capabilities are as Anthropic makes them out to be, this seems like the right approach. However, I do worry that with time, it could lead to a two-tier Internet where big tech companies operate in relative security thanks to tools like Mythos Preview, while those without access are left to swim with the sharks.


Roadtripping with ChatGPT Voice Mode

On Saturday, my wife Jennifer and I drove to Blowing Rock, a quaint little town in the Blue Ridge Mountains. We’d been there once before, but didn’t know the town well, so as we headed west I poked at the ChatGPT icon on my dashboard to give the app’s new CarPlay integration a try. I asked:

What activities would you recommend for a day trip to Blowing Rock, North Carolina?

What I got back was a short but good list of highlights including a hike, a visit to the Blowing Rock cliffside overlook, a few restaurants, a coffee shop, and some local shops. It was similar to a list of activities I’d looked up before we left using Claude. So far, so good.

I switched back to Apple Maps and was thinking I probably wouldn’t use ChatGPT in my car very often, but that it could come in handy for similar requests, when things got a little creepy. I explained to Jennifer that ChatGPT’s CarPlay feature was new, and I had been meaning to check it out all week. Then, just as I’d said I thought it had done a pretty good job, a voice interrupted. It was ChatGPT’s voice mode saying it was glad I liked it.

You see, just like a phone call doesn’t drop when you switch apps in CarPlay, neither does ChatGPT. I suppose I should have anticipated that the mic would remain live, but I didn’t. Nor did I notice the End button in the corner of the screen; I was driving, not studying the app’s UI.

I take it as a positive sign that I didn’t expect ChatGPT to follow me back to Apple Maps. I treat chatbots like I do any app. Give it some input, and you get an output. Close the app, and you’re done. It’s not my little robot buddy. It’s a tool like any other app.

Of course, that’s not how the voice modes of these chatbots are designed to work. Chats are meant to be an engaging back and forth. But having ChatGPT jump in on our one-on-one conversation while driving down the highway was too much. Suddenly, it felt like something else was in the car eavesdropping on us.

The experience was a good lesson in the balancing of utility and social norms around AI tools. Useful as they can be in some situations, their developers need to be more mindful of user expectations and provide better cues about how they work to avoid uncomfortable surprises. The recommendations we got from ChatGPT were good, but I also don’t expect it will get a second chance on our family road trips anytime soon.


OpenAI Bets Big on Building an Everything App

OpenAI is making a big bet. One as old as time – at least time as measured by the course of app history. Having abandoned Sora and SmutGPT, the company has put all of its chips on an everything app, raising $122 billion to build it and fund its other operations.

If you listen to AppStories, you know this is a topic that goes back to our earliest episodes. Everything apps, known more commonly these days as superapps, have beguiled companies big and small forever. The temptation of “what if we stuffed so much in our app that nobody would leave” is hard to resist, but often fails. Just ask Mark Zuckerberg.

OpenAI is up front about its ambitions:

As models become more capable, the limiting factor shifts from intelligence to usability. Users do not want disconnected tools. They want a single system that can understand intent, take action, and operate across applications, data, and workflows. Our superapp will bring together ChatGPT, Codex, browsing, and our broader agentic capabilities into one agent-first experience.

Maybe. Look, I think AI is one of the most significant innovations of my lifetime, but for my money, I also think this is a classic example of the mismatch between what users sometimes say they want and what companies want to hear.

However, I’m willing to entertain the idea that AI might be different. After all, it’s closer to a natural language OS than your typical productivity app in just enough ways that it may just work as a sort of super-layer that sits on top of “real” OSes like macOS, Windows, iOS, and Android.

Part of what OpenAI is imagining is straight out of the iOS playbook:

Our consumer scale becomes the front door for enterprise usage, as familiarity in daily life drives adoption at work.

I remember when my old law firm finally caved and swapped BlackBerrys for the iPhones its employees were demanding. So, it’s not unprecedented that consumer demand can drive enterprise adoption, but historically, it’s rare.

And, while I agree with OpenAI that “Moments like this do not come often,” its comparison of its product to electricity and highways strikes me as a bit much. Will the app that OpenAI is imagining be something that will fundamentally reshape your life or will it be just another thing that competes for your attention, like TikTok? That’s the $122 billion bet OpenAI is making, and based on my experience with everything apps, I’ll take the other side of that bet.


First Look: Hands-On with Claude Code’s New Telegram and Discord Integrations

Late yesterday, Anthropic announced messaging support for Claude Code, allowing users to connect to a Claude Code session running on a Mac from a mobile device using Telegram and Discord bots. I spent a few hours playing with it last night, and despite being released as a research preview, the messaging integration is already very capable, if a little fiddly to set up.

Let’s take a look at what it can do.


A Developer’s Month with OpenAI’s Codex

An eye-opening story from Steve Troughton-Smith, who tested Codex for a month and ended up rewriting a bunch of his apps and shipping versions for Windows and Android:

I spent one month battle-testing Codex 5.3, the latest model from OpenAI, since I was already paying for the $20 ChatGPT Plus plan and already had access to it at no additional cost, with task after task. It didn’t just blow away my expectations, it showed me the world has changed: we’ve just undergone a permanent, irreversible abstraction level shift. I think it will be nigh-impossible to convince somebody who grows up with this stuff that they should ever drop down and write code the old way, like we do, akin to trying to convince the average Swift developer to use assembly language.

From his conclusion:

This story is unfinished; this feels like a first foray into what software development will look like for the rest of my life. Transitioning from the instrument player to the conductor of the orchestra. I can acknowledge that this is both incredibly exciting, and deeply terrifying.

I have perused the source code of some of these projects, especially during the first few days. But very quickly I learned there’s simply nothing gained from that. Code is trivial, implementations are ephemeral, and something like Codex can chew through and rewrite a thousand lines of code in a second. Eventually, I just trusted it. Granted, I almost always had a handwritten source of truth, as detailed a spec as any, so it had patterns and structure to follow.

The models are good now. A year ago, none of them could do any of this, certainly not to this quality level. But they don’t do it alone. A ton of work went into everything here, just a different kind of work to before. Above all, what mattered most in all of the above examples was taste. My taste, the human touch. I fear for the companies, oblivious to this, that trade their priceless human resources for OpenClaw nodes in a box.

The entire story is well-documented, rich in screenshots, and full of practical details for developers who may want to attempt a similar experiment.

It’s undeniable that programming is undergoing a massive shift that has possibly already changed the profession forever. Knowing what code is and does is still essential; writing it by hand does not seem to be anymore. And it sounds like the developers who are embracing this shift are happier than ever.

I’ve been thinking about this a lot: why are some of us okay with the concept of AI displacing humans in writing code, but not so much when it comes to, say, writing prose or music? I certainly wouldn’t want AI to replace me writing this, and I absolutely cannot stand the whole concept of “AI music” (here’s a great Rick Beato video on the matter). I don’t think I have a good answer to this, but the closest I can get is: code was always a means to an end – an abstraction layer to get to the actual user experience of a digital artifact. It just so happened that humans created it and had to learn it first. With text and storytelling, the raw material is the art form itself: what you read is the experience itself. But even then, what happens when the human-sourced art form gets augmented by AI in ways that increasingly blur the lines between what is real and artificial? What happens when a videogame gets enhanced by DLSS 5 or an article is a hybrid mesh of human- and AI-generated text? I don’t have answers to these questions.

I find what’s happening to software development so scary and fascinating at the same time: developers are reinventing themselves as “orchestrators” of tools and following new agentic engineering patterns. The results, like with Steve’s story, are out there and speak for themselves. I wish more people in our community were willing to have nuanced and pragmatic conversations about it rather than blindly taking sides.


Comet Is the First Agentic Browser for iOS Worth Trying

Comet for iOS.

[Update: Perplexity has released an iPad version of Comet alongside the iPhone version, which you can install using the same App Store links below. However, because it wasn’t part of the TestFlight version of the app that we tested, we were unaware that it was launching with the iPhone version.]

For the past three weeks, I’ve been testing Comet, Perplexity’s cross-platform agentic web browser, on my iPhone Air. The iOS version of Comet, launching today on the App Store and (sadly) lacking an iPad counterpart, follows the expansion of Comet from macOS to Windows and Android devices, and it carries the inherent limitations of Apple’s platform. Comet for iOS is based on Safari’s WebKit engine; you cannot install third-party browser extensions due to iOS sandboxing restrictions; you can make Comet your default iOS browser, but in-app web views in third-party apps will still open with Safari View Controller, not Comet. By and large, Comet on iOS is a skin of Safari, but for the first time since the debut of Arc Search on iPhone two years ago (R.I.P.), I’m actually excited about an alternative to Safari on iOS once again.
