Posts tagged with "artificial intelligence"

Claude Mythos Preview Will Only Secure Part of the Internet

Yesterday, Anthropic announced Claude Mythos Preview, a new general-purpose model that it says is exceptionally good at finding security vulnerabilities in code. In fact, the model is so good that Anthropic has decided not to release Mythos Preview to the general public. Instead, it’s being released to a select group of companies that control OSes and other critical software.

Anthropic found thousands of vulnerabilities across every major OS and web browser with Mythos Preview and highlighted three examples to illustrate their severity:

  • Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it;
  • It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem;
  • The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.

A lengthy Frontier Red Team report brings the receipts for security researchers with an in-depth look at what Mythos Preview uncovered and the step change that the new model represents over Opus 4.6:

For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

As part of a test, Mythos Preview also managed to escape its sandboxed environment, message the researcher conducting the test, and then, outside the parameters of the test, post about the exploit online.

The idea behind Project Glasswing, whose participants include Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, is to give them a head start at securing their systems before similar models emerge and are exploited for cyberattacks. If Mythos Preview’s capabilities are as Anthropic makes them out to be, this seems like the right approach. However, I do worry that with time, it could lead to a two-tier Internet where big tech companies operate in relative security thanks to tools like Mythos Preview, while those without access are left to swim with the sharks.


Roadtripping with ChatGPT Voice Mode

On Saturday, my wife Jennifer and I drove to Blowing Rock, a quaint little town in the Blue Ridge Mountains. We’d been there once before, but didn’t know the town well, so as we headed west I poked at the ChatGPT icon on my dashboard to give the app’s new CarPlay integration a try. I asked:

What activities would you recommend for a day trip to Blowing Rock, North Carolina?

What I got back was a short but good list of highlights including a hike, a visit to the Blowing Rock cliffside overlook, a few restaurants, a coffee shop, and some local shops. It was similar to a list of activities I’d looked up before we left using Claude. So far, so good.

I switched back to Apple Maps, thinking I probably wouldn’t use ChatGPT in my car very often but that it could come in handy for similar requests. That’s when things got a little creepy. I explained to Jennifer that ChatGPT’s CarPlay feature was new and that I had been meaning to check it out all week. Then, just as I’d said I thought it had done a pretty good job, a voice interrupted. It was ChatGPT’s voice mode saying it was glad I liked it.

You see, just like a phone call doesn’t drop when you switch apps in CarPlay, neither does a ChatGPT voice session. I suppose I should have anticipated that the mic would remain live, but I didn’t. Nor did I notice the End button in the corner of the screen; I was driving, not studying the app’s UI.

I take it as a positive sign that I didn’t expect ChatGPT to follow me back to Apple Maps. I treat chatbots like I do any other app: you give them some input, and you get an output. Close the app, and you’re done. A chatbot isn’t my little robot buddy. It’s a tool like any other app.

Of course, that’s not how the voice modes of these chatbots are designed to work. Chats are meant to be an engaging back and forth. But having ChatGPT jump in on our one-on-one conversation while driving down the highway was too much. Suddenly, it felt like something else was in the car eavesdropping on us.

The experience was a good lesson in balancing the utility of AI tools against the social norms that surround them. Useful as these tools can be in some situations, their developers need to be more mindful of user expectations and provide better cues about how the tools work to avoid uncomfortable surprises. The recommendations we got from ChatGPT were good, but I don’t expect it will get a second chance on our family road trips anytime soon.


OpenAI Bets Big on Building an Everything App

OpenAI is making a big bet. One as old as time, at least as time is measured in app history. Having abandoned Sora and SmutGPT, the company has put all of its chips on an everything app, raising $122 billion to build it and fund its other operations.

If you listen to AppStories, you know this is a topic that goes back to our earliest episodes. Everything apps, known more commonly these days as superapps, have beguiled companies big and small forever. The temptation of “what if we stuffed so much into our app that nobody would ever leave” is hard to resist, but the attempts usually fail. Just ask Mark Zuckerberg.

OpenAI is up front about its ambitions:

As models become more capable, the limiting factor shifts from intelligence to usability. Users do not want disconnected tools. They want a single system that can understand intent, take action, and operate across applications, data, and workflows. Our superapp will bring together ChatGPT, Codex, browsing, and our broader agentic capabilities into one agent-first experience.

Maybe. Look, I think AI is one of the most significant innovations of my lifetime, but for my money, I also think this is a classic example of the mismatch between what users sometimes say they want and what companies want to hear.

However, I’m willing to entertain the idea that AI might be different. After all, it’s closer to a natural language OS than your typical productivity app in just enough ways that it may just work as a sort of super-layer that sits on top of “real” OSes like macOS, Windows, iOS, and Android.

Part of what OpenAI is imagining is straight out of the iOS playbook:

Our consumer scale becomes the front door for enterprise usage, as familiarity in daily life drives adoption at work.

I remember when my old law firm finally caved and swapped Blackberries for the iPhone its employees were demanding. So, it’s not unprecedented that consumer demand can drive enterprise adoption, but historically, it’s rare.

And, while I agree with OpenAI that “Moments like this do not come often,” its comparison of its product to electricity and highways strikes me as a bit much. Will the app that OpenAI is imagining be something that will fundamentally reshape your life or will it be just another thing that competes for your attention, like TikTok? That’s the $122 billion bet OpenAI is making, and based on my experience with everything apps, I’ll take the other side of that bet.


First Look: Hands-On with Claude Code’s New Telegram and Discord Integrations

Late yesterday, Anthropic announced messaging support for Claude Code, allowing users to connect to a Claude Code session running on a Mac from a mobile device using Telegram and Discord bots. I spent a few hours playing with it last night, and despite being released as a research preview, the messaging integration is already very capable, but a little fiddly to set up.

Let’s take a look at what it can do.

Read more


Apple Is Working on an AI Music Tagging System

Music Business Worldwide (via MacRumors) is reporting that Apple is rolling out Transparency Tags, a voluntary metadata system for identifying AI-generated content on Apple Music. Introduced by Apple in a newsletter sent to music industry partners, Transparency Tags is:

a system of disclosure labels that record labels and music distributors can begin applying to content delivered to Apple Music immediately, and will be required to use when delivering new content in [the] future.

According to Music Business Worldwide, the tagging system covers artwork, tracks, composition elements such as lyrics, and music videos. The publication quotes Apple’s newsletter as explaining that it views Transparency Tags as part of an initial effort toward giving the music industry what it needs to develop AI policies.
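Apple hasn’t published the underlying schema, but based on the categories Music Business Worldwide describes, it’s easy to imagine the shape of the disclosure payload a distributor might attach to a delivery. Here’s a minimal sketch in Swift; every type and field name below is a hypothetical stand-in of mine, not Apple’s actual spec:

```swift
import Foundation

// Hypothetical model of an AI-disclosure payload a distributor might attach
// to an Apple Music delivery. All names here are illustrative guesses; Apple
// hasn't published the actual Transparency Tags schema.
enum AIInvolvement: String, Codable {
    case none       // entirely human-made
    case assisted   // AI-assisted, human-directed
    case generated  // fully AI-generated
}

struct TransparencyTags: Codable {
    // The four content categories Music Business Worldwide says are covered.
    var artwork: AIInvolvement
    var audio: AIInvolvement
    var composition: AIInvolvement   // e.g., lyrics
    var musicVideo: AIInvolvement?   // only present if a video is delivered
}

let tags = TransparencyTags(
    artwork: .generated,
    audio: .assisted,
    composition: .none,
    musicVideo: nil
)

let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted
if let data = try? encoder.encode(tags),
   let json = String(data: data, encoding: .utf8) {
    print(json)
}
```

A simple per-category flag like this would be easy for labels and distributors to supply, which is presumably part of why Apple can require it for new deliveries.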

Although there are currently no consequences for failing to properly tag AI-generated music, Transparency Tags are a step in the right direction. The music industry and other creative industries are all grappling with how to deal with a flood of AI-generated content in a rapidly evolving environment. I don’t expect to see one approach sweep across industries any time soon, but it’s encouraging to see Apple taking a lead in pushing the conversation forward.



OpenClaw Showed Me What the Future of Personal AI Assistants Looks Like

Using OpenClaw via Telegram.

Update, February 6: I’ve published an in-depth guide with advanced tips for secure credentials, memory management, automations, and proactive work with OpenClaw for our Club members here.

For the past week or so, I’ve been working with a digital assistant that knows my name, my preferences for my morning routine, and how I like to use Notion and Todoist, and that also knows how to control Spotify, my Sonos speaker, my Philips Hue lights, and my Gmail. It runs on Anthropic’s Claude Opus 4.5 model, but I chat with it using Telegram. I called the assistant Navi (inspired by the fairy companion of Ocarina of Time, not the besieged alien race in James Cameron’s sci-fi film saga), and Navi can even receive audio messages from me and respond with audio messages of its own, generated with the latest ElevenLabs text-to-speech model. Oh, and did I mention that Navi can improve itself with new features and that it’s running on my own M4 Mac mini server?

If this intro just gave you whiplash, imagine my reaction when I first started playing around with OpenClaw, the incredible open-source project by Peter Steinberger (a name that should be familiar to longtime MacStories readers) that’s become very popular in certain AI communities over the past few weeks. I kept seeing OpenClaw being mentioned by people I follow; eventually, I gave in to peer pressure, followed the instructions provided by the funny crustacean mascot on the app’s website, installed OpenClaw on my new M4 Mac mini (which is not my main production machine), and connected it to Telegram.

To say that OpenClaw has fundamentally altered my perspective on what it means to have an intelligent, personal AI assistant in 2026 would be an understatement. I’ve been playing around with OpenClaw so much that I’ve burned through 180 million tokens on the Anthropic API (yikes), and I’ve had fewer and fewer conversations with the “regular” Claude and ChatGPT apps in the process. Don’t get me wrong: OpenClaw is a nerdy project, a tinkerer’s laboratory that is not poised to overtake the popularity of consumer LLMs any time soon. Still, OpenClaw points at a fascinating future for digital assistants, and it’s exactly the kind of bleeding-edge project that MacStories readers will appreciate.
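For the curious, the core pattern behind a setup like this is simpler than it sounds: a loop that polls Telegram for messages, forwards each one to a model, and sends the reply back. Here’s a heavily simplified sketch of that bridge pattern in Swift. To be clear, this is my own illustration, not OpenClaw’s actual code, and the model ID and environment variable names are assumptions:

```swift
import Foundation

// A minimal Telegram <-> Claude bridge (run as main.swift, Swift 5.7+):
// poll the Telegram Bot API for messages, forward each one to the Anthropic
// Messages API, and send the reply back. Not OpenClaw's code; the model ID
// and environment variable names are assumptions.
let telegramToken = ProcessInfo.processInfo.environment["TELEGRAM_TOKEN"]!
let anthropicKey = ProcessInfo.processInfo.environment["ANTHROPIC_API_KEY"]!

// POST a JSON body and decode the JSON response into a dictionary.
func postJSON(_ url: URL, headers: [String: String] = [:], body: [String: Any]) async throws -> [String: Any] {
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    for (name, value) in headers { request.setValue(value, forHTTPHeaderField: name) }
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONSerialization.jsonObject(with: data) as? [String: Any] ?? [:]
}

// Send one user message to Claude and return the text of its reply.
func askClaude(_ text: String) async throws -> String {
    let response = try await postJSON(
        URL(string: "https://api.anthropic.com/v1/messages")!,
        headers: ["x-api-key": anthropicKey, "anthropic-version": "2023-06-01"],
        body: [
            "model": "claude-opus-4-5",  // assumed ID for Claude Opus 4.5
            "max_tokens": 1024,
            "messages": [["role": "user", "content": text]]
        ]
    )
    let blocks = response["content"] as? [[String: Any]] ?? []
    return blocks.first?["text"] as? String ?? "(no reply)"
}

// Long-poll Telegram and answer each incoming text message in turn.
var offset = 0
while true {
    let updates = try await postJSON(
        URL(string: "https://api.telegram.org/bot\(telegramToken)/getUpdates")!,
        body: ["timeout": 30, "offset": offset]
    )
    for update in updates["result"] as? [[String: Any]] ?? [] {
        offset = (update["update_id"] as? Int ?? offset) + 1
        guard let message = update["message"] as? [String: Any],
              let chatID = (message["chat"] as? [String: Any])?["id"] as? Int,
              let text = message["text"] as? String else { continue }
        let reply = try await askClaude(text)
        _ = try await postJSON(
            URL(string: "https://api.telegram.org/bot\(telegramToken)/sendMessage")!,
            body: ["chat_id": chatID, "text": reply]
        )
    }
}
```

Everything that makes OpenClaw feel magical, like memory, tool use, audio, and self-improvement, layers on top of a loop shaped roughly like this one.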

Read more


Apple Confirms AI Partnership with Google

Apple has confirmed to CNBC that it has entered into a multi-year partnership with Google to use the search giant’s models and cloud technology for its own AI efforts. According to an unnamed Apple spokesperson:

After careful evaluation, we determined that Google’s technology provides the most capable foundation for Apple Foundation Models and we’re excited about the innovative new experiences it will unlock for our users.

The report still leaves many questions unanswered, including how Gemini fits in with Apple’s own Foundation Models and whether and to what extent Apple will rely on Google hardware. However, after months of speculation and reports from Mark Gurman at Bloomberg that Apple and Google were negotiating, it looks like we’re on the cusp of Apple’s AI strategy coming into better focus.


UPDATE:

After Apple’s statement to CNBC, the two companies released a slightly more detailed joint statement, which Google published on X:

Apple and Google have entered into a multi-year collaboration under which the next generation of Apple Foundation Models will be based on Google’s Gemini models and cloud technology. These models will help power future Apple Intelligence features, including a more personalized Siri coming this year.

After careful evaluation, Apple determined that Google’s AI technology provides the most capable foundation for Apple Foundation Models and is excited about the innovative new experiences it will unlock for Apple users. Apple Intelligence will continue to run on Apple devices and Private Cloud Compute, while maintaining Apple’s industry-leading privacy standards.

So, while the Apple Foundation Models that power Apple Intelligence will be based on Gemini and unspecified cloud technology, Apple Intelligence features themselves, including a more personalized Siri, will continue to run locally on Apple devices and on Apple’s Private Cloud Compute to maintain user privacy.


How I Revived My Decade-Old App with Claude Code

Blink from 2017 (left) and 2026 (right).

Every holiday season, Federico and I spend our downtime on nerd projects. This year, both of us spent a lot of that time building tools for ourselves with Claude Code in what developed into a bit of a competition as we each tried to one-up the other’s creations. We’ll have more on what we’ve been up to on AppStories, MacStories, and for Club members soon, but today, I wanted to share an experiment I ran last night that I think captures a very personal and potentially far-reaching slice of what tools like Claude Code can enable.

Blink from 2017 running on a modern iPhone.

Before I wrote at MacStories, I made a few apps, including Blink, which generated affiliate links for Apple’s media services. The app had a good run from 2015 to 2017, but I pulled it from the App Store when Apple ended its affiliate program for apps because affiliate links were the feature people used most. Since then, the project has sat in a private GitHub repo, untouched.
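For context, the heart of what Blink did was straightforward: take an iTunes Store or Apple Music URL and append an affiliate token from Apple’s Performance Partners program as the at query parameter. A minimal sketch of the idea in Swift, with placeholder token and campaign values:

```swift
import Foundation

// Append an Apple affiliate token to a store URL, the essence of what Blink
// did. The `at` parameter carried the affiliate token and `ct` an optional
// campaign label in Apple's affiliate program; "11lXXX" is a placeholder.
func affiliateLink(for storeURL: URL, token: String = "11lXXX", campaign: String? = "blink") -> URL? {
    guard var components = URLComponents(url: storeURL, resolvingAgainstBaseURL: false) else { return nil }
    var items = components.queryItems ?? []
    items.removeAll { $0.name == "at" || $0.name == "ct" }  // avoid duplicate tokens
    items.append(URLQueryItem(name: "at", value: token))
    if let campaign { items.append(URLQueryItem(name: "ct", value: campaign)) }
    components.queryItems = items
    return components.url
}

let url = URL(string: "https://music.apple.com/us/album/abbey-road/1441164426")!
print(affiliateLink(for: url)!.absoluteString)
// https://music.apple.com/us/album/abbey-road/1441164426?at=11lXXX&ct=blink
```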

Last night, I was sitting on the couch working on a Safari web extension when I opened GitHub and saw that old Blink code, which sparked a thought. I wondered whether Claude Code could update Blink to use Swift and SwiftUI with minimal effort on my part. I don’t have any intention of re-releasing Blink, but I couldn’t shake the “what if” rattling in my head, so I cloned the repo and put Claude to work.

Read more