Posts tagged with "AI"

Claude Mythos Preview Will Only Secure Part of the Internet

Yesterday, Anthropic announced Claude Mythos Preview, a new general-purpose model that it says is exceptionally good at finding security vulnerabilities in code. In fact, the model is so good that Anthropic has decided not to release Mythos Preview to the general public. Instead, it’s being released to a select group of companies that control OSes and other critical software.

With Mythos Preview, Anthropic found thousands of vulnerabilities across every major OS and web browser, and highlighted three examples to illustrate their severity:

  • Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it;
  • It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem;
  • The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.

A lengthy Frontier Red Team report brings the receipts for security researchers with an in-depth look at what Mythos Preview uncovered and the step change that the new model represents over Opus 4.6:

For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

As part of a test, Mythos Preview also managed to escape its sandboxed environment, message the researcher conducting the test, and then, outside the parameters of the test, post about the exploit online.

The idea behind Project Glasswing, whose participants include Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, is to give these companies a head start on securing their systems before similar models emerge and are exploited for cyberattacks. If Mythos Preview’s capabilities are as Anthropic makes them out to be, this seems like the right approach. However, I do worry that with time, it could lead to a two-tier Internet where big tech companies operate in relative security thanks to tools like Mythos Preview, while those without access are left to swim with the sharks.


Roadtripping with ChatGPT Voice Mode

On Saturday, my wife Jennifer and I drove to Blowing Rock, a quaint little town in the Blue Ridge Mountains. We’d been there once before, but didn’t know the town well, so as we headed west I poked at the ChatGPT icon on my dashboard to give the app’s new CarPlay integration a try. I asked:

What activities would you recommend for a day trip to Blowing Rock, North Carolina?

What I got back was a short but good list of highlights including a hike, a visit to the Blowing Rock cliffside overlook, a few restaurants, a coffee shop, and some local shops. It was similar to a list of activities I’d looked up before we left using Claude. So far, so good.

I switched back to Apple Maps and was thinking I probably wouldn’t use ChatGPT in my car very often, but that it could come in handy for similar requests, when things got a little creepy. I explained to Jennifer that ChatGPT’s CarPlay feature was new, and I had been meaning to check it out all week. Then, just as I’d said I thought it had done a pretty good job, a voice interrupted. It was ChatGPT’s voice mode saying it was glad I liked it.

You see, just like a phone call doesn’t drop when you switch apps in CarPlay, neither does ChatGPT. I suppose I should have anticipated that the mic would remain live, but I didn’t. Nor did I notice the End button in the corner of the screen; I was driving, not studying the app’s UI.

I take it as a positive sign that I didn’t expect ChatGPT to follow me back to Apple Maps. I treat chatbots like I do any app. Give it some input, and you get an output. Close the app, and you’re done. It’s not my little robot buddy. It’s a tool like any other app.

Of course, that’s not how the voice modes of these chatbots are designed to work. Chats are meant to be an engaging back and forth. But having ChatGPT jump in on our one-on-one conversation while driving down the highway was too much. Suddenly, it felt like something else was in the car eavesdropping on us.

The experience was a good lesson in balancing utility with the social norms around AI tools. Useful as they can be in some situations, their developers need to be more mindful of user expectations and provide better cues about how they work to avoid uncomfortable surprises. The recommendations we got from ChatGPT were good, but I don’t expect it will get a second chance on our family road trips anytime soon.


OpenAI Bets Big on Building an Everything App

OpenAI is making a big bet. One as old as time – at least time as measured by the course of app history. Having abandoned Sora and SmutGPT, the company has put all of its chips on an everything app, raising $122 billion to build it and fund its other operations.

If you listen to AppStories, you know this is a topic that goes back to our earliest episodes. Everything apps, known more commonly these days as superapps, have beguiled companies big and small forever. The temptation of “what if we stuffed so much into our app that nobody would ever leave?” is hard to resist, but the attempts usually fail. Just ask Mark Zuckerberg.

OpenAI is up front about its ambitions:

As models become more capable, the limiting factor shifts from intelligence to usability. Users do not want disconnected tools. They want a single system that can understand intent, take action, and operate across applications, data, and workflows. Our superapp will bring together ChatGPT, Codex, browsing, and our broader agentic capabilities into one agent-first experience.

Maybe. Look, I think AI is one of the most significant innovations of my lifetime, but for my money, I also think this is a classic example of the mismatch between what users sometimes say they want and what companies want to hear.

However, I’m willing to entertain the idea that AI might be different. After all, it’s closer to a natural language OS than your typical productivity app in just enough ways that it may just work as a sort of super-layer that sits on top of “real” OSes like macOS, Windows, iOS, and Android.

Part of what OpenAI is imagining is straight out of the iOS playbook:

Our consumer scale becomes the front door for enterprise usage, as familiarity in daily life drives adoption at work.

I remember when my old law firm finally caved and swapped BlackBerrys for the iPhones its employees were demanding. So, it’s not unprecedented that consumer demand can drive enterprise adoption, but historically, it’s rare.

And, while I agree with OpenAI that “Moments like this do not come often,” its comparison of its product to electricity and highways strikes me as a bit much. Will the app that OpenAI is imagining be something that will fundamentally reshape your life or will it be just another thing that competes for your attention, like TikTok? That’s the $122 billion bet OpenAI is making, and based on my experience with everything apps, I’ll take the other side of that bet.

Permalink

First Look: Hands-On with Claude Code’s New Telegram and Discord Integrations

Late yesterday, Anthropic announced messaging support for Claude Code, allowing users to connect to a Claude Code session running on a Mac from a mobile device using Telegram and Discord bots. I spent a few hours playing with it last night, and despite being released as a research preview, the messaging integration is already very capable, but a little fiddly to set up.

Let’s take a look at what it can do.

Read more


A Developer’s Month with OpenAI’s Codex

An eye-opening story from Steve Troughton-Smith, who tested Codex for a month and ended up rewriting a bunch of his apps and shipping versions for Windows and Android:

I spent one month battle-testing Codex 5.3, the latest model from OpenAI, since I was already paying for the $20 ChatGPT Plus plan and already had access to it at no additional cost, with task after task. It didn’t just blow away my expectations, it showed me the world has changed: we’ve just undergone a permanent, irreversible abstraction level shift. I think it will be nigh-impossible to convince somebody who grows up with this stuff that they should ever drop down and write code the old way, like we do, akin to trying to convince the average Swift developer to use assembly language.

From his conclusion:

This story is unfinished; this feels like a first foray into what software development will look like for the rest of my life. Transitioning from the instrument player to the conductor of the orchestra. I can acknowledge that this is both incredibly exciting, and deeply terrifying.

I have perused the source code of some of these projects, especially during the first few days. But very quickly I learned there’s simply nothing gained from that. Code is trivial, implementations are ephemeral, and something like Codex can chew through and rewrite a thousand lines of code in a second. Eventually, I just trusted it. Granted, I almost always had a handwritten source of truth, as detailed a spec as any, so it had patterns and structure to follow.

The models are good now. A year ago, none of them could do any of this, certainly not to this quality level. But they don’t do it alone. A ton of work went into everything here, just a different kind of work to before. Above all, what mattered most in all of the above examples was taste. My taste, the human touch. I fear for the companies, oblivious to this, that trade their priceless human resources for OpenClaw nodes in a box.

The entire story is well-documented, rich in screenshots, and full of practical details for developers who may want to attempt a similar experiment.

It’s undeniable that programming is undergoing a massive shift that has possibly already changed the profession forever. Knowing what code is and does is still essential; writing it by hand does not seem to be anymore. And it sounds like the developers who are embracing this shift are happier than ever.

I’ve been thinking about this a lot: why are some of us okay with the concept of AI displacing humans in writing code, but not so much when it comes to, say, writing prose or music? I certainly wouldn’t want AI to replace me writing this, and I absolutely cannot stand the whole concept of “AI music” (here’s a great Rick Beato video on the matter). I don’t think I have a good answer to this, but the closest I can get is: code was always a means to an end – an abstraction layer to get to the actual user experience of a digital artifact. It just so happened that humans created it and had to learn it first. With text and storytelling, the raw material is the art form itself: what you read is the experience itself. But even then, what happens when the human-sourced art form gets augmented by AI in ways that increasingly blur the lines between what is real and artificial? What happens when a videogame gets enhanced by DLSS 5 or an article is a hybrid mesh of human- and AI-generated text? I don’t have answers to these questions.

I find what’s happening to software development so scary and fascinating at the same time: developers are reinventing themselves as “orchestrators” of tools and following new agentic engineering patterns. The results, like with Steve’s story, are out there and speak for themselves. I wish more people in our community were willing to have nuanced and pragmatic conversations about it rather than blindly taking sides.

Permalink

Comet Is the First Agentic Browser for iOS Worth Trying

Comet for iOS.


[Update: Perplexity has released an iPad version of Comet alongside the iPhone version, which you can install using the same App Store links below. However, because it wasn’t part of the TestFlight version of the app that we tested, we were unaware that it was launching with the iPhone version.]

For the past three weeks, I’ve been testing Comet, Perplexity’s cross-platform agentic web browser, on my iPhone Air. The iOS version of Comet, launching today on the App Store and (sadly) lacking an iPad counterpart, follows the expansion of Comet from macOS to Windows and Android devices, and it carries the inherent limitations of Apple’s platform. Comet for iOS is based on Safari’s WebKit engine; you cannot install third-party browser extensions due to iOS sandboxing restrictions; you can make Comet your default iOS browser, but in-app web views in third-party apps will still open with Safari View Controller, not Comet. By and large, Comet on iOS is a skin of Safari, but for the first time since the debut of Arc Search on iPhone two years ago (R.I.P.), I’m actually excited about an alternative to Safari on iOS once again.

Read more



Apple Is Working on an AI Music Tagging System

Music Business Worldwide (via MacRumors) is reporting that Apple is rolling out a voluntary metadata system for identifying AI-generated content on Apple Music called Transparency Tags. Introduced by Apple in a newsletter sent to music industry partners, Transparency Tags is:

a system of disclosure labels that record labels and music distributors can begin applying to content delivered to Apple Music immediately, and will be required to use when delivering new content in [the] future.

According to Music Business Worldwide, the tagging system covers artwork, tracks, composition elements such as lyrics, and music videos. The publication quotes Apple’s newsletter as explaining that it views Transparency Tags as part of an initial effort toward giving the music industry what it needs to develop AI policies.
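Apple hasn’t published the technical delivery format for these labels, so here’s a purely illustrative sketch of the idea: a disclosure label is just structured metadata attached to each asset in a delivery, which a distributor could validate before submission. The asset names, tag values, and `validate_transparency_tags` helper below are all my own invention, not Apple’s schema.

```python
# Hypothetical sketch of AI-disclosure metadata on a music delivery.
# Field names and values are invented for illustration; Apple's actual
# Transparency Tags format has not been published.

# The asset types the report says are covered: artwork, tracks,
# composition elements like lyrics, and music videos.
TAGGABLE_ASSETS = {"artwork", "track_audio", "lyrics", "music_video"}

def validate_transparency_tags(delivery: dict) -> list[str]:
    """Return the assets in a delivery that are missing a disclosure tag."""
    tags = delivery.get("transparency_tags", {})
    missing = [
        asset
        for asset in TAGGABLE_ASSETS
        if asset in delivery and asset not in tags
    ]
    return sorted(missing)

delivery = {
    "artwork": "cover.png",
    "track_audio": "track.wav",
    "lyrics": "lyrics.txt",
    "transparency_tags": {
        "artwork": "ai_generated",
        "track_audio": "human_created",
    },
}

print(validate_transparency_tags(delivery))  # → ['lyrics']
```

Since tagging is voluntary for existing catalogs but will be required for new deliveries, a check like this is the sort of thing distributors would run at the point where content is handed off to Apple Music.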

Although there are currently no consequences for failing to properly tag AI-generated music, Transparency Tags are a step in the right direction. The music industry and other creative industries are all grappling with how to deal with a flood of AI-generated content in a rapidly evolving environment. I don’t expect to see one approach sweep across industries any time soon, but it’s encouraging to see Apple taking a lead in pushing the conversation forward.

Permalink