OpenAI Opens Up ChatGPT App Submissions to Developers

By John Voorhees

Developers may now submit ChatGPT apps, which were announced earlier this year at OpenAI’s DevDay, for review and publication. OpenAI’s blog post explains that:

Apps extend ChatGPT conversations by bringing in new context and letting users take actions like order groceries, turn an outline into a slide deck, or search for an apartment.

Under the hood, OpenAI is using the Model Context Protocol (MCP), which was pioneered by Anthropic late last year and donated to the Agentic AI Foundation last week.
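For a sense of what that looks like on the wire, MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with the `tools/call` method. The sketch below only builds such a request; the tool name `search_catalog` and its arguments are hypothetical and not taken from any real ChatGPT app.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "search_catalog", {"query": "The Replacements"})
print(json.dumps(request, indent=2))
```

A real app would also need a transport and the protocol's initialization handshake, which this sketch leaves out.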

Apps are currently available in the web version of ChatGPT from the sidebar or tools menu and, once connected, can be accessed by @mentioning them. Early participants include Adobe, which preannounced its apps last week, Apple Music, Spotify, Zillow, OpenTable, Figma, Canva, Expedia, Target, AllTrails, Instacart, and others.

I was hoping the Apple Music app would allow me to query my music library directly, but that’s not possible. Instead, it allows ChatGPT to do things like search Apple Music’s full catalog and generate playlists, which is useful but limited.

ChatGPT’s Apple Music app lets you create playlists.

Currently, there’s no way for developers to complete transactions inside ChatGPT. Instead, sales can be kicked to another app or the web, although OpenAI says it is exploring ways to offer transactions inside ChatGPT. Developers who want to submit an app must follow OpenAI’s app submission guidelines (sound familiar?) and can learn more from a variety of resources that OpenAI has made available.

A playlist generated by ChatGPT from a 40-year-old setlist.

I haven’t spent a lot of time with the apps that are available, but despite the lack of access to your library, the Apple Music integration can be useful when combined with ChatGPT’s world knowledge. I asked it to create a playlist of the songs that The Replacements played at a show I saw in 1985, and while I don’t recall the exact setlist, ChatGPT matched what’s on Setlist.fm, a user-maintained wiki of live shows. I could have made this playlist myself, but it was convenient to have ChatGPT do it instead. The one catch is that the Apple Music integration is limited to 25-song playlists, so The Replacements’ setlist was split into two playlists.
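The effect of the 25-song cap can be sketched as simple chunking. This is only an illustration of the arithmetic, not how the Apple Music integration is implemented; the function name and the 30 placeholder titles are assumptions, since the actual setlist length isn't given here.

```python
def split_into_playlists(songs, limit=25):
    """Split a list of songs into consecutive chunks of at most `limit` songs."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [songs[i:i + limit] for i in range(0, len(songs), limit)]


# Placeholder titles; the real setlist is not reproduced here.
setlist = [f"Song {n}" for n in range(1, 31)]
playlists = split_into_playlists(setlist)
print([len(p) for p in playlists])  # [25, 5]
```

Any setlist of 26 to 50 songs ends up as two playlists under this cap.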

We’re still in the early days of MCP, and participation by companies will depend on whether they can make incremental sales to users via ChatGPT. Still, there’s clearly potential for apps embedded in chatbots to take off.

Adobe Announces Image and PDF Integration with ChatGPT

By John Voorhees

Source: Adobe.

Adobe announced today that it has teamed up with OpenAI to give ChatGPT users access to Photoshop, Express, and Acrobat from inside the chatbot. The new integration is available starting today at no additional cost to ChatGPT users.

In a press release to Business Wire, Adobe explains that its three apps can be used by ChatGPT users to:

Easily edit and uplevel images with Adobe Photoshop: Adjust a specific part of an image, fine tune image settings like brightness, contrast and exposure, and apply creative effects like Glitch and Glow – all while preserving the quality of the image.

Create and personalize designs with Adobe Express: Browse Adobe Express’ extensive library of professional designs to find the best one for any moment, fill in the text, replace images, animate designs and iterate on edits – all directly inside the chat and without needing to switch to another app – to create standout content for any occasion.

Transform and organize documents with Adobe Acrobat: Edit PDFs directly in the chat, extract text or tables, organize and merge multiple files, compress files and convert them to PDF while keeping formatting and quality intact. Acrobat for ChatGPT also enables people to easily redact sensitive details.

This strikes me as a savvy move by Adobe. Allowing users to request image and PDF edits and design documents with natural language prompts makes its tools more approachable. That could attract new users who later move to an Adobe subscription to get more control over their creations and Adobe’s other offerings.

From OpenAI’s standpoint, this is clearly a response to the consumer-facing Gemini features that Google has begun releasing, which include new image and video generation tools and reportedly caused Sam Altman to declare a “code red” inside the company. I understand the OpenAI freakout. Google has a huge user base and has been doing consumer products far longer than OpenAI, but I can’t say I’ve been very impressed with Gemini 3. Perhaps that’s simply because I don’t care for generative images and video, but these latest moves by Google and OpenAI make it clear that they see them as foundational to consumer-facing AI tools.

How Stu Maschwitz Vibe Coded His Way Into an App Rejection and What It Means for the Future of Apps →

Linked By John Voorhees

This week on AppStories, Federico and I talked about the personal productivity tools we’ve built for ourselves using Claude. They’re hyper-specific scripts and plugins that aren’t likely to be useful to anyone but us, which is fine because that’s all they’re intended to be.

Stu Maschwitz took a different approach. He’s had a complex shortcut called Drinking Buddy for years that tracks alcohol consumption and calculates your Blood Alcohol Level using an established formula. But because he was butting up against the limits of what Shortcuts can do, he vibe coded an iOS version of Drinking Buddy.
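The post doesn't say which formula Drinking Buddy uses. One widely cited option is the Widmark formula, sketched below purely as an assumption about what such a calculation can look like. The function name, the distribution ratio, and the default elimination rate of 0.015 percentage points per hour are illustrative textbook-style values, not Maschwitz's.

```python
def estimate_bac(alcohol_grams, body_weight_kg, distribution_ratio,
                 hours_elapsed, elimination_per_hour=0.015):
    """Estimate blood alcohol concentration (in percent) with the Widmark formula."""
    if body_weight_kg <= 0 or distribution_ratio <= 0:
        raise ValueError("weight and distribution ratio must be positive")
    # Peak concentration: grams of alcohol spread over the body's water share.
    peak = alcohol_grams / (body_weight_kg * 1000 * distribution_ratio) * 100
    # Subtract what has been metabolized so far, never going below zero.
    return max(0.0, peak - elimination_per_hour * hours_elapsed)


# Assumed inputs: 28 g of alcohol, 80 kg body weight, ratio 0.68, one hour later.
print(round(estimate_bac(28, 80, 0.68, 1), 4))
```

The result is a rough population-level estimate; real values vary with many factors the formula ignores.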

Two things struck me about Maschwitz’s experience. First, the app he used to create Drinking Buddy for iOS was Bitrig, which Federico and I mentioned briefly on AppStories. His experience struck a chord with me:

It’s a bit like building an app by talking to a polite and well-meaning tech support agent on the phone — only their computer is down and they can’t test the app themselves.

But power through it, and you have an app.

That’s exactly how scripting with Claude feels. It compliments you on how smart you are, gets you 90% of the way to the finish line quickly, and then tortures you with the last 10%. That, in a nutshell, is coding with AI, at least for anyone with limited development skills, like myself.

But the second and more interesting lesson from Maschwitz’s post is what it portends for apps in general. App Review rejected Drinking Buddy’s Blood Alcohol Level calculation on the basis of Section 1.4, the Physical Harm rule.

Maschwitz appealed and was rejected, even though other Blood Alcohol Level apps are available on the App Store. However, instead of pushing the rejection with App Review further, Maschwitz turned to Lovable, another AI app creation tool, which generates web apps. With screenshots from his rejected iOS app and a detailed spec in hand, Maschwitz turned Drinking Buddy into a progressive web app.

Maschwitz’s experience is a great example of what we covered on AppStories. App creation tools, whether they generate native apps or web apps, are evolving rapidly. And, while they can be frustrating to use at times, are limited in what they can produce, and don’t solve a myriad of problems like customer support that we detail on AppStories, they’re getting better at code quickly. Whether you’re building for yourself, like we are at MacStories, or to share your ideas with others, like Stu Maschwitz, change is coming to apps. Some AI-generated apps will be offered in galleries inside the tools that created them, others will be designed for the web to avoid App Review, and some will likely live as perpetual TestFlight betas or scripts sitting on just one person’s computer, but regardless of the medium, bringing your ideas to life with code has never been more possible.

John Giannandrea’s Retirement From Apple Announced

By John Voorhees

Today Apple announced the retirement of John Giannandrea, the company’s senior vice president for Machine Learning and AI Strategy. Giannandrea will remain at Apple as an advisor until next spring.

News of Giannandrea’s retirement was paired with an announcement that Apple has hired Amar Subramanya as vice president of AI. Subramanya, who had worked at Microsoft since this past summer, previously worked at Google for 16 years on projects including the company’s Gemini Assistant. Subramanya will take the lead on Apple Foundation Models, ML research, and AI Safety and Evaluation, while other areas of Giannandrea’s work will be inherited by Sabih Khan and Eddy Cue.

Apple CEO Tim Cook thanked Giannandrea for his tenure at the company:

We are thankful for the role John played in building and advancing our AI work, helping Apple continue to innovate and enrich the lives of our users. AI has long been central to Apple’s strategy, and we are pleased to welcome Amar to Craig’s leadership team and to bring his extraordinary AI expertise to Apple. In addition to growing his leadership team and AI responsibilities with Amar’s joining, Craig has been instrumental in driving our AI efforts, including overseeing our work to bring a more personalized Siri to users next year.

Given the troubled history of Apple’s AI efforts, the retirement of Giannandrea isn’t surprising. It will be interesting to see if Subramanya settles into his new role given the frequency with which top AI talent tends to turn over in the tech industry.

Why is ChatGPT for Mac So Good?→

Linked By Federico Viticci

Great post by Allen Pike on the importance of a great app experience for modern LLMs, which I recently wrote about. He opens with this line, which is a new axiom I’m going to reuse extensively:

A model is only as useful as its applications.

And on ChatGPT for Mac specifically:

The app does a good job of following the platform conventions on Mac. That means buttons, text fields, and menus behave as they do in other Mac apps. While ChatGPT is imperfect on both Mac and web, both platforms have the finish you would expect from a daily-use tool.

[…]

It’s easier to get a polished app with native APIs, but at a certain scale separate apps make it hard to rapidly iterate a complex enterprise product while keeping it in sync on each platform, while also meeting your service and customer obligations. So for a consumer-facing app like ChatGPT or the no-modifier Copilot, it’s easier to go native. For companies that are, at their core, selling to enterprises, you get Electron apps.

I don’t hate Electron as much as others in our community, but I can’t deny that ChatGPT is one of the nicest AI apps for Mac I’ve used. The other is the recently updated BoltAI. And they’re both native Mac apps.

The AI App Experience Matters More Than Benchmarks Now

By Federico Viticci

Different experiences with app connectors in Claude, Perplexity, and ChatGPT.

I was catching up on different articles after the release of Claude Opus 4.5 earlier this week, and this part from Simon Willison’s blog post about it stood out to me:

I’m not saying the new model isn’t an improvement on Sonnet 4.5—but I can’t say with confidence that the challenges I posed it were able to identify a meaningful difference in capabilities between the two.

This represents a growing problem for me. My favorite moments in AI are when a new model gives me the ability to do something that simply wasn’t possible before. In the past these have felt a lot more obvious, but today it’s often very difficult to find concrete examples that differentiate the new generation of models from their predecessors.

This is something that I’ve felt every few weeks (with each new model release from the major AI labs) over the past year: if you’re really plugged into this ecosystem, it can be hard to spot meaningful differences between major models on a release-by-release basis. That’s not to say that real progress in intelligence, knowledge, or tool-calling isn’t being made: benchmarks and evaluations performed by established organizations tell a clear story. At the same time, it’s also worth keeping in mind that more companies these days may be optimizing their models for benchmarks to come out on top and, more importantly, that the vast majority of folks don’t have a suite of personal benchmarks to evaluate different models for their workflows. Simon Willison thinks that people who use AI for work should create personalized test suites, which is something I’m going to consider for prompts that I use frequently. I also feel like Ethan Mollick’s advice of picking a reasoning model and checking in every few months to reassess AI progress is probably the best strategy for most people who don’t want to tweak their AI workflows every other week.
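A personalized test suite doesn't have to be elaborate. As a minimal sketch, assuming each test is just a prompt paired with a pass/fail check on the reply, it could look like the following. The names `run_suite`, `CASES`, and `stub_model` are hypothetical, and the stub stands in for a real model call so the example runs offline.

```python
def run_suite(model, cases):
    """Run each prompt through `model` and record whether its check passes."""
    results = {}
    for name, prompt, check in cases:
        results[name] = bool(check(model(prompt)))
    return results


# Hypothetical cases: each pairs a prompt with a pass/fail check on the reply.
CASES = [
    ("capital", "What is the capital of France?",
     lambda reply: "Paris" in reply),
    ("bullets", "List three fruits as bullet points.",
     lambda reply: reply.count("- ") >= 3),
]


def stub_model(prompt):
    """Stand-in for a real model call so the sketch runs offline."""
    if "France" in prompt:
        return "The capital of France is Paris."
    return "- apple\n- pear"


print(run_suite(stub_model, CASES))  # {'capital': True, 'bullets': False}
```

Swapping `stub_model` for a function that calls a real model would let the same cases be rerun against each new release.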

I Finally Tested the M5 iPad Pro’s Neural-Accelerated AI, and the Hype Is Real

By Federico Viticci

The M5 iPad Pro.

The best kind of follow-up article isn’t one that clarifies a topic that someone got wrong (although I do love that, especially when that “someone” isn’t me); it’s one that provides more context to a story that was incomplete. My M5 iPad Pro review was an incomplete narrative. As you may recall, I was unable to test Apple’s claims of 3.5× improvements for local AI processing thanks to the new Neural Accelerators built into the M5’s GPU. It’s not that I didn’t believe Apple’s numbers. I simply couldn’t test them myself due to the early nature of the software and the timing of my embargo.

Well, I was finally able to test local AI performance with a pre-release version of MLX optimized for M5, and let me tell you: not only is the hype real, but the numbers I got from my extensive tests over the past two weeks actually exceed Apple’s claims.

Trying to Make Sense of the Rumored, Gemini-Powered Siri Overhaul

By Federico Viticci

Quite the scoop from Mark Gurman yesterday on what Apple is planning for major Siri improvements in 2026:

Apple Inc. is planning to pay about $1 billion a year for an ultrapowerful 1.2 trillion parameter artificial intelligence model developed by Alphabet Inc.’s Google that would help run its long-promised overhaul of the Siri voice assistant, according to people with knowledge of the matter.

There is a lot to unpack here and I have a lot of questions.

On MiniMax M2 and LLMs with Interleaved Thinking Steps

By Federico Viticci

MiniMax M2 with interleaved thinking steps and tools in TypingMind.

In addition to Kimi K2 (which I recently wrote about here) and GLM-4.6 (which will become an option on Cerebras in a few days, when I’ll play around with it), one of the more interesting open-source LLM releases out of China lately is MiniMax M2. This MoE model (230B parameters, 10B activated at any given time) claims to reach 90% of the performance of Sonnet 4.5…at 8% of the cost. You can read more about the model here; Simon Willison blogged about it here; you can also test it with MLX on an Apple silicon Mac.
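To put those figures in perspective, here is the back-of-the-envelope arithmetic, using only the numbers quoted above. The performance and cost values are MiniMax's claims, not measurements, and the variable names are just for illustration.

```python
TOTAL_PARAMS_B = 230          # total parameters, in billions
ACTIVE_PARAMS_B = 10          # parameters activated at any given time, in billions
RELATIVE_PERFORMANCE = 0.90   # claimed share of Sonnet 4.5's performance
RELATIVE_COST = 0.08          # claimed share of Sonnet 4.5's cost

# Share of the model that is active at once in this mixture-of-experts design.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Claimed performance delivered per unit of cost, relative to Sonnet 4.5.
performance_per_cost = RELATIVE_PERFORMANCE / RELATIVE_COST

print(f"Active share of parameters: {active_fraction:.1%}")
print(f"Claimed performance per unit cost vs. Sonnet 4.5: {performance_per_cost:.2f}x")
```

In other words, only about 4.3% of the parameters are active at once, and the claim works out to roughly 11 times the performance per unit of cost.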

What I find especially interesting about M2 is that it’s the first model to support interleaved thinking steps in between responses and tool calls, which is something that Anthropic pioneered with Claude Sonnet 4 back in May. Here’s Skyler Miao, head of engineering at MiniMax, in a post on X (unfortunately, most of the open-source AI community is only active there):

As we work more closely with partners, we’ve been surprised how poorly community support interleaved thinking, which is crucial for long, complex agentic tasks. Sonnet 4 introduced it 5 months ago, but adoption is still limited.

We think it’s one of the most important features for agentic models: it makes great use of test-time compute.

The model can reason after each tool call, especially when tool outputs are unexpected. That’s often the hardest part of agentic jobs: you can’t predict what the env returns. With interleaved thinking, the model could reason after get tool outputs, and try to find out a better solution.

We’re now working with partners to enable interleaved thinking in M2 — and hopefully across all capable models.

I’ve been using Claude as my main “production” LLM for the past few months and, as I’ve shared before, I consider the fact that both Sonnet and Haiku think between steps an essential aspect of their agentic nature and integration with third-party apps.
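The idea of thinking between tool calls can be sketched as a toy loop. This is not how MiniMax or Anthropic implement it; `run_agent`, `stub_model`, and `search` are hypothetical stand-ins, and the "model" here is a hard-coded policy that reasons after each tool result and retries when the output is empty.

```python
def run_agent(model, tools, max_steps=10):
    """Drive a model that may think, call a tool, or answer at each step."""
    transcript = []
    for _ in range(max_steps):
        kind, *payload = model(transcript)
        if kind == "think":
            transcript.append(("think", payload[0]))
        elif kind == "call":
            name, argument = payload
            transcript.append(("result", name, tools[name](argument)))
        else:  # "answer" ends the loop
            transcript.append(("answer", payload[0]))
            break
    return transcript


def search(query):
    """Hypothetical tool: the first query deliberately returns nothing."""
    return [] if query == "replacements 1985" else ["setlist entry"]


def stub_model(transcript):
    """Stand-in policy: reason after every tool result, retry on empty output."""
    if not transcript:
        return ("call", "search", "replacements 1985")
    last = transcript[-1]
    if last[0] == "result":
        if last[2]:
            return ("think", "Got results; ready to answer.")
        return ("think", "Empty result; broaden the query.")
    if last[0] == "think" and "broaden" in last[1]:
        return ("call", "search", "the replacements setlist")
    return ("answer", "Found the setlist.")


for entry in run_agent(stub_model, {"search": search}):
    print(entry)
```

The point of the sketch is the ordering: every tool result is followed by a reasoning step, which is what lets the model change course when a tool returns something unexpected.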

That being said, I have been testing MiniMax M2 on TypingMind in addition to Kimi K2 for the past week and it is, indeed, impressive. I plugged MiniMax M2 into TypingMind using their Anthropic-compatible endpoint; out of the box, the model worked with interleaved thinking and the several plugins I’ve built for myself in TypingMind using Claude. I haven’t used M2 for any vibe-coding tasks yet, but for other research or tool-based queries (like adding notes to Notion and tasks to Todoist), M2 effectively felt like a version of Sonnet not made by Anthropic.

Right now, MiniMax M2 isn’t hosted on any of the fast inference providers; I’ve accessed it via the official MiniMax API endpoint, whose inference speed isn’t that different from Anthropic’s cloud. The possibility of MiniMax M2 on Cerebras or Groq is extremely fascinating, and I hope it’s in the cards for the near future.

Posts tagged with "artificial intelligence"
