Posts tagged with "MacStories"

Wired Confirms Perplexity Is Bypassing Efforts by Websites to Block Its Web Crawler

Last week, Federico and I asked Robb Knight to do what he could to block web crawlers deployed by artificial intelligence companies from scraping MacStories. Robb had already updated his own site’s robots.txt file months ago, so that’s the first thing he did for MacStories.

However, robots.txt only works if a company’s web crawler is set up to respect the file. As I wrote earlier this week, a better solution is to block crawlers on your server, which Robb did on his personal site and wrote about late last week. His setup returns a 403 error whenever one of the bots listed in his server code requests a page from his site.
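To make the idea concrete, here is a minimal sketch of user-agent blocking in Python. It is not Robb's actual server code, which isn't reproduced here; the `BLOCKED_AGENT_TOKENS` list and the `status_for` helper are hypothetical names chosen for illustration.

```python
# Illustrative blocklist. These tokens are assumptions for the sketch,
# not the list Robb actually uses on his server.
BLOCKED_AGENT_TOKENS = ("GPTBot", "PerplexityBot")


def status_for(user_agent: str, blocked=BLOCKED_AGENT_TOKENS) -> int:
    """Return 403 if the User-Agent header contains a blocked token, else 200."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in blocked):
        return 403  # Forbidden: the crawler identified itself and is refused
    return 200  # Anything that doesn't match is served normally


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(status_for("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"))
```

The weakness is visible in the second call: the check relies entirely on what the client says about itself, so a crawler that sends a generic browser user agent is served like any other visitor.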

Spoiler: Robb hit the nail on the head the first time.

After reading Robb’s post, Federico and I asked him to do the same for MacStories, which he did last Saturday. Once it was set up, Federico began testing the setup. OpenAI returned an error as expected, but Perplexity’s bot was still able to reach MacStories, which shouldn’t have been the case.1

Yes, I took a screenshot of Perplexity’s API documentation because I bet it changes based on what we discovered.

That began a deep dive to try to figure out what was going on. Robb’s code checked out, blocking the user agent specified in Perplexity’s own API documentation. What we discovered after more testing was that Perplexity was hitting MacStories’ server without using the user agent it said it used, effectively doing an end run around Robb’s server code.

Robb wrote up his findings on his website, which promptly shot to the top slot on Hacker News and caught the eye of Dhruv Mehrotra and Tim Marchman of Wired, who were in the midst of investigating how Perplexity works. As Mehrotra and Marchman describe it:

A WIRED analysis and one carried out by developer Robb Knight suggest that Perplexity is able to achieve this partly through apparently ignoring a widely accepted web standard known as the Robots Exclusion Protocol to surreptitiously scrape areas of websites that operators do not want accessed by bots, despite claiming that it won’t. WIRED observed a machine tied to Perplexity—more specifically, one on an Amazon server and almost certainly operated by Perplexity—doing this on wired.com and across other Condé Nast publications.

Until earlier this week, Perplexity published in its documentation a link to a list of the IP addresses its crawlers use—an apparent effort to be transparent. However, in some cases, as both WIRED and Knight were able to demonstrate, it appears to be accessing and scraping websites from which coders have attempted to block its crawler, called Perplexity Bot, using at least one unpublicized IP address. The company has since removed references to its public IP pool from its documentation.

That secret IP address—44.221.181.252—has hit properties at Condé Nast, the media company that owns WIRED, at least 822 times in the last three months. One senior engineer at Condé Nast, who asked not to be named because he wants to “stay out of it,” calls this a “massive undercount” because the company only retains a fraction of its network logs.

WIRED verified that the IP address in question is almost certainly linked to Perplexity by creating a new website and monitoring its server logs. Immediately after a WIRED reporter prompted the Perplexity chatbot to summarize the website’s content, the server logged that the IP address visited the site. This same IP address was first observed by Knight during a similar test.
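The log check Wired describes can be approximated with a few lines of Python. This is only a sketch: it assumes an access log in which the client IP is the first whitespace-separated field, and the `SAMPLE_LOG` lines and the `hits_from` helper are made up for illustration rather than taken from anyone's real logs.

```python
from typing import Iterable, List

SUSPECT_IP = "44.221.181.252"  # the address reported in WIRED's story

# Hypothetical log lines, invented for this example.
SAMPLE_LOG = [
    '44.221.181.252 - - [01/Jan/2024:00:00:00 +0000] "GET /test-page HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2024:00:00:05 +0000] "GET / HTTP/1.1" 200 1024',
]


def hits_from(ip: str, log_lines: Iterable[str]) -> List[str]:
    """Return the log lines whose first field (the client IP) equals `ip`."""
    matches = []
    for line in log_lines:
        fields = line.split()
        if fields and fields[0] == ip:
            matches.append(line)
    return matches


if __name__ == "__main__":
    for line in hits_from(SUSPECT_IP, SAMPLE_LOG):
        print(line)
```

Matching on the IP field rather than the user agent is the point: the address is much harder for a crawler to disguise than the name it reports.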

This sort of unethical behavior is why we took the steps we did to block the use of MacStories’ websites as training data for Perplexity and other companies.2 Incidents like this and the lack of transparency about how AI companies train their models have led to a lot of mistrust in the entire industry among creators who publish on the web. I’m glad we’ve been able to play a small part in revealing Perplexity’s egregious behavior, but more needs to be done to rein in this sort of behavior, including closer scrutiny by regulators around the world.

As a footnote to this, it’s worth noting that Wired also puts to rest the argument that websites should be okay with Perplexity’s behavior because it includes citations in its plagiarism. According to Wired’s story:

WIRED’s own records show that Perplexity sent 1,265 referrals to wired.com in May, an insignificant amount in the context of the site’s overall traffic. The article to which the most traffic was referred got 17 views.

That’s next to nothing for a site with Wired’s traffic, which Similarweb and other sites peg at over 20 million page views that same month. Perplexity’s 1,265 referrals amount to a mere 0.006% of Wired’s May traffic. Let that sink in, and then ask yourself whether it seems like a fair trade.
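For anyone who wants to check the math, the percentage works out as follows. The 20 million figure is the rounded estimate quoted above, so treat the result as an upper bound on Perplexity's share rather than an exact number.

```python
referrals = 1_265          # Perplexity referrals to wired.com in May, per WIRED
page_views = 20_000_000    # "over 20 million", so the real share is no higher

# Share of total page views, expressed as a percentage.
share_percent = referrals / page_views * 100

print(f"{share_percent:.3f}%")
```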


  1. Meanwhile, I was digging through bins of old videogames and hardware at a Retro Gaming Festival doing ‘research’ for NPC.
  2. Mehrotra and Marchman correctly question whether Perplexity is even an AI company because it piggybacks on other companies’ LLMs and uses them in conjunction with scraped web data to provide summaries that effectively replace the source’s content. However, that doesn’t change the fact that Perplexity is surreptitiously scraping sites while simultaneously professing to respect sites’ robots.txt files. That’s the unethical bit.

Access Extra Content and Perks

Founded in 2015, Club MacStories has delivered exclusive content every week for nearly a decade.

What started with weekly and monthly email newsletters has blossomed into a family of memberships designed for every MacStories fan.

Learn more here and from our Club FAQs.

Club MacStories: Weekly and monthly newsletters via email and the web that are brimming with apps, tips, automation workflows, longform writing, early access to the MacStories Unwind podcast, periodic giveaways, and more;

Club MacStories+: Everything that Club MacStories offers, plus an active Discord community, advanced search and custom RSS features for exploring the Club’s entire back catalog, bonus columns, and dozens of app discounts;

Club Premier: All of the above and AppStories+, an extended version of our flagship podcast that’s delivered early, ad-free, and in high-bitrate audio.


The Talk Show, Episode 399: ‘I Decapitated the MacBook Air’ with Federico Viticci

This week, Federico joined John Gruber on The Talk Show for a wide-ranging conversation about:

It’s a terrific episode from two people who have witnessed the evolution of blogging firsthand and Apple’s struggle to find a comfortable place for the iPad in its product lineup. That makes it the perfect warmup for next week’s Apple event.

The MacStories Team Is on Threads (Again)

With Threads’ launch in Europe today, we thought it would be a good time to reintroduce readers to the MacStories Threads account as well as those of the MacStories team.

It’s been quite a year for social media. Almost exactly one year ago today, we announced that MacStories had established its own dedicated Mastodon server for MacStories, AppStories, and Club MacStories. That move has been successful beyond our wildest imaginations. MacStories’ core audience is on Mastodon, which has made it the perfect place to interact with readers and listeners.

However, not everyone is on Mastodon. That’s why we created MacStories Instagram and Threads accounts earlier this year. Federico and I have been on Instagram for years and joined Threads immediately, although it wasn’t long before Meta prevented Federico and other users in the EU from accessing Threads.

Today, Meta has reopened Threads to Europe, which means Federico, Silvia, and Niléane are back on the service along with Alex, Jonathan, and me. So, today, we thought we’d reintroduce the MacStories Threads account to everyone and link the team’s Threads accounts below to make it easy to follow whomever you’d like.

You can expect to hear about the latest stories published by the team on MacStories.net, what’s going on with Club MacStories, and updates on AppStories, MacStories Unwind, and upcoming new projects if you follow MacStories. To follow individual team members, you can use the links below:

We know that Threads isn’t for everyone, and the same is true of Mastodon, which is why we’re on both. So, wherever you’re hanging out these days, feel free to say hello. We love hearing from the MacStories community and are excited to have the full team together on Threads again.


MacStories Is on Mastodon with Its Own Server

As of today, MacStories is officially on Mastodon with its own server for each of its properties and team members. You can find us here:

The new MacStories Mastodon account.

We’re not closing down our Twitter accounts (yet), but as you may have noticed, they haven’t been active lately and won’t be going forward. That’s because we’ve grown increasingly uncomfortable with the direction the company is heading. If you’ve been keeping up with the news, you know what I mean. If you haven’t, I highly recommend Casey Newton’s recent piece on Platformer. Casey’s perspective and reasons for winding down his personal and business presence on Twitter are very close to our own.

Although I’ll miss what Twitter was at its best and always remember what it’s meant to me professionally, I’m excited to be moving on too. I don’t know if Mastodon will be the next big thing, but it doesn’t have to be. It gives us a place to experiment and expand the places we connect with the MacStories audience, which we’re eager to do.
