On the heels of a feature story in Wired last week, Apple executives and engineers opened up about how Siri works in interviews with Fast Company. As the publication explained it, a narrative has emerged that Apple’s AI work is behind other companies’ efforts because of its dedication to user privacy.
In an interview with Fast Company, Apple’s Greg Joswiak disagrees:
“I think it is a false narrative. It’s true that we like to keep the data as optimized as possible, that’s certainly something that I think a lot of users have come to expect, and they know that we’re treating their privacy maybe different than some others are.”
Joswiak argues that Siri can be every bit as helpful as other assistants without accumulating a lot of personal user data in the cloud, as companies like Facebook and Google are accustomed to doing. “We’re able to deliver a very personalized experience . . . without treating you as a product that keeps your information and sells it to the highest bidder. That’s just not the way we operate.”
The article provides concrete examples of how Siri works and of the advances made since its introduction, with a level of detail that Apple has not shared before.
The effectiveness of Siri and Apple’s machine learning research is an area where Apple’s culture of secrecy has hurt it. Apple seems to have recognized this and has made a concerted effort to turn perceptions around with interviews like the ones in Wired and Fast Company. Apple employees have also begun to engage in more public discussion of the company’s machine learning and AI initiatives through outlets like its recently introduced journal and presentations made by Apple employees. Apple even enlisted The Rock to help it get the word out about Siri’s capabilities. Competition for virtual personal assistant supremacy has heated up, and Apple has signaled it has no intention of being left out or backing down.
David Pierce has a feature story on WIRED today that’s all about Siri – especially the new Siri voice coming in iOS 11. It features a variety of interesting details concerning Siri’s history, the way Apple thinks about the digital assistant, and in-depth details on how new Siri languages are added.
One of my favorite bits involves a quote from Apple’s VP of product marketing, Greg Joswiak, who said Apple focuses on Siri’s ability to get things done:
It drives him crazy that people compare virtual assistants by asking trivia questions, which always makes Siri look bad. "We didn't engineer this thing to be Trivial Pursuit!" he says.
This explains Siri’s productivity-focused commercial starring The Rock, and also helps make sense of the fact that Siri is often embarrassingly clueless when it comes to current events and other simple queries. Still, Apple’s awareness of the problem makes its failure to beef up Siri’s trivia knowledge all the more puzzling.
Other interesting tidbits from the story include the fact that Siri now has a massive 375 million monthly active users, and that Siri’s new, more natural voice was inspired in part by the movie Her.
Earlier this year Tom Gruber, the co-creator of Siri and current member of Apple’s AI team, gave a TED talk focusing on his vision for the future of AI, which is rooted in a philosophy he calls “humanistic AI.” The video and full transcript for that talk recently became available, providing a broader audience with Gruber’s insights into the place of AI in our everyday lives. While he doesn’t offer any specifics regarding work Apple is doing in this space, it is clear that Gruber’s vision represents, at least in part, the vision of Apple for Siri and AI as a whole.
Gruber describes humanistic AI as “artificial intelligence designed to meet human needs by collaborating and augmenting people.” This theme of AI augmenting and complementing humans is fleshed out by Gruber in several ways; one example involves Siri serving as an accessibility tool, while another speculates about the benefits AI can offer to human memory. The full talk provides an interesting glimpse into how Apple sees AI evolving in the near future.
Apple released an advertisement showcasing Siri starring former pro wrestler turned film star Dwayne “The Rock” Johnson. Teased yesterday by Johnson on Twitter and Facebook, the video, posted to Apple’s YouTube channel, features Johnson accomplishing a long list of life goals with the help of Siri during a single day. The tongue-in-cheek spot highlights several Siri features such as:
- reading Johnson’s schedule;
- creating a reminder;
- scheduling a Lyft ride;
- getting the weather forecast;
- reading email;
- displaying photos;
- texting someone;
- converting measurements;
- playing a playlist;
- starting a FaceTime call; and
- taking a selfie.
The Siri ad is a clever and entertaining way of explaining the breadth of tasks that can be accomplished with Siri, from the basics like weather forecasts to less well-known features like taking a selfie.
Great overview by Steven Aquino on the Accessibility changes coming with iOS 11. In particular, he’s got the details on Type to Siri, a new option for keyboard interaction with the assistant:
Available on iOS and the Mac, Type to Siri is a feature whereby a user can interact with Siri via an iMessage-like UI. Apple says the interaction is one-way; presently it’s not possible to simultaneously switch between text and voice. There are two caveats, however. The first is, it’s possible to use the system-wide Siri Dictation feature (the mic button on the keyboard) in conjunction with typing. Therefore, instead of typing everything, you can dictate text and send commands thusly. The other caveat pertains to “Hey Siri.” According to a macOS Siri engineer on Twitter, who responded to this tweet I wrote about the feature, it seems Type to Siri is initiated only by a press of the Home button. The verbal “Hey Siri” trigger will cause Siri to await voice input as normal.
Technicalities aside, Type to Siri is a feature many have clamored for, and should prove useful across a variety of situations. In an accessibility context, this feature should be a boon for deaf and hard-of-hearing people, who previously may have felt excluded from using Siri due to its voice-first nature. It levels the playing field by democratizing the technology, opening up Siri to an even wider group of people.
I wish there were a way to switch between voice and keyboard input from the same UI, but retaining the ‘Hey Siri’ voice activation seems like a sensible trade-off. I’m probably going to enable Type to Siri on my iPad, where I’m typing most of the time anyway, and where I could save time with "Siri templates" made with native iOS Text Replacements.
Stephen Nellis, writing for Reuters, shares an interesting look into Apple's method for teaching Siri a new language:
At Apple, the company starts working on a new language by bringing in humans to read passages in a range of accents and dialects, which are then transcribed by hand so the computer has an exact representation of the spoken text to learn from, said Alex Acero, head of the speech team at Apple. Apple also captures a range of sounds in a variety of voices. From there, an acoustic model is built that tries to predict word sequences.
Then Apple deploys “dictation mode,” its speech-to-text transcriber, in the new language, Acero said. When customers use dictation mode, Apple captures a small percentage of the audio recordings and makes them anonymous. The recordings, complete with background noise and mumbled words, are transcribed by humans, a process that helps cut the speech recognition error rate in half.
After enough data has been gathered and a voice actor has been recorded to play Siri in a new language, Siri is released with answers to what Apple estimates will be the most common questions, Acero said. Once released, Siri learns more about what real-world users ask and is updated every two weeks with more tweaks.
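The iterative pipeline Acero describes — hand-transcribed passages, an initial acoustic model, then rounds of human-corrected dictation samples that each roughly halve the error rate — can be sketched in illustrative Python. This is not Apple’s code; every name and the starting error rate are assumptions made purely to visualize the process:

```python
# Hypothetical sketch of the language-rollout pipeline described above.
# Stage 1: hand transcriptions -> initial acoustic model.
# Stage 2: dictation-mode samples, transcribed by humans, halve the error rate.

def train_acoustic_model(transcribed_passages):
    """Stand-in for building an initial acoustic model from hand transcriptions.
    The 20% starting error rate is an invented placeholder."""
    return {"language": transcribed_passages["language"], "error_rate": 0.20}

def refine_with_dictation(model, rounds):
    """Each round of human transcription roughly halves the error rate,
    per the report's description."""
    for _ in range(rounds):
        model["error_rate"] /= 2
    return model

model = train_acoustic_model({"language": "Shanghainese", "passages": []})
model = refine_with_dictation(model, rounds=2)
print(model["error_rate"])  # 0.20 -> 0.10 -> 0.05
```

The point of the sketch is the shape of the process — an upfront supervised phase followed by ongoing refinement from anonymized real-world usage — rather than any specific numbers.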
The report also shares that one of Siri's next languages will be Shanghainese, a dialect of Wu Chinese spoken in Shanghai and surrounding areas. It will join the 21 languages Siri currently speaks, which are localized across a total of 36 different countries.
Debating the strengths and weaknesses of Siri has become common practice in recent years, particularly as competing voice assistants from Amazon, Google, and Microsoft have grown more intelligent. But one area Siri has long held the lead over its competition is in supporting a large variety of different languages. It doesn't seem like Apple will be slowing down in that regard.
Alongside beta versions of iOS, macOS, and tvOS, Apple today announced the release of the first beta of watchOS 3.2. The beta has yet to appear on Apple's developer portal, but it should be available soon. Besides the standard bug fixes and performance improvements, this update includes a couple new features, one of which is called Theater Mode. From Apple's developer release notes:
Theater Mode lets users quickly mute the sound on their Apple Watch and avoid waking the screen on wrist raise. Users still receive notifications (including haptics) while in Theater Mode, which they can view by tapping the screen or pressing the Digital Crown.
This sounds like an interesting new option that could be useful in scenarios besides being at the movie theater. Personally, I'm likely to use Theater Mode when I wear my Apple Watch overnight for sleep tracking. My normal practice is to turn off Raise to Wake in the Settings app before going to bed, but this could prove an easier method.
Besides Theater Mode, the most significant update in 3.2 is enhancements to Siri. Last year iOS 10 improved Siri by enabling it to handle queries from third-party apps that fit into specific categories:
- Messaging
- Payments
- Ride booking
- Searching photos
- Workouts
- VoIP calling
Though all of those areas could be handled by Siri on iOS 10, Siri on Apple Watch was previously only able to direct you to your iPhone to perform those actions. But with watchOS 3.2, that is no longer the case, as Siri on the Watch is now able to perform these third-party requests.
watchOS 3.2 will likely see a public release this spring, after a couple of months of beta testing.
Some interesting thoughts about the AirPods by Steven Aquino. In particular, he highlights a weak aspect of Siri that isn't usually mentioned in traditional reviews:
The gist of my concern is Siri doesn't handle speech impediments very gracefully. (I've found the same is true of Amazon's Alexa, as I recently bought an Echo Dot to try out.) I’m a stutterer, which causes a lot of repetitive sounds and long breaks between words. This seems to confuse the hell out of these voice-driven interfaces. The crux of the problem lies in the fact that if I don’t enunciate perfectly, which leaves several seconds between words, the AI cuts me off and runs with it. Oftentimes, the feedback is weird or I’ll get a “Sorry, I didn’t get that” reply. It’s an exercise in futility, sadly.
Siri on the AirPods suffers from the same issues I encounter on my other devices. It’s too frustrating to try to fumble my way through if she keeps asking me to repeat myself. It’s for this reason that I don’t use Siri at all with AirPods, having changed the setting to enable Play/Pause on double-tap instead (more on this later). It sucks to not use Siri this way—again, the future implications are glaringly obvious—but it’s just not strong enough at reliably parsing my speech. Therefore, AirPods lose some luster because one of its main selling points is effectively inaccessible for a person like me.
That's a hard problem to solve in a conversational assistant, and exactly the kind of Accessibility area where Apple could lead over other companies.