The battle for voice recognition inside vehicles is heating up

What startups, Amazon and Google are doing to win over automakers

Once a fringe feature found only in luxury vehicles, voice recognition has moved into the mainstream as more automakers promise a seamless connection between your car, home and all the devices in between. The opportunity to reach consumers in their vehicles — and collect all that data — has automakers, tech giants like Amazon and Google, as well as investors scrambling for a share of the connected cars market.

But this is just the beginning. Voice recognition is expected to be an essential feature in future autonomous vehicles, which will see drivers ultimately surrendering the ability to control the car mechanically. Other applications for voice recognition are also emerging, including automated drones, two-wheelers and even air taxis.

The upshot? A market with significant growth potential and opportunities for investors and companies of all sizes.

The opportunity

The share of cars featuring in-car connected services, which voice recognition requires, grew to 45% in 2020 from 30% in 2018, and is expected to reach 60% by 2024, according to IHS Markit. Automakers keen to improve the consumer experience are driving that growth, said Kyle Davis, IHS Markit’s senior analyst for vehicle experience and connected car, noting that “one of the biggest aspects of the user experience is voice.”

Voice recognition is becoming more common, but that doesn’t mean the technology is always received well by consumers. J.D. Power surveys consistently show consumers complaining about voice recognition systems in vehicles, said John Scumniotales, director of products and design for Alexa Auto at Amazon. Scumniotales sees this as an opportunity to improve that experience with Alexa, and help Amazon gain an even larger foothold in the marketplace.

While there are clear giants in the voice recognition field, there won’t ever be one system or type of digital assistant in vehicles, according to Greg Basich, associate director of Strategy Analytics’ global automotive practice. “You’re going to see multiple systems,” Basich said. “So it’s definitely a growing space.”

Startups will have to contend with behemoths like Google and Amazon, Basich said, adding, “It’s a tough market if you’re a startup [ … ] You need to be doing something very new or very different.”

In his view, automakers prefer to work with larger, more established companies that can provide long-term support for the technology once it’s in the vehicle. Amazon’s Scumniotales agrees: big companies have a huge advantage, he said, because building the technology and then delivering it at the scale the automotive industry requires takes significant investment.

Yet a closer look indicates that not only is there room for a number of players, but automakers aren’t always placing their bets on the biggest companies.

The players

Partnerships between automakers and Amazon Alexa or Google get much of the buzz. However, Cerence, a publicly traded company spun off from Nuance Communications in October 2019, actually controls 87% of the embedded virtual personal assistant market, according to Davis.

“The space is pretty small and we’re the largest and most entrenched player in it,” Cerence CTO Prateek Kathpal said in a recent interview. He believes that his company is small enough to take risks, innovate and not be hamstrung by funding issues like a traditional startup.

In January, the company unveiled Cerence Drive, its new platform for mobility assistants that integrates cloud and embedded technologies to provide what it describes as a more seamless and accurate AI voice-recognition experience. The system can support more than 70 languages and can understand commands when vehicle occupants are speaking multiple languages at the same time. It also can comprehend complex, multistep queries and commands like, “Find directions to Starbucks and also call my mom.”

Cerence has landed a number of customers over the years, including BMW, which has been using the company’s technology since 2000. Simon Euringer, head of personal assistants and voice interaction at BMW, is particularly impressed by Cerence’s hybrid system, which operates both via an embedded system and in the cloud, and provides answers through whichever of the two systems is quicker at the time.
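As a rough illustration of that “whichever is quicker” arbitration, the Python sketch below races a stand-in embedded recognizer against a stand-in cloud recognizer and returns the first answer to arrive. It is not Cerence’s implementation: the function names, the fixed latencies and the string results are all assumptions made up for the example.

```python
import asyncio


async def embedded_recognizer(utterance: str) -> str:
    # Hypothetical on-device engine: assumed to answer after a fixed 50 ms.
    await asyncio.sleep(0.05)
    return f"embedded:{utterance}"


async def cloud_recognizer(utterance: str, latency: float) -> str:
    # Hypothetical cloud engine: latency stands in for network conditions.
    await asyncio.sleep(latency)
    return f"cloud:{utterance}"


async def first_answer(utterance: str, cloud_latency: float) -> str:
    # Start both recognizers at once and wait for whichever finishes first.
    tasks = [
        asyncio.create_task(embedded_recognizer(utterance)),
        asyncio.create_task(cloud_recognizer(utterance, cloud_latency)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Cancel the slower recognizer and let the cancellation settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


if __name__ == "__main__":
    # Poor connectivity: the cloud path is slow, so the embedded answer wins.
    print(asyncio.run(first_answer("call my mom", cloud_latency=0.5)))
    # Good connectivity: the cloud path is faster than the embedded engine.
    print(asyncio.run(first_answer("call my mom", cloud_latency=0.001)))
```

A production system would presumably also weigh answer quality and confidence, not just speed. The sketch covers only the latency race described above.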

Systems that are solely cloud-based don’t work well in areas with connectivity issues, which won’t enable the best user experience, Euringer said. BMW also incorporates other voice assistants, including Amazon’s Alexa, to let drivers ask for information outside the car.

Cerence is expanding beyond traditional passenger vehicles, too. In January 2021, the company launched a platform designed for two-wheelers such as motorcycles, e-bikes and scooters that allows riders to get directions and request music. It’s also considering ways to enable voice control in drones to allow for automated delivery. Other potential applications include air taxis and other flying vehicles.

According to Kathpal, this market is a challenging space for startups, as acquiring the data needed for a successful voice recognition system takes significant investment and time. Revenue is another big hurdle, he said, as it takes an average of four to five years to bring a car from the design and conception stage to production. Even if a startup wins a deal with an automotive company, it won’t see any revenue until the car goes into production, he added.

Euringer also believes larger companies are better equipped to succeed in this space. “To develop the whole technology of voice recognition, in-cloud processing and language model training, I wouldn’t see how a smaller company would be able to compete in that market,” he said.

Some startups are succeeding despite these challenges.

Founded in 2005, SoundHound has raised $300 million and now employs about 400 people. In 2015, it launched the Houndify Voice AI platform to add a custom conversational interface to any product, including appliances, speakers, apps, robots and cars. The company has locked in four automotive partners, including Mercedes-Benz.

“The more devices become connected, the more voice becomes a very powerful interaction mechanism,” SoundHound COO Michael Zagorsek said. “We have a very powerful, unique voice that no one else has, not even Google.”

SoundHound’s pitch is that its technology comprehends speech in real time, so there are no delays in responses. The company’s technology also is capable of understanding complex and compound queries that mirror the way a person tends to speak, a process called “Deep Meaning Understanding,” he added.

There are also smaller voice recognition startups, like Speak With Me, that are trying to carve out a niche for themselves. Speak With Me founder Ajay Juneja views himself and his product as a disrupter. The nine-person company, which has raised $4 million from 30 investors, has created a “guardian angel” system that uses multiple cameras, mics and interior radar and other sensors to monitor occupants and pets.

It’s a feature that Juneja says sets his company’s technology apart. It also has the capability to monitor emotions, which lets the system intervene with soothing music or dialogue-driven breathing exercises if stress is detected. Juneja boasts that his product has the best context-tracking and memory, and says the company’s end-to-end AI workflow automation lets it “build and update AI models continuously for customers at very little internal costs.”

Startups like Speak With Me and SoundHound must contend with well-funded tech companies. However, they also have the freedom to innovate, which is a key asset in this relatively new space, according to Zagorsek.

Meanwhile, Google is casting an increasingly long shadow in the automotive industry. In 2018, the company extended its Google Assistant to cars, letting it bring the “best of the home and the phone into the car,” according to Austin Chang, the product director for Google Assistant focused on auto.

Automakers, including those that were initially reluctant to welcome Google in, have been won over. Google Assistant was first built into a vehicle with the Polestar 2, as part of Android Automotive OS alongside Google Maps and Google Play. Other automakers, including GM, Nissan, Volkswagen and Volvo Cars, soon partnered with Google to embed apps and services like Assistant. Ford, the latest addition, has agreed to use the Android operating system in millions of Ford and Lincoln-branded vehicles beginning in 2023.

The money and the market

Some investors, including the early-stage and venture arms of automakers, see the value in less established companies. “Startups often have the most innovative approaches to solving problems,” said Henry Chung, head of CRADLE, the venture capital investment and open innovation division of Hyundai Motor Group.

Hyundai is an investor in and a customer of SoundHound, and first deployed the startup’s music-tagging tool on its 2015 Hyundai Genesis Sedan. SoundHound’s voice recognition system is now integrated into the 2021 Hyundai Elantra and Elantra Hybrid, and the 2022 Hyundai Tucson SUV. “We’re very bullish on their technology. Otherwise we wouldn’t have deployed it in our vehicles,” Chung said.

Nils Schanz, head of voice assistant and user interaction concepts at Daimler, said relying on Alexa and Google isn’t enough, and he’s open to collaborating with small companies and startups “if they have something which is interesting for us and which can add value to what we are doing.”

Daimler began using SoundHound’s technology in North American Mercedes-Benz vehicles in 2018. In the U.S., the company is using SoundHound for cloud applications, while the embedded part of the software in the car is from Cerence.

“We are using these partners and choosing the best pieces out of their portfolio to create the best experience in our cars,” Schanz said. Going with smaller companies has allowed customization unique to Mercedes and a way to differentiate the company’s voice experience from its competitors, he said. An “off the shelf” solution provided by Google or Amazon “would not help us to serve our customers right. We want to have something very specific,” he said.

Amazon has responded to the demand for customization. In January, the e-commerce giant launched Alexa Custom Assistant, which lets carmakers build their own branded experience that co-exists with Alexa. It allows companies to develop their own wake words, voices and capabilities like car control functions while still using Alexa for features like smart home control.

Stellantis, formerly FCA, in January said it would integrate Alexa Custom Assistant into its vehicles. “The most recognized voice assistant is Amazon Alexa,” said Vince Galante, Stellantis’ chief designer for user experience. He said that being able to bring that directly from a customer’s home into their cars “creates that consistency and that familiarity” and provides a seamless, more personalized experience while still being able to have Amazon create custom skills.

Chang says that Google is also providing ways for automakers to differentiate their voice recognition experience, though he noted that the company doesn’t offer a complete white-label solution. He feels Google, as a large, well-resourced player, is better positioned to succeed. But because the voice assistant industry is such a “nascent” area, he says, there’s ample opportunity for any company that can develop innovative solutions.

Kristin Kolodge, executive director of driver interaction and human machine interface at J.D. Power, agrees. Consumers surveyed by her company consistently identify problems with voice recognition systems, and the deficiencies span all car models and all voice assistants currently in vehicles, she said. Yet she says consumers, despite being frustrated and disappointed, remain interested, and there’s a big opportunity for any company to “truly capture this voice modality.”

“Meeting those consumer expectations is still where the opportunity lies.”