“Hey Mercedes, do I need sunglasses tomorrow in Miami?” The announcement of the new Mercedes-Benz User Experience (MBUX) at the Consumer Electronics Show (CES) 2018 in Las Vegas attracted worldwide attention. The intelligent voice control system in particular was highlighted in the reviews and makes the MBUX far more than a Human-Machine Interface (HMI) or infotainment system. It is an example of an automotive assistant powered by artificial intelligence (AI) that learns and adapts to the needs and preferences of the driver – a real “revolution in the cockpit”, as Daimler stated in the official announcement.
During the last few years, a new generation of automotive assistants has entered the market and been rewarded with great reviews and user acceptance. What makes the difference between a good and an excellent automotive assistant user experience? This article highlights some of the technology innovations behind these newer systems that are revolutionizing user experience, driving safety, and performance.
“You need to think about interacting with an automotive assistant as building and maintaining a relationship with a friend,” explains Fatima Vital, Senior Director Marketing Automotive at Nuance. “Many core principles of interaction between humans are also valid for human-machine interaction. In addition, we learned that people don’t just want to adapt their behavior; they rightfully expect machines, devices, and cars to adapt to them.”
Starting and conducting a conversation
Every conversation starts by getting the dialogue partner’s attention: traditionally, users have been expected to push a button to activate speech recognition. While this method was used for many years and had the advantage of being explicit and unambiguous, it doesn’t reflect how people communicate with each other. When humans want to get someone else’s attention, they call them by name or just talk. Typically, they don’t even wait for confirmation that they have been heard but continue talking to articulate their intention.
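The activation pattern described above can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration (all function and variable names are invented, and real wake-word detection runs on the audio signal, not on a transcript): the assistant scans incoming words for a wake phrase and treats whatever follows as the user’s request, without waiting for a confirmation.

```python
# Hypothetical sketch of wake-phrase activation on a word stream.
# Real systems do acoustic keyword spotting; this toy version works on text.

WAKE_PHRASE = ("hey", "mercedes")  # example wake phrase from the article

def extract_request(words):
    """Return the request following the wake phrase, or None if absent."""
    lowered = [w.lower().strip(",.?!") for w in words]
    for i in range(len(lowered) - len(WAKE_PHRASE) + 1):
        if tuple(lowered[i:i + len(WAKE_PHRASE)]) == WAKE_PHRASE:
            # The user keeps talking right after the wake phrase,
            # so everything after it is taken as the request.
            return " ".join(words[i + len(WAKE_PHRASE):])
    return None  # no wake phrase: the assistant stays passive

print(extract_request("Hey Mercedes, do I need sunglasses tomorrow".split()))
```

Note how no confirmation step sits between the wake phrase and the request – mirroring how people address each other and continue talking.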
“Another very human behavior is interrupting each other. While in human conversations this is occasionally considered impolite, it is part of our conversational behavior. Barge-in capabilities, as supported in newer systems, shorten the dialogue, help users get the required information quicker, or let them correct the system right away without having to listen through lengthy prompts,” Fatima Vital adds.
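Barge-in can be modeled as a prompt playback loop that yields to the user. The sketch below is a simplified simulation under invented names (it is not Nuance’s API): the assistant speaks its prompt chunk by chunk but checks for user speech before each chunk, and abandons the rest of the prompt as soon as the user interrupts.

```python
# Simplified barge-in simulation: prompt playback stops when the user speaks.

def speak_with_barge_in(prompt_chunks, user_speech_at=None):
    """Play prompt chunks; return (chunks actually spoken, barge-in text).

    user_speech_at, if given, is a (chunk_index, utterance) pair simulating
    the user starting to talk just before that chunk would be played.
    """
    spoken = []
    for i, chunk in enumerate(prompt_chunks):
        if user_speech_at is not None and i == user_speech_at[0]:
            return spoken, user_speech_at[1]  # user barged in: cut the prompt
        spoken.append(chunk)                  # otherwise keep speaking
    return spoken, None                       # prompt finished uninterrupted

prompt = ["I found three parking garages.", "The first one is on Main Street.",
          "The second one is..."]
spoken, barge = speak_with_barge_in(prompt, user_speech_at=(1, "take the first"))
```

In this run only the first chunk is spoken; the user’s correction (“take the first”) is handed to the dialogue manager immediately instead of after the full prompt.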
Understanding human talk: Natural language understanding
A further big step in improving the user experience, and thus increasing the acceptance of an automotive assistant, is to remove the constraints that limit the way users can talk to the car. The key to a flexible, human-like interaction with the automotive assistant is so-called Natural Language Understanding (NLU). This technology marks a real paradigm shift: in contrast to conventional voice control systems in cars, NLU no longer requires fixed commands from the user. With modern assistants, users can even use implicit commands such as “I am cold” to have the system change the temperature in their seating zone.
“Technically, the understanding of natural language is realized by a combination of deep learning based on Neural Networks (NN) and statistical modeling,” explains Fatima Vital.
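The essence of NLU is mapping a free-form utterance to an intent with parameters. The toy below uses hand-written keyword rules purely for illustration – production systems like the one described here use the statistical and deep-learning models Vital mentions, and every name in this sketch is invented – but it shows how an implicit utterance such as “I am cold” resolves to a concrete climate action rather than requiring a fixed command.

```python
# Toy stand-in for NLU intent mapping (rule-based here; real systems are
# statistical). An utterance is resolved to an intent plus slot values.

INTENT_RULES = [
    ({"cold", "freezing"}, ("climate.set_temperature", {"direction": "up"})),
    ({"hot", "boiling"},   ("climate.set_temperature", {"direction": "down"})),
    ({"navigate", "directions"}, ("navigation.start", {})),
]

def understand(utterance):
    """Map a free-form utterance to an intent dict; 'unknown' if no match."""
    words = set(utterance.lower().strip(".!?").split())
    for keywords, (intent, slots) in INTENT_RULES:
        if words & keywords:  # any trigger word present
            return {"intent": intent, "slots": slots}
    return {"intent": "unknown", "slots": {}}

print(understand("I am cold"))
```

The implicit complaint “I am cold” never names a device or a command, yet it still yields an actionable intent – exactly the flexibility NLU adds over fixed-grammar voice control.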
Responding in a way that makes the user feel understood
Conversations are bilateral or even multilateral: as a result, creating a more human-like interaction with the automotive assistant requires the system to respond the way humans do. Innovation in text-to-speech (TTS) technologies leveraging artificial intelligence makes the system’s output sound much more natural. The latest advancements are achieved by so-called Natural Language Generation (NLG), as supported by Dragon Drive. This technology allows the system to vary its dialogue output instead of using stereotypical answers, resulting in a real conversation with the assistant instead of a mere exchange of commands. The necessary content is provided by the vehicle (for in-vehicle functions), by Nuance’s content database (e.g. for weather information), or by carefully chosen cloud sources integrated into the system.
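One simple way to avoid stereotypical answers is to render the same content through varied surface templates. This is a minimal, hypothetical sketch (template-based variation is one basic technique; the NLG described above is considerably richer, and the template texts here are invented):

```python
# Minimal template-based response variation: same content, varied phrasing.
import random

WEATHER_TEMPLATES = [
    "Tomorrow in {city} it will be {condition}.",
    "Expect {condition} weather in {city} tomorrow.",
    "Looks like a {condition} day in {city} tomorrow.",
]

def render_weather(city, condition, rng=random):
    """Pick one of several phrasings so repeated answers don't sound canned."""
    template = rng.choice(WEATHER_TEMPLATES)
    return template.format(city=city, condition=condition)

print(render_weather("Miami", "sunny"))
```

Asked about Miami twice, such a system can answer differently each time while conveying the same information – a small step toward conversation instead of command exchange.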
Impress with broad domain coverage
As in every relationship, it is very important to have common topics to talk about. Traditional systems were limited to a few core domains such as dialing, navigation, and basic entertainment functions – sufficient as long as cars weren’t connected, but not once they became true digital hubs. Today’s connected cars give drivers and passengers access to a myriad of connected services, apps, and assistants. Recent deployments keep pushing the boundaries further, supporting hundreds of domains – a profound basis for vivid and diverse conversations. “This aspect is becoming more and more important with the ever-advancing automation of driving functions,” says Fatima Vital. “The driver will have more and more time to deal with non-driving-related issues and will demand access to a larger variety of domains.”
Keep learning to maintain the relationship
When two people talk to each other regularly, each expects the other to get to know them over time. Having to correct the same information again and again leaves the impression that the other is either not very smart or unwilling to help; in any case, this leads to frustration. The same happens when interacting with a voice assistant. The assistant needs to get to know the user, learn their preferences, and understand the context of a specific situation.
Nuance uses new artificial intelligence (AI) technologies such as machine learning and reasoning to enable today’s systems to learn users’ preferences and consider contextual information such as time, location, weather, restaurant ratings, and car sensors, to name just a few. (For further information on this, check out Michael Kaisser’s article ‘4 pillars of Artificial Intelligence that are building the car of the future’!)
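The combination of learned preferences and contextual signals can be pictured as a scoring problem. The sketch below is a hypothetical illustration of that idea, not Nuance’s implementation: restaurant suggestions are ranked from a public rating, a learned cuisine preference, and a time-of-day signal (all field and function names are invented).

```python
# Hypothetical context-aware ranking: preferences + context shape the answer.

def score(restaurant, prefs, context):
    s = restaurant["rating"]                      # base: public rating
    s += prefs.get(restaurant["cuisine"], 0.0)    # learned user preference
    if context["hour"] >= 22 and not restaurant["open_late"]:
        s -= 10.0                                 # closed now: rule it out
    return s

def recommend(restaurants, prefs, context):
    return max(restaurants, key=lambda r: score(r, prefs, context))

places = [
    {"name": "Trattoria", "cuisine": "italian", "rating": 4.2, "open_late": False},
    {"name": "Diner", "cuisine": "american", "rating": 3.9, "open_late": True},
]
prefs = {"italian": 1.0}  # learned over time from the user's past choices
print(recommend(places, prefs, {"hour": 23})["name"])  # late at night -> Diner
```

At dinner time the learned preference wins and the Italian place is suggested; late at night the context overrides it – the same kind of adaptation the paragraph above describes.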
In addition, it is mandatory to keep the system up to date. As Nuance Dragon Drive is a hybrid system with a cloud connection, both the voice control and the content can be updated constantly. While the software model is continuously enriched with new words and changing use of language over time, new domains are added to broaden the spectrum of available information – the system is learning.
A seamless user experience delivering on the brand promise
One commonality of recent successful automotive infotainment systems is their seamless integration into the car brand’s HMI. These customized, automotive-branded assistants give users a consistent and seamless experience across all in-car and connected services and apps. Based on Dragon Drive, Nuance’s customizable platform, each deployment has been comprehensively adapted to the car manufacturer’s needs, making it an OEM-branded solution and user experience, and thus a distinguishing factor.
Focus on HMI usability
Obviously, innovative technology alone doesn’t guarantee a good user experience. It all depends on the execution and an intelligent dialogue design. Adam Emfield addressed five common automotive HMI usability pitfalls in a series of blog articles.
To help customers avoid many of these usability pitfalls, Nuance packages and optimizes core technology, domain content, and user experience in tested and validated domains. These broader packaged solutions range from core automotive domains (e.g. voice-controlled phone and navigation, hybrid music and local search) to “smart domains” leveraging AI capabilities for driving-related information such as parking and fuel, and even new services like mobile office integrations and car-related domains such as voice-enabled car manuals.
More to come
While these technologies are already implemented and well received in the market, Nuance is offering a new set of technologies to further humanize interaction with automotive assistants. Fatima Vital: “Our latest prototypes include ‘just talk’ capabilities to activate the assistant without a wake-up word, as well as interaction via gaze and gestures. Furthermore, the system is able to interact with multiple users at the same time.”