When we look at how AI assistants work today, we realize that most of them are able to detect intent and references to objects, but they still lack the dimension of sensing or expressing emotions. Nuance is making another big step in delivering a more humanlike experience for car users by adding emotion detection capabilities and, more importantly, by adapting the style and tone of the mobility assistant’s response. These features not only strengthen the emotional connection between users and the in-car assistant but can also increase driving safety and become an incremental technology to optimize transfer of control in autonomous and semi-autonomous vehicles. A demo of this concept will be shown at CES 2019.
The German-American psychologist Karl Bühler noted three aspects of speech as an instrument of communication: it helps us to express ourselves, to appeal to others, and to refer to things in the world. When we look at how AI assistants work today, we easily identify two of these functions, appeal and reference; researchers working on Natural Language Understanding call the corresponding tasks intent recognition and named entity recognition. So, when I tell my mobility assistant in the car, “Make a reservation for a table at Tonio’s Osteria,” the intent is the table reservation and the named entity mentioned is Tonio’s Osteria.
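To make the two functions concrete, here is a minimal sketch of how the result of understanding that sentence might be represented. The class name `NluResult`, the intent label `reserve_table` and the entity key `restaurant_name` are illustrative assumptions, not the output format of any particular NLU engine.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NluResult:
    """Hypothetical container for what an NLU component extracts from an utterance."""
    utterance: str                      # what the user said
    intent: str                         # the "appeal": what the user wants done
    entities: Dict[str, str] = field(default_factory=dict)  # the "reference": named things


# Hand-written result for the example sentence; a real system would produce this
# from a trained model rather than by hand.
result = NluResult(
    utterance="Make a reservation for a table at Tonio's Osteria",
    intent="reserve_table",
    entities={"restaurant_name": "Tonio's Osteria"},
)

print(result.intent, result.entities["restaurant_name"])
```

Note that nothing in this structure says anything about how the speaker feels, which is exactly the gap discussed next.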
But what about the third function, expressing one’s emotions? As humans we don’t even have to make a deliberate effort to make explicit statements about how we feel – quite the contrary. We cannot really hide our feelings (or cognitive and emotional states) from the outside world. By looking at our face, by listening to the way we speak, other humans can tell if we are tired or happy, stressed or excited.
Introducing AI that can sense and express emotions
Today’s AI assistants still lack this dimension for the most part, but that’s about to change. New technology makes it possible for AI to detect human emotions by observing facial expressions or analyzing speech signals. This blog post explains how. In the other direction, speech synthesis, or text-to-speech (TTS), technology is now able to produce “emotional” output in the form of “multi-style” voices, which are generated to mimic vocal tones and to speak in different styles, such as cheerful or neutral.
The fundamental question is: should we use these new technologies to build a more human-like assistant experience? Some people would say that we should keep AI assistants recognizable as machines by having them speak in robotic voices, not giving them names, and so on. On the other hand, many users enjoy the ease of thinking about an assistant as a “she” (or “he” – Siri, for example, lets you choose the gender of the TTS voice), and psychologists have found that hearing speech causes humans to automatically project human-like characteristics onto the machine, a process called anthropomorphism. I would say that while people recognize that an AI is not a human being, they enjoy the illusion of speaking to a “virtual human” – as long as they don’t feel cheated.
Emotional interaction with the mobility assistant in the car
So, what does this experience look like, one that draws from both emotion detection and emotional TTS? In one use case, the AI assistant picks up the emotional state of the user (like the driver in a car) and mirrors it back: if it detects a driver is stressed or concentrating, the assistant should keep the interaction to a minimum, use short and to-the-point prompts and read them out in a neutral voice. If, on the other hand, the driver is detected to be in a good mood, then the system can mirror and reinforce this positive mood by switching to a “chatty” mode where it is more proactive, uses longer prompts and speaks in a cheerful voice.
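The mirroring behavior described above can be sketched as a simple mapping from a detected driver state to a response policy. This is only an illustration under stated assumptions: the state labels, the `ResponsePolicy` fields and the word limits are hypothetical, and the style names reuse the `<style=...>` tags from the example dialogs rather than any documented TTS API.

```python
from dataclasses import dataclass
from enum import Enum


class DriverState(Enum):
    """Hypothetical labels that an emotion detection component might emit."""
    STRESSED = "stressed"
    CONCENTRATING = "concentrating"
    GOOD_MOOD = "good_mood"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ResponsePolicy:
    tts_style: str          # style tag passed to the multi-style TTS voice
    max_prompt_words: int   # assumed budget for prompt length
    proactive: bool         # whether the assistant may volunteer suggestions


# Assumed values, chosen only for illustration.
TERSE = ResponsePolicy(tts_style="neutral", max_prompt_words=15, proactive=False)
CHATTY = ResponsePolicy(tts_style="lively", max_prompt_words=40, proactive=True)


def select_policy(state: DriverState) -> ResponsePolicy:
    """Mirror a good mood with the chatty mode; otherwise stay short and neutral."""
    if state is DriverState.GOOD_MOOD:
        return CHATTY
    # Stressed, concentrating, or unknown: keep the interaction to a minimum.
    return TERSE


def render_prompt(text: str, policy: ResponsePolicy) -> str:
    """Prefix the prompt text with the style tag used in the example dialogs."""
    return f"<style={policy.tts_style}> {text}"


print(render_prompt("Plenty of options.", select_policy(DriverState.GOOD_MOOD)))
```

Falling back to the terse policy when the state is unknown is a deliberate design choice in this sketch: a wrongly cheerful assistant is likely to be more annoying to a stressed driver than a wrongly neutral one is to a relaxed driver.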
Please have a look at the examples below. Or if you want to see this in action, you actually can: Nuance and Affectiva will show this as a joint demo at CES in Las Vegas in January.
There’s also another application for the combination of emotion detection and AI assistants in the car: enhanced safety. Take drowsiness detection, for example. Many cars already have such a feature, and starting in 2020 the NCAP requires camera-based Driver Monitoring Systems in cars that want to score 5 stars. Today, however, most cars just show a coffee cup icon in the display to remind drivers to stay alert, and chances are that tired drivers will miss that icon or won’t know what to do with it. Here’s where the AI assistant comes in: it can engage in a dialog with the driver and offer options such as finding a coffee shop or a safe place to rest, lowering the temperature in the car, or calling someone. This application will be part of our CES 2019 demo.
With the rise of semi-autonomous vehicles, emotion detection integrated with the mobility assistant in the car will allow the system to proactively propose driving modes or initiate transfer of control when it notices drowsiness, driver distraction or emotional stress that might affect driving style and safety. And because more and more surveillance systems are entering the car, these functions can be added without additional hardware.
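As a rough sketch of the drowsiness scenario, the assistant could turn a detection score into a spoken offer instead of a silent icon. The numeric score, the 0.7 threshold and the function name are assumptions made for illustration; the options are the ones listed above.

```python
from typing import Optional

# Options the assistant can offer a tired driver, taken from the text above.
DROWSINESS_OPTIONS = (
    "find a coffee shop",
    "find a safe place to rest",
    "lower the temperature",
    "call someone",
)


def drowsiness_prompt(score: float, threshold: float = 0.7) -> Optional[str]:
    """Return a spoken offer when the (hypothetical) drowsiness score is high.

    `score` is assumed to lie between 0.0 (alert) and 1.0 (very drowsy).
    Returns None when no intervention is needed.
    """
    if score < threshold:
        return None
    # Join the options into one sentence: "a, b, c, or d".
    choices = ", ".join(DROWSINESS_OPTIONS[:-1]) + ", or " + DROWSINESS_OPTIONS[-1]
    return f"<style=neutral> You seem tired. I can {choices}. What would you like?"


print(drowsiness_prompt(0.9))
```

A production system would presumably also consider how long the driver has been on the road and how recently the last prompt was given, so that the assistant does not nag.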
Studies conducted in our DRIVE Lab show that drivers are very much looking forward to seeing these technologies in action and see vast potential to make the driving experience safer, more productive and enjoyable.
Use Case Example: Neutral Mode in the restaurant domain
User: “Find a restaurant”
System: “<style=neutral> There is a rest area with a Burger King and KFC three miles ahead.”
User: “No, I’m in the mood for tacos.”
System: “<style=neutral> There’s also a Taco Bell at the next rest area. Do you want to stop there?”
User: “No, something better quality, but not expensive.”
System: “<style=neutral> Alright, ’Tarasco Taqueria’ has 4.5 stars from 212 reviewers on Yelp and is two miles off Exit 18, a detour of 5 minutes. Do you want to go there?”
Use Case Example: Chatty Mode in the restaurant domain
User: “Is there a place to have some food along the way?”
System: “<style=lively> Plenty of options. Do you want a quick bite or a more relaxed place with good service?”
User: “I’m not sure. Do you have any alfresco options? Pizza maybe?”
System: “<style=lively> Sure! [gilded] There is a highly-rated Italian place five miles ahead with outdoor seating named ‘Il Casolare’. I am not quite sure they serve pizza though. Would you like to call them to ask?”
User: “Is it a casual place or fancy?”
System: “<style=lively> Uhm [gilded]. From the information I can access I cannot say with certainty. Pricing is a two-out-of-four, so my guess is that it’s more on the affordable and casual side.”
User: “Are there any other places coming up where I can sit outside?”
System: “<style=lively> The next option seems to be an American bar and grill, but, hmm [gilded], it is located several miles further off our route.”
User: “OK, let’s check out your first suggestion. Do you think I should call ahead to make a reservation?”
System: “<style=lively> Their listing does not suggest reservations, but I can connect you by phone if you would like to check.”
User: “Nah, just bring me there. How long will it take us?”
System: “<style=lively> Traffic is not bad so we should arrive in about fifteen minutes.”
User: “Great, thanks.”
System: “Always my pleasure, Robert.”
User has food there, conversation continues after user re-enters car.
System: [earcon] “<style=lively> How was lunch, Robert?”
User: “Food was really tasty! Terrific tacos and a very nice vibe.”
System: “<style=lively> Phew [gilded]. I’m glad it worked out. Would you like me to bookmark this place and include your comment as a note?”
Nils joined Nuance in 2003, after holding various roles at Philips Speech Processing for nearly a decade. Nils oversees the coordination of various research initiatives and activities across many of Nuance’s business units. He also organizes Nuance’s internal research conferences and coordinates Nuance’s ties to academia and other research partners, most notably IBM.
Nils attended the Universities of Bonn, Koblenz, Duisburg and Hagen, where he earned an M.A. in Communication Research, a Diploma in Computer Science, a Ph.D. in Computational Linguistics, and an M.Sc. in Environmental Sciences.
Nils can speak six languages, including his mother tongue German, and a little Russian and Mandarin. In his spare time, Nils enjoys hiking and hunting in archives for documents that shed some light on the history of science in the early modern period.