What’s next:
In the Labs

Multimodal interaction – How machines learn to understand pointing

Pointing at subjects and objects – whether with language, gestures, gaze, or the eyes alone – is a distinctly human ability. Smart multimodal assistants, like the one in your car, account for these forms of pointing, making interaction more human-like than ever before. Made possible by image recognition and Deep Learning technologies, this will have significant implications for the autonomous vehicles of the future.

Smart multimodal assistants, such as Nuance Dragon Drive, now also include gaze detection based on eye-tracking

As we learn more about the biological world around us, the list of things only humans can do has dwindled – and that was before computers started to play chess and Go. Counting? Birds can deal with numbers up to twelve. Using tools? Dolphins in Shark Bay, Australia, use sponges as tools when hunting. Against this background, it may come as a surprise how distinctly human pointing is: Although it seems very natural and easy to us, not even chimpanzees, our closest living relatives, can muster more than the most trivial forms of pointing. So how could we expect machines to understand it?

 

Three forms of pointing

In 1934, the linguist and psychologist Karl Bühler distinguished three forms of pointing, all connected to language: The first is pointing “ad oculos,” that is, in the field of visibility centered on the speaker (“here”) and also accessible to the listener. While we can point within this field with our fingers alone, languages offer a special set of pointing words to complement this (“here” vs. “there;” “this” vs. “that;” “left” and “right;” “before” and “behind,” etc.). The second form of pointing operates in a remembered or imagined world brought about by language (“When you leave the Metropolitan Museum, Central Park is behind you and the Guggenheim Museum is to your left. We will meet in front of that”). The third form is pointing within language: As speech is embedded in time, we often need to point back to something we said a little earlier or point forward to something we will say later. In a past blog post, I described how the anaphoric use of pointing words (“How is the weather in Tokyo?” “Nice and sunny.” “Are there any good hotels there?”) can be supported in smart assistants (and how this capability distinguishes the smarter assistants from the not-so-smart). And the first mode, pointing at elements in the visible vicinity, is now also available in today’s smart assistants.
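To give a feel for the third, anaphoric form of pointing, here is a deliberately minimal sketch of how an assistant can let a word like “there” refer back to an entity mentioned earlier in the dialog. All class and method names are illustrative assumptions, not how any production assistant is implemented:

```python
class DialogContext:
    """Tracks the most recently mentioned entity per semantic slot."""

    ANAPHORS = {"there", "it", "that"}

    def __init__(self):
        self.salient = {}  # e.g. {"location": "Tokyo"}

    def resolve(self, slot, mention):
        # Anaphoric pointing words refer back to earlier dialog;
        # literal values update the context and pass through.
        if mention in self.ANAPHORS:
            return self.salient.get(slot)
        self.salient[slot] = mention
        return mention


ctx = DialogContext()
ctx.resolve("location", "Tokyo")         # "How is the weather in Tokyo?"
city = ctx.resolve("location", "there")  # "Are there any good hotels there?"
# city == "Tokyo"
```

Real systems must of course handle competing antecedents, slot typing, and decay of salience over time; the point here is only that anaphora resolution reduces to keeping dialog state.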

 

First automotive assistants to support “pointing”

At CES in Las Vegas this month, we demonstrated how drivers can point to buildings outside the car and ask questions like, “What are the opening hours of that shop?” But the “pointing” doesn’t need to be done with a finger. With the new technology, you can simply look at the object in question, made possible by gaze detection based on camera tracking of the eyes. The technology imitates human behavior: humans are very good at guessing where somebody is looking just by observing his or her eyes.
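To give a rough idea of what resolving “that shop” involves once the gaze direction is known, here is a hypothetical sketch: given the car’s position and the detected gaze bearing, pick the point of interest whose bearing best matches, within a tolerance. The function names, coordinates, and threshold are illustrative assumptions, not the demonstrated system:

```python
import math

def bearing(from_xy, to_xy):
    """Compass-style bearing in degrees from one map point to another."""
    dx, dy = to_xy[0] - from_xy[0], to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dy, dx)) % 360

def resolve_gaze_target(car_xy, gaze_deg, pois, tolerance_deg=15.0):
    """Return the POI whose bearing best matches the gaze, or None."""
    best, best_err = None, tolerance_deg
    for name, xy in pois.items():
        # Smallest signed angular difference, folded into [0, 180].
        err = abs((bearing(car_xy, xy) - gaze_deg + 180) % 360 - 180)
        if err <= best_err:
            best, best_err = name, err
    return best

pois = {"bakery": (10, 1), "museum": (0, 10)}
resolve_gaze_target((0, 0), 5.0, pois)  # -> "bakery"
```

A production system would work on geo-coordinates and 3D gaze vectors and fuse several noisy gaze samples, but the core idea is this angular matching.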

 

 

Biologists suggest that the distinct shape and appearance of the human eye (a dark iris against a contrasting white sclera) is no accident, but a product of evolution that facilitates gaze detection. Artists have exploited this for centuries: with just a few brush strokes, they can make figures in their paintings look at other figures or even out of the picture – including at the viewer. Have a look at Raphael’s Sistine Madonna, on display in Dresden, and see how the figures’ viewing directions make them point at each other and how that guides our view.

Raphael, Sistine Madonna (Gemäldegalerie Alter Meister, Dresden, 1513–14. Oil on canvas, 265 × 196 cm)

By Raphael – Google Art Project: Public Domain

Multimodal interaction: When speech, gesture, and handwriting work hand in hand

Machines can now do this too, based on image recognition and Deep Learning – capabilities which, coming out of our cooperation with DFKI, will bring us into the age of truly multimodal assistants. It is important to remember that “multimodal” does not just mean you have a choice between modalities (typing OR speaking OR handwriting on a pad to enter the destination into your navigation system), but that multiple modalities work together to accomplish one task. For example, when pointing to something in the vicinity (modality 1) and saying, “tell me more about this” (modality 2), both modalities are needed to understand what the person wants to accomplish.
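The fusion step described above can be sketched as follows: a spoken command with a deictic reference (“this”) is only interpretable once combined with the object the gaze or gesture tracker reports. This is a minimal illustration with made-up slot names, not an actual fusion engine:

```python
DEICTIC = {"this", "that"}

def fuse(speech_intent, gaze_target):
    """Fill deictic slots in a parsed speech intent from the gaze modality."""
    fused = dict(speech_intent)
    for slot, value in fused.items():
        if value in DEICTIC and gaze_target is not None:
            fused[slot] = gaze_target  # modality 2 supplies the referent
    return fused

# "tell me more about this" (speech) + looking at the town hall (gaze):
speech = {"intent": "describe", "object": "this"}
fused = fuse(speech, gaze_target="town_hall")
# fused["object"] == "town_hall"
```

Neither modality alone specifies the task: speech carries the intent, gaze carries the referent, and only their combination is actionable.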

 

Multimodal interaction – a key feature for Level 4 and 5 autonomous vehicles?

While it is obvious why such a capability is attractive to today’s drivers, there are hints that it might become even more important as we enter the age of autonomous vehicles. Many people wonder what drivers will do when they no longer have to drive, as in Levels 4 and 5 of the autonomous driving scale. Some studies suggest the answer may be: not that much, actually. For example, a 2016 German study asked people about the specific advantages they perceived in such vehicles, and “… that I can enjoy the landscape” came out as the top choice at all levels of autonomy. It’s not too difficult to imagine a future where gaze and gesture detection, combined with a “just talk” mode of speech recognition, lets you ask “What is that building?” without having to press a button or say a keyword first. This future will give users of autonomous vehicles exactly what they want. And for today’s users of truly multimodal systems, machines just got a little more human-like again.
