Why we’re using Deep Learning for our Dragon speech recognition engine

Each of us is unique in how we use language: how we speak and the words we choose. In some cases, that individuality can be leveraged through Deep Learning and Neural Networks to create even better experiences, as in our latest Dragon Individual and Dragon Legal offerings.


Each of us is unique in how we use language: how we speak, the words we use, and so on. In an earlier blog post, we saw how speech recognition systems handle this variation by training on speech and language data that cover many accents, age groups, and other differences in speaking style. This creates very robust systems that work well for (nearly) every speaker; we call this “speaker-independent” speech recognition.

But in some cases, the individuality of the speaker matters and can be leveraged to create even better experiences, as in our latest Dragon Individual and Dragon Legal offerings, which are typically used by a single user. This allows us to go beyond speaker-independent speech recognition by adapting to each user in a speaker-dependent way. Dragon does this on several levels:

  • It adapts to the user’s active vocabulary by inspecting texts the user has created in the past, both by adding custom words to its active vocabulary and by learning the typical phrases and text patterns the user employs.
  • During each session, it does a fast adaptation of its acoustic model (which captures how words are pronounced) based on just a few seconds of speech from the user. This also lets it adapt to how a user’s voice sounds in the moment: for instance, whether they have a cold, are using a different microphone, or are in a different environment.
  • During the optional enrollment step, or later after a dictation session ends, Dragon does some more intensive learning in an offline mode. Over time, it keeps adapting its models to a specific user’s speaking patterns.
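
The first kind of adaptation in the list above, learning from texts the user has already written, can be illustrated with a small Python sketch. This is not Dragon's implementation: the function names, the `min_count` threshold, and the sample texts are assumptions made for illustration, and simple word-pair counts stand in for the "typical phrases and text patterns" a real language model would learn.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def adapt_vocabulary(base_vocab, user_texts, min_count=2):
    """Return (custom_words, bigram_counts) learned from a user's past texts.

    custom_words: words the user wrote at least `min_count` times that the
    base vocabulary lacks. bigram_counts: how often each word pair occurs.
    Both the threshold and the bigram statistic are illustrative assumptions.
    """
    word_counts = Counter()
    bigram_counts = Counter()
    for text in user_texts:
        tokens = tokenize(text)
        word_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    custom_words = {word for word, count in word_counts.items()
                    if count >= min_count and word not in base_vocab}
    return custom_words, bigram_counts

# Hypothetical base vocabulary and user documents.
base_vocab = {"the", "patient", "was", "given", "a", "dose", "of", "and"}
user_texts = [
    "The patient was given a dose of amoxicillin.",
    "A dose of amoxicillin and a dose of ibuprofen was given.",
]
custom_words, bigrams = adapt_vocabulary(base_vocab, user_texts)
```

Here `custom_words` contains only the word the user wrote repeatedly that the base vocabulary lacked, and `bigrams` records which word pairs this user tends to produce. A production system would feed such statistics into a much richer language model.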

This latter point deserves more attention. Dragon uses Deep Neural Networks end-to-end, both in the language model, which captures how frequently words occur and in which combinations, and in the acoustic model, which deciphers the smallest spoken units of a language, its phonemes.

These models are quite large, and before they leave our labs they have already been trained on very large amounts of data. One of the reasons why Neural Networks have taken off only now, and not in the 20th century when they were invented, is that training is a very compute-intensive process. We use a significant number of GPUs (graphics processing units) to train our models. GPUs were originally invented for computer graphics applications like video games. Computing images and training Deep Neural Networks have a lot in common: both tasks require applying relatively simple calculations to many data points at the same time, and this is what GPUs are good at. We use multiple GPUs in parallel in one training session to speed up the training process.
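
To make the idea of applying the same simple calculation to many data points in parallel more concrete, here is a minimal Python sketch of data-parallel training. It is a generic textbook scheme, not a description of our training setup: each "worker" stands in for one GPU, the model is a one-variable linear fit rather than a neural network, and all names are hypothetical.

```python
def gradient(w, b, batch):
    """Mean-squared-error gradient of the fit y = w*x + b over one batch."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b

def data_parallel_gradient(w, b, data, num_workers):
    """Split the data into equal shards, compute one gradient per shard
    (each shard stands in for one GPU), then average the results.

    Assumes len(data) is divisible by num_workers; leftover items
    would otherwise be dropped.
    """
    shard_size = len(data) // num_workers
    shards = [data[i * shard_size:(i + 1) * shard_size]
              for i in range(num_workers)]
    # On real hardware these per-shard computations would run concurrently.
    grads = [gradient(w, b, shard) for shard in shards]
    grad_w = sum(g[0] for g in grads) / num_workers
    grad_b = sum(g[1] for g in grads) / num_workers
    return grad_w, grad_b

# Toy data drawn from y = 3x + 1.
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
full = gradient(0.0, 0.0, data)
parallel = data_parallel_gradient(0.0, 0.0, data, num_workers=4)
```

With equal-sized shards, the averaged per-shard gradient matches the gradient computed over the whole data set, which is why the work can be divided across devices without changing the result of a training step.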

But how do we apply this outside of our data centers? Adapting the Deep Neural Networks that make up the acoustic model to the user’s speech is similar to training them, and we want that to happen on the user’s PC, Mac or laptop, and to happen fast. This is a demanding task: adaptation has to work with just a little data, and it has to be computationally very efficient.
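
One common way to keep adaptation cheap, sketched below in Python, is to freeze the large pre-trained model and fit only a handful of speaker-specific parameters on the few samples available. This is an assumption chosen for illustration, not a description of how Dragon adapts its acoustic model: the "base model" here is a toy linear function, and `speaker_bias`, `steps`, and `lr` are hypothetical names and values.

```python
def predict(w, b, speaker_bias, x):
    """Frozen base model (w, b) plus a small per-speaker bias term."""
    return w * x + b + speaker_bias

def adapt_speaker_bias(w, b, samples, steps=20, lr=0.2):
    """Fit only the speaker bias by gradient descent on a few samples.
    The base parameters w and b are never modified."""
    speaker_bias = 0.0
    for _ in range(steps):
        grad = sum(2 * (predict(w, b, speaker_bias, x) - y)
                   for x, y in samples) / len(samples)
        speaker_bias -= lr * grad
    return speaker_bias

def mean_squared_error(w, b, speaker_bias, samples):
    """Average squared prediction error over the samples."""
    return sum((predict(w, b, speaker_bias, x) - y) ** 2
               for x, y in samples) / len(samples)

# Base model "trained in the lab" (values chosen for illustration).
w, b = 2.0, 1.0
# Three samples from a speaker whose outputs sit 0.5 above the base model.
samples = [(0.0, 1.5), (1.0, 3.5), (2.0, 5.5)]

bias = adapt_speaker_bias(w, b, samples)
error_before = mean_squared_error(w, b, 0.0, samples)
error_after = mean_squared_error(w, b, bias, samples)
```

Because only one parameter is updated, a few cheap steps on three samples are enough to bring the error down sharply. The same trade-off, few adapted parameters and little data, is what makes on-device adaptation feasible in principle.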

Packaging this process in a way that allows an individual to run it on their desktop or laptop is the culmination of many years of innovation in speech recognition and machine learning R&D. The result is a highly accurate Dragon experience that is fully personalized to you and your voice.
