What’s next.

Continued progress in reinventing the relationship between people and technology.


Hearing is like seeing – for our brains and for machines

At a time when neural networks are increasingly popular for advancing voice technologies, language understanding and AI, it's interesting to remember that many of the current approaches were originally developed for image or video processing. A closer look at Convolutional Neural Networks (CNNs) suggests this is no coincidence: the brain, too, seems to use very similar processes for visual and audio/speech stimuli.

By

As noted in previous posts, there is an array of neural network machine learning approaches that go beyond simply being "deep." At a time when neural networks are increasingly popular for advancing voice technologies, language understanding and AI, it's interesting that many of the current approaches were originally developed for image or video processing. One of those methods, Convolutional Neural Networks (CNNs), creates exciting opportunities for advancing the state of the art in voice. Comparing CNNs with the way we as humans process information in our brains makes it easy to see why image processing neural nets can be applied to voice today.

Here’s what you need to know about CNNs:

When you search for visual features, say edges or curves at a lower level, or eyes and ears at a higher level (in the example of face recognition), you typically do so locally, because all the relevant pixels are close to each other. In human visual perception this is reflected by the fact that a cluster of neurons focuses on a small receptive field, which is part of the much larger visual field. Because you don't know where the relevant features will appear, you have to scan the entire visual field. You can do this either sequentially, sliding your small receptive field over it as a window from top to bottom and from left to right, or by using multiple small receptive fields (clusters of neurons) that each focus on (overlapping) small parts of the input. The latter is what CNNs do: the same small set of weights (a filter) is applied at every position, and together these receptive fields cover the entire input. This operation is called a "convolution." Higher levels of the CNN then condense the information coming from the individual lower-level convolutions (for example, by pooling) and abstract away from the specific location, as shown below.

[Figure: the convolutional neural network process]
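To make the receptive-field idea concrete, here is a minimal pure-Python sketch of a single convolutional filter followed by global max pooling. The 2x2 edge filter, the 6x6 toy images and the function names are illustrative assumptions; in a real CNN the filter weights are learned, many filters run side by side, and pooling is often applied over small local regions rather than over the whole map.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation (what most CNN libraries call 'convolution').

    The same kernel (shared weights) is applied at every position of the image.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def global_max_pool(feature_map):
    """Keep the strongest response anywhere in the map, discarding its location."""
    return max(max(row) for row in feature_map)


# Hypothetical hand-set filter that responds to a vertical dark-to-bright edge.
EDGE_KERNEL = [[-1, 1],
               [-1, 1]]


def make_image(edge_col, size=6):
    """Toy image: 0 to the left of edge_col, 1 from edge_col onward."""
    return [[1 if c >= edge_col else 0 for c in range(size)] for _ in range(size)]


left = make_image(edge_col=2)
right = make_image(edge_col=4)

print(conv2d(left, EDGE_KERNEL)[0])    # → [0, 2, 0, 0, 0]
print(conv2d(right, EDGE_KERNEL)[0])   # → [0, 0, 0, 2, 0]
print(global_max_pool(conv2d(left, EDGE_KERNEL)))   # → 2
print(global_max_pool(conv2d(right, EDGE_KERNEL)))  # → 2
```

Both images yield the same pooled response even though the edge sits in different columns; only the unpooled feature maps still record where the edge was.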

 

Because CNNs originated in image recognition, it's no surprise that my colleagues who work in handwriting recognition (a visual task) find them very useful: CNNs reduced their errors by more than 60% compared with previous methods.

But we have also found several applications of CNNs to speech and language.

For example, my colleague Raymond Brueckner just contributed to a paper published at ICASSP 2016 last month, which shows how CNNs can be applied to a raw speech signal in an end-to-end way (i.e., without manually defined features). The CNN treats the speech signal as a two-dimensional input field, with time as one dimension and the distribution of energy across frequencies as the other, and scans it with its convolutions, thereby learning automatically which frequency bands are most relevant for speech. The higher layers of the network are then used to detect emotions in the speech signal.
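The two-dimensional input field described above (time on one axis, energy per frequency on the other) can be sketched by cutting a signal into short frames and measuring the energy in each frequency bin with a discrete Fourier transform. This is a hand-computed stand-in under assumed parameters (16-sample frames, a hop of 8 samples, a synthetic tone), and the function names are hypothetical; the end-to-end approach described above is meant to learn which frequency bands matter rather than relying on a fixed transform like this one.

```python
import cmath
import math


def frame_energies(frame):
    """Energy in each DFT frequency bin (0 .. N/2) of one frame, via a direct DFT."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(coeff) ** 2)
    return energies


def time_frequency_map(signal, frame_len=16, hop=8):
    """Rows = time frames, columns = frequency bins: a 2D field a CNN can convolve over."""
    return [
        frame_energies(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


# Toy signal: a pure tone that falls exactly into frequency bin 2 of a 16-sample frame.
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(64)]
tf = time_frequency_map(signal)

print(len(tf), len(tf[0]))                    # → 7 9
print([row.index(max(row)) for row in tf])    # → [2, 2, 2, 2, 2, 2, 2]
```

A CNN would then slide its filters over `tf` just as it would over an image, with time frames as rows and frequency bins as columns.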

The next example is "intent classification" in Natural Language Understanding (NLU): understanding from a user's request what type of task the user wants to accomplish. (We covered how the other aspect of NLU, Named Entity Recognition, works in an earlier post.) For example, in the command "Transfer money from my checking account to John Smith," the intent would be "money_transfer." The intent is typically signaled by a word or a group of words (usually close to each other), which can appear anywhere in the query. So, by analogy to image recognition, we need to search for a local feature by sliding a window over a temporal sequence (the utterance, looking at one word and its context at a time) rather than over a spatial field. And this works very well: when we introduced CNNs for this task, they were more than 10% more accurate than the previous technology.
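The sliding-window search over an utterance can be sketched as a width-2 one-dimensional convolution over one-hot word vectors, followed by ReLU and max-over-time pooling. Everything here is a hypothetical toy: the hand-set weights, the extra `check_balance` intent and the helper names are illustrative assumptions, whereas a production model would learn its filters over dense word embeddings and use many filters per intent.

```python
# Hypothetical hand-set filters of width 2: one weight table per window position,
# plus a bias. With one-hot word vectors, the filter's dot product with a window
# reduces to the sum of the weights of the words it sees, plus the bias.
FILTERS = {
    "money_transfer": ([{"transfer": 1.0, "send": 1.0}, {"money": 1.0}], -1.0),
    "check_balance": ([{"my": 1.0, "account": 1.0}, {"balance": 1.0}], -1.0),
}


def tokenize(text):
    return [w.strip('.,?!"').lower() for w in text.split()]


def conv1d_max_pool(tokens, position_weights, bias):
    """Slide one filter over the utterance, apply ReLU, then max-pool over time."""
    width = len(position_weights)
    responses = []
    for i in range(len(tokens) - width + 1):
        s = bias + sum(position_weights[k].get(tokens[i + k], 0.0) for k in range(width))
        responses.append(max(0.0, s))  # ReLU
    return max(responses, default=0.0)  # strongest match anywhere in the utterance


def classify_intent(utterance, filters=FILTERS):
    tokens = tokenize(utterance)
    scores = {intent: conv1d_max_pool(tokens, weights, bias)
              for intent, (weights, bias) in filters.items()}
    return max(scores, key=scores.get)


print(classify_intent("Transfer money from my checking account to John Smith"))  # → money_transfer
print(classify_intent("Could you please send money to John Smith"))              # → money_transfer
print(classify_intent("What is my balance?"))                                    # → check_balance
```

Because of the max-over-time pooling, the phrase "transfer money" or "send money" produces the same score wherever it appears in the request.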

 

Neighbors in the brain – and in the field

Why are CNNs successful at these tasks? A rather straightforward explanation is that these tasks simply share characteristics with image processing: they are all of the 'find something small in something bigger, and we don't know where it might be' type. But there may be another, somewhat more interesting explanation: the fact that CNNs designed for visual tasks also work for speech-related tasks may reflect that the brain uses very similar processes for both visual and audio/speech stimuli.

Consider phenomena like synesthesia, or the "stimulation of one sensory or cognitive pathway lead[ing] to automatic, involuntary experiences in a second sensory or cognitive pathway." For example, audio or speech stimuli can trigger a visual experience. (I have a mild version of this: for me, each day of the week, or rather the word describing the day, has a distinct color. Monday is dark red, Tuesday grey, Wednesday a darker grey, Thursday a lighter red, and so on.) This is interpreted as an indication that audio/speech processing and visual processing must somehow be "neighbors" in the brain. Similarly, it has been shown that brain areas normally devoted to processing audio signals and speech can be used for visual tasks: people born with hearing impairments can repurpose the audio/speech areas of their brains to process sign language. This suggests that the brain cells (neurons) processing visual and audio signals are organized in a very similar way.

There is also a very practical consequence of the similarity between visual and audio/speech and language processing. We have found that Graphics Processing Units (GPUs), which were developed for computer graphics (the visual channel), can also be used to speed up machine learning tasks for speech and language. The reason is that, once again, the tasks are similar in nature: applying relatively simple mathematical operations to lots of data points in parallel. So you could say that new developments in computer gaming helped make the training of Deep Neural Nets feasible.
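The parallelism argument can be illustrated with a one-dimensional convolution: each output value depends only on the input, never on another output, so the positions can be handed to independent workers in any order. The sketch below uses Python threads purely to show that independence (in standard CPython they won't actually speed up this arithmetic), and the signal, kernel and function names are illustrative assumptions; a GPU exploits the same property by running thousands of these small multiply-adds at once.

```python
from concurrent.futures import ThreadPoolExecutor


def conv1d_at(signal, kernel, i):
    """One output value; it reads only the input, never another output."""
    return sum(signal[i + k] * kernel[k] for k in range(len(kernel)))


def conv1d_sequential(signal, kernel):
    return [conv1d_at(signal, kernel, i) for i in range(len(signal) - len(kernel) + 1)]


def conv1d_parallel(signal, kernel, workers=4):
    """Same result, with output positions farmed out to independent workers."""
    positions = range(len(signal) - len(kernel) + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever order they finish in.
        return list(pool.map(lambda i: conv1d_at(signal, kernel, i), positions))


signal = [float(x % 7) for x in range(1000)]
kernel = [0.25, 0.5, 0.25]

print(conv1d_parallel(signal, kernel) == conv1d_sequential(signal, kernel))  # → True
```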

Clearly, there is no way to be an isolationist in this field. Just as my colleagues who work in handwriting recognition study Convolutional Neural Networks, so do those of us working to advance Natural Language Understanding. We are as much generalists as we are specialists. Applying ideas from one area to improve a process or technology in another is the nature of our work. But that shouldn't be a surprise, right? After all, the human brain seems to work in a very similar way when it processes visual and audio stimuli. So it's not far-fetched to believe that CNNs, originally designed for vision, will ultimately help machines listen to and better understand us, something that's crucial as we move forward into this new era of human-machine interaction.
