Nuance recently concluded its Nuance Research Conference (“NRC”) in Montreal, where the company’s diverse research team from all over the world comes together for four days to dig deep into the next generation of voice and touch innovation.
If this year’s conference had a theme, it was “Inspired by Humans.”
The event began with a scientific keynote by world-famous Deep Learning expert Yoshua Bengio of the University of Montreal, a short walk from the conference venue. Professor Bengio spoke about “Deep Learning with Attention” and the (Deep) “Neural Net” approach in general. These nets are a particular form of machine learning and consist of layers of heavily interconnected simple processors (“neurons”). Their structure is based on the way real biological neurons appear to be organized in the human brain, and their application over the past few years has led to impressive gains in speech and image recognition, as well as meaning extraction.
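As a rough sketch of that layered structure, here is a toy forward pass through a few layers of such “neurons” in NumPy. The layer sizes and random weights are arbitrary and purely illustrative; this is not any model discussed at the conference:

```python
import numpy as np

def layer(x, W, b):
    """One layer of 'neurons': a weighted sum of the inputs, then a nonlinearity."""
    return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
# A tiny net: 4 inputs -> 5 hidden -> 5 hidden -> 2 outputs
sizes = [4, 5, 5, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

x = rng.standard_normal(4)
for W, b in zip(weights, biases):
    x = layer(x, W, b)   # each layer transforms the previous layer's output
print(x.shape)           # (2,)
```

Real systems stack many more such layers and learn the weights from data, but the “layers of interconnected simple processors” picture is exactly this.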
Prof. Bengio focused on a particular innovation he called “learning with attention,” inspired by our ability to select only information that’s relevant to solving a particular problem (like understanding an utterance or solving a mathematical problem), as opposed to always considering all background knowledge, along with every bit of contextual information (relevant or not). Instead, Prof. Bengio discussed the importance of allowing the deep networks to learn how to focus on only what’s relevant, along the lines of humans’ “attention” guiding them through problem solving.
Neural Nets continue to grow in complexity and in the scope of information they process, and the new concept of “attention” helps these more sophisticated nets learn to focus, at each processing step, on the relevant subset of information.
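The core idea can be illustrated with a toy soft-attention step: score each piece of context against a query, turn the scores into weights that sum to one, and summarize the context as a weighted average. This is a minimal sketch of the general mechanism, with hand-crafted vectors, not Prof. Bengio’s model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())       # subtract max for numerical stability
    return e / e.sum()

def attend(query, items):
    """Score each item against the query, convert scores to weights
    summing to one, and return a summary emphasizing relevant items."""
    scores = items @ query        # one relevance score per item
    weights = softmax(scores)
    summary = weights @ items     # weighted average of the items
    return weights, summary

items = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([0.0, 5.0, 0.0])   # strongly matches the second item
weights, summary = attend(query, items)
print(weights.round(3))             # attention concentrates on the second item
```

Because the weights are produced by differentiable operations, a network can learn *where* to look as part of ordinary training, which is the point of “learning with attention.”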
Many of the 115 posters and 27 talks presented by Nuance researchers touched on Deep Neural Nets: how to apply them to ASR, TTS, and Voice Biometrics; how to combine them with adaptation techniques for personalization; how to handle multilingual settings; how to speed up training with innovative hardware; and many more questions. Conversations with participants made it clear that this was the hottest topic of the conference.
The “inspired by humans” theme returned in another scientific keynote. Tiago Falk from INRS (also in Montreal) spoke about “modulation spectrum processing of reverberant and dereverberated speech.” While the topic sounds complex, the idea behind it is quite simple: Tiago’s research starts from an understanding of how humans process audio (especially speech) and applies it to analyzing and enhancing speech signals. Or, as he eloquently put it: building “anthropomorphic human-machine interfaces.” His talk and demos sparked a lively follow-up discussion with the “speech enhancement” community at Nuance (who, of course, also presented results of their own during the conference, such as how the multiple microphones on newer smartphones can be exploited to make speech systems more robust to background noise), as well as a visit to Tiago’s lab right across the street.
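One classic way to exploit multiple microphones is delay-and-sum beamforming: align the microphone signals so the target speech adds up coherently while uncorrelated noise averages out. The sketch below is a generic textbook illustration with made-up signals and integer-sample delays, not Nuance’s enhancement pipeline:

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Undo each microphone's arrival delay, then average the signals.
    Speech from the target direction adds coherently; uncorrelated
    noise is attenuated roughly by the number of microphones."""
    aligned = [np.roll(s, -d) for s, d in zip(signals, delays)]
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(0)
n = 1000
speech = np.sin(2 * np.pi * 5 * np.arange(n) / n)   # stand-in "speech" signal
delays = [0, 3, 7]                                  # per-microphone arrival delays
mics = [np.roll(speech, d) + 0.5 * rng.standard_normal(n) for d in delays]

enhanced = delay_and_sum(mics, delays)
# Averaging suppresses the uncorrelated noise across microphones
print(np.var(enhanced - speech) < np.var(mics[0] - speech))
```

Real systems estimate the delays adaptively and use far more sophisticated filtering, but this is the basic reason a second microphone helps against background noise.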
Speaking of Neural Nets: when we say that Neural Nets or other Machine Learning techniques are inspired by humans, we don’t claim that they are an exact model of how the brain works, or that they could ever replace human brains.
Deep Neural Nets and Machine Learning are incredibly hot topics right now when it comes to all things AI – and AI itself has been the center of deep discussion about what it will bring in the future. There are some bold claims out there that machines will take over from us humans, or even rule over us, in the future. We have a more balanced view. Terms like “Neural Nets” are metaphors that inspire us to find ideas in human behavior that we can apply to technology in order to assist humans. A great deal of learning still needs to be done to keep up with complex human communication, thinking, and behavior (and the ease with which humans do it).
Take anaphora as an example. Anaphora (a topic of several presentations at this year’s NRC, along with other advanced linguistic topics) is a way of “pointing” back (another metaphor) to something somebody said earlier. Consider an intelligent assistant in the car environment: the driver asks about a restaurant, then about a charging station, and then asks a follow-up question using “they.” How do we, as humans, know (without thinking much) that “they” refers to the restaurant and not the charging station? Obviously, a computer needs world knowledge, deep linguistic analysis, and dialog modeling in order to mimic this.
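To see why world knowledge matters here, consider a deliberately naive sketch that resolves the pronoun by checking which recently mentioned entity is compatible with the property being asked about. The tiny knowledge table and property names below are entirely hand-coded for illustration and are not how a production system works:

```python
# Hand-coded "world knowledge": which properties each entity type can have.
WORLD_KNOWLEDGE = {
    "restaurant": {"serves_food", "takes_reservations"},
    "charging station": {"charges_cars"},
}

def resolve_pronoun(candidates, asked_property):
    """Return the recently mentioned entities for which the question makes sense."""
    return [c for c in candidates
            if asked_property in WORLD_KNOWLEDGE.get(c, set())]

# Dialog: the driver asked about a restaurant and a charging station,
# then follows up with a question like "Do they take reservations?"
mentions = ["restaurant", "charging station"]
print(resolve_pronoun(mentions, "takes_reservations"))  # ['restaurant']
```

Even this toy shows the shape of the problem: the resolver is only as good as its knowledge of what restaurants and charging stations *are*, which is why real anaphora resolution needs broad world knowledge plus linguistic and dialog context, not a lookup table.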
Another hot topic at NRC this year was how we as researchers and innovators can make technology easier for humans to use. Speech recognition, Natural Language Understanding, AI, and related technologies are all relatively complex and require deep know-how to write grammars, design dialogs, train and tune models, and so on. We need to empower the broader ecosystem to make these technologies easier and more accessible to a wider community. We see these capabilities emerging as part of our developer tools and programs here at Nuance, with a mission to arm the world’s dreamers and innovators with the ability to easily transform the way we engage with ‘Things,’ systems, cars, apps and much more. Looking back at NRC 2015, I am confident we’ll get there.