Isn’t innovation, the root of which is the Latin word “nova,” meaning “new,” always about new things? Well, it turns out there are (at least) two types of innovation: one that you can see and one that you cannot see, or at least not easily. The former involves creating entirely new products – or at least new features of products – while the latter finds new ways to make existing products or features simply work better (so at first sight there seems to be no innovation at all). Speech recognition accuracy is a classic example of the latter; a more accurate Automatic Speech Recognition (ASR) application doesn’t look different at first sight, but you experience the difference once you’re using it. Of course the two types of innovation are not completely independent; often new features or products only become feasible because the underlying technology has grown more powerful and reliable. Intelligent assistants running on mobile phones, used in noisy places and reacting to complex commands from their users, just wouldn’t have been possible on top of ASR as it was 10 years ago. This is the second reason why innovation isn’t always new: it may have been conceived decades ago but has only now come to fruition.
Of course, the unseen innovation behind better performance of old features is as important as the easy-to-see one, and it is just as much innovation in itself. You don’t bring down error rates by 20% (or more) in subsequent versions of Dragon NaturallySpeaking without a lot of ideas about how to make ASR engines better. And what’s more, finding a way to make an “old” innovation finally work can also be a truly innovative act.
Some of these more refined innovations were on display a few weeks ago when Nuance held its annual Research Conference. Now in its 7th installment, this year’s conference drew its biggest attendance ever: more than 250 researchers from Nuance’s labs in the US, Canada, Belgium, France, the UK, and China, among others, along with a number of delegates from Nuance’s research partner, IBM, joined for four interactive days packed with content.
The opening keynote, “Big Knowledge, not just Big Data,” by Peter Patel-Schneider, the well-known semantic web expert, introduced the conference’s recurring theme, which was, appropriately, the semantic web. As you may already know, ontologies and semantic nets are not new concepts. They have been around for thousands of years (if you count that Plato defined a human being as a “featherless biped” and Aristotle declared such definitions by genus and differentia to be the best way to define concepts), and in the Artificial Intelligence (AI) community they have been discussed as a means to capture knowledge on a computer since at least the 1960s. The problem back then, and until quite recently, has been the following: constructing and reasoning on a toy example is easy – a car IS-A vehicle, vehicles are USED-TO move, ergo a car is used to move something (or someone). This example is helpful, but to create something genuinely useful you need to capture not only millions of concepts but also the relations between them. Assembling these manually is possible (it has been done), but it is very expensive and takes significant amounts of time. And once these concepts are organized, performing reasoning on them becomes quite tricky, as there are a multitude of potential conclusions you could draw from all of these facts. The importance of a system being able to make sense of complex sets of data is underscored by the following example: Say you have plans to meet a friend for dinner tonight but you haven’t yet decided where to go. Naturally, you reach for your phone and ask to see which restaurants are nearby and accepting reservations. It’s 1:00 pm already (dinner is only five hours away), so you need the information quickly. Who needs an intelligent assistant that broods for hours over an answer to your question? The answer, of course, is no one. We expect information to be readily available to suit our needs, so technology must be able to satisfy these needs.
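The toy car/vehicle example above can be made concrete in a few lines. This is only an illustrative sketch of IS-A inheritance over a hand-built semantic net – the relation names and the inference rule are mine, not from any Nuance or Patel-Schneider system:

```python
# A tiny semantic net stored as (subject, relation, object) triples,
# with one inference rule: USED-TO properties inherit along IS-A links.
facts = {
    ("car", "IS-A", "vehicle"),
    ("vehicle", "USED-TO", "move"),
}

def uses(concept):
    """Return everything `concept` is USED-TO do, following IS-A links upward."""
    results, seen, frontier = set(), set(), [concept]
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        for subj, rel, obj in facts:
            if subj != node:
                continue
            if rel == "USED-TO":
                results.add(obj)
            elif rel == "IS-A":
                frontier.append(obj)  # inherit from the parent concept

    return results

print(uses("car"))  # a car inherits "move" from "vehicle"
```

The hard part, as the paragraph above notes, is not this reasoning step but scaling the `facts` set from two hand-written triples to millions of automatically acquired ones.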
Following Patel-Schneider’s keynote, knowledge acquisition and learning from data were the topic of several of the more than 30 talks and 120 posters. Only recently has the idea of ontologies and semantic nets begun to have a true impact on commercial systems. The availability of huge data collections, techniques to automatically learn and extract knowledge from these collections, crowd-sourcing initiatives like Wikipedia, new ways of storing facts and doing inference, and modern hardware with its cheap memory and impressive computing power all had to come together to make this happen. One field where this is beginning to come to fruition is healthcare, particularly with Clinical Language Understanding (CLU). Combining ontologies with machine-learning methods to extract facts from hundreds of thousands of documents and to learn processing rules leads to systems that aid humans in tasks like populating Electronic Health Records (EHRs) or assigning billing codes. These innovations meet a huge demand driven by factors such as the introduction of ICD-10 in the healthcare arena, as Nuance’s CMIO Reid Coleman, M.D., explained in another keynote address.
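To give a flavor of the extraction idea, here is a deliberately oversimplified sketch: a small ontology maps surface terms to canonical concepts, and extracted concepts are looked up in a code table. The terms, concepts, and codes below are all placeholders of my own (not real ICD-10 codes), and real CLU systems use machine-learned extractors rather than string matching:

```python
# Map surface phrases to canonical medical concepts (a toy "ontology").
synonyms = {
    "heart attack": "myocardial_infarction",
    "myocardial infarction": "myocardial_infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}

# Placeholder billing codes -- illustrative only, not real ICD-10.
billing_codes = {
    "myocardial_infarction": "CODE-001",
    "hypertension": "CODE-002",
}

def extract_codes(note):
    """Suggest billing codes for a clinical note via naive phrase matching."""
    text = note.lower()
    concepts = {concept for term, concept in synonyms.items() if term in text}
    return sorted(billing_codes[c] for c in concepts)

print(extract_codes("Patient has a history of high blood pressure."))
```

The point of the sketch is the division of labor: the ontology normalizes many surface forms to one concept, so downstream steps (code assignment, EHR fields) only have to deal with canonical concepts.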
Soon, even more leisurely activities like watching TV will profit from the concept of “Big Knowledge.” At the conference we created a digital living room where we demoed emerging consumer technologies and then discussed them in follow-up sessions. For example, we demonstrated how the next generation of the Dragon TV application will be able to answer queries like, “Show me movies about spies” in a smart way, even if “spies” are not mentioned in a film’s description, as Adwait Ratnaparkhi explained in his accompanying talk.
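One way such a query can match films whose descriptions never contain the query word is concept-based expansion: relate “spies” to neighboring concepts before searching. This is a minimal sketch of that general idea, with invented titles, descriptions, and relations – not the actual Dragon TV approach:

```python
# Toy concept expansion: a query term maps to a set of related terms.
related = {
    "spies": {"spy", "espionage", "secret agent"},
}

movies = [
    {"title": "Film A", "description": "a secret agent races to stop a plot"},
    {"title": "Film B", "description": "a family reunites for the holidays"},
]

def search(term):
    """Return titles whose description mentions the term or a related concept."""
    terms = {term} | related.get(term, set())
    return [m["title"] for m in movies
            if any(t in m["description"] for t in terms)]

print(search("spies"))  # Film A matches via "secret agent"
```

With a real knowledge base, the `related` table would be derived from an ontology (spy IS-A secret agent, espionage INVOLVES spies, and so on) rather than written by hand.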
Another striking example of an innovation which is not only “hidden” but also based on a rather old concept is Deep Belief Networks, which have revolutionized the architecture of ASR (and related systems) over the last few years. This topic was highlighted at the conference by a number of talks and posters on everything from acoustic and language modeling to text-to-speech (TTS) and voice biometrics. As the more recent term “Deep Neural Networks” (DNNs) already indicates, this is the renaissance of the Neural Networks (NNs) of the 1990s. Back then these networks were hugely appealing to AI researchers because, modeled after the human brain, they seemed to have explanatory and modeling power superior to the Hidden Markov Model/Gaussian Mixture Model (HMM/GMM) combination that had long dominated acoustic modeling in ASR. But it turned out that when applied to “real problems,” training methods for NNs were either prone to getting stuck in local minima or too computationally expensive to be of practical value. And so HMMs/GMMs continued to rule, bringing error rates down and accuracy up through many smaller optimizations and (yes) innovations within that framework. Again, the combination of several factors – modern, more powerful hardware including GPUs (chips originally developed for graphics processing, think: computer games) and new ideas about how to train NNs – changed this picture radically over the last few years. Once researchers knew how to make them work, “deep” NNs brought error rates down by 25% or even more with just this one (at its core, rather old) innovation. After this first revolutionary wave swept over ASR and its relatives TTS and voice biometrics a couple of years ago, it was followed again by a phase in which many smaller innovations saw the light of day, now within the framework of DNNs.
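What makes such a network “deep” is simply the stacking of several nonlinear layers. The following is a bare-bones illustration in pure Python with arbitrary hand-picked weights – a real acoustic model has millions of trained parameters and runs on GPUs, as described above:

```python
import math

def relu(x):
    """Elementwise rectified linear unit, the common DNN nonlinearity."""
    return [max(0.0, v) for v in x]

def layer(x, weights, bias):
    """One fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(x):
    """Turn raw scores into probabilities (e.g. over phone classes)."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x):
    # Two hidden layers followed by a softmax output; the stacking of
    # nonlinear layers is exactly what the word "deep" refers to.
    h1 = relu(layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]))
    h2 = relu(layer(h1, [[0.3, 0.7], [-0.2, 0.4]], [0.0, 0.0]))
    return softmax(layer(h2, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))

probs = forward([0.2, 0.8])
print(probs)  # output probabilities sum to 1
```

The training breakthroughs mentioned above are precisely about finding good values for those weight matrices without getting stuck in poor local minima.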
It is probably no coincidence that the examples above involve concepts from the fields of AI and Computational Linguistics which are now being combined with data-driven, mathematical models. After the two functioned in very separate worlds for a long time, it turns out that purely statistical methods can profit from absorbing “new” ideas and concepts from AI, while AI in turn finally matures into something useful by drawing on huge data sets, machine learning, and the experimental rigor of its mathematical cousin. This same shift is also reflected at Nuance: after being known as a “Speech” company for many years, dominated by the statistical models powering ASR, the company has now also absorbed the “other side” by hiring a team of AI and NLU researchers including, in addition to Peter Patel-Schneider, well-known researchers such as Charlie Ortiz and Ron Kaplan. And as another sign of this shift, Nuance has just acquired a share in DFKI, the largest AI research lab in the world.
And of course, delegates from both of these worlds alike could be found at the conference. Given the size of Nuance and its product portfolio, and the many ways in which techniques developed for one product may be combined with ideas learned for a quite different one, it is hard to imagine how 200+ researchers – provided with a thoughtful program of talks by their peers and enough time to chat in the hallways of the conference – could fail to come up with a lot of “innovation” right on the spot, even if initially of the more “hidden” form.
It should be emphasized, though, that as we continue to progress, our successes are in large part a reflection of our dedicated research team’s efforts – a team that we enthusiastically invite you to become a part of as well. After all, the world’s greatest inventions came to fruition through collaboration – whether by uniting theories, data sets, or individual skill sets is up to you. Search research openings at Nuance and apply now for a chance to be part of the research team that is reinventing the relationship between people and technology. Hey, I just might see you at next year’s Nuance Research Conference.