So said PC Magazine Editor-in-Chief Michael Miller in 1997, when Dragon Systems (now Nuance) released Dragon NaturallySpeaking at the World Trade Center in New York City. The technology landscape looked quite different in 1997. Steve Jobs returned to Apple as interim CEO, and Windows 95 – the first “modern” Windows – was still gaining market share. The IEEE released the “802.11” standard – today known as “Wi-Fi” – to wirelessly connect computers to the still-nascent World Wide Web. Productivity-minded professionals mastered the Palm Pilot’s “Graffiti” handwriting-recognition shorthand as Personal Digital Assistants reigned, five years before the first BlackBerry smartphone. The “dancing baby” GIF was the internet’s first viral sensation. Mark Zuckerberg was not yet at Harvard, and two Stanford dropouts – Larry Page and Sergey Brin – renamed their “BackRub” search engine by registering the google.com domain name!
It was against this backdrop that Dragon Systems made its own mark by bringing to the general population the first speech recognition program that understood natural, continuous human speech and converted it to text. It was, of course, called “Dragon NaturallySpeaking,” and it kicked off a revolution in how people, and eventually entire industries, interact with computers and boost their documentation productivity.
The concept of computers understanding human speech was not new (“the computer” in Star Trek remains a popular cultural marker). Early speech recognition systems had significant limitations: they worked with a limited, predefined vocabulary, and users had to speak words individually in a halting, staccato fashion. Recognition accuracy was often mediocre, and transcription was frequently not much faster than traditional typing. Dragon NaturallySpeaking, while “not perfect” at the time, represented a sea change in addressing these earlier shortcomings. It was the first software to make speech recognition practical for business services professionals, students, authors, bloggers, persons with physical or cognitive disabilities, and busy parents looking to reclaim time in their hectic lives.
Today, we take for granted the ease of speaking to computers and having them understand our intent, transcribing our voice – the most natural “input” mechanism around – into accurate text. But how did we progress from the earliest speech recognition systems, to the “big step forward but not perfect” Dragon NaturallySpeaking of 1997, to the speech recognition systems we enjoy today, like Nuance Dragon Professional Anywhere and Nuance Dragon Medical One, that are fast, up to 99% accurate, and available for multiple industry verticals like Healthcare, Law Enforcement, and Legal? How did the magical ability to convert our voice to text become available on the smartphones we carry with us everywhere? What allowed speech recognition to effectively accommodate people with accents, and become available in world languages like German, Spanish, French, Swedish, and Italian? The answer is perhaps best captured in two words: technological convergence.
Modern speech recognition is, at heart, a statistical numbers game, turbocharged by technological convergence. When a person’s voice is digitally captured, software matches those sounds with word sequences. An acoustic model compares the voice to vast digital libraries of phonemes (the smallest units of sound that make up spoken words), while a language model provides context (distinguishing between words that sound the same, like “whether” and “weather”). The result is voice transcribed into text. In the past ten years, the key technologies that made this practical and economically viable for industry have matured. Massive computing power (often delivered via cloud-hosting platforms like Microsoft Azure), vast acoustic libraries, highly sophisticated algorithms (driven by breathtaking advances in machine learning and AI), faster and ever more powerful hardware (the latest smartphone), and the ubiquitous availability of high-speed mobile connectivity (today’s rollout of 5G networks) have collectively delivered the miracle of affordable speech recognition that we know today. In fact, while the phrase “Artificial Intelligence” is seemingly everywhere today, it was speech recognition – a technology with roots dating back to the 1970s – that became one of the first, and perhaps largest, beneficiaries of its profound advances.
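The interplay between the acoustic model and the language model described above can be sketched in a few lines. This is a toy illustration only: all the probabilities below are invented for the example, and a real recognizer scores phoneme sequences over millions of parameters rather than a handful of hand-picked words. The principle, though, is the same: pick the words that maximize the acoustic score times the language-model score.

```python
# Toy sketch of the statistical core of speech recognition:
# choose the word W maximizing P(audio | W) * P(W | context).
# All probabilities here are invented for illustration.

# Acoustic model: how well each candidate matches the captured sound.
# Homophones ("weather" / "whether") score identically on sound alone.
acoustic_score = {"weather": 0.5, "whether": 0.5}

# Language model: bigram probabilities P(word | previous word) supply
# the context that breaks the tie between homophones.
bigram_prob = {
    ("the", "weather"): 0.08,
    ("the", "whether"): 0.0001,
    ("wonder", "whether"): 0.05,
    ("wonder", "weather"): 0.0005,
}

def decode(prev_word, candidates):
    """Return the candidate with the highest combined score."""
    return max(
        candidates,
        key=lambda w: acoustic_score[w] * bigram_prob.get((prev_word, w), 1e-9),
    )

print(decode("the", ["weather", "whether"]))     # context: "the ..."
print(decode("wonder", ["weather", "whether"]))  # context: "wonder ..."
```

On sound alone the two candidates are indistinguishable; the language model tips the decision toward “weather” after “the” and toward “whether” after “wonder,” which is exactly the disambiguation role described above.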
For the past 25 years, Nuance has held tight to the vision of humanizing computers and making them natural to use. This legacy is visible today in the millions of clinicians who use Nuance Dragon Medical One to capture the patient story with accuracy and empathy, while reducing clinician burnout. It is seen in the thousands of law enforcement professionals who use Nuance Dragon Law Enforcement to stay safe, “heads up,” and situationally aware while capturing police reports three times faster than typing in their patrol vehicles. It is visible in caring professions, like social workers quickly capturing client notes and insights in the mobile work settings that comprise their “office.” It is changing lives for those with physical or cognitive disabilities, as Nuance does its part to further the goals of the disability inclusion movement. Finally, the vision is taking shape today as Nuance – now a Microsoft Company – advances ambient computing as the next frontier in AI, generating “clinical documentation that writes itself” within the healthcare sector.
Happy 25th Birthday, Dragon!
We cannot wait to see what you will do in the next twenty-five years! And, you know what? Maybe “the computer” in Star Trek isn’t so far away after all! (Hint: stick around to the 1:50 mark!)