It is remarkable how widely speech can now be used to control things in our everyday lives. Back in the 1950s, when it was first demonstrated that a machine could recognise something said to it, it was hardly credible that by now we would be able to stand in our own homes and ask a device to tell us what’s on at the local cinema, or whether it will rain today. But we can, and that’s only the beginning.
I would not go so far as to say that we are going to see the end of the keyboard for interacting with computers, at least not anytime soon, but it will become much less important as time goes on, for a number of reasons.
A growing number of devices don’t cater for keyboards – and shouldn’t. Smart devices in our homes, from doorbells to coffee makers, fridges to smart speakers, would be less useful if we needed a keyboard to operate them. They need first-class speech recognition to allow for seamless use.
There is also the fact that what we used to think of as a computer has changed dramatically. Not that long ago, the primary computer at home and at work was a desktop computer, its separate components sitting in a large case, and connected by wires to a screen, keyboard, mouse, speakers, and maybe other ‘peripheral’ devices as well. Now laptops are the norm, and wireless connections are prevalent.
In addition, we carry smartphones and use tablets, and both are perfectly capable of a wide range of office productivity functions – and both can respond to voice, including transcribing speech into text. Nuance recognises this with our own Dragon Anywhere app. And there are computers everywhere we look. They’re in our washing machines, our watches, and our cars, and in use every time we pay for something, whether by cash, card or funds transfer.
At least as important as these factors, and probably more important in terms of guiding the future direction of speech recognition, are our expectations. There is now a whole generation of people who have grown up in a world where the internet, computers and mobile phones are the norm. They can’t remember a time before email.
The ‘digital native’ ‘Generation Y’, those born roughly between 1980 and 2000 (date estimates vary), and Generation Z, born since the late 1990s, are the ones shaping future expectations of speech technology. They don’t expect to faff about with a keyboard to find out if it will rain tomorrow – they just want to ask the question and hear the answer.
Speech recognition technology, as it stands today, can meet only some of these expectations. Understanding everyday speech is hard, because language is complex, but speech technology needs to be able to hold useful conversations with people, and it is progressing towards that goal at speed.
Dragon’s deep learning capabilities mean it is better than ever at turning the spoken word into written text – and getting the meaning and context right. The next challenge is to develop the technology so that it can understand everyday speech, react to it, and hold a meaningful conversation. That’s not just about keeping digital natives and Generation Z happy; it’s about the future of work, leisure and life for all of us.