The science behind creating next-generation synthetic voices is an elusive world made up of highly trained speech linguists, voice-over actors reciting text in booths, and fancy technology paired with trade secrets that are, quite frankly, so secret and complex that they fall outside my immediate scope of knowledge. But what I do know is that these voices – whether synthetic or real – lead everyday people like you and me to form emotional attachments to them, and they elicit a range of physiological reactions.
Take the sultry, smooth timbre of Scarlett Johansson’s voice as Samantha, an intelligent operating system in the movie Her. Sure, you never actually saw the actress in the film, but, nonetheless, you instinctively knew her voice and she was able to make you believe that she was somehow ‘present.’ The iconic voice of Morgan Freeman has been paired with documentaries, making the issues unfolding on-screen appear that much more striking and significant. When he speaks, he doesn’t ask for our attention; he commands it. And hey, that’s probably why he’s made it to the semi-final round of our Greatest Voice tournament.
Yet, while we know of the profound impact that voices have on us and the many roles that they serve, there is still controversy over the technology used to generate the synthetic voices that power the rapidly evolving landscape of conversational personal assistants. Well, there is from at least one party: the voice-over actors. When companies can one day design a voice of their choosing to reflect their brand’s characteristics, or even when Joe Schmo wants to create a voice for his robot girlfriend (no judgments), what role will voice actors play? For now at least, a pretty big one.
“Today’s talking phones and cars are almost human sounding. That’s because they are human. Or at least, they once were.” – The Verge
Besides lending their voices to animated family movies, radio product endorsements, and trailers for the next blockbuster hit, voice actors are really the foundation upon which we build these futuristic voices. It’s one thing to simulate a voice, but to make it believable – to make it represent a humanlike entity that you want to converse with in the car and over tea – is another entirely. It involves countless hours of voice actors reading scripts composed of nonsensical strings of words and phrases, strategically crafted to be as phonetically rich as possible. Then, once these words and phrases are catalogued, a team of linguists is able to attach meaning to the various data sets and eventually use a text-to-speech engine to generate speech that sounds scarily close to that of a voice actor, because, well, it was technically a voice actor’s expressivity that helped shape it. It’s a long and arduous yet wildly rewarding process.
I have no doubt that the voices of Walt Disney as Mickey Mouse, Seth MacFarlane as Peter Griffin, and James Earl Jones as Darth Vader will live on forever; I wouldn’t even begin to make the case against it. But for now, we’re focused on something we find much more exciting: creating the voice that you hear coming out of your smartphone, PC, fridge, or any other connected device. And with the help of both voice actors and refined linguistic technology, we are able to realize the once seemingly impossible feat of giving computers a voice.