Innovating machine dialog: Brush up on your Greek and read Aristotle

The ancient Greeks discovered rhetorical devices that are now common in everyday language, something speech systems must be specifically designed to accommodate.

Intricacies in our speech are inherent in our everyday lives; they are acquired over time and ultimately become natural ways for us to interact with the people – and things – in the world around us. Speaking to devices the same way we speak to each other and having them understand us, though, is not second nature.

To illustrate this, take these three words: paraphrase, anaphora and ellipsis. Not by accident do all three words have a Greek root (παράφρασις, meaning “additional manner of expression,” ἀναφέρειν or “to refer to something,” and ἔλλειψις “omission,” respectively). The reason is that these concepts were all discovered by the Ancient Greeks as part of their theory of Rhetoric, prominently in a book by Aristotle of the same name. As one of the “seven liberal arts,” Rhetoric, the art of speaking, made it into the canon of medieval schools. As such, you may have learned about rhetorical devices such as hyperbole, metaphor, and many others at grammar school. However, many people think of these devices as something special, used only as “ornaments” in highly elaborate speech by skilled speakers (something I explored further in this post).

Linguists and communication researchers in the 20th century came to a quite different conclusion: any communication, not just literature but also your everyday dialog on the street, is full of these rhetorical devices (albeit with slightly changed definitions, e.g. a linguistic anaphor is a bit different from a rhetorical anaphor); without them, communication is not really possible. As human speakers we use these devices quite automatically, and as listeners we decode them without really thinking about it. Let’s have a look at a snippet from The Big Bang Theory (from Series 5 Episode 11 – The Speckerman Recurrence).

Leonard: I don’t think something like that’s [glasses that let you see 2-D movies in 3-D] even possible.

Jimmy: Aw, come on, you can figure it out. You’re like the smartest guy I’ve ever known.

Sheldon: The smartest? All right, you know, I may not have a firm grasp on sarcasm, but even I know that was a doozy. Leonard, you can’t live in fear of this man forever.

Leonard: Sheldon, I got this.

Sheldon: You clearly don’t. What my spineless friend lacks the courage to say is you’re a terrible person who took advantage of his tiny size, his uncoordinated nature and his congenital lack of masculinity.

Even Sheldon’s speech – despite his general inability to detect irony and sarcasm, as he alludes to here – uses lots of rhetorical devices. For example, his description of Leonard as his “spineless friend” is clearly a metaphor (or so we hope for Leonard’s sake), and his “firm grasp on sarcasm” is another, because sarcasm isn’t a thing you can literally grasp (let alone firmly). His “tiny size” for Leonard, whose size is just a little under average, is a hyperbole, and so is Jimmy’s “the smartest guy I’ve ever known.” Sheldon anaphorically refers back to Jimmy’s preceding statement in his “that was a doozy;” there are lots of other anaphoric uses of pronouns like “it,” “that,” “this.” And the phenomenon of ellipsis is clearly at work in “You clearly don’t [get this],” etc.

Now imagine how our mobile and consumer devices might respond to this myriad of linguistic devices. They are core to human language, and we will often approach our conversations with voice-enabled virtual assistants in the same way. And while voice and NLU are already good enough that some systems understand these aspects of our language, much research is underway here at Nuance to drive greater intelligence, machine learning, and understanding.

For some rhetorical devices, processing can be rather simple. While it is intriguing to think about a computational model of how metaphor works (and such attempts have been made), for all practical purposes you are already done once you “lexicalize” the metaphorical meaning of words and phrases. That is, at system-building time you store a phrase like “to lend somebody a hand” in the system’s lexicon with the meaning “to help” as its definition, rather than letting the system figure it out at runtime (or fail to do so and treat the phrase literally). For other devices, this simple processing is not possible. That’s when it gets really interesting.
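The lexicalization idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular speech system does it: the `IDIOM_LEXICON` table, its entries, and the `lexicalize` function are hypothetical names invented for this example, and a real lexicon would need to handle inflection and variable slots such as “somebody.”

```python
import re

# Hypothetical lexicon, filled in at system-building time:
# each idiom is stored together with its intended (non-literal) meaning.
IDIOM_LEXICON = {
    "lend me a hand": "help me",
    "tip of the iceberg": "small visible part",
}


def lexicalize(utterance, lexicon=IDIOM_LEXICON):
    """Replace every known idiom in the utterance with its stored meaning."""
    result = utterance
    # Try longer idioms first so a short entry cannot shadow a longer one.
    for idiom in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(idiom) + r"\b", re.IGNORECASE)
        result = pattern.sub(lexicon[idiom], result)
    return result


print(lexicalize("Could you lend me a hand with this?"))
# prints: Could you help me with this?
```

Nothing is “understood” at runtime here: the metaphor is resolved by a plain lookup, which is exactly why this approach works for fixed phrases and does not carry over to devices such as anaphora or ellipsis, whose meaning depends on the surrounding dialog.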

Teaching these systems to listen, derive meaning, and carry out the appropriate actions related to complex requests by a user is rooted in deep learning, natural language processing, and artificial intelligence advancements. Without continued research in these areas, we would be dependent on rigid commands, forgoing the natural – and beautiful – complexities of speech as we know it. A previous post by my colleague Charles Ortiz expanded on this topic, answering the question, “Can machines think?” Relatedly, I explored the evolution of deep machine learning since the 1950s, an area of research that today we have seen contribute to increased accuracy and decreased error rates for speech systems. And that’s just the tip of the iceberg (yes, there had to be time for one last metaphor before moving on).

Be on the lookout for more blog posts in this series in the coming weeks as I explore paraphrase, anaphora, and ellipsis – and also how these rhetorical devices are considered when designing intelligent, interactive systems powered by speech.

Nils Lenke

About Nils Lenke

Nils joined Nuance in 2003, after holding various roles at Philips Speech Processing for nearly a decade. Nils oversees the coordination of various research initiatives and activities across many of Nuance’s business units. He also organizes Nuance’s internal research conferences and coordinates Nuance’s ties to academia and other research partners, most notably IBM. Nils attended the Universities of Bonn, Koblenz, Duisburg and Hagen, where he earned an M.A. in Communication Research, a Diploma in Computer Science, a Ph.D. in Computational Linguistics, and an M.Sc. in Environmental Sciences. Nils can speak six languages, including his mother tongue German, and a little Russian and Mandarin. In his spare time, Nils enjoys hiking and hunting in archives for documents that shed some light on the history of science in the early modern period.