When callers reach a customer service call center, they are not used to speaking naturally to automated speech systems. So we were shocked by our own usability test results when we found that users easily spoke in a conversational manner. In this post, I’ll walk you through why that was.
“It’s literally as easy as you guys could possibly make it”
While preparing for the Nuance Conversation Engine (NCE) usability test, we were concerned this was going to be one of the most challenging test scenarios we had ever run.
The test participants responded to six or more questions, including a complex natural language interaction using NCE. NCE is designed to understand sentences in which callers share several pieces of information in a conversational manner (what we nerds call “multi-slot utterances”), such as “Sell $25,000 from my Large Cap Value Class A and send it to my bank account.” Surprisingly, participants spoke in full sentences averaging 20 words, and one participant said the exchange was “literally as easy as you guys could possibly make it.”
This finding was a huge shock. Nuance’s past research suggests that, in the majority of applications, callers respond with an average of only two or three words at the Call Steering prompt (the open-ended prompt that asks the reason for the call; depending on the caller’s response, they are routed to an agent or a self-service module).
We believe callers’ acceptance of having a natural conversation with an automated system comes down to two things: the words we use to prompt people to speak and the accuracy of the speech recognition.
Let’s talk about prompts and recognition accuracy
To encourage natural conversations, a best practice is to employ personalized, dynamic prompts. Personalized prompts are tailored to each caller’s specific needs, and dynamic prompts include examples that increase the caller’s confidence in responding naturally and selecting the right action.
Let’s go back to our earlier example – “Sell $25,000 from my Large Cap Value Class A and send it to my bank account.” This person would call into the interactive voice response (IVR) system, state that they want to sell shares, and then hear this prompt: “First, tell me which fund you would like to withdraw from. For example, you could say ‘Sell all my shares from Large Cap Value Class A and send me a check’ or you could say ‘Sell 300 dollars from U.S. Government Money Market Fund Shares and send it to my bank account.’ Go ahead.”
From Nuance’s research, we know about 70 percent of investment portfolios contain one or two funds, and about 20 percent contain three. This means callers will most likely hear the funds they want to sell among the dynamic prompt examples (“Large Cap Value Class A” in the example above). Callers also hear the disbursement methods they’re most likely to use (“send it to my bank account” above).
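To make this concrete, here is a minimal sketch of how a dynamic prompt could be assembled from the caller’s own portfolio. The function name, inputs, and wording are purely illustrative assumptions, not Nuance’s actual API:

```python
# Hypothetical sketch: composing a dynamic, personalized prompt whose
# examples use the caller's own funds. Not the actual NCE API.

def build_redemption_prompt(portfolio, disbursement_methods):
    """Build an open-ended prompt with examples drawn from the caller's funds."""
    # Use up to two of the caller's funds so the examples sound familiar.
    fund_a = portfolio[0]
    fund_b = portfolio[1] if len(portfolio) > 1 else portfolio[0]
    method_a, method_b = disbursement_methods[0], disbursement_methods[1]
    return (
        "First, tell me which fund you would like to withdraw from. "
        f"For example, you could say 'Sell all my shares from {fund_a} "
        f"and {method_a}' or you could say 'Sell 300 dollars from "
        f"{fund_b} and {method_b}.' Go ahead."
    )

prompt = build_redemption_prompt(
    ["Large Cap Value Class A", "U.S. Government Money Market Fund Shares"],
    ["send me a check", "send it to my bank account"],
)
print(prompt)
```

Because most portfolios hold only a handful of funds, examples built this way will usually name the very fund the caller intends to sell.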
The dynamic prompting may initially elicit callers’ longer natural language responses, but if the speech recognition engine fails to consistently recognize all the different pieces of information – for example, if the fund name is recognized but not the dollar amount – then the caller’s confidence in the system will decrease over time, and multi-slot natural responses will stop. Therefore, it’s important that the parameters supporting the NCE interaction be specialized for this particular interaction. The parameters for this kind of project will only support the fund redemption functionality (as opposed to also allowing purchases and exchanges) and will only recognize fund names that are within the customer’s portfolio. This ensures high recognition accuracy for all of the caller’s data elements.
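The idea of constraining recognition to the caller’s own portfolio can be sketched as follows. This is an assumed illustration using fuzzy string matching from Python’s standard library, not how NCE’s recognizer actually works internally:

```python
# Hypothetical sketch: resolving a recognized fund name against only the
# caller's own portfolio, so a noisy transcript can never match a fund the
# caller doesn't hold. Illustrative only; not Nuance's implementation.
import difflib

def resolve_fund(transcript, portfolio_funds):
    """Return the portfolio fund best matching the transcript, or None."""
    lookup = {fund.lower(): fund for fund in portfolio_funds}
    matches = difflib.get_close_matches(
        transcript.lower(), list(lookup), n=1, cutoff=0.6
    )
    return lookup[matches[0]] if matches else None

funds = ["Large Cap Value Class A", "U.S. Government Money Market Fund Shares"]
print(resolve_fund("large cap value class a", funds))  # -> Large Cap Value Class A
print(resolve_fund("international bond fund", funds))  # -> None (not in portfolio)
```

Shrinking the set of valid answers is what keeps accuracy high across every slot in the caller’s sentence.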
The caller is in control
You might ask, “Does NCE always require natural multi-slot caller responses?” The answer is no. The caller can speak any or all of the prompted information, in any order. Based on the caller’s response, NCE fills the recognized slots, checks whether any information is missing, and, if so, prompts the caller for that specific piece of information. Using our example, if the caller’s response was “Sell from my Large Cap Value Class A and send the money to my bank” (missing the dollar amount), the next prompt would be, “And how much would you like to sell?”
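The slot-filling loop just described can be sketched in a few lines. The slot names and follow-up prompts here are assumptions for illustration, not the actual NCE interface:

```python
# Hypothetical sketch of the fill-then-prompt-for-missing-slots loop
# described above. Slot names and wording are illustrative, not NCE's API.

REQUIRED_SLOTS = {
    "fund": "Which fund would you like to sell from?",
    "amount": "And how much would you like to sell?",
    "disbursement": "And where should we send the money?",
}

def next_prompt(recognized_slots):
    """Return the follow-up prompt for the first missing slot,
    or None once everything needed for the redemption is collected."""
    for slot, follow_up in REQUIRED_SLOTS.items():
        if slot not in recognized_slots:
            return follow_up
    return None  # all slots filled; proceed with the transaction

# Caller said: "Sell from my Large Cap Value Class A and send it to my bank"
slots = {"fund": "Large Cap Value Class A", "disbursement": "bank account"}
print(next_prompt(slots))  # -> And how much would you like to sell?
```

Whether the caller supplies one slot per turn or all three in a single sentence, the same loop converges on a complete request.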
NCE can make callers feel empowered by putting them in control of the conversation. NCE interactions can be quicker and even more natural than speaking to a human agent. I know that might sound controversial, but consider this: when was the last time you made a multi-slot request to a human agent and completed the task without having to repeat any of the information?
Can machine understanding someday exceed human understanding? Absolutely! From what I saw in our usability test, Nuance Conversation Engine will be an important next step to help us get there.