What’s next.

Continued progress in reinventing how people connect with technology and each other.


Making speech recognizers more robust in the wild

A live adaptation capability is desirable for an ASR system but can be hard to achieve due to computational limitations, since it must operate while the service is up and running. The technique we propose fits both needs thanks to its simplicity and light computational footprint, while being applicable to any feed-forward architecture. In an Internet-of-Things scenario, we achieve word error rate reductions of 8.0% and 12.1% for two popular DNN architectures, the Linear-Augmented DNN and the Residual Network, respectively. These results significantly improve over an i-vector baseline.

Automatic Speech Recognition (ASR) facilitates humans interacting with machines in the most natural way: by speaking to them.  Ideally, we would like to be able to talk with machines without limitation. This implies an ASR system should work well for any user and under any environment. Unfortunately, there are tremendous challenges towards achieving this goal. Today’s state-of-the-art ASR systems are still fragile in the sense that their recognition accuracy may vary substantially under different operating environments. The root cause of this fragile performance is the mismatch between the data used to train the ASR models and the data they operate on in practice, despite attempts to train on data collected from a large number of users across various environments.

One of the most popular approaches to reduce the degradation of an ASR system in a new environment is to adapt the model with data collected in the targeted environment where the system operates. However, this usually requires collecting a certain amount of data in the targeted environment and adapting the model with the data beforehand. Obviously, such an approach is not convenient for users and is not amenable to applications in which environments can change dynamically during operation.

ASR performance has improved significantly in recent years thanks to the application of Deep Neural Network (DNN) technologies to acoustic and language modeling in ASR systems. However, these DNN-based systems still suffer from mismatched data in practice. Therefore, a great deal of research, including the aforementioned model adaptation, has gone into making DNN-based ASR systems more robust in mismatched environments.

In the paper titled “Online Batch Normalization Adaptation for Automatic Speech Recognition” (ASRU 2019, Automatic Speech Recognition session III), we envision a live adaptation (aka online adaptation) procedure which dynamically adjusts the ASR system. Live adaptation is appealing from the user's point of view, because it operates without user-perceived interruption. It is also capable of capturing any dynamic change in the environment during operation. However, it puts strong constraints on the underlying computational cost of the system in order to satisfy the latency requirement of the service. In this work, we present a simple and effective technique suitable for live adaptation to compensate for train-test mismatch in acoustic models. The effectiveness of the method is measured in a domain-mismatched scenario for two state-of-the-art DNN-based acoustic models: the Linear-Augmented DNN (LA-DNN) and Residual Networks (ResNet).

In DNN training, it is common practice to normalize input features to zero mean and unit variance to ease convergence of the training process. But after non-linear transformation layers, the outputs again become un-normalized, which makes training deep DNNs harder. The batch normalization (BN) technique enforces a normalization of the inputs to each layer in a DNN. This is achieved by computing the means and variances of each hidden layer's outputs on a subset of the training data (called a mini-batch), then normalizing the outputs back to zero mean and unit variance.
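In code, this normalize-then-rescale step looks roughly like the following NumPy sketch (illustrative only, not a production implementation; `beta` and `gamma` stand for the learned shift and scale parameters):

```python
import numpy as np

def batch_norm(x, beta, gamma, eps=1e-5):
    """Normalize a mini-batch of layer outputs to zero mean and unit
    variance per feature, then shift/scale by the learned (beta, gamma)."""
    m = x.mean(axis=0)              # per-feature mean over the mini-batch
    s = x.var(axis=0)               # per-feature variance over the mini-batch
    x_hat = (x - m) / np.sqrt(s + eps)
    return gamma * x_hat + beta

# toy mini-batch: 4 frames, 3 hidden units
x = np.array([[1.0, 2.0, 0.0],
              [3.0, 0.0, 1.0],
              [5.0, 4.0, 2.0],
              [7.0, 2.0, 3.0]])
y = batch_norm(x, beta=np.zeros(3), gamma=np.ones(3))
```

With `beta=0` and `gamma=1`, the output of each unit is simply re-centered and re-scaled; in training, `beta` and `gamma` are learned along with the network weights.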

Our idea is to adapt the means and variances used for BN at test time in order to counteract a change in environment, speakers, etc. that could cause a domain mismatch and hence performance degradation. By doing this, we combine the benefits of BN (being able to train deep networks for better performance) and live adaptation (being able to counter domain mismatch).

All of this is more formally depicted in Figure 1. The BN layer shifts and scales each value of the inputs from the original distribution to a new one. It has parameters, i.e. (m, s, b, g), which control the processing and are learned during the training stage. The input distribution of the BN layer, i.e. (m, s), is appealing for live adaptation, because it can be used to quantify the magnitude of changes in the environment domain where the ASR system is operating. We can compensate for these variations by re-estimating the (m, s) distribution dynamically during the operation of the ASR system. We call this enhancement the “Online Batch Normalization” (OBN) technique. After the re-estimation of the input means and variances, the BN layer realizes the compensation automatically by shifting and scaling these updated values toward the pre-trained and fixed output (b, g) distribution.
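As a rough sketch of the idea (not the paper's exact update rule; the exponential smoothing factor `alpha` is an illustrative assumption), the (m, s) statistics can be re-estimated from incoming test data while (b, g) stay frozen:

```python
import numpy as np

class OnlineBatchNorm:
    """Sketch of an OBN layer: the output parameters (beta, gamma) stay
    fixed after training, while the input statistics (m, s) keep being
    re-estimated from the incoming test data."""

    def __init__(self, m, s, beta, gamma, alpha=0.05, eps=1e-5):
        self.m, self.s = m.astype(float).copy(), s.astype(float).copy()
        self.beta, self.gamma = beta, gamma
        self.alpha, self.eps = alpha, eps  # alpha: hypothetical smoothing factor

    def forward(self, x, adapt=True):
        if adapt:  # live re-estimation of the input distribution
            self.m = (1 - self.alpha) * self.m + self.alpha * x.mean(axis=0)
            self.s = (1 - self.alpha) * self.s + self.alpha * x.var(axis=0)
        x_hat = (x - self.m) / np.sqrt(self.s + self.eps)
        return self.gamma * x_hat + self.beta
```

If the operating environment shifts the activations, (m, s) drift toward the new input distribution, and the frozen shift-and-scale toward (b, g) keeps the layer's output distribution stable.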

Figure 1 – The “batch normalization” layer can shift and scale the incoming activations from an input distribution toward an output one. Any new domain, reflected in the input distribution by a mismatched condition with respect to the training, can be compensated for by re-estimating the new input distribution during test. Then, the usual shift-and-scale processing can keep the output distribution stable.

The capability of the OBN layer to compensate for domain-mismatched conditions is applicable to any feed-forward DNN model architecture. We demonstrated the generality of the method by applying the OBN layer in LA-DNN and ResNet acoustic models. The models are trained using anonymized field data that are representative of human-machine interactions, collected from devices with close-talk microphones in our various cloud services. The domain chosen for evaluation consists of far-field speech data collected in an anonymized way from an IoT (Internet of Things) application, which presents a strong mismatch with the training data. In this scenario, we reach word error rate reductions of 8.0% and 12.1% for LA-DNN and ResNet, respectively. For comparison, we conducted the same experiments with i-vector based online adaptation and observed word error rate reductions of 2.9% and 8.8% for LA-DNN and ResNet. This demonstrates the relative effectiveness of our technique versus a strong baseline in the domain-mismatched scenario.

In conclusion, a live adaptation capability is desirable for an ASR system but can be hard to achieve due to computational limitations, since it must operate while the service is up and running. The proposed technique fits both needs thanks to its simplicity and light computational footprint, while being applicable to any feed-forward architecture. In an IoT scenario, we achieve word error rate reductions of 8.0% and 12.1% for the LA-DNN and ResNet models, respectively, significantly improving over an i-vector baseline.


This paper was co-authored by Franco Mana, Felix Weninger, Roberto Gemello, and Puming Zhan. It was presented at ASRU 2019, the IEEE Automatic Speech Recognition and Understanding Workshop.

Delivering personalized user experiences with speaker adapted end-to-end speech recognition

In the latest generation of Nuance’s deep learning technology for speech recognition, we developed techniques for user adaptation that can reduce the error rate by up to 33 percent, exploiting regularized end-to-end learning, adaptation on text data, and minimum word error rate adaptation.

A crucial capability of automatic speech recognition (ASR) systems is to cope with variability in speech, caused by various accents, age groups, or other variations in speaking style, as well as noisy environments. Several years ago, Nuance’s Dragon dictation product line pioneered the usage of deep learning technology for speaker adaptation in professional dictation systems.

Deep learning technology has rapidly transformed the way that computers perform speech recognition. Traditionally, the difficult task of building an ASR system was broken down into smaller pieces, resulting in several components for modeling the frequency and usage of words (the language model), the way that words are formed from phonemes, the smallest spoken units of a language (the pronunciation model), and how phonemes are realized in the signals captured by the microphone (the acoustic model).

Given the advances in deep learning, these days, it is possible to subsume the acoustic, pronunciation and language models into a single deep neural network (DNN), which is also known as “end-to-end” ASR. While end-to-end learning is certainly advantageous in terms of simplicity and accuracy, it does not solve the problem of variability in speech; thus, the need arises for effective adaptation schemes for end-to-end ASR.

In the paper titled “Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR”, we explored the adaptation of state-of-the-art end-to-end ASR, demonstrating up to 33 percent reduction in word error rate (WER) on a dictation task. Such large reductions are possible because our end-to-end learning methods adapt to the user's specific vocabulary, sentence structure, accent, microphone, etc., all at the same time, in contrast to conventional systems, which adapt the acoustic model and language model separately. Our paper was accepted as an oral presentation at INTERSPEECH 2019, the world's largest conference on spoken language processing.

One challenge we faced in implementing end-to-end ASR adaptation is that DNNs subsuming the acoustic, pronunciation and language models need to be huge in order to be effective: in fact, the model used in the paper contains more than 100 million parameters. Adapting these parameters on speech data from a single user can easily result in a problem known in the literature as “overfitting”, which yields undesirable side effects. For example, a user's dictations could always start with the words “This is Jane Smith dictating a letter …”. If the model is trained on too many examples of a similar kind, it will lose its ability to output general sentences.

We solved this problem by changing the way the optimization is done, by employing a strategy known in the literature as “regularization”. Rather than only minimizing the error rate on the user-specific sentences, we also discourage the output of the model from deviating too much from the outputs of our off-the-shelf (speaker-independent, SI) speech recognizer. Furthermore, we showed that we do not need to adapt all the parameters in the model: carefully selecting a subset can yield very good performance as well.
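Such output regularization can be sketched as follows (the interpolation weight `rho` and the exact form of the loss are illustrative assumptions, not the paper's formula):

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def kl_regularized_loss(logits_adapted, logits_si, targets, rho=0.5):
    """Interpolate the cross-entropy on user-specific data with a KL
    term that discourages the adapted model's outputs from drifting
    away from the speaker-independent (SI) model's outputs."""
    log_p = log_softmax(logits_adapted)   # adapted model
    log_q = log_softmax(logits_si)        # frozen SI model
    ce = -log_p[np.arange(len(targets)), targets].mean()
    kl = (np.exp(log_q) * (log_q - log_p)).sum(axis=1).mean()
    return (1 - rho) * ce + rho * kl
```

When the adapted model's outputs match the SI model's, the KL term vanishes; as they diverge, the penalty grows, keeping the adapted model anchored to the general-purpose recognizer.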

As a result, we obtained significant error rate reductions even with a few minutes of speech, and up to 25 percent with 20 hours of speech:

As can be seen from the graph above, end-to-end ASR adaptation already outperforms the conventional acoustic model adaptation given enough data. However, we were able to obtain even larger gains by also exploiting pure text data sources from the same users (for example, written documents), rather than only their recorded speech data. This was not trivial to achieve since there is no separation into acoustic and language models in end-to-end systems, and text data can only be used to train a language model. Our strategy is to use the text data to fine-tune an external language model integrated into the end-to-end system. Similarly to the full end-to-end system, we regularize it using techniques we had previously developed in the paper “Efficient language model adaptation with noise contrastive estimation and Kullback-Leibler regularization” (INTERSPEECH 2018).

Another contribution of our paper is to change the optimization criterion in adaptation to directly reflect the WER metric (minimum WER adaptation). Overall, it turned out that the gains from regularized end-to-end learning, adaptation on text data, and minimum WER adaptation added up nicely, so that we were able to obtain up to 33 percent WER reduction from adaptation.

In conclusion, the advanced end-to-end deep learning technologies we employed not only make ASR systems simpler and more powerful, they also deliver a much more personalized user experience through seamless adaptation of the way computers perceive both the acoustic and linguistic aspects of speech.



Jesús Andrés-Ferrer, Xinwei Li and Puming Zhan contributed to the research presented in this blog post along with Felix Weninger. The paper was presented in September at the INTERSPEECH conference in Graz, Austria.

DeepAAA: Detecting Abdominal Aortic Aneurysms with Deep Learning

An abdominal aortic aneurysm is a potentially serious condition that can be missed by radiologists when it is not the primary reason for a scan. We discuss in detail how Nuance and CCDS are collaborating on deep learning approaches for detecting such aneurysms.

Deep learning based approaches have recently made great strides in automating some types of medical image analysis. Despite some hype about deep learning algorithms replacing radiologists, it is most likely that deep learning algorithms will become sophisticated tools for radiologists, assisting and complementing their expertise. AI based approaches may assist by improving workflow efficiency, providing confirmation, or remaining alert for unusual circumstances which may escape the radiologist’s attention.



The aorta is the largest artery in the body. It connects to the left ventricle, loops over the heart, and then descends roughly vertically to the pelvis, where it splits into the two iliac arteries. Freshly oxygenated blood from the lungs is pumped through it with each heartbeat.  Aneurysms, swellings which indicate a weakening of the vessel walls of the aorta, occur with moderate frequency as people age. So long as the aneurysm remains intact, symptoms, if any, are usually minor. However, rupture of an abdominal aortic aneurysm (AAA) can lead to massive internal bleeding and is frequently fatal.

Figure 1 Major arteries of the human body. Aorta is shown in red. Image credit: Modified from Fig 20.24, OpenStax, Anatomy and Physiology. OpenStax CNX. Aug 20, 2019 Download for free at http://cnx.org/contents/14fb4ad7-39a1-4eee-ab6e-3ef2482e3e22@16.5 .

There are a range of treatment options if the aneurysm is noticed before it ruptures. However, as the symptoms are minimal, it is frequently detected only incidentally, when an examination is being done for some other reason. Thus when a radiologist reads a scan covering the aorta, they should report the presence of any aneurysm. However, the literature shows that these are often missed. Together with the researchers at the Massachusetts General Hospital (MGH) Center for Clinical Data Science (CCDS), we see an opportunity for an AI based approach to provide a “second set of eyes” to assist radiologists in incidentally detecting this condition.


CT Scans

The most likely scan type which would permit incidentally finding an abdominal aortic aneurysm is a CT (computed tomography) scan. A CT scanner works by rotating an X-ray source around the patient, projecting a beam of X-rays through them at different angles. The set of projections can be mathematically combined to form a 3D image of the body. To enhance the contrast of the blood vessels, a contrast agent may be injected into the bloodstream. The blood vessels will absorb more x-rays, and appear brighter, when the contrast is present. While the presence of contrast will definitely make an abdominal aortic aneurysm easier to detect, to be effective our approach should work with or without contrast.

The patient is usually scanned lying down on a motorized platform, which moves them through the rotating ring containing the x-ray source. The resulting CT scan is usually structured as a series of slices perpendicular to the head to foot axis of the patient. These slices may be spaced closer together or further apart depending on the scan parameters. To be practical, a method for detecting abdominal aortic aneurysms must work with a variety of slice spacings. While there are a number of additional variables that can change the imaging process, contrast agent and slice spacing are the two most important variables the method must contend with.

Figure 2 Contrast and non-contrast CT of roughly the same area in a patient’s abdomen. (Putting the patient’s right on the image left is radiological convention. In this orientation, the image lines up with the patient if they are standing in front of you, or if you are standing at the foot of the patient’s bed.)


In the first phase of this project, we have concentrated on the abdominal region of the aorta. In this region, a healthy aorta has a fairly straight cylindrical form. To detect aneurysms we must first segment the aorta from the scan, and then measure it to detect any bulges which would indicate an aneurysm. We chose to apply a deep learning approach to the segmentation problem, and a more classical geometric approach to the problem of determining whether the aorta is healthy.


Segmentation of the aorta

For the segmentation problem, we chose to use the U-Net neural network architecture, a state of the art approach to segmentation of medical imagery. The U-Net structure can be described at a high level as an encoder/decoder network with skip connections. Specifically, both the encoder and decoder sections are constructed of a series of convolutional blocks. Each block in the encoder section increases the number of channels in the feature map, and decreases spatial detail. Conversely, each block in the decoder section reduces the number of channels and increases the spatial detail. The output of each encoder block before reducing the resolution is branched off and combined with the decoder part after increasing its resolution. The structure is usually drawn with the encoder descending on the left and the decoder rising on the right, in a U shape, which gives rise to its name. Conceptually, semantic information (the what) is carried through the bottleneck in the filters, while spatial information (the where) is carried across to the decoder network through the skip connections.
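To make the shape flow concrete, here is a toy one-level sketch (NumPy stand-ins for the learned convolutions, purely to show how channels and resolution change and where the skip connection enters; a real U-Net stacks several such levels with trained filters):

```python
import numpy as np

def down(x):
    """Encoder block (sketch): halve spatial detail, double channels.
    Real blocks use learned convolutions; average pooling and channel
    duplication stand in here just to illustrate the shape flow."""
    c, h, w = x.shape
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return np.concatenate([pooled, pooled], axis=0)   # c -> 2c channels

def up(x):
    """Decoder block (sketch): double spatial detail, halve channels."""
    c, h, w = x.shape
    upsampled = x.repeat(2, axis=1).repeat(2, axis=2)
    return upsampled[: c // 2]                        # c -> c/2 channels

def tiny_unet(x):
    """One-level U: encode, decode, and fuse via a skip connection."""
    skip = x                      # spatial detail branched off (the "where")
    bottleneck = down(x)          # semantic path through the U (the "what")
    decoded = up(bottleneck)
    return np.concatenate([skip, decoded], axis=0)    # the skip connection

x = np.zeros((4, 16, 16))         # 4 channels, 16x16 feature map
y = tiny_unet(x)
```

The concatenation in `tiny_unet` is the skip connection: the decoder receives both the upsampled semantic features and the full-resolution features branched off before downsampling.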

Figure 3 Conceptual structure of the U-Net network architecture.

Although the U-Net was originally developed in a 2D context, several variations have been explored which extend the idea into 3D. In our case, we expect a wide variation in slice spacing along the head-foot axis, but anticipate reasonably consistent and high resolution in the other directions. To allow operation across variable numbers of slices, we therefore did not reduce the spatial resolution in the head-foot direction. This allows the system to process scans with an arbitrary number of slices.
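For instance (a hypothetical NumPy sketch of this pooling choice, not the actual network code), downsampling only the in-plane dimensions leaves the slice count unconstrained:

```python
import numpy as np

def pool_in_plane(volume):
    """Downsample only the in-plane (axial) dimensions of a CT volume,
    leaving the slice (head-foot) axis untouched so the same network
    path can accept an arbitrary number of slices."""
    z, h, w = volume.shape
    return volume.reshape(z, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

scan_a = np.zeros((37, 64, 64))    # a scan with 37 slices
scan_b = np.zeros((112, 64, 64))   # a scan with 112 slices: same code path
```

In a framework such as PyTorch, the equivalent choice would be a 3D pooling kernel of size (1, 2, 2) rather than (2, 2, 2).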

We obtained 321 CT scans from 223 unique patients at the Massachusetts General Hospital to use for developing the network. The aorta was manually segmented by CT technologists to provide a reference segmentation. The dataset was split roughly 80/10/10 into training, validation, and test sets. The network was trained by directly optimizing the Dice score between the output segmentation and the reference segmentation. The weights which performed best on the validation set were chosen. For further details on the design or training procedure, consult this paper.
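The Dice score being optimized measures the overlap between the predicted and reference segmentations; a common soft formulation (sketched here with `eps` added for numerical stability; training minimizes 1 minus this score) is:

```python
import numpy as np

def soft_dice(pred, ref, eps=1e-6):
    """Soft Dice score between a predicted probability map and a binary
    reference segmentation: twice the overlap divided by the total mass.
    Ranges from 0 (no overlap) to 1 (perfect agreement)."""
    intersection = (pred * ref).sum()
    return (2.0 * intersection + eps) / (pred.sum() + ref.sum() + eps)

# toy reference: a 4x4 square inside an 8x8 slice
ref = np.zeros((8, 8))
ref[2:6, 2:6] = 1.0
perfect = ref.copy()
```

A perfect prediction scores 1.0, and an empty one scores near 0, which is why the reported average Dice of 0.90 indicates close agreement with the technologists' segmentations.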


Identification of aneurysms

Once trained, the network proposes a segmentation of the aorta for any input CT image of the abdomen. This segmentation is then measured to determine whether it shows a bulge that would be interpreted as an aneurysm. A classical geometric approach was used for this: an ellipse is fitted to each axial slice. We consider the major axis of this ellipse a good estimator of the maximum diameter of the aorta in that slice, and we correct the estimate for the tilt of the centerline. Clinical guidelines suggest that a diameter greater than 3.0 cm should be considered an aneurysm.
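A minimal sketch of this measurement step (the PCA-based ellipse fit and the function names are illustrative, not the paper's implementation; it relies on the fact that for a uniform ellipse the standard deviation along the principal direction is a quarter of the major axis, and assumes an oblique cut inflates the in-plane measurement by 1/cos(tilt)):

```python
import numpy as np

def major_axis_mm(mask, pixel_mm=1.0):
    """Estimate the maximum in-slice diameter of a segmented aorta by
    fitting an ellipse to the mask via PCA: the largest eigenvalue of
    the point covariance is (major_axis / 4)^2 for a uniform ellipse."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    cov = np.cov(pts, rowvar=False)
    largest = np.linalg.eigvalsh(cov)[-1]   # eigenvalues in ascending order
    return 4.0 * np.sqrt(largest) * pixel_mm

def is_aneurysm(measured_mm, tilt_rad=0.0, threshold_mm=30.0):
    """Correct the in-slice measurement for centerline tilt, then apply
    the 3.0 cm clinical guideline."""
    corrected = measured_mm * np.cos(tilt_rad)
    return corrected > threshold_mm
```

For example, a tilted centerline can make a healthy aorta appear dilated in a single axial slice; the cosine correction removes that geometric inflation before the threshold is applied.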

Figure 4 Detection of aneurysms. At each slice, the aorta is first segmented, an ellipse (green) is fit to that segmentation, and its major axis is measured. If the centerline, as determined from the centers of successive ellipses, is not aligned with the scan axis, a correction for tilt is applied.


On the original dataset of 321 scans, the output of the model could be compared to the reference segmentations. The estimated maximum diameter could also be compared to the maximum diameter of the reference segmentation produced by the technologists. However, while the official diagnosis was presumably consistent with the reference segmentation size, it was not directly tracked. In a slightly unexpected, but very promising result, the system performed as well on scans with contrast as on those without. The segmentation algorithm achieved an average Dice score of 0.90, and had an average error with respect to the maximum diameter of the reference segmentation of only -2 mm.

As a further test, an additional 57 scans were obtained – these scans did not have reference segmentations, but had clinically validated diagnoses of the presence or absence of an abdominal aortic aneurysm, and estimates of the aneurysm size made by medical experts in a clinical context. On this additional, more clinically relevant test, the system also performed well, with a mean delta of less than a millimeter, a sensitivity of 85% and a specificity of 100%.

The system has difficulty, however, in correctly segmenting the aorta in the thorax. This is not surprising, as we concentrated on the abdomen for this development and had no thorax data in the training set. For the purposes of this study, thorax data was trimmed out of the second dataset by hand, and the results presented here correspond to that trimmed dataset. For realistic clinical use, however, a means of handling the thoracic aorta needs to be developed. Many scans intended for the abdomen contain a certain amount of the thorax, and, more importantly, aneurysms of the thoracic aorta also occur, and it is important to detect them as well.


Conclusion and Future work

By combining a U-Net based segmentation with a geometric analysis, our system is able to detect abdominal aortic aneurysms with high sensitivity and specificity. These results compare favorably with the performance of human radiologists in detecting incidental findings as described here. While the numbers aren’t directly comparable, not least because they are on different datasets, they show the approach has the potential to be useful.

This work was intentionally limited to the abdominal region. Work to extend the technique to the thorax is currently in progress. In the thorax, the shape of the aorta is more complex: it loops up and over the heart and is surrounded by other blood vessels. This presents additional challenges both to the segmentation approach and to the geometric post-processing used to identify aneurysms.

The work so far has also been limited to a research context on retrospective data. Nuance and the CCDS are currently working to integrate the system with the Nuance AI marketplace, and eventually to make it available in a clinical context. Achieving clinical utility and regulatory acceptance is challenging, but necessary in order for the work to deliver its potential benefits.



The system described in this post was developed at the Massachusetts General Hospital Center for Clinical Data Science, and includes the work of multiple contributors. Jen-Tang Lu was the lead author of this work, with contributions from Rupert Brooks, Stefan Hahn, Jin Chen, Varun Buch, Gopal Kotecha, Katherine P. Andriole, Brian Ghoshhajra, Joel Pinto, Paul Vozila, Mark Michalski, and Neil A. Tenenholtz.

All usage of patient data in this project was approved by the institutional review board of the Massachusetts General Hospital.


Publication and press

This work has been published at the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019). A preprint is available on arXiv.

An early version of the system was presented at the 2019 Society for Imaging Informatics in Medicine (SIIM) conference, where it won the Best Scientific Paper award.

An early demonstration of the system integrated with the Nuance AI marketplace was conducted at RSNA 2018.


Intelligence at Work: Qure.ai applies deep learning and artificial intelligence to streamline and improve radiologic diagnosis of chest x-rays and triage brain CTs

Qure.ai’s team of experts work to define clinically relevant problems and design real-world solutions that are deployed in 14 countries around the globe. The company seeks to improve diagnostic efficiency and accuracy in radiology, with an initial focus on chest X-rays and head CTs. Once FDA-cleared, Qure.ai algorithms will be integrated with Nuance’s next-generation reporting platform, PowerScribe One.

Access to accurate and early diagnosis has become key to delivering quality healthcare around the globe. In many locales, the doctor-patient ratio is low, even more so for specialized practitioners such as radiologists. In underserved and remote regions, radiologist expertise is scarce, costly, and unequally distributed. Even in developed parts of the world, workloads are creating burnout issues and higher error rates for radiologists. This means that not all patients receive the most accurate, timely diagnosis. AI-driven radiology solutions can automate a lot of the routine work, saving precious time for radiologists and helping mitigate clinician burnout.

Chiranjiv Singh, Chief Commercial Officer of Qure.ai, shares his insights about how Qure.ai’s algorithms aim to make radiologic diagnoses more accurate and efficient, by delivering AI capabilities within radiologists’ everyday workflows, to optimize results and deliver better patient care.

Qure.ai’s product philosophy is to solve clinical and workflow needs of customers and to go deep into certain areas rather than spreading across a spectrum of clinical areas.  Qure.ai has trained its algorithms on more than 7 million exams sourced globally and prides itself on having been validated by multiple papers in peer-reviewed research.  In line with this philosophy, Qure.ai has commercially released two algorithms to date, one focusing on detecting abnormalities in chest X-rays and the other for triage and diagnostic support of head CT scans. As of writing, the CT algorithm is 510(k) Pending with the US FDA.


Jonathon Dreyer: Tell us about your business – when and how you started and your development journey.

CS: Qure.ai is a healthcare AI startup that applies artificial intelligence and deep learning technology to radiology imaging for quick and accurate diagnosis of diseases. Our algorithms can automatically detect clinical findings and highlight the relevant areas from X-rays, CT scans, and MRIs in a few seconds. This allows physicians to spend more quality time with patients to better understand their specific case/symptoms, communicate the diagnosis, and determine and discuss customized treatment plans – leading to better patient care.

Qure.ai was founded in 2016 by Prashant Warier and Dr. Pooja Rao. Prashant is a career data scientist and entrepreneur, and Pooja is a trained clinician. Together they bring complementary skills of engineering and medicine critical to product development. From humble beginnings in India 3 years ago, Qure.ai is now present across 14 countries through 80+ deployments and has processed more than 200,000 scans.

Our solutions have been validated and reviewed by clinicians at leading healthcare institutions such as the Massachusetts General Hospital and the Mayo Clinic, among others. The Lancet published validation of our technology, making it the first radiology AI article released by the journal. Qure.ai’s software is vendor-neutral and is deployed online with cloud-based processing capabilities integrated with the radiologists’ current reporting workflow.

JD: What AI algorithms do you have and what do they do?

CS: We have two commercially released algorithms so far and are working to obtain regulatory clearance for their clinical use in the US market.

  • qXR scans abnormal chest X-rays to identify and localize 18 clinically relevant findings with an accuracy of over 95%. We have deployed this in various use cases, from screening to radiology assistance, to even post-read quality control. For example, qXR can screen for tuberculosis and is used in public health screening programs globally. When used as a point-of-care screening tool for TB, followed by immediate bacteriological/NAAT confirmation, qXR significantly reduces time to diagnosis.
  • qER is designed to triage critical cases and provide diagnostic assistance in head CT scans – a first-line diagnostic modality for patients with head injury or stroke. qER automatically detects intracranial hemorrhages (ICH) and its subtypes (intraparenchymal (IPH), intraventricular (IVH), subdural (SDH), extradural (EDH) and subarachnoid (SAH)), cranial fractures, midline shift and mass effect from non-contrast head CT scans.

JD: What’s the big “Aha” moment when you first show users what your AI algorithm(s) can do for them?

CS: The first Aha moment we get from customers is the depth of our capability. Unlike other AI algorithms in the market that may detect only a few findings on x-ray, we are able to detect and show accuracy numbers on 18 clinical findings from qXR. Similarly, for qER, we detect multiple sub-types of ICH along with cranial fractures, midline shift and mass effect – a larger triage capability than what most customers have seen so far from other AI vendors.

The next big Aha is when customers see our richness of peer-reviewed publications. Every AI company wants to claim high accuracy numbers, and yet there is a lack of trust among clinicians. We take this job of building trust as core to our company and therefore have invested resources to expose our algorithms to multiple independent reviews and peer-reviewed publications that help us reduce that trust deficit. The fact that our algorithms can identify and label the exact abnormalities, as well as their locations within the scans in a matter of minutes, with near-radiologist accuracy in a clinical setting, has been our biggest highlight.

Lastly, our integration within radiology workflow is the final wow! For example, we have worked with Nuance to integrate our AI algorithm outputs in PowerScribe One to allow radiologists to consume these outputs according to their preferred workflow. We are also integrating our outputs to help to prioritize radiologist worklists using PowerScribe Workflow Orchestration.

JD: What challenges or needs did you see that drove you to focus on this?

CS: Access to accurate and early diagnosis is crucial to delivering quality healthcare. In many places around the world, the availability of specialized radiology resources is limited. And even in more developed countries, the volume is increasing exponentially, putting limits on the ability of radiologists to deliver timely, accurate diagnoses.  Burnout is increasing as well as the potential for errors.  Our solutions can help automate a lot of the routine work, saving precious time for radiologists and thereby preventing clinician burnout.

We saw this as a need and simultaneously an opportunity to leverage the power of deep learning to develop solutions dedicated to this market. Our mission is to use artificial intelligence to make healthcare more accessible and affordable.

JD: What’s the number one benefit you offer?

CS: The number one benefit we offer our users is "trust and peace of mind." This is possible only when a product is reliable and also invisible. We want our users – be it radiologists or public health experts – to focus on their patients and trust us for the accuracy of our algorithms. We also want to embed ourselves into their workflow in a manner that almost becomes invisible to their daily practice. We believe that our AI solutions will succeed only if we are able to build integrated solutions with companies like Nuance that solve clinically relevant problems.

This is easier said than done. It means working hard to build solutions that are globally trained and validated, built on a large volume and variety of data, and embedded into diverse clinical workflows. It’s the challenge of meeting our customers’ expectations on this benefit that keeps us up at night.

JD: Are there any stories you can share about how your algorithm(s) drove measurable patient care outcomes?

CS: One of our customers is the Philippine Business for Social Progress, a local screening agency and the first adopter of artificial intelligence algorithms for tuberculosis detection in the Philippines. Working with their team, we built a custom, end-to-end TB workflow and patient registration software that helps health workers immediately refer patients with suspected TB for confirmatory tests. Our solution is deployed in multiple mobile vans that move across different pockets of Manila and has been in use for >6 months. Prior to using qXR, the time to complete a patient diagnosis was >2 weeks. We have reduced that time to <1 day (from screening to x-ray to lab tests). We have identified 25% more TB cases than the original workflow and have screened 30,000+ individuals using our AI solution.

JD: What benefits do Nuance and its AI Marketplace for Diagnostic Imaging bring to your users? What problems do the marketplace and integration into Nuance's workflow solve?

CS: Nuance and its AI Marketplace bring two key benefits to our users. The first benefit is that it offers a single platform to review, try, and buy AI algorithms. Customers need a trusted partner with vetted solutions that connect trusted AI developers to clinical users. The Nuance AI Marketplace does this for every stakeholder in the user organization – clinicians get access to algorithms they can evaluate for clinical accuracy; IT administrators get easy integration without running multiple deployment projects with independent vendors; purchase/finance teams get streamlined negotiations and reduced time to execute multiple contracts.

The second and equally important benefit is seen once the purchase decision has been made. For our solutions to work and be used, they need to be accessible to the users when they are reviewing images and dictating their reports. We want to embed ourselves into customers’ workflow in a manner that is almost invisible to their daily practice. Nuance offers the right point and platform for this integration into the radiologist workflow for AI solutions like ours, and we are really excited to be part of this platform.

JD: What has your experience been working with the Nuance team?

CS: The experience of working with the Nuance team has been one of dealing with a team that is not only professional but also extremely knowledgeable and proficient in diagnostic imaging and reporting workflows. They understand the use cases of bringing in technologies like AI to meet real needs of their customers. I am looking forward to this partnership as we jointly work with our customers and deliver value to them.

JD: What is your vision for how your solution(s) will evolve over the next 5 years?

CS: In the next five years, I see us offering more comprehensive solutions across various clinical domains, solving customer challenges at various points in the diagnostic journey of patients. We will enhance our capabilities by increasing our clinical coverage beyond the chest x-ray and head CT that we offer today. In terms of diagnostic workflows, we see ourselves offering more measurement and diagnostic tools to aid radiologists in their reads, and even taking on tasks like treatment progression monitoring to aid other clinical users. Five years is a very long time in the field of AI, and I am confident that Qure.ai will be a dominant global player and a trusted partner for our customers over that time frame.

JD: In one sentence, tell us what you think the future of medicine will look like.

CS: The future of medicine will be custom designed and served, focusing both on prevention and cure, and most importantly, accessible to all.

Learn more:

To learn more about Qure.ai, please visit www.qure.ai

To learn more about Nuance AI Marketplace for Diagnostic Imaging, please visit https://www.nuance.com/healthcare/diagnostics-solutions/ai-marketplace.html

Intelligence at Work is a blog series by Jonathon Dreyer, Senior Director, Solutions Marketing, Healthcare Division for Nuance Communications. Intelligence at Work showcases projects and applications that demonstrate how Nuance technologies extend the value, use, and performance of integration and development partner offerings. This blog series focuses on inspiring the healthcare developer community to think beyond their current state and take their innovations to new heights by tapping into the latest in artificial intelligence.


Intelligence at Work: Knee Osteoarthritis Labeling Assistant (KOALA) for detecting signs of knee osteoarthritis by IBL

Read how IBL's KOALA AI-driven application, currently pending FDA 510(k) clearance, can help improve assessment and diagnosis of many musculoskeletal conditions and impact patient care. It will support physicians in detecting signs of knee osteoarthritis based on standard joint parameters, and help track disease progression. It is available for review on the Nuance AI Marketplace for Diagnostic Imaging and, once cleared, will be integrated with Nuance's next-generation reporting platform, PowerScribe One.

As the population ages, arthritis and other musculoskeletal diseases are an increasing cause of physician visits and health care spending.  With increased prevalence comes an increased burden for rapid, precise diagnosis and staging, as well as an ability to predict future disability. Unfortunately, interpreting orthopedic images can be laborious. There is a need for standardization and simplification while providing quantitative disease parameters to support treatment decisions. Having precise measurements is the missing link to tracking the slow progression of degenerative diseases.

Dr. Richard Ljuhar, CEO and co-founder of ImageBiopsy Lab (IBL), shares his thoughts about how IBL’s AI-driven musculoskeletal imaging algorithms aim to improve assessment and diagnosis of a range of musculoskeletal conditions, including osteoarthritis (OA), osteoporosis, and rheumatoid arthritis.  The goal is driving timely and appropriate interventions to reduce morbidity and disability – relieving pain and improving patients’ lives.

Interpreting musculoskeletal images is a challenge due to the lack of objective analysis methods and standardized digital documentation of radiographic changes. Because of these shortcomings, diagnosis and predictive assumptions show significant inter-rater variabilities and are thus often unreliable. IBL uses state-of-the-art artificial intelligence technology to efficiently address these challenges, relieving physicians and researchers of time-consuming image analysis tasks, while at the same time improving diagnostic accuracy and predictive capability.


Jonathon Dreyer: Tell us about your business – when and how you started and your development journey.

Richard Ljuhar: ImageBiopsy Lab (IBL) was founded by a team of experienced professionals and specialists in medical technology and AI, along with board-certified doctors in orthopedics and radiology. Drawing on the management team's personal experience, plus intensive discussions, brainstorming, and surveys of medical users, we have been developing the core elements of our AI modules since 2012. IBL was incorporated in 2016 and began implementing its business strategy. The initial focus was on applying deep-learning methods to knee osteoarthritis (OA), our first use case. But our modular platform technology is designed to be applicable to any orthopedic imaging data, so we have expanded beyond knee OA to other musculoskeletal disease applications.

JD: What AI algorithms do you have and what do they do?

RL: The focus of IBL is on digital X-ray and musculoskeletal diseases, with artificial intelligence-driven solutions for anatomical regions such as the knee, hand, hip, whole leg, and spine. Our first CE-marked, FDA 510(k)-pending module, KOALA (Knee Osteoarthritis Labeling Assistant), supports physicians in detecting signs of knee osteoarthritis based on standard joint parameters and OARSI criteria from standing radiographs of the knee. PANDA (Pediatric Bone Age and Developmental Assessment) supports an objective and standardized determination of pediatric bone age. HIPPO (Hip Positioning) supports objective and standardized measurement of the most important hip angles based on digital x-rays.

JD: What’s the big “Aha” moment when you first show users what your AI algorithm(s) can do for them?

RL: A remark from Peter Steindl, MD, an orthopedic surgeon, sticks in my mind. He said, "I guess my biggest 'Aha moment' was that I realized the potential to measure and compare sclerosis, joint space narrowing, and OA grades in an objective way in a particular patient over a couple of years. I think this device/software might be very helpful in finding the optimal timing for planning a joint replacement surgery of the patient's knee."

JD: What challenges or needs did you see that drove you to focus on this?

RL: After years of experience and discussions with medical experts, IBL identified that orthopedic diagnoses could benefit immensely from AI-driven solutions. Workflows are time-consuming and elaborate with interpretations often subjective and difficult to reproduce. Additionally, image reading and interpretation often hasn’t changed significantly since the introduction of radiography. The need to bring musculoskeletal/orthopedic radiology into the digital age drove our motivation to change the status quo. IBL’s software offers simplification and standardization while at the same time providing quantitative disease parameters to support treatment decisions.

JD: What’s the number one benefit you offer?

RL: While we support medical experts and their patients in numerous areas during the diagnostic pathway, we see the greatest benefit of our solutions in automation and in consistent documentation of radiological parameters. Big data and artificial intelligence cannot replace physicians, but they can relieve them of time-consuming routine tasks. This should allow medical experts to invest their time where it is most needed—with their patients!

JD: Are there any stories you can share about how your algorithm(s) drove measurable patient care outcomes?

RL: Our experience and that of our customers has shown that through our solutions there is a higher level of agreement between physicians, improved patient communication, more appropriate and timely therapy decisions, and an increase in patient loyalty.  In fact, we even had patients approaching us directly asking if we can run the digital analysis of their X-rays as they wanted to get an accurate assessment of their disease progression.

JD: What benefits do Nuance and its AI Marketplace for Diagnostic Imaging bring to your users? What problems do the marketplace and integration into Nuance's workflow solve?

RL: IBL and Nuance deliver their core value at the most critical interface of the radiology workflow: translating image information into a report. Our AI solutions facilitate this transition by providing quantitative and objective measurements, so the seamless integration of our AI output to pre-fill reporting templates via Nuance delivers the most value to existing workflows. Nuance streamlines delivery at the heart of where radiologists' time and decision making matter most, while providing a scalable IT infrastructure and customer base, creating a win-win-win situation for IBL, Nuance, and the physicians who benefit from time savings and quality improvements.

JD: What has your experience been working with the Nuance team?

RL: We at IBL especially like the forward-thinking design of how AI results are injected into existing reporting workflows, which made collaborating highly attractive for us. The early designs of Nuance's AI-driven solutions already reflect the experience and professionalism of a company with tremendous domain knowledge and the ability to deliver the promised value of AI for physicians. Nuance's responsive support allowed IBL to quickly ramp up demos and use cases, and we are very happy to be part of the family.

JD: What is your vision for how your solution(s) will evolve over the next 5 years?

RL: IBL will expand its portfolio of fully automated AI solutions for musculoskeletal radiology where automation matters the most: time-saving and objective outcome measures on standardized, high-volume tasks that enable easier comparison between repeated visits of the same patient. With this, the workload of the orthopedist and radiologist can decrease, while the quality of results increases. And because precise measurements are the missing link to tracking the slow progression of certain MSK diseases, radiologists using IBL's solutions deliver the perfect service to their referring orthopedists, who can apply IBL's outcome measures to tailor personalized treatments and monitor their efficacy over time. The longitudinal structured data from our AI solutions supports powerful prediction models that combine our AI results with clinical data to predict the future progression of a patient's condition. This is possible thanks to IBL's decade-long experience building image processing algorithms and transforming immense datasets into actionable clinical decision support.

JD: In one sentence, tell us what you think the future of medicine will look like.

RL:  Automation and standardization will lead to an increasing amount of structured data which in turn will lead to a growing number of AI-applications in the years to come.

Learn more:

To learn more about ImageBiopsy Lab, please visit www.imagebiopsylab.ai

To learn more about Nuance AI Marketplace for Diagnostic Imaging, please visit https://www.nuance.com/healthcare/diagnostics-solutions/ai-marketplace.html


A Real View: the last mile in implementing AI

Nuance Healthcare Diagnostics Vice President and General Manager Karen Holzberger sat down with Nuance Senior Director of Product Management Sander Kloet to discuss the importance of addressing the “last mile” challenge in deploying AI for radiology, one of the key topics that he and other industry experts will cover at the RSNA regional course on AI on May 31 in San Francisco, CA.

Imagine that you are a busy professional whose very long days are packed with a mix of routine and critical time-sensitive tasks, each of which requires close attention and thorough and accurate paperwork. One day you learn that powerful new tools can help you get more work done in less time with improved quality and greater benefit to those counting on your expertise. But there’s a catch: to achieve these gains you must take time you don’t have to fit the new tools into your workday.

That’s the essence of the “last mile” challenge facing radiologists looking to leverage a growing number of AI diagnostic models and workflow tools to manage increasing workload volumes, maximize value-based reimbursements, reduce administrative burdens that contribute to burnout, and ultimately, improve patient outcomes. The ability to integrate AI tools into current radiology workflows seamlessly and intuitively is vital to realizing the benefits.

I recently sat down with Sander Kloet, who will lend his expertise in product design and implementation to the upcoming RSNA regional AI course by discussing the “last mile” challenge and the solutions and approaches to address it.

KH: What’s the “last mile” problem and what does it mean for radiologists and AI?

Sander: The idea of the "last mile" connoting the final leg of a journey originated in telecom and logistics to describe the work remaining to get to the intended destination or outcome. At the same time, it indicates that although there are still a few hurdles to clear, the goal is within reach. In that sense, it's a highly motivating and energizing challenge.

When we think about the last mile problem for radiologists, we recognize that in order to realize the potential AI has to advance radiology it must fit seamlessly into a radiologist’s workflow and not be an add-on requiring extra steps. It must deliver both practical and clinical value as an integral part of how radiologists work. If it doesn’t it simply won’t be used.

The key from a product design perspective is to think comprehensively. For example, image characterization algorithms can be invaluable in helping radiologists identify pulmonary nodules or brain bleeds quickly. But those results need to be delivered before the radiologist has read the study and dictated the report; otherwise they have to take additional time to review the AI findings and modify their reports if needed. That also means making sure that image processing is optimized so that the AI results are available promptly alongside the images from the PACS and history from the patient's EHR. Those are complex issues, but getting the workflow right is essential.

KH: How is access to AI models integrated into the workflow?

Sander: That’s a two-part issue. The first part is simplifying the development and deployment of the many different algorithms that are needed to address the wide variety of modalities, exams, and specialties. A radiology department could potentially require over a hundred algorithms from dozens of developers, each addressing a specific diagnostic use case. Developers need to be able to reach users at scale to justify app development. Healthcare systems need to consolidate vendor access, so they don’t have to establish relationships with every developer they want to work with. Adoption of AI-driven solutions will take a frustratingly long time if there’s not a unified market where developers can reach large numbers of radiology users who can easily discover and purchase new models. That’s where the Nuance AI Marketplace for Diagnostic Imaging comes into play. It’s essentially an app store for AI diagnostic models and workflow optimization tools. It connects the 75% of radiologists and 6,000 healthcare facilities in the U.S. who use Nuance radiology reporting or image sharing solutions with AI algorithm developers in a collaborative marketplace, with a built-in feedback channel for continuous improvement.

The second part is that access to the AI Marketplace is integrated into the radiologist’s workflow tools, the worklist, the PACS and the Nuance PowerScribe reporting system. That allows AI Marketplace clients to quickly evaluate and use the latest AI solutions and then seamlessly integrate the results into their current workflows.

KH: That covers AI model access, but what about enhancing workflow and augmenting radiologists’ expertise with AI?

Sander: Yes, good question. Physicians know from past experience that new technologies that promised improvements have instead impeded their ability to deliver quality care. It was a case of doctors having to serve the needs of the technology instead of the technology serving the needs of the doctors. Our fundamental mission at Nuance is to create technologies and solutions that not only get out of the way but truly empower clinicians to do what they love: take better care of their patients.

Ensuring that access to AI models is seamless from within the PowerScribe workflow is one way. A great example of that is the FDA-cleared ICH detection application developed by Aidoc and deployed at the University of Rochester to prioritize unread exams. It analyzes CT exams indicating a suspected intracranial hemorrhage and then prioritizes them on the PowerScribe worklist for a radiologist’s immediate attention when time-to-treatment is critical.

Another excellent example is the new PowerScribe One platform. It helps radiologists review and, if necessary, edit AI results, and it automatically prompts users with appropriate follow-up recommendations based on the ACR Assist™ clinical guidelines.

All of that is driven by our innovations in natural language processing and clinical language understanding (CLU), which actually understand the meaning and context of what the radiologist is dictating and correlate it with the AI findings. The system recognizes and stores the narrative report contents as structured data, all without requiring radiologists to change how they work or add additional steps. That's a very big deal because it makes every part of a report accessible to the EMR and to clinical data analytics. Now, incidental findings, follow-up recommendations, and many other radiology report elements can be leveraged and tracked in ways that previously were too difficult or impractical.

I think it's important here to note the value of combining workflow-integrated access to AI with the collaborative feedback loop of the AI Marketplace. Access from within the PowerScribe desktop makes AI usable from a practical point of view. Giving radiologists and developers a built-in channel to share feedback on AI model implementation and results makes it truly useful. It enables ongoing refinement of AI models for improved accuracy and specificity and addresses radiologists' preferences and priorities. It creates a virtuous cycle that builds confidence and capability in the technology and fosters increased adoption.

KH: What should radiologists expect as we move forward on closing the AI last mile?

Sander: In a word, I would say "momentum." By that I mean accelerating progress toward widespread practical adoption in the near term. As I noted earlier, there are still multiple challenges ahead. For example, there will be issues connected to using AI, including how reimbursements will be structured and access to diverse training data to create robust diagnostic models. We also are seeing interesting report creation challenges resulting from data generated by AI that was previously impractical for radiologists to obtain, and we look forward to collaborating with our clients to determine how to leverage all this data in reports in the future.

The growth and advancements we’re already seeing with the AI Marketplace, PowerScribe One, and CLU are really making the destination more clearly within reach than ever before. We’re also seeing work by multiple stakeholders on issues like reimbursements, for example, and by the ACR Data Science Institute on the data challenges. As you noted in a blog post late last year after the RSNA conference, there has been a real sea change in the outlook for AI within the radiology community. It’s highly motivating.

Ultimately, where we end up at the end of that last mile is using AI to augment radiologists to enable them to work more effectively and efficiently, meaningfully address burnout, and most of all, improve patient outcomes.

KH: Thank you, Sander. It’s exciting to hear the details of how we and the radiologists with whom we work closely are addressing these last mile challenges. Beginning on May 31, Sander will share these and other insights during RSNA’s spotlight course, “Radiology in the Age of AI.”

The Real View is a Q&A blog series with Karen Holzberger, Vice President and General Manager of Nuance Healthcare’s Diagnostic Division. The Real View cuts through the hype and gets to what’s real, here, and now. The blog series features interviews and insights from health IT movers and shakers – and uncovers disruptive technologies that solve challenges, optimize workflow, and increase efficiencies to improve patient care.



Q&A: How our latest internal hackathon brought innovation to non-profits

We rise by lifting others. That's one of the reasons why our latest internal hackathon Innovation Challenge took on a different mold – one geared towards leveraging employees' innovative drive and creativity for social good.
Nuance SS11 Innovation Challenge

We rise by lifting others. Robert Ingersoll said this a long time ago, and it still rings true. And it's one of the reasons why we ran a different kind of internal hackathon from our usual product-focused event. The SS11 Innovation Challenge for Social Good leveraged Nuance employees' creativity for social good while empowering them to create relevant solutions. It allowed them to target real needs and problems from non-profit organizations that could be solved with Nuance's technology. Girl Scouts, a performing arts ticketing service, an organization for at-risk, high-potential students, and a political campaign are just a few of the organizations for which our teams designed. And, it was fun! Eduardo Olvera, Senior Manager of User Interface Design in the Cognitive Innovation Group, led the event with Guy Hoovler, Senior Director of Professional Services in the Enterprise Division. Here's what they had to say:


Why hold an Innovation Challenge for social good?

Guy Hoovler: Innovation Challenge events demonstrate our division's support for innovative ideas, committing significant time and resources to our employees' innovative and competitive initiatives. Incorporating the "social good" element this time sparked creativity in a number of the teams, who were motivated by the challenge to solve real-world problems, using our tech in a way that did more than boost a bank balance.




What is the impact of the event and the ideas? How will this help non-profits?

Eduardo Olvera: The goal was to facilitate team-building and cross-training within the Enterprise division while making the teams' ideas and solutions relevant by tying them to real-world problems. Non-profits benefit because the nature of their problems makes them well suited to technology solutions.

Nuance teams built innovations for nonprofits


What was the most exciting part of the event?

Guy: Seeing the excitement and energy the teams brought to their presentations, especially while watching some of the teams execute live tech demos that actually worked!

Eduardo: For me, it was the response we received from participants, supporters, and organizers. We had a very successful event, with 9 teams and over 40 participants, all of whom made it to the finish line. We received support from upper management, directors and managers, local liaisons, subject matter experts, IT specialists, facilities administrators, human resources, and legal staff. This event also marked many 'firsts' in the history of Innovation Challenges: the first Enterprise challenge that combined teams (CIG + PS), multiple locations, a curated list of projects teams chose from, an emphasis on social good, a platform used for teams' development, flexibility across a longer development period, idea checkpoints, and SME office hours.




What does innovation mean to you?

Eduardo: Innovation is the application of creative processes and ideas in novel and useful ways that add value and solve real-world problems. The biggest mistake I see is people not realizing that innovation is bigger than a product or technology platform, which means companies tend not to put in the level of support required to make it happen and then grow.

Guy: Innovation involves rethinking both the problem and the solution and implementing what is needed to fill in the gaps. I find things to be most innovative when they illustrate how we’ve been focused on the wrong questions.


Why is innovation important for not only Nuance, but for the greater community?

Eduardo: Because the companies, organizations, and communities that truly understand innovation, put strategies in place around it, and execute well are the ones that achieve and sustain long-term success, while keeping their employees, members, and volunteers satisfied, fulfilled, productive, and excited about their personal and professional futures.

Guy: Innovation keeps our minds agile – whether we do it ourselves or appreciate it when done by others.  Seeing and doing innovation both serve to break us out of today’s complacency and get us thinking in a constructive context about what happens next.



Needless to say, it was a tight (and impressive) competition. In the end, the winning team decided to donate their money to the Water Project, a non-profit working to end the water crisis and provide access to clean, safe, and reliable water across sub-Saharan Africa. We were honored to donate $1,000 to help them further their mission and our teams are looking forward to the next Challenge.



Sorry, Team “Yanny” – AI says it’s “Laurel”

Is it Laurel, or is it Yanny? The audio clip that has divided the world has an answer – and according to Nuance researcher Nils Lenke, it’s backed by artificial intelligence.

Back in 2015, the internet was in an uproar over the phenomenon known as “The Dress” – a seemingly innocuous photo of a black and blue (or was it white and gold?) dress that prompted experts to investigate the science behind human vision and color perception. This week, a new sensation has ignited another fiery debate.

While studying for an exam, a high school freshman found an audio clip for the word "laurel" on Vocabulary.com – except to her, it didn't sound like "laurel" at all. The recording was posted to Reddit and chaos ensued. The sound bite left listeners astonished that they could hear something so entirely different than the person sitting next to them.


So, is it Laurel or Yanny? The audio clip that has divided the world finally has an answer. Well, another perspective at least. And this one is backed by artificial intelligence.

We used our Dragon platform, the speech recognition software behind many Intelligent Assistants in the car, on TVs, IoT devices, and beyond, to find out what it would make of the clip. The result was “Laurel,” without a doubt.

Nils Lenke, senior director of corporate research at Nuance, said: "Dragon hears 'Laurel.' Speech recognition technology today is based on artificial neural networks that are supposed to mimic the way the human brain works. We train these networks on thousands of hours of human speech so they can learn how the phonemes – the smallest building blocks of language – are pronounced. But it is clearly not an exact copy of how we as humans hear and interpret sounds, as this example shows. So, we can all still feel 'special' – especially those who hear 'Yanny'!"
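The idea Lenke describes, a network producing per-frame phoneme probabilities that a decoder scores against candidate words, can be illustrated with a toy sketch. Everything here is hypothetical for illustration: the tiny phoneme set, the hand-built frame posteriors, and the naive even-split alignment all stand in for a real acoustic model and decoder, and none of it reflects Dragon's actual pipeline.

```python
import numpy as np

# Candidate phonemes covering the two readings of the viral clip.
PHONEMES = ["l", "ao", "r", "el", "y", "ae", "n", "iy"]

def word_score(frame_posteriors, phoneme_seq):
    """Average log-probability of a word's phonemes along the audio frames.

    A real decoder searches over alignments; here we naively split the
    frames evenly across the word's phonemes.
    """
    idx = [PHONEMES.index(p) for p in phoneme_seq]
    chunks = np.array_split(np.arange(len(frame_posteriors)), len(idx))
    logp = 0.0
    for phone, frames in zip(idx, chunks):
        logp += np.log(frame_posteriors[frames, phone]).sum()
    return logp / len(frame_posteriors)

# Hypothetical per-frame posteriors, biased toward the l-ao-r-el phones,
# standing in for a neural network's output on the clip.
rng = np.random.default_rng(0)
post = rng.uniform(0.01, 0.05, size=(8, len(PHONEMES)))
for i, p in enumerate(["l", "ao", "ao", "r", "r", "el", "el", "el"]):
    post[i, PHONEMES.index(p)] = 0.6
post /= post.sum(axis=1, keepdims=True)  # rows sum to 1

laurel = word_score(post, ["l", "ao", "r", "el"])
yanny = word_score(post, ["y", "ae", "n", "iy"])
print("laurel" if laurel > yanny else "yanny")
```

With the posteriors biased toward the l-ao-r-el phones, the comparison comes out in favor of "laurel": a recognizer committed to what its training data taught it hears only one reading, while human listeners weigh the ambiguous frequencies differently.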

Artificial intelligence stands with Laurel. Maybe those of us in camp Yanny will be on the right side of technology for the next viral controversy.



A tribute to Stephen Hawking and his iconic voice

Stephen Hawking, the famous cosmologist and theoretical physicist, passed away last week at the age of 76. Like much of the world, I was fascinated by both his accomplishments and his iconic voice. Having spent my career in the text-to-speech field, I was lucky enough to meet Hawking once, working on his synthetic voice then, and again years later with Nuance.
Stephen Hawking speaks famous words with his computer-generated text-to-speech voice

The famous cosmologist and theoretical physicist Stephen Hawking passed away last week. Apart from his scientific contributions, he was also a role model for people living with a disability. “Concentrate on things your disability doesn’t prevent you doing well, and don’t regret the things it interferes with,” he said in a 2011 interview with the New York Times, “Don’t be disabled in spirit.” Millions of people became familiar with his synthetic, computer-generated voice, which he began using after losing the ability to speak in 1985.

The synthetic voice he used for more than 30 years was generated by a circuit board named CallText 5010, made by a company called Speech Plus, which is now part of the Nuance family. Hawking originally owned three copies of CallText, but one board broke after it fell to the ground. Concerned that the remaining hardware would break or fail in the future, Intel, which had started providing him with a PC and technical support, wanted to replace his hardware synthesizer with a software version. They didn’t want to risk leaving the scientist without his voice again.

When I was a postdoc at the Oregon Graduate Institute working on speech synthesis, I was contacted to help with the project. In the following months, I borrowed Hawking’s spare CallText board and recorded 2,000 speech sounds called diphones with it (synthesis by concatenation of diphones was the dominant text-to-speech technology in those days). When Professor Hawking was in Oregon for a lecture a few weeks later, I presented the new voice to him at his hotel room in downtown Portland. I connected it to a loudspeaker so he could hear the sample sentences I had prepared from his lectures read aloud. After a few minutes of silence (during which Hawking was typing), came the reply: “I like it. But more importantly, will my wife like it?”
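For readers unfamiliar with the technique mentioned above: diphone synthesis renders a word by stitching together recorded units that each span the transition between two adjacent phonemes. The following is a minimal sketch of that idea, with hypothetical phoneme and unit names, not the actual CallText inventory:

```python
# Sketch of concatenative diphone synthesis: a word is rendered by
# joining pre-recorded units that each cover the transition between
# two phonemes. Unit names here are illustrative only.

def word_to_diphones(phonemes):
    """Turn a phoneme sequence into the diphone units that cover it,
    padding with silence ('_') at the word edges."""
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

def synthesize(phonemes, unit_bank):
    """Concatenate the recorded samples for each diphone unit."""
    samples = []
    for unit in word_to_diphones(phonemes):
        samples.extend(unit_bank[unit])  # KeyError if a unit was never recorded
    return samples

# "laurel" as /l ao r ah l/ needs units: _-l, l-ao, ao-r, r-ah, ah-l, l-_
print(word_to_diphones(["l", "ao", "r", "ah", "l"]))
```

Because each unit captures a real transition between sounds, concatenation at the stable middle of phonemes sounds far more natural than gluing isolated phonemes together, which is why a bank of roughly 2,000 diphones could cover a language.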

That same evening, I attended his public lecture. I remember feeling a personal connection as he was presenting, and a sense of privilege to have been a part of his story, no matter how small. Hawking ultimately continued to use his original circuit board synthesizer during public appearances. After all, it was that voice that the world had come to recognize as the iconic Stephen Hawking; the new implementation just didn’t sound quite the same. Hawking’s synthesized voice was as much a part of him as our natural voices are of us.



In late 2017, we revisited a project first discussed a few years earlier: Nuance agreed to work with Professor Hawking and his team and to provide him with a version of the source code of his TTS voice, which we had stored in an archive. The goal was to transition Hawking’s system to a modern software version while maintaining the authenticity of his original voice. Unfortunately, he passed away before we could complete our work together.

Hawking once wrote on his website, “I try to lead as normal a life as possible.” Ironically, there wasn’t all that much normal about him. His wit was unmatched. He once acknowledged that although he had a PhD, “women should remain a mystery.” He was a brilliant physicist, a renowned cosmologist, a respected professor, and a prolific author. He won countless awards and held thirteen honorary degrees. Hawking’s professional success was matched only by the strength and depth of his personal relationships. “It would not be much of a universe if it wasn’t home to the people you love,” Hawking once said. He will be deeply missed by his family, friends, colleagues, and the countless people he inspired in the universe he helped explore. It is a gift to all of us that even when he could not speak, Stephen Hawking never lost his voice.




How the machines will adjust to us: A short story about “conversational AI” growing up

2018 is going to be an exciting year to witness the start of a huge leap in the area of conversational AI. Josefine Fouarge takes a look at how it has developed so far and where it’s headed very soon.

For years we have been trained on how to interact with machines – how to use a mouse, what to click for a specific action, and maybe even how to write code in a variety of languages. But talking, gestures, and facial expressions are the natural ways for us to communicate. Machines that can understand these nuances have, so far, existed only in Hollywood’s interpretation.

“So far” are the key words here. Technology has evolved to the point where it can interpret human language and draw a conclusion based on what was said or texted. The complex part here is not just the algorithm, though; it’s the ability to combine phonemes into words for speech recognition, letters into words for text recognition, and either of them into meaning – and then react based on that. 2018 is going to be an exciting year to witness the start of a huge leap in that area, because today’s technology is already capable of engaging with humans in a conversational way.
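The “words to meaning” step described above can be illustrated with a deliberately toy keyword-based intent mapper. Real conversational AI uses trained statistical models rather than lookup tables; the intent names below are illustrative assumptions:

```python
# Toy illustration of mapping recognized words to meaning (an intent).
# Production systems use machine-learned NLU models; this sketch only
# shows the shape of the problem: words in, structured meaning out.

INTENT_KEYWORDS = {
    "play_music": {"play", "music", "song"},
    "reset_password": {"reset", "password"},
    "check_weather": {"weather", "forecast"},
}

def interpret(utterance):
    """Pick the intent whose keywords overlap most with the utterance."""
    words = set(utterance.lower().split())
    best, score = "unknown", 0
    for intent, keys in INTENT_KEYWORDS.items():
        overlap = len(words & keys)
        if overlap > score:
            best, score = intent, overlap
    return best

print(interpret("play some music"))  # → play_music
```

The hard part a real system solves, and this sketch does not, is robustness: the same intent phrased a thousand different ways, in noisy audio, across accents and languages.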


Where do we start?

Where do we see conversational interfaces? Chatbots and virtual assistants are probably the best-known examples. Used in customer service scenarios, conversational interfaces can do a lot of things already. They can react to very specific scenarios like resetting a password, updating an address, or helping select a specific product. Usually, these can be found on a brand’s website, in its messaging and social channels, and even in the IVR.

If you have used a smart speaker like the Amazon Echo before, then you have dealt with a machine that turns your words into meaning. For example, when you ask Alexa to play music, it analyzes your request and then, as a result, starts to play some tunes. Have you ever called a brand and been told to simply ask a question instead of pressing “1”? That’s basically a virtual assistant with speech recognition.


What’s the next step?

There is a variety of conversational interfaces available: some provide a list of items from which a user can pick; others react to specific keywords and can be used as a simple Q&A. The next step beyond these rather “simple-minded” versions is a conversational interface that is capable of handling all sorts of conversations, back and forth, without the need for human intervention. Today’s state-of-the-art virtual assistant can disambiguate without a pick list, simply by asking for the missing information.
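Disambiguating by asking for missing information, as described above, is commonly called slot filling: the assistant tracks which parameters of a request are known and prompts only for the gaps. A minimal sketch, with hypothetical intent and slot names:

```python
# Slot-filling sketch: instead of presenting a pick list, the assistant
# asks a follow-up question for each required slot that is still missing.
# Intent names, slot names, and prompts are illustrative assumptions.

REQUIRED_SLOTS = {
    "book_flight": ["destination", "date"],
}

def next_prompt(intent, filled):
    """Return a follow-up question for the first missing slot,
    or None when the request is complete and can be fulfilled."""
    for slot in REQUIRED_SLOTS[intent]:
        if slot not in filled:
            return f"What {slot} would you like?"
    return None

print(next_prompt("book_flight", {"destination": "Berlin"}))
# → What date would you like?
```

This is what replaces “press 1 for …” menus: the dialogue narrows down the request naturally, one clarifying question at a time, rather than forcing the user through a fixed tree.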


That’s the goal.

The final, so far unsolved, stage is genuinely complex interaction: something that could sustain a heated discussion or a brainstorming session with a colleague – conversations that require a lot of external data or background information. These are the areas on which Nuance is working, turning automated conversations from a simple back and forth into an actual conversational tool that will allow you to augment your life.

To give you an idea of how this future could look, watch our vision of next generation omni-channel customer engagement.


Discover the intelligence behind our conversations

Conversational AI lets consumers engage in natural interactions through text or speech to gain immediate access to information and easy, effortless outcomes through IVR, messaging or web channels.

Learn more
