Imagine you feel a lump in your breast. Your doctor sends you to get a mammogram, which is read by a radiologist who trained for many years to recognize the faint patterns of suspicious masses. The region in question does not clearly fit the common pattern for a benign mass, so you get a needle biopsy to extract a piece of tissue from the suspected area for further testing.
The sample is sent to a lab, where a technician slices the tissue and fixes it on a glass slide for a pathologist to review. Like the radiologist, the pathologist is a human trained to interpret visual patterns. She scrutinizes the slide, inspecting the tissue at high magnification for regions of cells that look abnormal. Based on years of training and a well-tuned eye, she creates a report indicating the type and severity of your breast cancer along with how you might respond to certain treatments.
Finally, your oncologist reviews the report. She chooses from hundreds of treatment options the one she thinks is right for your tumor, a decision that could be a matter of life or death.
This story plays on repeat across the world. The cancer diagnostic chain, while mostly good, remains a fragile process susceptible to human error.
The explosion of intelligent data
The last two decades have produced a data explosion in healthcare. New tests and therapies are emerging, and the patient population is growing, underpinned by an increased emphasis on digitization. Data is being created faster than humans can make sense of it, overwhelming a declining population of diagnosticians.
While advances in machine learning have incrementally moved artificial intelligence (AI) from science fiction to reality, breakthroughs in hardware and algorithmic techniques at the beginning of this decade sparked an explosion of possibility. The surplus of data and the shortage of human experts to interpret it make cancer diagnostics a prime opportunity to leverage computational approaches that drive objectivity and unlock diagnostic efficiencies.
This confluence of trends is relatively new, but quickly evolving, especially given the constraints unique to healthcare. We’re beginning to see new applications prove themselves in controlled academic settings, become validated in the wild, and percolate into medical practice across radiology, pathology, molecular diagnostics, and more.
In 2016 and 2017, researchers held a two-part worldwide competition to pit AI-based cancer diagnostic systems against one another. CAMELYON16 tested whether the systems could detect the presence of cancer cells in pathology images, called whole-slide images, and CAMELYON17 tested whether the systems could stage the cancer and locate it on the slide image. Participants were provided with 899 whole-slide images, some of which were confirmed as cancerous and others as cancer-free, for developing their algorithms. The systems were evaluated against a test set of 500 whole-slide images from 100 patients.
Overall, the participants in the CAMELYON challenge performed well. The quadratic weighted Cohen’s kappa metric, which measures agreement with the human experts’ reference labels, ranged from -0.13 to 0.89, with the top performers frequently in concordance with the experts’ determination of cancer stage and location. The systems had the most difficulty identifying smaller metastases, with a detection rate below 40 percent, though that’s a difficult task even for human experts. Post-event analysis also showed that combining multiple AI algorithms resulted in higher kappa metrics than any single algorithm.
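For readers unfamiliar with the metric, here is a minimal sketch of how a quadratic weighted Cohen’s kappa can be computed. It is not the challenge’s own scoring code. It assumes the stage labels are encoded as ordinal integers starting at 0 (five classes are used here purely for illustration), and the function name and sample data are hypothetical. The key idea is that a disagreement is penalized by the square of the distance between the two labels, so confusing adjacent stages costs far less than confusing distant ones.

```python
from collections import Counter


def quadratic_weighted_kappa(reference, predicted, num_classes):
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_classes-1."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    if num_classes < 2:
        raise ValueError("need at least two classes")

    n = len(reference)
    observed = Counter(zip(reference, predicted))  # joint label counts
    hist_ref = Counter(reference)                  # marginal counts, reference
    hist_pred = Counter(predicted)                 # marginal counts, predictions
    max_dist = (num_classes - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / max_dist  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[(i, j)] / n
            # Agreement expected by chance if the two label sets were independent
            weighted_expected += weight * hist_ref[i] * hist_pred[j] / (n * n)

    if weighted_expected == 0:
        # Both lists contain one identical class; kappa is undefined here.
        # Treating it as perfect agreement is an assumption of this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical example: expert stages vs. a model's stages for six patients
expert_stages = [0, 1, 2, 3, 4, 2]
model_stages = [0, 1, 2, 4, 4, 1]
print(quadratic_weighted_kappa(expert_stages, model_stages, num_classes=5))
```

A score of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance, which is how a system in the challenge could land below zero.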
For pathology, perhaps the most important link in the cancer diagnostics chain, the CAMELYON challenges were highly publicized events that proved models could be trained to exceed human-only performance in diagnosis. In less than two years, they have sparked a flurry of efforts to demonstrate how this class of algorithms could be scaled to address real-world heterogeneous data. These models are already being applied in pilot settings for disease states like melanoma or prostate cancer, not yet as a surrogate for the human pathologist but as a supplement: quality control systems that ensure risky cases are not overlooked.
These and a number of other computer-assisted diagnosis applications are already affecting radiology and pathology, providing a virtual second opinion or suggesting areas in scans or tissue that might require a closer look by a human expert. On the horizon are new computer-based screening methods that allow pathologists and radiologists to eliminate obviously benign or straightforward cases from heavy workloads, enabling them to focus on cases that need attention and possibly further testing. Soon, we may see AI-based companion diagnostics, which enable a targeted approach to matching patients with the best therapy for the profile of their tumor, a core theme of precision medicine in cancer.
The black box fallacy
Progress from a field defined largely by academic work into real commercial applications has happened very rapidly. Many points along the diagnostic chain are poised for rapid transformation. But it’s still early days, and nothing happens overnight in healthcare. Patient data is sensitive, and human lives hang in the balance, two defining characteristics of AI in cancer that gate the translation of research to commercial applications. Increasingly impactful applications and widespread adoption will require costly and time-consuming validation and regulatory work.
Much of the regulatory debate in applying AI to diagnostics centers on the opaque nature of the underlying processes that drive decisions in deep learning models, often considered a “black box.” Historically, the mechanisms behind diagnostics approved by regulatory bodies have been well known and easily explained. Some question whether deep learning models will be able to pass the same level of scrutiny that regulatory bodies require. But this challenge is already proving to be a surmountable one.
In early 2017, the U.S. Food and Drug Administration (FDA) approved the first AI-based tool for use in a clinical setting. The Arterys Cardio DL software reviews conventional cardiac MRI images to produce ventricular segmentations in a matter of seconds. Perhaps indicating the role future AI tools might play in clinical applications, Cardio DL is at least as accurate as and far faster than a human expert and provides editable results to supplement the analysis. Since then, the FDA has approved additional AI-based tools for assessing and diagnosing diabetic retinopathy, wrist fractures, and stroke damage.
A new way to diagnose cancer
Machine learning is opening up possibilities in healthcare and cancer diagnosis. AI is now being incorporated into tools for a wide range of clinical applications. We’re already seeing this transformation unfold, and widespread adoption is on the horizon.