A collaboration among the National Center for Supercomputing Applications (NCSA) at the University of Illinois, Mayo Clinic, and the University of Michigan is introducing a new machine-learning-driven approach to diagnosing latent tuberculosis infection (LTBI). By leveraging a high-throughput detection technology and powerful bioinformatics, this approach aims to reveal multi-marker signatures for LTBI diagnosis and risk stratification.
Latent tuberculosis infections (LTBI) are estimated to affect nearly one-quarter of the world’s population, and roughly 10 percent of infected, immunocompetent people will progress to active tuberculosis (TB). Because current diagnostics cannot definitively identify LTBI and provide no insight into reactivation risk, there remains an unmet diagnostic challenge of global significance.
This approach is enabled through an individualized normalization procedure that allows disease-relevant biomarker signatures to be revealed in a basal immune response. Specifically, cytokines (cell-signaling proteins) were detected using silicon photonic sensor arrays, and multidimensional data correlation of individually normalized immune responses revealed signatures important for LTBI status.
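The article does not spell out the normalization procedure. As a minimal sketch of one plausible reading, the Python below expresses each marker as a fold change over the same subject's own baseline measurement. The function name, the marker names, and the fold-change formula are all assumptions made for illustration, not the study's actual method.

```python
def normalize_to_baseline(stimulated, baseline, floor=1e-9):
    """Express each marker as a fold change over the subject's own baseline.

    Hypothetical helper: the study's real normalization may differ.
    `floor` guards against division by a zero baseline reading.
    """
    return {
        marker: stimulated[marker] / max(baseline[marker], floor)
        for marker in stimulated
    }


# Made-up readings for a single hypothetical subject (arbitrary units).
baseline = {"marker_1": 2.0, "marker_2": 10.0}
stimulated = {"marker_1": 8.0, "marker_2": 10.0}

fold_change = normalize_to_baseline(stimulated, baseline)
print(fold_change)  # marker_1 rose fourfold, marker_2 did not change
```

Normalizing within each subject, rather than against a population average, keeps person-to-person differences in baseline immune activity from masking the response of interest.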
“The difficulties of extracting biomarker signatures from LTBI data is primarily analytical because these data belong to a class of problems where the number of measured predictors is larger than the number of samples in the study,” said Heather Robison, a former postdoctoral scholar at the University of Michigan who completed her Ph.D. at the University of Illinois. “We have employed an approach from machine learning based on random forest, Boruta, which has shown success in biomarker signature extraction particularly for this type of data set.”
Boruta, a feature selection method, uses a Random Forest classification algorithm with shadow features: randomized copies of the real features that carry no true signal by design and set a threshold that genuinely important features must exceed. This approach removes biomarkers that show no correlation with specific LTBI-related diagnoses while retaining all features with importance above the noise, thereby avoiding the elimination of relevant features.
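The shadow-feature idea can be illustrated with a short, self-contained Python sketch. This is not the Boruta implementation used in the study. To stay dependency-free it swaps Random Forest importance for a simple absolute correlation score, and it replaces Boruta's statistical test with a plain hit-rate cutoff. The data, the marker names, and the parameters n_rounds and min_hit_rate are hypothetical.

```python
import random


def importance(values, labels):
    """Absolute Pearson correlation with the labels.

    Assumption: a simple stand-in for Random Forest feature importance.
    """
    mean_v = sum(values) / len(values)
    mean_l = sum(labels) / len(labels)
    cov = sum((v - mean_v) * (l - mean_l) for v, l in zip(values, labels))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    if var_v == 0 or var_l == 0:
        return 0.0
    return abs(cov) / (var_v * var_l) ** 0.5


def shadow_feature_selection(features, labels, n_rounds=50,
                             min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shadow feature in most rounds."""
    rng = random.Random(seed)
    # Real importances are fixed here because the score is deterministic;
    # a Random Forest would be refit, and would vary, in every round.
    real_scores = {name: importance(vals, labels)
                   for name, vals in features.items()}
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        # Shadow features: shuffled copies of each real feature, so any
        # link to the labels is destroyed by construction.
        shadow_scores = []
        for vals in features.values():
            shadow = list(vals)
            rng.shuffle(shadow)
            shadow_scores.append(importance(shadow, labels))
        threshold = max(shadow_scores)
        for name, score in real_scores.items():
            if score > threshold:
                hits[name] += 1
    selected = [name for name, h in hits.items()
                if h / n_rounds >= min_hit_rate]
    return selected, hits


# Synthetic example: one informative marker and three pure-noise markers.
data_rng = random.Random(42)
labels = [0, 1] * 20  # 40 hypothetical samples, two classes
features = {
    "marker_signal": [y + data_rng.gauss(0, 0.2) for y in labels],
    "marker_noise_1": [data_rng.gauss(0, 1) for _ in labels],
    "marker_noise_2": [data_rng.gauss(0, 1) for _ in labels],
    "marker_noise_3": [data_rng.gauss(0, 1) for _ in labels],
}

selected, hits = shadow_feature_selection(features, labels)
print(selected)
print(hits)
```

Because the threshold comes from the best-scoring shadow in each round, a real feature is kept only if it consistently outperforms what randomness alone can produce, which matters most when there are more measured predictors than samples.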
“The VI-Bio group has developed an environment that incorporates approaches from the domain of machine learning for feature selection, feature ranking, and predictive modeling which was utilized for this project,” said Loretta Auvil, data scientist. “We strive to identify the definitively important features from the large number of features that have been gathered.”
In addition to the Boruta analysis, each of the selected features was further evaluated statistically to confirm its differential diagnostic capacity relative to LTBI status.
“A multi-array test can provide a more detailed, disease-specific glimpse into patient’s infection and likely outcome,” said Ryan Bailey, professor in the Department of Chemistry at the University of Michigan and an author on the study. “Using a precision medicine approach reveals previously obscured diagnostic signatures and reactivation risk potential.”
Since 2010, the Mayo Clinic & Illinois Alliance for Technology-Based Healthcare has launched numerous collaborations that have grown and continued for years. “Tackling these hard problems takes strong commitments from a diverse group of people,” said Colleen Bushell, senior research scientist and lead of NCSA’s VI-Bio group. Thanks to the Alliance’s funding and efforts to bring together innovative teams, this collaboration has leveraged NCSA’s expertise in machine learning and computation to make strides in precision healthcare, using big data to improve diagnostic treatments and early detection while keeping patients at the center of its focus.
These results demonstrate a powerful combination of multiplexed biomarker detection technologies, precision immune normalization, and feature selection algorithms that revealed positively correlated multi-biomarker signatures for LTBI status and reactivation risk stratification from a relatively simple blood-based assay.
Learn more about the study, published in Integrative Biology.