Recognizing that administrative health care databases can be a valuable, yet challenging, tool in the nation’s ongoing pursuit of personalized medicine, statisticians Liangyuan Hu and Madhu Mazumdar of the Icahn School of Medicine at Mount Sinai have developed advanced statistical modeling and analytic tools that can make health care and medical data more meaningful. Hu will present their findings August 3 at the 2017 Joint Statistical Meetings (JSM) in Baltimore, Md.
The availability of large electronic health record databases holds promise for medical discovery and for efforts to develop individualized treatments. “Powerful statistical analyses and results from these records and databases can be the foundation on which informed medical questions are asked and decisions are made,” notes Hu.
For example, doctors seeking to provide optimal treatment for high-risk prostate cancer patients could consider several radical prostatectomy (RP) or radiotherapy (RT) modalities. But because it is difficult to conduct randomized controlled trials that would yield high-quality results comparing RP with RT for long-term survival in such a high-risk group, physicians must rely on existing data to make precise, customized decisions. “Therefore, finding evidence using statistical tools from large, representative national databases is crucial to inform such critical medical decisions,” says Hu.
Using a case study in chronic diseases, Hu will show the challenges typically associated with drawing inferences from electronic health records and administrative databases. Limitations such as uncontrolled data collection settings, practice variation among physicians and missing data can lead to false conclusions if they are not properly addressed with rigorous statistical methods. Hu and Mazumdar’s methods leverage machine learning and flexible models to draw valid inferences from electronic health records that are sampled from a representative population and reflect outcomes from actual clinical practice.
“In clinical prediction studies, we show that combining strengths of nonparametric algorithms and parametric models leads to the development of a data-driven and reproducible tool that will not only generate immediate public health impact, but also advance developments in statistical methodology pertaining to drawing valid and useful information from vast data sources,” concludes Hu.