Accessibility of Data and Application of Algorithms to Provide Insights in Predictive Analytics

Michael Boruta is Optical Spectroscopy Product Manager at ACD/Labs.

Although there is a diverse range of applications for predictive analytics in R&D, two basic requirements are common to them: data and insight. Data may be generated by running experiments or analyses, or reused from previous work when available. Insights come from the application of knowledge, both explicit (learned through reading or formally accumulated) and tacit (accumulated through time and experience). Informatics plays a variety of roles in predictive analytics; here we discuss two of them: making data accessible and applying algorithms to provide insights.

Data Accessibility

Analytical techniques are applied in many areas of R&D to help characterize materials and understand their composition, quality, purity and physical properties. An understanding of the problem at hand helps scientists decide which experiments or analyses to run. If an organization can capture analytical knowledge effectively and make it instantly reusable, the initial investigation can focus on reapplying existing data. This can drive decisions quickly and give a more focused approach to further experimentation. The difficulty many organizations face is ineffective data capture and poor data accessibility.

Not only do we need data, we also need the ability to use it effectively. That raises three questions: why capture data, what data is required, and how should it be captured? Many past attempts at knowledge management failed because these questions of why, what, and how were never clearly answered. The result was often that everything was saved without regard to how the data would be used, with little information about the connections between pieces of data and the results or decisions drawn from them.

Knowledge gathered by scientists in their daily work is an invaluable asset. Many organizations now recognize a gap: existing informatics systems cannot capture this tacit analytical knowledge and turn it into explicit knowledge that can be reapplied. The heterogeneity of data in analytical chemistry leads to siloed data cultures, which makes data access challenging. Standardizing analytical data from different techniques and different vendor formats into a single environment brings all the relevant information together in one place, and automating data collection from instruments removes the tedious manual work of data retrieval. The final hurdle to data access is the ability to find what you want when you need it. Search parameters tied to analytical and chemical features (structure, substructure, and spectral or chromatographic text and numeric values) let scientists find the most relevant information quickly and easily.
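To make numeric spectral search more concrete, here is a minimal Python sketch. It assumes each spectrum has already been standardized into a vendor-neutral record holding a peak list and its metadata. The `SpectrumRecord` class, the `search` and `peak_match_score` functions, and the tolerance value are hypothetical illustrations, not part of any ACD/Labs product API, and the scoring rule (the fraction of query peaks matched within a tolerance) is just one simple choice among many.

```python
from dataclasses import dataclass, field


@dataclass
class SpectrumRecord:
    """Hypothetical vendor-neutral record: one spectrum plus its context."""
    sample_id: str
    technique: str                 # e.g. "IR" or "1H NMR"
    peaks: list                    # peak positions in the technique's own unit
    metadata: dict = field(default_factory=dict)  # e.g. experimental conditions


def peak_match_score(query, candidate, tol):
    """Fraction of query peaks that have a candidate peak within +/- tol."""
    if not query:
        return 0.0
    matched = sum(1 for q in query if any(abs(q - c) <= tol for c in candidate))
    return matched / len(query)


def search(library, query_peaks, technique, tol, min_score=0.5):
    """Score records of one technique against a query peak list, best first."""
    hits = []
    for record in library:
        if record.technique != technique:
            continue  # only compare like with like
        score = peak_match_score(query_peaks, record.peaks, tol)
        if score >= min_score:
            hits.append((score, record))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


if __name__ == "__main__":
    # Made-up peak lists (IR positions in cm^-1), for illustration only.
    library = [
        SpectrumRecord("A", "IR", [1715.0, 2950.0, 1450.0]),
        SpectrumRecord("B", "IR", [3300.0, 1650.0]),
        SpectrumRecord("C", "1H NMR", [1.2, 3.6]),
    ]
    for score, record in search(library, [1715.0, 2950.0], "IR", tol=10.0):
        print(record.sample_id, score)  # prints: A 1.0
```

Once data from different instruments share one record format, a single query can run across all of them; structure or substructure filters could, under the same assumptions, be layered on the `metadata` field in a similar way.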

Delivering Insights

Traditionally, knowledge was passed on through mentoring by an experienced colleague whose years in the field and skill at pattern recognition let them pick out the components of a complex mixture spectrum at a glance. Today, laboratories want this tacit knowledge to be available more immediately and to a wider audience. Algorithms that aid data analysis, together with a searchable knowledgebase of contextual data, can help deliver that insight. Because prediction algorithms are only as good as the data behind them, it is important that they can be retrained with new data so that they adapt to future needs.
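The point that prediction algorithms are only as good as the data behind them can be illustrated with a generic k-nearest-neighbour predictor. This is a minimal sketch, not the method used by any particular product: it assumes each compound is described by a hypothetical numeric descriptor vector, predicts a property as the mean of the closest training examples, and declines to predict when no training example lies within a distance cutoff. The class name, `k` and `max_distance` are illustrative assumptions.

```python
import math


class NearestNeighborPredictor:
    """Hypothetical k-nearest-neighbour property predictor.

    Each training example is (descriptor vector, measured value). A query is
    answered only if at least one of its k nearest examples lies within
    max_distance, a crude stand-in for an applicability-domain check.
    """

    def __init__(self, k=2, max_distance=1.0):
        self.k = k
        self.max_distance = max_distance
        self._train = []  # list of (descriptors, value)

    def add(self, descriptors, value):
        """Add one training example; later predictions use it immediately."""
        self._train.append((list(descriptors), float(value)))

    def predict(self, descriptors):
        """Mean value of the k nearest examples within max_distance, else None."""
        ranked = sorted(
            (math.dist(descriptors, x), y) for x, y in self._train
        )
        nearest = [y for d, y in ranked[: self.k] if d <= self.max_distance]
        if not nearest:
            return None  # query lies outside what the training data covers
        return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Made-up two-descriptor examples, for illustration only.
    model = NearestNeighborPredictor(k=2, max_distance=1.0)
    model.add([0.0, 0.0], 10.0)
    model.add([0.5, 0.0], 12.0)
    print(model.predict([0.2, 0.0]))  # 11.0, the mean of both neighbours
    print(model.predict([5.0, 5.0]))  # None: nothing similar in training data
    model.add([5.0, 4.8], 30.0)       # new data extends coverage
    print(model.predict([5.0, 5.0]))  # 30.0
```

Refusing to answer outside the training data, rather than extrapolating, makes the dependence on data explicit: the only way to extend the model's reach is to add relevant examples.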

Predictive analytics is a vast field. It allows scientists to apply knowledge beyond their immediate experience to see trends, predict behaviors and make decisions based on smaller amounts of data.

Many solutions are available to aid scientists. For example, the ACD/Spectrus Platform was developed to help organizations capture the chemical context of analytical data, whether that context takes the form of metadata (such as experimental conditions) or of individual users' tacit knowledge about the importance of specific spectrum-structure correlations. When data is standardized and captured in context, it is easier to apply beyond its original purpose. Furthermore, this vendor-neutral environment addresses the problem of data heterogeneity, and its built-in chemical intelligence interconnects all relevant data, providing a full map of information for future use. Finally, because the Spectrus Platform captures live data that can be instantly reanalyzed and reapplied, it is an effective solution for analytical and chemical knowledge management.

Chemically intelligent algorithms in various applications help scientists gain insight from data. For example, NMR chemical shift prediction algorithms assist in interpreting the spectra of new molecules. Adding structure/shift assignments for new classes of molecules or for different nuclei not only helps improve prediction performance but also extends the applicability of the algorithms to novel chemical space. Similarly, molecular property prediction algorithms, such as those in ACD/Percepta, and retention time prediction algorithms, like those in ACD/Spectrus chromatography applications, learn from new data added to the training set to aid the scientist. Clustering algorithms also show how data points relate to one another, and they allow new samples to be compared rapidly with previously clustered and reviewed data so that the correct classifications can be assigned.
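Comparing a new sample with previously clustered and reviewed data can be sketched as nearest-centroid assignment. The sketch assumes spectra have already been resampled onto a common axis, so they can be compared as equal-length intensity vectors, and that the cluster labels were assigned earlier by a scientist. The `ReviewedClusters` class and the 0.9 similarity threshold are hypothetical, and cosine similarity is one common way to compare spectra, not necessarily the one used in any specific product. Note that this shows only the assignment step, not the clustering itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ReviewedClusters:
    """Hypothetical store of spectra a scientist has already grouped and labeled."""

    def __init__(self):
        self._members = {}  # label -> list of intensity vectors

    def add_reviewed(self, label, spectrum):
        """Record a reviewed spectrum; this changes the centroid for its label."""
        self._members.setdefault(label, []).append(list(spectrum))

    def centroid(self, label):
        """Element-wise mean of all spectra filed under one label."""
        members = self._members[label]
        return [sum(column) / len(members) for column in zip(*members)]

    def classify(self, spectrum, min_similarity=0.9):
        """Return (label, similarity) for the most similar centroid,
        or (None, similarity) if even the best match is below the threshold."""
        best_label, best_sim = None, -1.0
        for label in self._members:
            sim = cosine_similarity(spectrum, self.centroid(label))
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_sim < min_similarity:
            return None, best_sim
        return best_label, best_sim


if __name__ == "__main__":
    # Toy three-bin intensity vectors, made up for illustration only.
    clusters = ReviewedClusters()
    clusters.add_reviewed("grade A", [1.0, 0.0, 0.2])
    clusters.add_reviewed("grade A", [0.9, 0.1, 0.2])
    clusters.add_reviewed("grade B", [0.0, 1.0, 0.5])
    print(clusters.classify([0.95, 0.05, 0.2]))  # closest to "grade A"
    print(clusters.classify([0.0, 0.0, 1.0]))    # no close match, label is None
```

Returning `None` for a poor match, rather than forcing the nearest label, leaves genuinely novel samples for a scientist to review; once reviewed and added with `add_reviewed`, they shift the centroids used for later assignments.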

