Overlap Density Heatmap Technology
A novel visualization tool for spectroscopic and metabolomics studies
Spectroscopists typically generate large amounts of spectral and chromatographic data from various analytical instruments. These data require an efficient data mining and analysis process in order to sieve the requisite information from the datasets. In many cases, it is necessary to display multiple spectra at the same time in order to identify trends in the data. However, techniques employing stacked or offset display modes to visualize large amounts of spectral or chromatographic data can make it very difficult to discern the trends and to make correct interpretations. This problem is well illustrated by the example depicted in Figure 1, which shows a traditional graphical representation of multiple infrared (IR) spectra. Each spectrum is arbitrarily assigned a unique color. While there is clearly overlap between the different spectra, it is difficult, if not impossible, with this traditional type of display to visualize the areas of highest overlap among the IR spectra displayed.
Systems and methods designed to overcome this data visualization problem should offer useful displays of overlapping data, and should allow for flexible manipulation of the displays to provide enhanced data mining and trend visualization ability. One new technology designed to address the simultaneous display of large sets of spectral and chromatographic data is termed Overlap Density Heatmaps (ODH).1 This new approach provides a clear and concise way of discerning similarity (commonality) and dissimilarity (uniqueness) among a large set of overlaid spectral data. In this way, it offers a distinct look into traditional analytical research as well as the new area of biomarker identification and metabolomics research.
Figure 2a shows an ODH display of the same spectral overlay depicted in Figure 1. The display highlights common and unique features of overlapped objects by color-coding the levels of overlap. By changing the OD scale (Figure 2b), one can choose to display only those features of a certain level of commonality (ODC) or uniqueness (ODU) and can generate their respective consensus spectra. Areas of highest similarity are colored red, areas of highest uniqueness or dissimilarity are colored violet, and all regions of moderate overlap are shown in intermediate colors. By adjusting the OD slider in Figure 2b, unique or similar features can be easily discerned from large numbers of overlaid spectra. In principle, there is no limit to the number of spectra that can be visualized by ODH; limitations are set only by the memory of the user’s computer hardware.
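The idea of counting overlap and then filtering by level can be illustrated with a small sketch. It is not the KnowItAll implementation, whose algorithm and OD scale are not described here. It assumes that all spectra share the same x-axis, that intensities are scaled to the range 0 to 1, and that "overlap density" means the fraction of spectra passing through each cell of an (x position, intensity bin) grid. The function names and the binning scheme are hypothetical.

```python
from collections import Counter


def overlap_density(spectra, n_bins=10, lo=0.0, hi=1.0):
    """Map each (x_index, intensity_bin) cell to the fraction of spectra crossing it."""
    n = len(spectra)
    width = (hi - lo) / n_bins
    counts = Counter()
    for spectrum in spectra:
        for i, y in enumerate(spectrum):
            # Clamp so that values at or beyond the limits stay inside the grid.
            b = min(max(int((y - lo) / width), 0), n_bins - 1)
            counts[(i, b)] += 1
    return {cell: c / n for cell, c in counts.items()}


def select_cells(density, level, common=True):
    """Keep cells at or above `level` (common) or at or below it (unique)."""
    if common:
        return {cell for cell, d in density.items() if d >= level}
    return {cell for cell, d in density.items() if d <= level}


def consensus_spectrum(density, n_points, n_bins=10, lo=0.0, hi=1.0):
    """At each x position, return the centre of the most densely occupied bin."""
    width = (hi - lo) / n_bins
    out = []
    for i in range(n_points):
        column = {b: d for (x, b), d in density.items() if x == i}
        # Ties in density are resolved toward the lower intensity bin.
        best = max(column, key=lambda b: (column[b], -b))
        out.append(lo + (best + 0.5) * width)
    return out


# Three toy spectra; the last one departs from the others at the third point.
spectra = [
    [0.14, 0.52, 0.85, 0.31],
    [0.12, 0.55, 0.88, 0.33],
    [0.11, 0.58, 0.15, 0.35],
]
density = overlap_density(spectra)
common = select_cells(density, level=1.0)                   # shared by every spectrum
unique = select_cells(density, level=1 / 3, common=False)   # seen in one spectrum only
consensus = consensus_spectrum(density, n_points=4)
```

In this toy case the only "unique" cell is the one produced by the outlying third point of the last spectrum, which is the kind of feature a violet region would flag in the display. A real display would also map each density value to a color between violet and red.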
Below, we describe two examples of the application of ODH to analytical and metabolomics studies.
• Infrared analysis of fiber mixtures
An IR spectrum of a nylon-rayon mixture was used as a search query against an IR spectral database.2 A maximum of 50 hits were requested. Principal component analysis (PCA)3 of the resulting hit list yielded clear separation between the 15 nylon and 35 rayon spectra; the search query — a nylon-rayon mixture — was located in the middle region of the PCA scores plot between the tight clusters of nylon and rayon hits.4 ODH displays of the spectral overlay of each class of fibers in the PCA plot highlight the level of similarity within each class. The areas of highest overlap in the ODH display for each class are shown in Figure 3, highlighting both the clear difference in the spectra between these two classes of fibers as well as the high degree of internal similarity within each class.
ODH shows a clear utility in assessing the two components of this mixture. Superimposing the search query spectrum (mixture) onto the OD display of the pure rayon hits (Figure 4a) and moving the OD slider to the area of maximum dissimilarity (OD level = -88; violet) reveals a spectrum of the nylon class (effectively subtracting out the more common rayon spectral features in the combined set). Conversely, moving the OD slider to the area of dissimilarity (OD level = -88; violet) when the mixture spectrum is overlaid onto the set of nylon hits yields a spectrum for the other component in the mixture, rayon (effectively subtracting out the more common nylon spectral features in the combined set) (Figure 4b).
Figure 2: (a) ODH display of the multiple IR spectra shown in Figure 1, (b) color-coded slider (OD level = 0) for selectively visualizing regions of maximum similarity (red) or maximum dissimilarity (violet) in overlaid spectra.
This study is an excellent example of how color-coded ODH can easily delineate mixture components when the OD slider is adjusted to highlight areas of similarity or dissimilarity. Such information is difficult, if not impossible, to discern from a traditional stacked display of multiple spectra.
• NMR-based metabolomics study of diabetic and non-diabetic patients
Thirty-seven 1H NMR spectra of human serum samples from diabetic and non-diabetic patients were analyzed using the ODH approach in a biomarker identification study.5 The data were placed into a database and sorted by “status” into the two sample populations. The spectra of the normal samples (24) were overlaid, and an OD consensus spectrum was generated. This process was repeated for the diseased samples (13). A difference spectrum (Figure 5) was generated by subtracting the OD consensus spectrum of the normal samples from that of the diabetic samples in order to identify peak regions that may be diagnostic for a potential biomarker. Using the difference OD spectrum as a search query against the 1H NMR library of 256 common metabolites retrieved D-glucose and its derivatives among the top hits.6 D-glucose is a known biomarker for diabetes, confirming the utility of the ODH spectral approach in metabolomics studies.
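The workflow of forming a consensus per group, taking the difference, and searching a library can be sketched as follows. This is an illustration only. It assumes spectra sampled on a common axis, uses the point-by-point median as a stand-in for the OD consensus spectrum, and ranks library entries by cosine similarity, which is an assumption because the scoring used by the actual database search is not stated. The metabolite names and all numbers are made up.

```python
from math import sqrt
from statistics import median


def pointwise_consensus(spectra):
    """Stand-in consensus: the median intensity at each point across a group."""
    return [median(column) for column in zip(*spectra)]


def difference_spectrum(diseased, normal):
    """Consensus of the diseased group minus consensus of the normal group."""
    d = pointwise_consensus(diseased)
    n = pointwise_consensus(normal)
    return [a - b for a, b in zip(d, n)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_library(query, library):
    """Return library entry names ordered from best to worst match."""
    return sorted(library,
                  key=lambda name: cosine_similarity(query, library[name]),
                  reverse=True)


# Toy data: the diseased group has extra intensity at points 1 and 3.
normal = [[1.0, 0.2, 0.5, 0.1], [1.1, 0.3, 0.5, 0.1], [0.9, 0.2, 0.6, 0.2]]
diseased = [[1.0, 0.9, 0.5, 0.8], [1.1, 1.0, 0.5, 0.7]]

# Hypothetical reference library with peaks at different positions.
library = {
    "metabolite_A": [0.0, 1.0, 0.0, 1.0],
    "metabolite_B": [1.0, 0.0, 0.0, 0.0],
    "metabolite_C": [0.0, 0.0, 1.0, 0.0],
}

diff = difference_spectrum(diseased, normal)
ranking = rank_library(diff, library)
```

Here the entry whose peaks coincide with the elevated regions of the diseased group ranks first, which mirrors how D-glucose surfaced among the top hits in the study.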
A far more powerful approach to spectral analysis and pattern recognition in metabolomics is afforded by combining the PCA and ODH approaches. For the same biofluid samples described above, we applied PCA to separate the two classes of diseased and normal patient populations. The scores plot shows clear class separation. The loadings plot along PC2 was compared to the consensus OD difference spectrum and found to share close similarities (Figure 6). Confirmation of the utility of the ODH-based approach to metabolomics was afforded when the loadings plot, used as a search query against the metabolite database, retrieved D-glucose as one of the top hits (Figure 7), a result consistent with that obtained in the ODH-based study.
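The comparison between a PCA loadings vector and a difference spectrum can be illustrated with a short sketch. It makes several assumptions: the data are toy values, the leading component is found by plain power iteration rather than by the software used in the study, and the difference spectrum is approximated by the difference of group means. In this toy data the class difference dominates the variance and therefore appears on the first component, whereas the study reports it along PC2.

```python
from math import sqrt


def center(rows):
    """Subtract the column mean from every variable."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_loading(rows, iterations=200):
    """Leading principal-component loadings via power iteration on X^T X."""
    x = center(rows)
    n_vars = len(x[0])
    v = [1.0 / sqrt(n_vars)] * n_vars
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]   # X v
        w = [sum(s * row[j] for s, row in zip(scores, x))            # X^T (X v)
             for j in range(n_vars)]
        norm = sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


def group_mean(spectra):
    return [sum(col) / len(spectra) for col in zip(*spectra)]


# Toy data: the diseased group has extra intensity at points 1 and 3.
normal = [[1.0, 0.2, 0.5, 0.1], [1.1, 0.3, 0.5, 0.1], [0.9, 0.2, 0.6, 0.2]]
diseased = [[1.0, 0.9, 0.5, 0.8], [1.1, 1.0, 0.5, 0.7]]

loading = first_loading(normal + diseased)
diff = [d - n for d, n in zip(group_mean(diseased), group_mean(normal))]

# The sign of a loadings vector is arbitrary, so compare on absolute correlation.
similarity = abs(pearson(loading, diff))
```

A high absolute correlation between the loadings and the difference spectrum indicates that both point at the same spectral regions, which is the kind of agreement reported in Figure 6.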
ODH technology is a new tool that promises to have a significant impact on how large spectral or chromatographic datasets are visualized and analyzed. The examples presented here show the applicability of this tool within traditional analytical research, as well as the emerging area of metabolic profiling and metabolomics. In the diabetic study, the ODH-based difference spectrum identified the regions that differ between diseased and control samples, and a search of a metabolite database using this difference spectrum yielded a known biomarker for this disease type. In addition, the ODH-based difference spectrum correlates well with the loadings plot from a principal component analysis of the data. This suggests that the combination of ODH and PCA can be a very powerful diagnostic tool for spectral visualization and chemometric analysis in metabolomics-related research. Further studies are underway to extend the application of this tool to other spectroscopic and chromatographic datasets.
1. Overlap density heatmap application released as part of KnowItAll Informatics System 7.0, June 2006, Bio-Rad Laboratories
2. Sadtler “IR-Fibers by Microscope” available as part of HaveItAll spectral database, Bio-Rad Laboratories
3. Principal component analysis performed with AnalyzeIt MVP KnowItAll application built using Infometrix Pirouette IPAK technology
4. G. M. Banik and M. Scandone, Int. LabMate, 2006, Vol. XXXI, Iss. II., p. 76.
5. Dataset from Professor Bin Xia, Beijing NMR Center, Peking University, Beijing 1088971, China
6. 1H and 13C NMR database of metabolites obtained from Biological Magnetic Resonance Bank www.bmrb.wisc.edu
Omoshile Clement is Manager, Product Management; Ty Abshear is Manager, Software Development and Databases; Gregory M. Banik is General Manager, Informatics Division; and Chen Peng is Senior Scientist, Software Development, at Bio-Rad. They may be reached at editor@ScientificComputing.com