Your corresponding editor really loves to review these genomics programs, as genomics (the study of the entire gene complement in an organism) is his area of research, and an exciting one at that. It is now at the center of a cutting-edge movement within personalized medicine, as we have come to realize that no two humans are exactly alike and that drugs and other treatments that are useful in one patient may not help (or may actively hurt) another. The software for doing this is highly advanced in that its functioning mates the precision of mathematics and statistics with the variability of biology. Now on to Partek’s latest version…
Version 6.6 is unique in that it can easily integrate data from a variety of sources, assays and vendors into a single study, which is very useful from the standpoint of biological interpretation. Unfortunately, we have no genomics software that runs the full gamut of integrating all known inputs to new drug discovery (e.g., pathway analysis, proteomics, post-translational events, epigenetic events and drug metabolism, to name but a few). Researchers in the area quickly learned that there is much more involved in drug discovery than simply gene expression. Still, in the strictly genomic arena, the software has much to recommend it, including ease-of-use features, a gradual learning curve, customization of many features, excellent help materials, and a strong tech support department, all within a menu-driven base!
Enhanced features in this latest version include:
- gene expression
- miRNA expression
- exon expression
- copy number
- allele-specific copy number
- loss of heterozygosity (LOH)
All statistical features standard to genomic analysis and graphics are included:
- three-dimensional PCA plots to visualize data distribution and outliers
- hierarchical clustering to classify vast numbers of genes into revealing patterns
- profile trellis to identify groups of differentially expressed genes
- Venn diagrams to identify common and unique sample characteristics
- motif discovery to determine binding-site motifs of sequences
- gene ontology enrichment to determine gene grouping based on their molecular functions
- chromosome viewer to visualize how samples map to the reference genome with customizable tracks for SNP detection, differential expression, peak detection, gene annotation and methylation regions, all integrated into a single study
As usual, the import and data formatting functions are a pain until the user gains a bit of familiarity with the system. As the dialog boxes get more intelligent, they keep asking for further information. There are also nomenclature problems, as different programs ask for the same thing in different words. For example, my old WinZip program let me Unzip a file, whereas it now asks me to Extract it. Likewise, when programs ask for the “type” of data, some specify the precision, some the mathematical type (real, integer, text), and others, such as my venerable JMP, classify data as continuous, ordinal or nominal!
It takes a bit of practice to translate from one to the other. These are actually minor quibbles, as the user soon acquires the knowledge needed to address the software’s idiosyncrasies (it doesn’t take long; the easiest way is to call the help desk if you are impatient). With a little practice, the tools are a delight to use and are comprehensive, especially for exploratory analysis and for assisting in biological interpretation. Specifically, these include
- Venn diagrams
- clustering with “heat maps”
- parametric and non-parametric ANOVA
- principal components analysis
- correlation matrices (with automatic “find correlated variables” function)
- parametric and non-parametric one and two sample tests
- automatic removal of batch effects
- multiple test correction
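For readers unfamiliar with the last item, multiple test correction adjusts the p-values that result from testing thousands of genes at once. The review does not say which procedures the software offers, so the following is only a minimal sketch of one widely used method, the Benjamini-Hochberg false discovery rate adjustment. The function name and the sample p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative only: this is a generic textbook procedure, not a
    description of how any particular package implements it.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}   adjusted = {q:.5f}")
```

Note how the two p-values just under 0.05 no longer pass a 0.05 cutoff after adjustment, which is the practical point of the correction.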
Now, as to the graphics, it is easy to generate quality assessment (QA) graphs of the chips with the ‘postImport QC spreadsheet’ (Figure 1).
The chip itself can be visualized as a pseudochip picture to further assist with QA and spotting patterns (Figure 2).
Finally, group separation can be seen in the PCA graphic (under the review title, page 1 of review). This graphic is rotatable in three dimensions and is interactive with the workflow sheet. Furthermore, it is possible to color and group the points in the PCA plot (Figure 3).
We may also pull up the obligatory line plot to see lines of probe intensity versus frequency of probe intensity. Again, this graphic is interactive with the workflow sheet and may be customized to user preferences (Figure 4).
The ‘Sources of Variation’ plot dramatically illustrates the major inputs that affect variation in the experiment (Figure 5).
It should be noted that the Sources of Variation for a single gene may be quickly produced by right-clicking on the row header and selecting Sources of Variation from the resulting pop-up.
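The review does not describe how the Sources of Variation values are computed. One plausible way to quantify how much a single factor contributes to the variation in one gene is the one-way ANOVA F ratio, which compares the variation between groups with the variation within them. The sketch below works under that assumption; the function name and intensity values are hypothetical.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F ratio for a list of groups of values.

    Assumption: a between-group / within-group mean square ratio is used
    here purely as an illustration of a "source of variation" measure.
    """
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Residual variation inside each group.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical log-intensities for one gene in two groups.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
f_ratio = one_way_f([control, treated])
print(f"F ratio = {f_ratio:.2f}")
```

A large F ratio suggests the grouping factor accounts for much more variation than random noise does; a ratio near 1 suggests it accounts for little.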
The two remaining obligatory genomics plots are the Volcano plot and the hierarchical clustering heat map that will assist in identifying significantly regulated genes and patterns across genes and samples (Figures 6 & 7).
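For those new to the Volcano plot, it places each gene by its fold change (horizontal axis, usually on a log2 scale) and its statistical significance (vertical axis, usually as the negative log10 of the p-value), so that strongly and significantly regulated genes land in the upper corners. The following is a minimal sketch of how those coordinates are conventionally derived. The gene names, intensities, p-values and cutoffs are all hypothetical, and nothing here is taken from the software itself.

```python
import math


def volcano_points(genes, fc_cutoff=1.0, p_cutoff=0.05):
    """Compute volcano plot coordinates for each gene.

    genes: list of (name, mean_treated, mean_control, p_value) tuples.
    Returns a list of (name, log2_fold_change, neg_log10_p, flagged).
    The cutoffs are illustrative defaults, not recommendations.
    """
    points = []
    for name, treated, control, p in genes:
        log2_fc = math.log2(treated / control)   # x-axis
        neg_log10_p = -math.log10(p)             # y-axis
        # Flag genes that pass both the fold change and p-value cutoffs.
        flagged = abs(log2_fc) >= fc_cutoff and p <= p_cutoff
        points.append((name, log2_fc, neg_log10_p, flagged))
    return points


# Hypothetical summary values for three genes.
demo = [
    ("GENE_A", 800.0, 200.0, 0.001),  # strongly up, significant
    ("GENE_B", 210.0, 200.0, 0.40),   # essentially unchanged
    ("GENE_C", 100.0, 400.0, 0.01),   # strongly down, significant
]
points = volcano_points(demo)
for name, x, y, flagged in points:
    print(f"{name}: log2 FC = {x:+.2f}, -log10 p = {y:.2f}, flagged = {flagged}")
```

In practice the p-values fed into such a plot would normally be corrected for multiple testing first.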
If we need to look at a graphic of the intensity values and distribution for a single gene in a sample vs. control grouping, this is accomplished by requesting a Dot plot of the original data (Figure 8).
In this example, it is easy to discern the difference between the control patient and the one with Down syndrome.
In summary, the software combines easy-to-use statistics with interactive graphics and is available for Windows, Macintosh and Linux. Interested parties should request the free two-week trial version. Downloading and setup are quick and easy, and the novice will find many exciting and extremely helpful features.
624 Trade Center Boulevard
St. Louis, Missouri 63005
314-878-2329; Fax: 314-275-8453
John Wass is a statistician based in Chicago, IL. He may be reached at editor@ScientificComputing.com.