Advancing analytics to a new generation through dynamic visualization and advanced diagnostics
Figure 1: Bubble plot of total CO2 emissions from fossil-fuels (thousand metric tons of C) by births (20-24 yr. age group), across year (ID Nation) – Year = 1971
A perennial favorite, JMP is statistical software that finds wide usage in biological research, engineering, product development and business. Simple to use (with a little practice), dynamically linked to graphics, and possessing remarkable statistical breadth, the older versions had much to recommend them. JMP 7, a major upgrade, advances analytics to a new generation through dynamic visualization of data and advanced diagnostics. Simplicity is still there, but this is no longer software just for the novice.
Subtitled ‘Statistical Discovery from SAS,’ this new version contains several valuable advancements in visual examination of data, nearly unlimited database size, and easy access to SAS for those who have it. Long used in a variety of industries for pattern discovery and just plain data snooping, JMP is becoming a valuable tool for advanced analytics, especially in those areas of life science and chemistry that are utilizing ever larger data sets through breakthroughs in high-throughput screening. [Bias Warning: I am a longtime user and have a very happy relationship with the software (it works in the hands of even the less sophisticated).] A summary of the new features is provided in Table 1.
JMP now comes in five flavors: Windows (2000, XP, or Vista), 64-bit Windows, Macintosh, Linux, and 64-bit Linux. Memory requirements vary from 128 MB to 1 GB depending upon the OS. Likewise, disk space requirements range from 110 MB to 150 MB, with 64-bit Windows being the most demanding. To fully utilize the new graphics, a screen resolution of 1024×768 is the recommended minimum.
Figure 2: Bubble plot of total CO2 emissions from fossil-fuels (thousand metric tons of C) by births (20-24 yr. age group), across year (ID Nation) – Year = 1996
Although the tendency, in most cases for purely economic reasons, is to go with on-line or electronic versions only, the JMP people mercifully took pity on us paper junkies and again supply five really nice manuals. These are: Introductory, Users, Statistics and Graphics, Design of Experiments (DOE), and Scripting. Experienced users will find little in the first two, but the beginner will find them invaluable. As one who avoids programming whenever possible, I found limited use for the scripting guide, but looked upon the Statistics/Graphics and DOE guides as old friends. The former has been bulked up with sections on the new graph types and statistical methodologies. While not heavy on theory, there is much here on practical application using the many examples provided with the software. Engineers, scientists and business types will all find suitable examples from their respective disciplines to make life a little easier.
My complaints about these manuals remain the same, and extend to the online help. As with most technical tomes, the indexing is woefully inadequate. In the Statistical manual, there is no entry on slopes, and regression is covered in three superficial and uninformative lines. When attempting to find information on multiple regressors in the Help menu, typing in “ANCOVA” generated a “No Topics Found” box and forced me to type “analysis of covariance,” which then generated a large list of uninformative (for my purpose) descriptors.
Figure 3: 3-D scatterplot of “SmogNBabies” data
Now for the bright and shiny part! JMP has long been used for its excellent graphics that are dynamically linked to the data. At the same time that a graph is presented, many diagnostics and descriptive statistics also appear. Now we have a new type of interactive dynamic plot that allows visualizing changes in several parameters across another, e.g., size and shape versus time. Figure 1 shows total CO2 emissions versus births in a particular age group in many countries. Notice Mexico, India and Japan. This was in 1971. By moving a time slider, the analyst can “see” what happens as time progresses and the level of pollution increases (Figure 2). The same data is presented as a 3-D scatterplot, colored by country, in Figure 3. Figure 4 presents this same data as a scatterplot matrix whereby each individual contributor to the smog is graphed against time and, again, colored by country. Applications in molecular biology and chemistry may be envisioned whereby the genetic components embedded in chemical groupings may be seen to move with time under treatment, and separate groups may be tracked through biological processes or chemical reactions.
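For readers who think in code, the time-slider idea behind the bubble plot can be sketched in a few lines of Python. This is only an illustration of the concept, not JMP's implementation or its scripting language: the function names are hypothetical, and the nations and numbers are invented placeholders rather than the actual emissions or birth figures.

```python
from collections import defaultdict

# Hypothetical rows: (nation, year, births, co2).
# All values are made-up placeholders, NOT real emissions or birth data.
ROWS = [
    ("Nation A", 1971, 120, 300),
    ("Nation A", 1996, 150, 900),
    ("Nation B", 1971, 80, 1000),
    ("Nation B", 1996, 60, 1100),
]


def frames_by_year(rows):
    """Group rows into one frame per year: {year: {nation: (births, co2)}}."""
    frames = defaultdict(dict)
    for nation, year, births, co2 in rows:
        frames[year][nation] = (births, co2)
    return dict(frames)


def frame_at(frames, year):
    """Return the bubble positions shown when the slider is set to `year`."""
    return frames[year]


frames = frames_by_year(ROWS)

# Stepping through the years in order mimics dragging the time slider:
# each nation's bubble moves to its (births, co2) position for that year.
for year in sorted(frames):
    print(year, frame_at(frames, year))
```

Each frame holds one position per nation, so moving from one year to the next shows how every bubble travels across the plot, which is what Figures 1 and 2 capture for 1971 and 1996.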
JMP has always offered an ample menu of statistical tests and diagnostics, which has slowly grown with each version. It is best known for its menu-based approach to statistics, rather than the test-based approach of most other commercial software, and this may be a bit confusing to the new user. For example, rather than seeing a dropdown menu offering a list of test names, the user is greeted with menus that reflect the type of analysis needed, e.g., Distribution, Fit Y by X, Fit Model, Multivariate Methods, etc.
Figure 4: Scatterplot matrix of “SmogNBabies” data
Although much work has been put into upgrading the popular areas of quality assurance, such as control charting, capability analysis, and variability and gauge charts, the hard-core statistician has not been ignored. The SAS integration features alone are enough to justify its use to anyone with SAS on the desktop. By combining the outstanding array of tests that has always been a staple of SAS with the dazzling and informative graphics capabilities of JMP, those using both can leverage the strengths of each.
For the rest, the addition of the categorical platform, the improvements to the time series platform, the increased speed of the clustering routines, and the more sophisticated Gaussian modeling properties will be greatly appreciated. For those doing experimental design, the optimality and increased speed capabilities will be most welcome. Lest I short-change the molecular biologists, JMP now has the capability to work with a nearly unlimited number of rows and two billion columns (luckily I had no need to test processing speed at that size). This is invaluable to the users of those very large genomic, proteomic and chemoinformatic data sets.
Table 1: New Features in JMP 7
Due to a long history of aggressively soliciting customer feedback (and actually listening to what they say!), JMP 7 has capped a long line of versions that have very little in the way of purely cosmetic upgrades but always include substantial analytic advances. The learning curve for this software is amazingly gradual and the novice can become proficient in just a few weeks. Beyond that lies the joy of discovering the fine tuning possible in the graphics and tables, as well as a better understanding of many areas in applied statistics that the manuals and on-line help will illuminate.
The price of most packages also includes one year of support from their excellent technical assistance staff, and most questions are answered within 24 hours (although I have found them to be much faster than that). Considering the small price differential between a six- and 12-month student package and the programming-oriented freebies available on-line, this package is highly recommended to those students who are not contemplating careers as programmers and need more in the way of a powerful, ready-to-go tool rather than an applications development platform (although JMP can do that too).
• $1,495 commercial
• $595 academic
• $49.95 academic 12-month license
JMP, a business division of SAS
SAS Campus Drive, Cary, NC 27513
John Wass is a statistician based in Chicago, IL. He may be reached at editor@ScientificComputing.com.