What’s so remarkable about it? Well, for one, it was originally geared to chemical engineers, is now used by a wide spectrum of chemists, and stands out for its choice of tools, namely experimental design (DOE) and multivariate statistics. It does a good job of both, and the developers have added many new features (this is NOT your father’s ver. 9.2!). As with most software, it has its own way of doing things and takes a little getting used to; however, with a little practice the routines get easier and the learning curve is not steep. Let’s start with an overview of the capabilities and then take a tour of the software in action by following one of the tutorials.
As mentioned, the capabilities have been considerably bulked up from previous versions, and there are many changes on the main menu. The File menu has been considerably reduced, with several choices reassigned to other areas. The Edit menu has been simplified, making variable changes slightly easier. This is one area where I would love to see standardization across the industry, as defining variables is always a pain: the nomenclature changes between platforms and easily confounds the novice. However, as most practitioners concentrate on a single platform, experience soon overcomes the initial confusion.
The other choices, such as View, Plot, Modify, Tasks, and Results, have all been simplified and reassigned. Most of these changes were for the better, as the new menu bar offers a cleaner result that is easy to follow and almost intuitive to use. The new Tools menu offers some really nice features such as the Matrix Calculator, design extensions, and the Audit Trail, this last being invaluable in today’s documentation-heavy environment. Examples of the Plot and Tasks/Transform menus are shown in figures 1 and 2.
[Figure 1]
[Figure 2]
Not to ignore the statistics, where the software has much to offer within its niche, the available methods (along with the ability to classify and predict) are listed in figure 3:
[Figure 3]
Now that we have a brief background on the capabilities, let’s walk through an example. Here I should apologize to the chemists and engineers, as I have chosen a life-science example (and one that I’m somewhat familiar with): Fisher’s classic Iris problem. For the life scientists, as you undoubtedly guessed from the Transformations menu, much of The Unscrambler X is targeted at the analysis of chemical spectra.
As to background, the problem here is to classify three types of iris based upon four measurements taken on a number of plants. The first task is to import the data and define the variables. Importing is usually easy from the File/Open or File/Import Data menus (and Excel files are imported with minimal hassle), but here we merely click on the tutorial link and the file is imported (figure 4):
[Figure 4]
There are four measurement types as well as a classification column that marks the rows as either Calibration or Validation (for some reason a 50:50 split is used, rather extravagant with small samples). This may be confusing to the novice used to Training-versus-Testing nomenclature but may be more easily assimilated by the chemists. Note also the addition of a column at the extreme left. This is the classification column that tells the software which samples belong to the same class, a requirement for further analysis. This last task is accomplished by a simple Edit/Insert command followed by specifying the levels of the categorical data variable.
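The same bookkeeping can be sketched outside the GUI. Below is a minimal Python/scikit-learn analogue of loading Fisher’s Iris measurements and tagging half the samples as calibration and half as validation; it is an assumption on my part for illustration, not The Unscrambler’s own interface or file format.

```python
# Minimal sketch (Python/scikit-learn), not The Unscrambler's own interface:
# load Fisher's Iris data and tag half the samples as calibration, half as
# validation, mirroring the 50:50 split used in the tutorial file.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target            # four measurements, three species

X_cal, X_val, y_cal, y_val = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)
print(X_cal.shape, X_val.shape)          # (75, 4) (75, 4)
```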
Once the data are imported and formatted, the analysis begins with a simple hierarchical clustering of the data (figure 5):
[Figure 5]
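For readers who want to try the same idea outside the package, here is a hedged sketch of an equivalent hierarchical clustering step using SciPy; it is an illustrative analogue, and The Unscrambler’s own distance and linkage settings may well differ.

```python
# Sketch of an equivalent hierarchical clustering step using SciPy
# (an illustrative analogue; the software's own linkage settings may differ).
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import load_iris

X = load_iris().data                              # 150 samples x 4 measurements
Z = linkage(X, method="ward")                     # agglomerative clustering, Ward linkage
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the dendrogram into 3 clusters
print(sorted(set(labels)))                        # [1, 2, 3]
```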
In this case there are actually two varieties of iris commingled in the blue area. To attempt a finer classification, we use Principal Component Analysis (PCA) to get a more complete picture. We get the usual scores and loadings (not shown), as well as the Influence and Explained Variance graphics (figure 6):
[Figure 6]
It is nice to see that we have already explained approximately 95% of the variance with our first two principal components. All graphics can be modified by zooming, coloring, and scaling the axes, among other things.
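As a point of reference, a similar explained-variance breakdown can be reproduced with a short PCA sketch in Python/scikit-learn; this is again an analogue rather than the software’s implementation, and the autoscaling of the four measurements is an assumed preprocessing choice.

```python
# Sketch of the PCA step with scikit-learn (an analogue, not the software's
# implementation); autoscaling of the four measurements is assumed here.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)   # autoscale the 4 measurements
pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)                               # 150 x 2 score matrix
print(pca.explained_variance_ratio_.sum())              # roughly 0.95 for the first two PCs
```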
Finally, to get the best split or classification, the user must create class models for each of the iris types by PCA and then choose a classification scheme to do the final grouping (a code sketch of the idea follows figure 7). This gets a bit more involved than I would like, but suffice it to say that it produces a detailed, interpretable separation graphic (figure 7):
[Figure 7]
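To make the class-modelling idea concrete, here is a hedged sketch in Python: one PCA model per iris type, with each sample assigned to the class whose model reconstructs it best. It mimics a SIMCA-style workflow under assumed settings (two components per class, raw measurements), not the software’s own classification routine.

```python
# Hedged sketch of class modelling: one PCA model per iris type, then assign
# each sample to the class whose model reconstructs it with the lowest residual.
# This mimics a SIMCA-style workflow, not The Unscrambler's own code.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X, y = iris.data, iris.target

# One 2-component PCA model per class (assumed setting for illustration)
models = {c: PCA(n_components=2).fit(X[y == c]) for c in np.unique(y)}

def classify(sample):
    # Residual sum of squares of the sample against each class model
    rss = {c: np.sum((sample - m.inverse_transform(m.transform([sample]))) ** 2)
           for c, m in models.items()}
    return min(rss, key=rss.get)

predicted = np.array([classify(x) for x in X])
print((predicted == y).mean())   # fraction of samples correctly re-classified
```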
As we still have overlap, we must be satisfied with incomplete separation; however, this reflects a true problem in the data collection (design) rather than any shortcoming in the software.
What I really enjoyed in this new version was all of the help built into the tutorials (the tips and rules of thumb are priceless to the novice) and the readily searchable help files. In addition, there are instructional webinars and instructor-led classes to assist in mastering the techniques. Little touches like the Beginner/Expert slider in the DOE routines (which toggles the choice menu between model names and actual descriptions of the models) are really valuable additions.
Anyone seriously interested should also obtain a copy of Kim Esbensen’s excellent text on multivariate data analysis, which is available from the CAMO website. Other than some nomenclature problems and a few “undocumented features” in the DOE, there is little wrong with the software, and it has much to recommend it.
Availability
- The Unscrambler X: $7,000 (Single User, Industrial); $2,500 (Single User, Academic).
CAMO Software Inc.
1 Woodbridge Center, Suite 319, Woodbridge, NJ 07095
Phone: (732) 726 9200
Fax: (973) 556 1229
www.camo.com
John Wass is a statistician based in Chicago, IL. He may be reached at editor@ScientificComputing.com.