A stand-alone package useful to the scientist, engineer and student
As with most things, we tend to feel most comfortable with what is familiar. Although there are probably thousands of data analysis packages worldwide, we tend to concentrate on those from the U.S. It is therefore a nice change to see what else is out there and review a package from our colleagues in Britain, Australia and New Zealand.
GenStat is made to summarize, display and analyze data. It provides some flexibility and much simplicity in doing this, and more than a few surprises in how the routines are laid out, how the dialog boxes are constructed, and how the user interacts with various parts of the program. It’s not that the software is difficult to use; it just seems a bit “different.” These differences can sometimes be annoying, as when the user is trying to feel their way intuitively through a menu-driven sequence, or delightful, as when one discovers an easy way to do something or an especially illuminating output format. In any event, it is just a bit different from the usual statistical program but offers sufficient coverage in terms of testing and graphics to satisfy almost all of the common analytic needs.
The software is written for PCs running Windows 95, 98, NT, 2000 and XP with minimal hardware requirements (Pentium with 32 MB RAM). It does have the rather annoying requirement for a license key, but this is easy to get with an Internet connection.
Documentation
Readers of these reviews know that this editor is an aficionado of paper manuals, so it is always a joy to sift through a new manual to discover what nuggets may be mined or lead blocks critiqued. In this case, we have more praise than brickbats. While there is not much (save two items) that stands out in the single manual and distinguishes it from the usual, the clear, simple style and logical layout do much to recommend it. The chapter headings are refreshingly descriptive, informing the reader of the tasks that are being done rather than the tests or procedures, and typos seem non-existent. The examples are illustrative and science-oriented. Some warnings to alert the novice analyst to common pitfalls are included, as are detailed descriptions of stepping through the menu-driven elements. In fact, if the text can be faulted at all, it is in this concentration on how to actually do the test at the expense of discussing the statistical underpinnings.
Now for the real gems! The manual is lightly sprinkled with those precious rules-of-thumb that are so helpful to the beginning analyst and applied scientist, and there are actually exercises at the back of each chapter. This is a first in my experience (it may be more common in rest-of-world packages). In any case, perusal of the manual is encouraged.
Workspace
Another big plus! We finally have a worksheet that is Excel-like. It is relatively straightforward to copy, paste, move and delete multiple columns. Provisions are made to mathematically treat columns with a variety of functions. Wizards are available to import, display and convert data structures. There are also several features that are not routine in Excel, such as the ability to copy column names, stack or merge data and automatically subset data. These will all make life quite a bit easier when wrestling data sets into shape. The layout is standard and will be familiar to anyone who has used Excel or statistical spreadsheets. Customization of the toolbar is straightforward. Data formats supported include ASCII, Excel, Lotus, Minitab, SPSS, SAS, S-PLUS, Systat, Stata, MATLAB and Gauss. Output may be formatted to RTF, HTML and LaTeX.
Statistics
The menu overview, displayed in Table 1, is surprisingly extensive. Each group is just a major heading and may contain up to 13 subheadings, some with even further choices. Thus the table reflects only the broad categories, each of which is well represented with specific tests.
Table 1: Statistical test groups in GenStat
Summary Statistics | Six Sigma |
Statistical Tests | Survey Analysis |
Distributions | Time Series |
Regression Analysis | Spatial Analysis |
Experimental Design (DOE) | Survival Analysis |
ANOVA | Repeated Measures |
Mixed Models | Multiple Experiments |
Multivariate Analysis | (Meta Analysis) |
Microarrays | Sample Size |
The depth and breadth of the routines will be adequate for most users. Major omissions are few but occasionally disappointing, mostly in the area of automatic generation of support graphics. For example, in linear regression we have come to expect graphics with 95 percent confidence and prediction intervals without having to manually intervene. In fitting a distribution, it is nice to see a data histogram with the selected fit overlaid as a line. The output is always plain and simple; those accustomed to SAS without the fancy Output Delivery System (ODS) will be used to this, however (Figure 1). Version 8 does include color and font changes that upgrade the appearance a bit. The context-sensitive help is a bit light, and the glossary could use additions as a help to the non-statistician.
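For readers unsure of the distinction between the two regression bands mentioned above, the following Python sketch computes both at a single point for a simple straight-line fit. It is not GenStat code and does not reflect how GenStat computes anything; the function name, the made-up data and the hard-coded t critical value (2.306 for 8 degrees of freedom at the 95 percent level) are all assumptions for illustration.

```python
import math

def regression_intervals(x, y, x0, t_crit):
    """Fit y = a + b*x by least squares and return, at x0, the fitted value,
    the half-width of the confidence interval for the mean response, and
    the half-width of the prediction interval for a new observation.
    t_crit is the two-sided t critical value for n - 2 degrees of freedom
    and must be supplied by the caller (hypothetical helper, not GenStat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard deviation on n - 2 degrees of freedom
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    fit = intercept + slope * x0
    ci_half = t_crit * s * math.sqrt(leverage)      # uncertainty in the mean
    pi_half = t_crit * s * math.sqrt(1 + leverage)  # adds scatter of a new point
    return fit, ci_half, pi_half

# Made-up illustrative data: 10 points, so 8 degrees of freedom.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
fit, ci_half, pi_half = regression_intervals(x, y, x0=5.5, t_crit=2.306)
print(f"fit = {fit:.3f}, CI = +/-{ci_half:.3f}, PI = +/-{pi_half:.3f}")
```

The prediction interval is always the wider of the two, because it includes the scatter of an individual new observation in addition to the uncertainty in the fitted line.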
The choice of tests and options is quite good, and all that is necessary to generate a reasonable analysis with supporting diagnostics has been included. In some areas, such as experimental design, the minimal inclusion of features (compared with DOE software) is still enough to produce most diagnostics as well as a number of user-modified designs. The repertoire is enough to satisfy most needs without confusing or overwhelming the novice analyst. For the more serious, there is a high-level programming language built in, which will allow customization of testing as well as the generation of new routines.
Figure 1: Typical statistical output
I was quite surprised (but delighted) to see the inclusion of a platform for microarray analysis. This is a hot area, but most standard statistical packages will not even import the proper files, let alone do the analyses. The design screen comes up for the two-color type arrays but, presumably, the oligonucleotide arrays could be imported as well, since subsequent choices do allow for Affymetrix data. Someone on the development staff has at least a nodding acquaintance with these types of studies: the further choices, such as the ability to calculate log ratios and expression values, to normalize data, and to make use of Empirical Bayes Error Estimation and False Discovery Rate, are all expected in software that normally does this.
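As a rough illustration of two of the microarray steps named above, the Python sketch below computes log ratios for a two-color array and adjusts a set of p-values for the false discovery rate. The review does not say which FDR procedure GenStat uses; the Benjamini-Hochberg method is swapped in here as one common choice, and the function names and intensity values are hypothetical.

```python
import math

def log2_ratios(red, green):
    """Log (base 2) ratio of red to green intensity for each spot."""
    return [math.log2(r / g) for r, g in zip(red, green)]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.
    This is an assumed stand-in, not necessarily the method GenStat applies."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up intensities and p-values for four spots.
red = [200.0, 50.0, 120.0, 800.0]
green = [100.0, 100.0, 115.0, 100.0]
p_values = [0.01, 0.04, 0.03, 0.005]
print(log2_ratios(red, green))
print(benjamini_hochberg(p_values))
```

A spot would then be flagged as differentially expressed when its adjusted p-value falls below the chosen false discovery rate, for example 0.05.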
Graphics
Graphics capabilities in most statistical programs are limited, and this is no exception. This is not a great shortcoming, as the list of graph types is rather impressive, and the adroit user can figure out how to construct any special graphic to illustrate a point or clarify an issue. Unfortunately, sometimes something very helpful becomes overly cumbersome or just plain impossible. For example, when generating a histogram of the data, there appears to be no way to overlay a line of the normal distribution for comparative purposes. Although most graphics are capable of extensive axis modification, basic graph generation is unintuitive in many areas. In particular, attempting to produce a 3-D graphic resulted in error dialogs most of the time (missing data) and very odd graphics the rest of the time. It is possible to rotate these graphs, but the manual and in-program help menus have somehow buried the instructions.
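The histogram comparison described above amounts to setting the observed count in each bin beside the count a fitted normal distribution would predict for that bin. The Python sketch below shows that calculation in text form; it is an illustration only, not a GenStat routine, and the function name, bin count and sample values are assumptions.

```python
from statistics import NormalDist, mean, stdev

def histogram_with_normal(data, n_bins):
    """Return one (left, right, observed, expected) tuple per bin, where
    expected is the count predicted by a normal distribution fitted to
    the sample mean and standard deviation (hypothetical helper)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in data:
        # The maximum value belongs in the last bin, not one past it
        idx = min(int((value - lo) / width), n_bins - 1)
        counts[idx] += 1
    fitted = NormalDist(mean(data), stdev(data))
    rows = []
    for i in range(n_bins):
        left = lo + i * width
        right = left + width
        expected = len(data) * (fitted.cdf(right) - fitted.cdf(left))
        rows.append((left, right, counts[i], expected))
    return rows

# Made-up sample values for illustration.
sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.1, 6.4, 6.6, 7.2]
for left, right, observed, expected in histogram_with_normal(sample, n_bins=4):
    bar = "#" * observed
    print(f"{left:5.2f}-{right:5.2f}  {bar:<6} observed={observed}  normal={expected:.2f}")
```

In a graphical version, the expected values would be drawn as a smooth line over the bars, which is the overlay the reviewer was looking for.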
For most applications, however, there are a number of quick and easy constructions available through either the toolbar graphics menu or the graph wizard (Figure 2). To really increase usability, it would be nice to see analysis “x-in-one” graphics, such as the diagnostic graphs for regression (Figure 3). This is especially important in quality assurance, and the lack of these graphs under Six Sigma on the statistics menu is an important omission. They are included with ANOVA, REML and hierarchical generalized linear models, however.
Figure 2: Bar chart from Chart Wizard
Figure 3: 4-in-1 diagnostic graphics
Summary
As with most reviews, we conclude that the package has its pluses and minuses. The former include the comprehensively featured spreadsheet and the range of statistical analyses, with ease of use a big plus. The latter include the lack of a more fully functional supporting graphics package and the need for more printed and electronic support in the help sections. Although somewhat pricey by our standards, it is a nice stand-alone package that will be useful to the scientist, engineer and student. The product may be purchased locally through a U.S. vendor, and the Web site is a good source of further information, including a downloadable demo version and program-specific areas with information on obtaining user-supplied utilities. I found on-line support to be very helpful and courteous, and questions were rapidly addressed despite the six-hour time differential between my site and theirs.
Availability
• Commercial $2,495
VSN International
5 The Waterhouse, Waterhouse Street
Hemel Hempstead, Herts, UK HP1 1ES
+44-1442-450230; +44-870-1215653
[email protected]; www.vsn-intl.com
Available in the U.S. from
Adept Scientific
7909 Charleston Ct.
Bethesda, MD 20817
800-724-8380; Fax: 1-240-465-0422
[email protected]; www.vsn-intl.com
John A. Wass is a statistician with GPRD Pharmacogenetics, Abbott Laboratories. He may be contacted at [email protected].