Menu-driven statistics on steroids (lots of steroids!)
Figure 1: ANOVA startup panel
It’s been a while since I reviewed this software, so using this edition was quite a surprise. The more some things change, the more they seem to stay the same: STATISTICA is still overwhelming. It quickly becomes clear, however, that this seeming drawback is greatly mitigated by several new features, such as the Quick Tab on analysis menus, which supplies default settings and ensures a rapid ‘plain vanilla’ analysis that gives an indication of what the data is trying to tell us. It almost seems to be a menu-driven SAS, i.e., whatever you want, it’s there. If you can’t find it, you haven’t looked hard enough. Beyond an extremely complete set of statistical routines, this software now offers very powerful data mining, customization, programming and Web features. This review covers the Desktop General and Industrial systems, as well as the Data Miner portion of the Enterprise System. As with most modern software, the user can greatly extend the capabilities by programming, in this case with (relatively) simple Visual Basic.
Figure 2: ANOVA/MANOVA repeated measures ANOVA dialog box
The new version runs on both Windows (2000, XP and Vista) and Macintosh (with a Windows emulator) systems and can be used with modest system resources. However, it runs best on a fast processor (2 GHz) with plenty of memory (at least 512 MB) and 220 MB of disk space. This review was done on Windows Vista with 2 GB of memory and an AMD Athlon 64 X2 Dual Core Processor (4800+). Now on to the good stuff…
Documentation
My shrink-wrapped package came with three manuals: Quick Reference, Visual Basic Primer and Data Miner. I usually expect fairly clear expository writing as well as concise listing of only the important points. It appears not only that someone in the technical writing department bent over backwards to see that these standards were met but, as a nod to our aging population, they upped the typeface from the usual 8-point (or so it appears to my aged eyes) to at least a 12! I spent most of my time with the Quick Reference, and novice users would be well advised to ground themselves with a brief review of the first seven chapters. From there, much will appear intuitive.
Figure 3: Variable selection dialog box
When further assistance is needed, there is the Electronic Manual with built-in Statistical Advisor and the Web site with a Frequently Asked Questions section and download area. As with most technical software, the court of last resort includes both e-mail and phone support. I found the Electronic Manual especially useful: it is very easy to call up and navigate, very complete, well cross-referenced with active links, logically arranged, and equipped with excellent search functions. There was very little that I could not find on the second try, and total search time was measured in seconds (very few seconds).
For those new to statistics, the Statistical Advisor will take them step-by-step through the logic of an analysis by asking, in plain English terms, just what the analyst wants to do. From an initial group of choices, it is possible to drill down to specific tests, most of which are fully explained in terms of mechanisms, strengths and weaknesses. This is done quickly through direct jumps from the hypertext links.
Import and data manipulation capabilities
One of the nicest features of a statistics/mathematics program is the ability to import Excel files quickly and easily. STATISTICA will allow the user to open Excel within its work area and perform analyses using Excel as the data source and STATISTICA as the analytic engine. When this happens, the menus of the two programs merge, allowing access to the functionality of both. When using this feature, a dialog box will be displayed to let the analyst ensure that the headers and data are correctly identified. STATISTICA also allows the data types (numeric or text) to be specified for any given columns. Data from other popular statistical programs such as SAS, JMP, MINITAB and SPSS also are easily imported, as are a number of database types.
Figure 4: Specify within-subjects dialog box
Once imported, data may be manipulated for any desired analysis from the ‘Data’ menu, which includes such useful functions as transpose, merge, filter, recode, sort, subset, rank, transform, shift, stack and standardize.
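For readers who have not met some of these operations, here is a minimal Python sketch of three of them (standardize, rank and stack). It is only an illustration of what the operations do, not STATISTICA's implementation; the function names and the tie-handling rule (tied values share the average rank) are my assumptions.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def rank(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def stack(columns):
    """Turn several named columns into (name, value) rows (wide to long)."""
    return [(name, v) for name, col in columns.items() for v in col]
```

Stacking is the step that turns one column per repeated measure into one row per observation, the long layout that many other packages expect for repeated-measures data.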
Statistical menus and graphics
The General Desktop System includes the following modules: Basic Tests, Advanced Linear and Nonlinear Models, Multivariate Exploratory Techniques, Automated Neural Networks, Power Analysis, and the Data Miner, Text Miner and Sequence-Link Analysis modules. The Industrial Desktop System includes: Quality Control Charts, Process Analysis, Design of Experiments and QC Miner. Each module includes a large variety of test and graphic choices, each with its own submenus and customizations. Let’s look at a specific example using sample data included with the program.
Figure 5: ANOVA results dialog box
This is an example of a repeated-measures ANOVA using crossed and between-group factors. The data is loaded with the File/Open command, drilling down to the proper data set. Once this has been called, the test is started with the Statistics/ANOVA menu choice, which brings up the General ANOVA/MANOVA startup panel (Figure 1).
Next, the analyst selects ‘Repeated measures ANOVA’ from the ‘Type of Analysis’ menu and ‘Quick specs dialog’ from the ‘Specification method’ menu and confirms the selection. The ANOVA/MANOVA Repeated Measures ANOVA dialog box now appears (Figure 2).
We now specify the design variables. This is an advertising study where our between-group factors are gender (male, female) and advert (Pepsi, Coke), and they are crossed, as there are both genders in each level of the advert group. The responses to the survey are contained in the variables Measure01-03. By clicking on the variables button in the above box, we get the variable selection dialog box (Figure 3).
Figure 6: Model summary
The dependent variables (Measures01-03) and categorical predictors (Gender, Advert) may be selected by clicking on them or by typing their factor numbers into the appropriate white space. We then confirm the selection to return to the ANOVA dialog box. For a plain vanilla (read independent factors) ANOVA, this is all we need do. However, as this is a repeated measures study, the levels of the repeated measure, that is, the within-subject factor, need to be addressed. This is done by hitting the ‘Within effects’ button (Figure 2) to display the ‘Specify within-subjects’ dialog box (Figure 4).
We get help in several ways here. For one, STATISTICA has already suggested the 3-level factor as the correct within-subjects factor. Although we can only specify one within-subjects factor in this box, a design with more could be specified through the General Linear Models module. Also, to get an in-depth discussion of the present model, the analyst need only use the help command or the question mark on the dialog box to display the Electronic Manual section with links to this test. In this example, we just change the factor name to Response and confirm. Confirming again in the ANOVA/MANOVA Repeated Measures ANOVA dialog box (Figure 2) performs the analysis and displays the ANOVA results dialog (Figure 5).
Figure 7: Condensed all effects table
This box allows a choice of which results will be viewed. Hitting the ‘All effects’ button presents a summary of the model (Figure 6). The ‘All Effects/Graphs’ button brings up much the same information (Figure 7); here, however, double-clicking on any effect immediately displays the respective plot (Figure 8). Right-clicking on the graph displays a menu of customization options (Figure 9).
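To make the within-subjects part of this analysis concrete, the following Python sketch computes the F ratio for a single repeated factor by partitioning the sums of squares. It is a simplified illustration under stated assumptions: it ignores the between-group factors (Gender and Advert), so it is not the full model fitted above, it is not STATISTICA's code, and the scores are made up for the example rather than taken from the sample data set.

```python
def repeated_measures_f(scores):
    """One-way repeated-measures ANOVA for a subjects-by-conditions table.

    scores[i][j] is the response of subject i under condition j.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of levels of the within-subjects factor
    grand = sum(sum(row) for row in scores) / (n * k)

    subject_means = [sum(row) / k for row in scores]
    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total variation into subjects, conditions and error.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_conditions = n * sum((m - grand) ** 2 for m in condition_means)
    ss_error = ss_total - ss_subjects - ss_conditions

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_ratio = (ss_conditions / df_effect) / (ss_error / df_error)
    return {"F": f_ratio, "df_effect": df_effect, "df_error": df_error}


# Made-up responses: 4 subjects (rows) by 3 repeated measures (columns).
example = [
    [3, 4, 6],
    [2, 4, 5],
    [4, 5, 8],
    [3, 3, 5],
]
result = repeated_measures_f(example)
print(result)
```

Removing the subject-to-subject variation from the error term is what distinguishes this from an independent-factors ANOVA, and it is the reason the within-subjects factor has to be declared separately in the dialog.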
Most output from STATISTICA is offered with a wide range of customization options, and this short exercise does not even scratch the surface of these extensive capabilities. Among the more impressive are the Variable bundle feature, which allows for data subsetting; the By-Group feature, which quickly lets all analyses and graphics become by-group analyses; the Quality Sixpacks, which display the graphic results of many analyses at once; automatic data input from Excel and equally easy output of results to Word (as well as to STATISTICA worksheets and reports); and the colorful and detailed 3-D graphics from a very complete menu of graphics. While on the subject, another very useful feature is the ability to right-click on any data cell in a spreadsheet and immediately access the ‘Graphs of Input Data’ feature, which allows instant production of a variety of graphs from column data, a nice exploratory feature.
Figure 8: LS means of response
Data miner
This is worth a review in itself, as the latest features are rather astonishing, especially the capability to run many algorithms in series and to display quality features for each, thus allowing the analyst to pursue those models with the least error. The brag in the Data Miner manual is that STATISTICA contains the most comprehensive collection of methods available (although they do preface it with “To the best of our knowledge…”). A quick overview of the general methods available under the Data Mining menu choice is displayed in Figure 10.
Figure 9: Graph options
The experienced analyst can select any methodology, but it is far better for the novice to try the wizard (Data Mining Project workspace) or the new Data Miner Recipe dialog box. Both will offer choices to clean, filter and transform the data, to select variable types and methodologies and, finally, to run the analysis. The great advantage here is that, for data sets of unknown underlying patterns, several to many analytic methodologies may be chosen and run in a single analysis, with summary tables showing error rates (for train versus test sets or observed versus predicted) for each method. The speed of analysis depends, of course, on the algorithm and the size of the set. So, it is always wise to clean or reduce the data set as much as possible and to choose the test method based upon prior experience. Hints, overviews, strengths and weaknesses for these analyses are covered in the electronic manual.
Figure 10: Data mining methodologies
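The idea of running several methods in one pass and comparing their train and test errors can be sketched in a few lines of Python. This is a toy illustration only: the two "methods" (a mean baseline and a least-squares straight line), the data and all names are my own assumptions, and it bears no relation to how the Data Miner is implemented.

```python
from math import sqrt


def fit_mean(xs, ys):
    """Baseline method: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def rmse(model, xs, ys):
    """Root mean squared prediction error of a fitted model."""
    return sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def compare(methods, train, test):
    """Fit every method on the training set; report (train, test) error."""
    table = {}
    for name, fit in methods.items():
        model = fit(*train)
        table[name] = (rmse(model, *train), rmse(model, *test))
    return table


# Made-up data: a linear trend plus a small repeating disturbance.
xs = list(range(20))
ys = [2 * x + 1 + (x % 3 - 1) for x in xs]

# Hold out every fourth point as the test set.
test_idx = [i for i in xs if i % 4 == 3]
train_idx = [i for i in xs if i % 4 != 3]
train = ([xs[i] for i in train_idx], [ys[i] for i in train_idx])
test = ([xs[i] for i in test_idx], [ys[i] for i in test_idx])

table = compare({"mean baseline": fit_mean, "straight line": fit_line}, train, test)
for name, (train_err, test_err) in table.items():
    print(f"{name:15s} train RMSE {train_err:6.2f}   test RMSE {test_err:6.2f}")
```

A method whose test error is much worse than its training error is overfitting, which is why a summary table with both columns is so useful when the underlying pattern is unknown.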
Summary
Like all software of this type, STATISTICA has a learning curve and takes adaptation on the part of the new user. However, thanks to years of feedback from users, the developers have done much to flatten the curve and to add user-friendly features. The biggest challenge is to not become overwhelmed by the sheer breadth of what is there. There is also a longing for a return to many of the features of smaller programs that allow simple, instant actions such as click and drag and rotation of graphs. Novice users may get frustrated at the number of choices that must be made prior to the actual running of an analysis, while the more experienced will appreciate the flexibility. As we have not even addressed features such as STATISTICA Visual Basic, STATISTICA Query and Web STATISTICA in this brief review, readers are encouraged to visit the Web site for more complete information (and to decide for themselves whether or not I have been overly effusive in this review).
Bottom Line: This is menu-driven statistics on steroids (lots and lots of steroids!).
Availability
Single User, Desktop Basic
$1,395 commercial
Single User, Industrial
$1,995 commercial
Data Miner and Enterprise Systems
call for pricing
StatSoft
2300 East 14th Street, Tulsa, OK 74104
1-918-749-1119; Fax: 1-918-749-2217
[email protected]; www.statsoft.com
John Wass is a statistician based in Chicago, IL. He may be reached at [email protected].