*User base broadens from scientists and engineers to business and social science areas*

In the past, I have extolled STATISTICA 10 statistical software for its ability to cover just about any type of test the user may need. Now, the product line consists of bits and pieces of the whole for special applications and/or limited needs. The new Data Miner product contains all of the routine and advanced statistical tests, as well as a number of very sophisticated mining routines. All of this is still wrapped in a very straightforward and, in my opinion, simple user interface. Although some of the nomenclature will be unfamiliar to the new user, much of it is still intuitive, and the developers have done their homework in keeping the learning curve gentle.

In these reviews, your editor generally likes to list the new features, then delve into an actual example to display the capabilities of the software. With the breadth of these features, it would be rather cumbersome to list them here. So, to summarize, there are

- a variety of new tests and routines
- speed improvements for computational efficiency
- graphics technology that can automatically detect the newer high-performance features of the computer
- simplified ribbon bars
- enhanced connectivity and integration features.

To see the gory details (and an exhaustive list), go to: www.statsoft.com/products/statistica-10-new-features

As a statistician and life scientist, I found the simulator for distributions and covariance structures rather impressive. It allows the user to fit theoretical distributions to limited sets of observed data and to simulate from the resultant distributions. As conclusions may be drawn from these simulations, they are quite useful for running “what if” scenarios and suggesting the most profitable future experiments. Now, on to the specifics…
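For readers who prefer to see the idea outside the menus, the fit-then-simulate workflow can be sketched in a few lines of Python. This is only an illustration of the concept, not STATISTICA's simulator: the observations are invented, a normal distribution is assumed as the theoretical form, and the 5.5 threshold is a hypothetical “what if” question.

```python
from statistics import NormalDist

# Hypothetical observations; these numbers are illustrative, not from the review.
observed = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 5.2]

# Step 1: fit a theoretical (here, assumed normal) distribution to the small sample.
fitted = NormalDist.from_samples(observed)

# Step 2: simulate many new values from the fitted distribution.
simulated = fitted.samples(10_000, seed=42)

# Step 3: ask a "what if" question, e.g. how often a future value exceeds 5.5.
threshold = 5.5
share_above = sum(x > threshold for x in simulated) / len(simulated)
exact_share = 1.0 - fitted.cdf(threshold)

print(f"fitted mean={fitted.mean:.3f}, sd={fitted.stdev:.3f}")
print(f"simulated share above {threshold}: {share_above:.3f} (exact {exact_share:.3f})")
```

With a distribution this simple, the simulated share can be checked against the exact tail probability; the simulation earns its keep when the fitted model has no convenient closed form.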

For our example, we will select a common industrial problem in response optimization. (Aside: in STATISTICA 10, the very useful tutorials are cleverly hidden under Help/STATISTICA Examples.) Statistically, we are looking at the advanced linear/nonlinear features in the software, specifically a type of general linear model (GLM).

According to the electronic manual that comes with the software, the procedures used in product development generally involve two steps:

1. predicting responses on the dependent, or Y variables, by fitting the observed characteristics of the product using a regression equation based on the levels of the independent, or X variables

2. finding the levels of the X variables that simultaneously produce the most desirable predicted responses on the Y variables.
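The two steps can be illustrated with a deliberately small sketch: one X variable, one Y variable and a quadratic regression equation. Everything here is an assumption made for illustration (the data are invented and noise-free, “most desirable” is taken to mean “largest,” and the helper names `solve` and `fit_quadratic` are hypothetical); it is not the procedure the software runs internally.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical data: one coded factor x and one response y (not the tire data).
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
ys = [10 + 2 * x - 4 * x * x for x in xs]  # noise-free so the fit is easy to check

# Step 1: predict the response from the factor with a regression equation.
b0, b1, b2 = fit_quadratic(xs, ys)

# Step 2: find the factor level with the most desirable (here: largest) prediction.
grid = [i / 100 for i in range(-150, 151)]
best_x = max(grid, key=lambda x: b0 + b1 * x + b2 * x * x)
print(f"b = ({b0:.2f}, {b1:.2f}, {b2:.2f}), best x = {best_x:.2f}")
```

The real problem differs mainly in scale: several X variables, cross-product terms and several Y variables that must be traded off against one another.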

Figures 1, 2 and 3

The specific example that is used concerns finding a chemical formulation that will produce an optimal tire tread compound. The problem is made somewhat more complex (and therefore ‘real world’) in that there are not one, but four dependent (Y) variables to be modeled and optimized:

- an abrasion index
- a 200 percent modulus
- elongation at break
- hardness.

The independent (X) variables are:

- hydrated silica level
- silane coupling level
- sulfur.

Figure 4

The data were originally analyzed using a central composite type of response surface design and appear as shown in Figure 1.

We will proceed as follows: Select <Statistics/‘Advanced Linear/Nonlinear Models’/General Linear Models>, then choose Response Surface Regression as the type of analysis and Quick specs dialog as the specification method in the resulting dialog box, shown in Figure 2.

When we hit <OK>, we fill in the variables in the resultant GLM Response Surface Regression Quick Specs Dialog, as in Figure 3. When we again hit <OK> we see the GLM Results box (Figure 4).

The design is a standard, three-variable, rotatable, central composite design with six center points. We can see this by hitting the <Design Items> button (Figure 5).
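For those unfamiliar with the layout, a design of this kind can be written out in coded units with a short sketch. The function name `central_composite` is hypothetical, and the axial distance uses the textbook rotatability rule, alpha = (2^k)^(1/4), which is an assumption here; the actual levels in the tutorial data file may be coded differently.

```python
from itertools import product


def central_composite(k=3, n_center=6):
    """Return coded runs for a rotatable central composite design (assumed textbook layout)."""
    alpha = (2 ** k) ** 0.25          # rotatable axial distance, about 1.682 for k = 3
    cube = [list(p) for p in product((-1.0, 1.0), repeat=k)]   # 2^k factorial points
    axial = []
    for i in range(k):                # two axial ("star") points per factor
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]              # replicated center points
    return cube + axial + center


design = central_composite()
print(len(design), "runs")
for run in design:
    print(["%6.3f" % v for v in run])
```

For three factors this gives 8 factorial, 6 axial and 6 center runs, 20 in all.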

We now wish to specify the desirability function for each dependent variable. We enable the proper fields with the ‘Show desirability function’ check box. Desirability functions can then be entered, either from the literature or from historical data in the laboratory notebooks. Low, medium and high values are entered to indicate the inflection points for the desirability function for each dependent variable. The tutorial then supplies the following valuable information:

Figure 5

“Because the function does not specify any curvature in the ‘fall off’ of desirability between inflection points, the values in the s (low) parameter and t (high) parameter fields can be left at their default values of 1.0, specifying linear changes in desirability between inflection points.”
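To make the role of the inflection points and the s and t parameters concrete, here is a minimal sketch of a piecewise desirability function. The exact parameterization is an assumption chosen for illustration and is not taken from the STATISTICA documentation; the function name, the default desirabilities of 0, 0.5 and 1, and the inflection points 120, 145 and 170 are all hypothetical. What it does share with the tutorial's description is that exponents of 1.0 give straight-line changes between inflection points.

```python
def desirability(y, low, mid, high, d_low=0.0, d_mid=0.5, d_high=1.0, s=1.0, t=1.0):
    """Piecewise desirability through (low, d_low), (mid, d_mid), (high, d_high).

    s shapes the low-to-mid segment and t the mid-to-high segment; with
    s = t = 1 both segments are straight lines. This parameterization is an
    assumption made for illustration.
    """
    if y <= low:
        return d_low
    if y >= high:
        return d_high
    if y <= mid:
        frac = (y - low) / (mid - low)
        return d_low + (d_mid - d_low) * frac ** s
    frac = (y - mid) / (high - mid)
    return d_mid + (d_high - d_mid) * frac ** t


# Hypothetical inflection points for a "larger is better" response.
for y in (100, 120, 145, 170, 200):
    print(y, round(desirability(y, low=120, mid=145, high=170), 3))
```

Raising s or t above 1.0 in this sketch bends the corresponding segment so that desirability rises more slowly just past the lower inflection point of that segment.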

Figure 6

Our profiler tab then appears, as shown in Figure 6. These specifications can be saved using the ‘Save specs’ button and later retrieved using the ‘Open specs’ button, sparing the user repetitive typing.

Figure 7

STATISTICA can now profile the predicted responses and desirabilities. Clicking the ‘View’ button produces the compound response profile graph. The prediction profiles for the dependent variables are shown in Figure 7 as a series of graphs, one for each independent variable vs. each dependent variable. To maximize the useful information, the confidence intervals for the predicted values are shown, and graphs of the desirability functions (again, one for each dependent variable) appear on the right.

As the overall desirability value is sub-optimal, the analyst now can change the inputs (independent variables) and examine the effects on the outputs (dependent variables) to see what optima and overall optimum can be achieved. There is an ‘Optimum’ button to assist with this. As with any graphic in this software, modifications are easily possible, including specifying the number of grid squares for better resolution.
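One common way to roll several individual desirabilities into a single overall value is the geometric mean, sketched below. That the software combines them in exactly this way is an assumption on my part, and the desirability values for the two settings are invented for illustration.

```python
import math


def overall_desirability(ds):
    """Geometric mean of individual desirabilities (0 if any one is 0)."""
    if any(d <= 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))


# Hypothetical desirabilities for the four responses at two candidate settings.
current = [0.30, 0.80, 0.60, 0.90]
candidate = [0.70, 0.75, 0.65, 0.85]
print(round(overall_desirability(current), 3))
print(round(overall_desirability(candidate), 3))
```

A geometric mean punishes a single poor response more heavily than an arithmetic mean would, so a setting that is mediocre on everything can outrank one that is excellent on three responses and unacceptable on the fourth.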

Figures 8 and 9

2-D contour plots and 3-D surface plots for overall response desirability are easily produced from the ‘Profiler’ tab. These plots greatly assist in interpreting the effects on overall response desirability of different combinations of levels of each pair of independent variables (the remaining independent variables held constant), as shown in Figures 8 and 9.

By observing these 2-D and 3-D plots the analyst quickly sees that we have a broad flat manufacturing area around the optima. This is good news to the manufacturing staff, as it is possible to hit the optima with a wider range of the inputs and to hold the QA to a reasonable range. Another advantage is that these graphics, by the nature of their contours, tell the experimenter how great an effect one input has on the overall desirability.
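The same reading of the plots can be approximated numerically. The sketch below evaluates an overall-desirability surface over a grid of two inputs with the third held constant, then reports how much of the grid lies within 5 percent of the best value. The surface `overall` is entirely hypothetical, a stand-in for the fitted model, as are its peak location and the 5 percent cutoff.

```python
def overall(x1, x2, x3):
    """Hypothetical smooth overall-desirability surface peaking at (0.2, -0.1, 0.0)."""
    value = 0.8 - 0.10 * (x1 - 0.2) ** 2 - 0.15 * (x2 + 0.1) ** 2 - 0.20 * x3 ** 2
    return max(0.0, value)


steps = [i / 10 for i in range(-15, 16)]       # grid over the coded range -1.5 .. 1.5
held_x3 = 0.0                                  # remaining factor held constant

# Evaluate the surface for every (x1, x2) pair, as a contour plot would.
grid = {(x1, x2): overall(x1, x2, held_x3) for x1 in steps for x2 in steps}
(best_x1, best_x2), best = max(grid.items(), key=lambda item: item[1])

# Share of the grid within 5% of the best value: a rough measure of how flat the region is.
near_optimum = sum(v >= 0.95 * best for v in grid.values()) / len(grid)
print(f"best at x1={best_x1}, x2={best_x2}: {best:.3f}")
print(f"{near_optimum:.0%} of grid cells are within 5% of the optimum")
```

The larger that share, the more forgiving the process is to drift in the inputs, which is the point the contours make visually.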

This is just one (very useful) problem solving mode that STATISTICA can readily employ. As I have said in other reviews, it is very difficult to do justice to software of this breadth and depth in a short review. Since the program is now exceedingly “bulked up” with data mining capabilities, the power is immense.

STATISTICA has always had a very complete repertoire of statistical functions in an easy-to-use, menu-driven format, but the scope of the present edition broadens the user base from scientists and engineers to the business and social science areas. There are now user forums and, as is true with most modern statistical software, an ever-expanding Internet presence. As many of the functions are intuitive, the reader is highly encouraged to download the free, 30-day trial copy.

**Availability**

- STATISTICA Data Miner starts at $15,000 for a single-user license.
- Basic STATISTICA packages start at $2,000.
- Call for academic and multi-seat pricing.

**StatSoft**

2300 East 14 Street, Tulsa, OK 74104

1-918-749-1119

www.statsoft.com

*John Wass is a statistician based in Chicago, IL. He may be reached at [email protected].*