It is always a pleasure to review programs with which one is intimately familiar. For this reviewer, that means JMP and SAS, as well as a few others. Although this reviewer is very proficient with both, JMP Pro is a relatively new experience. Because industry and government have a great need for what is called predictive analytics, JMP Pro adds advanced techniques, diagnostics and graphics aimed at meeting the needs of the widest possible audience. As a bit of history, JMP was the pioneer among statistical programs in pairing just about every analysis with a graphic. As one graph may be worth a thousand equations and lines of proof, this was a great advance in bringing statistics to the masses. This is a program that may profitably be used not only by statisticians, but also by scientists, technicians and business types. Now to the details…
The following represents an overview of the new features in JMP Pro:
- The Cols menu has a new Modeling Utilities submenu, which includes ‘Make Validation Column.’
- You may launch the utility for creating validation columns directly by clicking the Validation column role in any platform that supports it.
- A new Covering Arrays platform enables you to design an experiment to maximize the probability of finding defects while minimizing cost and time.
- Performance is greatly improved for larger problems using the Mixed Model or Generalized Regression personalities of Fit Model.
- Forward stepwise selection is now supported in Generalized Regression.
- Additional distributions are available in Generalized Regression, including quantile regression.
- Variograms, for determining which spatial correlation structure is most appropriate, are now available in the Mixed Model personality of the Fit Model platform.
- The Discriminant platform now supports validation.
- K-nearest neighbors prediction is added to the Partition platform.
- PLS discriminant analysis fits a PLS regression with a categorical response.
- Bootstrapping of model predictions is now supported for all PLS model fits.
- Enhancements to Reliability Block Diagram include the option to construct series or parallel diagrams from existing blocks.
- There is a new Cumulative Hazard Profiler.
- There are now by-component overlay plots for the CDF, PDF, Hazard and Cumulative Hazard.
- A component integrated importance report is now included.
- System designs and library items are now easier to copy and paste when creating system design templates.
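The Make Validation Column utility mentioned in the list above essentially labels each row as belonging to a training, validation or test set. The following minimal Python sketch shows the idea behind a purely random split; it is not JMP's implementation. The function name `make_validation_column`, the 60/20/20 proportions, the fixed seed and the 0/1/2 coding are all illustrative assumptions.

```python
import random

def make_validation_column(n_rows, fractions=(0.6, 0.2, 0.2), seed=1):
    """Return one code per row: 0 = training, 1 = validation, 2 = test.

    Hypothetical helper for illustration only; it performs a simple random
    split and is not JMP's Make Validation Column algorithm.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_train = round(fractions[0] * n_rows)
    n_valid = round(fractions[1] * n_rows)
    n_test = n_rows - n_train - n_valid  # remainder goes to the test set
    codes = [0] * n_train + [1] * n_valid + [2] * n_test
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(codes)
    return codes

codes = make_validation_column(10)
print(codes.count(0), codes.count(1), codes.count(2))  # 6 2 2
```

Fixing the seed means the same rows land in the same set each time the column is regenerated, which keeps model comparisons made on that column fair.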
Key features of the new version may be lumped under the broad headings of:
- predictive analytics
- cross-validation
- model comparison
- generalized regression
- advanced multivariate techniques
- reliability block diagrams
- covering arrays
- additions to the Mixed Model personality
- uplift models
- advanced computational techniques
- easy connection to SAS
- enhancements to the sharing and communication of results
As your editor is most familiar with predictive analytics, advanced multivariate techniques and model comparison in JMP Pro, let’s take a closer look at these.
Predictive analytics covers the kind of model building that is so popular at present that the term has become a buzzword. Statisticians have, in fact, been using techniques to model data and predict future outcomes for a long time, but the approach has become widely popular only in the last five or so years. These techniques are fitted to very large databases and (hopefully) generalize well to new data. JMP Pro includes decision trees and neural networks, two of the most useful classes of model for building predictive models.
Also included is the Partition platform, which implements splitting and pruning of trees. The analyst can step through a series of splits and evaluate the results with excellent diagnostics such as G-squared and LogWorth; the platform can also split by different criteria and generate a rule set describing the splits. If the data classes are coherently colored, the graphic illustrates the splitting efficiency even more clearly. The technique also makes use of cross-validation, as do the neural network techniques.
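To show what the G-squared and LogWorth diagnostics measure, here is a minimal Python sketch for a single binary split of a two-class response. It computes the likelihood-ratio chi-square G² = 2 Σ O ln(O/E) for the branch-by-class table and converts it to −log10(p) using a chi-square distribution with one degree of freedom. This is a simplification: JMP's reported LogWorth may include an adjustment for the number of candidate split points, which is not reproduced here, and the function names are hypothetical.

```python
import math

def g_squared(table):
    """Likelihood-ratio chi-square G^2 = 2 * sum(O * ln(O / E)).

    Rows are branches of the split, columns are response classes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to G^2
                expected = row_totals[i] * col_totals[j] / n
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2

def logworth_2x2(table):
    """-log10 of the unadjusted p-value for a 2x2 table (1 degree of freedom)."""
    g2 = g_squared(table)
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(g2 / 2.0))
    return -math.log10(p)

# Left branch: 40 "yes", 10 "no"; right branch: 10 "yes", 40 "no"
split = [[40, 10], [10, 40]]
print(round(g_squared(split), 2), round(logworth_2x2(split), 2))
```

A larger LogWorth means stronger evidence that the split separates the classes. For extremely strong splits the p-value can underflow to zero, so a production version would compute log10(p) more carefully.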
Under the heading of advanced multivariate methods, JMP Pro has partial least squares (PLS) and variable clustering. The former addresses a common problem, especially prevalent in the newer field of genomics: the data set that has many columns but, comparatively, very few rows. PLS also works well with highly correlated column variables, another ever-present problem in genetic data sets. In these cases, ordinary least squares either cannot be fitted at all (when there are more columns than rows) or produces unstable, highly variable parameter estimates. PLS models can be built with continuous or categorical responses, can include interaction effects, and can use missing value imputation.
Variable clustering is very useful when attempting to reduce the number of variables in a model to simplify calculations. This reduction is frequently done with Principal Components Analysis (PCA), but because of the complexity of the component vectors, and because an adequate solution may require more than three dimensions, the results are sometimes difficult to interpret. In variable clustering, if the variables fall naturally into clusters, JMP Pro automatically selects the most representative column in each cluster, allowing the model terms to be specified more efficiently.
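As a rough illustration of choosing "the most representative column" of a cluster, the Python sketch below scores each column by the sum of its squared correlations with the other columns in the same cluster and picks the highest scorer. This scoring rule is an assumed, simplified proxy chosen for clarity, not JMP Pro's actual selection criterion, and the cluster contents and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails for a constant column

def most_representative(columns):
    """Name of the column with the largest summed squared correlation
    with the other columns in its cluster (a simple proxy only)."""
    def score(name):
        return sum(pearson(columns[name], columns[other]) ** 2
                   for other in columns if other != name)
    return max(columns, key=score)

# Hypothetical cluster of three related columns
cluster = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [1, 3, 2, 5, 4],
}
print(most_representative(cluster))  # a
```

Keeping one such column per cluster, rather than a PCA component, leaves the model expressed in original, interpretable variables, which is the advantage the review describes.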
The last item addressed here (and JMP Pro is loaded with great additions) is the model comparison feature. This is probably one of the most useful tools that could be incorporated in modern software for model builders. The reason is simple. Unless the data are so clean and straightforward that one type of model is obviously sufficient, the modeler will try several different models, based on the complexity and structure of the data, the sheer size of the data set, and theoretical reasons for fitting the data with a particular model structure. JMP Pro makes this quick and largely automatic. As each model is formed, common quality measures (e.g., R-squared, misclassification rate, ROC curves, and AUC) may be used to assess the “goodness” of the model. JMP Pro then compares all of the saved prediction columns, so that the analyst can choose the best combination of goodness of fit, model simplicity and, of course, cross-validation performance. You may also use visual profilers to see and compare the features that each model has discovered. This is quite valuable for assigning function or elucidating pathways in scientific work. JMP Pro also has many other capabilities of use to the business community.
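To make two of the quality measures named above concrete, here is a minimal Python sketch that scores saved prediction columns (predicted probabilities of the positive class) for two models on the same rows. Misclassification rate uses an assumed 0.5 cutoff, and AUC is computed as the fraction of positive/negative pairs ranked correctly, with ties counted as one half. The model names and probabilities are made-up illustrations, not JMP output.

```python
def misclassification_rate(actual, prob, threshold=0.5):
    """Fraction of rows whose thresholded prediction disagrees with the 0/1 label."""
    wrong = sum((p >= threshold) != bool(a) for a, p in zip(actual, prob))
    return wrong / len(actual)

def auc(actual, prob):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney form)."""
    pos = [p for a, p in zip(actual, prob) if a == 1]
    neg = [p for a, p in zip(actual, prob) if a == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical saved prediction columns for two fitted models
actual = [1, 1, 1, 0, 0, 0]
models = {
    "tree": [0.9, 0.6, 0.4, 0.3, 0.7, 0.2],
    "neural": [0.8, 0.7, 0.6, 0.4, 0.3, 0.1],
}
for name, prob in models.items():
    print(name, round(misclassification_rate(actual, prob), 3), round(auc(actual, prob), 3))
```

In practice these measures would be computed on the validation or test rows rather than the training rows, so that the comparison reflects how well each model generalizes.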
This software is used alongside SAS by many statisticians and may very well be used by non-statisticians. Highly recommended.
Availability
- JMP Division of SAS
SAS Campus Drive, Building T
Cary, NC 27513-2414, USA
919.677.8000
http://www.jmp.com
John Wass is a statistician based in Chicago, IL. He may be reached at [email protected].