Figure 1: Mean/median/standard deviation for two points
Readers of my columns (the few, the proud, the technogeeks) will know that one of my pet peeves is technical computing upgrades that are mainly cosmetic, have very few substantial additions to the analytic menu, and are accompanied by enough superlatives to announce the second coming. Well, this isn’t one of those. The Wolfram folks may have let the marketing department run riot with some of the flash, but Mathematica 6 is light years ahead of a minor upgrade (hyperbole all mine).
For starters, the experienced user will notice a new informational palette upon bootup. This is a nice introduction and includes choices to list the new features, Demos, a ‘First Five Minutes’ orientation, and the link to the Documentation Center. This last appears in a new format that allows the user to drill down and get more information on such core features as the language, mathematics, algorithms, I/O functions, graphics, notebooks and the new dynamic interactivity functions. In this review, I can barely touch upon just a few of the more important (and impressive) new capabilities, so the interested reader is directed to the Web site for a more complete list of the new functions and features (Table 1).
This review will concentrate on the following functionality:
• fully automated graphics layout and dynamic interactivity
• numerical integration
• equational theorem proving
• exploratory data analysis
But first, a few words on the system requirements….
Requirements
Figure 2: Mean/median/standard deviation for many points
Although Mathematica works well on Linux and Mac systems, this review is directed to the Windows user. Mercifully, the developers have made this version compatible with a wide variety of Windows versions, Me through Vista. This review was done on the new Vista system, and no problems were encountered either in downloading from the Web or in installation. The installation does take quite a bit longer than the older versions of Mathematica, but there is so much more here that this is understandable. The download size is 500 MB, and 512 MB is the minimal useful memory. However, keep in mind that, as the size and complexity of the problem increase, so do the memory requirements. This is especially true of graphics and large matrix operations. A hard drive larger than 2.5 GB is useful and should not be a problem on today’s desktop PCs.
Now, on to the features, beginning at the beginning…
Graphics
As I frequently tell my students, graph the data first, then do the exploratory analysis. In point of fact, I consider the initial graphics part of the exploratory analysis, but so important that they precede everything else. In any data set, we would like to assess integrity, especially if the data were imported from another source (the usual case), look for missing data, and delve for any patterns or trends, expected or not.
Table 1: New Functions and Features
Functions:
reference.wolfram.com/mathematica/guide/NewIn60AlphabeticalListing.html
Features:
reference.wolfram.com/mathematica/guide/SummaryOfNewFeaturesIn60.html
Figure 3: Double helix
For a set of numbers randomly generated from a given distribution (or just imported from somewhere), the analyst can easily calculate simple, descriptive statistics such as mean, median, mode and standard deviation to characterize the data. Data is simply imported from Excel with the Import[path] command. Descriptive statistics are easily generated by preceding the data with the command indicating the statistic desired. The nifty new part is the addition of the Manipulate command to the original commands that generated the statistics. Now, by also specifying a range of values for one or more of the variables, an interactive graphic is instantly produced to allow the analyst to see how a parameter can move as the values of the variables change. The example here shows how the mean, median and standard deviation change as more points are added from either a uniform or normal distribution. In Figure 1, only two points define the parameters (slide bar is at the extreme left; uniform distribution).
In Figure 2, many points have been added. Notice that the median and the mean are now distinct, and the standard deviation has narrowed. Now imagine generating a set of points for any given distribution, generating these interactive graphics, and automatically running the simulation to see the effect of N (the number of points) upon any given statistic.
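For readers who want to try this, here is a minimal sketch of the idea, not the code behind Figures 1 and 2. It assumes a uniform distribution on the interval 0 to 1 only; the seed, the slider range and the layout are my own illustrative choices.

```mathematica
(* Sketch: watch summary statistics settle as the sample size n grows. *)
(* The seed, slider range and layout are illustrative assumptions.     *)
Manipulate[
 Module[{data},
  SeedRandom[1];                 (* fixed seed: the same points reappear as n changes *)
  data = RandomReal[{0, 1}, n];  (* n uniform points on [0, 1] *)
  Column[{
    ListPlot[data, PlotRange -> {0, 1}, AxesLabel -> {"index", "value"}],
    Grid[{
      {"mean", Mean[data]},
      {"median", Median[data]},
      {"std. dev.", StandardDeviation[data]}}]
    }]],
 {{n, 2, "number of points"}, 2, 500, 1}]
```

Dragging the slider to the right adds points, and the three statistics update as it moves. Real data could be substituted for the simulated points by replacing the RandomReal line with an Import call.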
Figure 4: Shortest distance between a point and a line
The applications are actually limitless, and, should you think that they are confined to mathematics, take a look at Figures 3 (biology), 4 (education) and 5 (physics). Wolfram’s Web site alone has over 1,300 demonstrations such as these, so analysts are bound only by their imagination. The obvious uses are not only as simple teaching tools, but as visualizations of complex physical and biological phenomena. These visualizations are among the more sophisticated tools that may be used to understand the underlying mechanisms of action, which makes them another valuable research tool.
Now, on to numerical integration….
Numerical integration
Figure 5: Wave interference patterns
Mathematica contains a general numerical integrator (the function is NIntegrate) that handles numerous n-dimensional integrals by estimation through sampling over the integration region. The various methods used dictate the sampling methods and initial plus subsequent steps. The strategies attempt to compute estimates that meet any specified precision and accuracy goals, given a well-defined problem and adequately conditioned parameters. Weighted sums are employed for these computations. There are many specialized algorithms used for specific circumstances. Symbolic preprocessing is used by the NIntegrate function to simplify piecewise, even and odd functions. When this preprocessing is not used, Mathematica will generate an automatic “slow convergence” (slwcon) message with hints as to the nature of the problem, e.g., an integration value of zero, a highly oscillatory integrand or a working precision that is too small.
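As a rough illustration of the goals and the preprocessing step, the sketch below integrates a simple piecewise integrand twice, once with default settings and once with symbolic preprocessing switched off. The integrand and the goal value are my own assumptions, chosen because the exact answer (5/18) is easy to check by hand.

```mathematica
(* Sketch: precision goals and symbolic preprocessing in NIntegrate. *)
(* The integrand and the goal value are illustrative assumptions.    *)

(* Ask for roughly 8 digits of relative precision. *)
NIntegrate[Abs[x - 1/3], {x, 0, 1}, PrecisionGoal -> 8]

(* Same integral with symbolic preprocessing switched off, *)
(* so the kink at x = 1/3 is left to the adaptive sampler. *)
NIntegrate[Abs[x - 1/3], {x, 0, 1}, PrecisionGoal -> 8,
 Method -> {Automatic, "SymbolicProcessing" -> 0}]
```

Wrapping either call in Timing gives a quick feel for what the preprocessing buys on a particular integrand.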
There are several methods used to deal with singular integrands, and all are deterministic, adaptive strategies that will speed the convergence and integration process. Here is an example with and without singularity handling (notice the timing).
The first example in Table 2 shows a one-dimensional integration with singularity handling. Without singularity handling, the same integral is computed more slowly (Table 2, Example 2).
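For readers without the table at hand, a comparison of this kind can be sketched as follows. This is an illustration under my own assumptions (the integrand, the recursion limit and the way the handler is disabled), not necessarily the exact input shown in Table 2. The integrand has an integrable singularity at x = 0, and the exact value of the integral is 2.

```mathematica
(* Sketch: the same singular integral with and without singularity handling. *)
(* Integrand and MaxRecursion value are illustrative assumptions.            *)

(* Default strategy: singularity handling may switch on near x = 0. *)
Timing[NIntegrate[1/Sqrt[x], {x, 0, 1}, MaxRecursion -> 100]]

(* "SingularityDepth" -> Infinity keeps the handler from ever switching on, *)
(* so the integrator must bisect its way toward the endpoint.               *)
Timing[NIntegrate[1/Sqrt[x], {x, 0, 1}, MaxRecursion -> 100,
  Method -> {"GlobalAdaptive", "SingularityDepth" -> Infinity}]]
```

Timings vary from machine to machine, so the interesting number is the ratio between the two runs rather than either one alone.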
Of course, all of the built-in functions can be used to construct larger, more specialized integrators with specific user-defined parameters.
Equational theorem proving
Figure 6: Cluster analysis
Perhaps one of the most powerful new functions is the bulked-up FullSimplify function (and its cousins FindInstance, Resolve and Reduce) that can be very useful in proving theorems. Basically, FullSimplify will apply a range of transformations upon expressions in an attempt to return the “simplest” form possible. Its facility may be extended by adding assumptions. In the past, this could be used for algebraic theorem proving, but it is now extended to equations and adds further capabilities, such as operating on more abstract systems of axioms and relations. The third example in Table 2 is from abstract algebra.
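To give a flavor of how axioms are supplied as assumptions, here is a small sketch of my own; it is not necessarily the example in Table 2. The symbols f (a binary operation) and e (its identity) are hypothetical. The statement being tested is a standard exercise: if every element of a group is its own inverse, the operation is commutative. I make no claim here about how quickly the prover settles it.

```mathematica
(* Sketch: equational reasoning from axioms given as assumptions.      *)
(* f and e are hypothetical symbols: a binary operation and its unit.  *)
axioms =
  ForAll[{x, y, z}, f[f[x, y], z] == f[x, f[y, z]]] &&  (* associativity *)
  ForAll[x, f[x, e] == x] &&                            (* right identity *)
  ForAll[x, f[e, x] == x] &&                            (* left identity *)
  ForAll[x, f[x, x] == e];                              (* each element is its own inverse *)

(* Ask whether commutativity follows from the axioms. *)
FullSimplify[f[a, b] == f[b, a], axioms]
```

Each axiom is written as a separate universally quantified equation, which is the form the assumptions mechanism is described as accepting.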
My not-so-secret desire here (and with all of the more complex calculations in the areas of calculus and differential equations) is to see Mathematica spew forth all of the intermediate steps of the proof or calculation, i.e., the solution rather than just the answer! What a phenomenal teaching tool for the mathematically challenged!
Exploratory data analysis
Table 2: Examples
Moving once again into familiar territory, the exploratory data analysis capabilities are an advance for Mathematica, but more of a “ho-hum” for the data analyst who is accustomed to using graphic-/diagnostic-rich, menu-driven programs such as JMP, MINITAB and SPSS. This is not to say that the capabilities are not useful or do not speed the analysis. However, many menu-driven programs can get at these functions and more with a few simple mouse clicks.
The exploratory data analysis functions include clustering, binning, smoothing and nearest-neighbor analysis, as well as some basic statistics. Mathematica’s power here is the performance of these analyses with a variety of data, not just numeric, and with arbitrary distance functions in many dimensions. Here, we will ignore the physical interpretation of what solutions in many dimensions indicate. This, of course, may be a non-trivial concern in many physical science problems, but rapidly becomes intractable when considering biological processes.
In any event, it is a fairly straightforward matter to import a data set, calculate a few descriptive statistics, and then perform the finer work such as similarity measurements and cluster analysis. In Figure 6, Mathematica has clustered some simulated data. Note that the code remains fairly simple.
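A workflow of this kind might look like the sketch below. It is not the code behind Figure 6: the two simulated blobs, the seed, the choice of two clusters and the Manhattan distance are all my own assumptions, picked to show that the distance function is an ordinary option.

```mathematica
(* Sketch: descriptive statistics, clustering and nearest neighbors.  *)
(* Data, seed, cluster count and distance are illustrative choices.   *)
SeedRandom[42];
data = Join[
   RandomReal[{0, 1}, {50, 2}],    (* 50 points in the unit square *)
   RandomReal[{2, 3}, {50, 2}]];   (* 50 points in a shifted square *)

(* Column-wise mean and standard deviation of the combined data. *)
{Mean[data], StandardDeviation[data]}

(* Partition into two clusters using a non-default distance function. *)
clusters = FindClusters[data, 2, DistanceFunction -> ManhattanDistance];
Length /@ clusters

(* Each cluster is drawn as its own data set. *)
ListPlot[clusters, AspectRatio -> 1]

(* The three data points closest to a query point. *)
Nearest[data, {0.5, 0.5}, 3]
```

To work with real data, replace the simulated points with the result of an Import call.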
In summary, with the addition of almost 1,000 new functions, as well as extensive interactive graphics capabilities, Mathematica has become an even more useful analytic platform that finds the widest utility in the science, engineering and educational fields. I expect that this may be increasingly true in many areas of applied economics. There are many demos freely available on the Web to further illustrate the new features.
Availability
• Standard $2,495
• Educational $1,095
• Student $139.95
Wolfram Research
100 Trade Center Drive
Champaign, IL 61820-7237
1-217-398-0700, 1-800-WOLFRAM
Fax: 1-217-398-0747
www.wolfram.com
John Wass is a statistician based in Chicago, IL. He may be reached at [email protected].