While not for the faint-of-heart, a diverse group of fields have adopted this type of analysis
This month’s column is devoted to another very useful bit of freeware. WinBUGS is a Bayesian analysis program for Windows. It comes to us from Imperial College and the Medical Research Council of the UK. For those of you unaware of the centuries-old feud between certain elements of the statistical community, let’s just say that the mainstream community (called frequentists) and those previously in a small minority (called Bayesians) disagreed over the validity of building prior information into a data analysis. Of late, it seems that both are well accommodated in the scheme of things, and there is far less distance between the two.
As natural scientists and engineers, we may all be considered Bayesians in the sense that we will use all previously gathered information to make estimates of what may be happening in a present or future experiment. This software somewhat simplifies the task. A downloadable manual is available for techniques and examples, and a (somewhat) active users group may be consulted for advice. For further information on the theory, the new user is referred to a formal course on Bayesian analysis as well as the Classic BUGS Manual. WinBUGS is easily downloaded from the Web site and an unlock code, supplied free, is needed on a yearly basis.
Bayesian analysis makes use of prior and conditional probabilities and is based upon Bayes’ theorem for calculating the probability of an outcome given additional evidence. WinBUGS uses Markov chain Monte Carlo (MCMC) simulation techniques, such as Gibbs sampling, to estimate these probabilities and place credible intervals (the Bayesian counterpart of confidence intervals) around them. Warning! Although the program is Windows-based, offers a convenient graphical user interface, and includes several automatic graphics and diagnostic features, all of the analyses are specified in code, which means programming. That won’t be a problem for many of our readers, but the newcomer should be prepared. Also be aware, as the manual takes pains to stress, that these techniques are not geared to the novice analyst, and it is very easy to generate incorrect answers from unreasonable estimates (and sometimes from reasonable ones!). The level of difficulty of these problems can be assessed from the following passage in the manual describing the MCMC technique:
“Having specified the model as a full joint distribution on all quantities, whether parameters or observables, we wish to sample values of the unknown parameters from their conditional (posterior) distribution given those stochastic nodes that have been observed. The basic idea behind the Gibbs sampling algorithm is to successively sample from the conditional distribution of each node given all the others in the graph (these are known as full conditional distributions): the Metropolis-within-Gibbs algorithm is appropriate for difficult full conditional distributions….”
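To make the passage a little less forbidding, here is a minimal Python sketch of the Gibbs idea on a toy problem. It is not WinBUGS code and not taken from the manual: the target (a standard bivariate normal with correlation rho), the function name, and all settings are assumptions chosen because both full conditional distributions are simple normals.

```python
import random


def gibbs_bivariate_normal(rho, n_draws, burn_in=500, seed=1):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5  # sd of each full conditional
    x, y = 0.0, 0.0                     # arbitrary starting values
    draws = []
    for i in range(burn_in + n_draws):
        x = rng.gauss(rho * y, cond_sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from p(y | x)
        if i >= burn_in:                 # discard the early iterations
            draws.append((x, y))
    return draws


def correlation(pairs):
    """Sample correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / (sxx * syy) ** 0.5


draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000)
mean_x = sum(d[0] for d in draws) / len(draws)
sample_rho = correlation(draws)
```

Each pass updates one node from its distribution given the current value of the other, which is the "successively sample from the conditional distribution of each node" step in the quotation. With enough draws, mean_x should sit near 0 and sample_rho near the rho supplied.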
Now, let’s walk through a “simple” example. During this exercise, your editor discovered several highly non-intuitive steps that are not always clearly spelled out in the manual, but was fortunate enough to have an experienced colleague walk him through the analysis. WinBUGS supplies many examples (we use ‘Seeds’ here), and it is best to learn the programming by running and modifying code that has already been written.
The first step is to import data from a source, usually Excel in this editor’s environment. The good news is that data can easily be cut and pasted in; the bad news is that the format must be acceptable to WinBUGS. Particular care is needed with headers, and columnar data must be fed to the program in a linear form that will be quite familiar to Mathematica users. Because a supplied example is being used here, the data already appear in the proper format, ready to be employed later in the process.
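As a rough illustration of what "linear form" means, the Python sketch below flattens spreadsheet-style columns into a single list expression. It assumes the target is the S-style list(name = c(...)) layout that the bundled examples appear to use; the helper name and the numbers are made up for illustration, and the real format rules should be checked against the manual.

```python
def to_bugs_list(columns, scalars=None):
    """Format named columns (and optional scalars) as an S-style list(...) string."""
    parts = []
    for name, values in columns.items():
        joined = ", ".join(str(v) for v in values)
        parts.append(f"{name} = c({joined})")  # one vector per column
    for name, value in (scalars or {}).items():
        parts.append(f"{name} = {value}")      # single values, e.g. row count
    return "list(" + ", ".join(parts) + ")"


# Hypothetical columns as they might be copied from a spreadsheet.
columns = {"r": [4, 7, 2], "n": [10, 12, 5]}
text = to_bugs_list(columns, {"N": 3})
print(text)
```

The resulting string can then be pasted into the data section of a WinBUGS document in place of the original columns.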
With the data already in place, the first real step is to check the supplied model. The ‘Seeds’ example is a random effects logistic regression problem concerning the number of seeds germinating on each of a number of plates arranged in a 2 by 2 factorial design. A graphical model of the process is presented and may be consulted for parameter/variable type and model details by clicking on the various elements. To check that the code is correct (as far as syntax goes only), double-click on the word model and choose the Model/Specification option from the menu bar. If the program gives its OK, the code is loaded and compiled with a similar menu function. (At this point, I really longed for the run button in SAS!)
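For readers who think better in code, the following Python sketch simulates data from a model of this general shape: a logistic regression on two binary factors with a normally distributed random effect for each plate. It is a forward simulation only, not the WinBUGS model file, and the interaction term, parameter values, and plate sizes are all assumptions chosen for illustration.

```python
import math
import random


def simulate_germination(n_seeds, x1, x2, alpha, sigma, seed=1):
    """Simulate germination counts under a random effects logistic regression."""
    rng = random.Random(seed)
    a0, a1, a2, a12 = alpha
    counts = []
    for n, u, v in zip(n_seeds, x1, x2):
        b = rng.gauss(0.0, sigma)                   # plate-level random effect
        eta = a0 + a1 * u + a2 * v + a12 * u * v + b  # linear predictor (logit scale)
        p = 1.0 / (1.0 + math.exp(-eta))            # inverse logit
        r = sum(rng.random() < p for _ in range(n))  # binomial draw, one seed at a time
        counts.append(r)
    return counts


# Hypothetical design: four plates covering the 2 by 2 factor combinations.
n_seeds = [10, 12, 5, 8]
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
counts = simulate_germination(n_seeds, x1, x2,
                              alpha=(-0.5, 0.1, 1.3, -0.8), sigma=0.3)
```

WinBUGS works in the opposite direction: given the observed counts, it samples the regression coefficients and the random effect spread from their posterior distribution.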
After compiling, it is necessary to initialize the variables via a dialog box, specify the number of sample draws for the various nodes, and enter the nodes to monitor. Once the program actually starts running, progress may be monitored with convenient graphics. These graphs also serve a diagnostic function, as convergence may be judged from the behavior of the curves. The sample monitoring tool offers several other very useful diagnostics, including trace, history, autocorrelation and kernel density curves, as well as the actual statistical summaries of the parameters and the quantiles (Credible Intervals in Bayesian parlance) around them. These outputs constitute the estimates of the posterior probability distributions, which may be further utilized in statistical inference. It is a shame that the manual does not discuss these examples further, offering guidance and highlighting specific pitfalls, but the documentation from other sources, listed at the home site, is far more thorough. As a plus, many macros have been written for this software, so code may be run from SAS, MATLAB and R, among others.
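The summaries described above are easy to reproduce outside the program once the draws for a node have been saved. The Python sketch below computes a posterior mean, a 95% interval from the 2.5% and 97.5% quantiles, and a lag-1 autocorrelation. The function name is hypothetical, and the chain is a simulated autoregressive series standing in for real WinBUGS output.

```python
import random
import statistics


def summarize_draws(draws, lag=1):
    """Posterior mean, sd, 95% interval, and lag autocorrelation for one node."""
    mean = statistics.fmean(draws)
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points in 2.5% steps
    centered = [d - mean for d in draws]
    num = sum(a * b for a, b in zip(centered, centered[lag:]))
    den = sum(c * c for c in centered)
    return {
        "mean": mean,
        "sd": statistics.stdev(draws),
        "q2.5": cuts[0],      # lower end of the 95% interval
        "median": cuts[19],
        "q97.5": cuts[38],    # upper end of the 95% interval
        "autocorr": num / den,
    }


# Stand-in chain: an AR(1) series with coefficient 0.5, not real sampler output.
rng = random.Random(1)
chain, value = [], 0.0
for _ in range(5000):
    value = 0.5 * value + rng.gauss(0.0, 1.0)
    chain.append(value)

summary = summarize_draws(chain)
```

High autocorrelation at small lags suggests the sampler is moving slowly through the posterior, so more draws are needed before the interval can be trusted.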
While not for the faint-of-heart, this type of analysis is routinely done by statisticians and has been readily adopted in fields as diverse as genetics, engineering and economics. And the price is right!
Availability
WinBUGS freeware is copyrighted by Imperial College and Medical Research Council, UK, and is available at www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
John Wass is a statistician with GPRD Pharmacogenetics, Abbott Laboratories.