Statistics in a Nutshell: A really nifty practical guide

The title here should include “on a Number of Levels,” as this simple text has so much to recommend it. No, it is not watered-down statistics! It is a practical guide to quickly getting the reader up to speed on those items most used in describing data, performing hypothesis testing and conveying some of the more fascinating aspects of statistics in research. Having gone through dozens of introductory statistics texts, teaching from some and being taught from others, I found it refreshing to come across a book that avoids the usual approach of teaching statistics as a series of fixed obstacles that need to be overcome!

The publishers (O’Reilly) will be familiar to many readers as the purveyors of books on programming, each with interesting zoological covers spanning the animal world. Their foray into statistics is welcome, if this text is typical of their philosophy. It is stated on the back cover that this book conveys a “solid understanding of statistics without the numbing complexity of most textbooks.” They do this in 390 pages of one of the best-organized and most relevant treatments to have come across my desk in some time (and yes, there are many worked examples). If the book can be faulted for anything, it’s an over-reliance on software for computation, where I would put an emphasis on manual computation of simple examples, followed by software use for the more realistic problems (they do follow this pattern to some extent, however). As such, the book is not a standalone text that should be used as the first teaching tool. As the title suggests, it is a good reference work.

Whereas most readers will skip or speed through a preface, I would encourage the reader to slow down and savor this one. It acquaints the reader with the authors’ philosophy in writing the book, explains why understanding the data and correctly drawing conclusions from it is so important, gives a brief overview of the focus and chapter organization, and includes a few words on symbols and conventions. In addition, there is a very useful paragraph with Web pages for questions and errata, which encourages the reader to contribute and keep the book fresh and “relatively” error-free. Currently, the site lists five serious technical errors, 11 minor technical errors, and five typos, which is not really that bad for a technical tome. (Admission of guilt: I found only four of these errors and two typos upon first reading.)

As a reference, non-textbook work, the volume covers
• data types (nominal, ordinal, interval, ratio and the continuous/discrete split)
• probability (you didn’t think that you could avoid that, did you? But they do introduce the really useful Bayes’ theorem)
• data management (logically treated as an approach and NOT a set of recipes)
• descriptive statistics and graphics
• inferential statistics
• t-tests
• correlation coefficient
• categorical data
• nonparametric statistics
• introduction to general linear methods
• ANOVA
• regression
• multivariate analysis

In addition, there are very useful chapters on
• research design
• critiquing statistics (perhaps one of the most important lessons)
• business and QA applications
• medical and epidemiological statistics
• educational and psychological statistics

The authors have covered several bases by dividing the topics up both by the test methodology and by the area of application.

Perhaps the only real omissions/shortcomings in this introductory book (in my opinion, of course) are the lack of updated software listings and the philosophy that, due to the ease of acquiring reliable software, “…the need to understand and interpret statistics has far outstripped the need to learn how to do the calculations themselves.” Perhaps “outstripped” is a good word in this case, as it has certainly not done away with the need for understanding. As to the software omissions, the authors tell us a little bit about Excel (dangerous for the non-statistician), SPSS, R, SAS and Minitab. They fail to mention popular and widely used packages such as JMP, STATISTICA, SYSTAT, SigmaStat, Stata and Prism, as well as some of the packages from across the pond such as GenStat. Obviously, the authors can’t be faulted for not listing every major package, and they have confined themselves to the ones they believed were most used in their experience. However, after years of reviewing statistical software, I consider these major omissions. Still, I stand by the many pluses cited and summarize thusly: highly recommended!

Availability
Statistics in a Nutshell: A Desktop Quick Reference, by Sarah Boslaugh and Paul Andrew Watters. O’Reilly Media, 476 pp. (2008), paperback, $34.99.

John Wass is a statistician based in Chicago, IL. He may be reached at [email protected].
