Research & Development World


The R Language: Fun with statistics and programming

By R&D Editors | August 31, 2007

 

 


Image and 3-D plot of Maunga-whau (Mount Eden), a 196-meter volcano in the Auckland, New Zealand, volcanic field. Image courtesy of R Foundation [2]

As a statistician, your contributing editor can personally vouch for the joys of data analysis, but it takes a lot to make him say that programming can ever be construed as anything other than sheer agony. Here is a small exception. For many years, R has been gradually taking its place alongside the venerable pillars of academic statistics such as SAS, S-Plus and Minitab. The primary reason may be price (it’s free) but, given the ease of learning the code and of obtaining many well-written (and sometimes even well-documented) routines, this language is well worth a try. For the faint of heart, there is even a Windows GUI to ease the pain.

The R language was developed specifically as a programming environment for statistics and graphics. It is a free, open-source GNU project, similar to the S language developed at Bell Laboratories. Although there are differences, much S code will run unaltered under R. According to the Web site (www.r-project.org), a wide variety of statistical and graphical techniques are available and the system is highly extensible; one of its strongest features is presentation-quality graphics over which the user retains full control. The software runs on a wide variety of UNIX platforms, Linux, Windows and MacOS. R is a fully integrated suite of routines for data manipulation, calculation and graphics. Users may add routines and define these as new functions. For jobs requiring intensive computation, C, C++ and Fortran code can be linked and called at run time. Eight packages are supplied with the download, and many more are available at various Internet sites to extend the capabilities. New packages are constantly appearing, and user sites spring up to let novices communicate with the more experienced. Both the base system and the add-ons are distributed through the Comprehensive R Archive Network (CRAN), which may be accessed through the site referenced above.
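As a small illustration of adding a routine as a new function, the sketch below defines a coefficient-of-variation helper. The name `coef_var` and the sample measurements are assumptions made up for this example; they are not part of the R distribution.

```r
# Hypothetical helper: coefficient of variation (standard deviation / mean).
coef_var <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Made-up measurements, for demonstration only.
weights <- c(10, 12, 11, 9, 13)

# Call the new function at the prompt like any other routine.
coef_var(weights)

# Apply it column by column to a data frame.
sapply(data.frame(a = weights, b = weights * 2), coef_var)
```

Once defined, such a function behaves like a built-in one for the rest of the session, which is how user code and add-on packages extend the base system.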

Although the new user can access the necessary help files and manuals through the site, I used the Everitt and Hothorn handbook [1] as a guide. Downloading the software is quite easy, and the Web site contains full instructions. The GUI that is installed is perfectly ‘Windows-like’ and contains the standard menu bar and a specialized toolbar. This allows not only the expected copy/paste/save/print functions, but also very useful transfer tools for copying and pasting commands from sub-windows to the main window (called the Console), plus another for quickly switching between consoles. Commands are typed at the prompt, and output appears on the following lines.

Unfortunately, some of the more important and frequently performed operations that are easy to do in a menu-driven environment take a lot more care (and typing) in the programming environment. Although R can import data from a variety of sources, including SQL database engines, Excel spreadsheets and standard statistical programs such as SPSS, actually querying these sources is non-trivial and requires consultation of the ‘R Data Import/Export’ manual. Most spreadsheets may be accessed as .csv files, with care taken to specify the header row and the row names. Once the data are entered, however (and assuming that all necessary packages have been successfully loaded), the pain is greatly reduced.
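The .csv route can be sketched as follows. The file contents and the object name `trial` are invented for illustration; a real spreadsheet export would simply replace the temporary file written here.

```r
# Write a small made-up spreadsheet export to a temporary .csv file
# so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "sample,dose,response",
  "s1,1.0,2.3",
  "s2,2.0,4.1",
  "s3,3.0,6.2"
), csv_path)

# header = TRUE: the first line holds the column names.
# row.names = 1: the first column supplies the row names, not data.
trial <- read.csv(csv_path, header = TRUE, row.names = 1)

str(trial)        # structure of the imported data frame
rownames(trial)   # labels taken from the 'sample' column
```

Getting `header` or `row.names` wrong is a common source of trouble: the labels end up being treated as data, or the first data row is consumed as column names.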

As with the menu-driven programs, a great deal of the effort comes with the data manipulation steps prior to the actual analysis. Depending upon what needs to be done, this can be relatively straightforward (or not!). For the simple spreadsheet examples that I used, extracting subsets of data was relatively easy with just a few, almost intuitive, commands. Summary statistics and detailed analyses were performed with short code sequences, as in SAS. For example, a multiple linear regression with appropriate scatter plots was constructed with 15 lines of code, including model specification, design matrix, model fitting and plot instructions.
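A workflow of this kind might look roughly like the sketch below. It is not the code used for this review: the built-in `mtcars` data set and the choice of variables are assumptions, picked only so the example runs without any external file.

```r
# mtcars ships with R; using it here is an assumption for illustration.
data(mtcars)

# Extract a subset: manual-transmission cars, three columns only.
manual <- subset(mtcars, am == 1, select = c(mpg, wt, hp))

# Summary statistics for each column of the subset.
summary(manual)

# Multiple linear regression: fuel economy on weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The design matrix R builds from the model formula.
X <- model.matrix(fit)
head(X)

# Coefficient estimates, standard errors and R-squared.
summary(fit)

# When run as a script, draw to a null device so no file is written.
if (!interactive()) pdf(NULL)

# Scatter plots of the response against each predictor,
# plus residuals against fitted values.
op <- par(mfrow = c(1, 3))
plot(mpg ~ wt, data = mtcars, main = "mpg vs. weight")
plot(mpg ~ hp, data = mtcars, main = "mpg vs. horsepower")
plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)
```

The model formula `mpg ~ wt + hp` does most of the work: R derives the design matrix from it, so the matrix rarely has to be assembled by hand.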

As with SAS, I found that once the instructions for importing data were mastered and the relevant data structure formed, things went a lot faster. I still cannot quickly produce intricate and colorful graphics to match those that effortlessly spring from JMP and many graphics packages, but with practice and communication with the R community, things should quickly improve. As the price is unbeatable and the user community rapidly growing, I would highly recommend this package to the cash-challenged and adventurous.
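For a sense of what R's base graphics involve, plots like the image and 3-D view of Maunga Whau shown with this article can be approximated from the `volcano` data set that ships with R. This is a minimal sketch; the colours, viewing angles and axis labels are assumptions, not the settings used for the original figure.

```r
# 'volcano' is a built-in 87 x 61 matrix of elevations (metres)
# for Maunga Whau, sampled on a 10 m grid.
data(volcano)
x <- 10 * seq_len(nrow(volcano))   # grid coordinates in metres
y <- 10 * seq_len(ncol(volcano))

# When run as a script, draw to a null device so no file is written.
if (!interactive()) pdf(NULL)

# Colour image of the elevation surface with contour lines on top.
image(x, y, volcano, col = terrain.colors(100),
      xlab = "x (m)", ylab = "y (m)", main = "Maunga Whau")
contour(x, y, volcano, add = TRUE)

# 3-D perspective view; theta and phi set the viewing angle.
persp(x, y, volcano, theta = 135, phi = 30, expand = 0.5,
      col = "lightgreen", shade = 0.6, border = NA,
      zlab = "Height (m)")
```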

Availability

R Foundation for Statistical Computing
c/o Department for Statistics and Mathematics
Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria
+43 1 31336 4754, Fax: +43 1 31336 774
www.R-project.org

Resources

1. Everitt, Brian S. and Torsten Hothorn. A Handbook of Statistical Analyses Using R. Chapman & Hall/CRC, Boca Raton, FL (2006). ISBN: 1584885394.
2. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2005). ISBN 3900051070, www.r-project.org.

John Wass is a statistician based in Chicago, IL. He may be reached at [email protected].

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media