The Softer Side of Genomics
Mike May, Ph.D.
In March 2011, Thomson Reuters Science Watch rated genetics as the hottest research area. Around the same time, the U.S. Food & Drug Administration approved the first genomics-derived drug — Benlysta for lupus from Human Genome Sciences (Rockville, Md.) and GlaxoSmithKline (Brentford, U.K.). These bits of news also arose during a seemingly unending increase in sequencing power, from next generation to next-next and beyond. All of these aspects of genomics indicate the ongoing need for advances in software. Researchers need tools to track data in today’s complex experiments and to analyze the output.
When asked about the biggest challenges in using genomics software today, data analyst Johanna Swanson, who works in Deborah Nickerson’s laboratory at the University of Washington, says, “For our lab, it’s scalability. We need to have the computer power to do the analysis and also flexibility in what you can do since the science is changing constantly.”
Beyond scaling in terms of technology, genomics software must also fit the needs of a wide range of users. “There are both advanced users and general users,” says Mindy Zhang, a senior scientist who works in the functional genomics group at Genzyme (Cambridge, Mass.). In addition, the range of users puts the software to use in different ways. “You need a group of software to meet the different user needs—all with minimal overlap,” says Zhang. “It’s quite challenging.”
As an overview of some of the categories of genomics research, Zhang mentions handling data from arrays, genetics, and next-generation sequencing (NGS), as well as data mining. In considering researchers working in all of these areas, she says that you must think about how many applications people can handle. “We have all this different software, but we want to make sure that people understand what’s going on behind the user interface.” That’s especially difficult to do since you “need to revisit software every year. [A] program that is the greatest one year can be obsolete the next,” she says.
Staying on track
To keep track of data, Swanson and her colleagues use the GenoLogics LIMS, or lab information management system. GenoLogics (Victoria, B.C., Canada) made this LIMS software just for genomics labs. “It tracks what tests we do to samples from the moment they enter the lab until we hand them off to the sequencers,” Swanson explains. “It tracks plates, transfers, sample prep, or any quality control steps.”
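As a rough illustration of the record keeping Swanson describes, the following Python sketch logs each step applied to a sample and answers two questions: where the sample is now, and whether it has passed quality control. It is not the GenoLogics data model. The `Event` and `Sample` classes, their fields, and the step names are hypothetical and invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Event:
    """One tracked step applied to a sample (hypothetical record layout)."""
    step: str                      # e.g. "received", "transfer", "qc"
    plate: Optional[str] = None    # plate the sample sits on after the step
    well: Optional[str] = None
    reagents: List[str] = field(default_factory=list)
    passed: Optional[bool] = None  # only meaningful for QC steps


@dataclass
class Sample:
    sample_id: str
    history: List[Event] = field(default_factory=list)

    def record(self, event: Event) -> None:
        """Append a step to the sample's history."""
        self.history.append(event)

    def current_location(self) -> Optional[Tuple[str, Optional[str]]]:
        """Return (plate, well) from the most recent event that set a plate."""
        for event in reversed(self.history):
            if event.plate is not None:
                return (event.plate, event.well)
        return None

    def ready_for_sequencing(self) -> bool:
        """True if at least one QC step was recorded and none failed."""
        qc_events = [e for e in self.history if e.step == "qc"]
        return bool(qc_events) and all(e.passed for e in qc_events)


# Toy history: a sample arrives, is moved to a new plate, then passes QC.
sample = Sample("S-001")
sample.record(Event("received", plate="P1", well="A1"))
sample.record(Event("transfer", plate="P2", well="B3", reagents=["buffer"]))
sample.record(Event("qc", passed=True))
```

Keeping an append-only history, rather than overwriting a single "current state" field, is what lets a lab reconstruct every transfer and reagent addition after the fact.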
The complexity of modern genomics research requires such tracking. “Some labs have been using manual or semi-manual processes or homebrew solutions,” says Bruce Pharr, vice president, products and marketing at GenoLogics. “Given the expense of some modern techniques, including NGS, it pays to prepare samples as efficiently as possible. For example, a lab might run samples from multiple patients, even samples from different labs, in one NGS run,” says Michael Kuzyk, PhD, senior product manager, genomics at GenoLogics.
Such a variety of samples means some researchers want the ability to modify software. That’s why GenoLogics developed its rapid-scripting application programming interface (API). “This allows on-the-fly adaptations to workflows,” says Pharr.
Kuzyk adds, “With the rapid-scripting API, users with some programming capabilities can ensure that any data-analysis pipeline they generate can be seamlessly integrated into our own system—all without our assistance.” For customers who lack that programming skill, GenoLogics helps there too. “We can customize our software for customers where needed,” says Pharr.
Swanson points out that the GenoLogics software is pretty easy to use. “We picked it up in a few weeks,” she says. “Our techs use it through a [graphical user interface]. They can show that, for instance, they moved a sample from this plate to that one and what reagents they added.”
To make this software even easier for some users, GenoLogics and Illumina (San Diego, Calif.) signed a co-selling agreement. In short, researchers who use Illumina sequencers and kits will find that the GenoLogics LIMS is ready to run the sequencing. “It includes preconfigured workflows so you can be up and running right away,” Kuzyk explains.
Getting a jump on statistics
The increasing complexity of genomics research also demands new ways to analyze the results. “Many studies are now conducted at the whole-genome scale,” says Zhang of Genzyme, “so it becomes essential that software includes a comprehensive stats tool.”
To get that statistical power, Zhang and her colleagues use JMP Genomics from SAS (Cary, N.C.). According to Shannon Conners, a product manager for JMP life sciences, the latest JMP Genomics 5 needs to handle a range of users. “There are biologists at a bench who need genomic analysis and software to make sense of their data,” she says. “There are also biostatisticians who want to do even more detailed analysis.” That breadth of applications creates a challenge.
As Conners explains it: “You need workflows that are simplified for nonexperts but provide enough flexibility and openness of code to satisfy biostatisticians.” To help bridge that divide, SAS built JMP Genomics with an open architecture. When a biostatistician wants to “look under the hood,” it’s possible. For a nonexpert who just wants to “run the engine,” that’s possible too.
Expanding applications
When asked how drug discovery and development researchers might use genomics software, Conners quickly starts down a list. “On the expression side and in early studies,” she says, “researchers might look at profiles to explain how a drug works—looking for the mode of action and which pathways get turned on or off.” She adds that pharmaceutical researchers also explore pathways in search of adverse side effects.
With expanding genomics capabilities, researchers also dig deeper into an individual’s genome. “They often look at how individual variation impacts someone’s response to drugs,” Conners says. “They even look at profiles of expression changes in cells and try to relate that to genetic variants.”
Genomics research also plays a role in developing diagnostics. “This can be used to see which set of patients should be given one drug or another,” Conners says. “We see lots of predictive modeling efforts aimed at finding biomarkers that indicate who will respond, or not, to a particular drug.”
Instead of continuing the list, Conners just says, “Genomics is being used throughout the process, and it’s not always with the same platform or tool.”
Consequently, SAS keeps adding features to JMP Genomics. Conners says that recent advances include a server version of the software and analytic tools that collectively examine groups of rare and common variants. She adds, “For example, you can group variants by gene pathways or within a locus to look for a relationship with a specific outcome or trait.” For NGS, Conners says, “We’ve added import tools for standard formats, with more coming for JMP Genomics 5.1. We also support text files output from partner software, such as GenoLogics.”
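The idea of grouping variants and relating the group to an outcome can be sketched in a few lines of Python. This is a generic "burden"-style illustration, not the method JMP Genomics uses. The function names, the gene names, and the toy genotypes are all assumptions made for the example, and a real analysis would apply a proper statistical test instead of a raw difference in means.

```python
from collections import defaultdict


def burden_by_gene(genotypes, variant_to_gene):
    """Sum minor-allele counts per sample within each gene.

    genotypes: {sample_id: {variant_id: allele count (0, 1, or 2)}}
    variant_to_gene: {variant_id: gene name}
    Returns {gene: {sample_id: summed allele count}}.
    """
    genes = set(variant_to_gene.values())
    burden = {gene: {} for gene in genes}
    for sample_id, calls in genotypes.items():
        totals = defaultdict(int)
        for variant_id, count in calls.items():
            gene = variant_to_gene.get(variant_id)
            if gene is not None:
                totals[gene] += count
        for gene in genes:
            burden[gene][sample_id] = totals[gene]
    return burden


def mean_difference(scores, cases):
    """Mean burden in cases minus mean burden in controls."""
    case_vals = [v for s, v in scores.items() if s in cases]
    ctrl_vals = [v for s, v in scores.items() if s not in cases]
    return sum(case_vals) / len(case_vals) - sum(ctrl_vals) / len(ctrl_vals)


# Toy data (hypothetical): four samples, three variants in two genes.
genotypes = {
    "s1": {"v1": 1, "v2": 1, "v3": 0},
    "s2": {"v1": 0, "v2": 2, "v3": 0},
    "s3": {"v1": 0, "v2": 0, "v3": 1},
    "s4": {"v1": 0, "v2": 0, "v3": 0},
}
variant_to_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
cases = {"s1", "s2"}  # samples showing the outcome of interest

burden = burden_by_gene(genotypes, variant_to_gene)
diff_a = mean_difference(burden["GENE_A"], cases)
diff_b = mean_difference(burden["GENE_B"], cases)
```

Collapsing variants to one score per gene is what makes rare variants usable here: each one alone appears in too few samples to show a relationship with the trait, but their combined count can.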
According to Zhang, “Statistics make up the strength of SAS software. It is a very effective tool for predictive modeling.” Here, the SAS software is getting even better. Conners points out the addition of tools for predictive modeling. “You can call different SAS procedures—behind the scenes—that are made for predictive modeling and tailored to genomics data sets,” she says.
To keep genomics as a hot area of research, scientists need ongoing improvement in software. This technology fuels deeper mining of the genome, and that unearths increasingly valuable discoveries that explain biology and can be used in clinical research and practice.
About the Author
Mike May is a publishing consultant for science and technology based in Austin, Texas.