Four Months to Full NGS Output
How the University of Washington scaled up with a LIMS
When the University of Washington (UW) received $23 million of a $64 million grant for the Large-Scale Exome Sequencing Project from the National Heart, Lung, and Blood Institute (NHLBI), investigators knew the clock was ticking. Grants generally have a limited shelf life, and this one was no exception. The UW’s Northwest Genomics Center would need to sequence a total of 4,000 exomes over two years, an ambitious goal in a tight timeframe. Moreover, the DNA would come from large cohort studies such as the Framingham Heart Study and the Women’s Health Initiative. The grant consequently required investigators to track samples and associated data from the moment samples arrived in the lab to the point when results were reported. The pressure to quickly produce actionable data that would advance scientific research and improve outcomes was significant.
The project aims to identify genetic links to heart, lung and blood diseases, and also explores the use of exome sequencing to analyze samples from 12 well-phenotyped heart and lung cohorts. Together, UW’s Northwest Genomics Center and the Broad Institute will analyze 7,000 exomes to find genes linked not only to heart, lung and blood diseases but also to cancer and autism.
The Northwest Genomics Center knew it would need to scale rapidly, so it invested in a commercial laboratory information management system (LIMS) designed specifically for next-generation sequencing (NGS). It then determined which sequencers, equipment and staff the project would require. In just 12 weeks, a little-used storeroom was converted into a full-scale NGS research lab. Four weeks later, the center had installed the GenoLogics LIMS, was running 11 Genome Analyzers simultaneously and had doubled its scientific staff. Today, the lab produces more than 130 gigabases of sequence data per day.
UW was well positioned to serve as one of the project’s two primary sequencing centers. Mark Rieder, Ph.D., research associate professor at UW and one of the principal investigators at the Northwest Genomics Center, and Debbie Nickerson, Ph.D., professor of genome sciences at UW, are leaders in medical sequencing of cardiovascular, blood and lung diseases. Other UW principal investigators on the project are Dr. Jay Shendure, a pioneer in NGS development and application; Phil Green, Ph.D., a world leader in developing new software tools for sequence analysis, including the tools that helped generate the first human genome sequence; and Dr. Michael Bamshad, whom the NHLBI tapped to lead the lung disease component of the national project.
The Northwest Genomics Center faced several challenges. First was determining how to trace, track and manage samples associated with the project. Some of the samples would be limited in quantity and most would be irreplaceable.
“We wanted to be able to really trace what was done to plates and what was done to samples and be able to see exactly what happened. And we needed to be able to find this information months, or even years, later and not have it spread across paper notebooks, Excel spreadsheets and Google docs,” said Johanna Swanson, previously a scientific programmer at the Northwest Genomics Center.
Accurately tracking samples and what was done to them meant the lab needed visibility across the entire sequencing workflow. Samples would need to be labeled uniquely in order to be tracked, and researchers would need to track tubes, plates, flow cells and reagents — everything with which samples might interact. Lab staff also needed to monitor instrumentation and sample status, to schedule instrument runs to avoid conflicts, and to keep processes running efficiently.
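Avoiding scheduling conflicts comes down to checking whether two bookings on the same instrument overlap in time. The Python sketch below is a minimal illustration under that assumption, not how the GenoLogics LIMS implements scheduling; the `Run` record, the instrument names and `find_conflicts` are hypothetical.

```python
from datetime import datetime
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Run(NamedTuple):
    """A booked run on one instrument (hypothetical record)."""
    instrument: str
    start: datetime
    end: datetime


def find_conflicts(runs: List[Run]) -> List[Tuple[Run, Run]]:
    """Return every pair of runs that overlap in time on the same instrument."""
    by_instrument = {}
    for run in runs:
        by_instrument.setdefault(run.instrument, []).append(run)

    conflicts = []
    for booked in by_instrument.values():
        booked.sort(key=lambda r: r.start)
        # Compare all pairs, not just neighbours: one long run can overlap
        # several shorter runs that do not overlap each other.
        for a, b in combinations(booked, 2):
            # Half-open intervals: a run ending exactly when another starts is fine.
            if a.start < b.end and b.start < a.end:
                conflicts.append((a, b))
    return conflicts
```

Comparing every pair on an instrument is quadratic, which is harmless at the scale of a single lab's booking calendar.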
Automation was another consideration. Researchers needed to generate robotic work lists automatically, capture data directly from robots and minimize the errors that can occur when plates are rotated or swapped. Additionally, the center had already developed its own custom analysis system, the Integrated Sequencing Information System (ISIS), to handle sequencing automation, analysis and quality control (QC); any system implemented for sample tracking would need to integrate seamlessly with ISIS. Finally, because protocols change constantly, the genomics LIMS would need to be flexible and adaptable.
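Generating a work list from tracked plate contents, rather than typing one by hand, is what removes transcription errors. The sketch below assumes a simple one-to-one plate-to-plate transfer and a generic CSV layout; real liquid handlers each use their own work list formats, and the column names, `build_worklist` and `barcodes_match` are hypothetical. The barcode check catches swapped plates; it would not by itself detect a plate loaded rotated by 180 degrees.

```python
import csv
import io

ROWS = "ABCDEFGH"
COLS = range(1, 13)


def wells_96():
    """Wells of a 96-well plate in column-major order: A1, B1, ..., H12."""
    return [f"{row}{col}" for col in COLS for row in ROWS]


def build_worklist(src_barcode, dst_barcode, samples, volume_ul):
    """Return a plate-to-plate transfer work list as CSV text.

    `samples` maps source well -> sample ID; each sample goes to the same
    well on the destination plate. Plate barcodes appear on every line so
    the barcodes read on the robot deck can be checked against the list.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["SampleID", "SourcePlate", "SourceWell",
                     "DestPlate", "DestWell", "Volume_uL"])
    for well in wells_96():
        if well in samples:
            writer.writerow([samples[well], src_barcode, well,
                             dst_barcode, well, volume_ul])
    return out.getvalue()


def barcodes_match(worklist_csv, scanned_src, scanned_dst):
    """Check deck barcode reads against the work list.

    Returns False if the source and destination plates were swapped or a
    different plate was loaded.
    """
    rows = csv.DictReader(io.StringIO(worklist_csv))
    return all(r["SourcePlate"] == scanned_src and r["DestPlate"] == scanned_dst
               for r in rows)
```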
To accelerate implementation, the Northwest Genomics Center hired two programmers to adapt the system to the center’s various specialized sample handling and processing workflows (Figure 1). The changes would be made using the Rapid Scripting application programming interface (API) available in the GenoLogics LIMS. The API lets staff write custom scripts that extend the LIMS beyond its out-of-the-box functionality. It is built to accommodate open source or commercial bioinformatics tools; the Northwest Genomics Center wrote its scripts in Groovy.
In just four months, the center had configured the LIMS to support the lab’s existing sample quality control, exome capture and sequencing workflows. Two months later, the lab had fully scaled up to eleven Genome Analyzer II (GAIIx) instruments. Today, the lab has eleven HiSeqs, five GAIIxs and seven cBots, and 25 staff work in the NGS lab.
The center’s qPCR workflow has changed several times as the lab has updated protocols and added instrumentation. One of the first revised workflows is shown in Figure 2. The initial workflow (Figure 2a) took seven days and required extensive manual effort, particularly during size selection and amplification. Implementing Roche NimbleGen technology enabled the center to shave a full day off the workflow and eliminated the time-consuming “tube work” (shown in purple) that created bottlenecks in the process (Figure 2b). The workflow begins with a template that researchers complete to record sample information. The form is verified and imported into the LIMS, and the sample IDs are then tracked throughout the project. The complete history of a sample, along with links to all data files and details of every process applied to the sample, is easily viewed in the sample genealogy stored in the LIMS.
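A sample genealogy can be thought of as a chain of parent links from each derived item back to the original sample. The sketch below is a simplified, hypothetical model (`Artifact`, `derive` and `history` are illustrative names, not part of any GenoLogics API) that gives each item a single parent; pooled or multiplexed libraries, which have several parents, would need a graph rather than a chain.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artifact:
    """A tracked item: an original sample or something derived from it."""
    artifact_id: str
    process: str                          # step that produced this item
    parent: Optional["Artifact"] = None   # None for an original sample
    data_files: List[str] = field(default_factory=list)


def derive(parent: Artifact, artifact_id: str, process: str,
           data_files: Optional[List[str]] = None) -> Artifact:
    """Record a new item produced from `parent` by `process`."""
    return Artifact(artifact_id, process, parent, list(data_files or []))


def history(artifact: Artifact) -> List[Artifact]:
    """Follow parent links back to the original sample; return oldest first."""
    chain = []
    node = artifact
    while node is not None:
        chain.append(node)
        node = node.parent
    chain.reverse()
    return chain
```

Calling `history` on a captured library returns the original sample, the library prep step and the capture step in order, each with its attached data files.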
Most of the new qPCR workflow took place on 96-well plates, which sped sample handling and prevented errors associated with transferring samples from plates to tubes. More significantly, the steps shown in green in the new workflow were handled entirely by NimbleGen and initiated with a robotic work list generated by the LIMS. Implementing this streamlined workflow took three weeks: one week to script and the remainder to ensure smooth implementation and plate transfer. Since implementing the initial workflow, Swanson has added scripts to accommodate new library prep protocols and the lab’s HiSeqs and cBots. Each of these procedural changes has been managed through the Rapid Scripting API, enabling Swanson to swap out scripts without disrupting the overall workflow.
Implementing the LIMS also enabled the Northwest Genomics Center to streamline key sample QA/QC steps by collecting data automatically to inform go/no-go decisions about individual samples or to validate sample tracking. Flags in the system direct scientists to data points that need attention. The LIMS assists with the following decision points:
- Production of a sample manifest: Does the manifest match the barcode-labeled plate?
- Picogreen assay: Is the DNA concentration sufficient?
- Gender assay: Are the plate sample locations correct?
- DNA fingerprinting: Are the original and final SNP scores for a sample correct?
- qPCR validation: Is there a sufficient quantity of DNA to sequence?
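Each decision point reduces to comparing a measurement against an expectation or a cutoff. The sketch below turns the five checks into flags and a go/no-go call. The thresholds, field names and functions are hypothetical placeholders; the article does not state the center’s actual cutoffs or how its LIMS encodes these rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical cutoffs for illustration only.
MIN_DNA_NG_PER_UL = 10.0
MIN_SNP_CONCORDANCE = 0.95
MIN_LIBRARY_NM = 2.0


def snp_concordance(original: Sequence[Optional[str]],
                    final: Sequence[Optional[str]]) -> float:
    """Fraction of SNPs whose genotype calls agree; no-calls (None) are skipped."""
    called = [(a, b) for a, b in zip(original, final)
              if a is not None and b is not None]
    if not called:
        return 0.0
    return sum(a == b for a, b in called) / len(called)


@dataclass
class QCResult:
    sample_id: str
    manifest_barcode: str           # plate barcode listed on the manifest
    scanned_barcode: str            # barcode read from the physical plate
    dna_ng_per_ul: float            # Picogreen concentration
    manifest_sex: str
    assayed_sex: str                # result of the gender assay
    fingerprint_concordance: float  # original vs. final SNP calls
    library_nm: float               # qPCR library quantity


def qc_flags(r: QCResult) -> List[str]:
    """List the checks a sample fails; an empty list means no flags."""
    flags = []
    if r.manifest_barcode != r.scanned_barcode:
        flags.append("manifest does not match plate barcode")
    if r.dna_ng_per_ul < MIN_DNA_NG_PER_UL:
        flags.append("DNA concentration too low")
    if r.manifest_sex != r.assayed_sex:
        flags.append("gender mismatch: check plate positions")
    if r.fingerprint_concordance < MIN_SNP_CONCORDANCE:
        flags.append("fingerprint mismatch between original and final sample")
    if r.library_nm < MIN_LIBRARY_NM:
        flags.append("too little library to sequence")
    return flags


def go_no_go(r: QCResult) -> str:
    """A sample proceeds only if it raises no flags."""
    return "go" if not qc_flags(r) else "no-go"
```

Returning the list of failed checks, rather than a bare yes or no, mirrors the idea of flags that direct scientists to the data points needing attention.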
As part of its ongoing customization, the team has developed scripts to streamline multiplexing, which poses a significant sample handling challenge. Technicians set up a multiplex run by selecting a plate layout from a dropdown box in the LIMS; when the final library is built, the LIMS automatically adds barcode information and renames the library tubes to reflect the selected layout. The team has also implemented a mix of in-house tools and third-party software for analysis pipelining, and the LIMS facilitates mapping by providing an index file that lets researchers associate raw sequence files with their source samples.
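The layout-driven barcoding and the index file can be sketched together: a chosen layout assigns an index to each well, each tube is renamed to carry that index, and the index file maps every index sequence in a lane back to its source sample. The layout name, index labels and sequences below are made-up placeholders, and the functions are hypothetical, not the center’s scripts.

```python
import csv
import io

# Hypothetical layout and placeholder index sequences, for illustration only.
LAYOUTS = {
    "4-plex": {"A1": "IDX01", "B1": "IDX02", "C1": "IDX03", "D1": "IDX04"},
}
INDEX_SEQUENCES = {
    "IDX01": "ACGTAC", "IDX02": "CATGCA", "IDX03": "GTACGT", "IDX04": "TGCATG",
}


def assign_barcodes(layout_name, samples_by_well, pool_id):
    """Attach an index to each library per the chosen layout and rename its tube."""
    layout = LAYOUTS[layout_name]
    records = []
    for well, sample_id in sorted(samples_by_well.items()):
        index_name = layout[well]  # raises KeyError if the well is not in the layout
        records.append({
            "sample_id": sample_id,
            "index": index_name,
            "sequence": INDEX_SEQUENCES[index_name],
            "tube_name": f"{pool_id}_{sample_id}_{index_name}",
        })
    return records


def index_file(records, lane):
    """CSV linking each index sequence in a lane back to its source sample."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["lane", "sequence", "sample_id"],
                            lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow({"lane": lane, "sequence": rec["sequence"],
                         "sample_id": rec["sample_id"]})
    return out.getvalue()
```

Because the tube name and the index file are both generated from the same records, the barcode a read carries and the sample it is attributed to cannot drift apart through manual renaming.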
The Northwest Genomics Center recently had an experience that drove home the institutional benefits of its new LIMS. The center was asked to remap every exome it had sequenced in order to reconcile its work with a modern build of the genome.
“We could do this because the LIMS tells us what’s what, what it was run on, what the target definition was, and we can find it on demand,” said Jeff Furlong, senior computer specialist at the Northwest Genomics Center. “Having a LIMS built for next-generation sequencing means we can do things like meet this request, something that would have been inconceivable before implementing the GenoLogics LIMS.”
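Answering a remapping request on demand amounts to a query over per-exome metadata. The sketch below assumes a hypothetical record holding what Furlong lists (the sample, what it was run on, its target definition) plus the raw read path and the reference build it is currently mapped to. It groups everything not yet on the new build by target definition, so each batch can be remapped and evaluated against the capture design it was sequenced with. Field names, paths and build labels are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ExomeRecord(NamedTuple):
    """Hypothetical per-exome record pulled from a LIMS."""
    sample_id: str
    flowcell: str            # what it was run on
    fastq_path: str          # raw reads to remap
    target_definition: str   # capture target set used
    reference_build: str     # build it is currently mapped to


def remap_queue(records: List[ExomeRecord],
                new_build: str) -> Dict[str, List[ExomeRecord]]:
    """Group exomes not yet mapped to `new_build` by target definition."""
    queue: Dict[str, List[ExomeRecord]] = defaultdict(list)
    for rec in records:
        if rec.reference_build != new_build:
            queue[rec.target_definition].append(rec)
    return dict(queue)
```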
Mike Sanders is Genomics Product Manager at GenoLogics Life Sciences Software. He may be reached at editor@ScientificComputing.com.