The Garvan Institute of Medical Research in Sydney, Australia, has pioneered insights into some of the most widespread diseases affecting the world today. Garvan is widely recognized for its genomics expertise, as well as its willingness to adopt new technologies. The institute envisions itself as an enabler that can rapidly prototype and evaluate specific analyses. Once verified, those analyses are made available to downstream research institutions, as well as businesses working to commercialize genomics technology.
“Meeting our commitments to researchers requires extremely high computational power that is available 24/7,” says Dr. Warren Kaplan, chief of informatics at Garvan.
When Illumina, a manufacturer of genomics sequencing instruments, introduced its new HiSeq X Ten sequencing platform in 2014, Garvan was one of only three organizations worldwide to upgrade at product introduction. The HiSeq X Ten system produces up to five terabytes of data each day, which presented daunting challenges for building a genomics production line. To keep the line rolling, downstream analysis and archiving operations had to handle the data torrent generated by the new system. Running the Illumina system at full capacity required changes to the existing infrastructure, most notably the implementation of parallel processing.
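To put the five-terabyte daily figure in perspective, the short Python sketch below converts it to an average sustained data rate and estimates how long that output would take to fill 400 TB, the total Panasas capacity cited later in this article. It is a back-of-the-envelope illustration only, not Garvan's capacity planning: it assumes decimal units (1 TB = 10^12 bytes), continuous output at the stated maximum, and an empty storage pool with nothing deleted or compressed. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(tb_per_day: float) -> float:
    """Average data rate in MB/s, assuming decimal units (1 TB = 10**12 bytes)."""
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


def days_to_fill(capacity_tb: float, tb_per_day: float) -> float:
    """Days until an empty pool is full, assuming nothing is deleted or compressed."""
    return capacity_tb / tb_per_day


# Figures taken from the article: up to 5 TB per day, 400 TB of total storage.
print(f"{sustained_rate_mb_per_s(5):.1f} MB/s")  # prints "57.9 MB/s"
print(days_to_fill(400, 5))                      # prints "80.0"
```

The average rate looks modest, but it is only the ingest side. Downstream analysis reads and rewrites the same data many times, so the load on shared storage is considerably higher than this average suggests.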
Five Panasas ActiveStor network-attached storage (NAS) appliances “arrived just in time to quell a user mutiny over painfully low application response,” as an expansion of the research staff from 10 to 80 had overloaded the existing storage system. The Garvan team later installed an additional ActiveStor storage system, bringing the Institute’s total Panasas storage to 400 TB.
ActiveStor is a fully integrated solution consisting of hybrid storage hardware, a file system, and protocols, combined with triple parity protection. The high-performance storage solution delivers the fast data access needed to support rapid prototyping and evaluation of the specific analyses required for genomic sequencing.
Performance and data protection both increase as the system scales, accelerating time-to-results for even the most demanding high-performance workloads. Its 8 TB drive technology supports scalability to more than 20 petabytes (PB) and 200 gigabytes per second (GB/s), while RAID 6+ triple parity protection, implemented per file in distributed software using erasure codes, provides a 150-fold increase in reliability over dual-parity approaches.
In Illumina’s recommended workflow, sequencer data moves back and forth between EMC Isilon central storage and local storage on the compute nodes. However, “thanks to Panasas’ exceptional performance, our sequencing data stays in the central repository throughout the analysis,” said Kaplan. “This streamlined workflow saves time and bandwidth, enabling us to deliver results quickly to researchers around the world.”
The Illumina sequencers, combined with Panasas high-performance storage, have increased Garvan’s sequencing capacity to an average of 50 genomes per day, a fiftyfold improvement.