The San Diego Supercomputer Center (SDSC) at UC San Diego has been awarded a National Science Foundation (NSF) grant to augment its campus computing cluster with targeted bioinformatics capabilities for researchers across campus and their collaborators. These include the ability to conduct de-multiplexing, mapping, and variant calling of a single human genome in less than one hour.
The grant is part of the NSF’s Campus Cyberinfrastructure (CC*) program, which invests in coordinated campus-level cyberinfrastructure (CI) components of data, networking, computing infrastructure, capabilities, and integrated services that lead to higher levels of performance, reliability, and predictability for science applications and distributed research projects. The program explicitly addresses learning and workforce development in CI, and science-driven requirements are the primary motivation for any proposed activity.
“This new award illustrates SDSC’s increasing role in providing high-performance campus cyberinfrastructure in addition to its ongoing national supercomputing role,” said SDSC Director Michael Norman. “Through the capabilities enabled by this award, we expect to see substantial gains in productivity that should be of benefit to many UC San Diego researchers using bioinformatics tools and techniques for life sciences research.”
“A key objective of this project is to leverage new technology to provide accelerated computing capacity so that researchers can conduct approximately 8,000 whole-genome analyses per year, plus the ability to conduct quick turnaround single-genome analyses in about one hour,” said Ron Hawkins, SDSC’s Industry Relations director and the principal investigator for the project. “The latter capability could be particularly useful for precision medicine and emerging clinical applications of genomics.”
“The project will enable analysis and re-analysis of existing genome data in the context of the new genomes that will be sequenced over the coming years,” said Terry Gaasterland, a co-investigator in the project as well as a UC San Diego professor of computational biology and genomics, and director of the Scripps Genome Center. “This ability will bring new value to genome information and will accelerate how we tie genome variants to diagnosis and prediction of progression, onset and response to therapy.”
Triton Shared Computing Cluster Upgrades, ‘BioBurst’
The NSF award, valued at almost half a million dollars and slated to run through January 2018, provides funding for new hardware for UC San Diego’s Triton Shared Computing Cluster, or TSCC, a “condo computing” program established in 2013 that has seen strong growth over the last two years. Condo computing is a shared ownership model in which researchers use funds from grants or other sources to purchase and contribute compute “nodes” (computer servers) to the system. The result is a researcher-owned, shared computing resource of medium to large proportions, much larger than a typical principal investigator could afford for dedicated use. The already large and growing life sciences research enterprise at UC San Diego is an increasing consumer of computing capacity on TSCC.
Under the NSF award, SDSC will implement a separately scheduled partition of TSCC with technology designed to address key areas of bioinformatics computing including genomics, transcriptomics, and immune receptor repertoire analysis. Called ‘BioBurst’, the system will incorporate the following major components:
- An input/output (I/O) accelerator appliance with 40 terabytes of non-volatile memory and software designed to improve network throughput by alleviating the small-block/small-file I/O problem characteristic of many bioinformatics codes;
- A field programmable gate array (FPGA)-based computational accelerator system that has been demonstrated to perform de-multiplexing, read mapping, and variant calling of complete human genomes in about 22 minutes;
- About 700 commodity computing cores, which will access the I/O accelerator and provide a separately scheduled resource for running bioinformatics applications;
- Integration with a large-scale Lustre parallel file system, which supports streaming I/O and has the capacity to stage the large amounts of data associated with many bioinformatics studies; and
- Customization of the job scheduler to accommodate bioinformatics workflows, which can consist of hundreds to thousands of jobs submitted by a single user at one time.
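As a rough consistency check on the figures quoted in this article, the sketch below compares the target of approximately 8,000 whole-genome analyses per year with the demonstrated time of about 22 minutes per genome. It assumes a single accelerator processing genomes one after another with no staging, queueing, or maintenance overhead; that assumption and the function names are illustrative only and do not describe how BioBurst is actually scheduled.

```python
# Back-of-the-envelope throughput check using figures quoted in the article.
# Assumption (not from the article): one accelerator runs genomes strictly
# back to back, with no data staging, queueing, or downtime.

MINUTES_PER_GENOME = 22          # "about 22 minutes" per complete human genome
GENOMES_PER_YEAR = 8_000         # "approximately 8,000 whole-genome analyses per year"
MINUTES_PER_YEAR = 365 * 24 * 60


def accelerator_busy_fraction(genomes_per_year, minutes_per_genome):
    """Fraction of a year the accelerator would be busy at the given load."""
    return genomes_per_year * minutes_per_genome / MINUTES_PER_YEAR


def max_genomes_per_year(minutes_per_genome):
    """Upper bound on whole genomes per year under the back-to-back assumption."""
    return MINUTES_PER_YEAR // minutes_per_genome


if __name__ == "__main__":
    busy = accelerator_busy_fraction(GENOMES_PER_YEAR, MINUTES_PER_GENOME)
    print(f"Busy fraction at the target load: {busy:.1%}")
    print(f"Idealized ceiling: {max_genomes_per_year(MINUTES_PER_GENOME)} genomes per year")
```

Under these idealized assumptions the annual target would occupy roughly a third of the accelerator's time, which is consistent with also leaving headroom for quick-turnaround single-genome runs.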
“UC San Diego is grateful for this new support from the National Science Foundation’s CC* program to enhance the power of TSCC,” said Valerie Polichar, Director of Research IT Services at UC San Diego. “This award extends our overall research capabilities to innovatively enable our biomedical and life sciences researchers to push the boundaries of science through new computational resources and methods.”