SANTA CLARA, CA — At the University of Miami’s Center for Computational Science (CCS), more than 2,000 internal researchers and a dozen expert collaborators across academic and industry sectors worldwide are working together in workflow management, data management, data mining, decision support, visualization and cloud computing. CCS maintains one of the largest centralized academic cyberinfrastructures in the country, which fuels vital and critical discoveries in Alzheimer’s, Parkinson’s, gastrointestinal cancer, paralysis and climate modeling, as well as marine and atmospheric science research.
To streamline workflows and keep pace with data-intensive discovery demands, CCS integrated its high-performance computing (HPC) environment with data capture and analytics capabilities, allowing data to move transparently between research steps. To speed scientific discoveries and boost collaboration with researchers around the world, the center deployed high-performance DataDirect Networks (DDN) GS12K scale-out file storage. CCS now relies on GS12K storage to handle bandwidth-driven workloads while serving the very high IOPS demand that results from intense user interaction, which simplifies data capture and analysis. As a result, the center can capture, store and distribute on DDN storage the massive amounts of data generated from multiple scientific models running different simulations on 15 Illumina HiSeq sequencers simultaneously. Moreover, number-crunching time for genome mapping and SNP calling has been reduced from 72 hours to 17 hours.
“DDN enabled us to analyze thousands of samples for the Cancer Genome Atlas, which amounts to nearly a petabyte of data,” explained Dr. Nicholas Tsinoremas, director of the Center for Computational Science at the University of Miami. “Having a robust storage platform like DDN is essential to driving discoveries, such as our recent study that revealed a link between certain viruses and gastrointestinal cancers. Previously, we couldn’t have done that level of computation.”
In addition to significant storage processing power to meet both high I/O and interactive processing requirements, CCS needed a flexible file system that could support large parallel and short serial jobs. The center also needed to address “data in flight” challenges that result from major data surges during analysis, which often cause a 10x spike in storage demand. The system’s performance for genomics assembly, alignment and mapping enables CCS to support all its application needs, including the use of BWA and Bowtie for initial mapping, as well as SAMtools and GATK for variant analysis and SNP calling.
“Our arrangement is to share data or make it available to anyone asking, anywhere in the world,” added Tsinoremas. “Now, we have the storage versatility to attract researchers from both within and outside the HPC community … we’re well-positioned to generate, analyze and integrate all types of research data to drive major scientific discoveries and breakthroughs.”
DataDirect Networks is a big data storage supplier to data-intensive, global organizations. For more than 15 years, the company has designed, developed, deployed and optimized systems, software and solutions that enable enterprises, service providers, universities and government agencies to generate more value and to accelerate time to insight from their data and information, on premises and in the cloud. Organizations leverage DDN technology and the technical expertise of its team to capture, store, process, analyze, collaborate on and distribute data, information and content at the largest scale in the most efficient, reliable and cost-effective manner. DDN customers include financial services firms and banks, healthcare and life science organizations, manufacturing and energy companies, government and research facilities, and web and cloud service providers.
For further information
- University of Miami Center for Computational Science case study
- Genomics Solution Brief