Super-sized Flash Memory System Will Aid Genomics, Astrophysics
Leveraging lightning-fast technology already familiar to many from the micro storage world of digital cameras, thumb drives and laptop computers, the San Diego Supercomputer Center (SDSC) at the University of California, San Diego has unveiled a “super-sized” version: a “flash” memory-based supercomputer that accelerates investigation of a wide range of data-intensive science problems.
The new high performance computing (HPC) system, dubbed “Dash,” is an element of the Triton Resource, an integrated, data-intensive resource that went online earlier this summer and is primarily designed to support UC San Diego and UC researchers. As envisioned, this “system within a system” will help researchers looking for solutions to particularly data-intensive problems that arise in astrophysics, genomics and many other domains of science.
While Dash, which already has begun trial runs, is a medium-sized system as supercomputers go, with a peak speed of 5.2 teraflops (TF), it has several unique properties, including the first use of flash memory technology in an HPC system, using Intel High-Performance SATA Solid-State Drives. Four of its nodes are specially configured as I/O nodes, each serving up 1 terabyte (TB) of flash memory to any other node, courtesy of new I/O controllers also developed by Intel and integrated by Appro International. (One terabyte equals one trillion bytes of storage capacity.)
The system features 68 Appro GreenBlade servers with dual-socket quad-core Intel Xeon processor 5500 series (formerly codenamed Nehalem) nodes linked to an InfiniBand interconnect. In its current configuration, Dash has 48 gigabytes (GB) of DRAM memory on each node, and employs vSMP Foundation software from ScaleMP, which provides virtual symmetric multiprocessing capabilities and aggregates memory across 16 nodes into shared memory “supernodes,” giving users access to as much as 768 GB of shared DRAM memory in addition to 1 TB of flash memory per “supernode.”
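The supernode arithmetic above can be checked directly; the following sketch simply restates the figures quoted in the article (per-node DRAM, nodes per supernode, flash per supernode) and computes the aggregate:

```python
# Illustrative arithmetic for Dash's vSMP "supernode" memory aggregation,
# using only the figures quoted in the article.
dram_per_node_gb = 48       # DRAM on each GreenBlade node
nodes_per_supernode = 16    # nodes aggregated by vSMP Foundation
flash_per_supernode_tb = 1  # flash memory served per supernode

# Shared DRAM visible to a user on one supernode:
shared_dram_gb = dram_per_node_gb * nodes_per_supernode
print(f"{shared_dram_gb} GB shared DRAM + {flash_per_supernode_tb} TB flash per supernode")
```

Running it confirms the 768 GB of shared DRAM cited for each 16-node supernode.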
“Dash’s use of flash memory for fast file-access and swap space — as opposed to spinning discs that have much slower latency or I/O times — along with vSMP capabilities for large shared memory will facilitate scientific research,” said Michael Norman, interim director of SDSC. “Today’s high-performance instruments, simulations and sensor networks are creating a deluge of data that presents formidable challenges to store and analyze, challenges that Dash helps to overcome.”
For example, Dash will have the capability to search sky survey data for near-Earth asteroids and brown dwarfs that may help researchers better understand periodic extinctions on Earth, and it will speed up investigations to establish relationships among species based on their genes. Such research not only could yield new information regarding evolution, but help biomedical researchers mine these complex data sets for clues to develop new drugs or cures for a variety of diseases.
“Dash can do random data accesses one order-of-magnitude faster than other machines,” said Allan Snavely, associate director at SDSC. “This means it can solve data-mining problems that are looking for the proverbial ‘needle in the haystack’ more than 10 times faster than could be done on even much larger supercomputers that still rely on older ‘spinning disk’ technology.”
Dash is currently being tested but soon will be made available to users of the TeraGrid, the nation’s largest open-access scientific discovery infrastructure, for evaluation and development of application codes that can take advantage of flash memory and virtual “supernodes” technology. For additional information about access and allocations, see www.teragrid.org.
As an organized research unit of UC San Diego, SDSC is a national leader in creating and providing cyberinfrastructure for data-intensive research. Cyberinfrastructure refers to an accessible and integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC recently doubled its size to 160,000 square feet with a new, energy-efficient building and data center extension, and is a founding member of the TeraGrid.