Sandia National Laboratories and supercomputer manufacturer Cray Inc. are
forming an institute focused on data-intensive supercomputers.
The Supercomputing
Institute for Learning and Knowledge Systems (SILKS), to be located at Sandia
in Albuquerque, will take advantage of the strengths of Sandia and Cray by
making software and hardware resources available to researchers who focus on a
relatively new application of supercomputing. That task is to make sense of
huge collections of data rather than carry out more traditional modeling and
simulation of scientific problems. Sandia and Cray signed a cooperative
research and development agreement (CRADA) to establish the institute.
“It’s an unusual
opportunity,” said Bruce Hendrickson, Sandia senior manager of computational
sciences and math. “Cray has an exciting machine [the XMT] and we know how to
use it well. This CRADA should help originate new technologies for efficiently
analyzing large data sets. New capabilities will be applicable to Sandia’s
fundamental science and mission work.”
Shoaib Mufti,
director of knowledge management in Cray’s custom engineering group, said, “Sandia is a leading national lab with strong expertise in areas of data
analysis. The concept of big data in the HPC [high-performance computing]
environment is an important area of focus for Cray, and we are excited about
the prospect of new solutions that may result from this collaborative effort
with Sandia.”
Rob Leland, Sandia
director of computing research, added, “This is a great example of how Sandia engages
our industrial partners. The XMT was originally developed at Sandia’s
suggestion. It combined an older processor technology Cray had developed with
the Red Storm infrastructure we jointly designed, giving birth to a new class
of machines. That’s now come full circle. The Institute will leverage this
technology to help us in our national security work, benefitting the Labs and
the nation as well as our partner.”
Red Storm was the
first parallel processing supercomputer to break the teraflop barrier. Its
descendants, built by Cray, are still the world’s most widely purchased
supercomputers. The XMT, however, has a different mode of operation from
conventional parallel-processing systems.
Says Hendrickson, “Think about your desktop: The memory system’s main job is to keep the
processor fed. It achieves this through a complex hierarchy of intermediate
memory caches that stage
data that might be needed soon. The XMT does away with this hierarchy. Though
its memory accesses are distant and time-consuming to reach, the processor
keeps busy by finding something else to do in the meantime.”
In a desktop machine
or ordinary supercomputer, Hendrickson said, high performance can only be
achieved if the memory hierarchy is successful at getting data to the processor
fast enough. But for many important applications, this isn’t possible and so
processors idle most of the time. Said another way, traditional machines try to
avoid latency (waiting for data) through the use of complex memory hierarchies.
The XMT doesn’t avoid
latency; instead, it embraces it. By supporting many fine-grained snippets of a
program called “threads,” the processor switches to a new thread whenever a
memory access would otherwise force it to wait for data.
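
The effect of switching threads during memory waits can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not a model of the XMT itself: the function name `simulate` and its parameters are hypothetical, every thread is assumed to alternate between a fixed number of work cycles and a fixed memory latency, and switching threads is assumed to cost nothing.

```python
def simulate(threads: int, work_cycles: int, latency_cycles: int,
             total_cycles: int) -> float:
    """Return the fraction of cycles in which the processor did useful work.

    Toy model (an assumption for illustration): each thread computes for
    `work_cycles`, then waits `latency_cycles` for memory. On every cycle the
    processor runs the first thread whose data has arrived, or idles if
    every thread is still waiting.
    """
    ready_at = [0] * threads              # cycle at which each thread can run
    remaining = [work_cycles] * threads   # work left before the next memory wait
    busy = 0
    for cycle in range(total_cycles):
        for t in range(threads):
            if ready_at[t] <= cycle:
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # The thread issues a memory request and must wait.
                    remaining[t] = work_cycles
                    ready_at[t] = cycle + 1 + latency_cycles
                break                     # at most one thread runs per cycle
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 5, 10):
        share = simulate(n, work_cycles=1, latency_cycles=9, total_cycles=100)
        print(f"{n:2d} threads: processor busy {share:.0%} of the time")
```

In this model a single thread leaves the processor idle during every memory wait, while adding threads fills those gaps until the processor is busy on every cycle.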
“Traditional machines
are pretty good for many science applications, but the XMT’s latency tolerance
is a superior approach for lots of complex data applications,” Hendrickson
says. “For example, following a chain of data links to draw some inference
totally trashes memory locality because the data may be anywhere.” More
broadly, he says, the XMT is well suited to programs that work with large
data collections that can be represented as graphs.
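
Why following a chain of links defeats memory locality can be sketched in a few lines. The example below is an illustration only, with hypothetical names (`build_chain`, `follow_chain`, `mean_jump`): it assumes the data sits in a flat array, that each entry links to an arbitrary other entry, and that the distance between consecutively visited indices is a fair stand-in for how far apart successive memory accesses are.

```python
import random


def build_chain(n: int, seed: int = 0):
    """Build a chain over n array slots in which each slot links to an
    arbitrary other slot. Returns (next_index, start)."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)                    # the order in which slots are linked
    next_index = [0] * n
    for here, there in zip(order, order[1:] + order[:1]):
        next_index[here] = there
    return next_index, order[0]


def follow_chain(next_index, start):
    """Visit every slot once by following links, recording the indices touched."""
    visited = []
    node = start
    for _ in range(len(next_index)):
        visited.append(node)
        node = next_index[node]
    return visited


def mean_jump(addresses):
    """Average distance between consecutively accessed indices."""
    jumps = [abs(b - a) for a, b in zip(addresses, addresses[1:])]
    return sum(jumps) / len(jumps)


if __name__ == "__main__":
    n = 1000
    next_index, start = build_chain(n)
    print("sequential scan, mean jump:", mean_jump(list(range(n))))
    print("link following, mean jump: ", mean_jump(follow_chain(next_index, start)))
```

A sequential scan moves one slot at a time, which is the pattern caches are built for. Following the links instead jumps across the whole array, so the next item needed is rarely near the last one touched.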
Such computations
appear in biology, law enforcement, business intelligence, and in various
national security applications. Instead of a single answer, results are often
best viewed as graphs.
Sandia and other labs
have already built software to run graph algorithms, though “the software is
still pretty immature,” Hendrickson said. “That’s one reason for the institute.
As semantic database technology grows in popularity, these kinds of
applications may become the norm.”
Among its other
virtues, the XMT saves power because it runs at slower clock speeds. A slow
clock is normally a drawback, but it is acceptable here because the goal is not
rapid computation but rather the accurate laying-out of data points.
SILKS’ primary
objectives, as described in the CRADA, are to accelerate the development of
high-performance computing, overcome barriers to implementation, and apply new
technologies to enable discovery and innovation in science, engineering, and
homeland security.
The CRADA’s main
technical categories include software, hardware, services, outreach, education,
and training. University students and faculty, as well as scientists and
engineers from industry and government, are expected to be invited to take part
in and benefit from the institute’s research.