For decades, it has been commonly held that supercomputing capability advances predictably, with the “fastest” system in the world increasing in power by three orders of magnitude (1000X) roughly every 11 years. I put the term “fastest” in quotes because very few ask the question: fastest in what way? It turns out, of course, that this notion of “fastest” rests on a narrow measure of system performance, one focused on floating point capability.
The interesting thing is that, for many years, this notion of “fastest” tracked reasonably closely with the notion of “most useful” as well. However, the linkage between speed and utility began to unravel several years ago without many people paying much attention. About five years ago, a team from IBM Research was working on the problem of how to get to the next 1000X system improvement (the so-called exascale system) when a simple thought intruded on the technical discussions: is the design of a supercomputer immune from the implications of big data?
It turns out this was the right question to ask at the right time, because it cast light on the divergence of “speed” from utility. It is well understood that we are awash in a sea of data, with the tide coming in higher every day, minute and second. In high performance computing, every market segment has its own set of data problems: in oil and gas, it is driven by new seismic technology coupled with a desire for greater imaging fidelity; in life sciences, it is the growing amount of data coming from genomic research; and in social media, it is the volume and diversity of interpersonal contacts. These are just a few of the myriad examples that exist today and, when one speaks to the scientists, engineers and analysts in any of these segments, the message is the same: we haven’t seen anything yet.
With this being the case, it struck us that conventional system design (do what we did yesterday, only more so) was not going to solve the problems of the future: supercomputer design was most assuredly not immune from big data. And in 2009, when we started this inquiry, the future was defined as 2014.
So, what should a supercomputer look like, given that it has to be useful in a world where data is growing so dramatically? It turns out that a lot of conventional design approaches have to be rethought, and much of this has to do with the amount of time it takes to move data throughout the computing infrastructure. We concluded that moving data is, by itself, a pernicious process, because it directly contributes to the latency, or delay, in the time to get a solution. We reasoned that, if we could build systems based on data-centric design that minimized data movement, we could have a dramatic impact on the time to solution; the faster one gets a solution, the faster one can act to gain strategic advantage.
In some sense, this is a restatement of the adage “time is money”: the quicker one can decide to drill an oil well, get a genomic characterization of a patient in a clinical setting or better understand the intended behavior of a consumer, the better off we all are. Our technical approach was simple to state: to reduce time to solution, we would minimize the amount of time data spends flowing through the system by computing on the data wherever it sits in the infrastructure.
To do this, we intend to put computing everywhere data sits or flows in the computing infrastructure. There will be no more passive devices; all of them will become active, in the sense that computing will be embedded into every device at every level of the architectural hierarchy. This doesn’t mean data will never move. It means that data will move only when there is no other option and, when data movement is required, we will move it at breakneck speed.
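To make the contrast concrete, here is a minimal, hypothetical sketch in Python. It is not IBM’s design; the StorageNode class and its methods are invented for illustration. It compares shipping an entire dataset to the host for a simple counting job against shipping the small piece of compute to the node that holds the data and returning only the answer.

```python
# A minimal, hypothetical sketch (not IBM's design): contrast "move the data"
# with "move the compute" for a simple counting/filtering task. StorageNode
# and its methods are invented stand-ins for a storage tier that can compute.

class StorageNode:
    """A storage tier that holds records and can run small computations locally."""

    def __init__(self, records):
        self.records = list(records)   # data that lives on this node

    def read_all(self):
        # Conventional, compute-centric path: ship every record to the host.
        return list(self.records)

    def run(self, func):
        # Data-centric path: ship the (tiny) function to the data and
        # return only the result the caller actually needs.
        return func(self.records)


node = StorageNode(range(1_000_000))

# 1) Compute-centric: move one million records, then filter on the host.
moved = node.read_all()
hits_on_host = sum(1 for r in moved if r % 1000 == 0)

# 2) Data-centric: move the predicate instead, and get back a single integer.
hits_in_place = node.run(lambda recs: sum(1 for r in recs if r % 1000 == 0))

assert hits_on_host == hits_in_place   # same answer, vastly less data movement
```

The answer is identical either way; what differs is how much data crosses the interconnect, and that traffic is precisely what a data-centric design tries to drive toward zero, moving data only when there is no other option.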
This discussion might strike some as obvious, but I think that number is small. For years, the supercomputing community has taken an algorithm-centric point of view: make reverse time migration (RTM) in oil and gas run faster, or make the SOAP sequence aligner in life sciences run faster. But the question often overlooked is: so what? If the algorithm sits in the context of a workflow that is consumed with data management issues, optimizing the algorithm without attending to the data issues may provide only marginal benefit in reducing time to solution. We have empirically observed customers who today invest far more of their infrastructure spending on data issues than on the algorithmic part of the workflow, and that balance will only tilt further toward data over time.
The conclusion is unavoidable: supercomputer designers must either adapt to the realities of big data and move to a data-centric approach to design, or they will become anachronistic. From IBM’s perspective, the design for the future is here today.
David Turek is Vice President, Technical Computing OpenPOWER at IBM Corporation. He may be reached at editor@ScientificComputing.com.