What’s in Store for the Coming Generation?
Exascale systems will handle both ravenous and daintier appetites
“Supercomputer jobs have an insatiable appetite for computing power,” the decades-old saying goes, but even the biggest supercomputers have typically run a mix of large, ravenous “capability” problems and smaller, less-voracious “capacity” jobs. Both types of jobs have contributed to the return on investment for these mega-computers. And, although capability jobs will be the main justification for purchasing them, coming generations of multi-petaflop and exaflop HPC systems will earn their keep by continuing to handle mixed capability-capacity workloads.
This mixed-workload ROI lesson may seem self-evident, but it’s important to keep in mind. At the October 2010 HPC China conference in Beijing, I chaired a panel discussion on whether petascale and exascale supercomputers would inevitably be narrow-purpose systems. The consensus among the U.S., Chinese and European panelists was that these ultrahigh-end systems will likely have reasonable breadth of applicability. Yet this consensus was based on counting only jobs capable of exploiting a substantial fraction of the supercomputer’s peak processing power — at least 20 percent, according to the question posed to the panel.
Certain application domains, such as quantum chromodynamics (QCD), the particle physics codes addressing the strong interaction, are “embarrassingly parallel.” They are likely to remain scalability champions even on future exascale computers with more than one million processor cores. (Getting these codes to efficiently exploit ultrahigh-end HPC systems is nevertheless a difficult, impressive achievement.)
What about the less-scalable jobs that can consume the majority of the aggregate runtimes even on ultrahigh-end supercomputers? In IDC’s latest HPC end-user study, 52 percent of the HPC applications reported by the 188 surveyed sites were running on only one node, or a fraction of one node. Only about 12 percent of the codes were running on more than 1,000 cores. And just one percent of the codes were able to exploit 10,000 or more cores. Not much fodder for the 200,000-plus cores of today’s largest HPC systems or the million-core systems due out later in this decade.
The scalability of these codes can be limited by several factors. Most HPC codes were originally written to run on a single-threaded processor, and many have never been fundamentally rewritten. Instead, these “dusty deck” codes have been laboriously modified over time to enable modest scalability — generally not more than eight-way parallelism. In other cases, the lack of more scalable algorithms curbs sustained performance. This is a solvable problem, although there may not be enough people on Planet Earth with the right combination of brainpower and experience to solve it for every deserving application.
More and more often, however, scalability is constrained by the limitations of the underlying, known science. Where the science doesn’t support large single runs of the problem, smaller iterative runs allow users to home in on productive solutions. Long-established examples include stochastic modeling in the financial services sector and parametric modeling in the design engineering realm. Newer examples are the increasingly complex ensemble models in the weather/climate domain, and a whole host of genomics and proteomics applications.
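To make this pattern concrete, here is a minimal sketch of an iterative, capacity-class workload: a parametric ensemble in which many small, independent runs of a model explore a parameter space. The model() function, its parameters and the ensemble size are hypothetical stand-ins for illustration, not drawn from any of the applications named above.

```python
# Illustrative sketch of a capacity-class "ensemble" workload:
# many small, independent model runs rather than one large,
# tightly coupled job. model() and its parameters are hypothetical.
import random
from concurrent.futures import ProcessPoolExecutor
from statistics import mean

def model(params):
    """Stand-in for one small simulation run with a given parameter set."""
    seed, growth_rate = params
    rng = random.Random(seed)
    value = 1.0
    for _ in range(10_000):  # a deliberately modest per-run workload
        value *= 1.0 + growth_rate * (rng.random() - 0.5)
    return value

def main():
    # Each ensemble member is an independent job; members never
    # communicate, so they scale out trivially across cores or nodes.
    ensemble = [(seed, 0.01) for seed in range(256)]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(model, ensemble))
    print(f"ensemble mean: {mean(results):.4f}")

if __name__ == "__main__":
    main()
```

Because the ensemble members are independent, the same approach runs equally well on a workstation or on a modest share of a large system, which is exactly why such jobs keep showing up in the capacity portion of mixed workloads.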
Systems biology might still be classified as pre-science today, in the sense that researchers in this fascinating area often are still expanding (“coning up”), rather than narrowing (“coning down”), the number of possibilities. How can activities like this be scaled up with validity? Even in the long-standing field of design engineering, the drive toward multi-scale, multi-physics solutions presents new challenges for scalability on HPC systems.
All of this is to say that iterative and other capacity-class solutions are no less valuable simply because they are less scalable than the highest-performing HPC codes. Iterative solutions will continue to proliferate as the algorithms and known science in more and more domains fail to keep pace with the rampant parallelism and peak performance of ultrahigh-end supercomputers. And, while iterative solutions run in parallel may have daintier computing appetites than the highest-scaling codes, they can still require considerable time on HPC resources.
That’s why government agencies in charge of multi-petaflop and exaflop HPC programs in the U.S., Europe and Asia are increasingly encouraging industry, including small and medium enterprises (SMEs), as well as small and medium-size scientific initiatives (SMS), to apply for time on ultrahigh-end supercomputers.
The HPC community has debated whether these mega-supercomputers should be co-designed with the key applications intended to run on them. This would certainly make sense if the supercomputers were intended to be narrow-purpose systems that would run only a handful of scientific codes at very large scale. Even if co-design were not carried out this rigorously, it would be important for the system architects to keep the requirements of these applications strongly in mind.
But the architects shouldn’t take the co-design concept so far that the resultant supercomputer truly is a narrow-purpose system that can’t run a range of iterative and other capacity-class solutions with reasonably good performance (this has been a problem with some government-funded HPC programs in the past). These less-ravenous applications have always boosted the ROI of ultrahigh-end supercomputers. They will become increasingly important for justifying the funding of these systems in the future.
Steve Conway is Research VP, HPC at IDC. He may be reached at editor@ScientificComputing.com.