The Next Disruption in HPC is Already Here
Multicore-enabling technologies must grow their footprints in the HPC space
High performance computing isn’t about technology at all. Rather, it’s all about the answers to difficult questions. The technology is simply a means to an end, and novel advances in technology are pioneered by scientists only when there is no other way to get to an answer in an acceptable time.
In the 70s, scientists didn’t develop supercomputing clusters based on array processors because they were an elegant technology; they simply needed more horsepower than the immature general computing market could provide. In the 80s and 90s, commercial processors provided a more workable solution, and most HPC systems happily embraced these “ordinary” processors, focusing more on the expansion of their software capabilities. When more horsepower was needed, supercomputing clusters built from off-the-shelf systems emerged.
The general-purpose computing market has made another leap in processing power in the last five years, but this time the gain is not in clock rates; it is in the number of processing cores. Contrary to popular belief, Moore’s Law is not dead. The number of transistors on modern processors continues to double roughly every 18 to 24 months. Those transistors are now simply showing up as additional processing cores. There are two primary reasons for this shift: power and memory.
Power
As process technologies advance, transistor sizes decrease. This has two related effects:
• More transistors can fit onto a die.
• Those transistors can be clocked at higher rates, since switching distances are shorter.
However, power does not scale linearly with clock frequency. Dynamic power is proportional to the frequency times the square of the supply voltage, and higher frequencies generally require higher voltages, so power grows much faster than the clock rate. Practical limits on heat dissipation therefore make ever-higher clock rates impractical. Instead, the increased density is used to provide additional identical cores, each of which is clocked well below its theoretical maximum to manage power consumption.
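The power argument can be made concrete with the standard first-order model for dynamic switching power, P = C × V² × f. The sketch below is only illustrative: it assumes, as a simplification, that supply voltage must scale in direct proportion to clock frequency, and the function names are hypothetical.

```python
def dynamic_power(capacitance, voltage, frequency):
    """First-order dynamic switching power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def relative_power(freq_scale, voltage_tracks_frequency=True):
    """Power relative to a baseline core when the clock is scaled by freq_scale.

    Assumption: if voltage_tracks_frequency is True, supply voltage scales
    in direct proportion to frequency (a simplification of real silicon).
    """
    voltage_scale = freq_scale if voltage_tracks_frequency else 1.0
    return dynamic_power(1.0, voltage_scale, freq_scale)


# Two ways to double nominal throughput under this model:
one_fast_core = relative_power(2.0)        # one core at twice the clock
two_base_cores = 2 * relative_power(1.0)   # two cores at the original clock

print(one_fast_core, two_base_cores)
```

Under these assumptions, doubling the clock costs several times more power than adding a second core at the original clock, which is the trade-off that pushes designs toward more cores.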
Memory
The rate of improvement in DRAM speed has been unable to keep pace with the improvements in microprocessor speed. Though each is improving exponentially, the exponent for microprocessors is substantially larger than that for DRAM. This disparity put memory latency on pace to become the largest bottleneck in overall system performance for single-core designs.
With this new avenue to computational performance, new classes of HPC applications will have the opportunity to migrate from highly customized and specialized systems to commercially available computing systems. The equivalent of a supercomputer from 10 years ago can now be found in a freshman dorm room.
There is a catch, however. The typical desktop computer runs no fewer than 60 processes at any given time (my Windows 7 machine is currently running 108). If the OS is multicore-aware, even single-threaded applications benefit from a multicore processor, because the OS has more places to schedule threads. A single process may actually run slower on a new multicore PC if no other processes are running, since individual cores are often clocked lower, but the assumption is that a typical desktop PC will have many processes running and, therefore, overall system performance will increase.
HPC applications are the exception: a single large computation gains nothing from extra cores unless it is written to use them. So, just as with the supercomputing clusters of the past, algorithms written in FORTRAN and C need to be modified to take advantage of parallel processing cores. These applications need to be broken into threads, and those threads need to be designed to avoid common parallelization mistakes such as race conditions and priority inversion. In addition, memory access and communication between threads must be made thread-safe, and shared resources need to be avoided or protected. These issues continue to haunt developers updating legacy code for new architectures, and they often result in instability, disappointing performance gains, or both. As a result, a set of complementary technologies is maturing that allows programmers to take advantage of multicore systems in new and interesting ways.
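A race condition of the kind mentioned above can be sketched in a few lines. The example uses Python threads for brevity rather than the FORTRAN or C of typical legacy HPC code; the class and function names are hypothetical, and the same pattern (protect the shared read-modify-write with a mutex) applies in those languages.

```python
import threading


class SharedCounter:
    """A counter shared by several threads (illustrative only)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Read-modify-write with no protection: another thread may update
        # self.value between the read and the write, and its update is lost.
        current = self.value
        self.value = current + 1

    def increment_safe(self):
        # The lock makes the read-modify-write atomic with respect to
        # every other thread that also takes the lock.
        with self._lock:
            self.value += 1


def hammer(increment, n_threads=4, n_increments=10_000):
    """Call increment() n_increments times from each of n_threads threads."""
    def worker():
        for _ in range(n_increments):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


safe = SharedCounter()
hammer(safe.increment_safe)
print(safe.value)    # always 40000

unsafe = SharedCounter()
hammer(unsafe.increment_unsafe)
print(unsafe.value)  # may be less than 40000 if updates were lost
```

The locked version is correct but serializes every increment, which hints at why naive thread-safety fixes can produce disappointing speedups.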
Dataflow programming
Computer scientists have traditionally written programs that relate directly to how the hardware is structured. At the most basic level, this concept is visible in assembly language, where the processor registers themselves are manipulated directly. This is notably different from the flow diagram approach that many engineers take when brainstorming an application.
As a result, an inefficient intermediate step has been required in the design process to translate flow diagram components into sequential statements. However, a significant amount of research has been done on programming languages that directly encode both function and data dependency information, which makes this step unnecessary. These languages are referred to as dataflow languages, and they have one important benefit that cannot be ignored: because data dependencies are explicit, their compilers and runtimes can automatically identify independent sections of code and execute them in parallel on multicore processors.
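The dataflow idea can be illustrated with a toy scheduler. This is a minimal sketch, not how any particular dataflow language is implemented: the graph format and the `run_dataflow` name are assumptions made for the example. Each node declares the nodes it depends on, and any nodes whose inputs are ready are submitted to a thread pool together. In standard CPython builds the global interpreter lock limits the speedup for CPU-bound work, so the sketch shows the scheduling principle rather than real performance.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(graph, max_workers=4):
    """Run a dataflow graph and return a dict of node name -> result.

    graph maps each node name to (function, [names of input nodes]).
    Nodes whose inputs are all available are submitted together, so
    independent nodes can execute concurrently.
    """
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("graph has a cycle or a missing input")
            futures = {}
            for name in ready:
                func, deps = pending[name]
                futures[name] = pool.submit(func, *[results[d] for d in deps])
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


# a_sq and b_sq have no dependency on each other, so they can run in parallel.
graph = {
    "a": (lambda: 3, []),
    "b": (lambda: 4, []),
    "a_sq": (lambda a: a * a, ["a"]),
    "b_sq": (lambda b: b * b, ["b"]),
    "total": (lambda x, y: x + y, ["a_sq", "b_sq"]),
}
results = run_dataflow(graph)
print(results["total"])  # 25
```

The programmer never states which nodes run in parallel; that follows from the declared dependencies alone.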
Virtualization
Prevalent in the IT market today, virtualization is a technology that allows two or more operating systems to run side by side on a single PC or embedded controller. As multicore processors with 4, 8 and 16 cores per chip become commonplace, most applications will have only a small number of parallel tasks that can be executed at a given time, leaving many processor cores idle. Virtualization software can help solve this problem by allocating groups of processor cores to individual operating systems running in parallel. Simply put, virtualization allows processing that would otherwise have required multiple computers to run on just one multicore processor.
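The core-allocation idea can be sketched as a simple partitioning step. This is a hypothetical illustration, not the interface of any real hypervisor: the `partition_cores` function and the guest names are invented for the example.

```python
def partition_cores(total_cores, requests):
    """Assign disjoint groups of core IDs to guest operating systems.

    requests maps a guest name to the number of cores it should receive.
    Returns a dict of guest name -> list of core IDs.
    """
    if sum(requests.values()) > total_cores:
        raise ValueError("requested more cores than the processor provides")
    assignment = {}
    next_core = 0
    for guest, count in requests.items():
        assignment[guest] = list(range(next_core, next_core + count))
        next_core += count
    return assignment


# Hypothetical 8-core system shared by two guests.
layout = partition_cores(8, {"real_time_os": 2, "desktop_os": 6})
print(layout)
```

Because the groups are disjoint, work in one guest does not compete for cores with work in the other.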
Cloud computing
Though cloud computing does not directly address the challenges of multicore programming outlined in this article, it does present an intriguing opportunity. Cloud computing offers effectively unlimited, on-demand computing resources at a low upfront cost, and it could eventually displace supercomputing clusters. Though this infrastructure is still in its infancy, its combination of computational power and low capital expenditure has the potential to change the face of HPC.
As these technologies continue to emerge and mature, their footprint in the HPC space will need to grow. Otherwise, HPC could take a step in the wrong direction and shift away from commercial technologies once again.
P.J. Tanzillo is the Group Manager of the Industrial and Embedded Software Team at National Instruments.