Over the past decade, high performance computing has scaled from teraflop to petaflop performance, and is now heading toward the exaflop era. Technology development has had to keep pace to enable these leaps, with notable advances including the move from SMP architecture to clustered multiprocessing with multi-core processors, as well as added acceleration from GPUs, FPGAs and other co-processing technologies.
Historically, performance gains have come from developing individual hardware devices, drivers, middleware and software applications separately, each improving scalability and throughput. That approach is reaching its limits. Enabling the next order-of-magnitude performance improvement for exascale-class computing will require technology collaboration in all areas. Discrete development followed by conventional integration cannot meet the requirements of exascale, because no single company or development effort can efficiently provide all the components needed to scale performance to that degree. A system-level approach to exascale computing is therefore already underway.
Co-design is a collaborative effort among industry thought leaders, academia and manufacturers to reach exascale performance by taking a holistic, system-level approach to fundamental performance improvements. A co-design architecture enables all active system devices to become acceleration devices by orchestrating a more effective mapping of communication between devices in the system. This produces a well-balanced architecture across the compute elements, networking and data storage infrastructure that improves system efficiency and even reduces power consumption.
Exascale computing will undoubtedly include three primary concepts: heterogeneous systems, direct communication through a more sophisticated intelligent network, and backward/forward compatibility. Co-design includes these concepts in order to create an evolutionary architectural approach that will enable exascale-class systems.
One recent effort that takes a more unified approach to enabling heterogeneous systems is the OpenUCX project. OpenUCX is a collaboration among industry, laboratories and academia to create an open, production-grade communication framework for high-performance computing applications. The project is already well underway and addresses a fundamental concern, application portability across a variety of hardware, without the need to migrate applications and the system software stack for every type of infrastructure. Participants in the initiative include IBM, NVIDIA, Mellanox, the University of Houston, Oak Ridge National Laboratory, the University of Tennessee and many others. An advisory panel of thought leaders also guides the effort toward the most effective solutions for exascale.
UCX was initially created by merging three existing HPC frameworks into one all-encompassing communication framework that supports the existing communication libraries on one side and all hardware interfaces on the other. The result is an optimized communication path with low software overhead, delivering near-bare-metal performance and portability of software from one interconnect to another.
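The layering idea described above can be illustrated with a minimal sketch. This is not the UCX API: the names `Transport`, `CommLayer`, `application`, the priority ranking and the node name are all hypothetical, chosen only to show how a single interface on the application side can sit on top of interchangeable hardware back ends.

```python
from dataclasses import dataclass


@dataclass
class Transport:
    """Hypothetical back end for one interconnect (not a real UCX type)."""
    name: str
    priority: int   # lower value = preferred (an assumed ranking)
    present: bool   # whether this hardware exists on the node

    def send(self, dest: str, payload: bytes) -> str:
        # A real transport would hand the buffer to the hardware; this
        # sketch only reports which path was taken.
        return f"{self.name} -> {dest} ({len(payload)} bytes)"


class CommLayer:
    """Single interface for upper layers; hides the transport in use."""

    def __init__(self, transports):
        usable = [t for t in transports if t.present]
        if not usable:
            raise RuntimeError("no usable transport on this node")
        # Pick the most preferred transport that is actually present.
        self.transport = min(usable, key=lambda t: t.priority)

    def send(self, dest: str, payload: bytes) -> str:
        return self.transport.send(dest, payload)


def application(comm: CommLayer) -> str:
    # Application code is identical on every system.
    return comm.send("node42", b"halo exchange")


# Two hypothetical clusters: one with an RDMA-capable fabric, one without.
cluster_a = CommLayer([Transport("rdma", 0, True), Transport("tcp", 1, True)])
cluster_b = CommLayer([Transport("rdma", 0, False), Transport("tcp", 1, True)])

print(application(cluster_a))  # rdma -> node42 (13 bytes)
print(application(cluster_b))  # tcp -> node42 (13 bytes)
```

The application function never names an interconnect, so moving it between the two clusters requires no code changes. That is the kind of portability the paragraph above describes, shown here under the stated assumptions.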
GPUDirect RDMA is another example of co-design collaboration, in which NVIDIA and Mellanox have developed a technology to allow direct peer-to-peer communication between remote GPUs, completely bypassing the need for CPU and host memory intervention to move data. This capability reduces latency for internode GPU communication by upwards of 70 percent.
Continued co-design of this technology will soon cover additional aspects of peer-to-peer transactions, including giving the accelerator more control over network operations and offloading the control plane, as well as the data path, from the CPU. The result will further reduce latency, allow much lower-power CPUs to be coupled with GPU acceleration, and reduce power consumption across the peer devices that will be typical of a heterogeneous exascale system balanced with both vector and scalar components.
Backward compatibility must always be a consideration when advancing technologies with performance improvements, but forward compatibility will be of paramount importance in implementing exascale computing. Whereas it is not uncommon today for 10 to 20 petaflop machines to be completely replaced within a five-year period, exascale machines will not be supplanted so easily. For that reason, co-design embraces open standards for portability and compatibility, ensuring that exascale computing can be achieved without the fear that clusters will need to be entirely overhauled or upgraded.
A common concern with the traditional approach (in which technologies are integrated instead of co-designed) involves point-to-point processor technologies such as QPI or HyperTransport. Such technologies define their own physical, link, routing, transport and protocol layers, which have not remained consistent and compatible over time. This not only introduces backward compatibility issues between SoC technologies, but also rules out future-proofing for the next generation of integrated elements. Exascale systems must have guaranteed future-proofing to protect that level of investment, performance and capability, and to keep millions of lines of application code from being overhauled for every generation of hardware.
While 100Gb/s interconnect solutions are already available, enabling even more data to be transferred in less time, it will be difficult to reach exaflop levels of scalability and performance without co-design’s holistic system perspective that addresses the next order of magnitude of performance. Continued collaboration is crucial to achieving the flexibility, efficiency, and portability necessary to make the move to exascale computing a reality.
Gilad Shainer is Chairman of the HPC Advisory Council.