How Do You Fit 10 Million Computers into a Single Supercomputer?

ExaNeSt is one of a group of projects known as FETHPC-2014(a) that are working to develop the technology needed to build next-generation high performance computing (HPC) systems, also known as supercomputers, at the exascale level. Such systems are meant to achieve performance in the range of 1 ExaFLOPS = 10¹⁸ FLOPS (one quintillion FLoating-point Operations Per Second). The project states that “HPC is, today, a tool of capital importance in the hands of humankind.” Accordingly, the Horizon 2020 FET-HPC ExaNeSt project is developing and prototyping solutions for some of the crucial problems on the path to production of exascale-level supercomputers. These include:

Interconnection Network: exascale performance can only be reached by interconnecting millions of computing cores, their (volatile) memories and (non-volatile) storage, special-purpose accelerator hardware, and their input/output (I/O) devices in such a way that all of them can cooperate tightly and effectively to solve one huge problem in a reasonable amount of time. This poses a huge challenge for the network that implements this interconnection and for its interface to the hardware and software components of the entire system: it has to be fast, resilient and low-cost, both in terms of cost to build and energy to operate.
The project develops and prototypes innovative hardware and software that lets such networks integrate tightly with the system components, become faster, offer better quality of service (QoS) with particular attention to congestion mitigation, be resilient to failures, and consume less energy.
Storage: traditional supercomputers used a large number of magnetic disks for storing checkpoints and other persistent data, and these disks appeared as I/O devices to the computing cores. Storage technologies are now shifting to flash and non-volatile memories (NVM), which feature dramatically lower latencies; the interconnect and the software architecture must change as well in order to take advantage of these much faster access times.
ExaNeSt develops and prototypes a distributed storage system in which NVMs are local to the compute cores, and hence fast to access at low energy cost, yet the NVMs across the entire system together form a unified storage.
Cooling: communicating at low delay and energy cost requires physical proximity, i.e. packing thousands of cores and their components into a blade board and packing about a hundred blades into a rack (which also saves installation floor space). The unfortunate by-product is a high heat density that must be removed.
The project develops and prototypes innovative Packaging and Cooling technology, based on total immersion in a sophisticated, non-conductive, engineered coolant fluid that allows the highest possible packing density while maintaining reliability.
Applications: ExaNeSt evaluates all these technologies using real HPC and big-data applications, from HPC simulations to business intelligence support, running on a real prototype at the scale of several hundred nodes containing thousands of compute cores.
Furthermore, it tunes the firmware, the system software, libraries and these applications so that they take the best possible advantage of the novel communication and storage architecture: the project supports task-to-data software locality models, to minimize data communication energy overheads and to maintain database properties, and it provides a platform management scheme for big-data I/O to its resilient, unified distributed storage and compute architecture.

According to the European Technology Platform for High Performance Computing (ETP4HPC) Strategic Research Agenda (SRA): “HPC is a pervasive technology that strongly contributes to the excellence of science and the competitiveness of industry. Questions as diverse as how to develop a safer and more efficient generation of aircraft, how to improve the green energy systems, such as solar panels or wind turbines, what is the impact of some phenomena on the climate evolution, how to deliver customized treatments to patients… cannot be answered without using HPC […] HPC is also recognized as crucial in addressing grand societal challenges. Today, to out-compute is to out-compete best describes the role of HPC.”

The HiPEAC Roadmap (Vision 2015) adds: “Science is entering its 4th paradigm: data intensive discovery. After being essentially based on theory, empirical science, and simulation, science is now using data analytics (cognitive computing) on instrument data, sensor data, but also on simulation data. Simulations are, therefore, becoming more and more important for science and for industry, avoiding spending millions in experiments. High performance computing is an enabling technology for complex simulations.”

The ExaNeSt Consortium explains that “improving the performance of supercomputers as much as possible is an everlasting need, because that progress makes it possible to solve bigger and bigger problems that were impossible to solve with the previous generation of supercomputers, or it allows us to solve problems at greater accuracy than what was previously possible. In previous decades, performance of supercomputers grew owing to the advances in technology that allowed individual processors to become faster using faster transistors (higher clock frequency) and more transistors (a more advanced pipelined architecture): it was possible to keep the number of processors fixed, and still come up with faster supercomputers. Not anymore, though: technology has recently reached a plateau where individual general-purpose processing “cores” cannot be made any faster, because they would consume excessive electric power relative to the computational performance that they would offer. Thus, the only two ways left for achieving higher performance are to integrate a larger number of cores and to integrate special-purpose compute engines in HPC systems.”

It continues: “In this project, we work on enabling more cores (millions of compute cores) by improving on the network needed to interconnect them: ordinary networks become slow and congested when they grow in size, and novel techniques are needed in order to prevent that. In the meantime, this increase in the number of computing cores has to be achieved while keeping electric power consumption constant! Large supercomputers already consume a few tens of megawatts each, like a town of a few tens of thousands of people. Neither electric power companies nor society can afford to spend more than that on supercomputing. Thus, for progress to be sustained, each computing core must draw less power if we are to integrate more cores in a supercomputer. For that reason, in this project, we use computing cores designed by ARM, as opposed to the more power-hungry processors used by other HPC systems. ARM, a European company, is the world leader in low-power processors, which is why it already dominates the mobile-phone market; owing precisely to this low-consumption advantage, ARM processors are now moving into data centers and into HPC. The computing “chiplets” that we use are provided to us by the previous EuroServer project and the concurrent ExaNoDe project.”
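The power argument can be made concrete with a little arithmetic. The sketch below is only an illustration under loudly stated assumptions: a 20 MW power budget (one reading of “a few tens of megawatts”), a target of 1 ExaFLOPS, and 10 million cores (the figure in the headline). None of these is presented as an ExaNeSt design figure, and the sketch ignores the power taken by the network, storage and cooling, which would shrink each core's share further.

```python
# Illustrative assumptions only (not ExaNeSt project figures):
TARGET_FLOPS = 1e18       # 1 ExaFLOPS, as defined earlier in the article
POWER_BUDGET_W = 20e6     # assumed 20 MW, within "a few tens of megawatts"
NUM_CORES = 10_000_000    # assumed 10 million cores, taken from the headline


def per_core_budget(target_flops, power_w, cores):
    """Split a whole-machine performance and power budget evenly across cores.

    Returns (FLOPS each core must deliver, watts each core may draw,
    joules available per floating-point operation for the whole machine).
    """
    flops_per_core = target_flops / cores
    watts_per_core = power_w / cores
    joules_per_flop = power_w / target_flops
    return flops_per_core, watts_per_core, joules_per_flop


flops, watts, j_per_flop = per_core_budget(TARGET_FLOPS, POWER_BUDGET_W, NUM_CORES)
print(f"{flops / 1e9:.0f} GFLOPS per core")    # 100 GFLOPS per core
print(f"{watts:.1f} W per core")               # 2.0 W per core
print(f"{j_per_flop * 1e12:.0f} pJ per FLOP")  # 20 pJ per FLOP
```

Under these assumptions each core must sustain about 100 GFLOPS while drawing only about 2 W, i.e. roughly 20 pJ per floating-point operation across the machine. Since single cores can no longer simply be made faster, this per-core energy budget is what pushes designs toward low-power cores such as ARM's.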

First meeting

The members of ExaNeSt held a kick-off meeting in Heraklion on December 10–11, 2015. A variety of technical topics were covered during two days of intense discussions, including:

applications for the exascale era, in both HPC and big-data contexts
interconnect architecture, implementation and simulation
design of exabyte storage systems
implementation of our node and rack prototypes
integration and optimization of the partners’ applications onto our prototypes

and management points:

project schedule and collaboration axes
day-to-day management
reporting of resources, publications and dissemination activities
utilization of the project Web site and collaborative software infrastructure

ExaNeSt partners

FORTH, a major European research center in Greece, coordinates the project and builds the hardware prototype that will eventually contain a few thousand cores. FORTH also contributes to the interconnection architecture, with an emphasis on low latency and congestion management, and to the system software, with an emphasis on low-overhead communication.
Iceotope designs and manufactures scalable and robust high performance computing systems. Iceotope’s technology fully immerses components in warm liquid to cool them effectively and to reduce energy costs and consumption, offering a sustainable product overall. Iceotope leads the Technology work package and the Dissemination and Exploitation work package, developing a novel heat-transfer method for much higher power density and providing the system packaging and cooling technology.
Allinea creates development tools and application performance analytics software for HPC, which it adapts and applies to the ExaNeSt architecture, especially for debugging and profiling.
EnginSoft is a premier consulting firm in the field of Simulation Based Engineering Science, and leads the Integration and Evaluation Workpackage.
eXact Labs is an SME in the HPC domain working on porting and tuning scientific applications in the areas of materials science and weather forecasting.
MonetDB Solutions is the technical consulting company for MonetDB, an open-source in-memory optimized column-based relational database system. The company is porting MonetDB to the ExaNeSt architecture for an in-depth comparison and evaluation using selected (real-world) workloads. MonetDB Solutions leads the Storage and Data Access Workpackage.
Virtual Open Systems is a French high-tech software start-up active in KVM virtualization and the Linux kernel. In this project, Virtual Open Systems develops virtualization technologies for HPC that allow the nodes of an application to run in virtual machines that can be dynamically migrated; among other uses, this serves to improve communication and sharing.
INAF is the Italian National Institute for Astrophysics; its Trieste Observatory leads the Exascale HPC Applications Workpackage and adapts theoretical physics and complex-systems simulation codes to the ExaNeSt architecture.

Four academic and research partners collaborate on the development of interconnection network architectures, in addition to their other topics. FORTH is one of them; the other three are:
INFN is the Italian National Institute for Nuclear Physics; besides operating a rich HPC infrastructure, it has itself developed such systems. INFN participates in the design of the low-latency interconnect and also takes part in the storage and application activities.
The University of Manchester, one of the top research universities, leads the Interconnects Workpackage, with a strong focus on the simulation of very large networks; it also takes part in the design of the packaging organization and in the overall system evaluation.
The Technical University of Valencia conducts the project's research on optical network architectures, in particular exploring the impact of different link and switch technologies within the optical parts of the network.
Finally, Fraunhofer, Europe’s largest organization for application-oriented research, contributes research towards highly scalable storage solutions based on the distributed file system it previously developed.
For further information: http://www.exanest.eu
