Most of the principles behind supercomputing were set in place during the 1940s, so it is not surprising that they are in need of rethinking or “rebooting.” Turing’s computability theory helped many people discover the potential of computers and, with von Neumann’s concept of software stored in memory, enabled some of those people to program computers with easy-to-understand abstractions. This ultimately led to a long period of supercomputing growth driven by faster processors and memory and the additional concept of parallelism. We have now reached the point where the parallelism available in applications has been largely exploited and device physics limits further increases in single-thread performance. Priorities are now shifting from expressing programs conveniently to reducing the power the computer consumes while executing those programs. This shift is driving a need for a new approach to computing.
The IEEE Rebooting Computing (RC) initiative is catalyzing the technical community to rethink the concept of computing through a holistic look at all aspects of computing. Working in conjunction with its partner, the International Technology Roadmap for Semiconductors (ITRS), IEEE RC has held four meetings scoping options for the future. Three of these were summit meetings discussing the future of computing holistically. The fourth focused on the implications of device physics advances for computer architecture. One conclusion from that meeting is that the semiconductor industry is on a trajectory to scale logic to the limits of power dissipation, while memory is beginning to exploit the third dimension for additional capacity.
Computer technology has followed a path of exponential scaling over the last several decades. This led to diagrams like Figure 1 becoming common, with time on the horizontal axis and some property of computing rising or falling on the logarithmic vertical axis. The specific feature in Figure 1 is the decline in energy per operation in computers, shown by the green curve. This decline depends not only on device feature size, but also on a reduction in power supply voltage. Current transistors are of the MOSFET design, where leakage current and the resulting energy consumption rise exponentially once power supply voltage drops below a certain level. The contribution from leakage current is shown by the orange curve of Figure 1. Adding the green and orange curves gives the total per operation, shown as the red curve. The red curve has a minimum, which would limit further scaling of energy per operation unless addressed.
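The trade-off described above, with switching energy falling as supply voltage drops while leakage rises exponentially, can be illustrated with a toy model. The sketch below is not taken from Figure 1: the functional forms (a quadratic switching term and an exponential leakage term) and every constant in it are hypothetical, chosen only to show how the sum of a falling curve and a rising curve produces a minimum.

```python
import math

# Toy model only: the constants c, a, and v0 are arbitrary illustrative
# values, not measurements of any real transistor.

def switching_energy(v, c=1.0):
    """Switching energy per operation, assumed proportional to V^2."""
    return c * v * v

def leakage_energy(v, a=10.0, v0=0.1):
    """Leakage energy per operation, assumed to grow exponentially as V falls."""
    return a * math.exp(-v / v0)

def total_energy(v):
    """Sum of the two contributions (the analogue of the red curve)."""
    return switching_energy(v) + leakage_energy(v)

# Scan supply voltages from 0.05 to 1.00 in steps of 0.01 (arbitrary units).
voltages = [0.05 + 0.01 * i for i in range(96)]
best_v = min(voltages, key=total_energy)

print(f"minimum total energy {total_energy(best_v):.3f} at V = {best_v:.2f}")
```

In this toy model, lowering the voltage below the minimum makes things worse rather than better, which is the sense in which the minimum limits further scaling.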
Energy consumption can be reduced without lower energy per operation. Improvements in architecture and algorithms allow a given problem to be solved by use of fewer operations. Energy-saving features are also helpful, like powering down sections of a chip during brief periods when they are unused.
The device physics community feels a more power-efficient transistor is inevitable, yet there is no agreement on the time frame. This “millivolt switch” would enable several more generations of scaling, overcoming the current energy-based limitation but, in turn, being limited by reliability and error rates.
This next limitation is due to a basic principle of physics: a signal on a transistor can be misinterpreted because it competes with the natural noise and motion of electrons in the transistor. The probability of this misinterpretation is exp(-Esignal / kT), where Esignal is the signal’s energy, k is Boltzmann’s constant, and T is the temperature in kelvins. Esignal ≈ 1000 kT today; if energy efficiency improves more than about 10 times to Esignal < 100 kT, computers will require new types of error detection circuitry to give consistently correct answers.
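To get a feel for how steeply the expression exp(-Esignal / kT) depends on signal energy, it can be evaluated at the two values mentioned above. The sketch below is only an illustration: the function name is hypothetical, and it works in base-10 logarithms because exp(-1000) is too small to represent as an ordinary floating-point number.

```python
import math

def log10_misread_probability(signal_energy_in_kt):
    """Return log10 of exp(-Esignal / kT), with Esignal given in units of kT.

    Uses log10(exp(-x)) = -x / ln(10) to avoid floating-point underflow.
    """
    return -signal_energy_in_kt / math.log(10)

# Esignal / kT ratios quoted in the text.
for ratio in (1000, 100):
    exponent = log10_misread_probability(ratio)
    print(f"Esignal = {ratio:>4} kT -> probability is about 10^{exponent:.1f}")
```

A tenfold reduction in signal energy raises the per-signal misinterpretation probability by roughly 390 orders of magnitude, which is why the margin shrinks so quickly as energy efficiency improves.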
The future need for error correction raises challenging technical issues, but is what is often colloquially called “a problem we would like to have.” A data center pays a power bill each month, but the cost of engineering more power-efficient transistors and architectures with error correction will be paid just once.
IEEE RC and ITRS consider the error correction described above to be a “partial reboot,” because it would preserve high-level architecture and algorithms, but require development of stronger fault mitigation.
IEEE RC and ITRS plan to examine other approaches to computing. One approach, for instance, is to accommodate the lack of reliability through new approaches called random computing and approximate computing.
Memory as a path for improvement is another topic. There are many new devices that promise not only to make the traditional memory hierarchy more effective, but also to introduce up to perhaps a 1000× capability boost through relief of memory bus congestion, often called the “von Neumann bottleneck.” New memory devices that can be integrated with logic are within reach; the challenge is to develop new algorithms that operate at high parallelism with specific interaction patterns.
Other approaches to computing may enable economic development into new areas, but the computers may need to be programmed or otherwise controlled in very different ways. Some examples of these novel approaches covered by IEEE RC and ITRS include magnetic wave propagation patterns for functions like FFT, coupled oscillators for pattern matching, and artificial neural networks. It is intriguing to consider that an artificial neural network could not only replace current computers, but could replace some of the activities of current computer programmers.
Erik DeBenedictis is on the staff at Sandia National Labs and participates in the IEEE Rebooting Computing initiative and International Technology Roadmap for Semiconductors. Erik’s first connection with Scientific Computing was building a computer now called the Cosmic Cube as a graduate student at Caltech in the early 1980s. This research computer became the model for almost all supercomputers today. Erik works on various aspects of advanced computing, notably a “Beyond Moore” project at Sandia working in conjunction with the IEEE Rebooting Computing initiative and International Technology Roadmap for Semiconductors.