Technical Computing Throws a New Challenge to Science Budgets
Investments in software engineering can be the make-or-break factor
For many years, scientists have been able to rely on computer technology doubling in speed roughly every 18 months — usually referred to as a result of Moore’s Law. To increase the speed or capability of the simulations used in your research, all you had to do was buy updated hardware every year or so. Increased application performance and measurable productivity gains were the straightforward, predictable results of periodic hardware upgrades.
Of course, it is now well known and publicized that this “free lunch” of easy increases in science computing speed from buying new hardware has ended. The Moore’s Law-driven hardware race of the past few decades is not the major fuel for performance gains in today’s most successful HPC centers. Now, performance increases — sometimes on the orders-of-magnitude scale — come from re-engineering software for scalability and from better matching of hardware and software. It’s all about enhanced parallel software engineering — parallel, scalable and robust code tuned to the new world of manycore architectures.
The new twist on Moore’s Law — roughly twice as much parallelism every 18 months — means that investments in software engineering can be the make-or-break factor. A side effect of this shift is a renewed focus on the entire HPC ecosystem and on the debate over how the “P” of HPC should be defined.
For those who have never been inclined to focus on the meaning of the “P” in HPC, here’s a quick recap. Originally, the “P” in HPC was universally accepted as “Performance.” But then others argued that “P” should instead stand for “Productivity,” which is actually a very pertinent concept that scientific researchers cannot afford to ignore.
The case for “High Productivity Computing” becomes really clear when you consider that it is not only the raw compute performance at your disposal that counts but, more importantly, how well you are able to make use of that performance for your team’s research objectives. The technical computing capability available to further your research efforts will be impacted by the entire HPC ecosystem — not just the specific hardware or software deployed. How will this ecosystem allow your research team to develop, verify, and use research applications?
Buying HPC infrastructure purely on theoretical (even arbitrary) metrics such as peak teraflops could well have little correlation with your research output — especially if your code does not scale to use the number of cores involved efficiently. Focusing your spend on the initial hardware price could leave you with a great deal of subsequent cost, time and pain in integration work, software engineering, electricity bills and so on — money and effort that could be better turned to your research.
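The article does not quantify that scaling risk, but a back-of-the-envelope Amdahl’s law estimate makes the point. The following is a minimal C sketch; the five percent serial fraction is purely an assumption for illustration, not a figure from the article.

    /* A minimal sketch of Amdahl's law: the speedup on p cores when a
     * fraction s of the runtime remains serial is 1 / (s + (1 - s) / p).
     * The 5 percent serial fraction below is an assumption for illustration. */
    #include <stdio.h>

    int main(void)
    {
        const double s = 0.05;                     /* assumed serial fraction */
        const int cores[] = { 8, 64, 512, 4096 };

        for (int i = 0; i < 4; i++) {
            int p = cores[i];
            double speedup = 1.0 / (s + (1.0 - s) / p);
            printf("%5d cores -> speedup %5.1fx (ideal %dx)\n", p, speedup, p);
        }
        return 0;
    }

Under that assumption, even 4,096 cores deliver a speedup of only about 20x, which is why peak core counts and peak teraflops can correlate so poorly with delivered research output.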
What about effective support for pre- and post-processing of data — for example, visualization to explore the data produced by the simulations? What about computational efficiency? Are the applications using the processors, memory and interconnect optimally, or can the software be re-engineered to run faster on the hardware (meaning you spend less on hardware to achieve a given performance)? Today, these are the types of issues that will impact (or benefit) the progress of your research more than anything else.
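As one illustration of that kind of re-engineering (a generic example, not one drawn from the article), the C sketch below computes the same sum twice; only the memory access pattern differs, yet the cache-friendly version is typically several times faster on commodity processors.

    /* A minimal sketch of how re-engineering software to match the hardware
     * can pay off: both loops compute the same sum over a matrix, but the
     * second walks memory contiguously and is typically several times
     * faster on cache-based processors.                                   */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 4096

    int main(void)
    {
        double *a = malloc((size_t)N * N * sizeof *a);
        if (a == NULL)
            return 1;
        for (size_t k = 0; k < (size_t)N * N; k++)
            a[k] = 1.0;

        /* Strided access: consecutive reads jump N doubles apart in memory. */
        clock_t t0 = clock();
        double sum1 = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum1 += a[(size_t)i * N + j];

        /* Contiguous access: same arithmetic, cache-friendly memory traffic. */
        clock_t t1 = clock();
        double sum2 = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum2 += a[(size_t)i * N + j];
        clock_t t2 = clock();

        printf("strided: %.3f s, contiguous: %.3f s (sums %.0f, %.0f)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, sum1, sum2);
        free(a);
        return 0;
    }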
This new direction of computer technology evolution (parallelism) means that the many Scientific Computing readers whose research relies on applications originally written in the pre-parallel computing era are at risk of losing performance and competitiveness if those applications have not been re-engineered for the new HPC world. Sometimes this is immediately apparent and painful for budget managers who paid significant funds for new hardware, only to find that applications run slower than on the older hardware.
What does this mean to your research team when you are in budget planning mode? First and foremost, consider the realistic working life of your applications and the likelihood that your application will need to be ported repeatedly. Whatever code you are using — whether open source, commercial or internally developed — make sure that it has been developed with an eye toward effectively exploiting both current and foreseeable future hardware technologies.
Taking time and resources to re-engineer your applications with the assistance of computational scientists who are experts in scientific software engineering for scalable parallel computing — and in revamping applications to keep pace with increasing parallelism and many-core architectures (including GPUs) — will enable your research to continue benefiting from the hardware race. Truth to tell, most scientific research applications have long outlived their original hardware, and software investments have long provided the better, if under-recognized, returns on investment. Now, however, the case for such software investments is stronger than ever.
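To make the idea concrete, here is a minimal, hypothetical sketch of the first step such re-engineering often takes on manycore CPUs: annotating an independent loop with OpenMP so it runs across all cores of a node. It is an illustration only, not a recipe from the article, and real codes usually need deeper restructuring of data layout and communication.

    /* A minimal sketch of incremental re-engineering for a manycore node:
     * the independent loop below is spread across cores with one OpenMP
     * pragma. Compile with, e.g.:  gcc -O2 -fopenmp saxpy_sketch.c        */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void)
    {
        const long n = 10000000;                   /* 10 million elements */
        const double a = 2.0;
        double *x = malloc((size_t)n * sizeof *x);
        double *y = malloc((size_t)n * sizeof *y);
        if (x == NULL || y == NULL)
            return 1;

        for (long i = 0; i < n; i++) {
            x[i] = 1.0;
            y[i] = (double)i;
        }

        /* The iterations are independent, so they can safely run in parallel. */
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];                /* saxpy-style update */

        printf("OpenMP threads available: %d, y[n-1] = %.1f\n",
               omp_get_max_threads(), y[n - 1]);
        free(x);
        free(y);
        return 0;
    }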
Andrew Jones is VP of HPC Consulting at Numerical Algorithms Group. He may be reached at [email protected].