Shifting the Focus to Science
New programming model allows researchers to utilize multi-core systems without manual re-writes
Admiral Grace Hopper, a computer scientist and U.S. Navy officer who developed the first compiler and the COBOL programming language, noted decades ago: “In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger
computers, but for more systems of computers.” Today’s scientific computing challenges demand systems of computers, because adding transistors to a single processor and increasing its clock rate (i.e., growing a larger ox) have reached practical limits. Fortunately, the emergence of multi-core desktops and servers, combined with a new programming model, is putting amazing computing power into the hands of scientists and engineers.
And none too soon. As simulation replaces physical testing, and as increasingly complex phenomena are modeled, the data sets and computing requirements for new models have grown exponentially. It is now routine to use modeling and simulation for experiments that would be too expensive, dangerous, time-consuming or physically impossible to perform in a real laboratory. For example, the National Cancer Institute genomic micro-array database has more than 40,000 entries. Correlations within these data help researchers better understand the relationships among genes, and their conclusions can provide the basis for additional genomic research. The largest potential correlation analysis — with a data matrix of 100,000 by 100,000 — would require more than 256 GB of memory to solve, far more than the desktop computers available to scientists contain.
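A back-of-envelope sketch makes the memory barrier concrete. The figures below are illustrative assumptions, not from the article: 8-byte double-precision entries, and a working set of roughly four full-size matrices (input, output and temporaries), as dense linear algebra typically requires.

```python
# Rough memory estimate for a dense 100,000 x 100,000 correlation analysis.
# Assumptions (not from the article): 8-byte doubles, and a working set of
# about four full-size matrices (input, result, and two temporaries).

ROWS = COLS = 100_000
BYTES_PER_DOUBLE = 8

matrix_bytes = ROWS * COLS * BYTES_PER_DOUBLE  # one dense matrix
matrix_gib = matrix_bytes / 2**30              # ~74.5 GiB

total_gib = 4 * matrix_gib                     # ~298 GiB working set

print(f"one matrix:  {matrix_gib:.1f} GiB")
print(f"working set: {total_gib:.1f} GiB")
```

Even a single dense matrix at this scale is roughly 75 GiB, and a realistic working set lands comfortably above the 256 GB the article cites — far beyond any desktop of the era.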
Changing nature of scientific programming
Thirty years ago, FORTRAN was a standard part of the science and engineering curriculum. It was the de facto standard way to create custom applications in engineering and science. Learning FORTRAN was part of the cost of doing business — there was little else available for numeric computation and scientific computing. Originally developed by IBM for scientific and engineering applications, it came to dominate scientific programming early on and has been in continual use in computationally intensive areas ever since.
Over the past 15 years, there’s been a major shift in how scientists and engineers write applications. Rather than designing computer languages to run efficiently on scarce computers, the direction
today is to design languages that can be used efficiently by scarce scientists, engineers and analysts, all striving to reduce the time-to-discovery of a new idea or the time-to-delivery of a new product. Very high level languages (VHLLs) such as Python, MATLAB, R, SAS and others have become popular due to their high-level constructs and interactive environments, and they have largely displaced FORTRAN and other lower-level languages. But while these tools have leveraged the increasing CPU power of the single desktop processor, they have not yet evolved to use the growing multi-core, multi-processor and multi-computer resources that are increasingly available in the computer room. As a result, the opportunity to use these processors in parallel to speed up computation has gone unrealized: the multi-processor machines and supercomputers have not become interactive like the desktop.
The good news is that cost-effective solutions envisioned by Admiral Hopper with her teams of oxen analogy (i.e., teams of more processors) have recently arrived on the scene. With breakthroughs in multi-core processors from AMD and Intel, there is now a broad spectrum of affordable multi-processor solutions available for scientists and engineers ranging from multi-core desktop workstations, to dedicated entry-level servers, to workgroup and department clusters.
The challenge, however, is that the desktop programming model has not been available on multi-processor systems — whether shared-memory machines, clusters or grids — which are still programmed in lower-level languages such as C and FORTRAN, together with the inter-processor communication protocol MPI (Message Passing Interface).
Most scientific application software does not yet take advantage of multiple processors, as most vendors have not ported their applications to multi-processor environments. Because processor clock speeds are no longer increasing, many desktop applications will actually run slower on new hardware, leaving most of the processing capacity of desktops and servers unused.
Vector processing is arguably obsolete, multi-threading never really scaled, and the future is therefore message passing. But the hurdle today is programming complexity. Message passing is the most effective form of parallel programming, yet it is tedious because it amounts to the “assembly language” of parallel programming. The potential for faster scientific computing — whether from multi-core chips on the desktop or multi-node clusters for the workgroup — will therefore remain a largely untapped resource until application software can fully exploit parallel processing resources.
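The bookkeeping burden of explicit message passing can be sketched with Python’s standard library. This is a toy parallel sum, not MPI, and every name in it is illustrative — but the shape of the work is the same: the programmer must partition the data, send and receive each chunk explicitly, and gather the results by hand.

```python
# A toy illustration of explicit message passing: even summing a list
# in parallel requires manual scatter, explicit send/receive on each
# channel, and a hand-written gather -- the bookkeeping a high-level
# environment would hide from the scientist.
from multiprocessing import Process, Pipe

def worker(conn):
    chunk = conn.recv()       # explicit receive of this worker's data
    conn.send(sum(chunk))     # explicit send of the partial result
    conn.close()

def parallel_sum(data, nworkers=4):
    size = (len(data) + nworkers - 1) // nworkers
    pipes, procs = [], []
    for rank in range(nworkers):                  # manual scatter
        parent_end, child_end = Pipe()
        p = Process(target=worker, args=(child_end,))
        p.start()
        parent_end.send(data[rank * size:(rank + 1) * size])
        pipes.append(parent_end)
        procs.append(p)
    total = sum(conn.recv() for conn in pipes)    # manual gather
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))        # 499500
```

Real MPI codes add communicators, ranks, tags and deadlock avoidance on top of this — which is why the article calls it assembly-language-level work.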
Abstracting away parallel programming challenges
What is needed is the ability to leverage the various forms of parallelism from the desktop through a high level of abstraction — one that delivers the runtime speed-up without the programming slow-down. A high-productivity HPC
programming model would combine a user-friendly VHLL with optimized parallel libraries and existing applications from a variety of sources. Making this easy for non-professional programmers would be a major contribution to computational science and engineering.
Over the past two years, both commercial desktop application vendors and the open source community have introduced programming tools that extend these applications to parallel systems. These include Interactive Supercomputing’s Star-P, the Distributed Computing Toolbox from The MathWorks, and parallel extensions to the open source language Python. While these tools vary widely in terms of algorithm coverage and ease-of-use, they all represent a great leap forward for the technical computing community.
Interactive discovery at National Cancer Institute
The aforementioned National Cancer Institute (NCI) example exemplifies how scientists can leverage the parallel computing power of servers along with a new programming model to shatter the limits of traditional computing. The medical/genetics research team at NCI used The MathWorks’ MATLAB on a desktop PC to compute cross-correlation of measured samples in large data sets. This genomic profiling effort helps researchers better understand genetic risk factors for cancer, and develop new procedures for testing the genomic profiles of tumors — procedures that might advance the cause of personalized medicine, in which a patient’s genetic information may be used to customize the detection, treatment or prevention of disease. But an explosion in the amount of genomic data available to NCI researchers has made their work increasingly difficult. Their tasks require more computing power, more system memory and — all too often — more time. And, in the race to understand how genetics and cancer are linked, time is precious.
Until recently, a single correlation computation running on a desktop system could take days to complete, and correlations had to be computed in parts because the data sets and temporary variables vastly exceeded desktop memory. The larger the number of probes, the longer the pattern-recognition routine takes. Turn-around times of up to a week had proven to be the practical limit for researchers. And, with bioinformatics on a steep growth curve, the problem was only going to get worse.
Furthermore, desktop speed limitations were eroding the interactivity benefits of MATLAB. Researchers knew that larger correlations could be completed faster if the application could be parallelized to run on a multi-processor system outfitted with a
Figure 4: Science and engineering desktop tools — such as MATLAB, Python, and R — are connected with high performance computers via the Star-P platform.
shared-memory architecture. But because interactive applications such as MATLAB don’t run on parallel systems, scientists would be forced to reprogram their algorithms for parallelization, most likely rewriting the software in C or FORTRAN, perhaps using MPI.
The team at NCI decided to explore the new programming model. The Star-P interactive parallel computing platform, running on an SGI Altix 3700 server, bridges desktop PCs with high-performance parallel computing resources, eliminating the need to manually reprogram algorithms written for the desktop. This approach automates the process of parallelizing models and algorithms developed in MATLAB, adding the computational power of scalable SGI Altix servers to the desktop interactivity on which MATLAB users rely.
The new programming model has its advantages, notes Dr. Mark Potts, principal of HPC Applications, a consulting firm contracted to get NCI’s software up and running on the SGI Altix. “If your goal is to take the same interactive environment and transfer it to a parallel processing system with a lot more memory, then you’ll look for the easiest way to get there,” he said. “NCI is accustomed to working in MATLAB, and the Star-P approach retains that environment.”
Even more important was the impact on NCI’s scientific workflow. A typical routine might operate on a 30,000 by 30,000 data matrix, in which each probe is correlated with every other probe in the sample. On the desktop, the routine took more than two days to complete. But, using just six processors and 25 GB of memory on the Altix system, the entire correlation is done in 15 minutes or less — on the order of 200 times faster than before.
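The all-pairs correlation pattern described above can be sketched in standard-library Python. This is a toy with made-up data — NCI’s actual runs used MATLAB via Star-P — but it shows the structure being parallelized: each probe’s row of correlations is an independent task that can be farmed out to a pool of workers.

```python
# A toy sketch of all-pairs probe correlation, parallelized over rows.
# Data and sizes are illustrative; the real workload ran in MATLAB/Star-P.
from concurrent.futures import ProcessPoolExecutor
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def correlate_row(args):
    # Correlate probe i with every later probe (upper triangle only).
    i, probes = args
    return [(i, j, pearson(probes[i], probes[j]))
            for j in range(i + 1, len(probes))]

def all_pairs_correlation(probes, workers=4):
    # Each row of the correlation matrix is an independent task.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        tasks = [(i, probes) for i in range(len(probes))]
        return [pair for row in pool.map(correlate_row, tasks)
                for pair in row]

if __name__ == "__main__":
    probes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    for i, j, r in all_pairs_correlation(probes):
        print(i, j, round(r, 3))
```

Because the probe pairs are independent, the work divides cleanly across processors — which is why the NCI correlation parallelized so effectively once the plumbing was automated.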
Figure 5: A computational photonics model, created in MATLAB by researchers at the University of Central Florida, easily extended to a parallel environment with Star-P using a handful of data tags and commands.
That acceleration is significant to researchers looking to continually extend the reach of their studies. As correlations increase in size, the computational requirements increase faster still: doubling the number of probes roughly quadruples the number of probe pairs, so the correlation takes more than twice as long to complete. On the desktop, the size and capability of the machine determined the scope of the problem. But today, when even the largest problems can be solved in hours or minutes, researchers can run 10 correlations in a day.
The new programming model has given researchers the ability to run more samples, and to approach problems differently than they would have before. Parallelizing their MATLAB code has given them the flexibility to explore the data in greater detail. With a more powerful parallel system at their disposal, researchers also may try more complex searches that previously weren’t an option, scaling to data sets requiring terabytes of distributed memory.
Multi-core, multi-processor systems will usher in an era of breakthrough productivity in scientific computing. By eliminating the parallel programming challenges that today plague high performance computing, users can dramatically accelerate scientific discovery and time-to-market while cutting labor costs — the focus is on science and engineering, not the computer. Ultimately, the emerging programming model for parallel systems will enable a faster path to new science.
Bill Blake is the CEO of Interactive Supercomputing. He may be reached at editor@ScientificComputing.com.