Star-P software brings parallel cluster muscle to the researcher’s desktop.
The world’s most powerful computing systems used to be the domain of the world’s most elite technical users. Programming applications to run on multiprocessor parallel architectures was far beyond the capabilities of most of the world’s scientists and engineers. A new type of user, however, has appeared. The emergence of low-cost Linux cluster hardware based on off-the-shelf standard processors from Intel and AMD is opening parallel clusters to a much wider base of researchers.
The limitations of single-core CPU desktop systems have driven them there. Gene data from microarrays, image and sensor data from satellites, relationships in social networks, and metabolic pathways in biological networks all represent rapidly growing data sets that, when combined with advanced numerical and combinatorial algorithms, pose significant computational challenges. A crucial threshold for “big” is whether the input fits in the random access memory of a modern desktop computer (generally less than 4 GB). Often it does not, and high performance servers are the only practical solution.
The clusters themselves haven’t changed; programming in parallel remains a huge barrier. What is changing is the approach to software. Parallel clusters, which traditionally run in non-real-time batch mode, are being turned into interactive computing powerhouses. A new class of applications lets ordinary users solve problems with their familiar desktop tools and then automatically parallelizes the code to run on clusters, accelerating scientific discovery and saving time and money.
Bridging the gap with Star-P
Star-P software, from Interactive Supercomputing Inc., Waltham, Mass., is a client-server parallel-computing platform that’s been designed to work with multiple very high-level language (VHLL) client applications such as MATLAB, Python, or R, and has tools to expand VHLL computing capability through the addition of libraries and hardware-based accelerators.
The Star-P tool consists of a client, an interactive engine, and a computation engine. The client resides on the user’s desktop and connects the desktop application to the parallel server. It intercepts calls to VHLL function libraries and, when parallel computing is needed, forwards them to parallel libraries on the server, so that the most computationally intensive operations run there. The interactive engine runs on top of the server operating system, managing the multiple interactive sessions in a multi-user environment and giving client applications interactive access to the server’s processors, memory, and file system. Finally, the computation engine consists of built-in parallel computing and add-on parallel computing, accessible via a Star-P Connect API library.
Star-P supports both data-parallel computations—for high-level matrix and vector operations on large data sets and inter-processor communication—and task-parallel computations—for many independent calculations in parallel, such as Monte Carlo simulations, or “un-rolling” serial FOR loops.
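To make the task-parallel idea concrete, here is a minimal sketch in plain Python of “un-rolling” a serial loop of independent Monte Carlo trials. It does not use the Star-P API; the function names (count_hits, estimate_pi) and the pi-estimation problem are illustrative assumptions. A thread pool stands in for the cluster only to show the structure; on a real system the same tasks would be handed to separate processes or server nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(task):
    """One independent Monte Carlo task: count random points inside the unit quarter circle."""
    seed, n_samples = task
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks, n_samples, map_fn=map):
    """Estimate pi by combining the hit counts of n_tasks independent tasks."""
    tasks = [(seed, n_samples) for seed in range(n_tasks)]
    total_hits = sum(map_fn(count_hits, tasks))
    return 4.0 * total_hits / (n_tasks * n_samples)


# Serial version: equivalent to an ordinary FOR loop over the tasks.
serial_estimate = estimate_pi(8, 20_000)

# Task-parallel version: the same tasks are handed to a pool of workers.
# (Threads illustrate the pattern only; CPU-bound Python code needs
# processes or a cluster back end to see a real speedup.)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_estimate = estimate_pi(8, 20_000, map_fn=pool.map)
```

Because each task carries its own seed and touches no shared state, the serial and parallel runs produce the same estimate. That independence is what makes a loop safe to un-roll.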
Star-P and ultrasound achieve ultra-high resolution
Biomedical engineers at the Univ. of Virginia School of Engineering and Applied Science, Charlottesville, recently used Star-P software while developing a new imaging tool intended to dramatically improve medical ultrasounds. The new method has the potential to achieve more accurate and timely diagnoses of breast cancer and other life-threatening conditions.
The university’s biomedical engineering research team, led by associate professor William Walker, created an advanced beamforming algorithm—called time-domain optimized near-field estimator (TONE)—to significantly improve the contrast and resolution of ultrasound images.
“While conventional beamforming algorithms have been used in ultrasound scanners for nearly a half century, they typically result in degraded images that are blurry or cluttered,” says James Aylor, dean of the university’s School of Engineering and Applied Science. “The culprit is off-axis signals, the sound wave reflections coming from undesired locations within the organ or tissue.”
The TONE algorithm reduces these undesired off-axis signals, resulting in much higher definition images. Typical resolution for ultrasound imaging systems is in the 200-300 µm range. The new approach allowed the team to generate images with 67-µm resolution.
But this comes at the price of a much greater computational load. Developed on desktop computers, TONE overwhelmed their processing ability. The only way to use the innovation was to find a way to automatically parallelize the algorithm to run on a more powerful system.
Using MATLAB on desktop computers, the biomedical engineering team coded algorithms and imaging models for use by Star-P, which executed them instantly and interactively on a 32-processor Linux cluster with 64 GB of memory. Star-P eliminated the need to re-program the applications in C, Fortran, or MPI to run on parallel systems—which would have taken months to complete for large, complex problems.
“We were not able to generate images with such a fine sampling pitch until we used Star-P,” says research associate Francesco Viola.
Parallel systems enable computational ecology
Researchers at the National Center for Ecological Analysis and Synthesis (NCEAS), at the Univ. of California, Santa Barbara, are harnessing supercomputers and electronic circuit theory to help save wildlife from shrinking habitats. Using new methods in the emerging field of computational ecology, they are tracking wildlife migration and gene flow across fragmented landscapes.
The goal of the work is to help conservation organizations decide where to invest limited conservation budgets by identifying which lands to preserve or restore. As part of their work, complex circuit theory algorithms are applied to a massive volume of landscape data. NCEAS has been speeding up this code with Star-P, using sparse linear solvers, graph computations, vectorization, and parallelization. Computing time has dropped from days to minutes on their eight-core server.
The team was able to represent landscapes as conductive surfaces—with features like forests and highways having different resistance to movement—and analyze connectivity across them using circuit algorithms. Unlike standard conservation planning tools, the algorithms simultaneously incorporate all possible pathways when predicting how corridors, barriers, and other features affect movement and gene flow over large areas.
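The circuit analogy can be sketched in a few lines of Python. This is not the NCEAS code: it is a toy illustration that treats each raster cell as a node, connects neighboring cells with resistors, and computes the effective resistance between two cells. The function name, the rule that an edge’s resistance is the mean of its two cells’ resistances, and the tiny 3 x 3 maps are all assumptions made for the example. A dense solver keeps the sketch short, where large landscapes would call for sparse linear solvers.

```python
def effective_resistance(grid, source, sink):
    """Effective resistance between two cells of a raster treated as a resistor network.

    grid   -- list of rows; each value is a cell's resistance to movement (> 0)
    source -- (row, col) where 1 A of current is injected
    sink   -- (row, col) that is grounded (0 V)
    """
    if source == sink:
        return 0.0
    n_rows, n_cols = len(grid), len(grid[0])
    n = n_rows * n_cols

    def index(r, c):
        return r * n_cols + c

    # Graph Laplacian: diagonal holds summed conductances, off-diagonal -conductance.
    lap = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    # Assumption: edge resistance is the mean of the two cell values.
                    g = 1.0 / ((grid[r][c] + grid[rr][cc]) / 2.0)
                    i, j = index(r, c), index(rr, cc)
                    lap[i][i] += g
                    lap[j][j] += g
                    lap[i][j] -= g
                    lap[j][i] -= g

    s, t = index(*source), index(*sink)
    keep = [k for k in range(n) if k != t]  # grounding the sink drops its row and column
    # Augmented system: reduced Laplacian plus a right-hand side of 1 A at the source.
    a = [[lap[i][j] for j in keep] + [1.0 if i == s else 0.0] for i in keep]

    # Solve for node voltages by Gaussian elimination with partial pivoting.
    m = len(a)
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m + 1):
                a[row][k] -= factor * a[col][k]
    volts = [0.0] * m
    for row in range(m - 1, -1, -1):
        tail = sum(a[row][k] * volts[k] for k in range(row + 1, m))
        volts[row] = (a[row][m] - tail) / a[row][row]

    # With 1 A injected, the source voltage equals the effective resistance (V = I * R).
    return volts[keep.index(s)]


# Hypothetical maps: uniform habitat versus the same habitat split by a "highway" column.
open_land = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
with_highway = [[1, 100, 1], [1, 100, 1], [1, 100, 1]]
r_open = effective_resistance(open_land, (1, 0), (1, 2))
r_highway = effective_resistance(with_highway, (1, 0), (1, 2))
```

Because current spreads over every available route, the result reflects all pathways between the two cells at once rather than a single least-cost path, and adding the high-resistance column raises the effective resistance between the two sides.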
NCEAS scientists have used Star-P to help model, among other things, mountain lion movements in Southern Calif. and habitat connectivity of mahogany in Central America. For each species, researchers analyze geographic datasets representing habitat suitability over vast areas, sometimes entire continents. Deciding how large and how finely scaled the maps should be can be challenging.
“Even a relatively small region like the three-county area of Southern Calif. can contain millions of raster cells, but our computing resources limited how finely we could grid those locations,” says Brad McRae, the NCEAS project leader. “While a mountain lion might perceive its habitat at a scale of about 100 m, we originally had to increase the cell sizes to around a kilometer to keep our data requirements manageable.”
A key step of the NCEAS simulations is a computation on a large graph (or network) that represents the connectivity of the landscape. Scientists integrated their code with GAPDT, a toolbox for graph computation that allows researchers who are not experts in the field of combinatorial scientific computing to leverage its methods in their own research. The combination of vectorization with Star-P’s graph toolbox and efficient sparse linear solvers dropped computing time from three days to about 15 min for typical problems. Scientists can now model larger maps with much finer grids.
VP of Marketing,
Interactive Supercomputing, Inc.
Interactive Supercomputing Inc., Waltham, Mass., 781-419-5050, www.interactivesupercomputing.com
NCEAS, Santa Barbara, Calif., 805-893-8000, www.nceas.ucsb.edu
Univ. of Calif., Santa Barbara, 805-893-8000, www.ucsb.edu
Univ. of Virginia, Charlottesville, Va., 434-924-0311, www.virginia.edu