Stanford Professor Charbel Farhat and his research team at the Army High Performance Computing Research Center (AHPCRC) used a new, high-end, massively parallel computer to demonstrate the power of algorithms that instruct processors to work together to solve challenging problems. They directed 22,000 processors to solve billions of mathematical equations in just a few minutes, a rare feat in computer engineering.
“I believe we may have really set a record here,” said Farhat, director of AHPCRC at Stanford and a professor of aeronautics and astronautics and of mechanical engineering. “We’ve solved over 10 billion equations in a little over three minutes.”
This unique collaboration came about after the U.S. Army Research Laboratory (ARL) acquired Excalibur, a Cray XC40 computer with 101,184 processors. Once the system is completely up and running at ARL’s Maryland-based Defense Supercomputing Resource Center (ARL DSRC), its raw computational power will support dozens, potentially hundreds, of research projects.
But before the Army carved up the spoils, they gave Farhat and his team a chance to harness a significant slice of Excalibur’s massive computational cake and demonstrate the power of algorithms to solve large-scale equations.
The team far surpassed Farhat’s previous effort.
“The last time we’d had access to a large machine like this one, we probably ran our algorithms on about 3,000 processors,” said Farhat, referring to the computations that helped his team win a 2002 Gordon Bell Prize from the Institute of Electrical and Electronics Engineers. “Now we ran on 22,000.”
Engineers want to harness multiple computer processors to improve the efficiency of the complex and time-consuming calculations necessary for many technological problems, including solid mechanics, fluid dynamics, heat transfer and signal processing.
Theoretically, multiple computers can confront, digest and solve complex equations faster than a single computer. But coordination among computers can be as difficult as it is among people, Farhat said. Adding 10 people to a task does not make the group 10 times faster or more efficient, because increasing the number of brains also increases the lines of communication and complexity of interactions. The same is true for computers.
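The point about lines of communication can be made concrete with a toy calculation. The sketch below is purely illustrative and is not drawn from the team's work: it assumes a fixed amount of work split evenly among workers, and it assumes every pair of workers adds a fixed coordination cost. The function names and the default values for `work` and `cost_per_link` are hypothetical.

```python
def pairwise_links(n):
    """Number of distinct pairs among n workers (lines of communication)."""
    return n * (n - 1) // 2


def modeled_speedup(n, work=1000.0, cost_per_link=1.0):
    """Toy model: speedup of n workers over one worker.

    Assumption (not from the article): the work divides evenly, and each
    pair of workers adds a fixed coordination cost.
    """
    serial_time = work
    parallel_time = work / n + cost_per_link * pairwise_links(n)
    return serial_time / parallel_time


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, pairwise_links(n), round(modeled_speedup(n), 2))
```

Under these assumed numbers, 10 workers have 45 pairwise links and finish less than 10 times faster than one worker, and 100 workers are slower than 10. Real parallel machines rarely need every processor to talk to every other one, which is part of what a well-designed algorithm arranges.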
“When you connect 100 computers and tell them to solve a system of equations, I need to break it into 100 pieces and ship each piece to a computer, and then they need to talk to each other,” Farhat said. “They cannot do this independently.”
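The idea Farhat describes, breaking a system of equations into pieces that must still exchange information, can be sketched in a few lines. The example below is a minimal illustration and not the AHPCRC team's algorithm, which the article does not describe. It swaps in a textbook technique, Jacobi iteration, on a small assumed system (2*x[i] - x[i-1] - x[i+1] = b[i], with zeros beyond both ends), and it simulates the "computers" as contiguous chunks inside a single process. All function names are hypothetical.

```python
def split_indices(n, pieces):
    """Split range(n) into `pieces` contiguous, nearly equal chunks."""
    base, extra = divmod(n, pieces)
    chunks, start = [], 0
    for i in range(pieces):
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def jacobi_sweep_partitioned(x, b, chunks):
    """One Jacobi sweep for 2*x[i] - x[i-1] - x[i+1] = b[i], chunk by chunk."""
    n = len(x)
    new_x = [0.0] * n
    messages = 0
    for start, stop in chunks:  # each chunk stands in for one processor
        # "Talking to each other": fetch the one value just outside each end.
        left = x[start - 1] if start > 0 else 0.0
        right = x[stop] if stop < n else 0.0
        messages += (start > 0) + (stop < n)
        local = [left] + x[start:stop] + [right]
        for j in range(1, len(local) - 1):
            i = start + j - 1  # global index of this unknown
            new_x[i] = (b[i] + local[j - 1] + local[j + 1]) / 2.0
    return new_x, messages


def solve(b, pieces, sweeps=2000):
    """Run a fixed number of sweeps; return the solution and message count."""
    x = [0.0] * len(b)
    chunks = split_indices(len(b), pieces)
    total_messages = 0
    for _ in range(sweeps):
        x, sent = jacobi_sweep_partitioned(x, b, chunks)
        total_messages += sent
    return x, total_messages


if __name__ == "__main__":
    b = [1.0] * 8
    one_piece, msgs_one = solve(b, pieces=1)
    four_pieces, msgs_four = solve(b, pieces=4)
    print(one_piece == four_pieces, msgs_one, msgs_four)
```

Splitting the work into four pieces gives the same answer as one piece, but every sweep now requires values to cross chunk boundaries. In this sketch the pieces cannot finish independently, which is the coordination cost the quote refers to.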
To confront this well-known problem, Farhat and his team—led by Jari Toivanen, Radek Tezaur and Philip Avery—collaborated with the ARL DSRC to craft algorithms to divide these calculations among thousands of computers.
The team members worked around the clock for three weeks to prepare their software for the test on Excalibur. When the day came last month, they had access to a significant chunk of the facility’s 101,184 processors to divide up slices of their equations, share information and solve the problem. A little over three minutes later, those thousands of processors had accurately solved more than 10 billion equations.
The entire Excalibur probably won’t be available for a repeat of this epic performance. But Farhat said he believes this new scalable algorithm will be tremendously useful on smaller computing systems.
“When Mercedes wins the Formula One race, the engineering feats get shifted to the street cars that Mercedes sells,” Farhat said. “So, when we improve our algorithms on the highest-end computers, we get to benefit from these improvements on our more pedestrian computers.”
“We empower researchers to solve the most difficult Army operational challenges through innovative computational science research and advanced computing,” said Raju Namburu, ARL DSRC director and AHPCRC cooperative agreement manager.
In the long run, AHPCRC engineers and their Army partners hope that this successful endeavor will shorten the time needed to solve complex problems in computer engineering.
“Here we’re able to demonstrate what we can do with access,” Farhat said.