For three days this May, more than 40 researchers visited the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science User Facility, to improve the performance of their computational science codes by working alongside the experts who know the facility’s supercomputers best.
With extensive hands-on sessions, dedicated access to ALCF systems, and introductions to the tools, services, and computing resources available to facility users, the annual ALCF Computational Performance Workshop is designed to accelerate researchers’ efforts to test, debug, and optimize their codes on leadership-class systems.
“Over the course of three days, we accomplished what would have probably taken us a month and a half to do on our own,” said Thomas Robinson, a member of the Modeling Systems Group in the National Oceanic and Atmospheric Administration’s Geophysical Fluid Dynamics Laboratory (GFDL). Robinson attended the workshop to prepare the GFDL’s atmosphere model (AM4) to run on ALCF computing resources.
Every year, the ALCF hosts hands-on workshops to connect current and prospective users with ALCF staff members and high-performance computing (HPC) experts from organizations such as Intel, Cray, ParaTools (TAU), and Rice University (HPCToolkit).
“The face-to-face time afforded by our on-site workshops often results in significant code improvements,” said Ray Loy, ALCF’s lead for training, debuggers, and math libraries. “Not only can attendees work on performance issues with staff and vendor experts in real time, they also form working relationships that extend well beyond the three-day workshop.”
In addition to the hands-on sessions, the event featured talks on a wide range of topics, including ALCF system architectures, debugging and performance profiling tools, I/O optimization, and data and learning frameworks. The workshop agenda and presentation slides are available online.
Federico Zahariev, an associate scientist at Iowa State University, attended the workshop to gain experience with the tools and frameworks available to improve code performance on leadership computing systems.
“The beauty of this workshop is the combination of lectures and the actual hands-on work,” said Zahariev, who is working on an application development project involving the GAMESS code for DOE’s Exascale Computing Project. “You hear about a tool or method and then you have an opportunity to work with the presenter to figure how to apply it to your code.”
One of the workshop’s primary goals is to help researchers demonstrate computational readiness for ALCF project proposals through allocation programs like the ALCF Data Science Program (ADSP) and the Innovative and Novel Computational Impact on Theory and Experiment (INCITE).
For Robinson and his GFDL colleague Jessica Liptak, the workshop presented an opportunity to jump-start their efforts to prepare for a future allocation award. They spent the bulk of their time working with Vitali Morozov, ALCF’s lead for application performance and performance modeling, to compile and run AM4 on the facility’s Mira and Theta systems.
In a matter of days, the researchers went from carrying out a preliminary seven-node simulation on Theta to using 2,996 nodes for a large ensemble run composed of 107 bundled subjobs, each using different parameters to simulate one year of global climate conditions.
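As a rough illustration of the arithmetic behind a bundled ensemble run of this size, the short Python sketch below divides a node allocation evenly among subjobs. It assumes every subjob receives the same number of nodes, which the researchers did not state, and the function names are hypothetical; it is not the GFDL team's job script. Under that assumption, 2,996 nodes shared by 107 subjobs works out to 28 nodes per subjob.

```python
def nodes_per_subjob(total_nodes: int, subjobs: int) -> int:
    """Return the node count per subjob, assuming an even split (an assumption)."""
    if total_nodes <= 0 or subjobs <= 0:
        raise ValueError("total_nodes and subjobs must be positive")
    nodes, leftover = divmod(total_nodes, subjobs)
    if leftover:
        raise ValueError(
            f"{total_nodes} nodes cannot be split evenly across {subjobs} subjobs"
        )
    return nodes


def subjob_node_ranges(total_nodes: int, subjobs: int) -> list[tuple[int, int]]:
    """Return an inclusive (first, last) node index range for each subjob."""
    size = nodes_per_subjob(total_nodes, subjobs)
    return [(i * size, (i + 1) * size - 1) for i in range(subjobs)]


if __name__ == "__main__":
    # Figures reported for the ensemble run on Theta.
    per_job = nodes_per_subjob(2996, 107)
    ranges = subjob_node_ranges(2996, 107)
    print(f"{per_job} nodes per subjob; last subjob uses nodes {ranges[-1]}")
```

A real bundled job would hand each subjob's node range to the system's job launcher; that step depends on the scheduler and is left out here.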
“We definitely gave Theta a workout. We burned through about 2.1 million core-hours in our three days at the ALCF,” Liptak said. “Working with Vitali, who helped us get our finicky code to run on Theta and build on Mira, and having access to the large amount of compute time, were both vital to our success at the workshop.”
The GFDL researchers plan to carry out a preliminary analysis of the data generated by their Theta simulations to determine how the results compare with those produced by ensemble simulations performed on Gaea, GFDL’s in-house computing system. This is part of a larger effort to study how results deviate across several climate evolution simulations run on different platforms with a variety of compilers and compiler options.
After the workshop, Robinson and Liptak applied for and received a Director’s Discretionary award to continue their work on ALCF computing systems in advance of applying for a larger allocation award.
“We need to run several hundred more year-long ensembles with each compiler and optimization on Theta and Mira to build a statistically viable dataset,” Robinson said. “Ultimately, we want to use the ensemble-based metrics to develop a portable code base that will allow the GFDL model to run climate simulations on any machine.”