Recently, our research work got a shot in the arm when Wayne State University received a complete high-performance compute cluster donated by Silicon Mechanics as part of its 3rd Annual Research Cluster Grant competition. The new HPC cluster gives us state-of-the-art hardware (a head node, eight compute nodes, InfiniBand and Gigabit Ethernet networking, Intel Xeon Phi coprocessors, and NVIDIA Tesla GPUs),* which will accelerate the development of what we’ve been working on: a novel GPU-Optimized Monte Carlo simulation engine for molecular systems, known as GO-MC.
The software supports high-throughput computational screening of porous materials for CO2 sequestration, the development of novel materials for stabilizing drug dispersions, the prediction of phase behavior in polymers and polymer composites, and molecular-level insight into fundamental biological processes such as membrane fusion.
Using GPUs for atomistic simulation
We are developing a general-purpose Gibbs Ensemble Monte Carlo (GEMC) simulation engine that uses GPUs for acceleration. This includes GPU-accelerated configurational-bias methods, efficient algorithms for computing Ewald sums on the GPU, and automated tuning of the code for different GPUs. The work builds on an existing particle-based GPU-GEMC engine we have been developing, and will add functionality for simulating biological processes and adsorption in porous materials. The code will maintain compatibility with the file formats used by the NAMD and VMD software packages, simplifying simulation setup and data analysis. The resulting simulation engine will be released under the GNU General Public License v3 (GPLv3) and made available to users via the Internet.
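To give a flavor of what a configurational-bias move involves, here is a deliberately small sketch of its trial-selection step: generate several trial positions for the molecular segment being regrown, weight each by its Boltzmann factor, and pick one in proportion to that weight. This is plain CPU C++ written for illustration; the structure, names, and energy bookkeeping are our own simplification, not GO-MC’s actual interface.

```cpp
// Minimal CPU sketch of the trial-selection step in configurational-bias
// Monte Carlo: weight each trial position by its Boltzmann factor and pick
// one in proportion to that weight. Illustrative only; not GO-MC code.
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Trial {
    double x, y, z;   // candidate position for the segment being regrown
    double energy;    // interaction energy of the candidate with the system
};

// Pick one trial according to its Boltzmann weight and return its index.
// Also report the Rosenbluth weight (sum of the individual weights), which
// enters the overall acceptance rule for the regrowth move.
int selectTrial(const std::vector<Trial>& trials, double beta,
                std::mt19937& rng, double& rosenbluthWeight) {
    std::vector<double> weights(trials.size());
    rosenbluthWeight = 0.0;
    for (std::size_t i = 0; i < trials.size(); ++i) {
        weights[i] = std::exp(-beta * trials[i].energy);
        rosenbluthWeight += weights[i];
    }
    // Roulette-wheel selection proportional to the Boltzmann weights.
    std::uniform_real_distribution<double> uniform(0.0, rosenbluthWeight);
    double pick = uniform(rng);
    double running = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        running += weights[i];
        if (pick <= running) return static_cast<int>(i);
    }
    return static_cast<int>(weights.size()) - 1;  // guard against round-off
}
```

In practice, the expensive part of each move is evaluating the trial energies against the rest of the system, and that is the work that maps naturally onto the GPU.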
The GEMC work focuses on atomistic simulation, a computational approach in which we determine properties of a system by calculating the energies or forces of interacting atoms. Molecular dynamics methods are well established for this purpose, but until now Monte Carlo simulation has not seen the same level of code optimization and parallelization. We have spent a few years developing this new GPU-based Monte Carlo simulation engine to simulate systems containing up to two orders of magnitude more atoms than other software programs can handle, which is very exciting work indeed.
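To make the energy-based sampling concrete, here is a minimal sketch of a single Metropolis Monte Carlo displacement move for a Lennard-Jones fluid. Again, this is illustrative CPU code in reduced units, with no cutoff, periodic boundaries, or Ewald sums; it is not taken from our engine.

```cpp
// One Metropolis Monte Carlo displacement move for a Lennard-Jones fluid.
// Plain CPU C++ for clarity; purely illustrative, not GO-MC code.
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Atom { double x, y, z; };

// Lennard-Jones pair energy in reduced units (epsilon = sigma = 1).
double pairEnergy(const Atom& a, const Atom& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    double r2 = dx * dx + dy * dy + dz * dz;
    double inv6 = 1.0 / (r2 * r2 * r2);
    return 4.0 * (inv6 * inv6 - inv6);
}

// Energy of atom i with every other atom (no cutoff or periodic images here).
double atomEnergy(const std::vector<Atom>& atoms, std::size_t i) {
    double e = 0.0;
    for (std::size_t j = 0; j < atoms.size(); ++j)
        if (j != i) e += pairEnergy(atoms[i], atoms[j]);
    return e;
}

// Attempt a random displacement of atom i and accept or reject it with the
// Metropolis criterion: accept if the energy drops, otherwise accept with
// probability exp(-beta * deltaE).
bool displacementMove(std::vector<Atom>& atoms, std::size_t i, double beta,
                      double maxStep, std::mt19937& rng) {
    std::uniform_real_distribution<double> step(-maxStep, maxStep);
    std::uniform_real_distribution<double> accept(0.0, 1.0);

    double oldE = atomEnergy(atoms, i);
    Atom saved = atoms[i];
    atoms[i].x += step(rng);
    atoms[i].y += step(rng);
    atoms[i].z += step(rng);
    double deltaE = atomEnergy(atoms, i) - oldE;

    if (deltaE <= 0.0 || accept(rng) < std::exp(-beta * deltaE)) return true;
    atoms[i] = saved;  // reject: restore the old position
    return false;
}
```

The loop over all the other atoms inside each move is exactly the kind of work that parallelizes well, which is where the GPU speedup comes from; a production code also adds cutoffs, periodic boundary conditions, and long-range corrections.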
Now, with the new cluster, we will be able to develop code and run calculations that extract more complex data from our models. For example, we are looking at polymer phase behavior, where a simulation may need more than 100,000 atoms to reach the accuracy we want. Calculating polymer phase behavior on a typical CPU with 100,000 or 500,000 atoms would be practically impossible, but with the kind of code we are developing for the GPU, the calculation becomes tractable, and the new cluster makes it even more so.
For most of the calculations we have been working on, what might have taken us 30 days to complete can now be done in a day. Six months of calculations on a CPU take only a week when we add GPUs. It’s just that much faster. And this faster calculation time really enhances the research process in a lot of interesting ways.
For example, the new HPC cluster is going to be a boon to graduate training. If students make a mistake in setting up the input configuration, and it takes six months to find out about the error, it can really slow them down. If the calculation takes only a day, they can come in, spot the error, and be back the next day with a fix. Access to this advanced cluster will really help students graduate in a timely fashion.
In addition, the work we are doing requires sustained, concentrated thinking about a problem over a period of time. Faster calculations make that easier; when you have to walk away for a long period while calculations run, you risk becoming disengaged.
GPUs enable much faster system prototyping
Another interesting benefit the cluster gives us is the option to do more accurate initial simulations. First we get the configuration right, and then we run a longer computation. At present, we prototype with lower-resolution models because they are fast, and we use those runs to figure out how to set up the calculation correctly. Then we switch to a different, more extensive model for the final calculations.
With the cluster, we can run the same size model for both stages of development by using lower precision for the prototype. Once we have tweaked the calculation process, we run longer, more accurate simulations. This lets us prototype much faster. When we get correct calculations at lower precision, we already know the setup will work at full precision. Changing the precision is a much better option than changing the model for prototyping.
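As an illustration of the idea, and an assumption about implementation rather than a description of our actual code, the precision switch can be as simple as templating the simulation kernels on the floating-point type, so the prototype and the production run share the same model and the same code path.

```cpp
// Sketch of "same model, lower precision" prototyping: the kernel is
// templated on the floating-point type, so the only difference between a
// quick prototype run and a production run is the template argument.
// Illustrative only; not GO-MC's actual code.
#include <cstddef>
#include <vector>

template <typename Real>
Real totalPairEnergy(const std::vector<Real>& x,
                     const std::vector<Real>& y,
                     const std::vector<Real>& z) {
    Real energy = Real(0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = i + 1; j < x.size(); ++j) {
            Real dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
            Real r2 = dx * dx + dy * dy + dz * dz;
            Real inv6 = Real(1) / (r2 * r2 * r2);
            energy += Real(4) * (inv6 * inv6 - inv6);  // Lennard-Jones pair term
        }
    }
    return energy;
}

// Prototype runs: fast single precision, same model and same code path.
//   float eFast  = totalPairEnergy<float>(xF, yF, zF);
// Production runs: full double precision once the setup is validated.
//   double eFull = totalPairEnergy<double>(xD, yD, zD);
```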
Another exciting outgrowth of the new HPC cluster is being able to tap multiple nodes and multiple GPUs; we have four nodes with two GPUs per node. We have a lot on our plate already, but plans to exploit that capability are on our radar, especially using multiple GPUs to speed up the calculations even further.
Finally, we are interested in exploring whether the performance of the algorithms is better using the GPU or the Phi coprocessor, or whether some parts of the algorithm run better on the Phi while others run better on the GPU.
All in all, exciting times are ahead. In our line of work, we can always fill all available computation resources. More nodes and more GPUs allow us to get more work done. And that’s the way we like it.
* The HPC cluster includes hardware and software donated by Intel, NVIDIA, HGST, Mellanox Technologies, Supermicro, Seagate, Kingston Technology, Bright Computing, and LSI Logic.
Jeffrey Potoff is a professor of Chemical Engineering and Materials Science, and Loren Schwiebert is an associate professor of Computer Science at Wayne State University. They may be reached at [email protected].