Today’s installment is the third in a series covering how researchers from national laboratories and scientific research centers are updating popular molecular dynamics, quantum chemistry and quantum materials code to take advantage of hardware advances, such as the next-generation Intel Xeon Phi processors.
Georgia Institute of Technology, known as Georgia Tech, is an Intel Parallel Computing Center (Intel PCC) that focuses on modernizing the performance and functionality of software on advanced HPC systems used in scientific discovery. Georgia Tech developed a new HPC software package, called GTFock, and the SIMINT library to make quantum chemistry and materials simulations run faster on servers and supercomputers using Intel Xeon processors and Intel Xeon Phi coprocessors. These tools, which continue to be improved, run faster than the best existing quantum chemistry codes; in the team's tests, SIMINT is roughly 2x to 3x faster than the libint integral library.
“GTFock and SIMINT allow us to perform quantum chemistry simulations faster and with less expense, which can help in solving large-scale problems from fundamental chemistry and biochemistry to pharmaceutical and materials design,” states Edmond Chow, Associate Professor of Computational Science and Engineering and Director of the Georgia Institute of Technology Intel PCC.
Simulating binding of Indinavir drug with HIV II protease
The Intel PCC at Georgia Tech has been simulating the binding of the drug Indinavir with human immunodeficiency virus (HIV) II protease. Indinavir is a protease inhibitor that competitively binds to the active site of HIV II protease to disrupt normal function as part of HIV treatment therapy. Such systems are too large to study quantum mechanically, so only the part of the protease closest to the drug is typically simulated. The aim of the work at Georgia Tech is to quantify the discrepancy in the binding energy when such truncated models of the protease are used. To do this, simulations with increasingly larger portions of the protease are performed. These are enabled by the GTFock code, developed at the Georgia Tech Intel PCC in collaboration with Intel, which has been designed to scale efficiently on large cluster computers, including Intel Many Integrated Core (MIC) architecture clusters.
Calculations were performed at the Hartree-Fock level of theory. The largest simulations included residues of the protease more than 18 Angstroms away from the drug molecule. These simulations involved almost 3000 atoms and were performed on more than 1.6 million compute cores of the Tianhe-2 supercomputer (an Intel Xeon processor and Intel Xeon Phi processor-based system that is currently number one on the TOP500 list). The results of this work so far show variations in binding energy that persist throughout the range up to 18 Angstroms. This suggests that at even relatively large cutoff distances, leading to very large model complexes (much larger than are typically possible with conventional codes and computing resources), the binding energy is not converged to within chemical accuracy. Further work is planned to validate these results as well as to study additional protein-ligand systems.
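The convergence question described above can be made concrete with a short sketch. It assumes the usual supermolecular definition of binding energy (energy of the complex minus the energies of the separated protein model and ligand) and the common convention that chemical accuracy means about 1 kcal/mol. The function names are hypothetical, and every energy value is a made-up placeholder for illustration, not a result from the Georgia Tech study.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard unit conversion
CHEMICAL_ACCURACY = 1.0             # kcal/mol, conventional threshold (assumption)


def binding_energy(e_complex, e_protein, e_ligand):
    """Binding energy in kcal/mol from total energies given in hartree."""
    return (e_complex - e_protein - e_ligand) * HARTREE_TO_KCAL_PER_MOL


def converged_cutoff(binding_by_cutoff, tol=CHEMICAL_ACCURACY):
    """Smallest cutoff after which every successive change stays within tol.

    binding_by_cutoff maps a cutoff distance (Angstroms) to a binding
    energy (kcal/mol). Returns None if the last step still exceeds tol.
    """
    cutoffs = sorted(binding_by_cutoff)
    values = [binding_by_cutoff[c] for c in cutoffs]
    result = None
    # Walk backwards from the largest model while the steps stay small.
    for i in range(len(cutoffs) - 2, -1, -1):
        if abs(values[i + 1] - values[i]) <= tol:
            result = cutoffs[i]
        else:
            break
    return result


# Placeholder numbers only: one series that keeps oscillating, one that settles.
oscillating = {6: -20.0, 10: -26.5, 14: -23.0, 18: -25.2}
settling = {6: -20.0, 10: -24.6, 14: -25.1, 18: -25.4}
print(converged_cutoff(oscillating))  # no converged cutoff found
print(converged_cutoff(settling))
```

In this framing, the finding reported above corresponds to the first case: changes larger than the tolerance persist all the way to the largest cutoff tested.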
New quantum chemistry code: GTFock
The GTFock code was developed by the Georgia Tech Intel PCC in conjunction with the Intel Parallel Computing Lab. GTFock addresses one of the main challenges of quantum chemistry: running more accurate simulations, and simulations of larger molecules, by exploiting distributed-memory parallel processing.
How to Get a Copy of GTFock: GTFock has been released in open-source form. Users can download it at https://code.google.com/p/gtfock
GTFock was designed as a new toolkit with optimized and scalable code for Hartree-Fock self-consistent field iterations and the distributed computation of the Fock matrix in quantum chemistry. The Hartree-Fock (HF) method is one of the most fundamental methods in quantum chemistry for approximately solving the electronic Schrödinger equation. The solution of the equation, called the wavefunction, can be used to determine properties of the molecule. Georgia Tech’s goals in the code design of GTFock include scalability to large numbers of nodes and the capability to simultaneously use CPUs and Intel Xeon Phi coprocessors. GTFock also includes infrastructure for performing self-consistent field (SCF) iterations to solve for the Hartree-Fock approximation and uses a new distributed algorithm for load balancing and reducing communication.
GTFock code can be integrated into existing quantum chemistry packages and can be used for experimentation as a benchmark for high-performance computing. The code is capable of separately computing the Coulomb and exchange matrices and, thus, can be used as a core routine in many quantum chemistry methods.
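To illustrate what separately computing the Coulomb and exchange matrices means, here is a minimal dense sketch. It assumes the closed-shell (restricted) Hartree-Fock convention in which the density matrix is D = C_occ C_occ^T, so the Fock matrix is F = H + 2J - K. The function names and the tiny two-function test data are hypothetical. The plain O(N^4) loops are for readability only and are not GTFock's distributed algorithm or its API.

```python
def coulomb_exchange(density, eri):
    """Build the Coulomb (J) and exchange (K) matrices separately.

    eri[i][j][k][l] holds the electron repulsion integral (ij|kl).
    J[i][j] = sum_kl D[k][l] * (ij|kl)
    K[i][j] = sum_kl D[k][l] * (ik|jl)
    """
    n = len(density)
    J = [[0.0] * n for _ in range(n)]
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    d = density[k][l]
                    J[i][j] += d * eri[i][j][k][l]
                    K[i][j] += d * eri[i][k][j][l]
    return J, K


def fock(h_core, J, K):
    """Closed-shell Fock matrix F = H + 2J - K (assumed convention)."""
    n = len(h_core)
    return [[h_core[i][j] + 2.0 * J[i][j] - K[i][j] for j in range(n)]
            for i in range(n)]


# Made-up two-basis-function data, chosen only so the arithmetic is easy to follow.
n = 2
eri = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
eri[0][0][0][0] = 1.0
eri[0][0][1][1] = eri[1][1][0][0] = 0.5
density = [[1.0, 0.0], [0.0, 0.0]]
h_core = [[-1.0, 0.0], [0.0, -0.5]]

J, K = coulomb_exchange(density, eri)
F = fock(h_core, J, K)
```

Keeping J and K as separate outputs matters because other methods combine them with different weights, which is why a routine that returns both can serve as a shared building block.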
Georgia Tech Hardware: Through an educational grant from Intel, the Georgia Tech Intel PCC operates two Intel Xeon Phi processor-based servers, each with 8 Intel Xeon Phi coprocessor cards. These servers, called “joker” and “gotham,” have dual 10-core and dual 16-core processors, respectively.
As part of IPCC collaborations, Georgia Tech graduate student Xing Liu and Intel researcher Sanchit Misra spent a month in China optimizing and running GTFock on Tianhe-2. During testing, the team encountered scalability problems when scaling up the code to 8100 nodes on Tianhe-2. They resolved these issues by using a better static partitioning and a better work-stealing algorithm than those used in previous work. To utilize the Intel Xeon Phi coprocessors on Tianhe-2, they dedicated a thread on each node to managing offload to the coprocessors and used work stealing to dynamically balance the work between CPUs and coprocessors. The electron repulsion integral (ERI) calculations were also optimized for modern processors, including the Intel Xeon Phi coprocessor.
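The combination of a static partition with work stealing can be sketched as a small deterministic simulation. This is a generic illustration of the technique, not the algorithm used in GTFock: the function name, the policy of stealing from the tail of the longest queue, and the task costs are all assumptions made for the example.

```python
from collections import deque
import heapq


def simulate(queues, steal=True):
    """Simulate workers that start from a static partition of task costs.

    Each worker runs tasks from the front of its own queue. With steal=True,
    an idle worker takes one task from the tail of the currently longest
    queue. Returns (makespan, number_of_steals).
    """
    work = [deque(q) for q in queues]
    clock = [(0.0, w) for w in range(len(work))]  # (time worker is free, id)
    heapq.heapify(clock)
    makespan, steals = 0.0, 0
    while clock:
        t, w = heapq.heappop(clock)
        if work[w]:
            cost = work[w].popleft()
        elif steal:
            victim = max(range(len(work)), key=lambda v: len(work[v]))
            if not work[victim]:
                continue  # nothing left anywhere: this worker retires
            cost = work[victim].pop()
            steals += 1
        else:
            continue  # static partition only: worker retires when done
        t += cost
        makespan = max(makespan, t)
        heapq.heappush(clock, (t, w))
    return makespan, steals


# A deliberately unbalanced static partition of twelve unit-cost tasks.
partition = [[1] * 8, [1] * 2, [1] * 2]
static_only = simulate(partition, steal=False)
with_stealing = simulate(partition, steal=True)
```

With this partition, the static-only run is limited by the most heavily loaded worker, while stealing lets the idle workers absorb its backlog. A better static partition reduces how many steals are needed in the first place, which is why the two improvements go together.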
The partitioning framework used in GTFock is useful for comparing existing and future partitioning techniques. The best partitioning scheme may depend on the size of the problem, the computing system used and the parallelism available. In Fock matrix construction, each thread sums to its own copy of Fock submatrices in order to avoid contention for a single copy of the Fock matrix on a node. However, accelerators including Intel Xeon Phi coprocessors have limited memory per core, which makes this per-thread-copy strategy infeasible when the reduction spans many threads. Thus, novel solutions had to be designed. Figure 2 shows speedup results from running the GTFock code.
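The per-thread-copy strategy, and why its memory cost grows with the thread count, can be sketched as follows. This shows only the baseline approach described above, not the novel solution the team designed; the function names, the contribution list and the thread and block sizes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def accumulate_private(n, contributions):
    """One thread's work: sum its contributions into a private n x n copy."""
    local = [[0.0] * n for _ in range(n)]
    for i, j, value in contributions:
        local[i][j] += value
    return local


def build_with_private_copies(n, contributions, n_threads):
    """Accumulate without contention, then reduce the private copies."""
    chunks = [contributions[t::n_threads] for t in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(lambda chunk: accumulate_private(n, chunk),
                                 chunks))
    total = [[0.0] * n for _ in range(n)]
    for partial in partials:  # the reduction step
        for i in range(n):
            for j in range(n):
                total[i][j] += partial[i][j]
    return total


def private_copy_bytes(n, n_threads, bytes_per_value=8):
    """Memory needed when every thread holds its own n x n block of doubles."""
    return n * n * n_threads * bytes_per_value


contributions = [(0, 0, 1.0), (0, 1, 0.5), (0, 0, 2.0), (1, 1, 0.25), (0, 1, 0.5)]
result = build_with_private_copies(2, contributions, n_threads=3)
# Hypothetical sizing: a 1000 x 1000 block replicated across 240 threads.
footprint = private_copy_bytes(1000, 240)
```

The footprint scales linearly with the number of threads, so a layout that is comfortable with a few dozen CPU threads can exceed the memory available on a many-core coprocessor.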
Georgia Tech develops new SIMINT library
One bottleneck in quantum chemistry codes that the Georgia Tech team wanted to address is the computation of quantities called electron repulsion integrals. This step is very computationally intensive: there are many of these integrals to calculate, and the calculations do not run efficiently on modern processors, including the Intel Xeon processor. One reason is that existing codes do not take advantage of the single instruction, multiple data (SIMD) processing available on these processors. The algorithms in use are recursive in multiple dimensions and require substantial amounts of intermediate data, which makes the calculations difficult to vectorize. Many past attempts involved taking existing libraries and rearranging code elements to try to optimize and speed up the calculations.
The Georgia Tech team felt it was necessary to create a new library for electron integral calculations from scratch. The library they created is called SIMINT, which means Single Instruction Multiple Integral (named by SIMINT library developer Ben Pritchard). This library applies SIMD instructions to compute multiple integrals at the same time, which is the efficient mode of operation of Intel Xeon processors as well as the Intel Many Integrated Core (MIC) architecture used by Intel Xeon Phi, which has wide SIMD units.
SIMINT is a library for calculating electron repulsion integrals. The Georgia Tech PCC team designed it to use the SIMD features of Intel Xeon processors, and it is highly efficient and faster than other state-of-the-art ERI codes. The approach is to use horizontal vectorization, which means that batches of integrals of the same type must be computed together. The Georgia Tech team has posted information so that users can take a look.
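The requirement to compute integrals of the same type together can be pictured as a batching step that runs before any arithmetic. The sketch below groups shell quartets by their angular momentum class and packs each group into batches no wider than the SIMD width. It is an illustration only and does not use SIMINT's API: the function name, the quartet identifiers and the choice of four lanes (the number of double-precision values in a 256-bit vector) are assumptions.

```python
from collections import defaultdict


def batch_by_type(quartets, simd_width):
    """Group shell quartets by type and split each group into SIMD-sized batches.

    quartets is a list of (quartet_id, am_class), where am_class is a tuple of
    four angular momentum values such as (0, 0, 0, 0) for an (ss|ss) integral.
    Every batch contains only quartets of one class, so a single instruction
    stream can process all lanes of the batch.
    """
    by_type = defaultdict(list)
    for quartet_id, am_class in quartets:
        by_type[am_class].append(quartet_id)
    batches = []
    for am_class in sorted(by_type):
        ids = by_type[am_class]
        for start in range(0, len(ids), simd_width):
            batches.append((am_class, ids[start:start + simd_width]))
    return batches


# Hypothetical mix of (ss|ss) and (ps|ss) quartets in arbitrary order.
quartets = [
    (0, (0, 0, 0, 0)), (1, (1, 0, 0, 0)), (2, (0, 0, 0, 0)),
    (3, (0, 0, 0, 0)), (4, (1, 0, 0, 0)), (5, (0, 0, 0, 0)),
    (6, (0, 0, 0, 0)),
]
batches = batch_by_type(quartets, simd_width=4)
```

Partially filled batches, such as the trailing single (ss|ss) quartet here, leave vector lanes idle, so the mix of integral types in a workload affects how much of the SIMD width is actually used.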
The team uses Intel VTune Amplifier extensively in optimizing SIMINT, because it helps tune the vectorization and cache performance. From the instructions the calculation must execute, the developers know how fast the processor can run and therefore the upper bound on performance. Intel VTune Amplifier provides a variety of statistics at the level of individual lines of code that help determine why the code may not be reaching the expected performance.
Figure 3 shows an approximate 2x speedup over libint with a test case that has many worst-case configurations. Figure 4 shows a 3x speedup for another basis set without worst-case configurations.
“SIMINT has been designed specifically to efficiently use SIMD features of Intel processors and co-processors. As a result, we’re already seeing speedups of 2x to 3x over the best existing codes.”
Edmond Chow, Associate Professor of Computational Science and Engineering and Director of the Georgia Institute of Technology Intel PCC.
“GTFock has attracted the attention of other developers of quantum chemistry packages. We have already integrated GTFock into PSI4 to provide distributed memory parallel capabilities to that package. In addition, we have exchanged visits with the developers of the NWChem package to initiate integration of GTFock into NWChem (joint work with Edo Apra and Karol Kowalski, PNNL). Along with SIMINT, we hope to help quantum chemists get their simulations — and their science — done faster,” states Chow.
Other articles in this series covering the modernization of popular chemistry codes include:
- Modified NWChem Code Utilizes Supercomputer Parallelization
- Speeding up Molecular Dynamics: Modified GROMACS Code Improves Optimization, Parallelization
References
- The study was performed by the following members of the Georgia Tech Intel PCC and Intel: Edmond Chow, Associate Professor of Computational Science and Engineering and Director of the IPCC; David Sherrill, Professor of Chemistry and Biochemistry; Graduate students: Xing Liu, Marat Dukhan, and Trent Parker; Researchers at the Intel Parallel Computing Lab: Sanchit Misra, Mikhail Smelyanskiy, Jeff Hammond, and Pradeep Dubey. http://www.cc.gatech.edu/~echow/ipcc/
- B. Pritchard and E. Chow, Horizontal Vectorization of Electron Repulsion Integrals, Journal of Computational Chemistry, 2016, submitted.
- E. Chow, X. Liu, S. Misra, M. Dukhan, M. Smelyanskiy, J. R. Hammond, Y. Du, X.-K. Liao, and P. Dubey, Scaling Up Hartree-Fock Calculations on Tianhe-2, International Journal of High Performance Computing Applications, 30, 85-102 (2016).
- E. Chow, X. Liu, M. Smelyanskiy, and J. R. Hammond, Parallel Scalability of Hartree-Fock Calculations, The Journal of Chemical Physics, 142, 104103 (2015).
Linda Barney is the founder and owner of Barney and Associates, a technical/marketing writing, training and web design firm in Beaverton, OR.