*How using CPU/GPU parallel computing is the next logical step*

My work in computational mathematics is focused on developing new, paradigm-shifting ideas in numerical methods for solving mathematical models in various fields. This includes the Schrödinger equation in quantum mechanics, the elasticity model in mechanical engineering, the Navier-Stokes equation in fluid mechanics, Maxwell’s equations in electromagnetism, Monge-Ampère equations in differential geometry, and the Black-Scholes model in finance.

This research produces innovative numerical algorithms that approximate practical equations more effectively. The design of efficient parallel algorithms is a promising new trend in computational mathematics that will lead to tremendous performance improvements for computations in science and engineering. However, implementation involves massive computing that is extremely time-consuming on sequential computers, bottlenecking the entire development procedure.

We often think of using this type of numerical simulation in industries where experiments would be expensive or physically impossible. One example is studying the effects of car collisions without colliding two real cars. To be accurate about the collision, we have to be able to look at a really large number of smaller pieces that stand in for the whole car. Essentially, you need to break the structure so you can accurately analyze it at finer levels for the different materials that might be in the crash. This “domain decomposition” is the starting point for most numerical simulations.

This single task of breaking the computational domain into smaller pieces, however, inherits the most common problem in numerical approximations: the computational complexity grows exponentially with the dimension of the domain. (This is called the *curse of dimensionality.*) To achieve a moderate numerical accuracy, a decomposition for a three-dimensional domain, like a cube, should consist of billions of small pieces. This task alone takes **more than a year** of computing time on an Intel Core i7 3.2 GHz processor, and it is only the first step in the numerical simulation.
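As a rough illustration of this growth, consider a uniform grid with the same number of cells along each axis. The function name `cell_count` and the figure of 1,000 cells per axis are assumptions chosen for illustration, not numbers from the research described here.

```python
def cell_count(cells_per_axis: int, dimension: int) -> int:
    """Number of cells in a uniform grid on a d-dimensional cube.

    Each axis is split into `cells_per_axis` pieces, so the total
    count is cells_per_axis raised to the power `dimension`.
    """
    return cells_per_axis ** dimension


if __name__ == "__main__":
    # Hypothetical resolution: 1,000 cells along each axis.
    for d in (1, 2, 3):
        print(f"dimension {d}: {cell_count(1000, d):,} cells")
```

Under these assumptions, a resolution that needs only a thousand pieces on a line needs a billion in a cube, which is the order of magnitude mentioned above.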

Having reached the physical limitation of sequential processing, the next logical step has to be alternative solutions in CPU/GPU parallel computing. Fortunately, I was able to get my hands on a complete high-performance compute cluster donated to Wayne State University by Silicon Mechanics as part of its 3rd Annual Research Cluster Grant competition. The new HPC cluster includes a head node, eight compute nodes, InfiniBand and Gigabit Ethernet networking, Intel Xeon Phi coprocessors, and NVIDIA Tesla GPUs.

My research group and I are hoping to use this cluster to develop new parallel algorithms for each module of the finite element method (FEM). This includes algorithms for the domain decomposition, the matrix/vector assembling, and the numerical solver for large systems of equations. I will be conducting a thorough study on the parallelizability of each stage, to help determine the best computing strategy (CPUs or GPUs) for each part. Rigorous mathematical analysis and intensive numerical implementations will help us derive highly effective parallel FEMs that can solve equations in high-dimensional domains.
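To make the three stages concrete, here is a minimal sequential sketch for a toy problem: linear finite elements for -u'' = f on [0, 1] with u(0) = u(1) = 0. It is not the group's algorithm or software. The function name `solve_poisson_1d`, the constant load `f`, and the choice of a tridiagonal (Thomas) solver are all assumptions made for illustration, and the sketch assumes at least two elements.

```python
def solve_poisson_1d(n_elements, f=1.0):
    """Toy linear FEM for -u'' = f on [0, 1], u(0) = u(1) = 0.

    Illustrative only; assumes n_elements >= 2 and a constant load f.
    """
    # Stage 1: decompose the domain into equal-length elements.
    h = 1.0 / n_elements
    nodes = [i * h for i in range(n_elements + 1)]

    # Stage 2: assemble the tridiagonal stiffness matrix and the load
    # vector element by element, keeping interior unknowns only.
    n = n_elements - 1
    lower = [0.0] * n  # sub-diagonal (lower[0] is unused)
    diag = [0.0] * n
    upper = [0.0] * n  # super-diagonal (upper[n - 1] is unused)
    rhs = [0.0] * n
    for e in range(n_elements):
        # Element e joins nodes e and e + 1; as interior unknowns
        # these have indices e - 1 and e.
        left, right = e - 1, e
        for a in (left, right):
            if 0 <= a < n:
                diag[a] += 1.0 / h
                rhs[a] += f * h / 2.0
        if 0 <= left and right < n:
            upper[left] -= 1.0 / h
            lower[right] -= 1.0 / h

    # Stage 3: solve the linear system (Thomas algorithm).
    for i in range(1, n):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]

    # Re-attach the two boundary values.
    return nodes, [0.0] + u + [0.0]


if __name__ == "__main__":
    nodes, u = solve_poisson_1d(4)
    for x, value in zip(nodes, u):
        print(f"u({x:.2f}) = {value:.5f}")
```

In this toy, the element loop in stage 2 does independent work per element, and stage 3 is a chain of dependent updates. That contrast is one reason why the stages may suit different hardware.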

We will incorporate these new algorithms into a software package and plan to make it readily available for computations in various disciplines, in collaboration with colleagues in the chemistry and oncology departments, where numerical algorithms will be used to detect abnormal genes in sequence.

We are really excited to have the hardware to be able to perform this work. We are going to be looking at whether different architectures offer different possibilities for working in parallel. We are also looking into whether we can use our existing software with the GPUs, or how we can modify the code to be used on the Intel Xeon Phi coprocessors.

*Hengguang Li is an assistant professor in the Department of Mathematics at Wayne State University. He may be reached at editor@ScientificComputing.com.*