If you’ve ever been involved in configuring a high-performance computing (HPC) system for a broad range of scientific disciplines, then you know how difficult it can be to balance different user needs with budgetary realities. You have to consider everything from application performance for specific types of workloads, to what type of expertise will be needed for managing the system, to ongoing operating costs and much more.
That’s why the Fudan University HPC Center faced a puzzle of sorts when it began looking for a new solution to support more than 50 projects, including the temperamental Vienna Ab initio Simulation Package (VASP). The HPC Center needed the best possible price/performance for key workloads, and it also sought a highly stable and manageable solution capable of keeping its diverse team of researchers moving forward. By turning to an Intel-based solution, Fudan found a way to meet its technical requirements while leaving enough budget to grow.
A leading research institution and university
As one of the top universities in China, Fudan University has 2,400 faculty members spread across 69 departments, including some of the nation’s leading researchers in an array of disciplines. The university founded its HPC Center in 1998 to support scientific research across the chemistry, bioscience, environmental science, information technology, mathematics and physics departments. Today, the HPC Center supports a total of 49 research groups that undertake government projects.
The main user of the HPC Center is the Key Laboratory of Computational Physical Sciences (Fudan University), a leading center for science research in China’s Ministry of Education. The Laboratory, which was founded as part of an effort to establish and promote world-class universities called Project 985, focuses on solving key problems in computational mathematics, computational chemistry and computational physics.
Supporting physics and materials research: First-principles calculations and beyond
First-principles calculation, which is widely used by research organizations to model materials at the atomic scale, is a cornerstone of the Fudan University Key Laboratory of Computational Physical Sciences. Professor Xin-gao Gong, Director of the Fudan HPC Center, explains: “The vast majority of research performed using first-principles programs can be completed based on standard outputs. At the same time, since VASP and Quantum Espresso (QE) are popular open source programs, there are extensive user groups to facilitate research-related questions and exchanges.”
Professor Gong says that, in addition to using first-principles methods for studying basic theories, Fudan faculty are using these methods to aid understanding of the material world and to create new functional materials. “Our materials research group has made some outstanding achievements, including a National Natural Science Prize for nano-material research,” he says.
The Fudan HPC Center currently supports an array of first-principles programs. Researchers use VASP and QE for electronic-structure calculations and materials modeling at the nanoscale. They use SIESTA to perform electronic-structure calculations and ab initio molecular dynamics simulations of molecules and solids. And they study molecular systems and reactions using Gaussian.
The end of a lifecycle
The Fudan HPC Center’s previous cluster, built in 2011, comprised 502 compute nodes connected by a QDR InfiniBand network switch, with a peak performance of 64 TFlops. After several years of use, the cluster could no longer meet users’ growing scientific computing requirements, and the system was running at nearly full capacity (utilization above 85 percent). Based on performance, power consumption and stability considerations, the Fudan HPC Center knew it was time to upgrade to a larger HPC cluster, and it began evaluating new options.
Clear requirements, careful analysis
When the HPC Center started looking for a new solution, the price/performance and stability of the new cluster were top considerations. The challenge, however, was to find a system that could efficiently support its first-principles calculations, in addition to other key workloads.
“Although many first-principles codes are easy to use, achieving optimal performance from the algorithm, especially in large systems, is often difficult,” notes Professor Gong. Therefore, it was not just a matter of picking a system with the best price/performance ratio on paper.
The HPC Center ended up evaluating options based on several factors, and rated each factor using the following scoring system in its request for proposal:
- Price: 40%
- Compute node: 15%
- Peak performance: 25%
- GPU: 1%
- SSD: 1%
- InfiniBand network: 8%
- Blade: 5%
- Business terms: 5%
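The weights above sum to 100 percent, so each proposal reduces to a single weighted total. A minimal sketch of that arithmetic (the per-category sub-scores for the sample proposal are hypothetical; only the weights come from the RFP):

```python
# Weighted scoring of an RFP proposal using the evaluation weights from
# Fudan's request for proposal. Sub-scores (0-100) below are hypothetical.
WEIGHTS = {
    "price": 0.40,
    "compute_node": 0.15,
    "peak_performance": 0.25,
    "gpu": 0.01,
    "ssd": 0.01,
    "infiniband_network": 0.08,
    "blade": 0.05,
    "business_terms": 0.05,
}

def weighted_score(scores: dict) -> float:
    """Return the weighted total (0-100) for one proposal."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Example: a hypothetical proposal that is strong on price and peak performance.
proposal = {
    "price": 90, "compute_node": 80, "peak_performance": 85,
    "gpu": 50, "ssd": 70, "infiniband_network": 75, "blade": 60,
    "business_terms": 80,
}
print(round(weighted_score(proposal), 2))  # prints 83.45
```

With price at 40 percent and peak performance at 25 percent, those two categories alone decide nearly two-thirds of the outcome, which is why price/performance dominated the evaluation.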
After a detailed analysis of the real workloads — including compute balance, network traffic, message/data size, scalability and other factors — an Intel HPC solution came out on top.
Outstanding manageability and performance
The HPC Center’s new cluster is fully based on Intel HPC components, with high-performance compute nodes using dual Intel Xeon E5-2660 v3 processors. The system was built with an Intel True Scale Fabric 12800 switch to help optimize price and performance.
- Intel True Scale Fabric Host Channel Adapters (HCA) facilitate load balancing.
- Intel Enterprise Edition for Lustre software supports management of the high-performance computing storage.
- Intel Parallel Studio XE Cluster Edition is used to help optimize coding and tuning.
- Intel SSD DC S3500 drives are used in every compute node to help ensure strong I/O throughput.
The current configuration, which includes just 144 compute nodes, can already achieve more than 120 TFlops.
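The quoted peak can be sanity-checked with a back-of-envelope calculation (a sketch, assuming double-precision AVX2 throughput at the base clock and that all 144 dual-socket nodes contribute):

```python
# Rough theoretical peak of the new cluster. Assumes the base clock and
# 16 double-precision FLOPs per cycle per core (AVX2 with two FMA units);
# Turbo Boost would push the real figure somewhat higher.
CORES_PER_SOCKET = 10     # Intel Xeon E5-2660 v3
BASE_GHZ = 2.6            # base clock frequency
FLOPS_PER_CYCLE = 16      # double precision, AVX2 + dual FMA
SOCKETS_PER_NODE = 2
NODES = 144

gflops_per_socket = CORES_PER_SOCKET * BASE_GHZ * FLOPS_PER_CYCLE  # 416 GFlops
tflops_cluster = gflops_per_socket * SOCKETS_PER_NODE * NODES / 1000
print(round(tflops_cluster, 1))  # prints 119.8
```

At base clocks this lands at roughly 120 TFlops, consistent with the figure the HPC Center reports.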
According to Professor Gong, the comprehensive Intel capabilities provide an ideal solution for the HPC Center’s needs. “True Scale was fully integrated and validated with Lustre, and it provides tremendous performance for our target applications at an affordable price. The user interface also makes it easy to monitor and manage the health of each node and storage component, so we can stay focused on research,” explains Professor Gong. He also appreciates Intel’s enterprise support capabilities. “I know, if we have issues, we can make one call and get a quick response.”
Leaving room to grow
Professor Gong says that the Intel-based solution not only met current HPC Center requirements, but also left the center well-positioned to expand. And given all of the work happening at Fudan University, it’s only a matter of time before additional compute power is needed.
“The Intel HPC solution has left us plenty of room to grow in our existing facility, and we have a plan to grow our cluster in the near future,” he explains. “We are still getting to know the new system, but based on our initial work, we are confident that it will remain stable and manageable, even as the cluster size and user demands increase,” concludes Professor Gong.
Sean Thielen is a Portland, OR-based technical writer.