
Korea Institute of Science and Technology Information (KISTI)
Once the celebrations are over and a world-class supercomputer is installed and operational, it becomes imperative to deliver on the promised value of such a significant computational asset. World-class supercomputers such as the recently installed Korea Institute of Science and Technology Information (KISTI) NURION system (ranked 13th fastest in the world on the November 2018 TOP500 list) exemplify how these computational platforms are both national assets and flagship technology tools, procured to provide for the future, whether in science or in meeting the economic needs of a region.
Underlying Korea’s strong economic development has been a consistent effort to create a robust science and technology (S&T) capacity. As the 4th largest economy in Asia and 11th largest in the world, South Korea is a global technology leader where advanced materials science plays an important role. As a leader in advanced materials research, KISTI procured the NURION system in part to help their researchers simulate larger solid atomic structures.
“Electronic structure simulation of realistically sized solid structures is quite critical to help experimentalists who work on designs of new materials or advanced electronic devices,” said Soonwook Hwang, Ph.D., General Director and Principal Researcher, Division of National Supercomputing at KISTI. “With large-scale simulations, we expect to cover design factors for nanoscale devices with large-scale simulations that can predict physical behaviors of solid structures having up to several million atoms.”
Selecting the best system for advanced material science workloads
The importance of advanced materials research to South Korea is evidenced by the significant investment represented by a NURION-class supercomputer. For this reason, the KISTI team critically evaluated the various hardware solutions upon which the NURION procurement could be based, including GPU-accelerated systems. Their results for both Intel processors and GPUs have been published in the literature.
Scalability and the need to solve large-scale PDE problems that involve sparse matrix operations were key technology considerations in the KISTI procurement. The superior sparse-matrix performance observed on Intel many-core processors, coupled with the performance advantages of the Intel Xeon Phi processor's high bandwidth memory (HBM), helped in selecting a CPU-based rather than a GPU-based system.
Sparse-matrix performance is critical
Tasked with making the NURION system deliver on its promised potential, KISTI researchers such as Hoon Ryu, Ph.D., Principal Researcher at KISTI, have been deeply involved in the evaluation and selection of the NURION system. According to a statistical workload analysis performed at KISTI, approximately 50 percent of their workloads involve sparse matrix operations. This means the NURION supercomputer has to deliver best-in-class sparse-matrix performance. It has been pointed out in the literature that GPUs are challenged by sparse-matrix based computations.
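To make the term concrete, the following is a minimal sketch of a sparse matrix-vector product in compressed sparse row (CSR) form, the kind of operation that dominates such workloads. It is illustrative only and is not KISTI's code; the function name `csr_matvec` and the small example matrix are assumptions made for this sketch. Each stored nonzero costs one multiply-add but requires reading a value, a column index, and an indirectly addressed entry of the input vector, which is why performance on this kernel tends to track memory bandwidth rather than peak arithmetic rate.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR-format sparse matrix by a dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Only the stored nonzeros of this row are visited.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx: an indirect, irregular memory access.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 4x4 tridiagonal matrix (2 on the diagonal, -1 beside it).
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
row_ptr = [0, 2, 5, 8, 10]

y = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0, 1.0])
print(y)  # [1.0, 0.0, 0.0, 1.0]
```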
Memory bandwidth and scalability go hand-in-hand
The KISTI team found that the speedup due to the performance of the Intel Xeon Phi processor’s high bandwidth memory (HBM) meant that a single node could take a larger workload. Ryu points out that “inter-node scalability is quite nice.” Scalability tests demonstrate a speedup when increasing the number of computing nodes. processors such as those in the earlier Tachyon-II HPC cluster.
KISTI observed a 1.5-3x speedup when they made use of the high bandwidth memory (HBM) packaged with the many-core Intel Xeon Phi processor 7250 nodes. More recently, they successfully simulated a structure of 0.4 billion atoms on the NURION system and verified strong scalability up to 2,500 computing nodes (170,000 computing cores).
Results publicly reported at IHPCD2017. Configuration: Intel Xeon Phi 7250 nodes; up to 272 (68×4) cores/node using 4 MPI processes + 68 threads per node; Quad / Flat memory mode; 10G network connectivity.

Figure 1: Strong scalability of end-to-end simulations. (a) The small-scale BMT target was to calculate the 5 lowest conduction band states in a 27x33x33 nm³ (~1.5 million atoms) Si:P quantum dot. Scalability is tested up to 3 computing nodes (204 cores). (b) The extremely large-scale BMT target was to calculate the 3 lowest conduction subbands in 2715x54x54 nm³ Si:P nanowires (0.4 billion atoms). Scalability here is tested up to 2,560 computing nodes (170,000 cores) on the NURION system.
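For readers unfamiliar with the term, strong scalability measures how the run time of a fixed-size problem falls as nodes are added. The sketch below shows how speedup and parallel efficiency are typically computed from such a test. The function names are assumptions for this illustration, and the timings are placeholders chosen to make the arithmetic easy to follow; they are not KISTI measurements. The only figure taken from the article is the 68 cores per Intel Xeon Phi 7250 node implied by the configuration above.

```python
CORES_PER_NODE = 68  # physical cores per Intel Xeon Phi 7250 node


def total_cores(nodes):
    """Core count for a given number of nodes."""
    return nodes * CORES_PER_NODE


def strong_scaling(timings):
    """Speedup and parallel efficiency for a fixed problem size.

    timings maps node count -> wall-clock seconds for the same problem.
    Results are relative to the smallest node count in the table.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    result = {}
    for nodes, seconds in sorted(timings.items()):
        speedup = base_time / seconds
        ideal = nodes / base_nodes  # perfect scaling would reach this speedup
        result[nodes] = (speedup, speedup / ideal)
    return result


# PLACEHOLDER timings for illustration only, not measured data.
placeholder = {1: 120.0, 2: 60.0, 4: 40.0}
for nodes, (speedup, efficiency) in strong_scaling(placeholder).items():
    print(nodes, total_cores(nodes), speedup, efficiency)
```

With these placeholder numbers, two nodes scale perfectly (efficiency 1.0) while four nodes reach a speedup of 3.0, an efficiency of 0.75. The same core arithmetic reproduces the counts quoted in the article: 3 nodes give 204 cores and 2,500 nodes give 170,000 cores.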
Ryu points out that “Intel technology matches with the purpose of KISTI.” As the workload analysis performed at KISTI showed, approximately 50 percent of their workloads involve sparse matrix operations, so the NURION supercomputer should perform well in meeting the needs of KISTI researchers across a wide range of research areas. Summarizing the selection of Intel Xeon Phi processors for many of the NURION nodes, Ryu states, “With Intel Xeon Phi processors, we are able to drive a huge reduction of end-to-end simulation times for million atomic systems.”
Hitting the ground running with advanced codes for advanced material science
In a software development project spanning years, Dr. Ryu has been developing codes for tight-binding simulations of large-scale electronic structures. He explains, “This work basically needs to solve Schrödinger-Poisson equations that normally involves nanostructures consisting of tens of million atoms, which are numerically described with system matrices of billion degrees of freedom (DOFs).
“This software package,” Ryu said, “is useful to the national business of South Korea in developing advanced semiconducting devices.”
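The connection between tight-binding simulation and sparse matrices can be seen in a toy model. The sketch below builds the Hamiltonian of a one-dimensional chain with one orbital per atom and nearest-neighbor hopping, then checks its lowest state against the known closed-form solution. It is an assumption-laden illustration, not Dr. Ryu's package: production codes work on three-dimensional lattices with several orbitals per atom and use iterative eigensolvers, and all names and parameter values here are hypothetical. What carries over is the structure: each atom couples only to its neighbors, so the system matrix is overwhelmingly zeros.

```python
import math


def chain_hamiltonian(n_atoms, onsite=0.0, hopping=-1.0):
    """Sparse rows ({column: value}) of a 1D nearest-neighbor chain."""
    rows = []
    for i in range(n_atoms):
        row = {i: onsite}
        if i > 0:
            row[i - 1] = hopping
        if i < n_atoms - 1:
            row[i + 1] = hopping
        rows.append(row)
    return rows


def matvec(rows, vec):
    """Apply the sparse Hamiltonian to a vector."""
    return [sum(v * vec[j] for j, v in row.items()) for row in rows]


def analytic_state(n_atoms, m, onsite=0.0, hopping=-1.0):
    """Closed-form m-th eigenpair (m = 1..n_atoms) of the open chain."""
    theta = m * math.pi / (n_atoms + 1)
    energy = onsite + 2.0 * hopping * math.cos(theta)
    vec = [math.sin(theta * (i + 1)) for i in range(n_atoms)]
    return energy, vec


n = 1000
H = chain_hamiltonian(n)
nonzeros = sum(len(row) for row in H)

# With hopping < 0, m = 1 is the lowest-energy state.
energy, state = analytic_state(n, 1)
Hv = matvec(H, state)
residual = max(abs(a - energy * b) for a, b in zip(Hv, state))
print(nonzeros, n * n, residual)
```

For 1,000 atoms the matrix has one million entries but only 2,998 nonzeros, and the residual confirms that the closed-form state satisfies the eigenvalue equation to within rounding error. Finding a handful of such low-lying states in a far larger matrix is the kind of problem the benchmark targets in Figure 1 describe.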
Thus the KISTI team has spent considerable effort creating highly optimized CPU and GPU versions. When discussing the selection of CPUs over GPUs for the NURION procurement, Dr. Ryu emphasizes, “We particularly want to note that the GPU optimization was not ‘loosely’ done to make CPUs look good.” Dr. Ryu and various co-authors have published their performance results in the literature for CPUs and GPUs. To assist others, Ryu is developing a white paper, to be published later this year, that tells the full CPU vs. GPU story.
Hitting the ground running
KISTI already had leading R&D capability, but the CPU-based NURION system introduced a new level of technological capability.
KISTI has been an Intel Parallel Computing Center (Intel PCC) since 2013, the first in the Asia-Pacific area. This ongoing collaboration has paid off with quick returns on the NURION supercomputer, even though the system was only recently installed and is just starting to be made available to users.
Experimental validation
One example is research on metal halide perovskites, a promising class of materials for optoelectronic devices. Such research can provide useful guidelines for device design, such as how to map optical gaps and how to alleviate light-induced phase separation (a bottleneck in LED design). More efficient LEDs can translate to more efficient lighting and screen displays for the global marketplace.
In an early demonstration of the efficacy of the NURION supercomputer, Dr. Min Sun Yeom, Director and Principal Researcher, Center for Applied Scientific Computing, KISTI ROK writes, “With tight-binding simulations of nanostructures having > 100,000 atoms on NURION system, we were able to explore the effect of size and structural engineering on band gap energies of physically realizable lead halide perovskite nanostructures within quite reasonable time. We also obtained the preliminary ideas for how to reduce the light-induced phase separation in halide mixtures, which would not be feasible with DFT simulations.”

Figure 2: Connection of experiments and large-scale simulations. (a) Experimental image of perovskite (CsPbBr₃) quantum dots (Nano Letters 15, 3692-3696). (b) Dependence of band gap energies on quantum dot size. The KISTI numerical results connect nicely to experiment.
Summary
Given the importance of advanced material design to South Korea, the search for ever more capable hardware continues at KISTI. It did not stop with the procurement of a CPU-based supercomputer. In particular, Dr. Ryu points out the KISTI Intel PCC team is evaluating the use of FPGAs to assist in large scale electronic structure calculations. As with the GPU and Intel processor evaluations, the KISTI team has been publishing their work on FPGAs as well.
This article was produced as part of Intel’s HPC editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC community through advanced technology. The publisher of the content has final editing rights and determines what articles are published.