Ensuring Value in HPC Procurements
A few benchmarking tips from HECToR
Will the investments your research organization makes in high performance computing resources really pay off? The question is, or should be, a constant worry in HPC procurement projects, where return on investment is so tightly tied to technical insight. The procurement of the UK’s national supercomputing service for academic research, known as “HECToR” (High End Computing Terascale Resource), offers a few tips.1
The HECToR procurement serves as an excellent example of the role system benchmarking can play in making good, technically informed procurement decisions. Key questions can be answered by thorough benchmarking, such as:
• How will my codes perform on this system?
• How will they perform in a loaded system?
• In which part of the hardware should I invest most: the interconnect, more processors, more disk space, faster disks?
Of course, the driving force for benchmarking is to ensure value for money.
Benchmarking for large HPC procurements, especially those destined to deliver a service life-cycle spanning multiple systems, comes with its own complexities. How do we establish a scoring method that is fair, influences bids in the best way for the users, and is practical for both the vendors and the procurement team to implement?
For each machine evaluated in the HECToR procurement, the performance of each benchmark code (in FLOPS) was normalized: it was divided by the best performance figure for that benchmark, taken over all machines evaluated and the base figure for the previous national HPC service. The overall score for a machine was then an aggregate: the sum, over all benchmarks, of each normalized result multiplied by that benchmark’s weight. The machine with a score closest to 1 is the winner. This system was favored because it guides the vendors to focus their solutions according to the weight of each benchmark.
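The scoring scheme can be sketched in a few lines of Python. This is only an illustration of the description above, not the procurement team's actual tooling: the machine names, benchmark names and performance figures are made up, the helper names `normalize` and `aggregate_score` are hypothetical, and the weights are assumed to sum to 1 (which is what makes a score of 1 the best possible result).

```python
def normalize(perf, baseline):
    """Divide each result by the best figure for that benchmark.

    perf:     {machine: {benchmark: performance in FLOPS}}
    baseline: {benchmark: performance of the previous service in FLOPS}
    The "best" figure is taken over all machines and the baseline.
    """
    best = {
        bench: max([base] + [results[bench] for results in perf.values()])
        for bench, base in baseline.items()
    }
    return {
        machine: {bench: results[bench] / best[bench] for bench in baseline}
        for machine, results in perf.items()
    }


def aggregate_score(normalized, weights):
    """Sum of (normalized result x weight) over all benchmarks."""
    return sum(weights[bench] * normalized[bench] for bench in weights)


# Made-up figures for two hypothetical machines and four hypothetical benchmarks.
perf = {
    "machine_a": {"bench_w": 100.0, "bench_x": 50.0, "bench_y": 80.0, "bench_z": 40.0},
    "machine_b": {"bench_w": 80.0, "bench_x": 100.0, "bench_y": 100.0, "bench_z": 20.0},
}
baseline = {"bench_w": 40.0, "bench_x": 20.0, "bench_y": 30.0, "bench_z": 50.0}

# Assumption: equal weights that sum to 1.
weights = {"bench_w": 0.25, "bench_x": 0.25, "bench_y": 0.25, "bench_z": 0.25}

normalized = normalize(perf, baseline)
scores = {machine: aggregate_score(n, weights) for machine, n in normalized.items()}

# Every normalized result is at most 1, so the highest score is the one closest to 1.
winner = max(scores, key=scores.get)
print(scores, winner)
```

Note that in this made-up data the previous service holds the best figure for `bench_z`, so neither candidate machine reaches 1.0 on that benchmark.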
For HECToR, the weightings were based on predicted usage and representation, with the following rule of thumb: the weight for any benchmark should be between 10 and 25 percent. This means that no application in the suite is assigned such a low weight that it is effectively ignored — each application must have been selected for a reason and, therefore, must contribute something (but not too much) to the overall score.
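A rule of thumb like this is easy to check mechanically. The sketch below is an assumption-laden illustration: the function name `check_weights`, the benchmark names and the weights are all hypothetical (the article does not give the actual HECToR weights), and the requirement that the weights sum to 1 is inferred from the scoring description rather than stated outright.

```python
import math


def check_weights(weights, low=0.10, high=0.25):
    """Return a list of problems found in a {benchmark: weight} mapping."""
    problems = []
    for bench, weight in weights.items():
        if weight < low or weight > high:
            problems.append(
                f"{bench}: weight {weight:.2f} is outside [{low:.2f}, {high:.2f}]"
            )
    total = sum(weights.values())
    # Assumption: weights are fractions that should add up to 1.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"weights sum to {total:.2f}, expected 1.00")
    return problems


# Hypothetical weightings, for illustration only.
balanced = {"code_a": 0.25, "code_b": 0.25, "code_c": 0.20, "code_d": 0.15, "code_e": 0.15}
skewed = {"code_a": 0.60, "code_b": 0.35, "code_c": 0.05}

print(check_weights(balanced))
for problem in check_weights(skewed):
    print(problem)
```

In the skewed example, `code_c` is effectively ignored and `code_a` dominates the score, which is the situation the rule of thumb is meant to prevent.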
The benchmarks were based on runs of the following UK HPC user applications: CASINO, CASTEP, NAMD, HELIUM and the Unified Model (Table 1). For each code, two different problem sizes were used:
• Bench1 represented the types of problems being tackled on the previous service, i.e. a typical production simulation at the time of the HECToR procurement project.
• Bench2 was designed to represent the new kinds of problems HECToR was being procured to enable and, therefore, much larger than Bench1.
A micro-benchmarking package with good community acceptance, High Performance Computing Challenge (HPCC), was also used to fill any gaps left by the user applications, as well as to stress specific parts of the systems.2 Results from a range of different systems are given on the HPCC Web site, which allows direct comparisons to be made.
Each benchmark was executed both on a clean system with no other jobs and in a so-called saturation test. The saturation test involved running two instances of Bench1 for each user code (i.e. 10 jobs) alongside many smaller jobs. Two measurements were taken from this exercise:
• time to complete all 10 Bench1 jobs
• number of smaller jobs completed in this time
The aim was to demonstrate the effectiveness of a system under load, where many jobs are competing for shared resources (e.g. disk and interconnect access). This also guided the vendors toward including appropriate system software (e.g. batch schedulers) in their solutions.
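The two saturation measurements can be derived from a simple log of job start and end times. The following is a minimal sketch under stated assumptions: the `Job` record, its field names and the sample timings are hypothetical, and the clock is assumed to start when the first job starts.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    is_bench1: bool  # True for a Bench1 job, False for a smaller filler job
    start: float     # seconds since the start of the test
    end: float       # seconds since the start of the test


def saturation_metrics(jobs):
    """Return (time to complete all Bench1 jobs, smaller jobs completed in that time)."""
    bench1 = [job for job in jobs if job.is_bench1]
    if not bench1:
        raise ValueError("no Bench1 jobs in the log")
    test_start = min(job.start for job in jobs)
    bench1_finish = max(job.end for job in bench1)
    small_done = sum(
        1 for job in jobs if not job.is_bench1 and job.end <= bench1_finish
    )
    return bench1_finish - test_start, small_done


# Made-up log: two Bench1 jobs and three smaller jobs.
log = [
    Job("bench1_code_a_run1", True, 0.0, 100.0),
    Job("bench1_code_a_run2", True, 0.0, 120.0),
    Job("small_1", False, 0.0, 10.0),
    Job("small_2", False, 10.0, 20.0),
    Job("small_3", False, 110.0, 130.0),  # finishes after the last Bench1 job
]

elapsed, completed = saturation_metrics(log)
print(elapsed, completed)
```

A smaller job only counts if it finishes before the last Bench1 job does, so `small_3` is excluded in this example.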
Benchmarking is vital for a technically informed decision about how different systems will perform the kinds of tasks to be supported by a large HPC procurement. Getting the decision right, therefore, relies on a representative selection of user codes in the benchmark suite and the appropriate assignment of weightings. Test cases should be (relatively) short but also balanced, so that startup costs do not dominate in a way that skews scaling results. Timing separate phases of computation is therefore important, but be aware that, for example, iterations of the same computational process can have different performance profiles. Using the synthetic benchmarking suite HPCC was valuable, since it allowed any aspect of system performance unstressed by the user codes to be tested, but synthetic benchmarks should not be relied upon alone.
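One way to time phases separately, while keeping each iteration's timing visible, is a small timer that records one sample per phase entry. This is a generic sketch, not something taken from the HECToR benchmark codes (which are not Python applications); the `PhaseTimer` class and the phase names are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Record one wall-clock sample each time a named phase is entered."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[name].append(time.perf_counter() - start)

    def totals(self):
        """Total time per phase, summed over all recorded samples."""
        return {name: sum(values) for name, values in self.samples.items()}


timer = PhaseTimer()

with timer.phase("startup"):
    data = list(range(100_000))  # stand-in for reading input and setting up

for step in range(3):
    with timer.phase("iteration"):
        total = sum(x * x for x in data)  # stand-in for one compute step

# Startup is reported separately from the compute phase, and the per-iteration
# samples show whether iterations behave differently from one another.
print(timer.totals())
print(timer.samples["iteration"])
```

Keeping the individual samples, rather than only a running total, makes it possible to spot a slow first iteration or a drift in later ones.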
Testing the system as a shared resource through the saturation runs is harder than evaluating the performance of individual applications. The workload devised during the HECToR procurement was based on usage of the previous service, and it thoroughly put the operating system and scheduler to the test.
Finally, working closely with bidders in order to learn some of the tips and tricks used to get the most out of their systems was useful, especially for future use of the winning machine.
Benchmarks do not, of course, provide a magic answer to the procurement selection — the choice of supplier and solution is quite properly ultimately a business decision rather than a purely technical one. However, in an area such as HPC, where leading-edge technology is the essence of the capability enabled by the procurement, the decision must include high quality technical input — and a correctly designed benchmarking regime matched to the individual needs of the procurement is almost certainly the best method of establishing a robust technical assessment.
1. The Numerical Algorithms Group (NAG) has been involved in the HECToR service since its launch in September 2007, providing computational science and engineering (CSE) support to users. This support includes training on all aspects of HPC development and usage, writing documentation, and more direct assistance with software engineering, such as optimization, debugging and advice on algorithms. However, NAG’s involvement with HECToR goes even further back: it provided technical consulting on HPC procurement to the funding Research Councils, including HPC technology and market advice and the design and oversight of the benchmarking regime. The procurement process eventually selected Cray to provide the supercomputer and, after its first technology refresh, HECToR currently has a 208 Teraflops quad-core Cray XT4 with over 22,000 AMD Opteron cores.
Table 1: Applications upon which Benchmarks were Based
• CASINO is a code for quantum Monte Carlo electronic structure calculations: www.tcm.phy.cam.ac.uk/~mdt26/casino2_introduction.html
• CASTEP uses Density Functional Theory to provide atomic level simulations of materials: www.castep.org
• NAMD is a bio-molecular dynamical simulation package: www.ks.uiuc.edu/Research/namd
• HELIUM models laser interaction with the helium atom: www.am.qub.ac.uk/ctamop/ili_1.html
• The Unified Model is a collection of codes developed by the UK Met Office for climate and weather prediction: www.metoffice.gov.uk/science/creating/daysahead/nwp/um.html
Chris Armstrong is an HPC software developer and a member of the Computational Science and Engineering team at Numerical Algorithms Group. He may be reached at editor@ScientificComputing.com