Can Cloud Computing Address Scientific Computing Requirements for DOE Research? Well, Yes, No and Maybe
After a two-year study of the feasibility of cloud computing systems for meeting the ever-increasing computational needs of scientists, Department of Energy researchers have issued a report stating that the cloud computing model is useful, but should not replace the centralized supercomputing centers operated by DOE national laboratories.
Cloud computing’s ability to provide flexible, on-demand and cost-effective resources has found acceptance for enterprise applications and, as the largest funder of basic scientific research in the U.S., DOE was interested in whether this capability could translate to the scientific side. Launched in 2009, the study was carried out by computing centers at Argonne National Laboratory in Illinois and Lawrence Berkeley National Laboratory in California. Called Magellan, the project used similar IBM computing clusters at the two labs. Scientific applications were run on the systems, as well as on commercial cloud offerings for comparison.
At the end of the two years, staff members from the two centers produced a 169-page report with a number of findings and recommendations. Overall, the project members found that, while commercial clouds are well-suited for enterprise applications, scientific applications are more computationally demanding and, therefore, the computing systems require more care and feeding. In short, the popular “plug and play” concept of cloud computing does not carry over to scientific computing.
To thoroughly evaluate the cloud computing model, the team ran scientific applications for studying particle physics, climate, quantum chemistry, plasma physics and astrophysics on the Magellan system, as well as on a Cray XT4 supercomputer and a Dell cluster. For comparison, the applications were also run on Amazon’s EC2 commercial cloud offering, including its HPC offering, Cluster Compute instances.
Many of the cost benefits of clouds result from increased consolidation and higher average utilization. Because existing DOE centers are already consolidated and typically run at high average utilization, they are usually cost-effective compared with public clouds.
“Our analysis shows that DOE centers are often three to four times less expensive than typical commercial offerings,” the authors wrote in their report. “These cost factors include only the basic, standard services provided by commercial cloud computing, and do not take into consideration the additional services, such as user support and training, that are provided at supercomputing centers today and are essential for scientific users who deal with complex software stacks and require help with optimizing their codes.”
A key characteristic of many scientific applications is that processes or phenomena are modeled in three dimensions over time, such as fuel burning in an engine, fluid flowing over different surfaces or climate changing over years or decades. In order to create realistic simulations, the applications often run on hundreds or thousands of processors in parallel and constantly communicate with one another, sharing data. Enterprise applications typically analyze data in sequence without much inter-processor communication, and many commercial cloud systems are designed to handle this type of computing.
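The communication pattern described above can be sketched in miniature. The following single-process Python example (a hypothetical illustration, not code from the Magellan study) splits a 1D heat-diffusion simulation across four imaginary "ranks"; at every timestep, each rank must first exchange its boundary ("halo") cells with its neighbors before it can update its own interior. In a real parallel code, that exchange would be an MPI message over the interconnect, which is exactly the traffic that commodity cloud networks handle poorly.

```python
# Sketch: why tightly coupled simulations communicate constantly.
# A 1D heat-diffusion stencil split across hypothetical "ranks";
# every timestep requires a halo exchange between neighbors.

def step(chunks, alpha=0.25):
    """Advance every rank's sub-domain by one timestep."""
    # 1. Halo exchange: each rank fetches its neighbors' edge cells
    #    (in a real code, an MPI send/receive over the network).
    halos = []
    for r, chunk in enumerate(chunks):
        left = chunks[r - 1][-1] if r > 0 else chunk[0]
        right = chunks[r + 1][0] if r < len(chunks) - 1 else chunk[-1]
        halos.append((left, right))
    # 2. Local update with the explicit heat-equation stencil:
    #    u_new[i] = u[i] + alpha * (u[i-1] - 2*u[i] + u[i+1])
    new_chunks = []
    for (left, right), chunk in zip(halos, chunks):
        padded = [left] + chunk + [right]
        new_chunks.append([
            padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)
        ])
    return new_chunks

# 16 cells split across 4 "ranks", with a hot spot on rank 1.
chunks = [[0.0] * 4 for _ in range(4)]
chunks[1][0] = 100.0
for _ in range(10):
    chunks = step(chunks)
# Heat diffuses across rank boundaries only because of the
# per-step halo exchange above.
```

Without the halo exchange in step 1, each rank's sub-domain would evolve in isolation and the simulation would be physically wrong, which is why latency between nodes, and not just raw compute, dominates performance for this class of application.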
The Magellan project also examined a number of questions specific to DOE and scientific computing:
• Can DOE cyber security requirements be met within a cloud?
• Can DOE HPC applications run efficiently in the cloud? What applications are suitable for clouds?
• How usable are cloud environments for scientific applications?
• When is it cost effective to run DOE HPC science in a cloud?
Over the course of the project, a number of scientific applications successfully used the Magellan system at Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC). These include the STAR (Solenoidal Tracker at RHIC) experiment at Brookhaven National Laboratory, the Materials Project led by Berkeley Lab, the National Science Foundation’s Laser Interferometer Gravitational-Wave Observatory (LIGO) and the Integrated Microbial Genomes database at the Joint Genome Institute. As an example, access to the two Magellan systems helped physicists at STAR assess whether cloud computing was a cost-effective alternative to systems with much longer lead times. In their quest to understand proton spin, the scientists were able to build a real-time data processing system in the cloud, giving their research community faster access to experimental data.
In the end, the Magellan research teams came up with a set of key findings:
• Scientific applications have special requirements; cloud solutions must be tailored to these needs.
• The scientific applications currently best suited for clouds are those with minimal communication and I/O (input/output).
• Clouds can require significant programming and system administration support.
• Significant gaps and challenges exist in current open-source virtualized cloud software stacks for production science use.
• Clouds expose a different risk model, requiring different security practices and policies.
• The MapReduce programming model shows promise in addressing scientific needs, but current implementations have gaps and challenges.
• Public clouds can be more expensive than in-house large systems. Many of the cost benefits from clouds result from the increased consolidation and higher average utilization.
• DOE supercomputing centers already achieve energy efficiency levels comparable to commercial cloud centers.
• Cloud is a business model and can be applied at DOE supercomputing centers.
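The MapReduce finding above refers to the programming model popularized by frameworks such as Apache Hadoop. As a rough illustration of the model itself, the following standard-library Python sketch runs the three canonical phases, map, shuffle and reduce, over a toy workload (counting 3-letter subsequences in DNA reads, a hypothetical stand-in for the data-parallel analysis tasks the report has in mind; none of this code comes from the Magellan study).

```python
# Minimal sketch of the MapReduce programming model, using only the
# Python standard library. Workload: counting 3-mers in DNA reads.
from collections import defaultdict
from itertools import chain

def mapper(read):
    """Map phase: emit (k-mer, 1) pairs from one input record."""
    k = 3
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce phase: aggregate all values emitted for one key."""
    return key, sum(values)

reads = ["GATTACA", "ATTAC", "TACAT"]  # toy input records
mapped = chain.from_iterable(mapper(r) for r in reads)
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
# counts["TAC"] == 3: "TAC" occurs once in each read
```

Because the map and reduce phases touch independent records and keys, a framework can spread them across many loosely coupled nodes, which is why the model suits clouds better than the tightly coupled simulations discussed earlier, even though, as the report notes, current implementations still have gaps for scientific data formats and workflows.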
The progress of the Magellan study was closely watched by the scientific computing community, and Berkeley Lab computer scientists who worked on the project presented their work at workshops throughout its duration. The team received “best paper” awards for “I/O Performance of Virtualized Cloud Environments,” presented at the Second International Workshop on Data Intensive Computing in the Clouds (DataCloud-SC11) in November 2011; “Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud,” presented at the IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2010) in November 2010; and “Seeking Supernovae in the Clouds: A Performance Study,” presented at ScienceCloud 2010, the 1st Workshop on Scientific Cloud Computing, in June 2010.
In February 2012, NERSC Division Director Kathy Yelick will give a presentation on the Magellan project findings and recommendations to the Computer Science and Telecommunications Board (CSTB) of the National Academies, which includes the National Academy of Engineering, National Academy of Sciences, and the Institute of Medicine. CSTB is composed of nationally recognized experts from across the information technology fields and complementary fields germane to the Board’s interests in IT and society.