Just prior to SC15 in Austin, founding members introduced a new community called OpenHPC, a Linux Foundation-supported project focused on reducing inefficiencies and speeding up innovation across the HPC ecosystem.
It’s about time
Time is a precious and expensive commodity for HPC engineers, system administrators, and researchers alike at any TOP500 supercomputing site. Anything that speeds the cycle of innovation could deliver large payoffs, from providing more time to more researchers to helping rein in management costs.
The HPC ecosystem is rich in opportunities for new efficiencies. For example, much of the work being done across TOP500 sites to integrate and validate system software stacks is duplicated from site to site. But a host of factors, ranging from rapid release cycles to compatibility considerations to the specific user requirements of any given workload, present obstacles. Without an effort to rein in the challenges of developing, configuring, and managing HPC stacks, this duplicated work will only continue to multiply.
The OpenHPC community was formed with the goal of streamlining some of the redundant efforts across HPC sites by fostering a community that can work together to develop flexible, open source HPC solutions.
If things go according to plan, the open source solutions will make it easier for scientists to disseminate their own tools and, through integration, to use the tools of other scientists, which ultimately translates into more time spent on groundbreaking research.
OpenHPC is a Linux Foundation Collaborative Project that will be led by a team of open source and HPC experts. Early members will help set the direction for the project. “In my previous research role at an academic supercomputing environment, we talked about how useful it would be to have a community effort to address some of the shared challenges across sites, and it’s great that we’re getting the ball rolling,” says Karl Schulz, Principal Engineer for the Intel Enterprise and High Performance Computing Group. “OpenHPC is a place where system administrators, ISVs, OEMs and end users, including national labs and academia, can come together to solve shared problems,” explains Schulz.
Founding Members of OpenHPC: Allinea Software, Altair, ANSYS, Argonne National Laboratory, Atos-Bull, Barcelona Supercomputing Center, Cray, Dell, Fujitsu, Hewlett-Packard, The Center for Research in Extreme Scale Technologies at Indiana University, Intel, Jülich Supercomputing Center, Lawrence Livermore National Laboratory, Lenovo, Leibniz Supercomputing Centre, MSC Software, NEC, National Energy Research Scientific Computing Center, Oak Ridge National Laboratory, Par-Tec, Pittsburgh Supercomputing Center, Pacific Northwest National Laboratory, SENAI CIMATEC Supercomputing Center for Industrial Innovation, SUSE, Texas Advanced Computing Center
Why focus on the software stack?
OpenHPC is focusing on the software stack because an integrated solution has the potential to ease the effort of using, developing for, and administering an HPC machine, and to speed the cycle of innovation. “Many sites are left to their own devices when assembling a software stack on a high-end HPC system. Since most of the components they use are open source-driven, it makes sense to put together an open source community around this,” says Schulz. With good participation from HPC experts, the OpenHPC community can provide a centralized resource for sites to pull from, using an approach analogous to that of Linux distributions.
“When I was at a supercomputing center leading an HPC staff, a big part of our job was evaluating and selecting packages, integrating and testing these packages and making them available for users on our supercomputers. This is, hopefully, a good first step in providing some automation and centralization for that effort and setting the stage for future modularity enhancements and research integration,” says Schulz.
Of course, the effort will involve significant challenges, given the myriad software modules and options in use across the HPC ecosystem and the fact that every site has different requirements and favorites. Since you have to start somewhere, the initial idea is to create a common, validated platform that can be used to provide the HPC community with a full-featured reference software stack. Given all of the complexities that different sites have to consider, the OpenHPC community will not be prescriptive about the choice of configuration management or architecture. Rather, the idea is to encourage community use by offering multiple example installation recipes (for large and small systems), to validate the stack on reference systems, and to expand from there over time. Whatever becomes of these early efforts, simply establishing common ground where HPC experts can share considerations could be an important leap forward.
Establishing a starting point
Intel created an initial stack, which has already been integrated and tested, to provide the seed from which the community can work. The pre-packaged binary components include practically everything that goes between the hardware and the applications that run on the system, including:
- RAS (reliability, availability, and serviceability)
- development toolchains
- scientific libraries
- performance tools
- benchmarking and diagnostics
- provisioning systems
- resource managers
- fabric management
- I/O services
- systems management
The modules come from existing open source upstream communities. To streamline the time-consuming, resource-intensive work of packaging multiple builds for different researchers and specialists, the idea is to use a hierarchical structure that recognizes interdependencies between packages on a system and automatically identifies the libraries a given package needs to run. The community will use the initial stack as a starting point, enhancing it and creating new APIs based on group decisions. The community may also add value by facilitating expert testing of different types of scientific packages; those tests can then be rolled into a test harness that runs automatically as part of the build process. That way, when the community publishes a new release, members will know that they can not only grab the binaries directly instead of building them on site, but also reduce, for example, the amount of integration testing they have to do.
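To make the idea of automatically identifying a package's required libraries more concrete, here is a minimal Python sketch, assuming a simple index in which each package lists only the packages it directly depends on. The package names, the `DEPENDENCIES` table, and the `required_packages` helper are hypothetical illustrations, not OpenHPC's actual packaging metadata or tooling. Real package managers and module hierarchies also handle versions, compiler/MPI variants, and conflicts, which this sketch ignores. It relies on `graphlib`, available in the Python standard library since 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical package index: each package maps to the packages it directly
# depends on. Names are illustrative only, not OpenHPC's real package list.
DEPENDENCIES: dict[str, set[str]] = {
    "gnu-compilers": set(),
    "openmpi": {"gnu-compilers"},
    "openblas": {"gnu-compilers"},
    "petsc": {"openmpi", "openblas"},
    "hdf5": {"gnu-compilers"},
}


def required_packages(target: str, deps: dict[str, set[str]]) -> list[str]:
    """Return target plus all of its transitive dependencies, dependencies first."""
    # Walk the dependency graph to collect everything the target needs.
    needed: set[str] = set()
    stack = [target]
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        if pkg not in deps:
            raise KeyError(f"unknown package: {pkg}")
        needed.add(pkg)
        stack.extend(deps[pkg])
    # Order the collected packages so each appears after everything it depends on.
    graph = {pkg: deps[pkg] for pkg in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(required_packages("petsc", DEPENDENCIES))
```

Calling `required_packages("petsc", DEPENDENCIES)` returns the four packages `petsc` needs, with `gnu-compilers` first and `petsc` last; the relative order of `openmpi` and `openblas` is not fixed, and `hdf5` is left out because nothing in that chain requires it.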
The team spent the last 12 months designing the build and test infrastructure, and iterated through several release cycles to arrive at the initial community stack, which is available for download from the OpenHPC project. Each component is distributed under its respective license.
Getting involved
The companies and organizations that helped found the community are excited to get the OpenHPC community rolling to improve the value and usability of HPC for business and academia alike. “The OpenHPC framework will help organizations to focus on effectively using their HPC systems to attain mission critical objectives and to drive innovation, rather than expending resources on stabilizing and continually testing their operating environment,” says Bronis R. de Supinski, CTO, Livermore Computing, Lawrence Livermore National Laboratory.
SUSE has long recognized that as demand for processing power and speed continues to grow, HPC will likely find increasingly widespread application in industry. “Our goal is to tear down barriers for adoption of HPC in commercial environments, by working together with the OpenHPC community on providing a standardized HPC stack,” says Kai Dupke, Senior Product Manager HPC at SUSE.
Atos is contributing its experience in building open source HPC software stacks to the community because the project is a natural fit. “We are confident that the OpenHPC collaborative effort will be instrumental in driving HPC technology to the next generation and accelerating its adoption, while bringing HPC users more value for a lower cost,” explains Jerome Soller, CTO Big Data at Atos.
The catch-22 with a project like this is balancing flexibility with ease of use. Schulz and the initial team know that community involvement will be critical to solving the tough challenges. “I’ve been an open source person for a long time, so I’m excited to put this out there to get feedback and know that the real work is only beginning. The initial discussion that will probably be most interesting will be: Do we think the initial conventions the community has set forth are workable on a large majority of systems?” says Schulz.
In addition to providing feedback, organizations and individuals will be able to contribute to OpenHPC in all kinds of ways, from hosting local mirrors of OpenHPC to contributing new components, donating hardware, or sharing site knowledge. Learn more about OpenHPC and all of the ways you can contribute on the OpenHPC website.
Sean Thielen is a Portland-area high-tech writer.