HPC Organizations Combine Forces at SC12 to Support Use of Slurm Workload Manager
Slurm is one of the most powerful and scalable workload managers in HPC, yet it is probably the least well-known. This low profile is about to change. The open source product has gained momentum, with many national laboratories around the world and dozens of universities relying on Slurm, but few people outside of these organizations are aware of it.
Now, a number of Slurm backers have come together to raise the profile of Slurm, beginning with organizing a Slurm booth at SC12 in Salt Lake City. These initial sponsors comprise Slurm users CEA (French Alternative Energies and Atomic Energy Commission), CSCS (Swiss National Supercomputing Centre), and Lawrence Livermore National Laboratory (LLNL). In addition, technology providers Bright Computing, Bull Information Systems, Greenplum/EMC, Intel, NVIDIA and SchedMD are participating.
Mark Seager, Intel CTO for the HPC Ecosystem, comments, “Continued innovation in the HPC ecosystem is a sign of the health, growth and importance of the HPC market segment. It is important for HPC innovation to remain vibrant, and the Slurm activity is an exemplar.”
Slurm is an open source workload manager originally developed to schedule compute jobs at LLNL. Ten years later, it is now used on about 30 percent of the TOP500 systems, possibly more than any other workload manager.
Slurm was started by just a few programmers at Livermore; more than 100 developers from dozens of organizations around the world have now contributed to the code. Together, they are adding capabilities at high speed, working to a six-month release cycle. The primary developers of Slurm, Moe Jette and Danny Auble, now run SchedMD, the company that oversees the code base, leads its further development, and offers commercial Level 3 support.
“We built Slurm to efficiently schedule resources for the biggest systems, and have proven this scalability to at least an order of magnitude higher than any currently available system,” said Moe Jette, CTO of SchedMD. “It’s now one of the most widely used workload managers in the Top500, including on the Sequoia supercomputer at LLNL. As we move to Exascale, Slurm is the workload manager best positioned to schedule jobs at that scale.”
Beyond organizing a booth at SC12, other profile-raising initiatives have emerged, including a new logo, a Slurm group on LinkedIn, Twitter accounts (@SchedMD, @SlurmWLM), a Facebook presence and a blog. In addition, Slurm is freely available for download, along with current documentation and information, from SchedMD (http://www.schedmd.com).
Matthijs van Leeuwen, CEO of Bright Computing, also sees the rising importance of Slurm Workload Manager. “We are seeing a strong increase in customer demand for Slurm. Although we have integration with all of the major workload managers as pre-configured options for Bright Cluster Manager, and are partners with most of their vendors, we are now including Slurm as our default workload manager. Further, we are about to launch commercial support for Slurm, to provide a one-stop solution for our customers. This initiative to support the growth of Slurm makes a lot of sense to us. It aligns with our belief that those who manage HPC clusters benefit from the ability to choose the workload manager that best fits their needs.”
Slurm presentations scheduled for SC12 Booth #3444 include:
• Introduction to Slurm Workload Manager and Roadmap (Moe Jette and Danny Auble, SchedMD)
• Bull’s Slurm Roadmap (Eric Monchalin, Bull)
• MapReduce support in Slurm (Ralph Castain, EMC/Greenplum)
• Using Slurm for data aware scheduling to the cloud (Martijn de Vries, Bright Computing)
• Slurm on the Sequoia supercomputer (Don Lipari, LLNL)
• Slurm at Rensselaer Polytechnic Institute (Tim Wickberg, RPI)
The Slurm Birds of a Feather (BoF) meeting is scheduled for 12:15 PM on 15 November in room 155-A at the Salt Palace Convention Center in Salt Lake City.