Adaptive Computing worked with High Performance Computing for Health Sciences (HPC4Health), which consists of The Hospital for Sick Children (SickKids), University Health Network’s (UHN) Princess Margaret Cancer Centre, Compute Canada and Compute Ontario, to create a converged HPC, cloud and big data environment capable of bringing multiple organizations together to share resources dynamically, securely and equitably. Together, we are building the engine that will help make personalized medicine and diagnostics a reality. To help bring these organizations together, Moab HPC Suite – Enterprise Edition 8.1 (Moab) was chosen for its elastic computing, advanced policies and accounting capabilities.
HPC4Health has been an amazing project. We loved working with the HPC4Health teams. They were so creative and dedicated to creating an environment that would really make a difference. Everyone knew they needed the power of HPC to analyze the massive amounts of data they were collecting and to extract the necessary data fast enough to make life-changing decisions. By coupling HPC with cloud, and its inherent sharing capabilities, you have this really cool, dynamic, scalable, powerful environment that can serve multiple organizations and deliver the necessary resources when each organization needs them. With this converged infrastructure of cloud, HPC and big data built on Moab, SickKids and UHN’s Princess Margaret Cancer Centre have the resources necessary to save lives!
Adaptive was there from the beginning to help create technology that did not exist and make HPC4Health’s vision a reality. We helped them build a converged data center that shares resources dynamically and securely and lets them account for the workloads run by each organization involved in HPC4Health. The HPC4Health IT infrastructure is configured as a single pool of resources, with each organization having dedicated resources, plus a communal pool of resources. Each organization and its Admins manage their dedicated resources just as if they were a private data center. As workloads increase, Moab automates each organization’s growth requirements: it dynamically obtains additional resources from the communal pool to handle peak loads, and then relinquishes those resources back to the communal pool for the next peak workload requirement from any organization. All workloads are tracked per user and per organization and accounted for with extensive reporting capabilities.
Here are a few more details about elastic computing, advanced policies and accounting capabilities that explain how HPC4Health orchestrates its workloads and resources.
Elastic computing
Administrators from both SickKids and UHN’s Princess Margaret Cancer Centre must ensure that regularly scheduled workloads are completed, particularly during peak times. Each organization manages many users with widely varying needs and must respond to those needs quickly; the ability to burst workloads to other resources is therefore extremely important.
Moab tackles these challenges with elastic computing, which allows Admins to efficiently manage resource expansion by bursting to private clouds or other data center resources using OpenStack. Elastic computing is triggered when a threshold set in Moab is exceeded. To evaluate this threshold, Moab surveys the system workload and calculates the combined completion time of the burstable workloads as if no other workloads were running. Elastic computing bursts workloads, on an as-needed basis, into a communal pool of data center resources and then relinquishes these resources back to the communal pool. Using OpenStack, Moab completely wipes each resource after use to help comply with Canadian privacy regulations. This added flexibility enables Admins to expand their own cluster while taking advantage of the elasticity and scalability of the cloud.
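To make the trigger concrete, here is a minimal Python sketch of a threshold check like the one described above. It is an illustration only, not Moab's implementation or configuration syntax: the `Job` type, the function names, and the idea of measuring work in node-hours are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Job:
    name: str
    node_hours: float  # estimated work: nodes multiplied by hours (assumed unit)
    burstable: bool    # may this job run on communal-pool nodes?


def burstable_backlog_hours(jobs, dedicated_nodes):
    """Hours needed to finish all burstable jobs if nothing else were running."""
    work = sum(j.node_hours for j in jobs if j.burstable)
    return work / dedicated_nodes


def nodes_to_borrow(jobs, dedicated_nodes, threshold_hours, pool_free):
    """Return how many communal nodes to request (0 if the threshold is not exceeded)."""
    if burstable_backlog_hours(jobs, dedicated_nodes) <= threshold_hours:
        return 0
    work = sum(j.node_hours for j in jobs if j.burstable)
    # Nodes needed to finish the backlog within the threshold, minus what we already own.
    wanted = ceil(work / threshold_hours) - dedicated_nodes
    # Never ask for more than the communal pool currently has free.
    return max(0, min(wanted, pool_free))


# Hypothetical queue: two burstable jobs and one that must stay on dedicated nodes.
queue = [
    Job("variant-calling", 100, True),
    Job("alignment", 60, True),
    Job("archive", 500, False),
]
print(nodes_to_borrow(queue, dedicated_nodes=10, threshold_hours=8, pool_free=6))
```

In this sketch the non-burstable job is ignored when sizing the burst, and the request is capped by what the communal pool has free, so one organization's peak cannot claim nodes that do not exist.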
Advanced policies
Some advanced policies, such as auto enforcement of Service Level Agreements (SLAs), dynamic provisioning of virtual resources, and job arrays, are key to the success of HPC4Health’s converged infrastructure.
- Auto SLA enforcement schedules and adjusts workloads to consistently meet service guarantees and business priorities so the right workloads are completed at the optimal times.
- Resource sharing and usage policies, such as usage limits, usage access controls and dynamic fairshare policies, schedule resources across users, groups and projects in line with resource sharing agreements.
- SLA and priority policies, such as quality of service and hierarchical priority weighting, ensure the highest priority workloads are processed first.
- Continuous plus future scheduling, using future reservations, priorities and pre-emption, ensures priorities and guarantees are proactively met as conditions and workload levels change.
- Dynamic Provisioning detects when the current level of resources will not meet a given SLA and reaches out to a provisioning tool that has access to the communal pool of virtual resources. The resources are allocated and then provisioned to match the needed environment. When the workload is complete, the added resources are returned to the communal pool (de-provisioned and removed from the workload manager).
- Job Arrays support the submission of many sub-jobs that perform the same work using the same script, but operate on different sets of data.
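As an illustration of the job array idea in the last bullet, the following Python sketch expands one script and a list of input files into indexed sub-jobs. The `SubJob` type, the function name, and the file names are hypothetical and chosen for this example; this is not Moab's submission syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubJob:
    array_name: str  # name shared by every sub-job in the array
    index: int       # position of this sub-job within the array
    script: str      # the same script for all sub-jobs
    input_file: str  # the data set this particular sub-job works on


def expand_job_array(array_name, script, input_files):
    """Create one sub-job per input file, all running the same script."""
    return [
        SubJob(array_name, index, script, input_file)
        for index, input_file in enumerate(input_files)
    ]


# Hypothetical example: one alignment script applied to three sample files.
for sub_job in expand_job_array("align", "align.sh", ["s1.fastq", "s2.fastq", "s3.fastq"]):
    print(sub_job)
```

Because every sub-job carries the same script and differs only in its index and input, a scheduler can treat the array as one submission while still placing each sub-job independently.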
Accounting
Usage accounting and budget enforcement enable tracking of resource usage, as well as the setting and enforcement of usage budgets by user, group, project or any custom organizational hierarchy. Resources are scheduled against that budget for a given period of time, supported by dynamic usage reports and a flexible conditional usage cost/charge structure. This allows HPC4Health to track usage for each organization, while each organization can further track internal usage by user, department or group.
To hear more about HPC4Health, join us at SC15 in Austin, TX. Jorge Gonzalez-Outeirino, Ph.D., Facility Manager at the Centre for Computational Medicine at SickKids, will be speaking in booth #833 on November 17 at 10:30 am. You can also visit the Adaptive Computing website for additional information on the HPC4Health deployment.
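The budget idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the accounting model Moab actually uses: the `Ledger` class, its method names, the node-hour unit and the single `rate` multiplier are all invented for this example.

```python
from collections import defaultdict


class Ledger:
    """Tracks usage per (organization, user) against a per-organization budget."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)     # organization -> budget for the period
        self.usage = defaultdict(float)  # (organization, user) -> amount charged

    def spent(self, org):
        """Total charged to an organization so far."""
        return sum(cost for (o, _), cost in self.usage.items() if o == org)

    def charge(self, org, user, node_hours, rate=1.0):
        """Record usage if it fits in the remaining budget; return True on success."""
        cost = node_hours * rate  # 'rate' stands in for a conditional charge structure
        if self.spent(org) + cost > self.budgets[org]:
            return False  # budget enforcement: usage is not recorded
        self.usage[(org, user)] += cost
        return True

    def report(self, org):
        """Per-user usage inside one organization."""
        return {user: cost for (o, user), cost in self.usage.items() if o == org}


# Hypothetical budgets and users.
ledger = Ledger({"SickKids": 100, "UHN": 50})
ledger.charge("SickKids", "alice", 60)
ledger.charge("SickKids", "bob", 30)
print(ledger.report("SickKids"))
```

Keying usage by organization and user mirrors the two levels described above: the consortium can total usage per organization, and each organization can break its own total down further.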
Marty Smuin is CEO at Adaptive Computing.