A Building Front for Cloud Computing
High-performance applications are increasing in strength as the storm grows
Over the past few years, cloud computing has rolled into view, but the uptake remains unclear. In particular, one wonders: Who is taking advantage of this high-powered opportunity for flexible access to computing cores, storage, software and other services? As this article shows, some of the biggest users of high-performance computing are starting to explore this technology.
In September 2010, at SPE’s Annual Technical Conference and Exhibition in Florence, Italy, Microsoft and Accenture released a survey of 172 experts in the oil and gas industry. In the results, 64 percent said that they would benefit from a simpler and more unified computing environment. Moreover, half of those polled pointed out that the explosion of data hinders their work. To solve that problem, 30 percent of those experts mentioned cloud computing as a valuable option. The poll also suggests that worker interest is running ahead of company adoption: fewer than a quarter of those polled stated that cloud computing and other advanced capabilities have been put to work fully in their companies.
Other well-known environments for high-performance computing, such as biotechnology and the pharmaceutical industry, seem farther along in adopting cloud computing. The biotechnological and pharmaceutical sciences generate and must analyze some of the largest databases. Consequently, some of these companies have started to explore the cloud.
Taking on fuel
To keep fulfilling the world’s petroleum needs, companies in the oil and gas field juggle massive amounts of data, including geological information, as well as the histories of a seemingly endless list of wells. To manage so much data and computation, these companies could turn to cloud-based tools.
That’s just what Energistics — an open-standards consortium for the oil and gas industry — has in mind. According to Tracy Terrell, managing director of technology at Energistics, “We work with data exchange for the upstream oil and gas industry, which is exploration and production.” For those users, says Terrell, Energistics is working with its members on a proof-of-concept project that uses cloud computing.
“We want to evaluate using the cloud for document sharing,” he says. These XML documents include drilling, production and reservoir information. “When companies get together to drill a well,” Terrell explains, “they share a lot of data. This includes daily drilling reports, real-time production or revisions to a well design.”
So, a cloud can provide a neutral meeting place for those oil and gas companies to exchange documents. “The companies will be able to authenticate, using their own directory instead of logging into someone else’s directory to deposit a document. It will relieve these parties from managing IDs for each other and providing a file-exchange location,” Terrell explains.
In describing the status of this project, Terrell says, “Our members from Chevron, Pioneer Natural Resources, Microsoft, Atman Consulting, Covisint and others have worked together on the proof of concept. Microsoft has advised and sponsored an Azure platform for the project, and we are now at the stage where meaningful uses can be demonstrated.”
For now, says Terrell, “It’s about convenience, but that’s just the first opportunity of what we think are many. This may become a permanent repository for asset — field or well or facility — information. It could hold all of the drilling and production information for the life of a well in the cloud.”
A cloudburst for biotech
For biotechnology companies, cloud computing can be a resource and a product. Life Technologies, for example, uses cloud computing for in-house work. “We’ve used cloud computing as a way to offload some of our computing surges without having to invest in more in-house computing resources,” says Claude Benchimol, head of biological information systems for Life Technologies.
In addition, Life Technologies provides cloud computing to its customers. “Cloud computing is part of our offerings for genetic information,” Benchimol says. For example, Life Technologies sells DNA-sequencing machines.
“These second-generation machines produce a colossal amount of data,” Benchimol says. For example, the DNA sequencing from just one human takes seven to 10 days, and that can produce as much as a terabyte of data. Moreover, the speed of such sequencing keeps getting faster. “Within one to two years,” Benchimol says, “one person’s DNA might be sequenced in just a day.”
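To put those figures in perspective, the short sketch below works out the average data rate implied by one terabyte produced over a run of a given length. It is a back-of-the-envelope illustration only: the function name is hypothetical, a terabyte is assumed to be 10^12 bytes, and the one-day case assumes the data volume would stay at roughly one terabyte, which the article does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TERABYTE = 10**12  # assumption: decimal terabyte (10^12 bytes)


def average_rate_mb_per_s(total_bytes: int, days: float) -> float:
    """Average output rate in megabytes per second over a run lasting `days`."""
    return total_bytes / (days * SECONDS_PER_DAY) / 10**6


# 10 and 7 days come from the article; 1 day is the projected run time,
# with the data volume held at one terabyte as an assumption.
for days in (10, 7, 1):
    rate = average_rate_mb_per_s(TERABYTE, days)
    print(f"{days:>2} day(s): about {rate:.2f} MB/s")
```

Under these assumptions, the sustained rate is modest today, a little over 1 MB per second, but it would rise roughly tenfold if the same volume arrived in a single day.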
This process also requires computation. To figure out the order of the nucleotides that make up a person’s genome, the DNA is first chopped into pieces, the bite sizes that a sequencer can accommodate. A single human genome consists of about three billion nucleotides, and that gets chopped into chains of just 50 to 200 nucleotides for sequencing. Once the pieces are sequenced, they must be combined. “After you decode each piece,” Benchimol explains, “the trick is putting it back together. It’s like assembling an enormous puzzle.” That processing can reduce the terabyte of data to three gigabytes or less.
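The scale of that puzzle can be estimated with simple arithmetic. The sketch below counts the fewest pieces needed to cover a three-billion-nucleotide genome once at the two chain lengths quoted above, and the reduction from one terabyte of raw data to three gigabytes. The function names are hypothetical, decimal units are assumed, and the piece counts are lower bounds: real sequencing runs read overlapping pieces many times over, which the article does not quantify.

```python
GENOME_LENGTH = 3_000_000_000  # about three billion nucleotides (from the article)
TERABYTE = 10**12              # assumption: decimal units
GIGABYTE = 10**9


def min_pieces(genome_length: int, piece_length: int) -> int:
    """Fewest non-overlapping pieces of `piece_length` that cover the genome once."""
    return -(-genome_length // piece_length)  # integer ceiling division


def reduction_factor(raw_bytes: int, processed_bytes: int) -> float:
    """How many times smaller the processed data is than the raw data."""
    return raw_bytes / processed_bytes


for piece_length in (50, 200):
    pieces = min_pieces(GENOME_LENGTH, piece_length)
    print(f"{piece_length:>3} nucleotides per piece: at least {pieces:,} pieces")

factor = reduction_factor(1 * TERABYTE, 3 * GIGABYTE)
print(f"1 TB down to 3 GB is a reduction of roughly {factor:.0f}x")
```

Even at this lower bound, the puzzle has tens of millions of pieces, which helps explain why assembly is treated as a supercomputing job.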
“The cloud offers people a supercomputer for a limited time,” Benchimol says. To provide that capability, Life Technologies teamed up with Penguin, a cloud partner. Customers get three options:
- They can run Life Technologies’ data-processing software on their own computer, if the hardware meets the requirements.
- Alternatively, a customer can purchase a Penguin cluster with preinstalled software.
- Lastly, Life Technologies also offers customers a cloud subscription through Penguin.
When putting someone’s genome on the cloud, questions of security arise. As Benchimol says: “Nothing is more proprietary than someone’s DNA code.”
While Benchimol acknowledges that security has been raised as a concern, he says, “I don’t believe this has been an issue, because bank information, clinical-trial data and all sorts of information are already on the cloud, and it’s usually well-protected and well-managed.”
Sky-high medical advances
While genomics raises the data load on biotechnology, the pharmaceutical industry turns mountains of data into medical advances. To do that, Merck takes advantage of a range of cloud-computing resources, such as infrastructure as a service. According to Martin Leach, executive director, IT discovery and preclinical science at Merck Research Laboratories, “We’ve been experimenting with infrastructure as a service to expand scientific computing for large computational problems.”
This large pharmaceutical company, which recently merged with Schering-Plough, is still consolidating its computing resources. Nonetheless, Leach estimates that he has 5,000 to 7,000 computing cores at his disposal. “For all of the biology, chemistry, statistics and so on, that still is not enough.”
So, Leach and his colleagues are looking to cloud computing for unpredictable computing needs. As Leach explains, “If someone comes to me and needs 1,000 cores for a specific problem, that takes a lot of time to build internally. If you have a validated cloud-computing option, you can have that capability in several hours.”
Beyond computation, Merck also might look to the cloud for storage. “It’s being investigated, especially where very large files are associated with research,” Leach says. This includes research that consists of images, such as high-content screening, MRIs and PETs. “That creates lots of raw-image data that you don’t necessarily need online and nearby. We’re exploring the cloud for image management and will need to test options available.”
As the clouds keep building, more companies will certainly put this technology to use, especially in situations that require high-performance computing.
Mike May is a freelance writer for science and technology based in Houston, TX. He may be reached at editor@ScientificComputing.com