In the October 2012 edition of Scientific Computing, I described the rapid increase in biopharmaceutical R&D partnering and its effect on informatics architectures. I detailed how externalization and research virtualization were leading to a “de-evolution” of traditional technology alignments as companies resorted to e-mail and paper documents to share data between partners. I also described how systems optimized for internal consumption are often inadequate to facilitate secure collaboration across an ecosystem of geographically dispersed contractors, academics and strategic partners.
Back in 2012, there were few vendor offerings specifically designed around a research virtualization paradigm. Services such as CDD Vault from Collaborative Drug Discovery (CDD) were available, but the majority of “cloud” laboratory informatics solutions were merely client/server LIMS and ELN products that vendors arranged to be hosted by a third party. However, these products generally lack the cloud capabilities of multi-tenancy, thin-client design and the foundational architecture to scale to thousands of users. In addition, integrating partners may require a multitude of capabilities, and these isolated systems may only meet a portion of the needs. This necessitates deployment of multiple vendor solutions with dissimilar user interfaces, hosting environments, development tools and security configurations.
The lack of a holistic data management environment to support virtualization has left project managers in a haze about how best to address the needs of the business. The sky is beginning to clear somewhat with recent introductions from companies such as Accelrys, Core Informatics and PerkinElmer. Those products, along with CDD, will be discussed to highlight capabilities and vendor approaches.
Collaborative Drug Discovery: CDD Vault
Collaborative Drug Discovery (www.collaborativedrug.com) is the most mature of the group, celebrating its tenth year providing hosted solutions. The company claims nearly a quarter of a million logins into their software-as-a-service (SaaS) CDD Vault solution. More than half of those logins occurred in the last year alone.
Founded before “cloud” was in anyone’s vernacular, CDD was spun out of Eli Lilly in 2004 with a focus on startups, academics and government institutions emphasizing neglected disease research. In the last few years, the company has expanded its profile into larger biopharmaceutical companies and contract research organizations (CROs).
“Around 2010, we saw a change in the market where commercial companies started to actively seek out hosted solutions,” said Barry Bunin, CDD’s CEO. “This was driven by larger companies not having the platforms to support virtualization, as well as smaller companies who did not have the means to deploy software of this scale. Smaller biotechs that are involved in a lot of innovative research need to be nimble, yet have sophisticated tools at their disposal.”
CDD Vault gives particular attention to the support of distributed discovery chemistry, including modules for chemical registration, in vitro biology, and Structure-Activity-Relationship (SAR) analysis. The focus is on a deep set of functionality in a limited set of domains, rather than a broad platform spanning all of R&D. To accomplish this depth of capabilities, partnerships were formed with companies like ChemAxon.
Developed using the open source framework Ruby on Rails, the thin client Vault represents data using the JSON (JavaScript Object Notation) format. Published RESTful APIs enable CDD or user-developed applications to post and get data from the system. Connectors are also available for extracting, transforming and loading (ETL) data from existing repositories. A key strategy of the company is to integrate private and public data sources. Queries can extend beyond private data collections into several public screening and compound databases.
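As a rough illustration of what posting and getting JSON over a RESTful interface looks like from a user-developed application, here is a minimal Python sketch. It is not taken from CDD's documentation: the base URL, the "molecules" resource name, the record fields and the bearer-token header are all hypothetical placeholders. The requests are only built, not sent, so the sketch runs without network access.

```python
import json
from urllib.request import Request

# Hypothetical base URL (assumption); the ".invalid" domain never resolves.
BASE_URL = "https://vault.example.invalid/api"


def build_get_request(resource: str, token: str) -> Request:
    """Build (but do not send) a GET request for a JSON resource."""
    return Request(
        f"{BASE_URL}/{resource}",
        headers={
            "Accept": "application/json",
            # Bearer-token authentication is an assumption for this sketch.
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def build_post_request(resource: str, record: dict, token: str) -> Request:
    """Build (but do not send) a POST request carrying a JSON record."""
    body = json.dumps(record).encode("utf-8")
    return Request(
        f"{BASE_URL}/{resource}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical record; field names are illustrative only.
    req = build_post_request(
        "molecules", {"name": "example-001", "smiles": "CCO"}, "demo-token"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Against a real service, each request would be sent with `urllib.request.urlopen(req)` and the response body parsed with `json.loads`; the actual endpoints and authentication scheme would come from the vendor's published API reference.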
Security is always top of mind for any company looking to host intellectual property in the cloud. CDD takes a unique position in that it views hosting the system at its own location as more secure than using a third party. “One of our lessons learned was that you have to expend a lot of energy on security,” said Sylvia Ernst, head of sales for CDD. “We feel it is important that we control the server and have the staff to monitor all aspects of security.” The company says it contracts with third-party ethical hackers for routine penetration testing.
Accelrys: ScienceCloud
In March of this year, Accelrys introduced “ScienceCloud” (www.sciencecloud.com), a service a generation beyond two previously introduced offerings. One was HEOS, a discovery data management system acquired from SCYNEXIS in 2012, and the other was iLabber ELN from the acquisition of Contur in 2011. Those capabilities and additional functionality were transformed into ScienceCloud using the Accelrys Enterprise Platform (AEP) framework. Early customers have been biopharmaceutical companies, their academic partners and CROs. The system is hosted through an alliance with BT Plc.
ScienceCloud also has a heavy focus on collaborative discovery chemistry, providing a set of applications for chemical registration, structured data management, property prediction, searching and data analysis. The inclusion of the ELN adds capabilities for reaction planning and experiment documentation. The company indicates they will expand to support biologics workflows in the near future, with long-range plans to move downstream into development and translational research.
The system manages structured data in Oracle tables and unstructured data maintained mostly as documents. An interesting feature is the ability to search via chemical structure across both data types. When a user searches by drawing a chemical structure, it is also converted to other formats, such as a SMILES string, to search across indexed documents. This can be expanded to search external data sources through Pipeline Pilot, the company’s workflow, integration and analysis tool.
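The idea of using a SMILES string as a text key into indexed documents can be sketched in a few lines of Python. This is a toy illustration, not Accelrys' implementation: the document IDs and texts are invented, the index is a plain in-memory dictionary, and the query SMILES is matched as a literal token. A real system would first convert the drawn structure to SMILES and canonicalize it with a cheminformatics toolkit so that equivalent notations of the same molecule match; that step is skipped here.

```python
from collections import defaultdict


def tokenize(text):
    """Split on whitespace and drop trailing sentence punctuation."""
    return [tok.rstrip(".,;") for tok in text.split()]


def build_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_by_smiles(index, smiles):
    """Return IDs of documents containing the SMILES string as a token."""
    return set(index.get(smiles, set()))


if __name__ == "__main__":
    # Invented notebook snippets, for illustration only.
    documents = {
        "doc-1": "Prepared CC(=O)Oc1ccccc1C(=O)O for assay.",
        "doc-2": "Used CCO as solvent; compared against CC(=O)Oc1ccccc1C(=O)O.",
    }
    index = build_index(documents)
    print(search_by_smiles(index, "CCO"))
```

Because the match is on whole tokens, a short query such as `CCO` does not accidentally hit longer SMILES strings that merely contain those characters.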
Extensions come primarily from a cloud version of Pipeline Pilot, open to both users and third-party developers. Designers can create new workflow protocols, integrations, dashboards or reports. In addition, programs can be written using the underlying RESTful Web services and can be wrapped by Pipeline Pilot for automation. Applications or protocols can be published as open source to the “Exchange” area, essentially an online marketplace of apps. ScienceCloud iOS mobile apps are also available on iTunes.
This major push to the cloud reflects Accelrys’ view of the future of research. “In my mind, I think it is inevitable that drug discovery research will largely move to a virtualization model in the coming years,” said the company’s CTO Matt Hahn. “The trends are clear. Industry will move science to hosted platforms.” Recognizing that the transition will not be overnight, integration to existing infrastructure is important. “No one is 100 percent in the cloud. There is a spectrum of adoption that must be supported, and our goal is to connect to other vendor products. Apps will evolve, and it will take time to migrate away from existing systems. We must provide as simple of a transition as possible.”
Core Informatics: Platform for Science
Founded in 2006 as a LIMS supplier, Core Informatics introduced their Platform for Science (PFS) (www.platformforscience.com) in November 2013. Billed as a scientific platform-as-a-service (sPaaS), PFS is a composite of lab informatics capabilities leveraging Amazon Web Services (AWS). Customers are predominantly commercial biopharma companies, CROs and molecular diagnostics firms supporting genomics, biologics and small molecule processes.
Unlike the others, PFS includes LIMS and scientific data management system (SDMS) functionality in addition to ELN, chemical and biologics registration, inventory and workflow. Core Informatics is providing foundational components rather than concentrating on a specific domain like discovery chemistry. “Our goal is to build an open community, where Core, partners and users can build and share a marketplace of applications built on a common framework,” says Anthony Uzzo, the company’s co-founder and president.
Like ScienceCloud, PFS includes a storefront of apps, all of which are currently downloadable at no charge. “Some of the most popular apps recently have been for next-generation sequencing (NGS) supporting Illumina and Ion Torrent protocols,” said Uzzo. “A big advantage of the cloud is that these protocols are very dynamic. As soon as the instrument company creates a new one, a workflow app can be posted to be available to all our NGS clients.” New apps can be developed using the Core Informatics toolset or through programs using the available RESTful API.
The system is intimately tied to AWS, leveraging Amazon Simple Storage Service (S3), Elastic Compute Cloud (EC2), and Beanstalk; Oracle-as-a-service is the backend database. “Leveraging AWS is strategic for us,” said Uzzo. “We plan to follow the AWS computing trends to add capabilities beyond what we — or our users — could do on our own. An example of this is Redshift, their recently introduced petabyte-scale data warehousing technology that we might take advantage of in the future.” Using HTML5 for the development of the interface provides portability across browsers and mobile devices.
As with the other companies, client data security is a major focus, and Uzzo is confident in Amazon’s ability to protect data. “They have many security features, including the ability to lock down access to specific IP addresses. We also routinely contract with third parties to perform penetration testing. Before we conduct the tests, we must inform AWS that we are going to perform these tests or they detect, flag and block the intrusion.” “Security,” he says, “was certainly a major focus of the Central Intelligence Agency which recently signed a $600 million contract with AWS.”
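Locking down access to specific IP addresses amounts to checking each incoming client address against an allowlist of permitted addresses or network ranges. The Python sketch below shows that check in generic form using the standard-library `ipaddress` module. It is not AWS's mechanism or configuration syntax, and the networks listed are reserved documentation ranges chosen purely as placeholders.

```python
import ipaddress

# Placeholder allowlist (assumption): one /24 range and one single host,
# both drawn from address blocks reserved for documentation.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    for ip in ("203.0.113.45", "198.51.100.17", "192.0.2.1"):
        print(ip, "allowed" if is_allowed(ip) else "blocked")
```

In a hosted deployment this rule would normally be enforced by the provider's network controls rather than in application code; the sketch only shows the logic of the test.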
PerkinElmer: Elements
Another product taking advantage of AWS is PerkinElmer’s Elements (elements.perkinelmer.com), launched in March of this year. Elements came to PE through an acquisition of Wingu in the fall of 2013. With Elements, PE has purposely taken a different approach than the others, focusing on academic collaboration, a group that has been historically much less resistant to the cloud than large pharma.
Elements is fundamentally a thin client ELN and collaboration software-as-a-service solution built using Java and JavaScript, with PostgreSQL as the data repository. The basic ELN functionality is extended through data containers and workflow apps known as Elements. Once an Element has been created, the user drags it onto an experiment page from a storefront. Currently, PE must develop these apps, but the vision is to open the development environment to third parties and users. An application program interface is said to be in the works.
Asked how the system differs from PE’s flagship thick client E-Notebook ELN, Brian Gilman, Elements product manager, says, “E-Notebook is for those who need a richer set of capabilities. The focus of Elements is on simplicity and those users who require a basic ELN and collaboration environment.” He goes on to say that one should “never underestimate the complexity of scientific workflows. By keeping it simple, we can focus on the core features that are needed across many different types of users and grab the ‘long tail.’”
The academic market has been historically slow to adopt ELN. By focusing on simplicity and a flexible pricing model, PE hopes to change this. “A user can start with Elements for free,” says Gilman. “They can have up to five experiments at no charge. If their needs expand beyond that, there are incremental monthly plans based on storage and features.” With this “try and buy” approach, PE is encouraging students and their teachers to use ELN for their class assignments without cost. In the future, PE hopes to further the use of ELN in education through additional features, such as interactive teaching and grading.
The majority of newly announced laboratory informatics solutions are cloud services, either as software or platform. These introductions are helping to raise awareness of the benefits of the cloud and are lowering resistance to adoption. How quickly the majority of labs will transition to the cloud is anyone’s guess, but research virtualization is a primary driver accelerating the market evolution.
Michael Elliott is CEO of Atrium Research & Consulting. He may be reached at editor@ScientificComputing.com.