Technologies are evolving to simplify integration and reduce long-term risk
Over the years, the desire to solve the myriad data management problems in the laboratory has led to a plethora of point solutions. Not counting the vast number of instrument data management systems, it is not uncommon for a typical scientist to have to navigate more than 10 different systems. Despite the sophisticated capabilities these tools provide, over 20 percent of the average scientist’s time is spent on non-value-added data aggregation, transcription, formatting and manual documentation.1 That figure does not even count the effort required to learn different security schemes and user interfaces. The more data generated, the greater the need to merge data from these software solutions, which comes at the expense of time for experimentation. Despite the millions of dollars invested, the scientist remains the data integrator. In other words, the most expensive resources spend valuable hours on the most mundane tasks. It is no wonder that, in a recent survey of over 400 scientists,2 “integrating data from multiple systems” was cited as the number one laboratory data management challenge.
The most common tool scientists use for data aggregation is Microsoft Office: copy and paste data into Excel, perform calculations and charting, format, and paste into a PowerPoint presentation for a project team meeting. Along the way, printouts are made and pasted into a paper laboratory notebook. The bound notebook has been the ultimate storage location for raw data and for the processed information consolidated and annotated through these manual integration efforts.
Enter the age of the electronic laboratory notebook (ELN). Is it just another point solution, or something more? Many scientists view it as something more: a piece of a larger puzzle. When survey participants were asked what they perceive an ELN to be, the top response (or one statistically tied for first) in 2005, 2006, 2008 and 2010 has been “An ELN is a portal or entry point into all the laboratory’s systems and databases.” Scientists want the tool to help them solve their data management challenges and free more of their time for science. Companies want this as well: “improve laboratory efficiency and productivity” has been the number one ELN project goal since 2005 (Figure 1), while “replacing paper” has always been far down the list. It is not the paper scientists want to go away; it is their data problems.
Given this desire, why do many ELN suppliers position “paperless” as the key to lab productivity, rather than taking a comprehensive focus on data flow and integration? Because integration is hard. Not hard from a technical perspective: with enough time, energy, money and a good programmer, you can do almost anything, even bolt together disparate legacy systems. The organizational and operational issues are more difficult to address than the code.
|Figure 1: 2010 ELN project goals|
An ELN project can span many different domains, and the subgroups within them can have different needs and terminologies, both of which impact integration. What integration means to one group can be vastly different from what it means to another. In medicinal chemistry, a smooth data flow among the ELN, chemical registration, drawing, reagent inventory and spectroscopy instruments is needed; in analytical chemistry, by contrast, ties to LIMS, CDS, metrology, document management and solution inventory matter more. One group may define integration as instrument-specific; for another, it is stitching databases together behind a portal; for a third, it is analysis automation.
Terminology itself can also impact integration. What a “lot” or “batch” is, for example, can vary by who you ask: the medicinal chemist, the formulator or the biologics process development scientist. Arriving at a common vocabulary can be one of the biggest stumbling blocks, as it involves some combination of gaining consensus, defining semantic relationships and performing data transformations.
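A common vocabulary often ends up encoded as a translation layer between each group's local terms and a shared canonical one. Below is a minimal sketch of that idea in Python; the group names and canonical terms are illustrative assumptions, not taken from any real system.

```python
# Hypothetical mapping of (group, local term) pairs onto a shared vocabulary.
# All names here are invented for illustration.
CANONICAL_TERMS = {
    ("medicinal_chemistry", "lot"): "synthesis_batch",
    ("formulation", "lot"): "formulation_lot",
    ("process_development", "batch"): "bioprocess_run",
}

def to_canonical(group: str, term: str) -> str:
    """Translate a group's local term to the shared vocabulary.

    Terms with no agreed mapping pass through unchanged, which is exactly
    where the consensus-building work described above still remains.
    """
    return CANONICAL_TERMS.get((group, term), term)

print(to_canonical("medicinal_chemistry", "lot"))  # the chemist's "lot"
print(to_canonical("formulation", "lot"))          # the formulator's "lot"
```

The hard part, of course, is not the lookup table but getting the groups to agree on what belongs in it.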
A lack of leadership, communication and shared vision across organizational boundaries can cause approaches to diverge, leaving the ELN project team a significant challenge to sort out.
Following the path of least resistance, what usually happens is a tactical project: interfacing the ELN to System A via the system’s application programming interface (API). This project then spawns other interfaces, ELN to System B, System A to System B, and so forth. Commonly referred to as “star integration” because of its radiating connections, this tight coupling of autonomous systems soon becomes unwieldy. That is why the approach is also known as “spaghetti integration”: step back and look at the overall topology, and it starts to resemble a bowl of spaghetti (Figure 2).
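The arithmetic behind "unwieldy" is simple: point-to-point interfaces grow roughly with the square of the number of systems, while a hub-based design grows linearly. A small sketch makes the difference concrete:

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Worst case for spaghetti integration: every pair of systems
    may need its own interface, n * (n - 1) / 2 in total."""
    return n_systems * (n_systems - 1) // 2

def hub_interfaces(n_systems: int) -> int:
    """With a central hub or bus, each system needs only one connection."""
    return n_systems

for n in (4, 8, 12):
    print(f"{n} systems: up to {point_to_point_interfaces(n)} "
          f"point-to-point interfaces vs. {hub_interfaces(n)} hub connections")
```

Ten laboratory systems can mean up to 45 separate interfaces to build and, worse, to maintain through every vendor upgrade.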
After a few years and a few product revisions, these tight couplings tend to break. Vendors revise their data models, program interfaces and architectures, and even though they try (for the most part) to maintain backward compatibility, there is disruption for some client somewhere. With an unmanaged spaghetti topology, that disruption can ripple into the operation of other systems.
|Figure 2: Example star or “spaghetti” integration|
This is one of the reasons, besides the additional revenue, that LIMS suppliers have over the years created supplemental modules such as inventory, metrology and stability: functions that were once standalone products from other vendors and had to be integrated. Bringing them in-house lets the supplier control the environment and maintain forward compatibility. Consequently, most of the major LIMS suppliers are now adding ELN modules for many of the same reasons. Conversely, ELN companies are doing the same by adding LIMS-like task workflow, instrument calibration, reporting and inventory. “Owning the customer experience” is not only beneficial from a revenue perspective; it makes many headaches (for both the client and the supplier) go away. The challenge is maintaining best-in-breed functionality in each module, something no single company has yet proven able to do.
Regardless of what a supplier may say, a “one stop shop” is a theory, not a reality. There will always be a need to bridge technologies and integrate data flows, and you may not want a single vendor controlling your data management architecture in any case. Thankfully, several ELN suppliers are expanding their lists of Web services to simplify access to system functionality and enable loose coupling with other systems. This minimizes the risk of breaking a tightly coupled connection between the ELN and another system: the underlying architecture can change, since it is abstracted behind the Web services interface. But the nuances of design and capability will still vary, and there are many legacy data sources to deal with.
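The essence of loose coupling is that the ELN codes against a stable interface rather than a particular backend. Here is a minimal sketch of the pattern, assuming a hypothetical ELN that fetches sample records; the class and method names are invented for illustration, not any vendor's actual API.

```python
from abc import ABC, abstractmethod

class SampleService(ABC):
    """The stable contract the ELN depends on; backends may change freely."""
    @abstractmethod
    def get_sample(self, sample_id: str) -> dict: ...

class VendorALims(SampleService):
    """One concrete backend. Its data model and internals can be revised
    without touching any ELN code, as long as this contract is honored."""
    def get_sample(self, sample_id: str) -> dict:
        return {"id": sample_id, "source": "vendor_a_lims"}

def eln_fetch(service: SampleService, sample_id: str) -> dict:
    # The ELN sees only the interface, never the backend behind it.
    return service.get_sample(sample_id)

print(eln_fetch(VendorALims(), "S-001"))
```

Swapping in a different LIMS, or a legacy database wrapped in the same interface, requires no change to the calling code, which is precisely the risk reduction the Web services approach is after.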
Increasingly popular in larger corporations is the enterprise service bus (ESB), a service-oriented middleware application that provides a common framework for messaging between systems. A Web service interface is built between each system and the bus, abstracting it from the rest of the architecture, so processes and data flows can be adapted without recoding point-to-point integrations. But an ESB is quite expensive and requires an organizational commitment and IT infrastructure beyond the reach of many. Commercial systems are also generic and lack the scientific data awareness (such as chemical representation) needed in most laboratory processes.
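The messaging pattern an ESB provides can be boiled down to publish/subscribe: systems post messages to named topics on the bus, and any interested system listens, so no sender needs to know who its receivers are. A toy sketch, with all topic and system names invented for illustration:

```python
from collections import defaultdict
from typing import Callable

class ServiceBus:
    """Minimal publish/subscribe hub standing in for ESB middleware."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)

bus = ServiceBus()
received = []
# A LIMS and an inventory system both react to new samples from the ELN,
# without the ELN knowing either one exists.
bus.subscribe("sample.registered", lambda m: received.append(("lims", m["id"])))
bus.subscribe("sample.registered", lambda m: received.append(("inventory", m["id"])))
bus.publish("sample.registered", {"id": "S-001"})
print(received)
```

Adding or retiring a downstream system is then a matter of subscribing or unsubscribing, not of rewiring a point-to-point interface.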
There are scientifically aware tools for data pipelining and workflow that offer some ESB-like capabilities. Three major scientific software providers are investing in bridging these technologies with ELN:
- Accelrys has connected its Pipeline Pilot scientific informatics platform with the Symyx Notebook ELN to serve as a general-purpose integration engine.
- CambridgeSoft signed an agreement with the open-source KNIME project to build capabilities from its integration platform into CambridgeSoft’s suite of products, including the E-Notebook ELN.
- With a focus on translational medicine, IDBS has coupled its InforSense business intelligence platform with its E-Workbook ELN to aggregate data from clinical and genomic data sources.
|Figure 3: Service bus architecture|
Though they are taking different approaches, these vendors appear to share the same goal: simplify data flow between the ELN and other data sources, both internal and external, while providing domain-specific data analytics, mining, reporting and process automation. All three products let users design and customize workflows through a graphical user interface, so the needs of specific departments and users can be addressed.
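Underneath the graphical canvas, these tools share a common idea: a workflow is a chain of small processing steps, each consuming the previous step's output. A minimal sketch of that idea follows; the steps shown are illustrative assumptions, not any vendor's actual pipeline nodes.

```python
from functools import reduce
from typing import Callable, Iterable

# A step takes a list of values and returns a transformed list.
Step = Callable[[list], list]

def run_pipeline(data: list, steps: Iterable[Step]) -> list:
    """Feed the data through each step in order, like nodes on a canvas."""
    return reduce(lambda d, step: step(d), steps, data)

# Hypothetical example: clean raw readings, normalize, round for a report.
readings = [0.0, 4.2, -1.0, 9.6]
pipeline = [
    lambda d: [x for x in d if x >= 0],   # drop invalid (negative) readings
    lambda d: [x / max(d) for x in d],    # normalize to the maximum value
    lambda d: [round(x, 2) for x in d],   # format for reporting
]
print(run_pipeline(readings, pipeline))
```

In the commercial tools, each lambda would be a configurable node dragged onto a canvas, which is what makes the approach accessible to scientists rather than just programmers.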
Technologies are evolving to simplify the effort involved in integration and to reduce long-term risk. Nevertheless, technology is only part of the solution. Organizational and operational challenges increase the risk of project failure if left unaddressed. Over the years, best-practice organizations have learned:
- to set a prioritized, holistic integration vision and strategy
- to secure management support
- to provide strong project leadership
- to communicate across organizational divides
- to build a deep understanding of current data flows and of system capabilities and limitations
- to design a future-state data flow, understanding the gaps from the current state
- to remediate those gaps, breaking them into components where possible
- to take an agile, time-boxed approach using prototypes and strong user interaction
- to establish a strategy for master data management and ongoing governance
Seamless integration of the ELN with other systems in the enterprise can eliminate much of the tedious effort of manual data aggregation and manipulation, and access to data across sources allows new relationships to be uncovered. With the notebook so often at the center of a scientist’s workflow, the desire to integrate it with every system in the laboratory is high. Careful thought and design, however, are needed to avoid a spaghetti bowl of unmanageable integrations. A little time upfront will save substantial time in the future.
Michael Elliott is CEO, Atrium Research & Consulting. He may be reached at editor@ScientificComputing.com.
1. Elliott, Michael H, Electronic Laboratory Notebooks: A Foundation for Scientific Knowledge Management, Atrium Research & Consulting LLC, Sep 2009.
2. Elliott, Michael H, 4th Electronic Laboratory Notebook Survey, Atrium Research & Consulting LLC, Dec 2010.