Open Innovation and IT Infrastructure Considerations for Information Assembly in Analytical Sciences

Figure 1: Layout of an example assembled analysis result

Everyone wants to foresee the future, prompting the Industrial Research Institute to present their summary findings looking ahead in “IRI 2038: Envisioning the Future of R&D.” Among their findings, some of the key future trends are already being considered by thought leaders across a variety of industries; moreover, several related topics were specifically highlighted in a series of recent articles.

In “The Future-as-a-Service,” Michael Elliott describes how collaborative R&D partnerships present a significant challenge to current IT infrastructures. Elliott goes on to outline some of the efforts to support these new R&D operating models with services-oriented IT architectures.
In “Analytical Knowledge Transfer presents a Challenging Landscape in an Externalized World,” Sanji Bhal describes challenges associated with information transfer in this new model and examines how the reduction or abstraction of certain information can limit the effectiveness of collaboration.
In “The Business Challenges of Externalizing R&D,” Brian Fahie and Evan Guggenheim propose some of the effective data flows between collaborators and decision-making mechanisms.

Of particular relevance to many stakeholders is the increasing trend of “open” innovation models, where a variety of internal and external resources are leveraged to discover, develop and, ultimately, commercialize new products across industries.

Taking into consideration the points and observations raised in the preceding list of articles, this article will provide some additional recommendations for industrial R&D stakeholders to consider, with particular emphasis on measurement or analytical sciences.

Data abstraction

Consider the various quality assurance requirements — across industries — where comprehensive molecular characterization of materials is required during the product life cycle. Typically, specifications used are abstracted from analytical test methods — whether to validate the fidelity of composition (in material discovery), to validate control during process development, or to support the release of materials in commercial markets. This process of abstraction culminates in the ability for quality assurance stakeholders to support data review and material release. Correspondingly, informatics systems are built to support these workflows with proven efficiency. However, for certain test methods, the reduction of data to numerical and textual descriptors limits additional use of the data acquired.

As an example, consider impurity profiling in drug substances and drug products. When the results of validated HPLC-UV-MS methods are reduced to a collection of tracked components (by retention time and area percent), it becomes difficult to leverage the rich information contained in chromatograms and corresponding spectra. Luckily, modern chromatography data systems (CDS), scientific data management systems (SDMS) and document management installations afford the retrieval of all associated analytical data, with varying levels of effort, provided one knows in advance what to look for.

We have observed that, at times, the solution to a manufacturing challenge can be found in well-indexed impurity profiles only by investigating profiles in their full form. Extending this example: consider the scenario where a material/substance lot fails due to the presence of a new impurity. Determining the source of this new impurity is the first step in containment and abatement. Imagine being able to perform a “spectral search” across all native analytical data for a substance — from its initial raw materials, through process intermediates, to final composition and formulation. Moreover, upon discovery of a new, raw material-related impurity, there is value in assessing whether other substances in a firm’s commercial inventory are potentially at risk of future quality failures.
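The “spectral search” idea can be illustrated with a minimal Python sketch. It assumes spectra have already been exported from their native files as simple peak lists, and it scores matches with cosine similarity over binned m/z values, a common generic choice rather than the method of any particular CDS or SDMS. The names `StoredSpectrum`, `cosine_similarity` and `spectral_search`, the field names and the sample values are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StoredSpectrum:
    """One indexed spectrum; all field names are illustrative assumptions."""
    sample_id: str
    stage: str                 # e.g. "raw material", "intermediate", "final"
    peaks: Dict[float, float]  # m/z -> relative intensity


def _binned(peaks: Dict[float, float], bin_width: float) -> Dict[int, float]:
    """Sum intensities into integer m/z bins so nearby peaks can be compared."""
    out: Dict[int, float] = {}
    for mz, intensity in peaks.items():
        key = round(mz / bin_width)
        out[key] = out.get(key, 0.0) + intensity
    return out


def cosine_similarity(a: Dict[float, float], b: Dict[float, float],
                      bin_width: float = 1.0) -> float:
    """Cosine similarity of two peak lists after binning; 0.0 if either is empty."""
    va, vb = _binned(a, bin_width), _binned(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spectral_search(query: Dict[float, float], library: List[StoredSpectrum],
                    threshold: float = 0.9) -> List[Tuple[float, StoredSpectrum]]:
    """Return (score, spectrum) pairs at or above the threshold, best first."""
    scored = [(cosine_similarity(query, s.peaks), s) for s in library]
    hits = [(score, s) for score, s in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Made-up example data: spectrum of a new impurity versus indexed lots.
library = [
    StoredSpectrum("RM-001", "raw material", {91.0: 100.0, 119.0: 45.0, 165.0: 20.0}),
    StoredSpectrum("INT-007", "intermediate", {77.0: 100.0, 105.0: 60.0}),
]
unknown_impurity = {91.0: 95.0, 119.0: 50.0, 165.0: 18.0}

for score, hit in spectral_search(unknown_impurity, library):
    print(hit.sample_id, hit.stage, round(score, 3))
```

Because each stored spectrum carries its lifecycle stage, a hit points directly at where in the supply chain the impurity first appears. The same search, run against the spectra of other substances, would address the inventory-wide risk question.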

Analysis and Data: Impact on architecture and system functionality

The interrelatedness of certain analytical information is an additional consideration. In the example above, compositional profiling by HPLC-UV-MS may be captured by one analytical data file. However, many substance or product specifications require multiple techniques from different instruments, with data collected at different times, in different places and perhaps outside the company. (This is especially relevant when leveraging material generated and tested by external partners, as described by Fahie and Guggenheim.)

To support such interrelatedness, firms must be able to not only sufficiently index individual analytical data files, but must also afford “analysis assembly” capabilities to provide users with a comprehensive “story” for relevant analyses.

Consider formulation profiling as an example. The following list of “related data” must be “assembled” to present a comprehensive assessment of a product formulation:

LC-UV/MS (and other detector types)
GC-FID/MS (and other detector types)
chemical, biological, formulation schematics
chemical structures for process-related impurities/degradants
1D and 2D NMR data for isolated formulation components, with references to separations information (e.g., retention time)
XRPD, DSC, TGA, particle size distribution, and a variety of other material characterization datasets

From an architecture perspective, firms can decide to put the analysis assembly capability at the data mart tier, or store so-called Assembled Analysis Results if their model leverages relational data models at the information tier.

Figure 2: Data flow diagram, leveraging analysis assembly at the data warehouse tier

Moreover, firms should recognize the need to implement comprehensive analysis ontologies or taxonomies, beyond analytical technique-based tagging. We propose that firms build data models that support users when they need to query and retrieve these complex, interrelated data. Finally, the ability to visualize these interrelated analysis project data in a simple, facile interface is critical to a variety of workflows, similar to Figure 1.
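One way to picture such a data model is the short Python sketch below. It is an illustration under stated assumptions, not a description of any existing product: the classes `AnalyticalRecord` and `AssembledAnalysisResult`, the tag keys and the example values are all hypothetical. The point is that each data file carries ontology-style attributes in addition to its technique, so related records can be retrieved together.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalyticalRecord:
    """A pointer to one native data file plus its descriptive attributes."""
    record_id: str
    technique: str   # e.g. "LC-UV/MS", "2D NMR", "XRPD"
    source: str      # internal lab or external partner (assumed field)
    tags: Dict[str, str] = field(default_factory=dict)  # ontology-style attributes


@dataclass
class AssembledAnalysisResult:
    """A named collection of related records that together tell one 'story'."""
    title: str
    records: List[AnalyticalRecord] = field(default_factory=list)

    def add(self, record: AnalyticalRecord) -> None:
        self.records.append(record)

    def by_technique(self, technique: str) -> List[AnalyticalRecord]:
        """Technique-based retrieval, the baseline most systems already offer."""
        return [r for r in self.records if r.technique == technique]

    def find(self, **criteria: str) -> List[AnalyticalRecord]:
        """Return records whose tags match every given key/value pair."""
        return [
            r for r in self.records
            if all(r.tags.get(key) == value for key, value in criteria.items())
        ]


# Hypothetical formulation profile assembled from three data files.
profile = AssembledAnalysisResult("Formulation profile, lot A")
profile.add(AnalyticalRecord("rec-1", "LC-UV/MS", "internal",
                             {"component": "impurity-3", "retention_time_min": "4.2"}))
profile.add(AnalyticalRecord("rec-2", "2D NMR", "partner lab",
                             {"component": "impurity-3", "retention_time_min": "4.2"}))
profile.add(AnalyticalRecord("rec-3", "XRPD", "internal",
                             {"component": "bulk"}))

# Everything known about one isolated component, regardless of technique.
for record in profile.find(component="impurity-3"):
    print(record.record_id, record.technique, record.source)
```

In this sketch the shared `retention_time_min` tag is what links the NMR data for an isolated component back to its separations information, which technique-only tagging could not express.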

Data models and open innovation considerations

We have observed that many innovation management systems are primarily concerned with project timeline and deliverable management, often neglecting how information between collaborators is governed. The articles referenced above describe some of the emerging data frameworks that will support firms’ operating models now and in the future, along with some specific workflows within a partner ecosystem.

In addition to the ever-present concern for secure information exchange and sequestration, some additional considerations may be relevant for R&D stakeholders. As with the Analysis Assembly capability described above, complex, assembled data files may come from a variety of partners across the product lifecycle. Furthermore, systems must support a complex permissions hierarchy, which allows for a variety of partner relationship modalities. These can range from submission-only to open data access. Finally, certain pertinent partner relationship management attributes must also be associated with analytical data files. If these attributes are associated appropriately, specific key performance indicators for individual partners can be obtained, in addition to the contractual obligations already assessed using existing systems. Some examples include response/fulfillment timing, activity tracking and subjective interaction assessment.
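A minimal Python sketch of these two ideas follows, under several assumptions. The text names only the two ends of the permissions range, submission-only and open data access; the intermediate `READ_OWN` level, the `Submission` fields and the function names are hypothetical, added purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum
from statistics import mean
from typing import List, Optional


class AccessLevel(IntEnum):
    SUBMIT_ONLY = 1  # partner may upload data but not read any
    READ_OWN = 2     # assumed intermediate level: read own submissions only
    OPEN = 3         # partner may read all project data


@dataclass
class Submission:
    """Partner attributes attached to one delivered analytical data file."""
    partner: str
    requested: datetime
    delivered: datetime


def can_read(level: AccessLevel, requester: str, owner: str) -> bool:
    """Decide whether a requester at the given level may read an owner's data."""
    if level >= AccessLevel.OPEN:
        return True
    if level == AccessLevel.READ_OWN:
        return requester == owner
    return False


def mean_turnaround_days(submissions: List[Submission], partner: str) -> Optional[float]:
    """Average request-to-delivery time in days, or None if the partner has no data."""
    days = [
        (s.delivered - s.requested).total_seconds() / 86400
        for s in submissions
        if s.partner == partner
    ]
    return mean(days) if days else None


# Made-up submissions from a hypothetical partner.
history = [
    Submission("Partner A", datetime(2024, 1, 1), datetime(2024, 1, 3)),
    Submission("Partner A", datetime(2024, 2, 1), datetime(2024, 2, 5)),
]
print(can_read(AccessLevel.SUBMIT_ONLY, "Partner A", "Partner A"))
print(mean_turnaround_days(history, "Partner A"))
```

Because the request and delivery timestamps travel with each data file, a response/fulfillment timing indicator falls out of the same records that hold the science, with no separate tracking system required.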

Summary

The articles referenced above indicate that the future trends predicted in the IRI 2038 findings are observable today. We assert that industrial externalization efforts and modern IT infrastructure investments are weak signals of the future being realized sooner than 2038. However, innovation stakeholders must recognize that certain functionality must be implemented within this new architecture to meet critical decision support requirements. Specifically, for analytical sciences, the aforementioned analysis assembly toolset must be implemented to assure business value.

As R&D operating models continue to evolve, stakeholders must continue to make strategic investments that support corporate innovation velocity, productivity and risk mitigation. These investments must ultimately support facile collaboration, robust data management practices and IP protection. But, as these systems are implemented, stakeholders must be mindful that certain scientific data requires specialized ontologies or taxonomies — particularly in data resulting from measurement or analytical sciences. For such data to be effectively leveraged in business-critical decision-making, the appropriate functionality for analysis assembly must be carefully implemented.

Andrew Anderson is Vice President, Business Development, and Graham A. McGibbon is a Manager, Scientific Solutions and Partnerships, at ACD/Labs. They may be reached at [email protected].
