Research & Development World

It Takes Glue to Tango: MeDICi integration framework creates data-intensive computing pipeline

By R&D Editors | November 17, 2008

Figure 1: MeDICi workflow provenance and integration. The loosely coupled components allow mix-and-match solutions.

Biologists increasingly rely on high-performance computing platforms to process the tsunami of data generated by high-throughput genome and metagenome sequencing technology and high-throughput proteomics. Unfortunately, the platforms that produce the massive data sets rarely work smoothly with the interactive analysis and visualization programs used in bioinformatics. This makes it difficult for researchers to exploit the computational power of HPC platforms to speed scientific discovery.

At the Department of Energy’s (DOE) Pacific Northwest National Laboratory (PNNL) in Richland, WA, researchers are creating computing environments for biologists that seamlessly integrate collections of data and computational resources. These environments enable users to rapidly analyze high-throughput data. A major goal is to shield the biologist from the complexity of interacting with multiple dissimilar databases and running tasks on HPC platforms and computational clusters. One of these environments, the MeDICi Integration Framework, is now available for free download. Short for Middleware for Data-Intensive Computing, MeDICi makes it easy to integrate separate codes into complex applications that operate as a data analysis pipeline.

What is MeDICi?

Figure 2: An example MeDICi Integration Framework pipeline that incorporates external HPC resources and databases.

MeDICi is an evolving middleware platform (computer software that connects software components or applications) for building complex, high-performance analytical applications. These applications typically comprise a pipeline of software components. Each component analyzes incoming data and passes its results to the next step in the pipeline. The platform creates analysis pipelines that unite the tools and data needed to support biologists in their analyses. MeDICi provides a set of mechanisms for easily plugging together unrelated, distributed codes and executing the resulting pipeline.
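The pipeline idea described above can be illustrated with a minimal Java sketch. It does not use the MeDICi API; the class name `PipelineSketch`, the `pipeline` helper, and the three string-processing stages are hypothetical stand-ins for real analysis codes. It only shows each stage consuming the previous stage's output.

```java
import java.util.List;
import java.util.function.Function;

final class PipelineSketch {
    /** Chains stages so that each stage's output becomes the next stage's input. */
    static <T> Function<T, T> pipeline(List<Function<T, T>> stages) {
        Function<T, T> composed = Function.identity();
        for (Function<T, T> stage : stages) {
            composed = composed.andThen(stage);
        }
        return composed;
    }

    /** Builds a three-stage pipeline from hypothetical stages and runs one input through it. */
    static String runDemo(String raw) {
        Function<String, String> clean = String::trim;           // stage 1: strip whitespace
        Function<String, String> normalize = String::toUpperCase; // stage 2: normalize case
        Function<String, String> label = s -> "seq:" + s;         // stage 3: tag the result
        return pipeline(List.of(clean, normalize, label)).apply(raw);
    }

    public static void main(String[] args) {
        System.out.println(runDemo("  acgt ")); // prints seq:ACGT
    }
}
```

Because the stages share only an input and output type, any one of them can be swapped out without touching the others, which is the property the loosely coupled design aims for.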

The MeDICi technology starts with widely used, standards-based Java and Web services middleware and layers a simple dataflow-based programming model on top of these tools. This approach has enabled a small PNNL team to build a robust platform in just two years. MeDICi consists of three subsystems (Figure 1):
• MeDICi Integration Framework (MIF) — a Java-based, asynchronous messaging platform for application integration
• MeDICi Provenance — a Java API, RDF-based store and content management system for capturing and querying important metadata that can be used for debugging and reconstruction of application results
• MeDICi Pipeline — a BPEL-based environment that integrates with MIF to provide definition tools and a standards-based recoverable pipeline execution engine
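To make the provenance idea concrete, here is a minimal sketch of capturing and querying run metadata as RDF-style subject-predicate-object statements. It is not the MeDICi Provenance API: the class `ProvenanceSketch`, its `record` and `query` methods, the in-memory list, and the run and file names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ProvenanceSketch {
    /** One RDF-style statement: subject, predicate, object. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // In-memory stand-in for a real RDF store.
    private final List<Triple> store = new ArrayList<>();

    /** Captures one piece of metadata about a pipeline run. */
    void record(String subject, String predicate, String object) {
        store.add(new Triple(subject, predicate, object));
    }

    /** Returns the objects of every statement matching the subject and predicate. */
    List<String> query(String subject, String predicate) {
        List<String> hits = new ArrayList<>();
        for (Triple t : store) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                hits.add(t.object);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ProvenanceSketch provenance = new ProvenanceSketch();
        provenance.record("run-1", "usedInput", "sample.fasta"); // hypothetical names
        provenance.record("run-1", "usedTool", "aligner-v2");
        provenance.record("run-2", "usedInput", "other.fasta");
        System.out.println(provenance.query("run-1", "usedInput"));
    }
}
```

Asking which inputs and tools a given run used is the kind of query that supports debugging and reconstruction of a result.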

The three subsystems function alone or together in any MeDICi application, depending on the needs of the scientists. The MIF is the heart of MeDICi and provides the basic programming interfaces for creating pipelines. MIF leverages the open source Mule Enterprise Service Bus. On top of Mule, MIF imposes a component-based programming model for pipeline creation.

Integration framework 
MIF components are constructed using Java programming interfaces that support inter-component communication using asynchronous messaging. Local components execute inside the MIF container. Remote components create distributed solutions and integrate with non-Java code. They support the same programmatic interfaces and use additional MIF facilities to execute component code outside the MIF container.
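Asynchronous messaging between two components can be sketched with standard Java concurrency classes. The example below does not use MIF or Mule interfaces; `AsyncMessagingSketch`, the `END` marker, and the length-prefixing "analysis" are hypothetical. The upstream component places messages on a queue and moves on, while the downstream component processes them on its own thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class AsyncMessagingSketch {
    private static final String END = "__END__"; // hypothetical end-of-stream marker

    /** Sends each input through a queue to a consumer thread and returns its results. */
    static List<String> run(List<String> inputs) {
        BlockingQueue<String> link = new LinkedBlockingQueue<>();
        List<String> results = new ArrayList<>();

        // Downstream component: handles messages whenever they arrive.
        Thread consumer = new Thread(() -> {
            try {
                for (String msg = link.take(); !msg.equals(END); msg = link.take()) {
                    results.add(msg.length() + ":" + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Upstream component: enqueues without waiting for processing to finish.
            for (String input : inputs) {
                link.put(input);
            }
            link.put(END);
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("MKV", "ACGTAC")));
    }
}
```

With a single consumer reading a first-in, first-out queue, results come back in the order the messages were sent.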

The default MIF container mechanism uses Java method calls for exchanging messages between components. Using the Java Message Service (JMS) or other protocols for exchanging messages can enhance scalability. To do so, configure the communication links between components to use the chosen protocol.
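The separation between component code and the link that carries its messages can be sketched as follows. This is an illustration under stated assumptions, not MIF configuration syntax: the `Link` interface, the `DirectLink` and `QueuedLink` classes, and the `"direct"` and `"queued"` protocol names are hypothetical, and the queue-backed link only stands in for a real message broker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

final class TransportSketch {
    /** A communication link between two components (hypothetical interface). */
    interface Link {
        void send(String message);
    }

    /** Delivers each message immediately through a plain Java method call. */
    static final class DirectLink implements Link {
        private final Consumer<String> receiver;

        DirectLink(Consumer<String> receiver) {
            this.receiver = receiver;
        }

        public void send(String message) {
            receiver.accept(message);
        }
    }

    /** Buffers messages like a broker would; delivery happens later. */
    static final class QueuedLink implements Link {
        private final Queue<String> buffer = new ArrayDeque<>();
        private final Consumer<String> receiver;

        QueuedLink(Consumer<String> receiver) {
            this.receiver = receiver;
        }

        public void send(String message) {
            buffer.add(message);
        }

        void deliverPending() {
            while (!buffer.isEmpty()) {
                receiver.accept(buffer.remove());
            }
        }
    }

    /** Chooses the link from a configuration value; the sending code stays the same. */
    static Link configure(String protocol, Consumer<String> receiver) {
        switch (protocol) {
            case "direct":
                return new DirectLink(receiver);
            case "queued":
                return new QueuedLink(receiver);
            default:
                throw new IllegalArgumentException("unknown protocol: " + protocol);
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        Link link = configure("queued", received::add);
        link.send("spectrum-001"); // buffered, not yet delivered
        ((QueuedLink) link).deliverPending();
        System.out.println(received);
    }
}
```

The sender calls `send` in both cases; only the configuration value decides whether delivery is a direct call or goes through a buffer.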

Mule provides the MIF container environment. MIF extends the Mule interface to make component and pipeline construction easier and to create an encapsulation device for component creation. The MIF interface is agnostic of the underlying Java messaging platform. This allows deployments to configure MIF applications using technologies that meet individual quality-of-service requirements.

Two current PNNL projects that illustrate the use of MIF in building pipelines to solve complex, data-intensive biology problems are featured in the companion articles “Community Proteomics Analysis” and “Aiding Environmental Cleanup.”

The future for MeDICi architecture
The MeDICi Integration Framework is freely downloadable and will soon be open source. The developers expect that by concentrating on simplicity of design and programming, and high performance and robustness based on leveraging standards-based technology, MeDICi will become the glue that makes it possible for many diverse codes and tools to “dance” together.

Ian Gorton is the chief architect for PNNL’s Data Intensive Computing program. Christopher S. Oehmen and Jason E. McDermott are senior research scientists in the laboratory’s Bioinformatics and Computational Biology group, Fundamental and Computational Sciences Directorate. They may be reached at [email protected].

Case studies
Community Proteomics Analysis
Aiding Environmental Cleanup
Mission Possible

Related Resources
1. Cannon WR, Jarman KH, Webb-Robertson BJ, Baxter DJ, Oehmen CS, Jarman KD, Heredia-Langner A, Auberry KJ, Anderson GA (2005) Comparison of probability and likelihood models for peptide identification from tandem mass spectrometry data. Journal of proteome research 4: 1687-1698.
2. Fredrickson JK, Romine MF, Beliaev AS, Auchtung JM, Driscoll ME, Gardner TS, Nealson KH, Osterman AL, Pinchuk G, Reed JL, Rodionov DA, Rodrigues JL, Saffarini DA, Serres MH, Spormann AM, Zhulin IB, Tiedje JM (2008) Towards environmental systems biology of Shewanella. Nature reviews 6: 592-603.
3. Oehmen C, Nieplocha J (2006) ScalaBLAST: A scalable implementation of BLAST for High Performance Data-Intensive Bioinformatics Analysis. IEEE Trans Parallel Dist Sys 17: 740-749.
4. Remm M, Storm CE, Sonnhammer EL (2001) Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of molecular biology 314: 1041-1052.
5. Shah AR, Oehmen CS, Harper J, Webb-Robertson BJ (2007) Integrating subcellular location for improving machine learning models of remote homology detection in eukaryotic organisms. Comput Biol Chem 31: 138-142.

Acronyms
API Application Programming Interface | BPEL Business Process Execution Language | DIC Data Intensive Computing | DOE U.S. Department of Energy | EDMS Experimental Data Management System | EMSL Environmental Molecular Sciences Laboratory | MeDICi Middleware for Data-Intensive Computing | MIF MeDICi Integration Framework | PNNL Pacific Northwest National Laboratory | PRISM Proteomics Research Information Storage and Management System | RDF Resource Description Framework | SCOP Structural Classification of Proteins

 

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media