Mission Possible
Data intensive computing pushes scientific frontiers
How can scientists transform terabytes and petabytes of streaming data into information that promotes vital discoveries and timely decisions — in near real time? The answer lies in data intensive computing.
Addressing the demands of ever-growing data volumes and complexity requires game-changing advances in software, hardware and algorithm development. These technologies must also scale to handle rising data rates while delivering timely, effective analysis results. Data intensive computing, or DIC, aims to capture, manage, analyze and understand data at volumes and rates that push the frontiers of current technologies.
Pacific Northwest National Laboratory’s (PNNL’s) approach to DIC centers on three key research areas:
• Hybrid hardware architectures: Research in hybrid computing uses multithreaded architectures, field-programmable gate arrays (FPGAs) and multi-core processors to move analytics closer to the data source. This makes near-real-time data reduction and feature extraction possible.
• Software architectures: The Middleware for Data-Intensive Computing (MeDICi) software architecture incorporates software integration capabilities, virtualized metadata management and a workflow engine to support the development of domain-agnostic solutions.
• Analytic algorithms: Research in advanced analytics creates novel algorithms that provide real-time analysis and visualization capabilities, supporting exploration and diagnostic discovery and helping people understand their data.
From this research, interdisciplinary teams develop and combine new technologies to:
• create capabilities that enable scientific discovery and insight (e.g., cleaning up the environment)
• provide decision support and control (e.g., securing cyber networks)
• advance situational awareness and response (e.g., preventing terrorism).
Coupled with novel visualization techniques, these new tools will drive future scientific exploration in fields as diverse as bioinformatics, proteomics, security, climate data processing and real-time scientific instrument control.
Ian Gorton is the chief architect for PNNL’s Data Intensive Computing program. Christopher S. Oehmen and Jason E. McDermott are senior research scientists in the laboratory’s Bioinformatics and Computational Biology group, Fundamental and Computational Sciences Directorate.