Research & Development World

Computer equal to or better than humans at cataloging science

By R&D Editors | December 1, 2014

In 1997, IBM’s Deep Blue computer beat chess wizard Garry Kasparov. This year, a computer system developed at the Univ. of Wisconsin-Madison equaled or bested scientists at the complex task of extracting data from scientific publications and placing it in a database that catalogs the results of tens of thousands of individual studies.

“We demonstrated that the system was no worse than people on all the things we measured, and it was better in some categories,” says Christopher Ré, who guided the software development for the project while a UW professor of computer sciences.

The development, described in the journal PLOS ONE, marks a milestone in the quest to rapidly and precisely summarize, collate and index the vast output of scientists around the globe, says first author Shanan Peters, a professor of geoscience at UW-Madison.

Peters and colleagues set up the faceoff between PaleoDeepDive, their new machine reading system, and the human scientists who had manually entered data into the Paleobiology Database. This repository, compiled by hundreds of researchers, is the destination for data from paleontology studies funded by the National Science Foundation and other agencies internationally.

The knowledge produced by paleontologists is fragmented into hundreds of thousands of publications. Yet many research questions require what Peters calls a “synthetic approach: For example, how many species were on the planet at any given time?”

Teaming up with Ré, who is now at Stanford Univ., and UW-Madison computer sciences professor Miron Livny, the group built on the DeepDive machine reading system and the HTCondor distributed job management system to create PaleoDeepDive. “We were lucky that Miron Livny brought the high throughput computing capabilities of the UW-Madison campus to bear,” says Peters. “Getting started required a million hours of computer time.”

PaleoDeepDive mimics the human activities needed to assemble the Paleobiology Database. “We extracted the same data from the same documents and put it into the exact same structure as the human researchers, allowing us to rigorously evaluate the quality of our system, and the humans,” Peters says.
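Putting both sets of extractions into the same structure is what makes a direct comparison possible. The sketch below shows one common way to score such a comparison, using precision and recall against a reference set. The article does not say which measures the team used, so the metrics, the function name and the toy tuples here are all assumptions for illustration.

```python
def precision_recall(extracted, reference):
    """Score a set of extracted facts against a reference set of facts."""
    extracted, reference = set(extracted), set(reference)
    agreed = extracted & reference
    # Precision: share of extracted facts that appear in the reference.
    precision = len(agreed) / len(extracted) if extracted else 0.0
    # Recall: share of reference facts that were extracted.
    recall = len(agreed) / len(reference) if reference else 0.0
    return precision, recall

# Invented (taxon, locality) tuples; not real database records.
reference = {("Taxon A", "Site 1"), ("Taxon B", "Site 2"), ("Taxon C", "Site 3")}
machine = {("Taxon A", "Site 1"), ("Taxon B", "Site 2"), ("Taxon D", "Site 4")}

p, r = precision_recall(machine, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

Because the facts are stored as identical tuples, the same function could score the human-entered records against the same reference.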

Instead of trying to divine the single correct meaning, the tactic was “to look at the entire problem of extraction as a probabilistic problem,” says Ré, who credits much of the heavy lifting to UW-Madison PhD candidate Ce Zhang.

Computers often have trouble deciphering even simple-sounding statements, Ré says. He imagines a study containing the terms “Tyrannosaurus rex” and “Alberta, Canada.” Is Alberta where the fossil was found, or where it is stored? “We take a more relaxed approach: There is some chance that these two are related in this manner, and some chance they are related in that manner.”
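The idea of keeping several readings alive, each with its own probability, can be illustrated with a toy example. This is a minimal sketch, not PaleoDeepDive's actual inference: it assumes each candidate relation has already been given an evidence score, and it turns those scores into probabilities with a softmax. The relation names and numbers are hypothetical.

```python
import math

def relation_probabilities(scores):
    """Convert raw evidence scores for competing relations into probabilities."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {rel: math.exp(s - peak) for rel, s in scores.items()}
    total = sum(exps.values())
    return {rel: e / total for rel, e in exps.items()}

# Hypothetical scores for the pair ("Tyrannosaurus rex", "Alberta, Canada").
# Relation names and values are invented for illustration only.
scores = {"found_in": 1.2, "stored_in": 0.4, "unrelated": -0.5}

probs = relation_probabilities(scores)
for relation, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{relation}: {p:.2f}")
```

No reading is discarded outright. Each keeps a share of the probability, and that share can shift as more evidence about the pair arrives.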

In these large-data tasks, PaleoDeepDive has a major advantage, Peters says. “Information that was manually entered into the Paleobiology Database by humans cannot be assessed or enhanced without going back to the library and re-examining original documents. Our machine system, on the other hand, can extend and improve results essentially on the fly as new information is added.”

Further advantages can result from improvements in the computer tools. “As we get more feedback and data, it will do a better job across the board,” Peters says.

The machine-reading trial required access to tens of thousands of articles, says Jacquelyn Crinion, assistant director of licensing and acquisitions services at the UW-Madison General Library System, and the download volume threatened logjams in document delivery. Eventually, Elsevier gave the UW-Madison team broad access to 10,000 downloads per week.

As text and data mining take off, Crinion says, the library system and publishers will adapt. “The challenge for all of us is to provide specialized services for researchers while continuing to meet the core needs of the vast majority of our customers.”

The Paleobiology Database has already generated hundreds of studies about the history of life, Peters says. “Ultimately, we hope to have the ability to create a computer system that can do almost immediately what many geologists and paleontologists try to do on a smaller scale over a lifetime: read a bunch of papers, arrange a bunch of facts, and relate them to one another in order to address big questions.”

Source: Univ. of Wisconsin-Madison
