Research & Development World


Sophisticated New Statistical Technique Helps Map Species’ Genetic Heritage

By R&D Editors | December 12, 2014

Professor Tandy Warnow developed a new statistical method that sorts genetic data to construct better species trees detailing genetic lineage. Courtesy of L. Brian Stauffer

CHAMPAIGN, IL: Where did the songbird get its song? Which branch of the bird family tree is closer to the flamingo: the heron or the sparrow?

These questions seem simple, but are actually difficult for geneticists to answer. A new, sophisticated statistical technique developed by researchers at the University of Illinois and the University of Texas at Austin can help researchers construct more accurate species trees detailing the lineage of genes and the relationships between species.

The method, called statistical binning, was used in the Avian Phylogenetics Project, the subject of a December 12, 2014, special issue of the journal Science.

“A species tree is a way of describing how a species evolved from a common ancestor,” said study leader Tandy Warnow, Founder Professor of Bioengineering and Computer Science at the University of Illinois. “Researchers use a species tree to do all sorts of things, like figure out when different traits came into being, and what triggered that trait evolution, and how those things may or may not have been triggered by environmental changes.”

There are two main approaches to constructing a species tree from genomic data, Warnow said. One method, which has prevailed for decades, puts all the gene data together into one giant matrix and analyzes it to map the overall species tree. This is called concatenation. The difficulty with that approach is that individual genes often have different lineages, which can diverge greatly from each other and the species tree as a whole.
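As a rough illustration of the "one giant matrix" idea, the sketch below joins per-gene alignments end to end, one row per species. The species labels, sequences, and the `concatenate` helper are made up for this example, and the article does not say how the project handled missing genes; padding with gaps is an assumption here.

```python
# Toy data (hypothetical): gene name -> {species -> aligned sequence}.
gene_alignments = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACTT"},
    "geneB": {"sp1": "GG", "sp2": "GC", "sp3": "GG"},
    "geneC": {"sp1": "TTA", "sp3": "TTC"},  # sp2 has no data for this gene
}


def concatenate(alignments):
    """Join per-gene alignments into one supermatrix, one row per species."""
    species = sorted({sp for aln in alignments.values() for sp in aln})
    matrix = {sp: "" for sp in species}
    for gene in sorted(alignments):
        aln = alignments[gene]
        length = len(next(iter(aln.values())))
        for sp in species:
            # Assumption: a species missing from a gene is padded with gap characters.
            matrix[sp] += aln.get(sp, "-" * length)
    return matrix


supermatrix = concatenate(gene_alignments)
for sp, row in supermatrix.items():
    print(sp, row)
```

A single tree would then be estimated from `supermatrix` as a whole, which is why genes with conflicting histories can pull that one estimate in different directions.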

A second approach, the coalescent-based method, looks at the data for each gene separately and estimates a tree for each gene. Then it combines all the gene trees to create the overall species tree. While this approach is sound theoretically and statistically, it does not perform as well as expected in practice.
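The "estimate per gene, then combine" shape of this pipeline can be sketched with a much simpler stand-in for the combining step. The example below only tallies how often each grouping of species appears across a set of gene trees; it is a plain clade-frequency count, not a coalescent method, and the trees, labels, and function names are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set, clades found) for a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def clade_support(trees):
    """Count, for each clade, how many gene trees contain it."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        # Skip the root clade, which is present in every tree by construction.
        counts.update(c for c in found if c != all_leaves)
    return counts


# Hypothetical gene trees over species A, B, C with an outgroup D.
gene_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]

support = clade_support(gene_trees)
for clade, count in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), count)
```

In this toy input one gene tree disagrees with the other two. If that disagreement comes from estimation error rather than real history, it distorts whatever summary is built on top of the gene trees.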

“We realized that the gene trees that are combined have error in them,” Warnow said. “When the gene trees have error, then when you combine them you get a bad estimate of the species tree. So we needed to get better gene trees, and the question is, how do we do that?”

Statistical binning takes all the gene data and uses statistical optimization techniques to sort the genes into sets or “bins.” The genes in each bin have trees that don’t seem to have statistically significant differences. The data for each bin is combined and used to estimate a single “supergene” tree, and then the supergene trees are combined into an overall species tree.
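The grouping step can be pictured with a small sketch. It assumes a list of gene pairs whose trees were judged to differ significantly is already available (the article does not describe how that test is done), and it uses a simple first-fit greedy pass as a stand-in for the optimization the researchers describe. The gene names, the `conflicts` list, and `bin_genes` are hypothetical.

```python
def bin_genes(genes, conflicts):
    """Put each gene into the first bin that holds no gene it conflicts with."""
    conflict_set = {frozenset(pair) for pair in conflicts}
    bins = []
    for gene in genes:
        for members in bins:
            if all(frozenset((gene, other)) not in conflict_set for other in members):
                members.append(gene)
                break
        else:
            # No existing bin is compatible, so this gene starts a new one.
            bins.append([gene])
    return bins


genes = ["g1", "g2", "g3", "g4", "g5"]
# Assumed input: pairs of genes whose trees differ significantly.
conflicts = [("g1", "g3"), ("g2", "g3"), ("g4", "g5")]

bins = bin_genes(genes, conflicts)
print(bins)  # [['g1', 'g2', 'g4'], ['g3', 'g5']]
```

Each resulting bin would then be treated as one "supergene": its data is pooled, one tree is estimated per bin, and those trees feed the species-tree step. First-fit makes no attempt to balance bin sizes, which a real pipeline may care about.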

“You can think of statistical binning as combining the best properties of the two dominant approaches,” said Siavash Mirarab, graduate student at the University of Texas at Austin and first author of the paper detailing the statistical binning method. “Without this method, what people had to do was throw away data they didn’t like. This approach allows you to use all the data you have and you don’t have to throw away anything. We have a method that achieves that by grouping things together in a way that makes sense, statistically.”

The researchers compared the species trees produced using the coalescent method with statistical binning to trees produced with concatenation or coalescence alone for several groups of organisms, including birds, mammals, yeast and others. They found that adding the statistical binning process to the pipeline produced species trees that were better than the trees produced by either of the conventional methods.

“We sort the gene data in a sophisticated statistical way, but having done it we get better trees,” Warnow said. “The result is significantly improved estimates of the gene trees, which gave us better estimates of the species tree and branch lengths, which helps you figure out when things happened. Everything was much more accurate.”

Statistical binning allowed the Avian Phylogenetics Project to analyze more than 14,000 genes – one of the largest such projects yet published – and construct a large tree linking many different bird species.

Warnow and Mirarab plan to continue to refine the statistical binning method and hope that it can add accuracy to many other similar studies.

“There’s a large divide in the research community as to whether to use concatenation or coalescent analyses. What we did was understand why the coalescent method didn’t give good results and came up with a way of improving the input so that it could have good results. It’s a way of bringing these two very divided communities into greater agreement with each other,” Warnow said.

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media