Research & Development World

New Algorithm Allows Faster, More Accurate DNA Study

By R&D Editors | January 14, 2016

Diagram showing the transmission of hereditary information in a cell. Courtesy of Moscow Institute of Physics and Technology

A team of scientists from Germany, the USA and Russia, including Dr. Mark Borodovsky, Chair of the Department of Bioinformatics at the Moscow Institute of Physics and Technology (MIPT), has proposed an algorithm that automates the search for genes and makes it more efficient. The new development combines the advantages of the most advanced tools for working with genomic data. The method will enable scientists to analyze DNA sequences faster and more accurately and to identify the full set of genes in a genome.

Although the paper describing the algorithm appeared only recently in the journal Bioinformatics, which is published by Oxford Journals, the method has already proven very popular: the software has been downloaded by more than 1,500 centers and laboratories worldwide. Tests of the algorithm have shown that it is considerably more accurate than comparable algorithms.

The development belongs to bioinformatics, a cross-disciplinary field of science that combines mathematics, statistics and computer science to study biological molecules such as DNA, RNA and protein structures. DNA, which is fundamentally an information molecule, is sometimes even depicted in digital form to emphasize its role as a molecule of biological memory.

Bioinformatics is a very topical subject; every newly sequenced genome raises so many additional questions that scientists simply do not have time to answer them all. Specialists’ time, like the specialists themselves, is worth its weight in gold. This is why automation is key to the success of any bioinformatics project, and why algorithms like this one are essential for solving a wide variety of problems.

One of the most important areas of bioinformatics is genome annotation: determining which parts of a DNA molecule are used to synthesize RNA and proteins. These parts, the genes, are of great scientific interest. In many studies, scientists do not need information about the entire DNA (which is around two meters long in a single human cell), only about its most informative part, the genes. Gene regions are identified either by searching for similarities between sequence fragments and known genes or by detecting consistent patterns in the nucleotide sequence. This process is carried out using predictive algorithms.

Locating genes is no easy task, especially in eukaryotic organisms, which include almost all widely known types of organism except bacteria. In these cells, the transfer of genetic information is complicated by “gaps” in the coding regions (introns), and there are no definite indicators of whether a region is coding or not.
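To see why introns complicate the search, consider a minimal Python sketch. The sequence, the exon coordinates and the `splice` helper are all made up for illustration and are not taken from the paper. The protein-coding message only becomes contiguous once the intron is cut out, and a gene finder has to infer those cut points from the raw genomic sequence alone.

```python
def splice(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) into one coding sequence."""
    return "".join(genomic[start:end] for start, end in exons)

# Hypothetical toy gene: two exons separated by one 12-nucleotide intron.
exon1 = "ATGGCC"
intron = "GTAAGTTTTCAG"
exon2 = "GAATAA"
genomic = exon1 + intron + exon2

# Coordinates of the two exons within the genomic sequence (assumed known here;
# in real annotation these boundaries are exactly what must be predicted).
exons = [(0, 6), (18, 24)]

coding = splice(genomic, exons)
print(coding)
```

In this toy case the exon boundaries are given. In a real genome they are unknown, which is what makes eukaryotic gene prediction hard.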

The algorithm proposed by the scientists determines which regions of the DNA are genes and which are not. It does this with a Markov chain (a sequence of random events in which the probability of each event depends on the events preceding it) trained on known genes. The states of the chain are either nucleotides or nucleotide words (k-mers). The algorithm determines the most probable division of a genome into coding and noncoding regions, classifying genomic fragments in the best possible way according to their ability to encode proteins or RNA.

Experimental data obtained from RNA provide additional useful information that can be used to train the model. Certain gene prediction programs can use these data to improve the accuracy of gene finding. However, such programs require a species-specific training set. The AUGUSTUS software program, for example, is highly accurate but needs a training set of genes. This set can be obtained using another program, GeneMark-ET, which is a self-training algorithm. The two were combined in BRAKER1, which was proposed jointly by the developers of AUGUSTUS and GeneMark-ET.
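The Markov-chain idea can be illustrated with a small Python sketch. This is not the model used by GeneMark-ET, AUGUSTUS or BRAKER1, which are considerably more elaborate. It is a simplified log-odds scorer: one k-mer Markov model is trained on coding sequences, another on noncoding sequences, and a fragment is scored by which model explains it better. The function names, the chain order of 2, the pseudocount and the training strings are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, order=2, pseudocount=1.0):
    """Return a function giving P(next nucleotide | previous `order` nucleotides)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, nxt):
        # Pseudocounts keep unseen k-mers from producing zero probabilities.
        total = sum(counts[context].values()) + pseudocount * len(ALPHABET)
        return (counts[context][nxt] + pseudocount) / total

    return prob

def log_odds(fragment, coding, noncoding, order=2):
    """Sum of log(P_coding / P_noncoding) over the fragment; > 0 favors coding."""
    score = 0.0
    for i in range(order, len(fragment)):
        context, nxt = fragment[i - order:i], fragment[i]
        score += math.log(coding(context, nxt) / noncoding(context, nxt))
    return score

# Made-up toy training data, for illustration only.
coding_model = train_markov(["ATGGCCGCCGCCGCCGCCTAA"])
noncoding_model = train_markov(["ATATATATATTTATATATAA"])

gc_score = log_odds("GCCGCCGCC", coding_model, noncoding_model)
at_score = log_odds("ATATATAT", coding_model, noncoding_model)
print(gc_score > 0, at_score < 0)  # prints: True True
```

A fragment resembling the toy coding data scores above zero and one resembling the toy noncoding data scores below zero. A real gene finder goes further and searches for the most probable segmentation of the whole genome rather than scoring isolated fragments.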

BRAKER1 has demonstrated a high level of efficiency. An example running time on a single processor is ∼17.5 hours for training and gene prediction in a genome 120 megabases long. This is a good result, given that the time may be reduced significantly by using parallel processors, so the algorithm may run even faster and more efficiently in the future.
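The quoted figures can be turned into a rough throughput estimate, sketched below in Python. The 17.5 hours and 120 megabases come from the article. The parallel estimate uses Amdahl's law with a parallelizable fraction of 0.9 and 8 workers, both of which are purely hypothetical values, not measurements of BRAKER1.

```python
def throughput_mb_per_hour(genome_mb, hours):
    """Megabases processed per hour on a single processor."""
    return genome_mb / hours

def amdahl_hours(serial_hours, workers, parallel_fraction):
    """Estimated wall time under Amdahl's law for a given parallel fraction."""
    return serial_hours * ((1.0 - parallel_fraction) + parallel_fraction / workers)

# Figures reported in the article.
rate = throughput_mb_per_hour(120.0, 17.5)

# Hypothetical scenario: 90% of the work parallelizes, 8 workers available.
estimate = amdahl_hours(17.5, workers=8, parallel_fraction=0.9)

print(f"{rate:.2f} Mb/hour on one processor")
print(f"{estimate:.2f} hours with 8 workers (assumed 90% parallel)")
```

Under these assumptions the single-processor rate is just under 7 megabases per hour, and the actual gain from parallel hardware depends on how much of the pipeline can really run concurrently.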

Tools such as these help to solve a variety of problems. Accurately annotating the genes in a genome is extremely important. One example is the global 1000 Genomes Project, the initial results of which have already been published. The project was launched in 2008 and involved researchers from 75 different laboratories and companies. As a result, sequences of rare gene variants and gene substitutions were discovered, some of which can cause disease. When diagnosing genetic diseases, it is very important to know which substitutions in gene regions cause the disease to develop. Under the project, the genomes of different people are mapped, especially their coding regions, and rare nucleotide substitutions are identified. In the future, this will help doctors to diagnose complex diseases such as heart disease, diabetes and cancer.

BRAKER1 enables scientists to work effectively with the genomes of new organisms, speeding up the process of annotating genomes and acquiring essential knowledge about life sciences.
