ThreatSEQ is a 2018 R&D 100 Award winner. All of the R&D 100 Awardees were announced at the R&D 100 Awards Gala held in Orlando, Florida on Nov. 16, 2018.
Scientists working with raw genetic material and the creation of DNA sequences must take precautions to ensure they don’t accidentally produce potential biohazards, such as toxins or pathogens that could lead to contagious diseases and death among plant and animal hosts. Not only could this type of incident lead to significant financial losses as host organisms are killed, it could also pose a danger to those working with these materials in the lab.
Fortunately, many gene sequences that produce harmful qualities are known to the scientific community, and new sequences can be checked against these known concerning sequences before they are produced.
Battelle has sought to improve and simplify this screening with a system they developed called ThreatSEQ, which was one of the winners of the 2018 R&D 100 Awards. The software scans four databases containing over 10,000 sequences of concern and ranks sequences of at least 200 bp as non-threatening, unidentified, or as being in one of four threat tiers.
The four databases and ThreatSEQ’s four-tier system help gene synthesis companies determine whether a sequence is potentially dangerous and decide what kind of follow-up is needed given the screening results. Omar Tabbaa, director of computational biotechnology at Battelle, told R&D Magazine in an exclusive interview that these databases include the main sequences of concern database, which he calls the “meat and potatoes” of ThreatSEQ; a database of restricted pathogens containing sequences of concern; a database of restricted pathogens that don’t contain specific known sequences of concern; and a general database of every protein, to cast a “wide net.”
“Just because it doesn’t hit something concerning, they still want to know what they’re producing,” said Tabbaa. “That gets them the full coverage, if it is something that is a known sequence.”
A sequence that matches an entry in the sequences of concern database, or that matches a Tier 1 virus, is ranked as Tier 1. A sequence that matches the restricted pathogens database at a sequence not specifically considered non-concerning is ranked as Tier 2. A Tier 3 sequence matches a non-concerning sequence of a restricted pathogen.
“For example, take E. coli: only certain strains are actually causing disease. And a lot of that disease is caused due to the fact that some of the strains produce a toxin, which can give you all the gastrointestinal discomfort,” Tabbaa explained. “But there are a lot of E. coli strains that are not harmful, (but) it shares a lot of genes between those two. So the sequences that aren’t of concern are things like housekeeping genes, things that are just required for replication, and things like that.”
The last threat tier, Tier 4, encompasses any sequence that has at least one 200 bp segment with a best match alignment to the genome of a restricted pathogen. The ThreatSEQ user interface provides the user with their sequence’s ranking along with information about the known sequence it matched with. The system is more than just a litmus test of whether or not a sequence is safe to produce: it gives users information that can help them decide their next step.
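The ranking rules described above can be sketched as a small decision function. This is only an illustration of the logic as the article reports it, not Battelle's implementation. The `ScreeningHits` fields and the `rank` function are hypothetical names, and two points are assumptions: that the most severe matching tier wins when a sequence hits more than one database, and that a hit only in the general protein database maps to "non-threatening" while no hit at all maps to "unidentified."

```python
from dataclasses import dataclass

# The article says sequences of at least 200 bp are ranked.
MIN_LENGTH_BP = 200


@dataclass(frozen=True)
class ScreeningHits:
    """Hypothetical summary of which databases a query sequence matched."""
    concern_db: bool = False                 # sequences of concern database
    tier1_virus: bool = False                # matches a Tier 1 virus
    restricted_not_cleared: bool = False     # restricted pathogen, not marked non-concerning
    restricted_non_concerning: bool = False  # non-concerning sequence of a restricted pathogen
    restricted_genome_segment: bool = False  # a 200 bp segment best-aligns to a restricted genome
    general_protein_db: bool = False         # hit in the general "wide net" protein database


def rank(length_bp: int, hits: ScreeningHits) -> str:
    """Return a ranking label, checking the most severe tier first (assumed order)."""
    if length_bp < MIN_LENGTH_BP:
        raise ValueError(f"sequence must be at least {MIN_LENGTH_BP} bp to be ranked")
    if hits.concern_db or hits.tier1_virus:
        return "Tier 1"
    if hits.restricted_not_cleared:
        return "Tier 2"
    if hits.restricted_non_concerning:
        return "Tier 3"
    if hits.restricted_genome_segment:
        return "Tier 4"
    # Assumption: the article does not define these two labels precisely.
    if hits.general_protein_db:
        return "non-threatening"
    return "unidentified"
```

For example, `rank(450, ScreeningHits(restricted_non_concerning=True))` returns `"Tier 3"`, which corresponds to Tabbaa's example of a housekeeping gene from a restricted pathogen.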
“We provide a lot of back information so they don’t have to do the Googling,” Tabbaa added. “If it hits to the restricted pathogen database for which we don’t have any sequences of concern, at that point we know that it’s a restricted pathogen, (but) it could be any gene in there. So there’s a possibility that it could be threatening, and that will require some sort of review on the vendor side.”
The databases are built on over 10 years of research, according to Battelle, and include over 10,000 sequences of concern encompassing 850 different types of concerns, 96 viruses, 75 bacteria species, 12 eukaryotic pathogens, as well as other items that contribute to pathogenesis. This covers all of the human U.S. Select Agents and Australia Group Lists for pathogens and toxins.
Tabbaa says one of the benefits of ThreatSEQ is that it consolidates so many sequences into one system, whereas previously, companies would tend to each develop their own systems and databases to screen for potential hazards.
“Every company was doing it a little bit different, and there were certain varying levels of sophistication. Some were doing very basic algorithms and databases, and even the database updating was being done different,” he said. “One of the things that we’ve been working toward is the standardization across the industry.”
Tabbaa says ThreatSEQ’s databases and algorithm will be continuously updated and improved, and that many industry partners have helped bolster the technology along the way.
“This system wouldn’t exist if it weren’t for the industry support that we’ve been getting,” he said. “We’re working with a lot of other vendors as well; their requirements can be different from each other, and that’s allowing us to continue building up the tool.”
Correction: A previous version of this article stated that the development of ThreatSEQ was partially funded by a government grant. Development of ThreatSEQ was fully funded by Battelle.