It is often an impossible challenge for chemists and other scientists to comb through the millions of possible chemical reactions.
Now, a new machine learning-based platform called IBM RXN for Chemistry gives users a tool that lets them input a molecule along with different reactants and predict which chemical reaction will ultimately occur.
Created by Teodoro Laino, PhD, technical leader for molecular simulation at IBM Research, together with Philippe Schwaller and Theophile Gaudin, both predoctoral researchers at IBM Research, the system uses a neural machine translation approach, similar to the way programs such as Google Translate translate between English and German. The web-based app, which was launched in 2017, applies this type of technology to allow users to go from designing materials to generating products with 90 percent accuracy, significantly higher than other models.
In an exclusive interview with R&D Magazine, the three researchers explained the genesis of IBM RXN for Chemistry.
“We decided to start at that time by looking at a very simple problem, basically given a set of reactants, reagents and conditions, is it possible using data-based models to come up with a prediction of what would be the most likely product of that reaction?” Laino said. “The way we decided to address the problem is instead of building a neural network by brute force using the data of chemical reactions, we decided to explore a slightly different approach.
“The approach that we decided to follow was basically grounded in the assumption that in organic chemistry, there are a certain number of rules that are dictated by the physics of the problem that are representing a language,” he added. “It is not memorizing the reaction, but it is about constructing internally a certain number of rules that are deciding what are going to be the most likely product.”
The key to making the platform work is a system called SMILES (simplified molecular-input line-entry system), which represents a molecule as a sequence of characters, such as BrCCOC1OCCCC1. The model was trained using a combination of reaction datasets, totaling more than two million different reactions that were extracted from patents and textbook examples.
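To make the "sequence of characters" idea concrete, here is a minimal Python sketch of how a SMILES string might be split into tokens before being fed to a translation-style model. It is an illustration only, not the researchers' code: the pattern `SMILES_TOKEN` and the function `tokenize` are hypothetical names, and the pattern covers only a simplified subset of SMILES.

```python
import re
from typing import List

# Hypothetical, simplified token pattern (an assumption, not IBM's tokenizer):
# bracket atoms, two-letter halogens, single-letter atoms (uppercase and
# lowercase), two-digit ring closures, single digits, and common bond,
# branch, and separator symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[=#\-+\\/:~@().>*$]"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, treating 'Br' and 'Cl' as one unit."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse input containing characters the simplified pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # The example molecule from the article, written with plain digits.
    print(tokenize("BrCCOC1OCCCC1"))
    # ['Br', 'C', 'C', 'O', 'C', '1', 'O', 'C', 'C', 'C', 'C', '1']
```

Each token plays the role of a word in a sentence. The two matching "1" tokens mark where a ring opens and closes, which is why they must be plain digits rather than subscripts.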
The platform allows users to draw molecules with a web-based chemical structure editor called Ketcher, where they can select, modify and erase connected and unconnected atoms and bonds. Using the tools available within the system, they can also check bond lengths, angles and the spatial arrangement of atoms, as well as the stereochemistry and structural layout. Research groups and science classes can also collaborate, create and share projects on IBM RXN for Chemistry.
The system includes preconfigured libraries of molecules that allow users to add reactants and reagents and explore potential chemical reactions. They can also upload molecules to enrich and customize libraries and enhance libraries with their own reaction outcomes or with molecules drawn on their Ketcher board. As of April 11, there have been more than 34,500 predicted reactions on the platform.
The program is completely free and available on the IBM Cloud. Users are required to sign in using a social media account. Each individual also owns the rights to anything they produce using IBM RXN for Chemistry.
With the success of the current iteration, which is targeted for academia, the researchers are planning to roll out an updated version with new features this coming summer that will allow users to take an up-close look at synthesized molecules, in an effort to make the project more attractive to commercial research labs.
“Input your target molecule and the system constructs an entire retrosynthetic approach,” Laino said. “We go back, step by step, until we identify commercial molecules and give back the entire sequence of reactions that are needed to synthetize that molecule. This is a sort of game-changing; splitting a synthesis like that could take weeks for chemists.”