The role of the modern researcher can sometimes feel like a thankless one. Although researchers are at the forefront of innovation, they face constant demand for greater outcomes and even more breakthroughs, even with R&D spending declining. More R&D activity is also coming back in-house, as greater visibility into spending has allowed many companies to see the extent of expenditure on outsourcers and Contract Research Organisations (CROs).
To cope with this increased pressure, chemists are turning to technology and automation for help, and in particular to deep learning.
Deep learning, a subset of artificial intelligence (AI), is centred on learning from datasets and extrapolating from them, and is more sophisticated than the ‘swivel-chair’ task-focused AI employed by many businesses. For chemists, the ability of deep learning systems to predict reactions or suggest methods to synthesize given molecules is tremendously valuable, yet there is a long way to go until it becomes mainstream. Research groups around the world are exploring a variety of deep learning approaches and are running into a number of barriers that urgently need to be overcome.
The key advantage of deep learning AI is that, if properly deployed, it can radically reduce costs and resource use for researchers. Every day, chemists work on the creation, synthesis, and application of various compounds or molecules. This type of classification and discovery work is exactly what deep learning AI excels at, so handing it over frees up researchers’ precious time. Additionally, since machines can work around the clock and far faster than humans, deep learning could well find alternative and more efficient synthesis routes, allowing for even greater savings. Indeed, with sufficient data feeds, deep learning could find innovative alternatives to current production methods, removing the need for particular molecules entirely.
But deep learning isn’t just a cost-cutting exercise; it makes sense from an innovation perspective as well. Take Atomwise, a company best known for applying deep learning to drug discovery. Atomwise has seen the benefits of deploying its deep learning system, AtomNet, to tackle key real-world issues such as improving pesticides. Deep learning allows Atomwise researchers to simulate millions of compounds and identify the ones that target pests without causing toxicity in humans or other beneficial species. This has allowed the company to produce less harmful products faster than competitors.
Deep learning, deep history
Despite the obvious potential of deep learning, as with any new technology there are still hurdles to widespread adoption. These include a lack of familiarity with machine and deep learning in the chemistry industry, the lack of standardisation of data, and a lack of collaboration opportunities. For example, recent research from Reaxys found that only 13 percent of chemists believe familiarity with machine and deep learning is a top technological skill required to be successful in the industry. This figure needs to rise substantially if chemistry is to make optimal use of deep learning.
Beyond a lack of familiarity and of appropriate skills, one of the biggest problems is also one of the oldest: the ‘Garbage In, Garbage Out’ principle. If we put garbage data in, whether incomplete or incorrect, we get garbage out. Algorithms can only extrapolate from what is known; even the best algorithms in the world will still yield poor results if they are given only half the data they need. For deep learning, the standard of data needs to be much higher than for human researchers, not only in terms of accuracy but also in terms of being free of bias. Cleaning data is a huge problem, consuming up to 80 percent of data scientists’ time, yet it is incredibly important. Recent research found that only three percent of companies’ data meets even the loosest standards of acceptability. This means that for deep learning to be successful in modelling reactions and compound properties, companies must ensure they can access comprehensive, historical databases of published chemical information.
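To make the ‘Garbage In, Garbage Out’ point concrete, here is a minimal Python sketch of the kind of completeness and plausibility check that might be run before reaction data reaches a model. The `ReactionRecord` fields, the `is_usable` rule, and the sample records are all hypothetical and chosen for illustration only; they do not reflect the schema of any real database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReactionRecord:
    """Hypothetical, simplified reaction record (illustrative fields only)."""
    reactants: List[str]
    product: Optional[str]
    yield_percent: Optional[float]


def is_usable(record: ReactionRecord) -> bool:
    """Return True if the record is complete and its yield is plausible."""
    # Incomplete data: no reactants, no product, or no reported yield.
    if not record.reactants or not record.product:
        return False
    if record.yield_percent is None:
        return False
    # Incorrect data: a yield outside 0-100 percent cannot be right.
    return 0.0 <= record.yield_percent <= 100.0


def split_records(
    records: List[ReactionRecord],
) -> Tuple[List[ReactionRecord], List[ReactionRecord]]:
    """Separate usable records from those needing curation."""
    clean = [r for r in records if is_usable(r)]
    rejected = [r for r in records if not is_usable(r)]
    return clean, rejected


# Made-up sample data: one good record, one incomplete, one incorrect.
sample = [
    ReactionRecord(["ethanol", "acetic acid"], "ethyl acetate", 65.0),
    ReactionRecord(["benzene"], None, 40.0),
    ReactionRecord(["A", "B"], "C", 140.0),
]
clean, rejected = split_records(sample)
print(f"{len(clean)} usable, {len(rejected)} rejected")
```

A real pipeline would need far richer checks (structure validation, duplicate detection, bias audits), but even a filter this simple shows why curation takes so much of a data scientist’s time.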
Deep learning for chemistry
Even after centuries of study, it is not entirely obvious what will happen when two chemicals are mixed together. In 1969, when computers were first becoming available to chemists, EJ Corey, who later won the Nobel Prize in Chemistry, started the study of how they could help plan chemical reactions. While academic study has continued since, activity remained relatively low until recently. Market forces demanding increasing efficiency and the rapid growth of deep learning tools have resulted in a resurgence in research to predict the outcome of chemical reactions and plan how novel compounds might be made. Groups at leading institutions such as Stanford, MIT, Universität Münster, UC Irvine and others have been using deep learning to create predictive models for chemical reactions. Elsevier’s chemistry prediction expert Dr. Frederik van den Broek has organized an international symposium at the fall American Chemical Society meeting in Boston where these leading researchers will report their latest results in this fast-moving area.
Chemistry has a huge wealth of historical knowledge on which modern researchers can draw. To ensure that the systems they are training aren’t missing vital data, chemical researchers must have access to the same sort of deep-dive data as those working on carbon capture. For example, the 42 million reactions indexed in the Reaxys database have each been reported with up to 2,000 sets of conditions, procedures, and associated yields, from a period of more than 240 years.
More than ever before, collaboration is becoming the key to successful research, and not just across disciplines or geographies. The future of research is a hybrid model in which machines augment and enhance the work being done by human experts. Deep learning will be at the heart of this change, yet it isn’t a magic bullet. Just like any good human researcher, these virtual researchers require quality data with which to work.