This article is also available in Italian
Once upon a time, not so many years ago, there was the chemical research laboratory. Nothing science-fictional or Mephistophelean: more like computer screens and countless simulations. A researcher could even spend an entire professional life, from doctoral dissertation to retirement, on the structure of a single molecule: every day, for years, sitting in front of a virtual model, changing a hydrogen here or a double bond there, and running tests to find the most effective variant, the safest, the easiest to synthesise.
Whether it was a life-saving drug, a new plastic, or a fertiliser, it took years, or even decades, to reach a certified, marketable product. It was exhausting, alienating, and also very expensive work, since researching a single compound could require millions or even billions of dollars in investment (for each new drug, for example, R&D spending is estimated at between 314 million and 4.46 billion dollars, according to a study published by JAMA Network in June 2024).
In the past two decades, however, a revolution has begun that is now progressing at an exponential rate: artificial intelligence. Machine learning and generative AI systems are now used in almost all branches of scientific research, but no field is benefiting as spectacularly as the chemical and pharmaceutical industries. From the discovery and creation of new molecules to the search for novel materials, from chemical risk prediction to contamination remediation, to market analysis and supply chain optimisation: artificial intelligence speeds up processes and experimentation, reduces waste, improves the quality of outputs, helps sustainability and even promotes ethics, since in some cases it could drastically reduce the use of animal testing. And, finally, it has the kind of power that can single-handedly change the rules of the game: it can reduce costs.
When AI learns the grammar of molecules
Food aromas or cosmetic fragrances, drugs or detergents, fertilisers or paints, metal alloys or new high-performance materials. The applications of the chemical industry are countless, but at the base of each supply chain, at the origin of each industrial process, there is always a first and most important step: the search for the right molecule. It is a first step that can, in reality, take years of study and experimentation, since “the possible combinations are almost infinite,” as Julien Herzen, Lead Data Scientist at Unit8, a Swiss company that offers consulting on data science and AI, explained in a webinar.
To give an example, in the field of medicinal chemistry alone, it is estimated that 10^60 compounds with drug-like characteristics can be created, more than the number of atoms in the solar system. “And out of all these molecules,” Herzen continues, “only 10^10 (10 billion) are known, commercially or virtually, to scientists.”
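To get a feel for the size of that haystack, the two estimates quoted above (10^60 possible drug-like compounds, 10^10 known ones) can be compared directly. The short calculation below is only an illustration of the arithmetic; the variable names are made up for the example.

```python
from fractions import Fraction

# Figures quoted in the text above (estimates, not exact counts).
DRUG_LIKE_SPACE = 10**60   # compounds with drug-like characteristics
KNOWN_MOLECULES = 10**10   # molecules known to scientists (10 billion)

# Share of the estimated space that has been catalogued so far.
known_fraction = Fraction(KNOWN_MOLECULES, DRUG_LIKE_SPACE)

# How many orders of magnitude separate the known from the possible.
unexplored_orders = len(str(DRUG_LIKE_SPACE // KNOWN_MOLECULES)) - 1

print(f"Known share of the space: 1 in 10^{unexplored_orders}")
```

In other words, on these estimates, for every molecule already known there are about 10^50 that nobody has ever looked at.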
To find the right molecule in this giant haystack, therefore, the chemical industry is increasingly relying on machine learning and deep learning systems. As we explained in RM48, these are artificial neural networks capable of sifting through huge amounts of data very quickly and combining them until the required characteristics are achieved. A “molecule designer,” Herzen explains, will then enter the desired properties (e.g. solubility or edibility) into a machine learning system, and it will then proceed incrementally by generating various combinations of elements, with attempts gradually getting closer and closer to the desired result.
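The "enter the desired properties, then get closer step by step" loop that Herzen describes can be sketched in a few lines. What follows is a deliberately simplified stand-in, not the method used by any real molecular design system: the fragments, their property values, and the function names are all hypothetical, and a plain greedy random search takes the place of a trained neural network.

```python
import random

# Hypothetical building blocks, each with made-up values for two
# properties (say, solubility and stability) on a 0-1 scale.
FRAGMENTS = {
    "A": (0.9, 0.1),
    "B": (0.4, 0.6),
    "C": (0.1, 0.8),
    "D": (0.6, 0.5),
}

def properties(candidate):
    """Average the toy property values of the fragments in a candidate."""
    n = len(candidate)
    p1 = sum(FRAGMENTS[f][0] for f in candidate) / n
    p2 = sum(FRAGMENTS[f][1] for f in candidate) / n
    return p1, p2

def distance(candidate, target):
    """Squared distance between a candidate's properties and the target."""
    p1, p2 = properties(candidate)
    return (p1 - target[0]) ** 2 + (p2 - target[1]) ** 2

def design(target, length=4, steps=200, seed=0):
    """Swap one fragment at a time, keeping only changes that get closer."""
    rng = random.Random(seed)
    names = sorted(FRAGMENTS)
    best = [rng.choice(names) for _ in range(length)]
    history = [distance(best, target)]
    for _ in range(steps):
        trial = list(best)
        trial[rng.randrange(length)] = rng.choice(names)
        if distance(trial, target) < distance(best, target):
            best = trial
        history.append(distance(best, target))
    return best, history

best, history = design(target=(0.5, 0.5))
print("".join(best), "start:", round(history[0], 4), "end:", round(history[-1], 4))
```

The point of the sketch is the shape of the process: each accepted change moves the candidate closer to the requested properties, so the recorded distance never increases from one step to the next.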
Even with the help of AI, however, the process remains long and arduous. The problem is that to effectively train a machine learning model, it must first be fed datasets consisting of millions of molecular structures, which in turn must be classified and labeled by human researchers. And since such vast datasets are quite rare, the results generated by molecular design systems often lead to failed attempts, either molecules that cannot be synthesised or ones that are ineffective.
It is precisely on this bottleneck that the MIT-IBM Watson AI Lab has been working: it recently developed a new model that can generate molecules and simultaneously predict their properties much more effectively than the deep learning systems used so far. The MIT study is not just an improvement, but a true breakthrough, and to understand its scope, it is enough to say that the new framework needs a dataset of fewer than a hundred examples of molecular structures to generate reliable results.
Thanks to a reinforcement learning approach (the system is “rewarded” when it achieves the goal), the researchers were, in fact, able to teach the AI the molecular “grammar”; and once the fundamental rules governing the structure and behaviour of molecules are assimilated, the model proceeds independently to improve its ability to speak the language of molecules. “This grammar-based representation is very powerful,” Minghao Guo, MIT doctoral student and first author of the study, explained in a statement. “And since it is also a very general model, it can be deployed to different kinds of graph-form data. We are trying to identify other possible applications besides chemistry and material science.”
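What a "grammar" buys a generator can be shown with a toy example. The sketch below is an assumption-laden illustration, not the MIT-IBM model: it uses an invented string grammar with three rules instead of a learned graph grammar, picks rules at random instead of through reinforcement learning, and produces strings that only loosely resemble chemical notation. The rule names and the `expand` and `balanced` helpers are hypothetical.

```python
import random

# Invented production rules: a chain is one atom, an atom followed by
# more chain, or an atom carrying a bracketed side branch.
GRAMMAR = {
    "chain": [["atom"], ["atom", "chain"], ["atom", "branch", "chain"]],
    "branch": [["(", "chain", ")"]],
    "atom": [["C"], ["N"], ["O"]],
}

def expand(symbol, rng, depth=0, max_depth=6):
    """Rewrite a symbol by applying randomly chosen rules until only
    terminal characters remain."""
    if symbol not in GRAMMAR:
        return symbol  # terminal character, emitted as-is
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        options = options[:1]  # force the shortest rule so recursion ends
    rule = rng.choice(options)
    return "".join(expand(s, rng, depth + 1, max_depth) for s in rule)

def balanced(text):
    """Check that every opening bracket has a matching closing bracket."""
    open_count = 0
    for ch in text:
        if ch == "(":
            open_count += 1
        elif ch == ")":
            open_count -= 1
            if open_count < 0:
                return False
    return open_count == 0

for seed in range(5):
    print(expand("chain", random.Random(seed)))
```

Because every string is built by applying rules, it is well-formed by construction: brackets always match and no branch is ever empty. That is the appeal of the grammar idea: the generator cannot "misspell" a structure, so far fewer examples are wasted on learning what is valid in the first place.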
In search of the materials of the future
In addition to the pharmaceutical branch, there is another large sector of the chemical industry that has eagerly embraced artificial intelligence: the search for new materials. For example, in June 2024 the British company Materials Nexus announced that it had launched the first magnet (for use in electric vehicles, wind turbines, etc.) made without rare earths. The material was identified, in fact, thanks to the company's AI platform. And it is only the most recent example.
The search for new materials, like the search for compounds destined to become drugs, has always been a slow and time-consuming process, based on repeated trial and error and expensive testing. But recently, and particularly in the past year, the use of machine learning and deep learning models has brought about an exponential acceleration. To the point that, at the end of November 2023, Google DeepMind announced triumphantly in Nature the discovery of 2.2 million new theoretically stable crystal structures that had never been achieved experimentally: practically the equivalent of 800 years of research work. The magic came from an advanced deep learning tool named GNoME, or Graph Networks for Materials Exploration, which processes input data in the form of graphs, a structure comparable to the bonds between atoms. GNoME did not stop at discovering novel, theoretically synthesisable crystals: it also identified the 381,000 most stable and promising structures for experimentation and made them available to the scientific community on The Materials Project platform. The number is truly extraordinary, considering that, until now, there were “only” 48,000 known inorganic crystals.
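The idea of handing a model "data in the form of graphs" is easier to picture with a small example. The sketch below is not GNoME's architecture and uses a simple molecule rather than a crystal: it stores the heavy atoms of ethanol as nodes and their bonds as edges, then performs one round of the neighbour-summing step that graph networks are generally built around. The labels and helper functions are hypothetical.

```python
# Heavy atoms of ethanol (C-C-O) as a graph; hydrogens left out for brevity.
# Each node carries one feature: the atomic number of the element.
ATOMIC_NUMBER = {"C1": 6, "C2": 6, "O": 8}
BONDS = [("C1", "C2"), ("C2", "O")]

def neighbours(bonds):
    """Turn a list of bonds into a map from each atom to its bonded atoms."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj

def aggregate(features, adj):
    """One message-passing round: each atom adds its neighbours' features
    to its own, so its value starts to reflect its surroundings."""
    return {
        node: features[node] + sum(features[m] for m in adj.get(node, []))
        for node in features
    }

adj = neighbours(BONDS)
print(aggregate(ATOMIC_NUMBER, adj))
```

After one round the two carbon atoms, which started with identical features, already carry different values, because one of them sits next to the oxygen. Real graph networks repeat this kind of step many times with learned weights instead of a plain sum.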
Who is using AI in the chemical industry
From pharmaceuticals to material science, from petrochemicals to agrochemicals, from toxicology to Chemical Risk Assessment, there is no branch of the chemical industry today that is not affected by the AI revolution. DOW Chemicals, for example, uses machine learning models to develop new polyurethane formulas, and DuPont is implementing AI-trained robots to handle hazardous chemical materials. Meanwhile, the biotechnology company Recursion Pharmaceuticals worked closely with AI giant NVIDIA to explore 2.8 million billion combinations of molecules and target proteins in just one week: a quest that would have taken 100,000 years with traditional methods.
According to an estimate by market analysis firm Precedence Research, the market for artificial intelligence tools in the chemical industry is worth 1.4 billion dollars today, but is projected to reach 20.5 billion dollars by 2033. And these investments are expected to pay off: for example, a McKinsey report released in February 2024, which focuses only on generative AI in the pharmaceutical industry, predicts that this technology could generate an annual economic value of between 60 and 110 billion dollars for the industry.
It must be said that although sci-fi applications in the lab spark the most enthusiasm and imagination, chemical companies are beginning to make use of AI in every area of business operations: not only R&D, but also production, engineering, supply chain optimisation, management, and market research. One example is German multinational BASF, which “uses advanced machine learning algorithms tailored to predict product demand,” Suad Sejdovic, the company's Global Head of Analytics & AI, explains. In the field of generative AI, well before the ChatGPT phenomenon exploded, BASF had already implemented “early pilot projects such as QKnows AI, a knowledge platform and retrieval tool for internal reports and external literature for R&D colleagues.” The group also uses AI systems for value chain optimisation, and in the future, Sejdovic adds, “we see significant potential for marketing, sales, and procurement.”
Finally, AI technologies can also be an important help for chemical companies' sustainability goals. BASF, for example, is in the process of digitising its plants to ensure better traceability of materials and identify steps in production processes where carbon emissions can be reduced. In this regard, Sejdovic anticipates, “the new plant under construction in Zhanjiang, China, will be the first designed to be digital from the start and will become the group's site with the lowest carbon footprint.”
Image: Envato