Researchers from Lawrence Livermore National Laboratory (LLNL) and Meta have created an open dataset of atomistic polymer chemistry. The dataset includes millions of quantum-accurate simulations designed to help AI model the complex behavior of plastics, films, batteries and other materials.

Credit: Graphic: Dan Herchek/LLNL; background image: Evan Antoniuk/LLNL
The dataset, called Open Polymers 2026 (OPoly26), lets AI models learn patterns from millions of precomputed polymer structures in hours or days. The work builds on Open Molecules 2025 (OMol25), a dataset led by Meta and Lawrence Berkeley National Laboratory (LBNL) that contains open molecular data aimed at advancing AI-driven chemistry.
A quantum leap for materials science
The OPoly26 dataset contains more than 6 million density functional theory (DFT) calculations on polymeric chemical systems, making it nearly 10 times larger than the next-largest comparable polymer dataset.
By generating critical missing data on polymers, the team aims to expand and democratize open datasets for materials scientists, accelerating the pace of discovery across polymer chemistry.
“This fills a huge gap,” said Evan Antoniuk, an LLNL materials scientist and OPoly26 co-principal investigator. “We see this as a community resource, one that we hope becomes the go-to starting point for anyone interested in performing atomistic simulations of polymers.”
LLNL contributed computational power and polymer domain knowledge to generate the dataset, running simulations to model how the polymers behave under real-world conditions. Meta contributed computational resources to perform 1.2 billion core hours of DFT simulations and to train machine-learned interatomic potential (MLIP) models.
Exascale ambition: 1.2 billion core hours
The researchers used LLNL’s Tuolumne supercomputer to compress years of simulation work into months, allowing the dataset to reach a scale unmatched in polymer science.
“Comprehensive coverage of this chemical space is essential to the success of the OPoly26 dataset,” said LLNL staff scientist Nick Liesen. “We have worked to leverage pipelines that take us from a simple text string to fully atomistic representations of polymer dynamics at scale.”
Meta researchers trained and benchmarked machine-learned interatomic potentials at scale, allowing the team to evaluate how well AI models generalize across small-molecule and polymer chemistry. They found substantial improvements in model accuracy when polymer data were incorporated alongside the small-molecule training sets.
Decoding reactivity and PFAS stability
Understanding why certain polymers, including PFAS-based materials, resist chemical change requires models that can accurately describe both reactive and nonreactive behavior. Capturing this behavior under realistic conditions required careful attention to reactive configurations, according to LBNL chemist and OPoly26 co-PI Sam Blau, who also previously co-led OMol25.
“Reactivity — the breakage and formation of chemical bonds — is central to polymer synthesis, manufacturing, aging and recycling, and to nanoscale patterning of polymer thin films for semiconductor manufacturing,” said Blau.
The paper introduces an initial suite of polymer-specific evaluation tasks to benchmark how effectively these models capture simulated polymer phenomena and interactions such as polymer solvation. Future work will include evaluating MLIP models against experimental measurements to gauge how well they capture real-world polymer properties.
The researchers are releasing OPoly26 under an open license to maximize reuse and reproducibility, making all data publicly available to fuel polymer advancements across academia, industry and government.


