
Insilico Medicine and Saudi Aramco Introduce the “Sanity Pipeline” to Advance AI-driven Discovery of MOFs

In October 2025, the Nobel Prize in Chemistry was awarded to Susumu Kitagawa, Richard Robson, and Omar M. Yaghi for the development of Metal-Organic Frameworks (MOFs), a breakthrough that shifted materials science from "serendipity" to "programmable design." Often described as "Molecular Lego," MOFs – built from inorganic nodes and organic linkers – offer immense structural diversity and highly tunable properties, enabling an exceptionally broad range of applications, including gas storage, molecular separation, catalysis, pollutant removal, water harvesting, energy conversion and storage, sensing, and biomedicine.

The design space for MOFs is vast: it arises from combining diverse inorganic nodes with a nearly limitless set of organic linkers. Traditional workflows, which rely on manual enumeration and trial and error, can no longer navigate this complexity. The rapid development of generative AI and foundation models offers a way forward: by learning from large structural databases, these models can "reverse-engineer" novel structural candidates and identify promising materials far faster than manual approaches.

However, realizing the full potential of generative AI and data-driven approaches in materials science depends on advances in structural validation tools. To address this need, researchers from Saudi Aramco and Insilico Medicine (“Insilico”, 3696.HK), a clinical-stage generative artificial intelligence (AI)-driven biotechnology company, have introduced the Sanity Pipeline, designed to tackle key challenges in structural validity checking for materials science. The work was recently published as a preprint on ChemRxiv.

“Generative AI models trained on large-scale MOF databases face a critical challenge: hidden structural errors in CIF files, including geometric inconsistencies, incorrect connectivity, and chemically implausible oxidation states,” the authors note. “These issues can bias training data, reduce model reliability, and lead to the generation of unrealistic materials. While existing tools address individual aspects of this problem, they do not provide a unified, scalable solution suitable for modern generative workflows.”

The Sanity Pipeline is a multi-level system designed to address this gap. It combines fast pre-filtering with in-depth validation, enabling both high-throughput screening and detailed defect detection. Its core components, jointly developed by Saudi Aramco and Insilico Medicine, include LibCIF, a tool for ultrafast MOF decomposition and structural analysis, and OxiChecker, an oxidation-state anomaly detector. Additionally, the pipeline is complemented by established geometric and crystallographic validation tools, forming a unified and scalable validation cascade.
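
The cascade idea described above, cheap checks first and costlier checks only for structures that survive, can be illustrated with a minimal Python sketch. This is not the Sanity Pipeline's code or API: the structure representation (a plain dict of atoms), the check functions, the 0.5 Å threshold, and every name below are assumptions made for illustration, and the overlap check ignores periodic boundary conditions.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a structure is a dict with an "atoms" list of
# (element, (x, y, z)) tuples in Cartesian angstroms. A real pipeline would
# parse this from a CIF file and account for the unit cell.
Atom = Tuple[str, Tuple[float, float, float]]
Structure = dict
# A check returns an error message, or None if the structure passes.
Check = Callable[[Structure], Optional[str]]


@dataclass
class CascadeResult:
    passed: bool
    failed_stage: Optional[str] = None
    reason: Optional[str] = None


def non_empty(structure: Structure) -> Optional[str]:
    """Cheapest possible pre-filter: reject structures with no atoms."""
    return None if structure.get("atoms") else "structure contains no atoms"


def no_overlaps(structure: Structure, min_dist: float = 0.5) -> Optional[str]:
    """Costlier O(n^2) geometric check: reject atoms closer than min_dist.

    The threshold is illustrative, and periodic images are not considered.
    """
    atoms: List[Atom] = structure["atoms"]
    for i, (el_a, pos_a) in enumerate(atoms):
        for el_b, pos_b in atoms[i + 1:]:
            d = math.dist(pos_a, pos_b)
            if d < min_dist:
                return f"{el_a} and {el_b} are only {d:.2f} A apart"
    return None


# Stages are ordered from cheapest to most expensive.
STAGES: List[Tuple[str, Check]] = [
    ("non_empty", non_empty),
    ("no_overlaps", no_overlaps),
]


def run_cascade(structure: Structure,
                stages: List[Tuple[str, Check]] = STAGES) -> CascadeResult:
    """Run checks in order and stop at the first failure."""
    for name, check in stages:
        reason = check(structure)
        if reason is not None:
            return CascadeResult(False, failed_stage=name, reason=reason)
    return CascadeResult(True)
```

Because the loop stops at the first failing stage, most invalid structures are rejected before the more expensive checks run, which is the property that makes a cascade suitable for high-throughput screening.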

LibCIF enables rapid decomposition of crystalline structures into nodes, linkers, and molecular fragments and performs structural checks. It achieves significant speed improvements compared to existing solutions, enabling real-time processing of large datasets. This makes it particularly suitable for reinforcement-learning pipelines and large-scale generative screening, where rapid feedback is essential.

OxiChecker provides a complementary layer of chemical validation by assessing charge neutrality and oxidation-state consistency. It incorporates multi-path charge assignment strategies, handles ambiguities in conjugated and radical systems, and applies domain-specific rules for common inorganic and organometallic motifs. This enables robust performance even in noisy generative outputs, where conventional tools may struggle due to bond-distance-based typing sensitivity.
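
To make the notion of a charge-neutrality check concrete, the following Python sketch enumerates candidate metal oxidation states and keeps only the combinations that balance the total ligand charge. It is a simplified illustration, not OxiChecker's algorithm: the candidate-state table, the function name, and the assumption that ligand charges are already known are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Illustrative table of common oxidation states; deliberately incomplete.
CANDIDATE_STATES: Dict[str, Tuple[int, ...]] = {
    "Zn": (2,),
    "Cu": (1, 2),
    "Fe": (2, 3),
    "Zr": (4,),
}


def neutral_assignments(metals: Sequence[str],
                        ligand_charge: int) -> List[Tuple[int, ...]]:
    """Return every oxidation-state assignment that gives zero net charge.

    metals: element symbols of the metal atoms in one formula unit.
    ligand_charge: summed formal charge of all non-metal fragments.
    An empty result flags the structure as chemically implausible under
    the assumed candidate states.
    """
    options = [CANDIDATE_STATES[m] for m in metals]
    return [combo for combo in product(*options)
            if sum(combo) + ligand_charge == 0]


# MOF-5, Zn4O(BDC)3: one oxide (2-) and three dicarboxylate linkers (2- each).
mof5 = neutral_assignments(["Zn"] * 4, ligand_charge=-8)
```

For the MOF-5 formula unit the only neutral assignment is Zn(II) on all four metal sites. A mixed-valence case such as two iron sites balancing a ligand charge of -5 yields two assignments, (2, 3) and (3, 2), which hints at why a production tool needs strategies for resolving ambiguous cases.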

By integrating these components into a scalable cascade system, the Sanity Pipeline enables a flexible balance between speed and accuracy, depending on the use case. It supports both real-time screening and high-precision offline analysis, providing foundational infrastructure for next-generation materials discovery and AI-driven design.

The tool is publicly available on GitHub, and the authors hope it will support the research community in advancing the discovery and translation of novel materials into real-world applications. This achievement further strengthens the partnership between Insilico Medicine and Saudi Aramco. Since signing a Memorandum of Understanding (MOU) at the 2023 LEAP Technology Conference, the two organizations have worked together to harness generative AI, driving innovation at the intersection of sustainable development and aging-related science.

Read the preprint, “Sanity and Decomposition Pipeline for Metal-Organic Frameworks in Generative AI”: https://chemrxiv.org/doi/full/10.26434/chemrxiv.15003614/v1