2026.04.09
Teaching AI the Language of Molecules: How MMAI Gym and Liquid Intelligence are Solving the “Brute Force” Crisis in Drug Discovery
Drug discovery is often described as searching for a needle in a haystack, with only a tiny fraction of compounds simultaneously safe, effective, and synthesizable. The reality is far more daunting: it's like finding a needle in a whole galaxy. Chemists estimate the number of potential drug-like molecules at 10⁶⁰, more than the number of atoms in the Milky Way. We have traditionally relied on two things to navigate this virtually infinite landscape: the intuition of master chemists and, more recently, the brute force of massive Artificial Intelligence.
But a problem has emerged in the AI era. As Large Language Models (LLMs) grow to hundreds of billions of parameters, they are becoming "jacks of all trades and masters of none." They can write poetry and code, but they struggle to speak the language of molecules. Simply making models bigger is no longer yielding the scientific breakthroughs we need.

A new paper published at ICLR 2026 by researchers from Insilico Medicine and Liquid AI introduces a paradigm shift. Instead of building a bigger brain, they built a smarter, faster, and more specialized one. By introducing the MMAI (Multi-Modal AI) Gym for Science, they have trained a compact Liquid Foundation Model (LFM) that reasons through the fundamental logic of chemistry.
The Science Gap:
Why General AI Fails in the Lab
While models like GPT-4 or Llama are impressive, they are essentially "tourists" in the world of drug discovery. They rely on in-context learning, which doesn't reliably deliver the precision required for tasks like predicting molecular properties and toxicities or planning a complex chemical synthesis. Even models trained on specialized chemistry data fall short when asked to generalize patterns from their training data to entirely new situations, such as designing a new chemotype for a small-molecule drug or characterizing how drug molecules interact with a previously untargeted protein.

The industry has reached a crossroads: do we continue to spend hundreds of millions of dollars on brute force scaling, or do we teach AI to think like a scientist?

The team at Insilico Medicine and Liquid AI chose the latter, developing MMAI Gym for Science as a structured training and benchmarking environment designed to provide foundation models with a multi-stage curriculum in medicinal chemistry, biology, and clinical development.
The Idea Behind MMAI Gym:
Teaching AI to "Think"
The MMAI Gym treats drug discovery not as a text-prediction problem, but as a series of domain-specific reasoning tasks. It features over 200 specialized tasks across 2D and 3D molecular structures, protein sequences, and drug-gene interactions.

Key innovations within the MMAI Gym include:
  • Reasoning Traces: Using Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT), models are taught to generate "thinking" chains. This forces the AI to work through chemical plausibility step by step before providing an answer.
  • Chemical Format Augmentation: The model translates molecular structures between several molecular "languages" (SMILES, SELFIES, plain-language IUPAC names), ensuring it understands the underlying concept of a molecule rather than one particular notation, and avoiding over-fitting to any specific naming format.
  • Liquid Architecture: By utilizing the LFM2-2.6B model, the team moved away from traditional softmax-attention language models, whose cost grows quadratically as inputs get longer and more complex. Instead, they used a hybrid design that is significantly faster and more memory-efficient, preserving the lightweight, computationally efficient character of the original foundation model.
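The format-augmentation idea above can be sketched in a few lines: each labelled molecule is duplicated across equivalent string representations, all paired with the same answer, so the model cannot latch onto one notation. This is an illustrative sketch, not the paper's pipeline; the aspirin strings are standard representations, but the `make_augmented_examples` helper and the property question are hypothetical.

```python
# Illustrative sketch of chemical format augmentation (not the paper's code).
# One molecule is written in several equivalent "languages"; each variant is
# paired with the same ground-truth label so the model learns the molecule,
# not the notation.

# Equivalent representations of aspirin (acetylsalicylic acid).
ASPIRIN = {
    "canonical SMILES": "CC(=O)Oc1ccccc1C(=O)O",
    "alternative SMILES": "OC(=O)c1ccccc1OC(C)=O",  # same molecule, different atom order
    "IUPAC name": "2-acetyloxybenzoic acid",
}

def make_augmented_examples(representations: dict, label: str) -> list[dict]:
    """Expand one labelled molecule into one training example per format."""
    return [
        {
            "prompt": f"Molecule ({fmt}): {text}. Is it likely to permeate cell membranes?",
            "answer": label,
        }
        for fmt, text in representations.items()
    ]

examples = make_augmented_examples(ASPIRIN, "yes")
for ex in examples:
    print(ex["prompt"])
```

A production pipeline would generate these variants automatically with a cheminformatics toolkit rather than hard-coding them, but the training signal is the same: one concept, many surface forms.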

The MMAI Gym trained the LFM2-2.6B model on a wide variety of datasets, developed both in-house and externally, covering the data types that inform different stages of the drug discovery process: 2D and 3D compound and protein structures, libraries of chemical properties, and drug discovery benchmarking systems. Evaluation deliberately emphasized accuracy on situations outside the direct training sets, to increase generalizability. And to make sure the model doesn't form spurious or over-fitted associations between input formats and ground-truth results, training and testing were repeated across combinations of different prompt wording templates, ensuring the model's responses stay accurate over a real-world diversity of prompt formulations.
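The prompt-template rotation can be sketched as follows: the same underlying question is rendered through several wording templates, and the model is trained and scored across all of them. The templates here are invented for illustration; the paper's actual wording is not reproduced.

```python
# Hypothetical wording templates for one task (aqueous solubility); the
# paper's real templates are not shown here.
TEMPLATES = [
    "Predict the aqueous solubility of {smiles}.",
    "Given the molecule {smiles}, how soluble is it in water?",
    "What is the water solubility of the compound {smiles}?",
]

def render_variants(smiles: str) -> list[str]:
    """Render one ground-truth example through every wording template."""
    return [t.format(smiles=smiles) for t in TEMPLATES]

# A robust model should answer consistently across all three phrasings.
variants = render_variants("CCO")  # ethanol
for v in variants:
    print(v)
```

If a model's answer changes when only the phrasing changes, it has learned the template rather than the chemistry, which is exactly the failure mode this rotation is designed to expose.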
David vs. Goliath:
2.6 Billion Beats 27 Billion
The true test of any AI is performance. The researchers compared their 2.6-billion-parameter Liquid model against much larger general-purpose LLMs and LLMs designed specifically for chemistry tasks.
The results, presented in the ICLR paper, challenge the status quo:

  • ADMET Superiority: In the Therapeutics Data Commons (TDC) benchmark, the LFM2-2.6B-MMAI model outperformed TxGemma-27B—a model ten times its size—on multiple critical safety and pharmacokinetics tasks.
  • Molecular Optimization: In the MuMO-Instruct benchmark, which assesses the ability to keep a molecule’s overall structure similar while optimizing properties like blood-brain barrier permeability, solubility, or lipophilicity, the Liquid model achieved state-of-the-art success rates, outperforming even the most established proprietary models.
  • Retrosynthesis Mastery: In planning how to build a molecule from scratch, the model’s performance jumped from near-zero to elite levels, matching top-tier proprietary and specialist chemistry models.

"We have demonstrated that specialist-level performance in drug discovery does not require frontier-scale model size," the researchers noted. "With the right data and training procedures, a smaller, efficient model can outthink a giant."
The Future: A New Era of Liquid Intelligence
The implications of this work extend far beyond rearranging the leaderboard. By proving that intelligent efficiency can beat brute force, Insilico and Liquid AI are opening the door to a more sustainable and accessible future for biotech.

Smaller, faster models mean that high-level drug discovery AI can be run on-site in smaller laboratories, accelerating the journey from a digital concept to a life-saving drug. The MMAI Gym for Science serves as the blueprint for this transition, allowing any foundation model to be transformed into a powerful drug discovery specialist.

As we look toward the future of longevity and personalized medicine, the ability to rapidly and efficiently navigate chemical space is the ultimate competitive advantage. The era of the Brute Force AI is ending; the era of the Scientific Specialist has begun.
Stay tuned, and follow us on social media!