Take Your AI Model to the MMAI Gym for Science
What is the MMAI Gym?
MMAI — Multi-Modal AI Gym
Insilico Medicine’s MMAI Gym is a training ground for LLMs that leverages our proprietary data, reasoning datasets, and validated models to boost a model’s intelligence on drug discovery and development tasks.
How can we work together?
1. Improve your own model’s drug discovery “fitness levels” at the MMAI Gym
Upgrade your proprietary LLM into a best-in-class foundation model for drug discovery tasks using our training regime.
Subscribe for a three-month training session and watch your model’s benchmark performance improve.
Experimentally validate the output with our automation lab.
If you don’t have your own model, no problem! We’re also training open-source foundation models with the AI Gym.

2. Build “Smart” Agentic Workflows for drug discovery using our Pharmaceutical Super Intelligence (PSI) Models
These models will have validated Chemical (CSI) and Biological (BSI) superintelligence that you can use to build workflows for drug discovery and development.

LLM Training Routine
Boost your LLM’s drug discovery intelligence

Case Study
MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery
We use the MMAI Gym to train an efficient Liquid Foundation Model (LFM) for drug discovery applications, demonstrating that smaller, purpose-trained foundation models can outperform substantially larger general-purpose or specialist models on molecular benchmarks.
Publications
Advancing Target Discovery Through Disease-Specific Integration of Multi-Modal Target Identification Models and Comprehensive Target Benchmarking System
We present a unified platform combining machine learning-based target identification with comprehensive benchmarking. As a testbed, we developed Target Identification Pro (TID-Pro), a disease-specific model spanning 38 diseases across oncology, metabolic, immune, fibrotic, and neurological categories. TID-Pro shows strong predictive performance for clinical-stage targets and reveals disease-specific patterns, underscoring the need for disease-tailored target identification models.
Read Paper
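As a rough illustration of how disease-specific target predictions can be benchmarked against clinical-stage targets, the sketch below scores a ranked target list per disease with a recall@k metric. The metric choice, function names, and toy inputs are illustrative assumptions, not TID-Pro’s actual benchmarking system.

```python
from typing import Dict, Sequence

def recall_at_k(ranked_targets: Sequence[str], clinical_targets: set, k: int = 50) -> float:
    """Fraction of known clinical-stage targets recovered in the top-k predictions."""
    if not clinical_targets:
        return 0.0
    top_k = set(ranked_targets[:k])
    return len(top_k & clinical_targets) / len(clinical_targets)

def benchmark_by_disease(predictions: Dict[str, Sequence[str]],
                         clinical_stage: Dict[str, set],
                         k: int = 50) -> Dict[str, float]:
    """Score a disease-specific target-identification model, one score per disease."""
    return {disease: recall_at_k(ranked, clinical_stage.get(disease, set()), k)
            for disease, ranked in predictions.items()}

# Toy inputs for illustration only -- not TID-Pro outputs or curated clinical data.
preds = {"NSCLC": ["EGFR", "KRAS", "ALK", "TP53"], "T2D": ["GLP1R", "INSR", "PPARG"]}
clinical = {"NSCLC": {"EGFR", "ALK"}, "T2D": {"GLP1R"}}
print(benchmark_by_disease(preds, clinical, k=3))
```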
The End of Aging Clocks: Training Foundation Models to Reason in Aging and Longevity
Here we report Longevity-LLM v0.1, a Qwen3-14B model fine-tuned with supervised and reinforcement learning on DNA methylation, proteomics, clinical biomarker, and RNA expression data. Longevity-LLM ranks highly on the recently announced Longevity Bench, on tasks including cancer survival prediction and RNA- or proteome-based age prediction. After reinforcement fine-tuning, the model achieves a 4.34-year mean absolute error (MAE) in epigenetic age prediction, surpassing the Horvath multi-tissue clock.
Read Paper
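For readers unfamiliar with the age-prediction metric cited above, the minimal sketch below computes MAE in years between predicted and chronological ages. The arrays are made-up placeholder values, not data from Longevity-LLM or Longevity Bench.

```python
import numpy as np

def age_prediction_mae(predicted_ages, chronological_ages) -> float:
    """Mean absolute error in years between predicted and chronological ages."""
    predicted = np.asarray(predicted_ages, dtype=float)
    actual = np.asarray(chronological_ages, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))

# Illustrative values only.
predicted = [52.1, 67.8, 34.5, 71.2]
actual = [48.0, 70.0, 31.0, 75.0]
print(f"MAE: {age_prediction_mae(predicted, actual):.2f} years")
```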
When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs
We propose a new benchmarking framework for single-step retrosynthesis that evaluates both general-purpose and chemistry-specialized LLMs using ChemCensor, a novel metric for chemical plausibility. By emphasizing plausibility over exact match, this approach better aligns with human synthesis planning practices. We also introduce CREED, a novel dataset comprising millions of ChemCensor-validated reaction records for LLM training, and use it to train a model that outperforms the LLM baselines on this benchmark.
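To make the exact-match versus plausibility distinction concrete, here is a minimal sketch of scoring a single-step retrosynthesis prediction both ways. The `is_chemically_plausible` function is a hypothetical stand-in for a ChemCensor-style validator, whose actual definition is not given here.

```python
from typing import Callable, Sequence

def score_retrosynthesis(predicted_precursors: Sequence[str],
                         reference_precursors: Sequence[str],
                         plausibility_check: Callable[[str], bool]) -> dict:
    """Score a prediction by exact match against the recorded precursor set,
    and by the fraction of predicted precursors judged chemically plausible."""
    exact = set(predicted_precursors) == set(reference_precursors)
    plausible = [p for p in predicted_precursors if plausibility_check(p)]
    return {
        "exact_match": exact,
        "plausibility_rate": len(plausible) / max(len(predicted_precursors), 1),
    }

# Hypothetical placeholder check, for illustration only.
def is_chemically_plausible(smiles: str) -> bool:
    return len(smiles) > 0

print(score_retrosynthesis(["CCO", "CC(=O)Cl"], ["CCO", "CC(=O)O"], is_chemically_plausible))
```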
Let's work together