2026-04-22 20:00

Nature Reviews Drug Discovery | Target Identification and Assessment in the Era of AI

CAMBRIDGE, Mass., April 22, 2026 — Insilico Medicine (“Insilico”, 3696.HK), a clinical-stage generative artificial intelligence (AI)-driven biotechnology company, recently published a comprehensive review that emphasizes key considerations in target selection, summarizes breakthroughs in the application of AI-driven approaches for therapeutic target exploration, and highlights clinical-stage successes in which AI played a pivotal role in target identification.

The review, titled "Target identification and assessment in the era of AI", was published in Nature Reviews Drug Discovery (https://doi.org/10.1038/s41573-026-01412-8), a leading journal in the pharmaceutical and biomedical research community known for its peer-reviewed strategic roadmaps and in-depth analysis of emerging trends and breakthroughs in drug discovery and development.

Target identification is the first and perhaps most critical step in drug discovery and development. Although the human genome contains roughly 20,000 protein-coding genes, only about 4,500 are considered “druggable,” and all approved drugs to date act on only 716 distinct targets. Because it is difficult to identify a therapeutic target that both effectively treats disease and can be safely modulated, traditional methods of developing a therapeutic hypothesis can take anywhere from months to decades. Today, artificial intelligence (AI) is reshaping this process, turning what was once a series of chance discoveries into a systematic, data-driven science.

The Criticality of Target Selection

Target identification involves selecting a biological molecule that can be modulated to achieve a desired therapeutic effect with sufficient safety. Because this decision sets the direction for all research that follows, it strongly influences both the probability of success and the time and resources required in subsequent stages of drug development.

Key considerations for selecting an optimal drug target mentioned in the review article include:

  • Therapeutic Hypothesis: Defining the mechanism by which target modulation affects disease biology.
  • Druggability and Safety: Assessing whether the target can be effectively modulated by a drug and evaluating the risk of on-target and off-target adverse effects.
  • Commercial Tractability—Novelty or Confidence: Balancing groundbreaking "first-in-class" opportunities with the relative safety of "best-in-class" established targets.
  • Combination Value: Evaluating whether the target can enable synergistic or differentiated combination strategies aligned with clinical and commercial therapeutics.

Despite decades of progress in our understanding of human biology, traditional target discovery remains challenging. Many diseases are driven by complex, incompletely understood mechanisms, and technical and resource constraints make it difficult to extract necessary insights from sources such as human genomics and disease models. The integration of machine learning allows researchers to navigate biological complexity by uncovering previously unknown disease-associated targets.

Harnessing Multimodal Data for Target Discovery

The cornerstone of AI's application in identifying therapeutic targets is its capacity to process and analyze a wide array of complex multimodal data. To navigate the intricate biological networks of disease, researchers utilize diverse data sources that bridge the gap between molecular activity and clinical outcomes.

At the molecular and cellular levels, AI platforms leverage 'omics' data—encompassing genomics, transcriptomics, proteomics, metabolomics, and epigenetics—to construct comprehensive molecular profiles of diseases. By integrating these layers, machine learning models can identify disease-causing variants and uncover fundamental biological mechanisms that might otherwise remain hidden due to complex or context-specific genetic interactions and the scarcity of large, high-depth clinical datasets. This is complemented by cellular imaging data, where advanced machine learning models like Convolutional Neural Networks (CNNs) can automatically detect subtle morphological changes in cells—such as alterations in the mitochondria or cytoskeleton—facilitating target identification through high-content phenotypic screening.

Beyond that, AI systems utilize structured biological knowledge graphs and real-world clinical data to generate target hypotheses. Knowledge graphs organize intricate relationships between proteins, genes, pathways, and diseases, allowing graph neural networks (GNNs) and inference methods to predict novel interactions and synthetic lethality. Meanwhile, clinical and phenotypic data—such as electronic health records, clinical trial outcomes, and medical imaging—provide critical insights into patient demographics, disease progression, and individualized responses, helping to bridge the gap between genetic findings and clinical reality.
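
The knowledge-graph idea described above can be illustrated with a minimal sketch. It is not a graph neural network and not any method from the review: it stores relationships as (subject, relation, object) triples and ranks genes by how many graph neighbors they share with a disease, a simple common-neighbor heuristic standing in for learned link prediction. All entity names, relation names, and function names are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; every name is a placeholder.
TRIPLES = [
    ("GENE_A", "participates_in", "PATHWAY_X"),
    ("GENE_B", "participates_in", "PATHWAY_X"),
    ("GENE_B", "participates_in", "PATHWAY_Y"),
    ("GENE_C", "participates_in", "PATHWAY_Z"),
    ("PATHWAY_X", "implicated_in", "DISEASE_D"),
    ("PATHWAY_Y", "implicated_in", "DISEASE_D"),
]


def build_neighbors(triples):
    """Build an undirected adjacency map, ignoring relation types."""
    neighbors = defaultdict(set)
    for subject, _relation, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    return neighbors


def rank_genes_for_disease(triples, disease, genes):
    """Score each gene by the number of neighbors it shares with the disease."""
    neighbors = build_neighbors(triples)
    scores = {g: len(neighbors[g] & neighbors[disease]) for g in genes}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_genes_for_disease(
    TRIPLES, "DISEASE_D", ["GENE_A", "GENE_B", "GENE_C"]
)
print(ranking)
```

In this toy graph, GENE_B ranks first because it connects to the disease through two pathways. A GNN would replace the hand-written count with scores learned from the graph structure.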

AI also excels in mining unstructured text-based information to evaluate both the scientific validity and commercial viability of potential targets. By processing vast repositories of scientific literature, grant funding allocations, patents, and regulatory submissions, AI tools can track research trends, identify hidden gene-disease linkages, and gauge the competitive landscape. To maximize the value of these highly diverse sources, researchers are increasingly employing integration strategies, such as building heterogeneous knowledge graphs or unified data warehouses, which harmonize these datasets so they can be seamlessly analyzed by advanced AI frameworks.

The Algorithmic Engine: AI Models for Target Discovery

In this review, the authors detail the diverse machine learning frameworks that form the backbone of modern target discovery. These computational engines allow researchers to move beyond traditional observation, transforming vast biological datasets into actionable, prioritized therapeutic hypotheses.

At the core of these efforts are supervised learning algorithms, which utilize labeled data—such as confirmed drug-target pairs—to predict novel interactions or prioritize causal disease genes from genomic loci. Notable examples that Insilico Medicine has pioneered include the PandaOmics platform, which integrates multi-omic and published text data sources to nominate disease targets; GeroScope, which identifies potential targets implicated in aging from gene expression profiles; TargetPro, which learns features characteristic of clinical-stage targets; and various deep learning (DL)-based mode-specific algorithms that identify aging and disease biomarkers and targets from patients' blood chemistry, DNA methylation, gut microbiome, or transcriptome data.
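
As a rough illustration of supervised learning on labeled examples, the sketch below trains a nearest-centroid classifier on a handful of labeled feature vectors and then labels unseen candidates. It does not represent PandaOmics, GeroScope, TargetPro, or any other platform named above. The two features, the training values, and all function names are assumptions made purely for illustration.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit(examples):
    """Learn one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: squared_distance(model[label], features))


# Hypothetical training set: two made-up features per candidate target,
# label 1 = confirmed drug-target pair, label 0 = not confirmed.
TRAIN = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.1, 0.2], 0),
    ([0.2, 0.1], 0),
]

model = fit(TRAIN)
likely = predict(model, [0.7, 0.7])
unlikely = predict(model, [0.3, 0.2])
print(likely, unlikely)
```

Production systems would use far richer features and models, but the pattern is the same: known labels guide the scoring of unlabeled candidates.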

In contrast, unsupervised and semi-supervised learning methods identify hidden biological structures within unlabeled or partially labeled datasets. These techniques are frequently employed to identify disease-associated modules within gene networks, extract feature similarities from protein-protein interactions, and predict the potential druggability of candidate targets.

Advanced deep learning architectures rely heavily on representation learning, a process that encodes diverse biological entities—ranging from microscopy images to amino acid sequences—into high-dimensional numerical vectors, or embeddings. These embeddings capture complex, contextual biological properties that facilitate a wide array of downstream tasks. Building on these representations, Graph Neural Networks (GNNs) exploit the inherent structure of biological graphs to predict multi-gene interactions, such as synthetic lethality, or to identify specific target combinations capable of reversing complex disease phenotypes.
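
To make the notion of embeddings concrete, the sketch below compares hand-written vectors using cosine similarity, a common way to measure how close two embeddings are. In practice the vectors would be produced by a trained model and would have hundreds or thousands of dimensions. The three-dimensional values, the entity names, and the `nearest` helper here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings; a real model would learn these from data.
EMBEDDINGS = {
    "PROTEIN_A": [1.0, 0.0, 1.0],
    "PROTEIN_B": [2.0, 0.0, 2.0],
    "PROTEIN_C": [0.0, 1.0, 0.0],
}


def nearest(query, embeddings):
    """Return the entity whose embedding is most similar to the query's."""
    others = {name: vec for name, vec in embeddings.items() if name != query}
    return max(
        others,
        key=lambda name: cosine_similarity(embeddings[query], others[name]),
    )


closest = nearest("PROTEIN_A", EMBEDDINGS)
print(closest)
```

Here PROTEIN_B is returned because its vector points in the same direction as PROTEIN_A's, while PROTEIN_C is orthogonal to both. Downstream tasks such as clustering or interaction prediction build on this kind of geometric comparison.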

The newest frontiers in the field involve generative AI and foundation models, which are pre-trained on massive datasets to capture gene network dynamics. Transformer models like Geneformer and scGPT, pre-trained on tens of millions of single-cell transcriptomes, enable researchers to simulate cellular responses to genetic perturbations and pinpoint crucial disease drivers with high precision. Emerging "Life Models" like the PreciousGPT series generate synthetic multi-omics data to facilitate target discovery.

Finally, domain-specific Large Language Models (LLMs) and AI agent frameworks, such as BioGPT and OriGene, act as "virtual biologists". By mimicking human expert reasoning, these agents can process vast repositories of biomedical literature, synthesize information across disparate databases, and autonomously generate and refine therapeutic hypotheses.

Clinical Proof Points

For the treatment of idiopathic pulmonary fibrosis (IPF), Insilico used PandaOmics, the target-identification engine within its end-to-end generative AI platform Pharma.AI, to prioritize TNIK as a novel therapeutic target. Chemistry42, the platform’s generative chemistry engine, then enabled the design of Rentosertib (ISM001-055), a potentially first-in-class small-molecule TNIK inhibitor. Notably, the program progressed from project initiation to preclinical candidate nomination in approximately 18 months, and Rentosertib has completed a Phase IIa clinical trial, demonstrating a favorable safety profile and dose-dependent improvements in forced vital capacity in patients.

The Future: AI-Driven Closed-Loop Platforms

The review concludes that the future of target discovery depends on overcoming persistent industry challenges: addressing data quality and availability, developing explainable AI models, establishing standardized metrics and benchmarking frameworks, utilizing synthetic data and digital twins, and deploying AI-driven closed-loop platforms. In this emerging paradigm, AI nominates targets, automated robotic labs execute experiments, and the resulting biological data is fed back into the models to refine the search. By merging computational power with experimental validation, the industry can significantly accelerate the delivery of effective, clinically actionable therapies to patients.
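
The nominate, test, and feed-back cycle described above can be sketched as a simple loop. This is a toy illustration under stated assumptions, not a description of any real platform. The "model" is just a running average of observed results per pathway, the "lab" is a lookup table of made-up outcomes, and every target, pathway, and function name is hypothetical.

```python
# Hypothetical candidate targets and the pathway each belongs to.
CANDIDATES = {"T1": "P_X", "T2": "P_X", "T3": "P_Y", "T4": "P_Y"}

# Made-up experimental outcomes, hidden from the model until "measured".
HIDDEN_OUTCOMES = {"T1": 0.1, "T2": 0.2, "T3": 0.9, "T4": 0.8}


def simulated_lab(target):
    """Stand-in for an automated experiment; returns a made-up readout."""
    return HIDDEN_OUTCOMES[target]


def estimate(target, candidates, observed, prior):
    """Predict a target's score from results seen so far in its pathway."""
    same_pathway = [
        result
        for tested, result in observed.items()
        if candidates[tested] == candidates[target]
    ]
    return sum(same_pathway) / len(same_pathway) if same_pathway else prior


def closed_loop(candidates, run_experiment, rounds, prior=0.5):
    """Nominate the best untested target, test it, and feed the result back."""
    observed = {}
    for _ in range(rounds):
        untested = sorted(t for t in candidates if t not in observed)
        if not untested:
            break
        pick = max(
            untested, key=lambda t: estimate(t, candidates, observed, prior)
        )
        observed[pick] = run_experiment(pick)  # feedback step
    return observed


results = closed_loop(CANDIDATES, simulated_lab, rounds=3)
print(list(results))
```

After the first weak result in pathway P_X, the loop shifts its remaining experiments to P_Y, which is the behavior the closed-loop paradigm aims for: each experiment changes what is tested next.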