A breakthrough milestone in AI-powered drug discovery reached: linking biology and chemistry with AI

2021.02.24

AI-generated novel molecule for a novel target discovered with AI demonstrated efficacy in a broad therapeutic area and reached preclinical candidate stage in Idiopathic Pulmonary Fibrosis (IPF)

Insilico Medicine Documentary: A breakthrough milestone in AI-powered drug discovery reached

Drug discovery is one of the most complex, risky, and lengthy areas of human development. It takes many highly intelligent and highly skilled experts in biology, chemistry, and medicine to discover a drug. This process takes decades, costs billions of dollars, and fails over 90% of the time. There are very few truly novel drugs on the market. In 2020, the FDA approved 53 novel drugs, one of the highest annual totals on record. Many of these drugs were small molecules that modulated the function of well-known molecular targets. Discovering a novel molecule for a novel target for a broad disease indication is extremely rare.

Today we announce the results of a new study that demonstrates how artificial intelligence can transform the discovery of medicine. For the first time, using many interconnected deep learning models and other advanced AI approaches, we managed to link biology and chemistry to discover a novel biological target and generate a novel small molecule for a very important disease: Idiopathic Pulmonary Fibrosis (IPF). We performed all the required human patient cell, tissue, and animal validation experiments to claim a first-in-class preclinical candidate for a novel pan-fibrotic target, currently in preparation for clinical development. Insilico Medicine made both discoveries in a fraction of the time and cost of traditional pharmaceutical research workflows, but most importantly, it succeeded in a process where the probability of failure in biology and chemistry is over 90%. To reach the preclinical candidate, we designed and synthesized under 80 molecules and achieved unprecedented hit rates, with several molecules at the preclinical candidate level.

Short explainer video: Insilico Medicine Achieves Industry-First Nominating Preclinical Candidate

Our AI system, including its biology solving engine PandaOmics and its compound-generating engine Chemistry42, is built on years of modeling large biological, chemical, and textual datasets, going back to our pioneering work on generative adversarial networks (GANs) for drug design in 2016. The novel target identified by PandaOmics represents a significant breakthrough and is relevant for a broad range of fibrosis indications.

Our Chemistry42 platform used this newly discovered target as the basis for the structure-based design of a first-in-class novel small molecule inhibitor. This small molecule showed outstanding efficacy for IPF and a good safety profile, which led to its nomination as a preclinical drug candidate in December 2020 for IND-enabling studies. The phase I clinical trial for the novel drug candidate is planned for December 2021.

Our AI system has revealed a novel wide-indication target and a corresponding drug candidate in under 18 months and at roughly 1/10th of the typical cost associated with similar programs.

The Productivity Curse of Drug Discovery

Bringing a new drug to market is a complex and resource-consuming process that can cost pharmaceutical companies an average of $2.6 billion and take up to a decade of research and development.

The process starts with forming a hypothesis about the disease in question, typically by identifying a malfunctioning protein that leads to the disease or pathology. Proteins are the workhorses of our body. They perform most of the biological tasks required for our survival, from synthesizing molecules and mediating signals between cells and tissues to fighting infections. Most diseases involve the malfunctioning of one or several proteins at some point: they might have the wrong shape or chemical composition. These errors can lead to mistakes in biochemical reactions and, as a result, systemic damage to the body. Even a slight variation in any given protein may cause severe consequences or even death. The protein believed to be playing a role in disease development and progression is called a "target."

Once the target is identified, intensive follow-on research must be conducted to prove that the choice was correct, a process called "target validation." This work includes various studies, ranging from solving the crystal structure of the target protein to confirming its association with the disease in question. Establishing the association between a target and a disease is a crucial step in drug discovery that can determine the success or failure of the entire program. While every effort is made to understand the target's role in a disease, whether the choice was correct becomes fully clear only years later, during clinical trials in humans.

Target identification and validation are followed by discovering ways to influence the malfunctioning protein, typically by switching it off or changing its activity. This stage is done by medicinal chemistry departments or specialized contract research companies and involves large-scale screening programs where thousands or millions of chemical compounds are tested to see if they can influence the target in a beneficial way. The molecules with acceptable activity are called "hits." Many of these hits turn out to be false positives, and only a small number are eventually confirmed and selected to become "leads."

While lead compounds show significant activity towards the target of interest, they still need to be optimized for other crucial parameters — metabolic stability, safety, bioavailability and other properties. After all, there is no use in an active substance if it does not reach the target protein efficiently or if it targets multiple unrelated proteins in the body leading to unwanted side effects (toxicity).

The culmination of the lead optimization process is a molecule or a set of molecules ready for preclinical studies. Such molecules are then tested in animals (in vivo) to see if they keep acting predictably in a living organism. If the lead activity and safety are confirmed in animal tests, the molecule is finally nominated as a drug candidate. From there, it can proceed to Investigational New Drug (IND)-enabling studies, the final step in the preclinical drug discovery process. IND-enabling studies are a prerequisite for a molecule to be accepted by regulatory authorities (e.g., the Food and Drug Administration (FDA) in the USA) for clinical development in humans.

The clinical stage is a new level of commitment that involves high costs, risks, and strict compliance requirements — since actual human lives are at stake at this point. Even though drug developers make a huge effort to ensure the quality of drug candidates, tragic accidents happen in clinical trials, leading to the death of patients because of unpredictable side effects or unknown biological factors. Such cases lead to immediate program terminations and enormous losses for drug makers.

Much as the above process resembles gambling in a casino, some molecules occasionally manage to pass all preclinical and clinical hurdles and get to market. At that point, they become medicines that are prescribed by physicians.

The enormous cost of drug discovery is the result of expensive research equipment, facilities, and talent as well as the high failure rate of costly clinical trials — up to 90% of all projects never translate to market for various reasons.

Such low productivity in pharmaceutical research is primarily caused by the immense complexity of biological systems and our limited understanding of how nature works. However, other significant factors are sub-optimal research and development (R&D) processes, complex and cumbersome workflows at large pharma organizations, and a significant disconnect between the various stages of the drug discovery process. While biology research is done by one company, the chemical stage is often led by another department or even a different company. This is followed up by clinical studies conducted by yet another department or organization. Transitions between stages, such as the one between target validation and hit discovery, become graveyards for many brilliant ideas, breakthroughs, and dollars spent.

A paradigm shift that increases the productivity of pharmaceutical innovation is long overdue. The adoption of artificial intelligence at scale can bring about this change.

How Can AI Help?

“The pinnacle of the deep learning revolution can be pinned to 2014, when deep learning systems started outperforming humans in image recognition and generative adversarial networks were invented. It is also the year when we started the company. In 2016 we demonstrated that a deep learning system can identify a novel biological target from omics data with experimental validation. From 2017 to 2019 we consistently demonstrated that generative AI can invent and design novel molecules that work in human cells and in animals.”

Alex Zhavoronkov, CEO of Insilico Medicine

It is known that artificial intelligence thrives on data, especially on big datasets of high quality. Fortunately, there is a lot of data generated at each step of the drug discovery process, making it a lucrative application for modern AI technologies.

The application of such technologies has been shown to be beneficial at nearly every step of the drug discovery process, especially at the hypothesis generation and target identification stage. Deep learning models and natural language processing technologies are particularly effective at modeling large, complex, multi-dimensional data sets, such as genomics, proteomics, clinical data, structural data about targets, and unstructured text (research papers, patents, grants, etc.).

AI platforms that deploy virtual screening and the de novo generation of molecules have demonstrated the power of deep neural nets as tools for intelligent hit discovery. In this context, generative adversarial networks (GANs) are especially noteworthy, as we showed in our pioneering work back in 2016. Since 2016 we have published dozens of research papers covering both generative biology and generative chemistry.

AI is also used in lead optimization and preclinical studies, and it helps design and run clinical trials and predict their outcomes.

There are now hundreds of drug discovery companies building AI models for various tasks and demonstrating the substantial benefit of this new approach. However, a true paradigm shift can only be achieved when AI is used to connect the dots between the various stages of drug discovery and build an end-to-end system from hypothesis to preclinical and clinical stages.

Insilico Medicine's work on building the most comprehensive AI-driven drug discovery platform over the years resulted in a new integrated research process where data and knowledge flow seamlessly from one stage of the process to another, leading to a rapid and cost-efficient workflow. We are proud to contribute to solving the drug discovery productivity curse by eliminating the disconnect between various stages of drug discovery, linking biology and chemistry into one unified data-driven workflow.

Our team spent years building and integrating hundreds of AI models, each responsible for a particular task, into one platform that is able to generate hypotheses, select targets, generate compounds, and predict clinical trial outcomes. To our knowledge, this is the most comprehensive drug design platform on the market.

Solving Fibrosis with Deep Learning

Our first work in target discovery using deep neural networks dates back to 2015-2016, when we collaborated extensively with pharmaceutical and biotechnology companies as well as academic institutions to invent and test new ways to understand human biology. We built the first deep neural network-based system to predict human age using tissue-specific transcriptomic, proteomic, and other data types. Our first published work in target discovery with basic experimental validation, "Use of deep neural network ensembles to identify embryonic-fetal transition markers: repression of COX7A1 in embryonic and cancer cells," was carried out with BioTime (now AgeX Therapeutics). Since then, we have built over one hundred different models that use different approaches to target discovery, incorporating best practices and centuries of human knowledge and experience.

Here we share our latest study results, where we applied our end-to-end AI platform to tackle Idiopathic Pulmonary Fibrosis (IPF). IPF is a chronic condition that is limited to the lungs and primarily affects older adults. As the disease progresses, the health of the patient gradually deteriorates, leading to a potentially life-threatening condition. Fibrosis is one of the main aging-associated disease processes, and we can use deep neural networks trained on age and different types of fibrosis to identify a range of targets.

To build our initial hypothesis, we trained deep neural networks on a collection of omics and clinical datasets to predict tissue-specific fibrosis, as well as the age and sex of the patients. We then applied a range of target discovery tools implemented in our PandaOmics target discovery system, performed gene and pathway scoring using an approach published in Nature Communications, and identified relevant targets via deep feature selection, causality inference, and de novo pathway reconstruction. Target novelty and disease association were scored by a natural language processing (NLP) engine, which analyzes millions of data files, including patents, research publications, grants, and databases of clinical trials. As a result, PandaOmics revealed 20 targets for validation, which we narrowed down to one novel intracellular target that was prioritized for further analysis.
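The idea of combining several kinds of evidence with a novelty score to shortlist targets can be illustrated with a toy example. This is only a sketch under stated assumptions: the score names, the weights, the gene labels, and the weighted-sum rule are all hypothetical and are not taken from PandaOmics, whose actual scoring is not described here.

```python
from dataclasses import dataclass


@dataclass
class TargetScores:
    """Hypothetical per-target scores, each scaled to the range 0..1."""
    gene: str
    omics_evidence: float    # strength of the disease signal in omics data
    pathway_evidence: float  # involvement in disease-relevant pathways
    novelty: float           # 1.0 = rarely mentioned in papers, patents, grants


# Illustrative weights only; they are assumptions, not PandaOmics settings.
WEIGHTS = {"omics_evidence": 0.4, "pathway_evidence": 0.3, "novelty": 0.3}


def combined_score(target: TargetScores) -> float:
    """Weighted sum of the individual scores."""
    return sum(w * getattr(target, name) for name, w in WEIGHTS.items())


def rank_targets(candidates, top_n):
    """Return the top_n candidates, highest combined score first."""
    return sorted(candidates, key=combined_score, reverse=True)[:top_n]


candidates = [
    TargetScores("GENE_A", 0.9, 0.8, 0.2),   # strong evidence, well known
    TargetScores("GENE_B", 0.7, 0.7, 0.9),   # good evidence, novel
    TargetScores("GENE_C", 0.3, 0.4, 0.95),  # novel, but weak evidence
]

for target in rank_targets(candidates, top_n=2):
    print(f"{target.gene}: {combined_score(target):.2f}")
```

In this toy setup, a target with solid evidence and high novelty outranks a better-known target with slightly stronger evidence, which is the kind of trade-off a novelty-aware ranking is meant to capture.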

Chemistry42 is our generative chemistry module for drug discovery. This module includes an ensemble of generative and scoring engines that "imagines" molecules from scratch using cutting-edge deep learning technologies we pioneered for medical use back in 2015. Chemistry42 automatically creates drug-like molecular structures with appropriate physicochemical properties. In this case, Chemistry42 was used to design a library of small molecules that bind to the novel intracellular target revealed by PandaOmics.

The series of novel small molecules generated by Chemistry42 showed promising on-target inhibition. One particular hit, ISM001, demonstrated activity with nanomolar (nM) IC50 values. When optimizing ISM001, we achieved increased solubility, good ADME properties, and no sign of CYP inhibition, while retaining nanomolar potency. Interestingly, the optimized compounds also showed nanomolar potency against nine other targets related to fibrosis.
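For readers less familiar with the term, IC50 is the compound concentration at which target activity is inhibited by half. The sketch below shows what a nanomolar IC50 implies for inhibition at different concentrations, using the standard Hill equation. The IC50 value of 10 nM and the Hill coefficient of 1 are illustrative assumptions, not measured values for ISM001.

```python
def fraction_inhibited(concentration_nm: float, ic50_nm: float, hill: float = 1.0) -> float:
    """Fraction of target activity inhibited at a given compound concentration.

    Uses the Hill equation; with hill=1 this is a simple one-site model.
    """
    if concentration_nm < 0 or ic50_nm <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    c = concentration_nm ** hill
    return c / (c + ic50_nm ** hill)


HYPOTHETICAL_IC50_NM = 10.0  # illustrative value, not a measured result

for conc in (1.0, 10.0, 100.0, 1000.0):
    pct = 100 * fraction_inhibited(conc, HYPOTHETICAL_IC50_NM)
    print(f"{conc:7.1f} nM -> {pct:5.1f}% inhibition")
```

By definition, inhibition is 50% when the concentration equals the IC50, so a lower IC50 means less compound is needed to reach the same effect.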

In follow-up in vivo studies, the molecules were shown to reduce fibrosis in a bleomycin-induced mouse lung fibrosis model, leading to improved lung function. These compounds also demonstrated a good safety profile in a 14-day repeated-dose range-finding (DRF) study in mice.

The best-performing molecule was nominated as a preclinical drug candidate in December 2020 for IND-enabling studies that will lead to clinical investigations. IND-enabling studies have started, and currently, the scale-up/process development of the candidate is ongoing. We plan to complete the IND-enabling studies by the end of this year and start the phase I clinical trials either this year or early next year.

This figure presents a very rough list of experiments we performed to claim the preclinical candidate.

The whole study, from hypothesis to preclinical drug candidate, took just under 18 months to complete at a budget of around 2 million US dollars. This is substantially faster and cheaper than the traditional drug discovery process.

Our team is committed to progressing the candidate and is open to potential partnerships with pharmaceutical companies to co-develop the drug candidate following phase II.

Pioneering GANs in Drug Discovery

Identifying a novel pan-fibrotic target and a drug candidate with an unprecedented mechanism of action in one and a half years may sound like a drug discovery dream come true. But the path has been long and a lot of challenges remain to be addressed.

In 2015 we started our early exploratory experiments with generative adversarial networks (GANs). A GAN is a type of deep learning architecture in which one neural net (a generator) invents new "matter" to match some predefined requirements, while another neural net (a discriminator) tries hard to prove the generator wrong. Both neural nets keep learning until the generator ends up with the best result. GANs use low-dimensional formats like binary fingerprints, SMILES strings, graphs, and other light representations to generate molecules.
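The tug-of-war between the two networks can be made concrete by looking at the losses each one tries to minimize. The sketch below computes them from the second network's outputs (this network is usually called the discriminator, and it outputs the probability that a sample is real). It uses the common non-saturating generator loss, which is an assumption on our part; no networks are trained here, and the numbers are made up for illustration.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: low when real samples score high and generated ones low."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when generated samples fool the discriminator."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Made-up discriminator outputs (probability that a sample is "real").
early_real, early_fake = [0.9, 0.8], [0.1, 0.2]  # discriminator easily spots fakes
late_real, late_fake = [0.5, 0.5], [0.5, 0.5]    # discriminator can no longer tell

print("early: D loss", round(discriminator_loss(early_real, early_fake), 3),
      "G loss", round(generator_loss(early_fake), 3))
print("late:  D loss", round(discriminator_loss(late_real, late_fake), 3),
      "G loss", round(generator_loss(late_fake), 3))
```

As the generator improves, its loss falls while the discriminator's loss rises towards the point where it can do no better than guessing. In molecule generation, the "samples" would be representations such as SMILES strings (for example, `CCO` for ethanol) or binary fingerprints.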

We described the concept of using an Adversarial Autoencoder (AAE) for the generation of novel molecules in our paper "The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology," submitted for publication in Oncotarget in June 2016. This publication coincided closely with the publication of a similar idea by Alán Aspuru-Guzik's team in their arXiv paper "Automatic chemical design using a data-driven continuous representation of molecules." During this period we started to collaborate and build a global community around generative chemistry.

Later we developed several improvements and new features for our GAN-based AI platform for drug design and started patenting our findings. In 2017 we built multiple working GAN models, including druGAN for fingerprints, ORGAN for SMILES, various recurrent neural network (RNN) architectures with reinforcement learning and LSTM, agile temporal convolutional networks (ACTNs), and a reinforced adversarial neural computer (RANC). In 2018 we progressed towards building and validating a powerful deep generative model, generative tensorial reinforcement learning (GENTRL). GENTRL is a new AI system for drug discovery that dramatically accelerates the process of lead discovery from years to days. We made the code publicly available on GitHub to inspire a broader community of scientists to keep building on this work.

Eventually, we built an end-to-end AI platform with three key components: a target discovery and multi-omics data analysis engine, PandaOmics; a de novo molecular design engine, Chemistry42; and a clinical trial outcomes prediction engine, InClinico.

We have also started investing heavily in the synthesis and validation of the molecules suggested by our engine for various projects.

In 2018 we published research revealing the first JAK3 inhibitor generated using the Entangled Conditional Adversarial Autoencoder (ECAAE) with experimental validation. At that time, our engines could already achieve reasonable hit rates for GPCRs and other target classes.

In 2019 we reached a significant proof-of-concept milestone where we predicted a molecule for a well-known fibrosis target in just 21 days and successfully validated the prediction in vitro and in vivo. The results were published in Nature Biotechnology and attracted massive media attention and feedback from leading drug hunters and AI researchers. The milestone demonstrated the incredible potential of using AI to identify drug candidates and represented the first step of the overall traditional drug discovery process.

Since its inception, Insilico Medicine has published more than 150 papers and presented results at more than 100 conferences. During this time, we encountered excitement and support from the drug discovery community as well as skepticism. Our early models generated molecules that were not diverse enough or were hard to synthesize, and the targets were well-known or easily druggable with known hits. With time, our improved results persuaded a number of scientists to become our supporters.

Now, in less than 18 months, we have achieved a far more significant validation milestone: identifying a completely new target from an automatically generated hypothesis, together with a drug candidate that is a first-in-class molecule. This brings us closer to changing the paradigm of drug discovery as we know it. We hope this study will prove illustrative enough to turn even the most inveterate skeptics into AI adopters.

Potential Impact

As much as we are excited about our AI platform's current success, the work towards transforming the pharmaceutical industry's paradigm is only beginning. It will take time before AI-driven R&D is adopted at scale by leading pharmaceutical organizations.

Currently, AI is widely adopted by many pharmaceutical and biotech companies for specific tasks like virtual screening or data analysis. Still, the overall approach to drug discovery remains the same: a cascade of poorly connected stages, without an efficient back-propagating element of learning from mistakes. With tools like our PandaOmics and Chemistry42, combined in one integrated workflow, organizations can streamline their efforts and accelerate the translation of ideas into actual clinical candidates and further on. We hope that the results of this study will inspire a much greater shift to a new drug discovery paradigm by pharmaceutical organizations all around the globe. With our highly motivated and dedicated team of experts, we continue to innovate and refine our approach and to expand to other therapeutic areas and find more novel drug candidates that will become future medicines.

What Experts Say About This Discovery

We have reached out to pharmaceutical industry key opinion leaders to hear their thoughts about our discovery and learn new ideas about how we can move forward collectively towards AI-driven drug discovery.

AI is transforming the healthcare industry, enabling breakthroughs that can improve millions of lives. Using NVIDIA's AI platform, Insilico Medicine, a premier NVIDIA Inception member, has done what we only dreamed of a few years ago: applying AI to dramatically accelerate drug discovery. And it is doing so at a time when it's never been more critically important to bring the power of AI to every industry to solve our greatest challenges.

Jensen Huang

CEO and founder of NVIDIA

Sinovation Ventures invested in Insilico Medicine from the early stage, and is confident about the company's innovation capability in applying cutting-edge AI technologies to new drug discovery. With its proprietary AI technology platforms, Insilico Medicine managed to discover a novel target and design novel molecules reaching the preclinical candidate (PCC) stage for idiopathic pulmonary fibrosis. This record-time and cost-saving breakthrough powered by AI can be a worldwide milestone for AI chemistry and AI biology. The team led by Dr. Alex Zhavoronkov, Founder and CEO of Insilico Medicine, with expertise both in AI and new drug R&D, is committed to delivering on their mission of AI for good.

Dr. Kai-Fu Lee

Chairman and CEO of Sinovation Ventures

One of the most difficult steps and biggest mysteries in drug discovery is related to target validation, specifically identifying the targets that have a strong impact in a clinical setting. Insilico Medicine has managed to tackle one of the biggest mysteries in drug discovery through its AI endeavors.

Dr. Tudor Oprea

Professor and Chief of the Translational Informatics Division at the University of New Mexico and experienced drug-hunter with 25 years of industrial and academic experience in drug discovery

Speed is everything in drug development. At least 90 percent of the costs associated with getting a drug approved for human use are in the late-stage clinical trials. With its AI-powered universal system for drug discovery, Insilico is enabling researchers to figure out how to fail faster, much earlier during the many phases of the drug discovery process leading up to clinical trials, before it gets too late.

Dr. Charles Cantor

Professor Emeritus at Boston University, member of the Science Advisory Board at Insilico Medicine, co-Founder of Sequenom Inc., and co-Founder of Retrotope Inc.

This achievement of Insilico Medicine is another piece of evidence that AI is a powerful tool for drug discovery. By using AI in as many steps of the process as possible, AI can significantly reduce the time and cost of developing effective therapies.

Dr. Alán Aspuru-Guzik

Professor of Chemistry and Computer Science at the University of Toronto, and co-founder of AI companies Kebotix and Zapata Computing

Dr. Charles Cantor
Professor Emeritus at Boston University
Co-Founder of Sequenom Inc. (acquired by LabCorp)
Co-Founder of Retrotope Inc.
Former Principal Scientist of Human Genome Project, Department of Energy
Professor and Director of the Center for Advanced Biotechnology, Boston University

Dr. Tudor Oprea
Professor and Chief, Translational Informatics Division, The University of New Mexico
One of the top experts in target discovery with 25+ years in the industry
Built IDG-KMC, TCRD, Pharos, and Drug Central target discovery tools and organizations
H-index = 72

Dr. Yuan-Hua Ding
Founder & CEO at ATB
Former VP & Head of Pfizer Asia Discovery Lab
Drug discovery expert with over 20 years of experience in structural biology

Dr. Alán Aspuru-Guzik
Professor of Chemistry and Computer Science, University of Toronto
Expert in Quantum Computing, Quantum Chemistry, Machine Learning
Professor, Harvard, Department of Chemistry 2006-2018
H-index = 82

Dr. Yuri Nikolsky
CEO, MiLaboratories
CEO, Sybille BioSciences
Co-founder of GeneGo (acquired by Thomson Reuters)
Former VP of Life Sciences, Thomson Reuters
Developer of MetaCore, MetaBase and other tools for target discovery
H-index = 48

Dr. Stevan Djuric
Expert in Drug Discovery and Development
Former VP of Abbvie and Abbott Laboratories
Over 30 years of experience in Medicinal Chemistry and Immunoinflammatory disease
Adjunct professor at The University of Kansas, High Point University
H-index > 30

Dr. Jeremy Levin
Chairman and CEO, Ovid Therapeutics Inc
Chairman of the Biotechnology Innovation Organization (BIO)
