Reinforcement Learning with Expert Human Feedback
to Advance Pharmaceutical
Generative AI
Many of today's most powerful generative AI systems learn from human preferences using the established technique of Reinforcement Learning from Human Feedback (RLHF). Unlike text, pictures, videos, voice, smell, and other data types that most non-expert humans can easily interpret, AI-generated biological, chemical, and clinical data require years of training and talent to interpret. For example, medicinal chemists who can accurately predict the multiple properties of small molecules just by looking at them are rare. In addition, even expert humans are often wrong and subject to human biases. And unlike in other fields, where biases in feedback can be reduced by diversifying and balancing the user base, experts in drug discovery cannot be hired cheaply. There are only a few experienced drug hunters who can traverse biology, chemistry, and medicine, and even fewer who can cover multiple therapeutic areas. These expert humans are expensive, busy, and rarely work for companies developing solutions for AI-driven drug discovery. This poses a significant challenge for the training and validation of generative AI systems and is one of the key reasons the adoption and impact of these systems in the pharmaceutical industry have been slow. Finally, many properties of generated biological, chemical, or clinical data cannot be predicted even by expert humans with decades of experience and require experimental validation in biological systems.

Since 2015, Insilico Medicine has been actively developing generative AI platforms for biology, chemistry, and medicine, utilizing a broad range of generative approaches, including generative adversarial networks (GANs), variational autoencoders (VAEs), genetic algorithms, and transformers, combined with algorithmic, experimental, and human-directed reinforcement learning. Over time, we learned that it may take several months to invent, implement, train, and integrate one generative model, but several years to validate it with expert feedback and experiments. To achieve molecular-level accuracy, we rely on lengthy and computationally expensive algorithmic reinforcement learning, in which pre-trained generative systems generate data with the desired properties and multiple predictive systems evaluate the output and reward or punish the generative systems depending on the probability that the desired property is present or the desired objective is achieved. We also work with a large number of contract research organizations (CROs) and our own human-operated and fully robotic laboratories to validate the generated data.
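The reward-or-punish loop described above can be sketched in miniature. Everything below is illustrative: the fragment vocabulary, the toy property predictor, and the multiplicative weight update are stand-ins for Chemistry42's actual neural generators, ADME/binding predictors, and RL procedure, none of which are shown here.

```python
import random

random.seed(0)

# Toy fragment vocabulary; a real system would operate on full SMILES
# strings produced by a neural generator.
FRAGMENTS = ["C", "CC", "c1ccccc1", "O", "N"]

# Hypothetical property predictor standing in for the predictive
# systems: returns the estimated probability that the desired
# properties are present (here it simply favors candidates containing
# an aromatic ring and an oxygen).
def predict_properties(candidate):
    score = 0.0
    if "c1ccccc1" in candidate:
        score += 0.5
    if "O" in candidate:
        score += 0.5
    return score

# "Generator" policy: one preference weight per fragment; candidates
# are sampled as pairs of fragments.
weights = {f: 1.0 for f in FRAGMENTS}

def sample_candidate():
    frags = random.choices(list(weights), weights=list(weights.values()), k=2)
    return "".join(frags), frags

# Reward-or-punish loop: fragments that appear in well-scored
# candidates become more likely to be sampled again.
for _ in range(500):
    candidate, frags = sample_candidate()
    reward = predict_properties(candidate)
    for f in frags:
        weights[f] *= 1.0 + 0.05 * (reward - 0.5)

best = max(weights, key=weights.get)  # fragment the policy now prefers
```

After a few hundred iterations, the policy shifts probability mass toward the fragments the predictor rewards, which is the essence of the algorithmic reinforcement learning described above, minus the scale and the chemistry.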

In 2020, we started releasing production-grade industrial software for target discovery, generative chemistry, and clinical trial outcome prediction, and developed methods for seamless integration of user feedback. Today, multiple pharmaceutical companies that have taken serious steps to integrate validated and benchmarked generative systems have deployed the PandaOmics, Generative Biologics, or Generative Chemistry apps, or all of them combined, and have helped validate and improve the Pharma.AI platform.
Today, we are excited to announce the launch of a new expert feedback collection program for demo users of Chemistry42, called Reinforcement Learning with Expert Human Feedback (ReLEHF). The aim of this program is to enable and empower our rapidly growing community to dynamically improve the platform at scale.
Starting today, users can rate the generated molecular structures in the demo environment of Chemistry42. This valuable feedback will be used to improve the accuracy and efficiency of the generative models leveraged by the platform.
In 2020, Insilico Medicine launched Chemistry42, a generative AI platform that led to the establishment of Insilico's clinical-stage programs in IPF, oncology, and rare diseases. The platform uses over 42 generative models to search chemical space and design and optimize molecules that bind a given target implicated in a disease. Since its launch, the Chemistry42 AI team has significantly improved the generative models to solve more challenging use cases, including lead optimization and ADME prediction.

Users configure experiments by taking either a structure-based or a ligand-based design approach and, depending on the approach, specifying a protein structure's binding site, pharmacophore hypotheses, known ligands, drug-likeness, desired synthesis complexity, and other physicochemical properties. This configuration guides the generative algorithms, via a Reinforcement Learning (RL) procedure, in designing novel molecules that satisfy it.
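As a rough illustration of what such an experiment configuration might contain, here is a minimal sketch. The field names, defaults, and validation rules are our own invention for illustration, not the actual Chemistry42 API, and the binding-site string is made up.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExperimentConfig:
    # "structure-based" or "ligand-based"
    approach: str
    # Binding-site specification (structure-based runs only).
    binding_site: Optional[str] = None
    # Pharmacophore hypotheses and known actives (ligand-based runs).
    pharmacophores: list = field(default_factory=list)
    known_ligands: list = field(default_factory=list)  # SMILES strings
    # Property constraints steering the RL reward.
    min_drug_likeness: float = 0.5          # e.g. a QED-like threshold
    max_synthesis_complexity: float = 4.0
    physchem_ranges: dict = field(default_factory=dict)

    def validate(self):
        if self.approach == "structure-based" and self.binding_site is None:
            raise ValueError("structure-based design requires a binding site")
        if self.approach == "ligand-based" and not self.known_ligands:
            raise ValueError("ligand-based design requires known ligands")

# Example: a structure-based run with hypothetical pocket and ranges.
cfg = ExperimentConfig(
    approach="structure-based",
    binding_site="chain A, residues 902-950",
    physchem_ranges={"mol_weight": (250, 500), "logp": (1.0, 4.0)},
)
cfg.validate()
```

The point of the sketch is the shape of the problem: the two design approaches require different inputs, and the property constraints become part of the reward the RL procedure optimizes.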
To learn more about Chemistry42, read our application note in JCIM.
Reinforcement Learning with Expert Human Feedback
Today, Insilico is launching a Reinforcement Learning with Expert Human Feedback (ReLEHF) initiative to further improve the core AI functionality of its platform. Computational and medicinal chemistry experts are invited to participate by leaving their feedback on examples of generated structures from experiments run by the Insilico Medicine team.

In ReLEHF, experts can explore results from Insilico's case studies run with the latest version of Chemistry42. The examples include structure-based design of JAK3 inhibitors, hit expansion of USP7 inhibitors, and generation of brain-penetrant inhibitors of the EGFR double mutant. Experts can leave feedback on any generated structure, and their input will be used to improve our AI pipelines in future releases of Chemistry42.
Each future version will feature new case studies and incorporate expert feedback to further improve the algorithms. This iterative procedure has proven extremely efficient with in-house expert feedback, and we hope to improve it further with the opinions of the industry's best chemists.
By collecting expert annotations for generated molecules, we improve the accuracy and efficiency of our generative models and reward functions. This data will significantly improve our AI-based drug discovery solutions.
Daniil Polykovskiy, PhD
IT Director of Insilico Medicine
Why ReLEHF is important
Chemistry42's reward functions assess the molecules based on physics, chemistry, and biological information to ensure the generated molecules are diverse and have optimized properties. However, there is much more to be taken into account when determining which structures to move forward in drug discovery programs. Our goal is to use expert feedback to understand how drug hunters think when presented with generated structures — what is the reasoning behind their preferences? The expert preference model trained on ReLEHF data will prioritize the structures and improve generation results.
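A minimal sketch of what a preference model trained on thumbs-up/thumbs-down labels could look like. The featurization, the labeled examples, and the plain logistic model below are all toy stand-ins; the descriptors and training data used in the real system are not shown here.

```python
import math

# Toy featurization standing in for real molecular descriptors:
# aromatic-carbon count, heteroatom count, and scaled length.
def featurize(smiles):
    return [smiles.count("c"),
            smiles.count("O") + smiles.count("N"),
            len(smiles) / 10.0]

# Hypothetical ReLEHF labels: 1 = thumbs up, 0 = thumbs down.
labeled = [
    ("c1ccccc1O", 1), ("c1ccccc1N", 1), ("c1ccc(O)cc1C", 1),
    ("CCCCCCCC", 0), ("CCCC", 0), ("CCCCCC", 0),
]

# Logistic preference model trained by stochastic gradient descent.
w = [0.0, 0.0, 0.0]
b = 0.0
lr = 0.5

def score(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(200):
    for smiles, y in labeled:
        x = featurize(smiles)
        g = score(x) - y              # gradient of the log-loss
        for i in range(len(w)):
            w[i] -= lr * g * x[i]
        b -= lr * g

# Rank newly generated structures by predicted expert preference.
candidates = ["c1ccccc1CO", "CCCCC"]
ranked = sorted(candidates, key=lambda s: score(featurize(s)), reverse=True)
```

Once trained, a model like this can be folded into the generation loop as one more reward term, so that structures resembling what experts approved are prioritized.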
Experts can also leave specific comments along with their feedback. The corresponding AI teams will review all comments and determine which modules should be improved. For example, if an expert flags a structure as toxic, the corresponding ADME predictors will incorporate this feedback to improve their accuracy. If an expert thinks a structure is not novel enough, we will use this information to improve the novelty of the generated structures.
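The triage of comments to modules could be sketched as simple keyword routing. The keywords, module names, and matching logic below are hypothetical stand-ins for the manual review the teams actually perform.

```python
# Hypothetical keyword-to-module routing table.
ROUTES = {
    "toxic": "adme_predictors",
    "adme": "adme_predictors",
    "novel": "novelty_scoring",
    "synthesis": "synthetic_accessibility",
}

def route_comment(comment):
    # Match keywords case-insensitively; unmatched comments go to a
    # catch-all queue for human review.
    text = comment.lower()
    targets = sorted({module for keyword, module in ROUTES.items()
                      if keyword in text})
    return targets or ["general_review"]

route_comment("This structure looks toxic and not novel enough")
# -> ['adme_predictors', 'novelty_scoring']
```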
1. Get demo access: fill in the access request form on
2. Review the case study descriptions and experiment configurations in the user manual.
3. Select any of the available case studies and start reviewing the molecules.
4. Assess whether a molecule aligns with the specific objectives of the case study and with general drug discovery needs.
5. Leave 👍 or 👎 to help us improve the platform. You can also leave additional feedback to let us know exactly what you like or don't like about a structure.
Thank you for your support in building a community around Chemistry42 and helping us to improve our AI-based drug discovery solutions.
Stay tuned, follow us on social media!