Many of today's most powerful generative AI systems employ an established technique called Reinforcement Learning from Human Feedback (RLHF), in which models learn from human preferences. Text, pictures, videos, voice, smell, and other data types can be easily interpreted by most non-expert humans; interpreting AI-generated biological, chemical, and clinical data, by contrast, requires years of training and talent. For example, medicinal chemists who can accurately predict multiple properties of small molecules just by looking at them are rare. In addition, even expert humans are often wrong and subject to human biases. And unlike in other fields, where feedback bias can be reduced by diversifying and balancing the user base, experts in drug discovery cannot be hired cheaply. There are only a few experienced drug hunters who can traverse biology, chemistry, and medicine, and even fewer who can cover multiple therapeutic areas. These experts are expensive, busy, and rarely work for companies developing solutions for AI-driven drug discovery. This poses a significant challenge for the training and validation of generative AI systems and is one of the key reasons for the slow adoption and limited impact of these systems in the pharmaceutical industry. Finally, many properties of generated biological, chemical, or clinical data cannot be predicted even by expert humans with decades of experience and require experimental validation in biological systems.
Since 2015, Insilico Medicine has been actively developing generative AI platforms for biology, chemistry, and medicine, utilizing a broad range of generative approaches, including generative adversarial networks (GANs), variational autoencoders (VAEs), genetic algorithms, transformers, and many other approaches, with algorithmic, experimental, and human-directed reinforcement learning. Over time, we learned that it may take several months to invent, implement, train, and integrate one generative model, but several years to validate it through expert feedback and experiments. To achieve molecular-level accuracy, we rely on lengthy and computationally expensive algorithmic reinforcement learning, in which pre-trained generative systems generate data with the desired properties, and multiple predictive systems evaluate the output and reward or punish the generative systems depending on the probability that the desired property is present or the desired objective is achieved. We also work with a large number of contract research organizations (CROs) and our own human-operated and fully robotic laboratories to validate the generated data.
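The reward-and-punish loop described above can be illustrated with a deliberately small sketch. It is not the Pharma.AI implementation: the three-token vocabulary, the two scoring functions `carbon_rich` and `has_nitrogen`, the averaging of their scores, and the use of a plain REINFORCE policy-gradient update are all assumptions chosen to keep the example self-contained. A toy generator samples token sequences, hypothetical predictors return the probability that a desired property is present, and the generator is nudged toward samples that score above the batch average and away from those that score below it.

```python
import math
import random

ALPHABET = ["C", "N", "O"]  # hypothetical token vocabulary, for illustration only
LENGTH = 8                  # tokens per generated sequence


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical predictive models: each returns a probability in [0, 1]
# that the generated sequence has a desired property.
def carbon_rich(seq):
    return sum(1 for t in seq if ALPHABET[t] == "C") / len(seq)


def has_nitrogen(seq):
    return 0.9 if any(ALPHABET[t] == "N" for t in seq) else 0.1


PREDICTORS = [carbon_rich, has_nitrogen]


def reward(seq):
    """Combine predictor outputs (assumption: a simple mean)."""
    return sum(p(seq) for p in PREDICTORS) / len(PREDICTORS)


def train(steps=1000, batch=32, lr=0.5, seed=0):
    """REINFORCE on a generator with independent per-position logits."""
    rng = random.Random(seed)
    n_tok = len(ALPHABET)
    logits = [[0.0] * n_tok for _ in range(LENGTH)]
    history = []  # mean reward of each batch
    for _ in range(steps):
        # Freeze the sampling distribution for this batch.
        probs = [softmax(row) for row in logits]
        samples = [
            [rng.choices(range(n_tok), weights=p)[0] for p in probs]
            for _ in range(batch)
        ]
        rewards = [reward(s) for s in samples]
        baseline = sum(rewards) / batch  # batch mean reduces variance
        history.append(baseline)
        for seq, r in zip(samples, rewards):
            # Positive advantage rewards the sample, negative punishes it.
            scale = lr * (r - baseline) / batch
            for pos, tok in enumerate(seq):
                for k in range(n_tok):
                    indicator = 1.0 if k == tok else 0.0
                    # Gradient of log-softmax: one-hot minus probabilities.
                    logits[pos][k] += scale * (indicator - probs[pos][k])
    return logits, history


if __name__ == "__main__":
    _, history = train()
    print("mean reward, first 50 batches:", sum(history[:50]) / 50)
    print("mean reward, last 50 batches: ", sum(history[-50:]) / 50)
```

In a realistic setting the generator would be a pre-trained neural model and the predictors would be trained property models, which is where most of the computational cost mentioned above comes from; the toy version only shows how predictor scores are turned into a training signal.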
In 2020, we started releasing production-grade industrial software for target discovery, generative chemistry, and clinical trial outcome prediction, and developed methods for seamless integration of user feedback. Today, multiple pharmaceutical companies that took serious steps to integrate the validated and benchmarked generative systems have deployed the PandaOmics, Generative Biologics, or Generative Chemistry apps, or all of them combined, and have helped validate and improve the Pharma.AI platform.