2025.10.02
Covering Patent Holes:
How LEGION Uses AI to Block Competition in Chemical Space
Drug discovery is often described as searching for a needle in a haystack. In reality, it’s far trickier: more like searching for one special blade of grass in a field bigger than the universe. Chemists call this seemingly endless landscape, the set of all possible small molecules that could exist, chemical space. Estimates put the number of “drug-like” molecules at more than 10⁶⁰. For perspective, that’s roughly a thousand times the number of atoms in the Sun.
Most of this space is unexplored, and that gap creates both opportunity and danger. One of the largest challenges facing biotech companies working to drug novel targets is the flood of “me too” and “me better” drugs. Instead of venturing into the unknown regions of chemical space to discover novel and effective drug molecules, let alone discover promising new targets, it’s far easier to start with a molecule you already know works and improve it. A company can spend a decade unlocking the biology of a target and identifying small molecules to inhibit it, only to have another, more nimble biotech design a molecule that is structurally similar, safer, or even higher quality, and get it into clinical trials in just a few years. Welcome to the animal world of biotech, where AI is accelerating the cycle and enabling fast followers to move faster than ever.

This raises a new question: how can innovators not only search chemical space efficiently for drugs, but also cover it so that their innovation is protected and the surrounding space is no longer patentable by fast followers?

A new paper on ChemRxiv introduces LEGION, a powerful AI-driven workflow that takes on this challenge by covering chemical space so fully that competitors can no longer claim it as patentable ground. LEGION stands for Latent Enumeration, Generation, Integration, Optimization, and Navigation, and expands the reach of generative chemistry tools, ensuring that vast regions of chemical space are disclosed and defended. 

Using LEGION, researchers at Insilico Medicine generated roughly 123 billion new molecular structures in their proof-of-concept test, uncovering tens of thousands of promising “scaffold” core molecular structures in a matter of hours, not months. These results were subsequently reviewed and validated by medicinal chemists to confirm their plausibility and relevance. From this output, Insilico open-sourced a subset of more than 120 million molecules around the high-value NLRP3 target, ensuring that regions of this chemical space are publicly disclosed and far harder for competitors to patent.
The Patent Problem in Chemical Space
Even though chemical space is unimaginably vast, the part we’ve actually studied is tiny. Databases like PubChem or ChEMBL contain millions of molecules, but that’s nothing compared to the whole 10⁶⁰-member collection of potential drugs. Worse, biologically active molecules that actually interact with proteins in our cells tend to clump together in narrow regions of this space. That means drug discovery tools, often trained on known molecules, keep focusing on the same spots.

This dilemma doesn’t just limit discovery; it creates risk. If innovators only explore familiar corners of chemical space, competitors can move into the unexplored regions and claim patents on structurally distinct and superior molecules that hit the same target. In other words, incomplete maps of chemical space leave innovators exposed.

This isn’t just hypothetical. In drug discovery, scaffold hopping, designing structurally distinct molecules around the same biological target to secure new IP, is a well-known strategy. That’s why the real challenge isn’t just finding new molecules, but covering chemical space so comprehensively that it becomes unpatentable to fast followers. LEGION was built with this purpose in mind.
The Idea Behind LEGION,
and How It Protects Innovation
The earliest steps of generative chemistry are absolutely critical for the rest of the molecular discovery process. AI-based drug discovery algorithms first generate a slew of chemical structures based on their training sets of input molecules and the parameters and properties learned from those molecules. If that initial set of output structures is limited, the entire discovery pipeline is handicapped from the start and vast regions of chemical space remain open for competitors to patent. LEGION was built to prevent that by expanding the starting pool as broadly and diversely as possible, so that chemical space is not just explored but covered and disclosed at scale, leaving competitors unable to patent it.

LEGION maximizes the extent of chemical space it can cover through a multi-pronged strategy, with several key optimizations to the molecular generation steps that ensure unexplored regions are disclosed and defended.

First, LEGION maximizes scaffold diversity. Generative chemistry tools will often start with a set of molecules known to bind to a target protein of interest and, from there, extract a core sub-structure to keep constant while iteratively swapping out peripheral bits and pieces. That core sub-structure is the scaffold. Instead of letting AI over-optimize around a few known scaffolds, LEGION tweaks the generative reward system so that all promising molecules get equal credit, while highly similar ones get penalized. This pushes the system to keep exploring new shapes.
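As a rough illustration of the reward idea above, the Python sketch below gives every molecule that clears a quality bar the same base reward and discounts it when it closely resembles something already accepted. The function names, the cutoffs, the penalty value, and the toy feature sets standing in for molecular fingerprints are all assumptions made for illustration; this is not the reward function from the paper.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diversity_reward(candidate, accepted, score,
                     score_cutoff=0.5, sim_cutoff=0.7, penalty=0.9):
    """Flat reward for any promising molecule, reduced for near-duplicates.

    candidate : set of features (toy stand-in for a fingerprint)
    accepted  : list of feature sets already kept
    score     : hypothetical predicted quality of the candidate, 0..1
    All thresholds are illustrative assumptions.
    """
    if score < score_cutoff:
        return 0.0                     # not promising: no reward
    reward = 1.0                       # equal credit, whatever the exact score
    max_sim = max((tanimoto(candidate, m) for m in accepted), default=0.0)
    if max_sim >= sim_cutoff:
        reward -= penalty              # too close to something already found
    return reward


accepted = [{"ring_a", "amide", "fluoro"}]
near_copy = {"ring_a", "amide", "fluoro", "methyl"}   # similarity 3/4
new_shape = {"ring_b", "sulfonyl", "amide"}           # similarity 1/5

reward_near_copy = diversity_reward(near_copy, accepted, score=0.95)
reward_new_shape = diversity_reward(new_shape, accepted, score=0.60)
```

In this toy setup the near-copy ends up with a much smaller reward than the novel shape even though its predicted score is higher, which is the behavior that keeps a generator from piling up around a few known scaffolds.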

Second, LEGION uses manual tricks to handle stubborn chemistry. Generative models struggle with complex scaffolds that have multiple “attachment points” for chemical side chains. LEGION sidesteps this by simplifying tricky structures into more manageable forms: it copy-pastes in a set of side-chains common to drug molecules, reducing the number of open-ended attachment points. As a result, more complex scaffolds don’t get tossed out early just because the generative algorithms have difficulty handling their large number of attachment points, which increases the number of scaffolds that can be built upon in the following steps.
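The simplification step can be pictured with a small Python sketch. It models a scaffold as a label plus numbered attachment points and caps all but a fixed number of them with groups from a short list. The cap list, the limit of two open points, and the tuple representation are assumptions for illustration only; a real workflow would operate on actual chemical structures with a cheminformatics toolkit.

```python
from itertools import combinations, product

# Hypothetical stand-ins for side-chains common to drug molecules.
COMMON_CAPS = ["H", "CH3", "F"]


def simplify_scaffold(core, n_points, max_open=2, caps=COMMON_CAPS):
    """Enumerate variants of a scaffold with at most `max_open` free points.

    Each variant is (core, assignment), where assignment[i] is "*" for a
    point left open or the name of the group used to cap it.
    """
    if n_points <= max_open:
        # Already manageable: keep every attachment point open.
        return [(core, tuple("*" for _ in range(n_points)))]
    variants = []
    n_capped = n_points - max_open
    # Choose which points to cap, then which group goes on each of them.
    for capped_positions in combinations(range(n_points), n_capped):
        for groups in product(caps, repeat=n_capped):
            assignment = ["*"] * n_points
            for pos, group in zip(capped_positions, groups):
                assignment[pos] = group
            variants.append((core, tuple(assignment)))
    return variants


variants = simplify_scaffold("core_X", n_points=4, max_open=2)
n_variants = len(variants)   # 6 ways to pick capped points * 3**2 cap choices
```

One four-point scaffold becomes 54 simpler two-point scaffolds in this toy example, so a simplification step of this kind increases the scaffold count rather than shrinking it.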

Third, after handing off the scaffolds to the AI-based generative chemistry engine Chemistry42 to generate thousands of drug-like virtual molecules, LEGION implements a mixing-and-matching step called combinatorial explosion to systematically mash up the structures generated in this initial round of molecular generation. These virtual structures, composed of a scaffold and any peripheral side-chains added to its attachment points, get broken up into scaffold/side-chain fragments. Side-chain fragments from one scaffold then get added to the attachment points of other scaffolds, so the number of virtual compounds grows multiplicatively with the number of scaffolds and the number of side-chain fragments available at each attachment point. In the proof-of-concept testing, a single round of combinatorial explosion from about 12,000 scaffolds yielded roughly 123 billion structures.
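A minimal Python sketch shows why this step grows so quickly. It assumes each generated molecule can be written as a scaffold plus one side-chain at each of two attachment points, and that every side-chain can be placed on every scaffold; the record format and fragment names are hypothetical. The last lines are a back-of-envelope calculation on the figures quoted above, under the added assumption that both side-chain pools are the same size.

```python
import math
from itertools import product


def combinatorial_explosion(molecules):
    """Recombine every scaffold with every side-chain seen at each position.

    molecules: iterable of (scaffold, r1, r2) records (toy representation).
    """
    scaffolds = sorted({m[0] for m in molecules})
    r1_pool = sorted({m[1] for m in molecules})
    r2_pool = sorted({m[2] for m in molecules})
    return list(product(scaffolds, r1_pool, r2_pool))


generated = [
    ("scaffold_A", "methyl", "phenyl"),
    ("scaffold_A", "fluoro", "pyridyl"),
    ("scaffold_B", "methoxy", "phenyl"),
]
exploded = combinatorial_explosion(generated)
# 2 scaffolds * 3 R1 groups * 2 R2 groups = 12 structures from 3 inputs

# Back-of-envelope on the article's numbers (equal pool sizes assumed):
per_scaffold = 123_000_000_000 / 12_000    # combinations per scaffold
per_position = math.sqrt(per_scaffold)     # side-chains per attachment point
```

Under those assumptions, about 10 million side-chain pairs per scaffold, or a few thousand fragments per attachment point, would be enough to reach the scale reported.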
Together, these approaches mean LEGION can identify regions of chemical space that would otherwise remain dark. By disclosing them at scale, it blocks competitors from fast following and claiming these structures as new IP.
Putting LEGION to the Test:
The NLRP3 Case Study
To prove the system, researchers applied LEGION to NLRP3, a protein at the heart of inflammation in tissues all over the body. Insilico is developing a potentially best-in-class, brain-penetrant, and safe NLRP3 inhibitor to treat the wide range of diseases in which NLRP3 is implicated, including arthritis, Parkinson’s disease, heart disease, and many more. This makes NLRP3 one of the most valuable drug targets in immunology with a market potential often compared to the GLP-1 class. Many pharma companies have tried to develop inhibitors targeting NLRP3, but none have yet reached the market, leaving open the possibility of discovering new drugs that are both therapeutically effective and structurally distinct.

Using LEGION, the team identified over 34,000 unique scaffolds with the potential to bind NLRP3, combining generative AI tools for designing new molecules with AI-based screening of massive databases of previously identified molecules. The scaffold-simplifying step then systematically replaced attachment points on complex scaffolds with common drug side-chains, limiting the number of free attachment points to a computationally manageable number and bringing the total to nearly 94,000 final scaffolds. These were fed into the Chemistry42 generative chemistry engine, which iteratively added to and modified the structures’ makeup and side-chains, yielding 6.5 million virtual compounds ready for virtual filtering and screening.

In parallel, a subset of the scaffolds with two attachment points was subjected to combinatorial explosion, contributing more than 100 million additional structures. That figure is a random sample of the 123 billion total structures generated with combinatorial explosion, drawn to keep the number of structures feasible for the other analytical tools to handle. With more powerful computational systems in the future, more of the total generated volume could be used for follow-up.
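The article does not say how the random sample was drawn. One common way to sample from a combinatorial product that is too large to store is to number every combination and decode only the numbers that get picked, as in the Python sketch below. The pool sizes of 3,200 side-chains per position, the function name, and the seed are hypothetical values chosen only so that the total lands near 123 billion.

```python
import random


def sample_combinations(n_scaffolds, n_r1, n_r2, k, seed=0):
    """Draw k distinct (scaffold, r1, r2) index triples uniformly at random.

    Every combination gets a number from 0 to total-1 and is decoded on
    demand, so the full product is never held in memory.
    """
    total = n_scaffolds * n_r1 * n_r2
    rng = random.Random(seed)
    picks = rng.sample(range(total), k)   # range() is lazy, so this is cheap
    triples = []
    for index in picks:
        # Mixed-radix decode: index = (scaffold * n_r1 + r1) * n_r2 + r2
        index, r2 = divmod(index, n_r2)
        scaffold, r1 = divmod(index, n_r1)
        triples.append((scaffold, r1, r2))
    return total, triples


total, sample = sample_combinations(
    n_scaffolds=12_000, n_r1=3_200, n_r2=3_200, k=5
)
```

Each index triple can then be turned into a concrete structure only when a downstream tool needs it, which keeps the working set at the size of the sample rather than the size of the full enumeration.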

Many of the most promising scaffolds were reviewed by experienced medicinal chemists, who confirmed their plausibility, novelty, and relevance for drug development. This validation demonstrates that LEGION’s output is not just computational noise but a credible expansion of chemical space. By making these molecules public, Insilico ensures that the chemical space around valuable targets like NLRP3 is scientifically sound and defended against future patent claims.
For NLRP3, the stakes could not be higher. With its market potential compared to GLP-1 and many competitors circling the target, there is a high risk of being rapidly followed. With LEGION, Insilico accelerated structure discovery and, by open-sourcing more than 120 million AI-generated NLRP3 molecules, secured the competitive landscape. This disclosure makes vast regions of NLRP3 chemical space unpatentable to fast followers and protects Insilico’s innovation while reshaping how IP battles are fought in biotech.
Where LEGION Positions the Future of Drug Discovery
LEGION’s goal is simple: don’t just search smarter, cover wider. The philosophy that LEGION embodies is only possible because of the sheer size of chemical space and, therefore, how little of it has already been explored and tested. Typically, advances in drug discovery methods focus on producing molecules similar to previously discovered molecules with similar drug-like properties. LEGION kicks off the search with a focus on breadth, diversity, and maximally reaching into the far corners of chemical space, giving drug hunters the opportunity to develop more unique drugs while also ensuring those regions cannot be easily patented by fast followers.

The implications in the patent landscape and drug discovery are profound. LEGION doesn’t just accelerate discovery timelines but offers a new model for intellectual property strategy. By generating large families of molecules around each scaffold and disclosing them publicly, companies can block off huge swaths of chemical space from competitors. This creates stronger patent positions, new opportunities for competitive advantage, and especially greater protection for innovation.

At the same time, there are limitations. LEGION relies heavily on structural data about the target protein, such as 3D crystal structures and known ligand interactions. For targets without deep structural information, the coverage would be less extensive. And while AI can propose molecules, human medicinal chemists still play a critical role in scoring, prioritizing, and testing the results in the lab.
A New Paradigm:
AI That Secures Chemical Space
In many ways, LEGION is like the latest high-resolution orbiting telescopes for chemical space. For centuries, astronomers gazed at the night sky and charted a few constellations. Then optical telescopes revealed so much more of the planets, stars, and galaxies that make up our celestial neighborhood. But then modern space telescopes changed everything again, unveiling far-off galaxies in the deepest reaches of space and showing just how far the universe extends. LEGION does the same for molecules, taking us beyond the small corner of chemical space that AI-powered generative chemistry tools have recently allowed us to survey. But unlike astronomy, mapping chemical space has a powerful advantage: once these molecular galaxies are revealed and disclosed, competitors can no longer claim them as new IP.

The ChemRxiv paper concludes that LEGION is the first AI-enabled framework to combine massive-scale generative chemistry with large-scale virtual screening, enabling not just intelligent navigation of chemical space, but systematic coverage that leaves fast follower competitors with nowhere to patent.

Workflows like LEGION could mark a turning point in drug discovery. Instead of inching through a haystack, researchers may soon scan entire fields of chemical space at once, finding the molecules of tomorrow while also securing the IP landscape around them.
Read the full paper on ChemRxiv:
Molecular LEGION: Latent Enumeration, Generation, Integration, Optimization and Navigation. A case study of incalculably large chemical space coverage around the NLRP3 target

DOI: 10.26434/chemrxiv-2025-h10tn