Heterogeneous Graph Walk (HeroWalk) is a guided random-walk approach applied to a heterogeneous graph (i.e., a graph containing different types of nodes). The model learns node representations and then finds the gene nodes most similar to the reference disease node. First, "walks" are sampled along a predefined meta-path, i.e., a fixed sequence of node types in a walk, e.g., 'gene'-'disease'-'gene'. During sampling, node degree controls the probability of transition between nodes. The model then learns a representation of each node from the resulting corpus of walks. Computing the cosine similarity between the disease of interest and every available gene produces a ranked list of genes. The top genes in this list are proposed as promising target hypotheses.
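The walk sampling and the final ranking step can be illustrated with a minimal sketch. It is not the HeroWalk implementation: the function names, the toy graph format, and the choice to make the transition probability proportional to neighbor degree are all assumptions made for illustration (the text only says that degree controls the transition probability). The representation-learning step is left out; the embeddings below are made-up vectors standing in for whatever the model would learn from the walks.

```python
import math
import random


def sample_metapath_walk(graph, node_type, start, metapath, walk_length, rng):
    """Sample one walk that follows a cyclic meta-path such as gene-disease-gene.

    graph: dict mapping node -> list of neighbors (hypothetical toy format,
           assumed undirected so every reachable node has degree >= 1).
    node_type: dict mapping node -> type label.
    metapath: list of type labels whose first and last entries match.
    """
    step_types = metapath[1:]  # types to visit after the start, repeated cyclically
    walk = [start]
    for i in range(walk_length - 1):
        wanted = step_types[i % len(step_types)]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        # Assumption: transition probability is proportional to neighbor degree.
        weights = [len(graph[n]) for n in candidates]
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk


def rank_genes(embeddings, disease, genes):
    """Rank genes by cosine similarity of their vectors to the disease vector."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    target = embeddings[disease]
    return sorted(genes, key=lambda g: cosine(embeddings[g], target), reverse=True)


# Toy bipartite gene-disease graph (illustrative data only).
graph = {
    "g1": ["d1"], "g2": ["d1", "d2"], "g3": ["d2"],
    "d1": ["g1", "g2"], "d2": ["g2", "g3"],
}
node_type = {n: "gene" if n.startswith("g") else "disease" for n in graph}
walk = sample_metapath_walk(
    graph, node_type, "g1", ["gene", "disease", "gene"], 7, random.Random(0)
)

# Made-up embeddings standing in for learned node representations.
toy_embeddings = {"d": [1.0, 0.0], "gA": [1.0, 0.1], "gB": [0.0, 1.0]}
ranked = rank_genes(toy_embeddings, "d", ["gB", "gA"])
```

In practice, many such walks would be sampled from every start node, and the resulting corpus would be passed to an embedding model before the ranking step.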
The Matrix Factorization score is based on a collaborative filtering algorithm of the kind widely used in recommender systems. First, well-known gene-disease associations from the PandaOmics database are converted to a sparse binary matrix. This matrix is then decomposed into two low-rank matrices that consist of latent factors for genes and diseases. The algorithm uses graph regularization based on a fast kNN search to account for the intraclass similarity between nodes of the same type. Recomputing the original interaction matrix from the latent factors provides scores for unobserved interactions, from which the gene ranking is obtained.
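The decomposition and reconstruction steps can be sketched as follows. This is a simplified illustration, not the PandaOmics algorithm: it uses plain gradient descent with L2 regularization on a small dense matrix and omits the kNN-based graph regularization described above. The function name `factorize`, the hyperparameters, and the toy matrix are assumptions.

```python
import numpy as np


def factorize(R, rank=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Approximate R (genes x diseases) as G @ D.T with L2-regularized factors.

    Simplification: every entry of R, including zeros, is treated as observed,
    and no graph regularization is applied.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_diseases = R.shape
    G = 0.1 * rng.standard_normal((n_genes, rank))     # gene latent factors
    D = 0.1 * rng.standard_normal((n_diseases, rank))  # disease latent factors
    for _ in range(epochs):
        E = R - G @ D.T               # reconstruction error
        G_grad = -E @ D + reg * G     # gradient of the loss w.r.t. G
        D_grad = -E.T @ G + reg * D   # gradient of the loss w.r.t. D
        G -= lr * G_grad
        D -= lr * D_grad
    return G, D


# Toy binary association matrix: rows are genes, columns are diseases.
R = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
], dtype=float)

G, D = factorize(R)
scores = G @ D.T  # reconstructed matrix, including scores for unobserved pairs

# Rank genes with no known association to disease 2 by their predicted score.
disease = 2
unobserved = np.flatnonzero(R[:, disease] == 0)
ranked = unobserved[np.argsort(-scores[unobserved, disease])]
```

A production setup would work on sparse matrices and typically weight observed and unobserved entries differently; the dense version here only shows the shape of the computation.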
The score takes into account differential gene expression, protein abundance, or methylation level. Machine learning-based models are used to normalize the available omics data across multiple samples from various datasets.
Interactome Community score utilizes several AI graph-based methods applied to the protein-protein interaction network enriched with active drug targets, GWAS hits, and differentially expressed/methylated genes.
This score utilizes a manually curated regulatory network consisting of transcription factors and other genes. A target will be scored higher if its network neighborhood is enriched with transcription factors regulating differentially expressed genes.
This score compares differential expression from the datasets of interest with transcriptomic data on cell lines with perturbation-induced target gene overexpression (LINCS database). A target is scored higher if the differential expression signature is similar to the gene expression changes in perturbed cells.
This score compares differential expression from the datasets of interest with transcriptomic data on cell lines with perturbation-induced target gene knockout (LINCS database). A target is scored lower if the differential expression signature is similar to the gene expression changes in perturbed cells.
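The signature comparison behind the overexpression and knockout scores can be sketched as below. The similarity measure is not specified in the text, so Pearson correlation over shared genes is an assumption, as are the function names and the toy log fold-change values. The sign convention follows the descriptions above: similarity to an overexpression signature raises the score, while similarity to a knockout signature lowers it.

```python
import numpy as np


def signature_similarity(disease_sig, perturb_sig):
    """Pearson correlation of two signatures over the genes they share.

    Each signature is a dict mapping gene symbol -> log fold change.
    Returns 0.0 when the correlation is undefined.
    """
    genes = sorted(set(disease_sig) & set(perturb_sig))
    if len(genes) < 2:
        return 0.0
    x = np.array([disease_sig[g] for g in genes], dtype=float)
    y = np.array([perturb_sig[g] for g in genes], dtype=float)
    if x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])


def perturbation_score(disease_sig, perturb_sig, kind):
    """Turn similarity into a score with the sign convention described above."""
    sim = signature_similarity(disease_sig, perturb_sig)
    if kind == "overexpression":
        return sim   # similar signature -> higher score
    if kind == "knockout":
        return -sim  # similar signature -> lower score
    raise ValueError(f"unknown perturbation kind: {kind}")


# Illustrative values only; not taken from LINCS.
disease_sig = {"A": 2.0, "B": -1.0, "C": 0.5}
perturb_sig = {"A": 1.8, "B": -0.9, "C": 0.4, "D": 3.0}

over_score = perturbation_score(disease_sig, perturb_sig, "overexpression")
ko_score = perturbation_score(disease_sig, perturb_sig, "knockout")
```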
This score estimates target relevance from the perspective of indirect genetic evidence. The signal from the GWAS hits and cancer drivers is propagated through the gene interaction network using diffusion algorithms. Genes belonging to the network submodules enriched with genetic variants are scored higher.
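One common diffusion scheme for this kind of signal propagation is a random walk with restart. The sketch below uses it as an example only: the text does not say which diffusion algorithms are applied, and the function name, restart probability, and toy network are assumptions.

```python
import numpy as np


def propagate(adj, seeds, restart=0.5, tol=1e-8, max_iter=1000):
    """Random walk with restart on an undirected network.

    adj: symmetric adjacency matrix (n x n).
    seeds: length-n vector, nonzero for genes carrying genetic evidence.
    Returns the steady-state visiting probability of each node.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    W = adj / col_sums             # column-normalized transition matrix
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path network 0 - 1 - 2 - 3 with a single seed (e.g., a GWAS hit) at node 0.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = propagate(adj, seeds=[1, 0, 0, 0])
```

In this toy network the propagated signal decays with distance from the seed, so genes closer to the genetic evidence receive higher scores.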
This score estimates whether a gene physically interacts with other genes known to be implicated in a specific disease. The score relies on manually curated data and external sources (ClinVar, Open Targets).
This score ranks genes by mutation burden using genetic data shown to be associated with the current disease/phenotype. Data are compiled from a variety of sources, including ClinVar, GWAS Catalog, and IntOGen. Higher scores indicate a mutation burden that correlates with the given disease.
The score combines several approaches to pathway analysis. First, the iPanda algorithm is used to examine the involvement of a given gene in pathway activation patterns across a collection of gene expression datasets of interest (activation and inhibition of each pathway are examined separately). Next, all pathways from the library are merged into a single network, which is examined from the perspective of signal propagation by a number of methods. The final score indicates how a given gene affects the activation/inhibition of individual pathways and whether it can affect multiple pathways at once.
The Network Neighbors score utilizes several graph-based methods applied to the protein-protein interaction network enriched with differentially expressed/methylated genes. A target is scored higher if more of its network neighbors show significant differences in expression or methylation levels.
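The core idea of counting significant neighbors can be shown with a very small sketch. It is an illustration only: the actual score combines several graph-based methods, and the function name, data format, and toy network here are assumptions.

```python
def network_neighbors_score(ppi, significant, gene):
    """Count the direct interaction partners of `gene` that are significant.

    ppi: dict mapping gene -> set of interaction partners (hypothetical format).
    significant: set of genes with significant expression or methylation changes.
    """
    return sum(1 for neighbor in ppi.get(gene, set()) if neighbor in significant)


# Toy protein-protein interaction network (illustrative data only).
ppi = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}
significant = {"B", "C"}

neighbor_scores = {gene: network_neighbors_score(ppi, significant, gene) for gene in ppi}
ranked = sorted(neighbor_scores, key=neighbor_scores.get, reverse=True)
```

A raw count favors highly connected genes, so a real implementation would likely normalize for node degree or test for enrichment instead.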
The Relevance score is higher for genes that are known drug targets. It also takes into account the number of clinical trials for the corresponding drugs and their phases (later phases are scored higher). The data for the score calculation are obtained from the Open Targets resource.