

phenobo version 2.1.6

GeneticNetworkScore is part of the phenotype and metabotype analysis implemented in PheNoBo. This node is the successor of the PhenoToGeno node and the MetaboToGeno node. It is the predecessor of the NetworkScore node.

The aim of GeneticNetworkScore is to refine the gene scores of PhenoToGeno and MetaboToGeno. The node calculates a new score for each gene based on a genetic network. The procedure increases the scores of genes that interact with known causal genes for the patient's condition. Therefore, the GeneticNetworkScore node enables the detection of new disease genes.

GeneticNetworkScore requires 2 input tables: the initial gene scores and a genetic network. For details on the table formats, see the Input Ports section and the example files provided at https://github.com/marie-sophie/mapra.

The node implements a random walk with restart on a genetic network. The random walk with restart is an iterative procedure based on the update rule $s_{t+1} = (1-r) M s_t + r s_0$, which describes a random transfer of scores along the edges of the network. $s_t$ is a vector holding the scores of all genes after $t$ iterations. The vector $s_0$ contains the initial scores calculated by PhenoToGeno or MetaboToGeno. $M$ is a (sparse) transition matrix representing the edges of the genetic network; its entries $m_{i,j}$ give the probability of transferring scores from gene $j$ to gene $i$. The parameter $r$ gives the fraction of the original scores $s_0$ that is not distributed within the network.
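The update rule can be illustrated with a minimal pure-Python sketch. This is not the node's actual implementation: the function names are hypothetical, and the sketch assumes an unweighted network in which each gene passes its score to its neighbors with equal probability, so that the columns of M sum to 1. Genes without any edge simply keep the fraction r of their initial score.

```python
from collections import defaultdict


def build_transition(edges):
    """Return M as a nested dict with M[i][j] = probability of
    transferring score from gene j to gene i (assumed uniform over
    the neighbors of j)."""
    neighbors = defaultdict(set)
    for gene1, gene2 in edges:  # undirected edges
        neighbors[gene1].add(gene2)
        neighbors[gene2].add(gene1)
    M = defaultdict(dict)
    for j, nbrs in neighbors.items():
        for i in nbrs:
            M[i][j] = 1.0 / len(nbrs)
    return M


def random_walk_with_restart(edges, s0, r=0.9, steps=2):
    """Apply s_{t+1} = (1 - r) * M * s_t + r * s_0 for a fixed number of steps."""
    M = build_transition(edges)
    genes = set(s0) | set(M)
    start = {g: s0.get(g, 0.0) for g in genes}
    s = dict(start)
    for _ in range(steps):
        s = {
            i: (1 - r) * sum(p * s[j] for j, p in M.get(i, {}).items())
            + r * start[i]
            for i in genes
        }
    return s


# Hypothetical toy network A - B - C with all initial score on gene A.
scores = random_walk_with_restart(
    [("A", "B"), ("B", "C")], {"A": 1.0, "B": 0.0, "C": 0.0}
)
```

With two steps, gene C (a neighbor of a neighbor of A) receives a small share of the score that started at A, while A keeps most of it.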

Finally, the gene scores of the random walk with restart are translated into enrichment scores. The enrichment score of a gene with gene score $g$ is determined as $\log_{10}(g \cdot n)$, where $n$ denotes the total number of genes. If the enrichment score is greater than 0, the gene score is higher than expected for a random prediction (where all genes get a score of $1/n$). If the enrichment score is lower than 0, the gene score is lower than expected for a random prediction.
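As a small worked illustration of the enrichment score, the following sketch computes the base-10 logarithm of the gene score multiplied by the number of genes. The function name is hypothetical, and mapping a gene score of 0 to negative infinity is an assumption made here, since the description does not say how the node treats zero scores.

```python
import math


def enrichment_scores(gene_scores):
    """Map each gene score g to log10(g * n), n = total number of genes."""
    n = len(gene_scores)
    return {
        gene: math.log10(g * n) if g > 0 else float("-inf")  # assumption for g == 0
        for gene, g in gene_scores.items()
    }


# Hypothetical scores for 4 genes; a random prediction would give each 1/4.
enrichment = enrichment_scores({"G1": 0.5, "G2": 0.25, "G3": 0.25, "G4": 0.0})
```

In this example G1 scores twice the random expectation and gets a positive enrichment score, while G2 and G3 match the random expectation exactly and get an enrichment score of 0.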


Use Weighted Edges
This option includes edge weights in the calculations. The edge weights are translated into the probabilities $m_{i,j}$ of the transition matrix $M$ of the random walk with restart: the probability of transferring scores along an edge is proportional to the weight of that edge. If this option is checked, the table at input port 1 has to provide a column with integer edge weights.
Restart Probability
The restart probability r controls the fraction of the original scores that is kept by each gene instead of being distributed within the network. For example, if the restart probability is set to r=0.9 (default value), 90% of the original score of a gene is kept and 10% of its score is distributed among its neighbors.
Number of Iterations
This option refers to the parameter t (number of steps) of the random walk with restart. It influences how far the score is spread among the neighbors of a node. For example, if the number of steps is t=2 (default value), the direct neighbors and the neighbors of the direct neighbors receive scores from a node.
Iterate until Convergence
This option provides an alternative to the option Number of Iterations. If you choose this option, the scores are approximated for an infinite number of steps (t=∞). This means that the score of a node is distributed among all other nodes of the network.
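The options Use Weighted Edges and Iterate until Convergence can be sketched together as follows. This is an illustration under stated assumptions, not the node's code: the function names, the tolerance `tol`, the cap `max_steps`, and the stopping criterion (total absolute change between two iterations) are all hypothetical, and duplicate edges are assumed to add up their weights. Each edge weight is divided by the total weight of the edges at the source gene, so transfer probabilities are proportional to the weights.

```python
from collections import defaultdict


def weighted_transition(weighted_edges):
    """M[i][j] = weight(i, j) / total weight of all edges at gene j."""
    weight = defaultdict(lambda: defaultdict(float))
    for gene1, gene2, w in weighted_edges:  # undirected edges
        weight[gene1][gene2] += w
        if gene1 != gene2:
            weight[gene2][gene1] += w
    M = defaultdict(dict)
    for j, nbrs in weight.items():
        total = sum(nbrs.values())
        for i, w in nbrs.items():
            M[i][j] = w / total
    return M


def rwr_until_convergence(M, s0, r=0.9, tol=1e-10, max_steps=10000):
    """Repeat s_{t+1} = (1 - r) * M * s_t + r * s_0 until the scores stop changing.

    Returns the final scores and the number of steps performed.
    """
    genes = set(s0) | set(M)
    start = {g: s0.get(g, 0.0) for g in genes}
    s = dict(start)
    for step in range(1, max_steps + 1):
        nxt = {
            i: (1 - r) * sum(p * s[j] for j, p in M.get(i, {}).items())
            + r * start[i]
            for i in genes
        }
        change = sum(abs(nxt[g] - s[g]) for g in genes)
        s = nxt
        if change < tol:
            return s, step
    return s, max_steps


# Hypothetical network: edge A-B has weight 3, edge B-C has weight 1.
M = weighted_transition([("A", "B", 3), ("B", "C", 1)])
scores, steps_used = rwr_until_convergence(M, {"A": 1.0})
```

Because every iteration shrinks the remaining change by roughly the factor 1-r, a high restart probability such as 0.9 reaches a stable result after only a few iterations in this sketch.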

Input Ports

Scored Genes: a table produced by the PhenoToGeno node or the MetaboToGeno node. GeneticNetworkScore does not require all columns generated by PhenoToGeno or MetaboToGeno; it only uses the columns gene_id and gene_probability.
Network: a table representing a genetic network. Each row corresponds to an undirected edge of the network. The edges are described by 2 columns called gene1 and gene2 giving the gene ids of the edge's vertices. If the option Use Weighted Edges is checked, the table requires a third column named weight with integer values.
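To make the two table layouts concrete, the sketch below parses them from CSV text. KNIME passes tables between nodes directly, so the CSV representation, the gene ids G1 to G3, and the function names are assumptions used for illustration only; the column names are the ones listed above.

```python
import csv
import io

# Hypothetical example data using the column names described above.
SCORED_GENES_CSV = """gene_id,gene_probability
G1,0.6
G2,0.3
G3,0.1
"""

NETWORK_CSV = """gene1,gene2,weight
G1,G2,3
G2,G3,1
"""


def read_scored_genes(text):
    """Return {gene_id: gene_probability}; other columns are ignored."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["gene_id"]: float(row["gene_probability"]) for row in reader}


def read_network(text, use_weighted_edges=False):
    """Return undirected edges as (gene1, gene2, weight) tuples.

    The weight column is only read when use_weighted_edges is set;
    otherwise every edge gets weight 1.
    """
    edges = []
    for row in csv.DictReader(io.StringIO(text)):
        weight = int(row["weight"]) if use_weighted_edges else 1
        edges.append((row["gene1"], row["gene2"], weight))
    return edges


initial_scores = read_scored_genes(SCORED_GENES_CSV)
edges = read_network(NETWORK_CSV, use_weighted_edges=True)
```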

Output Ports

Gene Scores: Each row represents a gene and consists of 3 columns: gene_id, gene_probability and enrichment_score. The column gene_probability contains modified gene scores based on the scores from the table at input port 0. The gene probability indicates the likelihood that the gene is causal for the patient's disease. The column enrichment_score is a gene score that is normalized for the total number of genes. If the enrichment score is above 0, the gene probability is higher than expected for a random prediction.



