PhenoToGeno

PhenoToGeno is part of the phenotype analysis implemented in PheNoBo. This node is the successor of the Phenomizer node and a predecessor of the NetworkScore node.

The aim of PhenoToGeno is to transform the per-disease results of Phenomizer into predictions of causal genes. The PhenoToGeno algorithm calculates a score for each gene based on the p values reported by Phenomizer. The score of a gene x indicates the probability that x is the causal gene for the patient's disease given the observed symptoms.

PhenoToGeno requires three input tables: the disease scores, the disease-gene associations, and the set of all genes to score. For details on the table formats, see the example files provided at https://github.com/marie-sophie/mapra.

The PhenoToGeno algorithm consists of two main steps.
The first step is a score transformation at the disease level: the p value of each disease is converted into the probability that the patient suffers from that disease. A disease with p value p receives the score s = 1/(1 + n·p), where n denotes the total number of diseases.
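The disease-level transformation can be sketched as a small function. This is a minimal, non-authoritative sketch: the function names disease_score and transform_all are hypothetical, and it assumes that n is the number of diseases listed in the Phenomizer output.

```python
def disease_score(p_value: float, n_diseases: int) -> float:
    """Convert a Phenomizer p value into the disease score s = 1 / (1 + n * p)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_diseases < 1:
        raise ValueError("n_diseases must be positive")
    return 1.0 / (1.0 + n_diseases * p_value)


def transform_all(p_values: dict[str, float]) -> dict[str, float]:
    """Transform every disease's p value.

    ASSUMPTION: n is taken as the number of diseases in the input mapping.
    """
    n = len(p_values)
    return {disease: disease_score(p, n) for disease, p in p_values.items()}
```

A p value of 0 maps to a score of 1, and with n = 1000 diseases a p value of 0.001 maps to 1/(1 + 1) = 0.5, so the score drops quickly once n·p exceeds 1.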
The second step transfers each disease score to all genes associated with that disease. If a disease has no known gene associations, its score is distributed among all genes. Genes that receive scores from more than one disease are handled according to the Gene Annotation Mode option (see Options).
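The transfer step can be sketched as follows. This is a sketch under stated assumptions, not the node's implementation: the function name transfer_scores is hypothetical, inputs are plain Python dicts and lists, genes outside the set of genes to score are ignored, and a disease without known genes has its score split evenly across all genes (the description above only says the score is "distributed").

```python
from collections import defaultdict


def transfer_scores(
    disease_scores: dict[str, float],
    disease_genes: dict[str, set[str]],
    all_genes: list[str],
) -> dict[str, list[tuple[str, float]]]:
    """Return, for each gene, the (disease_id, score) pairs it receives."""
    if not all_genes:
        raise ValueError("all_genes must not be empty")
    scored = set(all_genes)
    received: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for disease, score in disease_scores.items():
        genes = disease_genes.get(disease, set())
        if genes:
            # Every associated gene receives the full disease score.
            # ASSUMPTION: genes not in the scored set are ignored.
            for gene in sorted(genes & scored):
                received[gene].append((disease, score))
        else:
            # ASSUMPTION: a disease without known genes has its score
            # split evenly over all genes to score.
            share = score / len(all_genes)
            for gene in all_genes:
                received[gene].append((disease, share))
    return dict(received)
```

The per-gene lists are kept separate on purpose, so that the gene annotation mode can be applied afterwards as an independent step.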
The algorithm of PhenoToGeno is derived from the Phen-Gen tool (see Javed et al., 2014) and is described in more detail at...

Options

Gene Annotation Mode
This option specifies how PhenoToGeno handles genes that obtain scores from more than one disease. There are two possible modes.
Combination of all disease scores: This method combines the scores of all diseases annotated to a gene. The final score of a gene is determined as 1 - (1-s1)(1-s2)...(1-sn) with s1 to sn denoting the scores of its diseases, so the score of a gene never decreases when another disease contributes to it.
Maximum disease score: This method takes the maximum score of all diseases annotated to a gene. The final score of a gene is determined as max(s1,s2,...,sn) with s1 to sn denoting the scores of its diseases.
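Both gene annotation modes can be sketched in a few lines. The function names are hypothetical; the combination mode is written as 1 - (1-s1)(1-s2)...(1-sn), which keeps the result in [0, 1] and treats a gene without any disease score as scoring 0.

```python
import math
from typing import Iterable


def combine_scores(scores: Iterable[float]) -> float:
    """Combination mode: 1 - (1 - s1)(1 - s2)...(1 - sn); 0.0 for no scores."""
    return 1.0 - math.prod(1.0 - s for s in scores)


def max_score(scores: Iterable[float]) -> float:
    """Maximum mode: max(s1, s2, ..., sn); 0.0 for no scores."""
    return max(scores, default=0.0)
```

For a gene that receives 0.5 from each of two diseases, the combination mode gives 1 - 0.5·0.5 = 0.75, whereas the maximum mode gives 0.5. The combination mode therefore rewards genes linked to several matching diseases, while the maximum mode ranks a gene only by its single best disease.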

Input Ports

Output of Phenomizer: a table produced by the Phenomizer node. PhenoToGeno does not require all columns generated by Phenomizer; it only uses the columns disease_id and p_value.
Associations Disease - Gene: a table of associations between diseases and genes. These associations should comprise all known causal genes for the diseases of PhenoDis. The table should have two columns named disease_id and gene_id; each row pairs a PhenoDis disease id with a gene id (e.g. an Ensembl gene id). The gene id may be missing (for a disease without known genes), whereas the disease id is required in every row.
All diseases that should be considered in the PhenoToGeno algorithm have to occur at least once in the table.
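The association table can be grouped into a disease-to-genes mapping that still records diseases without known genes. This is a hypothetical sketch: group_associations is not part of the node, and it assumes rows arrive as (disease_id, gene_id) tuples with a missing gene id represented as None.

```python
from typing import Iterable, Optional


def group_associations(
    rows: Iterable[tuple[str, Optional[str]]],
) -> dict[str, set[str]]:
    """Group (disease_id, gene_id) rows into a disease -> set-of-genes mapping.

    A row whose gene_id is None (a missing value) registers the disease
    without adding a gene, so diseases without known genes remain as keys
    with an empty set.
    """
    associations: dict[str, set[str]] = {}
    for disease_id, gene_id in rows:
        if not disease_id:
            raise ValueError("disease_id is required in every row")
        genes = associations.setdefault(disease_id, set())
        if gene_id is not None:
            genes.add(gene_id)
    return associations
```

Keeping the empty set, rather than dropping the disease, is what allows a later step to recognise that a disease has no known genes and distribute its score among all genes.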
All genes: a table with a single column gene_id. It contains gene ids (e.g. Ensembl gene ids) of all genes to score.

Output Ports

Gene Scores: Each row represents a gene from the All genes input table and consists of three columns: gene_id, gene_probability and contribution. The gene_probability column gives the probability that the gene is causal for the patient's disease. The contribution column lists the diseases that contributed most to the score of the gene.
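Assembling the three output columns can be sketched as below. This is an illustration under assumptions, not the node's output logic: gene_score_table and its top_k parameter are hypothetical, the combination mode is used for gene_probability, and contribution is rendered as a comma-separated list of the top_k highest-scoring diseases, since the exact number and format are not specified here.

```python
import math


def gene_score_table(
    received: dict[str, list[tuple[str, float]]],
    all_genes: list[str],
    top_k: int = 3,
) -> list[tuple[str, float, str]]:
    """Build (gene_id, gene_probability, contribution) rows, one per gene."""
    rows = []
    for gene in all_genes:
        pairs = received.get(gene, [])
        # ASSUMPTION: combination mode, 1 - (1 - s1)...(1 - sn).
        probability = 1.0 - math.prod(1.0 - score for _, score in pairs)
        # HYPOTHETICAL: keep the top_k diseases with the largest scores.
        top = sorted(pairs, key=lambda pair: pair[1], reverse=True)[:top_k]
        contribution = ",".join(disease for disease, _ in top)
        rows.append((gene, probability, contribution))
    return rows
```

Every gene in the table of all genes gets a row, including genes that received no score; they end up with probability 0 and an empty contribution.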

Views

This node has no views
