Phenomizer

This node implements the Phenomizer algorithm for the PhenoDis database at Helmholtz Zentrum.
The Phenomizer method is an ontology-based similarity search algorithm. It compares a list of symptoms against a set of annotated diseases. The similarity measure of the algorithm makes use of a symptom ontology, i.e. a directed acyclic graph that represents an is-a hierarchy of the symptoms. The algorithm is described in detail in the Phenomizer paper by Koehler et al. (2009).
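
As a rough illustration of the idea, the following Python sketch follows the similarity measure described by Koehler et al. (2009): each symptom gets an information content based on how many diseases are annotated with it or with one of its descendants, and each query symptom is matched to the most informative ancestor it shares with any symptom of the disease. The toy ontology, the disease names and all function names are made up for this example. It is not the node's implementation and it ignores frequency weights.

```python
import math

# Hypothetical toy ontology as child -> parents (is-a edges); ids are made up.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "cardio": ["root"],
    "seizure": ["neuro"],
    "tremor": ["neuro"],
    "arrhythmia": ["cardio"],
}

# Hypothetical disease annotations: disease -> set of symptom ids.
ANNOTATIONS = {
    "disease_a": {"seizure", "tremor"},
    "disease_b": {"seizure"},
    "disease_c": {"arrhythmia"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the ontology."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """IC(t) = -log(fraction of diseases annotated with t or a descendant)."""
    counts = {term: 0 for term in PARENTS}
    for symptoms in ANNOTATIONS.values():
        covered = set()
        for symptom in symptoms:
            covered |= ancestors(symptom)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def similarity(query, disease_symptoms, ic):
    """Average, over query symptoms, of the best shared-ancestor IC."""
    best_scores = []
    for q in query:
        best = 0.0
        for d in disease_symptoms:
            common = ancestors(q) & ancestors(d)
            best = max(best, max((ic.get(t, 0.0) for t in common), default=0.0))
        best_scores.append(best)
    return sum(best_scores) / len(best_scores)


ic = information_content()
scores = {name: similarity(["tremor"], symptoms, ic)
          for name, symptoms in ANNOTATIONS.items()}
```

In this toy setup the query symptom tremor scores highest against disease_a, which is annotated with tremor itself, lower against disease_b, which only shares the more general ancestor neuro, and zero against disease_c, which only shares the root.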

Phenomizer requires four input tables: tables 0 to 2 are extracted directly from PhenoDis, and table 3 contains the PhenoDis symptom_ids of the query symptoms. Note that the column names of the tables must match the names specified in the Input Ports section. For more information about the format of the input tables, see the example data at https://github.com/marie-sophie/mapra.

The output of Phenomizer is a list of diseases, each with a similarity score and, if p values are calculated, a p value. The list is sorted first by p value (ascending) and then by score (descending). The score of a disease indicates the similarity between the query symptoms and the symptoms annotated for the disease. The p value of a disease helps to evaluate the significance of the score. Phenomizer uses the following categories to classify the p values:

  • ns : not significant (p value >= 0.05)
  • * : significant (0.01 <= p value < 0.05)
  • ** : very significant (0.001 <= p value < 0.01)
  • *** : extremely significant (p value < 0.001)
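
The thresholds above translate into a simple lookup. A minimal Python sketch follows; the function name significance is chosen here for illustration and is not part of the node.

```python
def significance(p_value):
    """Map a p value to one of the category labels listed above."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"
```

Checking the smallest threshold first means each boundary value falls into the less significant category, which matches the inclusive lower bounds in the list.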

Options

Number of diseases in output
Limits the number of diseases passed to the output table.
Use frequency weights
Phenomizer can use weights to calculate similarity scores. The weights depend on the frequency of a symptom for a given disease (column frequency of the KSZ table). If this option is unchecked, all symptoms have equal weight.
Calculate p values
Phenomizer with p values performs a significance test for the similarity score of each disease. The corresponding p values are part of the output table. The p values are adjusted for multiple testing using the Benjamini-Hochberg method. The diseases are ranked according to their p values.
Phenomizer without p values just reports similarity scores. The diseases are ranked according to those scores.
Choose folder with p value files
This option is required only if the Calculate p values option is checked. Phenomizer with p values depends on files with precalculated score distributions for the PhenoDis database. The provided folder should contain 10 files with empirical score distributions for the diseases in PhenoDis. The files are named length_x.txt with x ranging from 1 to 10. Note that you need different score distributions for Phenomizer with weights and Phenomizer without weights.
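
For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned under Calculate p values, here is a minimal Python sketch of the standard procedure. The function name is an assumption for illustration, and the description does not say how the node implements the adjustment internally.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Each raw p value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller raw p value from ending up with a larger adjusted value than a larger one.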

Input Ports

Symptoms: Symptom table from PhenoDis with the columns symptom_id and symptom_name
Ontology: ISA table from PhenoDis with the columns parent_id and child_id
Symptom-Disease Annotation: KSZ table from PhenoDis with the columns disease_id, disease, symptom_id and frequency. The column frequency is required only if the option Use frequency weights is checked.
Query: table of query symptoms with the column symptom_id
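
Because the node expects these exact column names, it can help to check table headers before connecting them. The Python sketch below does this outside KNIME; the port keys (symptoms, ontology, annotation, query) and the function name are hypothetical labels chosen for this example, while the column names are the ones listed above.

```python
# Required column names per input port, as listed in the Input Ports section.
REQUIRED_COLUMNS = {
    "symptoms": {"symptom_id", "symptom_name"},
    "ontology": {"parent_id", "child_id"},
    "annotation": {"disease_id", "disease", "symptom_id"},
    "query": {"symptom_id"},
}


def missing_columns(tables, use_frequency_weights=False):
    """Return {table name: sorted missing columns} for incomplete tables."""
    problems = {}
    for name, required in REQUIRED_COLUMNS.items():
        needed = set(required)
        # The frequency column only matters when weights are switched on.
        if name == "annotation" and use_frequency_weights:
            needed.add("frequency")
        present = set(tables.get(name, []))
        missing = sorted(needed - present)
        if missing:
            problems[name] = missing
    return problems


# Hypothetical headers of four tables about to be connected to the node.
headers = {
    "symptoms": ["symptom_id", "symptom_name"],
    "ontology": ["parent_id", "child_id"],
    "annotation": ["disease_id", "disease", "symptom_id"],
    "query": ["symptom_id"],
}
```

With these headers the check passes when weights are off, and reports the missing frequency column of the annotation table when use_frequency_weights is set to True.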

Output Ports

Most Similar Diseases: Each row corresponds to a disease and has 3 columns: disease_id, disease and score. If the option Calculate p values is chosen, there are 2 additional columns: p_value and significance.
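
The ranking and the Number of diseases in output option can be pictured with a short Python sketch. The rows are made-up values that reuse the output column names, and rank_diseases is a hypothetical helper, not a function of the node.

```python
def rank_diseases(rows, with_p_values, limit):
    """Sort result rows as described in this page and keep `limit` rows."""
    if with_p_values:
        # Ascending p value first, ties broken by descending score.
        ordered = sorted(rows, key=lambda r: (r["p_value"], -r["score"]))
    else:
        # Without p values the ranking uses the score alone.
        ordered = sorted(rows, key=lambda r: -r["score"])
    return ordered[:limit]


# Made-up rows using the output column names.
rows = [
    {"disease_id": 1, "disease": "A", "score": 2.0, "p_value": 0.04},
    {"disease_id": 2, "disease": "B", "score": 3.5, "p_value": 0.002},
    {"disease_id": 3, "disease": "C", "score": 3.0, "p_value": 0.04},
]
top = rank_diseases(rows, with_p_values=True, limit=2)
```

Here top contains disease B followed by disease C: B has the smallest p value, and C beats A on score because both share the same p value.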

Views

This node has no views
