**ScoreMetabolites** is the first node of the metabotype analysis implemented in **PheNoBo**.
This node is the predecessor of the **MetaboToGeno** node.

The task of ScoreMetabolites is to compare the metabolite concentrations measured for a patient to a set of reference values.
This comparison results in a score and a p value for each measured metabolite.
A high score and a low p value hint at metabolites which strongly deviate from the expected values.
Such metabolites are likely to be related to the patient's disease.

ScoreMetabolites requires 2 tables with input data: the reference values and the measured metabolite concentrations.
For details on the format of the tables, see the Input Port section and
the example files provided at https://github.com/marie-sophie/mapra.

The algorithm of ScoreMetabolites is able to calculate 2 types of scores depending on the missing values in the input data.

**Z Score**:
This node calculates a Z Score for each metabolite that fulfills 2 conditions:

(1) There are sufficient control samples: low missingness in the reference values.

(2) The metabolite was measured for the patient: the concentration in the actual measurement is not missing.

The Z score is calculated as (concentration - mean) / standard deviation.
The corresponding p value is calculated analytically by assuming that the Z scores follow a standard Normal distribution (mean 0, standard deviation 1).
As the measured metabolite concentrations strongly depend on variables like age, sex and fasting state, the reference samples are divided into phenotype groups with separate mean and standard deviation values.
The patient's measured concentration is then compared to the mean and standard deviation of the appropriate phenotype group during calculation of the Z score.
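
The Z score calculation can be illustrated with a minimal Python sketch. The group names, the reference numbers and the function names are hypothetical. Treating the p value as two-sided is also an assumption; the node may use a different tail convention.

```python
import math


def z_score(concentration: float, mean: float, stdev: float) -> float:
    """Standardize a measured concentration against its phenotype group."""
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return (concentration - mean) / stdev


def two_sided_p_value(z: float) -> float:
    """P(|Z| > |z|) for Z ~ N(0, 1), via the complementary error function.

    Assumption: a two-sided test; the node's exact convention is not stated.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical reference statistics for one metabolite, keyed by phenotype group.
reference = {
    "female_fasting": {"mean": 1.0, "stdev": 0.5},
    "male_fasting": {"mean": 1.4, "stdev": 0.4},
}

# The patient's group selects which mean and standard deviation are used.
stats = reference["female_fasting"]
z = z_score(2.0, stats["mean"], stats["stdev"])
p = two_sided_p_value(z)
```

In this example the concentration lies two standard deviations above the group mean, so the p value is slightly below 0.05.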

**Binary Score**:
This node calculates a Binary Score if the data of a metabolite do not meet the conditions for calculating a Z Score.
The binary score can assume 2 different values: 0 and 1.
The binary score is set to 1 if condition (2) is fulfilled but condition (1) does not hold: there are not sufficient reference values to interpret the measured concentration.
The binary score is set to 0 if condition (2) is violated: the concentration of the metabolite in the patient's measurement is missing.
The p value corresponding to a binary score is derived from the missingness in the control samples across all phenotype groups.
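
The choice between the two score types can be sketched as follows. The function name `score_metabolite` and the cutoff `MAX_MISSINGNESS` are hypothetical: the description does not state how much missingness still counts as "low". The p value of a binary score is left out because its exact formula is not given here.

```python
from typing import Optional, Tuple

# Hypothetical cutoff in percent; the actual threshold is not documented here.
MAX_MISSINGNESS = 20.0


def score_metabolite(
    concentration: Optional[float],
    mean: Optional[float],
    stdev: Optional[float],
    missingness: float,
) -> Tuple[str, float]:
    """Return (type, score) for one metabolite of one patient."""
    # Condition (1): enough control samples to interpret a concentration.
    has_reference = (
        mean is not None
        and stdev is not None
        and stdev > 0
        and missingness <= MAX_MISSINGNESS
    )
    # Condition (2): the metabolite was measured for the patient.
    measured = concentration is not None

    if has_reference and measured:
        return "concentration", (concentration - mean) / stdev
    # Binary score: 1 if measured but not interpretable, 0 if not measured.
    return "binary", 1.0 if measured else 0.0
```

For example, `score_metabolite(2.0, None, None, 85.0)` yields a binary score of 1, while `score_metabolite(None, 1.0, 0.5, 2.0)` yields a binary score of 0.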

**Reference**: table with reference values for each metabolite. The table contains summarized values for each metabolite calculated from a set of control samples. The table has 6 columns: **metabolite_id**, **type**, **group**, **mean**, **stdev** and **missingness**.

The column **metabolite_id** contains the identifier (e.g. Metabolon id) of a metabolite. The column **group** refers to a phenotype group of control samples. There is a row for each metabolite id and each phenotype group.

The column **mean** contains the mean value of the current metabolite within the current phenotype group. The column **stdev** contains the standard deviation of the current metabolite within the current phenotype group. The column **missingness** indicates the percentage of control samples with a missing value for the current metabolite (regardless of the group).

The column **type** gives the type of the current row. There are 2 kinds of rows: rows of type *binary* and rows of type *concentration*. The binary entries have only one row per metabolite id. The columns type, mean and stdev of a binary row contain missing values (i.e. the metabolite does not fulfill condition (1)). The entries of type concentration have several rows per metabolite id. These rows should not contain any missing values (i.e. the metabolite meets condition (1)).

**Measurements from patient**: table with measured metabolite concentrations from a patient. The table should have 3 columns named **metabolite_id**, **concentration** and **group**.

The metabolite id should be a unique identifier for each metabolite (e.g. the Metabolon id). The column concentration can either contain a measured concentration or a missing value (if the metabolite concentration was below the limit of detection). The column group gives information about the patient. The patients are grouped e.g. according to age, sex and/or state of fasting.

**The measured concentrations should be logarithmized and normalized in the same way as the reference data at input port 0.**
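
To make the two input tables more concrete, the following Python sketch parses small inline examples with the standard `csv` module. All identifiers and values are invented for illustration. Which cells are left empty in the binary row is an assumption, so the example files in the repository remain the authoritative reference for the format.

```python
import csv
import io

# Hypothetical reference table (input port 0). M1 has one row per phenotype
# group; M2 is a binary entry with a single row and no group statistics.
REFERENCE_CSV = """metabolite_id,type,group,mean,stdev,missingness
M1,concentration,female_fasting,1.0,0.5,2.0
M1,concentration,male_fasting,1.4,0.4,2.0
M2,binary,,,,85.0
"""

# Hypothetical patient table (input port 1). An empty concentration cell
# stands for a missing value.
PATIENT_CSV = """metabolite_id,concentration,group
M1,2.0,female_fasting
M2,,female_fasting
"""


def to_float(cell):
    """Convert a CSV cell to float, mapping empty cells to None."""
    return float(cell) if cell.strip() else None


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Index the reference rows by metabolite id, then by phenotype group.
reference = {}
for row in read_rows(REFERENCE_CSV):
    entry = reference.setdefault(
        row["metabolite_id"],
        {"type": row["type"],
         "missingness": to_float(row["missingness"]),
         "groups": {}},
    )
    if row["type"] == "concentration":
        entry["groups"][row["group"]] = (float(row["mean"]), float(row["stdev"]))

# Convert the patient's concentrations, keeping missing values as None.
patient = [
    {"metabolite_id": row["metabolite_id"],
     "concentration": to_float(row["concentration"]),
     "group": row["group"]}
    for row in read_rows(PATIENT_CSV)
]
```

With this layout, the statistics for a measurement can be looked up as `reference[metabolite_id]["groups"].get(group)`, which returns `None` for binary entries.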

**Scored Metabolites**: table with metabolite scores. Each row represents a metabolite and consists of 4 columns: **metabolite_id**, **type**, **metabolite_score** and **significance**. The column type indicates if a Z score (value *concentration*) or a binary score (value *binary*) was calculated. The column metabolite_score contains the Z scores and the binary scores. The column significance gives the probability of observing a more extreme score than the actual score.

If the metabolite names are provided at input port 0, they are added as an additional column called **metabolite_name**.

- This node has no views
