ScoreMetabolites

ScoreMetabolites is the first node of the metabotype analysis implemented in PheNoBo. This node is the predecessor of the MetaboToGeno node.

The task of ScoreMetabolites is to compare the metabolite concentrations measured for a patient to a set of reference values. This comparison results in a score and a p value for each measured metabolite. A high score and a low p value hint at metabolites which strongly deviate from the expected values. Such metabolites are likely to be related to the patient's disease.

ScoreMetabolites requires 2 tables with input data: the reference values and the measured metabolite concentrations. For details on the format of the tables, see the Input Ports section and the example files provided at https://github.com/marie-sophie/mapra.

ScoreMetabolites calculates one of 2 types of scores for each metabolite, depending on the missing values in the input data.
Z Score: This node calculates a Z score for each metabolite that fulfills 2 conditions:
(1) There are sufficient control samples: low missingness in the reference values.
(2) The metabolite was measured for the patient: the concentration in the actual measurement is not missing.
The Z score is calculated as (concentration - mean) / standard deviation. The corresponding p value is calculated analytically by assuming that the Z scores follow a standard normal distribution (mean 0, standard deviation 1). Because measured metabolite concentrations strongly depend on variables such as age, sex and fasting state, the reference samples are divided into phenotype groups, each with its own mean and standard deviation. The patient's measured concentration is compared to the mean and standard deviation of the matching phenotype group when the Z score is calculated.
Binary Score: This node calculates a binary score if the data of a metabolite do not meet the conditions for calculating a Z score. The binary score takes one of 2 values: 0 or 1. It is set to 1 if condition (2) is fulfilled but condition (1) does not hold, i.e. there are not enough reference values to interpret the measured concentration. It is set to 0 if condition (2) is violated, i.e. the concentration of the metabolite in the patient's measurement is missing. The p value corresponding to a binary score is derived from the missingness in the control samples across all phenotype groups.
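The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the node's actual implementation: the function name score_metabolite is hypothetical, condition (1) is represented by a mean and standard deviation being available, and the p value is assumed to be two-sided. The description does not give the formula for the p value of a binary score, so the sketch leaves it empty.

```python
import math
from typing import Optional, Tuple


def score_metabolite(
    concentration: Optional[float],
    mean: Optional[float],
    stdev: Optional[float],
) -> Tuple[str, float, Optional[float]]:
    """Return (type, score, p_value) for one metabolite (hypothetical helper)."""
    # Condition (1): usable reference values exist for the patient's group.
    reference_ok = mean is not None and stdev is not None
    # Condition (2): the metabolite was measured for the patient.
    measured = concentration is not None

    if reference_ok and measured:
        if stdev <= 0:
            raise ValueError("standard deviation must be positive")
        z = (concentration - mean) / stdev
        # Assumption: two-sided tail probability under a standard normal.
        p = math.erfc(abs(z) / math.sqrt(2.0))
        return ("concentration", z, p)

    # Binary score: 1 if measured but not interpretable, 0 if not measured.
    score = 1.0 if measured else 0.0
    # The p value is derived from the missingness in the control samples;
    # the exact formula is not stated in the description, so it is omitted.
    return ("binary", score, None)
```

For example, a concentration of 2.0 compared to a group mean of 1.0 and a standard deviation of 0.5 gives a Z score of 2.0 and, under the two-sided assumption, a p value of roughly 0.046.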

Input Ports

Reference: table with reference values for each metabolite. The table contains summarized values for each metabolite calculated from a set of control samples. The table has 6 columns: metabolite_id, type, group, mean, stdev and missingness.
The column metabolite_id contains the identifier (e.g. Metabolon id) of a metabolite. The column group refers to a phenotype group of control samples. There is a row for each metabolite id and each phenotype group.
The column mean contains the mean value of the current metabolite within the current phenotype group. The column stdev contains the standard deviation of the current metabolite within the current phenotype group. The column missingness gives the percentage of control samples with a missing value for the current metabolite (regardless of the group).
The column type gives the type of the current row. There are 2 kinds of rows: rows of type binary and rows of type concentration. A binary entry has only one row per metabolite id. The columns group, mean and stdev of a binary row contain missing values (i.e. the metabolite does not fulfill condition (1)). An entry of type concentration has several rows per metabolite id, one per phenotype group. These rows should not contain any missing values (i.e. the metabolite meets condition (1)).
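To make the layout of the reference table concrete, the following Python sketch builds its rows for one metabolite from raw control samples. It is an illustration under several assumptions that are not stated in the description: the function name summarize_metabolite, the cut-off MAX_MISSINGNESS, the use of the sample standard deviation, and leaving the group of a binary row empty are all hypothetical choices.

```python
from statistics import mean, stdev

# Hypothetical cut-off (in percent) above which a metabolite is treated as binary.
MAX_MISSINGNESS = 20.0


def summarize_metabolite(metabolite_id, samples):
    """Build reference rows from control samples of one metabolite.

    samples is a list of (group, value) pairs; value is None if missing.
    """
    # Missingness is computed over all control samples, regardless of group.
    missing = sum(1 for _, value in samples if value is None)
    missingness = 100.0 * missing / len(samples)

    if missingness > MAX_MISSINGNESS:
        # Binary entry: a single row without group, mean and stdev.
        return [{"metabolite_id": metabolite_id, "type": "binary",
                 "group": None, "mean": None, "stdev": None,
                 "missingness": missingness}]

    # Concentration entry: one row per phenotype group.
    by_group = {}
    for group, value in samples:
        if value is not None:
            by_group.setdefault(group, []).append(value)

    rows = []
    for group, values in sorted(by_group.items()):
        # Assumes every group has at least two non-missing values.
        rows.append({"metabolite_id": metabolite_id, "type": "concentration",
                     "group": group, "mean": mean(values),
                     "stdev": stdev(values), "missingness": missingness})
    return rows
```

Note that the missingness value is repeated on every row of a concentration entry, because it describes the metabolite as a whole and not a single phenotype group.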
Measurements from patient: table with measured metabolite concentrations from a patient. The table should have 3 columns named metabolite_id, concentration and group.
The column metabolite_id should contain a unique identifier for each metabolite (e.g. the Metabolon id). The column concentration contains either a measured concentration or a missing value (if the metabolite concentration was below the limit of detection). The column group gives the phenotype group of the patient. Patients are grouped e.g. according to age, sex and/or fasting state.
The measured concentrations should be logarithmized and normalized in the same way as the reference data at input port 0.

Output Ports

Scored Metabolites: table with metabolite scores. Each row represents a metabolite and consists of 4 columns: metabolite_id, type, metabolite_score and significance. The column type indicates whether a Z score (value concentration) or a binary score (value binary) was calculated. The column metabolite_score contains the Z scores and the binary scores. The column significance gives the probability of observing a more extreme score than the actual score.
If the metabolite names are provided at input port 0, they are added as an additional column called metabolite_name.
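How the two input tables could be combined into the output table is sketched below in Python. The sketch is a simplified illustration and not the node's code: the function name score_table is hypothetical, tables are modeled as lists of dictionaries, the p value of a Z score is assumed to be two-sided, and the significance of binary rows is left empty because the description does not state how it is derived from the missingness. A metabolite without a matching concentration row for the patient's group is treated as binary here, which is also an assumption.

```python
import math


def score_table(reference_rows, patient_rows):
    """Join patient measurements with reference rows and score each metabolite."""
    # Index the concentration rows by (metabolite_id, group) for fast lookup.
    reference = {
        (row["metabolite_id"], row["group"]): row
        for row in reference_rows
        if row["type"] == "concentration"
    }

    scored = []
    for measurement in patient_rows:
        key = (measurement["metabolite_id"], measurement["group"])
        ref = reference.get(key)
        concentration = measurement["concentration"]

        if ref is not None and concentration is not None:
            # Z score against the patient's own phenotype group.
            score = (concentration - ref["mean"]) / ref["stdev"]
            # Assumption: two-sided tail probability under a standard normal.
            significance = math.erfc(abs(score) / math.sqrt(2.0))
            row_type = "concentration"
        else:
            # Binary score: 1 if measured, 0 if the measurement is missing.
            score = 1.0 if concentration is not None else 0.0
            significance = None  # formula not given in the description
            row_type = "binary"

        scored.append({
            "metabolite_id": measurement["metabolite_id"],
            "type": row_type,
            "metabolite_score": score,
            "significance": significance,
        })
    return scored
```

Each returned dictionary carries the 4 output columns named in the port description, so the result can be written out directly as a table.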

Views

This node has no views

Installation

To use this node in KNIME, download the below referenced file, save it to your KNIME's plugin folder and restart KNIME.

v5.6

Plugin provider:

Plugin version: 2.1.6

On NodePit since: 2025-08-15

Last update: 2025-08-22

KNIME versions: Since v3.6