
MQ_PGs_LFQ_general_1.3.0

# MQ_PGs_LFQ_general workflow

## Description of the workflow and general remarks

MQ_PGs_LFQ_general is a KNIME workflow developed for the general processing of label-free bottom-up mass spectrometry data.

Please note that you should understand the data structure and experimental design of the study in order to choose the correct processing approach. Processing may require consultation with a (bio)statistician to achieve correct outputs.

## Input data

The workflow takes as input the proteinGroups.txt file, an output of the MaxQuant software (http://coxdocs.org/doku.php?id=maxquant:start). In general, the workflow is also applicable to other data in a wide-format data table; in that case, node settings may need further adjustment (e.g. different column names, prefixes and suffixes).

## Documentation of used nodes/metanodes

The workflow contains several nodes for data processing:

- File Reader
- Contaminants filtering (e.g. cRAP)
- Column Resorter
- Log2 transformation of columns containing peptide intensities and LFQ intensities
- Normalization
  - median normalization (linear)
  - quantile normalization (non-linear)
  - LoessF normalization (non-linear)
  - vsn normalization (non-linear)
  - MaxLFQ normalization
- Statistics (LIMMA test)
- Sorter node and Column Filter node
- UniProt query and Values lookup node (appends the information from UniProt to the original dataframe)
- Excel Sheet Appender node (export as .xlsx file)

At each step of the workflow, data can be visualized with several visualization nodes, including Violin Plots, Correlation Clustermaps, Volcano plots, Scatter plot matrices and Hierarchical Clustering.

## Example dataset

An example dataset (proteinGroups.txt file) is provided with the workflow. It can be processed by connecting the 'Example dataset' node to the 'MQ PGs filtering' node and running the subsequent nodes.
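Quantile normalization, one of the non-linear options listed above, gives every sample the same intensity distribution by replacing each value with the mean of the values that share its rank across samples. The Python sketch below only illustrates that idea; it assumes complete log2 data (no missing values), does not average ties, and is not necessarily what the workflow's normalization node computes.

```python
# Minimal quantile normalization sketch (standard library only).
# Assumption: `columns` maps sample name -> list of log2 intensities,
# all lists have equal length and contain no missing values; ties are not averaged.
import statistics


def quantile_normalize(columns):
    n = len(next(iter(columns.values())))
    # For every sample, the row indices ordered by increasing intensity.
    orders = {name: sorted(range(n), key=vals.__getitem__) for name, vals in columns.items()}
    # Reference distribution: mean of the k-th smallest value across all samples.
    reference = [
        statistics.fmean(columns[name][order[k]] for name, order in orders.items())
        for k in range(n)
    ]
    # Each value is replaced by the reference value of its rank.
    result = {}
    for name, order in orders.items():
        out = [0.0] * n
        for rank, row in enumerate(order):
            out[row] = reference[rank]
        result[name] = out
    return result
```

After normalization, every sample contains exactly the same set of values, only in a different row order.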
## Additional nodes recommended for use with this workflow

The nodes described above are, in our view, the basis of label-free bottom-up mass spectrometry data processing. However, we also recommend using other nodes for more advanced data processing and evaluation. Potentially useful nodes include:

- Missing value imputation node: provides several imputation strategies.
- Upset plot (interactive): displays the intersections between particular datasets.
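Among the imputation strategies, a common choice for label-free data (popularized by the Perseus software) draws missing log2 values from a narrow normal distribution shifted below the observed values, on the assumption that most missing values come from low-abundance proteins. The sketch below is a hypothetical stand-in, not the imputation node's implementation; the shift of 1.8 and width of 0.3 standard deviations are commonly used defaults, not settings taken from this workflow.

```python
# Sketch of down-shifted normal imputation for one sample (standard library only).
# Assumptions: values are log2 intensities, None marks a missing value, and at
# least two values are observed. `shift` and `width` are in units of the observed
# standard deviation (illustrative defaults, not workflow settings).
import random
import statistics


def impute_downshifted(values, shift=1.8, width=0.3, seed=None):
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)
    # Replace each missing value with a draw from N(mu - shift*sd, width*sd),
    # i.e. from the low-intensity end of the observed distribution.
    return [
        v if v is not None else rng.gauss(mu - shift * sd, width * sd)
        for v in values
    ]
```

Passing a fixed `seed` makes the imputed values reproducible between runs.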
log2 calculation for all Intensity columns
Math Formula (Multi Column)
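The Math Formula (Multi Column) node applies one expression to every selected column. A plain-Python equivalent of the log2 step might look like this. Treating zero (or empty) intensities as missing is an assumption based on MaxQuant reporting 0 where a protein group was not quantified, and the `Intensity ` prefix is illustrative.

```python
# Sketch of the log2 step for intensity columns of one table row (standard library only).
# Assumption: `row` is a dict of column name -> text value (e.g. from csv.DictReader);
# a zero or empty intensity means "not quantified" and becomes None instead of -inf.
import math


def log2_columns(row, prefix="Intensity "):
    out = {}
    for name, raw in row.items():
        if not name.startswith(prefix):
            continue  # only transform the selected intensity columns
        value = float(raw) if raw else 0.0
        out[name] = math.log2(value) if value > 0 else None
    return out
```

The same function covers the LFQ step by calling it with `prefix="LFQ intensity "`. The trailing space in the default prefix excludes the summed `Intensity` column.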
computes LIMMA test
LIMMA tests
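LIMMA compares groups with a moderated t-statistic: each protein's residual variance is shrunk towards a prior variance shared by all proteins. The sketch below shows only that final formula for one protein in a two-group design. The prior degrees of freedom `d0` and prior variance `s0_sq` are passed in by hand, whereas limma's `eBayes()` estimates them from all proteins; p-values and multiple-testing adjustment are left out.

```python
# Sketch of a limma-style moderated t-statistic for one protein (standard library only).
# Assumptions: two groups of log2 values without missing data (at least two per group),
# and prior hyperparameters d0 and s0_sq supplied by the caller instead of being
# estimated by empirical Bayes across all proteins.
import math
import statistics


def moderated_t(group_a, group_b, d0, s0_sq):
    n_a, n_b = len(group_a), len(group_b)
    d = n_a + n_b - 2  # residual degrees of freedom of this protein
    # Pooled residual variance of this protein.
    s_sq = ((n_a - 1) * statistics.variance(group_a)
            + (n_b - 1) * statistics.variance(group_b)) / d
    # Shrink the protein's variance towards the prior variance.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    log2_fc = statistics.fmean(group_b) - statistics.fmean(group_a)
    t = log2_fc / math.sqrt(s_tilde_sq * (1 / n_a + 1 / n_b))
    # Return the log2 fold change, the moderated t and its degrees of freedom.
    return log2_fc, t, d0 + d
```

With `d0 = 0` the moderated statistic reduces to the ordinary pooled two-sample t-statistic; a small prior variance with large `d0` raises |t| for proteins whose own variance is above the prior.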
reorders columns into a more meaningful order
Column Resorter
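How columns are re-sorted depends on the node configuration. As a hypothetical example, grouping columns by name prefix with a stable sort keeps the original order inside each group; the prefixes below are illustrative and not read from the Column Resorter settings.

```python
# Sketch of a column re-sorting rule (standard library only).
# Assumption (hypothetical): identifier columns first, then raw intensities,
# then LFQ intensities, then everything else.
GROUP_PREFIXES = ("Protein IDs", "Majority protein IDs", "Intensity", "LFQ intensity")


def resort_columns(columns):
    def group(name):
        for i, prefix in enumerate(GROUP_PREFIXES):
            if name.startswith(prefix):
                return i
        return len(GROUP_PREFIXES)  # unmatched columns go last

    # sorted() is stable, so the original order is kept within each group.
    return sorted(columns, key=group)
```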
filters out contaminants (e.g. cRAP)
MQ PGs filtering (e.g. cRAP)
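A minimal sketch of contaminant and decoy filtering on rows read from proteinGroups.txt. It assumes the usual MaxQuant convention of marking such rows with "+" in the `Potential contaminant`, `Reverse` and `Only identified by site` columns; the actual rules of the filtering metanode may differ.

```python
# Sketch of typical MaxQuant proteinGroups filtering (standard library only).
# Assumption: a row is removed if any marker column contains "+";
# a missing marker column counts as "not flagged".
MARKER_COLUMNS = ("Potential contaminant", "Reverse", "Only identified by site")


def filter_protein_groups(rows):
    return [
        row for row in rows
        if all(row.get(col, "") != "+" for col in MARKER_COLUMNS)
    ]
```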
log2 calculation for all LFQ intensity columns
Math Formula (Multi Column)
writes a sheet with the processed proteinGroups table (change the filename)
Excel Writer
proteinGroups.txt file from the __inputs__ folder
File Reader
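proteinGroups.txt is a tab-separated table with a header row. Outside KNIME, it could be loaded as a list of row dictionaries like this; the UTF-8 encoding and the absence of quoting are assumptions.

```python
# Sketch of reading proteinGroups.txt into a list of dicts (standard library only).
import csv


def parse_protein_groups(lines):
    # Assumption: plain tab-separated text without quoting; QUOTE_NONE keeps any
    # quote characters inside protein names as literal text.
    return list(csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE))


def read_protein_groups(path):
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_protein_groups(handle)
```

All values are returned as text; numeric columns still need to be converted before calculations.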
appends data from UniProtKB to the original dataframe
Values lookup
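The Values lookup step amounts to a left join. In this hypothetical sketch, `annotations` maps a UniProt accession to the fields returned by the query, and each protein group is matched on the first entry of its `Majority protein IDs` column; rows without a match are kept unchanged.

```python
# Sketch of a left join that appends UniProt annotations to protein-group rows.
# Assumptions: `annotations` is a dict accession -> dict of annotation fields,
# and the first semicolon-separated ID of `key_column` is used as the join key.
def append_annotations(rows, annotations, key_column="Majority protein IDs"):
    joined = []
    for row in rows:
        accession = row[key_column].split(";")[0]
        # Unmatched rows keep their original data; no annotation fields are added.
        joined.append({**row, **annotations.get(accession, {})})
    return joined
```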
retrieves protein information from UniProtKB based on the majority protein list
UniProt query
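A UniProtKB request for one protein group could be built as follows. The sketch assumes the public UniProt REST search endpoint and its `accession`, `protein_name` and `gene_names` return fields. Skipping MaxQuant `CON__`/`REV__` entries and stripping isoform suffixes are illustrative choices, not documented behaviour of the UniProt query node. The function only builds the URL and does not send the request.

```python
# Sketch: build a UniProt REST search URL for one "Majority protein IDs" entry.
# Assumptions: the https://rest.uniprot.org/uniprotkb/search endpoint and the
# listed return fields; the UniProt query node may query UniProtKB differently.
from urllib.parse import urlencode

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_query_url(majority_ids, fields=("accession", "protein_name", "gene_names")):
    accessions = []
    for entry in majority_ids.split(";"):
        # Skip contaminant/decoy entries added by MaxQuant.
        if entry.startswith(("CON__", "REV__")):
            continue
        # Drop isoform suffixes such as "-2" and avoid duplicates.
        accession = entry.split("-")[0]
        if accession not in accessions:
            accessions.append(accession)
    query = " OR ".join(f"accession:{acc}" for acc in accessions)
    params = {"query": query, "fields": ",".join(fields), "format": "tsv"}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"
```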
Example dataset
select the normalization method for further data processing; non-normalized data and data already normalized in the previously used processing software are already included in the data table, so select "no normalization" if no further normalization is needed
Data normalization
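Median normalization, the linear option, shifts each sample so that all samples share the same median; on log2 data this is a subtraction. In the sketch below, the common target is the mean of the sample medians, which is an assumption. Missing values (`None`) are ignored when computing medians and stay missing.

```python
# Sketch of median normalization on log2 data (standard library only).
# Assumption: `columns` maps sample name -> list of log2 values (None = missing);
# every sample is shifted so its median equals the mean of all sample medians.
import statistics


def median_normalize(columns):
    medians = {
        name: statistics.median(v for v in values if v is not None)
        for name, values in columns.items()
    }
    target = statistics.fmean(medians.values())
    return {
        name: [None if v is None else v - medians[name] + target for v in values]
        for name, values in columns.items()
    }
```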
please adjust before running:
1) select samples and subsets
2) specify normalization techniques to test
3) specify the set of data to test
4) select graphs for visualization
graphs can be found in the _outputs_ folder of the workflow
Normalization tests - sub WF
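One simple way to compare normalization techniques is to check how tightly replicates agree after each of them. The metric below, a pooled within-group standard deviation of log2 values, is an illustrative choice and not necessarily one of the metrics produced by the sub-workflow; `groups`, mapping each condition to its replicate sample names, is a hypothetical input.

```python
# Sketch of a pooled within-group standard deviation (standard library only).
# Assumptions: `columns` maps sample name -> list of log2 values (None = missing),
# `groups` maps condition -> list of replicate sample names (hypothetical input).
import math
import statistics


def pooled_within_group_sd(columns, groups):
    sum_sq, dof = 0.0, 0
    for samples in groups.values():
        # Walk over proteins (rows) and pool the replicate variances.
        for row in zip(*(columns[s] for s in samples)):
            values = [v for v in row if v is not None]
            if len(values) < 2:
                continue  # a variance needs at least two replicate values
            sum_sq += statistics.variance(values) * (len(values) - 1)
            dof += len(values) - 1
    if dof == 0:
        raise ValueError("need at least one row with two replicate values")
    return math.sqrt(sum_sq / dof)
```

Computing it for the non-normalized table and for each normalized version gives one number per method; lower values mean less within-group variation.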
run before the Normalization tests metanode; sample names are output as a flow variable
Sample name extraction
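Sample names can be derived from the column headers. This sketch assumes they follow an `LFQ intensity ` prefix; in KNIME the resulting list would be passed on as a flow variable, here it is simply returned.

```python
# Sketch: derive sample names from column headers (standard library only).
# Assumption: every sample has one column named "<prefix><sample name>".
def extract_sample_names(columns, prefix="LFQ intensity "):
    return [name[len(prefix):] for name in columns if name.startswith(prefix)]
```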
removes ID columns whose cells are too long for Excel (e.g. the msms and evidence ID columns)
Column Filter
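Excel limits a single cell to 32,767 characters, which the ID lists of abundant protein groups can exceed. The Column Filter node removes such columns by name; the sketch below shows an alternative, hypothetical approach that detects them by cell length instead.

```python
# Sketch: drop every column that has at least one cell longer than Excel allows.
EXCEL_MAX_CELL_CHARS = 32_767


def excel_safe_columns(rows):
    # Collect names of columns with any over-long cell.
    too_long = {
        name
        for row in rows
        for name, value in row.items()
        if len(str(value)) > EXCEL_MAX_CELL_CHARS
    }
    # Rebuild each row without those columns.
    return [{k: v for k, v in row.items() if k not in too_long} for row in rows]
```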
sorts protein groups (PGs) by total intensity
Sorter
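Sorting by total intensity can be sketched as below, assuming the total is stored in the MaxQuant `Intensity` column as text read from proteinGroups.txt.

```python
# Sketch: sort protein groups by total intensity, highest first.
# Assumption: the "Intensity" column holds the summed intensity as text.
def sort_by_total_intensity(rows, column="Intensity"):
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)
```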
