
ProteomicsLFQ

Generic Workflow Nodes for KNIME: OpenMS version 2.6.0.202009301213 by Freie Universitaet Berlin, Universitaet Tuebingen, and the OpenMS Team

A standard proteomics LFQ pipeline.

Options

version
Version of the tool that generated this parameters file.
proteinFDR
Protein FDR threshold (0.05=5%).
seedThreshold
Peak intensity threshold applied in seed detection.
psmFDR
PSM FDR threshold (e.g. 0.05=5%). If Bayesian inference was chosen, it is equivalent to a peptide FDR.
protein_inference
Infer proteins: 'aggregation' = aggregates all peptide scores across a protein (by calculating the maximum); 'bayesian' = computes a posterior probability for every protein based on a Bayesian network. Note: 'bayesian' only uses and reports the best PSM per peptide.
protein_quantification
Quantify proteins based on: 'unique_peptides' = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides); 'strictly_unique_peptides' = use peptides mapping to a unique single protein only; 'shared_peptides' = use shared peptides only for their best group (by inference score).
quantification_method
feature_intensity: MS1 signal. spectral_counting: PSM counts.
targeted_only
true: Only ID based quantification. false: include unidentified features so they can be linked to identified ones (=match between runs).
transfer_ids
Requantification using mean of aligned RTs of a peptide feature. Only applies to peptides that were quantified in more than 50% of all runs (of a fraction).
mass_recalibration
Mass recalibration.
keep_feature_top_psm_only
If false, also keeps lower ranked PSMs that have the top-scoring sequence as a candidate per feature in the same file.
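As an illustration of the 'aggregation' mode of 'protein_inference', the following minimal Python sketch assigns each protein the maximum score of its peptides. It is not the OpenMS implementation: the function name, the input layout, and the assumption that a higher score is better (for example 1 - PEP) are all hypothetical.

```python
def aggregate_protein_scores(peptide_hits):
    """Score each protein by the best (maximum) score among its peptides.

    peptide_hits: iterable of (peptide, score, accessions) tuples.
    Assumption: higher score = better; the input layout is hypothetical.
    """
    protein_scores = {}
    for _peptide, score, accessions in peptide_hits:
        for accession in accessions:
            current = protein_scores.get(accession, score)
            protein_scores[accession] = max(current, score)
    return protein_scores


# Made-up example data: one shared peptide, two unique peptides.
hits = [
    ("PEPTIDEK", 0.90, ["P1"]),
    ("LSSPATLNSR", 0.99, ["P1", "P2"]),
    ("AAAK", 0.40, ["P2"]),
]
scores = aggregate_protein_scores(hits)
```

Here the shared peptide has the highest score, so it determines the score of both proteins.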
log
Name of log file (created only when specified)
debug
Sets the debug level
threads
Sets the number of threads allowed to be used by the TOPP tool
no_progress
Disables progress logging to command line
force
Overrides tool-specific checks
test
Enables the test mode (needed for internal use only)
signal_to_noise
Minimal signal-to-noise ratio for a peak to be picked (0.0 disables SNT estimation!)
spacing_difference_gap
The extension of a peak is stopped if the spacing between two subsequent data points exceeds 'spacing_difference_gap * min_spacing'. 'min_spacing' is the smaller of the two spacings from the peak apex to its two neighboring points. '0' to disable the constraint. Not applicable to chromatograms.
spacing_difference
Maximum allowed difference between points during peak extension, in multiples of the minimal difference between the peak apex and its two neighboring points. If this difference is exceeded a missing point is assumed (see parameter 'missing'). A higher value implies a less stringent peak definition, since individual signals within the peak are allowed to be further apart. '0' to disable the constraint. Not applicable to chromatograms.
missing
Maximum number of missing points allowed when extending a peak to the left or to the right. A missing data point occurs if the spacing between two subsequent data points exceeds 'spacing_difference * min_spacing'. 'min_spacing' is the smaller of the two spacings from the peak apex to its two neighboring points. Not applicable to chromatograms.
ms_levels
List of MS levels for which the peak picking is applied. If empty, auto mode is enabled, all peaks which aren't picked yet will get picked. Other scans are copied to the output without changes.
report_FWHM
Add metadata for FWHM (as floatDataArray named 'FWHM' or 'FWHM_ppm', depending on param 'report_FWHM_unit') for each picked peak.
report_FWHM_unit
Unit of FWHM. Either absolute in the unit of input, e.g. 'm/z' for spectra, or relative as ppm (only sensible for spectra, not chromatograms).
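The spacing rules described for 'spacing_difference_gap', 'spacing_difference' and 'missing' can be sketched as below. This is only an illustration of the two threshold checks, not the actual peak picker (which also considers intensities); the function names and the example threshold values are assumptions.

```python
def min_spacing_at_apex(mz, apex):
    """Smaller of the two spacings from the apex to its direct neighbors."""
    return min(mz[apex] - mz[apex - 1], mz[apex + 1] - mz[apex])


def classify_spacing(spacing, min_spacing, spacing_difference, spacing_difference_gap):
    """Classify the spacing between two subsequent data points.

    'gap'     -> peak extension stops here
    'missing' -> counts as one missing point (limited by 'missing')
    'ok'      -> regular spacing
    A threshold of 0 disables the corresponding constraint.
    """
    if spacing_difference_gap > 0 and spacing > spacing_difference_gap * min_spacing:
        return "gap"
    if spacing_difference > 0 and spacing > spacing_difference * min_spacing:
        return "missing"
    return "ok"


# Made-up m/z values; the apex is at index 1. Thresholds are illustrative only.
mz = [500.00, 500.01, 500.02, 500.05, 500.20]
min_spacing = min_spacing_at_apex(mz, 1)
labels = [
    classify_spacing(mz[i + 1] - mz[i], min_spacing, 1.5, 4.0)
    for i in range(1, len(mz) - 1)
]
```

Walking to the right of the apex, the first step is regular, the second is counted as a missing point, and the third exceeds the gap threshold and would end the extension.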
max_intensity
maximal intensity considered for histogram construction. By default, it will be calculated automatically (see auto_mode). Only provide this parameter if you know what you are doing (and change 'auto_mode' to '-1')! All intensities EQUAL/ABOVE 'max_intensity' will be added to the LAST histogram bin. If you choose 'max_intensity' too small, the noise estimate might be too small as well. If chosen too big, the bins become quite large (which you could counter by increasing 'bin_count', which increases runtime). In general, the Median-S/N estimator is more robust to a manual max_intensity than the MeanIterative-S/N.
auto_max_stdev_factor
parameter for 'max_intensity' estimation (if 'auto_mode' == 0): mean + 'auto_max_stdev_factor' * stdev
auto_max_percentile
parameter for 'max_intensity' estimation (if 'auto_mode' == 1): the 'auto_max_percentile'-th percentile
auto_mode
method to use to determine maximal intensity: -1 --> use 'max_intensity'; 0 --> 'auto_max_stdev_factor' method (default); 1 --> 'auto_max_percentile' method
win_len
window length in Thomson
bin_count
number of bins for intensity values
min_required_elements
minimum number of elements required in a window (otherwise it is considered sparse)
noise_for_empty_window
noise value used for sparse windows
write_log_messages
Write out log messages in case of sparse windows or median in rightmost histogram bin
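How 'auto_mode' selects the value of 'max_intensity' can be sketched as follows. This is a simplified illustration, not the OpenMS code: the function name is hypothetical, and both the use of the population standard deviation and the nearest-rank percentile are assumptions made to keep the example short.

```python
import math
import statistics


def estimate_max_intensity(intensities, auto_mode, *, max_intensity=None,
                           auto_max_stdev_factor=None, auto_max_percentile=None):
    """Pick the maximal intensity used for histogram construction."""
    if auto_mode == -1:
        # Manual mode: the user-supplied value is used as is.
        if max_intensity is None:
            raise ValueError("auto_mode == -1 requires 'max_intensity'")
        return max_intensity
    if auto_mode == 0:
        # mean + 'auto_max_stdev_factor' * stdev
        mean = statistics.mean(intensities)
        stdev = statistics.pstdev(intensities)  # assumption: population stdev
        return mean + auto_max_stdev_factor * stdev
    if auto_mode == 1:
        # Percentile of the observed intensities (nearest-rank, an assumption).
        ranked = sorted(intensities)
        rank = max(1, math.ceil(auto_max_percentile / 100 * len(ranked)))
        return ranked[rank - 1]
    raise ValueError("auto_mode must be -1, 0 or 1")


# Made-up intensities and illustrative parameter values.
data = [10, 20, 30, 40, 100]
by_stdev = estimate_max_intensity(data, 0, auto_max_stdev_factor=3.0)
by_percentile = estimate_max_intensity(data, 1, auto_max_percentile=95)
manual = estimate_max_intensity(data, -1, max_intensity=80)
```

With a single large outlier in the data, the stdev-based estimate lands above the largest observed value, while the percentile-based estimate can never exceed it.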
debug
Debug level for feature detection.
batch_size
Number of peptides used in each batch of chromatogram extraction. Smaller values decrease memory usage but increase runtime.
mz_window
m/z window size for chromatogram extraction (unit: ppm if 1 or greater, else Da/Th)
n_isotopes
Number of isotopes to include in each peptide assay.
isotope_pmin
Minimum probability for an isotope to be included in the assay for a peptide. If set, this parameter takes precedence over 'extract:n_isotopes'.
rt_quantile
Quantile of the RT deviations between aligned internal and external IDs to use for scaling the RT extraction window
rt_window
RT window size (in sec.) for chromatogram extraction. If set, this parameter takes precedence over 'extract:rt_quantile'.
min_peak_width
Minimum elution peak width. Absolute value in seconds if 1 or greater, else relative to 'peak_width'.
signal_to_noise
Signal-to-noise threshold for OpenSWATH feature detection
mapping_tolerance
RT tolerance (plus/minus) for mapping peptide IDs to features. Absolute value in seconds if 1 or greater, else relative to the RT span of the feature.
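Several parameters above ('mz_window', 'min_peak_width', 'mapping_tolerance') switch their unit depending on whether the value is 1 or greater. The sketch below shows one way to read that rule; the helper names are hypothetical and the code is not taken from OpenMS.

```python
def mz_window_in_th(mz_window, mz):
    """Interpret 'mz_window': ppm if 1 or greater, else Da/Th.

    Returns the window size in Th at the given m/z.
    """
    if mz_window >= 1:
        return mz * mz_window * 1e-6  # ppm -> Th at this m/z
    return mz_window


def absolute_or_relative(value, reference):
    """Absolute value (seconds) if 1 or greater, else a fraction of 'reference'.

    For 'min_peak_width' the reference would be 'peak_width'; for
    'mapping_tolerance' it would be the RT span of the feature.
    """
    return value if value >= 1 else value * reference


# Illustrative values only.
window_ppm = mz_window_in_th(10, 500.0)        # 10 ppm at m/z 500
window_abs = mz_window_in_th(0.02, 500.0)      # 0.02 Th, independent of m/z
tolerance_rel = absolute_or_relative(0.2, 60.0)  # 20% of a 60 s feature
tolerance_abs = absolute_or_relative(5, 60.0)    # 5 s
```

Note that a value of exactly 1 falls on the "absolute" side of the rule, so it means 1 ppm or 1 second rather than 100%.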
samples
Number of observations to use for training ('0' for all)
no_selection
By default, roughly the same number of positive and negative observations, with the same intensity distribution, are selected for training. This aims to reduce biases, but also reduces the amount of training data. Set this flag to skip this procedure and consider all available observations (subject to 'svm:samples').
kernel
SVM kernel
xval
Number of partitions for cross-validation (parameter optimization)
log2_C
Values to try for the SVM parameter 'C' during parameter optimization. A value 'x' is used as 'C = 2^x'.
log2_gamma
Values to try for the SVM parameter 'gamma' during parameter optimization (RBF kernel only). A value 'x' is used as 'gamma = 2^x'.
epsilon
Stopping criterion
cache_size
Size of the kernel cache (in MB)
no_shrinking
Disable the shrinking heuristics
predictors
Names of OpenSWATH scores to use as predictors for the SVM (comma-separated list)
min_prob
Minimum probability of correctness, as predicted by the SVM, required to retain a feature candidate
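The 'log2_C' and 'log2_gamma' lists define a grid of candidate values that is searched during cross-validation. A minimal sketch of how such a grid could be expanded is shown below; the function name and the kernel labels used here are assumptions, and the actual search is performed inside the tool.

```python
from itertools import product


def svm_parameter_grid(log2_C, log2_gamma, kernel="RBF"):
    """Expand log2 exponents into (C, gamma) candidates.

    A value x is used as C = 2^x or gamma = 2^x. 'gamma' only applies to
    the RBF kernel; for other kernels it is reported as None.
    """
    if kernel == "RBF":
        gammas = [2.0 ** g for g in log2_gamma]
    else:
        gammas = [None]
    return [(2.0 ** c, gamma) for c, gamma in product(log2_C, gammas)]


# Illustrative exponent lists, not the tool defaults.
grid = svm_parameter_grid([-1, 0, 3], [-2, 1])
linear_grid = svm_parameter_grid([0, 1], [-2], kernel="linear")
```

The grid size is the product of both list lengths for the RBF kernel, which is why long lists combined with a high 'xval' value increase runtime quickly.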
type
Type of elution model to fit to features
add_zeros
Add zero-intensity points outside the feature range to constrain the model fit. This parameter sets the weight given to these points during model fitting; '0' to disable.
unweighted_fit
Suppress weighting of mass traces according to theoretical intensities when fitting elution models
no_imputation
If fitting the elution model fails for a feature, set its intensity to zero instead of imputing a value from the initial intensity estimate
each_trace
Fit elution model to each individual mass trace
min_area
Lower bound for the area under the curve of a valid elution model
boundaries
Time points corresponding to this fraction of the elution model height have to be within the data region used for model fitting
width
Upper limit for acceptable widths of elution models (Gaussian or EGH), expressed in terms of modified (median-based) z-scores. '0' to disable. Not applied to individual mass traces (parameter 'each_trace').
asymmetry
Upper limit for acceptable asymmetry of elution models (EGH only), expressed in terms of modified (median-based) z-scores. '0' to disable. Not applied to individual mass traces (parameter 'each_trace').
score_cutoff
Whether only IDs above a score cutoff should be used. Used together with 'min_score'.
min_score
Minimum score for an ID to be considered. Applies to the last score calculated. Unless you have very few runs or identifications, increase this value to focus on more informative peptides.
min_run_occur
Minimum number of runs (incl. reference, if any) in which a peptide must occur to be used for the alignment. Unless you have very few runs or identifications, increase this value to focus on more informative peptides.
max_rt_shift
Maximum realistic RT difference for a peptide (median per run vs. reference). Peptides with higher shifts (outliers) are not used to compute the alignment. If 0, no limit (disable filter); if > 1, the final value in seconds; if <= 1, taken as a fraction of the range of the reference RT scale.
use_unassigned_peptides
Should unassigned peptide identifications be used when computing an alignment of feature or consensus maps? If 'false', only peptide IDs assigned to features will be used.
use_feature_rt
When aligning feature or consensus maps, don't use the retention time of a peptide identification directly; instead, use the retention time of the centroid of the feature (apex of the elution profile) that the peptide was matched to. If different identifications are matched to one feature, only the peptide closest to the centroid in RT is used. Precludes 'use_unassigned_peptides'.
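The three cases of 'max_rt_shift' (0, greater than 1, and at most 1) can be expressed as a small helper. This is a sketch of the rule as described above with hypothetical function names, not the alignment code itself.

```python
def resolve_max_rt_shift(max_rt_shift, reference_rt_range):
    """Turn 'max_rt_shift' into a limit in seconds (None = no limit).

    0     -> filter disabled
    > 1   -> value is already in seconds
    <= 1  -> fraction of the range of the reference RT scale
    """
    if max_rt_shift == 0:
        return None
    if max_rt_shift > 1:
        return max_rt_shift
    return max_rt_shift * reference_rt_range


def keep_for_alignment(median_shift, limit):
    """Peptides with a larger shift than the limit are treated as outliers."""
    return limit is None or abs(median_shift) <= limit


# Illustrative values: a reference run spanning 3600 s.
limit_fraction = resolve_max_rt_shift(0.5, 3600.0)
limit_seconds = resolve_max_rt_shift(300, 3600.0)
no_limit = resolve_max_rt_shift(0, 3600.0)
```

With a fractional setting the limit scales with the gradient length, whereas a value in seconds stays fixed across experiments.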
use_identifications
Never link features that are annotated with different peptides (only the best hit per peptide identification is taken into account).
nr_partitions
How many partitions in m/z space should be used for the algorithm (more partitions means faster runtime and more memory-efficient execution)
ignore_charge
false [default]: pairing requires equal charge state (or at least one unknown charge '0'); true: Pairing irrespective of charge state
ignore_adduct
false: pairing requires equal adducts (or at least one without adduct annotation); true [default]: Pairing irrespective of adducts
exponent
Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
weight
Final RT distances are weighted by this factor
max_difference
Never pair features with larger m/z distance (unit defined by 'unit')
unit
Unit of the 'max_difference' parameter
exponent
Normalized ([0-1], relative to 'max_difference') m/z differences are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
weight
Final m/z distances are weighted by this factor
exponent
Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
weight
Final intensity distances are weighted by this factor
log_transform
Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1)
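The two intensity distance formulas given for 'log_transform', together with the 'exponent' and 'weight' parameters, can be written out as a short Python sketch. The function names are hypothetical; how the RT, m/z and intensity terms are finally combined into one distance is not described here and is deliberately left out.

```python
import math


def intensity_distance(int_f1, int_f2, int_max, log_transform=False):
    """Relative intensity difference of two features, in [0, 1]."""
    if log_transform:
        # d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1)
        # The ratio does not depend on the base of the logarithm.
        return abs(math.log(int_f2 + 1) - math.log(int_f1 + 1)) / math.log(int_max + 1)
    # d = |int_f2 - int_f1| / int_max
    return abs(int_f2 - int_f1) / int_max


def weighted_term(normalized_difference, exponent, weight):
    """Raise a normalized difference to 'exponent', then scale by 'weight'."""
    return weight * normalized_difference ** exponent


# Made-up intensities.
d_linear = intensity_distance(1000, 4000, 10000)
d_log = intensity_distance(1000, 4000, 10000, log_transform=True)
term = weighted_term(0.5, exponent=2, weight=2.0)
```

For these example values the log-transformed distance is smaller than the linear one, because the logarithm compresses large intensity ratios.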
top
Calculate protein abundance from this number of proteotypic peptides (most abundant first; '0' for all)
average
Averaging method used to compute protein abundances from peptide abundances
include_all
Include results for proteins with fewer proteotypic peptides than indicated by 'top' (no effect if 'top' is 0 or 1)
best_charge_and_fraction
Distinguish between fraction and charge states of a peptide. For peptides, abundances will be reported separately for each fraction and charge; for proteins, abundances will be computed based only on the most prevalent charge observed of each peptide (over all fractions). By default, abundances are summed over all charge states.
normalize
Scale peptide abundances so that medians of all samples are equal
fix_peptides
Use the same peptides for protein quantification across all samples. With 'top 0', all peptides that occur in every sample are considered. Otherwise ('top N'), the N peptides that occur in the most samples (independently of each other) are selected, breaking ties by total abundance (there is no guarantee that the best co-occurring peptides are chosen!).
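The interplay of 'top', 'average' and 'include_all' can be illustrated with a minimal sketch for a single protein in a single sample. It is a simplification, not the OpenMS quantifier: the function name and the averaging labels ('median', 'mean', 'sum') are assumptions, and options such as 'fix_peptides' and 'normalize' are ignored.

```python
import statistics


def protein_abundance(peptide_abundances, top=3, average="median", include_all=False):
    """Combine peptide abundances of one protein into a protein abundance.

    top:         number of most abundant peptides to use (0 = all)
    include_all: also report proteins with fewer than 'top' peptides
    Returns None if the protein is not quantified.
    """
    ranked = sorted(peptide_abundances, reverse=True)  # most abundant first
    if top > 0:
        if len(ranked) < top and not include_all:
            return None
        ranked = ranked[:top]
    if not ranked:
        return None
    reducers = {"median": statistics.median, "mean": statistics.mean, "sum": sum}
    return reducers[average](ranked)


# Made-up peptide abundances.
top3_median = protein_abundance([5.0, 1.0, 9.0, 3.0], top=3, average="median")
too_few = protein_abundance([5.0, 1.0], top=3)
too_few_included = protein_abundance([5.0, 1.0], top=3, include_all=True)
all_summed = protein_abundance([5.0, 1.0, 9.0, 3.0], top=0, average="sum")
```

In this sketch a protein with only two peptides is dropped for 'top 3' unless 'include_all' is set, which matches the description of 'include_all' having no effect when 'top' is 0 or 1.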

Input Ports

Input files [mzML]
Identifications filtered at PSM level (e.g., q-value < 0.01) and annotated with PEP as main score.
We suggest using:
1. PeptideIndexer to annotate target and decoy information.
2. PSMFeatureExtractor to annotate percolator features.
3. PercolatorAdapter tool (score_type = 'q-value', -post-processing-tdc)
4. IDFilter (pep:score = 0.01)
to obtain well-calibrated PEPs and an initial reduction of PSMs.
ID files must be provided in the same order as spectra files. [idXML,mzId]
design file [tsv,opt.]
fasta file [fasta,opt.]

Output Ports

output mzTab file [mzTab]
output MSstats input file [csv]
output consensusXML file [consensusXML]
Optional output file with feature candidates. []
Output file: SVM cross-validation (parameter optimization) results [csv]

Views

ProteomicsLFQ Std Output
The text sent to standard out during the execution of ProteomicsLFQ.
ProteomicsLFQ Error Output
The text sent to standard error during the execution of ProteomicsLFQ. (If it appears in gray, it is the output of a previously failing run, which is preserved for your troubleshooting.)

Installation

To use this node in KNIME, install OpenMS from the following update site:

KNIME 4.3

A zipped version of the software site can be downloaded here.
