ProteomicsLFQ

A standard proteomics LFQ pipeline.

Web Documentation for ProteomicsLFQ

Options

version: Version of the tool that generated this parameters file.
proteinFDR: Protein FDR threshold (0.05=5%).
picked_proteinFDR: Use a picked protein FDR?
psmFDR: FDR threshold for sub-protein level (e.g. 0.05=5%). Use -FDR_type to choose the level. Cutoff is applied at the highest level. If Bayesian inference was chosen, it is equivalent to a peptide FDR.
FDR_type: Sub-protein FDR level: PSM or PSM+peptide (best PSM q-value).
protein_inference: Infer proteins: 'aggregation' = aggregates all peptide scores across a protein (using the best score); 'bayesian' = computes a posterior probability for every protein based on a Bayesian network. Note: 'bayesian' only uses and reports the best PSM per peptide.
protein_quantification: Quantify proteins based on: 'unique_peptides' = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides); 'strictly_unique_peptides' = use peptides mapping to a unique single protein only; 'shared_peptides' = use shared peptides only for their best group (by inference score).
quantification_method: feature_intensity: MS1 signal. spectral_counting: PSM counts.
targeted_only: true: Only ID based quantification. false: include unidentified features so they can be linked to identified ones (=match between runs).
feature_with_id_min_score: The minimum probability (e.g.: 0.25) an identified (=id targeted) feature must have to be kept for alignment and linking (0=no filter).
feature_without_id_min_score: The minimum probability (e.g.: 0.75) an unidentified feature must have to be kept for alignment and linking (0=no filter).
mass_recalibration: Mass recalibration.
alignment_order: If star, aligns all maps to the reference with most IDs.
keep_feature_top_psm_only: If false, also keeps lower ranked PSMs that have the top-scoring sequence as a candidate per feature in the same file.
log: Name of log file (created only when specified)
debug: Sets the debug level
threads: Sets the number of threads allowed to be used by the TOPP tool
no_progress: Disables progress logging to command line
force: Overrides tool-specific checks
test: Enables the test mode (needed for internal use only)
intThreshold: Peak intensity threshold applied in seed detection.
charge: Charge range considered for untargeted feature seeds.
traceRTTolerance: Combines all spectra in the tolerance window to stabilize identification of isotope patterns. Controls sensitivity (low value) vs. specificity (high value) of feature seeds.
signal_to_noise: Minimal signal-to-noise ratio for a peak to be picked (0.0 disables SNT estimation!)
spacing_difference_gap: The extension of a peak is stopped if the spacing between two subsequent data points exceeds 'spacing_difference_gap * min_spacing'. 'min_spacing' is the smaller of the two spacings from the peak apex to its two neighboring points. '0' to disable the constraint. Not applicable to chromatograms.
spacing_difference: Maximum allowed difference between points during peak extension, in multiples of the minimal difference between the peak apex and its two neighboring points. If this difference is exceeded a missing point is assumed (see parameter 'missing'). A higher value implies a less stringent peak definition, since individual signals within the peak are allowed to be further apart. '0' to disable the constraint. Not applicable to chromatograms.
missing: Maximum number of missing points allowed when extending a peak to the left or to the right. A missing data point occurs if the spacing between two subsequent data points exceeds 'spacing_difference * min_spacing'. 'min_spacing' is the smaller of the two spacings from the peak apex to its two neighboring points. Not applicable to chromatograms.
ms_levels: List of MS levels for which the peak picking is applied. If empty, auto mode is enabled, all peaks which aren't picked yet will get picked. Other scans are copied to the output without changes.
report_FWHM: Add metadata for FWHM (as floatDataArray named 'FWHM' or 'FWHM_ppm', depending on param 'report_FWHM_unit') for each picked peak.
report_FWHM_unit: Unit of FWHM. Either absolute in the unit of input, e.g. 'm/z' for spectra, or relative as ppm (only sensible for spectra, not chromatograms).
max_intensity: maximal intensity considered for histogram construction. By default, it will be calculated automatically (see auto_mode). Only provide this parameter if you know what you are doing (and change 'auto_mode' to '-1')! All intensities EQUAL/ABOVE 'max_intensity' will be added to the LAST histogram bin. If you choose 'max_intensity' too small, the noise estimate might be too small as well. If chosen too big, the bins become quite large (which you could counter by increasing 'bin_count', which increases runtime). In general, the Median-S/N estimator is more robust to a manual max_intensity than the MeanIterative-S/N.
auto_max_stdev_factor: parameter for 'max_intensity' estimation (if 'auto_mode' == 0): mean + 'auto_max_stdev_factor' * stdev
auto_max_percentile: parameter for 'max_intensity' estimation (if 'auto_mode' == 1): the 'auto_max_percentile'-th percentile
auto_mode: method to use to determine maximal intensity: -1 --> use 'max_intensity'; 0 --> 'auto_max_stdev_factor' method (default); 1 --> 'auto_max_percentile' method
win_len: window length in Thomson
bin_count: number of bins for intensity values
min_required_elements: minimum number of elements required in a window (otherwise it is considered sparse)
noise_for_empty_window: noise value used for sparse windows
write_log_messages: Write out log messages in case of sparse windows or median in rightmost histogram bin
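
The three 'auto_mode' settings above can be illustrated with a minimal Python sketch. It is not the OpenMS implementation: the function names, the default values (3.0 and 95), the use of the population standard deviation, the nearest-rank percentile, and histogram bins starting at zero are all assumptions made for illustration.

```python
import math
import statistics

def estimate_max_intensity(intensities, auto_mode=0, max_intensity=None,
                           auto_max_stdev_factor=3.0, auto_max_percentile=95):
    """Choose the upper histogram bound according to 'auto_mode'."""
    if auto_mode == -1:
        # Manual mode: the caller must supply 'max_intensity'.
        if max_intensity is None:
            raise ValueError("auto_mode == -1 requires max_intensity")
        return float(max_intensity)
    if auto_mode == 0:
        # mean + auto_max_stdev_factor * stdev (population stdev is an assumption)
        mean = statistics.fmean(intensities)
        stdev = statistics.pstdev(intensities)
        return mean + auto_max_stdev_factor * stdev
    if auto_mode == 1:
        # Nearest-rank percentile (an assumption about the percentile definition)
        ordered = sorted(intensities)
        rank = max(1, math.ceil(auto_max_percentile * len(ordered) / 100))
        return float(ordered[rank - 1])
    raise ValueError("auto_mode must be -1, 0 or 1")

def histogram_bin(intensity, max_intensity, bin_count):
    """Intensities equal to or above max_intensity fall into the last bin."""
    width = max_intensity / bin_count
    return min(int(intensity / width), bin_count - 1)
```

For example, `histogram_bin(10, 5, 30)` falls into the last bin (index 29), which is why a 'max_intensity' chosen too small pushes many values into that bin.
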
debug: Debug level for feature detection.
quantify_decoys: Whether decoy peptides should be quantified (true) or skipped (false).
min_psm_cutoff: Minimum score for the best PSM of a spectrum to be used as seed. Use 'none' for no cutoff.
add_mass_offset_peptides: If true, an offset peptide is also extracted for every peptide (or seed). Can be used downstream to determine MBR false transfer rates. (0.0 = disabled)
batch_size: Number of peptides used in each batch of chromatogram extraction. Smaller values decrease memory usage but increase runtime.
mz_window: m/z window size for chromatogram extraction (unit: ppm if 1 or greater, else Da/Th)
n_isotopes: Number of isotopes to include in each peptide assay.
isotope_pmin: Minimum probability for an isotope to be included in the assay for a peptide. If set, this parameter takes precedence over 'extract:n_isotopes'.
rt_quantile: Quantile of the RT deviations between aligned internal and external IDs to use for scaling the RT extraction window
rt_window: RT window size (in sec.) for chromatogram extraction. If set, this parameter takes precedence over 'extract:rt_quantile'.
min_peak_width: Minimum elution peak width. Absolute value in seconds if 1 or greater, else relative to 'peak_width'.
signal_to_noise: Signal-to-noise threshold for OpenSWATH feature detection
mapping_tolerance: RT tolerance (plus/minus) for mapping peptide IDs to features. Absolute value in seconds if 1 or greater, else relative to the RT span of the feature.
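
Several of the parameters above ('mz_window', 'min_peak_width', 'mapping_tolerance') switch meaning at the value 1. The sketch below shows one way to read that convention; the function names are hypothetical and the code is not taken from OpenMS.

```python
def mz_window_in_th(mz_window, mz):
    """Return the m/z window width in Th for a peak at 'mz'.

    Values of 1 or greater are read as ppm, smaller values as Da/Th.
    """
    if mz_window >= 1:
        return mz * mz_window * 1e-6  # ppm relative to the given m/z
    return mz_window  # already an absolute width in Da/Th

def absolute_or_relative(value, reference):
    """Values of 1 or greater are absolute (seconds); smaller values are
    fractions of 'reference' (e.g. 'peak_width' or the RT span of a feature)."""
    return value if value >= 1 else value * reference
```

With these helpers, a 10 ppm window at m/z 500 corresponds to about 0.005 Th, and a 'min_peak_width' of 0.2 with a 'peak_width' of 60 seconds corresponds to about 12 seconds.
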
samples: Number of observations to use for training ('0' for all)
no_selection: By default, roughly the same number of positive and negative observations, with the same intensity distribution, are selected for training. This aims to reduce biases, but also reduces the amount of training data. Set this flag to skip this procedure and consider all available observations (subject to 'svm:samples').
kernel: SVM kernel
xval: Number of partitions for cross-validation (parameter optimization)
log2_C: Values to try for the SVM parameter 'C' during parameter optimization. A value 'x' is used as 'C = 2^x'.
log2_gamma: Values to try for the SVM parameter 'gamma' during parameter optimization (RBF kernel only). A value 'x' is used as 'gamma = 2^x'.
log2_p: Values to try for the SVM parameter 'epsilon' during parameter optimization (epsilon-SVR only). A value 'x' is used as 'epsilon = 2^x'.
epsilon: Stopping criterion
cache_size: Size of the kernel cache (in MB)
no_shrinking: Disable the shrinking heuristics
predictors: Names of OpenSWATH scores to use as predictors for the SVM (comma-separated list)
min_prob: Minimum probability of correctness, as predicted by the SVM, required to retain a feature candidate
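
The 'log2_C' and 'log2_gamma' lists above are exponents, not the SVM parameters themselves. A minimal sketch of how such lists expand into a search grid follows; the function name and the example values are assumptions, not tool defaults.

```python
from itertools import product

def svm_parameter_grid(log2_C, log2_gamma):
    """Expand exponent lists into (C, gamma) pairs with C = 2^x, gamma = 2^x."""
    return [(2.0 ** c, 2.0 ** g) for c, g in product(log2_C, log2_gamma)]

# Hypothetical exponent lists, chosen only to show the expansion.
grid = svm_parameter_grid([-5, 0, 5], [-3, 1])
for C, gamma in grid:
    print(f"C={C:g} gamma={gamma:g}")
```

Each pair would then be scored by cross-validation over 'xval' partitions; the grid size grows as the product of the list lengths.
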
type: Type of elution model to fit to features
add_zeros: Add zero-intensity points outside the feature range to constrain the model fit. This parameter sets the weight given to these points during model fitting; '0' to disable.
unweighted_fit: Suppress weighting of mass traces according to theoretical intensities when fitting elution models
no_imputation: If fitting the elution model fails for a feature, set its intensity to zero instead of imputing a value from the initial intensity estimate
each_trace: Fit elution model to each individual mass trace
min_area: Lower bound for the area under the curve of a valid elution model
boundaries: Time points corresponding to this fraction of the elution model height have to be within the data region used for model fitting
width: Upper limit for acceptable widths of elution models (Gaussian or EGH), expressed in terms of modified (median-based) z-scores. '0' to disable. Not applied to individual mass traces (parameter 'each_trace').
asymmetry: Upper limit for acceptable asymmetry of elution models (EGH only), expressed in terms of modified (median-based) z-scores. '0' to disable. Not applied to individual mass traces (parameter 'each_trace').
max_iteration: Maximum number of iterations for EMG fitting.
init_mom: Alternative initial parameters for fitting through method of moments.
model_type: Options to control the modeling of retention time transformations from data
type: Type of model
symmetric_regression: Perform linear regression on 'y - x' vs. 'y + x', instead of on 'y' vs. 'x'.
x_weight: Weight x values
y_weight: Weight y values
x_datum_min: Minimum x value
x_datum_max: Maximum x value
y_datum_min: Minimum y value
y_datum_max: Maximum y value
wavelength: Determines the amount of smoothing by setting the number of nodes for the B-spline. The number is chosen so that the spline approximates a low-pass filter with this cutoff wavelength. The wavelength is given in the same units as the data; a higher value means more smoothing. '0' sets the number of nodes to twice the number of input points.
num_nodes: Number of nodes for B-spline fitting. Overrides 'wavelength' if set (to two or greater). A lower value means more smoothing.
extrapolate: Method to use for extrapolation beyond the original data range. 'linear': Linear extrapolation using the slope of the B-spline at the corresponding endpoint. 'b_spline': Use the B-spline (as for interpolation). 'constant': Use the constant value of the B-spline at the corresponding endpoint. 'global_linear': Use a linear fit through the data (which will most probably introduce discontinuities at the ends of the data range).
boundary_condition: Boundary condition at B-spline endpoints: 0 (value zero), 1 (first derivative zero) or 2 (second derivative zero)
span: Fraction of datapoints (f) to use for each local regression (determines the amount of smoothing). Choosing this parameter in the range .2 to .8 usually results in a good fit.
num_iterations: Number of robustifying iterations for lowess fitting.
delta: Nonnegative parameter which may be used to save computations (recommended value is 0.01 of the range of the input, e.g. for data ranging from 1000 seconds to 2000 seconds, it could be set to 10). Setting a negative value will automatically do this.
interpolation_type: Method to use for interpolation between datapoints computed by lowess. 'linear': Linear interpolation. 'cspline': Use the cubic spline for interpolation. 'akima': Use an akima spline for interpolation
extrapolation_type: Method to use for extrapolation outside the data range. 'two-point-linear': Uses a line through the first and last point to extrapolate. 'four-point-linear': Uses a line through the first and second point to extrapolate in front and a line through the last and second-to-last point in the end. 'global-linear': Uses a linear regression to fit a line through all data points and uses it for extrapolation.
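
The recommendation for the lowess 'delta' parameter above (0.01 of the input range, chosen automatically for negative values) amounts to the following arithmetic. This is a sketch with a hypothetical function name, not the OpenMS code.

```python
def resolve_lowess_delta(delta, x_values):
    """Return the effective delta: negative input means 1% of the x range."""
    if delta < 0:
        return 0.01 * (max(x_values) - min(x_values))
    return delta

# Retention times from 1000 s to 2000 s give a delta of about 10 s.
print(resolve_lowess_delta(-1, [1000.0, 1500.0, 2000.0]))
```
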
interpolation_type: Type of interpolation to apply.
extrapolation_type: Type of extrapolation to apply: two-point-linear: use the first and last data point to build a single linear model, four-point-linear: build two linear models on both ends using the first two / last two points, global-linear: use all points to build a single linear model. Note that global-linear may not be continuous at the border.
score_type: Name of the score type to use for ranking and filtering (.oms input only). If left empty, a score type is picked automatically.
score_cutoff: Use only IDs above a score cut-off (parameter 'min_score') for alignment?
min_score: If 'score_cutoff' is 'true': Minimum score for an ID to be considered. Unless you have very few runs or identifications, increase this value to focus on more informative peptides.
min_run_occur: Minimum number of runs (incl. reference, if any) in which a peptide must occur to be used for the alignment. Unless you have very few runs or identifications, increase this value to focus on more informative peptides.
max_rt_shift: Maximum realistic RT difference for a peptide (median per run vs. reference). Peptides with higher shifts (outliers) are not used to compute the alignment. If 0, no limit (disable filter); if > 1, the final value in seconds; if <= 1, taken as a fraction of the range of the reference RT scale.
use_unassigned_peptides: Should unassigned peptide identifications be used when computing an alignment of feature or consensus maps? If 'false', only peptide IDs assigned to features will be used.
use_feature_rt: When aligning feature or consensus maps, don't use the retention time of a peptide identification directly; instead, use the retention time of the centroid of the feature (apex of the elution profile) that the peptide was matched to. If different identifications are matched to one feature, only the peptide closest to the centroid in RT is used. Precludes 'use_unassigned_peptides'.
use_adducts: If IDs contain adducts, treat differently adducted variants of the same molecule as different.
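
The three cases of 'max_rt_shift' described above (0, greater than 1, at most 1) can be sketched as follows. The function names and the dictionary-based data layout are assumptions for illustration only.

```python
def resolve_max_rt_shift(max_rt_shift, reference_rts):
    """Return the shift limit in seconds, or None when the filter is disabled."""
    if max_rt_shift == 0:
        return None  # no limit
    if max_rt_shift > 1:
        return float(max_rt_shift)  # already in seconds
    # Fraction of the range of the reference RT scale
    return max_rt_shift * (max(reference_rts) - min(reference_rts))

def keep_peptides(run_rts, reference_rts, limit):
    """Drop peptides whose RT in this run deviates from the reference by more
    than 'limit' seconds. Both inputs map peptide -> (median) RT."""
    if limit is None:
        return dict(run_rts)
    return {pep: rt for pep, rt in run_rts.items()
            if abs(rt - reference_rts[pep]) <= limit}
```

For a reference scale spanning 600 to 4200 seconds, a 'max_rt_shift' of 0.5 resolves to 1800 seconds.
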
use_identifications: Never link features that are annotated with different peptides (only the best hit per peptide identification is taken into account).
nr_partitions: How many partitions in m/z space should be used for the algorithm (more partitions means faster runtime and more memory efficient execution).
min_nr_diffs_per_bin: If IDs are used: How many differences from matching IDs should be used to calculate a linking tolerance for unIDed features in an RT region. RT regions will be extended until that number is reached.
min_IDscore_forTolCalc: If IDs are used: What is the minimum score of an ID to assume a reliable match for tolerance calculation. Check your current score type!
noID_penalty: If IDs are used: For the normalized distances, how high should the penalty for missing IDs be? 0 = no bias, 1 = IDs inside the max tolerances always preferred (even if much further away).
ignore_charge: false [default]: pairing requires equal charge state (or at least one unknown charge '0'); true: Pairing irrespective of charge state
ignore_adduct: false: pairing requires equal adducts (or at least one without adduct annotation); true [default]: pairing irrespective of adducts
exponent: Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
weight: Final RT distances are weighted by this factor
max_difference: Never pair features with larger m/z distance (unit defined by 'unit')
unit: Unit of the 'max_difference' parameter
exponent: Normalized ([0-1], relative to 'max_difference') m/z differences are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
weight: Final m/z distances are weighted by this factor
exponent: Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
weight: Final intensity distances are weighted by this factor
log_transform: Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1)
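
The distance terms described above can be written out as a short sketch. The function names are hypothetical, returning infinity for pairs beyond 'max_difference' is an assumption about how "never pair" is enforced, and how the RT, m/z and intensity terms are combined into one distance is deliberately left out because the descriptions above do not state it.

```python
import math

def weighted_term(difference, max_difference, exponent, weight):
    """Normalize a difference to [0, 1], raise it to 'exponent', apply 'weight'.

    Pairs beyond 'max_difference' are rejected (modelled here as infinity).
    """
    if abs(difference) > max_difference:
        return math.inf
    return weight * (abs(difference) / max_difference) ** exponent

def intensity_difference(int_f1, int_f2, int_max, log_transform=False):
    """Relative intensity difference d, following the two formulas above."""
    if log_transform:
        return (abs(math.log(int_f2 + 1) - math.log(int_f1 + 1))
                / math.log(int_max + 1))
    return abs(int_f2 - int_f1) / int_max
```

For instance, an RT difference of 50 seconds with a 'max_difference' of 100, exponent 2 and weight 1 contributes 0.25.
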
method: 'top': quantify based on the three most abundant peptides (number can be changed in 'top'). 'iBAQ' (intensity-based absolute quantification): calculate the sum of all peptide peak intensities divided by the number of theoretically observable tryptic peptides (https://rdcu.be/cND1J). Warning: only consensusXML or featureXML input is allowed!
best_charge_and_fraction: Distinguish between fraction and charge states of a peptide. For peptides, abundances will be reported separately for each fraction and charge; for proteins, abundances will be computed based only on the most prevalent charge observed of each peptide (over all fractions). By default, abundances are summed over all charge states.
N: Calculate protein abundance from this number of proteotypic peptides (most abundant first; '0' for all)
aggregate: Aggregation method used to compute protein abundances from peptide abundances
include_all: Include results for proteins with fewer proteotypic peptides than indicated by 'N' (no effect if 'N' is 0 or 1)
normalize: Scale peptide abundances so that medians of all samples are equal
fix_peptides: Use the same peptides for protein quantification across all samples. With 'N 0', all peptides that occur in every sample are considered. Otherwise ('N'), the N peptides that occur in the most samples (independently of each other) are selected, breaking ties by total abundance (there is no guarantee that the best co-occurring peptides are chosen!).
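
The two quantification rules described above ('top' with 'N', 'aggregate' and 'include_all', and 'iBAQ') can be sketched as below. The function names, the default values and the set of aggregation names ("median", "mean", "sum") are assumptions for illustration; the actual choices offered by the tool may differ.

```python
import statistics

def top_n_abundance(peptide_abundances, N=3, aggregate="median",
                    include_all=False):
    """Aggregate the N most abundant peptides of one protein ('0' for all).

    Returns None when fewer than N peptides exist and include_all is False.
    """
    ranked = sorted(peptide_abundances, reverse=True)  # most abundant first
    if N > 0:
        if len(ranked) < N and not include_all:
            return None
        ranked = ranked[:N]
    if not ranked:
        return None
    functions = {"median": statistics.median,
                 "mean": statistics.fmean,
                 "sum": sum}
    return functions[aggregate](ranked)

def ibaq(peptide_intensities, n_theoretical_peptides):
    """Sum of peptide intensities divided by the number of theoretically
    observable tryptic peptides."""
    return sum(peptide_intensities) / n_theoretical_peptides
```

With abundances 10, 50, 30 and 20, the top three peptides sum to 100; a protein with only two peptides is skipped unless 'include_all' is set.
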

Input Ports

: Input files [mzML]
: Identifications filtered at PSM level (e.g., q-value < 0.01) and annotated with PEP as main score. We suggest using: 1. PSMFeatureExtractor to annotate percolator features; 2. PercolatorAdapter tool (score_type = 'q-value', -post-processing-tdc); 3. IDFilter (pep:score = 0.05), to obtain well-calibrated PEPs and an initial reduction of PSMs. ID files must be provided in the same order as spectra files. [idXML,mzId]
: design file [tsv,opt.]
: fasta file [fasta,opt.]

Output Ports

: output mzTab file [mzTab]
: output MSstats input file [csv]
: output Triqler input file [tsv]
: output consensusXML file [consensusXML]
: Optional output file with feature candidates. []
: Output file: SVM cross-validation (parameter optimization) results [csv]

Views

ProteomicsLFQ Std Output: The text sent to standard out during the execution of ProteomicsLFQ.
ProteomicsLFQ Error Output: The text sent to standard error during the execution of ProteomicsLFQ. (If it appears in gray, it is the output of a previously failing run, which is preserved for your troubleshooting.)

Installation

To use this node in KNIME, install the extension OpenMS from the below update site following our NodePit Product and Node Installation Guide:

v5.4

A zipped version of the software site can be downloaded here.

Plugin provider: Freie Universitaet Berlin, Universitaet Tuebingen, ZIB (GKN-Team) and the OpenMS Team

Plugin version: 3.4.0.202501170921

On NodePit since: 2024-12-06

Last update: 2025-05-31

KNIME versions: v5.4, v5.3, v5.2, v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0