MSSimulator

A highly configurable simulator for mass spectrometry experiments.

Web Documentation for MSSimulator

Options

version: Version of the tool that generated this parameters file.
log: Name of log file (created only when specified)
debug: Sets the debug level
threads: Sets the number of threads allowed to be used by the TOPP tool
no_progress: Disables progress logging to command line
force: Overrides tool-specific checks
test: Enables the test mode (needed for internal use only)
enzyme: Enzyme to use for digestion (select 'no cleavage' to skip digestion)
model: The cleavage model to use for digestion. 'Trained' is based on a log likelihood model (see DOI:10.1021/pr060507u).
min_peptide_length: Minimum peptide length after digestion (shorter ones will be discarded)
threshold: Model threshold for calling a cleavage. Higher values increase the number of cleavages. -2 will give no cleavages, +4 almost full cleavage.
missed_cleavages: Maximum number of missed cleavages considered. All possible resulting peptides will be created.
rt_column: Modelling of an RT or CE column
auto_scale: Scale predicted RTs/MTs to the given 'total_gradient_time'? If 'true', for CE this means that 'CE:lenght_d', 'CE:length_total', and 'CE:voltage' have no influence.
total_gradient_time: The duration [s] of the gradient.
sampling_rate: Time interval [s] between consecutive scans
min: Start of RT Scan Window [s]
max: End of RT Scan Window [s]
feature_stddev: Standard deviation of shift in retention time [s] from predicted model (applied to every single feature independently)
affine_offset: Global offset in retention time [s] from predicted model
affine_scale: Global scaling in retention time from predicted model
distortion: Distortion of the elution profiles. Good presets are 0 for a perfect elution profile, 1 for a slightly distorted elution profile etc... For trapping instruments (e.g. Orbitrap) distortion should be >4.
value: Width of the Exponential Gaussian Hybrid distribution shape of the elution profile. This does not correspond directly to the width in [s].
variance: Random component of the width (set to 0 to disable randomness), i.e. the scale parameter for the Lorentzian variation of the variance (note: the scale parameter has to be >= 0).
value: Asymmetric component of the EGH. Higher absolute(!) values lead to more skewness (negative values cause fronting, positive values cause tailing). Tau parameter of the EGH, i.e. time constant of the exponential decay of the Exponential Gaussian Hybrid distribution shape of the elution profile.
variance: Random component of skewness (set to 0 to disable randomness), i.e. the scale parameter for the Lorentzian variation of the time constant (note: the scale parameter has to be > 0).
model_file: SVM model for retention time prediction
pH: pH of buffer
alpha: Exponent Alpha used to calculate mobility
mu_eo: Electroosmotic flow
lenght_d: Length of capillary [cm] from injection site to MS
length_total: Total length of capillary [cm]
voltage: Voltage applied to capillary
dt_simulation_on: Modelling detectability enabled? This can serve as a filter to remove peptides that ionize badly, thus reducing the peptide count
min_detect: Minimum peptide detectability accepted. Peptides with a lower score will be removed
dt_model_file: SVM model for peptide detectability prediction
ionized_residues: List of residues (as three letter code) that will be considered during ES ionization. The N-term is always assumed to carry a charge. This parameter will be ignored during MALDI ionization
charge_impurity: List of charged ions that contribute to charge with weight of occurrence (their sum is scaled to 1 internally), e.g. ['H:1'] or ['H:0.7' 'Na:0.3'], ['H:4' 'Na:1'] (which internally translates to ['H:0.8' 'Na:0.2'])
max_impurity_set_size: Maximal number of combinations of charge impurities allowed (each generating one feature) per charge state. E.g. assuming charge=3 and this parameter is 2, we could choose to allow '3H+, 2H+Na+' features (given certain 'charge_impurity' constraints), but not '3H+, 2H+Na+, 3Na+'
ionization_probability: Probability for the binomial distribution of the ESI charge states
ionization_probabilities: List of probabilities for different charge states (starting at charge=1, 2, ...) during MALDI ionization (the list must sum up to 1.0)
lower_measurement_limit: Lower m/z detector limit
upper_measurement_limit: Upper m/z detector limit
enabled: Enable RAW signal simulation? (select 'false' if you only need feature-maps)
peak_shape: Peak shape used around each isotope peak. Be aware that the area under the curve is constant for both types, but the maximal height will differ (~ 2:3 = Lorentz:Gaussian) due to the wider base of the Lorentzian
value: Instrument resolution at 400 Th
type: How does resolution change with increasing m/z? QTOFs usually show 'constant' behavior, FTs have linear degradation, and on Orbitraps the resolution decreases with the square root of mass
scaling: Scale of baseline. Set to 0 to disable simulation of baseline
shape: The baseline is modeled by an exponential probability density function (pdf) with f(x) = shape*e^(- shape*x)
sampling_points: Number of raw data points per FWHM of the peak
file: Contaminants file with sum formula and absolute RT interval. See 'share/OpenMS/SIMULATION/contaminants.txt' for details
error_mean: Average systematic m/z error (in Da)
error_stddev: Standard deviation for m/z errors. Set to 0 to disable simulation of m/z errors
scale: Constant scale factor of the feature intensity. Set to 1.0 to get the real intensity values provided in the FASTA file
scale_stddev: Standard deviation of peak intensity (relative to the scaled peak height). Set to 0 to get simple rescaled intensities
rate: Poisson rate of shot noise per unit m/z (random peaks in m/z, where the number of peaks per unit m/z follows a Poisson distribution). Set this to 0 to disable simulation of shot noise
intensity-mean: Shot noise intensity mean (exponentially distributed with given mean)
mean: Mean value of white noise (Gaussian) being added to each *measured* signal intensity
stddev: Standard deviation of white noise being added to each *measured* signal intensity
mean: Mean intensity value of the detector noise (Gaussian distribution)
stddev: Standard deviation of the detector noise (Gaussian distribution)
status: Create Tandem-MS scans?
tandem_mode: Algorithm to generate the tandem-MS spectra. 0 - fixed intensities, 1 - SVC prediction (abundant/missing), 2 - SVR prediction of peak intensity
svm_model_set_file: File containing the filenames of SVM Models for different charge variants
ms2_spectra_per_rt_bin: Number of allowed MS/MS spectra in a retention time bin.
min_mz_peak_distance: The minimal distance (in Th) between two peaks for concurrent selection for fragmentation. Also used to define the m/z width of an exclusion window (distance +/- from m/z of precursor). If you set this lower than the isotopic envelope of a peptide, you might get multiple fragment spectra pointing to the same precursor.
mz_isolation_window: All peaks within a mass window (in Th) of a selected peak are also selected for fragmentation.
exclude_overlapping_peaks: If true, overlapping or nearby peaks (within 'min_mz_peak_distance') are excluded for selection.
charge_filter: Charges considered for MS2 fragmentation.
use_dynamic_exclusion: If true, dynamic exclusion is applied.
exclusion_time: The time (in seconds) a feature is excluded.
max_list_size: The maximal number of precursors in the inclusion list.
min_rt: Minimal RT in seconds.
max_rt: Maximal RT in seconds.
rt_step_size: RT step size in seconds.
rt_window_size: RT window size in seconds.
min_protein_id_probability: Minimal protein probability for a protein to be considered identified.
min_pt_weight: Minimal pt weight of a precursor
min_mz: Minimal mz to be considered in protein based LP formulation.
max_mz: Maximal mz to be considered in protein based LP formulation.
use_peptide_rule: Use peptide rule instead of minimal protein id probability
min_peptide_ids: If use_peptide_rule is true, this parameter sets the minimal number of peptide ids for a protein id
min_peptide_probability: If use_peptide_rule is true, this parameter sets the minimal probability for a peptide to be safely identified
add_single_spectra: If true, the MS2 spectra for each peptide signal are included in the output (might be a lot). They will have a meta value 'MSE_DebugSpectrum' attached, so they can be filtered out. Native MS_E spectra will have 'MSE_Spectrum' instead.
isotope_model: Model to use for isotopic peaks ('none' means no isotopic peaks are added, 'coarse' adds isotopic peaks in unit mass distance, 'fine' uses the hyperfine isotopic generator to add accurate isotopic peaks). Note that adding isotopic peaks is very slow.
max_isotope: Defines the maximal isotopic peak which is added if 'isotope_model' is 'coarse'
max_isotope_probability: Defines the maximal isotopic probability to cover if 'isotope_model' is 'fine'
add_metainfo: Adds the type of peaks as metainfo to the peaks, like y8+, [M-H2O+2H]++
add_losses: Adds common losses to those ions expected to have them; only water and ammonia losses are considered
sort_by_position: Sort output by position
add_precursor_peaks: Adds peaks of the unfragmented precursor ion to the spectrum
add_all_precursor_charges: Adds precursor peaks with all charges in the given range
add_abundant_immonium_ions: Add the most abundant immonium ions (for Proline, Cysteine, Iso/Leucine, Histidine, Phenylalanine, Tyrosine, Tryptophan)
add_first_prefix_ion: If set to true e.g. b1 ions are added
add_y_ions: Add peaks of y-ions to the spectrum
add_b_ions: Add peaks of b-ions to the spectrum
add_a_ions: Add peaks of a-ions to the spectrum
add_c_ions: Add peaks of c-ions to the spectrum
add_x_ions: Add peaks of x-ions to the spectrum
add_z_ions: Add peaks of z-ions to the spectrum
y_intensity: Intensity of the y-ions
b_intensity: Intensity of the b-ions
a_intensity: Intensity of the a-ions
c_intensity: Intensity of the c-ions
x_intensity: Intensity of the x-ions
z_intensity: Intensity of the z-ions
relative_loss_intensity: Intensity of loss ions, in relation to the intact ion intensity
precursor_intensity: Intensity of the precursor peak
precursor_H2O_intensity: Intensity of the H2O loss peak of the precursor
precursor_NH3_intensity: Intensity of the NH3 loss peak of the precursor
add_isotopes: If set to 1, isotope peaks of the product ion peaks are added
max_isotope: Defines the maximal isotopic peak which is added; add_isotopes must be set to 1
add_metainfo: Adds the type of peaks as metainfo to the peaks, like y8+, [M-H2O+2H]++
add_first_prefix_ion: If set to true e.g. b1 ions are added
hide_y_ions: Hide peaks of y-ions in the spectrum
hide_y2_ions: Hide peaks of y2-ions (doubly charged y-ions) in the spectrum
hide_b_ions: Hide peaks of b-ions in the spectrum
hide_b2_ions: Hide peaks of b2-ions (doubly charged b-ions) in the spectrum
hide_a_ions: Hide peaks of a-ions in the spectrum
hide_c_ions: Hide peaks of c-ions in the spectrum
hide_x_ions: Hide peaks of x-ions in the spectrum
hide_z_ions: Hide peaks of z-ions in the spectrum
hide_losses: Hide common losses for those ions expected to have them; only water and ammonia losses are considered
y_intensity: Intensity of the y-ions
b_intensity: Intensity of the b-ions
a_intensity: Intensity of the a-ions
c_intensity: Intensity of the c-ions
x_intensity: Intensity of the x-ions
z_intensity: Intensity of the z-ions
relative_loss_intensity: Intensity of loss ions, in relation to the intact ion intensity
ionization_type: Type of Ionization (MALDI or ESI)
type: Select the labeling type you want for your experiment
ICPL_fixed_rtshift: Fixed retention time shift between labeled pairs. If set to 0.0, only the retention times computed by the RT model step are used.
label_proteins: Enables protein-labeling. (select 'false' if you only need peptide-labeling)
ICPL_light_channel_label: UniMod Id of the light channel ICPL label.
ICPL_medium_channel_label: UniMod Id of the medium channel ICPL label.
ICPL_heavy_channel_label: UniMod Id of the heavy channel ICPL label.
fixed_rtshift: Fixed retention time shift between labeled peptides. If set to 0.0, only the retention times computed by the RT model step are used.
modification_lysine: Modification of Lysine in the medium SILAC channel
modification_arginine: Modification of Arginine in the medium SILAC channel
modification_lysine: Modification of Lysine in the heavy SILAC channel. If left empty, two-channel SILAC is assumed.
modification_arginine: Modification of Arginine in the heavy SILAC channel. If left empty, two-channel SILAC is assumed.
iTRAQ: 4plex or 8plex iTRAQ?
reporter_mass_shift: Allowed shift (uniformly distributed - left to right) in Da from the expected position (of e.g. 114.1, 115.1)
channel_active_4plex: Four-plex only: Each channel that was used in the experiment and its description (114-117) in format <channel>:<name>, e.g. "114:myref","115:liver".
channel_active_8plex: Eight-plex only: Each channel that was used in the experiment and its description (113-121) in format <channel>:<name>, e.g. "113:myref","115:liver","118:lung".
isotope_correction_values_4plex: Overrides the default values (see Documentation). Use the following format: <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da>, e.g. '114:0/0.3/4/0', '116:0.1/0.3/3/0.2'
isotope_correction_values_8plex: Overrides the default values (see Documentation). Use the following format: <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da>, e.g. '113:0/0.3/4/0', '116:0.1/0.3/3/0.2'
Y_contamination: Efficiency of labeling tyrosine ('Y') residues. 0=off, 1=full labeling
labeling_efficiency: Describes the distribution of the labeled peptide over the different states (unlabeled, mono- and di-labeled)
biological: Controls the 'biological' randomness of the generated data (e.g. systematic effects like deviations in RT). If set to 'random' each experiment will look different. If set to 'reproducible' each experiment will have the same outcome (given that the input data is the same)
technical: Controls the 'technical' randomness of the generated data (e.g. noise in the raw signal). If set to 'random' each experiment will look different. If set to 'reproducible' each experiment will have the same outcome (given that the input data is the same)
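
The 'charge_impurity' option states that the weights are rescaled internally so that they sum to 1. The following Python sketch only illustrates that arithmetic; the function name normalize_charge_impurity is hypothetical and is not part of MSSimulator or OpenMS.

```python
def normalize_charge_impurity(entries):
    """Parse entries such as 'H:4' and rescale the weights to sum to 1.

    Hypothetical helper for illustration; not an OpenMS function.
    """
    weights = {}
    for entry in entries:
        # Each entry has the form '<ion>:<weight>', e.g. 'Na:0.3'.
        ion, _, raw_weight = entry.partition(":")
        weights[ion] = float(raw_weight)
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {ion: weight / total for ion, weight in weights.items()}


# Mirrors the example from the option text: ['H:4' 'Na:1'] -> H:0.8, Na:0.2
print(normalize_charge_impurity(["H:4", "Na:1"]))
```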
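
The elution profile options describe an Exponential Gaussian Hybrid (EGH) shape with a width and a tau (skewness) parameter. As a rough illustration, the sketch below evaluates the commonly published EGH form. It assumes that the width option plays the role of sigma and the skewness option the role of tau; how MSSimulator maps its options onto the function internally is not stated on this page, and the name egh and its arguments are hypothetical.

```python
import math


def egh(t, apex_rt, height, sigma, tau):
    """Value of an Exponential Gaussian Hybrid profile at time t.

    Assumed form: height * exp(-(t - apex)^2 / (2*sigma^2 + tau*(t - apex)))
    wherever the denominator is positive, and 0 elsewhere.
    """
    dt = t - apex_rt
    denominator = 2.0 * sigma * sigma + tau * dt
    if denominator <= 0.0:
        return 0.0
    return height * math.exp(-(dt * dt) / denominator)


# Positive tau gives tailing: more signal after the apex than before it.
before = egh(90.0, apex_rt=100.0, height=1.0, sigma=5.0, tau=2.0)
after = egh(110.0, apex_rt=100.0, height=1.0, sigma=5.0, tau=2.0)
print(before < after)
```

With tau set to 0 the profile reduces to a symmetric Gaussian; a negative tau mirrors the behavior and produces fronting, matching the description of the skewness option.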
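
The 'ionization_probability' option refers to a binomial distribution over ESI charge states. The sketch below shows what such a distribution looks like. It assumes that the number of trials equals the number of chargeable sites of a peptide (for example the residues listed in 'ionized_residues' plus the N-terminus); that mapping, and the name charge_state_distribution, are assumptions made for illustration and may differ from the simulator's actual behavior.

```python
from math import comb


def charge_state_distribution(n_sites, probability):
    """Binomial probability of observing k charges on n_sites chargeable sites.

    Hypothetical helper; n_sites and its meaning are assumptions.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return {
        k: comb(n_sites, k) * probability**k * (1.0 - probability) ** (n_sites - k)
        for k in range(n_sites + 1)
    }


# A peptide with 3 assumed chargeable sites and a per-site probability of 0.8.
for charge, prob in charge_state_distribution(3, 0.8).items():
    print(charge, round(prob, 3))
```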
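
The iTRAQ isotope correction options use the format <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da>. The following sketch shows one way to read such an entry into a channel number and four offsets. The function parse_isotope_correction is hypothetical and only illustrates the string format; it does not reproduce how OpenMS validates or applies the values.

```python
def parse_isotope_correction(entry):
    """Split '<channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da>' into its parts.

    Returns the channel as int and a dict keyed by mass offset in Da.
    Hypothetical helper for illustration only.
    """
    channel_text, separator, values_text = entry.partition(":")
    if not separator:
        raise ValueError(f"missing ':' in entry {entry!r}")
    values = [float(value) for value in values_text.split("/")]
    if len(values) != 4:
        raise ValueError(f"expected 4 values, got {len(values)} in {entry!r}")
    return int(channel_text), dict(zip((-2, -1, 1, 2), values))


# Example entry taken from the option text.
channel, corrections = parse_isotope_correction("114:0/0.3/4/0")
print(channel, corrections)
```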

Input Ports

Input protein sequences [FASTA]

Output Ports

output: simulated MS raw (profile) data [mzML]
output: ground-truth picked (centroided) MS data [mzML]
output: ground-truth features [featureXML]
output: ground-truth features, grouping ESI charge variants of each parent peptide [consensusXML]
output: ground-truth features, grouping labeled variants [consensusXML]
output: ground-truth features caused by contaminants [featureXML]
output: ground-truth MS2 peptide identifications [idXML]

Views

MSSimulator Std Output: The text sent to standard out during the execution of MSSimulator.
MSSimulator Error Output: The text sent to standard error during the execution of MSSimulator. (If it appears in gray, it is the output of a previously failing run, which is preserved for troubleshooting.)

Installation

To use this node in KNIME, install the extension OpenMS from the below update site following our NodePit Product and Node Installation Guide:

v5.2

Plugin provider: Freie Universitaet Berlin, Universitaet Tuebingen, ZIB (GKN-Team) and the OpenMS Team

Plugin version: 3.0.0.202307112039

On NodePit since: 2023-12-06

Last update: 2026-03-12

KNIME versions: v5.2, v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0, v3.7, v3.6