DecoyDatabase

Creates a combined target+decoy sequence database from a forward (target) sequence database.

Web Documentation for DecoyDatabase

Options

version
Version of the tool that generated this parameters file.
decoy_string
String that is combined with the accession of the protein identifier to indicate a decoy protein.
decoy_string_position
Should the 'decoy_string' be prepended (prefix) or appended (suffix) to the protein accession?
only_decoy
Write only decoy proteins to the output database instead of a combined database.
type
Type of sequence. RNA sequences may contain modification codes, which will be handled correctly if this is set to 'RNA'.
method
Method by which decoy sequences are generated from target sequences. Note that all sequences are shuffled using the same random seed, ensuring that identical sequences produce the same shuffled decoy sequences. If a shuffled decoy is still highly similar to its target sequence, it is shuffled again (see shuffle_sequence_identity_threshold).
shuffle_max_attempts
shuffle: maximum attempts to lower the amino acid sequence identity between target and decoy for the shuffle algorithm
shuffle_sequence_identity_threshold
shuffle: target-decoy amino acid sequence identity threshold for the shuffle algorithm. If the sequence identity is above this threshold, shuffling is repeated. In case of repeated failure, individual amino acids are 'mutated' to produce a different amino acid sequence.
seed
Random number seed (use 'time' for system time)
enzyme
Enzyme used for the digestion of the sample. Only applicable if parameter 'type' is 'protein'.
log
Name of log file (created only when specified)
debug
Sets the debug level
threads
Sets the number of threads allowed to be used by the TOPP tool
no_progress
Disables progress logging to command line
force
Overrides tool-specific checks
test
Enables the test mode (needed for internal use only)
missed_cleavages
Number of missed cleavages for relevant and neighbor peptides.
mz_bin_size
Bin size for spectra m/z comparison (the original study suggests 0.05 Th for high-res and 1.0005079 Th for low-res spectra).
pc_mass_tolerance
Maximal precursor mass difference (in Da or ppm; see 'pc_mass_tolerance_unit') between neighbor and relevant peptide.
pc_mass_tolerance_unit
Is 'pc_mass_tolerance' in Da or ppm?
min_peptide_length
Minimum peptide length (relevant and neighbor peptides)
min_shared_ion_fraction
Minimal required overlap 't_i' of b/y ions shared between neighbor candidate and a relevant peptide (t_i <= 2*B12/(B1+B2)). Higher values result in fewer neighbors.
non_shuffle_pattern
Residues to not shuffle (keep at a constant position when shuffling). Separate by comma, e.g. use 'K,P,R' here.
keepPeptideNTerm
Whether to keep peptide N terminus constant when shuffling / reversing.
keepPeptideCTerm
Whether to keep peptide C terminus constant when shuffling / reversing.
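
The decoy-related options above (method, seed, shuffle_max_attempts, shuffle_sequence_identity_threshold, decoy_string, decoy_string_position) can be illustrated with a minimal Python sketch. This is not the OpenMS implementation: the function names, the default values, and the position-by-position identity measure are assumptions made for illustration. The 'mutation' fallback described under shuffle_sequence_identity_threshold and the enzyme-aware options (non_shuffle_pattern, keepPeptideNTerm, keepPeptideCTerm) are left out.

```python
import random


def sequence_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree.

    Assumption: identity is measured position by position; the tool may
    define it differently.
    """
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def make_decoy_sequence(seq: str, method: str = "reverse", seed: int = 1,
                        max_attempts: int = 30,
                        identity_threshold: float = 0.5) -> str:
    """Return a reversed or shuffled decoy for one target sequence.

    The default values are illustrative, not confirmed tool defaults.
    """
    if method == "reverse":
        return seq[::-1]
    if method != "shuffle":
        raise ValueError(f"unknown method: {method!r}")

    # A fresh generator per sequence, always started from the same seed,
    # so that identical targets always yield identical decoys.
    rng = random.Random(seed)
    best, best_identity = seq, 1.0
    for _ in range(max_attempts):
        residues = list(seq)
        rng.shuffle(residues)
        candidate = "".join(residues)
        identity = sequence_identity(seq, candidate)
        if identity < best_identity:
            best, best_identity = candidate, identity
        if identity <= identity_threshold:
            break  # dissimilar enough, stop reshuffling
    return best


def decoy_accession(accession: str, decoy_string: str = "DECOY_",
                    position: str = "prefix") -> str:
    """Combine the decoy string with a protein accession."""
    if position == "prefix":
        return decoy_string + accession
    if position == "suffix":
        return accession + decoy_string
    raise ValueError(f"unknown position: {position!r}")
```

For example, under these assumptions `decoy_accession("P12345")` gives `DECOY_P12345`, and `make_decoy_sequence("PEPTIDEK")` gives the reversed sequence `KEDITPEP`.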

Input Ports

Input FASTA file(s), each containing a database. It is recommended to include a contaminant database as well. [fasta]
The relevant proteins for which neighbors are sought [fasta,opt.]

Output Ports

Output FASTA file where the decoy database (target + decoy or only decoy, see 'only_decoy') will be written to. [fasta]
Output FASTA file with neighbors of relevant peptides (given in 'in_relevant_proteins'). []
Output FASTA file with target+decoy of relevant peptides (given in 'in_relevant_proteins'). Required for downstream filtering of search results via IDFilter and subsequent FDR. []
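
How a peptide might qualify as a neighbor of a relevant peptide (options mz_bin_size, pc_mass_tolerance, pc_mass_tolerance_unit, min_shared_ion_fraction) can be sketched in Python as follows. This is a hedged illustration, not the tool's code: it assumes that B1 and B2 are the numbers of occupied m/z bins of the two peptides' b/y ions, that B12 is the number of bins they share, and that a ppm tolerance is taken relative to the relevant peptide's mass. The function names are hypothetical, and the fragment m/z values are supplied by the caller rather than computed from sequences.

```python
def bin_indices(mz_values, bin_size):
    """Map fragment m/z values to the set of occupied bin indices."""
    return {int(mz // bin_size) for mz in mz_values}


def shared_ion_fraction(mz_a, mz_b, bin_size=0.05):
    """Compute 2*B12/(B1+B2) from two lists of fragment m/z values."""
    bins_a = bin_indices(mz_a, bin_size)
    bins_b = bin_indices(mz_b, bin_size)
    total = len(bins_a) + len(bins_b)
    if total == 0:
        return 0.0
    return 2 * len(bins_a & bins_b) / total


def within_tolerance(candidate_mass, relevant_mass, tolerance, unit="Da"):
    """Check the precursor mass difference in Da or ppm.

    Assumption: ppm is computed relative to the relevant peptide's mass.
    """
    diff = abs(candidate_mass - relevant_mass)
    if unit == "Da":
        return diff <= tolerance
    if unit == "ppm":
        return diff <= tolerance * relevant_mass * 1e-6
    raise ValueError(f"unknown unit: {unit!r}")


def is_neighbor(candidate_mass, candidate_mz, relevant_mass, relevant_mz,
                tolerance, unit, bin_size, min_shared_ion_fraction):
    """A candidate is a neighbor if its precursor mass is close enough and
    enough of its fragment bins overlap with the relevant peptide's."""
    if not within_tolerance(candidate_mass, relevant_mass, tolerance, unit):
        return False
    fraction = shared_ion_fraction(candidate_mz, relevant_mz, bin_size)
    return fraction >= min_shared_ion_fraction
```

With this reading, raising min_shared_ion_fraction makes the final comparison stricter, which matches the statement that higher values result in fewer neighbors.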

Views

DecoyDatabase Std Output
The text sent to standard out during the execution of DecoyDatabase.
DecoyDatabase Error Output
The text sent to standard error during the execution of DecoyDatabase. (If it appears in gray, it is the output of a previously failing run, which is preserved for your troubleshooting.)
