Dfi

The Deferred Frequency Index (DFI) is a tool for string mining under frequency constraints, i.e., predicates that depend only on how often a pattern occurs in the data. The frequency of a pattern is defined as the number of distinct sequences in a database that contain the pattern at least once. The implementation currently provides three predicates and can easily be extended with user-defined frequency predicates. The frequencies are calculated during the construction of a suffix tree over all databases, which makes it possible to limit the index construction to a problem-specific minimum referred to as the optimal monotonic hull.
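The frequency definition above can be illustrated with a minimal Python sketch. It is a naive linear scan written only to make the definition concrete; it is not how Dfi computes frequencies (Dfi derives them during suffix tree construction), and the function name `frequency` and the sample database are hypothetical.

```python
def frequency(pattern, database):
    """Count the sequences in `database` that contain `pattern` at least once.

    Naive illustration of the definition only; Dfi itself obtains these
    counts while building its suffix tree index.
    """
    return sum(1 for sequence in database if pattern in sequence)


# Hypothetical toy database: each string is one sequence.
database = ["ACGTACGT", "TTACG", "GGGG"]

# "ACG" occurs twice in the first sequence, but each sequence counts once.
print(frequency("ACG", database))  # prints 2
```

Note that the count is per sequence, not per occurrence, so a pattern repeated many times within one sequence still contributes only one to the frequency.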

(c) Copyright 2010 by David Weese and Marcel H. Schulz

Web Documentation for Dfi

Options

version-check
Turn this option off to disable version update notifications of the application.
minmax
Set minimal and maximal frequency per database.
support
Minimal support in the first (with --growth) or all (with --entropy) databases.
growth
Minimal support ratio between the first and second databases.
entropy
Maximal entropy of support values of all databases.
alphabet
Specify database alphabet.
maximal
Output only left and right maximal substrings.
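To make the three predicate options more concrete, here is a hedged Python sketch of how such frequency predicates could be evaluated for a single pattern. Everything in it is an assumption made for illustration: the function names are hypothetical, minmax bounds are taken as absolute frequencies, support is taken as frequency divided by database size, and the entropy is computed in bits over the normalized support values. The exact definitions, normalization, and logarithm base used by Dfi may differ.

```python
import math


def minmax_predicate(frequencies, bounds):
    """True if every per-database frequency lies within its (min, max) bounds.

    Assumption: bounds are absolute frequencies, one pair per database.
    """
    return all(lo <= f <= hi for f, (lo, hi) in zip(frequencies, bounds))


def growth_predicate(support_1, support_2, min_support, min_growth):
    """True if the pattern is frequent in database 1 and its support ratio
    support_1 / support_2 reaches `min_growth`.

    Assumption: a pattern absent from database 2 has unbounded growth.
    """
    if support_1 < min_support:
        return False
    if support_2 == 0:
        return True
    return support_1 / support_2 >= min_growth


def entropy_predicate(supports, min_support, max_entropy):
    """True if the pattern is frequent in all databases and the entropy of
    its normalized support values does not exceed `max_entropy`.

    Assumption: entropy is measured in bits (log base 2).
    """
    if any(s < min_support for s in supports):
        return False
    total = sum(supports)
    if total == 0:
        return False
    probabilities = [s / total for s in supports]
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return entropy <= max_entropy
```

Under these assumptions, a low entropy means the pattern's support is concentrated in few databases, while a high growth ratio means it is much more common in the first database than in the second.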

Input Ports

Database files in Fasta/Fastq or text format (lines are strings). [fq,fastq,fa,fasta,faa,ffn,fna,frn,embl,gbk,raw,sam]

Output Ports

Change output filename. Default: <stdout>. [txt]

Views

Dfi Std Output
The text sent to standard out during the execution of Dfi.
Dfi Error Output
The text sent to standard error during the execution of Dfi. (If it appears in gray, it is the output of a previous failed run, preserved for troubleshooting.)
