Frequency Filter

Filters the terms in the given bag of words by a frequency value. Two filtering modes are available. With threshold filtering, a minimum and a maximum value are defined, and a term is filtered if the value in the specified frequency column is less than the minimum or greater than the maximum. With number-of-terms filtering, a number k of terms to keep is defined: only the k terms with the highest frequency values are kept, and the rest are filtered.


Filter options

Filter unmodifiable terms
Usually, terms which have been set unmodifiable are neither modified nor filtered. If this setting is checked, these terms are filtered as well if they do not meet the specified requirements.
Filter column
The column containing the values on which the filtering is based. For example, the TF measure of each term can be computed beforehand by the TF node. Once the column is appended, the filtering can be applied to these values.
Filtering by
Specifies which filtering is applied: threshold filtering or number-of-terms filtering. Threshold filtering keeps all rows whose value in the specified filter column lies between the specified minimum and maximum values. Number-of-terms filtering keeps the K rows with the highest values.
Threshold settings
Specifies the minimum and the maximum threshold of the values of the filter column.
Number of terms settings
Specifies the number K of rows to keep. Only the K rows with the highest values in the filter column are kept; the rest are filtered out.
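
The two filtering modes and the unmodifiable-terms setting can be illustrated with a minimal Python sketch. This is not the node's implementation: the `TermRow` class and both function names are hypothetical, the threshold bounds are assumed to be inclusive, and it is assumed that protected (unmodifiable) rows are kept in addition to the K highest-valued rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermRow:
    """Hypothetical stand-in for one bag-of-words row."""
    term: str
    frequency: float          # value taken from the filter column
    unmodifiable: bool = False


def filter_by_threshold(rows, minimum, maximum, filter_unmodifiable=False):
    """Keep rows whose frequency lies in [minimum, maximum].

    Inclusive bounds are an assumption of this sketch.
    """
    kept = []
    for row in rows:
        if row.unmodifiable and not filter_unmodifiable:
            kept.append(row)  # protected rows bypass the filter
        elif minimum <= row.frequency <= maximum:
            kept.append(row)
    return kept


def filter_top_k(rows, k, filter_unmodifiable=False):
    """Keep the k rows with the highest frequency.

    Assumption: protected rows are kept in addition to the top k.
    """
    protected = [r for r in rows if r.unmodifiable and not filter_unmodifiable]
    candidates = [r for r in rows if filter_unmodifiable or not r.unmodifiable]
    ranked = sorted(candidates, key=lambda r: r.frequency, reverse=True)
    return protected + ranked[:k]


rows = [
    TermRow("a", 0.1),
    TermRow("b", 0.5),
    TermRow("c", 0.9),
    TermRow("d", 0.05, unmodifiable=True),
]
in_range = filter_by_threshold(rows, 0.2, 0.8)
top_two = filter_top_k(rows, 2, filter_unmodifiable=True)
```

In this sketch, `in_range` contains the rows for "b" and the protected "d", while `top_two` contains only "c" and "b" because the unmodifiable flag is ignored when `filter_unmodifiable` is set.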

Deep Filtering

Document column
Specifies the column containing the documents to which the filtering is applied.
Deep filtering
If deep filtering is checked, the terms contained inside the documents are filtered as well. This means that the documents themselves are changed, which is more time-consuming.
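
The difference between filtering with and without deep filtering can be sketched as follows. This is an illustrative model only: documents are represented as plain tuples of term strings, and the function name `apply_filter` and its parameters are hypothetical, not part of the node.

```python
def apply_filter(bag, kept_terms, deep_filtering=False):
    """Filter a bag of words given the set of terms that passed the filter.

    bag: list of (term, document) pairs, where a document is modelled
    as a tuple of term strings (an assumption of this sketch).
    """
    result = []
    for term, document in bag:
        if term not in kept_terms:
            continue  # the row of a filtered term is dropped in both modes
        if deep_filtering:
            # Deep filtering also rewrites the document itself, which is
            # the extra work that makes this mode slower.
            document = tuple(t for t in document if t in kept_terms)
        result.append((term, document))
    return result


doc = ("the", "quick", "fox")
bag = [("the", doc), ("quick", doc), ("fox", doc)]
kept = {"quick", "fox"}

shallow = apply_filter(bag, kept)
deep = apply_filter(bag, kept, deep_filtering=True)
```

Here `shallow` drops the row for "the" but leaves each document unchanged, whereas `deep` additionally removes "the" from the documents themselves.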

Input Ports

The input table which contains terms and documents.

Output Ports

The output table which contains terms, documents, and a corresponding frequency value.


This node has no views



