Generates a bit vector for each row of the input table. The bit vectors are created from one of the following
sources: multiple numeric or string columns, a single string column containing the bit positions to set,
hexadecimal strings or binary strings, or a single collection column. To configure the node, first select the
source column type, i.e. whether the bit vectors should be created from multiple string/numeric columns or from a
single string/collection column. The dialog elements that correspond to the selected option are then enabled.
Bit vectors from multiple columns
In the case of multiple columns the bit positions in the resulting bit vector correspond to the
column position in the input table.
For example, if the second and third columns of an input table are selected and the first column is omitted,
the bit vector of each row has length 2. The first bit is set if the value of the second column matches the
selected criterion; likewise, the second bit is set if the value of the third column matches the selected
criterion. The columns to consider when creating the bit vector can be specified in the multiple column
selection section. The enforce exclusion/inclusion option configures how the node handles previously unknown
columns: if the enforce exclusion option is selected, all unknown columns are automatically added to the
include list, whereas if the enforce inclusion option is selected, all unknown columns are added to the
exclude list. The columns to include can also be defined by a pattern if the Wildcard/Regex Selection option
is selected.
Multiple string columns
The bit for a column is set if the column value matches, or does not match, the specified pattern,
depending on the "Set bit if pattern does match/does not match" option. The pattern may contain the wildcards
'?' (matches any one character) and '*' (matches any sequence of characters, including none).
It can also be a regular expression.
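The matching rule above can be illustrated with a minimal Python sketch. It is not the node's implementation: the function names, the dictionary-based row, and the choice to match the pattern against the whole cell value are assumptions made for illustration.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern ('*' and '?') into a regular expression."""
    escaped = re.escape(pattern)  # escapes '*' and '?' as '\*' and '\?'
    return escaped.replace(r"\*", ".*").replace(r"\?", ".")


def string_row_to_bits(row, columns, pattern, wildcards=True,
                       case_sensitive=True, set_if_match=True):
    """Return one bit per selected column, in column order.

    row: dict mapping column name to a string value (None = missing).
    Assumption: the pattern must match the whole cell value.
    """
    regex = wildcard_to_regex(pattern) if wildcards else pattern
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(regex, flags)
    bits = []
    for name in columns:
        value = row.get(name)
        if value is None:
            bits.append(0)  # missing values result in 0
            continue
        matched = compiled.fullmatch(value) is not None
        bits.append(1 if matched == set_if_match else 0)
    return bits


# The first column is omitted, so the bit vector has length 2.
row = {"c1": "id-1", "c2": "foobar", "c3": "Food"}
print(string_row_to_bits(row, ["c2", "c3"], "foo*"))
```

With a case-sensitive match only the value of c2 matches "foo*", so only the first bit is set; switching off case sensitivity sets both bits.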
Multiple numeric columns
There are two ways to determine whether the bit for the value in the corresponding column is set:
Global threshold: all values greater than or equal to the threshold are converted into set bits; all other
bit positions remain 0.
Percentage of the mean: a certain percentage of the mean of each column is used as the threshold for that
column; all values greater than or equal to this value are converted into set bits. For example, if the
percentage is set to 50%, the mean of col1 is 2 and the mean of col2 is 8, then the bit for col1 is set
if the value is greater than or equal to 1, and the bit for col2 is set if the value is greater than or equal to 4.
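The two threshold options can be sketched in Python as follows. This is an illustration under stated assumptions, not the node's code: the function name and the list-of-lists table are hypothetical, and excluding missing values (None) from the mean is an assumption.

```python
def numeric_rows_to_bits(rows, threshold=None, mean_percentage=None):
    """Convert numeric rows into bit vectors.

    rows: list of equally long lists of numbers (None = missing value).
    Pass either a global threshold or a percentage of the column mean.
    """
    n_cols = len(rows[0])
    if mean_percentage is not None:
        thresholds = []
        for c in range(n_cols):
            # Assumption: missing values do not contribute to the mean.
            values = [r[c] for r in rows if r[c] is not None]
            if values:
                mean = sum(values) / len(values)
                thresholds.append(mean * mean_percentage / 100.0)
            else:
                thresholds.append(float("inf"))  # no data: never set the bit
    else:
        thresholds = [threshold] * n_cols
    # A bit is set if the value is greater than or equal to the threshold;
    # missing values result in 0.
    return [
        [1 if v is not None and v >= t else 0 for v, t in zip(row, thresholds)]
        for row in rows
    ]


# col1 has mean 2 and col2 has mean 8, so 50% gives thresholds 1 and 4.
table = [[1.0, 3.0], [3.0, 13.0], [0.5, 4.0], [3.5, 12.0]]
print(numeric_rows_to_bits(table, mean_percentage=50))
```

The sample table reproduces the example above: a value of 1.0 in col1 sets its bit, while a value of 3.0 in col2 does not.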
Bit vectors from a single column
In the case of a single input column only the selected single column to be parsed is considered for the
generation of the bit vectors.
Single string column
In the case of a string input only the column containing the string is
considered for the generation of the bit vectors. The string is parsed
and converted into a bit vector. There are three valid input formats
which can be parsed and converted:
Hexadecimal strings: strings consisting only of the characters 0-9 and A-F
(case is ignored). The represented hexadecimal number is
converted into a binary number, which is represented by the resulting bit vector.
Binary strings: strings consisting only of 0s and 1s are parsed and
converted into the corresponding bit vectors.
ID strings: strings consisting of numbers (separated by spaces),
where the numbers refer to the positions in the bit vector that should be set
(a typical input format for association rule mining).
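The three input formats can be sketched in Python as shown below. The function names are hypothetical, and several details are assumptions that the description does not specify: the leftmost character is treated as the first vector position, ID positions are zero-based, and the vector length for ID strings defaults to the highest position plus one. The node itself may order bits differently.

```python
def parse_hex(text):
    """Expand each hexadecimal digit into 4 bits (leftmost digit first)."""
    # int(ch, 16) raises ValueError for characters outside 0-9, A-F, a-f.
    return [int(b) for ch in text.strip() for b in format(int(ch, 16), "04b")]


def parse_binary(text):
    """Convert a string of 0s and 1s into a list of bits."""
    text = text.strip()
    if not text or set(text) - {"0", "1"}:
        raise ValueError("not a binary string: %r" % text)
    return [int(ch) for ch in text]


def parse_ids(text, length=None):
    """Set the bits at the space-separated positions (assumed zero-based)."""
    positions = {int(token) for token in text.split()}
    if any(p < 0 for p in positions):
        raise ValueError("negative bit position in: %r" % text)
    if length is not None:
        size = length
    else:
        size = max(positions) + 1 if positions else 0
    if any(p >= size for p in positions):
        raise ValueError("bit position exceeds vector length in: %r" % text)
    bits = [0] * size
    for p in positions:
        bits[p] = 1
    return bits


print(parse_hex("1F"))       # two hex digits give 8 bits
print(parse_binary("0110"))
print(parse_ids("1 3 4"))    # positions 1, 3 and 4 are set
```

Each parser raises a ValueError for input it cannot interpret, which a caller could translate into a missing output cell.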
Single collection column
In the case of a single collection column, each unique collection element is assigned a bit position.
The length of the bit vectors corresponds to the number of unique elements across all collection cells.
For example if the input table contains two rows with the collections {a,b} and {b,c} the corresponding
bit vectors will be [110] and [011].
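The collection example can be reproduced with a short Python sketch. The function name is hypothetical, and assigning bit positions in order of first appearance is an assumption; the description only states that each unique element gets its own position.

```python
def collections_to_bits(collections):
    """Convert collection cells into bit vectors of equal length.

    collections: list of lists of hashable elements (None = missing element).
    """
    # First pass: assign a bit position to each unique element
    # (assumption: positions follow the order of first appearance).
    index = {}
    for cell in collections:
        for element in cell:
            if element is not None and element not in index:
                index[element] = len(index)

    # Second pass: set the bit of every element present in the cell.
    result = []
    for cell in collections:
        bits = [0] * len(index)
        for element in cell:
            if element is not None:  # missing elements are ignored
                bits[index[element]] = 1
        result.append(bits)
    return result


print(collections_to_bits([["a", "b"], ["b", "c"]]))
```

For the two cells {a,b} and {b,c} this yields the vectors 110 and 011, each of length 3 because there are three unique elements in total.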
Missing values
For numeric columns, incoming missing values result in 0s.
For multiple string columns, missing values also result in 0s.
For a single string column, a missing value results in a missing value
in the output table. If a string cannot be parsed, it also results in
a missing cell in the output table, and an error message with detailed information is printed to the console.
For a collection column, all missing collection elements are ignored.
Options
Create bit vectors from multiple string columns
Pattern
The pattern to search for in the data value
Contains wildcards
Select this option to use wildcards in the pattern. Wildcard patterns may contain '*' (matching any sequence of
characters, including none) and '?' (matching any one character).
Regular expression
Select this option to specify a regular expression. Examples of regular expressions are given below.
"^foo.*" matches anything that starts with "foo". The '^' character
stands for the beginning of the string, the dot matches any (one) character,
and the asterisk allows any number (including zero) of repetitions of the preceding element.
"[0-9]*" matches any string of digits (including the empty string).
The square brackets define a set of characters (they can be
listed individually, as in [0123456789], or given as a range). This set
matches any (one) character included in the set.
For a complete explanation of regular expressions, see e.g. the JavaDoc
of the java.util.regex.Pattern class.
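The two example expressions can be tried out with the short Python sketch below. Note the swap: it uses Python's re module rather than java.util.regex, which behaves the same for these two simple patterns. Matching the pattern against the whole value is an assumption, and the helper name is hypothetical.

```python
import re

starts_with_foo = re.compile(r"^foo.*")
digits_only = re.compile(r"[0-9]*")


def matches(pattern, value):
    """Return True if the pattern matches the whole value."""
    return pattern.fullmatch(value) is not None


for value in ["foobar", "foo", "barfoo"]:
    print(value, matches(starts_with_foo, value))

for value in ["2024", "", "12a"]:
    print(repr(value), matches(digits_only, value))
```

"barfoo" is rejected by the first pattern because "foo" is not at the beginning, and "12a" is rejected by the second because 'a' is not in the character set.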
Case sensitive match
A case sensitive matching is performed if this option is selected.
Set bit if pattern
Depending on the selected option, the corresponding bit in the bit vector is set if the pattern either
does match or does not match.
Multiple column selection panel
Select the string columns to convert to a bit vector
Create bit vectors from multiple numeric columns
Threshold
If the "numeric input" is checked, specify the global threshold.
All values which are above or equal to this threshold will result
in a 1 in the bit vector.
Use percentage of the mean
Check this option if a percentage of the mean of each column should serve as
the threshold at or above which the bits are set.
Percentage
Specify the percentage of the column mean that a value must reach for its bit to be set.
Multiple column selection panel
Select the numeric columns to convert to a bit vector
Create bit vectors from a single string column
Kind of string representation
Select one of the three valid input formats: HEX (hexadecimal),
ID (bit positions) or BIT (binary strings). See description above.
Single column to be parsed
The string column to parse
Create bit vectors from a single collection column
Single column to be parsed
The collection column to parse
General options
Remove column(s) used for bit vector creation
If checked, the source column(s) (the included columns if numeric input was used,
or the selected string column) are removed.
If unchecked, the generated bit vectors are appended to the input table.
Output column
Name of the output column.
Fail on invalid input
If selected, the node will fail during execution if a data cell could not be converted to a bit set.
If unselected, the node will skip these invalid entries and insert a missing value instead.
Bit vector type
The dense vector type is the default. It stores each vector position as a single bit, so every vector
requires the same amount of space, i.e. the vector length in bits.
The sparse vector type stores only the indices of the set bits, so the space required to store a
bit vector depends on the number of set bits. Each set bit requires between 32 and 64 bits, depending on the
operating system. The sparse option should therefore only be selected if the majority of the bit vectors
contain only a few set bits, e.g. less than 10%.
Input Ports
Data table with numerical data or a string column to be parsed.
Provides information about the generation of the bit vectors from
the data. In particular this is the number of processed rows,
the total number of generated zeros and ones and the resulting
ratio of 1s to 0s.