Generates a bit vector for each row of the input table. The bit vectors are created from one of the following sources: multiple numeric or string columns, a string column containing the bit positions to set, hexadecimal or binary strings, or a collection column. To configure the node, first select the source column type, i.e. whether the bit vector should be created from multiple string/numeric columns or from a single string/collection column. The corresponding dialog elements are enabled depending on the selected option.

For multiple numeric columns, there are two ways to decide whether a bit is set:

- either a global threshold is defined, in which case all values greater than or equal to the threshold are converted into set bits and all other bit positions remain 0, or
- a certain percentage of the mean of each column is used as the threshold, in which case all values greater than or equal to that percentage of the mean are converted into set bits. For example, if the mean percentage is set to 50%, the mean of col1 is 2, and the mean of col2 is 8, then the bit for col1 is set if the value is at least 1, and the bit for col2 is set if the value is at least 4.
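The two thresholding options can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the node's implementation; the function names and the sample rows are hypothetical, and missing values are not handled.

```python
def bits_from_global_threshold(row, threshold):
    """Set a bit for every value that is >= the global threshold."""
    return [1 if value >= threshold else 0 for value in row]


def bits_from_mean_percentage(rows, percentage):
    """Set a bit for every value that is >= (percentage of its column mean)."""
    n_rows = len(rows)
    # zip(*rows) iterates column-wise over the row-oriented table.
    means = [sum(column) / n_rows for column in zip(*rows)]
    thresholds = [mean * percentage / 100.0 for mean in means]
    return [
        [1 if value >= t else 0 for value, t in zip(row, thresholds)]
        for row in rows
    ]


# Hypothetical table with two columns: the mean of col1 is 2, the mean of col2 is 8.
rows = [[1.0, 3.0], [3.0, 13.0], [0.5, 4.0], [3.5, 12.0]]

# With 50%, the per-column thresholds are 1 and 4, as in the example above.
print(bits_from_mean_percentage(rows, 50))
print(bits_from_global_threshold(rows[0], 2.0))
```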

A single string column can be parsed in one of three formats:

- Hexadecimal strings: strings consisting only of the characters 0-9 and A-F (case is ignored). The represented hexadecimal number is converted into a binary number, which is represented by the resulting bit vector.
- Binary strings: strings consisting only of 0s and 1s are parsed and converted into the corresponding bit vectors.
- ID strings: strings consisting of numbers (separated by spaces), where the numbers refer to the positions in the bit vector that should be set. (This is a typical input format for association rule mining.)
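A rough Python sketch of the three string formats follows. The function names are hypothetical. The bit ordering is an assumption made for this illustration (the rightmost character is treated as bit position 0); the description above does not state which ordering the node uses.

```python
import re


def _set_positions(value):
    """Return the positions of the 1-bits of a non-negative integer."""
    return {i for i in range(value.bit_length()) if (value >> i) & 1}


def positions_from_hex(text):
    """Parse a hexadecimal string (0-9, A-F, case ignored)."""
    if not re.fullmatch(r"[0-9A-Fa-f]+", text):
        raise ValueError(f"not a hexadecimal string: {text!r}")
    # Assumption: the rightmost hex digit holds the lowest bit positions.
    return _set_positions(int(text, 16))


def positions_from_binary(text):
    """Parse a string consisting only of 0s and 1s."""
    if not re.fullmatch(r"[01]+", text):
        raise ValueError(f"not a binary string: {text!r}")
    # Assumption: the rightmost character is bit position 0.
    return _set_positions(int(text, 2))


def positions_from_ids(text):
    """Parse space-separated bit positions, e.g. '1 3 7'."""
    return {int(token) for token in text.split()}


print(sorted(positions_from_hex("1F")))
print(sorted(positions_from_binary("1010")))
print(sorted(positions_from_ids("1 3 7")))
```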

- Pattern
- The pattern to search for in the data value
- Contains wildcards
- Select this option to use wildcards in the pattern. Wildcard patterns contain '*' (matching any sequence of characters) and '?' (matching any one character).
- Regular expression
- Select this option to specify a regular expression. Examples of regular expressions are given below.

"`^foo.*`" matches anything that starts with "foo". The '^' character stands for the beginning of the word, the dot matches any (one) character, and the asterisk allows any number (including zero) of the previous character.

"`[0-9]*`" matches any string of digits (including the empty string). The `[` and `]` define a set of characters (they could be added individually, like `[0123456789]`, or by range). This set matches any (one) character included in the set.

For a complete explanation of regular expressions see e.g. the JavaDoc of the java.util.regex.Pattern class.
- Case sensitive match
- Case-sensitive matching is performed if this option is selected.
- Set bit if pattern
- Depending on the selected option, the corresponding bit in the bit vector is set if the pattern either matches or does not match.
- Multiple column selection panel
- Select the string columns to convert to a bit vector
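The pattern options above can be illustrated with a small Python sketch. The helper names are hypothetical, and two behaviours are assumptions rather than documented facts: the pattern is matched against the whole cell value, and a pattern without wildcards or regular-expression syntax is compared literally. Python's `re` module is used here in place of `java.util.regex`; the two example expressions behave the same in both.

```python
import re


def wildcard_to_regex(pattern):
    """Translate '*' (any sequence) and '?' (any one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def string_bits(values, pattern, wildcards=False, regex=False,
                case_sensitive=True, set_if_match=True):
    """Return one bit per string value, mirroring the dialog options."""
    if regex:
        expression = pattern
    elif wildcards:
        expression = wildcard_to_regex(pattern)
    else:
        expression = re.escape(pattern)  # assumption: literal comparison
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(expression, flags)
    bits = []
    for value in values:
        matched = compiled.fullmatch(value) is not None  # assumption: whole value
        bits.append(1 if matched == set_if_match else 0)
    return bits


row = ["foobar", "bar", "Foo"]
print(string_bits(row, "^foo.*", regex=True))
print(string_bits(row, "^foo.*", regex=True, case_sensitive=False))
print(string_bits(row, "f?o*", wildcards=True, set_if_match=False))
```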

- Threshold
- If the "numeric input" is checked, specify the global threshold. All values which are above or equal to this threshold will result in a 1 in the bit vector.
- Use percentage of the mean
- Check this option if a percentage of the mean of each column should serve as the threshold at or above which the bits are set.
- Percentage
- Specify the percentage of the column mean that a value must reach for its bit to be set.
- Multiple column selection panel
- Select the numeric columns to convert to a bit vector

- Kind of string representation
- Select one of the three valid input formats: HEX (hexadecimal), ID (bit positions) or BIT (binary strings). See description above.
- Single column to be parsed
- The string column to parse

- Single column to be parsed
- The collection column to parse

- Remove column(s) used for bit vector creation:
- If checked, the column(s) used to generate the bit vectors (the included columns if numeric input was used, or the selected string column) are removed. If unchecked, the generated bit vectors are appended to the input table.
- Output column
- Name of the output column.
- Fail on invalid input
- If selected, the node will fail during execution if a data cell could not be converted to a bit set. If unselected, the node will skip these invalid entries and insert a missing value instead.
- Bit vector type
- The dense vector type is the default. It stores each vector position as a single bit, so every vector requires the same amount of space, i.e. the vector length in bits. The sparse vector type stores only the indices of the set bits, so the space required to store a bit vector depends on the number of set bits. Each set bit requires between 32 and 64 bits, depending on the operating system. The sparse option should therefore only be selected if the majority of the bit vectors contain only a few set bits, e.g. less than 10%.
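The storage trade-off between the two vector types can be estimated with simple arithmetic. The sketch below uses only the figures given above (one bit per position for dense vectors, 32 to 64 bits per set bit for sparse vectors) and ignores any per-object overhead; the function names and the sample vector length are hypothetical.

```python
def dense_size_bits(vector_length):
    """Dense storage: one bit per vector position."""
    return vector_length


def sparse_size_bits(set_bit_count, bits_per_index=64):
    """Sparse storage: one stored index (32 to 64 bits) per set bit."""
    return set_bit_count * bits_per_index


# Compare both estimates for a hypothetical vector of length 1024.
length = 1024
for set_bits in (8, 16, 64):
    print(
        set_bits,
        dense_size_bits(length),
        sparse_size_bits(set_bits, bits_per_index=32),
        sparse_size_bits(set_bits, bits_per_index=64),
    )
```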

- Statistics View
- Provides information about the generation of the bit vectors from the data. In particular this is the number of processed rows, the total number of generated zeros and ones and the resulting ratio of 1s to 0s.
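The figures shown in the view can be reproduced for a small example as follows. This is a minimal sketch that assumes the bit vectors are available as lists of 0s and 1s; the function name and the returned dictionary keys are hypothetical.

```python
def bit_vector_statistics(vectors):
    """Count rows, zeros, and ones, and compute the ratio of 1s to 0s."""
    ones = sum(sum(vector) for vector in vectors)
    total = sum(len(vector) for vector in vectors)
    zeros = total - ones
    # The ratio is undefined when there are no zeros at all.
    ratio = ones / zeros if zeros else None
    return {"rows": len(vectors), "zeros": zeros, "ones": ones, "ratio": ratio}


stats = bit_vector_statistics([[1, 0, 0, 1], [0, 0, 0, 0], [1, 1, 0, 0]])
print(stats)
```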

- 01_Clustering (KNIME Hub)
- 02_Working_with_Collection_Supported_Nodes (KNIME Hub)
- 03_Imbalanced_Sentiment_Analysis_with_XGBoost (KNIME Hub)
- 07_RDKit_with_Java_Snippet_Example (KNIME Hub)
- 07-association-analysis-examples (KNIME Hub)

To use this node in KNIME, install the KNIME Base nodes extension (v4.7).
