Apriori: The Apriori algorithm (Agrawal et al. 1993) carries out a breadth-first search on the subset lattice and determines the support of item sets by subset tests. This implementation is fairly fast and uses a prefix tree to organize the counters for the item sets. The census data set may be used to test the program.
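The level-wise search can be illustrated with a minimal Python sketch. It is not the node's implementation: it keeps the counters in plain dictionaries rather than a prefix tree, and the names `apriori`, `transactions`, and `min_support` are chosen here purely for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {item set: support} for all item sets with support >= min_support.

    Illustrative sketch only: counters live in dicts, not in a prefix tree.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        # Candidate generation: join frequent (k-1)-sets and keep a candidate
        # only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting by subset tests against every transaction.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(level)
        k += 1

    return frequent


example = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(example, min_support=3)
```

With this toy database and a minimum support of 3, all single items and all pairs are frequent, while the triple occurs in only two transactions and is therefore dropped at the third level.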
FPgrowth: The FP-growth algorithm (frequent pattern growth, Han et al. 2000) represents the transaction database as a prefix tree that is enhanced with pointers organizing the nodes into lists, one list per item. The search is carried out by projecting the prefix tree, working recursively on the result, and pruning the original tree. Since version 1.2 this implementation also contains the alpha-pruning of the FP-Bonsai technique.
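The data structure described here can be sketched in Python as follows. This covers only the construction of the prefix tree and its per-item node lists, plus the extraction of the prefix paths that a projection would start from; the recursive search and the pruning steps are omitted. The class `FPNode` and the function names are hypothetical, and the per-item lists are plain Python lists rather than linked pointers.

```python
from collections import Counter, defaultdict


class FPNode:
    """One node of the prefix tree: an item, a counter, and a parent link."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build the prefix tree and the per-item node lists (illustrative sketch)."""
    freq = Counter(item for t in transactions for item in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = FPNode(None, None)
    item_lists = defaultdict(list)  # item -> all tree nodes carrying that item
    for t in transactions:
        # Sort by descending frequency (ties broken by item) so that
        # transactions share prefixes as often as possible.
        items = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                item_lists[item].append(child)
            child.count += 1
            node = child
    return root, item_lists


def prefix_paths(item_lists, item):
    """Paths from the root to each node of `item`, with that node's count."""
    paths = []
    for node in item_lists[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        paths.append((path[::-1], node.count))
    return paths


example = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
root, item_lists = build_fp_tree(example, min_support=2)
paths_for_c = prefix_paths(item_lists, "c")
```

Following the list for one item and walking up the parent links yields exactly the part of the database that contains that item, which is what the projection step works on.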
RElim: The RElim algorithm (recursive elimination) is inspired by the FP-growth algorithm, but does its work without prefix trees or any other complicated data structures. The main strength of this algorithm is not its speed (although it is not slow and even outperforms Apriori and Eclat on some data sets), but the simplicity of its structure. Essentially all the work is done in one recursive function of fairly few lines of code.
SaM: The SaM algorithm (split and merge) combines a depth-first traversal of the subset lattice with a horizontal transaction representation. The main strength of this algorithm is not its speed (although it is not slow and even outperforms Apriori and Eclat on some data sets), but the simplicity of its structure. Essentially all the work is done in one recursive function of fairly few lines of code. In addition, it uses a simple array as its only data structure.
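A rough Python sketch of the split and merge idea is given below. It is a simplification under stated assumptions, not the node's code: the database is a sorted list of (item tuple, weight) pairs, items are ordered by their natural sort order rather than by frequency, and infrequent items are not removed up front. The function names `prepare`, `merge`, and `sam` are chosen for illustration.

```python
from collections import Counter


def prepare(transactions):
    """Turn transactions into a sorted list of (item tuple, weight) pairs."""
    counts = Counter(tuple(sorted(set(t))) for t in transactions)
    counts.pop((), None)  # empty transactions carry no information
    return sorted(counts.items())


def merge(a, b):
    """Merge two sorted lists, adding the weights of equal transactions."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            out.append(a[i])
            i += 1
        elif a[i][0] > b[j][0]:
            out.append(b[j])
            j += 1
        else:
            out.append((a[i][0], a[i][1] + b[j][1]))
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def sam(db, min_support, prefix=(), found=None):
    """Depth-first search; `db` is sorted and contains no empty transactions."""
    if found is None:
        found = {}
    while db:
        item = db[0][0][0]  # leading item of the first transaction
        # Split: strip the leading item from all transactions starting with it.
        split, support, i = [], 0, 0
        while i < len(db) and db[i][0][0] == item:
            items, weight = db[i]
            support += weight
            if len(items) > 1:
                split.append((items[1:], weight))
            i += 1
        # Merge: the stripped transactions rejoin the rest of the database.
        db = merge(split, db[i:])
        if support >= min_support:
            found[prefix + (item,)] = support
            sam(split, min_support, prefix + (item,), found)
    return found


example = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = sam(prepare(example), min_support=3)
```

The split step yields the conditional database for the current item, and the merge step yields the database with that item eliminated, so both branches of the depth-first search work on the same kind of sorted array.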
JIM: Finds Jaccard item sets with an extension of the Eclat algorithm. In analogy to frequent item set mining, where one tries to find item sets whose support exceeds a user-specified threshold (minimum support) in a database of transactions, a Jaccard item set is an item set for which the (generalized) Jaccard index of its item covers exceeds a user-specified threshold. This measure yields a much better assessment of the association strength of the items than simple support. Since the (generalized) Jaccard index is, like the support, anti-monotone, the same basic approach can be used for the search, provided it is extended to compute the denominator of the Jaccard index, that is, the number of transactions that contain at least one of the items.
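As a small illustration, the generalized Jaccard index of an item set can be computed as the size of the intersection of the item covers divided by the size of their union, where the cover of an item is the set of transactions containing it. The Python sketch below uses hypothetical helper names (`item_covers`, `jaccard_index`) and computes only the measure itself, not the Eclat-based search.

```python
def item_covers(transactions):
    """Map each item to its cover: the set of ids of transactions containing it."""
    covers = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            covers.setdefault(item, set()).add(tid)
    return covers


def jaccard_index(item_set, covers):
    """Generalized Jaccard index: |intersection of covers| / |union of covers|."""
    sets = [covers.get(item, set()) for item in item_set]
    union = set().union(*sets)
    if not union:
        return 0.0  # no transaction contains any of the items
    intersection = set.intersection(*sets)
    return len(intersection) / len(union)


example = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
covers = item_covers(example)
j_ab = jaccard_index({"a", "b"}, covers)
j_abc = jaccard_index({"a", "b", "c"}, covers)
```

In this toy database the pair {a, b} is contained in three of the five transactions that mention a or b, and adding c can only shrink the intersection and grow the union, which is the anti-monotone behaviour the search relies on.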
Dice: Finds Dice item sets with an extension of the Eclat algorithm. In analogy to frequent item set mining, where one tries to find item sets whose support exceeds a user-specified threshold (minimum support) in a database of transactions, a Dice item set is an item set for which the Dice index of its item covers exceeds a user-specified threshold.
Tanimoto: Finds Tanimoto item sets with an extension of the Eclat algorithm. In analogy to frequent item set mining, where one tries to find item sets whose support exceeds a user-specified threshold (minimum support) in a database of transactions, a Tanimoto item set is an item set for which the Tanimoto index of its item covers exceeds a user-specified threshold.
To use this node in KNIME, install the extension KNIME Itemset Mining from the below update site following our NodePit Product and Node Installation Guide: