
Spark Frequent Item Sets

KNIME Extension for Apache Spark core infrastructure version 4.3.1.v202101261633 by KNIME AG, Zurich, Switzerland

This node uses Spark MLlib to compute frequent item sets. See the Spark Association Rule Learner node to generate frequent item sets and association rules in one step.

Frequent item sets are computed using the FP-growth implementation provided by Spark MLlib, using input data with a collection column, where each cell holds the items of a transaction. Rows with missing values in the selected item column are ignored. FP-growth uses a suffix-tree (FP-tree) structure to encode transactions without generating candidate sets explicitly and then extracts the frequent item sets from this FP-tree. This approach avoids the usually expensive generation of explicit candidate sets used in Apriori-like algorithms designed for the same purpose. More information about the FP-growth algorithm can be found in Han et al., Mining frequent patterns without candidate generation. Spark implements Parallel FP-growth (PFP), as described in Li et al., PFP: Parallel FP-Growth for Query Recommendation.
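As an illustration of what the node computes, here is a minimal sketch that calls Spark MLlib's FPGrowth directly (outside of KNIME). The SparkSession variable spark, the sample data and the column names are assumptions made for this example:

```scala
import org.apache.spark.ml.fpm.FPGrowth

// Assumes an existing SparkSession named `spark`.
import spark.implicits._

// Each row holds one transaction as an array (collection) column named "items".
val transactions = Seq(
  Array("bread", "butter", "milk"),
  Array("bread", "butter"),
  Array("beer", "bread"),
  Array("butter", "milk"),
  Array("bread", "butter", "beer")
).toDF("items")

val fpGrowth = new FPGrowth()
  .setItemsCol("items")   // the collection column holding the transactions
  .setMinSupport(0.3)     // corresponds to the node's "Minimum Support" option

val model = fpGrowth.fit(transactions)

// One row per frequent item set, with its absolute frequency in column "freq".
model.freqItemsets.show(false)
```

The node's output DataFrame carries essentially the same information as the freqItemsets result at the end of the sketch.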

Transactions/item sets are represented as collection columns. The Spark GroupBy or Spark SQL nodes are recommended for creating collection columns in Spark.
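For example, a collection column can be built with a grouping and collect-set aggregation, similar in spirit to what the Spark GroupBy node produces. A minimal sketch in plain Spark code (table and column names are illustrative; a SparkSession named spark is assumed):

```scala
import org.apache.spark.sql.functions.collect_set

// Assumes an existing SparkSession named `spark`.
import spark.implicits._

// One row per purchased item; grouping by transaction yields one array per transaction.
val lineItems = Seq(
  (1, "bread"), (1, "butter"), (1, "milk"),
  (2, "bread"), (2, "butter"),
  (3, "beer"),  (3, "bread")
).toDF("transactionId", "item")

// "items" becomes the collection column expected by the frequent item set mining.
val transactions = lineItems
  .groupBy("transactionId")
  .agg(collect_set("item").as("items"))
```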

See Association rule learning (Wikipedia) for general information.

This node requires at least Apache Spark 2.0.

Options

Item Column
Collection column, where each cell holds the items of a transaction.
Minimum Support
The minimum support for an item set to be identified as frequent. For example, if an item set appears in 3 out of 5 transactions, it has a support of 3/5 = 0.6 (default: 0.3). A sketch of how this relative threshold translates into an absolute transaction count follows after this list.
Number of partitions
Optional: Number of partitions used by the Parallel FP-growth algorithm to distribute the work (default: same as input data).
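The support threshold is relative, so internally it has to be turned into an absolute occurrence count. A minimal sketch of that conversion, assuming the round-up behaviour of Spark's FP-growth implementation (variable names are illustrative):

```scala
// Illustration only: converting a relative minimum support into an absolute count,
// assuming Spark's FP-growth rounds up when deriving the count threshold.
val numTransactions = 5L
val minSupport = 0.3
val minCount = math.ceil(minSupport * numTransactions).toLong
// minCount == 2: an item set must occur in at least 2 of the 5 transactions
// to reach a support of at least 0.3.
```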

Input Ports

Spark DataFrame with a collection column, where each cell holds the items of a transaction

Output Ports

Spark DataFrame with frequent item sets


Installation

To use this node in KNIME, install KNIME Extension for Apache Spark from the following update site:

KNIME 4.3

A zipped version of the software site can be downloaded here.

