Cache

The Cache node materializes and caches the input table in a data processing workflow. This node is useful after a sequence of preprocessing steps, especially when these steps involve column transformations, such as removing, manipulating, or adding new columns.

In workflows involving multiple transformation nodes, only the modified data (e.g., added columns) is stored, while the unmodified columns reference the input data. Although this approach optimizes the execution and data caching for individual nodes, it can result in tables that are composites of multiple nested tables. Consequently, iterating over such composite tables may be less efficient compared to iterating over a single, unified table.

The Cache node addresses this by materializing the input data, creating a self-contained table that consolidates all columns. Additionally, the Cache node is useful in scenarios where portions of a workflow are executed in streaming mode, as it allows data to be staged at specific points. This staging facilitates inspection and debugging, providing a snapshot of the data at the desired point in the workflow.
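
The difference between a composite table and a materialized one can be illustrated with a small conceptual sketch. It does not use the KNIME API. The classes `BaseTable` and `AppendedTable` and the function `materialize` are hypothetical names invented for this illustration, and the real table implementation is considerably more involved.

```python
class BaseTable:
    """A self-contained table: every column is stored directly."""

    def __init__(self, columns):
        self.columns = dict(columns)  # column name -> list of values

    def column_names(self):
        return list(self.columns)

    def num_rows(self):
        return len(next(iter(self.columns.values())))

    def get(self, name, row):
        return self.columns[name][row]

    def depth(self):
        return 1


class AppendedTable:
    """Stores only the added columns and references its parent for the rest."""

    def __init__(self, parent, added):
        self.parent = parent
        self.added = dict(added)

    def column_names(self):
        return self.parent.column_names() + list(self.added)

    def num_rows(self):
        return self.parent.num_rows()

    def get(self, name, row):
        if name in self.added:
            return self.added[name][row]
        # Unmodified columns are looked up through the chain of parents.
        return self.parent.get(name, row)

    def depth(self):
        return 1 + self.parent.depth()


def materialize(table):
    """Copy every column into one flat table, as a caching step would."""
    n = table.num_rows()
    return BaseTable(
        {name: [table.get(name, r) for r in range(n)]
         for name in table.column_names()}
    )


source = BaseTable({"a": [1, 2, 3]})
step1 = AppendedTable(source, {"b": [10, 20, 30]})
step2 = AppendedTable(step1, {"c": [100, 200, 300]})
cached = materialize(step2)
```

In this sketch, reading column "a" from `step2` passes through two levels of delegation, while the same read on `cached` is a direct lookup. Both tables hold the same values.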

Options

Column domains

Specify whether to retain the domains of all input columns as the output domains as-is or to compute them from the output rows.

Depending on the use case, one or the other setting may be preferable:

  • Retaining input domains can be useful if the axis limits of a view should be derived from domain bounds and those bounds should stay stable even when the displayed data is filtered.
  • Computing domains can be useful when a selection widget consumes the output and should only display actually present options to users.

If column domains are irrelevant for a particular use case, the "Retain" option should be used since it does not incur computation costs.
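
The effect of the two settings can be sketched with plain Python data. This is only an illustration under simplifying assumptions: a numeric domain is modeled as a (min, max) pair and a nominal domain as a sorted list of possible values. The helper names `numeric_domain`, `nominal_domain`, and `table_domains` are hypothetical and not part of any KNIME API.

```python
def numeric_domain(values):
    """Lower and upper bound of a numeric column."""
    return (min(values), max(values))


def nominal_domain(values):
    """Sorted list of the distinct values of a nominal column."""
    return sorted(set(values))


def table_domains(rows):
    """Compute the domains of a two-column table of (color, value) rows."""
    return {
        "color": nominal_domain([color for color, _ in rows]),
        "value": numeric_domain([value for _, value in rows]),
    }


input_rows = [("red", 1.0), ("green", 5.0), ("blue", 12.0)]
input_domains = table_domains(input_rows)

# An upstream filter removed some rows before the table reaches the cache step.
filtered_rows = [row for row in input_rows if row[1] < 10]

# "Retain": reuse the input domains without scanning the rows.
retained = input_domains

# "Compute": scan the output rows, which costs one pass over the data.
computed = table_domains(filtered_rows)
```

Here `retained` still lists "blue" and the bound 12.0, which keeps axis limits stable, while `computed` only reflects the rows that are actually present, which is what a selection widget would typically want.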

Copy Implementation

Select the implementation to use when copying the table. In most cases, leave it as 'Automatic' unless you are running performance tests or similar. For backward compatibility reasons, the value for existing nodes is kept as 'Row-based'.
  • Automatic: Determines the algorithm automatically based on the table backend used by the current workflow.
  • Columnar (Full Row): Uses the Columnar Table API, best used when the workflow configuration is set to use the Columnar Table Backend.
  • Columnar (Cell By Cell): Uses the new Columnar Table API and, in addition, copies each value in a row individually.
  • Row-based (Full Row): Uses the new Row Table Backend with the old row-based API, which is known to be inefficient when the Columnar Table Backend is set on a workflow.
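
The distinction between copying a full row and copying cell by cell can be pictured with the following conceptual sketch. It is not based on the KNIME table APIs, and the function names `copy_full_row` and `copy_cell_by_cell` are hypothetical. It only shows that both strategies yield the same table and differ in the granularity of each copy operation.

```python
def copy_full_row(rows):
    """Copy each row as one unit."""
    return [tuple(row) for row in rows]


def copy_cell_by_cell(rows):
    """Copy each row by reading and writing every value individually."""
    copied = []
    for row in rows:
        new_row = []
        for value in row:
            new_row.append(value)  # one operation per cell
        copied.append(tuple(new_row))
    return copied


table = [("red", 1.0), ("green", 5.0), ("blue", 12.0)]
by_row = copy_full_row(table)
by_cell = copy_cell_by_cell(table)
```

Which variant performs better depends on the table backend in use, which is why the 'Automatic' setting is the usual choice.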

Input Ports

Input table to cache.

Output Ports

The same as the input table, but cached.

Views

This node has no views.
