AP-22457_ParallelChunk_LoopPerformance

AP-22457: Parallel Chunk Loop (End) is unnecessarily slow due to synchronous data write (columnar backend)

The slowdown was caused by an unnecessary synchronization when writing the output in the Parallel Chunk End node while the workflow was set to use the Columnar Backend.

Performance comparison for 50M rows (created with a data generator), using a parallel chunk loop whose body contains a Row Filter that removes about 2/3 of the rows:

Runtime comparison (on my system):
- Parallel Chunk, 5.2.3: 202 s
- Parallel Chunk, 5.3 Nightly: 65 s
- Plain Row Filter (no parallel chunk loop, for reference only): 35 s
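The runtimes are easier to compare as ratios. The short Python snippet below only does arithmetic on the three runtimes listed above. It is not part of the workflow, and the ratios hold only for the single system these timings were taken on.

```python
# Runtimes in seconds for 50M rows, as reported above (one system only).
runtimes = {
    "Parallel Chunk, 5.2.3": 202,
    "Parallel Chunk, 5.3 Nightly": 65,
    "Plain Row Filter": 35,
}

def ratio(slow: str, fast: str) -> float:
    """Return how many times longer the `slow` run took than the `fast` run."""
    return runtimes[slow] / runtimes[fast]

# Speedup of the fixed version over the old one.
speedup = ratio("Parallel Chunk, 5.2.3", "Parallel Chunk, 5.3 Nightly")
# Remaining cost of the parallel chunk loop relative to a plain Row Filter.
overhead_new = ratio("Parallel Chunk, 5.3 Nightly", "Plain Row Filter")
overhead_old = ratio("Parallel Chunk, 5.2.3", "Plain Row Filter")

print(f"5.3 Nightly vs 5.2.3 speedup: {speedup:.2f}x")
print(f"5.3 Nightly vs plain Row Filter: {overhead_new:.2f}x")
print(f"5.2.3 vs plain Row Filter: {overhead_old:.2f}x")
```

On these numbers the fix makes the loop roughly 3.1 times faster, while the parallel chunk loop still takes about 1.9 times as long as the plain Row Filter (down from about 5.8 times).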

Nodes

- Variable Expressions (6×)
- Component Input (4×)
- Component Output (4×)
- Row Filter (2×)
- Transpose (2×)

Extensions

- KNIME Base nodes
- KNIME Column Expressions (legacy)
- KNIME Parallel Chunk Loop Nodes
- KNIME Testing Framework UI
