
AP-22457_ParallelChunk_LoopPerformance

AP-22457: Parallel Chunk Loop (End) is unnecessarily slow due to synchronous data write (columnar backend)

The cause was an unnecessary synchronization step when the Parallel Chunk End node wrote its output. It occurred whenever the workflow was configured to use the "Columnar Backend".
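To illustrate in general terms why a synchronous write slows a chunked loop down, the sketch contrasts a caller that blocks on every chunk write with one that hands chunks to a single background writer thread. This is a hypothetical, generic example: `write_sync`, `write_async` and the `write` callback are invented names. It does not reflect how KNIME's Parallel Chunk End node or the Columnar Backend are actually implemented.

```python
import queue
import threading
from typing import Callable, Iterable, List

# Hypothetical illustration only, not KNIME's actual Parallel Chunk End code.


def write_sync(chunks: Iterable[List[int]], write: Callable[[List[int]], None]) -> None:
    """The caller blocks on every write before it can move on to the next chunk."""
    for chunk in chunks:
        write(chunk)


def write_async(chunks: Iterable[List[int]], write: Callable[[List[int]], None]) -> None:
    """The caller only enqueues chunks; one background thread performs the writes.

    A single consumer draining a FIFO queue keeps the output in chunk order.
    Error propagation from the writer thread is omitted for brevity.
    """
    q: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of the input

    def writer() -> None:
        while True:
            item = q.get()
            if item is done:
                return
            write(item)

    t = threading.Thread(target=writer)
    t.start()
    for chunk in chunks:
        q.put(chunk)  # returns immediately; the write happens on the writer thread
    q.put(done)
    t.join()  # wait until every queued chunk has been written


if __name__ == "__main__":
    chunks = [[1, 2], [3], [4, 5, 6]]
    out_sync: List[List[int]] = []
    out_async: List[List[int]] = []
    write_sync(chunks, out_sync.append)
    write_async(chunks, out_async.append)
    print(out_sync == out_async)  # True
```

Both variants produce the same output. With the asynchronous variant, producing the next chunk can overlap with writing the previous one instead of waiting for it.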

Performance comparison for 50M rows (from a data generator), using a Parallel Chunk loop that contains a Row Filter removing about 2/3 of the rows:

Runtime comparison (on my system):
- Parallel Chunk, 5.2.3: 202s
- Parallel Chunk, 5.3 Nightly: 65s
- Plain Row Filter, no Parallel Chunk loop (reference only): 35s
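The relative improvement can be read off the runtimes directly. The short calculation uses only the three numbers from the runtime comparison; the dictionary keys and the `speedup` helper are illustrative names, not part of the workflow.

```python
# Runtimes in seconds, copied from the runtime comparison (single machine, 50M rows).
runtimes = {
    "Parallel Chunk, 5.2.3": 202,
    "Parallel Chunk, 5.3 Nightly": 65,
    "Plain Row Filter": 35,
}


def speedup(before: float, after: float) -> float:
    """Ratio of old runtime to new runtime (a value > 1 means faster)."""
    return before / after


fix_gain = speedup(runtimes["Parallel Chunk, 5.2.3"], runtimes["Parallel Chunk, 5.3 Nightly"])
overhead = runtimes["Parallel Chunk, 5.3 Nightly"] / runtimes["Plain Row Filter"]

print(f"{fix_gain:.1f}x faster after the fix")          # 3.1x faster after the fix
print(f"{overhead:.1f}x the plain Row Filter runtime")  # 1.9x the plain Row Filter runtime
```

So the fixed loop runs about 3.1 times faster than in 5.2.3, while still taking roughly 1.9 times as long as the plain Row Filter on this setup.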
