01.Parallel Executions - Exercise

This workflow invokes a lexicon-based sentiment classifier to predict the sentiment of a large number of airline reviews. To speed up the predictions, this workflow surrounds the invoked workflow with the Parallel Chunk nodes, parallelizing its execution.

Error: Execute failed: 422 Execution context "xyz" not found (when executing on KNIME Hub)

  1. Open the remote workflow in KNIME Analytics Platform

    1. Connect to the Hub

    2. Navigate to the space where you have uploaded the workflow (note the cloud icon in the tab)

    3. Double click to open the remote workflow locally

  2. Open the Call Workflow Service node and make sure that a valid execution context is selected

    • If you cannot select the execution context, it means that you have opened the local workflow, not the remote one

  3. Save and overwrite the workflow, then try executing it again

TROUBLESHOOTING ERROR 422 (PULL)

Step 2. Parallelize the execution

Here we wrap the called workflow into a construction that parallelizes its execution.

  1. Drag the Parallel Chunk Start and Parallel Chunk End nodes from the node repository to this part of the workflow. Connect the output port of the CSV Reader node to the input port of the Parallel Chunk Start node.

  2. When configuring the Parallel Chunk Start node, select 'Use automatic chunk count'. When configuring the Parallel Chunk End node, select 'Add Chunk Index to Row ID'.

  3. Connect the output port of the Parallel Chunk Start node to the input port of the Call Workflow Service node. Connect the output port of the Call Workflow Service node to the input port of the Parallel Chunk End node.

  4. Click on the top right of the Parallel Chunk End node and drag the line that appears until it connects to the input port of the Timer Info node. Execute the workflow and check how long it takes to run (in ms). It should be significantly faster than the previous execution from Step 1.
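
The chunk-and-merge pattern built in Step 2 can be sketched outside KNIME as follows. This is only an illustration of the idea, not how the Parallel Chunk nodes are implemented: `process_chunk` is a hypothetical stand-in for the called workflow, deriving the chunk count from the CPU count is an assumption, and the `_chunk<N>` row ID suffix is an illustrative format rather than the one KNIME produces. Threads are used because waiting on a remote service is mostly I/O.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def split_into_chunks(rows, chunk_count):
    """Split rows into at most chunk_count contiguous, near-equal chunks."""
    size, extra = divmod(len(rows), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are few rows
            chunks.append(rows[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical stand-in for the called workflow: one label per row."""
    return [
        (row_id, "positive" if "good" in text else "negative")
        for row_id, text in chunk
    ]


def run_parallel(rows, chunk_count=None):
    """Process chunks concurrently and merge them back in input order."""
    # Assumption: default to one chunk per CPU core.
    chunk_count = chunk_count or os.cpu_count() or 1
    chunks = split_into_chunks(rows, chunk_count)
    with ThreadPoolExecutor(max_workers=len(chunks) or 1) as pool:
        # pool.map returns results in the same order as the chunks.
        results = list(pool.map(process_chunk, chunks))
    # Append the chunk index to each row ID (illustrative suffix format).
    return [
        (f"{row_id}_chunk{idx}", label)
        for idx, part in enumerate(results)
        for row_id, label in part
    ]


rows = [
    ("Row0", "good flight"),
    ("Row1", "bad food"),
    ("Row2", "good crew"),
    ("Row3", "late again"),
    ("Row4", "good seats"),
]
merged = run_parallel(rows, chunk_count=2)
```

With five rows and two chunks, the first chunk receives three rows and the second receives two; the merged output keeps the original row order.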


Part 5 - Best Practices

Exercise 01 Parallel Executions

Learning objective: In this exercise you'll learn how to parallelize an application to optimize its runtime. The application predicts the sentiment of a large number of airline reviews.


Workflow description: This workflow invokes a lexicon-based sentiment classifier to predict the sentiment of a large number of airline reviews. To speed up the predictions, this workflow surrounds the invoked workflow with the Parallel Chunk nodes, parallelizing its execution.
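
For readers unfamiliar with the term, a lexicon-based classifier scores text by looking words up in a list of known positive and negative terms. The sketch below is a minimal illustration under stated assumptions: the word lists and the `predict_sentiment` function are hypothetical and are not taken from the called workflow, which may score and tokenize quite differently.

```python
# Hypothetical mini-lexicon; a real lexicon would hold far more entries.
POSITIVE = {"great", "friendly", "comfortable", "smooth"}
NEGATIVE = {"delayed", "rude", "lost", "cancelled"}


def predict_sentiment(review: str) -> str:
    """Label a review by counting positive and negative lexicon hits."""
    # Lowercase each token and strip surrounding punctuation.
    words = [w.strip(".,!?;:").lower() for w in review.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(predict_sentiment("Great crew, smooth flight!"))
print(predict_sentiment("Flight delayed and bag lost."))
```

Because each review is scored independently of the others, this kind of prediction splits cleanly into chunks that can run in parallel.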


You'll find the instructions for the exercises in the yellow annotations.

Step 1. Invoke the sentiment predictor

  1. Drag the Call Workflow Service node from the node repository to this part of the workflow.

  2. When configuring the Call Workflow Service node, set the workflow relative path to '../Callee Applications/04.Callee Deploying Sentiment Predictor - Lexicon Based'. Select 'Adjust node ports' and click 'OK'. Connect the output port of the CSV Reader node to its input port.

  3. Click on the top right of the Call Workflow Service node and drag the red line that appears until it connects to the input port of the Timer Info node. You're connecting a flow variable from the former node to the flow variable input port of the latter. Execute the workflow and see how much time it takes to run it (in ms).
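
The baseline measurement in Step 1 amounts to timing a single, non-parallel call. A rough Python analogue is sketched below; `call_sentiment_service` is a hypothetical placeholder that merely sleeps to simulate per-row work, not a KNIME API, and the Timer Info node gathers its timings differently.

```python
import time


def call_sentiment_service(reviews):
    """Hypothetical stand-in for the called workflow; not a KNIME API."""
    time.sleep(0.001 * len(reviews))  # simulate per-row work
    return ["neutral" for _ in reviews]


def timed_call(func, rows):
    """Run func(rows) and return its result with the elapsed time in ms."""
    start = time.perf_counter()
    result = func(rows)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


reviews = [f"review {i}" for i in range(50)]
labels, elapsed_ms = timed_call(call_sentiment_service, reviews)
print(f"{len(labels)} rows in {elapsed_ms:.0f} ms")
```

Keeping this number makes it possible to compare it with the runtime of the parallelized version later.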

Input Ingestion

Here we read a large sample of tweets for sentiment prediction. Parallelization makes it faster.

Check how much time (ms) it takes without parallelization
Timer Info
Read large dataset
CSV Reader
