03 Removing Duplicates & Outliers

<p><strong>Removing Duplicates and Outliers</strong></p><p>This workflow demonstrates how to <strong>prepare raw data for analysis</strong> by removing recurring information, filtering out unwanted values, and calculating new metrics to sort our results effectively.</p>

URL: KNIME Self Paced Course https://www.knime.com/learning?pk_vid=4a8e4f3d0cc709d917781568819e6e19
URL: KNIME Cheat Sheet: Data Wrangling https://www.knime.com/files/data-wrangling-with-knime.pdf
URL: Just KNIME It! https://www.knime.com/just-knime-it?pk_vid=4a8e4f3d0cc709d917781569309e6e19
URL: KNIME TV - Youtube https://www.youtube.com/@KNIMETV

Clean data using an Expression Row Filter node

Step 1: Click on the "Expression Row Filter" node to open the configuration window.

Step 2: Enter the expression below to remove "outliers" or unwanted data.
$["Score"] <= 200

Step 3: Click "Apply and Execute". This keeps only the rows that meet your specified criteria.
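For readers who want to see the filter logic outside KNIME, here is a minimal Python sketch of what the expression does. The sample rows, the "Name" column, and the function name keep_valid_scores are hypothetical and only serve as illustration; the workflow itself reads its table from the Table Creator node.

```python
# Hypothetical sample rows; only the "Score" column name comes from the workflow.
rows = [
    {"Name": "A", "Score": 150},
    {"Name": "B", "Score": 950},
    {"Name": "C", "Score": 200},
]


def keep_valid_scores(rows, limit=200):
    """Keep rows whose Score is at most `limit`, mirroring $["Score"] <= 200."""
    return [row for row in rows if row["Score"] <= limit]


filtered = keep_valid_scores(rows)
print(filtered)
```

Note that a score of exactly 200 is kept, because the comparison is "less than or equal to".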

Sort results

Step 1: Click on the "Sorter" node to open the configuration window.

Step 2: Define one or multiple sorting criteria. In the column dropdown, select your newly created column ("Score_Normalized").

Step 3: Choose the Descending sort order and click "Apply and Execute".
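The sorting step can be sketched in Python as follows. This is an illustration only: the sample rows and the "Name" column are hypothetical, and the example assumes the "Score_Normalized" column has already been computed.

```python
# Hypothetical rows that already contain the "Score_Normalized" column.
rows = [
    {"Name": "A", "Score_Normalized": 1.5},
    {"Name": "C", "Score_Normalized": 2.0},
    {"Name": "D", "Score_Normalized": 0.4},
]

# reverse=True gives descending order, so the highest score comes first.
ranked = sorted(rows, key=lambda row: row["Score_Normalized"], reverse=True)

for row in ranked:
    print(row["Name"], row["Score_Normalized"])
```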

Your data is now cleaned, transformed, and ranked!

Workflow complete!

Keep the momentum going by exploring Just KNIME It! on the Hub to challenge yourself and see how these nodes can be integrated into more complex workflows and use cases.

Load raw input data
Table Creator
Remove duplicates
Duplicate Row Filter
Remove scores higher than 200
Expression Row Filter
Divide Score by 100
Expression
Sort by normalized score in descending order
Sorter
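
As a rough equivalent of the whole node chain, the sketch below runs the same steps in plain Python. It rests on several assumptions: the sample rows and the "Name" column are hypothetical, duplicates are treated as rows that are identical in every column with the first occurrence kept, and run_pipeline is an illustrative name rather than anything KNIME provides.

```python
def run_pipeline(rows, limit=200, divisor=100):
    """Deduplicate, filter, normalize, and sort a list of row dictionaries."""
    # Remove duplicates: keep the first occurrence of each identical row
    # (assumption: a duplicate matches in every column).
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Remove scores higher than the limit.
    kept = [row for row in unique if row["Score"] <= limit]

    # Divide Score by 100 and store the result in a new column.
    enriched = [
        {**row, "Score_Normalized": row["Score"] / divisor} for row in kept
    ]

    # Sort by normalized score in descending order.
    return sorted(enriched, key=lambda r: r["Score_Normalized"], reverse=True)


# Hypothetical raw input with one duplicate row and one score above 200.
raw = [
    {"Name": "A", "Score": 150},
    {"Name": "A", "Score": 150},
    {"Name": "B", "Score": 950},
    {"Name": "C", "Score": 200},
]

result = run_pipeline(raw)
for row in result:
    print(row)
```

With this sample input, the duplicate "A" row and the "B" row are dropped, and the remaining rows are ranked by their normalized score.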