
02 String Cleaning & Standardization

<p><strong>String Cleaning and Standardization</strong></p><p>This workflow demonstrates how to <strong>transform string data</strong> from raw, "dirty" values into a standardized, analysis-ready format, using a variety of nodes dedicated to string cleaning and string manipulation.</p>

URL: KNIME Self Paced Course https://www.knime.com/learning?pk_vid=4a8e4f3d0cc709d917781565429e6e19
URL: KNIME Cheat Sheet: Data Wrangling https://www.knime.com/files/data-wrangling-with-knime.pdf
URL: Just KNIME It! https://www.knime.com/just-knime-it?pk_vid=4a8e4f3d0cc709d917781565749e6e19
URL: KNIME TV - YouTube https://www.youtube.com/@KNIMETV
URL: KNIME: Cheat Sheet Building a Workflow for Beginners https://www.knime.com/cheat-sheets/building-knime-workflow-beginners?pk_vid=4a8e4f3d0cc709d91778166786f42aeb

Workflow complete!

Keep the momentum going by exploring Just KNIME It! on the KNIME Hub. Its challenges show how the nodes used here fit into more complex workflows and use cases.

Data cleaning and manipulation

Step 1: In the "String Cleaner" node, we remove trailing commas from the names and quotes from the country codes by adding both characters to "Other characters to remove".

Step 2: In the "Expression" node, we apply proper capitalization to the names. The following expression first strips surrounding whitespace from each name and then capitalizes it:
capitalize(strip($["Name"]))

Step 3: Click "Apply and Execute".
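To check the logic of the two cleaning steps outside KNIME, here is a minimal plain-Python sketch. It is not how the String Cleaner or Expression nodes are implemented. The helper names `remove_chars` and `proper_case` and the sample rows are hypothetical, and we assume that "Other characters to remove" deletes every occurrence of the listed characters. Python's `str.capitalize()` uppercases the first character and lowercases the rest, which may not match KNIME's `capitalize()` exactly (for example on multi-word names).

```python
def remove_chars(value: str, chars: str) -> str:
    """Delete every occurrence of the given characters (assumed String Cleaner behavior)."""
    return value.translate(str.maketrans("", "", chars))


def proper_case(value: str) -> str:
    """Approximate capitalize(strip(...)): trim whitespace, then capitalize the first letter."""
    return value.strip().capitalize()


# Hypothetical raw rows: (Name, Country code)
raw_rows = [("  aLICE,", '"us"'), ("bob ,", '"de"')]

cleaned = [
    # Commas are removed from names, quotes from country codes.
    (proper_case(remove_chars(name, ",")), remove_chars(code, '"'))
    for name, code in raw_rows
]
print(cleaned)  # [('Alice', 'us'), ('Bob', 'de')]
```

Removing the comma before stripping matters here: in `"bob ,"` the space sits in front of the comma, so it only becomes trailing whitespace (and gets trimmed) once the comma is gone.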

Standardization: Normalize values

Step 1: Use the second "Expression" node to convert the "Email" column to lowercase.

Step 2: In the "String Replacer" node, we replace lowercase country codes (e.g., "us") with standardized country names (e.g., "USA").

Step 3: Click "Apply and Execute".
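The two normalization steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the nodes' implementation: `normalize_email`, `standardize_country` and the sample rows are hypothetical, and the lookup replaces a code only when it is the whole cell value. The String Replacer node matches a pattern and, depending on its settings, may also replace substrings inside longer strings.

```python
# Mapping from lowercase code to standardized name, taken from the example above.
# Further pairs could be added; in KNIME each replacement is configured in the node.
COUNTRY_NAMES = {"us": "USA"}


def normalize_email(email: str) -> str:
    """Force the address to lowercase, as the second Expression node does."""
    return email.lower()


def standardize_country(code: str) -> str:
    """Replace a known code with its standardized name; leave other values unchanged."""
    return COUNTRY_NAMES.get(code, code)


rows = [("Alice.Smith@Example.COM", "us"), ("BOB@example.org", "de")]
standardized = [(normalize_email(e), standardize_country(c)) for e, c in rows]
print(standardized)
# [('alice.smith@example.com', 'USA'), ('bob@example.org', 'de')]
```

Leaving unmapped codes untouched, rather than blanking them, makes gaps in the mapping easy to spot in the output table.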

Presentation and view

Step 1: In the "String Format Manager" node, select the "Email" column so that every email address is shown as a mailto: hyperlink.

Step 2: Use the "Column Renamer" node to change column names, e.g., "Name" to "First Name" and "Email" to "Email Address".

Step 3: Click "Apply and Execute".
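For readers who want to see what these presentation steps amount to, here is a hypothetical plain-Python sketch. The HTML anchor only illustrates what a mailto: hyperlink looks like; it is an assumption for illustration and not how KNIME stores or renders the formatted column. `to_mailto_link` and `rename_columns` are hypothetical helpers; the renaming pairs come from the step above.

```python
from html import escape

# Renaming pairs from the step above; unlisted columns keep their names.
RENAMES = {"Name": "First Name", "Email": "Email Address"}


def to_mailto_link(email: str) -> str:
    """Wrap an address in an HTML mailto: anchor, escaping special characters."""
    safe = escape(email, quote=True)
    return f'<a href="mailto:{safe}">{safe}</a>'


def rename_columns(header: list[str]) -> list[str]:
    """Apply the renaming map to a list of column names."""
    return [RENAMES.get(col, col) for col in header]


header = ["Name", "Email", "Country"]
print(rename_columns(header))  # ['First Name', 'Email Address', 'Country']
print(to_mailto_link("alice@example.com"))
# <a href="mailto:alice@example.com">alice@example.com</a>
```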

Table Creator: Load raw data
String Cleaner: Remove trailing commas
Expression: Apply proper case formatting
Expression: Apply lowercase formatting to emails
String Replacer: Proper format for country names
String Format Manager: Convert email strings to hyperlinks
Column Renamer: Rename "Name" and "Email" columns
Table View: Visualize cleaned results
