02 Transform Data by Cleaning and Standardizing Strings

Transform Data: String Cleaning and Standardization

This workflow demonstrates how to transform raw, "dirty" string data into a standardized, analysis-ready format using nodes dedicated to string cleaning and string manipulation.

URL: KNIME Learning Center https://www.knime.com/learning
URL: KNIME Cheat Sheet: Building a KNIME workflow for beginners https://www.knime.com/cheat-sheets/building-knime-workflow-beginners
URL: KNIME Cheat Sheet: Data wrangling with KNIME Analytics Platform https://www.knime.com/files/data-wrangling-with-knime.pdf


Workflow complete!

Keep the momentum going by exploring Just KNIME It! on the Hub to challenge yourself and see how these nodes can be integrated into more complex workflows and use cases.

Data cleaning and manipulation

Step 1: In the "String Cleaner" node, we remove trailing commas from names and quotes from country codes by adding both characters to "Other characters to remove".

Step 2: In the "Expression" node, we apply proper capitalization to the names with the following expression:
capitalize(strip($["Name"]))

Step 3: Click "Apply and Execute".
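
For readers who want to see the same cleaning logic outside KNIME, here is a minimal Python sketch of the two steps above. It is not how the nodes are implemented: the function names and sample values are hypothetical, and Python's str.title() is only an approximation of the capitalize() function used in the Expression node, so results may differ for names with apostrophes or unusual casing.

```python
def clean_name(raw: str) -> str:
    """Strip whitespace and trailing commas, then capitalize each word."""
    # Remove outer whitespace, then trailing commas, then any space left behind.
    trimmed = raw.strip().rstrip(",").strip()
    # Assumption: title-casing stands in for the node's capitalize() function.
    return trimmed.title()


def clean_country_code(raw: str) -> str:
    """Remove single and double quote characters from a country code."""
    return raw.strip().replace('"', "").replace("'", "")


# Hypothetical sample values, not taken from the workflow's Table Creator.
sample_names = ["  alice,", "BOB ,", "carol"]
sample_codes = ['"us"', "'de'", "fr"]

cleaned_names = [clean_name(n) for n in sample_names]
cleaned_codes = [clean_country_code(c) for c in sample_codes]

print(cleaned_names)
print(cleaned_codes)
```

Stripping before and after removing the commas covers values such as "BOB ," where a space sits between the name and the comma.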

Standardization: Normalize values

Step 1: Use the second "Expression" node to force the "Email" column to lower case.

Step 2: In the "String Replacer" node, we replace lowercase country codes (e.g., "us") with standardized names (e.g., "USA").

Step 3: Click "Apply and Execute".
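
The normalization steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the nodes' actual behavior: only the "us" to "USA" pair comes from the workflow description, while the other mapping entries, the function names, and the sample email are hypothetical.

```python
# Only "us" -> "USA" is given in the workflow text; the rest are hypothetical.
COUNTRY_NAMES = {
    "us": "USA",
    "de": "Germany",  # hypothetical entry
    "fr": "France",   # hypothetical entry
}


def normalize_email(raw: str) -> str:
    """Force an email address to lower case."""
    return raw.strip().lower()


def standardize_country(code: str) -> str:
    """Replace a known lowercase code; leave unknown values unchanged."""
    return COUNTRY_NAMES.get(code, code)


print(normalize_email("Alice@Example.COM"))
print(standardize_country("us"))
print(standardize_country("xx"))
```

Returning the input unchanged for unknown codes keeps unexpected values visible, so they can be reviewed instead of being silently dropped.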

Presentation and view

Step 1: In the "String Format Manager" node, select the "Email" column so that every email address is shown as a mailto: hyperlink.

Step 2: Use the "Column Renamer" to change column names like "Name" to "First Name" and "Email" to "Email Address".

Step 3: Click "Apply and Execute".
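
As a rough Python analogue of the presentation steps above, the sketch below builds a mailto: link and renames the two columns. The helper names and the sample row are hypothetical. Note one assumed difference: a format manager node typically changes how a value is displayed, whereas this sketch rewrites the stored string itself.

```python
# Column renames taken from Step 2 of the workflow description.
RENAMES = {"Name": "First Name", "Email": "Email Address"}


def to_mailto(email: str) -> str:
    """Prefix an email address with the mailto: scheme."""
    return f"mailto:{email}"


def rename_columns(row: dict) -> dict:
    """Return a copy of the row with renamed keys; other keys are kept."""
    return {RENAMES.get(key, key): value for key, value in row.items()}


# Hypothetical sample row.
row = {"Name": "Alice", "Email": "alice@example.com", "Country": "USA"}
presented = rename_columns({**row, "Email": to_mailto(row["Email"])})

print(presented)
```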

Load raw data
Table Creator
Proper format for Country Names
String Replacer
Convert email strings to hyperlinks
String Format Manager
Apply lowercase formatting to emails
Expression
Remove trailing commas
String Cleaner
Rename "Name" and "Email" columns
Column Renamer
Visualize cleaned results
Table View
Apply proper case formatting
Expression
