
KNIME_project11

EDA

data cleaning

review

export

Explanations:

  1. Import: use the CSV Reader node to import the dataset to be cleaned.

  2. EDA: identify the variable types, detect errors in the data, understand the relationships between variables, and verify consistency with the business domain.

  3. Data cleaning: identify and handle missing values, define and convert data formats (Date, String), remove duplicate rows, and correct inconsistent values.

  4. Export: use the CSV Writer node to write out the cleaned data.
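The four steps above can be sketched outside KNIME as a small Python pipeline using only the standard library. This is an illustrative analogue, not how the CSV Reader or CSV Writer nodes are implemented; the function names `read_csv`, `write_csv` and `run_pipeline` are hypothetical, and the EDA/cleaning work is represented by plain functions passed in as `steps`.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]

def read_csv(path: Path) -> List[Row]:
    """Import: read each data row as a dict keyed by the header row."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def write_csv(rows: List[Row], path: Path) -> None:
    """Export: write the rows back out, using the first row's keys as the header."""
    with path.open("w", newline="", encoding="utf-8") as f:
        if not rows:
            return  # nothing to write; leaves an empty file
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

def run_pipeline(src: Path, dst: Path,
                 steps: List[Callable[[List[Row]], List[Row]]]) -> List[Row]:
    """Import, apply each cleaning step in order, then export the result."""
    rows = read_csv(src)
    for step in steps:
        rows = step(rows)
    write_csv(rows, dst)
    return rows
```

Each cleaning operation becomes one function from rows to rows, which mirrors chaining nodes left to right in the workflow.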

CSV Reader
Linear Correlation
visualize the distribution of numerical values and identify outliers based on quartiles
Box Plot
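Box plot outlier markers usually follow Tukey's rule: a value is flagged when it lies more than 1.5 × IQR below the first quartile or above the third quartile. A minimal sketch of that rule, assuming a plain list of at least two numbers; quartile conventions differ between tools, so KNIME's Box Plot may place the fences slightly differently from `statistics.quantiles(..., method="inclusive")` used here.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # n=4 yields the three cut points Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

For example, `iqr_outliers([10, 12, 12, 13, 12, 11, 14, 13, 15, 100])` returns `[100]`: the quartiles are 12 and 13.75, so the upper fence is 16.375.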
Handle the missing values
Missing Value
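One common strategy for this step, sketched in Python: replace missing numbers with the column mean and missing text with the most frequent value. The strategy and the helper `fill_missing` are assumptions for illustration; the Missing Value node lets you choose a strategy per column type, and which one fits depends on the dataset. The sketch treats `None` or blank strings as missing and assumes each listed column has at least one non-missing value.

```python
from collections import Counter
from statistics import mean

def is_missing(value):
    """Treat None and empty/whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""

def fill_missing(rows, numeric_cols, text_cols):
    """Return copies of the rows with missing values imputed (input is not modified)."""
    filled = [dict(r) for r in rows]
    for col in numeric_cols:
        present = [float(r[col]) for r in rows if not is_missing(r[col])]
        fill = mean(present)  # column mean of the observed values
        for r in filled:
            r[col] = fill if is_missing(r[col]) else float(r[col])
    for col in text_cols:
        present = [r[col] for r in rows if not is_missing(r[col])]
        fill = Counter(present).most_common(1)[0][0]  # most frequent value
        for r in filled:
            if is_missing(r[col]):
                r[col] = fill
    return filled
```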
visualize correlations for an overview of several relationships at once
Heatmap
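The Linear Correlation node, the heatmap and the scatter plot all revolve around pairwise correlation. For two numeric columns the usual measure is Pearson's r; the sketch below computes it from scratch and builds the matrix of all column pairs, which is the grid a correlation heatmap colours. It assumes equal-length numeric columns with no missing values and no constant column (a constant column would divide by zero); `correlation_matrix` is a hypothetical helper name.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Map every (column, column) pair to its Pearson r."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship, though a scatter plot can still reveal a non-linear one.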
review the descriptive statistics after cleaning to confirm that values are consistent
Table View
correct obvious data errors, such as a mismatch between model and brand (model i4, brand Toyota)
Rule Engine
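The brand/model fix amounts to looking up the brand each model belongs to and overriding the brand when the two disagree. In the Rule Engine this would be one rule per known model plus a default rule that keeps the original value; the Python sketch below does the same with a dictionary. `MODEL_TO_BRAND` is a hypothetical reference table with only two example entries and would have to be built from the real dataset.

```python
# Hypothetical reference table: model -> the brand that makes it.
MODEL_TO_BRAND = {"i4": "BMW", "Yaris": "Toyota"}

def fix_brand(rows, mapping=MODEL_TO_BRAND):
    """Return copies of the rows with the brand corrected where it contradicts the model."""
    fixed = []
    for row in rows:
        row = dict(row)
        expected = mapping.get(row["model"])
        # Known model with the wrong brand: overwrite; otherwise keep the original brand.
        if expected is not None and row["brand"] != expected:
            row["brand"] = expected
        fixed.append(row)
    return fixed
```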
visualize correlation
Scatter Plot
standardise date format
String to Date&Time
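Standardising dates means parsing every input format that occurs in the column and emitting one canonical form. A sketch that tries a list of formats and returns an ISO 8601 date string (`YYYY-MM-DD`), or `None` when nothing matches so the value can be handled as missing. The formats in `INPUT_FORMATS` are assumptions and should come from what the EDA actually found; note that day/month versus month/day order cannot be inferred from a value such as `03/04/2021` alone.

```python
from datetime import datetime

# Hypothetical input formats, tried in order; the first one that parses wins.
INPUT_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y")

def to_iso_date(text, formats=INPUT_FORMATS):
    """Parse a date string in any of the given formats and return it as YYYY-MM-DD."""
    text = text.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    return None  # unparseable: treat as a missing value
```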
browse the dataset to identify any obvious data quality issues (missing values, inconsistencies, etc.)
Table View
Statistics View
visualize distribution
Histogram
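A histogram counts how many values fall into each of a fixed number of equal-width bins. A minimal sketch of equal-width binning, assuming a non-empty list of numbers; the default of 5 bins is arbitrary, and `histogram` is a hypothetical helper, not the node's API.

```python
def histogram(values, bins=5):
    """Return (bin edges, counts) for equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts
```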
export dataset
CSV Writer
descriptive statistics for data exploration
Statistics View
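The descriptive statistics reviewed before and after cleaning (count, missing values, minimum, maximum, mean, median, standard deviation) can be reproduced for one numeric column as below. This is a sketch, not the exact set of measures the Statistics View reports; missing values are assumed to be encoded as `None`, at least one value is assumed to be present, and the standard deviation is the sample version.

```python
from statistics import mean, median, stdev

def describe(values):
    """Summary statistics for one numeric column, ignoring None entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
        "std": stdev(present) if len(present) > 1 else 0.0,  # sample standard deviation
    }
```

Comparing the output of this summary before and after cleaning is a quick way to spot values that changed unexpectedly.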
clean and standardise the text
String Cleaner
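Text cleaning typically means normalising Unicode, trimming and collapsing whitespace, and optionally unifying case, so that `'Toyota '` and `'Toyota'` stop counting as different categories. A hedged Python sketch of those operations; which options the String Cleaner node is configured with in this workflow is not stated, so the `case` parameter is an assumption.

```python
import re
import unicodedata

def clean_text(value, case=None):
    """Normalise Unicode, trim and collapse whitespace, optionally change case."""
    # NFKC folds compatibility characters, e.g. a non-breaking space becomes a plain space.
    text = unicodedata.normalize("NFKC", value)
    # Collapse any run of whitespace to a single space and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    if case == "lower":
        text = text.lower()
    elif case == "upper":
        text = text.upper()
    return text
```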
check for duplicate rows in order to remove or correct them
Duplicate Row Filter
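Duplicate removal keeps the first occurrence of each row and drops later copies. The sketch compares whole rows by default, or only the columns listed in `key_cols`, a hypothetical parameter standing in for the choice of which columns define a duplicate. It assumes all rows share the same keys, as rows read from one CSV file do.

```python
def drop_duplicates(rows, key_cols=None):
    """Keep the first row for each distinct key; key = all columns unless key_cols is given."""
    seen = set()
    kept = []
    for row in rows:
        cols = key_cols if key_cols is not None else sorted(row)
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```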
