final_exam

Exploratory Data Analysis (EDA):

  • Data Inventory: Successfully ingested the raw dataset, which contains 10000 rows and 7 columns.

  • Visual Verification: Initial profiling in Table View revealed inconsistencies as well as spelling and formatting errors.

  • Missing Values: Statistics View confirmed that the dataset contains missing values, so imputation is required.

  • Outliers: The Box Plot shows no significant outliers in price per unit or quantity.
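The missing-value and outlier checks above can also be approximated outside KNIME. The following is a minimal Python sketch, assuming rows are read as dictionaries (as csv.DictReader produces), that blank strings or None count as missing, and that outliers follow the common 1.5 × IQR box-plot whisker rule. The column names are hypothetical, and KNIME's Statistics View and Box Plot nodes may use different conventions.

```python
import statistics

def count_missing(rows):
    """Count missing cells per column; None or blank strings count as missing (assumption)."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts.setdefault(col, 0)
            if value is None or str(value).strip() == "":
                counts[col] += 1
    return counts

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample rows; the column names are assumptions.
rows = [
    {"Quantity": "3", "Price Per Unit": "10.0"},
    {"Quantity": "", "Price Per Unit": "12.5"},
    {"Quantity": "5", "Price Per Unit": None},
]
print(count_missing(rows))  # {'Quantity': 1, 'Price Per Unit': 1}
print(iqr_outliers([10, 12, 11, 13, 12, 95]))  # [95]
```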

Data Cleaning:

  • Integrity Enforcement: Applied the Duplicate Row Filter to remove redundant records and ensure that every record is unique.

  • Text Normalization:

    • Used String Cleaner to strip unnecessary whitespace.

    • Used String Manipulation (Multi Column) to convert categorical values to lowercase, keeping text-based fields consistent.

  • Temporal Formatting: Converted the Transaction Date column from string format to a standardized Date&Time format.

  • Missing Values: Applied the Missing Value node, using mean imputation for numeric columns and a fixed value ("unknown") for string columns.

  • New Column: Added a column named total price, computed with Math Formula as quantity * price per unit.

  • Correlation Verification: Validated the cleaned dataset with the Linear Correlation and Heatmap nodes, confirming strong relationships between quantity, price per unit and total price.
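For readers who want to reproduce the cleaning steps above in code, here is a minimal Python sketch using only the standard library. It is not the KNIME workflow itself: the column names (Transaction Date, Category, Quantity, Price Per Unit), the date format %Y-%m-%d, and the choice to lowercase every non-numeric, non-date column are assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical schema: column names and date format are assumptions.
NUMERIC_COLS = ("Quantity", "Price Per Unit")
DATE_COL = "Transaction Date"
DATE_FORMAT = "%Y-%m-%d"
FILL_TEXT = "unknown"


def clean(rows):
    # Duplicate Row Filter: keep the first of any rows identical in every column.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # String Cleaner + lowercase: trim whitespace, lowercase text columns,
    # and treat cells that end up blank as missing (None).
    for row in unique:
        for col, value in row.items():
            if isinstance(value, str):
                value = value.strip()
                if col not in NUMERIC_COLS and col != DATE_COL:
                    value = value.lower()
                row[col] = value or None

    # String to Date&Time: parse the date column; missing dates stay None.
    for row in unique:
        if row.get(DATE_COL) is not None:
            row[DATE_COL] = datetime.strptime(row[DATE_COL], DATE_FORMAT)

    # String to Number, then mean imputation for numeric columns.
    for col in NUMERIC_COLS:
        for row in unique:
            if row.get(col) is not None:
                row[col] = float(row[col])
        present = [row[col] for row in unique if row.get(col) is not None]
        fill = mean(present) if present else None
        for row in unique:
            if row.get(col) is None:
                row[col] = fill

    # Fixed-value imputation ("unknown") for the remaining text columns.
    for row in unique:
        for col in row:
            if col not in NUMERIC_COLS and col != DATE_COL and row[col] is None:
                row[col] = FILL_TEXT

    # Math Formula: total price = quantity * price per unit.
    for row in unique:
        q, p = row.get("Quantity"), row.get("Price Per Unit")
        row["Total Price"] = q * p if q is not None and p is not None else None
    return unique


raw = [
    {"Transaction Date": "2023-01-05", "Category": " Food ", "Quantity": "2", "Price Per Unit": "3.5"},
    {"Transaction Date": "2023-01-05", "Category": " Food ", "Quantity": "2", "Price Per Unit": "3.5"},
    {"Transaction Date": "2023-01-06", "Category": "", "Quantity": "", "Price Per Unit": "4.0"},
]
for row in clean(raw):
    print(row)
```

The steps run in the same order as the list above. Deduplicating after text normalization instead would also catch rows that differ only in case or spacing.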

Workflow Nodes:

  • CSV Reader: reads the raw data.

  • Statistics View: identifies missing values; imputation is required.

  • CSV Writer: exports the clean data for further analysis.

  • Histogram: shows a normal distribution.

  • Scatter Plot: shows no linear correlation.

  • Table View: initial profiling verified that the dataset contains 10000 rows and 7 columns and revealed formatting errors and inconsistencies.

  • Missing Value: imputes missing values with "unknown" for strings and the mean for numbers.

  • Math Formula: adds a column named total price, computed as quantity * price per unit.

  • String to Number

  • String Manipulation (Multi Column): standardizes formatting (lowercase) to ensure categorical consistency.

  • Heatmap

  • Box Plot: shows no significant outliers.

  • String Cleaner: standardizes text by removing unnecessary spaces.

  • Duplicate Row Filter: removes duplicate rows.

  • Scatter Plot

  • Linear Correlation

  • String to Number: standardizes numbers.

  • Statistics View: checks for missing values.

  • String to Date&Time: standardizes the date and time format.
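The Linear Correlation node reports a coefficient for each pair of columns; for numeric pairs this is typically Pearson's r. The sketch below computes Pearson's r by hand, assuming that is the statistic of interest. The sample values are hypothetical. Because total price is computed from quantity and price per unit, a strong correlation between total price and those inputs is unsurprising.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)

# Hypothetical values for illustration only.
quantity = [1, 2, 3, 4, 5]
price = [10.0, 10.0, 12.0, 9.0, 11.0]
total = [q * p for q, p in zip(quantity, price)]
print(round(pearson(quantity, total), 3))
```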
