
Group_Project_AI

Data Analysis

Data Preparation

New DataSet Check

Continue Next Step Here

CSV Reader
Noticed some missing values; they are treated in the node below.
Statistics
Replaced missing values with the "mean" for numerical columns and with the "most frequent" value for nominal columns.
Missing Value
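For readers outside KNIME, the imputation rule described above can be sketched in plain Python. This is only an illustration under assumptions, not the node's implementation: the function name `impute`, the column names, and the sample rows are all made up for the example.

```python
from collections import Counter
from statistics import mean


def impute(rows, numeric_cols, nominal_cols):
    """Return a copy of rows where None is replaced by the column mean
    (numeric columns) or the most frequent value (nominal columns)."""
    fill = {}
    for col in numeric_cols:
        present = [r[col] for r in rows if r[col] is not None]
        fill[col] = mean(present)
    for col in nominal_cols:
        present = [r[col] for r in rows if r[col] is not None]
        fill[col] = Counter(present).most_common(1)[0][0]
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical sample data, not taken from the project data set.
rows = [
    {"Age": 20, "Gender": "Male"},
    {"Age": None, "Gender": "Female"},
    {"Age": 40, "Gender": None},
    {"Age": 30, "Gender": "Female"},
]
clean = impute(rows, numeric_cols=["Age"], nominal_cols=["Gender"])
```

The original rows are left untouched, so the imputed table can be compared with the raw one, much like the second Statistics node does in the workflow.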
Check that all columns are useful here. I think there is nothing to exclude, but I am leaving this node in place in case we change our minds.
Column Filter
Check the distribution: nothing to note, no particular outliers.
Histogram
Check on the new DataSet
Statistics
Here, you can tick the "remove included columns from output" option to remove the original columns that have been transformed into numerical values. For the first version, I chose to keep everything.
One to Many
Here, for the second version, I excluded the original columns to keep only numerical values.
One to Many
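The difference between the two One to Many versions can be illustrated with a small Python sketch. It assumes the output columns are named `<value>_<column>` (as in "Male_Gender"); the function `one_to_many` and its `remove_original` flag are hypothetical stand-ins for the node and its "remove included columns from output" option.

```python
def one_to_many(rows, column, remove_original=False):
    """Add one 0/1 indicator column per distinct value of `column`.

    With remove_original=True the source column is dropped, which
    mirrors the second version of the workflow.
    """
    values = sorted({r[column] for r in rows})
    out = []
    for r in rows:
        new = dict(r)
        for v in values:
            new[f"{v}_{column}"] = 1 if r[column] == v else 0
        if remove_original:
            del new[column]
        out.append(new)
    return out


# Hypothetical sample data.
rows = [{"Gender": "Male"}, {"Gender": "Female"}]
first_version = one_to_many(rows, "Gender")
second_version = one_to_many(rows, "Gender", remove_original=True)
```

Keeping the original column (first version) makes it easy to check the encoding by eye; dropping it (second version) leaves a purely numerical table for the later steps.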
Node to normalize all numerical values, so that each variable has the same weight in the rest of the analysis.
Normalizer
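As a rough illustration of what normalization does, here is a minimal Python sketch. It assumes min-max scaling to the range [0, 1]; the annotation does not say which mode the node is set to, so z-score scaling would be an equally plausible reading. The function name and the sample values are made up.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to 0.0 everywhere.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample column.
ages = [20, 30, 40]
scaled = min_max_normalize(ages)
```

After this step, a column measured in years and a 0/1 indicator column share the same range, so neither dominates a distance-based method purely because of its units.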
Node to extract the newly prepared Data Set (I used it to run checks with ChatGPT, making sure there
CSV Writer
Excluding "Female_Gender", as it is redundant with "Male_Gender". From now on, in "Male_Gender", "1 = Male" and "0 = Female".
Column Filter
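The redundancy argument can be checked with a short Python sketch: when only two categories exist, one indicator is always one minus the other, so dropping it loses no information. The sample rows below are hypothetical.

```python
# Hypothetical rows after the One to Many step.
rows = [
    {"Male_Gender": 1, "Female_Gender": 0, "Age": 0.0},
    {"Male_Gender": 0, "Female_Gender": 1, "Age": 0.5},
    {"Male_Gender": 0, "Female_Gender": 1, "Age": 1.0},
]

# Female_Gender is fully determined by Male_Gender.
redundant = all(r["Female_Gender"] == 1 - r["Male_Gender"] for r in rows)

# Equivalent of the Column Filter: keep everything except Female_Gender.
filtered = [{k: v for k, v in r.items() if k != "Female_Gender"} for r in rows]
```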
