
Projet_finale

Data import

Data cleaning and formatting

EDA

Read the CSV file from the local computer.
CSV Reader
Replace missing values in the text columns (genres, production_companies, cast, director) with ‘N/A’ so that every film stays included in category-based analyses.
Missing Value
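The Missing Value step can be sketched in plain Python as below, with each row held as a dict as if read from the CSV. This is a minimal illustration, not KNIME's implementation: the helper name fill_missing_text is hypothetical, and treating empty or whitespace-only strings as missing (in addition to None) is an assumption.

```python
# Columns named in the annotation above.
TEXT_COLUMNS = ["genres", "production_companies", "cast", "director"]


def fill_missing_text(rows, columns=TEXT_COLUMNS, placeholder="N/A"):
    """Return copies of rows where missing text values are replaced by placeholder.

    Assumption: a value counts as missing if it is None, absent, or a blank string.
    """
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col in columns:
            value = new_row.get(col)
            if value is None or str(value).strip() == "":
                new_row[col] = placeholder
        filled.append(new_row)
    return filled


films = [{"id": 1, "genres": "Drama|Comedy", "cast": None,
          "director": "A", "production_companies": ""}]
print(fill_missing_text(films)[0]["cast"])  # N/A
```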
Initial exploration of the dataset: examine the data types, the missing values and the number of distinct values per column in order to identify key variables and detect early anomalies.
Statistics View
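The kind of per-column summary this exploration relies on (types, missing count, distinct count) can be approximated as follows. It is a rough stand-in for what the Statistics View displays, not its actual output; the function name profile is hypothetical, and counting empty strings as missing is an assumption.

```python
def profile(rows):
    """Per column: Python types seen, number of missing values, number of distinct values."""
    columns = sorted({col for row in rows for col in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and "" both count as missing.
        present = [v for v in values if v is not None and v != ""]
        summary[col] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return summary


print(profile([{"id": 1, "title": "A"}, {"id": 2, "title": None}]))
```

A column whose "types" list contains more than one entry (for example both str and float) is a typical early anomaly worth inspecting.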
Remove duplicate films based on the id key
Duplicate Row Filter
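Removing duplicates on the id key amounts to keeping one row per id. The sketch below assumes the first occurrence is the one kept; the Duplicate Row Filter may be configured differently, and the helper name drop_duplicate_ids is hypothetical.

```python
def drop_duplicate_ids(rows, key="id"):
    """Keep only the first row seen for each value of key, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


films = [{"id": 1, "t": "a"}, {"id": 1, "t": "b"}, {"id": 2, "t": "c"}]
print([r["t"] for r in drop_duplicate_ids(films)])  # ['a', 'c']
```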
Remove columns that are not useful for a future AI project: Homepage, because it is almost empty.
Column Filter
Statistics View
Box Plot
Select the id and genres_transformed columns to extract the information needed for building the film–genre relational table.
Column Filter
Math Formula
CSV Writer
Scatter Plot
Select the id and director_transformed columns to prepare a director-specific table for each film.
Column Filter
Split the director list (using ‘|’ as separator) into a collection to separate any multiple directors.
Cell Splitter
Relationship between popularity and vote average
Scatter Plot
Split the genres_transformed column (genre list separated by ‘|’) into a collection.
Cell Splitter
Math Formula
Expand each collection into multiple rows—one per (id, genre) pair—to create a normalized table where each row represents a film–genre association.
Ungroup
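The Cell Splitter followed by Ungroup turns one row per film into one row per (id, genre) pair. The same pattern is applied to the cast, director and production company lists. A minimal plain-Python sketch of that logic follows; the helper name split_and_ungroup is hypothetical, and stripping whitespace and skipping empty items after the split are assumptions, not documented node behavior.

```python
def split_and_ungroup(rows, column, sep="|", key="id"):
    """Expand a 'a|b|c' list column into one {key, column} row per item."""
    pairs = []
    for row in rows:
        for item in str(row[column]).split(sep):
            item = item.strip()
            if item:  # assumption: empty fragments (e.g. from 'a||b') are dropped
                pairs.append({key: row[key], column: item})
    return pairs


films = [{"id": 10, "genres_transformed": "drama|comedy"}]
for pair in split_and_ungroup(films, "genres_transformed"):
    print(pair)
# {'id': 10, 'genres_transformed': 'drama'}
# {'id': 10, 'genres_transformed': 'comedy'}
```

Each resulting row is one film–genre association, which is what makes the table suitable for joins and genre-level counts.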
Split the actor list (using ‘|’ as separator) into a collection so each actor is individually linked to the film.
Cell Splitter
Expand each actor collection into multiple rows—one per (id, actor) pair—to produce a normalized table suitable for actor-based analysis.
Ungroup
Create one row per (id, director) pair to produce a normalized film–director table suitable for director-based analysis.
Ungroup
Joiner
Select the id and cast_transformed columns to extract the information needed for building the film–actor relational table.
Column Filter
CSV Writer
CSV Writer
Keep only the id and cast columns.
Column Renamer
Keep only the id and genre columns.
Column Renamer
CSV Writer
Keep only the id and director columns.
Column Renamer
Standardize the financial variables by converting them to decimal format (float).
Number Format Manager
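Converting the financial variables to float can be sketched as below. The annotation does not name the columns, so budget and revenue are assumed names here; treating ',' purely as a thousands separator is also an assumption, and the helper names are hypothetical. Values that still cannot be parsed raise ValueError, which makes bad inputs visible instead of silently dropping them.

```python
FINANCIAL_COLUMNS = ["budget", "revenue"]  # assumed column names


def to_float(value):
    """Convert a raw value to float; None or blank text becomes None (missing)."""
    if value is None:
        return None
    text = str(value).strip().replace(",", "")  # assumes ',' is a thousands separator
    if text == "":
        return None
    return float(text)


def standardize_financials(rows, columns=FINANCIAL_COLUMNS):
    """Return copies of rows with the financial columns converted to float."""
    return [{**row, **{c: to_float(row.get(c)) for c in columns}} for row in rows]


print(standardize_financials([{"id": 1, "budget": "1,500", "revenue": 2000}]))
# [{'id': 1, 'budget': 1500.0, 'revenue': 2000.0}]
```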
Keep only the id and production_company columns.
Column Renamer
Remove films with a runtime of 0
Row Filter
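The runtime filter reduces to a single comparison. In this sketch only an explicit runtime of 0 is removed and missing runtimes are kept; that handling of missing values is an assumption, and the helper name drop_zero_runtime is hypothetical.

```python
def drop_zero_runtime(rows):
    """Remove films whose runtime is exactly 0 (0 and 0.0 both match)."""
    # Assumption: rows with a missing runtime (None) are kept.
    return [row for row in rows if row.get("runtime") != 0]


films = [{"id": 1, "runtime": 0}, {"id": 2, "runtime": 95}, {"id": 3, "runtime": None}]
print([r["id"] for r in drop_zero_runtime(films)])  # [2, 3]
```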
Split the production company list (separated by ‘|’) into a collection to isolate each company associated with a film.
Cell Splitter
Expand each collection into multiple rows—one per (id, production company) pair—to create a normalized film–company table.
Ungroup
Create a budget quality flag (budget_flag):
- missing → budget is zero or not provided
- suspicious → budget < 1,000
- ok → realistic budget
Rule Engine
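The budget rules read naturally as an ordered chain in which the first matching condition decides the flag, so "ok" covers every budget of 1,000 or more. A minimal sketch, assuming the budget has already been converted to a number (or None when not provided); the function name budget_flag simply reuses the output column name.

```python
def budget_flag(budget):
    """Classify a budget value; conditions are checked in order, first match wins."""
    if budget is None or budget == 0:
        return "missing"
    if budget < 1000:
        return "suspicious"
    return "ok"


for b in (None, 0, 500.0, 25_000_000):
    print(b, budget_flag(b))
# None missing
# 0 missing
# 500.0 suspicious
# 25000000 ok
```

Placing the "missing" test first matters: a zero budget would otherwise also satisfy budget < 1,000 and be labeled suspicious.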
Select the id and production_companies_transformed columns to prepare a production-company table for each film.
Column Filter
Convert release_date from string to a proper Date/DateTime format to standardize the field and make it usable for EDA.
String to Date&Time
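The string-to-date conversion depends on knowing the exact pattern used in the file. The sketch below defaults to ISO dates (YYYY-MM-DD), which is an assumption; the format argument must be changed to match the real release_date values. Returning None for unparseable text is a design choice of this sketch, and the helper name parse_release_date is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_release_date(text, fmt="%Y-%m-%d") -> Optional[date]:
    """Parse a release_date string into a date; None if missing or unparseable."""
    if text is None or str(text).strip() == "":
        return None
    try:
        return datetime.strptime(str(text).strip(), fmt).date()
    except ValueError:
        return None


print(parse_release_date("2009-12-10"))  # 2009-12-10
print(parse_release_date("12/10/2009", "%m/%d/%Y"))  # 2009-12-10
```

Once parsed, fields such as the release year can be derived directly (e.g. parse_release_date(s).year) for the EDA charts.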
Bar Chart
Create a simple central table containing each film’s key features, making descriptive analysis and joins with relational tables easier.
Column Filter
Create a runtime category (runtime_type):
- short → runtime < 30 min
- very long documentary → runtime ≥ 300 min and genre = Documentary
- very long film → runtime ≥ 180 min
- standard feature → all other cases
Rule Engine
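The runtime rules can be sketched as an ordered chain where the first match wins, so a 320-minute documentary is caught by the documentary rule before the general "very long film" rule. Because genres is a '|'-separated list, "genre = Documentary" is interpreted here as "the genre list contains documentary" (an assumption; a strict equality test would miss multi-genre films). The sketch also assumes runtime is numeric, with zero runtimes already removed.

```python
def runtime_type(runtime, genres):
    """Return the runtime category; conditions are checked in order, first match wins."""
    # Assumption: 'genre = Documentary' means the list contains that genre (case-insensitive).
    genre_list = [g.strip().lower() for g in str(genres).split("|")]
    if runtime < 30:
        return "short"
    if runtime >= 300 and "documentary" in genre_list:
        return "very long documentary"
    if runtime >= 180:
        return "very long film"
    return "standard feature"


print(runtime_type(320, "Documentary|History"))  # very long documentary
print(runtime_type(320, "Drama"))  # very long film
```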
Statistics
Standardize the text columns (titles, genres, cast, director, production_companies):
- convert all text to lowercase
- remove extra spaces
String Manipulation (Multi Column)
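The text standardization can be sketched as lowercasing plus whitespace cleanup. "Remove extra spaces" is interpreted here as trimming both ends and collapsing internal runs of whitespace to a single space (an assumption). The column list is passed in because the exact title column name is not given in the annotation; the helper names are hypothetical.

```python
def normalize_text(value):
    """Lowercase text, trim it, and collapse runs of whitespace; None stays None."""
    if value is None:
        return None
    return " ".join(str(value).split()).lower()


def standardize_text_columns(rows, columns):
    """Return copies of rows with the given text columns normalized."""
    return [{**row, **{c: normalize_text(row[c]) for c in columns if c in row}} for row in rows]


rows = [{"id": 1, "director": "  James   CAMERON "}]
print(standardize_text_columns(rows, ["director"]))
# [{'id': 1, 'director': 'james cameron'}]
```

Normalizing case and spacing before splitting on '|' prevents variants such as "Drama" and " drama" from being counted as different genres.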
Histogram
