eda

Integrity Enforcement: Applied the Duplicate Row Filter to remove redundant records and ensure that each movie appears only once.
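For readers who do not use KNIME, the deduplication step can be approximated in plain Python. This is only a sketch: it assumes duplicates are detected by comparing every column of a row, and the `title` and `budget` fields and their values are illustrative placeholders rather than data from the workflow.

```python
def drop_duplicate_rows(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable fingerprint from all column/value pairs.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Illustrative rows only (not taken from the dataset).
movies = [
    {"title": "Alien", "budget": 11000000},
    {"title": "Alien", "budget": 11000000},  # exact duplicate
    {"title": "Heat", "budget": 60000000},
]
deduped = drop_duplicate_rows(movies)
```

Keeping the first occurrence preserves the original row order, which makes the result easy to compare against the input.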

Text Normalization:

Utilized String Cleaner to strip unnecessary whitespace.
Used String Manipulation (Multi Column) to capitalize categories and maintain consistency across text-based fields.
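The two text-normalization steps can be sketched together in plain Python. The column names `genre` and `status` are hypothetical, and "capitalize" is assumed here to mean an uppercase first letter with the rest lowercased; the actual node configuration may differ.

```python
def normalize_text(rows, columns):
    """Trim and collapse whitespace, then capitalize the chosen text columns."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for col in columns:
            value = new_row.get(col)
            if isinstance(value, str):
                # split()/join removes leading, trailing, and repeated spaces.
                new_row[col] = " ".join(value.split()).capitalize()
        cleaned.append(new_row)
    return cleaned


# Hypothetical example row.
rows = [{"genre": "  science   fiction ", "status": "released"}]
normalized = normalize_text(rows, ["genre", "status"])
```

Non-string values (for example, missing cells) are left as they are, so the step can run before imputation.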

Temporal Formatting: Converted the release_date column from string format to a standardized Date&Time format.
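A minimal Python sketch of the same conversion follows. It assumes the strings use the ISO-style `YYYY-MM-DD` layout, which the description does not state, and it returns `None` for values that cannot be parsed instead of failing.

```python
from datetime import datetime


def parse_release_date(text, fmt="%Y-%m-%d"):
    """Return a date object, or None when the value is missing or malformed."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except (ValueError, AttributeError):
        # ValueError: string does not match fmt; AttributeError: text is None.
        return None
```

For example, `parse_release_date("1999-03-31")` yields a date for 31 March 1999, while `parse_release_date("n/a")` yields `None` so the row can be handled by a later missing-value step.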

Dimensionality Reduction: Implemented a Column Filter to remove low-relevance features such as imdb_id, homepage, and overview, keeping the dataset focused on the analytical objectives.
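In plain Python, the column filter amounts to dropping keys from each record. The sketch below uses the three column names mentioned above; the sample row is illustrative only.

```python
DROPPED_COLUMNS = ("imdb_id", "homepage", "overview")


def drop_columns(rows, columns=DROPPED_COLUMNS):
    """Return copies of the rows without the listed columns."""
    return [
        {key: value for key, value in row.items() if key not in columns}
        for row in rows
    ]


# Illustrative row, not taken from the dataset.
sample = [{"title": "Heat", "imdb_id": "tt0000000", "homepage": "", "overview": "..."}]
reduced = drop_columns(sample)
```

Columns that are absent from a row are simply ignored, so the same list can be applied to partially populated data.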

Constraint Filtering: Applied a Row Filter to exclude records with unrealistic financial data, specifically targeting movies with budget and revenue below $10,000.
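The row filter can be sketched as a simple predicate. One assumption is made explicit here: a row is kept only when both budget and revenue reach the 10,000 threshold. The description does not say whether the two conditions are combined with "and" or "or", so the combination may need adjusting.

```python
THRESHOLD = 10_000  # dollar cutoff named in the description


def has_realistic_financials(row, threshold=THRESHOLD):
    """True when both budget and revenue are at or above the threshold (assumed rule)."""
    return row["budget"] >= threshold and row["revenue"] >= threshold


# Illustrative rows only.
rows = [
    {"title": "A", "budget": 5_000_000, "revenue": 20_000_000},
    {"title": "B", "budget": 12, "revenue": 3_000_000},   # implausible budget
    {"title": "C", "budget": 2_000_000, "revenue": 0},    # implausible revenue
]
kept = [row for row in rows if has_realistic_financials(row)]
```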

Missing Values: Applied the Missing Value node, using median imputation for runtime and mean imputation for vote_average so that filled values stay close to each column's central tendency.
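The imputation logic can be illustrated with Python's `statistics` module. This sketch assumes missing cells are represented as `None`; the numbers are toy values chosen to make the arithmetic easy to follow.

```python
from statistics import mean, median


def impute(rows, column, strategy):
    """Fill None values in one column with strategy(observed values)."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = strategy(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Toy data: one missing runtime and one missing vote_average.
rows = [
    {"runtime": 90, "vote_average": 6.0},
    {"runtime": None, "vote_average": 8.0},
    {"runtime": 120, "vote_average": None},
    {"runtime": 200, "vote_average": 7.0},
]
rows = impute(rows, "runtime", median)       # median of 90, 120, 200
rows = impute(rows, "vote_average", mean)    # mean of 6.0, 8.0, 7.0
```

The median is less sensitive to extreme values than the mean, which is one common reason to prefer it for a skewed column such as runtime.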

Correlation Verification: Validated the cleaned dataset through Linear Correlation and Heatmap nodes, confirming strong relationships between revenue, budget, and vote.
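The quantity behind a linear correlation check is the Pearson coefficient, computed pairwise over the numeric columns. The sketch below shows that computation in plain Python, assuming this is the measure used. The sample values are made up purely to exercise the code and say nothing about the actual dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for demonstration only.
toy = {
    "budget": [10, 20, 30, 40],
    "revenue": [25, 45, 70, 85],
}
matrix = correlation_matrix(toy)
```

A heatmap is then a color-coded rendering of this matrix, with values near 1 or -1 indicating strong linear relationships.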
