Justification

-EDA revealed missing values in columns such as homepage, tagline, keywords, production_companies and cast. Possible measures include deleting columns that are irrelevant to future analysis and filling the remaining missing values with 'unknown' so that every row is categorized.

-The distribution analysis showed that budget and revenue contain a large number of zero values. Because these columns are central to the analysis, rows with a value of 0 should be considered for removal during the data cleaning phase.

-The date column is being recognized as a string, which indicates a formatting issue. It needs to be converted to the correct date format.

-Box plot analysis confirms extreme outliers in revenue and vote_count, which need to be handled in the data cleaning phase.

-The heatmap and linear correlation analysis suggest a positive relationship between revenue and vote_count, and between popularity and vote_count, indicating that financially successful movies tend to receive higher audience engagement. This justifies the use of clustering to segment performance levels.

-There are inconsistencies in the text columns. The affected columns are directors, genres, production_companies, cast, and keywords, each of which contains multiple values separated by "|". To facilitate future AI analysis, we will split the values of each column into a separate table.
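
The cleaning steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the workflow's actual KNIME nodes: the sample rows, the id and release_date column names, the ISO date format, and the 1.5 x IQR outlier rule are all hypothetical choices made for the example.

```python
from datetime import datetime
from statistics import quantiles


def fill_unknown(rows, columns):
    """Return copies of rows with empty or missing values set to 'unknown'."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row.get(col) in (None, ""):
                new_row[col] = "unknown"
        filled.append(new_row)
    return filled


def drop_zero_rows(rows, columns):
    """Keep only rows where none of the given columns is 0."""
    return [row for row in rows if all(row[col] != 0 for col in columns)]


def parse_dates(rows, column, fmt="%Y-%m-%d"):
    """Convert a string column to date objects (the format is an assumption)."""
    return [
        {**row, column: datetime.strptime(row[column], fmt).date()}
        for row in rows
    ]


def iqr_outliers(values):
    """Flag values outside 1.5 x IQR (one common rule, assumed here)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def split_column(rows, key, column, sep="|"):
    """Turn a '|'-separated column into (key, value) pairs, one per value."""
    return [
        (row[key], part.strip())
        for row in rows
        for part in row[column].split(sep)
        if part.strip()
    ]


# Invented sample data; only the column roles come from the text above.
rows = [
    {"id": 1, "tagline": "", "budget": 100, "revenue": 300,
     "release_date": "2015-06-09", "genres": "Action|Adventure"},
    {"id": 2, "tagline": "A tale", "budget": 0, "revenue": 0,
     "release_date": "2014-01-01", "genres": "Drama"},
    {"id": 3, "tagline": None, "budget": 50, "revenue": 120,
     "release_date": "2013-03-15", "genres": "Comedy|Drama"},
]

cleaned = fill_unknown(rows, ["tagline"])
cleaned = drop_zero_rows(cleaned, ["budget", "revenue"])
cleaned = parse_dates(cleaned, "release_date")

genre_table = split_column(cleaned, "id", "genres")
flagged = iqr_outliers([10, 12, 11, 13, 12, 11, 10, 13, 12, 1000])
```

Flagging outliers instead of deleting them keeps the choice of treatment (removal, capping, or a log transform) open for the cleaning phase, and the (key, value) pairs from split_column correspond to the separate per-column tables described in the last point.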

Import

EDA

Data Cleaning

Justification

Data Preparation and Transformation

Data Export

KNIME_Movie_Final_Project
