Golden Era of Movies
You’ve joined a movie analytics team investigating audience preferences, genre dynamics, and the true value of highly rated films on Letterboxd. Your goal is to use the Letterboxd Movie Ratings dataset to clean and transform movie data, uncover hidden audience trends, and identify which films punch above their popularity level.
Here are four questions your team lead wants you to answer:
Which genre is among both the top 5 most popular genres and the top 5 best-rated genres?
Categorize movies into three groups based on their runtime: "Short Film" (runtime < 60 minutes), "Standard" (60 <= runtime < 150 minutes), and "Epic" (runtime >= 150 minutes). Choose your three favorite movie genres and compare their runtimes across these categories.
For each genre, calculate the average rating in each decade. Compare how the ratings changed over time for the genres Action, Crime, Romance, and Music.
Identify the "Hidden Gems" in our dataset: movies rated highly by a reliable group of viewers but typically overlooked by the mainstream audience. Calculate a custom Hidden Gem Score (HGS) for each movie that rewards high ratings and penalizes high popularity. Which 25 movies rank highest on this custom index?
Note: If you struggle to come up with an HGS, draw inspiration from document analysis! TF-IDF uses a logarithmic penalty to keep extremely common words from drowning out the distinctive ones.
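One possible HGS, sketched by analogy with IDF (log(N / df)), is HGS = rating * ln(1 + max_watches / watches). The rating term rewards quality. The log term shrinks as a film gets more popular, just as IDF shrinks for common words. The `+ 1` keeps the most-watched film's score positive. This is only an illustration under stated assumptions, not a prescribed solution. The column names `rating` and `watches` are assumed and may differ in the actual dataset. The names `Movie`, `hidden_gem_scores`, `top_hidden_gems` and the reliability cut-off `min_watches` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Movie:
    title: str
    rating: float  # average rating (assumed column)
    watches: int   # number of users who logged the film (assumed popularity column)


def hidden_gem_scores(movies, min_watches=1000):
    """Return (title, score) pairs sorted by a TF-IDF-style Hidden Gem Score.

    score = rating * ln(1 + max_watches / watches)
    Films logged by fewer than `min_watches` users (a hypothetical
    reliability cut-off) are dropped, so a handful of enthusiastic
    viewers cannot push an unknown film to the top.
    """
    reliable = [m for m in movies if m.watches >= min_watches]
    if not reliable:
        return []
    # The most-watched reliable film plays the role of N in IDF = log(N / df)
    max_watches = max(m.watches for m in reliable)
    scored = [
        (m.title, m.rating * math.log(1 + max_watches / m.watches))
        for m in reliable
    ]
    # Highest score first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_hidden_gems(movies, n=25, min_watches=1000):
    """Return the n highest-scoring (title, score) pairs."""
    return hidden_gem_scores(movies, min_watches)[:n]


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the dataset
    sample = [
        Movie("Blockbuster", 3.9, 2_000_000),
        Movie("Cult Classic", 4.3, 15_000),
        Movie("Obscure Flop", 2.1, 5_000),
        Movie("Tiny Festival Film", 4.8, 40),  # below min_watches, ignored
    ]
    for title, score in top_hidden_gems(sample, n=3):
        print(f"{title}: {score:.2f}")
```

On the toy data, the 2.1-star "Obscure Flop" still outranks the 3.9-star "Blockbuster". The popularity penalty spans several orders of magnitude, while ratings only range over a few stars. If that is not the behavior you want, you could raise the rating term to a power, add a minimum-rating filter, or tune `min_watches`.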
Dataset: Cleaned Letterboxd data on KNIME Community Hub (original Letterboxd Movie Ratings dataset from Kaggle)
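The first three questions all follow the same pattern: split multi-genre films into one row per genre, then group and aggregate. The sketch below uses plain Python on in-memory records and rests on several assumptions.

- The field names `genres`, `rating`, `watches`, `runtime` and `year` are assumed, not taken from the dataset.
- A film is assumed to list several genres.
- "Most popular" is taken to mean total watches per genre. Counting films per genre is an equally defensible choice.
- "Best rated" is taken to mean the unweighted mean rating.

The names `Film`, `runtime_category`, `explode`, `genres_top5_in_both`, `runtime_counts` and `decade_averages` are hypothetical helpers.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Film:
    title: str
    genres: list[str]  # assumed: one film can belong to several genres
    rating: float      # average rating (assumed column)
    watches: int       # popularity proxy (assumed column)
    runtime: int       # minutes
    year: int


def runtime_category(minutes):
    """Question 2 bins: < 60 Short Film, 60 to 149 Standard, >= 150 Epic."""
    if minutes < 60:
        return "Short Film"
    if minutes < 150:
        return "Standard"
    return "Epic"


def explode(films):
    """Yield one (genre, film) pair per genre the film belongs to."""
    for film in films:
        for genre in film.genres:
            yield genre, film


def genres_top5_in_both(films):
    """Question 1: genres in both the top 5 by total watches and the top 5 by mean rating."""
    watches = defaultdict(int)
    ratings = defaultdict(list)
    for genre, film in explode(films):
        watches[genre] += film.watches
        ratings[genre].append(film.rating)
    most_popular = sorted(watches, key=watches.get, reverse=True)[:5]
    best_rated = sorted(ratings, key=lambda g: mean(ratings[g]), reverse=True)[:5]
    return set(most_popular) & set(best_rated)


def runtime_counts(films, genres):
    """Question 2: number of films per runtime category for the chosen genres."""
    counts = defaultdict(lambda: defaultdict(int))
    for genre, film in explode(films):
        if genre in genres:
            counts[genre][runtime_category(film.runtime)] += 1
    return {g: dict(c) for g, c in counts.items()}


def decade_averages(films, genres):
    """Question 3: mean rating per (genre, decade); e.g. 1994 falls in 1990."""
    buckets = defaultdict(list)
    for genre, film in explode(films):
        if genre in genres:
            buckets[(genre, film.year // 10 * 10)].append(film.rating)
    return {key: mean(vals) for key, vals in sorted(buckets.items())}
```

For question 3 you would call `decade_averages(films, {"Action", "Crime", "Romance", "Music"})` and compare each genre's values across decades. Note that a film tagged with two genres contributes to both, which is usually what a per-genre comparison intends.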