The Golden Era of Movies

<p><strong>Golden Era of Movies</strong></p><p>You’ve joined a movie analytics team investigating audience preferences, genre dynamics, and the true value of highly rated films on Letterboxd. Using the Letterboxd Movie Ratings dataset, your goal is to clean and transform movie data, uncover hidden audience trends, and identify which films punch above their popularity level.</p><p>Here are four questions your team lead wants you to answer:</p><ol><li><p>Which genre is among the <strong>top 5 most popular genres</strong> as well as among the <strong>top 5 best rated genres</strong>?</p></li><li><p>Categorize movies into three groups based on their runtime: "Short Film" (runtime &lt; 60 minutes), "Standard" (60 minutes &lt;= runtime &lt; 150 minutes), and "Epic" (runtime &gt;= 150 minutes). Choose your favorite three movie genres and <strong>compare their runtimes across the runtime categories</strong>.</p></li><li><p>For each genre, calculate the <strong>average rating in each decade</strong>. Compare how the ratings changed over time for the genres <em>Action</em>, <em>Crime</em>, <em>Romance</em>, and <em>Music</em>.</p></li><li><p>Identify the "<strong>Hidden Gems</strong>" in our dataset: movies rated highly by a reliable group of viewers but typically overlooked by the mainstream audience. You want to reward high ratings and penalize high popularity. Calculate a custom <strong>Hidden Gem Score (HGS)</strong> for each movie. Which <strong>25 movies rank highest</strong> on this custom index?<br><strong><em>Note:</em></strong> If you struggle to come up with an HGS, find inspiration from document analysis! TF-IDF uses a logarithmic penalty to prevent incredibly common words from drowning out the unique ones.</p></li></ol>

Top 5 most popular genres vs. top 5 best rated genres
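
For readers who want to follow this step outside KNIME, here is a minimal Python sketch of the same idea: split the multi-genre value, average rating and popularity per genre, keep the top 5 of each, and intersect the two lists. The field name `genres` and the comma separator are assumptions; `vote_average` and `popularity` are the column names used in the workflow's filter. It is an illustration, not the workflow's actual implementation.

```python
from collections import defaultdict


def genre_averages(movies, separator=","):
    """Return {genre: (average rating, average popularity)}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for movie in movies:
        # Split the multi-genre string into unique values, so each movie
        # counts once per genre (one row per movie and genre).
        # Assumption: genres are stored as one separator-delimited string.
        genres = {g.strip() for g in movie["genres"].split(separator) if g.strip()}
        for genre in genres:
            entry = totals[genre]
            entry[0] += movie["vote_average"]
            entry[1] += movie["popularity"]
            entry[2] += 1
    return {g: (r / n, p / n) for g, (r, p, n) in totals.items()}


def top_k(averages, index, k=5):
    """Return the k genres with the highest value at position `index`."""
    ranked = sorted(averages, key=lambda g: averages[g][index], reverse=True)
    return ranked[:k]


def popular_and_well_rated(movies, k=5):
    """Genres that are in the top k by popularity and in the top k by rating."""
    averages = genre_averages(movies)
    best_rated = set(top_k(averages, 0, k))
    return [g for g in top_k(averages, 1, k) if g in best_rated]
```

Calling `popular_and_well_rated(movies)` on a list of movie dictionaries returns the overlapping genres in order of popularity.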

Comparing runtimes across genres
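
The runtime categories and the per-genre comparison can be sketched in plain Python as follows. The thresholds come from the task description; the field names `genre` and `runtime` are assumptions, and the input is expected to already have one row per movie and genre. Rows with a missing runtime are not handled here.

```python
from collections import defaultdict


def runtime_category(runtime):
    """Map a runtime in minutes to the category named in the task."""
    if runtime < 60:
        return "Short Film"
    if runtime < 150:
        return "Standard"
    return "Epic"


def average_runtime_by_genre_and_category(rows, genres):
    """Average runtime per (genre, runtime_category) for the chosen genres.

    rows: one dict per movie and genre with the assumed keys
    'genre' and 'runtime'.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        if row["genre"] not in genres:
            continue
        key = (row["genre"], runtime_category(row["runtime"]))
        sums[key][0] += row["runtime"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

The workflow's Row Filter keeps Documentary, History, and Crime, so a comparable call would pass `{"Documentary", "History", "Crime"}` as `genres`.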

Average rating of genres over time
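
A rough Python equivalent of the decade column and the decade-by-genre pivot might look like this. It assumes each row carries a numeric `year` and a single `genre` (both hypothetical field names) alongside `vote_average`; how the workflow derives the year from the dataset is not shown in the source.

```python
from collections import defaultdict


def decade(year):
    """Round a year down to the start of its decade, e.g. 1994 -> 1990."""
    return (year // 10) * 10


def average_rating_per_decade(rows, genres):
    """Return {decade: {genre: average rating}} for the chosen genres."""
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        if row["genre"] in genres:
            key = (decade(row["year"]), row["genre"])
            sums[key][0] += row["vote_average"]
            sums[key][1] += 1

    # Reshape into a pivot-like structure: one entry per decade,
    # one value per genre.
    pivot = defaultdict(dict)
    for (dec, genre), (total, count) in sums.items():
        pivot[dec][genre] = total / count
    return dict(sorted(pivot.items(), key=lambda item: item[0]))
```

Passing `{"Action", "Crime", "Romance", "Music"}` as `genres` yields the series that the line plot compares over time.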

Finding the hidden gems

Hidden gem formula

HGS = movie rating / log10(popularity score + 10)
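
As a minimal sketch, the formula and the workflow's filter thresholds (vote_count >= 100 and < 2000, popularity < 40, vote_average >= 7.5) translate to Python as shown below. The `title` key in the usage is hypothetical, and applying the filter before ranking is an assumption about the order of steps. Because of the +10 offset, the denominator is at least 1 for any non-negative popularity, so the score never divides by zero and never exceeds the rating itself.

```python
import math


def hidden_gem_score(vote_average, popularity):
    """HGS = rating / log10(popularity + 10)."""
    return vote_average / math.log10(popularity + 10)


def is_candidate(movie):
    """Thresholds taken from the workflow's Row Filter label."""
    return (
        100 <= movie["vote_count"] < 2000
        and movie["popularity"] < 40
        and movie["vote_average"] >= 7.5
    )


def top_hidden_gems(movies, k=25):
    """Filter to candidates, then keep the k movies with the highest HGS."""
    candidates = [m for m in movies if is_candidate(m)]
    return sorted(
        candidates,
        key=lambda m: hidden_gem_score(m["vote_average"], m["popularity"]),
        reverse=True,
    )[:k]
```

For example, `[m["title"] for m in top_hidden_gems(movies)]` lists the titles of the 25 highest-scoring candidates.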

Dataset: Cleaned Letterboxd data on KNIME Community Hub (original Letterboxd Movie Ratings dataset from Kaggle)

Read cleaned Letterboxd data
CSV Reader
Create decade column
Expression
Keep top 5 most popular genres
Top k Row Filter
Keep only genre in top table that is also in bottom table
Reference Row Filter
vote_count >= 100 & < 2000; popularity < 40; vote_average >= 7.5
Row Filter
Keep top 5 best rated genres
Top k Row Filter
Comparing runtimes of different genres
Bar Chart
Create Pivot table to calculate average rating per decade and genre
Pivot
Create Pivot table to get average runtime for each genre and runtime_category
Pivot
Color by genre
Color Manager
Visualize resulting table
Table View
Categorize movies based on their runtime
Expression
Filter to Action, Crime, Romance, and Music
Column Filter
Rename split result column to genre
Column Renamer
Calculate hidden_gem_score
Expression
Filter to Documentary, History, and Crime
Row Filter
Average rating of genres over time
Line Plot
Top 25 hidden gems
Top k Row Filter
Group by genres to calculate average rating and popularity
GroupBy
Split multi-genres into unique values
Cell Splitter
One row per movie and genre
Ungroup
