Monthly Challenge - June2026-Movies

The Golden Era of Movies

Description:

You’ve joined a movie analytics team investigating audience preferences, genre dynamics, and the true value of highly-rated films on Letterboxd. Using the Letterboxd Movie Ratings dataset, your goal is to clean and transform movie data, uncover hidden audience trends, and identify which films punch above their popularity level.

Here are four questions your team lead wants you to answer:

  1. Which genre is among the top 5 most popular genres as well as among the top 5 best rated genres?

  2. Categorize movies into three groups based on their runtime: "Short Film" (runtime < 60 minutes), "Standard" (60 <= runtime < 150 minutes), and "Epic" (runtime >= 150 minutes). Choose your favorite three movie genres and compare how their movies are distributed across these runtime categories.

  3. For each genre, calculate the average rating in each decade. Compare how the ratings changed over time for the genres Action, Crime, Romance, and Music.

  4. Identify the 'Hidden Gems' in our dataset: movies that are rated highly by a reliable group of viewers but are typically overlooked by the mainstream audience. You want to reward high ratings and penalize high popularity. Calculate a custom Hidden Gem Score (HGS) for each movie. Which 25 movies rank highest on this custom index?
    Note: If you struggle to come up with an HGS, take inspiration from document analysis! TF-IDF uses a logarithmic penalty to prevent extremely common words from drowning out the rare ones.
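
One possible reading of the TF-IDF hint, as a minimal sketch and not the challenge's official formula: multiply the rating by a logarithmic penalty that shrinks as popularity grows, after discarding movies with too few ratings to be reliable. The field names (rating, num_ratings, popularity), the sample values, and the min_ratings threshold are all hypothetical. Popularity is assumed to be a positive count where larger means more popular.

```python
import math

# Hypothetical records; field names and values are made up for illustration.
movies = [
    {"title": "A", "rating": 4.5, "num_ratings": 2_000, "popularity": 1_000},
    {"title": "B", "rating": 4.5, "num_ratings": 900_000, "popularity": 1_000_000},
    {"title": "C", "rating": 3.0, "num_ratings": 1_500, "popularity": 1_000},
    {"title": "D", "rating": 5.0, "num_ratings": 12, "popularity": 20},
]


def hidden_gem_scores(rows, min_ratings=1_000, top_n=25):
    """Return (title, score) pairs sorted by descending Hidden Gem Score.

    score = rating * log(1 + max_popularity / popularity)
    The log term plays the role of IDF: very popular movies get a small
    multiplier, rarely seen movies a large one.
    """
    # Keep only movies rated by a "reliable" number of viewers (assumed threshold).
    reliable = [r for r in rows if r["num_ratings"] >= min_ratings]
    if not reliable:
        return []
    max_pop = max(r["popularity"] for r in reliable)
    scored = [
        (r["title"], r["rating"] * math.log(1 + max_pop / r["popularity"]))
        for r in reliable
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


ranking = hidden_gem_scores(movies)
```

With this sample, the highly rated but little-seen movie "A" ranks first, the blockbuster "B" drops to the bottom despite the same rating, and "D" is excluded because it has too few ratings. Other penalties, such as dividing the rating by the log of popularity, would serve the same purpose.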

Author: Nur Sena Alici

EXTRACT

The dataset contains no missing values or anomalies.

KNIME Monthly Challenge - June 2026
TRANSFORM

Prepares the data for each of the challenge questions.
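
As a rough illustration of what this preparation step might involve, the sketch below derives a runtime category, a decade, and one row per genre in plain Python. It is not an export of the workflow: the column names (genres, runtime, year) and the comma separator are assumptions about the dataset.

```python
def runtime_category(minutes):
    """Map a runtime in minutes to the challenge's three categories."""
    if minutes < 60:
        return "Short Film"
    if minutes < 150:
        return "Standard"
    return "Epic"


def decade(year):
    """Round a release year down to the start of its decade, e.g. 1994 -> 1990."""
    return year // 10 * 10


def explode_genres(row, sep=","):
    """Return one copy of the row per genre listed in its 'genres' string."""
    return [
        {**row, "genre": g.strip()}
        for g in row["genres"].split(sep)
        if g.strip()
    ]


# Hypothetical input row.
movie = {"title": "X", "genres": "Drama, Crime", "runtime": 152, "year": 1994}
prepared = [
    {**r, "runtime_category": runtime_category(r["runtime"]), "decade": decade(r["year"])}
    for r in explode_genres(movie)
]
```

Splitting the genre list first means every later aggregation (by genre, by genre and decade, by genre and runtime category) can be a simple group-by on the prepared rows.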

VISUALISATION

Groups all the tables and charts for the user.

CSV Reader
Cell Splitter
creates genres
Ungroup
Measures by genre
GroupBy
Measures by duration
GroupBy
top 5 ratings
Top k Row Filter
Filters my top 3 genres
Row Filter
Joiner
top 25 HGI
Top k Row Filter
top 5 pop
Top k Row Filter
Row Filter
Calculates decades, HGI and duration
Expression
Column Renamer
Measure by decades
GroupBy
Visualisation
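
The GroupBy, Top k Row Filter, and Joiner labels above suggest an aggregate, rank, and intersect pattern for the first question. A minimal sketch of that pattern follows. It assumes one row per movie and genre, that a genre's popularity and rating are measured by the mean over its movies, and hypothetical field names. The workflow itself may aggregate differently.

```python
from collections import defaultdict


def genre_means(rows, field):
    """Average the given numeric field per genre."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [sum, count]
    for r in rows:
        acc = totals[r["genre"]]
        acc[0] += r[field]
        acc[1] += 1
    return {genre: s / n for genre, (s, n) in totals.items()}


def top_k(means, k=5):
    """Return the k genres with the highest mean as a set."""
    return set(sorted(means, key=means.get, reverse=True)[:k])


def popular_and_well_rated(rows, k=5):
    """Genres that are in both the top-k by popularity and the top-k by rating."""
    return top_k(genre_means(rows, "popularity"), k) & top_k(genre_means(rows, "rating"), k)


# Hypothetical rows, already split to one genre per row.
rows = [
    {"genre": "Drama", "rating": 4.0, "popularity": 250},
    {"genre": "Comedy", "rating": 3.0, "popularity": 300},
    {"genre": "Horror", "rating": 2.5, "popularity": 200},
    {"genre": "Documentary", "rating": 4.2, "popularity": 10},
]
overlap = popular_and_well_rated(rows, k=2)
```

In this toy data only "Drama" appears in both top-2 lists. Grouping on a (genre, decade) key instead of the genre alone would give the per-decade averages asked for in the third question.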
