Collaborative Filtering - Movie Recommendation Model

<p><strong>Movie Recommendation System</strong></p><p><strong><em>Author(s)</em></strong></p><ul><li><p>Tylah Jenkins (14248037)</p></li><li><p>Dayhe Kwon ()</p></li><li><p>Aabhii Taneja ()</p></li><li><p>Harshit Setia ()</p></li></ul><p><strong>Date: </strong>01/08/2025</p><p></p><ol><li><p><strong>Overview &amp; Purpose</strong></p></li></ol><p>This workflow builds a simple movie recommendation system on the MovieLens 100k dataset to predict user preferences and movie ratings.</p><ol start="2"><li><p><strong>Data Used</strong></p></li></ol><p>The workflow uses the following MovieLens 100k dataset files: u.data (for the final working model), the uX.base/uX.test files (only when running the hyperparameter tuning), and u.user/u.item (only when running the hybrid model).</p><p><strong>Required Location: </strong>The user's Desktop is the most convenient location.</p><ol start="3"><li><p><strong>Methodology</strong></p></li></ol><p>Workflow steps include:</p><ul><li><p>Data loading (ingestion) and initial inspection</p></li><li><p>Cleaning &amp; preprocessing (column renaming; cleaning was carried out at the source and is trusted)</p></li><li><p>Data splitting (partitioned 80/20 train/test) and test validation set checks</p></li><li><p>Spark environment initialisation and required transformations</p></li><li><p>Model training/prediction (Collaborative Filtering)</p></li><li><p>Output generation (predicted vs actual)</p></li><li><p>Evaluation &amp; Analysis</p></li></ul><ol start="4"><li><p><strong>How to run the Workflow</strong></p></li></ol><ul><li><p>Place the data files in the specified location.</p></li><li><p>Open the workflow in KNIME.</p></li><li><p>Drag the mouse over all nodes in the 'Final Working Model' section to select them, then select 'Execute all'.</p></li></ul><ol start="5"><li><p><strong>Outputs</strong></p></li></ol><ul><li><p>Actual vs. Predicted ratings.</p></li><li><p>Table of the Top 10 Recommendations for each user (unrated movies).</p></li><li><p>Evaluation metrics table: RMSE and Recall.</p></li></ul><ol start="6"><li><p><strong>Evaluation Metrics</strong></p></li></ol><p>The system is evaluated using the RMSE and Recall metrics on the u.data set (training/test split).</p><ol start="7"><li><p><strong>Assumptions</strong></p></li></ol><ul><li><p>Data files are complete, unaltered, and tab-separated.</p></li><li><p>Input data has already had the required pre-cleaning applied (each user has &gt;= 20 ratings and complete demographics).</p></li><li><p>Rating scale is 1-5.</p></li><li><p>Data provided is reliable and credible.</p></li></ul>
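
As a rough illustration of the two evaluation metrics named above, the Python sketch below computes RMSE and a per-user Recall@k from (user, item, actual, predicted) rows, such as those in the workflow's Actual vs. Predicted output. It is not part of the KNIME workflow. The function names and toy rows are hypothetical, and the relevance threshold (actual rating >= 4) and the averaging of recall across users are assumptions, because the description does not state how Recall is defined.

```python
import math
from collections import defaultdict


def rmse(pairs):
    """Root-mean-square error over (actual, predicted) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (actual, predicted) pair")
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def recall_at_k(rows, k=10, threshold=4.0):
    """Mean per-user recall@k over test-set rows.

    rows: iterable of (user_id, item_id, actual, predicted).
    ASSUMPTION: an item is "relevant" when actual >= threshold.
    Users with no relevant test items are skipped.
    """
    by_user = defaultdict(list)
    for user, _item, actual, predicted in rows:
        by_user[user].append((predicted, actual))

    recalls = []
    for ratings in by_user.values():
        relevant = sum(1 for _, actual in ratings if actual >= threshold)
        if relevant == 0:
            continue
        # Rank this user's test items by predicted rating, keep the top k.
        top_k = sorted(ratings, key=lambda r: r[0], reverse=True)[:k]
        hits = sum(1 for _, actual in top_k if actual >= threshold)
        recalls.append(hits / relevant)
    return sum(recalls) / len(recalls) if recalls else float("nan")


# Hypothetical toy rows: (user_id, item_id, actual rating, predicted rating)
toy = [
    (1, 10, 5, 4.6),
    (1, 11, 2, 3.1),
    (1, 12, 4, 2.9),
    (2, 10, 3, 3.4),
    (2, 13, 5, 4.8),
]

print("RMSE:", rmse((a, p) for _, _, a, p in toy))
print("Recall@1:", recall_at_k(toy, k=1))
```

With k set to 10 this lines up with the Top 10 Recommendations output, although ranking only the held-out test items (as done here) is a simplification of ranking all unrated movies.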

URL: Social Information Network Analysis - Assignment 2 Brief https://drive.google.com/file/d/1eBkvvoXW0e3ZKEg91z5mIXKacE1m-Nc4/view?usp=sharing
