Movie Recommendation Engine with Spark Collaborative Filtering

1. Create local Spark Context
2. Read ratings.csv and movies.csv from the MovieLens dataset into Spark (https://grouplens.org/datasets/movielens/)
3. Ask user for ratings on 20 random movies to build user profile and include in training set
4. Train Spark Collaborative Filtering Learner (Alternating Least Squares algorithm)
5. Apply model to all other movies unrated by user
6. Display recommendation results for user
Build current user profile. Ask user (ID=9999) to rate 20 random movies to build user profile
Training. Train Model on Training Set. Training set = 80% original movies + 20 movies rated by user
Testing. Evaluate Model on Test Set. Test set = 20% original movies
Deployment. Create Predictions for current user. Movies with top 10 predicted ratings are recommended
Rated movies to training set
Unrated movies to deployment
Replace CSV Reader, Row Sampling and Table to Spark with CSV to Spark to take advantage of Spark's distributed processing (it will read the entire dataset much faster). *The current workflow configuration reads only 2.5% of the dataset.*
Sample 2.5%
Row Sampler
training set from original training set + user ratings
Spark Concatenate
ask user to rate 20 random movies
Ask User for Movie Ratings
user ratings to Spark
Table to Spark
add timestamp = 123, userID = 999999
add fields
calculate numerical error between original ratings and predicted ratings
Spark Numeric Scorer
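
The error calculation in this step can be illustrated outside of KNIME. The following is a minimal plain-Python sketch, not the Spark Numeric Scorer node itself. It assumes the error measures of interest are mean absolute error (MAE) and root mean squared error (RMSE); the function name `numeric_scores` and the sample ratings are hypothetical.

```python
import math


def numeric_scores(actual, predicted):
    """Return (MAE, RMSE) for paired actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Signed error per rating: predicted minus original.
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Hypothetical original and predicted ratings for three test-set rows.
mae, rmse = numeric_scores([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
```

RMSE penalizes large misses more heavily than MAE, so comparing the two gives a rough idea of whether the model makes a few large errors or many small ones.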
rest of movies stays unrated
no rating
all other movies unrated by user to Spark
Table to Spark
remove NaN and missing predictions
Spark Missing Value
80% - 20%
Spark Partitioning
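
To make the 80% - 20% split concrete, here is a small plain-Python sketch of a random partition followed by appending the user's ratings to the training set only. It is an illustration under assumptions, not the behavior of the Spark Partitioning node: the function name `partition`, the seed, and the toy `(userID, movieID, rating)` rows are all hypothetical. The user ID 999999 is taken from the "add fields" annotation.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the original ratings: 100 (userID, movieID, rating) rows.
ratings = [(user, movie, 3.0) for user in range(10) for movie in range(10)]
train, test = partition(ratings)

# The 20 ratings collected from the current user go into the training set,
# so the test set contains original ratings only.
user_ratings = [(999999, movie, 4.0) for movie in range(100, 120)]
train = train + user_ratings
```

Keeping the user's own ratings out of the test set means the reported error reflects the original data only, while the model still learns the new user's preferences.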
generate ratings for user on unrated movies
Spark Predictor (MLlib)
display on WebPortal
Display Recommendations
create local Spark Context
Create Local Big Data Environment
predicted ratings back to KNIME
Spark to Table
top 20 movies
Row Splitter (deprecated)
sort recommendations and extract top 20 movies
Top 20 recommended movies
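
The clean-up and ranking steps (drop NaN and missing predictions, sort, keep the top movies) can be sketched in plain Python as follows. This is only an illustration of the logic, not the KNIME nodes; the function name `top_n` and the sample movie titles and scores are hypothetical.

```python
import heapq
import math


def top_n(predictions, n=20):
    """Return the n (movie, score) pairs with the highest predicted rating."""
    # Drop missing (None) and NaN predictions before ranking.
    valid = [
        (movie, score)
        for movie, score in predictions
        if score is not None and not math.isnan(score)
    ]
    # nlargest returns the pairs sorted by score in descending order.
    return heapq.nlargest(n, valid, key=lambda pair: pair[1])


# Hypothetical predicted ratings for five unrated movies.
predictions = [
    ("A", 4.2),
    ("B", float("nan")),
    ("C", 4.8),
    ("D", None),
    ("E", 3.1),
]
best = top_n(predictions, n=2)
```

Filtering before sorting matters because NaN compares as neither greater nor smaller than any number, which can leave invalid rows in arbitrary positions of a sorted list.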
Read .csv from relative path
CSV Reader
Read .csv from absolute path
CSV to Spark
train ALS model
Spark Collaborative Filtering Learner (MLlib)
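
For readers unfamiliar with Alternating Least Squares, the idea can be shown with a deliberately simplified rank-1 version in plain Python. This is a teaching sketch and not Spark MLlib's implementation: it uses a single latent factor per user and per movie, plain L2 regularization, and hypothetical names (`als_rank1`, `squared_error`, `reg`) and toy ratings.

```python
def als_rank1(ratings, n_users, n_items, reg=0.1, iterations=10):
    """Fit one latent factor per user and per item by alternating least squares.

    ratings is a list of (user_index, item_index, rating) triples.
    """
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iterations):
        # Hold item factors fixed; each user factor has a closed-form solution.
        for u in range(n_users):
            rated = [(i, r) for (uu, i, r) in ratings if uu == u]
            num = sum(r * item_f[i] for i, r in rated)
            den = reg + sum(item_f[i] ** 2 for i, _ in rated)
            user_f[u] = num / den
        # Hold user factors fixed; solve for each item factor the same way.
        for i in range(n_items):
            raters = [(u, r) for (u, ii, r) in ratings if ii == i]
            num = sum(r * user_f[u] for u, r in raters)
            den = reg + sum(user_f[u] ** 2 for u, _ in raters)
            item_f[i] = num / den
    return user_f, item_f


def squared_error(ratings, user_f, item_f):
    """Sum of squared differences between known and predicted ratings."""
    return sum((r - user_f[u] * item_f[i]) ** 2 for u, i, r in ratings)


# Toy data: 3 users, 3 movies, 6 known ratings.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 2.0), (2, 1, 3.0), (2, 2, 2.0)]
user_f, item_f = als_rank1(ratings, n_users=3, n_items=3)

# Predicted rating for a movie user 0 has not rated (movie 2).
prediction = user_f[0] * item_f[2]
```

Each half-step is an ordinary regularized least-squares problem, which is why the method alternates: with one side fixed, the other side can be solved exactly. A real model would use more than one latent factor, which turns each scalar division into a small linear solve.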
generate rating predictions on test set
Spark Predictor (MLlib)
List of movies
CSV Reader
Pull table into spark
Table to Spark
