
10_Recommendation_Engine_w_Spark_Collaborative_Filtering

Movie Recommendation Engine with Spark Collaborative Filtering
This workflow builds a movie recommendation engine in six steps:

1. Create a local Spark context.
2. Read ratings.csv and movies.csv from the MovieLens dataset (https://grouplens.org/datasets/movielens/) into Spark.
3. Ask the user to rate 20 random movies to build a user profile, and include those ratings in the training set.
4. Train the Spark Collaborative Filtering Learner (Alternating Least Squares algorithm).
5. Apply the model to all other movies not rated by the user.
6. Display the recommendation results for the user.

Workflow sections

Build current user profile. Ask the user (ID=9999) to rate 20 random movies to build the user profile. Rated movies go to the training set; unrated movies go to deployment.
Training. Train the model on the training set. Training set = 80% of the original movie ratings + the 20 movies rated by the user.
Testing. Evaluate the model on the test set. Test set = 20% of the original movie ratings.
Deployment. Create predictions for the current user. Movies with the top 10 predicted ratings are recommended.

Note: Replace CSV Reader, Row Sampling and Table to Spark with CSV to Spark to take advantage of Spark distribution (it will read the entire dataset much faster). The current workflow configuration reads only 2.5% of the dataset.

Nodes and their annotations

Spark Collaborative Filtering Learner (MLlib): train ALS model
Spark Predictor (MLlib): generate rating predictions on test set
Spark Partitioning: 80% - 20%
Spark Predictor (MLlib): generate ratings for user on unrated movies
Spark to Table: predicted ratings back to KNIME
Row Splitter: top 20 movies
Table to Spark: user ratings to Spark
Spark Concatenate: training set from original training set + user ratings
Table to Spark: all other movies unrated by user to Spark
add fields: add timestamp = 123, userID = 999999
Ask User for Movie Ratings: ask user to rate 20 random movies
no rating: rest of movies stays unrated
Spark Numeric Scorer: calculate numerical error between original ratings and predicted ratings
Spark Missing Value: remove NaN and missing predictions
Display Recommendations: display on WebPortal
Top 20 recommended movies: sort recommendations and extract top 20 movies
Table to Spark: pull table into Spark
Row Sampling: sample 2.5%
Create Local Big Data Environment: create local Spark context
CSV to Spark: read .csv from absolute path
CSV Reader: read .csv from relative path
CSV Reader: list of movies
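The train, test and deploy flow described above can be illustrated outside KNIME with a minimal pure-Python sketch. It is not the workflow's implementation and not Spark MLlib's ALS: it assumes a rank-1 factorization with plain L2 regularization, and the function names, the toy ratings and the `reg` and `iterations` values are all hypothetical. It only shows how the 80/20 split, the added user ratings, the error score, the removal of NaN predictions and the top-N sort fit together.

```python
import math
import random

def split_ratings(ratings, train_fraction=0.8, seed=0):
    """Shuffle (user, movie, rating) triples and split them into train/test."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_als_rank1(train, iterations=20, reg=0.1):
    """Rank-1 ALS: alternately solve a regularized least squares problem
    for the user factors (movies fixed) and the movie factors (users fixed)."""
    user_f = {u: 1.0 for u, _, _ in train}
    movie_f = {m: 1.0 for _, m, _ in train}
    for _ in range(iterations):
        for u in user_f:
            rows = [(movie_f[m], r) for uu, m, r in train if uu == u]
            user_f[u] = sum(y * r for y, r in rows) / (reg + sum(y * y for y, _ in rows))
        for m in movie_f:
            rows = [(user_f[u], r) for u, mm, r in train if mm == m]
            movie_f[m] = sum(x * r for x, r in rows) / (reg + sum(x * x for x, _ in rows))
    return user_f, movie_f

def predict(model, user, movie):
    """Predicted rating, or NaN if the user or movie was not seen in training."""
    user_f, movie_f = model
    if user not in user_f or movie not in movie_f:
        return float("nan")
    return user_f[user] * movie_f[movie]

def rmse(model, test):
    """Root mean squared error on the test set, skipping NaN predictions."""
    errors = [(predict(model, u, m) - r) ** 2 for u, m, r in test]
    errors = [e for e in errors if not math.isnan(e)]
    return math.sqrt(sum(errors) / len(errors)) if errors else float("nan")

def recommend(model, user, candidate_movies, top_n=20):
    """Score unrated movies, drop NaN predictions, return the top_n by score."""
    scored = [(m, predict(model, user, m)) for m in candidate_movies]
    scored = [(m, p) for m, p in scored if not math.isnan(p)]
    scored.sort(key=lambda mp: mp[1], reverse=True)
    return scored[:top_n]

# Toy data (made up for illustration): (userID, movie, rating)
original = [
    (1, "A", 5.0), (1, "B", 4.0), (1, "C", 1.0),
    (2, "A", 4.0), (2, "B", 5.0), (2, "D", 2.0),
    (3, "C", 2.0), (3, "D", 1.0), (3, "E", 5.0),
    (4, "A", 5.0), (4, "E", 4.0),
]
user_ratings = [(9999, "A", 5.0), (9999, "C", 1.0)]  # current user's profile
unrated = ["B", "D", "E"]                            # movies left for deployment

train, test = split_ratings(original)   # 80% - 20%
train_set = train + user_ratings        # original training set + user ratings
model = train_als_rank1(train_set)
print("RMSE on test set:", rmse(model, test))
recs = recommend(model, 9999, unrated, top_n=20)
print("Recommendations:", recs)
```

The user's own ratings are appended after the split, so they always end up in the training set, as in the workflow. Movies that never appear in the training data get a NaN prediction and are dropped before sorting.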
