Seasonality_Removal

Workflow

Taxi Demand Prediction with Spark Random Forest

This workflow uses a subset of the popular NYC taxi dataset and the Spark Random Forest node to train a simple time series model that predicts taxi demand in the next hour from the demand in the preceding hours. The input data is the number of NYC taxi trips per hour for every day of 2017. To predict the taxi demand at a given hour, we need the taxi demand in the previous N hours; the Spark Lag Column metanode creates these N lagged columns. The Find lag metanode builds a correlation matrix of the lagged columns that can be inspected visually, and it also automatically finds the lag N whose column has the highest correlation with the original column, the total number of trips (taxi demand) per hour. A Random Forest model is then trained on those N lagged columns plus two additional temporal features: hour of day and day of week. We also experimented with first-order differencing and seasonality removal, both common practices in time series prediction, to see whether they would improve our simple model. The results suggest that, for regular time series, a flexible algorithm such as a Random Forest often produces good results even when trained on the full time series without seasonality removal.
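The workflow performs these steps with Spark nodes, and the internals of the Spark Lag Column and Find lag metanodes are not described here. As a minimal plain-Python sketch (not Spark), assuming that Find lag scores each lag k by the Pearson correlation between the trip count y(t) and its lagged copy y(t-k) and keeps the lag with the highest score, the lag columns, the lag search, and the two temporal features could look like this. The helper names `pearson`, `lagged_features`, `best_lag`, and `temporal_features` are hypothetical.

```python
# Hypothetical plain-Python sketch; not the workflow's Spark implementation.
from datetime import datetime
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_features(series, n_lags):
    """Build rows [y(t-1), ..., y(t-n_lags)] paired with the target y(t)."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append([series[t - k] for k in range(1, n_lags + 1)])
        targets.append(series[t])
    return rows, targets


def best_lag(series, max_lag):
    """Return (k, r): the lag k in 1..max_lag whose lagged copy correlates most with y(t)."""
    best_k, best_r = None, float("-inf")
    for k in range(1, max_lag + 1):
        r = pearson(series[k:], series[:-k])  # y(t) vs y(t-k) over the overlapping rows
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


def temporal_features(ts):
    """Hour of day (0-23) and day of week (Python convention: Monday = 0)."""
    return ts.hour, ts.weekday()


if __name__ == "__main__":
    # Synthetic hourly series with an exact 24-hour cycle (two weeks of data).
    series = [float(h % 24) for h in range(24 * 14)]
    print(best_lag(series, 30))  # lag 24 reproduces the cycle exactly
    print(lagged_features([1, 2, 3, 4, 5], 2))  # ([[2, 1], [3, 2], [4, 3]], [3, 4, 5])
    print(temporal_features(datetime(2017, 1, 2, 8)))  # (8, 0): 8 a.m. on a Monday
```

Picking the single best-correlated lag is only a heuristic for choosing N. Note also that Python's `weekday()` numbers Monday as 0, whereas Spark SQL's `dayofweek` returns 1 for Sunday; either encoding works as a feature as long as it is used consistently for training and prediction.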
[Workflow diagram: annotated sections "Load the data", "Original data", "First order difference", "Daily seasonality removal", "Weekly seasonality removal", and "Visualization". Main nodes: Create Local Big Data Environment, Parquet to Spark (load the Parquet dataset to Spark), Spark Lag Column, Find lag, Split by date and time (partition into training and test set), Spark Random Forests Learner, Spark Predictor, Spark SQL Query (recompute predicted trip count), Spark Numeric Scorer, Calculate first order difference, Remove daily seasonality, Remove weekly seasonality, and View line plot (Prediction vs Expected).]
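The first-order difference and the daily and weekly seasonality removal described above can be sketched the same way. This sketch assumes seasonality is removed by seasonal differencing, i.e. subtracting the value 24 hours earlier (daily) or 168 hours earlier (weekly); that is one common approach and may differ from what the workflow's Remove daily seasonality and Remove weekly seasonality nodes actually do. Because the model then predicts differences, the predicted trip count has to be recomputed by adding back the observed value from `lag` hours earlier. `difference` and `recompute` are hypothetical helper names.

```python
# Hypothetical plain-Python sketch; assumes seasonality removal = seasonal differencing.


def difference(series, lag=1):
    """Return d(t) = y(t) - y(t - lag) for every t >= lag.

    lag=1   -> first-order difference
    lag=24  -> daily seasonality removal for hourly data (assumption)
    lag=168 -> weekly seasonality removal, 24 * 7 hours (assumption)
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def recompute(series, predicted_diffs, lag=1):
    """Invert difference() for one-step-ahead predictions.

    predicted_diffs[i] is the prediction for d(lag + i), so the predicted trip
    count adds back the observed value lag hours earlier: y_hat = d_hat + series[i].
    """
    return [d + series[i] for i, d in enumerate(predicted_diffs)]


if __name__ == "__main__":
    trips = [5, 7, 6, 9, 12, 10]
    diffs = difference(trips)  # [2, -1, 3, 3, -2]
    print(recompute(trips, diffs))  # [7, 6, 9, 12, 10], i.e. trips[1:]

    # A purely daily pattern disappears after removing daily seasonality.
    daily = [h % 24 for h in range(24 * 3)]
    print(set(difference(daily, lag=24)))  # {0}
```

Differencing shortens the series by `lag` rows, so in this sketch the first hour, day, or week of data has no differenced value and cannot be used for training.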

Nodes

Seasonality_Removal consists of the following 310 nodes:

Plugins

Seasonality_Removal contains nodes provided by the following 8 plugins: