
02_Taxi_Demand_Prediction_Training_workflow

Taxi demand prediction training workflow

In this use case, we use the NYC taxi dataset and a Random Forest to train a simple time series model that predicts taxi demand in the next hour from data from past hours.
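To make "predict the next hour based on data from past hours" concrete, the sketch below shows how lagged feature rows can be built from an hourly demand series. It is plain Python for illustration only, not what the KNIME nodes execute internally; the function name `make_lagged_rows`, the choice of three lags, and the sample counts are all assumptions made up for this example.

```python
from typing import List, Tuple


def make_lagged_rows(counts: List[float], n_lags: int) -> List[Tuple[List[float], float]]:
    """Turn an hourly demand series into (features, target) pairs.

    features: the n_lags values that precede hour t (oldest first)
    target:   the value observed at hour t
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    rows = []
    # The first n_lags hours have no complete history, so they yield no row.
    for t in range(n_lags, len(counts)):
        rows.append((counts[t - n_lags:t], counts[t]))
    return rows


# Hypothetical hourly trip counts (not taken from the NYC taxi dataset).
hourly_trips = [120, 95, 80, 140, 210, 260]
rows = make_lagged_rows(hourly_trips, n_lags=3)

for features, target in rows:
    print(features, "->", target)
```

Each row pairs the demand of the previous hours with the demand to be predicted, which is the tabular shape a regression learner such as a Random Forest expects.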

Given the large size of the dataset, we train and deploy the machine learning model of choice on a Spark cluster. The KNIME Big Data Extension allows you to run a KNIME workflow on the big data platform you prefer, via in-database processing or via Spark.

[Workflow diagram: Training Workflow Taxi Demand Prediction. Based on the NYC taxi dataset, this workflow uses a Random Forest to train a simple time series prediction model to predict taxi demand in the next hour based on data from past hours.]

Nodes shown in the diagram: Create Local Big Data Environment, Path to training set, Parquet to Spark (load the Parquet dataset to Spark), Find lag (two instances), Spark Lag Column (two instances), Split by date and time (partition into training and test set), Spark Random Forest Learner (Regression), Spark Predictor (Regression), Spark Numeric Scorer, Model Writer, View line plot.
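The diagram partitions the data by date and time and then scores the predictions numerically. As a rough illustration of those two ideas, the sketch below splits timestamped rows at a cutoff (so the test set lies strictly after the training set) and computes two common regression error measures. It is plain Python, not the KNIME nodes themselves; the helper names, the cutoff, and all values are assumptions invented for this example.

```python
import math
from datetime import datetime
from typing import List, Sequence, Tuple

Row = Tuple[datetime, float]  # (hour timestamp, trip count)


def split_by_time(rows: Sequence[Row], cutoff: datetime) -> Tuple[List[Row], List[Row]]:
    """Rows before the cutoff go to training, rows at or after it go to test."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the absolute prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the average squared prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical hourly rows (timestamps and counts are made up).
rows = [
    (datetime(2017, 1, 1, 0), 120.0),
    (datetime(2017, 1, 1, 1), 95.0),
    (datetime(2017, 1, 1, 2), 80.0),
    (datetime(2017, 1, 1, 3), 140.0),
]
train, test = split_by_time(rows, cutoff=datetime(2017, 1, 1, 2))

# Hypothetical actual and predicted demand values for scoring.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(len(train), len(test), mae, rmse)
```

Splitting on time rather than at random keeps future hours out of the training set, which matters for time series because a random split would let the model see data from after the hours it is tested on.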
