Taxi_Time_Series_Prediction

Taxi demand prediction training workflow

In this use case, we use the NYC taxi dataset to train a simple Random Forest time series model that predicts taxi demand in the next hour based on data from past hours.

Given the large size of the dataset, we train and deploy the machine learning model of choice on a Spark cluster. The KNIME Big Data Extension allows you to run a KNIME workflow on the big data platform you prefer, via in-database processing or via Spark.
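Predicting the next hour from past hours means turning the hourly demand series into rows of lagged values, so that a regression model such as a Random Forest can learn from them. The following is a minimal sketch of that idea in plain Python, not the KNIME or Spark implementation. The function name `make_lag_rows`, the choice of three lags, and the trip counts are all assumptions made up for illustration.

```python
def make_lag_rows(hourly_counts, n_lags):
    """Turn a list of hourly counts into (features, target) rows.

    Each row uses the previous `n_lags` counts (oldest first) as features
    and the count of the current hour as the target.
    """
    rows = []
    for i in range(n_lags, len(hourly_counts)):
        features = hourly_counts[i - n_lags:i]
        target = hourly_counts[i]
        rows.append((features, target))
    return rows


# Made-up hourly trip counts, for illustration only.
counts = [120, 95, 80, 140, 210, 260]

rows = make_lag_rows(counts, n_lags=3)
for features, target in rows:
    print(features, "->", target)
```

The first `n_lags` hours produce no row because they lack a complete history, which is why six counts yield only three training rows here.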

Training Workflow
[Workflow diagram: Taxi Demand Prediction. Nodes and their annotations:]
Create Local Big Data Environment
Parquet to Spark: load the Parquet dataset to Spark
Split by date and time: partition into training and test set
Spark Lag Column (two nodes): find lag
Spark Random Forests Learner (MLlib): train the model
Spark Predictor (MLlib): predict the test set
Spark Numeric Scorer
Model Writer
Other labels: View line plot; Path to training set
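For time series, the training and test sets are separated by a point in time rather than by random sampling, and the predictions are then scored with numeric error measures. The sketch below illustrates both ideas in plain Python. It is not the workflow's implementation: the cutoff timestamp, the trip counts, and the helper names `split_by_time` and `mae_rmse` are assumptions, and a naive "same as the previous hour" forecast stands in for the trained Random Forest.

```python
from datetime import datetime
from math import sqrt


def split_by_time(records, cutoff):
    """Split (timestamp, count) records into training and test sets at `cutoff`."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def mae_rmse(actual, predicted):
    """Return the mean absolute error and the root mean squared error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Made-up hourly trip counts for one day, for illustration only.
hourly = [100, 110, 130, 120, 150, 170]
records = [(datetime(2024, 1, 1, hour), count) for hour, count in enumerate(hourly)]

# Hypothetical cutoff: everything from 04:00 onward is held out for testing.
train, test = split_by_time(records, cutoff=datetime(2024, 1, 1, 4))

# Naive stand-in for the model: predict each test hour as the previous hour's count.
all_counts = [count for _, count in records]
predicted = all_counts[len(train) - 1:-1]
actual = [count for _, count in test]

mae, rmse = mae_rmse(actual, predicted)
print(f"train={len(train)} test={len(test)} MAE={mae:.2f} RMSE={rmse:.2f}")
```

Splitting at a cutoff keeps future observations out of the training set, so the test error reflects how the model would behave on hours it has not yet seen.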
