LAB01_Regression_LabScheme_Taxi

Linear Regression

Linear regression: predict the total amount of NYC taxi trips.

- Partition data into training and test set
- Train a linear regression model
- Apply the trained model to the test set
- Handle missing values
- Evaluate the model performance with the Numeric Scorer node
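The steps listed above can be sketched outside KNIME in plain Python. This is only an illustration under stated assumptions: the data is synthetic, a single feature (trip distance) stands in for the full feature set, missing values are handled by dropping rows (one of several possible strategies), and the helper names `partition`, `fit_line` and `score` are hypothetical, not KNIME APIs.

```python
import math
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Randomly draw train_fraction of the rows for training; the rest is the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(actual, predicted):
    """R^2, mean absolute error and root mean squared error."""
    n = len(actual)
    mean_a = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    return {"r2": 1 - ss_res / ss_tot, "mae": mae, "rmse": math.sqrt(ss_res / n)}


# Synthetic stand-in for the taxi table: (distance, total amount).
# Every 25th row gets a missing distance to mimic incomplete records.
rng = random.Random(0)
raw = []
for i in range(200):
    distance = rng.uniform(0.5, 15.0)
    total = 2.5 + 3.0 * distance + rng.gauss(0.0, 1.0)
    raw.append((None if i % 25 == 0 else distance, total))

# Handle missing values (assumption: drop incomplete rows).
clean = [(d, t) for d, t in raw if d is not None and t is not None]

train, test = partition(clean)                          # partition the data
slope, intercept = fit_line(train)                      # train the model
predicted = [slope * d + intercept for d, _ in test]    # apply it to the test set
metrics = score([t for _, t in test], predicted)        # evaluate
print(slope, intercept, metrics)
```

The model is fitted on the training rows only, and the metrics are computed on the held-out test rows, so the score reflects performance on data the model has not seen.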

URL: Guide to Intelligent Data Science https://www.datascienceguide.org/

Exercise: Regression

In this exercise we will predict the total amount required for taxi trips in NYC using a regression model, and we will investigate the impact of some features like date and time, distance, duration and average speed.

1) Read data from the .csv file yellow_tripdata_2015-09
- Perform a data cleaning step by means of the provided "Taxi data first cleaning" component
2) Perform data partition by means of the Partitioning node
- Top port should have at least 70% of the rows
- Draw such rows randomly
3) Feed a Regression Learner with the top output port of the Partitioning node
- Select the total_amount column as target
4) Add a Regression Predictor to make predictions on the test set previously partitioned
5) Add a Numeric Scorer to analyze the Regression Predictor output
- Reference Column: the target column
- Predicted Column: the column created by the predictor node

Through the nodes within the blue frame, produce plots to visualize the regression model predictions superimposed on the data available for learning. The plots are 2D only, showing prediction vs. a selected feature.

Workflow annotations:
- Taxi data from .csv file
- "Mark" values for plot: takes as input the output of the Regression Predictor
- "Mark" values for plot: takes as input the test data
- Combine data and predictions
- Color settings for the plot
- Prediction subset
- Data subset
- Combine reduced tables
- The x-axis values must be set

Nodes in the workflow: Taxi data first cleaning, File Reader, Column Expressions (2), Constant Value Column (2), Concatenate (2), Color Manager, Row Splitter (2), Scatter Plot (Plotly)
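The plotting part of the exercise (mark, combine, color, reduce, plot) can be sketched in plain Python as follows. This is an assumption-laden illustration rather than the workflow itself: the rows are made up, `trip_distance` is assumed as the x-axis feature, the `series` column and the helpers `mark` and `sample_rows` are hypothetical, and the exact order of the KNIME nodes in the blue frame may differ.

```python
import random


def mark(rows, label):
    """Add a constant 'series' column, similar in spirit to a Constant Value Column node."""
    return [dict(row, series=label) for row in rows]


def sample_rows(rows, k, seed=0):
    """Reduce a table to at most k randomly drawn rows so the plot stays readable."""
    return random.Random(seed).sample(rows, min(k, len(rows)))


# Made-up test data and matching predictions from an assumed fitted line.
test_rows = [
    {"trip_distance": float(d), "total_amount": 2.5 + 3.0 * d + (1 if d % 2 else -1)}
    for d in range(1, 21)
]
predicted_rows = [
    {"trip_distance": r["trip_distance"], "total_amount": 2.5 + 3.0 * r["trip_distance"]}
    for r in test_rows
]

# Mark both tables, then combine data and predictions into one table.
combined = mark(test_rows, "data") + mark(predicted_rows, "prediction")

# Color settings keyed on the marker column.
colors = {"data": "blue", "prediction": "red"}

# Split by marker, reduce each subset, and combine the reduced tables.
data_part = [r for r in combined if r["series"] == "data"]
pred_part = [r for r in combined if r["series"] == "prediction"]
reduced = sample_rows(data_part, 5) + sample_rows(pred_part, 5)

# The x-axis feature must be chosen explicitly; the y-axis is the amount.
x_feature = "trip_distance"
points = [(r[x_feature], r["total_amount"], colors[r["series"]]) for r in reduced]
for point in points:
    print(point)
```

Each tuple in `points` holds an x value, a y value and a color, which is the shape a scatter plot needs to draw the predictions on top of the observed data.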
