LAB01_Regression_LabScheme_Taxi_Arman_Moradi_final

Linear Regression

Linear regression: predict house price.

- Partition data into training and test sets
- Train a linear regression model
- Apply the trained model to the test set
- Handle missing values
- Evaluate the model performance with the Numeric Scorer node
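The steps listed above can be sketched outside KNIME in plain Python. This is a minimal illustration under stated assumptions, not the workflow's implementation: the helper names (`partition`, `drop_missing`, `fit_line`, `predict`) are hypothetical, the data is synthetic, there is only one input feature, and missing values are handled by dropping incomplete rows, which is just one of several possible strategies.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into (train, test); hypothetical helper."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def drop_missing(rows):
    """Handle missing values by dropping any row that contains None (an assumption)."""
    return [r for r in rows if None not in r]

def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model, x):
    """Apply the trained model to one feature value."""
    intercept, slope = model
    return intercept + slope * x

# Synthetic (x, y) rows following y = 3 + 2.5 * x, plus two incomplete rows.
rows = [(float(d), 3.0 + 2.5 * d) for d in range(1, 21)]
rows += [(None, 10.0), (4.0, None)]

clean = drop_missing(rows)              # handle missing values
train, test = partition(clean)          # 70% training, 30% test
model = fit_line(train)                 # train the linear regression model
predictions = [predict(model, x) for x, _ in test]  # apply it to the test set
```

Because the synthetic data is noise-free, the fitted intercept and slope recover 3 and 2.5 up to floating-point error; real data would not fit this cleanly.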

URL: Guide to Intelligent Data Science https://www.datascienceguide.org/

Exercise: Regression

In this exercise we will predict the total amount charged for taxi trips in NYC using a regression model, and we will investigate the impact of features such as date and time, distance, duration and average speed.

1) Read data from the .csv file yellow_tripdata_2015-09
- Perform a data cleaning step by means of the provided "Taxi data first cleaning" component
2) Partition the data by means of the Partitioning node
- Top port should have at least 70% of the rows
- Draw the rows randomly
3) Feed a Regression Learner with the top output port of the Partitioning node
- Select the total_amount column as target
4) Add a Regression Predictor to make predictions on the test set previously partitioned
5) Add a Numeric Scorer to analyze the Regression Predictor output
- Reference Column: the target column
- Predicted Column: the column created by the predictor node

Through the nodes within the blue frame, produce plots that show the regression model predictions superimposed on the data available for learning. This is just a 2D plot of the results in the form of prediction vs. selected feature.

Workflow annotations (blue frame):

- "Mark" values for plot
- It takes as input the output of the Regression Predictor
- It takes as input the test data
- Prediction subset
- Data subset
- Combine reduced tables
- Combine data and predictions
- Color settings for the plot
- The x-axis values must be set

Nodes used in the workflow:

- CSV Reader
- Taxi data first cleaning
- Partitioning
- Linear Regression Learner
- Polynomial Regression Learner
- Regression Predictor
- Numeric Scorer
- Column Expressions
- Constant Value Column
- Row Splitter
- Concatenate
- Color Manager
- Scatter Plot (Plotly)
- Scatter_Component
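Step 5 of the exercise compares a reference column with a predicted column. The sketch below computes a few standard regression error statistics of the kind a numeric scorer reports. It assumes the usual textbook definitions of R², mean absolute error, mean squared error and root mean squared error; the function name `numeric_scores` and the sample values are made up for illustration and are not taken from the taxi data.

```python
import math

def numeric_scores(actual, predicted):
    """Standard regression scores for a reference and a predicted column."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }

# Illustrative values only: reference column vs. predicted column.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
scores = numeric_scores(actual, predicted)
```

Here every prediction is off by exactly 1, so the mean absolute error, mean squared error and root mean squared error are all 1, and R² is 1 - 4/20 = 0.8.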
