
04. Data Mining

Activity I: Decision Trees
- Partition the fully joined data into a training set and a test set (50%, Stratified Sampling).
- Train a decision tree on the training set to predict Target.
- Use the trained model to predict whether a user has upsell potential, i.e. whether the target value is 0 or 1.
- Evaluate the quality of the model with the Scorer node.
- What was the overall accuracy of your model?
- Optional: Evaluate the accuracy and robustness of the model with the ROC Curve node.

Activity II: Linear Regression
- Read the weather.table file.
- Split the data into rows up to 2016 (for training) and rows from 2017 on (for testing).
- Train a linear regression model that predicts AIR_TEMP as a function of all other parameters in the data set.
- Use the model to predict the temperature in 2017 and evaluate it with the Numeric Scorer.
- Optional:
  - Calculate the mean temperature per month on the training data.
  - Join the mean temperature to the test data set (2017).
  - Use the Numeric Scorer to see whether this simplest model is better than the linear regression.

Activity III: k-Means
- Read the location_data.table file.
- Filter to entries from California (region_code = CA).
- Train a k-means model with k=3. Use only position data for clustering (latitude and longitude).
- Optional: Plot latitude and longitude in a view (OSM Map or Scatter Plot) and use that to help you visually optimize k.
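The optional baseline in Activity II (mean temperature per month, joined to the test rows, then scored) can be sketched outside KNIME in plain Python. This is only an illustration under stated assumptions: the `month` column, the helper names `monthly_means` and `numeric_scores`, and the toy values are hypothetical and not taken from weather.table; only AIR_TEMP comes from the exercise. The metrics are of the kind a numeric scorer typically reports (mean absolute error, root mean squared error, R²).

```python
from collections import defaultdict
from math import sqrt


def monthly_means(rows):
    """Average AIR_TEMP per calendar month over the training rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row["month"]] += row["AIR_TEMP"]
        counts[row["month"]] += 1
    return {month: sums[month] / counts[month] for month in sums}


def numeric_scores(actual, predicted):
    """Return MAE, RMSE and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot  # assumes the actual values are not all equal
    return {"mae": mae, "rmse": sqrt(mse), "r2": r2}


# Toy values for illustration only, NOT taken from weather.table.
train = [
    {"month": 1, "AIR_TEMP": 2.0},
    {"month": 1, "AIR_TEMP": 4.0},
    {"month": 7, "AIR_TEMP": 20.0},
    {"month": 7, "AIR_TEMP": 24.0},
]
test = [
    {"month": 1, "AIR_TEMP": 5.0},
    {"month": 7, "AIR_TEMP": 21.0},
]

means = monthly_means(train)
# The "join" step: look up each test row's month in the training means.
baseline_pred = [means[row["month"]] for row in test]
scores = numeric_scores([row["AIR_TEMP"] for row in test], baseline_pred)
print(scores)
```

Running the same `numeric_scores` function on the linear regression predictions gives a like-for-like comparison: if the regression does not beat the monthly-mean baseline on the 2017 rows, the extra parameters are adding little.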
Workflow input nodes (Table Reader): Locations_data, Read weather.table, Fully Joined Data
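For Activity III, the filter-then-cluster steps can be sketched as a small k-means (Lloyd's algorithm) in plain Python. This is a minimal sketch, not the KNIME node's implementation: the `kmeans` helper, the choice to initialize centroids from the first k rows, and the toy coordinates are assumptions for illustration. It also treats latitude and longitude as flat coordinates with squared Euclidean distance, which is a simplification.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on 2-D points; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignment


# Toy rows for illustration only, NOT taken from location_data.table.
rows = [
    {"region_code": "CA", "latitude": 37.7, "longitude": -122.4},
    {"region_code": "CA", "latitude": 34.0, "longitude": -118.2},
    {"region_code": "CA", "latitude": 32.7, "longitude": -117.1},
    {"region_code": "NY", "latitude": 40.7, "longitude": -74.0},
    {"region_code": "CA", "latitude": 37.8, "longitude": -122.3},
    {"region_code": "CA", "latitude": 34.1, "longitude": -118.3},
    {"region_code": "CA", "latitude": 32.8, "longitude": -117.2},
]

# Filter to California and keep only the position columns.
ca_points = [
    (r["latitude"], r["longitude"]) for r in rows if r["region_code"] == "CA"
]
centroids, assignment = kmeans(ca_points, k=3)
print(centroids)
print(assignment)
```

Because the result depends on the starting centroids, it is worth re-running with different initial rows or a different k and checking the clusters against the scatter plot or map view.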
