
04. Data Mining

Data Mining
Activity III: k-Means
- Read the location_data.table data
- Filter the data to entries from California (region_code = CA)
- Perform k-means clustering with k=3, using only latitude and longitude for clustering
- Optional: plot latitude and longitude in a view (OSM Map or Scatter Plot) and use the view to visually optimize k

Activity II: Linear Regression
- Read the weather.table data
- Split the data into rows up to 2016 (training set) and rows from 2017 on (test set)
- Train a linear regression model that predicts AIR_TEMP as a function of all other features in the dataset
- Use the model to predict the temperature in 2017 and evaluate the model with the Numeric Scorer node
- Optional:
  1. Calculate the mean temperature per month in the training data
  2. Join the mean temperature per month to the test set
  3. Use the Numeric Scorer to check whether the average monthly temperature predicts better than the linear regression model

Activity I: Decision Trees
- Partition the fully joined data into a training and a test set (50%, stratified sampling on Target)
- Train a Decision Tree on the training set to predict Target
- Use the trained model to predict Target in the test set
- Evaluate the accuracy of the model with the Scorer node
- What is the overall accuracy of your model?
- Optional: evaluate the accuracy and robustness of the model with the ROC Curve node

Workflow labels: Read weather.table, Locations_data, Fully Joined Data; nodes: Table Reader, Table Reader
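Activity III can be sketched outside KNIME in plain Python to show what the Row Filter and k-Means steps compute. This is a minimal sketch, assuming each row is a dict with `region_code`, `latitude` and `longitude` keys; all helper names are hypothetical, and the farthest-point seeding is a swapped-in deterministic initialization that may differ from the k-Means node's own. The `inertia` helper supports the optional step: computing it for several values of k and looking for an "elbow" is a numeric complement to eyeballing the map.

```python
import random  # not needed by the deterministic seeding; kept out of the logic on purpose


def filter_region(rows, region_code="CA"):
    """Keep only rows from one region (the Row Filter step)."""
    return [r for r in rows if r["region_code"] == region_code]


def sq_dist(p, q):
    """Squared Euclidean distance between two (latitude, longitude) pairs."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k=3, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances; plot it against k to look for an elbow."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
```

Distances here are taken directly on degrees, which treats a degree of longitude as equal to a degree of latitude. That only holds at the equator (a degree of longitude shrinks with the cosine of the latitude), so for a region the size of California the clusters are an approximation.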
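For Activity II, the year split, a least-squares fit and a few error statistics of the kind the Numeric Scorer node reports look roughly like this. It is a sketch under stated assumptions: rows are dicts, `YEAR` is a hypothetical column holding the year (the workflow may first need to extract it from a date column), and the feature list is passed in explicitly. The fit solves the normal equations directly, which is adequate for a handful of features but less numerically robust than the solvers a production learner would use.

```python
import math


def split_by_year(rows, last_train_year=2016, year_col="YEAR"):
    """Rows up to last_train_year form the training set; later rows the test set."""
    train = [r for r in rows if r[year_col] <= last_train_year]
    test = [r for r in rows if r[year_col] > last_train_year]
    return train, test


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, features, target="AIR_TEMP"):
    """Least-squares weights [intercept, w_1, ..., w_p] from (X^T X) w = X^T y."""
    X = [[1.0] + [float(r[f]) for f in features] for r in rows]
    y = [float(r[target]) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(weights, row, features):
    """Intercept plus the weighted sum of the row's feature values."""
    return weights[0] + sum(w * float(row[f]) for w, f in zip(weights[1:], features))


def numeric_scores(actual, predicted):
    """R^2 and common error measures between actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "R^2": 1 - sse / sst if sst else float("nan"),
        "mean absolute error": sum(abs(e) for e in errors) / n,
        "mean squared error": sse / n,
        "root mean squared error": math.sqrt(sse / n),
    }
```

Leaving the year column out of the feature list keeps the model from extrapolating a year it never saw during training; whether to do the same in the workflow is a modeling choice.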
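The optional baseline in Activity II groups the training rows by month, averages AIR_TEMP, and joins those averages to the test set. A minimal sketch, assuming a hypothetical `MONTH` column already exists in both sets and using an inner join, so test rows for months absent from training are dropped. To compare fairly, score the regression predictions on the same joined rows; the lower error is the better predictor.

```python
import math
from collections import defaultdict


def monthly_means(train, month_col="MONTH", target="AIR_TEMP"):
    """Mean target value per month, computed on the training rows only (GroupBy step)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in train:
        sums[r[month_col]] += r[target]
        counts[r[month_col]] += 1
    return {m: sums[m] / counts[m] for m in sums}


def join_monthly_mean(test, means, month_col="MONTH", new_col="MEAN_AIR_TEMP"):
    """Inner join on the month: test rows whose month is missing from training are dropped."""
    return [dict(r, **{new_col: means[r[month_col]]}) for r in test if r[month_col] in means]


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```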
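Activity I can be illustrated in plain Python: a stratified 50/50 partition, a small decision tree, and the overall accuracy that the Scorer node reports. This is an illustration only, assuming numeric features and a class column named `Target`; the tree is a simple CART-style learner using Gini impurity and a depth limit, and its split and stopping rules are not claimed to match the Decision Tree Learner node's algorithm. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, target="Target", fraction=0.5, seed=0):
    """Partition so each class keeps (about) the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[target]].append(r)
    train, test = [], []
    for members in by_class.values():
        members = members[:]
        rng.shuffle(members)
        cut = round(len(members) * fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, features, target):
    """Numeric threshold split with the lowest weighted Gini impurity, or None."""
    best = None  # (weighted impurity, feature, threshold)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, features, target="Target", max_depth=3, depth=0):
    labels = [r[target] for r in rows]
    leaf = {"leaf": Counter(labels).most_common(1)[0][0]}  # majority class
    if depth >= max_depth or len(set(labels)) == 1:
        return leaf
    split = best_split(rows, features, target)
    if split is None or split[0] >= gini(labels):
        return leaf  # no split reduces impurity
    _, f, t = split
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r in rows if r[f] <= t], features, target, max_depth, depth + 1),
        "right": build_tree([r for r in rows if r[f] > t], features, target, max_depth, depth + 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def accuracy(actual, predicted):
    """Share of correct predictions (overall accuracy)."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```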
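For the optional ROC step in Activity I, the curve is traced by sweeping a threshold over the predicted score of the positive class, and the area under it summarizes how well the model ranks positive rows above negative ones. A minimal sketch, assuming one score per test row (for example, a positive-class probability from the predictor) and a hypothetical positive label `"yes"`; rows with tied scores are treated as a single threshold step.

```python
def roc_points(actual, scores, positive="yes"):
    """(false positive rate, true positive rate) pairs for every score threshold."""
    pairs = sorted(zip(scores, actual), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for a in actual if a == positive)
    n_neg = len(actual) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative row")
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Rows with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1.0 means every positive row scored above every negative row; 0.5 is what random scores give on average.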
