
04. Data Mining - solution

Activity I: Decision Trees
- Partition the fully joined data into a training set and a test set (50%, stratified sampling on Target).
- Train a Decision Tree on the training set to predict Target.
- Use the trained model to predict Target in the test set.
- Evaluate the accuracy of the model with the Scorer node. What is the overall accuracy of your model?
- Optional: evaluate the accuracy and robustness of the model with the ROC Curve node.

Activity II: Linear Regression
- Read the weather.table data.
- Split the data into rows up to 2016 (training set) and rows from 2017 on (test set).
- Train a linear regression model that predicts AIR_TEMP as a function of all other features in the dataset.
- Use the model to predict the temperature in 2017 and evaluate the model with the Numeric Scorer node.
- Optional:
  1. Calculate the mean temperature per month in the training data.
  2. Join the mean temperature per month to the test set.
  3. Use the Numeric Scorer to see whether the average monthly temperature provides a better prediction than the Linear Regression model.

Activity III: k-Means
- Read the location_data.table data.
- Filter the data to entries from California (region_code = CA).
- Perform k-means clustering with k = 3, using only latitude and longitude.
- Optional: plot latitude and longitude in a view (OSM Map View or Scatter Plot) and use the view to visually optimize k.

[Workflow canvas. Nodes: Table Reader (2), Partitioning, Decision Tree Learner, Decision Tree Predictor, Scorer, Color Manager, ROC Curve, Row Splitter, Linear Regression Learner, Regression Predictor, GroupBy, Column Rename, Joiner, Numeric Scorer (2), Column Appender, Row Filter, k-Means, OSM Map View, Scatter Plot]
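What the Partitioning (stratified sampling) and Scorer steps of Activity I compute can be sketched in plain Python. This is a minimal sketch, not KNIME's implementation: it assumes rows are dictionaries with a `Target` key, the helpers `stratified_split` and `score` are hypothetical, and a majority-class guess stands in for the Decision Tree Predictor output.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, target, fraction=0.5, seed=42):
    """Split rows into (train, test) so that every class of `target` keeps
    roughly the same share in both parts (stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    train, test = [], []
    for class_rows in by_class.values():
        shuffled = class_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * fraction)  # per-class share for training
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


def score(actual, predicted):
    """Confusion-matrix counts keyed by (actual, predicted) and overall accuracy."""
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return confusion, correct / len(actual)


# Hypothetical data: 40 rows, 25% "yes" and 75% "no"
rows = [{"x": i, "Target": "yes" if i % 4 == 0 else "no"} for i in range(40)]
train, test = stratified_split(rows, "Target")

# Stand-in for the Decision Tree Predictor: always predict the majority class.
actual = [r["Target"] for r in test]
confusion, accuracy = score(actual, ["no"] * len(actual))
print(accuracy)  # 0.75
```

Because the split is stratified, both halves keep the 25/75 class ratio, so the always-"no" baseline scores exactly 0.75 on the test set. A trained decision tree is only useful if its overall accuracy clearly beats that baseline.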
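For Activity II, the chain Row Splitter, Linear Regression Learner, Regression Predictor, GroupBy/Joiner and Numeric Scorer can be sketched in plain Python on synthetic data. Everything here is an assumption for illustration: the column names `year`, `month`, `F1` and `F2`, the made-up values, and all helper functions. The sketch fits ordinary least squares through the normal equations and reports R², MAE and RMSE, comparable to error measures the Numeric Scorer reports.

```python
from collections import defaultdict

FEATURES = ["F1", "F2"]  # hypothetical feature columns; use every column except AIR_TEMP


def fit_linear_regression(X, y):
    """Ordinary least squares with intercept: solve (A^T A) b = A^T y by
    Gauss-Jordan elimination with partial pivoting."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(n)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]  # [intercept, coef_1, ...]


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def numeric_scores(actual, predicted):
    """R^2, mean absolute error and root mean squared error."""
    n = len(actual)
    mean = sum(actual) / n
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    sst = sum((a - mean) ** 2 for a in actual)
    return {"R2": 1 - sse / sst,
            "MAE": sum(abs(e) for e in errors) / n,
            "RMSE": (sse / n) ** 0.5}


def monthly_mean_baseline(train_rows, test_rows):
    """Predict each test row with the mean training AIR_TEMP of its month
    (GroupBy + Joiner). Assumes every test month occurs in training."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in train_rows:
        sums[r["month"]] += r["AIR_TEMP"]
        counts[r["month"]] += 1
    means = {m: sums[m] / counts[m] for m in sums}
    return [means[r["month"]] for r in test_rows]


# Synthetic rows standing in for weather.table (made-up values)
rows = [{"year": y, "month": m, "F1": m % 6, "F2": 3 * (y - 2014) + m,
         "AIR_TEMP": 5 + 2 * (m % 6) - 0.5 * (3 * (y - 2014) + m)}
        for y in range(2014, 2019) for m in range(1, 13)]

train = [r for r in rows if r["year"] <= 2016]  # training set: up to 2016
test = [r for r in rows if r["year"] >= 2017]   # test set: 2017 onward

X_train = [[r[f] for f in FEATURES] for r in train]
X_test = [[r[f] for f in FEATURES] for r in test]
y_test = [r["AIR_TEMP"] for r in test]

coef = fit_linear_regression(X_train, [r["AIR_TEMP"] for r in train])
lr_scores = numeric_scores(y_test, predict(coef, X_test))
baseline_scores = numeric_scores(y_test, monthly_mean_baseline(train, test))
print(lr_scores, baseline_scores)
```

The synthetic target drifts from year to year through `F2`, so the per-month mean learned on 2014 to 2016 is systematically off in 2017 and 2018, while the regression recovers the drift. On real weather data the gap between the two approaches is an empirical question, which is exactly what the optional comparison is meant to answer.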

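The Row Filter and k-Means steps of Activity III can be sketched as Lloyd's algorithm on (latitude, longitude) pairs. The column names `region_code`, `latitude` and `longitude`, the sample coordinates, and the use of the first k points as initial centroids are assumptions of this sketch; KNIME's k-Means node may initialize its centroids differently.

```python
import math


def kmeans(points, k=3, max_iter=100):
    """Lloyd's algorithm with Euclidean distance. The first k points serve as
    initial centroids (an assumption of this sketch)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical rows from location_data.table
rows = [
    {"region_code": "CA", "latitude": 37.77, "longitude": -122.42},
    {"region_code": "CA", "latitude": 34.05, "longitude": -118.24},
    {"region_code": "CA", "latitude": 32.72, "longitude": -117.16},
    {"region_code": "NV", "latitude": 36.17, "longitude": -115.14},
    {"region_code": "CA", "latitude": 37.80, "longitude": -122.41},
    {"region_code": "CA", "latitude": 34.10, "longitude": -118.30},
    {"region_code": "CA", "latitude": 32.75, "longitude": -117.20},
    {"region_code": "CA", "latitude": 37.76, "longitude": -122.45},
    {"region_code": "CA", "latitude": 34.00, "longitude": -118.20},
    {"region_code": "CA", "latitude": 32.70, "longitude": -117.10},
]

# Keep California only, and cluster on latitude and longitude only
ca_points = [(r["latitude"], r["longitude"]) for r in rows if r["region_code"] == "CA"]
labels, centroids = kmeans(ca_points, k=3)
print(labels)
```

Euclidean distance on raw degrees only approximates geographic distance, which is usually acceptable within a single state. Rerunning with different `k` values and plotting the resulting labels on a map is the code counterpart of the optional visual tuning step.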