
04. Data Mining

Data Mining

Exercise 4 for KNIME User Training
- Train a decision tree to predict a nominal target column
- Evaluate the classification model using scoring metrics and an ROC curve
- Train a linear regression model to predict a numeric target column
- Evaluate the performance of the regression model
- Cluster data based on latitude and longitude
- Visualize clusters in a scatter plot and on a map
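
For readers who want to see the partition-and-score idea outside KNIME, the following is a minimal plain-Python sketch of a 50% stratified split followed by an accuracy calculation. It is not how the Partitioning or Scorer nodes are implemented. The toy rows, the function names, and the majority-class "model" (a stand-in for a trained decision tree) are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, target_index, fraction=0.5, seed=42):
    """Put `fraction` of each class into the first set, the rest into the second."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target_index]].append(row)
    first, second = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cut = round(len(members) * fraction)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of rows whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical toy data: (feature, target) rows with 30 "no" and 10 "yes".
rows = [(i, "yes" if i % 4 == 0 else "no") for i in range(40)]
train, test = stratified_split(rows, target_index=1)

# Hypothetical stand-in for a trained model: always predict the majority class.
majority = Counter(row[1] for row in train).most_common(1)[0][0]
actual = [row[1] for row in test]
predicted = [majority for _ in test]

print("train class counts:", Counter(row[1] for row in train))
print("accuracy:", accuracy(actual, predicted))
print("confusion counts:", confusion_counts(actual, predicted))
```

Because the split is stratified, both halves keep the 3:1 class ratio of the full table. The majority-class predictor also shows why accuracy alone can look acceptable on imbalanced data even when one class is never predicted, which is one reason to also inspect the confusion matrix or an ROC curve.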

URL: KNIME Analytics: a Review https://youtu.be/rvTHhgCKQiw
URL: The Learner-Predictor Construct https://youtu.be/bKrJkdPvpeA
URL: Drag & Drop Data Science https://youtu.be/n8HbUUc51fc
URL: Behind the Scenes of the Decision Tree with KNIME https://youtu.be/qB8HZpwqPEg
URL: Decision Tree Learner Node: Algorithm Settings https://youtu.be/CSwM92yTrJw
URL: Evaluating Classification Model Performance with the Scorer Node https://youtu.be/T61O92RI60Y
URL: ROC Curve of a Classification Model https://youtu.be/bDxAErDccNM
URL: Slides (KNIME Analytics Platform Course) https://www.knime.com/form/material-download-registration

Activity I: Decision Trees
- Partition the fully joined data into a training and test set (50%, Stratified Sampling on Target)
- Train a Decision Tree on the training set to predict Target
- Use the trained model to predict Target in the test set
- Evaluate the accuracy of the model with the Scorer node
- What is the overall accuracy of your model?
- Optional: evaluate the accuracy and robustness of the model with the ROC Curve node

Activity II: Linear Regression
- Read weather.table data
- Split the data into rows up to 2016 (training set) and rows from 2017 on (test set)
- Train a linear regression model that predicts the AIR_TEMP as a function of all other features in the dataset
- Use the model to predict the temperature in 2017 and evaluate the model with the Numeric Scorer node
- Optional:
  1. Calculate the mean temperature per month in the training data
  2. Join the mean temperature per month to the test set
  3. Use the Numeric Scorer to see if the average monthly temperature provides a better prediction than the linear regression model

Activity III: k-Means
- Read location_data.table data
- Filter the data to entries from California (region_code = CA)
- Perform k-means clustering with k=3. Use only latitude and longitude for clustering.
- Optional: plot latitude and longitude in a view (OSM Map or Scatter Plot) and use the view to visually optimize k

Node labels in the workflow: Read weather.table, Locations_data, Fully Joined Data, Table Reader (x2)
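
The k-means step in Activity III (k=3, latitude and longitude only) can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's algorithm, not the implementation of the KNIME k-Means node. The coordinates are rough, hand-typed positions of a few California cities rather than rows from location_data.table, and seeding the centroids with the first k points is an assumption chosen to keep the result reproducible. Plain Euclidean distance on degrees is also a simplification of real geographic distance.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points serve as initial centroids (assumption)."""
    centroids = [points[i] for i in range(k)]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Illustrative (latitude, longitude) pairs, approximate values only.
cities = {
    "San Francisco": (37.77, -122.42),
    "Los Angeles": (34.05, -118.24),
    "Redding": (40.59, -122.39),
    "Oakland": (37.80, -122.27),
    "Long Beach": (33.77, -118.19),
    "Eureka": (40.80, -124.16),
    "San Jose": (37.34, -121.89),
    "San Diego": (32.72, -117.16),
    "Chico": (39.73, -121.84),
}

points = list(cities.values())
centroids, assignment = kmeans(points, k=3)

for name, cluster in zip(cities, assignment):
    print(f"cluster {cluster}: {name}")
```

Rerunning the sketch with other values of k and comparing the printed groups is a rough textual counterpart to the visual check suggested in the optional step.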
