
05 Machine Learning LAB

Machine Learning - Exercise

This workflow shows a hands-on exercise in the L1-DS Introduction to KNIME Analytics Platform for Data Scientists - Basics course.

Task 1: Linear Regression
1. Read the adult_joined.table file by executing the Table Reader and Missing Value nodes.
2. Partition the data into a training set (75%) and a test set (25%). Draw randomly.
3. Train a linear regression model on the training set to predict the weekly working hours. Use all other columns except the "ID" column for the prediction.
4. Apply the model to the test set.
5. Evaluate the performance of the linear regression model with the Numeric Scorer node.

Task 2: Decision Tree Model
1. Use the same dataset as in Task 1 and partition it into a training set (75%) and a test set (25%). Apply stratified sampling to the income column.
2. Train a decision tree model on the training set to predict whether or not a person earns more than 50K per year.
3. Apply the model to the test set.
4. Evaluate the accuracy of the model with scoring metrics.
5. Open the configuration dialog of the Scorer (JavaScript) node and exclude those statistics from the class prediction statistics table that are also present in the confusion matrix. Display the number of rows in the confusion matrix.
6. Evaluate the performance of the model with an ROC curve.
7. OPTIONAL: Try out other parameter settings to reach a better performance. For example, change the quality measure, pruning method, or minimum number of records.

I tried my best to make sense of the data in this task. I didn't quite understand the data I was looking at when I did this for the Course Workflow, and the same thing is happening here.
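For readers who want to see what the partitioning and scoring steps of Task 1 compute, here is a minimal sketch in plain Python. It is not how KNIME implements these nodes. The function names (random_partition, numeric_scores), the seed, and the toy numbers are all hypothetical, and the sketch covers only a random 75/25 split and error measures of the kind the Numeric Scorer reports (R², mean absolute error, mean squared error, root mean squared error), not the regression itself.

```python
import math
import random


def random_partition(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def numeric_scores(actual, predicted):
    """Return common regression error measures for paired values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


# Hypothetical toy values, not taken from adult_joined.table.
rows = list(range(100))
train, test = random_partition(rows)

actual_hours = [40.0, 38.0, 45.0, 50.0]
predicted_hours = [42.0, 38.0, 44.0, 47.0]
scores = numeric_scores(actual_hours, predicted_hours)
print(len(train), len(test), scores)
```

Scoring on the held-out 25% rather than on the training rows is what makes these measures an estimate of how the model behaves on data it has not seen.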
Workflow nodes and annotations:
Excel Reader (provided file), Missing Value
Task 1: Partitioning (random sampling; top: train set 75%, bottom: test set 25%), Linear Regression Learner (train the model to predict hours per week), Regression Predictor (apply the model to the test set), Numeric Scorer (evaluate model performance)
Task 2: Partitioning (stratified sampling on income; top: train set 75%, bottom: test set 25%), Decision Tree Learner (train the model to predict income), Decision Tree Predictor (apply the model to the test set), Scorer (JavaScript) (scoring metrics), ROC Curve
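The stratified split and the accuracy scoring in Task 2 can be illustrated the same way. The sketch below is plain Python under stated assumptions, not the KNIME implementation: the helper names, the seed, the toy rows, and the label strings "<=50K" and ">50K" are hypothetical stand-ins for the income column.

```python
import random
from collections import Counter, defaultdict


def stratified_partition(rows, label_of, train_fraction=0.75, seed=42):
    """Split rows so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of rows whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical toy data: 80 rows labelled "<=50K" and 20 labelled ">50K".
rows = [(i, "<=50K") for i in range(80)] + [(i, ">50K") for i in range(80, 100)]
train, test = stratified_partition(rows, label_of=lambda r: r[1])
train_share = Counter(label for _, label in train)
test_share = Counter(label for _, label in test)

# Hypothetical predictions, only to show the scoring arithmetic.
actual = ["<=50K", "<=50K", ">50K", ">50K", "<=50K"]
predicted = ["<=50K", ">50K", ">50K", "<=50K", "<=50K"]
matrix = confusion_counts(actual, predicted)
acc = accuracy(actual, predicted)
print(train_share, test_share, matrix, acc)
```

Stratifying on income matters because the two classes are unlikely to be equally frequent; a purely random draw could leave the smaller class under-represented in the test set and distort the accuracy figure.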
