
Midterm - LogReg, SVM, NB - Exercise

Midterm

In this scenario, you will build a single workflow that runs three classification techniques in parallel (logistic regression, SVM, and Naive Bayes) to classify wine data by color (red or white). The 'wine.xlsx' file contains 12 physicochemical measurement columns and 1 label column (color).

Unlike weekly assignments, you will not be provided explicit instructions. Follow best practices from previous assignments and lecture videos. Some items that will be considered during grading (listed here in no particular order):

  • Single input file shared by all three model branches (workflow should begin with 1 node and end with 1 node)

  • Data should be split 75/25 using the correct sampling strategy and a random seed value of '12345', and should be normalized

  • Applicable model learner nodes should use random seed values set to '12345' and 'red' as the reference category

  • Predictor nodes should modify the prediction column name by adding the classification type (logreg, SVM, or NB)

  • For all ROC Curve nodes, use 'red' as the positive class

  • Model metrics should be consolidated into a single node (column names: Metrics, LogReg, SVM, NB)

  • Nodes should be neatly and logically positioned

  • A workflow annotation with a yellow border should be added, in which you explain which model you prefer and how you know it is best

  • The submitted file should be a .knar file
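
The partitioning and normalization items above can be illustrated outside KNIME with a minimal Python sketch. It assumes that the "correct sampling strategy" means stratified sampling on the color column, and it uses min-max normalization fitted on the training partition only; both are assumptions, not requirements stated in the assignment. The toy rows and the 'alcohol' column name are hypothetical, and the seed will not reproduce the partition that KNIME produces with the same value.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_fraction=0.75, seed=12345):
    """Split rows into train/test while keeping each label's share intact."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = list(by_label[label])
        rng.shuffle(group)
        cut = int(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def fit_min_max(rows, feature_keys):
    """Learn per-feature (min, max) from the training rows only."""
    return {k: (min(r[k] for r in rows), max(r[k] for r in rows))
            for k in feature_keys}


def apply_min_max(rows, bounds):
    """Rescale features with previously learned bounds."""
    scaled = []
    for r in rows:
        new_row = dict(r)
        for k, (lo, hi) in bounds.items():
            new_row[k] = (r[k] - lo) / (hi - lo) if hi > lo else 0.0
        scaled.append(new_row)
    return scaled


# Hypothetical toy data: 12 red and 36 white rows with one made-up feature.
rows = [{"alcohol": 8.0 + 0.1 * i, "color": "red" if i % 4 == 0 else "white"}
        for i in range(48)]

train, test = stratified_split(rows, "color")
bounds = fit_min_max(train, ["alcohol"])
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)
```

Fitting the bounds on the training partition and reusing them on the test partition keeps test data from influencing the scaling; as a result, scaled test values can fall slightly outside the 0 to 1 range.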
