Predicting Coffee Quality

<p><strong>Predicting Coffee Quality</strong></p><p>This workflow demonstrates a beginner-friendly machine learning approach to predicting the quality of coffee beans from the features available for the beans, including quality measures (e.g., aroma, flavor, acidity, ...), bean characteristics (e.g., processing method, color, ...), and information about the farm (e.g., country of origin, owner, mill, ...).</p><p>In this example we implement and evaluate a <strong>Decision Tree</strong> and a <strong>Random Forest</strong> and then compare their performance.</p><p><strong><em>Data:</em></strong> https://github.com/jldbc/coffee-quality-database/tree/master</p><p></p><p><strong>Coffee sensory attributes</strong></p><ul><li><p><em>Aroma:</em> The scent or fragrance of the coffee.</p></li><li><p><em>Flavor:</em> The taste of the coffee, including any sweetness, bitterness, acidity, and other flavor notes.</p></li><li><p><em>Aftertaste:</em> The lingering taste that remains in the mouth after swallowing the coffee.</p></li><li><p><em>Acidity:</em> The brightness or liveliness of the taste.</p></li><li><p><em>Body:</em> The thickness or viscosity of the coffee in the mouth.</p></li><li><p><em>Balance:</em> How well the different flavor components of the coffee work together.</p></li><li><p><em>Uniformity:</em> The consistency of the coffee from cup to cup.</p></li><li><p><em>Clean Cup:</em> A clean cup is a coffee that is free of any off-flavors or defects, such as sourness, mustiness, or staleness.</p></li><li><p><em>Sweetness:</em> Sweetness can be described as caramel-like, fruity, or floral, and is a desirable quality in coffee.</p></li></ul><p><em>Source: https://www.kaggle.com/datasets/fatihb/coffee-quality-data-cqi?resource=download</em></p>

URL: Dataset https://github.com/jldbc/coffee-quality-database/tree/master

Random Forest

Decision Tree


Access data

  • 1311 samples

  • 44 features

    • farm metadata

    • bean characteristics

    • quality measures

Create target variable

Specialty coffee is defined as coffee that scores 80 points or above on a 100-point scale
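The 80-point rule can be written as a small labelling function. This is a minimal sketch in plain Python, not the workflow's Expression node itself; the label strings "Specialty" and "Non-specialty" and the sample scores are assumptions made for illustration.

```python
# Threshold taken from the definition above: 80 points or more on a 100-point scale.
SPECIALTY_THRESHOLD = 80.0


def coffee_quality_target(total_cup_points: float) -> str:
    """Return the binary target label for one coffee (label names are assumed)."""
    if total_cup_points >= SPECIALTY_THRESHOLD:
        return "Specialty"
    return "Non-specialty"


# Illustrative scores, not taken from the dataset.
scores = [86.25, 80.0, 79.92, 68.33]
labels = [coffee_quality_target(s) for s in scores]
print(labels)
```

Because the target is derived directly from the total score, the workflow removes the Total Cup Points column before training; otherwise the models could read the answer straight from that column.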

Partition the data

Stratified sampling on "Coffee Quality - Target" (70/30 split)
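Stratified sampling splits each class separately so that the training and test sets keep roughly the same class proportions. The sketch below illustrates the idea with the standard library only; the function name, the seed, and the toy rows are assumptions, and the Table Partitioner node may differ in details such as rounding.

```python
import random
from collections import defaultdict


def stratified_split(rows, target_key, train_fraction=0.7, seed=42):
    """Split rows into (train, test), sampling each target class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target_key]].append(row)

    train, test = [], []
    for members in by_class.values():
        shuffled = members[:]          # copy so the input is not reordered
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


# Toy data with an 80/20 class imbalance (illustrative only).
target = "Coffee Quality - Target"
rows = [{"id": i, target: "Specialty"} for i in range(80)]
rows += [{"id": 80 + i, target: "Non-specialty"} for i in range(20)]

train, test = stratified_split(rows, target)
print(len(train), len(test))
```

In this toy example both partitions keep the 80/20 class ratio of the full table, which a plain random split would only match approximately.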

Preprocessing II

Preprocessing I

Clean the dataset
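One cleaning step named in the workflow is replacing missing altitude values with the country average. The following is a hedged sketch of that idea in plain Python; the keys "country" and "altitude" are hypothetical stand-ins for the dataset's actual column names, and the sample rows are invented.

```python
from collections import defaultdict
from statistics import mean


def impute_altitude_by_country(rows, country_key="country", altitude_key="altitude"):
    """Return new rows where a missing altitude is replaced by the country mean.

    Rows from a country with no known altitude at all keep None.
    """
    known = defaultdict(list)
    for row in rows:
        if row[altitude_key] is not None:
            known[row[country_key]].append(row[altitude_key])
    country_mean = {country: mean(values) for country, values in known.items()}

    cleaned = []
    for row in rows:
        new_row = dict(row)
        if new_row[altitude_key] is None:
            new_row[altitude_key] = country_mean.get(new_row[country_key])
        cleaned.append(new_row)
    return cleaned


# Invented sample rows for illustration.
rows = [
    {"country": "Ethiopia", "altitude": 1800},
    {"country": "Ethiopia", "altitude": 2000},
    {"country": "Ethiopia", "altitude": None},
    {"country": "Brazil", "altitude": None},
]
cleaned = impute_altitude_by_country(rows)
print(cleaned[2]["altitude"], cleaned[3]["altitude"])
```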


Train & apply the model

Evaluate the model
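Evaluating a binary classifier usually starts from the confusion matrix, from which accuracy follows directly. The sketch below shows that calculation in plain Python; the positive label name and both prediction lists are made-up examples, not results from this workflow.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="Specialty"):
    """Count true/false positives and negatives for a binary target."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of correct predictions among all predictions."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Invented labels and predictions, for illustration only.
actual = ["Specialty", "Specialty", "Specialty", "Non-specialty", "Non-specialty"]
tree_pred = ["Specialty", "Non-specialty", "Specialty", "Non-specialty", "Specialty"]
forest_pred = ["Specialty", "Specialty", "Specialty", "Non-specialty", "Specialty"]

tree_acc = accuracy(confusion_counts(actual, tree_pred))
forest_acc = accuracy(confusion_counts(actual, forest_pred))
print(tree_acc, forest_acc)
```

Computing the same metric for both models on the same test set is what makes the later comparison meaningful.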

Model comparison

Train & apply the model

Evaluate the model

Decision Tree Learner
Decision Tree Predictor
Data cleaning
Remove Total Cup Points
Column Filter
Top: training set; Bottom: test set
Table Partitioner
Create target variable
Expression
Scorer
ROC Curve
Compare the two models
Binary Classification Inspector
Combine predictions
Column Appender
ROC Curve
Replace missing altitude values with country average
Clean altitude column
Random Forest Predictor
arabica_data_cleaned.csv
CSV Reader
Random Forest Learner
Scorer
