Spot_Synth_A3

Workflow for df_knime_ready (Main Modeling Dataset)

df_knime_ready is the complete dataset used to train a supervised machine learning model in KNIME that predicts user interaction scores. It was built in the following steps.

  1. Define Target Variable (interaction_score):

    • Source: df_interactions

    • Process: The interaction_type column ('play', 'like', 'skip') was converted into a numerical interaction_score. We used a binary classification approach: 'like' and 'play' were mapped to 1 (positive engagement), and 'skip' was mapped to 0 (negative engagement).
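The mapping in step 1 can be sketched in pandas as follows. This is a minimal illustration rather than the original notebook code: the sample rows and the SCORE_MAP name are assumptions; only the column names and the 1/0 mapping come from the description above.

```python
import pandas as pd

# Toy stand-in for df_interactions; the rows are made up for illustration.
df_interactions = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "track_id": ["t1", "t2", "t1", "t3"],
    "interaction_type": ["play", "skip", "like", "skip"],
})

# Mapping described in step 1: 'like' and 'play' are positive, 'skip' is negative.
# SCORE_MAP is a hypothetical name.
SCORE_MAP = {"play": 1, "like": 1, "skip": 0}

df_interactions["interaction_score"] = df_interactions["interaction_type"].map(SCORE_MAP)

# Any interaction_type outside the mapping becomes NaN, so fail early
# instead of carrying missing targets into training.
assert df_interactions["interaction_score"].notna().all()
df_interactions["interaction_score"] = df_interactions["interaction_score"].astype(int)
```

Using an explicit dictionary with .map makes unexpected interaction types surface as missing values, which is easier to catch than a silent default.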

  2. Prepare Track Features (df_track_features):

    • Source: df_synthetic_tracks

    • Process: Numerical track features (e.g., duration_ms, popularity, danceability, energy) were scaled using MinMaxScaler. Categorical track features (e.g., genre, country, explicit, label) were transformed using OneHotEncoder.

    • Result: A new DataFrame, df_track_features, containing only scaled numerical and one-hot encoded track features, with track_id as the index.

  3. Combine All Data (df_modeling_data):

    • Source: df_interactions (with interaction_score), df_track_features, df_users.

    • Process: These three DataFrames were merged into a single table: df_interactions was joined to df_users on user_id and to df_track_features on track_id. The result brings together user information, track features, and the interaction target.
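The merge in step 3 could be written roughly as below. The join type is not stated in the description, so the inner join here is an assumption, as are the sample rows. The validate argument is an optional safeguard that raises an error if a user or track appears more than once on the lookup side.

```python
import pandas as pd

# Toy stand-ins for the three inputs; values are made up.
df_interactions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "track_id": ["t1", "t2", "t1"],
    "interaction_score": [1, 0, 1],
})
df_users = pd.DataFrame({
    "user_id": [1, 2],
    "gender": ["F", "M"],
    "location": ["US", "DE"],
})
df_track_features = pd.DataFrame(
    {"energy": [0.2, 0.9], "genre_pop": [1.0, 0.0]},
    index=pd.Index(["t1", "t2"], name="track_id"),
)

df_modeling_data = (
    df_interactions
    # Attach user attributes; each interaction matches at most one user.
    .merge(df_users, on="user_id", how="inner", validate="many_to_one")
    # Attach track features; df_track_features is indexed by track_id.
    .merge(
        df_track_features,
        left_on="track_id",
        right_index=True,
        how="inner",
        validate="many_to_one",
    )
    .reset_index(drop=True)
)
```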

  4. One-Hot Encode User Features (within df_knime_ready):

    • Source: df_modeling_data

    • Process: The remaining categorical user features (gender, location, preferred_genres) were one-hot encoded with pd.get_dummies.
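A minimal sketch of step 4, assuming each user has a single value in preferred_genres (the description does not say whether it can hold several). The sample rows are made up, and dtype=int is an assumption so that the dummy columns come out as 0/1 integers rather than booleans.

```python
import pandas as pd

# Toy stand-in for df_modeling_data; values are made up.
df_modeling_data = pd.DataFrame({
    "user_id": [1, 2, 3],
    "track_id": ["t1", "t1", "t2"],
    "gender": ["F", "M", "F"],
    "location": ["US", "DE", "US"],
    "preferred_genres": ["pop", "rock", "pop"],
    "energy": [0.2, 0.2, 0.9],
    "interaction_score": [1, 1, 0],
})

user_categorical_cols = ["gender", "location", "preferred_genres"]

# Replace each categorical column with one 0/1 column per category.
# Columns not listed in `columns` are passed through unchanged.
df_knime_ready = pd.get_dummies(
    df_modeling_data,
    columns=user_categorical_cols,
    dtype=int,
)
```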

  5. Final df_knime_ready Creation:

    • Result: The dataset was consolidated into df_knime_ready, containing user_id, track_id, all preprocessed track and user features, and the interaction_score target variable. Because every feature is now numerical, the DataFrame can be consumed directly by most machine learning nodes in KNIME.
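Before handing the table to KNIME, a quick sanity check along these lines can confirm that every feature column is numerical and the target is binary. This is a sketch under assumptions: the sample rows, the ID_COLS and TARGET names, and the choice to serialize to CSV text are illustrative and not taken from the original workflow.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy stand-in for df_knime_ready; values are made up.
df_knime_ready = pd.DataFrame({
    "user_id": [1, 2],
    "track_id": ["t1", "t2"],
    "energy": [0.2, 0.9],
    "gender_F": [1, 0],
    "interaction_score": [1, 0],
})

ID_COLS = ["user_id", "track_id"]   # identifiers, not model features
TARGET = "interaction_score"

# Every column that is neither an identifier nor the target must be numeric.
feature_cols = [c for c in df_knime_ready.columns if c not in ID_COLS + [TARGET]]
non_numeric = [c for c in feature_cols if not is_numeric_dtype(df_knime_ready[c])]
assert not non_numeric, f"non-numeric feature columns: {non_numeric}"

# The target should contain only 0 and 1.
assert set(df_knime_ready[TARGET].unique()) <= {0, 1}

# Serialize without the pandas index so a CSV reader sees only real columns.
# Passing a file path to to_csv instead would write the file to disk.
csv_text = df_knime_ready.to_csv(index=False)
```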

Load Pre-processed dataframe, view distribution and contents

Model Training

Model Testing

Logistic Regression Meta Model Creation

Double-click dialogue box to scroll and view

Format, filter, normalize, and partition data

Local File System Connector
Pre-processed feature selected training set
CSV Reader
Apply Decision Tree Regression model to test set
Decision Tree Predictor
View Contents of Dataframe
Table View
Table View
Table View
Scorer
ROC Curve
Partition the feature-selected training set into training and test sets (e.g., 70/30 split)
Table Partitioner
Table View
ROC Curve
Box Plot
Logistic Regression Predictor
Number to String
Domain Calculator
Table View
Table View
Combine predictions from Linear Regression and Decision Tree models for meta-model input
Column Appender
K Nearest Neighbor
Table View
Scorer
Domain Calculator
Scorer
Normalizer
Table View
Bar Chart
Logistic Regression Learner
Scatter Plot
View training data
Table View
View Test Data
Table View
Scorer
Column Filter
Column Filter
Normalizer
Interactions Distribution
Statistics
Train Decision Tree Regression model on training set
Decision Tree Learner
Logistic Regression Predictor
ROC Curve
Logistic Regression Learner
