Spot_Synth_A3

Workflow for df_knime_ready (Main Modeling Dataset)

df_knime_ready is the main modeling dataset. It is used to train a supervised machine learning model in KNIME that predicts user interaction scores.

Define Target Variable (interaction_score):
- Source: df_interactions
- Process: The interaction_type column ('play', 'like', 'skip') was converted into a binary interaction_score: 'like' and 'play' were mapped to 1 (positive engagement), and 'skip' was mapped to 0 (negative engagement).
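
The mapping above can be sketched as follows. This is a minimal illustration on a toy stand-in for df_interactions; the sample rows and the `score_map` name are assumptions, not taken from the workflow.

```python
import pandas as pd

# Toy stand-in for df_interactions (sample rows are assumed for illustration).
df_interactions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "track_id": [10, 11, 10],
    "interaction_type": ["play", "skip", "like"],
})

# 'like' and 'play' count as positive engagement, 'skip' as negative.
score_map = {"like": 1, "play": 1, "skip": 0}
df_interactions["interaction_score"] = df_interactions["interaction_type"].map(score_map)

# Series.map yields NaN for any value missing from the dict, so fail early
# if an unexpected interaction type shows up.
assert df_interactions["interaction_score"].notna().all()
df_interactions["interaction_score"] = df_interactions["interaction_score"].astype(int)
```
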
Prepare Track Features (df_track_features):
- Source: df_synthetic_tracks
- Process: Numerical track features (e.g., duration_ms, popularity, danceability, energy) were scaled using MinMaxScaler. Categorical track features (e.g., genre, country, explicit, label) were transformed using OneHotEncoder.
- Result: A new DataFrame, df_track_features, containing only scaled numerical and one-hot encoded track features, with track_id as the index.
Combine All Data (df_modeling_data):
- Source: df_interactions (with interaction_score), df_track_features, df_users.
- Process: These three DataFrames were merged into a single table, with user_id linking interactions to df_users and track_id linking them to df_track_features. The result holds user information, track features, and the interaction target in one row per interaction.
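
One way the merge could look, sketched on toy data. The left-join type and the `validate` checks are assumptions; the description does not state how unmatched rows were handled.

```python
import pandas as pd

# Toy stand-ins for the three inputs (values are assumed).
df_interactions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "track_id": [10, 11, 10],
    "interaction_score": [1, 0, 1],
})
df_users = pd.DataFrame({
    "user_id": [1, 2],
    "gender": ["F", "M"],
    "location": ["DE", "US"],
})
df_track_features = pd.DataFrame(
    {"energy": [0.0, 1.0]},
    index=pd.Index([10, 11], name="track_id"),
)

# Each interaction row picks up its user's attributes and its track's features.
# validate="many_to_one" raises if user_id or track_id is duplicated on the
# right-hand side, which would silently multiply interaction rows.
df_modeling_data = (
    df_interactions
    .merge(df_users, on="user_id", how="left", validate="many_to_one")
    .merge(df_track_features.reset_index(), on="track_id", how="left", validate="many_to_one")
)
```
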
One-Hot Encode User Features (within df_knime_ready):
- Source: df_modeling_data
- Process: The remaining categorical user features (gender, location, preferred_genres) were one-hot encoded (pd.get_dummies).
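
A minimal sketch of the pd.get_dummies call, assuming each user has a single value per categorical column. The toy rows and the `dtype=int` choice are assumptions.

```python
import pandas as pd

# Toy stand-in for df_modeling_data (values are assumed).
df_modeling_data = pd.DataFrame({
    "user_id": [1, 1, 2],
    "track_id": [10, 11, 10],
    "gender": ["F", "F", "M"],
    "location": ["DE", "DE", "US"],
    "preferred_genres": ["pop", "pop", "rock"],
    "interaction_score": [1, 0, 1],
})

user_cat_cols = ["gender", "location", "preferred_genres"]

# Replace each categorical column with 0/1 indicator columns named
# <column>_<value>; dtype=int avoids boolean indicator columns.
df_knime_ready = pd.get_dummies(df_modeling_data, columns=user_cat_cols, dtype=int)
```
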
Final df_knime_ready Creation:
- Result: The dataset was consolidated into df_knime_ready, containing user_id, track_id, all preprocessed track and user features, and the interaction_score target variable. Every column is numerical, so the DataFrame can be consumed directly by most machine learning nodes in KNIME.
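
Before handing the table to KNIME, the "fully numerical" claim can be checked with a small helper. The function name `non_numeric_columns` is hypothetical and not part of the original workflow.

```python
import pandas as pd

def non_numeric_columns(df: pd.DataFrame) -> list:
    """Return the names of columns whose dtype is not numeric."""
    return df.select_dtypes(exclude="number").columns.tolist()

# Toy example: one clean frame and one with a leftover text column.
clean = pd.DataFrame({"user_id": [1, 2], "energy": [0.1, 0.9], "interaction_score": [1, 0]})
dirty = clean.assign(location=["DE", "US"])

leftover_clean = non_numeric_columns(clean)
leftover_dirty = non_numeric_columns(dirty)
```

An empty list means every column is already numeric; any names returned point at features that still need encoding.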


Load Pre-processed dataframe, view distribution and contents

Model Training

Model Testing

Logistic Regression Meta Model Creation

Format, filter, normalize, and partition data
