
01_Modelling_Workflow

Continuous Deployment - Modelling Workflow

This workflow uses Integrated Deployment in a complex Guided Analytics application that is hardcoded to work with a churn prediction dataset.

Workflow steps:
- access and blend the data
- perform data exploration
- prepare the data with normal nodes and an interactive view
- optimize, train and test two models
- retrain and test the interactively selected model
- generate a final training workflow
- generate a final deployment workflow from the training workflow
- deploy the workflow on KNIME Server
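
The capture-and-combine idea behind the last steps can be sketched in plain Python. This is only a loose analogy, not KNIME's API: the workflow itself uses Capture Workflow Start/End, Workflow Combiner and Workflow Writer nodes. The names `combine`, `data_prep` and `make_predictor`, the `score` column and the 0.5 threshold are all hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def combine(*segments: Step) -> Step:
    """Chain captured segments into one callable, applied in order."""
    def pipeline(rows: List[Row]) -> List[Row]:
        for segment in segments:
            rows = segment(rows)
        return rows
    return pipeline


def data_prep(rows: List[Row]) -> List[Row]:
    """Hypothetical captured segment: cast the Churn label to a string class."""
    return [{**row, "Churn": str(row["Churn"])} for row in rows]


def make_predictor(threshold: float) -> Step:
    """Hypothetical captured segment: flag rows whose score reaches the threshold."""
    def predict(rows: List[Row]) -> List[Row]:
        return [
            {**row, "Prediction": "1" if row["score"] >= threshold else "0"}
            for row in rows
        ]
    return predict


# The "deployment workflow" is the data preparation followed by the predictor.
deployment = combine(data_prep, make_predictor(threshold=0.5))
result = deployment([{"Churn": 0, "score": 0.2}, {"Churn": 1, "score": 0.9}])
```

The point of the design is that the same data preparation used during training is reused unchanged at prediction time, because both are captured once and only combined afterwards.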

This workflow also runs on KNIME WebPortal. Its use case is simple churn prediction.

Workflow annotations (from the workflow canvas):

Reading and Blending:
- contract data + churn
- behavioral (calls) data

Pre-processing:
- Join contract data and behavioral data
- Convert Churn values to String to be used as class in upcoming classification
- Reserve 80% of the rows for model training and the remaining rows for model testing
- Use the same number of data rows for both classes in the test set

Churn values:
- Churn = 0: current subscriptions
- Churn = 1: cancelled subscriptions

Data Exploration:
- automated visualization
- uni- and bi-variate visual exploration

Domain Expertise Input:
- filter columns by quality
- avoid ground truth leakage
- column removal by domain knowledge

Capture for Deployment: Data Preparation

Train and Optimize:
- Missing value imputation modelling
- Optimize Random Forest parameters
- Optimize threshold Binary Classification
- Train model with optimized parameters
- Capture branch to deploy

ML Expert Decision:
- assess performance
- select best model

Retrain on Full Dataset:
- retrain the desired model on the full dataset with optimized parameters
- predict on new test set and export workflow to be deployed / downloaded

ML Expert Decision:
- save training workflow
- assess performance
- select best model
- save and deploy the model

Retrain:
- chosen model
- optimized model

Deploy:
- DataPrep
- Predictor
- within REST API

Node and port labels: Reading ContractData.csv; Join the contract data and the behavioral data; Area code and churn are converted to String; prepare raw data; entire raw data set; train set; test set; predictions; both models; optimized model; Random Forest; XGBoost; train: 90% test: 10%; sub-sample 10%; train: 99% test: 1%; top: train set, bottom: test set; top: test set pred., middle: predictor workflow, bottom: retrain workflow; DataPrep workflow; Chosen Retrain workflow; Deployment workflow; Deploy to Server.

Nodes on the canvas: File Reader, Database URL and Credentials, DB Connector, DB Table Selector, DB Reader, Joiner, Number To String, Domain Calculator, Automated Visualization, Interactive Column Filter, Partitioning, Row Sampling, Parameter Optimization RF, Parameter Optimization XGB, Random Forest Learner, XGBoost Tree Ensemble Learner, Predictor RF, Predictor XGB, Select Model, Capture Workflow Start, Capture Workflow End, Workflow Combiner, Workflow Executor, Workflow Writer, Visualize and Download Model, KNIME Server Connection, Deploy Workflow to Server.
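
The partitioning described in the pre-processing annotation (reserve 80% of the rows for training, then keep the same number of rows per class in the test set) can be sketched in plain Python. This is an illustrative stand-in, not how the KNIME nodes are implemented: the function name `partition_and_balance`, the fixed seed and the toy data are assumptions, and the balancing here simply downsamples each class to the size of the smallest one.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def partition_and_balance(
    rows: List[Row],
    label: str = "Churn",
    train_fraction: float = 0.8,
    seed: int = 0,
) -> Tuple[List[Row], List[Row]]:
    """Split rows into train/test, then equalize class counts in the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)

    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]

    # Group the test rows by class and trim every class to the smallest size.
    by_class: Dict[object, List[Row]] = defaultdict(list)
    for row in test:
        by_class[row[label]].append(row)
    smallest = min(len(group) for group in by_class.values())
    balanced_test = [row for group in by_class.values() for row in group[:smallest]]
    return train, balanced_test


# Toy data: 100 rows, 30 of them churned ("1").
rows = [{"id": i, "Churn": "1" if i < 30 else "0"} for i in range(100)]
train, test = partition_and_balance(rows)
```

Balancing only the test set leaves the training data untouched while making test metrics such as accuracy easier to read on an imbalanced label.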
