Icon

justknimeit-23part2_​optimization_​Victor

justknimeit-23part2_optimization_Victor
Description: Just like in last week’s challenge, a telecom company wants you to predict which customers are going to churn (that is, going to cancel their contracts) based on attributes of their accounts. One of your colleagues said that she was able to achieve a bit over 95% accuracy for the test datawithout modifying the training data at all, and using all given attributes exactly as they are. Again, the target class to be predicted is Churn (value 0 corresponds to customers that do not churn, and 1 corresponds to those who do). What model should you train over the training dataset to obtainthis accuracy over the test dataset? Can this decision be automated? Note 1: A simple, automated solution to this challenge consists of 5 nodes. Note 2: In this challenge, do not change the statistical distribution of any attribute or class in the datasets, and use all available attributes. Note 3: Need more help to understand the problem? Check this blog post out. Model training and cross-validation Model predictions Model evaluation Data preparation Performance metricssummaryAccuracy : 95,802%Error : 4,198%Cohen's Kappa : 0,825 Training dataTest dataStart optimizationEnd optimizationAggregatesCV resultsCreates 10 foldsfor CVXGBoost trainingXGBoost predictionsView XGBoost treebest performancesXGBoost trainingoptimizedXGBoost predictionswith optimized modelBest parametersto variableViewXGBoost treeperformances andconfusion matrix CSV Reader CSV Reader Parameter OptimizationLoop Start ParameterOptimization Loop End X-Aggregator X-Partitioner XGBoost TreeEnsemble Learner XGBoost Predictor Scorer XGBoost TreeEnsemble Learner XGBoost Predictor Table Rowto Variable Scorer Description: Just like in last week’s challenge, a telecom company wants you to predict which customers are going to churn (that is, going to cancel their contracts) based on attributes of their accounts. One of your colleagues said that she was able to achieve a bit over 95% accuracy for the test datawithout modifying the training data at all, and using all given attributes exactly as they are. Again, the target class to be predicted is Churn (value 0 corresponds to customers that do not churn, and 1 corresponds to those who do). What model should you train over the training dataset to obtainthis accuracy over the test dataset? Can this decision be automated? Note 1: A simple, automated solution to this challenge consists of 5 nodes. Note 2: In this challenge, do not change the statistical distribution of any attribute or class in the datasets, and use all available attributes. Note 3: Need more help to understand the problem? Check this blog post out. Model training and cross-validation Model predictions Model evaluation Data preparation Performance metricssummaryAccuracy : 95,802%Error : 4,198%Cohen's Kappa : 0,825 Training dataTest dataStart optimizationEnd optimizationAggregatesCV resultsCreates 10 foldsfor CVXGBoost trainingXGBoost predictionsView XGBoost treebest performancesXGBoost trainingoptimizedXGBoost predictionswith optimized modelBest parametersto variableViewXGBoost treeperformances andconfusion matrixCSV Reader CSV Reader Parameter OptimizationLoop Start ParameterOptimization Loop End X-Aggregator X-Partitioner XGBoost TreeEnsemble Learner XGBoost Predictor Scorer XGBoost TreeEnsemble Learner XGBoost Predictor Table Rowto Variable Scorer

Nodes

Extensions

Links