
1) Read the Adidas_US_Sales.xlsx file
2) Split data into training and test subsets with 80:20 ratio, stratified sampling on the "Product" column and random seed equal to 42
3) With the Normalizer node, apply Min-Max Normalization (with Min = 0 and Max = 1) to training data for all the numeric features except for "Units Sold"
4) With the Normalizer (Apply) node, perform the same normalization on test data
5) Improve the following models as much as you can:
   a) Random forest for regression (selecting the "Use static random seed" option and setting its value equal to 1)
   b) Random forest for classification (selecting the "Use static random seed" option and setting its value equal to 1)

HINT: for both of them, consider "Limit number of levels (tree depth)", "Minimum node size" and "Number of models" as hyperparameters to tune.

NOTE that the following minimum performance requirements must be met:
- for regression: R^2 > 0.880
- for classification: accuracy > 81.00%

QUESTION 1: what is the best R^2 value achieved for regression?
QUESTION 2: what is the best accuracy achieved for classification?
QUESTION 3: explain the strategy applied to find the best hyperparameter settings.

NOTE 1: export the workflow with the node configurations providing the best results achieved.
NOTE 2: remember not to reset the nodes when you export the workflow.
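For readers who want to see the logic of steps 2 to 4 outside KNIME, here is a minimal plain-Python sketch. It is not the workflow itself: the helper names (`stratified_split`, `fit_min_max`, `apply_min_max`), the toy rows and the "Price" column are assumptions made for illustration, and Python's seeded shuffle will not reproduce the exact partition KNIME produces with seed 42. What it does show is that the min and max are learned on the training rows only and then reused unchanged on the test rows, which is the role of the Normalizer and Normalizer (Apply) pair.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, train_fraction=0.8, seed=42):
    """Split rows into train/test so each value of `key` keeps roughly the same share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    train, test = [], []
    for members in groups.values():
        members = members[:]          # copy so the caller's list is not shuffled
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def fit_min_max(rows, columns):
    """Learn (min, max) per column from the given rows (training data only)."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def apply_min_max(rows, params):
    """Rescale columns to [0, 1] using previously learned (min, max) pairs."""
    scaled_rows = []
    for r in rows:
        scaled = dict(r)
        for c, (lo, hi) in params.items():
            scaled[c] = 0.0 if hi == lo else (r[c] - lo) / (hi - lo)
        scaled_rows.append(scaled)
    return scaled_rows


# Toy data: "Price" is a hypothetical numeric feature, not taken from the real file.
rows = [
    {"Product": product, "Price": float(10 + i), "Units Sold": 100 + i}
    for i, product in enumerate(["A"] * 10 + ["B"] * 10)
]

train, test = stratified_split(rows, key="Product")
params = fit_min_max(train, ["Price"])   # "Units Sold" is deliberately left out
train_n = apply_min_max(train, params)
test_n = apply_min_max(test, params)     # test values may fall slightly outside [0, 1]
```

Because the test rows are scaled with the training minimum and maximum, a test value can land below 0 or above 1; that is expected and is not a sign of a misconfigured node.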
Workflow nodes: Excel Reader, Partitioning, Normalizer, Normalizer (Apply), Random Forest Learner, Random Forest Predictor, Random Forest Learner (Regression), Random Forest Predictor (Regression), Scorer, Numeric Scorer, Parameter Optimization Loop Start (x2), Parameter Optimization Loop End (x2)
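The Parameter Optimization Loop Start and Parameter Optimization Loop End nodes wrap a learner, predictor and scorer so that each iteration tries one hyperparameter combination and reports its score. One possible strategy, sketched below in plain Python, is an exhaustive search over a small grid of the three hyperparameters named in the hint. This is an assumption about how such a search could be organised, not a record of the settings used in this workflow: the grid values, the parameter keys and `toy_evaluate` are all hypothetical, and in practice `evaluate` would train a random forest with the given settings and return R^2 or accuracy.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Score every combination in `grid`; return (best_score, best_params)."""
    names = list(grid)
    best_score, best_params = float("-inf"), None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:       # keep the first combination with the top score
            best_score, best_params = score, params
    return best_score, best_params


# Hypothetical search space for the three hyperparameters named in the hint.
grid = {
    "tree_depth": [5, 10, 15],
    "min_node_size": [1, 5, 10],
    "number_of_models": [50, 100, 200],
}


def toy_evaluate(params):
    """Placeholder score, NOT a real model: peaks at depth 10, node size 5, 100 models."""
    return -(
        (params["tree_depth"] - 10) ** 2
        + (params["min_node_size"] - 5) ** 2
        + abs(params["number_of_models"] - 100)
    )


best_score, best_params = grid_search(toy_evaluate, grid)
```

An exhaustive grid grows multiplicatively with each added value (here 3 x 3 x 3 = 27 evaluations), so a coarse grid followed by a finer one around the best point is a common way to keep the number of iterations manageable.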
