
Biomass Analysis


Workflow for the submission. The dataset is split into a training set, a validation set, and a test set; after tuning, the model is refit on a new training set (train + val).

Using the column filter, I extract the predicted column and the target column. The scatter plot lets me compare each sample with its prediction.

I used a correlation filter to eliminate highly correlated columns (threshold = 0.7). This removes columns that may contain redundant information, reduces the risk of overfitting, and simplifies the model. It also helped me eliminate some columns with many missing values.

I decided to manually remove the 'Lignin' column because it has a high number of missing values and is also correlated with the 'Hemicellulose' column.

The model uses the boosting approach (150 rounds): at each step, a new regression tree is generated to correct the errors of the previous ones by fitting the residuals they leave on the training instances. The model is robust because it combines the predictions of all the trees. In addition, XGBoost handles missing values automatically during training by learning, at each split, the default direction for missing values that maximizes the reduction of the error.

The output provides the best configuration of model hyperparameters found by the parameter optimization loop (lr = 0.2, max_depth = 14). These are automatically set as the hyperparameters of the new learner, which is then trained on all the training data (train + val).

With this column filter, I eliminate all the features that were filtered out during the previous training (correlation filter + column filter).
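As an illustration, the correlation filtering step can be sketched in Python (a minimal sketch, not the KNIME Correlation Filter node itself; the column names and toy data are invented for the example):

```python
import numpy as np

def correlation_filter(X, names, threshold=0.7):
    """Drop one column of every pair whose absolute Pearson
    correlation exceeds the threshold (0.7, as in the workflow)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []          # column indices retained so far
    dropped = set()    # names of discarded columns
    for i in range(X.shape[1]):
        if any(corr[i, j] > threshold for j in keep):
            dropped.add(names[i])
        else:
            keep.append(i)
    return X[:, keep], [names[i] for i in keep], dropped

# Toy data: 'Lignin' is an almost exact copy of 'Hemicellulose',
# while 'Ash' is independent noise.
rng = np.random.default_rng(0)
hemi = rng.normal(size=100)
lignin = hemi + rng.normal(scale=0.01, size=100)
ash = rng.normal(size=100)
X = np.column_stack([hemi, lignin, ash])
X_f, kept, dropped = correlation_filter(X, ["Hemicellulose", "Lignin", "Ash"])
print(kept)      # ['Hemicellulose', 'Ash']
print(dropped)   # {'Lignin'}
```

The KNIME node additionally lets you choose which of the two correlated columns to keep; here the first column encountered wins.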
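The boosting idea described above can be sketched as a toy gradient-boosting loop (a simplified illustration, not XGBoost itself; the one-feature "stump" weak learner and the sine-curve data are invented for the example):

```python
import numpy as np

def boost(X, y, n_rounds=150, lr=0.2):
    """Minimal gradient boosting for squared error: each round fits a
    weak learner (a one-split 'stump') to the current residuals --
    the simplified idea behind the workflow's 150-round XGBoost model."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_rounds):
        resid = y - pred
        best = None
        # exhaustive search for the best single split on feature 0
        for t in np.unique(X[:, 0]):
            left = X[:, 0] <= t
            if left.all() or not left.any():
                continue
            lv, rv = resid[left].mean(), resid[~left].mean()
            err = ((resid - np.where(left, lv, rv)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, t, lv, rv)
        _, t, lv, rv = best
        # shrink each tree's contribution by the learning rate
        pred += lr * np.where(X[:, 0] <= t, lv, rv)
    return pred

rng = np.random.default_rng(1)
X = rng.uniform(size=(80, 1))
y = np.sin(4 * X[:, 0])
pred = boost(X, y, n_rounds=150, lr=0.2)
mae = np.abs(y - pred).mean()
print(round(mae, 3))   # small training MAE after 150 rounds
```

Real XGBoost also adds regularization, second-order gradients, and the learned default direction for missing values at each split, none of which this sketch attempts.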
Hyperparameter search space: learning rate = (0.1, 0.2), step = 0.1; max depth = (10, 20), step = 1. Each configuration is evaluated with 10-fold cross validation, scored by MAE (mean absolute error).

Workflow steps: read the dataset (Excel Reader); partition it 80% - 20%; compute the correlation matrix and apply the 0.7 correlation threshold; run the parameter optimization loop; retrain on train + val; read the test set; predict on the test set; round the prediction column (precision = 1); write the prediction file and the submit file.

Nodes used: Excel Reader, CSV Reader, Partitioning, Column Filter, Linear Correlation, Correlation Filter, Parameter Optimization Loop Start, Parameter Optimization Loop End, Table Row To Variable, X-Partitioner, X-Aggregator, XGBoost Tree Ensemble Learner (Regression), XGBoost Predictor (Regression), Numeric Scorer, Round Double, Scatter Plot, CSV Writer.
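The parameter optimization loop can be sketched as an exhaustive grid search (a minimal sketch; the toy scoring function is a stand-in for the 10-fold cross-validated MAE and is constructed so its optimum matches the reported lr = 0.2, max_depth = 14):

```python
import itertools

def grid_search(score_fn):
    """Replicates the Parameter Optimization Loop: learning rate in
    (0.1, 0.2) with step 0.1, max depth in (10, 20) with step 1;
    keep the configuration with the lowest MAE."""
    best_mae, best_params = float("inf"), None
    for lr, depth in itertools.product([0.1, 0.2], range(10, 21)):
        mae = score_fn(lr, depth)   # 10-fold CV MAE in the real workflow
        if mae < best_mae:
            best_mae = mae
            best_params = {"learning_rate": lr, "max_depth": depth}
    return best_mae, best_params

# Hypothetical stand-in scorer whose minimum sits at lr=0.2, depth=14,
# the optimum the workflow reports.
toy_mae = lambda lr, d: abs(lr - 0.2) + abs(d - 14) / 100
best_mae, best_params = grid_search(toy_mae)
print(best_params)   # {'learning_rate': 0.2, 'max_depth': 14}
```

In KNIME, Table Row To Variable then feeds the winning configuration into the final learner as flow variables.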
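The final rounding and submission-writing steps (Round Double with precision = 1, then CSV Writer) can be sketched as follows (a minimal sketch; the ids, predictions, column names, and file path are invented for the example):

```python
import csv
import os
import tempfile

def write_submission(ids, preds, path):
    """Round predictions to one decimal (precision = 1, as in the
    Round Double node) and write the submission CSV."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id", "prediction"])
        for i, p in zip(ids, preds):
            w.writerow([i, round(p, 1)])

# Hypothetical sample ids and predictions for illustration.
path = os.path.join(tempfile.gettempdir(), "submit.csv")
write_submission([1, 2], [3.14159, 2.71828], path)
print(open(path).read())
```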
