This workflow shows how to run cross-validation in H2O with the KNIME H2O nodes. The example trains an H2O Random Forest to predict the multiclass response of the IRIS data set using 5 folds and evaluates the cross-validated performance.
1. Prepare:
Import the IRIS data into H2O.
2. Cross Validation:
To run cross-validation with the KNIME H2O nodes, we use the "H2O Cross Validation Loop Start" node and configure it for 5-fold cross-validation with stratified fold assignment, so that each fold keeps roughly the same class proportions as the full data set. The upper output port provides the training data and the lower output port the test data.
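To make the idea of stratified fold assignment concrete, here is a minimal pure-Python sketch. It is not the H2O or KNIME implementation; the function names `stratified_fold_ids` and `cv_splits`, the seed, and the round-robin dealing strategy are assumptions chosen for illustration. The two lists yielded per fold play the role of the upper (training) and lower (test) output ports.

```python
import random
from collections import defaultdict


def stratified_fold_ids(labels, n_folds=5, seed=42):
    """Assign each row a fold id in [0, n_folds) so that every class is
    spread as evenly as possible across the folds (illustrative only)."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    fold_ids = [None] * len(labels)
    for label in sorted(rows_by_class):
        rows = rows_by_class[label]
        rng.shuffle(rows)
        # Deal the shuffled rows of this class round-robin over the folds.
        for position, row in enumerate(rows):
            fold_ids[row] = position % n_folds
    return fold_ids


def cv_splits(fold_ids, n_folds=5):
    """Yield (train_rows, test_rows) per fold, mirroring the two loop outputs."""
    for fold in range(n_folds):
        train = [r for r, f in enumerate(fold_ids) if f != fold]
        test = [r for r, f in enumerate(fold_ids) if f == fold]
        yield train, test


# Iris-shaped labels: 150 rows, 50 per class.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
fold_ids = stratified_fold_ids(labels)
splits = list(cv_splits(fold_ids))
```

With 50 rows per class and 5 folds, each test split in this sketch holds 30 rows, 10 from every class, and the remaining 120 rows form the training split.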
3. Learn Models in Cross Validation Loop:
For each CV fold, H2O builds a Random Forest with 50 trees of maximum depth 15 on the training data of that fold. The model then predicts the test data of the fold, appending the class-specific membership probabilities (needed for multinomial scoring), and the predictions are scored by the "H2O Multinomial Scorer" node.
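The class probabilities are needed because metrics such as LogLoss are computed from the probability assigned to the true class, not from the predicted label alone. The following sketch shows how accuracy and multinomial LogLoss can be derived from such probabilities. The function name `multinomial_scores`, the clipping constant `eps`, and the three example rows are assumptions for illustration and do not reproduce the scorer node's exact implementation.

```python
import math


def multinomial_scores(true_labels, probabilities, eps=1e-15):
    """Return (accuracy, logloss) for rows of per-class probabilities.

    `probabilities` is a list of dicts mapping class name -> predicted
    probability. `eps` clips probabilities away from 0 and 1 so the
    logarithm stays finite (the value is an assumption).
    """
    correct = 0
    total_log_loss = 0.0
    for truth, probs in zip(true_labels, probabilities):
        # The predicted class is the one with the highest probability.
        predicted = max(probs, key=probs.get)
        if predicted == truth:
            correct += 1
        p_true = min(max(probs[truth], eps), 1.0 - eps)
        total_log_loss -= math.log(p_true)
    n = len(true_labels)
    return correct / n, total_log_loss / n


# Made-up example rows, not output of the workflow.
truth = ["setosa", "versicolor", "virginica"]
probs = [
    {"setosa": 0.9, "versicolor": 0.05, "virginica": 0.05},
    {"setosa": 0.1, "versicolor": 0.6, "virginica": 0.3},
    {"setosa": 0.2, "versicolor": 0.5, "virginica": 0.3},
]
accuracy, logloss = multinomial_scores(truth, probs)
```

In this example the third row is misclassified, so two of three rows are correct, while the LogLoss additionally penalizes how little probability was given to the true class.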
4. Score:
To evaluate the overall performance across all trained random forests, we use the "GroupBy" node to average the per-fold metrics, such as Accuracy and LogLoss.
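The aggregation step amounts to averaging each metric column over the five folds. A minimal sketch of that computation follows; the `fold_scores` values are made-up placeholders rather than results of this workflow, and the `aggregate` function and the added standard deviation are assumptions for illustration.

```python
from statistics import mean, stdev

# Hypothetical per-fold scorer output. The numbers are placeholders,
# not measurements from the workflow.
fold_scores = [
    {"fold": 0, "accuracy": 0.9, "logloss": 0.3},
    {"fold": 1, "accuracy": 1.0, "logloss": 0.1},
    {"fold": 2, "accuracy": 0.9, "logloss": 0.3},
    {"fold": 3, "accuracy": 1.0, "logloss": 0.1},
    {"fold": 4, "accuracy": 0.9, "logloss": 0.3},
]


def aggregate(fold_scores, metrics=("accuracy", "logloss")):
    """Average each metric over all folds, similar to a mean aggregation
    in a group-by; the spread across folds is reported as well."""
    summary = {}
    for metric in metrics:
        values = [row[metric] for row in fold_scores]
        summary[metric] = {"mean": mean(values), "stdev": stdev(values)}
    return summary


summary = aggregate(fold_scores)
```

Reporting the standard deviation next to the mean is optional, but it indicates how much the performance varies from fold to fold.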