04 Model Building on Big Data

Exercise 4: Model Building on Big Data

In this exercise you'll train a prediction model in Spark.

1) Create a local big data environment. You can use the default configuration.
2) Read the Spark/airline_training.parquet and Spark/airline_test.parquet folders into Spark (Parquet to Spark nodes). If the files don't exist, execute the Write parquet files metanode first.
3) Filter out the following columns from the training set:
- ArrTime
- DepTime
- All .*Delay columns except the target column DepartureDelay
- UniqueCarrier, TailNum, and Origin
4) Train a Random Forest model to predict departure delay.
- Apply entropy as the quality measure and increase the maximum depth to 10
5) Apply the model to the test set. Append individual class probabilities.
6) Check the confusion matrix of the model.
7) Draw an ROC curve of the model.

Write parquet files: Execute only if the training and test sets don't exist in HDFS.
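
The column filter in step 3 combines an explicit list with a regular expression and one exception for the target column. As a rough illustration of that selection logic outside KNIME, here is a minimal Python sketch. The column list is hypothetical (the actual columns of the Parquet files may differ), and the helper name keep_column is made up for this example; it is not part of the workflow.

```python
import re

# Hypothetical column list, loosely modeled on an airline data set.
# The real columns in airline_training.parquet may differ.
columns = [
    "Month", "DayOfWeek", "DepTime", "ArrTime", "UniqueCarrier",
    "TailNum", "Origin", "Dest", "Distance", "ArrDelay",
    "WeatherDelay", "DepartureDelay",
]

TARGET = "DepartureDelay"  # target column named in the exercise
EXPLICIT_DROPS = {"ArrTime", "DepTime", "UniqueCarrier", "TailNum", "Origin"}
DELAY_PATTERN = re.compile(r".*Delay")  # the wildcard pattern from step 3


def keep_column(name):
    """Return True if the column should stay in the training set."""
    if name == TARGET:
        # The target matches the pattern but must be kept.
        return True
    if name in EXPLICIT_DROPS:
        return False
    # Drop every other column whose full name matches .*Delay.
    return DELAY_PATTERN.fullmatch(name) is None


kept = [c for c in columns if keep_column(c)]
print(kept)
```

The target check comes first because DepartureDelay itself matches .*Delay; testing the pattern before the exception would remove the column the model is supposed to predict.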
