
Projekt_DMMLBreastCancer

Normalizer: scales the numerical variables to the range 0 to 1.
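
Scaling to the range 0 to 1 is commonly done with min-max normalisation. The following is a minimal sketch under that assumption; the function name and the sample values are hypothetical and not taken from the workflow.

```python
def min_max_normalize(values):
    """Scale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical measurements, for illustration only.
radius = [10.0, 15.0, 20.0, 30.0]
scaled = min_max_normalize(radius)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```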

Column Filter: removes the ID column (an identifier, not a measurement) and the empty Column32 column so that neither can bias the prediction.

Data partitioning of the data set:

  • 80% for training (Row Sampler)

  • 20% for testing (Reference Row Filter)
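
The two-step partitioning above (draw a random 80% sample, then keep every row that was not drawn) can be sketched as follows. This is an illustration in plain Python, not the nodes' actual implementation; `split_rows`, the seed, and the row count of 569 (the size of the UCI data set) are assumptions.

```python
import random


def split_rows(row_ids, train_fraction=0.8, seed=42):
    """Return (train, test): a random sample and the rows left over."""
    rng = random.Random(seed)
    n_train = round(len(row_ids) * train_fraction)
    train = set(rng.sample(row_ids, n_train))
    # Reference-filter step: keep only rows that are NOT in the training sample.
    test = [r for r in row_ids if r not in train]
    return sorted(train), test


row_ids = list(range(569))  # assumed number of rows in the data set
train_ids, test_ids = split_rows(row_ids)
print(len(train_ids), len(test_ids))  # 455 114
```

Deriving the test set from the training sample, rather than sampling twice, guarantees that the two partitions never overlap.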

Random forest model: builds a forest of decision trees for:

  • Classification of tumours (Random Forest Learner)

  • Prediction based on test data (Random Forest Predictor)
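
For readers who want to reproduce the learner/predictor pair outside KNIME, a rough equivalent in scikit-learn might look like this. It is a sketch, not the workflow itself: it assumes scikit-learn is installed, uses the copy of the same UCI data set that ships with scikit-learn, and the number of trees and the random seed are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# The bundled data set has 30 numeric features; target 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# "Learner" step: fit the forest on the training partition.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# "Predictor" step: classify the held-out test rows.
predictions = forest.predict(X_test)
test_accuracy = forest.score(X_test, y_test)
print(len(predictions), round(test_accuracy, 3))
```

The exact accuracy depends on the split and the seed, so it will not necessarily match the figure reported by the workflow.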

Scorer: calculates the confusion matrix and the overall accuracy of the model.
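
How a confusion matrix and the accuracy follow from the actual and predicted labels can be sketched in a few lines. The labels below are hypothetical and only illustrate the calculation; B stands for benign and M for malignant.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=("B", "M")):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Share of rows whose predicted label equals the actual label."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical labels, for illustration only.
actual = ["B", "B", "B", "M", "M", "B", "M", "B"]
predicted = ["B", "B", "M", "M", "M", "B", "B", "B"]

matrix = confusion_matrix(actual, predicted)
print(matrix)                       # [[4, 1], [1, 2]]
print(accuracy(actual, predicted))  # 0.75
```

In this layout the off-diagonal cells are the errors: `matrix[0][1]` counts false positives (B predicted as M) and `matrix[1][0]` counts false negatives (M predicted as B).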

Metric            Value   Note
Accuracy          98.2%   The model made the correct diagnosis almost every time.
False negatives   1       Only one patient with a malignant tumour (M) was predicted as healthy (B).
False positives   1       Only one healthy person (B) was predicted as ill (M).
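
The figures in the table are consistent with each other if the test partition holds 114 rows, which is an assumption here: 20% of the 569 rows of the UCI data set. A short check of the arithmetic:

```python
def accuracy_from_errors(n_test, false_negatives, false_positives):
    """Accuracy when every row that is not a false negative or false positive is correct."""
    correct = n_test - false_negatives - false_positives
    return correct / n_test


# Assumed test-set size: 20% of 569 rows, i.e. 114 rows.
acc = accuracy_from_errors(n_test=114, false_negatives=1, false_positives=1)
print(f"{acc:.1%}")  # 98.2%
```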

Dataset UCI "Breast Cancer Wisconsin (Diagnostic)"

Counting Loop Start repeats the entire training and testing process 10 times, each time on a newly shuffled random split of the data, so that the random forest model is validated on more than a single partition.

Statistics calculates descriptive statistics for the 30 variables in order to identify any anomalies and confirm that the empty column Column32 should be excluded from the model.
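
The kind of check described here, summarising each column and spotting one that contains no values, can be sketched as follows. The column name `radius_mean` and all cell values are hypothetical; `None` stands for a missing cell.

```python
import statistics


def describe(column):
    """Summarise one column; None marks a missing cell."""
    present = [v for v in column if v is not None]
    summary = {"count": len(present), "missing": len(column) - len(present)}
    if present:
        summary["min"] = min(present)
        summary["max"] = max(present)
        summary["mean"] = statistics.mean(present)
    return summary


# Hypothetical excerpt of the table, for illustration only.
table = {
    "radius_mean": [10.0, 15.0, 20.0, 30.0],
    "Column32": [None, None, None, None],
}

report = {name: describe(col) for name, col in table.items()}
empty_columns = [name for name, s in report.items() if s["count"] == 0]
print(report["radius_mean"]["mean"])  # 18.75
print(empty_columns)                  # ['Column32']
```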

Color Manager converts the diagnosis variable into a consistent colour code (blue = benign, red = malignant) to facilitate visual interpretation of graphs and classification errors.

Scatter Plot projects the data onto two key dimensions to visually show that benign and malignant tumours form distinct clusters, confirming the feasibility of classification prior to model training.

Loop End compiles the metrics from the 10 iterations to produce a final table showing the stability of the model, with an average accuracy of 98.2%.
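
The loop pattern (run the same evaluation several times, then summarise the collected metrics) can be sketched as below. `repeat_evaluation` and `fake_run` are hypothetical names; `fake_run` merely stands in for one "split, train, predict, score" pass, and the accuracies it returns are made up.

```python
import statistics


def repeat_evaluation(run_once, n_iterations=10):
    """Call run_once(seed) once per iteration and summarise the accuracies."""
    accuracies = [run_once(seed) for seed in range(n_iterations)]
    return {
        "runs": accuracies,
        "mean": statistics.mean(accuracies),
        "stdev": statistics.stdev(accuracies),
    }


def fake_run(seed):
    # Placeholder for a real train/test pass; the values are invented.
    return 0.97 + 0.005 * (seed % 3)


summary = repeat_evaluation(fake_run)
print(len(summary["runs"]), round(summary["mean"], 4), round(summary["stdev"], 4))
```

Reporting the standard deviation next to the mean is one simple way to express the stability of the model across iterations.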

ROC Curve plots the classifier's true-positive rate against its false-positive rate; the AUC of 0.967 indicates that the model distinguishes well between benign and malignant tumours.
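
The AUC can be read as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one. A minimal sketch of that pairwise definition follows; the scores and labels are hypothetical, and this is not how the ROC Curve node computes its value internally.

```python
def roc_auc(scores, labels, positive="M"):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of "M", for illustration only.
labels = ["M", "M", "B", "M", "B", "B"]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

auc = roc_auc(scores, labels)
print(round(auc, 3))  # 0.889
```

An AUC of 1.0 would mean that every malignant case outranks every benign one, while 0.5 corresponds to random ranking.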

CSV Reader
Column Filter
Normalizer
Row Sampler
Reference Row Filter
Random Forest Learner
Statistics
Random Forest Predictor
Counting Loop Start
Scorer
Loop End
Color Manager
Scatter Plot
ROC Curve
