Second_Assignment1

Import the dataset “Election.csv” which contains information about voters such as age, income, education, marital status, and voting decision.

Explore the dataset to check data types, distributions, and identify columns with missing values or potential data inconsistencies.
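
As a rough illustration of this exploration step outside KNIME, the sketch below counts missing values per column and guesses whether each column is numeric. The sample rows and the column names (Age, Income, Gender, Married, Vote) are assumptions standing in for the real Election.csv, whose exact layout is not shown here.

```python
import csv
import io

# Hypothetical sample standing in for Election.csv; column names are assumed.
SAMPLE = """Age,Income,Gender,Married,Vote
34,52000,F,Yes,Decided
,41000,M,No,Undecided
51,,F,,Decided
29,38000,,Yes,Undecided
"""

def _is_number(text):
    """True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def summarize(rows):
    """Return {column: {"missing": count, "numeric": bool}} for dict rows."""
    summary = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        present = [v for v in values if v != ""]  # empty string = missing
        numeric = bool(present) and all(_is_number(v) for v in present)
        summary[col] = {"missing": len(values) - len(present), "numeric": numeric}
    return summary

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = summarize(rows)
for col, info in report.items():
    print(col, info)
```

Columns reported with a non-zero missing count are the ones the next step has to fill in.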

Handle missing data.
Missing values in numerical columns are replaced with the column median, and missing values in categorical columns with the most frequent value (mode), so that no rows are dropped and the columns stay consistent.
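
The replacement rule can be sketched in plain Python as follows. This is not the Missing Value node's implementation, only a minimal equivalent under the assumption that missing cells are represented as None; the two columns and their values are made up for illustration.

```python
from statistics import median, mode

def impute(rows, numeric_cols, categorical_cols):
    """Return a copy of rows with None replaced by the column median or mode."""
    fill = {}
    for col in numeric_cols:
        fill[col] = median([r[col] for r in rows if r[col] is not None])
    for col in categorical_cols:
        fill[col] = mode([r[col] for r in rows if r[col] is not None])
    return [
        {c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
        for r in rows
    ]

# Hypothetical voter records; None marks a missing cell.
voters = [
    {"Age": 34, "Married": "Yes"},
    {"Age": None, "Married": "No"},
    {"Age": 51, "Married": None},
    {"Age": 29, "Married": "Yes"},
]
clean = impute(voters, ["Age"], ["Married"])

# Verification step: count the cells that are still missing.
remaining = sum(v is None for r in clean for v in r.values())
print(clean[1]["Age"], clean[2]["Married"], remaining)
```

The median is preferred over the mean here because it is less sensitive to outliers such as a few very high incomes.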

Verify that all missing values have been handled correctly.
Once no missing values remain, the dataset is ready for model training.

Convert categorical variables (Gender, Married, Religious) into numerical format using one-hot encoding so that they can be used in machine learning models.
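
A minimal sketch of one-hot encoding, assuming rows are dictionaries. The `Column=value` naming of the new indicator columns is a choice made for this example and may differ from what the One to Many node produces.

```python
def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

# Hypothetical records with one categorical column.
voters = [
    {"Age": 34, "Married": "Yes"},
    {"Age": 45, "Married": "No"},
]
encoded = one_hot(voters, "Married")
print(encoded)
```

Each original category becomes its own column, and exactly one of those columns is 1 in every row.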

Split the dataset into 70% training and 30% testing using stratified sampling based on the “Vote” column.
Stratifying keeps the proportion of “Undecided” and “Decided” voters approximately the same in both sets.
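
Stratified sampling can be sketched by splitting each class separately and then combining the pieces. The function name, the seed, and the synthetic 80/20 class mix below are assumptions for illustration, not values taken from the workflow.

```python
import random
from collections import defaultdict

def stratified_split(rows, label, train_fraction=0.7, seed=42):
    """Split rows into (train, test), cutting each class at train_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[label]].append(r)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy so the caller's order is untouched
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Synthetic data: 20 "Undecided" and 80 "Decided" voters.
voters = [{"id": i, "Vote": "Undecided" if i % 5 == 0 else "Decided"}
          for i in range(100)]
train, test = stratified_split(voters, "Vote")

def share(rows):
    """Fraction of rows labeled Undecided."""
    return sum(r["Vote"] == "Undecided" for r in rows) / len(rows)

print(len(train), len(test), share(train), share(test))
```

Because the cut is made per class, the minority class cannot end up concentrated in only one of the two sets, which a purely random split does not guarantee.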

Train a Logistic Regression model to classify voters as “Undecided” or “Decided”. This model is simple and interpretable, useful for understanding key relationships between variables.
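
To show what "interpretable" means in practice, here is a small logistic regression fitted by plain gradient descent. It is a teaching sketch, not the solver used by the Logistic Regression Learner node. The two features (a centered, scaled age and a 0/1 married flag), the toy data, and the encoding 1 = "Undecided" are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (Undecided) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy data: [scaled age, married flag]; younger voters are Undecided (1).
X = [[-1.0, 1], [-0.8, 0], [-0.5, 1], [0.5, 1], [0.8, 0], [1.0, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print(w, b, predictions)
```

The sign and size of each weight can be read directly: in this toy data the age weight comes out negative (older voters are less likely to be Undecided), while the married weight stays near zero because that feature carries no information here.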

Train a Decision Tree model for classification.
It builds a tree of feature-based split rules to predict voting status, and the resulting decision rules can be visualized and inspected.
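
The core operation of a decision tree is choosing the split rule at each node. The sketch below finds the best single threshold on one numeric feature using Gini impurity, one common quality measure. A full tree learner repeats this recursively over all features. The ages and labels are invented for illustration.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: younger voters are Undecided.
ages = [22, 25, 31, 47, 52, 60]
votes = ["Undecided", "Undecided", "Undecided", "Decided", "Decided", "Decided"]
threshold, impurity = best_split(ages, votes)
print(f"Rule: Age <= {threshold} (weighted Gini {impurity:.2f})")
```

The chosen threshold is exactly the kind of rule that appears at a node when the trained tree is visualized.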

Train a Random Forest model composed of multiple decision trees.
Because it combines the votes of many trees, it typically achieves higher accuracy and generalizes better than a single decision tree, which is prone to overfitting.
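
The ensemble idea can be sketched with two ingredients: each tree is trained on a bootstrap sample of the rows, and the forest predicts by majority vote. To stay short, the example uses one-split trees on a single feature. A real random forest also grows deeper trees and samples a random subset of features at each split, which is not shown here. All names and data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows):
    """Fit a one-split tree on (age, label) pairs; returns a predict function."""
    majority = Counter(lab for _, lab in rows).most_common(1)[0][0]
    best = None  # (score, threshold, left_label, right_label)
    ages = sorted({age for age, _ in rows})
    for lo, hi in zip(ages, ages[1:]):
        t = (lo + hi) / 2
        left = [lab for age, lab in rows if age <= t]
        right = [lab for age, lab in rows if age > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, t,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    if best is None:  # only one distinct age in the sample: no split possible
        return lambda age: majority
    _, t, left_label, right_label = best
    return lambda age: left_label if age <= t else right_label

def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    # Each tree sees a bootstrap sample: len(rows) draws with replacement.
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

def predict_forest(forest, age):
    """Majority vote over the predictions of all trees."""
    votes = Counter(tree(age) for tree in forest)
    return votes.most_common(1)[0][0]

data = [(22, "Undecided"), (25, "Undecided"), (31, "Undecided"),
        (47, "Decided"), (52, "Decided"), (60, "Decided")]
forest = fit_forest(data)
print(predict_forest(forest, 20), predict_forest(forest, 65))
```

Individual trees may be misled by an unlucky bootstrap sample, but their errors tend to differ, so the majority vote is more stable than any single tree.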

CSV Reader
Missing Value
Logistic Regression Learner
Statistics View
Statistics
One to Many
Table Partitioner
Decision Tree Learner
Random Forest Learner
