
CH4

The most important things to look at in the Crosstab output are these:

First, look at the observed frequencies. These are the real counts in your data. They tell you how many females and males fall into each income group. In our table, far fewer females appear in the >50K category than males.

Second, compare those to the expected frequencies. These are the counts we would expect if sex and income were independent. Comparing observed and expected counts is the core of the chi-square test. In our results, the observed and expected counts are noticeably different, especially for female with income = 1 and male with income = 1 (the >50K group). That gap is the first sign that there is an association.
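As a rough illustration of where expected frequencies come from, the sketch below computes them as row total × column total / N. The counts are made up for the example and are not the Adult dataset values; the function name expected_counts is likewise only illustrative.

```python
# Hypothetical counts for illustration only (NOT the Adult dataset values).
observed = {
    ("Female", "<=50K"): 90,
    ("Female", ">50K"): 10,
    ("Male", "<=50K"): 140,
    ("Male", ">50K"): 60,
}


def expected_counts(observed):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals, col_totals = {}, {}
    for (row, col), count in observed.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count
    n = sum(observed.values())
    return {
        (row, col): row_totals[row] * col_totals[col] / n
        for (row, col) in observed
    }


expected = expected_counts(observed)
for cell, exp in expected.items():
    print(cell, "observed:", observed[cell], "expected:", round(exp, 2))
```

In this toy table the Female, >50K cell has an observed count well below its expected count, which is the same kind of gap described above.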

Third, look at the deviation column, which tells us whether each group is above or below what was expected. A negative deviation means fewer cases than expected, and a positive deviation means more than expected. In our case, females in the >50K group are below expected, while males in the >50K group are above expected.

Fourth, look at the row percent, which is very useful for interpretation. It tells us the percentage within each sex. This is where our main finding becomes very clear:

  • about 11.37% of females are in the >50K group

  • about 31.38% of males are in the >50K group

That is a large difference, and it is usually the easiest statistic to explain in writing.
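The row percent can be sketched in a few lines of plain Python. The counts below are hypothetical (chosen only so the arithmetic is easy to follow), and row_percent is an illustrative helper, not a KNIME or pandas function. The idea matches the pandas expression quoted in the workflow's node annotations, which takes the mean of a 0/1 income column per sex and multiplies by 100.

```python
# Hypothetical counts for illustration only (NOT the Adult dataset values).
observed = {
    "Female": {"<=50K": 90, ">50K": 10},
    "Male": {"<=50K": 140, ">50K": 60},
}


def row_percent(observed, column):
    """Percentage of each row (sex) that falls into the given income column."""
    return {
        row: 100 * cells[column] / sum(cells.values())
        for row, cells in observed.items()
    }


high_income_rate = row_percent(observed, ">50K")
for sex, pct in high_income_rate.items():
    print(f"{sex}: {pct:.2f}% in the >50K group")
```

Because each percentage is computed within its own row, the two groups can be compared directly even when they have very different sizes.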

Fifth, look at the cell chi-square values. These show which cells contribute the most to the overall chi-square statistic. In our table, the biggest contribution comes from female, income = 1, meaning that this cell differs the most from what independence would predict.
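A minimal sketch of how the cell chi-square values relate to the overall statistic: each cell contributes (observed − expected)² / expected, and the contributions sum to the chi-square value. The counts are hypothetical, not the Adult dataset values, and cell_chi_square is an illustrative name.

```python
# Hypothetical counts for illustration only (NOT the Adult dataset values).
# Rows: Female, Male. Columns: <=50K, >50K.
observed = [
    [90, 10],
    [140, 60],
]


def cell_chi_square(observed):
    """Return the (observed - expected)^2 / expected contribution of every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(observed):
        contributions.append([])
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected under independence
            contributions[i].append((obs - exp) ** 2 / exp)
    return contributions


cells = cell_chi_square(observed)
chi_square = sum(sum(row) for row in cells)  # overall statistic is the sum of all cells
for label, row in zip(["Female", "Male"], cells):
    print(label, [round(value, 3) for value in row])
print("chi-square:", round(chi_square, 3))
```

In this toy table the largest contribution also comes from the Female, >50K cell, because its expected count is the smallest while the absolute deviation is the same in every cell of a 2 × 2 table.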

Degrees of freedom for a 2 × 2 table:

df = (rows − 1)(columns − 1) = (2 − 1)(2 − 1) = 1

So:

  • Chi-square ≈ 1416.36

  • df = 1

  • p-value < 0.001 (displayed as 0.00 after rounding)
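To see why a statistic this large is shown as a p-value of 0.00, here is a small sketch that works only for df = 1. It relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so the upper-tail probability can be written with the complementary error function. The helper name chi_square_p_value_df1 is illustrative.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom.

    For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)), with Z standard normal.
    """
    return math.erfc(math.sqrt(statistic / 2))


p = chi_square_p_value_df1(1416.36)
print(f"rounded to two decimals: {p:.2f}")
print("below 0.001:", p < 0.001)

# Sanity check against the familiar 5% critical value for df = 1 (about 3.841).
print(round(chi_square_p_value_df1(3.841), 3))
```

The p-value is not literally zero. It is simply far smaller than two decimal places can show, which is why it is better reported as p < 0.001.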

For the correlation coefficients from the Linear Correlation node:

  • values close to 1 = strong positive linear relationship

  • values close to -1 = strong negative linear relationship

  • values close to 0 = weak or no linear relationship
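The three ranges above can be illustrated with a hand-written Pearson correlation, assuming they describe the Pearson coefficient that a linear correlation of numeric columns reports. The data below are toy values invented for the example, and pearson_r is an illustrative helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data for illustration only.
xs = [1, 2, 3, 4, 5]
increasing = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises in step with x
decreasing = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls in step with x
unrelated = pearson_r(xs, [3, 1, 4, 1, 3])     # no linear trend in y

print(round(increasing, 3), round(decreasing, 3), round(unrelated, 3))
```

A coefficient near 0 only rules out a linear relationship. Two columns can still be related in a non-linear way.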

Converts any "?" into missing values.
Expression
removes the "MISSING" values from the dataset
Expression Row Filter
Grouping by gender and age to get the total gender count
GroupBy
Grouping by race and gender to give the count per race and gender
GroupBy
grouping by race to get the total race count
GroupBy
Box Plot: Age by Sex
Box Plot
Age by Sex
Conditional Box Plot (legacy)
randomly generates balanced dataset of 4000 people from each gender.
Python Script
Sex by hours_per_week
Conditional Box Plot (legacy)
Box Plot
Sex by Education_num
Conditional Box Plot (legacy)
Box Plot: Sex by Education_num
Box Plot
Race by Hours per Week
Conditional Box Plot (legacy)
Box Plot: Race by Hours per Week
Box Plot
Race by Education_num
Conditional Box Plot (legacy)
Box Plot: Race by Education_num
Box Plot
Bar Chart
Gender Distribution in Adult Dataset
Pie Chart
GroupBy
Importing the adult.data dataset to check for bias.
File Reader
Race Distribution in Adult Dataset
Bar Chart
GroupBy
This is the KNIME version of "income_rates = data.groupby('sex')['income_binary'].mean() * 100"
Math Formula
The chi-squared test examined whether sex and income outcome are associated.
Crosstab
Race Distribution in Adult Dataset
Pie Chart
randomly generates balanced dataset of 2000 people from each gender.
Python Script
Expression
Column Filter
This node computes pairwise correlations for the selected numeric columns, which is the KNIME equivalent of the pandas correlation matrix for this task
Linear Correlation
income rate (>50K) by sex
Bar Chart
