The most important things to look at from the crosstab are these:
First, look at the observed frequencies. These are the real counts in your data. They tell you how many females and males fall into each income group. In our table, far fewer females appear in the >50K category than males.
Second, compare those to the expected frequencies. These are the counts we would expect if sex and income were independent. This comparison is the core of the chi-square test. In our results, the observed and expected counts are noticeably different, especially for female with income = 1 and male with income = 1. That is the first sign of an association.
Third, look at the deviation column. The deviation is the observed count minus the expected count, so it tells us whether each group is above or below what was expected. A negative deviation means fewer cases than expected, and a positive deviation means more than expected. In our case, females in the >50K group are below expected, while males in the >50K group are above expected.
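To make the expected and deviation columns concrete, here is a minimal Python sketch of how they can be computed by hand. The counts are made up for illustration and are not the values from our table; the function name expected_counts is also just a name chosen for this example.

```python
# Hypothetical 2 x 2 counts, for illustration only (NOT the values from our table).
observed = {
    ("female", "<=50K"): 90,
    ("female", ">50K"): 10,
    ("male", "<=50K"): 140,
    ("male", ">50K"): 60,
}


def expected_counts(observed):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals, col_totals = {}, {}
    for (row, col), count in observed.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count
    n = sum(observed.values())
    return {
        (row, col): row_totals[row] * col_totals[col] / n
        for (row, col) in observed
    }


expected = expected_counts(observed)

# Deviation = observed - expected; negative means fewer cases than expected.
deviation = {cell: observed[cell] - expected[cell] for cell in observed}

for cell in observed:
    print(cell, observed[cell], round(expected[cell], 2), round(deviation[cell], 2))
```

With these made-up counts, the female >50K cell has a negative deviation and the male >50K cell has a positive one, which is the same pattern described above.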
Fourth, the row percent is very useful for interpretation. It tells us what percentage of each sex falls into each income group. This is where our main finding becomes very clear:
That is a large difference, and it is usually the easiest statistic to explain in writing.
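As a rough illustration of how row percent is calculated, the sketch below divides each cell by its row total. The counts are hypothetical, not the ones from our table, and row_percent is a helper name made up for this example.

```python
# Hypothetical counts by sex and income group (NOT the values from our table).
observed = {
    "female": {"<=50K": 90, ">50K": 10},
    "male": {"<=50K": 140, ">50K": 60},
}


def row_percent(table):
    """Return each cell as a percentage of its own row total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100.0 * count / row_total for col, count in cells.items()}
    return result


row_pct = row_percent(observed)
for sex, cells in row_pct.items():
    print(f"{sex}: {cells['>50K']:.1f}% in the >50K group")
```

Because each row adds up to 100%, the two sexes can be compared directly even when one group is much larger than the other.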
Fifth, look at the cell chi-square values. These show how much each cell contributes to the overall chi-square statistic. In our table, the biggest contribution comes from female, income = 1, meaning that this cell differs the most from what independence would predict.
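The cell chi-square values follow the usual formula (observed − expected)² / expected, and the overall statistic is their sum. The sketch below shows this with hypothetical counts (not our table); cell_chi_square is a name invented for this example.

```python
# Hypothetical 2 x 2 counts, for illustration only (NOT the values from our table).
observed = {
    ("female", "<=50K"): 90,
    ("female", ">50K"): 10,
    ("male", "<=50K"): 140,
    ("male", ">50K"): 60,
}


def cell_chi_square(observed):
    """Per-cell contribution (O - E)^2 / E, with E computed under independence."""
    row_totals, col_totals = {}, {}
    for (row, col), count in observed.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count
    n = sum(observed.values())
    contributions = {}
    for (row, col), count in observed.items():
        expected = row_totals[row] * col_totals[col] / n
        contributions[(row, col)] = (count - expected) ** 2 / expected
    return contributions


contrib = cell_chi_square(observed)
total_chi_square = sum(contrib.values())  # overall chi-square statistic
largest_cell = max(contrib, key=contrib.get)

for cell, value in contrib.items():
    print(cell, round(value, 3))
print("total:", round(total_chi_square, 3), "largest:", largest_cell)
```

In a 2 × 2 table every cell has the same absolute deviation, so the cell with the smallest expected count ends up with the largest contribution.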
Degrees of freedom for a 2 × 2 table, using df = (rows − 1)(columns − 1):
(2 − 1)(2 − 1) = 1
So:
Chi-square ≈ 1416.36
df = 1
p-value ≈ 0.00 (rounded; the true value is far below 0.0001, not exactly zero)
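To see why the p-value shows up as 0.00, here is a small sketch that recomputes it from the chi-square statistic. It relies on the fact that with df = 1 a chi-square variable is the square of a standard normal, so the upper-tail probability can be written with the complementary error function. The function name chi2_p_value_df1 is made up for this example, and the shortcut only applies when df = 1.

```python
import math


def chi2_p_value_df1(chi_square):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


rows, cols = 2, 2
df = (rows - 1) * (cols - 1)  # degrees of freedom for a 2 x 2 table

p_value = chi2_p_value_df1(1416.36)
print("df =", df)
print("p-value rounded to two decimals:", f"{p_value:.2f}")
```

The value is so small that any ordinary rounding prints it as zero, which is why it is usually reported as p < 0.0001 in writing.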