Chapter 23. Two Categorical Variables: The Chi-Square Test
STAT 145
The new test addresses a general question: is there a relationship between two categorical variables?
Let's consider the following example along with the theory:
A sample of 70 girls and boys was taken at random from a population of children (perhaps of a particular age of
interest), and then the numbers of right-handed boys, right-handed girls, left-handed boys, and left-handed girls
were recorded as shown below:
Observed        Girls   Boys   Row total
Left-handed       12      6       18
Right-handed      24     28       52
Column total      36     34       70

This is a two-way table because we have two categorical variables:
• handedness (with 2 categories)
• sex (with 2 categories)
The table's dimension is 2 x 2 (look at the number of categories in each variable).

Is there evidence of a relationship between handedness and sex in the sampled population (that is, evidence that handedness is not independent of sex)? (Use α = 0.05.)
Step 1: State hypotheses.
H0: In the sampled population, there is NO relationship between handedness and sex.
Ha: In the sampled population, there is a relationship between handedness and sex.
Note: The alternative hypothesis is called "many-sided" because it allows any kind of relationship (so it is neither one-sided nor two-sided).
Step 2: Compute expected counts.
The expected count in any cell of a two-way table when H0 is true is
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

Expected        Girls                 Boys                  Row total
Left-handed     18×36/70 = 9.26       18×34/70 = 8.74       9.26 + 8.74 = 18
Right-handed    52×36/70 = 26.74      52×34/70 = 25.26      26.74 + 25.26 = 52
Column total    9.26 + 26.74 = 36     8.74 + 25.26 = 34     70
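The expected-count rule can be checked with a short Python sketch. This is a minimal illustration, assuming the two-way table is stored as a list of rows (rows = left-handed, right-handed; columns = girls, boys); the function name `expected_counts` is hypothetical and not taken from any library.

```python
def expected_counts(observed):
    """Expected cell counts under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]  # zip(*...) walks the columns
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Observed counts from the handedness example
observed = [[12, 6], [24, 28]]
expected = expected_counts(observed)
for label, row in zip(["Left-handed", "Right-handed"], expected):
    print(label, [round(e, 2) for e in row])
```

Running it prints the rounded values from the Expected table (9.26, 8.74 and 26.74, 25.26). Note that the expected counts keep the same row and column totals as the observed table.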
Step 3: Compute the test statistic.
Draw an SRS from a large population and make a two-way table of the sample counts for two categorical variables. To test the null hypothesis H0 that there is no relationship between the row and column variables in the population, calculate the chi-square statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} \qquad\text{or}\qquad \chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
The chi-square test rejects H0 when χ² is large; P-values are calculated from a chi-square distribution (see Table D).
Think of χ² as a measure of the distance of the observed counts from the expected counts. It is always zero or positive, and it is zero only when the observed counts are exactly equal to the expected counts.
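The formula for χ² translates directly into a sum over cells. The sketch below is illustrative only, assuming both tables are given as lists of rows with matching shapes; the name `chi_square_statistic` is hypothetical. It also demonstrates the "distance" interpretation: the statistic is 0 when observed and expected counts coincide.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells of two same-shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[12, 6], [24, 28]]
expected = [[9.26, 8.74], [26.74, 25.26]]  # rounded expected counts from Step 2

print(chi_square_statistic(observed, expected))  # roughly 2.25
print(chi_square_statistic(observed, observed))  # 0.0: no distance at all
```

Because every term is a square divided by a positive expected count, no term can be negative, which is why χ² is never below zero.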
Contributions to the chi-square test statistic, $(O_{ij} - E_{ij})^2 / E_{ij}$:

                Girls                          Boys
Left-handed     (12 − 9.26)²/9.26 = 0.81       (6 − 8.74)²/8.74 = 0.86
Right-handed    (24 − 26.74)²/26.74 = 0.28     (28 − 25.26)²/25.26 = 0.30

$$\chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 0.81 + 0.28 + 0.86 + 0.30 = 2.25$$
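The cell-by-cell contributions can be reproduced with a small Python sketch. As an assumption for illustration, it recomputes each expected count from the margins without rounding (the function name `cell_contributions` is hypothetical), so the unrounded total is about 2.2524; it still rounds to the 2.25 found by hand.

```python
def cell_contributions(observed):
    """Return (O - E)^2 / E for every cell, with E = row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    result = []
    for row, r in zip(observed, row_totals):
        result_row = []
        for o, c in zip(row, col_totals):
            e = r * c / n  # unrounded expected count for this cell
            result_row.append((o - e) ** 2 / e)
        result.append(result_row)
    return result

observed = [[12, 6], [24, 28]]  # rows: left-, right-handed; columns: girls, boys
cells = cell_contributions(observed)
for label, row in zip(["Left-handed", "Right-handed"], cells):
    print(label, [round(x, 2) for x in row])
chi2 = sum(sum(row) for row in cells)
print("chi-square =", round(chi2, 2))
```

Looking at the individual contributions shows which cells pull the observed counts furthest from what H0 predicts; here the two left-handed cells contribute the most.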
Step 4: P-value and conclusion.
The degrees of freedom for the chi-square test for a two-way table are (r − 1) × (c − 1), where r is the number of rows and c is the number of columns.
In this particular problem (handedness vs. sex): df = (2 − 1) × (2 − 1) = 1.
Using Table D, in the row for df = 1 we find 2.07 < χ² = 2.25 < 2.71, which implies that 0.10 < P-value < 0.15.
Since P-value > 0.05 (we compare it to α = 0.05), we CAN'T reject H0. We do not have enough evidence to conclude that there is a relationship between handedness and sex in the sampled population.
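The degrees of freedom, P-value and decision for this example can be sketched in Python. The shortcut used here works only for df = 1: the square of a standard normal Z has a chi-square distribution with 1 degree of freedom, so P(χ² > x) = P(|Z| > √x), which the standard-library function `math.erfc` computes. The helper names are hypothetical; for other degrees of freedom you would still use Table D or statistical software.

```python
import math

def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for an r x c two-way table."""
    return (rows - 1) * (cols - 1)

def p_value_df1(chi2):
    """Upper-tail area of the chi-square distribution with df = 1 ONLY.

    P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))

chi2 = 2.25
alpha = 0.05
df = degrees_of_freedom(2, 2)
p = p_value_df1(chi2)
decision = "reject H0" if p <= alpha else "cannot reject H0"
print(f"df = {df}, P-value = {p:.2f}, {decision}")
```

Here √2.25 = 1.5, so the P-value equals P(|Z| > 1.5), about 0.13, which lies inside the 0.10 to 0.15 bracket read from Table D.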
Cell counts required for the chi-square test
You can safely use the chi-square test with P-values from the chi-square distribution when no more than 20% of
the expected counts are less than 5 and all individual expected counts are 1 or greater.
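The cell-count rule is easy to check mechanically. This is a sketch under the assumption that expected counts are stored as a list of rows; `conditions_met` is a hypothetical name, and the second table is made-up data used only to show a failing case. For the handedness example every expected count is at least 8.74, so the rule is satisfied.

```python
def conditions_met(expected):
    """At most 20% of expected counts below 5, and every expected count at least 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1

handedness_expected = [[9.26, 8.74], [26.74, 25.26]]
print(conditions_met(handedness_expected))  # True

hypothetical = [[0.8, 6.0], [7.0, 9.0]]  # made-up table, not from the notes
print(conditions_met(hypothetical))  # False: one count below 1, and 25% below 5
```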
The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by giving its degrees of freedom.
The chi-square test for a two-way table with r rows and c columns uses critical values from the chi-square distribution with (r − 1)(c − 1) degrees of freedom. The P-value is the area under the density curve of this chi-square distribution to the right of the value of the test statistic.
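To make "area to the right of the test statistic" concrete, here is a sketch that integrates the chi-square density numerically with Simpson's rule. It is an illustration, not how Table D was produced; it assumes x > 0 and truncates the integral at x + 200, where the remaining tail area is negligible. The function names are hypothetical; only `math.exp` and `math.gamma` come from Python's standard library.

```python
import math

def chi2_density(t, df):
    """Chi-square density with df degrees of freedom; zero for t <= 0."""
    if t <= 0:
        return 0.0
    half = df / 2
    return t ** (half - 1) * math.exp(-t / 2) / (2 ** half * math.gamma(half))

def area_to_right(x, df, width=200.0, n=20_000):
    """Approximate P(chi-square > x) with Simpson's rule on [x, x + width].

    Assumes x > 0 and that the area beyond x + width is negligible; n must be even.
    """
    h = width / n
    total = chi2_density(x, df) + chi2_density(x + width, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * chi2_density(x + i * h, df)
    return total * h / 3

p = area_to_right(2.25, df=1)
print(f"P-value for chi-square = 2.25 with df = 1: {p:.4f}")
```

The density is zero for non-positive values, matching the fact that chi-square distributions take only positive values. Changing `df` changes the shape of the curve, which is why the degrees of freedom must be known before a P-value can be found.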
Problem 1.
A study of the career plans of young women and men sent questionnaires to all 722 members of the senior class
in the College of Business Administration at the University of Illinois. One question asked which major within
the business program the student had chosen. Here are the data from the students who responded:
This is an example of a single sample classified according to two categorical variables (gender and major).
a) Test the null hypothesis that there is no relationship between the gender of students and their choice of major. Give a P-value and state your conclusion (use α = 0.05).
b) Verify that the expected cell counts satisfy the requirement for use of chi-square.
Problem 2.
Is there a relationship between being a pet owner and being happy?
To answer this question, a psychologist asks randomly selected individuals about their level of happiness and
whether or not they own any pet. The psychologist's observed results are listed in the table:
                 Happy   Not happy but not sad   Sad   Row total
Pet owner          75              19             18      112
Not pet owner      36              10             33       79
Column total      111              29             51
a) How many pet owners would be expected to be happy if there is no relationship between pet ownership and
happiness? (round answer to the nearest whole number)
b) If there is no relationship between pet ownership and happiness, the expected number of people who are not pet owners and are sad is 24. What is this cell's contribution to the chi-square test statistic? (Round your answer to the nearest whole number.)
c) Suppose that the chi-square value for this test was 8.5. What is the corresponding P-value, and what can we conclude?
d) To check the conditions for inference for this test of significance, we check for an SRS from the population
as well as … (continue).