Chi-square Test of Independence Reviewing the Concept of Independence

SW318
Social Work
Statistics
Slide 1
Chi-square Test of Independence
Reviewing the Concept of Independence
Steps in Testing Chi-square Test of
Independence Hypotheses
Chi-square Test of Independence
in SPSS
Slide 2
Chi-square Test of Independence



The chi-square test of independence is probably the
most frequently used hypothesis test in the social
sciences.
In this exercise, we will use the chi-square test of
independence to evaluate group differences when
the test variable is nominal, dichotomous, ordinal, or
grouped interval.
The chi-square test of independence can be used for
any variable that defines groups; the group (independent)
and the test variable (dependent) can be nominal,
dichotomous, ordinal, or grouped interval.
Slide 3
Independence Defined


Two variables are independent if, for all cases, the
classification of a case into a particular category of
one variable (the group variable) has no effect on
the probability that the case will fall into any
particular category of the second variable (the test
variable).
When two variables are independent, there is no
relationship between them. We would expect the
frequency breakdowns of the test variable to be
similar for all groups.
Slide 4
Independence Demonstrated



Suppose we are interested in the relationship
between gender and attending college.
If there is no relationship between gender and
attending college and 40% of our total sample attend
college, we would expect 40% of the males in our
sample to attend college and 40% of the females to
attend college.
If there is a relationship between gender and
attending college, we would expect a higher
proportion of one group to attend college than the
other group, e.g. 60% to 20%.
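A small Python sketch of the 40% example, comparing each group's proportion with the total sample's proportion. The counts (100 males and 100 females) are hypothetical, and which group gets 60% in the dependent case is an arbitrary choice for illustration.

```python
# Hypothetical counts, not real data: (number attending college, group size)
independent = {"males": (40, 100), "females": (40, 100)}
dependent = {"males": (60, 100), "females": (20, 100)}


def proportions(counts):
    """Proportion attending college in each group and in the total sample."""
    result = {group: attend / size for group, (attend, size) in counts.items()}
    total_attend = sum(attend for attend, _ in counts.values())
    total_size = sum(size for _, size in counts.values())
    result["total"] = total_attend / total_size
    return result


for label, counts in (("independent", independent), ("dependent", dependent)):
    print(label, {group: f"{p:.0%}" for group, p in proportions(counts).items()})
```

In the independent case every group matches the total (40%); in the dependent case the groups split 60% and 20% around the same 40% total.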
Slide 5
Displaying Independent and Dependent
Relationships
When group membership makes
a difference, the dependent
relationship is indicated by one
group having a higher
proportion than the proportion
for the total sample.
When the variables are
independent, the proportion in
each group is close to the
proportion for the total
sample.
Figure: two bar charts of the proportion attending college for Males, Females, and Total. Independent Relationship between Gender and College: 40% for all three bars. Dependent Relationship between Gender and College: 60% and 20% for the two groups, 40% for the total.
Slide 6
Expected Frequencies



Expected frequencies are computed as if there is no
difference between the groups, i.e. both groups have
the same proportion as the total sample in each
category of the test variable.
Since the proportion of subjects in each category of
the group variable can differ, we take group category
into account in computing expected frequencies as
well.
To summarize, the expected frequencies for each
cell are computed to be proportional to both the
breakdown for the test variable and the breakdown
for the group variable.
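The calculation described above can be sketched in Python. This is a minimal illustration, not SPSS's implementation; the function names and the table of counts are hypothetical. The group sizes are deliberately unequal (100 and 50) to show how the breakdown of the group variable enters the calculation.

```python
def expected_frequencies(observed):
    """Expected count for every cell if the two variables were independent.

    observed: list of rows (categories of the test variable), each a list of
    counts for the columns (categories of the group variable).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Each cell gets the total sample's proportion for its row,
    # scaled to the size of its column (group).
    return [[r * c / n for c in col_totals] for r in row_totals]


def assumption_met(expected, minimum=5):
    """True when no cell has an expected frequency below the minimum."""
    return min(min(row) for row in expected) >= minimum


# Hypothetical table: rows = attends college (yes, no); columns = males (100), females (50)
observed = [[30, 30],
            [70, 20]]
expected = expected_frequencies(observed)
print(expected)                  # [[40.0, 20.0], [60.0, 30.0]]
print(assumption_met(expected))  # True
```

Both groups are expected to have 40% attending college (40 of 100 and 20 of 50), the same proportion as the total sample (60 of 150).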
Slide 7
Expected Frequency Calculation
The data from “Observed Frequencies for Sample Data” are
the source of the information used to compute the expected
frequencies. Percentages are computed for the column of
all students and for the row of all GPAs. For each cell, the
row percentage and the column percentage are multiplied
together and then by the total number of students in the
sample (453) to compute the expected frequency for that cell.
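Written as a formula (the symbols are introduced here, not taken from the slides): for the cell in row i and column j, with row total R_i, column total C_j, and sample size N (453 in this example), the expected frequency is

```latex
E_{ij} = \left(\frac{R_i}{N}\right)\left(\frac{C_j}{N}\right) N = \frac{R_i \, C_j}{N}
```

Multiplying the two marginal proportions by N gives the same value as multiplying the row and column totals and dividing by N.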
Slide 8
Expected Frequencies versus Observed
Frequencies


The chi-square test of independence plugs the
observed frequencies and expected frequencies into
a formula which computes how the pattern of
observed frequencies differs from the pattern of
expected frequencies.
Probabilities for the test statistic can be obtained
from the chi-square probability distribution so that
we can test hypotheses.
Slide 9
Independent and Dependent Variables

The two variables in a chi-square test of
independence each play a specific role.



The group variable is also known as the independent
variable because it is believed to influence the test variable.
The test variable is also known as the dependent variable
because its value is believed to be dependent on the value
of the group variable.
The chi-square test of independence is a test of the
influence or impact that a subject’s value on one
variable has on the same subject’s value for a second
variable.
Slide 10
Step 1. Assumptions for the Chi-square Test



The chi-square test of independence can be used for
variables at any level of measurement, including interval
level variables grouped in a frequency distribution. It is
most useful for nominal variables, for which we do not
have another option.
Assumption: no cell has an expected frequency less
than 5.
If this assumption is violated, the chi-square
distribution will give us misleading probabilities.
Slide 11
Step 2. Hypotheses and alpha



The research hypothesis states that the two variables
are dependent or related. This will be true if the
observed counts for the categories of the variables in
the sample are different from the expected counts.
The null hypothesis is that the two variables are
independent. This will be true if the observed counts
in the sample are similar to the expected counts.
The amount of difference needed to make a decision
about difference or similarity is the amount
corresponding to the alpha level of significance,
which will be either 0.05 or 0.01. The value to use
will be stated in the problem.
Slide 12
Step 3. Sampling distribution and test statistic


To test the relationship, we use the chi-square test statistic, which follows the chi-square distribution.
If we were calculating the statistic by hand,
we would have to compute the degrees of
freedom to identify the probability of the
test statistic. SPSS prints the degrees
of freedom and the probability of the test
statistic for us.
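For readers who want to see what SPSS does for us, here is a minimal pure-Python sketch of the degrees of freedom, (r - 1)(c - 1), and the upper-tail chi-square probability for an integer number of degrees of freedom, using the standard closed-form series for even and odd df. The function names are hypothetical. The example assumes [fund] from practice problem 1 has three categories, so the table is 3 × 2 with df = 2; under that assumption the sketch returns p ≈ 0.244, matching the SPSS output on that slide.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square distribution
    with a positive integer number of degrees of freedom."""
    x = statistic
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series starting at sqrt(2x/pi) * exp(-x/2)
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df + 1) // 2):
        p += term
        term *= x / (2 * r + 1)
    return p


df = degrees_of_freedom(3, 2)  # assumed 3 x 2 table: [fund] by [sex]
print(df, round(chi_square_p_value(2.821, df), 3))  # 2 0.244
```

As a sanity check, the familiar 0.05 critical values (3.841 for df = 1, 7.815 for df = 3) give probabilities very close to 0.05.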
Slide 13
Step 4. Computing the Test Statistic


Conceptually, the chi-square test of independence
statistic is computed by squaring the difference
between the observed and expected frequencies for
each cell in the table, dividing it by the expected
frequency for that cell, and summing across all cells.
We identify the value and probability for this test
statistic from the SPSS statistical output.
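A minimal pure-Python sketch of that sum of (O - E)^2 / E over all cells, which SPSS reports as the Pearson Chi-Square. The function name and the table of counts are hypothetical.

```python
def chi_square_statistic(observed):
    """Pearson chi-square: sum over all cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for r, row in zip(row_totals, observed):
        for c, o in zip(col_totals, row):
            e = r * c / n  # expected frequency under independence
            statistic += (o - e) ** 2 / e
    return statistic


# Hypothetical table: rows = attends college (yes, no); columns = (males, females)
print(round(chi_square_statistic([[30, 30], [70, 20]]), 2))  # 12.5
```

Here the expected counts are 40, 20, 60, and 30, so the four cell contributions are 2.5, 5.0, 1.67, and 3.33. A table whose observed counts equal its expected counts gives a statistic of 0.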
Slide 14
Step 5. Decision and Interpretation


If the probability of the test statistic is less than or
equal to the alpha level of significance, we reject the
null hypothesis and conclude that our data support
the research hypothesis. We conclude that there is a
relationship between the variables.
If the probability of the test statistic is greater than
the alpha level of significance, we fail to reject the
null hypothesis. We conclude that the data do not
show a relationship between the variables, i.e. we
treat them as independent.
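The Step 5 decision rule reduces to a single comparison. In this sketch the function name is hypothetical, 0.244 is the p-value from practice problem 1, and 0.0005 is a hypothetical stand-in for an SPSS result reported as p < 0.001.

```python
def reject_null(p_value, alpha=0.05):
    """Step 5 decision rule: reject H0 when p <= alpha, otherwise fail to reject."""
    return p_value <= alpha


for p in (0.244, 0.0005):
    if reject_null(p):
        print(p, "reject H0: the variables are related")
    else:
        print(p, "fail to reject H0: no evidence of a relationship")
```

The comparison uses `<=` so that a p-value exactly equal to alpha leads to rejection, as the slide states.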
Slide 15
Which Cell or Cells Caused the Difference



We are only concerned with this procedure if the
result of the chi-square test was statistically
significant.
One of the problems in interpreting chi-square tests
is the determination of which cell or cells produced
the statistically significant difference. Examination
of percentages in the contingency table and
expected frequency table can be misleading.
The residual, the difference between the
observed frequency and the expected frequency, is a
more reliable indicator, especially if the residual is
converted to a z-score and compared to the critical
value that corresponds to the alpha for the problem.
Slide 16
Standardized Residuals



SPSS prints the standardized residual (the residual
converted to a z-score) for each cell. It does not
print a probability or significance level for it.
Without a probability, we compare the size of
each standardized residual to the critical value that
corresponds to an alpha of 0.05 (+/-1.96) or an alpha
of 0.01 (+/-2.58). The problem will tell you which
value to use. This is equivalent to testing the null
hypothesis that the actual frequency equals the
expected frequency for a specific cell against the
research hypothesis that the difference is not
zero.
There can be 0, 1, 2, or more cells with statistically
significant standardized residuals to be interpreted.
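A minimal sketch of this cell-by-cell check. It computes the Pearson residual (O - E)/√E, which is what SPSS Crosstabs labels "Standardized" (SPSS's separate "Adjusted standardized" option uses a different denominator). The function names and the table of counts are hypothetical; the sign of each significant residual is read as over- or under-representation.

```python
import math


def standardized_residuals(observed):
    """(O - E) / sqrt(E) for every cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for r, row in zip(row_totals, observed):
        expected_row = [r * c / n for c in col_totals]
        residuals.append([(o - e) / math.sqrt(e) for o, e in zip(row, expected_row)])
    return residuals


def significant_cells(residuals, critical=1.96):
    """(row, column) positions whose residual lies beyond +/- critical."""
    return [(i, j) for i, row in enumerate(residuals)
            for j, z in enumerate(row) if abs(z) > critical]


# Hypothetical table: rows = attends college (yes, no); columns = (males, females)
z = standardized_residuals([[30, 30], [70, 20]])
for i, j in significant_cells(z):  # alpha = 0.05
    direction = "over" if z[i][j] > 0 else "under"
    print((i, j), round(z[i][j], 2), direction + "-represented")
```

The residuals are about -1.58, 2.24, 1.29, and -1.83, so only the "attends college, females" cell passes +/-1.96 (over-represented), and no cell passes +/-2.58.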
Slide 17
Interpreting Standardized Residuals


Standardized residuals that have a positive value
mean that the cell was over-represented in the
actual sample, compared to the expected frequency,
i.e. there were more subjects in this category than
we expected.
Standardized residuals that have a negative value
mean that the cell was under-represented in the
actual sample, compared to the expected frequency,
i.e. there were fewer subjects in this category than
we expected.
Slide 18
Interpreting Cell Differences in
a Chi-square Test - 1
A chi-square test of
independence of the
relationship between sex
and marital status finds a
statistically significant
relationship between the
variables.
Slide 19
Interpreting Cell Differences in
a Chi-square Test - 2
Researchers often try to identify which cell or cells are the
major contributors to the significant chi-square test by examining the
pattern of column percentages.
Based on the column percentages, we would identify cells on the
married row and the widowed row as the ones producing the
significant result because they show the largest differences: 8.2% on
the married row (50.9% - 42.7%) and 9.0% on the widowed row
(13.1% - 4.1%).
Slide 20
Interpreting Cell Differences in
a Chi-square Test - 3
Using a level of significance of 0.05, the critical value for a
standardized residual would be -1.96 and +1.96. Using
standardized residuals, we would find that only the cells on the
widowed row are the significant contributors to the chi-square
relationship between sex and marital status.
If we interpreted the married row as a significant contributor,
we would be mistaken. Basing the interpretation on column
percentages can be misleading.
Slide 21
Chi-Square Test of Independence: post hoc
practice problem 1
This question asks you to use a chi-square test of independence and, if
significant, to do a post hoc test using a critical value of 1.96.
First of all, the level of measurement for the independent and the
dependent variable can be any level that defines groups (dichotomous,
nominal, ordinal, or grouped interval). The variable "degree of religious
fundamentalism" [fund] is ordinal and "sex" [sex] is dichotomous, so the
level of measurement requirements are satisfied.
Slide 22
Chi-Square Test of Independence: post hoc
test in SPSS (1)
You can conduct a chi-square test of
independence with the Crosstabs
procedure in SPSS by selecting:
Analyze > Descriptive Statistics
> Crosstabs…
Slide 23
Chi-Square Test of Independence: post hoc
test in SPSS (2)
First, select and move the
variables for the question to the
“Row(s):” and “Column(s):”
list boxes.
The variable mentioned first
in the problem, [sex], is used
as the independent variable
and is moved to the
“Column(s):” list box.
The variable mentioned
second in the problem,
[fund], is used as the
dependent variable and is
moved to the “Row(s):” list
box.
Second, click the
“Statistics…” button to
request the test
statistic.
Slide 24
Chi-Square Test of Independence: post hoc
test in SPSS (3)
First, click on “Chi-square” to
request the chi-square test of
independence.
Second, click the “Continue”
button to close the Statistics
dialog box.
Slide 25
Chi-Square Test of Independence: post hoc
test in SPSS (4)
Now click the “Cells…”
button to specify the
contents in the cells of
the crosstabs table.
Slide 26
Chi-Square Test of Independence: post hoc
test in SPSS (5)
First, make sure both
“Observed” and “Expected”
in the “Counts” section of
the “Crosstabs: Cell Display”
dialog box are checked.
In the “Residuals” section,
select “Unstandardized” and
“Standardized” residuals,
then click the “Continue” and
“OK” buttons.
Slide 27
Chi-Square Test of Independence: post hoc
test in SPSS (6)
In the Chi-Square Tests table,
SPSS also tells us that “0 cells have
expected count less than 5 and the
minimum expected count is 70.63”.
The sample size requirement for the
chi-square test of independence is
satisfied.
Slide 28
Chi-Square Test of Independence: post hoc
test in SPSS (7)
The probability of the chi-square test
statistic (chi-square=2.821) was
p=0.244, greater than the alpha level
of significance of 0.05. The null
hypothesis that differences in "degree
of religious fundamentalism" are
independent of differences in "sex" is
not rejected.
The research hypothesis that
differences in "degree of religious
fundamentalism" are related to
differences in "sex" is not supported by
this analysis.
Thus, the answer for this question is
False. We do not interpret cell
differences unless the chi-square test
statistic supports the research
hypothesis.
Slide 29
Chi-Square Test of Independence: post hoc
practice problem 2
This question asks you to use a chi-square test of independence and, if
significant, to do a post hoc test using a critical value of -1.96.
First of all, the level of measurement for the independent and the
dependent variable can be any level that defines groups (dichotomous,
nominal, ordinal, or grouped interval). [empathy3] is ordinal and [sex] is
dichotomous, so the level of measurement requirements are satisfied.
Slide 30
Chi-Square Test of Independence: post hoc
test in SPSS (8)
You can conduct a chi-square test of
independence with the Crosstabs
procedure in SPSS by selecting:
Analyze > Descriptive Statistics
> Crosstabs…
Slide 31
Chi-Square Test of Independence: post hoc
test in SPSS (9)
First, select and move the
variables for the question to the
“Row(s):” and “Column(s):”
list boxes.
The variable mentioned first
in the problem, [sex], is
used as the independent
variable and is moved to the
“Column(s):” list box.
The variable mentioned
second in the problem,
[empathy3], is used as the
dependent variable and is
moved to the “Row(s):” list
box.
Second, click the
“Statistics…” button to
request the test
statistic.
Slide 32
Chi-Square Test of Independence: post hoc
test in SPSS (10)
First, click on “Chi-square” to
request the chi-square test of
independence.
Second, click the “Continue”
button to close the Statistics
dialog box.
Slide 33
Chi-Square Test of Independence: post hoc
test in SPSS (11)
Now click the “Cells…”
button to specify the
contents in the cells of
the crosstabs table.
Slide 34
Chi-Square Test of Independence: post hoc
test in SPSS (12)
First, make sure both
“Observed” and “Expected”
in the “Counts” section of
the “Crosstabs: Cell Display”
dialog box are checked.
In the “Residuals” section,
select “Unstandardized” and
“Standardized” residuals,
then click the “Continue” and
“OK” buttons.
Slide 35
Chi-Square Test of Independence: post hoc
test in SPSS (13)
In the Chi-Square Tests table,
SPSS also tells us that “0 cells have
expected count less than 5 and the
minimum expected count is 6.79”.
The sample size requirement for the
chi-square test of independence is
satisfied.
Slide 36
Chi-Square Test of Independence: post hoc
test in SPSS (14)
The probability of the chi-square test
statistic (chi-square=23.083) was
p<0.001, less than or equal to the
alpha level of significance of 0.05.
The null hypothesis that differences
in "accuracy of the description of
feeling protective toward people
being taken advantage of" are
independent of differences in "sex" is
rejected.
The research hypothesis that
differences in "accuracy of the
description of feeling protective
toward people being taken advantage
of" are related to differences in "sex"
is supported by this analysis.
Now, you can examine the post hoc
test using the given critical value.
Slide 37
Chi-Square Test of Independence: post hoc
test in SPSS (15)
The residual is the difference
between the actual frequency and
the expected frequency (58 - 79.2 = -21.2).
When converted to a z-score, the
standardized residual (-2.4) was
smaller than the negative critical value (-1.96), supporting a specific finding
that among survey respondents
who were male, there were fewer
who said that feeling protective
toward people being taken
advantage of describes them very
well than would be expected.
The answer to the question is
true.
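The arithmetic on this slide, written out as a worked equation. It uses the Pearson residual (O - E)/√E, which is what SPSS labels "Standardized"; with O = 58 and E = 79.2 it reproduces the reported -2.4.

```latex
\begin{aligned}
O - E &= 58 - 79.2 = -21.2 \\
z &= \frac{O - E}{\sqrt{E}} = \frac{-21.2}{\sqrt{79.2}} \approx \frac{-21.2}{8.90} \approx -2.38 < -1.96
\end{aligned}
```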
Slide 38
Steps in solving chi-square test of
independence: post hoc problems - 1
The following is a guide to the decision process for answering
homework problems about chi-square test of independence post
hoc problems:
Are the dependent and independent variables nominal,
ordinal, dichotomous, or grouped interval?
Yes: go to the next step.
No: incorrect application of a statistic.
Slide 39
Steps in solving chi-square test of
independence: post hoc problems - 2
Compute the chi-square test of independence,
requesting standardized residuals in the output.
Are any expected cell counts less than 5?
Yes: incorrect application of a statistic.
No: go to the next question.
Is the p-value for the chi-square test of
independence <= alpha?
Yes: go to the next step.
No: False.
Slide 40
Steps in solving chi-square test of
independence: post hoc problems - 3
Identify the cell in the crosstabs table that
contains the specific relationship in the problem.
Is the value of the standardized residual for the
specified cell larger (smaller) than the positive
(negative) critical value given in the problem?
No: False.
Yes: Is the relationship correctly described?
Yes: True.
No: False.
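The three flowchart slides can be sketched as one Python function. The function and parameter names are hypothetical, the checks follow the flowchart order, and 0.0005 stands in for SPSS's p < 0.001 in practice problem 2.

```python
def post_hoc_answer(levels_ok, min_expected, p_value, alpha,
                    residual=None, critical=None, correctly_described=False):
    """Walk the decision flowchart for a chi-square post hoc problem.

    Returns "True", "False", or "Incorrect application of a statistic".
    """
    # Level of measurement and expected-count checks
    if not levels_ok or min_expected < 5:
        return "Incorrect application of a statistic"
    # Overall chi-square test: a non-significant result means the answer is False
    if p_value > alpha:
        return "False"
    if residual is None or critical is None:
        raise ValueError("residual and critical value are needed once the test is significant")
    # Positive critical value: residual must be larger; negative: residual must be smaller
    beyond = residual > critical if critical > 0 else residual < critical
    if not beyond:
        return "False"
    return "True" if correctly_described else "False"


# Practice problem 1 (p = 0.244) and practice problem 2 (p < 0.001, residual -2.4)
print(post_hoc_answer(True, 70.63, 0.244, 0.05))
print(post_hoc_answer(True, 6.79, 0.0005, 0.05, residual=-2.4, critical=-1.96,
                      correctly_described=True))
```

The first call prints False and the second prints True, matching the answers given for the two practice problems.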