
STAT 350
class: April 2, 2014
due: April 7, 2014
Lab 8: Inference on two samples (19.5 pts. + 2 pts. BONUS)
Purposes: 1) Inference for 2 – sample independent
2) Inference for 2 – sample paired
Remember:
a) Please put your name, my name, section number (class time) and lab # on the front of the
lab.
b) Label each part and put them in logical order.
c) ALWAYS include your MiniTab procedure for each problem unless stated otherwise.
d) Only include the relevant MiniTab output (DO NOT SPAM ME WITH OUTPUT!)
1) Inference for 2 – sample independent
You can enter the data either in one column, as shown below, or in two columns by choosing the
appropriate option. The data below are from Example 7.14 (Data file: 7.wheat):
Price     Month
6.8250    July
7.3025    July
7.0275    July
7.0825    July
7.3000    July
7.3325    September
7.5575    September
7.3125    September
7.3600    September
7.5550    September
Stat → Basic Statistics → 2-Sample t → Samples in one column (Samples: Price (values),
Subscripts: Month (differentiates the cases)) → Options (enter the appropriate Confidence level,
Test difference (), and Alternative; the defaults are appropriate for this example)
For pooled variance, check ‘Assume equal variances’.
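As a cross-check outside MiniTab, the pooled-variance calculation can be sketched in Python using only the standard library. This is an illustrative sketch, not MiniTab's own code; the function name pooled_t is hypothetical, and the data are the wheat prices listed above.

```python
from math import sqrt
from statistics import mean, variance

# Wheat prices from Example 7.14, split by month.
july = [6.8250, 7.3025, 7.0275, 7.0825, 7.3000]
september = [7.3325, 7.5575, 7.3125, 7.3600, 7.5550]


def pooled_t(x, y):
    """Return (t statistic, degrees of freedom, pooled SD) for H0: mu_x - mu_y = 0.

    Hypothetical helper for illustration; assumes equal population variances.
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df, sqrt(sp2)


t_pooled, df_pooled, sp = pooled_t(july, september)
print(f"t = {t_pooled:.2f}, df = {df_pooled}, pooled SD = {sp:.3f}")
```

Because the two samples have the same size here, the pooled and unpooled standard errors coincide, so only the degrees of freedom differ between the two versions of the test.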
Output from Session Window
For this situation, each of the samples needs to be normal. The following procedure is used to
generate the histogram and probability plots (I did not include the output):
Histogram: Graph → Histogram → With Fit → (Graph variables: Price) → Multiple Graphs → By
variables with groups on separate graphs: Month → Data View → Smoother → Check
Lowess
Normal probability plot: Graph → Probability Plot → Single → Graph variables: Price →
Distribution → Data Display (uncheck Show confidence interval) → Scale → YScale Type →
Score → Multiple Graphs → By variables with groups on separate graphs: Month
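To see what a normal probability plot is drawing, the points can be computed by hand: sort the sample and pair each value with a normal score. The sketch below does this for the July prices in Python. The plotting positions (i - 0.375) / (n + 0.25) are an assumption chosen for illustration; MiniTab's default positions may differ, and the function name normal_scores is hypothetical.

```python
from statistics import NormalDist

# July wheat prices from Example 7.14.
july = [6.8250, 7.3025, 7.0275, 7.0825, 7.3000]


def normal_scores(values):
    """Pair each sorted value with an approximate normal score.

    Uses plotting positions (i - 0.375) / (n + 0.25), an assumed convention;
    other software may use a slightly different formula.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(ordered, scores))


pairs = normal_scores(july)
for price, z in pairs:
    print(f"{price:.4f}  {z:+.3f}")
```

If the sample is roughly normal, plotting the prices against these scores gives points that fall close to a straight line.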
Stat 350
Lab 8 MiniTab
Two-Sample T-Test and CI: Price, Month
Two-sample T for Price
Month        N    Mean   StDev  SE Mean
July         5   7.107   0.201    0.090
September    5   7.423   0.122    0.055
Difference = mu (July) - mu (September)
Estimate for difference: -0.316
95% CI for difference: (-0.574, -0.058)
T-Test of difference = 0 (vs not =): T-Value = -3.00  P-Value = 0.024  DF = 6
Note: the numbers are opposite those in the example in the book because I was matching the
SAS output.
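The numbers in the session output above can be checked by hand. The Python sketch below recomputes the unpooled (unequal variances) t statistic, the approximate degrees of freedom, and the 95% confidence interval from the wheat data. It is a rough check under stated assumptions, not a description of MiniTab's internals: the function name welch_t is hypothetical, and the critical value 2.447 is taken from a t table for 6 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Wheat prices from Example 7.14, split by month.
july = [6.8250, 7.3025, 7.0275, 7.0825, 7.3000]
september = [7.3325, 7.5575, 7.3125, 7.3600, 7.5550]


def welch_t(x, y):
    """Return (difference, standard error, t statistic, approximate df).

    Hypothetical helper; does not assume equal population variances.
    """
    v1 = variance(x) / len(x)
    v2 = variance(y) / len(y)
    se = sqrt(v1 + v2)
    # Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    diff = mean(x) - mean(y)
    return diff, se, diff / se, df


diff, se, t_stat, df = welch_t(july, september)

# Assumption: table value for the 0.975 quantile of t with 6 df.
T_CRIT = 2.447
ci = (diff - T_CRIT * se, diff + T_CRIT * se)

print(f"difference = {diff:.3f}, t = {t_stat:.2f}, df = {df:.2f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The approximate degrees of freedom are not a whole number; dropping the fractional part gives the DF = 6 shown in the output.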
2) Inference for 2 – sample paired
The example output is taken from Example 7.7 (Data file: 7.french). In addition, this example
uses a one-tailed alternative hypothesis. When you are performing a directional hypothesis test,
be sure you know which variable is which so that the direction is appropriate.
Stat → Basic Statistics → Paired t → Samples in columns (the difference is computed as first
sample – second sample) (First sample: Posttest, Second sample: Pretest) → Options (enter the
appropriate Confidence level, Test difference (), and Alternative; for this example, Confidence
level: 90, Alternative: greater than)
The Paired t procedure will create only a histogram, not a QQ plot, so you need to compute the
differences yourself and then create the plots from the new column.
Calc → Calculator → Store result in variable C5, Expression: ‘Posttest’ – ‘Pretest’, check Assign
as a formula.
Then you can create the histogram and Probability plot from this new column (not provided).
Output from Session Window
Paired T-Test and CI: Posttest, Pretest
Paired T for Posttest - Pretest
             N    Mean   StDev  SE Mean
Posttest    20   28.30    5.95     1.33
Pretest     20   25.80    6.30     1.41
Difference  20   2.500   2.893    0.647
90% lower bound for mean difference: 1.641
T-Test of mean difference = 0 (vs > 0): T-Value = 3.86  P-Value = 0.001
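Since the raw scores are not listed here, the paired results can still be checked from the Difference row of the output. The Python sketch below recomputes the standard error, the t statistic, and the 90% lower bound from those summary values. The critical value 1.328 is an assumption taken from a t table (upper 0.10 point, 19 degrees of freedom), and the variable names are illustrative only.

```python
from math import sqrt

# Summary statistics for the Difference row (Posttest - Pretest) in the output.
n = 20
mean_diff = 2.500
sd_diff = 2.893

# Assumption: table value for the upper 0.10 point of t with n - 1 = 19 df.
T_CRIT_90 = 1.328

se = sd_diff / sqrt(n)                     # standard error of the mean difference
t_stat = mean_diff / se                    # test statistic for H0: mean difference = 0
lower_bound = mean_diff - T_CRIT_90 * se   # one-sided 90% lower confidence bound

print(f"SE = {se:.3f}, t = {t_stat:.2f}, 90% lower bound = {lower_bound:.3f}")
```

A one-sided "greater than" alternative pairs with a lower bound, which is why the output reports a bound rather than a two-sided interval.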
Problems
All of these problems are for 2 samples; however, you will need to decide whether the samples
are independent or paired. There should only be one code for each problem; that is, there
should not be separate code for each of the parts. The correct answer to part a) is NOT the
format of the data file or which section the problem is from.
Problem 1 (6.5 pts.) (7.44 Potential insurance fraud? Data Set: 7.FRAUD) Insurance
adjusters are concerned about the high estimates they are receiving from Jocko’s Garage. To
see if the estimates are unreasonably high, each of 10 damaged cars was taken to Jocko’s and
to a “trusted” garage and the estimates recorded. Here are the results:
a) Which procedure should you use (independent or paired)? Please explain your answer.
b) Examine each sample graphically, with special attention to outliers and skewness (histogram
and normal quantile plot). Is use of a t procedure acceptable for these data?
c) Perform a hypothesis test to determine if there is a difference between the two garages at a
significance level of 0.01. Be sure to perform the 7 steps.
d) Calculate and interpret the appropriate confidence interval.
e) Based on the answers to c) and d), is there a difference between the two garages? Why or
why not?
Your submission should consist of one code for all of the parts, the plots in b) and the appropriate
output in parts c) and d), and the answers to all of the questions. In part d), you may either
rewrite the confidence interval or just indicate where it is in the output.
Problem 2 (6.5 pts.) (7.93 Study habits. Data Set: 7.Studyhabits) The Survey of Study
Habits and Attitudes (SSHA) is a psychological test designed to measure the motivation, study
habits, and attitudes toward learning of college students. These factors, along with ability, are
important in explaining success in school. Scores on the SSHA range from 0 to 200. A selective
private college gives the SSHA to an SRS of both male and female first-year students. The data
for the women are as follows:
Here are the scores of the men:
a) Which procedure should you use (independent or paired)? Please explain your answer.
b) Examine each sample graphically, with special attention to outliers and skewness. Is use
of a t procedure acceptable for these data?
c) Most studies have found that the mean SSHA score for men is lower than the mean score in
a comparable group of women. Perform the appropriate hypothesis test (7 steps) at a
significance level of 0.1. (Hint: Please look at the answer key for Lab 6.)
d) Calculate and interpret the appropriate confidence bound for the mean difference between
the SSHA scores of male and female first-year students at this college.
e) Based on the answers to c) and d), is the mean score for men lower than the mean score for
women? Why or why not?
Your submission should consist of one code for all of the parts, the plots in b) and the appropriate
output in parts c) and d), and the answers to all of the questions. In part d), you may either
rewrite the confidence interval or just indicate where it is in the output.
Problem 3 (6.5 pts.) (7.72 Sadness and spending. Data Set: 7.Sadness) The “misery is not
miserly” phenomenon refers to a sad person’s spending judgment going haywire. In a recent
study, 31 young adults were given $10 and randomly assigned to either a sad or a neutral
group. The participants in the sad group watched a video about the death of a boy’s mentor
(from The Champ), and those in the neutral group watched a video on the Great Barrier Reef.
After the video, each participant was offered the chance to trade $0.50 increments of the $10 for
an insulated water bottle. Here are the data:
a) Which procedure should you use (independent or paired)? Please explain your answer.
b) Examine each group’s prices graphically. Is use of the t procedures appropriate for these
data? Carefully explain your answer.
c) Perform the significance test at a significance level of 0.05 to determine if the spending is
dependent on whether the person is sad or not.
d) Calculate and interpret the appropriate confidence interval for the mean difference in
purchase price between the two groups.
e) Based on the answers to c) and d), does spending depend on whether a person is sad or
not? Why or why not?
Your submission should consist of one code for all of the parts, the plots in b) and the appropriate
output in parts c) and d), and the answers to all of the questions. In part d), you may either
rewrite the confidence interval or just indicate where it is in the output.
Problem 4 BONUS (2 pts.)
Generate the procedure to calculate the power curve for a t distribution as described in Section
7.3 in the text. Generate a power curve for the example 7.14 in the book (Part 1 above).