A2 sol

STATS 1000 / STATS 1004 / STATS 1504
Statistical Practice 1
Assignment 2
2015
DEADLINE:
• Wednesday 25th March 2015 (week 4) 4:00pm
CHECKLIST
• Have you shown all of your working, including probability notation where
necessary?
• Have you given all probabilities to 4 decimal places?
• Have you given all other numbers to 2 decimal places?
• Have you included all SPSS output and plots to support your answers where
necessary?
• Have you completed and attached a coversheet?
• If before the deadline, have you submitted your assignment into the correct
hand-in box (EMG05)?
• If after the deadline, but within 24 hours, have you contacted us via the
enquiry page on MyUni and then submitted your assignment into the late
hand-in box (Level 6, Ingkarni Wardli)?
• If more than 24 hours late, do not hand in your assignment; it will not be marked.
1. Two-way tables in SPSS (See Practical 3)
Alcohol abuse has been described by college presidents in the U.S. as the
number one problem on campus, and it is an important cause of death in
young adults. A survey of 17096 students in U.S. four-year colleges collected
information on drinking behaviour and alcohol-related problems. The researchers defined “frequent binge drinking” as having five or more drinks in
a row three or more times in the past two weeks.
The dataset is in the file binge.sav on MyUni. Download it and perform
the following:
(a) Produce a table of the percentage of binge drinking for each gender and
include it in your assignment.
[1 for table]
[1 mark]
(b) Produce a bar chart with gender on the x-axis, and percentage of binge
drinking for each gender on the y-axis. Include your bar chart in your
assignment.
[Also accept a stacked bar chart]
[1 mark]
(c) With reference to the bar chart and table, which gender appears more
likely to binge drink?
Males appear more likely to binge drink: 22.7% of males binge drink,
compared with 17% of females.
[1 mark]
[Total: 3]
2. Scatterplots and least-squares line in SPSS
At a certain municipal incinerator, the heat released by burning rubbish is
used to generate electricity. The data file energy.xlsx contains the percentage water content and the energy density in kiloCalories per kilogram
(kCal/kg). The purpose of the analysis is to predict energy density from
water content.
(a) Obtain a scatter plot of the data and comment on the relationship
between Energy Density and Water Content.
There is a strong negative linear relationship between Energy Density and Water Content.
Figure 1: Scatterplot of energy against water Content
[1 for plot (must be captioned), 1 for description]
[2 marks]
(b) Using SPSS, find the intercept and slope of the least squares line and
interpret these parameters in context. For full marks include the appropriate SPSS table
[1 for table]
So the intercept is 3412.212[1] and the slope is -42.182[1].
The intercept, 3412.212, is the average Energy Density (kCal/kg) when
Water Content is 0%[1].
The slope, -42.182, is the average amount that the Energy Density
changes per unit increase in Water Content. That is, a 1% increase in
Water Content relates to a 42.182 kCal/kg decrease in Energy Density,
on average[1].
[5 marks]
(c) Use the least squares line to estimate the mean energy density for shipments of rubbish with 53% water content.
The least squares line is
Energy = 3412.212 − 42.182 × Water.
Hence, the estimated mean Energy Density for shipments with 53% water content is
3412.212 − 42.182 × (53) = 1176.57 kCal/kg.
[1 for calculation, 1 for answer.]
[2 marks]
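The calculation in part (c) can be checked with a short script. This is only a sketch: the coefficients are the values quoted from the SPSS output in part (b), and the function name `predict_energy` is a hypothetical helper introduced here for illustration.

```python
# Coefficients quoted from the SPSS output in part (b).
INTERCEPT = 3412.212   # average kCal/kg when water content is 0%
SLOPE = -42.182        # change in kCal/kg per 1% increase in water content


def predict_energy(water_pct):
    """Estimated mean energy density (kCal/kg) at a given water content (%).

    Hypothetical helper: it simply evaluates the fitted least squares line.
    """
    return INTERCEPT + SLOPE * water_pct


estimate = predict_energy(53)
print(round(estimate, 2))  # rounded to 2 decimal places, as the checklist asks
```

Rounding is left to the final step so that the intermediate product 42.182 × 53 is not truncated early.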
(d) Suppose we wish to predict energy density for a shipment with water
content 60%. Discuss briefly any concerns you might have about using
the present regression model.
The data only contain observations with water content between
43.82% and 58.2%, so a water content of 60% lies outside the observed range.
Prediction of Energy Density for any rubbish
shipment with a water content outside of this range must be done with
care, as we have no information regarding the behaviour of the relationship there.
[2 for any reasonable discussion about extrapolation]
[2 marks]
[Total: 11]
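The extrapolation concern in part (d) amounts to a simple range check, sketched below. The limits are the minimum and maximum water content quoted in the answer; the function name `is_extrapolation` is a hypothetical helper, not something produced by SPSS.

```python
# Observed range of water content (%) quoted in the answer to part (d).
WATER_MIN = 43.82
WATER_MAX = 58.2


def is_extrapolation(water_pct):
    """Return True when water_pct lies outside the observed data range."""
    return not (WATER_MIN <= water_pct <= WATER_MAX)


print(is_extrapolation(53))  # False: 53% is inside the observed range
print(is_extrapolation(60))  # True: 60% is outside the observed range
```

A check like this only flags that the prediction relies on the line holding beyond the data. It says nothing about how wrong the prediction might be.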
3. Two-way tables by hand
In a study conducted by C.R. Charig, D.R. Webb, S.R. Payne and O.E. Wickham, two different treatments for kidney stones were trialled and the following data recorded.
Outcome          Small Stones         Large Stones
                 Success   Failure    Success   Failure
Treatment A         81        6         192       71
Treatment B        234       36          55       25
(a) Calculate the success rates, as percentages, for each treatment separately for patients with small stones and also patients with large stones.
                 Small Stones        Large Stones
Treatment A      81/87 = 93.1%       192/263 = 73%
Treatment B      234/270 = 86.7%     55/80 = 68.75%
The above table contains the success rates for Treatments A and B for
patients with Small and Large stones separately.
[1 for each cell correct]
[4 marks]
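The four success rates in part (a) can be reproduced as follows. This is a minimal sketch: the counts are copied from the data table in the question, and the names `counts` and `success_rate` are introduced here purely for illustration.

```python
# (successes, failures) copied from the data table in Question 3.
counts = {
    ("A", "small"): (81, 6),
    ("A", "large"): (192, 71),
    ("B", "small"): (234, 36),
    ("B", "large"): (55, 25),
}


def success_rate(successes, failures):
    """Success rate as a percentage of all patients in the group."""
    return 100 * successes / (successes + failures)


rates = {group: success_rate(*sf) for group, sf in counts.items()}

for (treatment, size), rate in rates.items():
    print(f"Treatment {treatment}, {size} stones: {rate:.2f}%")
```

Each rate uses the group total (successes plus failures) as its denominator, which is why the four denominators are 87, 263, 270 and 80.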
(b) Calculate the success rates, as percentages, for the two treatments if
the data from patients with small stones and large stones is combined.
                 Combined Patients
Treatment A      273/350 = 78%
Treatment B      289/350 = 82.6%
The above table contains the success rates for Treatments A and B
separately if the patient data is combined.
[1 for each cell]
[2 marks]
(c) Discuss briefly which treatment appears to be more effective, making
appropriate reference to your answers to (a) and (b).
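When combining the patient data in part (b), Treatment A appears
marginally less successful than Treatment B (78% success rate compared
to 82.6%). However, when we take into account the different size stones
in part (a), we observe that Treatment A has a higher success rate than
Treatment B in both groups (small: 93.1% compared to 86.7%, large:
73% compared to 68.75%). The reversal occurs because Treatment A was
used mostly on large stones (263 of its 350 patients), which have lower
success rates under either treatment, whereas Treatment B was used
mostly on small stones (270 of its 350 patients). Once stone size is
taken into account, Treatment A appears to be the more effective treatment.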
[2 for any reasonable discussion]
[2 marks]
[Total: 8]
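The reversal discussed in part (c) can be verified numerically. The sketch below uses the counts from the question's data table; all variable and function names are hypothetical and chosen only for this illustration.

```python
# (successes, failures) copied from the data table in Question 3.
counts = {
    ("A", "small"): (81, 6),
    ("A", "large"): (192, 71),
    ("B", "small"): (234, 36),
    ("B", "large"): (55, 25),
}
SIZES = ("small", "large")


def rate(successes, failures):
    """Success rate as a percentage."""
    return 100 * successes / (successes + failures)


# Combined rates: pool successes and failures over both stone sizes.
combined = {}
for treatment in ("A", "B"):
    successes = sum(counts[(treatment, size)][0] for size in SIZES)
    failures = sum(counts[(treatment, size)][1] for size in SIZES)
    combined[treatment] = rate(successes, failures)

# Does A beat B within every stone-size group?
a_wins_each_group = all(
    rate(*counts[("A", size)]) > rate(*counts[("B", size)]) for size in SIZES
)
# Does B beat A once the groups are pooled?
b_wins_combined = combined["B"] > combined["A"]

print(combined)
print(a_wins_each_group, b_wins_combined)
```

Both flags being true at once is the pattern usually called Simpson's paradox: the direction of the comparison changes when the groups are pooled, because the two treatments were applied to very different mixes of small and large stones.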
4. Sampling
In June 2012, the Australian Newspoll surveyed 1202 Australians aged 18
years or over. Of those surveyed, 62% stated that they are happy with their
standard of living.
(a) The 1202 people surveyed are a sample. What is the corresponding
population?
All Australians aged 18 years or over.
[1 mark]
(b) Is the value 62% a parameter or a statistic? Explain your answer.
62% is a statistic, as it is a numerical characteristic of the sample that
can be used to estimate a parameter.
[1 mark]
(c) For each of the following statements, decide if you think the statement
is true or false and give a reason why you think it is true or false.
i. Even though we are not told how the sample was taken, this is a
representative sample.
[2 marks]
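The statement is false[1], as we are not told how the sample was selected
and so cannot assume that it is representative. For example, the sample
could consist mainly of wealthy people, who may be more likely to be happy
with their standard of living[for reasonable discussion].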
ii. If the sample is a simple random sample selected from Australians
who are 18 years or older, we can use the sample to estimate the
proportion of all Australians of any age who are happy with their
standard of living.
[2 marks]
The statement is false[1], as we have only sampled from Australians
aged 18 or older and so can only make statements about Australians
who are 18 or older[1].
iii. If the sample is a simple random sample selected from Australians
who are 18 years or older, we can state that exactly 62% of all
Australians aged 18 or over are happy with their standard of living.
[2 marks]
This statement is false[1], as the 62% obtained in the
sample is only an estimate of the true percentage of all Australians aged 18
or older who are happy with their standard of living. So the true
value may be 62%, but it may also be something close to 62%.[1]
[Total: 8]
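The point in part (c)(iii), that a sample percentage is an estimate rather than the exact population value, can be illustrated with a small simulation. This is a sketch under a loud assumption: the true population proportion is set to 0.62 purely for illustration (the real value is unknown), and each respondent is simulated independently, which approximates a simple random sample from a very large population. The names `TRUE_P` and `sample_percentage` are hypothetical.

```python
import random

# ASSUMPTION for illustration only: the true population proportion is 0.62.
TRUE_P = 0.62
N = 1202  # sample size from the Newspoll survey

rng = random.Random(2015)  # fixed seed so the run is repeatable


def sample_percentage(rng, n, p):
    """Percentage of 'happy' responses in one simulated sample of size n."""
    happy = sum(rng.random() < p for _ in range(n))
    return 100 * happy / n


# Ten simulated surveys drawn from the same assumed population.
percentages = [sample_percentage(rng, N, TRUE_P) for _ in range(10)]
print([round(x, 1) for x in percentages])
```

The simulated surveys give percentages that differ from one another, even though every sample comes from the same assumed population. A single observed 62% therefore cannot pin down the population value exactly.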
[[Assignment total: 30]]