STATS 1000 / STATS 1004 / STATS 1504
Statistical Practice 1
Assignment 2 2015

DEADLINE:
• Wednesday 25th March 2015 (week 4) 4:00pm

CHECKLIST
• Have you shown all of your working, including probability notation where necessary?
• Have you given all probabilities to 4 decimal places?
• Have you given all other numbers to 2 decimal places?
• Have you included all SPSS output and plots to support your answers where necessary?
• Have you completed and attached a coversheet?
• If before the deadline, have you submitted your assignment into the correct hand-in box (EMG05)?
• If after the deadline, but within 24 hours, have you contacted us via the enquiry page on MyUni and then submitted your assignment into the late hand-in box (Level 6, Ingkarni Wardli)?
• If more than 24 hours after the deadline, do not hand in your assignment; it will not be marked.

1. Two-way tables in SPSS (See Practical 3)

Alcohol abuse has been described by college presidents in the U.S. as the number one problem on campus, and it is an important cause of death in young adults. A survey of 17096 students in U.S. four-year colleges collected information on drinking behaviour and alcohol-related problems. The researchers defined “frequent binge drinking” as having five or more drinks in a row three or more times in the past two weeks. The dataset is in the file binge.sav on MyUni. Download it and perform the following:

(a) Produce a table of the percentage of binge drinking for each gender and include it in your assignment.
[1 for table] [1 mark]

(b) Produce a bar chart with gender on the x-axis, and percentage of binge drinking for each gender on the y-axis. Include your bar chart in your assignment.
[Also accept stacked bar chart] [1 mark]

(c) With reference to the bar chart and table, which gender appears more likely to binge drink?
Males appear more likely to binge drink: 17% of females binge drink, which is less than the 22.7% of males who binge drink. [1 mark]

[Total: 3]

2.
Scatterplots and least-squares line in SPSS

At a certain municipal incinerator, the heat released by burning rubbish is used to generate electricity. The data file energy.xlsx contains the percentage water content and the energy density in kilocalories per kilogram (kCal/kg). The purpose of the analysis is to predict energy density from water content.

(a) Obtain a scatter plot of the data and comment on the relationship between Energy Density and Water Content.
There is a strong negative linear relationship between Energy Density and Water Content.
Figure 1: Scatterplot of energy against water content
[1 for plot (must be captioned), 1 for description] [2 marks]

(b) Using SPSS, find the intercept and slope of the least squares line and interpret these parameters in context. For full marks include the appropriate SPSS table.
[1 for table]
So the intercept is 3412.212 [1] and the slope is −42.182 [1]. The intercept, 3412.212, is the average Energy Density (kCal/kg) when Water Content is 0 [1]. The slope, −42.182, is the average amount that the Energy Density changes per unit increase in Water Content. That is, an increase of 1 percentage point in Water Content relates to a 42.182 kCal/kg decrease in Energy Density, on average [1]. [5 marks]

(c) Use the least squares line to estimate the mean energy density for shipments of rubbish with 53% water content.
The least squares line is Energy = 3412.212 − 42.182 × Water. Hence, the estimated mean Energy Density for shipments with 53% water content is 3412.212 − 42.182 × 53 = 1176.57 kCal/kg.
[1 for calculation, 1 for answer.] [2 marks]

(d) Suppose we wish to predict energy density for a shipment with water content 60%. Discuss briefly any concerns you might have about using the present regression model.
The data only contain observations with water content between 43.82% and 58.2%. A water content of 60% lies outside this range, so predicting Energy Density for such a shipment is extrapolation and must be done with care, as we have no information regarding the behaviour of the relationship outside the observed range.
[2 for any reasonable discussion about extrapolation] [2 marks]

[Total: 11]

3. Two-way tables by hand

In a study conducted by C.R. Charig, D.R. Webb, S.R. Payne and O.E. Wickham, two different treatments for kidney stones were trialled and the following data recorded.

                 Treatment A            Treatment B
Outcome          Success   Failure      Success   Failure
Small Stones        81        6           234       36
Large Stones       192       71            55       25

(a) Calculate the success rates, as percentages, for each treatment separately for patients with small stones and also patients with large stones.

                 Small Stones         Large Stones
Treatment A      81/87 = 93.1%        192/263 = 73%
Treatment B      234/270 = 86.7%      55/80 = 68.75%

The above table contains the success rates for Treatments A and B for patients with Small and Large stones separately.
[1 for each cell correct] [4 marks]

(b) Calculate the success rates, as percentages, for the two treatments if the data from patients with small stones and large stones are combined.

                     Treatment A        Treatment B
Combined Patients    273/350 = 78%      289/350 = 82.6%

The above table contains the success rates for Treatments A and B separately if the patient data are combined.
[1 for each cell] [2 marks]

(c) Discuss briefly which treatment appears to be more effective, making appropriate reference to your answers to (a) and (b).
When combining the patient data in part (b), Treatment A appears marginally less successful than Treatment B (78% success rate compared to 82.6%). However, when we take into account the different stone sizes in part (a), we observe that Treatment A has a higher success rate than Treatment B in both groups (small: 93.1% compared to 86.7%, large: 73% compared to 68.75%).
[2 for any reasonable discussion] [2 marks]

[Total: 8]

4. Sampling

In June 2012, the Australian Newspoll surveyed 1202 Australians aged 18 years or over. Of those surveyed, 62% stated that they are happy with their standard of living.

(a) The 1202 people surveyed are a sample. What is the corresponding population?
All Australians aged 18 years or over. [1 mark]

(b) Is the value 62% a parameter or a statistic? Explain your answer.
62% is a statistic, as it is a numerical characteristic of the sample that can be used to estimate a parameter. [1 mark]

(c) For each of the following statements, decide if you think the statement is true or false and give a reason why you think it is true or false.

i. Even though we are not told how the sample was taken, this is a representative sample. [2 marks]
The statement is false [1], as the sample could, for example, consist only of very rich people who are happy with their standard of living [for reasonable discussion].

ii. If the sample is a simple random sample selected from Australians who are 18 years or older, we can use the sample to estimate the proportion of all Australians of any age who are happy with their standard of living. [2 marks]
The statement is false [1], as we have only sampled from Australians aged 18 or older and so can only make statements about Australians who are 18 or older [1].

iii. If the sample is a simple random sample selected from Australians who are 18 years or older, we can state that exactly 62% of all Australians aged 18 or over are happy with their standard of living. [2 marks]
The statement is false [1], as the 62% obtained in the sample is only an estimate of the true percentage of all Australians aged 18 or older who are happy with their standard of living. So the true value may be 62%, but it may also be something close to 62% [1].

[Total: 8]

[[Assignment total: 30]]
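
The hand calculations in Questions 2(c), 2(d) and 3 can be double-checked with a short script. The Python sketch below is only an illustration and is not part of the SPSS workflow or the marking scheme; all function and variable names are hypothetical, and the numbers are copied from the question text above. It recomputes the success rates within each stone size and for the combined data (the reversal between the two is commonly known as Simpson's paradox), evaluates the fitted line at a given water content, and flags values outside the observed range.

```python
# Illustrative check of the hand calculations in Questions 2 and 3.
# All names are hypothetical; the numbers are taken from the assignment text.

# Question 3: (successes, failures) for each treatment and stone size.
COUNTS = {
    "A": {"small": (81, 6), "large": (192, 71)},
    "B": {"small": (234, 36), "large": (55, 25)},
}


def success_rate(successes, failures):
    """Return the success rate as a percentage."""
    return 100 * successes / (successes + failures)


def stratified_rates(counts):
    """Success rate (%) for each treatment within each stone size."""
    return {
        treatment: {size: success_rate(*sf) for size, sf in by_size.items()}
        for treatment, by_size in counts.items()
    }


def combined_rates(counts):
    """Success rate (%) for each treatment with stone sizes pooled."""
    rates = {}
    for treatment, by_size in counts.items():
        successes = sum(sf[0] for sf in by_size.values())
        failures = sum(sf[1] for sf in by_size.values())
        rates[treatment] = success_rate(successes, failures)
    return rates


# Question 2: fitted least squares line reported in part (b).
INTERCEPT = 3412.212   # kCal/kg
SLOPE = -42.182        # kCal/kg per percentage point of water content
OBSERVED_WATER_RANGE = (43.82, 58.2)  # range of water content in the data


def predict_energy(water):
    """Estimated mean energy density (kCal/kg) at a given water content (%)."""
    return INTERCEPT + SLOPE * water


def is_extrapolation(water):
    """True if the water content lies outside the observed data range."""
    low, high = OBSERVED_WATER_RANGE
    return not (low <= water <= high)


for treatment, by_size in stratified_rates(COUNTS).items():
    for size, rate in by_size.items():
        print(f"Treatment {treatment}, {size} stones: {rate:.2f}%")
for treatment, rate in combined_rates(COUNTS).items():
    print(f"Treatment {treatment}, combined: {rate:.2f}%")
for water in (53, 60):
    note = " (extrapolation)" if is_extrapolation(water) else ""
    print(f"Water {water}%: {predict_energy(water):.2f} kCal/kg{note}")
```

Keeping the counts in one dictionary means the stratified and combined rates are computed from the same source, so any disagreement with the tables in Question 3 would point to a transcription error rather than a difference in method.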