Name: __________________________ Date: _____________

1. A study is conducted to determine if one can predict the yield of a crop based on the amount of yearly rainfall. The response variable in this study is
A) the yield of the crop.
B) the amount of yearly rainfall.
C) the experimenter.
D) either bushels or inches of water.
E) the month the crop is harvested.

2. A researcher is interested in determining if one can predict the score a student gets on a statistics exam from the amount of time the student spends studying for the exam. In this study, the explanatory variable is
A) the researcher.
B) the students taking the exam.
C) the score on the exam.
D) the fact that this is a statistics exam.
E) the amount of time spent studying for the exam.

3. When creating a scatterplot, one should
A) use only positive values of the explanatory variable.
B) use the horizontal axis for the explanatory variable.
C) use a different plotting symbol depending on whether the explanatory variable is categorical or the response variable is categorical.
D) use a plotting scale that makes the overall trend roughly linear.
E) use the horizontal axis for the response variable.

Use the following to answer questions 4-5: A researcher measures the height (in feet) and volume of usable lumber (in cubic feet) of 32 cherry trees. The goal is to determine if the volume of a tree's usable lumber can be estimated from the height of the tree. The results are plotted below.

4. In the study above, the response variable is
A) number of trees.
B) volume.
C) height or volume; it doesn't matter which is considered the response variable.
D) neither height nor volume; the measuring instrument used to measure height is the response variable.
E) height.

5. The scatterplot above suggests that
A) there is a positive association between height and volume.
B) there is an outlier in the plot.
C) both A and B.
D) neither A nor B.
E) the relationship between height and volume is nonlinear.

6. At a large university, the office responsible for scheduling classes notices that demand is low for classes that meet before 10:00 AM or after 3:00 PM and is high for classes that meet between 10:00 AM and 3:00 PM. Which of the following may we conclude?
A) There is an association between demand for classes and the time the classes meet.
B) The association between demand for classes and time for classes is linear.
C) There is a negative association between demand for classes and the time the classes meet.
D) There is no association between demand for classes and the time the classes meet.
E) There is a positive association between demand for classes and the time the classes meet.

7. The graph below plots the gas mileage (miles per gallon) of various 1978 model cars versus the weight of these cars in thousands of pounds. In the graph, the points denoted by the plotting symbol x correspond to cars made in Japan. From this plot, we may conclude that
A) in 1978 there was little difference between Japanese cars and cars made in other countries.
B) in 1978 Japanese cars tended to be lighter in weight than other cars.
C) in 1978 Japanese cars tended to get poorer gas mileage than other cars.
D) there is a positive association between weight and gas mileage for Japanese cars.
E) the plot is invalid. A scatterplot is used to represent quantitative variables, and the country that makes a car is a qualitative variable.

8. Volunteers for a research study were divided into three groups. Group 1 listened to Western religious music, group 2 listened to Western rock music, and group 3 listened to Chinese religious music. The blood pressure of each volunteer was measured before and after listening to the music, and the change in blood pressure (blood pressure before listening minus blood pressure after listening) was recorded.
To explore the relationship between type of music listened to and change in blood pressure, we could
A) see if blood pressure decreases as type of music increases by examining a scatterplot.
B) make a histogram of the change in blood pressure for all of the volunteers.
C) make side-by-side boxplots of the change in blood pressure, with a separate boxplot for each group.
D) make a pie chart displaying the distribution of type of music listened to for all of the volunteers.
E) do all of the above.

9. A school guidance counselor examines the number of extracurricular activities of students and their grade point average. The guidance counselor says, “The evidence indicates that the correlation between the number of extracurricular activities a student participates in and his or her grade point average is close to zero.” A correct interpretation of this statement would be that
A) active students tend to be students with poor grades, and vice versa.
B) students with good grades tend to be students that are not involved in many activities, and vice versa.
C) students involved in many extracurricular activities are just as likely to get good grades as bad grades. The same is true for students involved in few extracurricular activities.
D) as a student becomes more involved in extracurricular activities, there will be a change in his/her grades.
E) involvement in many extracurricular activities and good grades go hand in hand.

10. A student wonders if people of similar heights tend to date each other. She measures herself, her dormitory roommate, and the women in the adjoining rooms; then she measures the next man each woman dates. Here are the data (heights in inches):

Women   Men
66      72
64      68
66      70
65      68
70      74
65      69

Which of the following statements is true?
A) The variables measured are all categorical.
B) There is a strong negative association between the heights of men and women, since the women are always smaller than the men they date.
C) Tall women tend to date short men.
D) Any height above 70 inches must be considered an outlier.
E) There is a positive association between the heights of men and women who date each other.

11. Which of the following statements about the correlation coefficient is true?
A) The correlation coefficient measures the proportion of variability between the two variables.
B) The correlation coefficient will be equal to 1 only if all the data lie on a perfectly horizontal straight line.
C) The correlation coefficient measures the fraction of outliers that appear in a scatterplot.
D) The correlation coefficient has no unit of measurement and must always lie between –1 and 1, inclusive.
E) The correlation coefficient equals the proportion of times two variables lie on a straight line.

12. A study found a correlation of r = –0.61 between the gender of a worker and his or her income. We may correctly conclude that
A) women earn more than men on the average.
B) women earn less than men on the average.
C) an arithmetic mistake was made, since correlation must always be positive.
D) this result is incorrect, because computing r makes no sense in this situation.
E) on average, women earn 61% less than men.

13. Consider the scatterplot below. According to the scatterplot, which of the following is a plausible value for the correlation coefficient between weight and MPG?
A) 1.0.
B) 0.9.
C) 0.5.
D) 0.2.
E) 0.7.

14. Consider the scatterplot below. The correlation between X and Y is approximately
A) 0.999.
B) 0.8.
C) 0.5.
D) 0.
E) –0.7.

15. Consider the scatterplot below. We may conclude that
A) the correlation between X and Y must be close to 1 since there is a nearly perfect relationship between them.
B) the correlation between X and Y shows a quadratic relationship.
C) the correlation between X and Y is close to 0.
D) the correlation between X and Y could be any number between –1 and 1. Without knowing the actual values of X and Y we can say nothing more.
E) the correlation between X and Y must be close to –1 since there is a nearly perfect relation between them, but it is not a straight-line relation.

Use the following to answer questions 16-17: I wish to determine the correlation between the height (in inches) and weight (in pounds) of 21-year-old males. To do this, I measure the height and weight of two 21-year-old men. The measured values are

           Height   Weight
Male #1    70       160
Male #2    75       200

16. Referring to the information above, the correlation r computed from the measurements on these males is
A) equal to 1.
B) positive and between 0.25 and 0.75.
C) near 0, but could be either positive or negative.
D) exactly 0.
E) meaningless, since the slope is greater than 1.

17. Referring to the information above, which of the following units would the correlation coefficient r have?
A) Inches.
B) Pounds.
C) Pounds per inch.
D) None, because r has no units.
E) Inch-pounds.

18. Which of the following is true of the correlation coefficient r?
A) It is a resistant measure of association.
B) It does not change if either all the X-data or all the Y-data are multiplied by a positive constant.
C) If r is the correlation between X and Y, then –r is the correlation between Y and X.
D) r can never be 0 if there is a linear relationship between X and Y.
E) All of the above.

19. The scatterplot below is from a small data set. The data were classified as either type 1 or type 2. Those of type 1 are indicated by o's, those of type 2 by x's. The overall correlation of the data in this scatterplot is
A) positive.
B) near 0, since the overall data do not show a distinct pattern.
C) near 0, because the o's display a negative trend and the x's display a negative trend, but the trend from the o's to the x's is positive. The different trends cancel.
D) impossible to compute for such a data set.
E) negative, since the o's display a negative trend and the x's display a negative trend.

20. A scatterplot of a variable Y versus a variable X produced the results below. The value of Y for all values of X is exactly 1.0. The correlation between Y and X is
A) 1, because the points lie perfectly on a line.
B) either 1 or –1, because the points lie perfectly on a line.
C) 0, because Y does not change as X increases.
D) impossible to determine, since there is no slope to the data.
E) none of the above.

21. The profits (in multiples of $100,000) versus the sales (in multiples of $100,000) for a number of companies are plotted below. The correlation between profits and sales is 0.814. Suppose we removed the point that is circled from the data represented in the plot. The correlation between profits and sales would then be
A) 0.814.
B) significantly larger than 0.814.
C) significantly smaller than 0.814.
D) slightly larger than 0.814.
E) slightly smaller than 0.814.

22. Volunteers for a research study were divided into three groups. Group 1 listened to Western religious music, group 2 listened to Western rock music, and group 3 listened to Chinese religious music. The blood pressure of each volunteer was measured before and after listening to the music, and the change in blood pressure (blood pressure before listening minus blood pressure after listening) was recorded. A scatterplot of change in blood pressure versus type of music listened to is given below. The correlation between change in blood pressure and type of music listened to is
A) negative.
B) positive.
C) first negative, then positive.
D) nearly 0.
E) none of the above.

23. The profits (in multiples of $100,000) versus the sales (in multiples of $100,000) for a number of companies are plotted below. Notice that in the plot, profits is treated as the response variable and sales as the explanatory variable. The correlation between profits and sales is 0.814. Suppose we had taken sales to be the response variable and profits to be the explanatory variable.
In this case, the correlation between sales and profits would be
A) 0.814.
B) –0.814.
C) 0.
D) any number between –0.814 and 0.814, but we can't state the exact value.
E) 1, since the direction of the data doesn't change.

24. Below is a scatterplot of the calories and sodium content (in milligrams) of several brands of meat hot dogs. The least-squares regression line has been drawn on the plot. Based on the least-squares regression line in this scatterplot, one would predict that a hot dog containing 100 calories would have a sodium content (in milligrams) of about
A) 70.
B) 350.
C) 375.
D) 400.
E) 600.

25. The British government conducts regular surveys of household spending. The average weekly household spending on tobacco products and alcoholic beverages for each of 11 regions in Great Britain was recorded. A scatterplot of spending on alcohol versus spending on tobacco is given below. Which of the following statements is true?
A) The observation (4.5, 6.0) is an outlier.
B) There is clear evidence of a negative association between spending on alcohol and spending on tobacco.
C) The equation of the least-squares line for this plot would be approximately ŷ = 10 – 2x.
D) The correlation coefficient for these data is 0.99.
E) The observation in the lower right corner of the plot is influential.

26. The fraction of the variation in the values of y that is explained by the least-squares regression of y on x is
A) the correlation coefficient.
B) the slope of the least-squares regression line.
C) the square of the correlation coefficient.
D) the intercept of the least-squares regression line.
E) the residual.

27. In a statistics course, a linear regression equation was computed to predict a student's final exam score from his/her score on the first test. The equation of the least-squares regression line was
ŷ = 10 + 0.9x
where y represents the final exam score and x is the score on the first exam. Suppose Joe scores a 90 on the first exam.
What would be the predicted value of his score on the final exam?
A) 91.
B) 90.
C) 89.
D) 81.
E) It cannot be determined from the information given. We also need to know the correlation coefficient.

28. John's parents recorded his height at various ages up to 66 months. Below is a record of the results.

Age (months)      36   48   54   60   66
Height (inches)   35   38   41   43   45

Which of the following is the equation of the least-squares regression line of John's height on age? (NOTE: You do not need to directly calculate the least-squares regression line to answer this question.)
A) predicted height = 12 (Age).
B) predicted height = 0.34 + 22.3 (Age).
C) predicted height = Age/12.
D) predicted height = 60 – 0.22 (Age).
E) predicted height = 22.3 + 0.34 (Age).

29. Foresters use regression to predict the volume of timber in a tree using easily measured quantities such as diameter. Let y be the volume of timber in cubic feet and x be the tree's diameter in feet (measured at three feet above ground level). One set of data gives the following least-squares regression equation:
ŷ = –30 + 60x
The predicted volume of timber in a tree of diameter 18 inches is
A) 1080 cubic feet.
B) 1050 cubic feet.
C) 90 cubic feet.
D) 60 cubic feet.
E) 30 cubic feet.

30. A researcher wishes to determine whether the rate of water flow (in liters per second) over an experimental soil bed can be used to predict the amount of soil washed away (in kilograms). The researcher measures the amount of soil washed away for various flow rates and from these data calculates the least-squares regression line to be
predicted amount of eroded soil = 0.4 + 1.3 (flow rate)
The correlation between amount of eroded soil and flow rate would be
A) 1/1.3.
B) 0.4.
C) 1.3.
D) positive, but we cannot say what the exact value is using the information given.
E) either positive or negative. It is impossible to say anything about the correlation from the information given.

31. The least-squares regression line is
A) the line that makes the square of the correlation in the data as large as possible.
B) the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
C) the line that passes through the greatest number of data points.
D) the line that best splits the data in half, with half of the points above the line and half below the line.
E) all of the above.

32. Which of the following is true of the least-squares regression line?
A) The slope is the change in the response variable that would be predicted by a unit change in the explanatory variable.
B) It always passes through the point (X̄, Ȳ), the means of the explanatory and response variables, respectively.
C) It will only pass through all the data points if r = ±1.
D) No more than 50% of the residual values will be positive.
E) All of the above.

33. A researcher wishes to study how the average weight Y (in kilograms) of children changes during the first year of life. He plots these averages versus the children's age X (in months) and decides to fit a least-squares regression line to the data with X as the explanatory variable and Y as the response variable. He computes the following quantities.
r = correlation between X and Y = 0.9
X̄ = mean of the values of X = 6.5
Ȳ = mean of the values of Y = 6.6
sX = standard deviation of the values of X = 3.6
sY = standard deviation of the values of Y = 1.2
The slope of the least-squares line is
A) 0.30.
B) 0.88.
C) 1.01.
D) 2.7.
E) 3.0.

34. Recall that when we standardize the values of a variable, the distribution of standardized values has mean 0 and standard deviation 1. Suppose we measure two variables X and Y on each of several subjects. We standardize both variables and then compute the least-squares regression line of Y on X for these standardized values. Suppose the slope of this least-squares regression line is –0.44. We may conclude that
A) the correlation will be 1/–0.44.
B) the intercept will also be –0.44.
C) the intercept will be 1.0.
D) the correlation will be 1.0.
E) the correlation will also be –0.44.

35. In a study of 1991 model cars, a researcher found that the fraction of the variation in the price of cars that was explained by the least-squares regression on horsepower was about 0.64. For the cars in this study, the correlation between the price of the car and its horsepower was found to be positive. The actual value of the correlation
A) is 0.80.
B) is 0.64.
C) is 0.41.
D) is –0.80.
E) cannot be determined from the information given.

36. In a study of 1991 model cars, a researcher computed the least-squares regression line of price (in dollars) on horsepower. He obtained the following equation for this line.
predicted price = –6677 + 175 (horsepower)
Based on the least-squares regression line, we would predict that a 1991 model car with horsepower equal to 200 would cost
A) $41,677.
B) $35,000.
C) $34,175.
D) $28,323.
E) $13,354.

37. A scatterplot of the calories and sodium content of several brands of meat hot dogs is shown below. The least-squares regression line has been drawn on the plot. Referring to this scatterplot, the value of the residual for the point labeled x
A) is about 40.
B) is about 125.
C) is about 425.
D) is about 1300.
E) cannot be determined from the information given.

38. A researcher wishes to determine whether the rate of water flow (in liters per second) over an experimental soil bed can be used to predict the amount of soil washed away (in kilograms). The researcher measures the amount of soil washed away for various flow rates and from these data calculates the least-squares regression line to be
predicted amount of eroded soil = 0.4 + 1.3 (flow rate)
One of the flow rates used by the researcher was 0.3 liters per second; for this flow rate, the amount of eroded soil was 0.8 kilograms. These values were used in the calculation of the least-squares regression line. The residual corresponding to these values is
A) 0.01.
B) –0.01.
C) 0.5.
D) –0.5.
E) –3.5.

39. A response variable Y and explanatory variable X were measured on each of several subjects. A scatterplot of the measurements is shown below. The least-squares regression line is shown in the plot. Which of the following five plots is a plot of the residuals for the data shown in the scatterplot above versus X?
[Figure: five candidate residual plots, labeled A) through E).]

40. A least-squares regression line is fitted to a set of data. If one of the data points has a positive residual, then
A) the correlation between the values of the response and explanatory variables must be positive.
B) the point must lie above the least-squares regression line.
C) the slope of the least-squares regression line must be positive.
D) the point must lie near the right edge of the scatterplot.
E) all of the above.

41. Which of the following statements concerning residuals is true?
A) The sum of the residuals is always 0.
B) A plot of the residuals is useful for assessing the fit of the least-squares regression line.
C) The value of a residual is the observed value of the response minus the value of the response that one would predict from the least-squares regression line.
D) If the data are linear, then the plot of the residuals should have no discernible pattern.
E) All of the above.

42. Consider the scatterplot below. The point indicated by the plotting symbol x would be
A) a residual.
B) influential.
C) a z-score.
D) a least-squares point.
E) a partial outlier.

43. A sample of 79 companies was taken, and the annual profits (y) were plotted against annual sales (x). The plot is given below. All values in the plot are in units of $100,000. The correlation between sales and profits is found to be 0.814. Based on this information, we may conclude which of the following?
A) If the sales were less than $20,000, the equation of the least-squares regression line would predict the profits quite accurately.
B) There are clearly influential observations present.
C) If we group the companies in the plot into those that are small in size, those that are medium in size, and those that are large in size and compute the correlation between sales and profits for each group of companies separately, the correlation in each group will be about 0.8.
D) Not surprisingly, increasing sales causes an increase in profits. This is confirmed by the large positive correlation.
E) All of the above.

Answer Key
1. A   2. E   3. B   4. B   5. C   6. A   7. B   8. C   9. C   10. E
11. D  12. D  13. B  14. B  15. C  16. A  17. D  18. B  19. A  20. C
21. C  22. E  23. A  24. B  25. E  26. C  27. A  28. E  29. D  30. D
31. B  32. E  33. A  34. E  35. A  36. D  37. A  38. A  39. A  40. B
41. E  42. B  43. B
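Several correlation questions (10, 16, 17, 18 and 23) can be checked numerically. Below is a minimal plain-Python sketch of Pearson's r, written from the textbook definition rather than taken from any particular library. The name `pearson_r` is a hypothetical helper introduced only for this illustration. The heights are the ones given in questions 10 and 16.

```python
from math import fsum, sqrt


def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations divided by sqrt(Sxx * Syy)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    mx = fsum(x) / len(x)
    my = fsum(y) / len(y)
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = fsum((a - mx) ** 2 for a in x)
    syy = fsum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Question 10: heights (inches) of the women and of the men they dated.
women = [66, 64, 66, 65, 70, 65]
men = [72, 68, 70, 68, 74, 69]
r_dates = pearson_r(women, men)                  # about 0.91, strongly positive

# Question 16: with only two distinct points, r is exactly +1 or -1.
r_two = pearson_r([70, 75], [160, 200])          # 1.0

# Questions 17-18: r has no units, so converting inches to centimeters
# (multiplying by the positive constant 2.54) leaves it unchanged ...
r_cm = pearson_r([h * 2.54 for h in women], men)
# ... while multiplying by a negative constant only flips the sign.
r_neg = pearson_r([-h for h in women], men)

# Question 23: r does not depend on which variable is called the response.
r_swapped = pearson_r(men, women)
```

Only the standard library is used, so the check should run on any recent Python 3. On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.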
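Questions 28 and 33 to 35 rest on a few least-squares identities:

- The slope is Sxy/Sxx.
- The line passes through (X̄, Ȳ).
- The slope equals r·sY/sX.
- After both variables are standardized, the slope is r itself.

The sketch below illustrates these identities in plain Python. The helper names (`least_squares`, `slope_from_summary`, `standardize`) are hypothetical, and the code assumes short lists of numbers. The data and summary statistics come from questions 28 and 33.

```python
from math import fsum, sqrt


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx = fsum(x) / len(x)
    my = fsum(y) / len(y)
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = fsum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    # The line always passes through (x-bar, y-bar), which fixes the intercept.
    intercept = my - slope * mx
    return intercept, slope


def slope_from_summary(r, s_x, s_y):
    """Slope b = r * s_y / s_x, computed from summary statistics only."""
    return r * s_y / s_x


def standardize(v):
    """z-scores using the sample standard deviation (divisor n - 1)."""
    m = fsum(v) / len(v)
    s = sqrt(fsum((a - m) ** 2 for a in v) / (len(v) - 1))
    return [(a - m) / s for a in v]


# Question 28: John's height (inches) against age (months).
age = [36, 48, 54, 60, 66]
height = [35, 38, 41, 43, 45]
b0, b1 = least_squares(age, height)              # about 22.3 and 0.34

# Question 33: r = 0.9, sX = 3.6, sY = 1.2.
b33 = slope_from_summary(0.9, 3.6, 1.2)          # about 0.30

# Question 34: after standardizing both variables, the fitted slope equals r,
# which for z-scores is the sum of products divided by n - 1.
z_age, z_height = standardize(age), standardize(height)
_, b_std = least_squares(z_age, z_height)
r_age_height = fsum(a * b for a, b in zip(z_age, z_height)) / (len(age) - 1)

# Question 35: r^2 = 0.64 and the association is positive, so r = +sqrt(0.64).
r35 = sqrt(0.64)                                 # 0.8
```

For question 35, the sign cannot be recovered from r² alone. It comes from the statement in the question that the correlation was positive.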
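Questions 27, 29, 36 and 38 are plug-in arithmetic on a fitted line. The two helpers below evaluate ŷ = a + bx and compute observed minus predicted. The names `predict` and `residual` are hypothetical, chosen only for this sketch. Writing the steps out makes the unit conversion in question 29 explicit.

```python
def predict(intercept, slope, x):
    """Value of the least-squares line y-hat = intercept + slope * x."""
    return intercept + slope * x


def residual(observed, intercept, slope, x):
    """Residual = observed response minus predicted response."""
    return observed - predict(intercept, slope, x)


# Question 27: y-hat = 10 + 0.9x at a first-exam score of 90.
q27 = predict(10, 0.9, 90)                # 91

# Question 29: y-hat = -30 + 60x with x in FEET, so convert 18 inches first.
diameter_ft = 18 / 12                     # 1.5 feet
q29 = predict(-30, 60, diameter_ft)       # 60 cubic feet

# Question 36: predicted price = -6677 + 175 (horsepower) at 200 horsepower.
q36 = predict(-6677, 175, 200)            # 28323 dollars

# Question 38: observed 0.8 kg at a flow rate of 0.3 L/s; line 0.4 + 1.3 (flow rate).
q38 = residual(0.8, 0.4, 1.3, 0.3)        # about 0.01, up to floating-point rounding
```

In question 29, the unit conversion is the step that catches people. Plugging 18 in directly gives –30 + 60(18) = 1050, which is option B.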
© Copyright 2024