AP Stats – Chap 27 Inferences for Regression

Finally, we’re interested in examining how slopes of regression lines vary from sample to sample. Each sample will have its own slope, b1. These are all estimates of the “true” slope, β1. When the conditions below are met, the standardized slopes, (b1 – β1) / SE(b1), follow a t-model with (n – 2) degrees of freedom.

Things you will need to recall about regression from Chapters 7-9:
• how do we make a scatterplot?
• what do we look for in a scatterplot?
• how do we find the correlation? what does it mean?
• what does R² mean?
• how do we create a linear model (regression equation)?
• which variable goes where in the equation, and which gets the hat on it?
• how do we know if a linear model is appropriate?
• how do we create a residuals plot?
• what does the slope mean…in context?
• what does the y-intercept mean…in context?

Regression Slope t-Test

HYPOTHESIS
null: …within the population, there is no association between the variables (that we see in the example). …that the ideal regression line is plain, boring, has a β1 = 0 (horizontal line).
alternative: …there is some relationship between the variables. …the β1 ≠ 0

MODEL
conditions must be checked in order
• Straight Enough Condition – is the scatterplot of the original data straight enough? check the residuals plot! you may need to re-express.
• Independence Condition – this is nearly impossible to check, so check for Randomization. often, the fact that the individuals are a representative sample of the population is the best that can be done.
• Does the Plot Thicken? Condition – the spread of the data around the regression “line” should be nearly constant. no fan shape! no growing or shrinking tendencies. again…look at residuals plot!
• Nearly Normal Condition – make a histogram of the residuals. it needs to be symmetric and unimodal enough.
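The residuals used in the Straight Enough, Does the Plot Thicken?, and Nearly Normal checks can also be computed away from the calculator. The following is a minimal sketch in Python, not part of the course materials: the data are made up for illustration, and the function names fit_line and residuals are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y):
    """Residual = observed y minus predicted y-hat, one per data point."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up illustrative data (NOT taken from any example in these notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
res = residuals(x, y)
print("y-hat = %.2f + %.2f x" % (b0, b1))
print("residuals:", [round(r, 2) for r in res])
```

Plotting res against x gives the residuals plot (look for curvature or a fan shape), and a histogram of res is what the Nearly Normal Condition asks for.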
If all four conditions are true, the ideal regression line would look like: [figure]
“With the conditions having been met, we can use a regression model for the distribution and a linear regression t-test.”

MECHANICS
if you have the individual data:
• enter data into L1 and L2
• STAT
• TESTS
• LinRegTTest
  o Xlist: L1
  o Ylist: L2
  o Freq: 1
  o choose the two-tailed (≠) option
  o RegEQ:
  o CALCULATE
if you have a computer regression analysis:
• if the t-value is not given in the analysis, you’ll need to calculate t = b1 / SE(b1)
in either case:
• draw a t-curve and shade it
• list the p-value
• list the regression equation (found in Y1)

CONCLUSION
reject / fail to reject
“There is evidence that…” (provide context!)

confidence interval:
• if you have a TI-84/84+:
  o STAT
  o TESTS
  o LinRegTInt
• if you don’t have a TI-84/84+:
  o find the t* value as before. (here, the df = n – 2)
  o interval is b1 ± (t*) SE(b1)
“We are ___% confident that the average __(dependent variable)__ increases/decreases/rises/falls/faster/slower/etc. between _(low)_ and _(high)_ __(units)__ for each additional __(independent variable)__.”

Example #1 High Stakes Test
New state requirements force students to take a “high stakes” math test in order to graduate from high school. Faced with such a pressure-laden situation, many students become very nervous, which may interfere with their ability to perform well. Concerned about “test anxiety,” a researcher enlists 24 student volunteers for a study. A psychologist interviews them before the math test, assessing their anxiety levels on a scale from 1 to 10. The table shows the anxiety levels and exam scores.
1. Sketch a scatterplot.
2. Does there appear to be an association between anxiety level and test score? Describe what you see in the scatterplot.
3. Find the correlation. What does it indicate?
4. Interpret the R² in context.
5. Create the linear model.
6. Is this linear model appropriate? Sketch and discuss the residuals plot.
7. Interpret the slope of this line in context.
8. Interpret the y-intercept of this line in context.
9. Is there evidence of an association between anxiety levels and student performance? (Perform a test.)
10. Provide a 95% confidence interval.

Example #2 Electricity Usage
Investigate the association between average monthly temperature (°F) and electrical usage (kilowatt hours) for a home.
Original data – avg temp (x) v. kwh (y)
Residual plot – avg temp (x) v. residuals (y)
Histogram of residuals
Is there evidence of an association between average monthly temperature and electrical usage? Explain the association using a 95% confidence interval.

Example #3 GPAs
Ten students in a graduate program were randomly selected. Their grade point averages (GPAs) when they entered the program were between 3.5 and 4.0. The students’ GPAs on entering the program and their current GPAs were recorded. Use the regression analysis below to answer the questions.
1. Create the linear model.
2. Interpret the p-value.
3. Find a 95% confidence interval for the slope of the regression line.

Example #4 Heights and Weights
Is the height of a man related to his weight? The regression analysis from a sample of 26 men is shown.
1. How many degrees of freedom are there?
2. What is the t-value?
3. Find a 98% confidence interval for the slope of the regression line.
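The mechanics of the slope t-test and confidence interval described earlier can be checked by hand. The following is a rough sketch in Python under stated assumptions: the data are made up (not from any example above), the function name slope_inference is hypothetical, and the critical value t* is looked up in a t-table and passed in rather than computed. It uses n – 2 degrees of freedom, as stated at the start of these notes, and the standard formula SE(b1) = s_e / sqrt(Σ(x – x̄)²), where s_e is the standard deviation of the residuals.

```python
from math import sqrt
from statistics import mean

def slope_inference(x, y, t_star):
    """Slope t statistic (for H0: beta1 = 0) and confidence interval for beta1.

    t_star is the critical value from a t-table for df = n - 2.
    """
    n = len(x)
    df = n - 2
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # sample slope
    b0 = y_bar - b1 * x_bar             # sample intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / df)                # standard deviation of the residuals
    se_b1 = s_e / sqrt(sxx)             # standard error of the slope
    t = b1 / se_b1                      # t = b1 / SE(b1)
    margin = t_star * se_b1             # b1 +/- (t*) SE(b1)
    return {"df": df, "b1": b1, "se_b1": se_b1, "t": t,
            "ci": (b1 - margin, b1 + margin)}

# Made-up illustrative data; with n = 5, df = 3 and the 95% t* is 3.182.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = slope_inference(x, y, t_star=3.182)
print(result)
```

For this toy data the interval contains 0, which agrees with the test: |t| is smaller than t*, so we would fail to reject the null hypothesis at the 5% level.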
© Copyright 2024