Assignment 3: Due Friday Apr 3
Introduction to Statistics Using R
Paul Gribble
Winter, 2015

1 Multiple Regression

You are being considered for a position as the new coach of a professional basketball team. You have just finished a meeting with the owners, who are interested in replacing the existing coach because the team’s performance has been going downhill. The owners attribute this downward trend to poor recruitment: the new players have been poor performers. The owners tell you that their analysis of the data from previous seasons indicates that one particularly weak area has been “field goals” (when a player attempts to score a basket from the 3-point line or beyond). The team’s players, in particular the new recruits, tend to be poor at field goals.

The problem, as the owners tell you, is that field goal percentages are not available for candidate recruits, so when it comes time to recruit there is no easy way to pick players who are good field goal shooters. The owners tell you that they have been collecting data for years on previous recruits: their field goal percentages, and a series of other measures that are also available for potential new recruits. One owner suggests that “if you are smart enough to use the historical data to be able to predict who among the potential new recruits will be the best field goal shooters, you are hired!”

In the file called bball.csv¹ you will find historical data on 105 previous recruits. The variables are:

GAMES   number of games played in previous season
PPM     average points scored per minute
MPG     average minutes played per game
HGT     height of player (centimetres)
FGP     field-goal percentage (% successful shots from 3-point line)
AGE     age of player (years)
FTP     percentage of successful penalty free throws

In the file bball2.csv² you will find data on 5 potential new recruits.

1. Use multiple regression to develop a model based on the data in bball.csv that will allow you to predict the field goal percentages for the 5 new recruits.
2. Which player will you suggest to the owners that they hire? Explain your decision.
3. What is the precision (in units of FGP) with which you can predict field goal percentage?
4. After giving them your recommendation and explaining your model, the owners ask you “does the player’s age have anything to do with this?”
5. “What about the player’s height? Would I be better off taking on taller players?”
6. “How many more (or fewer) field goals would a player score if he was 3 inches taller?”

¹ http://www.gribblelab.org/stats/data/bball.csv
² http://www.gribblelab.org/stats/data/bball2.csv

2 Bootstrapping Power Calculations

You are planning a new reaction time study for your thesis. You plan to have a main experimental condition and a control condition. The control condition has been used previously in your lab, and when you look at the previous data across many subjects, the reaction time data are best characterized using an ex-gaussian³ distribution with the following parameters: µ = 500, σ = 50, ν = 150. Note that in R, the function rexGAUS() in the gamlss.dist package can be used to generate random samples from the ex-gaussian distribution.

[Figure: density plot of control reaction times, titled “RT:Controls”; x-axis: RT, y-axis: Density]

Nobody has tried the experimental condition that you’re planning, but other studies in the literature that use reaction time as a measure typically see changes in reaction time that correspond to an increase in the µ parameter of the ex-gaussian of about +50 milliseconds (50 ms slower than controls). You plan to use a Mann-Whitney U test to compare differences in mean reaction time between your control group and your experimental group.

³ http://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution
Note that the rexGAUS() function is extremely slow, so in a large loop it will take forever. For now you can instead implement random sampling from an ex-gaussian distribution by taking the sum of random values from a gaussian distribution and an exponential distribution. For example, for the purposes of speed in this assignment, you can assume that rnorm(1000,500,50) + rexp(1000,1/150) will give you 1000 random values from an ex-gaussian distribution, just as if you had obtained them using rexGAUS(1000,500,50,150).

1. Use parametric bootstrapping to estimate how many subjects you would require in each group (assume equal group sizes) in order to detect a difference of +50 milliseconds at least 80% of the time; in other words, to get a statistical power of 0.80. Assume an alpha level of 0.05.
2. How many subjects would you require for a statistical power of 0.90?
3. If the effect is only 25 milliseconds, how many subjects would you need for a power of 0.80?

Hypothesis Testing

After running an experiment with a control group and an experimental group, your data look like this:⁴

control        935 508 557 571 607 564 767 713 717 611 515 972 498 438 557 641 492 653 803 714
experimental   857 801 511 772 694 635 541 1207 719 663 1143 613 680 579 1161 1128 837 475 609 883

4. Use non-parametric bootstrapping to test the null hypothesis that the two groups were sampled from the same population with the same mean.

⁴ http://www.gribblelab.org/stats/data/bootdata.txt
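As a rough illustration of how the sampling shortcut in the note above can feed a parametric bootstrap, the sketch below simulates many experiments at a given group size and reports the fraction in which the Mann-Whitney U test (wilcox.test() in base R) rejects the null hypothesis at the chosen alpha. The helper names rexgauss_fast() and estimate_power(), the candidate group sizes, and the number of simulations are illustrative assumptions, not part of the assignment.

```r
set.seed(1)

# Fast ex-gaussian sampler: sum of a gaussian and an exponential draw,
# following the shortcut described in the note (hypothetical helper name).
rexgauss_fast <- function(n, mu, sigma, nu) {
  rnorm(n, mean = mu, sd = sigma) + rexp(n, rate = 1 / nu)
}

# Estimate power for one group size by simulating n_sims experiments
# (hypothetical helper name; defaults mirror the parameters in the text).
estimate_power <- function(n_per_group, effect = 50, n_sims = 2000,
                           alpha = 0.05, mu = 500, sigma = 50, nu = 150) {
  p_values <- replicate(n_sims, {
    control      <- rexgauss_fast(n_per_group, mu, sigma, nu)
    experimental <- rexgauss_fast(n_per_group, mu + effect, sigma, nu)
    wilcox.test(control, experimental)$p.value
  })
  # Power is the proportion of simulated experiments that reject at alpha.
  mean(p_values < alpha)
}

# Coarse grid of candidate group sizes (assumed values, chosen for illustration).
group_sizes <- c(20, 40, 80, 160)
power_by_n  <- sapply(group_sizes, estimate_power, n_sims = 500)
print(data.frame(n_per_group = group_sizes, power = power_by_n))
```

Scanning a finer grid of group sizes, and raising n_sims to reduce Monte Carlo noise, would narrow down the smallest group size that reaches the target power. Changing the effect argument covers the 25 millisecond case.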