A03 - Paul Gribble

Assignment 3—Due Friday Apr 3
Introduction to Statistics Using R
Paul Gribble
Winter, 2015
1
Multiple Regression
You are being considered for a position as the new coach of a professional basketball team. You just
finished a meeting with the owners, who are interested in replacing the existing coach because the
team’s performance has been going downhill. The owners of the team have attributed this downward
trend to poor recruitment of new players, who have been poor performers. The owners tell you that
their analysis of the data from the previous seasons indicates that one particularly weak area has been
“field goals” (when a player attempts to score a basket from the 3-point line or beyond). The team’s
players, in particular the new recruits, tend to be poor at field goals. The problem is, as the owners
tell you, that data are not available on field goal percentages, for players who are candidate recruits;
so when it comes time to recruit new players there is no easy way to pick new recruits who are good
field goal shooters. The owners tell you that they have been collecting data for years and years on the
new recruits and their field goal percentages, and a series of other measures that are available for the
new potential recruits as well. She suggests that “if you are smart enough to use the historical data to
be able to predict who among the potential new recruits will be the best field goal shooters, you are
hired!”
In the file called bball.csv1 you will find historical data on 105 previous recruits. The variables are:
GAMES
PPM
MPG
HGT
FGP
AGE
FTP
# games played in previous season
average points scored per minute
average minutes played per game
height of player (centimetres)
field-goal percentage (% successful shots from 3-point line)
age of player (years)
percentage of successful penalty free throws
In the file bball2.csv2 you will find data on 5 potential new recruits.
1. Use multiple regression to develop a model based on the data in bball.csv that will allow you
to predict the field goal percentages for the 5 new recruits.
1
2
http://www.gribblelab.org/stats/data/bball.csv
http://www.gribblelab.org/stats/data/bball2.csv
Intro Stats R
Assignment 3
2. Which player will you suggest to the owners that they hire? Explain your decision.
3. What is the precision (in units of FGP) with which you can predict field goal percentage?
4. After giving them your recommendation and explaining your model, the owners ask you “does
the player’s age have anything to do with this?”
5. “What about the player’s height? Would I be better off taking on taller players?”
6. “How many more (or fewer) field goals would a player score if he was 3 inches taller?”
2
Bootstrapping
Power Calculations
You are planning a new reaction time study for your thesis. You plan to have a main experimental
condition and a control condition. The control condition has been used previously in your lab, and
when you look at the previous data across many subjects, the reaction time data are best characterized
using an ex-gaussian 3 distribution with the following parameters: µ = 500, σ = 50, ν = 150. Note
that in R, the function rexGAUS() in the gamlss.dist package can be used to generate random
samples from the ex-gaussian distribution.
0.000 0.001 0.002 0.003 0.004
Density
RT:Controls
500
1000
1500
RT
Nobody has tried the experimental condition that you’re planning, but other studies in the literature
that use reaction time as a measure typically see changes in reaction time that correspond to an increase
3
http://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution
2
Intro Stats R
Assignment 3
in the µ parameter of the ex-gaussian of about +50 milliseconds (50 ms slower than controls). You
plan to use a Mann-Whitney U test to compare differences in mean reaction time between your control
group and your experimental group.
Note that the rexGAUS() function is extremely slow, so in a large loop it will take forever. Instead you
can for now, implement random sampling from an ex-gaussian distribution by taking the sum of random
values from a gaussian distribution and an exponential distribution. For example for the purposes of
speed in this assignment, you can assume that rnorm(1000,500,50) + rexp(1000,1/150) will
give you 1000 random values from an ex-gaussian distribution, just as if you were to get them using:
rexGAUS(1000,500,50,150).
1. Use parametric bootstrapping to estimate how many subjects you would require in each group
(assume equal group sizes) in order to detect a difference of +50 milliseconds at least 80% of
the time; in other words, to get a statistical power of 0.80. Assume an alpha level of 0.05.
2. How many subjects would you require for a statistical power of 0.90?
3. If the effect is only 25 milliseconds, how many subjects would you need for a power of 0.80?
Hypothesis Testing
After running an experiment with a control group and an experimental group, your data look like
this: 4
control
935 508
557 571
607 564
767 713
717 611
515 972
498 438
557 641
492 653
803 714
experimental
857
801
511
772
694
635
541 1207
719
663
1143 613
680
579
1161 1128
837
475
609
883
4. Use non-parametric bootstrapping to test the null hypothesis that the two groups were sampled
from the same population with the same mean.
4
http://www.gribblelab.org/stats/data/bootdata.txt
3