Choose Ohio First Scholarship Program - Success in Mathematics Regression Lines Brian Dudek

Choose Ohio First Scholarship Program - Success in Mathematics
Funded by the Ohio Board of Regents
Regression Lines
Brian Dudek
Alex Miksit
Melinda Toth
Question
If we know a goalie's height, can we predict their save percentage?
– Hypothesis: the taller the soccer goalie, the higher their save percentage will be.
– We will test our hypothesis by fitting different regression functions.
The goal of regression analysis is to create a
model used to predict Y when given X.
How to determine the right regression
1. Plot the data points.
2. Draw different forms of regression.
3. Calculate the residuals and choose the regression curve with the least error.
– A residual is the vertical distance between a plotted point and the regression function.
– The sum of the squared residuals is the key to determining which regression curve fits best.
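The comparison in step 3 can be sketched in a few lines of Python. This is only an illustration: the data points and the two candidate models (`line` and `parabola`) are made up for the example and are not taken from our goalie data.

```python
def residuals(xs, ys, model):
    """Return the vertical distance y - model(x) for every data point."""
    return [y - model(x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, model):
    """Add up the squared residuals; a smaller total means a closer fit."""
    return sum(r ** 2 for r in residuals(xs, ys, model))


# Hypothetical data and candidate models, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def line(x):
    return 2.0 * x


def parabola(x):
    return 0.5 * x ** 2 + 1.0


candidates = {"line": line, "parabola": parabola}
errors = {name: sum_squared_residuals(xs, ys, f) for name, f in candidates.items()}

# The candidate with the smallest sum of squared residuals fits best.
best = min(errors, key=errors.get)
```

Squaring keeps positive and negative residuals from cancelling each other out, which is why the squared sum is compared rather than the plain sum.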
Different forms of regression
Least-Squares Line (LSL) - The least-squares line is the line that makes the sum of the squared vertical distances between the plotted points and itself as small as possible. It is the line with the minimal overall deviation from the data points.
Formula for LSL
y = mx + b
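One way to find m and b is the standard closed-form least-squares solution, sketched below. The function name `least_squares_line` and the sample points are assumptions made for this example; a graphing calculator or statistics package would normally do this step.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the line y = mx + b that minimizes the
    sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # the line passes through (mean_x, mean_y)
    return m, b


# Hypothetical points that lie exactly on y = 2x + 1.
m, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The slope is undefined when every x value is the same, so the sketch assumes at least two distinct x values.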
Different forms of regression
Median-Median Line - Sorts the points by x and splits them into three groups, takes the median x and median y of each group, and then uses those three median points to find the slope and the y-intercept. The med-med line is less affected by outliers than the LSL.
An outlier is a data point that lies an abnormal distance from the other values in a sample.
Formula for Med-Med Line
y = mx + b
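The med-med procedure can be sketched as follows. The group-size rule used here (an extra point goes to the middle group, two extra points go to the outer groups) is a common textbook convention and is an assumption of this sketch, as are the function name and the sample data.

```python
from statistics import median


def median_median_line(xs, ys):
    """Return (m, b) for the median-median line y = mx + b."""
    points = sorted(zip(xs, ys))          # order the points by x
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    base, extra = divmod(n, 3)
    outer = base + (1 if extra == 2 else 0)   # size of left and right groups
    middle = base + (1 if extra == 1 else 0)  # size of middle group
    groups = [points[:outer],
              points[outer:outer + middle],
              points[outer + middle:]]
    # One summary point (median x, median y) per group.
    summary = [(median(p[0] for p in g), median(p[1] for p in g)) for g in groups]
    (x1, y1), _, (x3, y3) = summary
    m = (y3 - y1) / (x3 - x1)             # slope from the outer summary points
    # Intercept: average of the intercepts through all three summary points.
    b = sum(y - m * x for x, y in summary) / 3
    return m, b


# Hypothetical points on y = 2x + 1, with the last y replaced by an outlier.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3, 5, 7, 9, 11, 13, 15, 17, 100]
m, b = median_median_line(xs, ys)
```

Because each group is summarized by its medians, the single outlier at x = 9 does not move the line in this example.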
Different forms of regression
Logarithmic
• y = ln x
Quadratic
• y = ax^2 + bx + c
Power
• y = x^a
• Ex. Reaction rate vs. substrate concentration in a chemical reaction
Exponential
• y = a^x
• Ex. A graph of time vs. speed for an object affected by gravity
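Exponential and power curves can be fitted with the same straight-line method after taking logarithms. The sketch below assumes the slightly more general forms y = c * a^x and y = c * x^a (the scale factor c is an addition for this example), and all function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(xs, ys):
    """Fit y = c * a**x by fitting a line to (x, ln y). Requires y > 0."""
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), math.exp(slope)   # (c, a)


def fit_power(xs, ys):
    """Fit y = c * x**a by fitting a line to (ln x, ln y). Requires x, y > 0."""
    slope, intercept = fit_line([math.log(x) for x in xs],
                                [math.log(y) for y in ys])
    return math.exp(intercept), slope             # (c, a)


# Hypothetical data generated from known curves.
xs = [1, 2, 3, 4]
c_exp, a_exp = fit_exponential(xs, [3 * 2 ** x for x in xs])
c_pow, a_pow = fit_power(xs, [5 * x ** 1.5 for x in xs])
```

Note that this approach minimizes the squared residuals of the logarithms, not of the original y values, so it can give slightly different answers than a direct fit.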
Regressions: LSL Line and Med-Med Line
Residuals
By looking at the r^2 values computed from the residuals, we can see that they are nowhere near the preferred value of 1.00.
Plot of Residuals: LSL Line
By looking at the graph of our residuals, we see that yet again they are nowhere near the preferred value.
Plot of Residuals: Med-Med Line
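The r^2 value quoted on these slides can be computed from the residuals with the usual definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample numbers below are assumptions for illustration.

```python
def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical observations.
ys = [1.0, 3.0, 5.0, 7.0]

# A model that hits every point exactly gives r^2 = 1.
perfect = r_squared(ys, [1.0, 3.0, 5.0, 7.0])

# A model that always predicts the mean of y gives r^2 = 0.
mean_only = r_squared(ys, [4.0, 4.0, 4.0, 4.0])
```

A value near 1.00 means the curve explains most of the variation in the data; a value near 0 means the curve does little better than always guessing the average.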
Regressions: Quadratic and Cubic
Residuals
By looking at the r^2 values computed from the residuals, we can see that they are nowhere near the preferred value of 1.00.
Plot of Residuals: Quadratic
Plot of Residuals: Cubic
Regressions: Quartic and Power
Residuals
Plot of Residuals: Quartic
By looking at the r^2 values computed from the residuals, we can see that they are nowhere near the preferred value of 1.00.
Plot of Residuals: Power
Regressions: Exponential and Logarithmic
Plot of Residuals: Exponential
By looking at the r^2 values computed from the residuals, we can see that they are nowhere near the preferred value of 1.00.
Plot of Residuals: Logarithmic
Analyze Results
Do the residuals show which regression model is the best possible fit for a given set of data?
YES!
Question
If we know a goalie's height, can we predict their save percentage?
Conclusion
Looking at our information, we have concluded that there is only a slight correlation between the height of the player and their save percentage. This is because our data points are nowhere close to any of the regression curves, so the residuals are large.
Therefore, the best-fit curve, however poor, is the quartic for the relationship between height and save percentage for soccer goalies!
Extension
Now we are curious if there is a correlation
between age and save percentage, or
professional career length and save
percentage.
Information
07/08 Stats
Name            Shots on Goal  Shots Saved  Percent  Height (m)  Age  Years Playing
Pat Onstad                100           76    76.00        1.93   40             21
Jon Busch                 155          122    78.71        1.78   31             11
William Hesmer            130           97    74.62        1.88   26              4
Bouna Coundoul             75           54    72.00        1.88   26              2
Joe Cannon                162          124    76.54        1.88   33             10
Kevin Hartman             156          117    75.00        1.85   33             11
Nick Rimando              135           96    71.11        1.78   28              8
Dario Sala                129           92    71.32        1.93   33             14
Brad Guzan                 68           48    70.59        1.93   23              4
Matt Reis                 145          107    73.79        1.85   33             10
Greg Sutton               151          116    76.82        1.98   30              9
Jon Conway                140           98    70.00        1.98   30              8
Louis Crayton              62           43    69.35        1.83   30             13
Zach Wells                 84           56    66.67        1.88   27              4
Preston Burpo              61           37    60.66        1.91   35             13
Steve Cronin              136           92    67.65        1.91   24              4
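The Percent column is the share of shots on goal that were saved, rounded to two decimal places. A short check on three rows of the table (the function name `save_percentage` is an assumption for this sketch):

```python
def save_percentage(shots_on_goal, shots_saved):
    """Percent of shots on goal that the goalie saved, to two decimals."""
    return round(100.0 * shots_saved / shots_on_goal, 2)


# Three rows from the table: (name, shots on goal, shots saved).
rows = [
    ("Pat Onstad", 100, 76),
    ("Jon Busch", 155, 122),
    ("Preston Burpo", 61, 37),
]
percents = {name: save_percentage(shots, saved) for name, shots, saved in rows}
```

The results match the 76.00, 78.71, and 60.66 listed for these goalies.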