Solution to Assignment 3, winter 2015
Introduction to Statistics/MAT2375
Textbook problems
[7.1-4]
Solution.
Here are the R commands:
x=c(55.95, 56.54,57.58,55.13,57.48,56.06,59.93,58.30,52.57,58.46)
xbarre=mean(x)
var=4
alpha=0.05
n=length(x)
z=qnorm(1-alpha/2)
errmax=z*sqrt(var/n)
int=c(xbarre-errmax,xbarre+errmax)
1. 56.8
2. [ 55.56 ; 58.04 ]
3. Assume $\mu = 56.8$. Then
\[
\Pr(X < 52) = \Pr\left(\frac{X - 56.8}{2} < \frac{52 - 56.8}{2}\right) = \Pr(Z < -2.4) = \Phi(-2.4) = 0.0082,
\]
where $Z \sim N(0, 1)$.
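The probability in part 3 can be checked numerically. The following is only a small sketch: the names `mu`, `sigma`, `z` and `p` are illustrative and not part of the original solution, and the standard deviation 2 comes from the known variance of 4.

```r
# Check of Pr(X < 52) when X ~ N(56.8, 4), i.e. standard deviation 2.
mu <- 56.8          # assumed value of the mean (the sample mean from part 1)
sigma <- sqrt(4)    # the variance is known to be 4
z <- (52 - mu) / sigma                  # standardised value
p <- pnorm(52, mean = mu, sd = sigma)   # equivalent to pnorm(z)
round(c(z = z, p = p), 4)
```

Standardising by hand and calling `pnorm` with `mean` and `sd` give the same probability, so either form can be used.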
[7.1-6]
Solution.
Use normal approximation (CLT) for the mean of observations.
R commands :
xbarre=11.95
s=11.80
alpha=0.05
n=37
z=qnorm(1-alpha/2)
errmax=z*s/sqrt(n)
int=c(xbarre-errmax,xbarre+errmax)
The 95% confidence interval is [ 8.15 ; 15.75 ] (colonies/100 ml).
[7.1-8]
Solution
R commands :
x=c(37.4, 48.8, 46.9, 55.0, 44.0)
xbarre=mean(x)
alpha=0.1
n=length(x)
sd=sd(x)
t=qt(alpha/2, df=n-1, lower=F)
errmax=t*sd/sqrt(n)
int=c(xbarre-errmax,xbarre+errmax)
1. $\bar{x} = 46.42$
2. The 90% confidence interval is [ 40.26 ; 52.58 ].
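As a cross-check, the same interval can be obtained from R's built-in `t.test()`. This is a sketch for verification only; the variable names `res` and `ci` are illustrative.

```r
# Cross-check of the 90% t interval with the built-in t.test().
x <- c(37.4, 48.8, 46.9, 55.0, 44.0)
res <- t.test(x, conf.level = 0.90)   # one-sample t procedure
ci <- as.numeric(res$conf.int)        # lower and upper limits
round(ci, 2)
```

The `conf.int` component holds the two limits, which should agree with the interval computed by hand from `qt()`.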
[7.1-14]
Solution
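1. If $\sigma^2$ is known, the confidence interval is $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. Therefore the length of the confidence interval is
\[
L = \left(\bar{X} + 1.96\,\sigma/\sqrt{5}\right) - \left(\bar{X} - 1.96\,\sigma/\sqrt{5}\right) = 2 \times 1.96 \times \sigma/\sqrt{5} = 1.753\,\sigma.
\]
We have
\[
E[L] = E[2 \times 1.96 \times \sigma/\sqrt{5}] = 1.753\,\sigma
\]
since $\sigma$ is a constant.
2. If $\sigma^2$ is unknown, the confidence interval is $\bar{X} \pm t_{\alpha/2}(n-1)\,s/\sqrt{n}$ and the length of this interval is
\[
L = \left(\bar{X} + 2.776\,S/\sqrt{5}\right) - \left(\bar{X} - 2.776\,S/\sqrt{5}\right) = 2 \times 2.776 \times S/\sqrt{5} = 2.483\,S.
\]
Then $E[L] = E[2.483\,S] = 2.483\,E[S]$. Here $S$ is a random variable.
From exercise 6.4-14:
\[
E[cS] = \sigma \quad \text{where} \quad c = \frac{\sqrt{n-1}\;\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{2}\;\Gamma\left(\frac{n}{2}\right)}.
\]
Therefore $E[S] = \sigma/c$ and, with $n = 5$,
\[
E[L] = 2.483\,\sigma/c = \frac{\sqrt{2}\,\Gamma(5/2)}{\sqrt{4}\,\Gamma(2)} \times 2.483 \times \sigma = 2.334\,\sigma.
\]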
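The two constants can be reproduced numerically. The sketch below assumes $n = 5$ and a 95% level, as in the solution; the names `len_known`, `c_n` and `len_unknown` are illustrative, and `c_n` is used instead of `c` to avoid masking R's `c()` function.

```r
# Expected confidence-interval lengths for n = 5, as multiples of sigma.
n <- 5
# Known variance: the length 2 * z * sigma / sqrt(n) is a constant.
len_known <- 2 * qnorm(0.975) / sqrt(n)
# Unknown variance: the length is 2 * t * S / sqrt(n), and E[S] = sigma / c.
c_n <- sqrt(n - 1) * gamma((n - 1) / 2) / (sqrt(2) * gamma(n / 2))
len_unknown <- 2 * qt(0.975, df = n - 1) / sqrt(n) / c_n
round(c(known = len_known, unknown = len_unknown), 3)
```

The comparison shows that, on average, not knowing $\sigma$ makes the interval noticeably longer for such a small sample.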
[7.2-2]
Solution
R commands:
x=c(644,493,532,462,565)
y=c(623,472,492,661,540,502,549,518)
xbar = mean(x)
ybar = mean(y)
n=length(x)
m=length(y)
s2x=var(x)
s2y=var(y)
sp2=((n-1)*s2x+(m-1)*s2y)/(n+m-2)
# 0.05 gives the 90% interval; use 0.025 for the 95% interval
marg=qt(0.05, df=n+m-2, lower=F)*sqrt(sp2)*sqrt(1/n+1/m)
ci=c(xbar-ybar-marg, xbar-ybar+marg)
The 95% confidence interval is [ −90.10 ; 79.25 ] and the 90%
confidence interval is [ −74.51 ; 63.66 ].
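Both intervals can be cross-checked with `t.test()` and `var.equal = TRUE`, which uses the same pooled-variance formula. This is a verification sketch; the names `ci95` and `ci90` are illustrative.

```r
# Pooled-variance two-sample t intervals via the built-in t.test().
x <- c(644, 493, 532, 462, 565)
y <- c(623, 472, 492, 661, 540, 502, 549, 518)
ci95 <- as.numeric(t.test(x, y, var.equal = TRUE, conf.level = 0.95)$conf.int)
ci90 <- as.numeric(t.test(x, y, var.equal = TRUE, conf.level = 0.90)$conf.int)
round(rbind(ci95, ci90), 2)
```

Without `var.equal = TRUE`, `t.test()` would use the Welch approximation and give slightly different limits.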
[7.2-8]
Solution
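1. $\bar{X} - \bar{Y}$ is a linear combination of $n + m$ independent variables with normal distributions. Therefore $\bar{X} - \bar{Y}$ has a normal distribution. We only need to find the mean and the variance of $\bar{X} - \bar{Y}$:
\[
E[\bar{X} - \bar{Y}] = \mu_X - \mu_Y
\]
\[
V[\bar{X} - \bar{Y}] = V[\bar{X}] + V[\bar{Y}] = \sigma_X^2/n + \sigma_Y^2/m = d\sigma_Y^2/n + \sigma_Y^2/m
\]
We have
\[
\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{d\sigma_Y^2/n + \sigma_Y^2/m}} = \frac{\bar{X} - \bar{Y} - E[\bar{X} - \bar{Y}]}{\sqrt{V[\bar{X} - \bar{Y}]}} \sim N(0, 1)
\]
2. Also
\[
\frac{(n-1)s_X^2}{d\sigma_Y^2} \sim \chi^2(n-1), \qquad \frac{(m-1)s_Y^2}{\sigma_Y^2} \sim \chi^2(m-1)
\]
and $s_X^2$ is independent of $s_Y^2$. Therefore,
\[
\frac{(n-1)s_X^2}{d\sigma_Y^2} + \frac{(m-1)s_Y^2}{\sigma_Y^2} \sim \chi^2(n+m-2)
\]
3. The variables in parts 1 and 2 are independent, since the samples are drawn from normal populations, where the sample mean is independent of the sample variance, and the two samples are independent of each other.
4. Recall that if $Z \sim N(0, 1)$ and $X \sim \chi^2(\nu)$ are independent, then $Z/\sqrt{X/\nu} \sim t(\nu)$. Now
\[
T = \frac{\dfrac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{d\sigma_Y^2/n + \sigma_Y^2/m}}}{\sqrt{\dfrac{1}{n+m-2}\left(\dfrac{(n-1)s_X^2}{d\sigma_Y^2} + \dfrac{(m-1)s_Y^2}{\sigma_Y^2}\right)}} \sim t(n+m-2)
\]
\[
T = \frac{\sqrt{n+m-2}\,\left(\bar{X} - \bar{Y} - (\mu_X - \mu_Y)\right)}{\sqrt{d/n + 1/m}\,\sqrt{(n-1)s_X^2/d + (m-1)s_Y^2}}
= \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\dfrac{(n-1)s_X^2/d + (m-1)s_Y^2}{n+m-2}\,(d/n + 1/m)}}
\]
Then use $T$ above to find a confidence interval for $\mu_X - \mu_Y$.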
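The interval implied by $T$ can be sketched as a small R function. `ci_known_ratio` is a hypothetical helper, not part of the original solution, and the data from exercise 7.2-2 are reused here purely as an illustration with the assumed value $d = 1$.

```r
# Sketch: CI for mu_X - mu_Y when sigma_X^2 = d * sigma_Y^2 with d known.
# ci_known_ratio is a hypothetical helper written for illustration.
ci_known_ratio <- function(x, y, d, conf = 0.95) {
  n <- length(x)
  m <- length(y)
  # Pooled estimate of sigma_Y^2: s_X^2 is rescaled by 1/d before pooling.
  s2 <- ((n - 1) * var(x) / d + (m - 1) * var(y)) / (n + m - 2)
  se <- sqrt(s2 * (d / n + 1 / m))      # standard error of xbar - ybar
  tq <- qt(1 - (1 - conf) / 2, df = n + m - 2)
  (mean(x) - mean(y)) + c(-1, 1) * tq * se
}

# With d = 1 the formula reduces to the usual pooled-variance interval.
x <- c(644, 493, 532, 462, 565)
y <- c(623, 472, 492, 661, 540, 502, 549, 518)
ci_d1 <- ci_known_ratio(x, y, d = 1)
ci_pooled <- as.numeric(t.test(x, y, var.equal = TRUE)$conf.int)
round(rbind(ci_d1, ci_pooled), 2)
```

Comparing against `t.test(..., var.equal = TRUE)` for $d = 1$ is a simple sanity check that the pivot was inverted correctly.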
[7.2-9]
Solution
R commands:
hbefore=c(11.1,19.5,14,8.3,12.4,7.89,12.10,8.3,12.31,10)
hafter=c(9.97,15.8,13.02,9.28,11.51,7.4,10.7,10.4,11.4,11.95)
hdiff=hbefore-hafter
hbar=mean(hdiff)
hsd=sd(hdiff)
n=length(hdiff)
marg=qt(0.05, df=n-1, lower=F)*hsd/sqrt(n)
ci=c(hbar-marg, hbar+marg)
fbefore=c(22.9,31.6,27.7,21.7,19.36,25.03,26.9,25.75,23.63,25.06)
fafter=c(22.89,33.47,25.75,19.8,18,22.33,25.26,24.9,21.8,24.28)
fdiff=fbefore-fafter
fbar=mean(fdiff)
fsd=sd(fdiff)
n=length(fdiff)
marge=qt(0.05, df=n-1, lower=F)*fsd/sqrt(n)
ci=c(fbar-marge, fbar+marge)
pdf("qqplot-qu7.pdf",width=8,height=4)
par(mfrow=c(1,2))
qqnorm(hdiff, sub="Male")
qqline(hdiff)
qqnorm(fdiff, sub="Female")
qqline(fdiff)
dev.off()
1. [ −1.45 ; 0.56 ] for males (differences taken as after − before; the code above computes before − after, which gives the same limits with opposite signs)
2. [ −1.86 ; −0.37 ] for females (same sign convention)
3. Not for males, because the confidence interval contains 0. Yes for females, since their confidence interval does not contain 0.
4. The QQ plots are shown for males and females. There is no significant indication of a violation of normality. Normality is more problematic for males than for females. For females there is one extreme point.
[Figure: two normal Q-Q plots of the paired differences, one panel labelled Male and one labelled Female; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles.]
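The paired intervals in parts 1 and 2 can be cross-checked by applying a one-sample `t.test()` to the differences. This sketch takes the differences as after − before, an assumption chosen because it reproduces the signs of the reported intervals; the names `ci_m` and `ci_f` are illustrative.

```r
# 90% paired t intervals, computed on the differences after - before.
hbefore <- c(11.1, 19.5, 14, 8.3, 12.4, 7.89, 12.10, 8.3, 12.31, 10)
hafter  <- c(9.97, 15.8, 13.02, 9.28, 11.51, 7.4, 10.7, 10.4, 11.4, 11.95)
fbefore <- c(22.9, 31.6, 27.7, 21.7, 19.36, 25.03, 26.9, 25.75, 23.63, 25.06)
fafter  <- c(22.89, 33.47, 25.75, 19.8, 18, 22.33, 25.26, 24.9, 21.8, 24.28)
ci_m <- as.numeric(t.test(hafter - hbefore, conf.level = 0.90)$conf.int)
ci_f <- as.numeric(t.test(fafter - fbefore, conf.level = 0.90)$conf.int)
round(rbind(male = ci_m, female = ci_f), 2)
```

Swapping the order of subtraction only flips the signs of both limits, so the conclusion about whether 0 lies inside each interval is unchanged.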
[7.2-14]
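Solution. Since
\[
F = \frac{S_y^2/\sigma_y^2}{S_x^2/\sigma_x^2}
\]
follows an F distribution with $m - 1$ and $n - 1$ degrees of freedom, we have
\[
\Pr\left(F_{1-\alpha/2} \le \frac{S_y^2\,\sigma_x^2}{S_x^2\,\sigma_y^2} \le F_{\alpha/2}\right) = 1 - \alpha
\]
\[
\Rightarrow \Pr\left(F_{1-\alpha/2}\,\frac{S_x^2}{S_y^2} \le \frac{\sigma_x^2}{\sigma_y^2} \le F_{\alpha/2}\,\frac{S_x^2}{S_y^2}\right) = 1 - \alpha
\]
\[
\Rightarrow \Pr\left(\sqrt{F_{1-\alpha/2}\,\frac{S_x^2}{S_y^2}} \le \frac{\sigma_x}{\sigma_y} \le \sqrt{F_{\alpha/2}\,\frac{S_x^2}{S_y^2}}\right) = 1 - \alpha
\]
(where $F_{\alpha/2}$ satisfies $\Pr(F \ge F_{\alpha/2}) = \alpha/2$). Therefore, a $100(1-\alpha)\%$ confidence interval for $\sigma_x^2/\sigma_y^2$ is
\[
\left[\, F_{1-\alpha/2}\,\frac{S_x^2}{S_y^2} \;;\; F_{\alpha/2}\,\frac{S_x^2}{S_y^2} \,\right]
\]
and a $100(1-\alpha)\%$ confidence interval for $\sigma_x/\sigma_y$ is
\[
\left[\, \sqrt{F_{1-\alpha/2}\,\frac{S_x^2}{S_y^2}} \;;\; \sqrt{F_{\alpha/2}\,\frac{S_x^2}{S_y^2}} \,\right]
\]
The R commands for the confidence interval are: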
n=13
m=9
sx2=128.41/12
sy2=36.72/8
ciinf=qf(0.99,lower=F,df1=m-1,df2=n-1)*sx2/sy2
cisup=qf(0.01,lower=F,df1=m-1,df2=n-1)*sx2/sy2
sqrt(ciinf)
sqrt(cisup)
[7.3-10]
Solution
R commands :
n1=194
n2=162
y1=28
y2=11
p1=y1/n1
marg=qnorm(0.025,lower=F)*sqrt(p1*(1-p1)/n1)
ci=c(p1-marg,p1+marg)
p2=y2/n2
p1-p2
marg=qnorm(0.05,lower=F)*sqrt(p1*(1-p1)/n1+p2*(1-p2)/n2)
ci=p1-p2-marg
1. $\hat{p}_1 = 0.144$
2. [ 0.095 ; 0.194 ]
3. $\hat{p}_1 - \hat{p}_2 = 0.076$
4. The lower limit is 0.0237. Therefore, the confidence interval is [ 0.024 ; 1 ]
10. R problem.
(a)
> x=rnorm(60,10,5)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.6316  5.6810  9.7870 10.0100 14.1300 21.8700
> var(x)
[1] 30.77092
> mean(x)
[1] 10.01327
(b)
> x=rnorm(35000,10,5)
> X=matrix(x,ncol=70)
> a=rep(0,500)
> b=rep(0,500)
> for(i in 1:500){
+ a[i]=mean(X[i,])-1.96*5/(sqrt(70)) #sigma=5 known, z=1.96
+ b[i]=mean(X[i,])+1.96*5/(sqrt(70)) #the same here
+ }
> a[1]
[1] 8.083608
> b[1]
[1] 10.42626
> sum(a<10 & b>10)/500 #This command counts intervals containing 10
[1] 0.958
> a #this gives the lower limits of all 500 confidence intervals
> b #this gives the upper limits of all 500 confidence intervals
You need to repeat this program, replacing 5 by sd(X[i,]) and
1.96 by qt(0.975,69), where 69 is the number of degrees of freedom.
> x=rnorm(35000,10,5)
> X=matrix(x,ncol=70)
> a=rep(0,500)
> b=rep(0,500)
> for(i in 1:500){
+ a[i]=mean(X[i,])-qt(0.975,69)*sd(X[i,])/(sqrt(70)) #sigma unknown: sd(X[i,]) and the t quantile replace 5 and 1.96
+ b[i]=mean(X[i,])+qt(0.975,69)*sd(X[i,])/(sqrt(70)) #the same here
+ }
> a[1]
[1] 8.083608
> b[1]
[1] 10.42626
> sum(a<10 & b>10)/500 #This command counts intervals containing 10
[1] 0.956
> a #this gives the lower limits of all 500 confidence intervals
> b #this gives the upper limits of all 500 confidence intervals
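The two simulations can also be written without an explicit loop. The following is a sketch under the same settings (500 samples of size 70 from N(10, 25)); the seed value and the names `half_z`, `half_t`, `cover_z` and `cover_t` are arbitrary choices made for illustration, so the coverage proportions will not match the runs above exactly.

```r
# Vectorised coverage simulation for z and t intervals.
set.seed(1)  # arbitrary seed, only so that the run can be repeated
n <- 70; reps <- 500; mu <- 10; sigma <- 5
X <- matrix(rnorm(reps * n, mu, sigma), ncol = n)  # one sample per row
xbar <- rowMeans(X)          # 500 sample means
s <- apply(X, 1, sd)         # 500 sample standard deviations
half_z <- qnorm(0.975) * sigma / sqrt(n)        # sigma known
half_t <- qt(0.975, df = n - 1) * s / sqrt(n)   # sigma estimated by s
# An interval covers mu exactly when |xbar - mu| is below its half-width.
cover_z <- mean(abs(xbar - mu) < half_z)
cover_t <- mean(abs(xbar - mu) < half_t)
c(z = cover_z, t = cover_t)
```

Both proportions should be close to the nominal 0.95, with run-to-run variation of roughly one percentage point for 500 replications.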