Author: Min Ren
STAT 350 (Fall 2014)
Lab 5: R Solution
Lab 5 - Interpretation of Confidence Intervals and Power Analysis for Z tests
Objectives: A Better Understanding of Confidence Intervals and Power Curves.
A. (55 points) Interpretation of a Confidence Interval. Use software to generate 40
observations from a normal distribution with µ = 10 and σ = 2. Repeat this 50 times.
1. (30 points) From each set of observations, compute a 90% confidence interval. No data is
required; however, you need to include all 50 confidence intervals.
Solution:
Sample Code 1:
(You should run the following code 50 times. Each time, you should copy the table, record the
confidence interval and determine by hand if this confidence interval contains the population
mean: 10.)
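i <- 1                                   # run number; change to 2, 3, ..., 50 on later runs
n <- 40
RandomData <- rnorm(n,mean=10,sd=2)      # 40 observations from N(10, 2)
ci <- t.test(RandomData,conf.level=0.90) # 90% confidence interval
answer <- data.frame(i,ci$conf.int[1], ci$conf.int[2])
print(answer)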
Sample Code 2:
(You may consider the following code, which is more difficult to write but generates the 50
tables simultaneously.)
> res <- matrix(NA, nrow=50, ncol=3)   # one row per sample: lower limit, upper limit, contains mu
> colnames(res)<-c("90% lower limit","90% upper limit","contain mu?")
> for (i in 1:50){
+ RandomSample<-rnorm(40,10,2)
+ t<-t.test(RandomSample,conf.level=0.9)
+ res[i,1:2]<-t$conf.int[1:2]
+ res[i,3]<-(10>t$conf.int[1]) & (10<t$conf.int[2])
+ }
> res
      90% lower limit 90% upper limit contain mu?
 [1,]        9.711632       10.656472           1
 [2,]        9.603163       10.787195           1
 [3,]        9.442489       10.330170           1
 [4,]       10.232928       11.131520           0
 [5,]        9.548948       10.722365           1
 [6,]        9.655207       10.843896           1
 [7,]        9.984213       11.099040           1
 [8,]        9.394327       10.542664           1
 [9,]        9.534189       10.548846           1
[10,]        8.710827        9.658468           0
[11,]        8.948052        9.773528           0
[12,]        9.291709       10.241216           1
[13,]        9.371126       10.356686           1
[14,]       10.192441       11.147714           0
[15,]        9.932500       10.834223           1
[16,]        9.833176       10.806848           1
[17,]        9.449219       10.533126           1
[18,]        9.226593       10.431731           1
[19,]        9.597059       10.724411           1
[20,]        9.062439       10.218339           1
[21,]        9.644769       10.790207           1
[22,]        9.434676       10.444451           1
[23,]        9.689123       10.600831           1
[24,]        8.963024       10.063669           1
[25,]        8.980457       10.051737           1
[26,]        9.534915       10.704241           1
[27,]        9.231642       10.425103           1
[28,]        9.235816       10.315803           1
[29,]        9.377265       10.525817           1
[30,]        8.978152       10.185177           1
[31,]        9.234448       10.222421           1
[32,]       10.062863       10.864413           0
[33,]        9.796705       10.706976           1
[34,]        9.478981       10.446474           1
[35,]       10.306448       11.409389           0
[36,]        9.637880       10.698254           1
[37,]        9.334300       10.364254           1
[38,]        9.557508       10.555677           1
[39,]        9.062608       10.117098           1
[40,]        9.266484       10.371214           1
[41,]        9.707183       10.860903           1
[42,]        9.269635       10.572222           1
[43,]        9.581570       10.653135           1
[44,]        9.348208       10.503030           1
[45,]        9.567426       10.748857           1
[46,]        9.560901       10.679847           1
[47,]        9.073926       10.070101           1
[48,]        9.582457       10.771986           1
[49,]        9.403577       10.664905           1
[50,]        9.448157       10.417373           1
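Because σ = 2 is known in this exercise, a z-based interval could be used in place of t.test. The following is a minimal sketch under that assumption, not the method used in the solution above; the seed value and the names z_interval, z_res and contains_mu are hypothetical, chosen only for illustration.

```r
# Sketch: 50 z-based 90% confidence intervals with known sigma.
# Assumptions (not from the lab): the seed and the helper name z_interval.
set.seed(350)                      # arbitrary seed so the run is repeatable

n     <- 40                        # observations per sample
mu    <- 10                        # true population mean
sigma <- 2                         # known population standard deviation
conf  <- 0.90

z_interval <- function(x, sigma, conf) {
  zstar  <- qnorm(1 - (1 - conf) / 2)          # about 1.645 for 90% confidence
  margin <- zstar * sigma / sqrt(length(x))    # margin of error
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# One row per simulated sample
z_res <- t(replicate(50, z_interval(rnorm(n, mu, sigma), sigma, conf)))
contains_mu <- z_res[, "lower"] < mu & mu < z_res[, "upper"]

head(cbind(z_res, contains_mu))
sum(contains_mu)                   # number of intervals that contain mu
```

Unlike the t intervals, every z interval here has the same width, 2(1.645)(2)/sqrt(40), which is about 1.04, because the margin of error does not depend on the sample standard deviation.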
2. (10 points) Determine how many of these intervals contain the population mean, µ = 10.
Please indicate for each confidence interval whether it contains the value or not. Is this the
number that you would expect? Why or why not?
Solution:
We can see that, in this particular run, there are 6 confidence intervals that do not contain the
true population mean, 10. This is roughly what we would expect. The confidence level is 90%,
so we would expect (0.9)(50) = 45 of the confidence intervals to include the population mean,
10. Here 44 of the 50 do, which is not far from what we expect.
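To make "not far from what we expect" concrete: if each interval independently covers µ with probability 0.90, the number of covering intervals follows a Binomial(50, 0.9) distribution. The sketch below uses that model, which is an assumption implied by the setup (each interval comes from a separate random sample); the variable names are illustrative.

```r
# Number of 90% intervals (out of 50) that contain mu, modeled as Binomial(50, 0.9)
m    <- 50      # number of intervals
conf <- 0.90    # coverage probability of each interval

expected <- m * conf                      # 45 intervals on average
std_dev  <- sqrt(m * conf * (1 - conf))   # about 2.12 intervals

# Probability of seeing 44 or fewer covering intervals
p_44_or_fewer <- pbinom(44, size = m, prob = conf)

expected
std_dev
p_44_or_fewer
```

An observed count of 44 is within one standard deviation of the expected 45, so it is an unremarkable outcome under this model.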
3. (15 points) GROUP PART: Combine your data with 3 or 4 other students (in any of my
sections) and answer the following questions: 1) Is the number of intervals that contain
the mean what you would expect for the combined data? 2) How are the results from part
2 and part 3 different? (This is due on the following Monday and must be submitted online
also.)
Solution:
You will need to show your work for the number of intervals that contain the mean (just add up
the numbers from each student) and then calculate the percentage by dividing by the total
number of intervals.
I would expect the combined percentage to be closer to 90% than the percentage from part 2,
because it is based on more intervals. The 90% confidence level is a long-run proportion, so the
observed percentage settles near 90% only when the number of intervals is large.
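The effect of pooling can be quantified with the standard deviation of the observed coverage proportion. A minimal sketch, assuming each interval independently covers µ with probability 0.90, and taking 200 and 250 as the pooled totals for groups of 4 and 5 students; the names N and se_prop are illustrative.

```r
# Standard deviation of the observed coverage proportion for N intervals,
# assuming each interval independently covers mu with probability 0.90.
conf <- 0.90
N    <- c(50, 200, 250)    # one student; pooled over 4 or 5 students

se_prop <- sqrt(conf * (1 - conf) / N)
data.frame(N, se_prop)
```

Under these assumptions the typical deviation from 90% falls from about 4.2 percentage points for one student to about 2 percentage points for the pooled data.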
B. (45 points) Water quality testing. The Deely Laboratory is a drinking-water testing and
analysis service. One of the common contaminants it tests for is lead. Lead enters drinking
water through corrosion of plumbing materials, such as lead pipes, fixtures, and solder. The
service knows that their analysis procedure is unbiased but not perfectly precise, so the
laboratory analyzes each water sample three times and reports the mean result. The repeated
measurements follow a Normal distribution quite closely. The standard deviation of this
distribution is a property of the analytic procedure and is known to be σ = 0.25 parts per
billion (ppb).
The Deely Laboratory has been asked by the university to evaluate a claim that the drinking
water in the Student Union has a lead concentration of 6 ppb, well below the Environmental
Protection Agency’s action level of 15 ppb. Since the true concentration of the sample is the
mean μ of the population of repeated analyses, the hypotheses are
H0: μ = 6 versus Ha: μ ≠ 6.
The lab chooses the 1% level of significance, α = 0.01. They plan to perform three analyses of
one specimen (n = 3).
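For reference, a two-sided z test with these settings rejects the null value μ = 6 whenever the sample mean falls outside a fixed band around 6. A minimal sketch, assuming the two-sided alternative that the solution code below uses; the names se, zstar, lower_cut and upper_cut are illustrative.

```r
# Rejection region for the two-sided z test of mu = 6 with known sigma
n     <- 3
alpha <- 0.01
mu0   <- 6
sigma <- 0.25

se    <- sigma / sqrt(n)          # standard error of the mean of n analyses
zstar <- qnorm(1 - alpha / 2)     # two-sided critical value, about 2.576

lower_cut <- mu0 - zstar * se
upper_cut <- mu0 + zstar * se
c(lower_cut, upper_cut)           # reject when the sample mean is outside this range
```

With n = 3 the band runs from roughly 5.63 to 6.37 ppb. Power against an alternative μ is the probability that the sample mean lands outside this band when the true mean is that alternative.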
1. (30 points, 6 points each part) Using computer software, calculate the following powers:
a. At the 1% level of significance, what is the power of this test against the specific
alternative μ = 6.5?
Solution:
Code:
> n <- 3
> alpha <- 0.01
> mu0 <- 6
> sigma <- 0.25
> sigman <- sigma/sqrt(n) #standard error
> #z is from alpha/2 for a 2-tailed test
> z <- -qnorm(alpha/2)
> muprime=6.5
> phi1 <- z + (mu0 - muprime)/sigman
> phi2 <- -z + (mu0 - muprime)/sigman
> pphi1 <- pnorm(phi1)
> pphi2 <- pnorm(phi2)
> beta <- pnorm(phi1)- pnorm(phi2)
> power <- 1 - beta;
> power
[1] 0.8128029
The power is 0.8128029
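The code above evaluates the standard power expression for a two-sided z test. Written in the code's own quantities (μ′ for muprime, μ0 for mu0, σ/√n for sigman, and z_{α/2} for z), it reads:

```latex
\beta(\mu') = \Phi\!\left(z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)
            - \Phi\!\left(-z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right),
\qquad
\text{power}(\mu') = 1 - \beta(\mu')
```

Here Φ is the standard normal cumulative distribution function (pnorm in R) and β is the probability of a Type II error, that is, of failing to reject the null hypothesis when the true mean is μ′.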
b. At the 5% level of significance, what is the power of this test against the specific
alternative μ = 6.5?
Solution:
Code:
> n <- 3
> alpha <- 0.05
> mu0 <- 6
> sigma <- 0.25
> sigman <- sigma/sqrt(n) #standard error
> #z is from alpha/2 for a 2-tailed test
> z <- -qnorm(alpha/2)
> muprime=6.5
> phi1 <- z + (mu0 - muprime)/sigman
> phi2 <- -z + (mu0 - muprime)/sigman
> pphi1 <- pnorm(phi1)
> pphi2 <- pnorm(phi2)
> beta <- pnorm(phi1)- pnorm(phi2)
> power <- 1 - beta;
> power
[1] 0.9337271
The power is 0.9337271
c. At the 1% level of significance, what is the power of this test against the specific
alternative μ = 6.75?
Solution:
Code:
> n <- 3
> alpha <- 0.01
> mu0 <- 6
> sigma <- 0.25
> sigman <- sigma/sqrt(n) #standard error
> #z is from alpha/2 for a 2-tailed test
> z <- -qnorm(alpha/2)
> muprime=6.75
> phi1 <- z + (mu0 - muprime)/sigman
> phi2 <- -z + (mu0 - muprime)/sigman
> pphi1 <- pnorm(phi1)
> pphi2 <- pnorm(phi2)
> beta <- pnorm(phi1)- pnorm(phi2)
> power <- 1 - beta;
> power
[1] 0.9956077
The power is 0.9956077
d. If the lab performs five analyses of one specimen (n=5), what is the power of this test
against the specific alternative μ = 6.5?
Solution:
Code:
> n <- 5
> alpha <- 0.01
> mu0 <- 6
> sigma <- 0.25
> sigman <- sigma/sqrt(n) #standard error
> #z is from alpha/2 for a 2-tailed test
> z <- -qnorm(alpha/2)
> muprime=6.5
> phi1 <- z + (mu0 - muprime)/sigman
> phi2 <- -z + (mu0 - muprime)/sigman
> pphi1 <- pnorm(phi1)
> pphi2 <- pnorm(phi2)
> beta <- pnorm(phi1)- pnorm(phi2)
> power <- 1 - beta;
> power
[1] 0.9710402
The power is 0.9710402
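Parts a through d repeat the same calculation with one input changed each time. As a convenience, the steps could be wrapped in a function; the sketch below is one way to do that. The function name z_power and its argument names are hypothetical and do not appear in the lab.

```r
# Power of the two-sided z test of mu = mu0 against the alternative mu = muprime.
# z_power is a hypothetical helper that wraps the steps used in parts a-d.
z_power <- function(muprime, n, alpha, mu0 = 6, sigma = 0.25) {
  sigman <- sigma / sqrt(n)                        # standard error
  z      <- -qnorm(alpha / 2)                      # critical value for a 2-tailed test
  shift  <- (mu0 - muprime) / sigman               # standardized distance from mu0
  beta   <- pnorm(z + shift) - pnorm(-z + shift)   # P(Type II error)
  1 - beta
}

z_power(6.5,  n = 3, alpha = 0.01)   # part a
z_power(6.5,  n = 3, alpha = 0.05)   # part b
z_power(6.75, n = 3, alpha = 0.01)   # part c
z_power(6.5,  n = 5, alpha = 0.01)   # part d
```

These four calls should reproduce the values reported above: 0.8128029, 0.9337271, 0.9956077 and 0.9710402.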
e. Write a short paragraph explaining the consequences of changing the significance
level, alternative μ and sample size on the power.
Solution:
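We have the following conclusions:
The larger the value of α (that is, the higher the significance level), the greater the
power. Raising α from 0.01 to 0.05 raises the power from 0.8128 to 0.9337.
The larger the distance between μ′ and μ0, the greater the power. Moving μ′ from 6.5 to 6.75
raises the power from 0.8128 to 0.9956.
The larger the sample size, the greater the power. Increasing n from 3 to 5 raises the power
from 0.8128 to 0.9710.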
2. (10 points) Generate a power curve when n =3 at a 1% significance level. Please use an
interval length of 4.
Solution:
Code:
> n <- 3
> alpha <- 0.01
> mu0 <- 6
> sigma <- 0.25
> sigman <- sigma/sqrt(n) #standard error
> #z is from alpha/2 for a 2-tailed test
> z <- -qnorm(alpha/2)
> muprime <- seq (from=4,to=8, by=0.05)
> phi1 <- z + (mu0 - muprime)/sigman
> phi2 <- -z + (mu0 - muprime)/sigman
> pphi1 <- pnorm(phi1)
> pphi2 <- pnorm(phi2)
> beta <- pnorm(phi1)- pnorm(phi2)
> power <- 1 - beta;
> plot(muprime,power,main="Power for the Hypothesis Test",type="l")
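One quick check on a power curve: at μ′ = μ0 the "power" is simply the probability of rejecting a true null hypothesis, so the curve should bottom out at α there and be symmetric on either side. A small sketch of that check with the same settings; the function name power_at is illustrative.

```r
# Check that the power curve reaches its minimum, alpha, at muprime = mu0
n <- 3; alpha <- 0.01; mu0 <- 6; sigma <- 0.25
sigman <- sigma / sqrt(n)          # standard error
z <- -qnorm(alpha / 2)             # two-sided critical value

power_at <- function(muprime) {
  shift <- (mu0 - muprime) / sigman
  1 - (pnorm(z + shift) - pnorm(-z + shift))
}

power_at(mu0)                      # equals alpha = 0.01
power_at(5.5) - power_at(6.5)      # 0 up to rounding: the curve is symmetric about mu0
```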
3. (5 points) What sample size would be required for the power to be at least 0.90 at the 1%
level of significance against the specific alternative μ = 6.5?
Solution:
Code:
> n <- seq (1,15,1)
> muprime <- 6.5
> alpha <- .01
> mu0 <- 6
> sigma <- 0.25
> sigman <- sigma/sqrt(n)
> z <- -qnorm(alpha/2)
> phi1 <- z + (mu0 - muprime)/sigman
> phi2 <- -z + (mu0 - muprime)/sigman
> pphi1 <- pnorm(phi1)
> pphi2 <- pnorm(phi2)
> beta <- pnorm(phi1)- pnorm(phi2)
> power <- 1 - beta;
> answer <- data.frame(n,power)
> answer
    n     power
1   1 0.2823677
2   2 0.5997105
3   3 0.8128029
4   4 0.9228015
5   5 0.9710402
6   6 0.9899145
7   7 0.9966929
8   8 0.9989686
9   9 0.9996917
10 10 0.9999111
11 11 0.9999752
12 12 0.9999933
13 13 0.9999982
14 14 0.9999995
15 15 0.9999999
As seen above, n = 3 gives a power of 0.8128, which is below 0.90, while n = 4 gives 0.9228.
We therefore need the sample size to be at least 4.
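The result can be cross-checked with the usual closed-form approximation n ≈ ((z_{α/2} + z_β) σ / (μ′ − μ0))², which ignores the negligible chance of rejecting in the wrong tail. This formula is not part of the lab solution; it is included only as a hedged cross-check, and the variable names are illustrative.

```r
# Approximate sample size for power >= 0.90 at alpha = 0.01 (two-sided z test)
alpha        <- 0.01
target_power <- 0.90
mu0          <- 6
muprime      <- 6.5
sigma        <- 0.25

z_alpha <- qnorm(1 - alpha / 2)     # about 2.576
z_beta  <- qnorm(target_power)      # about 1.282

n_approx <- ((z_alpha + z_beta) * sigma / (muprime - mu0))^2
n_approx                 # about 3.72
ceiling(n_approx)        # 4, the smallest whole sample size that meets the target
```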
STAT 350: Introduction to Statistics
Purdue University
Fall 2014