STAT 350 (Fall 2014)
Lab 5: SAS Solution
Lab 5 - Interpretation of Confidence Intervals and Power
Analysis for Z tests
Objectives: A Better Understanding of Confidence Intervals and
Power Curves.
A. (55 points) Interpretation of a Confidence Interval. Use software to generate 40
observations from a normal distribution with µ = 10 and σ = 2. Repeat this 50 times.
1. (30 points) From each set of observations, compute a 90% confidence interval. No data is
required; however, you need to include all 50 confidence intervals.
Solution:
Sample Code 1:
(You should run the following code 50 times. Each time, you should copy the table, record the
confidence interval and determine by hand if this confidence interval contains the population
mean: 10.)
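data CI;
  do i = 1 to 40;
    randnorm = rand('normal', 10, 2);  /* one draw from a normal with mean 10, sd 2 */
    output;
  end;
  drop i;
run;
/* ALPHA=0.10 requests the 90% interval; the PROC TTEST default is 95%. */
proc ttest data = CI alpha = 0.10;
  var randnorm;
run;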
Sample Code 2:
(You may consider the following code, which is more difficult to write but generates the 50
tables simultaneously.)
data CI;
  array x{50};                         /* x1-x50: one column per sample */
  do i = 1 to 40;                      /* 40 observations per sample */
    do j = 1 to 50;
      x{j} = rand('normal', 10, 2);
    end;
    output;
  end;
  drop i j;
run;
/* Show only the confidence-limit table for each variable. */
ods select ConfLimits;
/* ALPHA=0.10 requests the 90% intervals; the PROC TTEST default is 95%. */
proc ttest data = CI alpha = 0.10;
  var x1-x50;
run;
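The by-hand check of whether each interval contains 10 can also be automated. The following is a minimal sketch, not part of the original solution: it stores the 50 samples in long format and uses PROC MEANS (whose LCLM and UCLM statistics are t-based, like the PROC TTEST limits) to compute the 90% intervals. The data set names sim and ci, the variable names sample, obs, lower, upper and contains, and the seed 350 are all assumptions chosen for illustration.

```sas
/* Sketch: 50 samples of size 40 in long format (names and seed are assumptions). */
data sim;
  call streaminit(350);               /* arbitrary seed, for reproducibility only */
  do sample = 1 to 50;
    do obs = 1 to 40;
      x = rand('normal', 10, 2);
      output;
    end;
  end;
run;

/* ALPHA=0.10 gives 90% limits; LCLM and UCLM are the interval endpoints. */
proc means data = sim noprint alpha = 0.10;
  by sample;
  var x;
  output out = ci lclm = lower uclm = upper;
run;

/* Flag each interval: 1 if it contains the true mean 10, 0 otherwise. */
data ci;
  set ci;
  contains = (lower <= 10 and 10 <= upper);
run;

proc print data = ci noobs;
  var sample lower upper contains;
run;

/* The sum of the 0/1 flags is the number of intervals that cover 10. */
proc means data = ci sum;
  var contains;
run;
```

Because the samples are random, the count of covering intervals will differ from run to run and with the seed.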
You should get 50 tables, each of which contains a confidence interval. The following is the
first table I generated. Your report should contain all 50 such tables and highlight the tables
whose confidence intervals do not include (or do include) the population mean, 10.
To keep the solution concise, I list all the confidence intervals I obtained in the table below and
highlight those that do not include the population mean, 10.
[Table of the 50 confidence intervals. Columns: CI (lower and upper limits, labeled "95% CL Mean" in the SAS output) and In? (1 if the interval contains 10, 0 if it does not). Three intervals are marked 0.]
2. (10 points) Determine how many of these intervals contain the population mean, µ = 10.
Please indicate for each confidence interval whether it contains the value or not. Is this the
number that you would expect? Why or why not?
Solution:
In this particular run, 3 of the 50 confidence intervals do not contain the true population
mean, 10, so 47 do. This is roughly what we would expect. The confidence level is 90%, so we
would expect about (0.9)(50) = 45 of the confidence intervals to include the population mean.
The count varies from run to run with a standard deviation of about sqrt(50(0.9)(0.1)) ≈ 2.1,
so 47 is within about one standard deviation of 45.
3. (15 points) GROUP PART: Combine your data with 3 or 4 other students (in any of my
sections) and answer the following questions: 1) Is the number of intervals that contain
the mean what you would expect for the combined data? 2) How are the results from part
2 and part 3 different? (This is due on the following Monday and must be submitted online
also.)
Solution:
You will need to show your work for the number of intervals that contain the mean (just add up
the numbers from each student) and then calculate the percentage by dividing by the total
number of intervals.
I would expect the combined percentage to be closer to the confidence level because it is based
on a larger number of intervals. The confidence level describes long-run coverage, so the
observed percentage can be expected to match it only over a large number of repetitions.
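One way to make this concrete is the standard error of the observed coverage proportion. The sketch below assumes that each interval independently covers the mean with probability 0.90. The symbols $\hat{p}$ (observed proportion of covering intervals) and $N$ (total number of intervals) are notation introduced here, not in the original lab.

```latex
\[
\operatorname{SE}(\hat{p}) = \sqrt{\frac{0.90 \times 0.10}{N}}, \qquad
N = 50:\ \operatorname{SE} \approx 0.042, \qquad
N = 200:\ \operatorname{SE} \approx 0.021, \qquad
N = 250:\ \operatorname{SE} \approx 0.019 .
\]
```

Here N = 200 and N = 250 correspond to pooling the results of four or five students, so the combined percentage typically lands about twice as close to 90% as a single student's result.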
B. (45 points) Water quality testing. The Deely Laboratory is a drinking-water testing and
analysis service. One of the common contaminants it tests for is lead. Lead enters drinking
water through corrosion of plumbing materials, such as lead pipes, fixtures, and solder. The
service knows that their analysis procedure is unbiased but not perfectly precise, so the
laboratory analyzes each water sample three times and reports the mean result. The repeated
measurements follow a Normal distribution quite closely. The standard deviation of this
distribution is a property of the analytic procedure and is known to be σ = 0.25 parts per
billion (ppb).
The Deely Laboratory has been asked by the university to evaluate a claim that the drinking
water in the Student Union has a lead concentration of 6 ppb, well below the Environmental
Protection Agency’s action level of 15 ppb. Since the true concentration of the sample is the
mean μ of the population of repeated analyses, the hypotheses are
H0: μ = 6 versus Ha: μ ≠ 6.
The lab chooses the 1% level of significance, α = 0.01. They plan to perform three analyses of
one specimen (n=3).
1. (30 points, 6 points each part) Using computer software, calculate the following powers:
a. At the 1% level of significance, what is the power of this test against the specific
alternative μ = 6.5?
b. At the 5% level of significance, what is the power of this test against the specific
alternative μ = 6.5?
c. At the 1% level of significance, what is the power of this test against the specific
alternative μ = 6.75?
d. If the lab performs five analyses of one specimen (n=5), what is the power of this test
against the specific alternative μ = 6.5?
e. Write a short paragraph explaining the consequences of changing the significance
level, alternative μ and sample size on the power.
Solution:
Sample Code:
It is acceptable to separate the different parts.
data powercalculation; /* Computes the power for parts a through d. */
/* Part a: n = 3, alpha = 0.01, alternative mean 6.5 */
n=3; alpha=.01;
mu0 = 6; muprime = 6.5;
sigma = 0.25; sigman = sigma/sqrt(n);
z = - PROBIT(alpha/2);
phi1 = z + (mu0 - muprime)/sigman;
phi2 = -z + (mu0 - muprime)/sigman;
beta=CDF('normal',phi1)- CDF('normal',phi2);
power = 1 - beta;
output;
/* Part b: n = 3, alpha = 0.05, alternative mean 6.5 */
n=3; alpha=.05;
mu0 = 6; muprime = 6.5;
sigma = 0.25; sigman = sigma/sqrt(n);
z = - PROBIT(alpha/2);
phi1 = z + (mu0 - muprime)/sigman;
phi2 = -z + (mu0 - muprime)/sigman;
beta=CDF('normal',phi1)- CDF('normal',phi2);
power = 1 - beta;
output;
/* Part c: n = 3, alpha = 0.01, alternative mean 6.75 */
n=3; alpha=.01;
mu0 = 6; muprime = 6.75;
sigma = 0.25; sigman = sigma/sqrt(n);
z = - PROBIT(alpha/2);
phi1 = z + (mu0 - muprime)/sigman;
phi2 = -z + (mu0 - muprime)/sigman;
beta=CDF('normal',phi1)- CDF('normal',phi2);
power = 1 - beta;
output;
/* Part d: n = 5, alpha = 0.01, alternative mean 6.5 */
n=5; alpha=.01;
mu0 = 6; muprime = 6.5;
sigma = 0.25; sigman = sigma/sqrt(n);
z = - PROBIT(alpha/2);
phi1 = z + (mu0 - muprime)/sigman;
phi2 = -z + (mu0 - muprime)/sigman;
beta=CDF('normal',phi1)- CDF('normal',phi2);
power = 1 - beta;
output;
run;
proc print data = powercalculation ;
run ;
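The data step above appears to implement the usual power formula for a two-sided z test. Written out as a sketch, with $\Phi$ denoting the standard normal CDF and $\mu'$ the alternative mean (notation assumed here to mirror the variables phi1, phi2 and muprime), and with a hand calculation for part a:

```latex
\begin{align*}
\beta(\mu') &= \Phi\!\left(z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)
             - \Phi\!\left(-z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right), \\
\text{power}(\mu') &= 1 - \beta(\mu').
\end{align*}
% Part a: n = 3, alpha = 0.01, mu' = 6.5
\begin{align*}
\frac{\mu_0 - \mu'}{\sigma/\sqrt{n}} &= \frac{6 - 6.5}{0.25/\sqrt{3}} \approx -3.464,
\qquad z_{0.005} \approx 2.576, \\
\beta &\approx \Phi(-0.888) - \Phi(-6.040) \approx 0.187, \\
\text{power} &\approx 0.81.
\end{align*}
```

The second term is negligible here, which is why the power is essentially one minus a single normal tail probability.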
The results are summarized in the following table.
We draw the following conclusions:
The larger the value of α (that is, the higher the significance level), the greater the
power.
The larger the distance between μ' and μ0, the greater the power.
The larger the sample size, the greater the power.
2. (10 points) Generate a power curve when n =3 at a 1% significance level. Please use an
interval length of 4.
Solution:
data power;
  n=3; alpha=.01;
  mu0 = 6; sigma = 0.25; sigman = sigma/sqrt(n);
  z = - PROBIT(alpha/2);
  /* Alternative means from 4 to 8: an interval of length 4 centered at mu0 */
  do muprime=4 to 8 by .005;
    phi1 = z + (mu0 - muprime)/sigman;
    phi2 = -z + (mu0 - muprime)/sigman;
    beta=CDF('normal',phi1)- CDF('normal',phi2);
    power = 1 - beta;
    output;
  end;
run;
title1 h=3 'Power for the Hypothesis Test';
axis1 label=(h=2);
axis2 label=(h=2 angle=90);
symbol1 v=none i=join c=blue;
proc gplot data=power;
  plot power*muprime/haxis=axis1 vaxis=axis2;
run;
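PROC GPLOT requires SAS/GRAPH. Where that is not available, the same curve can be drawn with PROC SGPLOT. The following is a sketch of that alternative, not the original solution's method. The axis labels and the dashed reference line at the significance level are additions made for illustration.

```sas
/* Sketch: the same power curve drawn with PROC SGPLOT instead of PROC GPLOT. */
data power;
  n = 3; alpha = 0.01; mu0 = 6; sigma = 0.25;
  sigman = sigma / sqrt(n);
  z = -probit(alpha / 2);
  do muprime = 4 to 8 by 0.005;
    beta = cdf('normal', z + (mu0 - muprime) / sigman)
         - cdf('normal', -z + (mu0 - muprime) / sigman);
    power = 1 - beta;
    output;
  end;
run;

title 'Power for the Hypothesis Test';
proc sgplot data = power;
  series x = muprime y = power;
  /* At muprime = mu0 the power equals alpha, the minimum of the curve. */
  refline 0.01 / axis = y lineattrs = (pattern = dash);
  xaxis label = 'Alternative mean (ppb)';
  yaxis label = 'Power' min = 0 max = 1;
run;
title;
```

The curve should be symmetric about 6 and rise toward 1 as the alternative mean moves away from 6 in either direction.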
3. (5 points) What sample size would be required for the power to be at least 0.90 at the 1%
level of significance against the specific alternative μ = 6.5?
Solution:
data size;
  muprime = 6.5; alpha=.01;
  mu0 = 6; sigma = 0.25;
  z = -PROBIT(alpha/2); *z alpha/2;
  do n = 1 to 20 by 1; *these values may be changed as appropriate;
    sigman = sigma/sqrt(n);
    phi1 = z + (mu0 - muprime)/sigman;
    phi2 = -z + (mu0 - muprime)/sigman;
    pphi1 = CDF ('normal', phi1); *CDFs of normal;
    pphi2 = CDF ('normal', phi2);
    beta = pphi1 - pphi2;
    power = 1 - beta;
    output;
  end;
run;
proc print data=size;
run;
As the printed output shows, the power first reaches 0.90 at n = 4, so the sample size must be at least 4.
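A closed-form approximation gives the same answer without a search. The sketch below assumes the far-tail term of the two-sided power formula is negligible. The symbols $\Delta = |\mu' - \mu_0|$ and $z_{0.10}$ (the upper 10% point of the standard normal, matching the target power of 0.90) are notation introduced here.

```latex
\[
n \;\ge\; \left(\frac{(z_{\alpha/2} + z_{0.10})\,\sigma}{\Delta}\right)^{2}
  = \left(\frac{(2.576 + 1.282)(0.25)}{0.5}\right)^{2}
  \approx (1.929)^{2} \approx 3.72,
\]
```

which rounds up to n = 4.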