Download Report

AN APPLICATION OF BA.YESIAN ANALYSIS I N DETERMINING
APPROPRIATE SAMPLE SIZES FOR USE I N U. S.
ARMY OPERATIONAL TESTS
A THESIS
Presented t o
The F a c u l t y of the D i v i s i o n of Graduate Studies
By
Robert L. Cordova
In Partial
Fulfillment
of the Requirements f o r the Degree
Master of Science i n I n d u s t r i a l Engineering
Georgia I n s t i t u t e of Technology
June, 1975
AN APPLICATION OF BA YES IAN ANALYSIS IN DETERMINING
APPROPRIATE SAMPLE SIZES FOR USE IN U. S .
ARMY OPERATIONAL TESTS
Approved:
s„
,J
.J
1
R u s s e l l G. Heik.es, Chairman
/
J/
if -*777
Gay3en Thompson
^
Date approved by Chairman:
)
ii
TABLE OF CONTENTS
Page
LIST OF TABLES
iii
LIST OF FIGURES
iv
SUMMARY
v
Chapter
I.
INTRODUCTION
1
The General Problem
The S p e c i f i c Problem
Background
Review of the L i t e r a t u r e
II.
THE TEST METHODOLOGY
10
The Assumptions of Normality
The P r i o r I n f o r m a t i o n
The Basic A l t e r n a t i v e s of Determining Sample Size
The C l a s s i c a l Method
A Bayesian Approximation
III.
DEMONSTRATION OF THE METHODOLOGY
34
Programming the Model
Demonstrating the Model
IV.
CONCLUSIONS AND RECOMMENDATIONS
47
Conclusions
Recommendations
Appendices
I.
II.
FORTRAN PROGRAM FOR THE APPROXIMATE BAYESIAN
PROCEDURE
50
FORTRAN PROGRAM FOR THE CHI-SQUARE TEST OF
NORMALITY
6l
BIBLIOGRAPHY
64
iii
LIST OF TABLES
Table
Page
1.
Minimum Sample S i z e - C l a s s i c a l Method
17
2.
T e s t o f Normal Random Generator
35
3.
Data f o r t h e B a y e s i a n A p p r o x i m a t i o n Model Based on
One Hundred Runs f o r Each Value o f D e l t a
2
a
4.
2
/ V = 16, |m' - ji| = 5
Data f o r t h e B a y e s i a n A p p r o x i m a t i o n Model Based on
One Hundred Runs f o r Each Value o f D e l t a
2
2
1
a/a' = 16, Im - n| = 10
5.
42
Data f o r t h e B a y e s i a n A p p r o x i m a t i o n Model Based on
V a r i o u s Values o f |m' 2
a /a'
6.
41
2
= 16
kk
Data f o r t h e B a y e s i a n A p p r o x i m a t i o n Model Based
on V a r i o u s V a l u e s o f t h e R a t i o o f t h e V a r i a n c e s
Im* - (j. | = 5
*6
iv
LIST OF FIGURES
Figure
Page
1.
Bayes Theorem f o r Continuous Random Variables
18
2.
Bayes Theorem f o r Normal D i s t r i b u t i o n s
23
3.
Separation of the P o s t e r i o r and Sample Means
39
V
SUMMARY
The r e s e a r c h i s d e v o t e d t o m o d i f y i n g t h e B a y e s i a n t e c h n i q u e s
a s s o c i a t e d w i t h d e t e r m i n i n g t h e minimum sample s i z e r e q u i r e d t o c o n
s t r u c t i n t e r v a l e s t i m a t e s o f t h e t r u e mean o f an e x p e r i m e n t a l or s a m p l i n g
p r o c e s s w h i c h i s modeled by a normal d i s t r i b u t i o n w i t h unknown p a r a m e t e r s ,
The p r o c e d u r e c o n s i d e r s o n l y t h e c a s e where t h e p r i o r i n f o r m a t i o n can be
r e p r e s e n t e d by a normal d i s t r i b u t i o n w i t h known mean and known v a r i a n c e .
R i g o r o u s B a y e s i a n a n a l y s i s o f t h i s s i t u a t i o n would r e s u l t
in
u s i n g a p o s t e r i o r d i s t r i b u t i o n w h i c h has a normal-gamma d e n s i t y t o c o n
struct interval estimates.
In o r d e r t o c i r c u m v e n t t h e o b v i o u s
c u l t i e s o f working w i t h t h i s r a t h e r complex d e n s i t y ,
a procedure, which
i s f e l t t o be more c o m p a t i b l e t o t h e U. S. Army O p e r a t i o n a l
environment,
i f offered
diffi
Testing
f o r a p p r o x i m a t i n g t h e r e q u i r e d B a y e s i a n sample
size.
2
I f t h e v a r i a n c e o f t h e s a m p l i n g or e x p e r i m e n t a l p r o c e s s , <j
5
were
known, t h e minimum B a y e s i a n sample s i z e r e q u i r e d t o c o n s t r u c t an i n t e r v a l
e s t i m a t e about t h e t r u e mean, |i, w i t h c o n f i d e n c e c o e f f i c i e n t ,
w i d t h , k,
5
and
is:
2 Z
b
where
o/
r
e
f
e
r
s
L
/ o
a
n
2
k
2
a'
2
t o t h e p e r c e n t a g e p o i n t s o f t h e s t a n d a r d normal d i s -
t r i b u t i o n s u c h t h a t P(Z > Z
/ o
) = <v/2, and a'
i s the variance of
the
otl£
p r i o r d i s t r i b u t i o n o r , t h e p r i o r v a r i a n c e o f t h e t r u e s a m p l i n g mean, |i.
vi
S u b s t i t u t i n g t h e sample v a r i a n c e , S , f o r t h e t r u e p r o c e s s
v a r i a n c e and t h e term t
/ o
,
n*
- 1 for Z /
i n t h e above e x p r e s s i o n ,
and d e f i n i n g t h e w i d t h o f t h e c o n f i d e n c e i n t e r v a l as a f u n c t i o n o f t h e
sample v a r i a n c e , i . e . ,
k = 6S, r e s u l t s i n t h e f o l l o w i n g e x p r e s s i o n
for
t h e approximate B a y e s i a n sample s i z e :
2 t( /2, n * - l )
a
where n *
c
i s t h e c l a s s i c a l sample s i z e r e q u i r e d t o c o n s t r u c t an i n t e r v a l
e s t i m a t e o f w i d t h k about t h e t r u e mean,
The term t ( o / 2 , n *
c
c
of t h e sampling p r o c e s s .
- l ) refers to the percentage points of the
t distribution with n*
n*
2
c
c
Student's
- 1 d e g r e e s o f freedom s u c h t h a t P [ ( t > t(o//2,
- 1)] = a/2.
The e x p r e s s i o n f o r t h e approximate B a y e s i a n sample s i z e i s
Iteratively,
s t a r t i n g w i t h a f r a c t i o n o f t h e c l a s s i c a l sample
r e q u i r e d f o r t h e i n t e r v a l e s t i m a t e o f t h e same s p e c i f i e d
and a c c u r a c y , a s t h e f i r s t a p p r o x i m a t i o n .
solved
size
confidence
The i t e r a t i v e p r o c e d u r e
is
programed f o r a UNIVAC 1108 computer and a p p l i e d t o a h y p o t h e t i c a l
example t o d e m o n s t r a t e t h e e f f e c t i v e n e s s
of t h e methodology.
If accurate prior information i s a v a i l a b l e , the r e s u l t s
achieved
by t h e p r o c e d u r e d e v e l o p e d t o approximate t h e B a y e s i a n sample s i z e a.nd
c o n s t r u c t i n t e r v a l e s t i m a t e s o f t h e unknown s a m p l i n g mean, |i, o f
s p e c i f i e d c o n f i d e n c e and a c c u r a c y are comparable t o t h e r e s u l t s
using c l a s s i c a l techniques.
obtained
These r e s u l t s , however, are a c h i e v e d u s i n g
s m a l l e r samples s i z e s t h a n r e q u i r e d f o r t h e c l a s s i c a l c a s e .
A pro-
vii
oedure was suggested f o r examining the accuracy of the p r i o r
information
and a s c e r t a i n i n g whether or not Bayesian analysis was appropriate f o r
a given sampling or experimental s i t u a t i o n .
However, the expected
r e s u l t s of using t h i s procedure were not obtained.
CHAPTER I
INTRODUCTION
The General Problem
T h i s s t u d y i s an i n v e s t i g a t i o n o f t h e problem o f
determining
t h e minimum sample s i z e o f an e x p e r i m e n t , t h a t i s , t h e minimum number
o f r e p l i c a t i o n s o f t h e e x p e r i m e n t , r e q u i r e d t o e s t i m a t e t h e mean o f t h e
experimental v a r i a b l e t o w i t h i n a predetermined accuracy.
In g e n e r a l ,
t h e s t u d y i s l i m i t e d t o a c e r t a i n t y p e o f t e s t i n g s i t u a t i o n w h i c h has
the following
a.
characteristics:
The t e s t v a r i a b l e , whose mean i s t o be e s t i m a t e d , can be
modeled by a normal d i s t r i b u t i o n w i t h unknown mean and unknown v a r i a n c e .
b.
Information i s a v a i l a b l e , p r i o r t o sampling or experimenting,
from w h i c h a p r o b a b i l i t y d i s t r i b u t i o n o f t h e mean o f t h e t e s t
variable
can be c o n s t r u c t e d .
c.
This p r i o r i n f o r m a t i o n can be r e p r e s e n t e d by a normal d i s -
2
t r i b u t i o n w i t h known mean, m , and known v a r i a n c e , , a* .
1
The S p e c i f i c
A U. S. Army O p e r a t i o n a l T e s t
Problem
(OT) i s an o v e r a l l e v a l u a t i o n o f
a s y s t e m w h i c h has b e e n d e v e l o p e d f o r g e n e r a l u s e w i t h i n t h e U. S . Army
structure
(2).
I n t h i s c o n t e x t , a "system" may be not o n l y hardware,
but a l s o d o c t r i n a l c o n c e p t s , and i s u s u a l l y a m i x t u r e o f b o t h .
t e s t i s c o n d u c t e d i n an environment w h i c h d u p l i c a t e s o r
The
closely
s i m u l a t e s t h o s e c o n d i t i o n s under w h i c h t h e s y s t e m w i l l be employed
if
2
i t i s adopted f o r general use.
I t is this fact that generally differen
t i a t e s OT from engineering, developmental, p r e - p r o d u c t i o n , and other
t e s t s which may be conducted on the same system and which are u s u a l l y
a p a r t of the t o t a l development scheme o f the system.
r e q u i r e d t o be an independent e v a l u a t i o n of the system.
I n f a c t , OT i s
Thus, OT i s a
v i t a l p a r t o f the process by which new equipment and concepts are
incorporated i n t o the U. S. Army s t r u c t u r e .
An Operational Test i s i n essence a systematic p l a n f o r e v a l u a t i n g
the t o t a l system being t e s t e d .
I t i s composed of numerous subtests
which address s p e c i f i c issues (unknown parameters) which are considered
c r i t i c a l or paramount t o the t o t a l e v a l u a t i o n of the system (3).
The
s p e c i f i c c r i t i c a l issues t o be evaluated by each subtest and the order
of these t e s t s govern the o v e r a l l s t r u c t u r e of the Operational Test.
Once the s p e c i f i c s t r u c t u r e of the Operational Test has been
e s t a b l i s h e d , a d e c i s i o n must be made as t o the number of r e p l i c a t i o n s
of each subtest to conduct i n order t o p r o p e r l y evaluate the
issue i n question.
critical
Time and budget c o n s t r a i n t s place emphasis on con
ducting the minimum number o f r e p l i c a t i o n s p o s s i b l e ; w h i l e the disastrous
consequences t h a t could r e s u l t i f a c r i t i c a l issue i s not p r o p e r l y
e v a l u a t e d , make i t imperative t h a t accuracy i s not s a c r i f i c e d f o r
economy.
Thus, the problem reduces down t o one of determining the
minimum number of r e p l i c a t i o n s of each subtest t o conduct i n order to
evaluate the c r i t i c a l issues i n question t o w i t h i n a predetermined
accuracy.
Current procedural and p o l i c y documents governing the conduct
of Operational Tests (3, h
9
11) suggest t h a t f o r the most p a r t , sample
3
s i z e s are determined using non-Bayesian or c l a s s i c a l s t a t i s t i c a l methods.
These methods do not consider p r i o r information a v a i l a b l e concerning
the v a r i a b l e being t e s t e d ; t h e r e f o r e , inferences and decisions about
the v a r i a b l e are based e n t i r e l y on the experimental or sampling r e s u l t s .
Bayesian techniques, on the other hand, attempt t o use both the p r i o r
i n f o r m a t i o n and the experimental r e s u l t s i n making inferences and
decisions about the v a r i a b l e .
Thus, t h i s i n v e s t i g a t i o n i s
essentially
a search f o r a p r a c t i c a l procedure f o r applying Bayesian techniques t o
Operational T e s t i n g .
The p r i n c i p a l Operations Research t o o l s used i n
t h i s study are s t a t i s t i c a l inference and e s t i m a t i o n techniques to
develop the methodology, and computer simulation techniques t o demon
s t r a t e the procedures developed.
O p e r a t i o n a l Testing i s an expensive undertaking which must
operate i n an environment constrained by budget and time considerations.
The author b e l i e v e s t h a t a methodology which e f f e c t i v e l y reduces the
number of r e p l i c a t i o n s r e q u i r e d t o evaluate the c r i t i c a l
issues
addressed by each subtest and also maintains the accuracy and confidence
d e s i r e d of the t e s t , i s a worthwile p u r s u i t d i r e c t l y a p p l i c a b l e t o the
Operational Testing environment.
Background
During the l a s t decade, there has been an increasing emphasis
and d r i v e w i t h i n the m i l i t a r y community t o develop and f o r m a l i z e a
methodology t o adequately i d e n t i f y and evaluate the r i s k s associated
w i t h the development and procurement of major weapons systems.
underlying premise which i n i t i a t e d t h i s a c t i o n was t h a t
The
unanticipated
k
cost and time over-runs and performance shortcomings, which had become
i n c r e a s i n g l y p r e v a l e n t , were the r e s u l t of inadequate assessment of
the r i s k s involved w i t h the m a t e r i e l a c q u i s i t i o n process.
The metho
dology which grew out of t h i s e f f o r t i s known as decision r i s k a n a l y s i s .
I n a r e p o r t prepared f o r the Army M a t e r i e l Systems Analysis Agency
(AMSAA), A t z i n g e r , Brooks, et a l . ,
(6) present a b r i e f h i s t o r y and
d e s c r i p t i o n of the major concepts of the decision r i s k analysis process.
The authors define r i s k analysis as f o l l o w s :
"Decision r i s k analysis
i s a d i s c i p l i n e of systems a n a l y s i s , which i n a s t r u c t u r e d manner, p r o
vides a meaninfgul measure of the r i s k s associated w i t h various
natives . "
alter
The purpose of the report is t o s t r u c t u r e t h i s decision r i s k
analysis process so t h a t the t r a d e - o f f s inherent i n the a l t e r n a t i v e s
are v i s a b l y and meaningfully d i s p l a y e d .
I t c i t e s the f o l l o w i n g four
major areas as the underlying concepts of d e c i s i o n r i s k a n a l y s i s :
a.
Subjective
Probability
b.
Monte Carlo Methods
c.
Network Analysis
d.
Bayesian S t a t i s t i c s
Bayesian s t a t i s t i c s and Bayes Theorem have a t t r a c t e d renewed
i n t e r e s t i n many f i e l d s o f a p p l i e d and t h e o r e t i c a l s t a t i s t i c s
years.
i n recent
This theorem i s e s s e n t i a l l y a mechanism f o r combining new
i n f o r m a t i o n w i t h p r e v i o u s l y a v a i l a b l e information so t h a t decisions or
inferences can be based on a l l the information a v a i l a b l e .
Over the
y e a r s , a controversy has developed between the Bayesian and the more
orthodox c l a s s i c a l s t a t i s t i c a l concepts.
Anscombe ( l ) provides a b r i e f
5
but concise h i s t o r y of the development of both p h i l o s o p h i e s .
During the l a s t few years t h e r e has been a r e v i v a l of i n t e r e s t
among s t a t i s t i c a l t h e o r i s t s i n a mode o f argument going back to the
Reverend Thomas Bayes (1702-61), Presbyterian m i n i s t e r a t Tunbridge
Wells i n England, who wrote an "Essay Towards Solving a Problem i n
the Doctrine o f Chances," which was published i n 17&3 a f t e r his
death.
Bayes work was incorporated i n a great development of p r o
b a b i l i t y theory by Laplace and many o t h e r s , which had general
currency r i g h t i n t o the e a r l y years of the century. Since then
there has been an enormous development of t h e o r e t i c a l s t a t i s t i c s ,
by R. A. F i s h e r , J . Neyman, E.S. Pearson, A. Wald and many o t h e r s ,
i n which the methods and concepts o f inference used by Bayes and
Laplace have been r e j e c t e d .
The orthodox s t a t i s t i c i a n , during the l a s t t w e n t y - f i v e years or
so, has sought t o handle inference problems (problems of deciding
what the f i g u r e s mean and what ought t o be done about them) w i t h
the utmost o b j e c t i v i t y .
He explains his f a v o r i t e concepts, s i g n i
f i c a n c e l e v e l , confidence c o e f f i c i e n t , unbiased e s t i m a t e s , e t c . , i n
terms of what he c a l l s p r o b a b i l i t y , but his notion of p r o b a b i l i t y
bears l i t t l e resemblance t o what the man i n the s t r e e t means ( r i g h t l y )
by p r o b a b i l i t y .
He i s not concerned w i t h probable t r u t h or p l a u s i
b i l i t y , but he defined p r o b a b i l i t y i n terms of frequency of occur
rence i n repeated t r i a l s , as i n a game of chance. He views his
inference problems as matters of r o u t i n e , and t r i e s t o devise p r o
cedures t h a t w i l l work w e l l i n the long r u n . Elements of personal
judgment are as f a r as possible t o be excluded from s t a t i s t i c a l
c a l c u l a t i o n s . A d m i t t e d l y , a s t a t i s t i c i a n has t o be able to exer
cise judgment, but he should be d i s c r e e t about i t and a t a l l costs
keep i t out of the t h e o r y .
I n f a c t , orthodox s t a t i s t i c i a n s show
a great d i v e r s i t y i n t h e i r p r a c t i c e , and i n the explanations they
give f o r t h e i r p r a c t i c e ; and so the above remarks, and some of the
f o l l o w i n g ones, are no b e t t e r than crude g e n e r a l i z a t i o n s . As such,
they a r e , I b e l i e v e , d e f e n s i b l e .
(Perhaps i t should be e x p l i c i t l y
s a i d t h a t F i s h e r , who c o n t r i b u t e d so much t o the development of the
orthodox school, nevertheless holds an unorthodox p o s i t i o n not f a r
removed from the Bayesian; and t h a t some other orthodox s t a t i s t i
c i a n s , notably Wald have made much use of formal Bayesian methods,
t o which no p r o b a b i l i s t i c s i g n i f i c a n c e i s a t t a c h e d . )
The r e v i v e d i n t e r e s t i n Bayesian inference s t a r t s w i t h another
posthumous essay on "Truth and P r o b a b i l i t y , " by F. P. Ramsey^ (190330), who conceived of a theory of consistent behavior by a person
faced w i t h u n c e r t a i n t y . Extensive developments were made by B. de
F i n e t t e and (from a r a t h e r d i f f e r e n c p o i n t of view) by J . J e f f e r y s .
For mathematical s t a t i s t i c i a n s the most thorough study of such a
t h e o r y is t h a t of L. J . Savage3>^. r S c h l a i f e r ^ has persuasively
i l l u s t r a t e d the new approach by reference t o a v a r i e t y of business
and i n d u s t r i a l problems. Anyone curious to o b t a i n some i n s i g h t
i n t o the Bayesian method, without mathematical hardship, cannot do
b e t t e r than browse i n S c h l a i f e r ' s book.
1
#
6
The Bayesian s t a t i s t i c i a n attempts t o show how the evidence of
observations should modify p r e v i o u s l y held b e l i e f s i n the formation
of r a t i o n a l opinions, and how on the basis of such opinions and of
value judgments a r a t i o n a l choice can be made between a l t e r n a t i v e
a v a i l a b l e a c t i o n s . For him p r o b a b i l i t y r e a l l y means p r o b a b i l i t y .
He i s concerned w i t h judgments i n the face of u n c e r t a i n t y , and he
t r i e s t o make the process o f judgment as e x p l i c i t y and o r d e r l y as
possible.
A t z i n g e r , Brooks, et a l . ,
( 6 ) obviously consider Bayesian
s t a t i s t i c a l procedures to have g r e a t p o t e n t i a l i n the decision r i s k
analysis process; they s t a t e :
Bayesian s t a t i s t i c s enjoys a unique p o s i t i o n i n r i s k a n a l y s i s .
There f r e q u e n t l y e x i s t s i t u a t i o n s where the a n a l y s i s t has both data
and expert judgment t o draw upon i n constructing the p r o b a b i l i t y
d i s t r i b u t i o n of i n t e r e s t i n the consolidation a c t i v i t y .
Bayesian
s t a t i s t i c s provides the analyst w i t h a t o o l f o r synthesizing a l l
of t h i s i n f o r m a t i o n i n t o one p r o b a b i l i t y d i s t r i b u t i o n which can
then be used t o d i r e c t l y estimate r i s k s .
Review of the L i t e r a t u r e
The s t a t i s t i c a l l i t e r a t u r e dealing w i t h sample s i z e determination
is q u i t e e x t e n s i v e , p a r t i c u l a r l y i n the area of c l a s s i c a l techniques.
Bayes, T . , Essay Towards Solving a Problem i n the Doctrine of Chances,
r e p r i n t e d w i t h b i b l i o g r a p h i c a l note by G. A. Barnard, Bioraetrika,
i+5
(
1
9
5
8
)
,
2
9
3
-
3
1
5
.
Ramsey, F. P . , The Foundations of Mathematics, London: Rowtledge and
Kegan P a u l , 1 9 3 1 .
Savage, L. J . , The Foundations of S t a t i s t i c s , New York, John W i l e y ,
1
9
5
4
.
Savage, L. J . , Subjective P r o b a b i l i t y and S t a t i s t i c a l P r a c t i c e , t o
be published i n a Mehtuen Monograph.
S c h l a i f e r , R., P r o b a b i l i t y and S t a t i s t i c s f o r Business Decisions:
An I n t r o d u c t i o n t o Managerial Economics Under U n c e r t a i n t y , New York,
McGraw-Hill, 1 9 5 9 -
7
Mace (12) provides an e x c e l l e n t and thorough coverage of c l a s s i c a l p r o
cedures f o r determining the optimum sample s i z e of a research experiment.
This p u b l i c a t i o n i s a p p l i c a t i o n s o r i e n t e d and provides procedures,
formulas, and t a b l e s f o r determining economical sample sizes f o r some
f o r t y d i f f e r e n t types o f research o b j e c t i v e s .
U n f o r t u n a t e l y , the
author considers only one r a t h e r l i m i t e d a p p l i c a t i o n of Bayesian t e c h
niques to sample s i z e d e t e r m i n a t i o n .
The l i m i t a t i o n i n t h i s
particular
example, t h a t the variance of the sampling process must be known, seems
t o occur q u i t e f r e q u e n t l y i n the l i t e r a t u r e of Bayesian techniques f o r
determing minimum sample s i z e s .
There has been extensive research i n the a p p l i c a t i o n of Bayesian
techniques t o r e l i a b i l i t y engineering and q u a l i t y c o n t r o l .
White
(15)
presents a promising methodology f o r p e r i o d i c r e l i a b i l i t y assessment
using Bayesian techniques to combine a n a l y t i c a l p r e d i c t i o n s w i t h l i m i t e d
t e s t r e s u l t s t o o b t a i n g r e a t e r p r e c i s i o n i n the r e l i a b i l i t y
estimate.
The main l i m i t a t i o n of t h i s paper is t h a t i t considers only the gamma
d i s t r i b u t i o n i n the a n a l y s i s .
G i l b r e a t h (8) has devised sampling
procedures f o r use i n s e q u e n t i a l sampling models which have d i r e c t
a p p l i c a t i o n i n q u a l i t y c o n t r o l and i n economic l o t s i z e d e t e r m i n a t i o n .
These techniques, however, are more a p p l i c a b l e t o hypothesis t e s t i n g
than to the e s t i m a t i o n problem.
A t z i n g e r and Brooks (5) provide an e x c e l l e n t comparison of
Bayesian and c l a s s i c a l decision making under u n c e r t a i n t y f o r a class
of problems where the decision v a r i a b l e i s the B e r n o u l l i success p r o
b a b i l i t y , p.
I f the outcome of any p a r t i c u l a r t e s t or experiment is
8
viewed as a success or f a i l u r e , the r e s u l t i n g data c l a s s i f i c a t i o n i s
c h a r a c t e r i s t i c of a B e r n o u l l i process.
The authors persuasively argue
t h a t h i s t o r i c a l l y , one of the major objectives i n t e s t and e v a l u a t i o n
processes has been t o estimate t h i s unknown B e r n o u l l i success parameter.
U n f o r t u n a t e l y , such an analysis does not address the a c t u a l parameters
of the sampling or experimental process
Winkler
itself.
( l 6 ) provides a r a t h e r d e t a i l e d and complete development
and treatment of Bayesian a p p l i c a t i o n s t o inference and decision theory
a t the i n t r o d u c t o r y l e v e l .
Although the concepts developed i n t h i s
p u b l i c a t i o n are v e r y thoroughly covered, the scope of the m a t e r i a l i s
rather l i m i t e d .
That i s , only two s p e c i f i c sampling processes are
analyzed i n d e t a i l :
the sampling process modeled by the B e r n o u l l i
d i s t r i b u t i o n , and the sampling process represented by the normal d i s
t r i b u t i o n w i t h known v a r i a n c e .
R a i f f a and S c h l a i f e r
(13) provide an extensive mathematical
development of Bayesian technqiues a p p l i e d t o s t a t i s t i c a l decision
theory.
However, once a g a i n , extensive analysis of the normal d i s
t r i b u t i o n i s g e n e r a l l y r e s t r i c t e d t o the case where the variance o f
the sampling p o p u l a t i o n i s known.
Thus, Bayesian a p p l i c a t i o n s t o the problem of sample-size d e t e r
mination d e a l only w i t h v e r y s p e c i a l i z e d s i t u a t i o n s i n the current
literature.
There appears t o be no s u b s t a n t i a l research i n t o the
examination o f the general problem.
On the other hand, c l a s s i c a l
s t a t i s t i c a l techniques commonly apply i t e r a t i v e type algorithms to the
t o the general problem of sample s i z e d e t e r m i n a t i o n .
The author believes
t h a t these techniques can be v a l i d l y extended t o Bayesian analysis and
9
produce e q u a l l y v a l i d r e s u l t s .
The aim of t h i s i n v e s t i g a t i o n , t h e n , i s
t o extend the a p p l i c a t i o n o f these w e l l known techniques t o the general
sampling s i t u a t i o n using Bayesian a n a l y s i s .
10
CHAPTER I I
THE TEST METHODOLOGY
The Assumptions of Normality
The n o r m a i l i t y assumptions s t a t e d i n the i n t r o d u c t i o n
are c r u c i a l , a l b e i t r e s t r i c t i v e , t o t h i s i n v e s t i g a t i o n .
introduction
The assumption
t h a t the p r i o r d i s t r i b u t i o n , which represents the d i s t r i b u t i o n of the
mean of a random v a r i a b l e , i s normally d i s t r i b u t e d has s o l i d support
i n the C e n t r a l Limit Theorem.
Hines and Montgomery (9) s t a t e the
essence o f t h i s important theorem as f o l l o w s :
I f X., , X
. . . , X i s a sequence of n independent random
n
2
v a r i a b l e s w i t h E(X^) = ^ and V(X^) = cr (both f i n i t e ) and Y = X
+ X + . . . + X , then under some general conditions
2
n
p
1
0
Y
z =
n
'I H
1=1
n
71 <>?
i=l
has an approximate N ( 0 , l ) d i s t r i b u t i o n as n approaches i n f i n i t y .
The "general conditions" mentioned i n the theorem are i n f o r m a l l y
summarized as f o l l o w s : The terms X^, taken i n d i v i d u a l l y , c o n t r i
bute a n e g l i g i b l e amount t o the variance of the sum, and i t i s not
l i k e l y t h a t a s i n g l e term makes a l a r g e c o n t r i b u t i o n t o the sum.
The p r i n c i p a l i m p l i c a t i o n of t h i s theorem, t h e n , i s t h a t
in
g e n e r a l the sum of n independent random v a r i a b l e s is approximately
normally d i s t r i b u t e d f o r s u f f i c i e n t l y l a r g e n, regardless of the d i s
t r i b u t i o n of the n i n d i v i d u a l random v a r i a b l e s .
Unfortunately,
the
11
assumption t h a t the random v a r i a b l e t o be t e s t e d i s normally d i s t r i b u t e d
is much more r e s t r i c t i v e .
However, i n many cases, r e a l - w o r l d s i t u a t i o n s
can be s a t i s f a c t o r i l y approximated by a normal process.
Also,
statisti
c a l inference and e s t i m a t i o n procedures, p a r t i c u l a r l y those concerning
the mean o f random v a r i a b l e s , are g e n e r a l l y robust ( i n s e n s i t i v e ) t o the
n o r m a l i t y assumption ( 1 2 ) .
The P r i o r I n f o r m a t i o n
At f i r s t glance, the requirement t h a t Operational Testing be
independent of other t e s t i n g conducted on the same system may seen an
insurmountable obstacle i n attempting t o o b t a i n adequate p r i o r
mation.
infor
T h i s , however, i s u s u a l l y not the case; other sources of p r i o r
i n f o r m a t i o n do e x i s t .
For example, most new systems undergoing t e s t i n g
have been s p e c i f i c a l l y designed t o replace older or outmoded systems
which are c u r r e n t l y a p a r t of the U. S. Army s t r u c t u r e .
These older
systems represent a v a s t source o f h i s t o r i c a l data from which p r i o r
d i s t r i b u t i o n s f o r n e a r l y any c r i t i c a l issue can be developed.
In
those r a r e cases where no h i s t o r i c a l data e x i s t from which t o construct
a p r i o r d i s t r i b u t i o n f o r a s p e c i f i c c r i t i c a l i s s u e , the Delphi technique
or other proven methods o f developing s u b j e c t i v e assessments of uncer
t a i n t i e s can be used t o develop the p r i o r d i s t r i b u t i o n
(6).
I n any e v e n t , t o the Bayesian s t a t i s t i c i a n , the p r i o r
information
represents the best a v a i l a b l e estimate about an u n c e r t a i n q u a n t i t y ,
regardless of i t s source.
This f a c t even suggests t h a t i t i s reason
able and l o g i c a l t o modify the p r i o r d i s t r i b u t i o n developed from
h i s t o r i c a l data to r e f l e c t the improved design c h a r a c t e r i s t i c s o f the
12
new system.
Suppose, f o r example, t h a t one of the c r i t i c a l issues being
evaluated during OT of a new weapons system i s the accuracy of the
weapon a t a s p e c i f i e d range.
The d i s t r i b u t i o n of the mean-error of
s i m i l a r weapons c u r r e n t l y i n use can be determined from h i s t o r i c a l
data.
I f the new system i s expected to be s i g n i f i c a n t l y more accurate
because of new design c h a r a c t e r i s t i c s , the mean of the p r i o r d i s t r i
b u t i o n developed from the h i s t o r i c a l data can be adjusted to r e f l e c t
the expected increase i n the performance of the new system.
In dis
cussing techniques f o r the assessment of p r i o r d i s t r i b u t i o n s and the
use o f d i f f u s e p r i o r d i s t r i b u t i o n s to represent the s i t u a t i o n where no
p r i o r information i s a v a i l a b l e , Winkler
(l6) states:
I t should be stressed t h a t i n g e n e r a l , t h e r e i s no such t h i n g as
a " t o t a l l y i n f o r m a t i o n l e s s " s i t u a t i o n and the use of p a r t i c u l a r
d i s t r i b u t i o n s t o represent d i f f u s e p r i o r states of information i s
a convenient approximation t h a t i s a p p l i c a b l e only when the p r i o r
i n f o r m a t i o n i s "overwhelmed" by the sample i n f o r m a t i o n .
I n most
r e a l - w o r l d s i t u a t i o n s , n o n - n e g l i g i b l e p r i o r i n f o r m a t i o n (nonn e g l i g i b l e r e l a t i v e t o the sample i n f o r m a t i o n ) i s a v a i l a b l e , and
the concept of a d i f f u s e p r i o r d i s t r i b u t i o n is not a p p l i c a b l e .
The Basic A l t e r n a t i v e s of Determining Sample Size
This study considers only two basic approaches t o determining
the appropriate sample s i z e i n an experimental process.
One approach
i s t o simply d i s r e g a r d any p r i o r knowledge or information a v a i l a b l e
about the v a r i a b l e o f i n t e r e s t , and use c l a s s i c a l s t a t i s t i c a l techniques
t o solve the problem.
The other approach i s t o combine the p r i o r
i n f o r m a t i o n w i t h the r e s u l t s of a l i m i t e d number o f r e p l i c a t i o n s of
the experiment, i f p o s s i b l e , and then use these r e s u l t s t o solve the
problem.
13
The C l a s s i c a l Method
C l a s s i c a l e s t i m a t i o n procedures and techniques are w e l l documented
i n the l i t e r a t u r e
(9? 1 0 , 1 2 ) .
This method uses only the r e s u l t s of
the sampling or experimental process i n the e s t i m a t i o n procedures and
ignores a l l p r i o r i n f o r m a t i o n .
S t a r t i n g from the basic assumption t h a t
the sampling process i s normally d i s t r i b u t e d w i t h unknown mean, JJL, and
2
unknown v a r i a n c e , a , the random v a r i a b l e representing the outcome of
the sampling process can be represented by:
2
~ N((i, a ) ,
Let (X^, X^,
of the experiment.
with
2
(i, cr unknown
. . . , X ) represent the r e s u l t s of n r e p l i c a t i o n s
The sample s t a t i s t i c s based on the s p e c i f i c n
values obtained from the sampling process can be expressed as:
n
-
I V
X a= — ^
X^,
the sample mean
i=l
and
n
^2
2
i = 1
S =
j
, the sample variance
The appropriate expression f o r a ( l - a) percent confidence
i n t e r v a l about the unknown mean, (i, f o r a process which i s normally
d i s t r i b u t e d and f o r which the variance is unknown i s constructed using
the Student's t d i s t r i b u t i o n ,
i.e.;
Ik
P(X - t ( c r / 2 , n - l ) —
£ u £ X + t(a/2,n-l) —
) = 1-q,
(2-1)
where the expression t ( a / 2 , n - 1 ) r e f e r s t o the percentage points of the
Student's t d i s t r i b u t i o n w i t h n-1 degrees o f freedom such t h a t P ( t >
t ( / 2 , n - 1 ) = a/2.
a
R e c a l l from the i n t r o d u c t i o n t h a t the c l a s s i c a l
interpretation
of p r o b a b i l i t y d i f f e r s considerably from the Bayesian i n t e r p r e t a t i o n .
Thus, the i n t e r p r e t a t i o n of equation ( 2 - 1 ) i s based on long-run con
siderations.
That i s , the c l a s s i c a l s t a t i s t i c i a n would say t h a t i f a
confidence i n t e r v a l based on a sample s i z e of n i s constructed each
t i m e , then i n the long r u n , 1-a percent o f such i n t e r v a l s would contain
the t r u e mean of the normally d i s t r i b u t e d sampling process.
The value
of a, which i s p r e s e l e c t e d a t some low v a l u e , can then be thought of
as p r o t e c t i o n against f a i l u r e of the i n t e r v a l t o include the t r u e
value of the mean o f the sampling process.
The v a l u e ,
a
- 0.05,
is
o f t e n s e l e c t e d f o r s t a t i s t i c a l inference and e s t i m a t i o n problems because
of t r a d i t i o n a l useage.
The second type of e r r o r t h a t can occur i n
i n t e r v a l e s t i m a t i o n problems i s t h a t the i n t e r v a l constructed based
on a set o f s p e c i f i c sample r e s u l t s may t o too w i d e , even though the
i n t e r v a l does include the t r u e value of the mean o f the sampling p r o
cess.
T h i s , t h e n , i s a problem of the accuracy associated w i t h the
confidence i n t e r v a l .
P r o t e c t i o n against t h i s type of e r r o r I s accom
p l i s h e d by c o n t r o l l i n g the w i d t h of the confidence i n t e r v a l constructed.
The w i d t h of each s p e c i f i c confidence i n t e r v a l is dependent on the
15
sample s i z e and the value of a s p e c i f i e d .
The terms
UL = X - t( /2, n-1) —
Jn
a
and
UU = X + t ( / 2 , n-1) - ~
a
,
Jn
which are r e a l - v a l u e d functions of the sample r e s u l t s , are the lower
and upper l i m i t s , r e s p e c t i v e l y , of the i n t e r v a l e s t i m a t e .
The Student's
t d i s t r i b u t i o n i s v e r y s i m i l a r t o the standard normal d i s t r i b u t i o n , and
f o r degrees of freedom, v = n-1 > 2 0 , the two d i s t r i b u t i o n s are v i r t u a l l y
indistinguisable.
And i n f a c t , the Student's t d i s t r i b u t i o n i s
identical
t o the standard normal d i s t r i b u t i o n f o r degrees of freedom, v = oo ( 1 0 ) .
This f a c t allows accurate approximations i n computing the minimum
sample s i z e by approximating the value of t ( a / 2 , n - 1 ) by t ( a / 2 , • ) =
Zq/2 f o r moderate sample s i z e s .
The experssion Zq/2 r e f e r s t o the
percentage points of the standard normal d i s t r i b u t i o n such t h a t
P(Z > Z q / 2 ) = a / 2 .
For the moment, l e t the p r e s e l e c t e d width of the confidence
i n t e r v a l be simply equal t o k.
Then from equation ( 2 - 1 ) , the h a l f -
i n t e r v a l w i d t h can be expressed a s :
16
Solving t h i s equation f o r n, r e s u l t s i n t h e f o l l o w i n g expression f o r
the minimum sample size r e q u i r e d f o r a confidence i n t e r v a l width equal
t o k.
n*
=
2 t ( o / 2 , n * -i)s
c
-2
(2-2)
c
It
i s more convenient t o express the width o f the confidence
interval
i n terms o f the sample standard d e v i a t i o n i n order t o s i m p l i f y equation
(2-2).
Thus, i f k = 6S i s s u b s t i t u t e d i n t o t h e equation, the minimum
sample size r e q u i r e d can then be expressed a s :
2t(o/2, n* -1)
n
Equation (2-3)
*c = [
ii
—
2
2
("3)
]
cannot be solved e x p l i c i t l y f o r n * , since the
value o f t ( a / 2 , n * - l ) i s a f u n c t i o n o f the sample s i z e n * .
t(<y/2,
the value o f
i s equal t o
Zq,/2,
n * - l ) i s approximately equal t o t(<y/2,
f o r moderate sample sizes a good f i r s t
But since
oo),
which
approximation
f o r t h e s o l u t i o n o f equation (2-3)
i s obtained by s u b s t i t u t i n g the
value o f Zq/2
n* - l ) .
f o r t h e value t(a/2,
This f i r s t
approximation
i s known t o be too s m a l l , although f o r l a r g e sample sizes i t i s q u i t e
close t o the a c t u a l value o f n * .
Using t h i s f i r s t
approximation,
c a l l i t n^, t o evaluate t ( o / 2 , n - l ) and t o solve equation (2-3)
again,
0
to o b t a i n a b e t t e r second approximation f o r the value of n * .
i t e r a t i v e procedure can be used t o approximate the value o f n *
desired accuracy; however, there i s u s u a l l y no s i g n i f i c a n t
i n the approximation a f t e r t h e second or t h i r d i t e r a t i o n .
This
c
t o any
improvement
17
Table 1 shows t h e v a l u e s o f n *
c
o b t a i n e d f o r a 95 p e r c e n t
0.05) c o n f i d e n c e i n t e r v a l f o r v a r i o u s v a l u e s o f 6 u s i n g t h i s
procedure.
(<* =
iterative
Because o f t h e premimum p l a c e d on a c c u r a t e e s t i m a t e s
O p e r a t i o n a l T e s t i n g , v a l u e s o f 6 > 1.0 were not c o n s i d e r e d .
in
The v a l u e s
shown i n t h e t a b l e under t h e h e a d i n g P(K) are t h e approximate p r o b a
b i l i t i e s o f a s i n g l e o b s e r v a t i o n from t h e sampling p r o c e s s
falling
between t h e l o w e r and upper l i m i t s o f t h e c o n f i d e n c e i n t e r v a l ,
P(K) = P(UL £ x £ UU).
i.e.,
This v a l u e g i v e s a p r o b a b i l i s t i c measure o f
the accuracy (width) of the confidence i n t e r v a l .
The v a l u e s o f
i n t h e t a b l e have b e e n rounded up t o t h e n e x t h i g h e s t i n t e g e r .
n*
c
As
i l l u s t r a t e d i n Table 1, e q u a t i o n (2-3) p o i n t s out t h a t i n o r d e r t o
d e c r e a s e t h e w i d t h o f a c o n f i d e n c e i n t e r v a l by o n e - h a l f ,
s i z e must be i n c r e a s e d a p p r o x i m a t e l y by a f a c t o r o f
T a b l e 1.
6
1.0
0.9
1.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
t h e sample
four.
Minimum Sample S i z e - C l a s s i c a l Method
P(K)
0.383
0.347
0.311
0.274
0.236
0.197
0.159
0.119
0.080
0.040
c
18
22
27
34
46
64
99
174
387
1537
18
A Bayesian Approximation
Bayes Theorem f o r Continuous Random V a r i a b l e s .
The essence of
Bayes Theorem f o r continuous random v a r i a b l e s i s depicted i n Figure 1
shown below.
The d e n s i t i e s f(0)
and f ( 8 | y ) represent the p r i o r d i s
t r i b u t i o n and the p o s t e r i o r d i s t r i b u t i o n r e s p e c t i v e l y , and
represents the l i k e l i h o o d or sampling f u n c t i o n .
f(y|e)
I t i s important t o
keep i n mind always t h a t i t i s the p r i o r d i s t r i b u t i o n or the s t a t i s t i
cians p r i o r s t a t e o f knowledge t h a t i s modified by the sampling r e s u l t s
and not the r e v e r s e .
f(ee)
f(y|e)
Figure 1.
ey)
f(e|
Bayes Theorem f o r Continuous Random Variables
The p r i o r and p o s t e r i o r d i s t r i b u t i o n s must be proper d e n s i t y
functions.
That i s , they must possess the f o l l o w i n g mathematical
p r o p e r t i e s a p p l i c a b l e t o the d e n s i t y f u n c t i o n of any continuous random
v a r i a b l e , x , which has range space or domain, R :
(i)
f ( x ) 2> 0
(ii)
J
f o r a l l xeR
f(x)dx = 1
f
R
x
19
The l i k e l i h o o d or sampling f u n c t i o n ,
f(y|e),
represents the p r o b a b i l i t y
of o b t a i n i n g a given v a l u e , y , f o r the range o f possible values of
9.
The l i k e l i h o o d f u n c t i o n is not a proper d e n s i t y f u n c t i o n because the
events f ( y | © ) are not mutually exclusive over the range of
0.
As suggested i n Figure 1 , Bayes Theorem i s e s s e n t u a l l y a process
of combining the p r i o r d i s t r i b u t i o n w i t h the sample information t o
y i e l d the p o s t e r i o r d i s t r i b u t i o n .
The r e s u l t a n t p o s t e r i o r d e n s i t y has
the f o l l o w i n g form:
f(9|y) =
f
frl°>
j f(e) f(y|e)de
(2-10
This r e s u l t can be expressed i n words a s :
posterior
density
normalizing
|~ p r i o r
"1 [~ l i k e l i h o o d 1
L constant
J L density J L function
J
where the normalizing constant, l /
r
J
f(0)f(y|e)d9,
i s needed t o make
the p o s t e r i o r d i s t r i b u t i o n a proper d e n s i t y f u n c t i o n .
Before the advent of the high speed computer which g r e a t l y eased
the computational burden involved w i t h numerical i n t e g r a t i o n techniques,
a p p l i c a t i o n of equation (2-K)
t o r e v i s e density functions i n the l i g h t
of sample information o f t e n proved extremely d i f f i c u l t because of the
i n t e g r a t i o n r e q u i r e d t o compute the normalizing constant.
For t h i s
reason, Bayesian s t a t i s t i c i a n s developed the concept of "conjugate"
d i s t r i b u t i o n s , which are f a m i l i e s of d i s t r i b u t i o n s t h a t ease the compu
t a t i o n a l burden when they are used as p r i o r d i s t r i b u t i o n s
(l6).
Of
20
c o u r s e t h e r e s u l t a n t form o f t h e p o s t e r i o r d i s t r i b u t i o n depends on t h e
l i k e l i h o o d f u n c t i o n as w e l l as t h e p r i o r d i s t r i b u t i o n .
prior distributions
Thus,
conjugate
a r e s e l e c t e d on t h e b a s i s o f t h e s t a t i s t i c a l
p e r t i e s o f t h e model c h o s e n t o r e p r e s e n t t h e s a m p l i n g p r o c e s s .
pro
When
t h e p r i o r d i s t r i b u t i o n i s c o n j u g a t e t o t h e l i k e l i h o o d or s a m p l i n g
f u n c t i o n , t h e r e s u l t a n t p o s t e r i o r d i s t r i b u t i o n i s a l s o a member o f
t h e same c o n j u g a t e f a m i l y o f p r i o r
distributions.
Bayes Theorem f o r Normal D i s t r i b u t i o n s .
If i t is possible
to
model t h e p o p u l a t i o n o r p r o c e s s b e i n g sampled by a normal d i s t r i b u t i o n ,
the proper choice for a family of conjugate p r i o r d i s t r i b u t i o n s
on t h e s t a t i s t i c i a n ' s
knowledge o f t h e p a r a m e t e r s o f t h e normal d a t a
generating process used.
effects
R a i f f a and S c h l a i f e r
(13) summarize t h e
o f t h e s t a t i s t i c i a n ' s knowledge o f t h e two p a r a m e t e r s o f
normal d i s t r i b u t i o n s on t h e p r o p e r c h o i c e o f c o n j u g a t e p r i o r
b u t i o n s as
depends
the
distri
follows:
Case ( i ) u, known, cr unknown: The a p p r o p r i a t e f a m i l e o f
d i s t r i b u t i o n s have a gamma-2 d e n s i t y .
conjugate
Case ( i i ) a known, u unknown: The a p p r o p r i a t e f a m i l e o f c o n j u g a t e
d i s t r i b u t i o n s have a normal d e n s i t y .
2
Case ( i i i ) b o t h u- and a unknown: The a p p r o p r i a t e f a m i l y o f c o n j u
g a t e d i s t r i b u t i o n s have a normal-gamma d e n s i t y .
An A p p r o x i m a t i o n P r o c e d u r e .
S i n c e i t was assumed t h a t w i t h i n
t h e c o n t e x t o f t h i s s t u d y t h e model r e p r e s e n t i n g t h e s a m p l i n g p r o c e s s
i n O p e r a t i o n a l T e s t i n g was n o r m a l l y d i s t r i b u t e d w i t h unknown mean, |i,
2
and unknown v a i r a n c e , a , t h e a p p r o p r i a t e f a m i l y o f c o n j u g a t e
b u t i o n s t o u s e i n t h i s c a s e have a normal-gamma d e n s i t y .
overcome t h e o b v i o u s d i f f i c u l t i e s
distri
In o r d e r t o
a s s o c i a t e d w i t h computing
interval
21
estimates w i t h the normal-gamma d e n s i t y , a procedure i s suggested here
to modify the Bayesian analysis of t h i s sampling process so t h a t the
f a m i l y of conjugate p r i o r d i s t r i b u t i o n s have a normal d e n s i t y f u n c t i o n ;
as i s the case when the variance of the population or sampling process
i s known.
Assume f o r the moment t h a t the variance of the sampling process
i s known.
Then the conjugate p r i o r d i s t r i b u t i o n has a normal d e n s i t y
f u n c t i o n of the form:
/
-(LL
1
x
- m') /2cr'
V 2m*
where the prime ( ' )
i s used t o s i g n i f y a parameter or constant which
2
Is associated w i t h the p r i o r d i s t r i b u t i o n .
Thus, a '
i s the variance
of the p r i o r d i s t r i b u t i o n o r , the p r i o r variance of the unknown p a r a
meter, (j,; and m' i s the mean of the p r i o r d i s t r i b u t i o n o f t h i s p a r a
meter .
I f n r e p l i c a t i o n s of the experiment were now conducted and a
sample mean,
n1
i r „
m = n
)
X.
I
,
i=l
and a sample v a r i a n c e ,
n
S
2
=
-V
7 (X.l
n-1
U
- m)
2
,
i=l
were observed, the r e s u l t a n t p o s t e r i o r d i s t r i b u t i o n would also have a
22
normal d e n s i t y f u n c t i o n of the form:
f
W
(
,
|
y
.
)
i
T
_
2TTO-"
e
-
2
(,-m") /2a"
2
2
where y represents the sample r e s u l t s , and the double prime (") i s
used t o i n d i c a t e a parameter or constant which i s associated w i t h the
posterior d i s t r i b u t i o n .
Thus, a"
i s t h e p o s t e r i o r variance of \i,
and m" i s the mean of the p o s t e r i o r d i s t r i b u t i o n o f \i.
These p o s t e r i o r
parameters can be computed from t h e f o l l o w i n g formulas:
1
„2
(
J
,2
O
+
(2-5)
4
2
G
and
m"
o f l / o ' V
+
( ° / g
2
? »
(2.6)
2
( l / o ' ) + (n/a )
2
Equations (2-5) and (2-6) i n d i c a t e t h a t the r e c i p r o c a l of the
p o s t e r i o r variance i s equal t o the sum o f the r e c i p r o c a l o f the p r i o r
2
v a r i a n c e , a ' , and t h e r e c i p r o c a l o f the variance of t h e sample mean,
2
a/n.
The p o s t e r i o r mean i s a weighted average o f t h e p r i o r mean, m ' ,
and the sample mean, m.
The weights being t h e r e c i p r o c a l o f the r e s
pective variances.
As depicted i n Figure 2 . , an important f e a t u r e o f t h e p o s t e r i o r
d i s t r i b u t i o n i s t h a t the p o s t e r i o r mean, m", always l i e s between the
2
p r i o r mean, m ' , and the sample mean, m.
The p o s t e r i o r v a r i a n c e , a" ,
23
T
I s always smaller than the p r i o r v a r i a n c e , cr
(l6).
T
(2-5)j i f the variance of the p r i o r d i s t r i b u t i o n , o*
From equation
, decreases, the
amount o f p r i o r u n c e r t a i n t y decreases, and the p r i o r information i s
given more weight i n the determination of the p o s t e r i o r d i s t r i b u t i o n .
S i m i l a r l y , as the variance of the sample mean, a / n , decreases, the
sampling i n f o r m a t i o n i s given more weight i n the determination of the
posterior distribution.
m"
Figure 2.
m
Bayes Theorem f o r Normal D i s t r i b u t i o n s
A d i f f e r e n t p a r a m e t e r i z a t i o n of t h i s problem might help c l e a r i f y
the r e s u l t s obtained.
2
l e t n' = ^-p
a*
Then the p r i o r variance can be w r i t t e n i n terms of n
or sampling v a r i a n c e , t h u s :
f
and the process
2k
Similarly,
if
2
then
Substituting these r e s u l t s into equations
(2-5) and ( 2 - 6 ) , t h e p a r a
meters of the p o s t e r i o r d i s t r i b u t i o n are then
or s i m p l y ,
n" = n' + n
and
2
m" =
2
( n ' / g ) m ' + (n/cr )m
2
(n'/a )
+
2
( /a )
n
25
or simply,
=
n'm- + nm
n' + n
(
^
I n his i n t e r p r e t a t i o n of the r e s u l t s obtained by using these
new parameters, Winkler
( l 6 ) suggests t h a t the p r i o r d i s t r i b u t i o n can
be thought of as roughly e q u i v a l e n t t o the information contained i n a
sample of s i z e n
1
1
w i t h a sample mean o f m from a normal sampling p r o -
2
cess w i t h variance cr .
That i s , n
1
appears t o be the sample s i z e
2
r e q u i r e d t o produce a variance of cr'
1
f o r a sample mean equal t o m ,
since the variance of the sample mean from a sample s i z e n' i s equal
to a / n ' .
Winkler also considers equations (2-7) and (2-8) as formulas
f o r pooling the information from the two samples.
Under t h i s
inter
p r e t a t i o n , the p o s t e r i o r or pooled sample s i z e i s equal t o the sum of
the two i n d i v i d u a l sample s i z e s , one from the p r i o r
and one from the sampling process.
distribution
The p o s t e r i o r or pooled sample mean
i s equal t o a weighted average of the two i n d i v i d u a l sample means.
This pooling process suggests t h a t a reasonable estimate of the
sample mean, based on a l l the i n f o r m a t i o n a v a i l a b l e , i s the p o s t e r i o r
or pooled mean, m".
Notice t h a t i f n' > n, then the p o s t e r i o r or pooled
mean is closer t o the p r i o r mean than t o the sample mean.
That i s ,
the p r i o r i n f o r m a t i o n i s given more importance than the sample r e s u l t s
i n the determination of the p o s t e r i o r parameters.
Of course, the
p o s t e r i o r mean i s closer to the sample mean i f n > n ' ; and i f n' = n,
the p o s t e r i o r mean i s e x a c t l y midway between the p r i o r mean and the
}
26
sample mean.
Notice also t h a t since the sample mean, m, i s as equally-
l i k e l y t o f a l l above as i t i s t o f a l l below the t r u e population or
sampling mean, p,; i t i s then e q u a l l y l i k e l y t h a t the sample mean and
the mean of the p r i o r d i s t r i b u t i o n , m ' , t o be on the same or opposite
sides of
When m' and m f a l l on the same side of JJL, the mean of the
p o s t e r i o r d i s t r i b u t i o n , m", w i l l be f u r t h e r from \i than the sample mean.
That i s , the p o s t e r i o r mean w i l l be a less accurate estimate of the t r u e
p o p u l a t i o n mean than the sample mean.
sides of
When m
T
and m are on opposite
then i t cannot be determined whether the p o s t e r i o r mean w i l l
be closer or f u r t h e r from JJ, than the sample mean.
Each s p e c i f i c case
must be examined s e p a r a t e l y ; the r e s u l t s w i l l depend on the sample
s i z e , the s p e c i f i c value of the p r i o r mean, and the variances of the
p r i o r and sampling d i s t r i b u t i o n s .
Since the p o i n t estimate of |j, based on a l l information a v a i l a b l e
i s the p o s t e r i o r mean which i s normally d i s t r i b u t e d w i t h mean, m", and
v a r i a n c e , a" , the
statistic
Z =
has a standard normal d i s t r i b u t i o n , i . e . ,
Z ~ N(0, l ) .
Therefore the
appropriate expression f o r a ( l - a) percent i n t e r v a l e s t i m a t i o n of J
|L
f o r t h i s case i s constructed using the standard normal d i s t r i b u t i o n ,
i.e.,
a" £ ^ £ m" + Z
P(m" - Z
a,
a/2
a") = 1 - a
(2-H)
27
The lower and upper l i m i t s of the confidence i n t e r v a l i n t h i s case are
UL = m" - Z
If,
/ o
a" and UU = m" Z
cr", r e s p e c t i v e l y .
/ o
as was done i n the c l a s s i c a l case, the w i d t h of the con
fidence i n t e r v a l f o r the general case i s set equal t o k, then from
equation ( 2 - l l ) the h a l f - i n t e r v a l width can be expressed as:
2
Now s u b s t i t u t i n g the expression f o r cr"
from equation ( 2 - 5 ) i n t o the
above expression r e s u l t s i n the f o l l o w i n g :
l.
-i 2
1
V2
r ?? T —
-^
V* a na'
1 / V
2
+
q
1
n / a
2
ct
2 J =2
2
+
Z
CT CT
«/2 '
k/2
1
2
2 .
J
=
CT
+
,2
n
C
T
and f i n a l l y ,
^ 2Z
,a
e
n
*b =
ot/ k
n 2
(2-12)
This t h e n , i s the Bayesian s o l u t i o n f o r the minimum sample s i z e
r e q u i r e d t o e s t a b l i s h a confidence i n t e r v a l o f w i d t h k about the mean
of the sampling process under the s p e c i a l c o n d i t i o n t h a t the variance
28
o f the sampling process i s known.
(2-12) deserve mention.
[2 Z /
01/ cr/k]
<-
Several c h a r a c t e r i s t i c s of equation
F i r s t of a l l , the f i r s t term i n the e q u a t i o n ,
, i s i n f a c t the exact expression f o r the c l a s s i c a l s o l u t i o n
t o the problem of determining the minimum sample s i z e r e q u i r e d t o
e s t a b l i s h a confidence i n t e r v a l of w i d t h k about the mean of a sampling
process w i t h known v a r i a n c e .
2
Second, the l a s t term i n the equation,
2
a /o~' , i s the expression developed e a r l i e r f o r n
1
i n equation
(2-7).
R e c a l l W i n k l e r ' s i n t e r p r e t a t i o n o f n' as being roughly the equivalent
sample s i z e , r e l a t i v e t o the sampling process, o f the information con-
2
2
t a i n e d i n the p r i o r d i s t r i b u t i o n .
The r a t i o o~ / a ' = 0 i s also used t o
d e f i n e a d i f f u s e p r i o r d i s t r i b u t i o n , i . e . , an informationless p r i o r
2
state.
Assuming t h a t the variance of the sampling process, o~ > 0,
2
the r a t i o a / a '
2
= 0 only i f the variance of the p r i o r
distribution,
2
a'
= oo.
I n t h i s case, the variance of the p r i o r d i s t r i b u t i o n would
2
represent a c o n d i t i o n of t o t a l u n c e r t a i n t y and since n
equation
1
= a /o"'
2
= 0,
(2-12) would y i e l d the same r e s u l t s as i n the c l a s s i c a l case.
Tying a l l these f a c t s t o g e t h e r , equation
p r e t e d as f o l l o w s :
(2-12) can be i n t e r
the minimum Bayesian sample s i z e r e q u i r e d t o e s t a
b l i s h an i n t e r v a l estimation o f the mean of any s p e c i f i e d w i d t h or
accuracy i s equal t o the minimum sample s i z e r e q u i r e d t o e s t a b l i s h
the same i n t e r v a l e s t i m a t i o n using c l a s s i c a l methods, minus the value
of the p r i o r i n f o r m a t i o n i n terms o f an equivalent sample s i z e .
more c l e a r l y :
n*
= n*
b
- n
c
1
Or,
29
Now, consider once again equation (2-12) i n order to address the
f a c t t h a t the variance of the sampling process i s i n f a c t not known.
S u b s t i t u t i n g the sample variance f o r the variance o f the sampling p r o
cess and the term t ( o / 2 , n *
c
l
- l ) for
n
equation (2-12), and once
again d e f i n i n g the w i d t h of the confidence i n t e r v a l as k = 6S, r e s u l t s
i n the f o l l o w i n g expression f o r the approximate Bayesian sample s i z e :
(2-13)
where
and m is equal t o the sample mean based on n*^ observations.
Examination o f equation (2-13) reveals t h a t the f i r s t term i n
the equation i s i d e n t i c a l t o equation (2-3), the c l a s s i c a l s o l u t i o n
t o the minimum sample s i z e problem f o r a normal sampling process w i t h
unknown v a r i a n c e .
The l a s t term i n the equation i s an approximation
of the equivalent sample s i z e o f the information contained i n the p r i o r
2
d i s t r i b u t i o n , where the value of n
2
1
= a /a
2
1
is approximated by n' =
2
S / a ' . Of course equation (2-13) cannot be evaluated e x p l i c i t l y , even
though the value of the f i r s t term i n the equation i s e x a c t l y known
from the r e s u l t s obtained using the c l a s s i c a l method, since the value
2
of S
depends on the s p e c i f i c observations obtained during the sampling
process.
30
Before suggesting a procedure f o r approximating a s o l u t i o n t o
equation
(2-13) f o r the general case, i t may be more appropriate a t
t h i s p o i n t t o examine the general implications of using the p o s t e r i o r
d i s t r i b u t i o n to construct cofidence i n t e r v a l estimates about the mean
of the sampling process.
An i n t e r v a l estimation based on the p o s t e r i o r
d i s t r i b u t i o n has as i t s midpoint the p o s t e r i o r mean, m"; w h i l e the mid
p o i n t of an i n t e r v a l e s t i m a t i o n based on the sampling process alone is
the sample mean, m.
R e f e r r i n g t o Figure 2, i t i s obvious, t h e n , t h a t
an i n t e r v a l estimate of width 6S which i s based on the p o s t e r i o r d i s
t r i b u t i o n w i l l not include the sample mean, m, i f m" and m are separated
by more than ^6S.
A l a r g e separation between m" and m i s i n d i c a t i v e of
p r i o r i n f o r m a t i o n which i s not v e r y compatible t o the r e s u l t s obtained
from the sampling or experimental r e s u l t s .
I n other words, the p r i o r
i n f o r m a t i o n does not p r e d i c t the behavior of the sampling process very
well.
This i s an important c o n s i d e r a t i o n i n O p e r a t i o n a l T e s t i n g , since
i t i s important t o decide whether or not t o use the p r i o r
information
i n e s t i m a t i n g the mean of the sampling or experimental process.
I t would seem appropriate t h e n , t o develop a t l e a s t a h e u r i s t i c
r u l e t o r e j e c t the use of p r i o r information which causes the p o s t e r i o r
and sampling means t o d i f f e r beyond some p r e - e s t a b l i s h e d l i m i t .
The
general form of such a r u l e would be of the form:
|m" - m| £ q6S
where the value of q i s s l e e t e d i n a manner such t h a t i f the
inequality
3
were n o t s a t i s f i e d ,
1
t h e a p p l i c a t i o n o f B a y e s i a n t e c h n i q u e s would be
a b o r t e d and t h e a p p r o p r i a t e sample s i z e f o r t h e s p e c i f i c
be d e t e r m i n e d by u s i n g c l a s s i c a l
s i t u a t i o n would
techniques.
R e t u r n i n g now t o t h e problem o f c o n s t r u c t i n g an i n t e r v a l
estimate
o f w i d t h 6S f o r t h e mean o f t h e s a m p l i n g p r o c e s s u s i n g B a y e s i a n t e c h n i
ques , t h e f o l l o w i n g p r o c e d u r e i s s u g g e s t e d as a r e a s o n a b l e approach t o
approximating the s o l u t i o n of equation
a.
B a y e s i a n sample
n
=
l
rl
(i
o/ •
2
-
1
3
)
Determine t h e minimum sample s i z e ,
c l a s s i c a l method.
b.
(
for the general case.
n *
C
J
required for the
This v a l u e , c a l l i t n^, i s t h e upper l i m i t o f t h e
size.
As a f i r s t a p p r o x i m a t i o n t o t h e B a y e s i a n sample s i z e ,
^
t i e r e
the value of d i s s e l e c t e d w i t h c o n s i d e r a t i o n
let
given
t o t h e c l a s s i c a l sample s i z e b e i n g u s e d .
That i s , f o r s m a l l v a l u e s
n* ,
( s u c h as 2 or k)
c
d s h o u l d be c h o s e n a t some low v a l u e
n^ be l a r g e enough t o y i e l d s u i t a b l e sample s t a t i s t i c s .
of
i n order t h a t
For l a r g e
v a l u e s o f n * ? d may be i n c r e a s e d s i n c e t h e r e s u l t i n g n^ samples would
c
s t i l l yield suitable s t a t i s t i c s .
The o b j e c t i v e h e r e i s t o
approximate
t h e B a y e s i a n sample s i z e c o n c e r v a t i v e l y w h i l e i n s u r i n g t h a t t h e
approxi
m a t i o n d e c i d e d upon i s l a r g e enough t o y i e l d r e a s o n a b l y v a l i d sample
statistics.
c.
Conduct t h e n^ r e p l i c a t i o n s o f t h e e x p e r i m e n t and from t h e
r e s u l t s compute t h e sample
statistics:
n
^ X
L \
1
1=1
n.
32
and
n
n
2
S 2
1
d.
t
- m )
±
- 1
(X
L
i=l
1
n
Use t h e s e s t a t i s t i c s t o compute t h e
2
s
n
1
approximations
=
~
0"
and
_
^tt
n'm
1
+ n-m..
l
l l
m _ = —:—
1
i
n
e.
n
i
Determine t h e s e c o n d a p p r o x i m a t i o n o f t h e B a y e s i a n sample
s i z e by u s i n g t h e v a l u e o b t a i n e d f o r t h e f i r s t a p p r o x i m a t i o n and t h e
following
relationship:
N
"2 = l
n
+
( n
* 0
" V
where A i s c h o s e n w i t h t h e same c o n s i d e r a t i o n s as was t h e v a l u e o f d.
The e x p r e s s i o n f o r t h e
a p p r o x i m a t i o n o f t h e B a y e s i a n sample
size
is:
n
f.
= n
+ i(n
Determine i f s u f f i c i e n t
Q
- n'
)
r e p l i c a t i o n s o f t h e e x p e r i m e n t have
33
been conducted after each iteration by comparing the computed approxi
mation of the Bayesian sample size to the classical sample size minus
the computed value of n'.
That is, continue the iterative procedure
until n. Jt n - n'.,
j
0
j
rt
g.
After computing the final approximation of the Bayesian
sample size, determine if the prior information should be accepted or
rejected.
That is, if |m" - m | <: q6S, use the n. replications already
conducted to construct the interval estimate of the mean of the expert
mental process using Bayesian techniques.
If |m" - m | > q&S, reject
the use of the prior information; conduct the remaining n ^ - n^. repli
cations of the experiment and construct the desired interval estimate
of the mean of the experimental process using classical techniques.
3h
CHAPTER I I I
DEMONSTRATION OF THE METHODOLOGY
Programming the Model
The model developed f o r approximating the minimum Bayesian sample
s i z e f o r the s p e c i a l t e s t s i t u a t i o n described i n Chapter I i s programmed
f o r the UNIVAC 1108 computer using standard F o r t r a n I V language.
The
program consists of four basic segments designed t o perform the f o l l o w
ing f u n c t i o n s :
generate the r e q u i r e d data and compute the sample
s t a t i s t i c s ; compute the minimum c l a s s i c a l sample s i z e r e q u i r e d f o r an
i n t e r v a l e s t i m a t i o n o f s p e c i f i e d w i d t h ; compute the approximate Bayesian
sample size r e q u i r e d f o r the same i n t e r v a l w i d t h ; and construct the
confidence i n t e r v a l s d e s i r e d based on the sampling r e s u l t s .
The Box and M u e l l e r technique (7) i s used t o generate the normally
d i s t r i b u t e d pseudo random numbers r e p r e s e n t a t i v e of a normal process
w i t h s p e c i f i e d mean and v a r i a n c e .
The random number generator was t e s t e d
f o r various sample sizes and values o f the model parameters using the
chi-square g o o d n e s s - o f - f i t t e s t f o r n o r m a l i t y .
The r e s u l t s of these
t e s t s were q u i t e favorable and are summarized i n Table 2.
Equation (2-3) i s solved i t e r a t i v e l y f o r the minimum c l a s s i c a l
sample s i z e by using two standard UNIVAC MATH-STAT l i b r a r y functions
(ik).
The f u n c t i o n TINORM i s used t o compute the value of the inverse
of the standard normal d i s t r i b u t i o n given the value of the p r o b a b i l i t y
f o r which the o r d i n a t e i s t o be c a l c u l a t e d .
The f u n c t i o n STUDIN i s
Table 2.
Specified
Mean and
Variance
-120.0
: 81.0
-5.0
: 4.0
0.0
: 1.0
57.0
: 64.0
200.0
: 400.0
225.0
: 25.0
Number of
Observations
per T r i a l
500
250
100
25
500
250
100
25
500
250
100
25
500
250
100
25
500
250
100
25
500
250
100
25
Test of Normal Random Generator
Number of
Trials
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
30
Number of T r i a l s Accepted at
the & = .05 S i g n i f i c a n c e Level
27
27
29
26
26
27
28
24
27
28
29
23
30
27
28
25
28
29
27
24
26
30
27
23
Table 2.
Specified
Mean and
Variance
Number o f
Observations
per T r a i l
Number of
Trials
(Continued)
Number of T r i a l s Accepted a t
the Qf = '05 S i g n i f i c a n c e Level
p, = 297.72
337
30
28
483
30
29
13
30
24
486
30
28
2
CT = 173.18
M» =
a
2
199.5
= 46.32
p, = 45.3
a
2
=
.78
(j, = 1542.0
2
CT =
57*2.6
37
used to c a l c u l a t e the inverse of the Student's t d i s t r i b u t i o n f o r a
given confidence c o e f f i c i e n t .
The r e s u l t s obtained from the subroutine
used t o c a l c u l a t e the c l a s s i c a l sample s i z e f o r each s p e c i f i e d value of
6 are shown i n Table 1 .
Approximations f o r the Bayesian sample s i z e f o r a given value of
d e l t a are computed using the i t e r a t i v e procedure developed i n the p r e
ceding chapter.
The value of the c l a s s i c a l sample s i z e computed f o r a
given value o f d l e t a i s input t o t h i s subroutine which uses t h i s value
t o c a l c u l a t e the f i r s t approximation of the Bayesian sample s i z e .
Confidence i n t e r v a l s are computed by using the STUDIN l i b r a r y
f u n c t i o n t o c a l c u l a t e the value t ( o / 2 , n * - l ) , where n * i s the computed
c l a s s i c a l or Bayesian sample s i z e .
The subroutine then computes the
lower and upper l i m i t s of the confidence i n t e r v a l ,
UL = m - t ( o / 2 , n *
-
l)
UU = m + t ( o / 2 , n *
- 1)
S
c
and
f o r the c l a s s i c a l case, and
UL = m" - t ( o / 2 , n *
and
b
-
l)
S
c
i.e.,
38
UU = m" + t ( o / 2 , n *
- l ) ——
b
f o r the Bayesian case.
Demonstrating the Model
I n order t o demonstrate the model developed t o approximate the
Bayesian sample s i z e i n the preceding chapter, various values of the
constants, d , A, and q used i n the i t e r a t i v e procedure were t r i e d i n
preliminary simulations.
The values d = k,
A = l A ? and Q. = 3/8 were
chosen f o r the f o l l o w i n g reasons:
a.
Values of d < h tended t o produce f i r s t approximations o f
the Bayesian sample s i z e which were too l a r g e when working w i t h small
values o f the c l a s s i c a l sample s i z e , n_.
0
the f i r s t approximation.
1
That i s , n, 2: n„ - n . a f t e r
1
0
j
Larger values of d produced more conservative
f i r s t approximations of the Bayesian sample s i z e f o r small values of
n , but a t the same time r e s u l t e d i n u n r e l i a b l e , i . e . ,
Q
greatly variable,
sample s t a t i s t i c s .
b.
n
Q
Values of A < l/h
were r e j e c t e d because f o r l a r g e values of
the number of i t e r a t i o n s r e q u i r e d t o compute the approximate Bayesian
sample s i z e was considerably increased.
I t was f e l t t h a t t h i s
result
was undesirable i n an Operational Testing mode and, o f course, i t
meant increased computer times t o solve the approximation.
o f using a v a r i a b l e value f o r A was t r i e d , i . e . ,
o n e - h a l f a f t e r each i t e r a t i o n .
also
A scheme
A was decreased by
This scheme was also r e j e c t e d because
39
f o r l a r g e r v a l u e s of
the i t e r a t i v e procedure q u i c k l y evolved
into
a s e q u e n t i a l t y p e o f sampling p r o c e d u r e .
c.
The v a l u e o f q = 3/8 was s e l e c t e d as a r e a s o n a b l e
b a s e d on t h e i l l u s t r a t i o n shown i n F i g u r e 3.
choice
The i n t e r v a l a - d r e p r e
s e n t s an i n t e r v a l e s t i m a t i o n b a s e d on t h e p o s t e r i o r d i s t r i b u t i o n .
from p r e v i o u s d e f i n i t i o n s ,
l/2
fiS.
a - d = 6 S , and t h e i n t e r v a l s a-m" • m"-d =
Then i f t h e i n t e r v a l s a-b = c - d = l/8 6 S , t h e sample mean, m,
i s r e q u i r e d t o be w i t h i n t h e i n t e r v a l b - c = 3 / 4 6 S , i . e . ,
3/8 6S i s t h e p r e r e q u i s i t e f o r i n c o r p o r a t i n g t h e p r i o r
into the estimation procedures.
for sufficient
sample
Then
|m" - m| £
information
I t was f e l t t h a t l/8 6S would a l l o w
v a r i a t i o n o f t h e sample mean due t o d i f f e r e n c e s
results.
F i g u r e 3.
S e p a r a t i o n o f t h e P o s t e r i o r and Sample Means
in
ko
The procedure t o approximate the Bayesian sample s i z e was demon
s t r a t e d using a h y p o t h e t i c a l case having the f o l l o w i n g c h a r a c t e r i s t i c s :
2
2
a.
the r a t i o a /CT
b.
|m' - p,| =
1
where p, and a
=l 6 .
5 and |m' - p,| = 10.
are the t r u e (but assumed unknown) values of the p a r a 2
1
meters of the sampling process and m and a
1
are the parameters of the
prior distribution.
The f i r s t t e s t of the procedure involved a computer simulation
of 100 runs f o r each value of d e l t a from 1.0 t o 0 . 2 .
The model was not
t e s t e d f o r the value of d e l t a equal to 0.1 i n t h i s or subsequent t e s t s
of the procedure because the l a r g e sample sizes involved r e q u i r e d an
excessive amount of computer t i m e .
The r e s u l t s o f t h i s f i r s t t e s t are
summarized i n Table 3 f o r the case where (m
f o r the case where
|m' - p,| = 10.
1
- p, j = 5 and i n Table k
These r e s u l t s appear q u i t e favorable
as shown i n the percentage of reduction achieved over the c l a s s i c a l
sample s i z e s .
Note t h a t the computed Bayesian sample s i z e does not
depend on the value of
|m' - p , | .
That i s , the Bayesian sample sizes
are i d e n t i c a l i n Tables 3 and k f o r a given value of d e l t a .
The con
fidence and accuracy o f the i n t e r v a l estimates produced, i . e . ,
the
number of times the t r u e mean of the sampling process i s contained
w i t h i n the i n t e r v a l and the w i d t h of the i n t e r v a l constructed, i s
comparable t o the r e s u l t s obtained using c l a s s i c a l methods f o r the
case where
|m' - p, | = 5 .
For the case where |m
f
- p, | = 10, the desired
confidence i s not achieved u n t i l the s i t u a t i o n i n v o l v i n g the two l a r g e s t
cample s i z e s .
The separation between the p o s t e r i o r and sample means
Table 3.
Data f o r the Bayesian Approximation Model Based
on One Hundred Runs f o r Each Value of D e l t a
2
a /a'
Specified
Width of
Confidence
Interval
Computed
Classical
Sample
Size
Number of
Times
UL <m<£JU
c
c
Average
Computed
Bayesian
Sample
Size
2
= 16, |m« - u-| = 5
Number of
Times
Average
Value of
UL ^n^UU
lm" -ml
b
b
Number of
Times
Percent Reduction
i n Sample Size
D <: q6S
n*.
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
18
22
27
34
46
64
99
174
387
96
96
95
96
98
93
97
96
97
6.7
8.9
12.6
18.7
32.4
52.8
87.7
163.I
376.4
97
97
98
98
96
93
9*
98
98
5.589
4.814
4.355
2.541
2.175
1.185
0.756
0.470
0.217
75
76
79
90
95
100
100
100
100
61.1
59.1
51.9
44.1
28.3
17.2
11.1
5.7
2.6
Table k.
Data f o r the Bayesian Approximation Model Based
on One Hundred Runs f o r Each Value o f D e l t a
2
2
c /o'
Specified
Width of
Confidence
Interval
Computed
Classical
Sample
Size
Number of
Times
UL <m<:UU
c
c
Average
Computed
Bayesian
Sample
= 16,
|m
f
- \i\
Number of
Times
Average
Value of
UL <m<:UU
m" - m|
b
b
=
10
Number
Times
D <:
Percent Reduction
i n Sample Size
q8S
n*.
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
18
22
27
3h
1+6
6h
99
174
387
96
96
96
96
98
9*
98
95
97
6.7
8.9
12.6
18.7
32.4
52.8
87-7
163.1
376.4
75
74
86
77
86
88
88
95
95
8.300
6.008
6-993
4.438
3.874
2.359
1.545
0.930
0.416
54
51
49
75
77
97
100
100
100
61.1
59.1
51.9
44.1
28.3
17.2
11.1
5.7
2.6
h3
decreases as the sample s i z e increases and the sampling information i s
given more weight i n the determination of the p o s t e r i o r d i s t r i b u t i o n .
For t h i s reason, the t e s t suggested f o r determining whether or not t o
use the p r i o r information does not work w e l l a t a l l .
where |m
T
- p, | = 5 and |m
T
For both the case
- p, | = 10, the t e s t r e j e c t s the p r i o r
infor
mation too o f t e n f o r small sample sizes and erroneously allows the use
of the p r i o r i n f o r m a t i o n i n l a r g e sample s i z e s .
I t appears t h a t a
b e t t e r decision r u l e as t o whether or not to r e j e c t the p r i o r
information
should consider the d i f f e r e n c e between the p r i o r mean ( r a t h e r than the
p o s t e r i o r mean) and the sample mean.
The accuracy of the approximation
procedure is q u i t e good; the o v e r a l l average reduction i n the sample
s i z e f o r a l l values o f d e l t a i s 12.0 samples, which equates t o a p p r o x i
mately 75 percent of the t r u e d i f f e r e n c e between the c l a s s i c a l and the
Bayesian sample s i z e s , which i s ]_6 samples f o r t h i s p a r t i c u l a r case.
The second t e s t of the procedure involved computing the Bayesian
sample size r e q u i r e d f o r each value of d e l t a and f o r various values of
|m' - [ i \ ranging from one standard d e v i a t i o n below the t r u e mean of the
sampling process t o one standard d e v i a t i o n above t h i s v a l u e .
i f i c values chosen f o r
I n Table 5 .
The spec
|m' - p, | and the r e s u l t s of the t e s t are shown
The r e s u l t s obtained when the value of
1
Im
- p, | i s w i t h i n
o n e - h a l f standard d e v i a t i o n on e i t h e r side of p, are q u i t e f a v o r a b l e ,
w i t h only t h r e e cases out o f the t o t a l of 63 t r i a l s where the Bayesian
i n t e r v a l estimate d i d not include the t r u e value of the mean of the
sampling process.
O v e r a l l , t h e r e were a t o t a l of 2h cases, out of the
99 t o t a l t r i a l s , where the Bayesian i n t e r v a l estimate d i d not include
Table 5.
Data f o r the Bayesian Approximation Model Based
on Various Values of |m' - p, |
2
a /cr'
Specified
Value of
Delta
Classical
Sample
Size
6
n*
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
18
22
27
3h
46
64
99
174
387
2
= 16
Computed Bayesian Sample Size f o r
the S p e c i f i e d Value of |m' - p,|
-20
7*
6*
15
23*
38*
51
85*
159
378
-15
5*
10*
7*
22
39*
48
87*
164*
379
-10
7
12
10
18*
35
52
88
159
376
-5
5
19
14
12
37
h9
89
163
375
-2
5
6
7
17
36
52
87
162
376
0
2
5
11
5
18
6
23 20
21 24
31 40
57 58
91 91
166 164
376 377
5
6
7
17
32
55
90
161
377
10
5
6
17*
9
36
54
93
166*
376
15
20
5*
13
15*
5*
6*
21
19*
36*
53*
9*
40
52
85
163
376*
Figures marked w i t h an a s t r i c ( * ) i n d i c a t e cases where the confidence i n t e r v a l based on the
Bayesian sample s i z e d i d not contain the t u r e value of the mean of the sampling process.
87
164*
378
45
the t r u e value of the mean of the sampling process.
The f i n a l t e s t conducted on the model was t o f i x the value
|m' - |j, | = 5 and to compute the Bayesian sample s i z e r e q u i r e d f o r each
2
value of d e l t a and f o r various r a t i o s of the v a r i a n c e s , cr /cr'
2
.
The
s p e c i f i c values chosen f o r the r a t i o of the variances and the r e s u l t s
of the t e s t are shown i n Table 6.
The r e s u l t s obtained when the r a t i o
of the sampling and the p r i o r variances was 4 or g r e a t e r are good, w i t h
only one case out of a t o t a l of 63 t r i a l s where the Bayesian i n t e r v a l
estimate d i d not include the t r u e value of the mean of the sampling
process.
O v e r a l l , t h e r e were a t o t a l of seven cases out of the 99
t r i a l s where the Bayesian i n t e r v a l estimate d i d not include the t r u e
value of the mean of the sampling process.
Table 6.
D a t a f o r t h e B a y e i s a n A p p r o x i m a t i o n Model Based
on V a r i o u s V a l u e s o f t h e R a t i o o f t h e V a r i a n c e s
|m
S p e c i f i e d Value
o f t h e R a t i o of
the Variances
cr /cr'
Classical
Size
- p. | = 5
Computed B a y e s i a n Sample S i z e
t h e S p e c i f i e d Value of D e l t a
1.0
32
16
8
7
6
5
4
3
2
1
f
0.9
0.8
5
5
14
5
14
17
11
16
17
18
6
10
16
21
6
21
19
16*
21
22
7
7
16
23
16
26
25
25*
27
27
18
22
27
0.7
13
22
29
29
30
33
34
31
33
• 33
0.6
0.5
for
0.4
0.3
0.2
22
39
41
43
44
44
45
45
45
46
33
48
58
58
60
61
64
64*
64
64
69*
87
9h
95
95
97
97
98
99
99
144
164
167
170
170
173
173
173*
173*
174*
368
379
382
382
384
385
385
385
385
387
46
64
99
174
387
Sample
34
F i g u r e s m a r k e d w i t h a n a s t r i c (*) i n d i c a t e c a s e s w h e r e t h e c o n f i d e n c e i n t e r v a l b a s e d on t h e
B a y e s i a n s a m p l e s i z e d i d n o t c o n t a i n t h e t r u e v a l u e of t h e mean o f t h e s a m p l i n g p r o c e s s .
CHAPTER IV
CONCLUSIONS AND RECOMMENDATIONS
Conclusions
The r e s u l t s
1.
of t h i s s t u d y i n d i c a t e t h e f o l l o w i n g
conclusions.
The s u g g e s t e d p r o c e d u r e t o a p p r o x i m a t e B a y e s i a n s a m p l e
and c o n s t r u c t
i n t e r v a l e s t i m a t e s f o r t h e mean o f t h e s a m p l i n g
s h o u l d b e u s e d f o r t h e n o r m a l s a m p l i n g p r o c e s s when a c c u r a t e
Information is a v a i l a b l e .
sizes
process
prior
T h a t i s , when t h e p r i o r mean i s w i t h i n o n e -
h a l f s t a n d a r d d e v i a t i o n o f t h e t r u e mean o f t h e s a m p l i n g p r o c e s s .
2.
I n t h e w o r s t c a s e , t h e p r o c e d u r e w i l l y i e l d t h e same s a m p l e
s i z e s as would c l a s s i c a l t e c h n i q u e s .
In t h i s c a s e , t h e i n t e r v a l
m a t e s s h o u l d be b a s e d on t h e c l a s s i c a l m e t h o d ,
p r i o r i n f o r m a t i o n has been
3.
since in essence,
the
rejected.
The a c c u r a c y a n d c o n f i d e n c e l e v e l s a s s o c i a t e d w i t h t h e
i n t e r v a l e s t i m a t e s b a s e d on t h e a p p r o x i m a t i o n p r o c e d u r e a r e
t o t h o s e o b t a i n e d by u s i n g c l a s s i c a l t e c h n i q u e s
is
esti
if the prior
comparable
information
accurate.
h.
The h e u r i s t i c r u l e s u g g e s t e d t o d e t e r m i n e w h e t h e r o r
not
t o u s e t h e p r i o r i n f o r m a t i o n d i d n o t work w e l l b e c a u s e t h e v a l u e
|m" = ml i s a f u n c t i o n of t h e s a m p l e s i z e a s w e l l a s b e i n g a
function
T
o f t h e v a l u e of t h e p r i o r mean, m .
5.
The r e s u l t s o b t a i n e d i n t h e d e m o n s t r a t i o n o f t h e
f o r t h e v a l u e s of d e l t a s e l e c t e d ,
procedure
indicate t h a t the procedure t o
approxi-
m a t e t h e B a y e s i a n s a m p l e s i z e and c o n s t r u c t
interval estimates is
v i a b l e c o n c e p t which has d i r e c t a p p l i c a b i l i t y and v a l u e i n
a
Operational
Testing.
Re c ommendat i ons
As i n m o s t c a s e s i n v o l v i n g r e s e a r c h o f a l i m i t e d s c o p e ,
more p r o b l e m s a r e u n e a r t h e d t h a n a r e r e s o l v e d i n t h i s s t u d y .
l i m i t e d r e s u l t s o b t a i n e d , h o w e v e r , show some m e r i t a n d
to Operational Testing.
The
applicability
As a m a t t e r o f f u t u r e r e s e a r c h i n t h e
c o v e r e d by t h i s s t u d y , t h e f o l l o w i n g recommendations a r e
1.
perhaps
Further efforts
area
suggested.
a r e r e q u i r e d t o improve t h e i t e r a t i v e
cedure u s e d t o approximate t h e Bayesian sample s i z e .
A refined
pro
pro
c e d u r e s h o u l d t a k e i n t o a c c o u n t t h e n e e d t o t r e a t l a r g e and s m a l l
sample s i z e s as s e p a r a t e p r o b l e m s .
approximation a t any s p e c i f i c
Perhaps t h e increment added t o
the
i t e r a t i o n s h o u l d be some f u n c t i o n o f
the
number o f i t e r a t i o n s a l r e a d y c o n d u c t e d .
C a r e m u s t be t a k e n ,
however,
t h a t any p r o c e d u r e developed f o r t h i s s i t u a t i o n be c o m p a t i b l e t o
O p e r a t i o n a l T e s t i n g e n v i r o n m e n t , w h e r e e a s e o f a p p l i c a t i o n and
c i t y are prime
2.
simili-
objectives.
The s a m p l e s t a n d a r d d e v i a t i o n , S , i n e q u a t i o n
only variable in the equation for a specific
c u l a r random v a r i a b l e
(2-13), is
sample s i z e .
This
t h e a p p r o x i m a t e B a y e s i a n s a m p l e s i z e would l e a d t o more
a p p r o x i m a t i o n s of t h e
the
parti
is related to the chi-square distribution.
haps f u r t h e r work w i t h t h i s p a r t i c u l a r e l e m e n t of t h e e x p r e s s i o n
3.
the
Per
for
accurate
equation.
A workable decision r u l e for determining whether or not
to
kg
use the p r i o r
information
is needed.
I t i s suggested t h a t the
s h i p b e t w e e n t h e p r i o r and s a m p l e m e a n s , i . e . ,
|m' - m | , w i l l
more v i a b l e r e s u l t s t h a n t h e t e c h n i q u e u s e d i n t h i s s t u d y .
whatever rule is developed,
i t must t r e a t t h e d i f f e r e n c e s
w i t h l a r g e and s m a l l sample s i z e s
4.
yield
Obviously,
associated
separately.
There a r e obvious l i m i t a t i o n s
Operational Testing.
relation
in applying t h i s procedure
A l t h o u g h t h e p r o c e d u r e h o l d s some p o t e n t i a l
r e d u c i n g c o s t s a s s o c i a t e d w i t h O p e r a t i o n a l T e s t i n g by r e d u c i n g
number o f r e p l i c a t i o n s r e q u i r e d of a s p e c i f i c t e s t s , a n y
s a m p l i n g scheme i s i n h e r e n t l y d i f f i c u l t
to
of
the
iterative
and c o s t l y t o a p p l y b e c a u s e of
t h e problems i n v o l v e d w i t h m u l t i p l e s c h e d u l i n g and s e t - u p c o s t s .
The
p r o c e d u r e seems b e t t e r s u i t e d t o t h o s e t e s t i n g s i t u a t i o n s w h e r e a
large
number o f s a m p l e s a r e r e q u i r e d a n d t h e c o s t o f s a m p l i n g i s
low.
relatively
F o r t h e s e r e a s o n s , a scheme t o i n c o r p o r a t e t h e c o n c e p t of
functions
i n t o t h i s procedure i s needed before
of a t r u e d e c i s i o n making p r o c e d u r e .
i t c a n assume t h e
loss
cloak
APPENDIX I
FORTRAN PROGRAM FOR THE APPROXIMATE
BAYESIAN PROCEDURE
51
RFOR, IS '-IAIN
CO'-'^i/ONE/ X ( 2 . 1 0 0 " ' ) , NCM
COV'.C'i/TWO/ XVEANI3), XVARO)
COMMON/THPEF/ XHAT(3). ?HAT(3>
C0M"~N/F I VP/ ALPHA, ML ( 3 ) , L" U 3 )
W ' C N / S E V E M / N C ( 5 ) . DELTA
CO'-OXN/E I0HT/ H 3 I 1 ! ) , NPRTM(lC). D I FF
CO'V-ON/NINE/ WIDTH (2 )
COM'-'ON/TEN/ L O 0 P ( 2 ) . K FY • DMAX
EXTERNAL UN IF
1
Pt-'AO IN PASIC PARAMETERS
S FORMAT ( )
READ(5»B) ALPHA
RFAO(0.8) NSU
REA^OfS) XVFAN(l), XVAR(l)
RCAD(5.P) XVEAN(2)» XvAR(2)
C****K
C»***»
ART UP UNIFORM GFMFRATOR TO RANDO" IZ F STARTW DOINIT
DO K' J = l , N5U
0= UNIF(A)
10 CONTIN'ir
DO 100 KK= 1» 30
REAOCi. fl. ENP = 999) DELTA
C*****
DETER VTNF THF MINI^U" CLASSICAL ? A WPL F «X T 7 T
CALL CLASS( M( 1 ) )
CALL RAiVDN ( 1 )
C***-»*
DETERMINE THE MINIMUM BAYFSI AN SAMPLF S I 7 F .
CALL f3AYFS( Nf 1 ) » N( 2 ) )
C«****
COVPUTF CONFIDENCE INTERVALS FOR THE DATA P R ^ r r c c r ?
CALL CONFTDC N( 3) . 3 )
CALL ORDEPU)
CALL CO**FID( N( 1 ) . 1 )
C**#**
PRINT OUTPUT
CALL OUTPUT
100 CONTINUE
999 CONTINUE
l\'RITF(6.7fM
7 0 FORMAT(1H1)
5 TOP
END
IF APPROPRIATE
52
-RFOR.IS RANDN
C*»***THIS SUBROUTINE GENERATES NORMALLY 01 STR I BUTFD PSEUDS-RANDOM
C
NUMBERS HAVING A SPECIFIED MFAN AND VARIANCE
C
C*****ARGUEMENT D E F I N I T I O N
C
X I S THE ARRAY OF RANDOM NUMBERS ( O U T P U T )
C
N I S THE NUMBER O F RANDOM NUMBERS D E S I R F D ( I N P I ' T )
C
XMEAN I S THE MEAN OF THE RANDOM NUMBERS ( I N P U T )
C
XVAR I S THE V A R I A N C E O F \ T H E RANDOM NUMBERS ( I N P U T )
C
C***#*THIS SUBROUTINE USES THE BOX AND MUELLER METHOD Fn<?
C
G^NFRAT TON OF NORMAL PSEUDO-RANDOM NU RFRS
W
SUBROUTINE RANDN(J)
COMMON/ONE/ X ( 2 » 1 0 0 0 ) , N(3)
COMMON/TWO/ XMEAN ( 3 ) , XVARO)
EXTERNAL UN I F
TPI=6.2831852
DO 100 1 = 1 , N ( J ) , 2
A= UNIF( 1 )
B= UNIF(2)
X ( 1 , I ) = XMFAN(2)+ SQRT(-2.0*XVAR(2)*ALOG(A))*COS(Tpy*B)
X ( 2 ,1 ) = X < 1 ,1 )
11= 1+ 1
X ( 1 , 1 I ) = XMEAM(2)+ SORT(-2.0*XVAR(2)*ALOr,( A) ) *M N < TPI *n >
X ( 2 ,1 I J = X ( 1 ,1 I )
100 CONTINUE
RETURN
END
- R F O R , i s UNIF
FUNCTION UNIF(A)
DATA I Y / 9 6 5 8 1 /
IY=IY*3 1 25
IF(IY)5,6,6
5 IY=IY+l+34359738367
6 YFL = I Y
UNIF= Y F L * 2 . 0 * * ( - 3 5 )
r
R TU^N
END
53
-RFOR.IS ORDER
C**»**
C
C
C
C*####
C
C
C
C
C
THIS SUBROUTINE SORTS A GIVEN SET OE DATA FROM TH^ LOWEST
VALUE TO THE HIGHEST, AND COMPUTES THE SAMPLE STATISTICS
(MEAN AND STANDARD DEVIATION) OF THE DATA PROCESS
ARGUEMENT DEFINITION
X= THE ARRAY OF DATA VALUES TO BE SORTED (INP"T/OUTPUT)
N= THE NUMBER OF DATA POINTS (INPUT)
XHAT = THE SAMPLE MEAN OF THE DATA PROCESS (OUTPUT)
SHAT = THE SAMPLE STANDARD DEVIATION OE THE DATA
PROCESS (OUTPUT)
SUBROUTINE ORDER(K )
COMMON/ONE/ X ( 2 » 1 0 0 0 ) , N O )
COMMON/THREE/ XHAT(3 ) , SHAT(3)
NM1= N ( K ) - 1
DO 200 1 = 1 , NM1
IP1=1+1
DO 100 J= IP1 , N(K)
IE( X(K»I) . L E . X ( K . J )
TFMP= X ( K • I )
X ( K , I ) = X(K,J)
X(K»J)= TEMP
8 100
CONTINUE
200 CONTINUE
C#*#**
) GO TO 100
COMPUTE THE SAMPLE STATISTICS FOR THE DATA PROCESS
SUM 1 = 0 . 0
SUM2=0.0
DO 300 1 = 1 , N(K )
SUM1= SUM 1+ X(K , I )
5UM2= SUM2+ X (K » I )**2
300 CONTINUE
YN= NIK)
XHAT(K)= SUM1/YN
RN= YN- 1.0
SUM22= SUM2- (SUM1**2)/YN
SHAT(K)= SORT(SUM22/RN)
8
RETURN
END
54
-RFOR »IS CLASS
C#*#*#
C
C
C
C*****
C
C
C
THIS SUBROUTINE CALCULATES THF MtNtMUM CLASSICAL CAWPLF SIZE
REQU t RED TO CONSTRUCT A CONFIDENCE iNTFRVAL OF GIvFN WIDTH
ABOUT THE MEAN OE A NORMAL SAMPLING POPULATION OF UNKNOWN
VARIANCE
ARGUEMANT DEFINITION
ALPHA = THE CONFIDENCE COEFFICIENT (INPUT)
DELTA= A FUNCTION OF THE INTERVAL WIDTH (INPUT)
NCLASS= THE COMPUTED SAMPLE SIZE (OUTPUT)
SUBROUTINE CLASS(NCLASS)
COMMON/FIVE/ ALPHA. UL I 3 ) , UU(3)
COMMON/SEVEN/ NC(5 ) . DELTA
COM ON /T EN / LOQp(2W KEY » DMAX
M
C#***#
C
C
C
COMPUTE THF FIRST APPROXIMATION OF THE CLASSICAL «AMPL*
S I Z E . N C ( 1 ) , BASFD ON THE STANDARD NORMAL DISTRIBUTION, WHICH
IS IDENTICAL TO THF T DISTRIBUTION WITH INFINITE nFGREFS
OF FREEDOM
ALPHA 1= ALPHA/2,0
S= TINOR (ALPHA 1, $15)
GO TO 18
v
15 WRITE(6,17)
17 FORMAT( / / , lOXi 68H ERROR MESSAGE—OVERFLOW ON T Nv Rc F NORMAL DIST
IRIBUTION--FORMAT 15
)
CALL EXIT
F
18 CONTINUE
REALC= (2.0*S/DELTA)**2
NC ( 1> = INT(REALC)
IF( NC(1) . L T . REALC ) NC(1)= NC(1)+ 1
C*****
C
C
C
COMPUTE THF SUCCEEDING APPROXIMATIONS OF THE CLASSICAL SAMPLE
S I Z E , N C ( J ) , BASED ON THE T DISTRIBUTION WITH DEGREES OF
FREEDOM EQUAL TO N C ( J - l ) - 1 .
STOP THE ITFPATIV oROCmuPF
WHEN N ( J ) IS EQUAL TO N ( J - l )
DO 30 J = 2 , 10
NDF = N C ( J - l ) - 1
T= STUD IN(ALPHA, NDF, $21)
GO TO 24
r
21 WRITE(6,23)
23 FORMAT?//, lOX, 74H ERROR MESSAGE—OVERFLOW ON STUDENTS T DI.STRIBu
1TI0N FUNCTION—FORMAT 21
)
CALL EXIT
24 CONTINUE
REALC= (2.0*T/DELTA)**2
NC(J)= I NT(REALC)
IF( NC(J) . L T . REALC ) NC(J)= NC(J)+ ]
IF ( NCU) .EO. N C ( J - l ) ) GO TO 35
8
30 CONTINUE
(.•*#•*
ASSIGN THE COMPUTED SAMPLE SIZE TO NCLASS
35 NCLASS- NCIJ)
LOOP(n = J
RETURN
END
56
-RFOR.IS
C
C
C
C
C
C*****
C
C
BAYES
THIS SUBROUTINE CALCULATES THE MINIMUM BAYESIAN SAMPLF S I Z E .
IF AP
APPROPRIATE. TO CONSTRUCT A CONFIDENCE INTERVAL OF GIVEN
WIDTH ABOUT THF MEAN OF A NORMAL SAMPLING POPULATION
WITH UNKNOWN VARIANCE
ARGUE
ARGUEMANT DEFINITION
K = THE MINIMUM CLASSICAL SAMPLE SIZE (INPUT)
NBAYES= THE COMPUTED SAMPLE SIZE (OUTPUT)
NB
SUBROUTINE BAYES ( K. NBAYES )
COMMON /ONE / X ( 2 . 1 0 0 U ) . N O )
COMMON/TWO/ XMEAN(3). XVAR(3)
COMMON/THREE/ XHAT(3)» SHAT(3)
COMMON/SEVEN/ N C ( 5 ) . DELTA
COMMON/EIGHT/ N B ( 1 0 ) . NPRIM(IO). DIFF
COMMON/TEN/ LOOP(2). KEY » DMAX
C#*»**
C
COMPUTE THE FIRST APPROXIMATION. N ( l ) . OF THE BAYESIAN
SAMPLE SIZE
REALB= FLOAT(K)/4.0
NB(1)= INK REALB )
IFt NB(1) . L T . REALB ) NB(1)= NB(1)+ 1
N 12)= NB( 1 )
<:•*•*•
C
C
TAKE N ( l ) SAMPLES AND COMPUTE THE SAMPLE STATISTICS FOR THE
DATA PROCESS AND THE POSTERIOR PARAMETERS BASED ON THESE
N ( l ) OBSERVATIONS
CALL ORDER(2)
APPN= SHAT(2)**2/XVAR(1)
NPRIM(1)= I NT( APPN )
IF( NPRIM(I) . L T . APPN ) NPRIM(1)= NPRIM(1)+ 1
XMEAN(3)= ( NPRIMt1)*XMEAN(1)+ NB(1)*XHAT(2) ) /
1 FLOAT( NPRIM(1 )+ NB ( 1 ) )
DIFF= ABS( XMEAN(3)- XHAT(2) )
DMAX= DIFF
KEY = 1
IF( N(2) . G E . K- NPRIM(l) ) GO TO 55
C*****
C
C
COMPUTE THE SUCCEEDING APPROXIMATIONS. N ( J ) . OF THE BAYESIAN
SAMPLE S I Z E . STOP THE ITERATIVE PROCEDURF WHEN NP(J) IS
GREATER THAN OR EQUAL TO K- NPRIM(J)
DO ICO J = 2 . 20
RINC= FL OAT ( K- N P R I M ( J - l ) )/<f.O
INC= I NT( RING )
IF{ INC . L T . RINC ) INC= INC + 1
NB(J)= N B ( J - 1 ) + INC
N(2)= NB{J)
C*****
C
C
TAKE EACH SUCCEEDING N ( J ) SAMPLES AND COMPUTE THF SAMPLE
STATISTICS FOR THE DATA PROCESS AND THE POSTERIOR PARAMETERS
BASED ON THESE N ( J ) OBSERVATIONS
CALL ORDER(2 )
57
APPN = 5HAT(2)**2/XVAR(1)
NPRIM(J)= INK APPN )
IF( NPRIM(J) . L T . APPN ) NPRIM(J)* NPRIM(J)+ 1
XMEAN(3)= ( NPRIMIJ) *XMEAN( 1) + NB(J)*XHAT(2 ) ) /
1 ELOAK NPR1M(J)+ NB(J) )
DIFE= ABS( XMEAN(3)- XHAT(2 ) )
IE( DIFF . L E . DMAX ) GO TO 35
DMAX= DIFF
K.EY= J
35 CONTINUE
IF( N(2) . G E . K- NPRIM(J) ) GO TO 45
100 CONTINUE
C»*##*
ASSIGN THE SAMPLE SIZE COMPUTED ABOVE TO N&AYES ANO DETERMINE
C
THE POSTERIOR (POOLED) SAMPLE SIZE
55 CONTINUE
NBAYE S= N(2)
IE( NBAYFS .GT. K ) NBAYES = K
N ( 3 ) = NBAYES+ NPRIM(1)
XHAK3)= XMEAN13)
SHAT(3)= SHAT12)
L00P(2)= 1
GO TO 999
45 CONTINUE
NBAYES= NB(J)
IF( NBAYES . G T . K ) NBAYES= K
N(3)= NBAYES+ NPRIM(J)
XHAT(3)= XMEAN(3)
SHAT(3)= SHAK2)
LOOP(2 )= J
999 RETURN
END
58
-RFOR•IS CONFID
C####*
C
C
C#*#*#
C
C
C
C
C
C
C
THIS SUBROUTINE CALCULATES A CONFIDENCE INTERVAL FOR THE MEAN
OF A NORMAL POPULATION WHEN THE VARIACE IS UNKNOWN
ARGUEMENT DEFINITION
N= THE NUMBER OF DATA POINTS IN THE SAMPLE (INPUT)
ALPHA= THE CONFIDENCE COEFFICIENT (INPUT)
XHAT= THE SAMPLE MEAN OF THE DATA PROCESS (INPUT)
SHAT = THE SAMPLE STANDARD DEVIATION OF THE DATA
PROCESS (INPUT)
UL= THE LOWER CONFIDENCE LIMIT FOR THE MFAN (OUTPUT)
UU= THE UPPER CONFICENCE LIMIT FOR THF MEAN (OUTPUT)
SUBROUTINE CONFIDtN. J )
COMMON/THREE/ XHA T(3 ) . SHAT(3)
COMMON/FIVE/ ALPHA. U L ( 3 ) . UU(3)
C*****
C***»#
C
C
C
C
COMPUTE THE DEGREES OF FREEDOM ASSOCIATED WITH THF ^A^PLE
NDF = N-1
DETERMINE THE VALUE OF THE 5TUDENT(S T DISTRIBUTION AT A
SIGNIFICANCE LEVEL = ALPHA
NOTE — THIS OPERATION USES A STAT*PACT FUNCTION CA[ LED STUDIN
TO CALCULATE THE INVERSE STUDENTS T VALUE GIVEN THE
CONFIDENCE COEFFICIENT ALPHA
T= STUD IN(ALPHA . NDF. $10)
GO TO 700
10 WRITE(6.15)
15 FORMAT ( / / . l OX. 7<.H ERROR MESSAGE—OVERFLOW ON STUDENT (S T DISTRIBU
1TION FUNCTION—FORMAT 700
,)
CALL EXIT
700 CONTINUE
YN = N
C*****
COMPUTF THF LOWER CONFIDENCE LIMIT
UL(J)= XHAT(J)- T*(SHAT(J)/SQRT(YN) )
C*****
COMPUTE THE UPPER CONFIDENCE LIMIT
UU(J)= XHAT(J)+ T*(SHAT(J)/SQRT(YN) )
RETURN
END
-RMAP
LIB S Y S T E M i * M A T H S T A T•
-XQT
59
-RFOR * IS OUTPUT
SUBROUTINE OUTPUT
COMMON/ONE/ X < 2 . 1 0 0 0 ) . N(3>
COMMON/TWO/ XMEAN( 3 ) . XVAR(3)
COMMON/THREE/ XHATO). SHAT(3)
COMMON/FIVE/ ALPHA. UL(3 ) . UU(3)
COMMON/SEVEN/ NC<5). DELTA
COMMCN/EIGHT/ N B ( l O ) . NPRIM(IO). DIFF
COMMON/NINE/ WIDTH(2)
COMMON/TEN/ LOOP(2)• KEY• DMAX
c
»####
PRINT HEADINGS FOR PRINTED OUTPUT
DO 100 J = l . 2
WRITE(6.15)
15 FORMAT{1 HI)
1F( J . E G . 2 ) GO TO 40
35 F O R M A T ^ / / ! 40X. 45H DATA VALUES USED IN THE CLASSICAL ANALYSIS
GO TO 50
I
40 WRITE(6.45)
45 FORMAT(/ / / . 40X. 45H DATA VALUES USED IN THE BAYESIAN ANALYSIS
)
C*»**»
PRINT BASIC PARAMETERS ASSOCIATED WITH EACH DATA PROCESS
50 CONTINUE
WRITE(6.52) N ( J )
52 FORMAT(///. 10X. 26H NUMBER OF OBSERVATIONS =
. I'M
WRITE(6.54) XMEAN(2)
54 FORMAT(1 OX . 19H LIKELIHOOD MEAN =
. F8.3)
8
IF( J . E Q . 2) WRITE(6.56) XMEAN(1)
56 FORMAT(1H+. T 8 2 . 14H PRIOR MEAN =
• F8.3)
WRITE(6t58> XVAR(2)
58 FORMAT(lux. 23H LIKELIHOOD VARIANCE =
. F8.3)
IF( J . E C 2) WRITE(6.60) XVAR ( 1 )
60 FORMAT(1H+, T 8 2 . 18H PRIOR VARIANCE =
. F8.3)
WRITE(6.62) DFLTA
62 FORMAT(1UX. 9H DELTA =
. F4.2)
C***#*
PRINT THE DATA VALUES GENFRATED BY RANON
WRITE(6»65) ( X U . I ) . 1 = 1 . N ( J ) )
65 FORMAT( / / / . 1 0 ( 3 X . F 8 . 3 ) )
C**»**
PRINT THE SAMPLE STATISTICS OF THE DATA PROCESS
IF( J .EG. 2 ) WRITE(6,72) XHAT(3)
72 FORMAT(///. lOX. 47H THE MCAN OF THE POSTERIOR DISTRIBUTION, M— =
1
, F10.5)
WRIT"I 6 . 7 5 ) XHA T(J ) » SNA T(J )
75 I- ORN* AT ( / / / » U'X, 45H THE SAMPLE MT AN Of- THF DATA PRnr>c,r,,
,
1 Fl . 5 . / / / . 1 )X, 5QH THr SAMPLf" STANDARD D V I A T1 ON OF TUP DATA p
2R0Cf SS » SHAT =
,
F10 • 5
)
y
r
H
A
r
=
6o
<.**#*#
C
PRINT THE ( 1 -ALPHA) CONFIDENCE INTERVAL ASSOCIATE* WITH FACH
PROCESS
WlDTH(J>= DFLTA*SHAT(J )
WKITEJ6.85) WIDTH(J)
65 FORMAT( / / » 10X. 52H THE DESIRED WIDTH OF THE CONFIDENCE INTERVAL I
IS =
. F6.2 )
UL(2)= UL(3)
UU(2)= UU(3)
WRITE(6.95) ALPHA. U L { J ) . UU(J)
95 FORMAT(//» 10X. 47H THE (1-ALPHA ) CONFIDENCF INTERVAL FOP THF MEAN
1 . / . 1 0 x . 38H WITH CONFIDENCE COEFFICIENT. ALPHA = , F 4 . 3 ,
2 8H, IS = ( . F 8 . 3 . 2H,
, F 8 . 3 . IH)
)
IF( J •EQ« 1) GO TO 97
WRITE(6.98)
98 FORMAT(//.
1 AND SAMPLE
WRITEC6.99)
99 FORMAT!//.
8
DIFF
] 0 x . 71H THE ABSOLUTE DIFFERENCE BETWEEN THE POSTERIOR
MEANS. DIFF =
. F6.3)
DMAX» KEY
1 0 x . 7H DMAX =
. F 6 . 3 . 10H AT LOOP =
.12)
97 CONTINUE
WRITE(6.96) LOOP(J)
96 FORMAT(//. 1 0 x . 9H LOOPS =
1UJ CONTINUE
RETURN
END
.12)
APPENDIX I I
FORTRAN PROGRAM FOR THE CHI-SQUARE
TEST OF NORMALITY
62
-RFOR* IN CHISQ
C*****
THIS SUBROUTINE TAKES A SET OF ORDERED DATA (ARRAMGFD F R ^ V
C
THE LOWEST TO THE HIGHEST VALUE) AND
C
(1) ESTABLISHES K EOUAL-PROBABIL I TV CELLS, '-'HERE K DEPENDS ON
C
THE SAMPLE SIZE, I . E . » K= 20 FOR N . G E . 1 0 0 , K= 1" FOP N . GE.
C
5u .AND. ) L T • 10!), AND K= 5 FOR M . L T . 50
C
(2) PERFORMS A CHI-SOHARE GOODNESS-OF - F I T TEST FoR NORMALITY
C
ON THE DATA SAMPLE AND DFTERMIMFS THE SIGNIFICANCE LEVEL
C
AT WHICH WE CAN ASSUME THAT THE DATA SAMPLF IS IN FACT
C
REPRESENTATIVE OF A NORMAL PROCESS
C*****
ARGUEMENT DEFINITION
C
X= THE ARRAY OF DATA VALUES TO BE TESTED (INPUT)
C
N = THF NUMbER OF DATA PIONTS (INPUT)
C
K = THE NUMBER OF CELLS INTO WHICH THE DATA IS nIvIDED (INPUT
C
XHAT = THE SAMPLE MEAN OF THF DATA PROCESS
C
SHAT = THE SAMPLE STANDARD DEVIATION OF THE DATA PROCESS
C
CHIS= THE CHI-SQUARE STATISTIC COMPUTED FROM
C
THE DATA (OUTPUT)
C
SIGL= THE SIGNIFICANCE LEVEL OF THE TEST (Ol»TP"T)
SUBROUTINE CHI SO
COMMON/ONE/ X(500 ) , N
COMMON/TWO/ XMEAN, XVAR
COMMON/FOUR/ K, KLESS1, CHIS, SIGL
COMMON/5 I X/ CBSTRD(19), CBN0RM(19), KOUNT(20)
DIMENSION ALPHA(19)
C*****
SET ALL CFLL COUNTERS TO ZERO
DO 5 1 = 1 , K
KO'INT ( I ) = 0
5 CONTINUE
v>#*#*
C
C
iJ
15
20
25
30
35
5J
7)
9.)
Hn
COMPUTE THE CELL-BREAK POINTS FOR THE GIV F N DATA
NOTE — THIS OPERATION USES A STAT-PACT FUNCTION C A | LED TI'lORM
TO COMPUTE THE VALUE OF THE INVERSE OF THE NORMAL ( ^ , 1 ) DISTR.
I F ( K - 1 J ) 1 0 , 2 n , 30
DO 15 1 = 1 , KLESS1
ALPHA(I)= n . 2 * I
CONTINUE
GO TO 5 0
DO 25 1 = 1 , KLESS1
ALPHA(I )= 0 . 1 * 1
CONTINUE
GO TO 50
DO 35 1 = 1 , KLESS1
ALPHA(I)= 0 . 0 5 * 1
CONTINUE
DO U . ) 1 = 1, KLES51
CbSTRD(I)= T1 NORM(ALPHA(I ) , $70)
CBMORM(I)= CBSTRD( I )*SQRT( XVAR )+ X'H Af
GO TO I J C
-'R I TE ( 6
)
r- ORMAT ( / / , 1 iiX,68H [ RROR f-T S AG*—0\'f Rf
1TRlBUT ION — FORMAT 10f
)
r
CON 1 INUL
•"
l
r
IO" ^M
'
'•'•',[. "> I ;
63
C#***#
COUNT THE NUMUER OF OBSERVATIONS FALLING IN EACH rFLL
DO 3<u 1 = 1 , N
DO 2 JO J = l» K L ESS 1
IF( X ( I ) .GT. CONORM(J) ) GO TO 190
<OUNT ( J } = <0')MT ( J ) +1
GO TO 300
19;
I F ( J . Q . KLFSS1) <OUNT(<)= KCl»NT(K)+l
20')
CONTINUE
3(3 J CONTINUE
C
COMPUTE THE CHI-SOUARE STATISTIC, CHIS
CHI l = u.C
RM= FLOAT( N ) / FLOAT ( K )
DO 5 J . J I = l» rC
CHU= CH I 1+ ( KOUNT ( I ) -RN)**2
50., CONTINUE
CHIS= CHI1/RN
C***»*
i; F T R '-11 N THE SIGNIFICANCE LEVEL OF THE T F ST
C
NOTE — THIS OPERATION HSFS A STAT-PACT FACTION C A| LFD CHT TO
C
DETERMINE THE CHI-SOUARE DISTRIBUTION GIVEN THF P^INT AND
C
THF f) E G R F E S OF FRFEDOM
NDF = K.-3
CUXD= CHI( CHIS, NDF, 5600 )
SIGL= 1 . 0 - CU^D
GO TO 69 0
P
r
b*><> '// R I T ' ( 6 , 6 1 0 ) CHIS
6 l J FORMAT(//, 1 OX , 74H ERROR MESSAGE—OVERFLOW ON CHl-SO"ARE DISTRICT
HON FUNCTION—FORMAT 600
,2AH CHI-SOUARE STATISTIC = , F 5 . 2 )
CONTINUE
69 ) RETURN
END
64
BIBLIOGRAPHY
1.
Anscombe, F . J . , " B a y e s i a n S t a t i s t i c s , " The A m e r i c a n
Vol. 15, 1961, pp. 21-24.
Statistician,
2.
Army R e g u l a t i o n No. 1 0 - 4 , "U. S. Army O p e r a t i o n a l T e s t a n d E v a l u
a t i o n A g e n c y , " H e a d q u a r t e r s , D e p a r t m e n t o f t h e Army, W a s h i n g t o n ,
D. C . , 1 9 7 4 .
3.
Army R e g u l a t i o n No. 7 1 - 3 ? " U s e r T e s t i n g , " H e a d q u a r t e r s ,
o f t h e Army, W a s h i n g t o n , D. C , 1 9 7 4 .
4.
Army R e g u l a t i o n No. 1 0 0 0 - 1 .
"Basic P o l i c i e s f o r Systems A c q u i s i t i o n
by t h e Department of t h e Army," H e a d q u a r t e r s , Department of t h e
Army, W a s h i n g t o n , D. C , 1 9 7 1 .
5.
A t z i n g e r , E . M . , a n d B r o o k s , W. J . , C o m p a r i s o n o f B a y e a i a n a n d
C l a s s i c a l Analysis for a Class of Decision Problems, Technical
R e p o r t No. 5 9 , A p r i l 1 9 7 2 , U. S . Army M a t e r i e l S y s t e m s A n a l y s i s
Agency, Aberdeen P r o v i n g Ground, Maryland.
6.
A t z i n g e r , E . M . , B r o o k s , W. J . , e t a l . , Compendium o n R i s k A n a l y s i s
T e c h n i q u e s , S p e c i a l P u b l i c a t i o n No. 4 , J u l y 1 9 7 2 , U. S. Army S y s t e m s
A n a l y s i s Agency, Aberdeen P r o v i n g Ground, Maryland.
7.
Box, G. E . P . , a n d M u l l e r , M. E . , "A Note o n t h e G e n e r a t i o n o f
Random Normal D e v i a t e s , " A n n a l s o f M a t h e m a t i c a l S t a t i s t i c s , V o l .
2 9 , 1958, p p . 21-2U.
8.
G i l b r e a t h , S. G., A Bayesian Procedure f o r t h e Design of S e q u e n t i a l
Sampling P l a n s , Unpublished D o c t o r a l D i s s e r t a t i o n , Georgia I n s t i t u t e
of Technology, A t l a n t a , Georgia, 1966.
9.
H i n e s , W. W., a n d Montgomery, D. C , P r o b a b i l i t y a n d S t a t i s t i c s i n
E n g i n e e r i n g a n d Management S c i e n c e , New Y o r k : The R o n a l d P r e s s
Company, 1 9 7 2 .
Department
10.
H o e l , P . G . , P o r t , S. C , a n d S t o n e , C. J . , I n t r o d u c t i o n t o
c a l Theory, Boston:
Houghton M i f f l i n Company, 1 9 7 1 .
11.
0TEA, "MICV F i r i n g P o r t Weapon O p e r a t i o n a l T e s t I (MICV FPW 0T I )
T e s t D e s i g n P l a n , " U. S. Army O p e r a t i o n a l T e s t a n d E v a l u a t i o n
Agency, F t . B e l v o i r , V i r g i n i a , September 1973-
12.
M a c e , A. E . , Sample S i z e D e t e r m i n a t i o n , New Y o r k :
P u b l i s h i n g C o r p . , 1964.
Reinhold
Statists
65
13.
R a i f f a , H . , and S c h l a i f e r , R . , A p p l i e d S t a t i s t i c a l D e c i s i o n T h e o r y ,
Boston:
The MIT P r e s s , 1 9 6 1 .
Ik.
UNIVAC, " L a r g e S c a l e S y s t e m s STAT-PACK and MATH-PACK P r o g r a m m e r ' s
R e f e r e n c e , " New York" S p e r r y Rand C o r p . , 1 s t R e v i s i o n , 1 9 7 0 .
15.
W h i t e , L. R . , B a y e s i a n R e l i a b i l i t y A s s e s s m e n t f o r S y s t e m s P r o g r a m
Decisions, Unpublished Masters D i s s e r t a t i o n , Air Force I n s t i t u t e
o f T e c h n o l o g y , W r i g h t - P a t t e r s o n AFB, O h i o , 1 9 7 1 .
16.
W i n k l e r , R. L . , I n t r o d u c t i o n t o B a y e s i a n I n f e r e n c e a n d D e c i s i o n ,
New York:
H o l t , R i n e h a r t and Winston, I n c . , 1972.