AN APPLICATION OF BAYESIAN ANALYSIS IN DETERMINING APPROPRIATE SAMPLE SIZES FOR USE IN U. S. ARMY OPERATIONAL TESTS

A THESIS
Presented to
The Faculty of the Division of Graduate Studies
By
Robert L. Cordova

In Partial Fulfillment of the Requirements for the Degree
Master of Science in Industrial Engineering

Georgia Institute of Technology
June, 1975

Approved:
Russell G. Heikes, Chairman
Gayden Thompson
Date approved by Chairman:

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
SUMMARY

Chapter
I. INTRODUCTION
   The General Problem
   The Specific Problem
   Background
   Review of the Literature
II. THE TEST METHODOLOGY
   The Assumptions of Normality
   The Prior Information
   The Basic Alternatives of Determining Sample Size
   The Classical Method
   A Bayesian Approximation
III. DEMONSTRATION OF THE METHODOLOGY
   Programming the Model
   Demonstrating the Model
IV. CONCLUSIONS AND RECOMMENDATIONS
   Conclusions
   Recommendations

Appendices
I. FORTRAN PROGRAM FOR THE APPROXIMATE BAYESIAN PROCEDURE
II. FORTRAN PROGRAM FOR THE CHI-SQUARE TEST OF NORMALITY

BIBLIOGRAPHY

LIST OF TABLES

Table
1. Minimum Sample Size - Classical Method
2. Test of Normal Random Generator
3. Data for the Bayesian Approximation Model Based on One Hundred Runs for Each Value of Delta, $\sigma^2/\sigma'^2 = 16$, $|m' - \mu| = 5$
4. Data for the Bayesian Approximation Model Based on One Hundred Runs for Each Value of Delta, $\sigma^2/\sigma'^2 = 16$, $|m' - \mu| = 10$
5. Data for the Bayesian Approximation Model Based on Various Values of $|m' - \mu|$, $\sigma^2/\sigma'^2 = 16$
6. Data for the Bayesian Approximation Model Based on Various Values of the Ratio of the Variances, $|m' - \mu| = 5$

LIST OF FIGURES

Figure
1. Bayes Theorem for Continuous Random Variables
2. Bayes Theorem for Normal Distributions
3. Separation of the Posterior and Sample Means

SUMMARY

The research is devoted to modifying the Bayesian techniques associated with determining the minimum sample size required to construct interval estimates of the true mean of an experimental or sampling process which is modeled by a normal distribution with unknown parameters. The procedure considers only the case where the prior information can be represented by a normal distribution with known mean and known variance. Rigorous Bayesian analysis of this situation would result in using a posterior distribution which has a normal-gamma density to construct interval estimates. In order to circumvent the obvious difficulties of working with this rather complex density, a procedure, which is felt to be more compatible with the U. S. Army Operational Testing environment, is offered for approximating the required Bayesian sample size.
If the variance of the sampling or experimental process, $\sigma^2$, were known, the minimum Bayesian sample size required to construct an interval estimate about the true mean, $\mu$, with confidence coefficient $1 - \alpha$ and width $k$ is:

$$n_b = \frac{4 Z_{\alpha/2}^2 \sigma^2}{k^2} - \frac{\sigma^2}{\sigma'^2}$$

where $Z_{\alpha/2}$ refers to the percentage points of the standard normal distribution such that $P(Z > Z_{\alpha/2}) = \alpha/2$, and $\sigma'^2$ is the variance of the prior distribution, that is, the prior variance of the true sampling mean, $\mu$.

Substituting the sample variance, $S^2$, for the true process variance and the term $t(\alpha/2, n^*_c - 1)$ for $Z_{\alpha/2}$ in the above expression, and defining the width of the confidence interval as a function of the sample standard deviation, i.e., $k = \delta S$, results in the following expression for the approximate Bayesian sample size:

$$n^*_b = \left[\frac{2\,t(\alpha/2,\, n^*_c - 1)}{\delta}\right]^2 - \frac{S^2}{\sigma'^2}$$

where $n^*_c$ is the classical sample size required to construct an interval estimate of width $k$ about the true mean, $\mu$, of the sampling process. The term $t(\alpha/2, n^*_c - 1)$ refers to the percentage points of the Student's t distribution with $n^*_c - 1$ degrees of freedom such that $P[t > t(\alpha/2, n^*_c - 1)] = \alpha/2$.

The expression for the approximate Bayesian sample size is solved iteratively, starting with a fraction of the classical sample size required for the interval estimate of the same specified confidence and accuracy, as the first approximation. The iterative procedure is programmed for a UNIVAC 1108 computer and applied to a hypothetical example to demonstrate the effectiveness of the methodology.

If accurate prior information is available, the results achieved by the procedure developed to approximate the Bayesian sample size and construct interval estimates of the unknown sampling mean, $\mu$, of specified confidence and accuracy are comparable to the results obtained using classical techniques. These results, however, are achieved using smaller sample sizes than required for the classical case. A procedure was suggested for examining the accuracy of the prior information and ascertaining whether or not Bayesian analysis was appropriate for a given sampling or experimental situation. However, the expected results of using this procedure were not obtained.

CHAPTER I

INTRODUCTION

The General Problem

This study is an investigation of the problem of determining the minimum sample size of an experiment, that is, the minimum number of replications of the experiment, required to estimate the mean of the experimental variable to within a predetermined accuracy. In general, the study is limited to a certain type of testing situation which has the following characteristics:

a. The test variable, whose mean is to be estimated, can be modeled by a normal distribution with unknown mean and unknown variance.
b. Information is available, prior to sampling or experimenting, from which a probability distribution of the mean of the test variable can be constructed.
c. This prior information can be represented by a normal distribution with known mean, $m'$, and known variance, $\sigma'^2$.

The Specific Problem

A U. S. Army Operational Test (OT) is an overall evaluation of a system which has been developed for general use within the U. S. Army structure (2). In this context, a "system" may be not only hardware, but also doctrinal concepts, and is usually a mixture of both. The test is conducted in an environment which duplicates or closely simulates those conditions under which the system will be employed if it is adopted for general use. It is this fact that generally differentiates OT from engineering, developmental, pre-production, and other tests which may be conducted on the same system and which are usually a part of the total development scheme of the system. In fact, OT is required to be an independent evaluation of the system. Thus, OT is a vital part of the process by which new equipment and concepts are incorporated into the U. S. Army structure.

An Operational Test is in essence a systematic plan for evaluating the total system being tested. It is composed of numerous subtests which address specific issues (unknown parameters) which are considered critical or paramount to the total evaluation of the system (3).
The specific critical issues to be evaluated by each subtest and the order of these tests govern the overall structure of the Operational Test.

Once the specific structure of the Operational Test has been established, a decision must be made as to the number of replications of each subtest to conduct in order to properly evaluate the critical issue in question. Time and budget constraints place emphasis on conducting the minimum number of replications possible, while the disastrous consequences that could result if a critical issue is not properly evaluated make it imperative that accuracy not be sacrificed for economy. Thus, the problem reduces to one of determining the minimum number of replications of each subtest to conduct in order to evaluate the critical issues in question to within a predetermined accuracy.

Current procedural and policy documents governing the conduct of Operational Tests (3, 4, 11) suggest that for the most part, sample sizes are determined using non-Bayesian or classical statistical methods. These methods do not consider prior information available concerning the variable being tested; therefore, inferences and decisions about the variable are based entirely on the experimental or sampling results. Bayesian techniques, on the other hand, attempt to use both the prior information and the experimental results in making inferences and decisions about the variable. Thus, this investigation is essentially a search for a practical procedure for applying Bayesian techniques to Operational Testing.
The principal Operations Research tools used in this study are statistical inference and estimation techniques to develop the methodology, and computer simulation techniques to demonstrate the procedures developed.

Operational Testing is an expensive undertaking which must operate in an environment constrained by budget and time considerations. The author believes that a methodology which effectively reduces the number of replications required to evaluate the critical issues addressed by each subtest, and also maintains the accuracy and confidence desired of the test, is a worthwhile pursuit directly applicable to the Operational Testing environment.

Background

During the last decade, there has been an increasing emphasis and drive within the military community to develop and formalize a methodology to adequately identify and evaluate the risks associated with the development and procurement of major weapons systems. The underlying premise which initiated this action was that unanticipated cost and time over-runs and performance shortcomings, which had become increasingly prevalent, were the result of inadequate assessment of the risks involved with the materiel acquisition process. The methodology which grew out of this effort is known as decision risk analysis.

In a report prepared for the Army Materiel Systems Analysis Agency (AMSAA), Atzinger, Brooks, et al., (6) present a brief history and description of the major concepts of the decision risk analysis process.
The authors define risk analysis as follows: "Decision risk analysis is a discipline of systems analysis, which in a structured manner, provides a meaningful measure of the risks associated with various alternatives." The purpose of the report is to structure this decision risk analysis process so that the trade-offs inherent in the alternatives are visibly and meaningfully displayed. It cites the following four major areas as the underlying concepts of decision risk analysis:

a. Subjective Probability
b. Monte Carlo Methods
c. Network Analysis
d. Bayesian Statistics

Bayesian statistics and Bayes Theorem have attracted renewed interest in many fields of applied and theoretical statistics in recent years. This theorem is essentially a mechanism for combining new information with previously available information so that decisions or inferences can be based on all the information available. Over the years, a controversy has developed between the Bayesian and the more orthodox classical statistical concepts. Anscombe (1) provides a brief but concise history of the development of both philosophies.

     During the last few years there has been a revival of interest among statistical theorists in a mode of argument going back to the Reverend Thomas Bayes (1702-61), Presbyterian minister at Tunbridge Wells in England, who wrote an "Essay Towards Solving a Problem in the Doctrine of Chances,"¹ which was published in 1763 after his death.
     Bayes' work was incorporated in a great development of probability theory by Laplace and many others, which had general currency right into the early years of the century. Since then there has been an enormous development of theoretical statistics, by R. A. Fisher, J. Neyman, E. S. Pearson, A. Wald and many others, in which the methods and concepts of inference used by Bayes and Laplace have been rejected.

     The orthodox statistician, during the last twenty-five years or so, has sought to handle inference problems (problems of deciding what the figures mean and what ought to be done about them) with the utmost objectivity. He explains his favorite concepts, significance level, confidence coefficient, unbiased estimates, etc., in terms of what he calls probability, but his notion of probability bears little resemblance to what the man in the street means (rightly) by probability. He is not concerned with probable truth or plausibility, but he defined probability in terms of frequency of occurrence in repeated trials, as in a game of chance. He views his inference problems as matters of routine, and tries to devise procedures that will work well in the long run. Elements of personal judgment are as far as possible to be excluded from statistical calculations. Admittedly, a statistician has to be able to exercise judgment, but he should be discreet about it and at all costs keep it out of the theory.
     In fact, orthodox statisticians show a great diversity in their practice, and in the explanations they give for their practice; and so the above remarks, and some of the following ones, are no better than crude generalizations. As such, they are, I believe, defensible. (Perhaps it should be explicitly said that Fisher, who contributed so much to the development of the orthodox school, nevertheless holds an unorthodox position not far removed from the Bayesian; and that some other orthodox statisticians, notably Wald, have made much use of formal Bayesian methods, to which no probabilistic significance is attached.)

     The revived interest in Bayesian inference starts with another posthumous essay on "Truth and Probability," by F. P. Ramsey² (1903-30), who conceived of a theory of consistent behavior by a person faced with uncertainty. Extensive developments were made by B. de Finetti and (from a rather different point of view) by H. Jeffreys. For mathematical statisticians the most thorough study of such a theory is that of L. J. Savage³,⁴. Schlaifer⁵ has persuasively illustrated the new approach by reference to a variety of business and industrial problems. Anyone curious to obtain some insight into the Bayesian method, without mathematical hardship, cannot do better than browse in Schlaifer's book.
     The Bayesian statistician attempts to show how the evidence of observations should modify previously held beliefs in the formation of rational opinions, and how on the basis of such opinions and of value judgments a rational choice can be made between alternative available actions. For him probability really means probability. He is concerned with judgments in the face of uncertainty, and he tries to make the process of judgment as explicit and orderly as possible.

     1. Bayes, T., Essay Towards Solving a Problem in the Doctrine of Chances, reprinted with bibliographical note by G. A. Barnard, Biometrika, 45 (1958), 293-315.
     2. Ramsey, F. P., The Foundations of Mathematics, London: Routledge and Kegan Paul, 1931.
     3. Savage, L. J., The Foundations of Statistics, New York: John Wiley, 1954.
     4. Savage, L. J., Subjective Probability and Statistical Practice, to be published in a Methuen Monograph.
     5. Schlaifer, R., Probability and Statistics for Business Decisions: An Introduction to Managerial Economics Under Uncertainty, New York: McGraw-Hill, 1959.

Atzinger, Brooks, et al., (6) obviously consider Bayesian statistical procedures to have great potential in the decision risk analysis process; they state:

     Bayesian statistics enjoys a unique position in risk analysis. There frequently exist situations where the analyst has both data and expert judgment to draw upon in constructing the probability distribution of interest in the consolidation activity. Bayesian statistics provides the analyst with a tool for synthesizing all of this information into one probability distribution which can then be used to directly estimate risks.

Review of the Literature

The statistical literature dealing with sample size determination is quite extensive, particularly in the area of classical techniques. Mace (12) provides an excellent and thorough coverage of classical procedures for determining the optimum sample size of a research experiment. This publication is applications oriented and provides procedures, formulas, and tables for determining economical sample sizes for some forty different types of research objectives. Unfortunately, the author considers only one rather limited application of Bayesian techniques to sample size determination. The limitation in this particular example, that the variance of the sampling process must be known, seems to occur quite frequently in the literature of Bayesian techniques for determining minimum sample sizes.

There has been extensive research in the application of Bayesian techniques to reliability engineering and quality control. White (15) presents a promising methodology for periodic reliability assessment using Bayesian techniques to combine analytical predictions with limited test results to obtain greater precision in the reliability estimate. The main limitation of this paper is that it considers only the gamma distribution in the analysis.
Gilbreath (8) has devised sampling procedures for use in sequential sampling models which have direct application in quality control and in economic lot size determination. These techniques, however, are more applicable to hypothesis testing than to the estimation problem.

Atzinger and Brooks (5) provide an excellent comparison of Bayesian and classical decision making under uncertainty for a class of problems where the decision variable is the Bernoulli success probability, p. If the outcome of any particular test or experiment is viewed as a success or failure, the resulting data classification is characteristic of a Bernoulli process. The authors persuasively argue that historically, one of the major objectives in test and evaluation processes has been to estimate this unknown Bernoulli success parameter. Unfortunately, such an analysis does not address the actual parameters of the sampling or experimental process itself.

Winkler (16) provides a rather detailed and complete development and treatment of Bayesian applications to inference and decision theory at the introductory level. Although the concepts developed in this publication are very thoroughly covered, the scope of the material is rather limited. That is, only two specific sampling processes are analyzed in detail: the sampling process modeled by the Bernoulli distribution, and the sampling process represented by the normal distribution with known variance.
Raiffa and Schlaifer (13) provide an extensive mathematical development of Bayesian techniques applied to statistical decision theory. However, once again, extensive analysis of the normal distribution is generally restricted to the case where the variance of the sampling population is known.

Thus, Bayesian applications to the problem of sample-size determination deal only with very specialized situations in the current literature. There appears to be no substantial research into the examination of the general problem. On the other hand, classical statistical techniques commonly apply iterative type algorithms to the general problem of sample size determination. The author believes that these techniques can be validly extended to Bayesian analysis and produce equally valid results. The aim of this investigation, then, is to extend the application of these well known techniques to the general sampling situation using Bayesian analysis.

CHAPTER II

THE TEST METHODOLOGY

The Assumptions of Normality

The normality assumptions stated in the introduction are crucial, albeit restrictive, to this investigation. The assumption that the prior distribution, which represents the distribution of the mean of a random variable, is normally distributed has solid support in the Central Limit Theorem. Hines and Montgomery (9) state the essence of this important theorem as follows:

     If $X_1, X_2, \ldots, X_n$ is a sequence of n independent random variables with $E(X_i) = \mu_i$ and $V(X_i) = \sigma_i^2$ (both finite) and $Y = X_1 + X_2 + \cdots + X_n$, then under some general conditions

     $$Z_n = \frac{Y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$$

     has an approximate N(0, 1) distribution as n approaches infinity.

The "general conditions" mentioned in the theorem are informally summarized as follows: The terms $X_i$, taken individually, contribute a negligible amount to the variance of the sum, and it is not likely that a single term makes a large contribution to the sum.

The principal implication of this theorem, then, is that in general the sum of n independent random variables is approximately normally distributed for sufficiently large n, regardless of the distribution of the n individual random variables. Unfortunately, the assumption that the random variable to be tested is normally distributed is much more restrictive. However, in many cases, real-world situations can be satisfactorily approximated by a normal process. Also, statistical inference and estimation procedures, particularly those concerning the mean of random variables, are generally robust (insensitive) to the normality assumption (12).

The Prior Information

At first glance, the requirement that Operational Testing be independent of other testing conducted on the same system may seem an insurmountable obstacle in attempting to obtain adequate prior information. This, however, is usually not the case; other sources of prior information do exist. For example, most new systems undergoing testing have been specifically designed to replace older or outmoded systems which are currently a part of the U. S. Army structure.
These older systems represent a vast source of historical data from which prior distributions for nearly any critical issue can be developed. In those rare cases where no historical data exist from which to construct a prior distribution for a specific critical issue, the Delphi technique or other proven methods of developing subjective assessments of uncertainties can be used to develop the prior distribution (6).

In any event, to the Bayesian statistician, the prior information represents the best available estimate about an uncertain quantity, regardless of its source. This fact even suggests that it is reasonable and logical to modify the prior distribution developed from historical data to reflect the improved design characteristics of the new system. Suppose, for example, that one of the critical issues being evaluated during OT of a new weapons system is the accuracy of the weapon at a specified range. The distribution of the mean-error of similar weapons currently in use can be determined from historical data. If the new system is expected to be significantly more accurate because of new design characteristics, the mean of the prior distribution developed from the historical data can be adjusted to reflect the expected increase in the performance of the new system.
In discussing techniques for the assessment of prior distributions and the use of diffuse prior distributions to represent the situation where no prior information is available, Winkler (16) states:

     It should be stressed that in general, there is no such thing as a "totally informationless" situation and the use of particular distributions to represent diffuse prior states of information is a convenient approximation that is applicable only when the prior information is "overwhelmed" by the sample information. In most real-world situations, non-negligible prior information (non-negligible relative to the sample information) is available, and the concept of a diffuse prior distribution is not applicable.

The Basic Alternatives of Determining Sample Size

This study considers only two basic approaches to determining the appropriate sample size in an experimental process. One approach is to simply disregard any prior knowledge or information available about the variable of interest, and use classical statistical techniques to solve the problem. The other approach is to combine the prior information with the results of a limited number of replications of the experiment, if possible, and then use these results to solve the problem.

The Classical Method

Classical estimation procedures and techniques are well documented in the literature (9, 10, 12).
This method uses only the results of the sampling or experimental process in the estimation procedures and ignores all prior information. Starting from the basic assumption that the sampling process is normally distributed with unknown mean, $\mu$, and unknown variance, $\sigma^2$, the random variable representing the outcome of the sampling process can be represented by:

$$X \sim N(\mu, \sigma^2), \quad \text{with } \mu, \sigma^2 \text{ unknown.}$$

Let $(X_1, X_2, \ldots, X_n)$ represent the results of n replications of the experiment. The sample statistics based on the specific n values obtained from the sampling process can be expressed as:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \text{the sample mean}$$

and

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{the sample variance.}$$

The appropriate expression for a $100(1 - \alpha)$ percent confidence interval about the unknown mean, $\mu$, for a process which is normally distributed and for which the variance is unknown is constructed using the Student's t distribution, i.e.,

$$P\left(\bar{X} - t(\alpha/2, n-1)\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t(\alpha/2, n-1)\frac{S}{\sqrt{n}}\right) = 1 - \alpha, \tag{2-1}$$

where the expression $t(\alpha/2, n-1)$ refers to the percentage points of the Student's t distribution with n-1 degrees of freedom such that $P(t > t(\alpha/2, n-1)) = \alpha/2$.

Recall from the introduction that the classical interpretation of probability differs considerably from the Bayesian interpretation. Thus, the interpretation of equation (2-1) is based on long-run considerations. That is, the classical statistician would say that if a confidence interval based on a sample size of n is constructed each time, then in the long run, $100(1 - \alpha)$ percent of such intervals would contain the true mean of the normally distributed sampling process.
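The long-run reading of equation (2-1) can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the process mean (50), standard deviation (4), sample size (10), number of trials, and the helper names `t_interval` and `coverage` are hypothetical choices made for illustration, and the value 2.262 is the tabulated percentage point t(0.025, 9).

```python
import random
import statistics


def t_interval(sample, t_value):
    """Interval of equation (2-1): sample mean -/+ t * S / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    # statistics.stdev uses the n - 1 divisor, matching the sample variance S^2.
    half_width = t_value * statistics.stdev(sample) / n ** 0.5
    return mean - half_width, mean + half_width


def coverage(true_mean, true_sd, n, t_value, trials, seed=0):
    """Fraction of repeated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = t_interval(sample, t_value)
        hits += lower <= true_mean <= upper
    return hits / trials


# Hypothetical process: mean 50, standard deviation 4, n = 10 replications.
T_0025_DF9 = 2.262  # tabulated t(0.025, 9)
rate = coverage(50.0, 4.0, 10, T_0025_DF9, trials=5000)
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

With alpha = 0.05 the printed fraction should fall close to 0.95; any single interval either contains the true mean or does not, which is the distinction the classical interpretation draws.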
The value of $\alpha$, which is preselected at some low value, can then be thought of as protection against failure of the interval to include the true value of the mean of the sampling process. The value $\alpha = 0.05$ is often selected for statistical inference and estimation problems because of traditional usage.

The second type of error that can occur in interval estimation problems is that the interval constructed from a set of specific sample results may be too wide, even though the interval does include the true value of the mean of the sampling process. This, then, is a problem of the accuracy associated with the confidence interval. Protection against this type of error is accomplished by controlling the width of the confidence interval constructed. The width of each specific confidence interval is dependent on the sample size and the value of $\alpha$ specified. The terms

$$UL = \bar{X} - t(\alpha/2, n-1)\frac{S}{\sqrt{n}}$$

and

$$UU = \bar{X} + t(\alpha/2, n-1)\frac{S}{\sqrt{n}},$$

which are real-valued functions of the sample results, are the lower and upper limits, respectively, of the interval estimate.

The Student's t distribution is very similar to the standard normal distribution, and for degrees of freedom $\nu = n-1 > 20$, the two distributions are virtually indistinguishable. In fact, the Student's t distribution is identical to the standard normal distribution for degrees of freedom $\nu = \infty$ (10). This fact allows accurate approximations in computing the minimum sample size by approximating the value of $t(\alpha/2, n-1)$ by $t(\alpha/2, \infty) = Z_{\alpha/2}$ for moderate sample sizes.
The expression $Z_{\alpha/2}$ refers to the percentage points of the standard normal distribution such that $P(Z > Z_{\alpha/2}) = \alpha/2$.

For the moment, let the preselected width of the confidence interval be simply equal to k. Then from equation (2-1), the half-interval width can be expressed as:

$$\frac{k}{2} = t(\alpha/2, n-1)\frac{S}{\sqrt{n}}.$$

Solving this equation for n results in the following expression for the minimum sample size required for a confidence interval width equal to k:

$$n^*_c = \left[\frac{2\,t(\alpha/2, n^*_c-1)\,S}{k}\right]^2. \tag{2-2}$$

It is more convenient to express the width of the confidence interval in terms of the sample standard deviation in order to simplify equation (2-2). Thus, if $k = \delta S$ is substituted into the equation, the minimum sample size required can then be expressed as:

$$n^*_c = \left[\frac{2\,t(\alpha/2, n^*_c-1)}{\delta}\right]^2. \tag{2-3}$$

Equation (2-3) cannot be solved explicitly for $n^*_c$, since the value of $t(\alpha/2, n^*_c-1)$ is a function of the sample size $n^*_c$. But since the value of $t(\alpha/2, n^*_c-1)$ is approximately equal to $t(\alpha/2, \infty)$, which is equal to $Z_{\alpha/2}$, for moderate sample sizes, a good first approximation for the solution of equation (2-3) is obtained by substituting the value of $Z_{\alpha/2}$ for the value $t(\alpha/2, n^*_c-1)$. This first approximation is known to be too small, although for large sample sizes it is quite close to the actual value of $n^*_c$. Using this first approximation, call it $n_0$, to evaluate $t(\alpha/2, n_0-1)$ and to solve equation (2-3) again yields a better second approximation for the value of $n^*_c$. This iterative procedure can be used to approximate the value of $n^*_c$ to any desired accuracy; however, there is usually no significant improvement in the approximation after the second or third iteration.
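The iteration on equation (2-3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis program: the function names (`t_pdf`, `t_upper_tail`, `t_point`, `classical_sample_size`) are hypothetical, the Student's t percentage point is obtained by numerically integrating the density rather than from a library routine, and rounding up to an integer at every pass is an assumed convention.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=400):
    """P(t > x) for x >= 0, using Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_point(tail, df):
    """Percentage point t(tail, df), found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def classical_sample_size(delta, alpha=0.05, max_iter=20):
    """Iterate equation (2-3), starting from the Z approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = max(2, math.ceil((2 * z / delta) ** 2))    # first approximation
    for _ in range(max_iter):
        t = t_point(alpha / 2, n - 1)
        n_next = max(2, math.ceil((2 * t / delta) ** 2))
        if n_next == n:                            # no further change
            break
        n = n_next
    return n


n_for_delta_one = classical_sample_size(1.0)
n_for_delta_half = classical_sample_size(0.5)
```

For the interval widths considered here the loop settles within a few passes, in line with the remark that little is gained after the second or third iteration.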
Table 1 shows the values of $n^*_c$ obtained for a 95 percent ($\alpha = 0.05$) confidence interval for various values of $\delta$ using this iterative procedure. Because of the premium placed on accurate estimates in Operational Testing, values of $\delta > 1.0$ were not considered. The values shown in the table under the heading P(K) are the approximate probabilities of a single observation from the sampling process falling between the lower and upper limits of the confidence interval, i.e., $P(K) = P(UL \le x \le UU)$. This value gives a probabilistic measure of the accuracy (width) of the confidence interval. The values of $n^*_c$ in the table have been rounded up to the next highest integer.

As illustrated in Table 1, equation (2-3) points out that in order to decrease the width of a confidence interval by one-half, the sample size must be increased approximately by a factor of four.

Table 1. Minimum Sample Size - Classical Method

δ      P(K)     n*_c
1.0    0.383    18
0.9    0.347    22
0.8    0.311    27
0.7    0.274    34
0.6    0.236    46
0.5    0.197    64
0.4    0.159    99
0.3    0.119    174
0.2    0.080    387
0.1    0.040    1537

A Bayesian Approximation

Bayes Theorem for Continuous Random Variables. The essence of Bayes Theorem for continuous random variables is depicted in Figure 1 shown below. The densities $f(\theta)$ and $f(\theta|y)$ represent the prior distribution and the posterior distribution respectively, and $f(y|\theta)$ represents the likelihood or sampling function.
It is important to keep in mind always that it is the prior distribution, or the statistician's prior state of knowledge, that is modified by the sampling results and not the reverse.

[Figure 1. Bayes Theorem for Continuous Random Variables: the prior density $f(\theta)$ and the likelihood $f(y|\theta)$ combine to give the posterior density $f(\theta|y)$.]

The prior and posterior distributions must be proper density functions. That is, they must possess the following mathematical properties applicable to the density function of any continuous random variable, x, which has range space or domain, $R_x$:

(i) $f(x) \ge 0$ for all $x \in R_x$
(ii) $\int_{R_x} f(x)\,dx = 1$

The likelihood or sampling function, $f(y|\theta)$, represents the probability of obtaining a given value, y, for the range of possible values of $\theta$. The likelihood function is not a proper density function because the events $f(y|\theta)$ are not mutually exclusive over the range of $\theta$.

As suggested in Figure 1, Bayes Theorem is essentially a process of combining the prior distribution with the sample information to yield the posterior distribution. The resultant posterior density has the following form:

$$f(\theta|y) = \frac{f(\theta)\,f(y|\theta)}{\int f(\theta)\,f(y|\theta)\,d\theta} \tag{2-4}$$

This result can be expressed in words as:

posterior density = [normalizing constant] x [prior density] x [likelihood function]

where the normalizing constant, $1/\int f(\theta)\,f(y|\theta)\,d\theta$, is needed to make the posterior distribution a proper density function.
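The role of the normalizing constant can be seen by evaluating the posterior density numerically on a grid. The Python sketch below is illustrative only: the normal prior, the single observation y = 13, the grid limits, and the helper names (`normal_pdf`, `trapezoid`, `grid_posterior`) are all hypothetical choices made for the example.

```python
import math


def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced ordinates."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def grid_posterior(prior, likelihood, grid):
    """Posterior ordinates on an evenly spaced grid of theta values."""
    step = grid[1] - grid[0]
    unnormalized = [prior(t) * likelihood(t) for t in grid]
    constant = trapezoid(unnormalized, step)   # integral of prior x likelihood
    return [u / constant for u in unnormalized]


# Hypothetical case: prior theta ~ N(10, 4); one observation y = 13
# drawn from N(theta, 1).
STEP = 0.01
grid = [i * STEP for i in range(2501)]                 # theta from 0 to 25
prior = lambda t: normal_pdf(t, 10.0, 4.0)
likelihood = lambda t: normal_pdf(13.0, t, 1.0)        # f(y | theta) at y = 13
posterior = grid_posterior(prior, likelihood, grid)

area = trapezoid(posterior, STEP)
post_mean = trapezoid([t * p for t, p in zip(grid, posterior)], STEP)
print(area, post_mean)
```

Dividing by the integral is what forces the area under the posterior ordinates to one; the unnormalized product of prior and likelihood has the right shape but the wrong scale.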
Before the advent of the high speed computer, which greatly eased the computational burden involved with numerical integration techniques, application of equation (2-4) to revise density functions in the light of sample information often proved extremely difficult because of the integration required to compute the normalizing constant. For this reason, Bayesian statisticians developed the concept of "conjugate" distributions, which are families of distributions that ease the computational burden when they are used as prior distributions (16). Of course the resultant form of the posterior distribution depends on the likelihood function as well as the prior distribution. Thus, conjugate prior distributions are selected on the basis of the statistical properties of the model chosen to represent the sampling process. When the prior distribution is conjugate to the likelihood or sampling function, the resultant posterior distribution is also a member of the same conjugate family of prior distributions.

Bayes Theorem for Normal Distributions. If it is possible to model the population or process being sampled by a normal distribution, the proper choice for a family of conjugate prior distributions depends on the statistician's knowledge of the parameters of the normal data generating process used.
Raiffa and Schlaifer (13) summarize the effects of the statistician's knowledge of the two parameters of normal distributions on the proper choice of conjugate prior distributions as follows:

Case (i) $\mu$ known, $\sigma^2$ unknown: The appropriate family of conjugate distributions has a gamma-2 density.
Case (ii) $\sigma^2$ known, $\mu$ unknown: The appropriate family of conjugate distributions has a normal density.
Case (iii) both $\mu$ and $\sigma^2$ unknown: The appropriate family of conjugate distributions has a normal-gamma density.

An Approximation Procedure. Since it was assumed that within the context of this study the model representing the sampling process in Operational Testing was normally distributed with unknown mean, $\mu$, and unknown variance, $\sigma^2$, the appropriate family of conjugate distributions to use in this case has a normal-gamma density. In order to overcome the obvious difficulties associated with computing interval estimates with the normal-gamma density, a procedure is suggested here to modify the Bayesian analysis of this sampling process so that the family of conjugate prior distributions has a normal density function, as is the case when the variance of the population or sampling process is known.

Assume for the moment that the variance of the sampling process is known.
Then the conjugate prior distribution has a normal density function of the form:

$$f(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma'}\,e^{-(\mu - m')^2/2\sigma'^2}$$

where the prime (') is used to signify a parameter or constant which is associated with the prior distribution. Thus, $\sigma'^2$ is the variance of the prior distribution or, the prior variance of the unknown parameter, $\mu$; and $m'$ is the mean of the prior distribution of this parameter.

If n replications of the experiment were now conducted and a sample mean,

$$m = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

and a sample variance,

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - m)^2,$$

were observed, the resultant posterior distribution would also have a normal density function of the form:

$$f(\mu|y) = \frac{1}{\sqrt{2\pi}\,\sigma''}\,e^{-(\mu - m'')^2/2\sigma''^2}$$

where y represents the sample results, and the double prime (") is used to indicate a parameter or constant which is associated with the posterior distribution. Thus, $\sigma''^2$ is the posterior variance of $\mu$, and $m''$ is the mean of the posterior distribution of $\mu$. These posterior parameters can be computed from the following formulas:

$$\frac{1}{\sigma''^2} = \frac{1}{\sigma'^2} + \frac{n}{\sigma^2} \tag{2-5}$$

and

$$m'' = \frac{(1/\sigma'^2)\,m' + (n/\sigma^2)\,m}{(1/\sigma'^2) + (n/\sigma^2)}. \tag{2-6}$$

Equations (2-5) and (2-6) indicate that the reciprocal of the posterior variance is equal to the sum of the reciprocal of the prior variance, $\sigma'^2$, and the reciprocal of the variance of the sample mean, $\sigma^2/n$. The posterior mean is a weighted average of the prior mean, $m'$, and the sample mean, $m$, the weights being the reciprocals of the respective variances. As depicted in Figure 2
, an important feature of the posterior distribution is that the posterior mean, $m''$, always lies between the prior mean, $m'$, and the sample mean, $m$. The posterior variance, $\sigma''^2$, is always smaller than the prior variance, $\sigma'^2$ (16). From equation (2-5), if the variance of the prior distribution, $\sigma'^2$, decreases, the amount of prior uncertainty decreases, and the prior information is given more weight in the determination of the posterior distribution. Similarly, as the variance of the sample mean, $\sigma^2/n$, decreases, the sampling information is given more weight in the determination of the posterior distribution.

[Figure 2. Bayes Theorem for Normal Distributions (axis marks: $m''$, $m$)]

A different parameterization of this problem might help clarify the results obtained. Let

$$n' = \frac{\sigma^2}{\sigma'^2}.$$

Then the prior variance can be written in terms of $n'$ and the process or sampling variance, thus:

$$\sigma'^2 = \frac{\sigma^2}{n'}.$$

Similarly, if

$$n'' = \frac{\sigma^2}{\sigma''^2},$$

then

$$\sigma''^2 = \frac{\sigma^2}{n''}.$$

Substituting these results into equations (2-5) and (2-6), the parameters of the posterior distribution are then

$$\frac{n''}{\sigma^2} = \frac{n'}{\sigma^2} + \frac{n}{\sigma^2},$$

or simply,

$$n'' = n' + n \tag{2-7}$$

and

$$m'' = \frac{(n'/\sigma^2)\,m' + (n/\sigma^2)\,m}{(n'/\sigma^2) + (n/\sigma^2)},$$

or simply,

$$m'' = \frac{n'm' + nm}{n' + n}. \tag{2-8}$$

In his interpretation of the results obtained by using these new parameters, Winkler (16) suggests that the prior distribution can be thought of as roughly equivalent to the information contained in a sample of size $n'$ with a sample mean of $m'$ from a normal sampling process with variance $\sigma^2$.
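The two parameterizations can be compared numerically. The Python sketch below is a minimal illustration with hypothetical numbers and hypothetical function names; it computes the posterior parameters once from the variances, as in equations (2-5) and (2-6), and once from the equivalent sample sizes, as in the $n'$ parameterization.

```python
def posterior_from_variances(m_prior, var_prior, m_sample, n, var_process):
    """Precision-weighted update, equations (2-5) and (2-6)."""
    precision = 1.0 / var_prior + n / var_process
    m_post = (m_prior / var_prior + n * m_sample / var_process) / precision
    return m_post, 1.0 / precision


def posterior_from_sample_sizes(m_prior, var_prior, m_sample, n, var_process):
    """Same update written with equivalent sample sizes n' and n''."""
    n_prior = var_process / var_prior          # n' = sigma^2 / sigma'^2
    n_post = n_prior + n                       # n'' = n' + n
    m_post = (n_prior * m_prior + n * m_sample) / n_post
    return m_post, var_process / n_post        # sigma''^2 = sigma^2 / n''


# Hypothetical case: m' = 10, sigma'^2 = 4, sigma^2 = 16, n = 12, m = 13,
# so the prior is worth n' = 4 observations.
by_variance = posterior_from_variances(10.0, 4.0, 13.0, 12, 16.0)
by_size = posterior_from_sample_sizes(10.0, 4.0, 13.0, 12, 16.0)
print(by_variance, by_size)
```

With these numbers both routes give a posterior mean of 12.25 and a posterior variance of 1, and the posterior mean sits three times closer to the sample mean than to the prior mean because n is three times n'.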
That is, $n'$ appears to be the sample size required to produce a variance of $\sigma'^2$ for a sample mean equal to $m'$, since the variance of the sample mean from a sample size $n'$ is equal to $\sigma^2/n'$. Winkler also considers equations (2-7) and (2-8) as formulas for pooling the information from the two samples. Under this interpretation, the posterior or pooled sample size is equal to the sum of the two individual sample sizes, one from the prior distribution and one from the sampling process. The posterior or pooled sample mean is equal to a weighted average of the two individual sample means. This pooling process suggests that a reasonable estimate of the sample mean, based on all the information available, is the posterior or pooled mean, $m''$.

Notice that if $n' > n$, then the posterior or pooled mean is closer to the prior mean than to the sample mean. That is, the prior information is given more importance than the sample results in the determination of the posterior parameters. Of course, the posterior mean is closer to the sample mean if $n > n'$; and if $n' = n$, the posterior mean is exactly midway between the prior mean and the sample mean.

Notice also that since the sample mean, $m$, is as likely to fall above as it is to fall below the true population or sampling mean, $\mu$, it is equally likely for the sample mean and the mean of the prior distribution, $m'$, to be on the same or opposite sides of $\mu$. When $m'$ and $m$ fall on the same side of $\mu$ and $m'$ lies further from $\mu$ than $m$ does, the mean of the posterior distribution, $m''$, will be further from $\mu$ than the sample mean. That is, the posterior mean will be a less accurate estimate of the true population mean than the sample mean.
When $m'$ and $m$ are on opposite sides of $\mu$, then it cannot be determined whether the posterior mean will be closer to or further from $\mu$ than the sample mean. Each specific case must be examined separately; the results will depend on the sample size, the specific value of the prior mean, and the variances of the prior and sampling distributions.

Since the point estimate of $\mu$ based on all information available is the posterior mean, and the posterior distribution is normal with mean, $m''$, and variance, $\sigma''^2$, the statistic

$$Z = \frac{\mu - m''}{\sigma''}$$

has a standard normal distribution, i.e., $Z \sim N(0, 1)$. Therefore the appropriate expression for a $100(1-\alpha)$ percent interval estimation of $\mu$ for this case is constructed using the standard normal distribution, i.e.,

$$P(m'' - Z_{\alpha/2}\,\sigma'' \le \mu \le m'' + Z_{\alpha/2}\,\sigma'') = 1 - \alpha. \tag{2-11}$$

The lower and upper limits of the confidence interval in this case are $UL = m'' - Z_{\alpha/2}\,\sigma''$ and $UU = m'' + Z_{\alpha/2}\,\sigma''$, respectively.

If, as was done in the classical case, the width of the confidence interval for the general case is set equal to k, then from equation (2-11) the half-interval width can be expressed as:

$$\frac{k}{2} = Z_{\alpha/2}\,\sigma''.$$

Now substituting the expression for $\sigma''^2$ from equation (2-5) into the above expression results in the following:

$$\frac{k}{2} = Z_{\alpha/2}\left[\frac{1}{\sigma'^2} + \frac{n}{\sigma^2}\right]^{-1/2} = Z_{\alpha/2}\left[\frac{\sigma^2\sigma'^2}{\sigma^2 + n\,\sigma'^2}\right]^{1/2},$$

$$\sigma^2 + n\,\sigma'^2 = \left[\frac{2Z_{\alpha/2}\,\sigma\,\sigma'}{k}\right]^2,$$

and finally,

$$n^*_b = \left[\frac{2Z_{\alpha/2}\,\sigma}{k}\right]^2 - \frac{\sigma^2}{\sigma'^2}. \tag{2-12}$$

This, then, is the Bayesian solution for the minimum sample size required to establish a confidence interval of width k about the mean of the sampling process under the special condition that the variance of the sampling process is known.

Several characteristics of equation (2-12) deserve mention. First of all, the first term in the equation, $[2Z_{\alpha/2}\,\sigma/k]^2$, is in fact the exact expression for the classical solution to the problem of determining the minimum sample size required to establish a confidence interval of width k about the mean of a sampling process with known variance. Second, the last term in the equation, $\sigma^2/\sigma'^2$, is the expression developed earlier for $n'$ in equation (2-7). Recall Winkler's interpretation of $n'$ as being roughly the equivalent sample size, relative to the sampling process, of the information contained in the prior distribution. The ratio $\sigma^2/\sigma'^2 = 0$ is also used to define a diffuse prior distribution, i.e., an informationless prior state. Assuming that the variance of the sampling process, $\sigma^2 > 0$, the ratio $\sigma^2/\sigma'^2 = 0$ only if the variance of the prior distribution, $\sigma'^2 = \infty$. In this case, the variance of the prior distribution would represent a condition of total uncertainty and since $n' = \sigma^2/\sigma'^2 = 0$, equation (2-12) would yield the same results as in the classical case.

Tying all these facts together, equation (2-12) can be interpreted as follows: the minimum Bayesian sample size required to establish an interval estimation of the mean of any specified width or accuracy is equal to the minimum sample size required to establish the same interval estimation using classical methods, minus the value of the prior information in terms of an equivalent sample size.
Or, more clearly:

$$n^*_b = n^*_c - n'.$$

Now, consider once again equation (2-12) in order to address the fact that the variance of the sampling process is in fact not known. Substituting the sample variance for the variance of the sampling process and the term $t(\alpha/2, n^*_c-1)$ for $Z_{\alpha/2}$ in equation (2-12), and once again defining the width of the confidence interval as $k = \delta S$, results in the following expression for the approximate Bayesian sample size:

$$n^*_b = \left[\frac{2\,t(\alpha/2, n^*_c-1)}{\delta}\right]^2 - \frac{S^2}{\sigma'^2}, \tag{2-13}$$

where

$$S^2 = \frac{1}{n^*_b - 1}\sum_{i=1}^{n^*_b}(X_i - m)^2$$

and $m$ is equal to the sample mean based on $n^*_b$ observations.

Examination of equation (2-13) reveals that the first term in the equation is identical to equation (2-3), the classical solution to the minimum sample size problem for a normal sampling process with unknown variance. The last term in the equation is an approximation of the equivalent sample size of the information contained in the prior distribution, where the value of $n' = \sigma^2/\sigma'^2$ is approximated by $n' = S^2/\sigma'^2$. Of course equation (2-13) cannot be evaluated explicitly, even though the value of the first term in the equation is exactly known from the results obtained using the classical method, since the value of $S^2$ depends on the specific observations obtained during the sampling process.

Before suggesting a procedure for approximating a solution to equation (2-13) for the general case, it may be more appropriate at this point to examine the general implications of using the posterior distribution to construct confidence interval estimates about the mean of the sampling process.
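The relationship between the known-variance result (2-12) and the sample-based approximation of $n'$ can be illustrated with a short Python sketch. The function names and the numerical values are hypothetical; `statistics.variance` is the usual sample variance with divisor n - 1.

```python
from statistics import NormalDist, variance


def bayes_size_known_variance(k, var_process, var_prior, alpha=0.05):
    """Equation (2-12): classical sample size minus the prior's n'."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_classical = (2 * z) ** 2 * var_process / k ** 2
    n_prior = var_process / var_prior          # n' = sigma^2 / sigma'^2
    return n_classical - n_prior


def equivalent_prior_size(sample, var_prior):
    """Last term of (2-13): n' approximated by S^2 / sigma'^2."""
    return variance(sample) / var_prior


# Hypothetical case: sigma^2 = 16, sigma'^2 = 1 (ratio 16), width k = 2.
informative = bayes_size_known_variance(2.0, 16.0, 1.0)
diffuse = bayes_size_known_variance(2.0, 16.0, float("inf"))
n_prior_estimate = equivalent_prior_size([1.0, 2.0, 3.0, 4.0, 5.0], 2.5)
print(informative, diffuse, n_prior_estimate)
```

With an infinite prior variance the correction term vanishes and the function returns the classical value, which is the diffuse-prior limit described above. The second function shows why (2-13) cannot be evaluated in advance: its value changes with every sample drawn.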
An interval estimation based on the posterior distribution has as its midpoint the posterior mean, $m''$, while the midpoint of an interval estimation based on the sampling process alone is the sample mean, $m$. Referring to Figure 2, it is obvious, then, that an interval estimate of width $\delta S$ which is based on the posterior distribution will not include the sample mean, $m$, if $m''$ and $m$ are separated by more than $\frac{1}{2}\delta S$. A large separation between $m''$ and $m$ is indicative of prior information which is not very compatible with the results obtained from the sampling or experimental process. In other words, the prior information does not predict the behavior of the sampling process very well. This is an important consideration in Operational Testing, since it is important to decide whether or not to use the prior information in estimating the mean of the sampling or experimental process. It would seem appropriate then, to develop at least a heuristic rule to reject the use of prior information which causes the posterior and sampling means to differ beyond some pre-established limit. The general form of such a rule would be:

$$|m'' - m| \le q\,\delta S$$

where the value of q is selected in a manner such that if the inequality were not satisfied, the application of Bayesian techniques would be aborted and the appropriate sample size for the specific situation would be determined by using classical techniques.
Returning now to the problem of constructing an interval estimate of width $\delta S$ for the mean of the sampling process using Bayesian techniques, the following procedure is suggested as a reasonable approach to approximating the solution of equation (2-13) for the general case.

a. Determine the minimum sample size, $n^*_c$, required for the classical method. This value, call it $n_0$, is the upper limit of the Bayesian sample size.

b. As a first approximation to the Bayesian sample size, let

$$n_1 = \frac{n_0}{d},$$

where the value of d is selected with consideration given to the classical sample size being used. That is, for small values of $n^*_c$, d should be chosen at some low value (such as 2 or 4) in order that $n_1$ be large enough to yield suitable sample statistics. For large values of $n^*_c$, d may be increased since the resulting $n_1$ samples would still yield suitable statistics. The objective here is to approximate the Bayesian sample size conservatively while insuring that the approximation decided upon is large enough to yield reasonably valid sample statistics.

c. Conduct the $n_1$ replications of the experiment and from the results compute the sample statistics:

$$m_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i$$

and

$$S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - m_1)^2.$$

d. Use these statistics to compute the approximations

$$n'_1 = \frac{S_1^2}{\sigma'^2}$$

and

$$m''_1 = \frac{n'_1 m' + n_1 m_1}{n'_1 + n_1}.$$

e. Determine the second approximation of the Bayesian sample size by using the value obtained for the first approximation and the following relationship:

$$n_2 = n_1 + \Delta\,(n_0 - n'_1),$$

where $\Delta$ is chosen with the same considerations as was the value of d. The general expression for the approximation of the Bayesian sample size is:

$$n_{j+1} = n_j + \Delta\,(n_0 - n'_j).$$

f. Determine if sufficient replications of the experiment have been conducted after each iteration by comparing the computed approximation of the Bayesian sample size to the classical sample size minus the computed value of $n'$. That is, continue the iterative procedure until

$$n_j \ge n_0 - n'_j.$$

g. After computing the final approximation of the Bayesian sample size, determine if the prior information should be accepted or rejected. That is, if $|m''_j - m_j| \le q\,\delta S$, use the $n_j$ replications already conducted to construct the interval estimate of the mean of the experimental process using Bayesian techniques. If $|m''_j - m_j| > q\,\delta S$, reject the use of the prior information; conduct the remaining $n_0 - n_j$ replications of the experiment and construct the desired interval estimate of the mean of the experimental process using classical techniques.

CHAPTER III

DEMONSTRATION OF THE METHODOLOGY

Programming the Model

The model developed for approximating the minimum Bayesian sample size for the special test situation described in Chapter I is programmed for the UNIVAC 1108 computer using standard Fortran IV language.
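Steps a through g of the procedure can be sketched compactly in Python. This is a hedged illustration rather than a transcription of the Fortran program in the appendix: the function name `approximate_bayes_size`, the `draw` callback, rounding each approximation up to an integer, the floor of two observations, and capping the sample at $n_0$ are all assumptions made here, and Python's built-in normal generator stands in for the thesis's own generator.

```python
import math
import random
from statistics import mean, variance


def approximate_bayes_size(draw, n0, delta, m_prior, var_prior,
                           d=4, step=0.25, q=0.375):
    """Grow the sample until n_j >= n0 - n'_j, then test the prior.

    draw() returns one replication of the experiment; n0 is the classical
    sample size for this delta (step a), for example taken from Table 1.
    """
    n = max(2, math.ceil(n0 / d))                  # step b (rounding assumed)
    sample = [draw() for _ in range(n)]            # step c
    while True:
        m, s2 = mean(sample), variance(sample)
        n_prior = s2 / var_prior                   # step d: n'_j
        m_post = (n_prior * m_prior + n * m) / (n_prior + n)
        if n >= n0 - n_prior or n >= n0:           # step f, with n0 as a cap
            break
        n_next = min(n0, math.ceil(n + step * (n0 - n_prior)))   # step e
        sample += [draw() for _ in range(n_next - n)]
        n = n_next
    use_prior = abs(m_post - m) <= q * delta * math.sqrt(s2)     # step g
    return {"n": n, "n_prior": n_prior, "m": m, "m_post": m_post,
            "use_prior": use_prior}


# Hypothetical run: true process N(50, 16), prior mean 55, prior variance 1,
# so sigma^2 / sigma'^2 = 16 and |m' - mu| = 5; n0 = 64 corresponds to
# delta = 0.5 in Table 1.
rng = random.Random(1975)
result = approximate_bayes_size(lambda: rng.gauss(50.0, 4.0),
                                n0=64, delta=0.5, m_prior=55.0, var_prior=1.0)
print(result)
```

Because each pass adds a positive number of replications whenever the stopping rule of step f is not yet met, and the sample never exceeds $n_0$, the loop always terminates.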
The program consists of four basic segments designed to perform the following functions: generate the required data and compute the sample statistics; compute the minimum classical sample size required for an interval estimation of specified width; compute the approximate Bayesian sample size required for the same interval width; and construct the confidence intervals desired based on the sampling results.

The Box and Muller technique (7) is used to generate the normally distributed pseudo random numbers representative of a normal process with specified mean and variance. The random number generator was tested for various sample sizes and values of the model parameters using the chi-square goodness-of-fit test for normality. The results of these tests were quite favorable and are summarized in Table 2.

Equation (2-3) is solved iteratively for the minimum classical sample size by using two standard UNIVAC MATH-STAT library functions (14). The function TINORM is used to compute the value of the inverse of the standard normal distribution given the value of the probability for which the ordinate is to be calculated. The function STUDIN is used to calculate the inverse of the Student's t distribution for a given confidence coefficient. The results obtained from the subroutine used to calculate the classical sample size for each specified value of $\delta$ are shown in Table 1.

Table 2. Test of Normal Random Generator

Columns: (1) specified mean; (2) specified variance; (3) number of observations per trial; (4) number of trials; (5) number of trials accepted at the α = .05 significance level.

(1)        (2)              (3)    (4)    (5)
-120.0     81.0             500    30     27
-120.0     81.0             250    30     27
-120.0     81.0             100    30     29
-120.0     81.0             25     30     26
-5.0       4.0              500    30     26
-5.0       4.0              250    30     27
-5.0       4.0              100    30     28
-5.0       4.0              25     30     24
0.0        1.0              500    30     27
0.0        1.0              250    30     28
0.0        1.0              100    30     29
0.0        1.0              25     30     23
57.0       64.0             500    30     30
57.0       64.0             250    30     27
57.0       64.0             100    30     28
57.0       64.0             25     30     25
200.0      400.0            500    30     28
200.0      400.0            250    30     29
200.0      400.0            100    30     27
200.0      400.0            25     30     24
225.0      25.0             500    30     26
225.0      25.0             250    30     30
225.0      25.0             100    30     27
225.0      25.0             25     30     23
297.72     173.18           337    30     28
199.5      46.32            483    30     29
45.3       .78              13     30     24
1542.0     57[illegible]2.6 486    30     28

Approximations for the Bayesian sample size for a given value of delta are computed using the iterative procedure developed in the preceding chapter. The value of the classical sample size computed for a given value of delta is input to this subroutine, which uses this value to calculate the first approximation of the Bayesian sample size.

Confidence intervals are computed by using the STUDIN library function to calculate the value $t(\alpha/2, n^*-1)$, where $n^*$ is the computed classical or Bayesian sample size. The subroutine then computes the lower and upper limits of the confidence interval, i.e.,

$$UL_c = m - t(\alpha/2, n^*_c-1)\frac{S}{\sqrt{n^*_c}}$$

and

$$UU_c = m + t(\alpha/2, n^*_c-1)\frac{S}{\sqrt{n^*_c}}$$

for the classical case, and

$$UL_b = m'' - t(\alpha/2, n^*_b-1)\,[\ldots]$$

and

$$UU_b = m'' + t(\alpha/2, n^*_b-1)\,[\ldots]$$

for the Bayesian case.

Demonstrating the Model

In order to demonstrate the model developed to approximate the Bayesian sample size in the preceding chapter, various values of the constants d, $\Delta$, and q used in the iterative procedure were tried in preliminary simulations. The values d = 4, $\Delta$ = 1/4, and q = 3/8 were chosen for the following reasons:

a.
Values of d < h tended t o produce f i r s t approximations o f the Bayesian sample s i z e which were too l a r g e when working w i t h small values o f the c l a s s i c a l sample s i z e , n_. 0 the f i r s t approximation. 1 That i s , n, 2: n„ - n . a f t e r 1 0 j Larger values of d produced more conservative f i r s t approximations of the Bayesian sample s i z e f o r small values of n , but a t the same time r e s u l t e d i n u n r e l i a b l e , i . e . , Q greatly variable, sample s t a t i s t i c s . b. n Q Values of A < l/h were r e j e c t e d because f o r l a r g e values of the number of i t e r a t i o n s r e q u i r e d t o compute the approximate Bayesian sample s i z e was considerably increased. I t was f e l t t h a t t h i s result was undesirable i n an Operational Testing mode and, o f course, i t meant increased computer times t o solve the approximation. o f using a v a r i a b l e value f o r A was t r i e d , i . e . , o n e - h a l f a f t e r each i t e r a t i o n . also A scheme A was decreased by This scheme was also r e j e c t e d because 39 f o r l a r g e r v a l u e s of the i t e r a t i v e procedure q u i c k l y evolved into a s e q u e n t i a l t y p e o f sampling p r o c e d u r e . c. The v a l u e o f q = 3/8 was s e l e c t e d as a r e a s o n a b l e b a s e d on t h e i l l u s t r a t i o n shown i n F i g u r e 3. choice The i n t e r v a l a - d r e p r e s e n t s an i n t e r v a l e s t i m a t i o n b a s e d on t h e p o s t e r i o r d i s t r i b u t i o n . from p r e v i o u s d e f i n i t i o n s , l/2 fiS. a - d = 6 S , and t h e i n t e r v a l s a-m" • m"-d = Then i f t h e i n t e r v a l s a-b = c - d = l/8 6 S , t h e sample mean, m, i s r e q u i r e d t o be w i t h i n t h e i n t e r v a l b - c = 3 / 4 6 S , i . e . , 3/8 6S i s t h e p r e r e q u i s i t e f o r i n c o r p o r a t i n g t h e p r i o r into the estimation procedures. 
for sufficient sample Then |m" - m| £ information I t was f e l t t h a t l/8 6S would a l l o w v a r i a t i o n o f t h e sample mean due t o d i f f e r e n c e s results. F i g u r e 3. S e p a r a t i o n o f t h e P o s t e r i o r and Sample Means in ko The procedure t o approximate the Bayesian sample s i z e was demon s t r a t e d using a h y p o t h e t i c a l case having the f o l l o w i n g c h a r a c t e r i s t i c s : 2 2 a. the r a t i o a /CT b. |m' - p,| = 1 where p, and a =l 6 . 5 and |m' - p,| = 10. are the t r u e (but assumed unknown) values of the p a r a 2 1 meters of the sampling process and m and a 1 are the parameters of the prior distribution. The f i r s t t e s t of the procedure involved a computer simulation of 100 runs f o r each value of d e l t a from 1.0 t o 0 . 2 . The model was not t e s t e d f o r the value of d e l t a equal to 0.1 i n t h i s or subsequent t e s t s of the procedure because the l a r g e sample sizes involved r e q u i r e d an excessive amount of computer t i m e . The r e s u l t s o f t h i s f i r s t t e s t are summarized i n Table 3 f o r the case where (m f o r the case where |m' - p,| = 10. 1 - p, j = 5 and i n Table k These r e s u l t s appear q u i t e favorable as shown i n the percentage of reduction achieved over the c l a s s i c a l sample s i z e s . Note t h a t the computed Bayesian sample s i z e does not depend on the value of |m' - p , | . That i s , the Bayesian sample sizes are i d e n t i c a l i n Tables 3 and k f o r a given value of d e l t a . The con fidence and accuracy o f the i n t e r v a l estimates produced, i . e . , the number of times the t r u e mean of the sampling process i s contained w i t h i n the i n t e r v a l and the w i d t h of the i n t e r v a l constructed, i s comparable t o the r e s u l t s obtained using c l a s s i c a l methods f o r the case where |m' - p, | = 5 . 
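The percentage reductions reported in Table 3 can be reproduced from the classical sample sizes and the average Bayesian sample sizes. The following is only a sketch: it assumes, as the tabulated figures suggest, that each average Bayesian sample size is rounded up to a whole observation before it is compared with the classical size. The function and variable names are illustrative and are not part of the FORTRAN program.

```python
import math

# Values taken from Table 3 (delta = 1.0 down to 0.2).
classical = [18, 22, 27, 34, 46, 64, 99, 174, 387]
avg_bayes = [6.7, 8.9, 12.6, 18.7, 32.4, 52.8, 87.7, 163.1, 376.4]


def percent_reduction(n_classical, n_bayes_avg):
    """Percent saving over the classical size (assumed rounding: ceiling)."""
    n_bayes = math.ceil(n_bayes_avg)
    return 100.0 * (n_classical - n_bayes) / n_classical


reductions = [round(percent_reduction(c, b), 1)
              for c, b in zip(classical, avg_bayes)]

# Observations saved at each delta, and the mean saving over all deltas.
saved = [c - math.ceil(b) for c, b in zip(classical, avg_bayes)]
mean_saved = sum(saved) / len(saved)

print(reductions)
print(mean_saved)
```

Under this rounding assumption the computed percentages agree with the last column of Table 3, and the mean saving is 12.0 observations, i.e., 75 percent of the 16 observations that the prior is worth when σ²/σ'² = 16.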
Table 3. Data for the Bayesian Approximation Model Based on One Hundred Runs for Each Value of Delta (σ²/σ'² = 16, |m' − μ| = 5)

Columns: (1) specified width of confidence interval, δ; (2) computed classical sample size, n*c; (3) number of times ULc ≤ μ ≤ UUc; (4) average computed Bayesian sample size, n*b; (5) number of times ULb ≤ μ ≤ UUb; (6) average value of |m'' − m|; (7) number of times D ≤ qδS; (8) percent reduction in sample size.

  (1)    (2)    (3)     (4)     (5)     (6)     (7)    (8)
  1.0     18     96      6.7     97    5.589     75    61.1
  0.9     22     96      8.9     97    4.814     76    59.1
  0.8     27     95     12.6     98    4.355     79    51.9
  0.7     34     96     18.7     98    2.541     90    44.1
  0.6     46     98     32.4     96    2.175     95    28.3
  0.5     64     93     52.8     93    1.185    100    17.2
  0.4     99     97     87.7     9*    0.756    100    11.1
  0.3    174     96    163.1     98    0.470    100     5.7
  0.2    387     97    376.4     98    0.217    100     2.6

Table 4. Data for the Bayesian Approximation Model Based on One Hundred Runs for Each Value of Delta (σ²/σ'² = 16, |m' − μ| = 10)

Columns are the same as in Table 3.

  (1)    (2)    (3)     (4)     (5)     (6)     (7)    (8)
  1.0     18     96      6.7     75    8.300     54    61.1
  0.9     22     96      8.9     74    6.008     51    59.1
  0.8     27     96     12.6     86    6.993     49    51.9
  0.7     34     96     18.7     77    4.438     75    44.1
  0.6     46     98     32.4     86    3.874     77    28.3
  0.5     64     9*     52.8     88    2.359     97    17.2
  0.4     99     98     87.7     88    1.545    100    11.1
  0.3    174     95    163.1     95    0.930    100     5.7
  0.2    387     97    376.4     95    0.416    100     2.6

For the case where |m' − μ| = 10, the desired confidence is not achieved until the situation involving the two largest sample sizes. The separation between the posterior and sample means decreases as the sample size increases and the sampling information is given more weight in the determination of the posterior distribution. For this reason, the test suggested for determining whether or not to use the prior information does not work well at all.
For both the case where |m' − μ| = 5 and the case where |m' − μ| = 10, the test rejects the prior information too often for small sample sizes and erroneously allows the use of the prior information in large sample sizes. It appears that a better decision rule as to whether or not to reject the prior information should consider the difference between the prior mean (rather than the posterior mean) and the sample mean. The accuracy of the approximation procedure is quite good; the overall average reduction in the sample size for all values of delta is 12.0 samples, which equates to approximately 75 percent of the true difference between the classical and the Bayesian sample sizes, which is 16 samples for this particular case.

The second test of the procedure involved computing the Bayesian sample size required for each value of delta and for various values of |m' − μ| ranging from one standard deviation below the true mean of the sampling process to one standard deviation above this value. The specific values chosen for |m' − μ| and the results of the test are shown in Table 5. The results obtained when the value of |m' − μ| is within one-half standard deviation on either side of μ are quite favorable, with only three cases out of the total of 63 trials where the Bayesian interval estimate did not include the true value of the mean of the sampling process. Overall, there were a total of 24 cases, out of the 99 total trials, where the Bayesian interval estimate did not include the true value of the mean of the sampling process.

Table 5. Data for the Bayesian Approximation Model Based on Various Values of |m' − μ| (σ²/σ'² = 16)

Each row gives the computed Bayesian sample size for one specified value of m' − μ; the columns are the specified values of delta, δ. The first row is the classical sample size, n*c.

  m' − μ      δ = 1.0    0.9    0.8    0.7    0.6    0.5    0.4    0.3    0.2
  Classical      18       22     27     34     46     64     99    174    387
   −20            7*       6*    15     23*    38*    51     85*   159    378
   −15            5*      10*     7*    22     39*    48     87*   164*   379
   −10            7       12     10     18*    35     52     88    159    376
    −5            5       19     14     12     37     49     89    163    375
    −2            5        6      7     17     36     52     87    162    376
    10            5        6     17*     9     36     54     93    166*   376

Entries not separable by column, in source order:

  m' − μ = 0, 2, 5:   5 11 5 18 6 23 20 21 24 31 40 57 58 91 91 166 164 376 377 5 6 7 17 32 55 90 161 377
  m' − μ = 15, 20:    5* 13 15* 5* 6* 21 19* 36* 53* 9* 40 52 85 163 376* 87 164* 378

Figures marked with an asterisk (*) indicate cases where the confidence interval based on the Bayesian sample size did not contain the true value of the mean of the sampling process.

The final test conducted on the model was to fix the value |m' − μ| = 5 and to compute the Bayesian sample size required for each value of delta and for various ratios of the variances, σ²/σ'². The specific values chosen for the ratio of the variances and the results of the test are shown in Table 6. The results obtained when the ratio of the sampling and the prior variances was 4 or greater are good, with only one case out of a total of 63 trials where the Bayesian interval estimate did not include the true value of the mean of the sampling process. Overall, there were a total of seven cases out of the 99 trials where the Bayesian interval estimate did not include the true value of the mean of the sampling process.

Table 6. Data for the Bayesian Approximation Model Based on Various Values of the Ratio of the Variances (|m' − μ| = 5)

Each row gives the computed Bayesian sample size for one specified value of the ratio of the variances, σ²/σ'²; the columns are the specified values of delta, δ. The last row is the classical sample size.

  σ²/σ'²      δ = 1.0    0.9    0.8    0.7    0.6    0.5    0.4    0.3    0.2
    32            5        6      7     13     22     33     69*   144    368
    16            5       10      7     22     39     48     87    164    379
     8           14       16     16     29     41     58     94    167    382
     7            5       21     23     29     43     58     95    170    382
     6           14        6     16     30     44     60     95    170    384
     5           17       21     26     33     44     61     97    173    385
     4           11       19     25     34     45     64     97    173    385
     3           16       16*    25*    31     45     64*    98    173*   385
     2           17       21     27     33     45     64     99    173*   385
     1           18       22     27     33     46     64     99    174*   387
  Classical      18       22     27     34     46     64     99    174    387

Figures marked with an asterisk (*) indicate cases where the confidence interval based on the Bayesian sample size did not contain the true value of the mean of the sampling process.

CHAPTER IV

CONCLUSIONS AND RECOMMENDATIONS

Conclusions

The results of this study indicate the following conclusions.

1. The suggested procedure to approximate Bayesian sample sizes and construct interval estimates for the mean of the sampling process should be used for the normal sampling process when accurate prior information is available, that is, when the prior mean is within one-half standard deviation of the true mean of the sampling process.

2. In the worst case, the procedure will yield the same sample sizes as would classical techniques.
In this case, the interval estimates should be based on the classical method, since, in essence, the prior information has been rejected.

3. The accuracy and confidence levels associated with the interval estimates based on the approximation procedure are comparable to those obtained by using classical techniques if the prior information is accurate.

4. The heuristic rule suggested to determine whether or not to use the prior information did not work well because the value |m'' − m| is a function of the sample size as well as being a function of the value of the prior mean, m'.

5. The results obtained in the demonstration of the procedure, for the values of delta selected, indicate that the procedure to approximate the Bayesian sample size and construct interval estimates is a viable concept which has direct applicability and value in Operational Testing.

Recommendations

As in most cases involving research of a limited scope, perhaps more problems are unearthed than are resolved in this study. The limited results obtained, however, show some merit and applicability to Operational Testing. As a matter of future research in the area covered by this study, the following recommendations are suggested.

1. Further efforts are required to improve the iterative procedure used to approximate the Bayesian sample size.
A refined procedure should take into account the need to treat large and small sample sizes as separate problems. Perhaps the increment added to the approximation at any specific iteration should be some function of the number of iterations already conducted. Care must be taken, however, that any procedure developed for this situation be compatible with the Operational Testing environment, where ease of application and simplicity are prime objectives.

2. The sample standard deviation, S, in equation (2-13), is the only variable in the equation for a specific sample size. This particular random variable is related to the chi-square distribution. Perhaps further work with this particular element of the expression for the approximate Bayesian sample size would lead to more accurate approximations of the equation.

3. A workable decision rule for determining whether or not to use the prior information is needed. It is suggested that the relationship between the prior and sample means, i.e., |m' − m|, will yield more viable results than the technique used in this study. Obviously, whatever rule is developed, it must treat the differences associated with large and small sample sizes separately.

4. There are obvious limitations in applying this procedure to Operational Testing. Although the procedure holds some potential for reducing costs associated with Operational Testing by reducing the number of replications required of a specific test, any iterative sampling scheme is inherently difficult and costly to apply because of the problems involved with multiple scheduling and set-up costs. The procedure seems better suited to those testing situations where a large number of samples are required and the cost of sampling is relatively low. For these reasons, a scheme to incorporate the concept of loss functions into this procedure is needed before it can assume the cloak of a true decision making procedure.

APPENDIX I

FORTRAN PROGRAM FOR THE APPROXIMATE BAYESIAN PROCEDURE

-RFOR,IS MAIN
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/TWO/ XMEAN(3), XVAR(3)
      COMMON/THREE/ XHAT(3), SHAT(3)
      COMMON/FIVE/ ALPHA, UL(3), UU(3)
      COMMON/SEVEN/ NC(5), DELTA
      COMMON/EIGHT/ NB(10), NPRIM(10), DIFF
      COMMON/NINE/ WIDTH(2)
      COMMON/TEN/ LOOP(2), KEY, DMAX
      EXTERNAL UNIF
C***** READ IN BASIC PARAMETERS
    8 FORMAT ( )
      READ(5,8) ALPHA
      READ(5,8) NSU
      READ(5,8) XMEAN(1), XVAR(1)
      READ(5,8) XMEAN(2), XVAR(2)
C***** START UP UNIFORM GENERATOR TO RANDOMIZE STARTING POINT
      DO 10 J=1, NSU
      D= UNIF(A)
   10 CONTINUE
      DO 100 KK= 1, 30
      READ(5,8,END=999) DELTA
C***** DETERMINE THE MINIMUM CLASSICAL SAMPLE SIZE
      CALL CLASS( N(1) )
      CALL RANDN(1)
C***** DETERMINE THE MINIMUM BAYESIAN SAMPLE SIZE, IF APPROPRIATE
      CALL BAYES( N(1), N(2) )
C***** COMPUTE CONFIDENCE INTERVALS FOR THE DATA PROCESSES
      CALL CONFID( N(3), 3 )
      CALL ORDER(1)
      CALL CONFID( N(1), 1 )
C***** PRINT OUTPUT
      CALL OUTPUT
  100 CONTINUE
  999 CONTINUE
      WRITE(6,70)
   70 FORMAT(1H1)
      STOP
      END

-RFOR,IS RANDN
C***** THIS SUBROUTINE GENERATES NORMALLY DISTRIBUTED PSEUDO-RANDOM
C      NUMBERS HAVING A SPECIFIED MEAN AND VARIANCE
C
C***** ARGUMENT DEFINITION
C      X IS THE ARRAY OF RANDOM NUMBERS (OUTPUT)
C      N IS THE NUMBER OF RANDOM NUMBERS DESIRED (INPUT)
C      XMEAN IS THE MEAN OF THE RANDOM NUMBERS (INPUT)
C      XVAR IS THE VARIANCE OF THE RANDOM NUMBERS (INPUT)
C
C***** THIS SUBROUTINE USES THE BOX AND MULLER METHOD FOR
C      GENERATION OF NORMAL PSEUDO-RANDOM NUMBERS
      SUBROUTINE RANDN(J)
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/TWO/ XMEAN(3), XVAR(3)
      EXTERNAL UNIF
      TPI=6.2831852
      DO 100 I=1, N(J), 2
      A= UNIF(1)
      B= UNIF(2)
      X(1,I)= XMEAN(2)+ SQRT(-2.0*XVAR(2)*ALOG(A))*COS(TPI*B)
      X(2,I)= X(1,I)
      II= I+ 1
      X(1,II)= XMEAN(2)+ SQRT(-2.0*XVAR(2)*ALOG(A))*SIN(TPI*B)
      X(2,II)= X(1,II)
  100 CONTINUE
      RETURN
      END

-RFOR,IS UNIF
      FUNCTION UNIF(A)
      DATA IY/96581/
      IY=IY*3125
      IF(IY)5,6,6
    5 IY=IY+1+34359738367
    6 YFL= IY
      UNIF= YFL*2.0**(-35)
      RETURN
      END

-RFOR,IS ORDER
C***** THIS SUBROUTINE SORTS A GIVEN SET OF DATA FROM THE LOWEST
C      VALUE TO THE HIGHEST, AND COMPUTES THE SAMPLE STATISTICS
C      (MEAN AND STANDARD DEVIATION) OF THE DATA PROCESS
C***** ARGUMENT DEFINITION
C      X= THE ARRAY OF DATA VALUES TO BE SORTED (INPUT/OUTPUT)
C      N= THE NUMBER OF DATA POINTS (INPUT)
C      XHAT= THE SAMPLE MEAN OF THE DATA PROCESS (OUTPUT)
C      SHAT= THE SAMPLE STANDARD DEVIATION OF THE DATA
C            PROCESS (OUTPUT)
      SUBROUTINE ORDER(K)
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/THREE/ XHAT(3), SHAT(3)
      NM1= N(K)-1
      DO 200 I=1, NM1
      IP1=I+1
      DO 100 J= IP1, N(K)
      IF( X(K,I) .LE. X(K,J) ) GO TO 100
      TEMP= X(K,I)
      X(K,I)= X(K,J)
      X(K,J)= TEMP
  100 CONTINUE
  200 CONTINUE
C***** COMPUTE THE SAMPLE STATISTICS FOR THE DATA PROCESS
      SUM1=0.0
      SUM2=0.0
      DO 300 I=1, N(K)
      SUM1= SUM1+ X(K,I)
      SUM2= SUM2+ X(K,I)**2
  300 CONTINUE
      YN= N(K)
      XHAT(K)= SUM1/YN
      RN= YN- 1.0
      SUM22= SUM2- (SUM1**2)/YN
      SHAT(K)= SQRT(SUM22/RN)
      RETURN
      END

-RFOR,IS CLASS
C***** THIS SUBROUTINE CALCULATES THE MINIMUM CLASSICAL SAMPLE
C      SIZE REQUIRED TO CONSTRUCT A CONFIDENCE INTERVAL OF GIVEN
C      WIDTH ABOUT THE MEAN OF A NORMAL SAMPLING POPULATION OF
C      UNKNOWN VARIANCE
C***** ARGUMENT DEFINITION
C      ALPHA= THE CONFIDENCE COEFFICIENT (INPUT)
C      DELTA= A FUNCTION OF THE INTERVAL WIDTH (INPUT)
C      NCLASS= THE COMPUTED SAMPLE SIZE (OUTPUT)
      SUBROUTINE CLASS(NCLASS)
      COMMON/FIVE/ ALPHA, UL(3), UU(3)
      COMMON/SEVEN/ NC(5), DELTA
      COMMON/TEN/ LOOP(2), KEY, DMAX
C***** COMPUTE THE FIRST APPROXIMATION OF THE CLASSICAL SAMPLE
C      SIZE, NC(1), BASED ON THE STANDARD NORMAL DISTRIBUTION,
C      WHICH IS IDENTICAL TO THE T DISTRIBUTION WITH INFINITE
C      DEGREES OF FREEDOM
      ALPHA1= ALPHA/2.0
      S= TINORM(ALPHA1, $15)
      GO TO 18
   15 WRITE(6,17)
   17 FORMAT(//, 10X, 68H ERROR MESSAGE--OVERFLOW ON INVERSE NORMAL DIST
     1RIBUTION--FORMAT 15  )
      CALL EXIT
   18 CONTINUE
      REALC= (2.0*S/DELTA)**2
      NC(1)= INT(REALC)
      IF( NC(1) .LT. REALC ) NC(1)= NC(1)+ 1
C***** COMPUTE THE SUCCEEDING APPROXIMATIONS OF THE CLASSICAL
C      SAMPLE SIZE, NC(J), BASED ON THE T DISTRIBUTION WITH DEGREES
C      OF FREEDOM EQUAL TO NC(J-1)-1. STOP THE ITERATIVE PROCEDURE
C      WHEN NC(J) IS EQUAL TO NC(J-1)
      DO 30 J=2, 10
      NDF= NC(J-1)- 1
      T= STUDIN(ALPHA, NDF, $21)
      GO TO 24
   21 WRITE(6,23)
   23 FORMAT(//, 10X, 74H ERROR MESSAGE--OVERFLOW ON STUDENTS T DISTRIBU
     1TION FUNCTION--FORMAT 21   )
      CALL EXIT
   24 CONTINUE
      REALC= (2.0*T/DELTA)**2
      NC(J)= INT(REALC)
      IF( NC(J) .LT. REALC ) NC(J)= NC(J)+ 1
      IF( NC(J) .EQ. NC(J-1) ) GO TO 35
   30 CONTINUE
C***** ASSIGN THE COMPUTED SAMPLE SIZE TO NCLASS
   35 NCLASS= NC(J)
      LOOP(1)= J
      RETURN
      END

-RFOR,IS BAYES
C***** THIS SUBROUTINE CALCULATES THE MINIMUM BAYESIAN SAMPLE SIZE,
C      IF APPROPRIATE, TO CONSTRUCT A CONFIDENCE INTERVAL OF GIVEN
C      WIDTH ABOUT THE MEAN OF A NORMAL SAMPLING POPULATION WITH
C      UNKNOWN VARIANCE
C***** ARGUMENT DEFINITION
C      K= THE MINIMUM CLASSICAL SAMPLE SIZE (INPUT)
C      NBAYES= THE COMPUTED SAMPLE SIZE (OUTPUT)
      SUBROUTINE BAYES( K, NBAYES )
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/TWO/ XMEAN(3), XVAR(3)
      COMMON/THREE/ XHAT(3), SHAT(3)
      COMMON/SEVEN/ NC(5), DELTA
      COMMON/EIGHT/ NB(10), NPRIM(10), DIFF
      COMMON/TEN/ LOOP(2), KEY, DMAX
C***** COMPUTE THE FIRST APPROXIMATION, N(1), OF THE BAYESIAN
C      SAMPLE SIZE
      REALB= FLOAT(K)/4.0
      NB(1)= INT( REALB )
      IF( NB(1) .LT. REALB ) NB(1)= NB(1)+ 1
      N(2)= NB(1)
C***** TAKE N(1) SAMPLES AND COMPUTE THE SAMPLE STATISTICS FOR THE
C      DATA PROCESS AND THE POSTERIOR PARAMETERS BASED ON THESE
C      N(1) OBSERVATIONS
      CALL ORDER(2)
      APPN= SHAT(2)**2/XVAR(1)
      NPRIM(1)= INT( APPN )
      IF( NPRIM(1) .LT. APPN ) NPRIM(1)= NPRIM(1)+ 1
      XMEAN(3)= ( NPRIM(1)*XMEAN(1)+ NB(1)*XHAT(2) ) /
     1          FLOAT( NPRIM(1)+ NB(1) )
      DIFF= ABS( XMEAN(3)- XHAT(2) )
      DMAX= DIFF
      KEY= 1
      IF( N(2) .GE. K- NPRIM(1) ) GO TO 55
C***** COMPUTE THE SUCCEEDING APPROXIMATIONS, N(J), OF THE BAYESIAN
C      SAMPLE SIZE. STOP THE ITERATIVE PROCEDURE WHEN NB(J) IS
C      GREATER THAN OR EQUAL TO K- NPRIM(J)
      DO 100 J=2, 20
      RINC= FLOAT( K- NPRIM(J-1) )/4.0
      INC= INT( RINC )
      IF( INC .LT. RINC ) INC= INC+ 1
      NB(J)= NB(J-1)+ INC
      N(2)= NB(J)
C***** TAKE EACH SUCCEEDING N(J) SAMPLES AND COMPUTE THE SAMPLE
C      STATISTICS FOR THE DATA PROCESS AND THE POSTERIOR PARAMETERS
C      BASED ON THESE N(J) OBSERVATIONS
      CALL ORDER(2)
      APPN= SHAT(2)**2/XVAR(1)
      NPRIM(J)= INT( APPN )
      IF( NPRIM(J) .LT. APPN ) NPRIM(J)= NPRIM(J)+ 1
      XMEAN(3)= ( NPRIM(J)*XMEAN(1)+ NB(J)*XHAT(2) ) /
     1          FLOAT( NPRIM(J)+ NB(J) )
      DIFF= ABS( XMEAN(3)- XHAT(2) )
      IF( DIFF .LE. DMAX ) GO TO 35
      DMAX= DIFF
      KEY= J
   35 CONTINUE
      IF( N(2) .GE. K- NPRIM(J) ) GO TO 45
  100 CONTINUE
C***** ASSIGN THE SAMPLE SIZE COMPUTED ABOVE TO NBAYES AND DETERMINE
C      THE POSTERIOR (POOLED) SAMPLE SIZE
   55 CONTINUE
      NBAYES= N(2)
      IF( NBAYES .GT. K ) NBAYES= K
      N(3)= NBAYES+ NPRIM(1)
      XHAT(3)= XMEAN(3)
      SHAT(3)= SHAT(2)
      LOOP(2)= 1
      GO TO 999
   45 CONTINUE
      NBAYES= NB(J)
      IF( NBAYES .GT. K ) NBAYES= K
      N(3)= NBAYES+ NPRIM(J)
      XHAT(3)= XMEAN(3)
      SHAT(3)= SHAT(2)
      LOOP(2)= J
  999 RETURN
      END

-RFOR,IS CONFID
C***** THIS SUBROUTINE CALCULATES A CONFIDENCE INTERVAL FOR THE MEAN
C      OF A NORMAL POPULATION WHEN THE VARIANCE IS UNKNOWN
C***** ARGUMENT DEFINITION
C      N= THE NUMBER OF DATA POINTS IN THE SAMPLE (INPUT)
C      ALPHA= THE CONFIDENCE COEFFICIENT (INPUT)
C      XHAT= THE SAMPLE MEAN OF THE DATA PROCESS (INPUT)
C      SHAT= THE SAMPLE STANDARD DEVIATION OF THE DATA
C            PROCESS (INPUT)
C      UL= THE LOWER CONFIDENCE LIMIT FOR THE MEAN (OUTPUT)
C      UU= THE UPPER CONFIDENCE LIMIT FOR THE MEAN (OUTPUT)
      SUBROUTINE CONFID(N, J)
      COMMON/THREE/ XHAT(3), SHAT(3)
      COMMON/FIVE/ ALPHA, UL(3), UU(3)
C***** COMPUTE THE DEGREES OF FREEDOM ASSOCIATED WITH THE SAMPLE
      NDF= N-1
C***** DETERMINE THE VALUE OF THE STUDENT'S T DISTRIBUTION AT A
C      SIGNIFICANCE LEVEL = ALPHA
C      NOTE -- THIS OPERATION USES A STAT-PACK FUNCTION CALLED STUDIN
C      TO CALCULATE THE INVERSE STUDENTS T VALUE GIVEN THE
C      CONFIDENCE COEFFICIENT ALPHA
      T= STUDIN(ALPHA, NDF, $10)
      GO TO 700
   10 WRITE(6,15)
   15 FORMAT(//,10X, 74H ERROR MESSAGE--OVERFLOW ON STUDENT'S T DISTRIBU
     1TION FUNCTION--FORMAT 700 )
      CALL EXIT
  700 CONTINUE
      YN= N
C***** COMPUTE THE LOWER CONFIDENCE LIMIT
      UL(J)= XHAT(J)- T*(SHAT(J)/SQRT(YN))
C***** COMPUTE THE UPPER CONFIDENCE LIMIT
      UU(J)= XHAT(J)+ T*(SHAT(J)/SQRT(YN))
      RETURN
      END

-RMAP
LIB SYSTEM*MATHSTAT.
-XQT

-RFOR,IS OUTPUT
      SUBROUTINE OUTPUT
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/TWO/ XMEAN(3), XVAR(3)
      COMMON/THREE/ XHAT(3), SHAT(3)
      COMMON/FIVE/ ALPHA, UL(3), UU(3)
      COMMON/SEVEN/ NC(5), DELTA
      COMMON/EIGHT/ NB(10), NPRIM(10), DIFF
      COMMON/NINE/ WIDTH(2)
      COMMON/TEN/ LOOP(2), KEY, DMAX
C***** PRINT HEADINGS FOR PRINTED OUTPUT
      DO 100 J=1, 2
      WRITE(6,15)
   15 FORMAT(1H1)
      IF( J .EQ. 2 ) GO TO 40
      WRITE(6,35)
   35 FORMAT(///, 40X, 45H DATA VALUES USED IN THE CLASSICAL ANALYSIS  )
      GO TO 50
   40 WRITE(6,45)
   45 FORMAT(///, 40X, 45H DATA VALUES USED IN THE BAYESIAN ANALYSIS   )
C***** PRINT BASIC PARAMETERS ASSOCIATED WITH EACH DATA PROCESS
   50 CONTINUE
      WRITE(6,52) N(J)
   52 FORMAT(///, 10X, 26H NUMBER OF OBSERVATIONS = , I4)
      WRITE(6,54) XMEAN(2)
   54 FORMAT(10X, 19H LIKELIHOOD MEAN = , F8.3)
      IF( J .EQ. 2 ) WRITE(6,56) XMEAN(1)
   56 FORMAT(1H+, T82, 14H PRIOR MEAN = , F8.3)
      WRITE(6,58) XVAR(2)
   58 FORMAT(10X, 23H LIKELIHOOD VARIANCE = , F8.3)
      IF( J .EQ. 2 ) WRITE(6,60) XVAR(1)
   60 FORMAT(1H+, T82, 18H PRIOR VARIANCE = , F8.3)
      WRITE(6,62) DELTA
   62 FORMAT(10X, 9H DELTA = , F4.2)
C***** PRINT THE DATA VALUES GENERATED BY RANDN
      WRITE(6,65) ( X(J,I), I=1, N(J) )
   65 FORMAT(///, 10(3X, F8.3))
C***** PRINT THE SAMPLE STATISTICS OF THE DATA PROCESS
      IF( J .EQ. 2 ) WRITE(6,72) XHAT(3)
   72 FORMAT(///, 10X, 47H THE MEAN OF THE POSTERIOR DISTRIBUTION, M-- =
     1 , F10.5)
      WRITE(6,75) XHAT(J), SHAT(J)
   75 FORMAT(///, 10X, 45H THE SAMPLE MEAN OF THE DATA PROCESS, XHAT = ,
     1  F10.5, ///, 10X, 59H THE SAMPLE STANDARD DEVIATION OF THE DATA P
     2ROCESS, SHAT = , F10.5 )
C***** PRINT THE (1-ALPHA) CONFIDENCE INTERVAL ASSOCIATED WITH EACH
C      PROCESS
      WIDTH(J)= DELTA*SHAT(J)
      WRITE(6,85) WIDTH(J)
   85 FORMAT(  //, 10X, 52H THE DESIRED WIDTH OF THE CONFIDENCE INTERVAL
     1 IS =  , F6.2 )
      UL(2)= UL(3)
      UU(2)= UU(3)
      WRITE(6,95) ALPHA, UL(J), UU(J)
   95 FORMAT(//, 10X, 47H THE (1-ALPHA) CONFIDENCE INTERVAL FOR THE MEAN
     1, /, 10X, 38H WITH CONFIDENCE COEFFICIENT, ALPHA = , F4.3,
     2 8H, IS = (, F8.3, 2H, , F8.3, 1H) )
      IF( J .EQ. 1 ) GO TO 97
      WRITE(6,98) DIFF
   98 FORMAT( //, 10X, 71H THE ABSOLUTE DIFFERENCE BETWEEN THE POSTERIOR
     1 AND SAMPLE MEANS, DIFF =, F6.3)
      WRITE(6,99) DMAX, KEY
   99 FORMAT(//, 10X, 7H DMAX =, F6.3, 10H AT LOOP =, I2)
   97 CONTINUE
      WRITE(6,96) LOOP(J)
   96 FORMAT(//, 10X, 9H LOOPS = , I2)
  100 CONTINUE
      RETURN
      END

APPENDIX II

FORTRAN PROGRAM FOR THE CHI-SQUARE TEST OF NORMALITY

-RFOR,IN CHISQ
C***** THIS SUBROUTINE TAKES A SET OF ORDERED DATA (ARRANGED FROM
C      THE LOWEST TO THE HIGHEST VALUE) AND
C      (1) ESTABLISHES K EQUAL-PROBABILITY CELLS, WHERE K DEPENDS ON
C          THE SAMPLE SIZE, I.E., K= 20 FOR N .GE. 100, K= 10 FOR
C          N .GE. 50 .AND. .LT. 100, AND K= 5 FOR N .LT. 50
C      (2) PERFORMS A CHI-SQUARE GOODNESS-OF-FIT TEST FOR NORMALITY
C          ON THE DATA SAMPLE AND DETERMINES THE SIGNIFICANCE LEVEL
C          AT WHICH WE CAN ASSUME THAT THE DATA SAMPLE IS IN FACT
C          REPRESENTATIVE OF A NORMAL PROCESS
C***** ARGUMENT DEFINITION
C      X= THE ARRAY OF DATA VALUES TO BE TESTED (INPUT)
C      N= THE NUMBER OF DATA POINTS (INPUT)
C      K= THE NUMBER OF CELLS INTO WHICH THE DATA IS DIVIDED (INPUT)
C      XHAT= THE SAMPLE MEAN OF THE DATA PROCESS
C      SHAT= THE SAMPLE STANDARD DEVIATION OF THE DATA PROCESS
C      CHIS= THE CHI-SQUARE STATISTIC COMPUTED FROM
C            THE DATA (OUTPUT)
C      SIGL= THE SIGNIFICANCE LEVEL OF THE TEST (OUTPUT)
      SUBROUTINE CHISQ
      COMMON/ONE/ X(500), N
      COMMON/TWO/ XMEAN, XVAR
      COMMON/FOUR/ K, KLESS1, CHIS, SIGL
      COMMON/SIX/ CBSTRD(19), CBNORM(19), KOUNT(20)
      DIMENSION ALPHA(19)
C***** SET ALL CELL COUNTERS TO ZERO
      DO 5 I=1, K
      KOUNT(I)= 0
    5 CONTINUE
C***** COMPUTE THE CELL-BREAK POINTS FOR THE GIVEN DATA
C      NOTE -- THIS OPERATION USES A STAT-PACK FUNCTION CALLED TINORM
C      TO COMPUTE THE VALUE OF THE INVERSE OF THE NORMAL (0,1) DISTR.
      IF(K-10) 10, 20, 30
   10 DO 15 I=1, KLESS1
      ALPHA(I)= 0.2*I
   15 CONTINUE
      GO TO 50
   20 DO 25 I=1, KLESS1
      ALPHA(I)= 0.1*I
   25 CONTINUE
      GO TO 50
   30 DO 35 I=1, KLESS1
      ALPHA(I)= 0.05*I
   35 CONTINUE
   50 DO 100 I=1, KLESS1
      CBSTRD(I)= TINORM(ALPHA(I), $70)
      CBNORM(I)= CBSTRD(I)*SQRT( XVAR )+ XMEAN
      GO TO 100
   70 WRITE(6,90)
   90 FORMAT( //, 10X, 68H ERROR MESSAGE--OVERFLOW ON INVERSE NORMAL DIS
     1TRIBUTION--FORMAT 100 )
  100 CONTINUE
C***** COUNT THE NUMBER OF OBSERVATIONS FALLING IN EACH CELL
      DO 300 I=1, N
      DO 200 J=1, KLESS1
      IF( X(I) .GT. CBNORM(J) ) GO TO 190
      KOUNT(J)= KOUNT(J)+1
      GO TO 300
  190 IF( J .EQ. KLESS1 ) KOUNT(K)= KOUNT(K)+1
  200 CONTINUE
  300 CONTINUE
C***** COMPUTE THE CHI-SQUARE STATISTIC, CHIS
      CHI1= 0.0
      RN= FLOAT(N)/FLOAT(K)
      DO 500 I=1, K
      CHI1= CHI1+ ( KOUNT(I)-RN )**2
  500 CONTINUE
      CHIS= CHI1/RN
C***** DETERMINE THE SIGNIFICANCE LEVEL OF THE TEST
C      NOTE -- THIS OPERATION USES A STAT-PACK FUNCTION CALLED CHI TO
C      DETERMINE THE CHI-SQUARE DISTRIBUTION GIVEN THE POINT AND
C      THE DEGREES OF FREEDOM
      NDF= K-3
      CUMD= CHI( CHIS, NDF, $600 )
      SIGL= 1.0- CUMD
      GO TO 690
  600 WRITE(6,610) CHIS
  610 FORMAT(//,10X, 74H ERROR MESSAGE--OVERFLOW ON CHI-SQUARE DISTRIBUT
     1ION FUNCTION--FORMAT 600  ,24H CHI-SQUARE STATISTIC = , F5.2 )
  690 CONTINUE
      RETURN
      END

BIBLIOGRAPHY

1. Anscombe, F. J., "Bayesian Statistics," The American Statistician, Vol. 15, 1961, pp. 21-24.

2. Army Regulation No. 10-4, "U. S. Army Operational Test and Evaluation Agency," Headquarters, Department of the Army, Washington, D. C., 1974.

3. Army Regulation No. 71-3, "User Testing," Headquarters, Department of the Army, Washington, D. C., 1974.

4. Army Regulation No. 1000-1, "Basic Policies for Systems Acquisition by the Department of the Army," Headquarters, Department of the Army, Washington, D. C., 1971.

5. Atzinger, E. M., and Brooks, W. J., Comparison of Bayesian and Classical Analysis for a Class of Decision Problems, Technical Report No. 59, April 1972, U. S. Army Materiel Systems Analysis Agency, Aberdeen Proving Ground, Maryland.

6. Atzinger, E. M., Brooks, W. J., et al., Compendium on Risk Analysis Techniques, Special Publication No. 4, July 1972, U. S. Army Systems Analysis Agency, Aberdeen Proving Ground, Maryland.

7. Box, G. E. P., and Muller, M. E., "A Note on the Generation of Random Normal Deviates," Annals of Mathematical Statistics, Vol. 29, 1958, pp. 21-24.

8. Gilbreath, S. G., A Bayesian Procedure for the Design of Sequential Sampling Plans, Unpublished Doctoral Dissertation, Georgia Institute of Technology, Atlanta, Georgia, 1966.

9. Hines, W. W., and Montgomery, D. C., Probability and Statistics in Engineering and Management Science, New York: The Ronald Press Company, 1972.

10. Hoel, P. G., Port, S. C., and Stone, C. J., Introduction to Statistical Theory, Boston: Houghton Mifflin Company, 1971.

11. OTEA, "MICV Firing Port Weapon Operational Test I (MICV FPW OT I) Test Design Plan," U. S. Army Operational Test and Evaluation Agency, Ft. Belvoir, Virginia, September 1973.

12. Mace, A. E., Sample Size Determination, New York: Reinhold Publishing Corp., 1964.

13. Raiffa, H., and Schlaifer, R., Applied Statistical Decision Theory, Boston: The MIT Press, 1961.

14. UNIVAC, "Large Scale Systems STAT-PACK and MATH-PACK Programmer's Reference," New York: Sperry Rand Corp., 1st Revision, 1970.

15. White, L. R., Bayesian Reliability Assessment for Systems Program Decisions, Unpublished Masters Dissertation, Air Force Institute of Technology, Wright-Patterson AFB, Ohio, 1971.

16. Winkler, R. L., Introduction to Bayesian Inference and Decision, New York: Holt, Rinehart and Winston, Inc., 1972.