ON CONTROLLED SAMPLING SCHEMES WITH APPLICATIONS TO SAMPLE SURVEYS

THESIS SUBMITTED TO THE KUMAUN UNIVERSITY, NAINITAL BY ILA PANT FOR THE AWARD OF THE DEGREE OF DOCTOR OF PHILOSOPHY IN STATISTICS

UNDER THE SUPERVISION OF Dr. NEERAJ TIWARI, READER AND CAMPUS HEAD, DEPARTMENT OF STATISTICS, KUMAUN UNIVERSITY, S.S.J. CAMPUS, ALMORA-263601, UTTARAKHAND (INDIA)

2008

CERTIFICATE

This is to certify that the thesis entitled "ON CONTROLLED SAMPLING SCHEMES WITH APPLICATIONS TO SAMPLE SURVEYS", submitted to the Kumaun University, Nainital for the degree of DOCTOR OF PHILOSOPHY IN STATISTICS, is a record of bona fide work carried out by Mrs. ILA PANT under my guidance and supervision. I hereby certify that she has completed the research work for the full period as required in Ordinance 6. She has put in the required attendance in the Department and signed the prescribed register during the period. I also certify that no part of this thesis has been submitted for any other degree or diploma.

(Dr. Neeraj Tiwari)
Reader and Campus Head, Department of Statistics, Soban Singh Jeena Campus, Almora (Uttarakhand)

ACKNOWLEDGEMENT

I am very grateful to my supervisor Dr. Neeraj Tiwari, Reader and Campus Head, Department of Statistics, Soban Singh Jeena Campus, Almora, for excellently guiding and constantly supporting my research work on "ON CONTROLLED SAMPLING SCHEMES WITH APPLICATIONS TO SAMPLE SURVEYS". I have learned a lot while working with my supervisor Dr. Neeraj Tiwari.

I have no words to express my feelings for the constant encouragement and help I received from my loving husband, Mr. Raj Kishore Bisht.

I would like to express my gratitude to all faculty members of the Department of Statistics, S.S.J. Campus, Almora. I am very thankful to Mr. Girish Kandpal, Mr. Virendra Joshi, Mrs. Ruchi Tiwari, Mr. Girja Pandey, Mr. Lalit Joshi and Mr. C. M. S. Adhikari for their co-operation in completing my research work. I convey my sincere gratitude to the staff of the Department of Economics and Statistics, Udham Singh Nagar, for their support in the completion of my research work.

I find no words to express my feelings for the constant encouragement, blessings and inspiration I received from my father Mr. H. C. Pant, mother Mrs. Bimla Pant, father-in-law Mr. D. K. Bisht, elder sister Mrs. Himanshi Gunwant and younger brother Mr. Bhaskar Pant.

Help received from Banaras Hindu University (BHU) and IASRI, Pusa, New Delhi is also gratefully acknowledged.

Last but not least, I am deeply grateful to God for giving me strength, patience and a very supportive family.

Date: (ILA PANT)

CONTENTS
CHAPTER I: INTRODUCTION
1.1 Historical background of controlled selection
1.2 Controlled selection: concept and definition
1.3 Review of literature
1.4 Estimates of the variances
1.5 Frame-work of the thesis

CHAPTER II: ON AN OPTIMAL CONTROLLED NEAREST PROPORTIONAL TO SIZE SAMPLING SCHEME
2.1 Introduction
2.2 The optimal controlled sampling design
2.3 Examples

CHAPTER III: TWO DIMENSIONAL OPTIMAL CONTROLLED NEAREST PROPORTIONAL TO SIZE SAMPLING DESIGN USING QUADRATIC PROGRAMMING
3.1 Introduction
3.2 The two dimensional optimal controlled nearest proportional to size sampling design
3.3 Examples
3.4 Variance estimation for the proposed plan

CHAPTER IV: ON STATISTICAL DISCLOSURE CONTROL USING RANDOM ROUNDING AND CELL PERTURBATION TECHNIQUES
4.1 Introduction
4.2 Controlled cell perturbation: the proposed methodology
4.3 Examples

CHAPTER V: OPTIMAL CONTROLLED SELECTION PROCEDURE FOR SAMPLE CO-ORDINATION PROBLEM USING LINEAR PROGRAMMING
5.1 Introduction
5.2 The optimal controlled procedure
5.3 Examples

CHAPTER VI: THE APPLICATION OF FUZZY LOGIC TO THE SAMPLING SCHEME
6.1 Introduction
6.2 Fuzzy logic approach
6.3 The proposed procedure

CHAPTER VII: SUMMARY

REFERENCES

CHAPTER I
INTRODUCTION

1.1 HISTORICAL BACKGROUND OF CONTROLLED SELECTION

In most practical situations it is not possible to collect information about each and every unit of the population from which we have to draw conclusions, because doing so is very costly and time consuming. In such situations a device known as the sample survey is used to draw inferences about the population. A sample survey consists of selecting a part of a finite population and then making inferences about the entire population on the basis of the selected part. The purpose of sampling theory is to develop methods of sample selection and of estimation that provide estimates precise enough to support conclusions about the population.

Many sampling procedures exist in the literature, of which the simplest is simple random sampling (SRS), in which each and every unit of the population has an equal chance of being included in the sample, i.e. there is no restriction on the selection of sampling units. The procedure of simple random sampling may therefore result in the selection of a sample which is not quite desirable. For example, suppose we conduct a sample survey using SRS. It may happen that the selected sampling units are unimportant from the point of view of the character under study, or that they are geographically spread out, thereby not only increasing travel expenditure and wasting time but also adversely affecting the supervision of the fieldwork. All these factors would seriously affect the quality of the data collected and the precision of the estimate of the parameter. Hence arises the need for a suitable sampling methodology in which some controls are imposed to reduce the risks mentioned above.

The concept of introducing controls in sampling procedures originated with the procedure of stratified sampling, which provides the opportunity to represent all the homogeneous sub-groups within a heterogeneous population.
For example, suppose we want to perform a sample survey to know the status of women in Uttarakhand and we have to select a sample of size 4 from 13 districts. If we use SRS, then all 13 districts have the same chance of being selected in the sample. It may happen that the selected sample consists entirely of districts from the hill area or entirely of districts from the plain area, and thus cannot fulfil the purpose of the survey, because such a sample would represent the status of women belonging only to the hilly areas or only to the plain areas, respectively. Stratified sampling would be advantageous in this situation, as we can form two strata of the whole population, one consisting of districts belonging to hill areas and the other of districts belonging to plain areas. Two districts can then be selected from each stratum, and the resulting sample would more precisely represent the status of women in Uttarakhand. In this example we observe that a sample consisting of districts from both the hilly and the plain areas appears to be the most precise and desirable. Such samples are termed "preferred" or desirable samples, and all other samples are termed "non-preferred" or undesirable samples. Thus in this example stratified sampling has been used to impose a restriction or control (in the form of representing both the hilly and plain regions of the state) on the selection of the sample.

Systematic sampling can also be used to achieve control. In systematic sampling only k preferred samples, each with equal probability of selection 1/k, remain possible, and the remaining NCn − k non-preferred samples have zero probability of selection.
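To make this restriction concrete, the following small sketch (in Python; the function name and the choice of N and n are ours, purely for illustration) implements linear systematic sampling and shows that only k = N/n of the NCn possible samples can ever be realized:

    import random

    def systematic_sample(N, n):
        # Linear systematic sampling: choose a random start in 0..k-1 and
        # take every kth unit; only k = N/n distinct samples are possible,
        # each with probability 1/k (assumes N is a multiple of n).
        k = N // n
        start = random.randrange(k)
        return [start + j * k for j in range(n)]

    # For N = 12 and n = 3, only k = 4 samples can arise, namely {0, 4, 8},
    # {1, 5, 9}, {2, 6, 10} and {3, 7, 11}; the other 12C3 - 4 = 216
    # samples have zero probability of selection.
    print(systematic_sample(12, 3))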
Besides stratified sampling and systematic sampling, there also exist sampling procedures which impose one or more restrictions on the selection of the sampling units. In fact, any departure from SRS can be considered a control, which increases the probability of selecting preferred combinations of units and consequently decreases the selection probability of non-preferred combinations. In all these sampling procedures the selection of the sample is partially controlled. But in many cases situations may arise in which it is necessary to control the selection of the sampling units further. In these situations all the above sampling procedures may not suffice, and the sampler has to use another method of sampling with some more restrictions (controls) on the selection of the sampling units. Hence arises the need for a suitable sampling procedure which reduces the risk of obtaining a non-preferred sample from the population and increases the selection probability of preferred sample combinations. Sampling designs which eliminate, or assign very small selection probability to, the non-preferred samples by imposing controls while selecting samples are called controlled sampling designs, and the procedure of selecting samples using these designs is known as "controlled selection" or "controlled sampling".

The term "controlled selection" or "controlled sampling" is rather uncommon in the field of sample surveys; however, the need for this technique was long felt. The problem of imposing controls while selecting national samples in the U.S. was discussed by Frankel and Stock (1942) and Goodman and Kish (1950). A slightly modified version of controlled selection was adopted for more general use by Hess, Riedel and Fitzpatrick (1961) in the selection of hospitals and patients, using the data for the 1961 universe of non-federal, short-term general medical hospitals in the United States. To further examine the relative advantages of controlled selection, Waterton (1983) used the technique on data from a postal survey of Scottish school leavers carried out in 1977. In recent years there has been a great deal of work in the field of controlled selection due to its practical importance. This is discussed in Section 1.3.

1.2 CONTROLLED SELECTION: CONCEPT AND DEFINITION

"Controlled selection" or "controlled sampling", as the name suggests, is a method of selecting samples from a finite population by imposing certain restrictions or controls while selecting the population units in the sample. While selecting the sampling units, the sampler has to keep in mind certain facts about the survey, such as its cost, the time taken to complete it and other related factors. It may happen that the selected sample consists of units which are very costly and do not fit into the budget of the survey, or of units spread so far apart that the completion time of the survey is increased. The sampler wants to avoid these types of sample combinations, and for this purpose has to impose certain restrictions on the selection of the sample. These restrictions may be of different kinds, depending upon the requirements of the particular sampling plan. Due to these restrictions, some combinations of units become preferable and other combinations non-preferable. The technique of controlled selection is used in sampling to minimize, as far as possible, the probability of selecting the non-preferred samples, while conforming strictly to the requirements of probability sampling.

Although the concept of controlled selection has been used by statisticians for a long time, it has received considerable attention in recent years due to its practical importance. Controlled selection has been defined by various authors in the following ways.

According to Goodman and Kish (1950), controlled selection is "any process of selection in which, while maintaining the assigned probability for each unit, the probabilities of selection for some or all preferred combinations of n out of N units are larger than in stratified random sampling (and the corresponding probabilities of selection for at least some non-preferred combinations are smaller than in stratified random sampling)".

Wilkerson (1960) describes controlled selection as "the probability selection of a sample pattern from a set of patterns, which have been purposively established, so that, taken as a group, they give to each primary sampling unit its proper chance of appearing in the final sample. Each pattern is set up in accordance with controls, which may be as rigid as desired to ensure that it satisfies selected criteria of proper distribution".

According to Hess and Srikantan (1966), "controlled selection, a technique of sampling from finite universe, permits multiple stratification beyond what is possible by stratified random sampling, while conforming strictly to the requirements of probability sampling".

Controlled selection is applicable in many other areas.
Some of the applications of controlled selection are given in the following subsection.

1.2.1 Applications of Controlled Selection to Statistical Problems:

The concept of controlled selection is applicable not only in the field of sampling; it is related to many other fields and can be used for many purposes. Controlled selection and controlled rounding are closely related, as the controlled rounding procedure can be used to solve the problem of controlled selection. Statistical disclosure control (SDC) is one of the areas where controlled rounding can be used, and hence the concept of controlled selection can also be used for the purpose of SDC. SDC can be defined as a technique for the generation and dissemination of statistical information concerned with managing the risks of disclosing information about respondents in a table. Since not all the cells in a table are confidential, we have to protect only those cells which contain confidential information; these cells are called sensitive cells. Thus in SDC we have to impose certain restrictions on the publication of the sensitive cells so that no one can recover the confidential information. Here we are imposing certain restrictions on the publication of sensitive cells, and thus we are using the concept of controlled selection in a different way.

Another area where controlled selection can be used is the overlap of sampling units in two or more different surveys. In the overlap procedure we either keep as many units as possible common to the different surveys (called maximization of overlap of sampling units) or try to avoid common units in the different surveys (called minimization of overlap of sampling units). Since in both procedures we have to impose certain restrictions or controls while conducting the surveys, so that we obtain the maximum or minimum number of units common to the different surveys, we are again using the concept of controlled selection.

Again, if we have some prior knowledge about the population to be surveyed, we can use this information to improve the efficiency of the survey. In probability proportional to size (PPS) sampling we assign probabilities to the population units according to their size, but in some situations auxiliary information related to the population units may also be available. This information can also be utilized when assigning the selection probabilities to the population units, to increase the efficiency of the survey. To utilize the auxiliary information we use the concept of fuzzy logic. More information about the population units allows us to impose more restrictions when assigning probabilities to them, and thus the sampling procedure becomes more efficient. Here we are again imposing controls while assigning the initial selection probabilities to the population units, and thus we are using the concept of controlled selection. In the following paragraphs we briefly describe the basic ideas of fuzzy logic and fuzzy inference systems.

(a) Fuzzy Logic: Fuzzy logic is a logic which deals with values that are approximate rather than exact. Classical logic relies on propositions being either true or false.
A true proposition is usually assigned the value 1 and a false one the value 0. Thus something either completely belongs to a set or is completely excluded from it. Fuzzy logic broadens this definition of classical logic. The basis of the logic is fuzzy sets. Unlike classical sets, where membership is full or none, an object is allowed to belong only partly to a set. The membership of an object in a particular set is described by a real value between 0 and 1. Thus, for instance, an element can have a membership value of 0.5, which describes 50% membership in the given set. Such logic allows much easier treatment of many problems that cannot easily be handled using the classical approach.

(b) Fuzzy Inference System: Using a fuzzy inference system, we can utilize all the auxiliary information to arrive at the final results. The MATLAB Fuzzy Logic Toolbox provides a facility for constructing fuzzy inference systems. For this purpose one needs to choose the baseline model, i.e. the input variables, output variables, implication method, aggregation method and defuzzification method. The construction of rules is an important part of a fuzzy inference system; rules can be defined from common knowledge about the required inference procedure.
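As a simple illustration of partial membership (a toy sketch in Python, not part of the thesis methodology; the fuzzy set and its breakpoints are arbitrary), a triangular membership function may be written as:

    def triangular_membership(x, a, b, c):
        # Degree (between 0 and 1) to which x belongs to a fuzzy set with
        # support [a, c] and full membership at the peak b.
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)

    # Membership of units in the fuzzy set "large size" on a normalized
    # scale, with arbitrary breakpoints a = 0.1, b = 0.5, c = 0.9:
    for size in (0.05, 0.3, 0.5, 0.7):
        print(size, round(triangular_membership(size, 0.1, 0.5, 0.9), 2))
    # 0.05 lies fully outside the set (membership 0.0), 0.3 belongs
    # partly (0.5) and 0.5 belongs fully (1.0).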
1.3 REVIEW OF LITERATURE

In this section we describe the earlier work done on different aspects of controlled selection and its related topics. Subsection 1.3.1 describes the work done on controlled selection in one and two dimensions. Subsection 1.3.2 describes the work done on controlled rounding and disclosure control, and subsection 1.3.3 the work done on the sample co-ordination problem. In the last subsection, 1.3.4, we briefly review the work done on fuzzy logic and the other areas in which fuzzy logic has been applied.

1.3.1 Controlled Selection in One and Two Dimensions:

The technique of controlled selection was originally formulated by Goodman and Kish (1950). They applied it to the specific problem of selecting twenty-one primary sampling units to represent the North Central States, and found that, by the use of this technique, the between-first-stage-unit components of the variance were reduced by 11% to 32% below the corresponding components under stratified random sampling.

Hess and Srikantan (1966) used the data for the 1961 universe of non-federal, short-term general medical hospitals in the United States to illustrate the application of estimation and variance formulae for controlled selection. They pointed out several advantages that can be expected from controlled selection over one-way stratified random sampling:

1. Controls may be imposed to secure proper distribution, geographically or otherwise, and to ensure adequate sample size for subgroups that are domains of study.
2. To secure a moderate reduction in the sampling errors of a multiplicity of characters simultaneously.
3. To secure a significant reduction of the sampling error in the global estimates of specified key variables.

Following the study of Hess and Srikantan (1966), Waterton (1983) used the data available from a postal survey of Scottish school leavers carried out in 1977 to describe the advantages of controlled selection and to compare its efficiency with multiple proportionate stratified random sampling.

Different approaches have been given by various authors to implement controlled selection. These may be broadly classified into three categories, namely:

1. the method of typical experimental design configurations;
2. the method of emptying boxes;
3. the method of linear programming.

We now briefly review some of the work done by various authors on each of these three approaches.

First, consider the method of typical experimental design configurations. Chakrabarti (1963) was the first to use a balanced incomplete block design (BIBD) with parameters v = N, b < NCn, k = n, r and λ to construct controlled simple random sampling without replacement designs, where N is the population size, n > 2 is the sample size and the parameters v, b, r, k, λ have their usual meanings. The method discussed by Chakrabarti has certain limitations, as a BIBD does not exist for many combinations of v and k. For instance, no BIBD exists for v = 8, k = 3 with b < 8C3 = 56 blocks. To overcome this drawback, BIBDs with repeated blocks were used by Wynn (1977) and Foody and Hedayat (1977). Avadhani and Sukhatme (1973) also worked on this method, using BIBDs, to minimize the probability of selecting a non-preferred sample.

Gupta, Nigam and Kumar (1982) extended the idea of experimental design configurations to obtain controlled sampling designs with inclusion probability proportional to size (IPPS) by using BIBDs. Nigam, Kumar and Gupta (1984) used typical configurations of different types of experimental designs, such as BIBDs with or without repeated blocks, supplemented blocks, partially balanced incomplete block designs and cyclic designs, for obtaining controlled IPPS sampling plans with the property cπiπj ≤ πij ≤ πiπj for all i ≠ j = 1, 2, …, N, where c is some positive constant, 0 < c < 1. Gupta, Srivastava and Reddy (1989) used binary incomplete connected block designs to construct controlled IPPS sampling designs. Srivastava and Saleh (1985) and Mukhopadhyay and Vijayan (1996) suggested the use of 't-designs' in place of simple random sampling without replacement (SRSWOR) designs to construct controlled sampling designs.

The work on two-dimensional controlled selection problems was mainly due to Patterson (1954), Yates (1960) and Jessen (1969, 1970, 1973, 1975 and 1978), under the titles 'lattice sampling', 'two-way stratification' and 'multi-stratification'. Patterson (1954) and Yates (1960) examined the case in which the cells are of equal size and the marginal constraints are integers, all being equal in the case of squares. They also discussed the case of a rectangle where the marginal constraints of the rows/columns are a multiple of those for the other dimension. They considered the method of selection as well as the properties of such samples. Yates suggested the name 'lattice sampling' for his schemes.

Jessen (1969) discussed four methods of selecting a PNR (probability non-replacement) sample and analyzed their properties. These methods provide samples in which the probability of including the ith element is proportional to the size of the element. His 'Method 2' is superior to 'Method 1' in the sense that it involves fewer steps. 'Method 3' provides positive πij's for all element pairs, although it is complex in nature. 'Method 4' is limited to size n = 2, whereas the other methods can be used for any n. Jessen (1970) considered the general problem of sampling from a multidimensional universe with the objective of selecting samples that are representative of the universe in each sample dimension as well as jointly. Jessen (1973) examined some properties of simple two-way probability lattice sampling, i.e.
selecting a set of cells of unequal sizes where the probabilities of selection of the cells are proportional to their sizes and the sample sizes along rows and columns are fixed. Jessen (1975) discussed the construction and the related estimation problems for square and cubic lattices in the case of 'random lattices' and 'probability lattices'. Jessen (1978) summarized his earlier work and extended it to more general situations.

The second approach, known as the 'method of emptying boxes', was proposed by Hedayat and Lin (1980). They proposed this method to construct controlled IPPS sampling plans satisfying πij > 0 and πij < πiπj for all i ≠ j = 1, 2, …, N. The method is quite close to the decremental method of Jessen (1969).

The third approach, used extensively in recent years, is the linear programming approach.

Causey, Cox and Ernst (1985) were the first to use the transportation model to solve two- and higher-dimensional controlled selection problems. Using the transportation model, they developed an algorithm for controlled selection which completely solves the two-dimensional problem. They showed with the help of an example that a solution of the three-dimensional controlled selection problem does not always exist. They also provided a method for maximizing and minimizing the overlap of sampling units in two different surveys.

Rao and Nigam (1990, 1992) used the simplex method in linear programming to solve one-dimensional controlled selection problems for two different situations, namely (1) controlled sampling designs for specified πij's (and hence πi's) and (2) controlled sampling designs for specified πi's, and hence πij's, subject to the constraints cπiπj ≤ πij ≤ πiπj for all i ≠ j = 1, 2, …, N, where c is some positive constant, 0 < c < 1. Their approach provides optimal solutions and is superior to all previous work done in this direction; however, they did not consider two- and higher-dimensional controlled selection problems.

Sitter and Skinner (1994) also proposed a linear programming approach, applying the ideas of Rao and Nigam to multi-way stratification. However, they did not consider controls beyond stratification, and the computations required by their procedure increase rapidly as the number of cells in the multi-way classification increases. Tiwari and Nigam (1998) suggested a method for two-dimensional controlled selection using the simplex method in linear programming. Their method derives its inspiration from the optimal controlled sampling designs of Rao and Nigam (1990, 1992). They also proposed an alternative variance estimator for controlled selection designs, as the Horvitz-Thompson estimator could not be used with their plan due to non-fulfilment of the condition πij ≤ πiπj. Lu and Sitter (2002) developed some methods to reduce the amount of computation, so that very large problems became feasible using the linear programming approach.

Tiwari et al. (2007) proposed an optimal controlled sampling design for one-dimensional controlled selection problems, using quadratic programming to obtain the design. This design ensures that the probability of selecting non-preferred samples is exactly zero, rather than merely minimized, without sacrificing the efficiency of the Horvitz-Thompson estimator based on an associated uncontrolled IPPS sampling plan.
The idea of 'nearest proportional to size sampling designs', introduced by Gabler (1987), is used to construct the proposed design. For variance estimation the Yates-Grundy form of the Horvitz-Thompson estimator can be used, as the proposed procedure satisfies the necessary and sufficient conditions required for the use of the H-T variance estimator.

1.3.2 Controlled Rounding/Disclosure Control:

Rounding techniques involve the replacement of the original data by multiples of a given rounding base. Rounding methods are used for many purposes, such as improving the readability of data values, controlling statistical disclosure in tables, solving the problem of iterative proportional fitting (or raking) in two-way tables, and controlled selection. Statistical disclosure control is one of the areas in which rounding methods are widely used. In the following paragraphs we first discuss the work done on controlled rounding and then some of the work done by different authors on disclosure control.

Fellegi (1975) proposed a random rounding technique for one-dimensional tables. This technique unbiasedly rounds the cell values and also maintains the additivity of the rounded table. The drawback of the method is that it can be applied only to one-dimensional tables.

Cox and Ernst (1982) proposed a method based on the transportation model to solve completely the controlled rounding problem, i.e. the problem of optimally rounding the real-valued entries in a two-way tabular array to adjacent integer values in a manner that preserves the tabular (additive) structure of the array. The method consists of the replacement of a real number a by an adjacent integer value R(a), where R(a) equals either ⌊a⌋ or ⌊a⌋ + 1, with ⌊a⌋ the integer part of a. Here the rounding base is 1; any problem with rounding base B can be reduced to rounding base 1 by dividing all entries by B. If A denotes the tabular array, then R(A) is an optimal controlled rounding of A if the pth root of the sum of the pth powers of the absolute values of the differences between the values in A and R(A) is minimized. Also, if |R(a) − a| < 1, the rounding is referred to as "zero-restricted controlled rounding".

Cox (1987) presented a constructive algorithm for achieving unbiased controlled rounding in two and three dimensions, which is simple to implement by hand for small to medium-sized tables. According to Nargundkar and Saveland (1972), a rounding procedure is said to be unbiased if E(R(a)) = a, i.e. if the expected value of the rounded entry equals the corresponding original (unrounded) entry of the given table. The procedure of Cox (1987) is based on the concept of an 'alternating row-column path' in an array. Its main drawback is that it is somewhat arbitrary and sometimes needs a large number of iterations to reach a solution. Thus, without changing the basic concept of the method of Cox (1987), Tiwari and Nigam (1993) introduced a method for unbiased controlled rounding which terminates in fewer steps.
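The unbiasedness requirement E(R(a)) = a of Nargundkar and Saveland (1972) is easy to realize for a single entry. The sketch below (in Python; a minimal illustration of the idea, not Fellegi's full procedure) rounds an entry to a multiple of a base B, rounding up with probability r/B, where r is the remainder; unlike Fellegi (1975), it treats cells independently and makes no attempt to preserve the additivity of a table:

    import random

    def random_round(a, B):
        # Unbiased random rounding to a multiple of B: round up with
        # probability r/B and down otherwise, so that E[R(a)] = a.
        q, r = divmod(a, B)
        return B * (q + 1) if random.random() < r / B else B * q

    # A value of 7 rounded to base 5 becomes 10 with probability 2/5 and
    # 5 with probability 3/5; its expectation is 10(2/5) + 5(3/5) = 7.
    print(random_round(7, 5))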
As discussed earlier, one of the methods of achieving statistical disclosure control is controlled rounding. All the methods of controlled rounding discussed above can be used for SDC, but there also exist other methods in the literature for SDC, such as cell suppression, partial cell suppression and cell perturbation. Cell suppression can be defined as a method in which sensitive cells are not published, i.e. they are suppressed. To make sure that the suppressed cells cannot be derived by subtraction from published marginal totals, additional cells are selected for suppression; these cells are known as complementary suppressions. In the method of cell suppression one has to find the complementary suppressions in such a way that the loss of information is minimum. Different methods of cell suppression have been discussed by various authors, such as Cox (1980), Sande (1984), Carvalho et al. (1994) and Fischetti and Salazar (2000).

The method of partial cell suppression was discussed by Fischetti and Salazar (2003). In this method, instead of wholly suppressing primary and complementary cells, intervals obtained with the help of a mathematical model are published for these cell entries. The loss of information in partial cell suppression is smaller in comparison to complete cell suppression.

In order to reduce the amount of data loss that occurs from cell suppression, Salazar (2005) proposed an improved method, termed "cell perturbation". This method is closely related to the classical controlled rounding methods and has the advantage that it also ensures the protection of sensitive cells to a specified level while minimizing the loss of information. However, the method also has some disadvantages. First, it perturbs all the cell values, resulting in a large amount of data loss. Second, the marginal cell values of the resultant tables are not preserved, thereby disturbing marginals which are non-sensitive and expected to be published in their original form.

1.3.3 Sample Co-ordination Problem:

The problem of co-ordination of sampling units has been a topic of interest for more than fifty years. Different methods have been proposed by various authors to solve the sample co-ordination problem; some of them are given in the following paragraphs.

The first approach to the sample co-ordination problem was discussed by Keyfitz (1951), who proposed an optimum procedure for selecting one-unit-per-stratum designs when the initial and new designs have identical stratification, with only a change in the selection probabilities. Fellegi (1963, 1966), Gray and Platek (1963) and Kish (1963) also proposed methods for the sample co-ordination problem, but these methods are in general restricted either to two successive samples or to small sample sizes. To solve the problem in the context of a larger sample size, Kish and Scott (1971) proposed a method for the sample co-ordination problem. Brewer et al. (1972) introduced the concept of the permanent random number (PRN) for solving the sample co-ordination problem.

The linear programming approach to the sample co-ordination problem was first discussed by Causey et al. (1985), who proposed an optimum linear programming procedure for maximizing the expected number of sampling units common to the two designs, when the two sets of sample units are chosen sequentially. Ernst and Ikeda (1995) also presented a linear programming procedure for overlap maximization under very general conditions. Ernst (1996) developed a procedure for the sample co-ordination problem with one-unit-per-stratum designs where the two designs may have different stratifications. Ernst (1998) proposed a procedure for the sample co-ordination problem with no restriction on the number of sample units per stratum, but with the requirement that the stratifications be identical.
Both of the procedures proposed by Ernst (1996, 1998) use the controlled selection algorithm of Causey, Cox and Ernst (1985) and can be used for simultaneous as well as sequential sample surveys. Ernst and Paben (2002) proposed a new methodology for the sample co-ordination problem, based on the procedures of Ernst (1996, 1998). This procedure places no restriction on the number of sample units selected per stratum and also does not require that the two designs have identical stratification. Recently, Matei and Tillé (2006) proposed a methodology for the sample co-ordination problem for two sequential sample surveys. They proposed an algorithm, based on iterative proportional fitting (IPF), to compute the probability distribution of a bi-design. Their method can be applied to any type of sampling design for which it is possible to compute the probability distribution of both samples.

1.3.4 Fuzzy Logic Approach:

The concept of fuzzy logic was introduced in 1965 by Lotfi Zadeh. Zadeh published his seminal work as "Fuzzy Sets", in which he described the mathematics of fuzzy set theory and, by extension, fuzzy logic. In fuzzy set theory, Zadeh proposed making the membership function (or the values False and True) operate over the range of real numbers [0.0, 1.0]. Zadeh (1965) described that if A is a fuzzy set and x is a relevant object, then the proposition "x is a member of A" is not necessarily either true or false, as required by classical logic; it may be true only to some degree, the degree to which x is actually a member of A. Albert (1978) defined some basic concepts of the algebra of fuzzy logic. Frühwirth-Schnatter (1992) used fuzzy data in statistical inference and applied them to descriptive statistics. Frühwirth-Schnatter (1993) again used the concept of fuzzy logic in Bayesian inference. Doherty, Driankov and Hellendoorn (1993) described fuzzy if-then-unless rules and their implementation in fuzzy logic. Azmi (1993) used some statistical and mathematical tools to define the fuzzy approach in operations research. Various other authors have used the concept of fuzzy logic in different areas, such as Bellman and Zadeh (1970), Dockery and Murray (1987), Biswal (1992) and Bit, Biswal and Alam (1992).

1.4 ESTIMATES OF THE VARIANCES

One of the problems which needs attention when dealing with two or more stratification variables is variance estimation. Estimating the variance of the estimator is necessary for practical purposes. Various authors have proposed different procedures for obtaining estimates of the variances for controlled selection designs; we discuss some of these in the following paragraphs.

To demonstrate the utility of controlled selection in reducing the variance of the key estimates, Goodman and Kish (1950) drew 100 samples of 17 units each, using the method of controlled selection, for the population of the North Central States of the U.S.A. The individual units within the selected groups (samples) were chosen with PPS sampling. The mean of each of the 100 samples was calculated and the variance among those 100 means obtained. The variance of the stratified random selection was calculated with the help of the standard formula for stratified sampling. Thus the 'between' components of the variances were obtained.
The 'within' components of the variances were obtained by using the appropriate formula for a simple random sample of n cases within each of the 17 first-stage units, under the assumption that the variance within each of these units is the same. Of the 8 items considered by the authors, for the first four items (items 1, 2, 3 and 4) the between component is a crucial part of the total variance, while for the next four items (items 5, 6, 7 and 8) the between component is relatively unimportant. The authors showed that for the first four items the use of controlled selection resulted in significant reductions in the between variances, and hence a significant reduction in the total variances; but for the next four items, for which the between components are unimportant, the reductions in the total variances were marginal. Thus, from Goodman and Kish's point of view, it may be concluded that for items in which the between component of the variance plays an important role, the use of controlled selection is highly justified.

Goodman and Kish (1950) proposed a procedure for uncontrolled high entropy plans (high entropy meaning the absence of any detectable pattern or ordering in the selected sample units). The expression for the variance of ŶHT, correct to O(N⁻²), using the procedure of Goodman and Kish (1950), is given as

V(ŶHT)GK = (1/nN²) [ Σi∈S piAi² − (n − 1) Σi∈S pi²Ai² ]
− ((n − 1)/nN²) [ 2 Σi∈S pi³Ai² − (Σi∈S pi²)(Σi∈S pi²Ai²) − 2 (Σi∈S pi²Ai)² ],

where Ai = Yi/pi − Y and Y = Σ_{i=1}^{N} Yi.

Jessen (1970) proposed the Horvitz-Thompson estimator and used the expressions given by Yates and Grundy (1953) for calculating the variance of the estimator, and for unbiased estimation of the variance, in the case of probability sampling with marginal constraints. The sampling procedure with unequal probabilities and without replacement, in which the inclusion probability of the ith unit in a sample of size n is πi = npi (pi being the probability of selecting the ith unit of the population at the first draw), is known as IPPS (inclusion probability proportional to size) sampling. The estimator used in such situations is due to Horvitz and Thompson (1952) and, as such, is known as the Horvitz-Thompson (H-T) estimator. To estimate the population mean (Ȳ) based on a sample s of size n, the unbiased H-T estimator can be defined as

ŶHT = Σi∈s Yi/(Nπi),

where Yi is the value of the ith sample unit and N denotes the total number of population units. Sen (1953) and Yates and Grundy (1953) showed independently that for fixed-size sampling designs ŶHT has the variance

V(ŶHT) = (1/N²) Σ_{i<j=1}^{N} (πiπj − πij) (Yi/πi − Yj/πj)²,

and an unbiased estimator of V(ŶHT) is given as

V̂(ŶHT) = (1/N²) Σ_{i<j=1}^{n} ((πiπj − πij)/πij) (Yi/πi − Yj/πj)²,

where πij denotes the joint inclusion probability of units i and j.

Jessen (1973) examined the use of both the H-T and Yates-Grundy forms of the variance estimator and found that these estimators suffer from the drawback of providing negative estimates. Also, in certain non-trivial circumstances the variance of these two variance estimators is rather high. Moreover, they provide unbiased estimates of the variance only if πij > 0 for all (i, j) in the population. To overcome these difficulties, Jessen (1973) suggested the use of a 'split sample estimator'.
To illustrate the expression for this estimator, suppose that two units are selected from each row and column of a two-way table, so that the resultant sample can be split into two parts, each containing one unit in every row and column. Let these two parts be denoted by A and B. The population total can then be estimated from each half sample. Thus, from each half sample we obtain

ŶA = Σ_{i=1}^{n/2} Yi/(πi/2),

where n is the sample size and πi/2 is the probability of including the ith unit in the given half sample. A combined estimator of Y, known as the split sample estimator, is then given by

Ŷ = (ŶA + ŶB)/2,

and its variance is estimated by

V̂ar(Ŷ) = (1 − n Σ_{i=1}^{N} pi²) [(ŶA − ŶB)/2]²,

where (1 − n Σ_{i=1}^{N} pi²) is an approximate finite population correction factor. The split sample estimator of Jessen (1973) is useful in situations where the stability condition of the H-T estimator or the non-negativity condition of the Yates-Grundy form of the H-T variance estimator is not satisfied. However, Jessen's split sample estimator is negatively biased, and the biases are found to be quite high.

Tiwari and Nigam (1998) proposed a method of variance estimation for two-dimensional controlled selection problems. To describe their method, suppose two units are selected from each row and column of an L×L array. Denote by yi1, yi2 (i = 1, 2, …, L) the two observations from the ith row, and let pi1, pi2 be the corresponding probabilities of selection. Similarly, let y1i, y2i (i = 1, 2, …, L) be the observations from the ith column, and p1i, p2i their corresponding probabilities of selection. An unbiased estimator of the population total is given by

Ŷ = Σ_{i=1}^{L} ( yi1/(2pi1) + yi2/(2pi2) + y1i/(2p1i) + y2i/(2p2i) ),

and its variance is estimated by

V̂ar(Ŷ) = (1 − n Σ_{i=1}^{N} pi²) (1/4) Σ_{i=1}^{L} [ (yi1/pi1 − yi2/pi2)² + (y1i/p1i − y2i/p2i)² ],

where N = L². This estimator of the variance was found to be positively biased, but the bias was quite low in comparison to the split sample estimator of Jessen (1973).

Recently, Brewer and Donadio (2003) derived a πij-free formula for the high entropy variance of the HT estimator. They showed that the performance of this variance formula, under conditions of high entropy, was reasonably good for all populations. Their expression for the variance of the HT estimator is given by

V(ŶHT)BD = (1/N²) Σi∈S πi⁻¹ (1 − ciπi) (Yi/πi − Y/n)²,

where ci is taken from formula (18) of Brewer and Donadio (2003), as this value of ci appears to perform better than the other values of ci suggested by Brewer and Donadio.

1.5 FRAME-WORK OF THE THESIS

The present thesis consists of seven chapters, including this introduction.

In Chapter 2, using quadratic programming and the concept of the nearest proportional to size sampling design of Gabler (1987), we define an optimal controlled sampling procedure for one-dimensional controlled selection problems. The proposed procedure ensures that the probability of selecting the non-preferred samples is exactly zero, rather than merely minimized. Variance estimation for the proposed optimal controlled sampling design using the Yates-Grundy form of the Horvitz-Thompson estimator is also discussed.

In Chapter 3, we extend the procedure discussed in Chapter 2 to multi-dimensional controlled selection problems.
Since it is difficult to satisfy the non-negativity condition of the H-T variance estimator for multi-dimensional controlled selection problems, we define an estimator for estimating the variance in two-dimensional controlled selection problems. A random group method is also suggested for variance estimation in two-dimensional controlled selection problems.

In Chapter 4, a method is suggested for the problem of disclosure control. Using the technique of random rounding, we introduce a new methodology for protecting the confidential information of tabular data with minimum loss of information. The tables obtained through the proposed method consist of unbiasedly rounded values, are additive and have a specified level of confidentiality protection.

In Chapter 5, we propose a new methodology for the sample co-ordination problem. The proposed methodology not only selects the sample in a controlled way but also maximizes or minimizes the overlap of sampling units for the two sample surveys, which may be conducted simultaneously or sequentially. Variance estimation is also possible with the proposed procedure, as it satisfies the non-negativity condition of the Horvitz-Thompson (H-T) variance estimator; in those situations where the non-negativity condition is not satisfied, an alternative method of variance estimation can be used.

In Chapter 6, using the fuzzy logic approach, we define a new methodology for assigning the initial selection probabilities to the different population units. The proposed methodology utilizes all the auxiliary information related to the population units in assigning probabilities to them. The superiority of the proposed procedure over PPS sampling is also discussed.

In Chapter 7, a brief summary of the work done in the preceding chapters is given.

CHAPTER II
ON AN OPTIMAL CONTROLLED NEAREST PROPORTIONAL TO SIZE SAMPLING SCHEME

2.1 INTRODUCTION

In many field situations all the possible samples are not equally preferable from the operational point of view, as some samples may be undesirable due to factors such as administrative inconvenience, long distances, similarity of units and cost considerations. Such samples are termed non-preferred samples, and the technique for avoiding them, as far as possible, is known as 'controlled selection' or 'controlled sampling'. This technique, originated by Goodman and Kish (1950), has received considerable attention in recent years due to its practical importance.

The technique of controlled sampling is most appropriate for sampling situations in which financial or other considerations make it necessary to select a small number of large first-stage units, such as hospitals, firms or schools, for inclusion in the study. The main purpose of controlled selection is to increase the probability of sampling a preferred combination beyond that possible with stratified sampling, whilst simultaneously maintaining the initial selection probability of each unit of the population, thus preserving the property of a probability sample. This situation generally arises in field experiments where practical considerations make some units undesirable but theoretical compulsions make it necessary to follow probability sampling. Controls may be imposed to secure proper distribution, geographically or otherwise, and to ensure adequate sample size for some domains (subgroups) of the population.
Goodman and Kish (1950) considered the reduction of the sampling variances of the key estimates to be the principal objective of controlled selection, but they also cautioned that this might not always be attained. Besides the aspects of long distance, administrative inconvenience, similarity of units and cost considerations, the need for controls may arise because various kinds of information may be desired from the same survey. A real problem emphasizing the need for controls beyond stratification was also discussed by Goodman and Kish (1950, p. 354), with the objective of selecting 21 primary sampling units to represent the North Central States. Hess and Srikantan (1966) used the data for the 1961 universe of non-federal, short-term general medical hospitals in the United States to illustrate the application of estimation and variance formulae for controlled selection. In his study, Waterton (1983) used the data available from a postal survey of Scottish school leavers carried out in 1977 to describe the advantages of controlled selection and to compare the efficiency of controlled selection with multiple proportionate stratified random sampling.

Three different approaches have been advanced in the recent literature to implement controlled sampling. These are (i) the use of typical experimental design configurations, (ii) the method of emptying boxes and (iii) the use of linear programming approaches. While some researchers have used simple random sampling designs to construct controlled sampling designs, one of the more popular strategies is the use of IPPS (inclusion probability proportional to size) sampling designs in conjunction with the Horvitz-Thompson (1952) estimator. To introduce this strategy, we assume that a known positive quantity xi is associated with the ith unit of the population (yi) and that there is reason to believe that the yi's are approximately proportional to the xi's. Here xi is assumed known for all units in the population, while yi is to be collected for the sampled units. In IPPS sampling designs πi, the probability of including the ith unit in a sample of size n, is npi, where pi is the probability of selecting the ith unit of the population, given by

pi = xi / Σ_{j=1}^{N} xj,  i = 1, 2, …, N.
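In code, this strategy amounts to nothing more than normalizing the size measures. The following small sketch (the function name and size measures are ours, purely for illustration) computes the single-draw probabilities pi and the IPPS targets πi = npi:

    def ipps_probabilities(x, n):
        # Single-draw probabilities p_i = x_i / sum(x) and first order
        # inclusion probabilities pi_i = n * p_i of an IPPS design.
        total = sum(x)
        p = [xi / total for xi in x]
        return p, [n * pi for pi in p]

    # Four units with size measures x_i and a sample of size n = 2:
    p, pi = ipps_probabilities([20, 24, 26, 30], 2)
    # p  = [0.20, 0.24, 0.26, 0.30], pi = [0.40, 0.48, 0.52, 0.60]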
To construct controlled simple random sampling designs, Chakrabarti (1963) and Avadhani and Sukhatme (1973) proposed the use of balanced incomplete block (BIB) designs with parameters v = N, k = n and λ. Wynn (1977) and Foody and Hedayat (1977) used BIB designs with repeated blocks for situations where non-trivial BIB designs do not exist. Gupta et al. (1982) studied controlled sampling designs with inclusion probabilities proportional to size and used balanced incomplete block designs in conjunction with the Horvitz-Thompson estimator of the population total Y. Nigam et al. (1984) used some configurations of different types of experimental designs, including BIB designs, to obtain controlled IPPS sampling plans with the additional property cπiπj ≤ πij ≤ πiπj for all i ≠ j = 1, …, N and some positive constant c such that 0 < c < 1, where πi and πij denote the first and second order inclusion probabilities, respectively. Hedayat and Lin (1980) and Hedayat et al. (1989) used the method of 'emptying boxes' to construct controlled IPPS sampling designs with the additional property 0 < πij ≤ πiπj, i < j = 1, …, N. Srivastava and Saleh (1985) and Mukhopadhyay and Vijayan (1996) suggested the use of 't-designs' in place of simple random sampling without replacement (SRSWOR) designs to construct controlled sampling designs.

All the methods of controlled sampling discussed in the previous paragraph may be carried out by hand with varying degrees of laboriousness, but none takes advantage of the power of modern computing. Using the simplex method in linear programming, Rao and Nigam (1990, 1992) proposed optimal controlled sampling designs that minimize the probability of selecting the non-preferred samples, while retaining certain properties of an associated uncontrolled plan. Utilizing the approach of Rao and Nigam (1990, 1992), Sitter and Skinner (1994) and Tiwari and Nigam (1998) used the simplex method in linear programming to solve multi-way stratification problems with 'controls beyond stratification'.

In the present chapter, we use quadratic programming to propose an optimal controlled sampling design that ensures that the probability of selecting non-preferred samples is exactly zero, rather than merely minimized, without sacrificing the efficiency of the Horvitz-Thompson estimator based on an associated uncontrolled IPPS sampling plan. The idea of 'nearest proportional to size sampling designs', introduced by Gabler (1987), is used to construct the proposed design. The Microsoft Excel Solver of the Microsoft Office 2000 package has been used to solve the quadratic programming problem. The applicability of the Horvitz-Thompson estimator to the proposed design is discussed. The variance of the estimate for the proposed design is compared with the variances of the alternative optimal controlled designs of Rao and Nigam (1990, 1992) and the uncontrolled high entropy selection procedures of Goodman and Kish (1950) and Brewer and Donadio (2003). In Section 2.3, some examples are considered to demonstrate the utility of the proposed procedure by comparing the probabilities of non-preferred samples and the variances of the estimates.

2.2 THE OPTIMAL CONTROLLED SAMPLING DESIGN

In this section, we use the concept of 'nearest proportional to size sampling designs' to propose an optimal controlled IPPS sampling design that matches the original πi values, satisfies the sufficient condition πij ≤ πiπj for non-negativity of the Yates-Grundy (1953) form of the Horvitz-Thompson (1952) estimator of the variance, and also ensures that the probability of selecting non-preferred samples is exactly zero.

Consider a population of N units, from which a sample of size n is to be selected. Let S and S1 denote, respectively, the set of all possible samples and the set of non-preferred samples. Denoting by pi the initial probabilities associated with the population units, that is, the single-draw selection probabilities, we obtain an IPPS design p(s) appropriate for the set of initial probabilities under consideration. In the present discussion we begin with the Midzuno-Sen (1952, 1953) IPPS design to demonstrate our procedure, as it is relatively easy to compute the probability of drawing every potential sample under this scheme. However, if the conditions of the Midzuno-Sen scheme are not satisfied, we demonstrate that other IPPS sampling without replacement procedures, such as the Sampford (1967) procedure, may also be used to obtain the initial IPPS design. In what follows, we first describe the Midzuno-Sen IPPS scheme and then discuss Sampford's design for obtaining the original IPPS design p(s).
2.2.1 The Midzuno-Sen and Sampford IPPS Designs:

The Midzuno-Sen (MS) (1952, 1953) scheme has the restriction that the initial probabilities (pi's) must satisfy the condition

(1/n) (n − 1)/(N − 1) ≤ pi ≤ 1/n,  ∀ i. (1)

If (1) is satisfied for the sampling plan under consideration, we apply the MS scheme to get an IPPS plan with the revised normed size measures pi* given by

pi* = npi (N − 1)/(N − n) − (n − 1)/(N − n),  i = 1, 2, …, N. (2)

Now, supposing that the sth sample consists of units i1, i2, …, in, the probability of including these units in the sth sample under the MS scheme is given by

p(s) = πi1i2…in = (pi1* + pi2* + … + pin*) / N−1Cn−1. (3)

However, due to restriction (1), the MS plan limits the applicability of the method to units that are rather similar in size. Therefore, when the initial probabilities do not satisfy the condition of the MS plan, we suggest the use of the Sampford (1967) plan to obtain the initial IPPS design p(s).

Using Sampford's scheme, the probability of including n units in the sth sample is given by

p(s) = πi1i2…in = nKn λi1 λi2 … λin (1 − Σ_{u=1}^{n} piu), (4)

where Kn = (Σ_{t=1}^{n} t L_{n−t}/n^t)⁻¹, λi = pi/(1 − pi) and, for a set S(m) of m ≤ N different units i1, i2, …, im, Lm is defined as L0 = 1, Lm = Σ_{S(m)} λi1 λi2 … λim (1 ≤ m ≤ N).

2.2.2 The Proposed Plan:

The idea behind the proposed plan is to get rid of the non-preferred samples S1 by confining ourselves to the set S − S1, introducing a new design p0(s) which assigns zero probability of selection to each of the non-preferred samples belonging to S1, given by

p0(s) = p(s) / (1 − Σ_{s∈S1} p(s)) for s ∈ S − S1, and 0 otherwise, (5)

where p(s) is the initial uncontrolled IPPS sampling plan.
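For small populations, the design probabilities (3) and the controlled design (5) can be enumerated directly. The following sketch (in Python; the function names and the example probabilities are ours, and condition (1) is assumed to hold) computes p(s) for every sample under the Midzuno-Sen scheme and then renormalizes over the preferred samples:

    from itertools import combinations
    from math import comb

    def midzuno_sen_design(p, n):
        # p(s) for every n-subset under the Midzuno-Sen scheme: revised
        # normed size measures of equation (2), then equation (3).
        N = len(p)
        p_star = [pi * n * (N - 1) / (N - n) - (n - 1) / (N - n) for pi in p]
        denom = comb(N - 1, n - 1)
        return {s: sum(p_star[i] for i in s) / denom
                for s in combinations(range(N), n)}

    def controlled_design(p_s, non_preferred):
        # The design p0(s) of equation (5): drop the non-preferred
        # samples and renormalize over S - S1.
        keep = {s: q for s, q in p_s.items() if s not in non_preferred}
        total = sum(keep.values())
        return {s: q / total for s, q in keep.items()}

    # N = 4 units with single-draw probabilities p_i, samples of size
    # n = 2, and the sample of the first two units declared non-preferred:
    p = [0.20, 0.24, 0.26, 0.30]
    p0 = controlled_design(midzuno_sen_design(p, 2), {(0, 1)})

With these values (n = 2, N = 4), condition (1) requires 1/6 ≤ pi ≤ 1/2, which the chosen pi satisfy.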
Due to constraints (iv) and (v) in (7), the proposed plan also ensures the conditions πij > 0 and πij ≤ πi πj required for the Yates-Grundy estimator of the variance to be stable and non-negative.

The distance measure D(p0, p1) defined in (6) is similar to the χ²-statistic often employed in related problems and was also used by Cassel and Särndal (1972) and Gabler (1987). A few other distance measures are discussed by Takeuchi et al. (1983). An alternative distance measure for the present discussion may be defined as

D(p0, p1) = Σ_s (p0(s) − p1(s))²/(p0(s) + p1(s)).   (8)

Minimization of (8) subject to the constraints (7), using fractional programming, also provides the desired controlled IPPS sampling plan. However, on the basis of the different numerical problems considered by us, the distance function (8) was found to perform almost the same as (6) from both the convergence and the efficiency points of view. Therefore, in this chapter we restrict ourselves to the distance function (6).

It may be noted that while all the controlled sampling plans proposed by earlier authors attempt to minimize the selection probabilities of the non-preferred samples, the proposed plan completely excludes the possibility of selecting non-preferred samples by assigning them zero probability, and at the same time ensures the non-negativity of the Yates-Grundy estimator of the variance. However, in some situations a feasible solution to the quadratic programming problem satisfying all the constraints in (7) may not exist. In such situations, some of the constraints in (7) may be relaxed. The relaxation of the constraints may be carried out on the basis of their necessity and desirability. As discussed earlier, the first three constraints are necessary for any IPPS design, so they cannot be relaxed. Constraint (iv) is highly desirable for unbiased estimation of the variance, whereas constraint (v) ensures the sufficient condition for non-negativity of the variance estimator. Therefore, if all the constraints cannot be satisfied for a particular problem, constraint (v) may be relaxed. This no longer guarantees the non-negativity of the Yates-Grundy form of the variance estimator. However, since the condition πij ≤ πi πj is sufficient but not necessary for non-negativity of the Yates-Grundy estimator when n > 2, as pointed out by Singh (1954), there remains a possibility of obtaining a non-negative estimator of the variance. If, after relaxing constraint (v), the Yates-Grundy estimator of the variance comes out to be negative, an alternative variance estimator may be used. This is demonstrated in Example 5 of Section 2.3, where constraint (v) in (7) has been relaxed to obtain a feasible solution of the quadratic programming problem. If a feasible solution is not found even after relaxing constraint (v), constraint (iv) may also be relaxed and, consequently, an alternative variance estimator considered for use. The effect of relaxing these constraints on the efficiency of the proposed design is difficult to study: for some problems the variance is reduced after relaxing constraint (v) [e.g., Example 2 in Section 2.3], while for others it is increased [e.g., Example 1 in Section 2.3]. This is discussed at the end of Section 2.3.
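The quadratic objective functions written out in Section 2.3 and Appendix 2.0 follow from (6) by direct expansion. Using only the fact that p0 and p1 both sum to unity over S−S1,

D(p_0,p_1) = \sum_{s \in S-S_1} p_0(s)\left(\frac{p_1(s)}{p_0(s)} - 1\right)^{2}
           = \sum_{s} \frac{p_1(s)^{2}}{p_0(s)} - 2\sum_{s} p_1(s) + \sum_{s} p_0(s)
           = \sum_{s \in S-S_1} \frac{p_1(s)^{2}}{p_0(s)} - 1 .

Hence every objective in the examples has the form Σ_s cs p1(s)² − 1 with cs = 1/p0(s); in Example 1(a), for instance, 1/p0(s1) = 1/.0265 ≈ 37.7, which agrees (up to the rounding of the displayed p0(s) values) with the first coefficient, 37.75, of the model (14) below.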
The proposed method may also be considered superior to the earlier methods of optimal controlled selection in the following sense: assigning zero selection probability to the non-preferred samples, as in the proposed method, is different from associating a cost with each sample and then minimizing the total cost, the technique used in the earlier approaches to controlled selection, which is a rather crude device that gives some samples very high cost and others very low cost.

One limitation of the proposed plan is that it becomes impractical when NCn is very large, as the enumeration of all possible samples and the formation of the objective function and constraints become quite tedious. This limitation also applies to the optimum approach of Rao and Nigam (1990, 1992) and to the other controlled sampling approaches discussed in Section 2.1. However, with the advent of faster computing techniques and modern statistical packages, there should not be much difficulty in using the proposed procedure for moderately large populations. Moreover, the practical importance of the proposed method is that it may be used to select a small number of first-stage units from each of a large number of strata. This involves the solution of a series of quadratic programming problems, each of reasonable size, provided the set of non-preferred samples is specified separately in each stratum.

Some discussion of the convergence properties of the proposed procedure is desirable. As in the case of linear programming, there is no guarantee of convergence of a quadratic programming algorithm. Kuhn and Tucker (1951) derived necessary conditions for the optimum solution of a quadratic programming problem, but no sufficient conditions for convergence exist. Therefore, unless the Kuhn-Tucker conditions are satisfied in advance, there is no way of verifying whether a quadratic programming algorithm converges to an absolute (global) or a relative (local) optimum. Likewise, there is no way of predicting in advance whether a solution of a quadratic programming problem exists.
2.2.3 Comparison of the variance of the estimate:

To estimate the population mean (Ȳ) based on a sample s of size n, we use the HT estimator of Ȳ defined as

Ŷ_HT = Σ_{i∈s} Yi/(N πi).   (9)

Sen (1953) and Yates and Grundy (1953) showed independently that for fixed-size sampling designs Ŷ_HT has the variance

V(Ŷ_HT) = (1/N²) Σ_{i<j=1..N} (πi πj − πij)(Yi/πi − Yj/πj)²,   (10)

and an unbiased estimator of V(Ŷ_HT) is given by

V̂(Ŷ_HT) = (1/N²) Σ_{i<j∈s} {(πi πj − πij)/πij} (Yi/πi − Yj/πj)².   (11)

As discussed in Section 2.2, the proposed optimal controlled plan satisfies the sufficient condition for the Yates-Grundy estimator of the variance to be non-negative. Therefore, the non-negativity of the variance estimator (11) is also ensured under the proposed plan. To demonstrate the utility of the proposed procedure in terms of the precision of the estimate, the variance of the estimate for the proposed procedure, obtained through (10), is compared with the variance of the HT estimator under the optimal controlled plans of Rao and Nigam (1990, 1992). These variances are also compared with those of two uncontrolled high entropy procedures (high entropy meaning the absence of any detectable pattern or ordering in the selected sample units), namely those of Goodman and Kish (1950) and Brewer and Donadio (2003). In what follows, we reproduce the expressions for the variances of these two high entropy procedures.

The expression for the variance of Ŷ_HT, correct to O(N⁻²), under the procedure of Goodman and Kish (1950) is given as

V(Ŷ_HT)_GK = (1/(nN²)) [Σ_{i=1..N} pi Ai² − (n−1) Σ_{i=1..N} pi² Ai²] − ((n−1)/(nN²)) [2 Σ_{i=1..N} pi³ Ai² − (Σ_{i=1..N} pi²)(Σ_{i=1..N} pi Ai²) − 2 (Σ_{i=1..N} pi² Ai)²],   (12)

where Ai = Yi/pi − Y and Y = Σ_{i=1..N} Yi.

Recently, Brewer and Donadio (2003) derived a πij-free formula for the high entropy variance of the HT estimator and showed that its performance, under conditions of high entropy, is reasonably good for all populations. Their expression for the variance of the HT estimator is

V(Ŷ_HT)_BD = (1/N²) Σ_{i=1..N} πi (1 − ci πi)(Yi/πi − Y n⁻¹)²,   (13)

where ci is taken from formula (18) of Brewer and Donadio (2003), as this choice of ci appears to perform better than the other values of ci suggested by them.

2.3 EXAMPLES

In this section, we consider some numerical examples to demonstrate the utility of the proposed procedure and to compare it with the existing procedures of optimal controlled sampling. The variance of the estimate under the proposed plan is also compared with those of the existing procedures of optimal controlled selection and of the uncontrolled high entropy selection procedures.

Example 1: Consider a population of six villages, borrowed from Hedayat and Lin (1980), from which a sample of 3 villages is to be drawn. The set S of all possible samples consists of the following 20 samples, each of size n = 3:

123; 124; 125; 126; 134; 135; 136; 145; 146; 156; 234; 235; 236; 245; 246; 256; 345; 346; 356; 456.

On considerations of travel, organization of fieldwork and cost, Rao and Nigam (1990) identified the following 7 samples as non-preferred: 123; 126; 136; 146; 234; 236; 246.

(a): Let the Yi and pi values associated with the six villages of the population be:

Yi : 12 15 17 24 17 19
pi : 0.14 0.14 0.15 0.16 0.22 0.19

Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*. Using (2), the values of pi* are:

p1* = .0333; p2* = .0333; p3* = .0833; p4* = .1333; p5* = .4333; p6* = .2833.

Next we calculate the values of p(s), the probability of selecting each sample. Since we are applying the MS scheme, (3) gives the values of p(s) as follows:

p(s1) = .015; p(s2) = .020; p(s3) = .05; p(s4) = .035; p(s5) = .025; p(s6) = .055; p(s7) = .04; p(s8) = .060; p(s9) = .045; p(s10) = .075; p(s11) = .025; p(s12) = .055; p(s13) = .04; p(s14) = .06; p(s15) = .045; p(s16) = .075; p(s17) = .065; p(s18) = .05; p(s19) = .080; p(s20) = .085.

Now, in order to assign zero probability of selection to each of the non-preferred samples belonging to S1, we use the design p0(s) given by (5). After assigning zero probability to the non-preferred samples, the values of p0(s) for the preferred sample combinations are:

p0(s1)=.0265; p0(s2)=.0663; p0(s3)=.0332; p0(s4)=.07285; p0(s5)=.0795; p0(s6)=.0994; p0(s7)=.0729; p0(s8)=.0795; p0(s9)=.0993; p0(s10)=.0861; p0(s11)=.0662; p0(s12)=.1060; p0(s13)=.1126.

Since p0(s) is no longer an IPPS design, we apply the proposed model to minimize the directed distance D from the sampling design p0(s) to the sampling design p1(s), subject to the constraints of model (7).
The proposed model for this example is as follows:

Minimize z = 37.75*p1(s)^2+15.1*p2(s)^2+30.2*p3(s)^2+13.7*p4(s)^2+12.58*p5(s)^2+10.06*p6(s)^2+13.72*p7(s)^2+12.58*p8(s)^2+10.06*p9(s)^2+11.61*p10(s)^2+15.1*p11(s)^2+9.43*p12(s)^2+8.88*p13(s)^2-1

Subject to the constraints (14):

1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s) = 0.42
3. p1(s)+p2(s)+p7(s)+p8(s)+p9(s) = 0.42
4. p3(s)+p4(s)+p7(s)+p10(s)+p11(s)+p12(s) = 0.45
5. p1(s)+p3(s)+p5(s)+p8(s)+p10(s)+p11(s)+p13(s) = 0.48
6. p2(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p12(s)+p13(s) = 0.66
7. p6(s)+p9(s)+p11(s)+p12(s)+p13(s) = 0.57
8. p1(s)+p2(s) ≤ 0.1764
9. p3(s)+p4(s) ≤ 0.189
10. p1(s)+p3(s)+p5(s) ≤ 0.2016
11. p2(s)+p4(s)+p5(s)+p6(s) ≤ 0.2772
12. p6(s) ≤ 0.2394
13. p7(s) ≤ 0.189
14. p1(s)+p8(s) ≤ 0.2016
15. p2(s)+p7(s)+p8(s)+p9(s) ≤ 0.2772
16. p9(s) ≤ 0.2394
17. p3(s)+p10(s)+p11(s) ≤ 0.216
18. p4(s)+p7(s)+p10(s)+p12(s) ≤ 0.297
19. p11(s)+p12(s) ≤ 0.2565
20. p5(s)+p8(s)+p10(s)+p13(s) ≤ 0.3168
21. p11(s)+p13(s) ≤ 0.2736
22. p6(s)+p9(s)+p12(s)+p13(s) ≤ 0.3762
23. pi(s) ≥ 0, i = 1, 2, …, 13.
24. πij ≥ 0 for all i ≠ j.

After solving the above quadratic programming problem through the Microsoft Excel Solver of the Microsoft Office 2000 package, we obtain the controlled IPPS plan given in Table 1. The value of D(p0, p1) comes out to be 0.997349. This plan matches the original πi values, satisfies the condition πij ≤ πi πj and ensures that the probability of selecting non-preferred samples is exactly zero. Owing to the fulfilment of the condition πij ≤ πi πj, we can apply the Yates-Grundy form of the HT variance estimator to the proposed plan.

Table 1
Optimal controlled IPPS plan corresponding to the Midzuno-Sen (MS) and Sampford (SAMP) schemes for Example 1

s     p1(s) [MS]   p1(s) [SAMP]      s     p1(s) [MS]   p1(s) [SAMP]
124   .142800      .085500           245   .029996      .123331
125   .033600      .049500           256   .126206      .139478
134   0            0                 345   .018801      .058944
135   .087301      .026864           346   .197199      .104500
145   .030104      .063669           356   .059301      .057500
156   .126195      .074467           456   .061099      .164056
235   .087398      .052191

We also solved this example using plan (3) of Rao and Nigam (1990), with specified πij's taken from Sampford's plan [to be denoted by RN3], and their plan (4) [to be denoted by RN4]. Using RN3 the probability of non-preferred samples (φ) comes out to be 0.155253, and using RN4 with c = .005, φ comes out to be zero, whereas the proposed plan always ensures zero probability for the non-preferred samples.

The values of V(Ŷ_HT) for the proposed plan, the RN3 plan, the RN4 plan, the randomized systematic IPPS sampling plan of Goodman and Kish (1950) [to be denoted by GK] and the uncontrolled high entropy sampling plan of Brewer and Donadio (2003) [to be denoted by BD] are given in the first row of Table 2. It is clear from Table 2 that the proposed plan yields almost the same variance of the HT estimator as RN4. The value of V(Ŷ_HT) for the proposed plan is slightly higher than those obtained from RN3, GK and BD. This increase in variance may be acceptable given the elimination of undesirable samples by the proposed plan.
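For readers without access to the Excel Solver, the programme (14) can be handed to any quadratic programming routine; since the objective (6) is strictly convex, the minimizer is unique and any convergent solver should return the Table 1 plan. The sketch below (ours; it substitutes SciPy's SLSQP routine for the Excel Solver, an assumption rather than the author's procedure) also generates the constraints of (14) directly from the πi values instead of keying them in. The normalisation Σ p1(s) = 1 is implied by the six πi equalities (each sample contains n = 3 units) and is therefore omitted, and the strict inequality (iv) of (7) is represented only through the non-negativity bounds, exactly as the constraint πij ≥ 0 in (14).

# Python sketch: solving the quadratic programme (14) for Example 1(a).
import numpy as np
from scipy.optimize import minimize

samples = [(1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), (1, 5, 6),
           (2, 3, 5), (2, 4, 5), (2, 5, 6), (3, 4, 5), (3, 4, 6), (3, 5, 6),
           (4, 5, 6)]                                   # the 13 preferred samples
p0 = np.array([.0265, .0663, .0332, .07285, .0795, .0994, .0729,
               .0795, .0993, .0861, .0662, .1060, .1126])
pi = {1: .42, 2: .42, 3: .45, 4: .48, 5: .66, 6: .57}   # pi_i = n * p_i

cons = []
for i, t in pi.items():                  # constraint (iii): sum over s containing i
    idx = [k for k, s in enumerate(samples) if i in s]
    cons.append({'type': 'eq', 'fun': lambda p, idx=idx, t=t: p[idx].sum() - t})
for i in pi:                             # constraint (v): pi_ij <= pi_i * pi_j
    for j in pi:
        if i < j:
            idx = [k for k, s in enumerate(samples) if i in s and j in s]
            ub = pi[i] * pi[j]
            cons.append({'type': 'ineq',
                         'fun': lambda p, idx=idx, ub=ub: ub - p[idx].sum()})

res = minimize(lambda p: (p ** 2 / p0).sum() - 1,       # objective (6)
               p0, method='SLSQP', bounds=[(0, 1)] * len(samples),
               constraints=cons)
print(round(res.fun, 6))         # D(p0, p1): about 0.9973, cf. 0.997349 above
print(np.round(res.x, 6))        # compare with the MS column of Table 1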
Table 2
Values of V(Ŷ_HT) for the Proposed, RN3, RN4, GK and BD plans

Example            RN3       RN4       GK        BD        Proposed Plan
Ex 1(a) N=6, n=3   2.9303    4.0241    3.0336    2.9186    4.0570
Ex 1(b) N=6, n=3   4.7574    5.0690    4.8945    4.1540    4.7842
Ex 2(a) N=7, n=3   4.4759    5.0085    4.6105    4.4471    3.5635
Ex 2(b) N=7, n=3   11.9668   14.5196   12.2502   11.4426   9.4890
Ex 3(a) N=8, n=3   4.8539    4.2893    4.9573    4.8364    3.9023
Ex 3(b) N=8, n=3   7.2924    8.4286    7.7350    7.3716    8.1676
Ex 4(a) N=8, n=4   3.1854    3.4631    3.2266    3.1538    3.7450
Ex 4(b) N=8, n=4   2.4094    2.5266    2.5441    2.3845    2.2545
Ex 5    N=7, n=4   3.0756    3.9294    3.1215    3.0746    5.0996

(b): Now suppose that the initial pi values for the above population of 6 units are:

pi : 0.10 0.15 0.10 0.20 0.27 0.18

Since these values of pi do not satisfy the condition (1) of the Midzuno-Sen plan, we apply Sampford's (1967) plan to get the initial p(s) values. Applying the method discussed in Section 2.2 and solving the resultant quadratic programming problem, we obtain the controlled IPPS plan given in Table 1. This plan again ensures zero probability for the non-preferred samples and satisfies the non-negativity condition for the Yates-Grundy form of the HT variance estimator.

This example was also solved by the RN3 and RN4 plans. The value of φ for the RN3 plan comes out to be 0.064135 and that for RN4 with c = .005 comes out to be zero, whereas the proposed plan ensures zero probability for the non-preferred samples.

The values of V(Ŷ_HT) for the proposed plan, RN3, RN4, GK and BD are given in the second row of Table 2. The proposed plan appears to perform better than RN4 and GK, and to be quite close to the other plans considered for this problem.

Example 2: We consider the following population, borrowed from Avadhani and Sukhatme (1973), consisting of seven villages. There are 35 possible samples, each of size n = 3. For reasons of travel and organization of fieldwork, Avadhani and Sukhatme (1973) considered the following 7 samples as non-preferred, in addition to the 7 non-preferred samples of Example 1: 137; 147; 167; 237; 247; 347; 467.

(a): Suppose that the values of Yi and the corresponding initial selection probabilities associated with the seven villages of the population are:

Yi : 12 15 17 24 17 19 25
pi : 0.12 0.12 0.13 0.14 0.20 0.15 0.14

Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi* given by (2). Applying the method discussed in Section 2.2, we obtain the controlled IPPS plan given in Table 3. This plan again matches the original πi values, satisfies the condition πij ≤ πi πj and ensures that the probability of selecting non-preferred samples is exactly zero.

Solving this problem using RN3, the probability of non-preferred samples (φ) comes out to be 0.064972, and using RN4 with c = 0.5, φ comes out to be zero, whereas the proposed plan always ensures zero probability for the undesirable samples.
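Each V(Ŷ_HT) entry of Table 2 for the proposed plans follows from (10) once the first- and second-order inclusion probabilities of the controlled design are accumulated. The sketch below (ours, illustrative) does precisely this for a design supplied as a list of samples with their selection probabilities; applied to the MS column of Table 1 together with the Yi values of Example 1, it should return approximately 4.0570, the first 'Proposed Plan' entry of Table 2, up to the rounding of the tabulated p1(s).

# Python sketch: the Sen-Yates-Grundy variance (10) of the HT estimator of the mean.
from itertools import combinations

def ht_variance_mean(design, Y):
    # design: {sample (tuple of unit indices): selection probability p(s)}
    N = len(Y)
    pi = [sum(ps for s, ps in design.items() if i in s) for i in range(N)]
    V = 0.0
    for i, j in combinations(range(N), 2):
        pij = sum(ps for s, ps in design.items() if i in s and j in s)
        V += (pi[i] * pi[j] - pij) * (Y[i] / pi[i] - Y[j] / pi[j]) ** 2
    return V / N ** 2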
Table 3
Optimal controlled IPPS plan corresponding to the Midzuno-Sen (MS) and Sampford (SAMP) schemes for Example 2

s     p1(s) [MS]   p1(s) [SAMP]      s     p1(s) [MS]   p1(s) [SAMP]
124   .054498      .014364           257   .024245      .028750
125   .024684      .018896           267   .076536      .042004
127   .050419      .017956           345   .024244      .044368
134   .060546      .034880           346   .079009      .076730
135   .038670      .037400           356   .017499      .104590
145   .036156      .024912           357   .053149      .052435
156   .050303      .061530           367   .078992      .094066
157   .044725      .030061           456   .064772      .074300
235   .037890      .035531           457   .058462      .036644
245   .042312      .023802           567   .033472      .088083
256   .049417      .058697

The values of V(Ŷ_HT) for the proposed plan, RN3, RN4, GK and BD are given in the third row of Table 2. The proposed plan appears to perform better than the other plans considered for this problem.

(b): Now consider the pi values for the 7 population units as:

pi : 0.08 0.08 0.16 0.11 0.24 0.20 0.13

Since these pi values do not satisfy the condition (1) of the MS plan, we apply Sampford's (1967) plan to get the initial p(s) values. Applying the method discussed in Section 2.2 and solving the resultant quadratic programming problem, we obtain the controlled IPPS plan given in Table 3. This plan again ensures zero probability for the non-preferred samples and satisfies the non-negativity condition for the Yates-Grundy form of the HT variance estimator.

This example was also solved by the RN3 and RN4 plans. The value of φ for the RN3 plan comes out to be 0.04511 and that for RN4 with c = 0.5 comes out to be zero, whereas the proposed plan ensures zero probability for the non-preferred samples.

The values of V(Ŷ_HT) for the proposed plan, RN3, RN4, GK and BD are given in the fourth row of Table 2. The proposed plan appears to perform better than the other plans considered for this problem.

Example 3: We now consider a population with N = 8 and n = 3, borrowed from Rao and Nigam (1990). The set of all possible samples S contains 56 samples. On considerations similar to those of Avadhani and Sukhatme (1973), Rao and Nigam (1990) considered the following 7 samples as non-preferred, in addition to the 14 non-preferred samples of Example 2: 128; 178; 248; 458; 468; 478; 578.

(a): Suppose the following Yi and pi values are associated with the eight villages of the population:

Yi : 12 15 17 24 17 19 25 18
pi : 0.10 0.10 0.11 0.12 0.18 0.13 0.12 0.14

Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi* computed from (2).
Table 4
Optimal controlled IPPS plan corresponding to the Midzuno-Sen (MS) and Sampford (SAMP) schemes for Example 3

s     p1(s) [MS]   p1(s) [SAMP]      s     p1(s) [MS]   p1(s) [SAMP]
124   .020748      .020605           267   .024882      .017902
125   .020145      .003443           268   .030459      .031297
127   .015713      .012351           278   .034927      .053306
134   .017626      .004485           345   .020322      .016724
135   .021201      .020702           346   .039436      .094625
138   .017288      .017661           348   .041417      .154166
145   .023252      .001291           356   .017750      .015496
148   .046374      .041119           357   .029863      .043015
156   .022315      .001099           358   .013255      .008111
157   .033000      .005024           367   .027937      .051224
158   .031268      .010069           368   .021454      .014276
168   .031070      .012151           378   .028354      .076247
235   .017266      .029730           456   .048615      .031530
238   .016833      .053539           457   .059832      .050585
245   .042378      .034869           567   .031326      0
256   .017710      0                 568   .031564      .015354
257   .028683      .005300           678   .045482      .045045
258   .030255      .007658

Applying the method discussed in Section 2.2 and solving the resulting quadratic programming problem, we obtain the optimal controlled IPPS plan given in Table 4. This plan also matches the original πi values, satisfies the condition πij ≤ πi πj and excludes the possibility of selecting the non-preferred samples. Using RN3 the probability of non-preferred samples (φ) comes out to be 0.121614, and using RN4 with c = .005, φ comes out to be zero, whereas the proposed plan always ensures zero probability for the non-preferred samples.

The values of V(Ŷ_HT) for the proposed plan, RN3, RN4, GK and BD are given in the fifth row of Table 2. The proposed plan appears to perform better than the other plans considered for this problem.

(b): Now suppose that the initial pi values for the above population of 8 units are:

pi : 0.05 0.09 0.20 0.15 0.10 0.11 0.12 0.18

Since these values of pi do not satisfy the condition (1) of the MS plan, we apply Sampford's (1967) plan to get the initial p(s) values. Applying the method discussed in Section 2.2, we obtain the controlled IPPS plan given in Table 4. This plan again ensures zero probability for the non-preferred samples and satisfies the non-negativity condition for the Yates-Grundy form of the HT variance estimator.

This example was also solved by the RN3 and RN4 plans. The value of φ for the RN3 plan comes out to be 0.166792 and that for RN4 with c = .005 comes out to be zero, whereas the proposed plan again ensures zero probability for the non-preferred samples.

The values of V(Ŷ_HT) for the proposed plan, RN3, RN4, GK and BD are given in the sixth row of Table 2. The proposed plan appears to perform better than RN4 and to be quite close to the other plans considered for this problem.

Example 4: We now reconsider the population of 8 units and suppose that a sample of size n = 4 is to be selected. The set of all possible samples contains 70 samples. On considerations similar to those of Avadhani and Sukhatme (1973), suppose that the following 28 samples are non-preferred for reasons of travel and organization of fieldwork: 1234; 1236; 1238; 1246; 1248; 1268; 1346; 1348; 1357; 1456; 1468; 1567; 1568; 1678; 2345; 2346; 2456; 2468; 2567; 2568; 2678; 3456; 3468; 3567; 3678; 4567; 4678; 5678.

(a): Suppose that the following pi values are associated with the eight villages of the population, the Yi's being the same as in Example 3:

pi : 0.11 0.11 0.12 0.13 0.17 0.12 0.11 0.13

As the pi values satisfy the condition (1), we apply the MS scheme to obtain the initial p(s) values.
Using the method discussed in Section 2.2, the optimal controlled IPPS plan obtained is given in Table 5. This plan again matches the original πi values, satisfies the condition πij ≤ πi πj and reduces the probability of selecting non-preferred samples to zero. Using RN3 the probability of non-preferred samples (φ) comes out to be 0.049625, and using the RN4 plan with c = .005, φ comes out to be zero, whereas the proposed plan always ensures zero probability for the non-preferred samples.

The values of V(Ŷ_HT) for the proposed plan, RN3, RN4, GK and BD are given in the seventh row of Table 2. The value of V(Ŷ_HT) for the proposed plan is slightly higher than those of the other plans considered for this problem. This may be acceptable owing to the elimination of the non-preferred samples by the proposed plan.

Table 5
Optimal controlled IPPS plan corresponding to the Midzuno-Sen (MS) and Sampford (SAMP) schemes for Example 4

s      p1(s) [MS]   p1(s) [SAMP]      s      p1(s) [MS]   p1(s) [SAMP]
1235   .002827      0                 2347   .002772      0
1237   .003892      0                 2348   .010933      .019388
1245   .043684      .008430           2356   .051364      .051644
1247   .004447      0                 2357   .024836      .039983
1256   .059600      .009182           2358   .017733      0
1257   .009386      .029193           2367   .020084      .057759
1258   .008253      0                 2368   .035003      .023368
1267   .016110      .036769           2378   .018755      .005639
1278   .011690      .015483           2457   .012706      .015937
1345   .042009      .030784           2458   .027376      .003187
1347   .007307      0                 2467   .029039      .022877
1356   .024070      .048805           2478   .002203      .003298
1358   .004769      0                 2578   .027308      .017862
1367   .018232      .061419           3457   .016542      .042995
1368   .018501      .021215           3458   .014911      .010210
1378   .011333      .049737           3467   .016465      .108967
1457   .039377      .011047           3478   .043945      .069871
1458   .042688      0                 3568   .046681      .024985
1467   .035251      .024208           3578   .027036      .053230
1478   .014037      0                 4568   .109600      .068800
1578   .022538      .013725           4578   .004708      0

(b): Now suppose that the initial pi values for the above population of 8 units are:

pi : 0.09 0.09 0.18 0.11 0.12 0.14 0.17 0.10

Since these values of pi do not satisfy the condition (1) of the MS plan, we apply Sampford's (1967) plan to get the initial p(s) values. Applying the method discussed in Section 2.2, we obtain the controlled IPPS plan given in Table 5. This plan again ensures zero probability for the non-preferred samples and satisfies the non-negativity condition for the Yates-Grundy form of the HT variance estimator.

This example was also solved by the RN3 and RN4 plans. The value of φ for the RN3 plan comes out to be 0.13128 and that for the RN4 plan with c = .005 comes out to be zero, whereas the proposed plan again ensures zero probability for the non-preferred samples. The values of V(Ŷ_HT) for the proposed plan, RN3, RN4, GK and BD are given in the eighth row of Table 2. The proposed plan appears to perform better than the other plans considered for this problem.

Example 5: We now consider one more example, to demonstrate a situation in which the proposed plan fails to provide a feasible solution satisfying all the constraints in (7). In such situations, we have to drop a constraint in (7) to obtain a feasible solution of the related quadratic programming problem.

Consider a population of seven villages from which a sample of size n = 4 is to be drawn. There are 35 possible samples, of which the following 14 are considered non-preferred: 1234; 1236; 1246; 1346; 1357; 1456; 1567; 2345; 2346; 2456; 2567; 3456; 3567; 4567. Suppose that the following pi values are associated with the 7 villages:

pi : 0.14 0.13 0.15 0.13 0.16 0.15 0.14
Since the pi values satisfy the condition (1), we apply the MS plan and solve the quadratic programming problem by the method discussed in Section 2.2. However, no feasible solution of the related quadratic programming problem exists in this case. Consequently, we drop constraint (v) in (7) for this particular problem to obtain a feasible solution. The controlled IPPS plan obtained is given in Table 6. This plan also matches the original πi values and ensures that the probability of selecting the non-preferred samples is exactly zero. However, owing to the non-fulfilment of the condition πij ≤ πi πj in this example, the non-negativity of the Yates-Grundy estimator of the variance is not ensured.

Table 6
Optimal controlled IPPS plan corresponding to the Midzuno-Sen scheme for Example 5

s      p1(s)        s      p1(s)        s      p1(s)
1235   .02985       1347   .018881      2357   .026409
1245   .050183      1356   .104753      2367   .044308
1247   .012319      1367   .041471      2457   .047515
1256   .083296      1457   .047525      2467   .054839
1257   .01466       1467   .058928      3457   .063891
1267   .030735      2347   .021368      3467   .077151
1345   .067399      2356   .104519

(The remaining preferred sample, 1237, receives zero selection probability in the optimal solution and is therefore omitted from the table; the tabulated probabilities sum to unity.)

Solving this problem using RN3, the probability of non-preferred samples (φ) comes out to be 0.297746, and using RN4 with c = 0.5, φ comes out to be 0.1008, whereas the proposed plan ensures zero probability for the non-preferred samples.

The values of V(Ŷ_HT) for the proposed plan, RN3, RN4, GK and BD are given in the last row of Table 2. The value of V(Ŷ_HT) for the proposed plan does not appear satisfactory for this problem, owing to the non-fulfilment of the condition πij ≤ πi πj. For problems of this type, we suggest the use of another estimator in place of the HT estimator.

To examine the effect of dropping constraint (v) in (7) on the efficiency of the estimator, we also solved Examples 1 and 2 without this constraint. The values of V(Ŷ_HT) so obtained are 4.30617 and 4.81743 for Example 1 (a) and (b), respectively, and 3.51507 and 8.82203 for Example 2 (a) and (b), respectively. Thus, while for Example 1 the values of V(Ŷ_HT) without constraint (v) are greater than the corresponding values with the constraint, for Example 2 they are smaller.

APPENDIX 2.0

Example 1(b): Here we have N = 6 and n = 3. The values of Yi and pi are:

Yi : 12 15 17 24 17 19
pi : 0.10 0.15 0.10 0.20 0.27 0.18

All possible combinations are: 123; 124; 125; 126; 134; 135; 136; 145; 146; 156; 234; 235; 236; 245; 246; 256; 345; 346; 356; 456. The following 7 sample combinations are considered non-preferred: 123; 126; 136; 146; 234; 236; 246.

To find the initial p(s) values, we apply Sampford's (1967) plan and get the following values of p(s) (for the preferred sample combinations only):

p(s1) = .0189; p(s2) = .0469; p(s3) = .011; p(s4) = .0271; p(s5) = .077; p(s6) = .063; p(s7) = .0469; p(s8) = .1299; p(s9) = .107; p(s10) = .077; p(s11) = .0256; p(s12) = .063; p(s13) = .1717.

Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values:

p0(s1)=.0218; p0(s2)=.0542; p0(s3)=.0124; p0(s4)=.0313; p0(s5)=.089; p0(s6)=.0729; p0(s7)=.054; p0(s8)=.1501; p0(s9)=.1237; p0(s10)=.089; p0(s11)=.0296; p0(s12)=.0729; p0(s13)=.1984.

After getting the values of p0(s), we apply the proposed model.
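The appendix figures just listed are easily verified. The sketch below (ours, for illustration) computes Sampford's design (4) for the pi values of Example 1(b), applies the renormalisation (5), and recovers both the p0(s) values and the objective coefficients 1/p0(s) of the model that follows; small discrepancies against the printed figures (for example 45.74 against 45.75 for the first coefficient) stem from the rounding of the displayed p(s).

# Python sketch of Sampford's design (4) and the renormalisation (5), Example 1(b).
from itertools import combinations
from math import prod

def sampford_design(p, n):
    N = len(p)
    lam = [pi / (1 - n * pi) for pi in p]       # lambda_i = p_i / (1 - n p_i)
    L = lambda m: sum(prod(lam[i] for i in c) for c in combinations(range(N), m))
    K = 1 / sum(t * L(n - t) / n ** t for t in range(1, n + 1))   # K_n of eq. (4)
    return {s: n * K * prod(lam[i] for i in s) * (1 - sum(p[i] for i in s))
            for s in combinations(range(N), n)}

d = sampford_design([0.10, 0.15, 0.10, 0.20, 0.27, 0.18], 3)
print(round(d[(0, 1, 3)], 4), round(d[(3, 4, 5)], 4))   # about .0189 and .1717 as above
# Renormalisation (5): drop the 7 non-preferred samples 123, 126, 136, 146, 234, 236, 246.
nonpref = {(0, 1, 2), (0, 1, 5), (0, 2, 5), (0, 3, 5),
           (1, 2, 3), (1, 2, 5), (1, 3, 5)}
rest = 1 - sum(d[s] for s in nonpref)
p0_124 = d[(0, 1, 3)] / rest
print(round(p0_124, 4), round(1 / p0_124, 2))   # about .0218 and 45.75, cf. the model below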
The objective function and the constraints for this example are as follows:

Minimize z = 45.75*p1(s)^2+18.44*p2(s)^2+80.07*p3(s)^2+31.89*p4(s)^2+11.23*p5(s)^2+13.71*p6(s)^2+18.44*p7(s)^2+6.65*p8(s)^2+8.08*p9(s)^2+11.23*p10(s)^2+33.73*p11(s)^2+13.71*p12(s)^2+5.03*p13(s)^2-1

Subject to the constraints defined in (14), with the right-hand-side values changed to: 1, 0.3, 0.45, 0.3, 0.6, 0.81, 0.54, 0.135, 0.09, 0.18, 0.243, 0.162, 0.135, 0.27, 0.3645, 0.243, 0.18, 0.243, 0.162, 0.486, 0.324, 0.4374.

After solving the above model, we get the results displayed in Table 1, with the value of D(p0, p1) equal to .42146.

Example 2(a): Consider a population of 7 villages from which a sample of 3 villages is to be drawn. The set S of all possible samples consists of the following 35 samples: 123; 124; 125; 126; 127; 134; 135; 136; 137; 145; 146; 147; 156; 157; 167; 234; 235; 236; 237; 245; 246; 247; 256; 257; 267; 345; 346; 347; 356; 357; 367; 456; 457; 467; 567.

The non-preferred sample combinations are: 123; 126; 136; 137; 146; 147; 167; 234; 236; 237; 246; 247; 347; 467, and the values of Yi and pi are:

Yi : 12 15 17 24 17 19 25
pi : .12 .12 .13 .14 .20 .15 .14

Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*. Using (2), the values of pi* are:

p1* = .04; p2* = .04; p3* = .085; p4* = .13; p5* = .4; p6* = .175; p7* = .13.

The values of p(s) for the preferred sample combinations are:

p(s1) = .014; p(s2) = .032; p(s3) = .014; p(s4) = .017; p(s5) = .035; p(s6) = .038; p(s7) = .041; p(s8) = .038; p(s9) = .035; p(s10) = .038; p(s11) = .041; p(s12) = .038; p(s13) = .023; p(s14) = .041; p(s15) = .026; p(s16) = .044; p(s17) = .041; p(s18) = .026; p(s19) = .047; p(s20) = .044; p(s21) = .047.

Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values:

p0(s1)=.0194; p0(s2)=.0444; p0(s3)=.0194; p0(s4)=.0236; p0(s5)=.0486; p0(s6)=.0527; p0(s7)=.0569; p0(s8)=.0527; p0(s9)=.0486; p0(s10)=.0527; p0(s11)=.0569; p0(s12)=.0527; p0(s13)=.0319; p0(s14)=.0569; p0(s15)=.0361; p0(s16)=.0611; p0(s17)=.0569; p0(s18)=.0361; p0(s19)=.0652; p0(s20)=.0611; p0(s21)=.0652.

After getting the values of p0(s), we apply the proposed model. The objective function and the constraints for this example are:

Minimize z = 51.42*p1(s)^2+22.5*p2(s)^2+51.42*p3(s)^2+42.35*p4(s)^2+20.57*p5(s)^2+18.94*p6(s)^2+17.56*p7(s)^2+18.94*p8(s)^2+20.57*p9(s)^2+18.94*p10(s)^2+17.56*p11(s)^2+18.94*p12(s)^2+31.3*p13(s)^2+17.56*p14(s)^2+27.69*p15(s)^2+16.36*p16(s)^2+17.56*p17(s)^2+27.69*p18(s)^2+15.31*p19(s)^2+16.36*p20(s)^2+15.31*p21(s)^2-1

Subject to the constraints (15):

1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s) = 0.36
3. p1(s)+p2(s)+p3(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s) = 0.36
4. p4(s)+p5(s)+p9(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s) = 0.39
5. p1(s)+p4(s)+p6(s)+p10(s)+p14(s)+p15(s)+p19(s)+p20(s) = 0.42
6. p2(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p14(s)+p16(s)+p17(s)+p19(s)+p20(s)+p21(s) = 0.6
7. p7(s)+p11(s)+p13(s)+p15(s)+p16(s)+p18(s)+p19(s)+p21(s) = 0.45
8. p3(s)+p8(s)+p12(s)+p13(s)+p17(s)+p18(s)+p20(s)+p21(s) = 0.42
9. p1(s)+p2(s)+p3(s) ≤ 0.1296
10. p4(s)+p5(s) ≤ 0.1404
11. p1(s)+p4(s)+p6(s) ≤ 0.1512
12. p2(s)+p5(s)+p6(s)+p7(s)+p8(s) ≤ 0.216
13. p7(s) ≤ 0.162
14. p3(s)+p8(s) ≤ 0.1512
15. p9(s) ≤ 0.1404
16. p1(s)+p10(s) ≤ 0.1512
17. p2(s)+p9(s)+p10(s)+p11(s)+p12(s) ≤ 0.216
18. p11(s)+p13(s) ≤ 0.162
19. p3(s)+p12(s)+p13(s) ≤ 0.1512
20. p4(s)+p14(s)+p15(s) ≤ 0.1638
21. p5(s)+p9(s)+p14(s)+p16(s)+p17(s) ≤ 0.234
22. p15(s)+p16(s)+p18(s) ≤ 0.1755
23. p17(s)+p18(s) ≤ 0.1638
24. p6(s)+p10(s)+p14(s)+p19(s)+p20(s) ≤ 0.252
25. p15(s)+p19(s) ≤ 0.189
26. p2(s) ≤ 0.1764
27. p7(s)+p11(s)+p16(s)+p19(s)+p21(s) ≤ 0.27
28. p8(s)+p12(s)+p17(s)+p20(s)+p21(s) ≤ 0.252
29. p13(s)+p18(s)+p21(s) ≤ 0.189
30. pi(s) ≥ 0, i = 1, 2, …, 21.
31. πij ≥ 0 for all i ≠ j.

After solving the above model, we get the results displayed in Table 3, with the value of D(p0, p1) equal to 0.439125.

Example 2(b): For this example the population size is N = 7 and the sample size is n = 3; thus the set S of all possible samples and the set of non-preferred samples remain the same as in part (a). The Yi and pi values associated with the 7 villages of the population are:

Yi : 12 15 17 24 17 19 25
pi : .08 .08 .16 .11 .24 .2 .13

To find the initial p(s) values we apply Sampford's (1967) plan and get the following values of p(s) (for the preferred sample combinations only):

p(s1) = .0032; p(s2) = .0135; p(s3) = .0040; p(s4) = .0082; p(s5) = .0343; p(s6) = .0201; p(s7) = .0515; p(s8) = .0251; p(s9) = .0343; p(s10) = .0201; p(s11) = .0515; p(s12) = .0251; p(s13) = .0157; p(s14) = .0504; p(s15) = .0318; p(s16) = .1254; p(s17) = .0628; p(s18) = .0398; p(s19) = .0753; p(s20) = .0371; p(s21) = .0934.

Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values:

p0(s1)=.1835; p0(s2)=.0225; p0(s3)=.0066; p0(s4)=.0137; p0(s5)=.0570; p0(s6)=.0334; p0(s7)=.0855; p0(s8)=.0418; p0(s9)=.0570; p0(s10)=.0334; p0(s11)=.0855; p0(s12)=.0418; p0(s13)=.0261; p0(s14)=.0838; p0(s15)=.0529; p0(s16)=.2084; p0(s17)=.1044; p0(s18)=.0661; p0(s19)=.1251; p0(s20)=.0616; p0(s21)=.1552.

Now the objective function and the constraints are:

Minimize z = 5.45*p1(s)^2+44.42*p2(s)^2+150.98*p3(s)^2+73.23*p4(s)^2+17.53*p5(s)^2+29.98*p6(s)^2+11.69*p7(s)^2+23.93*p8(s)^2+17.53*p9(s)^2+29.98*p10(s)^2+11.69*p11(s)^2+23.93*p12(s)^2+38.25*p13(s)^2+11.93*p14(s)^2+18.91*p15(s)^2+4.8*p16(s)^2+9.58*p17(s)^2+15.14*p18(s)^2+7.99*p19(s)^2+16.23*p20(s)^2+6.45*p21(s)^2-1

Subject to the constraints defined in (15), with the right-hand-side values changed to: 1, 0.24, 0.24, 0.48, 0.33, 0.72, 0.60, 0.39, 0.0576, 0.1152, 0.0792, 0.1728, 0.144, 0.0936, 0.1152, 0.0792, 0.1728, 0.144, 0.0936, 0.1584, 0.3456, 0.288, 0.1872, 0.2376, 0.198, 0.1287, 0.432, 0.2808, 0.234.

After solving the above model, we get the results displayed in Table 3, with the value of D(p0, p1) equal to 0.274255.

Example 3(a): Consider a population of 8 villages from which a sample of 3 villages is to be drawn. The set S of all possible samples consists of the following 56 samples: 123; 124; 125; 126; 127; 128; 134; 135; 136; 137; 138; 145; 146; 147; 148; 156; 157; 158; 167; 168; 178; 234; 235; 236; 237; 238; 245; 246; 247; 248; 256; 257; 258; 267; 268; 278; 345; 346; 347; 348; 356; 357; 358; 367; 368; 378; 456; 457; 458; 467; 468; 478; 567; 568; 578; 678.

The non-preferred sample combinations for this population are: 123; 126; 128; 136; 137; 146; 147; 167; 178; 248; 234; 236; 237; 246; 247; 347; 458; 467; 468; 478; 578.
The values of Yi and pi are:

Yi : 12 15 17 24 17 19 25 18
pi : .10 .10 .11 .12 .18 .13 .12 .14

Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*. Using (2), the values of pi* are:

p1* = .02; p2* = .02; p3* = .062; p4* = .104; p5* = .356; p6* = .146; p7* = .104; p8* = .188.

The values of p(s) for the preferred sample combinations are:

p(s1) = .0069; p(s2) = .0189; p(s3) = .0069; p(s4) = .0089; p(s5) = .0209; p(s6) = .0129; p(s7) = .0229; p(s8) = .0149; p(s9) = .0249; p(s10) = .0229; p(s11) = .0269; p(s12) = .0169; p(s13) = .0209; p(s14) = .0129; p(s15) = .0229; p(s16) = .0249; p(s17) = .0229; p(s18) = .0269; p(s19) = .0129; p(s20) = .0169; p(s21) = .0149; p(s22) = .0249; p(s23) = .0149; p(s24) = .0169; p(s25) = .0269; p(s26) = .0249; p(s27) = .0289; p(s28) = .0149; p(s29) = .0189; p(s30) = .0169; p(s31) = .0289; p(s32) = .0269; p(s33) = .0289; p(s34) = .0329; p(s35) = .0209.

Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values:

p0(s1)=.0097; p0(s2)=.027; p0(s3)=.0097; p0(s4)=.0125; p0(s5)=.0295; p0(s6)=.0182; p0(s7)=.0324; p0(s8)=.0210; p0(s9)=.0352; p0(s10)=.0323; p0(s11)=.0380; p0(s12)=.0238; p0(s13)=.0295; p0(s14)=.0182; p0(s15)=.0323; p0(s16)=.0352; p0(s17)=.0323; p0(s18)=.0380; p0(s19)=.0182; p0(s20)=.0238; p0(s21)=.0210; p0(s22)=.0352; p0(s23)=.0210; p0(s24)=.0238; p0(s25)=.0380; p0(s26)=.0352; p0(s27)=.0408; p0(s28)=.0210; p0(s29)=.026; p0(s30)=.0238; p0(s31)=.0408; p0(s32)=.0380; p0(s33)=.0408; p0(s34)=.0465; p0(s35)=.0295.

Now the objective function and the constraints for this example are:

Minimize z = 102.96*p1(s)^2+37.44*p2(s)^2+102.96*p3(s)^2+79.71*p4(s)^2+33.85*p5(s)^2+54.91*p6(s)^2+30.89*p7(s)^2+47.52*p8(s)^2+28.40*p9(s)^2+30.89*p10(s)^2+26.29*p11(s)^2+41.88*p12(s)^2+33.85*p13(s)^2+54.91*p14(s)^2+30.89*p15(s)^2+28.40*p16(s)^2+30.89*p17(s)^2+26.29*p18(s)^2+54.91*p19(s)^2+41.88*p20(s)^2+47.52*p21(s)^2+28.40*p22(s)^2+47.52*p23(s)^2+41.88*p24(s)^2+26.29*p25(s)^2+28.40*p26(s)^2+24.47*p27(s)^2+47.52*p28(s)^2+37.44*p29(s)^2+41.88*p30(s)^2+24.47*p31(s)^2+26.29*p32(s)^2+24.47*p33(s)^2+21.49*p34(s)^2+33.85*p35(s)^2-1

Subject to the constraints (16):

1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p34(s)+p35(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s) = 0.3
3. p1(s)+p2(s)+p3(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 0.3
4. p4(s)+p5(s)+p6(s)+p13(s)+p14(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s) = 0.33
5. p1(s)+p4(s)+p7(s)+p8(s)+p15(s)+p22(s)+p23(s)+p24(s)+p31(s)+p32(s) = 0.36
6. p2(s)+p5(s)+p7(s)+p9(s)+p10(s)+p11(s)+p13(s)+p15(s)+p16(s)+p17(s)+p18(s)+p22(s)+p25(s)+p26(s)+p27(s)+p31(s)+p32(s)+p33(s)+p34(s) = 0.54
7. p9(s)+p12(s)+p16(s)+p19(s)+p20(s)+p23(s)+p25(s)+p28(s)+p29(s)+p31(s)+p33(s)+p34(s)+p35(s) = 0.39
8. p3(s)+p10(s)+p17(s)+p19(s)+p21(s)+p26(s)+p28(s)+p30(s)+p32(s)+p33(s)+p35(s) = 0.36
9. p6(s)+p8(s)+p11(s)+p12(s)+p14(s)+p18(s)+p20(s)+p21(s)+p24(s)+p27(s)+p29(s)+p30(s)+p34(s)+p35(s) = 0.42
10. p1(s)+p2(s)+p3(s) ≤ 0.09
11. p4(s)+p5(s)+p6(s) ≤ 0.099
12. p1(s)+p4(s)+p7(s)+p8(s) ≤ 0.108
13. p2(s)+p5(s)+p7(s)+p9(s)+p10(s)+p11(s) ≤ 0.162
14. p9(s)+p12(s) ≤ 0.117
15. p3(s)+p10(s) ≤ 0.108
16. p6(s)+p8(s)+p11(s)+p12(s) ≤ 0.126
17. p13(s)+p14(s) ≤ 0.099
18. p1(s)+p15(s) ≤ 0.108
19. p2(s)+p13(s)+p15(s)+p16(s)+p17(s)+p18(s) ≤ 0.162
20. p16(s)+p19(s)+p20(s) ≤ 0.117
21. p3(s)+p17(s)+p19(s)+p21(s) ≤ 0.108
22. p14(s)+p18(s)+p20(s)+p21(s) ≤ 0.126
23. p4(s)+p22(s)+p23(s)+p24(s) ≤ 0.1188
24. p5(s)+p13(s)+p22(s)+p25(s)+p26(s)+p27(s) ≤ 0.1782
25. p23(s)+p25(s)+p28(s)+p29(s) ≤ 0.1287
26. p26(s)+p28(s)+p30(s) ≤ 0.1188
27. p6(s)+p14(s)+p24(s)+p27(s)+p29(s)+p30(s) ≤ 0.1386
28. p7(s)+p15(s)+p22(s)+p31(s)+p32(s) ≤ 0.1944
29. p23(s)+p31(s) ≤ 0.1404
30. p32(s) ≤ 0.1296
31. p8(s)+p24(s) ≤ 0.1512
32. p9(s)+p16(s)+p25(s)+p31(s)+p33(s)+p34(s) ≤ 0.2106
33. p10(s)+p17(s)+p26(s)+p32(s)+p33(s) ≤ 0.1944
34. p11(s)+p18(s)+p27(s)+p34(s) ≤ 0.2268
35. p19(s)+p28(s)+p33(s)+p35(s) ≤ 0.1404
36. p12(s)+p20(s)+p29(s)+p34(s)+p35(s) ≤ 0.1638
37. p21(s)+p30(s)+p35(s) ≤ 0.1512
38. pi(s) ≥ 0, i = 1, 2, …, 35.
39. πij ≥ 0 for all i ≠ j.

After solving the above model, we get the results displayed in Table 4, with the value of D(p0, p1) equal to 0.195194.

Example 3(b): The population size and the sample size for this example are the same as in part (a), i.e. N = 8 and n = 3; thus the set S of all possible samples and the set of non-preferred samples remain the same as in part (a). The Yi and pi values associated with the 8 villages of the population are:

Yi : 12 15 17 24 17 19 25 18
pi : .05 .09 .2 .15 .10 .11 .12 .18

To find the initial p(s) values we apply Sampford's (1967) plan and get the following values of p(s) (for the preferred sample combinations only):

p(s1) = .0043; p(s2) = .0024; p(s3) = .0031; p(s4) = .0146; p(s5) = .0083; p(s6) = .0199; p(s7) = .0049; p(s8) = .0118; p(s9) = .0031; p(s10) = .0035; p(s11) = .0067; p(s12) = .0076; p(s13) = .0163; p(s14) = .0388; p(s15) = .0096; p(s16) = .0061; p(s17) = .0069; p(s18) = .0132; p(s19) = .0078; p(s20) = .0149; p(s21) = .0167; p(s22) = .0325; p(s23) = .0366; p(s24) = .0761; p(s25) = .0210; p(s26) = .0235; p(s27) = .0441; p(s28) = .0266; p(s29) = .0497; p(s30) = .0556; p(s31) = .0124; p(s32) = .0139; p(s33) = .0089; p(s34) = .0169; p(s35) = .0215.

Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values:

p0(s1)=.0065; p0(s2)=.0036; p0(s3)=.0046; p0(s4)=.0221; p0(s5)=.0125; p0(s6)=.0301; p0(s7)=.0073; p0(s8)=.0178; p0(s9)=.0046; p0(s10)=.0052; p0(s11)=.0101; p0(s12)=.0114; p0(s13)=.0246; p0(s14)=.0587; p0(s15)=.0145; p0(s16)=.0093; p0(s17)=.0104; p0(s18)=.0199; p0(s19)=.0118; p0(s20)=.0225; p0(s21)=.0253; p0(s22)=.0492; p0(s23)=.0555; p0(s24)=.1152; p0(s25)=.0317; p0(s26)=.0356; p0(s27)=.0667; p0(s28)=.0403; p0(s29)=.0752; p0(s30)=.0842; p0(s31)=.0188; p0(s32)=.0211; p0(s33)=.0135; p0(s34)=.0257; p0(s35)=.0326.
Now the objective function and the constraints are:

Minimize z = 154.95*p1(s)^2+276.35*p2(s)^2+216.24*p3(s)^2+45.21*p4(s)^2+79.67*p5(s)^2+33.16*p6(s)^2+135.63*p7(s)^2+55.9*p8(s)^2+213.13*p9(s)^2+189.18*p10(s)^2+98.76*p11(s)^2+87.24*p12(s)^2+40.5*p13(s)^2+17.02*p14(s)^2+68.63*p15(s)^2+107.5*p16(s)^2+95.49*p17(s)^2+50.11*p18(s)^2+84.31*p19(s)^2+44.31*p20(s)^2+39.43*p21(s)^2+20.3*p22(s)^2+17.99*p23(s)^2+8.67*p24(s)^2+31.44*p25(s)^2+28.01*p26(s)^2+14.97*p27(s)^2+24.8*p28(s)^2+13.28*p29(s)^2+11.86*p30(s)^2+53.15*p31(s)^2+47.28*p32(s)^2+73.85*p33(s)^2+38.86*p34(s)^2+30.61*p35(s)^2-1

Subject to the constraints defined in (16), with the right-hand-side values changed to: 1, 0.15, 0.27, 0.6, 0.45, 0.3, 0.33, 0.36, 0.54, 0.0405, 0.09, 0.0675, 0.045, 0.0495, 0.054, 0.081, 0.162, 0.1215, 0.081, 0.0891, 0.0972, 0.1458, 0.27, 0.18, 0.198, 0.216, 0.324, 0.135, 0.1485, 0.162, 0.243, 0.099, 0.108, 0.162, 0.1188, 0.1782, 0.1944.

After solving the above model, we get the results displayed in Table 4, with the value of D(p0, p1) equal to 0.441567.

Example 4(a): Consider a population of 8 villages from which a sample of 4 villages is to be drawn. The set S of all possible samples consists of the following 70 samples: 1234; 1235; 1236; 1237; 1238; 1245; 1246; 1247; 1248; 1256; 1257; 1258; 1267; 1268; 1278; 1345; 1346; 1347; 1348; 1356; 1357; 1358; 1367; 1368; 1378; 1456; 1457; 1458; 1467; 1468; 1478; 1567; 1568; 1578; 1678; 2345; 2346; 2347; 2348; 2356; 2357; 2358; 2367; 2368; 2378; 2456; 2457; 2458; 2467; 2468; 2478; 2567; 2568; 2578; 2678; 3456; 3457; 3458; 3467; 3468; 3478; 3567; 3568; 3578; 3678; 4567; 4568; 4578; 4678; 5678.

The 28 non-preferred sample combinations are those already listed in Example 4 of Section 2.3. The values of Yi and pi are:

Yi : 12 15 17 24 17 19 25 18
pi : .11 .11 .12 .13 .17 .12 .11 .13

Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*. Using (2), the values of pi* are:

p1* = .02; p2* = .02; p3* = .09; p4* = .16; p5* = .44; p6* = .09; p7* = .02; p8* = .16.

The values of p(s) for the preferred sample combinations are:

p(s1) = .0163; p(s2) = .0043; p(s3) = .0183; p(s4) = .0063; p(s5) = .0163; p(s6) = .0143; p(s7) = .0183; p(s8) = .0043; p(s9) = .0063; p(s10) = .0203; p(s11) = .0083; p(s12) = .0183; p(s13) = .0203; p(s14) = .0063; p(s15) = .0103; p(s16) = .0083; p(s17) = .0183; p(s18) = .0223; p(s19) = .0083; p(s20) = .0103; p(s21) = .0183; p(s22) = .0083; p(s23) = .0123; p(s24) = .0183; p(s25) = .0163; p(s26) = .0203; p(s27) = .0063; p(s28) = .0103; p(s29) = .0083; p(s30) = .0183; p(s31) = .0223; p(s32) = .0083; p(s33) = .0103; p(s34) = .0183; p(s35) = .0203; p(s36) = .0243; p(s37) = .0103; p(s38) = .0123; p(s39) = .0223; p(s40) = .0203; p(s41) = .0243; p(s42) = .0223.

Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values:
p0(s1)=.0268; p0(s2)=.007; p0(s3)=.0301; p0(s4)=.0103; p0(s5)=.0268; p0(s6)=.0235; p0(s7)=.0301; p0(s8)=.007; p0(s9)=.0103; p0(s10)=.0333; p0(s11)=.0136; p0(s12)=.0301; p0(s13)=.0333; p0(s14)=.0103; p0(s15)=.0169; p0(s16)=.0136; p0(s17)=.0301; p0(s18)=.0366; p0(s19)=.0136; p0(s20)=.0169; p0(s21)=.0300; p0(s22)=.0136; p0(s23)=.0202; p0(s24)=.0301; p0(s25)=.0267; p0(s26)=.033; p0(s27)=.0103; p0(s28)=.0169; p0(s29)=.0136; p0(s30)=.0301; p0(s31)=.0366; p0(s32)=.0136; p0(s33)=.0169; p0(s34)=.0301; p0(s35)=.0333; p0(s36)=.0399; p0(s37)=.0169; p0(s38)=.0202; p0(s39)=.0366; p0(s40)=.0333; p0(s41)=.0399; p0(s42)=.0366.

Now the objective function and the constraints for this example are:

Minimize z = 37.33*p1(s)^2+141.86*p2(s)^2+33.25*p3(s)^2+96.72*p4(s)^2+37.33*p5(s)^2+42.56*p6(s)^2+33.25*p7(s)^2+141.86*p8(s)^2+96.72*p9(s)^2+29.97*p10(s)^2+73.37*p11(s)^2+33.25*p12(s)^2+29.97*p13(s)^2+96.72*p14(s)^2+59.11*p15(s)^2+73.37*p16(s)^2+33.25*p17(s)^2+27.28*p18(s)^2+73.37*p19(s)^2+59.11*p20(s)^2+33.25*p21(s)^2+73.37*p22(s)^2+49.48*p23(s)^2+33.25*p24(s)^2+37.33*p25(s)^2+29.97*p26(s)^2+96.72*p27(s)^2+59.11*p28(s)^2+73.37*p29(s)^2+33.25*p30(s)^2+27.28*p31(s)^2+73.37*p32(s)^2+59.11*p33(s)^2+33.25*p34(s)^2+29.97*p35(s)^2+25.03*p36(s)^2+59.11*p37(s)^2+49.48*p38(s)^2+27.28*p39(s)^2+29.97*p40(s)^2+25.03*p41(s)^2+27.28*p42(s)^2-1

Subject to the constraints (17):

1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p34(s)+p35(s)+p36(s)+p37(s)+p38(s)+p39(s)+p40(s)+p41(s)+p42(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 0.44
3. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p34(s) = 0.44
4. p1(s)+p2(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p35(s)+p36(s)+p37(s)+p38(s)+p39(s)+p40(s) = 0.48
5. p3(s)+p4(s)+p10(s)+p11(s)+p17(s)+p18(s)+p19(s)+p20(s)+p22(s)+p23(s)+p30(s)+p31(s)+p32(s)+p33(s)+p35(s)+p36(s)+p37(s)+p38(s)+p41(s)+p42(s) = 0.52
6. p1(s)+p3(s)+p5(s)+p6(s)+p7(s)+p10(s)+p12(s)+p13(s)+p17(s)+p18(s)+p21(s)+p24(s)+p25(s)+p26(s)+p30(s)+p31(s)+p34(s)+p35(s)+p36(s)+p39(s)+p40(s)+p41(s)+p42(s) = 0.68
7. p5(s)+p8(s)+p12(s)+p14(s)+p15(s)+p19(s)+p24(s)+p27(s)+p28(s)+p32(s)+p37(s)+p39(s)+p41(s) = 0.48
8. p2(s)+p4(s)+p6(s)+p8(s)+p9(s)+p11(s)+p14(s)+p16(s)+p17(s)+p19(s)+p20(s)+p21(s)+p22(s)+p25(s)+p27(s)+p29(s)+p30(s)+p32(s)+p33(s)+p34(s)+p35(s)+p37(s)+p38(s)+p40(s)+p42(s) = 0.44
9. p7(s)+p9(s)+p13(s)+p15(s)+p16(s)+p18(s)+p20(s)+p21(s)+p23(s)+p26(s)+p28(s)+p29(s)+p31(s)+p33(s)+p34(s)+p36(s)+p38(s)+p39(s)+p40(s)+p41(s)+p42(s) = 0.52
10. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s) ≤ 0.1936
11. p1(s)+p2(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s) ≤ 0.2112
12. p3(s)+p4(s)+p10(s)+p11(s)+p17(s)+p18(s)+p19(s)+p20(s) ≤ 0.2288
13. p1(s)+p3(s)+p5(s)+p6(s)+p7(s)+p10(s)+p12(s)+p13(s)+p17(s)+p18(s)+p21(s) ≤ 0.2992
14. p5(s)+p8(s)+p12(s)+p14(s)+p15(s)+p19(s) ≤ 0.2112
15. p2(s)+p4(s)+p6(s)+p8(s)+p9(s)+p11(s)+p14(s)+p16(s)+p17(s)+p19(s)+p20(s)+p21(s) ≤ 0.1936
16. p7(s)+p9(s)+p13(s)+p15(s)+p16(s)+p18(s)+p20(s)+p21(s) ≤ 0.2288
17. p1(s)+p2(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s) ≤ 0.2112
18. p3(s)+p4(s)+p22(s)+p23(s)+p30(s)+p31(s)+p32(s)+p33(s) ≤ 0.2288
19. p1(s)+p3(s)+p5(s)+p6(s)+p7(s)+p24(s)+p25(s)+p26(s)+p30(s)+p31(s)+p34(s) ≤ 0.2992
20. p5(s)+p8(s)+p24(s)+p27(s)+p28(s)+p32(s) ≤ 0.2112
21. p2(s)+p4(s)+p6(s)+p8(s)+p9(s)+p22(s)+p25(s)+p27(s)+p29(s)+p30(s)+p32(s)+p33(s)+p34(s) ≤ 0.1936
22. p7(s)+p9(s)+p23(s)+p26(s)+p28(s)+p29(s)+p31(s)+p33(s)+p34(s) ≤ 0.2288
23. p10(s)+p11(s)+p22(s)+p23(s)+p35(s)+p36(s)+p37(s)+p38(s) ≤ 0.2496
24. p1(s)+p10(s)+p12(s)+p13(s)+p24(s)+p25(s)+p26(s)+p35(s)+p36(s)+p39(s)+p40(s) ≤ 0.3264
25. p12(s)+p14(s)+p15(s)+p24(s)+p27(s)+p28(s)+p37(s)+p39(s) ≤ 0.2304
26. p2(s)+p11(s)+p14(s)+p16(s)+p22(s)+p25(s)+p27(s)+p29(s)+p35(s)+p37(s)+p38(s)+p40(s) ≤ 0.2112
27. p13(s)+p15(s)+p16(s)+p23(s)+p26(s)+p28(s)+p29(s)+p36(s)+p38(s)+p39(s)+p40(s) ≤ 0.2496
28. p3(s)+p10(s)+p17(s)+p18(s)+p30(s)+p31(s)+p35(s)+p36(s)+p41(s)+p42(s) ≤ 0.3536
29. p19(s)+p32(s)+p37(s)+p41(s) ≤ 0.2496
30. p4(s)+p11(s)+p17(s)+p19(s)+p20(s)+p22(s)+p30(s)+p33(s)+p35(s)+p37(s)+p38(s)+p42(s) ≤ 0.2288
31. p18(s)+p20(s)+p23(s)+p31(s)+p33(s)+p36(s)+p38(s)+p41(s)+p42(s) ≤ 0.2704
32. p5(s)+p12(s)+p24(s)+p39(s)+p41(s) ≤ 0.3264
33. p6(s)+p17(s)+p21(s)+p25(s)+p30(s)+p34(s)+p35(s)+p40(s)+p42(s) ≤ 0.2992
34. p7(s)+p13(s)+p18(s)+p21(s)+p26(s)+p31(s)+p34(s)+p36(s)+p39(s)+p40(s)+p41(s)+p42(s) ≤ 0.3536
35. p8(s)+p14(s)+p19(s)+p27(s)+p32(s)+p37(s) ≤ 0.2112
36. p15(s)+p28(s)+p39(s)+p41(s) ≤ 0.2496
37. p9(s)+p16(s)+p20(s)+p21(s)+p29(s)+p33(s)+p34(s)+p38(s)+p40(s)+p42(s) ≤ 0.2288
38. pi(s) ≥ 0, i = 1, 2, …, 42.
39. πij ≥ 0 for all i ≠ j.

After solving the above model, we get the results displayed in Table 5, with the value of D(p0, p1) equal to 0.405197.

Example 4(b): Again consider the population with N = 8, now with n = 4. The set S of all possible samples and the set of non-preferred samples remain the same as in part (a). The Yi and pi values associated with the 8 villages of the population are:

Yi : 12 15 17 24 17 19 25 18
pi : .09 .09 .18 .11 .12 .14 .17 .10

To find the initial p(s) values we apply Sampford's (1967) plan and get the following values of p(s) (for the preferred sample combinations only):

p(s1) = .0084; p(s2) = .0175; p(s3) = .0029; p(s4) = .0061; p(s5) = .0045; p(s6) = .0070; p(s7) = .0025; p(s8) = .0093; p(s9) = .0052; p(s10) = .0112; p(s11) = .0233; p(s12) = .0171; p(s13) = .0097; p(s14) = .0353; p(s15) = .0129; p(s16) = .0202; p(s17) = .0095; p(s18) = .0033; p(s19) = .0125; p(s20) = .0071; p(s21) = .0082; p(s22) = .0233; p(s23) = .0084; p(s24) = .0171; p(s25) = .0268; p(s26) = .0097; p(s27) = .0353; p(s28) = .0129; p(s29) = .0202; p(s30) = .0095; p(s31) = .0033; p(s32) = .0125; p(s33) = .0071; p(s34) = .0082; p(s35) = .0357; p(s36) = .0130; p(s37) = .0469; p(s38) = .0270; p(s39) = .0199; p(s40) = .0310; p(s41) = .0070; p(s42) = .0110.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values:

p0(s1)=.0135; p0(s2)=.0281; p0(s3)=.0046; p0(s4)=.0098; p0(s5)=.0072; p0(s6)=.0113; p0(s7)=.0040; p0(s8)=.0150; p0(s9)=.0085; p0(s10)=.0181; p0(s11)=.0375; p0(s12)=.0276; p0(s13)=.0157; p0(s14)=.0568; p0(s15)=.0208; p0(s16)=.0326; p0(s17)=.0152; p0(s18)=.0054; p0(s19)=.0202; p0(s20)=.0114; p0(s21)=.0132; p0(s22)=.0375; p0(s23)=.0136; p0(s24)=.0276; p0(s25)=.0431; p0(s26)=.0157; p0(s27)=.0568; p0(s28)=.0208; p0(s29)=.0326; p0(s30)=.0152; p0(s31)=.0054; p0(s32)=.0202; p0(s33)=.0114; p0(s34)=.0132; p0(s35)=.0575; p0(s36)=.0210; p0(s37)=.0756; p0(s38)=.0435; p0(s39)=.0320; p0(s40)=.0500; p0(s41)=.0112; p0(s42)=.0177.

Now the objective function and the constraints for this example are:

Minimize z = 74.02*p1(s)^2+35.57*p2(s)^2+213.54*p3(s)^2+101.34*p4(s)^2+138.88*p5(s)^2+87.89*p6(s)^2+247.47*p7(s)^2+66.24*p8(s)^2+117.27*p9(s)^2+55.11*p10(s)^2+26.6*p11(s)^2+36.19*p12(s)^2+63.68*p13(s)^2+17.59*p14(s)^2+48.07*p15(s)^2+30.67*p16(s)^2+65.39*p17(s)^2+183.27*p18(s)^2+49.36*p19(s)^2+87.12*p20(s)^2+75.58*p21(s)^2+26.6*p22(s)^2+73.38*p23(s)^2+36.19*p24(s)^2+23.15*p25(s)^2+63.68*p26(s)^2+17.59*p27(s)^2+48.07*p28(s)^2+30.67*p29(s)^2+65.39*p30(s)^2+183.27*p31(s)^2+49.36*p32(s)^2+87.12*p33(s)^2+75.58*p34(s)^2+17.36*p35(s)^2+47.45*p36(s)^2+13.22*p37(s)^2+22.95*p38(s)^2+31.2*p39(s)^2+19.99*p40(s)^2+88.64*p41(s)^2+56.27*p42(s)^2-1

Subject to the constraints defined in (17), with the right-hand-side values changed to: 1, 0.36, 0.36, 0.72, 0.44, 0.48, 0.56, 0.68, 0.4, 0.1296, 0.2592, 0.1584, 0.1728, 0.2016, 0.2448, 0.144, 0.2592, 0.1584, 0.1728, 0.2016, 0.2448, 0.144, 0.3168, 0.3456, 0.4032, 0.4896, 0.288, 0.2112, 0.2464, 0.2992, 0.176, 0.2688, 0.3264, 0.192, 0.3808, 0.224, 0.272.

After solving the above model, we get the results displayed in Table 5, with the value of D(p0, p1) equal to 0.579257.

Example 5: Consider a population of 7 villages from which a sample of 4 villages is to be drawn. The set S of all possible samples consists of the following 35 samples: 1234; 1235; 1236; 1237; 1245; 1246; 1247; 1256; 1257; 1267; 1345; 1346; 1347; 1356; 1357; 1367; 1456; 1457; 1467; 1567; 2345; 2346; 2347; 2356; 2357; 2367; 2456; 2457; 2467; 2567; 3456; 3457; 3467; 3567; 4567.

The 14 non-preferred sample combinations are those already listed in Example 5 of Section 2.3. The values of Yi and pi are:

Yi : 12 15 17 24 17 19 25
pi : .14 .13 .15 .13 .16 .15 .14

Since the pi values satisfy the condition (1), we apply the MS scheme. Using (2), the values of pi* are:

p1* = .12; p2* = .04; p3* = .2; p4* = .04; p5* = .28; p6* = .2; p7* = .12.

The values of p(s) for the preferred sample combinations are:

p(s1) = .032; p(s2) = .024; p(s3) = .024; p(s4) = .016; p(s5) = .032; p(s6) = .028; p(s7) = .024; p(s8) = .032; p(s9) = .024; p(s10) = .04; p(s11) = .032; p(s12) = .028; p(s13) = .024; p(s14) = .02; p(s15) = .036; p(s16) = .032; p(s17) = .028; p(s18) = .024; p(s19) = .02; p(s20) = .032; p(s21) = .02.

Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values:

p0(s1)=.0559; p0(s2)=.0419; p0(s3)=.0419; p0(s4)=.0279; p0(s5)=.0559; p0(s6)=.0489; p0(s7)=.0419; p0(s8)=.0559; p0(s9)=.0419; p0(s10)=.0699; p0(s11)=.0559; p0(s12)=.0489; p0(s13)=.0419; p0(s14)=.0349; p0(s15)=.0629; p0(s16)=.0559; p0(s17)=.0489; p0(s18)=.0419; p0(s19)=.0349; p0(s20)=.0559; p0(s21)=.0349.
Now the objective function and the constraints for this example are:

Minimize z = 18.13*p1(s)^2+24.17*p2(s)^2+24.17*p3(s)^2+36.25*p4(s)^2+18.13*p5(s)^2+20.71*p6(s)^2+24.17*p7(s)^2+18.13*p8(s)^2+24.17*p9(s)^2+14.5*p10(s)^2+18.13*p11(s)^2+20.71*p12(s)^2+24.17*p13(s)^2+29*p14(s)^2+16.11*p15(s)^2+18.13*p16(s)^2+20.71*p17(s)^2+24.17*p18(s)^2+29*p19(s)^2+18.13*p20(s)^2+20.71*p21(s)^2-1

Subject to the following constraints:

1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s) = 0.56
3. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s) = 0.52
4. p1(s)+p2(s)+p8(s)+p9(s)+p10(s)+p11(s)+p14(s)+p15(s)+p16(s)+p17(s)+p20(s)+p21(s) = 0.6
5. p3(s)+p4(s)+p8(s)+p9(s)+p12(s)+p13(s)+p14(s)+p18(s)+p19(s)+p20(s)+p21(s) = 0.52
6. p1(s)+p3(s)+p5(s)+p6(s)+p8(s)+p10(s)+p12(s)+p15(s)+p16(s)+p18(s)+p20(s) = 0.64
7. p5(s)+p7(s)+p10(s)+p11(s)+p13(s)+p15(s)+p17(s)+p19(s)+p21(s) = 0.6
8. p2(s)+p4(s)+p6(s)+p7(s)+p9(s)+p11(s)+p12(s)+p13(s)+p14(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 0.56
9. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s) ≤ 0.2912
10. p1(s)+p2(s)+p8(s)+p9(s)+p10(s)+p11(s) ≤ 0.336
11. p3(s)+p4(s)+p8(s)+p9(s)+p12(s)+p13(s) ≤ 0.2912
12. p1(s)+p3(s)+p5(s)+p6(s)+p8(s)+p10(s)+p12(s) ≤ 0.3584
13. p5(s)+p7(s)+p10(s)+p11(s)+p13(s) ≤ 0.336
14. p2(s)+p4(s)+p6(s)+p7(s)+p9(s)+p11(s)+p12(s)+p13(s) ≤ 0.3136
15. p1(s)+p2(s)+p14(s)+p15(s)+p16(s)+p17(s) ≤ 0.312
16. p3(s)+p4(s)+p14(s)+p18(s)+p19(s) ≤ 0.2704
17. p1(s)+p3(s)+p5(s)+p6(s)+p15(s)+p16(s)+p18(s) ≤ 0.3328
18. p5(s)+p7(s)+p15(s)+p17(s)+p19(s) ≤ 0.312
19. p2(s)+p4(s)+p6(s)+p7(s)+p14(s)+p16(s)+p17(s)+p18(s)+p19(s) ≤ 0.2912
20. p8(s)+p9(s)+p14(s)+p20(s)+p21(s) ≤ 0.312
21. p8(s)+p10(s)+p15(s)+p16(s)+p20(s) ≤ 0.384
22. p10(s)+p11(s)+p15(s)+p17(s)+p21(s) ≤ 0.36
23. p2(s)+p9(s)+p11(s)+p14(s)+p16(s)+p17(s)+p20(s)+p21(s) ≤ 0.336
24. p3(s)+p8(s)+p12(s)+p18(s)+p20(s) ≤ 0.3328
25. p13(s)+p19(s)+p21(s) ≤ 0.312
26. p4(s)+p9(s)+p12(s)+p13(s)+p14(s)+p18(s)+p19(s)+p20(s)+p21(s) ≤ 0.2912
27. p5(s)+p10(s)+p15(s) ≤ 0.384
28. p6(s)+p12(s)+p16(s)+p18(s)+p20(s) ≤ 0.3584
29. p7(s)+p11(s)+p13(s)+p17(s)+p19(s)+p21(s) ≤ 0.336
30. pi(s) ≥ 0, i = 1, 2, …, 21.
31. πij ≥ 0 for all i ≠ j.

After solving the above model, we get the results displayed in Table 6, with the value of D(p0, p1) equal to .228878.

CHAPTER III

TWO DIMENSIONAL OPTIMAL CONTROLLED NEAREST PROPORTIONAL TO SIZE SAMPLING DESIGN USING QUADRATIC PROGRAMMING

3.1 INTRODUCTION

Controlled selection, originated by Goodman and Kish (1950), may be described as a method of sampling from a finite universe that permits multiple stratification beyond what is possible by stratified random sampling, while conforming strictly to the requirements of probability sampling. In many practical situations, some combinations of units may be too expensive, less prominent or even undesirable to include in the sample. The samples containing such undesirable combinations of units are termed non-preferred samples. Controlled selection in such cases either excludes the possibility of including such combinations of units or assigns them a low probability of selection. Controls may be imposed to secure a proper distribution, geographical or otherwise, and to secure adequate sample sizes for some domains of the population.
In fact, any departure from simple random sampling may be regarded as a control, which increases the selection probability of preferred combinations by eliminating or reducing the non-preferred combinations. This situation generally arises in field experiments, where practical considerations make some units undesirable but theoretical compulsions make it necessary to follow probability sampling. An important area where controlled selection can be effectively used is sampling in two or more dimensions. Multi-dimensional sampling problems often arise in social research dealing with highly variable populations requiring stratification in several directions. Bryant (1961), Hess and Srikantan (1966), Moore et al. (1974) and Jessen (1975) demonstrated the need for multi-way stratification in different real-life situations. This multiple stratification often leads to more strata cells than can be accommodated in a sampling design. For example, in Jessen's (1975) study with 12 geographic areas and 12 income classes, there were 144 strata cells, but funds were available to sample only 24 of the cells. Similarly, in Bryant's (1961) study, with 5 locations, two times of day, 4 seasons and two types of days, there were 80 strata cells, but only 46 cells could be covered within the budget of the study. This leads to the need for stratification techniques which permit fewer cells to be sampled than the total number of strata cells, without sacrificing the requirements of probability sampling. Controlled selection has been effectively used by different researchers to deal with such situations. Multi-dimensional controlled selection is highly useful when the number of strata cells exceeds the permissible sample size. Another situation that needs attention arises when stratification cannot fully exploit the gains of controls, leading to the need for 'controls beyond stratification'. Controls beyond stratification further enhance the probabilities of preferred combinations by eliminating or reducing the undesirable combinations that violate defined control classes. A real problem emphasizing the need for 'controls beyond stratification' was discussed by Goodman and Kish (1950, p. 354). Tiwari and Nigam (1998, p. 92) also demonstrated the utility of 'controls beyond stratification' with the help of an example borrowed from Jessen (1978). Two-way and multi-way stratification have been discussed by Jessen (1970, 1973, 1975, 1978) under the titles of 'lattice sampling' and 'multi-stratification'. Goodman and Kish (1950), Hess and Srikantan (1966) and Waterton (1983) also discussed multi-dimensional controlled selection and proposed different methods for achieving the controls. All these methods of multi-dimensional controlled selection are quite arbitrary, involving a lot of trial and error for selecting the samples, and in many situations they even fail to produce a solution. Causey et al. (1985) were the first to use transportation theory to solve the two-dimensional controlled selection problem. Their method is efficient but complex to implement. Rao and Nigam (1990, 1992) used the simplex method of linear programming to solve the one-dimensional controlled selection problem. Taking inspiration from Rao and Nigam (1990, 1992), Sitter and Skinner (1994) extended the linear programming approach to multi-way stratification.
Tiwari and Nigam (1998) also used the simplex method of linear programming to solve the two-dimensional optimal controlled selection problem with 'controls beyond stratification' and considered the related estimation problems. The plan of Tiwari and Nigam (1998) is best suited to problems with integer marginals, whereas the method of Sitter and Skinner (1994) is best suited to non-integer marginals. Here, 'marginals' are the totals corresponding to each row, each column and the grand total in a two-dimensional table. Extending the linear programming approach of Sitter and Skinner (1994), Lu and Sitter (2002) developed some methods to reduce the amount of computation, so that very large problems became feasible using the linear programming approach. Recently, using quadratic programming, Tiwari et al. (2007) applied the idea of the 'nearest proportional to size sampling design', originated by Gabler (1987), to one-dimensional optimal controlled selection designs which fully exclude the non-preferred combinations of units from the selected samples.

In this chapter, we extend the idea of Tiwari et al. (2007) to multi-dimensional controlled selection problems with 'controls beyond stratification'. The proposed plan appears to be superior to the earlier two-dimensional controlled selection plans, as it ensures zero probability to non-preferred samples. The greatest difficulty with multi-dimensional controlled selection problems is that, owing to the increased magnitude and complexity of the problem, the process of enumerating all possible samples becomes quite tedious. The methodological modification of the multi-dimensional approach over the one-dimensional approach is that, instead of taking all the NCn combinations as the set of all possible samples, we consider only the subset of the NCn combinations which satisfies the marginal constraints of the given multi-dimensional problem. A further difficulty with multi-dimensional controlled selection problems, as also pointed out by Tiwari and Nigam (1998), is that they generally do not satisfy the non-negativity condition of the Yates-Grundy (1953) form of the Horvitz-Thompson (1952) variance estimator. This leads to the omission of the corresponding constraint from the plan and the introduction of an alternative variance estimator: we suggest a random group method for variance estimation in two-dimensional controlled selection problems, which appears to perform better than the 'split sample' method of Jessen (1975) and the 'half-sample' method of Tiwari and Nigam (1998). In Section 3.2, the proposed design is discussed. In Section 3.3, the proposed design is illustrated with the help of some numerical examples. In Section 3.4, we suggest the random group method for variance estimation in two-dimensional controlled selection problems and demonstrate its utility with the help of examples.
3.2 THE TWO DIMENSIONAL OPTIMAL CONTROLLED NEAREST PROPORTIONAL TO SIZE SAMPLING DESIGN

In what follows, we use the idea of 'nearest proportional to size sampling designs', originated by Gabler (1987), to propose a two-dimensional optimal controlled IPPS sampling design that matches the original inclusion probabilities (πi's) of each unit in the population and ensures zero probability to non-preferred samples.

Let us consider a two-dimensional population of y's consisting of N elements, and let the x's be their measures of size. The selection probabilities of the N units of the population (pi's) are known and are given by pi = xi/X, where X = x1 + x2 + ... + xN. Suppose a sample of size n is to be drawn from this population. We denote the inclusion probability of the i-th unit in the sample by πi, where πi = n pi. Let S and S1 denote, respectively, the set of all possible samples and the set of non-preferred samples.

In the proposed plan, using the given selection probabilities for the N units of the population (pi's), we first obtain an appropriate uncontrolled IPPS design p(s), such as the Sampford (1967) or Midzuno-Sen (1952, 1953) design. In this discussion, we make use of Sampford's (1967) IPPS design to obtain our initial uncontrolled IPPS design p(s), as this design imposes only one restriction on the initial probabilities (pi's), namely pi ≤ 1/n, whereas the other IPPS designs impose more stringent restrictions on the initial probabilities. For instance, the Midzuno-Sen (1952, 1953) IPPS scheme has the restriction that (n-1)/{n(N-1)} ≤ pi ≤ 1/n, which limits the applicability of the method to units that are rather similar in size. Using Sampford's scheme, the probability of including n units in the s-th sample is given by

p(s) = π(i1 i2 ... in) = n kn λi1 λi2 ... λin (1 - Σu=1..n piu),   (3.2.1)

where kn = [Σt=1..n t Ln-t / n^t]^(-1), λi = pi/(1 - pi), and, for a set S(m) of m ≤ N different units i1, i2, ..., im, Lm is defined by

L0 = 1 and Lm = Σ over S(m) of λi1 λi2 ... λim   (1 ≤ m ≤ N).

After obtaining the initial IPPS design p(s), the idea behind the proposed plan is to get rid of the non-preferred samples S1 by restricting ourselves to the set S - S1. To attain this objective, following the idea of Tiwari et al. (2007), we introduce a new design p0(s) given by

p0(s) = p(s) / (1 - Σ over s∈S1 of p(s)),  for s ∈ S - S1,
p0(s) = 0,  otherwise,   (3.2.2)

where p(s) is the initial uncontrolled IPPS sampling plan. The design p0(s) assigns zero probability to the non-preferred samples. Due to practical considerations, one would like to implement the sampling design p0(s). However, the design p0(s) is in general no longer an IPPS design, and an IPPS design may be desirable due to theoretical considerations. Therefore, applying the idea of Gabler (1987), we seek an IPPS design p1(s) which is as near as possible to the design p0(s). To achieve the design p1(s) which is as near as possible to p0(s) and also satisfies the condition of an IPPS design, we minimize the directed distance D from the sampling design p1(s) to p0(s), defined as

D(p0, p1) = Ep0(p1/p0 - 1)^2 = Σs p1(s)^2 / p0(s) - 1,   (3.2.3)

subject to the following constraints:

(i) p1(s) ≥ 0,
(ii) Σ over s∈S-S1 of p1(s) = 1,   (3.2.4)
(iii) Σ over s∋i of p1(s) = πi, for p1(s) to be an IPPS design.

The minimization of the objective function (3.2.3) subject to the constraints (3.2.4) is achieved through quadratic programming, using the Microsoft Excel Solver of the Microsoft Office 2000 package.
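To make the optimisation in (3.2.3)-(3.2.4) concrete, the following Python sketch is added here as an illustration; the thesis itself used the Excel Solver, and the solution route below (solving the Karush-Kuhn-Tucker system of the quadratic programme directly) is an assumption of convenience, not the original implementation. The data are those of Example 4 in the Appendix to this chapter: five preferred samples, the p0(s) values listed there, and the incidence matrix encoding its constraints 1-10.

import numpy as np

# Quadratic programme: min sum p1(s)^2/p0(s)  s.t.  A p1 = b, p1 >= 0.
# Data from Appendix Example 4 (3x3 population, N = 9, n = 6).
p0 = np.array([0.649866, 0.0716183, 0.09284, 0.09284, 0.09284])

# Row 0: probabilities sum to 1; rows 1-9: samples containing unit i
# sum to pi_i (constraints 1-10 of Appendix Example 4).
A = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],   # unit 1: pi = 0.8
    [0, 1, 1, 0, 1],   # unit 2: pi = 0.5
    [1, 0, 0, 1, 1],   # unit 3: pi = 0.7
    [1, 0, 1, 0, 1],   # unit 4: pi = 0.7
    [1, 1, 0, 1, 1],   # unit 5: pi = 0.8
    [0, 1, 1, 1, 0],   # unit 6: pi = 0.5
    [0, 1, 0, 1, 1],   # unit 7: pi = 0.5
    [1, 0, 1, 1, 0],   # unit 8: pi = 0.7
    [1, 1, 1, 0, 1],   # unit 9: pi = 0.8
])
b = np.array([1.0, 0.8, 0.5, 0.7, 0.7, 0.8, 0.5, 0.5, 0.7, 0.8])

# KKT conditions: 2*diag(1/p0)*p1 + A'mu = 0 and A p1 = b.  A least-squares
# solve copes with the linearly dependent rows of A; the non-negativity
# bounds turn out to be inactive at the optimum here.
nvar, ncon = len(p0), len(b)
K = np.block([[2 * np.diag(1 / p0), A.T],
              [A, np.zeros((ncon, ncon))]])
rhs = np.concatenate([np.zeros(nvar), b])
p1 = np.linalg.lstsq(K, rhs, rcond=None)[0][:nvar]

print(np.round(p1, 3))          # [0.3, 0.1, 0.2, 0.2, 0.2], as in the Appendix
print(np.sum(p1**2 / p0) - 1)   # D(p0, p1), approx 0.57

For this tiny example the equality constraints already pin down p1(s) uniquely; in the larger examples of Section 3.3 the feasible region contains many designs, and the objective genuinely selects the one nearest to p0(s).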
The constraints (i) and (ii) in (3.2.4) are necessary for any sampling design, and the constraint (iii) ensures that the resultant design p1(s) is an IPPS design. We also tried to add one more constraint,

Σ over s∋i,j of p1(s) ≤ πi πj,  i < j = 1, ..., N,

to (3.2.4), to ensure the non-negativity of the Yates-Grundy form of the H-T variance estimator, and applied it to all the two-dimensional problems considered by us. However, in no case did it yield a solution, implying that this condition is too stringent to be satisfied in any two-dimensional controlled selection problem. Consequently, this constraint was dropped, and an alternative variance estimator for two-dimensional controlled selection is suggested in Section 3.4.

The distance measure D(p0, p1) defined in (3.2.3) is similar to the χ²-statistic often employed in related problems and was also used by Cassel and Särndal (1972) and Gabler (1987). A few other distance measures are discussed by Takeuchi et al. (1983). Two alternative distance measures for the present discussion may be defined as

D(p0, p1) = Σs |p0(s) - p1(s)|,   (3.2.5)

D(p0, p1) = Σs (p0(s) - p1(s))^2 / (p0(s) + p1(s)).   (3.2.6)

When these distance measures were applied to the different numerical problems considered by us, we found that (3.2.5) and (3.2.6) gave results similar to (3.2.3) in convergence and efficiency, so we report results using (3.2.3), as it is a widely used distance measure for similar problems. However, the other distance measures may also be used, as per the convenience of the investigator.

While all the other two-dimensional optimal controlled selection plans discussed by earlier authors attempt to minimize the selection probabilities of the non-preferred samples, the proposed plan completely eliminates the non-preferred samples by assigning zero probabilities to them. The proposed plan is superior to the plans of Sitter and Skinner (1994) and Tiwari and Nigam (1998) in the sense that it assures zero probability to non-preferred samples and is much nearer to the controlled design p0(s), which we wanted to achieve due to practical considerations. Moreover, the proposed plan also incorporates the possibility of 'controls beyond stratification', which was not considered by Sitter and Skinner (1994). One limitation of the proposed plan is that it becomes quite time-consuming for drawing samples from large populations. It can, however, work very well for sampling in small populations, particularly in field experimentation. This has been demonstrated with the help of a real-life example where two-dimensional stratification is required in plot sampling in field experiments [see Example 3, Section 3.3]. The proposed method may also be used to select a small number of first-stage units from each of a large number of strata. This involves the solution of a series of quadratic programming problems, each of a reasonable size.

3.3 EXAMPLES

In this Section, we consider some examples to demonstrate the utility of the proposed procedure over the existing optimal controlled selection methods.

Example 1: We first demonstrate the proposed method on a hypothetical example borrowed from Bryant et al. (1960), given in Table 1. The desired sample size of n = 10 is less than the total number of cells, 15. The integer parts of the npi's are known as certainty proportions. For example, in cell (1,1), with npi = 1.0, the certainty proportion is 1, which indicates that one unit is to be selected from this cell with certainty.
Similarly, in cell (4,2), with npi = 1.8, the certainty proportion is 1. The term 'certainty proportion' was introduced by Jessen (1978, p. 396) and was further used by Tiwari and Nigam (1998, p. 92). To reduce the computation, and also to satisfy the condition that each probability should lie between 0 and 1, we initially remove these certainty proportions and replace them at their original positions after the set of feasible samples is obtained. It may be noted here that the value of npi may also be greater than one for a particular cell, which indicates that more than one unit is to be selected from that cell. After removing the certainty proportions, we get the two-way array given in Table 2.

Table 1
Expected sample cell counts (npi) under proportionate stratification with n = 10

                 Type of Community
Region    Urban    Rural    Metropolitan    Total
1         1.0      0.5      0.5             2.0
2         0.2      0.3      0.5             1.0
3         0.2      0.6      1.2             2.0
4         0.6      1.8      0.6             3.0
5         1.0      0.8      0.2             2.0
Total     3.0      4.0      3.0             10.0

Table 2
Expected sample cell counts (npi) for the 5x3 population after removing certainty proportions

0.0    0.5    0.5    | 1.0
0.2    0.3    0.5    | 1.0
0.2    0.6    0.2    | 1.0
0.6    0.8    0.6    | 2.0
0.0    0.8    0.2    | 1.0
∑ 1.0   3.0    2.0    | 6.0

Now the problem reduces to selecting 6 units from the above array. The set of all possible samples consists of 15C6 = 5005 samples, out of which 4989 do not satisfy the marginal constraints of the 5x3 population. Thus the set of preferred combinations consists of only 16 samples, demonstrated in Table 3.

Case I: To compare our procedure with that of Sitter and Skinner (1994), we first consider that there are no controls beyond stratification, so that there are only the 4989 non-preferred samples arising from the marginal constraints of the 5x3 population. To find the initial p(s) values we apply the Sampford (1967) plan and get the following values of p(s):

p(s1)=.000254; p(s2)=.000633; p(s3)=.003275; p(s4)=.000080; p(s5)=.002271; p(s6)=.001048; p(s7)=.000168; p(s8)=.000080; p(s9)=.000421; p(s10)=.000917; p(s11)=.004716; p(s12)=.002271; p(s13)=.00042; p(s14)=.002189; p(s15)=.001048; p(s16)=.011527.

Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the following values:

p0(s1)=.0081; p0(s2)=.0202; p0(s3)=.1046; p0(s4)=.0025; p0(s5)=.0725; p0(s6)=.0335; p0(s7)=.0054; p0(s8)=.0025; p0(s9)=.0134; p0(s10)=.0293; p0(s11)=.1506; p0(s12)=.0725; p0(s13)=.0134; p0(s14)=.0699; p0(s15)=.0335; p0(s16)=.3681.

Now we apply the proposed plan as follows.

Min z = 123.39*p1(s)^2 + 49.46*p2(s)^2 + 9.56*p3(s)^2 + 393.77*p4(s)^2 + 13.79*p5(s)^2 + 29.88*p6(s)^2 + 185.94*p7(s)^2 + 393.77*p8(s)^2 + 74.39*p9(s)^2 + 34.15*p10(s)^2 + 6.64*p11(s)^2 + 13.79*p12(s)^2 + 74.37*p13(s)^2 + 14.3*p14(s)^2 + 29.88*p15(s)^2 + 2.71*p16(s)^2 - 1

Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s) = 1
2. p1(s)+p2(s)+p4(s)+p5(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s) = 0.5
3. p3(s)+p6(s)+p13(s)+p14(s)+p15(s)+p16(s) = 0.5
4. p1(s)+p2(s)+p3(s) = 0.2
5. p4(s)+p6(s)+p7(s)+p8(s)+p9(s)+p13(s)+p14(s)+p15(s) = 0.3
6. p5(s)+p10(s)+p11(s)+p12(s)+p16(s) = 0.5
7. p4(s)+p5(s)+p6(s) = 0.2
8. p1(s)+p3(s)+p7(s)+p10(s)+p11(s)+p13(s)+p14(s)+p16(s) = 0.6
9. p2(s)+p8(s)+p9(s)+p12(s)+p15(s) = 0.2
10. p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s) = 0.6
11. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p8(s)+p10(s)+p12(s)+p13(s)+p15(s)+p16(s) = 0.8
12. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p9(s)+p11(s)+p14(s) = 0.6
13. p2(s)+p3(s)+p5(s)+p6(s)+p9(s)+p11(s)+p12(s)+p14(s)+p15(s)+p16(s) = 0.8
14. p1(s)+p4(s)+p7(s)+p8(s)+p10(s)+p13(s) = 0.2
15. pi(s) ≥ 0 for i = 1, 2, ..., 16.

After solving the above model, we get the selection probabilities of the 16 preferred samples, given in Table 3, with the value of D(p0, p1) as 0.611986. Sitter and Skinner (1994) also solved the above 5x3 population and obtained the following results: p(1) = 0.2; p(2) = 0; p(3) = 0; p(4) = 0; p(5) = 0; p(6) = 0.2; p(7) = 0; p(8) = 0; p(9) = 0; p(10) = 0; p(11) = 0.1; p(12) = 0.2; p(13) = 0; p(14) = 0.1; p(15) = 0; p(16) = 0.2; and φ = the probability of non-preferred samples = 0.

Table 3
Selection probabilities (p1(s)) of the preferred samples for the 5x3 population using the proposed plan (each preferred sample is shown in the original as a 5x3 allocation of the 6 units satisfying the marginals of Table 2; the allocation arrays are omitted here)

S. No.   p1(s)      S. No.   p1(s)      S. No.   p1(s)      S. No.   p1(s)
1        0.03043    5        0.06967    9        0.02311    13       0.04461
2        0.03931    6        0.11867    10       0.08411    14       0.07133
3        0.13026    7        0.01898    11       0.08658    15       0.050429
4        0.01166    8        0.0102     12       0.07696    16       0.133691

Here we find that the probability of non-preferred samples is zero for Sitter and Skinner's (1994) method also, but many of the preferred samples, such as sample numbers 2, 3 and 4, have also been assigned zero probability by their method. Moreover, on substituting the values of p1(s) obtained by the method of Sitter and Skinner (1994) in D(p0, p1), we get the distance between p0(s) and the IPPS plan proposed by Sitter and Skinner (1994) as 6.001075, which is larger than the value of D(p0, p1) for the proposed plan (that is, 0.611986). This shows that, when there are no controls beyond stratification, although both plans achieve the marginal constraints, the proposed IPPS plan appears to perform better than the plan of Sitter and Skinner (1994), as it is much nearer to the ideal controlled plan p0(s). Moreover, the proposed plan guarantees zero probability to non-preferred samples, while the plan of Sitter and Skinner (1994) only attempts to minimize it.

Case II: Now we consider the situation of 'controls beyond stratification'. Based on considerations similar to those of Tiwari and Nigam (1998) and Avadhani and Sukhatme (1973), suppose that a sample is non-preferred if all the three units 4, 8 and 12, or all the three units 6, 8 and 10, are absent from it. Thus the set of preferred combinations consists of only 10 samples, namely sample numbers 1, 3, 5, 7, 9, 10, 11, 13, 14 and 16. Applying the proposed plan to the modified problem, we get the following results: p(1) = 0.0; p(3) = 0.2; p(5) = 0.2; p(7) = 0.0; p(9) = 0.2; p(10) = 0.1; p(11) = 0.0; p(13) = 0.1; p(14) = 0.0; p(16) = 0.2. This ensures zero probability to non-preferred samples, with D(p0, p1) = 3.262855. Solution of the above problem using the method of Tiwari and Nigam (1998) gives the following results: p(1) = 0.1; p(3) = 0.1; p(5) = 0.2; p(7) = 0.0; p(9) = 0.2; p(10) = 0.0; p(11) = 0.0; p(13) = 0.1; p(14) = 0.0; p(16) = 0.3; and the probability of non-preferred samples (φ) is zero.
On substituting the values of p1(s) obtained by the method of Tiwari and Nigam (1998) in D(p0, p1), we get the distance between p0(s) and the IPPS plan proposed by Tiwari and Nigam (1998) as D(p0, p1) = 3.882095, which is larger than that obtained for the proposed plan. This shows that the proposed IPPS plan is nearer to p0(s) and therefore appears to perform better than the IPPS plan of Tiwari and Nigam (1998). Moreover, the proposed plan always ensures zero probability to non-preferred samples, whereas the plan of Tiwari and Nigam (1998) only attempts to minimize the probability of non-preferred samples.

Example 2: We now produce an example in which the probability of non-preferred samples (φ) is not equal to zero for the Tiwari and Nigam (1998) plan, while the proposed plan always gives φ = 0. Consider the 4x3 hypothetical population given in Table 4. The desired sample size of n = 8 is less than the total number of cells, 12.

Table 4
The expected sample cell counts (npi) for the 4x3 population

0.8    0.5    0.7    | 2.0
0.8    0.9    0.3    | 2.0
0.7    0.7    0.6    | 2.0
0.7    0.9    0.4    | 2.0
Total 3.0   3.0    2.0    | 8.0

The set of all possible samples consists of the 12 samples demonstrated in Table 5. Let the set of non-preferred samples consist of those samples that do not contain all the three units 1, 5 and 9, or all the three units 3, 5 and 7. Thus sample numbers 6 and 9 become the non-preferred samples. Applying the proposed plan to this population, we get the following results: p(1) = 0.0462; p(2) = 0.0829; p(3) = 0.0121; p(4) = 0.0589; p(5) = 0.1000; p(6) = 0; p(7) = 0.0417; p(8) = 0.1583; p(9) = 0; p(10) = 0.1171; p(11) = 0.2538; p(12) = 0.1290; with D(p0, p1) = 1.751808 and the probability of selecting non-preferred samples φ = p(6) + p(9) = 0.

Table 5
The set of all possible samples for the 4x3 population (each sample is shown in the original as a 4x3 inclusion pattern, with 'x' marking a selected cell; the patterns for samples 1-12 are omitted here)

We have also solved this problem using the method of Tiwari and Nigam (1998) and obtained the following results: p(1) = 0.1; p(2) = 0.2; p(3) = 0; p(4) = 0; p(5) = 0; p(6) = 0; p(7) = 0; p(8) = 0.1; p(9) = 0.1; p(10) = 0.1; p(11) = 0.2; p(12) = 0.2; and φ = p(6) + p(9) = 0.1. Thus, for this example, the method of Tiwari and Nigam (1998) assigns a probability of 0.1 to non-preferred samples, whereas the proposed method always assures zero probability to non-preferred samples.

Example 3: We now consider a real-life application where two-dimensional stratification is required in plot sampling in field experiments. Consider the yields (in tons) of wheat given in Table 6 for an experiment involving 4 blocks (B1, B2, B3, B4) and 4 treatments (T1, T2, T3, T4).

Table 6
Yield (in tons) for the 4x4 experiment

        T1      T2      T3      T4      Total
B1      11.25   18.75   13.75   6.25    50.00
B2      16.25   3.75    6.25    23.75   50.00
B3      16.50   6.50    18.75   8.25    50.00
B4      6.00    21.00   11.25   11.75   50.00
Total   50.00   50.00   50.00   50.00   200.00

Table 7
Expected sample cell counts (npi) for the 4x4 population

0.45    0.75    0.55    0.25    | 2.0
0.65    0.15    0.25    0.95    | 2.0
0.66    0.26    0.75    0.33    | 2.0
0.24    0.84    0.45    0.47    | 2.0
∑ 2.0    2.0     2.0     2.0     | 8.0

A sample of size n = 8 was selected from this population.
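Table 7 follows from Table 6 by the proportional-to-size rule of Section 3.2: pi = xi/X and npi = n pi, so with n = 8 and X = 200 each cell receives npi = yield/25; cell (B1, T1), for instance, gets 11.25/25 = 0.45. The following short check is added here for illustration only, using the yield matrix of Table 6.

import numpy as np

# npi = n * yield / total yield = yield / 25 for every cell of Table 6.
yields = np.array([[11.25, 18.75, 13.75,  6.25],
                   [16.25,  3.75,  6.25, 23.75],
                   [16.50,  6.50, 18.75,  8.25],
                   [ 6.00, 21.00, 11.25, 11.75]])
n = 8
npi = n * yields / yields.sum()
print(npi)                 # first row: 0.45 0.75 0.55 0.25, as in Table 7
print(npi.sum(axis=1))     # every block marginal equals 2.0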
The expected sample cell counts (npi) for this 4x4 population are demonstrated in Table 7. The set of all possible samples satisfying the marginal requirements of the 4x4 array consists of 90 samples. Suppose that, due to considerations of travel and organization of fieldwork, the set of non-preferred samples consists of the samples that contain three or more diagonal elements. Thus the set of non-preferred combinations consists of 57 samples. The remaining 33 preferred samples are demonstrated in Table 8, together with the selection probabilities obtained by applying the proposed plan to this 4x4 population.

Table 8
The set of preferred combinations with their selection probabilities (p1(s)) using the proposed plan for the 4x4 population (each sample is shown in the original as a 4x4 inclusion pattern; the patterns are omitted here)

S. No.   p1(s)       S. No.   p1(s)       S. No.   p1(s)
1        0.023925    12       0.0000      23       0.077772
2        0.073749    13       0.257761    24       0.050686
3        0.051726    14       0.000339    25       0.007147
4        0.114597    15       0.0049      26       0.001349
5        0.000641    16       0.010486    27       0.046992
6        0.003188    17       0.000425    28       0.00096
7        0.000384    18       0.001588    29       0.000702
8        0.007685    19       0.001867    30       0.04646
9        0.03233     20       0.00327     31       0.001242
10       0.061156    21       0.057485    32       0.033256
11       0.0000      22       0.000456    33       0.025465

The distance between the proposed plan p1(s) and the controlled plan p0(s) was obtained as D(p0, p1) = 5.373178. This problem was also solved by the method of Tiwari and Nigam (1998). Incidentally, the probability of selecting non-preferred samples (φ) comes out to be zero for their method also, and the distance between their plan and the controlled plan p0(s) was obtained as 200.8496. Thus the proposed method always guarantees zero probability to non-preferred samples and is also nearer to the desired controlled plan p0(s). Further examples were considered to analyze the performance of the proposed plan; a detailed description of these examples is given in the Appendix. The probabilities of selecting the non-preferred samples (φ) and the distances between the controlled and the resultant IPPS plans, D(p0, p1), for the proposed plan and for the plan of Tiwari and Nigam (1998) are given in Table 9. Table 9 shows that, while the plan of Tiwari and Nigam (1998) only attempts to minimize the probability of non-preferred samples, the proposed plan always ensures zero probability to non-preferred samples.
The values of D(p0, p1) are also found to be smaller for the proposed plan in all the examples, which shows that the proposed plan is much nearer to the controlled plan that we wished to achieve due to practical considerations.

Table 9
The probabilities of selecting the non-preferred samples (φ) and the distance between the controlled and the resultant IPPS plan, D(p0, p1), for the proposed plan and the plan of Tiwari and Nigam (1998) [T-N]

                                              Proposed Plan          T-N Plan
                                              φ     D(p0, p1)        φ     D(p0, p1)
Example 4 (3x3 population, N=9, n=6)          0     22.87            0     23.178
Example 5 (4x4 population, N=16, n=8)         0     0.70611          0     132.0628
Example 6 (8x3 population, N=24, n=10)        0     2.1929           0     9.296
Example 7 (3x3 population, N=9, n=3)          0     .195885          .1    .257972

3.4 VARIANCE ESTIMATION FOR THE PROPOSED PLAN

In this section, using the idea of the random group method of variance estimation, originally propounded by Mahalanobis (1939, 1946), we propose an estimator for two-dimensional controlled selection problems which appears to perform better than the split-sample estimator of Jessen (1973) and the estimator proposed by Tiwari and Nigam (1998). This is demonstrated with the help of a few examples.

As an alternative to the H-T estimator, Jessen (1973) suggested the use of the split-sample estimator. The split-sample estimator of Jessen (1973) is useful in situations where the stability condition of the H-T estimator or the non-negativity condition of the Y-G form of the H-T estimator is not satisfied. However, Jessen's split-sample estimator is negatively biased, and the bias is found to be quite high. To overcome this difficulty, Tiwari and Nigam (1998) proposed a method of variance estimation for two-dimensional controlled selection problems. Their variance estimator was found to be positively biased, but the bias was quite low in comparison with the split-sample estimator of Jessen (1973). Another limitation of the variance estimators suggested by Jessen (1973) and Tiwari and Nigam (1998) is that exactly two units from each row and column are required to obtain an estimator of the population total and of the variance of the estimator. If two units from each row and column are not available, we cannot use either of the above two methods. To overcome this difficulty, we suggest an alternative method of variance estimation that can be used even in situations where exactly two units are not available from each row and column.

In the proposed method of variance estimation, we first form random groups of the units selected in the sample. The random groups must not be formed in a purely arbitrary fashion; they should be formed so that each random group has essentially the same sampling design as the parent sample. There are different rules for forming the random groups for different sampling designs; for details about these rules, one may refer to Wolter (1985). To construct the random groups from a sample of size n from a population of N units, let us suppose that each random group is of size m (m ≥ 2). Then the number of random groups to be formed is k = n/m. To select the first random group, we draw a simple random sample without replacement (SRSWOR) of size m from the parent sample of size n. To obtain the second random group, we again draw an SRSWOR of size m from the remaining n - m units in the sample. This process is repeated until the k random groups are drawn. If n/m is not an integer, i.e. n = mk + q with 0 < q < m, then the q excess units may be left out and only the k random groups are considered for further estimation purposes.
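The grouping rule just described is easy to mechanise. The sketch below is added for illustration (the thesis reports no code of its own): it partitions the n sampled units into k = n/2 random groups of size m = 2 by repeated SRSWOR, leaves out any excess units, and evaluates the estimators (3.4.1) and (3.4.2) defined next.

import numpy as np

def random_group_estimate(y, pi, p, rng=None):
    # y, pi: observed values and inclusion probabilities of the n sampled
    # units; p: initial selection probabilities p_i of all N population
    # units (used only in the finite population correction of (3.4.2)).
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    k = n // 2                          # number of random groups, m = 2
    order = rng.permutation(n)[:2 * k]  # drop the q = n - 2k excess units
    groups = order.reshape(k, 2)        # equivalent to repeated SRSWOR

    r = y[groups] / pi[groups]          # y/pi for both members of each group
    y_hat = r.sum()                                           # (3.4.1)
    fpc = 1 - n * np.sum(p ** 2)                              # approximate fpc
    var_hat = fpc * 0.25 * np.sum((r[:, 0] - r[:, 1]) ** 2)   # (3.4.2)
    return y_hat, var_hat

For the 3x3 population of Example 4.1 below, for instance, the n = 6 sampled units yield k = 3 random groups of two units each.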
To reduce the computation, we consider random groups of size m = 2 for the present discussion. Using the proposed method, an unbiased estimator of the population total is given by

ŶRG = Σi=1..k (yi1/πi1 + yi2/πi2),   (3.4.1)

where yi1 and yi2 are the observations from the i-th random group and πi1 and πi2 are their corresponding inclusion probabilities. An estimator of the variance of ŶRG is given by

V̂ar(ŶRG) = (1 - n Σi=1..N pi²) (1/4) Σi=1..k (yi1/πi1 - yi2/πi2)²,   (3.4.2)

where (1 - n Σi=1..N pi²) is an approximate finite population correction factor and k is the number of random groups. The proposed method of variance estimation can be used for square as well as rectangular two-way populations, and it works equally well even in situations where the numbers of units selected from each row and column are not fixed and equal. To demonstrate the utility of the proposed variance estimator and to compare it with Jessen's (1973) split-sample estimator and the estimator proposed by Tiwari and Nigam (1998), we consider the following examples.

Example 4.1: Let us first consider the 3x3 population demonstrated in Table 12 of the Appendix. The values of V̂ar(Ŷ) obtained by the split-sample (S-S) estimator, the estimator proposed by Tiwari and Nigam (1998) (T-N) and the proposed method are produced in Table 10. The actual value of Y for this population is 123/20. From Table 10, we have

E(ŶRG) = Σs=1..5 p(s) (Ŷ)s = 123/20 = Y.

Thus ŶRG is an unbiased estimate of Y. The expected value of V̂ar(ŶRG) for the proposed estimator is

E[V̂ar(ŶRG)] = Σs=1..5 p(s) V̂ar(Ŷ)s = 0.0596319.

The true value of Var(Ŷ) for this population is 0.0581, which shows that the proposed estimator is positively biased. The bias of the proposed estimator is the lowest among the three estimators, showing that the proposed estimator performs better than Jessen's (1973) split-sample estimator and the estimator proposed by Tiwari and Nigam (1998).

Table 10
Comparison of different estimators of Var(Ŷ) for the 3x3 population

                                         V̂ar(Ŷ)
Sample   p(s)   (Ŷ)s     S-S         T-N         Proposed
1        0.3    19/3     0.034445    0.03875     0.064583
2        0.1    35/6     0.008611    0.11625     0.075347
3        0.2    13/2     0.008611    0.025833    0.02368
4        0.2    6        0.137777    0.090417    0.051667
5        0.2    35/6     0.077500    0.099028    0.088264

E[V̂ar(Ŷ)]               0.0559722   0.0663056   0.0596319
Bias                     -0.0021278  0.0082056   0.0015319
Var[V̂ar(Ŷ)]             0.0022430   0.0011353   0.0004672
(Bias)²                  0.0000045   0.0000673   0.0000023
MSE[V̂ar(Ŷ)]             0.0022475   0.0012025   0.0004695
(Bias)²/MSE[V̂ar(Ŷ)]     0.0020022   0.0559667   0.0048988

Example 4.2: To further evaluate the utility of the proposed variance estimator, we consider the 4x4 population demonstrated in Table 13 of the Appendix. The values of V̂ar(Ŷ) for the three estimators are presented in Table 11. From Table 11, we get

E(ŶRG) = Σs=1..20 p(s) (Ŷ)s = 10 = the true value of Y.

Thus ŶRG is an unbiased estimate of Y. The expected value of V̂ar(ŶRG) for the proposed estimator is

E[V̂ar(ŶRG)] = Σs=1..20 p(s) V̂ar(Ŷ)s = 0.255927.

The true value of Var(Ŷ) for this population is 0.24375, which again shows that the proposed estimator is positively biased, and the bias is lowest for the proposed estimator among the three estimators considered by us.
Table 11
Comparison of the different values of V̂ar(Ŷ) for the 4x4 population

                                              V̂ar(Ŷ)
Sample   p(s)       (Ŷ)s     S-S        T-N        Proposed
1        0.028116   10.250   0.578125   0.242813   0.329531
2        0.153002   9.8750   0.052031   0.199453   0.342539
3        0.061919   9.3750   0.005781   0.199453   0.221133
4        0.054647   9.7500   0.208125   0.196563   0.167656
5        0.008153   9.7500   0.023125   0.208125   0.052031
6        0.013501   9.2500   0.023125   0.196563   0.248594
7        0.219054   9.2500   0.023125   0.23125    0.040469
8        0.026101   10.375   0.005781   0.095391   0.383008
9        0.027504   10.250   0.023125   0.254375   0.398906
10       0.008004   9.8750   0.052031   0.182109   0.319414
11       0.045592   10.250   0.208125   0.439375   0.375781
12       0.003484   10.750   0.578125   0.41625    0.387344
13       0.017874   10.250   0.023125   0.450938   0.271719
14       0.112939   10.750   0.578125   0.404688   0.352656
15       0.039660   10.375   0.283281   0.268828   0.359883
16       0.042060   9.7500   0.023125   0.23125    0.387344
17       0.003414   10.875   0.144531   0.268828   0.296289
18       0.107901   10.875   0.052031   0.291953   0.307852
19       0.001931   10.750   0.578125   0.370000   0.375781
20       0.025146   10.750   0.023125   0.346875   0.260156

E[V̂ar(Ŷ)]               0.139939    0.263878    0.255927
Bias                     -0.103811   0.020128    0.012177
Var[V̂ar(Ŷ)]             0.038379    0.006944    0.016581
(Bias)²                  0.010777    0.000405    0.000148
MSE[V̂ar(Ŷ)]             0.049156    0.007349    0.016729
(Bias)²/MSE[V̂ar(Ŷ)]     0.219236    0.0551193   0.0088585

The results of the above two examples demonstrate that the proposed variance estimator may perform better than the split-sample estimator of Jessen (1973) and the estimator proposed by Tiwari and Nigam (1998), as it provides lower biases and is also applicable in situations where these two estimators cannot be applied.

APPENDIX 3.0

Example 1, Case II: We are given the following population, with N = 15 and n = 6.

Expected Sample Cell Counts (npi)

0.0    0.5    0.5    | 1.0
0.2    0.3    0.5    | 1.0
0.2    0.6    0.2    | 1.0
0.6    0.8    0.6    | 2.0
0.0    0.8    0.2    | 1.0
∑ 1.0   3.0    2.0    | 6.0

All possible samples and the set of preferred samples for the above population have already been defined. The set of preferred combinations consists of only 10 samples, namely sample numbers 1, 3, 5, 7, 9, 10, 11, 13, 14 and 16. Before applying the proposed plan, we have to find the values of p(s) and p0(s). These values are given as follows:

p(s1)=.000254; p(s2)=.003275; p(s3)=.002271; p(s4)=.000168; p(s5)=.000421; p(s6)=.000917; p(s7)=.004716; p(s8)=.000421; p(s9)=.002189; p(s10)=.011527.

The values of p0(s) are:

p0(s1)=.0097; p0(s2)=.1251; p0(s3)=.0868; p0(s4)=.0064; p0(s5)=.0161; p0(s6)=.035; p0(s7)=.1802; p0(s8)=.0160; p0(s9)=.0837; p0(s10)=.4406.

The objective function and the constraints of the proposed model are given as follows.

Min z = 103.07*p1(s)^2 + 7.98*p2(s)^2 + 11.5*p3(s)^2 + 155.32*p4(s)^2 + 62.12*p5(s)^2 + 28.52*p6(s)^2 + 5.54*p7(s)^2 + 62.12*p8(s)^2 + 11.94*p9(s)^2 + 2.26*p10(s)^2 - 1

Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s) = 1
2. p1(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s) = .5
3. p2(s)+p8(s)+p9(s)+p10(s) = .5
4. p1(s)+p2(s) = .2
5. p4(s)+p5(s)+p8(s)+p9(s) = .3
6. p3(s)+p6(s)+p7(s)+p10(s) = .5
7. p3(s) = .2
8. p1(s)+p2(s)+p4(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s) = .6
9. p5(s) = .2
10. p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s) = .6
11. p1(s)+p2(s)+p3(s)+p6(s)+p8(s)+p10(s) = .8
12. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p7(s)+p9(s) = .6
13. p2(s)+p3(s)+p5(s)+p7(s)+p9(s)+p10(s) = .8
14. p1(s)+p4(s)+p6(s)+p8(s) = .2
15. pi(s) ≥ 0 for i = 1, 2, ..., 10.

After solving the above model, we get the desired results, which have already been given in Example 1.
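As a quick numerical cross-check added here for illustration (it is not part of the original computations), the optimum reported for this case in Example 1 can be substituted back into the distance (3.2.3), using the p0(s) values listed above:

import numpy as np

# D(p0, p1) = sum p1(s)^2 / p0(s) - 1 for the reported Case II solution.
p0 = np.array([.0097, .1251, .0868, .0064, .0161,
               .035, .1802, .0160, .0837, .4406])
p1 = np.array([0.0, 0.2, 0.2, 0.0, 0.2,
               0.1, 0.0, 0.1, 0.0, 0.2])

D = np.sum(p1**2 / p0) - 1
print(round(D, 4))   # approx 3.2665; the reported 3.262855 differs only
                     # through the rounding of the printed p0(s) values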
Example 2: Consider the 4x3 hypothetical population given in the following table, with N = 12 and n = 8.

Expected cell counts (npi)

0.8    0.5    0.7    | 2.0
0.8    0.9    0.3    | 2.0
0.7    0.7    0.6    | 2.0
0.7    0.9    0.4    | 2.0
Total 3.0   3.0    2.0    | 8.0

The set of all possible samples consists of the 12 samples already demonstrated in Table 5. As considered earlier, sample numbers 6 and 9 are the non-preferred samples. The values of p(s) for the preferred samples are given as follows:

p(s1)=.000203; p(s2)=.000721; p(s3)=.00003; p(s4)=.000293; p(s5)=.00006; p(s6)=.000184; p(s7)=.00135; p(s8)=.002291; p(s9)=.004663; p(s10)=.000941.

The values of p0(s) are:

p0(s1)=.01886; p0(s2)=.06715; p0(s3)=.003627; p0(s4)=.02728; p0(s5)=.00601; p0(s6)=.017114; p0(s7)=.125676; p0(s8)=.213269; p0(s9)=.434154; p0(s10)=.087592.

Now the proposed model can be written in the following form.

Min z = 53.1*p1(s)^2 + 14.9*p2(s)^2 + 275.91*p3(s)^2 + 36.68*p4(s)^2 + 166.5*p5(s)^2 + 58.47*p6(s)^2 + 7.96*p7(s)^2 + 4.69*p8(s)^2 + 2.3*p9(s)^2 + 11.42*p10(s)^2 - 1

Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p8(s)+p9(s)+p10(s) = .8
3. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s) = .5
4. p6(s)+p7(s)+p8(s)+p9(s)+p10(s) = .7
5. p1(s)+p2(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s) = .8
6. p1(s)+p2(s)+p3(s)+p4(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s) = .9
7. p3(s)+p4(s)+p5(s)+p10(s) = .3
8. p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p10(s) = .7
9. p1(s)+p3(s)+p5(s)+p6(s)+p8(s)+p9(s)+p10(s) = .7
10. p1(s)+p2(s)+p4(s)+p7(s)+p9(s) = .6
11. p1(s)+p3(s)+p4(s)+p6(s)+p7(s)+p9(s)+p10(s) = .7
12. p2(s)+p4(s)+p5(s)+p7(s)+p8(s)+p9(s)+p10(s) = .9
13. p1(s)+p2(s)+p3(s)+p5(s)+p6(s)+p8(s) = .4
14. pi(s) ≥ 0 for i = 1, 2, ..., 10.

After solving the above model, we get the results displayed in Example 2.

Example 3: Consider the following population, from which a sample of size n = 8 is to be selected from N = 16 population units. The set of all possible samples consists of 90 samples. The set of non-preferred samples has already been defined in Example 3, and the set of preferred samples consists of the 33 samples demonstrated in Table 8.

Expected Sample Cell Counts (npi)

0.45    0.75    0.55    0.25    | 2.0
0.65    0.15    0.25    0.95    | 2.0
0.66    0.26    0.75    0.33    | 2.0
0.24    0.84    0.45    0.47    | 2.0
∑ 2.0    2.0     2.0     2.0     | 8.0

The values of p(s) for the preferred samples are given as follows.
p(s1)=.00003; p(s2)=.000139; p(s3)=.000224; p(s4)=.001646; p(s5)=.0000008; p(s6)=.000003; p(s7)=.0000002; p(s8)=.000134; p(s9)=.000221; p(s10)=.000137; p(s11)=.000986; p(s12)=.001586; p(s13)=.008835; p(s14)=.00002; p(s15)=.00003; p(s16)=.00001; p(s17)=.0000003; p(s18)=.000001; p(s19)=.000002; p(s20)=.000151; p(s21)=.00005; p(s22)=.0000003; p(s23)=.000157; p(s24)=.00007; p(s25)=.00009; p(s26)=.000002; p(s27)=.00005; p(s28)=.0000007; p(s29)=.0000005; p(s30)=.002522; p(s31)=.00001; p(s32)=.00006; p(s33)=.00002.

The values of p0(s) are:

p0(s1)=.0018; p0(s2)=.0081; p0(s3)=.013; p0(s4)=.0956; p0(s5)=.0001; p0(s6)=.0002; p0(s7)=.0001; p0(s8)=.0078; p0(s9)=.0128; p0(s10)=.0079; p0(s11)=.0572; p0(s12)=.0921; p0(s13)=.5130; p0(s14)=.0013; p0(s15)=.0022; p0(s16)=.0007; p0(s17)=.0001; p0(s18)=.0001; p0(s19)=.0001; p0(s20)=.0088; p0(s21)=.0029; p0(s22)=.0001; p0(s23)=.0091; p0(s24)=.0044; p0(s25)=.0057; p0(s26)=.0001; p0(s27)=.0031; p0(s28)=.0001; p0(s29)=.0001; p0(s30)=.1464; p0(s31)=.0005; p0(s32)=.0034; p0(s33)=.0011.

Now the proposed model can be written in the following form.

Min z = 562.5*p1(s)^2 + 124.2*p2(s)^2 + 76.8*p3(s)^2 + 10.5*p4(s)^2 + 19429.42*p5(s)^2 + 4892.4*p6(s)^2 + 61258.7*p7(s)^2 + 128.8*p8(s)^2 + 77.9*p9(s)^2 + 126*p10(s)^2 + 17.4*p11(s)^2 + 10.8*p12(s)^2 + 1.9*p13(s)^2 + 736.4*p14(s)^2 + 460.2*p15(s)^2 + 1366.8*p16(s)^2 + 54010.8*p17(s)^2 + 11733.6*p18(s)^2 + 7203.03*p19(s)^2 + 113.9*p20(s)^2 + 341.7*p21(s)^2 + 53461*p22(s)^2 + 109.84*p23(s)^2 + 226.6*p24(s)^2 + 176.3*p25(s)^2 + 7768.5*p26(s)^2 + 326.5*p27(s)^2 + 23495*p28(s)^2 + 29359*p29(s)^2 + 6.8*p30(s)^2 + 1675.6*p31(s)^2 + 286.8*p32(s)^2 + 848.8*p33(s)^2 - 1

Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s) = .45
3. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s) = .75
4. p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s) = .55
5. p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s) = .25
6. p4(s)+p5(s)+p6(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p17(s)+p18(s)+p19(s)+p20(s)+p23(s)+p25(s)+p26(s)+p29(s)+p30(s)+p31(s)+p32(s) = .65
7. p6(s)+p7(s)+p16(s)+p17(s)+p18(s)+p19(s)+p21(s)+p22(s)+p27(s)+p28(s)+p29(s)+p33(s) = .15
8. p1(s)+p2(s)+p3(s)+p5(s)+p7(s)+p8(s)+p15(s)+p22(s)+p24(s)+p26(s)+p31(s)+p32(s) = .25
9. p1(s)+p2(s)+p3(s)+p4(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p16(s)+p20(s)+p21(s)+p23(s)+p24(s)+p25(s)+p27(s)+p28(s)+p30(s)+p33(s) = .95
10. p1(s)+p3(s)+p6(s)+p7(s)+p8(s)+p9(s)+p12(s)+p13(s)+p15(s)+p16(s)+p17(s)+p19(s)+p20(s)+p21(s)+p22(s)+p24(s)+p27(s)+p30(s)+p31(s)+p32(s)+p33(s) = .66
11. p1(s)+p5(s)+p9(s)+p10(s)+p14(s)+p17(s)+p20(s)+p23(s)+p24(s)+p25(s)+p26(s)+p28(s) = .26
12. p2(s)+p4(s)+p10(s)+p11(s)+p13(s)+p18(s)+p21(s)+p23(s)+p29(s)+p30(s)+p32(s)+p33(s) = .75
13. p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p11(s)+p12(s)+p14(s)+p15(s)+p16(s)+p18(s)+p19(s)+p22(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p31(s) = .33
14. p2(s)+p7(s)+p8(s)+p10(s)+p11(s)+p14(s)+p16(s)+p18(s)+p21(s)+p28(s)+p29(s)+p33(s) = .24
15. p2(s)+p3(s)+p4(s)+p8(s)+p11(s)+p12(s)+p13(s)+p15(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p30(s)+p31(s)+p32(s) = .84
16. p1(s)+p3(s)+p4(s)+p5(s)+p6(s)+p9(s)+p12(s)+p14(s)+p16(s)+p17(s)+p19(s)+p20(s)+p22(s)+p23(s)+p25(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p33(s) = .45
17. p1(s)+p5(s)+p6(s)+p7(s)+p9(s)+p10(s)+p13(s)+p15(s)+p17(s)+p24(s)+p26(s)+p32(s) = .47
18. pi(s) ≥ 0 for i = 1, 2, ..., 33.

After solving the above model, we get the results displayed in Table 8.

Example 4: Consider the population given in Table 12, consisting of 9 units in a 3x3 array, borrowed from Jessen (1970, p. 778).

Table 12
Basic data for the 3x3 population

             πi = npi = 6pi                Yi
i: 1, 2, 3   0.8   0.5   0.7   | 2.0       5.6    3.0    6.3    | 14.9
i: 4, 5, 6   0.7   0.8   0.5   | 2.0       4.2    3.2    4.0    | 11.4
i: 7, 8, 9   0.5   0.7   0.8   | 2.0       1.5    3.5    5.6    | 10.6
∑            2.0   2.0   2.0   | 6.0       11.3   9.7    15.9   | 36.9

A sample of size 6 is to be drawn from this population. The set of all possible samples consists of the six 3x3 inclusion patterns that select exactly two units from every row and column (the patterns, numbered (1) to (6), are shown in the original). Suppose the samples in which all the three units 1, 5 and 9 do not appear are non-preferred samples; thus sample number (4) is non-preferred. The values of p(s) are given as follows:

p(s1)=.062811; p(s2)=.006922; p(s3)=.008973; p(s4)=.008973; p(s5)=.008973.

The values of p0(s) are:

p0(s1)=.649866; p0(s2)=.0716183; p0(s3)=.09284; p0(s4)=.09284; p0(s5)=.09284.

The objective function and the constraints of the proposed model for this example are given as follows.

Min z = 1.53*p1(s)^2 + 13.96*p2(s)^2 + 10.77*p3(s)^2 + 10.77*p4(s)^2 + 10.77*p5(s)^2 - 1

Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s) = .8
3. p2(s)+p3(s)+p5(s) = .5
4. p1(s)+p4(s)+p5(s) = .7
5. p1(s)+p3(s)+p5(s) = .7
6. p1(s)+p2(s)+p4(s)+p5(s) = .8
7. p2(s)+p3(s)+p4(s) = .5
8. p2(s)+p4(s)+p5(s) = .5
9. p1(s)+p3(s)+p4(s) = .7
10. p1(s)+p2(s)+p3(s)+p5(s) = .8
11. pi(s) ≥ 0 for i = 1, 2, ..., 5.

After solving the above model, we get the following results: p(1) = 0.3; p(2) = 0.1; p(3) = 0.2; p(4) = 0.2; p(5) = 0.2, with D(p0, p1) = .57.

Example 5: Now let us consider a 4x4 population consisting of 16 units, borrowed from Jessen (1978, p. 375). A sample of size 8 is to be drawn from this population. The basic data for this population are given in Table 13.

Table 13
Basic data for the 4x4 population

               πi = npi = 8pi                        Yi
i: 1-4       0.0   0.6   1.0   0.4   | 2.0         0.0    1.05   2.25   0.8    | 4.10
i: 5-8       0.8   0.4   0.4   0.4   | 2.0         0.6    0.70   0.65   0.4    | 2.35
i: 9-12      0.6   0.2   0.4   0.8   | 2.0         0.6    0.20   0.50   0.6    | 1.90
i: 13-16     0.6   0.8   0.2   0.4   | 2.0         0.3    0.80   0.25   0.3    | 1.65
∑            2.0   2.0   2.0   2.0   | 8.0         1.5    2.75   3.65   2.1    | 10.00

After removing the certainty proportions, we get the following two-way table:

0.0    0.6    0.0    0.4    | 1.0
0.8    0.4    0.4    0.4    | 2.0
0.6    0.2    0.4    0.8    | 2.0
0.6    0.8    0.2    0.4    | 2.0
∑ 2.0   2.0    1.0    2.0    | 7.0

All possible combinations are shown in the original as 4x4 inclusion patterns, numbered (1) to (30); the patterns are omitted here.
Suppose the samples that contain either all of the diagonal elements or none of the diagonal elements are non-preferred samples. Thus the sample numbers 2, 3, 4, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 25, 27 and 28 are the preferred samples. The values of p(s) are given as follows:

p(s1)=.0015; p(s2)=.0105; p(s3)=.0015; p(s4)=.0038; p(s5)=.0001; p(s6)=.0003; p(s7)=.0195; p(s8)=.0015; p(s9)=.0006; p(s10)=.0001; p(s11)=.0094; p(s12)=.0001; p(s13)=.0003; p(s14)=.0038; p(s15)=.0007; p(s16)=.00071; p(s17)=.0001; p(s18)=.0038; p(s19)=.0001; p(s20)=.0007.

The values of p0(s) are:

p0(s1)=.0222; p0(s2)=.2880; p0(s3)=.0222; p0(s4)=.0554; p0(s5)=.0016; p0(s6)=.0042; p0(s7)=.2880; p0(s8)=.0222; p0(s9)=.0088; p0(s10)=.0016; p0(s11)=.1378; p0(s12)=.0008; p0(s13)=.0042; p0(s14)=.0554; p0(s15)=.0105; p0(s16)=.0105; p0(s17)=.0008; p0(s18)=.0554; p0(s19)=.0003; p0(s20)=.0105.

The proposed model for this example can be defined as follows.

Min z = 45.14*p1(s)^2 + 3.47*p2(s)^2 + 45.14*p3(s)^2 + 18.05*p4(s)^2 + 608.25*p5(s)^2 + 240.76*p6(s)^2 + 3.47*p7(s)^2 + 45.14*p8(s)^2 + 113.3*p9(s)^2 + 608.25*p10(s)^2 + 7.25*p11(s)^2 + 1300.14*p12(s)^2 + 240.76*p13(s)^2 + 18.05*p14(s)^2 + 95.59*p15(s)^2 + 95.59*p16(s)^2 + 1300.14*p17(s)^2 + 18.05*p18(s)^2 + 3301.95*p19(s)^2 + 95.59*p20(s)^2 - 1

Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s) = 0.6
3. p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s) = 0.4
4. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s) = 0.8
5. p1(s)+p8(s)+p9(s)+p11(s)+p12(s)+p13(s)+p14(s)+p17(s)+p18(s)+p19(s)+p20(s) = 0.4
6. p2(s)+p3(s)+p8(s)+p10(s)+p15(s)+p17(s)+p18(s) = 0.4
7. p4(s)+p5(s)+p6(s)+p7(s)+p9(s)+p10(s)+p16(s)+p19(s)+p20(s) = 0.4
8. p1(s)+p2(s)+p4(s)+p5(s)+p8(s)+p9(s)+p10(s)+p12(s)+p14(s)+p15(s)+p17(s)+p18(s)+p19(s)+p20(s) = 0.6
9. p3(s)+p5(s)+p6(s)+p10(s)+p12(s)+p13(s)+p15(s)+p16(s)+p17(s)+p19(s) = 0.2
10. p4(s)+p6(s)+p7(s)+p11(s)+p16(s)+p20(s) = 0.4
11. p1(s)+p2(s)+p3(s)+p7(s)+p8(s)+p9(s)+p11(s)+p13(s)+p14(s)+p18(s) = .8
12. p3(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p13(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s) = 0.6
13. p2(s)+p4(s)+p7(s)+p11(s)+p14(s)+p15(s)+p16(s)+p18(s)+p20(s) = 0.8
14. p1(s)+p5(s)+p9(s)+p12(s)+p13(s)+p14(s)+p19(s) = 0.2
15. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p8(s)+p10(s)+p12(s)+p15(s)+p17(s) = 0.4
16. pi(s) ≥ 0 for i = 1, 2, ..., 20.

After solving the above model, we get the following results: p(1) = 0.028116; p(2) = 0.153002; p(3) = 0.061919; p(4) = 0.0546; p(5) = 0.008153; p(6) = 0.013501; p(7) = 0.219054; p(8) = 0.026101; p(9) = 0.027504; p(10) = 0.008; p(11) = 0.04559; p(12) = 0.0034; p(13) = 0.0179; p(14) = 0.1129; p(15) = 0.0396; p(16) = 0.042; p(17) = 0.0034; p(18) = 0.1079; p(19) = 0.0019; p(20) = 0.02514; with the value of D(p0, p1) as 0.70611.

Example 6: Now we consider an 8x3 population borrowed from Causey et al. (1985, p. 906), consisting of 24 units, from which a sample of size n = 10 is to be drawn. The basic data for this population are reproduced in Table 14.
Table 14
Basic data for the 8x3 population (πi = npi = 10pi)

Units           πi                    ∑
1, 2, 3         0.4    2.0    0.0     2.40
4, 5, 6         1.2    0.0    1.0     2.20
7, 8, 9         0.2    0.0    0.0     0.20
10, 11, 12      1.2    0.4    0.2     1.80
13, 14, 15      1.0    0.6    0.2     1.80
16, 17, 18      0.0    0.4    0.4     0.80
19, 20, 21      0.0    0.2    0.4     0.60
22, 23, 24      0.0    0.0    0.2     0.20
∑               4.0    3.6    2.4     10.00

After removing the certainty proportions, we get the following two-way table:

0.4    0.0    0.0    | 0.4
0.2    0.0    0.0    | 0.2
0.2    0.0    0.0    | 0.2
0.2    0.4    0.2    | 0.8
0.0    0.6    0.2    | 0.8
0.0    0.4    0.4    | 0.8
0.0    0.2    0.4    | 0.6
0.0    0.0    0.2    | 0.2
∑ 1.0   1.6    1.4    | 4.0

The set of all possible combinations consists of 141 samples, out of which the samples having two consecutive units in a column are considered non-preferred. The remaining 78 preferred sample combinations are shown in the original as 8x3 inclusion patterns, numbered (1) to (78); the patterns are omitted here.
The values of p(s) are given as follows:

p(s1)=.0020; p(s2)=.0050; p(s3)=.0020; p(s4)=.0008; p(s5)=.0020; p(s6)=.0008; p(s7)=.0017; p(s8)=.0042; p(s9)=.0017; p(s10)=.0042; p(s11)=.0008; p(s12)=.0042; p(s13)=.0020; p(s14)=.0017; p(s15)=.0008; p(s16)=.0003; p(s17)=.0020; p(s18)=.0020; p(s19)=.0008; p(s20)=.0008; p(s21)=.0003; p(s22)=.0020; p(s23)=.0042; p(s24)=.0008; p(s25)=.0008; p(s26)=.0020; p(s27)=.0008; p(s28)=.0003; p(s29)=.0008; p(s30)=.0003; p(s31)=.0007; p(s32)=.0017; p(s33)=.0007; p(s34)=.0017; p(s35)=.0003; p(s36)=.0017; p(s37)=.0008; p(s38)=.0007; p(s39)=.0003; p(s40)=.0001; p(s41)=.0008; p(s42)=.0008; p(s43)=.0003; p(s44)=.0003; p(s45)=.0001; p(s46)=.0008; p(s47)=.0017; p(s48)=.0003; p(s49)=.0008; p(s50)=.0020; p(s51)=.0008; p(s52)=.0003; p(s53)=.0008; p(s54)=.0003; p(s55)=.0007; p(s56)=.0017; p(s57)=.0007; p(s58)=.0017; p(s59)=.0003; p(s60)=.0017; p(s61)=.0008; p(s62)=.0003; p(s63)=.0001; p(s64)=.0008; p(s65)=.0008; p(s66)=.0003; p(s67)=.0003; p(s68)=.0001; p(s69)=.0008; p(s70)=.0017; p(s71)=.0003; p(s72)=.0017; p(s73)=.0007; p(s74)=.0008; p(s75)=.0003; p(s76)=.0001; p(s77)=.0017; p(s78)=.0003.
The values of p0(s) are given as follows.

p0(s1)=.0223;  p0(s2)=.0548;  p0(s3)=.0223;  p0(s4)=.009;   p0(s5)=.0223;  p0(s6)=.009;
p0(s7)=.0188;  p0(s8)=.0462;  p0(s9)=.0188;  p0(s10)=.0462; p0(s11)=.009;  p0(s12)=.0462;
p0(s13)=.0223; p0(s14)=.0188; p0(s15)=.009;  p0(s16)=.0036; p0(s17)=.0223; p0(s18)=.0223;
p0(s19)=.009;  p0(s20)=.009;  p0(s21)=.0036; p0(s22)=.0223; p0(s23)=.0462; p0(s24)=.009;
p0(s25)=.009;  p0(s26)=.0223; p0(s27)=.009;  p0(s28)=.0036; p0(s29)=.009;  p0(s30)=.0036;
p0(s31)=.0076; p0(s32)=.0188; p0(s33)=.0076; p0(s34)=.0188; p0(s35)=.0036; p0(s36)=.0188;
p0(s37)=.009;  p0(s38)=.0076; p0(s39)=.0036; p0(s40)=.0014; p0(s41)=.009;  p0(s42)=.009;
p0(s43)=.0036; p0(s44)=.0036; p0(s45)=.0014; p0(s46)=.009;  p0(s47)=.0188; p0(s48)=.0036;
p0(s49)=.009;  p0(s50)=.0223; p0(s51)=.009;  p0(s52)=.0036; p0(s53)=.009;  p0(s54)=.0036;
p0(s55)=.0076; p0(s56)=.0187; p0(s57)=.0077; p0(s58)=.0188; p0(s59)=.0036; p0(s60)=.0187;
p0(s61)=.009;  p0(s62)=.0036; p0(s63)=.0014; p0(s64)=.0089; p0(s65)=.009;  p0(s66)=.0036;
p0(s67)=.0036; p0(s68)=.0014; p0(s69)=.009;  p0(s70)=.0188; p0(s71)=.0036; p0(s72)=.0187;
p0(s73)=.0076; p0(s74)=.009;  p0(s75)=.0036; p0(s76)=.0014; p0(s77)=.0188; p0(s78)=.0036.

The proposed model for this example can be defined as follows, the coefficient of each p_i(s)^2 in the objective function being 1/p0(s_i):

Min z = Σ_{i=1}^{78} p_i(s)^2 / p0(s_i) − 1
      = 44.91 p1(s)^2 + 18.24 p2(s)^2 + 44.91 p3(s)^2 + 111.21 p4(s)^2 + 44.91 p5(s)^2 + 111.21 p6(s)^2 + 53.22 p7(s)^2 + 21.62 p8(s)^2 + … + 53.22 p77(s)^2 + 276.79 p78(s)^2 − 1.

Since the p_i(s) and the p0(s_i) each sum to unity, z equals Σ_i (p_i(s) − p0(s_i))^2 / p0(s_i), and the minimized value of z is the distance D(p0, p1) reported below.
Subject to the constraints

1. p1(s)+p2(s)+⋯+p78(s) = 1
2. p1(s)+p2(s)+⋯+p24(s) = .4
3. p25(s)+p26(s)+⋯+p48(s) = .2
4. p49(s)+p50(s)+⋯+p71(s) = .2
5. p72(s)+p73(s)+⋯+p78(s) = .2
6. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p17(s)+p19(s)+p22(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p41(s)+p43(s)+p46(s)+p49(s)+p50(s)+p51(s)+p52(s)+p53(s)+p54(s)+p64(s)+p66(s)+p69(s) = .4
7. p7(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p31(s)+p34(s)+p35(s)+p36(s)+p37(s)+p38(s)+p39(s)+p40(s)+p55(s)+p58(s)+p59(s)+p60(s)+p61(s)+p62(s)+p63(s) = .2
8. p7(s)+p8(s)+p9(s)+p10(s)+p12(s)+p14(s)+p23(s)+p31(s)+p32(s)+p33(s)+p34(s)+p36(s)+p38(s)+p47(s)+p55(s)+p56(s)+p57(s)+p58(s)+p60(s)+p70(s)+p72(s)+p73(s)+p77(s) = .6
9. p1(s)+p4(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p25(s)+p28(s)+p41(s)+p42(s)+p43(s)+p44(s)+p45(s)+p49(s)+p52(s)+p64(s)+p65(s)+p66(s)+p67(s)+p68(s)+p74(s)+p75(s)+p76(s) = .2
10. p1(s)+p2(s)+p3(s)+p13(s)+p15(s)+p18(s)+p20(s)+p25(s)+p26(s)+p27(s)+p37(s)+p39(s)+p42(s)+p44(s)+p49(s)+p50(s)+p51(s)+p61(s)+p62(s)+p65(s)+p67(s)+p74(s)+p75(s) = .4
11. p5(s)+p8(s)+p10(s)+p11(s)+p22(s)+p23(s)+p24(s)+p29(s)+p32(s)+p34(s)+p35(s)+p46(s)+p47(s)+p48(s)+p53(s)+p56(s)+p58(s)+p59(s)+p69(s)+p70(s)+p71(s)+p72(s)+p77(s)+p78(s) = .4
12. p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p11(s)+p16(s)+p21(s)+p24(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p35(s)+p40(s)+p45(s)+p48(s)+p52(s)+p53(s)+p54(s)+p55(s)+p56(s)+p57(s)+p59(s)+p63(s)+p68(s)+p71(s)+p72(s)+p73(s)+p76(s)+p78(s) = .2
13. p2(s)+p12(s)+p13(s)+p17(s)+p18(s)+p26(s)+p36(s)+p37(s)+p41(s)+p42(s)+p50(s)+p60(s)+p61(s)+p64(s)+p65(s)+p74(s) = .4
14. p3(s)+p6(s)+p9(s)+p14(s)+p15(s)+p16(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p27(s)+p30(s)+p33(s)+p38(s)+p39(s)+p40(s)+p43(s)+p44(s)+p45(s)+p46(s)+p47(s)+p48(s)+p51(s)+p54(s)+p57(s)+p62(s)+p63(s)+p66(s)+p67(s)+p68(s)+p69(s)+p70(s)+p71(s)+p73(s)+p75(s)+p76(s)+p77(s)+p78(s) = .2
15. p_i(s) ≥ 0 for i = 1, 2, …, 78.

After solving the above model, we get the following results:
p(1)=0.106451; p(2)=0.100122; p(8)=0.046253; p(12)=0.101397; p(23)=0.045779; p(25)=0.046825; p(26)=0.050308; p(32)=0.02685; p(36)=0.049532; p(47)=0.026474; p(49)=0.04672; p(50)=0.049571; p(56)=0.026453; p(60)=0.049064; p(70)=0.028191; p(72)=0.100437; p(77)=0.099552.
The value of p(s) for all other preferred sample combinations comes out to be 0, with the value of D(p0, p1) equal to 2.1929.

Example 7: We consider a 3x3 hypothetical population with N = 9 and n = 3. The expected sample cell counts (npi) for this population are given in Table 15.

Table 15
Expected Sample Cell Counts (npi) for the 3x3 population

0.2  0.2  0.6  | 1.0
0.3  0.4  0.3  | 1.0
0.5  0.4  0.1  | 1.0
1.0  1.0  1.0  | 3.0

[The six possible samples, displayed in the original as arrays (1)–(6), are the 3x3 permutation patterns selecting one unit per row and per column; sample (1) selects the diagonal units 1, 5 and 9.]

The set of non-preferred samples consists of those samples in which the three units 1, 5 and 9 appear together; thus the only non-preferred sample is sample number 1. The values of p(s) for the five preferred sample combinations, renumbered s1 to s5, are:
p(s1)=.07729; p(s2)=.270527; p(s3)=.05797; p(s4)=.167469; p(s5)=.025764.
The values of p0(s) are:
p0(s1)=.129037; p0(s2)=.451631; p0(s3)=.096778; p0(s4)=.27958; p0(s5)=.043012.

The objective function and the constraints for this population are given as follows.

Min z = 3.58 p1(s)^2 + 7.75 p2(s)^2 + 2.21 p3(s)^2 + 23.25 p4(s)^2 + 10.33 p5(s)^2 − 1

Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s) = 1
2. p1(s) = .2
3. p2(s)+p3(s) = .3
4. p4(s)+p5(s) = .5
5. p2(s)+p4(s) = .2
6. p5(s) = .4
7. p1(s)+p3(s) = .4
8. p3(s)+p5(s) = .6
9. p1(s)+p4(s) = .3
10. p2(s) = .1
11. p_i(s) ≥ 0 for i = 1, 2, …, 5.

After solving the above model, we get the following results:
p(1)=0.2; p(2)=0.1; p(3)=0.2; p(4)=0.1; p(5)=0.4,
with the value of D(p0, p1) equal to 1.1944.
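To make the construction concrete, here is a minimal sketch of Example 7's quadratic program, assuming NumPy and SciPy are available (an illustration only; the thesis solves these models with the Microsoft Excel Solver):

```python
import numpy as np
from scipy.optimize import minimize

# Objective coefficients and equality constraints of Example 7's model;
# row 1 of A is the total-probability condition, rows 2-10 follow the
# numbered constraints above.
c = np.array([3.58, 7.75, 2.21, 23.25, 10.33])
A = np.array([[1, 1, 1, 1, 1],
              [1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 1, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0],
              [0, 1, 0, 0, 0]])
b = np.array([1.0, 0.2, 0.3, 0.5, 0.2, 0.4, 0.4, 0.6, 0.3, 0.1])

res = minimize(lambda p: c @ p**2 - 1.0, x0=np.full(5, 0.2),
               constraints=[{'type': 'eq', 'fun': lambda p: A @ p - b}],
               bounds=[(0, None)] * 5, method='SLSQP')
print(np.round(res.x, 4), round(res.fun, 4))
# -> [0.2 0.1 0.2 0.1 0.4] and objective value 1.1944, as reported above
```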
CHAPTER IV

ON STATISTICAL DISCLOSURE CONTROL USING RANDOM ROUNDING AND CELL PERTURBATION TECHNIQUES

4.1 INTRODUCTION

Statistical offices collect information about society, and the most common method of providing these data to the public is through statistical tables. Each entry in a table is called a cell. In some situations, the statistical office must not disclose, in any way, the information provided by an individual respondent. The release of statistical data inevitably reveals some information about the individual data subjects; when confidential information is revealed, disclosure occurs. Statistical offices therefore need to protect the confidentiality of the data they collect. Not all the data collected and published by statistical offices are confidential, and only the confidential data have to be protected. The cells of a table containing confidential data are termed "sensitive cells", and all other cells "non-sensitive cells". For the sensitive cells, we assume the existence of individuals who may analyse the published pattern to disclose the confidential information. These individuals are referred to as "attackers" (or "intruders" or "snoopers"). If there exists more than one attacker for a cell, the problem is referred to as a "multi-attacker" problem; with only one attacker per cell, it is a "single-attacker" problem. Attackers can also be categorized as "external" and "internal". An external attacker knows the linear system My = b and the information that the cell values are non-negative; an internal attacker knows the linear system My = b and also tighter (lower and upper) bounds on the cell values.

Before publishing any information, statistical offices face two problems. The first is to identify the sensitive cells of a table. Identification of sensitive cells is carried out through several rules, such as the threshold rule, the linear sensitivity rule, the p-percent rule and the p-q percent rule. This problem has been discussed in detail by Cox (1980, 1981), Willenborg and Waal (2001) and Merola (2003a). The second problem is to protect the confidential information contained in the sensitive cells while minimizing the loss of information; this problem is generally termed "disclosure control". In this chapter, we concern ourselves with the problem of disclosure control against a single internal attacker.

The confidential information can be protected by the application of statistical disclosure limitation methods, which ensure that the risk of disclosing confidential information is very low while minimizing the loss of information. Several disclosure control techniques have been proposed in the literature to achieve the required protection. Two widely used techniques are "controlled rounding" and "cell suppression". Rounding techniques involve the replacement of the original data by multiples of a given rounding base.
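A sketch of unbiased random rounding to a base b, of the kind proposed by Fellegi (1975) and used throughout this chapter (a hypothetical helper, shown only to fix ideas — the expectation of the rounded value equals the original value):

```python
import random

def random_round(a, b):
    """Round a to an adjacent multiple of b, unbiasedly: a rounds down to
    b*floor(a/b) with probability 1 - r/b and up with probability r/b,
    where r = a mod b, so E[rounded value] = a."""
    lower = (a // b) * b
    r = a - lower
    if r == 0:
        return a                     # exact multiples of b round to themselves
    return lower + b if random.random() < r / b else lower
```

For instance, with b = 5 the value 3 rounds to 5 with probability 3/5 and to 0 with probability 2/5, which is how the candidate rounded sets in the examples of section 4.3 arise.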
The controlled rounding problem is the problem of optimally rounding the real-valued entries of a tabular array to adjacent integer multiples of the base in a manner that preserves the tabular structure of the array. Rounding methods are used for many purposes, such as improving the readability of data values, controlling statistical disclosure in tables, solving the problem of iterative proportional fitting (or raking) in two-way tables, and controlled selection. Statistical disclosure control is one of the areas in which rounding methods are most widely used. Fellegi (1975) proposed a random rounding technique that rounds the cell values unbiasedly and also maintains the additivity of the rounded table; its drawback is that it is applicable to one-dimensional tables only. Cox and Ernst (1982) used transportation theory in linear programming to obtain an optimal controlled rounding of a two-way tabular array. Using the general theory of transportation problems, they demonstrated that solutions always exist to the controlled rounding problem. They also showed that their technique guarantees optimal solutions to the zero-restricted controlled rounding problem, i.e., the controlled rounding in which the absolute difference between the original and rounded values is always less than the rounding base, subject to the restriction that integer multiples of the base are rounded to themselves. Causey, Cox and Ernst (1985) built upon the idea of Cox and Ernst (1982) and used transportation theory to solve the controlled rounding problem; they discussed several statistical applications of controlled rounding and applied the concept to solve the controlled selection problem. They also showed that the zero-restricted controlled rounding problem in three-dimensional tables is not always feasible. Cox (1987) presented a constructive algorithm for achieving unbiased controlled rounding which is simple to implement by hand; he also discussed the controlled rounding problem in three dimensions and provided a counter-example to the existence of unbiased controlled rounding in three dimensions. Tiwari and Nigam (1988) improved the method of Cox (1987) so that it terminates in fewer steps.

Another method widely used for protecting the sensitive cells of a table is "cell suppression", in which the sensitive cells are not published, i.e., they are suppressed. These suppressed sensitive cells are called primary suppressions. To ensure that the primary suppressions cannot be derived by subtraction from the published marginal totals, additional cells are selected for suppression; these are known as complementary (or secondary) suppressions. The remaining cells of the table are published with their original values. The cell suppression problem is to find the complementary suppressions in such a way that the loss of information is minimum. This problem has been widely discussed by Cox (1980, 1995), Sande (1984), Carvalho et al. (1994) and Fischetti and Salazar (2000). In cell suppression, a large amount of information is lost since, in addition to the sensitive cells, some non-sensitive cells are also suppressed. To reduce this loss of information, Fischetti and Salazar (2003) proposed an improved methodology known as "partial cell suppression".
In partial cell suppression, instead of wholly suppressing the primary and complementary cells, intervals obtained with the help of a mathematical model are published for these cell entries. The published intervals must provide a convenient set of possible values for the corresponding entries, containing the true original values. The loss of information in partial cell suppression is smaller than in complete cell suppression.

In order to further reduce the data loss that results from cell suppression, Salazar (2005) proposed an improved method termed "cell perturbation". This method is closely related to the classical controlled rounding methods and has the advantage of ensuring the protection of the sensitive cells to a specified level while minimizing the loss of information. However, the method also has some disadvantages. Firstly, it perturbs all the cell values, resulting in a large amount of data loss. Secondly, the marginal cell values of the resultant table are not preserved, thereby disturbing marginals which are non-sensitive and expected to be published in their original form.

In this chapter, we use the idea of random rounding together with integer quadratic programming to propose an improved methodology for disclosure control in an array, which perturbs only the sensitive cells and adjusts some non-sensitive cells so as to preserve the marginal values of the array. The table obtained through the proposed procedure guarantees the protection level requirements and also attempts to minimize the information loss by minimizing the distance between the original and the final table. In section 4.2, we first describe the attacker's problem and the protection of sensitive cells, and then introduce the proposed methodology of disclosure control against a single internal attacker. The proposed methodology appears to perform better than the procedure suggested by Salazar (2005). In section 4.3, we discuss some numerical examples to demonstrate the utility of the proposed procedure.

4.2 CONTROLLED CELL PERTURBATION: THE PROPOSED METHODOLOGY

Let A denote the m x n tabular array (a_pq), together with its row totals (a_p.), its column totals (a_.q) and its grand total (a..). The array A can be represented by a vector a = (a_i : i ∈ I), where a_1 = a_11, a_2 = a_12, a_3 = a_13, and so on, are all non-negative integers and I is the set of all elements — internal cells, marginal totals and the grand total — consisting of mn+m+n+1 elements. The additivity of the array is expressed by the linear system Ma = 0, where each of the m+n+1 rows of the matrix M carries a 1 in the positions of the cells contributing to one row total, column total or the grand total, and a −1 in the position of that total. The vector a = (a_i : i ∈ I) thus satisfies the linear system My = b (with b = 0) and contains some sensitive cells. Let us denote the subset of sensitive cells by S, and let there be r sensitive cells, each having one internal attacker, denoted by k_s (s = 1, …, r). The set of attackers in the different sensitive cells is denoted by K.
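A sketch (a hypothetical helper) of how such a matrix M can be built for an m x n table, matching the structure just described:

```python
import numpy as np

def table_constraints(m, n):
    """Constraint matrix M with M a = 0 for a vector a listing an m x n
    table row-wise, followed by the m row totals, the n column totals
    and the grand total (mn + m + n + 1 entries in all; m + n + 1 rows)."""
    size = m * n + m + n + 1
    rows = []
    for p in range(m):                    # row p sums to its row total
        row = np.zeros(size)
        row[p * n:(p + 1) * n] = 1
        row[m * n + p] = -1
        rows.append(row)
    for q in range(n):                    # column q sums to its column total
        row = np.zeros(size)
        row[q:m * n:n] = 1
        row[m * n + m + q] = -1
        rows.append(row)
    row = np.zeros(size)                  # row totals sum to the grand total
    row[m * n:m * n + m] = 1
    row[-1] = -1
    rows.append(row)
    return np.vstack(rows)

# For the 2x2 table [[1, 2], [3, 4]] with totals appended, M a is zero:
a = np.array([1, 2, 3, 4, 3, 7, 4, 6, 10])
print(table_constraints(2, 2) @ a)        # -> all zeros
```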
Now suppose that, by observing the published pattern, attacker k_s computes the interval (y_s^ks, ȳ_s^ks), where y_s^ks is the minimum and ȳ_s^ks the maximum value of the interval. The sensitive cell s will be protected against the attacker k_s if the interval computed by k_s is wide enough. To decide whether this interval is wide enough, we need three parameters, defined as follows:

Upper protection level: a number UPL_s^ks representing a desired lower bound for ȳ_s^ks − a_s.
Lower protection level: a number LPL_s^ks representing a desired lower bound for a_s − y_s^ks.
Sliding protection level: a number SPL_s^ks representing a desired lower bound for ȳ_s^ks − y_s^ks.

The values of these parameters are provided by the statistical office for each sensitive cell and each attacker k_s; they can also be defined by using a common-sense rule [see Sande (1984)]. The protection values are assumed to be unknown to the attacker. Let us assume that attacker k_s knows two bounds lb_i^ks and ub_i^ks such that a_i ∈ (lb_i^ks, ub_i^ks) for each cell i ∈ I. The sensitive cells in the published table will then be protected if

lb_s^ks ≤ y_s^ks ≤ a_s − LPL_s^ks ≤ a_s ≤ a_s + UPL_s^ks ≤ ȳ_s^ks ≤ ub_s^ks.   (4.2.1)

This protection is obtained by satisfying the protection equations, which are derived from the attacker's problem. Suppose the attacker is provided with the information that some values of the table are rounded to a common rounding base b. Then the attacker's problem becomes

Σ_i M_ji y_i = b_j  (j = 1, …, m+n+1),
x_i − b ≤ y_i ≤ x_i + b  for all i ∈ I,
lb_i^ks ≤ y_i ≤ ub_i^ks  for all i ∈ I,   (4.2.2)

where (x_i : i ∈ I) is the published pattern. The attacker can compute ȳ_s^ks and y_s^ks by maximizing and minimizing y_s, respectively, subject to the constraints (4.2.2). The published table will be protected if

Maximize [y_s : (4.2.2) holds] ≥ usl_s^ks,   (4.2.3)
Minimize [y_s : (4.2.2) holds] ≤ lsl_s^ks,   (4.2.4)
Maximize [y_s : (4.2.2) holds] − Minimize [y_s : (4.2.2) holds] ≥ SPL_s^ks,   (4.2.5)

where usl_s^ks = a_s + UPL_s^ks and lsl_s^ks = a_s − LPL_s^ks.
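Before passing to duality, note that conditions (4.2.3)–(4.2.5) can also be checked directly — as is done in the auditing phase of the examples in section 4.3 — by solving the attacker's linear program twice. A minimal sketch, assuming SciPy's linprog and b = 0 for the system My = b:

```python
import numpy as np
from scipy.optimize import linprog

def attacker_interval(M, x, b, lb, ub, s):
    """Interval [y_min, y_max] an internal attacker can deduce for cell s
    from the published pattern x rounded to base b, the additivity system
    M y = 0 and the attacker's bounds lb <= y <= ub (problem (4.2.2))."""
    lo = np.maximum(np.asarray(x, float) - b, lb)
    hi = np.minimum(np.asarray(x, float) + b, ub)
    c = np.zeros(len(x)); c[s] = 1.0
    zero = np.zeros(M.shape[0])
    y_min = linprog(c,  A_eq=M, b_eq=zero, bounds=list(zip(lo, hi))).x[s]
    y_max = linprog(-c, A_eq=M, b_eq=zero, bounds=list(zip(lo, hi))).x[s]
    # cell s is protected if y_max >= a_s + UPL, y_min <= a_s - LPL and
    # y_max - y_min >= SPL, i.e. conditions (4.2.3)-(4.2.5)
    return y_min, y_max
```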
In order to impose the conditions (4.2.3)–(4.2.5), we convert them into linear form using the duality theory of linear programming. Consider the dual variables α_i^1, β_i^1, α_i^2, β_i^2 and γ_j associated with the inequalities y_i ≤ ub_i^ks, −y_i ≤ −lb_i^ks, y_i ≤ x_i + b, −y_i ≤ b − x_i and the equations Σ_i M_ji y_i = b_j, respectively. The attacker's problem Maximize [y_s : (4.2.2) holds] is then equivalent to

Minimize Σ_j γ_j b_j + Σ_i [α_i^1 ub_i^ks + α_i^2 (x_i + b) − β_i^1 lb_i^ks − β_i^2 (x_i − b)]   (4.2.6)

subject to the constraints

α_s^1 + α_s^2 − β_s^1 − β_s^2 + Σ_j M_js γ_j = 1  for the sensitive cell s,
α_i^1 + α_i^2 − β_i^1 − β_i^2 + Σ_j M_ji γ_j = 0  for all non-sensitive cells i,
α_i^1 ≥ 0, α_i^2 ≥ 0, β_i^1 ≥ 0, β_i^2 ≥ 0, γ_j unrestricted in sign.   (4.2.7)

Now (4.2.3) can be written in simplified form as

Maximize [y_s : (4.2.2) holds] ≥ usl_s^ks
⇒ Minimize (4.2.6) ≥ usl_s^ks, for all α_i^1, α_i^2, β_i^1, β_i^2, γ_j satisfying (4.2.7)
⇒ Σ_j γ_j b_j + Σ_i [α_i^1 ub_i^ks + α_i^2 (x_i + b) − β_i^1 lb_i^ks − β_i^2 (x_i − b)] ≥ a_s + UPL_s^ks
⇒ Σ_i [α_i^1 UB_i^ks + α_i^2 (x_i + b − a_i) + β_i^1 LB_i^ks − β_i^2 (x_i − b − a_i)] ≥ UPL_s^ks,   (4.2.8)

where UB_i^ks = ub_i^ks − a_i and LB_i^ks = a_i − lb_i^ks, for all α_i^1, α_i^2, β_i^1, β_i^2, γ_j satisfying (4.2.7). Similarly, (4.2.4) can be written in simplified form as

Σ_i [α_i'^1 UB_i^ks + α_i'^2 (x_i + b − a_i) + β_i'^1 LB_i^ks − β_i'^2 (x_i − b − a_i)] ≥ LPL_s^ks   (4.2.9)

for all α_i'^1, α_i'^2, β_i'^1, β_i'^2 and γ_j' satisfying the constraints

α_s'^1 + α_s'^2 − β_s'^1 − β_s'^2 + Σ_j M_js γ_j' = −1  for the sensitive cell s,
α_i'^1 + α_i'^2 − β_i'^1 − β_i'^2 + Σ_j M_ji γ_j' = 0  for all non-sensitive cells i,
α_i'^1 ≥ 0, α_i'^2 ≥ 0, β_i'^1 ≥ 0, β_i'^2 ≥ 0, γ_j' unrestricted in sign,   (4.2.10)

and (4.2.5) reduces to

Σ_i [(α_i^1 + α_i'^1) UB_i^ks + (α_i^2 + α_i'^2)(x_i + b − a_i) + (β_i^1 + β_i'^1) LB_i^ks + (β_i^2 + β_i'^2)(a_i − x_i + b)] ≥ SPL_s^ks   (4.2.11)

for all α_i^1, α_i^2, β_i^1, β_i^2, γ_j satisfying (4.2.7) and all α_i'^1, α_i'^2, β_i'^1, β_i'^2, γ_j' satisfying (4.2.10).

The conditions (4.2.8), (4.2.9) and (4.2.11) ensure upper, lower and sliding protection, respectively. Solving (4.2.7) and (4.2.10), we obtain the values of the dual variables α_i^1, α_i^2, β_i^1, β_i^2, α_i'^1, α_i'^2, β_i'^1 and β_i'^2. In some situations, some or all of the values of α_i^1, α_i^2, β_i^1, β_i^2 come out to be 0 for one or more sensitive cells; in such situations we may not obtain an inequality for upper protection, and hence the upper protection requirement may not be enforced by a protection equation. Similar situations may arise for the lower and sliding protection. Substituting the values of UB_i^ks, LB_i^ks, UPL_s^ks, LPL_s^ks, SPL_s^ks, the dual variables, a_i and b into (4.2.8), (4.2.9) and (4.2.11), we obtain three simplified inequalities consisting only of the variables x_i and constants. If the values of the sensitive cells in the final table satisfy these inequalities, we say that the table is protected against the single internal attacker.

Now we round the sensitive cells unbiasedly to the base b. The rounding base b should be chosen, as far as possible, so that the sum of the entries in the sensitive cells is a multiple of b; the advantage of such a choice is that the sum of the rounded values of the sensitive cells remains unaltered. If no such rounding base can be used, some other rounding base may be chosen. Moreover, we also ensure that b is not a divisor of any of the sensitive cell values, since such a value would be rounded to itself. From the resulting sets of unbiasedly rounded values, we select a set which satisfies the simplified inequalities for upper, lower and sliding protection, i.e., (4.2.8), (4.2.9) and (4.2.11).
If more than one set of unbiasedly rounded values satisfies the protection equations, we choose the set which has the minimum distortion between the rounded and the original values, i.e., which minimizes

{ Σ_i (x_i − a_i)^2 }^(1/2)   (*)

where a_i and x_i represent the original and rounded values, respectively. The sensitive values in the table are then replaced by these unbiasedly rounded values.

After replacing the sensitive cell values with the rounded values, the resultant table may not be additive. To make the table additive, some or all of the non-sensitive cell values are adjusted from their true values by as small an amount as possible. This is achieved with the help of the following model:

Minimize z = Σ_{p=1}^{m} Σ_{q=1}^{n} x_pq^2 / a_pq − 1   (4.2.12)

subject to the constraints

(i) Σ_q x_pq = X_p − Σ_q S_pq  for every row p = 1, …, m,
(ii) Σ_p x_pq = X_q − Σ_p S_pq  for every column q = 1, …, n,
(iii) Σ_p Σ_q x_pq = X − Σ_p Σ_q S_pq,
(iv) lb_pq^ks ≤ x_pq ≤ ub_pq^ks  for all non-sensitive cells,
(v) x_pq integer and ≥ 0,   (4.2.13)

where the sums over x_pq run over the non-sensitive cells and the sums over S_pq over the rounded sensitive cells of the corresponding row, column or the whole table; X_p and X_q denote the marginal totals of row p and column q, and X is the grand total. Solving (4.2.12) and (4.2.13) by integer quadratic programming, using the Microsoft Excel Solver of the Microsoft Office 2000 package, we obtain the required adjusted non-sensitive cell values.
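A continuous sketch of this adjustment model, using the data of Example 1 below and assuming the grand total is held at its published value (the thesis solves the integer version with the Microsoft Excel Solver, so this illustrates the objective and constraints rather than the exact procedure):

```python
import numpy as np
from scipy.optimize import minimize

# Non-sensitive cells of Example 1's one-dimensional table; the sensitive
# cells a4 = 3 and a9 = 8 are rounded to 0 and 10, so the non-sensitive
# cells must sum to the grand total 232 minus 10.
a = np.array([12, 23, 34, 49, 23, 50, 17, 13], dtype=float)
target = 232 - 10

res = minimize(lambda x: np.sum(x**2 / a), x0=a.copy(),
               constraints=[{'type': 'eq', 'fun': lambda x: x.sum() - target}],
               bounds=list(zip(a / 2, 2 * a)),   # lb = a/2, ub = 2a as below
               method='SLSQP')
print(np.round(res.x, 2))   # stays close to the original values, rescaled
```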
4.3 EXAMPLES

Example 1: Consider the following one-dimensional table, taken from Fellegi (1975):

12  23  34  3  49  23  50  17  8  13

Let the cell values a4 and a9 be sensitive. We set the values of UB_i^ks and LB_i^ks as UB_i^ks = a_i and LB_i^ks = a_i/2 for all the examples considered in this chapter. Let the protection levels provided by the statistical office be

UPL_4^k4 = 2, LPL_4^k4 = 1, SPL_4^k4 = 5 for a4, and
UPL_9^k9 = 4, LPL_9^k9 = 2, SPL_9^k9 = 5 for a9,

and let b = 5. In order to find the protection equations, we first have to find the values of α_i^1, α_i^2, β_i^1, β_i^2, α_i'^1, α_i'^2, β_i'^1 and β_i'^2 for the two sensitive cells by solving (4.2.7) and (4.2.10). For this example, the matrix M_ji consists of the single row

M = (1 1 1 1 1 1 1 1 1 1 −1),

the eleventh element corresponding to the total cell. The equations of (4.2.7) for the sensitive cell a4 can thus be written as

α_i^1 + α_i^2 − β_i^1 − β_i^2 + M_1i γ_1 = 0 for all i ≠ 4, and
α_4^1 + α_4^2 − β_4^1 − β_4^2 + γ_1 = 1,   (4.3.1)

with α_i^1, α_i^2, β_i^1, β_i^2 ≥ 0 and γ_1 unrestricted in sign (i = 1, …, 11). Solving these equations, we get, for the sensitive cell a4,

α_4^1 = 0, α_4^2 = 1, β_4^1 = 0 and β_4^2 = 0.

The equations of (4.2.10) for a4 are the same, with right-hand side −1 in place of 1,   (4.3.2)

and solving them gives α_4'^1 = 0, α_4'^2 = 0, β_4'^1 = 0 and β_4'^2 = 1. The equations of (4.2.7) for the sensitive cell a9 are the same as (4.3.1), with the right-hand side equal to 1 for i = 9 and 0 elsewhere; solving them gives α_9^1 = 0, α_9^2 = 1, β_9^1 = 0 and β_9^2 = 0. Similarly, the equations of (4.2.10) for a9 have right-hand side −1 for i = 9 and 0 elsewhere, and give α_9'^1 = 0, α_9'^2 = 0, β_9'^1 = 0 and β_9'^2 = 1.

Putting these values into (4.2.8), (4.2.9) and (4.2.11), we get the protection equations for a4 as
(i) x4 + 5 − 3 ≥ 2 ⇒ x4 ≥ 0,
(ii) −x4 + 5 + 3 ≥ 1 ⇒ x4 ≤ 7,
and for a9 as
(i) x9 + 5 − 8 ≥ 4 ⇒ x9 ≥ 7,
(ii) −x9 + 5 + 8 ≥ 2 ⇒ x9 ≤ 11.

Now we round the above sensitive cell values unbiasedly and find that only the set (0, 10) of unbiasedly rounded cell values satisfies the protection equations. So we take this set and replace the original sensitive cell values by these values. After inserting these rounded values, the table is no longer additive. To make the table additive, we apply the model (4.2.12)–(4.2.13) as follows:

Minimize z = 0.083 x1^2 + 0.043 x2^2 + 0.029 x3^2 + 0.020 x5^2 + 0.043 x6^2 + 0.020 x7^2 + 0.059 x8^2 + 0.077 x10^2 + 0.00431 X11^2 − 1

Subject to the constraints
1. x1+x2+x3+x5+x6+x7+x8+x10 = X11 − 10
2. 6 ≤ x1 ≤ 24
3. 11.5 ≤ x2 ≤ 46
4. 17 ≤ x3 ≤ 68
5. 24.5 ≤ x5 ≤ 98
6. 11.5 ≤ x6 ≤ 46
7. 25 ≤ x7 ≤ 100
8. 8.5 ≤ x8 ≤ 34
9. 6.5 ≤ x10 ≤ 26
10. 116 ≤ X11 ≤ 464
11. x1, x2, x3, x5, x6, x7, x8, x10 and X11 are integers and ≥ 0.

After solving the above model, we get the following results:

12  23  34  0  50  23  50  17  10  13, with z = 234.4737.

Solving this example by Salazar's (2005) procedure gives

10  25  35  0  50  25  50  15  10  10, with z = −9.

The deviation of the final table obtained by the proposed procedure from the original table, measured by (*), is 3.74, while the deviation of the final table obtained by the procedure of Salazar is 6.63; the deviations of the proposed procedure are thus smaller. The proposed table rounds the sensitive cells in such a way that the confidential information contained in them is protected against the single internal attacker, and the marginals are not disturbed. To make the table additive, only one non-sensitive cell (a5) is disturbed, and that by only 2.0408%, while all other non-sensitive cell values are published in their original form.
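The deviation figures just quoted can be reproduced directly from (*) when the total cell is included; a small check, assuming Python:

```python
import math

def distortion(x, a):
    """Root of summed squared deviations between final and original values, (*)."""
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a)))

original = [12, 23, 34, 3, 49, 23, 50, 17, 8, 13, 232]   # cells plus total
proposed = [12, 23, 34, 0, 50, 23, 50, 17, 10, 13, 232]
salazar  = [10, 25, 35, 0, 50, 25, 50, 15, 10, 10, 230]
print(round(distortion(proposed, original), 2),   # -> 3.74
      round(distortion(salazar,  original), 2))   # -> 6.63
```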
Example 2: Consider the following example, taken from Cox (1995):

20  10  20  10  20   80
10  10  20   5  15   60
40  10  10  20  10   90
 5   5  15  10   5   40
75  35  65  45  50  270

Let the values a1, a9, a16 and a22 be sensitive, and let the protection levels provided by the statistical office be UPL_i^ki = 7, LPL_i^ki = 5, SPL_i^ki = 14 for i = 1, 9 and 16, and UPL_22^k22 = 5, LPL_22^k22 = 2, SPL_22^k22 = 14 for a22. We solve (4.2.7) and (4.2.10) to find the values of the dual variables α_i^1, α_i^2, β_i^1, β_i^2, α_i'^1, α_i'^2, β_i'^1 and β_i'^2 for all the sensitive cells, substitute them into (4.2.8), (4.2.9) and (4.2.11), and obtain the single protection equation

(i) x1 ≤ 47

for the sensitive cell a1, which enforces only the lower protection and sliding protection requirements. We cannot obtain an equation enforcing the upper protection requirement for the cell a1. Since the values of the dual variables for all the other sensitive cells come out to be 0, no protection equation is obtained for them. It may be noted that even if no lower or upper protection equation can be formed for a particular sensitive cell, the cell may nevertheless be protected. Thus, in the auditing phase we have to check whether each sensitive cell for which no protection equation — or only one of the upper and lower protection equations — could be obtained is in fact protected.

Now we round these sensitive cell values unbiasedly, taking b = 14, and get the following sets of rounded values which are protected and nearest to the set of original sensitive cell values:
(i) (28, 14, 14, 14)  (ii) (14, 28, 14, 14)  (iii) (14, 14, 28, 14)

After replacing the original sensitive cell values by these sets of rounded values and applying the model (4.2.12)–(4.2.13), no solution could be obtained for set (iii). The value of the objective function, which minimizes the distance between the original and the final table, comes out to be 213.9713 for set (i) and 209.7067 for set (ii). Hence we select set (ii) of rounded values and get the following results.

Table 1
14  12  18  13  23   80
 9   8  28   4  11   60
47  10   8  14  11   90
 5   5  11  14   5   40
75  35  65  45  50  270

with z = 209.7067. Since no protection equation could be obtained for the sensitive cells a9, a16 and a22, and no upper protection equation for a1, we have to check whether these sensitive cells are protected; in the auditing phase we observe that all the sensitive cells are protected.

Solving this problem by the procedure of Salazar (2005) gives the following results:

14  14  28  14  14   84
14  14  14   0  14   56
42   0  14  14  14   84
 0   0  14  14  14   42
70  28  70  42  56  266

with z = −68. The deviation of the final table obtained by the proposed procedure from the original table, measured by (*), is 16.43, while that of the table obtained by the procedure of Salazar is 28.53; the deviations of the proposed procedure are thus smaller for this problem also. Although we could not obtain protection equations for the sensitive cells a9, a16 and a22, these cells are protected in the final table. To make the table additive, only 12 non-sensitive cells are disturbed, while in the procedure of Salazar (2005) all the non-sensitive cells are disturbed and the marginals are not preserved.
Example 3: Consider the following two-way table:

200   40   50  200  120   610
 20   70   60  100  120   370
 40   90  250  100   30   510
100  150   30   80  150   510
360  350  390  480  420  2000

The cell values a4, a10, a15, a19, a20 and a23 are sensitive. Let the protection levels provided by the statistical office for these sensitive cells be:
for a4: UPL = 20, LPL = 10, SPL = 15;
for a10 and a19: UPL = 10, LPL = 5, SPL = 15;
for a15: UPL = 25, LPL = 20, SPL = 15;
for a20 and a23: UPL = 15, LPL = 7, SPL = 15.

We solve (4.2.7) and (4.2.10) to find the values of the dual variables for all the sensitive cells, substitute them into (4.2.8), (4.2.9) and (4.2.11), and obtain the protection equations
(i) x4 ≤ 209 for the sensitive cell a4, and
(ii) x23 ≤ 154 for the sensitive cell a23,
each enforcing the lower protection and sliding protection requirements. We cannot obtain equations enforcing the upper protection requirement for a4 and a23, and since the dual variables for all the other sensitive cells come out to be 0, no protection equation is obtained for them.

Now we round the sensitive cell values unbiasedly, taking b = 19, and get the following sets of rounded values, both equidistant from the set of original sensitive cell values:
(i) (190, 114, 247, 95, 152, 152)  (ii) (190, 95, 247, 114, 152, 152)

After replacing the original sensitive cell values by these sets and applying the model (4.2.12)–(4.2.13), we observe that set (i) yields a final table nearer to the original sensitive cell values than set (ii). Thus we select set (i) and get the following results.

Table 2
204   41   52  190  123   610
 19   65   58  114  114   370
 42   92  247   98   31   510
 95  152   33   78  152   510
360  350  390  480  420  2000

with z = 1056.654. Since protection equations could not be obtained for some of the sensitive cells, we check them in the auditing phase and observe that the sensitive cells a4 and a15 fail the upper protection requirement, while all the other cells are protected.

Solving this problem by the procedure of Salazar (2005) gives the following results:

209   38   57  190  114   608
 19   57   57  114  114   361
 38   95  247   95   38   513
 95  152   38   76  152   513
361  342  399  475  418  1995

with z = −86. The deviation of the proposed final table from the original table, measured by (*), is 21.45, against 34.99 for the table of Salazar; in this problem also, the proposed procedure results in a smaller loss of information than the procedure of Salazar (2005).

Example 4: Consider the following two-way table, taken from Fischetti and Salazar (2003):

20   50  10   80
 8   19  22   49
17   32  12   61
45  101  44  190

The cell value a7 is sensitive. Let the protection levels provided by the statistical office for a7 be UPL_7^k7 = 7, LPL_7^k7 = 5, SPL_7^k7 = 5. We solve (4.2.7) and (4.2.10) to find the dual variables for the sensitive cell a7; all of them come out to be 0, so we cannot form any protection equation for a7. Applying the rounding procedure with b = 5, we get the rounded value 20. We put this value in place of the original sensitive cell value and apply the model (4.2.12)–(4.2.13), obtaining the following results.

Table 3
20   49  11   80
 9   20  20   49
16   32  13   61
45  101  44  190

with z = 172.3245. Since no protection equation could be obtained for the sensitive cell, we check it in the auditing phase and observe that the sensitive cell a7 satisfies none of the upper, lower and sliding protection requirements; hence the cell a7 is not protected. We also solved this problem by the procedure of Salazar (2005), with the following results:

20   50  15   85
10   20  20   50
20   30  10   60
50  100  45  195

with z = −9.
The deviation of the final table obtained by the proposed procedure from the original table, measured by (*), is 3.16, while that of the table obtained by the procedure of Salazar is 11.4. Thus we see that the deviations of the proposed procedure are again smaller.

APPENDIX 4.0

Example 2: Consider the following example:

20  10  20  10  20   80
10  10  20   5  15   60
40  10  10  20  10   90
 5   5  15  10   5   40
75  35  65  45  50  270

[The constraint matrix M_ji for this 4x5 table — 30 cells in all, including the marginal totals — is displayed in the original as its transpose; it is the 10x30 matrix with one row for each of the four row totals, five column totals and the grand total, carrying a 1 in the positions of the contributing cells and a −1 in the position of the corresponding total.]

Let the cell values a1, a9, a16 and a22 be sensitive; the values of UPL_i^ki, LPL_i^ki and SPL_i^ki have already been defined. The equations of (4.2.7) for the sensitive cell a1 can be written as

α_i^1 + α_i^2 − β_i^1 − β_i^2 + Σ_j M_ji γ_j = 1 for i = 1, and = 0 for i = 2, …, 30,   (4.3.3)

with α_i^1, α_i^2, β_i^1, β_i^2 ≥ 0 and γ_j unrestricted in sign (i = 1, …, 30; j = 1, …, 10). Solving these equations, we get, for the cell a1,

α_1^1 = 0, β_1^1 = 0, α_1^2 = 0 and β_1^2 = 0.
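The systems (4.3.3), (4.3.4) and their right-hand-side variants below are linear feasibility problems in the variables α, β and γ. A minimal sketch of obtaining one feasible solution mechanically, assuming SciPy and the conventions of (4.2.7)/(4.2.10) (the thesis reports particular solutions, and a solver may return a different but equally valid one):

```python
import numpy as np
from scipy.optimize import linprog

def dual_solution(M, s, sign=+1):
    """One feasible solution of (4.2.7) (sign=+1) or (4.2.10) (sign=-1)
    for sensitive cell s. Variables, in column order: alpha1, alpha2,
    beta1, beta2 (one of each per cell, all >= 0), then gamma (one free
    variable per row of M)."""
    n_rows, n_cells = M.shape
    A = np.hstack([np.eye(n_cells), np.eye(n_cells),
                   -np.eye(n_cells), -np.eye(n_cells), M.T])
    rhs = np.zeros(n_cells)
    rhs[s] = sign
    bounds = [(0, None)] * (4 * n_cells) + [(None, None)] * n_rows
    res = linprog(np.zeros(4 * n_cells + n_rows), A_eq=A, b_eq=rhs,
                  bounds=bounds)
    return res.x   # a basic feasible point of the dual system
```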
To find the values of α_i'^1, α_i'^2, β_i'^1 and β_i'^2 for the sensitive cell a1, we solve the equations of (4.2.10), namely

α_i'^1 + α_i'^2 − β_i'^1 − β_i'^2 + Σ_j M_ji γ_j' = −1 for i = 1, and = 0 for i = 2, …, 30,   (4.3.4)

with α_i'^1, α_i'^2, β_i'^1, β_i'^2 ≥ 0 and γ_j' unrestricted in sign (i = 1, …, 30; j = 1, …, 10). Solving these equations, we get, for the sensitive cell a1,

α_1'^1 = 0, α_1'^2 = 1, β_1'^1 = 0 and β_1'^2 = 2.

The equations of (4.2.7) and (4.2.10) for the sensitive cells a9, a16 and a22 are the same as (4.3.3) and (4.3.4), with the non-zero right-hand side entry (1 for (4.3.3), −1 for (4.3.4)) moved to position 9, 16 or 22, respectively. Solving (4.2.7) and (4.2.10) for these cells, we get the following results:

α_9^1 = β_9^1 = α_9^2 = β_9^2 = 0;    α_9'^1 = α_9'^2 = β_9'^1 = β_9'^2 = 0;
α_16^1 = β_16^1 = α_16^2 = β_16^2 = 0;  α_16'^1 = α_16'^2 = β_16'^1 = β_16'^2 = 0;
α_22^1 = β_22^1 = α_22^2 = β_22^2 = 0;  α_22'^1 = α_22'^2 = β_22'^1 = β_22'^2 = 0.

Putting these values into (4.2.8), (4.2.9) and (4.2.11), we get the protection equations for the sensitive cells a1, a9, a16 and a22, which have already been stated. The set of unbiasedly rounded and protected cell values has also already been found, so here we give only the objective function and the constraints, which are as follows.

Minimize z = 0.1 x2^2 + 0.05 x3^2 + 0.1 x4^2 + 0.05 x5^2 + 0.0125 X6^2 + 0.1 x7^2 + 0.1 x8^2 + 0.2 x10^2 + 0.066667 x11^2 + 0.016667 X12^2 + 0.025 x13^2 + 0.1 x14^2 + 0.1 x15^2 + 0.1 x17^2 + 0.011111 X18^2 + 0.2 x19^2 + 0.2 x20^2 + 0.066667 x21^2 + 0.2 x23^2 + 0.025 X24^2 + 0.013333 X25^2 + 0.028571 X26^2 + 0.015385 X27^2 + 0.022222 X28^2 + 0.02 X29^2 + 0.003704 X30^2 − 1
Subject to the constraints
1. x2+x3+x4+x5 = X6 − 14
2. x7+x8+x10+x11 = X12 − 28
3. x13+x14+x15+x17 = X18 − 14
4. x19+x20+x21+x23 = X24 − 14
5. x7+x13+x19 = X25 − 14
6. x2+x8+x14+x20 = X26
7. x3+x15+x21 = X27 − 28
8. x4+x10 = X28 − 28
9. x5+x11+x17+x23 = X29
10. x2+x3+x4+x5+x7+x8+x10+x11+x13+x14+x15+x17+x19+x20+x21+x23 = X30 − 70
11. 5 ≤ x2 ≤ 20
12. 10 ≤ x3 ≤ 40
13. 5 ≤ x4 ≤ 20
14. 10 ≤ x5 ≤ 40
15. 40 ≤ X6 ≤ 160
16. 5 ≤ x7 ≤ 20
17. 5 ≤ x8 ≤ 20
18. 2.5 ≤ x10 ≤ 10
19. 7.5 ≤ x11 ≤ 30
20. 30 ≤ X12 ≤ 120
21. 20 ≤ x13 ≤ 80
22. 5 ≤ x14 ≤ 20
23. 5 ≤ x15 ≤ 20
24. 5 ≤ x17 ≤ 20
25. 45 ≤ X18 ≤ 180
26. 2.5 ≤ x19 ≤ 10
27. 2.5 ≤ x20 ≤ 10
28. 7.5 ≤ x21 ≤ 30
29. 2.5 ≤ x23 ≤ 10
30. 20 ≤ X24 ≤ 80
31. 37.5 ≤ X25 ≤ 150
32. 17.5 ≤ X26 ≤ 70
33. 32.5 ≤ X27 ≤ 130
34. 22.5 ≤ X28 ≤ 90
35. 25 ≤ X29 ≤ 100
36. 135 ≤ X30 ≤ 540
37. x2, x3, x4, x5, X6, x7, x8, x10, x11, X12, x13, x14, x15, x17, X18, x19, x20, x21, x23, X24, X25, X26, X27, X28, X29 and X30 are integers and ≥ 0.

After solving the above model, we get the desired results displayed in Table 1.

Example 3: Consider the following population:

200   40   50  200  120   610
 20   70   60  100  120   370
 40   90  250  100   30   510
100  150   30   80  150   510
360  350  390  480  420  2000

The cell values a4, a10, a15, a19, a20 and a23 are sensitive, and the protection levels for these sensitive cells have already been defined. [The 10x30 constraint matrix M_ji for this 4x5 table has the same structure as in Example 2 above.] The equations of (4.2.7) for the sensitive cell a4 can be written compactly as follows.
α_i^1 + α_i^2 − β_i^1 − β_i^2 + Σ_j M_ji γ_j = 1 for i = 4, and = 0 otherwise,   (4.3.5)

with α_i^1, α_i^2, β_i^1, β_i^2 ≥ 0 and γ_j unrestricted in sign (i = 1, …, 30; j = 1, …, 10). Solving these equations, we get, for the cell a4,

α_4^1 = 0, β_4^1 = 0, α_4^2 = 0 and β_4^2 = 0.

To find the values of α_i'^1, α_i'^2, β_i'^1 and β_i'^2 for the sensitive cell a4, we solve the equations of (4.2.10), namely

α_i'^1 + α_i'^2 − β_i'^1 − β_i'^2 + Σ_j M_ji γ_j' = −1 for i = 4, and = 0 otherwise,   (4.3.6)

with α_i'^1, α_i'^2, β_i'^1, β_i'^2 ≥ 0 and γ_j' unrestricted in sign. Solving these equations, we get, for the sensitive cell a4,

α_4'^1 = 0, α_4'^2 = 0, β_4'^1 = 0 and β_4'^2 = 1.

The equations of (4.2.7) and (4.2.10) for the sensitive cells a10, a15, a19, a20 and a23 are the same as (4.3.5) and (4.3.6), with
the non-zero right-hand side entry moved to position 10, 15, 19, 20 or 23, respectively. After solving (4.2.7) and (4.2.10) for the cells a10, a15, a19, a20 and a23, we get the following results:

α_10^1 = β_10^1 = α_10^2 = β_10^2 = 0;  α_10'^1 = α_10'^2 = β_10'^1 = β_10'^2 = 0;
α_15^1 = β_15^1 = α_15^2 = β_15^2 = 0;  α_15'^1 = α_15'^2 = β_15'^1 = β_15'^2 = 0;
α_19^1 = β_19^1 = α_19^2 = β_19^2 = 0;  α_19'^1 = α_19'^2 = β_19'^1 = β_19'^2 = 0;
α_20^1 = β_20^1 = α_20^2 = β_20^2 = 0;  α_20'^1 = α_20'^2 = β_20'^1 = β_20'^2 = 0;
α_23^1 = β_23^1 = α_23^2 = β_23^2 = 0;  α_23'^1 = 0, α_23'^2 = 0, β_23'^1 = 0 and β_23'^2 = 1.

Putting these values into (4.2.8), (4.2.9) and (4.2.11), we get the protection equations for the sensitive cells a4, a10, a15, a19, a20 and a23, which have already been stated. The unbiasedly rounded set of sensitive values has also already been selected. After replacing the original sensitive cell values with the unbiasedly rounded values and then applying the proposed model, we get the following objective function and constraints.

Minimize z = 0.005 x1^2 + 0.025 x2^2 + 0.02 x3^2 + 0.008333 x5^2 + 0.001639 X6^2 + 0.05 x7^2 + 0.014286 x8^2 + 0.016667 x9^2 + 0.008333 x11^2 + 0.002703 X12^2 + 0.025 x13^2 + 0.011111 x14^2 + 0.01 x16^2 + 0.033333 x17^2 + 0.001961 X18^2 + 0.033333 x21^2 + 0.0125 x22^2 + 0.001961 X24^2 + 0.002778 X25^2 + 0.002857 X26^2 + 0.002564 X27^2 + 0.002083 X28^2 + 0.002381 X29^2 + 0.0005 X30^2 − 1

Subject to the constraints
1. x1+x2+x3+x5 = X6 − 190
2. x7+x8+x9+x11 = X12 − 114
3. x13+x14+x16+x17 = X18 − 247
4. x21+x22 = X24 − 399
5. x1+x7+x13 = X25 − 95
6. x2+x8+x14 = X26 − 152
7. x3+x9+x21 = X27 − 247
8. x16+x22 = X28 − 304
9. x5+x11+x17 = X29 − 152
10. x1+x2+x3+x5+x7+x8+x9+x11+x13+x14+x16+x17+x21+x22 = X30 − 950
11. 100 ≤ x1 ≤ 400
12. 20 ≤ x2 ≤ 80
13. 25 ≤ x3 ≤ 100
14. 60 ≤ x5 ≤ 240
15. 305 ≤ X6 ≤ 1220
16. 10 ≤ x7 ≤ 40
17. 35 ≤ x8 ≤ 140
18. 30 ≤ x9 ≤ 120
19. 60 ≤ x11 ≤ 240
20. 185 ≤ X12 ≤ 740
21. 20 ≤ x13 ≤ 80
22. 45 ≤ x14 ≤ 120
23. 50 ≤ x16 ≤ 200
24. 15 ≤ x17 ≤ 60
25. 255 ≤ X18 ≤ 1020
26. 15 ≤ x21 ≤ 60
27. 40 ≤ x22 ≤ 160
28. 255 ≤ X24 ≤ 1020
29. 180 ≤ X25 ≤ 720
30. 175 ≤ X26 ≤ 700
31. 195 ≤ X27 ≤ 780
32. 240 ≤ X28 ≤ 960
33. 210 ≤ X29 ≤ 840
34. 1000 ≤ X30 ≤ 4000
35. x1, x2, x3, x5, X6, x7, x8, x9, x11, X12, x13, x14, x16, x17, X18, x21, x22, X24, X25, X26, X27, X28, X29 and X30 are integers and ≥ 0.

After solving the above model, we get the desired results displayed in Table 2.
Example 4: Consider the following 3x3 population:

20   50  10   80
 8   19  22   49
17   32  12   61
45  101  44  190

The cell value a7 is sensitive, and the values of UPL_i^ki, LPL_i^ki and SPL_i^ki for a7 have already been defined. [The constraint matrix M_ji for this 3x3 table — 16 cells in all, including the marginal totals — is displayed in the original as its transpose; it is the 7x16 matrix with one row for each of the three row totals, three column totals and the grand total.]

The equations of (4.2.7) for the sensitive cell a7 can be written as

α_i^1 + α_i^2 − β_i^1 − β_i^2 + Σ_j M_ji γ_j = 1 for i = 7, and = 0 otherwise,   (4.3.7)

with α_i^1, α_i^2, β_i^1, β_i^2 ≥ 0 and γ_j unrestricted in sign (i = 1, …, 16; j = 1, …, 7). Solving these equations, we get, for the cell a7,

α_7^1 = 0, β_7^1 = 0, α_7^2 = 0 and β_7^2 = 0.

To find the values of α_i'^1, α_i'^2, β_i'^1 and β_i'^2 for the sensitive cell a7, we solve the equations of (4.2.10), namely

α_i'^1 + α_i'^2 − β_i'^1 − β_i'^2 + Σ_j M_ji γ_j' = −1 for i = 7, and = 0 otherwise,   (4.3.8)

with α_i'^1, α_i'^2, β_i'^1, β_i'^2 ≥ 0 and γ_j' unrestricted in sign. Solving these equations, we get, for the sensitive cell a7,

α_7'^1 = 0, α_7'^2 = 0, β_7'^1 = 0 and β_7'^2 = 0.

Putting these values into (4.2.8), (4.2.9) and (4.2.11), we could not get any protection equation for the sensitive cell a7. After replacing the original sensitive cell value with the unbiasedly rounded value and then applying the proposed model, we get the objective function and constraints as follows.

Minimize z = 0.05 x1^2 + 0.02 x2^2 + 0.1 x3^2 + 0.0125 X4^2 + 0.125 x5^2 + 0.052632 x6^2 + 0.020408 X8^2 + 0.058824 x9^2 + 0.03125 x10^2 + 0.083333 x11^2 + 0.016393 X12^2 + 0.022222 X13^2 + 0.009901 X14^2 + 0.022727 X15^2 + 0.005263 X16^2 − 1

Subject to the constraints
1. x1+x2+x3 = X4
2. x5+x6 = X8 − 20
3. x9+x10+x11 = X12
4. x1+x5+x9 = X13
5. x2+x6+x10 = X14
6. x3+x11 = X15 − 20
7. x1+x2+x3+x5+x6+x9+x10+x11 = X16 − 20
8. 10 ≤ x1 ≤ 40
9. 25 ≤ x2 ≤ 100
10. 5 ≤ x3 ≤ 20
11. 40 ≤ X4 ≤ 160
12. 4 ≤ x5 ≤ 16
13. 9.5 ≤ x6 ≤ 38
14. 24.5 ≤ X8 ≤ 98
15. 8.5 ≤ x9 ≤ 34
16. 16 ≤ x10 ≤ 64
17. 6 ≤ x11 ≤ 24
18. 30.5 ≤ X12 ≤ 122
19. 22.5 ≤ X13 ≤ 90
20. 50.5 ≤ X14 ≤ 202
21. 22 ≤ X15 ≤ 88
22. 95 ≤ X16 ≤ 380
23. x1, x2, x3, X4, x5, x6, X8, x9, x10, x11, X12, X13, X14, X15 and X16 are integers and ≥ 0.

After solving the above model, we get the desired results displayed in Table 3.

CHAPTER V

OPTIMAL CONTROLLED SELECTION PROCEDURE FOR SAMPLE CO-ORDINATION PROBLEM USING LINEAR PROGRAMMING

5.1 INTRODUCTION

On many occasions it is required to sample a population for two or more surveys, either to cover a variety of topics or to obtain current estimates of a characteristic of the population. In some applications the samples are selected at the same time point, for two or more surveys of the same population. For example, one sample may be designed for households and persons and another for literacy, for the same population. Such surveys can be conducted simultaneously, with different measures of size and possibly with different stratifications. On the other hand, if improved data become available some time after a survey has been conducted, it is desirable to improve the stratification and the measures of size. It is possible that both the stratification and the selection probabilities (i.e. the measures of size) of the sampling units differ between the two surveys. A redesign is then undertaken in which the old units remain the same but the stratification and the selection probabilities change, because improved data have been obtained. In the redesign of a survey for the same population, the two samples must be selected sequentially, since the designs refer to different time points. In a redesign, the new sample must be selected independently of the old sample, but it may be considered desirable to retain as many old units as possible in the new sample, to reduce the expenses associated with hiring new interviewers, training new data collectors, and so on.

Moreover, in almost all surveys the cost of sampling is roughly proportional to the total number of distinct units sampled. Thus, if we select the same unit twice instead of selecting two distinct units, the cost of the survey is reduced. Therefore, if it is possible to minimize the number of distinct units chosen across the different surveys, the cost of the survey is minimized. When the survey budget is limited, it is thus usually desirable to select units that can serve as a sample for both surveys (in the case of simultaneous as well as sequential selection). In order to reduce the cost of a survey, one therefore has to conduct the surveys in such a way as to minimize the number of distinct units in the union of the samples, i.e. to maximize the number of units common to the samples. This is known as the problem of maximization of overlap of the sampling units.

There also exist situations in which it is desirable to avoid, or to minimize the likelihood of, selecting the same unit for more than one survey. This problem is called the minimization of overlap of sampling units. For example, if we wish to take two consecutive samples from a population, say three months apart, to check the level of immunization among the children of this population, we may want the children chosen in the second sample to be different from those selected in the first sample, and we then resort to the technique of minimization of overlap of sampling units. Thus, in many situations it is desirable to maximize or minimize the expected overlap between two or more samples drawn at the same or different time points.
The problem of overlap of sampling units is also referred to as the sample co-ordination problem. Maximizing the overlap of sampling units is referred to as positive co-ordination, and minimizing the overlap of sampling units is referred to as negative co-ordination.

The problem of co-ordination of sampling units has been a topic of interest for more than fifty years, and different methods have been proposed by various authors to solve it. The first approach to the sample co-ordination problem was given by Keyfitz (1951), who proposed an optimum procedure for one-unit-per-stratum designs when the initial and new designs have identical stratification and only the selection probabilities change. Fellegi (1963, 1966), Gray and Platek (1963) and Kish (1963) also proposed methods for the sample co-ordination problem, but these methods are in general restricted either to two successive samples or to small sample sizes. To handle larger sample sizes, Kish and Scott (1971) proposed a method for the sample co-ordination problem. Brewer et al. (1972) introduced the concept of permanent random numbers (PRN) for solving the sample co-ordination problem. The linear programming approach to the sample co-ordination problem was first discussed by Causey et al. (1985), who proposed an optimum linear programming procedure for maximizing the expected number of sampling units common to the two designs when the two sets of sample units are chosen sequentially. Ernst and Ikeda (1995) also presented a linear programming procedure for overlap maximization under very general conditions. Ernst (1996) developed a procedure for the sample co-ordination problem with one-unit-per-stratum designs where the two designs may have different stratifications. Ernst (1998) proposed a procedure with no restriction on the number of sample units per stratum, but with the requirement of identical stratification. Both of the procedures of Ernst (1996, 1998) use the controlled selection algorithm of Causey, Cox and Ernst (1985) and can be used for simultaneous as well as sequential sample surveys. Ernst and Paben (2002) proposed a new methodology for the sample co-ordination problem, based on the procedures of Ernst (1996, 1998); it places no restriction on the number of sample units selected per stratum and does not require the two designs to have identical stratification. Recently, Matei and Tillé (2006) proposed a methodology for the sample co-ordination problem for two sequential sample surveys. They proposed an algorithm, based on iterative proportional fitting (IPF), to compute the probability distribution of a bi-design. Their method can be applied to any type of sampling design for which it is possible to compute the probability distribution of both samples.

In this chapter, using the linear programming approach, we propose an improved method for the sample co-ordination problem which maximizes (or minimizes) the overlap of sampling units between two designs. The proposed procedure is motivated by Ernst (1998): the basic idea is adapted from Ernst (1998), but the way of solving the controlled selection problem is different. In Section 5.2, we describe the proposed methodology for the positive and negative sample co-ordination problems.
In Section 5.3, some numerical examples are presented to demonstrate the utility of the proposed procedure.

5.2 THE OPTIMAL CONTROLLED PROCEDURE

Following the notation of Ernst (1998), we consider two sampling designs D1 and D2, with identical population and stratification, consisting of N units, with S denoting one of the strata. We have to select a given number of sample units under each of the two designs. The selection probability of a unit in S is, in general, not the same for the two designs. In order to reduce the cost of the survey, let us first suppose that we want to maximize the overlap of sampling units for the two sampling designs; the problem to be solved is thus that of maximizing the overlap of sampling units in D1 and D2.

To maximize the overlap of sampling units in D1 and D2, we select the sample units subject to the following conditions, originally derived by Ernst (1998):

(i) A predetermined number of units, $n_s$, is selected from S for the $D_s$ sample, s = 1, 2; that is, the sample size for each stratum and design combination is fixed. (5.2.1)

(ii) The ith unit in S is selected for the $D_s$ sample with its assigned probability, denoted $\pi_{is}$. (5.2.2)

(iii) The expected value of the number of sample units common to the two designs is maximized. (5.2.3)

(iv) The number of sample units common to any D1 and D2 samples is always within one of the maximum expected value. (5.2.4)

As described in Ernst (1998), the problem of maximizing the overlap of sampling units for the two designs can be converted into the "controlled selection" problem $W = (w_{ij})$, where $(w_{ij})$ denotes the internal elements of W. Here W is an (N+1)×5 array with N internal rows and 4 internal columns, where N is the number of units in the stratum universe. The solution of this controlled selection problem W then maximizes the overlap of sampling units. The problem W can be solved by constructing a sequence of integer-valued tabular arrays, $M_1 = (m_{ij1})$, $M_2 = (m_{ij2})$, ..., $M_t = (m_{ijt})$, with the same number of rows and columns as W, together with associated probabilities $p_1, ..., p_t$, which satisfy certain conditions. Finally, a random array $M = (m_{ij})$ is chosen from among these t arrays with the indicated probabilities, and this array determines the sample allocation.

The internal elements of W are computed as follows:

$w_{i3} = \min(\pi_{i1}, \pi_{i2})$,  i = 1, ..., N,   (5.2.5)
$w_{is} = \pi_{is} - w_{i3}$,  s = 1, 2,   (5.2.6)
$w_{i4} = 1 - \sum_{j=1}^{3} w_{ij}$.   (5.2.7)

This array W can be considered as a controlled selection problem. In the ith internal row of W, the first element denotes the probability that the ith unit is in D1 only, the second element the probability that it is in D2 only, the third element the probability that it is in both D1 and D2, and the fourth element the probability that it is in neither D1 nor D2. We next describe the conditions which must be satisfied by the sequence of integer-valued arrays M1, ..., Mt and associated probabilities p1, ..., pt that determine the sample allocation. In each internal row of these arrays, one of the four internal columns has the value 1 and the other three have the value 0: the value 1 in the first column indicates that the unit is only in the D1 sample; in the second column, that it is only in the D2 sample; in the third column, that it is in both samples; and in the fourth column, that it is in neither of the two samples.
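As a concrete illustration of (5.2.5)-(5.2.7), the short sketch below builds the array W from the two sets of inclusion probabilities. The probabilities used are those of Example 1 in Section 5.3; the function name and the use of numpy are assumptions made for illustration, not part of the thesis procedure.

```python
# A small sketch, assuming the notation of (5.2.5)-(5.2.7): build the
# (N+1) x 5 controlled-selection array W from the inclusion probabilities
# of the two designs.
import numpy as np

def build_W(pi1, pi2):
    pi1, pi2 = np.asarray(pi1, float), np.asarray(pi2, float)
    w3 = np.minimum(pi1, pi2)          # (5.2.5): unit in both samples
    w1 = pi1 - w3                      # (5.2.6): unit in D1 only
    w2 = pi2 - w3                      # (5.2.6): unit in D2 only
    w4 = 1.0 - (w1 + w2 + w3)          # (5.2.7): unit in neither sample
    body = np.column_stack([w1, w2, w3, w4])
    body = np.column_stack([body, body.sum(axis=1)])   # row margins (all 1)
    return np.vstack([body, body.sum(axis=0)])         # marginal row

W = build_W([.6, .4, .8, .6, .6], [.8, .4, .2, .4, .2])
print(W)   # marginal row: 1.2, .2, 1.8, 1.8, 5.0, as in Example 1 below
```

The marginal row of W collects, in particular, the expected number of units common to the two samples (third entry), which is the quantity the procedure seeks to control.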
Ernst (1998) derived a set of conditions which, if met by the random array M, are sufficient to satisfy (5.2.1)-(5.2.4). (5.2.2) will be satisfied if

$p(m_{is} = 1) + p(m_{i3} = 1) = w_{is} + w_{i3} = \pi_{is}$,  i = 1, ..., N,  s = 1, 2.   (5.2.8)

(5.2.3) will be satisfied if we have

$p(m_{i3} = 1) = w_{i3}$,  i = 1, ..., N.   (5.2.9)

Therefore, if it can be established that

$E(m_{ij}) = \sum_{k=1}^{t} p_k m_{ijk} = w_{ij}$,  i = 1, ..., N,  j = 1, ..., 5,   (5.2.10)

then (5.2.2) and (5.2.3) will hold, since (5.2.10) implies (5.2.8) and (5.2.9). To establish (5.2.1), it only needs to be shown that

$m_{(N+1)sk} + m_{(N+1)3k} = n_s$,  s = 1, 2,  k = 1, ..., t.   (5.2.11)

Finally, to establish (5.2.4), it is sufficient to show that

$|m_{ijk} - w_{ij}| < 1$,  i = 1, ..., N+1,  j = 1, ..., 5,  k = 1, ..., t,   (5.2.12)

since, in particular, $|m_{(N+1)3k} - w_{(N+1)3}| < 1$, k = 1, ..., t, where $w_{(N+1)3}$ is the maximum expected number of units common to the two samples and $m_{(N+1)3k}$ is the number of units common to the kth possible sample.

Our problem now becomes that of solving the controlled selection problem W in such a way as to satisfy conditions (5.2.10)-(5.2.12); a solution of W satisfying these conditions will then maximize the overlap of sampling units in the designs D1 and D2.

The first step of the proposed procedure is to find all possible combinations of units consistent with the probabilities of the array W; this set is denoted by A. In order to satisfy condition (5.2.11) we retain only those arrays for which $m_{(N+1)sk} + m_{(N+1)3k} = n_s$, s = 1, 2, k = 1, ..., t; that is, we exclude from all possible arrays those which do not satisfy (5.2.11), and denote the retained set of arrays by A1. We now propose the following model to solve the controlled selection problem W and to satisfy conditions (5.2.10) and (5.2.12). The model uses the linear programming approach to maximize the probability of those sample combinations that consist of the maximum number of overlapped sampling units; this set is denoted by $\bar{A}$. The non-negativity condition of the H-T estimator is also included in the model for the purpose of variance estimation. The proposed model is as follows:

Maximize $z = \sum_{k \in \bar{A}} p_k$   (5.2.13)

subject to the constraints

(i) $\sum_{k=1}^{t} p_k = 1$;
(ii) $\sum_{k=1}^{t} p_k m_{ijk} = w_{ij}$,  i = 1, ..., N,  j = 1, ..., 4;   (5.2.14)
(iii) $\underline{m}_{ij} \le m_{ijk} \le \overline{m}_{ij}$;
(iv) $\pi_{i_1 i_2}^{s} \le \pi_{i_1}^{s} \pi_{i_2}^{s}$,  s = 1, 2;
(v) $p_k \ge 0$,  k = 1, ..., t;
(vi) $\pi_{i_1 i_2}^{s} \ge 0$;

where k refers to a particular sample combination and $\underline{m}_{ij}$ and $\overline{m}_{ij}$ denote the lower and upper rounded values of $w_{ij}$, respectively, when the rounding base is 1. Conditions (i) and (v) are necessary for any sampling design, and conditions (ii) and (iii) are required to satisfy (5.2.10) and (5.2.12) respectively. Condition (ii) also ensures that the resultant design is an IPPS design. Condition (iv) is desirable as it is the sufficient condition for non-negativity of the Yates-Grundy estimator of the variance, and condition (vi) is desirable because it ensures unbiased variance estimation with the H-T estimator.
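To fix ideas, the sketch below (illustrative code of my own, not the thesis software) enumerates the candidate zero-one arrays and then solves the linear program for conditions (i)-(iii) and (v); the nonlinear conditions (iv) and (vi) on the joint inclusion probabilities are omitted for simplicity. The helper names `candidate_arrays` and `solve_overlap_lp` are hypothetical.

```python
# A hedged sketch of the linear program (5.2.13)-(5.2.14), using only
# conditions (i)-(iii) and (v).  Enumeration is feasible for small N.
import itertools
import numpy as np
from scipy.optimize import linprog

def candidate_arrays(W, n1, n2):
    """Enumerate 0-1 arrays with one 1 per internal row whose completed
    margins satisfy (5.2.11) and the rounding bounds (5.2.12)."""
    N = W.shape[0] - 1
    out = []
    for choice in itertools.product(range(4), repeat=N):
        M = np.zeros((N, 4))
        M[np.arange(N), choice] = 1
        cols = M.sum(axis=0)
        ok11 = cols[0] + cols[2] == n1 and cols[1] + cols[2] == n2   # (5.2.11)
        ok12 = np.all(np.abs(np.vstack([M, cols]) - W[:, :4]) < 1)   # (5.2.12)
        if ok11 and ok12:
            out.append(M)
    return out

def solve_overlap_lp(W, arrays):
    t = len(arrays)
    best = max(M[:, 2].sum() for M in arrays)            # maximum overlap
    cost = np.array([-float(M[:, 2].sum() == best) for M in arrays])
    A_eq, b_eq = [np.ones(t)], [1.0]                     # (i): sum p_k = 1
    N = W.shape[0] - 1
    for i in range(N):                                   # (ii): E(m_ij) = w_ij
        for j in range(4):
            A_eq.append(np.array([M[i, j] for M in arrays]))
            b_eq.append(W[i, j])
    res = linprog(cost, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, 1)] * t)                   # (v): p_k >= 0
    return res.x

# e.g. with W from the earlier build_W sketch:
# p = solve_overlap_lp(W, candidate_arrays(W, n1=3, n2=2))
```

Because the objective rewards only the arrays attaining the maximum overlap, the solver pushes as much probability as possible onto the set $\bar{A}$ while the equality constraints keep the design IPPS.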
In the above model, condition (iv) is very stringent, and in some situations a feasible solution of the model may not exist when it is imposed. In such situations this condition can be dropped. Even after dropping condition (iv), it is possible to obtain a non-negative variance estimate of the H-T estimator whenever the sample size is greater than two, because condition (iv) is needed for non-negative variance estimation of the H-T estimator only for samples of size less than or equal to two. Again, if in some situation condition (vi) is also not satisfied, it too may be dropped, and an alternative method of variance estimation, other than the Horvitz-Thompson variance estimator, can be used.

Moreover, in many situations it is desirable to avoid selecting the same unit for two or more surveys; that is, we have to minimize the overlap of sampling units between the surveys. The proposed procedure can easily be modified for this purpose. To minimize the overlap of sampling units, we redefine the internal elements of W: condition (5.2.5) is replaced by

$w_{i3} = \max(\pi_{i1} + \pi_{i2} - 1,\ 0)$,   (5.2.15)

while (5.2.6) and (5.2.7) remain the same as in the case of maximization of the overlap (see the short sketch at the end of this discussion). The rest of the procedure also remains the same, except for the objective function of the proposed model, which in the case of minimization of the overlap is defined as

Maximize $z = \sum_{k \in C} p_k$,

where C denotes the set of all those sample combinations which consist of the minimum number of overlapped sampling units.

The proposed procedure can be used when the two surveys are conducted for the same population with identical stratification; the surveys may be conducted sequentially or simultaneously, and there is no restriction on the number of units selected per stratum. The proposed procedure is superior to that of Ernst (1998), since it maximizes the probability of those sample combinations which consist of the maximum number of overlapped sampling units (in the case of positive co-ordination) or of the minimum number of overlapped sampling units (in the case of negative co-ordination). The proposed procedure also permits variance estimation using the Horvitz-Thompson (1952) variance estimator and, in situations where the conditions of the H-T estimator cannot be satisfied, some alternative variance estimator can be used.

Moreover, the procedure of Ernst (1998) does not take into account all possible combinations of units and terminates after a few steps, whereas the proposed procedure provides all possible combinations of sampling units, so that the sampler has a wide choice in selecting the sample combination. In some situations the units of the selected sample combination may be widely scattered, which increases the cost of the survey; the sampler can then select another combination of units in order to reduce the cost. Thus, if all possible combinations of units are available, the sampler has the advantage of being able to select a sample combination according to the budget of the survey.
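Returning to the minimization variant described above: in code terms, only the first step of the earlier W-construction sketch changes. A hedged one-function illustration (the function name is hypothetical):

```python
# Sketch of the only change needed for negative co-ordination: (5.2.5)
# is replaced by (5.2.15); steps (5.2.6)-(5.2.7) are unchanged.
import numpy as np

def both_samples_column(pi1, pi2, minimize_overlap=False):
    pi1, pi2 = np.asarray(pi1, float), np.asarray(pi2, float)
    if minimize_overlap:
        return np.maximum(pi1 + pi2 - 1.0, 0.0)   # (5.2.15): smallest overlap
    return np.minimum(pi1, pi2)                   # (5.2.5): largest overlap
```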
Thus the proposed procedure not only maximizes or minimizes the overlap of sampling units but also helps to control the cost of the survey. The proposed procedure can be applied with any number of sampling units per stratum, but the size of the linear programming problem increases rapidly as the number of sampling units per stratum increases. This difficulty, however, is encountered by all procedures that use the linear programming approach for solving the sample co-ordination problem; such procedures are therefore best suited to designs with a small number of sample units per stratum.

5.3 EXAMPLES

In this section we present some numerical examples to demonstrate the utility of the proposed procedure. We also compare the proposed plan with the procedure of Ernst (1998) to demonstrate its superiority.

Example 1: Consider the following example taken from Ernst (1998). The inclusion probabilities for the two sampling designs for 5 units are given in Table 1.

TABLE 1: Inclusion probabilities

i      1    2    3    4    5
π_i1  .6   .4   .8   .6   .6
π_i2  .8   .4   .2   .4   .2

We have to select a sample of size 3 for the design D1 and a sample of size 2 for the design D2.

Case I (Maximization): The first step is to find the values of the internal elements of W using (5.2.5)-(5.2.7), which are as follows.

W =
 0    .2   .6   .2   1
 0    0    .4   .6   1
 .6   0    .2   .2   1
 .2   0    .4   .4   1
 .4   0    .2   .4   1
 1.2  .2   1.8  1.8  5

The problem now becomes that of solving the above controlled selection problem with N = 20 and n = 5, where N denotes the number of internal cells of the array and n the grand total of the array. All possible combinations satisfying condition (5.2.11) are as follows.

[The twenty-four arrays (1)-(24) satisfying condition (5.2.11) are displayed here; each is a 5×4 zero-one array indicating, for every unit, whether it falls in the D1 sample only, the D2 sample only, both samples, or neither.]

Now we apply the proposed model as follows.

Maximize z = p5+p6+p7+p8+p9+p10+p13+p14+p15+p16+p17+p18+p19+p20+p21+p22+p23+p24

subject to the constraints
1. p1+p2+ ... +p24 = 1
2. p1+p2+p3+p4+p11+p12 = .2
3. p5+p6+p7+p13+p14+p15+p19+p20+p21 = .6
4. p8+p9+p10+p16+p17+p18+p22+p23+p24 = .2
5. p1+p3+p5+p8+p9+p11+p13+p16+p17+p19+p22+p23 = .4
6. p2+p4+p6+p7+p10+p12+p14+p15+p18+p20+p21+p24 = .6
7. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10 = .6
8. p12+p14+p16+p18+p20+p22+p24 = .2
9. p11+p13+p15+p17+p19+p21+p23 = .2
10. p1+p2+p11+p12+p13+p14+p15+p16+p17+p18 = .2
11. p4+p6+p8+p10+p21+p23+p24 = .4
12. p3+p5+p7+p9+p19+p20+p22 = .4
13. p3+p4+p11+p12+p19+p20+p21+p22+p23+p24 = .4
14. p2+p7+p9+p10+p15+p17+p18 = .2
15. p1+p5+p6+p8+p13+p14+p16 = .4
16. p5+p13+p19 ≤ .24
17. p5+p6+p7+p14+p20 ≤ .48
18. p6+p13+p14+p15+p21 ≤ .36
19. p7+p15+p19+p20+p21 ≤ .36
20. p1+p3+p5+p8+p9+p16+p22 ≤ .32
21. p1+p8+p11+p13+p16+p17+p23 ≤ .24
22. p3+p9+p11+p17+p19+p22+p23 ≤ .24
23. p2+p3+p4+p7+p9+p10+p12+p18+p20+p22+p24 ≤ .48
24. p1+p2+p4+p6+p8+p10+p12+p14+p16+p18+p24 ≤ .48
25. p2+p4+p10+p11+p12+p15+p17+p18+p21+p23+p24 ≤ .36
26. p1+p3+p5+p11+p13+p19 ≤ .32
27. p12+p14+p20 ≤ .16
28. p4+p6+p21 ≤ .32
29. p2+p7+p15 ≤ .16
30. p16+p22 ≤ .08
31. p8+p23 ≤ .16
32. p9+p17 ≤ .08
33. p24 ≤ .08
34. p18 ≤ .04
35. p10 ≤ .08
36. p_i ≥ 0 for i = 1, ..., 24.
37. The left-hand sides of constraints 16 to 35 must also be ≥ 0.

After solving the above model, we get the following solution:

Z = .8
p1 = .03017; p2 = .02386; p3 = .04727; p4 = .07873; p5 = .118; p6 = .09977; p7 = .07396; p8 = .0585; p9 = .02465; p10 = .0451; p11 = .06692; p12 = .01328; p13 = .0298; p14 = .0564; p15 = .0236; p16 = .0073; p17 = .0029; p18 = .0059; p19 = .0463; p20 = .0757; p21 = .0765; p22 = .0141; p23 = .01418; p24 = .02718.

Ernst (1998) gives the following results for this problem: p3 = .2; p6 = .4; p18 = .2; p19 = .2. The sum of the probabilities of the sample combinations consisting of the maximum number of overlapped sampling units is .8 for the procedure of Ernst (1998), the same as that achieved by the proposed procedure; the proposed procedure, however, has the advantage that variance estimation is possible with it. Also, the procedure of Ernst (1998) does not consider all possible combinations of sampling units, whereas the proposed procedure takes all of them into account. The advantage of considering all possible combinations can be understood from the following situation. Suppose that for the design D1 the sample combinations consisting of the units 2, 3, 4 or 2, 4, 5 are desirable, and that for the design D2 the combinations consisting of the units 2, 3 or 3, 4 or 4, 5 are desirable. If we consider only the solution provided by the procedure of Ernst (1998), then no sample combination in his solution contains these units together for the designs D1 and D2, whereas the proposed procedure does provide sample combinations containing these units together for D1 and D2.

Case II (Minimization): For the case of minimization of the overlap of sampling units, the proposed procedure could not obtain a feasible solution with constraints (iv) and (vi), so we dropped these constraints and obtained the following results:

Z = .6
p1 = .150275; p2 = .099617; p3 = .029148; p4 = .150108; p5 = .108444; p6 = .038652; p7 = .0; p8 = .004897; p9 = .0; p10 = .01886; p11 = .023756; p12 = .0; p13 = .176244; p14 = .107237; p15 = .008826; p16 = .083937.

Solving this example by the procedure of Ernst (1998) for the case of minimization of the overlap, we get p1 = .4; p6 = .2; p13 = .2; p16 = .2. The sum of the probabilities of the sample combinations consisting of the minimum number of overlapped sampling units is .6 for the procedure of Ernst (1998), again the same as that achieved by the proposed procedure; but, as noted earlier, the proposed procedure has further advantages over the procedure of Ernst (1998).

Example 2: Consider the following example taken from Tiwari, Nigam and Pant (2007). The inclusion probabilities for the two designs for 6 units are given in Table 2.
TABLE 2: Inclusion probabilities

i      1     2     3     4     5     6
π_i1  0.42  0.42  0.45  0.48  0.66  0.57
π_i2  0.30  0.45  0.30  0.60  0.81  0.54

We have to select a sample of size 3 for both designs.

Case I (Maximization): The internal values of W are as follows.

W =
 .12  0    .3   .58  1.0
 0    .03  .42  .55  1.0
 .15  0    .3   .55  1.0
 0    .12  .48  .4   1.0
 0    .15  .66  .19  1.0
 .03  0    .54  .43  1.0
 .3   .3   2.7  2.7  6.0

All possible combinations satisfying condition (5.2.11) are as follows.

[The seventy-four arrays (1)-(74) satisfying condition (5.2.11) are displayed here; each is a 6×4 zero-one array indicating, for every unit, whether it falls in the D1 sample only, the D2 sample only, both samples, or neither.]
After applying the proposed model, we get the following results:

Z = .7
p1 = .0181; p2 = .0002; p3 = .0009; p4 = .0006; p5 = .0002; p6 = .0011; p7 = .022; p8 = .0019; p9 = .0085; p10 = .0012; p11 = .0069; p12 = .0125; p13 = .0119; p14 = .0082; p15 = .0102; p16 = .0042; p17 = .0084; p18 = .0031; p19 = .0031; p20 = .0002; p21 = .0031; p22 = .0009; p23 = .0002; p24 = .0011; p25 = .0193; p26 = .0015; p27 = .0099; p28 = .0031; p29 = .0064; p30 = .0183; p31 = .0228; p32 = .016; p33 = .0121; p34 = .0054; p35 = .0142; p36 = .0177; p37 = .000004; p38 = .00001; p39 = .00006; p40 = .00001; p41 = .00001; p42 = .0003; p43 = .001; p44 = .0044; p45 = .0004; p46 = .0011; p47 = .0009; p48 = .0005; p49 = .0004; p50 = .0026; p51 = .0173; p52 = .0001; p53 = .0006; p54 = .0003; p55 = .0014; p56 = .0048; p57 = .0173; p58 = .0078; p59 = .0028; p60 = .017; p61 = .0035; p62 = .0543; p63 = .0143; p64 = .0678; p65 = .0057; p66 = .0348; p67 = .0075; p68 = .091; p69 = .0228; p70 = .0887; p71 = .0657; p72 = .014; p73 = .0646; p74 = .1137.

We have also solved this example by the method of Ernst (1998) and found the following results: p8 = .067; p10 = .042; p17 = .012; p24 = .033; p30 = .009; p31 = .108; p48 = .003; p52 = .032; p55 = .192; p66 = .022; p74 = .48. The sum of the probabilities of the sample combinations consisting of the maximum number of overlapped sampling units is .694 for the procedure of Ernst (1998), which is less than the value achieved by the proposed procedure; the proposed procedure also has the advantage that variance estimation is possible with it.

For the case of minimization of the overlap of sampling units, the proposed procedure could not obtain a feasible solution with constraint (iv), so for this example too we dropped constraint (iv) and obtained the following results:

Z = .34
p1 = .0; p2 = .0; p3 = .0; p4 = .0519; p5 = .0781; p6 = .0111; p7 = .0163; p8 = .0; p9 = .0138; p10 = .0109; p11 = .0069; p12 = .0141; p13 = .015; p14 = .0087; p15 = .0129; p16 = .01; p17 = .011; p18 = .0203; p19 = .058; p20 = .0005; p21 = .025; p22 = .088; p23 = .0005; p24 = .0003; p25 = .017; p26 = .003; p27 = .0064; p28 = .022; p29 = .0024; p30 = .0212; p31 = .004; p32 = .002; p33 = .0; p34 = .0; p35 = .0223; p36 = .0022; p37 = .018; p38 = .0012; p39 = .0133; p40 = .0239; p41 = .0118; p42 = .0; p43 = .021; p44 = .0086; p45 = .0254; p46 = .0028; p47 = .0009; p48 = .008; p49 = .007; p50 = .0022; p51 = .005; p52 = .0269; p53 = .0104; p54 = .0039; p55 = .019; p56 = .0131; p57 = .0277; p58 = .0104; p59 = .0085; p60 = .0013; p61 = .003; p62 = .012; p63 = .0036; p64 = .029; p65 = .013; p66 = .0026; p67 = .0; p68 = .0086; p69 = .0214; p70 = .017; p71 = .0267; p72 = .0021; p73 = .008; p74 = .0.

Solving this example by the method of Ernst (1998) for the case of minimization of the overlap of sampling units, we found the following results:
p9 = .03; p18 = .11; p20 = .19; p22 = .24; p24 = .1; p29 = .01; p30 = .03; p37 = .01; p40 = .17; p65 = .09; p71 = .02.

For this example the sum of the probabilities of the sample combinations consisting of the minimum number of overlapped sampling units is .33 for the procedure of Ernst (1998), which is less than the value achieved by the proposed procedure, thus showing the superiority of the proposed procedure.

Example 3: Consider the following population, taken from Keyfitz (1951), for two sequential sample surveys. The inclusion probabilities are given in Table 3.

TABLE 3: Inclusion probabilities

i      1        2        3        4
π_i1  0.14562  0.6462   0.58534  0.62284
π_i2  0.16404  0.67018  0.5596   0.60618

We have to select a sample of size two for both surveys. The internal elements of W are as follows.

W =
 0       .01842  .14562  .83596  1
 0       .02398  .6462   .32982  1
 .02574  0       .5596   .41466  1
 .01666  0       .60618  .37716  1
 .0424   .0424   1.9576  1.9576  4.0

All possible combinations satisfying condition (5.2.11) are as follows.

[The fourteen arrays (1)-(14) satisfying condition (5.2.11) are displayed here; each is a 4×4 zero-one array indicating, for every unit, whether it falls in the D1 sample only, the D2 sample only, both samples, or neither.]

After applying the proposed model, we get the following results:

Z = .96
p1 = .2517; p2 = .2885; p3 = .0475; p4 = .2717; p5 = .0307; p6 = .0675; p7 = .0055; p8 = 0.0; p9 = 0.0; p10 = .0111; p11 = .0185; p12 = 0.0; p13 = 0.0; p14 = .0073.

We have also solved this example by the method of Ernst (1998) and found the following results:
x . . . x . . . x . . . . x . . . x . . . x x . . . x . . . x . . . x . . . x . . . x . . . . . x . . . . x . . . x . . x . . . x . . . . x (1). . . x (2) . . x . (3). . . x (4). . x . (5) . . . x (6). . x . . . x . . . x . . . x . . . x . . . x . . . x . . . . x . . . x . . x . . . . x . . x . . . x . 249 . . . (7)x . . . . . . . . x . x . . x . . x . . x . . x . . . x . . . . . x . . . x . . . x . . . . x . (8)x . . . (9) x . . . (10)x . . x . . . x . . . . . x . . x . . . . x . . . x . . . x . x . . . x . . . . x . x . . . . x . . x . . . . (11)x . . . (12)x . . . . x . . . x . . . x . . . x . . x . . . x . . . x . . . x . . . x . . . x . . . x . . . x . . . x . . . x . . . x . . . . x . . . x . . . x (13). . x . (14). . . x (15). . . x(16). . x . (17). . x .(18). . . x . . . x . . x . . . . x . . x . . . . x . . x . x . . . x . . . x . . . x . . . x . . . x . . . . . . x . . . x . . x . . . . x . . x . . . x . . . . x . . . x . . . x . . . x . . x . . . x . . . x . . . x . . . x . . . . x . . x . . . . x (19) . . x . (20). . x . (21). . . x (22). . x .(23) . . . x (24). . x . . . x . . . . x . . x . . . x . . . . x . . . x x . . . x . . . x . . . x . . . . . x . . . x . . . . x . . x . . . x . . . x . x . . . x . . . . . x . . . . x . . . x . . . x . x . . . x . . . . x . . x . . . x . . . x . x . . . x . . (25). . . x (26). . x . (27) . . . x (28). . x . (29). . x .(30). . . . . x . . . . x . . x . . . x . x . . . x . . . . x . . . x . . . x . . . x . . . x . . . x x . . . x . . . x . . . x . . . . . . x . . x . . x . . . . x . x x . . . x . . . x . (31). x . . (32). x . . (33) . . x . . . x . . . . . . . x . . . x . x . . . . x . . x . . . . . x x . . . x . (37). . . x . . . . x . . x . . . . . . x . . . x . . . . x . . . x . . . x . . . x . . . x .(34) . . x . (35) . . . x(36). x . . x . . . . x . . x . . . x . . . x . . . x . . . x . . . . x . . x . . . x . . . . x . . . . x . x . . . x . . . x . . . x . . . x . . . . . x . . . . x . . . . x x (38). . x . (39). . . x(40). . x . . x . . . . . x . . . x . . . . x . x . . . x . . . 250 . . x . . . . x x . . . x . . . . x . . . x . . (41) . . . x(42). . x . . . x . . . x . x . . . x . . . . x . . . x . . . x . . . . x . . x . . . x . . . . x . . x . . x . . . . x . . x . . x . (43)x . . . (44) x . . . (45)x . . .(46)x . . x . . . x . . . x . . . x . . . . . x . . x . . . x . . . . . x . . . x (49). . . x . . . . x x . . . . . . . x . . x . . . . x . . . x . . x . . x . . . x . . (47)x . . . (48)x . . . x . . . x . . . . . x . . . x . . . x . . . . x . . . . x . . . . . x . . . x . . x . . x (50). . x . (51). x . . (52). x . .(53) . . x . . . x . . . x . . . . . . . x . . . x . . . x . x . x . . . x . . . x . . . x . x . . . . . x . . x . (55). . x . x . . . x . . . . . x . . . x . . x . . (56) . . . x x . . . x . . . . . x . . . . x . x . . (57) . . x . x . . . x . . . x . . . x . . . x . . . x . . x . (54) . . . x . . x . . x . . . . x . . . . . . x . . . . . . x . . x . . x . . (58) . . x . x . . . x . . . After applying the proposed model, we could not get a feasible solution for this example. Since we have to select a sample of size 3 and 4 for this population so constraint (iv) is not necessary for nonnegative estimation of H-T estimator. Thus we dropped constraint (iv) After dropping constraint (iv), we get following solutions. 
Z = .93
p1 = 0; p2 = .0111; p3 = .0154; p4 = .0123; p5 = .0169; p6 = .102; p7 = .019; p8 = .0133; p9 = .0794; p10 = .0202; p11 = .0685; p12 = .0882; p13 = .062; p14 = .01; p15 = .0139; p16 = .0232; p17 = .0121; p18 = .0435; p19 = .011; p20 = .0154; p21 = .0544; p22 = .0485; p23 = .0127; p24 = .007; p25 = .038; p26 = .0132; p27 = .0782; p28 = .0415; p29 = .0002; p30 = .001; p31 = .004; p32 = .003; p33 = .0004; p34 = .00008; p35 = .0005; p36 = .001; p37 = .002; p38 = .0015; p39 = .0021; p40 = .0006; p41 = .0063; p42 = .001; p43 = .009; p44 = .001; p45 = .0005; p46 = .004; p47 = .0012; p48 = .002; p49 = .001; p50 = .0006; p51 = .0028; p52 = .0017; p53 = .0021; p54 = .0004; p55 = .0002; p56 = .0194; p57 = .0005; p58 = .0011.

We have also solved this example by the method of Ernst (1998) and found the following results: p1 = .17; p9 = .22; p10 = .03; p11 = .07; p20 = .02; p21 = .26; p28 = .16; p55 = .02; p58 = .05. For this example too, the sum of the probabilities of the sample combinations consisting of the maximum number of overlapped sampling units is the same for the procedure of Ernst (1998) as that achieved by the proposed procedure, but again variance estimation is possible with the proposed procedure.

For the case of minimization of the overlap of sampling units, the proposed procedure could not obtain a feasible solution with constraints (iv) and (vi), so we dropped these constraints and obtained the following results:

Z = .67
p1 = 0; p2 = .0; p3 = .0; p4 = .0; p5 = .001015; p6 = .001; p7 = .0184; p8 = .0; p9 = .0647; p10 = .0184; p11 = .027; p12 = .0213; p13 = .0434; p14 = .0632; p15 = .0272; p16 = .0158; p17 = .0215; p18 = .0423; p19 = .0408; p20 = .0286; p21 = .049; p22 = .0438; p23 = .0106; p24 = .0107; p25 = .0315; p26 = .0178; p27 = .0386; p28 = .0329; p29 = .000; p30 = .001; p31 = .00; p32 = .00; p33 = .00; p34 = .0122; p35 = .0047; p36 = .000; p37 = .00; p38 = .00; p39 = .00; p40 = .0; p41 = .0; p42 = .0; p43 = .0; p44 = .0; p45 = .0237; p46 = .0; p47 = .0151; p48 = .0453; p49 = .0169; p50 = .0124; p51 = .0; p52 = .0157; p53 = .0202; p54 = .0535; p55 = .0353; p56 = .0067; p57 = .029; p58 = .038.

We have also solved this example by the method of Ernst (1998) and obtained the following results: p9 = .01; p16 = .01; p21 = .42; p24 = .2; p25 = .02; p36 = .02; p43 = .17; p51 = .11; p55 = .02; p57 = .02. For this example the sum of the probabilities of the sample combinations consisting of the minimum number of overlapped sampling units is .66 for the procedure of Ernst (1998), which is again less than the value achieved by the proposed procedure.

Thus we see that, for all the examples discussed above, the sum of the probabilities of the sample combinations consisting of the maximum number of overlapped sampling units (in the case of positive co-ordination), or of the minimum number of overlapped sampling units (in the case of negative co-ordination), obtained by the proposed procedure is always greater than or equal to the corresponding sum obtained by the procedure of Ernst (1998), with the additional advantage of variance estimation.

APPENDIX 5.0

Example 1: Case II (Minimization): The internal elements of W, obtained using (5.2.15), (5.2.6) and (5.2.7), are as follows.

W =
 0.2  0.4  0.4  0.0  1.0
 0.4  0.4  0.0  0.2  1.0
 0.8  0.2  0.0  0.0  1.0
 0.6  0.4  0.0  0.0  1.0
 0.6  0.2  0.0  0.2  1.0
 2.6  1.6  0.4  0.4  5.0

The problem now becomes that of solving this controlled selection problem with N = 20 and n = 5. All possible combinations satisfying condition (5.2.11) are as follows.

[The sixteen arrays (1)-(16) satisfying condition (5.2.11) are displayed here.]
Now we apply the proposed model as follows.

Maximize z = p1+p2+p3+p4+p5+p6+p7+p8+p9+p10

subject to the constraints
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16 = 1
2. p3+p5+p6+p8+p9+p10 = .2
3. p1+p2+p4+p7 = .4
4. p11+p12+p13+p14+p15+p16 = .4
5. p2+p4+p6+p7+p9+p10+p15+p16 = .4
6. p1+p3+p5+p8+p14 = .4
7. p11+p12+p13 = .2
8. p1+p4+p5+p7+p8+p10+p12+p13+p14+p16 = .8
9. p2+p3+p6+p9+p11+p15 = .2
10. p1+p2+p3+p7+p8+p9+p11+p13+p14+p15 = .6
11. p4+p5+p6+p10+p12+p16 = .4
12. p1+p2+p3+p4+p5+p6+p11+p12 = .6
13. p7+p8+p9+p10+p13 = .2
14. p14+p15+p16 = .2
15. p6+p9+p10+p15+p16 ≤ .24
16. p5+p8+p10+p12+p13+p14+p16 ≤ .48
17. p3+p8+p9+p11+p13+p14+p15 ≤ .36
18. p3+p5+p6+p11+p12 ≤ .36
19. p4+p7+p10+p16 ≤ .32
20. p2+p7+p9+p15 ≤ .24
21. p2+p4+p6 ≤ .24
22. p1+p7+p8+p13+p14 ≤ .48
23. p1+p4+p5+p12 ≤ .48
24. p1+p2+p3+p11 ≤ .36
25. p1+p14 ≤ .32
26. p2+p11+p15 ≤ .16
27. p4+p12+p16 ≤ .32
28. p7+p13 ≤ .16
29. p3 ≤ .08
30. p5 ≤ .16
31. p8 ≤ .08
32. p6 ≤ .08
33. p9 ≤ .04
34. p10 ≤ .08
35. p_i ≥ 0 for i = 1, ..., 16.
36. The left-hand sides of constraints 15 to 34 must also be ≥ 0.

After solving the above model, we get the desired results, displayed already in Example 1.

Example 2: Case I (Maximization): The internal elements of W and all possible combinations for this problem have already been defined in Example 2. The objective function and the constraints are as follows.

Maximize z = p55+p56+p57+p58+p59+p60+p61+p62+p63+p64+p65+p66+p67+p68+p69+p70+p71+p72+p73+p74

subject to the constraints
1. p1+p2+ ... +p74 = 1
2. p1+p2+ ... +p18 = .12
3. p19+p20+p21+p25+p26+p27+p31+p32+p33+p37+p38+p39+p43+p44+p45+p49+p50+p51+p55+p56+p57+p58+p59+p60+p61+p62+p63+p64 = .3
4. p22+p23+p24+p28+p29+p30+p34+p35+p36+p40+p41+p42+p46+p47+p48+p52+p53+p54+p65+p66+p67+p68+p69+p70+p71+p72+p73+p74 = .58
5. p1+p2+p3+p4+p5+p6+p19+p20+p21+p22+p23+p24+p37+p38+p39+p40+p41+p42 = .03
6. p7+p8+p9+p13+p14+p15+p25+p28+p29+p31+p34+p35+p43+p46+p47+p49+p52+p53+p55+p56+p57+p58+p65+p66+p67+p68+p69+p70 = .42
7. p10+p11+p12+p16+p17+p18+p26+p27+p30+p32+p33+p36+p44+p45+p48+p50+p51+p54+p59+p60+p61+p62+p63+p64+p71+p72+p73+p74 = .55
8. p19+p20+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+p33+p34+p35+p36 = .15
9. p1+p2+p3+p7+p10+p11+p13+p16+p17+p37+p40+p41+p44+p46+p48+p50+p52+p54+p55+p59+p60+p61+p65+p66+p67+p71+p72+p73 = .3
10. p4+p5+p6+p8+p9+p12+p14+p15+p18+p38+p39+p42+p43+p45+p47+p49+p51+p53+p56+p57+p58+p62+p63+p64+p68+p69+p70+p74 = .55
11. p7+p8+p9+p10+p11+p12+p25+p26+p27+p28+p29+p30+p43+p44+p45+p46+p47+p48 = .12
12. p1+p4+p5+p14+p16+p18+p19+p22+p23+p32+p34+p36+p38+p40+p42+p51+p53+p54+p56+p59+p62+p63+p65+p68+p69+p71+p72+p74 = .48
13. p2+p3+p6+p13+p15+p17+p20+p21+p24+p31+p33+p35+p37+p39+p41+p49+p50+p52+p55+p57+p58+p60+p61+p64+p66+p67+p70+p73 = .4
14. p13+p14+p15+p16+p17+p18+p31+p32+p33+p34+p35+p36+p49+p50+p51+p52+p53+p54 = .15
15. p2+p4+p6+p8+p10+p12+p20+p22+p24+p26+p28+p30+p39+p41+p42+p45+p47+p48+p51+p60+p62+p64+p66+p68+p70+p71+p73+p74 = .66
16. p1+p3+p5+p7+p9+p11+p19+p21+p23+p25+p27+p29+p37+p38+p40+p43+p44+p46+p55+p56+p58+p59+p61+p63+p65+p67+p69+p72 = .19
17. p37+p38+p39+p40+p41+p42+p43+p44+p45+p46+p47+p48+p49+p50+p51+p52+p53+p54 = .03
18. p3+p5+p6+p9+p11+p12+p15+p17+p18+p21+p23+p24+p27+p29+p30+p33+p35+p36+p58+p61+p63+p64+p67+p69+p70+p72+p73+p74 = .54
19. p1+p2+p4+p7+p8+p10+p13+p14+p16+p19+p20+p22+p25+p26+p28+p31+p32+p34+p55+p56+p57+p59+p60+p62+p65+p66+p68+p71 = .43
20. p7+p8+p9+p13+p14+p15+p25+p31+p43+p49+p55+p56+p57+p58 ≤ .1764
21. p1+p2+p3+p7+p10+p11+p13+p16+p17+p19+p20+p21+p25+p26+p27+p31+p32+p33+p37+p44+p50+p55+p59+p60+p61 ≤ .189
22. p1+p4+p5+p14+p16+p18+p19+p32+p38+p51+p56+p59+p62+p63 ≤ .2016
23. p2+p4+p6+p8+p10+p12+p20+p26+p39+p45+p57+p60+p62+p64 ≤ .2772
24. p3+p5+p6+p9+p11+p12+p15+p17+p18+p21+p27+p33+p37+p38+p39+p43+p44+p45+p49+p50+p51+p58+p61+p63+p64 ≤ .2394
25. p7+p13+p25+p28+p29+p31+p34+p35+p46+p52+p55+p65+p66+p67 ≤ .189
26. p14+p34+p53+p56+p65+p68+p69 ≤ .2016
27. p8+p28+p47+p57+p66+p68+p70 ≤ .2772
28. p9+p15+p29+p35+p43+p46+p47+p49+p52+p53+p58+p67+p69+p70 ≤ .2394
29. p1+p16+p19+p22+p23+p32+p34+p36+p40+p54+p59+p65+p71+p72 ≤ .216
30. p2+p10+p20+p22+p24+p26+p28+p30+p41+p48+p60+p66+p71+p73 ≤ .297
31. p3+p11+p17+p21+p23+p24+p27+p29+p30+p33+p35+p36+p37+p40+p41+p44+p46+p48+p50+p52+p54+p61+p67+p72+p73 ≤ .2565
32. p4+p22+p42+p62+p68+p71+p74 ≤ .3168
33. p5+p18+p23+p36+p38+p40+p42+p51+p53+p54+p63+p69+p72+p74 ≤ .2736
34. p6+p12+p24+p30+p39+p41+p42+p45+p47+p48+p64+p70+p73+p74 ≤ .3762
35. p19+p20+p21+p25+p31+p37+p38+p39+p43+p49+p55+p56+p57+p58 ≤ .135
36. p37+p44+p50+p55+p59+p60+p61 ≤ .09
37. p19+p25+p26+p27+p32+p38+p43+p44+p45+p51+p56+p59+p62+p63 ≤ .18
38. p20+p26+p31+p32+p33+p39+p45+p49+p50+p51+p57+p60+p62+p64 ≤ .243
39. p21+p27+p33+p58+p61+p63+p64 ≤ .162
40. p1+p2+p3+p7+p13+p37+p40+p41+p46+p52+p55+p65+p66+p67 ≤ .135
41. p1+p4+p5+p7+p8+p9+p14+p19+p22+p23+p25+p28+p29+p34+p38+p40+p42+p43+p46+p47+p53+p56+p65+p68+p69 ≤ .27
42. p2+p4+p6+p8+p13+p14+p15+p20+p22+p24+p28+p31+p34+p35+p39+p41+p42+p47+p49+p52+p53+p57+p66+p68+p70 ≤ .3645
43. p3+p5+p6+p9+p15+p21+p23+p24+p29+p35+p58+p67+p69+p70 ≤ .243
44. p1+p7+p10+p11+p16+p40+p44+p46+p48+p54+p59+p65+p71+p72 ≤ .18
45. p2+p10+p13+p16+p17+p41+p48+p50+p52+p54+p60+p66+p71+p73 ≤ .243
46. p3+p11+p17+p61+p67+p72+p73 ≤ .162
47. p4+p8+p10+p12+p14+p16+p18+p22+p26+p28+p30+p32+p34+p36+p42+p45+p47+p48+p51+p53+p54+p62+p68+p71+p74 ≤ .486
48. p5+p9+p11+p12+p18+p23+p27+p29+p30+p36+p63+p69+p72+p74 ≤ .324
49. p6+p12+p15+p17+p18+p24+p30+p33+p35+p36+p64+p70+p73+p74 ≤ .4374
50. p_i ≥ 0 for i = 1, ..., 74.
51. The left-hand sides of constraints 20 to 49 must also be ≥ 0.

After solving the above model, we get the desired results, displayed already in Example 2.

Example 2: Case II (Minimization): The internal elements of W are as follows.

W =
 0.42  0.30  0.0   0.28  1.0
 0.42  0.45  0.0   0.13  1.0
 0.45  0.30  0.0   0.25  1.0
 0.40  0.52  0.08  0.0   1.0
 0.19  0.34  0.47  0.0   1.0
 0.46  0.43  0.11  0.0   1.0
 2.34  2.34  0.66  0.66  6.0

The problem now becomes that of solving this controlled selection problem with N = 24 and n = 6. All possible combinations satisfying condition (5.2.11) are as follows.

[The seventy-four arrays (1)-(74) satisfying condition (5.2.11) are displayed here.]

Now we apply the proposed model as follows.

Maximize z = p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16+p17+p18+p19+p20

subject to the constraints
1. p1+p2+ ... +p74 = 1
2. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38 = .42
3. p11+p12+p13+p14+p15+p16+p17+p18+p19+p20+p42+p43+p46+p47+p50+p51+p54+p55+p60+p61+p62+p63+p66+p67+p69+p70+p72+p74 = .3
4. p39+p40+p41+p44+p45+p48+p49+p52+p53+p56+p57+p58+p59+p64+p65+p68+p71+p73 = .28
5. p1+p2+p3+p4+p11+p12+p13+p14+p15+p16+p21+p22+p23+p39+p40+p41+p42+p43+p44+p45+p46+p47+p48+p49+p50+p51+p52+p53 = .42
6. p5+p6+p7+p8+p9+p10+p17+p18+p19+p20+p27+p28+p31+p32+p35+p36+p56+p57+p58+p59+p64+p65+p66+p68+p69+p71+p72+p73 = .45
7. p24+p25+p26+p29+p30+p33+p34+p37+p38+p54+p55+p60+p61+p62+p63+p67+p70+p74 = .13
8. p1+p5+p6+p7+p11+p12+p13+p17+p18+p19+p24+p25+p26+p39+p40+p41+p54+p55+p56+p57+p58+p59+p60+p61+p62+p63+p64+p65 = .45
9. p2+p3+p4+p8+p9+p10+p14+p15+p16+p20+p29+p30+p33+p34+p37+p38+p44+p45+p48+p49+p52+p53+p67+p68+p70+p71+p73+p74 = .3
10. p21+p22+p23+p27+p28+p31+p32+p35+p36+p42+p43+p46+p47+p50+p51+p66+p69+p72 = .25
11. p2+p5+p8+p9+p11+p14+p15+p17+p18+p20+p27+p28+p29+p30+p42+p43+p44+p45+p54+p55+p56+p57+p66+p67+p68+p69+p70+p71 = .4
12. p1+p3+p4+p6+p7+p10+p12+p13+p16+p19+p21+p22+p24+p25+p31+p33+p35+p37+p39+p40+p46+p48+p50+p52+p58+p60+p62+p64 = .52
13. p23+p26+p32+p34+p36+p38+p41+p47+p49+p51+p53+p59+p61+p63+p65+p72+p73+p74 = .08
14. p3+p6+p8+p10+p12+p14+p16+p17+p19+p20+p31+p32+p33+p34+p46+p47+p48+p49+p58+p59+p60+p61+p66+p67+p68+p72+p73+p74 = .19
15. p1+p2+p4+p5+p7+p9+p11+p13+p15+p18+p21+p23+p24+p26+p27+p29+p36+p38+p39+p41+p42+p44+p51+p53+p54+p56+p63+p65 = .34
16. p22+p25+p28+p30+p35+p37+p40+p43+p45+p50+p52+p55+p57+p62+p64+p69+p70+p71 = .47
17. p4+p7+p9+p10+p13+p15+p16+p18+p19+p20+p35+p36+p37+p38+p50+p51+p52+p53+p62+p63+p64+p65+p69+p70+p71+p72+p73+p74 = .46
18. p1+p2+p3+p5+p6+p8+p11+p12+p14+p17+p22+p23+p25+p26+p28+p30+p32+p34+p40+p41+p43+p45+p47+p49+p55+p57+p59+p61 = .43
19. p21+p24+p27+p29+p31+p33+p39+p42+p44+p46+p48+p54+p56+p58+p60+p66+p67+p68 = .11
20. p1+p2+p3+p4+p21+p22+p23 ≤ .1764
21. p1+p5+p6+p7+p24+p25+p26 ≤ .189
22. p2+p5+p8+p9+p23+p26+p27+p28+p29+p30+p32+p34+p36+p38 ≤ .2016
23. p3+p6+p8+p10+p22+p25+p28+p30+p31+p32+p33+p34+p35+p37 ≤ .2772
24. p4+p7+p9+p10+p21+p24+p27+p29+p31+p33+p35+p36+p37+p38 ≤ .2394
25. p1+p11+p12+p13+p39+p40+p41 ≤ .189
26. p2+p11+p14+p15+p23+p41+p42+p43+p44+p45+p47+p49+p51+p53 ≤ .2016
27. p3+p12+p14+p16+p22+p40+p43+p45+p46+p47+p48+p49+p50+p52 ≤ .2772
28. p4+p13+p15+p16+p21+p39+p42+p44+p46+p48+p50+p51+p52+p53 ≤ .2394
29. p5+p11+p17+p18+p26+p41+p54+p55+p56+p57+p59+p61+p63+p65 ≤ .216
30. p6+p12+p17+p19+p25+p40+p55+p57+p58+p59+p60+p61+p62+p64 ≤ .297
31. p7+p13+p18+p19+p24+p39+p54+p56+p58+p60+p62+p63+p64+p65 ≤ .2565
32. p8+p14+p17+p20+p28+p30+p32+p34+p43+p45+p47+p49+p55+p57+p59+p61+p66+p67+p68+p69+p70+p71+p72+p73+p74 ≤ .3168
33. p9+p15+p18+p20+p27+p29+p36+p38+p42+p44+p51+p53+p54+p56+p63+p65+p66+p67+p68+p69+p70+p71+p72+p73+p74 ≤ .2736
34. p10+p16+p19+p20+p31+p33+p35+p37+p46+p48+p50+p52+p58+p60+p62+p64+p66+p67+p68+p69+p70+p71+p72+p73+p74 ≤ .3762
35. p17+p18+p19+p20+p66+p69+p72 ≤ .135
36. p14+p15+p16+p20+p67+p70+p74 ≤ .09
37. p12+p13+p16+p19+p46+p47+p50+p51+p60+p61+p62+p63+p72+p74 ≤ .18
38. p11+p13+p15+p18+p42+p43+p50+p51+p54+p55+p62+p63+p69+p70 ≤ .243
39. p11+p12+p14+p17+p42+p43+p46+p47+p54+p55+p60+p61+p66+p67 ≤ .162
40. p8+p9+p10+p20+p68+p71+p73 ≤ .135
41. p6+p7+p10+p19+p31+p32+p35+p36+p58+p59+p64+p65+p72+p73 ≤ .27
42. p5+p7+p9+p18+p27+p28+p35+p36+p56+p57+p64+p65+p69+p71 ≤ .3645
43. p5+p6+p8+p17+p27+p28+p31+p32+p56+p57+p58+p59+p66+p68 ≤ .243
44. p3+p4+p10+p16+p33+p34+p37+p38+p48+p49+p52+p53+p73+p74 ≤ .18
45. p2+p4+p9+p15+p29+p30+p37+p38+p44+p45+p52+p53+p70+p71 ≤ .243
46. p2+p3+p8+p14+p29+p30+p33+p34+p44+p45+p48+p49+p67+p68 ≤ .162
47. p1+p4+p7+p13+p21+p22+p23+p24+p25+p26+p35+p36+p37+p38+p39+p40+p41+p50+p51+p52+p53+p62+p63+p64+p65 ≤ .486
48. p1+p3+p6+p12+p21+p22+p23+p24+p25+p26+p31+p32+p33+p34+p39+p40+p41+p46+p47+p48+p49+p58+p59+p60+p61 ≤ .324
49. p1+p2+p5+p11+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+p39+p40+p41+p42+p43+p44+p45+p54+p55+p56+p57 ≤ .4374
50. p_i ≥ 0 for i = 1, ..., 74.
51. The left-hand sides of constraints 20 to 49 must also be ≥ 0.

After solving the above model, we get the desired results, displayed already in Example 2.

Example 3: Case I (Maximization): The internal elements of W and all possible combinations for this problem have already been defined in Example 3. The objective function and the constraints are as follows.

Maximize z = p1+p2+p3+p4+p5+p6

subject to the constraints
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14 = 1
2. p9+p10+p13+p14 = .01842
3. p3+p5+p6+p8+p12 = .14562
4. p1+p2+p4+p7+p11 = .83596
5. p7+p8+p11+p12 = .02398
6. p2+p4+p6+p10+p14 = .6462
7. p1+p3+p5+p9+p13 = .32982
8. p11+p12+p13+p14 = .02574
9. p1+p4+p5+p7+p9 = .5596
10. p2+p3+p6+p8+p10 = .41466
11. p7+p8+p9+p10 = .01666
12. p1+p2+p3+p11+p13 = .60618
13. p4+p5+p6+p12+p14 = .37716
14. p6 ≤ .0941
15. p5+p12 ≤ .085237
16. p3+p8 ≤ .090698
17. p4+p14 ≤ .378247
18. p2+p10 ≤ .402479
19. p1+p7+p9+p11+p13 ≤ .364573
20. p6+p8+p10+p12+p14 ≤ .109936
21. p5+p9 ≤ .091797
22. p3+p13 ≤ .099438
23. p4+p7 ≤ .375033
24. p2+p11 ≤ .40625
25. p1 ≤ .339218
26. p_i ≥ 0 for i = 1, ..., 14.
27. The left-hand sides of constraints 14 to 25 must also be ≥ 0.

After solving the above model, we get the desired results, displayed already in Example 3.

Example 3: Case II (Minimization): The internal elements of W are as follows.

W =
 0.14562  0.16404  0.0      0.69034  1.0
 0.32982  0.3538   0.31638  0.0      1.0
 0.4404   0.41466  0.14494  0.0      1.0
 0.39382  0.37716  0.22902  0.0      1.0
 1.30966  1.30966  0.69034  0.69034  4.0
22. p4 <= .099438
23. p3+p11+p12 <= .375033
24. p2+p9+p10 <= .40625
25. p1+p7+p8 <= .339218
26. pi(s) >= 0 for i = 1, ..., 12.
27. The left-hand sides of constraints 14 to 25 should also be >= 0.

After solving the above model, we get the desired results, displayed already in example 3.

Example 4: Case I (Maximization): The internal elements of W and all possible combinations for this problem are already defined in example 4. The objective function and the constraints are given as follows.

Max z = p1+p2+...+p28

Subject to the constraints:

1. p1+p2+...+p58 = 1
2. p29+p30+p33+p34+p35+p39+p40+p43+p44+p45+p49+p50+p53+p54+p55 = .02
3. p1+p2+p3+p7+p8+p9+p13+p14+p15+p16+p17+p18+p23+p24+p25+p31+p36+p37+p41+p46+p47+p51+p56+p57 = .4
4. p4+p5+p6+p10+p11+p12+p19+p20+p21+p22+p26+p27+p28+p32+p38+p42+p48+p52+p58 = .58
5. p1+p2+p3+p4+p5+p6+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38+p39+p40+p41+p42 = .18
6. p7+p10+p11+p13+p14+p15+p19+p20+p21+p23+p26+p27+p43+p44+p46+p48+p49+p52+p53+p54+p56+p58 = .42
7. p8+p9+p12+p16+p17+p18+p22+p24+p25+p28+p45+p47+p50+p51+p55+p57 = .4
8. p31+p32+p36+p37+p38+p42+p43+p44+p47+p48+p51+p52+p56+p57+p58 = .05
9. p4+p5+p8+p10+p12+p13+p16+p17+p19+p20+p22+p24+p26+p28+p29+p33+p34+p39+p43+p45+p50+p53+p55 = .4
10. p2+p3+p6+p7+p9+p11+p14+p15+p18+p21+p23+p25+p27+p30+p35+p40+p44+p49+p54 = .55
11. p7+p8+p9+p10+p11+p12+p29+p30+p31+p32+p43+p44+p45+p46+p47+p48+p49+p50+p51+p52 = .32
12. p2+p4+p6+p14+p16+p18+p19+p21+p22+p25+p27+p28+p33+p35+p36+p38+p40+p42+p54+p55+p57+p58 = .48
13. p1+p3+p5+p13+p15+p17+p20+p23+p24+p26+p34+p37+p39+p41+p53+p56 = .2
14. p13+p14+p15+p16+p17+p18+p19+p20+p21+p22+p33+p34+p35+p36+p37+p38+p43+p44+p45+p46+p47+p48+p53+p54+p55+p56+p57+p58 = .34
15. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+p39+p40+p41+p42+p49+p50+p51+p52 = .66
16. p23+p24+p25+p26+p27+p28+p39+p40+p41+p42+p49+p50+p51+p52+p53+p54+p55+p56+p57+p58 = .23
17. p3+p5+p6+p9+p11+p12+p15+p17+p18+p20+p21+p22+p30+p32+p34+p35+p37+p38+p44+p45+p47+p48 = .57
18. p1+p2+p4+p7+p8+p10+p13+p14+p16+p19+p29+p31+p33+p36+p43+p46 = .2
19. p1+p2+p3+p7+p13+p14+p15+p23+p31+p36+p37+p41+p46+p56 <= .24
20. p1+p8+p13+p16+p17+p24 <= .16
21. p2+p14+p16+p18+p25+p36+p57 <= .32
22. p1+p2+p3+p7+p8+p9+p13+p14+p15+p16+p17+p18+p23+p24+p25+p31+p36+p37+p41+p46+p47+p51+p56+p57 <= .4
23. p3+p9+p15+p17+p18+p23+p24+p25+p37+p41+p47+p51+p56+p57 <= .32
24. p1+p4+p5+p10+p13+p19+p20+p26+p29+p33+p34+p39+p43+p53 <= .24
25. p2+p4+p6+p7+p10+p11+p14+p19+p21+p27+p29+p30+p31+p32+p33+p35+p36+p38+p40+p42+p43+p44+p46+p48+p49+p52+p54+p58 <= .48
26. p1+p2+p3+p4+p5+p6+p7+p10+p11+p13+p14+p15+p19+p20+p21+p23+p26+p27+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38+p39+p40+p41+p42+p43+p44+p46+p48+p49+p52+p53+p54+p56+p58 <= .6
27. p3+p5+p6+p11+p15+p20+p21+p23+p26+p27+p30+p32+p34+p35+p37+p38+p39+p40+p41+p42+p44+p48+p49+p52+p53+p54+p56+p58 <= .48
28. p4+p8+p10+p12+p16+p19+p22+p28+p29+p33+p43+p45+p50+p55 <= .32
29. p1+p4+p5+p8+p10+p12+p13+p16+p17+p19+p20+p22+p24+p26+p28+p29+p33+p34+p39+p43+p45+p50+p53+p55 <= .4
30. p5+p12+p17+p20+p22+p24+p26+p28+p34+p39+p45+p50+p53+p55 <= .32
31. p2+p4+p6+p7+p8+p9+p10+p11+p12+p14+p16+p18+p19+p21+p22+p25+p27+p28+p29+p30+p31+p32+p33+p35+p36+p38+p40+p42+p43+p44+p45+p46+p47+p48+p49+p50+p51+p52+p54+p55+p57+p58 <= .8
32. p6+p9+p11+p12+p18+p21+p22+p25+p27+p28+p30+p32+p35+p38+p40+p42+p44+p45+p47+p48+p49+p50+p51+p52+p54+p55+p57+p58 <= .64
33. p3+p5+p6+p9+p11+p12+p15+p17+p18+p20+p21+p22+p23+p24+p25+p26+p27+p28+p30+p32+p34+p35+p37+p38+p39+p40+p41+p42+p44+p45+p47+p48+p49+p50+p51+p52+p53+p54+p55+p56+p57+p58 <= .8
34. p7+p13+p14+p15+p23+p43+p44+p46+p49+p53+p54+p56 <= .1764
35. p1+p8+p13+p16+p17+p24+p29+p31+p33+p34+p36+p37+p39+p41+p43+p45+p46+p47+p50+p51+p53+p55+p56+p57 <= .189
36. p2+p14+p16+p18+p25+p33+p35+p36+p40+p54+p55+p57 <= .2016
37. p1+p2+p3+p7+p8+p9+p23+p24+p25+p29+p30+p31+p39+p40+p41+p49+p50+p51 <= .2772
38. p3+p9+p15+p17+p18+p30+p34+p35+p37+p44+p45+p47 <= .2394
39. p10+p13+p19+p20+p26+p43+p46+p48+p52+p53+p56+p58 <= .189
40. p14+p19+p21+p27+p54+p58 <= .2016
41. p7+p10+p11+p23+p26+p27+p49+p52 <= .2772
42. p11+p15+p20+p21+p44+p48 <= .2394
43. p4+p16+p19+p22+p28+p33+p36+p38+p42+p55+p57+p58 <= .216
44. p1+p4+p5+p8+p10+p12+p24+p26+p28+p29+p31+p32+p39+p41+p42+p50+p51+p52 <= .297
45. p5+p12+p17+p20+p22+p32+p34+p37+p38+p45+p47+p48 <= .2565
46. p2+p4+p6+p25+p27+p28+p40+p42 <= .3168
47. p6+p18+p21+p22+p35+p38 <= .2736
48. p3+p5+p6+p9+p11+p12+p30+p32 <= .3762
49. pi(s) >= 0 for i = 1, ..., 58.
50. The left-hand sides of constraints 19 to 48 should also be >= 0.

After solving the above model, we get the desired results, displayed already in example 4.

Example 4: Case II (Minimization): The internal elements of W are given as follows (the last row and column give the marginal totals).

W =
   0.4    0.42   0.0    0.18  | 1.0
   0.58   0.4    0.02   0.0   | 1.0
   0.4    0.45   0.0    0.15  | 1.0
   0.52   0.2    0.28   0.0   | 1.0
   0.34   0.0    0.66   0.0   | 1.0
   0.43   0.2    0.37   0.0   | 1.0
   -----------------------------
   2.67   1.67   1.33   0.33  | 6.0

Now the above problem becomes that of solving the controlled selection problem with N = 24 and n = 6. All possible combinations satisfying condition (5.2.11) are as follows.

[Sample-combination arrays (1)-(58), each a 6 x 4 incidence pattern marking the n = 6 selected cells, are not recoverable from the source.]
Now we apply the proposed model as follows.

Max z = p1+p2+...+p28

Subject to the constraints:

1. p1+p2+...+p58 = 1
2. p1+p2+p3+p7+p8+p9+p13+p14+p15+p16+p17+p18+p23+p24+p25+p31+p36+p37+p41+p46+p47+p51+p56+p57 = .4
3. p4+p5+p6+p10+p11+p12+p19+p20+p21+p22+p26+p27+p28+p32+p38+p42+p48+p52+p58 = .42
4. p29+p30+p33+p34+p35+p39+p40+p43+p44+p45+p49+p50+p53+p54+p55 = .18
5. p7+p10+p11+p13+p14+p15+p19+p20+p21+p23+p26+p27+p43+p44+p46+p48+p49+p52+p53+p54+p56+p58 = .58
6. p8+p9+p12+p16+p17+p18+p22+p24+p25+p28+p45+p47+p50+p51+p55+p57 = .4
7. p1+p2+p3+p4+p5+p6+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38+p39+p40+p41+p42 = .02
8. p1+p4+p5+p8+p10+p12+p13+p16+p17+p19+p20+p22+p24+p26+p28+p29+p33+p34+p39+p43+p45+p50+p53+p55 = .4
9. p2+p3+p6+p7+p9+p11+p14+p15+p18+p21+p23+p25+p27+p30+p35+p40+p44+p49+p54 = .45
10. p31+p32+p36+p37+p38+p42+p46+p47+p48+p51+p52+p56+p57+p58 = .15
11. p2+p4+p6+p14+p16+p18+p19+p21+p22+p25+p27+p28+p33+p35+p36+p38+p40+p42+p54+p55+p57+p58 = .52
12. p1+p3+p5+p13+p15+p17+p20+p23+p24+p26+p34+p37+p39+p41+p43+p44+p53+p56 = .2
13. p7+p8+p9+p10+p11+p12+p29+p30+p31+p32+p43+p44+p45+p46+p47+p48+p49+p50+p51+p52 = .28
14. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+p39+p40+p41+p42+p49+p50+p51+p52 = .34
15. p13+p14+p15+p16+p17+p18+p19+p20+p21+p22+p33+p34+p35+p36+p37+p38+p43+p44+p45+p46+p47+p48+p53+p54+p55+p56+p57+p58 = .66
16. p3+p5+p6+p9+p11+p12+p15+p17+p18+p20+p21+p22+p30+p32+p34+p35+p37+p38+p44+p45+p47+p48 = .43
17. p1+p2+p4+p7+p8+p10+p13+p14+p16+p19+p29+p31+p33+p36+p43+p46 = .2
18. p23+p24+p25+p26+p27+p28+p39+p40+p41+p42+p49+p50+p51+p52+p53+p54+p55+p56+p57+p58 = .37
19. p1+p2+p3+p7+p13+p14+p15+p23+p31+p36+p37+p41+p46+p56 <= .24
20. p1+p8+p13+p16+p17+p24 <= .16
21. p2+p7+p8+p9+p14+p16+p18+p25+p31+p36+p46+p47+p51+p57 <= .32
22. p1+p2+p3+p7+p8+p9+p13+p14+p15+p16+p17+p18+p23+p24+p25+p31+p36+p37+p41+p46+p47+p51+p56+p57 <= .4
23. p3+p9+p15+p17+p18+p23+p24+p25+p37+p41+p47+p51+p56+p57 <= .32
24. p1+p4+p5+p10+p13+p19+p20+p26+p29+p33+p34+p39+p43+p53 <= .24
25. p2+p4+p6+p7+p10+p11+p14+p19+p21+p27+p29+p30+p31+p32+p33+p35+p36+p38+p40+p42+p43+p44+p46+p48+p49+p52+p54+p58 <= .48
26. p1+p2+p3+p4+p5+p6+p7+p10+p11+p13+p14+p15+p19+p20+p21+p23+p26+p27+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38+p39+p40+p41+p42+p43+p44+p46+p48+p49+p52+p53+p54+p56+p58 <= .6
27. p3+p5+p6+p11+p15+p20+p21+p23+p26+p27+p30+p32+p34+p35+p37+p38+p39+p40+p41+p42+p44+p48+p49+p52+p53+p54+p56+p58 <= .48
28. p4+p8+p10+p12+p16+p19+p22+p28+p29+p33+p43+p45+p50+p55 <= .32
29. p1+p4+p5+p8+p10+p12+p13+p16+p17+p19+p20+p22+p24+p26+p28+p29+p33+p34+p39+p43+p45+p50+p53+p55 <= .4
30. p5+p12+p17+p20+p22+p24+p26+p28+p34+p39+p45+p50+p53+p55 <= .32
31. p2+p4+p6+p7+p8+p9+p10+p11+p12+p14+p16+p18+p19+p21+p22+p25+p27+p28+p29+p30+p31+p32+p33+p35+p36+p38+p40+p42+p43+p44+p45+p46+p47+p48+p49+p50+p51+p52+p54+p55+p57+p58 <= .8
32. p6+p9+p11+p12+p18+p21+p22+p25+p27+p28+p30+p32+p35+p38+p40+p42+p44+p45+p47+p48+p49+p50+p51+p52+p54+p55+p57+p58 <= .64
33. p3+p5+p6+p9+p11+p12+p15+p17+p18+p20+p21+p22+p23+p24+p25+p26+p27+p28+p30+p32+p34+p35+p37+p38+p39+p40+p41+p42+p44+p45+p47+p48+p49+p50+p51+p52+p53+p54+p55+p56+p57+p58 <= .8
34. p4+p5+p6+p12+p22+p28+p32+p38+p42 <= .1764
35. p6+p11+p21+p27 <= .189
36. p5+p10+p11+p12+p20+p26+p32+p48+p52 <= .2016
37. p19+p20+p21+p22+p38+p48+p58 <= .2772
38. p4+p10+p19+p26+p27+p28+p42+p52+p58 <= .2394
39. p2+p3+p6+p9+p18+p25+p30+p35+p40 <= .189
40. p1+p3+p5+p8+p9+p12+p17+p24+p29+p30+p31+p32+p34+p37+p39+p41+p45+p47+p50+p51 <= .2016
41. p16+p17+p18+p22+p33+p34+p35+p36+p37+p38+p45+p47+p55+p57 <= .2772
42. p1+p2+p4+p8+p16+p24+p25+p28+p29+p31+p33+p36+p39+p40+p41+p42+p50+p51+p55+p57 <= .2394
43. p3+p7+p9+p11+p15+p23+p30+p44+p49 <= .216
44. p14+p15+p18+p21+p35+p44+p54 <= .297
45. p2+p7+p14+p23+p25+p27+p40+p49+p54 <= .2565
46. p13+p15+p17+p20+p34+p37+p43+p44+p45+p46+p47+p48+p53+p56 <= .3168
47. p1+p7+p8+p10+p13+p23+p24+p26+p29+p31+p39+p41+p43+p46+p49+p50+p51+p52+p53+p56 <= .2736
48. p13+p14+p16+p19+p33+p36+p43+p46+p53+p54+p55+p56+p57+p58 <= .3762
49. pi(s) >= 0 for i = 1, ..., 58.
50. The left-hand sides of constraints 19 to 48 should also be >= 0.

After solving the above model, we get the desired results, displayed already in example 4.
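The models of this chapter can be solved with any standard linear programming software. As an illustration only (this sketch is ours, not the software used for the thesis computations), the following Python code solves the model of Example 3, Case I, with scipy.optimize.linprog; since linprog minimizes, the objective p1+...+p6 is negated.

# A sketch (ours) solving the LP model of Example 3, Case I.
import numpy as np
from scipy.optimize import linprog

nvar = 14
c = np.zeros(nvar)
c[:6] = -1.0                       # maximize p1+...+p6

def row(idx):
    # 0/1 coefficient row from 1-based sample indices
    r = np.zeros(nvar)
    r[[i - 1 for i in idx]] = 1.0
    return r

# equality constraints 1-13 of the model
eq = [(range(1, 15), 1.0), ([9, 10, 13, 14], .01842),
      ([3, 5, 6, 8, 12], .14562), ([1, 2, 4, 7, 11], .83596),
      ([7, 8, 11, 12], .02398), ([2, 4, 6, 10, 14], .6462),
      ([1, 3, 5, 9, 13], .32982), ([11, 12, 13, 14], .02574),
      ([1, 4, 5, 7, 9], .5596), ([2, 3, 6, 8, 10], .41466),
      ([7, 8, 9, 10], .01666), ([1, 2, 3, 11, 13], .60618),
      ([4, 5, 6, 12, 14], .37716)]
# inequality constraints 14-25; their nonnegativity holds automatically
# because every p_i is bounded below by zero
ub = [([6], .0941), ([5, 12], .085237), ([3, 8], .090698),
      ([4, 14], .378247), ([2, 10], .402479),
      ([1, 7, 9, 11, 13], .364573), ([6, 8, 10, 12, 14], .109936),
      ([5, 9], .091797), ([3, 13], .099438), ([4, 7], .375033),
      ([2, 11], .40625), ([1], .339218)]

A_eq = np.array([row(i) for i, _ in eq]); b_eq = [b for _, b in eq]
A_ub = np.array([row(i) for i, _ in ub]); b_ub = [b for _, b in ub]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * nvar, method="highs")
print("maximum sum of preferred-sample probabilities:", -res.fun)

The optimal vector res.x gives the selection probabilities of the 14 possible samples, the first six of which are the preferred ones.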
CHAPTER VI

THE APPLICATION OF FUZZY LOGIC TO THE SAMPLING SCHEME

6.1 INTRODUCTION

In many situations, while conducting a survey, it is not possible to enumerate all the units in the population, as doing so is very time consuming and also increases the cost of the survey. Thus in most cases the sampler prefers to take a part of the population, instead of the whole population, to determine the characteristics of the population. This part of the population, which represents the characteristics of the whole population, is known as a "sample". There exist many sampling procedures in the literature to draw a sample from the population, e.g. simple random sampling (SRS), stratified sampling, systematic sampling, probability proportional to size sampling, etc.

The simplest method of drawing a sample from the population is SRS, in which each and every unit of the population has an equal chance of being included in the sample, i.e. the sampler draws the sampling units one by one, assigning equal probability of selection to each of the available units in the population. In SRS there is no restriction on the selection of the sampling units, but its drawback is that there is no guarantee that all the segments of the population will be represented in the sample. One way to overcome the drawback of SRS is to use stratified sampling.
In stratified sampling, the whole population is divided into several groups (called strata), each of which is more homogeneous than the entire population, and a random sample of pre-determined size is then drawn from each of the groups. Stratified sampling can be used effectively when the population is heterogeneous.

Systematic sampling is another way to draw a sample from the population, in which only the first unit is selected at random, the rest being selected automatically according to some predetermined pattern involving regular spacing of units. The main drawback of systematic sampling is that an unbiased estimator of the variance is not available under it. There exist many other sampling procedures in the literature, each with its own characteristics, and these procedures can be used in different situations according to their suitability.

One drawback of the above mentioned sampling procedures is that none of them takes into account the size of the population units while selecting the units from the population. If the population units vary considerably in size, it may not be appropriate to select them with equal probability, since this ignores the possible importance of the larger units in the population. One way to overcome this problem is to assign unequal probabilities of selection to the different units in the population. Thus, when the population units vary considerably in size and the variate under study is highly correlated with the size of the unit, probabilities of selection may be assigned in proportion to the size of the population unit. For example, villages with a larger geographical area are likely to have a larger area under food crop; thus, in estimating production or food supply, it may well be desirable to adopt a scheme of selection in which villages are selected with probabilities proportional to their geographical areas. A sampling scheme in which the units are selected with probabilities proportional to some measure of their size is known as sampling with probability proportional to size (PPS). PPS sampling can be done with or without replacement. If x_i is an integer proportional to the size of the i-th unit in a population of N units (i = 1, ..., N), then the initial selection probabilities under PPS can be defined as follows:

    p_i = \frac{x_i}{x}, \quad \text{where } x = \sum_{i=1}^{N} x_i.                (6.1.1)
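For illustration only (this sketch is ours, not part of the thesis), the probabilities (6.1.1) and a with-replacement PPS draw by the cumulative-total method can be computed as follows; the sizes used are the orchard sizes that appear later in example 2 of section 6.3.

# An illustrative sketch (ours) of (6.1.1) and of a with-replacement
# PPS draw by the cumulative-total method.
import random

x = [50, 30, 25, 40, 26, 44, 20, 35]   # sizes x_i
total = sum(x)
p = [xi / total for xi in x]           # initial probabilities (6.1.1)

def pps_draw(p, n, rng):
    # select n units with replacement; unit i is drawn when the uniform
    # variate u falls in its interval of the cumulative totals
    cum, s = [], 0.0
    for pi in p:
        s += pi
        cum.append(s)
    draws = []
    for _ in range(n):
        u = rng.random()
        draws.append(next(i for i, ci in enumerate(cum) if u <= ci))
    return draws

print([round(pi, 3) for pi in p])      # .185, .111, .093, ...
print(pps_draw(p, 3, random.Random(1)))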
There exists a considerable literature on estimation under PPS sampling. An estimator commonly used to estimate the population mean or total under PPS is the well-known Horvitz-Thompson (HT) estimator, defined in expression (9) of subsection 2.2.3 of chapter 2. Sen (1953) and Yates and Grundy (1953) have given the expression for the variance of Ŷ_HT, reproduced in expression (10) of subsection 2.2.3 of chapter 2. The evaluation of this variance using expression (10) requires the values of π_ij, and various techniques have been proposed by different authors to calculate π_ij.

Ashok and Sukhatme (1976a) provide a good approximation for π_ij, correct to O(N^{-4}), for Sampford's procedure, given as follows:

    \pi_{ij} = n(n-1) p_i p_j \Big[ 1 + (p_i + p_j) - \sum_t p_t^2
               + \big\{ 2(p_i^2 + p_j^2) - 2\sum_t p_t^3 - (n-2) p_i p_j \big\}
               + (n-3)(p_i + p_j)\sum_t p_t^2 - (n-3)\big(\sum_t p_t^2\big)^2 \Big]      (6.1.2)

and the variance of Ŷ_HT correct to O(N^{-2}) is given as follows:

    V(\hat{Y}_{HT})_{SAMP} = \frac{1}{nN^2}\Big[\sum_{i \in S} p_i A_i^2 - (n-1)\sum_{i \in S} p_i^2 A_i^2\Big]
               - \frac{n-1}{nN^2}\Big[2\sum_{i \in S} p_i^3 A_i^2 - \sum_{i \in S} p_i^2 \sum_{i \in S} p_i^2 A_i^2
               + (n-2)\big(\sum_{i \in S} p_i^2 A_i\big)^2\Big]                          (6.1.3)

where

    A_i = \frac{Y_i}{p_i} - Y \quad \text{and} \quad Y = \sum_{i=1}^{N} Y_i.             (6.1.4)

Goodman and Kish (1950) have also given an expression for the variance of Ŷ_HT correct to O(N^{-2}), reproduced in expression (12) of subsection (2.2.3) of chapter 2. Recently, Brewer and Donadio (2003) derived a π_ij-free formula for the high-entropy variance of the HT estimator; their expression is given in expression (13) of subsection (2.2.3) of chapter 2.
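As a small illustration (ours, under the stated assumptions), the Sen-Yates-Grundy variance referred to above can be evaluated directly once the first- and second-order inclusion probabilities are available, e.g. with π_i = n p_i and π_ij computed from (6.1.2):

# A sketch (ours) of the Sen-Yates-Grundy variance of the HT estimator
# for a fixed-size design:
#   V = sum over pairs i < j of (pi_i*pi_j - pi_ij)*(Y_i/pi_i - Y_j/pi_j)^2
from itertools import combinations

def yates_grundy_variance(y, pi, pi2):
    # y: study-variable values; pi: first-order inclusion probabilities;
    # pi2[(i, j)]: joint inclusion probabilities, keys with i < j
    return sum((pi[i] * pi[j] - pi2[(i, j)])
               * (y[i] / pi[i] - y[j] / pi[j]) ** 2
               for i, j in combinations(range(len(y)), 2))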
In PPS sampling we take into consideration only the size of the population units, but in some situations auxiliary information related to the population units is also available. This information can also be utilized in assigning the initial selection probabilities to the population units, to increase the efficiency of the survey. In this chapter, we have made an attempt to utilize all the available auxiliary information related to the population units, in addition to the size of the population unit, in assigning the initial selection probabilities to the population units. For this purpose, we use the concept of the fuzzy approach. Using the fuzzy approach, we can utilize all the auxiliary information related to the population units to obtain a more efficient sampling design. In section 6.2, we give the concept of the fuzzy logic approach. In section 6.3, we describe the proposed procedure and also give some examples to show the superiority of the proposed procedure over the PPS sampling procedure.

6.2 FUZZY LOGIC APPROACH

As the name suggests, fuzzy logic is a logic that deals with values which are approximate rather than exact. Classical logic relies on something being either true or false: a true element is usually assigned the value 1 and a false element the value 0, so that something either completely belongs to a set or is completely excluded from it. Fuzzy logic broadens this definition of classical logic. The basis of the logic is fuzzy sets. Unlike classical sets, where membership is full or none, an object is allowed to belong only partly to a set. The membership of an object in a particular set is described by a real value lying between 0 and 1. Thus, for instance, an element can have a membership value of 0.5, which describes 50% membership in a given set. Such a logic allows a much easier formulation of many problems that cannot easily be handled using the classical approach. The importance of fuzzy logic derives from the fact that most modes of human reasoning, and especially common sense reasoning, are approximate in nature.

For example, consider a set of tall people in classical logic, and suppose that a person with height greater than or equal to 6 feet is considered tall. Then a person 6 feet 1 inch tall will be included in the set of tall people, while a person 5 feet 11 inches tall will not be included in the set. Such a representation of reality leaves much to be desired. On the other hand, using fuzzy logic, the person 6-1 tall can still have full membership of the set of tall people, while the person 5-11 tall can have, say, 90% membership of the set. The 5-11 person thus has what can be described as a "quite tall" representation in the model.
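A toy sketch (ours; the piecewise-linear shape and its break points are assumptions chosen only to reproduce the membership values quoted above) of such a graded "tall" set:

# A toy illustration (ours) of graded membership in a fuzzy "tall" set;
# the linear shape and the break points 53 and 73 inches are assumed.
def tall(height_inches, foot=53.0, shoulder=73.0):
    if height_inches <= foot:
        return 0.0
    if height_inches >= shoulder:
        return 1.0
    return (height_inches - foot) / (shoulder - foot)

print(tall(73))   # 6'1"  -> 1.0, full membership
print(tall(71))   # 5'11" -> 0.9, "quite tall"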
Fuzzy set theory was formalized by Professor Lotfi Zadeh at the University of California in 1965. Zadeh described the essential characteristics of fuzzy logic as follows.

• In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning.
• In fuzzy logic, everything is a matter of degree.
• Any logical system can be fuzzified.
• In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables.
• Inference is viewed as a process of propagation of elastic constraints.

Now we give a brief introduction to the fuzzy inference system (FIS). A fuzzy inference system provides the facility to incorporate all the auxiliary information in drawing conclusions. The MATLAB Fuzzy Logic Toolbox provides an opportunity to look at all the components of a fuzzy inference system. The first step in working with a fuzzy inference system is to define a base line model for all the input variables and also for the final or output variable. The input variables contain some auxiliary information, and the output variable gives the final result by extracting all the useful information from the input variables. After defining the base line model, the next step is to define the fuzzy rules, which play an important role in assigning the final grade of membership to the elements. In the following subsections, we give a brief introduction to the base line model and the fuzzy rules.

6.2.1 Base line model: To draw any inferences from the fuzzy inference system, the first step is to define the base line model, which consists of some input variables and the output variable. An example of a base line model is given in figure 1. In the base line model, we have to define the fuzzy sets for all the input variables and also for the final or output variable. The input variables are those from which one has to draw the inferences, i.e. they contain the auxiliary information, while the output variable is the variable which defines the final grade of membership for all the elements of the set. To define the fuzzy sets for an input variable, we first have to choose the range and the membership function for that variable. The range of the input variable can be defined by taking its minimum and maximum values. There exist many membership functions in the fuzzy inference system; the membership function for any input variable can be chosen according to the properties of that variable, and these membership functions can be displayed graphically in the fuzzy inference system. After choosing the membership function for the input variable, we define the different fuzzy sets for the input variable using the selected membership function. An example of the model (consisting of the fuzzy sets and the range for the input variable) of an input variable is given in figure 2. The membership functions for different input variables can be different; the membership function depends upon the characteristics of the input variable, and hence may vary from one input variable to another. After defining the range and the fuzzy sets for all the input variables, we define the range and the fuzzy set for the output variable. Figure 5 shows the model for the output variable.

6.2.2 Fuzzy rules: In real life, human beings constantly make decisions, and these decisions are based on rules. For example, if the weather is fine and today is a holiday, then we may decide to go out; or, if the forecast says that the weather will be bad today but fine tomorrow, then we decide not to go today and postpone the outing till tomorrow. Similarly, in order to design an FIS, we have to define the fuzzy rules. A fuzzy inference system consists of if-then rules that specify a relationship between the input and output fuzzy sets. In order to draw conclusions from the input variables, we have to define the fuzzy rules, which are based on common sense. The fuzzy rules are formulated as a series of if-then statements combined with AND/OR operators. These rules are very useful in arriving at the final decision.

6.3 THE PROPOSED PROCEDURE

The fuzzy inference process can be described completely in the following five steps:

Step 1: Choose the base line model in the fuzzy inference system.
Step 2: Take the inputs and define the range and membership function for these input variables.
Step 3: Define the membership function and range for the output variable, to obtain the final grading.
Step 4: Define the fuzzy if-then rules for the inference process.
Step 5: Enter the values of the input variables to get the required result.

Now we describe the proposed procedure through the following examples.

Example 1: The government of Uttarakhand wants to run a scheme for literacy. Before applying the scheme to all the districts of the state, the government wants to apply it to a few districts to get an idea of its likely success. The districts should be selected on the basis of low literacy rate, smaller population and smaller area. The problem to be solved here is to find the initial selection probability of each district by considering the above mentioned criteria. For this sample survey, we have N = 13 and n = 3. In this situation, assigning the initial selection probabilities according to PPS would not be justified, as the PPS sampling procedure assigns the probabilities by the criterion of size alone and ignores all other factors. Here the fuzzy logic approach works very well, as it has the capability to express the above mentioned factors in mathematical terms, which can then be utilized in assigning the initial selection probabilities to all the districts. In order to incorporate all three factors, we have to use the fuzzy inference system. The procedure is as follows.

Consider the following data related to the 13 districts of Uttarakhand.

Table 1
District      Population   Previous literacy rate   Area (sq. km)   Number of interviewers
Pithoragarh   462289       75.9%                    7169            231
Almora        630567       73.6%                    3689            315
Nainital      762909       78.4%                    3422            381
Bageshwar     249462       71.3%                    1696            124
Champawat     224542       70.4%                    2004            112
U. S. Nagar   1235614      64.9%                    3055            617
Uttarkashi    295013       65.7%                    8016            147
Chamoli       370359       75.4%                    7520            185
Rudraprayag   227439       73.6%                    2439            113
Tehri         604747       66.7%                    3796            302
Dehradun      1282143      79%                      3088            641
Pauri         697078       77.5%                    5230            348
Hardwar       1447187      63.7%                    2360            723

Firstly, we define the following base line model for the above data.
[Figure 1: the base line model for example 1]

The above base line model consists of three input variables and one output variable. The first input variable is the number of interviewers, the second is the literacy rate and the third is the area of the district.

For the first input variable, i.e. the number of interviewers, we have defined 3 fuzzy sets, namely excellent, good and poor. A smaller number of interviewers will reduce the cost of the survey; thus a small number of interviewers is taken under the category excellent. Similarly, the fuzzy sets good and poor are defined according to the increase in the number of interviewers. We have taken the gaussmf membership function for these three fuzzy sets. The gaussmf membership function depends on two parameters, σ and c, and is given as follows:

    f(x; \sigma, c) = e^{-(x-c)^2 / (2\sigma^2)}

One interviewer per 2000 population has been taken as the criterion for the required number of interviewers for a given population. The numbers of interviewers for all the districts are given in the 5th column of table 1. The range for the number of interviewers is taken as [100, 1000]. The model for the first input variable is as follows.

[Figure 2: model for the input variable "number of interviewers"]

For the second input variable, i.e. the literacy rate, we have defined 2 fuzzy sets, namely good and poor. Since the scheme is aimed at illiterate persons, a district with fewer literate persons will be preferred; we have therefore taken a low literacy rate under the category good and a high literacy rate under the category poor. We have taken the pimf membership function for these two fuzzy sets. The pimf is a Π-shaped membership function, whose syntax can be described as y = pimf(x, [a b c d]). The membership function is evaluated at the points determined by the vector x; the parameters a and d locate the "feet" of the curve, while b and c locate its "shoulders". The range for the literacy rate is taken as [0, 100], and the model for the literacy rate is as follows.

[Figure 3: model for the input variable "literacy rate"]

For the third input variable, i.e. the area of the district, we have again defined 3 fuzzy sets, namely excellent, good and poor. A smaller district area will reduce the distances involved, and hence the time taken by the survey; thus a small area is taken under the category excellent. Similarly, the fuzzy sets good and poor are defined according to the increase in the area of the district. We have taken the gaussmf membership function for these three fuzzy sets. The range for the area of the district is taken as [2000, 8000]. The model for the third input variable is as follows.

[Figure 4: model for the input variable "area of the district"]

After defining the membership functions for all three input variables, we define the membership function for the output variable, i.e. for the final grade of membership. For the output variable, we have defined 3 fuzzy sets, namely excellent, good and poor. The fuzzy set excellent consists of the districts with a high grade of membership, and the fuzzy sets good and poor consist of the districts with grades of membership in decreasing order. For the output variable, we have taken the membership function gaussmf. The range for the output variable is [0, 1], and the model for the final grade of membership is as follows.

[Figure 5: model for the output variable "final grade of membership"]
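For readers without the toolbox, the two membership functions used here can be sketched directly. The following Python functions (ours) follow the standard definitions of gaussmf and pimf, with pimf built as the product of the usual S-shaped rising edge and its mirrored falling edge; the example values fed to pimf at the end are illustrative assumptions, not the parameters actually used in the thesis.

# Sketches (ours) of the gaussmf and pimf membership functions.
import math

def gaussmf(x, sigma, c):
    # Gaussian membership: exp(-(x - c)^2 / (2 sigma^2))
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def smf(x, lo, hi):
    # standard S-function: 0 at lo, 1 at hi, quadratic splines between
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    mid = (lo + hi) / 2.0
    if x <= mid:
        return 2.0 * ((x - lo) / (hi - lo)) ** 2
    return 1.0 - 2.0 * ((x - hi) / (hi - lo)) ** 2

def pimf(x, a, b, c, d):
    # Pi-shaped membership: feet at a and d, shoulders at b and c
    return smf(x, a, b) * (1.0 - smf(x, c, d))

# e.g. a "good" (low) literacy set with assumed feet [50, 80] and
# shoulders [60, 70]: a rate of 75.9 lies on the falling edge
print(round(pimf(75.9, 50, 60, 70, 80), 3))   # about 0.336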
After defining the base line model, we define the fuzzy rules for the above problem as follows.

1. If (interviewer is excellent) AND (literacy is good) AND (area is excellent) then (grade is excellent).
2. If (interviewer is good) AND (literacy is good) AND (area is good) then (grade is good).
3. If (interviewer is poor) AND (literacy is poor) AND (area is poor) then (grade is poor).

The following figure shows the fuzzy inference process.

[Figure 6: fuzzy inference process]

Everything necessary for the data input has now been done. In order to find the final grade of membership for any district, we have to put the triplet into the fuzzy inference system, i.e. the values of the number of interviewers, the literacy rate and the area from table 1. After putting in the values of these three factors for all the districts, we obtain the grades of membership for all the districts, given in table 2.

Table 2
District      Grade of     Initial selection probability   Initial selection
              membership   (Proposed procedure)            probability (PPS)
Pithoragarh   0.5          0.065                           0.05
Almora        0.5265       0.069                           0.07
Nainital      0.5455       0.071                           0.09
Bageshwar     0.8568       0.112                           0.03
Champawat     0.8589       0.112                           0.04
U. S. Nagar   0.5176       0.067                           0.15
Uttarkashi    0.5          0.065                           0.03
Chamoli       0.5          0.065                           0.04
Rudraprayag   0.804        0.105                           0.03
Tehri         0.5181       0.068                           0.07
Dehradun      0.5328       0.069                           0.15
Pauri         0.4993       0.065                           0.08
Hardwar       0.5106       0.067                           0.17

After getting the final grade of membership for all the districts, the next step is to obtain the initial selection probabilities for all the districts. This can be done as follows. Let X_i (i = 1, ..., N) represent the final grade of membership of the i-th district. Then the initial selection probability of the i-th district can be obtained as

    p_i = \frac{X_i}{X}, \quad \text{where } X = \sum_{i=1}^{N} X_i.                (6.3.1)

Using (6.3.1), we get the initial selection probabilities for all the 13 districts, displayed in the 3rd column of table 2.
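A quick check (our sketch) that normalising the grades of table 2 by (6.3.1) reproduces the proposed initial selection probabilities in its 3rd column:

# A check (ours) of (6.3.1) against the grades of table 2.
grades = {
    "Pithoragarh": 0.5, "Almora": 0.5265, "Nainital": 0.5455,
    "Bageshwar": 0.8568, "Champawat": 0.8589, "U. S. Nagar": 0.5176,
    "Uttarkashi": 0.5, "Chamoli": 0.5, "Rudraprayag": 0.804,
    "Tehri": 0.5181, "Dehradun": 0.5328, "Pauri": 0.4993,
    "Hardwar": 0.5106,
}
X = sum(grades.values())
for district, Xi in grades.items():
    print(f"{district:12s} {Xi / X:.3f}")   # e.g. Bageshwar -> 0.112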
To compare the proposed procedure with the PPS sampling procedure, we have also solved the above problem by the PPS sampling procedure, taking the size proportional to the population of the districts; the resulting initial selection probabilities are given in the 4th column of table 2. Now, to demonstrate the utility of the proposed procedure in terms of the precision of the estimate, the variance for the proposed procedure is compared with the variance for the PPS sampling procedure using expressions (10), (12) and (13) of subsection (2.2.3) of chapter 2 and expression (6.1.3) of this chapter.

In order to compute the variance of (Ŷ_HT)_YG using expression (10), we have to calculate the values of π_ij. Using (6.1.2), we get the values of π_ij for the proposed procedure, shown in table 3.

Table 3
(i,j)  π_ij   (i,j)  π_ij   (i,j)  π_ij   (i,j)  π_ij   (i,j)   π_ij
1,2    .027   2,7    .027   3,13   .031   5,12   .06    8,10    .027
1,3    .029   2,8    .027   4,5    .109   5,13   .062   8,11    .027
1,4    .060   2,9    .057   4,6    .062   6,7    .026   8,12    .025
1,5    .060   2,10   .029   4,7    .06    6,8    .026   8,13    .026
1,6    .026   2,11   .030   4,8    .06    6,9    .056   9,10    .056
1,7    .025   2,12   .027   4,9    .101   6,10   .028   9,11    .057
1,8    .025   2,13   .029   4,10   .062   6,11   .029   9,12    .054
1,9    .054   3,4    .065   4,11   .063   6,12   .026   9,13    .056
1,10   .027   3,5    .065   4,12   .06    6,13   .027   10,11   .029
1,11   .027   3,6    .030   4,13   .062   7,8    .025   10,12   .027
1,12   .025   3,7    .029   5,6    .062   7,9    .054   10,13   .028
1,13   .026   3,8    .029   5,7    .06    7,10   .027   11,12   .027
2,3    .031   3,9    .059   5,8    .06    7,11   .027   11,13   .029
2,4    .063   3,10   .031   5,9    .101   7,12   .025   12,13   .026
2,5    .063   3,11   .031   5,10   .062   7,13   .026
2,6    .029   3,12   .029   5,11   .063   8,9    .054

Having obtained the values of π_ij, we have calculated the variance of (Ŷ_HT)_YG for the proposed procedure, shown in table 4. Similarly, we have calculated the values of π_ij for the PPS sampling procedure and then the value of the variance of (Ŷ_HT)_YG for the PPS sampling procedure, also shown in table 4. We have further calculated the variance of Ŷ_HT using expressions (12) and (13) of subsection (2.2.3) of chapter 2 and expression (6.1.3) of this chapter, for both the proposed procedure and the PPS sampling procedure. The values of the variance of Ŷ_HT obtained through all the above expressions are given in table 4.

Table 4
                     V(Ŷ_HT)_YG   V(Ŷ_HT)_GK   V(Ŷ_HT)_SAMP   V(Ŷ_HT)_BD
Proposed procedure   205.9167     207.746      207.586        203.507
PPS                  530.4458     460.978      458.735        446.011

From table 4, we observe that the variance of the proposed procedure is much smaller than that of the PPS sampling procedure in all the cases. This shows that the proposed procedure has less variability than the PPS sampling procedure, and hence can be considered more efficient.

Example 2: Consider the following data, taken from Singh and Chaudhary (1986).

Table 5
S. No.   No. of trees (x_i)   Yield   p_i (x_i/x)
1        50                   60      .185
2        30                   35      .111
3        25                   30      .093
4        40                   44      .148
5        26                   30      .096
6        44                   50      .163
7        20                   22      .074
8        35                   40      .130

The number of trees and the yield for 8 orchards are given, and we have to assign the initial selection probabilities to these 8 orchards. The initial selection probabilities under the PPS sampling procedure are given in the 4th column of table 5. From table 5, we observe that only the criterion of the number of trees has been used in assigning the initial selection probabilities to the orchards, while the yield of the orchards has been ignored. We can utilize the auxiliary information on yield as well, in addition to the number of trees, in assigning the initial selection probabilities to the orchards. In order to utilize the yield also, we use the fuzzy logic approach. For this sample survey, we have N = 8 and suppose n = 3. The fuzzy inference process for this example can be described as follows. We have taken the following base line model for the above data.

[Figure 7: the base line model for example 2]

The above base line model consists of two input variables and one output variable. The first input variable is the number of trees and the second is the yield of the orchards.

For the first input variable, i.e. the number of trees, we have defined 3 fuzzy sets, namely low, medium and high. Fewer trees will give less yield; thus a small number of trees is taken under the category low. Similarly, the fuzzy sets medium and high are defined according to the increase in the number of trees. We have taken the gaussmf membership function for these three fuzzy sets. The range for the number of trees is taken as [20, 50]. The model for the first input variable is as follows.

[Figure 8: model for the input variable "number of trees"]

For the second input variable, i.e. the yield, we have defined 3 fuzzy sets, namely poor, good and excellent. We have taken the gaussmf membership function for these three fuzzy sets. The range for the yield is taken as [20, 60], and the model for the second input variable is as follows.

[Figure 9: model for the input variable "yield"]

After defining the membership functions for the input variables, we define the membership function for the output variable, i.e. for the final grade of membership. For the output variable, we have defined 3 fuzzy sets, namely low, medium and high.
The fuzzy set high consists of the orchards with a high grade of membership, and the fuzzy sets medium and low consist of the orchards with grades of membership in decreasing order. For the output variable, we have taken the membership function gaussmf. The range for the output variable is [0, 1], and the model for the final grade of membership is as follows.

[Figure 10: model for the output variable "final grade of membership"]

After defining the base line model, we define the fuzzy rules for the above problem as follows.

1. If (tree is low) AND (yield is good) then (grade is medium).
2. If (tree is low) AND (yield is excellent) then (grade is high).
3. If (tree is medium) AND (yield is poor) then (grade is low).
4. If (tree is medium) AND (yield is excellent) then (grade is high).
5. If (tree is high) AND (yield is poor) then (grade is low).
6. If (tree is high) AND (yield is good) then (grade is medium).

The following figure shows the fuzzy inference process.

[Figure 11: fuzzy inference process]

Now the process of data input is complete. In order to find the final grade of membership for any orchard, we put in the values of the number of trees and the yield for that orchard. After putting in these values for all the orchards, we get the final grades of membership for all the orchards, given in table 6.

Table 6
Orchard   Grade of membership   Initial selection probability (Proposed procedure)
1         0.7435                0.19
2         0.4612                0.12
3         0.3933                0.10
4         0.5173                0.13
5         0.3571                0.09
6         0.6401                0.16
7         0.2922                0.07
8         0.4916                0.13

Having obtained the final grades of membership for all the orchards, we calculate the initial selection probabilities for all the orchards using (6.3.1); these are given in the 3rd column of table 6. The values of π_ij for the proposed procedure, obtained using (6.1.2), are given in table 7. Having obtained the values of π_ij, we have calculated the variance of (Ŷ_HT)_YG for the proposed procedure, shown in table 8. Similarly, we have calculated the values of π_ij for the PPS sampling procedure and then the value of the variance of (Ŷ_HT)_YG for the PPS sampling procedure, also shown in table 8.

Table 7
(i,j)  π_ij    (i,j)  π_ij    (i,j)  π_ij    (i,j)  π_ij
1,2    .1997   2,3    .0757   3,5    .0523   4,8    .1393
1,3    .1657   2,4    .1248   3,6    .137    5,6    .1218
1,4    .2177   2,5    .0734   3,7    .0328   5,7    .0244
1,5    .1496   2,6    .1691   3,8    .0976   5,8    .0848
1,6    .2754   2,7    .051    4,5    .0848   6,7    .0933
1,7    .1193   2,8    .1248   4,6    .186    6,8    .1860
1,8    .2177   3,4    .0976   4,7    .0608   7,8    .0608

We have also calculated the variance of Ŷ_HT using expressions (12) and (13) of subsection (2.2.3) of chapter 2 and expression (6.1.3) of this chapter, for both the proposed procedure and the PPS sampling procedure. The values of the variance of Ŷ_HT obtained through all the above expressions are given in table 8.

Table 8
                     V(Ŷ_HT)_YG   V(Ŷ_HT)_GK   V(Ŷ_HT)_SAMP   V(Ŷ_HT)_BD
Proposed procedure   .03571       .01247       .01246         .01249
PPS                  .09416       .02607       .02607         .02589

From table 8, we observe that for this example also the variance of the proposed procedure is much smaller than that of the PPS sampling procedure in all the cases. From both of the above examples, we observe that the proposed procedure has less variability than the PPS sampling procedure, and thus we can say that the proposed procedure is more efficient than the PPS sampling procedure.
CHAPTER VII

SUMMARY

'Controlled selection' or 'controlled sampling', as the name suggests, is a method of selecting samples from a finite population by imposing certain restrictions or controls while selecting the samples. The technique of controlled selection is used in sampling to minimize, as far as possible, the probability of selecting the non-preferred samples, while conforming strictly to the requirements of probability sampling. Although the concept of controlled selection has been used by statisticians for a long time, it has received considerable attention in recent years owing to its practical importance.

The term 'controlled selection' or 'controlled sampling' is rather uncommon in the field of sample surveys; however, the need for this special technique in sampling was long felt. Even a generation ago, the conflicting needs of controls and randomization were widely thought to be irreconcilable, as can be seen in the debate between purposive and random methods in the Bulletin of the International Statistical Institute (1926). Conceptually, the imposition of controls in selecting a sample may be viewed as an extension of the technique of purposive sampling, although it involves more judgment than purposive sampling. In fact, any departure from simple random sampling may be regarded as a control, which enhances the probability of preferred combinations by eliminating or reducing non-preferred (undesirable) combinations.

The technique of controlled selection was originally formulated by Goodman and Kish (1950). They applied it to a specific problem of selecting twenty-one primary sampling units to represent the North-central states, and found that by the use of this technique the between-first-stage-unit components of the variance were reduced from 11% to 32% below the corresponding components under stratified random sampling. The concept of controlled selection is applicable in many fields, such as rounding techniques, disclosure control, overlap of sampling units, etc.

In chapter I of the thesis, we have given a brief introduction to the historical background of controlled selection, the definition of controlled selection and some applications of controlled selection to statistical problems. A brief review of the literature on controlled selection and on the fields in which it is applicable is also given in this chapter. In the last section of the chapter, the problem of estimating the variance is also discussed.

In chapter II of the thesis, we have used the concept of 'nearest proportional to size sampling designs', originated by Gabler (1987), to obtain an optimal controlled sampling design which ensures that the probability of selecting the non-preferred samples is exactly zero. The variance estimation for the proposed optimal controlled sampling design using the Yates-Grundy form of the Horvitz-Thompson estimator is discussed. The variance of the proposed procedure is compared with that of existing optimal controlled and uncontrolled high-entropy selection procedures. The utility of the proposed procedure is demonstrated with the help of examples.

In chapter III of the thesis, using quadratic programming and the concept of the 'nearest proportional to size sampling design', we have proposed a method for two dimensional optimal controlled selection, which ensures zero probability for the non-preferred samples. An estimator for estimating the variance in controlled selection is also proposed.
The utility of the proposed procedure is demonstrated with the help of examples.

In chapter IV of the thesis, using the technique of random rounding, we have introduced a new methodology for protecting the confidential information of tabular data with minimum loss of information. The tables obtained through the proposed method consist of unbiasedly rounded values, are additive and have a specified level of confidentiality protection. Some numerical examples are also discussed to demonstrate the superiority of the proposed procedure over the existing procedures.

In chapter V of the thesis, we have proposed a new methodology which not only selects the sample in a controlled way but also maximizes or minimizes the overlap of sampling units for two sample surveys. The two surveys can be conducted simultaneously or sequentially. The proposed method uses the linear programming approach for maximizing, or alternatively minimizing, the probability of those sample combinations which consist of the maximum number of overlapping sample units. The proposed procedure has the advantage of permitting variance estimation, since it satisfies the non-negativity condition of the Horvitz-Thompson (H-T) variance estimator; in situations where this non-negativity condition cannot be satisfied, an alternative method of estimation can be used.

In chapter VI of the thesis, we have used the fuzzy logic approach to obtain a more efficient sampling design. The proposed procedure utilizes all the auxiliary information while assigning the initial selection probabilities to the population units. The superiority of the proposed procedure over the PPS sampling procedure is also discussed through some numerical examples.

REFERENCES

Albert, P. (1978). The algebra of fuzzy logic. Fuzzy Sets and Systems, 1(3), 203-230.

Ashok, C. and Sukhatme, B. V. (1976a). On Sampford's procedure of unequal probability sampling without replacement. Journal of the American Statistical Association, 71, 912-918.

Avadhani, M. S. and Sukhatme, B. V. (1973). Controlled sampling with equal probabilities and without replacement. International Statistical Review, 41, 175-182.

Azmi, Z. A. (1993). New fuzzy approaches by using statistical and mathematical methodologies in operations research. Journal of Fuzzy Mathematics, 1(1), 69-87.

Bellman, R. E. and Zadeh, L. A. (1970). Decision making in a fuzzy environment. Management Science, 17(4), 141-164.

Biswal, M. P. (1992). Fuzzy programming technique to solve multi-objective geometric programming problems. Fuzzy Sets and Systems, 51(1), 67-72.

Bit, A. K., Biswal, M. P. and Alam, S. S. (1992). Fuzzy programming approach to multicriteria decision making transportation problem. Fuzzy Sets and Systems, 50(2), 135-141.

Brewer, K. R. W. and Donadio, M. E. (2003). The high-entropy variance of the Horvitz-Thompson estimator. Survey Methodology, 29, 189-196.

Brewer, K. R. W., Early, L. J. and Joyce, S. F. (1972). Selecting several samples from a single population. Australian Journal of Statistics, 14, 231-239.

Bryant, E. C. (1961). Sampling methods. Seminar paper, Iowa State University.

Bryant, E. C., Hartley, H. O. and Jessen, R. J. (1960). Design and estimation in two-way stratification. Journal of the American Statistical Association, 55, 105-124.
Carvalho, F. D., Dellaert, N. P. and Osório, M. S. (1994). Statistical disclosure in two-dimensional tables: general tables. Journal of the American Statistical Association, 89, 1547-1557.

Cassel, C. M. and Särndal, C. E. (1972). A model for studying robustness of estimators and informativeness of labels in sampling with varying probabilities. Journal of the Royal Statistical Society, Series B, 34, 279-289.

Causey, B. D., Cox, L. H. and Ernst, L. R. (1985). Application of transportation theory to statistical problems. Journal of the American Statistical Association, 80, 903-909.

Chakrabarti, M. C. (1963). On the use of incidence matrices of designs in sampling from finite populations. Journal of the Indian Statistical Association, 1, 78-85.

Cox, L. H. (1980). Suppression methodology and statistical disclosure control. Journal of the American Statistical Association, 75, 377-385.

Cox, L. H. (1981). Linear sensitivity measures in statistical disclosure control. Journal of Statistical Planning and Inference, 5, 153-164.

Cox, L. H. (1987). A constructive procedure for unbiased controlled rounding. Journal of the American Statistical Association, 82, 420-424.

Cox, L. H. (1995). Network models for complementary cell suppression. Journal of the American Statistical Association, 90, 1453-1462.

Cox, L. H. and Ernst, L. R. (1982). Controlled rounding. INFOR, 20, 423-432.

Doherty, P., Driankov, D. and Hellendoorn, H. (1993). Fuzzy if-then-unless rules and their implementation. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 1(2), 167-182.

Dockery, J. T. and Murrar, E. (1987). A fuzzy approach in aggregating military assessments. International Journal of Approximate Reasoning, 1(3), 251-271.

Ernst, L. R. (1996). Maximizing the overlap of sample units for two designs with simultaneous selection. Journal of Official Statistics, 12, 33-45.

Ernst, L. R. (1998). Maximizing and minimizing overlap when selecting a large number of units per stratum simultaneously for two designs. Journal of Official Statistics, 14, 297-314.

Ernst, L. R. and Ikeda, M. (1995). A reduced-size transportation algorithm for maximizing the overlap between surveys. Survey Methodology, 21, 147-157.

Ernst, L. R. and Paben, S. P. (2002). Maximizing and minimizing overlap when selecting any number of units per stratum simultaneously for two designs with different stratifications. Journal of Official Statistics, 18, 185-202.

Fellegi, I. (1963). Sampling with varying probabilities without replacement: rotating and non-rotating samples. Journal of the American Statistical Association, 58, 183-201.

Fellegi, I. (1966). Changing the probabilities of selection when two units are selected with PPS without replacement. In Proceedings of the Social Statistics Section, American Statistical Association, Washington, 434-442.

Fellegi, I. P. (1975). Controlled random rounding. Survey Methodology, 1, 123-135.

Fischetti, M. and Salazar, J. J. (2000). Models and algorithms for optimizing cell suppression in tabular data with linear constraints. Journal of the American Statistical Association, 95, 916-928.

Fischetti, M. and Salazar, J. J. (2003). Partial cell suppression: a new methodology for statistical disclosure control. Statistics and Computing, 13, 13-21.

Foody, W. and Hedayat, A. (1977). On theory and applications of BIB designs and repeated blocks. Annals of Statistics, 5, 932-945.

Frankel, L. R. and Stock, J. S. (1942). On the sample survey of unemployment. Journal of the American Statistical Association, 10, 288-293.
Frühwirth-Schnatter, S. (1992). On statistical inference for fuzzy data with applications to descriptive statistics. Fuzzy Sets and Systems, 50(2), 143-165.

Frühwirth-Schnatter, S. (1993). On fuzzy Bayesian inference. Fuzzy Sets and Systems, 60(1), 41-58.

Gabler, S. (1987). The nearest proportional to size sampling design. Communications in Statistics - Theory and Methods, 16(4), 1117-1131.

Goodman, R. and Kish, L. (1950). Controlled selection - a technique in probability sampling. Journal of the American Statistical Association, 45, 350-372.

Gray, G. and Platek, R. (1963). Several methods of redesigning area samples utilizing probabilities proportional to size when the sizes change significantly. Journal of the American Statistical Association, 63, 1280-1297.

Gupta, V. K., Nigam, A. K. and Kumar, P. (1982). On a family of sampling schemes with inclusion probability proportional to size. Biometrika, 69, 191-196.

Gupta, V. K., Srivastava, A. K. and Reddy, K. S. (1989). On the use of connected block designs in inclusion probability proportional to size sampling. Technical report, Indian Agricultural Research Institute, New Delhi.

Hedayat, A. and Lin, B. Y. (1980). Controlled probability proportional to size sampling designs. Technical report, University of Illinois at Chicago.

Hedayat, A., Lin, B. Y. and Stufken, J. (1989). The construction of IPPS sampling designs through a method of emptying boxes. Annals of Statistics, 17, 1886-1905.

Hess, I., Riedel, D. C. and Fitzpatrick, T. P. (1961). Probability sampling of hospitals and patients. Ann Arbor, Mich.: Bureau of Hospital Administration.

Hess, I. and Srikantan, K. S. (1966). Some aspects of the probability sampling technique of controlled selection. Health Services Research, Summer 1966, 8-52.

Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from finite universes. Journal of the American Statistical Association, 47, 663-685.

Jessen, R. J. (1969). Some methods of probability non-replacement sampling. Journal of the American Statistical Association, 64, 175-193.

Jessen, R. J. (1970). Probability sampling with marginal constraints. Journal of the American Statistical Association, 65, 776-796.

Jessen, R. J. (1973). Some properties of probability lattice sampling. Journal of the American Statistical Association, 68, 26-28.

Jessen, R. J. (1975). Square and cubic lattice sampling. Biometrics, 31, 449-471.

Jessen, R. J. (1978). Statistical Survey Techniques. Wiley, New York.

Keyfitz, N. (1951). Sampling with probabilities proportional to size: adjustment for changes in probabilities. Journal of the American Statistical Association, 46, 105-109.

Kish, L. (1963). Changing strata and selection probabilities. In Proceedings of the Social Statistics Section, American Statistical Association, Washington, 124-131.

Kish, L. and Scott, A. (1971). Retaining units after changing strata and probabilities. Journal of the American Statistical Association, 66, 461-470.

Kuhn, H. W. and Tucker, A. W. (1951). Non-linear programming. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 481-492.

Lu, W. and Sitter, R. R. (2002). Multi-way stratification by linear programming made practical. Survey Methodology, 28(2), 199-207.

Mahalanobis, P. C. (1939). A sample survey of the acreage under jute in Bengal. Sankhya, 4, 511-531.

Mahalanobis, P. C. (1946). Recent experiments in statistical sampling in the Indian Statistical Institute. Journal of the Royal Statistical Society, 109, 325-378.

Matei, A. and Tillé, Y. (2006). Maximal and minimal sample co-ordination. Sankhya: The Indian Journal of Statistics, 67, 590-612.
Merola, G. M. (2003a). Generalized risk measures for tabular data. Proceedings of the 54th Session of the International Statistical Institute.

Midzuno, H. (1952). On the sampling system with probability proportional to sums of sizes. Annals of the Institute of Statistical Mathematics, 3, 99-107.

Moore, R. P., Chromy, J. R. and Rogers, W. T. (1974). The National Assessment approach to sampling. National Assessment of Educational Progress, Denver.

Mukhopadhyay, P. and Vijayan, K. (1996). On controlled sampling designs. Journal of Statistical Planning and Inference, 52, 375-378.

Murthy, M. N. (1957). Ordered and unordered estimators in sampling without replacement. Sankhya, 18, 379-390.

Nargundkar, M. S. and Saveland, W. (1972). Random rounding to prevent statistical disclosures. In Proceedings of the Social Statistics Section, American Statistical Association, 382-385.

Nigam, A. K., Kumar, P. and Gupta, V. K. (1984). Some methods of inclusion probability proportional to size sampling. Journal of the Royal Statistical Society, Series B, 46, 564-571.

Patterson, H. D. (1954). The errors of lattice sampling. Journal of the Royal Statistical Society, Series B, 16, 140-149.

Rao, J. N. K. and Nigam, A. K. (1990). Optimal controlled sampling designs. Biometrika, 77, 807-814.

Rao, J. N. K. and Nigam, A. K. (1992). 'Optimal' controlled sampling: a unified approach. International Statistical Review, 60, 89-98.

Salazar, J. J. (2005). Controlled rounding and cell perturbation: statistical disclosure limitation methods for tabular data. Mathematical Programming, Series B, 105, 583-603.

Sampford, M. R. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54, 499-513.

Sande, G. (1984). Automated cell suppression to preserve confidentiality of business statistics. Statistical Journal of the United Nations ECE, 2, 33-41.

Sen, A. R. (1953). On the estimation of variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 5, 119-127.

Singh, D. (1954). On efficiency of sampling with varying probabilities without replacement. Journal of the Indian Society of Agricultural Statistics, 6, 48-57.
Convex sets of finite population plans. Annals of Statistics, 5, 414-418. Yates, F. (1960): Sampling methods for censuses and surveys, 3rd edition. London, Charles griffin and company. Yates, F. and Grundy, P.M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of Royal Statistical Society, B, 15, 253-261. Zadeh, L. A. (1965b). Fuzzy sets. Information and control, 8(3), 338353. 329