Sample Design Issues in PISA MEXICO
Ismael Flores Cervantes
Gustavo Flores Vázquez
September 14, 2009
Kiel, Germany

What is PISA?
OECD's Programme for International Student Assessment (PISA) surveys
• Survey of 15-year-old students in grade 7 and higher
• Every three years since 2000
• Complex design
  – Stratified two-stage design with probability proportional to size (PPS) sampling, using schools as clusters (PSUs)
  – Fixed within-school sample to achieve self-weighting samples

PISA Research Opportunities
• Cyclical nature
  – Previous results can be used to improve the sample design for the following cycle
• Provide information for decision making for future cycles
  – Minimum sample size and associated errors
  – Minimum detectable differences

Sample Design Objectives
• Inputs to the sample design: PISA objectives, country objectives, country options
• Base cost + option cost = total cost
• Efficient design
  – Minimizes total cost
  – Meets PISA objectives
  – Meets country objectives

Mexico Sample
• National option (expanded sample) for estimates at state level since 2003
  – Increased the sample size by 10 times
• It did not meet national option objectives in 2003 and 2006
• Limited usability of state-level data
• No corrective action in 2006
• Better in 2009, but there are still improvements to be made
• International objectives have been met

Implications of Sample Allocation
• Limited usability of data at state level

Mexico's Design Characteristics
• Frame information
  – Outdated (lag of one year or more)
  – Different sources (school systems) and time periods
  – Unreliable numbers of 15-year-olds in some schools
• Misclassification in stratification by school size
• Incorrect measure of size (MOS) used in PPS sampling

Design Effects and Effective Sample
• Design effect (DEFF)
  – Measure of efficiency of the sample design
  – Large design effect → inefficient design
  – More sample needed to obtain the same precision
  – Higher cost for the same precision
• Effective sample size: the sample available for inferences

\[ n_{effective} = \frac{n_{nominal}}{DEFF} \]

  n nominal   Design effect   n effective
  1,000       2               500
  2,000       4               500
  1,000       4               250

Factors that Affect the Design Effect
• Sampling unit (school)
• Stratification / sample allocation (subsampling)
• Measure of size
• Subsampling within school (students)
• Nonresponse adjustments

Total Design Effect 2000-2006
  Year   Mean   Mexico   Ratio
  2000   5      5        1
  2003   8      53       7
  2006   9      34

DEFF from Sample Allocation
Relative sampling rates are shown twice: with the small-school stratum set to 1 and with the large-school stratum set to 1.

  Year  School size  Number of students    Sampled students   Rel. rate    Rel. rate    DEFF A  DEFF B  DEFF Total
                                                              (small = 1)  (large = 1)
  2000  Small          120,000    13%         200     8%        1.0          0.59
  2000  Medium         170,000    18%         400    16%        1.4          0.83
  2000  Large          670,000    70%       1,900    76%        1.7          1.00
  2000  Total          960,000   100%       2,500   100%                                  1.1     NA      1.1
  2003  Small          250,000    23%         500     2%        1.0          0.04
  2003  Medium         210,000    20%         700     2%        1.7          0.07
  2003  Large          610,000    57%      28,800    96%       23.6          1.00
  2003  Total        1,070,000   100%      30,000   100%                                  5.4     1.1     5.7
  2006  Small          190,000    16%         400     1%        1.0          0.06
  2006  Medium         180,000    15%         600     2%        1.6          0.09
  2006  Large          820,000    69%      30,000    97%       17.4          1.00
  2006  Total        1,190,000   100%      31,000   100%                                  3.9     1.1     4.3

Annotations on the slide:
• Change in PSU definition
• Explicit allocation of large schools by state
• Effect of state stratification

Incorrect MOS and SRS
• SRS of schools in the small and medium school strata (30% of population)
• SRS of clusters is very inefficient when cluster (school) size is variable (Cochran, 1977)

\[ F_{SRS\ Cluster} = 1 + \frac{1}{\bar{N} S_Y^2} \cdot \frac{\sum_{c=1}^{C} (N_c - \bar{N}) N_c \bar{Y}_c}{C - 1} \]

Increase in DEFF
  Year   Small schools   Medium schools
  2000   10              12
  2003   18              28
  2006   16              8

Incorrect MOS and PPS
• PPS is used in 70% of the population
• The analysis looks at the effect of an incorrect MOS and treats it as a form of oversampling

\[ F_{Error\ in\ MOS} = \frac{1}{N^2} \sum_{c=1}^{C} \frac{N_c^2}{p_c} \]

  Year   Coefficient of variation of weights   F MOS error
  2000   40                                    1.2
  2003   230                                   1.9
  2006   188                                   1.8

A Better MOS
• The PISA sampling manual mentions the use of the number of students in the modal grade instead of the number of 15-year-old students
• It has not been implemented despite differences
  between the MOS and observed enrollment in all cycles

Modeling the School MOS
• Linear regression of transformed variables by modality and support
• We evaluate the reduction of variability of the difference as

\[ R = \frac{V(\text{Model MOS} - \text{Observed})}{V(\text{MOS} - \text{Observed})} \]

• Large reduction → low ratio

Modeling the School MOS (continued)
  Group          Type      R     Variables
  High School    Private   59%   # in modal grade, # total in school, # 15-year-old students
  High School    Public    43%   # in modal grade, # total in school, # 15-year-old students, urban status, type (i.e., tecnologico)
  Middle School  Private   43%   # in modal grade, # total in school, type (i.e., telesecundaria)
  Middle School  Public    50%   # in modal grade, # total in school, # 15-year-old students

Better Sample Allocation
Current allocation. Relative sampling rates are shown with the large stratum set to 1 and with the small stratum set to 1.

                                     Frame                               Sample               Rel. smp rates
  State              Size     # Schools        # Students         Schools  Students   (large = 1)  (small = 1)   deff   Effective sample
  06 B. CALIFORNIA   Small      266    44%      1,915     5%          5        36         0.5          1.0
  05 B. CALIFORNIA   Medium     106    18%      2,652     7%          3        75         0.7          1.5
  04 B. CALIFORNIA   Large      229    38%     31,053    87%         34     1,190         1.0          2.0
  Total                         601   100%     35,620   100%         42     1,301                                1.03   1,261
  96 ZACATECAS       Small      856    82%      3,313    21%         11        43         0.1          1.0
  95 ZACATECAS       Medium      77     7%      1,864    12%          6       145         0.8          6.1
  94 ZACATECAS       Large      105    10%     10,768    68%         29     1,015         1.0          7.3
  Total                       1,038   100%     15,945   100%         46     1,203                                1.87   642

• Very different sample sizes

Sample Allocation Formula in PISA Manual
• Close to the best allocation when small schools are subsampled by a factor of 1/2
• Produces large design effects when lower subsampling factors are implemented
Optimization problem: allocate the sample so that it
• Minimizes DEFF under these conditions
  – Subsampling factor < 1/k (reduce small schools)
  – Number of schools less than allocated by PISA, or
  – Number of students less than allocated by PISA
  – Minimum number of schools in strata > S
  – Effective sample size is fixed (i.e., 1,000)

Better Sample Allocation
Alternative sample allocation.

                                     Sample
  State              Size     Schools  Students   deff   Effective sample
  06 B. CALIFORNIA   Small        4        29
  05 B. CALIFORNIA   Medium       4       100
  04 B. CALIFORNIA   Large       26       900
  Total                          34     1,029     1.03   1,000
  96 ZACATECAS       Small       26       100
  95 ZACATECAS       Medium       6       149
  94 ZACATECAS       Large       27       954
  Total                          59     1,203     1.20   1,000

  Total (both states)    Option A   Option B   Difference
  Sampled students       2,504      2,232      272
  Effective sample       1,903      2,000      -97

Alternative Design Evaluation
  Allocation    Number of schools   Number of students
  2009          1,442               40,264
  Alternative   1,306               31,051
  Difference    136                 9,213
  %                                 77.1%

Summary and Recommendations
• The cyclical nature of PISA offers opportunities to improve the sample designs for future cycles
• QC procedures are needed to track high values of design effects and avoid the 2003 and 2006 situation
• Review the sample allocation rules when lower subsampling is implemented
• Consider using the modal grade or total number of students as a proxy for the total number of 15-year-olds to reduce variance

Summary (continued)
• Explore reduction of variance by changing the sampling rate within schools
• Countries should get involved to improve the design of national options.
  This can translate into considerable savings
• Objectives of national options should be clearly defined so that the sample design can be worked out while ensuring that the PISA objectives are always met

Contact Information
Ismael Flores Cervantes
[email protected]
Gustavo Flores Vázquez
[email protected]

• Review documents: reports, tables with errors

Number of Sampled Students
                                 2009                        Alternative
  School ID  State            Sampled    Student        Sampled    Student
                              students   weight         students   weight
  12         Aguascalientes   35         14.5           35         7.2
  11         Aguascalientes   35         10.7           40         9.4
  8          Aguascalientes   35         8.7            40         7.6
  1          Aguascalientes   35         7.9            38         7.2
  14         Aguascalientes   35         7.8            38         7.2
  6          Aguascalientes   35         8.0            39         7.2
  21         Aguascalientes   35         6.9            33         7.2
  19         Aguascalientes   35         8.9            40         7.8
  …          …                …          …              …          …
  7          Aguascalientes   35         7.7            38         7.2
  5          Aguascalientes   35         9.3            40         8.2
  20         Aguascalientes   35         3.7            20         6.5
  3          Aguascalientes   35         5.5            27         7.2
  18         Aguascalientes   29         5.3            26         7.2
  Total                       825                       825

  Ratio CV: 42.31%

Optimization Problem
• Choose n students within each school so as to minimize the variation of student weights in the stratum
• Upper and lower bounds
  – Maximum 40 students
  – Minimum 25 students
  – Same total number of students as initially allocated

• Do not use

PISA Research Conference 2009
• A particular focus of the conference will be on issues concerning the quality and improvement of the assessment, and an important outcome will be to identify issues that may feed into an agenda for future research and development activities

Better Sample Allocation
Options A and B side by side. Relative sampling rates are shown with the large stratum set to 1 and with the small stratum set to 1.

                                             Frame                               Sample               Rel. smp rates
  Option  State                Size     # Schools        # Students         Schools  Students   (large = 1)  (small = 1)   deff   Effective sample
  A       06 BAJA CALIFORNIA   Small      266    44%      1,915     5%          5        36         0.5          1.0
  A       05 BAJA CALIFORNIA   Medium     106    18%      2,652     7%          3        75         0.7          1.5
  A       04 BAJA CALIFORNIA   Large      229    38%     31,053    87%         34     1,190         1.0          2.0
  A       Total                           601   100%     35,620   100%         42     1,301                                1.03   1,261
  A       96 ZACATECAS         Small      856    82%      3,313    21%         11        43         0.1          1.0
  A       95 ZACATECAS         Medium      77     7%      1,864    12%          6       145         0.8          6.1
  A       94 ZACATECAS         Large      105    10%     10,768    68%         29     1,015         1.0          7.3
  A       Total                         1,038   100%     15,945   100%         46     1,203                                1.87   642
  B       06 BAJA CALIFORNIA   Small                                            4        29
  B       05 BAJA CALIFORNIA   Medium                                           4       100
  B       04 BAJA CALIFORNIA   Large                                           26       900
  B       Total                                                                34     1,029                                1.03   1,000
  B       96 ZACATECAS         Small                                           26       100
  B       95 ZACATECAS         Medium                                           6       149
  B       94 ZACATECAS         Large                                           27       954
  B       Total                                                                59     1,203                                1.20   1,000

  Total (both states)    Option A   Option B   Difference
  Sampled students       2,504      2,232      272
  Effective sample       1,903      2,000      -97

Mexico's Participation

Sample Size
Average Total Design Effect
  Year   Mexico   Canada   Italy   Spain
  2000   5        6        4       5
  2003   53       12       11      8
  2006   34       12       11      14

Total Design Effect in 2000
• DEFF (5)
  Number of countries   34
  Overall mean          4.9
  Mexico mean           5.1

  Quantile        Value
  100% Maximum    15.1
  95%             13.9
  90%             8.5
  75% Q3          6.2
  50% Median      4.2
  25% Q1          2.3
  10%             1.8
  5%              0.9
  1%              0.8
  0% Minimum      0.8
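
As a rough illustration of how the design effects reported in this deck translate into usable sample, the sketch below applies the relation n_effective = n_nominal / DEFF from the "Design Effects and Effective Sample" slide. The function name `effective_sample_size` is hypothetical. The inputs are the three example rows from that slide plus the Baja California totals from the allocation table (1,301 sampled students, deff 1.03).

```python
def effective_sample_size(n_nominal: float, deff: float) -> float:
    """Return the effective sample size n_nominal / DEFF."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return n_nominal / deff


# (n_nominal, DEFF) pairs from the example table in the deck
slide_examples = [(1000, 2), (2000, 4), (1000, 4)]

for n_nominal, deff in slide_examples:
    n_eff = effective_sample_size(n_nominal, deff)
    print(f"n_nominal={n_nominal:>5}  DEFF={deff}  n_effective={n_eff:.0f}")

# Baja California totals from the allocation table: 1,301 students, deff 1.03.
# The result lands close to the 1,261 shown in the table; the small gap is
# presumably due to the deff being rounded to two decimals on the slide.
baja_california = effective_sample_size(1301, 1.03)
print(f"Baja California effective sample: {baja_california:.0f}")
```

Read the other way, the same relation says that a fixed effective sample of 1,000 needs a nominal sample of about 1,000 × DEFF, which is why the higher deff in Zacatecas is costly.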
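
The deck does not say how the "DEFF from Sample Allocation" column A was computed. One common approximation, assumed here purely for illustration, is Kish's design effect for disproportionate allocation across strata with roughly equal unit variances: DEFF ≈ (Σ W_h f_h)(Σ W_h / f_h), where W_h is the stratum's share of the population and f_h its sampling rate. The function name `allocation_deff` is hypothetical, and the inputs are the rounded student counts from the 2000 and 2003 rows of that slide.

```python
def allocation_deff(frame_counts, sample_counts):
    """Kish-style design effect from unequal sampling rates across strata.

    Assumes equal unit variances in all strata (an assumption of this
    sketch, not something stated in the deck).
    """
    if len(frame_counts) != len(sample_counts):
        raise ValueError("need one sample count per stratum")
    if any(N_h <= 0 for N_h in frame_counts) or any(n_h <= 0 for n_h in sample_counts):
        raise ValueError("all counts must be positive")

    total = sum(frame_counts)
    shares = [N_h / total for N_h in frame_counts]                       # W_h
    rates = [n_h / N_h for n_h, N_h in zip(sample_counts, frame_counts)]  # f_h

    weighted_rate = sum(W * f for W, f in zip(shares, rates))
    weighted_inverse = sum(W / f for W, f in zip(shares, rates))
    return weighted_rate * weighted_inverse


# Small, medium, large strata: number of students and sampled students
deff_2000 = allocation_deff([120_000, 170_000, 670_000], [200, 400, 1_900])
deff_2003 = allocation_deff([250_000, 210_000, 610_000], [500, 700, 28_800])

print(f"2000 allocation DEFF: {deff_2000:.2f}")
print(f"2003 allocation DEFF: {deff_2003:.2f}")
```

With the rounded counts this gives values in the same range as the 1.1 and 5.4 shown on the slide, though not identical, so the authors may have used unrounded counts or a different formula. A proportional allocation (equal f_h in every stratum) returns 1 under this approximation.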