What is ecological inference (EI)? eiPack: Tools for R × C Ecological Inference and Higher-Dimension Data Management Olivia Lau Ryan T. Moore Goal: infer individual level behavior from aggregate data Unit of analysis: contingency table with observed marginals Michael Kellermann row1 row2 row3 Department of Government Institute for Quantitative Social Science Harvard University Vienna, Austria 16 June 2006 Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management What is ecological inference (EI)? Goal: infer individual level behavior from aggregate data Unit of analysis: contingency table with observed marginals row1 row2 row3 col1 N11i N21i N31i N·1i col2 N12i N22i N32i N·2i col3 N13i N23i N33i N·3i col1 N11i N21i N31i N·1i Olivia Lau, Ryan T. Moore, Michael Kellermann col2 N12i N22i N32i N·2i col3 N13i N23i N33i N·3i N1·i N2·i N3·i Ni eiPack: R × C Ecological Inference and Data Management eiPack Other packages focus on 2 × 2 inference (e.g., eco, MCMCpack) eiPack: R × C inference N1·i N2·i N3·i Ni eiPack methods estimate unobserved internal cells (or functions thereof) Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management eiPack eiPack Other packages focus on 2 × 2 inference (e.g., eco, MCMCpack) eiPack: R × C inference eiPack methods: Method of bounds Ecological regression Multinomial-Dirichlet model Other packages focus on 2 × 2 inference (e.g., eco, MCMCpack) eiPack: R × C inference eiPack methods: Method of bounds Ecological regression Multinomial-Dirichlet model eiPack data: senc Individual level party affiliation Black, White, and Native American voters 8 counties (212 precincts) in SE North Carolina Cell counts known Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management eiPack Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management eiPack The models implemented in eiPack share: The models implemented in eiPack share: A common input syntax of the form: cbind(col1, ..., colC) ∼ cbind(row1, ...,rowR) Functions to calculate proportions of some subset of columns Appropriate print, summary, and plot functions Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Method of bounds Method of bounds Quantity of interest: proportion of row members in each column for each unit Observed row and column marginals determine upper and lower bounds eiPack: R × C Ecological Inference and Data Management Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management 51 65 58 63 0.8 52 61 54 75 67 37 68 118 0.6 139 39 18 144 71 30 89 212 111 113 90 91 86 35 104 97 92 85 31 2829 34 25 0.4 Proportion Democratic Quantity of interest: proportion of row members in each column for each unit Observed row and column marginals determine upper and lower bounds Row thresholds implemented for extreme case analysis Output: $white.dem lower upper 18 0.519 0.559 25 0.450 0.469 28 0.392 0.487 1.0 Method of bounds 94 96 95 99 110 122 117 123 128 130 137 115 127 120 131 129 147 145 98 207 200 88 0.2 Method of bounds eiPack: R × C Ecological Inference and Data Management Olivia Lau, Ryan T. Moore, Michael Kellermann 0.0 Olivia Lau, Ryan T. Moore, Michael Kellermann Quantity of interest: proportion of row members in each column for each unit Observed row and column marginals determine upper and lower bounds Row thresholds implemented for extreme case analysis Precincts at least 90% White Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Ecological regression 1.0 Method of bounds 51 65 0.8 ●52 61 ● 54● ● 67 ●● 18 39 ● ● 30 ● ● 31 2829 34 25 ● ● ● ● 35 ● ● 68 ● 118 ● 139 ●144 212 ● 122 117 123 128 130 137 ● 113 ● ● 147 90 ● 127● ● 131 91 115 99●110 ● 120 ●86 145 ●● ● 94 ● 129 ● ● 96 ● 89● ● ● ● ● ●●95 ● ● 98 ● ● ● 207 ●● ● 200 ● ● 71 ● 85 92 97 104 111 88 ● 0.2 0.4 0.6 37 75 ● 0.0 Proportion Democratic 58 63 ● Express data as proportions of row totals Regress each column on all row proportions (C regressions) Coefficients estimate cell proportions Precincts at least 90% White Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Ecological regression Express data as proportions of row totals Regress each column on all row proportions (C regressions) Coefficients estimate cell proportions eiPack: freq. and Bayesian regression Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Ecological regression Express data as proportions of row totals Regress each column on all row proportions (C regressions) Coefficients estimate cell proportions eiPack: freq. and Bayesian regression lambda functions calculate shares of a subset of columns – e.g. “among Blacks, Dem. share of 2-party registration” Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Multinomial-Dirichlet (MD) model 20 Express data as counts Fit hierarchical Bayesian model 0 Density 40 Ecological regression −0.2 0.2 0.6 Level 1: column marginals ∼ Multinomial, ⊥ ⊥ across units Level 2: rows of cell fractions ∼ Dirichlet, ⊥ ⊥ across rows and units Level 3: Dirichlet parameters ∼ Gamma, i.i.d. 1.0 20 0 Density 40 Proportion Democratic −0.2 0.2 0.6 1.0 Proportion Republican Level 1: column marginals ∼ Multinomial, ⊥ ⊥ across units Level 2: rows of cell fractions ∼ Dirichlet, ⊥ ⊥ across rows and units Level 3: Dirichlet parameters ∼ Gamma, i.i.d. 1.0 0.8 ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ●●● ● ● ● ●● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 lambda and density.plot functions ● 0.6 Express data as counts Fit hierarchical Bayesian model Multinomial-Dirichlet (MD) model 0.4 Multinomial-Dirichlet (MD) model eiPack: R × C Ecological Inference and Data Management Olivia Lau, Ryan T. Moore, Michael Kellermann 0.2 eiPack: R × C Ecological Inference and Data Management Proportion of White Democrats Olivia Lau, Ryan T. Moore, Michael Kellermann 0.0 0.2 0.4 0.6 0.8 1.0 Proportion White in precinct Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Data Management Data Management Reasonable-sized problems produce unreasonable amounts of data Reasonable-sized problems produce unreasonable amounts of data E.g., a model for voting in Ohio includes 11000 precincts 3 racial groups 4 party options Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Data Management eiPack: R × C Ecological Inference and Data Management Data Management Reasonable-sized problems produce unreasonable amounts of data E.g., a model for voting in Ohio includes 11000 precincts 3 racial groups 4 party options Reasonable-sized problems produce unreasonable amounts of data E.g., a model for voting in Ohio includes 11000 precincts 3 racial groups 4 party options 1000 iterations yields about 1.3 × 108 parameter draws Draws occupy ≈ 1GB of RAM; probably not enough iterations Olivia Lau, Ryan T. Moore, Michael Kellermann Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management 1000 iterations yields about 1.3 × 108 parameter draws Draws occupy ≈ 1GB of RAM; probably not enough iterations eiPack allows users to write chains to disk, or discard chains not of interest Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management Visit our poster for more! Olivia Lau, Ryan T. Moore, Michael Kellermann eiPack: R × C Ecological Inference and Data Management
© Copyright 2024