What is ecological inference ( )? EI

What is ecological inference (EI)?
eiPack: Tools for R × C Ecological Inference and
Higher-Dimension Data Management
Olivia Lau
Ryan T. Moore
Goal: infer individual level behavior from
aggregate data
Unit of analysis: contingency table with
observed marginals
Michael Kellermann
row1
row2
row3
Department of Government
Institute for Quantitative Social Science
Harvard University
Vienna, Austria
16 June 2006
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
What is ecological inference (EI)?
Goal: infer individual level behavior from
aggregate data
Unit of analysis: contingency table with
observed marginals
row1
row2
row3
col1
N11i
N21i
N31i
N·1i
col2
N12i
N22i
N32i
N·2i
col3
N13i
N23i
N33i
N·3i
col1
N11i
N21i
N31i
N·1i
Olivia Lau, Ryan T. Moore, Michael Kellermann
col2
N12i
N22i
N32i
N·2i
col3
N13i
N23i
N33i
N·3i
N1·i
N2·i
N3·i
Ni
eiPack: R × C Ecological Inference and Data Management
eiPack
Other packages focus on 2 × 2 inference
(e.g., eco, MCMCpack)
eiPack: R × C inference
N1·i
N2·i
N3·i
Ni
eiPack methods estimate unobserved
internal cells (or functions thereof)
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
eiPack
eiPack
Other packages focus on 2 × 2 inference
(e.g., eco, MCMCpack)
eiPack: R × C inference
eiPack methods:
Method of bounds
Ecological regression
Multinomial-Dirichlet model
Other packages focus on 2 × 2 inference
(e.g., eco, MCMCpack)
eiPack: R × C inference
eiPack methods:
Method of bounds
Ecological regression
Multinomial-Dirichlet model
eiPack data: senc
Individual level party affiliation
Black, White, and Native American voters
8 counties (212 precincts) in SE North Carolina
Cell counts known
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
eiPack
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
eiPack
The models implemented in eiPack share:
The models implemented in eiPack share:
A common input syntax of the form:
cbind(col1, ..., colC) ∼ cbind(row1, ...,rowR)
Functions to calculate proportions of some
subset of columns
Appropriate print, summary, and plot
functions
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Method of bounds
Method of bounds
Quantity of interest: proportion of row
members in each column for each unit
Observed row and column marginals
determine upper and lower bounds
eiPack: R × C Ecological Inference and Data Management
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
51
65
58 63
0.8
52 61
54
75
67
37
68
118
0.6
139
39
18
144
71
30
89
212
111
113
90
91
86
35
104
97
92
85
31
2829 34
25
0.4
Proportion Democratic
Quantity of interest: proportion of row
members in each column for each unit
Observed row and column marginals
determine upper and lower bounds
Row thresholds implemented for extreme
case analysis
Output:
$white.dem
lower upper
18 0.519 0.559
25 0.450 0.469
28 0.392 0.487
1.0
Method of bounds
94
96
95
99 110
122
117
123 128 130 137
115
127
120
131
129
147
145
98
207
200
88
0.2
Method of bounds
eiPack: R × C Ecological Inference and Data Management
Olivia Lau, Ryan T. Moore, Michael Kellermann
0.0
Olivia Lau, Ryan T. Moore, Michael Kellermann
Quantity of interest: proportion of row
members in each column for each unit
Observed row and column marginals
determine upper and lower bounds
Row thresholds implemented for extreme
case analysis
Precincts at least 90% White
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Ecological regression
1.0
Method of bounds
51
65
0.8
●52 61 ●
54● ●
67
●●
18
39
●
●
30
●
●
31
2829 34
25 ● ●
●
●
35
●
●
68
●
118
●
139
●144
212
●
122
117
123 128 130 137 ●
113 ●
●
147
90
● 127●
●
131
91
115
99●110 ●
120
●86
145
●●
●
94
● 129
●
●
96
● 89● ●
● ●
●
●●95
●
●
98
●
●
●
207
●●
●
200
●
●
71
● 85
92
97
104
111
88
●
0.2
0.4
0.6
37
75
●
0.0
Proportion Democratic
58 63
●
Express data as proportions of row totals
Regress each column on all row
proportions (C regressions)
Coefficients estimate cell proportions
Precincts at least 90% White
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Ecological regression
Express data as proportions of row totals
Regress each column on all row
proportions (C regressions)
Coefficients estimate cell proportions
eiPack: freq. and Bayesian regression
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Ecological regression
Express data as proportions of row totals
Regress each column on all row
proportions (C regressions)
Coefficients estimate cell proportions
eiPack: freq. and Bayesian regression
lambda functions calculate shares of a
subset of columns – e.g. “among Blacks,
Dem. share of 2-party registration”
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Multinomial-Dirichlet (MD) model
20
Express data as counts
Fit hierarchical Bayesian model
0
Density
40
Ecological regression
−0.2
0.2
0.6
Level 1: column marginals ∼ Multinomial, ⊥
⊥
across units
Level 2: rows of cell fractions ∼ Dirichlet, ⊥
⊥
across rows and units
Level 3: Dirichlet parameters ∼ Gamma, i.i.d.
1.0
20
0
Density
40
Proportion Democratic
−0.2
0.2
0.6
1.0
Proportion Republican
Level 1: column marginals ∼ Multinomial, ⊥
⊥
across units
Level 2: rows of cell fractions ∼ Dirichlet, ⊥
⊥
across rows and units
Level 3: Dirichlet parameters ∼ Gamma, i.i.d.
1.0
0.8
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●●● ● ●
●
● ●
●● ●
●
●
●
●
● ●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
● ●
●
●
● ●
●●
●
●
● ● ●●
●●
● ●
●
●● ●
●
●
●●●
●
●
●
●●
● ●
●● ● ●
●
●●
●●● ●
●
●
●
●
●●
● ●
●● ●
●
●
● ●
●
●
●
● ●
●
●
●
● ●●
● ● ●
●
● ●●
●
●● ●
●
●
●● ●
● ● ●● ● ●
● ● ●
● ●
●
●
● ●
●
● ● ●●
●●
●● ●
●
●
●
●
●
●
●● ●●
●
●
●
●
● ●
●
● ●
● ●
●
●
●
●
●
●
● ●
●● ● ●
●
●
●
●
●
●
●
●
● ● ● ●
●●
●●
●
● ●
●
●● ●
●
●●● ●
●
●
●
●
●
●
●●
●●
●●
● ● ● ●
●
●●
●●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
● ● ●● ●
●
●
●
●
●
● ●
● ●
● ●
●
●●
●●
●
●●
●
●● ●
●
●
●
●● ● ●●
●
●
●
●
●
● ●
●
●●
●
●
●
● ●●
● ●●
● ●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
0.0
lambda and density.plot functions
●
0.6
Express data as counts
Fit hierarchical Bayesian model
Multinomial-Dirichlet (MD) model
0.4
Multinomial-Dirichlet (MD) model
eiPack: R × C Ecological Inference and Data Management
Olivia Lau, Ryan T. Moore, Michael Kellermann
0.2
eiPack: R × C Ecological Inference and Data Management
Proportion of White Democrats
Olivia Lau, Ryan T. Moore, Michael Kellermann
0.0
0.2
0.4
0.6
0.8
1.0
Proportion White in precinct
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Data Management
Data Management
Reasonable-sized problems produce
unreasonable amounts of data
Reasonable-sized problems produce
unreasonable amounts of data
E.g., a model for voting in Ohio includes
11000 precincts
3 racial groups
4 party options
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Data Management
eiPack: R × C Ecological Inference and Data Management
Data Management
Reasonable-sized problems produce
unreasonable amounts of data
E.g., a model for voting in Ohio includes
11000 precincts
3 racial groups
4 party options
Reasonable-sized problems produce
unreasonable amounts of data
E.g., a model for voting in Ohio includes
11000 precincts
3 racial groups
4 party options
1000 iterations yields about 1.3 × 108
parameter draws
Draws occupy ≈ 1GB of RAM; probably not
enough iterations
Olivia Lau, Ryan T. Moore, Michael Kellermann
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
1000 iterations yields about 1.3 × 108
parameter draws
Draws occupy ≈ 1GB of RAM; probably not
enough iterations
eiPack allows users to write chains to
disk, or discard chains not of interest
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management
Visit our poster for more!
Olivia Lau, Ryan T. Moore, Michael Kellermann
eiPack: R × C Ecological Inference and Data Management