Poster - Jonathan K. Nelson

EXPLORING ACROSS-SCALE RELATIONSHIPS IN SPATIALLY AGGREGATED DATA:
INFORMING THE MODIFIABLE AREAL UNIT PROBLEM
Jonathan K. Nelson and Cynthia A. Brewer
Penn State Department of Geography
To better understand the stability of these data across the three scales, we first spatially joined
values from upper-level aggregates to the lower-level aggregates nested within them (fig. 2). The
spatial joins produced tract-level data with county values appended, and block group-level data
with both county and tract values appended.
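Because block groups nest cleanly within tracts and tracts within counties, the same result can be approximated with an attribute join on Census GEOIDs, whose leading characters identify the enclosing units (2-digit state, 3-digit county, 6-digit tract, 1-digit block group). This is a minimal sketch under that nesting assumption, not the GIS spatial join used for the poster. The function name `append_upper_levels` and the GEOIDs are illustrative (42 = Pennsylvania, 101 = Philadelphia County), and the income values are taken from the Figure 2b records.

```python
# Hypothetical sketch: append county and tract values to nested block groups
# by slicing Census GEOID prefixes. Assumes perfect nesting of the units.

COUNTY_LEN = 5   # state (2) + county (3)
TRACT_LEN = 11   # county GEOID (5) + tract (6)


def append_upper_levels(block_groups, tracts, counties):
    """Return block group records with tract and county values appended."""
    joined = []
    for geoid, bg_value in block_groups.items():
        joined.append({
            "bg_geoid": geoid,
            "bg_value": bg_value,
            # The first 11 characters of a block group GEOID identify its tract.
            "tract_value": tracts[geoid[:TRACT_LEN]],
            # The first 5 characters identify its county.
            "county_value": counties[geoid[:COUNTY_LEN]],
        })
    return joined


# Median household income values from Figure 2b; GEOIDs are illustrative.
counties = {"42101": 36251}
tracts = {"42101000801": 79000, "42101000803": 59135}
block_groups = {
    "421010008011": 79000,   # Census Tract 8.01, Block Group 1
    "421010008033": 106658,  # Census Tract 8.03, Block Group 3
}

for row in append_upper_levels(block_groups, tracts, counties):
    print(row)
```

Slicing the first five characters of a tract GEOID appends county values to tracts in the same way.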
[Figure 2a panel (a): spatial join of values from upper-level aggregates to nested lower-level aggregates.]
We demonstrate our statistical-visual approach to exploring the scalar effects of
MAUP using two variables and three spatial scales. Figure 1 maps both datasets at
all three scales. Classification breaks are held constant across the three aggregation
levels so that differences between the maps reflect the effect of scale on data
variability. Our aim is to advance understanding of the (in)stability of these data
across the three scales.
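Holding classification breaks constant means every scale is classified with one shared set of class boundaries rather than with breaks computed separately for each scale. A minimal sketch, assuming hypothetical break values (the poster does not list them) and using Python's `bisect` to assign classes; the data are the Figure 2b income values.

```python
from bisect import bisect_right

# Hypothetical class breaks (upper bounds of the lower classes), shared by all scales.
INCOME_BREAKS = [30000, 45000, 60000, 80000]


def classify(values, breaks):
    """Return a class index (0..len(breaks)) for each value using fixed breaks."""
    # bisect_right places a value equal to a break in the higher class.
    return [bisect_right(breaks, v) for v in values]


# Applying the same breaks at every level means a class denotes the same
# income range on the county, tract, and block group maps.
county = [36251]
tract = [79000, 59135, 62589, 39265]
block_group = [79000, 40795, 51574, 106658, 107727,
               64643, 72431, 48750, 53897, 25810]

for name, vals in [("county", county), ("tract", tract), ("block group", block_group)]:
    print(name, classify(vals, INCOME_BREAKS))
```

In this sketch the ten block groups spread across all five classes while the single county value occupies one, which is the kind of scale-driven variability constant breaks make visible.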
Spatial Scales
1. Block Group
2. Tract
3. County
Why these Spatial Scales?
● well-defined; complete
● statistically uniform
● reliable data
● common in literature
● clean, nesting relationship
Figure 1: PA Median Income and NY Cancer
Diagnosis Rates at the (a) block group level, (b) tract
level, and (c) county level.
STATE         COUNTY               COUNTY INCOME  TRACT              TRACT INCOME  BLOCK GROUP    BLOCK GROUP INCOME
Pennsylvania  Philadelphia County  $36,251        Census Tract 8.01  $79,000       Block Group 1  $79,000
Pennsylvania  Philadelphia County  $36,251        Census Tract 8.03  $59,135       Block Group 1  $40,795
Pennsylvania  Philadelphia County  $36,251        Census Tract 8.03  $59,135       Block Group 2  $51,574
Pennsylvania  Philadelphia County  $36,251        Census Tract 8.03  $59,135       Block Group 3  $106,658
Pennsylvania  Philadelphia County  $36,251        Census Tract 8.03  $59,135       Block Group 4  $107,727
Pennsylvania  Philadelphia County  $36,251        Census Tract 8.04  $62,589       Block Group 1  $64,643
Pennsylvania  Philadelphia County  $36,251        Census Tract 8.04  $62,589       Block Group 2  $72,431
Pennsylvania  Philadelphia County  $36,251        Census Tract 8.04  $62,589       Block Group 3  $48,750
Pennsylvania  Philadelphia County  $36,251        Census Tract 9.01  $39,265       Block Group 1  $53,897
Pennsylvania  Philadelphia County  $36,251        Census Tract 9.01  $39,265       Block Group 2  $25,810
Figure 2b: Sample of records underlying PA block group
topology after spatial join processing (right). All ten block groups
fall within four census tracts in Philadelphia County, so there
is one median income value for the county, one value for each
of the four tracts, and an individual value for each of the ten
block groups. Note how different some of these values are,
given that they are spatially nested across scales. In the row
highlighted in red, we see a county value of ~$36,000, a tract
value of ~$59,000, and a block group value of over $100,000.
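The discrepancies described in the caption can be quantified by comparing each block group's value with its tract and county values. The records below are copied from the Figure 2b table; the helper name `largest_gap` is hypothetical.

```python
# Figure 2b records: (tract, block group, county income, tract income, block group income)
RECORDS = [
    ("8.01", 1, 36251, 79000, 79000),
    ("8.03", 1, 36251, 59135, 40795),
    ("8.03", 2, 36251, 59135, 51574),
    ("8.03", 3, 36251, 59135, 106658),
    ("8.03", 4, 36251, 59135, 107727),
    ("8.04", 1, 36251, 62589, 64643),
    ("8.04", 2, 36251, 62589, 72431),
    ("8.04", 3, 36251, 62589, 48750),
    ("9.01", 1, 36251, 39265, 53897),
    ("9.01", 2, 36251, 39265, 25810),
]


def largest_gap(records):
    """Return the record whose block group value differs most from its tract value."""
    return max(records, key=lambda r: abs(r[4] - r[3]))


tract, bg, county_inc, tract_inc, bg_inc = largest_gap(RECORDS)
# Difference from the tract value and ratio to the county value.
print(f"Tract {tract}, block group {bg}: "
      f"{bg_inc - tract_inc:+,} vs. tract, {bg_inc / county_inc:.2f}x county")
```

Here the largest gap is Census Tract 8.03, Block Group 4, whose median income is $48,592 above its tract value and nearly three times the county value.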
Next, we performed bivariate local indicators of spatial association (LISA) analyses (Anselin 1995,
Geog Analy 27:2). LISA is a local measure of spatial autocorrelation that decomposes the global
Moran's I statistic into an individual indicator for each areal unit. Bivariate LISA analysis
allows values of one variable to be regressed on neighboring values of a different variable. Here,
we apply LISA to quantify across-scale
autocorrelation: values of the
lower-level aggregates were standardized
to z-scores (mean of zero, variance of one)
and regressed on the standardized,
spatially lagged (neighboring) values of the
appended upper-level aggregates.
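The standardization and cross-variable lag can be written compactly. For lower-level values x and appended upper-level values y, both converted to z-scores, the spatial lag of y at a unit is the average of its neighbors' standardized y values (row-standardized weights); the local bivariate statistic is the unit's own z-score times that lag, and the global statistic is the mean of the local values, which equals the slope of the Moran scatterplot line. This is a plain-Python sketch rather than the software used for the poster, and the neighbor list and data are hypothetical, since the spatial weights are not specified here.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Standardize to mean 0 and variance 1 (population standard deviation)."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def bivariate_moran(x, y, neighbors):
    """Global and local bivariate Moran's I with row-standardized weights.

    x: lower-level values; y: upper-level values appended to the same
    lower-level units; neighbors: dict mapping unit index -> neighbor indices.
    """
    zx, zy = zscores(x), zscores(y)
    # Spatial lag of y: mean of the standardized y values of each unit's neighbors.
    lag = [fmean(zy[j] for j in neighbors[i]) for i in range(len(x))]
    # Local indicator for each unit: its own zx times the lagged zy.
    local = [a * b for a, b in zip(zx, lag)]
    # The global statistic is the mean of the local indicators, which equals
    # the OLS slope of lag on zx because the squared z-scores sum to n.
    return fmean(local), local, zx, lag


# Hypothetical example: six block groups in a row, each with its tract value appended.
x = [1, 2, 3, 10, 11, 12]   # block group values
y = [2, 2, 2, 11, 11, 11]   # appended tract values (two tracts)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

global_i, local_i, zx, lag = bivariate_moran(x, y, neighbors)
print(round(global_i, 3))
```

Plotting `zx` against `lag` gives the Moran scatterplot; the positive global value here reflects block groups whose values track the tract values of their neighbors.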
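[Figure 3 panels: PA Median Income: Tract against County, Moran's I scatterplot (I = 0.55, n = 3217, p < 0.01); NY Cancer Diagnosis Rates: Tract against County, Moran's I scatterplot (I = 0.41, n = 4901, p < 0.01). X-axis: standardized values, lower-level aggregate; y-axis: spatially lagged values, upper-level aggregate; color: LISA indices, from similar to dissimilar.]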
● How do median household income values at one scale correlate with nearby median household income values of a different scale?
● How do cancer diagnosis rates at one scale correlate with nearby cancer diagnosis rates of a different scale?
[Figure 3 panels: NY Cancer Diagnosis Rates: Block Group against County, Moran's I scatterplot (I = 0.31, n = 15247, p < 0.01); PA Median Income: Block Group against County, Moran's I scatterplot (I = 0.51, n = 9738, p < 0.01).]
2. Bivariate LISA Analysis
Why these Variables?
● commonly analyzed in aggregate form
● vary differently across space
● spatially adjacent geographies for effective cartographic comparison
2. Cancer Diagnosis Rates (2005-09), NY
Figure 2a: Schematic diagram of spatial join
process (left). County values are appended to
nested tracts and block groups. Tract values are
appended to nested block groups.
[Figure 3 panels: PA Median Income: Block Group against Tract, Moran's I scatterplot (I = 0.69, n = 9738, p = 0.01); NY Cancer Diagnosis Rates: Block Group against Tract, Moran's I scatterplot (I = 0.46, n = 15247, p < 0.01).]
Figure 4: Bivariate choropleth maps of across-scale
relationships of PA median income.
Figure 5: Bivariate choropleth maps of across-scale
relationships of NY cancer rates.
The Moran's I scatterplots (fig. 3) plot standardized median income values and
cancer diagnosis rates of lower-level aggregates against the spatially lagged values
and rates of upper-level aggregates. Color represents LISA indices and visually
reinforces areas in the plots that convey similar (green), dissimilar (purple), and
random (brown) across-scale relationships.
The maps in a, b, and c of figures 4 and 5 integrate the type of spatial autocorrelation
with its level of significance. Hue conveys the type of across-scale spatial autocorrelation:
purple denotes negative autocorrelation, brown denotes little or no autocorrelation,
and green denotes positive autocorrelation. Lightness within each hue represents
the level of significance for a given type of spatial autocorrelation, with darker
shades indicating greater significance.
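Mapping significance to lightness requires a p-value for each local statistic. A common approach is a conditional permutation test, in which a unit's own value is held fixed while its neighbors' values are reshuffled. The sketch below is a simplified one-sided version under stated assumptions: 999 permutations and cutoffs of 0.05, 0.01, 0.001, and 0.0001 (the levels offered in GeoDa-style significance maps) are assumed rather than taken from the poster, and `local_pseudo_p` and `map_class` are hypothetical names.

```python
import random
from statistics import fmean


def local_pseudo_p(i, zx, zy, neighbors, permutations=999, seed=0):
    """Simplified one-sided conditional permutation pseudo p-value at unit i.

    zx[i] (the standardized lower-level value) is held fixed; the neighbors'
    standardized upper-level values are replaced by random draws without
    replacement from all other units.
    """
    rng = random.Random(seed)
    k = len(neighbors[i])
    observed = zx[i] * fmean(zy[j] for j in neighbors[i])
    others = [zy[j] for j in range(len(zy)) if j != i]
    extreme = 0
    for _ in range(permutations):
        simulated = zx[i] * fmean(rng.sample(others, k))
        # Count draws at least as extreme, in the direction of the observed value.
        if observed >= 0:
            extreme += int(simulated >= observed)
        else:
            extreme += int(simulated <= observed)
    return (extreme + 1) / (permutations + 1)


def map_class(local_i, p_value, cutoffs=(0.0001, 0.001, 0.01, 0.05)):
    """Return (hue, lightness) for a LISA map; lightness 0 is the darkest shade."""
    for level, cutoff in enumerate(cutoffs):
        if p_value <= cutoff and local_i != 0:
            # Positive local statistic: similar across scales; negative: dissimilar.
            return ("green" if local_i > 0 else "purple", level)
    # Not significant at any cutoff: little or no across-scale autocorrelation.
    return ("brown", None)


# Hypothetical data: unit 0 is high and its neighbors carry the highest upper-level values.
zx = [2.0] + [0.0] * 19
zy = [v - 9.5 for v in range(20)]
neighbors = {0: [17, 18, 19]}
p = local_pseudo_p(0, zx, zy, neighbors)
local_0 = zx[0] * fmean(zy[j] for j in neighbors[0])
print(p, map_class(local_0, p))
```

Because the pseudo p-value cannot fall below 1/(permutations + 1), the darkest class is reachable only with enough permutations, which is one reason permutation counts such as 999 or more are typical.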
The maps in d, e, and f of figures 4 and 5 depict standardized median income values
and cancer diagnosis rates of lower-level aggregates against the spatially lagged values
and rates of upper-level aggregates. Shades of gray depict similarity between
levels of areal aggregation; shades of pink and green represent areas of
across-scale discordance in median income and cancer rates.
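A bivariate choropleth cross-classifies each lower-level unit's standardized value with the spatially lagged upper-level value. A minimal sketch, assuming three classes per variable split at plus and minus 0.5 standard deviations (the poster does not give its class breaks): cells on the diagonal of the 3x3 grid are concordant (gray), and off-diagonal cells are discordant. Which discordant direction is drawn in pink and which in green is not stated, so the labels below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical breaks in standard deviation units: low / middle / high classes.
BREAKS = [-0.5, 0.5]


def bivariate_class(z_lower, lag_upper, breaks=BREAKS):
    """Return (row, col, label) for one cell of a 3x3 bivariate choropleth."""
    row = bisect_right(breaks, z_lower)    # class of the lower-level value
    col = bisect_right(breaks, lag_upper)  # class of the lagged upper-level value
    if row == col:
        label = "concordant"            # diagonal cell: shades of gray
    elif row > col:
        label = "lower-level higher"    # one discordant hue
    else:
        label = "upper-level higher"    # the other discordant hue
    return row, col, label


for z_lower, lag_upper in [(1.2, 0.9), (1.5, -1.0), (-0.8, 0.2)]:
    print(z_lower, lag_upper, bivariate_class(z_lower, lag_upper))
```

A 3x3 grid keeps the legend readable; finer grids separate more degrees of discordance at the cost of harder-to-distinguish colors.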
FINDINGS
1. Median Household Income (2010), PA
The LISA analyses provided both global and local insights into the spatial
autocorrelation of median household income and cancer diagnosis rates across
scale. However, we needed strategies for visualizing the results to better understand
the spatial patterns of autocorrelation and significance. Statistical output from the six
bivariate LISA analyses was visually transformed into trivariate Moran's I
scatterplots (fig. 3) and bivariate choropleth maps (figs. 4 and 5).
1. Data Processing
VARIABLES, SCALES & RATIONALE
Variables
VISUALIZATION & INTERPRETATION
METHODS
Socioeconomic and health analysts commonly rely on areally aggregated data, in
part because government confidentiality regulations prohibit data release at the
individual level. Analytical results from areally aggregated data, however, are
sensitive to the modifiable areal unit problem (MAUP) (Openshaw 1984, Geo
Books). Levels of aggregation (the scale effect), as well as the arbitrary and
modifiable sizes, shapes, and arrangements of zones (the zoning effect), affect the
validity and reliability of findings from analyses of areally aggregated data. MAUP,
though long acknowledged, remains unresolved (Root 2012, AAG 102:5; Manley 2014,
H Reg Sci). We present an exploratory spatial data analysis approach to
understanding the scalar effects of MAUP.
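A toy example makes both effects concrete: the same eight unit values, aggregated to coarser zones (the scale effect) or regrouped at the same level (the zoning effect), yield different summary statistics. The values and zonings below are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pvariance

# Hypothetical values for eight small areal units arranged in a row.
values = [10, 20, 30, 40, 50, 60, 70, 80]

# Each zoning lists the unit indices that make up each zone.
zonings = {
    "individual units": [[i] for i in range(8)],
    "pairs, adjacent": [[0, 1], [2, 3], [4, 5], [6, 7]],
    "pairs, rearranged": [[0, 7], [1, 6], [2, 5], [3, 4]],
    "quads, adjacent": [[0, 1, 2, 3], [4, 5, 6, 7]],
}


def zone_means(values, zones):
    """Aggregate unit values to zones by averaging."""
    return [fmean(values[i] for i in zone) for zone in zones]


for name, zones in zonings.items():
    means = zone_means(values, zones)
    print(f"{name:18s} variance of zone means = {pvariance(means):g}")
```

Moving from individual units to adjacent pairs to adjacent quads shrinks the variance from 525 to 500 to 400 (scale effect), while rearranging the same units into different pairs collapses it to 0 (zoning effect), even though the underlying values never change.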
INTRODUCTION
Figure 3: Trivariate Moran's I scatterplots for across-scale
relationships of PA median income and NY cancer diagnosis rates.
● Positive GLOBAL spatial autocorrelation for all relationships
● Tract-block group relationships most similar
● County-block group relationships most dissimilar
● Tract-county and block group-county relationships differ in a similar way
● Cancer diagnosis rates show a weaker across-scale signal than median income
● Pockets of LOCAL across-scale instability for all relationships