How to Measure Segregation
Angelo Mele∗
Department of Economics
University of Illinois, Urbana-Champaign
[email protected]
http://netfiles.uiuc.edu/amele2/www
July 30, 2007
Abstract
We introduce a new theoretical framework for the measurement of residential segregation
with two goals: 1) a segregation index should not depend on arbitrary partitions of the city
in neighborhoods, but only on agents’ locations; 2) a segregation index should allow the measurement of segregation of a continuous variable (income) and of multiple attributes (race and
income) together. We assume that the individuals' locations follow a spatial Poisson point process
over the metropolitan area and, conditional on location, we associate to each agent a random
variable (mark) representing his socioeconomic characteristic(s). We construct a new spatial
dissimilarity index and we compare it to existing neighborhood-based indices of dissimilarity.
We provide nonparametric methods for estimating the spatial dissimilarity for the case in
which individual locations data are available and an alternative approximate method when
only summary data at the block level are available. We apply our approach to the analysis of
racial segregation and we analyze both artificial and census data, showing that the ranking of
cities is diﬀerent under our index and the traditional dissimilarity. This last result potentially
challenges the findings of the literature studying the eﬀects of racial segregation on individual
socioeconomic outcomes, since in those studies the level of segregation is measured according to
the neighborhood-based approach.
Keywords: racial segregation, income segregation, point processes, Poisson processes, spatial
statistics, nonparametric estimation
∗ Previous versions of this work circulated under the title: "A Unified Stochastic Framework for Measures of
Socioeconomic Residential Segregation". The first idea for this paper came out during a twenty-minute discussion
with Roger Koenker: he suggested exploring the literature on point processes. That search turned out to be
extremely helpful, at least as much as the following twenty-minute conversations with him. I am also grateful to
Patrick Bayer, Alberto Bisin, Rosa Ferrer, Antonio Galvao, Shadmehr Mehdi, Antonio Mele, Luca Opromolla, Franco
Peracchi, Giorgio Topa, Jungmo Yoon and participants at the Washington University Graduate Students Conference
2006, UIUC Econometrics Lunch Seminar, ESPE 2007 Conference, La Pietra-Mondragone Workshop 2007 for useful
comments and suggestions. All remaining errors are mine.
1 Introduction
In this work we introduce a new theoretical framework for the measurement of spatial segregation
with two main goals: first of all, a segregation index should not depend on arbitrary partitions of
the city in neighborhoods, but only on the individuals’ locations over the urban area; second, a
segregation index should be able to measure the segregation of a continuous variable (income) or
the residential separation on multiple attributes (race, income, education) together.
We provide two main contributions to the literature: we develop a flexible framework for the
measurement of segregation introducing a stochastic process driving the locations of individuals of
diﬀerent racial groups and providing a definition of segregation based on the individual locations
over a metropolitan area; second, we present methods of estimation for point patterns data and
show nonparametric methods for the case in which only count data at the block level are available.
We start by observing that we do not expect people to be located uniformly over the entire
metropolitan area, for a number of reasons: geographic barriers, different building structures, laws,
dedicated areas, etc. Otherwise we would observe some New Yorkers living in Central Park. In fact
there are diﬀerent intensities (mean number of people per unit area) over the urban area, some
neighborhoods showing more population density than others. An index of segregation is a function
of the individuals’ locations summarizing the diﬀerence between the spatial pattern of each (racial)
group and that of the population as a whole.1 The stochastic framework introduced in this work
provides a way to define and measure the diﬀerence among spatial patterns that is both theoretically
sound and easily implementable in empirical analysis.
It is known that spatial separation by race (or other socioeconomic variables) over the urban
area is a specific and historical characteristic of the US cities. In this work we answer the question:
how should we measure the extent of segregation in a metropolitan area? Suppose we want to
measure the segregation of blacks from non-blacks: the traditional approach, which I will refer to
as the neighborhood-based approach, involves the partition of the city in n neighborhoods, then
the computation for each neighborhood i of the share of blacks Bi /Pi , where Pi is the number of
individuals and Bi the number of blacks in neighborhood i. If there is no segregation the fraction of
blacks in each neighborhood Bi /Pi will be equal to the fraction of blacks in the whole city B/P . An
index of segregation is then a synthetic measure of the diﬀerence between the actual distribution
of races across neighborhoods, i.e. the distribution (B1 /P1 , ..., Bn /Pn ), and the distribution arising
when there is no segregation, (B/P, ..., B/P ), with appropriate normalization in order to get a
quantity between zero and one, which is comparable across cities. According to the notion used by
the researcher to measure this diﬀerence one will obtain alternative indices.2
However, all the indices built according to the neighborhood-based approach present some
common problems. First of all, all the indices are based on some partition of the metropolitan area
in neighborhoods (as argued by Echenique and Fryer (2006)), usually census tracts or blocks, making
the measurement directly dependent on the specific partition adopted. Second, if we compute the
index of segregation using diﬀerent levels of aggregation of the data (tracts, block groups or blocks)
we will get different numbers and, even worse, different rankings of the cities in terms of segregation,
a problem known in spatial analysis as the Modifiable Areal Unit Problem (MAUP). Third, the majority
of the indices do not take into account the spatial location of the individuals over the urban area,
thus completely ignoring the inherently spatial nature of the phenomenon. Fourth, the indices are
devoted to the measurement of segregation of a categorical variable (race, occupation): whenever
we are interested in the segregation of a continuous variable (income, education, wealth) we have
to split the continuum into a set of categories (income groups, education groups, etc.) in order to
use the same indices. We think that an ideal segregation index should be able to take into account
the continuity of the variables under consideration, or even better it should be able to measure
segregation on different levels (for example race and income together).
1 The idea applies analogously to income segregation.
2 See Massey and Denton (1988) for an extensive review of the traditional indices. Reardon and Firebaugh (2002)
explicitly provide a discussion of the neighborhood-based approach.
The approach developed in this work starts from the same argument of Echenique and Fryer
(2006) and Reardon and O’Sullivan (2004), considering the individuals and their spatial location
as the primitive of the segregation measure, hence avoiding the problem of arbitrary partitions.
The main innovation is the introduction of a stochastic framework, with the assumption that the
individuals' locations are the realization of a stochastic process, mapping points on the plane. We
build on the theory of point processes, a branch of stochastic geometry often used in disciplines
like biology, epidemiology, astronomy, ecology and geology to model spatial data.
A spatial point process is a stochastic process mapping a countable set (of points) X in a space
S ⊆ R^2 . The fundamental parameter of the process is the Intensity Function λ (ξ), i.e. the expected
number of points of the process in an infinitesimal area around the point ξ in S. The Intensity
Measure Λ (A) is obtained by integration of λ (ξ) over A ⊆ S, and it corresponds to the expected
number of points over A.
An Inhomogeneous Poisson Point Process is a spatial point process defined as follows: 1) for
any area A in S, the number N (A) of points of the process in the region A follows a Poisson
distribution with parameter equal to the Intensity Measure Λ (A); 2) given N (A) = n, the points
are identically and independently distributed over A according to a density f (ξ) = λ (ξ) /Λ (A). A
Marked Poisson Process is a process X = { {ξ, m (ξ)}| ξ ∈ X0 } such that a random mark m ∈ M
is attached to each point of the Poisson process X0 . In our application the mark represents the
racial group of an individual living at location ξ.
We develop a general framework in order to measure segregation, but we present an application
to racial segregation only. Income segregation is analyzed in another paper in progress, since the
estimation methodology is more complicated. The model specifically assumes that:
1. The individuals’ locations X0 are the realization of an Inhomogeneous Poisson Point Process
over the space S with intensity function λ0 (ξ).
2. Conditional on the realization of X0 , the marks m are mutually independent.
3. For any racial group m, the conditional mark distribution ρ (ξ, m, X0 \ ξ) depends only on
the location ξ (it does not depend on the locations of the other points X0 \ ξ).
The first two conditions amount to assuming that the locations of individuals belonging to the
different racial groups follow an Inhomogeneous Marked Poisson Process, i.e. a process with different types and spatially varying intensities. The third condition ensures that the process is
Poisson on the enlarged space S × M and it can be shown (see the technical appendix) that this
process is equivalent to a multivariate Inhomogeneous Poisson Process, a process composed of
independent Poisson Processes, one for each racial group, with intensities of racial group m being
λ (ξ, m) = λ0 (ξ) ρ (ξ, m), where ρ (ξ, m) is the conditional probability that in point ξ there is an
individual of racial group m.
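As an illustration of assumptions 2 and 3, the sketch below attaches marks to a given set of locations, drawing each mark independently from a distribution that depends only on the point's own location. The function names, the two-group setting and the particular form of ρ (the probability of group 1 equal to the x-coordinate) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def draw_marks(locations, rho, rng):
    """Attach one mark to each location.

    locations : (n, 2) array of point coordinates (a realization of X0).
    rho       : function mapping an (n, 2) array to an (n, M) array whose
                rows are the conditional mark probabilities rho(xi, m).
    """
    probs = rho(locations)
    # Inverse-CDF sampling, independently for each point: the mark of a
    # point depends on its own location only (assumptions 2 and 3).
    cdf = np.cumsum(probs, axis=1)
    u = rng.random(len(locations))
    marks = (u[:, None] > cdf).sum(axis=1)
    # Guard against rounding error in the last entry of the cdf.
    return np.minimum(marks, probs.shape[1] - 1)

def rho_gradient(locations):
    """Hypothetical two-group example: P(group 1) equals the x-coordinate."""
    p1 = np.clip(locations[:, 0], 0.0, 1.0)
    return np.column_stack([1.0 - p1, p1])

rng = np.random.default_rng(0)
locations = rng.random((5000, 2))   # illustrative locations on the unit square
marks = draw_marks(locations, rho_gradient, rng)
```

In this example group 1 becomes more frequent towards the right edge of the square, so ρ (ξ, m) varies over space and the simulated pattern is segregated in the sense defined below.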
The marked point process X is completely unsegregated if and only if the conditional probability
ρ (ξ, m) does not vary over space, i.e. ρ (ξ, m) = ρ (m) for all ξ and m. The marked point process
X is completely segregated if and only if the mark distribution is degenerate at each point, i.e. for
all ξ ∈ X0 , there exists an m∗ such that ρ (ξ, m∗ ) = 1 and ρ (ξ, m) = 0 for any m ≠ m∗ .
We present an example of index, constructed according to this approach: we measure the level
of segregation at point ξ as absolute deviation from the completely unsegregated process, by using
the quantity |ρ (ξ, m) − ρ (m)|. The total segregation in the metropolitan area is then defined as the
sum of this quantity over all racial groups and points. The index of spatial dissimilarity is obtained
by normalizing this sum with its value under complete segregation. Such an index is immune from
the problems mentioned above, by definition.
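A minimal sketch of how such an index could be computed once the conditional probabilities are known at the observed points. Two choices here are assumptions of this example, not statements from the paper: ρ (m) is taken to be the observed citywide share of group m, and the normalizing value under complete segregation is computed by making each point's mark distribution degenerate at its observed group.

```python
import numpy as np

def spatial_dissimilarity(rho_at_points, marks):
    """rho_at_points : (n, M) conditional mark probabilities at the n points.
    marks         : (n,) integer group labels in 0..M-1."""
    rho_at_points = np.asarray(rho_at_points, dtype=float)
    marks = np.asarray(marks)
    n, n_groups = rho_at_points.shape
    # Citywide shares rho(m): the completely unsegregated benchmark (assumed).
    shares = np.bincount(marks, minlength=n_groups) / n
    # Sum of |rho(xi, m) - rho(m)| over all points and groups.
    total = np.abs(rho_at_points - shares).sum()
    # Assumed normalization: the same sum when every point's mark
    # distribution is degenerate at its observed group.
    degenerate = np.eye(n_groups)[marks]
    maximum = np.abs(degenerate - shares).sum()
    return total / maximum

marks = np.array([0, 0, 1, 1])
print(spatial_dissimilarity(np.full((4, 2), 0.5), marks))  # 0.0
print(spatial_dissimilarity(np.eye(2)[marks], marks))      # 1.0
```

The function is undefined when only one group is present, since the normalizing sum is then zero.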
We estimate the conditional mark distribution with the ratio of the estimated race-specific and
overall intensity functions, $\hat{\rho}(\xi, m) = \hat{\lambda}(\xi, m) / \hat{\lambda}_0(\xi)$. The intensity estimates are obtained using
a multiplicative quartic kernel, a standard method in the literature (see Diggle 2003). The choice
of the kernel bandwidth is made by an MSE minimization criterion, as suggested by Diggle (1985)
and Berman and Diggle (1989). Since the MSE minimization prescribes a different bandwidth
for each city, we have comparability problems among cities, because the difference in the estimated
index can reflect also the diﬀerence in bandwidths: we thus present estimates in which we use the
same bandwidth for all the cities. We use a finite grid method, following the literature, but we
also provide a valid alternative in which the kernel estimator is evaluated only at the observed
locations, thus avoiding the approximations involved when using a finite grid. We then develop an
approximated nonparametric method for the case in which only count data at the block level are
available: the approximation is good enough as long as the intensity is smooth and the block area
is small.
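The ratio estimator evaluated at the observed locations can be sketched as follows. This is an illustration under simplifying assumptions: a product of univariate quartic kernels with a common fixed bandwidth h, no edge correction, and no bandwidth selection; the function names and the simulated data are hypothetical.

```python
import numpy as np

def quartic(u):
    """Univariate quartic (biweight) kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def kernel_intensity(eval_points, data_points, h):
    """Product-kernel intensity estimate at eval_points (no edge correction)."""
    diff = (eval_points[:, None, :] - data_points[None, :, :]) / h
    weights = quartic(diff[..., 0]) * quartic(diff[..., 1])
    return weights.sum(axis=1) / h**2

def conditional_mark_probabilities(points, marks, n_groups, h):
    """rho_hat(xi, m) = lambda_hat(xi, m) / lambda_hat_0(xi) at each observed point."""
    # The overall intensity is positive at every observed point, because
    # each point contributes its own kernel mass.
    overall = kernel_intensity(points, points, h)
    rho_hat = np.empty((len(points), n_groups))
    for m in range(n_groups):
        rho_hat[:, m] = kernel_intensity(points, points[marks == m], h) / overall
    return rho_hat

# Illustrative data: group 1 lives in the right half of the unit square.
rng = np.random.default_rng(0)
points = rng.random((400, 2))
marks = (points[:, 0] > 0.5).astype(int)
rho_hat = conditional_mark_probabilities(points, marks, n_groups=2, h=0.1)
```

Because the same kernel and bandwidth are used for every group, the estimated probabilities at each point sum to one. The pairwise difference array has size n × n × 2, so a large point pattern would need to be processed in chunks.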
We apply the methodology to artificial point pattern data and census block level count data.
Using census 2000 data we measure segregation levels in all the metropolitan areas of the US, and
we present results for 9 of them in order to make the exposition more concise. The rankings implied
by the spatial dissimilarity and the traditional indices are different, suggesting that this methodology
is not merely a refinement of the traditional measurement but captures some features
of the segregation phenomenon that standard indices are unable to detect. The correlations between
our index and the traditional ones are between .65 and .8, providing further evidence of this.
The advantages of this approach are striking. First of all the index does not depend on arbitrary
partitions and it is a function of individuals’ coordinates, by definition. We also avoid the problem
of comparability among diﬀerent census years because even if the census tracts definition changes
from one census to another the spatial dissimilarity index is unaﬀected. The comparability extends
also across countries, since the index is based on geographic coordinates (longitude and latitude),
while the neighborhood-based indices, since based on the specific neighborhood definition, do not
have this property. Second, we can measure the segregation of a continuous variable, like income:
the marks space M can be any metric space. In the case of racial segregation M = {1, 2, 3, 4, 5} is
discrete, while for income segregation we would have M = [0, ∞). In principle we can measure multilevel segregation by defining M = {1, 2, 3, 4, 5}×[0, ∞) or subgroups segregation by redefining the
conditional probabilities over subgroups, given the independence property implied by our assumptions. Third, we can build statistical tests, for example to test if New York is more segregated than
Chicago, based on the stochastic process. Fourth, we can estimate segregation at diﬀerent scales,
by changing the bandwidth of the kernel estimator. Theoretically, with a larger bandwidth the
point process is more similar to a homogeneous process and the segregation index will converge to
zero. Our simulations with census data confirm this result, which is also a theoretical justification
of the empirical findings of Reardon et al. (2006) and Feitosa et al. (2006). Fifth, we have not considered
here the determinants of segregation, but the framework is flexible enough to be used for this kind
of study. Nothing prevents us from assuming a parametric spatial model for the intensity function,
$\lambda(\xi, m) = \alpha_m + \sum_{k=1}^{K} \beta_k Z_k(\xi, m)$, where the $Z_k(\xi, m)$'s are geocoded variables affecting the location of individuals. Once we estimate the parameters of such a model we are able to run some policy
experiments to determine which desegregation policies are more effective, ceteris paribus. Finally, in
this work we present what can be called a "descriptive" theory of segregation indices. The index is
a random variable, depending on the realization of the point process: ideally we would like to know
how the segregation index changes as a function of the process parameters. In a paper in progress
we provide some theoretical results for indices under the same assumptions used in this work. We
are able to compute expected value and variance, so that we can estimate if New York is on average
structurally more segregated than Chicago, where structurally is interpreted as conditional on the
intensity function.
This work is related to several strands of literature. First of all it builds on the enormous body
of research on indices of segregation, which is highly influenced by Massey and Denton (1988): they
provided a principal component analysis of the segregation indices, showing that the dissimilarity
and isolation indices were able to explain most of the variability of segregation in US cities. Their
results encouraged researchers to use the dissimilarity as the only measure of racial segregation,
until recently, when some research pointed out the flaws of neighborhood based indices. Among
others, Echenique and Fryer (2006) question the arbitrary partition of the city in neighborhoods
and develop a segregation index based on individuals’ social networks, building on three basic axioms: monotonicity, linearity and homogeneity.3 Basically the Spectral Segregation Index measures
segregation based on social interactions of same race neighbors, where neighbors are the agents living within 1 km from the individual itself. Reardon and O’Sullivan (2004) extend the theory of
neighborhood-based segregation indices to spatial measures, adapting the properties required of
neighborhood-based indices to a framework based on the individuals' locations over a city map. In
their study the overall segregation is a function of the "local environment" of the agents, where local
environment is defined by a proximity function that may assume diﬀerent functional forms. Their
approach is very close to the one proposed here but in our setting the notion of local environment
is infinitesimal. Reardon et al (2006) show that empirically the segregation level measured by their
indices decreases as the local environment radius increases. They adopt an estimation strategy
similar to the one we use in this work, but they do not specify the underlying stochastic process
for the spatial pattern: they define the "empirical" local environment by the radius of the Gaussian
kernel estimator for the intensity. Using similar techniques Feitosa et al. (2006) compute several
segregation indices with spatial features. Even if they do not specify a stochastic process for the
individuals' locations, their work provides local and global measures of segregation, and a basic test
for detection of segregation that builds on Anselin (1995).
The work is also related indirectly to the research on the eﬀects of racial and socioeconomic
segregation on the individual outcomes. Among others, Cutler and Glaeser (1997) analyze the
eﬀect of racial segregation in MSAs on socioeconomic outcomes, in particular high school and
college graduation, job idleness and earnings and single motherhood. Their estimates show that,
once correcting for endogeneity, segregation worsens those outcomes. Card and Rothstein (2006)
find that school and residential segregation explain most of the negative relative black test score.
Ananat (2007) confirms the negative eﬀect of segregation on outcomes, while providing a better
correction for the endogeneity of segregation.4 All of these works measure segregation using the
neighborhood-based approach. Echenique and Fryer (2006) replicate the specifications of Cutler
and Glaeser (1997) using their Spectral Segregation Index to measure segregation, showing that the
qualitative results are unchanged, even if the magnitude of estimated effects is slightly different.
It is not clear that these results should still hold when using our approach and future research is
needed in this direction.
3 Monotonicity: if the individuals in city A have a larger share of connections/interactions with same-race individuals
than in city B, then the level of segregation in A is higher than in B. Linearity: an individual's segregation increases
linearly with the level of segregation of the agents she is connected to. Homogeneity: if all individuals in a city
network have half of their social interactions with same-race agents, the index of segregation is one-half (this is just
a normalization).
4 In a related analysis, La Ferrara and Mele (2006) show that segregation has a positive effect on the average public
school per pupil spending both at the district and the metropolitan level; nonetheless segregation is also associated
with an increased inequality of expenditure among districts.
The third strand of literature related to this paper is the rapidly growing research on point
processes theory and their applications.5 Statistical models of point patterns are used in spatial
epidemiology (Diggle, Zheng and Durr (2005), Kelsall and Diggle (1998)), Neuroscience (Diggle,
Eglen and Troy (2006)), Astrophysics, Ecology, Geology (Zhuang, Ogata and Vere-Jones (2006))
and Image Recognition.
Especially related to the present work is Diggle, Zheng and Durr (2005), which studies the clustering of bovine tuberculosis in Cornwall. They assume that the cases of different types of tuberculosis
follow a multivariate inhomogeneous Poisson process and then compute risk surfaces and the conditional probability of a specific type of disease at a specific location. The definition of segregation
is similar to the one proposed here, but the conditional probabilities are computed taking into
account the controls.6 They use a kernel regression estimator for the conditional probabilities and
provide a test for detection of segregation based on Monte Carlo simulation: the null hypothesis of
no segregation is rejected.
The paper is organized as follows. In Section 2 we briefly consider the underlying motivation
of this work. In Section 3 we briefly introduce the theory of point processes and in Section 4 we
develop the idea of measuring segregation via conditional probabilities. In Section 5 we present the
data, we briefly review the available estimation methods for point pattern data and we provide the
approximated estimation method for count data. Section 6 shows the results and Section 7 provides
the conclusions and a discussion of future directions of research. The appendices contain a more
technical introduction to the theory of point processes (A), the description of parametric estimation
methods (B) and some alternative estimates (C).
2 Motivation
Consider the problem of measuring the residential segregation of blacks in a city. The traditional
approach, which I will refer to as the neighborhood-based approach, involves the partition of the city
in n neighborhoods, then the computation for each neighborhood i of the share of blacks Bi /Pi ,
where Pi is the number of individuals and Bi the number of blacks in neighborhood i. If there is no
segregation the fraction of blacks in each neighborhood Bi /Pi will be equal to the fraction of blacks
in the whole city B/P . An index of segregation is then a synthetic measure of the diﬀerence between
the actual distribution of races over neighborhoods, i.e. (B1 /P1 , ..., Bn /Pn ), and the distribution
arising when there is no segregation (B/P, ..., B/P ), with appropriate normalization in order to get
a quantity between zero and one which is comparable across cities. According to the notion used
by the researcher to measure this diﬀerence one will obtain alternative indices.7
For example, the most popular measure of residential segregation is the dissimilarity index that
defines the diﬀerence among distribution by using the absolute deviation
$$
D = \frac{1}{2} \sum_{i=1}^{n} \frac{P_i \left| \frac{B_i}{P_i} - \frac{B}{P} \right|}{P \, \frac{B}{P} \left( 1 - \frac{B}{P} \right)} \tag{2.1}
$$
5 See Diggle (2003), Moller and Waagepetersen (2004), Stoyan, Kendall and Mecke (1987) and Stoyan and Stoyan
(1994) for excellent introductions to the theory and some applications.
6 In their model there are four types of tuberculosis and there is also a control group, i.e. locations in which there
is an animal not infected by the disease. We don't have to model the control group in our application.
7 See Massey and Denton (1988) for an extensive review of the traditional indices. Reardon and Firebaugh (2002)
explicitly provide a discussion of the neighborhood-based approach.
This index is interpreted as the fraction of blacks that would have to move to another neighborhood in order to achieve a completely integrated city. This corresponds to an intuitive notion of
segregation, i.e. an uneven distribution of the racial groups over the city's neighborhoods.
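For concreteness, the dissimilarity index of equation (2.1) can be computed from neighborhood counts as in the following sketch. The function name and the toy counts are illustrative, and every neighborhood is assumed to have a positive population.

```python
import numpy as np

def dissimilarity(blacks, population):
    """Neighborhood-based dissimilarity index of equation (2.1).

    blacks     : counts B_i of blacks in each neighborhood.
    population : counts P_i of individuals in each neighborhood (all > 0).
    """
    B_i = np.asarray(blacks, dtype=float)
    P_i = np.asarray(population, dtype=float)
    B, P = B_i.sum(), P_i.sum()
    share = B / P
    # Population-weighted absolute deviation of local shares from the city share.
    deviations = P_i * np.abs(B_i / P_i - share)
    return 0.5 * deviations.sum() / (P * share * (1.0 - share))

print(dissimilarity([10, 0], [10, 10]))  # 1.0, complete segregation
print(dissimilarity([5, 5], [10, 10]))   # 0.0, complete integration
```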
However the dissimilarity index and all the available neighborhood-based indices of segregation
share some common undesirable properties.
[Insert Figure 1 Here]
First of all, the index depends on the specific partition of the urban area, as argued by Echenique
and Fryer (2006). The Bureau of Census usually provides data at diﬀerent levels of aggregation:
census tracts, block groups and blocks. Census tracts usually have between 2,500 and 8,000 persons
and, when first delineated, are designed to be homogeneous with respect to population characteristics,
economic status, and living conditions.8 Therefore the definition of tracts itself biases the index
towards higher segregation. Furthermore diﬀerent partitions could lead to diﬀerent values of the
index: consider the example in Figure 1. The figure shows the locations of blacks (black circles)
and whites (white circles) in four stylized cities. The geographic distribution of the racial groups
in the four cities is the same, but diﬀerent neighborhood partitions are shown. If we adopt the
neighborhood-based approach, city A exhibits maximum segregation (D = 1), city B is perfectly
integrated (D = 0), city C is perfectly segregated (D = 1), and city D has an intermediate level of
segregation (D = .2291). This is of course an undesirable property.
The second problem is that the index does not take into account the spatial location of agents.
Consider again Figure 1, for example city C. A neighborhood-based index will consider all the people
living in the same neighborhood as experiencing the same level of spatial separation. Of course this
is not the case: the black agent living at coordinates (4,5) will experience much more segregation
than the black living at (3,6), the first one being surrounded only by same race neighbors while
the second experiencing more heterogeneity of his neighbors. This cannot be taken into account as
long as we do not consider individual-based indices.
The third problem is known to geographers as the Modifiable Areal Unit Problem (MAUP): if
we compute the index of segregation using different levels of aggregation of the data (tracts, block
groups or blocks) we will get different numbers and, even worse, different rankings of the cities in
terms of segregation. Let us compare cities A and B in Figure 1: city A is obtained by subdividing
each neighborhood of city B into four equivalent subunits. City A shows maximum segregation while
city B complete integration. There is evidence of the MAUP when using census data, and
the eﬀect is amplified when there is a very high level of segregation because smaller subunits (block
groups) are more homogeneous than bigger ones (census tracts), hence when using block groups
the index will be higher than when using census tracts.
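The dependence on the level of aggregation can be reproduced with a stylized example in the spirit of cities A and B. The sketch below is not the configuration of Figure 1: it assumes a hypothetical 4 × 4 city with one resident per unit square and races alternating like a checkerboard, and it computes the dissimilarity index after binning the same points into square neighborhoods of two different sizes.

```python
import numpy as np

def dissimilarity_from_points(xy, is_black, cell_size):
    """Dissimilarity index after binning points into square cells.

    Assumes non-negative coordinates; cell_size is the side of each cell.
    """
    cells = np.floor(xy / cell_size).astype(int)
    # Combine (column, row) into one neighborhood id, then relabel the
    # occupied neighborhoods 0, 1, 2, ... so that empty cells are dropped.
    cell_id = cells[:, 0] * (cells[:, 1].max() + 1) + cells[:, 1]
    _, labels = np.unique(cell_id, return_inverse=True)
    P_i = np.bincount(labels)
    B_i = np.bincount(labels, weights=is_black)
    share = B_i.sum() / P_i.sum()
    weighted = np.sum(P_i * np.abs(B_i / P_i - share))
    return 0.5 * weighted / (P_i.sum() * share * (1.0 - share))

# One resident at the centre of each unit square, races alternating.
gx, gy = np.meshgrid(np.arange(4), np.arange(4))
xy = np.column_stack([gx.ravel(), gy.ravel()]) + 0.5
is_black = ((gx + gy) % 2 == 0).ravel().astype(float)

d_fine = dissimilarity_from_points(xy, is_black, cell_size=1.0)
d_coarse = dissimilarity_from_points(xy, is_black, cell_size=2.0)
print(d_fine, d_coarse)  # 1.0 0.0
```

The locations are identical in both calls; only the partition changes, and the index moves from its maximum to its minimum.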
8 http://www.census.gov/geo/www/cen_tract.html
Table 1: Rankings depend on subunits used

Census Tracts              Dissimilarity   Block Groups        Dissimilarity
Detroit, MI                0.8728          Laredo, TX          0.8915
Gary-Hammond, IN           0.8692          Gary-Hammond, IN    0.8893
Cleveland, OH              0.8482          Detroit, MI         0.8837
Chicago, IL                0.8359          Cleveland           0.8623
Milwaukee, WI              0.8204          Wausau, WI          0.8572
Flint, MI                  0.8092          Bismarck, ND        0.8550
Saginaw-Bay City-Midland   0.8072          Chicago, IL         0.8529
Buffalo, NY                0.8070          Eau Claire, WI      0.8438
Newark, NJ                 0.7798          Buffalo, NY         0.8383
Glens Falls, NY            0.7780          Milwaukee, WI       0.8359

In Table 1, using data from the 1990 Census, we show that the ranking of the cities in terms of
segregation of African Americans (we use the dissimilarity index) is different if we use census tracts
or block groups. In particular, when using block groups, Laredo, TX is the most segregated, while
when using census tracts it is the 126th most segregated, with a dissimilarity of .51. For the least
segregated MSAs the pattern is similar, but less pronounced than for highly segregated cities.
The framework that we propose here is able to solve all these problems together, since we will
use individual-based measures instead of neighborhood-based indices.
3 The Stochastic Environment
This section presents an introduction to the theory of spatial point processes, providing the necessary background for the understanding of our theoretical framework. A more detailed and technical
exposition is contained in the appendix. The reader familiar with these stochastic processes can
skip this section.
3.1 Notation, Basic Properties and Definitions
A spatial point process is a stochastic process that maps countable sets in planar regions.9 More
generally a spatial point process X is a random countable subset of a space S ⊆ R^2 . We will denote
the process as a random set X = {xi } or, according to the context, as a random variable N (A),
that is the number of points in set A ⊆ S. We will denote the realizations of X as x and the
realizations of N as n. We will denote the generic point in S as ξ or η and the generic point of the
process as xi . We will write |A| to indicate the area of set A and dξ to describe the infinitesimal
region containing ξ. We will refer to the points of the process as events.
We will consider only finite point processes, i.e. stochastic processes that map finite sets of
points in any planar region. We will denote the set of all such realizations x as N_lf . The point
processes that will be considered here are simple (or orderly), i.e. for any i ≠ j, we have xi ≠ xj :
this means that there are no coincident events (points).
[Insert Figure 2 here]
In Figure 2 we show the realizations of diﬀerent point processes that we use to explain the
basic definitions and properties. The HPP is a Homogeneous Poisson Process, which is considered
in the literature as the benchmark of complete spatial randomness, the IPP is an Inhomogeneous
Poisson Process, the MultitypePP is a Multitype Poisson Process (a superposition of independent
univariate HPP processes of different type, where type is visually identified by colors), and the
MPP is a Marked Poisson Process, where the type of the points is defined by the radius of the
circles.
9 We just cover details about what is relevant for the present work. Any book about point processes listed in the
references can be used as a valid and much more detailed introduction.
The first important concept is stationarity: if the point field is observed from diﬀerent regions on
the plane then the configurations of the points are similar, diﬀerences arising only from randomness.
More formally, a point process is stationary if all probability statements about the process in any
bounded A of the plane are invariant under arbitrary translations. This property is very important
in defining the randomness of the process as we’ll see below. Analogously a point process is isotropic
if the invariance holds under arbitrary rotations. Stationarity and Isotropy together give what is
called motion-invariance. In the measurement of segregation we will consider non-stationary and
non-isotropic processes.
The processes HPP and MultitypePP in Figure 2 are stationary and isotropic (see the Homogeneous Poisson Process definition below) while the IPP is neither stationary nor isotropic. One
should be aware that motion-invariance or stationarity do not imply regular patterns, since the
process is stochastic and there is noise in the realizations as can be seen in the figure.
3.2 The intensity function
Consider a process X defined over S ⊆ R^2 . The intensity function is defined as a (locally integrable)10 function λ : S → [0, ∞)
$$
\lambda(\xi) = \lim_{|d\xi| \to 0} \left\{ \frac{E N(d\xi)}{|d\xi|} \right\} \tag{3.1}
$$
and it is the analogue of the expectation for a random variable. We can interpret it as the
expected number of points of the process per infinitesimal area dξ around the point ξ. The intensity
measure of a point process X is defined for any A ⊆ S as
Z
λ (ξ) dξ
(3.2)
EN (A) = Λ (A) =
A
Since we are considering only finite simple point processes, we will have Λ (A) < ∞ for all
bounded A ⊆ S and Λ ({ξ}) = 0, for any ξ ∈ S.
We can reinterpret the intensity function in the following way: the quantity λ (ξ) dξ can be thought of as the probability that there exists an event in the infinitesimal region dξ, i.e. λ (ξ) dξ ≈ P [N (dξ) = 1], since for an infinitesimal region dξ we have EN (dξ) ≈ P [N (dξ) = 1].
3.3 Poisson Processes
The Poisson point processes are by far the most important in applications and are the models that
define the notion of complete spatial randomness.
DEFINITION 3.1 (Poisson Point Process) A point process X on S is a Poisson Point Process with intensity λ (ξ) if the following two conditions are satisfied:
1. for any A ⊆ S, N (A) ∼ Poisson (Λ (A))
2. conditional on N (A) = n, the events are identically and independently distributed over A according to the density f (ξ) = λ (ξ) /Λ (A)
We will denote the generic Poisson Process as X ∼ Poi (S, λ (ξ)).
10 A function is locally integrable if ∫_A λ (ξ) dξ < ∞ for all bounded A ⊆ S.
The processes HPP and IPP in Figure 2 are examples of Poisson Point Processes. Condition (1) drives the number of events in the region A, while condition (2) states that, conditional on the draw from the Poisson distribution, the events are i.i.d. with density f, the ratio of intensity function and intensity measure. Condition (1) also implies, for any bounded A ⊆ S, that EN (A) = Λ (A).
A Poisson Point Process is Homogeneous or stationary (HPP) if the intensity function is constant over space, i.e. λ (ξ) = λ for all ξ ∈ S, so that f (ξ) = |A|^{-1} for any bounded A ⊆ S. It follows that for a Homogeneous Poisson Process EN (A) = λ |A|. The HPP is considered the ideal of complete spatial randomness in the literature: roughly speaking, complete spatial randomness means that we do not expect the intensity of the process to vary over the region we are considering and that there are no interactions among different events. Indeed, by condition (1) and the fact that λ (ξ) = λ, an HPP shows stationarity and isotropy, because N (A) ∼ Poisson (λ |A|), and thus the expected number of events depends on the planar region A only through its area and not through its position; by condition (2) and f (ξ) = |A|^{-1}, we have no clustering or inhibition (the presence of a point at ξ does not make the occurrence of an event η in the neighborhood of ξ more or less likely). The process HPP in Figure 2 is a Homogeneous Poisson Process with intensity λ = 100 over the unit square.
The Poisson Process is Inhomogeneous or nonstationary (IPP) if the intensity function is not constant. The IPP is the simplest class of nonstationary point processes used in applications. The IPP realization in Figure 2 shows an example of IPP with intensity λ (ξ) = 200 (ξ_1)² + 200 (ξ_2)² over the unit square, where ξ = (ξ_1, ξ_2).
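As a rough illustration of conditions (1) and (2), the following Python sketch simulates the two patterns described above on the unit square: an HPP with λ = 100 and an IPP with λ (ξ) = 200 (ξ_1)² + 200 (ξ_2)². The IPP is obtained by thinning a denser HPP, a standard simulation device chosen for this example and not necessarily the way Figure 2 was produced. The function names and the seed are hypothetical.

```python
import random


def poisson_draw(mean, rng):
    """Draw N ~ Poisson(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_hpp(lam, rng):
    """HPP on the unit square: condition (1) gives N, condition (2) gives uniform locations."""
    n = poisson_draw(lam * 1.0, rng)  # |A| = 1 for the unit square
    return [(rng.random(), rng.random()) for _ in range(n)]


def simulate_ipp(intensity, lam_max, rng):
    """IPP on the unit square by thinning an HPP of rate lam_max >= intensity(x, y)."""
    return [(x, y) for (x, y) in simulate_hpp(lam_max, rng)
            if rng.random() < intensity(x, y) / lam_max]


rng = random.Random(2007)  # arbitrary seed, only for reproducibility
hpp = simulate_hpp(100.0, rng)
# The intensity 200 x^2 + 200 y^2 is bounded by 400 on the unit square.
ipp = simulate_ipp(lambda x, y: 200.0 * x ** 2 + 200.0 * y ** 2, 400.0, rng)
print(len(hpp), len(ipp))
```

By (3.2) the expected number of points is Λ (A) = 100 for the HPP and Λ (A) = 200/3 + 200/3 = 400/3 for the IPP, so counts averaged over repeated simulations should settle near these values.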
3.4 Marked Point Processes
Consider a point process X0 defined over the space S ⊆ R². If we attach a random mark m (ξ) ∈ M to each point ξ ∈ X0 then the process
X = { {ξ, m (ξ)} | ξ ∈ X0 }
is called a Marked Point Process with events in S and marks in M. The marks attached to the points of the process are themselves random variables. The easiest way to think about this process is as a point process to which we randomly attach labels: the realization is thus a set of locations (points) with different labels.
Notice that the space M may be either a finite set, i.e. M = {1, 2, ..., M }, in which case X
is called multitype process, or a general subset M ⊆ Rq , q ≥ 1. Both the bottom realizations in
Figure 2 come from Marked Point Processes, the first one being a multitype process with marks in
M = {red, black, green} and the second one a marked point process with marks space M = [0, ∞).
In the next section we build a segregation measure based on a specific marked point process, which we describe below.
DEFINITION 3.2 (Marked Poisson Process) The Point Process X = { {ξ, m (ξ)}| ξ ∈ X0 }
is a Marked Poisson Process if:
1. X0 ∼ P oi (S, λ0 (ξ))
2. conditional on X0 the marks { m (ξ)| ξ ∈ X0 } are mutually independent
Let us denote the conditional marks distribution as ρ (ξ, m, X0 \ ξ). In principle ρ (·) can depend on the specific location ξ but also on the location of the other points of the process X0 \ ξ (it cannot depend on the other points’ marks by condition (2) of the above definition). If ρ (ξ, m, X0 \ ξ) = ρ (ξ, m) for any ξ ∈ X0 and for any m ∈ M, i.e. the conditional marks distribution does not depend on the location of the other events X0 \ ξ, then the MPP is a Poisson Point Process over the enlarged space S × M, with intensity λ (ξ, m) = λ0 (ξ) ρ (ξ, m) (see proposition A.1 in the appendix).
Furthermore, when the marks space M is finite (for example for racial groups) we have another useful result: a multitype process with ρ (ξ, m, X0 \ ξ) = ρ (ξ, m) is equivalent to a multivariate Poisson Process (see proposition A.2 in the appendix). A multivariate Poisson Process is obtained by superposition of independent univariate Poisson Processes. Therefore if we have a multitype process with M = {1, 2, ..., M } and ρ (ξ, m, X0 \ ξ) = ρ (ξ, m) for any ξ ∈ X0 and for any m ∈ M, we can reformulate it as a multivariate Poisson Process (X_1, X_2, ..., X_M), with X_m ∼ Poi (S, λ (ξ, m)) mutually independent and λ (ξ, m) = λ0 (ξ) ρ (ξ, m), m = 1, ..., M. This last result will be exploited in the estimation.
4 Segregation Measurement
4.1 General Framework
In this section we will develop a statistical framework in order to measure segregation, based on point process theory. Similar statistical models are employed in spatial epidemiology for disease mapping and for detection of disease clusters. Among others, Kelsall and Diggle (1998) develop a multivariate Poisson process model in order to estimate the spatial variation in risk of disease for a population at risk. A similar model is used in Diggle, Zheng and Durr (2005) to detect clustering of different types of bovine tuberculosis in a region.
Consider the set of all the possible finite realizations of the marked point process, that we call N1m . We want to measure the spatial segregation of a set of points X = {x_i, m (x_i)}_{i=1}^{n} that are characterized both by their position x_i in the city area S and a mark m (x_i) defined over a space M. Examples of marks are racial groups, income groups, income levels, education levels, or a mix of them. The marks space can be any metric space so we are not constrained to measure segregation over a univariate mark.
In our view an index of segregation should be a function of the locations of all the individuals
and their type (racial group, income level, education level). Therefore we define a segregation index
to be a function of the realization of a marked point process, with range in [0, 1], i.e. a segregation
index is a function Φ : N1m −→ [0, 1]: Φ is increasing with respect to the diﬀerences among the
actual spatial distribution and the distribution under complete integration. The index is zero if the
process is unsegregated and one if the process is completely segregated.
It should be realized that in this stochastic setting the segregation index is a random variable,
and according to the realization x of the marked point process, there will be a corresponding
realization φ of the segregation index. In this work we will provide an analysis of segregation of
the realized spatial pattern: Mele (2007, in progress) provides theoretical results and shows how to
compute the moments of any index.
We assume that the locations of the individuals X0 are the realization of an Inhomogeneous Poisson Point Process over the space S ⊆ R² with intensity function λ0 (ξ).
ASSUMPTION 4.1 The individuals’ locations X0 follow an Inhomogeneous Poisson Process with intensity λ0 (ξ) over S
X0 ∼ Poi (S, λ0 (ξ)) (A1)
The next two are the crucial assumptions for the model.
ASSUMPTION 4.2 Conditional on X0 , the marks are mutually independent, i.e. for ξ_i ∈ X0 , i = 1, ..., n
$$P\left( m(\xi_1) = m_1, \ldots, m(\xi_n) = m_n \mid X_0 \right) = \prod_{i=1}^{n} P\left( m(\xi_i) = m_i \mid X_0 \right) \qquad (A2)$$
We are thus assuming that X = { {ξ, m (ξ)}| ξ ∈ X0 } is a Marked Poisson Process over the
region S with marks in the space M.
Let’s define ρ (ξ, m, X0 \ ξ) ≡ P ( m (ξ) = m | X0 ), the probability that a point ξ has mark m, conditional on the realization of the locations X0 . We assume that this conditional probability depends on the location ξ, but it does not depend on the locations of the other points of the process X0 \ ξ.
ASSUMPTION 4.3 For all ξ ∈ X0 , for all m ∈ M
ρ (ξ, m, X0 \ ξ) = ρ (ξ, m)
(A3)
The assumptions (A1-A3) imply that the process X is Poisson over the enlarged space S × M, with intensity λ (ξ, m) = λ0 (ξ) ρ (ξ, m) (proposition A.1 in appendix). When the marks space is discrete we can reformulate the model as a multivariate inhomogeneous Poisson process X = ∪_{m=1}^{M} X_m with intensities λ (ξ, m) = λ0 (ξ) ρ (ξ, m), m = 1, 2, ..., M respectively, where X_m and X_{m'} are stochastically independent for m ≠ m' (proposition A.2 in appendix).
In this setup the process exhibits segregation if the spatial pattern of a specific type is diﬀerent
from that of the population as a whole. In terms of the model there is no segregation if the
conditional probability of each type does not depend on the location, i.e. if the intensity of each
type is proportional to the intensity of the whole population over the entire metropolitan area S.11
DEFINITION 4.1 The marked point process X is completely unsegregated if and only if
ρ (ξ, m) = ρ (m) for all ξ ∈ X0 , m ∈ M
In the definition ρ (m) corresponds to the marginal marks’ distribution (and it can easily be
estimated from the data).
We observe maximum segregation if the realization exhibits a degenerate conditional marks’
distribution at each point. Formally, the definition is slightly diﬀerent if we have a continuous or a
discrete marks’ space. In the discrete case the definition is the following12
DEFINITION 4.2 The marked point process X is completely segregated if and only if for all ξ ∈ X0 , ∃m∗ ∈ M such that ρ (ξ, m∗ ) = 1 and ρ (ξ, m) = 0 for any m ≠ m∗ .
11 In the literature this is called random labelling. See also Diggle, Zheng and Durr (2005), who use a similar definition.
12 When M is continuous, as in the case of income segregation, we modify the definition as follows: the marked point process X is completely segregated if and only if for all ξ ∈ X0 , ∃m∗ = m∗ (ξ) ∈ M such that ρ (ξ, m) = δ (m − m∗ ), where δ (u) is the Dirac delta function.
4.2 An Index of Racial Segregation
We measure the level of segregation at location ξ as the absolute deviation from a completely unsegregated process. We consider the quantity Σ_{m∈M} |ρ (ξ, m) − ρ (m)| for each ξ ∈ X0 as the measure of difference between the two distributions. If we sum over all the points of the realization of X0 we get a measure of total deviation from complete integration. In order to have an index varying between zero and one we normalize the realized sum by the theoretical value of the sum under complete segregation.
Our measure of segregation is defined as Σ_{ξ∈X0} Σ_{m∈M} |ρ (ξ, m) − ρ (m)|. This is the equivalent of the dissimilarity index in our setting, so we will call it the spatial dissimilarity index. The normalization is obtained by dividing the quantity Σ_{ξ∈X0} Σ_{m∈M} |ρ (ξ, m) − ρ (m)| by its value under perfect segregation,13 i.e. 2n Σ_{m∈M} ρ (m) (1 − ρ (m)).
The Spatial Dissimilarity Index is thus defined as the ratio
$$\phi_{dism} = \frac{\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)|}{2n \sum_{m \in M} \rho(m)\,(1 - \rho(m))} \qquad (4.1)$$
The index is very similar to the traditional dissimilarity index, but it specifically takes into
account the position of the individuals in the city map. The main diﬀerence is that in the traditional
approach the conditional probability ρ (ξ, m) is the same for all the individuals belonging to the
same census tract, while here we are not making any such restrictions.
The most popular works on racial segregation are dedicated to the dichotomous case, in which
we measure the segregation of a group with respect to the rest of the population, e.g. we measure
the segregation of blacks with respect to the non-blacks. In its dichotomous version, the spatial
dissimilarity can be simplified (b=blacks).14
$$\phi_{dism} = \frac{\sum_{\xi \in X_0} |\rho(\xi, b) - \rho(b)|}{2n\,\rho(b)\,(1 - \rho(b))} \qquad (4.2)$$
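To make the normalization concrete, the short Python sketch below evaluates (4.1) and (4.2) when the conditional probabilities ρ (ξ, m) are taken as known at each point. The data layout (one dictionary of probabilities per point) and the function names are hypothetical choices made for the illustration, not part of the paper's implementation.

```python
def spatial_dissimilarity(rho_at_points, rho_marginal):
    """Equation (4.1).

    rho_at_points: list with one dict {mark: rho(xi, mark)} per point xi.
    rho_marginal:  dict {mark: rho(mark)}; needs at least two marks with
                   positive probability, otherwise the denominator is zero.
    """
    n = len(rho_at_points)
    numerator = sum(abs(rho[m] - rho_marginal[m])
                    for rho in rho_at_points for m in rho_marginal)
    denominator = 2.0 * n * sum(p * (1.0 - p) for p in rho_marginal.values())
    return numerator / denominator


def dichotomous_dissimilarity(rho_b_at_points, rho_b):
    """Equation (4.2): one group (b) against the rest of the population."""
    n = len(rho_b_at_points)
    numerator = sum(abs(r - rho_b) for r in rho_b_at_points)
    return numerator / (2.0 * n * rho_b * (1.0 - rho_b))


marginal = {"b": 0.25, "w": 0.75}
integrated = [dict(marginal) for _ in range(100)]          # rho(xi, m) = rho(m) everywhere
segregated = ([{"b": 1.0, "w": 0.0}] * 25 +
              [{"b": 0.0, "w": 1.0}] * 75)                 # degenerate at every point
print(spatial_dissimilarity(integrated, marginal))
print(spatial_dissimilarity(segregated, marginal))
```

In this two-group example the first call corresponds to Definition 4.1 and the second to Definition 4.2, so the index takes its two extreme values; with two groups, (4.1) and (4.2) return the same number for any configuration.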
It should be clear that the spatial dissimilarity is just one example of an index that we can build in this framework: any index based on the measure Σ_{ξ∈X0} h (ξ), where h is a nonnegative function summarizing the difference between the actual distribution and the distribution under no segregation (a distance function for example), can be used as an index of segregation, under the appropriate normalization.15 In the spatial dissimilarity we used h (ξ) = Σ_{m∈M} |ρ (ξ, m) − ρ (m)|.
13 Consider the quantity Σ_{ξ∈X0} Σ_{m∈M} |ρ (ξ, m) − ρ (m)|. With some algebra and considering that, under complete segregation, for all ξ ∈ X0 , ∃m∗ ∈ M such that ρ (ξ, m∗ ) = 1 and ρ (ξ, m) = 0 for any m ≠ m∗ , we get
$$\begin{aligned}
\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)| &= \sum_{\xi \in X_0} \left( |\rho(\xi, 1) - \rho(1)| + \ldots + |\rho(\xi, M) - \rho(M)| \right) \\
&= \rho(1)\, n\, |1 - \rho(1)| + (1 - \rho(1))\, n\, |0 - \rho(1)| + \ldots \\
&\quad \ldots + \rho(M)\, n\, |1 - \rho(M)| + (1 - \rho(M))\, n\, |0 - \rho(M)| \\
&= 2\rho(1)\, n\, (1 - \rho(1)) + \ldots + 2\rho(M)\, n\, (1 - \rho(M)) \\
&= 2n \sum_{m \in M} \rho(m)\,(1 - \rho(m))
\end{aligned}$$
14 With some algebra we get
$$\phi_{dism} = \frac{\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)|}{2n \sum_{m \in M} \rho(m)\,(1 - \rho(m))} = \frac{2 \sum_{\xi \in X_0} |\rho(\xi, b) - \rho(b)|}{4n\,\rho(b)\,(1 - \rho(b))} = \frac{\sum_{\xi \in X_0} |\rho(\xi, b) - \rho(b)|}{2n\,\rho(b)\,(1 - \rho(b))}$$
4.3 Discussion
The main assumption we imposed is that the pattern of locations is a realization of an Inhomogeneous Multitype Poisson process over the metropolitan area. While this assumption may sound inappropriate for the phenomenon under consideration, racial or income segregation, it can be justified on two grounds. First, it is clear that the segregation of blacks over the metropolitan area implies some interaction and/or interdependence among events of different type. While this is an interesting consideration per se, it is not clear how this interdependence can be modeled a priori, and it would probably require the specification of a parametric model with some covariates driving the intensity function. This can be implemented only with the availability of very detailed individual-level census data containing not only the location coordinates, but also detailed socio-economic data. Given that the data actually available are at the block level, where the only information is the location of the blocks’ centroids, their racial composition and some summary indicators of housing values, it is not obvious that we can construct such a model. Second, in this work we are mainly interested in measuring the extent of socioeconomic segregation; we are not considering its determinants. Modelling the interdependence of the events would probably give more insight into the causes of segregation, but it is unlikely to give additional information about the measurement.
Therefore the assumption of independence can be considered as a useful benchmark (to be
improved in the future, especially for policy purposes), since we are measuring the extent of segregation without exploring its determinants. Future research will be devoted to explore other models
and to relax this assumption. In particular, of great interest are the Marked Pairwise Interactions
Models, that can actually take into account the possible "repulsion" or "attraction" among diﬀerent
locations and among different marks. See Diggle (2003) or Moller and Waagepetersen (2004) for an extensive introduction to these models.
This framework is very flexible and we can adapt it to many different settings and measurement purposes. For example let’s measure the segregation of blacks with respect to the minorities (defined as blacks, asians, hispanics). This is a dichotomous index in which we consider blacks vs a fraction of the whole population. We redefine the conditional probabilities: let p (ξ, b) be the conditional probability that at location ξ there is a black, given that in ξ there is a member of the minority {a, b, h}. We have p (ξ, b) = ρ (ξ, b) / Σ_{m∈{a,b,h}} ρ (ξ, m), while under perfect integration p (b) = ρ (b) / Σ_{m∈{a,b,h}} ρ (m), and we can define the index accordingly
$$\phi_{dism} = \frac{\sum_{\xi \in X_0} |p(\xi, b) - p(b)|}{2n\, p(b)\,(1 - p(b))}$$
If we want to measure the multigroup segregation of the minorities we define p (ξ, m) = ρ (ξ, m) / Σ_{k∈{a,b,h}} ρ (ξ, k) and p (m) = ρ (m) / Σ_{k∈{a,b,h}} ρ (k), and the index will become
$$\phi_{dism} = \frac{\sum_{\xi \in X_0} \sum_{m \in \{a,b,h\}} |p(\xi, m) - p(m)|}{2n \sum_{m \in \{a,b,h\}} p(m)\,(1 - p(m))}$$
Following the same methodology and with a continuous mark space, we can define a measure of income segregation. We can also build measures of income segregation by racial groups or measures of income segregation by income groups, for example income segregation among those with income above or below a certain level or in an income interval. This can also lead to measures of multilevel segregation by using a higher dimensional marks’ space. It should be noted that when considering multidimensional marks the independence assumption holds for the vector m and not for the components of m: for example if we consider the segregation of race (r) and income (y) we assume independence across pairs m = (r, y) but not between r and y. This implies that we can account for any possible correlation among marks and use the conditional joint distribution of race and income ρ (r, y) to describe the non segregated case. We leave these developments to future research.
15 See Mele (2007, in progress) for general results on indices additive in the events.
Finally, in this work we present what can be called a "descriptive" theory of segregation indices.
As we already noticed above, the index is a random variable, depending on the realization of the
point process: ideally we would like to know how the segregation index changes as a function of
the process parameters. In Mele (2007, in progress) we provide some theoretical results for indices
under the same assumptions used in this work. The Poisson assumption simplifies the problem of computing the moments (in particular expectation and variance) of the index. We prove that given an intensity function we are able to compute the moments of any index. The only drawback is that in practice these are computationally very difficult, involving the evaluation of an infinite sum of infinite integrals. However, if we restrict the attention to indices of the form Σ_{ξ∈X} h (ξ), i.e. if we define the index as the sum over all the events of a particular individual level index (or function), we can show that the moments reduce to a simple integral that can be computed with traditional numerical integration methods. This allows us to estimate if (say) New York is on
average structurally more segregated than Chicago, where structurally is interpreted as conditional
on the intensity function.
5 Empirical Methodology
All the data analysis was performed with R16 by using some available packages for the analysis of
spatial point patterns and by custom functions written by the author in R and C.17 We first test
the performance of this approach with artificial datasets, in order to give a flavor of the potential
improvement with respect to the traditional neighborhood-based measures. We also use data of
Metropolitan Statistical Areas (MSAs) and Primary Metropolitan Statistical Areas (PMSAs) from
the 1990 and 2000 Census.
5.1 Data
The artificial data were created in order to show the diﬀerences of the individuals-based measure
and in order to convince the reader that it is immune from the problems mentioned in section 2.
[Insert Figure 3 here]
In Figure 3 we show the plot of the six artificial cities: A(symptotia), B(ayesia), C(lassica),
D(eMoivria), E(mpirica) and F(isheria). Each city contains 800 individuals, distributed over the
square [0, 4] × [0, 4]. There are 25% blacks (the black circles) and 75% whites (the red circles). The
grid represents the partition in neighborhoods, so each city contains 16 neighborhoods with the
same area (a square [0, 1] × [0, 1]). The following picture is useful to explain how we constructed the cities.
 1   5   9  13
 2   6  10  14
 3   7  11  15
 4   8  12  16
16 http://www.r-project.org/
17 In particular we used the packages Splancs and SpatStat. The first one, developed by Diggle and Rowlingson (1993), is especially devoted to nonparametric methods and it’s quite flexible for handling and manipulating data. The disadvantage is the fact that the polygons in which the data are created or simulated must be convex. The second one, developed by Baddeley and Turner (2005), is more related to parametric techniques but it also allows for nonconvex polygons, which is useful when considering real datasets (Manhattan, for example, is not a convex polygon). We also used the package spatialkernel developed by Diggle, Zheng and Durr (2005), but we modified some C routines in order to compute our indices and to speed up our empirical strategy.
Cities A, B and C were constructed in the following way: we simulated an HPP with 50 points
on a unit square, one for blacks and a diﬀerent one for whites; then we cloned and used the unit
squares as neighborhoods of the cities, assigning 4 of them to be black and 12 of them to be white.
For city A the neighborhoods 13 to 16 are black, for city B the neighborhoods 2, 6, 7 and 16
are black, and for city C we created a central ghetto by assigning 6, 7, 10 and 11 to the black
population.
City D was constructed in a diﬀerent way. We simulated an HPP with 600 points over the
square [0, 4] × [0, 4] and we assigned those points to be whites. Then we simulated an HPP with
200 points in the circle of radius one and we translated the coordinates such that the center of the
circle coincided with the center of the city. We assigned the points in the circle to be black. This
creates a pattern similar to city C, but we allow some whites to be inside the ghetto and the ghetto
does not cover the entire area of 4 neighborhoods.
City E was constructed by simulating an HPP with 600 points over the square [0, 4] × [0, 4] for
the whites. Then we simulated two HPP with 100 points each over the circle of radius 1 for the
black population. We set the circle’s centers to coincide with the centroid of neighborhoods 6 and
11. This creates an irregular black neighborhood in the city, while allowing whites to be inside the
ghetto too.
Finally, city F is the result of a simulation of an HPP with 600 points over the square [0, 4] × [0, 4] for the whites and an HPP with 200 points over the square [0, 4] × [0, 4] for the blacks. This is the perfectly integrated case, according to our framework.
In order to prove the eﬀects of arbitrary partitions and the MAUP on the traditional measures
of segregation we constructed diﬀerent partitions of the cities in neighborhoods of equal area. In
particular we will show how cities D and E’s measured level of segregation changes by progressively
increasing the numbers of identical neighborhoods in the progression: 4, 16, 64.
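The construction of city D and the partition experiment can be sketched in a few lines of Python. The sketch assumes that the traditional index is the standard neighborhood-based dissimilarity, 0.5 Σ_j |b_j/B − w_j/W|, computed over a k × k grid of equal-area cells; the function names and the seed are hypothetical, and the simulated points are not the ones plotted in Figure 3.

```python
import math
import random


def uniform_in_square(n, side, rng):
    """n points uniformly distributed over [0, side] x [0, side]."""
    return [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n)]


def uniform_in_disc(n, cx, cy, radius, rng):
    """n points uniformly distributed over the disc with center (cx, cy)."""
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())  # sqrt makes the density uniform in area
        theta = 2.0 * math.pi * rng.random()
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points


def traditional_dissimilarity(blacks, whites, k, side=4.0):
    """0.5 * sum_j |b_j/B - w_j/W| over a k-by-k grid of equal-area neighborhoods."""
    def counts(points):
        cells = [0] * (k * k)
        for x, y in points:
            col = min(int(x / side * k), k - 1)  # clamp points on the upper border
            row = min(int(y / side * k), k - 1)
            cells[row * k + col] += 1
        return cells

    b, w = counts(blacks), counts(whites)
    return 0.5 * sum(abs(bj / len(blacks) - wj / len(whites)) for bj, wj in zip(b, w))


rng = random.Random(2007)  # arbitrary seed
whites = uniform_in_square(600, 4.0, rng)
blacks = uniform_in_disc(200, 2.0, 2.0, 1.0, rng)  # ghetto centered in the city, as in city D
for k in (2, 4, 8):  # 4, 16 and 64 neighborhoods
    print(k * k, round(traditional_dissimilarity(blacks, whites, k), 3))
```

With 4 neighborhoods the central ghetto is split roughly evenly across the four quadrants, so the traditional index tends to be close to zero, while with 16 neighborhoods the same point pattern produces a much larger value. The locations never change; only the partition does.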
We also use census data from the 1990 and 2000 US Census of Population and Housing. The
ideal dataset would consist of individual or household level data on location, racial group and
socioeconomic characteristics. Unfortunately such data are not available, therefore initially we
followed Echenique and Fryer (2006) and used data aggregated at the block level from the Summary
Tape File 1B (STF1B) of the Census 1990 and the Summary File 1 (SF1) of the Census 2000,
containing the location of the centroid of the block, the racial composition and some indicator of
socioeconomic status (mean housing value, mean rent). We use the procedure of Echenique and
Fryer (2006) and define a block to be black if its black population is more than a half. Otherwise we
define the block to be nonblack. The same rule is followed for other racial groups. When computing
the multigroup indices we assign a block to belong to a specific race if that race is majority in the
block.
[Insert Figure 4 and 5 here]
Even if this is not the ideal dataset, it is a reasonable approximation of household level data,
because blocks are very homogeneous for racial and socioeconomic composition (see the discussion
in Echenique and Fryer (2006) for example). As an example the geographic pattern of blacks’
segregation in New York is shown in Figure 4: in red we have the centroids of the nonblacks’ blocks
while in black the african-americans’ blocks. The pattern of geographic separation is evident: the
black population is concentrated in Harlem, Bronx and Bedford-Stuyvesant. In Figure 5 we show
the same patterns for all the racial groups, showing in red the whites, in black the african americans,
in green the asians, and cyan the other races (which are more or less equivalent to the hispanic
population). The racial segregation pattern is very clear in the multigroup case too.
However we noticed that this procedure can bias the index towards more segregation, since we
are considering a block with 51% blacks and one with 99% blacks to be the same in our dataset
and this is exactly what we would like to avoid. For instance, the metropolitan area of Laredo, TX,
has 193117 inhabitants in 2000 and a population of 713 blacks. Using the Echenique and Fryer’s
procedure we end up with 2 black blocks and 3091 white blocks. It turns out that these two black
blocks are blocks with just one inhabitant, which is black. In Laredo we are thus using just 2 blacks
of the 713 living in the city. This of course biases the index towards one.
To take this into account we propose two alternative estimation methods, one for point pattern
data and one for block level data. The second method is particularly relevant when only block level
summary census data and the centroid of the block are available, since the Poisson assumption
allows the researcher to recover the intensity function at each location (not only at the centroids).
5.2 Estimation Strategy with Point Pattern Data
In order to get an estimate of the spatial dissimilarity index we use ρ̂ (m) = n_m /n as an estimate of the marginal mark distribution. The spatial dissimilarity index can be estimated by
$$\hat{\phi}_{dism} = \frac{\sum_{m=1}^{M} \sum_{i=1}^{n} \left| \hat{\rho}(x_i, m) - \frac{n_m}{n} \right|}{2n \sum_{m=1}^{M} \frac{n_m}{n} \left( 1 - \frac{n_m}{n} \right)} \qquad (5.1)$$
The estimate ρ̂ (ξ, m) is obtained by nonparametric methods. We already explained that we can reformulate the multitype point process as a multivariate Poisson process with independent univariate processes, and so we can estimate the univariate processes separately (because of independence). This observation leads to a convenient and intuitive estimate of ρ (ξ, m)
$$\hat{\rho}(\xi, m) = \frac{\hat{\lambda}(\xi, m)}{\hat{\lambda}_0(\xi)} \qquad (5.2)$$
where λ̂ (ξ, m) is the estimate of the intensity function for the univariate process X_m , corresponding to the spatial pattern of group m. Diggle (1985) and Berman and Diggle (1989) suggested the following estimator, λ̂ (ξ) = N (ξ, h) /πh², where N (ξ, h) is the number of events within distance h from ξ. Basically this is the count of events within the disc of radius h centered at ξ, scaled by the area of the disc.18 In a more general formulation we have the following kernel estimator (see Diggle (2003), p.148 or Moller and Waagepetersen (2004))
$$\hat{\lambda}(\xi) = \sum_{i=1}^{n} \frac{k_h(\xi - x_i)}{\int_S k_h(\xi - x_i)\, d\xi} \qquad (5.3)$$
where k_h (u) = h^{-2} k (u/h). In our computations we will use a multiplicative quartic kernel in order to speed up the estimation procedure.19
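A minimal Python sketch of the plug-in estimator defined by (5.1) and (5.2), evaluated at the observed locations only, is given below. It rests on simplifying assumptions that are not taken from the paper's code: the multiplicative quartic kernel is read as a product of univariate quartic (biweight) kernels, the same bandwidth h is used for every group and for the pooled pattern, and the edge correction in the denominator of (5.3) is omitted. The function names are hypothetical.

```python
import random


def quartic_1d(u):
    """Univariate quartic (biweight) kernel with support [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def product_quartic(dx, dy, h):
    """Product kernel scaled by the bandwidth: k_h(u) = h^-2 k(u / h)."""
    return quartic_1d(dx / h) * quartic_1d(dy / h) / (h * h)


def estimate_spatial_dissimilarity(points, marks, h):
    """Plug-in version of (5.1); points is a list of (x, y), marks a parallel list.

    Requires at least two distinct marks, otherwise the denominator is zero.
    """
    n = len(points)
    groups = sorted(set(marks))
    share = {m: marks.count(m) / n for m in groups}  # rho_hat(m) = n_m / n
    numerator = 0.0
    for x, y in points:
        lam = {m: 0.0 for m in groups}  # lambda_hat(x_i, m), no edge correction
        for (xj, yj), mj in zip(points, marks):
            lam[mj] += product_quartic(x - xj, y - yj, h)
        lam0 = sum(lam.values())  # pooled intensity with the same bandwidth
        numerator += sum(abs(lam[m] / lam0 - share[m]) for m in groups)
    denominator = 2.0 * n * sum(p * (1.0 - p) for p in share.values())
    return numerator / denominator


rng = random.Random(2007)  # arbitrary seed
blacks = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(50)]
whites = [(rng.uniform(3.0, 4.0), rng.uniform(0.0, 1.0)) for _ in range(150)]
print(estimate_spatial_dissimilarity(blacks + whites, ["b"] * 50 + ["w"] * 150, h=0.5))
```

Because the pooled intensity is computed as the sum of the group intensities with a common bandwidth, the estimated conditional probabilities sum to one at every point. In the usage example the two groups are farther apart than the kernel support, so each estimated conditional distribution is degenerate and the estimate equals one.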
18 This can be interpreted as a kernel estimator in which the kernel is k (u) = 1/(πu²) if 0 ≤ u ≤ 1 and k (u) = 0 otherwise.
19 For example, a Gaussian kernel takes longer to compute and does not give much improvement in the estimation.
Some words should be spent about the choice of the bandwidth h. Of course, the use of a kernel estimator calls for a criterion to choose the optimal bandwidth h. Researchers usually rely on MSE minimization, cross-validation criteria or more complicated methods. In this application it is not clear if the bandwidth should be chosen by using one of these criteria.
On the one hand, we can say that the optimal h should be different for each city. If a city is more spatially dense than another one then the bandwidth should take it into account. Also, since the bandwidth can be interpreted as defining the relevant neighborhood for the individual (the local environment, in the words of Reardon and O’Sullivan (2004)), we can think that different cities could in principle have different relevant neighborhoods, and thus different h’s. This would suggest choosing different h’s for different cities. In almost all the estimations we will choose h such that the Mean Squared Error is minimized, following the computations of Diggle (1985) and Berman and Diggle (1989), who show the formula for the MSE (h) in the case of a stationary and isotropic Cox Process.20,21
$$MSE(h) = \lambda_2(0) + \Lambda(A)\, \frac{1 - 2K(h)}{\pi h^2} + \left( \pi h^2 \right)^{-2} \int\!\!\int \lambda_2(\|\xi - \eta\|)\, d\eta\, d\xi \qquad (5.4)$$
where λ2 (‖ξ − η‖) is the second-order intensity function defined as
$$\lambda_2(\xi, \eta) = \lim_{|d\xi|, |d\eta| \to 0} \left\{ \frac{E[N(d\eta)\, N(d\xi)]}{|d\eta|\, |d\xi|} \right\} \qquad (5.5)$$
which is a measure of the spatial association of the process. Notice that E [N (dη) N (dξ)] ≈ P [N (dη) = N (dξ) = 1], for ξ and η close. If we assume stationarity and isotropy then λ2 (ξ, η) = λ2 (‖ξ − η‖), i.e. it is a function of the Euclidean distance between the two points. The quantity K (h) is
$$K(h) = \lambda^{-1} E[N_o(h)] = 2\pi \lambda^{-2} \int_0^h \lambda_2(\xi)\, \xi\, d\xi \qquad (5.6)$$
where E [N_o (h)] is the expected number of further events in the circle of radius h centered at an arbitrary event ξ. We estimate K (h) with the celebrated Ripley’s estimator: define w (ξ, u) as the proportion of the circumference of the circle with center ξ and radius u which lies in S, and w_ij = w (x_i , u_ij ), where u_ij = ‖x_i − x_j‖.
$$\hat{K}(h) = \frac{1}{n(n-1)}\, |S| \sum_{i=1}^{n} \sum_{j \neq i} w_{ij}^{-1}\, I_h(u_{ij}) \qquad (5.7)$$
where I_h (u_ij ) = I (u_ij ≤ h) is an indicator function. This gives edge-corrected estimates of the K (h) function. For the remaining part of (5.4), λ2 (0) does not depend on h, while for the integral we use the weighted integral suggested by Berman and Diggle (1989). By plugging these estimates in (5.4) we obtain an estimate of MSE (h).
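For intuition about (5.7), the following Python sketch computes a deliberately simplified version of the estimator in which every edge-correction weight w_ij is set to one. This is an assumption made to keep the example short: it is not the edge-corrected estimator used in the text, and it underestimates K (h) for points near the border of S. The function name is hypothetical.

```python
import math
import random


def ripley_k_uncorrected(points, h, area):
    """Version of (5.7) with w_ij = 1 for all pairs (no edge correction)."""
    n = len(points)
    close_pairs = 0  # ordered pairs (i, j), j != i, with u_ij <= h
    for i in range(n):
        xi, yi = points[i]
        for j in range(n):
            if j != i and math.hypot(xi - points[j][0], yi - points[j][1]) <= h:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))


rng = random.Random(2007)  # arbitrary seed
pattern = [(rng.random(), rng.random()) for _ in range(400)]  # uniform points on the unit square
h = 0.05
print(ripley_k_uncorrected(pattern, h, area=1.0), math.pi * h ** 2)
```

Under complete spatial randomness the theoretical value is K (h) = πh², so for a small bandwidth the uncorrected estimate should fall somewhat below the second printed number, with the gap reflecting the missing edge correction and sampling noise.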
On the other hand, the estimated index will be a function of the bandwidth. We know that the
intensity estimator is more sensitive to the bandwidth than to the specific kernel function chosen.
Furthermore it is known that the choice of the specific kernel function is not as crucial as the choice of the bandwidth
h: we concentrated our eﬀorts on the latter issue.
20 A Cox Process is a point process such that: 1) {Λ (ξ) : ξ ∈ R²} is a non-negative-valued stochastic process; 2) conditional on the realization {Λ (ξ) = λ (ξ) : ξ ∈ R²}, the point process follows an IPP with intensity λ (ξ). We can see an IPP as a particular Cox process in which the distribution of Λ (ξ) is degenerate at λ (ξ).
21 This is a simple but rough method of computing the optimal bandwidth. The literature on Point Processes usually relies on ad hoc criteria. Diggle, Zheng and Durr (2005) use cross-validated likelihood methods.
This would imply that the diﬀerence of the measured segregation in two cities can depend on the
specific bandwidth selection, suggesting to choose the same fixed bandwidth for all the cities. We
present results in which we use a fixed bandwidth of .5 and 1 km respectively.
It should be mentioned that there are other methods to estimate ρ (ξ, m). Among others, Kelsall
and Diggle(1998) and Diggle, Zheng and Durr (2005) use a kernel regression estimator for ρ (ξ, m)
and they choose the optimal bandwidth by cross-validated likelihood. We will not experiment with
these techniques here and we will consider alternative estimation approaches in future developments.
We conducted several experiments in order to find the fastest estimation procedure. We estimated the intensities using a finite grid of 2000×2000 for the kernel estimation, but we experimented
with finer grids to test the robustness of the results. As a practical matter, when estimating the
conditional probability, we use the same bandwidth for $\hat{\lambda}(\xi, m)$ and $\hat{\lambda}_0(\xi)$, in order to avoid
unpleasant results like probabilities greater than one or conditional probabilities not summing up to
one. However, we realized that the use of the grid can sometimes be a source of problems: using
the same grid for (say) New York and Champaign will lead to a greater approximation error for
New York, since this city is much more densely populated than Champaign. Moreover, notice that
we do not need to evaluate the intensity at each point of the set S but just at the observed
locations. This is feasible because in order to compute the index (5.1) we need to evaluate the conditional probabilities at the observed points only. This method is more precise because
the estimated indices do not rely on approximations on a finite grid.
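The two practical points above can be illustrated with a minimal Python sketch: the intensities are evaluated only at the observed locations, and a common bandwidth keeps the estimated conditional probabilities in [0, 1] and summing to one. The Gaussian kernel, the absence of edge correction, the synthetic point pattern and the names `kernel_intensity`, `rho_same` and `rho_mixed` are illustrative assumptions, not the estimator used to produce the results in this paper.

```python
import numpy as np

def kernel_intensity(eval_pts, data_pts, h):
    """Gaussian kernel intensity estimate at eval_pts (assumed kernel, no edge correction)."""
    diff = eval_pts[:, None, :] - data_pts[None, :, :]
    d2 = (diff ** 2).sum(axis=2)                       # squared distances
    return np.exp(-d2 / (2 * h ** 2)).sum(axis=1) / (2 * np.pi * h ** 2)

# Hypothetical point pattern: 50 locations in the unit square, two groups.
rng = np.random.default_rng(0)
pts = rng.uniform(size=(50, 2))
marks = np.arange(50) % 2                              # groups 0 and 1 alternate

# Same bandwidth for each group and for the total population,
# evaluated at the observed locations only (no grid).
h = 0.1
lam0 = kernel_intensity(pts, pts, h)
rho_same = np.stack([kernel_intensity(pts, pts[marks == m], h) / lam0
                     for m in (0, 1)])

# Different bandwidths: group 1 with h = 0.02, total population with h = 0.2.
rho_mixed = (kernel_intensity(pts, pts[marks == 1], 0.02)
             / kernel_intensity(pts, pts, 0.2))

print(np.allclose(rho_same.sum(axis=0), 1.0))   # probabilities sum to one
print(rho_mixed.max() > 1.0)                    # a "probability" above one
```

With a common bandwidth the group intensities add up to the total intensity at every location, so the ratios behave like probabilities; with mismatched bandwidths nothing enforces this.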
5.3 Estimation Strategy with Count Data
In the case in which we have data aggregated by area, i.e. counts per block as in the Census
Summary Files, we can use an approximate kernel regression method. The metropolitan area S
is partitioned into K disjoint subunits,
$$S = \bigcup_{k=1}^{K} S_k, \qquad S_k \cap S_l = \emptyset \ \text{ for } k \neq l.$$
By the independent scattering property the counting variables N (Sk ) over disjoint regions are independent. Therefore,
by the definition of intensity function and intensity measure, we have
$$E N(S_k) = \int_{S_k} \lambda(\xi)\, d\xi$$
for any k. This implies that we can write the number of points as
$$N(S_k) = \int_{S_k} \lambda(\xi)\, d\xi + u_k$$
where $u_k$ is a mean zero error, uncorrelated across blocks because of the independence across
disjoint regions. If λ is continuous and $S_k$ is connected, by the mean value theorem for integrals there exists a $\bar{\xi} \in S_k$ such that $\int_{S_k} \lambda(\xi)\, d\xi = \lambda(\bar{\xi})\, |S_k|$, and we can write
$$N(S_k) = \lambda(\bar{\xi})\, |S_k| + u_k \qquad (5.8)$$
If we assume that λ (ξ) is a very smooth function and the area of the block |Sk | is small, we can
approximate (5.8) for ξ ∈ Sk with
$$N(S_k) \approx \lambda(\xi)\, |S_k| + u_k$$
This allows us to use a kernel regression approach to estimate the expected number of points
in Sk ,
$$E[\, N(S_k) \mid \xi\, ] \approx \lambda(\xi)\, |S_k|$$
and thus the function λ (ξ) |Sk | can be estimated as
$$\hat{\lambda}(\xi)\, |S_k| = \sum_{i=1}^{n} \frac{K_h(\xi - x_i)}{\sum_{j=1}^{n} K_h(\xi - x_j)}\, n_i$$
where the $x_i$'s are the centroids of the census blocks and $n_i$ is the count in block i. Using this procedure we can then estimate
$\hat{\lambda}_0(\xi)\, |S_k|$ and $\hat{\lambda}(\xi, m)\, |S_k|$, and taking the ratio we get an estimate for $\hat{\rho}(\xi, m)$
$$\hat{\rho}(\xi, m) = \frac{\hat{\lambda}(\xi, m)}{\hat{\lambda}_0(\xi)} = \frac{\sum_{i=1}^{n} K_h(\xi - x_i)\, n_{mi}}{\sum_{i=1}^{n} K_h(\xi - x_i)\, n_{0i}} \qquad (5.9)$$
where $n_{0i}$ is the number of people living in block i and $n_{mi}$ is the number of people belonging
to race m and living in block i; we use the estimated conditional probabilities evaluated at the
block centroid to compute the index.
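As an illustration of (5.9), the ratio of kernel-weighted counts can be sketched in a few lines of Python. The Gaussian form of $K_h$, the four hypothetical blocks and the names `rho_hat`, `n_b` and `n_0` are assumptions made for the example only.

```python
import numpy as np

def rho_hat(xi, centroids, n_m, n_0, h):
    """Kernel-weighted group counts over kernel-weighted total counts, as in (5.9)."""
    diff = xi[:, None, :] - centroids[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    K = np.exp(-d2 / (2 * h ** 2))   # Gaussian K_h (assumed); its constant cancels in the ratio
    return (K @ n_m) / (K @ n_0)

# Hypothetical data: four blocks with centroids one unit apart on a line.
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
n_0 = np.array([100.0, 100.0, 100.0, 100.0])   # total residents per block
n_b = np.array([90.0, 80.0, 10.0, 0.0])        # residents of group m per block

# Evaluate at the block centroids, as done in the text, for three bandwidths.
rho_small_h = rho_hat(centroids, centroids, n_b, n_0, h=0.01)
rho_mid_h = rho_hat(centroids, centroids, n_b, n_0, h=1.0)
rho_large_h = rho_hat(centroids, centroids, n_b, n_0, h=1000.0)
print(rho_small_h, rho_mid_h, rho_large_h)
```

With a very small bandwidth the estimate collapses to the raw block proportions; with a very large one it approaches the city-wide share (180/400 in this example), which is the no-segregation benchmark.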
In Appendix C we present indices estimated by smoothing the proportion of each racial group,
using the same approach. A practical alternative would be to assume that all the mass of individuals
is concentrated at the centroid of the block. This is equivalent to assuming that the intensity at
the centroid is equal to the total number of people in the block. This procedure is practically
very appealing but contradicts the point process assumptions, thus we prefer to use the kernel
regression approach just shown.
In Appendix B we present alternative parametric estimation methods for point patterns and
count data. Specifically, for count data, we can recover the intensity function using MLE techniques,
as long as we assume a parametric model for the intensity and we know the polygonal shape and
coordinates of each block.22
6 Results
6.1 Artificial Data
In Figure 6 we show a plot of the estimated MSE as a function of the bandwidth h. For most
of the artificial cities the search for the optimal bandwidth is not hard. In general, as expected,
the optimal bandwidth for the general spatial pattern is larger than the one for blacks: given the
segregation pattern, blacks are much closer to each other and the precision of the kernel must be
increased.
[Insert Figure 6 here]
The selected bandwidths are summarized in the following table.23
22
The procedure is computationally very involved, since it requires computing the intensity measure (an integral)
for each block and iterating the numerical maximization routine to find the parameters of the intensity.
23
We had a minor problem with the bandwidth selection for City E. As shown in the table, the optimal h0 for
City E would be 0.005, which in fact gives kernel estimates that are just little circular regions of almost null radius.
Given the final goal of obtaining a concrete estimate for the conditional probabilities ρ (ξ, b), we experimented with various
arbitrary h0 's. In the end we considered it appropriate to use the bandwidth for whites.
Table 2: Optimal bandwidths
          Total    Blacks             Whites
City A    2.83     0.418              2.43
City B    2.605    0.264              0.37
City C    0.37     0.264              0.62
City D    2.445    0.194              2.96
City E    2.395    0.005 (see fn. 24) 2.85
City F    2.73     2.78               2.75
In order to give an idea of what the intensity estimates look like, we show the kernel estimates
for city C in Figures 7 to 9. The visualization of the estimates is suggestive of why we should be
concerned about the neighborhood-based indices of segregation, since the intensity seems to
vary a lot over the city's area, at least for some cities. In Figure 7 we present the estimates of
intensity for the population as a whole, in Figure 8 the one for blacks and in Figure 9 the intensity
for nonblacks. In Figure 10 we show the estimated conditional probability of blacks, which is
smoothed out at the border of the central black neighborhood.25
[Insert Figure 7, 8, 9 and 10 here]
The comparison of the spatial dissimilarity with the standard dissimilarity index shows some
interesting patterns, as shown in Table 3.
Table 3: Spatial Dism vs. Traditional
          Spatial Dism    Dism
City A    0.9225333       1
City B    0.900698        1
City C    0.9061751       1
City D    0.803017        0.7816667
City E    0.8993939       0.8816667
City F    0.03108531      0.1216667
For the "extreme" cities A, B and C the spatial dissimilarity is smaller than its standard
counterpart: this is the result of smoothing out the conditional probability ρ (ξ, b) over the region,
as a consequence of computing the index based on the individual locations. For cities D and
E, the ones with a non-square central ghetto, the two indices are generally in agreement
and look very close. Of course this would change if we changed the definition of the neighborhoods
(see below). For the other "extreme" city, the perfectly integrated F, the spatial
dissimilarity measures less segregation than the standard measure.
The Modifiable Areal Unit Problem (MAUP) does not affect the spatial dissimilarity by definition, while it heavily alters the standard measure. If we compute the dissimilarity index using
different levels of aggregation of the data we will get different numbers. The problem is amplified
when there is a very high level of segregation because smaller subunits are more homogeneous than
24
The actual bandwidth used in the estimation is the one of whites, i.e h0 = 2.85. This is to avoid weird behaviour
of the estimated intensities and conditional probabilities.
25
When looking at these estimates, we should keep in mind that we used a diﬀerent bandwidth for each type
(total population, blacks and whites), so the visible diﬀerences in intensity over space should not be interpreted as
relevant in the computation of the conditional probabilities, since there we use the same bandwidth for blacks and
total population.
bigger ones, hence when using smaller neighborhoods the index will be higher than when using
bigger ones.
The results of our simulations are shown in Table 4.
Table 4: Dissimilarity and MAUP
Neighborhoods    City D (φdism = 0.803017)    City E (φdism = 0.8993939)
4                0.07666667                   0.495
16               0.7816667                    0.8816667
64               0.7558333                    1.0
We computed the dissimilarity index for several diﬀerent partitions of the cities: 4, 16, and 64
neighborhoods respectively.
For city E we see a clear increase of the index as we increase the number of neighborhoods.
Surprisingly, for city D, the value of the index is not necessarily monotonically increasing in the
number of neighborhoods: from 4 neighborhoods to 16 the index increases, while it decreases from
16 neighborhoods to 64.
Table 4 suggests another potential problem of the neighborhood-based approach: the relationship between the scale of the partition and the index is not necessarily monotonic. This does not
happen in our framework: we will show that the spatial dissimilarity is a monotonically decreasing
function of the bandwidth.
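The dependence of the traditional index on the partition can be reproduced with a small Python sketch. It uses the standard dissimilarity formula, one half of the sum over neighborhoods of |b_j/B - w_j/W|, on a hypothetical checkerboard city that is not one of the artificial cities A to F; the function name `dissimilarity` and the square-grid partitions are assumptions of the example.

```python
import numpy as np

def dissimilarity(xy, is_black, k):
    """Standard dissimilarity index on a k x k grid partition of the unit square."""
    ix = np.minimum((xy[:, 0] * k).astype(int), k - 1)
    iy = np.minimum((xy[:, 1] * k).astype(int), k - 1)
    cell = ix * k + iy                                   # neighborhood id of each resident
    b = np.bincount(cell[is_black], minlength=k * k)     # blacks per neighborhood
    w = np.bincount(cell[~is_black], minlength=k * k)    # whites per neighborhood
    return 0.5 * np.abs(b / b.sum() - w / w.sum()).sum()

# Hypothetical city: a 4 x 4 checkerboard of homogeneous patches,
# each patch holding a 5 x 5 lattice of residents.
coords = (np.arange(20) + 0.5) / 20
gx, gy = np.meshgrid(coords, coords, indexing="ij")
xy = np.column_stack([gx.ravel(), gy.ravel()])
patch_x = (xy[:, 0] * 4).astype(int)
patch_y = (xy[:, 1] * 4).astype(int)
is_black = (patch_x + patch_y) % 2 == 0

# Same residents, three partitions: 1, 4 and 16 neighborhoods.
d_by_partition = {k * k: dissimilarity(xy, is_black, k) for k in (1, 2, 4)}
print(d_by_partition)
```

The locations never change, yet the index is zero with 1 or 4 neighborhoods (each neighborhood mixes the two groups in equal shares) and one with 16 neighborhoods (each neighborhood coincides with a homogeneous patch).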
6.2 Metropolitan Areas Census Data
We have computed the index of spatial dissimilarity for all the racial groups and all the US
metropolitan areas in 2000, by using the diﬀerent estimation methods. The computed indices
are available at the website http://netfiles.uiuc.edu/amele2/www/pps/.
For ease of exposition we analyze only blacks and multigroup segregation, showing results for a
sample of 9 MSAs: Detroit, New York, Chicago, Los Angeles, San Francisco, Philadelphia, Boston,
Cleveland, Champaign-Urbana. This is enough to show some of the properties of our measure and
compare it with the traditional approach.
6.2.1 Blacks Segregation
In Figure 11 we report the estimated MSE for the black population in the New York PMSA, as
an example to illustrate the procedure of estimation. The minimizer corresponds to the optimal
bandwidth of 348 meters.
[Insert Figure 11 here]
The corresponding estimated conditional probability is shown in Figure 12: the three main
black areas in the Bronx, Harlem and Bedford-Stuyvesant shown in Figure 4 above, correspond to
the whiter areas in Figure 12, where the conditional probability is close or equal to 1.26
[Insert Figure 12 here]
26
The reader should be aware that Figure 12 is realized with a grid 1000 × 1000, smaller than the grid we actually
used in estimation. The main pattern is nonetheless clear even with the smaller resolution.
In Table 5 we present the principal result: we compare the spatial dissimilarity with the traditional indices computed using blocks and tracts as subunits.
The indices reported in column 1 of table 5 are obtained using the approximated kernel regression method. Both the levels of segregation and the ranking of the cities are diﬀerent from those
implied by the traditional approach.
Table 5: Spatial Dissimilarity vs Traditional (2000), Indices and Rankings
                 Indices                            Rankings
                 Spatial Dism   Blocks   Tracts     Spatial Dism   Blocks   Tracts
Detroit          0.8701         0.8655   0.8405     1st            1st      1st
New York         0.6903         0.7013   0.6714     5th            6th      5th
Chicago          0.7632         0.8215   0.7789     2nd            2nd      2nd
Los Angeles      0.6148         0.6266   0.5765     6th            7th      7th
San Francisco    0.5217         0.6149   0.5528     9th            8th      8th
Philadelphia     0.7276         0.7565   0.6897     4th            4th      4th
Boston           0.6009         0.7084   0.6364     7th            5th      6th
Cleveland        0.7532         0.8096   0.7713     3rd            3rd      3rd
Champaign        0.5937         0.6055   0.4468     8th            9th      9th
One could object that this is just a consequence of smoothing out the neighborhood-based
index, but if this were the case we would expect the estimated spatial dissimilarity to lie
between the values in columns 2 and 3. This is not true for several cities (Detroit lies above both
values, while Chicago, San Francisco, Boston and Cleveland lie below both), thus we conclude that our index is not
only a smooth version of the neighborhood-based indices but is able to detect some aspect of
the segregation phenomenon that traditional indices cannot. We think of this method as the most
reliable nonparametric method when using count data.27
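The comparison can be checked directly from the index values reported in Table 5. The short Python sketch below recomputes the rankings and lists the cities whose spatial dissimilarity falls outside the interval spanned by the block-based and tract-based values; the variable and function names are illustrative.

```python
import numpy as np

cities = ["Detroit", "New York", "Chicago", "Los Angeles", "San Francisco",
          "Philadelphia", "Boston", "Cleveland", "Champaign"]
# Index values copied from Table 5.
spatial = np.array([0.8701, 0.6903, 0.7632, 0.6148, 0.5217, 0.7276, 0.6009, 0.7532, 0.5937])
blocks = np.array([0.8655, 0.7013, 0.8215, 0.6266, 0.6149, 0.7565, 0.7084, 0.8096, 0.6055])
tracts = np.array([0.8405, 0.6714, 0.7789, 0.5765, 0.5528, 0.6897, 0.6364, 0.7713, 0.4468])

def rank_desc(v):
    """Rank 1 = most segregated (there are no ties in these data)."""
    return (-v).argsort().argsort() + 1

ranks = {name: rank_desc(v) for name, v in
         [("spatial", spatial), ("blocks", blocks), ("tracts", tracts)]}

# Cities whose spatial index is NOT between the block and tract values.
lo, hi = np.minimum(blocks, tracts), np.maximum(blocks, tracts)
outside = [c for c, s, l, u in zip(cities, spatial, lo, hi) if not (l <= s <= u)]
print(ranks)
print(outside)
```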
In Table 6 we present results using the kernel regression and different bandwidths. One of the
advantages of our approach is the possibility to compute segregation indices at different scales.28
The scale is a proxy for the local environment of the individuals and by varying the bandwidth
we can vary the scale of the measurement. Moreover, as suggested in the methodological section,
the MSE minimization (or any other method to select the bandwidth) will prescribe a different
bandwidth for each city according to the specific morphology of the metropolitan area, so
differences in measured segregation may partly reflect differences between the bandwidths
used for different cities. Therefore using the same bandwidth for all the cities may give us more comparable
estimates.
27
For parametric methods we refer the reader to Appendix B. Notice that there is a price to pay when using
parametric methods with count data: we need to specify a parametric model for the intensity function and we need
to know the coordinates of the blocks boundaries in order to perform the integration of the intensity function over
the block region.
28
I am extremely grateful to Patrick Bayer for this suggestion.
Table 6: Spatial Dissimilarity, different bandwidths
                 Indices                        Rankings
                 Optimal   h = 0.5   h = 1      Optimal   h = 0.5   h = 1
Detroit          0.8701    0.8536    0.8380     1st       1st       1st
New York         0.6903    0.6878    0.6679     5th       5th       5th
Chicago          0.7632    0.7552    0.7400     2nd       3rd       3rd
Los Angeles      0.6148    0.6027    0.5795     6th       6th       6th
San Francisco    0.5217    0.5275    0.5031     9th       9th       9th
Philadelphia     0.7276    0.7079    0.6738     4th       4th       4th
Boston           0.6009    0.5999    0.5780     7th       7th       7th
Cleveland        0.7532    0.7588    0.7418     3rd       2nd       2nd
Champaign        0.5937    0.5862    0.5459     8th       8th       8th
In the first column of Table 6 we reproduce the kernel regression estimate of Table 5. In
columns 2 and 3 we report the estimates with a bandwidth of half a kilometer and one kilometer respectively
for each city. The ranking obtained from a .5 km bandwidth is not very different from the one
obtained from a 1 km bandwidth, while both differ from the one obtained with MSE
minimization.29
We computed the indices for the 9 MSAs by varying the bandwidth from .1 to 3 km: theoretically
we expect that the index will converge to zero as the bandwidth increases, since the estimated
process will converge to a homogeneous process with zero segregation.
[Insert Figure 13 here]
This result is the same as the one found by Reardon et al (2006) and Feitosa et al (2007), but since they
do not assume any stochastic process they are not able to give a theoretical justification of the
negative relationship. Notice that the rankings are quite stable as a function of the bandwidth.
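The negative relationship between bandwidth and measured segregation can be illustrated on synthetic data. The index (5.1) is defined earlier in the paper and is not reproduced here; the sketch below uses, as an assumption, a two-group form, the sum over observed points of |ρ̂(ξ, b) - ρ̄| divided by 2nρ̄(1 - ρ̄), which equals one under complete segregation. The Gaussian kernel, the synthetic city and the name `spatial_dissimilarity` are likewise hypothetical.

```python
import numpy as np

def spatial_dissimilarity(pts, is_black, h):
    """Two-group index from kernel-smoothed conditional probabilities (assumed normalization)."""
    diff = pts[:, None, :] - pts[None, :, :]
    K = np.exp(-(diff ** 2).sum(axis=2) / (2 * h ** 2))    # Gaussian kernel weights
    rho_hat = K[:, is_black].sum(axis=1) / K.sum(axis=1)   # same bandwidth in both terms
    rho_bar = is_black.mean()                              # city-wide share of blacks
    return np.abs(rho_hat - rho_bar).sum() / (2 * len(pts) * rho_bar * (1 - rho_bar))

# Hypothetical city on the unit square: blacks in the left half, whites in the right half.
rng = np.random.default_rng(1)
n = 200
black_pts = np.column_stack([rng.uniform(0.0, 0.5, n), rng.uniform(0.0, 1.0, n)])
white_pts = np.column_stack([rng.uniform(0.5, 1.0, n), rng.uniform(0.0, 1.0, n)])
pts = np.vstack([black_pts, white_pts])
is_black = np.arange(2 * n) < n

bandwidths = [0.02, 0.1, 0.5, 10.0]
index_by_h = [spatial_dissimilarity(pts, is_black, h) for h in bandwidths]
print(dict(zip(bandwidths, index_by_h)))
```

For a very large bandwidth every location receives nearly the same estimated probability, so the index approaches zero, which is the behaviour expected under a homogeneous process.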
6.2.2 Multigroup Segregation
For the multigroup version of the dissimilarity we computed similar tables. Table 7 has the same
structure as Table 5. In order to avoid conditional probabilities that do not sum to one, we have
to use the same bandwidth for each racial group. It is then quite arbitrary to decide which one to
use. We think the optimal bandwidth of the entire population is the safest bet, so we used that in
all the computations.
29
The ranks in column 5 and 6 are the same only for the 9 cities in our tables, but they change for other cities.
Table 7: Multigroup Spatial Dissimilarity vs Traditional (2000)
                 Indices                         Rankings
                 Spat Dism   Blocks   Tracts     Spat Dism   Blocks   Tracts
Detroit          0.8286      0.8530   0.8190     1st         1st      1st
New York         0.6054      0.6647   0.4783     5th         6th      6th
Chicago          0.6563      0.7705   0.7213     4th         4th      3rd
Los Angeles      0.4834      0.5381   0.4780     8th         8th      7th
San Francisco    0.4770      0.5276   0.4706     9th         9th      8th
Philadelphia     0.6966      0.7931   0.7127     3rd         3rd      4th
Boston           0.5336      0.6713   0.5479     7th         5th      5th
Cleveland        0.7208      0.8345   0.7980     2nd         2nd      2nd
Champaign        0.5466      0.5780   0.4376     6th         7th      9th
As in the dichotomous case, the spatial dissimilarity in column 1 implies different levels of
segregation than the traditional one. The ranking is slightly different too.
In Table 8 we repeat the exercise of Table 6, using a fixed bandwidth for all the cities. The
differences in the indices are not striking, while the rankings appear quite similar.30
Table 8: Multigroup Segregation, different bandwidths (2000)
                 Indices                        Rankings
                 Optimal   h = 0.5   h = 1      Optimal   h = 0.5   h = 1
Detroit          0.8286    0.7945    0.7734     1st       1st       1st
New York         0.6054    0.6020    0.5803     5th       5th       5th
Chicago          0.6563    0.6412    0.6216     4th       4th       4th
Los Angeles      0.4834    0.4715    0.4488     8th       9th       9th
San Francisco    0.4770    0.4815    0.4618     9th       8th       8th
Philadelphia     0.6966    0.6712    0.6338     3rd       3rd       3rd
Boston           0.5336    0.5322    0.5053     7th       7th       6th
Cleveland        0.7208    0.7287    0.7071     2nd       2nd       2nd
Champaign        0.5466    0.5389    0.5028     6th       6th       7th
The first conclusion we may draw from these tables is that the bandwidth choice is crucial for the
correct measurement of the segregation levels and more research is needed in order to improve the
quality of the estimates. Nonetheless the results are suggestive of the differences between our measures
and the neighborhood-based approach.
We report the correlations between our indices and the neighborhood-based ones in Table
9. We present correlations with the standard dissimilarity, the isolation index, the information
index and the Gini index (see Massey and Denton (1988) or Reardon and Firebaugh (2002) for a
detailed description). For blacks we also show the correlation with the Spectral Segregation Index
of Echenique and Fryer (2007), which is the only index based on individuals' locations available in
the literature.
30
If we consider all the cities we have some diﬀerences.
Table 9: Correlations with traditional indices
Panel A: Blacks
                 SDism (opt)   SDism (.5km)   SDism (1km)   SSI      Dism     Isol     Info
SDism (.5km)     0.9773
SDism (1km)      0.9462        0.9702
SSI              0.7044        0.7213         0.7905
Dissimilarity    0.6675        0.6640         0.7522        0.5740
Isolation        0.7371        0.7468         0.8227        0.9000   0.7810
Information      0.7290        0.7313         0.8234        0.7926   0.9210   0.9545
Gini             0.6749        0.6706         0.7577        0.5905   0.9897   0.7797   0.9180
Panel B: Multigroup
                 SDism (opt)   SDism (.5km)   SDism (1km)   Dism     Isol     Info
SDism (.5km)     0.9854
SDism (1km)      0.9728        0.9825
Dissimilarity    0.7484        0.7485         0.8244
Isolation        0.7241        0.7258         0.7990        0.8821
Information      0.7470        0.7429         0.8176        0.9530   0.9544
Gini             0.7430        0.7455         0.8157        0.9860   0.8442   0.9402
For the dichotomous version of the index (blacks) in Panel A, the correlation with the standard
dissimilarity is between .6675 and .7522, indicating that we are not just replicating the measurement
of the dissimilarity: our index captures something that the neighborhood-based dissimilarity cannot,
i.e. the individual locations and exposure to other races (via the spatially varying conditional
probabilities). Similarly the correlation with the Gini is between .6749 and .7577. Notice that
Gini and Dissimilarity are almost perfectly correlated. The correlation with the Information and
Isolation indices is slightly higher but still far from one.
It is interesting to notice the correlation with the Spectral Segregation Index (SSI), which varies
from .7044 to .7905, showing that we are not just replicating the measures of Echenique and Fryer
(2007). Also notice the high correlation between the SSI and the Isolation index: the SSI is based on the
interactions among points so it is not surprising that it is highly correlated with the isolation index,
which measures the exposure to neighbors of other races. The correlations for the multigroup indices in
Panel B are slightly higher but the pattern is similar.
In Appendix C we present the results obtained using the kernel regression method for smoothing
the proportions in each block. The results do not perfectly overlap the ones presented here, since
with proportions the approximation is more demanding in terms of smoothness of the intensity.
All these results show that the choice of the estimation method is important in this context: the
researcher should choose the estimation strategy based on the data availability (point pattern or
count data) but also on the a priori information on the smoothness of the intensity. We leave the
development of alternative methods to future research, while the interested reader can get a flavor
of what can be done with parametric methods by reading Appendix B.
7 Conclusion and Discussion
In this work we have presented a new approach to measuring residential segregation, with an application
to racial segregation. We assume that the locations of individuals of different racial groups
follow an Inhomogeneous Marked Poisson Process over the metropolitan area and we compute
the conditional probability that in a specific location there is an individual of racial group m.
If there is no segregation this conditional probability should not vary over space. We build a
segregation index analogous to the dissimilarity and we show that it is immune from the problems
arising with neighborhood-based measures: it does not depend on arbitrary partitions of the city
in neighborhoods, it is a function of the individuals' locations and it is immune from the modifiable
areal unit problem. Furthermore the index computed according to our approach gives different
rankings of the cities than traditional measures, suggesting that this methodology provides more
than a refinement of the existing indices.
This framework is very flexible and future research will be devoted to exploring its full potential.
The main assumption we imposed is that the pattern of locations is a realization of an Inhomogeneous
Multitype Poisson process over the metropolitan area: this amounts to assuming that the univariate
Poisson processes are independent, i.e. for example the spatial location of blacks is independent of
the spatial location of whites in the urban area. While this assumption may seem inappropriate
when measuring racial segregation, it provides a useful benchmark. Future research will explore
models where the interaction of different events is explicitly modeled. Of particular interest
are the Marked Pairwise Interaction Models, which can take into account the possible
"repulsion" or "attraction" of the different events/points and marks.31
Another interesting application is the measurement of income segregation, where the marks
space is continuous. The definition of complete segregation is slightly diﬀerent: The marked point
process X is completely segregated if and only if for all ξ ∈ x0 , ∃m∗ = m∗ (ξ) ∈ M such that
ρ (ξ, m) = δ (m − m∗ ), where δ (u) is the Dirac-Delta function. In this case the spatial dissimilarity
is
$$\phi_{dism} = \frac{1}{2n} \sum_{\xi \in x_0} \int_M \left| \rho(\xi, m) - \rho(m) \right| dm$$
where we have replaced the sum over the racial groups by the integral over M = [0, ∞) and we
use an analogous normalization. The estimation is also more complicated in this case so we refer
the reader to a companion paper in progress. We have shown that the framework can be easily
extended for the measurement of subgroups segregation or multilevel segregation. It should be
noted that when considering multidimensional marks the independence assumption holds for the
vector m and not for the components of m: for example, if we consider the segregation of race (r)
and income (y) together, we assume independence across pairs m = (r, y) but not among r and y.
This implies that we can account for any possible correlation among the submarks r and y and use
the joint distribution of race and income ρ (r, y) to describe the non segregated case. We leave this
development to future eﬀorts.
In this work we have shown a "descriptive" theory of segregation, where the index is a function
of the specific realization. However the index is a random variable itself, being a function of the
point process realization. So we can build a test that probabilistically assesses whether a city is more
segregated than another one. The test provided in Kelsall and Diggle (1998) or in Diggle, Zheng
and Durr (2005) is for detection only: the null hypothesis is no segregation and rejecting means that
there is segregation, without referring to the level. The development of tests is highly influenced
by the computational speed, therefore experimentation with faster and more precise estimation
methods is necessary. We plan to experiment with kernel regression methods (Kelsall and Diggle
(1998) and Diggle, Zheng and Durr (2005)), total variation regularization methods used in density
31
See Diggle (2003) or Moller and Waagepetersen (2004) for an extensive introduction to these models.
estimation (see for example Koenker and Mizera (2004))32 and other smoothing techniques. The
parametric methods analyzed in Appendix B are appealing when we have count data at the block
level or other small areas, as long as we have the boundaries of these polygons. The drawbacks
are that we have to specify a parametric model for the intensity and the numerical optimization
routine can be computationally very slow.
Finally, in this work we present what can be called a "descriptive" theory of segregation indices.
As we already noticed above, the index is a random variable, depending on the realization of the
point process: ideally we would like to know how the segregation index changes as a function of
the process parameters. In Mele (2007, in progress) we provide some theoretical results for indices
under the same assumptions used in this work. The Poisson assumption simplifies the problem of
computing the moments (in particular expectation and variance) of the index. We prove that given
an intensity function we are able to compute the moments of any index. The only drawback is
that in practice these are computationally very difficult, involving the evaluation of an infinite
sum of infinite integrals. However, if we restrict the attention to indices of the form $\sum_{\xi \in X} h(\xi)$,
i.e. if we define the index as the sum over all the events of a particular individual level index
(or function), we can show that the moments reduce to a simple integral that can be computed
with traditional numerical integration methods. This allows us to estimate whether (say) New York is on
average structurally more segregated than Chicago, where structurally is interpreted as conditional
on the intensity function.
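For reference, a sketch of why indices of this additive form are tractable, under the assumptions that X is a Poisson process on S with intensity λ and that h is a fixed (non-random) function, integrable and square-integrable with respect to λ. These are the standard Campbell-type formulas for Poisson processes, stated here as background and not necessarily in the form proved in Mele (2007, in progress); note that h here denotes the individual-level function and not the bandwidth.

```latex
E\Big[\sum_{\xi \in X} h(\xi)\Big] = \int_S h(\xi)\,\lambda(\xi)\,d\xi,
\qquad
\operatorname{Var}\Big[\sum_{\xi \in X} h(\xi)\Big] = \int_S h(\xi)^2\,\lambda(\xi)\,d\xi .
```

Both moments are single integrals over S, so they can be evaluated by ordinary numerical quadrature once an intensity function is specified.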
References
[1] Ananat, Oltmans Elizabeth (2007), "The Wrong Side(s) of the Tracks: Estimating the Causal
Eﬀect of Racial Segregation on City Outcomes", mimeo, Duke University and NBER
[2] Anselin, Luc (1995), "Local Indicators of Spatial Association - LISA", Geographical Analysis
27(2):93-115
[3] Baddeley, Adrian and Turner, Rolf (2005), "spatstat: An R Package for Analyzing Spatial
Point Patterns", Journal of Statistical Software, 12(6):1-42
[4] Berman, Mark and Diggle, Peter (1989), "Estimating Weighted Integrals of the Second-Order
Intensity of a Spatial Point Process", Journal of the Royal Statistical Society, Series B, 51(1):81-92
[5] Card, David and Rothstein, Jesse (2007), "Racial Segregation and the Black-White Test Score
Gap", forthcoming, Journal of Public Economics.
[6] Cutler , D. M. and Glaeser, E. L. (1997), "Are Ghettos Good or Bad", Quarterly Journal of
Economics, 112: 827-872
[7] Cutler, D. M., Glaeser, E.L. and Vigdor, Jacob L. (1999), The Rise and Decline of the American
Ghetto, Journal of Political Economy, 107(3):455-506
[8] Daley, D. J. and Vere-Jones, D (2003), "An Introduction to the Theory of Point Processes",
Springer, 2nd Edition
32
These methods are likely to produce better results with income segregation, where the continuity of the marks
creates problems in the kernel estimation in the form of Dirac catastrophe.
[9] Diggle, Peter (1983), "Statistical Analysis of Spatial Point Patterns", Academic Press, London,
First Edition
[10] Diggle, Peter (1985), "A Kernel Method for Smoothing Point Process Data", Applied Statistics,
34(2):138-147
[11] Diggle, Peter (2003), "Statistical Analysis of Spatial Point Patterns", Academic Press, London,
Second Edition
[12] Diggle, P. J., Eglen, S. J. and Troy, J. B. (2006), "Modelling the Bivariate Spatial Distribution of Amacrine
Cells", in A. Baddeley et al. (Eds.), Case Studies in Spatial Point Process Modelling, Springer
Lecture Notes in Statistics 185:215-233
[13] Diggle, Peter, Zheng, Pingping and Durr, Peter (2005), "Nonparametric estimation of spatial
segregation in a multivariate point process: bovine tuberculosis in Cornwall, UK", Applied
Statistics, 54(3):645-658
[14] Echenique, Federico and Fryer, Roland (2007), "A Measure of Segregation Based on Social
Interactions", Quarterly Journal of Economics 122(2):441-485
[15] Feitosa, Flavia, Camara, Gilberto, Monteiro, Antonio M. V., Koschitzki, Thomas, and Silva,
Marcelino P. S. (2007), "Global and Local Spatial Indices of Urban Segregation", International
Journal of Geographical Information Science 21(3):299-323
[16] Glaeser, E. L. and Vigdor, Jacob L. (2000), Racial Segregation in 2000 Census: Promising
News, Center of Urban and Metropolitan Policy, The Brookings Institution Survey Series,
April
[17] Kelsall, Julia E. and Diggle, Peter J. (1998), "Spatial Variation in Risk of Diseases: A Nonparametric Binary Regression Approach", Applied Statistics, 47(4):559-573
[18] Koenker, Roger and Mizera, Ivan (2004), "Penalized Triograms: Total Variation Regularization for Bivariate Smoothing", Journal of Royal Statistical Society: Series B (Statistical
Methodology) 66(1):145-163
[19] La Ferrara, Eliana and Mele, Angelo (2006), "Racial Segregation and Public School Expenditure", CEPR Discussion Paper 5750
[20] Massey, Douglas S. and Denton, Nancy A. (1988). The Dimensions of Residential Segregation,
Social Forces, 67(2):281-315
[21] Mele, Angelo (2007), "Poisson Indices of Segregation", in progress, UIUC
[22] Moller, Jesper and Waagepetersen, Rasmus Plenge (2004), "Statistical Inference and Simulation for Spatial Point Processes", Monographs on Statistics and Applied Probability 100,
Chapman and Hall
[23] Reardon, Sean F. and Firebaugh, Glenn (2002), Multigroup Segregation Indices, Sociological
Methodology 32:33-68
[24] Reardon, Sean F. and O'Sullivan, David (2004), "Measures of Spatial Segregation", Sociological Methodology 34:121-162
[25] Reardon, Sean F., O'Sullivan, David, Lee, Barrett A., Firebaugh, Glenn and Farrell, Chad (2006),
"The Segregation Profile: Investigating How Metropolitan Racial Segregation Varies by Spatial
Scale", WP 06-01, Stanford University
[26] Rowlingson, B.S. and Diggle P.J. (1993), "Splancs: Spatial Point Patterns Analysis Code in
S-Plus", Computers and Geosciences, 19:627-655
[27] Stoyan, D., Kendall, W.S. and Mecke, J. (1987), "Stochastic Geometry and Its Applications",
John Wiley and Sons
[28] Stoyan, D. and Stoyan, H (1994), "Fractals, Random Shapes and Point Fields: Methods of
Geometrical Statistics", Wiley Series in Probability and Mathematical Statistics, John Wiley
and Sons
[29] Zhuang J., Ogata Y. and Vere-Jones D. (2005), "Diagnostic Analysis of Space-Time Branching
Processes for Earthquakes" Chap. 15 (Pages 275-290) of Case Studies in Spatial Point Process
Models, Edited by Baddeley A., Gregori P., Mateu J., Stoica R. and Stoyan D. Springer-Verlag,
New York.
A Preliminaries of Point Processes Theory
This appendix elaborates from various sources and we only cover what we consider a prerequisite
for the understanding of the methodology used to measure residential segregation: we refer the
interested reader to the books listed in the references for a more exhaustive treatment of the
theory.
A.1 Basic properties and definitions
A spatial point process is a stochastic process whose realizations are countable sets of points in a planar region. More
generally a point process X is a random countable subset of a space S ⊆ R². We will denote the
random set as X = {xi } or, according to the context, through the random variable N (A), that is the
number of points in set A ⊆ S. We will denote the realizations of X as x and the realizations of N
as n. We will denote a generic point in S as ξ or η and the generic point of the process as xi . We
will write |A| to indicate the area of region A and dξ to refer to the infinitesimal region containing
ξ.
We will consider only locally finite point fields. Formally
DEFINITION A.1 Consider any realization of the process x ⊆ S and denote the cardinality of the set as n (x). Then we say that x is locally finite if n (x ∩ A) < ∞, for any bounded
A ⊆ S.
Consider the set of all such realizations x
$$N_{lf} = \{ x \subset S : n(x \cap A) < \infty \ \text{for any bounded } A \subseteq S \}$$
whose elements are locally finite point configurations. In the following we will consider only
processes X with realizations in $N_{lf}$.
The first important concept is stationarity: a point process is stationary if, when observed
from different sets on the plane, the configurations of the points are similar, with differences arising
only from randomness (that follows the same laws). More formally, a point process is stationary if all
probability statements about the process in any bounded set A of the plane are invariant under
arbitrary translations. This property is very important in defining the randomness of the process,
as we will see below.
DEFINITION A.2 (Stationarity) A point process X is stationary if for any p ∈ R^2, the translated process X_p = X + p = {x_i + p : x_i ∈ X} and X have the same distribution, i.e. P(X ∈ A) = P(X_p ∈ A).
This implies that all the statistics are invariant under translation, e.g. EN(A) = EN_p(A), so the
expected number of points in A does not depend on where A is located.
A point process is isotropic if the invariance holds under arbitrary rotations.
DEFINITION A.3 (Isotropy) A point process X is isotropic if for any rotation m about the origin, the processes
X and mX = {m x_i : x_i ∈ X} have the same distribution, i.e. P(X ∈ A) = P(mX ∈ A).
A process that is stationary and isotropic is called motion-invariant. For convenience we will
also assume that the process is simple (or orderly), i.e. that multiple coincident events cannot occur.
Formally we have the following:
DEFINITION A.4 (Orderliness) A point process X is orderly (simple) if x_i ≠ x_j for all i ≠ j.
A.2 First and Second Order Properties
Consider a process X defined over S ⊆ R^2. The intensity function is a locally integrable function
λ : S → [0, ∞), defined as the limit of the expected number of points per unit of area in an infinitesimal region:
\[
\lambda(\xi) = \lim_{|d\xi| \to 0} \left\{ \frac{E[N(d\xi)]}{|d\xi|} \right\} \tag{A.1}
\]
A function λ is locally integrable if ∫_A λ(ξ) dξ < ∞ for all bounded A ⊆ S. If we assume stationarity
then λ(ξ) = λ for all ξ.
The second order intensity function is defined as
\[
\lambda_2(\xi, \eta) = \lim_{|d\xi|, |d\eta| \to 0} \left\{ \frac{E[N(d\xi)\, N(d\eta)]}{|d\xi|\, |d\eta|} \right\} \tag{A.2}
\]
and it is a measure of the spatial association of the process. If we assume stationarity and
isotropy then λ_2(ξ, η) = λ_2(‖ξ − η‖), i.e. it is a function of the Euclidean distance between the two
points.
It is convenient to define another quantity: the intensity measure of a point process X is defined
for A ⊆ S as
\[
\Lambda(A) = E N(A) = \int_A \lambda(\xi)\, d\xi \tag{A.3}
\]
It is usually assumed that Λ(A) is locally finite, i.e. Λ(A) < ∞ for all bounded A ⊆ S, and
diffuse, i.e. Λ({ξ}) = 0 for ξ ∈ S (or alternatively ∄ ξ ∈ S s.t. Λ({ξ}) > 0).
The fact that Λ(A) is diffuse implies that P[N(dξ) > 1] = o(|dξ|), i.e. there are no coincident
points, so the process is simple (or orderly).
The intensity function also has an infinitesimal interpretation, since the fact that P[N(dξ) > 1] =
o(|dξ|) implies that E[N(dξ)] converges to P[N(dξ) = 1] as |dξ| → 0 (see footnote 33). It follows that the quantity λ(ξ) dξ can be interpreted as the probability of an event in the infinitesimal region dξ, i.e.
λ(ξ) dξ ≈ P[N(dξ) = 1]. Analogously, notice that E[N(dη) N(dξ)] ≈ P[N(dη) = N(dξ) = 1] for
ξ and η close, so we can interpret the quantity λ_2(ξ, η) dξ dη as the probability of observing two
events in the infinitesimal regions dξ and dη.
A.3 Poisson Processes
The Poisson point processes are by far the most important in applications and are the models that
define the notion of complete spatial randomness.
Before giving the general definition of a Poisson process we have to consider a related
process. Consider any density function f defined on A ⊆ S and let n ∈ N.
DEFINITION A.5 (Binomial Point Process) A point process X is a Binomial Point Process
of n points in A with density f if it consists of n i.i.d. points with density f . We will denote such
a process as X ∼ Bin (A, n, f ).
Footnote 33. With a back-of-the-envelope computation,
\[
\begin{aligned}
E[N(d\xi)] &= P[N(d\xi) = 1]\, E[N(d\xi) \mid N(d\xi) = 1] + P[N(d\xi) > 1]\, E[N(d\xi) \mid N(d\xi) > 1] \\
           &= P[N(d\xi) = 1] + P[N(d\xi) > 1]\, E[N(d\xi) \mid N(d\xi) > 1]
\end{aligned}
\]
and, because P[N(dξ) > 1] = o(|dξ|), as |dξ| → 0 the claim is proven.
Since f is a density function, i.e. ∫_A f(ξ) dξ = 1, it follows necessarily that |A| > 0. The
simplest Binomial point process has finite A, i.e. |A| < ∞, and the points are drawn from a
uniform distribution over A, so that f(ξ) = |A|^{-1}.
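The uniform case can be sketched in a few lines of Python. This is only an illustration: the rectangular region, the number of points and the seed are arbitrary assumptions, and the function name `binomial_point_process` is hypothetical.

```python
import random

def binomial_point_process(n, width, height, rng):
    """Draw n i.i.d. points uniformly on the rectangle A = [0, width] x [0, height].

    With the uniform density f(xi) = 1/|A| this is a realization of Bin(A, n, f).
    """
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
points = binomial_point_process(n=500, width=2.0, height=1.0, rng=rng)

# Under uniformity the share of points in a sub-region B is close to |B| / |A|;
# here B is the left half of A, so the share should be near 0.5.
share_left = sum(1 for (x, _) in points if x < 1.0) / len(points)
print(len(points), round(share_left, 3))
```

Note that the number of points is fixed in advance here; it is the Poisson process defined next that makes the count itself random.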
DEFINITION A.6 (Poisson Point Process) A point process X on S is a Poisson Point
Process with intensity λ(ξ) if the following two conditions are satisfied:
1. for any bounded A ⊆ S with Λ(A) < ∞
\[
P[N(A) = n] = \frac{[\Lambda(A)]^n \exp[-\Lambda(A)]}{n!}, \qquad n = 0, 1, 2, \ldots
\]
2. for any n ∈ N and any bounded A ⊆ S with 0 < Λ(A) < ∞, conditional on N(A) = n,
\[
X \sim \mathrm{Bin}(A, n, f) \quad \text{with} \quad f(\xi) = \frac{\lambda(\xi)}{\Lambda(A)} = \frac{\lambda(\xi)}{\int_A \lambda(\xi)\, d\xi} .
\]
We will write X ∼ Poi(S, λ(ξ)).
Given condition (1) of the definition, for any bounded A ⊆ S we have EN(A) = Λ(A). In
many works condition (2) is replaced by the independent scattering condition:
3. for disjoint sets A_1, A_2, A_3, ..., A_k ⊆ A the random variables N(A_1), N(A_2), N(A_3), ... are
stochastically independent, i.e.
\[
P[N(A_1) = n_1, \ldots, N(A_k) = n_k] = \frac{[\Lambda(A_1)]^{n_1} \exp[-\Lambda(A_1)]}{n_1!} \times \cdots \times \frac{[\Lambda(A_k)]^{n_k} \exp[-\Lambda(A_k)]}{n_k!}
\]
for n = n_1 + n_2 + ... + n_k.
It is straightforward to show that condition (3) is implied by (1) and (2) of the above definition (see footnote 34).
A Poisson Point Process is said to be homogeneous (or stationary) if λ(ξ) = λ for all ξ ∈ S and
f(ξ) = |A|^{-1} for any bounded A ⊆ S. It follows that for a Homogeneous Poisson Process (HPP)
EN(A) = λ|A|.
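As a quick numerical illustration of condition (1) in the homogeneous case (the values λ = 2 and |A| = 1.5 are arbitrary choices made here, not taken from the data):

```latex
\[
N(A) \sim \mathrm{Poisson}(\lambda |A|) = \mathrm{Poisson}(3), \qquad
E N(A) = 3, \qquad
P[N(A) = 0] = e^{-3} \approx 0.0498 .
\]
```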
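Footnote 34. Consider the case in which we have only two disjoint sets, i.e. A = A_1 ∪ A_2. The extension to k sets is done
by induction. Conditional on N(A) = n_1 + n_2 = n, each point of X ∩ A has density f(ξ) = λ(ξ)/Λ(A), so it falls in A_1 with probability
\[
\int_{A_1} f(\xi)\, d\xi = \frac{\Lambda(A_1)}{\Lambda(A)} .
\]
By condition (2) the n points are independent, so n_1 given points all fall in A_1 with probability [Λ(A_1)/Λ(A)]^{n_1}, and also
\[
P[N(A_1) = n_1, N(A_2) = n_2 \mid N(A) = n]
= \binom{n_1 + n_2}{n_1} \left[ \frac{\Lambda(A_1)}{\Lambda(A)} \right]^{n_1} \left[ \frac{\Lambda(A_2)}{\Lambda(A)} \right]^{n_2}
= \frac{n!}{n_1!\, (n - n_1)!}\, \frac{[\Lambda(A_1)]^{n_1} [\Lambda(A_2)]^{n - n_1}}{[\Lambda(A)]^n}
\]
and thus (1) implies that the unconditional probability is
\[
\begin{aligned}
P[N(A_1) = n_1, N(A_2) = n_2]
&= \frac{n!}{n_1!\, (n - n_1)!}\, \frac{[\Lambda(A_1)]^{n_1} [\Lambda(A_2)]^{n - n_1}}{[\Lambda(A)]^n} \cdot \frac{[\Lambda(A)]^n \exp[-\Lambda(A)]}{n!} \\
&= \frac{[\Lambda(A_1)]^{n_1} \exp[-\Lambda(A_1)]}{n_1!} \cdot \frac{[\Lambda(A_2)]^{n - n_1} \exp[-\Lambda(A_2)]}{(n - n_1)!}
\end{aligned}
\]
where the last equality uses Λ(A) = Λ(A_1) + Λ(A_2).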
DEFINITION A.7 (Homogeneous Poisson Process) A point process X on S is a Homogeneous Poisson Point Process with intensity λ if the following two conditions are satisfied:
1. for any bounded A ⊆ S
\[
P[N(A) = n] = \frac{[\lambda |A|]^n \exp[-\lambda |A|]}{n!}, \qquad n = 0, 1, 2, \ldots
\]
2. for any n ∈ N and any bounded A ⊆ S, conditional on N(A) = n,
\[
X \sim \mathrm{Bin}\left( A, n, |A|^{-1} \right)
\]
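The two conditions of the definition translate directly into a two-step simulation recipe. The sketch below assumes a rectangular region and uses a simple Poisson sampler based on products of uniforms (adequate for moderate means only); the function names, the intensity and the seed are hypothetical choices made for illustration.

```python
import math
import random

def poisson_draw(mean, rng):
    """Sample a Poisson(mean) count by multiplying uniforms until the
    product drops below exp(-mean). Suitable for moderate means only."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_hpp(lam, width, height, rng):
    """Simulate a homogeneous Poisson process on A = [0, width] x [0, height].

    Step 1 (condition 1): N(A) ~ Poisson(lam * |A|).
    Step 2 (condition 2): given N(A) = n, place n i.i.d. uniform points in A.
    """
    n = poisson_draw(lam * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]

rng = random.Random(1)  # fixed seed, chosen only for reproducibility
lam, width, height = 50.0, 1.0, 1.0
counts = [len(simulate_hpp(lam, width, height, rng)) for _ in range(400)]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)

# For a Poisson count, mean and variance should both be close to lam * |A| = 50.
print(round(mean_count, 2), round(var_count, 2))
```

Comparing the sample mean and the sample variance of the counts is a crude check of the Poisson property; it is not a formal test of complete spatial randomness.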
The HPP is considered the ideal of complete spatial randomness in the literature. Complete spatial
randomness means that we do not expect the intensity of the process to vary over the region we are
considering and that there are no interactions among different events. Indeed, by condition (1)
and the fact that λ(ξ) = λ, an HPP shows stationarity and isotropy, because N(A) ∼ Poisson(λ|A|),
and thus the expected number of events per unit area does not vary over the planar region; by condition (2)
and f(ξ) = |A|^{-1}, we have no clustering or inhibition (the presence of a point in ξ does not make
the occurrence of an event η in the neighborhood of ξ more or less likely).
A Poisson Point Process is Inhomogeneous (IPP) if the intensity function is not constant over
A; the process is then nonstationary and, in general, anisotropic. The IPP is the simplest class of
nonstationary point processes used in applications.
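One common way to simulate an IPP, not described in this appendix, is thinning: simulate an HPP whose rate dominates the intensity everywhere and keep each point with probability λ(ξ)/λ_max. The sketch below assumes a rectangular region; the linear intensity, the bound `lam_max` and all function names are hypothetical choices made for illustration.

```python
import math
import random

def poisson_draw(mean, rng):
    """Sample a Poisson(mean) count by multiplying uniforms (moderate means only)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_ipp(intensity, lam_max, width, height, rng):
    """Thinning on A = [0, width] x [0, height].

    Requires intensity(x, y) <= lam_max on A. Candidate points come from an
    HPP with rate lam_max; each is kept with probability intensity / lam_max.
    """
    n = poisson_draw(lam_max * width * height, rng)
    kept = []
    for _ in range(n):
        x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
        if rng.random() < intensity(x, y) / lam_max:
            kept.append((x, y))
    return kept

def intensity(x, y):
    # Hypothetical intensity increasing from west to east: lambda(x, y) = 100 * x.
    return 100.0 * x

rng = random.Random(2)  # fixed seed, chosen only for reproducibility
pts = simulate_ipp(intensity, lam_max=100.0, width=1.0, height=1.0, rng=rng)
east = sum(1 for (x, _) in pts if x >= 0.5)

# Here Lambda(S) = 50, with expected counts 37.5 in the east half and 12.5 in the west half.
print(len(pts), east, len(pts) - east)
```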
A.4 Marked Point Processes
Consider a point process X_0 defined over the space S ⊆ R^2. If there are random marks m(ξ) ∈ M
attached to each point ξ ∈ X_0 then the process
X = {{ξ, m(ξ)} | ξ ∈ X_0}
is called a marked point process with events in S and marks in M.
Notice that M may be either a finite set, i.e. M = {1, 2, ..., M}, in which case X is a multitype process, or a general subset M ⊆ R^q, q ≥ 1 (it can also be a set of compact subsets, i.e.
M = {F : F ⊆ R^q}, in which case the process is called a Boolean model).
In the case of categorical variables we use a finite set (for example when considering the racial
groups).
DEFINITION A.8 (Marked Poisson Process) The process X = {{ξ, m(ξ)} | ξ ∈ X_0} is a
Marked Poisson Process if
1. X_0 is a Poisson Point Process over S with intensity function λ_0(ξ) (with ∫_A λ_0(ξ) dξ < ∞ for
all bounded A ⊆ S)
2. conditional on X_0 the marks {m(ξ) | ξ ∈ X_0} are mutually independent
The intensity λ(ξ, m) of the Marked Poisson Process is such that ∫_M λ(ξ, m) dm = λ_0(ξ). We
have the following proposition (for a proof see Proposition 3.9 in Moller and Waagepetersen (2004),
p. 26).
PROPOSITION A.1 If X = {{ξ, m(ξ)} | ξ ∈ X_0} is a Marked Poisson Point Process with
M ⊆ R^q, q ≥ 1, and if
1. conditional on X_0, the marks have distribution ρ(ξ, m, X_0 \ ξ) = ρ(ξ, m)
2. the intensity of the process can be written as λ(ξ, m) = λ_0(ξ) ρ(ξ, m)
then X ∼ Poi(S × M, λ(ξ, m)).
The proposition is very useful in the framework we use in measuring segregation, because it implies
that the Marked Poisson Process is Poisson over the enlarged space S × M and we can use the
standard estimation methods developed for Poisson Processes.
A useful corollary of the proposition is the following.
PROPOSITION A.2 Consider a Multitype Point Process with M = {1, 2, ..., M} and a multivariate point process (X_1, X_2, ..., X_M). The following two properties are equivalent:
1. P(m(ξ) = m | X_0 = x_0) = ρ(ξ, m) does not depend on X_0 \ ξ
2. (X_1, X_2, ..., X_M) is a multivariate Poisson Process with X_m ∼ Poi(S, λ(ξ, m)) mutually independent and λ(ξ, m) = λ_0(ξ) ρ(ξ, m), m = 1, ..., M
When the conditional mark distribution does not depend on location, ρ (ξ, m) = ρ (m) for all
ξ, then we have random labelling.
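The construction behind Proposition A.2 can be sketched as follows: simulate the ground process X_0 and then attach to each point, independently of all the others, a mark drawn from ρ(ξ, ·). The homogeneous ground process, the two marks "b" and "nb", the particular ρ and all function names are hypothetical choices made for illustration.

```python
import math
import random

def poisson_draw(mean, rng):
    """Sample a Poisson(mean) count by multiplying uniforms (moderate means only)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_multitype(lam0, rho, width, height, rng):
    """Ground process X0 ~ HPP(lam0) on [0, width] x [0, height]; each point
    receives mark m with probability rho(x, y)[m], independently of the rest."""
    n = poisson_draw(lam0 * width * height, rng)
    marked = []
    for _ in range(n):
        x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
        probs = rho(x, y)  # dict: mark -> probability, summing to one
        marks = list(probs)
        mark = rng.choices(marks, weights=[probs[m] for m in marks])[0]
        marked.append((x, y, mark))
    return marked

def rho(x, y):
    # Hypothetical mark distribution: the share of group "b" rises from 0.1
    # in the west to 0.9 in the east. A constant share would be random labelling.
    p_b = 0.1 + 0.8 * x
    return {"b": p_b, "nb": 1.0 - p_b}

rng = random.Random(3)  # fixed seed, chosen only for reproducibility
pattern = simulate_multitype(lam0=60.0, rho=rho, width=1.0, height=1.0, rng=rng)
group_b = [(x, y) for (x, y, m) in pattern if m == "b"]
group_nb = [(x, y) for (x, y, m) in pattern if m == "nb"]

# Each sub-pattern is itself Poisson with intensity lam0 * rho(xi, m);
# with these choices both have an expected count of 30.
print(len(group_b), len(group_nb))
```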
B Alternative Estimation Methods
B.1 Parametric Estimation with Point Pattern Data
In the case of the Inhomogeneous Poisson Process the likelihood can be written easily by exploiting
the definition:
\[
P(X) = P(X \mid N(S) = n)\, P(N(S) = n)
= \left( \prod_{i=1}^{n} \frac{\lambda(x_i)}{\Lambda(S)} \right) \frac{\exp[-\Lambda(S)]\, [\Lambda(S)]^n}{n!}
\]
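By taking the logs and rearranging we get
\[
\begin{aligned}
\log P(X) &= \sum_{i=1}^{n} \log \lambda(x_i) - n \log \Lambda(S) - \Lambda(S) + n \log \Lambda(S) - \log(n!) \\
          &= \sum_{i=1}^{n} \log \lambda(x_i) - \Lambda(S) - \log(n!) \\
          &= \sum_{i=1}^{n} \log \lambda(x_i) - \int_S \lambda(\xi)\, d\xi - \log(n!)
\end{aligned}
\]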
So the parameters of the intensity function can be estimated using maximum likelihood techniques:
\[
\hat{\theta} = \arg\max_{\theta} \left\{ \sum_{i=1}^{n} \log \lambda_\theta(x_i) - \int_S \lambda_\theta(\xi)\, d\xi \right\}
\]
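A minimal sketch of this maximization, under assumptions that are not in the text: S is the unit square and the intensity is log-linear in the east-west coordinate, λ_θ(x, y) = exp(θ_0 + θ_1 x), so that the integral has a closed form. For fixed θ_1 the first order condition in θ_0 sets Λ_θ(S) = n, and the resulting profile log-likelihood is concave in θ_1, so a one-dimensional ternary search is enough. All names and the sample points are hypothetical.

```python
import math

def integrated_intensity(theta0, theta1):
    """Lambda_theta(S) for lambda_theta(x, y) = exp(theta0 + theta1 * x) on S = [0, 1]^2."""
    if abs(theta1) < 1e-12:
        return math.exp(theta0)
    return math.exp(theta0) * math.expm1(theta1) / theta1

def log_likelihood(theta0, theta1, points):
    """sum_i log lambda_theta(x_i) - Lambda_theta(S); the constant -log(n!) is dropped."""
    return sum(theta0 + theta1 * x for (x, _) in points) - integrated_intensity(theta0, theta1)

def fit(points, lo=-20.0, hi=20.0, iters=200):
    """Maximize the log-likelihood by ternary search over theta1, with theta0
    profiled out through the condition Lambda_theta(S) = n."""
    n = len(points)

    def best_theta0(theta1):
        return math.log(n) - math.log(integrated_intensity(0.0, theta1))

    def profile(theta1):
        return log_likelihood(best_theta0(theta1), theta1, points)

    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if profile(m1) < profile(m2):
            lo = m1
        else:
            hi = m2
    theta1 = 0.5 * (lo + hi)
    return best_theta0(theta1), theta1

# Hypothetical point pattern leaning towards the east side of the unit square.
points = [(0.2, 0.1), (0.6, 0.8), (0.7, 0.3), (0.8, 0.5), (0.9, 0.9), (0.95, 0.4)]
theta0_hat, theta1_hat = fit(points)

# theta1_hat is positive because the points lean east; at the optimum the
# fitted total mass Lambda(S) equals the number of observed points.
print(round(theta0_hat, 3), round(theta1_hat, 3))
print(round(integrated_intensity(theta0_hat, theta1_hat), 3))
```

With a richer parametrization the integral generally has no closed form and has to be approximated numerically, which is where most of the computational cost lies.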
B.2 Recovering Intensity from Count Data
When the available data are not point patterns, but are aggregated by area, we can use the independent
scattering property of the Poisson Process to recover the intensity. Suppose the metropolitan area
S is partitioned in K disjoint subunits, S = ∪_{k=1}^{K} S_k and S_k ∩ S_l = ∅ for k ≠ l, that we will call
blocks (but may be arbitrarily small areas).
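By the independent scattering property the counting variables over disjoint regions are independent. Therefore
\[
\begin{aligned}
P(N(S_1) = n_1, \ldots, N(S_K) = n_K) &= P(N(S_1) = n_1) \cdots P(N(S_K) = n_K) \\
&= \frac{\exp[-\Lambda(S_1)]}{n_1!} [\Lambda(S_1)]^{n_1} \cdots \frac{\exp[-\Lambda(S_K)]}{n_K!} [\Lambda(S_K)]^{n_K} \\
&= \left[ \prod_{k=1}^{K} \frac{1}{n_k!} \right] \exp[-\Lambda(S)] \prod_{k=1}^{K} [\Lambda(S_k)]^{n_k}
\end{aligned}
\]
So the log-likelihood can be written as (we do not consider the factor \prod_{k=1}^{K} 1/n_k!, since it is constant)
\[
\begin{aligned}
\log P(X) &= -\Lambda(S) + \sum_{k=1}^{K} n_k \log[\Lambda(S_k)] \\
          &= -\int_S \lambda(\xi)\, d\xi + \sum_{k=1}^{K} n_k \log\left[ \int_{S_k} \lambda(\xi)\, d\xi \right] \\
          &= -\sum_{k=1}^{K} \left[ \int_{S_k} \lambda(\xi)\, d\xi \right] + \sum_{k=1}^{K} n_k \log\left[ \int_{S_k} \lambda(\xi)\, d\xi \right]
\end{aligned}
\]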
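and we can estimate the intensity function, assuming a parametrization θ:
\[
\hat{\theta} = \arg\max_{\theta} \left\{ \sum_{k=1}^{K} n_k \log\left[ \int_{S_k} \lambda_\theta(\xi)\, d\xi \right] - \sum_{k=1}^{K} \left[ \int_{S_k} \lambda_\theta(\xi)\, d\xi \right] \right\}
\]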
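The count-data likelihood can be sketched under strong simplifying assumptions that are not in the text: the blocks are vertical strips S_k = [a_k, b_k] × [0, 1] of the unit square and the intensity is λ_θ(x, y) = exp(θ_0 + θ_1 x), so that each block integral has a closed form. The optimizer is a crude grid search over θ_1, with θ_0 set so that the total mass equals the total count. All names and the counts are hypothetical.

```python
import math

def block_mass(theta0, theta1, a, b):
    """Lambda_theta(S_k) for the strip S_k = [a, b] x [0, 1] and
    lambda_theta(x, y) = exp(theta0 + theta1 * x)."""
    if abs(theta1) < 1e-12:
        return math.exp(theta0) * (b - a)
    return math.exp(theta0) * (math.exp(theta1 * b) - math.exp(theta1 * a)) / theta1

def count_log_likelihood(theta0, theta1, edges, counts):
    """sum_k n_k log Lambda(S_k) - sum_k Lambda(S_k); the constant term is dropped."""
    masses = [block_mass(theta0, theta1, a, b) for a, b in zip(edges[:-1], edges[1:])]
    return sum(n * math.log(m) for n, m in zip(counts, masses)) - sum(masses)

def fit_counts(edges, counts, grid=None):
    """Crude grid search over theta1. For each theta1 the optimal theta0
    makes the total mass Lambda(S) equal to the total count."""
    if grid is None:
        grid = [i / 100.0 for i in range(-1000, 1001)]
    total = sum(counts)
    best = None
    for theta1 in grid:
        unit_mass = sum(block_mass(0.0, theta1, a, b) for a, b in zip(edges[:-1], edges[1:]))
        theta0 = math.log(total / unit_mass)
        value = count_log_likelihood(theta0, theta1, edges, counts)
        if best is None or value > best[0]:
            best = (value, theta0, theta1)
    return best[1], best[2]

edges = [0.0, 0.25, 0.5, 0.75, 1.0]  # four hypothetical strips
counts = [5, 10, 20, 40]             # hypothetical block counts, rising to the east
theta0_hat, theta1_hat = fit_counts(edges, counts)
print(round(theta0_hat, 3), round(theta1_hat, 2))
```

With real census blocks the integrals over S_k have to be computed numerically from the block boundaries, which is the practical difficulty discussed next.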
The main price to pay is the necessity to specify a functional form for the intensity function in
order to compute the integral. Furthermore, the integral cannot be computed if we do not have the
coordinates of the block boundaries. The Census releases boundary files down to the block group
level, but not at the block level. An alternative is to use the TIGER files.
C Kernel Regression using Proportions
In this appendix we show the results obtained using the kernel regression of the proportions of each
racial group in the block. The results are slightly different, which confirms that this is an
approximate estimate.
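The idea of smoothing block-level proportions can be sketched with a generic Nadaraya-Watson estimator. This is an illustration under assumptions, not the exact estimator used for the tables: the Gaussian kernel, the use of block centroids as locations, the absence of population weights and all names are hypothetical choices.

```python
import math

def gaussian_kernel(dist, h):
    """Unnormalized Gaussian kernel with bandwidth h (same units as the coordinates)."""
    return math.exp(-0.5 * (dist / h) ** 2)

def kernel_regression(target, centroids, proportions, h):
    """Nadaraya-Watson estimate at `target` of the local proportion of a group,
    smoothing the block-level proportions observed at the block centroids.

    If h is tiny relative to the distances, all weights can underflow to zero.
    """
    weights = [gaussian_kernel(math.dist(target, c), h) for c in centroids]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, proportions)) / total

# Four hypothetical blocks: centroids and the proportion of one group in each.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
proportions = [0.9, 0.1, 0.8, 0.2]

# The centre is equidistant from all centroids, so the estimate there is the
# plain average of the proportions, 0.5 up to rounding.
centre = kernel_regression((0.5, 0.5), centroids, proportions, h=0.5)
# Near the first centroid the estimate is pulled towards that block's proportion.
corner = kernel_regression((0.0, 0.0), centroids, proportions, h=0.5)
print(round(centre, 3), round(corner, 3))
```

A larger bandwidth h averages over more blocks and flattens the estimated surface, which is consistent with the indices in the tables below decreasing as h grows for most cities.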
Table C1: Spatial Dissimilarity, different bandwidths (2000)

                 Indices                               Rankings
City             Optimal     h = .5      h = 1         Optimal   h = .5   h = 1
Detroit          0.8703224   0.8501121   0.8352947     1st       1st      1st
New York         0.6877347   0.6848539   0.6675551     5th       5th      5th
Chicago          0.7606747   0.7498699   0.7345411     2nd       3rd      3rd
Los Angeles      0.629817    0.6175982   0.5955263     7th       6th      6th
San Francisco    0.5319183   0.5397131   0.5041305     10th      10th     9th
Philadelphia     0.7272086   0.7076749   0.6758686     4th       4th      4th
Boston           0.6034576   0.6022681   0.5777609     8th       7th      7th
Cleveland        0.7496532   0.7559522   0.7377916     3rd       2nd      2nd
Champaign        0.5962014   0.5898408   0.5583697     9th       8th      8th
Laredo           0.676968    0.5063632   0.3431035     6th       9th      10th
Table C2: Multigroup Segregation, different bandwidths (2000)

                 Indices                               Rankings
City             Optimal     h = 0.5     h = 1         Optimal   h = 0.5  h = 1
Detroit          0.8290554   0.7929796   0.7712272     1st       1st      1st
New York         0.6047147   0.6011003   0.5804878     5th       5th      5th
Chicago          0.6575917   0.6392469   0.6171649     4th       4th      4th
Los Angeles      0.4906992   0.4776546   0.4549143     8th       9th      9th
San Francisco    0.4741394   0.4796534   0.4552766     9th       8th      8th
Philadelphia     0.6979656   0.6727337   0.6367538     3rd       3rd      3rd
Boston           0.5337201   0.5320079   0.4981971     7th       7th      7th
Cleveland        0.7185795   0.7274199   0.7031806     2nd       2nd      2nd
Champaign        0.5476806   0.5401149   0.5064247     6th       6th      6th
Laredo           0.3056275   0.2343179   0.1868961     10th      10th     10th
Table C3: Correlations with traditional indices

Panel A: Blacks
                 SDism (opt)  SDism (.5km)  SDism (1km)  SSI      Dism     Isol     Info
SDism (.5km)     0.9778
SDism (1km)      0.9523       0.9750
SSI              0.6302       0.6470        0.7347
Dissimilarity    0.6163       0.6144        0.7100       0.5740
Isolation        0.6677       0.6772        0.7687       0.9000   0.7810
Information      0.6630       0.6660        0.7699       0.7926   0.9210   0.9545
Gini             0.6220       0.6199        0.7138       0.5905   0.9897   0.7797   0.9180

Panel B: Multigroup
                 SDism (opt)  SDism (.5km)  SDism (1km)  Dism     Isol     Info
SDism (.5km)     0.9846
SDism (1km)      0.9756       0.9856
Dissimilarity    0.7190       0.7197        0.7920
Isolation        0.6870       0.6888        0.7633       0.8821
Information      0.7199       0.7164        0.7914       0.9530   0.9544
Gini             0.7192       0.7228        0.7900       0.9860   0.8442   0.9402
Figure 1: Different partitions matter [four panels, CITY A to CITY D, each plotted on x and y axes running from 1 to 8]
Figure 2: Examples of realizations of point processes [four panels: HPP, IPP, MultitypePP, MPP]
Figure 3: Artificial data [six panels, CITY A to CITY F, each plotted on axes running from 0 to 4]
Figure 4: Geographic distribution of blacks in New York PMSA, 2000 [axes: Eastings, 0 to 80; Northings, 0 to 150]
Figure 5: Geographic distribution of racial groups in New York PMSA, 2000 [axes: Eastings, 0 to 80; Northings, 0 to 150]
Figure 6: Estimated MSE and optimal bandwidths for cities A and C
Figure 7: Estimated λ_0(ξ) for city C
Figure 8: Estimated λ(ξ, b) for city C
Figure 9: Estimated λ(ξ, nb) for city C
Figure 10: Estimated conditional probability ρ(ξ, b) for city C
Figure 11: Estimated conditional probability for blacks in New York PMSA
Figure 12: Estimated conditional probability of black in New York PMSA, 2000
Figure 13: Spatial Dissimilarity and Scale