IDENTIFICATION OF CRASH CONTRIBUTING FACTORS: U. R. R. Manepalli, *

1
IDENTIFICATION OF CRASH CONTRIBUTING FACTORS:
2
EFFECTS OF SPATIAL AUTOCORRELATION AND SAMPLE DATA SIZE
3
4
1
U. R. R. Manepalli, *2G. H. Bham, Ph.D.
5
6
7
8
9
10
11
12
13
1
Civil, Architectural and Environmental Engineering
Missouri University of Science and Technology
2
Civil Engineering Department
University of Alaska, Anchorage
*Corresponding Author
E-mail: [email protected], [email protected]
14
15
16
17
18
Total words in abstract = 255
19
Total words in paper = 5143 + 255 + 1*250 (Figures) + 6*250 (Tables)
20
= 7148
21
22
23
24
25
26
27
28
29
30
31
32
33
1
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
IDENTIFICATION OF CRASH CONTRIBUTING FACTORS:
2
EFFECTS OF SPATIAL AUTOCORRELATION AND SAMPLE DATA SIZE
3
4
5
6
7
8
9
10
11
12
U. R. R. Manepalli
Civil, Architectural and Environmental Engineering
Missouri University of Science and Technology
13
G. H. Bham Ph.D.
Civil Engineering Department
University of Alaska, Anchorage
Corresponding Author
E-mail: [email protected], [email protected]
ABSTRACT
14
This paper uses sample sets of crash data to examine the similarities in crash contributing
15
factors among various counties that have similar effects on spatial autocorrelation (SA). Moran’s
16
I and Getis-Ord G i * statistics were used to determine the correlation, and the multinomial logistic
17
regression to identify the crash contributing factors. Seventy-five counties in the state of
18
Arkansas were divided into five categories based on the Z-values of the Getis-Ord G i * statistic.
19
Depending on the sample size, crash data from a county or a group of counties from each of
20
these categories were used, and factors contributing to crashes in each of the categories were
21
identified based on the crash severity index. Results indicated that most of the crash contributing
22
factors identified for each category were also identified by the crash data from a county or a
23
group of counties in that category. Pulaski county, with the highest Z-value from the first
24
category indicated largest cluster of crashes and identified the highest percentage (55%) of
25
factors that contributed to crashes in that category using sample crash data. From the sample data
26
used, the multinomial logistic regression indicated the following factors to be positively
27
associated with crash severity: nighttime driving, driving under the influence of alcohol, roadway
28
gradient, curved alignment, rural areas, and head-on and sideswipe-same direction collision
29
types. The results of this research can be used for better allocation of funds by departments of
30
transportation to identify crash contributing factors that are associated with higher levels of crash
31
severity by analyzing smaller sets of data.
32
Keywords
33
Spatial autocorrelation, multinomial logistic regression, Moran’s I, Getis-Ord G i * statistic,
34
Geographic Information Systems (GIS)
2
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
INTRODUCTION
2
In 2010, the United Nations General Assembly adopted a resolution, which proclaimed
3
the current decade (2011-2020) as the ‘Decade of Action for Road Safety’ (1). By highlighting
4
the need for increased activities at the national, regional and global levels, the action aims to
5
rethink approaches to traffic safety. Traffic fatalities and injuries constitute a major public health
6
concern worldwide, with nearly 1.3 million fatalities as a result of road crashes, and between 20
7
million and 50 million injuries (1). Highway crashes also cost society an estimated more than
8
$230 billion a year in the United States (2). Thus, there is a vital need to identify crash
9
contributing factors at sites with fatal and sever injury crashes.
10
In recent years, the techniques for estimation of crash prediction and crash contributing
11
factors have improved. With improved methods, the location of crash concentration and the main
12
crash causal factors can be identified. Further research to identify these crash contributing factors
13
with minimal resources is essential as budgets for departments of transportation (DOT) are
14
declining. The Missouri DOT will cut its expenditures by $512 million over the next five-years.
15
The future annual budget for construction and maintenance is also expected to reduce by half,
16
from $1.2 billion to $600 million (3). Therefore, it is imperative to examine techniques that save
17
personnel time and resources, and can identify major crash contributing factors with limited data
18
that will save lives and crash costs to society.
19
Past studies have used crash data to identify high spatial concentration of crashes (4-8).
20
Kim et al. (9) identified spatial and temporal patterns among crashes. McMohan (10) used
21
buffering, cluster analysis, and made spatial queries to analyze pedestrian crash risk in
22
Geographic Information Systems (GIS). Peled et al. (11) generated maps for the distribution of
23
crash concentrations. Premo (12) used global spatial statistics to determine the presence of
24
spatial autocorrelation (SA) in archaeological data. To identify the local trends, spatial statistics
25
was used and previously undetected archaeological trends were discovered. In traffic safety,
26
when researchers considered the effects of SA, they found a significant impact on the estimation
27
of crash risk factors (13-15). Huang et al. (16) and Siddiqui et al. (17) developed Bayesian
28
models to relate various county-level socioeconomic factors and traffic data to crash occurrence
29
while accounting for possible SA among adjacent counties. Li et al. (18) used Bayesian approach
30
to identify and rank road segments for hotspot identification. They generated three dimensional
31
maps to express the crash risk for various road segments and recommended determining the
3
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
contributing factors for high risk road segments. In summary, extensive research has been
2
conducted in identifying spatial clusters. However, to identify if any relation exists among these
3
clusters requires further research.
4
Identification of crash contributing factors is a vital area of research; however, very few
5
researchers has examined the validity of sample size e.g., data at the level of a county to identify
6
factors that contribute to crashes over a state and research in this area can play a significant role
7
in identifying crash clusters (10). The first law of geography states that “everything is related to
8
everything else, but near things are more related than distant things” (19). Therefore, it can be
9
hypothesized that spatially, crashes have (some) similarities that may include common crash
10
contributing factors. SA is used in this paper to identify these similarities between crashes
11
(events). Further, to reduce fatal and severe injury crashes, multinomial logistic regression
12
(MLR) is used to determine the relative risk among crashes, given that a crash has occurred.
13
These objectives are the focus of this paper. The details of SA and MLR are described in the
14
subsequent sections. To identify sample data for analyses, a county or a set of counties with the
15
highest crash severity index (CSI) were chosen. Details of CSI are presented under
16
‘Methodology’.
17
Further, Guerts et al. (20) analyzed crash sites in Belgium to identify hotspots. A
18
sensitivity analysis indicated that 190 sites (23.8%) showed a higher risk of crashes when crash
19
severity was considered. Hauer et al. (21) indicated that identification of sites with high crash
20
severity led to more cost-effective projects. From the literature reviewed, it was clearly observed
21
that consideration of crash severity compared to crash frequency, lead to cost-effective projects.
22
Crash severity is used in this paper by utilizing the crash severity index.
23
This paper next presents the data set analyzed, a description of methodology, spatial
24
autocorrelation (SA), the various indices used in this paper, and the multinomial logistic
25
regression (MLR) technique. This is followed by analysis and application of results. The paper
26
ends with discussion, conclusions and recommendations for future research.
27
CRASH DATA ANALYZED
28
Arkansas crash data from eight Interstate, 19 U.S. and 239 State highways were used for
29
this study. The data comprised of 112,695 crashes in 75 counties from 2004 through 2006.
30
Tables 1 and 2, list the factors used to analyze crash severity injury and the SA among them. The
31
distance between the crashes required to calculate SA was obtained from the crash data set.
4
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
2
METHODOLOGY
The first step in the methodology was the categorization of counties. The second step
3
identified the county(ies) from each category that provided the sample data. The county with the
4
highest crash severity index (CSI) was selected to represent the sample data. For MLR models,
5
the minimum sample size required is 2000 (22). Therefore, county (ies) were chosen to satisfy
6
this criterion. The third step identified the crash contributing factors from the sample data and the
7
remaining data in each category. In the final step, the percentage of common crash contributing
8
factors between the sample data (county) and the remaining data for each category were
9
computed. ‘Common’ refers to factors that followed similar trend (increase/decrease) in the odds
10
ratio and statistically significant crash contributing factors between the county data and the data
11
for the remaining counties in each category. Details of SA, categorization of the counties,
12
computation of CSI, and MLR are presented in the following.
13
Spatial Autocorrelation (SA) Indices
14
The basic principle of SA is similar to the first law of geography mentioned previously.
15
SA is defined as the correlation of a variable with itself in space. SA measures the strength of
16
autocorrelation and the assumption of independence. A variable is said to be spatially
17
autocorrelated if there are any systematic patterns in its spatial distribution. SA is positive if
18
nearby areas (regions) are alike. Negative autocorrelation applies to neighboring areas that are
19
unlike, and no SA is exhibited by random patterns. Spatial autocorrelation indices, however, do
20
not explain why locations that indicate a cluster of crashes have a higher incidence of crashes
21
compared to other locations; therefore, SA methods cannot identify factors that cause crashes
22
(23). In this paper, Moran’s I was used to determine SA. Getis-Ord G i * statistic was then used to
23
identify the clusters of crashes by county. Z-values of G i *were used to categorize these counties.
24
Moran’s I is one of the oldest indicators of SA (24). SA compares the value of a variable at one
25
location with its value at other locations. Similar to a correlation coefficient, SA varies between
26
–1.0 and + 1.0. A positive correlation indicates clustering (i.e., higher crash concentrations),
27
whereas a negative correlation indicates dispersion or lower crash concentration. Moran’s I can
28
be expressed as:
5
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
Moran ' s I =
1
n ∑∑ wij (Yi − Y )(Y j − Y )
i
j
i≠ j
2
(1)
( ∑ wij )∑ (Yi − Y )
i
where:
3
n
= crash frequency,
4
w ij
= weight used to compare crashes at locations i and j,
5
Y
= mean crash severity index,
6
Yi
= crash severity at location i, and
7
Yj
= crash severity at location j.
8
9
The term w ij represents a contiguity matrix. If location j is adjacent to location i, the
10
interaction receives a weight of 1, otherwise, it is zero. The term w ij compares the sum of the
11
cross-products of values at different locations, weighted by the inverse of the distance between
12
the locations. The significance of the Moran’s I can be evaluated by a Z-value as:
14
E(I ) =
−1
n −1
n 2 ( n − 1) s1 − n ( n − 1) s2 − 2 s02
( n + 1)( n − 1) s02
S(I ) =
where:
S0 =
∑w
i≠ j
19
(3)
To calculate the Z-value, S(I), the standard deviation is computed as:
17
18
(2)
where, E(I), the expected value of Moran’s I (without SA) can be computed as:
15
16
I − E(I )
S (I )
Z (I ) =
13
ij ,
S1 =
1
2
∑ (w
i≠ j
ij
+ w ji ) 2 , and S 2 =
(4)
∑ (∑ w + ∑ w
kj
k
j
ik
)2
i
In the above formula i, j, and k represent the locations of crashes, respectively. Values of
20
Z greater than +1.96, and less than -1.96 indicate positive and negative spatial autocorrelation,
21
respectively, at a significance level of 5%.
6
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
Getis-Ord G i * Statistic G-statistics, developed by Getis and Ord, analyzes the evidence of
2
spatial patterns and represent a global SA index (25, 26). The G i * statistic, on the other hand, is a
3
local SA index. It is more suitable for discerning cluster structures of high or low concentration.
4
A simple form of the G i * statistic is (27):
n
∑w
G i* =
5
ij
xj
j =1
(5)
n
∑x
j
j =1
6
where: G i * is the SA statistic of an event i over n events (crashes) (4). The term x j characterizes
7
the magnitude of the variable x at events j over all n, and it is the CSI value determined at a
8
particular location. The distribution of the G i * statistic can be observed from the underlying
9
distribution of the variable x (4). The threshold distance (the proximity of one crash to another)
10
in this study was set to zero to indicate that all features were considered neighbors of all other
11
features. This threshold was applied over the entire region of the study.
12
13
The standardized G i * is essentially a Z-value and can be associated with statistical
significance:
n
Gi* =
14
15
16
∑w x
j =1
ij ij
n
− X ∑ wij
j =1
2

 n
n


2


n∑ wij − ∑ wij 


 j =1
 j =1  
S
n −1
(6)
where:
n
∑x
j =1
2
j
17
S=
18
Positive and negative G i * statistic values correspond to clusters of crashes with high- and
19
n
− ( X )2
(7)
low-value events, respectively. A G i * close to zero implies a random distribution of events.
20
7
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
Categorization of Counties
2
The categorization of counties was based on the Z-value of the G i * statistic calculated for
3
each of the counties. This categorization can be based on six different classification schemes i.e.,
4
equal interval, defined interval, quartile, natural breaks, geometric interval, and standard
5
deviation. The natural breaks scheme was best suited for the present study (28, 29). In the natural
6
breaks scheme, the classes are based on categorizing inherent in the data. The classes identify the
7
break points that best group similar values and maximize the differences between these classes.
8
9
In this study, the Jenks’ algorithm was used to categorize the natural breaks (28). This
algorithm is commonly used to classify the data in a choropleth map; a type of thematic map that
10
uses shading to represent classes of a feature associated with specific areas (e.g., a population
11
density map). The Jenk’s algorithm generates a series of values that best represent the actual
12
breaks in the data, as opposed to some arbitrary classification scheme. Thus, it preserves the true
13
clustering of data values. As a result, the algorithm creates k classes as the variance within
14
categories is minimized (30).
15
Selection of Counties for Analysis of Crash Data
16
From each category, a county or a set of counties starting with the highest crash severity
17
index (CSI) was (were) selected as a data sample. The highest CSI was used as the criterion as it
18
provided the greatest variability in the crash data and provides a better choice in terms of sample
19
data for analysis compared to other data samples such as random selection of data samples. CSI
20
was used to incorporate crash severity, and to associate crash contributing factors with high
21
levels of crash severity. A high CSI indicates a large number of fatal crashes or frequency of
22
crashes with various levels of severity. The CSI was computed as (31):
23
24
CSI = S1*W1 + S2*W2 + S3*W3 + S4*W4 + S5*W5
(8)
where:
25
S1
= frequency of crashes involving fatalities,
26
S2
= frequency of crashes involving incapacitating injuries,
27
S3
= frequency of crashes involving moderate injury,
28
S4
= frequency of crashes involving complaint of pain,
29
S5
= frequency of crashes involving property-damage-only (PDO), and
8
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
2
3
WI
= weights (542, 29, 11, 6, and 1(32)) assigned to crash severity levels, I = 1, 2
…5.
The weights used are based on the comprehensive crash costs per person for each level of
4
crash severity (S1-S5) as used in the Highway Safety Manual (HSM) (32) and Manepalli et al.
5
(31). They were calculated as the ratio of different crash costs with respect to property damage
6
only crashes. The crash costs were $4,008,900 for a fatal crash (S1), $216,000 for major injury
7
(S2), $79,000 for minor injury (S3), $44,900 for complain of pain (S4) and $7,400 for property
8
damage only crashes (S5), i.e., weight for the fatal crash equals 542 (4,008,900/7,400). Similarly,
9
the other weights were computed and rounded up to the nearest integer.
10
11
Multinomial Logistic Regression
Logistic regression (LR) can be used to predict dependent variables from different types
12
of independent variables, and can compute the percent of variance associated with the dependent
13
variable that is explained by the independent variable. LR can also rank the relative importance
14
of independent variables to assess the interaction effects, and explain the impact of covariate
15
control variables. The impact of predictor variables can be explained in terms of the odds ratios.
16
In this study, crash severity with five different levels (S1-S5) was used as the dependent variable.
17
As the dependent variable has more than two categories, multinomial logistic regression (MLR)
18
was selected for this study. Crash severity was calculated relative to the property-damage-only
19
crashes as explained previously. The details of independent variables and their levels are
20
presented in Table 1.
21
All independent variables were checked using variance inflation factor (VIF) to ensure
22
that multicollinearity did not exist in the data. VIF was found to be less than 10 for all of the
23
variables; hence, multicollinearity was not observed. Variables selected for model development
24
depended on the quality of the data. Only certain factors were retained for analysis as some
25
factors had missing values. When more than 10% of the values were missing, that factor was not
26
considered. For the factors presented in Table 1, values no more than 1% were missing. Mallow
27
C p was used to retain the variables. A smaller value of C p indicates a better model (33).
28
The Statistical Analysis System (SAS) (34) was used to perform MLR using the
29
CATMOD (Categorical Modeling) procedure to identify the factors that contribute to crashes
30
and are positively associated with crash severity. The CATMOD procedure has been used in the
9
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
past for linear modeling, log-linear modeling, logistic regression, and repeated measurement
2
analysis (33).
3
INTERPRETATION AND ANALYSIS OF RESULTS
4
In this section, the results for SA are first presented. The results of MLR for the county
5
selected in the first category, and then for the rest of the category are presented. To avoid
6
repetition, consolidated results for the remaining categories are summarized at the end of the
7
section.
8
Results for Spatial Autocorrelation Indices
9
Table 2 presents the results of Moran’s I, demonstrating that the total crash frequency and
10
the different crash severity levels exhibit SA, and that the crashes were not random chance
11
events. The crash severity levels proved significant at various levels. The counties in Arkansas
12
were divided into five categories using G i *; Figure 1 shows these categories, and Table 3
13
presents the results by category and lists the number of counties in each category. For some
14
categories, more than one county was selected as MLR models need a minimum sample size of
15
2000 crashes (22). As mentioned earlier, a set of counties were identified based on the highest
16
CSI. In Table 3, the size of sample data used in terms of CSI and crash frequency ratio used in
17
identifying the crash contributing factors is also presented. Columns D and E indicate the CSI
18
computed for the sample county(ies) and the data for the remaining counties used, respectively.
19
The first category included three counties, Pulaski, Benton and Washington. Pulaski
20
County covers the Arkansas state capital of Little Rock. Little Rock has a population of 183,133
21
and an area of 116.8 square miles (35). Fayetteville, another major city in Arkansas, with a
22
population of 72,202 and an area of 44.5 square miles, is located in Washington County. Several
23
Interstate highways pass through Little Rock. The AADT on one of those highways, I-630, was
24
52,297 vehicles/day. Benton and Washington border the states of Missouri and Oklahoma; have
25
a high volume of traffic, and high incidence of crashes. These counties, therefore, indicated a
26
high Z-value of the G i * statistic.
27
Results for Multinomial Logistic Regression
28
Table 4 presents the results for Pulaski County (the county with the highest CSI in the
29
first category), and Table 5 shows the collective results for the remaining counties in the first
10
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
category; results are presented that are statistically significant at a level of 0.05. The factors
2
common to both Pulaski County and the other counties in the first category follow similar trend
3
(i.e., an increase or decrease in the odds ratio) with respect to the factors identified for the first
4
category. These factors are shaded in these tables. Details of the analysis of the odds ratio are
5
presented below, along with a few examples for each case. For detailed results on all of the other
6
categories, the interested reader is referred to Manepalli (36).
7
Table 4 indicates that during darkness, fatal crashes were more likely to occur than
8
property damage only crashes, and the odds ratio increased by a factor of 1.28 if other variables
9
remained constant. Similarly, the relative risk of fatal crashes was greater than property-damage-
10
only crashes in rural areas and on curved roads. For details on calculation of the odds ratio, the
11
interested reader is referred to Bham et al. (33).
12
Table 5 indicates that major injury crashes to property-damage-only crashes were more
13
likely to result from a head-on collision than from a sideswipe-same-direction (SSSD) collision
14
by a factor of 4.66, given that all other variables remained constant. Similarly, major injury
15
crashes were more likely than property-damage-only crashes on rural roads and curved roads, as
16
a result of single-vehicle crashes (SVC). The risk of a major injury crash relative to a property-
17
damage–only crash decreased by a factor of 0.50 in rear-end collisions compared to SSSD.
18
Similarly, major injury crashes were more likely than property-damage-only crashes when
19
alcohol was involved, AADT was high, occurred on the weekends, and in the case of SSSD
20
collisions. To reduce the repetitive nature of the results, only the results of the first category are
21
explained in the text.
22
Summary
23
Table 6 summarizes the results for each of the five categories at each crash severity level.
24
It indicates the number of crash contributing factors common to both a county or a set of
25
counties with the highest CSI and the remaining counties collectively within a category that
26
showed similar trends in the odds ratio for both (sample county(ies) and remaining counties).
27
For the third category, crashes involving complains of pain were more frequently the
28
result of SVC than SSSD crashes, but indicated positive and negative values for the odds ratio
29
(i.e., similar trend was not observed), respectively. For Craighead County, the odds ratio
30
increased (SVC vs SSSD); however, overall for the third category, it decreased. Therefore, SVC
11
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
or SSSD collision types cannot be considered factors that are positively associated with crash
2
severity.
3
Rural areas were positively associated with crash severity. The following factors also had
4
a positive association with crash severity: darkness, driving under the influence of alcohol,
5
roadways on a grade, curved roadways, head-on and sideswipe same direction collisions (SSSD).
6
Positive association of these factors with crash severity has been found in other studies as well
7
(33).
8
Additional inferences
9
The factors identified were based on classification of SA in the first step. The division of
10
the counties by SA indices, therefore, plays a major role in identifying factors that contribute to
11
crashes. These results are supported by the factors identified by MLR as well. Analysis
12
conducted for each highway individually in Arkansas would require significant computational
13
time and resources. A safety analyst’s time would be significantly reduced if the factors that
14
contribute to crashes can be analyzed at the county level. The summary of results indicates that
15
an analysis of a set of counties in a state was sufficient to identify 49% of factors that contributed
16
to crashes statewide. Tables 2, 3 (column G) and 6 demonstrate that crash data from a group of
17
selected counties (13 out of 75 counties) represented 25% of the total crash data and identified
18
49% of the common factors that contributed to the crashes. The results from Table 3 (column G)
19
indicate that the use of a small proportion of crash data can be used to identify the major crash
20
contributing factors. This proportion varies for different categories and is remarkable that overall
21
such a small proportion of the crash data (25%) identifies major (49%) crash contributing
22
factors.
23
Previous studies (4) have shown that negative Z-values of the G i * statistic contains fewer
24
clusters of events. The fourth and fifth categories with Z-values from -.481 to -.098 and from -
25
.775 to -.481, respectively; had fewer crash clusters. As values close to zero imply random
26
distribution of spatial events, the third category (-.098, .560), and to a lesser degree the second
27
category (.560, 1.769), demonstrate a random distribution of spatial events.
28
APPLICATION TO CURRENT PRACTICE
29
30
State DOTs allocate funds for application of remedial measures to hotspots. The results
of this study indicate that as Z-values decrease, crash concentrations also decrease. DOTs can
12
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
allocate funds and concentrate on counties with higher Z-values, rather than equal importance to
2
all counties. State DOTs can initiate a statewide improvement program such as improve shoulder
3
width, add median barriers, etc. by identifying the factors from the sample data using SA.
4
For specific improvements, each highway should be analyzed separately; however, the
5
entire length of each highway need not be analyzed. For instance, I-40 passes through the entire
6
state of Arkansas. SA indices indicated the highest positive Z-value for Pulaski County. Sections
7
of I-40 in Pulaski County should, therefore, receive the highest priority rather than the entire
8
length (285 miles) of I-40 in Arkansas (35).
9
The procedure presented in this paper can be used for identification of crash clusters;
10
counties that exhibit similar levels of clustering. Categories with positive Z-values require more
11
attention to detail compared to categories with negative Z-values. This will aid DOTs in
12
improved allocation of funds with budget constraints in the current economy of the United
13
States. This procedure will require limited personnel time in analysis of data; however, statistical
14
knowledge is required for such an analysis.
15
DISCUSSION, CONCLUSIONS AND RECOMMENDATIONS
16
The methodology proposed in this paper can allow departments of transportation (DOTs)
17
to effectively identify factors that contribute to crashes by using a sample of crash data, thus
18
allocating more time to study crash contributing factors associated with crash severity.
19
The methodology used categorized 75 counties in Arkansas into five categories. From
20
these categories, counties with the highest crash severity index were selected. Pulaski County, in
21
the first category, had the highest Z-value (6.16) and identified 55% of the crash contributing
22
factors that were common with remaining counties in the category.
23
The use of Moran’s I or Getis-Ord G i * statistic does not suffice in revealing the effects of
24
spatial autocorrelation (SA) in crash data analysis. The use of spatial indices, Moran’s I and
25
Getis-Ord G i * statistic, is recommended because Moran’s I indicates the presence of SA and
26
Getis-Ord G i * indicates the relative level of clustering. DOTs should consider categorization of
27
counties based on Z-values of the G i * statistic, if global SA exists.
28
Factors contributing to crashes were identified using multinomial logistic regression
29
(MLR) in this paper. The odds ratio was used to identify the factors positively associated with
30
crash severity. Rural areas had a positive association with crash severity in addition to the
31
following factors: darkness, driving the under influence of alcohol, roadway on a grade, curved
13
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
roadways, and head-on and sideswipe same direction collisions. The positive association of these
2
factors with crash severity was also found in another study (33).
3
Further research is recommended to identify and compare factors that contribute to
4
crashes over time using crash data for more than three years. The effect of selecting counties
5
with a low frequency of certain levels of crash severity is also a subject for future research.
6
Analysis of additional factors related to roadway geometry will provide more insight on the
7
methodology used in the paper. Further, the authors plan to study factors such as land use
8
distribution for spatial autocorrelation and identify the crash contributing factors.
9
ACKNOWLEDGMENTS
10
The authors are grateful to the Arkansas Highway and Transportation Department for
11
providing the data for this research. They also like to thank Drs. V. A. Samaranayke and
12
Dominique Lord for their input. The authors also like to thank the four anonymous reviewers for
13
their comments which helped improve the quality of the paper.
14
REFERENCES
15
16
17
18
19
20
21
22
23
24
25
26
1. Decade of Action for Road Safety 2011 to 2020, http://www.decadeofaction.org/, Accessed
July 1, 2011.
2. NHTSA (2006). http://www-nrd.nhtsa.dot.gov/Pubs/810623.PDF, Accessed March 10, 2010.
3. MoDOT's Bolder Five-Year Direction, A Presentation to the Missouri Highways and
Transportation Commission, June, 2011,
http://www.modot.org/bolderfiveyeardirection/documents/FINAL_June8plan.pdf. Accessed
July 1, 2011.
4. Songchitruksa, P., and Zeng, X. Getis-Ord Spatial Statistics for Identifying Hot Spots Using
Incident Management Data. Proceedings of Transportation Research Board 89th Annual
Meeting, Washington D. C., 2010.
5. Depue, L. Safety Management Systems: A Synthesis of Highway Practice. NCHRP Synthesis
322, Transportation Research Board of the National Academies, Washington, D.C., 2003.
6. Norden, M., J. Orlansky and H. Jacobs. Application of Statistical Quality-Control
Techniques to Analysis of Highway-Accident Data. Bulletin 117, HRB, National Research
Council, Washington, D.C., 1956, pp. 17-31.
7. Hakkert, A. S. and D. Mahalel. Estimating the Number of Accidents at Intersections from a
Known Traffic Flow on the Approaches. Accident Analysis & Prevention, Vol. 10, No. 1,
1978, pp. 69-79.
8. McGuigan, D. R. D. The Use of Relationships between Road Accidents and Traffic Flow in
“Black-Spot” Identification. Traffic Engineering and Control, 1981, pp. 448–453.
9. Kim, K. D. Takeyama and L. Nitz. Moped safety in Honolulu Hawaii. Journal of Safety
Research, Vol. 26, No. 3, 1995, pp. 177–185.
10. McMahon, P. A Quantitative and Qualitative Analysis of the Factors Contributing to
Collisions between Pedestrians and Vehicles along Roadway Segments. Masters Project,
27
28
14
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
Department of City and Regional Planning, University of North Carolina at Chapel Hill, NC.
1999.
11. Peled, A., Haj-Yehia, B., Hakkert, A.S. ArcInfo Based Geographical Information System for
Road Safety Analysis and Improvement,
http://www.esri.com/library/userconf/proc96/TO50/PAP005/P5.HTM, Accessed November,
2009.
12. Premo, L. S. Local spatial autocorrelation statistics quantify multi-scale patterns in
distributional data: an example from the Maya Lowlands. Journal of Archaeological Science,
Vol. 31, 2004, pp. 855-866.
13. Huang, H., Chin, H. C., and Haque, M. M. Empirical Evaluation of Alternative Approaches
in Identifying Crash Hot Spots: Naïve Ranking, Empirical Bayes, and Full Bayes. In
Transportation Research Record: Journal of the Transportation Research Board, No. 2013,
2009, pp. 32-41.
14. Quddus, M. A. Modeling Area-Wide Count Outcomes with Spatial Correlation and
Heterogeneity: An Analysis of London Crash Data. Accident Analysis and Prevention, Vol.
40, No. 4, 2008, pp. 1486-1497.
15. Miaou, S., Song, J. J., and Mallick, B. K. Roadway traffic crash mapping: a space-time
modeling approach. Journal of Transportation and Statistics, Vol. 6, No. 1, 2003, pp. 33-57.
16. Huang, H., Abdel-Aty, M. A., and Darwiche, A. L. County-level crash risk analysis in
Florida: Bayesian spatial modeling. Proceedings of Transportation Research Board 89th
Annual Meeting, Washington D. C., 2010.
17. Siddiqui, C., Abdel-Aty, M., and Choi, K. Macroscopic spatial analysis of pedestrian and
bicycle crashes. Accident Analysis & Prevention, Vol. 45, No. 45, 2012, pp. 382-391.
18. Li, L., Zhu, L., and Sui, D, Z. A GIS-based Bayesian approach for analyzing spatial-temporal
patterns of intra-city motor vehicle crashes. In Journal of Transport Geography, 2007, pp.
274-285.
19. Tobler, W. A computer movie simulating urban growth in the Detroit region. Economic
Geography, Vol. 46, No. 9, 1970, pp. 234-240.
20. Guerts, E., Wets, G., Brijis, T., and Vanhoof, K. Identification and ranking of black spots. In
Transportation Research Record: Journal of the Transportation Research Board, No. 1897,
2004, pp. 34-42.
21. Hauer, E., Allery, B, K., Konnov, J., and Griffith, M, S. How Best to Rank Sites with
Promise. In Transportation Research Record: Journal of the Transportation Research Board,
No. 1897, 2004, pp. 48-54.
22. Ye, F. and Lord, D. Comparing Three Commonly Used Crash Severity Models on Sample
Size Requirements: Multinomial Logit, Ordered Probit and Mixed Logit Models,
Proceedings of the Annual Meeting of the Transportation Research Board, Washington, D.
C., 2011.
23. Mitra, S. Spatial autocorrelation and Bayesian spatial statistical method for analyzing fatal
and injury crash prone intersections. Proceedings of Transportation Research Board 88th
Annual Meeting, Washington D. C., 2009.
24. Moran, P. A. P. Notes on continuous stochastic phenomena. Biometrika, No. 37, 1950, pp.
17-33.
25. Getis, A. and Ord, J. K. The analysis of Spatial Association by use of Distance Statistics.
Geographic Analysis, Vol. 24, No. 3, 1992, pp. 189-206.
15
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26. Ord, J. K. and Getis, A. Local Spatial Autocorrelation Statistics: Distributional Issues and an
Application. Geographic Analysis, Vol. 27, No. 4, 1995, pp. 286-306.
27. McGuigan, D. R. D. The Use of Relationships between Road Accidents and Traffic Flow in
“Black-Spot” Identification. Traffic Engineering and Control, 1981, pp. 448–453.
28. http://go.owu.edu/~jbkrygie/krygier_html/geog_353/geog_353_lo/geog_353_lo07.html
Accessed, June 23, 2010. J. B. Krygier: Geography lecture notes.
29. http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Standard_classification_sc
hemes Accessed, June 23, 2010. ArcGIS 9.2 Desktop Help Online Manual.
30. http://danieljlewis.org/2010/06/07/jenks-natural-breaks-algorithm-in-python/ Accessed, June
25, 2010. Daniel J Lewis, University College London, Department of Geography.
31. Manepalli, U. R. R., Bham, G, H., and Samaranayake, V. A. An Evaluation of Crash
Frequency, Crash Severity, and Composite Rank Methods for Hotspot Identification, No.123349. Proceedings of Transportation Research Board 91th Annual Meeting, Washington D.
C., 2012.
32. Highway Safety Manual, First Edition, 2010. Vol. 1 Section 4-66, Table 4A-1.
33. Bham, G. H., Javvadi, B, S., and Manepalli, U. R. R. Multinomial logistic regression model
for single-vehicle and multivehicle collisions on urban U.S. Highways in Arkansas. Journal
of Transportation Engineering, Vol. 138, No. 6, 2012, pp. 786-797.
34. SAS online documentation manual
9.1.2., http://support.sas.com/onlinedoc/912/docMainpage.jsp Accessed, December, 2009.
35. Bham, G. H., and Manepalli, U. R. R. Identification and Analysis of High Crash Segments
on Interstate, US, and State Highway Systems of Arkansas. Report MBTC 2099/3006, 2009.
36. Manepalli, U. R. R., A Statewide Analysis of Highway Safety in Arkansas, M.S. thesis,
Missouri University of Science and Technology, Rolla, MO, 2011.
26
16
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
Table 1. List of Independent Variables
Terms
Variables
Atmospheric Conditions
ATM
(ATM)
LGT
Light Conditions
RSUR
Roadway Surface
RU
Roadway Type
RALI
Roadway Alignment
RPRO
Roadway Profile
TOH
Roadway Classification
TOC
Collision Types
WK
Days of the week
Driving Under the
Influence
DUI
AADT
Annual Average Daily
Traffic
Levels of Variables
Clear, Rain
Dark, Daylight
Dry, Wet
Rural, Urban
Curve, Straight
Grade, Level
Divided, Undivided
Angle, Head-On, Rear End,
Sideswipe Same Direction (SSSD),
Single Vehicle Crashes (SVC),
Sideswipe Opposite Direction (SWOD)
Weekdays (M-F), Weekends (Sat, Sun)
Yes, No
<20,000; 20,000-40,000; 40,000-60,000;
60,000-80,000 ; 80,000-100,000;
100,000-12000
2
3
4
Table 2. Results of Spatial Autocorrelation Index, Moran’s I
Global
Variables
Z-score
α
Moran's I
TCF
0.06
2.50
0.05*
S1 (Fatal)
0.08
2.98
0.01**
S2 (Major Injury)
0.10
3.33
0.01**
S3 (Minor Injury)
0.07
2.72
0.01**
S4 (Complain of Pain)
0.07
2.75
0.01**
S5 (Property Damage Only)
0.05
2.26
0.05*
5
6
7
Notes:
*statistically significant at 95%, **statistically significant at 99%
TCF = Total Crash Frequency
17
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
2
3
4
5
6
7
8
9
10
11
12
Table 3. Results by category, highest CSI in each category, and ratio of crash data
Category
Number
of
counties
Counties with
highest CSI@
CSI!
Total
CSI
CSI
ratio*
Crash
Freq.
ratio
(A)
(B)
(C)
(D)
(E)
(F)
(G)
First
3
Pulaski
137,627
276,755
.50
.51
Second
9
Garland
52,189
324,668
.16
.27
Third
13
Craighead
28,676
298,379
.10
.17
Fourth
25
45,707
273,196
.17
.16
-0.481099,
-0.097832
Fifth
25
53,477
133,861
.40
.57
-0.775375,
-0.481100
Total
75
317,676
1,306,859
.24
.34
-
Notes:
Madison,
Cleburne,
Logan
Chicot,
Montgomery,
Polk, Perry,
Little River,
Clay,
Colombia
13^
Range of Z value
of G* i statistic
(H)
1.7678741,
6.161180
0.559918,
1.768740
-0.097831,
0.559918
@ satisfies the condition of minimum sample size of 2000 in terms of crash frequency
!CSI computed for county/counties in col. ‘C’
$CSI computed for counties in col. ‘B’
*Ratio of CSI values in col. D and col. E.
^total number of counties in col. C
18
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
Table 4. Factors Positively Associated with Crash Severity, Pulaski County (First Category)
Parameter
LGT
RU
RALI
DUI
Intercept
RU
RALI
TOC
TOC
TOC
TOC
TOC
DUI
Intercept
RU
TOH
TOC
TOC
TOC
TOC
TOC
WK
DUI
AADT
AADT
Intercept
TOC
TOC
TOC
TOC
TOC
WK
AADT
AADT
2
3
Standard
Estimate
Error
Chi Square
Fatal vs Property Damage Crashes
Dark vs Daylight
0.25
0.12
3.92
Rural vs Urban
0.71
0.13
29.31
Curve vs Straight
0.34
0.13
6.25
No vs Yes
-1.17
0.13
86.71
Major Injuries vs Property Damage Crashes
-2.49
0.18
185.59
Rural vs Urban
0.43
0.08
29.24
Curve vs Straight
0.29
0.08
13.05
Angle vs SSSD
-0.39
0.17
5.36
Head-on vs SSSD
1.86
0.23
64.43
Rear-end vs SSSD
-0.58
0.15
15.34
SVC vs SSSD
0.69
0.13
26.32
SWOD vs SSSD
-1.50
0.25
36.62
No vs Yes
-0.77
0.08
91.27
Minor Injuries vs Property Damage Crashes
-1.19
0.09
158.92
Rural vs Urban
0.13
0.05
7.86
Divided vs
Undivided
0.15
0.04
18.29
Angle vs SSSD
0.17
0.07
5.32
Head-on vs SSSD
1.13
0.16
51.00
Rear-end vs SSSD
-0.50
0.07
49.05
SVC vs SSSD
0.61
0.07
72.22
SWOD vs SSSD
-1.18
0.10
137.05
Weekdays vs
Weekends
-0.10
0.03
9.03
No vs Yes
-0.42
0.05
70.48
20000 vs 120000
0.25
0.06
17.50
80000 vs 120000
-0.16
0.07
5.59
Complain of Pain vs Property Damage Crashes
-0.73
0.08
79.38
Angle vs SSSD
0.18
0.06
9.04
Head-on vs SSSD
0.63
0.15
17.80
Rear-end vs SSSD
0.30
0.06
29.19
SVC vs SSSD
-0.19
0.07
8.56
SWOD vs SSSD
-0.46
0.07
47.60
Weekdays vs
Weekends
-0.09
0.03
12.68
40000 vs 120000
-0.12
0.04
7.53
80000 vs 120000
-0.09
0.05
3.89
Factors
Pr > Chi
Square
Odds
Ratio
0.0476
<.0001
0.0124
<.0001
1.28
2.04
1.40
0.31
<.0001
<.0001
0.0003
0.0206
<.0001
<.0001
<.0001
<.0001
<.0001
1.54
1.33
0.68
6.41
0.56
2.00
0.22
0.46
<.0001
0.0051
1.14
<.0001
0.0210
<.0001
<.0001
<.0001
<.0001
1.16
1.18
3.10
0.61
1.85
0.31
0.0027
<.0001
<.0001
0.0180
0.91
0.66
1.29
0.85
<.0001
0.0026
<.0001
<.0001
0.0034
<.0001
1.20
1.88
1.35
0.82
0.63
0.0004
0.0061
0.0486
0.91
0.89
0.91
Note: *shading indicates common factors in Tables 4 and 5 (includes similar increase/decrease in the odds ratio)
19
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
Table 5. Factors Positively Associated with Crash Severity (First Category#)
Standard
ChiPr > Chi
Odds
Estimate
Error
Square
Square
Ratio
Fatal vs Property Damage Crash
Intercept
-3.1045
0.3359
85.4
<.0001
RU
Rural vs Urban
0.7018
0.1378
25.94
<.0001
2.02
TOH
Divided vs Undivided
0.4316
0.1503
8.25
0.0041
1.54
TOC
Angle vs SSSD
0.6343
0.2995
4.48
0.0342
1.89
TOC
Head-on vs SSSD
3.052
0.3328
84.08
<.0001
21.16
TOC
Rear-end vs SSSD
-2.0406
0.5333
14.64
0.0001
0.13
TOC
SVC vs SSSD
0.5922
0.2861
4.28
0.0385
1.81
TOC
SWOD vs SSSD
-1.5616
0.6343
6.06
0.0138
0.21
DUI
No vs Yes
-1.2111
0.1275
90.22
<.0001
0.30
AADT
40000 vs 120000
-0.7311
0.2897
6.37
0.0116
0.48
Major Injury vs Property Damage Crash
Intercept
-1.6167
0.1538
110.53
<.0001
RU
Rural vs Urban
0.6714
0.0668
101.16
<.0001
1.96
RALI
Curve vs Straight
0.2893
0.0642
20.31
<.0001
1.34
TOH
Divided vs Undivided
0.2977
0.0734
16.46
<.0001
1.35
TOC
Head-on vs SSSD
1.539
0.231
44.37
<.0001
4.66
TOC
Rear-end vs SSSD
-0.7016
0.1394
25.33
<.0001
0.50
TOC
SVC vs SSSD
0.6892
0.126
29.91
<.0001
1.99
TOC
SWOD vs SSSD
-1.0892
0.2264
23.15
<.0001
0.34
WK
Weekday vs Weekend
-0.19
0.0565
11.31
0.0008
0.83
DUI
No vs Yes
-0.6524
0.078
69.9
<.0001
0.52
AADT
40000 vs 120000
-0.6028
0.1233
23.91
<.0001
0.55
Minor Injury vs Property Damage Crash
Intercept
-0.8127
0.0949
73.4
<.0001
ATM
Clear vs Rain
0.1561
0.081
3.72
0.0538
1.17
RU
Rural vs Urban
0.1381
0.0439
9.89
0.0017
1.15
RPRO
Grade vs Level
0.1299
0.032
16.45
<.0001
1.14
TOH
Divided vs Undivided
0.1946
0.0404
23.24
<.0001
1.21
TOC
Angle vs SSSD
0.1734
0.0622
7.77
0.0053
1.19
TOC
Head-on vs SSSD
1.1991
0.1375
76.04
<.0001
3.32
TOC
Rear-end vs SSSD
-0.5982
0.0646
85.82
<.0001
0.55
TOC
SVC vs SSSD
0.7107
0.072
97.38
<.0001
2.04
TOC
SWOD vs SSSD
-1.2811
0.1159
122.23
<.0001
0.28
WK
Weekday vs Weekend
-0.1116
0.0318
12.31
0.0005
0.89
DUI
No vs Yes
-0.5
0.0504
98.23
<.0001
0.61
AADT
20000 vs 120000
0.2738
0.0685
15.97
<.0001
1.31
AADT
60000 vs 120000
-0.238
0.1131
4.43
0.0353
0.79
Complain of Pain vs Property Damage Crash
Intercept
-0.3083
0.0756
16.61
<.0001
LGT
Dark vs Daylight
0.0538
0.0234
5.27
0.0217
1.06
RALI
Curve vs Straight
0.0746
0.0335
4.97
0.0258
1.08
RPRO
Grade vs Level
0.0566
0.0236
5.73
0.0167
1.06
TOH
Divided vs Undivided
0.1422
0.0295
23.18
<.0001
1.15
TOC
Rear-end vs SSSD
0.2054
0.0457
20.2
<.0001
1.23
TOC
SWOD vs SSSD
-0.5199
0.0668
60.55
<.0001
0.59
WK
Weekday vs Weekend
-0.0678
0.0238
8.15
0.0043
0.93
DUI
No vs Yes
-0.1589
0.0469
11.49
0.0007
0.85
AADT
20000 vs 120000
0.1518
0.0474
10.26
0.0014
1.16
AADT
40000 vs 120000
-0.1173
0.0456
6.62
0.0101
0.89
Note: *shading indicates common factors in Tables 4 and 5(includes similar increase/decrease in the odds ratio); # excludes Pulaski County
Parameter
2
Factors
20
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
2
Table 6. Summary of Results: Crash Frequency and Severity Contributing Factors for Selected County and Category
Major Minor Complain
S. No
Fatal Injury Injury
of pain
Description
(1)
(2)
(3)
(4)
(5)
(6)
I.
Category 1
1
No. of contributing factors for Pulaski County
4
8
11
8
No. of contributing factors in Category 1 (excluding Pulaski
2
county)
9
10
13
10
3
No. of contributing factors common to I.1 and I.2
2
7
10
4
Percentage of all crashes resulting from factors common to I.1
4
and I.2
22
70
77
40
55
5
Percentage of commonly identified factors for the Category 1
(((2+7+10+4)/(9+10+13+10))*100)
II
Category 2
1
No. of contributing for factors Garland County
3
8
10
7
No. of contributing factors in Category 2 (excluding Garland
2
county)
7
10
15
12
3
No. of contributing factors common to II.1 and II.2
3
6
10
5
Percentage of all crashes resulting from factors common to II.1
4
and II.2
43
60
67
42
5
Percentage of commonly identified factors for the Category 2
54
III
Category 3
1
No. of contributing for factors Craighead County
4
4
8
4
No. of contributing factors in Category 3 (excluding Craighead
2
county)
8
13
10
8
3
No. of contributing factors common to III.1 and III.2
3
4
7
3
Percentage of all crashes resulting from factors common to III.1
4
and III.2
38
31
70
38
5
Percentage of commonly identified factors for the Category 3
44
IV
Category 4
No. of contributing factors for Madison, Cleburne, Logan
1
counties
6
5
4
5
No. of contributing factors in Category 4 (excluding Madison,
2
Cleburne, Logan counties)
6
11
10
8
3
No. of contributing factors common to IV.1 and IV.2
3
5
3
4
Percentage of all crashes resulting from factors common to IV.1
4
and IV.2
50
45
30
50
5
Percentage of commonly identified factors for the Category 4
43
V
Category 5
No. of contributing factors for Chicot, Montgomery, Polk, Perry,
1
Little River, Clay and Columbia counties
5
4
7
1
No. of contributing factors in Category 5 (excluding Chicot,
2
Montgomery, Polk, Perry, Little River, Clay and Columbia
counties)
3
5
7
5
3
No. of contributing factors common to V.1 and V.2
1
3
4
0
Percentage of all crashes resulting from factors common to V.1
4
and V.2
33
60
57
0
5
Percentage of commonly identified factors for the Category 5
40
Consolidated total percentage of commonly identified factors
49
Note: The frequency of crash contributing factors identified is with respect to property damage crashes only
3
21
TRB 2013 Annual Meeting
Paper revised from original submittal.
Manepalli and Bham
1
2
Figure 1. Counties categorized by G i *statistic for three years of Arkansas crash data
3
TRB 2013 Annual Meeting
Paper revised from original submittal.