
Learning Appearance Models for Road Detection
José M. Alvarez, Mathieu Salzmann and Nick Barnes
NICTA
{jose.alvarez, mathieu.salzmann, nick.barnes}@nicta.com.au
Abstract— We introduce an approach to image-based road
detection that exploits the availability of unannotated training
images to learn an appearance model. Our approach allows
us to remove the standard assumption that the lower part of
the input image belongs to the road surface, which does not
always hold and often yields strongly biased appearance models.
Instead, we exploit this assumption in the training images,
which yields a much more general appearance model. We then
use the learned model to classify the pixels of an input image as
road or background without requiring any assumptions about
this image. Our experimental evaluation shows the benefits of
our approach over existing methods in challenging real-world
driving scenarios.
I. INTRODUCTION
Road detection is key to autonomous driving systems, as
well as to many driver assistance tasks, such as lane keeping,
collision avoidance and road following [1], [2]. Furthermore,
it can serve as a preprocessing step for other challenging
problems, such as pedestrian or vehicle detection [3]. In this
paper, we tackle the problem of detecting roads in monocular
color images acquired with a mobile platform in real driving
situations. This scenario yields a challenging computer vision
problem due to the lack of control of the environment and
to the large variability of road appearance arising from the
different road types, as well as from varying lighting and
weather conditions.
To achieve robustness to such high intra-class variability,
many low-level cues, such as color [4], [5], [6], [7], [8],
[9], [10], [11], texture [12], [13], [14] or a combination of
them [15], [16], have been proposed. The standard approach
to road detection consists of modeling the road appearance
by learning the distribution of these cues from image pixels
labeled as road. Detection is then performed by computing
the image cue at each pixel of the input image and, based
on the corresponding distribution value, classifying the pixel
as either road or background.
To deal with the highly dynamic nature of road scenes,
i.e., the road appearance may vary significantly between
two images, state-of-the-art algorithms usually build the road
model online by exploiting the assumption that the lower
part of the input image belongs to the road surface. Indeed,
for most common camera placements, the lower part of the
image corresponds to an area about 4 meters away from
the camera, which often contains road. Nevertheless, this
assumption suffers from two major drawbacks: (1) in some
scenarios, the lower part of the image may not belong to
the road, e.g., Fig. 1(a); (2) even if it does belong to the
road, the pixels in this area may not be representative of the
entire road in the input image, e.g., Fig. 1(b). In both cases,
the road model built on this assumption would fail to detect
most of the road in the input image.

Fig. 1. Common road detection algorithms assume that the central–lower
part of the image belongs to the road surface. However, this assumption has
two main drawbacks: (a) in some scenarios, that area does not belong to
the road surface; (b) pixels in that area are not representative of the entire
road in the image.
In this paper, we propose to overcome the above-mentioned drawbacks by making use of the availability of
training images to build the road appearance model. To
avoid requiring fully labeled training images, we exploit the
assumption that the lower part of the training images contains
road pixels. However, we do not make any such assumption
for the input image, on which we seek to detect the road. As
a consequence, our approach is unaffected by the presence
of background objects in the lower part of the input image.
Furthermore, exploiting multiple training images prevents
the resulting road model from being biased to the specific
appearance of a single road region.
More specifically, we represent each pixel in the lower part
of the training images as a linear combination of different
color planes. The weights of this combination are learned so
as to minimize the variance of the resulting representation,
which corresponds to making road pixels in this new space as
uniform as possible. Given a new input image, we compute
the different color planes at each pixel, combine them
according to the learned weights, and determine if the pixel
is road or background based on the value of the distribution
learned from training data, as well as on how uniform the
region around the pixel is. No assumption about the input
image is required.
We evaluated our algorithm on several video sequences
obtained from a vehicle driving in real-world conditions. In
practice, we used early images of the sequence as training
data to build a distribution and detect the road in the images
later in the sequence. Our experiments demonstrate the
robustness of our algorithm in challenging scenarios, and
show that it outperforms state-of-the-art methods.
II. RELATED WORK
Road detection is a key component for driver assistance
systems, and is a challenging problem in computer vision
since images are acquired in an outdoor scenario using a
mobile platform. Low–level cues such as color [7], [8], [9],
[10], [11], texture [13], [14] or a combination of them [14],
[15] have been widely used. Among these cues, color has
been accepted as the common approach for low–level road
detection since texture is scale dependent and may fail due
to the strong perspective effect present in road images.
Furthermore, color imposes less physical restrictions and
provides powerful information for detecting the road even
in the absence of shape information. The main challenge
of color based approaches is dealing with the high intra–
class variability due to the highly dynamic nature of the
scenes. Common approaches model the road appearance by
exploiting variant and invariant properties of different color
spaces such as HSV in [7], normalized rg in [9] or a physics–
based illuminant invariant color space in [11], [4]. More
recently, in [17], a linear combination of color planes is
learned to reduce the variability of the road texture.
Learning algorithms are usually based on positive (i.e.,
road) or negative (i.e., background) samples. However, for
our problem, learning a sufficiently general background
representation is not feasible. Instead, only a road model
is usually built. This model typically exploits road samples
that are obtained from the lower part of the image. That is,
most existing algorithms assume that this part of the image
belongs to the road surface [11], [7], [17]. However, this
assumption has two main problems: (1) the bottom part of
the image may not correspond to road pixels; (2) pixels
within that area may not represent the high variability of
the road appearance. To increase the number of training
samples, some algorithms rely on manually annotated data.
For instance, labelled data is used in [17] to build a road
model. This information is then combined with information
from the current image to ensure adaptation to the current
image conditions. Other algorithms exploit the sequential
nature of the data and use road results from previous images
as training data [7], [9]. Unfortunately, the former approach
requires human interaction to annotate the images. The latter
relies on the robustness of the algorithm and thus may
propagate errors. Therefore, in the next section, we propose
an algorithm to learn the appearance of the road that requires
minimal assumptions. The algorithm learns from positive
examples processed without any human supervision.
Fig. 2. Examples of road images used to build our training set. As expected,
the lower part of the image corresponds to the road surface in most cases.
III. LEARNING A ROAD APPEARANCE MODEL
In this section, we present our approach to learning an
appearance model for road pixels from training images. We
first describe how we exploit the training images, and then
discuss our pixel representation and the resulting road model.
A. Training Data for Road Modeling
Our goal is to build an appearance model for road pixels,
but avoid the usual assumption that the lower part of the input
image belongs to the road surface, which does not always
hold and may yield a strong bias to the model. To this end,
we propose to exploit training images to learn the model.
The standard approach to exploiting training data consists
of manually labeling the pixels in the training images as
positive (i.e., road) or negative (i.e., background). This,
however, quickly becomes time-consuming. Furthermore, obtaining sufficiently many examples to cover the background
class would require a huge number of training images. To
overcome these issues, we propose to only use positive
(road) examples collected automatically from the training
images. These examples can be obtained by relying on the
assumption that the lower part of the training images belongs
to the road surface most of the time (see Fig. 2). Note that
this entails no assumption on the input image.
The training pixels obtained as described above depict a
large variability of road pixels and could directly be used
to build the appearance model. However, some appearances
will be much less frequent than others (e.g., strong shadows).
Therefore, we propose a pruning algorithm to balance the
appearance of road samples in the training set and thus
reduce the effect of dominating samples that would bias
the distribution. Our pruning process consists of two steps
(see Fig. 3): first, we over-segment the lower part of each
training image into superpixels [18] and compute the modes
(maximum histogram value) of the RGB values of each
superpixel; then we cluster these modes using mean shift
with a fixed bandwidth value in the RGB color space. The
reason for selecting this color space is its sensitivity to
lighting variations, i.e., small changes in lighting produce
large changes in the average RGB color, which makes it ideal
for retaining a diversified set of road samples. The final training
set is obtained by taking the pixels of the superpixel closest
to each cluster center.

Fig. 3. Overview of our pruning algorithm. We first segment the lower part of the training images into superpixels, and then cluster the mean RGB values
of these superpixels [18]. We only retain the superpixel closest to each cluster center as training example.
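To make this pruning step concrete, the following is a minimal Python sketch, assuming RGB values in [0, 1]. It uses scikit-image's SLIC instead of TurboPixels [18] and scikit-learn's MeanShift; the function names, the number of superpixels and the per-channel histograms used to estimate the modes are illustrative assumptions rather than the exact implementation.

```python
import numpy as np
from skimage.segmentation import slic      # stand-in for TurboPixels [18]
from sklearn.cluster import MeanShift

def superpixel_modes(lower_rgb, n_segments=50):
    """Over-segment the lower image region and return, for each superpixel,
    a per-channel mode of its RGB values (values assumed in [0, 1])."""
    labels = slic(lower_rgb, n_segments=n_segments, compactness=10)
    modes, pixel_sets = [], []
    for s in np.unique(labels):
        pix = lower_rgb[labels == s].reshape(-1, 3)
        mode = []
        for c in range(3):
            counts, edges = np.histogram(pix[:, c], bins=32, range=(0.0, 1.0))
            mode.append(edges[np.argmax(counts)])   # left edge of the most populated bin
        modes.append(mode)
        pixel_sets.append(pix)
    return np.array(modes), pixel_sets

def prune_training_set(lower_regions, bandwidth=0.025):
    """Cluster superpixel modes with mean shift and keep, for each cluster,
    only the pixels of the superpixel closest to the cluster center."""
    modes, pixel_sets = [], []
    for region in lower_regions:                    # one lower region per training image
        m, p = superpixel_modes(region)
        modes.append(m)
        pixel_sets.extend(p)
    modes = np.vstack(modes)
    ms = MeanShift(bandwidth=bandwidth).fit(modes)
    keep = []
    for center in ms.cluster_centers_:
        closest = int(np.argmin(np.linalg.norm(modes - center, axis=1)))
        keep.append(pixel_sets[closest])
    return np.vstack(keep)                          # balanced set of road RGB samples
```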
B. Road Pixel Representation
Given the training set described above, we can now
compute a low-level cue for each training pixel and build the
appearance model. Here, we consider a simple road model,
where the probability of a pixel belonging to the road is
given by

p(y) ∼ N(µd, σd),    (1)

where y is some representation of the pixel. To achieve better
discriminative power, as well as robustness to lighting and
weather conditions, we seek to find a pixel representation
whose variance is minimal for all road pixels. In other
words, our goal is to find a low-level cue such that all
road pixels have similar appearance. Following [17], such a
representation can be obtained by exploiting the variant and
invariant properties of different color planes using a weighted
linear combination, as depicted by Fig. 4.

More specifically, let xij, 1 ≤ i ≤ N, 1 ≤ j ≤ P, be the value
of the jth color plane for the ith training pixel. We represent
each pixel by a single value yi, such that

yi = Σ_{j=1}^{P} wj xij,    (2)

where w = [w1, . . . , wP] represents the contribution of each
color plane to the final combination. Our goal now becomes
that of finding the weights wj that minimize the variance σd
of the road pixels. To this end, we note that the mean pixel
value µd can be expressed as

µd = (1/N) Σ_{i=1}^{N} yi.    (3)

Note that to minimize the effect of lighting variations, µd
can also be computed from the centers of the clusters found
by mean shift. The variance can then be written as

σd = (1/(N−1)) Σ_{i=1}^{N} (yi − µd)²
   = (1/(N−1)) Σ_{i=1}^{N} ( Σ_{j=1}^{P} wj xij − (1/N) Σ_{k=1}^{N} Σ_{j=1}^{P} wj xkj )²
   = (1/(N−1)) Σ_{i=1}^{N} ( Σ_{j=1}^{P} wj (xij − x̄j) )²
   = Σ_{j,k=1}^{P} wj wk σjk,    (4)

where x̄j is the mean value over all training pixels for color
plane j, and σjk is the covariance between the jth and kth
color planes. In this paper, we assume that the road has a
stochastic (random) texture and thus do not take the pixel
location into account. The weights of the color planes can
then be obtained by solving the optimization problem

minimize_w    Σ_{j,k=1}^{P} wj wk σjk
subject to    Σ_{j=1}^{P} wj = 1,    (5)
              −1 ≤ wj ≤ 1, ∀ j,

where we further enforce the weights to sum up to 1, and
bound them. This problem is a quadratic program, whose
global minimum can be found using available software, such
as Matlab's quadprog function.
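As an alternative to Matlab's quadprog, the quadratic program in Eq. (5) can also be solved, for instance, with SciPy's SLSQP solver. This is a minimal sketch assuming the per-pixel color-plane values are stored in a matrix X of shape N × P (one row per training pixel, one column per color plane); the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def learn_weights(X):
    """Solve Eq. (5): minimize w' Sigma w  s.t.  sum(w) = 1, -1 <= w_j <= 1,
    where Sigma is the covariance of the color planes over the training pixels."""
    sigma = np.cov(X, rowvar=False)              # P x P covariance of the color planes
    P = X.shape[1]
    res = minimize(
        fun=lambda w: w @ sigma @ w,             # variance of the combined pixel value
        x0=np.full(P, 1.0 / P),                  # feasible starting point
        method="SLSQP",
        bounds=[(-1.0, 1.0)] * P,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x                                 # learned color-plane weights w
```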
Our final appearance model is then defined by the mean
and variance given in Eq. (3) and Eq. (4), as well as by
the normalized histogram hd of the training pixel values yi
centered at µd . Our algorithm to build the road model is
given in Algorithm 1.
Fig. 4. Building a road model. Each training pixel is encoded as a linear combination of multiple color planes. The weights of the combination are
computed so as to minimize the variance of the resulting values across the training samples.
Algorithm 1 Learning a Road Model
Compute an over–segmentation of the training images
using an edge preserving superpixel algorithm (e.g., [18]).
Extract superpixels corresponding to the central–lower part
of each image.
Compute the modes of the RGB values of each superpixel.
Cluster these modes using mean shift with a fixed bandwidth (e.g., 0.025).
Select the superpixel closest to each cluster centroid.
Build the training set consisting of the pixels in the
selected superpixels.
Compute w = [w1, . . . , wP]^T by solving Eq. (5).
Compute µd from Eq. (3).
Compute σd from Eq. (4).
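A compact sketch of Algorithm 1 as a whole might look as follows, reusing the hypothetical prune_training_set and learn_weights helpers sketched above; the number of histogram bins, the color_planes_fn parameter and the dictionary layout of the model are illustrative choices, not the paper's implementation.

```python
import numpy as np

def learn_road_model(lower_regions, color_planes_fn, n_bins=64):
    """Algorithm 1 (sketch): return the road model (w, mu_d, sigma_d, h_d, bin edges).
    color_planes_fn maps an (M, 3) array of RGB samples to an (M, P) array of planes."""
    rgb_samples = prune_training_set(lower_regions)       # Sec. III-A pruning
    X = color_planes_fn(rgb_samples)                      # per-pixel color planes
    w = learn_weights(X)                                  # Eq. (5)
    y = X @ w                                             # Eq. (2): combined values
    mu_d = y.mean()                                       # Eq. (3)
    sigma_d = y.var(ddof=1)                               # Eq. (4)
    h_d, bin_edges = np.histogram(y, bins=n_bins)
    h_d = h_d / h_d.sum()                                 # normalized histogram of y values
    return {"w": w, "mu": mu_d, "sigma": sigma_d, "hist": h_d, "edges": bin_edges}
```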
IV. ROAD DETECTION WITH AN APPEARANCE MODEL
We now present our road detection algorithm that makes
use of the road model introduced in the previous section (see
Fig. 5). Note that our road detection algorithm does not make
any assumption about the image being currently analyzed.
As a first step, the input image is converted into a number
of different color planes (e.g., R, G, B, the normalized r and g,
the opponent channels O1 and O2, L, a, b, and H, S, V, as derived
in Table I). These color representations are
then combined at pixel level using the weights learned from
training data. This yields a single-channel image ỹ in which
road areas tend to be uniform and similar to the road model.
For each pixel p in this image, we can then make use of our
road model to define a classifier that determines if pixel p
belongs to the road, or to the background. In particular, we
use the classification rule
C(p) = (std(ỹp) < σd) ∧ (hd(ỹp) > θ),    (6)

where ỹp is the image value at pixel p, hd is the normalized
histogram of training pixel values, and std(·) is the standard
deviation of the image values in a small region around pixel
p. In particular, we used a small square neighborhood. In
our experiments, the parameter θ was set to 0.01, but the
results are insensitive to its specific value. According to this
rule, pixel p is labeled as road if C(p) > 0, i.e., if both
conditions hold.

Fig. 5. Overview of our road detection algorithm. Given an input image,
we convert it to multiple color planes, which are then combined using the
weights learned from the training images. We then use the resulting image
in conjunction with the learned appearance model to determine whether
a pixel belongs to the road or background class. Based on the detection
results, the current image may be added to the training set to improve the
model for future images.

TABLE I
DERIVATION OF THE NORMALIZED rg, OPPONENT, HSV AND CIE–Lab COLOR SPACES FROM RGB VALUES.
Color Space      Definition
normalized rg    r = R/(R+G+B), g = G/(R+G+B)
Opponent         O1 = R − G, O2 = B − (R+G)/2
HSV              [V1, V2, V3]^T is obtained from [R, G, B]^T by a fixed linear transform that
                 separates intensity (V1) from two chromatic channels (V2, V3);
                 H = arctan(V2/V3), S = sqrt(V2² + V3²)
CIE–Lab          [X, Y, Z]^T = [[0.490, 0.310, 0.200], [0.177, 0.812, 0.011], [0.000, 0.010, 0.990]] [R, G, B]^T,
                 L = 116 (Y/Yn)^(1/3) − 16,
                 a = 500 [(X/Xn)^(1/3) − (Y/Yn)^(1/3)],
                 b = 200 [(Y/Yn)^(1/3) − (Z/Zn)^(1/3)],
                 where Xn, Yn and Zn correspond to the white reference point.
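To illustrate this first step, the color planes can be computed with standard conversions; the sketch below uses scikit-image's built-in HSV and CIE–Lab conversions rather than the explicit matrices of Table I, so it should be read as an approximation of the table, not the paper's exact derivation.

```python
import numpy as np
from skimage.color import rgb2hsv, rgb2lab

def compute_color_planes(image):
    """Stack the color planes of Table I for an RGB image with values in [0, 1].
    Returns an array of shape (H, W, P). Note that the planes live on different
    scales; the paper does not detail any rescaling, so none is applied here."""
    R, G, B = image[..., 0], image[..., 1], image[..., 2]
    s = R + G + B + 1e-6                       # avoid division by zero
    r, g = R / s, G / s                        # normalized rg
    O1, O2 = R - G, B - (R + G) / 2.0          # opponent channels (following Table I)
    lab = rgb2lab(image)                       # CIE-Lab (L, a, b)
    hsv = rgb2hsv(image)                       # hue, saturation, value
    planes = [R, G, B, r, g, O1, O2,
              lab[..., 0], lab[..., 1], lab[..., 2],
              hsv[..., 0], hsv[..., 1], hsv[..., 2]]
    return np.stack(planes, axis=-1)
```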
The first term in Eq. (6) classifies pixels according to how
uniform the region around the pixel is. This term follows
the analysis based on uniformity proposed in [17]. However,
uniformity is limited by the resolution of the histogram (i.e.,
the number of bins may not yield enough resolution), and often
leads to false positive detections (e.g., the sky is often uniform).
The second term in Eq. (6) removes many of these false positives
by classifying the pixels according to how their transformed
image value compares to the expected road value, µd. For
non-road regions, these values are expected to be dissimilar.

Fig. 6. Quantitative comparison of our approach against the different baselines. We show the ROC curves on the left and the area under the curve (AUC)
and the equal error rate (EER) on the right, averaged over 50 test images. Note that our approach yields better accuracy than the baselines. EER is defined
as the intersection between the curve and the line where the error rates are equal, i.e., (1 − TPR) = FPR.
[ROC curves (true positive rate vs. false positive rate): Ours (EER=0.154, AUC=0.907); Alv-prev (EER=0.320, AUC=0.737); H-prev (EER=0.274, AUC=0.771);
S-prev (EER=0.254, AUC=0.813); V-prev (EER=0.330, AUC=0.720); Alv-cur (EER=0.314, AUC=0.746); H-cur (EER=0.267, AUC=0.788);
S-cur (EER=0.264, AUC=0.798); V-cur (EER=0.290, AUC=0.771).]
Although any training set could be used, in our experiments we exploited the fact that images were acquired as
video sequences to perform detection. Therefore, to detect
the road in image t of the sequence, we used images up
to t − K as training images. Note that this still entails no
assumption on the current image, but only on the previous
images in the sequence. While some of these images may
violate the assumption that their lower part contains road
pixels, the combination of multiple training images proved
robust enough to still learn an effective road model. Furthermore, after detection of the road in image t, we can decide
whether or not to add it to the training set for image t + 1 by
analyzing the central–lower part of the road mask. If the area
is mainly road according to the current model, it is included
as training, otherwise the image is discarded. This process
minimizes the presence of outliers in the training set, and
thus offers some control over the error propagation of our
online algorithm.
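A possible sketch of the detection rule in Eq. (6), together with the check that decides whether the current image joins the training set, is given below; it reuses the hypothetical compute_color_planes function and model dictionary from the earlier sketches, and the neighborhood size, histogram lookup and "mainly road" threshold are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def detect_road(image, model, win=9, theta=0.01):
    """Classify each pixel with Eq. (6): uniform neighborhood AND likely road value."""
    planes = compute_color_planes(image)                  # (H, W, P)
    y = planes @ model["w"]                               # combined single-channel image
    mean = uniform_filter(y, size=win)                    # local first and second moments
    sq_mean = uniform_filter(y ** 2, size=win)
    local_std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    bins = np.clip(np.digitize(y, model["edges"]) - 1, 0, len(model["hist"]) - 1)
    prob = model["hist"][bins]                            # h_d(y~_p)
    return (local_std < model["sigma"]) & (prob > theta)  # Eq. (6), sigma as in Eq. (4)

def keep_for_training(road_mask, min_road_ratio=0.8):
    """Add the image to the training set only if its central-lower part is mainly road."""
    h, w = road_mask.shape
    central_lower = road_mask[int(0.8 * h):, int(0.3 * w):int(0.7 * w)]
    return central_lower.mean() >= min_road_ratio
```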
V. EXPERIMENTS
In this section, we validate the proposed algorithm using
road images acquired in real driving situations. Our dataset
consists of  images of  ×  pixel resolution acquired using an onboard camera mounted on a vehicle. To
obtain our results, we used  different color planes: the R,
G, and B channels and the color planes defined in Table I.
The number of previous frames used to learn the road model
was set to K =  frames.
As a first baseline, we compared our algorithm against
the texture-less descriptor proposed in [17]. Similarly to
our approach, this method models the image pixels as a
linear combination of multiple color planes, whose weights
minimize the variance of the resulting representation across
the training set. However, in that work, detection is achieved
by only analyzing the uniformity (texture) of small areas.
Furthermore, no pruning of the training set is performed.
As a second baseline, we performed detection using a non–
parametric classifier on different color planes, namely the
hue, saturation and intensity planes of the HSV color
space (Table I). Hue and saturation are well–known to be
relatively invariant to lighting variations and shadows [7],
[19], while, on the other hand, the intensity is sensitive to
lighting variations. For these color planes, the road model
was taken as the normalized histogram built from the training
samples. This histogram was then used as an estimate of the
probability of each image pixel of belonging to the road
class. For each baseline, we used two different training sets.
First, as in our approach, we utilized the pixels from the
lower part of the previous images in the sequence, but without any pruning, since pruning is part of our contributions.
Second, we used only the lower part of the current image
as training data. The baselines are referred to as Alv-prev,
Alv-cur, {H, S,V }-prev and {H, S,V }-cur, respectively.
Fig. 6 shows the receiver operating characteristic (ROC)
curves and the area under the curve (AUC) for our method and
the different baselines, averaged over 50 input images. Note that
our approach significantly outperforms the other methods.
The relatively poor performance of Alv-prev and Alv-cur
comes from the fact that the sky is often detected as road
due to its uniformity. On average the performance of the
baselines degrades when using previous images as training
data rather than just the current image. Our approach is able
to leverage the information in previous images to achieve
better accuracy than the baselines. For qualitative evaluation,
rather than showing simple road scenes where many methods
perform adequately, in Fig. 7 we show images with more
complex road shape, and shadows or other cars on the road.
Additionally, results on images where the lower part does
not belong to the road surface are shown in Fig. 8. These
results are provided without any post–processing steps (e.g.,
morphological operations). Note that our algorithm recovers
road areas despite shadows and the presence of other vehicles
in different areas of the image. This confirms the ability of
our algorithm to learn a robust road model without relying
on samples from the current image. From these results, it
is also clear that our approach outperforms the baselines. It
can be observed that Alv-prev and Alv-cur tend to detect
road pixels, but also other uniform areas (e.g., sky). This is
mainly due to the low resolution of the histogram used to
estimate the uniformity. Furthermore, Alv-cur fails to recover
the road when the lower part of the image does not belong
to the road surface (see Fig. 8). In most cases, {H, S, V}-prev
tends to fail to detect the road due to the large variations
in the appearance of the road. In contrast, {H, S, V}-cur is
not able to properly characterize the road due to the lack
of generality of the pixels in the lower part of the image.
Furthermore, {H, S, V}-cur completely fails when the lower
part does not belong to the road surface.

Fig. 7. Typical qualitative results. We compare our results to those obtained with the different baselines. Results referred to as -prev use the same training
set as our approach. Results referred to as -cur use the lower part of the input image as training set. Note that our detections correspond
much more closely to the ground-truth labels.

Fig. 8. Qualitative results on images where the lower part does not correspond to the road. Note that approaches learning a model from the lower part
of the image fail to detect the road. In contrast, our method still yields good detections.
Failure analysis reveals that our algorithm cannot distinguish
uniformly colored areas from the road when their appearance is
similar to that of the road surface (e.g., gray vehicles). This is to be expected,
since our algorithm only uses appearance cues. Results could
be improved by adding a post–processing step to select areas
connected to the lower part of the image.
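As a rough illustration of such a post-processing step, one could keep only the detected components that touch the bottom rows of the image; this sketch relies on SciPy's connected-component labeling and is only one possible heuristic, not part of the evaluated method.

```python
import numpy as np
from scipy.ndimage import label

def keep_components_touching_bottom(road_mask):
    """Discard detected regions not connected to the lower part of the image."""
    labels, _ = label(road_mask)                 # connected components (4-connectivity by default)
    bottom_labels = np.unique(labels[-1, :])     # component ids present in the last image row
    bottom_labels = bottom_labels[bottom_labels != 0]
    return np.isin(labels, bottom_labels)
```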
VI. CONCLUSIONS
In this paper, we have introduced an approach to road
detection based on an appearance model learned from training data. Our algorithm exploits unlabeled training images
to collect positive examples, and removes the restrictive
assumption that the lower part of the input image contains
road pixels. Given training images, pixels are automatically
selected so as to allow modeling the diversity of road appearance. These pixels are encoded using a linear combination of
color planes, whose weights yield minimum variance in the
road areas. Detection is then performed by comparing the
input image pixels against the learned model. Experiments
conducted on real driving situations demonstrate the ability
of our algorithm to detect road despite the presence of
shadows and other objects in the scene, and show the benefits
of our approach over existing methods. In the future, we
intend to study the use of training images coming from
different sequences, thus removing the sequential aspect of
our approach in the hope of building an even more general
appearance model.
ACKNOWLEDGEMENTS
NICTA is funded by the Australian Government as represented
by the Department of Broadband, Communications, and the Digital
Economy, and the Australian Research Council (ARC) through the
ICT Centre of Excellence Program.
REFERENCES
[1] C. Thorpe, M. Hebert, T. Kanade, and S. Shafer, “Vision and navigation for the Carnegie-Mellon Navlab,” IEEE Trans. on Pattern Analysis
and Machine Intelligence (PAMI), vol. 10, no. 3, pp. 362–373, May
1988.
[2] A. Lookingbill, J. Rogers, D. Lieb, J. Curry, and S. Thrun, “Reverse
optical flow for self-supervised adaptive autonomous robot navigation,” International Journal of Computer Vision (IJCV), vol. 74, no. 3,
pp. 287–302, 2007.
[3] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection:
An evaluation of the state of the art,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 99, 2011.
[4] B. Kim, J. Son, and K. Sohn, “Illumination invariant road detection
based on learning method,” in ITSC’11, oct. 2011, pp. 1009 –1014.
[5] G. K. Siogkas and E. S. Dermatas, “Random-walker monocular road
detection in adverse conditions using automated spatiotemporal seed
selection,” IEEE Trans. on Intel. Transp. Systems (ITS), vol. PP, no. 99,
pp. 1 –12, 2012.
[6] C. Oh, J. Son, and K. Sohn, “Illumination robust road detection using
geometric information,” in ITSC’12, sept. 2012, pp. 1566 –1571.
[7] M. Sotelo, F. Rodriguez, and L. Magdalena, “Virtuous: vision-based
road transportation for unmanned operation on urban-like scenarios,”
IEEE Trans. Intelligent Transportation Systems (ITS), vol. 5, no. 2,
pp. 69 – 83, June 2004.
[8] Y. He, H. Wang, and B. Zhang, “Color–based road detection in urban
traffic scenes,” IEEE Trans. Intelligent Transportation Systems (ITS),
vol. 5, no. 24, pp. 309 – 318, 2004.
[9] C. Tan, T. Hong, T. Chang, and M. Shneier, “Color model-based
real-time learning for road following,” in ITSC’06: Procs. IEEE
Intl. Conf. on Intel. Transp. Systems, 2006, pp. 939–944.
[10] J. M. Alvarez, T. Gevers, and A. M. Lopez, “Learning photometric
invariance from diversified color model ensembles,” in CVPR’09,
2009, pp. 565–572.
[11] J. M. Alvarez and A. Lopez, “Road detection based on illuminant
invariance,” IEEE Trans. on Intelligent Transportation Systems (ITS),
vol. 12, no. 1, pp. 184 –193, 2011.
[12] H. Kong, J. Y. Audibert, and J. Ponce, “General road detection from a
single image,” IEEE Trans. on Image Processing (TIP), vol. 19, no. 8,
pp. 2211 –2220, 2010.
[13] C. Rasmussen, “Grouping dominant orientations for ill-structured road
following,” in CVPR’04: Procs. of the IEEE Conf. on Computer Vision and
Pattern Recognition, Washington, DC, 2004, pp. 470–477.
[14] P. Lombardi, M. Zanin, and S. Messelodi, “Switching models for
vision-based on–board road detection,” in ITSC’05: Procs. IEEE
Intl. Conf. on Intel. Transp. Systems, Vienna, Austria, 2005, pp. 67
– 72.
[15] P. Sturgess, K. Alahari, L. Ladicky, and P. H. S. Torr, “Combining
appearance and structure from motion features for road scene understanding,” in BMVC’09, 2009.
[16] S. Yun, Z. Guo-ying, and Y. Yong, “A road detection algorithm by
boosting using feature combination,” in IV’07: Procs. of the IEEE
Intel. Vehicles Symposium, June 2007, pp. 364–368.
[17] J. M. Alvarez, T. Gevers, Y. LeCun, and A. M. Lopez, “Road scene
segmentation from a single image,” in ECCV’12: Procs. of the European
Conference on Computer Vision, ser. Lecture Notes in Computer Science,
vol. 7578, 2012, pp. 376–389.
[18] A. Levinshtein, A. Stere, K. Kutulakos, D. Fleet, S. Dickinson, and
K. Siddiqi, “Turbopixels: Fast superpixels using geometric flows,”
PAMI, vol. 31, no. 12, 2009.
[19] C. Rotaru, T. Graf, and J. Zhang, “Color image segmentation in
HSI space for automotive applications,” Journal of Real-Time Image
Processing, pp. 1164–1173, 2008.