Salient Object Detection via Bootstrap Learning

Na Tong1, Huchuan Lu1, Xiang Ruan2 and Ming-Hsuan Yang3
1 Dalian University of Technology   2 OMRON Corporation   3 University of California at Merced
Abstract
We propose a bootstrap learning algorithm for salient
object detection in which both weak and strong models are
exploited. First, a weak saliency map is constructed based
on image priors to generate training samples for a strong
model. Second, a strong classifier based on samples directly from an input image is learned to detect salient pixels.
Results from multiscale saliency maps are integrated to further improve the detection performance. Extensive experiments on six benchmark datasets demonstrate that the proposed bootstrap learning algorithm performs favorably against the state-of-the-art saliency detection methods. Furthermore, we show that the proposed bootstrap learning approach can be easily applied to other bottom-up saliency
models for significant improvement.
Figure 1. Saliency maps generated by the proposed method.
Brighter pixels indicate higher saliency values. Left to right: input, ground truth, weak saliency map, strong saliency map, and
final saliency map.
1. Introduction
As an important preprocessing step in computer vision
problems to reduce computational complexity, saliency detection has attracted much attention in recent years. Although significant progress has been made, it remains a
challenging task to develop effective and efficient algorithms for salient object detection.
Saliency research includes two main areas: visual attention, which is extensively studied in neuroscience and cognitive modeling, and salient object detection, which is of great interest in computer vision. Salient object detection methods can be categorized as bottom-up stimuli-driven [1, 8–12, 15–18, 20, 23, 28–38, 41, 43] and top-down
task-driven [19, 40, 42] approaches. Bottom-up methods
are usually based on low-level visual information and are
more effective in detecting fine details rather than global
shape information. In contrast, top-down saliency models are able to detect objects of certain sizes and categories
based on more representative features from training samples. However, the detection results from top-down methods tend to be coarse with fewer details. In terms of computational complexity, bottom-up methods are often more
efficient than top-down approaches.
In this paper, we propose a novel algorithm for salient object detection via bootstrap learning [22]. To address the problems of noisy detection results and limited feature representations in bottom-up methods, we present a learning approach that exploits multiple features. However, unlike existing top-down learning-based methods, the proposed algorithm is bootstrapped with samples from a bottom-up model, thereby alleviating the time-consuming off-line training process and the need to manually label positive samples.
Both weak and strong learning models are exploited in the proposed bootstrap learning algorithm. First, we compute a weak contrast-based saliency map based on superpixels of an input image. This coarse saliency map is smoothed by a graph cut method, from which a set of training samples is collected: positive samples pertain to the salient objects, while negative samples are drawn from the background of the same image. Next, a strong classifier based on
Multiple Kernel Boosting (MKB) [39] is learned to measure
saliency where three feature descriptors (RGB, CIELab color pixels, and the Local Binary Pattern histograms) are extracted and four kernels (linear, polynomial, RBF, and sigmoid functions) are used to exploit rich feature representations. Furthermore, we use multiscale superpixels to detect
salient objects of varying sizes. As the weak saliency model tends to detect fine details and the strong saliency model
focuses on global shapes, these two are combined to generate the final saliency map. Experiments on six benchmark datasets show that the proposed bootstrap learning algorithm performs favorably against state-of-the-art saliency detection methods. In addition, we incorporate the proposed bootstrap learning algorithm with existing bottom-up saliency methods and achieve significant improvement in salient object detection. Figure 1 shows some saliency maps generated by the proposed method, where brighter pixels indicate higher saliency values.

Figure 2. Bootstrap learning for salient object detection. A weak saliency map is constructed based on image priors to generate training samples for a strong model. A strong classifier based on multiple kernel boosting is learned to measure saliency, where three feature descriptors are extracted and four kernels are used to exploit rich feature representations. Detection results at multiple scales are integrated, and the weak and strong saliency maps are combined with weights to generate the final saliency map.

2. Related Work and Problem Context
Numerous bottom-up saliency detection methods have
been proposed in recent years. Itti et al. [16] propose a
saliency model based on a neural network that integrates
three feature channels over multiple scales for rapid scene
analysis. While it is able to identify salient pixels, the
results contain a significant amount of false detections.
A graph-based saliency measure is proposed by Harel et
al. [12]. However, this method focuses on eye fixation prediction and generates a low resolution saliency map similar
to [16]. Saliency models based on Bayesian inference have
been proposed in [29, 35, 36]. In [18], the low-level saliency stimuli and the shape prior are integrated using an iterative energy minimization measure. In [28], Perazzi et al.
present a contrast-based saliency filter and measure saliency
by the uniqueness and spatial distribution of regions over an
image. While the above-mentioned contrast-based methods
are simple and effective, pixels within the salient objects are
not always highlighted well. Shen and Wu [30] construct
a unified model combining lower-level features and higherlevel priors for saliency detection based on the theory of low
rank matrix recovery. In [34], Wei et al. focus on the background instead of the foreground and build a saliency detection model based on two background priors, i.e., boundary
and connectivity. Cheng et al. [10] utilize a soft abstraction
method to remove unnecessary image details and produce
perceptually accurate salient regions. In [37], Yan et al.
formulate a multiscale method using a tree model to deal
with the scale problem. A graph-based bottom-up method
is proposed using manifold ranking [38]. Recently, Zhu et
al. [43] construct a salient object detection method based on
boundary connectivity.
In addition to bottom-up approaches, considerable efforts have been made on top-down saliency models. In [42],
Zhang et al. construct a Bayesian-based top-down model
by integrating both the top-down and bottom-up information where saliency is computed locally. A saliency model based on the Conditional Random Field is formulated
with latent variables and a discriminative dictionary in [40].
Jiang et al. [19] propose a learning-based method by regarding saliency detection as a regression problem where
the saliency detection model is constructed based on the integration of numerous descriptors extracted from training
samples with ground truth labels.
As these two categories offer complementary properties for efficient and effective saliency detection, we propose a bootstrap learning approach that exploits the strengths of both bottom-up contrast-based saliency models and top-down learning methods.
3. Bootstrap Saliency Model
Figure 2 shows the main steps of the proposed salient object detection algorithm. We first construct a weak saliency map from which the training samples are collected. For
each image, we learn a strong classifier based on superpixels. To deal with the scale problem, multiscale detection results are generated and merged to construct a strong saliency map. The final saliency map is the weighted integration
of the weak and strong maps for accurate detection results.
3.1. Image Features
Superpixels have been used extensively in vision tasks
as the basic units to capture the local structural information.
In this paper, we compute a fixed number of superpixels
from an input image using the Simple Linear Iterative Clustering (SLIC) method [2]. Three descriptors including the
RGB, CIELab and Local Binary Pattern (LBP) features are
used to describe each superpixel. The rationale to use two
different color representations is based on empirical results
where better detection performance is achieved when both
are used, which can be found in the supplementary document. We consider the LBP features in a $3 \times 3$ neighborhood of each pixel. Next, each pixel is assigned a value between 0 and 58 according to the uniform pattern [27]. We construct an LBP histogram for each superpixel, i.e., a 59-dimensional vector $\{h_i\},\ i = 1, 2, \ldots, 59$, where $h_i$ is the value of the $i$-th bin of the LBP histogram.
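As a concrete illustration of this feature pipeline, the following sketch (assuming scikit-image; the function name, superpixel count, and LBP settings are illustrative rather than the authors' exact configuration) computes SLIC superpixels and the per-superpixel RGB, CIELab and 59-bin uniform LBP descriptors:

```python
# A minimal sketch of the per-superpixel feature extraction described above,
# assuming scikit-image and an RGB uint8 image; names are illustrative.
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab
from skimage.feature import local_binary_pattern

def superpixel_features(image, n_segments=200):
    """Return SLIC labels and (RGB, CIELab, LBP-histogram) descriptors per superpixel."""
    labels = slic(image, n_segments=n_segments, compactness=10)   # SLIC superpixels [2]
    lab = rgb2lab(image)
    gray = image.astype(float).mean(axis=2)
    # non-rotation-invariant 'uniform' LBP with 8 neighbors gives codes 0..58
    lbp = local_binary_pattern(gray, P=8, R=1, method="nri_uniform")
    feats = []
    for sp in np.unique(labels):
        mask = labels == sp
        rgb_mean = image[mask].mean(axis=0) / 255.0                 # 3-D RGB descriptor
        lab_mean = lab[mask].mean(axis=0)                           # 3-D CIELab descriptor
        hist, _ = np.histogram(lbp[mask], bins=59, range=(0, 59))   # 59-bin LBP histogram
        hist = hist / max(hist.sum(), 1)
        feats.append((rgb_mean, lab_mean, hist))
    return labels, feats
```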
3.2. Weak Saliency Model
The center-bias prior has been shown to be effective in
salient object detection [5, 25]. Based on this assumption,
we develop a method to construct a weak saliency model by
exploiting the contrast between each region and the regions along the image border. However, existing contrast-based
methods usually generate noisy results since low-level visual cues are limited. In this paper, we exploit the center-bias
and dark channel priors to better estimate saliency maps.
The dark channel prior is proposed for the image haze
removal task [14]. The main observation is that, for regions
that do not cover the sky (e.g., ground or buildings), there
exist some pixels with low intensity values in one of the
RGB color channels. Thus, the minimum pixel intensity in
any such region is low. The dark channel of image patches
is mainly generated by colored or dark objects and shadows,
which usually appear in the salient regions, as shown in Figure 3. The sky region of an image usually belongs to the background, which is consistent with the dark channel property of sky regions. Therefore, we exploit the dark channel property to estimate the saliency of pixels. In addition, for images with a dark background or a bright foreground, we use an adaptive weight computed from the average value along the border of the dark channel map.
In the proposed method, we define the dark channel prior of an image at the pixel level. For a pixel $p$, the dark channel prior $S_d(p)$ is computed by
$$S_d(p) = 1 - \min_{q \in \mathrm{patch}(p)} \ \min_{ch \in \{r,g,b\}} I^{ch}(q), \qquad (1)$$
where $\mathrm{patch}(p)$ is the $5 \times 5$ image patch centered at $p$ and $I^{ch}(q)$ is the color value of pixel $q$ in the corresponding color channel $ch$. Note that all the color values are normalized into $[0, 1]$. We achieve pixel-level accuracy instead of the patch-level counterpart in [14]. We also show the effect of the dark channel prior quantitatively in Figure 7(b).

Figure 3. Examples of dark channel prior. Left to right: input, dark channel map, and dark channel prior (the inverse of the dark channel map; brighter pixels indicate higher saliency values).
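A minimal sketch of the pixel-level dark channel prior in (1) is given below, assuming SciPy and an RGB image already normalized to [0, 1]; the 5 × 5 window follows the definition above.

```python
# A minimal sketch of Eq. (1): pixel-level dark channel prior.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel_prior(image, patch_size=5):
    """S_d(p) = 1 - min over the 5x5 patch and RGB channels of the color values."""
    channel_min = image.min(axis=2)                            # min over {r, g, b} at each pixel
    patch_min = minimum_filter(channel_min, size=patch_size)   # min over the 5x5 neighborhood
    return 1.0 - patch_min
```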
An input image is segmented into $N$ superpixels, $\{c_i\},\ i = 1, \ldots, N$. The regions along the image border are represented as $\{n_j\},\ j = 1, \ldots, N_B$, where $N_B$ is the number of regions along the image border. We compute the dark channel prior for each region $c_i$ using $S_d(c_i) = \frac{1}{N_{c_i}} \sum_{p \in c_i} S_d(p)$, where $N_{c_i}$ is the number of pixels within the region $c_i$. The coarse saliency value for the region $c_i$ is computed by
$$f_0(c_i) = g(c_i) \times S_d(c_i) \times \sum_{\kappa \in \{F_1, F_2, F_3\}} \left( \frac{1}{N_B} \sum_{j=1}^{N_B} d_\kappa(c_i, n_j) \right), \qquad (2)$$
where $d_\kappa(c_i, n_j)$ is the Euclidean distance between regions $c_i$ and $n_j$ in the feature space that $\kappa$ represents, i.e., the RGB ($F_1$), CIELab ($F_2$) and LBP ($F_3$) texture features, respectively. Note that all the distance values in each feature space are normalized into $[0, 1]$. In addition, $g(c_i)$ is computed based on the center prior using the normalized spatial distance between the center of the superpixel $c_i$ and the image center [18]. Thus, the saliency value of a region closer to the image center is assigned a higher weight. We generate a pixel-wise saliency map $M_0$ using (2), where the saliency value of each superpixel is assigned to the pixels it contains.
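The coarse saliency of (2) can be sketched as follows; the helper inputs (per-region features, border indices, center prior g, and region-level dark channel prior) are assumed to come from the steps above, and the per-feature normalization is a simplification.

```python
# A minimal sketch of the coarse contrast-based saliency in Eq. (2).
import numpy as np

def coarse_saliency(region_feats, border_ids, g, s_d):
    """region_feats: dict feature_name -> (N, d) array; border_ids: indices of border regions;
    g: (N,) center-prior weights; s_d: (N,) region-level dark channel prior."""
    n = g.shape[0]
    contrast = np.zeros(n)
    for feats in region_feats.values():                       # F1: RGB, F2: CIELab, F3: LBP
        border = feats[border_ids]                            # features of border regions n_j
        dist = np.linalg.norm(feats[:, None, :] - border[None, :, :], axis=2)
        dist /= dist.max() + 1e-12                            # normalize distances into [0, 1]
        contrast += dist.mean(axis=1)                         # average contrast to the border
    return g * s_d * contrast                                 # f0(c_i) of Eq. (2)
```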
Most existing methods usually use Gaussian filtering to
generate smoothed saliency maps at the expense of accuracy. In this paper, we use a simple yet effective algorithm
based on the Graph Cut method [7, 21], to determine the
foreground and background regions in $M_0$. Given an input image, we construct an undirected graph $G = (V, E, T)$, where $E$ is a set of undirected edges that connect the nodes $V$ (pixels), and $T$ is the set of weights of the nodes connected to the background and foreground terminals. The weight of each node (pixel) $p$ connected to the foreground terminal is assigned the saliency value in the pixel-wise map $M_0$. Thus, for each pixel $p$, the set $T$ consists of two components, $\{T^f(p)\}$ and $\{T^b(p)\}$, computed by
$$T^f(p) = M_0(p), \qquad T^b(p) = 1 - M_0(p), \qquad (3)$$
where $T^f(p)$ is the weight of pixel $p$ connected to the foreground terminal and $T^b(p)$ is the weight to the background terminal. The minimum cost cut generates a foreground mask $M_1$ using the Max-Flow [6] method, which measures the probability of each pixel being foreground.

Figure 4. Performance of Graph Cut. Left to right: input, saliency maps without Graph Cut, binary results using Graph Cut, and saliency maps obtained by summing the previous two maps.
As shown in Figure 4, $M_1$ is a binary map which may contain noise in both the foreground and background. Thus, we consider both the binary map $M_1$ and the map $M_0$ to construct the continuous and smoothed weak saliency map $\check{M}_w$ by
$$\check{M}_w = \frac{M_0 + M_1}{2}. \qquad (4)$$
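A minimal sketch of the graph-cut smoothing in (3) and (4) is shown below, assuming the PyMaxflow package; the pairwise weight is illustrative, and the foreground/background assignment of the resulting segments may need to be flipped depending on the terminal convention.

```python
# A minimal sketch of Eqs. (3)-(4): graph-cut smoothing of the coarse map M0.
import numpy as np
import maxflow

def graph_cut_smooth(m0, pairwise_weight=1.0):
    """m0: coarse saliency map in [0, 1]; returns the smoothed weak map (M0 + M1) / 2."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(m0.shape)
    g.add_grid_edges(nodes, pairwise_weight)           # 4-connected smoothness term
    g.add_grid_tedges(nodes, m0, 1.0 - m0)             # T^f(p) = M0(p), T^b(p) = 1 - M0(p)
    g.maxflow()
    # get_grid_segments returns a boolean partition; invert if the foreground
    # ends up on the other terminal for your capacity convention.
    m1 = g.get_grid_segments(nodes).astype(np.float64)  # binary foreground mask M1
    return (m0 + m1) / 2.0
```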
We show the performance of the Graph Cut method quantitatively in Figure 7(b). The training set for the strong classifier is selected from the weak saliency map. We compute the average saliency value of each superpixel and set two thresholds to generate a training set containing both positive and negative samples. Superpixels with saliency values larger than the high threshold are labeled as positive samples (+1), while those with saliency values smaller than the low threshold are labeled as negative samples (−1). More details about the threshold setting can be found in the supplementary document.
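Training-sample selection from the weak map can then be sketched as below; the two threshold values are placeholders, since the paper's actual settings are given in its supplementary document.

```python
# A minimal sketch of drawing training samples from the weak saliency map;
# the thresholds are illustrative, not the paper's exact values.
import numpy as np

def select_training_samples(sp_saliency, high_thresh=0.7, low_thresh=0.3):
    """sp_saliency: (N,) average weak saliency per superpixel; returns indices and labels."""
    pos = np.where(sp_saliency > high_thresh)[0]   # confident foreground superpixels -> +1
    neg = np.where(sp_saliency < low_thresh)[0]    # confident background superpixels -> -1
    idx = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    return idx, labels
```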
3.3. Strong Saliency Model
One of the main difficulties using a Support Vector Machine (SVM) is to determine the appropriate kernel for the
given dataset. This problem is more complicated when the
dataset contains thousands of diverse images with different properties. While numerous saliency detection methods
based on various features have been proposed, it is still not
clear how these features can be well integrated. To cope
with these problems, we present a method similar to the
Multiple Kernel Boosting (MKB) [39] method to include
multiple kernels of different features. We treat SVMs with
different kernels as weak classifiers and then learn a strong
classifier using the boosting method. Note that we restrict
the learning process to each input image to avoid the heavy
computational load of extracting features and learning kernels for a large amount of training data (as required in several discriminative methods [19] in the literature for saliency
detection).
The MKB algorithm is a boosted Multiple Kernel Learning (MKL) method [4], which combines several SVMs of
different kernels. For each image, we have the training samples $\{r_i, l_i\}_{i=1}^{H}$ from the weak saliency map $\check{M}_w$ (see Section 3.2), where $r_i$ is the $i$-th sample, $l_i$ is its binary label, and $H$ is the number of samples. The linear combination of kernels $\{k_m\}_{m=1}^{M}$ is defined by
$$k(r, r_i) = \sum_{m=1}^{M} \beta_m k_m(r, r_i), \qquad \sum_{m=1}^{M} \beta_m = 1, \ \beta_m \in \mathbb{R}^{+}, \qquad (5)$$
where $\beta_m$ is the kernel weight and $M$ denotes the number of weak classifiers, with $M = N_f \times N_k$. Here, $N_f$ is the number of features and $N_k$ is the number of kernels (e.g., $N_f = 3$, $N_k = 4$ in this work). For different feature sets, the decision function is defined as a convex combination,
$$Y(r) = \sum_{m=1}^{M} \beta_m \sum_{i=1}^{H} \alpha_i l_i k_m(r, r_i) + \bar{b}, \qquad (6)$$
where $\alpha_i$ is the Lagrange multiplier and $\bar{b}$ is the bias of the standard SVM algorithm. The parameters $\{\alpha_i\}$, $\{\beta_m\}$ and $\bar{b}$ can be learned by a joint optimization process.
We note that (6) is a conventional function for the MKL
method. In this paper we use the boosting algorithm instead of the simple combination of single-kernel SVMs in
the MKL method. We rewrite (6) as
$$Y(r) = \sum_{m=1}^{M} \beta_m \left( \alpha^{\top} k_m(r) + \bar{b}_m \right), \qquad (7)$$
where $\alpha = [\alpha_1 l_1, \alpha_2 l_2, \ldots, \alpha_H l_H]^{\top}$, $k_m(r) = [k_m(r, r_1), k_m(r, r_2), \ldots, k_m(r, r_H)]^{\top}$, and $\bar{b} = \sum_{m=1}^{M} \bar{b}_m$. By setting the decision function of a single-kernel SVM as $z_m(r) = \alpha^{\top} k_m(r) + \bar{b}_m$, the parameters can be learned straightforwardly. Thus, (7) can be rewritten as
$$Y(r) = \sum_{j=1}^{J} \beta_j z_j(r). \qquad (8)$$
In order to compute the parameters βj , we use the Adaboost
method and the parameter J in (8) denotes the number of iterations of the boosting process. We consider each SVM as
a weak classifier and the final strong classifier Y (r) is the
weighted combination of all the weak classifiers. Starting
with uniform weights, $\omega_1(i) = 1/H,\ i = 1, 2, \ldots, H$, for the SVM classifiers, we obtain a set of decision functions $\{z_m(r)\},\ m = 1, 2, \ldots, M$. At the $j$-th iteration, we compute the classification error of each weak classifier,
$$\epsilon_m = \frac{\sum_{i=1}^{H} \omega(i)\, |z_m(r_i)| \, \big(\mathrm{sgn}(-l_i z_m(r_i)) + 1\big)/2}{\sum_{i=1}^{H} \omega(i)\, |z_m(r_i)|}, \qquad (9)$$
where $\mathrm{sgn}(x)$ is the sign function, which equals 1 when $x > 0$ and $-1$ otherwise. We locate the decision function $z_j(r)$ with the minimum error $\epsilon_j$, i.e., $\epsilon_j = \min_{1 \le m \le M} \epsilon_m$. The combination coefficient $\beta_j$ is then computed by $\beta_j = \frac{1}{2}\log\frac{1-\epsilon_j}{\epsilon_j} \cdot \frac{1}{2}\big(\mathrm{sgn}(\log\frac{1-\epsilon_j}{\epsilon_j}) + 1\big)$. Note that $\beta_j$ must be larger than 0, which implies $\epsilon_j < 0.5$; this accords with the basic hypothesis that boosting can combine weak classifiers into a strong one. In addition, we update the weights by
$$\omega_{j+1}(i) = \frac{\omega_j(i)\, e^{-\beta_j l_i z_j(r_i)}}{2\sqrt{\epsilon_j (1 - \epsilon_j)}}. \qquad (10)$$
After J iterations, all the βj and zj (r) are computed and we
have a boosted classifier (8) as the saliency model learned
directly from an input image. We apply this strong saliency model to the test samples (based on all the superpixels
of an input image), and a pixel-wise saliency map is thus
generated.
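The boosting procedure of (8)-(10) over single-kernel SVMs can be sketched as follows, assuming scikit-learn; the weight update uses a simple renormalization instead of the constant in (10), and the kernel/feature bookkeeping is illustrative rather than the authors' implementation.

```python
# A minimal sketch of MKB-style boosting over single-kernel SVMs (Eqs. 8-10),
# mirroring the text's 3 features x 4 kernels.
import numpy as np
from sklearn.svm import SVC

KERNELS = ["linear", "poly", "rbf", "sigmoid"]

def train_mkb(features, labels, n_iters=8):
    """features: list of (H, d) arrays (RGB, CIELab, LBP); labels in {-1, +1}."""
    # One weak SVM per (feature, kernel) pair, M = Nf * Nk classifiers in total.
    weak = [(f, SVC(kernel=k).fit(features[f], labels))
            for f in range(len(features)) for k in KERNELS]
    H = len(labels)
    w = np.full(H, 1.0 / H)                       # uniform sample weights
    strong = []                                   # list of (beta_j, feature index, svm)
    for _ in range(n_iters):
        errs = []
        for f, svm in weak:
            z = svm.decision_function(features[f])
            num = (w * np.abs(z) * (np.sign(-labels * z) + 1) / 2).sum()
            errs.append(num / max((w * np.abs(z)).sum(), 1e-12))   # Eq. (9)
        j = int(np.argmin(errs))
        eps = max(errs[j], 1e-12)
        if eps >= 0.5:                            # beta_j would be non-positive; stop
            break
        beta = 0.5 * np.log((1 - eps) / eps)
        f, svm = weak[j]
        z = svm.decision_function(features[f])
        w = w * np.exp(-beta * labels * z)
        w /= w.sum()                              # renormalization instead of Eq. (10)'s constant
        strong.append((beta, f, svm))
    return strong

def predict_mkb(strong, features):
    """Strong classifier output Y(r) of Eq. (8) for all test samples."""
    return sum(beta * svm.decision_function(features[f]) for beta, f, svm in strong)
```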
To improve the accuracy of the map, we first use the Graph Cut method to smooth the saliency detection results. Next, we obtain the strong saliency map $\check{M}_s$ by further enhancing the saliency map with the guided filter [13], as it has been shown to perform well as an edge-preserving smoothing operator.
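For the guided-filter enhancement, a minimal sketch assuming the opencv-contrib ximgproc module is shown below; the radius and regularization values are illustrative.

```python
# A minimal sketch of the edge-preserving enhancement with a guided filter [13].
import cv2
import numpy as np

def refine_strong_map(image_bgr, strong_map, radius=8, eps=1e-2):
    """Use the input image as the guide to smooth the strong saliency map."""
    guide = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    return cv2.ximgproc.guidedFilter(guide, strong_map.astype(np.float32), radius, eps)
```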
3.4. Multiscale Saliency Maps
The accuracy of the saliency map is sensitive to the number of superpixels, as salient objects are likely to appear at different scales. To deal with the scale problem, we generate four layers of superpixels with different granularities, where $N = 100, 150, 200, 250$, respectively. We represent the weak saliency map (see Section 3.2) at each scale as $\check{M}_{w_i}$, and the multiscale weak saliency map is computed by $M_w = \frac{1}{4}\sum_{i=1}^{4} \check{M}_{w_i}$. Next, the training sets from the four scales are used to train one strong saliency model, and the test sets (based on all the superpixels from the four scales) are evaluated by the learned model simultaneously. Four strong saliency maps, one per scale, are constructed (see Section 3.3), denoted as $\check{M}_{s_i},\ i = 1, 2, 3, 4$. Finally, we obtain the final strong saliency map as $M_s = \frac{1}{4}\sum_{i=1}^{4} \check{M}_{s_i}$. As such, the proposed method is robust to scale variation.
3.5. Integration
The proposed weak and strong saliency maps have complementary properties. The weak map is likely to detect fine
details and to capture local structural information due to the
contrast-based measure. In contrast, the strong map works
well by focusing on global shapes for most images except
the case when the test background samples have similarity with the positive training set or large differences compared to the negative training set, or vice versa for the test
foreground sample. In this case, the strong map may misclassify the test regions as shown in the bottom row of Figure 1. Thus we integrate these two maps by a weighted
combination,
$$M = \sigma M_s + (1 - \sigma) M_w, \qquad (11)$$
where $\sigma$ is a balance factor set to 0.7 to weigh the strong map more than the weak map, and $M$ is the final saliency map via bootstrap learning. More discussions about the value of $\sigma$ can be found in the supplementary document.
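Putting the last two steps together, a minimal sketch of the multiscale averaging and the weighted integration of (11) could look as follows (σ = 0.7 as in the text):

```python
# A minimal sketch of Section 3.4's multiscale averaging and Eq. (11)'s integration.
import numpy as np

def integrate(weak_maps, strong_maps, sigma=0.7):
    """weak_maps, strong_maps: lists of per-scale saliency maps of identical size."""
    m_w = np.mean(weak_maps, axis=0)       # multiscale weak map M_w
    m_s = np.mean(strong_maps, axis=0)     # multiscale strong map M_s
    return sigma * m_s + (1.0 - sigma) * m_w
```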
4. Experimental Results
We present experimental results of 22 saliency detection
methods including the proposed algorithms on six benchmark datasets. The ASD dataset, selected from a larger image database [25], contains 1,000 images and is labeled with pixel-wise ground truth [1]. The THUS dataset [9] consists of 10,000 images, all of which are labeled
with pixel-wise ground truth. The SOD dataset [26] is
composed of 300 images from the Berkeley segmentation
dataset where each one is labeled with salient object boundaries, based on which the pixel level ground truth [34] is
built. Some of the images in the SOD dataset include more
than one salient object. The SED2 dataset [3] contains 100
images which are labeled with pixel-wise ground truth annotations. It is challenging due to the fact that every image
has two salient objects. The Pascal-S dataset [24] contains
850 images which are also labeled with pixel-wise ground
truth. For comprehensive evaluation, we use all the images in the Pascal-S dataset for testing instead of using 40% for training and the rest for testing as in [24]. The DUT-OMRON dataset [38] contains 5,168 challenging images with pixel-wise ground truth annotations. All the experiments are carried out using MATLAB on a desktop computer with an Intel i7-3770 CPU (3.4 GHz) and 32GB RAM. For fair comparison, we use the original source code or the provided
saliency detection results in the literature. The MATLAB
source code is available on our project site.
We first evaluate the proposed algorithms and other 19
state-of-the-art methods including the IT98 [16], SF [28], LRMR [30], wCO [43], GS SP [34], XL13 [36], RA10 [29],
GB [12], LC [41], SR [15], FT [1], CA [11], SVO [8],
CBsal [18], GMR [38], GC [10], HS [37], RC-J [9] and
DSR [23] methods on the ASD, SOD, SED2, THUS and
DUT-OMRON datasets. In addition, the DRFI [19] method is trained on images and ground truth that overlap with parts of the ASD, THUS and SOD datasets, and its results on the Pascal-S dataset are not provided. Accordingly, we only compare our method with the DRFI model on the SED2 dataset; on SED2, our methods are thus evaluated against 20 methods. The MSRA [25] dataset
consists of 5,000 images. Since more than 3,700 images
in the MSRA dataset are included in the THUS dataset, we
do not present the evaluation results on this dataset due to
space limitations.
Figure 5. Comparison of our saliency maps with ten state-of-the-art methods. Left to right: (a) input (b) GS SP [34] (c) wCO [43]
(d) LRMR [30] (e) GMR [38] (f) DSR [23] (g) XL13 [36] (h) HS [37] (i) RC-J [9] (j) GC [10] (k) SF [28] (l) Ours (m) wCO-bootstrapped
(n) ground truth. Our model is able to detect both the foreground and background uniformly.
4.1. Qualitative Results
We present some results of saliency maps generated
by twelve methods for qualitative comparison in Figure 5,
where “wCO-bootstrapped” means the wCO model bootstrapped by the proposed learning approach. The saliency
maps generated by the proposed algorithms highlight the
salient objects well with fewer noisy results. We note that
these salient objects appear at different image locations although the center-bias is used in the proposed algorithm.
The detected foreground and background in our maps are
smooth due to the use of the Graph Cut and guided filtering methods. As a result of using both weak (effective for
picking up details) and strong (effective for discriminating
boundaries) saliency maps, the proposed bootstrap learning
algorithm performs well for images containing multiple objects as shown in the bottom two rows of Figure 5. Furthermore, due to the contribution of the LBP features (effective
for texture classification), the proposed method is able to
detect salient objects accurately despite similar appearance
to the background regions as shown in the fourth and fifth
rows of Figure 5. More results can be found in the supplementary document.
4.2. Quantitative Results
We use the Precision and Recall (P-R) curve to evaluate
all the methods. Each saliency map is normalized to a consistent gray-level range, and we vary a fixed threshold from 0 to 255 with an increment of 5, producing 52 binary masks per map. Using the pixel-wise ground truth, 52 pairs of average precision and recall values over all the images in each test dataset are computed. Figure 6 shows the P-R curves, where several state-of-the-art methods and the proposed algorithms perform well.
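A minimal sketch of this fixed-threshold P-R evaluation is shown below, assuming saliency maps scaled to [0, 255] and binary ground-truth masks; range(0, 256, 5) yields the 52 thresholds mentioned above.

```python
# A minimal sketch of the fixed-threshold P-R evaluation.
import numpy as np

def pr_curve(saliency_maps, gt_masks, thresholds=range(0, 256, 5)):
    """Returns one (precision, recall) pair per threshold, averaged over images."""
    curve = []
    for t in thresholds:
        precisions, recalls = [], []
        for s, gt in zip(saliency_maps, gt_masks):
            pred = s > t
            tp = np.logical_and(pred, gt).sum()
            precisions.append(tp / max(pred.sum(), 1))
            recalls.append(tp / max(gt.sum(), 1))
        curve.append((np.mean(precisions), np.mean(recalls)))
    return curve
```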
To better assess these methods, we compute the Area Under ROC Curve (AUC) for the best performing methods.
Table 1 shows that the proposed algorithms perform favorably against other state-of-the-art methods in terms of AUC
on all the six datasets that contain both single and multiple
salient objects.
In addition, we measure the quality of the saliency maps using the F-Measure by adaptively setting a segmentation threshold for binary segmentation [1]. The adaptive
threshold is twice the average value of the whole saliency map. Each image is segmented into superpixels, which are masked out if their mean saliency values are lower than the adaptive threshold. The average precision and recall values
are computed based on the generated binary masks and the
ground truth, while the F-measure is computed by
$$F_\eta = \frac{(1 + \eta^2) \times Precision \times Recall}{\eta^2 \times Precision + Recall}, \qquad (12)$$
where $\eta^2$ is set to 0.3 to weigh precision more than recall.
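The adaptive-threshold F-measure of (12) can be sketched as below; for simplicity the sketch thresholds pixels directly rather than superpixel means, and η² = 0.3 follows the text.

```python
# A minimal sketch of the adaptive-threshold F-measure of Eq. (12).
import numpy as np

def adaptive_f_measure(saliency, gt, eta_sq=0.3):
    """saliency in [0, 1]; gt is a binary mask of the salient object."""
    pred = saliency > 2.0 * saliency.mean()         # adaptive segmentation threshold
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = eta_sq * precision + recall
    return (1 + eta_sq) * precision * recall / max(denom, 1e-12)
```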
Figure 7(a) shows the F-Measure values of the evaluated
methods on the six datasets. Overall, the proposed algorithms perform well (with top or second values) against the
state-of-the-art methods.
4.3. Analysis of the Bootstrap Saliency Model
Every component in the proposed algorithm contributes
to the final saliency map. Figure 7(b) shows the performance of each step in the proposed method, i.e., the dark
channel prior, graph cut, weak saliency map, and strong
saliency map, among which the dark channel prior appears
to contribute least but is still indispensable for the overall
performance. The proposed weak saliency model may generate less accurate results than several state-of-the-art methods, but it is efficient with less computational complexity.
Figure 6. P-R curve results on the six datasets: (a) ASD, (b) THUS, (c) SOD, (d) SED2, (e) Pascal-S, and (f) DUT-OMRON.
Figure 7. (a) F-measure values of 21 methods on the six datasets. Note that "* (b)" shows the improvement of the state-of-the-art methods by the bootstrap learning approach on the corresponding dataset, as stated in Section 4.4. (b) Performance of each component of the proposed method on the ASD dataset (weak saliency map without the dark channel prior, weak saliency map without graph cut, weak saliency map, strong saliency map, and the final result).
4.4. Bootstrapping State-of-the-Art Methods
The performance of the proposed bootstrap learning
method hinges on the quality of the weak saliency model.
If a weak saliency model does not perform well, the proposed algorithm is likely to fail as an insufficient number
of good training samples can be collected for constructing
the strong model for a specific image. Figure 8 shows examples where the weak saliency model does not perform
well, thereby affecting the overall performance of the proposed algorithm. This suggests that the proposed algorithm can be used to bootstrap the performance of state-of-the-art methods (i.e., with better weak saliency maps).

Table 1. AUC (Area Under ROC Curve) on the ASD, SED2, SOD, THUS, Pascal-S and DUT-OMRON datasets. "wCO-b" denotes the wCO model after bootstrapping with the proposed approach. The proposed methods rank first and second on the six datasets. The "ASD (b)" column gives the AUC obtained when each state-of-the-art saliency map is used as the weak saliency map in the proposed approach on the ASD dataset; these results are largely improved over the original results in the "ASD" column.
Method   ASD     SED2    SOD     THUS    Pascal-S  DUT-OMRON  ASD (b)
Ours     .9828   .9363   .8477   .9635   .8682     .8794      -
wCO-b    .9904   .9399   .8525   .9723   .8774     .9063      -
wCO      .9805   .9062   .8217   .9525   .8597     .8927      .9904
HS       .9683   .8387   .8169   .9322   .8368     .8604      .9876
LRMR     .9593   .8886   .7810   .9199   .8121     .8566      .9825
RC-J     .9735   .8606   .8238   .9364   .8379     .8592      .9869
RA10     .9326   .8500   .7710   .8810   .7836     .8264      .9817
GC       .9456   .8618   .7181   .9032   .7479     .7931      .9773
SVO      .9530   .8773   .8043   .9280   .8226     .8662      .9722
DSR      .9774   .9136   .8380   .9504   .8299     .8922      .9872
GB       .9146   .8448   .8191   .8132   .8380     .8565      .9619
GS SP    .9754   .8999   .7982   .9462   .8553     .8786      .9888
FT       .8375   .8185   .6006   .7890   .6220     .6758      .9506
GMR      .9700   .8620   .7982   .9390   .8315     .8500      .9844
CA       .8736   .8585   .7868   .8712   .7829     .8137      .9531
SF       .9233   .8501   .8238   .8510   .6830     .7628      .9723
SR       .6973   .7593   .6695   .7149   .6585     .6799      .8530
XL13     .9609   .8470   .7868   .9353   .7983     .8160      .9791
LC       .7772   .8366   .6168   .7673   .6191     .6549      .8988
CBsal    .9628   .8728   .7409   .9270   .8087     .8419      .9811
IT98     .8738   .8904   .7862   .6282   .7797     .8218      .9479
DRFI     -       .9349   -       -       -         -          -
Thus, we generate different weak saliency maps by applying the graph cut method to the results generated by the
state-of-the-art methods. Note that we only use two scales
instead of four scales for efficiency and use equal weights in (11) (to better use these “weak” saliency maps) in the
experiments. Figure 9 shows the P-R curves on the ASD
dataset and Figure 7(a) shows the F-measure on six tested
datasets. In addition, the AUC measures are shown in the "ASD (b)" column of Table 1. These results show
that the performance of all state-of-the-art methods can be
significantly improved by the proposed bootstrap learning
algorithm. For example, the performance improvement of
the SR method over the original model for the AUC value
is 22.3%. Four methods, the wCO, DSR, HS and GS SP,
achieve over 0.987 using the proposed bootstrap learning
method and nine methods achieve higher than 0.98 in terms
of AUC. Four methods, the RC-J, GMR, wCO and DSR,
achieve over 0.98 for the highest precision value in the P-R curve on the ASD dataset. The average F-measure performance gains of the 19 methods on the ASD, SOD, SED2, Pascal-S, THUS and DUT-OMRON datasets are 10.5%, 14.0%, 5.2%, 14.4%, 10.3% and 9.9%, respectively.
Figure 8. Failure cases of the proposed algorithm as the weak
saliency maps do not perform well. Left to right: input, ground
truth, weak saliency map, strong saliency map and the bootstrap
saliency map generated by the proposed algorithm.
Figure 9. P-R curve results show improvement of state-of-the-art
methods by the bootstrap learning approach on the ASD dataset.
5. Conclusion
In this paper, we propose a bootstrap learning model for salient object detection in which both weak and strong saliency models are constructed and integrated. Our learning process is restricted to multiple scales of the input image and is unsupervised, since the training examples for the strong model are determined by a weak saliency map based on contrast and image priors. The strong saliency model is constructed based on the MKB algorithm, which
combines all the weak classifiers into a strong one using
the Adaboost algorithm. Extensive experimental results demonstrate that the proposed approach performs favorably against 20 state-of-the-art methods on six benchmark
datasets. In addition, the proposed bootstrap learning algorithm can be applied to other saliency models for significant
improvement.
Acknowledgements. N. Tong and H. Lu are supported by
the Natural Science Foundation of China #61472060 and
the Fundamental Research Funds for the Central Universities under Grant DUT14YQ101. M.-H. Yang is supported in
part by NSF CAREER Grant #1149783 and NSF IIS Grant
#1152576.
References
[1] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk.
Frequency-tuned salient region detection. In CVPR, 2009.
1, 5, 6
[2] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and
S. Süsstrunk. SLIC superpixels. EPFL, 2010. 3
[3] S. Alpert, M. Galun, R. Basri, and A. Brandt. Image segmentation by probabilistic bottom-up aggregation and cue
integration. In CVPR, 2007. 5
[4] F. R. Bach, G. R. Lanckriet, and M. I. Jordan. Multiple kernel
learning, conic duality, and the SMO algorithm. In ICML,
2004. 4
[5] A. Borji, D. N. Sihite, and L. Itti. Salient object detection: A
benchmark. In ECCV, 2012. 3
[6] Y. Boykov and V. Kolmogorov. An experimental comparison
of min-cut/max-flow algorithms for energy minimization in
vision. PAMI, 26(9):1124–1137, 2004. 4
[7] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 23(11):1222–1239,
2001. 3
[8] K.-Y. Chang, T.-L. Liu, H.-T. Chen, and S.-H. Lai. Fusing
generic objectness and visual saliency for salient object detection. In ICCV, 2011. 1, 5
[9] M.-M. Cheng, N. J. Mitra, X. Huang, P. H. Torr, and S.-M.
Hu. Salient object detection and segmentation. PAMI, 2(3),
2013. 5, 6
[10] M.-M. Cheng, J. Warrell, W.-Y. Lin, S. Zheng, V. Vineet, and
N. Crook. Efficient salient region detection with soft image
abstraction. In ICCV, 2013. 2, 5, 6
[11] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware
saliency detection. In CVPR, 2010. 5
[12] J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. In NIPS, 2006. 1, 2, 5
[13] K. He, J. Sun, and X. Tang. Guided image filtering. In ECCV,
2010. 5
[14] K. He, J. Sun, and X. Tang. Single image haze removal using
dark channel prior. PAMI, 33(12):2341–2353, 2011. 3
[15] X. Hou and L. Zhang. Saliency detection: A spectral residual
approach. In CVPR, 2007. 1, 5
[16] L. Itti, C. Koch, and E. Niebur. A model of saliency-based
visual attention for rapid scene analysis. PAMI, 20:1254–
1259, 1998. 2, 5
[17] B. Jiang, L. Zhang, H. Lu, C. Yang, and M.-H. Yang. Saliency detection via absorbing markov chain. In ICCV, 2013.
[18] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li.
Automatic salient object segmentation based on context and
shape prior. In BMVC, 2011. 1, 2, 3, 5
[19] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li.
Salient object detection: A discriminative regional feature
integration approach. In CVPR, 2013. 1, 2, 4, 5
[20] D. A. Klein and S. Frintrop. Center-surround divergence of feature statistics for salient object detection. In ICCV, 2011. 1
[21] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? PAMI, 26(2):147–159, 2004. 3
[22] B. Kuipers and P. Beeson. Bootstrap learning for place recognition. In AAAI, 2002. 1
[23] X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang. Saliency detection via dense and sparse reconstruction. In ICCV, 2013. 1, 5, 6
[24] Y. Li, X. Hou, C. Koch, J. Rehg, and A. Yuille. The secrets of salient object segmentation. In CVPR, 2014. 5
[25] T. Liu, J. Sun, N.-N. Zheng, X. Tang, and H.-Y. Shum. Learning to detect a salient object. In CVPR, 2007. 3, 5
[26] V. Movahedi and J. H. Elder. Design and perceptual validation of performance measures for salient object segmentation. In POCV, 2010. 5
[27] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. PAMI, 24(7):971–987, 2002. 3
[28] F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung. Saliency filters: Contrast based filtering for salient region detection. In CVPR, 2012. 1, 2, 5, 6
[29] E. Rahtu, J. Kannala, M. Salo, and J. Heikkilä. Segmenting salient objects from images and videos. In ECCV, 2010. 2, 5
[30] X. Shen and Y. Wu. A unified approach to salient object detection via low rank matrix recovery. In CVPR, 2012. 2, 5, 6
[31] J. Sun, H. Lu, and S. Li. Saliency detection based on integration of boundary and soft-segmentation. In ICIP, 2012.
[32] N. Tong, H. Lu, L. Zhang, and X. Ruan. Saliency detection with multi-scale superpixels. SPL, 21(9):1035–1039, 2014.
[33] N. Tong, H. Lu, Y. Zhang, and X. Ruan. Salient object detection via global and local cues. Pattern Recognition, doi:10.1016/j.patcog.2014.12.005, 2014.
[34] Y. Wei, F. Wen, W. Zhu, and J. Sun. Geodesic saliency using background priors. In ECCV, 2012. 2, 5, 6
[35] Y. Xie and H. Lu. Visual saliency detection based on Bayesian model. In ICIP, 2011. 2
[36] Y. Xie, H. Lu, and M.-H. Yang. Bayesian saliency via low and mid level cues. TIP, 22(5):1689–1698, 2013. 2, 5, 6
[37] Q. Yan, L. Xu, J. Shi, and J. Jia. Hierarchical saliency detection. In CVPR, 2013. 2, 5, 6
[38] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang. Saliency detection via graph-based manifold ranking. In CVPR, 2013. 1, 2, 5, 6
[39] F. Yang, H. Lu, and Y.-W. Chen. Human tracking by multiple kernel boosting with locality affinity constraints. In ACCV, 2011. 1, 4
[40] J. Yang and M.-H. Yang. Top-down visual saliency via joint CRF and dictionary learning. In CVPR, 2012. 1, 2
[41] Y. Zhai and M. Shah. Visual attention detection in video sequences using spatiotemporal cues. In ACM MM, 2006. 1, 5
[42] L. Zhang, M. H. Tong, T. K. Marks, H. Shan, and G. W. Cottrell. SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7), 2008. 1, 2
[43] W. Zhu, S. Liang, Y. Wei, and J. Sun. Saliency optimization from robust background detection. In CVPR, 2014. 1, 2, 5, 6