GLOBAL CONVERGENCE OF SLANTING FILTER METHODS
FOR NONLINEAR PROGRAMMING
ELIZABETH W. KARAS¶, ANA P. OENING§ , AND ADEMIR A. RIBEIRO¶
Abstract. In this paper we present a general algorithm for nonlinear programming which uses a slanting
filter criterion for accepting the new iterates. Independently of how these iterates are computed, we prove that all
accumulation points of the sequence generated by the algorithm are feasible. Computing the new iterates by the
inexact restoration method, we prove stationarity of all accumulation points of the sequence.
Key words. Filter methods, nonlinear programming, global convergence.
1. Introduction. We shall study the nonlinear programming problem
$$
(P) \qquad
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_E(x) = 0 \\
& f_I(x) \le 0,
\end{array}
$$
where the index sets $E$ and $I$ refer to the equality and inequality constraints, respectively. Let the cardinality of $E \cup I$ be $m$, and assume that the functions $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 0, 1, \ldots, m$, are continuously differentiable. The Jacobian matrices of $f_E$ and $f_I$ are denoted by $A_E(\cdot)$ and $A_I(\cdot)$, respectively.
A nonlinear programming algorithm must deal with two different (and possibly conflicting)
criteria, related respectively to optimality and to feasibility. Optimality is measured by the objective
function f0 ; feasibility is typically measured by penalization of constraint violation, for instance,
by the function $h : \mathbb{R}^n \to \mathbb{R}_+$ given by
$$ h(x) = \|f^+(x)\|, \tag{1.1} $$
where $\|\cdot\|$ is an arbitrary norm and $f^+ : \mathbb{R}^n \to \mathbb{R}^m$ is defined by
$$ f_i^+(x) = \begin{cases} f_i(x) & \text{if } i \in E, \\ \max\{0, f_i(x)\} & \text{if } i \in I. \end{cases} $$
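As a concrete illustration of (1.1), the sketch below computes $h$ with the Euclidean norm using NumPy. The function names are ours, and the norm is one of the many choices the definition allows.

```python
import numpy as np

def infeasibility(f_E, f_I, x):
    """Constraint violation h(x) = ||f^+(x)|| with the Euclidean norm.

    f_E, f_I: callables returning the equality and inequality constraint
    values at x as 1-d arrays (possibly empty).
    """
    f_plus = np.concatenate([f_E(x),                    # equality part: taken as is
                             np.maximum(0.0, f_I(x))])  # inequality part: only violations count
    return np.linalg.norm(f_plus)
```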
Both criteria must be optimized, and the algorithm should keep a certain balance between them at every step of the iterative process. Several algorithms for nonlinear programming have been designed in which a merit function is the tool used to guarantee global convergence [1, 7, 8, 10, 11, 14].
As an alternative to merit functions, Fletcher and Leyffer [5] introduced the so-called filter for globalizing nonlinear programming methods. Filter methods are based on the concept of dominance, borrowed from multi-criteria optimization. A filter algorithm defines a forbidden region by memorizing pairs $(f_0(x^j), h(x^j))$, conveniently chosen from former iterations, and then avoids points dominated by these according to the following domination rule: a point $x$ is dominated by $y$ if, and only if,
$$ f_0(x) \ge f_0(y) - \alpha h(y) \quad\text{and}\quad h(x) \ge (1-\alpha) h(y), $$
¶Department of Mathematics, Federal University of Paraná, Cx. Postal 19081, 81531-980, Curitiba, PR, Brazil; e-mail: [email protected], [email protected]. Supported by PRONEX - Optimization.
§Master Program in Numerical Methods in Engineering, Federal University of Paraná, Cx. Postal 19081, 81531-980, Curitiba, PR, Brazil; e-mail: [email protected]. Supported by CAPES, Brazil.
where α ∈ (0, 1) is a given constant.
Under reasonable assumptions, like smoothness of the functions, boundedness of the iterates
and of the Hessians of the models, Fletcher et al. [4] have proved global convergence of a filter-SQP
algorithm based on the domination rule seen above. More precisely, they prove that the sequence
generated by the filter algorithm has a stationary accumulation point. The same filter criterion was
applied to inexact restoration methods by Gonzaga, Karas and Vanti [9], who proved stationarity
of all qualified accumulation points.
In this paper we use a slightly different way of defining the domination rule, proposed initially by Chin [2]. A point $x$ is dominated by $y$ if, and only if,
$$ f_0(x) + \alpha h(x) \ge f_0(y) \quad\text{and}\quad h(x) \ge (1-\alpha) h(y). $$
A filter based on this domination rule is referred to as a slanting filter. Figure 1.1 illustrates the two domination rules in the $f_0 \times h$ plane, where we simplify the notation by using $y$ to represent the pair $(f_0(y), h(y))$. The rectangular region represents the set of points dominated by $y$ according to the first criterion. The region dominated by the slanting filter criterion includes the large triangular region but excludes the small triangle at the bottom.
Fig. 1.1. The domination rules.
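In computational terms, the two rules differ only in where the objective slack is charged. The following small sketch (names ours) tests whether a candidate pair is dominated under each rule:

```python
def dominated_standard(fx, hx, fy, hy, alpha):
    # Original rule of Fletcher and Leyffer: a rectangular dominated region.
    return fx >= fy - alpha * hy and hx >= (1 - alpha) * hy

def dominated_slanting(fx, hx, fy, hy, alpha):
    # Slanting rule of Chin: the objective margin grows with the candidate's
    # own infeasibility h(x), tilting the boundary of the forbidden region.
    return fx + alpha * hx >= fy and hx >= (1 - alpha) * hy
```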
Chin and Fletcher [3] and Fletcher, Leyffer and Toint [6] prove that the sequence (h(xk ))
generated by a slanting filter algorithm converges to zero, assuming that an infinite number of pairs
(f0 (xj ), h(xj )) are added to the filter. We improve this result by proving the same claim without
this assumption. This result does not depend on how the new iterate is computed.
The same works [3] and [6] present the proof that the sequence generated by the algorithm has a
stationary accumulation point using sequential linear programming (SLP) and sequential quadratic
programming (SQP), respectively, for computing the new iterates.
We propose in this paper an inexact restoration method in the sense of Martínez and Pilotta [10, 11, 12] for computing the step. Each iteration is decomposed into two phases. First, a feasibility phase aims to reduce the infeasibility measure $h$. Then an optimality phase computes a trial point reducing the objective function on a linearization of the feasible set. The proposed method is independent of the internal algorithms used in both phases; the only requirements are that the points generated be acceptable for the filter and that, near a feasible non-stationary point, the reduction of the objective function be large in the optimality step. This efficiency condition, stated ahead as Hypothesis H5 and introduced by Gonzaga, Karas and Vanti [9], is the main tool of the global convergence analysis. Under this hypothesis, we prove that all accumulation points
of the sequence generated by the algorithm are stationary. Furthermore, we show how to compute
the optimality step in order to fulfill this hypothesis.
The paper is organized as follows. In Section 2 we present a general slanting filter algorithm
and prove convergence to feasibility. An inexact restoration algorithm for computing the step and
global convergence of the general algorithm are described in Section 3. In Section 4 we present an
algorithm for computing the optimality step and prove that Hypothesis H5 is satisfied.
2. The general algorithm. In this section we present a general algorithm whose main feature is the construction of the filter, based on [2, 3, 6]. Independently of how the new point is obtained, we prove that any accumulation point of the sequence generated by the algorithm is feasible.
The algorithm constructs a sequence of filter sets $F_0 \subset F_1 \subset \cdots \subset F_k$, composed of pairs $(f_0^j, h^j) \in \mathbb{R}^2$. We also mention in the algorithm the forbidden sets $\mathcal{F}_k \subset \mathbb{R}^n$, which are formally defined at each step for the sake of clarity, but are never actually constructed.
Algorithm 2.1. General filter algorithm model
Given: $x^0 \in \mathbb{R}^n$, $F_0 = \emptyset$, $\mathcal{F}_0 = \emptyset$, $\alpha \in (0, 1)$.
$k = 0$
repeat
    Set $\bar{F}_k = F_k \cup \{(f_0(x^k), h(x^k))\}$ and define
    $\bar{\mathcal{F}}_k = \mathcal{F}_k \cup \{x \in \mathbb{R}^n \mid f_0(x) + \alpha h(x) \ge f_0(x^k) \text{ and } h(x) \ge (1 - \alpha) h(x^k)\}$.
    Step:
        if $x^k$ is stationary, stop with success;
        else, compute $x^{k+1} \notin \bar{\mathcal{F}}_k$.
    Filter update:
        if $f_0(x^{k+1}) < f_0(x^k)$, set $F_{k+1} = F_k$, $\mathcal{F}_{k+1} = \mathcal{F}_k$   ($f_0$-iteration: the new entry is discarded);
        else, set $F_{k+1} = \bar{F}_k$, $\mathcal{F}_{k+1} = \bar{\mathcal{F}}_k$   ($h$-iteration: the new entry becomes permanent).
    $k = k + 1$.
At the beginning of each iteration, the pair (f0 (xk ), h(xk )) is temporarily introduced in the
filter. After the iteration is completed, this entry will become permanent in the filter only if the
iteration does not produce a decrease in f0 .
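To fix ideas, a minimal Python driver for Algorithm 2.1 could look as follows. Here `compute_step` stands for any procedure returning a point outside the forbidden region (such as Algorithm 3.1), and `is_stationary` is a stationarity test; both are placeholders of ours.

```python
def slanting_filter_loop(x, f0, h, compute_step, is_stationary,
                         alpha=0.1, max_iter=1000):
    """General filter algorithm model (Algorithm 2.1), a sketch."""
    F = []  # permanent filter: pairs (f0(x^j), h(x^j))
    for _ in range(max_iter):
        if is_stationary(x):
            return x
        F_bar = F + [(f0(x), h(x))]  # current pair is temporarily forbidden

        def forbidden(y):
            # slanting domination test against every (temporary) filter entry
            return any(f0(y) + alpha * h(y) >= fj and h(y) >= (1 - alpha) * hj
                       for (fj, hj) in F_bar)

        x_new = compute_step(x, forbidden)  # must satisfy: not forbidden(x_new)
        if not (f0(x_new) < f0(x)):
            F = F_bar  # h-iteration: the new entry becomes permanent
        x = x_new      # f0-iteration: the temporary entry is discarded
    return x
```

Note that only the pairs are stored; the forbidden region is evaluated on demand, matching the remark above that the sets $\mathcal{F}_k$ are never actually constructed.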
We now prove that Algorithm 2.1 is well defined. Given the generality of the algorithm, it is enough to show that whenever the current point is not stationary, a new point outside the forbidden region can be chosen, unless the current point is a global solution of problem (P).
In the next two results we simplify the notation by using $(f_0^k, h^k)$ to represent $(f_0(x^k), h(x^k))$.
Lemma 2.2. Consider the setting of Algorithm 2.1. For all $k \in \mathbb{N}$ such that $x^k$ is neither stationary nor a global solution of (P), the following facts hold:
(i) $h^j > 0$ for all $j \in \mathbb{N}$ such that $(f_0^j, h^j) \in F_k$;
(ii) there exists $x^{k+1} \notin \bar{\mathcal{F}}_k$.
Proof. We prove this lemma by induction. For $k = 0$, $F_0 = \emptyset$ and $\bar{F}_0 = \{(f_0^0, h^0)\}$. Suppose that $h^0 = 0$. Since $x^0$ is not a minimizer of problem (P), there exists a feasible point $x^1$ such that $f_0^1 + \alpha h^1 = f_0^1 < f_0^0$. On the other hand, if $h^0 > 0$, we can take $x^1$ to be any feasible point. In both cases, $x^1 \notin \bar{\mathcal{F}}_0$.
Now, suppose that (i) and (ii) hold for k − 1. If the iteration k − 1 is an f0 -iteration, then Fk =
Fk−1 and consequently the statement (i) for k follows from the induction hypothesis. Otherwise,
$k - 1$ is an $h$-iteration and $F_k = \bar{F}_{k-1} = F_{k-1} \cup \{(f_0^{k-1}, h^{k-1})\}$. In this case, it is enough to prove that $h^{k-1} > 0$. By the induction hypothesis, there exists $x^k \notin \bar{\mathcal{F}}_{k-1}$. In particular, if $h^{k-1} = 0$, then $f_0^k \le f_0^k + \alpha h^k < f_0^{k-1}$, which means that iteration $k - 1$ is an $f_0$-iteration. Since this is not the case, $h^{k-1} > 0$. Thus, (i) holds for $k$.
Let us prove (ii). Suppose that $h^k = 0$. Since $x^k$ is not a minimizer of problem (P), there exists a feasible point $x^{k+1}$ such that $f_0^{k+1} + \alpha h^{k+1} = f_0^{k+1} < f_0^k$. On the other hand, if $h^k > 0$, we can take $x^{k+1}$ to be any feasible point. In both cases, using (i), we conclude that $x^{k+1} \notin \bar{\mathcal{F}}_k$. □
In the light of Lemma 2.2, we shall suppose that Algorithm 2.1 generates an infinite sequence
(xk ). Furthermore, we assume the following hypotheses.
H1. The sequence (xk ) remains in a convex compact domain X ⊂ IRn .
H2. All the functions fi (·), i = 0, 1, . . . , m, are uniformly Lipschitz continuously differentiable
in an open set containing X.
H3. Every feasible point $\bar{x}$ of our nonlinear programming problem satisfies the Mangasarian–Fromovitz constraint qualification (MFCQ), namely, the gradients $\nabla f_i(\bar{x})$ for $i \in E$ are linearly independent, and there exists a direction $d \in \mathbb{R}^n$ such that $A_E(\bar{x})d = 0$ and $A_{\bar{I}}(\bar{x})d < 0$, where $\bar{I} = \{i \in I \mid f_i(\bar{x}) = 0\}$.
Although H1 is an assumption on the sequence generated by the algorithm, it can be enforced
by including a bounded box into the problem constraints.
In the next theorem we show that any accumulation point of the generated sequence is feasible.
This result is proved by Chin and Fletcher [3] and also by Fletcher, Leyffer and Toint [6], assuming
that an infinite number of pairs (f0j , hj ) are added to the filter. We do not make this requirement
in our analysis.
Theorem 2.3. Consider the sequence (xk ) generated by Algorithm 2.1. Then h(xk ) → 0 and,
consequently, any accumulation point of the sequence (xk ) is feasible.
Proof. Consider the set
$$ K = \{ k \in \mathbb{N} \mid f_0^j > f_0^k, \ \forall j > k \}. $$
In particular, for all $k \in K$, $f_0^{k+1} > f_0^k$, so $k$ is an $h$-iteration. Hence, for all $j > k$ with $k \in K$,
$$ h^j \le (1-\alpha) h^k. \tag{2.1} $$
We will consider two cases.
First case: $K$ infinite. Let us denote $K = \{k_0, k_1, \ldots, k_i, \ldots\}$. Using (2.1),
$$ h^{k_i} \le (1-\alpha)^i h^{k_0}. $$
Then $h^k \xrightarrow{K} 0$. For $j \notin K$, $j > k_0$, there exists $i \in \mathbb{N}$ such that $k_i < j < k_{i+1}$. Using (2.1),
$$ h^j \le (1-\alpha)\, h^{k_i}, $$
and consequently $h^k \xrightarrow{\mathbb{N}} 0$.
Second case: $K$ finite. Consider $\bar{k} = \max\{k \in K\}$ and define the infinite set $S = \{s_0, s_1, \ldots, s_i, \ldots\}$ by $s_0 = \bar{k} + 1$ and, for $i > 0$,
$$ s_i = \min\{ j > s_{i-1} \mid f_0^j \le f_0^{s_{i-1}} \}. $$
Note that if $s_{i+1} = s_i + 1$, then by construction of Algorithm 2.1, $x^{s_{i+1}} \notin \bar{\mathcal{F}}_{s_i}$. On the other hand, if $s_{i+1} > s_i + 1$, then $f_0^{s_i+1} > f_0^{s_i}$ and $s_i$ is an $h$-iteration; this implies again $x^{s_{i+1}} \notin \bar{\mathcal{F}}_{s_i}$. By definition of the set $S$, the sequence $(f_0^{s_i})$ is decreasing. Furthermore, as $x^{s_{i+1}} \notin \bar{\mathcal{F}}_{s_i}$, for all $i \in \mathbb{N}$,
$$ h^{s_{i+1}} \le (1-\alpha)\, h^{s_i} \tag{2.2} $$
or
$$ f_0^{s_i} - f_0^{s_{i+1}} \ge \alpha h^{s_{i+1}}. $$
We will prove first that $h^k \xrightarrow{S} 0$. This is immediate if $h^{s_{i+1}} \le (1-\alpha)\, h^{s_i}$ for all $i$ sufficiently large. On the other hand, if there exists an infinite set $\mathbb{N}_0$ such that $f_0^{s_i} - f_0^{s_{i+1}} \ge \alpha h^{s_{i+1}}$ for $i \in \mathbb{N}_0$, then $h^{s_{i+1}} \xrightarrow{\mathbb{N}_0} 0$ because $(f_0^{s_i})$ is bounded and decreasing. Thus, given $\varepsilon > 0$, there exists $\ell \in \mathbb{N}_0$ such that for all $i \ge \ell$, $i \in \mathbb{N}_0$,
$$ h^{s_{i+1}} < \varepsilon. \tag{2.3} $$
We claim that $h^{s_{\ell+j}} < \varepsilon$ for all $j \in \mathbb{N}$. This is clear for $j = 1$. Suppose that the claim holds for some $j$. If $\ell + j \in \mathbb{N}_0$, then (2.3) implies $h^{s_{\ell+j+1}} < \varepsilon$. Else, $\ell + j \notin \mathbb{N}_0$ and, by (2.2), we have $h^{s_{\ell+j+1}} \le (1-\alpha)\, h^{s_{\ell+j}} < \varepsilon$. This means that $h^k \xrightarrow{S} 0$.
Now we prove that $h^k \xrightarrow{\mathbb{N}} 0$. For $j \notin S$, $j > \bar{k}$, there exists $i \in \mathbb{N}$ such that $s_i < j < s_{i+1}$. By definition of the set $S$, $s_i$ is an $h$-iteration and $f_0^j > f_0^{s_i}$. Thus $h^j \le (1-\alpha) h^{s_i}$ and consequently $h^k \xrightarrow{\mathbb{N}} 0$, completing the proof. □
3. Global Convergence. In this section we present a method for computing $x^{k+1} \notin \bar{\mathcal{F}}_k$ with no specification of the internal algorithms. Afterwards we state assumptions on the performance
of the step, and prove that any accumulation point of the sequence generated by Algorithm 2.1 is
stationary. The next section will show that quite usual methods for the internal steps fulfill these
assumptions.
The algorithm computes the step in two phases. First, a feasibility phase reduces a measure of
infeasibility. Then an optimality phase reduces the objective function in a tangential approximation
of the feasible set. These two phases are totally independent, and the only coupling between them
is provided by the filter.
Algorithm 3.1. Computation of $x^{k+1} \notin \bar{\mathcal{F}}_k$
Data: $x^k \in \mathbb{R}^n$, $\bar{\mathcal{F}}_k$
Feasibility phase:
    if $h(x^k) = 0$, set $z^k = x^k$;
    else, compute $z^k \notin \bar{\mathcal{F}}_k$ such that $h(z^k) < (1 - \alpha)\, h(x^k)$;
        if this is impossible, stop with failure.
Optimality phase:
    if $z^k$ is stationary, stop with success;
    else, compute $x^{k+1} \notin \bar{\mathcal{F}}_k$ such that $f_0(x^{k+1}) \le f_0(z^k)$ and
    $$ A_E(z^k)(x^{k+1} - z^k) = 0, \qquad f_I(z^k) + A_I(z^k)(x^{k+1} - z^k) \le f_I^+(z^k). $$
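Schematically, one step of Algorithm 3.1 can be organized as below. The routines `restore` and `tangential_step` stand for the (deliberately unspecified) internal feasibility and optimality solvers; the names and error handling are ours.

```python
def inexact_restoration_step(x, f0, h, forbidden, restore, tangential_step, alpha):
    """One step of Algorithm 3.1 (sketch)."""
    # Feasibility phase: reduce the infeasibility by a fixed fraction.
    if h(x) == 0:
        z = x
    else:
        z = restore(x)  # any iterative method for decreasing h
        if z is None or forbidden(z) or not (h(z) < (1 - alpha) * h(x)):
            raise RuntimeError("feasibility phase failed")
    # Optimality phase: decrease f0 over the linearization of the
    # constraints at z, staying acceptable to the filter.
    x_new = tangential_step(z, forbidden)  # must satisfy f0(x_new) <= f0(z)
    return x_new
```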
Now we state the assumptions on the performance of the step at each iteration.
Feasibility phase. The purpose of the feasibility phase is to find, from $x^k \in X$, a point $z^k \in X$ such that $h(z^k) < (1-\alpha) h(x^k)$ and $z^k \notin \bar{\mathcal{F}}_k$. The procedure used in this phase could in principle be any iterative algorithm for decreasing $h$, and finite termination is achievable because, as we have seen above, all filter entries $(f_0^j, h^j) \in F_k$ have $h^j > 0$. We shall assume the following condition on the performance of the feasibility step.
H4. At all iterations $k \in \mathbb{N}$ the feasibility step must satisfy
$$ h(x^k) - h(z^k) = \Omega(\|z^k - x^k\|). \tag{3.1} $$
This can also be stated as
$$ \|z^k - x^k\| = O(h(x^k)), \tag{3.2} $$
because $h(z^k) \ge 0$. Note that since $\nabla f_0(\cdot)$ is bounded in $X$, the mean-value theorem ensures that for all $k \in \mathbb{N}$,
$$ |f_0(z^k) - f_0(x^k)| = O(\|z^k - x^k\|). $$
Using this and (3.2), we have
$$ |f_0(z^k) - f_0(x^k)| = O(h(x^k)). \tag{3.3} $$
The feasibility step studied by Martínez [10] satisfies assumption H4 and applies directly to our case. Thus we shall not describe the feasibility procedure in detail in this paper.
Note that the feasibility algorithm may fail if $h(\cdot)$ has an infeasible stationary point. In this case, the method stops with failure.
Optimality phase. Given the point z k , the optimality phase computes a trial point reducing the
objective function in a linearization of the feasible set.
Now we state the main assumption on the performance of the optimality step at each iteration.
Given an iterate $x^k$, we start by defining the filter slack at $x^k$:
$$ H_k = \min\Big\{ 1,\ \min\{(1-\alpha)h^j \mid (f_0^j, h^j) \in F_k,\ f_0^j \le f_0(x^k)\} \Big\}, \tag{3.4} $$
with the convention that the inner minimum is $+\infty$ (and hence $H_k = 1$) when no filter entry qualifies. The slack is illustrated in Figure 3.1.
Fig. 3.1. The filter slack $H_k$.
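Computationally, the filter slack (3.4) is a single scan over the filter entries; a minimal sketch (names ours):

```python
def filter_slack(F, f0_xk, alpha):
    """H_k from (3.4): the smallest slanted margin (1 - alpha) * h^j among
    filter entries with f0^j <= f0(x^k), capped at 1 (and equal to 1 when
    no entry qualifies)."""
    margins = [(1 - alpha) * hj for (fj, hj) in F if fj <= f0_xk]
    return min([1.0] + margins)
```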
We assume that the optimality step is efficient, in the sense that near a feasible non-stationary point the reduction of the objective function at the optimality step is "large". Formally, we require the following:
H5. Given a feasible non-stationary point $\bar{x} \in X$, there exists a neighborhood $V$ of $\bar{x}$ such that for any iterate $x^k \in V$,
$$ f_0(z^k) - f_0(x^{k+1}) = \Omega(\sqrt{H_k}), \tag{3.5} $$
$$ f_0(z^k) - f_0(x^{k+1}) = \Omega(\|z^k - x^{k+1}\|). \tag{3.6} $$
The relation (3.5) means that there exists $M > 0$, dependent on $\bar{x}$, such that whenever $x^k$ is near $\bar{x}$, $f_0(z^k) - f_0(x^{k+1}) \ge M \sqrt{H_k}$. In Section 4 we shall present methods which satisfy this assumption.
Note that H5 is a local condition and is not completely independent of the feasibility phase, because it uses $H_k$, which is associated with $x^k$. Note also that the condition is stated for $x^k \in V$, and not $z^k \in V$, but this is not important because $\|x^k - z^k\| = O(h(x^k))$: if $x^k$ is near $\bar{x}$, then the same is true for $z^k$.
Throughout this section we will assume that Hypotheses H1-H5 hold.
Properties of the whole step. We extend to the whole step the properties of the optimality step near a feasible non-stationary point.
Lemma 3.2. Given a feasible non-stationary point $\bar{x} \in X$, there exists a neighborhood $V$ of $\bar{x}$ such that for any $x^k \in V$,
$$ f_0(x^k) - f_0(x^{k+1}) = \Omega(\sqrt{H_k}), \qquad f_0(x^k) - f_0(x^{k+1}) = \Omega(\|x^k - x^{k+1}\|). $$
Proof. Let $\bar{x}$ be a non-stationary feasible point, and let $V$ be the neighborhood given by H5. We have $|f_0(x^k) - f_0(z^k)| = O(h(x^k))$ by (3.3) and, for any $x^k \in V$,
$$ f_0(z^k) - f_0(x^{k+1}) = \Omega(\sqrt{H_k}) \tag{3.7} $$
by (3.5). We can restrict the neighborhood $V$, if necessary, such that for $x^k \in V$, $h(x^k) < 1$ and consequently $h(x^k) < H_k$ by definition of $H_k$. Restricting the neighborhood $V$ again, if necessary, we obtain
$$ |f_0(x^k) - f_0(z^k)| \le \frac{1}{2}\big( f_0(z^k) - f_0(x^{k+1}) \big). $$
It follows that
$$ f_0(x^k) - f_0(x^{k+1}) = f_0(x^k) - f_0(z^k) + f_0(z^k) - f_0(x^{k+1}) \ge \frac{1}{2}\big( f_0(z^k) - f_0(x^{k+1}) \big). \tag{3.8} $$
Thus, from (3.7),
$$ f_0(x^k) - f_0(x^{k+1}) = \Omega(\sqrt{H_k}), $$
which proves the first statement.
By the Lipschitz continuity of $f_0$ and (3.7), we have for any $x^k \in V$,
$$ \|z^k - x^{k+1}\| = \Omega\big( f_0(z^k) - f_0(x^{k+1}) \big) = \Omega(\sqrt{H_k}). $$
Furthermore, $\|x^k - z^k\| = O(h(x^k))$ by (3.2). As above, we can deduce that
$$ \|x^k - z^k\| \le \|z^k - x^{k+1}\|, $$
and consequently
$$ \|x^k - x^{k+1}\| \le \|x^k - z^k\| + \|z^k - x^{k+1}\| = O\big( \|z^k - x^{k+1}\| \big). $$
This together with (3.6) and (3.8) yields
$$ f_0(x^k) - f_0(x^{k+1}) = \Omega(\|x^k - x^{k+1}\|), $$
completing the proof. □
3.1. Convergence of the objective function values. In this section we prove the convergence of the sequence $(f_0(x^k))$ generated by Algorithm 2.1 with the step computed by Algorithm 3.1. The proofs of the results are similar to those presented in [9, Section 4.1]; the differences are related to the definition of the filter. We start by showing that $f_0$ cannot grow much in a single iteration.
Lemma 3.3. There exists a constant $M > 0$ such that in any iteration $k$,
$$ f_0(x^{k+1}) \le f_0(x^k) + M h(x^k). $$
Proof. Note that $f_0(\cdot)$ can only grow in an $h$-iteration. From (3.3), there exists a constant $M > 0$ such that in any iteration $k$, $f_0(z^k) \le f_0(x^k) + M h(x^k)$. By construction, $f_0(x^{k+1}) \le f_0(z^k)$, completing the proof. □
Now we show that f0 cannot grow much in a sequence of iterations.
Lemma 3.4. Consider a finite sequence of iterations $I = \{\bar{k}, \bar{k}+1, \ldots, K\}$ such that, for $k \in I$, $f^k \equiv f_0(x^k) \ge f_0(x^{\bar{k}})$, and let $M > 0$ be the constant given by Lemma 3.3. Then
$$ f^K \le f^{\bar{k}} + \frac{M}{\alpha}\, h(x^{\bar{k}}). $$
Proof. Let us denote $f^k = f_0(x^k)$ and $h^k = h(x^k)$ for $k \in I$, and $\bar{f} = f^{\bar{k}}$, $\bar{h} = h^{\bar{k}}$. Let us also define the following values:
$$ \phi_0 = f^{\bar{k}}, \qquad \phi_1 = \phi_0 + M\bar{h}, \qquad \phi_2 = \phi_1 + M(1-\alpha)\bar{h} = \phi_0 + [1 + (1-\alpha)]M\bar{h}, \qquad \ldots, $$
$$ \phi_j = \phi_0 + \Big( \sum_{i=0}^{j-1} (1-\alpha)^i \Big) M\bar{h} \le \phi_0 + \frac{M}{\alpha}\bar{h}. $$
We show the following: there exists an integer $J \le K - \bar{k}$ such that the sequence has at least one element in each interval $[\phi_j, \phi_{j+1}]$, $j = 0, 1, \ldots, J$, and $f^K \in [\phi_J, \phi_{J+1}]$. Consequently $f^K$ will be smaller than $\phi_0 + M\bar{h}/\alpha$.
– First interval: The iteration $\bar{k}$ is an $h$-iteration. The pair $(\bar{f}, \bar{h})$ enters the permanent filter, and hence $h^k \le (1-\alpha)\bar{h}$ for $k = \bar{k}+1, \ldots, K$, and
$$ f^{\bar{k}+1} \le \phi_0 + M\bar{h} = \phi_1 $$
by Lemma 3.3.
Let $k_0$ be the largest $k \in I$ such that $f^k \le \phi_1$ (several $h$-iterations and $f_0$-iterations may have occurred between $\bar{k}$ and $k_0$). If $k_0 = K$, then the proof is complete. Otherwise $f^{k_0+1} > \phi_1 \ge f^{k_0}$.
– Second interval: The iteration $k_0$ is an $h$-iteration and, as in the first interval, the pair $(f^{k_0}, h^{k_0})$ enters the filter. Hence $h^k \le (1-\alpha)^2 \bar{h}$ for $k = k_0+1, \ldots, K$, and
$$ \phi_1 \le f^{k_0+1} \le \phi_1 + M(1-\alpha)\bar{h} = \phi_2 $$
by Lemma 3.3.
Following the same process, we detect an $h$-iteration $k_1$, the last in the second interval. If $k_1 = K$, the proof is complete. Otherwise $f^{k_1+1}$ will be in the third interval, and so on, until $f^{k_J} = f^K$ is obtained. Then $f^K \le \phi_0 + M\bar{h}/\alpha$, completing the proof. □
We can now prove the main result in this analysis.
Theorem 3.5. The sequence (f0 (xk )) converges.
Proof. Let us denote $f^k \equiv f_0(x^k)$ for $k \in \mathbb{N}$. The sequence $(f^k)$ is bounded by hypothesis. We shall use the following fact, which is a simple exercise in sequences:
Given a sequence $(f^k)$ such that $\limsup (f^k) > \liminf (f^k) + \delta$, $\delta > 0$, it is possible to extract two subsequences $(f^k)_{k \in K}$ and $(f^{k+j_k})_{k \in K}$, $K \subset \mathbb{N}$, such that for any $k \in K$,
$$ f^{k+j_k} \ge f^k + \delta \quad\text{and}\quad f^{k+r} \ge f^k \ \text{ for } r = 1, \ldots, j_k. $$
In fact, to prove this it is enough to take a subsequence convergent to $\limsup (f^k)$ and associate with each index (say, $l$) the last index $l - j_l$ such that $f^{l-j_l} \le f^l - \delta$, if it exists. For large $l$, the construction will always be well defined.
Assume by contradiction that $\limsup (f^k) > \liminf (f^k) + \delta$ for some $\delta > 0$, and let the subsequence $(f^k)_{k \in K}$ be given by the construction above. Then we conclude that, for all $k \in K$, the iteration $k$ is an $h$-iteration and, from Lemma 3.4,
$$ f^k + \delta \le f^{k+j_k} \le f^k + \frac{M}{\alpha}\, h(x^k). \tag{3.9} $$
Taking subsequences if necessary, assume that $(x^k)_{k \in K}$ converges to a point $\bar{x}$. Then $\bar{x}$ must be feasible by Theorem 2.3, so $h(x^k) \xrightarrow{K} 0$ and the right-hand side of (3.9) eventually falls below $f^k + \delta$. This contradicts (3.9), completing the proof. □
3.2. Global convergence proof. In this section we show that any accumulation point of the
sequence generated by Algorithm 2.1, with the new point computed by Algorithm 3.1, is stationary.
Initially we show that near a feasible non-stationary point the objective function always changes
by a large amount, precluding the possibility of feasible non-stationary accumulation points.
Lemma 3.6. Let $\bar{x} \in X$ be a feasible non-stationary point. Then there exist a neighborhood $V$ of $\bar{x}$ and $\delta > 0$ such that whenever $x^k \in V$, there exists $l_k \in \mathbb{N}$ such that
$$ f_0(x^k) - f_0(x^{k+l_k}) \ge \delta. \tag{3.10} $$
Proof. From Lemma 3.2, there exist a neighborhood $V_1$ of $\bar{x}$ and constants $\beta_1, \beta_2 > 0$ such that for all $x^k \in V_1$,
$$ f_0(x^k) - f_0(x^{k+1}) \ge \beta_1 \sqrt{H_k}, \tag{3.11} $$
$$ f_0(x^k) - f_0(x^{k+1}) \ge \beta_2 \|x^k - x^{k+1}\|, \tag{3.12} $$
and the iteration $k$ is an $f_0$-iteration.
Consider $\varepsilon > 0$ such that $B_\varepsilon(\bar{x}) = \{x \in \mathbb{R}^n \mid \|x - \bar{x}\| < \varepsilon\} \subset V_1$, and define $V = B_{\varepsilon/2}(\bar{x})$. Let $k \in \mathbb{N}$ be such that $x^k \in V$. While the iterates $x^{k+i}$, $i = 1, 2, \ldots$, remain in $B_\varepsilon(\bar{x})$, the iterations $k + i$ are $f_0$-iterations and the filter does not change, i.e.,
$$ F_{k+i} = F_k \quad\text{for } i = 1, 2, \ldots. $$
Consequently, from (3.11), $f_0$ decreases by at least the constant amount $\beta_1 \sqrt{H_k}$. Hence, there exists a finite $l_k \in \mathbb{N}$ such that $x^{k+l_k} \notin B_\varepsilon(\bar{x})$ and $x^{k+i} \in B_\varepsilon(\bar{x})$ for $i = 0, 1, \ldots, l_k - 1$. We have
$$ F_{k+i} = F_k \quad\text{and}\quad \|x^k - x^{k+l_k}\| \ge \frac{\varepsilon}{2} \tag{3.13} $$
because $x^k \in B_{\varepsilon/2}(\bar{x})$. Using (3.12), (3.13) and the triangle inequality,
$$ f_0(x^k) - f_0(x^{k+l_k}) = \sum_{i=0}^{l_k - 1} \big( f_0(x^{k+i}) - f_0(x^{k+i+1}) \big) \ge \beta_2 \sum_{i=0}^{l_k - 1} \|x^{k+i} - x^{k+i+1}\| \ge \beta_2 \|x^k - x^{k+l_k}\| \ge \beta_2\, \varepsilon/2, $$
completing the proof. □
Now we show the main result of this section.
Theorem 3.7. Any accumulation point of the sequence $(x^k)$ is stationary.
Proof. By contradiction, assume that there exist a non-stationary point $\bar{x} \in X$ and an infinite set $K \subset \mathbb{N}$ such that $x^k \xrightarrow{K} \bar{x}$. By Theorem 2.3, $\bar{x}$ is feasible. From Lemma 3.6, there exists $\delta > 0$ such that, for large $k \in K$, there exists $l_k \in \mathbb{N}$ such that
$$ f_0(x^k) - f_0(x^{k+l_k}) \ge \delta. $$
This means that the sequence $(f_0(x^k))$ is not a Cauchy sequence, contradicting Theorem 3.5 and completing the proof. □
4. Optimality phase algorithm. As we have seen above, Hypothesis H5 is crucial for the convergence analysis. It is a very strong assumption, and we must show that there exist methods satisfying this condition. We shall present a general trust region method for computing the optimality step. Given $z^k$ obtained at the feasibility phase of Algorithm 3.1, the optimality phase must find $x^{k+1}$ in a linearized set such that $f_0(x^{k+1}) \le f_0(z^k)$ and $x^{k+1} \notin \bar{\mathcal{F}}_k$. We show that the resulting step satisfies H5. The main tool for the analysis (not necessarily for the construction) of such algorithms is the projected Cauchy direction.
The quadratic model. Given $z^k \in X$ generated by Algorithm 3.1 in the feasibility phase, the trust region algorithm associates to $z^k$ a quadratic model of $f_0$,
$$ x \in \mathbb{R}^n \mapsto m_k(x) = f_0(z^k) + \nabla f_0(z^k)^T (x - z^k) + \frac{1}{2} (x - z^k)^T B_k (x - z^k), \tag{4.1} $$
where $B_k$ is an $n \times n$ symmetric matrix. This matrix may be an approximation of $\nabla^2 f_0(z^k)$, or any other matrix, provided that Hypothesis H6 below is satisfied. Usually, $B_k$ will be an approximation of the Hessian of some Lagrangian function, and then $m_k$ deviates from a straightforward model of $f_0$ by incorporating the curvature along the manifold of the constraints. Although this may be essential in the design of efficient algorithms, this discussion is beyond the scope of this paper.
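For concreteness, the model (4.1) can be packaged as a closure; a minimal sketch in Python (our naming, with NumPy arrays):

```python
import numpy as np

def quadratic_model(f0_z, grad_z, B, z):
    """Return m_k from (4.1): the quadratic model of f_0 around z^k.

    f0_z: f_0(z^k); grad_z: gradient of f_0 at z^k; B: symmetric n x n
    matrix (e.g., a Hessian approximation satisfying ||B|| <= beta).
    """
    def m(x):
        d = x - z
        return f0_z + grad_z @ d + 0.5 * d @ (B @ d)
    return m
```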
In this section we assume that Hypotheses H1–H4 are satisfied, as well as the following condition.
H6. There exists $\beta > 0$ such that the quadratic model (4.1) satisfies $\|B_k\| \le \beta$ for all $k \in \mathbb{N}$.
The trust region step uses a radius $\Delta > 0$ and computes a step $d(z^k, \Delta) \in \mathbb{R}^n$ such that $\|d(z^k, \Delta)\| \le \Delta$. We define the predicted reduction produced by the step $d(z^k, \Delta)$ as
$$ \operatorname{pred}(z^k, \Delta) = m_k(z^k) - m_k(z^k + d(z^k, \Delta)), \tag{4.2} $$
and the actual reduction as
$$ \operatorname{ared}(z^k, \Delta) = f_0(z^k) - f_0(z^k + d(z^k, \Delta)). \tag{4.3} $$
Lemma 4.1. Consider $z^k \in X$ and $d(z^k, \Delta) \in \mathbb{R}^n$ generated by the trust region algorithm. Then
$$ \operatorname{ared}(z^k, \Delta) = \operatorname{pred}(z^k, \Delta) + o(z^k, \Delta), \tag{4.4} $$
where
$$ \lim_{\Delta \to 0^+} \frac{o(z^k, \Delta)}{\Delta} = 0 $$
uniformly in $z^k \in X$.
Proof. See [9, Lemma 3.1]. □
In the optimality step algorithm discussed below we make the following choices, which simplify the treatment:
(1) Each trust region computation starts with a radius $\Delta \ge \Delta_{\min}$, where $\Delta_{\min} > 0$ is fixed. The choice of $\Delta$ is irrelevant for the theory, and it usually comes from the former iteration. The use of this minimum radius $\Delta_{\min}$ simplifies the convergence proofs and enhances the chances of taking a pure Newton step.
(2) A step $d(z^k, \Delta)$ is accepted only if the sufficient decrease condition
$$ \operatorname{ared}(z^k, \Delta) > \eta\, \operatorname{pred}(z^k, \Delta) \tag{4.5} $$
is satisfied, for a given $\eta \in (0, 1)$.
(3) Given $z \in \mathbb{R}^n$, we define the linearization of the set $\{x \in \mathbb{R}^n \mid f_E(x) = f_E(z),\ f_I(x) \le f_I^+(z)\}$ by
$$ L(z) = \{x \in \mathbb{R}^n \mid A_E(z)(x - z) = 0,\ f_I(z) + A_I(z)(x - z) \le f_I^+(z)\}. \tag{4.6} $$
(4) The trust region computation solves approximately the problem
$$ \begin{array}{ll} \text{minimize} & m_k(x) \\ \text{subject to} & x \in L(z^k) \\ & \|x - z^k\| \le \Delta, \end{array} \tag{4.7} $$
where $\|\cdot\|$ is any norm in $\mathbb{R}^n$.
Now we explain what we mean by "solving approximately". Given $z \in X$ and the set $L(z)$, the projected gradient direction is defined by
$$ d_c(z) = P_{L(z)}(z - \nabla f_0(z)) - z. \tag{4.8} $$
Define
$$ \varphi(z) = -\nabla f_0(z)^T \frac{d_c(z)}{\|d_c(z)\|}. $$
Then $\varphi$ is the descent rate of $f_0$ along $d_c$. As usual, we denote $d_c^k = d_c(z^k)$ and $\varphi^k = \varphi(z^k)$. According to [9], at a feasible point $z$ the KKT conditions are equivalent to $d_c(z) = 0$. Furthermore, if $z$ is non-stationary, then $\varphi(z) > 0$.
Now we use known results about the minimization of $m_k(\cdot)$ along a direction; see the discussion on the Cauchy point in [13]. Defining the generalized Cauchy point as the minimizer of $m_k(\cdot)$ along $d_c$ in the trust region $\{x \in \mathbb{R}^n \mid \|x - z^k\| \le \Delta\}$,
$$ x_c = \operatorname{argmin}\{ m_k(x) \mid \|x - z^k\| \le \Delta,\ x = z^k + \lambda d_c^k,\ \lambda \ge 0 \}, $$
we know that
$$ m_k(z^k) - m_k(x_c) \ge \frac{\xi \varphi^k}{2} \min\left\{ \frac{\varphi^k}{\|B_k\|},\ \|d_c^k\|,\ \Delta \right\}, $$
where $\xi$ depends on the norms used. Using Hypothesis H6, this can be rewritten as
$$ m_k(z^k) - m_k(x_c) \ge \frac{\xi \varphi^k}{2} \min\left\{ \frac{\varphi^k}{\beta},\ \|d_c^k\|,\ \Delta \right\}. \tag{4.9} $$
We accept as an approximate solution of (4.7) any feasible point of this problem satisfying (4.9).
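In a prototype, the projection in (4.8) can be delegated to a generic constrained least-squares solver over the polyhedron (4.6). The sketch below uses SciPy's SLSQP purely as a stand-in for a proper QP code; the function and argument names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def projected_cauchy_direction(z, grad, A_E, A_I, fI, fI_plus):
    """d_c(z) = P_{L(z)}(z - grad f0(z)) - z, with L(z) as in (4.6)."""
    target = z - grad                        # point to be projected onto L(z)
    cons = []
    if A_E is not None and A_E.shape[0] > 0:
        cons.append({'type': 'eq',           # A_E(z) (x - z) = 0
                     'fun': lambda x: A_E @ (x - z)})
    if A_I is not None and A_I.shape[0] > 0:
        cons.append({'type': 'ineq',         # f_I(z) + A_I(z)(x - z) <= f_I^+(z)
                     'fun': lambda x: (fI_plus - fI) - A_I @ (x - z)})
    res = minimize(lambda x: 0.5 * np.sum((x - target) ** 2), x0=z.copy(),
                   jac=lambda x: x - target, constraints=cons, method='SLSQP')
    return res.x - z                         # the projected gradient direction
```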
Algorithm 4.2. Optimality phase
Data: $z^k \notin \bar{\mathcal{F}}_k$, $\Delta_{\min} > 0$, $\Delta \ge \Delta_{\min}$, $\eta \in (0, 1)$
repeat
    Compute $d = d(z^k, \Delta)$ such that $\|d\| \le \Delta$, $z^k + d \in L(z^k)$ and
    $$ \operatorname{pred}(z^k, \Delta) \ge \frac{\xi \varphi^k}{2} \min\left\{ \frac{\varphi^k}{\|B_k\|},\ \|d_c^k\|,\ \Delta \right\}. $$
    Set $\operatorname{ared}(z^k, \Delta) = f_0(z^k) - f_0(z^k + d)$.
    if $z^k + d \notin \bar{\mathcal{F}}_k$ and $\operatorname{ared}(z^k, \Delta) \ge \eta\, \operatorname{pred}(z^k, \Delta)$,
        set $x^{k+1} = z^k + d$, $\Delta_k = \Delta$, and exit with success;
    else set $\Delta = \Delta/2$.
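The radius backtracking in Algorithm 4.2 is a standard trust region loop. The sketch below assumes a routine `solve_tr_subproblem` returning a step that satisfies the feasibility and Cauchy decrease requirements of the algorithm; all names are ours.

```python
def optimality_phase(z, f0, m, forbidden, solve_tr_subproblem,
                     Delta, Delta_min, eta):
    """Algorithm 4.2 (sketch): halve the radius until the trial point
    is acceptable to the filter and achieves sufficient decrease."""
    Delta = max(Delta, Delta_min)          # choice (1): start with Delta >= Delta_min
    while True:
        d = solve_tr_subproblem(z, Delta)  # ||d|| <= Delta, z + d in L(z), (4.9)
        pred = m(z) - m(z + d)             # predicted reduction (4.2)
        ared = f0(z) - f0(z + d)           # actual reduction (4.3)
        if not forbidden(z + d) and ared >= eta * pred:
            return z + d, Delta            # accept: x^{k+1} = z^k + d, Delta_k = Delta
        Delta /= 2.0                       # reject: shrink the trust region
```

Near a feasible non-stationary point, Lemma 4.5 below guarantees that this loop terminates after finitely many halvings.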
Note that the algorithms proposed in this paper differ from those proposed in [9] only in the definition of the filter. The next two lemmas are independent of the filter, and hence remain true.
Lemma 4.3. For any $z \in X$ and $d \in \mathbb{R}^n$ such that $(z + d) \in L(z)$,
$$ |h(z + d) - h(z)| = O(\|d\|^2). $$
Proof. See [9, Lemma 3.3]. □
The next lemma says that if we ignore the filter, then the trust region step is large near $\bar{x}$.
Lemma 4.4. Let $\bar{x} \in X$ be a feasible non-stationary point. Then there exist a neighborhood $\tilde{V}$ of $\bar{x}$, a constant $\tilde{\Delta} \in (0, \Delta_{\min})$ and a constant $\tilde{c} > 0$ such that for any $z^k \in \tilde{V}$,
(i) for any $\Delta > 0$, $\operatorname{pred}(z^k, \Delta) \ge \tilde{c} \min\{\Delta, \tilde{\Delta}\}$;
(ii) for any $\Delta \in (0, \tilde{\Delta})$, $\operatorname{ared}(z^k, \Delta) \ge \eta\, \operatorname{pred}(z^k, \Delta) \ge \eta\tilde{c}\,\Delta$.
Proof. See [9, Lemma 3.4]. □
The next lemma shows that, near a feasible non-stationary point, the refusal of an optimality step is due to a large increase of the infeasibility.
Lemma 4.5. Let $\bar{x} \in X$ be a feasible non-stationary point and consider the constant $\tilde{\Delta}$ and the neighborhood $\tilde{V}$ given by Lemma 4.4. Then there exist a neighborhood $\bar{V} \subset \tilde{V}$ of $\bar{x}$ and a constant $\bar{\Delta} \in (0, \tilde{\Delta})$ such that for any $x^k, z^k \in \bar{V}$ and $\Delta \in \big( \frac{\bar{\Delta}}{2}, \bar{\Delta} \big)$,
$$ f_0(x^k) - f_0(z^k + d(z^k, \Delta)) \ge \alpha h(z^k + d(z^k, \Delta)). \tag{4.10} $$
Furthermore, if $z^k + d(z^k, \Delta)$ was refused by Algorithm 4.2, then
$$ h(z^k + d(z^k, \Delta)) \ge H_k. \tag{4.11} $$
Proof. Let $\bar{x}$ be a feasible non-stationary point. Consider the constant $\alpha$ given in Algorithm 2.1, the constant $\eta$ given in Algorithm 4.2, and the neighborhood $\tilde{V}$ and constants $\tilde{\Delta}$, $\tilde{c}$ given by Lemma 4.4. By Lemma 4.3 there exists a constant $c > 0$ such that
$$ |h(z^k + d(z^k, \Delta)) - h(z^k)| \le c\, \|d(z^k, \Delta)\|^2. \tag{4.12} $$
Consider
$$ \bar{\Delta} = \min\left\{ \tilde{\Delta},\ \frac{\eta\tilde{c}}{8\alpha c} \right\}. $$
From (3.3) we conclude that there exists a neighborhood $\bar{V} \subset \tilde{V}$ such that for all $x^k \in \bar{V}$,
$$ |f_0(x^k) - f_0(z^k)| \le \frac{1}{4}\eta\tilde{c}\,\bar{\Delta}. $$
By Lemma 4.4, for $z^k \in \bar{V}$ and $\Delta \in \big( \frac{\bar{\Delta}}{2}, \bar{\Delta} \big)$,
$$ \operatorname{ared}(z^k, \Delta) = f_0(z^k) - f_0(z^k + d(z^k, \Delta)) \ge \eta\, \operatorname{pred}(z^k, \Delta) \ge \eta\tilde{c} \min\{\Delta, \tilde{\Delta}\} \ge \frac{1}{2}\eta\tilde{c}\,\bar{\Delta}. $$
Hence, for $x^k, z^k \in \bar{V}$ and $\Delta \in \big( \frac{\bar{\Delta}}{2}, \bar{\Delta} \big)$, we have
$$ f_0(x^k) - f_0(z^k + d(z^k, \Delta)) = f_0(x^k) - f_0(z^k) + f_0(z^k) - f_0(z^k + d(z^k, \Delta)) \ge \frac{1}{4}\eta\tilde{c}\,\bar{\Delta}. \tag{4.13} $$
On the other hand, by (4.12) and the facts that $\|d(z^k, \Delta)\| < \bar{\Delta}$ and $h(z^k) < (1-\alpha)h(x^k)$, we have
$$ h(z^k + d(z^k, \Delta)) \le h(z^k) + c\bar{\Delta}^2 \le (1-\alpha)h(x^k) + c\bar{\Delta}^2. $$
We can restrict the neighborhood $\bar{V}$, if necessary, such that for $x^k \in \bar{V}$, $h(x^k) < \dfrac{\eta\tilde{c}\,\bar{\Delta}}{8\alpha(1-\alpha)}$, and consequently
$$ \alpha h(z^k + d(z^k, \Delta)) \le \alpha(1-\alpha)h(x^k) + \alpha c\bar{\Delta}^2 \le \frac{1}{8}\eta\tilde{c}\,\bar{\Delta} + \frac{1}{8}\eta\tilde{c}\,\bar{\Delta} = \frac{1}{4}\eta\tilde{c}\,\bar{\Delta}. $$
From this and (4.13), we have
$$ f_0(x^k) - f_0(z^k + d(z^k, \Delta)) \ge \frac{1}{4}\eta\tilde{c}\,\bar{\Delta} \ge \alpha h(z^k + d(z^k, \Delta)), \tag{4.14} $$
proving the first statement. Furthermore, from Lemma 4.4, the trial point $z^k + d(z^k, \Delta)$ is accepted by the trust region criterion. Therefore, if the trial point was refused by Algorithm 4.2, then $z^k + d(z^k, \Delta) \in \bar{\mathcal{F}}_k$. We thus conclude from the definition of the filter and (4.14) that
$$ h(z^k + d(z^k, \Delta)) \ge H_k, $$
completing the proof. □
We now prove the main result of this section: Hypothesis H5 is satisfied when the optimality step is computed by Algorithm 4.2.
Theorem 4.6. Given a feasible non-stationary point $\bar{x} \in X$, there exists a neighborhood $V$ of $\bar{x}$ such that for any iterate $x^k \in V$,
$$ f_0(z^k) - f_0(x^{k+1}) = \Omega(\sqrt{H_k}), \qquad f_0(z^k) - f_0(x^{k+1}) = \Omega(\|z^k - x^{k+1}\|), $$
where $x^{k+1} = z^k + d(z^k, \Delta)$ is computed by Algorithm 4.2.
Proof. Let $\bar{x}$ be a feasible non-stationary point. Algorithm 4.2 starts with the radius $\Delta \ge \Delta_{\min}$ and ends with $\Delta_k \le \Delta$. Thus $x^{k+1} = z^k + d(z^k, \Delta_k)$. By Algorithm 4.2,
$$ f_0(z^k) - f_0(x^{k+1}) = \operatorname{ared}(z^k, \Delta_k) \ge \eta\, \operatorname{pred}(z^k, \Delta_k). \tag{4.15} $$
From Lemma 4.3 and (3.3), there exist constants $c_1 > 0$ and $c_2 > 0$ such that for all $\Delta > 0$,
$$ h(z^k + d(z^k, \Delta)) - h(z^k) \le c_1 \|d(z^k, \Delta)\|^2 \le c_1 \Delta^2, \tag{4.16} $$
$$ |f_0(z^k) - f_0(x^k)| \le c_2 h(x^k). \tag{4.17} $$
Consider the neighborhood $\bar{V}$ given by Lemma 4.5 and $V \subset \bar{V}$ such that $h(x) < 1$ for all $x \in V$. By construction of Algorithm 3.1 and the definition of $H_k$, we have for $x^k \in V$,
$$ h(z^k) < (1-\alpha)h(x^k) < (1-\alpha)H_k. \tag{4.18} $$
Consider the constant $\bar{\Delta} > 0$ given by Lemma 4.5. We shall consider two cases.
First case: $\Delta_k \ge \dfrac{\bar{\Delta}}{2}$. In this case, from Lemma 4.4,
$$ \operatorname{pred}(z^k, \Delta_k) \ge \tilde{c} \min\{\Delta_k, \tilde{\Delta}\} \ge \tilde{c}\, \frac{\bar{\Delta}}{2}. $$
Using (4.15) we conclude that
$$ f_0(z^k) - f_0(x^{k+1}) \ge \eta\tilde{c}\, \frac{\bar{\Delta}}{2} = \Omega(1). $$
It follows trivially that $f_0(z^k) - f_0(x^{k+1}) = \Omega(\sqrt{H_k})$ and $f_0(z^k) - f_0(x^{k+1}) = \Omega(\|z^k - x^{k+1}\|)$, because in both cases the right-hand side is bounded in $X$.
Second case: assume now that $\Delta_k < \dfrac{\bar{\Delta}}{2}$. Algorithm 4.2 starts with a radius $\Delta > 0$, computes the trial optimality step $d(z^k, \Delta)$ and reduces the radius until the trial step is accepted. By Lemma 4.5, for all $\Delta \in \big( \frac{\bar{\Delta}}{2}, \bar{\Delta} \big)$,
$$ h(z^k + d(z^k, \Delta)) \ge H_k. \tag{4.19} $$
We shall analyze two situations. In the first one, we suppose that (4.19) holds for all $\Delta \le \dfrac{\bar{\Delta}}{2}$ as well. Using (4.18), we have
$$ h(z^k + d(z^k, \Delta_k)) - h(z^k) \ge H_k - (1-\alpha)H_k = \alpha H_k. \tag{4.20} $$
On the other hand, by (4.16),
$$ h(z^k + d(z^k, \Delta_k)) - h(z^k) \le c_1 \Delta_k^2; $$
consequently, using (4.20),
$$ \Delta_k = \Omega(\sqrt{H_k}). \tag{4.21} $$
From (4.15) and Lemma 4.4,
$$ f_0(z^k) - f_0(x^{k+1}) \ge \eta\tilde{c} \min\{\Delta_k, \tilde{\Delta}\} = \eta\tilde{c}\,\Delta_k. $$
Using (4.21) and the fact that $\Delta_k \ge \|x^{k+1} - z^k\|$, we obtain the two conditions of the theorem.
Let us now examine the second possibility, that is, there exists $\Delta \le \dfrac{\bar{\Delta}}{2}$ such that
$$ h(z^k + d(z^k, \Delta)) < H_k. \tag{4.22} $$
Let $\hat{\Delta}$ be the first $\Delta \le \dfrac{\bar{\Delta}}{2}$ satisfying such a condition. Then the radius $2\hat{\Delta}$ has been rejected by Algorithm 4.2 and does not satisfy (4.22). Consequently, by (4.16) and (4.18) we have
$$ \alpha H_k \le h(z^k + d(z^k, 2\hat{\Delta})) - h(z^k) \le 4 c_1 \hat{\Delta}^2. $$
Thus
$$ \hat{\Delta} = \Omega(\sqrt{H_k}). \tag{4.23} $$
From Lemma 4.4,
$$ f_0(z^k) - f_0(z^k + d(z^k, \hat{\Delta})) > \eta\, \operatorname{pred}(z^k, \hat{\Delta}) \ge \eta\tilde{c}\,\hat{\Delta}. \tag{4.24} $$
Using (4.23) and the fact that $\hat{\Delta} \ge \Delta_k \ge \|x^{k+1} - z^k\|$, we conclude that the point $\hat{x} = z^k + d(z^k, \hat{\Delta})$ satisfies the two conditions of the theorem.
To finish the proof, we must show that $x^{k+1} = \hat{x}$. Since $\operatorname{ared}(z^k, \hat{\Delta}) > \eta\, \operatorname{pred}(z^k, \hat{\Delta})$ and $\hat{x}$ satisfies (4.22), it is enough to prove that
$$ f_0(\hat{x}) + \alpha h(\hat{x}) < f_0(x^k). \tag{4.25} $$
By (4.16) and (4.18),
$$ h(\hat{x}) \le (1-\alpha)H_k + c_1 \hat{\Delta}^2. $$
Using (4.23) we have
$$ h(\hat{x}) = O(\hat{\Delta}^2). \tag{4.26} $$
On the other hand, by (4.17) and (4.23) we have
$$ |f_0(z^k) - f_0(x^k)| \le c_2 h(x^k) \le c_2 H_k = O(\hat{\Delta}^2). $$
Using this and (4.24) we conclude that
$$ f_0(x^k) - f_0(\hat{x}) = f_0(x^k) - f_0(z^k) + f_0(z^k) - f_0(\hat{x}) = \Omega(\hat{\Delta}). $$
Comparing this with (4.26), we obtain (4.25), completing the proof. □
5. Conclusions. In this work we have presented a general algorithm which uses a slanting filter criterion for accepting the new iterates. This criterion was proposed initially by Chin [2] and was used by Chin and Fletcher [3] and by Fletcher, Leyffer and Toint [6]. In these works it is proved that all accumulation points of the sequence generated by the algorithm are feasible, assuming that an infinite number of pairs are added to the filter. We improve this result by proving the same claim without such an assumption. This result does not depend on how the new iterate is computed.
By computing the new iterates by sequential linear programming (SLP), the authors in [3] have proved that the sequence generated by the algorithm has a stationary accumulation point. In the context of sequential quadratic programming (SQP) the same result is proved in [6]. In this paper we have proved stationarity of all accumulation points of the sequences generated by algorithms which compute the new iterates by the inexact restoration method, in the sense of Martínez and Pilotta [11]. This result is independent of the internal algorithms used in the feasibility and optimality phases of the inexact restoration method, provided that the points generated are acceptable to the filter and that, near a feasible non-stationary point, the reduction of the objective function is large in the optimality step. We have shown how to compute the optimality step in order to fulfill this hypothesis.
Acknowledgements. We thank Clóvis C. Gonzaga and the referee for their valuable comments and suggestions, which very much improved this paper.
REFERENCES
[1] R. H. Byrd, J. C. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming, 89(1):149–185, 2000.
[2] C. M. Chin. A new trust region based SLP-filter algorithm which uses EQP active set strategy. PhD thesis, Department of Mathematics, University of Dundee, Scotland, 2001.
[3] C. M. Chin and R. Fletcher. On the global convergence of an SLP-filter algorithm that takes EQP steps. Mathematical Programming, 96(1):161–177, 2003.
[4] R. Fletcher, N. Gould, S. Leyffer, P. Toint, and A. Wächter. Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming. SIAM J. Optimization, 13(3):635–659, 2002.
[5] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming Ser. A, 91(2):239–269, 2002.
[6] R. Fletcher, S. Leyffer, and P. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM J. Optimization, 13(1):44–59, 2002.
[7] D. M. Gay, M. L. Overton, and M. H. Wright. A primal-dual interior method for nonconvex nonlinear programming. In Y. Yuan, editor, Advances in Nonlinear Programming, pages 31–56. Kluwer Academic Publishers, Dordrecht, 1998.
[8] F. A. M. Gomes, M. C. Maciel, and J. M. Martínez. Nonlinear programming algorithms using trust regions and augmented Lagrangians with nonmonotone penalty parameters. Mathematical Programming, 84(1):161–200, 1999.
[9] C. C. Gonzaga, E. W. Karas, and M. Vanti. A globally convergent filter method for nonlinear programming. SIAM J. Optimization, 14(3):646–669, 2003.
[10] J. M. Martínez. Inexact-restoration method with Lagrangian tangent decrease and a new merit function for nonlinear programming. Journal of Optimization Theory and Applications, 111:39–58, 2001.
[11] J. M. Martínez and E. A. Pilotta. Inexact restoration algorithm for constrained optimization. Journal of Optimization Theory and Applications, 104:135–163, 2000.
[12] J. M. Martínez and E. A. Pilotta. Inexact restoration methods for nonlinear programming: Advances and perspectives. In Qi, Teo, and Yang, editors, Optimization and Control with Applications, pages 271–292. Springer, 2005.
[13] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, 1999.
[14] R. J. Vanderbei and D. F. Shanno. An interior-point algorithm for nonconvex nonlinear programming. Computational Optimization and Applications, 13:231–252, 1999.