GLOBAL CONVERGENCE OF SLANTING FILTER METHODS FOR NONLINEAR PROGRAMMING

ELIZABETH W. KARAS¶, ANA P. OENING§, AND ADEMIR A. RIBEIRO¶

Abstract. In this paper we present a general algorithm for nonlinear programming which uses a slanting filter criterion for accepting the new iterates. Independently of how these iterates are computed, we prove that all accumulation points of the sequence generated by the algorithm are feasible. Computing the new iterates by the inexact restoration method, we prove stationarity of all accumulation points of the sequence.

Key words. Filter methods, nonlinear programming, global convergence.

¶ Department of Mathematics, Federal University of Paraná, Cx. Postal 19081, 81531-980, Curitiba, PR, Brazil; e-mail: [email protected], [email protected]. Supported by PRONEX - Optimization.
§ Master Program in Numerical Methods in Engineering, Federal University of Paraná, Cx. Postal 19081, 81531-980, Curitiba, PR, Brazil; e-mail: [email protected]. Supported by CAPES, Brazil.

1. Introduction. We shall study the nonlinear programming problem

(P)    minimize  f0(x)
       subject to  fE(x) = 0,
                   fI(x) ≤ 0,

where the index sets E and I refer to the equality and inequality constraints, respectively. Let the cardinality of E ∪ I be m, and assume that the functions fi : IR^n → IR, i = 0, 1, ..., m, are continuously differentiable. The Jacobian matrices of fE and fI are denoted, respectively, AE(·) and AI(·).

A nonlinear programming algorithm must deal with two different (and possibly conflicting) criteria, related respectively to optimality and to feasibility. Optimality is measured by the objective function f0; feasibility is typically measured by a penalization of constraint violation, for instance by the function h : IR^n → IR+ given by

(1.1)    h(x) = ‖f^+(x)‖,

where ‖·‖ is an arbitrary norm and f^+ : IR^n → IR^m is defined by

    fi^+(x) = fi(x)            if i ∈ E,
    fi^+(x) = max{0, fi(x)}    if i ∈ I.
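For concreteness, the measure (1.1) is straightforward to evaluate. The following Python sketch is illustrative only; the function and argument names are ours, and the Euclidean norm stands in for the arbitrary norm of (1.1).

```python
import numpy as np

def infeasibility(x, f_E, f_I, norm_ord=2):
    """Infeasibility measure h(x) = ||f^+(x)|| from (1.1).

    f_E, f_I: callables returning the equality and inequality constraint
    values fE(x), fI(x) as 1-d arrays (either may be empty).
    """
    viol_E = np.atleast_1d(f_E(x))                    # f_i^+(x) = f_i(x) for i in E
    viol_I = np.maximum(0.0, np.atleast_1d(f_I(x)))   # f_i^+(x) = max{0, f_i(x)} for i in I
    return np.linalg.norm(np.concatenate([viol_E, viol_I]), ord=norm_ord)
```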
Both criteria must be optimized, and the algorithm should keep a certain balance between them at every step of the iterative process.

Several algorithms for nonlinear programming have been designed in which a merit function is the tool used to guarantee global convergence [1, 7, 8, 10, 11, 14]. As an alternative to merit functions, Fletcher and Leyffer [5] introduced the so-called filter for globalizing nonlinear programming methods. Filter methods are based on the concept of dominance, borrowed from multi-criteria optimization. A filter algorithm defines a forbidden region by memorizing pairs (f0(x^j), h(x^j)), chosen conveniently from former iterations, and then avoids points dominated by these pairs according to the following domination rule: a point x is dominated by y if, and only if,

    f0(x) ≥ f0(y) − αh(y)  and  h(x) ≥ (1 − α)h(y),

where α ∈ (0, 1) is a given constant. Under reasonable assumptions, such as smoothness of the functions and boundedness of the iterates and of the Hessians of the models, Fletcher et al. [4] have proved global convergence of a filter-SQP algorithm based on this domination rule. More precisely, they prove that the sequence generated by the filter algorithm has a stationary accumulation point. The same filter criterion was applied to inexact restoration methods by Gonzaga, Karas and Vanti [9], who proved stationarity of all qualified accumulation points.

In this paper we use a slightly different definition of the domination rule, proposed initially by Chin [2]: a point x is dominated by y if, and only if,

    f0(x) + αh(x) ≥ f0(y)  and  h(x) ≥ (1 − α)h(y).

A filter based on this domination rule is referred to as a slanting filter. Figure 1.1 illustrates the two domination rules in the f0 × h plane, where we simplify the notation by using y to represent the pair (f0(y), h(y)). The rectangular region represents the set of points dominated by y according to the first criterion. The region dominated by the slanting filter criterion includes the large triangular region but excludes the small triangle at the bottom.

Fig. 1.1. The domination rules.

Chin and Fletcher [3] and Fletcher, Leyffer and Toint [6] prove that the sequence (h(x^k)) generated by a slanting filter algorithm converges to zero, assuming that an infinite number of pairs (f0(x^j), h(x^j)) are added to the filter. We improve this result by proving the same claim without this assumption. This result does not depend on how the new iterate is computed. The same works [3] and [6] prove that the sequence generated by the algorithm has a stationary accumulation point when the new iterates are computed by sequential linear programming (SLP) and sequential quadratic programming (SQP), respectively.

We propose in this paper an inexact restoration method in the sense of Martínez and Pilotta [10, 11, 12] for computing the step. Each iteration is decomposed into two phases. First, a feasibility phase aims to reduce the infeasibility measure h. Then an optimality phase computes a trial point reducing the objective function on a linearization of the feasible set. The proposed method is independent of the internal algorithms used in both phases; the only requirements are that the points generated must be acceptable for the filter and that, near a feasible non-stationary point, the reduction of the objective function in the optimality step be large. This efficiency condition, stated ahead as Hypothesis H5 and introduced by Gonzaga, Karas and Vanti [9], is the main tool of the global convergence analysis. Under this hypothesis, we prove that all accumulation points of the sequence generated by the algorithm are stationary. Furthermore, we show how to compute the optimality step in order to fulfill this hypothesis.

The paper is organized as follows. In Section 2 we present a general slanting filter algorithm and prove convergence to feasibility. An inexact restoration algorithm for computing the step and the global convergence of the general algorithm are described in Section 3. In Section 4 we present an algorithm for computing the optimality step and prove that Hypothesis H5 is satisfied.

2. The general algorithm. In this section we present a general algorithm whose main feature is the construction of the filter, based on [2, 3, 6]. Independently of how the new point is obtained, we prove that any accumulation point of the sequence generated by the algorithm is feasible.

The algorithm constructs a sequence of filter sets F0 ⊂ F1 ⊂ ··· ⊂ Fk, composed of pairs (f0^j, h^j) ∈ IR². We also mention in the algorithm the forbidden regions ℱk ⊂ IR^n, which are formally defined at each step for the sake of clarity, but are never actually constructed.
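The slanting domination rule translates into a one-pass acceptance test over the stored pairs. The sketch below is ours and only illustrates the rule; the filter is kept as a plain list of pairs (f0^j, h^j).

```python
def is_acceptable(f0_x, h_x, filter_pairs, alpha):
    """True if (f0_x, h_x) is not dominated by any filter entry.

    An entry (f0_j, h_j) dominates x when
        f0(x) + alpha*h(x) >= f0_j  and  h(x) >= (1 - alpha)*h_j.
    """
    for f0_j, h_j in filter_pairs:
        if f0_x + alpha * h_x >= f0_j and h_x >= (1.0 - alpha) * h_j:
            return False
    return True
```

Note that an entry with h_j = 0 would forbid every point x with f0(x) + αh(x) ≥ f0_j; this is why the analysis (Lemma 2.2 below) must guarantee that permanent entries have h^j > 0.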
Algorithm 2.1. General filter algorithm model
Given: x^0 ∈ IR^n, F0 = ∅, ℱ0 = ∅, α ∈ (0, 1).
k = 0
repeat
    Set F̄k = Fk ∪ {(f0(x^k), h(x^k))} and define
        ℱ̄k = ℱk ∪ {x ∈ IR^n | f0(x) + αh(x) ≥ f0(x^k) and h(x) ≥ (1 − α)h(x^k)}.
    Step:
        if x^k is stationary, stop with success
        else, compute x^{k+1} ∉ ℱ̄k.
    Filter update:
        if f0(x^{k+1}) < f0(x^k), set Fk+1 = Fk, ℱk+1 = ℱk
            (f0-iteration: the new entry is discarded)
        else, set Fk+1 = F̄k, ℱk+1 = ℱ̄k
            (h-iteration: the new entry becomes permanent)
    k = k + 1.

At the beginning of each iteration, the pair (f0(x^k), h(x^k)) is temporarily introduced into the filter. After the iteration is completed, this entry becomes permanent in the filter only if the iteration does not produce a decrease in f0.
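The filter bookkeeping of Algorithm 2.1 is easy to express in code. In the sketch below (our names; the step computation and the stationarity test are abstract arguments), acceptance with respect to the temporary filter is delegated to compute_step, for instance via the test of the previous sketch.

```python
def general_filter_loop(x, f0, h, compute_step, is_stationary, alpha, max_iter=1000):
    """Sketch of Algorithm 2.1: only the filter management is specified.

    compute_step(x, filter_pairs, alpha) must return a point that is
    acceptable with respect to the temporary filter.
    """
    permanent = []                                  # pairs in F_k
    for _ in range(max_iter):
        temporary = permanent + [(f0(x), h(x))]     # F-bar_k: current pair added temporarily
        if is_stationary(x):
            return x                                # stop with success
        x_new = compute_step(x, temporary, alpha)   # x^{k+1} outside the forbidden region
        if not (f0(x_new) < f0(x)):                 # h-iteration: entry becomes permanent
            permanent = temporary
        x = x_new                                   # on an f0-iteration the entry is discarded
    return x
```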
We now prove that Algorithm 2.1 is well defined. Given the generality of the algorithm, it is enough to show that whenever the current point is not stationary, a new point outside the forbidden region can be chosen, unless the current point is a global solution of the problem (P). In the next two results we simplify the notation by using (f0^k, h^k) to represent (f0(x^k), h(x^k)).

Lemma 2.2. Consider the setting of Algorithm 2.1. For all k ∈ IN such that x^k is neither stationary nor a global solution of (P), the following facts hold.
(i) h^j > 0 for all j ∈ IN such that (f0^j, h^j) ∈ Fk.
(ii) There exists x^{k+1} ∉ ℱ̄k.

Proof. We prove this lemma by induction. For k = 0, F0 = ∅ and F̄0 = {(f0^0, h^0)}. Suppose that h^0 = 0. Since x^0 is not a minimizer of the problem (P), there exists a feasible point x^1 such that f0^1 + αh^1 = f0^1 < f0^0. On the other hand, if h^0 > 0, we can take x^1 as any feasible point. In both cases, x^1 ∉ ℱ̄0.

Now suppose that (i) and (ii) hold for k − 1. If iteration k − 1 is an f0-iteration, then Fk = Fk−1 and consequently statement (i) for k follows from the induction hypothesis. Otherwise, k − 1 is an h-iteration and Fk = F̄k−1 = Fk−1 ∪ {(f0^{k−1}, h^{k−1})}. In this case, it is enough to prove that h^{k−1} > 0. By the induction hypothesis, there exists x^k ∉ ℱ̄_{k−1}. In particular, if h^{k−1} = 0, then f0^k ≤ f0^k + αh^k < f0^{k−1}, which means that iteration k − 1 is an f0-iteration. Since this is not the case, h^{k−1} > 0. Thus (i) holds for k.

Let us prove (ii). Suppose that h^k = 0. Since x^k is not a minimizer of the problem (P), there exists a feasible point x^{k+1} such that f0^{k+1} + αh^{k+1} = f0^{k+1} < f0^k. On the other hand, if h^k > 0, we can take x^{k+1} as any feasible point. In both cases, using (i), we conclude that x^{k+1} ∉ ℱ̄k.

In the light of Lemma 2.2, we shall suppose that Algorithm 2.1 generates an infinite sequence (x^k). Furthermore, we assume the following hypotheses.

H1. The sequence (x^k) remains in a convex compact domain X ⊂ IR^n.
H2. All the functions fi(·), i = 0, 1, ..., m, are uniformly Lipschitz continuously differentiable in an open set containing X.
H3. Every feasible point x̄ of our nonlinear programming problem satisfies the Mangasarian–Fromovitz constraint qualification (MFCQ), namely, the gradients ∇fi(x̄), i ∈ E, are linearly independent, and there exists a direction d ∈ IR^n such that AE(x̄)d = 0 and AĪ(x̄)d < 0, where Ī = {i ∈ I | fi(x̄) = 0}.

Although H1 is an assumption on the sequence generated by the algorithm, it can be enforced by including a bounded box among the problem constraints.

In the next theorem we show that any accumulation point of the generated sequence is feasible. This result was proved by Chin and Fletcher [3] and also by Fletcher, Leyffer and Toint [6], assuming that an infinite number of pairs (f0^j, h^j) are added to the filter. We do not make this requirement in our analysis.

Theorem 2.3. Consider the sequence (x^k) generated by Algorithm 2.1. Then h(x^k) → 0 and, consequently, any accumulation point of the sequence (x^k) is feasible.

Proof. Consider the set

    K = {k ∈ IN | f0^j > f0^k, ∀ j > k}.

In particular, for all k ∈ K, f0^{k+1} > f0^k and k is an h-iteration. So, for all k ∈ K and all j > k,

(2.1)    h^j ≤ (1 − α)h^k.

We will consider two cases.

First case: K infinite. Let us denote K = {k0, k1, ..., ki, ...}. Using (2.1), h^{ki} ≤ (1 − α)^i h^{k0}. Then h^k → 0 for k ∈ K. For j ∉ K, j > k0, there exists i ∈ IN such that ki < j < k_{i+1}. Using (2.1), h^j ≤ (1 − α)h^{ki}, and consequently h^k → 0 on IN.

Second case: K finite. Consider K̄ = max{k ∈ K} and define the infinite set S = {s0, s1, ..., si, ...} by s0 = K̄ + 1 and, for i > 0,

    si = min{j > s_{i−1} | f0^j ≤ f0^{s_{i−1}}}.

Note that if s_{i+1} = si + 1, then by construction of Algorithm 2.1, x^{s_{i+1}} ∉ ℱ̄_{si}. On the other hand, if s_{i+1} > si + 1, then f0^{si + 1} > f0^{si} and si is an h-iteration. This implies again x^{s_{i+1}} ∉ ℱ̄_{si}. By definition of the set S, the sequence (f0^{si}) is decreasing. Furthermore, as x^{s_{i+1}} ∉ ℱ̄_{si}, for all i ∈ IN

(2.2)    h^{s_{i+1}} ≤ (1 − α) h^{si}    or    f0^{si} − f0^{s_{i+1}} ≥ αh^{s_{i+1}}.

We will prove first that h^k → 0 for k ∈ S. This is immediate if h^{s_{i+1}} ≤ (1 − α) h^{si} for all i sufficiently large. On the other hand, if there exists an infinite set IN0 such that f0^{si} − f0^{s_{i+1}} ≥ αh^{s_{i+1}} for i ∈ IN0, then h^{s_{i+1}} → 0 on IN0, because (f0^{si}) is bounded and decreasing. Thus, given ε > 0, there exists ℓ ∈ IN0 such that for all i ≥ ℓ, i ∈ IN0,

(2.3)    h^{s_{i+1}} < ε.

We claim that h^{s_{ℓ+j}} < ε for all j ∈ IN. This is clear for j = 1. Suppose that the claim holds for some j. If ℓ + j ∈ IN0, then (2.3) implies h^{s_{ℓ+j+1}} < ε. Else, ℓ + j ∉ IN0 and, by (2.2), we have h^{s_{ℓ+j+1}} ≤ (1 − α) h^{s_{ℓ+j}} < ε. This means that h^k → 0 for k ∈ S.

Now we prove that h^k → 0 on IN. For j ∉ S, j > K̄, there exists i ∈ IN such that si < j < s_{i+1}. By definition of the set S, si is an h-iteration and f0^j > f0^{si}. Thus h^j ≤ (1 − α)h^{si}, and consequently h^k → 0 on IN, completing the proof.

3. Global Convergence. In this section we present a method for computing x^{k+1} ∉ ℱ̄k with no specification of the internal algorithms. Afterwards we state assumptions on the performance of the step, and prove that any accumulation point of the sequence generated by Algorithm 2.1 is stationary. The next section will show that quite usual methods for the internal steps fulfill these assumptions.

The algorithm computes the step in two phases. First, a feasibility phase reduces a measure of infeasibility. Then an optimality phase reduces the objective function on a tangential approximation of the feasible set. These two phases are totally independent; the only coupling between them is provided by the filter.

Algorithm 3.1. Computation of x^{k+1} ∉ ℱ̄k
Data: x^k ∈ IR^n, ℱ̄k.
Feasibility phase:
    if h(x^k) = 0, set z^k = x^k
    else, compute z^k ∉ ℱ̄k such that h(z^k) < (1 − α) h(x^k);
        if impossible, stop with insuccess.
Optimality phase:
    if z^k is stationary, stop with success
    else, compute x^{k+1} ∉ ℱ̄k such that f0(x^{k+1}) ≤ f0(z^k) and
        AE(z^k)(x^{k+1} − z^k) = 0,
        fI(z^k) + AI(z^k)(x^{k+1} − z^k) ≤ fI^+(z^k).
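In code, a two-phase step in the spirit of Algorithm 3.1 could be organized as follows. This is a sketch only: restore and tangential_step stand in for the unspecified internal procedures of the two phases (all names are ours), each responsible for returning points acceptable for the temporary filter.

```python
def inexact_restoration_step(x, f0, h, temporary_filter, alpha,
                             restore, tangential_step, is_stationary):
    """Sketch of Algorithm 3.1: one two-phase step. Returns (point, done)."""
    # Feasibility phase: reduce the infeasibility by at least the factor (1 - alpha).
    if h(x) == 0.0:
        z = x
    else:
        z = restore(x, temporary_filter)          # must give h(z) < (1 - alpha)*h(x)
        if z is None:
            raise RuntimeError("feasibility phase failed: stop with insuccess")
    # Optimality phase: decrease f0 over the linearization L(z) of the feasible set.
    if is_stationary(z):
        return z, True                            # stop with success
    x_new = tangential_step(z, temporary_filter)  # x^{k+1} in L(z), f0(x_new) <= f0(z)
    return x_new, False
```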
Now we state the assumptions on the performance of the step at each iteration.

Feasibility phase. The purpose of the feasibility phase is to find, from x^k ∈ X, a point z^k ∈ X such that h(z^k) < (1 − α)h(x^k) and z^k ∉ ℱ̄k. The procedure used in this phase could in principle be any iterative algorithm for decreasing h, and finite termination should be achieved because, as we have seen above, all filter entries (f0^j, h^j) ∈ Fk have h^j > 0. We shall assume the following condition on the performance of the feasibility step.

H4. At all iterations k ∈ IN, the feasibility step must satisfy

(3.1)    h(x^k) − h(z^k) = Ω(‖z^k − x^k‖).

This can also be stated as

(3.2)    ‖z^k − x^k‖ = O(h(x^k)),

because h(z^k) ≥ 0. Note that since ∇f0(·) is bounded in X, the mean value theorem ensures that, for all k ∈ IN, |f0(z^k) − f0(x^k)| = O(‖z^k − x^k‖). Using this and (3.2), we have

(3.3)    |f0(z^k) − f0(x^k)| = O(h(x^k)).

The feasibility step studied by Martínez [10] satisfies assumption H4 and applies directly to our case. Thus we shall not describe the feasibility procedure in detail in this paper. Note that the feasibility algorithm may fail if h(·) has an infeasible stationary point. In this case, the method stops with insuccess.

Optimality phase. Given the point z^k, the optimality phase computes a trial point reducing the objective function on a linearization of the feasible set. Now we state the main assumption on the performance of the optimality step at each iteration. Given an iterate x^k, we start by defining the filter slack at x^k:

(3.4)    Hk = min{1, min{(1 − α)h^j | (f0^j, h^j) ∈ Fk, f0^j ≤ f0(x^k)}},

illustrated in Figure 3.1.

Fig. 3.1. The filter slack Hk.
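The filter slack (3.4) is again a short computation over the permanent entries; the following sketch (our names) mirrors the definition.

```python
def filter_slack(f0_xk, filter_pairs, alpha):
    """Filter slack H_k from (3.4): the least slanted margin (1 - alpha)*h_j
    over entries with f0_j <= f0(x^k), capped at 1."""
    margins = [(1.0 - alpha) * h_j
               for f0_j, h_j in filter_pairs
               if f0_j <= f0_xk]
    return min([1.0] + margins)
```

Since every permanent entry has h^j > 0 (Lemma 2.2) and Fk is finite, Hk > 0 at every iteration.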
We assume that the optimality step must be efficient, in the sense that near a feasible non-stationary point the reduction of the objective function in the optimality step is "large". Formally, we require the following.

H5. Given a feasible non-stationary point x̄ ∈ X, there exists a neighborhood V of x̄ such that for any iterate x^k ∈ V,

(3.5)    f0(z^k) − f0(x^{k+1}) = Ω(√Hk),
(3.6)    f0(z^k) − f0(x^{k+1}) = Ω(‖z^k − x^{k+1}‖).

The relation (3.5) means that there exists M > 0, dependent on x̄, such that whenever x^k is near x̄, f0(z^k) − f0(x^{k+1}) ≥ M√Hk. In Section 4 we shall state methods which satisfy this assumption. Note that H5 is a local condition and it is not completely independent of the feasibility phase, because it uses Hk, which is associated with x^k. Note also that the condition is stated for x^k ∈ V, and not z^k ∈ V, but this is not important because ‖x^k − z^k‖ = O(h(x^k)): if x^k is near x̄, then the same is true for z^k.

Throughout this section we will assume that Hypotheses H1–H5 hold.

Properties of the whole step. We extend to the whole step the properties of the optimality step near a feasible non-stationary point.

Lemma 3.2. Given a feasible non-stationary point x̄ ∈ X, there exists a neighborhood V of x̄ such that for any x^k ∈ V,

    f0(x^k) − f0(x^{k+1}) = Ω(√Hk),
    f0(x^k) − f0(x^{k+1}) = Ω(‖x^k − x^{k+1}‖).

Proof. Let x̄ be a non-stationary feasible point, and let V be the neighborhood given by H5. We have |f0(x^k) − f0(z^k)| = O(h(x^k)) by (3.3) and, for any x^k ∈ V,

(3.7)    f0(z^k) − f0(x^{k+1}) = Ω(√Hk)

by (3.5). We can restrict the neighborhood V, if necessary, such that for x^k ∈ V, h(x^k) < 1 and consequently h(x^k) < Hk by definition of Hk. Restricting again the neighborhood V, if necessary, we obtain

    |f0(x^k) − f0(z^k)| ≤ (1/2)(f0(z^k) − f0(x^{k+1})).

It follows that

(3.8)    f0(x^k) − f0(x^{k+1}) = f0(x^k) − f0(z^k) + f0(z^k) − f0(x^{k+1}) ≥ (1/2)(f0(z^k) − f0(x^{k+1})).

Thus, from (3.7), f0(x^k) − f0(x^{k+1}) = Ω(√Hk), which proves the first statement.

By the Lipschitz continuity of f0 and (3.7), we have for any x^k ∈ V

    ‖z^k − x^{k+1}‖ = Ω(f0(z^k) − f0(x^{k+1})) = Ω(√Hk).

Furthermore, ‖x^k − z^k‖ = O(h(x^k)) by (3.2). As above, we can deduce that ‖x^k − z^k‖ ≤ ‖z^k − x^{k+1}‖, and consequently

    ‖x^k − x^{k+1}‖ ≤ ‖x^k − z^k‖ + ‖z^k − x^{k+1}‖ = O(‖z^k − x^{k+1}‖).

This together with (3.6) and (3.8) yields f0(x^k) − f0(x^{k+1}) = Ω(‖x^k − x^{k+1}‖), completing the proof.

3.1. Convergence of the objective function values. In this section we prove the convergence of the sequence (f0(x^k)) generated by Algorithm 2.1 with the step computed by Algorithm 3.1. The proofs of the results are similar to those presented in [9, Section 4.1]; the differences are related to the definition of the filter. We start by showing that f0 cannot grow much in a single iteration.

Lemma 3.3. There exists a constant M > 0 such that, at any iteration k,

    f0(x^{k+1}) ≤ f0(x^k) + M h(x^k).

Proof. Note that f0(·) can only grow in an h-iteration. From (3.3), there exists a constant M > 0 such that at any iteration k, f0(z^k) ≤ f0(x^k) + M h(x^k). By construction, f0(x^{k+1}) ≤ f0(z^k), completing the proof.

Now we show that f0 cannot grow much in a sequence of iterations.

Lemma 3.4. Consider a finite sequence of iterations I = {k̄, k̄ + 1, ..., K} such that for k ∈ I, f^k ≡ f0(x^k) ≥ f0(x^{k̄}), and let M > 0 be the constant given by Lemma 3.3. Then

    f^K ≤ f^{k̄} + (M/α) h(x^{k̄}).

Proof. Let us denote f^k = f0(x^k) and h^k = h(x^k) for k ∈ I, and f̄ = f^{k̄}, h̄ = h^{k̄}. Let us also define the following values:

    φ0 = f^{k̄},
    φ1 = φ0 + M h̄,
    φ2 = φ1 + M(1 − α)h̄ = φ0 + [1 + (1 − α)]M h̄,
    ...
    φj = φ0 + (Σ_{i=0}^{j−1} (1 − α)^i) M h̄ ≤ φ0 + (M/α) h̄.

We show the following: there exists an integer J ≤ K − k̄ such that the sequence has at least one element in each interval [φj, φ_{j+1}], j = 0, 1, ..., J, and f^K ∈ [φJ, φ_{J+1}]. Consequently f^K will be smaller than φ0 + M h̄/α.

– First interval: The iteration k̄ is an h-iteration. The pair (f̄, h̄) enters the permanent filter, and hence h^k ≤ (1 − α)h̄ for k = k̄ + 1, ..., K and

    f^{k̄+1} ≤ φ0 + M h̄ = φ1

by Lemma 3.3. Let k0 be the largest k ∈ I such that f^k ≤ φ1 (several h-iterations and f0-iterations may have occurred between k̄ and k0). If k0 = K, then the proof is complete. Otherwise f^{k0+1} > φ1 ≥ f^{k0}.

– Second interval: The iteration k0 is an h-iteration and, as for the first interval, the pair (f^{k0}, h^{k0}) enters the filter. Hence h^k ≤ (1 − α)² h̄ for k = k0 + 1, ..., K and

    φ1 ≤ f^{k0+1} ≤ φ1 + M(1 − α)h̄ = φ2

by Lemma 3.3. Following the same process, we detect an h-iteration k1, the last in the second interval. If k1 = K, the proof is complete. Otherwise f^{k1+1} will be in the third interval, and so on, until f^{kJ} = f^K is obtained. Then f^K ≤ φ0 + M h̄/α, completing the proof.
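The last inequality in the definition of φj is just the geometric series bound; in LaTeX notation,

```latex
\sum_{i=0}^{j-1} (1-\alpha)^i \;\le\; \sum_{i=0}^{\infty} (1-\alpha)^i
\;=\; \frac{1}{1-(1-\alpha)} \;=\; \frac{1}{\alpha},
\qquad \text{since } 0 < 1-\alpha < 1,
```

so that φj ≤ φ0 + M h̄/α for every j, which is exactly the bound used at the end of the proof.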
We can now prove the main result of this analysis.

Theorem 3.5. The sequence (f0(x^k)) converges.

Proof. Let us denote f^k ≡ f0(x^k) for k ∈ IN. The sequence (f^k) is bounded by hypothesis. We shall use the following fact, which is a simple exercise in sequences: given a sequence (f^k) such that lim sup(f^k) > lim inf(f^k) + δ, δ > 0, it is possible to extract two subsequences (f^k), k ∈ K, and (f^{k+jk}), k ∈ K, with K ⊂ IN, such that for any k ∈ K,

    f^{k+jk} ≥ f^k + δ,
    f^{k+r} ≥ f^k for r = 1, ..., jk.

In fact, to prove this fact it is enough to take a subsequence convergent to lim sup(f^k) and associate with each index (say, l) the last index l − jl such that f^{l−jl} ≤ f^l − δ, if it exists. For large l, the construction will always be well defined.

Assume by contradiction that lim sup(f^k) > lim inf(f^k) + δ for some δ > 0, and let the subsequence (f^k), k ∈ K, be given by the construction above. Then we conclude that for all k ∈ K, iteration k is an h-iteration and, from Lemma 3.4,

(3.9)    f^k + δ ≤ f^{k+jk} ≤ f^k + (M/α) h(x^k).

Taking subsequences if necessary, assume that (x^k), k ∈ K, converges to a point x̄. Then x̄ must be feasible by Theorem 2.3. This contradicts (3.9), completing the proof.

3.2. Global convergence proof. In this section we show that any accumulation point of the sequence generated by Algorithm 2.1, with the new point computed by Algorithm 3.1, is stationary. Initially we show that near a feasible non-stationary point the objective function always changes by a large amount, precluding the possibility of feasible non-stationary accumulation points.

Lemma 3.6. Let x̄ ∈ X be a feasible non-stationary point. Then there exist a neighborhood V of x̄ and δ > 0 such that, whenever x^k ∈ V, there exists lk ∈ IN such that

(3.10)    f0(x^k) − f0(x^{k+lk}) ≥ δ.

Proof. From Lemma 3.2, there exist a neighborhood V1 of x̄ and constants β1, β2 > 0 such that for all x^k ∈ V1,

(3.11)    f0(x^k) − f0(x^{k+1}) ≥ β1 √Hk,
(3.12)    f0(x^k) − f0(x^{k+1}) ≥ β2 ‖x^k − x^{k+1}‖,

and iteration k is an f0-iteration. Consider ε > 0 such that Bε(x̄) = {x ∈ IR^n | ‖x − x̄‖ < ε} ⊂ V1, and define V = B_{ε/2}(x̄). Let k ∈ IN be such that x^k ∈ V. While the iterates x^{k+i}, i = 1, 2, ..., remain in Bε(x̄), the iterations k + i are f0-iterations and the filter does not change, that is, F_{k+i} = Fk for i = 1, 2, .... Consequently, from (3.11), f0 decreases by at least the constant amount β1√Hk. Hence, there exists a finite lk ∈ IN such that

    x^{k+lk} ∉ Bε(x̄),    x^{k+i} ∈ Bε(x̄) for i = 0, 1, ..., lk − 1.

We have

(3.13)    ‖x^k − x^{k+lk}‖ ≥ ε/2,

because x^k ∈ B_{ε/2}(x̄). Using (3.12), (3.13) and the triangle inequality,

    f0(x^k) − f0(x^{k+lk}) = Σ_{i=0}^{lk−1} [f0(x^{k+i}) − f0(x^{k+i+1})]
                           ≥ β2 Σ_{i=0}^{lk−1} ‖x^{k+i} − x^{k+i+1}‖
                           ≥ β2 ‖x^k − x^{k+lk}‖
                           ≥ β2 ε/2,

completing the proof.

Now we show the main result of this section.

Theorem 3.7. Any accumulation point of the sequence (x^k) is stationary.

Proof. By contradiction, assume that there exist a non-stationary point x̄ ∈ X and an infinite set K ⊂ IN such that x^k → x̄ for k ∈ K. By Theorem 2.3, x̄ is feasible. From Lemma 3.6, there exists δ > 0 such that for every large k ∈ K there exists lk ∈ IN with f0(x^k) − f0(x^{k+lk}) ≥ δ. This means that the sequence (f0(x^k)) is not a Cauchy sequence, contradicting Theorem 3.5 and completing the proof.

4. Optimality phase algorithm. As we have seen above, Hypothesis H5 is crucial for the convergence analysis. It is a very strong assumption, and we must show that there exist methods satisfying this condition. We shall present a general trust region method for computing the optimality step. Given z^k obtained in the feasibility phase of Algorithm 3.1, the optimality phase must find x^{k+1} in a linearized set such that f0(x^{k+1}) ≤ f0(z^k) and x^{k+1} ∉ ℱ̄k. We show that the resulting step satisfies H5. The main tool for the analysis (not necessarily for the construction) of such algorithms is the projected Cauchy direction.
The quadratic model. Given z^k ∈ X generated by Algorithm 3.1 in the feasibility phase, the trust region algorithm associates with z^k a quadratic model of f0,

(4.1)    mk(x) = f0(z^k) + ∇f0(z^k)ᵀ(x − z^k) + (1/2)(x − z^k)ᵀ Bk (x − z^k),    x ∈ IR^n,

where Bk is an n × n symmetric matrix. This matrix may be an approximation of ∇²f0(z^k), or any other matrix, provided that Hypothesis H6 below is verified. Usually, Bk will be an approximation of the Hessian of some Lagrangian function, and then mk deviates from a straightforward model of f0 by incorporating the curvature along the manifold of the constraints. Although this may be essential in the design of efficient algorithms, such a discussion is beyond the scope of this paper. In this section we assume that Hypotheses H1–H4 are satisfied, as well as the following condition.

H6. There exists β > 0 such that the quadratic model (4.1) satisfies ‖Bk‖ ≤ β for all k ∈ IN.

The trust region step uses a radius ∆ > 0 and computes a step d(z^k, ∆) ∈ IR^n such that ‖d(z^k, ∆)‖ ≤ ∆. We define the predicted reduction produced by the step d(z^k, ∆) as

(4.2)    pred(z^k, ∆) = mk(z^k) − mk(z^k + d(z^k, ∆)),

and the actual reduction as

(4.3)    ared(z^k, ∆) = f0(z^k) − f0(z^k + d(z^k, ∆)).

Lemma 4.1. Consider z^k ∈ X and d(z^k, ∆) ∈ IR^n generated by the trust region algorithm. Then

(4.4)    ared(z^k, ∆) = pred(z^k, ∆) + o(z^k, ∆),

where lim_{∆→0+} o(z^k, ∆)/∆ = 0 uniformly in z^k ∈ X.

Proof. [9, Lemma 3.1].

In the optimality step algorithm which we discuss below, we have made the following choices, which simplify the treatment:

(1) Each trust region computation starts with a radius ∆ ≥ ∆min, where ∆min > 0 is fixed. The choice of ∆ is irrelevant for the theory, and it usually comes from the former iteration. The use of this minimum radius ∆min simplifies the convergence proofs and enhances the chances of taking a pure Newton step.

(2) A step d(z^k, ∆) is only accepted if the sufficient decrease condition is satisfied:

(4.5)    ared(z^k, ∆) ≥ η pred(z^k, ∆),

for a given η ∈ (0, 1).

(3) Given z ∈ IR^n, we define the linearization of the set {x ∈ IR^n | fE(x) = fE(z), fI(x) ≤ fI^+(z)} by

(4.6)    L(z) = {x ∈ IR^n | AE(z)(x − z) = 0, fI(z) + AI(z)(x − z) ≤ fI^+(z)}.

(4) The trust region computation solves approximately the problem

(4.7)    minimize  mk(x)
         subject to  x ∈ L(z^k),  ‖x − z^k‖ ≤ ∆,

where ‖·‖ is any norm in IR^n.

Now we explain what we mean by "solving approximately". Given z ∈ X and the set L(z), the projected gradient direction is defined by

(4.8)    dc(z) = P_{L(z)}(z − ∇f0(z)) − z.

Define

    ϕ(z) = −∇f0(z)ᵀ dc(z) / ‖dc(z)‖.

Then ϕ is the descent rate of f0 along dc. As usual, we denote dc^k = dc(z^k) and ϕ^k = ϕ(z^k). According to [9], at a feasible point z the KKT conditions are equivalent to dc(z) = 0. Furthermore, if z is non-stationary, then ϕ(z) > 0.

Now we use known results about the minimization of mk(·) along a direction – see the discussion on the Cauchy point in [13]. Defining the generalized Cauchy point as the minimizer of mk(·) along dc in the trust region {x ∈ IR^n | ‖x − z^k‖ ≤ ∆},

    xc = argmin{mk(x) | ‖x − z^k‖ ≤ ∆, x = z^k + λdc^k, λ ≥ 0},

we know that

    mk(z^k) − mk(xc) ≥ (ξϕ^k/2) min{ϕ^k/‖Bk‖, ‖dc^k‖, ∆},

where ξ depends on the norms used. Using Hypothesis H6, this can be rewritten as

(4.9)    mk(z^k) − mk(xc) ≥ (ξϕ^k/2) min{ϕ^k/β, ‖dc^k‖, ∆}.
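Computing dc(z) in (4.8) amounts to projecting z − ∇f0(z) onto the polyhedron L(z), a small quadratic program. The sketch below poses this projection with scipy's general-purpose SLSQP solver purely for illustration; a dedicated QP solver would be used in practice, and all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def projected_cauchy_direction(z, grad_f0, A_E, A_I, f_I_z):
    """Projected gradient direction (4.8): d_c(z) = P_{L(z)}(z - grad f0(z)) - z.

    A_E, A_I: Jacobians of the equality/inequality constraints at z (or None);
    f_I_z: the vector fI(z). The projection onto L(z) from (4.6) is solved as a QP.
    """
    cons = []
    if A_E is not None and len(A_E):
        cons.append({'type': 'eq', 'fun': lambda d: A_E @ d})            # A_E(z) d = 0
    if A_I is not None and len(A_I):
        slack = np.maximum(0.0, f_I_z) - f_I_z                           # fI^+(z) - fI(z) >= 0
        cons.append({'type': 'ineq', 'fun': lambda d: slack - A_I @ d})  # A_I(z) d <= slack
    # Writing x = z + d, the objective ||x - (z - grad_f0)||^2 becomes ||d + grad_f0||^2.
    res = minimize(lambda d: 0.5 * np.sum((d + grad_f0) ** 2),
                   np.zeros_like(z), method='SLSQP', constraints=cons)
    return res.x                                                         # the direction d_c(z)
```

At a feasible point z, dc(z) = 0 if and only if z satisfies the KKT conditions, so ‖dc(z^k)‖ can also serve as a practical stationarity test for Algorithms 2.1 and 3.1.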
We accept as an approximate solution of (4.7) any feasible solution of this problem satisfying (4.9).

Algorithm 4.2. Optimality phase
Data: z^k ∉ ℱ̄k, ∆min > 0, ∆ ≥ ∆min, η ∈ (0, 1).
repeat
    Compute d = d(z^k, ∆) such that ‖d‖ ≤ ∆, z^k + d ∈ L(z^k) and
        pred(z^k, ∆) ≥ (ξϕ^k/2) min{ϕ^k/‖Bk‖, ‖dc^k‖, ∆}.
    Set ared(z^k, ∆) = f0(z^k) − f0(z^k + d).
    if z^k + d ∉ ℱ̄k and ared(z^k, ∆) ≥ η pred(z^k, ∆),
        set x^{k+1} = z^k + d, ∆k = ∆, and exit with success
    else ∆ = ∆/2.

Note that the algorithms proposed in this paper differ from those proposed in [9] only in the definition of the filter. The next two lemmas are independent of the filter, and hence remain true.
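The radius management of Algorithm 4.2 is a standard halving loop. A sketch follows (our names): trial_step must deliver the Cauchy-type predicted decrease above, and acceptable tests the trial point against the temporary filter.

```python
def optimality_phase(z, f0, trial_step, model_decrease, acceptable,
                     delta_min=1e-2, delta_prev=1.0, eta=0.1, max_halvings=60):
    """Sketch of Algorithm 4.2: halve the trust region radius until the trial
    point is acceptable for the filter and achieves sufficient actual decrease."""
    delta = max(delta_prev, delta_min)        # start with a radius >= delta_min
    for _ in range(max_halvings):
        d = trial_step(z, delta)              # ||d|| <= delta, z + d in L(z^k)
        pred = model_decrease(z, d)           # m_k(z) - m_k(z + d), cf. (4.2)
        ared = f0(z) - f0(z + d)              # actual reduction, cf. (4.3)
        if acceptable(z + d) and ared >= eta * pred:
            return z + d, delta               # x^{k+1} = z + d, Delta_k = delta
        delta /= 2.0                          # shrink the trust region and retry
    raise RuntimeError("no acceptable step found within the halving budget")
```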
Lemma 4.3. For any z ∈ X and d ∈ IR^n such that z + d ∈ L(z),

    |h(z + d) − h(z)| = O(‖d‖²).

Proof. [9, Lemma 3.3].

The next lemma says that if we ignore the filter, then the trust region step is large near x̄.

Lemma 4.4. Let x̄ ∈ X be a feasible non-stationary point. Then there exist a neighborhood Ṽ of x̄, a radius ∆̃ ∈ (0, ∆min) and a constant c̃ > 0 such that for any z^k ∈ Ṽ,
(i) for any ∆ > 0, pred(z^k, ∆) ≥ c̃ min{∆, ∆̃};
(ii) for any ∆ ∈ (0, ∆̃), ared(z^k, ∆) ≥ η pred(z^k, ∆) ≥ η c̃ ∆.

Proof. [9, Lemma 3.4].

The next lemma shows that near a feasible non-stationary point the refusal of an optimality step is due to a large increase of the infeasibility.

Lemma 4.5. Let x̄ ∈ X be a feasible non-stationary point, and consider the constant ∆̃ and the neighborhood Ṽ given by Lemma 4.4. Then there exist a neighborhood V̄ ⊂ Ṽ of x̄ and a constant ∆̄ ∈ (0, ∆̃) such that for any x^k, z^k ∈ V̄ and ∆ ∈ (∆̄/2, ∆̄),

(4.10)    f0(x^k) − f0(z^k + d(z^k, ∆)) ≥ αh(z^k + d(z^k, ∆)).

Furthermore, if z^k + d(z^k, ∆) was refused by Algorithm 4.2, then

(4.11)    h(z^k + d(z^k, ∆)) ≥ Hk.

Proof. Let x̄ be a feasible non-stationary point. Consider the constant α given in Algorithm 2.1, the constant η given in Algorithm 4.2, the neighborhood Ṽ and the constants ∆̃, c̃ given by Lemma 4.4. By Lemma 4.3 there exists a constant c > 0 such that

(4.12)    |h(z^k + d(z^k, ∆)) − h(z^k)| ≤ c ‖d(z^k, ∆)‖².

Consider

    ∆̄ = min{∆̃, η c̃ / (8αc)}.

From (3.3) we conclude that there exists a neighborhood V̄ ⊂ Ṽ such that for all x^k ∈ V̄,

    |f0(x^k) − f0(z^k)| ≤ (1/4) η c̃ ∆̄.

By Lemma 4.4, for z^k ∈ V̄ and ∆ ∈ (∆̄/2, ∆̄),

    ared(z^k, ∆) = f0(z^k) − f0(z^k + d(z^k, ∆)) ≥ η pred(z^k, ∆) ≥ η c̃ min{∆, ∆̃} ≥ (1/2) η c̃ ∆̄.

Hence, for x^k, z^k ∈ V̄ and ∆ ∈ (∆̄/2, ∆̄), we have

(4.13)    f0(x^k) − f0(z^k + d(z^k, ∆)) = f0(x^k) − f0(z^k) + f0(z^k) − f0(z^k + d(z^k, ∆)) ≥ (1/4) η c̃ ∆̄.

On the other hand, by (4.12) and the facts that ‖d(z^k, ∆)‖ < ∆̄ and h(z^k) < (1 − α)h(x^k), we have

    h(z^k + d(z^k, ∆)) ≤ h(z^k) + c∆̄² ≤ (1 − α)h(x^k) + c∆̄².

We can restrict the neighborhood V̄, if necessary, such that for x^k ∈ V̄, h(x^k) < η c̃ ∆̄ / (8α(1 − α)), and consequently

    αh(z^k + d(z^k, ∆)) ≤ α(1 − α)h(x^k) + αc∆̄² ≤ (1/8) η c̃ ∆̄ + (1/8) η c̃ ∆̄ = (1/4) η c̃ ∆̄.

From this and (4.13), we have

(4.14)    f0(x^k) − f0(z^k + d(z^k, ∆)) ≥ (1/4) η c̃ ∆̄ ≥ αh(z^k + d(z^k, ∆)),

proving the first statement. Furthermore, from Lemma 4.4, the trial point z^k + d(z^k, ∆) is accepted by the trust region criterion. Therefore, if the trial point was refused by Algorithm 4.2, then z^k + d(z^k, ∆) ∈ ℱ̄k. We thus conclude from the definition of the filter and (4.14) that h(z^k + d(z^k, ∆)) ≥ Hk, completing the proof.

We now prove the main result of this section: Hypothesis H5 is satisfied when the optimality step is computed by Algorithm 4.2.

Theorem 4.6. Given a feasible non-stationary point x̄ ∈ X, there exists a neighborhood V of x̄ such that for any iterate x^k ∈ V,

    f0(z^k) − f0(x^{k+1}) = Ω(√Hk),
    f0(z^k) − f0(x^{k+1}) = Ω(‖z^k − x^{k+1}‖),

where x^{k+1} = z^k + d(z^k, ∆) is computed by Algorithm 4.2.

Proof. Let x̄ be a feasible non-stationary point. Algorithm 4.2 starts with the radius ∆ ≥ ∆min and ends with ∆k ≤ ∆. Thus x^{k+1} = z^k + d(z^k, ∆k). By Algorithm 4.2,

(4.15)    f0(z^k) − f0(x^{k+1}) = ared(z^k, ∆k) ≥ η pred(z^k, ∆k).

From Lemma 4.3 and (3.3), there exist constants c1 > 0 and c2 > 0 such that for all ∆ > 0,

(4.16)    h(z^k + d(z^k, ∆)) − h(z^k) ≤ c1 ‖d(z^k, ∆)‖² ≤ c1 ∆²,
(4.17)    |f0(z^k) − f0(x^k)| ≤ c2 h(x^k).

Consider the neighborhood V̄ given by Lemma 4.5 and V ⊂ V̄ such that h(x) < 1 for all x ∈ V. By construction of Algorithm 3.1 and the definition of Hk, we have for x^k ∈ V,

(4.18)    h(z^k) < (1 − α)h(x^k) < (1 − α)Hk.

Consider the constant ∆̄ > 0 given by Lemma 4.5. We shall consider two cases.

First case: ∆k ≥ ∆̄/2. In this case, from Lemma 4.4,

    pred(z^k, ∆k) ≥ c̃ min{∆k, ∆̃} ≥ c̃ ∆̄/2.

Using (4.15) we conclude

    f0(z^k) − f0(x^{k+1}) ≥ η c̃ ∆̄/2 = Ω(1).

It follows trivially that f0(z^k) − f0(x^{k+1}) = Ω(√Hk) and f0(z^k) − f0(x^{k+1}) = Ω(‖z^k − x^{k+1}‖), because in both cases the right-hand side is bounded in X.

Second case: assume now that ∆k < ∆̄/2. Algorithm 4.2 starts with a radius ∆ > 0, computes the trial optimality step d(z^k, ∆) and reduces the radius until the trial step is accepted. By Lemma 4.5, for all ∆ ∈ (∆̄/2, ∆̄),

(4.19)    h(z^k + d(z^k, ∆)) ≥ Hk.

We shall analyze two situations. In the first one, we suppose that (4.19) holds for all ∆ ≤ ∆̄/2. Using (4.18), we have

(4.20)    h(z^k + d(z^k, ∆k)) − h(z^k) ≥ Hk − (1 − α)Hk = αHk.

On the other hand, by (4.16), h(z^k + d(z^k, ∆k)) − h(z^k) ≤ c1 ∆k², and consequently, using (4.20),

(4.21)    ∆k = Ω(√Hk).

From (4.15) and Lemma 4.4,

    f0(z^k) − f0(x^{k+1}) ≥ η c̃ min{∆k, ∆̃} = η c̃ ∆k.

Using (4.21) and the fact that ∆k ≥ ‖x^{k+1} − z^k‖, we obtain the two conditions of the theorem.

Let us now consider the second possibility, that is, there exists ∆ ≤ ∆̄/2 such that

(4.22)    h(z^k + d(z^k, ∆)) < Hk.

Let ∆̂ be the first ∆ ≤ ∆̄/2 satisfying this condition. Then the radius 2∆̂ was rejected by Algorithm 4.2 and does not satisfy (4.22). Consequently, by (4.16) and (4.18) we have

    αHk ≤ h(z^k + d(z^k, 2∆̂)) − h(z^k) ≤ 4c1 ∆̂².

Thus

(4.23)    ∆̂ = Ω(√Hk).

From Lemma 4.4,

(4.24)    f0(z^k) − f0(z^k + d(z^k, ∆̂)) ≥ η pred(z^k, ∆̂) ≥ η c̃ ∆̂.

Using (4.23) and the fact that ∆̂ ≥ ∆k ≥ ‖x^{k+1} − z^k‖, we conclude that the point x̂ = z^k + d(z^k, ∆̂) satisfies the two conditions of the theorem. To finish the proof, we must show that x^{k+1} = x̂. Since ared(z^k, ∆̂) ≥ η pred(z^k, ∆̂) and x̂ satisfies (4.22), it is enough to prove that

(4.25)    f0(x̂) + αh(x̂) < f0(x^k).

By (4.16) and (4.18),

    h(x̂) ≤ (1 − α)Hk + c1 ∆̂².

Using (4.23) we have

(4.26)    h(x̂) = O(∆̂²).

On the other hand, by (4.17) and (4.23) we have

    |f0(z^k) − f0(x^k)| ≤ c2 h(x^k) ≤ c2 Hk = O(∆̂²).

Using this and (4.24) we conclude that

    f0(x^k) − f0(x̂) = f0(x^k) − f0(z^k) + f0(z^k) − f0(x̂) = Ω(∆̂).

Comparing this with (4.26), we have (4.25), completing the proof.
5. Conclusions. In this work we have presented a general algorithm which uses a slanting filter criterion for accepting the new iterates. This criterion was proposed initially by Chin [2] and was used by Chin and Fletcher [3] and by Fletcher, Leyffer and Toint [6]. In these works the authors prove that all accumulation points of the sequence generated by the algorithm are feasible, assuming that an infinite number of pairs are added to the filter. We improve this result by proving the same claim without such an assumption. This result does not depend on how the new iterate is computed.

Computing the new iterates by sequential linear programming (SLP), the authors in [3] have proved that the sequence generated by the algorithm has a stationary accumulation point. In the context of sequential quadratic programming (SQP), the same result is proved in [6]. In this paper we have proved stationarity of all accumulation points of the sequences generated by algorithms which compute the new iterates by the inexact restoration method, in the sense of Martínez and Pilotta [11]. This result is independent of the internal algorithms used in the feasibility and optimality phases of the inexact restoration methods, provided that the points generated are acceptable for the filter and that, near a feasible non-stationary point, the reduction of the objective function in the optimality step is large. We have shown how to compute the optimality step in order to fulfill this hypothesis.

Acknowledgements. We thank Clóvis C. Gonzaga and the referee for their valuable comments and suggestions, which very much improved this paper.

REFERENCES

[1] R. H. Byrd, J. C. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming, 89(1):149–185, 2000.
[2] C. M. Chin. A new trust region based SLP-filter algorithm which uses EQP active set strategy. PhD thesis, Department of Mathematics, University of Dundee, Scotland, 2001.
[3] C. M. Chin and R. Fletcher. On the global convergence of an SLP-filter algorithm that takes EQP steps. Mathematical Programming, 96(1):161–177, 2003.
[4] R. Fletcher, N. Gould, S. Leyffer, P. Toint, and A. Wächter. Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming. SIAM J. Optimization, 13(3):635–659, 2002.
[5] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming Ser. A, 91(2):239–269, 2002.
[6] R. Fletcher, S. Leyffer, and P. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM J. Optimization, 13(1):44–59, 2002.
[7] D. M. Gay, M. L. Overton, and M. H. Wright. A primal-dual interior method for nonconvex nonlinear programming. In Y. Yuan, editor, Advances in Nonlinear Programming, pages 31–56. Kluwer Academic Publishers, Dordrecht, 1998.
[8] F. A. M. Gomes, M. C. Maciel, and J. M. Martínez. Nonlinear programming algorithms using trust regions and augmented Lagrangians with nonmonotone penalty parameters. Mathematical Programming, 84(1):161–200, 1999.
[9] C. C. Gonzaga, E. W. Karas, and M. Vanti. A globally convergent filter method for nonlinear programming. SIAM J. Optimization, 14(3):646–669, 2003.
[10] J. M. Martínez. Inexact-restoration method with Lagrangian tangent decrease and a new merit function for nonlinear programming. Journal of Optimization Theory and Applications, 111:39–58, 2001.
[11] J. M. Martínez and E. A. Pilotta. Inexact restoration algorithm for constrained optimization. Journal of Optimization Theory and Applications, 104:135–163, 2000.
[12] J. M. Martínez and E. A. Pilotta. Inexact restoration methods for nonlinear programming: Advances and perspectives. In Qi, Teo, and Yang, editors, Optimization and Control with Applications, pages 271–292. Springer, 2005.
[13] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, 1999.
[14] R. J. Vanderbei and D. F. Shanno. An interior-point algorithm for nonconvex nonlinear programming. Computational Optimization and Applications, 13:231–252, 1999.