Optimality Functions and Lopsided Convergence∗ Johannes O. Royset Roger J-B Wets Operations Research Department Naval Postgraduate School [email protected] Department of Mathematics University of California, Davis [email protected] Abstract. Optimality functions pioneered by E. Polak characterize stationary points, quantify the degree with which a point fails to be stationary, and play central roles in algorithm development. For optimization problems requiring approximations, optimality functions can be used to ensure consistency in approximations, with the consequence that optimal and stationary points of the approximate problems indeed are approximately optimal and stationary for an original problem. In this paper, we review the framework and apply it to nonlinear programming. This results in a convergence result for a primal interior point method without constraint qualifications or convexity assumptions. Moreover, we introduce lopsided convergence of bifunctions on metric spaces and show that this notion of convergence is instrumental in establishing consistency of approximations. Lopsided convergence also leads to further characterizations of stationary points under perturbations and approximations. Dedication. We dedicate this paper to our long-time friend, colleague, collaborator, and advisor Elijah (Lucien) Polak in honor of his 85th birthday. We wish him fair weather and following snow conditions. Keywords: epi-convergence, lopsided convergence, consistent approximations, optimality functions, optimality conditions, interior point method AMS Classification: Date: March 16, 2015 ∗ This material is based upon work supported in part by the U. S. Army Research Laboratory and the U. S. Army Research Office under grant numbers 00101-80683, W911NF-10-1-0246 and W911NF-12-1-0273. 1 1 Introduction It is well-known that optimality conditions are central to both theoretical and computational advances in optimization. They were developed over centuries starting with the pioneering works of Bishop N. Oresme (14th century) and P. de Fermat (17th century), and brought to their modern form by Karush, John, Kuhn, Tucker, Polak, Mangasarian, Fromowitz, and many others. In this paper, we discuss quantification of first-order necessary optimality conditions in terms of optimality functions as developed by E. Polak and co-authors; see [12] for numerous examples in nonlinear programming, semiinfinte optimization, and optimal control as well as [14, 17, 5, 9] for recent applications in stochastic and semiinfinte programming, nonsmooth optimization, and control of uncertain systems. It is apparent that how “far” a set of equalities, inequalities, and inclusions are from being satisfied can be quantified in numerous ways. The framework of optimality functions, as laid out in [12, Section 3.3] and references therein, stipulates axiomatic requirements that such quantifications should satisfy to facilitate the study and computation of approximate stationary points. Specifically, for an optimization problem that can only be “solved” through the solution of an approximating problem, one seeks to determine whether a near-stationary point of the approximating problem is an approximate stationary point of the original problem. The requirements on optimality functions exactly ensure this property. Moreover, there is ample empirical indications and some theoretical evidence (see for example [17, 16, 15, 8]) that computational benefits accrue from approximately solving a sequence of approximating problems with increasing fidelity, each warm-started with the previously obtained point. Optimality functions are tools to carry out such a scheme and give rise to adaptive rules for determining the timing of switches to higher-fidelity approximations. Consequently, the framework of optimality functions provides a pathway to constructing implementable algorithms consisting only of a finite number of arithmetic operations and function evaluations† . In this paper, we review the notion of optimality functions and illustrate the vast number of possibilities through several examples. In a novel application to nonlinear programming, we establish the convergence of a primal interior point method in the absence of constraint qualifications and convexity assumptions. We show that lopsided convergence of bifunctions [6, 7] is a useful tool for analyzing optimality functions and the associated stationary points. In particular, we prove that lopsided convergence of certain bifunctions, defining optimality functions of approximating problems, to a bifunction associated with an optimality function of the original problem, guarantees the axiomatic requirements on optimality functions. Moreover, we provide characterizations of stationary points under perturbations and approximations using lopsided convergence. In the process, we extend the primary definitions and results on lopsided convergence in [6, 7] from finite dimensions to any metric space. The paper is organized as follows. Section 2 defines optimality functions and gives several examples. Section 3 introduces approximating optimization problems, epi-convergence, and consistent approximations as defined by corresponding optimality functions, and demonstrates the implication for algorithmic development. Section 4 develops lopsided convergence in the context of metric spaces. The paper ends by utilizing lopsided convergence in the context of optimality functions. † The distinction between implementable and conceptual algorithms appears to be due to E. Polak [11, 10]. 2 2 Optimality Functions: Definitions and Examples We consider optimization problems defined on a metric space (X , ρX ), where C ⊂ X is a nonempty feasible set and f : C → IR an objective function, i.e., problems of the form minimize f (x) subject to x ∈ C ⊂ X . The function f might be defined and finite-valued outside C, but that will be immaterial to the following treatment. Therefore, the notation f : C → IR specifies the components f and C of optimization problems of this form, without implying that f is necessarily finite only on C. We denote by inf C f ∈ [−∞, ∞) and argminC f ⊂ C the corresponding optimal value and set of optimal points, respectively, the latter possibly being empty. For ε ≥ 0, the set of ε-optimal solutions is denoted by ε- argminC f = {x ∈ C | f (x) ≤ inf C f + ε} . As usually, we say that x∗ ∈ X is locally optimal (for f : C → IR) if there exists a δ > 0 such that f (x∗ ) ≤ f (x) for all x ∈ C with ρX (x, x∗ ) ≤ δ. Throughout the paper, we have that C is a nonempty subset of X and IR− = [−∞, 0]. We characterize stationary points in terms of optimality functions as defined next. 2.1 Definition (optimality function) An upper semicontinuous function θ : X → IR− is an optimality function for f : C → IR if C ⊂ X ⊂ X and x ∈ C locally optimal for f : C → IR =⇒ θ(x) = 0. The corresponding sets of stationary points and quasi-stationary points are SC,θ = {x ∈ C | θ(x) = 0} and Qθ = {x ∈ X | θ(x) = 0}, respectively. A series of examples help illustrate the concept; see also §5 and [12, 14, 17, 5, 9]. Example 1: Constrained Optimization over Convex Set. Consider the case X = IRn , C ⊂ X closed and convex, and f : C → IR continuously differentiable. Then, the function { } 1 2 θ(x) = min ⟨∇f (x), y − x⟩ + ∥y − x∥ , x ∈ X = C, y∈C 2 satisfies the requirements of Definition 2.1 and is therefore an optimality function for f : C → IR. If C = IRn , then the expression simplifies to 1 θ(x) = − ∥∇f (x)∥2 , 2 (1) which, of course, corresponds to the classical condition ∇f (x) = 0. Example 2: Nonlinear Programming. Consider the case X = IRn , C = {x ∈ IRn | fj (x) ≤ 0, j = 1, ..., q}, and f , f1 , ..., fq real-valued and continuously differentiable on IRn . Let ψ(x) = maxj=1,...,q fj (x) 3 and ψ+ (x) = max{0, ψ(x)}. Then, the function { 1 θ(x) = minn max −ψ+ (x) + ⟨∇f (x), y − x⟩ + ∥y − x∥2 , y∈IR 2 } 1 max {fj (x) − ψ(x)+ + ⟨∇fj (x), y − x⟩} + ∥y − x∥2 , x ∈ X = IRn , j=1,...,q 2 (2) satisfies the requirements of Definition 2.1 and is therefore an optimality function for f : C → IR. The condition θ(x) = 0 is equivalent to the Fritz-John conditions in the sense that when x ∈ C, θ(x) = 0 ⇐⇒ there exist µ0 , µ1 , ..., µq ≥ 0, with q ∑ µj = 1, j=0 such that µ0 ∇f (x) + q ∑ µj ∇fj (x) = 0 j=1 q ∑ µj fj (x) = 0. j=1 However, since θ is defined beyond C, it might also be associated with quasi-stationary points outside C. We refer to [12, Theorem 2.2.8] for proofs and further discussion. Example 3: Minimax Problem. Consider the case X = C = IRn and f (x) = maxz∈Z φ(x, z), x ∈ IRn , where φ : IRn × IRp → IR is continuous, the gradient ∇x φ : IRn × IRp → IRn with respect to the first argument exists and is continuous in both arguments, and Z is a compact subset of IRp . Then, the function { } 1 2 θ(x) = minn max φ(x, z) − f (x) + ⟨∇x φ(x, z), y − x⟩ + ∥y − x∥ , x ∈ X = IRn , (3) y∈IR z∈Z 2 satisfies the requirements of Definition 2.1 and is therefore an optimality function for f : IRn → IR. Moreover, θ(x) = 0 if and only if 0 ∈ ∂f (x) (the subdifferential of f ); see [12, Theorem 3.1.6] for details. We note that the upper semicontinuity of optimality functions ensures the computationally significant property that if a sequence xν → x and θ(xν ) ↗ 0, for example with {xν } obtained as approximate solutions of a corresponding optimization problem with gradually smaller tolerance, then θ(x) = 0 and x ∈ Qθ , i.e., x is quasi-stationary. Although not discussed further here, the optimality functions in Examples 1-3, and others, are also instrumental in constructing descent directions for the respective optimization problems; see [12] for details. 3 Approximations and Implementable Algorithms Problems involving functions defined in terms of integrals or optimization problems (as the maximization in Example 3), functions defined on infinite-dimensional spaces, and/or feasible sets defined by an infinite number of constraints almost always require approximations. For example, one might 4 resort to an approximating space X ν ⊂ X with points characterized by a finite number of parameters. Here, the superscript ν indicates that we might consider a family of such approximating spaces, ν ∈ IN = {1, 2, ..., }, with usually ∪ν∈IN X ν dense in X . A feasible set C ν ⊂ X ν may be an approximation of C or simply C ν = C ∩ X ν ; see §5 for a concrete illustration in the area of optimal control. A function f ν : C ν → IR could be a tractable approximation of f : C → IR. An example helps illustrate the situation. Example 3: Minimax Problem (cont.). Suppose that f ν (x) = maxz∈Z ν φ(x, z), x ∈ IRn , with Z ν ⊂ Z consisting of a finite number of points. Clearly, f ν is a (lower bounding) approximation of f = maxz∈Z φ(·, z) as defined above. The function f ν : IRn → IR can be associated with the optimality function { } 1 θν (x) = minn maxν φ(x, z) − f ν (x) + ⟨∇x φ(x, z), y − x⟩ + ∥y − x∥2 , x ∈ X = IRn , y∈IR z∈Z 2 which, as formalized in §5, approximates the optimality function θ in (3). We note that θν can be evaluated in finite time by solving a convex quadratic program with linear constraints; see [12, Theorem 2.1.6]. We next examine approximating functions f ν : C ν → IR and review the notion of epi-convergence, which provides a path to establishing that optimal points of the corresponding approximating problems indeed approximate optimal points of an original problem. To establish the analogous results for stationary points, we turn to optimality functions and slightly extend the approach in [12, Section 3.3] by considering arbitrary metric spaces and other minor generalizations. The section ends with a result that facilitates the development of implementable algorithms for the minimization of f : C → IR, which is then illustrated with the construction of an interior point method. Throughout the paper, we have that C ν is a nonempty subset of X . 3.1 Epi-Convergence We recall that epi-convergence, as defined next, is the key property when examining approximations of optimization problems; see [1, 3, 13] for more comprehensive treatments. 3.1 Definition (epi-convergence) The functions {f ν : C ν → IR}ν∈IN epi-converge to f : C → IR if (i) for every sequence xν → x ∈ X , with xν ∈ C ν , we have that liminf f ν (xν ) ≥ f (x) if x ∈ C and f ν (xν ) → ∞ otherwise; (ii) for every x ∈ C, there exists a sequence {xν }ν∈IN , with xν ∈ C ν , such that xν → x and limsup f ν (xν ) ≤ f (x). A main consequence of epi-convergence is the following well-known result. 3.2 Theorem (convergence of minimizers) Suppose that the functions {f ν : C ν → IR}ν∈IN epi-converge to f : C → IR. Then, limsup (inf C ν f ν ) ≤ inf C f. Moreover, if xk ∈ argmaxC νk f νk and xk → x for some increasing subsequence {ν1 , ν2 , ...} ⊂ IN , then x ∈ argmaxC f and limk→∞ inf C νk f νk = inf C f . 5 Proof. The second part is essentially in [2, Theorem 2.5], but not for the finite-valued setting. The first and second parts are in [6, Theorem 2.6] for the IRn case. The proof carries over essentially verbatim. A strengthening of the notion of epi-convergence ensures the convergence of infima. 3.3 Definition (tight epi-convergence) The functions {f ν : C ν → IR}ν∈IN epi-converge tightly to f : C → IR if f ν epi-converge to f and for all ε > 0, there exists a compact set Bε ⊂ X and an integer νε such that inf Bε ∩C ν f ν ≤ inf C ν f ν + ε for all ν ≥ νε . 3.4 Theorem (convergence of infima) Suppose that the functions {f ν : C ν → IR}ν∈IN epi-converge to f : C → IR and inf C f is finite. Then, they epi-converge tightly (i) if and only if inf C ν f ν → inf C f . (ii) if and only if there exists a sequence εν ↘ 0 such that εν - argmin f ν set-converges‡ to argmin f . Cν C Proof. Again, the proof in [6, Theorem 2.8] can be immediately translated to the present setting. 3.2 Consistent Approximations The convergence of optimal points as stipulated above is fundamental, but an analogous result for stationary points is also important, especially for nonconvex problems. Optimality functions play a central role in the development of such results. Combining epi-convergence with a limiting property for optimality functions lead to consistent approximations in the sense of E. Polak as defined next. We note that our definition of consistent approximations is an extension from that in [12, Section 3.3] as we consider arbitrary metric spaces and not only normed linear spaces. 3.5 Definition (consistent approximations) The pairs {(f ν : C ν → IR, θν : X ν → IR− )}ν∈IN of functions and corresponding optimality functions are weakly consistent approximations of the function and optimality-function pair (f : C → IR, θ : X → IR− ) if (i) {f ν : C ν → IR}ν∈IN epi-converge to f : C → IR and (ii) for every xν → x ∈ X , with xν ∈ X ν , limsup θν (xν ) ≤ θ(x) if x ∈ X, and θν (xν ) → −∞ otherwise. If in addition θν (x) < 0 for all x ∈ X ν \C ν and ν, then {(f ν : C ν → IR, θν : X ν → IR− )}ν∈IN are consistent approximations of (f : C → IR, θ : X → IR− ). We recall that the epigraph of f : C → IR is defined by epi f = {(x, x0 ) ∈ X × IR | x ∈ C, f (x) ≤ x0 }. ‡ We recall that the outer limit of a sequence of sets {Aν }ν∈IN , denoted by limsup Aν , is the collection of points y to which a subsequence of {y ν }ν∈IN , with y ν ∈ Aν , converges. The inner limit, denoted by liminf Aν , is the points to which a sequence of {y ν }ν∈IN , with y ν ∈ Aν , converges. If both limits exist and are identical, we say that the set is the Painlev´e-Kuratowski limit of {Aν }ν∈IN and that Aν set-converges to this set; see [4, 13]. 6 Since epi-convergence is equivalent to the set-convergence§ of the corresponding epigraphs, we have that Definition 3.5(i) amounts to epi f ν set-converges to f. Similarly, the hypograph of f : C → IR is defined by hypo f = {(x, x0 ) ∈ X × IR | x ∈ C, f (x) ≥ x0 }. In view of the definition of set-convergence, we therefore have that Definition 3.5(ii) amounts to limsup hypo θν ⊂ hypo θ. The additional condition in Definition 3.5 removing “weakly” can be viewed as a constraint qualification as it eliminates the possibility of quasi-stationary points that are not stationary point for f ν : C ν → IR, which might occur if the domain of θν is not restricted to C ν or other conditions are included. The main consequence of consistency is given next. 3.6 Theorem (convergence of stationary points) Suppose that the pairs {(f ν : C ν → IR, θν : X ν → IR− )}ν∈IN are weakly consistent approximations of (f : C → IR, θ : X → IR− ) and {xν }ν∈IN , xν ∈ X ν , is a sequence satisfying θν (xν ) ≥ −εν for all ν, with εν ≥ 0 and εν → 0. Then, every cluster point x of {xν }ν∈IN satisfies x ∈ Qθ , i.e., θ(x) = 0. If in addition the pairs are consistent approximations, εν = 0 for sufficiently large ν, and {f ν (xν )}ν∈IN is bounded from above, then x ∈ C, i.e., x ∈ SC,θ . Proof. Suppose that xν → x. Since −εν ≤ θν (xν ) for all ν, x ∈ X. Moreover, 0 ≤ limsup θν (xν ) ≤ θ(x) ≤ 0 and the first conclusion follows. In view of the definition of consistent approximations, we find that θν (xν ) = 0 for sufficiently large ν and therefore xν ∈ C ν for such ν. The epi-convergence of f ν : C ν → IR to f : C → IR implies that liminf f ν (xν ) ≥ f (x) if x ∈ C and f ν (xν ) → ∞ if x ̸∈ C. The latter possibility is ruled about by assumption and therefore x ∈ C. 3.3 Algorithms Theorem 3.6 provides a direct path to the construction of an implementable algorithm for minimizing f : C → IR. Specifically, construct a family of approximations {f ν : C ν → IR} and corresponding optimality functions {θν : X ν → IR− }, and then implement the following algorithm. Algorithm. 1. Set εν ≥ 0, with εν → 0, and ν = 1. 2. Obtain an approximate (quasi-)stationary point xν for f ν : C ν → IR that satisfies θν (xν ) ≥ −εν . § Here, we consider set-convergence of subsets of X × IR, which is equipped with the metric ρ((x, x0 ), (x′ , x′0 )) = max{ρX (x, x′ ), |x0 − x′0 |} for x, x′ ∈ X and x0 , x′0 ∈ IR. 7 3. Replace ν by ν + 1 and go to Step 2. If the pairs {(f ν : C ν → IR, θν : X ν → IR− )}ν∈IN are weakly consistent approximations of (f : C → IR, θ : X → IR− ), then every cluster point of the constructed sequence {xν } will be quasi-stationary for f : C → IR by Theorem 3.6. The algorithm is fully implementable under the practically reasonable assumption that one can obtain an approximate quasi-stationary point of f ν : C ν → IR in finite time. Example 2: Nonlinear Programming (cont.). Consider the standard logarithmic barrier approximation f ν (x) = f (x) − tν q ∑ log[−fj (x)], x ∈ C ν = {x ∈ IRn | fj (x) < 0, j = 1, ..., q}, j=1 where tν ↘ 0. We first establish epi-convergence of f ν : C ν → IR to f : C → IR. Suppose that xν → x, with xν ∈ C ν . Since C ν ⊂ C and C is closed, x ∈ C. Let ε > 0. There exists a νε such that −tν log[−fj (xν )] > −ε/q for all j with log[−fj (x)] ≥ 0 and ν ≥ νε . Hence, f ν (xν ) ≥ f (xν ) − ε for all ν ≥ νε . In view of the continuity of f and the fact that ε is arbitrary, we conclude that Definition 3.1(i) is satisfied. Next, let x∑∈ C. There exists a sequence {xν }ν∈IN such that xν ∈ C ν tends to x sufficiently slowly such that tν qj=1 log[−fj (xν )] → 0. Consequently, f ν (xν ) → f (x), which satisfies Definition 3.1(ii). Thus, f ν : C ν → IR epi-converge to f : C → IR. We next analyze optimality functions. Using a minmax theorem, one can show that (2) is equivalently expressed as 2 q q ∑ ∑ 1 n θ(x) = − min µ0 ψ+ (x) + µj [ψ+ (x) − fj (x)] + µ0 ∇f (x) + µj ∇fj (x) , x ∈ X = IR µ∈M 2 j=1 j=1 (4) ∑ where M = {(µ0 , µ1 , ..., µq ) | µj ≥ 0, j = 0, 1, ..., q, qj=0 = 1}; see [12, Theorem 2.2.8]. By (1) and direct differentiation of f ν , we obtain an approximating optimality function 2 q ∑ 1 ν ν ν θ (x) = − ∇f (x) + mj (x)∇fj (x) , x∈C , 2 j=1 where mνj (x) = −tν . fj (x) Suppose that xν → x ∈ IRn , with xν ∈ C ν . Since xν ∈ C ν ⊂ C and C is closed, x ∈ C. Let cν = 1 + q ∑ j=1 mνj (xν ), µν0 = mνj (xν ) 1 ν , and µ = , j = 1, ..., q. j cν cν Consequently, µν = (µν0 , µν1 , ..., µνq ) ∈ M for all ν. Since M is compact, {µν } has at least one convergent subsequence. Suppose that µν →N µ∞ , with N an infinite subsequence of IN . If j is such that fj (x) < 0, 8 then µνj →N 0 and consequently µ∞ j = 0 necessarily. In view of the continuity of the gradients, we then have that 2 2 q q ν (xν ) ν ν ∑ ∑ m θ (x ) 11 1 ∞ j ν ν N ∞ . = − ∇f (x ) + ∇f (x ) → − µ ∇f (x) + µ ∇f (x) j j 0 j ν (cν )2 2 cν 2 c j=1 j=1 Since x ∈ C, ψ+ (x) = 0. Therefore we also have that θν (xν ) (cν )2 →N 2 q ∑ 1 ∞ ∞ ∞ ∞ −µ0 ψ+ (x) − µj [ψ+ (x) − fj (x)] − µ0 ∇f (x) + µj ∇fj (x) ≤ θ(x), 2 j=1 j=1 q ∑ where the inequality follows from the fact that µ∞ ∈ M furnishes a possibly suboptimal solution in (4). Because θν (xν ) ≤ 0 and (cν )2 ≥ 1, the inequality remains valid when we drop the denominator on the left-hand side. Hence, we have shown that limsup θν (xν ) ≤ θ(x). This establishes the consistency of {f ν : C ν → IR, θν : C ν → IR− )}. Consequently, the above algorithm, which can then be viewed as a primal interior point method, generates cluster points that are stationary for f : C → IR in the sense of Fritz-John. We observe that this is achieved without any constraint qualifications and convexity assumptions. In this case, Step 2 of the algorithm can be achieved by any of the standard unconstrained optimization methods in finite time. The key technical challenge associate with the above scheme is to establish (weak) consistency. In the next section, we provide tools for this purpose that rely on lopsided convergence. 4 Lopsided Convergence In view of the definition of optimality functions, it is apparent that if Qθ ̸= ∅, then Qθ = argmaxX θ. Moreover, Examples 1-3 indicate that many optimality functions take the form θ(x) = inf F (x, y), with Y ⊂ Y y∈Y (5) for some metric space (Y, ρY ) and function F . In fact, in our examples, Y = IRn and F involves gradients and other quantities; §5 provides an example in infinite dimensions. From these observations it is apparent that the consideration of maxinf-problems of the form max inf F (x, y) x∈X y∈Y for bifunction F : X ×Y → IR will provide direct insight about (quasi-)stationary points of optimization problems. We therefore set out to describe the fundamental tool for examining the convergence of such maxinf-problems, which is lopsided convergence. In the process, we extend some of the results in [6, 7] to general metric spaces. 9 Suppose that (X , ρX ) and (Y, ρY ) are metric spaces, X ⊂ X and Y ⊂ Y are nonempty, and F : X × Y → IR is a bifunction. We say that x∗ is a maxinf-point of F if { } ∗ x ∈ argmax inf F (x, y) . x∈X y∈Y The study of such functions is facilitated by the notion of lopsided convergence as defined next. 4.1 Definition (lopsided convergence) The bifunctions {F ν : X ν × Y ν → IR}ν∈IN lop-converge to F : X × Y → IR if (i) for all y ∈ Y and xν → x ∈ X , with xν ∈ X ν , there exists y ν → y, with y ν ∈ Y ν , such that limsup F ν (xν , y ν ) ≤ F (x, y) if x ∈ X and F ν (xν , y ν ) → −∞ otherwise. (ii) for all x ∈ X, there exists xν → x, with xν ∈ X ν , such that for all y ν → y ∈ Y, with y ν ∈ Y ν , liminf F ν (xν , y ν ) ≥ F (x, y) if y ∈ Y and F ν (xν , y ν ) → ∞ otherwise. We assume throughout that the sets X ν ⊂ X and Y ν ⊂ Y are nonempty. We start with a preliminary result. 4.2 Proposition (epi-convergence of slices) Suppose that the bifunctions {F ν : X ν × Y ν → IR}ν∈IN lop-converge to F : X × Y → IR. Then, for all x ∈ X, there exists xν → x, with xν ∈ X ν such that the functions F ν (xν , ·) epi-converge to F (x, ·). Proof. We follow the same arguments as in [6, Proposition 3.2], where X = IRn is considered. From Definition 4.1(ii) there exists xν → x, with xν ∈ X ν , such that the functions {F ν (xν , ·)}ν∈IN and F (x, ·) satisfy Definition 3.1(i). From Definition 4.1(i), for any y ∈ Y and xν → x, with xν ∈ X ν , one can find y ν → y, with y ν ∈ Y ν , such that Definition 3.1(ii) is also satisfied. We recall that the inf-projections of the bifunctions F ν : X ν × Y ν → IR and F : X × Y → IR are defined as the functions h(x) = inf F (x, y), for x ∈ X, and hν (x) = inf ν F ν (x, y), for x ∈ X. y∈Y y∈Y In addition to their overall interest, inf-projections of bifunctions are central to the study of optimality functions as clearly highlighted by (5). 4.3 Theorem (containment of inf-projections) Suppose that the bifunctions {F ν : X ν × Y ν → IR}ν∈IN lop-converge to F : X × Y → IR and −∞ < inf Y F (x, ·) for some x ∈ X. Then, the inf-projections hν : X ν → [−∞, ∞) and h : X → [−∞, ∞) satisfy limsup hypo hν ⊂ hypo h. Proof. Suppose that (x, x0 ) ∈ limsup hypo hν . Then there exists a sequence {(xν , xν0 )}ν∈N , with N an infinite subsequence of IN , xν ∈ X ν , hν (xν ) ≥ xν0 , xν →N x, and xν0 →N x0 . If x ̸∈ X, then take y ∈ Y and construct a sequence y ν → y, with y ν ∈ Y ν , such that F ν (xν , y ν ) →N −∞, which exists by Definition 4.1(i). However, xν0 ≤ hν (xν ) ≤ F ν (xν , y ν ), ν ∈ N, 10 imply to a contradiction since xν0 →N x0 ∈ IR. Thus, x ∈ X. If h(x) = −∞, then there exists y ∈ Y such that F (x, y) ≤ x0 − 1. Definition 4.1(i) ensures that there exists a sequence y ν → y, with y ν ∈ Y ν , such that limsup F ν (xν , y ν ) ≤ F (x, y). Consequently, x0 = limsupν∈N xν0 ≤ limsupν∈N hν (xν ) ≤ limsupν∈N F ν (xν , y ν ) ≤ F (x, y) ≤ x0 − 1, which is a contradiction. Hence, it suffices to consider the case with h(x) finite. Given any ε > 0 arbitrarily small, pick yε ∈ Y such that F (x, yε ) − ε ≤ h(x). Then Definition 4.1(i) again yields y ν → yε , with y ν ∈ Y ν , such that limsupν∈N hν (xν ) ≤ limsupν∈N F ν (xν , y ν ) ≤ F (x, yε ) ≤ h(x) + ε, implying limsupν∈N hν (xν ) ≤ h(x). Since x0 = limsupν∈N xν0 ≤ limsupν∈N hν (xν ) ≤ h(x), the conclusion follows. Additional results can be obtained under a strengthening of the lopsided convergence analogous to tight epi-convergence. 4.4 Definition (ancillary-tight lop-convergence) The lop-convergence of bifunctions {F ν : X ν × Y ν → IR}ν∈IN to F : X × Y → IR is ancilliary-tight if Definition 4.1 holds and for any ε > 0 one can find a compact set Bε ⊂ Y, depending possibly on the sequence xν → x selected in Definition 4.1(ii), such that inf F ν (xν , y) ≤ inf ν F ν (xν , y) + ε for sufficiently large ν. ν y∈Y ∩Bε y∈Y Under ancillary-tight lop-convergence, we can strengthen the conclusion of Theorem 4.3 as follows. 4.5 Theorem (hypo-convergence of inf-projections) Suppose that the bifunctions {F ν : X ν × Y ν → IR}ν∈IN lop-converge ancillary-tightly to F : X × Y → IR and −∞ < inf Y F (x, ·) for some x ∈ X. Then, the corresponding inf-projections hν : X ν → [−∞, ∞) hypo-converges to the inf-projection h : X → [−∞, ∞), i.e., hypo hν set-converges to hypo h. Proof. Since it is a very short proof, we include it for completeness sake. It is verbatim the same as that of [6, Theorem 3.4]. Let x ∈ X be such that h(x) is finite. Now, choose xν → x, with xν ∈ X ν , such that F ν (xν , ·) epi-converge to F (x, ·), cf. Proposition 4.2. In fact, they epi-converge tightly as an immediate consequence of ancillary-tightness. Thus, hν (xν ) = inf ν F ν (xν , y ν ) → inf F (x, y) = h(x), y∈Y y∈Y via Theorem 3.4. Further strengthening of the notion is also beneficial. 4.6 Definition (tight lop-convergence) The lop-convergence of bifunctions {F ν : X ν × Y ν → IR}ν∈IN to F : X × Y → IR is tight if Definition 4.4 holds and for any ε > 0 one can find a compact set Aε ⊂ X such that sup inf ν F ν (x, y) ≥ sup inf ν F ν (x, y) − ε for sufficiently large ν. x∈X ν ∩Aε y∈Y x∈X ν y∈Y 11 Under tight lop-convergence, we can strengthen the conclusion of Theorem 4.5 as follows. 4.7 Theorem (approximating maxinf-points) Suppose that the bifunctions {F ν : X ν × Y ν → IR}ν∈IN lop-converge tightly to F : X × Y → IR and supX inf Y F is finite. Then, sup inf ν F ν (x, y) → sup inf F (x, y). x∈X ν y∈Y x∈X y∈Y Moreover, for every x∗ ∈ argmaxx∈X inf y∈Y F (x, y), there exist an infinite subsequence N of IN , {εν }ν∈N , with εν ↘ 0, and {xν }ν∈N , with xν ∈ εν - argmaxx∈X ν inf y∈Y ν F ν (x, y), such that xν →N x. Conversely, if such sequences exists, then supx∈X ν inf y∈Y ν F ν (x, y) →N inf y∈Y F (x∗ , ·) Proof. We refer to the arguments of the proof of [6, Theorem 3.7]. 5 Applications and Further Examples We now return to the context of optimality functions of the form (5). We start with providing a sufficient condition for the required upper semicontinuity of an optimality function; see Definition 2.1. We state the result for general inf-projections. 5.1 Theorem (upper semicontinuity of inf-projection) If the bifunction F : X × Y → IR is lower semicontinuous, F (·, y) is upper semicontinuous for all y ∈ Y , and Y is closed, then the corresponding inf-projection h(x) = inf Y F (x, ·), x ∈ X, is upper semicontinuous. Proof. Let xν → x ∈ X, with xν ∈ X. We show that the functions F (xν , ·) epi-converge to F (x, ·), which are all defined on Y . Suppose that y ν → y, with y ν ∈ Y . By the closedness of Y , y ∈ Y . Moreover, liminf F (xν , y ν ) ≥ F (x, y) by the lower semicontinuity of F at (x, y). Consequently, part (i) of Definition 3.1 is satisfied. Next, let y ν = y ∈ Y . Then, limsup F (xν , y ν ) = limsup F (xν , y) ≤ F (x, y) by the upper semicontinuity of F (·, y) and part (ii) of Definition 3.1 is satisfied. Since F (xν , ·) epiconverge to F (x, ·), Theorem 3.2 demonstrates that limsup h(xν ) ≤ h(x). Applications of this theorem to Examples 1-3 establish the upper semicontinuity of the corresponding optimality functions. We next turn to the requirement for consistency in Definition 3.5(ii). 5.2 Theorem (sufficient condition for consistency, optimality function part) Suppose that the bifunctions {F ν : X ν × Y ν → IR}ν∈IN lop-converge to F : X × Y → IR and that the bifunctions define the optimality functions θν = inf y∈Y ν F ν (·, y) and θ = inf y∈Y F (·, y), with −∞ < θ(x) for some x ∈ X. Then, for every xν → x ∈ X , with xν ∈ X ν , limsup θν (xν ) ≤ θ(x) if x ∈ X, and θν (xν ) → −∞ otherwise. Proof. The result is a direct consequence of Theorem 4.3. In view of this result, it is clear that (weak) consistency will be ensured by epi-convergence of the approximating objective functions and feasible sets as well as lopsided convergence of the approximating bifunctions defining the corresponding optimality functions. 12 We illustrate Theorem 5.2 in the context of Example 3. Example 3: Minimax Problem (cont.). Suppose that for every z ∈ Z, there exists a sequence z ν ∈ Z ν such that z ν → z. Let { } 1 ν ν 2 F (x, y) = maxν φ(x, z) − f (x) + ⟨∇x φ(x, z), y − x⟩ + ∥y − x∥ , x, y ∈ IRn , z∈Z 2 and F be defined similarly with the superscripts removed. We next show lopsided convergence of F ν to F . First consider part (i) of Definition 4.1. Let y ∈ IRn and xν → x ∈ IRn . Set y ν = y for all ν. Clearly, limsup F ν (xν , y ν ) ≤ limsup F (xν , y) = F (x, y) by the continuity of F and part (i) holds. Second, we consider part (ii). Let x ∈ IRn and y ν → y ∈ IRn . Set xν = x for all ν. Let { } 1 2 zx ∈ argmaxz∈Z φ(x, z) − f (x) + ⟨∇x φ(x, z), y − x⟩ + ∥y − x∥ . 2 Let ε > 0. By assumption on Z ν and the continuity of φ(x, ·) and ∇x φ(x, ·), there exists z ν ∈ Z ν and ν0 such that φ(x, z ν ) − φ(x, zx ) > −ε } { ε ∥∇x φ(x, z ν ) − ∇x φ(x, zx )∥ < min ε, ∥y − x∥ for all ν ≥ ν0 . Consequently, ν ≥ ν0 , F ν (xν , y ν ) = F ν (x, y ν ) { } 1 = maxν φ(x, z) − f ν (x) + ⟨∇x φ(x, z), y ν − x⟩ + ∥y ν − x∥2 z∈Z 2 1 ≥ φ(x, z ν ) − f (x) + ⟨∇x φ(x, z ν ), y ν − x⟩ + ∥y ν − x∥2 2 1 = φ(x, zx ) − f (x) + ⟨∇x φ(x, zx ), y − x⟩ + ∥y − x∥2 + φ(x, z ν ) − φ(x, zx ) 2 1 1 ν + ⟨∇x φ(x, z ) − ∇x φ(x, zx ), y − x⟩ + ⟨∇x φ(x, z ν ), y ν − y⟩ + ∥y ν − x∥2 − ∥y − x∥2 2 2 1 > φ(x, zx ) − f (x) + ⟨∇x φ(x, zx ), y − x⟩ + ∥y − x∥2 2 1 ν 1 ν ν − ε − ε + ⟨∇x φ(x, z ), y − y⟩ + ∥y − x∥2 − ∥y − x∥2 2 2 1 1 = F (x, y) − 2ε + ⟨∇x φ(x, z ν ), y ν − y⟩ + ∥y ν − x∥2 − ∥y − x∥2 2 2 Since y ν → y, {z ν } is bounded, and ∇x φ is continuous, it follows that liminf F ν (xν , y ν ) ≥ F (x, y) − 2ε. Since ε was arbitrary, part (ii) of Definition 4.1 holds and F ν therefore lop-converge to F . In view of Theorem 5.2 and the fact that epi-convergence is also easily established, we have that {(f ν : IRn → IR, θν : IRn → IR− )} are consistent approximations of {(f : IRn → IR, θ : IRn → IR− )} in this case. The above algorithm therefore is implementable for the solution of the semiinfinite minimax problem minx∈IRn maxz∈Z φ(x, z). 13 As we see next, under slightly stronger assumptions, the approximating bifunctions do not need to be associated with an optimality function to achieve convergence to quasi-stationary points. 5.3 Theorem (convergence to quasi-stationary points) Suppose that the bifunctions {F ν : X ν × Y ν → IR}ν∈IN lop-converge ancillary-tightly to F : X ×Y → IR and θ = inf y∈Y F (·, y) is an optimality function for f : C → IR with Qθ ̸= ∅. If xν ∈ argmaxx∈X ν inf y∈Y ν F ν (x, y) for all ν, then every cluster point x of {xν }ν∈IN is quasi-stationary for f : C → IR, i.e., x ∈ Qθ . Proof. By Theorem 4.5, hν = inf y∈Y ν F ν (·, y) hypo-converge to h = inf y∈Y F (·, y). By translating Theorem 3.2 from the stated minimization framework to a maximization framework we obtain that x ∈ argmaxX h. Since Qθ ̸= ∅, Qθ = argmaxX h and the conclusion follows. Further characterization of (quasi-)stationary points is available under tight lopsided convergence. 5.4 Theorem (characterization of quasi-stationary points) Suppose that the bifunctions {F ν : X ν × Y ν → IR}ν∈IN lop-converge tightly to F : X × Y → IR and θ = inf y∈Y F (·, y) is an optimality function for f : C → IR with Qθ ̸= ∅. For every x ∈ Qθ there exist an infinite subsequence N of IN , {εν }ν∈N , with εν ↘ 0, and {xν }ν∈N , with xν ∈ εν - argmaxx∈X ν inf y∈Y ν F ν (x, y), such that xν →N x. Proof. The result is a direct consequence of Theorem 4.7. We end the paper with an example from the area of optimal control and adjust the notation accordingly. Example 4: Optimal Control. We here follow the set-up in Section 5.6 and Chapter 4 of [12], which contain further details. For g : IRn × IRm → IRn , we consider the dynamical system x(t) ˙ = g(x(t), u(t)), for t ∈ [0, 1], with x(0) = ξ ∈ IRn , m where the control u ∈ Lm ∞ = {u : [0, 1] → IR | measurable, essentially bounded}. Since such controls are contained in the space of square-integrable functions from [0, 1] to IRm , the usual L2 -norm applies; see [12, p.709] for a motivation for this “hybrid” set-up. Let H = IRn × Lm ∞ . For initial condition and ¯ control pairs η = (ξ, u) ∈ H and η¯ = (ξ, u ¯) ∈ H, we equip H with the inner product and norm ∫ ¯ + ⟨η, η¯⟩H = ⟨ξ, ξ⟩ 0 1 ⟨u(t), u ¯(t)⟩dt and ∥η∥2H = ⟨η, η⟩H . We consider control constraints of the form u(t) ∈ C, for almost every t ∈ [0, 1] for some given convex and compact set C ⊂ IRm . By imposing the constraints for almost every t instead of every t, we deviate slightly from [12] and follow [9]. We therefore also define the feasible set n U = Lm ∞ ∩ {u | u(t) ∈ C, for almost every t ∈ [0, 1]} and H = IR × U. Under standard assumptions, a solution of the differential equation, for a given η ∈ H, denoted by xη is unique, Lipschitz continuous, and Gateaux differentiable in η. Consequently, for a given φ : IRn ×IRn → IR, Lipschitz continuously differentiable on bounded sets, the function f : H → IR defined by f (η) = φ(ξ, xη (1)), for η = (ξ, u) ∈ H, 14 has a Gateaux differential of the form ⟨∇f (η), η¯ − η⟩H for some Lipschitz continuous gradient ∇f (η) given in [12, Corollary 5.6.9]. The optimal control problem minimize f (η) subject to η ∈ H, analogous to Example 1, has an optimality function θ(η) = min F (η, η¯), for η ∈ H, η¯∈H where 1 F (η, η¯) = ⟨∇f (η), η¯ − η⟩H + ∥¯ η − η∥2H , for η, η¯ ∈ H. 2 We next consider approximations. Let U ν ⊂ U , ν ∈ IN , consist of the piecewise constant functions that are constant on each of the intervals [(k − 1)/ν, k/ν), k = 1, ..., ν. Set H ν = IRn × U ν . Moreover, let xνη be the (unique) solution of the forward Euler approximation of the differential equation, using time-step 1/ν, given input η = (ξ, u) ∈ H. An approximate problem then takes the form minimize f ν (η) subject to η ∈ H ν , where f ν (η) = φ(ξ, xνη (1)). One can show that θν (η) = minν F ν (η, η¯), for η ∈ H ν , η¯∈H where 1 η − η∥2H , for η, η¯ ∈ H ν , F ν (η, η¯) = ⟨∇f ν (η), η¯ − η⟩H + ∥¯ 2 is an optimality function of f ν : H ν → IR, where the Lipschitz continuous gradient ∇f ν (η) is given in [12, Theorem 5.6.19]. By [12, Theorem 4.3.2], for every bounded set S ⊂ H, there exists a CS < ∞ such that |f (η) − f ν (η)| ≤ CS /ν and ∥∇f (η) − ∇f ν (η)∥H ≤ CS /ν for all η ∈ S. Moreover, ∪ν∈IN H ν is dense in H. Consequently, it is easily established that f ν : H ν → IR epi-converge to f : H → IR. We next consider the optimality functions. Let η¯ ∈ H and η ν → η ∈ H, with η ν ∈ H ν . Necessarily, η ∈ H. Due to the density result, there exists η¯ν → η¯, with η¯ν ∈ H ν . Hence, |F ν (η ν , η¯ν ) − F (η, η¯)| ≤ ∥∇f ν (η ν ) − ∇f (η)∥H ∥¯ η ν − η ν ∥H 1 ν 1 ν η − η ν ∥2H − ∥¯ η − η ν ∥2H → 0 + ∥∇f (η)∥H ∥¯ η ν − η ν − η¯ + η∥H + ∥¯ 2 2 and we have shown Definition 4.1(i). Using similar arguments, we also establish part (ii) and the lopsided convergence of F ν to F . Consequently, {(f ν : H ν → IR, θν : H ν → IR− )}ν∈IN are consistent approximations of (f : H → IR, θ : H → IR− ). Since the minimization of f ν : H ν → IR is equivalent to an optimization problem on a Euclidean space, the above algorithm is implementable for the infinitedimensional problem f : H → IR. 15 References [1] H. Attouch. Variational Convergence for Functions and Operators. Applicable Mathematics Sciences. Pitman, 1984. [2] H. Attouch and R. Wets. A convergence theory for saddle functions. Transactions of the American Mathematical Society, 280(1):1–41, 1983. [3] J.-P. Aubin and H. Frankowska. Set-Valued Analysis. Birkh¨auser, 1990. [4] G. Beer. Topologies on Closed and Closed Convex Sets, volume 268 of Mathematics and its Applications. Kluwer, 1992. [5] J.C. Foraker, J.O. Royset, and I. Kaminer. Search-trajectory optimization: Part 1, formulation and theory. in review. [6] A. Jofre and R. J-B Wets. Variational convergence of bivariate functions: Lopsided convergence. Mathematical Programming B, 116:275–295, 2009. [7] A. Jofre and R. J-B Wets. Variational convergence of bifunctions: Motivating applications. SIAM J. Optimization, 24(4):1952–1979, 2014. [8] R. Pasupathy. On choosing parameters in retrospective-approximation algorithms for stochastic root finding and simulation optimization. Operations Research, 58(4):889–901, 2010. [9] C. Phelps, J.O. Royset, and Q. Gong. Optimal control of uncertain systems using sample average approximations. in review. [10] E. Polak. Computational Methods in Optimization. A Unified Approach. Academic Press, 1971. [11] E. Polak. On the use of models in the synthesis of optimization algorithms. In H. Kuhn and G. Szego, editors, Differential Games and Related Topics, pages 263–279. North Holland, Amsterdam, 1971. [12] E. Polak. Optimization. Algorithms and Consistent Approximations, volume 124 of Applied Mathematical Sciences. Springer, 1997. [13] R.T. Rockafellar and R. Wets. Variational Analysis, volume 317 of Grundlehren der Mathematischen Wissenschaft. Springer, 3rd printing-2009 edition, 1998. [14] J. O. Royset. Optimality functions in stochastic programming. 135(1):293–321, 2012. Mathematical Programming, [15] J. O. Royset. On sample size control in sample average approximations for solving smooth stochastic programs. Computational Optimization and Applications, 55(2):265–309, 2013. [16] J. O. Royset and R. Szechtman. Optimal budget allocation for sample average approximation. Operations Research, 61:762–776, 2013. 16 [17] J.O. Royset and E.Y. Pee. Rate of convergence analysis of discretization and smoothing algorithms for semi-infinite minimax problems. Journal of Optimization Theory and Applications, 155(3):855– 882, 2012. 17
© Copyright 2024