Functional Analysis and Optimization

Kazufumi Ito∗

November 4, 2014

Abstract

In this monograph we develop the function space method for optimization problems and operator equations in Banach spaces. Optimization is one of the key components of mathematical modeling of real world problems, and the solution method provides an accurate and essential description and validation of the mathematical model. Optimization problems are encountered frequently in engineering and the sciences and have widespread practical applications. In general we optimize an appropriately chosen cost functional subject to constraints, for example equality constraints and inequality constraints. The problem is cast in a function space, and it is important to formulate the problem in a proper function space framework in order to develop an accurate theory and effective algorithms. Nonsmooth optimization has become a very basic modeling tool and enlarges and enhances the applications of the optimization method in general. For example, in the classical variational formulation we analyze the non-Newtonian energy functional and nonsmooth friction penalties, and for imaging/signal analysis sparsity optimization is used by means of $L^1$ and $TV$ regularization. In order to develop efficient solution methods for large scale optimization it is essential to derive a set of necessary conditions in equation form rather than as a variational inequality. The Lagrange multiplier theory for constrained minimization and nonsmooth optimization problems is developed and analyzed. The theory derives the complementarity condition for the multiplier and the solution. Consequently, one can derive the necessary optimality system for the solution and the multiplier for a general class of nonsmooth and nonconvex optimizations. In light of the theory we derive and analyze numerical algorithms based on the primal-dual active set method and the semismooth Newton method.

The monograph also covers the basic material of real analysis, functional analysis, Banach space theory, convex analysis, operator theory and PDE theory, which makes the book self-contained and comprehensive. The analysis component is naturally connected to the optimization theory. The necessary optimality condition is in general written as a nonlinear operator equation for the primal variable and the Lagrange multiplier. The Lagrange multiplier theory for a general class of nonsmooth and nonconvex optimizations is based on functional analysis tools. For example, the necessary optimality condition for constrained optimization is in general a nonsmooth, pseudo-monotone type operator equation. To develop the mathematical treatment of PDE-constrained minimization we also present the PDE theory and the linear and nonlinear $C_0$-semigroup theory on Banach spaces for well-posedness of the control dynamics and optimization problems. That is, the existence and uniqueness of solutions to a general class of controlled linear and nonlinear PDEs and of Cauchy problems for nonlinear evolution equations in Banach spaces are discussed, in particular via the linear operator theory, the (pseudo-)monotone operator and mapping theory, and the fixed point theory.

∗ Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA
Throughout, the monograph demonstrates the theory and the algorithms using concrete examples and describes how to apply them to motivating applications.

1 Introduction

In this monograph we develop the function space approach for optimization problems, e.g., nonlinear programming in Banach spaces, convex and nonconvex nonsmooth variational problems, control and inverse problems, image/signal analysis, material design, classification, and resource allocation, and for nonlinear equations and Cauchy problems in Banach spaces. A general class of constrained optimization problems in function spaces is considered and we develop the Lagrange multiplier theory and effective solution algorithms. Applications are presented and analyzed for various examples including control and design optimization, inverse problems, image analysis and variational problems. Nonconvex and nonsmooth optimization becomes increasingly important for advanced and effective use of optimization methods and is also an essential topic of the monograph.

We introduce the basic function space and functional analysis tools needed for our analysis. The monograph provides an introduction to functional analysis, real analysis and convex analysis and their basic concepts with illustrated examples. Thus, one can use the book as basic course material for functional analysis and nonlinear operator theory. An emerging application of optimization is imaging analysis, which can be formulated as PDE-constrained optimization. The PDE theory and the nonlinear operator theory and their applications are analyzed in detail. There are numerous applications of optimization theory and variational methods in all sciences and to real world problems. The following are motivating examples.

Example 1 (Quadratic programming)
\[ \min \ \tfrac12 x^t A x - b^t x \quad \text{over } x \in \mathbb{R}^n \]
subject to
\[ Ex = c \ \text{(equality constraint)}, \qquad Gx \le g \ \text{(inequality constraint)}, \]
where $A \in \mathbb{R}^{n\times n}$ is a symmetric positive matrix, $E \in \mathbb{R}^{m\times n}$, $G \in \mathbb{R}^{p\times n}$, and $b \in \mathbb{R}^n$, $c \in \mathbb{R}^m$, $g \in \mathbb{R}^p$ are given vectors. The quadratic programming problem is formulated on a Hilbert space $X$ for the variational problem, in which $(x, Ax)_X$ defines the quadratic form on $X$.

Example 2 (Linear programming) The linear program minimizes $(c, x)_X$ subject to the linear constraints $Ax = b$ and $x \ge 0$. Its dual problem is to maximize $(b, x)$ subject to $A^* x \le c$. It has many important classes of applications. For example, the optimal transport problem is to find the joint distribution function $\mu(x, y) \ge 0$ on $X \times Y$ that transports the probability density $p_0(x)$ to $p_1(y)$, i.e.,
\[ \int_Y \mu(x, y)\,dy = p_0(x), \qquad \int_X \mu(x, y)\,dx = p_1(y), \]
with minimum energy
\[ \int_{X\times Y} c(x, y)\,\mu(x, y)\,dx\,dy = (c, \mu). \]

Example 3 (Integer programming) Let $F$ be a continuous function on $\mathbb{R}^n$. The integer program minimizes $F(x)$ over the integer variables $x$, i.e.,
\[ \min F(x) \quad \text{subject to } x \in \{0, 1, \cdots, N\}. \]
The binary optimization minimizes $F(x)$ over binary variables $x$. The network optimization problem is one of the important applications.

Example 4 (Obstacle problem) Consider the variational problem
\[ \min \ J(u) = \int_0^1 \Big( \frac12 \Big|\frac{d}{dx}u(x)\Big|^2 - f(x)u(x) \Big)\,dx \qquad (1.1) \]
subject to $u(x) \le \psi(x)$, over all displacement functions $u(x) \in H_0^1(0,1)$, where the function $\psi$ represents the upper bound (obstacle) of the deformation. The functional $J$ represents the Newtonian deformation energy and $f(x)$ is the applied force.
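As a small numerical illustration of Example 1 (not part of the original text), the following Python sketch solves an equality-constrained quadratic program by assembling and solving the linear KKT (saddle point) system; the inequality constraint is omitted here and all data are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of Example 1 with equality constraints only:
#   min (1/2) x^t A x - b^t x   subject to  E x = c.
# The KKT conditions  A x + E^t mu = b,  E x = c  are linear, so
# (x, mu) solves a single saddle-point system. (Illustrative data.)
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
E = np.array([[1.0, 1.0]])               # one equality constraint
c = np.array([1.0])

n, m = A.shape[0], E.shape[0]
KKT = np.block([[A, E.T], [E, np.zeros((m, m))]])
rhs = np.concatenate([b, c])
sol = np.linalg.solve(KKT, rhs)
x, mu = sol[:n], sol[n:]
print("x =", x, "multiplier mu =", mu)
print("residuals:", A @ x + E.T @ mu - b, E @ x - c)
```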
Example 5 (Function interpolation) Consider the interpolation problem
\[ \min \ \int_0^1 \frac12 \Big|\frac{d^2}{dx^2}u(x)\Big|^2 dx \quad \text{over all functions } u(x), \qquad (1.2) \]
subject to the interpolation conditions
\[ u(x_i) = b_i, \quad \frac{d}{dx}u(x_i) = c_i, \quad 1 \le i \le m. \]

Example 6 (Image/signal analysis) Consider the deconvolution problem of finding the image $u$ that satisfies $(Ku)(x) = f(x)$ for the convolution
\[ (Ku)(x) = \int_\Omega k(x, y)\,u(y)\,dy \]
over a domain $\Omega$. It is in general ill-posed and we use the (Tikhonov) regularization formulation to obtain a stable deconvolution solution $u$, i.e., for $\Omega = [0,1]$,
\[ \min \ \int_0^1 \Big( |Ku - f|^2 + \alpha \Big|\frac{d}{dx}u(x)\Big|^2 + \beta\,|u(x)| \Big)\,dx \quad \text{over all possible images } u \ge 0, \qquad (1.3) \]
where $\alpha, \beta \ge 0$ are the regularization parameters and the two regularization functionals are chosen so that the variation of the image $u$ is regularized.

Example 7 ($L^0$ optimization) Problems of optimal resource allocation, material distribution and optimal time scheduling can be formulated as
\[ \min \ F(u) + \int_\Omega h(u(\omega))\,d\omega \qquad (1.4) \]
with
\[ h(u) = \frac\alpha2|u|^2 + \beta\,|u|_0, \quad \text{where } |0|_0 = 0 \text{ and } |u|_0 = 1 \text{ otherwise.} \]
Since
\[ \int_\Omega |u(\omega)|_0\,d\omega = \mathrm{vol}(\{u(\omega) \ne 0\}), \]
the solution to (1.4) provides the optimal distribution of $u$ minimizing the performance $F(u)$ with an appropriate choice of (regularization) parameters $\alpha, \beta > 0$.

Example 8 (Bayesian statistics) Consider the statistical optimization
\[ \min \ -\log p(y|x) + \beta\,p(x) \quad \text{over all admissible parameters } x, \qquad (1.5) \]
where $p(y|x)$ is the conditional probability density of the observation $y$ given the parameter $x$, and $p(x)$ is the prior probability density for the parameter $x$. It is the maximum a posteriori estimation of the parameter $x$ given the observation $y$. In the case of inverse problems it has the form
\[ \phi(x|y) + \alpha\,p(x), \qquad (1.6) \]
where $x$ is in a function space $X$, $\phi(x|y)$ is the fidelity and $p$ is a regularization semi-norm on $X$. The constant $\alpha > 0$ is the Tikhonov regularization parameter.

Multi-dimensional versions of these examples and more illustrated examples will be discussed and analyzed throughout the monograph.

Discretization An important aspect of function space optimization is the approximation method and theory. For example, let $B_i^n(x)$ be the linear B-spline defined by
\[ B_i(x) = \begin{cases} \dfrac{x - x_{i-1}}{x_i - x_{i-1}} & \text{on } [x_{i-1}, x_i] \\[1ex] \dfrac{x - x_{i+1}}{x_i - x_{i+1}} & \text{on } [x_i, x_{i+1}] \\[1ex] 0 & \text{otherwise} \end{cases} \qquad (1.7) \]
and let
\[ u^n(x) = \sum_{i=1}^{n-1} u_i B_i(x) \sim u(x). \qquad (1.8) \]
Then
\[ \frac{d}{dx}u^n = \frac{u_i - u_{i-1}}{x_i - x_{i-1}}, \quad x \in (x_{i-1}, x_i). \]
Let $x_i = \frac in$ and $\Delta x = \frac1n$. Then one can define the discretized problem of Example 4, a quadratic program of the form of Example 1:
\[ \min \ \sum_{i=1}^n \Big( \frac12 \Big|\frac{u_i - u_{i-1}}{\Delta x}\Big|^2 - f_iu_i \Big)\Delta x \quad \text{subject to } u_0 = u_n = 0, \ u_i \le \psi_i = \psi(x_i), \]
where $f_i = f(x_i)$. The discretized problem is a quadratic program for $\vec u \in \mathbb{R}^{n-1}$ with
\[ A = \frac1{\Delta x}\begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix}, \qquad b = \Delta x\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{n-2} \\ f_{n-1} \end{pmatrix}. \]
Thus, we will discuss a convergence analysis of the approximate solutions $\{u^n\}$ to the solution $u$ as $n \to \infty$ in an appropriate function space.
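The discretized problem above is easy to assemble. The following Python sketch (an illustration under the stated grid and the choice $f \equiv 1$) builds the tridiagonal matrix $A$ and load vector $b$ and solves the unconstrained problem, which is the standard finite difference scheme for $-u'' = f$, $u(0) = u(1) = 0$.

```python
import numpy as np

# Sketch: assemble the tridiagonal stiffness matrix A and load vector b
# of the discretized obstacle problem (f = 1, n-1 interior grid points).
# Without the obstacle, the minimizer solves A u = b, i.e. the discrete
# two-point boundary value problem -u'' = f, u(0) = u(1) = 0.
n = 100
dx = 1.0 / n
main = 2.0 * np.ones(n - 1) / dx
off = -1.0 * np.ones(n - 2) / dx
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
b = dx * np.ones(n - 1)                 # f(x_i) = 1

u = np.linalg.solve(A, b)               # unconstrained minimizer
x = np.linspace(dx, 1 - dx, n - 1)
# exact solution of -u'' = 1 with zero boundary data is x(1-x)/2
print("max |u - x(1-x)/2| =", np.abs(u - x * (1 - x) / 2).max())
```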
In general, one has an optimization of the form
\[ \min \ F_0(x) + F_1(x) \quad \text{over } x \in X \quad \text{subject to } E(x) \in K, \]
where $X$ is a Banach space, $F_0$ is $C^1$, $F_1$ is a nonsmooth functional and $E(x) \in K$ represents the constraint conditions with $K$ a closed convex cone. For the existence of a minimizer we require a coercivity condition, lower semicontinuity and the continuity of the constraint in the weak or weak star topology of $X$. For the necessary optimality condition, we develop a generalized Lagrange multiplier theory:
\[ F_0'(x^*) + \lambda + E'(x^*)^*\mu = 0, \quad E(x^*) \in K, \quad C_1(x^*, \lambda) = 0, \quad C_2(x^*, \mu) = 0, \qquad (1.9) \]
where $\lambda$ is the Lagrange multiplier for a relaxed derivative of $F_1$ and $\mu$ is the Lagrange multiplier for the constraint $E(x) \in K$. We derive the complementarity conditions $C_1$ and $C_2$ so that (1.9) is a complete set of equations for determining $(x^*, \lambda, \mu)$.

We discuss the solution method for linear and nonlinear equations in Banach spaces of the form
\[ f \in A(x), \qquad (1.10) \]
where $f \in X^*$ is in the dual space of a Banach space $X$ and $A$ is a pseudo-monotone mapping $X \to X^*$, and the Cauchy problem
\[ \frac{d}{dt}x(t) \in A(t)(x(t)) + f(t), \quad x(0) = x_0, \qquad (1.11) \]
where $A(t)$ is an m-dissipative mapping in $X \times X$. Also, we note that the necessary optimality condition (1.9) is a nonlinear operator equation and we use the nonlinear operator theory to analyze solutions to (1.9). We also discuss PDE-constrained optimization problems, in which $f$ and $f(t)$ are functions of control, design and medium parameters. We minimize a proper performance index defined for the state and control functions subject to the constraint (1.10) or (1.11). Thus, we combine the analysis of the PDE theory and the optimization theory to develop a comprehensive treatment and analysis of PDE-constrained optimization problems.
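The complementarity system (1.9) suggests active-set strategies. The following Python sketch implements one plausible primal-dual active set iteration for the obstacle-constrained quadratic program $\min \frac12 u^tAu - b^tu$ subject to $u \le \psi$; it is a minimal finite-dimensional illustration (the function name and the parameter $c$ are illustrative choices), not the general method developed later in the monograph.

```python
import numpy as np

def primal_dual_active_set(A, b, psi, c=1.0, max_iter=50):
    """Hedged sketch of a primal-dual active set iteration for
       min (1/2) u^t A u - b^t u  subject to  u <= psi.
    KKT system: A u + lam = b, lam >= 0, u <= psi, lam^t (u - psi) = 0.
    """
    n = len(b)
    u, lam = np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        active = lam + c * (u - psi) > 0          # predicted active set
        inact = ~active
        u_new, lam_new = np.empty(n), np.zeros(n)
        u_new[active] = psi[active]               # u fixed on active set
        # solve the rows of A u = b on the inactive set
        u_new[inact] = np.linalg.solve(
            A[np.ix_(inact, inact)],
            b[inact] - A[np.ix_(inact, active)] @ psi[active])
        lam_new[active] = (b - A @ u_new)[active] # multiplier on active set
        if np.array_equal(active, lam_new + c * (u_new - psi) > 0):
            return u_new, lam_new                 # active set has settled
        u, lam = u_new, lam_new
    return u, lam
```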
The outline of the monograph is as follows. In Section 2 we introduce the basic function space and functional analysis tools needed for our analysis. It provides an introduction to functional analysis and real analysis and their basic concepts with illustrated examples: for example, the dual space of a normed space, the Hahn-Banach theorem, the open mapping theorem, the closed range theorem, distribution theory and weak solutions to PDEs, compact operators and spectral theory. In Section 3 we develop the Lagrange multiplier method for a general class of (smooth) constrained optimization problems in Banach spaces. The Hilbert space theory is introduced first and the Lagrange multiplier is defined as the limit of penalized problems. The constraint qualification condition is introduced and we derive the (normal) necessary optimality system for the primal and dual variables. We discuss the minimum norm problem and the duality principle in Banach spaces. In Section 4 we present the linear operator theory and $C_0$-semigroup theory on Banach spaces. In Section 5 we develop the monotone operator theory and pseudo-monotone operator theory for nonlinear equations in Banach spaces and the nonlinear semigroup theory in Banach spaces, as well as the fixed point theory for nonlinear equations in Banach spaces. In Section 7 we develop the basic convex analysis tools and the Lagrange multiplier theory for a general class of convex nonsmooth optimization. We introduce and analyze the augmented Lagrangian method for constrained optimization and nonsmooth optimization problems. In Section 8 we develop the Lagrange multiplier theory for a general class of nonsmooth and nonconvex optimizations.

2 Basic Function Space Theory

In this section we discuss the basic theory and concepts of functional analysis and introduce the function spaces which are essential to our function space analysis of a general class of optimizations. We refer to more comprehensive treatises on functional analysis, e.g., Yosida [].

2.1 Complete Normed Space

In this section we introduce vector spaces, normed spaces and Banach spaces.

Definition (Vector Space) A vector space over the scalar field $K$ is a set $X$, whose elements are called vectors, in which two operations, addition and scalar multiplication, are defined with the following familiar algebraic properties.

(a) To every pair of vectors $x$ and $y$ corresponds a vector $x + y$ such that
\[ x + y = y + x \quad \text{and} \quad x + (y + z) = (x + y) + z; \]
$X$ contains a unique vector $0$ (the zero vector) such that $x + 0 = x$ for every $x \in X$, and to each $x \in X$ corresponds a unique inverse vector $-x$ such that $x + (-x) = 0$.

(b) To every pair $(\alpha, x)$ with $\alpha \in K$ and $x \in X$ corresponds a vector $\alpha x$ such that
\[ 1\,x = x, \qquad \alpha(\beta x) = (\alpha\beta)x, \]
and such that the two distributive laws
\[ \alpha(x + y) = \alpha x + \alpha y, \qquad (\alpha + \beta)x = \alpha x + \beta x \]
hold.

The symbol $0$ will of course also be used for the zero element of the scalar field. It is easy to see that the additive inverse $-x$ satisfies $-x = (-1)x$, since $0\,x = (1 - 1)x = x + (-1)x$ and $0\,x = 0$ for each $x \in X$. In what follows we consider vector spaces only over the real number field $\mathbb{R}$ or the complex number field $\mathbb{C}$.

Definition (Subspace, Basis) (1) A subset $S$ of $X$ is a (linear) subspace of a vector space $X$ if $\alpha x_1 + \beta x_2 \in S$ for all $x_1, x_2 \in S$ and $\alpha, \beta \in K$.
(2) A family of vectors $\{x_i\}_{i=1}^n$ is linearly independent if $\sum_{i=1}^n \alpha_ix_i = 0$ holds if and only if $\alpha_i = 0$ for all $1 \le i \le n$. The dimension $\dim(X)$ is the largest number of linearly independent vectors in $X$.
(3) A basis $\{e_\alpha\}$ of a vector space $X$ is a set of linearly independent vectors that, in linear combinations, can represent every vector in $X$, i.e., $x = \sum_\alpha a_\alpha e_\alpha$.

Example (Vector Space) (1) Let $X = C[0,1]$ be the vector space of continuous functions $f$ on $[0,1]$; $X$ is a vector space under
\[ (\alpha f + \beta g)(t) = \alpha f(t) + \beta g(t), \quad t \in [0,1]. \]
The space $S$ of polynomials of order less than $n$ is a linear subspace of $X$ with $\dim(S) = n$. The family $\{t^k\}_{k=0}^\infty$ of monomials is linearly independent in $X$, and thus $\dim(X) = \infty$.
(2) The set of real number sequences $x = (x_1, x_2, x_3, \cdots)$ is a vector space under
\[ (\alpha x + \beta y)_k = \alpha x_k + \beta y_k, \quad k \in \mathbb{N}. \]
The family of unit vectors $e_k = (0, \cdots, 0, 1, 0, \cdots)$, $k \in \mathbb{N}$ (with the $1$ in the $k$-th place), is a basis.
(3) Let $X$ be the vector space of square integrable functions on $(0,1)$ and let $e_k(x) = \sqrt2\sin(k\pi x)$, $0 \le x \le 1$. Then $\{e_k\}$ is an orthonormal basis, i.e., $(e_k, e_j) = \int_0^1 e_k(x)e_j(x)\,dx = \delta_{k,j}$, and
\[ f(x) = \sum_{k=1}^\infty f_ke_k(x), \quad \text{the Fourier sine series,} \]
where $f_k = \int_0^1 f(x)e_k(x)\,dx$. Note that $e_{n+1}$ is not a linear combination of $(e_1, \cdots, e_n)$: if $e_{n+1} = \sum_{i=1}^n\beta_ie_i$, then $\beta_i = (e_{n+1}, e_i) = 0$, a contradiction.
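A quick numerical check of the Fourier sine expansion in (3): the following Python sketch computes the coefficients $f_k$ by quadrature and shows the $L^2$ error of the partial sums decreasing; the test function and grid are illustrative choices.

```python
import numpy as np

# Sketch: Fourier sine series of f(x) = x(1-x) on (0,1) using the
# orthonormal basis e_k(x) = sqrt(2) sin(k pi x). The partial sums
# converge to f in the L^2 norm (coefficients via a simple grid sum).
x = np.linspace(0, 1, 2001)
dx = x[1] - x[0]
f = x * (1 - x)

def partial_sum(K):
    s = np.zeros_like(x)
    for k in range(1, K + 1):
        ek = np.sqrt(2) * np.sin(k * np.pi * x)
        fk = np.sum(f * ek) * dx          # f_k = int_0^1 f e_k dx
        s += fk * ek
    return s

for K in (1, 3, 9):
    err = np.sqrt(np.sum((f - partial_sum(K))**2) * dx)
    print(K, err)                          # decreasing L^2 error
```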
Definition (Normed Space) A norm $|\cdot|$ on a vector space $X$ is a real-valued functional on $X$ which satisfies

$|x| \ge 0$ for all $x \in X$, with equality if and only if $x = 0$;
$|\alpha x| = |\alpha|\,|x|$ for every $x \in X$ and $\alpha \in K$;
$|x + y| \le |x| + |y|$ (triangle inequality) for every $x, y \in X$.

A normed space is a vector space $X$ equipped with a norm $|\cdot|$ and will be denoted by $(X, |\cdot|)$. The norm of a normed space $X$ will be denoted by $|\cdot|_X$, or simply by $|\cdot|$ whenever the underlying space $X$ is understood from the context.

Example (Normed space) Let $X = C[0,1]$ be the vector space of continuous functions $f$ on $[0,1]$. The following are two examples of normed spaces built on $X$: $X_1$ is equipped with the sup norm
\[ |f|_{X_1} = \max_{x\in[0,1]}|f(x)| \]
and $X_2$ is equipped with the $L^2$ norm
\[ |f|_{X_2} = \sqrt{\int_0^1|f(x)|^2\,dx}. \]

Definition (Banach Space) Let $X$ be a normed space.
(1) A sequence $\{x_n\}$ in a normed space $X$ converges to the limit $x^* \in X$ if and only if $\lim_{n\to\infty}|x_n - x^*| = 0$ in $\mathbb{R}$.
(2) A subspace $S$ of a normed space $X$ is said to be dense in $X$ if each $x \in X$ is the limit of a sequence of elements in $S$.
(3) A sequence $\{x_n\}$ in a normed space $X$ is called a Cauchy sequence if and only if $\lim_{m,n\to\infty}|x_m - x_n| = 0$, i.e., for all $\epsilon > 0$ there exists $N$ such that $|x_m - x_n| < \epsilon$ for $m, n \ge N$.

If every Cauchy sequence in $X$ converges to a limit in $X$, then $X$ is complete and $X$ is called a Banach space. A Cauchy sequence $\{x_n\}$ in a normed space $X$ has a (unique) limit if it has an accumulation point $x^*$; in fact, for such $x^*$,
\[ \lim_{n\to\infty}|x^* - x_n| = \lim_{n\to\infty}\lim_{m\to\infty}|x_m - x_n| = 0. \]

Example (continued) Two norms $|x|_1$ and $|x|_2$ on a vector space $X$ are equivalent if there exist $0 < \underline c \le \bar c < \infty$ such that
\[ \underline c\,|x|_1 \le |x|_2 \le \bar c\,|x|_1 \quad \text{for all } x \in X. \]
(1) All norms on $\mathbb{R}^n$ are equivalent. But for $X = C[0,1]$ consider $x_n(t) = t^n$. Then $|x_n|_{X_1} = 1$ but $|x_n|_{X_2} = \frac1{\sqrt{2n+1}}$ for all $n$. Thus $|x_n|_{X_2} \to 0$ as $n \to \infty$, and the two norms of $X_1$ and $X_2$ are not equivalent.
(2) Every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence in $\mathbb{R}^n$ (Bolzano-Weierstrass theorem). In contrast, consider the sequence $x_n(t) = t^n$ in $X_1 = C[0,1]$: $\{x_n\}$ has no convergent subsequence in the sup norm despite $|x_n|_{X_1} = 1$ being bounded.
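The following Python sketch verifies numerically the non-equivalence of the sup norm and the $L^2$ norm on $C[0,1]$ via $x_n(t) = t^n$; the grid-based quadrature is an illustrative approximation.

```python
import numpy as np

# Numerical check that the sup norm and the L^2 norm on C[0,1] are not
# equivalent: for x_n(t) = t^n, the sup norm is 1 for every n while
# the L^2 norm is 1/sqrt(2n+1) -> 0.
t = np.linspace(0, 1, 100001)
dt = t[1] - t[0]
for n in (1, 10, 100):
    xn = t**n
    sup_norm = np.max(np.abs(xn))
    l2_norm = np.sqrt(np.sum(xn**2) * dt)
    print(n, sup_norm, l2_norm, 1 / np.sqrt(2 * n + 1))
```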
Definition (Hilbert Space) If $X$ is a vector space, a functional $(\cdot,\cdot)$ defined on $X \times X$ is called an inner product on $X$ provided that for every $x, y, z \in X$ and $\alpha, \beta \in \mathbb{C}$

$(x, y) = \overline{(y, x)}$;
$(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$;
$(x, x) \ge 0$, and $(x, x) = 0$ if and only if $x = 0$,

where $\bar c$ denotes the complex conjugate of $c \in \mathbb{C}$. Given such an inner product, a norm on $X$ can be defined by
\[ |x| = (x, x)^{\frac12}. \]
A vector space $(X, (\cdot,\cdot))$ equipped with an inner product is called a pre-Hilbert space. If $X$ is complete with respect to the induced norm, then it is called a Hilbert space. Every inner product satisfies the Cauchy-Schwarz inequality
\[ |(x, y)| \le |x|\,|y| \quad \text{for } x, y \in X. \]
In fact, for all $t \in \mathbb{R}$,
\[ |x + t(x, y)y|^2 = |x|^2 + 2t|(x, y)|^2 + t^2|(x, y)|^2|y|^2 \ge 0. \]
Thus we must have $|(x, y)|^4 - |x|^2|(x, y)|^2|y|^2 \le 0$, which shows the Cauchy-Schwarz inequality.

Example (Hilbert space) (1) $\mathbb{R}^n$ with inner product
\[ (x, y) = \sum_{k=1}^nx_ky_k \quad \text{for } x, y \in \mathbb{R}^n. \]
(2) Consider the space $\ell^2$ of square summable real number sequences $x = (x_1, x_2, \cdots)$ and define the inner product
\[ (x, y) = \sum_{k=1}^\infty x_ky_k \quad \text{for } x, y \in \ell^2. \]
Then $\ell^2$ is a Hilbert space. In fact, let $\{x^n\}$ be a Cauchy sequence in $\ell^2$. Since $|x_k^n - x_k^m| \le |x^n - x^m|_{\ell^2}$, $\{x_k^n\}$ is a Cauchy sequence in $\mathbb{R}$ for every $k$. Thus one can define a sequence $x$ by $x_k = \lim_{n\to\infty}x_k^n$. Since
\[ \lim_{m\to\infty}|x^m - x^n|_{\ell^2} = |x - x^n|_{\ell^2} \]
for each fixed $n$, we obtain $x \in \ell^2$ and $|x^n - x|_{\ell^2} \to 0$ as $n \to \infty$.
(3) Consider the space $X_3$ of continuously differentiable functions on $[0,1]$ vanishing at $x = 0$, $x = 1$, with inner product
\[ (f, g) = \int_0^1\frac{d}{dx}f\,\frac{d}{dx}g\,dx \quad \text{for } f, g \in X_3. \]
Then $X_3$ is a pre-Hilbert space.

Every normed space $X$ is either a Banach space or a dense subspace of a Banach space $Y$ whose norm satisfies $|x|_Y = |x|$ for every $x \in X$. In the latter case $Y$ is called the completion of $X$. The completion of a normed space proceeds as in Cantor's construction of the real numbers from the rational numbers.

Completion. The set of Cauchy sequences $\{x_n\}$ of the normed space $X$ can be classified according to the equivalence $\{x_n\} \sim \{y_n\}$ if $\lim_{n\to\infty}|x_n - y_n| = 0$. Since $||x_n| - |x_m|| \le |x_n - x_m|$ and $\mathbb{R}$ is complete, $\lim_{n\to\infty}|x_n|$ exists. We denote by $\{x_n\}_0$ the class containing $\{x_n\}$. Then the set $Y$ of all such equivalence classes $\tilde x = \{x_n\}_0$ is a vector space with
\[ \{x_n\}_0 + \{y_n\}_0 = \{x_n + y_n\}_0, \qquad \alpha\{x_n\}_0 = \{\alpha x_n\}_0, \]
and we define
\[ |\{x_n\}_0|_Y = \lim_{n\to\infty}|x_n|. \]
It is easy to see that these definitions of the vector sum, scalar multiplication and norm do not depend on the particular representatives of the classes $\{x_n\}_0$ and $\{y_n\}_0$. For example, if $\{x_n\} \sim \{y_n\}$, then
\[ \lim_{n\to\infty}||x_n| - |y_n|| \le \lim_{n\to\infty}|x_n - y_n| = 0, \]
and thus $\lim_{n\to\infty}|x_n| = \lim_{n\to\infty}|y_n|$.

Since $|\{x_n\}_0|_Y = 0$ implies $\lim_{n\to\infty}|x_n - 0| = 0$, we have $\{0\} \in \{x_n\}_0$ and thus $|\cdot|_Y$ is a norm. Since an element $x \in X$ corresponds to the Cauchy sequence
\[ \bar x = \{x, x, \dots, x, \dots\}_0 \in Y, \]
$X$ is naturally contained in $Y$.

To prove the completeness of $Y$, let $\{\tilde x_k\} = \{\{x_n^{(k)}\}_0\}$ be a Cauchy sequence in $Y$. For each $k$ we can choose an integer $n_k$ such that
\[ |x_m^{(k)} - x_{n_k}^{(k)}|_X \le k^{-1} \quad \text{for } m > n_k. \]
We show that the sequence $\{\tilde x_k\}$ converges to the class containing the Cauchy sequence $\{x_{n_k}^{(k)}\}$ of $X$. To this end, note that
\[ |\tilde x_k - \bar x_{n_k}^{(k)}|_Y = \lim_{m\to\infty}|x_m^{(k)} - x_{n_k}^{(k)}|_X \le k^{-1}. \]
Since
\[ |x_{n_k}^{(k)} - x_{n_m}^{(m)}|_X = |\bar x_{n_k}^{(k)} - \bar x_{n_m}^{(m)}|_Y \le |\bar x_{n_k}^{(k)} - \tilde x_k|_Y + |\tilde x_k - \tilde x_m|_Y + |\tilde x_m - \bar x_{n_m}^{(m)}|_Y \le k^{-1} + m^{-1} + |\tilde x_k - \tilde x_m|_Y, \]
$\{x_{n_k}^{(k)}\}$ is a Cauchy sequence of $X$. Let $\tilde x = \{x_{n_k}^{(k)}\}_0$. Then
\[ |\tilde x - \tilde x_k|_Y \le |\tilde x - \bar x_{n_k}^{(k)}|_Y + |\bar x_{n_k}^{(k)} - \tilde x_k|_Y \le |\tilde x - \bar x_{n_k}^{(k)}|_Y + k^{-1}. \]
Since, as shown above,
\[ |\tilde x - \bar x_{n_k}^{(k)}|_Y = \lim_{m\to\infty}|x_{n_m}^{(m)} - x_{n_k}^{(k)}|_X \le \lim_{m\to\infty}|\tilde x_m - \tilde x_k|_Y + k^{-1} + m^{-1}, \]
we conclude $\lim_{k\to\infty}|\tilde x - \bar x_{n_k}^{(k)}|_Y = 0$ and so $\lim_{k\to\infty}|\tilde x - \tilde x_k|_Y = 0$. Moreover, the above proof shows that the correspondence $x \in X \to \bar x \in Y$ is isomorphic and isometric, and the image of $X$ under this correspondence is dense in $Y$; that is, $X$ is dense in $Y$.

Example (Completion) Let $X = C[0,1]$ be the vector space of continuous functions $f$ on $[0,1]$, and consider the two normed spaces built on $X$: $X_1$ with the sup norm $|f|_{X_1} = \max_{x\in[0,1]}|f(x)|$ and $X_2$ with the $L^2$ norm $|f|_{X_2} = (\int_0^1|f(x)|^2dx)^{1/2}$. We claim that $X_1$ is complete but $X_2$ is not; the completion of $X_2$ is $L^2(0,1)$, the space of square integrable functions.

First consider $X_1$. Let $\{f_n\}$ be a Cauchy sequence in $X_1$. Since $|f_n(x) - f_m(x)| \le |f_n - f_m|_{X_1}$, $\{f_n(x)\}$ is a Cauchy sequence in $\mathbb{R}$ for every $x \in [0,1]$. Thus one can define a function $f$ by $f(x) = \lim_{n\to\infty}f_n(x)$. Since
\[ |f(x) - f(\hat x)| \le |f_n(x) - f_n(\hat x)| + |f_n(x) - f(x)| + |f_n(\hat x) - f(\hat x)| \]
and $|f_n(x) - f(x)| \to 0$ uniformly in $x \in [0,1]$, the candidate limit $f$ is continuous. Thus $X_1$ is complete.

Next consider $X_2$. Define a Cauchy sequence $\{f_n\}$ by
\[ f_n(x) = \begin{cases} 0 & \text{on } [0, \frac12 - \frac1n] \\ \frac12 + \frac n2(x - \frac12) & \text{on } [\frac12 - \frac1n, \frac12 + \frac1n] \\ 1 & \text{on } [\frac12 + \frac1n, 1]. \end{cases} \]
The candidate limit $f$ in this case is defined by $f(x) = 0$ for $x < \frac12$, $f(\frac12) = \frac12$ and $f(x) = 1$ for $x > \frac12$, and is not in $X$. Consider the Cauchy sequences $g_n$ and $h_n$ in the equivalence class of $\{f_n\}$:
\[ g_n(x) = \begin{cases} 0 & \text{on } [0, \frac12 - \frac1n] \\ 1 + n(x - \frac12) & \text{on } [\frac12 - \frac1n, \frac12] \\ 1 & \text{on } [\frac12, 1], \end{cases} \qquad h_n(x) = \begin{cases} 0 & \text{on } [0, \frac12] \\ n(x - \frac12) & \text{on } [\frac12, \frac12 + \frac1n] \\ 1 & \text{on } [\frac12 + \frac1n, 1]. \end{cases} \]
The candidate limits $g$ (with $g(x) = 0$, $x < \frac12$, $g(x) = 1$, $x \ge \frac12$) and $h$ (with $h(x) = 0$, $x \le \frac12$, $h(x) = 1$, $x > \frac12$) belong to the same equivalence class as $f = \{f_n\}$. Note that $f$, $g$ and $h$ differ only at $x = \frac12$; in general, functions in the same equivalence class differ only on a set of measure zero. The completion $Y$ of $X_2$ is denoted by
\[ L^2(0,1) = \{\text{square integrable measurable functions on } (0,1)\} \quad \text{with } |f|_{L^2}^2 = \int_0^1|f(x)|^2\,dx, \]
where the integral is defined in the sense of Lebesgue, as discussed in the next section. The concept of completion states that for every $f \in Y = \bar X$, the completion of $X$, there exists a sequence $f_n$ in $X$ such that $|f_n - f|_Y \to 0$ as $n \to \infty$ and $|f|_Y = \lim_{n\to\infty}|f_n|_X$. For the example $X_2$: for every square integrable function $f \in L^2(0,1) = Y = \bar X_2$ there exists a sequence of continuous functions $f_n \in X_2$ such that $|f - f_n|_{L^2(0,1)} \to 0$.
Example ($H_0^1(0,1)$) The completion of $X_3$ as defined above is denoted by
\[ H_0^1(0,1) = \{\text{absolutely continuous functions on } [0,1] \text{ with square integrable derivative, vanishing at } x = 0,\ x = 1\} \]
with
\[ (f, g)_{H_0^1} = \int_0^1\frac{d}{dx}f(x)\,\frac{d}{dx}g(x)\,dx, \]
where the integral is defined in the sense of Lebesgue. Here $f$ is an absolutely continuous function if there exists an integrable function $g$ such that
\[ f(x) = f(0) + \int_0^xg(s)\,ds, \quad x \in [0,1]; \]
then $f$ is almost everywhere (a.e.) differentiable and $\frac{d}{dx}f = g$ a.e. in $(0,1)$. For example, the linear spline functions $B_k(x)$, $1 \le k \le n - 1$, are in $H_0^1(0,1)$.

2.2 Measure Theory

In this section we discuss measure theory.

Definition (1) A topological space is a set $E$ together with a collection $\tau$ of subsets of $E$, called open sets, satisfying the following axioms: the empty set and $E$ itself are open; any union of open sets is open; the intersection of any finite number of open sets is open. The collection $\tau$ of open sets is then also called a topology on $E$. We assume that a topological space $(E, \tau)$ satisfies Hausdorff's axiom of separation: for every distinct $x_1, x_2 \in E$ there exist disjoint open sets $G_1, G_2$ such that $x_1 \in G_1$, $x_2 \in G_2$.
(2) A metric space $E$ is a topological space with a metric $d$ satisfying, for all $x, y, z \in E$,

$d(x, y) \ge 0$, and $d(x, y) = 0$ if and only if $x = y$;
$d(x, y) = d(y, x)$;
$d(x, z) \le d(x, y) + d(y, z)$.

For $x_0 \in E$ and $r > 0$, $B(x_0, r) = \{x \in E : d(x, x_0) < r\}$ is the open ball of $E$ with center $x_0$ and radius $r$. A set $O$ is called open if for every point $x \in O$ there exists an open ball $B(x, r)$ contained in $O$.
(3) A collection of subsets $\mathcal{F}$ of $E$ is a $\sigma$-algebra if for all $A \in \mathcal{F}$, $A^c = E\setminus A \in \mathcal{F}$, and for an arbitrary countable family $\{A_i\}$ in $\mathcal{F}$, the countable union $\bigcup_{i=1}^\infty A_i \in \mathcal{F}$.
(4) A measure $\mu$ on $(E, \mathcal{F})$ assigns to each $A \in \mathcal{F}$ a nonnegative real value $\mu(A)$ and satisfies $\sigma$-additivity:
\[ \mu\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty\mu(A_i) \]
for all disjoint sets $\{A_i\}$ in $\mathcal{F}$.

Theorem (Monotone Convergence) The measure $\mu$ is $\sigma$-additive if and only if for every nondecreasing sequence $\{A_k\}$ of sets in $\mathcal{F}$ with $A = \bigcup_{k\ge1}A_k$ we have $\lim_{n\to\infty}\mu(A_n) = \mu(A)$.

Proof: Suppose $\mu$ is $\sigma$-additive and $A_k \uparrow A$. Writing $A$ as the disjoint union $A = A_1 + (A_2\setminus A_1) + (A_3\setminus A_2) + \cdots$, we have
\[ \mu(A) = \mu(A_1) + \mu(A_2\setminus A_1) + \mu(A_3\setminus A_2) + \cdots = \lim_{n\to\infty}\mu(A_n). \]
Conversely, let $A_1, A_2, \cdots \in \mathcal{F}$ be pairwise disjoint with $\sum_{k=1}^\infty A_k \in \mathcal{F}$. Then $B_n = \sum_{k=1}^nA_k$ is a nondecreasing sequence with union $\sum_{k=1}^\infty A_k$, and thus
\[ \mu\Big(\sum_{k=1}^\infty A_k\Big) = \lim_{n\to\infty}\mu(B_n) = \lim_{n\to\infty}\sum_{k=1}^n\mu(A_k) = \sum_{k=1}^\infty\mu(A_k). \quad \square \]

Example (Metric space) A normed space $X$ is a metric space with $d(x, y) = |x - y|$. A point $x \in E$ is the accumulation point of a sequence $\{x_n\}$ if $d(x_n, x) \to 0$ as $n \to \infty$. The closure $\bar A$ of a set $A$ is the collection of all accumulation points of $A$. The boundary of a set $A$ is the collection of points $x$ such that every open ball at $x$ contains points of both $A$ and $A^c$. A set $B$ in $E$ is closed if $B = \bar B$, i.e., every convergent sequence $\{x_n\}$ in $B$ has its accumulation point in $B$. The complement $A^c$ of an open set $A$ is closed.

Examples ($\sigma$-algebra) There are the trivial $\sigma$-algebras $\mathcal{F}_0 = \{E, \emptyset\}$ and $\mathcal{F}^* = $ all subsets of $E$. For a subset $A$ of $E$, the $\sigma$-algebra generated by $A$ is $\mathcal{F}_A = \{E, \emptyset, A, A^c\}$. A finite set of subsets $A_1, A_2, \cdots, A_n$ of $E$ which are pairwise disjoint and whose union is $E$ is called a partition of $E$.
It generates the $\sigma$-algebra $\mathcal{A} = \{A = \bigcup_{j\in J}A_j\}$, where $J$ runs over all subsets of $\{1, \cdots, n\}$. This $\sigma$-algebra has $2^n$ elements, and every finite $\sigma$-algebra is of this form. The smallest nonempty elements $\{A_1, \cdots, A_n\}$ of this algebra are called atoms.

Example (Countable measure) Let $\Omega$ have a countable decomposition $\{D_k\}$, i.e.,
\[ \Omega = \sum_kD_k, \quad D_i\cap D_j = \emptyset, \ i \ne j. \]
Let $\mathcal{F} = \mathcal{F}^*$ and $P(D_k) = \alpha_k > 0$ with $\sum_k\alpha_k = 1$. For the Poisson measure on the nonnegative integers, $D_k = \{k\}$ and
\[ m(D_k) = e^{-\lambda}\frac{\lambda^k}{k!} \quad \text{for } \lambda > 0. \]

Definition For any family $\mathcal{C}$ of subsets of $E$, we define the $\sigma$-algebra $\sigma(\mathcal{C})$ as the smallest $\sigma$-algebra $\mathcal{A}$ which contains $\mathcal{C}$. The $\sigma$-algebra $\sigma(\mathcal{C})$ is the intersection of all $\sigma$-algebras which contain $\mathcal{C}$; it is again a $\sigma$-algebra, i.e.,
\[ \mathcal{A} = \bigcap_\alpha\mathcal{A}_\alpha, \]
where the $\mathcal{A}_\alpha$ are all the $\sigma$-algebras that contain $\mathcal{C}$. The following construction of the measure space and the measure is essential for the triple $(E, \mathcal{F}, m)$ on an uncountable space $E$. If $(E, \mathcal{O})$ is a topological space, where $\mathcal{O}$ is the set of open sets in $E$, then the $\sigma$-algebra $\mathcal{B}(E)$ generated by $\mathcal{O}$ is called the Borel $\sigma$-algebra of the topological space $E$, $(E, \mathcal{B}(E))$ defines a measure space, and a set $B \in \mathcal{B}(E)$ is called a Borel set. For example, $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is a measure space and $\mathcal{B}(\mathbb{R}^n)$ is the $\sigma$-algebra generated by the open balls in $\mathbb{R}^n$.

Caratheodory Theorem Let $\mathcal{B} = \sigma(\mathcal{A})$ be the smallest $\sigma$-algebra containing an algebra $\mathcal{A}$ of subsets of $E$, and let $\mu_0$ be a $\sigma$-additive measure on $(E, \mathcal{A})$. Then there exists a unique measure $\mu$ on $(E, \mathcal{B})$ which is an extension of $\mu_0$, i.e., $\mu(A) = \mu_0(A)$ for $A \in \mathcal{A}$.

Definition A map $f$ from a measure space $(X, \mathcal{A})$ to another measure space $(Y, \mathcal{B})$ is called measurable if
\[ f^{-1}(B) = \{x \in X : f(x) \in B\} \in \mathcal{A} \quad \text{for all } B \in \mathcal{B}. \]
If $f : (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n)) \to (\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$ is measurable, we say $f$ is a Borel function. Every continuous function $\mathbb{R}^n \to \mathbb{R}^m$ is a Borel function, since the inverse images of open sets in $\mathbb{R}^m$ are open in $\mathbb{R}^n$.

Example (Measure space) Let $\Omega = \mathbb{R}$ and let $\mathcal{B}(\mathbb{R})$ be the Borel $\sigma$-algebra. Note that
\[ (a, b] = \bigcap_n\Big(a, b + \frac1n\Big), \qquad [a, b] = \bigcap_n\Big(a - \frac1n, b + \frac1n\Big) \in \mathcal{B}(\mathbb{R}). \]
Thus $\mathcal{B}(\mathbb{R})$ coincides with the $\sigma$-algebra generated by the semi-closed intervals. Let $\mathcal{A}$ be the algebra of finite disjoint sums of semi-closed intervals $(a_i, b_i]$ and define $\mu_0$ by
\[ \mu_0\Big(\sum_{k=1}^n(a_k, b_k]\Big) = \sum_{k=1}^n(F(b_k) - F(a_k)), \]
where $x \in \mathbb{R} \to F(x) \in \mathbb{R}^+$ is nondecreasing and right continuous with left limits everywhere. We then obtain the measure $\mu$ on $(\Omega, \mathcal{F}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ by the Caratheodory theorem, provided $\mu_0$ is $\sigma$-additive on $\mathcal{A}$.

We now prove that $\mu_0$ is $\sigma$-additive on $\mathcal{A}$. By the monotone convergence theorem it suffices to prove that
\[ \mu_0(A_n) \downarrow 0 \quad \text{whenever } A_n \downarrow \emptyset, \ A_n \in \mathcal{A}. \]
Without loss of generality one can assume $A_n \subset [-N, N]$. Since $F$ is right continuous, for each $A_n$ and $\epsilon > 0$ there exists a set $B_n \in \mathcal{A}$ with closure $\bar B_n \subset A_n$ such that $\mu_0(A_n) - \mu_0(B_n) \le \epsilon2^{-n}$. The collection of sets $\{[-N,N]\setminus\bar B_n\}$ is an open covering of the compact set $[-N,N]$, since $\bigcap_n\bar B_n \subset \bigcap_nA_n = \emptyset$. By the Heine-Borel theorem there exists a finite subcovering:
\[ \bigcup_{n=1}^{n_0}([-N,N]\setminus\bar B_n) = [-N,N], \]
and thus $\bigcap_{n=1}^{n_0}B_n = \emptyset$. Hence
\[ \mu_0(A_{n_0}) = \mu_0\Big(A_{n_0}\setminus\bigcap_{k=1}^{n_0}B_k\Big) \le \mu_0\Big(\bigcup_{k=1}^{n_0}(A_k\setminus B_k)\Big) \le \sum_{k=1}^{n_0}\mu_0(A_k\setminus B_k) \le \epsilon. \]
Since $\epsilon > 0$ is arbitrary, $\mu_0(A_n) \to 0$ as $n \to \infty$.

Next, we define the integral of a measurable function $f$ on $(E, \mathcal{F}, \mu)$.

Definition (Elementary function) An elementary function $f$ is defined by
\[ f(x) = \sum_{k=1}^nx_kI_{A_k}(x), \]
where $\{A_k\}$ is a partition of $E$, i.e., the $A_k \in \mathcal{F}$ are disjoint and $\bigcup A_k = E$. The integral of $f$ is then given by
\[ \int_Ef(x)\,d\mu(x) = \sum_{k=1}^nx_k\mu(A_k). \]
Theorem (Approximation) For every measurable function $f \ge 0$ on $(E, \mathcal{F})$ there exists a sequence of elementary functions $\{f_n\}$ such that $0 \le f_n(x) \le f(x)$ and $f_n(x) \uparrow f(x)$ for all $x \in E$.

Proof: For $n \ge 1$, define the sequence of elementary functions
\[ f_n(x) = \sum_{k=1}^{n2^n}\frac{k-1}{2^n}I_{k,n}(x) + n\,I_{f(x)>n}, \]
where $I_{k,n}$ is the indicator function of the set $\{x : \frac{k-1}{2^n} < f(x) \le \frac k{2^n}\}$. It is easy to verify that $f_n(x)$ is monotonically nondecreasing in $n$ with $f_n(x) \le f(x)$, and thus $f_n(x) \to f(x)$ for all $x \in E$. $\square$

Definition (Integral) For a nonnegative measurable function $f$ on $(E, \mathcal{F})$ we define the integral by
\[ \int_Ef(x)\,d\mu(x) = \lim_{n\to\infty}\int_Ef_n(x)\,d\mu(x), \]
where the limit exists since $\int_Ef_n(x)\,d\mu(x)$ is a nondecreasing number sequence. Note that $f = f^+ - f^-$ with $f^+(x) = \max(0, f(x))$ and $f^-(x) = \max(0, -f(x))$, so we can apply the Theorem and the Definition to $f^+$ and $f^-$:
\[ \int_Ef(x)\,d\mu(x) = \int_Ef^+(x)\,d\mu(x) - \int_Ef^-(x)\,d\mu(x). \]
If $\int_Ef^+(x)\,d\mu(x),\ \int_Ef^-(x)\,d\mu(x) < \infty$, then $f$ is integrable and
\[ \int_E|f(x)|\,d\mu(x) = \int_Ef^+(x)\,d\mu(x) + \int_Ef^-(x)\,d\mu(x). \]

Corollary Let $f \ge 0$ be a measurable function on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$ with $\mu((a, b]) = F(b) - F(a)$. Then we have, as $n \to \infty$,
\[ \int_0^\infty f(x)\,d\mu(x) = \lim_{n\to\infty}\Big[\sum_{k=1}^{n2^n}f\Big(\frac{k-1}{2^n}\Big)\Big(F\Big(\frac k{2^n}\Big) - F\Big(\frac{k-1}{2^n}\Big)\Big) + f(n)(1 - F(n))\Big] = \int_0^\infty f(x)\,dF(x), \]
which is the Lebesgue-Stieltjes integral with respect to the measure $dF$.
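The dyadic construction in the Approximation Theorem can be checked numerically. The Python sketch below builds $f_n = \min(\lfloor 2^nf\rfloor/2^n,\ n)$ (which agrees with the elementary functions above up to the convention at level-set boundaries) for the illustrative choice $f(x) = x^2$ on $[0,1]$ with Lebesgue measure, and shows the integrals increasing to $\int_0^1x^2\,dx = 1/3$.

```python
import numpy as np

# Sketch of the dyadic approximation: f_n takes the value (k-1)/2^n on
# the set {(k-1)/2^n < f <= k/2^n} and n on {f > n}. The integrals of
# f_n increase to the integral of f (evaluated here on a fine grid).
x = np.linspace(0, 1, 200001)
dx = x[1] - x[0]
f = x**2

def f_n(n):
    return np.minimum(np.floor(f * 2**n) / 2**n, n)

for n in (1, 2, 4, 8):
    print(n, np.sum(f_n(n)) * dx)   # monotonically increasing values
print("exact:", 1 / 3)
```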
The space $L$ of measurable functions on $(E, \mathcal{F})$ is a complete metric space with metric
\[ d(f, g) = \int_E\frac{|f - g|}{1 + |f - g|}\,d\mu. \]
In fact, $d(f, g) = 0$ if and only if $f = g$ almost everywhere, and since
\[ \frac{|f + g|}{1 + |f + g|} \le \frac{|f|}{1 + |f|} + \frac{|g|}{1 + |g|}, \]
$d$ satisfies the triangle inequality.

Theorem (Completeness) $(L, d)$ is a complete metric space.

Proof: Let $\{f_n\}$ be a Cauchy sequence in $(L, d)$. Select a subsequence $\{f_{n_k}\}$ such that
\[ \sum_{k=1}^\infty d(f_{n_k}, f_{n_{k+1}}) < \infty. \]
Then
\[ \int_E\sum_k\frac{|f_{n_k} - f_{n_{k+1}}|}{1 + |f_{n_k} - f_{n_{k+1}}|}\,d\mu < \infty. \]
Since $|f| \le \frac{2|f|}{1+|f|}$ for $|f| \le 1$, it follows that
\[ \sum_{k=1}^\infty|f_{n_k} - f_{n_{k+1}}| < \infty \ \text{a.e.,} \]
and $\{f_{n_k}\}$ converges almost everywhere to some $f \in L$. Moreover, $d(f_{n_k}, f) \to 0$, and thus $(L, d)$ is complete. $\square$

Definition (Convergence in Measure) A sequence $\{f_n\}$ of measurable functions converges to $f$ in measure if for every $\epsilon > 0$, $\lim_{n\to\infty}\mu(\{x : |f_n(x) - f(x)| \ge \epsilon\}) = 0$.

Theorem (Convergence in measure) $\{f_n\}$ converges in measure to $f$ if and only if $\{f_n\}$ converges to $f$ in the $d$-metric.

Proof: For $f, g \in L$,
\[ d(f, g) = \int_{|f-g|\ge\epsilon}\frac{|f - g|}{1 + |f - g|}\,d\mu + \int_{|f-g|<\epsilon}\frac{|f - g|}{1 + |f - g|}\,d\mu \le \mu(|f - g| \ge \epsilon) + \frac\epsilon{1 + \epsilon} \]
holds for all $\epsilon > 0$. Thus, if $f_n$ converges in measure to $f$, then $f_n$ converges to $f$ in the $d$-metric. Conversely, since
\[ d(f, g) \ge \frac\epsilon{1 + \epsilon}\,\mu(|f - g| \ge \epsilon), \]
if $f_n$ converges to $f$ in the $d$-metric, then $f_n$ converges in measure to $f$. $\square$

Corollary The completion of $C(E)$ with respect to the $L^p(E)$-norm
\[ \Big(\int_E|f|^p\,d\mu\Big)^{\frac1p} \]
is $L^p(E) = \{f \in L : \int_E|f|^p\,d\mu < \infty\}$.

2.3 Baire's Category Theory

In this section we discuss the uniform boundedness principle and the open mapping theorem. The underlying idea of the proofs of these theorems is the Baire theorem for complete metric spaces, which is concerned with the decomposition of a space as a union of subsets. For instance, we have the countable union
\[ \mathbb{R}^2 = \bigcup_{k,j}S_{k,j}, \quad S_{k,j} = (k, k+1]\times(j, j+1]. \]
On the other hand, we have the uncountable union
\[ \mathbb{R}^2 = \bigcup_{\alpha\in\mathbb{R}}\ell_\alpha, \quad \ell_\alpha = \{\alpha\}\times\mathbb{R}, \]
but $\ell_\alpha$ has no interior points, being a one dimensional affine subspace of $\mathbb{R}^2$. The question is: can we represent $\mathbb{R}^2$ as a countable union of such sets? It turns out that the answer is no.

Definition (Category) Let $(X, d)$ be a metric space. A subset $E$ of $X$ is called nowhere dense if its closure does not contain any open set. $X$ is said to be of the first category if $X$ is the union of a countable number of nowhere dense sets; otherwise $X$ is said to be of the second category, i.e., if $X = \bigcup_kE_k$, then the closure of at least one of the $E_k$'s has non-empty interior.

Baire-Hausdorff Theorem A complete metric space is of the second category.

Proof: Let $\{E_k\}$ be a sequence of closed sets with $\bigcup_nE_n = X$, where $X$ is a complete metric space. We show that at least one of the $E_k$'s has non-empty interior. Assume otherwise, i.e., that no $E_n$ contains an interior point. Let $O_n = E_n^c$; then $O_n$ is open and dense in $X$ for every $n \ge 1$. Thus $O_1$ contains a closed ball $B_1 = \{x : d(x, x_1) \le r_1\}$ with $r_1 \in (0, \frac12)$. Since $O_2$ is open and dense, there exists a closed ball $B_2 = \{x : d(x, x_2) \le r_2\}$ with $r_2 \in (0, \frac1{2^2})$ contained in $B_1\cap O_2$. Repeating this process, we obtain a sequence $\{B_k\}$ of closed balls $B_k = \{x : d(x, x_k) \le r_k\}$ such that
\[ B_{k+1} \subset B_k, \quad 0 < r_k < \frac1{2^k}, \quad B_k\cap E_k = \emptyset. \]
The sequence $\{x_k\}$ is a Cauchy sequence in $X$, and since $X$ is complete, $x^* = \lim x_n \in X$ exists. Since
\[ d(x_n, x^*) \le d(x_n, x_m) + d(x_m, x^*) \le r_n + d(x_m, x^*) \to r_n \quad \text{as } m \to \infty, \]
$x^* \in B_n$ for every $n$, i.e., $x^* \notin E_n$ for all $n$, and thus $x^* \notin \bigcup_nE_n = X$, which is a contradiction. $\square$

Baire Category Theorem Let $M$ be a set of the first category in a complete metric space $X$. Then the complement $M^c$ is dense in $X$.

Theorem (Totally bounded) A subset $M$ of a complete metric space $X$ is relatively compact if and only if it is totally bounded, i.e., for every $\epsilon > 0$ there exists a family of finitely many points $\{m_1, m_2, \cdots, m_n\}$ in $M$ such that for every point $m \in M$, $d(m, m_k) < \epsilon$ for at least one $m_k$.

Theorem (Banach-Steinhaus, uniform boundedness principle) Let $X, Y$ be Banach spaces and let $T_k$, $k \in K$, be a family (not necessarily countable) of continuous linear operators from $X$ to $Y$. Assume that
\[ \sup_{k\in K}|T_kx|_Y < \infty \quad \text{for all } x \in X. \]
Then there exists a constant $c$ such that
\[ |T_kx|_Y \le c\,|x|_X \quad \text{for all } k \in K \text{ and } x \in X. \]

Proof: Define
\[ X_n = \{x \in X : |T_kx| \le n \text{ for all } k \in K\}. \]
Then $X_n$ is closed, and by the assumption we have $X = \bigcup_nX_n$. It follows from the Baire category theorem that $\mathrm{int}(X_{n_0})$ is not empty for some $n_0$. Pick $x_0 \in X$ and $r > 0$ such that $B(x_0, r) \subset \mathrm{int}(X_{n_0})$. We have $|T_k(x_0 + rz)| \le n_0$ for all $k$ and $z \in B(0, 1)$, which implies
\[ r\,|T_k|_{\mathcal{L}(X,Y)} \le n_0 + |T_kx_0| \le 2n_0. \quad \square \]

The uniform boundedness principle is quite remarkable and surprising, since from pointwise estimates one derives a global (uniform) estimate. We have the following examples.

Examples (1) Let $X$ be a Banach space and $B$ a subset of $X$. If for every $f \in X^*$ the set $f(B) = \{\langle f, x\rangle : x \in B\}$ is bounded in $\mathbb{R}$, then $B$ is bounded.
(2) Let $X$ be a Banach space and $B^*$ a subset of $X^*$. If for every $x \in X$ the set $\langle B^*, x\rangle = \{\langle f, x\rangle : f \in B^*\}$ is bounded in $\mathbb{R}$, then $B^*$ is bounded.

Theorem (Banach closed graph theorem) A closed linear operator defined on a Banach space $X$ into a Banach space $Y$ is continuous.

2.4 Dual space and Hahn-Banach theorem

In this section we discuss the dual space of a normed space and the Hahn-Banach theorem.

Definition (Dual Space) If $X$ is a normed space, let $X^*$ denote the collection of all bounded linear functionals on $X$; $X^*$ is called the dual space of $X$. We define the dual product on $X^*\times X$ by
\[ \langle f, x\rangle_{X^*\times X} = f(x) \quad \text{for } x \in X, \ f \in X^*. \]
For $f \in X^*$ we define the operator norm of $f$ by
\[ |f|_{X^*} = \sup_{x\ne0}\frac{|f(x)|}{|x|_X}. \]

Theorem (Dual space) The dual space $X^*$ equipped with the operator norm is a Banach space.

Proof: Let $\{f_n\}$ be a Cauchy sequence in $X^*$, i.e.,
\[ |f_n(\phi) - f_m(\phi)| \le |f_n - f_m|_*|\phi| \to 0 \quad \text{as } m, n \to \infty, \text{ for all } \phi \in X. \]
Thus $\{f_n(\phi)\}$ is Cauchy in $\mathbb{R}$; define the functional $f$ on $X$ by
\[ f(\phi) = \lim_{n\to\infty}f_n(\phi), \quad \phi \in X. \]
Then $f$ is linear, and for all $\epsilon > 0$ there exists $N$ such that for $m, n \ge N$
\[ |f_m(\phi) - f_n(\phi)| \le \epsilon|\phi|. \]
Letting $m \to \infty$, we have $|f(\phi) - f_n(\phi)| \le \epsilon|\phi|$ for all $n \ge N$, and thus $f \in X^*$ and $|f_n - f|_* \to 0$ as $n \to \infty$. $\square$

Examples (Dual space) (1) Let $\ell^p$ be the space of $p$-summable infinite sequences $x = (s_1, s_2, \cdots)$ with norm $|x|_p = (\sum|s_k|^p)^{\frac1p}$. Then $(\ell^p)^* = \ell^q$, $\frac1p + \frac1q = 1$, for $1 \le p < \infty$. Let $\ell^\infty$ be the space of uniformly bounded sequences $x = (\xi_1, \xi_2, \cdots)$ with norm $|x|_\infty = \sup_k|\xi_k|$, and let $c_0$ be the subspace of sequences with $\lim_{n\to\infty}\xi_n = 0$. Then $(c_0)^* = \ell^1$.
(2) Let $L^p(\Omega)$ be the space of $p$-integrable functions $f$ on $\Omega$ with
\[ |f|_{L^p} = \Big(\int_\Omega|f(x)|^p\,dx\Big)^{\frac1p}. \]
Then $L^p(\Omega)^* = L^q(\Omega)$, where $\frac1p + \frac1q = 1$, for $p \ge 1$; in particular $L^1(\Omega)^* = L^\infty(\Omega)$ and $L^2(\Omega)^* = L^2(\Omega)$. Here $L^\infty(\Omega)$ is the space of essentially bounded functions on $\Omega$ with
\[ |f|_{L^\infty} = \inf\{M : |f(x)| \le M \text{ almost everywhere on } \Omega\}. \]
(3) Consider the linear functional $f(x) = x(1)$. Then $f$ is in $X_1^*$ but not in $X_2^*$, since for the sequence defined by $x^n(t) = t^n$, $t \in [0,1]$, we have $f(x^n) = 1$ but $|x^n|_{X_2} = \frac1{\sqrt{2n+1}} \to 0$ as $n \to \infty$. On the other hand, since for $x \in X_3$
\[ |x(t)| = \Big|x(0) + \int_0^tx'(s)\,ds\Big| \le \sqrt{\int_0^1|x'(s)|^2\,ds}, \]
the point evaluation $f(x) = x(t)$ defines an element of $X_3^*$.
(4) The total variation $V_0^1(g)$ of a function $g$ on the interval $[0,1]$ is defined by
\[ V_0^1(g) = \sup_{P\in\mathcal{P}}\sum_{i=0}^{n-1}|g(t_{i+1}) - g(t_i)|, \]
where the supremum is taken over all partitions $P = \{0 \le t_0 \le \cdots \le t_n \le 1\}$ of the interval $[0,1]$. Then
\[ BV(0,1) = \{\text{uniformly bounded functions } g \text{ with } V_0^1(g) < \infty\} \]
with $|g|_{BV} = V_0^1(g)$; it is a normed space if we assume $g(0) = 0$. For example, every monotone function is of bounded variation, and every real-valued BV function can be expressed as the difference of two increasing functions (Jordan decomposition theorem).

Hahn-Banach Theorem Suppose $S$ is a subspace of a normed space $X$ and $f : S \to \mathbb{R}$ is a linear functional satisfying $f(x) \le |x|$ for all $x \in S$. Then there exists an $F \in X^*$ such that $F(x) \le |x|$ for all $x \in X$ and $F(s) = f(s)$ for all $s \in S$.

In general, the theorem holds for a vector space $X$ and a subadditive, positively homogeneous function $p$, i.e., for all $x, y \in X$ and $\alpha > 0$,
\[ p(x + y) \le p(x) + p(y), \qquad p(\alpha x) = \alpha p(x). \]
Every linear functional $f$ on a subspace $S$ satisfying $f(x) \le p(x)$ has an extension $F$ on $X$ satisfying $F(x) \le p(x)$ for all $x \in X$.

Proof: First, we show that there exists a one-step extension of $f$: for $x_0 \in X\setminus S$ there exists an extension $f_1$ of $f$ to the subspace $Y_1$ spanned by $(S, x_0)$ such that $f_1(x) \le p(x)$ on $Y_1$. Note that for arbitrary $s_1, s_2 \in S$,
\[ f(s_1) + f(s_2) = f(s_1 + s_2) \le p(s_1 + s_2) \le p(s_1 - x_0) + p(s_2 + x_0). \]
Hence there exists a constant $c$ such that
\[ \sup_{s\in S}\{f(s) - p(s - x_0)\} \le c \le \inf_{s\in S}\{p(s + x_0) - f(s)\}. \]
Extend $f$ by defining, for $s \in S$ and $\alpha \in \mathbb{R}$,
\[ f_1(s + \alpha x_0) = f(s) + \alpha c, \]
where $f_1(x_0) = c$ is determined as above. We claim that $f_1(s + \alpha x_0) \le p(s + \alpha x_0)$ for all $\alpha \in \mathbb{R}$ and $s \in S$. Indeed, by the definition of $c$, for $\alpha > 0$
\[ f_1(s + \alpha x_0) = \alpha\Big(f\Big(\frac s\alpha\Big) + c\Big) \le \alpha\Big(f\Big(\frac s\alpha\Big) + p\Big(\frac s\alpha + x_0\Big) - f\Big(\frac s\alpha\Big)\Big) = p(s + \alpha x_0), \]
and for $\alpha < 0$
\[ f_1(s + \alpha x_0) = -\alpha\Big(f\Big(\frac s{-\alpha}\Big) - c\Big) \le -\alpha\Big(f\Big(\frac s{-\alpha}\Big) + p\Big(\frac s{-\alpha} - x_0\Big) - f\Big(\frac s{-\alpha}\Big)\Big) = p(s + \alpha x_0). \]
If $X$ is a separable normed space, let $\{x_1, x_2, \dots\}$ be a countable dense subset of $X$. Select from it, recursively, vectors independent of $S$, and extend $f$ to the subspaces $\mathrm{span}\{\mathrm{span}\{S, x_1\}, x_2\}, \dots$ recursively, as described above. The space $\mathrm{span}\{S, x_1, x_2, \dots\}$ is dense in $X$, and the final extension $F$ is defined by continuity.
In general, the remaining part of the proof is carried out using Zorn's lemma.

Corollary Let $X$ be a normed space. For each $x_0 \in X$ there exists an $f \in X^*$ such that $f(x_0) = |f|_{X^*}|x_0|_X$.

Proof of Corollary: Let $S = \{\alpha x_0 : \alpha \in \mathbb{R}\}$ and define $f(\alpha x_0) = \alpha|x_0|_X$. By the Hahn-Banach theorem there exists an extension $F \in X^*$ of $f$ such that $F(x) \le |x|$ for all $x \in X$. Since $-F(x) = F(-x) \le |-x| = |x|$, we have $|F(x)| \le |x|$, in particular $|F|_{X^*} \le 1$. On the other hand, $F(x_0) = f(x_0) = |x_0|$, thus $|F|_{X^*} = 1$ and $F(x_0) = |F|\,|x_0|$. $\square$

Example (Hahn-Banach Theorem) (1) $|x| = \sup_{|x^*|=1}\langle x^*, x\rangle$.
(2) Next, we prove the geometric form of the Hahn-Banach theorem. It states that given a convex set $K$ containing an interior point and $x_0 \notin K$, the point $x_0$ is separated from $K$ by a hyperplane. A hyperplane is a maximal affine set $x_0 + M$, where $M$ is a subspace of $X$. We introduce the Minkowski functional, which measures the distance from the origin in units of $K$.

Definition (Minkowski functional) Let $K$ be a convex set in a normed space $X$ such that $0$ is an interior point of $K$. The Minkowski functional $p$ of $K$ is defined by
\[ p(x) = \inf\Big\{r > 0 : \frac xr \in K\Big\}. \]
Note that if $K$ is the unit ball of $X$, then $p(x) = |x|$.

Lemma (Minkowski functional)
(1) $0 \le p(x) < \infty$.
(2) $p(\alpha x) = \alpha p(x)$ for $\alpha \ge 0$.
(3) $p(x_1 + x_2) \le p(x_1) + p(x_2)$.
(4) $p$ is continuous.
(5) $\bar K = \{x : p(x) \le 1\}$, $\mathrm{int}(K) = \{x : p(x) < 1\}$.

Proof: (1) Since $K$ contains a ball centered at $0$, for every $x \in X$ there exists $r > 0$ such that $\frac xr \in K$, and thus $p(x)$ is finite.
(2) For $\alpha > 0$,
\[ p(\alpha x) = \inf\Big\{r : \frac{\alpha x}r \in K\Big\} = \inf\Big\{\alpha r' : \frac x{r'} \in K\Big\} = \alpha\inf\Big\{r' : \frac x{r'} \in K\Big\} = \alpha p(x). \]
(3) Let $p(x_i) < r_i < p(x_i) + \epsilon$, $i = 1, 2$, for arbitrary $\epsilon > 0$, and let $r = r_1 + r_2$. By the convexity of $K$,
\[ \frac{r_1}r\,\frac{x_1}{r_1} + \frac{r_2}r\,\frac{x_2}{r_2} = \frac{x_1 + x_2}r \in K, \]
and thus $p(x_1 + x_2) \le r \le p(x_1) + p(x_2) + 2\epsilon$. Since $\epsilon > 0$ is arbitrary, $p$ is subadditive.
(4) Let $\epsilon$ be the radius of a closed ball centered at $0$ and contained in $K$. Since $\epsilon\frac x{|x|} \in K$, we have $p(\epsilon\frac x{|x|}) \le 1$ and thus $p(x) \le \frac1\epsilon|x|$. This shows that $p$ is continuous at $0$. Since $p$ is sublinear,
\[ p(x) = p(x - y + y) \le p(x - y) + p(y) \quad \text{and} \quad p(y) = p(y - x + x) \le p(y - x) + p(x), \]
and thus $-p(y - x) \le p(x) - p(y) \le p(x - y)$. Hence $p$ is continuous, since it is continuous at $0$.
(5) This follows from the continuity of $p$. $\square$

Lemma (Hyperplane) A subset $H$ of a vector space $X$ is a hyperplane if and only if there exist a nonzero linear functional $f$ on $X$ and a constant $c$ such that $H = \{x : f(x) = c\}$.

Proof: If $H$ is a hyperplane, then $H$ is the translation of a subspace $M$ of $X$, i.e., $H = x_0 + M$. If $x_0 \notin M$, then $\mathrm{span}[M, x_0] = X$. If for $x = \alpha x_0 + m$, $m \in M$, we let $f(x) = \alpha$, we have $H = \{x : f(x) = 1\}$. Conversely, if $f$ is a nonzero linear functional on $X$, then $M = \{x : f(x) = 0\}$ is a subspace. Let $x_0 \in X$ be such that $f(x_0) = 1$. Since $f(x - f(x)x_0) = 0$, $X = \mathrm{span}[\{x_0\}, M]$, and $\{x : f(x) = c\} = \{x : f(x - cx_0) = 0\}$ is a hyperplane. $\square$

Theorem (Mazur's Theorem) Let $K$ be a convex set with a nonempty interior in a normed space $X$. Suppose $V$ is an affine set in $X$ that contains no interior points of $K$. Then there exist $x^* \in X^*$ and a constant $c$ such that the hyperplane $\{x : \langle x^*, x\rangle = c\}$ gives the separation
\[ \langle x^*, v\rangle = c \ \text{for all } v \in V \quad \text{and} \quad \langle x^*, k\rangle < c \ \text{for all } k \in \mathrm{int}(K). \]

Proof: By an appropriate translation one may assume that $0$ is an interior point of $K$. Let $M$ be the subspace spanned by $V$. Since $V$ does not contain $0$, there exists a linear functional $f$ on $M$ such that $V = \{x \in M : f(x) = 1\}$. Let $p$ be the Minkowski functional of $K$.
Since $V$ contains no interior points of $K$, $f(x) = 1 \le p(x)$ for all $x \in V$, and by the homogeneity of $p$, $f(x) \le p(x)$ for all $x \in M$. By the Hahn-Banach theorem there exists an extension $F$ of $f$ from $M$ to $X$ with $F(x) \le p(x)$. From the Lemma, $p$ is continuous; hence $F$ is continuous and $F(x) < 1$ for $x \in \mathrm{int}(K)$. Thus $H = \{x : F(x) = 1\}$ is the desired closed hyperplane. $\square$

Example (Hilbert space) If $X$ is a Hilbert space and $S$ is a subspace of $X$, then the extension $F$ is defined by
\[ F(s) = \lim_{n\to\infty}f(s_n) \ \text{for } s = \lim_{n\to\infty}s_n \in \bar S, \qquad F(s) = 0 \ \text{for } s \in \bar S^\perp, \]
where $X = \bar S\oplus\bar S^\perp$.

Example (Integral) Let $X = L^2(0,1)$, $S = C(0,1)$ and let $f(x) = R\text{-}\int_0^1x(t)\,dt$ be the Riemann integral of $x \in S$. Then the extension $F(x) = L\text{-}\int_0^1x(t)\,dt$ is the Lebesgue integral of $x \in X$.

Example (Alignment) (1) For a Hilbert space $X$, $f = x_0$ is self-aligned.
(2) Let $x_0 \in X = L^p(\Omega)$, $1 \le p < \infty$, and define $f \in X^* = L^q(\Omega)$ by $f(t) = |x_0(t)|^{p-2}x_0(t)$, $t \in \Omega$. Then
\[ f(x_0) = \int_\Omega|x_0(t)|^p\,dt. \]
Thus $x_0|x_0|^{p-2} \in X^*$ is aligned with $x_0$.

A function $g$ is called right continuous if $\lim_{h\to0^+}g(t + h) = g(t)$. Let
\[ BV_0[0,1] = \{g \in BV[0,1] : g \text{ is right continuous on } [0,1) \text{ and } g(0) = 0\}. \]
Let $C[0,1]$ be the normed space of continuous functions on $[0,1]$ with the sup norm. For $x \in C[0,1]$ and $g \in BV_0[0,1]$ define the Riemann-Stieltjes integral by
\[ \int_0^1x(t)\,dg(t) = \lim_{\Delta t\to0}R(x, g, P) = \lim_{\Delta t\to0}\sum_{k=1}^nx(z_k)(g(t_k) - g(t_{k-1})) \]
for partitions $P = \{0 = t_0 < \cdots < t_n = 1\}$ and $z_k \in [t_{k-1}, t_k]$. There is a corresponding version of the Riesz representation theorem.

Theorem (Dual of C[0,1]) There is a norm-preserving linear isomorphism from $C[0,1]^*$ to $BV_0[0,1]$, i.e., for $F \in C[0,1]^*$ there exists a unique $g \in BV_0[0,1]$ such that
\[ F(x) = \int_0^1x(t)\,dg(t) \]
and $|F|_{C[0,1]^*} = |g|_{BV}$. That is, $C[0,1]^* \cong BV_0[0,1]$, a space of functions of bounded variation.

Proof: Define the normed space of uniformly bounded functions on $[0,1]$ by
\[ B = \{x : [0,1] \to \mathbb{R} \text{ bounded on } [0,1]\} \quad \text{with norm } |x|_B = \sup_{t\in[0,1]}|x(t)|. \]
By the Hahn-Banach theorem, an extension $F$ of $f$ from $C[0,1]$ to $B$ exists with $|F| = |f|$. For any $s \in (0,1]$, define $\chi_{[0,s]} \in B$ and set $v(s) = F(\chi_{[0,s]})$, $v(0) = 0$. Then, for any partition $\{t_k\}$,
\[ \sum_k|v(t_k) - v(t_{k-1})| = \sum_k\mathrm{sign}(v(t_k) - v(t_{k-1}))\,(v(t_k) - v(t_{k-1})) = F\Big(\sum_k\mathrm{sign}(v(t_k) - v(t_{k-1}))\,(\chi_{[0,t_k]} - \chi_{[0,t_{k-1}]})\Big) \le |F|\,\Big|\sum_k\mathrm{sign}(v(t_k) - v(t_{k-1}))\,\chi_{(t_{k-1},t_k]}\Big|_B \le |f|. \]
Thus $v \in BV[0,1]$ and $|v|_{BV} \le |F|$. Next, for any $x \in C[0,1]$ define the step function
\[ z(t) = \sum_kx(t_{k-1})\,(\chi_{[0,t_k]} - \chi_{[0,t_{k-1}]}). \]
Note that the difference satisfies
\[ |z - x|_B = \max_k\max_{t\in[t_{k-1},t_k]}|x(t_{k-1}) - x(t)| \to 0 \quad \text{as } \Delta t \to 0^+. \]
Since $F$ is bounded, $F(z) \to F(x) = f(x)$, and
\[ F(z) = \sum_kx(t_{k-1})(v(t_k) - v(t_{k-1})) \to \int_0^1x(t)\,dv(t). \]
Hence
\[ f(x) = \int_0^1x(t)\,dv(t) \]
and $|f(x)| \le |x|_{C[0,1]}|v|_{BV}$, so $|f| \le |v|_{BV}$. Therefore $|F| = |f| = |v|_{BV}$. $\square$
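As a numerical companion to the theorem above, the following Python sketch evaluates Riemann-Stieltjes sums $\sum_k x(z_k)(g(t_k) - g(t_{k-1}))$ for the BV function $g$ with a unit jump at $t = \frac12$; the sums converge to $x(\frac12)$, i.e., this $g$ represents the point evaluation functional. The integrand is an illustrative choice.

```python
import numpy as np

# Sketch: Riemann-Stieltjes sums for F(x) = int_0^1 x dg with
# g(t) = 0 for t < 1/2 and g(t) = 1 for t >= 1/2 (total variation 1).
# The sums converge to x(1/2), the point evaluation functional,
# consistent with the identification C[0,1]* ~ BV_0[0,1].
def stieltjes(x, g, n):
    t = np.linspace(0, 1, n + 1)
    z = 0.5 * (t[:-1] + t[1:])        # tags z_k in [t_{k-1}, t_k]
    return np.sum(x(z) * (g(t[1:]) - g(t[:-1])))

x = np.cos                              # a continuous integrand
g = lambda t: (t >= 0.5).astype(float)
for n in (10, 100, 1000):
    print(n, stieltjes(x, g, n))
print("x(1/2) =", np.cos(0.5))
```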
Riesz Representation Theorem For every bounded linear functional $f$ on a Hilbert space $X$ there exists a unique element $y_f \in X$ such that
\[ f(x) = (x, y_f) \ \text{for all } x \in X, \quad \text{and} \quad |f|_{X^*} = |y_f|_X. \]
We define the Riesz map $R : X^* \to X$ by $Rf = y_f \in X$ for $f \in X^*$, for a Hilbert space $X$.

Proof: If $f = 0$, take $y_f = 0$. Otherwise there exists $z \in X$ with $|z| = 1$ such that $(x, z) = 0$ for all $x$ with $f(x) = 0$. Note that for every $x \in X$ we have $f(f(x)z - f(z)x) = 0$. Thus
\[ 0 = (f(x)z - f(z)x, z) = f(x)(z, z) - f(z)(x, z), \]
which implies $f(x) = (x, f(z)z)$. The existence then follows by taking $y_f = f(z)z$. For uniqueness, suppose $f(x) = (x, u_1) = (x, u_2)$ for all $x \in X$. Then $(x, u_1 - u_2) = 0$, and taking $x = u_1 - u_2$ we obtain $|u_1 - u_2| = 0$. $\square$

Definition (Reflexive Banach space) For a Banach space $X$ we define an injection $\iota$ of $X$ into $X^{**} = (X^*)^*$ (the double dual) as follows: if $x \in X$, then $\iota x \in X^{**}$ is defined by $\iota x(f) = f(x)$ for $f \in X^*$. Note that
\[ |\iota x|_{X^{**}} = \sup_{f\in X^*}\frac{|f(x)|}{|f|_{X^*}} \le \sup_{f\in X^*}\frac{|f|_{X^*}|x|_X}{|f|_{X^*}} = |x|_X, \]
and thus $|\iota x|_{X^{**}} \le |x|_X$. But by the Hahn-Banach theorem there exists an $f \in X^*$ such that $|f|_{X^*} = 1$ and $f(x) = |x|_X$. Hence $|\iota x|_{X^{**}} = |x|_X$ and $\iota : X \to X^{**}$ is an isometry. If $\iota$ is also surjective, then we say that $X$ is reflexive and we may identify $X$ with $X^{**}$.

Definition (Weak and Weak Star Convergence) (1) A sequence $\{x_n\}$ in a normed space $X$ is said to converge weakly to $x \in X$ if $\lim_{n\to\infty}f(x_n) = f(x)$ for all $f \in X^*$.
(2) A sequence $\{f_n\}$ in the dual space $X^*$ of a normed space $X$ is said to converge weakly star to $f \in X^*$ if $\lim_{n\to\infty}f_n(x) = f(x)$ for all $x \in X$.

Notes (1) If $x_n$ converges strongly to $x \in X$, then $x_n$ converges weakly to $x$, but not conversely.
(2) If $x_n$ converges weakly to $x$, then $\{|x_n|_X\}$ is bounded and $|x| \le \liminf|x_n|_X$, i.e., the norm is weakly lower semicontinuous.
(3) A bounded sequence $\{x_n\}$ in a normed space $X$ converges weakly to $x \in X$ if $\lim_{n\to\infty}f(x_n) = f(x)$ for all $f \in S$, where $S$ is a dense set in $X^*$.
(4) Let $X$ be a Hilbert space. If a sequence $\{x_n\}$ of $X$ converges weakly to $x \in X$, then $x_n$ converges strongly to $x$ if and only if $\lim_{n\to\infty}|x_n| = |x|$. In fact,
\[ |x_n - x|^2 = (x_n - x, x_n - x) = |x_n|^2 - (x_n, x) - (x, x_n) + |x|^2 \to 0 \quad \text{as } n \to \infty \]
if $x_n$ converges weakly to $x$ and $|x_n| \to |x|$.

Example (Weak Convergence) Let $X = L^2(0,1)$ and consider the sequence $x_n = \sqrt2\sin(n\pi t)$, $t \in (0,1)$. Then $|x_n| = 1$ and $x_n$ converges weakly to $0$. In fact, since the linear span of the family $\{\sin(k\pi t)\}$ is dense in $L^2(0,1)$ and $(x_n, \sin(k\pi t))_{L^2} \to 0$ as $n \to \infty$ for each $k$, $x_n$ converges weakly to $0$. Since $|0| = 0 \ne 1 = \lim|x_n|$, the sequence $x_n$ does not converge strongly to $0$. In general, if $x_n \to x$ weakly and $y_n \to y$ strongly in a Hilbert space, then $(x_n, y_n) \to (x, y)$, since
\[ (x_n, y_n) - (x, y) = (x_n, y_n - y) + (x_n - x, y). \]
But note that $(x_n, x_n) = 2\int_0^1\sin^2(n\pi t)\,dt = 1 \ne 0$, so $(x_n, x_n)$ does not converge to $0$: in general, $(x_n, y_n)$ does not necessarily converge to $(x, y)$ if $x_n \to x$ and $y_n \to y$ only weakly in $X$.

Alaoglu Theorem If $X$ is a normed space and $\{x_n^*\}$ is a bounded sequence in $X^*$, then there exists a subsequence that converges weakly star to an element of $X^*$.

Proof: We prove the theorem for the case when $X$ is separable. Let $\{x_n^*\}$ be a sequence in $X^*$ with $|x_n^*| \le 1$, and let $\{x_k\}$ be a dense sequence in $X$. The sequence $\{\langle x_n^*, x_1\rangle\}$ is bounded in $\mathbb{R}$ and thus has a convergent subsequence, denoted $\{\langle x_{n_1}^*, x_1\rangle\}$. Similarly, the sequence $\{\langle x_{n_1}^*, x_2\rangle\}$ contains a convergent subsequence $\{\langle x_{n_2}^*, x_2\rangle\}$. Continuing in this manner, we extract subsequences such that $\{\langle x_{n_k}^*, x_k\rangle\}$ converges in $\mathbb{R}$. Thus the diagonal sequence $\{x_{nn}^*\}$ converges on the dense subset $\{x_k\}$, i.e., $\{\langle x_{nn}^*, x_k\rangle\}$ converges for each $x_k$. We will show that $x_{nn}^*$ converges weakly star to an element $x^* \in X^*$. Let $x \in X$ be fixed. For arbitrary $\epsilon > 0$ there exist $N$ and $\alpha_1, \cdots, \alpha_N$ such that
\[ \Big|x - \sum_{k=1}^N\alpha_kx_k\Big| \le \frac\epsilon3. \]
Thus
\[ |\langle x_{nn}^*, x\rangle - \langle x_{mm}^*, x\rangle| \le \Big|\Big\langle x_{nn}^*, x - \sum_{k=1}^N\alpha_kx_k\Big\rangle\Big| + \Big|\Big\langle x_{nn}^* - x_{mm}^*, \sum_{k=1}^N\alpha_kx_k\Big\rangle\Big| + \Big|\Big\langle x_{mm}^*, x - \sum_{k=1}^N\alpha_kx_k\Big\rangle\Big| \le \epsilon \]
for sufficiently large $nn, mm$. Thus $\{\langle x_{nn}^*, x\rangle\}$ is a Cauchy sequence in $\mathbb{R}$ and converges to a real number $\langle x^*, x\rangle$. The functional $\langle x^*, x\rangle$ so defined is obviously linear, and $|x^*| \le 1$ since $|\langle x_{nn}^*, x\rangle| \le |x|$. $\square$
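The weak convergence example above ($x_n = \sqrt2\sin(n\pi t)$) is easy to check numerically. In the following Python sketch, $(x_n, \phi) \to 0$ for a fixed test function while $(x_n, x_n) = 1$ throughout, matching the discussion; the test function is an illustrative choice.

```python
import numpy as np

# Sketch: x_n(t) = sqrt(2) sin(n pi t) satisfies |x_n|_{L^2} = 1 and
# (x_n, phi) -> 0 for a fixed phi (weak convergence to 0), while
# (x_n, x_n) = 1 does not vanish (no strong convergence).
t = np.linspace(0, 1, 100001)
dt = t[1] - t[0]
phi = t * (1 - t)                      # a fixed test function
for n in (1, 10, 100, 1000):
    xn = np.sqrt(2) * np.sin(n * np.pi * t)
    print(n, np.sum(xn * phi) * dt, np.sum(xn * xn) * dt)
```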
Example (Weak Star Convergence) (1) Let $X = L^1(\Omega)$, so that $L^\infty(\Omega) = X^*$; thus a bounded sequence in $L^\infty(\Omega)$ has a weakly star convergent subsequence. Let $X = L^\infty(0,1)$ and let $\{x_n\}$ be the sequence in $X$ defined by
\[ x_n(t) = \begin{cases} -1 & \text{on } [0, \frac12 - \frac1n] \\ n(t - \frac12) & \text{on } [\frac12 - \frac1n, \frac12 + \frac1n] \\ 1 & \text{on } [\frac12 + \frac1n, 1]. \end{cases} \]
Then $|x_n|_X = 1$ and $\{x_n\}$ converges weakly star to the sign function $x(t) = \mathrm{sign}(t - \frac12)$. But since $|x_n - x|_{L^\infty} \ge \frac12$ for all $n$, $\{x_n\}$ does not converge strongly to $x$.
(2) Let $X = c_0$, the space of infinite sequences $x = (x_1, x_2, \cdots)$ that converge to zero, with norm $|x|_\infty = \max_k|x_k|$. Then $X^* = \ell^1$ and $X^{**} = \ell^\infty$. Let $\{x_n^*\}$ in $\ell^1 = X^*$ be defined by $x_n^* = (0, \cdots, 0, 1, 0, \cdots)$, where the $1$ appears in the $n$-th term only. Then $x_n^*$ converges weakly star to $0$ in $X^*$, but $x_n^*$ does not converge weakly to $0$, since $\langle x_n^*, x^{**}\rangle = 1$ for all $n$ for $x^{**} = (1, 1, \cdots) \in X^{**}$.

Example ($L^1(\Omega)$ bounded sequence) Let $X = L^1(0,1)$ and let $\{x_n\} \subset X$ be the sequence defined by
\[ x_n(t) = \begin{cases} n & t \in (\frac12 - \frac1{2n}, \frac12 + \frac1{2n}) \\ 0 & \text{otherwise.} \end{cases} \]
Then $|x_n|_{L^1} = 1$ and
\[ \int_0^1x_n(t)\phi(t)\,dt \to \phi(\tfrac12) \]
for all $\phi \in C[0,1]$, but the limiting functional $\phi \mapsto \phi(\frac12)$ cannot be represented as $\int_0^1x\phi\,dt$ with $x \in L^1(0,1)$. Thus $x_n$ does not converge weakly in $L^1(0,1)$. The sequence $x_n$ does, however, converge weakly star in $C[0,1]^*$ (to the Dirac measure at $t = \frac12$).

Eberlein-Shmulyan Theorem A Banach space $X$ is reflexive if and only if every closed bounded set is weakly sequentially compact, i.e., every bounded sequence of $X$ contains a subsequence that converges weakly to an element of $X$.

Next, we consider linear operators $T : X \to Y$.

Definition (Linear Operator) Let $X, Y$ be normed spaces. A linear operator $T$ on $D(T) \subset X$ into $Y$ satisfies
\[ T(\alpha x_1 + \beta x_2) = \alpha Tx_1 + \beta Tx_2, \]
where the domain $D(T)$ is a subspace of $X$. A linear operator $T$ is continuous if and only if there exists a constant $M \ge 0$ such that
\[ |Tx|_Y \le M|x|_X \quad \text{for all } x \in X. \]
For a continuous linear operator we define the operator norm $|T|$ by
\[ |T| = \sup_{|x|_X=1}|Tx|_Y = \inf M, \]
where we assume $D(T) = X$. A continuous linear operator on a normed space $X$ into $Y$ is called a bounded linear operator on $X$ into $Y$ and will be denoted by $T \in \mathcal{L}(X, Y)$. The norm of $T \in \mathcal{L}(X, Y)$ is denoted either by $|T|_{\mathcal{L}(X,Y)}$ or simply by $|T|$. Note that the space $\mathcal{L}(X, Y)$ is a Banach space with the operator norm and $X^* = \mathcal{L}(X, \mathbb{R})$.

Open Mapping Theorem Let $T$ be a continuous linear operator from a Banach space $X$ onto a Banach space $Y$. Then $T$ is an open map, i.e., $T$ maps every open set of $X$ onto an open set in $Y$.

Proof: Suppose $A : X \to Y$ is a surjective continuous linear operator. In order to prove that $A$ is an open map, it is sufficient to show that $A$ maps the open unit ball in $X$ to a neighborhood of the origin of $Y$. Let $U = B_1(0)$ and $V = B_1(0)$ be the open unit balls of $X$ and $Y$, respectively. Then $X = \bigcup_{k\in\mathbb{N}}kU$. Since $A$ is surjective,
\[ Y = A(X) = A\Big(\bigcup_{k\in\mathbb{N}}kU\Big) = \bigcup_{k\in\mathbb{N}}A(kU). \]
Since $Y$ is a Banach space, by the Baire category theorem there exists a $k \in \mathbb{N}$ such that $\overline{A(kU)}$ has nonempty interior. Let $c$ be an interior point of $\overline{A(kU)}$; then there exists $r > 0$ such that $B_r(c) \subseteq \overline{A(kU)}$. If $v \in V$, then
\[ c,\ c + rv \in B_r(c) \subseteq \overline{A(kU)}, \]
and the difference $rv = (c + rv) - c$ of elements of $\overline{A(kU)}$ satisfies $rv \in \overline{A(2kU)}$. Since $A$ is linear, $V \subseteq \overline{A(\frac{2k}rU)}$. Thus it follows that for all $y \in Y$ and $\epsilon > 0$ there exists $x \in X$ such that
\[ |x|_X < \frac{2k}r|y|_Y \quad \text{and} \quad |y - Ax|_Y < \epsilon. \qquad (2.1) \]
Let $\delta = \frac r{2k}$ and fix $y \in \delta V$. By (2.1), there is some $x_1 \in X$ with $|x_1| < 1$ and $|y - Ax_1| < \frac\delta2$. By (2.1) one can find a sequence $\{x_n\}$ inductively with
\[ |x_n| < 2^{-(n-1)} \quad \text{and} \quad |y - A(x_1 + x_2 + \cdots + x_n)| < \delta2^{-n}. \qquad (2.2) \]
Let $s_n = x_1 + x_2 + \cdots + x_n$.
From the first inequality in (2.2), $\{s_n\}$ is a Cauchy sequence, and since $X$ is complete, $\{s_n\}$ converges to some $x \in X$. By (2.2), the sequence $As_n$ tends to $y$ in $Y$, and thus $Ax = y$ by the continuity of $A$. Also, we have
\[ |x| = \lim_{n\to\infty}|s_n| \le \sum_{n=1}^\infty|x_n| < 2. \]
This shows that every $y \in \delta V$ belongs to $A(2U)$, or equivalently, that the image $A(U)$ of $U = B_1(0)$ in $X$ contains the open ball $\frac\delta2V \subset Y$. $\square$

Corollary (Open Mapping Theorem) A bounded linear operator $T$ between Banach spaces is boundedly invertible if and only if it is one-to-one and onto.

Definition (Closed Linear Operator) (1) The graph $G(T)$ of a linear operator $T$ on the domain $D(T) \subset X$ into $Y$ is the set $\{(x, Tx) : x \in D(T)\}$ in the product space $X\times Y$. Then $T$ is closed if its graph $G(T)$ is a closed linear subspace of $X\times Y$, i.e., if $x_n \in D(T)$ converges strongly to $x \in X$ and $Tx_n$ converges strongly to $y \in Y$, then $x \in D(T)$ and $y = Tx$. Thus the notion of a closed linear operator is an extension of the notion of a bounded linear operator.
(2) A linear operator $T$ is said to be closable if whenever $x_n \in D(T)$ converges strongly to $0$ and $Tx_n$ converges strongly to $y \in Y$, then $y = 0$.

For a closed linear operator $T$, the domain $D(T)$ is a Banach space when equipped with the graph norm
\[ |x|_{D(T)} = (|x|_X^2 + |Tx|_Y^2)^{\frac12}. \]

Example (Closed Linear Operator) The operator $T = \frac d{dt}$ with $X = Y = L^2(0,1)$ is closed, with
\[ \mathrm{dom}(T) = H^1(0,1) = \{f \in L^2(0,1) : f \text{ is absolutely continuous on } [0,1] \text{ with square integrable derivative}\}. \]
If $y_n = Tx_n$, then
\[ x_n(t) = x_n(0) + \int_0^ty_n(s)\,ds. \]
If $x_n \in \mathrm{dom}(T) \to x$ and $y_n \to y$ in $L^2(0,1)$, then letting $n \to \infty$ we have
\[ x(t) = x(0) + \int_0^ty(s)\,ds, \]
i.e., $x \in \mathrm{dom}(T)$ and $Tx = y$. In general, if $\lambda I + T$ for some $\lambda \in \mathbb{R}$ has a bounded inverse $(\lambda I + T)^{-1}$, then $T : \mathrm{dom}(T) \subset X \to X$ is closed. In fact, $Tx_n = y_n$ is equivalent to $x_n = (\lambda I + T)^{-1}(y_n + \lambda x_n)$. Supposing $x_n \to x$ and $y_n \to y$ in $X$ and letting $n \to \infty$, we have $x = (\lambda I + T)^{-1}(\lambda x + y)$, i.e., $x \in \mathrm{dom}(T)$ and $Tx = T(\lambda I + T)^{-1}(\lambda x + y) = y$.

Definition (Dual Operator) Let $T$ be a linear operator from $X$ into $Y$ with dense domain $D(T)$. The dual operator $T^*$ of $T$ is the linear operator from $Y^*$ into $X^*$ defined by
\[ \langle y^*, Tx\rangle_{Y^*\times Y} = \langle T^*y^*, x\rangle_{X^*\times X} \quad \text{for all } x \in D(T) \text{ and } y^* \in D(T^*). \]
In fact, for $y^* \in Y^*$, the element $x^* \in X^*$ satisfying $\langle y^*, Tx\rangle = \langle x^*, x\rangle$ for all $x \in D(T)$ is uniquely defined if and only if $D(T)$ is dense. The only if part follows since, if $\overline{D(T)} \ne X$, then by the Hahn-Banach theorem there exists a nonzero $x_0^* \in X^*$ such that $\langle x_0^*, x\rangle = 0$ for all $x \in D(T)$, which contradicts the uniqueness. If $T$ is bounded with $D(T) = X$, then $T^*$ is bounded with $|T| = |T^*|$.

Examples Consider the gradient operator $T : L^2(\Omega) \to L^2(\Omega)^n$, $Tu = \nabla u = (D_{x_1}u, \cdots, D_{x_n}u)$, with $D(T) = H^1(\Omega)$. Then we have, for $v \in L^2(\Omega)^n$,
\[ T^*v = -\mathrm{div}\,v = -\sum_kD_{x_k}v_k \]
with domain
\[ D(T^*) = \{v \in L^2(\Omega)^n : \mathrm{div}\,v \in L^2(\Omega) \text{ and } n\cdot v = 0 \text{ at the boundary } \partial\Omega\}. \]
In fact, by the divergence theorem,
\[ (Tu, v) = \int_\Omega\nabla u\cdot v\,dx = \int_{\partial\Omega}(n\cdot v)u\,ds - \int_\Omega u\,(\mathrm{div}\,v)\,dx = (u, T^*v) \]
for all $v \in C^1(\bar\Omega)$. First, for $u \in H_0^1(\Omega)$ the boundary term vanishes and we obtain $T^*v = -\mathrm{div}\,v \in L^2(\Omega)$, since $H_0^1(\Omega)$ is dense in $L^2(\Omega)$. Then, for general $u \in H^1(\Omega)$, the boundary term must vanish, and thus $n\cdot v = 0$ at $\partial\Omega$.

Definition (Hilbert space Adjoint Operator) Let $X, Y$ be Hilbert spaces and let $T$ be a linear operator from $X$ into $Y$ with dense domain $D(T)$. The Hilbert space adjoint $T^*$ of $T$ is the linear operator from $Y$ into $X$ defined by
\[ (y, Tx)_Y = (T^*y, x)_X \quad \text{for all } x \in D(T) \text{ and } y \in D(T^*). \]
Note that if $T' : Y^* \to X^*$ denotes the dual operator of $T$, then $T^*R_{Y^*\to Y} = R_{X^*\to X}T'$, where $R_{X^*\to X}$ and $R_{Y^*\to Y}$ are the Riesz maps.
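The integration-by-parts identity behind the gradient/divergence duality can be checked numerically in one dimension. The following Python sketch verifies $(u', v) = -(u, v')$ for $u$ vanishing at both endpoints, so that the boundary term $(n\cdot v)u$ drops; the functions and grid are illustrative choices.

```python
import numpy as np

# Sketch: numerical check of the identity behind T* = -div for the
# gradient operator, (u', v) = -(u, v') when u(0) = u(1) = 0
# (the boundary term (n.v)u vanishes).
x = np.linspace(0, 1, 100001)
dx = x[1] - x[0]
u = np.sin(np.pi * x)           # vanishes at both endpoints
v = np.exp(x)
du = np.gradient(u, dx)
dv = np.gradient(v, dx)
print(np.sum(du * v) * dx, -np.sum(u * dv) * dx)  # nearly equal
```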
Examples (Self-adjoint Operator) Let X = L²(Ω) and let T be the Laplace operator

Tu = ∆u = Σ_{k=1}^n D_{xk xk} u

with domain D(T) = H²(Ω) ∩ H¹0(Ω). Then T is self-adjoint, i.e., T∗ = T. In fact,

(Tu, v)_X = ∫_Ω ∆u v dx = ∫_{∂Ω} ((n · ∇u)v − (n · ∇v)u) ds + ∫_Ω ∆v u dx = (u, T∗v),

and the boundary integral vanishes for u, v ∈ D(T).

Closed Range Theorem Let X and Y be Banach spaces and T a closed linear operator on D(T) ⊂ X into Y. Assume that D(T) is dense in X. Then the following properties are all equivalent:
(a) R(T) is closed in Y.
(b) R(T∗) is closed in X∗.
(c) R(T) = N(T∗)⊥ = {y ∈ Y : ⟨y∗, y⟩ = 0 for all y∗ ∈ N(T∗)}.
(d) R(T∗) = N(T)⊥ = {x∗ ∈ X∗ : ⟨x∗, x⟩ = 0 for all x ∈ N(T)}.

Proof: If y = Tx ∈ R(T), then ⟨y, y∗⟩ = ⟨Tx, y∗⟩ = ⟨x, T∗y∗⟩ = 0 for all y∗ ∈ N(T∗), so R(T) ⊂ N(T∗)⊥; conversely, when R(T) is closed one obtains N(T∗)⊥ ⊂ R(T), and thus R(T) = N(T∗)⊥.

Since T is closed, the graph G = G(T) = {(x, Tx) ∈ X × Y : x ∈ dom(T)} is closed. Thus G is a Banach space equipped with the norm |(x, y)| = |x| + |y| of X × Y. Define the continuous linear operator S from G into Y by

S(x, Tx) = Tx.

Then the dual operator S∗ of S is a continuous linear operator from Y∗ into G∗, and we have

⟨(x, Tx), S∗y∗⟩ = ⟨S(x, Tx), y∗⟩ = ⟨Tx, y∗⟩ = ⟨(x, Tx), (0, y∗)⟩

for x ∈ dom(T) and y∗ ∈ Y∗. Thus the functional S∗y∗ − (0, y∗) ∈ X∗ × Y∗ vanishes on all elements of G. But ⟨(x, Tx), (x∗, y1∗)⟩ = 0 for all x ∈ dom(T) is equivalent to ⟨x, x∗⟩ = ⟨−Tx, y1∗⟩ for all x ∈ dom(T), i.e., x∗ = −T∗y1∗. Hence

S∗y∗ = (0, y∗) + (−T∗y1∗, y1∗) = (−T∗y1∗, y∗ + y1∗) for all y∗ ∈ Y∗.

Since y1∗ is arbitrary, R(S∗) = R(T∗) × Y∗. Therefore R(S∗) is closed in X∗ × Y∗ if and only if R(T∗) is closed, and since R(S) = R(T), R(S) is closed if and only if R(T) is closed in Y. Hence we need only prove the equivalence of (a) and (b) in the special case of a bounded operator.

So let T be bounded and assume (a). Since R(T) is a Banach space, one can assume R(T) = Y without loss of generality. By the open mapping theorem there exists c > 0 such that for each y ∈ Y there exists an x ∈ X with Tx = y and |x| ≤ c|y|. Thus, for y∗ ∈ Y∗,

|⟨y, y∗⟩| = |⟨Tx, y∗⟩| = |⟨x, T∗y∗⟩| ≤ c|y||T∗y∗|

and

|y∗| = sup_{|y|≤1} |⟨y, y∗⟩| ≤ c|T∗y∗|.

Hence (T∗)^{−1} exists and is bounded, and R(T∗) is closed, i.e., (b) holds. Conversely, assume (b) and let T1 : X → Y1 = closure of R(T) be defined by T1x = Tx. For y1∗ ∈ Y1∗ with T1∗y1∗ = 0 we have ⟨T1x, y1∗⟩ = ⟨x, T1∗y1∗⟩ = 0 for all x ∈ X, and since R(T1) is dense in Y1, y1∗ = 0. Since N(T1∗) = {0} and R(T1∗) = R(T∗) is closed, (T1∗)^{−1} exists and is bounded, and therefore R(T1) = Y1, i.e., R(T) is closed. □

2.5 Distribution and Generalized Derivatives

In this section we introduce distributions (generalized functions). The concept of a distribution is essential for defining generalized solutions to PDEs and provides the foundation of PDE theory. Let D(Ω) be the vector space C0∞(Ω) of all infinitely continuously differentiable functions with compact support in Ω. For any compact set K of Ω, let DK(Ω) be the set of all functions f ∈ C0∞(Ω) whose supports are in K. Define a family of seminorms on D(Ω) by

p_{K,m}(f) = sup_{x∈K} sup_{|s|≤m} |D^s f(x)|,

where

D^s = (∂/∂x1)^{s1} · · · (∂/∂xn)^{sn},

s = (s1, · · · , sn) is a nonnegative integer valued multi-index, and |s| = Σ sk ≤ m.
Then DK(Ω) is a locally convex topological space.

Definition (Distribution) A linear functional T defined on C0∞(Ω) is a distribution if for every compact subset K of Ω there exist a positive constant C and a positive integer k such that

|T(φ)| ≤ C sup_{|s|≤k, x∈K} |D^s φ(x)| for all φ ∈ DK(Ω).

Definition (Generalized Derivative) The distribution S defined by

S(φ) = −T(D_{xk}φ) for all φ ∈ C0∞(Ω)

is called the distributional derivative of T with respect to xk, and we write S = D_{xk}T. In general,

S(φ) = D^s T(φ) = (−1)^{|s|} T(D^s φ) for all φ ∈ C0∞(Ω).

This definition follows naturally from the fact that, for continuously differentiable f,

∫_Ω D_{xk}f φ dx = −∫_Ω f (∂φ/∂xk) dx,

and thus D_{xk}T_f = T_{∂f/∂xk}. Accordingly, we let D^s f denote the distributional derivative of T_f.

Example (Distribution) (1) For a locally integrable function f on Ω, one defines the corresponding distribution by

T_f(φ) = ∫_Ω f φ dx for all φ ∈ C0∞(Ω),

since

|T_f(φ)| ≤ (∫_K |f| dx) sup_{x∈K} |φ(x)|.

(2) T(φ) = φ(0) defines the Dirac delta δ0 at x = 0, i.e., |δ0(φ)| ≤ sup_{x∈K} |φ(x)|.

(3) Let H be the Heaviside function defined by H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. Then

DT_H(φ) = −∫_{−∞}^∞ H(x)φ′(x) dx = φ(0),

and thus DT_H = δ0, the Dirac delta at x = 0.

(4) The distributional solution u of −D²u = δ_{x0} satisfies

−∫_{−∞}^∞ u φ″ dx = φ(x0)

for all φ ∈ C0∞(R). The fundamental solution is u(x) = −(1/2)|x − x0|, since

(1/2)∫_{−∞}^∞ |x − x0| φ″(x) dx = (1/2)(∫_{−∞}^{x0} (x0 − x)φ″ dx + ∫_{x0}^∞ (x − x0)φ″ dx) = φ(x0).

In general, for d ≥ 2 let

G(x, x0) = −(1/(2π)) log |x − x0| for d = 2, G(x, x0) = c_d |x − x0|^{2−d} for d ≥ 3.

Then ∆G(x, x0) = 0 for x ≠ x0, and u = G(x, x0) is the fundamental solution of −∆ in R^d, i.e., −∆u = δ_{x0}. In fact, let Bε = {|x − x0| ≤ ε} with surface Γε = {|x − x0| = ε}. By the divergence theorem (for d ≥ 3, with a suitable normalization of c_d),

∫_{R^d \ Bε} G(x, x0)∆φ(x) dx = ∫_{Γε} (G(x, x0) ∂φ/∂ν − φ ∂G(x, x0)/∂ν) ds = ∫_{Γε} (ε^{2−d} ∂φ/∂ν − (2 − d)ε^{1−d}φ(s)) c_d ds → φ(x0)

as ε → 0+. That is, G(x, x0) satisfies

−∫_{R^d} G(x, x0)∆φ dx = φ(x0).

In general, let L be a linear differential operator and L∗ the formal adjoint operator of L. A locally integrable function u is said to be a distributional solution to Lu = T, for a distribution T, if

∫_Ω u (L∗φ) dx = T(φ) for all φ ∈ C0∞(Ω).

Definition (Sobolev Space) For 1 ≤ p < ∞ and m ≥ 0 the Sobolev space is

W^{m,p}(Ω) = {f ∈ L^p(Ω) : D^s f ∈ L^p(Ω), |s| ≤ m}

with norm

|f|_{W^{m,p}(Ω)} = (∫_Ω Σ_{|s|≤m} |D^s f|^p dx)^{1/p}.

In particular, |D^s f(φ)| ≤ c|φ|_{L^q} with 1/p + 1/q = 1.

Remark (1) X = W^{m,p}(Ω) is complete. In fact, if {fn} is Cauchy in X, then {D^s fn} is Cauchy in L^p(Ω) for all |s| ≤ m. Since L^p(Ω) is complete, D^s fn → g^s in L^p(Ω). But since

∫_Ω f D^s φ dx = lim_{n→∞} ∫_Ω fn D^s φ dx = lim_{n→∞} (−1)^{|s|} ∫_Ω D^s fn φ dx = (−1)^{|s|} ∫_Ω g^s φ dx,

we have D^s f = g^s for all |s| ≤ m and |fn − f|X → 0 as n → ∞.

(2) H^{m,p}(Ω) ⊂ W^{m,p}(Ω). Let H^{m,p}(Ω) be the completion of C^m(Ω) with respect to the W^{m,p}(Ω) norm. That is, for f ∈ H^{m,p}(Ω) there exists a sequence fn ∈ C^m(Ω) such that fn → f and D^s fn → g^s strongly in L^p(Ω), and thus

∫_Ω fn D^s φ dx = (−1)^{|s|} ∫_Ω D^s fn φ dx → (−1)^{|s|} ∫_Ω g^s φ dx,

which implies g^s = D^s f and f ∈ W^{m,p}(Ω).

(3) If Ω has a Lipschitz continuous boundary, then W^{m,p}(Ω) = H^{m,p}(Ω).

2.6 Compact Operator

A compact operator is a linear operator A from a Banach space X to another Banach space Y such that the image under A of any bounded subset of X is a relatively compact subset of Y, i.e., its closure is compact. The integral operator is a concrete example of a compact operator; a Fredholm integral equation gives rise to a compact operator A on function spaces.
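As a numerical illustration (mine, not from the text) of why smooth-kernel integral operators are compact: discretizing (Au)(x) = ∫ k(x, y)u(y) dy produces a matrix whose singular values decay rapidly, so A is well approximated by finite-rank operators — the hallmark of compactness. The kernel below is an assumed example.

```python
import numpy as np

# Sketch: discretize the Hilbert-Schmidt kernel k(x,y) = exp(-(x-y)^2) on
# [0,1]^2; the fast decay of the singular values is the finite-dimensional
# signature of compactness (A is a norm limit of finite-rank operators).
n = 200
x = np.linspace(0.0, 1.0, n)
K = np.exp(-np.subtract.outer(x, x) ** 2) / n   # 1/n = quadrature weight
s = np.linalg.svd(K, compute_uv=False)
print((s[:8] / s[0]).round(8))                  # drops by orders of magnitude
```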
A compact set in a Banach space is bounded. The converse does not hold in general: the closed unit ball is weakly star compact but not strongly compact unless X is finite dimensional.

Let C(S) be the space of continuous functions on a compact metric space S with norm |x| = max_{s∈S} |x(s)|. The Arzelà–Ascoli theorem gives a necessary and sufficient condition for a family {xα} to be (strongly) relatively compact.

Arzelà–Ascoli Theorem A family {xα} in C(S) is relatively compact if and only if the following conditions hold:

sup_α |xα| < ∞ (equi-bounded),

lim_{δ→0+} sup_{α, |s′−s″|≤δ} |xα(s′) − xα(s″)| = 0 (equi-continuous).

Proof: By the Bolzano–Weierstrass theorem, for fixed s ∈ S the set {xα(s)} contains a convergent subsequence. Since S is compact, there is a countable dense subset {sn} of S such that for every ε > 0 there exists n satisfying

sup_{s∈S} inf_{1≤j≤n} dist(s, sj) ≤ ε.

Denote by {x_{n1}(s1)} a convergent subsequence of {xn(s1)}. Similarly, the sequence {x_{n1}(s2)} contains a convergent subsequence {x_{n2}(s2)}. Continuing in this manner, we extract subsequences {x_{nk}(sk)}. The diagonal sequence {x_{nn}(s)} converges for s = s1, s2, · · ·, simultaneously for the dense subset {sk}. We show that {x_{nn}} is a Cauchy sequence in C(S). By equi-continuity, for every ε > 0 there exists δ > 0 such that |xn(s′) − xn(s″)| ≤ ε for all s′, s″ satisfying dist(s′, s″) ≤ δ. Thus, for s ∈ S there exists a j ≤ k such that

|x_{nn}(s) − x_{mm}(s)| ≤ |x_{nn}(s) − x_{nn}(sj)| + |x_{nn}(sj) − x_{mm}(sj)| + |x_{mm}(sj) − x_{mm}(s)| ≤ 2ε + |x_{nn}(sj) − x_{mm}(sj)|,

which implies lim_{n,m→∞} max_{s∈S} |x_{nn}(s) − x_{mm}(s)| ≤ 2ε for arbitrary ε > 0. □

For the space L^p(Ω) we have:

Fréchet–Kolmogorov Theorem Let Ω be a subset of R^d. A family {xα} of L^p(Ω) functions (extended by zero outside Ω) is relatively compact if and only if the following conditions hold:

sup_α ∫_Ω |xα|^p ds < ∞ (equi-bounded),

lim_{|t|→0} sup_α ∫_Ω |xα(t + s) − xα(s)|^p ds = 0 (equi-continuous).

Example (Integral Operator) Let k(x, y) be a kernel function satisfying

∫_Ω ∫_Ω |k(x, y)|² dx dy < ∞.

Define the integral operator A by

(Au)(x) = ∫_Ω k(x, y)u(y) dy for u ∈ X = L²(Ω).

Then A ∈ L(X) is compact. This class of operators is called the Hilbert–Schmidt operators.

Rellich–Kondrachov Theorem Every uniformly bounded sequence in W^{1,p}(Ω) has a subsequence that converges in L^q(Ω) provided that q < p∗ = dp/(d − p).

Fredholm Alternative Theorem Let A ∈ L(X) be compact. For any nonzero λ ∈ C, either
1) the equation Ax − λx = 0 has a nonzero solution x, or
2) the equation Ax − λx = f has a unique solution x for every f ∈ X.
In the second case, (λI − A)^{−1} is bounded.

Exercise
Problem 1 T : D(T) ⊂ X → Y is closable if and only if the closure of its graph G(T) in X × Y is the graph of a linear operator S.
Problem 2 For a closed linear operator T, the domain D(T) is a Banach space if it is equipped with the graph norm |x|_{D(T)} = (|x|²X + |Tx|²Y)^{1/2}.

3 Constrained Optimization

In this section we develop the Lagrange multiplier theory for constrained minimization in Banach spaces.

3.1 Hilbert space theory

In this section we develop the Hilbert space theory for constrained minimization.

Theorem 1 (Riesz Representation) Let X be a Hilbert space. Given F ∈ X∗, consider the minimization of

J(x) = (1/2)(x, x)X − F(x) over x ∈ X.

There exists a unique solution x∗ ∈ X, and x∗ satisfies

(v, x∗) = F(v) for all v ∈ X. (3.1)

The solution x∗ ∈ X is the Riesz representation of F ∈ X∗ and satisfies |x∗|X = |F|X∗.

Proof: Suppose x∗ is a minimizer.
Then

(1/2)(x∗ + tv, x∗ + tv) − F(x∗ + tv) ≥ (1/2)(x∗, x∗) − F(x∗)

for all t ∈ R and v ∈ X. Thus, for t > 0,

(v, x∗) − F(v) + (t/2)|v|² ≥ 0.

Letting t → 0+, we have (v, x∗) − F(v) ≥ 0 for all v ∈ X. Since v ∈ X implies −v ∈ X, this yields (3.1).

(Uniqueness) Suppose x∗1, x∗2 are minimizers. Then

(v, x∗1) − (v, x∗2) = F(v) − F(v) = 0 for all v ∈ X.

Letting v = x∗1 − x∗2, we have |x∗1 − x∗2|² = (x∗1 − x∗2, x∗1 − x∗2) = 0, i.e., x∗1 = x∗2.

(Existence) Suppose {xn} is a minimizing sequence, i.e., J(xn) is decreasing and lim_{n→∞} J(xn) = δ = inf_{x∈X} J(x). By the parallelogram law,

|xn − xm|² + |xn + xm|² = 2(|xn|² + |xm|²).

Thus, since the F-terms cancel by the linearity of F,

(1/4)|xn − xm|² = J(xn) + J(xm) − 2J((xn + xm)/2) ≤ J(xn) + J(xm) − 2δ.

This implies {xn} is a Cauchy sequence in X. Since X is complete, there exists x∗ = lim_{n→∞} xn in X and J(x∗) = δ, i.e., x∗ is the minimizer. □

Example (Mechanical System)

min ∫_0^1 ((1/2)|du/dx|² − f(x)u(x)) dx over u ∈ H¹0(0, 1). (3.2)

Here

X = H¹0(0, 1) = {u ∈ H¹(0, 1) : u(0) = u(1) = 0}

is a Hilbert space with the inner product

(u, v)_{H¹0} = ∫_0^1 (du/dx)(dv/dx) dx.

H¹(0, 1) is the space of absolutely continuous functions on [0, 1] with square integrable derivative, i.e.,

u(x) = u(0) + ∫_0^x g(s) ds, g = du/dx a.e., g ∈ L²(0, 1).

The solution yf = x∗ to (3.2) satisfies

∫_0^1 (dv/dx)(dyf/dx) dx = ∫_0^1 f(x)v(x) dx.

Integrating by parts,

∫_0^1 f(x)v(x) dx = ∫_0^1 (∫_x^1 f(s) ds)(dv/dx) dx,

and thus

∫_0^1 (dyf/dx − ∫_x^1 f(s) ds)(dv/dx) dx = 0

for all v ∈ H¹0(0, 1). This implies

dyf/dx − ∫_x^1 f(s) ds = c = a constant.

Integrating,

yf(x) = ∫_0^x ∫_η^1 f(s) ds dη + cx,

and since yf(1) = 0,

c = −∫_0^1 ∫_η^1 f(s) ds dη.

Moreover, dyf/dx ∈ H¹(0, 1) and

−d²yf/dx² = f(x) a.e., with yf(0) = yf(1) = 0.

Now consider the case F(v) = v(1/2). Then

|F(v)| = |∫_0^{1/2} (dv/dx) dx| ≤ (∫_0^{1/2} |dv/dx|² dx)^{1/2} (∫_0^{1/2} 1 dx)^{1/2} ≤ (1/√2)|v|X,

that is, F ∈ X∗. Thus we have

∫_0^1 (dyf/dx − χ_{(0,1/2)}(x))(dv/dx) dx = 0

for all v ∈ H¹0(0, 1), where χ_S is the characteristic function of a set S. This is equivalent to

dyf/dx − χ_{(0,1/2)}(x) = c = a constant.

Integrating,

yf(x) = (1 + c)x on [0, 1/2], yf(x) = cx + 1/2 on [1/2, 1].

Since yf(1) = 0, we have c = −1/2. Moreover, yf satisfies

−d²yf/dx² = 0 for x ≠ 1/2,
yf((1/2)⁻) = yf((1/2)⁺), yf(0) = yf(1) = 0,
(dyf/dx)((1/2)⁻) − (dyf/dx)((1/2)⁺) = 1.

Or, equivalently,

−d²yf/dx² = δ_{1/2}(x),

where δ_{x0} is the Dirac delta distribution at x = x0.

Theorem 2 (Orthogonal Decomposition X = M̄ ⊕ M⊥) Let X be a Hilbert space and M a linear subspace of X. Let M̄ be the closure of M and

M⊥ = {z ∈ X : (z, y) = 0 for all y ∈ M}

the orthogonal complement of M. Then every element u ∈ X has the unique representation

u = u1 + u2, u1 ∈ M̄ and u2 ∈ M⊥.

Proof: Consider the minimum distance problem

min |x − u|² over x ∈ M̄. (3.3)

As in the proof of Theorem 1, one can show that there exists a unique solution x∗ of (3.3), and x∗ ∈ M̄ satisfies

(x∗ − u, y) = 0 for all y ∈ M̄,

i.e., u − x∗ ∈ M⊥. If we let u1 = x∗ ∈ M̄ and u2 = u − x∗ ∈ M⊥, then u = u1 + u2 is the desired decomposition. □

Example (Closed Subspace) (1) M = span{yk, k ∈ K} = {x ∈ X : x = Σ_{k∈K} αk yk} is a subspace of X. If K is finite, M is closed. (2) Let X = L²(−1, 1) and M = {odd functions} = {f ∈ X : f(−x) = −f(x) a.e.}; then M is closed. Let M1 = span{sin(kπt)} and M2 = span{cos(kπt)}; then L²(−1, 1) = M̄1 ⊕ M̄2. (3) Let E ∈ L(X, Y).
Then M = N(E) = {x ∈ X : Ex = 0} is closed. In fact, if xn ∈ M and xn → x in X, then 0 = lim_{n→∞} Exn = E lim_{n→∞} xn = Ex, and thus x ∈ N(E) = M.

Example (Hodge Decomposition) Let X = L²(Ω)³, where Ω is a bounded open set in R³ with Lipschitz boundary Γ. Let M = {∇u : u ∈ H¹(Ω)} be the gradient field. Then

M⊥ = {ψ ∈ X : ∇ · ψ = 0, n · ψ = 0 at Γ}

is the divergence free field. Thus we have the Hodge decomposition

X = M ⊕ M⊥ = {∇u : u ∈ H¹(Ω)} ⊕ {the divergence free field}.

In fact, for ψ ∈ M⊥,

∫_Ω ∇φ · ψ dx = ∫_Γ (n · ψ)φ ds − ∫_Ω (∇ · ψ)φ dx = 0,

and ∇u ∈ M, u ∈ H¹(Ω)/R, is uniquely determined by

∫_Ω ∇u · ∇φ dx = ∫_Ω f · ∇φ dx for all φ ∈ H¹(Ω).

Theorem 3 (Decomposition) Let X, Y be Hilbert spaces and E ∈ L(X, Y). Let E∗ ∈ L(Y, X) be the adjoint of E, i.e., (Ex, y)_Y = (x, E∗y)_X for all x ∈ X and y ∈ Y. Then N(E) = R(E∗)⊥, and thus from the orthogonal decomposition theorem

X = N(E) ⊕ closure of R(E∗).

Proof: Suppose x ∈ N(E). Then (x, E∗y)_X = (Ex, y)_Y = 0 for all y ∈ Y, and thus N(E) ⊂ R(E∗)⊥. Conversely, suppose x∗ ∈ R(E∗)⊥, i.e., (x∗, E∗y) = (Ex∗, y) = 0 for all y ∈ Y. Then Ex∗ = 0, and R(E∗)⊥ ⊂ N(E). □

Remark Suppose R(E) is closed in Y. Then R(E∗) is closed in X and

Y = R(E) ⊕ N(E∗), X = R(E∗) ⊕ N(E).

Thus we have

b = b1 + b2, b1 ∈ R(E), b2 ∈ N(E∗) for all b ∈ Y,

and |Ex − b|² = |Ex − b1|² + |b2|².

Example (R(E) is not closed) Consider X = L²(0, 1) and define a linear operator E ∈ L(X) by

(Ef)(t) = ∫_0^t f(s) ds, t ∈ (0, 1).

Since R(E) ⊂ {absolutely continuous functions on [0, 1]}, the closure of R(E) is L²(0, 1) and R(E) is not a closed subspace of L²(0, 1).

3.2 Minimum norm problem

We consider

min |x|² subject to (x, yi)X = ci, 1 ≤ i ≤ m. (3.4)

Theorem 4 (Minimum Norm) Suppose {yi} are linearly independent. Then there exists a unique solution x∗ to (3.4), and x∗ = Σ_{i=1}^m βi yi, where β = {βi}_{i=1}^m satisfies

Gβ = c, G_{i,j} = (yi, yj).

Proof: Let M = span{yi}_{i=1}^m. Since X = M ⊕ M⊥, it is necessary that x∗ ∈ M, i.e.,

x∗ = Σ_{i=1}^m βi yi.

From the constraint (x∗, yi) = ci, 1 ≤ i ≤ m, the condition Gβ = c follows. In fact, x ∈ X satisfies (x, yi) = ci, 1 ≤ i ≤ m, if and only if x = x∗ + z with z ∈ M⊥, and |x|² = |x∗|² + |z|². □

Remark {yi} is linearly independent if and only if the symmetric matrix G ∈ R^{m×m} is positive definite, since

(x, Gx)_{R^m} = |Σ_{k=1}^m xk yk|²X for all x ∈ R^m.

Theorem 5 (Minimum Norm) Suppose E ∈ L(X, Y) and consider the minimum norm problem

min |x|² subject to Ex = c.

If R(E) = Y, then the minimum norm solution x∗ exists and

x∗ = E∗y, (EE∗)y = c.

Proof: Since R(E∗) is closed by the closed range theorem, it follows from the decomposition theorem that X = N(E) ⊕ R(E∗). Thus x∗ = E∗y provided there exists y ∈ Y such that (EE∗)y = c. Since Y = N(E∗) ⊕ R(E) and R(E) = Y, N(E∗) = {0}. Moreover, if EE∗x = 0, then (x, EE∗x) = |E∗x|² = 0 and E∗x = 0; thus N(EE∗) = N(E∗) = {0}. Since R(EE∗) = R(E), it follows from the open mapping theorem that EE∗ is boundedly invertible, and there exists a unique solution to (EE∗)y = c. □

Remark If R(E) is closed and c ∈ R(E), we may let Y = R(E) and Theorem 5 remains valid. For example, define E ∈ L(X, R^m) by

Ex = ((y1, x)X, · · · , (ym, x)X) ∈ R^m.

If {yi}_{i=1}^m are dependent, then R(E) is a proper closed subspace of R^m.

Example (DC-Motor Control) Let (θ(t), ω(t)) satisfy

dω/dt + ω(t) = u(t), ω(0) = 0,
dθ/dt = ω(t), θ(0) = 0. (3.5)

This models a control problem for the DC motor: the control function u(t) is the current applied to the motor, θ is the angle and ω is the angular velocity.
The objective is to find a control u on [0, 1] with minimum norm

∫_0^1 |u(t)|² dt

such that the desired state (θ̄, 0) is reached at time t = 1. Let X = L²(0, 1). Given u ∈ X, we have the unique solution to (3.5) with

ω(1) = ∫_0^1 e^{t−1} u(t) dt,
θ(1) = ∫_0^1 (1 − e^{t−1}) u(t) dt.

Thus we can formulate this control problem as problem (3.4) by letting X = L²(0, 1) with the standard inner product and

y1(t) = 1 − e^{t−1}, c1 = θ̄,
y2(t) = e^{t−1}, c2 = 0.

Example (Linear Control System) Consider the linear control system

(d/dt)x(t) = Ax(t) + Bu(t), x(0) = x0 ∈ R^n, (3.6)

where x(t) ∈ R^n and u(t) ∈ R^m are the state and the control function, respectively, and the system matrices A ∈ R^{n×n} and B ∈ R^{n×m} are given. Consider the problem of finding a control that transfers x(0) = 0 to a desired state x̄ ∈ R^n at time T > 0, i.e., x(T) = x̄, with minimum norm. The problem can be formulated as (3.4). First, we have the variation of constants formula

x(t) = e^{At}x0 + ∫_0^t e^{A(t−s)}Bu(s) ds,

where e^{At} is the matrix exponential and satisfies (d/dt)e^{At} = Ae^{At} = e^{At}A. Thus the condition x(T) = x̄ (with x0 = 0) is written as

∫_0^T e^{A(T−t)}Bu(t) dt = x̄.

Let X = L²(0, T; R^m). It then follows from Theorem 4 that the minimum norm solution is given by

u∗(t) = Bᵗ e^{Aᵗ(T−t)} β, Gβ = x̄,

provided that the controllability Gramian

G = ∫_0^T e^{A(T−t)}BBᵗe^{Aᵗ(T−t)} dt ∈ R^{n×n}

is positive on R^n. If G is positive, then the control system (3.6) is controllable; positivity is equivalent to

Bᵗe^{Aᵗt}x = 0 for all t ∈ [0, T] implies x = 0,

since

(x, Gx) = ∫_0^T |Bᵗe^{Aᵗ(T−t)}x|² dt.

Moreover, G is positive if and only if the Kalman rank condition holds, i.e.,

rank [B, AB, · · · , A^{n−1}B] = n.

In fact, (x, Gx) = 0 implies Bᵗe^{Aᵗt}x = 0 for t ≥ 0, and thus Bᵗ(Aᵗ)^k x = 0 for all k ≥ 0; the claim then follows from the Cayley–Hamilton theorem.

Example (Function Interpolation) (1) Consider the minimum norm problem on X = H¹0(0, 1):

min ∫_0^1 |du/dx|² dx over H¹0(0, 1) subject to u(xi) = ci, 1 ≤ i ≤ m, (3.7)

where 0 < x1 < · · · < xm < 1 are given nodes. From the mechanical system example we have, for each i, a piecewise linear function yi ∈ X such that (u, yi)X = u(xi) for all u ∈ X. From Theorem 4 the solution to (3.7) is given by

u∗(x) = Σ_{i=1}^m βi yi(x).

That is, u∗ is the piecewise linear function with nodes {xi}, 1 ≤ i ≤ m, and nodal values ci.

(2) Consider the spline interpolation problem (1.2):

min ∫_0^1 (1/2)|d²u/dx²|² dx over all functions u(x) subject to u(xi) = bi, u′(xi) = ci, 1 ≤ i ≤ m. (3.8)

Let X = H²0(0, 1) be the completion of the pre-Hilbert space C²0[0, 1] — the space of twice continuously differentiable functions with u(0) = u′(0) = 0 and u(1) = u′(1) = 0 — with respect to the H²0(0, 1) inner product

(f, g)_{H²0} = ∫_0^1 (d²f/dx²)(d²g/dx²) dx,

i.e.,

H²0(0, 1) = {u ∈ L²(0, 1) : du/dx, d²u/dx² ∈ L²(0, 1) with u(0) = u′(0) = u(1) = u′(1) = 0}.

Problem (3.8) is a minimum norm problem on X = H²0(0, 1). For Fi ∈ X∗ defined by Fi(v) = v(xi) we have

(yi, v)X − Fi(v) = ∫_0^1 (d²yi/dx² + (x − xi)χ_{[0,xi]})(d²v/dx²) dx, (3.9)

and for F̃i ∈ X∗ defined by F̃i(v) = v′(xi) we have

(ỹi, v)X − F̃i(v) = ∫_0^1 (d²ỹi/dx² − χ_{[0,xi]})(d²v/dx²) dx. (3.10)

Thus

d²yi/dx² + (x − xi)χ_{[0,xi]} = a + bx, d²ỹi/dx² − χ_{[0,xi]} = ã + b̃x,

where a, ã, b, b̃ are constants, and yi = RFi and ỹi = RF̃i are piecewise cubic functions on [0, 1]. One can determine yi, ỹi using the conditions y(0) = y′(0) = y(1) = y′(1) = 0.
From Theorem 4 the solution to (3.8) is given by

u∗(x) = Σ_{i=1}^m (βi yi(x) + β̃i ỹi(x)).

That is, u∗ is a piecewise cubic function with nodes {xi}, 1 ≤ i ≤ m, nodal values bi and nodal derivatives ci at xi.

Example (Differential Form, Natural Boundary Conditions) (1) Let X be the Hilbert space

X = H¹(0, 1) = {u ∈ L²(0, 1) : du/dx ∈ L²(0, 1)}.

Given a, c ∈ C[0, 1] with a > 0, c ≥ 0 and α ≥ 0, consider the minimization of

∫_0^1 ((1/2)(a(x)|du/dx|² + c(x)|u|²) − uf) dx + (α/2)|u(1)|² − g u(1)

over u ∈ X = H¹(0, 1). If u∗ is a minimizer, it follows from Theorem 1 that u∗ ∈ X satisfies

∫_0^1 (a(x)(du∗/dx)(dv/dx) + c(x)u∗v − fv) dx + α u∗(1)v(1) − gv(1) = 0 (3.11)

for all v ∈ X. By integration by parts,

∫_0^1 (−(d/dx)(a(x) du∗/dx) + c(x)u∗(x) − f(x)) v(x) dx − a(0)(du∗/dx)(0)v(0) + (a(1)(du∗/dx)(1) + α u∗(1) − g)v(1) = 0,

assuming a(x) du∗/dx ∈ H¹(0, 1). Since v ∈ H¹(0, 1) is arbitrary,

−(d/dx)(a(x) du∗/dx) + c(x)u∗ = f(x), x ∈ (0, 1),
a(0)(du∗/dx)(0) = 0, a(1)(du∗/dx)(1) + α u∗(1) = g. (3.12)

(3.11) is the weak form and (3.12) is the strong form of the optimality condition.

(2) Let X be the Hilbert space

X = H²(0, 1) = {u ∈ L²(0, 1) : du/dx, d²u/dx² ∈ L²(0, 1)}.

Given a, b, c ∈ C[0, 1] with a > 0, b, c ≥ 0 and α, β ≥ 0, consider the minimization of

∫_0^1 ((1/2)(a(x)|d²u/dx²|² + b(x)|du/dx|² + c(x)|u|²) − uf) dx + (α/2)|u(1)|² + (β/2)|u′(1)|² − g1 u(1) − g2 u′(1)

over u ∈ X = H²(0, 1) satisfying u(0) = 1. If u∗ is a minimizer, it follows from Theorem 1 that u∗ ∈ X satisfies

∫_0^1 (a(x)(d²u∗/dx²)(d²v/dx²) + b(x)(du∗/dx)(dv/dx) + c(x)u∗v − fv) dx + α u∗(1)v(1) + β (u∗)′(1)v′(1) − g1 v(1) − g2 v′(1) = 0 (3.13)

for all v ∈ X satisfying v(0) = 0. By integration by parts,

∫_0^1 ((d²/dx²)(a(x) d²u∗/dx²) − (d/dx)(b(x) du∗/dx) + c(x)u∗(x) − f(x)) v(x) dx
+ (a(1)(d²u∗/dx²)(1) + β (du∗/dx)(1) − g2) v′(1) − a(0)(d²u∗/dx²)(0) v′(0)
+ ((−(d/dx)(a(x) d²u∗/dx²) + b(x) du∗/dx)(1) + α u∗(1) − g1) v(1) = 0,

assuming u∗ ∈ H⁴(0, 1) (the term multiplying v(0) drops since v(0) = 0). Since v ∈ H²(0, 1) with v(0) = 0 is arbitrary,

(d²/dx²)(a(x) d²u∗/dx²) − (d/dx)(b(x) du∗/dx) + c(x)u∗ = f(x), x ∈ (0, 1),
u∗(0) = 1, a(0)(d²u∗/dx²)(0) = 0,
a(1)(d²u∗/dx²)(1) + β (du∗/dx)(1) − g2 = 0,
(−(d/dx)(a(x) d²u∗/dx²) + b(x) du∗/dx)(1) + α u∗(1) − g1 = 0. (3.14)

(3.13) is the weak form and (3.14) is the strong (differential) form of the optimality condition.

Next consider the multi-dimensional case. Let Ω be a subdomain of R^d with Lipschitz boundary, let Γ0 be a closed subset of the boundary ∂Ω with Γ1 = ∂Ω \ Γ0, and let Γ be a smooth hypersurface in Ω. Define

H¹_{Γ0}(Ω) = {u ∈ H¹(Ω) : u(s) = 0, s ∈ Γ0}

with norm |∇u|_{L²(Ω)}. Consider the weak form of the equation for u ∈ H¹_{Γ0}(Ω):

∫_Ω (σ(x)∇u · ∇φ + u(x)(b⃗(x) · ∇φ) + c(x)u(x)φ(x)) dx + ∫_{Γ1} α(s)u(s)φ(s) ds − ∫_Γ g0(s)φ(s) ds − ∫_Ω f(x)φ(x) dx − ∫_{Γ1} g(s)φ(s) ds = 0

for all φ ∈ H¹_{Γ0}(Ω). Then the differential form for u ∈ H¹_{Γ0}(Ω) is given by

−∇ · (σ(x)∇u + b⃗u) + c(x)u(x) = f in Ω,
n · (σ(x)∇u + b⃗u) + α u = g at Γ1,
[n · (σ(x)∇u + b⃗u)] = g0 at Γ,

where [·] denotes the jump across Γ. By the divergence theorem, if u satisfies the strong form, then u is a weak solution. Conversely, taking φ ∈ C0∞(Ω \ Γ) we obtain the first equation, i.e., ∇ · (σ∇u + b⃗u) ∈ L²(Ω). Thus n · (σ∇u + b⃗u) ∈ L²(Γ) and the third equation holds. Also, n · (σ∇u + b⃗u) ∈ L²(Γ1) and the second equation holds.
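The emergence of the natural boundary conditions in (3.12) can be observed numerically. Below is a piecewise-linear finite element sketch of my own (with the assumed data a = c = 1, f = 1 and hypothetical values α = 2, g = 1): only the weak form (3.11) is assembled, with no boundary condition imposed, yet the Robin condition a(1)u′(1) + αu(1) = g appears in the discrete solution.

```python
import numpy as np

# Sketch: linear finite elements for ∫(u'v' + uv - v)dx + α u(1)v(1) - g v(1) = 0
# on a uniform mesh of (0,1).  Natural (Neumann/Robin) conditions are NOT
# imposed; they emerge from the variational formulation.
n, alpha, g = 200, 2.0, 1.0
h = 1.0 / n
main = np.full(n + 1, 2.0 / h + 4.0 * h / 6.0)   # stiffness + mass diagonal
main[[0, -1]] = 1.0 / h + 2.0 * h / 6.0          # boundary nodes
off = np.full(n, -1.0 / h + h / 6.0)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
A[-1, -1] += alpha                               # boundary term α u(1)v(1)
b = np.full(n + 1, h); b[[0, -1]] = h / 2.0      # load from f = 1
b[-1] += g                                       # boundary load g v(1)
u = np.linalg.solve(A, b)
print((u[1] - u[0]) / h)                         # ≈ 0 (up to O(h)): a(0)u'(0) = 0
print((u[-1] - u[-2]) / h + alpha * u[-1] - g)   # ≈ 0: a(1)u'(1) + αu(1) = g
```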
3.3 Lax-Milgram Theory and Applications

Let H be a Hilbert space with scalar product (φ, ψ) and let X be a Hilbert space with X ⊂ H with continuous dense injection. Let X∗ denote the strong dual space of X. H is identified with its dual, so that

X ⊂ H = H∗ ⊂ X∗

(i.e., H is the pivot space). The dual product ⟨φ, ψ⟩ on X∗ × X is the continuous extension of the scalar product of H restricted to H × X. This framework is called the Gelfand triple.

Let σ be a bounded coercive bilinear form on X × X. Given x ∈ X, y → σ(x, y) defines a bounded linear functional on X, say x∗ ∈ X∗. We define a linear operator A from X into X∗ by x∗ = Ax. The equation σ(x, y) = F(y) for all y ∈ X is then equivalently written as the equation Ax = F ∈ X∗. Here

⟨Ax, y⟩_{X∗×X} = σ(x, y), x, y ∈ X,

and A is a bounded linear operator; in fact,

|Ax|X∗ ≤ sup_{|y|≤1} |σ(x, y)| ≤ M|x|.

Let R be the Riesz operator X∗ → X, i.e., |Rx∗|X = |x∗| and (Rx∗, x)X = ⟨x∗, x⟩ for all x ∈ X. Then Â = RA represents a linear operator Â ∈ L(X, X). Moreover, we define a linear operator Ã on H by

Ãx = Ax ∈ H with dom(Ã) = {x ∈ X : |σ(x, y)| ≤ cx|y|H for all y ∈ X}.

That is, Ã is the restriction of A to dom(Ã). We will use the symbol A for all three linear operators in what follows; its meaning should be understood from the context.

Lax-Milgram Theorem Let X be a Hilbert space and let σ be a (complex-valued) sesquilinear form on X × X satisfying

σ(α x1 + β x2, y) = α σ(x1, y) + β σ(x2, y),
σ(x, α y1 + β y2) = ᾱ σ(x, y1) + β̄ σ(x, y2),

|σ(x, y)| ≤ M|x||y| for all x, y ∈ X (bounded),

and

Re σ(x, x) ≥ δ|x|² for all x ∈ X, for some δ > 0 (coercive).

Then for each f ∈ X∗ there exists a unique solution x ∈ X to

σ(x, y) = ⟨f, y⟩_{X∗×X} for all y ∈ X,

and |x|X ≤ δ^{−1}|f|X∗.

Proof: Define the linear operator S from X∗ into X by

Sf = x, f ∈ X∗,

where x ∈ X satisfies σ(x, y) = ⟨f, y⟩ for all y ∈ X. The operator S is well defined, since if x1, x2 ∈ X both satisfy the equation, then σ(x1 − x2, y) = 0 for all y ∈ X and thus

δ|x1 − x2|²X ≤ Re σ(x1 − x2, x1 − x2) = 0.

Next we show that dom(S) is closed in X∗. Suppose fn ∈ dom(S), i.e., there exists xn ∈ X satisfying σ(xn, y) = ⟨fn, y⟩ for all y ∈ X, and fn → f in X∗ as n → ∞. Then

σ(xn − xm, y) = ⟨fn − fm, y⟩ for all y ∈ X.

Setting y = xn − xm, we obtain

δ|xn − xm|²X ≤ Re σ(xn − xm, xn − xm) ≤ |fn − fm|X∗|xn − xm|X.

Thus {xn} is a Cauchy sequence in X, and so xn → x for some x ∈ X as n → ∞. Since σ and the dual product are continuous, x = Sf.

Now we prove that dom(S) = X∗. Suppose dom(S) ≠ X∗. Since dom(S) is closed, there exists a nontrivial x0 ∈ X such that ⟨f, x0⟩ = 0 for all f ∈ dom(S). Consider the linear functional F(y) = σ(x0, y), y ∈ X. Since σ is bounded, F ∈ X∗ and x0 = SF, so F ∈ dom(S) and F(x0) = 0. But then σ(x0, x0) = ⟨F, x0⟩ = 0, and by the coercivity of σ, x0 = 0, which is a contradiction. Hence dom(S) = X∗. □

Assume that σ is coercive. By the Lax-Milgram theorem A has a bounded inverse S = A^{−1}. Thus

dom(Ã) = A^{−1}H.

Moreover, Ã is closed. In fact, if

xn ∈ dom(Ã) → x and fn = Ãxn → f in H,

then since xn = Sfn and S is bounded, x = Sf; thus x ∈ dom(Ã) and Ãx = f. If σ is symmetric, σ(x, y) = (x, y)X defines an inner product on X, and SF coincides with the Riesz representation of F ∈ X∗. Moreover,

⟨Ax, y⟩ = ⟨Ay, x⟩ for all x, y ∈ X,

and thus Ã is a self-adjoint operator in H.
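A finite-dimensional sketch (my own, with randomly generated data) of the Lax-Milgram mechanism: coercivity alone, without symmetry, already gives unique solvability and the a priori bound |x| ≤ δ⁻¹|f|.

```python
import numpy as np

# Sketch: a bounded coercive, non-symmetric form σ(x,y) = y^T B x with
# Re σ(x,x) >= δ|x|^2 yields unique solvability and |x| <= |f|/δ.
rng = np.random.default_rng(1)
n, delta = 50, 0.5
S = rng.standard_normal((n, n)); S = S - S.T           # skew part: x^T S x = 0
B = delta * np.eye(n) + S                              # so x^T B x = δ|x|^2
f = rng.standard_normal(n)
x = np.linalg.solve(B, f)                              # unique solution of σ(x,·) = f
print(np.linalg.norm(x) <= np.linalg.norm(f) / delta)  # True: Lax-Milgram bound
```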
Example (Laplace Operator) Consider X = H¹0(Ω), H = L²(Ω) and

σ(u, φ) = (u, φ)X = ∫_Ω ∇u · ∇φ dx.

Then

Au = −∆u (= −(∂²u/∂x1² + ∂²u/∂x2²) in two dimensions)

and

dom(Ã) = H²(Ω) ∩ H¹0(Ω)

for Ω with smooth boundary or for convex Ω. For Ω = (0, 1) and f ∈ L²(0, 1),

∫_0^1 (dy/dx)(du/dx) dx = ∫_0^1 f(x)y(x) dx

is equivalent to

∫_0^1 (dy/dx)(du/dx − ∫_x^1 f(s) ds) dx = 0

for all y ∈ H¹0(0, 1). Thus

du/dx − ∫_x^1 f(s) ds = c (a constant),

and therefore du/dx ∈ H¹(0, 1) and

Au = −d²u/dx² = f in L²(0, 1).

Example (Elliptic Operator) Consider the second order elliptic equation

Au = −∇ · (a(x)∇u) + b(x) · ∇u + c(x)u(x) = f(x), ∂u/∂ν = g at Γ1, u = 0 at Γ0,

where Γ0 and Γ1 are disjoint, Γ0 ∪ Γ1 = Γ, and ∂u/∂ν denotes the conormal derivative n · (a(x)∇u). Integrating against a test function φ, we have

∫_Ω Au φ dx = ∫_Ω (a(x)∇u · ∇φ + (b(x) · ∇u)φ + c(x)uφ) dx − ∫_{Γ1} gφ ds = ∫_Ω f(x)φ(x) dx

for all φ ∈ C¹(Ω̄) vanishing at Γ0. Let X = H¹_{Γ0}(Ω) be the completion of {φ ∈ C¹(Ω̄) : φ = 0 at Γ0} with inner product

(u, φ) = ∫_Ω ∇u · ∇φ dx,

i.e., H¹_{Γ0}(Ω) = {u ∈ H¹(Ω) : u|_{Γ0} = 0}. Define the bilinear form σ on X × X by

σ(u, φ) = ∫_Ω (a(x)∇u · ∇φ + (b(x) · ∇u)φ + c(x)uφ) dx.

Then, by Green's formula,

σ(u, u) = ∫_Ω (a(x)|∇u|² + b(x) · ∇((1/2)|u|²) + c(x)|u|²) dx = ∫_Ω (a(x)|∇u|² + (c(x) − (1/2)∇ · b)|u|²) dx + ∫_{Γ1} (1/2)(n · b)|u|² ds.

If we assume

0 < a ≤ a(x) ≤ ā, c(x) − (1/2)∇ · b ≥ 0, n · b ≥ 0 at Γ1,

then σ is bounded and coercive with δ = a.

The Banach space version of the Lax-Milgram theorem is as follows.

Banach-Necas-Babuska Theorem Let V and W be Banach spaces. Consider the linear equation for u ∈ W:

a(u, v) = f(v) for all v ∈ V (3.15)

for given f ∈ V∗, where a is a bounded bilinear form on W × V. The problem is well-posed if and only if the following conditions hold:

inf_{u∈W} sup_{v∈V} a(u, v)/(|u|W |v|V) ≥ δ > 0, (3.16)

a(u, v) = 0 for all u ∈ W implies v = 0.

Under these conditions the unique solution u ∈ W to (3.15) satisfies

|u|W ≤ (1/δ)|f|V∗.

Proof: Let A be the bounded linear operator from W to V∗ defined by

⟨Au, v⟩ = a(u, v) for all u ∈ W, v ∈ V.

The inf-sup condition is equivalent to

|Au|V∗ ≥ δ|u|W for all u ∈ W,

and thus the range R(A) is closed in V∗ and N(A) = {0}. Since V is reflexive and

⟨Au, v⟩_{V∗×V} = ⟨u, A∗v⟩_{W×W∗},

the second condition gives N(A∗) = {0}. It thus follows from the closed range and open mapping theorems that A^{−1} is bounded. □

Next we consider the generalized Stokes system. Let V and Q be Hilbert spaces. We consider the mixed variational problem for (u, p) ∈ V × Q of the form

a(u, v) + b(v, p) = f(v), b(u, q) = g(q) (3.17)

for all v ∈ V and q ∈ Q, where a and b are bounded bilinear forms on V × V and V × Q, respectively. If we define the linear operators A ∈ L(V, V∗) and B ∈ L(V, Q∗) by

⟨Au, v⟩ = a(u, v) and ⟨Bu, q⟩ = b(u, q),

then (3.17) is equivalent to the operator form

[A B∗; B 0] [u; p] = [f; g].

Assume the coercivity of a:

a(u, u) ≥ δ|u|²V, (3.18)

and the inf-sup condition on b:

inf_{q∈Q} sup_{u∈V} b(u, q)/(|u|V |q|Q) ≥ β > 0. (3.19)

The inf-sup condition implies that for every q∗ ∈ Q∗ there exists u ∈ V such that Bu = q∗ and |u|V ≤ (1/β)|q∗|Q∗; it is also equivalent to

|B∗p|V∗ ≥ β|p|Q for all p ∈ Q.

Theorem (Mixed Problem) Under conditions (3.18)-(3.19) there exists a unique solution (u, p) ∈ V × Q to (3.17), and

|u|V + |p|Q ≤ c(|f|V∗ + |g|Q∗).

Proof: For ε > 0 consider the penalized problem

a(u_ε, v) + b(v, p_ε) = f(v) for all v ∈ V,
−b(u_ε, q) + ε(p_ε, q)Q = −g(q) for all q ∈ Q. (3.20)

By the Lax-Milgram theorem, for every ε > 0 there exists a unique solution. From the first equation,

β|p_ε|Q ≤ |f − Au_ε|V∗ ≤ |f|V∗ + M|u_ε|V.
Taking v = u_ε and q = p_ε in the first and second equations and adding, we have

δ|u_ε|²V + ε|p_ε|²Q ≤ |f|V∗|u_ε|V + |p_ε|Q|g|Q∗ ≤ C(|f|V∗ + |g|Q∗)(1 + |u_ε|V),

and thus |u_ε|V, and by the estimate above |p_ε|Q as well, are bounded uniformly in ε > 0. Hence (u_ε, p_ε) has a weakly convergent subsequence with limit (u, p) in V × Q, and (u, p) satisfies (3.17). □

3.4 Quadratic Constrained Optimization

In this section we discuss constrained minimization. First we consider the case of an equality constraint. Let A be a coercive self-adjoint operator on X and E ∈ L(X, Y). Let σ : X × X → R be a symmetric, bounded, coercive bilinear form, i.e.,

σ(α x1 + β x2, y) = α σ(x1, y) + β σ(x2, y) for all x1, x2, y ∈ X and α, β ∈ R,
σ(x, y) = σ(y, x), |σ(x, y)| ≤ M|x|X|y|X, σ(x, x) ≥ δ|x|²

for some 0 < δ ≤ M < ∞. Thus σ(x, y) defines an inner product on X and, since σ is bounded and coercive, √σ(x, x) = √(Ax, x) is an equivalent norm on X. Since y → σ(x, y) is a bounded linear functional on X, say x∗, we define a linear operator from X to X∗ by x∗ = Ax. Then A ∈ L(X, X∗) with |A|_{X→X∗} ≤ M and

⟨Ax, y⟩_{X∗×X} = σ(x, y) for all x, y ∈ X.

Also, if R_{X∗→X} is the Riesz map of X∗ and we define Ã = R_{X∗→X}A ∈ L(X, X), then

(Ãx, y)X = σ(x, y) for all x, y ∈ X,

and thus Ã is a self-adjoint operator on X. Throughout this section we assume X = X∗ and Y = Y∗, i.e., we identify A with the self-adjoint operator Ã.

Consider the unconstrained minimization problem: for f ∈ X∗,

min F(x) = (1/2)σ(x, x) − f(x). (3.21)

Since for x, v ∈ X and t ∈ R

(F(x + tv) − F(x))/t = σ(x, v) − f(v) + (t/2)σ(v, v),

x ∈ X → F(x) ∈ R is differentiable, and we have the necessary optimality condition for (3.21):

σ(x, v) − f(v) = 0 for all v ∈ X, (3.22)

or, equivalently, Ax = f in X∗.

Consider the equality constrained minimization

min (1/2)(Ax, x)X − (a, x)X subject to Ex = b ∈ Y, (3.23)

or equivalently, for f ∈ X∗,

min (1/2)σ(x, x) − f(x) subject to Ex = b ∈ Y.

The necessary optimality condition for (3.23) is:

Theorem 6 (Equality Constraint) Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then (3.23) has a unique solution x∗ ∈ X and Ax∗ − a ⊥ N(E). If R(E∗) is closed, then there exists a Lagrange multiplier λ ∈ Y such that

Ax∗ + E∗λ = a, Ex∗ = b. (3.24)

If R(E) = Y (E is surjective), the Lagrange multiplier λ ∈ Y is unique.

Proof: Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then (3.23) is equivalent to

min (1/2)(A(z + x̄), z + x̄) − (a, z + x̄) over z ∈ N(E).

Arguing as in the proof of Theorem 1 on the Hilbert space N(E), this problem has a unique minimizer z∗ ∈ N(E), and

(A(z∗ + x̄) − a, v) = 0 for all v ∈ N(E),

so A(z∗ + x̄) − a ⊥ N(E). From the orthogonal decomposition theorem, Ax∗ − a lies in the closure of R(E∗) for x∗ = z∗ + x̄. Moreover, if R(E∗) is closed, then there exists λ ∈ Y satisfying Ax∗ + E∗λ = a. If E is surjective (R(E) = Y), then N(E∗) = {0}, and thus the multiplier λ is unique. □

Remark (Lagrange Formulation) Define the Lagrange functional

L(x, λ) = J(x) + (λ, Ex − b).

The optimality condition (3.24) is equivalent to

(∂/∂x)L(x∗, λ) = 0, (∂/∂λ)L(x∗, λ) = 0.

Here x ∈ X is the primal variable and the Lagrange multiplier λ ∈ Y is the dual variable.
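A small numerical instance (with randomly generated data of my choosing) of Theorem 6: the optimality system (3.24) is a saddle-point linear system, solved here directly.

```python
import numpy as np

# Sketch: minimize (1/2)x^T A x - a^T x subject to Ex = b via the KKT
# saddle-point system [[A, E^T], [E, 0]] (x, λ) = (a, b), cf. (3.24).
rng = np.random.default_rng(2)
n, m = 6, 2
M = rng.standard_normal((n, n)); A = M @ M.T + np.eye(n)  # coercive, self-adjoint
E = rng.standard_normal((m, n))                           # full row rank a.s.
a, b = rng.standard_normal(n), rng.standard_normal(m)
K = np.block([[A, E.T], [E, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([a, b]))
x, lam = sol[:n], sol[n:]
print(np.linalg.norm(E @ x - b))              # feasibility: ~1e-15
print(np.linalg.norm(A @ x + E.T @ lam - a))  # stationarity: ~1e-15
```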
Remark (1) If E is not surjective, one can seek a solution using the penalty formulation

min (1/2)(Ax, x)X − (a, x)X + (β/2)|Ex − b|²Y (3.25)

for β > 0. The necessary optimality condition is

Ax_β + βE∗(Ex_β − b) − a = 0.

If we let λ_β = β(Ex_β − b), then we have

[A E∗; E −(1/β)I] [x_β; λ_β] = [a; b];

the optimality system (3.24) corresponds formally to the case β = ∞. One can prove the monotonicity

J(x_β) ↑, |Ex_β − b|Y ↓ as β → ∞.

In fact, for β > β̂ we have

J(x_β) + (β/2)|Ex_β − b|²Y ≤ J(x_β̂) + (β/2)|Ex_β̂ − b|²Y,
J(x_β̂) + (β̂/2)|Ex_β̂ − b|²Y ≤ J(x_β) + (β̂/2)|Ex_β − b|²Y.

Adding these inequalities, we obtain

(β − β̂)(|Ex_β − b|²Y − |Ex_β̂ − b|²Y) ≤ 0,

which implies the claim. Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then

J(x_β) + (β/2)|Ex_β − b|² ≤ J(x∗),

and thus |x_β|X is bounded uniformly in β > 0 and

lim_{β→∞} |Ex_β − b| = 0.

Since X is a Hilbert space, there exists a weak limit x ∈ X with J(x) ≤ J(x∗) and Ex = b, since norms are weakly sequentially lower semi-continuous. Since the optimal solution x∗ to (3.23) is unique, x_β → x∗ weakly as β → ∞. Suppose λ_β is bounded in Y; then there exists a subsequence of λ_β that converges weakly to λ in Y, and (x∗, λ) satisfies Ax∗ + E∗λ = a. Conversely, if there is a Lagrange multiplier λ∗ ∈ Y, then λ_β is bounded in Y. In fact, we have

(A(x_β − x∗), x_β − x∗) + (x_β − x∗, E∗(λ_β − λ∗)) = 0,

where

β(x_β − x∗, E∗(λ_β − λ∗)) = β(Ex_β − b, λ_β − λ∗) = |λ_β|² − (λ_β, λ∗).

Thus we obtain

β(x_β − x∗, A(x_β − x∗)) + |λ_β|² = (λ_β, λ∗),

and hence |λ_β| ≤ |λ∗|. Moreover, if R(E) = Y, then N(E∗) = {0}. Since λ_β satisfies E∗λ_β = a − Ax_β ∈ X, we have

λ_β = Es_β, s_β = (E∗E)^{−1}(a − Ax_β),

and thus λ_β ∈ Y is bounded uniformly in β > 0.

(2) If E is not surjective but R(E) is closed, we obtain λ ∈ R(E) ⊂ Y such that Ax∗ + E∗λ = a by taking Y = R(E), which is a Hilbert space. Also, we have (x_β, λ_β) → (x, λ) satisfying

Ax + E∗λ = a, Ex = P_{R(E)}b,

since |Ex − b|² = |Ex − P_{R(E)}b|² + |P_{N(E∗)}b|².

Example (Parameterized Constrained Problem)

min (1/2)((Ay, y) + α(p, p)) − (a, y) subject to E1y + E2p = b,

where x = (y, p) ∈ X1 × X2 = X and E1 ∈ L(X1, Y), E2 ∈ L(X2, Y). Assume Y = X1 and E1 is boundedly invertible. Then we have the optimality system

Ay∗ + E1∗λ = a,
α p∗ + E2∗λ = 0,
E1y∗ + E2p∗ = b.

Example (Stokes System) Let X = H¹0(Ω) × H¹0(Ω), where Ω is a bounded open set in R². The Hilbert space H¹0(Ω) is the completion of C0∞(Ω) with respect to the H¹0(Ω) inner product

(f, g)_{H¹0(Ω)} = ∫_Ω ∇f · ∇g dx.

Consider the constrained minimization of

∫_Ω ((1/2)(|∇u1|² + |∇u2|²) − f · u) dx

over the velocity field u = (u1, u2) ∈ X satisfying

div u = ∂u1/∂x1 + ∂u2/∂x2 = 0.

From Theorem 6 with Y = L²(Ω) and Eu = div u on X we obtain the necessary optimality condition

−∆u + grad p = f, div u = 0,

which is called the Stokes system. Here f ∈ L²(Ω) × L²(Ω) is an applied force and p = −λ∗ ∈ L²(Ω) is the pressure.

3.5 Variational Inequalities

Let Z be a Hilbert lattice with order ≤ and Z∗ = Z, and let G ∈ L(X, Z). For example, Z = L²(Ω) with the a.e. pointwise inequality constraint. Consider the quadratic programming problem

min (1/2)(Ax, x)X − (a, x)X subject to Ex = b, Gx ≤ c. (3.26)

Note that the constraint set C = {x ∈ X : Ex = b, Gx ≤ c} is a closed convex set in X. In general we consider the constrained problem on a Hilbert space X:

min F0(x) + F1(x) over x ∈ C, (3.27)

where C is a closed convex set, F0 : X → R is C¹ and F1 : X → R is convex, i.e.,

F1((1 − λ)x1 + λx2) ≤ (1 − λ)F1(x1) + λF1(x2)

for all x1, x2 ∈ X and 0 ≤ λ ≤ 1.
Then we have the necessary optimality condition in the form of a variational inequality:

Theorem 7 (Variational Inequality) Suppose x∗ ∈ C minimizes (3.27). Then we have the optimality condition

F0′(x∗)(x − x∗) + F1(x) − F1(x∗) ≥ 0 for all x ∈ C. (3.28)

Proof: Let x_t = x∗ + t(x − x∗) for x ∈ C and 0 ≤ t ≤ 1. Since C is convex, x_t ∈ C. Since x∗ ∈ C is optimal,

F0(x_t) − F0(x∗) + F1(x_t) − F1(x∗) ≥ 0, (3.29)

where F1(x_t) ≤ (1 − t)F1(x∗) + tF1(x), and thus

F1(x_t) − F1(x∗) ≤ t(F1(x) − F1(x∗)).

Since

(F0(x_t) − F0(x∗))/t → F0′(x∗)(x − x∗) as t → 0+,

the variational inequality (3.28) follows from (3.29) by letting t → 0+. □

Example (Inequality Constraint) For X = R^n,

F′(x∗)(x − x∗) ≥ 0, x ∈ C,

implies that F′(x∗) · ν ≤ 0 and F′(x∗) · τ = 0, where ν is the outward normal and τ are the tangents of C at x∗. Let X = R², C = [0, 1] × [0, 1] and assume x∗ = (1, s), 0 < s < 1. Then

(∂F/∂x1)(x∗) ≤ 0, (∂F/∂x2)(x∗) = 0.

If x∗ = (1, 1), then

(∂F/∂x1)(x∗) ≤ 0, (∂F/∂x2)(x∗) ≤ 0.

Example (L¹-optimization) Let U be a closed convex subset of R. Consider the minimization problem on X = L²(Ω) with C = {u ∈ X : u(x) ∈ U a.e. in Ω}:

min (1/2)|Eu − b|²Y + α ∫_Ω |u(x)| dx over u ∈ C. (3.30)

Then the optimality condition is

(Eu∗ − b, E(u − u∗))Y + α ∫_Ω (|u| − |u∗|) dx ≥ 0 (3.31)

for all u ∈ C. Let p = −E∗(Eu∗ − b) ∈ X and take u = v ∈ U on |x − x̄| ≤ δ, u = u∗ otherwise. From (3.31), dividing by the measure of {|x − x̄| ≤ δ},

(1/|{|x − x̄| ≤ δ}|) ∫_{|x−x̄|≤δ} (−p(v − u∗(x)) + α(|v| − |u∗(x)|)) dx ≥ 0.

Letting δ → 0+, we obtain

−p(v − u∗(x̄)) + α(|v| − |u∗(x̄)|) ≥ 0 for a.e. x̄ ∈ Ω (the Lebesgue points of u∗ and p) and for all v ∈ U.

That is, u∗(x̄) minimizes

−pu + α|u| over u ∈ U. (3.32)

For U = R this implies that if |p(x̄)| < α, then u∗(x̄) = 0, and otherwise p(x̄) = α u∗(x̄)/|u∗(x̄)| = α sign(u∗(x̄)). Thus we obtain the necessary optimality condition

0 ∈ (E∗(Eu∗ − b))(x) + α ∂|·|(u∗(x)) a.e. in Ω,

where the subdifferential of the absolute value is given by

∂|u|(s) = 1 for s > 0, [−1, 1] for s = 0, −1 for s < 0.

Theorem 8 (Quadratic Programming) Consider the quadratic programming problem (3.26):

min (1/2)(Ax, x)X − (a, x) subject to Ex = b, Gx ≤ c.

If there exists x̄ ∈ X such that Ex̄ = b and Gx̄ ≤ c, then there exists a unique solution to (3.26) and

(Ax∗ − a, x − x∗) ≥ 0 for all x ∈ C, (3.33)

where C = {Ex = b, Gx ≤ c}. Moreover, if R(E) = Y and the regular point condition

0 ∈ int {G N(E) − Z⁻ + Gx∗ − c}

holds, then there exist λ ∈ Y and µ ∈ Z⁺ such that

Ax∗ + E∗λ + G∗µ = a,
Ex∗ = b, (3.34)
Gx∗ − c ≤ 0, µ ≥ 0 and (Gx∗ − c, µ)Z = 0.

Proof: First note that (3.26) is equivalent to

min (1/2)(A(z + x̄), z + x̄)X − (a, z + x̄)X subject to Ez = 0, Gz ≤ c − Gx̄ = ĉ,

and Ĉ = {Ez = 0, Gz ≤ ĉ} is a closed convex set in the Hilbert space N(E). As in the proof of Theorem 1, a minimizing sequence zn ∈ Ĉ satisfies

(1/4)(A(zn − zm), zn − zm) ≤ J(zn) + J(zm) − 2δ → 0

as m ≥ n → ∞. Since Ĉ is closed, there exists a unique minimizer z∗ ∈ Ĉ, and z∗ satisfies

(A(z∗ + x̄) − a, z − z∗) ≥ 0 for all z ∈ Ĉ,

which is equivalent to (3.33). Moreover, it follows from the Lagrange multiplier theory (below) that if the regular point condition holds, then we obtain (3.34). □

Example Suppose Z = R^m. If (Gx∗ − c)i = 0, then i is called an active index; let Ga denote the (row) vectors of G corresponding to the active indices. If the map x → (Ex, Ga x) is surjective (the Kuhn-Tucker condition), then the regular point condition holds. A small numerical illustration follows.
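The sketch below (my own toy instance, with random data) illustrates the complementarity system (3.34) with the equality constraint absent: each candidate active set yields a linear KKT system, and the correct candidate is certified by primal feasibility together with µ ≥ 0.

```python
import numpy as np
from itertools import combinations

# Sketch: brute-force active-set solve of min (1/2)x^T A x - a^T x s.t. Gx <= c.
# For each active set S, solve [[A, G_S^T],[G_S, 0]](x, µ_S) = (a, c_S);
# inactive constraints receive µ_j = 0, cf. (3.34).
rng = np.random.default_rng(3)
n, p = 4, 3
M = rng.standard_normal((n, n)); A = M @ M.T + np.eye(n)
a = rng.standard_normal(n)
G = rng.standard_normal((p, n)); c = rng.standard_normal(p)

best = None
for k in range(p + 1):
    for S in map(list, combinations(range(p), k)):
        if k == 0:
            x, mu_S = np.linalg.solve(A, a), np.zeros(0)
        else:
            K = np.block([[A, G[S].T], [G[S], np.zeros((k, k))]])
            z = np.linalg.solve(K, np.concatenate([a, c[S]]))
            x, mu_S = z[:n], z[n:]
        if (G @ x <= c + 1e-10).all() and (mu_S >= -1e-10).all():
            mu = np.zeros(p); mu[S] = mu_S
            best = (x, mu)
x, mu = best
print("x* =", x.round(3), " µ =", mu.round(3))
print("(µ, Gx*-c) =", float(mu @ (G @ x - c)))   # complementarity: ≈ 0
```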
Remark (Lagrange Formulation) Define the Lagrange functional

L(x, λ, µ) = J(x) + (λ, Ex − b) + (µ, (Gx − c)⁺)

for (x, λ) ∈ X × Y, µ ∈ Z⁺. The optimality condition (3.34) is equivalent to

(∂/∂x)L(x∗, λ, µ) = 0, (∂/∂λ)L(x∗, λ, µ) = 0, (∂/∂µ)L(x∗, λ, µ) = 0.

Here x ∈ X is the primal variable and the Lagrange multipliers λ ∈ Y and µ ∈ Z⁺ are the dual variables.

Remark Let A ∈ L(X, X∗), let E : X → Y be surjective, and let G be the natural injection of X into Z. Define µ ∈ X∗ by −µ = Ax∗ − a + E∗λ. Since

⟨t(x∗ − c) − (x∗ − c), µ⟩ = (t − 1)⟨x∗ − c, µ⟩ ≤ 0 for all t > 0,

we have ⟨x∗ − c, µ⟩ = 0. Thus ⟨x − c, µ⟩ ≤ 0 for all x with x − c ≤ 0. If we define the dual cone

X⁺∗ = {x∗ ∈ X∗ : ⟨x∗, x⟩ ≤ 0 for all x ∈ X with x ≤ 0},

then µ ∈ X⁺∗. Thus the regular point condition gives the stronger regularity µ ∈ Z⁺ for the Lagrange multiplier µ.

Example (Source Identification) Let X = Z = L²(Ω) and S ∈ L(X, Y), and consider the constrained minimization

min (1/2)(|Sf − y|²Y + α|f|²) subject to −f ≤ 0.

In this case A = αI + S∗S, α > 0, a = S∗y and G = −I. The regular point condition reads

0 ∈ int {−(X − f∗) − Z⁻ − f∗},

which holds since −(X − f∗) is already all of Z. Hence there exists µ ∈ Z⁺ such that

αf∗ + S∗Sf∗ − µ = S∗y, µ ≥ 0 and (−f∗, µ) = 0.

For example, the operator S may be defined through the Poisson equation

−∆u = f in Ω, ∂u/∂n + u = 0 at Γ = ∂Ω,

with Sf = Cu for a measurement operator C, e.g., Cu = u|Γ or Cu = u|_{Ω̂} with Ω̂ a subdomain of Ω.

Remark (Penalty Method) Consider the penalized problem

min J(x) + (β/2)|Ex − b|²Y + (β/2)|max(0, Gx − c)|²Z.

The necessary optimality condition is given by

Ax_β + E∗λ_β + G∗µ_β = a, λ_β = β(Ex_β − b), µ_β = β max(0, Gx_β − c).

It follows from Theorem 7 that there exists a unique solution x_β, and it satisfies

J(x_β) + (β/2)|Ex_β − b|²Y + (β/2)|max(0, Gx_β − c)|²Z ≤ J(x∗). (3.35)

Thus {x_β} is bounded in X and has a weakly convergent subsequence with limit x in X. Moreover, we have Ex = b, max(0, Gx − c) = 0 and J(x) ≤ J(x∗), since the norm is weakly sequentially lower semi-continuous. Since the minimizer of (3.26) is unique, x = x∗ and lim_{β→∞} J(x_β) = J(x∗), which implies that x_β → x∗ strongly in X. Suppose λ_β ∈ Y and µ_β ∈ Z are bounded. Then there exists a subsequence that converges weakly to (λ, µ) in Y × Z and µ ≥ 0. Since lim_{β→∞} J(x_β) = J(x∗), it follows from (3.35) that (µ_β, Gx_β − c) → 0 as β → ∞, and thus (µ, Gx∗ − c) = 0. Hence (x∗, λ, µ) satisfies (3.34). Conversely, if there is a Lagrange multiplier pair (λ∗, µ∗), then (λ_β, µ_β) is bounded in Y × Z. In fact, we have

(x_β − x∗, A(x_β − x∗)) + (x_β − x∗, E∗(λ_β − λ∗)) + (x_β − x∗, G∗(µ_β − µ∗)) = 0,

where

β(x_β − x∗, E∗(λ_β − λ∗)) = β(Ex_β − b, λ_β − λ∗) = |λ_β|² − (λ_β, λ∗)

and

β(x_β − x∗, G∗(µ_β − µ∗)) = β(Gx_β − c − (Gx∗ − c), µ_β − µ∗) ≥ |µ_β|² − (µ_β, µ∗).

Thus we obtain

β(x_β − x∗, A(x_β − x∗)) + |λ_β|² + |µ_β|² ≤ (λ_β, λ∗) + (µ_β, µ∗),

and hence |(λ_β, µ_β)| ≤ |(λ∗, µ∗)|. It will be shown in Section 5 that the regular point condition implies that (λ_β, µ_β) is uniformly bounded in β > 0. A numerical sketch of this penalty path is given below.
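The following sketch is my own (with hypothetical random data): for growing β the piecewise linear optimality condition of the penalty method is solved by iterating on the guessed active set {Gx > c}, each step being a single linear solve, and the penalty multipliers λβ = β(Exβ − b), µβ = β(Gxβ − c)⁺ visibly stabilize.

```python
import numpy as np

# Sketch: solve  Ax + βE^T(Ex-b) + βG^T max(0, Gx-c) = a  by an active-set
# fixed point, then report the penalty multipliers for each β.
rng = np.random.default_rng(4)
n, m, p = 5, 2, 3
M = rng.standard_normal((n, n)); A = M @ M.T + np.eye(n)
E, G = rng.standard_normal((m, n)), rng.standard_normal((p, n))
a, b, c = rng.standard_normal(n), rng.standard_normal(m), rng.standard_normal(p)

for beta in [1e1, 1e3, 1e5]:
    x = np.zeros(n)
    for _ in range(50):                       # active-set iteration
        act = (G @ x - c > 0).astype(float)   # current guess of active rows
        H = A + beta * E.T @ E + beta * (G.T * act) @ G
        x = np.linalg.solve(H, a + beta * E.T @ b + beta * (G.T * act) @ c)
    lam, mu = beta * (E @ x - b), beta * np.maximum(0.0, G @ x - c)
    print(beta, x.round(4), lam.round(3), mu.round(3))
```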
Example (Minimum norm problem with inequality constraint) min |x|2 subject to (x, yi ) ≤ ci , 1 ≤ ı ≤ m. Assume {yi } are linearly independent. Then it has a unique solution x∗ and the necessary and sufficient optimality is given by P x∗ + m i=1 µi yi = 0 (x∗ , yi ) − ci ≤ 0, µi ≥ 0 and ((x∗ , yi ) − ci )µi = 0, 1 ≤ i ≤ m. That is, if G is Grammian (Gij = (yi , yj )) then (−Gx∗ − c)i ≤ 0 µi ≥ 0 and (−Gx∗ − c)i µi = 0, 1 ≤ i ≤ m. Example (Pointwise Obstacle Problem) Consider Z min 0 1 1 d | u(x)|2 − 2 dx Z 1 f (x)u(x) dx 0 1 subject to u( ) ≤ c 2 over u ∈ X = H01 (0, 1). Let f = 1. The Riesz representation of Z 1 1 F1 (u) = f (t)u(t) dt, F2 (u) = u( ) 2 0 are 1 1 1 1 y1 (x) = x(1 − x). y2 (x) = ( − |x − |), 2 2 2 2 respectively. Assume that f = 1. Then, we have u∗ − y1 + µ y2 = 0, 1 µ 1 u∗ ( ) = − ≤ c, 2 8 4 µ ≥ 0. Thus, if c ≥ 81 , then u∗ = y1 and µ = 0. If c ≤ 18 , then u∗ = y1 − µ y2 and µ = 4(c − 18 ). 3.6 Constrained minimization in Banach spaces and Lagrange multiplier theory In this section we discuss the general nonlinear programming. Let X, Y be Hilbert spaces and Z be a Hilbert lattice. Consider the constrained minimization; min J(x) subject to E(x) = 0 and G(x) ≤ 0. 57 (3.36) over a closed convex set C in X. Here, where J : X → R, E : X → Y and G : X → Z are continuously differentiable. Definition (Derivatives) (1) The functional F on a Banach space X is Gateaux differentiable at u ∈ X if for all d ∈ X lim t→0 F (u + td) − F (u) = F 0 (u)(d) t exists. If the G-derivative d ∈ X → F 0 (u)(d) ∈ R is liner and bounded, then F is Frechet differentiable at u. (2) E : X → Y , Y ia Banach space is Frechet differentiable at u if there exists a linear bounded operator E 0 (u) such that |E(u + d) − E(u) − E 0 (u)d|Y = 0. |d|X |d|X →0 lim If Frechet derivative u ∈ X → E 0 (u) ∈ L(X, Y ) is continuous, then E is continuously differentiable. Definition (Lower semi-continuous) (1) A functional F is lower-semi continuous if lim inf F (xn ) ≥ F ( lim xn ) n→∞ n→∞ (2) A functional F is weakly lower-semi continuous if lim inf F (xn ) ≥ F (w − limn→∞ xn ) n→∞ Theorem (Lower-semicontinuous) (1) Norm is weakly lower-semi continuous. (2) A convex lower-semicontinuous functional is weakly lower-semi continuous. Proof: Assume xn → x weakly in X. Let x∗ ∈ F (x), i.e., hx∗ , xi = |x∗ ||x|. Then, we have |x|2 = lim hx∗ , xn i n→∞ and |hx∗ , xn i| ≤ |xn ||x∗ |. Thus, lim inf |xn | ≥ |x|. n→∞ (2) Since F is convex, X X F( tk xk ) ≤ tk F (xk ) k k P for all convex combination of xk , i.e., tk = 1, tk ≥ 0. By the Mazur lemma there exists a sequence of convex combination of weak convergent sequence ({xk }, {F (xk )}) to (x, F (x)) in X × R that converges strongly to (x, F (x)) and thus F (x) ≤ lim inf n → ∞ F (xn ). Theorem (Existence) (1) A lower-semicontinuous functional F on a compact set S has a minimizer. (2) A weak lower-semicontinuous, coercive functional F has a minimizer. 58 Proof: (1) One can select a minimizing sequence {xn } such that F (xn ) ↓ η = inf x∈S F (x). Since S is a compact, there exits a subsequaence that converges strongly to x∗ ∈ S. Since F is lower-semicontinuous, F (x∗ ) ≤ η. (2) Since F (x) → ∞ as |x| → ∞, there exits a bounded minimizing sequence {xn } of F . A bounded sequence in a reflexible Banach space X has a weak convergent subsequence to x∗ ∈ X. Since F is weakly lower-semicontinuous, F (x∗ ) ≤ η. 
Next, we consider the constrained optimization in Banach spaces X, Y:

min F(y) subject to G(y) ∈ K over y ∈ C, (3.37)

where K is a closed convex cone of a Banach space Y, F : X → R and G : X → Y are C¹, and C is a closed convex set of X. For example, if K = {0}, then G(y) = 0 is the equality constraint. If Y = Z is a Hilbert lattice and K = {z ∈ Z : z ≤ 0}, then G(y) ≤ 0 is the inequality constraint. Then we have the Lagrange multiplier theory:

Theorem (Lagrange Multiplier Theory) Let y∗ ∈ C be a solution to (3.37) and assume the regular point condition

0 ∈ int {G′(y∗)(C − y∗) − K + G(y∗)} (3.38)

holds. Then there exists a Lagrange multiplier λ ∈ K⁺ = {y ∈ Y : (y, z) ≤ 0 for all z ∈ K} satisfying (λ, G(y∗))Y = 0 such that

(F′(y∗) + G′(y∗)∗λ, y − y∗)X ≥ 0 for all y ∈ C. (3.39)

Proof: Let F = {y ∈ C : G(y) ∈ K} be the feasible set. The sequential tangent cone of F at y∗ is defined by

T(F, y∗) = {x ∈ X : x = lim_{n→∞} (yn − y∗)/tn, tn → 0⁺, yn ∈ F}.

It is easy to show that

(F′(y∗), x) ≥ 0 for all x ∈ T(F, y∗).

Let

C(y∗) = ∪_{s≥0, y∈C} {s(y − y∗)}, K(G(y∗)) = {k − tG(y∗) : k ∈ K, t ≥ 0},

and define the linearizing cone L(F, y∗) of F at y∗ by

L(F, y∗) = {z ∈ C(y∗) : G′(y∗)z ∈ K(G(y∗))}.

It follows from Lyusternik's theorem [] that

L(F, y∗) ⊆ T(F, y∗),

and thus

(F′(y∗), z) ≥ 0 for all z = y − y∗ ∈ L(F, y∗). (3.40)

Define the convex cone B ⊂ R × Y by

B = {(F′(y∗)z + r, G′(y∗)z − y) : z ∈ C(y∗), r ≥ 0, y ∈ K(G(y∗))}.

By the regular point condition, B has an interior point. From (3.40) the origin (0, 0) is a boundary point of B, and thus there exists a hyperplane in R × Y supporting B at (0, 0), i.e., there exists a nontrivial (α, λ) ∈ R × Y such that

α(F′(y∗)z + r) + (λ, G′(y∗)z − y) ≥ 0

for all z ∈ C(y∗), y ∈ K(G(y∗)) and r ≥ 0. Setting (z, r) = (0, 0), we have (λ, y) ≤ 0 for all y ∈ K(G(y∗)), and thus λ ∈ K⁺ and (λ, G(y∗)) = 0. Letting (r, y) = (0, 0) and z ∈ L(F, y∗), we have α ≥ 0. If α = 0, the regular point condition implies λ = 0, which is a contradiction. Without loss of generality we set α = 1 and obtain (3.39) by setting (r, y) = (0, 0). □

Corollary (Nonlinear Programming) If there exists x̄ such that E(x̄) = 0 and G(x̄) ≤ 0, then there exists a solution x∗ to (3.36). Assume the regular point condition

0 ∈ int {E′(x∗)(C − x∗) × (G′(x∗)(C − x∗) − Z⁻ + G(x∗))}.

Then there exist λ ∈ Y and µ ∈ Z such that

(J′(x∗) + E′(x∗)∗λ + G′(x∗)∗µ, x − x∗) ≥ 0 for all x ∈ C,
E(x∗) = 0, (3.41)
G(x∗) ≤ 0, µ ≥ 0 and (G(x∗), µ)Z = 0.

Next we consider parameterized nonlinear programming. Let X, Y, U be Hilbert spaces and Z a Hilbert lattice. We consider the constrained minimization of the form

min J(x, u) subject to E(x, u) = 0 and G(x, u) ≤ 0 over (x, u) ∈ X × Ĉ, (3.42)

where J : X × U → R, E : X × U → Y and G : X × U → Z are continuously differentiable and Ĉ is a closed convex set in U. Here we assume that for given u ∈ Ĉ there exists a unique x = x(u) ∈ X satisfying E(x, u) = 0. Thus one can eliminate x ∈ X as a function of u ∈ U, and (3.42) can be reduced to a constrained minimization over u ∈ Ĉ. But it is often more advantageous to analyze (3.42) directly using the Lagrange multiplier theorem.

Corollary (Control Problem) Let (x∗, u∗) be a minimizer of (3.42) and assume the regular point condition holds.
It then follows from the Lagrange multiplier theorem that the necessary optimality condition is given by

Jx(x∗, u∗) + Ex(x∗, u∗)∗λ + Gx(x∗, u∗)∗µ = 0,
(Ju(x∗, u∗) + Eu(x∗, u∗)∗λ + Gu(x∗, u∗)∗µ, u − u∗) ≥ 0 for all u ∈ Ĉ, (3.43)
E(x∗, u∗) = 0,
G(x∗, u∗) ≤ 0, µ ≥ 0, and (G(x∗, u∗), µ) = 0.

Example (Control Problem) Consider the optimal control problem

min ∫_0^T (ℓ(x(t)) + h(u(t))) dt + G(x(T))
subject to (d/dt)x(t) − f(x, u) = 0, x(0) = x0, (3.44)
u(t) ∈ Û = a closed convex set in R^m.

We let X = H¹(0, T; R^n), U = L²(0, T; R^m), Y = L²(0, T; R^n) and

J(x, u) = ∫_0^T (ℓ(x) + h(u)) dt + G(x(T)), E = f(x, u) − (d/dt)x,

and Ĉ = {u ∈ U : u(t) ∈ Û a.e. in (0, T)}. From (3.43) we have

∫_0^T (ℓ′(x∗(t))φ(t) + (p(t), fx(x∗, u∗)φ(t) − (d/dt)φ)) dt + G′(x∗(T))φ(T) = 0

for all φ ∈ H¹(0, T; R^n) with φ(0) = 0. By integration by parts,

∫_0^T (∫_t^T (ℓ′(x∗) + fx(x∗, u∗)ᵗp) ds + G′(x∗(T)) − p(t), (d/dt)φ) dt = 0.

Since φ is arbitrary, we have

p(t) = G′(x∗(T)) + ∫_t^T (ℓ′(x∗) + fx(x∗, u∗)ᵗp) ds,

and thus the adjoint state p is absolutely continuous and satisfies

−(d/dt)p(t) = fx(x∗, u∗)ᵗp(t) + ℓ′(x∗), p(T) = G′(x∗(T)).

From the optimality condition, we have

∫_0^T (h′(u∗) + fu(x∗, u∗)ᵗp(t))(u(t) − u∗(t)) dt ≥ 0

for all u ∈ Ĉ, which implies

(h′(u∗(t)) + fu(x∗(t), u∗(t))ᵗp(t))(v − u∗(t)) ≥ 0 for all v ∈ Û

at Lebesgue points t of u∗. Hence the necessary optimality condition is of the form of the two-point boundary value problem

(d/dt)x∗(t) = f(x∗, u∗), x(0) = x0,
(h′(u∗(t)) + fu(x∗(t), u∗(t))ᵗp(t), v − u∗(t)) ≥ 0 for all v ∈ Û,
−(d/dt)p(t) = fx(x∗, u∗)ᵗp(t) + ℓ′(x∗), p(T) = G′(x∗(T)).

Let Û = [−1, 1] coordinate-wise, h(u) = (α/2)|u|², and f(x, u) = f̃(x) + Bu. Then the optimality condition implies

(α u∗(t) + Bᵗp(t), v − u∗(t))_{R^m} ≥ 0 for all |v|∞ ≤ 1,

or, equivalently, u∗(t) minimizes

(α/2)|u|² + (Bᵗp(t), u) over |u|∞ ≤ 1.

Thus, coordinate-wise,

u∗(t) = 1 if −Bᵗp(t)/α > 1,
u∗(t) = −Bᵗp(t)/α if |Bᵗp(t)/α| ≤ 1, (3.45)
u∗(t) = −1 if −Bᵗp(t)/α < −1.

Example (Coefficient Estimation) Let Ω be a bounded open set in R^d and Ω̃ a subset of Ω. Consider the parameter estimation problem

min J(u, c) = ∫_{Ω̃} |y − u|² dx + α ∫_Ω |∇c|² dx
subject to −∆u + c(x)u = f, c ≥ 0,

over (u, c) ∈ H¹0(Ω) × H¹(Ω). We let Y = (H¹0(Ω))∗ and

⟨E(u, c), φ⟩ = (∇u, ∇φ) + (cu, φ)_{L²(Ω)} − ⟨f, φ⟩ for all φ ∈ H¹0(Ω).

Since

⟨Ju + Eu∗λ, h⟩ = (∇h, ∇λ) + (ch, λ)_{L²(Ω)} − ((y − u)χ_{Ω̃}, h) = 0

for all h ∈ H¹0(Ω), the adjoint λ ∈ H¹0(Ω) satisfies

−∆λ + cλ = (y − u)χ_{Ω̃}.

Also, there exists µ ≥ 0 in L²(Ω) such that

⟨Jc + Ec∗λ, d⟩ = α(∇c∗, ∇d) + (du, λ)_{L²(Ω)} − (µ, d) = 0 for all d ∈ H¹(Ω),

with (µ, c∗) = 0 and ⟨µ, d⟩ ≥ 0 for all d ∈ H¹(Ω) satisfying d ≥ 0.
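For the two-point boundary value system of the control example above, a forward-backward sweep is the simplest numerical scheme. The sketch below is my own, with hypothetical data F, B, α, running cost ℓ(x) = (1/2)|x|² and terminal cost G ≡ 0; it alternates a forward state solve, a backward adjoint solve, and the pointwise projection (3.45).

```python
import numpy as np

# Sketch: forward-backward sweep for x' = Fx + Bu, -p' = F^T p + x, p(T) = 0,
# with the control update u = clip(-B^T p / alpha, -1, 1), cf. (3.45).
F = np.array([[0.0, 1.0], [-1.0, 0.0]]); B = np.array([[0.0], [1.0]])
alpha, T, N = 0.1, 1.0, 200
dt = T / N
x0 = np.array([1.0, 0.0])
u = np.zeros((N, 1))
for sweep in range(100):
    x = np.zeros((N + 1, 2)); x[0] = x0
    for k in range(N):                              # forward: state equation
        x[k + 1] = x[k] + dt * (F @ x[k] + B @ u[k])
    p = np.zeros((N + 1, 2))                        # p(T) = 0 since G == 0
    for k in range(N, 0, -1):                       # backward: adjoint equation
        p[k - 1] = p[k] + dt * (F.T @ p[k] + x[k])
    u = np.clip(-(p[:N] @ B) / alpha, -1.0, 1.0)    # projection (3.45)
print(u[:3].ravel().round(3), u[-3:].ravel().round(3))
```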
3.7 Minimum norm solution in Banach space

In this section we discuss the minimum distance and the minimum norm problems in Banach spaces. Given a subspace M of a normed space X, the orthogonal complement (annihilator) M⊥ of M is defined by

M⊥ = {x∗ ∈ X∗ : ⟨x∗, x⟩ = 0 for all x ∈ M}.

Theorem If M is a closed subspace of a normed space, then

(M⊥)⊥ = {x ∈ X : ⟨x, x∗⟩ = 0 for all x∗ ∈ M⊥} = M.

Proof: If there exists x ∈ (M⊥)⊥ with x ∉ M, define a linear functional f by f(αx + m) = α on the space spanned by x and M. Since x ∉ M and M is closed,

|f| = sup_{m∈M, α∈R} |f(αx + m)|/|αx + m|X = sup_{m∈M} |f(x + m)|/|x + m|X = 1/inf_{m∈M} |x + m|X < ∞,

and thus f is bounded. By the Hahn-Banach theorem, f can be extended to some x∗ ∈ X∗. Since ⟨x∗, m⟩ = 0 for all m ∈ M, x∗ ∈ M⊥. This contradicts x ∈ (M⊥)⊥, since ⟨x∗, x⟩ = 1. □

Theorem (Duality I) Given x ∈ X and a subspace M, we have

d = inf_{m∈M} |x − m|X = max_{|x∗|≤1, x∗∈M⊥} ⟨x∗, x⟩ = ⟨x∗0, x⟩,

where the maximum is attained by some x∗0 ∈ M⊥ with |x∗0| = 1. If the infimum on the left is attained by some m0 ∈ M, then x∗0 is aligned with x − m0.

Proof: By the definition of the infimum, for any ε > 0 there exists m_ε ∈ M such that |x − m_ε| ≤ d + ε. For any x∗ ∈ M⊥ with |x∗| ≤ 1 it follows that

⟨x∗, x⟩ = ⟨x∗, x − m_ε⟩ ≤ |x∗||x − m_ε| ≤ d + ε.

Since ε > 0 is arbitrary, ⟨x∗, x⟩ ≤ d. It remains to show that ⟨x∗0, x⟩ = d for some x∗0 ∈ X∗. Define a linear functional f on the space spanned by x and M by

f(αx + m) = αd.

Since

|f| = sup_{m∈M, α∈R} |f(αx + m)|/|αx + m|X = d/inf_{m∈M} |x + m|X = 1,

f is bounded. By the Hahn-Banach theorem, there exists an extension x∗0 of f with |x∗0| = 1. Since ⟨x∗0, m⟩ = 0 for all m ∈ M, x∗0 ∈ M⊥, and by construction ⟨x∗0, x⟩ = d. Moreover, if |x − m0| = d for some m0 ∈ M, then

⟨x∗0, x − m0⟩ = d = |x∗0||x − m0|,

i.e., x∗0 is aligned with x − m0. □

3.7.1 Duality Principle

Corollary (Duality) m0 ∈ M attains the minimum if and only if there exists a nonzero x∗ ∈ M⊥ that is aligned with x − m0.

Proof: Suppose x∗ ∈ M⊥ with |x∗| = 1 is aligned with x − m0. Then

|x − m0| = ⟨x∗, x − m0⟩ = ⟨x∗, x⟩ = ⟨x∗, x − m⟩ ≤ |x − m|

for all m ∈ M. The converse follows from Theorem (Duality I). □

Theorem (Duality Principle) (1) Given x∗ ∈ X∗, we have

min_{m∗∈M⊥} |x∗ − m∗|X∗ = sup_{x∈M, |x|≤1} ⟨x∗, x⟩.

(2) The minimum on the left is achieved by some m∗0 ∈ M⊥. (3) If the supremum on the right is achieved by some x0 ∈ M, then x∗ − m∗0 is aligned with x0.

Proof: (1) Note that for m∗ ∈ M⊥,

|x∗ − m∗| = sup_{|x|≤1} ⟨x∗ − m∗, x⟩ ≥ sup_{|x|≤1, x∈M} ⟨x∗ − m∗, x⟩ = sup_{|x|≤1, x∈M} ⟨x∗, x⟩.

Consider the restriction of x∗ to M, with norm sup_{x∈M, |x|≤1} ⟨x∗, x⟩, and let y∗ be a Hahn-Banach extension of this restriction to X with the same norm. Let m∗0 = x∗ − y∗. Then m∗0 ∈ M⊥ and

|x∗ − m∗0| = |y∗| = sup_{x∈M, |x|≤1} ⟨x∗, x⟩.

Thus the minimum on the left is achieved by m∗0 ∈ M⊥. If the supremum on the right is achieved by some x0 ∈ M, then x∗ − m∗0 is aligned with x0: indeed |x0| = 1 and |x∗ − m∗0| = ⟨x∗, x0⟩ = ⟨x∗ − m∗0, x0⟩. □

In all applications of the duality theory, we use the alignment properties of the space and its dual to characterize optimal solutions, to guarantee the existence of a solution by formulating minimum norm problems in the dual space, and to examine whether the dual problem is easier than the primal problem.

Example 1 (Chebyshev Approximation) The problem is to find a polynomial p of degree less than or equal to n that best approximates f ∈ C[0, 1] in the sense of the sup-norm over [0, 1]. Let M be the space of all polynomials of degree less than or equal to n. Then M is a closed subspace of C[0, 1] of dimension n + 1. The infimum of |f − p|∞ is attained by some p0 ∈ M since M is finite dimensional. The Chebyshev theorem states that the set

Γ = {t ∈ [0, 1] : |f(t) − p0(t)| = |f − p0|∞}

contains at least n + 2 points. In fact, f − p0 must be aligned with some element in M⊥ ⊂ C[0, 1]∗ = BV[0, 1]. Assume Γ contains m < n + 2 points 0 ≤ t1 < · · · < tm ≤ 1. If v ∈ BV[0, 1] is aligned with f − p0, then v is a piecewise constant function with jump discontinuities only at these tk's. Let tk be a point of jump discontinuity of v. Define

q(t) = Π_{j≠k} (t − tj).

Then q ∈ M but ⟨q, v⟩ ≠ 0, and hence v ∉ M⊥, which is a contradiction.
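A Euclidean sanity check (my own illustration) of Theorem (Duality I) in R³: the distance from x to M = span{m} equals the value at x of the aligned unit functional in M⊥.

```python
import numpy as np

# Sketch: dist(x, M) = max{ <x*, x> : |x*| <= 1, x* ⟂ M } for M = span{m},
# and the maximizer x0* is aligned with x - m0 (m0 = best approximation).
rng = np.random.default_rng(5)
m = rng.standard_normal(3); x = rng.standard_normal(3)
m0 = (x @ m) / (m @ m) * m                 # orthogonal projection onto M
d_primal = np.linalg.norm(x - m0)
x_star = (x - m0) / d_primal               # unit functional, aligned with x - m0
print(d_primal, x_star @ x)                # equal: primal distance = dual value
print(abs(x_star @ m) < 1e-12)             # True: x* ∈ M^⊥
```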
where y1 , · · · , yn ∈ X are given linearly independent vectors. Theorem Let x ¯∗ ∈ X ∗ satisfying the constraints h¯ x∗ , yk i = ck , 1 ≤ k ≤ n. Let M = span{yk }. Then, d= min h¯ x∗ ,yk i=ck |x∗ |X ∗ = min |¯ x∗ − m∗ |. m∗ ∈M ⊥ By the duality principle, d = min |¯ x∗ − m∗ | = m∗ ıM ⊥ sup h¯ x∗ , xi. x∈M, |x|≤1 64 Write x = Y a where Y = [y1 , · · · , yn ] and a ∈ Rn . Then d = min |x∗ | = max h¯ x∗ , Y ai = max (c, a). |Y a|≤1 |Y a|≤1 Thus, the optimal solution x∗0 must be aligned with Y a. Example (DC motor control) Let X = L1 (0, 1) and X ∗ = L∞ (0, 1). min |u|∞ 1 Z t−1 e subject to Z 1 u(t) dt = 0, (1 + et−1 )u(t) = 1. 0 0 Let y1 = et−1 and y2 = 1 + et−1 . From the duality principle min hyk ,ui=ck |u|∞ = max |a1 y1 +a2 y2 |1 ≤1 a2 . The controlR problem now is reduced to finding constants a1 and a2 that maximizes a2 1 subject to 0 |(a1 − a2 )et−1 + a2 | dt ≤ 1. The optimal solution u ∈ L∞ (0, 1) should be aligned with (a1 − a2)et−1 + a2 ∈ L1 (0, 1). Since (a1 − a2 )et−1 + a2 , can change sign at most once. the alignment condition implies that u must have values |u|∞ and can change sign at most once, which is the so called bang-bang control. 4 Linear Operator Theory and C0-semigroup In this section we discuss the Cauchy problem d u(t) = Au(t) + f (t), dt u(0) = u0 ∈ X in a Banach space X, where u0 ∈ X is the initial condition and f ∈ L1 (0, T ; X). We construct the mild solution u(t) ∈ C(0, T ; X): Z t u(t) = S(t)u0 + S(t − s)f (s) ds (4.1) 0 where a family of bounded linear operator {S(t), t ≥ 0} is C0 -semigroup on X. Definition (C0 semigroup) (1) Let X be a Banach space. A family of bounded linear operators {S(t), t ≥ 0} on X is called a strongly continuous (C0 ) semigroup if S(t + s) = S(t)S(s) for t, s ≥ 0 with S(0) = I |S(t)φ − φ| → 0 as t → 0+ for all φ ∈ X. (2) A linear operator A in X defined by Aφ = lim t→0+ S(t)φ − φ t (4.2) with dom(A) = {φ ∈ X : the strong limit of lim t→0+ 65 S(t)φ − φ in X exists}. t is called the infinitesimal generator of the C0 semigroup S(t). In this section we present the basic theory of the linear C0 -semigroup on a Banach space X. The theory allows to analyze a wide class of the physical and engineering dynamics using the unified framework. We also present the concrete examples to demonstrate the theory. There is a necessary and sufficient condition (Hile-Yosida Theorem) for a closed, densely defined linear A in X to be the infinitesimal generator of the C0 semigroup S(t). Moreover, we will show that the mild solution u(t) satisfies Z hu(t), ψi = hu0 , ψi + (hx(s), A∗ ψi + hf (s), ψi ds (4.3) for all ψ ∈ dom (A∗ ). Examples (1) For A ∈ (X), define a sequence of linear operators in X SN (t) = X 1 (At)k . k! k Then |SN (t)| ≤ and X 1 (|A|t)k ≤ e|A| t k! d (t) = ASN −1 (t) SN Since S(t) = eAt = limN →∞ SN (t) in the operator norm, we have d S(t) = AS(t) = S(t)A. dt (2) Consider the the hyperbolic equation ut + ux = 0, u(0, x) = u0 (x) in (0, 1). (4.4) Define the semigroup S(t) of translations on X = L2 (0, 1) by [S(t)u0 ](x) = u ˜0 (x − t), u ˜0 (x) = 0, x ≤ 0, u ˜0 = u0 on (0, 1). Then, {S(t), t ≥ 0} is a C0 semigroup on X. If we define u(t, x) = [S(t)u0 ](x) with u0 ∈ H 1 (0, 1) with u0 (0) = 0 satisfies (4.5) a.e.. The generator A is given by Aφ = −φ0 with dom(A) = {φ ∈ H 1 (0, 1) with u(0) = 0}. Thus, u(t) = S(t)u0 satisfies the Cauchy problem (3) Consider the the heat equation ut = σ2 ∆u, 2 d u (t) = Au(t) if u0 ∈ dom(A). u(0, x) = u0 (x) in L2 (Rn ). 
Define the semigroup 1 [S(t)u0 ](x) = √ ( 2πtσ)n 66 Z Rn e− |x−y|2 2σ 2 t u0 (y) dy (4.5) Let A be closed, densely defined linear operator dom(A) → X. We use the finite difference method in time to construct the mild solution (4.1). For a stepsize λ > 0 consider a sequence {un } in X generated by un − un−1 = Aun + f n−1 , λ with f n−1 1 = λ (4.6) nλ Z f (t) dt. (n−1)λ Assume that for λ > 0 the resolvent operator Jλ = (I − λ I)−1 is bounded. Then, we have the product formula: un = Jλn u0 + n−1 X Jλn−k f k λ. (4.7) k=0 In order to un ∈ X is uniformly bounded in n for all u0 ∈ X (with f = 0), it is necessary that M |Jλn | ≤ for λω < 1, (4.8) (1 − λω)n for some M ≥ 1 and ω ∈ R. Hile’s Theorem Define a piecewise constant function in X by uλ (t) = uk−1 on [tk−1 , tk ) Then, max |uλ − u(t)|x → 0 t∈[0,T ] as λ → 0+ to the mild solution (4.1). That is, S(t)x = lim (I − n→∞ t [t] A) n x n exists for all x ∈ X and {S(t), t ≥ 0} is the C0 semigurop on X and its generator is A. Proof: First, note that |Jλ | ≤ M 1 − λω and for x ∈ dom(A) |Jλ x − x| = |λJλ Ax| ≤ λ |Ax| → 0 1 − λω as λ → 0+ . Since dom(A) is dense in X it follows that |Jλ x − x| → 0 as λ → 0+ for all x ∈ X. 67 Define the linear operators Tλ (t) and Sλ (t) by Sλ (t) = Jλk−1 and Tλ (t) = Jλk−1 + (t − tk )(Jλk − Jλk−1 ), on [tk−1 , tk ). Then, d Tλ (t) = ASλ (t), a.e int ∈ [0, T ]. dt Thus, Z Tλ (t)u0 −Tµ (t)u0 = 0 t d (Tλ (s)Tµ (t−s)u0 ) ds = ds Z t (Sλ (s)Tµ (t−s)−Tλ (s)Sλ (t−s))Au0 ds 0 Since Tλ (s)u − Sλ (s)u = (s − tk−1 )(Jλ − I)Tλ (tk−1 ) on s ∈ [tk−1 , tk ). By the bounded convergence theorem |Tλ (t)u0 − Tµ (t)u0 |X → 0 as λ, µ → 0 for all u0 ∈ dom(A). Thus, the unique limit defines the linear operator S(t) by S(t)u0 = lim Sλ (t)u0 . (4.9) λ→0+ Since |Sλ (t)| ≤ M ≤ M eωt , (1 − λω)[t/n] where [s] is the largest integer less than s ∈ R. and dom(A) is dense, (4.9) holds for all u0 ∈ X. Since S(t + s)u = lim Jλn+m = Jλn Jλm u = S(t)S(s)u λ→0+ and limt→0+ S(t)u = limt→0+ Jt u = u, S(t) is the C0 semigroup on X. Moreover, {S(t), t ≥ 0} is in the class G(M, ω), i.e., |S(t)| ≤ M eωt Note that Z t Tλ (t)u0 − u0 = A Sλ u0 ds 0 Since limλ→0+ Tλ (t)u0 = limλ→0+ Sλ (t)u0 = S(t)u0 and A is closed, we have Z t Z t S(t)u0 − u0 = A S(s)u0 ds, S(s)u0 ds ∈ dom(A). 0 0 If B is a generator of {S(t), t ≥ 0}, then Bx = lim t→0+ S(t)x − x = Ax t if x ∈ dom(A). Conversely, if u0 ∈ dom(B), then u0 ∈ dom(A) since A is closed and t → S(t)u0 is continuous at 0 and thus Z 1 t S(s)u0 ds = u0 as t → 0+ . t 0 68 Hence S(t)u0 − u0 = Bx t That is, A is the generator of {S(t), t ≥ 0}. Similarly, we have Ax = n−1 X Jλn−k f k = k=0 t Z Z Sλ (t − s)f (s) ds → t S(t − s)f (s) ds as λ → 0+ 0 0 by the Lebesque dominated convergence theorem. The following theorem state the basic properties of C0 semigroups: Theorem (Semigroup) (1) There exists M ≥ 1, ω ∈ R such that S ∈ G(M, ω) class, i.e., |S(t)| ≤ M eωt , t ≥ 0. (4.10) (2) If x(t) = S(t)x0 , x0 ∈ X, then x ∈ C(0, T ; X) (3) If x0 ∈ dom (A), then x ∈ C 1 (0, T ; X) ∩ C(0, T ; dom(A)) and d x(t) = Ax(t) = AS(t)x0 . dt (4) The infinitesimal generator A is closed and densely defined. For x ∈ X Z t S(t)x − x = A S(s)x ds. (4.11) 0 (5) λ > ω the resolvent is given by (λ I − A)−1 = Z ∞ e−λs S(s) ds (4.12) M . 
(λ − ω)n (4.13) 0 with estimate |(λ I − A)−n | ≤ Proof: (1) By the uniform boundedness principle there exists M ≥ 1 such that |S(t)| ≤ M on [0, t0 ] For arbitrary t = k t0 +τ , k ∈ N and τ ∈ [0, t0 ) it follows from the semigroup property we get |S(t)| ≤ |S(τ )||S(t0 |k ≤ M ek log |S(t0 )| ≤ M eω t with ω = t10 , log |S(t0 )|. (2) It follows from the semigroup property that for h > 0 x(t + h) − x(t) = (S(h) − I)S(t)x0 and for t − h ≥ 0 x(t − h) − x(t) = S(t − h)(I − S(h))x0 Thus, x ∈ C(0, T ; X) follows from the strong continuity of S(t) at t = 0. (3)–(4) Moreover, x(t + h) − x(t) S(h) − I S(h)x0 − x0 = S(t)x0 = S(t) h h h 69 and thus S(t)x0 ∈ dom(A) and lim h→0+ x(t + h) − x(t) = AS(t)x0 = Ax(t). h Similarly, lim h→0+ x(t − h) − x(t) S(h)φ − φ = lim S(t − h) = S(t)Ax0 . −h h h→0+ Hence, for x0 ∈ dom(A) Z t Z t Z t S(s)x0 ds AS(s)x0 ds = A S(s)Ax0 ds = S(t)x0 − x0 = (4.14) 0 0 0 If xn ∈ don(A) → x and Axn → y in X, we have Z t S(s)y ds S(t)x − x = 0 Since 1 lim t→0+ t Z t S(s)y ds = y 0 x ∈ dom(A) and y = Ax and hence A is closed. Since A is closed it follows from (4.14) that for x ∈ X Z t S(s)x ds ∈ dom(A) 0 and (4.11) holds. For x ∈ X let xh = 1 h Z h S(s)x ds ∈ dom(A) 0 Since xh → x as h → 0+ , dom(A) is dense in X. (5) For λ > ω define Rt ∈ L(X) by Z t Rt = e−λs S(s) ds. 0 Since A − λ I is the infinitesimal generator of the semigroup eλt S(t), from (4.11) (λ I − A)Rt x = x − e−λt S(t)x. Since A is closed and |e−λt S(t)| → 0 as t → ∞, we have R = limt→∞ Rt satisfies (λ I − A)Rφ = φ. Conversely, for φ ∈ dom(A) Z ∞ R(A − λ I)φ = e−λs S(s)(A − λ I)φ = lim e−λt S(tφ − φ = −φ t→∞ 0 Hence Z R= ∞ e−λs S(s) ds = (λ I − A)−1 0 70 Since for φ ∈ X ∞ Z |e |Rx| ≤ −λs ∞ Z e(ω−λ)s |x| ds = S(s)x| ≤ M 0 0 we have |(λ I − A)−1 | ≤ M , λ−ω M |x|, λ−ω λ > ω. Note that −2 (λ I − A) ∞ Z e = −λt ∞Z ∞ e S(s) ds = = e 0 0 0 −λσ ∞Z ∞ Z λs S(t) ds 0 Z ∞ Z ∞ Z S(σ) dσ dt = e−λ(t+s) S(t + s) ds dt 0 σe−λσ S(σ) dσ. 0 t By induction, we obtain −n (λ I − A) 1 = (n − 1)! Thus, −n |(λ I − A) 1 |≤ (n − 1)! ∞ Z Z ∞ tn−1 e−λt S(t) dt. (4.15) 0 tn−1 e−(λ−ω)t dt = 0 M . (λ − ω)n We then we have the necessary and sufficient condition: Hile-Yosida Theorem A closed, densely defined linear operator A on a Banach space X is the infinitesimal generator of a C0 semigroup of class G(M, ω) if and only if M (λ − ω)n |(λ I − A)−n | ≤ for all λ > ω (4.16) Proof: The sufficient part follows from the previous Theorem. But, we describe the Yosida construction. Define the Yosida approximation Aλ ∈ L(X) of A by Aλ = Jλ − I = AJλ . λ (4.17) Define the Yosida approximation: t t Sλ (t) = eAλ t = e− λ eJλ λ . Since |Jλk | ≤ we have t |Sλ (t)| ≤ e− λ ∞ X M k=0 Since M (1 − λω)k ω t |Jλk |( )k ≤ M e 1−λω t . k! λ d Sλ (s)Sλˆ (t − s) = Sλ (s)(Aλ − Aλˆ )Sλˆ (t − s), ds we have Z Sλ (t)x − Sλˆ (t)x = 0 t Sλ (s)Sλˆ (t − s)(Aλ − Aλˆ )x ds 71 Thus, for x ∈ dom(A) |Sλ (t)x − Sλˆ (t)x| ≤ M 2 teωt |(Aλ − Aλˆ )x| → 0 ˆ → 0+ . Since dom(A) is dense in X this implies that as λ, λ S(t)x = lim Sλ (t)x exist for all x ∈ X, λ→0+ defines a C0 semigroup of G(M, ω) class. The necessary part follows from (4.15) Theorem (Mild solution) (1) If for f ∈ L1 (0, T ; X) define t Z S(t − s)f (s) ds, x(t) = x(0) + 0 then x(t) ∈ C(0, T ; X) and it satisfies Z t Z t x(s) ds + f (s) ds. x(t) = A 0 (4.18) 0 (2) If Af ∈ L1 (0, T ; X) then x ∈ C(0, T ; dom(A)) and Z t (Ax(s) + f (s)) ds. x(t) = x(0) + 0 Rt d (3) If f ∈ W 1,1 (0, T ; X), i.e. 
f (t) = f (0) + 0 f 0 (s) ds, dt f = f 0 ∈ L1 (0, T ; X), then Ax ∈ C(0, T ; X) and Z t Z t A S(t − s)f (s) ds = S(t)f (0) − f (t) + S(t − s)f 0 (s) ds. (4.19) 0 0 Proof: Since Z tZ τ t Z S(t − s)f (s) dsdτ = 0 0 Z t S(τ − s)d τ )f (s) ds, ( 0 and Z s t S(s) ds = S(t) − I A 0 we have x(t) ∈ dom(A) and Z t Z t Z t A x(s) ds = S(t)x − x + S(t − s)f (s) ds − f (s) ds. 0 0 0 and we have (4.18). (2) Since for h > 0 x(t + h) − x(t) = h Z 0 t S(h) − I 1 S(t − s) f (s) ds + h h Z t+h S(t + h − s)f (s) ds t if Af ∈ L1 (0, T ; X) x(t + h) − x(t) lim = + h h→0 Z t S(t − s)Af (s) ds + f (t) 0 72 a.e. t ∈ (0, T ). Similarly, Z Z t−h x(t − h) − x(t) S(h) − I 1 t S(t − s)f (s) ds = S(t − h − s) f (s) ds + −h h h t−h 0 Z t S(t − s)Af (s) ds + f (t) → 0 a.e. t ∈ (0, T ). (3) Since Z t+h Z h S(h) − I 1 S(t + f − s)f (s) ds S(th − s)f (s) ds − x(t) = ( h h 0 t Z t S(t − s) + 0 f (s + h) − f (s) ds, h letting h → 0+ , we obtain(4.19). It follows from Theorems the mild solution Z t x(t) = S(t)x(0) + S(t − s)f (s) ds 0 satisfies t Z x(t) = x(0) + A Z t x(s) + 0 f (s) ds. 0 Note that the mild solution x ∈ C(0, T ; X) depends continuously on x(0) ∈ X and f ∈ L1 (0, T ; X) with estimate Z t ωt |x(t)| ≤ M (e |x(0)| + eω(t−s) |f (s)| ds). 0 Thus, the mild solution is the limit of a sequence {xn } of strong solutions with xn (0) ∈ dom(A) and fn ∈ W 1,1 (0, T ; X), i.e., since dom(A) is dense in X and W 1,1 (0, T ; X) is dense in L1 (0, T ; X), |xn (t) − x(t)|X → 0 uniformly on [0, T ] for |xn (0) − x(0)|x → 0 and |fn − f |L1 (0,T ;X) → 0 as n → ∞. Moreover, the mild solution x ∈ C(0, T : X) is a weak solution to the Cauchy problem d x(t) = Ax(t) + f (t) dt (4.20) in the sense of (4.3), i.e., for all ψ ∈ dom(A∗ ) hx(t), ψiX×X ∗ is absolutely continues and d hx(t), ψi = hx(t), ψi + hf (t), ψi a.e. in (0, T ). dt If x(0) ∈ dom(A) and Af ∈ L1 (0, T ; X), then Ax ∈ C(0, T ; X), x ∈ W 1,1 (0, T ; X) and d x(t) = Ax(t) + f (t), a.e. in (0, T ) dt 73 If x(0) ∈ dom(A) and f ∈ W 1,1 (0, T ; X), then x ∈ C(0, T ; dom(A)) ∩ C 1 (0, T ; X) and d x(t) = Ax(t) + f (t), everywhere in [0, T ]. dt The condition (4.16) is very difficult to check for a given A in general. For the case M = 1 we have a very complete characterization. Lumer-Phillips Theorem The followings are equivalent: (a) A is the infinitesimal generator of a C0 semigroup of G(1, ω) class. (b) A − ω I is a densely defined linear m-dissipative operator,i.e. |(λ I − A)x| ≥ (λ − ω)|x| for all x ∈ don(A), λ > ω (4.21) and for some λ0 > ω R(λ0 I − A) = X. (4.22) Proof: It follows from the m-dissipativity |(λ0 I − A)−1 | ≤ 1 λ0 − ω Suppose xn ∈ dom(A) → x and Axn → y in X, the x = lim xn = (λ0 I − A)−1 lim (λ0 xn − Axn ) = (λ0 I − A)−1 (λ0 x − y). n→∞ n→∞ Thus, x ∈ dom(A) and y = Ay and hence A is closed. Since for λ > ω λ I − A = (I + (λ − λ0 )(λ0 I − A)−1 )(λ0 I − A), 0| −1 ∈ L(X). Thus by the continuation method we have if |λ−λ λ0 −ω < 1, then (λ I − A) −1 (λ I − A) exists and 1 |(λ I − A)−1 | ≤ , λ > ω. λ−ω It follows from the Hile-Yosida theorem that (b) → (a). (b) → (a) Since for x∗ ∈ F (x), the dual element of x, i.e. x∗ ∈ X ∗ satisfying hx, x∗ iX×X ∗ = |x|2 and |x∗ | = |x| he−ωt S(t)x, x∗ i ≤ |x||x∗ | = hx, x∗ i we have for all x ∈ dom(A) 0 ≥ lim h t→0+ e−ωt S(t)x − x ∗ , x i = h(A − ω I)x, x∗ i for all x∗ ∈ F (x). t which implies A − ω I is dissipative. Theorem (Dissipative I) (1) A is a ω-dissipative |λ x − Ax| ≥ (λ − ω)|x| for all x ∈ dom (A). if and only if (2) for all x ∈ dom(A) there exists x∗ ∈ F (x) such that hAx, x∗ i ≤ ω |x|2 . 74 (4.23) (2) → (1). 
Let x ∈ dom(A) and choose x∗ ∈ F (0) such that hA, x∗ i ≤ 0. Then, for any λ > 0, |x|2 = hx, x∗ i = hλ x − Ax + Ax, x∗ i ≤ hλ x − Ax, x∗ i − ω |x|2 ≤ |λ x − Ax||x| − ω |x|2 , which implies (2). (1) → (2). From (1) we obtain the estimate 1 − (|x − λ Ax| − |x|) ≤ 0 λ and hAx, xi− = lim = λ→0+ 1 (|x| − |x − λ Ax|) ≤ 0 λ which implies there exists x∗ ∈ F (x) such that (4.23) holds since hAx, xi− = hAX, x∗ i for some x∗ ∈ F (x). Thus, Lumer-Phillips theorem says that if m-diisipative, then (4.23) hold for all x∗ ∈ F (x). Theorem (Dissipative II) Let A be a closed densely defined operator on X. If A and A∗ are dissipative, then A is m-dissipative and thus the infinitesimal generator of a C − 0-semigroup of contractions. Proof: Let y ∈ R(I − A) be given. Then there exists a sequence xn ∈ dom(A) such that y= xn − Axn → y as n → ∞. By the dissipativity of A we obtain |xn − xm | ≤ |xn − xm − A(xn − xm )| ≤ |y− ym | Hence xn is a Cauchy sequence in X. We set x = limn→∞ xn . Since A is closed, we see that x ∈ dom (A) and x − Ax = y, i.e., y ∈ R(I − A). Thus R(I − A) is closed. Assume that R(I − A) 6= X. Then there exists an x∗ ∈ X ∗ such that h(I − A)x, x∗ ) = 0 for all x ∈ dom (A). By definition of the dual operator this implies x∗ ∈ dom (A∗ ) and (IA)∗ x∗ = 0. Dissipativity of A* implies |x∗ | < |x∗ − A∗ x∗ | = 0, which is a contradiction. Example (revisited example 1) Aφ = −φ0 in X = L2 (0, 1) and for φ ∈ H 1 (0, 1) Z (Aφ, φ)X = − 0 1 1 φ0 (x)φ dx == (|φ(0)|2 − |φ(1)|2 ) 2 Thus, A is dissipative if and only if φ(0) = 0, the in flow condition. Define the domain of A by dom(A) = {φ ∈ H 1 (0, 1) : φ(0) = 0} The resolvent equation is equivalent to λu + d u=f dx 75 and Z u(x) = x e−λ (x−s) f (s) ds, 0 and R(λ I − A) = X. By the Lumer-Philips theorem A generates the C0 semigroup on X = L2 (0, 1). Example (Conduction equation) Consider the heat conduction equation: X d ∂2u u = Au = aij (x) + b(x) · ∇u + c(x)u, dt ∂xi ∂xj in Ω. i,j Let X = C(Ω) and dom(A) ⊂ C 2 (Ω). Assume that a ∈ Rn×n ∈ C(Omega) b ∈ Rn,1 ¯ and a is symmetric and and c ∈ R are continuous on Ω m I ≤ a(x) ≤ M I for 0 < m ≤ M < ∞. Then, if x0 is an interior point of Ω at which the maximum of φ ∈ C 2 (Ω) is attained. Then, X ∂2u (x0 ) ≤ 0. ∇φ(x0 ) = 0, aij (x0 ) ∂xi ∂xj ij and thus (λ φ − Aφ)(x0 ) ≤ ω φ(x0 ) where ω ≤ max c(x). x∈Ω Similarly, if x0 is an interior point of Ω at which the minimum of φ ∈ C 2 (Ω) is attained, then (λ φ − Aφ)(x0 ) ≥ 0 If x0 ∈ ∂Ω attains the maximum, then ∂ φ(x0 ) ≤ 0. ∂ν Consider the domain with the Robin boundary condition: dom(A) = {u ∈ α(x) u(x) + β(x) ∂ u = 0 at ∂Ω} ∂ν with α, β ≥ 0 and inf x∈∂Ω (α(x) + β(x)) > 0. Then, |λφ − Aφ|X ≥ (λ − ω)|φ|X . for all φ ∈ dom(A), which shows A is dissipative. The range condition follows from the Lax Milgram theory. Example (Advection equation) Consider the advection equation ut + (~b(x)u) = ν ∆u. Let X = L1 (Ω). Assume ~b ∈ L∞ (Ω) 76 Let ρ ∈ C 1 (R) be a monotonically increasing function satisfying ρ(0) = 0 and ρ(x) = sign(x), |x| ≥ 1 and ρ (s) = ρ( s ) for > 0. For u ∈ C 1 (Ω) Z ∂ 1 u − n · ~b u, ρ (u)) ds + (~b u − ν ux , ρ0 (u) ux ) + (c u, ρ (u)). (Au, u) = (ν ∂n Γ where 1 1 (~b u, ρ0 (u) ux ) ≤ ν (ux , ρ0 (u) ux ) + meas({|u| ≤ }) 4ν Assume the inflow condition u on {s ∈ ∂Ω : n · b < 0}. Note that for u ∈ L1 (Rd ) (u, ρ (u)) → |u|1 and (ψ, ρ (u)) → (ψ, sign0 (u)) for ψ ∈ L1 (Ω) as → 0. It thus follows (λ − ω) |u| ≤ |λ u − λ Au|. (4.24) Since C 2 (Ω) is dense in W 2,1 (Rd ) (4.24) holds for u ∈ W 2,1 (Ω). 
Example (X = Lp (Ω)( Let Au = ν ∆u + b · ∇u with homogeneous boundary condition u = 0 on X = Lp (Ω). Since Z Z ∗ p−2 h∆u, u i = ∆, |u| u) = −(p − 1) (∇u, |u|p−2 ∇u) Ω Ω and (b · ∇u, |u|p−2 u)L2 ≤ (p − 1)ν |b|2∞ |(∇u, |u|p−2 ∇u)L2 + (|u|p , 1)L2 2 2ν(p − 1) we have hAu, u∗ iω|u|2 for some ω > 0. Example (Second order equation) Let V ⊂ H = H ∗ ⊂ V ∗ be the Gelfand triple. Let ρ be a bounded bilinear form on H × H, µ and σ be bounded bilinear forms on V × V . Assume ρ and σ are symmetric and coercive and µ(φ, φ) ≥ 0 for all φ ∈ V . Consider the second order equation ρ(utt , φ) + µ(ut , φ) + σ(u, φ) = hf (t), φi for all φ ∈ V. Define linear operators M (mass), D (dampping), K and (stiffness) by (M φ, ψ)H = ρ(φ, ψ), φ, ψ ∈ HhDφ, ψi = µ(φ, ψ) φ, ψ ∈ V hKφ, ψiV ∗ ×V , Let v = ut and define A on X = V × H by A(u, v) = (v, −M −1 (Ku + Dv)) with domain dom (A) = {(u, v) ∈ X : v ∈ V and Ku + Dv ∈ H} The state space X is a Hilbert space with inner product ((u1 , v1 ), (u, v2 )) = σ(u1 , u2 ) + ρ(v1 , v2 ) 77 φ, ψ ∈ V and E(t) = |(u(t), v(t))|2X = σ(u(t), u(t)) + ρ(v(t), v(t)) defines the energy of the state x(t) = (u(t), v(t)). First, we show that A is dissipative: (A(u, v), (u, v))X = σ(u, v)+ρ(−M −1 (Ku+Dv), v) = σ(u, v)−σ(u, v)−µ(v, v) = −µ(v, v) ≤ 0 Next, we show that R(λ I − A) = X. That is, for (f, g) ∈ X the exists a solution (u, v) ∈ dom (A) satisfying λ u − v = f, λM v + Dv + Ku = M g, or equivalently v = λ u − f and λ2 M u + λDu + Ku = M g + λ M f + Df (4.25) Define the bilinear form a on V × V a(φ, ψ) = λ2 ρ(φ, ψ) + λ µ(φ, ψ) + σ(φ, ψ) Then, a is bounded and coercive and if we let F (φ) = (M (g + λ f )φ)H + µ(f, φ) then F ∈ V ∗ . It thus follows from the Lax-Milgram theory there exists a unique solution u ∈ V to (4.25) and Dv + Ku ∈ H. For example, consider the wave equation 1 utt c2 (x) = ∆u, ∂u + κ(x)ut = 0 and ∂Ω. ∂n In this example we let V = H 1 (Ω)/R and H = L2 (Ω) and define Z σ(φ, ψ) = (∇φ, ∇ψ) dx Ω Z µ(φ, ψ) = κ(x)φ, ψ ds ∂Ω Z ρ(φ, ψ) = Ω 1 φψ dx. c2 (x) Example (Maxwell equation) Et = ∇ × H µ Ht = −∇ × E with boundary condition E×n=0 The dissipativity follows from E · (∇ × H) + H · (∇ × E) = div(E 78 Theorem (Dual semigroup) Let X be a reflexive Banach space. The adjoint S ∗ (t) of the C0 semigroup S(t) on X forms the C0 semigroup and the infinitesimal generator of S ∗ (t) is A∗ . Let X be a Hilbert space and dom(A∗ ) be the Hilbert space with graph norm and X−1 be the strong dual space of dom(A∗ ), then the extension S(t) to X−1 defines the C0 semigroup on X−1 . Proof: (1) Since for t, s ≥ 0 S ∗ (t + s) = (S(s)S(t))∗ = S ∗ (t)S ∗ (s) and hx, S ∗ (t)y − yiX×X ∗ = hS(t)x − x, yiX×X ∗ → 0. for x ∈ X and y ∈ X ∗ Thus, S ∗ (t) is weakly star continuous at t = 0 and let B is the generator of S ∗ (t) as S ∗ (t)x − x Bx = w∗ − lim . t Since S ∗ (t)y − y S(t)x − x , y) = (x, ), ( t t for all x ∈ dom(A) and y ∈ dom(B) we have hAx, yiX×X ∗ = hx, ByiX×X ∗ and thus B = A∗ . Thus, A∗ is the generator of a w∗ − continuous semigroup on X ∗ . (2) Since Z t S ∗ (t)y − y = A∗ S ∗ (s)y ds 0 dom (A∗ ). S ∗ (t) for all y ∈ Y = Thus, is strongly continuous at t = 0 on Y . ∗ ∗ (3) If X is reflexive, dom (A ) = X . If not, there exists a nonzero y0 ∈ X such that hy0 , x∗ iX×X ∗ = 0 for all x∗ ∈ dom(A∗ ). Thus, for x0 = (λ I −A)−1 y0 hλ x0 −Ax0 , x∗ i = hx0 , λ x∗ − A∗ x∗ i = 0. Letting x∗ = (λ I − A∗ )−1 x∗0 for x∗0 ∈ F (x0 ), we have x0 = 0 and thus y0 = 0, which yields a contradiction. (4) X1 = dom (A∗ ) is a closed subspace of X ∗ and is a invariant set of S ∗ (t). 
Since A∗ is closed, S ∗ (t) is the C0 semigroup on X1 equipped with its graph norm. Thus, (S ∗ (t))∗ is the C0 semigroup on X−1 = X1∗ and defines the extension of S(t) to X−1 . Since for x ∈ X ⊂ X−1 and x∗ ∈ X ∗ hS(t)x, x∗ i = hx, S ∗ (t)x∗ i, S(t) is the restriction of (S ∗ (t))∗ onto X. 4.1 Sectorial operator and Analytic semigroup In this section we have the representation of the semigroup S(t) in terms of the inverse Laplace transform. Taking the Laplace transform of d x(t) = Ax(t) + f (t) dt 79 we have x ˆ = (λ I − A)−1 (x(0) + fˆ) where for λ > ω Z ∞ e−λt x(t) dt x ˆ= 0 is the Laplace transform of x(t). We have the following the representation theory (inverse formula). Theorem (Resolvent Calculus) For x ∈ dom(A2 ) and γ > ω 1 S(t)x = 2πi γ+i∞ Z eλt (λ I − A)−1 x dλ. (4.26) γ−i∞ Proof: Let Aµ be the Yosida approximation of A. Since Re σ(Aµ ) ≤ have 1 uµ (t) = Sµ (t)x = 2πi γ+i∞ Z ω0 < γ, we 1 − µω0 eλt (λ I − Aµ )−1 x dλ. γ−i∞ Note that λ(λ I − A)−1 = I + (λ I − A)−1 A. Since 1 2πi and Z Z γ+i∞ γ+i∞ γ−i∞ (4.27) eλt dλ = 1 λ |(|λ − ω|−2 | dλ < ∞, γ−i∞ we have |Sµ (t)x| ≤ M |A2 x|, uniformly in µ > 0. Since (λ I − Aµ )−1 x − (λ I − A)−1 x = µ (ν I − A)−1 (λ I − A)−1 A2 x, 1 + λµ λ , {uµ (t)} is Cauchy in C(0, T ; X) if x ∈ dom(A2 ). Letting µ → 0+ , 1 + λµ we obtain (4.26). where ν = Next we consider the sectorial operator. For δ > 0 let Σδω = {λ ∈ C : arg(λ − ω) < π + δ} 2 be the sector in the complex plane C. A closed, densely defined, linear operator A on a Banach space X is a sectorial operator if |(λ I − A)−1 | ≤ M for all λ ∈ Σδω . |λ − ω| 80 For 0 < θ < δ let Γ = Γω,θ be the integration path defined by Γ± = {z ∈ C : |z| ≥ δ, arg(z − ω) = ±( π2 + θ)}, Γ0 = {z ∈ C : |z| = δ, |arg(z − ω)| ≤ π 2 + θ}. For 0 < θ < δ define a family {S(t), t ≥ 0} of bounded linear operators on X by Z 1 eλt (λ I − A)−1 x dλ. (4.28) S(t)x = 2πi Γ Theorem (Analytic semigroup) If A is a sectorial operator on a Banach space X, then A generates an analytic (C0 ) semigroup on X, i.e., for x ∈ X t → S(t)x is an analytic function on (0, ∞). We have the representation (4.28) for x ∈ X and |AS(t)x|X ≤ Mθ |x|X t Proof: The elliptic operator A defined by the Lax-Milgram theorem defines a sectorial operator on Hilbert space. Theorem (Sectorial operator) Let V, H are Hilbert spaces and assume H ⊂ V ∗ . Let ρ(u, v) is bounded bilinear form on H × H and ρ(u, u) ≥ |u|2H for all u ∈ H Let a(u, v) to be a bounded bilinear form on V × V with σ(u, u) ≥ δ |u|2V for all u ∈ V Define the linear operator A by ρ(Au, φ) = a(u, φ) for all φ ∈ V. Then, for Re λ > 0we have |(λ I − A)−1 |L(V,V ∗ ) ≤ |(λ I − A)−1 |L(H) ≤ 1 δ M |λ| |(λ I − A)−1 |L(V ∗ ,H) ≤ √M |λ| |(λ I − A)−1 |L(H,V ) ≤ √M |λ| Proof: Let a(u, v) to be a bounded bilinear form on V × V . For f ∈ V ∗ and <λ > 0, Define M H, H by (M u, v) = ρ(u, v) for all v ∈ H and A0 ∈ L(V, V ∗ ) by hA0 u, vi = σ(u, v) for v ∈ V 81 Then, (λ I − A)u = M −1 f is equivalent to λρ(u, φ) + a(u, φ) = hf, φi, for all φ ∈ V. (4.29) It follows from the Lax-Milgram theorem that (4.29) has a unique solution, given f ∈ V ∗ and Re λρ(u, u) + a(u, u) ≤ |f |V ∗ |u|V . Thus, 1 |(λ I − A)−1 |L(V ∗ ,V ) ≤ . δ Also, |λ| |u|2H ≤ |f |V ∗ |u|V + M |u|2V = M1 |f |2V ∗ for M1 = 1 + M δ2 and thus √ |(λ I − A)−1 |L(V ∗ ,H) ≤ M1 . |λ|1/2 For f ∈ H ⊂ V ∗ δ |u|2V ≤ Re λ ρ(u, u) + a(u, u) ≤ |f |H |u|H , (4.30) and |λ|ρ(u, u) ≤ |f |H |u|H + M |u|2V ≤ M1 |f |H |u|H Thus, |(λ I − A)−1 |L(H) ≤ M1 . |λ| Also, from (4.30) δ |u|2V ≤ |f |H |u|H ≤ M1 |f |2 . which implies |(λ I − A)−1 |L(H,V ) ≤ 4.2 M2 . 
|λ|1/2 Approximation Theory In this section we discuss the approximation theory for the C0 -semigroup. Theorem (Trotter-Kato theorem) Let X and Xn be Banach spaces and A and An be the infinitesimal generator of C0 semigroups S(t) on X and Sn (t) on Xn of G(M, ω) class. Assume a family of uniformly bounded linear operators Pn ∈ L(X, Xn ) and En ∈ L(Xn , X) satisfy Pn En = I |En Pn x − x|X → 0 for all x ∈ X Then, the followings are equivalent. (1) there exist a λ0 > ω such that for all x ∈ X |En (λ0 I − An )−1 Pn x − (λ0 I − A)−1 x|X → 0 as n → ∞, (2) For every x ∈ X and T ≥ 0 |En Sn (t)Pn x − S(t)x|X → 82 as n → ∞. Proof: Since for λ > ω En (λ I − A) −1 Pn x − (λ I − A) −1 Z ∞ En Sn (t)Pn x − S(t)x dt x= 0 (1) follows from (2). Conversely, from the representation theory En Sn (t)Pn x − S(t)x = 1 2πi Z γ+i∞ (En (λ I − A)−1 Pn x − (λ I − A)−1 x) dλ γ−i∞ where (λ I − A)−1 − (λ0 I − A)−1 = (λ − λ0 )(λ I − A)−1 (λ0 I − A)−1 . Thus, from the proof of Theorem (Resolvent Calculus) (1) holds for x ∈ dom(A2 ). But since dom(A2 ) is dense in X, (2) implies (1). Example (Trotter-Kato theoarem) Consider the heat equation on Ω = (0, 1) × (0, 1): d u(t) = ∆u, dt u(0, x) = u0 (x) with boundary condition u = 0 at the boundary ∂Ω. We use the central difference approximation on uniform grid points: (i h, j h) ∈ Ω with mesh size h = n1 : ui,j − ui−1,j ui,j − ui,j−1 d 1 ui+1,j − ui,j 1 ui,j+1 − ui,j ui,j (t) = ∆h u = ( − )+ ( − ) dt h h h h h h for 1 ≤ i, j ≤ n1 , where ui,0 = ui,n = u1, j = un,j = 0 at the boundary node. First, 2 let X = C(Ω) and Xn = R(n−1) with sup norm. Let En ui,j = the piecewise linear interpolation and (Pn u)i,j = u(i h, j h) is the pointwise evaluation. We will prove that ∆h is dissipative on Xn . Suppose uij = |un |∞ . Then, since λ ui,j − (∆h u)i,j = fij and −(∆h u)i,j = 1 (4ui,j − ui+1,j − ui,j+1 − ui−1,j − ui,j−1 ) ≥ 0 h2 we have 0 ≤ ui,j ≤ fi,j . λ Thus, ∆h is dissipative on Xn with sup norm. Next X = L2 (Ω) and Xn with `2 norm. Then, (−∆h un , un ) = X ui,j − ui−1,j ui,j − ui,j−1 2 | |2 + | | ≥0 h h i,j and thus ∆h is dissipative on Xn with `2 norm. Example 83 5 Monotone Operator Theory and Nonlinear semigroup In this section we discuss the nonlinear operator theory that extends the Lax-Milgram theory for nonlinear monotone operators and mappings from Banach space from X to X ∗ . Also, we extend the C0 semigroup theory to the nonlinear case. Let us denote by F : X → X ∗ , the duality mapping of X, i.e., F (x) = {x∗ ∈ X ∗ : hx, x∗ i = |x|2 = |x∗ |2 }. By Hahn-Banach theorem, F (x) is non-empty. In general F is multi-valued. Therefore, when X is a Hilbert space, h·, ·i coincides with its inner product if X ∗ is identified with X and F (x) = x. Let H be a Hilbert space with scalar product (φ, ψ) and X be a real, reflexive Banach space and X ⊂ H with continuous dense injection. Let X ∗ denote the strong dual space of X. H is identified with its dual so that X ⊂ H = H ∗ ⊂ X ∗ . The dual product hφ, ψi on X × X ∗ is the continuous extension of the scalar product of H restricted to X × H. The following proposition contains some further important properties of the duality mapping F . Theorem (Duality Mapping) (a) F (x) is a closed convex subset. (b) If X ∗ is strictly convex (i.e., balls in X ∗ are strictly convex), then for any x ∈ X, F (x) is single-valued. Moreover, the mapping x → F (x) is demicontinuous, i.e., if xn → x in X, then F (xn ) converges weakly star to F (x) in X ∗ . 
(c) Assume X be uniformly convex (i.e., for each 0 < < 2 there exists δ = δ() > 0 such that if |x| = |y| = 1and |x − y| > , then |x + y| ≤ 2(1 − δ)). If xn converges weakly to x and lim supn→∞ |xn | ≤ |x|, then xn converges strongly to x in X. (d) If X ∗ is uniformly convex, then the mapping x → F (x) is uniformly continuous on bounded subsets of X. Proof: (a) Closeness of F (x) is an easy consequence of the follows from the continuity of the duality product. Choose x∗1 , x∗2 ∈ F (x) and α ∈ (0, 1). For arbitrary z ∈ X we have (using |x∗1 | = |x∗2 | = |x|) hz, αx∗1 + (1 − α)x∗2 i ≤ α|z| |x∗1 | + (1 − α)|z| |x∗2 | = |z| |x|, which shows |αx∗1 + (1 − α)x∗2 | ≤ |x|. Using hx, x∗ i = hx, x∗1 i = |x|2 we get hx, αx∗1 + (1 − α)x∗2 i = αhx, x∗1 i + (1 − α)hx, x∗2 i = |x|2 , so that |αx∗1 + (1 − α)x∗2 | = |x|. This proves αx∗1 + (1 − α)x∗2 ∈ F (x). (b) Choose x∗1 , x∗2 ∈ F (x), α ∈ (0, 1) and assume that |αx∗1 +(1−α)x∗2 | = |x|. Since X ∗ is strictly convex, this implies x∗1 = x∗2 . Let {xn } be a sequence such that xn → x ∈ X. From |F (xn )| = |xn | and the fact that closed balls in X ∗ are weakly star compact we see that there exists a weakly star accumulation point x∗ of {F (xn )}. Since the closed ball in X ∗ is weakly star closed, thus hx, x∗ i = |x|2 ≥ |x∗ |2 . Hence hx, x∗ i = |x|2 = |x∗ |2 and thus x∗ = F (x). Since F (x) is single-valued, this implies F (xn ) converges weakly to F (x). (c) Since lim inf |xn | ≤ |x|, thus limn→∞ |xn | = |x|. We set yn = xn /|xn | and y = x/|x|. Then yn converges weakly to y in X. Suppose yn does not converge strongly to y in 84 X. Then there exists an > 0 such that for a subsequence yn˜ |yn˜ − y| > . Since X ∗ is uniformly convex there exists a δ > 0 such that |yn˜ + y| ≤ 2(1 − δ). Since the norm is weakly lower semicontinuos, letting n ˜ → ∞ we obtain |y| ≤ 1 − δ, which is a contradiction. (d) Assume F is not uniformly continuous on bounded subsets of X. Then there exist constants M > 0, > 0 and sequences {un }, {vn } in X satisfying |un |, |vn | ≤ M, |un − vn | → 0, and |F (un ) − F (vn )| ≥ . Without loss of the generality we can assume that, for a constant β > 0, we have in addition |un | ≥ β, |vn | ≥ β. We set xn = un /|un | and yn = vn /|vn |. Then we have |xn − yn | = ≤ 1 ||vn |un − |un |vn | |un | |vn | 2M 1 (|vn ||un − vn | + ||vn | − |un || |vn |) ≤ 2 |un − vn | → 0 β2 β as n → ∞. Obviously we have 2 ≥ |F (xn ) + F (yn )| ≥ hxn , F (xn ) + F (yn )i and this together with hxn , F (xn ) + F (yn )i = |xn |2 + |yn |2 + hxn − yn , F (yn )i = 2 + hxn − yn , F (yn )i ≥ 2 − |xn − yn | implies lim |F (xn ) + F (yn )| = 2. n→∞ Suppose there exists an 0 > 0 and a subsequence {nk } such that |F (xnk )−F (ynk )| ≥ 0 . Observing |F (xnk )| = |F (ynk )| = 1 and using uniform convexity of X ∗ we conclude that there exists a δ0 > 0 such that |F (xnk ) + F (ynk )| ≤ 2(1 − δ0 ), which is a contradiction to the above. Therefore we have limn→∞ |F (xn ) − F (yn )| = 0. Thus |F (un ) − F (vn )| ≤ |un | |F (xn ) − F (yn )| + ||un | − |vn || |F (yn )| which implies F (un ) converges strongly to F (vn ). This contradiction proves the result. Definition (Monotone Mapping) (a) A mapping A ⊂ X × X ∗ be given. is called monotone if hx1 − x2 , y1 − y2 i ≥ 0 for all [x1 , y1 ], [x2 , y2 ] ∈ A. (b) A monotone mapping A is called maximal monotone if any monotone extension of A coincides with A, i.e., if for [x, y] ∈ X × X ∗ , hx − u, y − vi ≥ 0 for all [u, v] ∈ A then [x, y] ∈ A. 
(c) The operator A is called coercive if for all sequences [xn , yn ] ∈ A with limn→∞ |xn | = ∞ we have hxn , yn i lim = ∞. n→∞ |xn | 85 (d) Assume that A is single-valued with dom(A) = X. The operator A is called hemicontinuouson X if for all x1 , x2 , x ∈ X, the function defined by t ∈ R → hx, A(x1 + tx2 )i is continuous on R. For example, let F be the duality mapping of X. Then F is monotone, coercive and hemicontinuous. Indeed, for [x1 , y1 ], [x2 , y2 ] ∈ F we have hx1 − x2 , y1 − y2 i = |x1 |2 − hx1 , y2 i − hx2 , y1 i + |x2 |2 ≥ (|x1 | − |x2 |)2 ≥ 0, (5.1) which shows monotonicity of F . Coercivity is obvious and hemicontinuity follows from the continuity of the duality product. Lemma 1 Let X be a finite dimensional Banach space and A be a hemicontinuous monotone operator from X to X ∗ . Then A is continuous. Proof: We first show that A is bounded on bounded subsets. In fact, otherwise there exists a sequence {xn } in X such that |Axn | → ∞ and xn → x0 as n → ∞. By monotonicity we have hxn − x, Ax Axn − i≥0 |Axn | |Axn | Without loss of generality we can assume that hx0 − x, y0 i ≥ 0 for all x ∈ X. Axn |Axn | → y0 in X ∗ as n → ∞. Thus for all x ∈ X and therefore y0 = 0. This is a contradiction and thus A is bounded. Now, assume {xn } converges to x0 and let y0 be a cluster point of {Axn }. Again by monotonicity of A hx0 − x, y0 − Axi ≥ 0 for all x ∈ X. Setting x = x0 + t (u − x0 ), t > 0 for arbitrary u ∈ X, we have hx0 − u, y0 − A(x0 + t (u − x0 )) ≥ 0i for all u ∈ X. Then, letting limit t → 0+ , by hemicontinuity of A we have hx0 − u, y0 − Ax0 i ≥ 0 for all u ∈ X, which implies y0 = Ax0 . Lemma 2 Let X be a reflexive Banach space and A : X → X ∗ be a hemicontinuous monotone operator. Then A is maximumal monotone. Proof: For [x0 , y0 ] ∈ X × X ∗ hx0 − u, y0 − Aui ≥ 0 for all u ∈ X. Setting u = x0 + t (x − x0 ), t > 0 and letting t → 0+ , by hemicontinuity of A we have hx0 − x, y0 − Ax0 i ≥ 0 86 for all x ∈ X. Hence y0 = Ax0 and thus A is maximum monotone. The next theorem characterizes maximal monotone operators by a range condition. Minty–Browder Theorem Assume that X, X ∗ are reflexive and strictly convex. Let F denote the duality mapping of X and assume that A ⊂ X × X ∗ is monotone. Then A is maximal monotone if and only if Range (λF + A) = X ∗ for all λ > 0 or, equivalently, for some λ > 0. Proof: Assume that the range condition is satisfied for some λ > 0 and let [x0 , y0 ] ∈ X × X ∗ be such that hx0 − u, y0 − vi ≥ 0 for all [u, v] ∈ A. Then there exists an element [x1 , y1 ] ∈ A with λF x1 + y1 = λF x0 + y0 . (5.2) From these we obtain, setting [u, v] = [x1 , y1 ], hx1 − x0 , F x1 − F x0 i ≤ 0. By monotonicity of F we also have the converse inequality, so that hx1 − x0 , F x1 − F x0 i = 0. From (5.10) this implies that |x1 | = |x0 | and hx1 , F x0 i = |x1 |2 , hx0 , F x1 i = |x0 |2 . Hence F x0 = F x1 and hx1 , F x0 i = hx0 , F x0 i = |x0 |2 = |F x0 |2 . If we denote by F ∗ the duality mapping of X ∗ (which is also single-valued), then the last equation implies x1 = x0 = F ∗ (F x0 ). This and (5.11) imply that [x0 , y0 ] = [x1 , y1 ] ∈ A, which proves that A is maximal monotone. In stead of the detailed proof of ”only if’ part of Theorem, we state the following results. Corollary Let X be reflexive and A be a monotone, everywhere defined, hemicontinous operator. If A is coercive, then R(A) = X ∗ . Proof: Suppose A is coercive. Let y0 ∈ X ∗ be arbitrary. By the Appland’s renorming theorem, we may assume that X and X ∗ are strictly convex Banach spaces. 
It then follows from Theorem that every λ > 0, equation λ F xλ + Axλ = y0 has a solution xλ ∈ X. Multiplying this by xλ , λ |xλ |2 + hxλ , Axλ i = hy0 , xλ i. 87 and thus hxλ , Axλ i ≤ |y0 |X ∗ |xλ |X Since A is coercive, this implies that {xλ } is bounded in X as λ → 0+ . Thus, we may assume that xλ converges weakly to x0 in X and Axλ converges strongly to y0 in X ∗ as λ → 0+ . Since A is monotone hxλ − x, y0 − λ F xλ − Axi ≥ 0, and letting λ → 0+ , we have hx0 − x, y0 − Axi ≥ 0, for all x ∈ X. Since A is maximal monotone, this implies y0 = Ax0 . Hence, we conclude R(A) = X ∗ . Theorem (Galerkin Approximation) Assume X is a reflexive, separable Banach space and A is a bounded, hemicontinuous, coercive monotone operator from X into X ∗ . Let Xn = span{φ}ni=1 satisfies the density condition: for each ψ ∈ X and any > 0 there exists a sequence ψn ∈ Xn such that |ψ − ψn | → 0 as n → ∞. The xn be the solution to hψ, Axn i = hψ, f i for all ψ ∈ Xn , (5.3) then there exists a subsequence of {xn } that converges weakly to a solution to Ax = f . Proof: Since hx, Axi/|x|X → ∞ as |x|X → ∞ there exists a solution xn to (5.14) and |xn |X is bounded. Since A is bounded, thus Axn bounded. Thus there exists a subsequence of {n} (denoted by the same) such that xn converges weakly to x in X and Axn converges weakly in X ∗ . Since lim hψ, Axn i = lim (hψn , f i + hψ − ψn , Axn i) = hψ, f i n→∞ n→∞ Axn converges weakly to f . Since A is monotone hxn − u, Axn − Aui ≥ 0 for all u ∈ X Note that lim hxn , Axn i = lim hxn , f i = hx, f i. n→∞ n→∞ Thus taking limit n → ∞, we obtain hx − u, f − Aui ≥ 0 for all u ∈ X. Since A is maximum monotone this implies Ax = f . The main theorem for monotone operators applies directly to the model problem involving the p-Laplace operator −div(|∇u|p−2 ∇u) = f on Ω (with appropriate boundary conditions) and −∆u + c u = f, − ∂ u ∈ β(u). at ∂Ω ∂n 88 with β maximal monotone on R. Also, nonlinear problems of non-variational form are applicable, e.g., Lu + F (u, ∇u) = f on Ω where L(u) = −div(σ(∇u) − ~b u) and we are looking for a solution u ∈ W01,p (Ω), 1 < p < ∞. We assume the following conditions: (i) Monotonicity for the principle part L(u): (σ(ξ) − σ(η), ξ − η)Rn ≥ 0 for all ξ, η ∈ Rn . (ii) Monotonicity for F = F (u): (F (u) − F (v), u − v) ≥ 0 for all u, v ∈ R. (iii) Coerciveness and Growth condition: for some c, d > 0 (σ(ξ), σ) ≥ c |ξ|p , |σ(ξ)| ≤ d (1 + |ξ|p−1 ) hold for all ξ ∈ Rn . 5.1 Pseudo-Monotone operator Next, we examine a somewhat more general class of nonlinear operators, called pseudomonotone operators. In applications, it often occurs that the hypotheses imposed on monotonicity are unnecessarily strong. In particular, the monotonicity assumption F (u) involves both the first-order derivatives and the function itself. In general the compactness will take care of the lower-order terms. Definition (Pseudo-Monotone Operator) Let X be a reflexive Banach space. An operator T : X → X ∗ is said to be pseudo-monotone operator if T is a bounded operator (not necessarily continuous) and if whenever un converges weakly to u in X and lim suphun − u, T (un )i ≤ 0, n→∞ it follows that, for all v ∈ X, lim inf hun − v, T (un )i ≥ hu − v, T (u)i. n→∞ (5.4) For example, a monotone and hemi-continuous operator is pseudo-monotone. Lemma If T is maximal monotone, then T is pseudo-monotone. Proof: Since T is monotone, hun − u, T (un )i ≥ hun − u, T (u)i → 0 it follows from (5.4) that limn hun − u, T (un )i = 0. Assume un → u and T (un ) → y weakly in X and X ∗ , respectively. 
Thus, 0 ≤ hun − v, T (un ) − T (v)i = hun − u + u − v, T (un ) − T (v)i → hu − v, y − T (v)i. 89 Since T is maximal monotone, we have y = T (u). Now, limhun − v, T (un )i = limhun − u + u − v, T (un )i = hu − v, T (u)i. n n A class of pseudo-monotone operators is intermediate between monotone and hemicontinuous. The sum of two pseudo-monotone operators is pseudo-monotone. Moreover, the sum of a pseudo-monotone and a strongly continuous operator (xn converges weakly to x, then T xn → T x strongly) is pseudo-monotone. Property of Pseudo-monotone operator (1) Letting v = u in (5.4) 0 ≤ liminf hun − u, T (un )i ≤ lim suphun − u, T (un )i ≤ 0 and thus limn hun − u, T (un )i = 0. (2) From (1) and (5.4) hu − v, T ui ≤ lim infhun − v, T (un )i + lim infhu − un − u, T (un )i = lim infhu − v, T (un )i (3) T (un ) → T (u) weakly. In fact, from(2) hv, T (u)i ≤ lim infhv, T (un )i ≤ lim suphv, T (un )i = lim suphun − u, T (un )i + lim suphu + v − un , T (un )i = − lim infhun − (u + v), T (un )i ≤ hv, T (u)i. (4) hun − u, T (un ) − T (u)i → 0 By the hypothesis 0 ≤ lim infhun − u, T (un )i = lim infhun − u, T (un )i − lim infhun − u, T (u)i = lim infhun − u, T (un ) − T (u)i ≤ lim suphun − u, T (un ) − T (u)i = lim suphun − u, T (un ) − T (u)i ≤ 0 (5) hun , T (un )i → hu, T (u)i. Since hun , T (un )i = hun − u, T (un ) − ui − hu, T (un )i − hun , T (u)i + hun , T (un )i the claim follows from (3)–(4). (6) Conversely, if T (un ) → T (u) weakly and hun , T (un )i → hu, T (u)i → 0, then the hypothesis holds. Using a very similar proof to that of the Browder-Minty theorem, one can show the following. Theorem (Bresiz) Assume, the operator A : X → X ∗ is pseudo-monotone, bounded and coercive on the real separable and reflexive Banach space X. Then, for each f ∈ X ∗ the equation A(u) = f has a solution. 90 5.2 Generalized Pseudo-Monotone Mapping In this section we discuss pseudo-monotone mappings. We extend the notion of the maximal monotone mapping, i.e., the maximal monotone mapping is a generalized pseudo-monotone mappings. Definition (Pseudo-monotone Mapping) (a) The set T (u) is nonempty, bounded, closed and convex for all u ∈ X. (b) T is upper semicontinuous from each finite-dimensional subspace F of X to the weak topology on X ∗ , i.e., to a given element u+ ∈ F and a weak neighborhood V of T (u), in X ∗ there exists a neighborhood U of u0 in F such that T (u) ⊂ V for all u ∈ U. (c) If {un } is a sequence in X that converges weakly tou ∈ X and if wn ∈ T (un ) is such that lim suphun − u, wn i ≤ 0, then to each element v ∈ X there exists w ∈ T (u) with the property that lim infhun − v, wn i ≥ hu − v, wi. Definition (Generalized Pseudo-monotone Mapping) A mapping T from X into X ∗ is said to be generalized pseudo-monotone if the following is satisfied. For any sequence {un } in X and a corresponding sequence {wn } in X ∗ with wn ∈ T (xn ), if un converges weakly to u and wn weakly to w such that lim suphun − u, wn )i ≤ 0, n→∞ then w ∈ T (u) and hun , wn i → hu, wi. Let X be a reflexive Banach space, T a pseudo-monotone mapping from X into X ∗ . Then T is generalized pseudo-monotone. Conversely, suppose T is a bounded generalized pseudo-monotone mapping from X into X ∗ . and assume that for each u ∈ X, T u is a nonempty closed convex subset of X ∗ . Then T is pseudo-monotone. PROPOSITION 2. A maximal monotone mapping T from the Banach space X into X ∗ is generalized pseudo-monotone. 
Proof: Let {un } be a sequence in X that converges weakly to u ∈ X, {wn } a sequence in X ∗ with wn ∈ T (un ) that converges weakly to w ∈ X ∗ . Suppose that lim suphun − u, wn i ≤ 0 Let [x, y] ∈ T be an arbitrary element of the graph G(T ). By the monotonicity of T , hun − x, wn − yi ≥ 0 Since hun , wn i = hun − x, wn − yi + hx, wn i + hun , yi − hx, yi, where hx, wn i + hun , yi → hx, wi + hu, yi − hx, yi, we have hu, wi ≤ lim suphun , wn i ≥ hx, wi + hu, yi − hx, yi, i.e., hu − x, w − yi ≥ 0 91 Since T is maximal, w ∈ T (u). Consequently, hun − u, wn − wi ≥ 0. Thus, lim infhun , wn i ≥ lim inf(hun , wihu, wn i + hu, wi = hu, wi, It follows that limhun , wn i → hu, wi. DEFINITION 5 (1) A mapping T from X into X ∗ is coercive if there exists a function c : R+ → R+ with limr∞ c(r) = ∞ such that hu, wi ≥ c(|u|)|u| for all [u, w] ∈ G(T ). (1) An operator T from X into X ∗ is called smooth if it is bounded, coercive, maximal monotone, and has effective domain D(T ) = X. (2) Let T be a generalized pseudo-monotone mapping from X into X ∗ . Then T is said to be regular if R(T + T2 ) = X ∗ for each smooth operator T2 . (3) T is said to be quasi-bounded if for each M > 0 there exists K(M ) > 0 such that whenever [u, w] ∈ G(T ) of T and hu, wi ≤ M |u|, |u| ≤ M, then |w| ≤ K(M ). T is said to be strongly quasi-bounded if for each M > 0 there exists K(M ) > 0 such that whenever [u, w] ∈ G(T ) of T and hu, wi ≤ M |u| ≤ M, then |w| ≤ K(M ). Theorem 1 Let T = T1 + T0 , where T1 is maximal monotone with 0 ∈ D(T ), while T0 is a regular generalized pseudo-monotone mapping such that for a given constant k, hu, wik |u| for all [u, w] ∈ G(T ). If either T0 is quasi-bounded or T1 is strongly quasi-bounded, then T is regular. Theorem 2 (Browder). Let X be a reflexive Banach space and T be a generalized pseudo-monotone mapping from X into X ∗ . Suppose that R( F + T ) = X ∗ for > 0 and T is coercive. Then R(T ) = X ∗ . Lemma 1 Let T be a generalized pseudo-monotone mapping from the reflexive Banach space X into X ∗ , and let C be a bounded weakly closed subset of X. Then T (C) = {w ∈ X ∗ : w ∈ T (u)for some u ∈ C} is closed in the strong topology of X ∗ . Proof: Let Let {wn } be a sequence in T (C) converging strongly to some w ∈ X ∗ . For each n, there exists un ∈ C such that wn ∈ T (un ). Since C is bounded and weakly closed, without loss of the generality we assume un → u ∈ C, weakly in X. It follows that limhun − u, wn i = 0. The generalized pseudo-monotonicity of T implies that w ∈ T (u), i.e., w lies in T (C). Proof of Theorem 2: Let J denote the normalized Let F denote the duality mapping of X into X ∗ . It is known that F is a smooth operator. Let w0 be a given element of X ∗ . We wish to show that w0 ∈ R(T ). For each > 0, it follows from the regularity of T that there exists an element u ∈ X such that w0 − p ∈ T (u ) for some p ∈ F (u ). 92 Let w = w0 − p . Since T is coercive and hu , w i = |u |2 , we have for some k |u |2 ≤ k |u | + |u ||w0 |. Hence |u | = |p | ≤ |w0 |. and |w | ≤ |w0 | + |p | ≤ k + 2|w0 | Since T is coercive, there exits M > 0 such that |u | ≤ M and thus |w − w0 | ≤ |p | = |u | → 0 as → 0+ It thus follows from Lemma 1 that T (B(0, M )) is closed and thus w0 ∈ T (B(0, M )), i.e., w0 ∈ R(T ). THEOREM 3 Let X be a reflexive Banach space and T a pseudo-monotone mapping from X into X ∗ . Suppose that T is coercive, R(T ) = X ∗ . PROPOSITION 10. 
Let F be a finite-dimensional Banach space, a mapping from F into F ∗ such that for each u ∈ F , T(u) is a nonempty bounded closed convex subset of F ∗ . Suppose that T is coercive and upper semicontinuous from F to F ∗ . Then R(T ) = F ∗ . Proof: Since for each w0 ∈ F ∗ define the mapping Tw0 : F → F ∗ by Tw0 (u) = T (u) − w0 satisfies the same hypotheses as the original mapping T . It thus suffices to prove that 0 ∈ R(T ). Suppose that 0 does not lie in R(T ). Then for each u ∈ F , there exists v(u) ∈ F such that inf hv(u), wi > 0 w∈T (u) Since T is coercive, there exists a positive constant R such that c(R) > 0, and hence for each u ∈ F with |u| = R and each w ∈ T (u), hu, wi ≥ λ > 0 where λ = c(R)R, For such points u ∈ F , we may take v(u) = u. For a given nonzero v0 in F let Wv0 = {u ∈ F : inf hv0 , wi > 0}. w∈T (u) The family {Wv0 : v0 ∈ F, v0 6= 0} forms a covering of the space F . By the upper semicontinuity of T , from F to F ∗ each Wv0 is open in F . Hence the family {Wv0 : v0 ∈ F, v0 6= 0} forms an open covering of F . We now choose a finite open covering {V1 , · · · , Vm } of the closed ball B(0, R) in F of radius R about the origin, with the property that for each k there exists an element vk in F such that Vk ⊂ Wvk and the additional condition that if Vk intersects the boundary sphere S(0, R), then vk is a point of Vk ∩ S(0, R) and the diameter of such Vk is less than R2 .. We next choose a partition of unity {α1 , · · · , αm } subordinated to the covering {V1 , · · · , Vm } where each αk is a continuous mapping of F into [0, I] which vanishes outside the corresponding 93 P set Vk , and with the property that k αk (x) = 1 for each x ∈ B(0, R). Using this partition of unity, we may define a continuous mapping f of B(0, R) into F by setting X f (x) = α(x)vk . k We note that for each k for which alphak (x) > 0 and for any winT0 x, we have hvk , wi > 0 since x must lie in Vk , which is a subset of Wk . Hence for each x ∈ B(0, R) and any w ∈ T0 x X hf (x), wi = α(x)hvk , wi > 0 k Consequently, f (x) 6= 0 for any x ∈ B(0, R). This implies that the degree of the mapping f on B(0, R) with respect to 0 equals 0. For x ∈ S(0, R), on the other hand, f (x) is a convex linear combination of points vk ∈ S(0, R), each of which is at distance at most R2 from the point x. We infer that |f (x) − x| ≤ R , x ∈ S(0, R). 2 By this inequality, f considered as a mapping from B(0, R) into F is homotopic to the identity mapping. Hence the degree of f on B(0, R) with respect to 0 is 1. This contradiction proves that 0 must lie in T0 u for some u ∈ B(0, R). Proof of Theorem 3 Let Λ be the family of all finite-dimensional subspaces F of X, ordered by inclusion. For F ∈ Λ, let jF : F → X denote the inclusion mapping of F into X, and jF∗ : X ∗ → F ∗ the dual projection mapping of X ∗ onto F ∗ . The operator TF = jF∗ T jF then maps F into F ∗ . For each u ∈ F , TF u is a weakly compact convex subset of X ∗ and jF∗ is continuous from the weak topology on X ∗ to the (unique) topology on F ∗ . Hence TF u is a nonempty closed convex subset of F ∗ . Since T is upper semicontinuous from F to F ∗ with X ∗ given its weak topology, TF is upper semicontinuous from F to F ∗ . By Proposition 10, to each F ∈ Λ there exists an element uF ∈ F such that jF∗ w0 = wF for some wF ∈ TF uF . The coerciveness of T implies that wF ∈ TF uF , i.e. huF , jF∗ w0 i = huF , wF i ≥ c(|uF |)|uF |. Consequently, the elements {|uF |} are uniformly bounded by M for all F ∈ Λ. For F ∈ Λ, let [ VF = {uF 0 }. 
F 0 ∈Λ, F 0 ⊂F Then the set VF is contained in the closed ball B(0, M ) in X Since B(0, M ) is weakly compact, and since the family {weak closure(VF )} has the finite intersection property, the intersection ∩F ∈Λ {weak closure(VF )} is not empty. Let u0 be an element contained in this intersection. The proof will be complete if we show that 0 ∈ Tw0 u0 . Let v ∈ X be arbitrarily given. We choose F ∈ Λ such that it contains u0 , v ∈ F Let {uFk }denote a sequence in VF converging weakly to u0 . Since jF∗ k wFk = 0, we have huFk − u0 , wFk i = 0 for all k. 94 Consequently, by the pseudo-monotonicity of T , to given v ∈ X there exists w(v) ∈ T (u0 ) with (5.5) limhuFk − v, wFk i ≥ hu0 − v, wi. Suppose now that 0 ∈ / T (u0 ). Then 0 can be separated from the nonempty closed convex set T (u0 ), i.e., there exists an element x = u − v ∈ X such that inf hu0 − v, zi > 0. z∈T (u0 ) But this is a contradiction to (5.5). Navier-Stokes equations Example Consider the constrained minimization min F (y) subject to E(y) = 0. where E(y) = E0 y + f (y). The necessary optimality condition for x = (y, p) ∂F − E 0 (y)∗ p A(y, p) ∈ . E(y) Let A0 (y, p) ∈ and ∂F − E0∗ p E0 y A1 (y, p) = −f 0 (y)p f (y) , . 5.3 Dissipative Operators and Semigroup of Nonlinear Contractions In this section we consider du ∈ Au(t), dt u(0) = u0 ∈ X for the dissipative mapping A on a Banach space X. Definition (Dissipative) A mapping A on a Banach space X is dissipative if |x1 − x2 − λ (y1 − y2 )| ≥ |x1 − x2 | for all λ > 0 and [x1 , y1 ], [x2 , y2 ] ∈ A, or equivalently hy1 − y2 , x1 − x2 i− ≤ 0 for all [x1 , y1 ], [x2 , y2 ] ∈ A. and if in addition R(I − λA) = X, then A is m-dissipative. In particular, it follows that if A is dissipative, then for λ > 0 the operator (I − λ A)−1 is a single-valued and nonexpansive on R(I − λ A), i.e., |(I − λ A)−1 x − (I − λ A)−1 y| ≤ |x − y| 95 for all x, y ∈ R(I − λ A). Define the resolvent and Yosida approximation A by Jλ x = (I − λ A)−1 x ∈ dom (A), x ∈ dom (Jλ ) = R(I − λ A) (5.6) Aλ = λ−1 (Jλ x − x), x ∈ dom (Jλ ). We summarize some fundamental properties of Jλ and Aλ in the following theorem. Theorem 1.4 Let A be an ω− dissipative subset of X × X, i.e., |x1 − x2 − λ (y1 − y2 )| ≥ (1 − λω) |x1 − x2 | for all 0 < λ < ω −1 and [x1 , y1 ], [x2 , y2 ] ∈ A and define kAxk by kAxk = inf{|y| : y ∈ Ax}. Then for 0 < λ < ω −1 , (i) |Jλ x − Jλ y| ≤ (1 − λω)−1 |x − y| for x, y ∈ dom (Jλ ). (ii) Aλ x ∈ AJλ x for x ∈ R(I − λ A). (iii) For x ∈ dom(Jλ ) ∩ dom (A) |Aλ x| ≤ (1 − λω)−1 kAxk and thus |Jλ x − x| ≤ λ(1 − λω)−1 kAxk. (iv) If x ∈ dom (Jλ ), λ, µ > 0, then µ λ−µ x+ Jλ x ∈ dom (Jµ ) λ λ and Jλ x = Jµ λ−µ µ x+ Jλ x . λ λ (v) If x ∈ dom (Jλ )∩dom (A) and 0 < µ ≤ λ < ω −1 , then (1−λω)|Aλ x| ≤ (1−µω) |Aµ y| (vi) Aλ is ω −1 (1 − λω)−1 -dissipative and for x, y ∈ dom(Jλ ) |Aλ x − Aλ y| ≤ λ−1 (1 + (1 − λω)−1 ) |x − y|. Proof: (i) − −(ii) If x, y ∈ dom (Jλ ) and we set u = Jλ x and v = Jλ y, then there exist u ˆ and vˆ such that x = u − λ u ˆ and y = v − λ vˆ. Thus, from (1.8) |Jλ x − Jλ y| = |u − v| ≤ (1 − λω)−1 |u − v − λ (ˆ u − vˆ)| = (1 − λω)−1 |x − y|. Next, by the definition Aλ x = λ−1 (u − x) = u ˆ ∈ Au = AJλ x. (iii) Let x ∈ dom (Jλ )∩dom (A) and x ˆ ∈ Ax be arbitrary. Then we have Jλ (x−λ x ˆ) = x since x − λ x ˆ ∈ (I − λ A)x. Thus, |Aλ x| = λ−1 |Jλ x−x| = λ−1 |Jλ x−Jλ (x−λ x ˆ)| ≤ (1−λω)−1 λ−1 |x−(x−λ x ˆ)| = (1−λω)−1 |ˆ x|. which implies (iii). (iv) If x ∈ dom (Jλ ) = R(I − λ A) then we have x = u − λ u ˆ for [u, u ˆ] ∈ A and thus u = Jλ x. 
For µ > 0 µ λ−µ µ λ−µ x+ Jλ x = (u − λ u ˆ) + u = u − µu ˆ ∈ R(I − µ A) = dom (Jµ ). λ λ λ λ and Jµ µ λ−µ x+ Jλ x λ λ = Jµ (u − µ u ˆ) = u = Jλ x. 96 (v) From (i) and (iv) we have λ |Aλ x| = |Jλ x − x| ≤ |Jλ x − Jλ y| + |Jµ x − x| µ λ−µ ≤ Jµ x+ Jλ x − Jµ x + |Jµ x − x| λ λ λ−µ λ x + λ Jλ x − x + |Jµ x − x| −1 µ ≤ (1 − µω) = (1 − µω)−1 (λ − µ) |Aλ x| + µ |Aµ x|, which implies (v) by rearranging. (vi) It follows from (i) that for ρ > 0 |x − y − ρ (Aλ x − Aλ y)| = |(1 + ≥ (1 + ρ ρ ) (x − y) − (Jλ x − Jλ y)| λ λ ρ ρ ) |x − y| − |Jλ x − Jλ y| λ λ ≥ ((1 + ρ ρ ) − (1 − λω)−1 ) |x − y| = (1 − ρ ω(1 − λω)−1 ) |x − y|. λ λ The last assertion follows from the definition of Aλ and (i). Theorem 1.5 (1) A dissipative set A ⊂ X × X is m-dissipative, if and only if R(I − λ0 A) = X for some λ0 > 0. (2) An m-dissipative mapping is maximal dissipative, i.e., all dissipative set containing A in X × X coincide with A. (3) If X = X ∗ = H is a Hilbert space, then the notions of the maximal dissipative set and m-dissipative set are equivalent. Proof: (1) Suppose R(I − λ0 A) = X. Then it follows from Theorem 1.4 (i) that Jλ0 is contraction on X. We note that λ0 λ0 I − λA = I − (1 − ) Jλ0 (I − λ0 A) (5.7) λ λ for 0 < λ < ω −1 . For given x ∈ X define the operator T : X → X by T y = x + (1 − λ0 ) Jλ0 y, λ Then |T y − T z| ≤ |1 − y∈X λ0 | |y − z| λ where |1 − λλ0 | < 1 if 2λ > λ0 . By Banach fixed-point theorem the operator T has a unique fixed point z ∈ X, i.e., x = (I − (1 − λλ0 Jλ0 )z. Thus, x ∈ (I − (1 − λ0 ) Jλ0 )(I − λ0 A)dom (A). λ 97 and it thus follows from (5.7) that R(I − λ A) = X if λ > λ20 . Hence, (1) follows from applying the above argument repeatedly. (2) Assume A is m-dissipative. Suppose A˜ is a dissipative set containing A. We need to show that A˜ ⊂ A. Let [x, x ˆ] ∈ A˜ Since x − λ x ˆ ∈ X = R(I − λ A), for λ > 0, there exists a [y, yˆ] ∈ A such that x − λ x ˆ = y − λ yˆ. Since A ⊂ A˜ it follows that [y, yˆ] ∈ A and thus |x − y| ≤ |x − y − λ (ˆ x − yˆ)| = 0. Hence, [x, x ˆ] = [y, yˆ] ∈ A. (3) It suffices to show that if A is maximal dissipative, then A is m-dissipative. We use the following extension lemma, Lemma 1.6. Let y be any element of H. By Lemma 1.6, taking C = H, we have that there exists x ∈ H such that (ξ − x, η − x + y) ≤ 0 for all [ξ, η] ∈ A. and thus (ξ − x, η − (x − y)) ≤ 0 for all [ξ, η] ∈ A. Since A is maximal dissipative, this implies that [x, x − y] ∈ A, that is x − y ∈ Ax, and therefore y ∈ R(I − H). Lemma 1.6 Let A be dissipative and C be a closed, convex, non-empty subset of H such that dom (A) ∈ C. Then for every y ∈ H there exists x ∈ C such that (ξ − x, η − x + y) ≤ 0 for all [ξ, η] ∈ A. Proof: Without loss of generality we can assume that y = 0, for otherwise we define Ay = {[ξ, η + y] : [ξ, η] ∈ A} with dom (Ay ) = dom (A). Since A is dissipative if and only if Ay is dissipative , we can prove the lemma for Ay . For [ξ, η] ∈ A, define the set C([ξ, η]) = {x ∈ C : (ξ − x, η − x) ≤ 0}. T Thus, the lemma is proved if we can show that [ξ,η]∈A C([ξ, η]) is non-empty. 5.3.1 Properties of m-dissipative operators In this section we discuss some properties of m-dissipative sets. Lemma 1.7 Let X ∗ be a strictly convex Banach space. If A is maximal dissipative, then Ax is a closed convex set of X for each x ∈ dom (A). Proof: It follows from Lemma that the duality mapping F is single-valued. First, we show that Ax is convex. Let x ˆ1 , xˆ2 ∈ Ax and set x ˆ = αˆ x1 + (1 − α) x ˆ2 for 0 ≤ α ≤ 1. 
Then, Since A is dissipative, for all [y, yˆ] ∈ A Re hˆ x − yˆ, F (x − y)i = α Re hˆ x1 − yˆ, F (x − y)i + (1 − α) Re hˆ x2 − yˆ, F (x − y)i ≤ 0. Thus, if we define a subset A˜ by if z ∈ dom (A) \ {x} Az ˜ Az = Ax ∪ {ˆ x} if z = x, 98 ˜ = dom (A). Since A is maximal then A˜ is a dissipative extension of A and dom (A) ˜ = Ax and thus x dissipative, it follows that Ax ˆ ∈ Ax as desired. Next, we show that Ax is closed. Let x ˆn ∈ Ax and limn→∞ x ˆn = x ˆ. Since A is dissipative, Re hˆ xn − yˆ, x − yi ≤ 0 for all [y, yˆ] ∈ A. Letting n → ∞, we obtain Re hˆ x − yˆ, x − yi ≤ 0. Hence, as shown above x ˆ ∈ Ax as desired. Definition 1.4 A subset A of X × X is said to be demiclosed if xn → x and yn * y and [xn , yn ] ∈ A imply that [x, y] ∈ A. A subset A is closed if [xn , yn ], xn → x and yn → y imply that [x, y] ∈ A. Theorem 1.8 Let A be m-dissipative. Then the followings hold. (i) A is closed. (ii) If {xλ } ⊂ X such that xλ → x and Aλ xλ → y as λ → 0+ , then [x, y] ∈ A. Proof: (i) Let [xn , x ˆn ] ∈ A and (xn , x ˆn ) → (x, x ˆ) in X × X. Since A is dissipative Re hˆ xn − yˆ, xn − yii ≤ 0 for all [y, yˆ] ∈ A. Since h·, ·ii is lower semicontinuous, letting n → ∞, we obtain Re hˆ x − yˆ, x − yii ≤ for all [y, yˆ] ∈ A. Then A1 = [x, x ˆ] ∪ A is a dissipative extension of A. Since A is maximal dissipative, A1 = A and thus [x, x ˆ] ∈ A. Hence, A is closed. (ii) Since {Aλ x} is a bounded set in X, by the definition of Aλ , lim |Jλ xλ − xλ | → 0 and thus Jλ xλ → x as λ → 0+ . But, since Aλ xλ ∈ AJλ xλ , it follows from (i) that [x, y] ∈ A. Theorem 1.9 Let A be m-dissipative and let X ∗ be uniformly convex. Then the followings hold. (i) A is demiclosed. (ii) If {xλ } ⊂ X such that xλ → x and {|Aλ x|} is bounded as λ → 0+ , then x ∈ dom (A). Moreover, if for some subsequence Aλn xn * y, then y ∈ Ax. (iii) limλ→0+ |Aλ x| = kAxk. Proof: (i) Let [xn , x ˆn ] ∈ A be such that lim xn = x and w − lim x ˆn = x ˆ as n → ∞. Since X ∗ is uniformly convex, from Lemma the duality mapping is single-valued and uniformly continuous on the bounded subsets of X. Since A is dissipative Re hˆ xn − yˆ, F (xn − y)i ≤ 0 for all [y, yˆ] ∈ A. Thus, letting n → ∞, we obtain Re hˆ x − yˆ, F (x − y)i ≤ 0 for all [y, yˆ] ∈ A. Thus, [x, x ˆ] ∈ A, by the maximality of A. Definition 1.5 The minimal section A0 of A is defined by A0 x = {y ∈ Ax : |y| = kAxk} with dom (A0 ) = {x ∈ dom (A) : A0 x is non-empty}. Lemma 1.10 Let X ∗ be a strictly convex Banach space and let A be maximal dissipative. Then, the followings hold. (i) If X is strictly convex, then A0 is single-valued. (ii) If X reflexible, then dom (A0 ) = dom (A). (iii) If X strictly convex and reflexible, then A0 is single-valued and dom (A0 ) = dom (A). Theorem 1.11 Let X ∗ is a uniformly convex Banach space and let A be m-dissipative. Then the followings hold. (i) limλ→0+ F (Aλ x) = F (A0 x) for each x ∈ dom (A). Moreover, if X is also uniformly convex, then (ii) limλ→0+ Aλ x = A0 x for each x ∈ dom (A). 99 Proof: (1) Let x ∈ dom (A). By (ii) of Theorem 1.2 |Aλ x| ≤ kAxk Since {Aλ x} is a bounded sequence in a reflexive Banach space (i.e., since X ∗ is uniformly convex, X ∗ is reflexive and so is X), there exists a weak convergent subsequence {Aλn x}. Now we set y = w − limn→∞ Aλn x. Since from Theorem 1.2 Aλn x ∈ AJλn x and limn→∞ Jλn x = x and from Theorem 1.10 A is demiclosed, it follows that [x, y] ∈ A. Since by the lower-semicontinuity of norm this implies kAxk ≤ |y| lim inf |Aλn x| ≤ lim sup |Aλn x| ≤ kAxk, n→∞ n→∞ we have |y| = kAxk = limn→∞ |Aλn x| and thus y ∈ A0 x. 
Next, since |F (Aλn x)| = |Aλn x| ≤ kAxk, F (Aλn x) is a bounded sequence in the reflexive Banach space X ∗ and has a weakly convergent subsequence F (Aλk x) of F (Aλn x). If we set y ∗ = w − limk→∞ F (Aλk x), then it follows from the dissipativity of A that Re hAλk x − y, F (Aλk x)i = λ−1 n Re hAλk x − y, F (Jλk xx)i ≤ 0, or equivalently |Aλk x|2 ≤ Re hy, F (Aλn x)i. Letting k → ∞, we obtain |y|2 ≤ Re hy, y ∗ i. Combining this with |y ∗ | ≤ lim |F (Aλk x)| = lim |Aλk x| = |y|, k→∞ k→∞ we have |y ∗ | ≤ Re hy, y ∗ i ≤ |hy, y ∗ i| ≤ |y||y ∗ | ≤ |y|2 . Hence, hy, y ∗ i = |y|2 = |y ∗ |2 and we have y ∗ = F (y). Also, limk→∞ |F (Aλk x)| = |y| = |F (y)|. It thus follows from the uniform convexity of X ∗ that lim F (Aλk x) = F (y). k→∞ Since Ax is a closed convex set of X from Theorem , we can show that x → F (A0 x) is single-valued. In fact, if C is a closed convex subset of X, the y is an element of minimal norm in C, if and only if |y| ≤ |(1 − α)y + α z| for all z ∈ C and 0 ≤ α ≤ 1. Hence, hz − y, yi+ ≥ 0. and from Theorem 1.10 0 ≤ Re hz − y, f i = Re hz, f i − |y|2 (5.8) for all z ∈ C and f ∈ F (y). Now, let y1 , y2 be arbitrary in A0 x. Then, from (5.8) |y1 |2 ≤ Re hy2 , F (y1 )i ≤ |y1 ||y2 | 100 which implies that hy2 , F (y1 )i = |y2 |2 and |F (y1 )| = |y2 |. Therefore, F (y1 ) = F (y2 ) as desired. Thus, we have shown that for every sequence {λ} of positive numbers that converge to zero, the sequence {F (Aλ x)} has a subsequence that converges to the same limit F (A0 x). Therefore, limλ→0 F (Aλ x) → F (A0 x). Furthermore, we assume that X is uniformly convex. We have shown above that for x ∈ dom (A) the sequence {Aλ } contains a weak convergent subsequence {Aλn x} and if y = w − limn→∞ Aλn x then [x, y] ∈ A0 and |y| = limn→∞ |Aλn x|. But since X is uniformly convex, it follows from Theorem 1.10 that A0 is single-valued and thus y = A0 x. Hence, w − limn→∞ Aλn x = A0 x and limn→∞ |Aλn x| = |A0 x|. Since X is uniformly convex, this implies that limn→∞ Aλn x = A0 x. Theorem 1.12 Let X is a uniformly convex Banach space and let A be m-dissipative. Then dom (A) is a convex subset of X. Proof: It follows from Theorem 1.4 that |Jλ x − x| ≤ λ kAxk for x ∈ dom (A) Hence |Jλ x − x| → 0 as λ → 0+ . Since Jλ x ∈ dom (A) for X ∈ X, it follows that dom (A) = {x ∈ X : |Jλ x − x| → 0 as λ → 0+ }. Let x1 , x2 ∈ dom (A) and 0 ≤ α ≤ 1 and set x = α x1 + (1 − α) x2 . Then, we have |Jλ x − x1 | ≤ |x − x1 | + |Jλ x1 − x1 | (5.9) |Jλ x − x2 | ≤ |x − x2 | + |Jλ x2 − x2 | where x − x1 = (1 − α) (x2 − x1 ) and x − x2 = α (x1 − x2 ). Since {Jλ x} is a bounded set and a uniformly convex Banach space is reflexive, it follows that there exists a subsequence {Jλn x} that converges weakly to z. Since the norm is weakly lower semicontinuous, letting n → ∞ in (5.9) with λ = λn , we obtain |z − x1 | ≤ (1 − α) |x1 − x2 | |z − x2 | ≤ α |x1 − x2 | Thus, |x1 − x2 | = |(x1 − z) + (z − x2 )| ≤ |x1 − z| + |z − x2 | ≤ |x1 − x2 | and therefore |x1 −z| = (1−α) |x1 −x2 |, |z −x2 | = α |x1 −x2 | and |(x1 −z)+(z −x2 )| = |x1 − x2 |. But, since X is uniformly convex we have z = x and w − lim Jλn x = x as n → ∞. Since we also have |x − x1 | ≤ lim inf |Jλn x − Jλn x1 | ≤ |x − x1 | n→∞ |Jλn x − Jλn x1 | → |x − x1 | and w − lim Jλn x − Jλn x1 = x − x1 as n → ∞. Since X is uniformly convex, this implies that limλ→0+ Jλ x = x and x ∈ dom (A). 101 5.3.2 Generation of Nonlinear Semigroups Definition 2.1 Let X0 be a subset of X. 
A semigroup S(t), t ≥ of nonlinear contractions on X0 is a function with domain [0, ∞) × X0 and range in X0 satisfying the following conditions: S(t + s)x = S(t)S(s)x and S(0)x = x for x ∈ X0 , t, s ≥ 0 t → S(t)x ∈ X is continuous |S(t)x − S(t)y| ≤ |x − y| for t ≥ 0, x, y ∈ X0 In this section, we consider the generation of nonlinear semigroup by CrandallLiggett on a Banach space X. Let A be a ω-dissipative operator and Jλ = (I − λ A)−1 is the resolvent. The following estimate plays an essential role in the Crandall-Liggett generation theory. Lemma 2.1 Assume a sequence {an,m } of positive numbers satisfies an,m ≤ α an−1,m−1 + (1 − α) an−1,m (5.10) and a0,m ≤ m λ and an,0 ≤ n µ for λ ≥ µ > 0 and α = µλ . Then we have the estimate 1 1 an,m ≤ [(mλ − nµ)2 + mλ2 ] 2 + [(mλ − nµ)2 + nλµ] 2 . (5.11) Proof: From the assumption, (5.11) holds for either m = 0 or n = 0. We will use the induction in n, m, that is if (5.11) holds for (n + 1, m) when (5.11) is true for (n, m) and (n, m − 1), then (5.11) holds for all (n, m). Let β = 1 − α. We assume that (5.11) holds for (n, m) and (n, m − 1). Then, by (5.10) and Cauchy-Schwarz inequality 1 1 αx + βy ≤ (α + β) 2 (αx2 + βy 2 ) 2 an+1,m ≤ α an,m−1 + β an,m 1 1 ≤ α ([((m − 1)λ − nµ)2 + (m − 1)λ2 ] 2 + [((m − 1)λ − nµ)2 + nλµ] 2 ) 1 1 +β ([(mλ − nµ)2 + mλ2 ] 2 + [(mλ − nµ)2 + nλµ] 2 ) 1 1 = (α + β) 2 (α [((m − 1)λ − nµ)2 + (m − 1)λ2 ] + β [(mλ − nµ)2 + mλ2 ]) 2 1 1 +(α + β) 2 (α [((m − 1)λ − nµ)2 + nλµ] + β [(mλ − nµ)2 + nλµ]) 2 1 1 ≤ [(mλ − (n + 1)µ)2 + mλ2 ] 2 + [(λ − (n + 1)µ)2 + (n + 1)λµ] 2 . Here, we used α + β = 1, αλ = µ and α [((m − 1)λ − nµ)2 + (m − 1)λ2 ] + β [(mλ − nµ)2 + mλ2 ]) ≤ (mλ − nµ)2 + mλ2 − αλ(mλ − nµ) = (mλ − (n + 1)µ)2 + mλ2 − µ2 α [((m − 1)λ − nµ)2 + nλµ] + β [(mλ − nµ)2 + nλµ] ≤ (mλ − nµ)2 + (n + 1)λµ − 2αλ(mλ − nµ) ≤ (mλ − (n + 1)µ)2 + (n + 1)λµ − µ2 . 102 Theorem 2.2 Assume A be a dissipative subset of X × X and satisfies the range condition dom (A) ⊂ R(I − λ A) for all sufficiently small λ > 0. (5.12) Then, there exists a semigroup of type ω on S(t) on dom (A) that satisfies for x ∈ dom (A) t S(t)x = lim (I − λ A)−[ λ ] x, t ≥ 0 (5.13) λ→0+ and |S(t)x − S(t)x| ≤ |t − s| kAxk for x ∈ dom (A), and t, s ≥ 0. Proof: First, note that from (5.14) dom (A) ⊂ dom (Jλ ). Let x ∈ dom (A) and set an,m = |Jµn x − Jλm x| for n, m ≥ 0. Then, from Theorem 1.4 a0,m = |x − Jλm x| ≤ |x − Jλ x| + |Jλ x − Jλ2 x| + · · · + |Jλm−1 x − Jλm x| ≤ m |x − Jλ x| ≤ mλ kAxk. Similarly, an,0 = |Jµn x − x| ≤ nµ kAxk. Moreover, λ−µ m µ m−1 n n m x+ J Jλ x | an,m = |Jµ x − Jλ x| ≤ |Jµ x − Jµ λ λ λ ≤ µ n−1 λ − µ n−1 |J x − Jλm−1 x| + |Jµ x − Jλm x| λ µ λ = α an−1,m−1 + (1 − α) an−1,m . It thus follows from Lemma 2.1 that [t] [t] 1 |Jµµ x − Jλλ x| ≤ 2(λ2 + λt) 2 kAxk. (5.14) [t] Thus, Jλλ x converges to S(t)x uniformly on any bounded intervals, as λ → 0+ . Since [t] Jλλ is non-expansive, so is S(t). Hence (5.13) holds. Next, we show that S(t) satisfies the semigroup property S(t + s)x = S(t)S(s)x for x ∈ dom (A) and t, s ≥ 0. Letting µ → 0+ in (??), we obtain [t] 1 |T (t)x − Jλλ x| ≤ 2(λ2 + λt) 2 . (5.15) [ λs ] for x ∈ dom (A). If we let x = Jλ z, then x ∈ dom (A) and [s] [t] [s] 1 |S(t)Jλλ z − Jλλ Jλλ z| ≤ 2(λ2 + λt) 2 kAzk (5.16) [ λs ] t s where we used that kAJλ zk ≤ kAzk for z ∈ dom (A). Since [ t+s λ ] − ([ λ ] + [ λ ]) equals 0 or 1, we have ] [ t+s λ |Jλ [t] [s] z − Jλλ Jλλ z| ≤ |Jλ z − z| ≤ λ kAzk. 
(5.17) It thus follows from (5.15)–(5.17) that [ t+s ] λ |S(t + s)z − S(t)S(s)z| ≤ |S(t + s)x − Jλ [t] [s] [s] [s] [ t+s ] λ z| + |Jλ +|Jλλ Jλλ z − S(t)Jλλ z| + |S(t)Jλλ z − S(t)S(s)z| → 0 103 [t] [s] z − Jλλ Jλλ z| as λ → 0+ . Hence, S(t + s)z = S(t)S(s)z. Finally, since t [t] |Jλλ x − x| ≤ [ ] |Jλ x − x| ≤ t kAxk λ we have |S(t)x − x| ≤ t kAxk for x ∈ dom (A). From this we obtain |S(t)x − x| → 0 as t → 0+ for x ∈ dom (A) and also |S(t)x − S(s)x| ≤ |S(t − s)x − x| ≤ (t − s) kAxk for x ∈ dom (A) and t ≥ s ≥ 0. 6 Cauchy Problem Definition 3.1 Let x0 ∈ X and ω ∈ R. Consider the Cauchy problem d u(t) ∈ Au(t), dt u(0) = x0 . (6.1) (1) A continuous function u(t) : [0, T ] → X is called a strong solution of (6.1) if u(t) is Lipschitz continuous with u(0) = x0 , strongly differentiable a.e. t ∈ [0, T ], and (6.1) holds a.e. t ∈ [0, T ]. (2) A continuous function u(t) : [0, T ] → X is called an integral solution of type ω of (6.1) if u(t) satisfies |u(t) − x| − |u(tˆ) − x| ≤ t Z tˆ (ω |u(s) − x| + hy, u(s) − xi+ ) ds (6.2) for all [x, y] ∈ A and t ≥ tˆ ∈ [0, T ]. Theorem 3.1 Let A be a dissipative subset of X × X. Then, the strong solution to (6.1) is unique. Moreover if the range condition (5.14) holds, then then the strong solution u(t) : [0, ∞) → X to (6.1) is given by t u(t) = lim (I − λ A)−[ λ ] x λ→0+ for x ∈ dom (A) and t ≥ 0. Proof: Let ui (t), i = 1, 2 be the strong solutions to (6.1). Then, t → |u1 (t) − u2 (t)| is Lipschitz continuous and thus a.e. differentiable t ≥ 0. Thus d |u1 (t) − u2 (t)|2 = 2 hu01 (t) − u02 (t), u1 (t) − u2 (t)ii dt a.e. t ≥ 0. Since u0i (t) ∈ Aui (t), i = 1, 2 from the dissipativeness of A, we have d 2 dt |u1 (t) − u2 (t)| ≤ 0 and therefore 2 Z |u1 (t) − u2 (t)| ≤ 0 t d |u1 (t) − u2 (t)|2 dt ≤ 0, dt which implies u1 = u2 . t For 0 < 2λ < s let uλ (t) = (I − λ A)−[ λ ] x and define gλ (t) = λ−1 (u(t) − u(t − λ)) − u0 (t) a.e. t ≥ λ. Since limλ→0+ |gλ | = 0 a.e. t > 0 and |gλ (t)| ≤ 2M for a.e. t ∈ [λ, s], 104 where M is a Lipschitz constant of u(t) on [0, s], it follows that limλ→0+ by Lebesgue dominated convergence theorem. Next, since Rs λ |gλ (t)| dt = 0 u(t − λ) + λ gλ (t) = u(t) − λ u0 (t) ∈ (I − λ A)u(t), we have u(t) = (I − λ A)−1 (u(t − λ) + λ gλ (t)). Hence, |uλ (t) − u(t)| ≤ |(I − λ A)−[ t−λ ] λ x − u(t − λ) − λ gλ (t)| ≤ |uλ (t − λ) − u(t − λ)| + λ |gλ (t)| a.e. t ∈ [λ, s]. Integrating this on [λ, s], we obtain Z Z λ Z s −1 −1 |uλ (t) − u(t)| dt + |uλ (t) − u(t)| dt ≤ λ λ |gλ (t)| dt. λ 0 s−λ s Letting λ → 0+ , it follows from Theorem 2.2 that |S(s)x−u(s)| = 0 since u is Lipschitz continuous, which shows the desired result. In general, the semigroup S(t) generated on dom (A) in Theorem 2.2 in not necessary strongly differentiable. In fact, an example of an m-dissipative A satisfying (5.14) is given, for which the semigroup constructed in Theorem 2.2 is not even weakly differentiable for all t ≥ 0. Hence, from Theorem 3.1 the corresponding Cauchy problem (6.1) does not have a strong solution. However, we have the following. Theorem 3.2 Let A be an ω-dissipative subset satisfying (5.14) and S(t), t ≥ 0 be the semigroup on dom (A), as constructed in Theorem 2.2. Then, the followings hold. (1) u(t) = S(t)x on dom (A) defined in Theorem 2.2 is an integral solution of type ω to the Cauchy problem (6.1). (2) If v(t) ∈ C(0, T ; X) be an integral of type ω to (6.1), then |v(t) − u(t)| ≤ eωt |v(0) − u(0)|. (3) The Cauchy problem (6.1) has a unique solution in dom (A) in the sense of Definition 2.1. 
Proof: A simple modification of the proof of Theorem 2.2 shows that for x0 ∈ dom (A) t S(t)x0 = lim (I − λ A)−[ λ ] x0 λ→0+ exists and defines the semigroup S(t) of nonlinear ω-contractions on dom (A), i.e., |S(t)x − S(t)y| ≤ eωt |x − y| for t ≥ 0 and x, y ∈ dom (A). For x0 ∈ dom (A) we define for λ > 0 and k ≥ 1 yλk = λ−1 (Jλk x0 − Jλk−1 x0 ) = Aλ Jλk−1 x0 ∈ AJλk . (6.3) Since A is ω-dissipative, hyλ,k − y, Jλk x0 − x)i− ≤ ω |Jλk x0 − x)| for [x, y] ∈ A. Since from Lemma 1.1 (4) hy, xi− − hz, xi+ ≤ hy − z, xi− , it follows that hyλk , Jλk x0 − xi− ≤ ω |Jλk x0 − x)| + hy, Jλk x0 − xi+ (6.4) Since from Lemma 1.1 (3) hx + y, xi− = |x| + hy, xi− , we have hλyλk , Jλk x0 − x)i− = |Jλk x0 − x| + h−(Jλk−1 − x), Jλk x0 − xi≥ |Jλk x0 − x)| − |Jλk−1 x0 − x|. 105 It thus follows from (6.4) that |Jλk x0 − x| − |Jλk−1 x0 − x| ≤ λ (ω |Jλk x0 − x| + hy, Jλk x0 − xi+ ). t Since J [ λ ] = Jλk on t ∈ [kλ, (k + 1)λ), this inequality can be written as Z (k+1)λ [t [t] (ω |Jλλ x0 − x| + hy, Jλλ x0 − xi+ dt. |Jλk x0 − x| − |Jλk−1 x0 − x| ≤ kλ ˆ Hence, summing up this in k from k = [ λt ] + 1 to [ λt ] we obtain Z [ t ]λ ˆ λ [s] [ λt ] [ λt ] [s] (ω |Jλλ x0 − x| + hy, Jλλ x0 − xi+ ds. |Jλ x0 − x| − |Jλ x0 − x| ≤ ˆ [ λt ]λ [s] Since |Jλλ x0 | ≤ (1 − λω)−k |x0 | ≤ ekλω |x0 |, by Lebesgue dominated convergence theorem and the upper semicontinuity of h·, ·i+ , letting λ → 0+ we obtain Z t |S(t)x0 − x| − |S(tˆ)x0 − x| ≤ (ω |S(s)x0 − x| + hy, S(t)x0 − xi+ ) ds (6.5) tˆ for x0 ∈ dom (A). Similarly, since S(t) is Lipschitz continuous on dom (A), again by Lebesgue dominated convergence theorem and the upper semicontinuity of h·, ·i+ , (6.5) holds for all x0 ∈ dom (A). (2) Let v(t) ∈ C(0, T ; X) be an integral of type ω to (6.1). Since [Jkλ x0 , yλk ] ∈ A, it follows from (6.2) that Z t k k (ω |v(s) − Jλk x0 | + hyλk , v(s) − Jλk x0 i+ ) ds. (6.6) |v(t) − Jλ x0 | − |v(tˆ) − Jλ x0 | ≤ tˆ Since λ yλk = −(v(s) − Jλk x0 ) + (v(s) − Jλk−1 x0 ) and from Lemma 1.1 (3) h−x + y, xi+ = −|x| + hy, xi+ , we have hλyλk , v(s)−Jλk x0 i = −|v(s)−Jλk x0 |+hv(s)−Jλk−1 x0 , v(s)−Jλk x0 i+ ≤ −|v(s)−Jλk x0 |+|v(s)−Jλk−1 x0 |. Thus, from (6.6) (|v(t)−Jλk x0 |−|v(tˆ)−Jλk x0 |)λ Z ≤ tˆ t (ωλ |v(s)−Jλk x0 |−|v(s)−Jλk x0 |+|v(s)−Jλk−1 x0 |) ds Summing up the both sides of this in k from [ λτˆ ] + 1 to [ λτ ], we obtain Z [ τ ]λ λ [σ] [σ] (|v(t) − Jλλ x0 | − |v(tˆ − Jλλ x0 | dσ τ ˆ [λ ]λ t Z ≤ (−|v(s) − tˆ [τ ] Jλλ x0 | + |v(s) − [ τˆ ] Jλλ x0 | Z + τ [λ ]λ τ ˆ [λ ]λ [σ] ω |v(s) − Jλλ x0 | dσ) ds. Now, by Lebesgue dominated convergence theorem, letting λ → 0+ Z τ Z t (|v(t) − u(σ)| − |v(tˆ − u(σ)|) dσ + (|v(s) − u(τ )| − |v(s) − u(ˆ τ )|) ds tˆ τˆ (6.7) Z tZ ≤ τ ω |v(s) − u(σ)| dσ ds. tˆ τˆ 106 For h > 0 we define Fh by −2 t+h Z t+h Z |v(s) − u(σ)| dσ ds. Fh (t) = h t t d Fh (t) ≤ ω Fh (t) and thus Fh (t) ≤ eωt Fh (0). Since u, v are dt continuous we obtain the desired estimate by letting h → 0+ . Then from (6.7) we have Lemma 3.3 Let A be an ω-dissipative subset satisfying (5.14) and S(t), t ≥ 0 be the semigroup on dom (A), as constructed in Theorem 2.2. Then, for x0 ∈ dom (A) and [x, y] ∈ A Z t (6.8) |S(t)x0 − x|2 − |S(tˆ)x0 − x|2 ≤ 2 (ω |S(s)x0 − x|2 + hy, S(s)x0 − xis ds tˆ and for every f ∈ F (x0 − x) lim sup Re h t→0+ S(t)x0 − x0 , f i ≤ ω |x0 − x|2 + hy, x0 − xis . t (6.9) Proof: Let ykλ be defined by (6.3). Since A is ω-dissipative, there exists f ∈ F (Jλk x0 −x) such that Re (yλk − y, f i ≤ ω |Jλk x0 − x|2 . 
Since Re hyλk , f i = λ−1 Re hJλk x0 − x − (Jλk−1 x0 − x), f i ≥ λ−1 (|Jλk x0 − x|2 − |Jλk−1 x0 − x||Jλk x0 − x|) ≥ (2λ)−1 (|Jλk x0 − x|2 − |Jλk−1 x0 − x|2 ), we have from Theorem 1.4 |Jλk x0 − x|2 − |Jλk−1 x0 − x|2 ≤ 2λ Re hyλk , f i ≤ 2λ (ω |Jλk x0 − x|2 + hy, Jλk x0 − xis . [t] Since Jλλ x0 = Jλk x0 on [kλ, (k + 1)λ), this can be written as |Jλk x0 |Jλk−1 x0 2 − x| − Z 2 (k+1)λ [t [t] (ω |Jλλ x0 − x|2 + hy, Jλλ x0 − xis ) dt. − x| ≤ kλ Hence, [t] |Jλλ x0 ˆ 2 − x| − [t] |Jλλ x0 Z 2 − x| ≤ 2λ [ λt ]λ ˆ [ λt ]λ [s] [s] (ω |Jλλ x0 − x|2 + hy, Jλλ x0 − xis ) ds. [s] Since |Jλλ x0 | ≤ (1 − λω)−k |x0 | ≤ ekλω |x0 |, by Lebesgue dominated convergence theorem and the upper semicontinuity of h·, ·is , letting λ → 0+ we obtain (6.8). Next, we show (6.9). For any given f ∈ F (x0 − x) as shown above 2Re hS(t)x0 − x0 , f i ≤ |S(t)x0 − x|2 − |x0 − x|2 . Thus, from (6.8) Z Re hS(t)x0 − x0 , f i ≤ t (ω |S(s)x0 − x|2 + hy, S(s)x0 − xis ) ds 0 107 Since s → S(s)x0 is continuous, by the upper semicontinuity of h·, ·is , we have (6.9). Theorem 3.4 Assume that A is a close dissipative subset of X × X and satisfies the range condition (5.14) and let S(t), t ≥ 0 be the semigroup on dom (A), defined in Theorem 2.2. Then, if S(t)x is strongly differentiable at t0 > 0 then S(t0 )x ∈ dom (A) and d S(t)x |t=t0 ∈ AS(t)x, dt S(t0 )x ∈ dom (A0 ) and d S(t)x t=t0 = A0 S(t0 )x. dt and moreover d Proof: Let dt S(t)x |t=t0 = y . Then S(t0 − λ)x − (S(t0 )x − λ y) = o(λ), where |o(λ)| λ → + 0 as λ → 0 . Since S(t0 − λ)x ∈ dom (A), there exists a [xλ , yλ ] ∈ A such that S(t0 − λ)x = xλ − λ yλ and λ (y − yλ ) = S(t0 )x − xλ + o(λ). (6.10) If we let x = xλ , y = yλ and x0 = S(t0 )x in (6.9), then we obtain Re hy, f i ≤ ω |S(t0 )x − xλ |2 + hyλ , S(t0 )x − xλ is . for all f ∈ (S(t0 )x − xλ ). It follows from Lemma 1.3 that there exists a g ∈ F (S(t0 )x − xλ ) such that hyλ , S(t)x − xλ is = Re hyλ , gi and thus Re hy − yλ , gi ≤ ω |S(t0 )x − xλ |2 . From (6.10) λ−1 |S(t0 )x − xλ |2 ≤ and thus |o(λ)| |S(t0 )x − xλ | + ω |S(t0 )x − xλ |2 λ S(t0 )x − xλ →0 (1 − λω) λ as λ → 0+ . Combining this with (6.10), we obtain xλ → S(t0 )x and yλ → y as λ → 0+ . Since A is closed, it follows that [S(t0 )x, y] ∈ A, which shows the first assertion. Next, from Theorem 2.2 |S(t0 + λ)x − S(t0 )x| ≤ λ kAS(t0 )xk for λ > 0 This implies that |y| ≤ kAS(t0 )xk. Since y ∈ AS(t0 )x, it follows that S(t0 )x ∈ dom (A0 ) and y ∈ A0 S(t0 )x. Theorem 3.5 Let A be a dissipative subset of X × X satisfying the range condition (5.14) and S(t), t ≥ 0 be the semigroup on dom (A), defined in Theorem 2.2. Then the followings hold. 108 (1) For ever x ∈ dom (A) lim |Aλ x| = lim inf |S(t)x − x|. λ→0+ t→0+ (2) Let x ∈ dom (A). Then, limλ→0+ |Aλ x| < ∞ if and only if there exists a sequence {xn } in dom (A) such that limn→∞ xn = x and supn kAxn k < ∞. Proof: Let x ∈ dom (A). From Theorem 1.4 |Aλ x| is monotone decreasing and [t] lim |Aλ x| exists (including ∞). Since |Jλλ − x| ≤ t |Aλ x|, it follows from Theorem 2.2 that |S(t)x − x| ≤ t lim |Aλ x| λ→0+ thus lim inf t→0+ 1 |S(t)x − x| ≤ lim |Aλ x|. t λ→0+ Conversely, from Lemma 3.3 − lim inf t→0+ 1 |S(t)x − x||x − u| ≤ hv, x − uis t for [u, v] ∈ A. We set u = Jλ x and v = Aλ x. Since x − u = −λ Aλ x, − lim inf t→0+ 1 |S(t)x − x|λ|Aλ x| ≤ −λ |Aλ x|2 t which implies 1 |S(t)x − x| ≥ |Aλ x|. t Theorem 3.7 Let A be a dissipative set of X × X satisfying the range condition (5.14) and let S(t), t ≥ 0 be the semigroup defined on dom (A) in Theorem 2.2. 
(1) If x ∈ dom (A) and S(t)x is differentiable a.e., t > 0, then u(t) = S(t)x, t ≥ 0 is a unique strong solution of the Cauchy problem (6.1). (2) If X is reflexive. Then, if x ∈ dom (A) then u(t) = S(t)x, t ≥ 0 is a unique strong solution of the Cauchy problem (6.1). lim inf t→0+ Proof: The assertion (1) follows from Theorems 3.1 and 3.5. If X is reflexive, then since an X-valued absolute continuous function is a.e. strongly differentiable, (2) follows from (1). 6.0.3 Infinitesimal generator Definition 4.1 Let X0 be a subset of a Banach space X and S(t), t ≥ 0 be a semigroup of nonlinear contractions on X0 . Set Ah = h−1 (T (h) − I) for h > 0 and define the strong and weak infinitesimal generators A0 and Aw by A0 x = limh→0+ Ah x with dom (A0 ) = {x ∈ X0 : limh→0+ Ah x exists} A0 x = w − limh→0+ Ah x with dom (A0 ) = {x ∈ X0 : w − limh→0+ Ah x exists}, (6.11) ˆ respectively. We define the set D by ˆ = {x ∈ X0 : lim inf |Ah x| < ∞} D h→0 109 (6.12) Theorem 4.1 Let S(t), t ≥ 0 be a semigroup of nonlinear contractions defined on a closed subset X0 of X. Then the followings hold. (1) hAw x1 − Aw x2 , x∗ i for all x1 , x2 ∈ dom (Aw ) and x∗ ∈ F (x1 − x2 ). In particular, A0 and Aw are dissipative. ˆ (2) If X is reflexive, then dom (A0 ) = dom (Aw ) = D. ˆ In addition, if X is (3) If X is reflexive and strictly convex, then dom(Aw ) = D. ˆ uniformly convex, then dom (Aw ) = dom (A0 ) = D and Aw = A0 . Proof: (1) For x1 , x2 ∈ X0 and x∗ ∈ F (x1 − x2 ) we have hAh x1 − Ah x2 , x∗ i = h−1 (hS(h)x1 − S(h)x2 , x∗ i − |x1 − x2 |2 ) ≤ h−1 (|S(h)x1 − S(h)x2 ||x1 − x2 | − |x1 − x2 |2 ) ≤ 0. Letting h → 0+ , we obtain the desired inequality. ˆ Let x ∈ D. ˆ It suffices to show that (2) Obviously, dom (A0 ) ⊂ dom (Aw ) ⊂ D. x ∈ dom (A0 ). We can show that t → S(t)x is Lipschitz continuous. In fact, there exists a monotonically decreasing sequence {tk } of positive numbers and L > 0 such that tk → 0 as k → ∞ and |S(tk )x − x| ≤ L tk . Let h > 0 and nk be a nonnegative integer such that 0 ≤ h − nk tk < tk . Then we have |S(t + h)x − S(t)x| ≤ |S(t)x − x| = |S(h − nk tk + nk tk )x − x| ≤ |S(h − nk tk )x − x| + L nk tk ≤ |S(h − nk tk )x − x| + L h By the strong continuity of S(t)x at t = 0, letting k → ∞, we obtain |S(t + h)x − S(t)x| ≤ L h. Now, since X is reflexive this implies that S(t)x is a.e. differentiable d on (0, ∞). But since S(t)x ∈ dom (A0 ) whenever dt S(t)x exists, S(t)x ∈ dom (A0 ) a.e. + t > 0. Thus, since |S(t)x − x| → 0 as t → 0 , it follows that x ∈ dom (A0 ). ˆ and Y be the set of all (3) Assume that X is reflexive and strictly convex. Let x0 ∈ D −1 + ˜ weak cluster points of t (S(t)x0 − x0 ) as t → 0 . Let A be a subset of X × X defined by ˜ = dom (A0 ) ∪ {x0 } A˜ = A0 ∪ [x0 , co Y ] and dom (A) where co Y denotes the closure of the convex hull of Y . Note that from (1) hA0 x1 − A0 x2 , x∗ i ≤ 0 for all x1 , x2 ∈ dom (A0 ) and x∗ ∈ F (x1 − x2 ) and for every y ∈ Y hA0 x1 − y, x∗ i ≤ 0 for all x1 ∈ dom (A0 ) and x∗ ∈ F (x1 − x0 ). This implies that A˜ is a dissipative subset of X×X. But, since X is reflexive, t → S(t)x0 is a.e. differentiable and d ˜ S(t)x0 = A0 S(t)x0 ∈ AS(t)x 0, dt a.e., t > 0. It follows from the dissipativity of A˜ that h d (S(t)x0 − x0 ), x∗ i ≤ hy, x∗ i, dt 110 ˜ 0 a.e, t > 0 and y ∈ Ax (6.13) for x∗ ∈ F (S(t)x0 − x0 ). Note that for h > 0 hh−1 (S(t+h)x0 −S(t)x0 ), x∗ i ≤ h−1 (|S(t+h)x0 −x0 |−|S(t)x0 −x0 |)|x∗ | for x∗ ∈ F (S(t)x0 −x0 ). Letting h → 0+ , we have h d d (S(t)x0 − x0 ), x∗ i ≤ |S(t)x0 − x0 | |S(t)x0 − x0 |, dt dt a.e. 
t > 0 The converse inequality follows much similarly. Thus, we have |S(t)x0 − x0 | d d |S(t)x0 − x0 | = h (S(t)x0 − x0 ), x∗ i for x∗ ∈ F (S(t)x0 − x0 ). (6.14) dt dt d It follows from (6.13)–(6.14) that |S(t)x0 − x0 | ≤ |y| for y ∈ Y and a.e. t > 0 and dt thus ˜ 0 k for all t > 0. |S(t)x0 − x0 | ≤ t kAx (6.15) ˜ 0 = co Y is a closed convex subset of X. Since X is reflexive and strictly Note that Ax ˜ 0 such that |y0 | = kAx ˜ 0 k. Hence, (6.15) convex, there exists a unique element y0 ∈ Ax implies that co Y = y0 = Aw x0 and therefore x0 ∈ dom (Aw ). ˆ Then Next, we assume that X is uniformly convex and let x0 ∈ dom(Aw ) = D. w − lim S(t)x0 − x0 = y0 t as t → 0+ . From (6.15) |t−1 (S(t)x0 − x0 )| ≤ |y0 |, a.e. t > 0. Since X is uniformly convex, these imply that lim S(t)x0 − x0 = y0 t as t → 0+ . which completes the proof. Theorem 4.2 Let X and X ∗ be uniformly convex Banach spaces. Let S(t), t ≥ 0 be the semigroup of nonlinear contractions on a closed subset X0 and A0 be the infinitesimal generator of S(t). If x ∈ dom (A0 ), then (i) S(t)x ∈ dom (A0 ) for all t ≥ 0 and the function t → A0 S(t)x is right continuous on [0, ∞). + + (ii) S(t)x has a right derivative ddt S(t)x for t ≥ 0 and ddt S(t)x = A0 S(t)x, t ≥ 0. d (iii) dt S(t)x exists and is continuous except a countable number of values t ≥ 0. ˆ and thus S(t)x ∈ Proof: (i) − (ii) Let x ∈ dom (A0 ). By Theorem 4.1, dom (A0 ) = D dom (A0 ) and d+ S(t)x = A0 S(t)x for t ≥ 0. (6.16) dt d Moreover, t → S(t)x a.e. differentiable and dt S(t)x = A0 S(t)x a.e. t > 0. We next prove that A0 S(t)x is right continuous. For h > 0 d (S(t + h)x − S(t)x) = A0 S(t + h)x − A0 S(t)x, dt 111 a.e. t > 0. From (6.14) |S(t + h)x − S(t)x| d |S(t + h)x − S(t)x| = hA0 S(t + h)x − A0 S(t)x, x∗ i ≤ 0 dt for all x∗ ∈ F (S(t + h)x − S(t)x), since A0 is dissipative. Integrating this over [s, t], we obtain |S(t + h)x − S(t)x| ≤ |S(s + h)x − S(s)x| for 0 ≤ s ≤ t and therefore d+ d+ S(t)x| ≤ | S(s)x|. dt ds Hence t → |A0 S(t)x| is monotonically non-increasing function and thus it is right continuous. Let t0 ≥ 0 and let {tk } be a decreasing sequence of positive numbers such that tk → t0 . Without loss of generality, we may assume that w−limk→∞ A0 S(tk ) = y0 . The right continuity of |A0 S(t)x| at t = t0 , thus implies that | |y0 | ≤ |A0 S(t0 )x| (6.17) since norm is weakly lower semicontinuous. Let A˜0 be the maximal dissipative exten˜ sion of A0 . It then follows from Theorem 1.9 that A˜ is demiclosed and thus y0 ∈ AS(t)x. ˜ On the other hand, for x ∈ dom (A0 ) and y ∈ Ax, we have h d (S(t)x − x), x∗ i ≤ hy, x∗ i for all x∗ ∈ F (S(t)x − x) dt d ˜ a.e. t > 0, since A˜ is dissipative and dt S(t)x = A0 S(t)x ∈ AS(t)x a.e. t > 0. From (6.14) we have ˜ = kA˜0 xk for t ≥ 0 t−1 |S(t)x − x| ≤ |Ax| ˜ Hence A0 x = A˜0 x. It thus follows from (6.17) where A˜0 is the minimal section of A. that y0 = A0 S(t0 )x and limk→∞ A0 S(tk )x = y0 since X is uniformly convex. Thus, we have proved the right continuity of A0 S(t)x for t ≥ 0. (iii) Integrating (6.16) over [t, t + h], we have Z t+h S(t + h)x − S(t)x = A0 S(s)x ds t for t, h ≥ 0. Hence it suffices to prove that the function t → A0 S(t)x is continuous except a cuntable number of t > 0. Using the same arguments as above, we can show that if |A0 S(t)x| is continuous at t = t0 , then A0 S(t)x is continuous at t = t0 . But since |A0 S(t)x| is monotone non-increasing, it follows that it has at most countably many discontinuities, which completes the proof. Theorem 4.3 Let X and X ∗ be uniformly convex Banach spaces. 
If A be m-dissipative, then A is demiclosed, dom (A) is a closed convex set and A0 is single-valued operator with dom (A0 ) = dom (A). Moreover, A0 is the infinitesimal generator of a semigroup of contractions on dom (A). Proof: It follows from Theorem 1.9 that A is demiclosed. The second assertion follows from Theorem 1.12. Also, from Theorem 3.4 d S(t)x = A0 S(t)x, dt 112 a.e. t > 0 and |S(t)x − x| ≤ t |A0 x|, t>0 (6.18) for x ∈ dom (A). Let A0 be the infinitesimal generator of the semigroup S(t), t ≥ 0 generated by A defined in Theorem 2.2. Then, (6.18) implies that by Theorem 4.1 + x ∈ dom (A0 ) and by Theorem 4.2 ddt S(t)x = A0 S(t)x and A0 S(t)x is right continuous in t. Since A is closed, A0 x = lim A0 S(t)x ∈ Ax. t→0+ Hence, (6.17) implies that A0 x = A0 x. When X is a Hilbert space we have the nonlinear version of Hille-Yosida theorem as follows. Theorem 4.4 Let H be a Hilbert space. Then, (1) The infinitesimal generator A0 of a semigroup of contractions S(t), t ≥ 0 on a closed convex set X0 has a dense domain in X0 and there exists a unique maximal dissipative operator A such that A0 = A0 . Conversely, (2) If A0 is a maximal dissipative operator, then dom (A) is a closed convex set and A0 is the infinitesimal generator of contractions on dom (A). Proof: (2) Since from Theorem 1.7 the maximal dissipative operator in a Hilbert space is m-dissipative, (2) follows from Theorem 4.3. Example (Nonlinear Diffusion) ut = Au = ∆γ(u) + ∇ · (f~(u)) on X = L1 . Assume γ : R → R is strictly monotone. Thus, sign(x − y) = sign(γ(x) − γ(y)) (Au1 −Au2 , ρ (γ(u1 )−γ(u2 ))) = −(∇(γ(u1 )−γ(u2 ))ρ0 )∇(γ(u1 )−γ(u2 )))−(ρ0 (u1 −u2 ))(f (u1 )−f (u2 )), ∇(u1 −u 6.1 Fixed Point Theorems Banach Fixed Point Theorem. Let (X, d) be a non-empty complete metric space with a contraction mapping T : X → X. Then T admits a unique fixed-point x∗ in X (i.e. T (x∗) = x∗). Moreover, x∗ is a fixed point of the fixed point iterate xn = T (xn−1 ) with arbitrary initial condition x0 . Proof: d(xn+1 , xn ) = d(T (xn ), T (xn−1 ) ≤ ρ d(xn , xn−1 . By the induction d(xn+1 , xn ) ≤ ρn d(x1 , x0 ) and d(xm , xn ) ≤ ρn−1 d(x1 , x0 ) → 0 1−ρ as m ≥ n → ∞. Thus, {xn } is a Cauchy sequence and {xn } converges a fixed point x∗ of T . Suppose x∗1 , x∗2 are two fixed point. Then, d(x∗1 , x∗2 ) = d(T (x∗1 ), T (x∗2 ) ≤ ρ d(x∗1 , x∗2 ) 113 Since ρ < 1, d(x∗1 , x∗2 ) = 0 and thus x∗1 = x∗2 . Brouwer fixed point theorem: is a fundamental result in topology which proves the existence of fixed points for continuous functions defined on compact, convex subsets of Euclidean spaces. Kakutani’s theorem extends this to set-valued functions. Kakutani’s theorem states: Let S be a non-empty, compact and convex subset of some Euclidean space Rn . Let T : S → S be a set-valued function on S with a closed graph and the property that T (x) is non-empty and convex for all x ∈ S. Then T has a fixed point. The Schauder fixed point theorem is an extension of the Brouwer fixed point theorem to topological vector spaces. Schauder fixed point theorem Let X be a locally convex topological vector space, and let K ⊂ X be a compact, and convex set. Then given any continuous mapping T : K → K has a fixed point. Given > 0 there exists the family of open sets {B ?(x) : x ∈ K} is open covering of K Since K is compact. there exists a finite open subcover, i.e. there exist finite many points {xk } in K such that K= n [ B (xk ) k=1 Define the functions {gk } by P gk (x) = max(0, − |x − xk |). 
It is clear that each gk is continuous, gk (x) ≥ 0 and k gk (x) > 1 for all x ∈ K. Thus, we can define a function on K by P k gk (x) xk g?(x) = P k gk (x) and g is a continuous function from K to the convex hull K0 of of {xk }. It is easily shown that |g?(x) − x| ≤ for x ∈ K. Then A maps K0 to K0 Since K0 is compact convex subset of a finite dimensional vector space, we can apply the Brouwer fixed point theorem to assure the existence of z ∈ K0 such that x = A(x ). We have the estimate |z − T (z )| = |g(T (z )) − T (z )| ≤ . Since K is compact, the exits a subsequence n → 0 such that xn → x0 ∈ K and thus |xn − T (xn )| ≤ |g(T (xn )) − T (xn )| ≤ n . Since T is continuous, T (x0 ) = x0 . Schaefer’s fixed point theorem: Let T be a continuous and compact mapping of a Banach space X into itself, such that the set {x ∈ X : x = λT x for some 0 ≤ λ ≤ 1} is bounded. Then T has a fixed point. Proof: Let |x| ≤ M for {x ∈ X : x = λT x for some 0 ≤ λ ≤ 1}. Define T˜(x) = M T (x) max(M, |T (x)|) 114 Note that T˜ : B(0, M ) → B(0, M ).. Let K be the closed convex hull go T˜(B(0, M )). Since T is compact, K is a compact convex subset of X. Since T˜K → K, it follows from Schauder’s fixed point theorem that there exits x ∈ K such that x = T˜(x). We now claim that x is a fixed point of T . If not, we should have |T (x)| > M and x = λ T (u) for λ = M <1 |T (x)| But, since |x| = |T˜(x)| ≤ M , which is a contradiction. The advantage of Schaefer’s theorem over Schauder’s for applications is that it is not necessary to determine an explicit convex, compact set K. Krasnoselskii theorem The sum of two operators T = A + B has a fixed point in a nonempty closed convex subset C of a real Banach space X under conditions such that T (C) ⊂ C, A is continuous on C, A(C) is a relatively compact subset of X, and B is a strict contraction on C. Note that the proof of Kransnoselskiis fixed point theorem combines the Banach contraction theorem and Schauders fixed point theorem, i.e., y ∈ C, Ay + Bx = x has the fixed point map x = ψ(y) and use Schauders fixed point theorem for ψ. Kakutani-Glicksberg-Fan theorem Let S be a non-empty, compact and convex subset of a locally convex topological vector space X. Let T : S → S be a Kakutani map, i.e., it is upper hemicontinuous, i.e., if for every open set W ⊂ X, the set {x ∈ X : T (x) ∈ W } is open in X and T (x) is non-empty, compact and convex for all x ∈ X. Then T has a fixed point. The corresponding result for single-valued functions is the Tychonoff fixed-point theorem. 7 Convex Analysis and Duality This chapter is organized as follows. We present the basic convex analysis and duality theory and subdifferntial and monotone operator theory in Banach spaces. 7.1 Convex Functional and Subdifferential Definition (Convex Functional) (1) A proper convex functional on a Banach space X is a function ϕ from X to (−∞, ∞], not identically +∞ such that ϕ((1 − λ) x1 + λ x2 ) ≤ (1 − λ) ϕ(x1 ) + λ ϕ(x2 ) for all x1 , x2 ∈ X and 0 ≤ λ ≤ 1. (2) A functional ϕ : X → R is said to be lower-semicontinuous if ϕ(x) ≤ lim inf ϕ(y) y→x for all x ∈ X. (3) A functional ϕ : X → R is said to be weakly lower-semicontinuous if ϕ(x) ≤ lim inf ϕ(xn ) n→∞ for all weakly convergent sequence {xn } to x. 115 (4) The subset D(ϕ) = {x ∈ X; ϕ(x) ≤ ∞} of X is called the domain of ϕ. (5) The epigraph of ϕ is defined by epi(ϕ) = {(x, c) ∈ X × R : ϕ(x) ≤ c}. Lemma 3 A convex functional ϕ is lower-semicontinuous if and only if it is weakly lower-semicontinuous on X. 
Proof: Since the level set {x ∈ X : ϕ(x) ≤ c} is a closed convex subset if ϕ is lowersemicontinuous. Thus, the claim follows the fact that a convex subset of X is closed if and only if it is weakly closed. Lemma 4 If ϕ be a proper lower-semicontinuous, convex functional on X, then ϕ is bounded below by an affine functional, i.e., there exist x∗ ∈ X ∗ and c ∈ R such that ϕ(x) ≥ hx, x∗ i + β, x ∈ X. Proof: Let x0 ∈ X and β ∈ R be such that ϕ(x0 ) > c. Since ϕ is lower-semicontinuous on X, there exists an open neighborhood V (x0 ) of X0 such that ϕ(x) > c for all x ∈ V (x0 ). Since the ephigraph epi(ϕ) is a closed convex subset of the product space X × R. It follows from the separation theorem for convex sets that there exists a closed hyperplane H ⊂ X × R; H = {(x, r) ∈ X × R : hx, x∗0 i + r = α} with x∗0 ∈ X ∗ , α ∈ R, that separates epi(ϕ) and V (x0 ) × (−∞, c). Since {x0 } × (−∞, c) ⊂ {(x, r) ∈ X × R : hx, x∗0 i + r < α} it follows that hx, x∗0 i + r > α for all (x, c) ∈ epi(ϕ) which yields the desired estimate. Definition (Subdifferential) Given a proper convex functional ϕ on a Banach space X the subdifferential of ∂ϕ(x) is a subset in X ∗ , defined by ∂ϕ(x) = {x∗ ∈ X ∗ : ϕ(y) − ϕ(x) ≥ hy − x, x∗ i for all y ∈ X}. Since for x∗1 ∈ ∂ϕ(x1 ) and x∗2 ∈ ∂ϕ(x2 ), ϕ(x1 ) − ϕ(x2 ) ≤ hx1 − x2 , x∗2 i ϕ(x2 ) − ϕ(x1 ) ≤ hx2 − x1 , x∗1 i it follows that hx1 − x2 , x∗1 − x∗2 i ≥ 0. Hence ∂ϕ is a monotone operator from X into X ∗. Example 1 Let ϕ be Gateaux differentiable at x. i.e., there exists w∗ ∈ X ∗ such that lim t→0+ ϕ(x + t v) − ϕ(x) = hh, w∗ i t for all h ∈ X and w∗ is the Gateaux differential of ϕ at x and is denoted by ϕ0 (x). If ϕ is convex, then ϕ is subdifferentiable at x and ∂ϕ(x) = {ϕ0 (x)}. Indeed, for v = y − x ϕ(x + t (y − x)) − ϕ(x) ≤ ϕ(y) − ϕ(x), t 116 0<t<1 Letting t → 0+ we have ϕ(y) − ϕ(x) ≥ hy − x, ϕ0 (x)i for all y ∈ X, and thus ϕ0 (x) ∈ ∂ϕ(x). On the other hand if w∗ ∈ ∂ϕ(x) we have for y ∈ X and t > 0 ϕ(x + t y) − ϕ(x) ≥ hy, w∗ i. t Taking limit t → 0+ , we obtain hy, ϕ0 (x) − w∗ i ≥ 0 for all y ∈ X. This implies w∗ = ϕ0 (x). Example 2 If ϕ(x) = 21 |x|2 then we will show that ∂ϕ(x) = F (x), the duality mapping. In fact, if x∗ ∈ F (x), then hx − y, x∗ i = |x|2 − hy, x∗ i ≥ 1 (|x|2 − |y|2 ) 2 for all y ∈ X. Thus x∗ ∈ ∂ϕ(x). Conversely, if x∗ ∈ ∂ϕ(x) then |x + t y|2 ≥ |x|2 + 2t hy − x, x∗ i for all y ∈ X and t > 0. It can be shown that this implies |x∗ | = |x| and hx, x∗ i = |x|2 . Hence x∗ ∈ F (x). Example 3 Let K be a closed convex subset of X and IK be the indicator function of K, i.e., 0 if x ∈ K IK (x) = ∞ otherwise. Obviously, IK is convex and lower-semicontinuous on X. By definition we have for x∈K ∂IK (x) = {x∗ ∈ X ∗ : hx − y, x∗ i ≥ 0 for all y ∈ K} Thus D(IK ) = D(∂IK ) = K and ∂K (x) = {0} for each interior point of K. Moreover, if x lies on the boundary of K, then ∂IK (x) coincides with the cone of normals to K at x. Definition (Conjugate function) Given a proper functional ϕ on a Banach space X, the conjugate functional on X ∗ is defined by ϕ∗ (x∗ ) = sup{hx∗ , xi − ϕ(x)}. y Theorem 2.1 The conjugate function ϕ∗ is convex lower semi-continuous and proper. Proof: For 0 < t < 1 and x∗1 , x∗2 ∈ X ∗ ϕ∗ (t x∗1 + (1 − t) x∗2 ) = supy {ht x∗1 + (1 − t) x∗2 , xi − ϕ(x)} ≥ t supy {hx∗1 , xi − ϕ(x)} + (1 − t) supy {hx∗2 , xi − ϕ(x)} = t ϕ∗ (x∗1 ) + (1 − t) ϕ∗ (x∗2 ). 117 Thus, ϕ∗ is convex. 
For > 0 arbitrary let y ∈ X be ϕ∗ (x∗ ) − ≥ hx∗ , yi − ϕ(y) For xn → x∗ ϕ∗ (x∗n )hx∗n , yi − ϕ(y) and lim inf ϕ∗ (x∗n ) ≥ hx∗ , yi − ϕ(y) n→∞ Since > arbitrary, ϕ∗ is lower semi-continuous. Definition (Bi-Conjugate Function) For any proper functional ϕ on X the biconjugate function of ϕ is ϕ∗∗ : X → (−∞, ∞] is defined by ϕ∗∗ (x) = sup {hx∗ , xi − ϕ∗ (x∗ )}. x∗ ∈X ∗ Theorem C.3 For any F : X → (−∞, ∞] F ∗∗ is convex and l.s.c. and F ∗∗ ≤ F . Proof: Note that F ∗∗ (¯ x) = sup {hx∗ , x ¯i − F ∗ (x∗ )} x∗ ∈D(F ∗ ) Since for x∗ ∈ D(F ∗ ) hx∗ , xi − F (x∗ ) ≤ F (x) for all x, F ∗∗ (¯ x) ≤ F (¯ x). Theorem If F is a proper l.s.c convex function, then F = F ∗∗ . If X is reflexive, then (F ∗ )∗ = F ∗∗ . Proof: By Theorem if F is a proper l.s.c. convex function, then F is bounded below by an affine functional hx∗ , xi − a, i.e. a ≥ hx∗ , xi − F (x) for all x ∈ X. Define a(x∗ ) = inf{a ∈ R : a ≥ hx∗ , xi − F (x) for all x ∈ X} Then for a ≥ a(x∗ ) hx∗ , xi − a ≤ hx∗ , xi − a(x∗ ) for all x ∈ X. By Theorem again F (¯ x) = sup {hx∗ , x ¯i − a(x∗ )}. x∗ ∈X ∗ Since a(x∗ ) = sup {hx∗ , xi − F (x)} = F ∗ (x) x∈X we have F (x) = sup {hx∗ , xi − F ∗ (x∗ )}. x∗ ∈X ∗ We have the duality property. Theorem 2.2 Let F be a proper convex function on X. (1) For F : X → (−∞, ∞] x∗ ∈ ∂F (¯ x) is if and only if F (¯ x) + F ∗ (x∗ ) = hx∗ , x ¯i. 118 (2) Assume X is reflexive. If x∗ ∈ ∂F (¯ x), then x ¯ ∈ ∂F ∗ (x∗ ). If F is convex and lower ∗ semi-continuous, then x ∈ ∂F (¯ x) if and only if x ¯ ∈ ∂F ∗ (x∗ ). Proof: (1) Note that x∗ ∈ ∂F (¯ x) is if and only if hx∗ , xi − F (x) ≤ hx∗ , x ¯i − F (¯ x) (7.1) for all x ∈ X. By the definition of F ∗ this implies F ∗ (x∗ ) = hx∗ , x ¯i − F (¯ x). Conversely, if F (x∗ ) + F (¯ x) = hx∗ , x ¯i then (7.1) holds for all x ∈ X. (2) Since F ∗∗ ≤ F by Theorem C.3, it follows from (1) F ∗∗ (¯ x) ≤ hx∗ , x ¯i − F ∗ (x∗ ) But by the definition of F ∗∗ F ∗∗ (¯ x) ≥ hx∗ , x ¯i − F ∗ (x∗ ). Hence F ∗ (x∗ ) + F ∗∗ (¯ x) = hx∗ , x ¯i. Since X is reflexive, F ∗∗ = (F ∗ )∗ . If we apply (1) to F ∗ we have x ¯ ∈ ∂F ∗ (x∗ ). In addition if F is convex and l.s.c, it follows from Theorem C.1 that F ∗∗ = F . Thus if x∗ ∈ ∂F ∗ (¯ x) then F (¯ x) + F ∗ (x∗ ) = F ∗ (x∗ ) + F ∗∗ (x) = hx∗ , x ¯i by applying (1) for F ∗ . Therefore x∗ ∈ ∂F (¯ x) again by (1). Theorem(Rockafellar) Let X be real Banach space. If ϕ is lower-semicontinuous proper convex functional on X, then ∂ϕ is a maximal monotone operator from X into X ∗. Proof: We prove the theorem when X is reflexive. By Apuland theorem we can assume that X and X ∗ are strictly convex. By Minty-Browder theorem ∂ϕ it suffices to prove that R(F + ∂ϕ) = X ∗ . For x∗0 ∈ X ∗ we must show that equation x∗0 ∈ F x + ∂ϕ(x) has at least a solution x0 Define the proper convex functional on X by f (x) = 1 2 |x| + ϕ(x) − hx, x∗0 i. 2 X Since f is lower-semicontinuous and f (x) → ∞ as |x| → ∞ there exists x0 ∈ D(f ) such that f (x0 ) ≤ f (x) for all x ∈ X. Since F is monotone ϕ(x) − ϕ(x0 ) ≥ hx − x0 , x∗0 i − hx − x0 , F (x)i Setting xt = x0 + t (u − x0 ) and since ϕ is convex, we have 1 ϕ(u) − ϕ(x0 ) ≥ ϕ(xt ) − ϕ(x0 ) ≥ hu − x0 , x∗0 i − hu − x0 , F (xt )i. t Taking limit t → 0+ , we obtain ϕ(u) − ϕ(x0 ) ≥ hu − x0 , x∗0 i − hu − x0 , F (x0 )i, which implies x∗0 − F (x0 ) ∈ ∂ϕ(x0 ). 119 We have the perturbation result. Theorem Assume that X is a real Hilbert space and that A is a maximal monotone operator on X. 
Let ϕ be a proper, convex and lower semi-continuous functional on X satisfying dom(A) ∩ dom(∂ϕ) is not empty and ϕ((I + λ A)−1 x) ≤ ϕ(x) + λ M, for all λ > 0, x ∈ D(ϕ), where M is some non-negative constant. Then the operator A + ∂ϕ is maximal monotone. Lemma Let A and B be m-dissipative operators on X. Then for every y ∈ X the equation y ∈ −Ax − Bλ x (7.2) has a unique solution x ∈ dom (A). Proof Equation (9.17) is equivalent to y = xλ − wλ − Bλ xλ for some wλ ∈ A(xλ ). Thus, xλ − = λ λ 1 wλ = y+ (xλ + λ Bλ xλ ) λ+1 λ+1 λ+1 λ 1 y+ (I − λ B)−1 . λ+1 λ+1 Since A is m-dissipative, we conclude that (9.17) is equivalent to that xλ is the fixed point of the operator Fλ x = (I − λ 1 λ A)−1 ( y+ (I − λ B)−1 x). λ+1 λ+1 λ+1 By m-dissipativity of the operators A and B their resolvents are contractions on X and thus λ |Fλ x1 − Fλ x2 | ≤ |x1 − x2 | for all λ > 0, x1 , x2 ∈ X. λ+1 Hence, Fλ has the unique fixed point xλ and xλ ∈ dom (A) solves (9.17). Proof: From Lemma there exists xλ for y ∈ X such that y ∈ xλ − (−A)λ xλ + ∂ϕ(xλ ) Moreover, one can show that |xλ | is bounded uniformly. Since y − xλ + (−A)λ xλ ∈ ∂ϕ(xλ ) for z ∈ X ϕ(z) − ϕ(xλ ) ≥ (z − xλ , y − xλ + (−A)λ xλ ) Letting λ(I + λA)−1 x, so that z − xλ = λ(−A)λ xλ and we obtain (λ(−A)λ xλ , y − xλ + (−A)λ xλ ) ≤ ϕ((I + λ A)−1 ) − ϕ(xλ ) ≤ λ M, and thus |(−A)λ xλ |2 ≤ |(−A)λ xλ ||y − xλ | + M. Since xλ | is bounded and so that |(−A)λ xλ |. 120 7.2 Duality Theory In this section we discuss the duality theory. We call (P ) inf x∈X F (x) is the primal problem, where X is a Banach space and F : X → (−∞, ∞] is a proper l.s.c. convex function. We have the following result for the existence of minimizer. Theorem 2 Let X be reflexive and F be a lower semicontinuous proper convex functional defined on X. Suppose F lim F (x) = ∞ as |x| → ∞. (7.3) Then there exists an x ¯ ∈ X such that F (¯ x) = inf {F (x); x ∈ X}. Proof: Let η = inf{F (x); x ∈ X} and let {xn } be a minimizing sequence such that lim F (xn ) = η. The condition (7.3) implies that {xn } is bounded in X. Since X is reflexive there exists a subsequence that converges weakly to x ¯ in X and it follows from Lemma 1 that F (¯ x) = η. We imbed (P ) into a family of perturbed problem (Py ) inf x∈X Φ(x, y) where y ∈ Y , a Banach space is an imbedding variable and Φ : X × Y → (−∞, ∞] is a proper l.s.c. convex function with Φ(x, 0) = F (x). Thus (P0 ) = (P ). For example in terms of (12.1) we let F (x) = f (x) + ϕ(Λx) (7.4) and with Y = H Φ(x, y) = f (x) + ϕ(Λx + y) Definition The dual problem of (P ) with respect to Φ is (P ∗ ) sup (−Φ∗ (0, y ∗ )). y ∗ ∈Y The value function of (Py ) is defined by h(y) = inf x∈X Φ(x, y), y ∈ Y. We assume that h(y) > −∞ for all y ∈ Y . Then we have Theorem D.0 inf (P ) ≥ sup (P ∗ ). Proof. For any (x, y) ∈ X × Y and (x∗ , y ∗ ) ∈ X ∗ × Y ∗ Φ∗ (x∗ , y ∗ ) ≥ hx∗ , xi + hy ∗ , yi − Φ(z). Thus F (x) + Φ∗ (0, y ∗ ) ≥ h0, xi + hy ∗ , 0i = 0 121 (7.5) for all x ∈ X and y ∗ ∈ Y ∗ . Therefore inf (P ) = inf F (x) ≥ sup (−Φ∗ (0, y ∗ )) = sup (P ∗ ). y ∗ ∈Y ∗ In what follows we establish the equivalence of the primal problem (P ) and the dual problem (P ∗ ). We start with the properties of h. Lemma D.1 h is convex. Proof. The proof is by contradiction. Suppose there exist y1 , y2 ∈ Y and θ ∈ (0, 1) such that h(θ y1 + (1 − θ) y2 ) > θ h(y1 ) + (1 − θ) h(y2 ). Then there exists c and > 0 such that h(θ y1 + (1 − θ) y2 ) > c > c − > θ h(y1 ) + (1 − θ) h(y2 ). Let a1 = h(y1 ) + θ > h(y1 ) and a2 = c − − θ h(y) c − θ a1 = > h(y2 ). 
1−θ 1−θ By definition of h there exist x1 , x2 ∈ X such that h(y1 ) ≤ Φ(x, y1 ) ≤ a1 and h(y2 ) ≤ Φ(x2 , y2 ) ≤ a2 . Thus h(θ y1 + (1 − θ) y2 ) ≤ Φ(θ x1 + (1 − θ) x2 , θ y1 + (1 − θ) y2 ) ≤ θ Φ(x1 , y1 ) + (1 − θ) Φ(x2 , y2 ) ≤ θ a1 + (1 − θ) a2 = c which is a contradiction. Hence h is convex. Lemma D.2 For all y ∗ ∈ Y ∗ , h∗ (y) = Φ∗ (0, y ∗ ). Proof. h(y ∗ ) = supy∈Y (hy ∗ , yi − h(y)) = supy∈Y (hy ∗ , yi − inf x∈X Φ(x, y)) = sup sup ((hy ∗ , yi − Φ(x, y)) = sup sup (h0, xi + hy ∗ , yi − Φ(x, y)) y∈Y x∈X y∈Y x∈X = sup(x,y)∈X×Y (h(0, y ∗ ), (x, y)i − Φ(x, y)) = Φ∗ (0, y ∗ ). Note that h is not necessarily l.s.c.. Theorem D.3 If h is l.s.c. at 0, then inf (P ) = sup (P ∗ ). Proof. Since F is proper, h(0) = inf x∈X F (x) < ∞. Since h is convex by Lemma D.1, it follows from Theorem C.4 that h(0) = h∗∗ (0). Thus sup (P ∗ ) = supy∗ ∈Y ∗ (−Φ(0, y ∗ )) = supy∗ ∈Y ∗ ((hy ∗ , 0i − h∗ (y ∗ )) = h∗∗ (0) = h(0) = inf (P )square 122 Theorem D.4 If h is subdifferentiable at 0, then inf (P ) = sup (P ∗ ) and ∂h(0 is the set of solutions of (P ∗ ). Proof. By Lemma D.2 y¯∗ solves (P ∗ ) if and only if −h∗ (¯ y ∗ ) = −Φ(0, y¯∗ ) = supy∗ ∈Y ∗ (−Φ(0, y ∗ )) = supy∗ (hy ∗ , 0) − h∗ (y ∗ )) = h∗∗ (0). By Theorem 2.2 h∗∗ (0) + h∗∗∗ (¯ y ∗ ) = h¯ y ∗ , 0) = 0 if and only if y¯∗ ∈ ∂h∗∗ (0). Since h∗∗∗ = h∗ by Theorem C.5, y ∗ solves (P ∗ ) if and only if y ∗ ∈ ∂h∗∗ (y ∗ ). Since ∂h(0) is not empty, ∂h(0) = ∂h∗∗ (0). Therefore ∂h(0) is the set of all solutions of (P ∗ ) and (P ∗ ) has at least one solution. Let y ∗ ∈ ∂h(0). Then hy ∗ , xi + h(0) ≤ h(x) for all x ∈ X. Let {xn } is a sequence in X such that xn → 0. lim inf h(xn ) ≥ limhy ∗ , xn i + h(0) = h(0) n→∞ and h is l.s.c. at 0. By Theorem D.3 inf (P ) = sup (P ∗ ). Corollary D.6 If there exists an x ¯ ∈ X such that Φ(¯ x, ·) is finite and continuous at 0, h is continuous on an open neighborhood U of 0 and h = h∗∗ . Moreover, inf (P ) = sup (P ∗ ) and ∂h(0) is the set of solutions of (P ∗ ). ¯ ·) is bounded above on an open Proof. First show that h is continuous. Clearly, Φ(X, neighborhood U of 0. Since for all y ∈ Y h(y) ≤ Φ(¯ x, y), his bounded above on U . Since h is convex by Lemma D.1 and h is continuous by Theorem C.6. Hence h = h∗∗ by Theorem C. Now, h is subdifferential at 0 by Theorem C.10. The conclusion follows from Theorem D.4. Example D.4 Consider the example (7.4)–(7.5), i.e., Φ(x, y) = f (x) + ϕ(Λx + y) for (12.1). Let us calculate the conjugate of Φ. Φ∗ (x∗ , y ∗ ) = supx∈X supy∈Y {hx∗ , xi + hy ∗ , yi − Φ(x, y)} = supx∈X supy∈Y {hx∗ , xi + hy ∗ , yi − f (x) − ϕ(Λx + y)} = supx∈X {hx∗ , xi − f (x) supy∈Y [hy ∗ , yi − ϕ(Λx + y)]} where supy∈Y [hy ∗ , yi − ϕ(Λx + y)] = supy∈Y [hy ∗ , Λx + yi − ϕ(Λx + y) − hy ∗ , Λxi] = supz∈Y [hy ∗ , zi − G(z)] − hy ∗ , Λxi = ϕ(y ∗ ) − hy ∗ , Λxi. 123 Thus Φ∗ (x∗ , y ∗ ) = supx∈X {hx∗ , xi − hy ∗ , Λxi − f (x) + ϕ(y ∗ )} = sup {hx∗ − Λ∗ y ∗ , xi − f (x) + ϕ(y ∗ )} = f ∗ (x∗ − Λy ∗ ) + ϕ∗ (y ∗ ). x∈X Theorem D.7 For any x ¯ ∈ X and y¯∗ ∈ Y ∗ , the following statements are equivalent. (1) x ¯ solves (P ), y¯∗ solves (P ∗ ), and min (P ) = max (P ∗ ). (2) Φ(¯ x) + Φ∗ (0, y¯∗ ) = 0. (3) (0, y¯∗ ) ∈ ∂Φ(¯ x, 0). Proof: Suppose (1) holds, then obviously (2) holds. Suppose (2) holds, then since Φ(¯ x, 0) = F (¯ x) ≥ inf (P ) ≥ sup (P ∗ ) ≥ −Φ(0, y¯∗ ) by Theorem D.2, thus Φ(¯ x, 0) = min (P ) = sup (P ∗ ) = −Φ∗ (0, y¯∗ ) Therfore (2) implies (1). Since h(0, y¯∗ ), (¯ x, 0)i = 0, (2) and (3) by Theorem C.7. Any solution y ∗ of (P ∗ ) is called a Lagrange multiplier associated with Φ. 
In fact for the above Example the optimality condition implies 0 = Φ(¯ x, 0) + Φ∗ (0, y¯∗ ) = F (¯ x) + F ∗ (−Λ∗ y¯∗ ) + ϕ(Λ¯ x) + ϕ(¯ y∗) (7.6) = [F (¯ x) + F ∗ (−Λ∗ y¯∗ ) − h−Λ∗ y¯∗ , x ¯i] + [ϕ(Λ¯ x) + ϕ∗ (¯ y ∗ ) − h¯ y ∗ , Λ¯ x)] Since each expression of (7.6) in square brackets is nonnegative, it follows that F (¯ x) + F ∗ (−Λ∗ y¯∗ ) − h−Λ∗ y¯∗ , x ¯i = 0 ϕ(Λ¯ x) + ϕ∗ (¯ y ∗ ) − h¯ y ∗ , Λ¯ x) = 0 By Theorem 2.2 0 ∈ ∂F (¯ x) + Λ∗ y¯∗ (7.7) y¯∗ ∈ ∂ϕ(Λ¯ x). The function L : X × Y ∗ → (−∞, ∞] definded by −L(u, y ∗ ) = sup {hy ∗ , yi − Φ(x, y)} (7.8) y∈Y is called the Lagrangian. Note that Φ∗ (x∗ , y ∗ ) = {hx∗ , xi + hx∗ , xi − Φ(x, y)} sup x∈X, y∈Y = sup hx∗ , xi + sup {hx∗ , xi − Φ(x, y)}} = sup hx∗ , xi − L(x, y ∗ ) x∈X y∈Y x∈X Thus −Φ∗ (0, y ∗ ) = inf L(x, y ∗ ) x∈X 124 (7.9) and thus the dual problem (P ∗ ) is equivalently written as sup inf L(x, y ∗ ). y ∗ ∈Y ∗ x∈X Similarly, if Φ is a proper convex l.s.c. function, then the bi-conjugate of Φ in y, Φ∗∗ x = Φ and ∗ ∗ ∗ Φ(x, y) = Φ∗∗ x (y) = supy ∗ ∈Y ∗ {hy , yi − Φx (y )} = sup {hy ∗ , yi + L(u, y ∗ )}. y ∗ ∈Y ∗ Hence Φ(u, 0) = sup L(x, y ∗ ) (7.10) y ∗ ∈Y ∗ and the primal problem (P ) is equivalently written as inf sup L(x, y ∗ ). x∈X y ∗ ∈Y By introducing the Lagrangian L, the Problems (P ) and (P ∗ ) are formulated as minmax problems, which arise in the game theory. By Theorem D.0 sup inf L(x, y ∗ ) ≤ inf sup L(x, y ∗ ). y∗ x x y∗ Theorem D.8 (Saddle Point) Assume Φ is a proper convex l.s.c. function. Then the followings are equivalent. (1) (¯ x, y¯∗ ) ∈ X × Y ∗ is a saddle point of L, i.e., L(¯ u, y ∗ ) ≤ L(¯ u, y¯∗ ) ≤ L(x, y¯∗ ) for all x ∈ X, y ∗ ∈ Y ∗ . (7.11) (2) x ¯ solves (P ), y¯∗ solves (P ∗ ), and min (P ) = max (P ∗ ). Proof: Suppose (1) holds. From (7.9) and (7.11) L(¯ x, y¯∗ ) = inf L(x, y¯∗ ) = −Φ∗ (0, y¯∗ ) x∈X and form (7.10) and (7.11) L(¯ x, y¯∗ = sup L(¯ x, y ∗ ) = Φ(¯ x, 0) y ∗ ∈Y ∗ Thus, Φ∗ (¯ u, 0) + Φ(0, y¯∗ ) = 0 and (2) follows from Theorem D.7. Conversely, if (2) holds, then from (7.9) and (7.10) −Φ∗ (0, y¯∗ ) = inf x∈X L(x, y¯∗ ) ≤ L(¯ x, y¯∗ ) Φ∗ (¯ x, 0) = supy∗ ∈Y ∗ L(¯ xx, y) ≥ L(¯ x, y¯∗ ) By Theorem D.7 −Φ∗ (0, y¯∗ = Φ∗ (¯ x, 0) and (7.11) holds. Theorem D.8 implies that no duality gup between (P ) and (P ∗ ) is equivalent to the saddle point property of the pair (¯ x, y¯∗ ). 125 For the Example L(x, y ∗ ) = f (x) + hy ∗ , Λxi − ϕ(y ∗ ) (7.12) If (¯ x, y¯∗ ) is a saddle point, then from (7.11) 0 ∈ ∂f (¯ x) + Λ∗ y¯∗ (7.13) 0 ∈ Λ¯ x − ∂ϕ∗ (¯ x). It follows from Theorem 2.2 that the second equation is equivalent to y¯∗ ∈ ∂ϕ(Λ¯ x) and (7.13) is equivalent to (7.7). 7.3 Monotone Operators and Yosida-Morrey approximations 7.4 Lagrange multiplier Theory In this action we introduce the generalized Lagrange multiplier theory. We establish ¯ and derive the optimalthe conditions for the existence of the Lagrange multiplier λ ity condition (1.7)–(1.8). In Section 1.5 it is shown that the both Uzawa and augmented Lagrangian method are fixed point methods for the complementarity condition (??))and the convergence analysis is presented . In Section 1.6 we preset the concrete examples and demonstrate the applicability of the results in Sections 1.4–1.5. 8 Optimization Algorithms In this section we discuss iterative methods for finding a solution to the optimization problem. First, consider the unconstrained minimization: min J(x) over a Hilbert space X. Assume J : X → R+ is continuously differentiable. If u∗ ∈ X minimizes J, then the necessarily optimality is given by J 0 (u∗ ) = 0. 
The gradient method is given by xn+1 = xn − α J 0 (xn ) (8.1) where α > 0 is a stepsize chosen so that the decent J(xn+1 ) < J(xn ) holds. Let gn = J 0 (xn ) and since J(xn − α gn ) − J(xn ) = −α |J 0 (xn )|2 + o(α), (8.2) there exits a α > 0 satisfies the decent property. A stepsize α > 0 can be determined by min J(xn − α gn ) (8.3) α>0 or by a line search method (e.g, Amijo-rule) 126 Next, consider the constrained minimization min J(x) over x ∈ C. (8.4) where C is a closed convex set in X. It follows from Theorem 7 that if J : X → R is C 1 and x∗ ∈ C is a minimizer, then (J 0 (x∗ ), x − x∗ ) ≥ 0 for all x ∈ C. (8.5) Define the orthogonal projection operator PC of X onto C, i.e., z ∗ = PC x minimizes |z − x|2 over z ∈ C. From Theorem 7, z ∗ ∈ C satisfies (z ∗ − x, z − z ∗ ) ≥ 0 for all z ∈ C. (8.6) Note that (8.5) is equivalent to (x∗ − (x∗ − α J 0 (x∗ )), x − x∗ ) ≥ 0 for all x ∈ C and α > 0. Thus, it is follows from (8.6) that (8.5) is equivalent to x∗ = PC (x∗ − α J 0 (x∗ )). The projected gradient method for (8.4) is defined by xn+1 = PC (xn − αn J 0 (xn )) (8.7) with a stepsize αn > 0. Define the the Hessian H = J 00 (x0 ) ∈ L(X, X) of J at x0 by J(x) − J(x0 ) − (J 0 (x0 ), x − x0 ) = 1 (x − x0 , H(x − x0 )) + o(|x − x0 |2 ). 2 Thus, we minimize over x J(x) ∼ J(x0 ) + (J 0 (x0 ), x − x0 ) + 1 (x − x0 , H(x − x0 )) 2 we obtain the damped-Newton method for equation given by xn+1 = xn − α H(xn )−1 J 0 (xn ), (8.8) where α > 0 is a stepsize and α = 1 corresponds to the Newton method. In the case of min J(x) over x ∈ C we obtain the projected Newton method xn+1 = P rojC (xn − α H(xn )−1 J 0 (xn )). (8.9) Computing the Hessian can be expensive and numerical evaluation can also be less accurate. The finite rank scant update (BFGS and GFP) is used to substitute H (Quasi-Newton method). It is a variable merit method and such updates are preconditioner for the gradient method. 127 Consider the (regularized) nonlinear least square problem min 1 α |F (x)|2Y + (P x, x)X 2 2 (8.10) where F : X → Y is C 1 and P is a nonnegative self-adjoint operator on a Hilbert space X. The Gauss-Newton method is an iterative method of the form 1 α xn+1 = argmin{ |F 0 (xn )(x − xn ) + F (xn )|2Y + (P x, x)X }, 2 2 i.e., xn+1 = xn − (F 0 (xn )∗ F (xn ) + α P )−1 (F 0 (xn )∗ F (xn )). 8.1 (8.11) Constrained minimization and Lagrange multiplier method Consider the constrained minimization min J(x, u) = F (x) + H(u) subject to E(x, u) = 0 and u ∈ C. (8.12) We use the implicit function theory for developing algorithms for (8.12). x, u ¯) satisfies Implicit Function Theory Let E : X × U → X is C 1 . Suppose a pair (¯ E(x, u) = 0 and Ex (¯ x, u ¯) is bounded invertible. Then the exists a δ > 0 such that for |u − u ¯| < δ equation E(x, u) = 0 has a locally defined unique solution x = Φ(u). Moreover, Φ : U → X is continuously differentiable at u ¯ and x˙ = Φ0 (¯ x)(d) satisfies Ex (¯ x, u ¯)x˙ + Eu (¯ x, u ¯)d = 0. Theorem (Lagrange Calculus) Assume that E(x, u) = 0 and Ex (x, u) is bounded invertible. Then, by the implicit function theory x = Φ(u) and for J(u) = J(Φ(u), u) (J 0 (u), d) = (H 0 (u), d) + (Eu (x, u)∗ λ, d) where λ ∈ Y satisfies Ex (x, u)∗ λ + F 0 (x) = 0. Proof: From the implicit function theory (J 0 , d) = (F 0 (x), x) ˙ + (H 0 (u), d), where Ex (x, u)x˙ + Eu (x, u) = 0. Thus, the claim follows from (F 0 (x), x) ˙ = −(Ex∗ λ, x) ˙ = −(λ, Ex x) ˙ = (λ, Eu d). Hence we obtain the gradient method for equality constraint problem (8.12); Gradient method 1. Solve for xn E(xn , un ) = 0 128 2. 
Solve for λn Ex (xn , un )∗ λn + F 0 (xn ) = 0 3. Compute the gradient by gn = H 0 (un ) + Eu (xn , un )∗ λn 4. Gradient Step: Update un+1 = PC (un − α gn ) with a line search for α > 0. 8.2 Conjugate gradient method In this section we develop the conjugate gradient and residual method for the saddle point problem. Consider the quadratic programing 1 J(x) = (Ax, x)X − (b, x)X 2 where A is positive self-adjioint operator on a Hilbert space X. Then, the negative gradient of J is given by r = −J 0 (x) = b − Ax. The gradient method is given by xk+1 = xk + αk pk , pk = rk , where the step size αk is selected by min J(xk + α rk ) over α > 0. That is, the Cauchy step is given by 0 = (J 0 (xk+1 ), pk ) = α (Apk , pk ) − (pk , rk ) ⇒ αk = (pk , rk ) . (Apk , pk ) (8.13) It is known that the gradient method with Cauchy step searches the solution on a subspace spanned by the major and minor axes of A dominantly, and exhibits a zigzag pattern of its iterates. If the condition number ρ= max σ(A) min σ(A) is large, then it is very slow convergent. In order to improve the convergence we introduce the conjugacy of the search directions pk and it can be achieved by the conjugate gradient method. For example, a new search direction pk is extracted for the residual rk = b − Axk by pk = rk − β pk−1 and the orthogonality 0 = (Apk , pk−1 ) = (rk , Apk−1 ) − β(Apk−1 , pk−1 ) 129 implies βk = (rk , Apk−1 ) . (Apk−1 , pk−1 ) (8.14) Let P is the pre-conditioner, self-adjiont operator so that the condition number of 1 1 P − 2 AP − 2 is much smaller that the one for A. If we apply (8.13)(8.14) with the pre-conditioner P , we have the pre-conditioned conjugate gradient method: • Initialize x0 and set r0 = b − Ax0 , z0 = P −1 r0 , p0 = z0 • Iterate on k until (rk , zk ) is sufficiently small αk = (rk , zk ) , (pk , Apk ) xk+1 = xk + αk pk , rk+1 = rk − αk Apk . (zk+1 , rk+1 ) , (zk , rk ) pk+1 := zk+1 + βk pk zk+1 = P −1 rk+1 , βk = Here, zk is the pre-conditioned residual and (rk , zk ) = (rk , P −1 rk ). The above formulation is equivalent to applying the conjugate gradient method without preconditioned system 1 1 1 P − 2 AP − 2 x ˆ = P − 2 b, 1 where x ˆ = P 2 x. We have the conjugate gradient theory. Theorem (Conjugate gradient method) (1) We have conjugate property (rk , zj ) = 0, (pk , Apj ) = 0 for 0 ≤ j < k. (2) xm+1 minimizes J(x) over x ∈ x0 + span{p0 , · · · , pm }. Proof: For k = 1 (r1 , z0 ) = (r0 , z0 ) − α0 (Az0 , z0 ) = 0 and since (r1 , z1 ) = (r0 , z1 ) − α0 (r1 , Az0 ) (p1 , Ap0 ) = (z1 , Az0 ) + β0 (z0 , Az0 ) = − 1 (r1 , z1 ) + β0 (z0 , Az0 ) = 0 α0 Note that span{p0 , · · · , pk } = span{z0 , · · · , zk }. In induction in k, for j < k (rk+1 , zj ) = (rk , zj ) − αk (Apk , pj ) = 0 and ((rk+1 , zk ) = (rk , zk ) − αk (Apk , pk ) = 0 Also, for j < k (pk+1 , Apj ) = (zk+1 , Apj ) + βk (pk , Apj ) = − 1 (zk+1 , rj+1 − rj ) = 0 αj and (pk+1 , Apk ) = (zk+1 , Apk ) + βk (pk , Apk ) = − 130 1 (rk+1 , zk ) + βk (pk , Apk ) = 0. αk Hence (1) holds. For (2) we note that (J 0 (xm+1 ), pj ) = (rm+1 , pj ) = 0 for j ≤ m and xm+1 = x0 + span{p0 , · · · , pm }. Thus, (2) holds. Remark Note that rk = r0 − k−1 X αj P −1 Apj j=0 Thus, |rk |2 = |r0 |2 − (2 k−1 X αj P −1 Apj , r0 ) + |αj |. Consider the constrained quadratic programing min 1 (Qy, y)X − (a, y)X 2 subject to Ey = b in Y. Then the necessary optimality condition is written for x = (y, λ) ∈ X × Y : Q E∗ , A(y, λ) = (a, b) where A = E 0 where A is self-adjoint but is indefinite. In general we assume A is self-adjoint and invertible. 
Consider the least square problem 1 1 1 |Ax − b|2X = (AAx, x)X − (Ab, x)X + |b|2 . 2 2 2 We consider the conjugate directions {pk } satisfying (Apk , Apj ) = 0, j < k. The preconditioned conjugate residual method may be derived in the same way as done for the conjugate gradient method: • Initialize x0 and set r0 = P −1 (b − Ax0 ), p0 = r0 , (Ap)0 = Ap0 • Iterate with k until (rk , rk ) is sufficiently small αk = (rk , Ark ) , ((Ap)k , P −1 (Ap)k ) βk = (rk+1 , Ark+1 ) , (rk , Ark ) xk+1 = xk + αk pk , pk+1 = rk+1 + βk pk , rk+1 = rk − αk P −1 (Ap)k . (Ap)k+1 = Ark+1 + βk (Ap)k Theorem (Conjugate residual method) Assuming αk 6= 0, (1) We have conjugate property (rj , Ark ) = 0, (Apk , P −1 Apj ) = 0 for 0 ≤ j < k. (2) xm+1 minimizes |Ax − b|2 over x ∈ x0 + span{p0 , · · · , pm }. 131 Proof: For k = 1 (Ar0 , r1 ) = (Ar0 , r0 ) − α0 (Ap0 , P −1 Ap0 ) = 0 and since (Ap0 , P −1 Ap1 ) = (P −1 Ap0 , Ar1 ) + β0 (Ap0 , P −1 Ap0 ) and (P −1 Ap0 , Ar1 ) = 1 1 α0 (r0 − r1 , Ar1 ) = − α0 (r1 , Ar1 ), we have (Ap0 , P −1 Ap1 ) = − 1 (r1 , Ar1 ) + β0 (Ap0 , P −1 Ap0 ) = 0. α0 Note that span{p0 , · · · , pk } = span{r0 , · · · , rk } and rk = r0 +P −1 A span{p0 , · · · , pk−1 }. In induction in k, for j < k (rk+1 , Arj ) = (rk , Arj ) − αk (P −1 Apk , Arj ) = 0 and (Ark+1 , rk ) = (Ark , rk ) − αk (Apk , P −1 Apk ) = 0 Also, for j < k (Apk+1 , P −1 Apj ) = (Ark+1 , P −1 Apj ) + βk (Apk , P −1 Apj ) = − 1 (Ark+1 , rj+1 − rj ) = 0 αj and (Apk+1 , P −1 Apk ) = (Ark+1 , P −1 Apk )+βk (Apk , P −1 Apk ) = − 1 (rk+1 , Ark+1 )+βk (pk , Apk ) = 0. αk Hence (1) holds. For (2) we note that (Axm+1 − b, P −1 Apj ) = −(rm+1 , P −1 Apj ) = 0 for j ≤ m and xm+1 = x0 + span{p0 , · · · , pm }. Thus, (2) holds. direct inversion in the iterative subspace (DIIS) 8.3 Nonlinear Conjugate Residual method One can extend the conjugate residual method for the nonlinear equation. Consider the equality constrained optimization min J(y) subject to E(y) = 0. The necessary optimality system for x = (y, λ) is 0 J (y) + E 0 (y)∗ λ = 0. F (y, λ) = E(y) Multiplication of vector by the Jacobian F 0 can be approximated by the difference of two residual vectors by F (xk + t y) − F (xk ) ∼ Ay with t |y| ∼ 1. t Thus, we have the nonlinear version of the conjugate residual method: Nonlinear Conjugate Residual method 132 • Calculate sk for approximating Ark by sk = F (xk + t rk ) − F (xk ) 1 , t= . t |rk | • Update the direction by pk = P −1 rk − βk pk−1 , βk = (sk , rk ) . (sk−1 , rk−1 ) • Calculate qk = sk − βk qk−1 . • Update the solution by xk+1 = xk + αk pk , αk = (sk , rk ) . (qk , P −1 qk ) • Calculate the residual rk+1 = F (xk+1 ). 9 Newton method and SQP Next, we develop the Newton method for J 0 (u) = 0. Define the Lagrange functional L(x, u, λ) = F (x) + H(u) + (λ, E(x, u)). Note that J 0 (u) = Lu (x(u), u, λ(u)) where (x(u), λ(u)) satisfies E(x(u), u) = 0, E 0 (x(u))∗ λ(u) + F 0 (x(u)) = 0 where we assumed Ex (x, u) is bounded invertible. By the implicit function theory J 00 (u) = Lux (x(u), u, λ(u))x˙ + Eu (x(u), u, λ(u))∗ λ˙ + Luu (x(u), u, λ(u)), ˙ satisfies where (x, ˙ λ) x˙ Lxu (x, u, λ) Lxx (x, u, λ) Ex (x, u)∗ = 0. + Ex (x, u) 0 Eu (x, u) λ˙ Thus, x˙ = −(Ex (x, u))−1 Eu (x, u), λ˙ = −(Ex (x, u)∗ )−1 (Lxx (x, u, λ)x˙ + Lxu (x, u, λ)) and J 00 (u) = Lux x˙ − Eu (x, u)∗ (Ex (x, u)∗ )−1 (Lux x˙ + Lxu ) + Luu (x, u, λ). 
9 Newton method and SQP

Next we develop the Newton method for $J'(u) = 0$. Define the Lagrange functional
\[ L(x,u,\lambda) = F(x) + H(u) + (\lambda, E(x,u)). \]
Note that $J'(u) = L_u(x(u), u, \lambda(u))$, where $(x(u), \lambda(u))$ satisfies
\[ E(x(u), u) = 0, \qquad E_x(x(u),u)^*\lambda(u) + F'(x(u)) = 0, \]
and we assume $E_x(x,u)$ is boundedly invertible. By the implicit function theorem,
\[ J''(u) = L_{ux}(x(u),u,\lambda(u))\,\dot x + E_u(x(u),u)^*\dot\lambda + L_{uu}(x(u),u,\lambda(u)), \]
where $(\dot x, \dot\lambda)$ satisfies
\[ \begin{pmatrix} L_{xx}(x,u,\lambda) & E_x(x,u)^* \\ E_x(x,u) & 0 \end{pmatrix} \begin{pmatrix} \dot x \\ \dot\lambda \end{pmatrix} + \begin{pmatrix} L_{xu}(x,u,\lambda) \\ E_u(x,u) \end{pmatrix} = 0. \]
Thus
\[ \dot x = -(E_x(x,u))^{-1}E_u(x,u), \qquad \dot\lambda = -(E_x(x,u)^*)^{-1}(L_{xx}(x,u,\lambda)\dot x + L_{xu}(x,u,\lambda)), \]
and
\[ J''(u) = L_{ux}\dot x - E_u(x,u)^*(E_x(x,u)^*)^{-1}(L_{xx}\dot x + L_{xu}) + L_{uu}(x,u,\lambda). \]

Theorem (Lagrange Calculus II) The Newton step
\[ J''(u)(u^+ - u) + J'(u) = 0 \]
is equivalently written as
\[ \begin{pmatrix} L_{xx}(x,u,\lambda) & L_{xu}(x,u,\lambda) & E_x(x,u)^* \\ L_{ux}(x,u,\lambda) & L_{uu}(x,u,\lambda) & E_u(x,u)^* \\ E_x(x,u) & E_u(x,u) & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta u \\ \Delta\lambda \end{pmatrix} + \begin{pmatrix} 0 \\ E_u(x,u)^*\lambda + H'(u) \\ 0 \end{pmatrix} = 0, \]
where $\Delta x = x^+ - x$, $\Delta u = u^+ - u$ and $\Delta\lambda = \lambda^+ - \lambda$.

Proof: Define the linear operator $T$ by
\[ T(x,u) = \begin{pmatrix} -E_x(x,u)^{-1}E_u(x,u) \\ I \end{pmatrix}. \]
Then the Newton method is equivalently written as
\[ T(x,u)^*\Big( L''(x,u,\lambda)\, T(x,u)\,\Delta u + \begin{pmatrix} 0 \\ E_u(x,u)^*\lambda + H'(u) \end{pmatrix} \Big) = 0. \]
Since $E'(x,u)$ is surjective and $E'(x,u)T(x,u) = 0$, it follows from the closed range theorem that
\[ N(T(x,u)^*) = R(T(x,u))^\perp = N(E'(x,u))^\perp = R(E'(x,u)^*). \]
Hence the term in parentheses equals $-E'(x,u)^*\Delta\lambda$ for some $\Delta\lambda$, which together with $(\Delta x, \Delta u) = T(x,u)\Delta u$, i.e. $E'(x,u)(\Delta x, \Delta u) = 0$, gives the claimed update.

Thus we have the Newton method for the equality constraint problem (8.12).

Newton method
1. Given $u_0$, initialize $(x_0,\lambda_0)$ by $E(x_0,u_0) = 0$, $E_x(x_0,u_0)^*\lambda_0 + F'(x_0) = 0$.
2. Newton step: solve for $(\Delta x, \Delta u, \Delta\lambda)$
\[ \begin{pmatrix} L''(x_n,u_n,\lambda_n) & E'(x_n,u_n)^* \\ E'(x_n,u_n) & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta u \\ \Delta\lambda \end{pmatrix} + \begin{pmatrix} 0 \\ E_u(x_n,u_n)^*\lambda_n + H'(u_n) \\ 0 \end{pmatrix} = 0 \]
and update $u_{n+1} = \mathrm{Proj}_C(u_n + \alpha\,\Delta u)$ with a stepsize $\alpha > 0$.
3. Feasible step: solve for $(x_{n+1}, \lambda_{n+1})$
\[ E(x_{n+1}, u_{n+1}) = 0, \qquad E_x(x_{n+1},u_{n+1})^*\lambda_{n+1} + F'(x_{n+1}) = 0. \]
4. Stop, or set $n = n+1$ and return to step 2.

Here
\[ L''(x_n,u_n,\lambda_n) = \begin{pmatrix} F''(x_n) & 0 \\ 0 & H''(u_n) \end{pmatrix} + \begin{pmatrix} (\lambda_n, E_{xx}(x_n,u_n)) & (\lambda_n, E_{xu}(x_n,u_n)) \\ (\lambda_n, E_{ux}(x_n,u_n)) & (\lambda_n, E_{uu}(x_n,u_n)) \end{pmatrix} \]
and $E'(x_n,u_n) = (E_x(x_n,u_n), E_u(x_n,u_n))$.

Consider the general equality-constrained minimization
\[ \min J(y) \quad \text{subject to } E(y) = 0. \]
The necessary optimality condition is given by
\[ J'(y) + E'(y)^*\lambda = 0, \qquad E(y) = 0, \quad (9.1) \]
where we assume $E'(y)$ is surjective. Problem (8.12) is a specific case of this, with $y = (x,u)$, $J(y) = F(x) + H(u)$, $E = E(x,u)$. Applying the Newton method to the necessary optimality system (9.1) for $(y,\lambda)$, i.e. to $(L_x(x,u,\lambda), L_u(x,u,\lambda), E(x,u)) = 0$, results in SQP (sequential quadratic programming):

Newton method applied to the necessary optimality system (SQP)
1. Newton direction: solve for $(\Delta y, \Delta\lambda)$
\[ \begin{pmatrix} L''(y_n,\lambda_n) & E'(y_n)^* \\ E'(y_n) & 0 \end{pmatrix}\begin{pmatrix} \Delta y \\ \Delta\lambda \end{pmatrix} + \begin{pmatrix} J'(y_n) + E'(y_n)^*\lambda_n \\ E(y_n) \end{pmatrix} = 0. \quad (9.2) \]
2. Update $y_{n+1} = y_n + \alpha\,\Delta y$ and $\lambda_{n+1} = \lambda_n + \alpha\,\Delta\lambda$.
3. Stop, or set $n = n+1$ and return to step 1.

Here $L''(y,\lambda) = J''(y) + E''(y)^*\lambda$. Both Newton methods solve the same linear system (9.2) with different right-hand sides. In the first Newton method we iterate on the variable $u$ only, and each step enforces the feasibility conditions $E(x_n,u_n) = 0$ and $L_x(x_n,u_n,\lambda_n) = 0$ for $(x_n,\lambda_n)$, as in the gradient method; it then computes the Newton direction for $J'(u)$ and updates $u$. In this sense the second Newton method is a relaxed version of the first one. A schematic implementation of the SQP step (9.2) is sketched below.
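The following sketch performs one SQP step (9.2) in dense linear algebra. The arrays `Jp`, `Lpp`, `E`, `Ep` stand for $J'(y_n)$, $L''(y_n,\lambda_n)$, $E(y_n)$, $E'(y_n)$ and are assumed to be supplied by the user; a damped step $\alpha < 1$ can be applied exactly as in step 2 of the listing.

```python
import numpy as np

def sqp_step(y, lam, Jp, Lpp, E, Ep):
    """One Newton/SQP step (9.2) for min J(y) s.t. E(y) = 0.
    Solves the KKT system for (dy, dlam) and returns the updated pair."""
    n, m = y.size, lam.size
    KKT = np.block([[Lpp, Ep.T],
                    [Ep, np.zeros((m, m))]])
    rhs = -np.concatenate([Jp + Ep.T @ lam, E])   # right-hand side of (9.2)
    step = np.linalg.solve(KKT, rhs)
    return y + step[:n], lam + step[n:]
```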
9.1 Sequential programming

In this section we discuss sequential programming, an intermediate method between the gradient method and the Newton method. Consider the constrained optimization
\[ \min F(y) \quad \text{subject to } E(y) = 0, \; y \in C. \]
We linearize the equality constraint and consider a sequence of constrained optimizations
\[ \min F(y) \quad \text{subject to } E'(y_n)(y - y_n) + E(y_n) = 0, \; y \in C. \quad (9.3) \]
The necessary optimality condition (if $E'(y_n)$ is surjective) is given by
\[ (F'(y) + E'(y_n)^*\lambda, \tilde y - y) \ge 0 \ \text{ for all } \tilde y \in C, \qquad E'(y_n)(y - y_n) + E(y_n) = 0. \quad (9.4) \]
Note that the necessary condition for the solution $(y^*, \lambda^*)$ can be written as
\[ (F'(y^*) + E'(y_n)^*\lambda^* - (E'(y_n)^* - E'(y^*)^*)\lambda^*, \tilde y - y^*) \ge 0 \ \text{ for all } \tilde y \in C, \]
\[ E'(y_n)(y^* - y_n) + E(y_n) = E'(y_n)(y^* - y_n) + E(y_n) - E(y^*). \quad (9.5) \]
Suppose we have Lipschitz continuity of solutions to (9.4) and (9.5) with respect to the perturbations $\Delta_1 = (E'(y_n)^* - E'(y^*)^*)\lambda^*$ and $\Delta_2 = E'(y_n)(y^* - y_n) + E(y_n) - E(y^*)$; then
\[ |y - y^*| \le c_1 |(E'(y_n)^* - E'(y^*)^*)\lambda^*| + c_2 |E'(y_n)(y^* - y_n) + E(y_n) - E(y^*)|. \]
Thus, for the (damped) update
\[ y_{n+1} = (1-\alpha)y_n + \alpha\,y, \quad \alpha \in (0,1), \quad (9.6) \]
we have the estimate
\[ |y_{n+1} - y^*| \le (1 - \alpha + \alpha\gamma)|y_n - y^*| + c\,\alpha\,|y_n - y^*|^2 \quad (9.7) \]
for some constants $\gamma$ and $c > 0$. In fact, for the case $C = X$, let $z_n = (y_n, \lambda_n)$, $z^* = (y^*, \lambda^*)$ and define
\[ G_n = \begin{pmatrix} F''(y_n) & E'(y_n)^* \\ E'(y_n) & 0 \end{pmatrix}. \]
For the update $z_{n+1} = (1-\alpha)z_n + \alpha z$, $\alpha \in (0,1]$, we have
\[ z_{n+1} - z^* = (1-\alpha)(z_n - z^*) + \alpha\,G_n^{-1}\Big( \begin{pmatrix} (E'(y_n)^* - E'(y^*)^*)\lambda^* \\ 0 \end{pmatrix} + \delta_n \Big) \]
with
\[ \delta_n = \begin{pmatrix} F'(y^*) - F'(y_{n+1}) - F''(y_n)(y^* - y_{n+1}) \\ E(y^*) - E'(y_n)(y^* - y_n) - E(y_n) \end{pmatrix}. \]
Note that if $F''(y_n) = A$, then
\[ G_n^{-1}\begin{pmatrix} \Delta_1 \\ 0 \end{pmatrix} = \begin{pmatrix} A^{-1}\Delta_1 - A^{-1}E'(y_n)^*\tilde e \\ \tilde e \end{pmatrix}, \qquad \tilde e = (E'(y_n)A^{-1}E'(y_n)^*)^{-1}E'(y_n)A^{-1}\Delta_1. \]
Thus we have
\[ \Big|G_n^{-1}\begin{pmatrix} (E'(y_n)^* - E'(y^*)^*)\lambda^* \\ 0 \end{pmatrix}\Big| \le \gamma\,|y_n - y^*|. \]
Since $|G_n^{-1}\delta_n| \le c\,|z_n - z^*|^2$, we have
\[ |z_{n+1} - z^*| \le (1 - \alpha + \alpha\gamma)|z_n - z^*| + \alpha c\,|z_n - z^*|^2, \]
which implies (9.7). When either the variation of $E'$ is dominated by the linearization, i.e.
\[ |E'(y^*)^\dagger (E'(y_n) - E'(y^*))| \le \beta\,|y_n - y^*| \]
with small $\beta > 0$, or the multiplier $|\lambda^*|$ is small, then $\gamma > 0$ is sufficiently small. The estimate (9.7) suggests that when the quadratic term $|y_n - y^*|^2$ dominates we use a small $\alpha > 0$ to damp the update, and when the linear term dominates we let $\alpha \to 1$.

9.1.1 Second order version

The second order sequential programming is given by
\[ \min F(y) + \langle \lambda_n, E(y) - (E'(y_n)(y-y_n) + E(y_n)) \rangle \quad (9.8) \]
subject to $E'(y_n)(y-y_n) + E(y_n) = 0$ and $y \in C$. The necessary optimality condition for (9.8) is
\[ (F'(y) + (E'(y)^* - E'(y_n)^*)\lambda_n + E'(y_n)^*\lambda, \tilde y - y) \ge 0 \ \text{ for all } \tilde y \in C, \qquad E'(y_n)(y - y_n) + E(y_n) = 0. \quad (9.9) \]
It follows from (9.5) and (9.9) that
\[ |y - y^*| + |\lambda - \lambda^*| \le c\,(|y_n - y^*|^2 + |\lambda_n - \lambda^*|^2). \]
In summary we have (a minimal driver for the damped iteration is sketched after the listings):

Sequential Programming I
1. Given $y_n \in C$, solve $\min F(y)$ subject to $E'(y_n)(y - y_n) + E(y_n) = 0$ over $y \in C$.
2. Update $y_{n+1} = (1-\alpha)\,y_n + \alpha\,y$, $\alpha \in (0,1)$. Iterate until convergence.

Sequential Programming II
1. Given $y_n \in C$ and $\lambda_n$, solve for $y \in C$:
\[ \min F(y) + (\lambda_n, E(y) - (E'(y_n)(y-y_n) + E(y_n))) \ \text{ subject to } E'(y_n)(y-y_n) + E(y_n) = 0, \; y \in C. \]
2. Update $(y_{n+1}, \lambda_{n+1}) = (y, \lambda)$. Iterate until convergence.
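The sketch below drives Sequential Programming I with the damped update (9.6). The function `solve_linearized` is an assumed user-supplied solver of the linearized subproblem (9.3); the damping factor and tolerance are illustrative.

```python
import numpy as np

def sequential_programming(solve_linearized, y0, alpha=0.5,
                           tol=1e-8, max_iter=100):
    """Sequential Programming I with damped update (9.6).
    solve_linearized(yn) is assumed to return a minimizer of F subject to
    the linearized constraint E'(yn)(y - yn) + E(yn) = 0 over y in C."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        y_new = (1 - alpha) * y + alpha * solve_linearized(y)
        if np.linalg.norm(y_new - y) < tol:   # damped update has stalled
            return y_new
        y = y_new
    return y
```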
9.1.2 Examples

Consider the constrained optimization of the form
\[ \min F(x) + H(u) \quad \text{subject to } E(x,u) = 0 \text{ and } u \in C, \quad (9.10) \]
where $H(u)$ may be nonsmooth. We linearize the equality constraint and consider a sequence of constrained optimizations
\[ \min F(x) + H(u) + \langle\lambda_n, E(x,u) - (E'(x_n,u_n)(x-x_n, u-u_n) + E(x_n,u_n))\rangle \quad (9.11) \]
subject to $E'(x_n,u_n)(x-x_n,u-u_n) + E(x_n,u_n) = 0$ and $u \in C$. Note that if $(x^*,u^*)$ is an optimizer of (9.10), then
\[ F'(x^*) + (E'(x^*,u^*)^* - E'(x_n,u_n)^*)\lambda_n + E'(x_n,u_n)^*\lambda^* = (E'(x_n,u_n)^* - E'(x^*,u^*)^*)(\lambda^* - \lambda_n) = \Delta_1, \]
\[ E'(x_n,u_n)(x^*-x_n, u^*-u_n) + E(x_n,u_n) = \Delta_2, \qquad |\Delta_2| \sim M(|x_n-x^*|^2 + |u_n-u^*|^2). \quad (9.12) \]
Thus $(x^*, u^*)$ is the solution to the perturbed problem of (9.11):
\[ \min F(x) + H(u) + \langle\lambda_n, E(x,u) - (E'(x_n,u_n)(x-x_n,u-u_n) + E(x_n,u_n))\rangle \quad (9.13) \]
subject to $E'(x_n,u_n)(x-x_n,u-u_n) + E(x_n,u_n) = \Delta$ and $u \in C$. Let $(x_{n+1}, u_{n+1})$ be the solution of (9.11). One can assume Lipschitz continuity of solutions to (9.13):
\[ |x_{n+1}-x^*| + |u_{n+1}-u^*| + |\lambda_{n+1} - \lambda^*| \le C\,|\Delta|. \quad (9.14) \]
From (9.12) and assumption (9.14) we obtain the quadratic convergence of the sequential programming method:
\[ |x_{n+1}-x^*| + |u_{n+1}-u^*| + |\lambda_{n+1} - \lambda^*| \le M(|x_n-x^*|^2 + |u_n-u^*|^2 + |\lambda_n-\lambda^*|^2). \quad (9.15) \]
Assuming $H$ is convex, the necessary optimality condition is given by
\[ \begin{aligned} & F'(x) + (E_x(x,u)^* - E_x(x_n,u_n)^*)\lambda_n + E_x(x_n,u_n)^*\lambda = 0, \\ & H(v) - H(u) + ((E_u(x,u)^* - E_u(x_n,u_n)^*)\lambda_n + E_u(x_n,u_n)^*\lambda,\, v-u) \ge 0 \ \text{ for all } v \in C, \\ & E_x(x_n,u_n)(x-x_n) + E_u(x_n,u_n)(u-u_n) + E(x_n,u_n) = 0. \end{aligned} \quad (9.16) \]
As in the Newton method we use the approximations
\[ (E_x(x,u)^* - E_x(x_n,u_n)^*)\lambda_n \sim \lambda_n(E_{xx}(x_n,u_n)(x-x_n) + E_{xu}(x_n,u_n)(u-u_n)), \qquad F'(x) \sim F''(x_n)(x-x_n) + F'(x_n), \]
and obtain
\[ \begin{aligned} & F''(x_n)(x-x_n) + F'(x_n) + \lambda_n(E_{xx}(x_n,u_n)(x-x_n) + E_{xu}(x_n,u_n)(u-u_n)) + E_x(x_n,u_n)^*\lambda = 0, \\ & E_x(x_n,u_n)x + E_u(x_n,u_n)u = E_x(x_n,u_n)x_n + E_u(x_n,u_n)u_n - E(x_n,u_n), \end{aligned} \]
which is a linear system of equations for $(x,\lambda)$. Thus $x = x(u)$, $\lambda = \lambda(u)$ is a continuous affine function of $u$:
\[ \begin{pmatrix} x(u) \\ \lambda(u) \end{pmatrix} = \begin{pmatrix} F''(x_n) + \lambda_n E_{xx}(x_n,u_n) & E_x(x_n,u_n)^* \\ E_x(x_n,u_n) & 0 \end{pmatrix}^{-1}\mathrm{RHS} \quad (9.17) \]
with
\[ \mathrm{RHS} = \begin{pmatrix} F''(x_n)x_n - F'(x_n) + \lambda_n E_{xx}(x_n,u_n)x_n - \lambda_n E_{xu}(x_n,u_n)(u-u_n) \\ E_x(x_n,u_n)x_n - E_u(x_n,u_n)(u-u_n) - E(x_n,u_n) \end{pmatrix}, \]
and the second inequality of (9.16) becomes a variational inequality for $u \in C$. Consequently we obtain the following algorithm:

Sequential Programming II (implicit form)
1. Given $u \in C$, solve (9.17) for $(x(u), \lambda(u))$.
2. Solve the variational inequality for $u \in C$:
\[ H(v) - H(u) + ((E_u(x(u),u)^* - E_u(x_n,u_n)^*)\lambda_n + E_u(x_n,u_n)^*\lambda(u),\, v-u) \ge 0 \ \text{ for all } v \in C. \quad (9.18) \]
3. Set $u_{n+1} = u$ and $x_{n+1} = x(u)$. Iterate until convergence.

Example (Optimal control problem) Let $(x,u) \in H^1(0,T;\mathbb{R}^n) \times L^2(0,T;\mathbb{R}^m)$. Consider the optimal control problem
\[ \min \int_0^T (\ell(x(t)) + h(u(t)))\,dt \]
subject to the dynamical constraint
\[ \frac{d}{dt}x(t) = f(x(t)) + Bu(t), \quad x(0) = x_0, \]
and the control constraint $u \in C = \{u : u(t) \in U \text{ a.e.}\}$, where $U$ is a closed convex set in $\mathbb{R}^m$. Then the necessary optimality condition for (9.11) is
\[ \begin{aligned} & \frac{d}{dt}x = f(x_n) + f'(x_n)(x-x_n) + Bu, \quad x(0) = x_0, \\ & -\frac{d}{dt}\lambda = f'(x_n)^t\lambda + \ell'(x), \quad \lambda(T) = 0, \\ & u(t) = \mathrm{argmin}_{v\in U}\,\{h(v) + (B^t\lambda(t), v)\}. \end{aligned} \]
If $h(u) = \frac{\alpha}{2}|u|^2$ and $U = \mathbb{R}^m$, then $u(t) = -\frac{B^t\lambda(t)}{\alpha}$. Thus (9.11) is equivalent to solving the two point boundary value problem for $(x,\lambda)$:
\[ \begin{aligned} & \frac{d}{dt}x = f(x_n) + f'(x_n)(x-x_n) - \frac{1}{\alpha}BB^t\lambda, \quad x(0) = x_0, \\ & -\frac{d}{dt}\lambda = (f'(x) - f'(x_n))^*\lambda_n + f'(x_n)^*\lambda + \ell'(x), \quad \lambda(T) = 0. \end{aligned} \]

Example (Inverse medium problem) Let $(y,v) \in H^1(\Omega) \times H^1(\Omega)$. Consider the inverse medium problem
\[ \min \frac12\int_\Gamma |y - z|^2\,ds_x + \frac{\alpha}{2}\int_\Omega (\beta\,|v| + |\nabla v|^2)\,dx \]
subject to
\[ -\Delta y + vy = 0 \ \text{ in } \Omega, \qquad \frac{\partial y}{\partial n} = g \ \text{ on the boundary } \Gamma, \]
and $v \in C = \{0 \le v \le \gamma < \infty\}$, where $\Omega$ is a bounded open domain in $\mathbb{R}^2$, $z$ is a boundary measurement of the voltage $y$ and $g \in L^2(\Gamma)$ is a given current. The problem is to determine the potential function $v \in C$ from $z \in L^2(\Gamma)$ in a class of sparse and clustered media. Then the necessary optimality condition for (9.11) is given by
\[ \begin{aligned} & -\Delta y + v_n(y - y_n) + v\,y_n = 0, \quad \frac{\partial y}{\partial n} = g, \\ & -\Delta\lambda + v_n\lambda = 0, \quad \frac{\partial\lambda}{\partial n} = y - z, \\ & \int_\Omega \big(\alpha(\nabla v, \nabla\phi - \nabla v) + \beta(|\phi| - |v|) + \lambda y_n(\phi - v)\big)\,dx \ge 0 \ \text{ for all } \phi \in C. \end{aligned} \]
9.2 Exact penalty method

Consider the penalty method for the inequality-constrained minimization
\[ \min F(x) \quad \text{subject to } Gx \le c, \quad (9.19) \]
where $G \in \mathcal{L}(X, L^2(\Omega))$. If $x^*$ is a minimizer of (9.19), the necessary optimality condition is given by
\[ F'(x^*) + G^*\mu = 0, \qquad \mu = \max(0, \mu + Gx^* - c) \ \text{ a.e. in } \Omega. \quad (9.20) \]
For $\beta > 0$ the penalty method is defined by
\[ \min F(x) + \beta\,\psi(Gx - c), \quad \text{where } \psi(y) = \int_\Omega \max(0, y)\,d\omega. \quad (9.21) \]
The necessary optimality condition is given by
\[ -F'(x) \in \beta\,G^*\partial\psi(Gx - c), \quad (9.22) \]
where
\[ \partial\max(0,\cdot)(s) = \begin{cases} 0 & s < 0 \\ [0,1] & s = 0 \\ 1 & s > 0. \end{cases} \]
Suppose the Lagrange multiplier $\mu$ satisfies
\[ \sup_{\omega\in\Omega} |\mu(\omega)| \le \beta; \quad (9.23) \]
then $\mu \in \beta\,\partial\psi(Gx^* - c)$ and thus $x^*$ satisfies (9.22). Moreover, if (9.21) has a unique minimizer in a neighborhood of $x^*$, then $x = x^*$, where $x$ is a minimizer of (9.21). Due to the singularity and the non-uniqueness of the subdifferential $\partial\psi$, the direct treatment of condition (9.22) may not be feasible for numerical computation. We define a regularized version of $\max(0,s)$: for $\epsilon > 0$,
\[ {\max}_\epsilon(0,s) = \begin{cases} 0 & s \le 0 \\ \dfrac{s^2}{2\epsilon} & 0 \le s \le \epsilon \\ s - \dfrac{\epsilon}{2} & s \ge \epsilon, \end{cases} \]
and consider the regularized problem of (9.21):
\[ \min J_\epsilon(x) = F(x) + \beta\,\psi_\epsilon(Gx - c), \quad \text{where } \psi_\epsilon(y) = \int_\Omega {\max}_\epsilon(0,y)\,d\omega. \quad (9.24) \]

Theorem (Consistency) Assume $F$ is weakly lower semi-continuous. For an arbitrary $\beta > 0$, any weak cluster point of the solutions $x_\epsilon$, $\epsilon > 0$, of the regularized problem (9.24) converges to a solution $x$ of the nonsmooth penalized problem (9.21) as $\epsilon \to 0$.

Proof: First note that $0 \le \max(0,s) - \max_\epsilon(0,s) \le \frac{\epsilon}{2}$ for all $s \in \mathbb{R}$. Let $x$ be a solution of (9.21) and $x_\epsilon$ a solution of the regularized problem (9.24). Then (writing $\psi(x)$ for $\psi(Gx-c)$)
\[ F(x_\epsilon) + \beta\,\psi_\epsilon(x_\epsilon) \le F(x) + \beta\,\psi_\epsilon(x), \qquad F(x) + \beta\,\psi(x) \le F(x_\epsilon) + \beta\,\psi(x_\epsilon). \]
Adding these inequalities, $\psi_\epsilon(x_\epsilon) - \psi(x_\epsilon) \le \psi_\epsilon(x) - \psi(x)$. Since $F$ is weakly lower semi-continuous and
\[ \psi_\epsilon(x_\epsilon) - \psi(\bar x) = \psi_\epsilon(x_\epsilon) - \psi(x_\epsilon) + \psi(x_\epsilon) - \psi(\bar x) \le \psi_\epsilon(x) - \psi(x) + \psi(x_\epsilon) - \psi(\bar x), \]
with $\lim_{\epsilon\to 0^+}(\psi_\epsilon(x) - \psi(x) + \psi(x_\epsilon) - \psi(\bar x)) \le 0$, we obtain, for every weak cluster point $\bar x$ of $x_\epsilon$,
\[ F(\bar x) + \beta\,\psi(\bar x) \le F(x) + \beta\,\psi(x). \]

As a consequence of the theorem, we have:

Corollary If (9.21) has a unique minimizer in a neighborhood of $x^*$ and (9.23) holds, then $x_\epsilon$ converges to the solution $x^*$ of (9.19) as $\epsilon \to 0$.

The necessary optimality condition for (9.24) is given by
\[ F'(x) + \beta\,G^*\psi'_\epsilon(Gx - c) = 0. \quad (9.25) \]
Although the non-uniqueness of the subdifferential in the optimality condition (9.22) is now bypassed through regularization, the optimality condition (9.25) is still nonlinear. To treat it, we first observe that $\max_\epsilon'$ has an alternative representation:
\[ {\max}_\epsilon'(0,s) = \chi_\epsilon(s)\,s, \qquad \chi_\epsilon(s) = \begin{cases} 0 & s \le 0 \\ \dfrac{1}{\max(\epsilon, s)} & s > 0. \end{cases} \quad (9.26) \]
This suggests the following semi-implicit fixed point iteration:
\[ \alpha P(x_{k+1} - x_k) + F'(x_k) + \beta\,G^*\chi_k\,(Gx_{k+1} - c) = 0, \qquad \chi_k = \chi_\epsilon(Gx_k - c), \quad (9.27) \]
where $P$ is a positive self-adjoint operator that serves as a pre-conditioner for $F'$, and $\alpha > 0$ serves as a stabilizing and accelerating stepsize (see Theorem 3 below). Let $d_k = x_{k+1} - x_k$. Equation (9.27) for $x_{k+1}$ is equivalent to the equation for $d_k$
\[ \alpha P d_k + F'(x_k) + \beta\,G^*\chi_k(Gx_k - c + G d_k) = 0, \]
which gives us
\[ (\alpha P + \beta\,G^*\chi_k G)\,d_k = -J'_\epsilon(x_k). \quad (9.28) \]
The direction $d_k$ is a descent direction for $J_\epsilon$ at $x_k$; indeed,
\[ (d_k, J'_\epsilon(x_k)) = -((\alpha P + \beta\,G^*\chi_k G)d_k, d_k) = -\alpha(P d_k, d_k) - \beta(\chi_k G d_k, G d_k) < 0, \]
where we used the fact that $P$ is strictly positive definite. So the iteration (9.27) can be seen as a descent method:

Algorithm (Fixed point iteration)
Step 0. Set parameters $\beta, \alpha, \epsilon, P$.
Step 1. Compute the direction from $(\alpha P + \beta\,G^*\chi_k G)\,d_k = -J'_\epsilon(x_k)$.
Step 2. Update $x_{k+1} = x_k + d_k$.
Step 3. If $|J'_\epsilon(x_k)| < \mathrm{TOL}$, stop; otherwise repeat Steps 1--2.

A NumPy sketch of this iteration is given below, followed by some remarks on the algorithm.
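The sketch below implements the fixed point iteration with dense linear algebra; `Fp` returns $F'(x)$, and the matrices `G`, `P` together with all parameters are illustrative placeholders rather than part of the text.

```python
import numpy as np

def exact_penalty_fixed_point(Fp, G, c, P, alpha, beta, eps, x0,
                              tol=1e-8, max_iter=200):
    """Fixed point iteration for the regularized exact penalty problem
    (9.24), using the direction equation (9.28)."""
    x = x0.copy()
    for _ in range(max_iter):
        s = G @ x - c
        # chi_eps(s) = 0 for s <= 0 and 1/max(eps, s) for s > 0, cf. (9.26)
        chi = np.where(s > 0, 1.0 / np.maximum(eps, s), 0.0)
        grad = Fp(x) + beta * G.T @ (chi * s)          # J_eps'(x_k)
        if np.linalg.norm(grad) < tol:
            break
        M = alpha * P + beta * G.T @ (chi[:, None] * G)
        x += np.linalg.solve(M, -grad)                 # descent step d_k
    return x
```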
In many applications the structures of $F'$ and $G$ are sparse block diagonals, and the resulting system (9.28) for the direction $d_k$ then becomes a linear system with a sparse symmetric positive definite matrix; it can be solved efficiently by the Cholesky decomposition method, for example. If $F(x) = \frac12(x,Ax) - (b,x)$, then $F'(x) = Ax - b$. For this case we may use the alternative update
\[ \alpha P(x_{k+1} - x_k) + Ax_{k+1} - b + \beta\,G^*\chi_k(Gx_{k+1} - c) = 0, \]
assuming that this fully implicit step is not much more expensive to perform.

For the bilateral inequality constraint $c - g \le Gx \le c + g$ the update (9.27) becomes
\[ \alpha P(x_{k+1} - x_k) + F'(x_k) + \beta\,G(j,:)^*\chi_k(Gx_{k+1} - c \mp g)_j = 0, \quad j \in A_k, \quad (9.29) \]
where the active index set $A_k = A_k^+ \cup A_k^-$ and the diagonal matrix $\chi_k$ on $A_k$ are defined by
\[ A_k^+ = \{j : (Gx_k - c - g)_j \ge 0\}, \quad A_k^- = \{j : (Gx_k - c + g)_j \le 0\}, \qquad (\chi_k)_{jj} = \frac{1}{\max(\epsilon,\, |(Gx_k - c)_j| - g_j)}. \]
The algorithm is globally convergent in practice, and the following results support this fact.

Theorem 3 Define
\[ R(x, \hat x) = -(F(x) - F(\hat x) - F'(\hat x)(x - \hat x)) + \alpha(P(x - \hat x), x - \hat x). \]
Then we have
\[ R(x_{k+1}, x_k) + F(x_{k+1}) - F(x_k) + \frac{\beta}{2}\Big((\chi_k G d_k, G d_k) + \sum_{j'}(\chi_k, |Gx_{k+1} - c - g|^2 - |Gx_k - c - g|^2) + \sum_{j''}(\chi_k, |Gx_{k+1} - c + g|^2 - |Gx_k - c + g|^2)\Big) = 0, \]
where $j' \in \{j : (Gx_k - c - g)_j \ge 0\}$ and $j'' \in \{j : (Gx_k - c + g)_j \le 0\}$.

Proof: Multiplying (9.29) by $d_k = x_{k+1} - x_k$,
\[ \alpha(P d_k, d_k) - (F(x_{k+1}) - F(x_k) - F'(x_k)d_k) + F(x_{k+1}) - F(x_k) + E_k = 0, \]
where
\[ E_k = \beta(\chi_k(G(j,:)x_{k+1} - c), G(j,:)d_k) = \frac{\beta}{2}\Big((\chi_k G(j,:)d_k, G(j,:)d_k) + \sum_{j'}(\chi_k, |Gx_{k+1}-c-g|^2 - |Gx_k-c-g|^2) + \sum_{j''}(\chi_k, |Gx_{k+1}-c+g|^2 - |Gx_k-c+g|^2)\Big). \]

Corollary 2 Suppose all inactive indices remain inactive; then
\[ \omega\,|x_{k+1} - x_k|^2 + J_\epsilon(x_{k+1}) - J_\epsilon(x_k) \le 0 \]
and $\{x_k\}$ is globally convergent.

Proof: Since $s \mapsto \psi_\epsilon(\sqrt{|s - g|})$ on $s \ge g$ and $s \mapsto \psi_\epsilon(\sqrt{|s + g|})$ on $s \le -g$ are concave, we have
\[ (\chi_k, |(Gx_{k+1}-c-g)_{j'}|^2 - |(Gx_k-c-g)_{j'}|^2) \ge \psi_\epsilon((Gx_{k+1}-c-g)_{j'}) - \psi_\epsilon((Gx_k-c-g)_{j'}) \]
and
\[ (\chi_k, |(Gx_{k+1}-c+g)_{j''}|^2 - |(Gx_k-c+g)_{j''}|^2) \ge \psi_\epsilon((Gx_{k+1}-c+g)_{j''}) - \psi_\epsilon((Gx_k-c+g)_{j''}). \]
Thus we obtain
\[ F(x_{k+1}) + \beta\,\psi_\epsilon(x_{k+1}) + R(x_{k+1},x_k) + \frac{\beta}{2}(\chi_k G(j,:)(x_{k+1}-x_k), G(j,:)(x_{k+1}-x_k)) \le F(x_k) + \beta\,\psi_\epsilon(x_k). \]
If we assume $R(x,\hat x) \ge \omega\,|x - \hat x|^2$ for some $\omega > 0$ and all $x, \hat x$, then $J_\epsilon(x_k)$ is monotonically decreasing and $\sum_k |x_{k+1}-x_k|^2 < \infty$.

Corollary 3 Suppose $i \in A_k^c = I_k$ is inactive but $\psi_\epsilon((Gx_{k+1} - c)_i) > 0$. Assume
\[ \frac{\beta}{2}(\chi_k G(j,:)(x_{k+1} - x_k), G(j,:)(x_{k+1} - x_k)) - \beta\sum_{i\in I_k}\psi_\epsilon((Gx_{k+1} - c)_i) \ge -\omega'\,|x_{k+1} - x_k|^2 \]
with $0 \le \omega' < \omega$; then the algorithm is globally convergent.

10 Sensitivity analysis

In this section sensitivity analysis is discussed for parameter-dependent optimization. Consider the parametric optimization problem
\[ \min F(x,p), \quad x \in C. \quad (10.1) \]
For given $p \in P$, a complete metric space, let $x = x(p) \in C$ be a minimizer of (10.1). For $\bar p \in P$ and $p = \bar p + t\dot p \in P$ with $t \in \mathbb{R}$ and increment $\dot p$ we have
\[ F(x(p), p) \le F(x(\bar p), p), \qquad F(x(\bar p), \bar p) \le F(x(p), \bar p). \]
Thus
\[ F(x(p),p) - F(x(\bar p),\bar p) = F(x(p),p) - F(x(\bar p),p) + F(x(\bar p),p) - F(x(\bar p),\bar p) \le F(x(\bar p),p) - F(x(\bar p),\bar p), \]
\[ F(x(p),p) - F(x(\bar p),\bar p) = F(x(p),\bar p) - F(x(\bar p),\bar p) + F(x(p),p) - F(x(p),\bar p) \ge F(x(p),p) - F(x(p),\bar p), \]
and therefore
\[ F(x(p),p) - F(x(p),\bar p) \le F(x(p),p) - F(x(\bar p),\bar p) \le F(x(\bar p),p) - F(x(\bar p),\bar p). \]
Hence for $t > 0$ we have
\[ \frac{F(x(\bar p+t\dot p), \bar p+t\dot p) - F(x(\bar p+t\dot p), \bar p)}{t} \le \frac{F(x(\bar p+t\dot p), \bar p+t\dot p) - F(x(\bar p),\bar p)}{t} \le \frac{F(x(\bar p), \bar p+t\dot p) - F(x(\bar p),\bar p)}{t}, \]
\[ \frac{F(x(\bar p-t\dot p), \bar p) - F(x(\bar p-t\dot p), \bar p-t\dot p)}{t} \le \frac{F(x(\bar p),\bar p) - F(x(\bar p-t\dot p), \bar p-t\dot p)}{t} \le \frac{F(x(\bar p),\bar p) - F(x(\bar p), \bar p-t\dot p)}{t}. \]
Based on these inequalities we have:

Theorem (Sensitivity I) Assume:
(H1) there exists a continuous graph $p \to x(p) \in C$ in a neighborhood of $(x(\bar p), \bar p) \in C \times P$; define the value function $V(p) = F(x(p), p)$;
(H2) $(x,p) \in C \times P \to F_p(x,p)$ is continuous in a neighborhood of $(x(\bar p), \bar p)$.
Then the G-derivative of $V$ at $\bar p$ exists and is given by $V'(\bar p)\dot p = F_p(x(\bar p), \bar p)\dot p$.

Next, consider the implicit function case. Consider a functional
\[ V(p) = F(x(p)), \qquad E(x(p), p) = 0, \]
where we assume the constraint $E$ is $C^1$ and $E(x,p) = 0$ defines a continuous solution graph $p \to x(p)$ in a neighborhood of $(x(\bar p), \bar p)$.

Theorem (Sensitivity II) Assume there exists $\lambda$ satisfying the adjoint equation
\[ E_x(x(\bar p), \bar p)^*\lambda + F_x(x(\bar p)) = 0 \]
and assume $\epsilon_1 + \epsilon_2 = o(|p - \bar p|)$, where
\[ \epsilon_1 = (E(x(p), \bar p) - E(x(\bar p), \bar p) - E_x(x(\bar p), \bar p)(x(p) - x(\bar p)),\, \lambda), \qquad \epsilon_2 = F(x(p)) - F(x(\bar p)) - F_x(x(\bar p))(x(p) - x(\bar p)). \quad (10.2) \]
Then $p \to V(p)$ is differentiable at $\bar p$ and
\[ V'(\bar p)\dot p = (E_p(x(\bar p), \bar p)\dot p,\, \lambda). \]

Proof: Note that
\[ V(p) - V(\bar p) = F_x(x(\bar p))(x(p) - x(\bar p)) + \epsilon_2 \]
and
\[ (E_x(x(\bar p), \bar p)(x(p) - x(\bar p)), \lambda) + F_x(x(\bar p))(x(p) - x(\bar p)) = 0. \]
Since
\[ 0 = (E(x(p), p) - E(x(\bar p), \bar p), \lambda) = (E_x(x(\bar p), \bar p)(x(p) - x(\bar p)), \lambda) + (E(x(p), p) - E(x(p), \bar p), \lambda) + \epsilon_1, \]
we have
\[ V(p) - V(\bar p) = (E_p(x(p), \bar p)(p - \bar p), \lambda) + \epsilon_1 + \epsilon_2 + o(|p - \bar p|), \]
and thus $V$ is differentiable with the desired derivative. If we assume that $E$ and $F$ are $C^2$, then the Hölder continuity
\[ |x(p) - x(\bar p)|_X \sim o(|p - \bar p|_P^{1/2}) \]
is sufficient for condition (10.2) to hold. A finite-difference verification of this adjoint formula on a toy problem is sketched below.
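The following sketch checks the formula of Theorem (Sensitivity II) numerically on an assumed toy problem with linear constraint $E(x,p) = Ax - p = 0$ and quadratic cost $F(x) = \frac12|x - x_d|^2$, so that $E_p\dot p = -\dot p$ and the adjoint-based derivative is $-(\dot p, \lambda)$; all data are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned toy E_x
x_d = rng.standard_normal(n)

def V(p):
    """Value function F(x(p)) with E(x, p) = A x - p = 0."""
    x = np.linalg.solve(A, p)
    return 0.5 * np.sum((x - x_d) ** 2)

p = rng.standard_normal(n)
x = np.linalg.solve(A, p)
lam = np.linalg.solve(A.T, -(x - x_d))            # adjoint equation
p_dot = rng.standard_normal(n)
adjoint_dV = -(p_dot @ lam)                       # (E_p p_dot, lam), E_p = -I
fd_dV = (V(p + 1e-6 * p_dot) - V(p)) / 1e-6       # finite-difference check
print(adjoint_dV, fd_dV)                          # agree to ~1e-6
```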
Next, consider the parametric constrained optimization
\[ \min F(x,p) \quad \text{subject to } E(x,p) \in K. \]
We assume:
(H3) $\lambda \in Y$ satisfies the adjoint equation $E_x(x(\bar p), \bar p)^*\lambda + F_x(x(\bar p), \bar p) = 0$;
(H4) condition (10.2) holds;
(H5) $(x,p) \in C \times P \to E_p(x,p) \in Y$ is continuous in a neighborhood of $(x(\bar p), \bar p)$.

Eigenvalue problem Let $A(p)$ be a linear operator on a Hilbert space $X$. For $x = (\mu, y) \in \mathbb{R} \times X$ consider
\[ F(x) = \mu \quad \text{subject to } A(p)y = \mu y, \]
that is, $(\mu, y)$ is an eigenpair of $A(p)$. The adjoint system is
\[ A(p)^*\lambda - \mu\lambda = 0, \qquad (y, \lambda) = 1, \]
and thus
\[ \mu'(p)\dot p = \Big(\frac{d}{dp}A(p)\dot p\;y,\; \lambda\Big). \]

Next, consider the parametric constrained optimization: given $p \in P$,
\[ \min_{x,u} F(x,u,p) \quad \text{subject to } E(x,u,p) = 0. \quad (10.3) \]
Let $(\bar x, \bar u)$ be a minimizer given $\bar p \in P$. We assume (H3)--(H5), with (H3) reading: $\lambda \in Y$ satisfies the adjoint equation $E_x(\bar x, \bar u, \bar p)^*\lambda + F_x(\bar x, \bar u, \bar p) = 0$.

Theorem (Sensitivity III) Under assumptions (H1)--(H5) the necessary optimality system for (10.3) is given by
\[ E_x^*\lambda + F_x(\bar x, \bar u, \bar p) = 0, \qquad E_u^*\lambda + F_u(\bar x, \bar u, \bar p) = 0, \qquad E(\bar x, \bar u, \bar p) = 0, \]
and
\[ V'(\bar p)\dot p = F_p(\bar x, \bar u, \bar p)\dot p + (\lambda, E_p(\bar x, \bar u, \bar p)\dot p). \]

Proof: We have
\[ F(x(p),p) - F(x(\bar p),\bar p) \ge F_x(x(\bar p),\bar p)(x(p) - x(\bar p)) + F(x(p),p) - F(x(p),\bar p), \]
\[ F(x(p),p) - F(x(\bar p),\bar p) \le F_x(x(p),p)(x(p) - x(\bar p)) + F(x(\bar p),p) - F(x(\bar p),\bar p). \]
Note that
\[ E(x(p),p) - E(x(\bar p),\bar p) = E_x(x(\bar p),\bar p)(x(p) - x(\bar p)) + E(x(p),p) - E(x(p),\bar p) + o(|x(p) - x(\bar p)|) \]
and
\[ (\lambda, E_x(\bar x,\bar p)(x(p) - x(\bar p))) + F_x(\bar x,\bar p)(x(p) - x(\bar p)) = 0. \]
Combining the above inequalities and letting $t \to 0$, we obtain the claim.

11 Augmented Lagrangian Method

In this section we discuss the augmented Lagrangian method for the constrained minimization
\[ \min F(x) \quad \text{subject to } E(x) = 0, \; G(x) \le 0, \quad (11.1) \]
over $x \in C$. For the equality-constrained case we consider the augmented Lagrangian functional
\[ L_c(x,\lambda) = F(x) + (\lambda, E(x)) + \frac{c}{2}|E(x)|^2 \]
for $c > 0$. The augmented Lagrangian method is the iterative update
\[ x_{n+1} = \mathrm{argmin}_{x\in C}\, L_c(x, \lambda_n), \qquad \lambda_{n+1} = \lambda_n + c\,E(x_{n+1}). \]
It is a combination of the multiplier method and the penalty method: $c = 0$ with $\lambda_{n+1} = \lambda_n + \alpha\,E(x_{n+1})$ and stepsize $\alpha > 0$ gives the multiplier method, and $\lambda_n = 0$ gives the penalty method. The multiplier method converges (locally) if $F$ is (locally) uniformly convex. The penalty method requires $c \to \infty$ and may suffer if $c > 0$ is too large. It can be shown that the augmented Lagrangian method is locally convergent provided that $L''_c(x,\lambda)$ is uniformly positive near the solution pair $(\bar x, \bar\lambda)$. Since
\[ L''_c(\bar x, \bar\lambda) = F''(\bar x) + (\bar\lambda, E''(\bar x)) + c\,E'(\bar x)^*E'(\bar x), \]
the augmented Lagrangian method has an enlarged class of convergence compared to the multiplier method. Unlike the penalty method it is not necessary to let $c \to \infty$, and the Lagrange multiplier update speeds up the convergence.

For the inequality constraint $G(x) \le 0$ we consider the equivalent formulation
\[ \min F(x) + (\mu, G(x) - z) + \frac{c}{2}|G(x) - z|^2 \quad \text{over } (x,z) \in C \times Z \quad \text{subject to } G(x) = z, \; z \le 0. \]
Minimizing over $z \le 0$, the minimum is attained at
\[ z^* = \min\Big(0,\ \frac{\mu + c\,G(x)}{c}\Big), \]
and thus we obtain
\[ \min F(x) + \frac{1}{2c}\big(|\max(0, \mu + c\,G(x))|^2 - |\mu|^2\big) \quad \text{over } x \in C. \]
For (11.1), given $(\lambda, \mu)$ and $c > 0$, define the augmented Lagrangian functional
\[ L_c(x,\lambda,\mu) = F(x) + (\lambda, E(x)) + \frac{c}{2}|E(x)|^2 + \frac{1}{2c}\big(|\max(0, \mu + c\,G(x))|^2 - |\mu|^2\big). \quad (11.2) \]
The first order augmented Lagrangian method is a sequential minimization of $L_c(x,\lambda,\mu)$ over $x \in C$ (a sketch follows at the end of this section):

Algorithm (First order Augmented Lagrangian method)
1. Initialize $(\lambda_0, \mu_0)$.
2. Let $x_n$ be a solution of $\min_{x\in C} L_c(x, \lambda_n, \mu_n)$.
3. Update the Lagrange multipliers
\[ \lambda_{n+1} = \lambda_n + c\,E(x_n), \qquad \mu_{n+1} = \max(0, \mu_n + c\,G(x_n)). \quad (11.3) \]

Remark (1) Define the value function $\Phi(\lambda,\mu) = \min_{x\in C} L_c(x,\lambda,\mu)$. Then one can prove that
\[ \Phi_\lambda = E(x), \qquad \Phi_\mu = \frac{1}{c}\big(\max(0, \mu + c\,G(x)) - \mu\big). \]
(2) The Lagrange multiplier update (11.3) is a gradient ascent step with stepsize $c$ for maximizing $\Phi(\lambda,\mu)$.
(3) For the equality-constrained case,
\[ L_c(x,\lambda) = F(x) + (\lambda, E(x)) + \frac{c}{2}|E(x)|^2 \quad\text{and}\quad L'_c(x,\lambda) = L'_0(x, \lambda + c\,E(x)). \]
The necessary optimality for $\max_\lambda\min_x L_c(x,\lambda)$ is given by
\[ L'_0(x, \lambda + c\,E(x)) = 0, \qquad E(x) = 0. \quad (11.4) \]
If we apply the Newton method to system (11.4), we obtain
\[ \begin{pmatrix} L''_0(x, \lambda + cE(x)) + c\,E'(x)^*E'(x) & E'(x)^* \\ E'(x) & 0 \end{pmatrix}\begin{pmatrix} x^+ - x \\ \lambda^+ - \lambda \end{pmatrix} = -\begin{pmatrix} L'_0(x, \lambda + cE(x)) \\ E(x) \end{pmatrix}. \]
The term $c > 0$ adds coercivity to the Hessian $L''_0$.
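A minimal driver for the first order augmented Lagrangian update (11.3) is sketched below. The inner minimization `minimize_Lc` and the constraint maps `E`, `G` are assumed user-supplied; only the multiplier updates are spelled out.

```python
import numpy as np

def augmented_lagrangian(minimize_Lc, E, G, lam0, mu0, c, n_iter=30):
    """First order augmented Lagrangian method.  minimize_Lc(lam, mu) is
    assumed to return argmin_x L_c(x, lam, mu) over C, cf. (11.2)."""
    lam, mu = lam0.copy(), mu0.copy()
    for _ in range(n_iter):
        x = minimize_Lc(lam, mu)
        lam = lam + c * E(x)                    # equality multiplier update
        mu = np.maximum(0.0, mu + c * G(x))     # inequality multiplier update
    return x, lam, mu
```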
11.1 Primal-Dual Active set method

For the inequality constraint $G(x) \le 0$ (e.g. $Gx - \tilde c \le 0$),
\[ L_c(x,\mu) = F(x) + \frac{1}{2c}\big(|\max(0, \mu + c\,G(x))|^2 - |\mu|^2\big) \]
and the necessary optimality condition is
\[ F'(x) + G'(x)^*\mu = 0, \qquad \mu = \max(0, \mu + c\,G(x)), \]
where $c > 0$ is arbitrary. Based on this complementarity condition we have the primal-dual active set method. For the affine case $G(x) = Gx - \tilde c$:

Primal-Dual Active Set method
1. Define the active and inactive index sets
\[ A = \{k : (\mu + c(Gx - \tilde c))_k > 0\}, \qquad I = \{j : (\mu + c(Gx - \tilde c))_j \le 0\}. \]
2. Set $\mu^+ = 0$ on $I$ and $Gx^+ = \tilde c$ on $A$.
3. Solve for $(x^+, \mu^+_A)$:
\[ F''(x)(x^+ - x) + F'(x) + G_A^*\mu^+ = 0, \qquad G_A x^+ - \tilde c_A = 0, \]
where $G_A = \{G_k\}$, $k \in A$.
4. Stop, or set $n = n+1$ and return to step 1.

For nonlinear $G$ one uses $G'(x)(x^+ - x) + G(x) = 0$ on $A = \{k : (\mu + c\,G(x))_k > 0\}$.

The primal-dual active set method is a semismooth Newton method. That is, we define a generalized derivative of $s \to \max(0,s)$ by $0$ on $(-\infty, 0]$ and $1$ on $(0,\infty)$. Note that $s \to \max(0,s)$ is not differentiable at $s = 0$; we define the derivative at $s = 0$ as the limit of the derivatives for $s < 0$. With this choice the generalized Newton update of $\mu = \max(0, \mu + c(Gx - \tilde c))$ is
\[ \mu^+ = 0 \ \text{ if } \mu + c(Gx - \tilde c) \le 0, \qquad \mu^+ = \mu^+ + c(Gx^+ - \tilde c), \text{ i.e. } Gx^+ - \tilde c = 0, \ \text{ if } \mu + c(Gx - \tilde c) > 0, \]
which results in the active set strategy $\mu_j^+ = 0$, $j \in I$, and $(Gx^+ - \tilde c)_k = 0$, $k \in A$. A toy implementation for a quadratic cost is sketched below.

We will introduce the class of semismooth functions and the semismooth Newton method in function spaces in Chapter 7. Specifically, the pointwise (coordinate-wise) operation $s \to \max(0,s)$ defines a semismooth function from $L^p(\Omega) \to L^q(\Omega)$, $p > q$.
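The following dense sketch runs the primal-dual active set loop for the quadratic program $\min \frac12 x^tAx - b^tx$ subject to $Gx \le \tilde c$, for which $F''(x)(x^+ - x) + F'(x) = Ax^+ - b$; the data and the termination rule (stable active set) are illustrative assumptions.

```python
import numpy as np

def pdas_qp(A, b, G, c_tilde, c=1.0, max_iter=50):
    """Primal-dual active set method for min 1/2 x'Ax - b'x s.t. Gx <= c_tilde,
    following steps 1-3 above (toy dense implementation)."""
    n, m = A.shape[0], G.shape[0]
    x, mu = np.linalg.solve(A, b), np.zeros(m)
    for _ in range(max_iter):
        active = mu + c * (G @ x - c_tilde) > 0
        GA = G[active]
        k = GA.shape[0]
        # KKT system: A x + G_A' mu_A = b,  G_A x = c_A,  mu_I = 0
        KKT = np.block([[A, GA.T], [GA, np.zeros((k, k))]])
        sol = np.linalg.solve(KKT, np.concatenate([b, c_tilde[active]]))
        x_new, mu_new = sol[:n], np.zeros(m)
        mu_new[active] = sol[n:]
        if np.array_equal(active, mu_new + c * (G @ x_new - c_tilde) > 0):
            return x_new, mu_new       # active set is stable: converged
        x, mu = x_new, mu_new
    return x, mu
```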
12 Lagrange multiplier theory for nonsmooth convex optimization

In this section we present the Lagrange multiplier theory for nonsmooth optimization of the form
\[ \min f(x) + \varphi(\Lambda x) \quad \text{over } x \in C, \quad (12.1) \]
where $X$ is a Banach space, $H$ is a Hilbert space lattice, $f : X \to \mathbb{R}$ is $C^1$, $\Lambda \in \mathcal{L}(X,H)$, and $C$ is a closed convex set in $X$. The nonsmoothness is represented by $\varphi$, a proper, lower semi-continuous convex functional on $H$. This problem class encompasses a wide variety of optimization problems, including variational inequalities of the first and second kind. We will specialize it to our concrete examples. Moreover, the exact penalty formulation of the constrained problem is
\[ \min f(x) + c\,|E(x)|_{L^1} + c\,|(G(x))^+|_{L^1}, \quad (12.2) \]
where $(G(x))^+ = \max(0, G(x))$.

We describe the Lagrange multiplier theory to deal with the nonsmoothness of $\varphi$. Note that (12.1) is equivalent to
\[ \min f(x) + \varphi(\Lambda x - u) \quad \text{subject to } x \in C \text{ and } u = 0 \text{ in } H. \quad (12.3) \]
Treating the equality constraint $u = 0$ in (12.3) by the augmented Lagrangian method results in the minimization problem
\[ \min_{x\in C,\, u\in H} f(x) + \varphi(\Lambda x - u) + (\lambda, u)_H + \frac{c}{2}|u|_H^2, \quad (12.4) \]
where $\lambda \in H$ is a multiplier and $c$ is a positive scalar penalty parameter. By (12.6) below, (12.4) reduces to
\[ \min_{x\in C} L_c(x,\lambda) = f(x) + \varphi_c(\Lambda x, \lambda), \quad (12.5) \]
where
\[ \varphi_c(u,\lambda) = \inf_{v\in H}\{\varphi(u - v) + (\lambda, v)_H + \tfrac{c}{2}|v|_H^2\} = \inf_{z\in H}\{\varphi(z) + (\lambda, u - z)_H + \tfrac{c}{2}|u - z|_H^2\} \quad (u - v = z). \quad (12.6) \]
For $u, \lambda \in H$ and $c > 0$, $\varphi_c(u,\lambda)$ is called the generalized Yosida--Moreau approximation of $\varphi$. It can be shown that $u \to \varphi_c(u,\lambda)$ is continuously Fréchet differentiable with Lipschitz continuous derivative. Then the augmented Lagrangian functional of (12.1) is given by
\[ L_c(x,\lambda) = f(x) + \varphi_c(\Lambda x, \lambda). \quad (12.7) \]
Let $\varphi^*$ be the convex conjugate of $\varphi$. Then
\[ \varphi_c(u,\lambda) = \inf_{z\in H}\Big\{\sup_{y\in H}\{(y,z) - \varphi^*(y)\} + (\lambda, u-z)_H + \frac{c}{2}|u-z|_H^2\Big\} = \sup_{y\in H}\Big\{-\frac{1}{2c}|y - \lambda|^2 + (y,u) - \varphi^*(y)\Big\}, \quad (12.8) \]
where we used
\[ \inf_{z\in H}\Big\{(y,z)_H + (\lambda, u-z)_H + \frac{c}{2}|u-z|_H^2\Big\} = (y,u) - \frac{1}{2c}|y-\lambda|_H^2. \]
Thus
\[ \varphi'_c(u,\lambda) = y_c(u,\lambda) = \mathrm{argmax}_{y\in H}\Big\{-\frac{1}{2c}|y-\lambda|^2 + (y,u) - \varphi^*(y)\Big\} = \lambda + c\,v_c(u,\lambda), \quad (12.9) \]
where
\[ v_c(u,\lambda) = \mathrm{argmin}_{v\in H}\Big\{\varphi(u - v) + (\lambda, v) + \frac{c}{2}|v|^2\Big\}. \]
Moreover, if
\[ p_c(\lambda) = \mathrm{argmin}_p\Big\{\frac{1}{2c}|p - \lambda|^2 + \varphi^*(p)\Big\}, \]
then $\varphi'_c(u,\lambda) = p_c(\lambda + c\,u)$.

Theorem (Lipschitz complementarity) (1) If $\lambda \in \partial\varphi(x)$ for $x, \lambda \in H$, then $\lambda = \varphi'_c(x,\lambda)$ for all $c > 0$. (2) Conversely, if $\lambda = \varphi'_c(x,\lambda)$ for some $c > 0$, then $\lambda \in \partial\varphi(x)$.

Proof: If $\lambda \in \partial\varphi(x)$, then from (12.6) and (12.8)
\[ \varphi(x) \ge \varphi_c(x,\lambda) \ge \langle\lambda, x\rangle - \varphi^*(\lambda) = \varphi(x). \]
Thus $\lambda \in H$ attains the supremum in (12.8) and we have $\lambda = \varphi'_c(x,\lambda)$. Conversely, if $\lambda \in H$ satisfies $\lambda = \varphi'_c(x,\lambda)$ for some $c > 0$, then $v_c(x,\lambda) = 0$ by (12.9). Hence it follows that
\[ \varphi(x) = \varphi_c(x,\lambda) = (\lambda, x) - \varphi^*(\lambda), \]
which implies $\lambda \in \partial\varphi(x)$.

If $x_c \in C$ denotes the solution to (12.5), then it satisfies
\[ \langle f'(x_c) + \Lambda^*\lambda_c,\, x - x_c\rangle_{X^*,X} \ge 0 \ \text{ for all } x \in C, \qquad \lambda_c = \varphi'_c(\Lambda x_c, \lambda). \quad (12.10) \]
Under appropriate conditions the pair $(x_c, \lambda_c) \in C \times H$ has a (strong-weak) cluster point $(\bar x, \bar\lambda)$ as $c \to \infty$ such that $\bar x \in C$ is the minimizer of (12.1) and $\bar\lambda \in H$ is a Lagrange multiplier in the sense that
\[ \langle f'(\bar x) + \Lambda^*\bar\lambda,\, x - \bar x\rangle_{X^*,X} \ge 0 \ \text{ for all } x \in C, \quad (12.11) \]
with the complementarity condition
\[ \bar\lambda = \varphi'_c(\Lambda\bar x, \bar\lambda) \quad \text{for each } c > 0. \quad (12.12) \]
System (12.11)--(12.12) is for the primal-dual variable $(\bar x, \bar\lambda)$. The advantage here is that the frequently employed differential inclusion $\bar\lambda \in \partial\varphi(\Lambda\bar x)$ is replaced by the equivalent nonlinear equation (12.12). The first order augmented Lagrangian method for (12.1) is given by:
• Select $\lambda_0$ and set $n = 0$.
• Let $x_n = \mathrm{argmin}_{x\in C}\, L_c(x, \lambda_n)$.
• Update the Lagrange multiplier by $\lambda_{n+1} = \varphi'_c(\Lambda x_n, \lambda_n)$.
• Stop, or set $n = n+1$ and return to the second step.

In many applications the convex conjugate functional $\varphi^*$ of $\varphi$ is given by $\varphi^*(v) = I_{K^*}(v)$, where $K^*$ is a closed convex set in $H$ and $I_S$ is the indicator function of a set $S$. From (12.8),
\[ \varphi_c(u,\lambda) = \sup_{y\in K^*}\Big\{-\frac{1}{2c}|y-\lambda|^2 + (y,u)\Big\}, \]
and (12.12) is equivalent to
\[ \bar\lambda = \mathrm{Proj}_{K^*}(\bar\lambda + c\,\Lambda\bar x), \quad (12.13) \]
which is the basis of our approach.

12.1 Examples

In this section we discuss applications of the Lagrange multiplier theorem and the complementarity condition; pointwise evaluations of $\varphi'_c$ for the first two examples are sketched in code below.

Example (Inequality constraint) Let $H = L^2(\Omega)$ and $\varphi = I_C$ with $C = \{z \in L^2(\Omega) : z \le 0 \text{ a.e.}\}$, i.e.
\[ \min f(x) \quad \text{over } \Lambda x - c \le 0. \quad (12.14) \]
In this case $\varphi^*(v) = I_{K^*}(v)$ with $K^* = -C$ and the complementarity condition (12.12) implies that
\[ f'(\bar x) + \Lambda^*\bar\lambda = 0, \qquad \bar\lambda = \max(0, \bar\lambda + c(\Lambda\bar x - c)). \]

Example ($L^1(\Omega)$ optimization) Let $H = L^2(\Omega)^d$ and $\varphi(v) = \int_\Omega |v|_2\,dx$, i.e.
\[ \min f(y) + \int_\Omega |\Lambda y|_2\,dx. \]
In this case $\varphi^*(v) = I_{K^*}(v)$ with $K^* = \{z \in L^2(\Omega)^d : |z|_2 \le 1 \text{ a.e.}\}$ and the complementarity condition (12.13) implies that
\[ f'(\bar x) + \Lambda^*\bar\lambda = 0, \qquad \bar\lambda = \frac{\bar\lambda + c\,\Lambda\bar x}{\max(1, |\bar\lambda + c\,\Lambda\bar x|_2)} \quad \text{a.e.} \quad (12.15) \]

Example (Constrained optimization) Let $X$ be a Hilbert space and consider
\[ \min f(x) \quad \text{subject to } \Lambda x \in C, \quad (12.16) \]
where $f : X \to \mathbb{R}$ is $C^1$ and $C$ is a closed convex set. In this case $\varphi = I_C$, and
\[ \varphi_c(u,\lambda) = \inf_{z\in C}\Big\{(\lambda, u - z)_H + \frac{c}{2}|u - z|_H^2\Big\}. \quad (12.17) \]
The minimum in (12.17) is attained at
\[ z^* = \mathrm{Proj}_C\Big(u + \frac{\lambda}{c}\Big) \quad (12.18) \]
and
\[ \varphi'_c(u,\lambda) = \lambda + c\,(u - z^*). \quad (12.19) \]
If $H = L^2(\Omega)$ and $C$ is the bilateral constraint $\{y : \phi \le \Lambda y \le \psi\}$, the complementarity conditions (12.18)--(12.19) imply that for $c > 0$
\[ f'(x^*) + \Lambda^*\mu = 0, \qquad \mu = \max(0, \mu + c(\Lambda x^* - \psi)) + \min(0, \mu + c(\Lambda x^* - \phi)) \quad \text{a.e.} \quad (12.20) \]
If either $\phi = -\infty$ or $\psi = \infty$ the constraint is unilateral and defines the generalized obstacle problem. The cost functional can represent the performance index of an optimal control or design problem, the fidelity of data-to-fit for an inverse problem, or the deformation and restoring energy of a variational problem.
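The two maps below evaluate $\varphi'_c$ pointwise for the $L^1$ example (12.15) and the bilateral example (12.20); they are elementwise NumPy sketches for scalar-valued fields (for vector-valued $v$, replace $|z|$ by the pointwise Euclidean norm), and the function names are ours.

```python
import numpy as np

def phi_c_prime_L1(u, lam, c):
    """phi_c'(u, lam) for phi(v) = integral of |v|: projection of lam + c*u
    onto the dual unit ball K* = {|z| <= 1 a.e.}, cf. (12.13) and (12.15)."""
    z = lam + c * u
    return z / np.maximum(1.0, np.abs(z))

def phi_c_prime_box(u, lam, c, lo, hi):
    """phi_c'(u, lam) for phi = indicator of the bilateral constraint
    lo <= u <= hi, cf. (12.20)."""
    return (np.maximum(0.0, lam + c * (u - hi))
            + np.minimum(0.0, lam + c * (u - lo)))
```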
13 Semismooth Newton method

In this section we present the semismooth Newton method for nonsmooth necessary optimality conditions, e.g., the complementarity condition (12.10): $\lambda = \varphi'_c(\Lambda x, \lambda)$. Consider the nonlinear equation $F(y) = 0$ in a Banach space $X$. The generalized Newton update is given by
\[ y^{k+1} = y^k - V_k^{-1}F(y^k), \quad (13.1) \]
where $V_k$ is a generalized derivative of $F$ at $y^k$. In the finite dimensional space, for a locally Lipschitz continuous function $F$ let $D_F$ denote the set of points at which $F$ is differentiable. For $x \in X = \mathbb{R}^n$ we define the Bouligand derivative $\partial_B F(x)$ as
\[ \partial_B F(x) = \Big\{ J : J = \lim_{x_i\to x,\; x_i\in D_F} \nabla F(x_i) \Big\}, \quad (13.2) \]
where $D_F$ is dense by Rademacher's theorem, which states that every locally Lipschitz continuous function on a finite dimensional space is differentiable almost everywhere. Thus we take $V_k \in \partial_B F(y^k)$.

In infinite dimensional spaces, notions of generalized derivatives for functions which are not $C^1$ cannot rely on Rademacher's theorem. Here, instead, we shall mainly utilize a concept of generalized derivative that is sufficient to guarantee superlinear convergence of Newton's method. This notion of differentiability is called Newton differentiability and is defined below. Let $X, Z$ be real Banach spaces and let $D \subset X$ be an open set.

Definition (Newton differentiable) (1) $F : D \subset X \to Z$ is called Newton differentiable at $x$ if there exist an open neighborhood $N(x) \subset D$ and mappings $G : N(x) \to \mathcal{L}(X,Z)$ such that
\[ \lim_{|h|\to 0} \frac{|F(x+h) - F(x) - G(x+h)h|_Z}{|h|_X} = 0. \]
The family $\{G(y) : y \in N(x)\}$ is called an N-derivative of $F$ at $x$.
(2) $F$ is called semismooth at $x$ if it is Newton differentiable at $x$ and $\lim_{t\to 0^+} G(x + th)h$ exists uniformly in $|h| = 1$.

Semismoothness was originally introduced for scalar-valued functions. Convex functions and real-valued $C^1$ functions are examples of semismooth functions in the finite dimensional space. For example, if $F(y)(s) = \psi(y(s))$ pointwise, then $G(y)(s) \in \partial_B\psi(y(s))$ is an N-derivative from $L^p(\Omega) \to L^q(\Omega)$, $p > q$, under appropriate conditions. We often use $\psi(s) = |s|$ and $\max(0,s)$ for the necessary optimality conditions.

Suppose $F(y^*) = 0$. Then $y^* = y^k + (y^* - y^k)$ and
\[ |y^{k+1} - y^*| = |V_k^{-1}(F(y^*) - F(y^k) - V_k(y^* - y^k))| \le |V_k^{-1}|\,o(|y^k - y^*|). \]
Thus the semismooth Newton method is q-superlinearly convergent provided that the Jacobian sequence $V_k$ is uniformly invertible as $y^k \to y^*$. That is, if one can select a sequence of quasi-Jacobians $V_k$ that is consistent, i.e. $|V_k - G(y^*)| \to 0$ as $y^k \to y^*$, and invertible, then (13.1) is still q-superlinearly convergent.

13.1 Bilateral constraint

Consider the bilateral constraint problem
\[ \min f(x) \quad \text{subject to } \phi \le \Lambda x \le \psi. \quad (13.3) \]
We have the optimality condition (12.20):
\[ f'(x) + \Lambda^*\mu = 0, \qquad \mu = \max(0, \mu + c(\Lambda x - \psi)) + \min(0, \mu + c(\Lambda x - \phi)) \quad \text{a.e.} \]
Since $\partial_B\max(0,s) = \{0,1\}$ and $\partial_B\min(0,s) = \{0,1\}$ at $s = 0$, the semismooth Newton method for the bilateral constraint (13.3) takes the form of the primal-dual active set method:

Primal-dual active set method
1. Initialize $y^0 \in X$ and $\mu^0 \in H$. Set $k = 0$.
2. Set the active set $A = A^+ \cup A^-$ and the inactive set $I$ by
\[ A^+ = \{x \in \Omega : \mu^k + c(\Lambda y^k - \psi) > 0\}, \quad A^- = \{x \in \Omega : \mu^k + c(\Lambda y^k - \phi) < 0\}, \quad I = \Omega \setminus A. \]
3. Solve for $(y^{k+1}, \mu^{k+1})$:
\[ f''(y^k)(y^{k+1} - y^k) + f'(y^k) + \Lambda^*\mu^{k+1} = 0, \]
\[ \Lambda y^{k+1}(x) = \psi(x),\ x \in A^+, \qquad \Lambda y^{k+1}(x) = \phi(x),\ x \in A^-, \qquad \mu^{k+1}(x) = 0,\ x \in I. \]
4. Stop, or set $k = k+1$ and return to the second step.

Thus the algorithm involves solving a linear system of the form
\[ \begin{pmatrix} A & \Lambda^*_{A^+} & \Lambda^*_{A^-} \\ \Lambda_{A^+} & 0 & 0 \\ \Lambda_{A^-} & 0 & 0 \end{pmatrix}\begin{pmatrix} y \\ \mu_{A^+} \\ \mu_{A^-} \end{pmatrix} = \begin{pmatrix} f \\ \psi \\ \phi \end{pmatrix}. \]
A generic driver for the generalized Newton iteration (13.1), with a scalar example of the max-complementarity, is sketched below.
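The following sketch is a generic driver for (13.1); the caller supplies `F` and an N-derivative map `G`. The toy example is ours: it solves $x + \max(0,x) - a = 0$ componentwise, whose exact solution is $a/2$ where $a > 0$ and $a$ where $a \le 0$, using the Bouligand derivative of $\max(0,\cdot)$.

```python
import numpy as np

def semismooth_newton(F, G, y0, tol=1e-12, max_iter=50):
    """Generalized Newton iteration (13.1): y <- y - G(y)^{-1} F(y),
    where G(y) returns an N-derivative (a matrix) of F at y."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        Fy = F(y)
        if np.linalg.norm(Fy) < tol:
            break
        y = y - np.linalg.solve(G(y), Fy)
    return y

a = np.array([2.0, -3.0])
F = lambda x: x + np.maximum(0.0, x) - a
G = lambda x: np.diag(1.0 + (x > 0).astype(float))  # N-derivative of F
print(semismooth_newton(F, G, np.zeros(2)))          # -> [1.0, -3.0]
```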
In order to gain stability, the following one-parameter family of regularizations can be used:
\[ \mu = \alpha\,\big(\max(0, \mu + c(\Lambda y - \psi)) + \min(0, \mu + c(\Lambda y - \phi))\big), \quad \alpha \to 1. \]

13.2 $L^1$ optimization

As an important example we discuss the case $\varphi(v) = \int_\Omega |v|_2\,dx$ ($L^1$-type optimization). The complementarity condition is reduced to the form
\[ \lambda\,\max(1 + \epsilon, |\lambda + c\,v|) = \lambda + c\,v \quad (13.4) \]
for $\epsilon \ge 0$. It is often convenient (but not essential) to use a very small $\epsilon > 0$ to avoid the singularity in the implementation of the algorithm. Let
\[ \varphi_\epsilon(s) = \begin{cases} \dfrac{s^2}{2\epsilon} + \dfrac{\epsilon}{2} & |s| \le \epsilon \\ |s| & |s| \ge \epsilon. \end{cases} \quad (13.5) \]
Then (13.4) corresponds to the regularized problem
\[ \min J_\epsilon(y) = F(y) + \varphi_\epsilon(\Lambda y). \]
The semismooth Newton method is given by $F'(y^+) + \Lambda^*\lambda^+ = 0$ together with, writing $v = \Lambda y$, $v^+ = \Lambda y^+$,
\[ \lambda^+ = \frac{v^+}{\epsilon} \ \text{ if } |\lambda + c\,v| \le 1 + \epsilon, \]
\[ |\lambda + cv|\,\lambda^+ + \lambda\Big(\frac{\lambda + cv}{|\lambda + cv|}\Big)^t(\lambda^+ + c\,v^+) = \lambda^+ + c\,v^+ + |\lambda + cv|\,\lambda \ \text{ if } |\lambda + c\,v| > 1 + \epsilon. \quad (13.6) \]
There is no guarantee that (13.6) is solvable for $\lambda^+$ or that it is stable. In order to obtain a compact and unconditionally stable formula we use the damped and regularized algorithm with $\beta \le 1$:
\[ \lambda^+ = \frac{v^+}{\epsilon} \ \text{ if } |\lambda + c\,v| \le 1 + \epsilon, \]
\[ |\lambda + cv|\,\lambda^+ + \beta\,\frac{\lambda}{\max(1, |\lambda|)}\Big(\frac{\lambda + cv}{|\lambda + cv|}\Big)^t(\lambda^+ + c\,v^+) = \lambda^+ + c\,v^+ + \beta\,|\lambda + cv|\,\frac{\lambda}{\max(1,|\lambda|)} \ \text{ if } |\lambda + c\,v| > 1 + \epsilon. \quad (13.7) \]
Here the purpose of the regularization $\lambda/(|\lambda| \wedge 1)$ is to automatically constrain the dual variable $\lambda$ to the unit ball. The damping factor $\beta$ is selected automatically to achieve stability. Let
\[ d = |\lambda + cv|, \quad \eta = d - 1, \quad a = \frac{\lambda}{|\lambda| \wedge 1}, \quad b = \frac{\lambda + cv}{|\lambda + cv|}, \quad F = ab^t. \]
Then (13.7) is equivalent to
\[ \lambda^+ = (\eta I + \beta F)^{-1}\big((I - \beta F)(c\,v^+) + \beta d\,a\big), \]
where by the Sherman--Morrison formula
\[ (\eta I + \beta F)^{-1} = \frac{1}{\eta}\Big(I - \frac{\beta}{\eta + \beta\,a\cdot b}F\Big). \]
Then
\[ (\eta I + \beta F)^{-1}\beta d\,a = \frac{\beta d}{\eta + \beta\,a\cdot b}\,a, \]
and since $F^2 = (a\cdot b)F$,
\[ (\eta I + \beta F)^{-1}(I - \beta F) = \frac{1}{\eta}\Big(I - \frac{\beta d}{\eta + \beta\,a\cdot b}F\Big). \]
In order to achieve stability, we let
\[ \frac{\beta d}{\eta + \beta\,a\cdot b} = 1, \quad \text{i.e. } \beta = \frac{d-1}{d - a\cdot b} \le 1. \]
Consequently we obtain the compact Newton step
\[ \lambda^+ = \frac{1}{d-1}(I - F)(c\,v^+) + \frac{\lambda}{|\lambda| \wedge 1}, \quad (13.8) \]
which results in:

Primal-dual active set method ($L^1$-optimization)
1. Initialize $\lambda^0 = 0$ and solve $F'(y^0) = 0$ for $y^0$. Set $k = 0$.
2. Set the active set $A_k$ and inactive set $I_k$ by
\[ A_k = \{|\lambda^k + c\,\Lambda y^k| > 1 + \epsilon\}, \qquad I_k = \{|\lambda^k + c\,\Lambda y^k| \le 1 + \epsilon\}. \]
3. Solve for $(y^{k+1}, \lambda^{k+1}) \in X \times H$:
\[ F'(y^{k+1}) + \Lambda^*\lambda^{k+1} = 0, \]
\[ \lambda^{k+1} = \frac{1}{d^k - 1}(I - F^k)(c\,\Lambda y^{k+1}) + \frac{\lambda^k}{|\lambda^k| \wedge 1} \ \text{ on } A_k, \qquad \Lambda y^{k+1} = \epsilon\,\lambda^{k+1} \ \text{ on } I_k. \]
4. If converged, stop; otherwise set $k = k+1$ and return to step 2.

This algorithm is unconditionally stable and converges rapidly for our test examples.

Remark (1) Note that
\[ (\lambda^+, v^+) = \frac{|v^+|^2 - (a\cdot v^+)(b\cdot v^+)}{d-1} + a\cdot v^+, \quad (13.9) \]
which implies stability of the algorithm. In fact, since
\[ F'(y^{k+1}) - F'(y^k) + \Lambda^*(\lambda^{k+1} - \lambda^k) = 0, \qquad \lambda^{k+1} - \lambda^k = \frac{1}{d^k-1}(I - F^k)(c\,\Lambda y^{k+1}) + \Big(\frac{1}{|\lambda^k| \wedge 1} - 1\Big)\lambda^k, \]
we have
\[ (F'(y^{k+1}) - F'(y^k), y^{k+1} - y^k) + \frac{c}{d^k-1}\big((I - F^k)\Lambda y^{k+1}, \Lambda y^{k+1}\big) + \Big(\frac{1}{|\lambda^k| \wedge 1} - 1\Big)(\lambda^k, \Lambda y^{k+1}) = 0. \]
Suppose $F(y) - F(x) - (F'(x), y - x) \ge \omega\,|y - x|_X^2$; then we have
\[ F(y^k) + \sum_{j=1}^k \Big(\omega\,|y^j - y^{j-1}|^2 + \Big(\frac{1}{|\lambda^{j-1}| \wedge 1} - 1\Big)(\lambda^{j-1}, v^j)\Big) \le F(y^0), \]
where $v^j = \Lambda y^j$. Note that $(\lambda^k, v^{k+1}) > 0$ implies $(\lambda^{k+1}, v^{k+1}) > 0$ and thus the point is inactive at step $k+1$, pointwise. This fact is a key step in proving global convergence of the algorithm.
(2) If $a\cdot b \to 1$, then $\beta \to 1$. Suppose $(\lambda, v)$ is a fixed point of (13.8). A direct computation then shows that the angle between $\lambda$ and $\lambda + cv$ is zero and $|\lambda| = 1$. It follows that
\[ \lambda = \frac{\lambda + cv}{\max(|\lambda + cv|, 1 + \epsilon)}. \]
That is, if the algorithm converges, then $a\cdot b \to 1$ and $|\lambda| \to 1$, and the scheme is consistent with (13.4).
(3) Consider the substitution iterate
\[ F'(y^+) + \Lambda^*\lambda^+ = 0, \qquad \lambda^+ = \frac{v^+}{\max(\epsilon, |v|)}, \quad v = \Lambda y. \quad (13.10) \]
Note that
\[ \Big(\frac{v^+}{|v|},\, v^+ - v\Big) = \frac{|v^+|^2 - |v|^2 + |v^+ - v|^2}{2|v|} = |v^+| - |v| + \frac{(|v^+| - |v|)^2 + |v^+ - v|^2}{2|v|}. \]
Thus $J_\epsilon(v^+) \le J_\epsilon(v)$, where equality holds only if $v^+ = v$. This fact can be used to prove that the iterative method (13.10) is globally convergent. It also suggests the hybrid method: for $0 < \mu < 1$,
\[ \lambda^+ = \mu\,\frac{v^+}{\max(\epsilon, |v|)} + (1-\mu)\Big(\frac{1}{d-1}(I - F)(c\,v^+) + \frac{\lambda}{\max(|\lambda|, 1)}\Big), \]
in order to gain the global convergence property without losing the fast convergence of the Newton method.

14 Exercise

Problem 1 Let $A$ be a symmetric matrix on $\mathbb{R}^n$ and $b \in \mathbb{R}^n$ a vector, and let $F(x) = \frac12 x^tAx - b^tx$. Show that $F'(x) = Ax - b$.

Problem 2 Show that the sequences $\{f_n\}, \{g_n\}, \{h_n\}$ are Cauchy in $L^2(0,1)$ but not in $C[0,1]$.

Problem 3 Show that $||x| - |y|| \le |x - y|$ and thus that the norm is continuous.

Problem 4 Show that all norms are equivalent on a finite dimensional vector space.

Problem 5 Show that for a closed linear operator $T$ the domain $D(T)$ is a Banach space when equipped with the graph norm $|x|_{D(T)} = (|x|_X^2 + |Tx|_Y^2)^{1/2}$.

Problem 6 Complete the DC motor problem.

Problem 7 Consider the spline interpolation problem (3.8). Find the Riesz representations $F_i$ and $\tilde F_j$ in $H_0^2(0,1)$ and complete the spline interpolation problem (3.8).

Problem 8 Derive the necessary optimality condition for the $L^1$ optimization (3.32) with $U = [-1,1]$.

Problem 9 Solve the constrained minimization in a Hilbert space $X$:
\[ \min \tfrac12|x|^2 - (a,x) \quad \text{subject to } (y_i,x) = b_i,\ 1 \le i \le m_1, \quad (\tilde y_j, x) \le c_j,\ 1 \le j \le m_2, \]
over $x \in X$. We assume $\{y_i\}_{i=1}^{m_1}$, $\{\tilde y_j\}_{j=1}^{m_2}$ are linearly independent.

Problem 10 Solve the pointwise obstacle problem
\[ \min \int_0^1 \Big(\frac12\Big|\frac{du}{dx}\Big|^2 - f(x)u(x)\Big)dx \quad \text{subject to } u(\tfrac13) \le c_1,\ u(\tfrac23) \le c_2, \]
over the Hilbert space $X = H_0^1(0,1)$ for $f = 1$.

Problem 11 Find the differential form of the necessary optimality condition for
\[ \min \int_0^1 \Big(\frac12\big(a(x)\Big|\frac{du}{dx}\Big|^2 + c(x)|u|^2\big) - f(x)u(x)\Big)dx + \frac12(\alpha_1|u(0)|^2 + \alpha_2|u(1)|^2) - (g_1, u(0)) - (g_2, u(1)) \]
over the Hilbert space $X = H^1(0,1)$.

Problem 12 Let $B \in \mathbb{R}^{m\times n}$. Show that $R(B) = \mathbb{R}^m$ if and only if $BB^t$ is positive definite on $\mathbb{R}^m$. Show that $x^* = B^t(BB^t)^{-1}c$ defines a minimum norm solution to $Bx = c$.

Problem 13 Consider the optimal control problem (3.44) (Example Control problem) with $\hat U = \{u \in \mathbb{R}^m : |u|_\infty \le 1\}$. Derive the optimality condition (3.45).

Problem 14 Consider the optimal control problem (3.44) (Example Control problem) with $\hat U = \{u \in \mathbb{R}^m : |u|_2 \le 1\}$. Find the orthogonal projection $P_C$, derive the necessary optimality condition, and develop the gradient method.

Problem 15 Let $J(x) = \int_0^1 |x(t)|\,dt$ be a functional on $C(0,1)$. Show that $J'(x)d = \int_0^1 \mathrm{sign}(x(t))\,d(t)\,dt$.

Problem 16 Develop the Newton method for the optimal control problem (3.44) without the constraint on $u$ (i.e., $\hat U = \mathbb{R}^m$).

Problem 17 (1) Show that the conjugate functional $h^*(p) = \sup_u\{(p,u) - h(u)\}$ is convex. (2) Show that the set-valued function $\Phi(p) = \mathrm{argmax}_u\{(p,u) - h(u)\}$ is monotone.

Problem 18 Find the graph $\Phi(p)$ for
\[ \max_{|u|\le\gamma}\{pu - |u|_0\} \quad\text{and}\quad \max_{|u|\le\gamma}\{pu - |u|_1\}. \]

15 Non-convex nonsmooth optimization

In this section we consider a general class of nonsmooth optimization problems. Let $X$ be a Banach space continuously embedded in $H = L^2(\Omega)^m$. Let $J : X \to \mathbb{R}^+$ be $C^1$ and let $N$ be the nonsmooth functional on $H$ given by
\[ N(y) = \int_\Omega h(y(\omega))\,d\omega, \]
where $h : \mathbb{R}^m \to \mathbb{R}$ is lower semi-continuous. Consider the minimization on $X$:
\[ \min J(x) + N(x) \quad \text{subject to } x \in C = \{x(\omega) \in U \text{ a.e.}\}, \quad (15.1) \]
where $U$ is a closed convex set in $\mathbb{R}^m$. For example, we consider $h(u) = \frac{\alpha}{2}|u|^2 + |u|_0$, where $|s|_0 = 1$ for $s \ne 0$ and $|0|_0 = 0$. For an integrable function $u$,
\[ \int_\Omega |u(\omega)|_0\,d\omega = \mathrm{meas}(\{u \ne 0\}). \]
Thus one can formulate volume control and scheduling problems as (15.1).

15.1 Existence

In this section we develop an existence method for a general class of minimizations
\[ \min_{x\in X} G(x) \quad (15.2) \]
on a Banach space $X$, including (15.1). Here the constrained problem $x \in C$ is formulated as $G(x) + I_C(x)$ using the indicator function of the constraint set $C$. Let $X$ be a reflexive space or $X = \tilde X^*$; thus $x_n \rightharpoonup x$ means the weak or weak-star convergence in $X$. To prove the existence of solutions to (15.2) we modify a general existence result given in [?].

Theorem (Existence) Let $G$ be an extended real-valued functional on a Banach space $X$ satisfying the following properties:
(i) $G$ is proper and weakly lower semi-continuous, and $G(x) + \epsilon|x|^p$ is coercive for each $\epsilon > 0$, for some $p$;
(ii) for any sequence $\{x_n\}$ in $X$ with $|x_n| \to \infty$, $\frac{x_n}{|x_n|} \rightharpoonup \bar x$ weakly in $X$, and $\{G(x_n)\}$ bounded from above, we have $\frac{x_n}{|x_n|} \to \bar x$ strongly in $X$, and there exist $\rho_n \in (0, |x_n|]$ and $n_0 = n_0(\{x_n\})$ such that
\[ G(x_n - \rho_n\bar x) \le G(x_n) \quad \text{for all } n \ge n_0. \quad (15.3) \]
Then $\min_{x\in X} G(x)$ admits a minimizer.

Proof: Let $\{\epsilon_n\} \subset (0,1)$ be a sequence converging to 0 from above and consider the family of auxiliary problems
\[ \min_{x\in X} G(x) + \epsilon_n|x|^p. \quad (15.4) \]
By (i), every minimizing sequence for (15.4) is bounded. Extracting a weakly convergent subsequence, the existence of a solution $x_n \in X$ of (15.4) can be argued in a standard manner. Suppose $\{x_n\}$ is bounded. Then there exist a weakly convergent subsequence, denoted by the same symbols, and $x^*$ such that $x_n \rightharpoonup x^*$. Passing to the limit $\epsilon_n \to 0^+$ in
\[ G(x_n) + \epsilon_n|x_n|^p \le G(x) + \epsilon_n|x|^p \quad \text{for all } x \in X, \]
and using again (i), we have $G(x^*) \le G(x)$ for all $x \in X$, and thus $x^*$ is a minimizer of $G$.

We now argue that $\{x_n\}$ is bounded, and assume to the contrary that $\lim_{n\to\infty}|x_n| = \infty$. Then a weakly convergent subsequence can be extracted from $\frac{x_n}{|x_n|}$ such that, again dropping indices, $\frac{x_n}{|x_n|} \rightharpoonup \bar x$ for some $\bar x \in X$. Since $G(x_n)$ is bounded from above, by (ii)
\[ \frac{x_n}{|x_n|} \to \bar x \quad \text{strongly in } X. \quad (15.5) \]
Next choose $\rho_n > 0$ and $n_0$ according to (15.3); thus for all $n \ge n_0$
\[ G(x_n) + \epsilon_n|x_n|^p \le G(x_n - \rho_n\bar x) + \epsilon_n|x_n - \rho_n\bar x|^p \le G(x_n) + \epsilon_n|x_n - \rho_n\bar x|^p. \]
It follows that
\[ |x_n| \le |x_n - \rho_n\bar x| = \Big|x_n - \rho_n\frac{x_n}{|x_n|} + \rho_n\Big(\frac{x_n}{|x_n|} - \bar x\Big)\Big| \le |x_n|\Big(1 - \frac{\rho_n}{|x_n|}\Big) + \rho_n\Big|\frac{x_n}{|x_n|} - \bar x\Big|. \]
This implies
\[ 1 \le \Big|\frac{x_n}{|x_n|} - \bar x\Big|, \]
which contradicts (15.5) and concludes the proof.

The first half of condition (ii) is a compactness assumption and (15.3) is a compatibility condition [?].

Remark (1) Suppose $G$ is not bounded below. Define the (sequential) recession functional $G^\infty$ associated with $G$:
\[ G^\infty(x) = \inf_{x_n\rightharpoonup x,\ t_n\to\infty}\ \liminf_{n\to\infty}\ \frac{1}{t_n}G(t_nx_n), \quad (15.6) \]
and assume $G^\infty(x) \ge 0$. Then $G(x) + \epsilon|x|^2$ is coercive. If not, there exists a sequence $\{x_n\}$ with $G(x_n) + \epsilon|x_n|^2 \le M$ bounded but $|x_n| \to \infty$; letting $\bar x = w\text{-}\lim \frac{x_n}{|x_n|}$, we find
\[ \frac{1}{|x_n|}G\Big(|x_n|\frac{x_n}{|x_n|}\Big) \le \frac{M}{|x_n|} - \epsilon|x_n| \to -\infty, \]
while $G^\infty(\bar x) \le \liminf_n \frac{1}{|x_n|}G(|x_n|\frac{x_n}{|x_n|})$, which yields a contradiction. Moreover, in the proof of the theorem note that
\[ G^\infty(\bar x) \le \liminf_{n\to\infty}\frac{1}{|x_n|}G(x_n) = \liminf_{n\to\infty}\frac{1}{|x_n|}G\Big(|x_n|\frac{x_n}{|x_n|}\Big) \le 0. \]
If we assume $G^\infty \ge 0$, then $G^\infty(\bar x) = 0$. Thus (15.3) needs to hold only under the condition $G^\infty(\bar x) = 0$. In [?], (15.3) is assumed for all $x \in X$ with $G^\infty(\bar x) = 0$, i.e. $G(x - \rho\bar x) \le G(x)$; thus our condition is much weaker. In [?] we also find a more general version of (15.3): there exist $\rho_n \in (0,|x_n|]$ and $z_n \in X$ such that
\[ G(x_n - \rho_n z_n) \le G(x_n), \qquad |z_n - z| \to 0 \ \text{ and } \ |z - \bar x| < 1. \]
(2) We have also shown that the sequence $\{x_n\}$ of minimizers of the regularized problems has a subsequence converging weakly to a minimizer of $G$.

Example 1 If $\ker(G^\infty) = 0$, then $\bar x = 0$, while the compactness assumption implies $|\bar x| = 1$, which is a contradiction. Thus the theorem applies.

Example 2 ($\ell^0$ optimization) Let $X = \ell^2$ and consider the $\ell^0$ minimization
\[ G(x) = \frac12|Ax - b|_2^2 + \beta\,|x|_0, \quad (15.7) \]
where $|x|_0$ is the number of nonzero elements of $x \in \ell^2$ (the counting measure of $x$). Assume $N(A)$ is finite dimensional and $R(A)$ is closed. Then one can apply the theorem to prove the existence of solutions to (15.7). Indeed, suppose $|x_n| \to \infty$, $G(x_n)$ is bounded from above, and $\frac{x_n}{|x_n|} \rightharpoonup z$. First we show that $z \in N(A)$ and $\frac{x_n}{|x_n|} \to z$. Since $\{G(x_n)\}$ is bounded from above, there exists $M$ such that
\[ G(x_n) = \frac12|Ax_n - b|_2^2 + \beta\,|x_n|_0 \le M \quad \text{for all } n. \quad (15.8) \]
Since $R(A)$ is closed, by the closed range theorem $X = R(A^*) \oplus N(A)$; let $x_n = x_n^1 + x_n^2$ accordingly. Consequently
\[ 0 \le |Ax_n^1|^2 - 2(b, Ax_n^1) + |b|_2^2 \le K, \]
so $|Ax_n^1|$ is bounded, and hence
\[ 0 \le \Big|A\Big(\frac{x_n^1}{|x_n|_2}\Big)\Big|^2 - \frac{2}{|x_n|_2}\Big(A^*b,\, \frac{x_n^1}{|x_n|_2}\Big) + \frac{|b|_2^2}{|x_n|_2^2} \to 0. \quad (15.9) \]
By the closed range theorem this implies that $\frac{x_n^1}{|x_n|_2} \to \bar x^1 = 0$ in $\ell^2$. Since $\frac{x_n^2}{|x_n|_2} \rightharpoonup \bar x^2$ and by assumption $\dim N(A) < \infty$, it follows that $\frac{x_n}{|x_n|_2} \to z = \bar x^2 \in N(A)$ strongly in $\ell^2$, and thus $|z|_0 < \infty$.

Next, (15.3) holds. Since $z \in N(A)$, the quadratic term is unchanged by the shift, and since $|x_n|_0$ is invariant under scaling, (15.3) is equivalent to showing that
\[ |(x_n^1 + x_n^2 - \rho z)_i|_0 \le |(x_n^1 + x_n^2)_i|_0 \]
for all $i$ and all $n$ sufficiently large. Only the coordinates for which $(x_n^1 + x_n^2)_i = 0$ with $z_i \ne 0$ require our attention. Since $|z|_0 < \infty$ there exists $\tilde i$ such that $z_i = 0$ for all $i > \tilde i$. For $i \in \{1,\dots,\tilde i\}$ we define
\[ I_i = \{n : (x_n^1 + x_n^2)_i = 0,\ z_i \ne 0\}. \]
These sets are finite. In fact, if $I_i$ is infinite for some $i \in \{1,\dots,\tilde i\}$, then $\lim_{n\to\infty,\,n\in I_i}\frac{1}{|x_n|_2}(x_n^1 + x_n^2)_i = 0$; since $\lim_{n\to\infty}\frac{1}{|x_n|_2}(x_n^1)_i = 0$, this implies $z_i = 0$, which is a contradiction.

Example 3 (Obstacle problem) Consider
\[ \min G(u) = \int_0^1\Big(\frac12|u_x|^2 - fu\Big)dx \quad \text{subject to } u(\tfrac12) \le 1. \quad (15.10) \]
Let $X = H^1(0,1)$. If $|u_n| \to \infty$ and $v_n = \frac{u_n}{|u_n|} \rightharpoonup v$, then by the bound in (ii)
\[ \frac12\int_0^1|(v_n)_x|^2\,dx \le \frac{1}{|u_n|}\int_0^1 f v_n\,dx + \frac{G(u_n)}{|u_n|^2} \to 0, \]
and thus $(v_n)_x \to 0$ and $v_n \to v = c$ for some constant $c$. But since $v_n(\tfrac12) \le \frac{1}{|u_n|} \to 0$, we have $c = v(\tfrac12) \le 0$. Thus
\[ G(u_n - \rho v) = G(u_n) + \rho c\int_0^1 f(x)\,dx \le G(u_n) \quad \text{if } \int_0^1 f(x)\,dx \ge 0. \]
That is, if $\int_0^1 f(x)\,dx \ge 0$, then (15.10) has a minimizer.

Example 4 (Friction problem)
\[ \min \int_0^1\Big(\frac12|u_x|^2 - fu\Big)dx + |u(1)|. \quad (15.11) \]
Using exactly the same argument as above, we have $(v_n)_x \to 0$ and $v_n \to v = c$. But we have
\[ G(u_n - \rho v) = G(u_n) + \rho c\int_0^1 f(x)\,dx + |u_n(1) - \rho c| - |u_n(1)|, \]
where
\[ \frac{|u_n(1) - \rho c| - |u_n(1)|}{|u_n|} = \Big|v_n(1) - \frac{\rho}{|u_n|}c\Big| - |v_n(1)| \to |c - \hat\rho c| - |c| = -\hat\rho\,|c| \]
for sufficiently large $n$, where $\hat\rho = \lim_{n\to\infty}\frac{\rho_n}{|u_n|}$. Thus, if $|\int_0^1 f(x)\,dx| \le 1$, then (15.11) has a minimizer.

Example 5 ($L^\infty$ Laplacian)
\[ \min \int_0^1|u - f|\,dx \quad \text{subject to } |u_x| \le 1. \]
Similarly, we have $(v_n)_x \to 0$ and $v_n \to v = c$, and
\[ \frac{G(u_n - \rho c) - G(u_n)}{|u_n|} = G\Big(v_n - \hat\rho c - \frac{f}{|u_n|}\Big) - G\Big(v_n - \frac{f}{|u_n|}\Big) \le 0 \]
for sufficiently large $n$.

Example 6 (Elastic contact problem) Let $X = H^1(\Omega)^d$ and let $\vec u$ be the deformation field.
Given the boundary body force $\vec g$, consider the elastic contact problem
\[ \min \frac12\int_\Omega \epsilon(\vec u) : \sigma(\vec u)\,dx - \int_{\Gamma_1}\vec g\cdot\vec u\,ds_x + \int_{\Gamma_2}|\tau\cdot\vec u|\,ds_x, \quad \text{subject to } n\cdot\vec u \le \psi, \quad (15.12) \]
where we assume the linear strain
\[ \epsilon_{ij}(u) = \frac12\Big(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\Big) \]
and Hooke's law $\sigma = \mu\,\epsilon + \lambda\,\mathrm{tr}(\epsilon)\,I$ with positive Lamé constants $\mu, \lambda$. If $|\vec u_n| \to \infty$ and $G(\vec u_n)$ is bounded, we have $\epsilon(\vec v_n) \to 0$ for $\vec v_n = \frac{\vec u_n}{|\vec u_n|} \rightharpoonup v$. Thus one can show that $\vec v_n \to v = (v^1, v^2) \in \{(-Ax_2 + C_1,\ Ax_1 + C_2)\}$ (the rigid motions) for the two dimensional problem. Consider the case $\Omega = (0,1)^2$, $\Gamma_1 = \{x_2 = 1\}$ and $\Gamma_2 = \{x_2 = 0\}$. As shown in the previous examples,
\[ \frac{G(\vec u_n - \rho v) - G(\vec u_n)}{|\vec u_n|} = \hat\rho\int_{\Gamma_1}(n\cdot\vec g)\,v^2\,ds_x + \hat\rho\int_{\Gamma_1}(\tau\cdot\vec g)\,v^1\,ds_x + \int_{\Gamma_2}\big(|(v_n)^1 - \hat\rho v^1| - |(v_n)^1|\big)\,ds_x. \]
Since $v^2 \in C = \{Ax_1 + C_2 \le 0\}$ and for $v^1 = C_1$
\[ |(v_n)^1 - \hat\rho v^1| - |(v_n)^1| \to |v^1 - \hat\rho v^1| - |v^1| = -\hat\rho\,|v^1|, \]
if we assume $|\int_{\Gamma_1}\tau\cdot\vec g\,ds_x| \le 1$ and $\int_{\Gamma_1}(n\cdot\vec g)\,v^2\,ds_x \ge 0$ for all $v^2 \in C$, then (15.12) has a minimizer.

Example 7
\[ \min \int_0^1\Big(\frac12\big(a(x)|u_x|^2 + |u|^2\big) - fu\Big)dx \]
with $a(x) > 0$ except $a(\tfrac12) = 0$.

15.2 Necessary optimality

Let $X$ be a Banach space continuously embedded in $H = L^2(\Omega)^m$. Let $J : X \to \mathbb{R}^+$ be $C^1$ and let $N$ be the nonsmooth functional on $H = L^2(\Omega)^m$ given by
\[ N(y) = \int_\Omega h(y(\omega))\,d\omega, \]
where $h : \mathbb{R}^m \to \mathbb{R}$ is a lower semi-continuous functional. Consider the minimization on $H$:
\[ \min J(y) + N(y) \quad \text{subject to } y \in C = \{y(\omega) \in U \text{ a.e.}\}, \quad (15.13) \]
where $U$ is a closed convex set in $\mathbb{R}^m$. Given $y^* \in C$, define needle perturbations of (the optimal solution) $y^*$ by
\[ y = \begin{cases} y^* & \text{on } |\omega - s| > \delta \\ u & \text{on } |\omega - s| \le \delta \end{cases} \]
for $\delta > 0$, $s \in \Omega$, and $u \in U$. Assume that
\[ |J(y) - J(y^*) - (J'(y^*), y - y^*)| \sim o(\mathrm{meas}(\{|\omega - s| < \delta\})). \quad (15.14) \]
Let $\mathcal{H}$ be the Hamiltonian defined by
\[ \mathcal{H}(y,\lambda) = (\lambda, y) - h(y), \quad y, \lambda \in \mathbb{R}^m. \]
Then we have the pointwise maximum principle:

Theorem (Pointwise optimality) If an integrable function $y^*$ attains the minimum and (15.14) holds, then for a.e. $\omega \in \Omega$
\[ J'(y^*) + \lambda = 0, \qquad \mathcal{H}(u, \lambda(\omega)) \le \mathcal{H}(y^*(\omega), \lambda(\omega)) \ \text{ for all } u \in U. \quad (15.15) \]

Proof: Since
\[ 0 \le J(y) + N(y) - (J(y^*) + N(y^*)) = J(y) - J(y^*) - (J'(y^*), y - y^*) + \int_{|\omega-s|<\delta}\big((J'(y^*), u - y^*) + h(u) - h(y^*)\big)\,d\omega, \]
and $h$ is lower semi-continuous, it follows from (15.14) that (15.15) holds at every Lebesgue point $s = \omega$ of $y^*$.

For $\lambda \in \mathbb{R}^m$ define the set-valued function
\[ \Phi(\lambda) = \mathrm{argmax}_{y\in U}\{(\lambda, y) - h(y)\}. \]
Then (15.15) is written as $y(\omega) \in \Phi(-J'(y)(\omega))$. The graph of $\Phi$ is monotone:
\[ (\lambda_1 - \lambda_2,\, y_1 - y_2) \ge 0 \ \text{ for all } y_1 \in \Phi(\lambda_1),\ y_2 \in \Phi(\lambda_2), \]
but it is not necessarily maximal monotone. Let $h^*$ be the conjugate function of $h$ defined by
\[ h^*(\lambda) = \max_{u\in U}\mathcal{H}(u, \lambda). \]
Then $h^*$ is necessarily convex. If $u \in \Phi(\lambda)$, then
\[ h^*(\hat\lambda) \ge (\hat\lambda, u) - h(u) = (\hat\lambda - \lambda, u) + h^*(\lambda), \]
thus $u \in \partial h^*(\lambda)$. Since $h^*$ is convex, $\partial h^*$ is maximal monotone and is thus the maximal monotone extension of $\Phi$. Let $h^{**}$ be the bi-conjugate function of $h$:
\[ h^{**}(x) = \sup_\lambda\Big\{(x,\lambda) - \sup_y\mathcal{H}(y,\lambda)\Big\}, \]
whose epigraph is the closed convex envelope of the epigraph of $h$:
\[ \mathrm{epi}(h^{**}) = \overline{\{\theta z_1 + (1-\theta)z_2 : z_1, z_2 \in \mathrm{epi}(h),\ \theta \in [0,1]\}}. \]
Then $h^{**}$ is necessarily convex and is the convexification of $h$, and $\lambda \in \partial h^{**}(u)$ if and only if $u \in \partial h^*(\lambda)$ and $h^*(\lambda) + h^{**}(u) = (\lambda, u)$. Consider the relaxed problem
\[ \min J(y) + \int_\Omega h^{**}(y(\omega))\,d\omega. \quad (15.16) \]
Since $h^{**}$ is convex, $N^{**}(y) = \int_\Omega h^{**}(y)\,d\omega$ is weakly lower semi-continuous and thus there exists a minimizer $y$ of (15.16). It thus follows from Theorem (Pointwise optimality) that the necessary optimality condition for (15.16) is given by
\[ -J'(y)(\omega) \in \partial h^{**}(y(\omega)), \quad \text{or equivalently} \quad y \in \partial h^*(-J'(y)). \quad (15.17) \]
Assume $J(\hat y) - J(y) - J'(y)(\hat y - y) \ge 0$ for all $\hat y \in C$, where $y(\omega) \in \Phi(-J'(y)(\omega))$. Then $y$ minimizes the original cost functional (15.13). One can construct a $C^1$ realization of $h^*$ by
\[ h^*_\epsilon(\lambda) = \min_p\Big\{\frac{1}{2\epsilon}|p - \lambda|^2 + h^*(p)\Big\}. \]
That is, $h^*_\epsilon$ is $C^1$ and $(h^*_\epsilon)'$ is Lipschitz; $(h^*_\epsilon)'$ is the Yosida approximation of $\partial h^*$. Then we let $h_\epsilon = (h^*_\epsilon)^*$, with
\[ h_\epsilon(u) = \min_y\Big\{\frac{1}{2\epsilon}|y - u|^2 + h(y)\Big\}, \]
and consider the relaxed problem
\[ \min J(y) + \int_\Omega h_\epsilon(y(\omega))\,d\omega. \quad (15.18) \]
The necessary optimality condition becomes
\[ y(\omega) = (h^*_\epsilon)'(-J'(y)(\omega)). \quad (15.19) \]
Then one can develop the semismooth Newton method to solve (15.19). Since $|h_\epsilon - h^{**}|_\infty \le C\epsilon$, one can argue that a sequence of minimizers $y_\epsilon$ of (15.18) converges to a minimizer of (15.16). In fact,
\[ J(y_\epsilon) + \int_\Omega h_\epsilon(y_\epsilon)\,d\omega \le J(y) + \int_\Omega h_\epsilon(y)\,d\omega \quad \text{for all } y \in C. \]
If $|y_\epsilon|$ is bounded, then $y_\epsilon$ has a subsequence converging weakly to $\bar y \in C$ and
\[ J(\bar y) + \int_\Omega h^{**}(\bar y)\,d\omega \le J(y) + \int_\Omega h^{**}(y)\,d\omega \quad \text{for all } y \in C. \]

Example ($L^0(\Omega)$ optimization) Based on our analysis one can treat the case when $h$ involves the volume control by $L^0(\Omega)$: let $h$ on $\mathbb{R}$ be given by
\[ h(u) = \frac{\alpha}{2}|u|^2 + \beta\,|u|_0, \qquad |u|_0 = \begin{cases} 0 & u = 0 \\ 1 & u \ne 0, \end{cases} \quad (15.20) \]
so that
\[ \int_\Omega |u|_0\,d\omega = \mathrm{meas}(\{u(\omega) \ne 0\}). \]
In this case the theorem applies with
\[ \Phi(q) := \mathrm{argmin}_{u\in\mathbb{R}}\,(h(u) - qu) = \begin{cases} \dfrac{q}{\alpha} & |q| \ge \sqrt{2\alpha\beta} \\ 0 & |q| < \sqrt{2\alpha\beta} \end{cases} \quad (15.21) \]
(a pointwise hard-thresholding operator; a sketch is given below), and the conjugate function $h^*$ of $h$ is
\[ h^*(q) = q\,\Phi(q) - h(\Phi(q)) = \begin{cases} \dfrac{|q|^2}{2\alpha} - \beta & |q| \ge \sqrt{2\alpha\beta} \\ 0 & |q| < \sqrt{2\alpha\beta}. \end{cases} \]
The bi-conjugate function $h^{**}(u)$ is given by
\[ h^{**}(u) = \begin{cases} \dfrac{\alpha}{2}|u|^2 + \beta & |u| > \sqrt{\dfrac{2\beta}{\alpha}} \\ \sqrt{2\alpha\beta}\,|u| & |u| \le \sqrt{\dfrac{2\beta}{\alpha}}. \end{cases} \]
Clearly $\Phi : \mathbb{R} \to \mathbb{R}$ is monotone, but it is not maximal monotone. The maximal monotone extension $\tilde\Phi = (\partial h^{**})^{-1} = \partial h^*$ of $\Phi$ is given by
\[ \tilde\Phi(q) \in \begin{cases} \Big\{\dfrac{q}{\alpha}\Big\} & |q| > \sqrt{2\alpha\beta} \\ \{0\} & |q| < \sqrt{2\alpha\beta} \\ \Big[\dfrac{q}{\alpha},\, 0\Big] & q = -\sqrt{2\alpha\beta} \\ \Big[0,\, \dfrac{q}{\alpha}\Big] & q = \sqrt{2\alpha\beta}. \end{cases} \]
Thus
\[ (h^*_\epsilon)'(\lambda) = \frac{\lambda - p_\epsilon(\lambda)}{\epsilon}, \qquad p_\epsilon(\lambda) = \mathrm{argmin}_p\Big\{\frac{1}{2\epsilon}|p - \lambda|^2 + h^*(p)\Big\}. \]
We have
\[ p_\epsilon(\lambda) = \begin{cases} \lambda & |\lambda| < \sqrt{2\alpha\beta} \\ \sqrt{2\alpha\beta} & \sqrt{2\alpha\beta} \le \lambda \le (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta} \\ -\sqrt{2\alpha\beta} & \sqrt{2\alpha\beta} \le -\lambda \le (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta} \\ \dfrac{\lambda}{1 + \epsilon/\alpha} & |\lambda| \ge (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta}, \end{cases} \]
and
\[ (h^*_\epsilon)'(\lambda) = \begin{cases} 0 & |\lambda| < \sqrt{2\alpha\beta} \\ \dfrac{\lambda - \sqrt{2\alpha\beta}}{\epsilon} & \sqrt{2\alpha\beta} \le \lambda \le (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta} \\ \dfrac{\lambda + \sqrt{2\alpha\beta}}{\epsilon} & \sqrt{2\alpha\beta} \le -\lambda \le (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta} \\ \dfrac{\lambda}{\alpha + \epsilon} & |\lambda| \ge (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta}. \end{cases} \quad (15.22) \]
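The two maps (15.21)--(15.22) are spelled out below as elementwise NumPy sketches; the function names are ours and the inputs are assumed to be arrays.

```python
import numpy as np

def hard_threshold(q, alpha, beta):
    """Pointwise Phi(q) of (15.21): argmin_u alpha/2 u^2 + beta |u|_0 - q u."""
    return np.where(np.abs(q) >= np.sqrt(2 * alpha * beta), q / alpha, 0.0)

def h_star_eps_prime(lam, alpha, beta, eps):
    """Yosida approximation (h*_eps)'(lam) of (15.22), elementwise."""
    t = np.sqrt(2 * alpha * beta)
    out = np.zeros_like(lam)                      # |lam| < t: value 0
    mid_pos = (lam >= t) & (lam <= (1 + eps / alpha) * t)
    mid_neg = (-lam >= t) & (-lam <= (1 + eps / alpha) * t)
    far = np.abs(lam) >= (1 + eps / alpha) * t
    out[mid_pos] = (lam[mid_pos] - t) / eps
    out[mid_neg] = (lam[mid_neg] + t) / eps
    out[far] = lam[far] / (alpha + eps)
    return out
```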
Example (Authority optimization) We analyze the authority optimization of two unknowns $u_1$ and $u_2$, in which at least one of $u_1$ and $u_2$ must vanish. We formulate such a problem using
\[ h(u) = \frac{\alpha}{2}(|u_1|^2 + |u_2|^2) + \beta\,|u_1u_2|_0. \quad (15.23) \]
Then we have
\[ h^*(p_1,p_2) = \begin{cases} \dfrac{1}{2\alpha}(|p_1|^2 + |p_2|^2) - \beta & |p_1|, |p_2| \ge \sqrt{2\alpha\beta} \\ \dfrac{1}{2\alpha}|p_1|^2 & |p_1| \ge |p_2|,\ |p_2| \le \sqrt{2\alpha\beta} \\ \dfrac{1}{2\alpha}|p_2|^2 & |p_2| \ge |p_1|,\ |p_1| \le \sqrt{2\alpha\beta}. \end{cases} \]
Also, with $\kappa = 1 + \epsilon/\alpha$ and $t = \sqrt{2\alpha\beta}$, the proximal map is given case by case:
\[ p_\epsilon(\lambda_1,\lambda_2) = \begin{cases} \frac{1}{\kappa}(\lambda_1,\lambda_2) & |\lambda_1|, |\lambda_2| \ge \kappa t \\ (\frac{\lambda_1}{\kappa},\, \pm t) & |\lambda_1| \ge \kappa t,\ t \le |\lambda_2| \le \kappa t \\ (\pm t,\, \frac{\lambda_2}{\kappa}) & |\lambda_2| \ge \kappa t,\ t \le |\lambda_1| \le \kappa t \\ (\lambda_2,\lambda_2) & |\lambda_2| \le |\lambda_1| \le \kappa|\lambda_2|,\ t \le |\lambda_2| \le \kappa t \\ (\lambda_1,\lambda_1) & |\lambda_1| \le |\lambda_2| \le \kappa|\lambda_1|,\ t \le |\lambda_1| \le \kappa t \\ (\frac{\lambda_1}{\kappa},\, \lambda_2) & |\lambda_1| \ge \kappa|\lambda_2|,\ |\lambda_2| \le t \\ (\lambda_1,\, \frac{\lambda_2}{\kappa}) & |\lambda_2| \ge \kappa|\lambda_1|,\ |\lambda_1| \le t \\ (\lambda_2,\lambda_2) & |\lambda_2| \le |\lambda_1| \le \kappa|\lambda_2|,\ |\lambda_2| \le t \\ (\lambda_1,\lambda_1) & |\lambda_1| \le |\lambda_2| \le \kappa|\lambda_1|,\ |\lambda_1| \le t. \end{cases} \quad (15.24) \]

15.3 Augmented Lagrangian method

Given $\lambda \in H$ and $c > 0$, consider the augmented Lagrangian functional
\[ L(x,y,\lambda) = J(x) + (\lambda, x - y)_H + \frac{c}{2}|x - y|_H^2 + N(y). \]
Since
\[ L(x,y,\lambda) = J(x) + \int_\Omega\Big(\frac{c}{2}|x - y|^2 + (\lambda, x - y) + h(y)\Big)d\omega, \]
it follows from the theorem that if $(\bar x, \bar y)$ minimizes $L(x,y,\lambda)$, then the necessary optimality condition is given by
\[ -J'(\bar x) = \lambda + c(\bar x - \bar y), \qquad \bar y \in \Phi_c(\lambda + c\,\bar x), \]
where
\[ \Phi_c(p) = \mathrm{argmax}_y\Big\{(p,y) - \frac{c}{2}|y|^2 - h(y)\Big\}. \]
Thus the augmented Lagrangian update is given by $\lambda^{n+1} = \lambda^n + c(x^n - y^n)$, where $(x^n, y^n)$ solves
\[ -J'(x^n) = \lambda^n + c(x^n - y^n), \qquad y^n \in \Phi_c(\lambda^n + c\,x^n). \]
If we convexify $N$ by the bi-conjugate of $h$, we have
\[ L(x,y,\lambda) = J(x) + (\lambda, x-y)_H + \frac{c}{2}|y - x|_H^2 + N^{**}(y). \]
Since
\[ h_c(x,\lambda) = \inf_y\Big\{h^{**}(y) + (\lambda, x - y) + \frac{c}{2}|y - x|^2\Big\} = \inf_y\Big\{\sup_p\{(y,p) - h^*(p)\} + (\lambda, x-y) + \frac{c}{2}|y-x|^2\Big\} = \sup_p\Big\{-\frac{1}{2c}|p - \lambda|^2 + (x,p) - h^*(p)\Big\}, \]
we have
\[ h'_c(x,\lambda) = \mathrm{argmax}_p\Big\{-\frac{1}{2c}|p-\lambda|^2 + (x,p) - h^*(p)\Big\}. \quad (15.25) \]
If we let
\[ p_c(\lambda) = \mathrm{argmin}_p\Big\{\frac{1}{2c}|p-\lambda|^2 + h^*(p)\Big\}, \]
then $h'_c(x,\lambda) = p_c(\lambda + cx)$ and
\[ (h^*_c)'(\lambda) = \frac{\lambda - p_c(\lambda)}{c}. \quad (15.26) \]
Thus the maximal monotone extension $\tilde\Phi_c$ of the graph $\Phi_c$ is given by $(h^*_c)'(\lambda)$, the Yosida approximation of $\partial h^*(\lambda)$, and results in the update:
\[ -J'(x^n) = \lambda^n + c(x^n - y^n), \qquad y^n = (h^*_c)'(\lambda^n + c\,x^n). \]
In the case $c = 0$ we have the multiplier method
\[ \lambda^{n+1} = \lambda^n + \alpha(x^n - y^n), \quad \alpha > 0, \]
where $(x^n, y^n)$ solves $-J'(x^n) = \lambda^n$, $y^n \in \tilde\Phi(\lambda^n)$. In the case $\lambda = 0$ we have the proximal iterate of the form
\[ x^{n+1} = (h^*_c)'(x^n - \alpha\,J'(x^n)). \]

15.4 Semismooth Newton method

The necessary optimality condition is written as the semismooth equation
\[ -J'(x) = \lambda, \qquad \lambda = h'_c(x,\lambda) = p_c(\lambda + c\,x). \]
For the case $h = \frac{\alpha}{2}|x|^2 + \beta\,|x|_0$ we have from (15.22) and (15.25)
\[ h'_c(x,\lambda) = \begin{cases} \lambda + cx & |\lambda + cx| \le \sqrt{2\alpha\beta} \\ \sqrt{2\alpha\beta} & \sqrt{2\alpha\beta} \le \lambda + cx \le (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta} \\ -\sqrt{2\alpha\beta} & \sqrt{2\alpha\beta} \le -(\lambda + cx) \le (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta} \\ \dfrac{\alpha(\lambda + cx)}{\alpha + c} & |\lambda + cx| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}. \end{cases} \]
Thus the semismooth Newton update is $-J'(x^{n+1}) = \lambda^{n+1}$ together with, pointwise,
\[ \begin{cases} x^{n+1} = 0 & \text{if } |\lambda^n + cx^n| \le \sqrt{2\alpha\beta} \\ \lambda^{n+1} = \sqrt{2\alpha\beta} & \text{if } \sqrt{2\alpha\beta} < \lambda^n + cx^n < (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta} \\ \lambda^{n+1} = -\sqrt{2\alpha\beta} & \text{if } \sqrt{2\alpha\beta} < -(\lambda^n + cx^n) < (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta} \\ x^{n+1} = \dfrac{\lambda^{n+1}}{\alpha} & \text{if } |\lambda^n + cx^n| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}. \end{cases} \]
For the case $h(u_1,u_2) = \frac{\alpha}{2}(|u_1|^2 + |u_2|^2) + \beta\,|u_1u_2|_0$, let $\mu^n = \lambda^n + cu^n$. From (15.24) the semismooth Newton update is given, pointwise, by the analogous case distinction on $(\mu_1^n, \mu_2^n)$; in particular,
\[ \begin{aligned} & u^{n+1} = \frac{\lambda^{n+1}}{\alpha} && \text{if } |\mu_1^n|, |\mu_2^n| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}, \\ & u_1^{n+1} = \frac{\lambda_1^{n+1}}{\alpha},\ \lambda_2^{n+1} = \pm\sqrt{2\alpha\beta} && \text{if } |\mu_1^n| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta},\ \sqrt{2\alpha\beta} \le |\mu_2^n| \le (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}, \\ & \lambda_1^{n+1} = \pm\sqrt{2\alpha\beta},\ u_2^{n+1} = \frac{\lambda_2^{n+1}}{\alpha} && \text{if } |\mu_2^n| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta},\ \sqrt{2\alpha\beta} \le |\mu_1^n| \le (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}, \end{aligned} \]
together with the intermediate cases of (15.24), in which one or both components are set to zero ($u_i^{n+1} = 0$) or the multipliers are equalized ($\lambda_1^{n+1} = \lambda_2^{n+1}$), according to the relative magnitudes of $|\mu_1^n|$ and $|\mu_2^n|$.

15.5 Constrained optimization

Consider the constrained minimization problem of the form
\[ \min J(x,u) = \int_\Omega(\ell(x) + h(u))\,d\omega \quad \text{over } u \in \mathcal{U}, \quad (15.27) \]
subject to the equality constraint
\[ \tilde E(x,u) = Ex + f(x) + Bu = 0. \quad (15.28) \]
Here $\mathcal{U} = \{u \in L^2(\Omega)^m : u(\omega) \in U \text{ a.e. in } \Omega\}$, where $U$ is a closed convex subset of $\mathbb{R}^m$. Let $X$ be a closed subspace of $L^2(\Omega)^n$ and $\tilde E : X \times \mathcal{U} \to X^*$ with $E \in \mathcal{L}(X,X^*)$ boundedly invertible, $B \in \mathcal{L}(L^2(\Omega)^m, L^2(\Omega)^n)$ and $f : L^2(\Omega)^n \to L^2(\Omega)^n$ Lipschitz.
15.5 Constrained optimization

Consider the constrained minimization problem of the form

min J(x, u) = ∫_Ω ( ℓ(x) + h(u) ) dω   over u ∈ U,   (15.27)

subject to the equality constraint

Ẽ(x, u) = Ex + f(x) + Bu = 0.   (15.28)

Here U = {u ∈ L²(Ω)^m : u(ω) ∈ U a.e. in Ω}, where U is a closed convex subset of R^m. Let X be a closed subspace of L²(Ω)^n and Ẽ : X × U → X*, with E ∈ L(X, X*) boundedly invertible, B ∈ L(L²(Ω)^m, L²(Ω)^n), and f : L²(Ω)^n → L²(Ω)^n Lipschitz. We assume that Ẽ(x, u) = 0 has a unique solution x = x(u) ∈ X for each given u ∈ U, and that there exists a solution (x̄, ū) to problem (15.27). Also, we assume that the adjoint equation

(E + f_x(x̄))* p + ℓ′(x̄) = 0   (15.29)

has a solution in X. This is a special case of (15.1) when J(u) is the implicit functional of u:

J(u) = ∫_Ω ℓ(x(ω)) dω,

where x = x(u) ∈ X for u ∈ U is the unique solution to Ex + f(x) + Bu = 0.

To derive a necessary condition for this class of (nonconvex) problems we use a maximum principle approach. For arbitrary s ∈ Ω we utilize needle perturbations of the optimal control ū, defined by

v(ω) = u on {ω : |ω − s| < δ},   v(ω) = ū(ω) otherwise,   (15.30)

where u ∈ U is constant and δ > 0 is sufficiently small so that {|ω − s| < δ} ⊂ Ω. The following additional properties of the optimal state x̄ and each perturbed state x(v) will be used:

1/2 |x(v) − x̄|²_{L²(Ω)} = o(meas({|ω − s| < δ})),
∫_Ω ( ℓ(·, x(v)) − ℓ(·, x̄) − ℓ_x(·, x̄)(x(v) − x̄) ) dω = O(|x(v) − x̄|²_{L²(Ω)}),   (15.31)
⟨f(·, x(v)) − f(·, x̄) − f_x(·, x̄)(x(v) − x̄), p⟩_{X*,X} = O(|x(v) − x̄|²_X).

Theorem Suppose (x̄, ū) ∈ X × U is optimal for problem (15.27), that p ∈ X satisfies the adjoint equation (15.29), and that (15.31) holds. Then we have the necessary optimality condition that ū(ω) minimizes h(u) + (p, Bu) pointwise a.e. in Ω.

Proof: By the second property in (15.31) we have

0 ≤ J(v) − J(ū) = ∫_Ω ( ℓ(·, x(v)) − ℓ(·, x(ū)) + h(v) − h(ū) ) dω
                = ∫_Ω ( ℓ_x(·, x̄)(x − x̄) + h(v) − h(ū) ) dω + O(|x − x̄|²),

where, by the adjoint equation,

(ℓ_x(·, x̄), x − x̄) = −⟨(E + f_x(·, x̄))(x − x̄), p⟩,

and

0 = ⟨E(x − x̄) + f(·, x) − f(·, x̄) + B(v − ū), p⟩
  = ⟨E(x − x̄) + f_x(·, x̄)(x − x̄) + B(v − ū), p⟩ + O(|x − x̄|²).

Thus we obtain

0 ≤ J(v) − J(ū) ≤ ∫_Ω ( (−B*p, v − ū) + h(v) − h(ū) ) dω.

Dividing this by meas({|ω − s| < δ}) > 0, letting δ → 0, and using the first property in (15.31), we obtain the desired result at every Lebesgue point s of

ω ↦ (−B*p, v − ū) + h(v) − h(ū),

since h is lower semicontinuous.

Consider the relaxed problem of (15.27):

min J(x, u) = ∫_Ω ( ℓ(x) + h**(u) ) dω   over u ∈ U.   (15.32)

We obtain the pointwise necessary optimality

Ex̄ + f(x̄) + Bū = 0,
(E + f′(x̄))* p + ℓ′(x̄) = 0,
ū ∈ ∂h*(−B*p),

or equivalently

λ = h′_c(u, λ),   λ = −B*p,

where h_c is the Moreau-Yosida approximation of h**.

15.6 Algorithm

Based on the complementarity condition derived above we have the primal-dual active set method for L^p optimization (h(u) = α/2 |u|² + β|u|^p):

Primal-Dual Active Set method (L^p(Ω) optimization)

1. Initialize u⁰ ∈ X and λ⁰ ∈ H. Set n = 0.
2. Solve for (y^{n+1}, u^{n+1}, p^{n+1}):

   E y^{n+1} + B u^{n+1} = g,   E* p^{n+1} + ℓ′(y^{n+1}) = 0,   λ^{n+1} = B* p^{n+1},

   and

   u^{n+1} = −λ^{n+1}/(α + βp|uⁿ|^{p−2})   on {|λⁿ + c uⁿ| > μ_p + c u_p},
   λ^{n+1} = μ_p                            on {μ_p ≤ λⁿ + c uⁿ ≤ μ_p + c u_p},
   λ^{n+1} = −μ_p                           on {μ_p ≤ −(λⁿ + c uⁿ) ≤ μ_p + c u_p},
   u^{n+1} = 0                              on {|λⁿ + c uⁿ| ≤ μ_p}.

3. Stop, or set n = n + 1 and return to step 2.
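As an illustration of step 2, the following sketch (an assumption-laden toy example, not from the text) implements the loop for the L⁰ case, where the thresholds specialize to μ_p = √(2αβ) and μ_p + c u_p = (1 + c/α)√(2αβ), cf. (15.22). The data E, B, g, y_d are invented for the example, ℓ(y) = 1/2 |y − y_d|², and the multiplier is taken as λ = −B*p so that λ = −J′(u), consistent with the sign convention of the preceding sections.

import numpy as np

n = 50
alpha, beta, c = 1.0, 0.02, 1.0
s = np.sqrt(2 * alpha * beta)                    # mu_p for the L^0 case
kappa = 1 + c / alpha                            # (mu_p + c u_p)/mu_p
E = np.eye(n) + 0.1 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.eye(n)
g = np.zeros(n)
y_d = np.sin(2 * np.pi * np.linspace(0.0, 1.0, n))   # desired state
Id, Z = np.eye(n), np.zeros((n, n))

u, lam = np.zeros(n), np.zeros(n)
for it in range(30):
    mu = lam + c * u
    A0 = np.abs(mu) <= s                         # active set: u = 0
    Ap = (s < mu) & (mu < kappa * s)             # lambda = +s
    Am = (s < -mu) & (-mu < kappa * s)           # lambda = -s
    In = np.abs(mu) >= kappa * s                 # inactive: alpha u = lambda
    # case rows, written in terms of lambda = -B^T p
    row3 = np.zeros((n, 3 * n)); rhs3 = np.zeros(n)
    row3[A0, n + np.where(A0)[0]] = 1.0          # u_i = 0
    row3[Ap, 2 * n:] = B.T[Ap]; rhs3[Ap] = -s    # (B^T p)_i = -s, i.e. lambda_i = +s
    row3[Am, 2 * n:] = B.T[Am]; rhs3[Am] = s     # lambda_i = -s
    row3[In, n + np.where(In)[0]] = alpha        # alpha u_i + (B^T p)_i = 0
    row3[In, 2 * n:] += B.T[In]
    # coupled system: E y + B u = g, E^T p + (y - y_d) = 0, case rows
    K = np.vstack([np.hstack([E, B, Z]), np.hstack([Id, Z, E.T]), row3])
    sol = np.linalg.solve(K, np.concatenate([g, y_d, rhs3]))
    y, u_new, p = sol[:n], sol[n:2 * n], sol[2 * n:]
    lam_new = -B.T @ p
    if np.allclose(u_new, u) and np.allclose(lam_new, lam):
        break
    u, lam = u_new, lam_new
print("PDAS iterations:", it, " nonzeros in u:", int(np.sum(np.abs(u) > 1e-12)))

The loop terminates when the active sets, and hence (u, λ), no longer change; the sparsity of the resulting control is governed by the threshold √(2αβ).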
16 Exercise

Problem 1 Let A be a symmetric matrix on Rⁿ and b ∈ Rⁿ a vector, and let F(x) = 1/2 xᵗAx − bᵗx. Show that F′(x) = Ax − b.

Problem 2 Show that the sequences {fₙ}, {gₙ}, {hₙ} are Cauchy in L²(0, 1) but not in C[0, 1].

Problem 3 Show that ||x| − |y|| ≤ |x − y|, and thus that the norm is continuous.

Problem 4 Show that all norms on a finite-dimensional vector space are equivalent.

Problem 5 Show that for a closed linear operator T the domain D(T) is a Banach space if it is equipped with the graph norm |x|_{D(T)} = (|x|²_X + |Tx|²_Y)^{1/2}.

Problem 6 Consider the spline interpolation problem (3.8). Find the Riesz representations F_i and F̃_j in H₀²(0, 1) and complete the spline interpolation problem (3.8).

Problem 7 Derive the necessary optimality condition for the L¹ optimization (3.32) with U = [−1, 1].

Problem 8 Solve the constrained minimization in a Hilbert space X:

min 1/2 |x|² − (a, x)   over x ∈ X

subject to

(y_i, x) = b_i, 1 ≤ i ≤ m₁,   (ỹ_j, x) ≤ c_j, 1 ≤ j ≤ m₂.

We assume {y_i}_{i=1}^{m₁}, {ỹ_j}_{j=1}^{m₂} are linearly independent.

Problem 9 Solve the pointwise obstacle problem:

min ∫₀¹ ( 1/2 |du/dx|² − f(x)u(x) ) dx

subject to

u(1/3) ≤ c₁,   u(2/3) ≤ c₂,

over the Hilbert space X = H₀¹(0, 1) for f = 1.

Problem 10 Find the differential form of the necessary optimality condition for

min ∫₀¹ ( 1/2 (a(x)|du/dx|² + c(x)|u|²) − f(x)u(x) ) dx + 1/2 (α₁|u(0)|² + α₂|u(1)|²) − (g₁, u(0)) − (g₂, u(1))

over the Hilbert space X = H¹(0, 1).

Problem 11 Let B ∈ R^{m×n}. Show that R(B) = R^m if and only if BBᵗ is positive definite on R^m. Show that x* = Bᵗ(BBᵗ)⁻¹c defines the minimum norm solution to Bx = c.

Problem 12 Consider the optimal control problem (3.44) (Example Control problem) with Û = {u ∈ R^m : |u|_∞ ≤ 1}. Derive the optimality condition (3.45).

Problem 13 Consider the optimal control problem (3.44) (Example Control problem) with Û = {u ∈ R^m : |u|₂ ≤ 1}. Find the orthogonal projection P_C, derive the necessary optimality condition, and develop the gradient method.

Problem 14 Let J(x) = ∫₀¹ |x(t)| dt, a functional on C(0, 1). Show that J′(x)d = ∫₀¹ sign(x(t)) d(t) dt.

Problem 15 Develop the Newton method for the optimal control problem (3.44) without the constraint on u (i.e., Û = R^m).

Problem 16 (1) Show that the conjugate functional h*(p) = sup_u {(p, u) − h(u)} is convex. (2) Show that the set-valued function Φ(p) = argmax_u {(p, u) − h(u)} is monotone.

Problem 17 Find the graph Φ(p) for

max_{|u| ≤ γ} { pu − |u|₀ }   and   max_{|u| ≤ γ} { pu − |u|₁ }.