Functional Analysis and Optimization

Kazufumi Ito∗

November 4, 2014

Abstract

In this monograph we develop the function space method for optimization problems and operator equations in Banach spaces. Optimization is one of the key components of mathematical modeling of real world problems, and the solution method provides an accurate and essential description and validation of the mathematical model. Optimization problems are encountered frequently in engineering and the sciences and have widespread practical applications. In general we optimize an appropriately chosen cost functional subject to constraints, for example equality constraints and inequality constraints. The problem is cast in a function space, and it is important to formulate the problem in a proper function space framework in order to develop an accurate theory and effective algorithms. Nonsmooth optimization has become a very basic modeling tool and enlarges and enhances the applications of the optimization method in general. For example, in the classical variational formulation we analyze the non-Newtonian energy functional and nonsmooth friction penalties, and for imaging/signal analysis sparsity optimization is used by means of $L^1$ and $TV$ regularization. In order to develop efficient solution methods for large scale optimization it is essential to derive a set of necessary conditions in equation form rather than as a variational inequality. The Lagrange multiplier theory for constrained minimization and nonsmooth optimization problems is developed and analyzed. The theory derives the complementarity condition for the multiplier and the solution. Consequently, one can derive the necessary optimality system for the solution and the multiplier for a general class of nonsmooth and nonconvex optimizations. In light of the theory we derive and analyze numerical algorithms based on the primal-dual active set method and the semismooth Newton method.

The monograph also covers the basic material of real analysis, functional analysis, Banach space theory, convex analysis, operator theory and PDE theory, which makes the book self-contained and comprehensive. The analysis component is naturally connected to the optimization theory. The necessary optimality condition is in general written as a nonlinear operator equation for the primal variable and the Lagrange multiplier. The Lagrange multiplier theory for a general class of nonsmooth and nonconvex optimizations is based on functional analysis tools. For example, the necessary optimality condition for constrained optimization is in general a nonsmooth, pseudo-monotone type operator equation. To develop the mathematical treatment of PDE-constrained minimization we also present the PDE theory and the linear and nonlinear $C_0$-semigroup theory on Banach spaces for well-posedness of the control dynamics and optimization problems. That is, the existence and uniqueness of solutions to a general class of controlled linear and nonlinear PDEs and of Cauchy problems for nonlinear evolution equations in Banach spaces are discussed, in particular via the linear operator theory, the (pseudo-)monotone operator and mapping theory, and the fixed point theory.

∗ Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA
Throughout, the monograph demonstrates the theory and the algorithms using concrete examples and describes how to apply them to motivating applications.

1 Introduction

In this monograph we develop the function space approach for optimization problems, e.g., nonlinear programming in Banach spaces, convex and nonconvex nonsmooth variational problems, control and inverse problems, image/signal analysis, material design, classification, and resource allocation, and for nonlinear equations and Cauchy problems in Banach spaces. A general class of constrained optimization problems in function spaces is considered and we develop the Lagrange multiplier theory and effective solution algorithms. Applications are presented and analyzed for various examples including control and design optimization, inverse problems, image analysis and variational problems. Nonconvex and nonsmooth optimization becomes increasingly important for advanced and effective use of optimization methods and is also an essential topic of the monograph.

We introduce the basic function space and functional analysis tools needed for our analysis. The monograph provides an introduction to functional analysis, real analysis and convex analysis and their basic concepts with illustrated examples. Thus, one can use the book as basic course material for functional analysis and nonlinear operator theory. An emerging application of optimization is imaging analysis, which can be formulated as PDE-constrained optimization. The PDE theory and the nonlinear operator theory and their applications are analyzed in detail. There are numerous applications of optimization theory and variational methods in all sciences and to real world problems. The following are motivating examples.

Example 1 (Quadratic programming)
\[ \min \ \tfrac12 x^t A x - b^t x \quad \text{over } x \in \mathbb{R}^n \]
subject to
\[ Ex = c \ \text{(equality constraint)}, \qquad Gx \le g \ \text{(inequality constraint)}, \]
where $A \in \mathbb{R}^{n\times n}$ is a symmetric positive matrix, $E \in \mathbb{R}^{m\times n}$, $G \in \mathbb{R}^{p\times n}$, and $b \in \mathbb{R}^n$, $c \in \mathbb{R}^m$, $g \in \mathbb{R}^p$ are given vectors. The quadratic programming problem is formulated on a Hilbert space $X$ for the variational problem, in which $(x, Ax)_X$ defines the quadratic form on $X$.

Example 2 (Linear programming) The linear program minimizes $(c, x)_X$ subject to the linear constraints $Ax = b$ and $x \ge 0$. Its dual problem is to maximize $(b, x)$ subject to $A^* x \le c$. It has many important classes of applications. For example, the optimal transport problem is to find the joint distribution function $\mu(x, y) \ge 0$ on $X \times Y$ that transports the probability density $p_0(x)$ to $p_1(y)$, i.e.,
\[ \int_Y \mu(x, y)\,dy = p_0(x), \qquad \int_X \mu(x, y)\,dx = p_1(y), \]
with minimum energy
\[ \int_{X\times Y} c(x, y)\,\mu(x, y)\,dx\,dy = (c, \mu). \]

Example 3 (Integer programming) Let $F$ be a continuous function on $\mathbb{R}^n$. The integer program minimizes $F(x)$ over the integer variables $x$, i.e.,
\[ \min F(x) \quad \text{subject to } x \in \{0, 1, \cdots, N\}. \]
The binary optimization minimizes $F(x)$ over binary variables $x$. The network optimization problem is one of the important applications.

Example 4 (Obstacle problem) Consider the variational problem
\[ \min \ J(u) = \int_0^1 \Big( \frac12 \Big|\frac{d}{dx}u(x)\Big|^2 - f(x)u(x) \Big)\,dx \qquad (1.1) \]
subject to $u(x) \le \psi(x)$, over all displacement functions $u(x) \in H_0^1(0,1)$, where the function $\psi$ represents the upper bound (obstacle) of the deformation. The functional $J$ represents the Newtonian deformation energy and $f(x)$ is the applied force.
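As a small numerical illustration of Example 1 (not part of the original text), the following Python sketch solves an equality-constrained quadratic program by assembling and solving the linear KKT (saddle point) system; the inequality constraint is omitted here and all data are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of Example 1 with equality constraints only:
#   min (1/2) x^t A x - b^t x   subject to  E x = c.
# The KKT conditions  A x + E^t mu = b,  E x = c  are linear, so
# (x, mu) solves a single saddle-point system. (Illustrative data.)
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
E = np.array([[1.0, 1.0]])               # one equality constraint
c = np.array([1.0])

n, m = A.shape[0], E.shape[0]
KKT = np.block([[A, E.T], [E, np.zeros((m, m))]])
rhs = np.concatenate([b, c])
sol = np.linalg.solve(KKT, rhs)
x, mu = sol[:n], sol[n:]
print("x =", x, "multiplier mu =", mu)
print("residuals:", A @ x + E.T @ mu - b, E @ x - c)
```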
Example 5 (Function interpolation) Consider the interpolation problem
\[ \min \ \int_0^1 \frac12 \Big|\frac{d^2}{dx^2}u(x)\Big|^2 dx \quad \text{over all functions } u(x), \qquad (1.2) \]
subject to the interpolation conditions
\[ u(x_i) = b_i, \quad \frac{d}{dx}u(x_i) = c_i, \quad 1 \le i \le m. \]

Example 6 (Image/signal analysis) Consider the deconvolution problem of finding the image $u$ that satisfies $(Ku)(x) = f(x)$ for the convolution
\[ (Ku)(x) = \int_\Omega k(x, y)\,u(y)\,dy \]
over a domain $\Omega$. It is in general ill-posed and we use the (Tikhonov) regularization formulation to obtain a stable deconvolution solution $u$, i.e., for $\Omega = [0,1]$,
\[ \min \ \int_0^1 \Big( |Ku - f|^2 + \alpha \Big|\frac{d}{dx}u(x)\Big|^2 + \beta\,|u(x)| \Big)\,dx \quad \text{over all possible images } u \ge 0, \qquad (1.3) \]
where $\alpha, \beta \ge 0$ are the regularization parameters and the two regularization functionals are chosen so that the variation of the image $u$ is regularized.

Example 7 ($L^0$ optimization) Problems of optimal resource allocation, material distribution and optimal time scheduling can be formulated as
\[ \min \ F(u) + \int_\Omega h(u(\omega))\,d\omega \qquad (1.4) \]
with
\[ h(u) = \frac\alpha2|u|^2 + \beta\,|u|_0, \quad \text{where } |0|_0 = 0 \text{ and } |u|_0 = 1 \text{ otherwise.} \]
Since
\[ \int_\Omega |u(\omega)|_0\,d\omega = \mathrm{vol}(\{u(\omega) \ne 0\}), \]
the solution to (1.4) provides the optimal distribution of $u$ minimizing the performance $F(u)$ with an appropriate choice of (regularization) parameters $\alpha, \beta > 0$.

Example 8 (Bayesian statistics) Consider the statistical optimization
\[ \min \ -\log p(y|x) + \beta\,p(x) \quad \text{over all admissible parameters } x, \qquad (1.5) \]
where $p(y|x)$ is the conditional probability density of the observation $y$ given the parameter $x$, and $p(x)$ is the prior probability density for the parameter $x$. It is the maximum a posteriori estimation of the parameter $x$ given the observation $y$. In the case of inverse problems it has the form
\[ \phi(x|y) + \alpha\,p(x), \qquad (1.6) \]
where $x$ is in a function space $X$, $\phi(x|y)$ is the fidelity and $p$ is a regularization semi-norm on $X$. The constant $\alpha > 0$ is the Tikhonov regularization parameter.

Multi-dimensional versions of these examples and more illustrated examples will be discussed and analyzed throughout the monograph.

Discretization An important aspect of function space optimization is the approximation method and theory. For example, let $B_i^n(x)$ be the linear B-spline defined by
\[ B_i(x) = \begin{cases} \dfrac{x - x_{i-1}}{x_i - x_{i-1}} & \text{on } [x_{i-1}, x_i] \\[1ex] \dfrac{x - x_{i+1}}{x_i - x_{i+1}} & \text{on } [x_i, x_{i+1}] \\[1ex] 0 & \text{otherwise} \end{cases} \qquad (1.7) \]
and let
\[ u^n(x) = \sum_{i=1}^{n-1} u_i B_i(x) \sim u(x). \qquad (1.8) \]
Then
\[ \frac{d}{dx}u^n = \frac{u_i - u_{i-1}}{x_i - x_{i-1}}, \quad x \in (x_{i-1}, x_i). \]
Let $x_i = \frac in$ and $\Delta x = \frac1n$. Then one can define the discretized problem of Example 4, a quadratic program of the form of Example 1:
\[ \min \ \sum_{i=1}^n \Big( \frac12 \Big|\frac{u_i - u_{i-1}}{\Delta x}\Big|^2 - f_iu_i \Big)\Delta x \quad \text{subject to } u_0 = u_n = 0, \ u_i \le \psi_i = \psi(x_i), \]
where $f_i = f(x_i)$. The discretized problem is a quadratic program for $\vec u \in \mathbb{R}^{n-1}$ with
\[ A = \frac1{\Delta x}\begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix}, \qquad b = \Delta x\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{n-2} \\ f_{n-1} \end{pmatrix}. \]
Thus, we will discuss a convergence analysis of the approximate solutions $\{u^n\}$ to the solution $u$ as $n \to \infty$ in an appropriate function space.
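The discretized problem above is easy to assemble. The following Python sketch (an illustration under the stated grid and the choice $f \equiv 1$) builds the tridiagonal matrix $A$ and load vector $b$ and solves the unconstrained problem, which is the standard finite difference scheme for $-u'' = f$, $u(0) = u(1) = 0$.

```python
import numpy as np

# Sketch: assemble the tridiagonal stiffness matrix A and load vector b
# of the discretized obstacle problem (f = 1, n-1 interior grid points).
# Without the obstacle, the minimizer solves A u = b, i.e. the discrete
# two-point boundary value problem -u'' = f, u(0) = u(1) = 0.
n = 100
dx = 1.0 / n
main = 2.0 * np.ones(n - 1) / dx
off = -1.0 * np.ones(n - 2) / dx
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
b = dx * np.ones(n - 1)                 # f(x_i) = 1

u = np.linalg.solve(A, b)               # unconstrained minimizer
x = np.linspace(dx, 1 - dx, n - 1)
# exact solution of -u'' = 1 with zero boundary data is x(1-x)/2
print("max |u - x(1-x)/2| =", np.abs(u - x * (1 - x) / 2).max())
```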
In general, one has an optimization of the form
\[ \min \ F_0(x) + F_1(x) \quad \text{over } x \in X \quad \text{subject to } E(x) \in K, \]
where $X$ is a Banach space, $F_0$ is $C^1$, $F_1$ is a nonsmooth functional and $E(x) \in K$ represents the constraint conditions with $K$ a closed convex cone. For the existence of a minimizer we require a coercivity condition, lower semicontinuity and the continuity of the constraint in the weak or weak star topology of $X$. For the necessary optimality condition, we develop a generalized Lagrange multiplier theory:
\[ F_0'(x^*) + \lambda + E'(x^*)^*\mu = 0, \quad E(x^*) \in K, \quad C_1(x^*, \lambda) = 0, \quad C_2(x^*, \mu) = 0, \qquad (1.9) \]
where $\lambda$ is the Lagrange multiplier for a relaxed derivative of $F_1$ and $\mu$ is the Lagrange multiplier for the constraint $E(x) \in K$. We derive the complementarity conditions $C_1$ and $C_2$ so that (1.9) is a complete set of equations for determining $(x^*, \lambda, \mu)$.

We discuss the solution method for linear and nonlinear equations in Banach spaces of the form
\[ f \in A(x), \qquad (1.10) \]
where $f \in X^*$ is in the dual space of a Banach space $X$ and $A$ is a pseudo-monotone mapping $X \to X^*$, and the Cauchy problem
\[ \frac{d}{dt}x(t) \in A(t)(x(t)) + f(t), \quad x(0) = x_0, \qquad (1.11) \]
where $A(t)$ is an m-dissipative mapping in $X \times X$. Also, we note that the necessary optimality condition (1.9) is a nonlinear operator equation and we use the nonlinear operator theory to analyze solutions to (1.9). We also discuss PDE-constrained optimization problems, in which $f$ and $f(t)$ are functions of control, design and medium parameters. We minimize a proper performance index defined for the state and control functions subject to the constraint (1.10) or (1.11). Thus, we combine the analysis of the PDE theory and the optimization theory to develop a comprehensive treatment and analysis of PDE-constrained optimization problems.
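The complementarity system (1.9) suggests active-set strategies. The following Python sketch implements one plausible primal-dual active set iteration for the obstacle-constrained quadratic program $\min \frac12 u^tAu - b^tu$ subject to $u \le \psi$; it is a minimal finite-dimensional illustration (the function name and the parameter $c$ are illustrative choices), not the general method developed later in the monograph.

```python
import numpy as np

def primal_dual_active_set(A, b, psi, c=1.0, max_iter=50):
    """Hedged sketch of a primal-dual active set iteration for
       min (1/2) u^t A u - b^t u  subject to  u <= psi.
    KKT system: A u + lam = b, lam >= 0, u <= psi, lam^t (u - psi) = 0.
    """
    n = len(b)
    u, lam = np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        active = lam + c * (u - psi) > 0          # predicted active set
        inact = ~active
        u_new, lam_new = np.empty(n), np.zeros(n)
        u_new[active] = psi[active]               # u fixed on active set
        # solve the rows of A u = b on the inactive set
        u_new[inact] = np.linalg.solve(
            A[np.ix_(inact, inact)],
            b[inact] - A[np.ix_(inact, active)] @ psi[active])
        lam_new[active] = (b - A @ u_new)[active] # multiplier on active set
        if np.array_equal(active, lam_new + c * (u_new - psi) > 0):
            return u_new, lam_new                 # active set has settled
        u, lam = u_new, lam_new
    return u, lam
```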
The outline of the monograph is as follows. In Section 2 we introduce the basic function space and functional analysis tools needed for our analysis. It provides an introduction to functional analysis and real analysis and their basic concepts with illustrated examples: for example, the dual space of a normed space, the Hahn-Banach theorem, the open mapping theorem, the closed range theorem, distribution theory and weak solutions to PDEs, compact operators and spectral theory. In Section 3 we develop the Lagrange multiplier method for a general class of (smooth) constrained optimization problems in Banach spaces. The Hilbert space theory is introduced first and the Lagrange multiplier is defined as the limit of penalized problems. The constraint qualification condition is introduced and we derive the (normal) necessary optimality system for the primal and dual variables. We discuss the minimum norm problem and the duality principle in Banach spaces. In Section 4 we present the linear operator theory and $C_0$-semigroup theory on Banach spaces. In Section 5 we develop the monotone operator theory and pseudo-monotone operator theory for nonlinear equations in Banach spaces and the nonlinear semigroup theory in Banach spaces, as well as the fixed point theory for nonlinear equations in Banach spaces. In Section 7 we develop the basic convex analysis tools and the Lagrange multiplier theory for a general class of convex nonsmooth optimization. We introduce and analyze the augmented Lagrangian method for constrained optimization and nonsmooth optimization problems. In Section 8 we develop the Lagrange multiplier theory for a general class of nonsmooth and nonconvex optimizations.

2 Basic Function Space Theory

In this section we discuss the basic theory and concepts of functional analysis and introduce the function spaces which are essential to our function space analysis of a general class of optimizations. We refer to more comprehensive treatises on functional analysis, e.g., Yosida [].

2.1 Complete Normed Space

In this section we introduce vector spaces, normed spaces and Banach spaces.

Definition (Vector Space) A vector space over the scalar field $K$ is a set $X$, whose elements are called vectors, in which two operations, addition and scalar multiplication, are defined with the following familiar algebraic properties.

(a) To every pair of vectors $x$ and $y$ corresponds a vector $x + y$ such that
\[ x + y = y + x \quad \text{and} \quad x + (y + z) = (x + y) + z; \]
$X$ contains a unique vector $0$ (the zero vector) such that $x + 0 = x$ for every $x \in X$, and to each $x \in X$ corresponds a unique inverse vector $-x$ such that $x + (-x) = 0$.

(b) To every pair $(\alpha, x)$ with $\alpha \in K$ and $x \in X$ corresponds a vector $\alpha x$ such that
\[ 1\,x = x, \qquad \alpha(\beta x) = (\alpha\beta)x, \]
and such that the two distributive laws
\[ \alpha(x + y) = \alpha x + \alpha y, \qquad (\alpha + \beta)x = \alpha x + \beta x \]
hold.

The symbol $0$ will of course also be used for the zero element of the scalar field. It is easy to see that the additive inverse $-x$ satisfies $-x = (-1)x$, since $0\,x = (1 - 1)x = x + (-1)x$ and $0\,x = 0$ for each $x \in X$. In what follows we consider vector spaces only over the real number field $\mathbb{R}$ or the complex number field $\mathbb{C}$.

Definition (Subspace, Basis) (1) A subset $S$ of $X$ is a (linear) subspace of a vector space $X$ if $\alpha x_1 + \beta x_2 \in S$ for all $x_1, x_2 \in S$ and $\alpha, \beta \in K$.
(2) A family of vectors $\{x_i\}_{i=1}^n$ is linearly independent if $\sum_{i=1}^n \alpha_ix_i = 0$ holds if and only if $\alpha_i = 0$ for all $1 \le i \le n$. The dimension $\dim(X)$ is the largest number of linearly independent vectors in $X$.
(3) A basis $\{e_\alpha\}$ of a vector space $X$ is a set of linearly independent vectors that, in linear combinations, can represent every vector in $X$, i.e., $x = \sum_\alpha a_\alpha e_\alpha$.

Example (Vector Space) (1) Let $X = C[0,1]$ be the vector space of continuous functions $f$ on $[0,1]$; $X$ is a vector space under
\[ (\alpha f + \beta g)(t) = \alpha f(t) + \beta g(t), \quad t \in [0,1]. \]
The space $S$ of polynomials of order less than $n$ is a linear subspace of $X$ with $\dim(S) = n$. The family $\{t^k\}_{k=0}^\infty$ of monomials is linearly independent in $X$, and thus $\dim(X) = \infty$.
(2) The set of real number sequences $x = (x_1, x_2, x_3, \cdots)$ is a vector space under
\[ (\alpha x + \beta y)_k = \alpha x_k + \beta y_k, \quad k \in \mathbb{N}. \]
The family of unit vectors $e_k = (0, \cdots, 0, 1, 0, \cdots)$, $k \in \mathbb{N}$ (with the $1$ in the $k$-th place), is a basis.
(3) Let $X$ be the vector space of square integrable functions on $(0,1)$ and let $e_k(x) = \sqrt2\sin(k\pi x)$, $0 \le x \le 1$. Then $\{e_k\}$ is an orthonormal basis, i.e., $(e_k, e_j) = \int_0^1 e_k(x)e_j(x)\,dx = \delta_{k,j}$, and
\[ f(x) = \sum_{k=1}^\infty f_ke_k(x), \quad \text{the Fourier sine series,} \]
where $f_k = \int_0^1 f(x)e_k(x)\,dx$. Note that $e_{n+1}$ is not a linear combination of $(e_1, \cdots, e_n)$: if $e_{n+1} = \sum_{i=1}^n\beta_ie_i$, then $\beta_i = (e_{n+1}, e_i) = 0$, a contradiction.
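A quick numerical check of the Fourier sine expansion in (3): the following Python sketch computes the coefficients $f_k$ by quadrature and shows the $L^2$ error of the partial sums decreasing; the test function and grid are illustrative choices.

```python
import numpy as np

# Sketch: Fourier sine series of f(x) = x(1-x) on (0,1) using the
# orthonormal basis e_k(x) = sqrt(2) sin(k pi x). The partial sums
# converge to f in the L^2 norm (coefficients via a simple grid sum).
x = np.linspace(0, 1, 2001)
dx = x[1] - x[0]
f = x * (1 - x)

def partial_sum(K):
    s = np.zeros_like(x)
    for k in range(1, K + 1):
        ek = np.sqrt(2) * np.sin(k * np.pi * x)
        fk = np.sum(f * ek) * dx          # f_k = int_0^1 f e_k dx
        s += fk * ek
    return s

for K in (1, 3, 9):
    err = np.sqrt(np.sum((f - partial_sum(K))**2) * dx)
    print(K, err)                          # decreasing L^2 error
```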
Definition (Normed Space) A norm $|\cdot|$ on a vector space $X$ is a real-valued functional on $X$ which satisfies

$|x| \ge 0$ for all $x \in X$, with equality if and only if $x = 0$;
$|\alpha x| = |\alpha|\,|x|$ for every $x \in X$ and $\alpha \in K$;
$|x + y| \le |x| + |y|$ (triangle inequality) for every $x, y \in X$.

A normed space is a vector space $X$ equipped with a norm $|\cdot|$ and will be denoted by $(X, |\cdot|)$. The norm of a normed space $X$ will be denoted by $|\cdot|_X$, or simply by $|\cdot|$ whenever the underlying space $X$ is understood from the context.

Example (Normed space) Let $X = C[0,1]$ be the vector space of continuous functions $f$ on $[0,1]$. The following are two examples of normed spaces built on $X$: $X_1$ is equipped with the sup norm
\[ |f|_{X_1} = \max_{x\in[0,1]}|f(x)| \]
and $X_2$ is equipped with the $L^2$ norm
\[ |f|_{X_2} = \sqrt{\int_0^1|f(x)|^2\,dx}. \]

Definition (Banach Space) Let $X$ be a normed space.
(1) A sequence $\{x_n\}$ in a normed space $X$ converges to the limit $x^* \in X$ if and only if $\lim_{n\to\infty}|x_n - x^*| = 0$ in $\mathbb{R}$.
(2) A subspace $S$ of a normed space $X$ is said to be dense in $X$ if each $x \in X$ is the limit of a sequence of elements in $S$.
(3) A sequence $\{x_n\}$ in a normed space $X$ is called a Cauchy sequence if and only if $\lim_{m,n\to\infty}|x_m - x_n| = 0$, i.e., for all $\epsilon > 0$ there exists $N$ such that $|x_m - x_n| < \epsilon$ for $m, n \ge N$.

If every Cauchy sequence in $X$ converges to a limit in $X$, then $X$ is complete and $X$ is called a Banach space. A Cauchy sequence $\{x_n\}$ in a normed space $X$ has a (unique) limit if it has an accumulation point $x^*$; in fact, for such $x^*$,
\[ \lim_{n\to\infty}|x^* - x_n| = \lim_{n\to\infty}\lim_{m\to\infty}|x_m - x_n| = 0. \]

Example (continued) Two norms $|x|_1$ and $|x|_2$ on a vector space $X$ are equivalent if there exist $0 < \underline c \le \bar c < \infty$ such that
\[ \underline c\,|x|_1 \le |x|_2 \le \bar c\,|x|_1 \quad \text{for all } x \in X. \]
(1) All norms on $\mathbb{R}^n$ are equivalent. But for $X = C[0,1]$ consider $x_n(t) = t^n$. Then $|x_n|_{X_1} = 1$ but $|x_n|_{X_2} = \frac1{\sqrt{2n+1}}$ for all $n$. Thus $|x_n|_{X_2} \to 0$ as $n \to \infty$, and the two norms of $X_1$ and $X_2$ are not equivalent.
(2) Every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence in $\mathbb{R}^n$ (Bolzano-Weierstrass theorem). In contrast, consider the sequence $x_n(t) = t^n$ in $X_1 = C[0,1]$: $\{x_n\}$ has no convergent subsequence in the sup norm despite $|x_n|_{X_1} = 1$ being bounded.
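The following Python sketch verifies numerically the non-equivalence of the sup norm and the $L^2$ norm on $C[0,1]$ via $x_n(t) = t^n$; the grid-based quadrature is an illustrative approximation.

```python
import numpy as np

# Numerical check that the sup norm and the L^2 norm on C[0,1] are not
# equivalent: for x_n(t) = t^n, the sup norm is 1 for every n while
# the L^2 norm is 1/sqrt(2n+1) -> 0.
t = np.linspace(0, 1, 100001)
dt = t[1] - t[0]
for n in (1, 10, 100):
    xn = t**n
    sup_norm = np.max(np.abs(xn))
    l2_norm = np.sqrt(np.sum(xn**2) * dt)
    print(n, sup_norm, l2_norm, 1 / np.sqrt(2 * n + 1))
```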
Definition (Hilbert Space) If $X$ is a vector space, a functional $(\cdot,\cdot)$ defined on $X \times X$ is called an inner product on $X$ provided that for every $x, y, z \in X$ and $\alpha, \beta \in \mathbb{C}$

$(x, y) = \overline{(y, x)}$;
$(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$;
$(x, x) \ge 0$, and $(x, x) = 0$ if and only if $x = 0$,

where $\bar c$ denotes the complex conjugate of $c \in \mathbb{C}$. Given such an inner product, a norm on $X$ can be defined by
\[ |x| = (x, x)^{\frac12}. \]
A vector space $(X, (\cdot,\cdot))$ equipped with an inner product is called a pre-Hilbert space. If $X$ is complete with respect to the induced norm, then it is called a Hilbert space. Every inner product satisfies the Cauchy-Schwarz inequality
\[ |(x, y)| \le |x|\,|y| \quad \text{for } x, y \in X. \]
In fact, for all $t \in \mathbb{R}$,
\[ |x + t(x, y)y|^2 = |x|^2 + 2t|(x, y)|^2 + t^2|(x, y)|^2|y|^2 \ge 0. \]
Thus we must have $|(x, y)|^4 - |x|^2|(x, y)|^2|y|^2 \le 0$, which shows the Cauchy-Schwarz inequality.

Example (Hilbert space) (1) $\mathbb{R}^n$ with inner product
\[ (x, y) = \sum_{k=1}^nx_ky_k \quad \text{for } x, y \in \mathbb{R}^n. \]
(2) Consider the space $\ell^2$ of square summable real number sequences $x = (x_1, x_2, \cdots)$ and define the inner product
\[ (x, y) = \sum_{k=1}^\infty x_ky_k \quad \text{for } x, y \in \ell^2. \]
Then $\ell^2$ is a Hilbert space. In fact, let $\{x^n\}$ be a Cauchy sequence in $\ell^2$. Since $|x_k^n - x_k^m| \le |x^n - x^m|_{\ell^2}$, $\{x_k^n\}$ is a Cauchy sequence in $\mathbb{R}$ for every $k$. Thus one can define a sequence $x$ by $x_k = \lim_{n\to\infty}x_k^n$. Since
\[ \lim_{m\to\infty}|x^m - x^n|_{\ell^2} = |x - x^n|_{\ell^2} \]
for each fixed $n$, we obtain $x \in \ell^2$ and $|x^n - x|_{\ell^2} \to 0$ as $n \to \infty$.
(3) Consider the space $X_3$ of continuously differentiable functions on $[0,1]$ vanishing at $x = 0$, $x = 1$, with inner product
\[ (f, g) = \int_0^1\frac{d}{dx}f\,\frac{d}{dx}g\,dx \quad \text{for } f, g \in X_3. \]
Then $X_3$ is a pre-Hilbert space.

Every normed space $X$ is either a Banach space or a dense subspace of a Banach space $Y$ whose norm satisfies $|x|_Y = |x|$ for every $x \in X$. In the latter case $Y$ is called the completion of $X$. The completion of a normed space proceeds as in Cantor's construction of the real numbers from the rational numbers.

Completion. The set of Cauchy sequences $\{x_n\}$ of the normed space $X$ can be classified according to the equivalence $\{x_n\} \sim \{y_n\}$ if $\lim_{n\to\infty}|x_n - y_n| = 0$. Since $||x_n| - |x_m|| \le |x_n - x_m|$ and $\mathbb{R}$ is complete, $\lim_{n\to\infty}|x_n|$ exists. We denote by $\{x_n\}_0$ the class containing $\{x_n\}$. Then the set $Y$ of all such equivalence classes $\tilde x = \{x_n\}_0$ is a vector space with
\[ \{x_n\}_0 + \{y_n\}_0 = \{x_n + y_n\}_0, \qquad \alpha\{x_n\}_0 = \{\alpha x_n\}_0, \]
and we define
\[ |\{x_n\}_0|_Y = \lim_{n\to\infty}|x_n|. \]
It is easy to see that these definitions of the vector sum, scalar multiplication and norm do not depend on the particular representatives of the classes $\{x_n\}_0$ and $\{y_n\}_0$. For example, if $\{x_n\} \sim \{y_n\}$, then
\[ \lim_{n\to\infty}||x_n| - |y_n|| \le \lim_{n\to\infty}|x_n - y_n| = 0, \]
and thus $\lim_{n\to\infty}|x_n| = \lim_{n\to\infty}|y_n|$.

Since $|\{x_n\}_0|_Y = 0$ implies $\lim_{n\to\infty}|x_n - 0| = 0$, we have $\{0\} \in \{x_n\}_0$ and thus $|\cdot|_Y$ is a norm. Since an element $x \in X$ corresponds to the Cauchy sequence
\[ \bar x = \{x, x, \dots, x, \dots\}_0 \in Y, \]
$X$ is naturally contained in $Y$.

To prove the completeness of $Y$, let $\{\tilde x_k\} = \{\{x_n^{(k)}\}_0\}$ be a Cauchy sequence in $Y$. For each $k$ we can choose an integer $n_k$ such that
\[ |x_m^{(k)} - x_{n_k}^{(k)}|_X \le k^{-1} \quad \text{for } m > n_k. \]
We show that the sequence $\{\tilde x_k\}$ converges to the class containing the Cauchy sequence $\{x_{n_k}^{(k)}\}$ of $X$. To this end, note that
\[ |\tilde x_k - \bar x_{n_k}^{(k)}|_Y = \lim_{m\to\infty}|x_m^{(k)} - x_{n_k}^{(k)}|_X \le k^{-1}. \]
Since
\[ |x_{n_k}^{(k)} - x_{n_m}^{(m)}|_X = |\bar x_{n_k}^{(k)} - \bar x_{n_m}^{(m)}|_Y \le |\bar x_{n_k}^{(k)} - \tilde x_k|_Y + |\tilde x_k - \tilde x_m|_Y + |\tilde x_m - \bar x_{n_m}^{(m)}|_Y \le k^{-1} + m^{-1} + |\tilde x_k - \tilde x_m|_Y, \]
$\{x_{n_k}^{(k)}\}$ is a Cauchy sequence of $X$. Let $\tilde x = \{x_{n_k}^{(k)}\}_0$. Then
\[ |\tilde x - \tilde x_k|_Y \le |\tilde x - \bar x_{n_k}^{(k)}|_Y + |\bar x_{n_k}^{(k)} - \tilde x_k|_Y \le |\tilde x - \bar x_{n_k}^{(k)}|_Y + k^{-1}. \]
Since, as shown above,
\[ |\tilde x - \bar x_{n_k}^{(k)}|_Y = \lim_{m\to\infty}|x_{n_m}^{(m)} - x_{n_k}^{(k)}|_X \le \lim_{m\to\infty}|\tilde x_m - \tilde x_k|_Y + k^{-1} + m^{-1}, \]
we conclude $\lim_{k\to\infty}|\tilde x - \bar x_{n_k}^{(k)}|_Y = 0$ and so $\lim_{k\to\infty}|\tilde x - \tilde x_k|_Y = 0$. Moreover, the above proof shows that the correspondence $x \in X \to \bar x \in Y$ is isomorphic and isometric, and the image of $X$ under this correspondence is dense in $Y$; that is, $X$ is dense in $Y$.

Example (Completion) Let $X = C[0,1]$ be the vector space of continuous functions $f$ on $[0,1]$, and consider the two normed spaces built on $X$: $X_1$ with the sup norm $|f|_{X_1} = \max_{x\in[0,1]}|f(x)|$ and $X_2$ with the $L^2$ norm $|f|_{X_2} = (\int_0^1|f(x)|^2dx)^{1/2}$. We claim that $X_1$ is complete but $X_2$ is not; the completion of $X_2$ is $L^2(0,1)$, the space of square integrable functions.

First consider $X_1$. Let $\{f_n\}$ be a Cauchy sequence in $X_1$. Since $|f_n(x) - f_m(x)| \le |f_n - f_m|_{X_1}$, $\{f_n(x)\}$ is a Cauchy sequence in $\mathbb{R}$ for every $x \in [0,1]$. Thus one can define a function $f$ by $f(x) = \lim_{n\to\infty}f_n(x)$. Since
\[ |f(x) - f(\hat x)| \le |f_n(x) - f_n(\hat x)| + |f_n(x) - f(x)| + |f_n(\hat x) - f(\hat x)| \]
and $|f_n(x) - f(x)| \to 0$ uniformly in $x \in [0,1]$, the candidate limit $f$ is continuous. Thus $X_1$ is complete.

Next consider $X_2$. Define a Cauchy sequence $\{f_n\}$ by
\[ f_n(x) = \begin{cases} 0 & \text{on } [0, \frac12 - \frac1n] \\ \frac12 + \frac n2(x - \frac12) & \text{on } [\frac12 - \frac1n, \frac12 + \frac1n] \\ 1 & \text{on } [\frac12 + \frac1n, 1]. \end{cases} \]
The candidate limit $f$ in this case is defined by $f(x) = 0$ for $x < \frac12$, $f(\frac12) = \frac12$ and $f(x) = 1$ for $x > \frac12$, and is not in $X$. Consider the Cauchy sequences $g_n$ and $h_n$ in the equivalence class of $\{f_n\}$:
\[ g_n(x) = \begin{cases} 0 & \text{on } [0, \frac12 - \frac1n] \\ 1 + n(x - \frac12) & \text{on } [\frac12 - \frac1n, \frac12] \\ 1 & \text{on } [\frac12, 1], \end{cases} \qquad h_n(x) = \begin{cases} 0 & \text{on } [0, \frac12] \\ n(x - \frac12) & \text{on } [\frac12, \frac12 + \frac1n] \\ 1 & \text{on } [\frac12 + \frac1n, 1]. \end{cases} \]
The candidate limits $g$ (with $g(x) = 0$, $x < \frac12$, $g(x) = 1$, $x \ge \frac12$) and $h$ (with $h(x) = 0$, $x \le \frac12$, $h(x) = 1$, $x > \frac12$) belong to the same equivalence class as $f = \{f_n\}$. Note that $f$, $g$ and $h$ differ only at $x = \frac12$; in general, functions in the same equivalence class differ only on a set of measure zero. The completion $Y$ of $X_2$ is denoted by
\[ L^2(0,1) = \{\text{square integrable measurable functions on } (0,1)\} \quad \text{with } |f|_{L^2}^2 = \int_0^1|f(x)|^2\,dx, \]
where the integral is defined in the sense of Lebesgue, as discussed in the next section. The concept of completion states that for every $f \in Y = \bar X$, the completion of $X$, there exists a sequence $f_n$ in $X$ such that $|f_n - f|_Y \to 0$ as $n \to \infty$ and $|f|_Y = \lim_{n\to\infty}|f_n|_X$. For the example $X_2$: for every square integrable function $f \in L^2(0,1) = Y = \bar X_2$ there exists a sequence of continuous functions $f_n \in X_2$ such that $|f - f_n|_{L^2(0,1)} \to 0$.
Example ($H_0^1(0,1)$) The completion of $X_3$ as defined above is denoted by
\[ H_0^1(0,1) = \{\text{absolutely continuous functions on } [0,1] \text{ with square integrable derivative, vanishing at } x = 0,\ x = 1\} \]
with
\[ (f, g)_{H_0^1} = \int_0^1\frac{d}{dx}f(x)\,\frac{d}{dx}g(x)\,dx, \]
where the integral is defined in the sense of Lebesgue. Here $f$ is an absolutely continuous function if there exists an integrable function $g$ such that
\[ f(x) = f(0) + \int_0^xg(s)\,ds, \quad x \in [0,1]; \]
then $f$ is almost everywhere (a.e.) differentiable and $\frac{d}{dx}f = g$ a.e. in $(0,1)$. For example, the linear spline functions $B_k(x)$, $1 \le k \le n - 1$, are in $H_0^1(0,1)$.

2.2 Measure Theory

In this section we discuss measure theory.

Definition (1) A topological space is a set $E$ together with a collection $\tau$ of subsets of $E$, called open sets, satisfying the following axioms: the empty set and $E$ itself are open; any union of open sets is open; the intersection of any finite number of open sets is open. The collection $\tau$ of open sets is then also called a topology on $E$. We assume that a topological space $(E, \tau)$ satisfies Hausdorff's axiom of separation: for every distinct $x_1, x_2 \in E$ there exist disjoint open sets $G_1, G_2$ such that $x_1 \in G_1$, $x_2 \in G_2$.
(2) A metric space $E$ is a topological space with a metric $d$ satisfying, for all $x, y, z \in E$,

$d(x, y) \ge 0$, and $d(x, y) = 0$ if and only if $x = y$;
$d(x, y) = d(y, x)$;
$d(x, z) \le d(x, y) + d(y, z)$.

For $x_0 \in E$ and $r > 0$, $B(x_0, r) = \{x \in E : d(x, x_0) < r\}$ is the open ball of $E$ with center $x_0$ and radius $r$. A set $O$ is called open if for every point $x \in O$ there exists an open ball $B(x, r)$ contained in $O$.
(3) A collection of subsets $\mathcal{F}$ of $E$ is a $\sigma$-algebra if for all $A \in \mathcal{F}$, $A^c = E\setminus A \in \mathcal{F}$, and for an arbitrary countable family $\{A_i\}$ in $\mathcal{F}$, the countable union $\bigcup_{i=1}^\infty A_i \in \mathcal{F}$.
(4) A measure $\mu$ on $(E, \mathcal{F})$ assigns to each $A \in \mathcal{F}$ a nonnegative real value $\mu(A)$ and satisfies $\sigma$-additivity:
\[ \mu\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty\mu(A_i) \]
for all disjoint sets $\{A_i\}$ in $\mathcal{F}$.

Theorem (Monotone Convergence) The measure $\mu$ is $\sigma$-additive if and only if for every nondecreasing sequence $\{A_k\}$ of sets in $\mathcal{F}$ with $A = \bigcup_{k\ge1}A_k$ we have $\lim_{n\to\infty}\mu(A_n) = \mu(A)$.

Proof: Suppose $\mu$ is $\sigma$-additive and $A_k \uparrow A$. Writing $A$ as the disjoint union $A = A_1 + (A_2\setminus A_1) + (A_3\setminus A_2) + \cdots$, we have
\[ \mu(A) = \mu(A_1) + \mu(A_2\setminus A_1) + \mu(A_3\setminus A_2) + \cdots = \lim_{n\to\infty}\mu(A_n). \]
Conversely, let $A_1, A_2, \cdots \in \mathcal{F}$ be pairwise disjoint with $\sum_{k=1}^\infty A_k \in \mathcal{F}$. Then $B_n = \sum_{k=1}^nA_k$ is a nondecreasing sequence with union $\sum_{k=1}^\infty A_k$, and thus
\[ \mu\Big(\sum_{k=1}^\infty A_k\Big) = \lim_{n\to\infty}\mu(B_n) = \lim_{n\to\infty}\sum_{k=1}^n\mu(A_k) = \sum_{k=1}^\infty\mu(A_k). \quad \square \]

Example (Metric space) A normed space $X$ is a metric space with $d(x, y) = |x - y|$. A point $x \in E$ is the accumulation point of a sequence $\{x_n\}$ if $d(x_n, x) \to 0$ as $n \to \infty$. The closure $\bar A$ of a set $A$ is the collection of all accumulation points of $A$. The boundary of a set $A$ is the collection of points $x$ such that every open ball at $x$ contains points of both $A$ and $A^c$. A set $B$ in $E$ is closed if $B = \bar B$, i.e., every convergent sequence $\{x_n\}$ in $B$ has its accumulation point in $B$. The complement $A^c$ of an open set $A$ is closed.

Examples ($\sigma$-algebra) There are the trivial $\sigma$-algebras $\mathcal{F}_0 = \{E, \emptyset\}$ and $\mathcal{F}^* = $ all subsets of $E$. For a subset $A$ of $E$, the $\sigma$-algebra generated by $A$ is $\mathcal{F}_A = \{E, \emptyset, A, A^c\}$. A finite set of subsets $A_1, A_2, \cdots, A_n$ of $E$ which are pairwise disjoint and whose union is $E$ is called a partition of $E$.
It generates the $\sigma$-algebra $\mathcal{A} = \{A = \bigcup_{j\in J}A_j\}$, where $J$ runs over all subsets of $\{1, \cdots, n\}$. This $\sigma$-algebra has $2^n$ elements, and every finite $\sigma$-algebra is of this form. The smallest nonempty elements $\{A_1, \cdots, A_n\}$ of this algebra are called atoms.

Example (Countable measure) Let $\Omega$ have a countable decomposition $\{D_k\}$, i.e.,
\[ \Omega = \sum_kD_k, \quad D_i\cap D_j = \emptyset, \ i \ne j. \]
Let $\mathcal{F} = \mathcal{F}^*$ and $P(D_k) = \alpha_k > 0$ with $\sum_k\alpha_k = 1$. For the Poisson measure on the nonnegative integers, $D_k = \{k\}$ and
\[ m(D_k) = e^{-\lambda}\frac{\lambda^k}{k!} \quad \text{for } \lambda > 0. \]

Definition For any family $\mathcal{C}$ of subsets of $E$, we define the $\sigma$-algebra $\sigma(\mathcal{C})$ as the smallest $\sigma$-algebra $\mathcal{A}$ which contains $\mathcal{C}$. The $\sigma$-algebra $\sigma(\mathcal{C})$ is the intersection of all $\sigma$-algebras which contain $\mathcal{C}$; it is again a $\sigma$-algebra, i.e.,
\[ \mathcal{A} = \bigcap_\alpha\mathcal{A}_\alpha, \]
where the $\mathcal{A}_\alpha$ are all the $\sigma$-algebras that contain $\mathcal{C}$. The following construction of the measure space and the measure is essential for the triple $(E, \mathcal{F}, m)$ on an uncountable space $E$. If $(E, \mathcal{O})$ is a topological space, where $\mathcal{O}$ is the set of open sets in $E$, then the $\sigma$-algebra $\mathcal{B}(E)$ generated by $\mathcal{O}$ is called the Borel $\sigma$-algebra of the topological space $E$, $(E, \mathcal{B}(E))$ defines a measure space, and a set $B \in \mathcal{B}(E)$ is called a Borel set. For example, $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is a measure space and $\mathcal{B}(\mathbb{R}^n)$ is the $\sigma$-algebra generated by the open balls in $\mathbb{R}^n$.

Caratheodory Theorem Let $\mathcal{B} = \sigma(\mathcal{A})$ be the smallest $\sigma$-algebra containing an algebra $\mathcal{A}$ of subsets of $E$, and let $\mu_0$ be a $\sigma$-additive measure on $(E, \mathcal{A})$. Then there exists a unique measure $\mu$ on $(E, \mathcal{B})$ which is an extension of $\mu_0$, i.e., $\mu(A) = \mu_0(A)$ for $A \in \mathcal{A}$.

Definition A map $f$ from a measure space $(X, \mathcal{A})$ to another measure space $(Y, \mathcal{B})$ is called measurable if
\[ f^{-1}(B) = \{x \in X : f(x) \in B\} \in \mathcal{A} \quad \text{for all } B \in \mathcal{B}. \]
If $f : (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n)) \to (\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$ is measurable, we say $f$ is a Borel function. Every continuous function $\mathbb{R}^n \to \mathbb{R}^m$ is a Borel function, since the inverse images of open sets in $\mathbb{R}^m$ are open in $\mathbb{R}^n$.

Example (Measure space) Let $\Omega = \mathbb{R}$ and let $\mathcal{B}(\mathbb{R})$ be the Borel $\sigma$-algebra. Note that
\[ (a, b] = \bigcap_n\Big(a, b + \frac1n\Big), \qquad [a, b] = \bigcap_n\Big(a - \frac1n, b + \frac1n\Big) \in \mathcal{B}(\mathbb{R}). \]
Thus $\mathcal{B}(\mathbb{R})$ coincides with the $\sigma$-algebra generated by the semi-closed intervals. Let $\mathcal{A}$ be the algebra of finite disjoint sums of semi-closed intervals $(a_i, b_i]$ and define $\mu_0$ by
\[ \mu_0\Big(\sum_{k=1}^n(a_k, b_k]\Big) = \sum_{k=1}^n(F(b_k) - F(a_k)), \]
where $x \in \mathbb{R} \to F(x) \in \mathbb{R}^+$ is nondecreasing and right continuous with left limits everywhere. We then obtain the measure $\mu$ on $(\Omega, \mathcal{F}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ by the Caratheodory theorem, provided $\mu_0$ is $\sigma$-additive on $\mathcal{A}$.

We now prove that $\mu_0$ is $\sigma$-additive on $\mathcal{A}$. By the monotone convergence theorem it suffices to prove that
\[ \mu_0(A_n) \downarrow 0 \quad \text{whenever } A_n \downarrow \emptyset, \ A_n \in \mathcal{A}. \]
Without loss of generality one can assume $A_n \subset [-N, N]$. Since $F$ is right continuous, for each $A_n$ and $\epsilon > 0$ there exists a set $B_n \in \mathcal{A}$ with closure $\bar B_n \subset A_n$ such that $\mu_0(A_n) - \mu_0(B_n) \le \epsilon2^{-n}$. The collection of sets $\{[-N,N]\setminus\bar B_n\}$ is an open covering of the compact set $[-N,N]$, since $\bigcap_n\bar B_n \subset \bigcap_nA_n = \emptyset$. By the Heine-Borel theorem there exists a finite subcovering:
\[ \bigcup_{n=1}^{n_0}([-N,N]\setminus\bar B_n) = [-N,N], \]
and thus $\bigcap_{n=1}^{n_0}B_n = \emptyset$. Hence
\[ \mu_0(A_{n_0}) = \mu_0\Big(A_{n_0}\setminus\bigcap_{k=1}^{n_0}B_k\Big) \le \mu_0\Big(\bigcup_{k=1}^{n_0}(A_k\setminus B_k)\Big) \le \sum_{k=1}^{n_0}\mu_0(A_k\setminus B_k) \le \epsilon. \]
Since $\epsilon > 0$ is arbitrary, $\mu_0(A_n) \to 0$ as $n \to \infty$.

Next, we define the integral of a measurable function $f$ on $(E, \mathcal{F}, \mu)$.

Definition (Elementary function) An elementary function $f$ is defined by
\[ f(x) = \sum_{k=1}^nx_kI_{A_k}(x), \]
where $\{A_k\}$ is a partition of $E$, i.e., the $A_k \in \mathcal{F}$ are disjoint and $\bigcup A_k = E$. The integral of $f$ is then given by
\[ \int_Ef(x)\,d\mu(x) = \sum_{k=1}^nx_k\mu(A_k). \]
Theorem (Approximation) For every measurable function $f \ge 0$ on $(E, \mathcal{F})$ there exists a sequence of elementary functions $\{f_n\}$ such that $0 \le f_n(x) \le f(x)$ and $f_n(x) \uparrow f(x)$ for all $x \in E$.

Proof: For $n \ge 1$, define the sequence of elementary functions
\[ f_n(x) = \sum_{k=1}^{n2^n}\frac{k-1}{2^n}I_{k,n}(x) + n\,I_{f(x)>n}, \]
where $I_{k,n}$ is the indicator function of the set $\{x : \frac{k-1}{2^n} < f(x) \le \frac k{2^n}\}$. It is easy to verify that $f_n(x)$ is monotonically nondecreasing in $n$ with $f_n(x) \le f(x)$, and thus $f_n(x) \to f(x)$ for all $x \in E$. $\square$

Definition (Integral) For a nonnegative measurable function $f$ on $(E, \mathcal{F})$ we define the integral by
\[ \int_Ef(x)\,d\mu(x) = \lim_{n\to\infty}\int_Ef_n(x)\,d\mu(x), \]
where the limit exists since $\int_Ef_n(x)\,d\mu(x)$ is a nondecreasing number sequence. Note that $f = f^+ - f^-$ with $f^+(x) = \max(0, f(x))$ and $f^-(x) = \max(0, -f(x))$, so we can apply the Theorem and the Definition to $f^+$ and $f^-$:
\[ \int_Ef(x)\,d\mu(x) = \int_Ef^+(x)\,d\mu(x) - \int_Ef^-(x)\,d\mu(x). \]
If $\int_Ef^+(x)\,d\mu(x),\ \int_Ef^-(x)\,d\mu(x) < \infty$, then $f$ is integrable and
\[ \int_E|f(x)|\,d\mu(x) = \int_Ef^+(x)\,d\mu(x) + \int_Ef^-(x)\,d\mu(x). \]

Corollary Let $f \ge 0$ be a measurable function on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$ with $\mu((a, b]) = F(b) - F(a)$. Then we have, as $n \to \infty$,
\[ \int_0^\infty f(x)\,d\mu(x) = \lim_{n\to\infty}\Big[\sum_{k=1}^{n2^n}f\Big(\frac{k-1}{2^n}\Big)\Big(F\Big(\frac k{2^n}\Big) - F\Big(\frac{k-1}{2^n}\Big)\Big) + f(n)(1 - F(n))\Big] = \int_0^\infty f(x)\,dF(x), \]
which is the Lebesgue-Stieltjes integral with respect to the measure $dF$.
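The dyadic construction in the Approximation Theorem can be checked numerically. The Python sketch below builds $f_n = \min(\lfloor 2^nf\rfloor/2^n,\ n)$ (which agrees with the elementary functions above up to the convention at level-set boundaries) for the illustrative choice $f(x) = x^2$ on $[0,1]$ with Lebesgue measure, and shows the integrals increasing to $\int_0^1x^2\,dx = 1/3$.

```python
import numpy as np

# Sketch of the dyadic approximation: f_n takes the value (k-1)/2^n on
# the set {(k-1)/2^n < f <= k/2^n} and n on {f > n}. The integrals of
# f_n increase to the integral of f (evaluated here on a fine grid).
x = np.linspace(0, 1, 200001)
dx = x[1] - x[0]
f = x**2

def f_n(n):
    return np.minimum(np.floor(f * 2**n) / 2**n, n)

for n in (1, 2, 4, 8):
    print(n, np.sum(f_n(n)) * dx)   # monotonically increasing values
print("exact:", 1 / 3)
```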
The space $L$ of measurable functions on $(E, \mathcal{F})$ is a complete metric space with metric
\[ d(f, g) = \int_E\frac{|f - g|}{1 + |f - g|}\,d\mu. \]
In fact, $d(f, g) = 0$ if and only if $f = g$ almost everywhere, and since
\[ \frac{|f + g|}{1 + |f + g|} \le \frac{|f|}{1 + |f|} + \frac{|g|}{1 + |g|}, \]
$d$ satisfies the triangle inequality.

Theorem (Completeness) $(L, d)$ is a complete metric space.

Proof: Let $\{f_n\}$ be a Cauchy sequence in $(L, d)$. Select a subsequence $\{f_{n_k}\}$ such that
\[ \sum_{k=1}^\infty d(f_{n_k}, f_{n_{k+1}}) < \infty. \]
Then
\[ \int_E\sum_k\frac{|f_{n_k} - f_{n_{k+1}}|}{1 + |f_{n_k} - f_{n_{k+1}}|}\,d\mu < \infty. \]
Since $|f| \le \frac{2|f|}{1+|f|}$ for $|f| \le 1$, it follows that
\[ \sum_{k=1}^\infty|f_{n_k} - f_{n_{k+1}}| < \infty \ \text{a.e.,} \]
and $\{f_{n_k}\}$ converges almost everywhere to some $f \in L$. Moreover, $d(f_{n_k}, f) \to 0$, and thus $(L, d)$ is complete. $\square$

Definition (Convergence in Measure) A sequence $\{f_n\}$ of measurable functions converges to $f$ in measure if for every $\epsilon > 0$, $\lim_{n\to\infty}\mu(\{x : |f_n(x) - f(x)| \ge \epsilon\}) = 0$.

Theorem (Convergence in measure) $\{f_n\}$ converges in measure to $f$ if and only if $\{f_n\}$ converges to $f$ in the $d$-metric.

Proof: For $f, g \in L$,
\[ d(f, g) = \int_{|f-g|\ge\epsilon}\frac{|f - g|}{1 + |f - g|}\,d\mu + \int_{|f-g|<\epsilon}\frac{|f - g|}{1 + |f - g|}\,d\mu \le \mu(|f - g| \ge \epsilon) + \frac\epsilon{1 + \epsilon} \]
holds for all $\epsilon > 0$. Thus, if $f_n$ converges in measure to $f$, then $f_n$ converges to $f$ in the $d$-metric. Conversely, since
\[ d(f, g) \ge \frac\epsilon{1 + \epsilon}\,\mu(|f - g| \ge \epsilon), \]
if $f_n$ converges to $f$ in the $d$-metric, then $f_n$ converges in measure to $f$. $\square$

Corollary The completion of $C(E)$ with respect to the $L^p(E)$-norm
\[ \Big(\int_E|f|^p\,d\mu\Big)^{\frac1p} \]
is $L^p(E) = \{f \in L : \int_E|f|^p\,d\mu < \infty\}$.

2.3 Baire's Category Theory

In this section we discuss the uniform boundedness principle and the open mapping theorem. The underlying idea of the proofs of these theorems is the Baire theorem for complete metric spaces, which is concerned with the decomposition of a space as a union of subsets. For instance, we have the countable union
\[ \mathbb{R}^2 = \bigcup_{k,j}S_{k,j}, \quad S_{k,j} = (k, k+1]\times(j, j+1]. \]
On the other hand, we have the uncountable union
\[ \mathbb{R}^2 = \bigcup_{\alpha\in\mathbb{R}}\ell_\alpha, \quad \ell_\alpha = \{\alpha\}\times\mathbb{R}, \]
but $\ell_\alpha$ has no interior points, being a one dimensional affine subspace of $\mathbb{R}^2$. The question is: can we represent $\mathbb{R}^2$ as a countable union of such sets? It turns out that the answer is no.

Definition (Category) Let $(X, d)$ be a metric space. A subset $E$ of $X$ is called nowhere dense if its closure does not contain any open set. $X$ is said to be of the first category if $X$ is the union of a countable number of nowhere dense sets; otherwise $X$ is said to be of the second category, i.e., if $X = \bigcup_kE_k$, then the closure of at least one of the $E_k$'s has non-empty interior.

Baire-Hausdorff Theorem A complete metric space is of the second category.

Proof: Let $\{E_k\}$ be a sequence of closed sets with $\bigcup_nE_n = X$, where $X$ is a complete metric space. We show that at least one of the $E_k$'s has non-empty interior. Assume otherwise, i.e., that no $E_n$ contains an interior point. Let $O_n = E_n^c$; then $O_n$ is open and dense in $X$ for every $n \ge 1$. Thus $O_1$ contains a closed ball $B_1 = \{x : d(x, x_1) \le r_1\}$ with $r_1 \in (0, \frac12)$. Since $O_2$ is open and dense, there exists a closed ball $B_2 = \{x : d(x, x_2) \le r_2\}$ with $r_2 \in (0, \frac1{2^2})$ contained in $B_1\cap O_2$. Repeating this process, we obtain a sequence $\{B_k\}$ of closed balls $B_k = \{x : d(x, x_k) \le r_k\}$ such that
\[ B_{k+1} \subset B_k, \quad 0 < r_k < \frac1{2^k}, \quad B_k\cap E_k = \emptyset. \]
The sequence $\{x_k\}$ is a Cauchy sequence in $X$, and since $X$ is complete, $x^* = \lim x_n \in X$ exists. Since
\[ d(x_n, x^*) \le d(x_n, x_m) + d(x_m, x^*) \le r_n + d(x_m, x^*) \to r_n \quad \text{as } m \to \infty, \]
$x^* \in B_n$ for every $n$, i.e., $x^* \notin E_n$ for all $n$, and thus $x^* \notin \bigcup_nE_n = X$, which is a contradiction. $\square$

Baire Category Theorem Let $M$ be a set of the first category in a complete metric space $X$. Then the complement $M^c$ is dense in $X$.

Theorem (Totally bounded) A subset $M$ of a complete metric space $X$ is relatively compact if and only if it is totally bounded, i.e., for every $\epsilon > 0$ there exists a family of finitely many points $\{m_1, m_2, \cdots, m_n\}$ in $M$ such that for every point $m \in M$, $d(m, m_k) < \epsilon$ for at least one $m_k$.

Theorem (Banach-Steinhaus, uniform boundedness principle) Let $X, Y$ be Banach spaces and let $T_k$, $k \in K$, be a family (not necessarily countable) of continuous linear operators from $X$ to $Y$. Assume that
\[ \sup_{k\in K}|T_kx|_Y < \infty \quad \text{for all } x \in X. \]
Then there exists a constant $c$ such that
\[ |T_kx|_Y \le c\,|x|_X \quad \text{for all } k \in K \text{ and } x \in X. \]

Proof: Define
\[ X_n = \{x \in X : |T_kx| \le n \text{ for all } k \in K\}. \]
Then $X_n$ is closed, and by the assumption we have $X = \bigcup_nX_n$. It follows from the Baire category theorem that $\mathrm{int}(X_{n_0})$ is not empty for some $n_0$. Pick $x_0 \in X$ and $r > 0$ such that $B(x_0, r) \subset \mathrm{int}(X_{n_0})$. We have $|T_k(x_0 + rz)| \le n_0$ for all $k$ and $z \in B(0, 1)$, which implies
\[ r\,|T_k|_{\mathcal{L}(X,Y)} \le n_0 + |T_kx_0| \le 2n_0. \quad \square \]

The uniform boundedness principle is quite remarkable and surprising, since from pointwise estimates one derives a global (uniform) estimate. We have the following examples.

Examples (1) Let $X$ be a Banach space and $B$ a subset of $X$. If for every $f \in X^*$ the set $f(B) = \{\langle f, x\rangle : x \in B\}$ is bounded in $\mathbb{R}$, then $B$ is bounded.
(2) Let $X$ be a Banach space and $B^*$ a subset of $X^*$. If for every $x \in X$ the set $\langle B^*, x\rangle = \{\langle f, x\rangle : f \in B^*\}$ is bounded in $\mathbb{R}$, then $B^*$ is bounded.

Theorem (Banach closed graph theorem) A closed linear operator defined on a Banach space $X$ into a Banach space $Y$ is continuous.

2.4 Dual space and Hahn-Banach theorem

In this section we discuss the dual space of a normed space and the Hahn-Banach theorem.

Definition (Dual Space) If $X$ is a normed space, let $X^*$ denote the collection of all bounded linear functionals on $X$; $X^*$ is called the dual space of $X$. We define the dual product on $X^*\times X$ by
\[ \langle f, x\rangle_{X^*\times X} = f(x) \quad \text{for } x \in X, \ f \in X^*. \]
For $f \in X^*$ we define the operator norm of $f$ by
\[ |f|_{X^*} = \sup_{x\ne0}\frac{|f(x)|}{|x|_X}. \]

Theorem (Dual space) The dual space $X^*$ equipped with the operator norm is a Banach space.

Proof: Let $\{f_n\}$ be a Cauchy sequence in $X^*$, i.e.,
\[ |f_n(\phi) - f_m(\phi)| \le |f_n - f_m|_*|\phi| \to 0 \quad \text{as } m, n \to \infty, \text{ for all } \phi \in X. \]
Thus $\{f_n(\phi)\}$ is Cauchy in $\mathbb{R}$; define the functional $f$ on $X$ by
\[ f(\phi) = \lim_{n\to\infty}f_n(\phi), \quad \phi \in X. \]
Then $f$ is linear, and for all $\epsilon > 0$ there exists $N$ such that for $m, n \ge N$
\[ |f_m(\phi) - f_n(\phi)| \le \epsilon|\phi|. \]
Letting $m \to \infty$, we have $|f(\phi) - f_n(\phi)| \le \epsilon|\phi|$ for all $n \ge N$, and thus $f \in X^*$ and $|f_n - f|_* \to 0$ as $n \to \infty$. $\square$

Examples (Dual space) (1) Let $\ell^p$ be the space of $p$-summable infinite sequences $x = (s_1, s_2, \cdots)$ with norm $|x|_p = (\sum|s_k|^p)^{\frac1p}$. Then $(\ell^p)^* = \ell^q$, $\frac1p + \frac1q = 1$, for $1 \le p < \infty$. Let $\ell^\infty$ be the space of uniformly bounded sequences $x = (\xi_1, \xi_2, \cdots)$ with norm $|x|_\infty = \sup_k|\xi_k|$, and let $c_0$ be the subspace of sequences with $\lim_{n\to\infty}\xi_n = 0$. Then $(c_0)^* = \ell^1$.
(2) Let $L^p(\Omega)$ be the space of $p$-integrable functions $f$ on $\Omega$ with
\[ |f|_{L^p} = \Big(\int_\Omega|f(x)|^p\,dx\Big)^{\frac1p}. \]
Then $L^p(\Omega)^* = L^q(\Omega)$, where $\frac1p + \frac1q = 1$, for $p \ge 1$; in particular $L^1(\Omega)^* = L^\infty(\Omega)$ and $L^2(\Omega)^* = L^2(\Omega)$. Here $L^\infty(\Omega)$ is the space of essentially bounded functions on $\Omega$ with
\[ |f|_{L^\infty} = \inf\{M : |f(x)| \le M \text{ almost everywhere on } \Omega\}. \]
(3) Consider the linear functional $f(x) = x(1)$. Then $f$ is in $X_1^*$ but not in $X_2^*$, since for the sequence defined by $x^n(t) = t^n$, $t \in [0,1]$, we have $f(x^n) = 1$ but $|x^n|_{X_2} = \frac1{\sqrt{2n+1}} \to 0$ as $n \to \infty$. On the other hand, since for $x \in X_3$
\[ |x(t)| = \Big|x(0) + \int_0^tx'(s)\,ds\Big| \le \sqrt{\int_0^1|x'(s)|^2\,ds}, \]
the point evaluation $f(x) = x(t)$ defines an element of $X_3^*$.
(4) The total variation $V_0^1(g)$ of a function $g$ on the interval $[0,1]$ is defined by
\[ V_0^1(g) = \sup_{P\in\mathcal{P}}\sum_{i=0}^{n-1}|g(t_{i+1}) - g(t_i)|, \]
where the supremum is taken over all partitions $P = \{0 \le t_0 \le \cdots \le t_n \le 1\}$ of the interval $[0,1]$. Then
\[ BV(0,1) = \{\text{uniformly bounded functions } g \text{ with } V_0^1(g) < \infty\} \]
with $|g|_{BV} = V_0^1(g)$; it is a normed space if we assume $g(0) = 0$. For example, every monotone function is of bounded variation, and every real-valued BV function can be expressed as the difference of two increasing functions (Jordan decomposition theorem).

Hahn-Banach Theorem Suppose $S$ is a subspace of a normed space $X$ and $f : S \to \mathbb{R}$ is a linear functional satisfying $f(x) \le |x|$ for all $x \in S$. Then there exists an $F \in X^*$ such that $F(x) \le |x|$ for all $x \in X$ and $F(s) = f(s)$ for all $s \in S$.

In general, the theorem holds for a vector space $X$ and a subadditive, positively homogeneous function $p$, i.e., for all $x, y \in X$ and $\alpha > 0$,
\[ p(x + y) \le p(x) + p(y), \qquad p(\alpha x) = \alpha p(x). \]
Every linear functional $f$ on a subspace $S$ satisfying $f(x) \le p(x)$ has an extension $F$ on $X$ satisfying $F(x) \le p(x)$ for all $x \in X$.

Proof: First, we show that there exists a one-step extension of $f$: for $x_0 \in X\setminus S$ there exists an extension $f_1$ of $f$ to the subspace $Y_1$ spanned by $(S, x_0)$ such that $f_1(x) \le p(x)$ on $Y_1$. Note that for arbitrary $s_1, s_2 \in S$,
\[ f(s_1) + f(s_2) = f(s_1 + s_2) \le p(s_1 + s_2) \le p(s_1 - x_0) + p(s_2 + x_0). \]
Hence there exists a constant $c$ such that
\[ \sup_{s\in S}\{f(s) - p(s - x_0)\} \le c \le \inf_{s\in S}\{p(s + x_0) - f(s)\}. \]
Extend $f$ by defining, for $s \in S$ and $\alpha \in \mathbb{R}$,
\[ f_1(s + \alpha x_0) = f(s) + \alpha c, \]
where $f_1(x_0) = c$ is determined as above. We claim that $f_1(s + \alpha x_0) \le p(s + \alpha x_0)$ for all $\alpha \in \mathbb{R}$ and $s \in S$. Indeed, by the definition of $c$, for $\alpha > 0$
\[ f_1(s + \alpha x_0) = \alpha\Big(f\Big(\frac s\alpha\Big) + c\Big) \le \alpha\Big(f\Big(\frac s\alpha\Big) + p\Big(\frac s\alpha + x_0\Big) - f\Big(\frac s\alpha\Big)\Big) = p(s + \alpha x_0), \]
and for $\alpha < 0$
\[ f_1(s + \alpha x_0) = -\alpha\Big(f\Big(\frac s{-\alpha}\Big) - c\Big) \le -\alpha\Big(f\Big(\frac s{-\alpha}\Big) + p\Big(\frac s{-\alpha} - x_0\Big) - f\Big(\frac s{-\alpha}\Big)\Big) = p(s + \alpha x_0). \]
If $X$ is a separable normed space, let $\{x_1, x_2, \dots\}$ be a countable dense subset of $X$. Select from it, recursively, vectors independent of $S$, and extend $f$ to the subspaces $\mathrm{span}\{\mathrm{span}\{S, x_1\}, x_2\}, \dots$ recursively, as described above. The space $\mathrm{span}\{S, x_1, x_2, \dots\}$ is dense in $X$, and the final extension $F$ is defined by continuity.
In general, the remaining part of the proof is carried out using Zorn's lemma.

Corollary Let $X$ be a normed space. For each $x_0 \in X$ there exists an $f \in X^*$ such that $f(x_0) = |f|_{X^*}|x_0|_X$.

Proof of Corollary: Let $S = \{\alpha x_0 : \alpha \in \mathbb{R}\}$ and define $f(\alpha x_0) = \alpha|x_0|_X$. By the Hahn-Banach theorem there exists an extension $F \in X^*$ of $f$ such that $F(x) \le |x|$ for all $x \in X$. Since $-F(x) = F(-x) \le |-x| = |x|$, we have $|F(x)| \le |x|$, in particular $|F|_{X^*} \le 1$. On the other hand, $F(x_0) = f(x_0) = |x_0|$, thus $|F|_{X^*} = 1$ and $F(x_0) = |F|\,|x_0|$. $\square$

Example (Hahn-Banach Theorem) (1) $|x| = \sup_{|x^*|=1}\langle x^*, x\rangle$.
(2) Next, we prove the geometric form of the Hahn-Banach theorem. It states that given a convex set $K$ containing an interior point and $x_0 \notin K$, the point $x_0$ is separated from $K$ by a hyperplane. A hyperplane is a maximal affine set $x_0 + M$, where $M$ is a subspace of $X$. We introduce the Minkowski functional, which measures the distance from the origin in units of $K$.

Definition (Minkowski functional) Let $K$ be a convex set in a normed space $X$ such that $0$ is an interior point of $K$. The Minkowski functional $p$ of $K$ is defined by
\[ p(x) = \inf\Big\{r > 0 : \frac xr \in K\Big\}. \]
Note that if $K$ is the unit ball of $X$, then $p(x) = |x|$.

Lemma (Minkowski functional)
(1) $0 \le p(x) < \infty$.
(2) $p(\alpha x) = \alpha p(x)$ for $\alpha \ge 0$.
(3) $p(x_1 + x_2) \le p(x_1) + p(x_2)$.
(4) $p$ is continuous.
(5) $\bar K = \{x : p(x) \le 1\}$, $\mathrm{int}(K) = \{x : p(x) < 1\}$.

Proof: (1) Since $K$ contains a ball centered at $0$, for every $x \in X$ there exists $r > 0$ such that $\frac xr \in K$, and thus $p(x)$ is finite.
(2) For $\alpha > 0$,
\[ p(\alpha x) = \inf\Big\{r : \frac{\alpha x}r \in K\Big\} = \inf\Big\{\alpha r' : \frac x{r'} \in K\Big\} = \alpha\inf\Big\{r' : \frac x{r'} \in K\Big\} = \alpha p(x). \]
(3) Let $p(x_i) < r_i < p(x_i) + \epsilon$, $i = 1, 2$, for arbitrary $\epsilon > 0$, and let $r = r_1 + r_2$. By the convexity of $K$,
\[ \frac{r_1}r\,\frac{x_1}{r_1} + \frac{r_2}r\,\frac{x_2}{r_2} = \frac{x_1 + x_2}r \in K, \]
and thus $p(x_1 + x_2) \le r \le p(x_1) + p(x_2) + 2\epsilon$. Since $\epsilon > 0$ is arbitrary, $p$ is subadditive.
(4) Let $\epsilon$ be the radius of a closed ball centered at $0$ and contained in $K$. Since $\epsilon\frac x{|x|} \in K$, we have $p(\epsilon\frac x{|x|}) \le 1$ and thus $p(x) \le \frac1\epsilon|x|$. This shows that $p$ is continuous at $0$. Since $p$ is sublinear,
\[ p(x) = p(x - y + y) \le p(x - y) + p(y) \quad \text{and} \quad p(y) = p(y - x + x) \le p(y - x) + p(x), \]
and thus $-p(y - x) \le p(x) - p(y) \le p(x - y)$. Hence $p$ is continuous, since it is continuous at $0$.
(5) This follows from the continuity of $p$. $\square$

Lemma (Hyperplane) A subset $H$ of a vector space $X$ is a hyperplane if and only if there exist a nonzero linear functional $f$ on $X$ and a constant $c$ such that $H = \{x : f(x) = c\}$.

Proof: If $H$ is a hyperplane, then $H$ is the translation of a subspace $M$ of $X$, i.e., $H = x_0 + M$. If $x_0 \notin M$, then $\mathrm{span}[M, x_0] = X$. If for $x = \alpha x_0 + m$, $m \in M$, we let $f(x) = \alpha$, we have $H = \{x : f(x) = 1\}$. Conversely, if $f$ is a nonzero linear functional on $X$, then $M = \{x : f(x) = 0\}$ is a subspace. Let $x_0 \in X$ be such that $f(x_0) = 1$. Since $f(x - f(x)x_0) = 0$, $X = \mathrm{span}[\{x_0\}, M]$, and $\{x : f(x) = c\} = \{x : f(x - cx_0) = 0\}$ is a hyperplane. $\square$

Theorem (Mazur's Theorem) Let $K$ be a convex set with a nonempty interior in a normed space $X$. Suppose $V$ is an affine set in $X$ that contains no interior points of $K$. Then there exist $x^* \in X^*$ and a constant $c$ such that the hyperplane $\{x : \langle x^*, x\rangle = c\}$ gives the separation
\[ \langle x^*, v\rangle = c \ \text{for all } v \in V \quad \text{and} \quad \langle x^*, k\rangle < c \ \text{for all } k \in \mathrm{int}(K). \]

Proof: By an appropriate translation one may assume that $0$ is an interior point of $K$. Let $M$ be the subspace spanned by $V$. Since $V$ does not contain $0$, there exists a linear functional $f$ on $M$ such that $V = \{x \in M : f(x) = 1\}$. Let $p$ be the Minkowski functional of $K$.
Since $V$ contains no interior points of $K$, $f(x) = 1 \le p(x)$ for all $x \in V$, and by the homogeneity of $p$, $f(x) \le p(x)$ for all $x \in M$. By the Hahn-Banach theorem there exists an extension $F$ of $f$ from $M$ to $X$ with $F(x) \le p(x)$. From the Lemma, $p$ is continuous; hence $F$ is continuous and $F(x) < 1$ for $x \in \mathrm{int}(K)$. Thus $H = \{x : F(x) = 1\}$ is the desired closed hyperplane. $\square$

Example (Hilbert space) If $X$ is a Hilbert space and $S$ is a subspace of $X$, then the extension $F$ is defined by
\[ F(s) = \lim_{n\to\infty}f(s_n) \ \text{for } s = \lim_{n\to\infty}s_n \in \bar S, \qquad F(s) = 0 \ \text{for } s \in \bar S^\perp, \]
where $X = \bar S\oplus\bar S^\perp$.

Example (Integral) Let $X = L^2(0,1)$, $S = C(0,1)$ and let $f(x) = R\text{-}\int_0^1x(t)\,dt$ be the Riemann integral of $x \in S$. Then the extension $F(x) = L\text{-}\int_0^1x(t)\,dt$ is the Lebesgue integral of $x \in X$.

Example (Alignment) (1) For a Hilbert space $X$, $f = x_0$ is self-aligned.
(2) Let $x_0 \in X = L^p(\Omega)$, $1 \le p < \infty$, and define $f \in X^* = L^q(\Omega)$ by $f(t) = |x_0(t)|^{p-2}x_0(t)$, $t \in \Omega$. Then
\[ f(x_0) = \int_\Omega|x_0(t)|^p\,dt. \]
Thus $x_0|x_0|^{p-2} \in X^*$ is aligned with $x_0$.

A function $g$ is called right continuous if $\lim_{h\to0^+}g(t + h) = g(t)$. Let
\[ BV_0[0,1] = \{g \in BV[0,1] : g \text{ is right continuous on } [0,1) \text{ and } g(0) = 0\}. \]
Let $C[0,1]$ be the normed space of continuous functions on $[0,1]$ with the sup norm. For $x \in C[0,1]$ and $g \in BV_0[0,1]$ define the Riemann-Stieltjes integral by
\[ \int_0^1x(t)\,dg(t) = \lim_{\Delta t\to0}R(x, g, P) = \lim_{\Delta t\to0}\sum_{k=1}^nx(z_k)(g(t_k) - g(t_{k-1})) \]
for partitions $P = \{0 = t_0 < \cdots < t_n = 1\}$ and $z_k \in [t_{k-1}, t_k]$. There is a corresponding version of the Riesz representation theorem.

Theorem (Dual of C[0,1]) There is a norm-preserving linear isomorphism from $C[0,1]^*$ to $BV_0[0,1]$, i.e., for $F \in C[0,1]^*$ there exists a unique $g \in BV_0[0,1]$ such that
\[ F(x) = \int_0^1x(t)\,dg(t) \]
and $|F|_{C[0,1]^*} = |g|_{BV}$. That is, $C[0,1]^* \cong BV_0[0,1]$, a space of functions of bounded variation.

Proof: Define the normed space of uniformly bounded functions on $[0,1]$ by
\[ B = \{x : [0,1] \to \mathbb{R} \text{ bounded on } [0,1]\} \quad \text{with norm } |x|_B = \sup_{t\in[0,1]}|x(t)|. \]
By the Hahn-Banach theorem, an extension $F$ of $f$ from $C[0,1]$ to $B$ exists with $|F| = |f|$. For any $s \in (0,1]$, define $\chi_{[0,s]} \in B$ and set $v(s) = F(\chi_{[0,s]})$, $v(0) = 0$. Then, for any partition $\{t_k\}$,
\[ \sum_k|v(t_k) - v(t_{k-1})| = \sum_k\mathrm{sign}(v(t_k) - v(t_{k-1}))\,(v(t_k) - v(t_{k-1})) = F\Big(\sum_k\mathrm{sign}(v(t_k) - v(t_{k-1}))\,(\chi_{[0,t_k]} - \chi_{[0,t_{k-1}]})\Big) \le |F|\,\Big|\sum_k\mathrm{sign}(v(t_k) - v(t_{k-1}))\,\chi_{(t_{k-1},t_k]}\Big|_B \le |f|. \]
Thus $v \in BV[0,1]$ and $|v|_{BV} \le |F|$. Next, for any $x \in C[0,1]$ define the step function
\[ z(t) = \sum_kx(t_{k-1})\,(\chi_{[0,t_k]} - \chi_{[0,t_{k-1}]}). \]
Note that the difference satisfies
\[ |z - x|_B = \max_k\max_{t\in[t_{k-1},t_k]}|x(t_{k-1}) - x(t)| \to 0 \quad \text{as } \Delta t \to 0^+. \]
Since $F$ is bounded, $F(z) \to F(x) = f(x)$, and
\[ F(z) = \sum_kx(t_{k-1})(v(t_k) - v(t_{k-1})) \to \int_0^1x(t)\,dv(t). \]
Hence
\[ f(x) = \int_0^1x(t)\,dv(t) \]
and $|f(x)| \le |x|_{C[0,1]}|v|_{BV}$, so $|f| \le |v|_{BV}$. Therefore $|F| = |f| = |v|_{BV}$. $\square$
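As a numerical companion to the theorem above, the following Python sketch evaluates Riemann-Stieltjes sums $\sum_k x(z_k)(g(t_k) - g(t_{k-1}))$ for the BV function $g$ with a unit jump at $t = \frac12$; the sums converge to $x(\frac12)$, i.e., this $g$ represents the point evaluation functional. The integrand is an illustrative choice.

```python
import numpy as np

# Sketch: Riemann-Stieltjes sums for F(x) = int_0^1 x dg with
# g(t) = 0 for t < 1/2 and g(t) = 1 for t >= 1/2 (total variation 1).
# The sums converge to x(1/2), the point evaluation functional,
# consistent with the identification C[0,1]* ~ BV_0[0,1].
def stieltjes(x, g, n):
    t = np.linspace(0, 1, n + 1)
    z = 0.5 * (t[:-1] + t[1:])        # tags z_k in [t_{k-1}, t_k]
    return np.sum(x(z) * (g(t[1:]) - g(t[:-1])))

x = np.cos                              # a continuous integrand
g = lambda t: (t >= 0.5).astype(float)
for n in (10, 100, 1000):
    print(n, stieltjes(x, g, n))
print("x(1/2) =", np.cos(0.5))
```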
Riesz Representation Theorem For every bounded linear functional $f$ on a Hilbert space $X$ there exists a unique element $y_f \in X$ such that
\[ f(x) = (x, y_f) \ \text{for all } x \in X, \quad \text{and} \quad |f|_{X^*} = |y_f|_X. \]
We define the Riesz map $R : X^* \to X$ by $Rf = y_f \in X$ for $f \in X^*$, for a Hilbert space $X$.

Proof: If $f = 0$, take $y_f = 0$. Otherwise there exists $z \in X$ with $|z| = 1$ such that $(x, z) = 0$ for all $x$ with $f(x) = 0$. Note that for every $x \in X$ we have $f(f(x)z - f(z)x) = 0$. Thus
\[ 0 = (f(x)z - f(z)x, z) = f(x)(z, z) - f(z)(x, z), \]
which implies $f(x) = (x, f(z)z)$. The existence then follows by taking $y_f = f(z)z$. For uniqueness, suppose $f(x) = (x, u_1) = (x, u_2)$ for all $x \in X$. Then $(x, u_1 - u_2) = 0$, and taking $x = u_1 - u_2$ we obtain $|u_1 - u_2| = 0$. $\square$

Definition (Reflexive Banach space) For a Banach space $X$ we define an injection $\iota$ of $X$ into $X^{**} = (X^*)^*$ (the double dual) as follows: if $x \in X$, then $\iota x \in X^{**}$ is defined by $\iota x(f) = f(x)$ for $f \in X^*$. Note that
\[ |\iota x|_{X^{**}} = \sup_{f\in X^*}\frac{|f(x)|}{|f|_{X^*}} \le \sup_{f\in X^*}\frac{|f|_{X^*}|x|_X}{|f|_{X^*}} = |x|_X, \]
and thus $|\iota x|_{X^{**}} \le |x|_X$. But by the Hahn-Banach theorem there exists an $f \in X^*$ such that $|f|_{X^*} = 1$ and $f(x) = |x|_X$. Hence $|\iota x|_{X^{**}} = |x|_X$ and $\iota : X \to X^{**}$ is an isometry. If $\iota$ is also surjective, then we say that $X$ is reflexive and we may identify $X$ with $X^{**}$.

Definition (Weak and Weak Star Convergence) (1) A sequence $\{x_n\}$ in a normed space $X$ is said to converge weakly to $x \in X$ if $\lim_{n\to\infty}f(x_n) = f(x)$ for all $f \in X^*$.
(2) A sequence $\{f_n\}$ in the dual space $X^*$ of a normed space $X$ is said to converge weakly star to $f \in X^*$ if $\lim_{n\to\infty}f_n(x) = f(x)$ for all $x \in X$.

Notes (1) If $x_n$ converges strongly to $x \in X$, then $x_n$ converges weakly to $x$, but not conversely.
(2) If $x_n$ converges weakly to $x$, then $\{|x_n|_X\}$ is bounded and $|x| \le \liminf|x_n|_X$, i.e., the norm is weakly lower semicontinuous.
(3) A bounded sequence $\{x_n\}$ in a normed space $X$ converges weakly to $x \in X$ if $\lim_{n\to\infty}f(x_n) = f(x)$ for all $f \in S$, where $S$ is a dense set in $X^*$.
(4) Let $X$ be a Hilbert space. If a sequence $\{x_n\}$ of $X$ converges weakly to $x \in X$, then $x_n$ converges strongly to $x$ if and only if $\lim_{n\to\infty}|x_n| = |x|$. In fact,
\[ |x_n - x|^2 = (x_n - x, x_n - x) = |x_n|^2 - (x_n, x) - (x, x_n) + |x|^2 \to 0 \quad \text{as } n \to \infty \]
if $x_n$ converges weakly to $x$ and $|x_n| \to |x|$.

Example (Weak Convergence) Let $X = L^2(0,1)$ and consider the sequence $x_n = \sqrt2\sin(n\pi t)$, $t \in (0,1)$. Then $|x_n| = 1$ and $x_n$ converges weakly to $0$. In fact, since the linear span of the family $\{\sin(k\pi t)\}$ is dense in $L^2(0,1)$ and $(x_n, \sin(k\pi t))_{L^2} \to 0$ as $n \to \infty$ for each $k$, $x_n$ converges weakly to $0$. Since $|0| = 0 \ne 1 = \lim|x_n|$, the sequence $x_n$ does not converge strongly to $0$. In general, if $x_n \to x$ weakly and $y_n \to y$ strongly in a Hilbert space, then $(x_n, y_n) \to (x, y)$, since
\[ (x_n, y_n) - (x, y) = (x_n, y_n - y) + (x_n - x, y). \]
But note that $(x_n, x_n) = 2\int_0^1\sin^2(n\pi t)\,dt = 1 \ne 0$, so $(x_n, x_n)$ does not converge to $0$: in general, $(x_n, y_n)$ does not necessarily converge to $(x, y)$ if $x_n \to x$ and $y_n \to y$ only weakly in $X$.

Alaoglu Theorem If $X$ is a normed space and $\{x_n^*\}$ is a bounded sequence in $X^*$, then there exists a subsequence that converges weakly star to an element of $X^*$.

Proof: We prove the theorem for the case when $X$ is separable. Let $\{x_n^*\}$ be a sequence in $X^*$ with $|x_n^*| \le 1$, and let $\{x_k\}$ be a dense sequence in $X$. The sequence $\{\langle x_n^*, x_1\rangle\}$ is bounded in $\mathbb{R}$ and thus has a convergent subsequence, denoted $\{\langle x_{n_1}^*, x_1\rangle\}$. Similarly, the sequence $\{\langle x_{n_1}^*, x_2\rangle\}$ contains a convergent subsequence $\{\langle x_{n_2}^*, x_2\rangle\}$. Continuing in this manner, we extract subsequences such that $\{\langle x_{n_k}^*, x_k\rangle\}$ converges in $\mathbb{R}$. Thus the diagonal sequence $\{x_{nn}^*\}$ converges on the dense subset $\{x_k\}$, i.e., $\{\langle x_{nn}^*, x_k\rangle\}$ converges for each $x_k$. We will show that $x_{nn}^*$ converges weakly star to an element $x^* \in X^*$. Let $x \in X$ be fixed. For arbitrary $\epsilon > 0$ there exist $N$ and $\alpha_1, \cdots, \alpha_N$ such that
\[ \Big|x - \sum_{k=1}^N\alpha_kx_k\Big| \le \frac\epsilon3. \]
Thus
\[ |\langle x_{nn}^*, x\rangle - \langle x_{mm}^*, x\rangle| \le \Big|\Big\langle x_{nn}^*, x - \sum_{k=1}^N\alpha_kx_k\Big\rangle\Big| + \Big|\Big\langle x_{nn}^* - x_{mm}^*, \sum_{k=1}^N\alpha_kx_k\Big\rangle\Big| + \Big|\Big\langle x_{mm}^*, x - \sum_{k=1}^N\alpha_kx_k\Big\rangle\Big| \le \epsilon \]
for sufficiently large $nn, mm$. Thus $\{\langle x_{nn}^*, x\rangle\}$ is a Cauchy sequence in $\mathbb{R}$ and converges to a real number $\langle x^*, x\rangle$. The functional $\langle x^*, x\rangle$ so defined is obviously linear, and $|x^*| \le 1$ since $|\langle x_{nn}^*, x\rangle| \le |x|$. $\square$
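The weak convergence example above ($x_n = \sqrt2\sin(n\pi t)$) is easy to check numerically. In the following Python sketch, $(x_n, \phi) \to 0$ for a fixed test function while $(x_n, x_n) = 1$ throughout, matching the discussion; the test function is an illustrative choice.

```python
import numpy as np

# Sketch: x_n(t) = sqrt(2) sin(n pi t) satisfies |x_n|_{L^2} = 1 and
# (x_n, phi) -> 0 for a fixed phi (weak convergence to 0), while
# (x_n, x_n) = 1 does not vanish (no strong convergence).
t = np.linspace(0, 1, 100001)
dt = t[1] - t[0]
phi = t * (1 - t)                      # a fixed test function
for n in (1, 10, 100, 1000):
    xn = np.sqrt(2) * np.sin(n * np.pi * t)
    print(n, np.sum(xn * phi) * dt, np.sum(xn * xn) * dt)
```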
Example (Weak Star Convergence) (1) Let $X = L^1(\Omega)$, so that $L^\infty(\Omega) = X^*$; thus a bounded sequence in $L^\infty(\Omega)$ has a weakly star convergent subsequence. Let $X = L^\infty(0,1)$ and let $\{x_n\}$ be the sequence in $X$ defined by
\[ x_n(t) = \begin{cases} -1 & \text{on } [0, \frac12 - \frac1n] \\ n(t - \frac12) & \text{on } [\frac12 - \frac1n, \frac12 + \frac1n] \\ 1 & \text{on } [\frac12 + \frac1n, 1]. \end{cases} \]
Then $|x_n|_X = 1$ and $\{x_n\}$ converges weakly star to the sign function $x(t) = \mathrm{sign}(t - \frac12)$. But since $|x_n - x|_{L^\infty} \ge \frac12$ for all $n$, $\{x_n\}$ does not converge strongly to $x$.
(2) Let $X = c_0$, the space of infinite sequences $x = (x_1, x_2, \cdots)$ that converge to zero, with norm $|x|_\infty = \max_k|x_k|$. Then $X^* = \ell^1$ and $X^{**} = \ell^\infty$. Let $\{x_n^*\}$ in $\ell^1 = X^*$ be defined by $x_n^* = (0, \cdots, 0, 1, 0, \cdots)$, where the $1$ appears in the $n$-th term only. Then $x_n^*$ converges weakly star to $0$ in $X^*$, but $x_n^*$ does not converge weakly to $0$, since $\langle x_n^*, x^{**}\rangle = 1$ for all $n$ for $x^{**} = (1, 1, \cdots) \in X^{**}$.

Example ($L^1(\Omega)$ bounded sequence) Let $X = L^1(0,1)$ and let $\{x_n\} \subset X$ be the sequence defined by
\[ x_n(t) = \begin{cases} n & t \in (\frac12 - \frac1{2n}, \frac12 + \frac1{2n}) \\ 0 & \text{otherwise.} \end{cases} \]
Then $|x_n|_{L^1} = 1$ and
\[ \int_0^1x_n(t)\phi(t)\,dt \to \phi(\tfrac12) \]
for all $\phi \in C[0,1]$, but the limiting functional $\phi \mapsto \phi(\frac12)$ cannot be represented as $\int_0^1x\phi\,dt$ with $x \in L^1(0,1)$. Thus $x_n$ does not converge weakly in $L^1(0,1)$. The sequence $x_n$ does, however, converge weakly star in $C[0,1]^*$ (to the Dirac measure at $t = \frac12$).

Eberlein-Shmulyan Theorem A Banach space $X$ is reflexive if and only if every closed bounded set is weakly sequentially compact, i.e., every bounded sequence of $X$ contains a subsequence that converges weakly to an element of $X$.

Next, we consider linear operators $T : X \to Y$.

Definition (Linear Operator) Let $X, Y$ be normed spaces. A linear operator $T$ on $D(T) \subset X$ into $Y$ satisfies
\[ T(\alpha x_1 + \beta x_2) = \alpha Tx_1 + \beta Tx_2, \]
where the domain $D(T)$ is a subspace of $X$. A linear operator $T$ is continuous if and only if there exists a constant $M \ge 0$ such that
\[ |Tx|_Y \le M|x|_X \quad \text{for all } x \in X. \]
For a continuous linear operator we define the operator norm $|T|$ by
\[ |T| = \sup_{|x|_X=1}|Tx|_Y = \inf M, \]
where we assume $D(T) = X$. A continuous linear operator on a normed space $X$ into $Y$ is called a bounded linear operator on $X$ into $Y$ and will be denoted by $T \in \mathcal{L}(X, Y)$. The norm of $T \in \mathcal{L}(X, Y)$ is denoted either by $|T|_{\mathcal{L}(X,Y)}$ or simply by $|T|$. Note that the space $\mathcal{L}(X, Y)$ is a Banach space with the operator norm and $X^* = \mathcal{L}(X, \mathbb{R})$.

Open Mapping Theorem Let $T$ be a continuous linear operator from a Banach space $X$ onto a Banach space $Y$. Then $T$ is an open map, i.e., $T$ maps every open set of $X$ onto an open set in $Y$.

Proof: Suppose $A : X \to Y$ is a surjective continuous linear operator. In order to prove that $A$ is an open map, it is sufficient to show that $A$ maps the open unit ball in $X$ to a neighborhood of the origin of $Y$. Let $U = B_1(0)$ and $V = B_1(0)$ be the open unit balls of $X$ and $Y$, respectively. Then $X = \bigcup_{k\in\mathbb{N}}kU$. Since $A$ is surjective,
\[ Y = A(X) = A\Big(\bigcup_{k\in\mathbb{N}}kU\Big) = \bigcup_{k\in\mathbb{N}}A(kU). \]
Since $Y$ is a Banach space, by the Baire category theorem there exists a $k \in \mathbb{N}$ such that $\overline{A(kU)}$ has nonempty interior. Let $c$ be an interior point of $\overline{A(kU)}$; then there exists $r > 0$ such that $B_r(c) \subseteq \overline{A(kU)}$. If $v \in V$, then
\[ c,\ c + rv \in B_r(c) \subseteq \overline{A(kU)}, \]
and the difference $rv = (c + rv) - c$ of elements of $\overline{A(kU)}$ satisfies $rv \in \overline{A(2kU)}$. Since $A$ is linear, $V \subseteq \overline{A(\frac{2k}rU)}$. Thus it follows that for all $y \in Y$ and $\epsilon > 0$ there exists $x \in X$ such that
\[ |x|_X < \frac{2k}r|y|_Y \quad \text{and} \quad |y - Ax|_Y < \epsilon. \qquad (2.1) \]
Let $\delta = \frac r{2k}$ and fix $y \in \delta V$. By (2.1), there is some $x_1 \in X$ with $|x_1| < 1$ and $|y - Ax_1| < \frac\delta2$. By (2.1) one can find a sequence $\{x_n\}$ inductively with
\[ |x_n| < 2^{-(n-1)} \quad \text{and} \quad |y - A(x_1 + x_2 + \cdots + x_n)| < \delta2^{-n}. \qquad (2.2) \]
Let $s_n = x_1 + x_2 + \cdots + x_n$.
From the first inequality in (2.2), $\{s_n\}$ is a Cauchy sequence, and since $X$ is complete, $\{s_n\}$ converges to some $x \in X$. By (2.2), the sequence $As_n$ tends to $y$ in $Y$, and thus $Ax = y$ by the continuity of $A$. Also, we have
\[ |x| = \lim_{n\to\infty}|s_n| \le \sum_{n=1}^\infty|x_n| < 2. \]
This shows that every $y \in \delta V$ belongs to $A(2U)$, or equivalently, that the image $A(U)$ of $U = B_1(0)$ in $X$ contains the open ball $\frac\delta2V \subset Y$. $\square$

Corollary (Open Mapping Theorem) A bounded linear operator $T$ between Banach spaces is boundedly invertible if and only if it is one-to-one and onto.

Definition (Closed Linear Operator) (1) The graph $G(T)$ of a linear operator $T$ on the domain $D(T) \subset X$ into $Y$ is the set $\{(x, Tx) : x \in D(T)\}$ in the product space $X\times Y$. Then $T$ is closed if its graph $G(T)$ is a closed linear subspace of $X\times Y$, i.e., if $x_n \in D(T)$ converges strongly to $x \in X$ and $Tx_n$ converges strongly to $y \in Y$, then $x \in D(T)$ and $y = Tx$. Thus the notion of a closed linear operator is an extension of the notion of a bounded linear operator.
(2) A linear operator $T$ is said to be closable if whenever $x_n \in D(T)$ converges strongly to $0$ and $Tx_n$ converges strongly to $y \in Y$, then $y = 0$.

For a closed linear operator $T$, the domain $D(T)$ is a Banach space when equipped with the graph norm
\[ |x|_{D(T)} = (|x|_X^2 + |Tx|_Y^2)^{\frac12}. \]

Example (Closed Linear Operator) The operator $T = \frac d{dt}$ with $X = Y = L^2(0,1)$ is closed, with
\[ \mathrm{dom}(T) = H^1(0,1) = \{f \in L^2(0,1) : f \text{ is absolutely continuous on } [0,1] \text{ with square integrable derivative}\}. \]
If $y_n = Tx_n$, then
\[ x_n(t) = x_n(0) + \int_0^ty_n(s)\,ds. \]
If $x_n \in \mathrm{dom}(T) \to x$ and $y_n \to y$ in $L^2(0,1)$, then letting $n \to \infty$ we have
\[ x(t) = x(0) + \int_0^ty(s)\,ds, \]
i.e., $x \in \mathrm{dom}(T)$ and $Tx = y$. In general, if $\lambda I + T$ for some $\lambda \in \mathbb{R}$ has a bounded inverse $(\lambda I + T)^{-1}$, then $T : \mathrm{dom}(T) \subset X \to X$ is closed. In fact, $Tx_n = y_n$ is equivalent to $x_n = (\lambda I + T)^{-1}(y_n + \lambda x_n)$. Supposing $x_n \to x$ and $y_n \to y$ in $X$ and letting $n \to \infty$, we have $x = (\lambda I + T)^{-1}(\lambda x + y)$, i.e., $x \in \mathrm{dom}(T)$ and $Tx = T(\lambda I + T)^{-1}(\lambda x + y) = y$.

Definition (Dual Operator) Let $T$ be a linear operator from $X$ into $Y$ with dense domain $D(T)$. The dual operator $T^*$ of $T$ is the linear operator from $Y^*$ into $X^*$ defined by
\[ \langle y^*, Tx\rangle_{Y^*\times Y} = \langle T^*y^*, x\rangle_{X^*\times X} \quad \text{for all } x \in D(T) \text{ and } y^* \in D(T^*). \]
In fact, for $y^* \in Y^*$, the element $x^* \in X^*$ satisfying $\langle y^*, Tx\rangle = \langle x^*, x\rangle$ for all $x \in D(T)$ is uniquely defined if and only if $D(T)$ is dense. The only if part follows since, if $\overline{D(T)} \ne X$, then by the Hahn-Banach theorem there exists a nonzero $x_0^* \in X^*$ such that $\langle x_0^*, x\rangle = 0$ for all $x \in D(T)$, which contradicts the uniqueness. If $T$ is bounded with $D(T) = X$, then $T^*$ is bounded with $|T| = |T^*|$.

Examples Consider the gradient operator $T : L^2(\Omega) \to L^2(\Omega)^n$, $Tu = \nabla u = (D_{x_1}u, \cdots, D_{x_n}u)$, with $D(T) = H^1(\Omega)$. Then we have, for $v \in L^2(\Omega)^n$,
\[ T^*v = -\mathrm{div}\,v = -\sum_kD_{x_k}v_k \]
with domain
\[ D(T^*) = \{v \in L^2(\Omega)^n : \mathrm{div}\,v \in L^2(\Omega) \text{ and } n\cdot v = 0 \text{ at the boundary } \partial\Omega\}. \]
In fact, by the divergence theorem,
\[ (Tu, v) = \int_\Omega\nabla u\cdot v\,dx = \int_{\partial\Omega}(n\cdot v)u\,ds - \int_\Omega u\,(\mathrm{div}\,v)\,dx = (u, T^*v) \]
for all $v \in C^1(\bar\Omega)$. First, for $u \in H_0^1(\Omega)$ the boundary term vanishes and we obtain $T^*v = -\mathrm{div}\,v \in L^2(\Omega)$, since $H_0^1(\Omega)$ is dense in $L^2(\Omega)$. Then, for general $u \in H^1(\Omega)$, the boundary term must vanish, and thus $n\cdot v = 0$ at $\partial\Omega$.

Definition (Hilbert space Adjoint Operator) Let $X, Y$ be Hilbert spaces and let $T$ be a linear operator from $X$ into $Y$ with dense domain $D(T)$. The Hilbert space adjoint $T^*$ of $T$ is the linear operator from $Y$ into $X$ defined by
\[ (y, Tx)_Y = (T^*y, x)_X \quad \text{for all } x \in D(T) \text{ and } y \in D(T^*). \]
Note that if $T' : Y^* \to X^*$ denotes the dual operator of $T$, then $T^*R_{Y^*\to Y} = R_{X^*\to X}T'$, where $R_{X^*\to X}$ and $R_{Y^*\to Y}$ are the Riesz maps.
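The integration-by-parts identity behind the gradient/divergence duality can be checked numerically in one dimension. The following Python sketch verifies $(u', v) = -(u, v')$ for $u$ vanishing at both endpoints, so that the boundary term $(n\cdot v)u$ drops; the functions and grid are illustrative choices.

```python
import numpy as np

# Sketch: numerical check of the identity behind T* = -div for the
# gradient operator, (u', v) = -(u, v') when u(0) = u(1) = 0
# (the boundary term (n.v)u vanishes).
x = np.linspace(0, 1, 100001)
dx = x[1] - x[0]
u = np.sin(np.pi * x)           # vanishes at both endpoints
v = np.exp(x)
du = np.gradient(u, dx)
dv = np.gradient(v, dx)
print(np.sum(du * v) * dx, -np.sum(u * dv) * dx)  # nearly equal
```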
Examples (Self-adjoint Operator) Let X = L²(Ω) and let T be the Laplace operator

Tu = ∆u = Σ_{k=1}^n D_{xk xk} u

with domain D(T) = H²(Ω) ∩ H¹0(Ω). Then T is self-adjoint, i.e., T∗ = T. In fact,

(Tu, v)_X = ∫_Ω ∆u v dx = ∫_{∂Ω} ((n · ∇u)v − (n · ∇v)u) ds + ∫_Ω ∆v u dx = (u, T∗v),

and the boundary integral vanishes for u, v ∈ D(T).

Closed Range Theorem Let X and Y be Banach spaces and T a closed linear operator on D(T) ⊂ X into Y. Assume that D(T) is dense in X. Then the following properties are all equivalent:
(a) R(T) is closed in Y.
(b) R(T∗) is closed in X∗.
(c) R(T) = N(T∗)⊥ = {y ∈ Y : ⟨y∗, y⟩ = 0 for all y∗ ∈ N(T∗)}.
(d) R(T∗) = N(T)⊥ = {x∗ ∈ X∗ : ⟨x∗, x⟩ = 0 for all x ∈ N(T)}.

Proof: If y = Tx ∈ R(T), then ⟨y, y∗⟩ = ⟨Tx, y∗⟩ = ⟨x, T∗y∗⟩ = 0 for all y∗ ∈ N(T∗), so R(T) ⊂ N(T∗)⊥; conversely, when R(T) is closed one obtains N(T∗)⊥ ⊂ R(T), and thus R(T) = N(T∗)⊥.

Since T is closed, the graph G = G(T) = {(x, Tx) ∈ X × Y : x ∈ dom(T)} is closed. Thus G is a Banach space equipped with the norm |(x, y)| = |x| + |y| of X × Y. Define the continuous linear operator S from G into Y by

S(x, Tx) = Tx.

Then the dual operator S∗ of S is a continuous linear operator from Y∗ into G∗, and we have

⟨(x, Tx), S∗y∗⟩ = ⟨S(x, Tx), y∗⟩ = ⟨Tx, y∗⟩ = ⟨(x, Tx), (0, y∗)⟩

for x ∈ dom(T) and y∗ ∈ Y∗. Thus the functional S∗y∗ − (0, y∗) ∈ X∗ × Y∗ vanishes on all elements of G. But ⟨(x, Tx), (x∗, y1∗)⟩ = 0 for all x ∈ dom(T) is equivalent to ⟨x, x∗⟩ = ⟨−Tx, y1∗⟩ for all x ∈ dom(T), i.e., x∗ = −T∗y1∗. Hence

S∗y∗ = (0, y∗) + (−T∗y1∗, y1∗) = (−T∗y1∗, y∗ + y1∗) for all y∗ ∈ Y∗.

Since y1∗ is arbitrary, R(S∗) = R(T∗) × Y∗. Therefore R(S∗) is closed in X∗ × Y∗ if and only if R(T∗) is closed, and since R(S) = R(T), R(S) is closed if and only if R(T) is closed in Y. Hence we need only prove the equivalence of (a) and (b) in the special case of a bounded operator.

So let T be bounded and assume (a). Since R(T) is a Banach space, one can assume R(T) = Y without loss of generality. By the open mapping theorem there exists c > 0 such that for each y ∈ Y there exists an x ∈ X with Tx = y and |x| ≤ c|y|. Thus, for y∗ ∈ Y∗,

|⟨y, y∗⟩| = |⟨Tx, y∗⟩| = |⟨x, T∗y∗⟩| ≤ c|y||T∗y∗|

and

|y∗| = sup_{|y|≤1} |⟨y, y∗⟩| ≤ c|T∗y∗|.

Hence (T∗)^{−1} exists and is bounded, and R(T∗) is closed, i.e., (b) holds. Conversely, assume (b) and let T1 : X → Y1 = closure of R(T) be defined by T1x = Tx. For y1∗ ∈ Y1∗ with T1∗y1∗ = 0 we have ⟨T1x, y1∗⟩ = ⟨x, T1∗y1∗⟩ = 0 for all x ∈ X, and since R(T1) is dense in Y1, y1∗ = 0. Since N(T1∗) = {0} and R(T1∗) = R(T∗) is closed, (T1∗)^{−1} exists and is bounded, and therefore R(T1) = Y1, i.e., R(T) is closed. □

2.5 Distribution and Generalized Derivatives

In this section we introduce distributions (generalized functions). The concept of a distribution is essential for defining generalized solutions to PDEs and provides the foundation of PDE theory. Let D(Ω) be the vector space C0∞(Ω) of all infinitely continuously differentiable functions with compact support in Ω. For any compact set K of Ω, let DK(Ω) be the set of all functions f ∈ C0∞(Ω) whose supports are in K. Define a family of seminorms on D(Ω) by

p_{K,m}(f) = sup_{x∈K} sup_{|s|≤m} |D^s f(x)|,

where

D^s = (∂/∂x1)^{s1} · · · (∂/∂xn)^{sn},

s = (s1, · · · , sn) is a nonnegative integer valued multi-index, and |s| = Σ sk ≤ m.
Then DK(Ω) is a locally convex topological space.

Definition (Distribution) A linear functional T defined on C0∞(Ω) is a distribution if for every compact subset K of Ω there exist a positive constant C and a positive integer k such that

|T(φ)| ≤ C sup_{|s|≤k, x∈K} |D^s φ(x)| for all φ ∈ DK(Ω).

Definition (Generalized Derivative) The distribution S defined by

S(φ) = −T(D_{xk}φ) for all φ ∈ C0∞(Ω)

is called the distributional derivative of T with respect to xk, and we write S = D_{xk}T. In general,

S(φ) = D^s T(φ) = (−1)^{|s|} T(D^s φ) for all φ ∈ C0∞(Ω).

This definition follows naturally from the fact that, for continuously differentiable f,

∫_Ω D_{xk}f φ dx = −∫_Ω f (∂φ/∂xk) dx,

and thus D_{xk}T_f = T_{∂f/∂xk}. Accordingly, we let D^s f denote the distributional derivative of T_f.

Example (Distribution) (1) For a locally integrable function f on Ω, one defines the corresponding distribution by

T_f(φ) = ∫_Ω f φ dx for all φ ∈ C0∞(Ω),

since

|T_f(φ)| ≤ (∫_K |f| dx) sup_{x∈K} |φ(x)|.

(2) T(φ) = φ(0) defines the Dirac delta δ0 at x = 0, i.e., |δ0(φ)| ≤ sup_{x∈K} |φ(x)|.

(3) Let H be the Heaviside function defined by H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. Then

DT_H(φ) = −∫_{−∞}^∞ H(x)φ′(x) dx = φ(0),

and thus DT_H = δ0, the Dirac delta at x = 0.

(4) The distributional solution u of −D²u = δ_{x0} satisfies

−∫_{−∞}^∞ u φ″ dx = φ(x0)

for all φ ∈ C0∞(R). The fundamental solution is u(x) = −(1/2)|x − x0|, since

(1/2)∫_{−∞}^∞ |x − x0| φ″(x) dx = (1/2)(∫_{−∞}^{x0} (x0 − x)φ″ dx + ∫_{x0}^∞ (x − x0)φ″ dx) = φ(x0).

In general, for d ≥ 2 let

G(x, x0) = −(1/(2π)) log |x − x0| for d = 2, G(x, x0) = c_d |x − x0|^{2−d} for d ≥ 3.

Then ∆G(x, x0) = 0 for x ≠ x0, and u = G(x, x0) is the fundamental solution of −∆ in R^d, i.e., −∆u = δ_{x0}. In fact, let Bε = {|x − x0| ≤ ε} with surface Γε = {|x − x0| = ε}. By the divergence theorem (for d ≥ 3, with a suitable normalization of c_d),

∫_{R^d \ Bε} G(x, x0)∆φ(x) dx = ∫_{Γε} (G(x, x0) ∂φ/∂ν − φ ∂G(x, x0)/∂ν) ds = ∫_{Γε} (ε^{2−d} ∂φ/∂ν − (2 − d)ε^{1−d}φ(s)) c_d ds → φ(x0)

as ε → 0+. That is, G(x, x0) satisfies

−∫_{R^d} G(x, x0)∆φ dx = φ(x0).

In general, let L be a linear differential operator and L∗ the formal adjoint operator of L. A locally integrable function u is said to be a distributional solution to Lu = T, for a distribution T, if

∫_Ω u (L∗φ) dx = T(φ) for all φ ∈ C0∞(Ω).

Definition (Sobolev Space) For 1 ≤ p < ∞ and m ≥ 0 the Sobolev space is

W^{m,p}(Ω) = {f ∈ L^p(Ω) : D^s f ∈ L^p(Ω), |s| ≤ m}

with norm

|f|_{W^{m,p}(Ω)} = (∫_Ω Σ_{|s|≤m} |D^s f|^p dx)^{1/p}.

In particular, |D^s f(φ)| ≤ c|φ|_{L^q} with 1/p + 1/q = 1.

Remark (1) X = W^{m,p}(Ω) is complete. In fact, if {fn} is Cauchy in X, then {D^s fn} is Cauchy in L^p(Ω) for all |s| ≤ m. Since L^p(Ω) is complete, D^s fn → g^s in L^p(Ω). But since

∫_Ω f D^s φ dx = lim_{n→∞} ∫_Ω fn D^s φ dx = lim_{n→∞} (−1)^{|s|} ∫_Ω D^s fn φ dx = (−1)^{|s|} ∫_Ω g^s φ dx,

we have D^s f = g^s for all |s| ≤ m and |fn − f|X → 0 as n → ∞.

(2) H^{m,p}(Ω) ⊂ W^{m,p}(Ω). Let H^{m,p}(Ω) be the completion of C^m(Ω) with respect to the W^{m,p}(Ω) norm. That is, for f ∈ H^{m,p}(Ω) there exists a sequence fn ∈ C^m(Ω) such that fn → f and D^s fn → g^s strongly in L^p(Ω), and thus

∫_Ω fn D^s φ dx = (−1)^{|s|} ∫_Ω D^s fn φ dx → (−1)^{|s|} ∫_Ω g^s φ dx,

which implies g^s = D^s f and f ∈ W^{m,p}(Ω).

(3) If Ω has a Lipschitz continuous boundary, then W^{m,p}(Ω) = H^{m,p}(Ω).

2.6 Compact Operator

A compact operator is a linear operator A from a Banach space X to another Banach space Y such that the image under A of any bounded subset of X is a relatively compact subset of Y, i.e., its closure is compact. The integral operator is a concrete example of a compact operator; a Fredholm integral equation gives rise to a compact operator A on function spaces.
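As a numerical illustration (mine, not from the text) of why smooth-kernel integral operators are compact: discretizing (Au)(x) = ∫ k(x, y)u(y) dy produces a matrix whose singular values decay rapidly, so A is well approximated by finite-rank operators — the hallmark of compactness. The kernel below is an assumed example.

```python
import numpy as np

# Sketch: discretize the Hilbert-Schmidt kernel k(x,y) = exp(-(x-y)^2) on
# [0,1]^2; the fast decay of the singular values is the finite-dimensional
# signature of compactness (A is a norm limit of finite-rank operators).
n = 200
x = np.linspace(0.0, 1.0, n)
K = np.exp(-np.subtract.outer(x, x) ** 2) / n   # 1/n = quadrature weight
s = np.linalg.svd(K, compute_uv=False)
print((s[:8] / s[0]).round(8))                  # drops by orders of magnitude
```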
A compact set in a Banach space is bounded. The converse does not hold in general: the closed unit ball is weakly star compact but not strongly compact unless X is finite dimensional.

Let C(S) be the space of continuous functions on a compact metric space S with norm |x| = max_{s∈S} |x(s)|. The Arzelà–Ascoli theorem gives a necessary and sufficient condition for a family {xα} to be (strongly) relatively compact.

Arzelà–Ascoli Theorem A family {xα} in C(S) is relatively compact if and only if the following conditions hold:

sup_α |xα| < ∞ (equi-bounded),

lim_{δ→0+} sup_{α, |s′−s″|≤δ} |xα(s′) − xα(s″)| = 0 (equi-continuous).

Proof: By the Bolzano–Weierstrass theorem, for fixed s ∈ S the set {xα(s)} contains a convergent subsequence. Since S is compact, there is a countable dense subset {sn} of S such that for every ε > 0 there exists n satisfying

sup_{s∈S} inf_{1≤j≤n} dist(s, sj) ≤ ε.

Denote by {x_{n1}(s1)} a convergent subsequence of {xn(s1)}. Similarly, the sequence {x_{n1}(s2)} contains a convergent subsequence {x_{n2}(s2)}. Continuing in this manner, we extract subsequences {x_{nk}(sk)}. The diagonal sequence {x_{nn}(s)} converges for s = s1, s2, · · ·, simultaneously for the dense subset {sk}. We show that {x_{nn}} is a Cauchy sequence in C(S). By equi-continuity, for every ε > 0 there exists δ > 0 such that |xn(s′) − xn(s″)| ≤ ε for all s′, s″ satisfying dist(s′, s″) ≤ δ. Thus, for s ∈ S there exists a j ≤ k such that

|x_{nn}(s) − x_{mm}(s)| ≤ |x_{nn}(s) − x_{nn}(sj)| + |x_{nn}(sj) − x_{mm}(sj)| + |x_{mm}(sj) − x_{mm}(s)| ≤ 2ε + |x_{nn}(sj) − x_{mm}(sj)|,

which implies lim_{n,m→∞} max_{s∈S} |x_{nn}(s) − x_{mm}(s)| ≤ 2ε for arbitrary ε > 0. □

For the space L^p(Ω) we have:

Fréchet–Kolmogorov Theorem Let Ω be a subset of R^d. A family {xα} of L^p(Ω) functions (extended by zero outside Ω) is relatively compact if and only if the following conditions hold:

sup_α ∫_Ω |xα|^p ds < ∞ (equi-bounded),

lim_{|t|→0} sup_α ∫_Ω |xα(t + s) − xα(s)|^p ds = 0 (equi-continuous).

Example (Integral Operator) Let k(x, y) be a kernel function satisfying

∫_Ω ∫_Ω |k(x, y)|² dx dy < ∞.

Define the integral operator A by

(Au)(x) = ∫_Ω k(x, y)u(y) dy for u ∈ X = L²(Ω).

Then A ∈ L(X) is compact. This class of operators is called the Hilbert–Schmidt operators.

Rellich–Kondrachov Theorem Every uniformly bounded sequence in W^{1,p}(Ω) has a subsequence that converges in L^q(Ω) provided that q < p∗ = dp/(d − p).

Fredholm Alternative Theorem Let A ∈ L(X) be compact. For any nonzero λ ∈ C, either
1) the equation Ax − λx = 0 has a nonzero solution x, or
2) the equation Ax − λx = f has a unique solution x for every f ∈ X.
In the second case, (λI − A)^{−1} is bounded.

Exercise
Problem 1 T : D(T) ⊂ X → Y is closable if and only if the closure of its graph G(T) in X × Y is the graph of a linear operator S.
Problem 2 For a closed linear operator T, the domain D(T) is a Banach space if it is equipped with the graph norm |x|_{D(T)} = (|x|²X + |Tx|²Y)^{1/2}.

3 Constrained Optimization

In this section we develop the Lagrange multiplier theory for constrained minimization in Banach spaces.

3.1 Hilbert space theory

In this section we develop the Hilbert space theory for constrained minimization.

Theorem 1 (Riesz Representation) Let X be a Hilbert space. Given F ∈ X∗, consider the minimization of

J(x) = (1/2)(x, x)X − F(x) over x ∈ X.

There exists a unique solution x∗ ∈ X, and x∗ satisfies

(v, x∗) = F(v) for all v ∈ X. (3.1)

The solution x∗ ∈ X is the Riesz representation of F ∈ X∗ and satisfies |x∗|X = |F|X∗.

Proof: Suppose x∗ is a minimizer.
Then

(1/2)(x∗ + tv, x∗ + tv) − F(x∗ + tv) ≥ (1/2)(x∗, x∗) − F(x∗)

for all t ∈ R and v ∈ X. Thus, for t > 0,

(v, x∗) − F(v) + (t/2)|v|² ≥ 0.

Letting t → 0+, we have (v, x∗) − F(v) ≥ 0 for all v ∈ X. Since v ∈ X implies −v ∈ X, this yields (3.1).

(Uniqueness) Suppose x∗1, x∗2 are minimizers. Then

(v, x∗1) − (v, x∗2) = F(v) − F(v) = 0 for all v ∈ X.

Letting v = x∗1 − x∗2, we have |x∗1 − x∗2|² = (x∗1 − x∗2, x∗1 − x∗2) = 0, i.e., x∗1 = x∗2.

(Existence) Suppose {xn} is a minimizing sequence, i.e., J(xn) is decreasing and lim_{n→∞} J(xn) = δ = inf_{x∈X} J(x). By the parallelogram law,

|xn − xm|² + |xn + xm|² = 2(|xn|² + |xm|²).

Thus, since the F-terms cancel by the linearity of F,

(1/4)|xn − xm|² = J(xn) + J(xm) − 2J((xn + xm)/2) ≤ J(xn) + J(xm) − 2δ.

This implies {xn} is a Cauchy sequence in X. Since X is complete, there exists x∗ = lim_{n→∞} xn in X and J(x∗) = δ, i.e., x∗ is the minimizer. □

Example (Mechanical System)

min ∫_0^1 ((1/2)|du/dx|² − f(x)u(x)) dx over u ∈ H¹0(0, 1). (3.2)

Here

X = H¹0(0, 1) = {u ∈ H¹(0, 1) : u(0) = u(1) = 0}

is a Hilbert space with the inner product

(u, v)_{H¹0} = ∫_0^1 (du/dx)(dv/dx) dx.

H¹(0, 1) is the space of absolutely continuous functions on [0, 1] with square integrable derivative, i.e.,

u(x) = u(0) + ∫_0^x g(s) ds, g = du/dx a.e., g ∈ L²(0, 1).

The solution yf = x∗ to (3.2) satisfies

∫_0^1 (dv/dx)(dyf/dx) dx = ∫_0^1 f(x)v(x) dx.

Integrating by parts,

∫_0^1 f(x)v(x) dx = ∫_0^1 (∫_x^1 f(s) ds)(dv/dx) dx,

and thus

∫_0^1 (dyf/dx − ∫_x^1 f(s) ds)(dv/dx) dx = 0

for all v ∈ H¹0(0, 1). This implies

dyf/dx − ∫_x^1 f(s) ds = c = a constant.

Integrating,

yf(x) = ∫_0^x ∫_η^1 f(s) ds dη + cx,

and since yf(1) = 0,

c = −∫_0^1 ∫_η^1 f(s) ds dη.

Moreover, dyf/dx ∈ H¹(0, 1) and

−d²yf/dx² = f(x) a.e., with yf(0) = yf(1) = 0.

Now consider the case F(v) = v(1/2). Then

|F(v)| = |∫_0^{1/2} (dv/dx) dx| ≤ (∫_0^{1/2} |dv/dx|² dx)^{1/2} (∫_0^{1/2} 1 dx)^{1/2} ≤ (1/√2)|v|X,

that is, F ∈ X∗. Thus we have

∫_0^1 (dyf/dx − χ_{(0,1/2)}(x))(dv/dx) dx = 0

for all v ∈ H¹0(0, 1), where χ_S is the characteristic function of a set S. This is equivalent to

dyf/dx − χ_{(0,1/2)}(x) = c = a constant.

Integrating,

yf(x) = (1 + c)x on [0, 1/2], yf(x) = cx + 1/2 on [1/2, 1].

Since yf(1) = 0, we have c = −1/2. Moreover, yf satisfies

−d²yf/dx² = 0 for x ≠ 1/2,
yf((1/2)⁻) = yf((1/2)⁺), yf(0) = yf(1) = 0,
(dyf/dx)((1/2)⁻) − (dyf/dx)((1/2)⁺) = 1.

Or, equivalently,

−d²yf/dx² = δ_{1/2}(x),

where δ_{x0} is the Dirac delta distribution at x = x0.

Theorem 2 (Orthogonal Decomposition X = M̄ ⊕ M⊥) Let X be a Hilbert space and M a linear subspace of X. Let M̄ be the closure of M and

M⊥ = {z ∈ X : (z, y) = 0 for all y ∈ M}

the orthogonal complement of M. Then every element u ∈ X has the unique representation

u = u1 + u2, u1 ∈ M̄ and u2 ∈ M⊥.

Proof: Consider the minimum distance problem

min |x − u|² over x ∈ M̄. (3.3)

As in the proof of Theorem 1, one can show that there exists a unique solution x∗ of (3.3), and x∗ ∈ M̄ satisfies

(x∗ − u, y) = 0 for all y ∈ M̄,

i.e., u − x∗ ∈ M⊥. If we let u1 = x∗ ∈ M̄ and u2 = u − x∗ ∈ M⊥, then u = u1 + u2 is the desired decomposition. □

Example (Closed Subspace) (1) M = span{yk, k ∈ K} = {x ∈ X : x = Σ_{k∈K} αk yk} is a subspace of X. If K is finite, M is closed. (2) Let X = L²(−1, 1) and M = {odd functions} = {f ∈ X : f(−x) = −f(x) a.e.}; then M is closed. Let M1 = span{sin(kπt)} and M2 = span{cos(kπt)}; then L²(−1, 1) = M̄1 ⊕ M̄2. (3) Let E ∈ L(X, Y).
Then M = N(E) = {x ∈ X : Ex = 0} is closed. In fact, if xn ∈ M and xn → x in X, then 0 = lim_{n→∞} Exn = E lim_{n→∞} xn = Ex, and thus x ∈ N(E) = M.

Example (Hodge Decomposition) Let X = L²(Ω)³, where Ω is a bounded open set in R³ with Lipschitz boundary Γ. Let M = {∇u : u ∈ H¹(Ω)} be the gradient field. Then

M⊥ = {ψ ∈ X : ∇ · ψ = 0, n · ψ = 0 at Γ}

is the divergence free field. Thus we have the Hodge decomposition

X = M ⊕ M⊥ = {∇u : u ∈ H¹(Ω)} ⊕ {the divergence free field}.

In fact, for ψ ∈ M⊥,

∫_Ω ∇φ · ψ dx = ∫_Γ (n · ψ)φ ds − ∫_Ω (∇ · ψ)φ dx = 0,

and ∇u ∈ M, u ∈ H¹(Ω)/R, is uniquely determined by

∫_Ω ∇u · ∇φ dx = ∫_Ω f · ∇φ dx for all φ ∈ H¹(Ω).

Theorem 3 (Decomposition) Let X, Y be Hilbert spaces and E ∈ L(X, Y). Let E∗ ∈ L(Y, X) be the adjoint of E, i.e., (Ex, y)_Y = (x, E∗y)_X for all x ∈ X and y ∈ Y. Then N(E) = R(E∗)⊥, and thus from the orthogonal decomposition theorem

X = N(E) ⊕ closure of R(E∗).

Proof: Suppose x ∈ N(E). Then (x, E∗y)_X = (Ex, y)_Y = 0 for all y ∈ Y, and thus N(E) ⊂ R(E∗)⊥. Conversely, suppose x∗ ∈ R(E∗)⊥, i.e., (x∗, E∗y) = (Ex∗, y) = 0 for all y ∈ Y. Then Ex∗ = 0, and R(E∗)⊥ ⊂ N(E). □

Remark Suppose R(E) is closed in Y. Then R(E∗) is closed in X and

Y = R(E) ⊕ N(E∗), X = R(E∗) ⊕ N(E).

Thus we have

b = b1 + b2, b1 ∈ R(E), b2 ∈ N(E∗) for all b ∈ Y,

and |Ex − b|² = |Ex − b1|² + |b2|².

Example (R(E) is not closed) Consider X = L²(0, 1) and define a linear operator E ∈ L(X) by

(Ef)(t) = ∫_0^t f(s) ds, t ∈ (0, 1).

Since R(E) ⊂ {absolutely continuous functions on [0, 1]}, the closure of R(E) is L²(0, 1) and R(E) is not a closed subspace of L²(0, 1).

3.2 Minimum norm problem

We consider

min |x|² subject to (x, yi)X = ci, 1 ≤ i ≤ m. (3.4)

Theorem 4 (Minimum Norm) Suppose {yi} are linearly independent. Then there exists a unique solution x∗ to (3.4), and x∗ = Σ_{i=1}^m βi yi, where β = {βi}_{i=1}^m satisfies

Gβ = c, G_{i,j} = (yi, yj).

Proof: Let M = span{yi}_{i=1}^m. Since X = M ⊕ M⊥, it is necessary that x∗ ∈ M, i.e.,

x∗ = Σ_{i=1}^m βi yi.

From the constraint (x∗, yi) = ci, 1 ≤ i ≤ m, the condition Gβ = c follows. In fact, x ∈ X satisfies (x, yi) = ci, 1 ≤ i ≤ m, if and only if x = x∗ + z with z ∈ M⊥, and |x|² = |x∗|² + |z|². □

Remark {yi} is linearly independent if and only if the symmetric matrix G ∈ R^{m×m} is positive definite, since

(x, Gx)_{R^m} = |Σ_{k=1}^m xk yk|²X for all x ∈ R^m.

Theorem 5 (Minimum Norm) Suppose E ∈ L(X, Y) and consider the minimum norm problem

min |x|² subject to Ex = c.

If R(E) = Y, then the minimum norm solution x∗ exists and

x∗ = E∗y, (EE∗)y = c.

Proof: Since R(E∗) is closed by the closed range theorem, it follows from the decomposition theorem that X = N(E) ⊕ R(E∗). Thus x∗ = E∗y provided there exists y ∈ Y such that (EE∗)y = c. Since Y = N(E∗) ⊕ R(E) and R(E) = Y, N(E∗) = {0}. Moreover, if EE∗x = 0, then (x, EE∗x) = |E∗x|² = 0 and E∗x = 0; thus N(EE∗) = N(E∗) = {0}. Since R(EE∗) = R(E), it follows from the open mapping theorem that EE∗ is boundedly invertible, and there exists a unique solution to (EE∗)y = c. □

Remark If R(E) is closed and c ∈ R(E), we may let Y = R(E) and Theorem 5 remains valid. For example, define E ∈ L(X, R^m) by

Ex = ((y1, x)X, · · · , (ym, x)X) ∈ R^m.

If {yi}_{i=1}^m are dependent, then R(E) is a proper closed subspace of R^m.

Example (DC-Motor Control) Let (θ(t), ω(t)) satisfy

dω/dt + ω(t) = u(t), ω(0) = 0,
dθ/dt = ω(t), θ(0) = 0. (3.5)

This models a control problem for the DC motor: the control function u(t) is the current applied to the motor, θ is the angle and ω is the angular velocity.
The objective is to find a control u on [0, 1] with minimum norm

∫_0^1 |u(t)|² dt

such that the desired state (θ̄, 0) is reached at time t = 1. Let X = L²(0, 1). Given u ∈ X, we have the unique solution to (3.5) with

ω(1) = ∫_0^1 e^{t−1} u(t) dt,
θ(1) = ∫_0^1 (1 − e^{t−1}) u(t) dt.

Thus we can formulate this control problem as problem (3.4) by letting X = L²(0, 1) with the standard inner product and

y1(t) = 1 − e^{t−1}, c1 = θ̄,
y2(t) = e^{t−1}, c2 = 0.

Example (Linear Control System) Consider the linear control system

(d/dt)x(t) = Ax(t) + Bu(t), x(0) = x0 ∈ R^n, (3.6)

where x(t) ∈ R^n and u(t) ∈ R^m are the state and the control function, respectively, and the system matrices A ∈ R^{n×n} and B ∈ R^{n×m} are given. Consider the problem of finding a control that transfers x(0) = 0 to a desired state x̄ ∈ R^n at time T > 0, i.e., x(T) = x̄, with minimum norm. The problem can be formulated as (3.4). First, we have the variation of constants formula

x(t) = e^{At}x0 + ∫_0^t e^{A(t−s)}Bu(s) ds,

where e^{At} is the matrix exponential and satisfies (d/dt)e^{At} = Ae^{At} = e^{At}A. Thus the condition x(T) = x̄ (with x0 = 0) is written as

∫_0^T e^{A(T−t)}Bu(t) dt = x̄.

Let X = L²(0, T; R^m). It then follows from Theorem 4 that the minimum norm solution is given by

u∗(t) = Bᵗ e^{Aᵗ(T−t)} β, Gβ = x̄,

provided that the controllability Gramian

G = ∫_0^T e^{A(T−t)}BBᵗe^{Aᵗ(T−t)} dt ∈ R^{n×n}

is positive on R^n. If G is positive, then the control system (3.6) is controllable; positivity is equivalent to

Bᵗe^{Aᵗt}x = 0 for all t ∈ [0, T] implies x = 0,

since

(x, Gx) = ∫_0^T |Bᵗe^{Aᵗ(T−t)}x|² dt.

Moreover, G is positive if and only if the Kalman rank condition holds, i.e.,

rank [B, AB, · · · , A^{n−1}B] = n.

In fact, (x, Gx) = 0 implies Bᵗe^{Aᵗt}x = 0 for t ≥ 0, and thus Bᵗ(Aᵗ)^k x = 0 for all k ≥ 0; the claim then follows from the Cayley–Hamilton theorem.

Example (Function Interpolation) (1) Consider the minimum norm problem on X = H¹0(0, 1):

min ∫_0^1 |du/dx|² dx over H¹0(0, 1) subject to u(xi) = ci, 1 ≤ i ≤ m, (3.7)

where 0 < x1 < · · · < xm < 1 are given nodes. From the mechanical system example we have, for each i, a piecewise linear function yi ∈ X such that (u, yi)X = u(xi) for all u ∈ X. From Theorem 4 the solution to (3.7) is given by

u∗(x) = Σ_{i=1}^m βi yi(x).

That is, u∗ is the piecewise linear function with nodes {xi}, 1 ≤ i ≤ m, and nodal values ci.

(2) Consider the spline interpolation problem (1.2):

min ∫_0^1 (1/2)|d²u/dx²|² dx over all functions u(x) subject to u(xi) = bi, u′(xi) = ci, 1 ≤ i ≤ m. (3.8)

Let X = H²0(0, 1) be the completion of the pre-Hilbert space C²0[0, 1] — the space of twice continuously differentiable functions with u(0) = u′(0) = 0 and u(1) = u′(1) = 0 — with respect to the H²0(0, 1) inner product

(f, g)_{H²0} = ∫_0^1 (d²f/dx²)(d²g/dx²) dx,

i.e.,

H²0(0, 1) = {u ∈ L²(0, 1) : du/dx, d²u/dx² ∈ L²(0, 1) with u(0) = u′(0) = u(1) = u′(1) = 0}.

Problem (3.8) is a minimum norm problem on X = H²0(0, 1). For Fi ∈ X∗ defined by Fi(v) = v(xi) we have

(yi, v)X − Fi(v) = ∫_0^1 (d²yi/dx² + (x − xi)χ_{[0,xi]})(d²v/dx²) dx, (3.9)

and for F̃i ∈ X∗ defined by F̃i(v) = v′(xi) we have

(ỹi, v)X − F̃i(v) = ∫_0^1 (d²ỹi/dx² − χ_{[0,xi]})(d²v/dx²) dx. (3.10)

Thus

d²yi/dx² + (x − xi)χ_{[0,xi]} = a + bx, d²ỹi/dx² − χ_{[0,xi]} = ã + b̃x,

where a, ã, b, b̃ are constants, and yi = RFi and ỹi = RF̃i are piecewise cubic functions on [0, 1]. One can determine yi, ỹi using the conditions y(0) = y′(0) = y(1) = y′(1) = 0.
From Theorem 4 the solution to (3.8) is given by

u∗(x) = Σ_{i=1}^m (βi yi(x) + β̃i ỹi(x)).

That is, u∗ is a piecewise cubic function with nodes {xi}, 1 ≤ i ≤ m, nodal values bi and nodal derivatives ci at xi.

Example (Differential Form, Natural Boundary Conditions) (1) Let X be the Hilbert space

X = H¹(0, 1) = {u ∈ L²(0, 1) : du/dx ∈ L²(0, 1)}.

Given a, c ∈ C[0, 1] with a > 0, c ≥ 0 and α ≥ 0, consider the minimization of

∫_0^1 ((1/2)(a(x)|du/dx|² + c(x)|u|²) − uf) dx + (α/2)|u(1)|² − g u(1)

over u ∈ X = H¹(0, 1). If u∗ is a minimizer, it follows from Theorem 1 that u∗ ∈ X satisfies

∫_0^1 (a(x)(du∗/dx)(dv/dx) + c(x)u∗v − fv) dx + α u∗(1)v(1) − gv(1) = 0 (3.11)

for all v ∈ X. By integration by parts,

∫_0^1 (−(d/dx)(a(x) du∗/dx) + c(x)u∗(x) − f(x)) v(x) dx − a(0)(du∗/dx)(0)v(0) + (a(1)(du∗/dx)(1) + α u∗(1) − g)v(1) = 0,

assuming a(x) du∗/dx ∈ H¹(0, 1). Since v ∈ H¹(0, 1) is arbitrary,

−(d/dx)(a(x) du∗/dx) + c(x)u∗ = f(x), x ∈ (0, 1),
a(0)(du∗/dx)(0) = 0, a(1)(du∗/dx)(1) + α u∗(1) = g. (3.12)

(3.11) is the weak form and (3.12) is the strong form of the optimality condition.

(2) Let X be the Hilbert space

X = H²(0, 1) = {u ∈ L²(0, 1) : du/dx, d²u/dx² ∈ L²(0, 1)}.

Given a, b, c ∈ C[0, 1] with a > 0, b, c ≥ 0 and α, β ≥ 0, consider the minimization of

∫_0^1 ((1/2)(a(x)|d²u/dx²|² + b(x)|du/dx|² + c(x)|u|²) − uf) dx + (α/2)|u(1)|² + (β/2)|u′(1)|² − g1 u(1) − g2 u′(1)

over u ∈ X = H²(0, 1) satisfying u(0) = 1. If u∗ is a minimizer, it follows from Theorem 1 that u∗ ∈ X satisfies

∫_0^1 (a(x)(d²u∗/dx²)(d²v/dx²) + b(x)(du∗/dx)(dv/dx) + c(x)u∗v − fv) dx + α u∗(1)v(1) + β (u∗)′(1)v′(1) − g1 v(1) − g2 v′(1) = 0 (3.13)

for all v ∈ X satisfying v(0) = 0. By integration by parts,

∫_0^1 ((d²/dx²)(a(x) d²u∗/dx²) − (d/dx)(b(x) du∗/dx) + c(x)u∗(x) − f(x)) v(x) dx
+ (a(1)(d²u∗/dx²)(1) + β (du∗/dx)(1) − g2) v′(1) − a(0)(d²u∗/dx²)(0) v′(0)
+ ((−(d/dx)(a(x) d²u∗/dx²) + b(x) du∗/dx)(1) + α u∗(1) − g1) v(1) = 0,

assuming u∗ ∈ H⁴(0, 1) (the term multiplying v(0) drops since v(0) = 0). Since v ∈ H²(0, 1) with v(0) = 0 is arbitrary,

(d²/dx²)(a(x) d²u∗/dx²) − (d/dx)(b(x) du∗/dx) + c(x)u∗ = f(x), x ∈ (0, 1),
u∗(0) = 1, a(0)(d²u∗/dx²)(0) = 0,
a(1)(d²u∗/dx²)(1) + β (du∗/dx)(1) − g2 = 0,
(−(d/dx)(a(x) d²u∗/dx²) + b(x) du∗/dx)(1) + α u∗(1) − g1 = 0. (3.14)

(3.13) is the weak form and (3.14) is the strong (differential) form of the optimality condition.

Next consider the multi-dimensional case. Let Ω be a subdomain of R^d with Lipschitz boundary, let Γ0 be a closed subset of the boundary ∂Ω with Γ1 = ∂Ω \ Γ0, and let Γ be a smooth hypersurface in Ω. Define

H¹_{Γ0}(Ω) = {u ∈ H¹(Ω) : u(s) = 0, s ∈ Γ0}

with norm |∇u|_{L²(Ω)}. Consider the weak form of the equation for u ∈ H¹_{Γ0}(Ω):

∫_Ω (σ(x)∇u · ∇φ + u(x)(b⃗(x) · ∇φ) + c(x)u(x)φ(x)) dx + ∫_{Γ1} α(s)u(s)φ(s) ds − ∫_Γ g0(s)φ(s) ds − ∫_Ω f(x)φ(x) dx − ∫_{Γ1} g(s)φ(s) ds = 0

for all φ ∈ H¹_{Γ0}(Ω). Then the differential form for u ∈ H¹_{Γ0}(Ω) is given by

−∇ · (σ(x)∇u + b⃗u) + c(x)u(x) = f in Ω,
n · (σ(x)∇u + b⃗u) + α u = g at Γ1,
[n · (σ(x)∇u + b⃗u)] = g0 at Γ,

where [·] denotes the jump across Γ. By the divergence theorem, if u satisfies the strong form, then u is a weak solution. Conversely, taking φ ∈ C0∞(Ω \ Γ) we obtain the first equation, i.e., ∇ · (σ∇u + b⃗u) ∈ L²(Ω). Thus n · (σ∇u + b⃗u) ∈ L²(Γ) and the third equation holds. Also, n · (σ∇u + b⃗u) ∈ L²(Γ1) and the second equation holds.
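The emergence of the natural boundary conditions in (3.12) can be observed numerically. Below is a piecewise-linear finite element sketch of my own (with the assumed data a = c = 1, f = 1 and hypothetical values α = 2, g = 1): only the weak form (3.11) is assembled, with no boundary condition imposed, yet the Robin condition a(1)u′(1) + αu(1) = g appears in the discrete solution.

```python
import numpy as np

# Sketch: linear finite elements for ∫(u'v' + uv - v)dx + α u(1)v(1) - g v(1) = 0
# on a uniform mesh of (0,1).  Natural (Neumann/Robin) conditions are NOT
# imposed; they emerge from the variational formulation.
n, alpha, g = 200, 2.0, 1.0
h = 1.0 / n
main = np.full(n + 1, 2.0 / h + 4.0 * h / 6.0)   # stiffness + mass diagonal
main[[0, -1]] = 1.0 / h + 2.0 * h / 6.0          # boundary nodes
off = np.full(n, -1.0 / h + h / 6.0)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
A[-1, -1] += alpha                               # boundary term α u(1)v(1)
b = np.full(n + 1, h); b[[0, -1]] = h / 2.0      # load from f = 1
b[-1] += g                                       # boundary load g v(1)
u = np.linalg.solve(A, b)
print((u[1] - u[0]) / h)                         # ≈ 0 (up to O(h)): a(0)u'(0) = 0
print((u[-1] - u[-2]) / h + alpha * u[-1] - g)   # ≈ 0: a(1)u'(1) + αu(1) = g
```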
3.3 Lax-Milgram Theory and Applications

Let H be a Hilbert space with scalar product (φ, ψ) and let X be a Hilbert space with X ⊂ H with continuous dense injection. Let X∗ denote the strong dual space of X. H is identified with its dual, so that

X ⊂ H = H∗ ⊂ X∗

(i.e., H is the pivot space). The dual product ⟨φ, ψ⟩ on X∗ × X is the continuous extension of the scalar product of H restricted to H × X. This framework is called the Gelfand triple.

Let σ be a bounded coercive bilinear form on X × X. Given x ∈ X, y → σ(x, y) defines a bounded linear functional on X, say x∗ ∈ X∗. We define a linear operator A from X into X∗ by x∗ = Ax. The equation σ(x, y) = F(y) for all y ∈ X is then equivalently written as the equation Ax = F ∈ X∗. Here

⟨Ax, y⟩_{X∗×X} = σ(x, y), x, y ∈ X,

and A is a bounded linear operator; in fact,

|Ax|X∗ ≤ sup_{|y|≤1} |σ(x, y)| ≤ M|x|.

Let R be the Riesz operator X∗ → X, i.e., |Rx∗|X = |x∗| and (Rx∗, x)X = ⟨x∗, x⟩ for all x ∈ X. Then Â = RA represents a linear operator Â ∈ L(X, X). Moreover, we define a linear operator Ã on H by

Ãx = Ax ∈ H with dom(Ã) = {x ∈ X : |σ(x, y)| ≤ cx|y|H for all y ∈ X}.

That is, Ã is the restriction of A to dom(Ã). We will use the symbol A for all three linear operators in what follows; its meaning should be understood from the context.

Lax-Milgram Theorem Let X be a Hilbert space and let σ be a (complex-valued) sesquilinear form on X × X satisfying

σ(α x1 + β x2, y) = α σ(x1, y) + β σ(x2, y),
σ(x, α y1 + β y2) = ᾱ σ(x, y1) + β̄ σ(x, y2),

|σ(x, y)| ≤ M|x||y| for all x, y ∈ X (bounded),

and

Re σ(x, x) ≥ δ|x|² for all x ∈ X, for some δ > 0 (coercive).

Then for each f ∈ X∗ there exists a unique solution x ∈ X to

σ(x, y) = ⟨f, y⟩_{X∗×X} for all y ∈ X,

and |x|X ≤ δ^{−1}|f|X∗.

Proof: Define the linear operator S from X∗ into X by

Sf = x, f ∈ X∗,

where x ∈ X satisfies σ(x, y) = ⟨f, y⟩ for all y ∈ X. The operator S is well defined, since if x1, x2 ∈ X both satisfy the equation, then σ(x1 − x2, y) = 0 for all y ∈ X and thus

δ|x1 − x2|²X ≤ Re σ(x1 − x2, x1 − x2) = 0.

Next we show that dom(S) is closed in X∗. Suppose fn ∈ dom(S), i.e., there exists xn ∈ X satisfying σ(xn, y) = ⟨fn, y⟩ for all y ∈ X, and fn → f in X∗ as n → ∞. Then

σ(xn − xm, y) = ⟨fn − fm, y⟩ for all y ∈ X.

Setting y = xn − xm, we obtain

δ|xn − xm|²X ≤ Re σ(xn − xm, xn − xm) ≤ |fn − fm|X∗|xn − xm|X.

Thus {xn} is a Cauchy sequence in X, and so xn → x for some x ∈ X as n → ∞. Since σ and the dual product are continuous, x = Sf.

Now we prove that dom(S) = X∗. Suppose dom(S) ≠ X∗. Since dom(S) is closed, there exists a nontrivial x0 ∈ X such that ⟨f, x0⟩ = 0 for all f ∈ dom(S). Consider the linear functional F(y) = σ(x0, y), y ∈ X. Since σ is bounded, F ∈ X∗ and x0 = SF, so F ∈ dom(S) and F(x0) = 0. But then σ(x0, x0) = ⟨F, x0⟩ = 0, and by the coercivity of σ, x0 = 0, which is a contradiction. Hence dom(S) = X∗. □

Assume that σ is coercive. By the Lax-Milgram theorem A has a bounded inverse S = A^{−1}. Thus

dom(Ã) = A^{−1}H.

Moreover, Ã is closed. In fact, if

xn ∈ dom(Ã) → x and fn = Ãxn → f in H,

then since xn = Sfn and S is bounded, x = Sf; thus x ∈ dom(Ã) and Ãx = f. If σ is symmetric, σ(x, y) = (x, y)X defines an inner product on X, and SF coincides with the Riesz representation of F ∈ X∗. Moreover,

⟨Ax, y⟩ = ⟨Ay, x⟩ for all x, y ∈ X,

and thus Ã is a self-adjoint operator in H.
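A finite-dimensional sketch (my own, with randomly generated data) of the Lax-Milgram mechanism: coercivity alone, without symmetry, already gives unique solvability and the a priori bound |x| ≤ δ⁻¹|f|.

```python
import numpy as np

# Sketch: a bounded coercive, non-symmetric form σ(x,y) = y^T B x with
# Re σ(x,x) >= δ|x|^2 yields unique solvability and |x| <= |f|/δ.
rng = np.random.default_rng(1)
n, delta = 50, 0.5
S = rng.standard_normal((n, n)); S = S - S.T           # skew part: x^T S x = 0
B = delta * np.eye(n) + S                              # so x^T B x = δ|x|^2
f = rng.standard_normal(n)
x = np.linalg.solve(B, f)                              # unique solution of σ(x,·) = f
print(np.linalg.norm(x) <= np.linalg.norm(f) / delta)  # True: Lax-Milgram bound
```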
Example (Laplace Operator) Consider X = H¹0(Ω), H = L²(Ω) and

σ(u, φ) = (u, φ)X = ∫_Ω ∇u · ∇φ dx.

Then

Au = −∆u (= −(∂²u/∂x1² + ∂²u/∂x2²) in two dimensions)

and

dom(Ã) = H²(Ω) ∩ H¹0(Ω)

for Ω with smooth boundary or for convex Ω. For Ω = (0, 1) and f ∈ L²(0, 1),

∫_0^1 (dy/dx)(du/dx) dx = ∫_0^1 f(x)y(x) dx

is equivalent to

∫_0^1 (dy/dx)(du/dx − ∫_x^1 f(s) ds) dx = 0

for all y ∈ H¹0(0, 1). Thus

du/dx − ∫_x^1 f(s) ds = c (a constant),

and therefore du/dx ∈ H¹(0, 1) and

Au = −d²u/dx² = f in L²(0, 1).

Example (Elliptic Operator) Consider the second order elliptic equation

Au = −∇ · (a(x)∇u) + b(x) · ∇u + c(x)u(x) = f(x), ∂u/∂ν = g at Γ1, u = 0 at Γ0,

where Γ0 and Γ1 are disjoint, Γ0 ∪ Γ1 = Γ, and ∂u/∂ν denotes the conormal derivative n · (a(x)∇u). Integrating against a test function φ, we have

∫_Ω Au φ dx = ∫_Ω (a(x)∇u · ∇φ + (b(x) · ∇u)φ + c(x)uφ) dx − ∫_{Γ1} gφ ds = ∫_Ω f(x)φ(x) dx

for all φ ∈ C¹(Ω̄) vanishing at Γ0. Let X = H¹_{Γ0}(Ω) be the completion of {φ ∈ C¹(Ω̄) : φ = 0 at Γ0} with inner product

(u, φ) = ∫_Ω ∇u · ∇φ dx,

i.e., H¹_{Γ0}(Ω) = {u ∈ H¹(Ω) : u|_{Γ0} = 0}. Define the bilinear form σ on X × X by

σ(u, φ) = ∫_Ω (a(x)∇u · ∇φ + (b(x) · ∇u)φ + c(x)uφ) dx.

Then, by Green's formula,

σ(u, u) = ∫_Ω (a(x)|∇u|² + b(x) · ∇((1/2)|u|²) + c(x)|u|²) dx = ∫_Ω (a(x)|∇u|² + (c(x) − (1/2)∇ · b)|u|²) dx + ∫_{Γ1} (1/2)(n · b)|u|² ds.

If we assume

0 < a ≤ a(x) ≤ ā, c(x) − (1/2)∇ · b ≥ 0, n · b ≥ 0 at Γ1,

then σ is bounded and coercive with δ = a.

The Banach space version of the Lax-Milgram theorem is as follows.

Banach-Necas-Babuska Theorem Let V and W be Banach spaces. Consider the linear equation for u ∈ W:

a(u, v) = f(v) for all v ∈ V (3.15)

for given f ∈ V∗, where a is a bounded bilinear form on W × V. The problem is well-posed if and only if the following conditions hold:

inf_{u∈W} sup_{v∈V} a(u, v)/(|u|W |v|V) ≥ δ > 0, (3.16)

a(u, v) = 0 for all u ∈ W implies v = 0.

Under these conditions the unique solution u ∈ W to (3.15) satisfies

|u|W ≤ (1/δ)|f|V∗.

Proof: Let A be the bounded linear operator from W to V∗ defined by

⟨Au, v⟩ = a(u, v) for all u ∈ W, v ∈ V.

The inf-sup condition is equivalent to

|Au|V∗ ≥ δ|u|W for all u ∈ W,

and thus the range R(A) is closed in V∗ and N(A) = {0}. Since V is reflexive and

⟨Au, v⟩_{V∗×V} = ⟨u, A∗v⟩_{W×W∗},

the second condition gives N(A∗) = {0}. It thus follows from the closed range and open mapping theorems that A^{−1} is bounded. □

Next we consider the generalized Stokes system. Let V and Q be Hilbert spaces. We consider the mixed variational problem for (u, p) ∈ V × Q of the form

a(u, v) + b(v, p) = f(v), b(u, q) = g(q) (3.17)

for all v ∈ V and q ∈ Q, where a and b are bounded bilinear forms on V × V and V × Q, respectively. If we define the linear operators A ∈ L(V, V∗) and B ∈ L(V, Q∗) by

⟨Au, v⟩ = a(u, v) and ⟨Bu, q⟩ = b(u, q),

then (3.17) is equivalent to the operator form

[A B∗; B 0] [u; p] = [f; g].

Assume the coercivity of a:

a(u, u) ≥ δ|u|²V, (3.18)

and the inf-sup condition on b:

inf_{q∈Q} sup_{u∈V} b(u, q)/(|u|V |q|Q) ≥ β > 0. (3.19)

The inf-sup condition implies that for every q∗ ∈ Q∗ there exists u ∈ V such that Bu = q∗ and |u|V ≤ (1/β)|q∗|Q∗; it is also equivalent to

|B∗p|V∗ ≥ β|p|Q for all p ∈ Q.

Theorem (Mixed Problem) Under conditions (3.18)-(3.19) there exists a unique solution (u, p) ∈ V × Q to (3.17), and

|u|V + |p|Q ≤ c(|f|V∗ + |g|Q∗).

Proof: For ε > 0 consider the penalized problem

a(u_ε, v) + b(v, p_ε) = f(v) for all v ∈ V,
−b(u_ε, q) + ε(p_ε, q)Q = −g(q) for all q ∈ Q. (3.20)

By the Lax-Milgram theorem, for every ε > 0 there exists a unique solution. From the first equation,

β|p_ε|Q ≤ |f − Au_ε|V∗ ≤ |f|V∗ + M|u_ε|V.
Taking v = u_ε and q = p_ε in the first and second equations and adding, we have

δ|u_ε|²V + ε|p_ε|²Q ≤ |f|V∗|u_ε|V + |p_ε|Q|g|Q∗ ≤ C(|f|V∗ + |g|Q∗)(1 + |u_ε|V),

and thus |u_ε|V, and by the estimate above |p_ε|Q as well, are bounded uniformly in ε > 0. Hence (u_ε, p_ε) has a weakly convergent subsequence with limit (u, p) in V × Q, and (u, p) satisfies (3.17). □

3.4 Quadratic Constrained Optimization

In this section we discuss constrained minimization. First we consider the case of an equality constraint. Let A be a coercive self-adjoint operator on X and E ∈ L(X, Y). Let σ : X × X → R be a symmetric, bounded, coercive bilinear form, i.e.,

σ(α x1 + β x2, y) = α σ(x1, y) + β σ(x2, y) for all x1, x2, y ∈ X and α, β ∈ R,
σ(x, y) = σ(y, x), |σ(x, y)| ≤ M|x|X|y|X, σ(x, x) ≥ δ|x|²

for some 0 < δ ≤ M < ∞. Thus σ(x, y) defines an inner product on X and, since σ is bounded and coercive, √σ(x, x) = √(Ax, x) is an equivalent norm on X. Since y → σ(x, y) is a bounded linear functional on X, say x∗, we define a linear operator from X to X∗ by x∗ = Ax. Then A ∈ L(X, X∗) with |A|_{X→X∗} ≤ M and

⟨Ax, y⟩_{X∗×X} = σ(x, y) for all x, y ∈ X.

Also, if R_{X∗→X} is the Riesz map of X∗ and we define Ã = R_{X∗→X}A ∈ L(X, X), then

(Ãx, y)X = σ(x, y) for all x, y ∈ X,

and thus Ã is a self-adjoint operator on X. Throughout this section we assume X = X∗ and Y = Y∗, i.e., we identify A with the self-adjoint operator Ã.

Consider the unconstrained minimization problem: for f ∈ X∗,

min F(x) = (1/2)σ(x, x) − f(x). (3.21)

Since for x, v ∈ X and t ∈ R

(F(x + tv) − F(x))/t = σ(x, v) − f(v) + (t/2)σ(v, v),

x ∈ X → F(x) ∈ R is differentiable, and we have the necessary optimality condition for (3.21):

σ(x, v) − f(v) = 0 for all v ∈ X, (3.22)

or, equivalently, Ax = f in X∗.

Consider the equality constrained minimization

min (1/2)(Ax, x)X − (a, x)X subject to Ex = b ∈ Y, (3.23)

or equivalently, for f ∈ X∗,

min (1/2)σ(x, x) − f(x) subject to Ex = b ∈ Y.

The necessary optimality condition for (3.23) is:

Theorem 6 (Equality Constraint) Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then (3.23) has a unique solution x∗ ∈ X and Ax∗ − a ⊥ N(E). If R(E∗) is closed, then there exists a Lagrange multiplier λ ∈ Y such that

Ax∗ + E∗λ = a, Ex∗ = b. (3.24)

If R(E) = Y (E is surjective), the Lagrange multiplier λ ∈ Y is unique.

Proof: Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then (3.23) is equivalent to

min (1/2)(A(z + x̄), z + x̄) − (a, z + x̄) over z ∈ N(E).

Arguing as in the proof of Theorem 1 on the Hilbert space N(E), this problem has a unique minimizer z∗ ∈ N(E), and

(A(z∗ + x̄) − a, v) = 0 for all v ∈ N(E),

so A(z∗ + x̄) − a ⊥ N(E). From the orthogonal decomposition theorem, Ax∗ − a lies in the closure of R(E∗) for x∗ = z∗ + x̄. Moreover, if R(E∗) is closed, then there exists λ ∈ Y satisfying Ax∗ + E∗λ = a. If E is surjective (R(E) = Y), then N(E∗) = {0}, and thus the multiplier λ is unique. □

Remark (Lagrange Formulation) Define the Lagrange functional

L(x, λ) = J(x) + (λ, Ex − b).

The optimality condition (3.24) is equivalent to

(∂/∂x)L(x∗, λ) = 0, (∂/∂λ)L(x∗, λ) = 0.

Here x ∈ X is the primal variable and the Lagrange multiplier λ ∈ Y is the dual variable.
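A small numerical instance (with randomly generated data of my choosing) of Theorem 6: the optimality system (3.24) is a saddle-point linear system, solved here directly.

```python
import numpy as np

# Sketch: minimize (1/2)x^T A x - a^T x subject to Ex = b via the KKT
# saddle-point system [[A, E^T], [E, 0]] (x, λ) = (a, b), cf. (3.24).
rng = np.random.default_rng(2)
n, m = 6, 2
M = rng.standard_normal((n, n)); A = M @ M.T + np.eye(n)  # coercive, self-adjoint
E = rng.standard_normal((m, n))                           # full row rank a.s.
a, b = rng.standard_normal(n), rng.standard_normal(m)
K = np.block([[A, E.T], [E, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([a, b]))
x, lam = sol[:n], sol[n:]
print(np.linalg.norm(E @ x - b))              # feasibility: ~1e-15
print(np.linalg.norm(A @ x + E.T @ lam - a))  # stationarity: ~1e-15
```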
Remark (1) If E is not surjective, one can seek a solution using the penalty formulation

min (1/2)(Ax, x)X − (a, x)X + (β/2)|Ex − b|²Y (3.25)

for β > 0. The necessary optimality condition is

Ax_β + βE∗(Ex_β − b) − a = 0.

If we let λ_β = β(Ex_β − b), then we have

[A E∗; E −(1/β)I] [x_β; λ_β] = [a; b];

the optimality system (3.24) corresponds formally to the case β = ∞. One can prove the monotonicity

J(x_β) ↑, |Ex_β − b|Y ↓ as β → ∞.

In fact, for β > β̂ we have

J(x_β) + (β/2)|Ex_β − b|²Y ≤ J(x_β̂) + (β/2)|Ex_β̂ − b|²Y,
J(x_β̂) + (β̂/2)|Ex_β̂ − b|²Y ≤ J(x_β) + (β̂/2)|Ex_β − b|²Y.

Adding these inequalities, we obtain

(β − β̂)(|Ex_β − b|²Y − |Ex_β̂ − b|²Y) ≤ 0,

which implies the claim. Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then

J(x_β) + (β/2)|Ex_β − b|² ≤ J(x∗),

and thus |x_β|X is bounded uniformly in β > 0 and

lim_{β→∞} |Ex_β − b| = 0.

Since X is a Hilbert space, there exists a weak limit x ∈ X with J(x) ≤ J(x∗) and Ex = b, since norms are weakly sequentially lower semi-continuous. Since the optimal solution x∗ to (3.23) is unique, x_β → x∗ weakly as β → ∞. Suppose λ_β is bounded in Y; then there exists a subsequence of λ_β that converges weakly to λ in Y, and (x∗, λ) satisfies Ax∗ + E∗λ = a. Conversely, if there is a Lagrange multiplier λ∗ ∈ Y, then λ_β is bounded in Y. In fact, we have

(A(x_β − x∗), x_β − x∗) + (x_β − x∗, E∗(λ_β − λ∗)) = 0,

where

β(x_β − x∗, E∗(λ_β − λ∗)) = β(Ex_β − b, λ_β − λ∗) = |λ_β|² − (λ_β, λ∗).

Thus we obtain

β(x_β − x∗, A(x_β − x∗)) + |λ_β|² = (λ_β, λ∗),

and hence |λ_β| ≤ |λ∗|. Moreover, if R(E) = Y, then N(E∗) = {0}. Since λ_β satisfies E∗λ_β = a − Ax_β ∈ X, we have

λ_β = Es_β, s_β = (E∗E)^{−1}(a − Ax_β),

and thus λ_β ∈ Y is bounded uniformly in β > 0.

(2) If E is not surjective but R(E) is closed, we obtain λ ∈ R(E) ⊂ Y such that Ax∗ + E∗λ = a by taking Y = R(E), which is a Hilbert space. Also, we have (x_β, λ_β) → (x, λ) satisfying

Ax + E∗λ = a, Ex = P_{R(E)}b,

since |Ex − b|² = |Ex − P_{R(E)}b|² + |P_{N(E∗)}b|².

Example (Parameterized Constrained Problem)

min (1/2)((Ay, y) + α(p, p)) − (a, y) subject to E1y + E2p = b,

where x = (y, p) ∈ X1 × X2 = X and E1 ∈ L(X1, Y), E2 ∈ L(X2, Y). Assume Y = X1 and E1 is boundedly invertible. Then we have the optimality system

Ay∗ + E1∗λ = a,
α p∗ + E2∗λ = 0,
E1y∗ + E2p∗ = b.

Example (Stokes System) Let X = H¹0(Ω) × H¹0(Ω), where Ω is a bounded open set in R². The Hilbert space H¹0(Ω) is the completion of C0∞(Ω) with respect to the H¹0(Ω) inner product

(f, g)_{H¹0(Ω)} = ∫_Ω ∇f · ∇g dx.

Consider the constrained minimization of

∫_Ω ((1/2)(|∇u1|² + |∇u2|²) − f · u) dx

over the velocity field u = (u1, u2) ∈ X satisfying

div u = ∂u1/∂x1 + ∂u2/∂x2 = 0.

From Theorem 6 with Y = L²(Ω) and Eu = div u on X we obtain the necessary optimality condition

−∆u + grad p = f, div u = 0,

which is called the Stokes system. Here f ∈ L²(Ω) × L²(Ω) is an applied force and p = −λ∗ ∈ L²(Ω) is the pressure.

3.5 Variational Inequalities

Let Z be a Hilbert lattice with order ≤ and Z∗ = Z, and let G ∈ L(X, Z). For example, Z = L²(Ω) with the a.e. pointwise inequality constraint. Consider the quadratic programming problem

min (1/2)(Ax, x)X − (a, x)X subject to Ex = b, Gx ≤ c. (3.26)

Note that the constraint set C = {x ∈ X : Ex = b, Gx ≤ c} is a closed convex set in X. In general we consider the constrained problem on a Hilbert space X:

min F0(x) + F1(x) over x ∈ C, (3.27)

where C is a closed convex set, F0 : X → R is C¹ and F1 : X → R is convex, i.e.,

F1((1 − λ)x1 + λx2) ≤ (1 − λ)F1(x1) + λF1(x2)

for all x1, x2 ∈ X and 0 ≤ λ ≤ 1.
Then we have the necessary optimality condition in the form of a variational inequality:

Theorem 7 (Variational Inequality) Suppose x∗ ∈ C minimizes (3.27). Then we have the optimality condition

F0′(x∗)(x − x∗) + F1(x) − F1(x∗) ≥ 0 for all x ∈ C. (3.28)

Proof: Let x_t = x∗ + t(x − x∗) for x ∈ C and 0 ≤ t ≤ 1. Since C is convex, x_t ∈ C. Since x∗ ∈ C is optimal,

F0(x_t) − F0(x∗) + F1(x_t) − F1(x∗) ≥ 0, (3.29)

where F1(x_t) ≤ (1 − t)F1(x∗) + tF1(x), and thus

F1(x_t) − F1(x∗) ≤ t(F1(x) − F1(x∗)).

Since

(F0(x_t) − F0(x∗))/t → F0′(x∗)(x − x∗) as t → 0+,

the variational inequality (3.28) follows from (3.29) by letting t → 0+. □

Example (Inequality Constraint) For X = R^n,

F′(x∗)(x − x∗) ≥ 0, x ∈ C,

implies that F′(x∗) · ν ≤ 0 and F′(x∗) · τ = 0, where ν is the outward normal and τ are the tangents of C at x∗. Let X = R², C = [0, 1] × [0, 1] and assume x∗ = (1, s), 0 < s < 1. Then

(∂F/∂x1)(x∗) ≤ 0, (∂F/∂x2)(x∗) = 0.

If x∗ = (1, 1), then

(∂F/∂x1)(x∗) ≤ 0, (∂F/∂x2)(x∗) ≤ 0.

Example (L¹-optimization) Let U be a closed convex subset of R. Consider the minimization problem on X = L²(Ω) with C = {u ∈ X : u(x) ∈ U a.e. in Ω}:

min (1/2)|Eu − b|²Y + α ∫_Ω |u(x)| dx over u ∈ C. (3.30)

Then the optimality condition is

(Eu∗ − b, E(u − u∗))Y + α ∫_Ω (|u| − |u∗|) dx ≥ 0 (3.31)

for all u ∈ C. Let p = −E∗(Eu∗ − b) ∈ X and take u = v ∈ U on |x − x̄| ≤ δ, u = u∗ otherwise. From (3.31), dividing by the measure of {|x − x̄| ≤ δ},

(1/|{|x − x̄| ≤ δ}|) ∫_{|x−x̄|≤δ} (−p(v − u∗(x)) + α(|v| − |u∗(x)|)) dx ≥ 0.

Letting δ → 0+, we obtain

−p(v − u∗(x̄)) + α(|v| − |u∗(x̄)|) ≥ 0 for a.e. x̄ ∈ Ω (the Lebesgue points of u∗ and p) and for all v ∈ U.

That is, u∗(x̄) minimizes

−pu + α|u| over u ∈ U. (3.32)

For U = R this implies that if |p(x̄)| < α, then u∗(x̄) = 0, and otherwise p(x̄) = α u∗(x̄)/|u∗(x̄)| = α sign(u∗(x̄)). Thus we obtain the necessary optimality condition

0 ∈ (E∗(Eu∗ − b))(x) + α ∂|·|(u∗(x)) a.e. in Ω,

where the subdifferential of the absolute value is given by

∂|u|(s) = 1 for s > 0, [−1, 1] for s = 0, −1 for s < 0.

Theorem 8 (Quadratic Programming) Consider the quadratic programming problem (3.26):

min (1/2)(Ax, x)X − (a, x) subject to Ex = b, Gx ≤ c.

If there exists x̄ ∈ X such that Ex̄ = b and Gx̄ ≤ c, then there exists a unique solution to (3.26) and

(Ax∗ − a, x − x∗) ≥ 0 for all x ∈ C, (3.33)

where C = {Ex = b, Gx ≤ c}. Moreover, if R(E) = Y and the regular point condition

0 ∈ int {G N(E) − Z⁻ + Gx∗ − c}

holds, then there exist λ ∈ Y and µ ∈ Z⁺ such that

Ax∗ + E∗λ + G∗µ = a,
Ex∗ = b, (3.34)
Gx∗ − c ≤ 0, µ ≥ 0 and (Gx∗ − c, µ)Z = 0.

Proof: First note that (3.26) is equivalent to

min (1/2)(A(z + x̄), z + x̄)X − (a, z + x̄)X subject to Ez = 0, Gz ≤ c − Gx̄ = ĉ,

and Ĉ = {Ez = 0, Gz ≤ ĉ} is a closed convex set in the Hilbert space N(E). As in the proof of Theorem 1, a minimizing sequence zn ∈ Ĉ satisfies

(1/4)(A(zn − zm), zn − zm) ≤ J(zn) + J(zm) − 2δ → 0

as m ≥ n → ∞. Since Ĉ is closed, there exists a unique minimizer z∗ ∈ Ĉ, and z∗ satisfies

(A(z∗ + x̄) − a, z − z∗) ≥ 0 for all z ∈ Ĉ,

which is equivalent to (3.33). Moreover, it follows from the Lagrange multiplier theory (below) that if the regular point condition holds, then we obtain (3.34). □

Example Suppose Z = R^m. If (Gx∗ − c)i = 0, then i is called an active index; let Ga denote the (row) vectors of G corresponding to the active indices. If the map x → (Ex, Ga x) is surjective (the Kuhn-Tucker condition), then the regular point condition holds. A small numerical illustration follows.
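The sketch below (my own toy instance, with random data) illustrates the complementarity system (3.34) with the equality constraint absent: each candidate active set yields a linear KKT system, and the correct candidate is certified by primal feasibility together with µ ≥ 0.

```python
import numpy as np
from itertools import combinations

# Sketch: brute-force active-set solve of min (1/2)x^T A x - a^T x s.t. Gx <= c.
# For each active set S, solve [[A, G_S^T],[G_S, 0]](x, µ_S) = (a, c_S);
# inactive constraints receive µ_j = 0, cf. (3.34).
rng = np.random.default_rng(3)
n, p = 4, 3
M = rng.standard_normal((n, n)); A = M @ M.T + np.eye(n)
a = rng.standard_normal(n)
G = rng.standard_normal((p, n)); c = rng.standard_normal(p)

best = None
for k in range(p + 1):
    for S in map(list, combinations(range(p), k)):
        if k == 0:
            x, mu_S = np.linalg.solve(A, a), np.zeros(0)
        else:
            K = np.block([[A, G[S].T], [G[S], np.zeros((k, k))]])
            z = np.linalg.solve(K, np.concatenate([a, c[S]]))
            x, mu_S = z[:n], z[n:]
        if (G @ x <= c + 1e-10).all() and (mu_S >= -1e-10).all():
            mu = np.zeros(p); mu[S] = mu_S
            best = (x, mu)
x, mu = best
print("x* =", x.round(3), " µ =", mu.round(3))
print("(µ, Gx*-c) =", float(mu @ (G @ x - c)))   # complementarity: ≈ 0
```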
Remark (Lagrange Formulation) Define the Lagrange functional

L(x, λ, µ) = J(x) + (λ, Ex − b) + (µ, (Gx − c)⁺)

for (x, λ) ∈ X × Y, µ ∈ Z⁺. The optimality condition (3.34) is equivalent to

(∂/∂x)L(x∗, λ, µ) = 0, (∂/∂λ)L(x∗, λ, µ) = 0, (∂/∂µ)L(x∗, λ, µ) = 0.

Here x ∈ X is the primal variable and the Lagrange multipliers λ ∈ Y and µ ∈ Z⁺ are the dual variables.

Remark Let A ∈ L(X, X∗), let E : X → Y be surjective, and let G be the natural injection of X into Z. Define µ ∈ X∗ by −µ = Ax∗ − a + E∗λ. Since

⟨t(x∗ − c) − (x∗ − c), µ⟩ = (t − 1)⟨x∗ − c, µ⟩ ≤ 0 for all t > 0,

we have ⟨x∗ − c, µ⟩ = 0. Thus ⟨x − c, µ⟩ ≤ 0 for all x with x − c ≤ 0. If we define the dual cone

X⁺∗ = {x∗ ∈ X∗ : ⟨x∗, x⟩ ≤ 0 for all x ∈ X with x ≤ 0},

then µ ∈ X⁺∗. Thus the regular point condition gives the stronger regularity µ ∈ Z⁺ for the Lagrange multiplier µ.

Example (Source Identification) Let X = Z = L²(Ω) and S ∈ L(X, Y), and consider the constrained minimization

min (1/2)(|Sf − y|²Y + α|f|²) subject to −f ≤ 0.

In this case A = αI + S∗S, α > 0, a = S∗y and G = −I. The regular point condition reads

0 ∈ int {−(X − f∗) − Z⁻ − f∗},

which holds since −(X − f∗) is already all of Z. Hence there exists µ ∈ Z⁺ such that

αf∗ + S∗Sf∗ − µ = S∗y, µ ≥ 0 and (−f∗, µ) = 0.

For example, the operator S may be defined through the Poisson equation

−∆u = f in Ω, ∂u/∂n + u = 0 at Γ = ∂Ω,

with Sf = Cu for a measurement operator C, e.g., Cu = u|Γ or Cu = u|_{Ω̂} with Ω̂ a subdomain of Ω.

Remark (Penalty Method) Consider the penalized problem

min J(x) + (β/2)|Ex − b|²Y + (β/2)|max(0, Gx − c)|²Z.

The necessary optimality condition is given by

Ax_β + E∗λ_β + G∗µ_β = a, λ_β = β(Ex_β − b), µ_β = β max(0, Gx_β − c).

It follows from Theorem 7 that there exists a unique solution x_β, and it satisfies

J(x_β) + (β/2)|Ex_β − b|²Y + (β/2)|max(0, Gx_β − c)|²Z ≤ J(x∗). (3.35)

Thus {x_β} is bounded in X and has a weakly convergent subsequence with limit x in X. Moreover, we have Ex = b, max(0, Gx − c) = 0 and J(x) ≤ J(x∗), since the norm is weakly sequentially lower semi-continuous. Since the minimizer of (3.26) is unique, x = x∗ and lim_{β→∞} J(x_β) = J(x∗), which implies that x_β → x∗ strongly in X. Suppose λ_β ∈ Y and µ_β ∈ Z are bounded. Then there exists a subsequence that converges weakly to (λ, µ) in Y × Z and µ ≥ 0. Since lim_{β→∞} J(x_β) = J(x∗), it follows from (3.35) that (µ_β, Gx_β − c) → 0 as β → ∞, and thus (µ, Gx∗ − c) = 0. Hence (x∗, λ, µ) satisfies (3.34). Conversely, if there is a Lagrange multiplier pair (λ∗, µ∗), then (λ_β, µ_β) is bounded in Y × Z. In fact, we have

(x_β − x∗, A(x_β − x∗)) + (x_β − x∗, E∗(λ_β − λ∗)) + (x_β − x∗, G∗(µ_β − µ∗)) = 0,

where

β(x_β − x∗, E∗(λ_β − λ∗)) = β(Ex_β − b, λ_β − λ∗) = |λ_β|² − (λ_β, λ∗)

and

β(x_β − x∗, G∗(µ_β − µ∗)) = β(Gx_β − c − (Gx∗ − c), µ_β − µ∗) ≥ |µ_β|² − (µ_β, µ∗).

Thus we obtain

β(x_β − x∗, A(x_β − x∗)) + |λ_β|² + |µ_β|² ≤ (λ_β, λ∗) + (µ_β, µ∗),

and hence |(λ_β, µ_β)| ≤ |(λ∗, µ∗)|. It will be shown in Section 5 that the regular point condition implies that (λ_β, µ_β) is uniformly bounded in β > 0. A numerical sketch of this penalty path is given below.
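The following sketch is my own (with hypothetical random data): for growing β the piecewise linear optimality condition of the penalty method is solved by iterating on the guessed active set {Gx > c}, each step being a single linear solve, and the penalty multipliers λβ = β(Exβ − b), µβ = β(Gxβ − c)⁺ visibly stabilize.

```python
import numpy as np

# Sketch: solve  Ax + βE^T(Ex-b) + βG^T max(0, Gx-c) = a  by an active-set
# fixed point, then report the penalty multipliers for each β.
rng = np.random.default_rng(4)
n, m, p = 5, 2, 3
M = rng.standard_normal((n, n)); A = M @ M.T + np.eye(n)
E, G = rng.standard_normal((m, n)), rng.standard_normal((p, n))
a, b, c = rng.standard_normal(n), rng.standard_normal(m), rng.standard_normal(p)

for beta in [1e1, 1e3, 1e5]:
    x = np.zeros(n)
    for _ in range(50):                       # active-set iteration
        act = (G @ x - c > 0).astype(float)   # current guess of active rows
        H = A + beta * E.T @ E + beta * (G.T * act) @ G
        x = np.linalg.solve(H, a + beta * E.T @ b + beta * (G.T * act) @ c)
    lam, mu = beta * (E @ x - b), beta * np.maximum(0.0, G @ x - c)
    print(beta, x.round(4), lam.round(3), mu.round(3))
```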
Example (Minimum norm problem with inequality constraint) min |x|2 subject to (x, yi ) ≤ ci , 1 ≤ ı ≤ m. Assume {yi } are linearly independent. Then it has a unique solution x∗ and the necessary and sufficient optimality is given by P x∗ + m i=1 µi yi = 0 (x∗ , yi ) − ci ≤ 0, µi ≥ 0 and ((x∗ , yi ) − ci )µi = 0, 1 ≤ i ≤ m. That is, if G is Grammian (Gij = (yi , yj )) then (−Gx∗ − c)i ≤ 0 µi ≥ 0 and (−Gx∗ − c)i µi = 0, 1 ≤ i ≤ m. Example (Pointwise Obstacle Problem) Consider Z min 0 1 1 d | u(x)|2 − 2 dx Z 1 f (x)u(x) dx 0 1 subject to u( ) ≤ c 2 over u ∈ X = H01 (0, 1). Let f = 1. The Riesz representation of Z 1 1 F1 (u) = f (t)u(t) dt, F2 (u) = u( ) 2 0 are 1 1 1 1 y1 (x) = x(1 − x). y2 (x) = ( − |x − |), 2 2 2 2 respectively. Assume that f = 1. Then, we have u∗ − y1 + µ y2 = 0, 1 µ 1 u∗ ( ) = − ≤ c, 2 8 4 µ ≥ 0. Thus, if c ≥ 81 , then u∗ = y1 and µ = 0. If c ≤ 18 , then u∗ = y1 − µ y2 and µ = 4(c − 18 ). 3.6 Constrained minimization in Banach spaces and Lagrange multiplier theory In this section we discuss the general nonlinear programming. Let X, Y be Hilbert spaces and Z be a Hilbert lattice. Consider the constrained minimization; min J(x) subject to E(x) = 0 and G(x) ≤ 0. 57 (3.36) over a closed convex set C in X. Here, where J : X → R, E : X → Y and G : X → Z are continuously differentiable. Definition (Derivatives) (1) The functional F on a Banach space X is Gateaux differentiable at u ∈ X if for all d ∈ X lim t→0 F (u + td) − F (u) = F 0 (u)(d) t exists. If the G-derivative d ∈ X → F 0 (u)(d) ∈ R is liner and bounded, then F is Frechet differentiable at u. (2) E : X → Y , Y ia Banach space is Frechet differentiable at u if there exists a linear bounded operator E 0 (u) such that |E(u + d) − E(u) − E 0 (u)d|Y = 0. |d|X |d|X →0 lim If Frechet derivative u ∈ X → E 0 (u) ∈ L(X, Y ) is continuous, then E is continuously differentiable. Definition (Lower semi-continuous) (1) A functional F is lower-semi continuous if lim inf F (xn ) ≥ F ( lim xn ) n→∞ n→∞ (2) A functional F is weakly lower-semi continuous if lim inf F (xn ) ≥ F (w − limn→∞ xn ) n→∞ Theorem (Lower-semicontinuous) (1) Norm is weakly lower-semi continuous. (2) A convex lower-semicontinuous functional is weakly lower-semi continuous. Proof: Assume xn → x weakly in X. Let x∗ ∈ F (x), i.e., hx∗ , xi = |x∗ ||x|. Then, we have |x|2 = lim hx∗ , xn i n→∞ and |hx∗ , xn i| ≤ |xn ||x∗ |. Thus, lim inf |xn | ≥ |x|. n→∞ (2) Since F is convex, X X F( tk xk ) ≤ tk F (xk ) k k P for all convex combination of xk , i.e., tk = 1, tk ≥ 0. By the Mazur lemma there exists a sequence of convex combination of weak convergent sequence ({xk }, {F (xk )}) to (x, F (x)) in X × R that converges strongly to (x, F (x)) and thus F (x) ≤ lim inf n → ∞ F (xn ). Theorem (Existence) (1) A lower-semicontinuous functional F on a compact set S has a minimizer. (2) A weak lower-semicontinuous, coercive functional F has a minimizer. 58 Proof: (1) One can select a minimizing sequence {xn } such that F (xn ) ↓ η = inf x∈S F (x). Since S is a compact, there exits a subsequaence that converges strongly to x∗ ∈ S. Since F is lower-semicontinuous, F (x∗ ) ≤ η. (2) Since F (x) → ∞ as |x| → ∞, there exits a bounded minimizing sequence {xn } of F . A bounded sequence in a reflexible Banach space X has a weak convergent subsequence to x∗ ∈ X. Since F is weakly lower-semicontinuous, F (x∗ ) ≤ η. 
Next, we consider the constrained optimization in Banach spaces X, Y:

min F(y) subject to G(y) ∈ K over y ∈ C, (3.37)

where K is a closed convex cone of a Banach space Y, F : X → R and G : X → Y are C¹, and C is a closed convex set of X. For example, if K = {0}, then G(y) = 0 is the equality constraint. If Y = Z is a Hilbert lattice and K = {z ∈ Z : z ≤ 0}, then G(y) ≤ 0 is the inequality constraint. Then we have the Lagrange multiplier theory:

Theorem (Lagrange Multiplier Theory) Let y∗ ∈ C be a solution to (3.37) and assume the regular point condition

0 ∈ int {G′(y∗)(C − y∗) − K + G(y∗)} (3.38)

holds. Then there exists a Lagrange multiplier λ ∈ K⁺ = {y ∈ Y : (y, z) ≤ 0 for all z ∈ K} satisfying (λ, G(y∗))Y = 0 such that

(F′(y∗) + G′(y∗)∗λ, y − y∗)X ≥ 0 for all y ∈ C. (3.39)

Proof: Let F = {y ∈ C : G(y) ∈ K} be the feasible set. The sequential tangent cone of F at y∗ is defined by

T(F, y∗) = {x ∈ X : x = lim_{n→∞} (yn − y∗)/tn, tn → 0⁺, yn ∈ F}.

It is easy to show that

(F′(y∗), x) ≥ 0 for all x ∈ T(F, y∗).

Let

C(y∗) = ∪_{s≥0, y∈C} {s(y − y∗)}, K(G(y∗)) = {k − tG(y∗) : k ∈ K, t ≥ 0},

and define the linearizing cone L(F, y∗) of F at y∗ by

L(F, y∗) = {z ∈ C(y∗) : G′(y∗)z ∈ K(G(y∗))}.

It follows from Lyusternik's theorem [] that

L(F, y∗) ⊆ T(F, y∗),

and thus

(F′(y∗), z) ≥ 0 for all z = y − y∗ ∈ L(F, y∗). (3.40)

Define the convex cone B ⊂ R × Y by

B = {(F′(y∗)z + r, G′(y∗)z − y) : z ∈ C(y∗), r ≥ 0, y ∈ K(G(y∗))}.

By the regular point condition, B has an interior point. From (3.40) the origin (0, 0) is a boundary point of B, and thus there exists a hyperplane in R × Y supporting B at (0, 0), i.e., there exists a nontrivial (α, λ) ∈ R × Y such that

α(F′(y∗)z + r) + (λ, G′(y∗)z − y) ≥ 0

for all z ∈ C(y∗), y ∈ K(G(y∗)) and r ≥ 0. Setting (z, r) = (0, 0), we have (λ, y) ≤ 0 for all y ∈ K(G(y∗)), and thus λ ∈ K⁺ and (λ, G(y∗)) = 0. Letting (r, y) = (0, 0) and z ∈ L(F, y∗), we have α ≥ 0. If α = 0, the regular point condition implies λ = 0, which is a contradiction. Without loss of generality we set α = 1 and obtain (3.39) by setting (r, y) = (0, 0). □

Corollary (Nonlinear Programming) If there exists x̄ such that E(x̄) = 0 and G(x̄) ≤ 0, then there exists a solution x∗ to (3.36). Assume the regular point condition

0 ∈ int {E′(x∗)(C − x∗) × (G′(x∗)(C − x∗) − Z⁻ + G(x∗))}.

Then there exist λ ∈ Y and µ ∈ Z such that

(J′(x∗) + E′(x∗)∗λ + G′(x∗)∗µ, x − x∗) ≥ 0 for all x ∈ C,
E(x∗) = 0, (3.41)
G(x∗) ≤ 0, µ ≥ 0 and (G(x∗), µ)Z = 0.

Next we consider parameterized nonlinear programming. Let X, Y, U be Hilbert spaces and Z a Hilbert lattice. We consider the constrained minimization of the form

min J(x, u) subject to E(x, u) = 0 and G(x, u) ≤ 0 over (x, u) ∈ X × Ĉ, (3.42)

where J : X × U → R, E : X × U → Y and G : X × U → Z are continuously differentiable and Ĉ is a closed convex set in U. Here we assume that for given u ∈ Ĉ there exists a unique x = x(u) ∈ X satisfying E(x, u) = 0. Thus one can eliminate x ∈ X as a function of u ∈ U, and (3.42) can be reduced to a constrained minimization over u ∈ Ĉ. But it is often more advantageous to analyze (3.42) directly using the Lagrange multiplier theorem.

Corollary (Control Problem) Let (x∗, u∗) be a minimizer of (3.42) and assume the regular point condition holds.
It then follows from the Lagrange multiplier theorem that the necessary optimality condition is given by

Jx(x∗, u∗) + Ex(x∗, u∗)∗λ + Gx(x∗, u∗)∗µ = 0,
(Ju(x∗, u∗) + Eu(x∗, u∗)∗λ + Gu(x∗, u∗)∗µ, u − u∗) ≥ 0 for all u ∈ Ĉ, (3.43)
E(x∗, u∗) = 0,
G(x∗, u∗) ≤ 0, µ ≥ 0, and (G(x∗, u∗), µ) = 0.

Example (Control Problem) Consider the optimal control problem

min ∫_0^T (ℓ(x(t)) + h(u(t))) dt + G(x(T))
subject to (d/dt)x(t) − f(x, u) = 0, x(0) = x0, (3.44)
u(t) ∈ Û = a closed convex set in R^m.

We let X = H¹(0, T; R^n), U = L²(0, T; R^m), Y = L²(0, T; R^n) and

J(x, u) = ∫_0^T (ℓ(x) + h(u)) dt + G(x(T)), E = f(x, u) − (d/dt)x,

and Ĉ = {u ∈ U : u(t) ∈ Û a.e. in (0, T)}. From (3.43) we have

∫_0^T (ℓ′(x∗(t))φ(t) + (p(t), fx(x∗, u∗)φ(t) − (d/dt)φ)) dt + G′(x∗(T))φ(T) = 0

for all φ ∈ H¹(0, T; R^n) with φ(0) = 0. By integration by parts,

∫_0^T (∫_t^T (ℓ′(x∗) + fx(x∗, u∗)ᵗp) ds + G′(x∗(T)) − p(t), (d/dt)φ) dt = 0.

Since φ is arbitrary, we have

p(t) = G′(x∗(T)) + ∫_t^T (ℓ′(x∗) + fx(x∗, u∗)ᵗp) ds,

and thus the adjoint state p is absolutely continuous and satisfies

−(d/dt)p(t) = fx(x∗, u∗)ᵗp(t) + ℓ′(x∗), p(T) = G′(x∗(T)).

From the optimality condition, we have

∫_0^T (h′(u∗) + fu(x∗, u∗)ᵗp(t))(u(t) − u∗(t)) dt ≥ 0

for all u ∈ Ĉ, which implies

(h′(u∗(t)) + fu(x∗(t), u∗(t))ᵗp(t))(v − u∗(t)) ≥ 0 for all v ∈ Û

at Lebesgue points t of u∗. Hence the necessary optimality condition is of the form of the two-point boundary value problem

(d/dt)x∗(t) = f(x∗, u∗), x(0) = x0,
(h′(u∗(t)) + fu(x∗(t), u∗(t))ᵗp(t), v − u∗(t)) ≥ 0 for all v ∈ Û,
−(d/dt)p(t) = fx(x∗, u∗)ᵗp(t) + ℓ′(x∗), p(T) = G′(x∗(T)).

Let Û = [−1, 1] coordinate-wise, h(u) = (α/2)|u|², and f(x, u) = f̃(x) + Bu. Then the optimality condition implies

(α u∗(t) + Bᵗp(t), v − u∗(t))_{R^m} ≥ 0 for all |v|∞ ≤ 1,

or, equivalently, u∗(t) minimizes

(α/2)|u|² + (Bᵗp(t), u) over |u|∞ ≤ 1.

Thus, coordinate-wise,

u∗(t) = 1 if −Bᵗp(t)/α > 1,
u∗(t) = −Bᵗp(t)/α if |Bᵗp(t)/α| ≤ 1, (3.45)
u∗(t) = −1 if −Bᵗp(t)/α < −1.

Example (Coefficient Estimation) Let Ω be a bounded open set in R^d and Ω̃ a subset of Ω. Consider the parameter estimation problem

min J(u, c) = ∫_{Ω̃} |y − u|² dx + α ∫_Ω |∇c|² dx
subject to −∆u + c(x)u = f, c ≥ 0,

over (u, c) ∈ H¹0(Ω) × H¹(Ω). We let Y = (H¹0(Ω))∗ and

⟨E(u, c), φ⟩ = (∇u, ∇φ) + (cu, φ)_{L²(Ω)} − ⟨f, φ⟩ for all φ ∈ H¹0(Ω).

Since

⟨Ju + Eu∗λ, h⟩ = (∇h, ∇λ) + (ch, λ)_{L²(Ω)} − ((y − u)χ_{Ω̃}, h) = 0

for all h ∈ H¹0(Ω), the adjoint λ ∈ H¹0(Ω) satisfies

−∆λ + cλ = (y − u)χ_{Ω̃}.

Also, there exists µ ≥ 0 in L²(Ω) such that

⟨Jc + Ec∗λ, d⟩ = α(∇c∗, ∇d) + (du, λ)_{L²(Ω)} − (µ, d) = 0 for all d ∈ H¹(Ω),

with (µ, c∗) = 0 and ⟨µ, d⟩ ≥ 0 for all d ∈ H¹(Ω) satisfying d ≥ 0.
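For the two-point boundary value system of the control example above, a forward-backward sweep is the simplest numerical scheme. The sketch below is my own, with hypothetical data F, B, α, running cost ℓ(x) = (1/2)|x|² and terminal cost G ≡ 0; it alternates a forward state solve, a backward adjoint solve, and the pointwise projection (3.45).

```python
import numpy as np

# Sketch: forward-backward sweep for x' = Fx + Bu, -p' = F^T p + x, p(T) = 0,
# with the control update u = clip(-B^T p / alpha, -1, 1), cf. (3.45).
F = np.array([[0.0, 1.0], [-1.0, 0.0]]); B = np.array([[0.0], [1.0]])
alpha, T, N = 0.1, 1.0, 200
dt = T / N
x0 = np.array([1.0, 0.0])
u = np.zeros((N, 1))
for sweep in range(100):
    x = np.zeros((N + 1, 2)); x[0] = x0
    for k in range(N):                              # forward: state equation
        x[k + 1] = x[k] + dt * (F @ x[k] + B @ u[k])
    p = np.zeros((N + 1, 2))                        # p(T) = 0 since G == 0
    for k in range(N, 0, -1):                       # backward: adjoint equation
        p[k - 1] = p[k] + dt * (F.T @ p[k] + x[k])
    u = np.clip(-(p[:N] @ B) / alpha, -1.0, 1.0)    # projection (3.45)
print(u[:3].ravel().round(3), u[-3:].ravel().round(3))
```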
3.7 Minimum norm solution in Banach space

In this section we discuss the minimum distance and the minimum norm problems in Banach spaces. Given a subspace M of a normed space X, the orthogonal complement (annihilator) M⊥ of M is defined by

M⊥ = {x∗ ∈ X∗ : ⟨x∗, x⟩ = 0 for all x ∈ M}.

Theorem If M is a closed subspace of a normed space, then

(M⊥)⊥ = {x ∈ X : ⟨x, x∗⟩ = 0 for all x∗ ∈ M⊥} = M.

Proof: If there exists x ∈ (M⊥)⊥ with x ∉ M, define a linear functional f by f(αx + m) = α on the space spanned by x and M. Since x ∉ M and M is closed,

|f| = sup_{m∈M, α∈R} |f(αx + m)|/|αx + m|X = sup_{m∈M} |f(x + m)|/|x + m|X = 1/inf_{m∈M} |x + m|X < ∞,

and thus f is bounded. By the Hahn-Banach theorem, f can be extended to some x∗ ∈ X∗. Since ⟨x∗, m⟩ = 0 for all m ∈ M, x∗ ∈ M⊥. This contradicts x ∈ (M⊥)⊥, since ⟨x∗, x⟩ = 1. □

Theorem (Duality I) Given x ∈ X and a subspace M, we have

d = inf_{m∈M} |x − m|X = max_{|x∗|≤1, x∗∈M⊥} ⟨x∗, x⟩ = ⟨x∗0, x⟩,

where the maximum is attained by some x∗0 ∈ M⊥ with |x∗0| = 1. If the infimum on the left is attained by some m0 ∈ M, then x∗0 is aligned with x − m0.

Proof: By the definition of the infimum, for any ε > 0 there exists m_ε ∈ M such that |x − m_ε| ≤ d + ε. For any x∗ ∈ M⊥ with |x∗| ≤ 1 it follows that

⟨x∗, x⟩ = ⟨x∗, x − m_ε⟩ ≤ |x∗||x − m_ε| ≤ d + ε.

Since ε > 0 is arbitrary, ⟨x∗, x⟩ ≤ d. It remains to show that ⟨x∗0, x⟩ = d for some x∗0 ∈ X∗. Define a linear functional f on the space spanned by x and M by

f(αx + m) = αd.

Since

|f| = sup_{m∈M, α∈R} |f(αx + m)|/|αx + m|X = d/inf_{m∈M} |x + m|X = 1,

f is bounded. By the Hahn-Banach theorem, there exists an extension x∗0 of f with |x∗0| = 1. Since ⟨x∗0, m⟩ = 0 for all m ∈ M, x∗0 ∈ M⊥, and by construction ⟨x∗0, x⟩ = d. Moreover, if |x − m0| = d for some m0 ∈ M, then

⟨x∗0, x − m0⟩ = d = |x∗0||x − m0|,

i.e., x∗0 is aligned with x − m0. □

3.7.1 Duality Principle

Corollary (Duality) m0 ∈ M attains the minimum if and only if there exists a nonzero x∗ ∈ M⊥ that is aligned with x − m0.

Proof: Suppose x∗ ∈ M⊥ with |x∗| = 1 is aligned with x − m0. Then

|x − m0| = ⟨x∗, x − m0⟩ = ⟨x∗, x⟩ = ⟨x∗, x − m⟩ ≤ |x − m|

for all m ∈ M. The converse follows from Theorem (Duality I). □

Theorem (Duality Principle) (1) Given x∗ ∈ X∗, we have

min_{m∗∈M⊥} |x∗ − m∗|X∗ = sup_{x∈M, |x|≤1} ⟨x∗, x⟩.

(2) The minimum on the left is achieved by some m∗0 ∈ M⊥. (3) If the supremum on the right is achieved by some x0 ∈ M, then x∗ − m∗0 is aligned with x0.

Proof: (1) Note that for m∗ ∈ M⊥,

|x∗ − m∗| = sup_{|x|≤1} ⟨x∗ − m∗, x⟩ ≥ sup_{|x|≤1, x∈M} ⟨x∗ − m∗, x⟩ = sup_{|x|≤1, x∈M} ⟨x∗, x⟩.

Consider the restriction of x∗ to M, with norm sup_{x∈M, |x|≤1} ⟨x∗, x⟩, and let y∗ be a Hahn-Banach extension of this restriction to X with the same norm. Let m∗0 = x∗ − y∗. Then m∗0 ∈ M⊥ and

|x∗ − m∗0| = |y∗| = sup_{x∈M, |x|≤1} ⟨x∗, x⟩.

Thus the minimum on the left is achieved by m∗0 ∈ M⊥. If the supremum on the right is achieved by some x0 ∈ M, then x∗ − m∗0 is aligned with x0: indeed |x0| = 1 and |x∗ − m∗0| = ⟨x∗, x0⟩ = ⟨x∗ − m∗0, x0⟩. □

In all applications of the duality theory, we use the alignment properties of the space and its dual to characterize optimal solutions, to guarantee the existence of a solution by formulating minimum norm problems in the dual space, and to examine whether the dual problem is easier than the primal problem.

Example 1 (Chebyshev Approximation) The problem is to find a polynomial p of degree less than or equal to n that best approximates f ∈ C[0, 1] in the sense of the sup-norm over [0, 1]. Let M be the space of all polynomials of degree less than or equal to n. Then M is a closed subspace of C[0, 1] of dimension n + 1. The infimum of |f − p|∞ is attained by some p0 ∈ M since M is finite dimensional. The Chebyshev theorem states that the set

Γ = {t ∈ [0, 1] : |f(t) − p0(t)| = |f − p0|∞}

contains at least n + 2 points. In fact, f − p0 must be aligned with some element in M⊥ ⊂ C[0, 1]∗ = BV[0, 1]. Assume Γ contains m < n + 2 points 0 ≤ t1 < · · · < tm ≤ 1. If v ∈ BV[0, 1] is aligned with f − p0, then v is a piecewise constant function with jump discontinuities only at these tk's. Let tk be a point of jump discontinuity of v. Define

q(t) = Π_{j≠k} (t − tj).

Then q ∈ M but ⟨q, v⟩ ≠ 0, and hence v ∉ M⊥, which is a contradiction.
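A Euclidean sanity check (my own illustration) of Theorem (Duality I) in R³: the distance from x to M = span{m} equals the value at x of the aligned unit functional in M⊥.

```python
import numpy as np

# Sketch: dist(x, M) = max{ <x*, x> : |x*| <= 1, x* ⟂ M } for M = span{m},
# and the maximizer x0* is aligned with x - m0 (m0 = best approximation).
rng = np.random.default_rng(5)
m = rng.standard_normal(3); x = rng.standard_normal(3)
m0 = (x @ m) / (m @ m) * m                 # orthogonal projection onto M
d_primal = np.linalg.norm(x - m0)
x_star = (x - m0) / d_primal               # unit functional, aligned with x - m0
print(d_primal, x_star @ x)                # equal: primal distance = dual value
print(abs(x_star @ m) < 1e-12)             # True: x* ∈ M^⊥
```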
where y1 , · · · , yn ∈ X are given linearly independent vectors. Theorem Let x ¯∗ ∈ X ∗ satisfying the constraints h¯ x∗ , yk i = ck , 1 ≤ k ≤ n. Let M = span{yk }. Then, d= min h¯ x∗ ,yk i=ck |x∗ |X ∗ = min |¯ x∗ − m∗ |. m∗ ∈M ⊥ By the duality principle, d = min |¯ x∗ − m∗ | = m∗ ıM ⊥ sup h¯ x∗ , xi. x∈M, |x|≤1 64 Write x = Y a where Y = [y1 , · · · , yn ] and a ∈ Rn . Then d = min |x∗ | = max h¯ x∗ , Y ai = max (c, a). |Y a|≤1 |Y a|≤1 Thus, the optimal solution x∗0 must be aligned with Y a. Example (DC motor control) Let X = L1 (0, 1) and X ∗ = L∞ (0, 1). min |u|∞ 1 Z t−1 e subject to Z 1 u(t) dt = 0, (1 + et−1 )u(t) = 1. 0 0 Let y1 = et−1 and y2 = 1 + et−1 . From the duality principle min hyk ,ui=ck |u|∞ = max |a1 y1 +a2 y2 |1 ≤1 a2 . The controlR problem now is reduced to finding constants a1 and a2 that maximizes a2 1 subject to 0 |(a1 − a2 )et−1 + a2 | dt ≤ 1. The optimal solution u ∈ L∞ (0, 1) should be aligned with (a1 − a2)et−1 + a2 ∈ L1 (0, 1). Since (a1 − a2 )et−1 + a2 , can change sign at most once. the alignment condition implies that u must have values |u|∞ and can change sign at most once, which is the so called bang-bang control. 4 Linear Operator Theory and C0-semigroup In this section we discuss the Cauchy problem d u(t) = Au(t) + f (t), dt u(0) = u0 ∈ X in a Banach space X, where u0 ∈ X is the initial condition and f ∈ L1 (0, T ; X). We construct the mild solution u(t) ∈ C(0, T ; X): Z t u(t) = S(t)u0 + S(t − s)f (s) ds (4.1) 0 where a family of bounded linear operator {S(t), t ≥ 0} is C0 -semigroup on X. Definition (C0 semigroup) (1) Let X be a Banach space. A family of bounded linear operators {S(t), t ≥ 0} on X is called a strongly continuous (C0 ) semigroup if S(t + s) = S(t)S(s) for t, s ≥ 0 with S(0) = I |S(t)φ − φ| → 0 as t → 0+ for all φ ∈ X. (2) A linear operator A in X defined by Aφ = lim t→0+ S(t)φ − φ t (4.2) with dom(A) = {φ ∈ X : the strong limit of lim t→0+ 65 S(t)φ − φ in X exists}. t is called the infinitesimal generator of the C0 semigroup S(t). In this section we present the basic theory of the linear C0 -semigroup on a Banach space X. The theory allows to analyze a wide class of the physical and engineering dynamics using the unified framework. We also present the concrete examples to demonstrate the theory. There is a necessary and sufficient condition (Hile-Yosida Theorem) for a closed, densely defined linear A in X to be the infinitesimal generator of the C0 semigroup S(t). Moreover, we will show that the mild solution u(t) satisfies Z hu(t), ψi = hu0 , ψi + (hx(s), A∗ ψi + hf (s), ψi ds (4.3) for all ψ ∈ dom (A∗ ). Examples (1) For A ∈ (X), define a sequence of linear operators in X SN (t) = X 1 (At)k . k! k Then |SN (t)| ≤ and X 1 (|A|t)k ≤ e|A| t k! d (t) = ASN −1 (t) SN Since S(t) = eAt = limN →∞ SN (t) in the operator norm, we have d S(t) = AS(t) = S(t)A. dt (2) Consider the the hyperbolic equation ut + ux = 0, u(0, x) = u0 (x) in (0, 1). (4.4) Define the semigroup S(t) of translations on X = L2 (0, 1) by [S(t)u0 ](x) = u ˜0 (x − t), u ˜0 (x) = 0, x ≤ 0, u ˜0 = u0 on (0, 1). Then, {S(t), t ≥ 0} is a C0 semigroup on X. If we define u(t, x) = [S(t)u0 ](x) with u0 ∈ H 1 (0, 1) with u0 (0) = 0 satisfies (4.5) a.e.. The generator A is given by Aφ = −φ0 with dom(A) = {φ ∈ H 1 (0, 1) with u(0) = 0}. Thus, u(t) = S(t)u0 satisfies the Cauchy problem (3) Consider the the heat equation ut = σ2 ∆u, 2 d u (t) = Au(t) if u0 ∈ dom(A). u(0, x) = u0 (x) in L2 (Rn ). 
Define the semigroup 1 [S(t)u0 ](x) = √ ( 2πtσ)n 66 Z Rn e− |x−y|2 2σ 2 t u0 (y) dy (4.5) Let A be closed, densely defined linear operator dom(A) → X. We use the finite difference method in time to construct the mild solution (4.1). For a stepsize λ > 0 consider a sequence {un } in X generated by un − un−1 = Aun + f n−1 , λ with f n−1 1 = λ (4.6) nλ Z f (t) dt. (n−1)λ Assume that for λ > 0 the resolvent operator Jλ = (I − λ I)−1 is bounded. Then, we have the product formula: un = Jλn u0 + n−1 X Jλn−k f k λ. (4.7) k=0 In order to un ∈ X is uniformly bounded in n for all u0 ∈ X (with f = 0), it is necessary that M |Jλn | ≤ for λω < 1, (4.8) (1 − λω)n for some M ≥ 1 and ω ∈ R. Hile’s Theorem Define a piecewise constant function in X by uλ (t) = uk−1 on [tk−1 , tk ) Then, max |uλ − u(t)|x → 0 t∈[0,T ] as λ → 0+ to the mild solution (4.1). That is, S(t)x = lim (I − n→∞ t [t] A) n x n exists for all x ∈ X and {S(t), t ≥ 0} is the C0 semigurop on X and its generator is A. Proof: First, note that |Jλ | ≤ M 1 − λω and for x ∈ dom(A) |Jλ x − x| = |λJλ Ax| ≤ λ |Ax| → 0 1 − λω as λ → 0+ . Since dom(A) is dense in X it follows that |Jλ x − x| → 0 as λ → 0+ for all x ∈ X. 67 Define the linear operators Tλ (t) and Sλ (t) by Sλ (t) = Jλk−1 and Tλ (t) = Jλk−1 + (t − tk )(Jλk − Jλk−1 ), on [tk−1 , tk ). Then, d Tλ (t) = ASλ (t), a.e int ∈ [0, T ]. dt Thus, Z Tλ (t)u0 −Tµ (t)u0 = 0 t d (Tλ (s)Tµ (t−s)u0 ) ds = ds Z t (Sλ (s)Tµ (t−s)−Tλ (s)Sλ (t−s))Au0 ds 0 Since Tλ (s)u − Sλ (s)u = (s − tk−1 )(Jλ − I)Tλ (tk−1 ) on s ∈ [tk−1 , tk ). By the bounded convergence theorem |Tλ (t)u0 − Tµ (t)u0 |X → 0 as λ, µ → 0 for all u0 ∈ dom(A). Thus, the unique limit defines the linear operator S(t) by S(t)u0 = lim Sλ (t)u0 . (4.9) λ→0+ Since |Sλ (t)| ≤ M ≤ M eωt , (1 − λω)[t/n] where [s] is the largest integer less than s ∈ R. and dom(A) is dense, (4.9) holds for all u0 ∈ X. Since S(t + s)u = lim Jλn+m = Jλn Jλm u = S(t)S(s)u λ→0+ and limt→0+ S(t)u = limt→0+ Jt u = u, S(t) is the C0 semigroup on X. Moreover, {S(t), t ≥ 0} is in the class G(M, ω), i.e., |S(t)| ≤ M eωt Note that Z t Tλ (t)u0 − u0 = A Sλ u0 ds 0 Since limλ→0+ Tλ (t)u0 = limλ→0+ Sλ (t)u0 = S(t)u0 and A is closed, we have Z t Z t S(t)u0 − u0 = A S(s)u0 ds, S(s)u0 ds ∈ dom(A). 0 0 If B is a generator of {S(t), t ≥ 0}, then Bx = lim t→0+ S(t)x − x = Ax t if x ∈ dom(A). Conversely, if u0 ∈ dom(B), then u0 ∈ dom(A) since A is closed and t → S(t)u0 is continuous at 0 and thus Z 1 t S(s)u0 ds = u0 as t → 0+ . t 0 68 Hence S(t)u0 − u0 = Bx t That is, A is the generator of {S(t), t ≥ 0}. Similarly, we have Ax = n−1 X Jλn−k f k = k=0 t Z Z Sλ (t − s)f (s) ds → t S(t − s)f (s) ds as λ → 0+ 0 0 by the Lebesque dominated convergence theorem. The following theorem state the basic properties of C0 semigroups: Theorem (Semigroup) (1) There exists M ≥ 1, ω ∈ R such that S ∈ G(M, ω) class, i.e., |S(t)| ≤ M eωt , t ≥ 0. (4.10) (2) If x(t) = S(t)x0 , x0 ∈ X, then x ∈ C(0, T ; X) (3) If x0 ∈ dom (A), then x ∈ C 1 (0, T ; X) ∩ C(0, T ; dom(A)) and d x(t) = Ax(t) = AS(t)x0 . dt (4) The infinitesimal generator A is closed and densely defined. For x ∈ X Z t S(t)x − x = A S(s)x ds. (4.11) 0 (5) λ > ω the resolvent is given by (λ I − A)−1 = Z ∞ e−λs S(s) ds (4.12) M . 
(λ − ω)n (4.13) 0 with estimate |(λ I − A)−n | ≤ Proof: (1) By the uniform boundedness principle there exists M ≥ 1 such that |S(t)| ≤ M on [0, t0 ] For arbitrary t = k t0 +τ , k ∈ N and τ ∈ [0, t0 ) it follows from the semigroup property we get |S(t)| ≤ |S(τ )||S(t0 |k ≤ M ek log |S(t0 )| ≤ M eω t with ω = t10 , log |S(t0 )|. (2) It follows from the semigroup property that for h > 0 x(t + h) − x(t) = (S(h) − I)S(t)x0 and for t − h ≥ 0 x(t − h) − x(t) = S(t − h)(I − S(h))x0 Thus, x ∈ C(0, T ; X) follows from the strong continuity of S(t) at t = 0. (3)–(4) Moreover, x(t + h) − x(t) S(h) − I S(h)x0 − x0 = S(t)x0 = S(t) h h h 69 and thus S(t)x0 ∈ dom(A) and lim h→0+ x(t + h) − x(t) = AS(t)x0 = Ax(t). h Similarly, lim h→0+ x(t − h) − x(t) S(h)φ − φ = lim S(t − h) = S(t)Ax0 . −h h h→0+ Hence, for x0 ∈ dom(A) Z t Z t Z t S(s)x0 ds AS(s)x0 ds = A S(s)Ax0 ds = S(t)x0 − x0 = (4.14) 0 0 0 If xn ∈ don(A) → x and Axn → y in X, we have Z t S(s)y ds S(t)x − x = 0 Since 1 lim t→0+ t Z t S(s)y ds = y 0 x ∈ dom(A) and y = Ax and hence A is closed. Since A is closed it follows from (4.14) that for x ∈ X Z t S(s)x ds ∈ dom(A) 0 and (4.11) holds. For x ∈ X let xh = 1 h Z h S(s)x ds ∈ dom(A) 0 Since xh → x as h → 0+ , dom(A) is dense in X. (5) For λ > ω define Rt ∈ L(X) by Z t Rt = e−λs S(s) ds. 0 Since A − λ I is the infinitesimal generator of the semigroup eλt S(t), from (4.11) (λ I − A)Rt x = x − e−λt S(t)x. Since A is closed and |e−λt S(t)| → 0 as t → ∞, we have R = limt→∞ Rt satisfies (λ I − A)Rφ = φ. Conversely, for φ ∈ dom(A) Z ∞ R(A − λ I)φ = e−λs S(s)(A − λ I)φ = lim e−λt S(tφ − φ = −φ t→∞ 0 Hence Z R= ∞ e−λs S(s) ds = (λ I − A)−1 0 70 Since for φ ∈ X ∞ Z |e |Rx| ≤ −λs ∞ Z e(ω−λ)s |x| ds = S(s)x| ≤ M 0 0 we have |(λ I − A)−1 | ≤ M , λ−ω M |x|, λ−ω λ > ω. Note that −2 (λ I − A) ∞ Z e = −λt ∞Z ∞ e S(s) ds = = e 0 0 0 −λσ ∞Z ∞ Z λs S(t) ds 0 Z ∞ Z ∞ Z S(σ) dσ dt = e−λ(t+s) S(t + s) ds dt 0 σe−λσ S(σ) dσ. 0 t By induction, we obtain −n (λ I − A) 1 = (n − 1)! Thus, −n |(λ I − A) 1 |≤ (n − 1)! ∞ Z Z ∞ tn−1 e−λt S(t) dt. (4.15) 0 tn−1 e−(λ−ω)t dt = 0 M . (λ − ω)n We then we have the necessary and sufficient condition: Hile-Yosida Theorem A closed, densely defined linear operator A on a Banach space X is the infinitesimal generator of a C0 semigroup of class G(M, ω) if and only if M (λ − ω)n |(λ I − A)−n | ≤ for all λ > ω (4.16) Proof: The sufficient part follows from the previous Theorem. But, we describe the Yosida construction. Define the Yosida approximation Aλ ∈ L(X) of A by Aλ = Jλ − I = AJλ . λ (4.17) Define the Yosida approximation: t t Sλ (t) = eAλ t = e− λ eJλ λ . Since |Jλk | ≤ we have t |Sλ (t)| ≤ e− λ ∞ X M k=0 Since M (1 − λω)k ω t |Jλk |( )k ≤ M e 1−λω t . k! λ d Sλ (s)Sλˆ (t − s) = Sλ (s)(Aλ − Aλˆ )Sλˆ (t − s), ds we have Z Sλ (t)x − Sλˆ (t)x = 0 t Sλ (s)Sλˆ (t − s)(Aλ − Aλˆ )x ds 71 Thus, for x ∈ dom(A) |Sλ (t)x − Sλˆ (t)x| ≤ M 2 teωt |(Aλ − Aλˆ )x| → 0 ˆ → 0+ . Since dom(A) is dense in X this implies that as λ, λ S(t)x = lim Sλ (t)x exist for all x ∈ X, λ→0+ defines a C0 semigroup of G(M, ω) class. The necessary part follows from (4.15) Theorem (Mild solution) (1) If for f ∈ L1 (0, T ; X) define t Z S(t − s)f (s) ds, x(t) = x(0) + 0 then x(t) ∈ C(0, T ; X) and it satisfies Z t Z t x(s) ds + f (s) ds. x(t) = A 0 (4.18) 0 (2) If Af ∈ L1 (0, T ; X) then x ∈ C(0, T ; dom(A)) and Z t (Ax(s) + f (s)) ds. x(t) = x(0) + 0 Rt d (3) If f ∈ W 1,1 (0, T ; X), i.e. 
f (t) = f (0) + 0 f 0 (s) ds, dt f = f 0 ∈ L1 (0, T ; X), then Ax ∈ C(0, T ; X) and Z t Z t A S(t − s)f (s) ds = S(t)f (0) − f (t) + S(t − s)f 0 (s) ds. (4.19) 0 0 Proof: Since Z tZ τ t Z S(t − s)f (s) dsdτ = 0 0 Z t S(τ − s)d τ )f (s) ds, ( 0 and Z s t S(s) ds = S(t) − I A 0 we have x(t) ∈ dom(A) and Z t Z t Z t A x(s) ds = S(t)x − x + S(t − s)f (s) ds − f (s) ds. 0 0 0 and we have (4.18). (2) Since for h > 0 x(t + h) − x(t) = h Z 0 t S(h) − I 1 S(t − s) f (s) ds + h h Z t+h S(t + h − s)f (s) ds t if Af ∈ L1 (0, T ; X) x(t + h) − x(t) lim = + h h→0 Z t S(t − s)Af (s) ds + f (t) 0 72 a.e. t ∈ (0, T ). Similarly, Z Z t−h x(t − h) − x(t) S(h) − I 1 t S(t − s)f (s) ds = S(t − h − s) f (s) ds + −h h h t−h 0 Z t S(t − s)Af (s) ds + f (t) → 0 a.e. t ∈ (0, T ). (3) Since Z t+h Z h S(h) − I 1 S(t + f − s)f (s) ds S(th − s)f (s) ds − x(t) = ( h h 0 t Z t S(t − s) + 0 f (s + h) − f (s) ds, h letting h → 0+ , we obtain(4.19). It follows from Theorems the mild solution Z t x(t) = S(t)x(0) + S(t − s)f (s) ds 0 satisfies t Z x(t) = x(0) + A Z t x(s) + 0 f (s) ds. 0 Note that the mild solution x ∈ C(0, T ; X) depends continuously on x(0) ∈ X and f ∈ L1 (0, T ; X) with estimate Z t ωt |x(t)| ≤ M (e |x(0)| + eω(t−s) |f (s)| ds). 0 Thus, the mild solution is the limit of a sequence {xn } of strong solutions with xn (0) ∈ dom(A) and fn ∈ W 1,1 (0, T ; X), i.e., since dom(A) is dense in X and W 1,1 (0, T ; X) is dense in L1 (0, T ; X), |xn (t) − x(t)|X → 0 uniformly on [0, T ] for |xn (0) − x(0)|x → 0 and |fn − f |L1 (0,T ;X) → 0 as n → ∞. Moreover, the mild solution x ∈ C(0, T : X) is a weak solution to the Cauchy problem d x(t) = Ax(t) + f (t) dt (4.20) in the sense of (4.3), i.e., for all ψ ∈ dom(A∗ ) hx(t), ψiX×X ∗ is absolutely continues and d hx(t), ψi = hx(t), ψi + hf (t), ψi a.e. in (0, T ). dt If x(0) ∈ dom(A) and Af ∈ L1 (0, T ; X), then Ax ∈ C(0, T ; X), x ∈ W 1,1 (0, T ; X) and d x(t) = Ax(t) + f (t), a.e. in (0, T ) dt 73 If x(0) ∈ dom(A) and f ∈ W 1,1 (0, T ; X), then x ∈ C(0, T ; dom(A)) ∩ C 1 (0, T ; X) and d x(t) = Ax(t) + f (t), everywhere in [0, T ]. dt The condition (4.16) is very difficult to check for a given A in general. For the case M = 1 we have a very complete characterization. Lumer-Phillips Theorem The followings are equivalent: (a) A is the infinitesimal generator of a C0 semigroup of G(1, ω) class. (b) A − ω I is a densely defined linear m-dissipative operator,i.e. |(λ I − A)x| ≥ (λ − ω)|x| for all x ∈ don(A), λ > ω (4.21) and for some λ0 > ω R(λ0 I − A) = X. (4.22) Proof: It follows from the m-dissipativity |(λ0 I − A)−1 | ≤ 1 λ0 − ω Suppose xn ∈ dom(A) → x and Axn → y in X, the x = lim xn = (λ0 I − A)−1 lim (λ0 xn − Axn ) = (λ0 I − A)−1 (λ0 x − y). n→∞ n→∞ Thus, x ∈ dom(A) and y = Ay and hence A is closed. Since for λ > ω λ I − A = (I + (λ − λ0 )(λ0 I − A)−1 )(λ0 I − A), 0| −1 ∈ L(X). Thus by the continuation method we have if |λ−λ λ0 −ω < 1, then (λ I − A) −1 (λ I − A) exists and 1 |(λ I − A)−1 | ≤ , λ > ω. λ−ω It follows from the Hile-Yosida theorem that (b) → (a). (b) → (a) Since for x∗ ∈ F (x), the dual element of x, i.e. x∗ ∈ X ∗ satisfying hx, x∗ iX×X ∗ = |x|2 and |x∗ | = |x| he−ωt S(t)x, x∗ i ≤ |x||x∗ | = hx, x∗ i we have for all x ∈ dom(A) 0 ≥ lim h t→0+ e−ωt S(t)x − x ∗ , x i = h(A − ω I)x, x∗ i for all x∗ ∈ F (x). t which implies A − ω I is dissipative. Theorem (Dissipative I) (1) A is a ω-dissipative |λ x − Ax| ≥ (λ − ω)|x| for all x ∈ dom (A). if and only if (2) for all x ∈ dom(A) there exists x∗ ∈ F (x) such that hAx, x∗ i ≤ ω |x|2 . 74 (4.23) (2) → (1). 
Let x ∈ dom(A) and choose x∗ ∈ F (0) such that hA, x∗ i ≤ 0. Then, for any λ > 0, |x|2 = hx, x∗ i = hλ x − Ax + Ax, x∗ i ≤ hλ x − Ax, x∗ i − ω |x|2 ≤ |λ x − Ax||x| − ω |x|2 , which implies (2). (1) → (2). From (1) we obtain the estimate 1 − (|x − λ Ax| − |x|) ≤ 0 λ and hAx, xi− = lim = λ→0+ 1 (|x| − |x − λ Ax|) ≤ 0 λ which implies there exists x∗ ∈ F (x) such that (4.23) holds since hAx, xi− = hAX, x∗ i for some x∗ ∈ F (x). Thus, Lumer-Phillips theorem says that if m-diisipative, then (4.23) hold for all x∗ ∈ F (x). Theorem (Dissipative II) Let A be a closed densely defined operator on X. If A and A∗ are dissipative, then A is m-dissipative and thus the infinitesimal generator of a C − 0-semigroup of contractions. Proof: Let y ∈ R(I − A) be given. Then there exists a sequence xn ∈ dom(A) such that y= xn − Axn → y as n → ∞. By the dissipativity of A we obtain |xn − xm | ≤ |xn − xm − A(xn − xm )| ≤ |y− ym | Hence xn is a Cauchy sequence in X. We set x = limn→∞ xn . Since A is closed, we see that x ∈ dom (A) and x − Ax = y, i.e., y ∈ R(I − A). Thus R(I − A) is closed. Assume that R(I − A) 6= X. Then there exists an x∗ ∈ X ∗ such that h(I − A)x, x∗ ) = 0 for all x ∈ dom (A). By definition of the dual operator this implies x∗ ∈ dom (A∗ ) and (IA)∗ x∗ = 0. Dissipativity of A* implies |x∗ | < |x∗ − A∗ x∗ | = 0, which is a contradiction. Example (revisited example 1) Aφ = −φ0 in X = L2 (0, 1) and for φ ∈ H 1 (0, 1) Z (Aφ, φ)X = − 0 1 1 φ0 (x)φ dx == (|φ(0)|2 − |φ(1)|2 ) 2 Thus, A is dissipative if and only if φ(0) = 0, the in flow condition. Define the domain of A by dom(A) = {φ ∈ H 1 (0, 1) : φ(0) = 0} The resolvent equation is equivalent to λu + d u=f dx 75 and Z u(x) = x e−λ (x−s) f (s) ds, 0 and R(λ I − A) = X. By the Lumer-Philips theorem A generates the C0 semigroup on X = L2 (0, 1). Example (Conduction equation) Consider the heat conduction equation: X d ∂2u u = Au = aij (x) + b(x) · ∇u + c(x)u, dt ∂xi ∂xj in Ω. i,j Let X = C(Ω) and dom(A) ⊂ C 2 (Ω). Assume that a ∈ Rn×n ∈ C(Omega) b ∈ Rn,1 ¯ and a is symmetric and and c ∈ R are continuous on Ω m I ≤ a(x) ≤ M I for 0 < m ≤ M < ∞. Then, if x0 is an interior point of Ω at which the maximum of φ ∈ C 2 (Ω) is attained. Then, X ∂2u (x0 ) ≤ 0. ∇φ(x0 ) = 0, aij (x0 ) ∂xi ∂xj ij and thus (λ φ − Aφ)(x0 ) ≤ ω φ(x0 ) where ω ≤ max c(x). x∈Ω Similarly, if x0 is an interior point of Ω at which the minimum of φ ∈ C 2 (Ω) is attained, then (λ φ − Aφ)(x0 ) ≥ 0 If x0 ∈ ∂Ω attains the maximum, then ∂ φ(x0 ) ≤ 0. ∂ν Consider the domain with the Robin boundary condition: dom(A) = {u ∈ α(x) u(x) + β(x) ∂ u = 0 at ∂Ω} ∂ν with α, β ≥ 0 and inf x∈∂Ω (α(x) + β(x)) > 0. Then, |λφ − Aφ|X ≥ (λ − ω)|φ|X . for all φ ∈ dom(A), which shows A is dissipative. The range condition follows from the Lax Milgram theory. Example (Advection equation) Consider the advection equation ut + (~b(x)u) = ν ∆u. Let X = L1 (Ω). Assume ~b ∈ L∞ (Ω) 76 Let ρ ∈ C 1 (R) be a monotonically increasing function satisfying ρ(0) = 0 and ρ(x) = sign(x), |x| ≥ 1 and ρ (s) = ρ( s ) for > 0. For u ∈ C 1 (Ω) Z ∂ 1 u − n · ~b u, ρ (u)) ds + (~b u − ν ux , ρ0 (u) ux ) + (c u, ρ (u)). (Au, u) = (ν ∂n Γ where 1 1 (~b u, ρ0 (u) ux ) ≤ ν (ux , ρ0 (u) ux ) + meas({|u| ≤ }) 4ν Assume the inflow condition u on {s ∈ ∂Ω : n · b < 0}. Note that for u ∈ L1 (Rd ) (u, ρ (u)) → |u|1 and (ψ, ρ (u)) → (ψ, sign0 (u)) for ψ ∈ L1 (Ω) as → 0. It thus follows (λ − ω) |u| ≤ |λ u − λ Au|. (4.24) Since C 2 (Ω) is dense in W 2,1 (Rd ) (4.24) holds for u ∈ W 2,1 (Ω). 
Example (X = Lp (Ω)( Let Au = ν ∆u + b · ∇u with homogeneous boundary condition u = 0 on X = Lp (Ω). Since Z Z ∗ p−2 h∆u, u i = ∆, |u| u) = −(p − 1) (∇u, |u|p−2 ∇u) Ω Ω and (b · ∇u, |u|p−2 u)L2 ≤ (p − 1)ν |b|2∞ |(∇u, |u|p−2 ∇u)L2 + (|u|p , 1)L2 2 2ν(p − 1) we have hAu, u∗ iω|u|2 for some ω > 0. Example (Second order equation) Let V ⊂ H = H ∗ ⊂ V ∗ be the Gelfand triple. Let ρ be a bounded bilinear form on H × H, µ and σ be bounded bilinear forms on V × V . Assume ρ and σ are symmetric and coercive and µ(φ, φ) ≥ 0 for all φ ∈ V . Consider the second order equation ρ(utt , φ) + µ(ut , φ) + σ(u, φ) = hf (t), φi for all φ ∈ V. Define linear operators M (mass), D (dampping), K and (stiffness) by (M φ, ψ)H = ρ(φ, ψ), φ, ψ ∈ HhDφ, ψi = µ(φ, ψ) φ, ψ ∈ V hKφ, ψiV ∗ ×V , Let v = ut and define A on X = V × H by A(u, v) = (v, −M −1 (Ku + Dv)) with domain dom (A) = {(u, v) ∈ X : v ∈ V and Ku + Dv ∈ H} The state space X is a Hilbert space with inner product ((u1 , v1 ), (u, v2 )) = σ(u1 , u2 ) + ρ(v1 , v2 ) 77 φ, ψ ∈ V and E(t) = |(u(t), v(t))|2X = σ(u(t), u(t)) + ρ(v(t), v(t)) defines the energy of the state x(t) = (u(t), v(t)). First, we show that A is dissipative: (A(u, v), (u, v))X = σ(u, v)+ρ(−M −1 (Ku+Dv), v) = σ(u, v)−σ(u, v)−µ(v, v) = −µ(v, v) ≤ 0 Next, we show that R(λ I − A) = X. That is, for (f, g) ∈ X the exists a solution (u, v) ∈ dom (A) satisfying λ u − v = f, λM v + Dv + Ku = M g, or equivalently v = λ u − f and λ2 M u + λDu + Ku = M g + λ M f + Df (4.25) Define the bilinear form a on V × V a(φ, ψ) = λ2 ρ(φ, ψ) + λ µ(φ, ψ) + σ(φ, ψ) Then, a is bounded and coercive and if we let F (φ) = (M (g + λ f )φ)H + µ(f, φ) then F ∈ V ∗ . It thus follows from the Lax-Milgram theory there exists a unique solution u ∈ V to (4.25) and Dv + Ku ∈ H. For example, consider the wave equation 1 utt c2 (x) = ∆u, ∂u + κ(x)ut = 0 and ∂Ω. ∂n In this example we let V = H 1 (Ω)/R and H = L2 (Ω) and define Z σ(φ, ψ) = (∇φ, ∇ψ) dx Ω Z µ(φ, ψ) = κ(x)φ, ψ ds ∂Ω Z ρ(φ, ψ) = Ω 1 φψ dx. c2 (x) Example (Maxwell equation) Et = ∇ × H µ Ht = −∇ × E with boundary condition E×n=0 The dissipativity follows from E · (∇ × H) + H · (∇ × E) = div(E 78 Theorem (Dual semigroup) Let X be a reflexive Banach space. The adjoint S ∗ (t) of the C0 semigroup S(t) on X forms the C0 semigroup and the infinitesimal generator of S ∗ (t) is A∗ . Let X be a Hilbert space and dom(A∗ ) be the Hilbert space with graph norm and X−1 be the strong dual space of dom(A∗ ), then the extension S(t) to X−1 defines the C0 semigroup on X−1 . Proof: (1) Since for t, s ≥ 0 S ∗ (t + s) = (S(s)S(t))∗ = S ∗ (t)S ∗ (s) and hx, S ∗ (t)y − yiX×X ∗ = hS(t)x − x, yiX×X ∗ → 0. for x ∈ X and y ∈ X ∗ Thus, S ∗ (t) is weakly star continuous at t = 0 and let B is the generator of S ∗ (t) as S ∗ (t)x − x Bx = w∗ − lim . t Since S ∗ (t)y − y S(t)x − x , y) = (x, ), ( t t for all x ∈ dom(A) and y ∈ dom(B) we have hAx, yiX×X ∗ = hx, ByiX×X ∗ and thus B = A∗ . Thus, A∗ is the generator of a w∗ − continuous semigroup on X ∗ . (2) Since Z t S ∗ (t)y − y = A∗ S ∗ (s)y ds 0 dom (A∗ ). S ∗ (t) for all y ∈ Y = Thus, is strongly continuous at t = 0 on Y . ∗ ∗ (3) If X is reflexive, dom (A ) = X . If not, there exists a nonzero y0 ∈ X such that hy0 , x∗ iX×X ∗ = 0 for all x∗ ∈ dom(A∗ ). Thus, for x0 = (λ I −A)−1 y0 hλ x0 −Ax0 , x∗ i = hx0 , λ x∗ − A∗ x∗ i = 0. Letting x∗ = (λ I − A∗ )−1 x∗0 for x∗0 ∈ F (x0 ), we have x0 = 0 and thus y0 = 0, which yields a contradiction. (4) X1 = dom (A∗ ) is a closed subspace of X ∗ and is a invariant set of S ∗ (t). 
Since A∗ is closed, S ∗ (t) is the C0 semigroup on X1 equipped with its graph norm. Thus, (S ∗ (t))∗ is the C0 semigroup on X−1 = X1∗ and defines the extension of S(t) to X−1 . Since for x ∈ X ⊂ X−1 and x∗ ∈ X ∗ hS(t)x, x∗ i = hx, S ∗ (t)x∗ i, S(t) is the restriction of (S ∗ (t))∗ onto X. 4.1 Sectorial operator and Analytic semigroup In this section we have the representation of the semigroup S(t) in terms of the inverse Laplace transform. Taking the Laplace transform of d x(t) = Ax(t) + f (t) dt 79 we have x ˆ = (λ I − A)−1 (x(0) + fˆ) where for λ > ω Z ∞ e−λt x(t) dt x ˆ= 0 is the Laplace transform of x(t). We have the following the representation theory (inverse formula). Theorem (Resolvent Calculus) For x ∈ dom(A2 ) and γ > ω 1 S(t)x = 2πi γ+i∞ Z eλt (λ I − A)−1 x dλ. (4.26) γ−i∞ Proof: Let Aµ be the Yosida approximation of A. Since Re σ(Aµ ) ≤ have 1 uµ (t) = Sµ (t)x = 2πi γ+i∞ Z ω0 < γ, we 1 − µω0 eλt (λ I − Aµ )−1 x dλ. γ−i∞ Note that λ(λ I − A)−1 = I + (λ I − A)−1 A. Since 1 2πi and Z Z γ+i∞ γ+i∞ γ−i∞ (4.27) eλt dλ = 1 λ |(|λ − ω|−2 | dλ < ∞, γ−i∞ we have |Sµ (t)x| ≤ M |A2 x|, uniformly in µ > 0. Since (λ I − Aµ )−1 x − (λ I − A)−1 x = µ (ν I − A)−1 (λ I − A)−1 A2 x, 1 + λµ λ , {uµ (t)} is Cauchy in C(0, T ; X) if x ∈ dom(A2 ). Letting µ → 0+ , 1 + λµ we obtain (4.26). where ν = Next we consider the sectorial operator. For δ > 0 let Σδω = {λ ∈ C : arg(λ − ω) < π + δ} 2 be the sector in the complex plane C. A closed, densely defined, linear operator A on a Banach space X is a sectorial operator if |(λ I − A)−1 | ≤ M for all λ ∈ Σδω . |λ − ω| 80 For 0 < θ < δ let Γ = Γω,θ be the integration path defined by Γ± = {z ∈ C : |z| ≥ δ, arg(z − ω) = ±( π2 + θ)}, Γ0 = {z ∈ C : |z| = δ, |arg(z − ω)| ≤ π 2 + θ}. For 0 < θ < δ define a family {S(t), t ≥ 0} of bounded linear operators on X by Z 1 eλt (λ I − A)−1 x dλ. (4.28) S(t)x = 2πi Γ Theorem (Analytic semigroup) If A is a sectorial operator on a Banach space X, then A generates an analytic (C0 ) semigroup on X, i.e., for x ∈ X t → S(t)x is an analytic function on (0, ∞). We have the representation (4.28) for x ∈ X and |AS(t)x|X ≤ Mθ |x|X t Proof: The elliptic operator A defined by the Lax-Milgram theorem defines a sectorial operator on Hilbert space. Theorem (Sectorial operator) Let V, H are Hilbert spaces and assume H ⊂ V ∗ . Let ρ(u, v) is bounded bilinear form on H × H and ρ(u, u) ≥ |u|2H for all u ∈ H Let a(u, v) to be a bounded bilinear form on V × V with σ(u, u) ≥ δ |u|2V for all u ∈ V Define the linear operator A by ρ(Au, φ) = a(u, φ) for all φ ∈ V. Then, for Re λ > 0we have |(λ I − A)−1 |L(V,V ∗ ) ≤ |(λ I − A)−1 |L(H) ≤ 1 δ M |λ| |(λ I − A)−1 |L(V ∗ ,H) ≤ √M |λ| |(λ I − A)−1 |L(H,V ) ≤ √M |λ| Proof: Let a(u, v) to be a bounded bilinear form on V × V . For f ∈ V ∗ and <λ > 0, Define M H, H by (M u, v) = ρ(u, v) for all v ∈ H and A0 ∈ L(V, V ∗ ) by hA0 u, vi = σ(u, v) for v ∈ V 81 Then, (λ I − A)u = M −1 f is equivalent to λρ(u, φ) + a(u, φ) = hf, φi, for all φ ∈ V. (4.29) It follows from the Lax-Milgram theorem that (4.29) has a unique solution, given f ∈ V ∗ and Re λρ(u, u) + a(u, u) ≤ |f |V ∗ |u|V . Thus, 1 |(λ I − A)−1 |L(V ∗ ,V ) ≤ . δ Also, |λ| |u|2H ≤ |f |V ∗ |u|V + M |u|2V = M1 |f |2V ∗ for M1 = 1 + M δ2 and thus √ |(λ I − A)−1 |L(V ∗ ,H) ≤ M1 . |λ|1/2 For f ∈ H ⊂ V ∗ δ |u|2V ≤ Re λ ρ(u, u) + a(u, u) ≤ |f |H |u|H , (4.30) and |λ|ρ(u, u) ≤ |f |H |u|H + M |u|2V ≤ M1 |f |H |u|H Thus, |(λ I − A)−1 |L(H) ≤ M1 . |λ| Also, from (4.30) δ |u|2V ≤ |f |H |u|H ≤ M1 |f |2 . which implies |(λ I − A)−1 |L(H,V ) ≤ 4.2 M2 . 
|λ|1/2 Approximation Theory In this section we discuss the approximation theory for the C0 -semigroup. Theorem (Trotter-Kato theorem) Let X and Xn be Banach spaces and A and An be the infinitesimal generator of C0 semigroups S(t) on X and Sn (t) on Xn of G(M, ω) class. Assume a family of uniformly bounded linear operators Pn ∈ L(X, Xn ) and En ∈ L(Xn , X) satisfy Pn En = I |En Pn x − x|X → 0 for all x ∈ X Then, the followings are equivalent. (1) there exist a λ0 > ω such that for all x ∈ X |En (λ0 I − An )−1 Pn x − (λ0 I − A)−1 x|X → 0 as n → ∞, (2) For every x ∈ X and T ≥ 0 |En Sn (t)Pn x − S(t)x|X → 82 as n → ∞. Proof: Since for λ > ω En (λ I − A) −1 Pn x − (λ I − A) −1 Z ∞ En Sn (t)Pn x − S(t)x dt x= 0 (1) follows from (2). Conversely, from the representation theory En Sn (t)Pn x − S(t)x = 1 2πi Z γ+i∞ (En (λ I − A)−1 Pn x − (λ I − A)−1 x) dλ γ−i∞ where (λ I − A)−1 − (λ0 I − A)−1 = (λ − λ0 )(λ I − A)−1 (λ0 I − A)−1 . Thus, from the proof of Theorem (Resolvent Calculus) (1) holds for x ∈ dom(A2 ). But since dom(A2 ) is dense in X, (2) implies (1). Example (Trotter-Kato theoarem) Consider the heat equation on Ω = (0, 1) × (0, 1): d u(t) = ∆u, dt u(0, x) = u0 (x) with boundary condition u = 0 at the boundary ∂Ω. We use the central difference approximation on uniform grid points: (i h, j h) ∈ Ω with mesh size h = n1 : ui,j − ui−1,j ui,j − ui,j−1 d 1 ui+1,j − ui,j 1 ui,j+1 − ui,j ui,j (t) = ∆h u = ( − )+ ( − ) dt h h h h h h for 1 ≤ i, j ≤ n1 , where ui,0 = ui,n = u1, j = un,j = 0 at the boundary node. First, 2 let X = C(Ω) and Xn = R(n−1) with sup norm. Let En ui,j = the piecewise linear interpolation and (Pn u)i,j = u(i h, j h) is the pointwise evaluation. We will prove that ∆h is dissipative on Xn . Suppose uij = |un |∞ . Then, since λ ui,j − (∆h u)i,j = fij and −(∆h u)i,j = 1 (4ui,j − ui+1,j − ui,j+1 − ui−1,j − ui,j−1 ) ≥ 0 h2 we have 0 ≤ ui,j ≤ fi,j . λ Thus, ∆h is dissipative on Xn with sup norm. Next X = L2 (Ω) and Xn with `2 norm. Then, (−∆h un , un ) = X ui,j − ui−1,j ui,j − ui,j−1 2 | |2 + | | ≥0 h h i,j and thus ∆h is dissipative on Xn with `2 norm. Example 83 5 Monotone Operator Theory and Nonlinear semigroup In this section we discuss the nonlinear operator theory that extends the Lax-Milgram theory for nonlinear monotone operators and mappings from Banach space from X to X ∗ . Also, we extend the C0 semigroup theory to the nonlinear case. Let us denote by F : X → X ∗ , the duality mapping of X, i.e., F (x) = {x∗ ∈ X ∗ : hx, x∗ i = |x|2 = |x∗ |2 }. By Hahn-Banach theorem, F (x) is non-empty. In general F is multi-valued. Therefore, when X is a Hilbert space, h·, ·i coincides with its inner product if X ∗ is identified with X and F (x) = x. Let H be a Hilbert space with scalar product (φ, ψ) and X be a real, reflexive Banach space and X ⊂ H with continuous dense injection. Let X ∗ denote the strong dual space of X. H is identified with its dual so that X ⊂ H = H ∗ ⊂ X ∗ . The dual product hφ, ψi on X × X ∗ is the continuous extension of the scalar product of H restricted to X × H. The following proposition contains some further important properties of the duality mapping F . Theorem (Duality Mapping) (a) F (x) is a closed convex subset. (b) If X ∗ is strictly convex (i.e., balls in X ∗ are strictly convex), then for any x ∈ X, F (x) is single-valued. Moreover, the mapping x → F (x) is demicontinuous, i.e., if xn → x in X, then F (xn ) converges weakly star to F (x) in X ∗ . 
(c) Assume X be uniformly convex (i.e., for each 0 < < 2 there exists δ = δ() > 0 such that if |x| = |y| = 1and |x − y| > , then |x + y| ≤ 2(1 − δ)). If xn converges weakly to x and lim supn→∞ |xn | ≤ |x|, then xn converges strongly to x in X. (d) If X ∗ is uniformly convex, then the mapping x → F (x) is uniformly continuous on bounded subsets of X. Proof: (a) Closeness of F (x) is an easy consequence of the follows from the continuity of the duality product. Choose x∗1 , x∗2 ∈ F (x) and α ∈ (0, 1). For arbitrary z ∈ X we have (using |x∗1 | = |x∗2 | = |x|) hz, αx∗1 + (1 − α)x∗2 i ≤ α|z| |x∗1 | + (1 − α)|z| |x∗2 | = |z| |x|, which shows |αx∗1 + (1 − α)x∗2 | ≤ |x|. Using hx, x∗ i = hx, x∗1 i = |x|2 we get hx, αx∗1 + (1 − α)x∗2 i = αhx, x∗1 i + (1 − α)hx, x∗2 i = |x|2 , so that |αx∗1 + (1 − α)x∗2 | = |x|. This proves αx∗1 + (1 − α)x∗2 ∈ F (x). (b) Choose x∗1 , x∗2 ∈ F (x), α ∈ (0, 1) and assume that |αx∗1 +(1−α)x∗2 | = |x|. Since X ∗ is strictly convex, this implies x∗1 = x∗2 . Let {xn } be a sequence such that xn → x ∈ X. From |F (xn )| = |xn | and the fact that closed balls in X ∗ are weakly star compact we see that there exists a weakly star accumulation point x∗ of {F (xn )}. Since the closed ball in X ∗ is weakly star closed, thus hx, x∗ i = |x|2 ≥ |x∗ |2 . Hence hx, x∗ i = |x|2 = |x∗ |2 and thus x∗ = F (x). Since F (x) is single-valued, this implies F (xn ) converges weakly to F (x). (c) Since lim inf |xn | ≤ |x|, thus limn→∞ |xn | = |x|. We set yn = xn /|xn | and y = x/|x|. Then yn converges weakly to y in X. Suppose yn does not converge strongly to y in 84 X. Then there exists an > 0 such that for a subsequence yn˜ |yn˜ − y| > . Since X ∗ is uniformly convex there exists a δ > 0 such that |yn˜ + y| ≤ 2(1 − δ). Since the norm is weakly lower semicontinuos, letting n ˜ → ∞ we obtain |y| ≤ 1 − δ, which is a contradiction. (d) Assume F is not uniformly continuous on bounded subsets of X. Then there exist constants M > 0, > 0 and sequences {un }, {vn } in X satisfying |un |, |vn | ≤ M, |un − vn | → 0, and |F (un ) − F (vn )| ≥ . Without loss of the generality we can assume that, for a constant β > 0, we have in addition |un | ≥ β, |vn | ≥ β. We set xn = un /|un | and yn = vn /|vn |. Then we have |xn − yn | = ≤ 1 ||vn |un − |un |vn | |un | |vn | 2M 1 (|vn ||un − vn | + ||vn | − |un || |vn |) ≤ 2 |un − vn | → 0 β2 β as n → ∞. Obviously we have 2 ≥ |F (xn ) + F (yn )| ≥ hxn , F (xn ) + F (yn )i and this together with hxn , F (xn ) + F (yn )i = |xn |2 + |yn |2 + hxn − yn , F (yn )i = 2 + hxn − yn , F (yn )i ≥ 2 − |xn − yn | implies lim |F (xn ) + F (yn )| = 2. n→∞ Suppose there exists an 0 > 0 and a subsequence {nk } such that |F (xnk )−F (ynk )| ≥ 0 . Observing |F (xnk )| = |F (ynk )| = 1 and using uniform convexity of X ∗ we conclude that there exists a δ0 > 0 such that |F (xnk ) + F (ynk )| ≤ 2(1 − δ0 ), which is a contradiction to the above. Therefore we have limn→∞ |F (xn ) − F (yn )| = 0. Thus |F (un ) − F (vn )| ≤ |un | |F (xn ) − F (yn )| + ||un | − |vn || |F (yn )| which implies F (un ) converges strongly to F (vn ). This contradiction proves the result. Definition (Monotone Mapping) (a) A mapping A ⊂ X × X ∗ be given. is called monotone if hx1 − x2 , y1 − y2 i ≥ 0 for all [x1 , y1 ], [x2 , y2 ] ∈ A. (b) A monotone mapping A is called maximal monotone if any monotone extension of A coincides with A, i.e., if for [x, y] ∈ X × X ∗ , hx − u, y − vi ≥ 0 for all [u, v] ∈ A then [x, y] ∈ A. 
(c) The operator A is called coercive if for all sequences [xn , yn ] ∈ A with limn→∞ |xn | = ∞ we have hxn , yn i lim = ∞. n→∞ |xn | 85 (d) Assume that A is single-valued with dom(A) = X. The operator A is called hemicontinuouson X if for all x1 , x2 , x ∈ X, the function defined by t ∈ R → hx, A(x1 + tx2 )i is continuous on R. For example, let F be the duality mapping of X. Then F is monotone, coercive and hemicontinuous. Indeed, for [x1 , y1 ], [x2 , y2 ] ∈ F we have hx1 − x2 , y1 − y2 i = |x1 |2 − hx1 , y2 i − hx2 , y1 i + |x2 |2 ≥ (|x1 | − |x2 |)2 ≥ 0, (5.1) which shows monotonicity of F . Coercivity is obvious and hemicontinuity follows from the continuity of the duality product. Lemma 1 Let X be a finite dimensional Banach space and A be a hemicontinuous monotone operator from X to X ∗ . Then A is continuous. Proof: We first show that A is bounded on bounded subsets. In fact, otherwise there exists a sequence {xn } in X such that |Axn | → ∞ and xn → x0 as n → ∞. By monotonicity we have hxn − x, Ax Axn − i≥0 |Axn | |Axn | Without loss of generality we can assume that hx0 − x, y0 i ≥ 0 for all x ∈ X. Axn |Axn | → y0 in X ∗ as n → ∞. Thus for all x ∈ X and therefore y0 = 0. This is a contradiction and thus A is bounded. Now, assume {xn } converges to x0 and let y0 be a cluster point of {Axn }. Again by monotonicity of A hx0 − x, y0 − Axi ≥ 0 for all x ∈ X. Setting x = x0 + t (u − x0 ), t > 0 for arbitrary u ∈ X, we have hx0 − u, y0 − A(x0 + t (u − x0 )) ≥ 0i for all u ∈ X. Then, letting limit t → 0+ , by hemicontinuity of A we have hx0 − u, y0 − Ax0 i ≥ 0 for all u ∈ X, which implies y0 = Ax0 . Lemma 2 Let X be a reflexive Banach space and A : X → X ∗ be a hemicontinuous monotone operator. Then A is maximumal monotone. Proof: For [x0 , y0 ] ∈ X × X ∗ hx0 − u, y0 − Aui ≥ 0 for all u ∈ X. Setting u = x0 + t (x − x0 ), t > 0 and letting t → 0+ , by hemicontinuity of A we have hx0 − x, y0 − Ax0 i ≥ 0 86 for all x ∈ X. Hence y0 = Ax0 and thus A is maximum monotone. The next theorem characterizes maximal monotone operators by a range condition. Minty–Browder Theorem Assume that X, X ∗ are reflexive and strictly convex. Let F denote the duality mapping of X and assume that A ⊂ X × X ∗ is monotone. Then A is maximal monotone if and only if Range (λF + A) = X ∗ for all λ > 0 or, equivalently, for some λ > 0. Proof: Assume that the range condition is satisfied for some λ > 0 and let [x0 , y0 ] ∈ X × X ∗ be such that hx0 − u, y0 − vi ≥ 0 for all [u, v] ∈ A. Then there exists an element [x1 , y1 ] ∈ A with λF x1 + y1 = λF x0 + y0 . (5.2) From these we obtain, setting [u, v] = [x1 , y1 ], hx1 − x0 , F x1 − F x0 i ≤ 0. By monotonicity of F we also have the converse inequality, so that hx1 − x0 , F x1 − F x0 i = 0. From (5.10) this implies that |x1 | = |x0 | and hx1 , F x0 i = |x1 |2 , hx0 , F x1 i = |x0 |2 . Hence F x0 = F x1 and hx1 , F x0 i = hx0 , F x0 i = |x0 |2 = |F x0 |2 . If we denote by F ∗ the duality mapping of X ∗ (which is also single-valued), then the last equation implies x1 = x0 = F ∗ (F x0 ). This and (5.11) imply that [x0 , y0 ] = [x1 , y1 ] ∈ A, which proves that A is maximal monotone. In stead of the detailed proof of ”only if’ part of Theorem, we state the following results. Corollary Let X be reflexive and A be a monotone, everywhere defined, hemicontinous operator. If A is coercive, then R(A) = X ∗ . Proof: Suppose A is coercive. Let y0 ∈ X ∗ be arbitrary. By the Appland’s renorming theorem, we may assume that X and X ∗ are strictly convex Banach spaces. 
It then follows from Theorem that every λ > 0, equation λ F xλ + Axλ = y0 has a solution xλ ∈ X. Multiplying this by xλ , λ |xλ |2 + hxλ , Axλ i = hy0 , xλ i. 87 and thus hxλ , Axλ i ≤ |y0 |X ∗ |xλ |X Since A is coercive, this implies that {xλ } is bounded in X as λ → 0+ . Thus, we may assume that xλ converges weakly to x0 in X and Axλ converges strongly to y0 in X ∗ as λ → 0+ . Since A is monotone hxλ − x, y0 − λ F xλ − Axi ≥ 0, and letting λ → 0+ , we have hx0 − x, y0 − Axi ≥ 0, for all x ∈ X. Since A is maximal monotone, this implies y0 = Ax0 . Hence, we conclude R(A) = X ∗ . Theorem (Galerkin Approximation) Assume X is a reflexive, separable Banach space and A is a bounded, hemicontinuous, coercive monotone operator from X into X ∗ . Let Xn = span{φ}ni=1 satisfies the density condition: for each ψ ∈ X and any > 0 there exists a sequence ψn ∈ Xn such that |ψ − ψn | → 0 as n → ∞. The xn be the solution to hψ, Axn i = hψ, f i for all ψ ∈ Xn , (5.3) then there exists a subsequence of {xn } that converges weakly to a solution to Ax = f . Proof: Since hx, Axi/|x|X → ∞ as |x|X → ∞ there exists a solution xn to (5.14) and |xn |X is bounded. Since A is bounded, thus Axn bounded. Thus there exists a subsequence of {n} (denoted by the same) such that xn converges weakly to x in X and Axn converges weakly in X ∗ . Since lim hψ, Axn i = lim (hψn , f i + hψ − ψn , Axn i) = hψ, f i n→∞ n→∞ Axn converges weakly to f . Since A is monotone hxn − u, Axn − Aui ≥ 0 for all u ∈ X Note that lim hxn , Axn i = lim hxn , f i = hx, f i. n→∞ n→∞ Thus taking limit n → ∞, we obtain hx − u, f − Aui ≥ 0 for all u ∈ X. Since A is maximum monotone this implies Ax = f . The main theorem for monotone operators applies directly to the model problem involving the p-Laplace operator −div(|∇u|p−2 ∇u) = f on Ω (with appropriate boundary conditions) and −∆u + c u = f, − ∂ u ∈ β(u). at ∂Ω ∂n 88 with β maximal monotone on R. Also, nonlinear problems of non-variational form are applicable, e.g., Lu + F (u, ∇u) = f on Ω where L(u) = −div(σ(∇u) − ~b u) and we are looking for a solution u ∈ W01,p (Ω), 1 < p < ∞. We assume the following conditions: (i) Monotonicity for the principle part L(u): (σ(ξ) − σ(η), ξ − η)Rn ≥ 0 for all ξ, η ∈ Rn . (ii) Monotonicity for F = F (u): (F (u) − F (v), u − v) ≥ 0 for all u, v ∈ R. (iii) Coerciveness and Growth condition: for some c, d > 0 (σ(ξ), σ) ≥ c |ξ|p , |σ(ξ)| ≤ d (1 + |ξ|p−1 ) hold for all ξ ∈ Rn . 5.1 Pseudo-Monotone operator Next, we examine a somewhat more general class of nonlinear operators, called pseudomonotone operators. In applications, it often occurs that the hypotheses imposed on monotonicity are unnecessarily strong. In particular, the monotonicity assumption F (u) involves both the first-order derivatives and the function itself. In general the compactness will take care of the lower-order terms. Definition (Pseudo-Monotone Operator) Let X be a reflexive Banach space. An operator T : X → X ∗ is said to be pseudo-monotone operator if T is a bounded operator (not necessarily continuous) and if whenever un converges weakly to u in X and lim suphun − u, T (un )i ≤ 0, n→∞ it follows that, for all v ∈ X, lim inf hun − v, T (un )i ≥ hu − v, T (u)i. n→∞ (5.4) For example, a monotone and hemi-continuous operator is pseudo-monotone. Lemma If T is maximal monotone, then T is pseudo-monotone. Proof: Since T is monotone, hun − u, T (un )i ≥ hun − u, T (u)i → 0 it follows from (5.4) that limn hun − u, T (un )i = 0. Assume un → u and T (un ) → y weakly in X and X ∗ , respectively. 
Thus, 0 ≤ hun − v, T (un ) − T (v)i = hun − u + u − v, T (un ) − T (v)i → hu − v, y − T (v)i. 89 Since T is maximal monotone, we have y = T (u). Now, limhun − v, T (un )i = limhun − u + u − v, T (un )i = hu − v, T (u)i. n n A class of pseudo-monotone operators is intermediate between monotone and hemicontinuous. The sum of two pseudo-monotone operators is pseudo-monotone. Moreover, the sum of a pseudo-monotone and a strongly continuous operator (xn converges weakly to x, then T xn → T x strongly) is pseudo-monotone. Property of Pseudo-monotone operator (1) Letting v = u in (5.4) 0 ≤ liminf hun − u, T (un )i ≤ lim suphun − u, T (un )i ≤ 0 and thus limn hun − u, T (un )i = 0. (2) From (1) and (5.4) hu − v, T ui ≤ lim infhun − v, T (un )i + lim infhu − un − u, T (un )i = lim infhu − v, T (un )i (3) T (un ) → T (u) weakly. In fact, from(2) hv, T (u)i ≤ lim infhv, T (un )i ≤ lim suphv, T (un )i = lim suphun − u, T (un )i + lim suphu + v − un , T (un )i = − lim infhun − (u + v), T (un )i ≤ hv, T (u)i. (4) hun − u, T (un ) − T (u)i → 0 By the hypothesis 0 ≤ lim infhun − u, T (un )i = lim infhun − u, T (un )i − lim infhun − u, T (u)i = lim infhun − u, T (un ) − T (u)i ≤ lim suphun − u, T (un ) − T (u)i = lim suphun − u, T (un ) − T (u)i ≤ 0 (5) hun , T (un )i → hu, T (u)i. Since hun , T (un )i = hun − u, T (un ) − ui − hu, T (un )i − hun , T (u)i + hun , T (un )i the claim follows from (3)–(4). (6) Conversely, if T (un ) → T (u) weakly and hun , T (un )i → hu, T (u)i → 0, then the hypothesis holds. Using a very similar proof to that of the Browder-Minty theorem, one can show the following. Theorem (Bresiz) Assume, the operator A : X → X ∗ is pseudo-monotone, bounded and coercive on the real separable and reflexive Banach space X. Then, for each f ∈ X ∗ the equation A(u) = f has a solution. 90 5.2 Generalized Pseudo-Monotone Mapping In this section we discuss pseudo-monotone mappings. We extend the notion of the maximal monotone mapping, i.e., the maximal monotone mapping is a generalized pseudo-monotone mappings. Definition (Pseudo-monotone Mapping) (a) The set T (u) is nonempty, bounded, closed and convex for all u ∈ X. (b) T is upper semicontinuous from each finite-dimensional subspace F of X to the weak topology on X ∗ , i.e., to a given element u+ ∈ F and a weak neighborhood V of T (u), in X ∗ there exists a neighborhood U of u0 in F such that T (u) ⊂ V for all u ∈ U. (c) If {un } is a sequence in X that converges weakly tou ∈ X and if wn ∈ T (un ) is such that lim suphun − u, wn i ≤ 0, then to each element v ∈ X there exists w ∈ T (u) with the property that lim infhun − v, wn i ≥ hu − v, wi. Definition (Generalized Pseudo-monotone Mapping) A mapping T from X into X ∗ is said to be generalized pseudo-monotone if the following is satisfied. For any sequence {un } in X and a corresponding sequence {wn } in X ∗ with wn ∈ T (xn ), if un converges weakly to u and wn weakly to w such that lim suphun − u, wn )i ≤ 0, n→∞ then w ∈ T (u) and hun , wn i → hu, wi. Let X be a reflexive Banach space, T a pseudo-monotone mapping from X into X ∗ . Then T is generalized pseudo-monotone. Conversely, suppose T is a bounded generalized pseudo-monotone mapping from X into X ∗ . and assume that for each u ∈ X, T u is a nonempty closed convex subset of X ∗ . Then T is pseudo-monotone. PROPOSITION 2. A maximal monotone mapping T from the Banach space X into X ∗ is generalized pseudo-monotone. 
Proof: Let {un } be a sequence in X that converges weakly to u ∈ X, {wn } a sequence in X ∗ with wn ∈ T (un ) that converges weakly to w ∈ X ∗ . Suppose that lim suphun − u, wn i ≤ 0 Let [x, y] ∈ T be an arbitrary element of the graph G(T ). By the monotonicity of T , hun − x, wn − yi ≥ 0 Since hun , wn i = hun − x, wn − yi + hx, wn i + hun , yi − hx, yi, where hx, wn i + hun , yi → hx, wi + hu, yi − hx, yi, we have hu, wi ≤ lim suphun , wn i ≥ hx, wi + hu, yi − hx, yi, i.e., hu − x, w − yi ≥ 0 91 Since T is maximal, w ∈ T (u). Consequently, hun − u, wn − wi ≥ 0. Thus, lim infhun , wn i ≥ lim inf(hun , wihu, wn i + hu, wi = hu, wi, It follows that limhun , wn i → hu, wi. DEFINITION 5 (1) A mapping T from X into X ∗ is coercive if there exists a function c : R+ → R+ with limr∞ c(r) = ∞ such that hu, wi ≥ c(|u|)|u| for all [u, w] ∈ G(T ). (1) An operator T from X into X ∗ is called smooth if it is bounded, coercive, maximal monotone, and has effective domain D(T ) = X. (2) Let T be a generalized pseudo-monotone mapping from X into X ∗ . Then T is said to be regular if R(T + T2 ) = X ∗ for each smooth operator T2 . (3) T is said to be quasi-bounded if for each M > 0 there exists K(M ) > 0 such that whenever [u, w] ∈ G(T ) of T and hu, wi ≤ M |u|, |u| ≤ M, then |w| ≤ K(M ). T is said to be strongly quasi-bounded if for each M > 0 there exists K(M ) > 0 such that whenever [u, w] ∈ G(T ) of T and hu, wi ≤ M |u| ≤ M, then |w| ≤ K(M ). Theorem 1 Let T = T1 + T0 , where T1 is maximal monotone with 0 ∈ D(T ), while T0 is a regular generalized pseudo-monotone mapping such that for a given constant k, hu, wik |u| for all [u, w] ∈ G(T ). If either T0 is quasi-bounded or T1 is strongly quasi-bounded, then T is regular. Theorem 2 (Browder). Let X be a reflexive Banach space and T be a generalized pseudo-monotone mapping from X into X ∗ . Suppose that R( F + T ) = X ∗ for > 0 and T is coercive. Then R(T ) = X ∗ . Lemma 1 Let T be a generalized pseudo-monotone mapping from the reflexive Banach space X into X ∗ , and let C be a bounded weakly closed subset of X. Then T (C) = {w ∈ X ∗ : w ∈ T (u)for some u ∈ C} is closed in the strong topology of X ∗ . Proof: Let Let {wn } be a sequence in T (C) converging strongly to some w ∈ X ∗ . For each n, there exists un ∈ C such that wn ∈ T (un ). Since C is bounded and weakly closed, without loss of the generality we assume un → u ∈ C, weakly in X. It follows that limhun − u, wn i = 0. The generalized pseudo-monotonicity of T implies that w ∈ T (u), i.e., w lies in T (C). Proof of Theorem 2: Let J denote the normalized Let F denote the duality mapping of X into X ∗ . It is known that F is a smooth operator. Let w0 be a given element of X ∗ . We wish to show that w0 ∈ R(T ). For each > 0, it follows from the regularity of T that there exists an element u ∈ X such that w0 − p ∈ T (u ) for some p ∈ F (u ). 92 Let w = w0 − p . Since T is coercive and hu , w i = |u |2 , we have for some k |u |2 ≤ k |u | + |u ||w0 |. Hence |u | = |p | ≤ |w0 |. and |w | ≤ |w0 | + |p | ≤ k + 2|w0 | Since T is coercive, there exits M > 0 such that |u | ≤ M and thus |w − w0 | ≤ |p | = |u | → 0 as → 0+ It thus follows from Lemma 1 that T (B(0, M )) is closed and thus w0 ∈ T (B(0, M )), i.e., w0 ∈ R(T ). THEOREM 3 Let X be a reflexive Banach space and T a pseudo-monotone mapping from X into X ∗ . Suppose that T is coercive, R(T ) = X ∗ . PROPOSITION 10. 
Let F be a finite-dimensional Banach space, a mapping from F into F ∗ such that for each u ∈ F , T(u) is a nonempty bounded closed convex subset of F ∗ . Suppose that T is coercive and upper semicontinuous from F to F ∗ . Then R(T ) = F ∗ . Proof: Since for each w0 ∈ F ∗ define the mapping Tw0 : F → F ∗ by Tw0 (u) = T (u) − w0 satisfies the same hypotheses as the original mapping T . It thus suffices to prove that 0 ∈ R(T ). Suppose that 0 does not lie in R(T ). Then for each u ∈ F , there exists v(u) ∈ F such that inf hv(u), wi > 0 w∈T (u) Since T is coercive, there exists a positive constant R such that c(R) > 0, and hence for each u ∈ F with |u| = R and each w ∈ T (u), hu, wi ≥ λ > 0 where λ = c(R)R, For such points u ∈ F , we may take v(u) = u. For a given nonzero v0 in F let Wv0 = {u ∈ F : inf hv0 , wi > 0}. w∈T (u) The family {Wv0 : v0 ∈ F, v0 6= 0} forms a covering of the space F . By the upper semicontinuity of T , from F to F ∗ each Wv0 is open in F . Hence the family {Wv0 : v0 ∈ F, v0 6= 0} forms an open covering of F . We now choose a finite open covering {V1 , · · · , Vm } of the closed ball B(0, R) in F of radius R about the origin, with the property that for each k there exists an element vk in F such that Vk ⊂ Wvk and the additional condition that if Vk intersects the boundary sphere S(0, R), then vk is a point of Vk ∩ S(0, R) and the diameter of such Vk is less than R2 .. We next choose a partition of unity {α1 , · · · , αm } subordinated to the covering {V1 , · · · , Vm } where each αk is a continuous mapping of F into [0, I] which vanishes outside the corresponding 93 P set Vk , and with the property that k αk (x) = 1 for each x ∈ B(0, R). Using this partition of unity, we may define a continuous mapping f of B(0, R) into F by setting X f (x) = α(x)vk . k We note that for each k for which alphak (x) > 0 and for any winT0 x, we have hvk , wi > 0 since x must lie in Vk , which is a subset of Wk . Hence for each x ∈ B(0, R) and any w ∈ T0 x X hf (x), wi = α(x)hvk , wi > 0 k Consequently, f (x) 6= 0 for any x ∈ B(0, R). This implies that the degree of the mapping f on B(0, R) with respect to 0 equals 0. For x ∈ S(0, R), on the other hand, f (x) is a convex linear combination of points vk ∈ S(0, R), each of which is at distance at most R2 from the point x. We infer that |f (x) − x| ≤ R , x ∈ S(0, R). 2 By this inequality, f considered as a mapping from B(0, R) into F is homotopic to the identity mapping. Hence the degree of f on B(0, R) with respect to 0 is 1. This contradiction proves that 0 must lie in T0 u for some u ∈ B(0, R). Proof of Theorem 3 Let Λ be the family of all finite-dimensional subspaces F of X, ordered by inclusion. For F ∈ Λ, let jF : F → X denote the inclusion mapping of F into X, and jF∗ : X ∗ → F ∗ the dual projection mapping of X ∗ onto F ∗ . The operator TF = jF∗ T jF then maps F into F ∗ . For each u ∈ F , TF u is a weakly compact convex subset of X ∗ and jF∗ is continuous from the weak topology on X ∗ to the (unique) topology on F ∗ . Hence TF u is a nonempty closed convex subset of F ∗ . Since T is upper semicontinuous from F to F ∗ with X ∗ given its weak topology, TF is upper semicontinuous from F to F ∗ . By Proposition 10, to each F ∈ Λ there exists an element uF ∈ F such that jF∗ w0 = wF for some wF ∈ TF uF . The coerciveness of T implies that wF ∈ TF uF , i.e. huF , jF∗ w0 i = huF , wF i ≥ c(|uF |)|uF |. Consequently, the elements {|uF |} are uniformly bounded by M for all F ∈ Λ. For F ∈ Λ, let [ VF = {uF 0 }. 
F 0 ∈Λ, F 0 ⊂F Then the set VF is contained in the closed ball B(0, M ) in X Since B(0, M ) is weakly compact, and since the family {weak closure(VF )} has the finite intersection property, the intersection ∩F ∈Λ {weak closure(VF )} is not empty. Let u0 be an element contained in this intersection. The proof will be complete if we show that 0 ∈ Tw0 u0 . Let v ∈ X be arbitrarily given. We choose F ∈ Λ such that it contains u0 , v ∈ F Let {uFk }denote a sequence in VF converging weakly to u0 . Since jF∗ k wFk = 0, we have huFk − u0 , wFk i = 0 for all k. 94 Consequently, by the pseudo-monotonicity of T , to given v ∈ X there exists w(v) ∈ T (u0 ) with (5.5) limhuFk − v, wFk i ≥ hu0 − v, wi. Suppose now that 0 ∈ / T (u0 ). Then 0 can be separated from the nonempty closed convex set T (u0 ), i.e., there exists an element x = u − v ∈ X such that inf hu0 − v, zi > 0. z∈T (u0 ) But this is a contradiction to (5.5). Navier-Stokes equations Example Consider the constrained minimization min F (y) subject to E(y) = 0. where E(y) = E0 y + f (y). The necessary optimality condition for x = (y, p) ∂F − E 0 (y)∗ p A(y, p) ∈ . E(y) Let A0 (y, p) ∈ and ∂F − E0∗ p E0 y A1 (y, p) = −f 0 (y)p f (y) , . 5.3 Dissipative Operators and Semigroup of Nonlinear Contractions In this section we consider du ∈ Au(t), dt u(0) = u0 ∈ X for the dissipative mapping A on a Banach space X. Definition (Dissipative) A mapping A on a Banach space X is dissipative if |x1 − x2 − λ (y1 − y2 )| ≥ |x1 − x2 | for all λ > 0 and [x1 , y1 ], [x2 , y2 ] ∈ A, or equivalently hy1 − y2 , x1 − x2 i− ≤ 0 for all [x1 , y1 ], [x2 , y2 ] ∈ A. and if in addition R(I − λA) = X, then A is m-dissipative. In particular, it follows that if A is dissipative, then for λ > 0 the operator (I − λ A)−1 is a single-valued and nonexpansive on R(I − λ A), i.e., |(I − λ A)−1 x − (I − λ A)−1 y| ≤ |x − y| 95 for all x, y ∈ R(I − λ A). Define the resolvent and Yosida approximation A by Jλ x = (I − λ A)−1 x ∈ dom (A), x ∈ dom (Jλ ) = R(I − λ A) (5.6) Aλ = λ−1 (Jλ x − x), x ∈ dom (Jλ ). We summarize some fundamental properties of Jλ and Aλ in the following theorem. Theorem 1.4 Let A be an ω− dissipative subset of X × X, i.e., |x1 − x2 − λ (y1 − y2 )| ≥ (1 − λω) |x1 − x2 | for all 0 < λ < ω −1 and [x1 , y1 ], [x2 , y2 ] ∈ A and define kAxk by kAxk = inf{|y| : y ∈ Ax}. Then for 0 < λ < ω −1 , (i) |Jλ x − Jλ y| ≤ (1 − λω)−1 |x − y| for x, y ∈ dom (Jλ ). (ii) Aλ x ∈ AJλ x for x ∈ R(I − λ A). (iii) For x ∈ dom(Jλ ) ∩ dom (A) |Aλ x| ≤ (1 − λω)−1 kAxk and thus |Jλ x − x| ≤ λ(1 − λω)−1 kAxk. (iv) If x ∈ dom (Jλ ), λ, µ > 0, then µ λ−µ x+ Jλ x ∈ dom (Jµ ) λ λ and Jλ x = Jµ λ−µ µ x+ Jλ x . λ λ (v) If x ∈ dom (Jλ )∩dom (A) and 0 < µ ≤ λ < ω −1 , then (1−λω)|Aλ x| ≤ (1−µω) |Aµ y| (vi) Aλ is ω −1 (1 − λω)−1 -dissipative and for x, y ∈ dom(Jλ ) |Aλ x − Aλ y| ≤ λ−1 (1 + (1 − λω)−1 ) |x − y|. Proof: (i) − −(ii) If x, y ∈ dom (Jλ ) and we set u = Jλ x and v = Jλ y, then there exist u ˆ and vˆ such that x = u − λ u ˆ and y = v − λ vˆ. Thus, from (1.8) |Jλ x − Jλ y| = |u − v| ≤ (1 − λω)−1 |u − v − λ (ˆ u − vˆ)| = (1 − λω)−1 |x − y|. Next, by the definition Aλ x = λ−1 (u − x) = u ˆ ∈ Au = AJλ x. (iii) Let x ∈ dom (Jλ )∩dom (A) and x ˆ ∈ Ax be arbitrary. Then we have Jλ (x−λ x ˆ) = x since x − λ x ˆ ∈ (I − λ A)x. Thus, |Aλ x| = λ−1 |Jλ x−x| = λ−1 |Jλ x−Jλ (x−λ x ˆ)| ≤ (1−λω)−1 λ−1 |x−(x−λ x ˆ)| = (1−λω)−1 |ˆ x|. which implies (iii). (iv) If x ∈ dom (Jλ ) = R(I − λ A) then we have x = u − λ u ˆ for [u, u ˆ] ∈ A and thus u = Jλ x. 
For µ > 0 µ λ−µ µ λ−µ x+ Jλ x = (u − λ u ˆ) + u = u − µu ˆ ∈ R(I − µ A) = dom (Jµ ). λ λ λ λ and Jµ µ λ−µ x+ Jλ x λ λ = Jµ (u − µ u ˆ) = u = Jλ x. 96 (v) From (i) and (iv) we have λ |Aλ x| = |Jλ x − x| ≤ |Jλ x − Jλ y| + |Jµ x − x| µ λ−µ ≤ Jµ x+ Jλ x − Jµ x + |Jµ x − x| λ λ λ−µ λ x + λ Jλ x − x + |Jµ x − x| −1 µ ≤ (1 − µω) = (1 − µω)−1 (λ − µ) |Aλ x| + µ |Aµ x|, which implies (v) by rearranging. (vi) It follows from (i) that for ρ > 0 |x − y − ρ (Aλ x − Aλ y)| = |(1 + ≥ (1 + ρ ρ ) (x − y) − (Jλ x − Jλ y)| λ λ ρ ρ ) |x − y| − |Jλ x − Jλ y| λ λ ≥ ((1 + ρ ρ ) − (1 − λω)−1 ) |x − y| = (1 − ρ ω(1 − λω)−1 ) |x − y|. λ λ The last assertion follows from the definition of Aλ and (i). Theorem 1.5 (1) A dissipative set A ⊂ X × X is m-dissipative, if and only if R(I − λ0 A) = X for some λ0 > 0. (2) An m-dissipative mapping is maximal dissipative, i.e., all dissipative set containing A in X × X coincide with A. (3) If X = X ∗ = H is a Hilbert space, then the notions of the maximal dissipative set and m-dissipative set are equivalent. Proof: (1) Suppose R(I − λ0 A) = X. Then it follows from Theorem 1.4 (i) that Jλ0 is contraction on X. We note that λ0 λ0 I − λA = I − (1 − ) Jλ0 (I − λ0 A) (5.7) λ λ for 0 < λ < ω −1 . For given x ∈ X define the operator T : X → X by T y = x + (1 − λ0 ) Jλ0 y, λ Then |T y − T z| ≤ |1 − y∈X λ0 | |y − z| λ where |1 − λλ0 | < 1 if 2λ > λ0 . By Banach fixed-point theorem the operator T has a unique fixed point z ∈ X, i.e., x = (I − (1 − λλ0 Jλ0 )z. Thus, x ∈ (I − (1 − λ0 ) Jλ0 )(I − λ0 A)dom (A). λ 97 and it thus follows from (5.7) that R(I − λ A) = X if λ > λ20 . Hence, (1) follows from applying the above argument repeatedly. (2) Assume A is m-dissipative. Suppose A˜ is a dissipative set containing A. We need to show that A˜ ⊂ A. Let [x, x ˆ] ∈ A˜ Since x − λ x ˆ ∈ X = R(I − λ A), for λ > 0, there exists a [y, yˆ] ∈ A such that x − λ x ˆ = y − λ yˆ. Since A ⊂ A˜ it follows that [y, yˆ] ∈ A and thus |x − y| ≤ |x − y − λ (ˆ x − yˆ)| = 0. Hence, [x, x ˆ] = [y, yˆ] ∈ A. (3) It suffices to show that if A is maximal dissipative, then A is m-dissipative. We use the following extension lemma, Lemma 1.6. Let y be any element of H. By Lemma 1.6, taking C = H, we have that there exists x ∈ H such that (ξ − x, η − x + y) ≤ 0 for all [ξ, η] ∈ A. and thus (ξ − x, η − (x − y)) ≤ 0 for all [ξ, η] ∈ A. Since A is maximal dissipative, this implies that [x, x − y] ∈ A, that is x − y ∈ Ax, and therefore y ∈ R(I − H). Lemma 1.6 Let A be dissipative and C be a closed, convex, non-empty subset of H such that dom (A) ∈ C. Then for every y ∈ H there exists x ∈ C such that (ξ − x, η − x + y) ≤ 0 for all [ξ, η] ∈ A. Proof: Without loss of generality we can assume that y = 0, for otherwise we define Ay = {[ξ, η + y] : [ξ, η] ∈ A} with dom (Ay ) = dom (A). Since A is dissipative if and only if Ay is dissipative , we can prove the lemma for Ay . For [ξ, η] ∈ A, define the set C([ξ, η]) = {x ∈ C : (ξ − x, η − x) ≤ 0}. T Thus, the lemma is proved if we can show that [ξ,η]∈A C([ξ, η]) is non-empty. 5.3.1 Properties of m-dissipative operators In this section we discuss some properties of m-dissipative sets. Lemma 1.7 Let X ∗ be a strictly convex Banach space. If A is maximal dissipative, then Ax is a closed convex set of X for each x ∈ dom (A). Proof: It follows from Lemma that the duality mapping F is single-valued. First, we show that Ax is convex. Let x ˆ1 , xˆ2 ∈ Ax and set x ˆ = αˆ x1 + (1 − α) x ˆ2 for 0 ≤ α ≤ 1. 
Then, Since A is dissipative, for all [y, yˆ] ∈ A Re hˆ x − yˆ, F (x − y)i = α Re hˆ x1 − yˆ, F (x − y)i + (1 − α) Re hˆ x2 − yˆ, F (x − y)i ≤ 0. Thus, if we define a subset A˜ by if z ∈ dom (A) \ {x} Az ˜ Az = Ax ∪ {ˆ x} if z = x, 98 ˜ = dom (A). Since A is maximal then A˜ is a dissipative extension of A and dom (A) ˜ = Ax and thus x dissipative, it follows that Ax ˆ ∈ Ax as desired. Next, we show that Ax is closed. Let x ˆn ∈ Ax and limn→∞ x ˆn = x ˆ. Since A is dissipative, Re hˆ xn − yˆ, x − yi ≤ 0 for all [y, yˆ] ∈ A. Letting n → ∞, we obtain Re hˆ x − yˆ, x − yi ≤ 0. Hence, as shown above x ˆ ∈ Ax as desired. Definition 1.4 A subset A of X × X is said to be demiclosed if xn → x and yn * y and [xn , yn ] ∈ A imply that [x, y] ∈ A. A subset A is closed if [xn , yn ], xn → x and yn → y imply that [x, y] ∈ A. Theorem 1.8 Let A be m-dissipative. Then the followings hold. (i) A is closed. (ii) If {xλ } ⊂ X such that xλ → x and Aλ xλ → y as λ → 0+ , then [x, y] ∈ A. Proof: (i) Let [xn , x ˆn ] ∈ A and (xn , x ˆn ) → (x, x ˆ) in X × X. Since A is dissipative Re hˆ xn − yˆ, xn − yii ≤ 0 for all [y, yˆ] ∈ A. Since h·, ·ii is lower semicontinuous, letting n → ∞, we obtain Re hˆ x − yˆ, x − yii ≤ for all [y, yˆ] ∈ A. Then A1 = [x, x ˆ] ∪ A is a dissipative extension of A. Since A is maximal dissipative, A1 = A and thus [x, x ˆ] ∈ A. Hence, A is closed. (ii) Since {Aλ x} is a bounded set in X, by the definition of Aλ , lim |Jλ xλ − xλ | → 0 and thus Jλ xλ → x as λ → 0+ . But, since Aλ xλ ∈ AJλ xλ , it follows from (i) that [x, y] ∈ A. Theorem 1.9 Let A be m-dissipative and let X ∗ be uniformly convex. Then the followings hold. (i) A is demiclosed. (ii) If {xλ } ⊂ X such that xλ → x and {|Aλ x|} is bounded as λ → 0+ , then x ∈ dom (A). Moreover, if for some subsequence Aλn xn * y, then y ∈ Ax. (iii) limλ→0+ |Aλ x| = kAxk. Proof: (i) Let [xn , x ˆn ] ∈ A be such that lim xn = x and w − lim x ˆn = x ˆ as n → ∞. Since X ∗ is uniformly convex, from Lemma the duality mapping is single-valued and uniformly continuous on the bounded subsets of X. Since A is dissipative Re hˆ xn − yˆ, F (xn − y)i ≤ 0 for all [y, yˆ] ∈ A. Thus, letting n → ∞, we obtain Re hˆ x − yˆ, F (x − y)i ≤ 0 for all [y, yˆ] ∈ A. Thus, [x, x ˆ] ∈ A, by the maximality of A. Definition 1.5 The minimal section A0 of A is defined by A0 x = {y ∈ Ax : |y| = kAxk} with dom (A0 ) = {x ∈ dom (A) : A0 x is non-empty}. Lemma 1.10 Let X ∗ be a strictly convex Banach space and let A be maximal dissipative. Then, the followings hold. (i) If X is strictly convex, then A0 is single-valued. (ii) If X reflexible, then dom (A0 ) = dom (A). (iii) If X strictly convex and reflexible, then A0 is single-valued and dom (A0 ) = dom (A). Theorem 1.11 Let X ∗ is a uniformly convex Banach space and let A be m-dissipative. Then the followings hold. (i) limλ→0+ F (Aλ x) = F (A0 x) for each x ∈ dom (A). Moreover, if X is also uniformly convex, then (ii) limλ→0+ Aλ x = A0 x for each x ∈ dom (A). 99 Proof: (1) Let x ∈ dom (A). By (ii) of Theorem 1.2 |Aλ x| ≤ kAxk Since {Aλ x} is a bounded sequence in a reflexive Banach space (i.e., since X ∗ is uniformly convex, X ∗ is reflexive and so is X), there exists a weak convergent subsequence {Aλn x}. Now we set y = w − limn→∞ Aλn x. Since from Theorem 1.2 Aλn x ∈ AJλn x and limn→∞ Jλn x = x and from Theorem 1.10 A is demiclosed, it follows that [x, y] ∈ A. Since by the lower-semicontinuity of norm this implies kAxk ≤ |y| lim inf |Aλn x| ≤ lim sup |Aλn x| ≤ kAxk, n→∞ n→∞ we have |y| = kAxk = limn→∞ |Aλn x| and thus y ∈ A0 x. 
Next, since |F (Aλn x)| = |Aλn x| ≤ kAxk, F (Aλn x) is a bounded sequence in the reflexive Banach space X ∗ and has a weakly convergent subsequence F (Aλk x) of F (Aλn x). If we set y ∗ = w − limk→∞ F (Aλk x), then it follows from the dissipativity of A that Re hAλk x − y, F (Aλk x)i = λ−1 n Re hAλk x − y, F (Jλk xx)i ≤ 0, or equivalently |Aλk x|2 ≤ Re hy, F (Aλn x)i. Letting k → ∞, we obtain |y|2 ≤ Re hy, y ∗ i. Combining this with |y ∗ | ≤ lim |F (Aλk x)| = lim |Aλk x| = |y|, k→∞ k→∞ we have |y ∗ | ≤ Re hy, y ∗ i ≤ |hy, y ∗ i| ≤ |y||y ∗ | ≤ |y|2 . Hence, hy, y ∗ i = |y|2 = |y ∗ |2 and we have y ∗ = F (y). Also, limk→∞ |F (Aλk x)| = |y| = |F (y)|. It thus follows from the uniform convexity of X ∗ that lim F (Aλk x) = F (y). k→∞ Since Ax is a closed convex set of X from Theorem , we can show that x → F (A0 x) is single-valued. In fact, if C is a closed convex subset of X, the y is an element of minimal norm in C, if and only if |y| ≤ |(1 − α)y + α z| for all z ∈ C and 0 ≤ α ≤ 1. Hence, hz − y, yi+ ≥ 0. and from Theorem 1.10 0 ≤ Re hz − y, f i = Re hz, f i − |y|2 (5.8) for all z ∈ C and f ∈ F (y). Now, let y1 , y2 be arbitrary in A0 x. Then, from (5.8) |y1 |2 ≤ Re hy2 , F (y1 )i ≤ |y1 ||y2 | 100 which implies that hy2 , F (y1 )i = |y2 |2 and |F (y1 )| = |y2 |. Therefore, F (y1 ) = F (y2 ) as desired. Thus, we have shown that for every sequence {λ} of positive numbers that converge to zero, the sequence {F (Aλ x)} has a subsequence that converges to the same limit F (A0 x). Therefore, limλ→0 F (Aλ x) → F (A0 x). Furthermore, we assume that X is uniformly convex. We have shown above that for x ∈ dom (A) the sequence {Aλ } contains a weak convergent subsequence {Aλn x} and if y = w − limn→∞ Aλn x then [x, y] ∈ A0 and |y| = limn→∞ |Aλn x|. But since X is uniformly convex, it follows from Theorem 1.10 that A0 is single-valued and thus y = A0 x. Hence, w − limn→∞ Aλn x = A0 x and limn→∞ |Aλn x| = |A0 x|. Since X is uniformly convex, this implies that limn→∞ Aλn x = A0 x. Theorem 1.12 Let X is a uniformly convex Banach space and let A be m-dissipative. Then dom (A) is a convex subset of X. Proof: It follows from Theorem 1.4 that |Jλ x − x| ≤ λ kAxk for x ∈ dom (A) Hence |Jλ x − x| → 0 as λ → 0+ . Since Jλ x ∈ dom (A) for X ∈ X, it follows that dom (A) = {x ∈ X : |Jλ x − x| → 0 as λ → 0+ }. Let x1 , x2 ∈ dom (A) and 0 ≤ α ≤ 1 and set x = α x1 + (1 − α) x2 . Then, we have |Jλ x − x1 | ≤ |x − x1 | + |Jλ x1 − x1 | (5.9) |Jλ x − x2 | ≤ |x − x2 | + |Jλ x2 − x2 | where x − x1 = (1 − α) (x2 − x1 ) and x − x2 = α (x1 − x2 ). Since {Jλ x} is a bounded set and a uniformly convex Banach space is reflexive, it follows that there exists a subsequence {Jλn x} that converges weakly to z. Since the norm is weakly lower semicontinuous, letting n → ∞ in (5.9) with λ = λn , we obtain |z − x1 | ≤ (1 − α) |x1 − x2 | |z − x2 | ≤ α |x1 − x2 | Thus, |x1 − x2 | = |(x1 − z) + (z − x2 )| ≤ |x1 − z| + |z − x2 | ≤ |x1 − x2 | and therefore |x1 −z| = (1−α) |x1 −x2 |, |z −x2 | = α |x1 −x2 | and |(x1 −z)+(z −x2 )| = |x1 − x2 |. But, since X is uniformly convex we have z = x and w − lim Jλn x = x as n → ∞. Since we also have |x − x1 | ≤ lim inf |Jλn x − Jλn x1 | ≤ |x − x1 | n→∞ |Jλn x − Jλn x1 | → |x − x1 | and w − lim Jλn x − Jλn x1 = x − x1 as n → ∞. Since X is uniformly convex, this implies that limλ→0+ Jλ x = x and x ∈ dom (A). 101 5.3.2 Generation of Nonlinear Semigroups Definition 2.1 Let X0 be a subset of X. 
A semigroup S(t), t ≥ of nonlinear contractions on X0 is a function with domain [0, ∞) × X0 and range in X0 satisfying the following conditions: S(t + s)x = S(t)S(s)x and S(0)x = x for x ∈ X0 , t, s ≥ 0 t → S(t)x ∈ X is continuous |S(t)x − S(t)y| ≤ |x − y| for t ≥ 0, x, y ∈ X0 In this section, we consider the generation of nonlinear semigroup by CrandallLiggett on a Banach space X. Let A be a ω-dissipative operator and Jλ = (I − λ A)−1 is the resolvent. The following estimate plays an essential role in the Crandall-Liggett generation theory. Lemma 2.1 Assume a sequence {an,m } of positive numbers satisfies an,m ≤ α an−1,m−1 + (1 − α) an−1,m (5.10) and a0,m ≤ m λ and an,0 ≤ n µ for λ ≥ µ > 0 and α = µλ . Then we have the estimate 1 1 an,m ≤ [(mλ − nµ)2 + mλ2 ] 2 + [(mλ − nµ)2 + nλµ] 2 . (5.11) Proof: From the assumption, (5.11) holds for either m = 0 or n = 0. We will use the induction in n, m, that is if (5.11) holds for (n + 1, m) when (5.11) is true for (n, m) and (n, m − 1), then (5.11) holds for all (n, m). Let β = 1 − α. We assume that (5.11) holds for (n, m) and (n, m − 1). Then, by (5.10) and Cauchy-Schwarz inequality 1 1 αx + βy ≤ (α + β) 2 (αx2 + βy 2 ) 2 an+1,m ≤ α an,m−1 + β an,m 1 1 ≤ α ([((m − 1)λ − nµ)2 + (m − 1)λ2 ] 2 + [((m − 1)λ − nµ)2 + nλµ] 2 ) 1 1 +β ([(mλ − nµ)2 + mλ2 ] 2 + [(mλ − nµ)2 + nλµ] 2 ) 1 1 = (α + β) 2 (α [((m − 1)λ − nµ)2 + (m − 1)λ2 ] + β [(mλ − nµ)2 + mλ2 ]) 2 1 1 +(α + β) 2 (α [((m − 1)λ − nµ)2 + nλµ] + β [(mλ − nµ)2 + nλµ]) 2 1 1 ≤ [(mλ − (n + 1)µ)2 + mλ2 ] 2 + [(λ − (n + 1)µ)2 + (n + 1)λµ] 2 . Here, we used α + β = 1, αλ = µ and α [((m − 1)λ − nµ)2 + (m − 1)λ2 ] + β [(mλ − nµ)2 + mλ2 ]) ≤ (mλ − nµ)2 + mλ2 − αλ(mλ − nµ) = (mλ − (n + 1)µ)2 + mλ2 − µ2 α [((m − 1)λ − nµ)2 + nλµ] + β [(mλ − nµ)2 + nλµ] ≤ (mλ − nµ)2 + (n + 1)λµ − 2αλ(mλ − nµ) ≤ (mλ − (n + 1)µ)2 + (n + 1)λµ − µ2 . 102 Theorem 2.2 Assume A be a dissipative subset of X × X and satisfies the range condition dom (A) ⊂ R(I − λ A) for all sufficiently small λ > 0. (5.12) Then, there exists a semigroup of type ω on S(t) on dom (A) that satisfies for x ∈ dom (A) t S(t)x = lim (I − λ A)−[ λ ] x, t ≥ 0 (5.13) λ→0+ and |S(t)x − S(t)x| ≤ |t − s| kAxk for x ∈ dom (A), and t, s ≥ 0. Proof: First, note that from (5.14) dom (A) ⊂ dom (Jλ ). Let x ∈ dom (A) and set an,m = |Jµn x − Jλm x| for n, m ≥ 0. Then, from Theorem 1.4 a0,m = |x − Jλm x| ≤ |x − Jλ x| + |Jλ x − Jλ2 x| + · · · + |Jλm−1 x − Jλm x| ≤ m |x − Jλ x| ≤ mλ kAxk. Similarly, an,0 = |Jµn x − x| ≤ nµ kAxk. Moreover, λ−µ m µ m−1 n n m x+ J Jλ x | an,m = |Jµ x − Jλ x| ≤ |Jµ x − Jµ λ λ λ ≤ µ n−1 λ − µ n−1 |J x − Jλm−1 x| + |Jµ x − Jλm x| λ µ λ = α an−1,m−1 + (1 − α) an−1,m . It thus follows from Lemma 2.1 that [t] [t] 1 |Jµµ x − Jλλ x| ≤ 2(λ2 + λt) 2 kAxk. (5.14) [t] Thus, Jλλ x converges to S(t)x uniformly on any bounded intervals, as λ → 0+ . Since [t] Jλλ is non-expansive, so is S(t). Hence (5.13) holds. Next, we show that S(t) satisfies the semigroup property S(t + s)x = S(t)S(s)x for x ∈ dom (A) and t, s ≥ 0. Letting µ → 0+ in (??), we obtain [t] 1 |T (t)x − Jλλ x| ≤ 2(λ2 + λt) 2 . (5.15) [ λs ] for x ∈ dom (A). If we let x = Jλ z, then x ∈ dom (A) and [s] [t] [s] 1 |S(t)Jλλ z − Jλλ Jλλ z| ≤ 2(λ2 + λt) 2 kAzk (5.16) [ λs ] t s where we used that kAJλ zk ≤ kAzk for z ∈ dom (A). Since [ t+s λ ] − ([ λ ] + [ λ ]) equals 0 or 1, we have ] [ t+s λ |Jλ [t] [s] z − Jλλ Jλλ z| ≤ |Jλ z − z| ≤ λ kAzk. 
(5.17) It thus follows from (5.15)–(5.17) that [ t+s ] λ |S(t + s)z − S(t)S(s)z| ≤ |S(t + s)x − Jλ [t] [s] [s] [s] [ t+s ] λ z| + |Jλ +|Jλλ Jλλ z − S(t)Jλλ z| + |S(t)Jλλ z − S(t)S(s)z| → 0 103 [t] [s] z − Jλλ Jλλ z| as λ → 0+ . Hence, S(t + s)z = S(t)S(s)z. Finally, since t [t] |Jλλ x − x| ≤ [ ] |Jλ x − x| ≤ t kAxk λ we have |S(t)x − x| ≤ t kAxk for x ∈ dom (A). From this we obtain |S(t)x − x| → 0 as t → 0+ for x ∈ dom (A) and also |S(t)x − S(s)x| ≤ |S(t − s)x − x| ≤ (t − s) kAxk for x ∈ dom (A) and t ≥ s ≥ 0. 6 Cauchy Problem Definition 3.1 Let x0 ∈ X and ω ∈ R. Consider the Cauchy problem d u(t) ∈ Au(t), dt u(0) = x0 . (6.1) (1) A continuous function u(t) : [0, T ] → X is called a strong solution of (6.1) if u(t) is Lipschitz continuous with u(0) = x0 , strongly differentiable a.e. t ∈ [0, T ], and (6.1) holds a.e. t ∈ [0, T ]. (2) A continuous function u(t) : [0, T ] → X is called an integral solution of type ω of (6.1) if u(t) satisfies |u(t) − x| − |u(tˆ) − x| ≤ t Z tˆ (ω |u(s) − x| + hy, u(s) − xi+ ) ds (6.2) for all [x, y] ∈ A and t ≥ tˆ ∈ [0, T ]. Theorem 3.1 Let A be a dissipative subset of X × X. Then, the strong solution to (6.1) is unique. Moreover if the range condition (5.14) holds, then then the strong solution u(t) : [0, ∞) → X to (6.1) is given by t u(t) = lim (I − λ A)−[ λ ] x λ→0+ for x ∈ dom (A) and t ≥ 0. Proof: Let ui (t), i = 1, 2 be the strong solutions to (6.1). Then, t → |u1 (t) − u2 (t)| is Lipschitz continuous and thus a.e. differentiable t ≥ 0. Thus d |u1 (t) − u2 (t)|2 = 2 hu01 (t) − u02 (t), u1 (t) − u2 (t)ii dt a.e. t ≥ 0. Since u0i (t) ∈ Aui (t), i = 1, 2 from the dissipativeness of A, we have d 2 dt |u1 (t) − u2 (t)| ≤ 0 and therefore 2 Z |u1 (t) − u2 (t)| ≤ 0 t d |u1 (t) − u2 (t)|2 dt ≤ 0, dt which implies u1 = u2 . t For 0 < 2λ < s let uλ (t) = (I − λ A)−[ λ ] x and define gλ (t) = λ−1 (u(t) − u(t − λ)) − u0 (t) a.e. t ≥ λ. Since limλ→0+ |gλ | = 0 a.e. t > 0 and |gλ (t)| ≤ 2M for a.e. t ∈ [λ, s], 104 where M is a Lipschitz constant of u(t) on [0, s], it follows that limλ→0+ by Lebesgue dominated convergence theorem. Next, since Rs λ |gλ (t)| dt = 0 u(t − λ) + λ gλ (t) = u(t) − λ u0 (t) ∈ (I − λ A)u(t), we have u(t) = (I − λ A)−1 (u(t − λ) + λ gλ (t)). Hence, |uλ (t) − u(t)| ≤ |(I − λ A)−[ t−λ ] λ x − u(t − λ) − λ gλ (t)| ≤ |uλ (t − λ) − u(t − λ)| + λ |gλ (t)| a.e. t ∈ [λ, s]. Integrating this on [λ, s], we obtain Z Z λ Z s −1 −1 |uλ (t) − u(t)| dt + |uλ (t) − u(t)| dt ≤ λ λ |gλ (t)| dt. λ 0 s−λ s Letting λ → 0+ , it follows from Theorem 2.2 that |S(s)x−u(s)| = 0 since u is Lipschitz continuous, which shows the desired result. In general, the semigroup S(t) generated on dom (A) in Theorem 2.2 in not necessary strongly differentiable. In fact, an example of an m-dissipative A satisfying (5.14) is given, for which the semigroup constructed in Theorem 2.2 is not even weakly differentiable for all t ≥ 0. Hence, from Theorem 3.1 the corresponding Cauchy problem (6.1) does not have a strong solution. However, we have the following. Theorem 3.2 Let A be an ω-dissipative subset satisfying (5.14) and S(t), t ≥ 0 be the semigroup on dom (A), as constructed in Theorem 2.2. Then, the followings hold. (1) u(t) = S(t)x on dom (A) defined in Theorem 2.2 is an integral solution of type ω to the Cauchy problem (6.1). (2) If v(t) ∈ C(0, T ; X) be an integral of type ω to (6.1), then |v(t) − u(t)| ≤ eωt |v(0) − u(0)|. (3) The Cauchy problem (6.1) has a unique solution in dom (A) in the sense of Definition 2.1. 
Proof: A simple modification of the proof of Theorem 2.2 shows that for x0 ∈ dom (A) t S(t)x0 = lim (I − λ A)−[ λ ] x0 λ→0+ exists and defines the semigroup S(t) of nonlinear ω-contractions on dom (A), i.e., |S(t)x − S(t)y| ≤ eωt |x − y| for t ≥ 0 and x, y ∈ dom (A). For x0 ∈ dom (A) we define for λ > 0 and k ≥ 1 yλk = λ−1 (Jλk x0 − Jλk−1 x0 ) = Aλ Jλk−1 x0 ∈ AJλk . (6.3) Since A is ω-dissipative, hyλ,k − y, Jλk x0 − x)i− ≤ ω |Jλk x0 − x)| for [x, y] ∈ A. Since from Lemma 1.1 (4) hy, xi− − hz, xi+ ≤ hy − z, xi− , it follows that hyλk , Jλk x0 − xi− ≤ ω |Jλk x0 − x)| + hy, Jλk x0 − xi+ (6.4) Since from Lemma 1.1 (3) hx + y, xi− = |x| + hy, xi− , we have hλyλk , Jλk x0 − x)i− = |Jλk x0 − x| + h−(Jλk−1 − x), Jλk x0 − xi≥ |Jλk x0 − x)| − |Jλk−1 x0 − x|. 105 It thus follows from (6.4) that |Jλk x0 − x| − |Jλk−1 x0 − x| ≤ λ (ω |Jλk x0 − x| + hy, Jλk x0 − xi+ ). t Since J [ λ ] = Jλk on t ∈ [kλ, (k + 1)λ), this inequality can be written as Z (k+1)λ [t [t] (ω |Jλλ x0 − x| + hy, Jλλ x0 − xi+ dt. |Jλk x0 − x| − |Jλk−1 x0 − x| ≤ kλ ˆ Hence, summing up this in k from k = [ λt ] + 1 to [ λt ] we obtain Z [ t ]λ ˆ λ [s] [ λt ] [ λt ] [s] (ω |Jλλ x0 − x| + hy, Jλλ x0 − xi+ ds. |Jλ x0 − x| − |Jλ x0 − x| ≤ ˆ [ λt ]λ [s] Since |Jλλ x0 | ≤ (1 − λω)−k |x0 | ≤ ekλω |x0 |, by Lebesgue dominated convergence theorem and the upper semicontinuity of h·, ·i+ , letting λ → 0+ we obtain Z t |S(t)x0 − x| − |S(tˆ)x0 − x| ≤ (ω |S(s)x0 − x| + hy, S(t)x0 − xi+ ) ds (6.5) tˆ for x0 ∈ dom (A). Similarly, since S(t) is Lipschitz continuous on dom (A), again by Lebesgue dominated convergence theorem and the upper semicontinuity of h·, ·i+ , (6.5) holds for all x0 ∈ dom (A). (2) Let v(t) ∈ C(0, T ; X) be an integral of type ω to (6.1). Since [Jkλ x0 , yλk ] ∈ A, it follows from (6.2) that Z t k k (ω |v(s) − Jλk x0 | + hyλk , v(s) − Jλk x0 i+ ) ds. (6.6) |v(t) − Jλ x0 | − |v(tˆ) − Jλ x0 | ≤ tˆ Since λ yλk = −(v(s) − Jλk x0 ) + (v(s) − Jλk−1 x0 ) and from Lemma 1.1 (3) h−x + y, xi+ = −|x| + hy, xi+ , we have hλyλk , v(s)−Jλk x0 i = −|v(s)−Jλk x0 |+hv(s)−Jλk−1 x0 , v(s)−Jλk x0 i+ ≤ −|v(s)−Jλk x0 |+|v(s)−Jλk−1 x0 |. Thus, from (6.6) (|v(t)−Jλk x0 |−|v(tˆ)−Jλk x0 |)λ Z ≤ tˆ t (ωλ |v(s)−Jλk x0 |−|v(s)−Jλk x0 |+|v(s)−Jλk−1 x0 |) ds Summing up the both sides of this in k from [ λτˆ ] + 1 to [ λτ ], we obtain Z [ τ ]λ λ [σ] [σ] (|v(t) − Jλλ x0 | − |v(tˆ − Jλλ x0 | dσ τ ˆ [λ ]λ t Z ≤ (−|v(s) − tˆ [τ ] Jλλ x0 | + |v(s) − [ τˆ ] Jλλ x0 | Z + τ [λ ]λ τ ˆ [λ ]λ [σ] ω |v(s) − Jλλ x0 | dσ) ds. Now, by Lebesgue dominated convergence theorem, letting λ → 0+ Z τ Z t (|v(t) − u(σ)| − |v(tˆ − u(σ)|) dσ + (|v(s) − u(τ )| − |v(s) − u(ˆ τ )|) ds tˆ τˆ (6.7) Z tZ ≤ τ ω |v(s) − u(σ)| dσ ds. tˆ τˆ 106 For h > 0 we define Fh by −2 t+h Z t+h Z |v(s) − u(σ)| dσ ds. Fh (t) = h t t d Fh (t) ≤ ω Fh (t) and thus Fh (t) ≤ eωt Fh (0). Since u, v are dt continuous we obtain the desired estimate by letting h → 0+ . Then from (6.7) we have Lemma 3.3 Let A be an ω-dissipative subset satisfying (5.14) and S(t), t ≥ 0 be the semigroup on dom (A), as constructed in Theorem 2.2. Then, for x0 ∈ dom (A) and [x, y] ∈ A Z t (6.8) |S(t)x0 − x|2 − |S(tˆ)x0 − x|2 ≤ 2 (ω |S(s)x0 − x|2 + hy, S(s)x0 − xis ds tˆ and for every f ∈ F (x0 − x) lim sup Re h t→0+ S(t)x0 − x0 , f i ≤ ω |x0 − x|2 + hy, x0 − xis . t (6.9) Proof: Let ykλ be defined by (6.3). Since A is ω-dissipative, there exists f ∈ F (Jλk x0 −x) such that Re (yλk − y, f i ≤ ω |Jλk x0 − x|2 . 
Since Re hyλk , f i = λ−1 Re hJλk x0 − x − (Jλk−1 x0 − x), f i ≥ λ−1 (|Jλk x0 − x|2 − |Jλk−1 x0 − x||Jλk x0 − x|) ≥ (2λ)−1 (|Jλk x0 − x|2 − |Jλk−1 x0 − x|2 ), we have from Theorem 1.4 |Jλk x0 − x|2 − |Jλk−1 x0 − x|2 ≤ 2λ Re hyλk , f i ≤ 2λ (ω |Jλk x0 − x|2 + hy, Jλk x0 − xis . [t] Since Jλλ x0 = Jλk x0 on [kλ, (k + 1)λ), this can be written as |Jλk x0 |Jλk−1 x0 2 − x| − Z 2 (k+1)λ [t [t] (ω |Jλλ x0 − x|2 + hy, Jλλ x0 − xis ) dt. − x| ≤ kλ Hence, [t] |Jλλ x0 ˆ 2 − x| − [t] |Jλλ x0 Z 2 − x| ≤ 2λ [ λt ]λ ˆ [ λt ]λ [s] [s] (ω |Jλλ x0 − x|2 + hy, Jλλ x0 − xis ) ds. [s] Since |Jλλ x0 | ≤ (1 − λω)−k |x0 | ≤ ekλω |x0 |, by Lebesgue dominated convergence theorem and the upper semicontinuity of h·, ·is , letting λ → 0+ we obtain (6.8). Next, we show (6.9). For any given f ∈ F (x0 − x) as shown above 2Re hS(t)x0 − x0 , f i ≤ |S(t)x0 − x|2 − |x0 − x|2 . Thus, from (6.8) Z Re hS(t)x0 − x0 , f i ≤ t (ω |S(s)x0 − x|2 + hy, S(s)x0 − xis ) ds 0 107 Since s → S(s)x0 is continuous, by the upper semicontinuity of h·, ·is , we have (6.9). Theorem 3.4 Assume that A is a close dissipative subset of X × X and satisfies the range condition (5.14) and let S(t), t ≥ 0 be the semigroup on dom (A), defined in Theorem 2.2. Then, if S(t)x is strongly differentiable at t0 > 0 then S(t0 )x ∈ dom (A) and d S(t)x |t=t0 ∈ AS(t)x, dt S(t0 )x ∈ dom (A0 ) and d S(t)x t=t0 = A0 S(t0 )x. dt and moreover d Proof: Let dt S(t)x |t=t0 = y . Then S(t0 − λ)x − (S(t0 )x − λ y) = o(λ), where |o(λ)| λ → + 0 as λ → 0 . Since S(t0 − λ)x ∈ dom (A), there exists a [xλ , yλ ] ∈ A such that S(t0 − λ)x = xλ − λ yλ and λ (y − yλ ) = S(t0 )x − xλ + o(λ). (6.10) If we let x = xλ , y = yλ and x0 = S(t0 )x in (6.9), then we obtain Re hy, f i ≤ ω |S(t0 )x − xλ |2 + hyλ , S(t0 )x − xλ is . for all f ∈ (S(t0 )x − xλ ). It follows from Lemma 1.3 that there exists a g ∈ F (S(t0 )x − xλ ) such that hyλ , S(t)x − xλ is = Re hyλ , gi and thus Re hy − yλ , gi ≤ ω |S(t0 )x − xλ |2 . From (6.10) λ−1 |S(t0 )x − xλ |2 ≤ and thus |o(λ)| |S(t0 )x − xλ | + ω |S(t0 )x − xλ |2 λ S(t0 )x − xλ →0 (1 − λω) λ as λ → 0+ . Combining this with (6.10), we obtain xλ → S(t0 )x and yλ → y as λ → 0+ . Since A is closed, it follows that [S(t0 )x, y] ∈ A, which shows the first assertion. Next, from Theorem 2.2 |S(t0 + λ)x − S(t0 )x| ≤ λ kAS(t0 )xk for λ > 0 This implies that |y| ≤ kAS(t0 )xk. Since y ∈ AS(t0 )x, it follows that S(t0 )x ∈ dom (A0 ) and y ∈ A0 S(t0 )x. Theorem 3.5 Let A be a dissipative subset of X × X satisfying the range condition (5.14) and S(t), t ≥ 0 be the semigroup on dom (A), defined in Theorem 2.2. Then the followings hold. 108 (1) For ever x ∈ dom (A) lim |Aλ x| = lim inf |S(t)x − x|. λ→0+ t→0+ (2) Let x ∈ dom (A). Then, limλ→0+ |Aλ x| < ∞ if and only if there exists a sequence {xn } in dom (A) such that limn→∞ xn = x and supn kAxn k < ∞. Proof: Let x ∈ dom (A). From Theorem 1.4 |Aλ x| is monotone decreasing and [t] lim |Aλ x| exists (including ∞). Since |Jλλ − x| ≤ t |Aλ x|, it follows from Theorem 2.2 that |S(t)x − x| ≤ t lim |Aλ x| λ→0+ thus lim inf t→0+ 1 |S(t)x − x| ≤ lim |Aλ x|. t λ→0+ Conversely, from Lemma 3.3 − lim inf t→0+ 1 |S(t)x − x||x − u| ≤ hv, x − uis t for [u, v] ∈ A. We set u = Jλ x and v = Aλ x. Since x − u = −λ Aλ x, − lim inf t→0+ 1 |S(t)x − x|λ|Aλ x| ≤ −λ |Aλ x|2 t which implies 1 |S(t)x − x| ≥ |Aλ x|. t Theorem 3.7 Let A be a dissipative set of X × X satisfying the range condition (5.14) and let S(t), t ≥ 0 be the semigroup defined on dom (A) in Theorem 2.2. 
(1) If x ∈ dom (A) and S(t)x is differentiable a.e., t > 0, then u(t) = S(t)x, t ≥ 0 is a unique strong solution of the Cauchy problem (6.1). (2) If X is reflexive. Then, if x ∈ dom (A) then u(t) = S(t)x, t ≥ 0 is a unique strong solution of the Cauchy problem (6.1). lim inf t→0+ Proof: The assertion (1) follows from Theorems 3.1 and 3.5. If X is reflexive, then since an X-valued absolute continuous function is a.e. strongly differentiable, (2) follows from (1). 6.0.3 Infinitesimal generator Definition 4.1 Let X0 be a subset of a Banach space X and S(t), t ≥ 0 be a semigroup of nonlinear contractions on X0 . Set Ah = h−1 (T (h) − I) for h > 0 and define the strong and weak infinitesimal generators A0 and Aw by A0 x = limh→0+ Ah x with dom (A0 ) = {x ∈ X0 : limh→0+ Ah x exists} A0 x = w − limh→0+ Ah x with dom (A0 ) = {x ∈ X0 : w − limh→0+ Ah x exists}, (6.11) ˆ respectively. We define the set D by ˆ = {x ∈ X0 : lim inf |Ah x| < ∞} D h→0 109 (6.12) Theorem 4.1 Let S(t), t ≥ 0 be a semigroup of nonlinear contractions defined on a closed subset X0 of X. Then the followings hold. (1) hAw x1 − Aw x2 , x∗ i for all x1 , x2 ∈ dom (Aw ) and x∗ ∈ F (x1 − x2 ). In particular, A0 and Aw are dissipative. ˆ (2) If X is reflexive, then dom (A0 ) = dom (Aw ) = D. ˆ In addition, if X is (3) If X is reflexive and strictly convex, then dom(Aw ) = D. ˆ uniformly convex, then dom (Aw ) = dom (A0 ) = D and Aw = A0 . Proof: (1) For x1 , x2 ∈ X0 and x∗ ∈ F (x1 − x2 ) we have hAh x1 − Ah x2 , x∗ i = h−1 (hS(h)x1 − S(h)x2 , x∗ i − |x1 − x2 |2 ) ≤ h−1 (|S(h)x1 − S(h)x2 ||x1 − x2 | − |x1 − x2 |2 ) ≤ 0. Letting h → 0+ , we obtain the desired inequality. ˆ Let x ∈ D. ˆ It suffices to show that (2) Obviously, dom (A0 ) ⊂ dom (Aw ) ⊂ D. x ∈ dom (A0 ). We can show that t → S(t)x is Lipschitz continuous. In fact, there exists a monotonically decreasing sequence {tk } of positive numbers and L > 0 such that tk → 0 as k → ∞ and |S(tk )x − x| ≤ L tk . Let h > 0 and nk be a nonnegative integer such that 0 ≤ h − nk tk < tk . Then we have |S(t + h)x − S(t)x| ≤ |S(t)x − x| = |S(h − nk tk + nk tk )x − x| ≤ |S(h − nk tk )x − x| + L nk tk ≤ |S(h − nk tk )x − x| + L h By the strong continuity of S(t)x at t = 0, letting k → ∞, we obtain |S(t + h)x − S(t)x| ≤ L h. Now, since X is reflexive this implies that S(t)x is a.e. differentiable d on (0, ∞). But since S(t)x ∈ dom (A0 ) whenever dt S(t)x exists, S(t)x ∈ dom (A0 ) a.e. + t > 0. Thus, since |S(t)x − x| → 0 as t → 0 , it follows that x ∈ dom (A0 ). ˆ and Y be the set of all (3) Assume that X is reflexive and strictly convex. Let x0 ∈ D −1 + ˜ weak cluster points of t (S(t)x0 − x0 ) as t → 0 . Let A be a subset of X × X defined by ˜ = dom (A0 ) ∪ {x0 } A˜ = A0 ∪ [x0 , co Y ] and dom (A) where co Y denotes the closure of the convex hull of Y . Note that from (1) hA0 x1 − A0 x2 , x∗ i ≤ 0 for all x1 , x2 ∈ dom (A0 ) and x∗ ∈ F (x1 − x2 ) and for every y ∈ Y hA0 x1 − y, x∗ i ≤ 0 for all x1 ∈ dom (A0 ) and x∗ ∈ F (x1 − x0 ). This implies that A˜ is a dissipative subset of X×X. But, since X is reflexive, t → S(t)x0 is a.e. differentiable and d ˜ S(t)x0 = A0 S(t)x0 ∈ AS(t)x 0, dt a.e., t > 0. It follows from the dissipativity of A˜ that h d (S(t)x0 − x0 ), x∗ i ≤ hy, x∗ i, dt 110 ˜ 0 a.e, t > 0 and y ∈ Ax (6.13) for x∗ ∈ F (S(t)x0 − x0 ). Note that for h > 0 hh−1 (S(t+h)x0 −S(t)x0 ), x∗ i ≤ h−1 (|S(t+h)x0 −x0 |−|S(t)x0 −x0 |)|x∗ | for x∗ ∈ F (S(t)x0 −x0 ). Letting h → 0+ , we have h d d (S(t)x0 − x0 ), x∗ i ≤ |S(t)x0 − x0 | |S(t)x0 − x0 |, dt dt a.e. 
t > 0 The converse inequality follows much similarly. Thus, we have |S(t)x0 − x0 | d d |S(t)x0 − x0 | = h (S(t)x0 − x0 ), x∗ i for x∗ ∈ F (S(t)x0 − x0 ). (6.14) dt dt d It follows from (6.13)–(6.14) that |S(t)x0 − x0 | ≤ |y| for y ∈ Y and a.e. t > 0 and dt thus ˜ 0 k for all t > 0. |S(t)x0 − x0 | ≤ t kAx (6.15) ˜ 0 = co Y is a closed convex subset of X. Since X is reflexive and strictly Note that Ax ˜ 0 such that |y0 | = kAx ˜ 0 k. Hence, (6.15) convex, there exists a unique element y0 ∈ Ax implies that co Y = y0 = Aw x0 and therefore x0 ∈ dom (Aw ). ˆ Then Next, we assume that X is uniformly convex and let x0 ∈ dom(Aw ) = D. w − lim S(t)x0 − x0 = y0 t as t → 0+ . From (6.15) |t−1 (S(t)x0 − x0 )| ≤ |y0 |, a.e. t > 0. Since X is uniformly convex, these imply that lim S(t)x0 − x0 = y0 t as t → 0+ . which completes the proof. Theorem 4.2 Let X and X ∗ be uniformly convex Banach spaces. Let S(t), t ≥ 0 be the semigroup of nonlinear contractions on a closed subset X0 and A0 be the infinitesimal generator of S(t). If x ∈ dom (A0 ), then (i) S(t)x ∈ dom (A0 ) for all t ≥ 0 and the function t → A0 S(t)x is right continuous on [0, ∞). + + (ii) S(t)x has a right derivative ddt S(t)x for t ≥ 0 and ddt S(t)x = A0 S(t)x, t ≥ 0. d (iii) dt S(t)x exists and is continuous except a countable number of values t ≥ 0. ˆ and thus S(t)x ∈ Proof: (i) − (ii) Let x ∈ dom (A0 ). By Theorem 4.1, dom (A0 ) = D dom (A0 ) and d+ S(t)x = A0 S(t)x for t ≥ 0. (6.16) dt d Moreover, t → S(t)x a.e. differentiable and dt S(t)x = A0 S(t)x a.e. t > 0. We next prove that A0 S(t)x is right continuous. For h > 0 d (S(t + h)x − S(t)x) = A0 S(t + h)x − A0 S(t)x, dt 111 a.e. t > 0. From (6.14) |S(t + h)x − S(t)x| d |S(t + h)x − S(t)x| = hA0 S(t + h)x − A0 S(t)x, x∗ i ≤ 0 dt for all x∗ ∈ F (S(t + h)x − S(t)x), since A0 is dissipative. Integrating this over [s, t], we obtain |S(t + h)x − S(t)x| ≤ |S(s + h)x − S(s)x| for 0 ≤ s ≤ t and therefore d+ d+ S(t)x| ≤ | S(s)x|. dt ds Hence t → |A0 S(t)x| is monotonically non-increasing function and thus it is right continuous. Let t0 ≥ 0 and let {tk } be a decreasing sequence of positive numbers such that tk → t0 . Without loss of generality, we may assume that w−limk→∞ A0 S(tk ) = y0 . The right continuity of |A0 S(t)x| at t = t0 , thus implies that | |y0 | ≤ |A0 S(t0 )x| (6.17) since norm is weakly lower semicontinuous. Let A˜0 be the maximal dissipative exten˜ sion of A0 . It then follows from Theorem 1.9 that A˜ is demiclosed and thus y0 ∈ AS(t)x. ˜ On the other hand, for x ∈ dom (A0 ) and y ∈ Ax, we have h d (S(t)x − x), x∗ i ≤ hy, x∗ i for all x∗ ∈ F (S(t)x − x) dt d ˜ a.e. t > 0, since A˜ is dissipative and dt S(t)x = A0 S(t)x ∈ AS(t)x a.e. t > 0. From (6.14) we have ˜ = kA˜0 xk for t ≥ 0 t−1 |S(t)x − x| ≤ |Ax| ˜ Hence A0 x = A˜0 x. It thus follows from (6.17) where A˜0 is the minimal section of A. that y0 = A0 S(t0 )x and limk→∞ A0 S(tk )x = y0 since X is uniformly convex. Thus, we have proved the right continuity of A0 S(t)x for t ≥ 0. (iii) Integrating (6.16) over [t, t + h], we have Z t+h S(t + h)x − S(t)x = A0 S(s)x ds t for t, h ≥ 0. Hence it suffices to prove that the function t → A0 S(t)x is continuous except a cuntable number of t > 0. Using the same arguments as above, we can show that if |A0 S(t)x| is continuous at t = t0 , then A0 S(t)x is continuous at t = t0 . But since |A0 S(t)x| is monotone non-increasing, it follows that it has at most countably many discontinuities, which completes the proof. Theorem 4.3 Let X and X ∗ be uniformly convex Banach spaces. 
If A be m-dissipative, then A is demiclosed, dom (A) is a closed convex set and A0 is single-valued operator with dom (A0 ) = dom (A). Moreover, A0 is the infinitesimal generator of a semigroup of contractions on dom (A). Proof: It follows from Theorem 1.9 that A is demiclosed. The second assertion follows from Theorem 1.12. Also, from Theorem 3.4 d S(t)x = A0 S(t)x, dt 112 a.e. t > 0 and |S(t)x − x| ≤ t |A0 x|, t>0 (6.18) for x ∈ dom (A). Let A0 be the infinitesimal generator of the semigroup S(t), t ≥ 0 generated by A defined in Theorem 2.2. Then, (6.18) implies that by Theorem 4.1 + x ∈ dom (A0 ) and by Theorem 4.2 ddt S(t)x = A0 S(t)x and A0 S(t)x is right continuous in t. Since A is closed, A0 x = lim A0 S(t)x ∈ Ax. t→0+ Hence, (6.17) implies that A0 x = A0 x. When X is a Hilbert space we have the nonlinear version of Hille-Yosida theorem as follows. Theorem 4.4 Let H be a Hilbert space. Then, (1) The infinitesimal generator A0 of a semigroup of contractions S(t), t ≥ 0 on a closed convex set X0 has a dense domain in X0 and there exists a unique maximal dissipative operator A such that A0 = A0 . Conversely, (2) If A0 is a maximal dissipative operator, then dom (A) is a closed convex set and A0 is the infinitesimal generator of contractions on dom (A). Proof: (2) Since from Theorem 1.7 the maximal dissipative operator in a Hilbert space is m-dissipative, (2) follows from Theorem 4.3. Example (Nonlinear Diffusion) ut = Au = ∆γ(u) + ∇ · (f~(u)) on X = L1 . Assume γ : R → R is strictly monotone. Thus, sign(x − y) = sign(γ(x) − γ(y)) (Au1 −Au2 , ρ (γ(u1 )−γ(u2 ))) = −(∇(γ(u1 )−γ(u2 ))ρ0 )∇(γ(u1 )−γ(u2 )))−(ρ0 (u1 −u2 ))(f (u1 )−f (u2 )), ∇(u1 −u 6.1 Fixed Point Theorems Banach Fixed Point Theorem. Let (X, d) be a non-empty complete metric space with a contraction mapping T : X → X. Then T admits a unique fixed-point x∗ in X (i.e. T (x∗) = x∗). Moreover, x∗ is a fixed point of the fixed point iterate xn = T (xn−1 ) with arbitrary initial condition x0 . Proof: d(xn+1 , xn ) = d(T (xn ), T (xn−1 ) ≤ ρ d(xn , xn−1 . By the induction d(xn+1 , xn ) ≤ ρn d(x1 , x0 ) and d(xm , xn ) ≤ ρn−1 d(x1 , x0 ) → 0 1−ρ as m ≥ n → ∞. Thus, {xn } is a Cauchy sequence and {xn } converges a fixed point x∗ of T . Suppose x∗1 , x∗2 are two fixed point. Then, d(x∗1 , x∗2 ) = d(T (x∗1 ), T (x∗2 ) ≤ ρ d(x∗1 , x∗2 ) 113 Since ρ < 1, d(x∗1 , x∗2 ) = 0 and thus x∗1 = x∗2 . Brouwer fixed point theorem: is a fundamental result in topology which proves the existence of fixed points for continuous functions defined on compact, convex subsets of Euclidean spaces. Kakutani’s theorem extends this to set-valued functions. Kakutani’s theorem states: Let S be a non-empty, compact and convex subset of some Euclidean space Rn . Let T : S → S be a set-valued function on S with a closed graph and the property that T (x) is non-empty and convex for all x ∈ S. Then T has a fixed point. The Schauder fixed point theorem is an extension of the Brouwer fixed point theorem to topological vector spaces. Schauder fixed point theorem Let X be a locally convex topological vector space, and let K ⊂ X be a compact, and convex set. Then given any continuous mapping T : K → K has a fixed point. Given > 0 there exists the family of open sets {B ?(x) : x ∈ K} is open covering of K Since K is compact. there exists a finite open subcover, i.e. there exist finite many points {xk } in K such that K= n [ B (xk ) k=1 Define the functions {gk } by P gk (x) = max(0, − |x − xk |). 
It is clear that each gk is continuous, gk (x) ≥ 0 and k gk (x) > 1 for all x ∈ K. Thus, we can define a function on K by P k gk (x) xk g?(x) = P k gk (x) and g is a continuous function from K to the convex hull K0 of of {xk }. It is easily shown that |g?(x) − x| ≤ for x ∈ K. Then A maps K0 to K0 Since K0 is compact convex subset of a finite dimensional vector space, we can apply the Brouwer fixed point theorem to assure the existence of z ∈ K0 such that x = A(x ). We have the estimate |z − T (z )| = |g(T (z )) − T (z )| ≤ . Since K is compact, the exits a subsequence n → 0 such that xn → x0 ∈ K and thus |xn − T (xn )| ≤ |g(T (xn )) − T (xn )| ≤ n . Since T is continuous, T (x0 ) = x0 . Schaefer’s fixed point theorem: Let T be a continuous and compact mapping of a Banach space X into itself, such that the set {x ∈ X : x = λT x for some 0 ≤ λ ≤ 1} is bounded. Then T has a fixed point. Proof: Let |x| ≤ M for {x ∈ X : x = λT x for some 0 ≤ λ ≤ 1}. Define T˜(x) = M T (x) max(M, |T (x)|) 114 Note that T˜ : B(0, M ) → B(0, M ).. Let K be the closed convex hull go T˜(B(0, M )). Since T is compact, K is a compact convex subset of X. Since T˜K → K, it follows from Schauder’s fixed point theorem that there exits x ∈ K such that x = T˜(x). We now claim that x is a fixed point of T . If not, we should have |T (x)| > M and x = λ T (u) for λ = M <1 |T (x)| But, since |x| = |T˜(x)| ≤ M , which is a contradiction. The advantage of Schaefer’s theorem over Schauder’s for applications is that it is not necessary to determine an explicit convex, compact set K. Krasnoselskii theorem The sum of two operators T = A + B has a fixed point in a nonempty closed convex subset C of a real Banach space X under conditions such that T (C) ⊂ C, A is continuous on C, A(C) is a relatively compact subset of X, and B is a strict contraction on C. Note that the proof of Kransnoselskiis fixed point theorem combines the Banach contraction theorem and Schauders fixed point theorem, i.e., y ∈ C, Ay + Bx = x has the fixed point map x = ψ(y) and use Schauders fixed point theorem for ψ. Kakutani-Glicksberg-Fan theorem Let S be a non-empty, compact and convex subset of a locally convex topological vector space X. Let T : S → S be a Kakutani map, i.e., it is upper hemicontinuous, i.e., if for every open set W ⊂ X, the set {x ∈ X : T (x) ∈ W } is open in X and T (x) is non-empty, compact and convex for all x ∈ X. Then T has a fixed point. The corresponding result for single-valued functions is the Tychonoff fixed-point theorem. 7 Convex Analysis and Duality This chapter is organized as follows. We present the basic convex analysis and duality theory and subdifferntial and monotone operator theory in Banach spaces. 7.1 Convex Functional and Subdifferential Definition (Convex Functional) (1) A proper convex functional on a Banach space X is a function ϕ from X to (−∞, ∞], not identically +∞ such that ϕ((1 − λ) x1 + λ x2 ) ≤ (1 − λ) ϕ(x1 ) + λ ϕ(x2 ) for all x1 , x2 ∈ X and 0 ≤ λ ≤ 1. (2) A functional ϕ : X → R is said to be lower-semicontinuous if ϕ(x) ≤ lim inf ϕ(y) y→x for all x ∈ X. (3) A functional ϕ : X → R is said to be weakly lower-semicontinuous if ϕ(x) ≤ lim inf ϕ(xn ) n→∞ for all weakly convergent sequence {xn } to x. 115 (4) The subset D(ϕ) = {x ∈ X; ϕ(x) ≤ ∞} of X is called the domain of ϕ. (5) The epigraph of ϕ is defined by epi(ϕ) = {(x, c) ∈ X × R : ϕ(x) ≤ c}. Lemma 3 A convex functional ϕ is lower-semicontinuous if and only if it is weakly lower-semicontinuous on X. 
Proof: Since the level set {x ∈ X : ϕ(x) ≤ c} is a closed convex subset if ϕ is lowersemicontinuous. Thus, the claim follows the fact that a convex subset of X is closed if and only if it is weakly closed. Lemma 4 If ϕ be a proper lower-semicontinuous, convex functional on X, then ϕ is bounded below by an affine functional, i.e., there exist x∗ ∈ X ∗ and c ∈ R such that ϕ(x) ≥ hx, x∗ i + β, x ∈ X. Proof: Let x0 ∈ X and β ∈ R be such that ϕ(x0 ) > c. Since ϕ is lower-semicontinuous on X, there exists an open neighborhood V (x0 ) of X0 such that ϕ(x) > c for all x ∈ V (x0 ). Since the ephigraph epi(ϕ) is a closed convex subset of the product space X × R. It follows from the separation theorem for convex sets that there exists a closed hyperplane H ⊂ X × R; H = {(x, r) ∈ X × R : hx, x∗0 i + r = α} with x∗0 ∈ X ∗ , α ∈ R, that separates epi(ϕ) and V (x0 ) × (−∞, c). Since {x0 } × (−∞, c) ⊂ {(x, r) ∈ X × R : hx, x∗0 i + r < α} it follows that hx, x∗0 i + r > α for all (x, c) ∈ epi(ϕ) which yields the desired estimate. Definition (Subdifferential) Given a proper convex functional ϕ on a Banach space X the subdifferential of ∂ϕ(x) is a subset in X ∗ , defined by ∂ϕ(x) = {x∗ ∈ X ∗ : ϕ(y) − ϕ(x) ≥ hy − x, x∗ i for all y ∈ X}. Since for x∗1 ∈ ∂ϕ(x1 ) and x∗2 ∈ ∂ϕ(x2 ), ϕ(x1 ) − ϕ(x2 ) ≤ hx1 − x2 , x∗2 i ϕ(x2 ) − ϕ(x1 ) ≤ hx2 − x1 , x∗1 i it follows that hx1 − x2 , x∗1 − x∗2 i ≥ 0. Hence ∂ϕ is a monotone operator from X into X ∗. Example 1 Let ϕ be Gateaux differentiable at x. i.e., there exists w∗ ∈ X ∗ such that lim t→0+ ϕ(x + t v) − ϕ(x) = hh, w∗ i t for all h ∈ X and w∗ is the Gateaux differential of ϕ at x and is denoted by ϕ0 (x). If ϕ is convex, then ϕ is subdifferentiable at x and ∂ϕ(x) = {ϕ0 (x)}. Indeed, for v = y − x ϕ(x + t (y − x)) − ϕ(x) ≤ ϕ(y) − ϕ(x), t 116 0<t<1 Letting t → 0+ we have ϕ(y) − ϕ(x) ≥ hy − x, ϕ0 (x)i for all y ∈ X, and thus ϕ0 (x) ∈ ∂ϕ(x). On the other hand if w∗ ∈ ∂ϕ(x) we have for y ∈ X and t > 0 ϕ(x + t y) − ϕ(x) ≥ hy, w∗ i. t Taking limit t → 0+ , we obtain hy, ϕ0 (x) − w∗ i ≥ 0 for all y ∈ X. This implies w∗ = ϕ0 (x). Example 2 If ϕ(x) = 21 |x|2 then we will show that ∂ϕ(x) = F (x), the duality mapping. In fact, if x∗ ∈ F (x), then hx − y, x∗ i = |x|2 − hy, x∗ i ≥ 1 (|x|2 − |y|2 ) 2 for all y ∈ X. Thus x∗ ∈ ∂ϕ(x). Conversely, if x∗ ∈ ∂ϕ(x) then |x + t y|2 ≥ |x|2 + 2t hy − x, x∗ i for all y ∈ X and t > 0. It can be shown that this implies |x∗ | = |x| and hx, x∗ i = |x|2 . Hence x∗ ∈ F (x). Example 3 Let K be a closed convex subset of X and IK be the indicator function of K, i.e., 0 if x ∈ K IK (x) = ∞ otherwise. Obviously, IK is convex and lower-semicontinuous on X. By definition we have for x∈K ∂IK (x) = {x∗ ∈ X ∗ : hx − y, x∗ i ≥ 0 for all y ∈ K} Thus D(IK ) = D(∂IK ) = K and ∂K (x) = {0} for each interior point of K. Moreover, if x lies on the boundary of K, then ∂IK (x) coincides with the cone of normals to K at x. Definition (Conjugate function) Given a proper functional ϕ on a Banach space X, the conjugate functional on X ∗ is defined by ϕ∗ (x∗ ) = sup{hx∗ , xi − ϕ(x)}. y Theorem 2.1 The conjugate function ϕ∗ is convex lower semi-continuous and proper. Proof: For 0 < t < 1 and x∗1 , x∗2 ∈ X ∗ ϕ∗ (t x∗1 + (1 − t) x∗2 ) = supy {ht x∗1 + (1 − t) x∗2 , xi − ϕ(x)} ≥ t supy {hx∗1 , xi − ϕ(x)} + (1 − t) supy {hx∗2 , xi − ϕ(x)} = t ϕ∗ (x∗1 ) + (1 − t) ϕ∗ (x∗2 ). 117 Thus, ϕ∗ is convex. 
For > 0 arbitrary let y ∈ X be ϕ∗ (x∗ ) − ≥ hx∗ , yi − ϕ(y) For xn → x∗ ϕ∗ (x∗n )hx∗n , yi − ϕ(y) and lim inf ϕ∗ (x∗n ) ≥ hx∗ , yi − ϕ(y) n→∞ Since > arbitrary, ϕ∗ is lower semi-continuous. Definition (Bi-Conjugate Function) For any proper functional ϕ on X the biconjugate function of ϕ is ϕ∗∗ : X → (−∞, ∞] is defined by ϕ∗∗ (x) = sup {hx∗ , xi − ϕ∗ (x∗ )}. x∗ ∈X ∗ Theorem C.3 For any F : X → (−∞, ∞] F ∗∗ is convex and l.s.c. and F ∗∗ ≤ F . Proof: Note that F ∗∗ (¯ x) = sup {hx∗ , x ¯i − F ∗ (x∗ )} x∗ ∈D(F ∗ ) Since for x∗ ∈ D(F ∗ ) hx∗ , xi − F (x∗ ) ≤ F (x) for all x, F ∗∗ (¯ x) ≤ F (¯ x). Theorem If F is a proper l.s.c convex function, then F = F ∗∗ . If X is reflexive, then (F ∗ )∗ = F ∗∗ . Proof: By Theorem if F is a proper l.s.c. convex function, then F is bounded below by an affine functional hx∗ , xi − a, i.e. a ≥ hx∗ , xi − F (x) for all x ∈ X. Define a(x∗ ) = inf{a ∈ R : a ≥ hx∗ , xi − F (x) for all x ∈ X} Then for a ≥ a(x∗ ) hx∗ , xi − a ≤ hx∗ , xi − a(x∗ ) for all x ∈ X. By Theorem again F (¯ x) = sup {hx∗ , x ¯i − a(x∗ )}. x∗ ∈X ∗ Since a(x∗ ) = sup {hx∗ , xi − F (x)} = F ∗ (x) x∈X we have F (x) = sup {hx∗ , xi − F ∗ (x∗ )}. x∗ ∈X ∗ We have the duality property. Theorem 2.2 Let F be a proper convex function on X. (1) For F : X → (−∞, ∞] x∗ ∈ ∂F (¯ x) is if and only if F (¯ x) + F ∗ (x∗ ) = hx∗ , x ¯i. 118 (2) Assume X is reflexive. If x∗ ∈ ∂F (¯ x), then x ¯ ∈ ∂F ∗ (x∗ ). If F is convex and lower ∗ semi-continuous, then x ∈ ∂F (¯ x) if and only if x ¯ ∈ ∂F ∗ (x∗ ). Proof: (1) Note that x∗ ∈ ∂F (¯ x) is if and only if hx∗ , xi − F (x) ≤ hx∗ , x ¯i − F (¯ x) (7.1) for all x ∈ X. By the definition of F ∗ this implies F ∗ (x∗ ) = hx∗ , x ¯i − F (¯ x). Conversely, if F (x∗ ) + F (¯ x) = hx∗ , x ¯i then (7.1) holds for all x ∈ X. (2) Since F ∗∗ ≤ F by Theorem C.3, it follows from (1) F ∗∗ (¯ x) ≤ hx∗ , x ¯i − F ∗ (x∗ ) But by the definition of F ∗∗ F ∗∗ (¯ x) ≥ hx∗ , x ¯i − F ∗ (x∗ ). Hence F ∗ (x∗ ) + F ∗∗ (¯ x) = hx∗ , x ¯i. Since X is reflexive, F ∗∗ = (F ∗ )∗ . If we apply (1) to F ∗ we have x ¯ ∈ ∂F ∗ (x∗ ). In addition if F is convex and l.s.c, it follows from Theorem C.1 that F ∗∗ = F . Thus if x∗ ∈ ∂F ∗ (¯ x) then F (¯ x) + F ∗ (x∗ ) = F ∗ (x∗ ) + F ∗∗ (x) = hx∗ , x ¯i by applying (1) for F ∗ . Therefore x∗ ∈ ∂F (¯ x) again by (1). Theorem(Rockafellar) Let X be real Banach space. If ϕ is lower-semicontinuous proper convex functional on X, then ∂ϕ is a maximal monotone operator from X into X ∗. Proof: We prove the theorem when X is reflexive. By Apuland theorem we can assume that X and X ∗ are strictly convex. By Minty-Browder theorem ∂ϕ it suffices to prove that R(F + ∂ϕ) = X ∗ . For x∗0 ∈ X ∗ we must show that equation x∗0 ∈ F x + ∂ϕ(x) has at least a solution x0 Define the proper convex functional on X by f (x) = 1 2 |x| + ϕ(x) − hx, x∗0 i. 2 X Since f is lower-semicontinuous and f (x) → ∞ as |x| → ∞ there exists x0 ∈ D(f ) such that f (x0 ) ≤ f (x) for all x ∈ X. Since F is monotone ϕ(x) − ϕ(x0 ) ≥ hx − x0 , x∗0 i − hx − x0 , F (x)i Setting xt = x0 + t (u − x0 ) and since ϕ is convex, we have 1 ϕ(u) − ϕ(x0 ) ≥ ϕ(xt ) − ϕ(x0 ) ≥ hu − x0 , x∗0 i − hu − x0 , F (xt )i. t Taking limit t → 0+ , we obtain ϕ(u) − ϕ(x0 ) ≥ hu − x0 , x∗0 i − hu − x0 , F (x0 )i, which implies x∗0 − F (x0 ) ∈ ∂ϕ(x0 ). 119 We have the perturbation result. Theorem Assume that X is a real Hilbert space and that A is a maximal monotone operator on X. 
Let ϕ be a proper, convex and lower semi-continuous functional on X satisfying dom(A) ∩ dom(∂ϕ) is not empty and ϕ((I + λ A)−1 x) ≤ ϕ(x) + λ M, for all λ > 0, x ∈ D(ϕ), where M is some non-negative constant. Then the operator A + ∂ϕ is maximal monotone. Lemma Let A and B be m-dissipative operators on X. Then for every y ∈ X the equation y ∈ −Ax − Bλ x (7.2) has a unique solution x ∈ dom (A). Proof Equation (9.17) is equivalent to y = xλ − wλ − Bλ xλ for some wλ ∈ A(xλ ). Thus, xλ − = λ λ 1 wλ = y+ (xλ + λ Bλ xλ ) λ+1 λ+1 λ+1 λ 1 y+ (I − λ B)−1 . λ+1 λ+1 Since A is m-dissipative, we conclude that (9.17) is equivalent to that xλ is the fixed point of the operator Fλ x = (I − λ 1 λ A)−1 ( y+ (I − λ B)−1 x). λ+1 λ+1 λ+1 By m-dissipativity of the operators A and B their resolvents are contractions on X and thus λ |Fλ x1 − Fλ x2 | ≤ |x1 − x2 | for all λ > 0, x1 , x2 ∈ X. λ+1 Hence, Fλ has the unique fixed point xλ and xλ ∈ dom (A) solves (9.17). Proof: From Lemma there exists xλ for y ∈ X such that y ∈ xλ − (−A)λ xλ + ∂ϕ(xλ ) Moreover, one can show that |xλ | is bounded uniformly. Since y − xλ + (−A)λ xλ ∈ ∂ϕ(xλ ) for z ∈ X ϕ(z) − ϕ(xλ ) ≥ (z − xλ , y − xλ + (−A)λ xλ ) Letting λ(I + λA)−1 x, so that z − xλ = λ(−A)λ xλ and we obtain (λ(−A)λ xλ , y − xλ + (−A)λ xλ ) ≤ ϕ((I + λ A)−1 ) − ϕ(xλ ) ≤ λ M, and thus |(−A)λ xλ |2 ≤ |(−A)λ xλ ||y − xλ | + M. Since xλ | is bounded and so that |(−A)λ xλ |. 120 7.2 Duality Theory In this section we discuss the duality theory. We call (P ) inf x∈X F (x) is the primal problem, where X is a Banach space and F : X → (−∞, ∞] is a proper l.s.c. convex function. We have the following result for the existence of minimizer. Theorem 2 Let X be reflexive and F be a lower semicontinuous proper convex functional defined on X. Suppose F lim F (x) = ∞ as |x| → ∞. (7.3) Then there exists an x ¯ ∈ X such that F (¯ x) = inf {F (x); x ∈ X}. Proof: Let η = inf{F (x); x ∈ X} and let {xn } be a minimizing sequence such that lim F (xn ) = η. The condition (7.3) implies that {xn } is bounded in X. Since X is reflexive there exists a subsequence that converges weakly to x ¯ in X and it follows from Lemma 1 that F (¯ x) = η. We imbed (P ) into a family of perturbed problem (Py ) inf x∈X Φ(x, y) where y ∈ Y , a Banach space is an imbedding variable and Φ : X × Y → (−∞, ∞] is a proper l.s.c. convex function with Φ(x, 0) = F (x). Thus (P0 ) = (P ). For example in terms of (12.1) we let F (x) = f (x) + ϕ(Λx) (7.4) and with Y = H Φ(x, y) = f (x) + ϕ(Λx + y) Definition The dual problem of (P ) with respect to Φ is (P ∗ ) sup (−Φ∗ (0, y ∗ )). y ∗ ∈Y The value function of (Py ) is defined by h(y) = inf x∈X Φ(x, y), y ∈ Y. We assume that h(y) > −∞ for all y ∈ Y . Then we have Theorem D.0 inf (P ) ≥ sup (P ∗ ). Proof. For any (x, y) ∈ X × Y and (x∗ , y ∗ ) ∈ X ∗ × Y ∗ Φ∗ (x∗ , y ∗ ) ≥ hx∗ , xi + hy ∗ , yi − Φ(z). Thus F (x) + Φ∗ (0, y ∗ ) ≥ h0, xi + hy ∗ , 0i = 0 121 (7.5) for all x ∈ X and y ∗ ∈ Y ∗ . Therefore inf (P ) = inf F (x) ≥ sup (−Φ∗ (0, y ∗ )) = sup (P ∗ ). y ∗ ∈Y ∗ In what follows we establish the equivalence of the primal problem (P ) and the dual problem (P ∗ ). We start with the properties of h. Lemma D.1 h is convex. Proof. The proof is by contradiction. Suppose there exist y1 , y2 ∈ Y and θ ∈ (0, 1) such that h(θ y1 + (1 − θ) y2 ) > θ h(y1 ) + (1 − θ) h(y2 ). Then there exists c and > 0 such that h(θ y1 + (1 − θ) y2 ) > c > c − > θ h(y1 ) + (1 − θ) h(y2 ). Let a1 = h(y1 ) + θ > h(y1 ) and a2 = c − − θ h(y) c − θ a1 = > h(y2 ). 
1−θ 1−θ By definition of h there exist x1 , x2 ∈ X such that h(y1 ) ≤ Φ(x, y1 ) ≤ a1 and h(y2 ) ≤ Φ(x2 , y2 ) ≤ a2 . Thus h(θ y1 + (1 − θ) y2 ) ≤ Φ(θ x1 + (1 − θ) x2 , θ y1 + (1 − θ) y2 ) ≤ θ Φ(x1 , y1 ) + (1 − θ) Φ(x2 , y2 ) ≤ θ a1 + (1 − θ) a2 = c which is a contradiction. Hence h is convex. Lemma D.2 For all y ∗ ∈ Y ∗ , h∗ (y) = Φ∗ (0, y ∗ ). Proof. h(y ∗ ) = supy∈Y (hy ∗ , yi − h(y)) = supy∈Y (hy ∗ , yi − inf x∈X Φ(x, y)) = sup sup ((hy ∗ , yi − Φ(x, y)) = sup sup (h0, xi + hy ∗ , yi − Φ(x, y)) y∈Y x∈X y∈Y x∈X = sup(x,y)∈X×Y (h(0, y ∗ ), (x, y)i − Φ(x, y)) = Φ∗ (0, y ∗ ). Note that h is not necessarily l.s.c.. Theorem D.3 If h is l.s.c. at 0, then inf (P ) = sup (P ∗ ). Proof. Since F is proper, h(0) = inf x∈X F (x) < ∞. Since h is convex by Lemma D.1, it follows from Theorem C.4 that h(0) = h∗∗ (0). Thus sup (P ∗ ) = supy∗ ∈Y ∗ (−Φ(0, y ∗ )) = supy∗ ∈Y ∗ ((hy ∗ , 0i − h∗ (y ∗ )) = h∗∗ (0) = h(0) = inf (P )square 122 Theorem D.4 If h is subdifferentiable at 0, then inf (P ) = sup (P ∗ ) and ∂h(0 is the set of solutions of (P ∗ ). Proof. By Lemma D.2 y¯∗ solves (P ∗ ) if and only if −h∗ (¯ y ∗ ) = −Φ(0, y¯∗ ) = supy∗ ∈Y ∗ (−Φ(0, y ∗ )) = supy∗ (hy ∗ , 0) − h∗ (y ∗ )) = h∗∗ (0). By Theorem 2.2 h∗∗ (0) + h∗∗∗ (¯ y ∗ ) = h¯ y ∗ , 0) = 0 if and only if y¯∗ ∈ ∂h∗∗ (0). Since h∗∗∗ = h∗ by Theorem C.5, y ∗ solves (P ∗ ) if and only if y ∗ ∈ ∂h∗∗ (y ∗ ). Since ∂h(0) is not empty, ∂h(0) = ∂h∗∗ (0). Therefore ∂h(0) is the set of all solutions of (P ∗ ) and (P ∗ ) has at least one solution. Let y ∗ ∈ ∂h(0). Then hy ∗ , xi + h(0) ≤ h(x) for all x ∈ X. Let {xn } is a sequence in X such that xn → 0. lim inf h(xn ) ≥ limhy ∗ , xn i + h(0) = h(0) n→∞ and h is l.s.c. at 0. By Theorem D.3 inf (P ) = sup (P ∗ ). Corollary D.6 If there exists an x ¯ ∈ X such that Φ(¯ x, ·) is finite and continuous at 0, h is continuous on an open neighborhood U of 0 and h = h∗∗ . Moreover, inf (P ) = sup (P ∗ ) and ∂h(0) is the set of solutions of (P ∗ ). ¯ ·) is bounded above on an open Proof. First show that h is continuous. Clearly, Φ(X, neighborhood U of 0. Since for all y ∈ Y h(y) ≤ Φ(¯ x, y), his bounded above on U . Since h is convex by Lemma D.1 and h is continuous by Theorem C.6. Hence h = h∗∗ by Theorem C. Now, h is subdifferential at 0 by Theorem C.10. The conclusion follows from Theorem D.4. Example D.4 Consider the example (7.4)–(7.5), i.e., Φ(x, y) = f (x) + ϕ(Λx + y) for (12.1). Let us calculate the conjugate of Φ. Φ∗ (x∗ , y ∗ ) = supx∈X supy∈Y {hx∗ , xi + hy ∗ , yi − Φ(x, y)} = supx∈X supy∈Y {hx∗ , xi + hy ∗ , yi − f (x) − ϕ(Λx + y)} = supx∈X {hx∗ , xi − f (x) supy∈Y [hy ∗ , yi − ϕ(Λx + y)]} where supy∈Y [hy ∗ , yi − ϕ(Λx + y)] = supy∈Y [hy ∗ , Λx + yi − ϕ(Λx + y) − hy ∗ , Λxi] = supz∈Y [hy ∗ , zi − G(z)] − hy ∗ , Λxi = ϕ(y ∗ ) − hy ∗ , Λxi. 123 Thus Φ∗ (x∗ , y ∗ ) = supx∈X {hx∗ , xi − hy ∗ , Λxi − f (x) + ϕ(y ∗ )} = sup {hx∗ − Λ∗ y ∗ , xi − f (x) + ϕ(y ∗ )} = f ∗ (x∗ − Λy ∗ ) + ϕ∗ (y ∗ ). x∈X Theorem D.7 For any x ¯ ∈ X and y¯∗ ∈ Y ∗ , the following statements are equivalent. (1) x ¯ solves (P ), y¯∗ solves (P ∗ ), and min (P ) = max (P ∗ ). (2) Φ(¯ x) + Φ∗ (0, y¯∗ ) = 0. (3) (0, y¯∗ ) ∈ ∂Φ(¯ x, 0). Proof: Suppose (1) holds, then obviously (2) holds. Suppose (2) holds, then since Φ(¯ x, 0) = F (¯ x) ≥ inf (P ) ≥ sup (P ∗ ) ≥ −Φ(0, y¯∗ ) by Theorem D.2, thus Φ(¯ x, 0) = min (P ) = sup (P ∗ ) = −Φ∗ (0, y¯∗ ) Therfore (2) implies (1). Since h(0, y¯∗ ), (¯ x, 0)i = 0, (2) and (3) by Theorem C.7. Any solution y ∗ of (P ∗ ) is called a Lagrange multiplier associated with Φ. 
In fact for the above Example the optimality condition implies 0 = Φ(¯ x, 0) + Φ∗ (0, y¯∗ ) = F (¯ x) + F ∗ (−Λ∗ y¯∗ ) + ϕ(Λ¯ x) + ϕ(¯ y∗) (7.6) = [F (¯ x) + F ∗ (−Λ∗ y¯∗ ) − h−Λ∗ y¯∗ , x ¯i] + [ϕ(Λ¯ x) + ϕ∗ (¯ y ∗ ) − h¯ y ∗ , Λ¯ x)] Since each expression of (7.6) in square brackets is nonnegative, it follows that F (¯ x) + F ∗ (−Λ∗ y¯∗ ) − h−Λ∗ y¯∗ , x ¯i = 0 ϕ(Λ¯ x) + ϕ∗ (¯ y ∗ ) − h¯ y ∗ , Λ¯ x) = 0 By Theorem 2.2 0 ∈ ∂F (¯ x) + Λ∗ y¯∗ (7.7) y¯∗ ∈ ∂ϕ(Λ¯ x). The function L : X × Y ∗ → (−∞, ∞] definded by −L(u, y ∗ ) = sup {hy ∗ , yi − Φ(x, y)} (7.8) y∈Y is called the Lagrangian. Note that Φ∗ (x∗ , y ∗ ) = {hx∗ , xi + hx∗ , xi − Φ(x, y)} sup x∈X, y∈Y = sup hx∗ , xi + sup {hx∗ , xi − Φ(x, y)}} = sup hx∗ , xi − L(x, y ∗ ) x∈X y∈Y x∈X Thus −Φ∗ (0, y ∗ ) = inf L(x, y ∗ ) x∈X 124 (7.9) and thus the dual problem (P ∗ ) is equivalently written as sup inf L(x, y ∗ ). y ∗ ∈Y ∗ x∈X Similarly, if Φ is a proper convex l.s.c. function, then the bi-conjugate of Φ in y, Φ∗∗ x = Φ and ∗ ∗ ∗ Φ(x, y) = Φ∗∗ x (y) = supy ∗ ∈Y ∗ {hy , yi − Φx (y )} = sup {hy ∗ , yi + L(u, y ∗ )}. y ∗ ∈Y ∗ Hence Φ(u, 0) = sup L(x, y ∗ ) (7.10) y ∗ ∈Y ∗ and the primal problem (P ) is equivalently written as inf sup L(x, y ∗ ). x∈X y ∗ ∈Y By introducing the Lagrangian L, the Problems (P ) and (P ∗ ) are formulated as minmax problems, which arise in the game theory. By Theorem D.0 sup inf L(x, y ∗ ) ≤ inf sup L(x, y ∗ ). y∗ x x y∗ Theorem D.8 (Saddle Point) Assume Φ is a proper convex l.s.c. function. Then the followings are equivalent. (1) (¯ x, y¯∗ ) ∈ X × Y ∗ is a saddle point of L, i.e., L(¯ u, y ∗ ) ≤ L(¯ u, y¯∗ ) ≤ L(x, y¯∗ ) for all x ∈ X, y ∗ ∈ Y ∗ . (7.11) (2) x ¯ solves (P ), y¯∗ solves (P ∗ ), and min (P ) = max (P ∗ ). Proof: Suppose (1) holds. From (7.9) and (7.11) L(¯ x, y¯∗ ) = inf L(x, y¯∗ ) = −Φ∗ (0, y¯∗ ) x∈X and form (7.10) and (7.11) L(¯ x, y¯∗ = sup L(¯ x, y ∗ ) = Φ(¯ x, 0) y ∗ ∈Y ∗ Thus, Φ∗ (¯ u, 0) + Φ(0, y¯∗ ) = 0 and (2) follows from Theorem D.7. Conversely, if (2) holds, then from (7.9) and (7.10) −Φ∗ (0, y¯∗ ) = inf x∈X L(x, y¯∗ ) ≤ L(¯ x, y¯∗ ) Φ∗ (¯ x, 0) = supy∗ ∈Y ∗ L(¯ xx, y) ≥ L(¯ x, y¯∗ ) By Theorem D.7 −Φ∗ (0, y¯∗ = Φ∗ (¯ x, 0) and (7.11) holds. Theorem D.8 implies that no duality gup between (P ) and (P ∗ ) is equivalent to the saddle point property of the pair (¯ x, y¯∗ ). 125 For the Example L(x, y ∗ ) = f (x) + hy ∗ , Λxi − ϕ(y ∗ ) (7.12) If (¯ x, y¯∗ ) is a saddle point, then from (7.11) 0 ∈ ∂f (¯ x) + Λ∗ y¯∗ (7.13) 0 ∈ Λ¯ x − ∂ϕ∗ (¯ x). It follows from Theorem 2.2 that the second equation is equivalent to y¯∗ ∈ ∂ϕ(Λ¯ x) and (7.13) is equivalent to (7.7). 7.3 Monotone Operators and Yosida-Morrey approximations 7.4 Lagrange multiplier Theory In this action we introduce the generalized Lagrange multiplier theory. We establish ¯ and derive the optimalthe conditions for the existence of the Lagrange multiplier λ ity condition (1.7)–(1.8). In Section 1.5 it is shown that the both Uzawa and augmented Lagrangian method are fixed point methods for the complementarity condition (??))and the convergence analysis is presented . In Section 1.6 we preset the concrete examples and demonstrate the applicability of the results in Sections 1.4–1.5. 8 Optimization Algorithms In this section we discuss iterative methods for finding a solution to the optimization problem. First, consider the unconstrained minimization: min J(x) over a Hilbert space X. Assume J : X → R+ is continuously differentiable. If u∗ ∈ X minimizes J, then the necessarily optimality is given by J 0 (u∗ ) = 0. 
The gradient method is given by xn+1 = xn − α J 0 (xn ) (8.1) where α > 0 is a stepsize chosen so that the decent J(xn+1 ) < J(xn ) holds. Let gn = J 0 (xn ) and since J(xn − α gn ) − J(xn ) = −α |J 0 (xn )|2 + o(α), (8.2) there exits a α > 0 satisfies the decent property. A stepsize α > 0 can be determined by min J(xn − α gn ) (8.3) α>0 or by a line search method (e.g, Amijo-rule) 126 Next, consider the constrained minimization min J(x) over x ∈ C. (8.4) where C is a closed convex set in X. It follows from Theorem 7 that if J : X → R is C 1 and x∗ ∈ C is a minimizer, then (J 0 (x∗ ), x − x∗ ) ≥ 0 for all x ∈ C. (8.5) Define the orthogonal projection operator PC of X onto C, i.e., z ∗ = PC x minimizes |z − x|2 over z ∈ C. From Theorem 7, z ∗ ∈ C satisfies (z ∗ − x, z − z ∗ ) ≥ 0 for all z ∈ C. (8.6) Note that (8.5) is equivalent to (x∗ − (x∗ − α J 0 (x∗ )), x − x∗ ) ≥ 0 for all x ∈ C and α > 0. Thus, it is follows from (8.6) that (8.5) is equivalent to x∗ = PC (x∗ − α J 0 (x∗ )). The projected gradient method for (8.4) is defined by xn+1 = PC (xn − αn J 0 (xn )) (8.7) with a stepsize αn > 0. Define the the Hessian H = J 00 (x0 ) ∈ L(X, X) of J at x0 by J(x) − J(x0 ) − (J 0 (x0 ), x − x0 ) = 1 (x − x0 , H(x − x0 )) + o(|x − x0 |2 ). 2 Thus, we minimize over x J(x) ∼ J(x0 ) + (J 0 (x0 ), x − x0 ) + 1 (x − x0 , H(x − x0 )) 2 we obtain the damped-Newton method for equation given by xn+1 = xn − α H(xn )−1 J 0 (xn ), (8.8) where α > 0 is a stepsize and α = 1 corresponds to the Newton method. In the case of min J(x) over x ∈ C we obtain the projected Newton method xn+1 = P rojC (xn − α H(xn )−1 J 0 (xn )). (8.9) Computing the Hessian can be expensive and numerical evaluation can also be less accurate. The finite rank scant update (BFGS and GFP) is used to substitute H (Quasi-Newton method). It is a variable merit method and such updates are preconditioner for the gradient method. 127 Consider the (regularized) nonlinear least square problem min 1 α |F (x)|2Y + (P x, x)X 2 2 (8.10) where F : X → Y is C 1 and P is a nonnegative self-adjoint operator on a Hilbert space X. The Gauss-Newton method is an iterative method of the form 1 α xn+1 = argmin{ |F 0 (xn )(x − xn ) + F (xn )|2Y + (P x, x)X }, 2 2 i.e., xn+1 = xn − (F 0 (xn )∗ F (xn ) + α P )−1 (F 0 (xn )∗ F (xn )). 8.1 (8.11) Constrained minimization and Lagrange multiplier method Consider the constrained minimization min J(x, u) = F (x) + H(u) subject to E(x, u) = 0 and u ∈ C. (8.12) We use the implicit function theory for developing algorithms for (8.12). x, u ¯) satisfies Implicit Function Theory Let E : X × U → X is C 1 . Suppose a pair (¯ E(x, u) = 0 and Ex (¯ x, u ¯) is bounded invertible. Then the exists a δ > 0 such that for |u − u ¯| < δ equation E(x, u) = 0 has a locally defined unique solution x = Φ(u). Moreover, Φ : U → X is continuously differentiable at u ¯ and x˙ = Φ0 (¯ x)(d) satisfies Ex (¯ x, u ¯)x˙ + Eu (¯ x, u ¯)d = 0. Theorem (Lagrange Calculus) Assume that E(x, u) = 0 and Ex (x, u) is bounded invertible. Then, by the implicit function theory x = Φ(u) and for J(u) = J(Φ(u), u) (J 0 (u), d) = (H 0 (u), d) + (Eu (x, u)∗ λ, d) where λ ∈ Y satisfies Ex (x, u)∗ λ + F 0 (x) = 0. Proof: From the implicit function theory (J 0 , d) = (F 0 (x), x) ˙ + (H 0 (u), d), where Ex (x, u)x˙ + Eu (x, u) = 0. Thus, the claim follows from (F 0 (x), x) ˙ = −(Ex∗ λ, x) ˙ = −(λ, Ex x) ˙ = (λ, Eu d). Hence we obtain the gradient method for equality constraint problem (8.12); Gradient method 1. Solve for xn E(xn , un ) = 0 128 2. 
Solve for λn Ex (xn , un )∗ λn + F 0 (xn ) = 0 3. Compute the gradient by gn = H 0 (un ) + Eu (xn , un )∗ λn 4. Gradient Step: Update un+1 = PC (un − α gn ) with a line search for α > 0. 8.2 Conjugate gradient method In this section we develop the conjugate gradient and residual method for the saddle point problem. Consider the quadratic programing 1 J(x) = (Ax, x)X − (b, x)X 2 where A is positive self-adjioint operator on a Hilbert space X. Then, the negative gradient of J is given by r = −J 0 (x) = b − Ax. The gradient method is given by xk+1 = xk + αk pk , pk = rk , where the step size αk is selected by min J(xk + α rk ) over α > 0. That is, the Cauchy step is given by 0 = (J 0 (xk+1 ), pk ) = α (Apk , pk ) − (pk , rk ) ⇒ αk = (pk , rk ) . (Apk , pk ) (8.13) It is known that the gradient method with Cauchy step searches the solution on a subspace spanned by the major and minor axes of A dominantly, and exhibits a zigzag pattern of its iterates. If the condition number ρ= max σ(A) min σ(A) is large, then it is very slow convergent. In order to improve the convergence we introduce the conjugacy of the search directions pk and it can be achieved by the conjugate gradient method. For example, a new search direction pk is extracted for the residual rk = b − Axk by pk = rk − β pk−1 and the orthogonality 0 = (Apk , pk−1 ) = (rk , Apk−1 ) − β(Apk−1 , pk−1 ) 129 implies βk = (rk , Apk−1 ) . (Apk−1 , pk−1 ) (8.14) Let P is the pre-conditioner, self-adjiont operator so that the condition number of 1 1 P − 2 AP − 2 is much smaller that the one for A. If we apply (8.13)(8.14) with the pre-conditioner P , we have the pre-conditioned conjugate gradient method: • Initialize x0 and set r0 = b − Ax0 , z0 = P −1 r0 , p0 = z0 • Iterate on k until (rk , zk ) is sufficiently small αk = (rk , zk ) , (pk , Apk ) xk+1 = xk + αk pk , rk+1 = rk − αk Apk . (zk+1 , rk+1 ) , (zk , rk ) pk+1 := zk+1 + βk pk zk+1 = P −1 rk+1 , βk = Here, zk is the pre-conditioned residual and (rk , zk ) = (rk , P −1 rk ). The above formulation is equivalent to applying the conjugate gradient method without preconditioned system 1 1 1 P − 2 AP − 2 x ˆ = P − 2 b, 1 where x ˆ = P 2 x. We have the conjugate gradient theory. Theorem (Conjugate gradient method) (1) We have conjugate property (rk , zj ) = 0, (pk , Apj ) = 0 for 0 ≤ j < k. (2) xm+1 minimizes J(x) over x ∈ x0 + span{p0 , · · · , pm }. Proof: For k = 1 (r1 , z0 ) = (r0 , z0 ) − α0 (Az0 , z0 ) = 0 and since (r1 , z1 ) = (r0 , z1 ) − α0 (r1 , Az0 ) (p1 , Ap0 ) = (z1 , Az0 ) + β0 (z0 , Az0 ) = − 1 (r1 , z1 ) + β0 (z0 , Az0 ) = 0 α0 Note that span{p0 , · · · , pk } = span{z0 , · · · , zk }. In induction in k, for j < k (rk+1 , zj ) = (rk , zj ) − αk (Apk , pj ) = 0 and ((rk+1 , zk ) = (rk , zk ) − αk (Apk , pk ) = 0 Also, for j < k (pk+1 , Apj ) = (zk+1 , Apj ) + βk (pk , Apj ) = − 1 (zk+1 , rj+1 − rj ) = 0 αj and (pk+1 , Apk ) = (zk+1 , Apk ) + βk (pk , Apk ) = − 130 1 (rk+1 , zk ) + βk (pk , Apk ) = 0. αk Hence (1) holds. For (2) we note that (J 0 (xm+1 ), pj ) = (rm+1 , pj ) = 0 for j ≤ m and xm+1 = x0 + span{p0 , · · · , pm }. Thus, (2) holds. Remark Note that rk = r0 − k−1 X αj P −1 Apj j=0 Thus, |rk |2 = |r0 |2 − (2 k−1 X αj P −1 Apj , r0 ) + |αj |. Consider the constrained quadratic programing min 1 (Qy, y)X − (a, y)X 2 subject to Ey = b in Y. Then the necessary optimality condition is written for x = (y, λ) ∈ X × Y : Q E∗ , A(y, λ) = (a, b) where A = E 0 where A is self-adjoint but is indefinite. In general we assume A is self-adjoint and invertible. 
Consider the least square problem 1 1 1 |Ax − b|2X = (AAx, x)X − (Ab, x)X + |b|2 . 2 2 2 We consider the conjugate directions {pk } satisfying (Apk , Apj ) = 0, j < k. The preconditioned conjugate residual method may be derived in the same way as done for the conjugate gradient method: • Initialize x0 and set r0 = P −1 (b − Ax0 ), p0 = r0 , (Ap)0 = Ap0 • Iterate with k until (rk , rk ) is sufficiently small αk = (rk , Ark ) , ((Ap)k , P −1 (Ap)k ) βk = (rk+1 , Ark+1 ) , (rk , Ark ) xk+1 = xk + αk pk , pk+1 = rk+1 + βk pk , rk+1 = rk − αk P −1 (Ap)k . (Ap)k+1 = Ark+1 + βk (Ap)k Theorem (Conjugate residual method) Assuming αk 6= 0, (1) We have conjugate property (rj , Ark ) = 0, (Apk , P −1 Apj ) = 0 for 0 ≤ j < k. (2) xm+1 minimizes |Ax − b|2 over x ∈ x0 + span{p0 , · · · , pm }. 131 Proof: For k = 1 (Ar0 , r1 ) = (Ar0 , r0 ) − α0 (Ap0 , P −1 Ap0 ) = 0 and since (Ap0 , P −1 Ap1 ) = (P −1 Ap0 , Ar1 ) + β0 (Ap0 , P −1 Ap0 ) and (P −1 Ap0 , Ar1 ) = 1 1 α0 (r0 − r1 , Ar1 ) = − α0 (r1 , Ar1 ), we have (Ap0 , P −1 Ap1 ) = − 1 (r1 , Ar1 ) + β0 (Ap0 , P −1 Ap0 ) = 0. α0 Note that span{p0 , · · · , pk } = span{r0 , · · · , rk } and rk = r0 +P −1 A span{p0 , · · · , pk−1 }. In induction in k, for j < k (rk+1 , Arj ) = (rk , Arj ) − αk (P −1 Apk , Arj ) = 0 and (Ark+1 , rk ) = (Ark , rk ) − αk (Apk , P −1 Apk ) = 0 Also, for j < k (Apk+1 , P −1 Apj ) = (Ark+1 , P −1 Apj ) + βk (Apk , P −1 Apj ) = − 1 (Ark+1 , rj+1 − rj ) = 0 αj and (Apk+1 , P −1 Apk ) = (Ark+1 , P −1 Apk )+βk (Apk , P −1 Apk ) = − 1 (rk+1 , Ark+1 )+βk (pk , Apk ) = 0. αk Hence (1) holds. For (2) we note that (Axm+1 − b, P −1 Apj ) = −(rm+1 , P −1 Apj ) = 0 for j ≤ m and xm+1 = x0 + span{p0 , · · · , pm }. Thus, (2) holds. direct inversion in the iterative subspace (DIIS) 8.3 Nonlinear Conjugate Residual method One can extend the conjugate residual method for the nonlinear equation. Consider the equality constrained optimization min J(y) subject to E(y) = 0. The necessary optimality system for x = (y, λ) is 0 J (y) + E 0 (y)∗ λ = 0. F (y, λ) = E(y) Multiplication of vector by the Jacobian F 0 can be approximated by the difference of two residual vectors by F (xk + t y) − F (xk ) ∼ Ay with t |y| ∼ 1. t Thus, we have the nonlinear version of the conjugate residual method: Nonlinear Conjugate Residual method 132 • Calculate sk for approximating Ark by sk = F (xk + t rk ) − F (xk ) 1 , t= . t |rk | • Update the direction by pk = P −1 rk − βk pk−1 , βk = (sk , rk ) . (sk−1 , rk−1 ) • Calculate qk = sk − βk qk−1 . • Update the solution by xk+1 = xk + αk pk , αk = (sk , rk ) . (qk , P −1 qk ) • Calculate the residual rk+1 = F (xk+1 ). 9 Newton method and SQP Next, we develop the Newton method for J 0 (u) = 0. Define the Lagrange functional L(x, u, λ) = F (x) + H(u) + (λ, E(x, u)). Note that J 0 (u) = Lu (x(u), u, λ(u)) where (x(u), λ(u)) satisfies E(x(u), u) = 0, E 0 (x(u))∗ λ(u) + F 0 (x(u)) = 0 where we assumed Ex (x, u) is bounded invertible. By the implicit function theory J 00 (u) = Lux (x(u), u, λ(u))x˙ + Eu (x(u), u, λ(u))∗ λ˙ + Luu (x(u), u, λ(u)), ˙ satisfies where (x, ˙ λ) x˙ Lxu (x, u, λ) Lxx (x, u, λ) Ex (x, u)∗ = 0. + Ex (x, u) 0 Eu (x, u) λ˙ Thus, x˙ = −(Ex (x, u))−1 Eu (x, u), λ˙ = −(Ex (x, u)∗ )−1 (Lxx (x, u, λ)x˙ + Lxu (x, u, λ)) and J 00 (u) = Lux x˙ − Eu (x, u)∗ (Ex (x, u)∗ )−1 (Lux x˙ + Lxu ) + Luu (x, u, λ). 
9 Newton method and SQP

Next we develop the Newton method for $J'(u) = 0$. Define the Lagrange functional
\[ L(x,u,\lambda) = F(x) + H(u) + (\lambda, E(x,u)). \]
Note that $J'(u) = L_u(x(u), u, \lambda(u))$, where $(x(u), \lambda(u))$ satisfies
\[ E(x(u), u) = 0, \qquad E_x(x(u),u)^*\lambda(u) + F'(x(u)) = 0, \]
and we assume $E_x(x,u)$ is boundedly invertible. By the implicit function theorem,
\[ J''(u) = L_{ux}(x(u),u,\lambda(u))\,\dot x + E_u(x(u),u)^*\dot\lambda + L_{uu}(x(u),u,\lambda(u)), \]
where $(\dot x, \dot\lambda)$ satisfies
\[ \begin{pmatrix} L_{xx}(x,u,\lambda) & E_x(x,u)^* \\ E_x(x,u) & 0 \end{pmatrix} \begin{pmatrix} \dot x \\ \dot\lambda \end{pmatrix} + \begin{pmatrix} L_{xu}(x,u,\lambda) \\ E_u(x,u) \end{pmatrix} = 0. \]
Thus
\[ \dot x = -(E_x(x,u))^{-1}E_u(x,u), \qquad \dot\lambda = -(E_x(x,u)^*)^{-1}(L_{xx}(x,u,\lambda)\dot x + L_{xu}(x,u,\lambda)), \]
and
\[ J''(u) = L_{ux}\dot x - E_u(x,u)^*(E_x(x,u)^*)^{-1}(L_{xx}\dot x + L_{xu}) + L_{uu}(x,u,\lambda). \]

Theorem (Lagrange Calculus II) The Newton step
\[ J''(u)(u^+ - u) + J'(u) = 0 \]
is equivalently written as
\[ \begin{pmatrix} L_{xx}(x,u,\lambda) & L_{xu}(x,u,\lambda) & E_x(x,u)^* \\ L_{ux}(x,u,\lambda) & L_{uu}(x,u,\lambda) & E_u(x,u)^* \\ E_x(x,u) & E_u(x,u) & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta u \\ \Delta\lambda \end{pmatrix} + \begin{pmatrix} 0 \\ E_u(x,u)^*\lambda + H'(u) \\ 0 \end{pmatrix} = 0, \]
where $\Delta x = x^+ - x$, $\Delta u = u^+ - u$ and $\Delta\lambda = \lambda^+ - \lambda$.

Proof: Define the linear operator $T$ by
\[ T(x,u) = \begin{pmatrix} -E_x(x,u)^{-1}E_u(x,u) \\ I \end{pmatrix}. \]
Then the Newton method is equivalently written as
\[ T(x,u)^*\Big( L''(x,u,\lambda)\, T(x,u)\,\Delta u + \begin{pmatrix} 0 \\ E_u(x,u)^*\lambda + H'(u) \end{pmatrix} \Big) = 0. \]
Since $E'(x,u)$ is surjective and $E'(x,u)T(x,u) = 0$, it follows from the closed range theorem that
\[ N(T(x,u)^*) = R(T(x,u))^\perp = N(E'(x,u))^\perp = R(E'(x,u)^*). \]
Hence the term in parentheses equals $-E'(x,u)^*\Delta\lambda$ for some $\Delta\lambda$, which together with $(\Delta x, \Delta u) = T(x,u)\Delta u$, i.e. $E'(x,u)(\Delta x, \Delta u) = 0$, gives the claimed update.

Thus we have the Newton method for the equality constraint problem (8.12).

Newton method
1. Given $u_0$, initialize $(x_0,\lambda_0)$ by $E(x_0,u_0) = 0$, $E_x(x_0,u_0)^*\lambda_0 + F'(x_0) = 0$.
2. Newton step: solve for $(\Delta x, \Delta u, \Delta\lambda)$
\[ \begin{pmatrix} L''(x_n,u_n,\lambda_n) & E'(x_n,u_n)^* \\ E'(x_n,u_n) & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta u \\ \Delta\lambda \end{pmatrix} + \begin{pmatrix} 0 \\ E_u(x_n,u_n)^*\lambda_n + H'(u_n) \\ 0 \end{pmatrix} = 0 \]
and update $u_{n+1} = \mathrm{Proj}_C(u_n + \alpha\,\Delta u)$ with a stepsize $\alpha > 0$.
3. Feasible step: solve for $(x_{n+1}, \lambda_{n+1})$
\[ E(x_{n+1}, u_{n+1}) = 0, \qquad E_x(x_{n+1},u_{n+1})^*\lambda_{n+1} + F'(x_{n+1}) = 0. \]
4. Stop, or set $n = n+1$ and return to step 2.

Here
\[ L''(x_n,u_n,\lambda_n) = \begin{pmatrix} F''(x_n) & 0 \\ 0 & H''(u_n) \end{pmatrix} + \begin{pmatrix} (\lambda_n, E_{xx}(x_n,u_n)) & (\lambda_n, E_{xu}(x_n,u_n)) \\ (\lambda_n, E_{ux}(x_n,u_n)) & (\lambda_n, E_{uu}(x_n,u_n)) \end{pmatrix} \]
and $E'(x_n,u_n) = (E_x(x_n,u_n), E_u(x_n,u_n))$.

Consider the general equality-constrained minimization
\[ \min J(y) \quad \text{subject to } E(y) = 0. \]
The necessary optimality condition is given by
\[ J'(y) + E'(y)^*\lambda = 0, \qquad E(y) = 0, \quad (9.1) \]
where we assume $E'(y)$ is surjective. Problem (8.12) is a specific case of this, with $y = (x,u)$, $J(y) = F(x) + H(u)$, $E = E(x,u)$. Applying the Newton method to the necessary optimality system (9.1) for $(y,\lambda)$, i.e. to $(L_x(x,u,\lambda), L_u(x,u,\lambda), E(x,u)) = 0$, results in SQP (sequential quadratic programming):

Newton method applied to the necessary optimality system (SQP)
1. Newton direction: solve for $(\Delta y, \Delta\lambda)$
\[ \begin{pmatrix} L''(y_n,\lambda_n) & E'(y_n)^* \\ E'(y_n) & 0 \end{pmatrix}\begin{pmatrix} \Delta y \\ \Delta\lambda \end{pmatrix} + \begin{pmatrix} J'(y_n) + E'(y_n)^*\lambda_n \\ E(y_n) \end{pmatrix} = 0. \quad (9.2) \]
2. Update $y_{n+1} = y_n + \alpha\,\Delta y$ and $\lambda_{n+1} = \lambda_n + \alpha\,\Delta\lambda$.
3. Stop, or set $n = n+1$ and return to step 1.

Here $L''(y,\lambda) = J''(y) + E''(y)^*\lambda$. Both Newton methods solve the same linear system (9.2) with different right-hand sides. In the first Newton method we iterate on the variable $u$ only, and each step enforces the feasibility conditions $E(x_n,u_n) = 0$ and $L_x(x_n,u_n,\lambda_n) = 0$ for $(x_n,\lambda_n)$, as in the gradient method; it then computes the Newton direction for $J'(u)$ and updates $u$. In this sense the second Newton method is a relaxed version of the first one. A schematic implementation of the SQP step (9.2) is sketched below.
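The following sketch performs one SQP step (9.2) in dense linear algebra. The arrays `Jp`, `Lpp`, `E`, `Ep` stand for $J'(y_n)$, $L''(y_n,\lambda_n)$, $E(y_n)$, $E'(y_n)$ and are assumed to be supplied by the user; a damped step $\alpha < 1$ can be applied exactly as in step 2 of the listing.

```python
import numpy as np

def sqp_step(y, lam, Jp, Lpp, E, Ep):
    """One Newton/SQP step (9.2) for min J(y) s.t. E(y) = 0.
    Solves the KKT system for (dy, dlam) and returns the updated pair."""
    n, m = y.size, lam.size
    KKT = np.block([[Lpp, Ep.T],
                    [Ep, np.zeros((m, m))]])
    rhs = -np.concatenate([Jp + Ep.T @ lam, E])   # right-hand side of (9.2)
    step = np.linalg.solve(KKT, rhs)
    return y + step[:n], lam + step[n:]
```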
9.1 Sequential programming

In this section we discuss sequential programming, an intermediate method between the gradient method and the Newton method. Consider the constrained optimization
\[ \min F(y) \quad \text{subject to } E(y) = 0, \; y \in C. \]
We linearize the equality constraint and consider a sequence of constrained optimizations
\[ \min F(y) \quad \text{subject to } E'(y_n)(y - y_n) + E(y_n) = 0, \; y \in C. \quad (9.3) \]
The necessary optimality condition (if $E'(y_n)$ is surjective) is given by
\[ (F'(y) + E'(y_n)^*\lambda, \tilde y - y) \ge 0 \ \text{ for all } \tilde y \in C, \qquad E'(y_n)(y - y_n) + E(y_n) = 0. \quad (9.4) \]
Note that the necessary condition for the solution $(y^*, \lambda^*)$ can be written as
\[ (F'(y^*) + E'(y_n)^*\lambda^* - (E'(y_n)^* - E'(y^*)^*)\lambda^*, \tilde y - y^*) \ge 0 \ \text{ for all } \tilde y \in C, \]
\[ E'(y_n)(y^* - y_n) + E(y_n) = E'(y_n)(y^* - y_n) + E(y_n) - E(y^*). \quad (9.5) \]
Suppose we have Lipschitz continuity of solutions to (9.4) and (9.5) with respect to the perturbations $\Delta_1 = (E'(y_n)^* - E'(y^*)^*)\lambda^*$ and $\Delta_2 = E'(y_n)(y^* - y_n) + E(y_n) - E(y^*)$; then
\[ |y - y^*| \le c_1 |(E'(y_n)^* - E'(y^*)^*)\lambda^*| + c_2 |E'(y_n)(y^* - y_n) + E(y_n) - E(y^*)|. \]
Thus, for the (damped) update
\[ y_{n+1} = (1-\alpha)y_n + \alpha\,y, \quad \alpha \in (0,1), \quad (9.6) \]
we have the estimate
\[ |y_{n+1} - y^*| \le (1 - \alpha + \alpha\gamma)|y_n - y^*| + c\,\alpha\,|y_n - y^*|^2 \quad (9.7) \]
for some constants $\gamma$ and $c > 0$. In fact, for the case $C = X$, let $z_n = (y_n, \lambda_n)$, $z^* = (y^*, \lambda^*)$ and define
\[ G_n = \begin{pmatrix} F''(y_n) & E'(y_n)^* \\ E'(y_n) & 0 \end{pmatrix}. \]
For the update $z_{n+1} = (1-\alpha)z_n + \alpha z$, $\alpha \in (0,1]$, we have
\[ z_{n+1} - z^* = (1-\alpha)(z_n - z^*) + \alpha\,G_n^{-1}\Big( \begin{pmatrix} (E'(y_n)^* - E'(y^*)^*)\lambda^* \\ 0 \end{pmatrix} + \delta_n \Big) \]
with
\[ \delta_n = \begin{pmatrix} F'(y^*) - F'(y_{n+1}) - F''(y_n)(y^* - y_{n+1}) \\ E(y^*) - E'(y_n)(y^* - y_n) - E(y_n) \end{pmatrix}. \]
Note that if $F''(y_n) = A$, then
\[ G_n^{-1}\begin{pmatrix} \Delta_1 \\ 0 \end{pmatrix} = \begin{pmatrix} A^{-1}\Delta_1 - A^{-1}E'(y_n)^*\tilde e \\ \tilde e \end{pmatrix}, \qquad \tilde e = (E'(y_n)A^{-1}E'(y_n)^*)^{-1}E'(y_n)A^{-1}\Delta_1. \]
Thus we have
\[ \Big|G_n^{-1}\begin{pmatrix} (E'(y_n)^* - E'(y^*)^*)\lambda^* \\ 0 \end{pmatrix}\Big| \le \gamma\,|y_n - y^*|. \]
Since $|G_n^{-1}\delta_n| \le c\,|z_n - z^*|^2$, we have
\[ |z_{n+1} - z^*| \le (1 - \alpha + \alpha\gamma)|z_n - z^*| + \alpha c\,|z_n - z^*|^2, \]
which implies (9.7). When either the variation of $E'$ is dominated by the linearization, i.e.
\[ |E'(y^*)^\dagger (E'(y_n) - E'(y^*))| \le \beta\,|y_n - y^*| \]
with small $\beta > 0$, or the multiplier $|\lambda^*|$ is small, then $\gamma > 0$ is sufficiently small. The estimate (9.7) suggests that when the quadratic term $|y_n - y^*|^2$ dominates we use a small $\alpha > 0$ to damp the update, and when the linear term dominates we let $\alpha \to 1$.

9.1.1 Second order version

The second order sequential programming is given by
\[ \min F(y) + \langle \lambda_n, E(y) - (E'(y_n)(y-y_n) + E(y_n)) \rangle \quad (9.8) \]
subject to $E'(y_n)(y-y_n) + E(y_n) = 0$ and $y \in C$. The necessary optimality condition for (9.8) is
\[ (F'(y) + (E'(y)^* - E'(y_n)^*)\lambda_n + E'(y_n)^*\lambda, \tilde y - y) \ge 0 \ \text{ for all } \tilde y \in C, \qquad E'(y_n)(y - y_n) + E(y_n) = 0. \quad (9.9) \]
It follows from (9.5) and (9.9) that
\[ |y - y^*| + |\lambda - \lambda^*| \le c\,(|y_n - y^*|^2 + |\lambda_n - \lambda^*|^2). \]
In summary we have (a minimal driver for the damped iteration is sketched after the listings):

Sequential Programming I
1. Given $y_n \in C$, solve $\min F(y)$ subject to $E'(y_n)(y - y_n) + E(y_n) = 0$ over $y \in C$.
2. Update $y_{n+1} = (1-\alpha)\,y_n + \alpha\,y$, $\alpha \in (0,1)$. Iterate until convergence.

Sequential Programming II
1. Given $y_n \in C$ and $\lambda_n$, solve for $y \in C$:
\[ \min F(y) + (\lambda_n, E(y) - (E'(y_n)(y-y_n) + E(y_n))) \ \text{ subject to } E'(y_n)(y-y_n) + E(y_n) = 0, \; y \in C. \]
2. Update $(y_{n+1}, \lambda_{n+1}) = (y, \lambda)$. Iterate until convergence.
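The sketch below drives Sequential Programming I with the damped update (9.6). The function `solve_linearized` is an assumed user-supplied solver of the linearized subproblem (9.3); the damping factor and tolerance are illustrative.

```python
import numpy as np

def sequential_programming(solve_linearized, y0, alpha=0.5,
                           tol=1e-8, max_iter=100):
    """Sequential Programming I with damped update (9.6).
    solve_linearized(yn) is assumed to return a minimizer of F subject to
    the linearized constraint E'(yn)(y - yn) + E(yn) = 0 over y in C."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        y_new = (1 - alpha) * y + alpha * solve_linearized(y)
        if np.linalg.norm(y_new - y) < tol:   # damped update has stalled
            return y_new
        y = y_new
    return y
```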
9.1.2 Examples

Consider the constrained optimization of the form
\[ \min F(x) + H(u) \quad \text{subject to } E(x,u) = 0 \text{ and } u \in C, \quad (9.10) \]
where $H(u)$ may be nonsmooth. We linearize the equality constraint and consider a sequence of constrained optimizations
\[ \min F(x) + H(u) + \langle\lambda_n, E(x,u) - (E'(x_n,u_n)(x-x_n, u-u_n) + E(x_n,u_n))\rangle \quad (9.11) \]
subject to $E'(x_n,u_n)(x-x_n,u-u_n) + E(x_n,u_n) = 0$ and $u \in C$. Note that if $(x^*,u^*)$ is an optimizer of (9.10), then
\[ F'(x^*) + (E'(x^*,u^*)^* - E'(x_n,u_n)^*)\lambda_n + E'(x_n,u_n)^*\lambda^* = (E'(x_n,u_n)^* - E'(x^*,u^*)^*)(\lambda^* - \lambda_n) = \Delta_1, \]
\[ E'(x_n,u_n)(x^*-x_n, u^*-u_n) + E(x_n,u_n) = \Delta_2, \qquad |\Delta_2| \sim M(|x_n-x^*|^2 + |u_n-u^*|^2). \quad (9.12) \]
Thus $(x^*, u^*)$ is the solution to the perturbed problem of (9.11):
\[ \min F(x) + H(u) + \langle\lambda_n, E(x,u) - (E'(x_n,u_n)(x-x_n,u-u_n) + E(x_n,u_n))\rangle \quad (9.13) \]
subject to $E'(x_n,u_n)(x-x_n,u-u_n) + E(x_n,u_n) = \Delta$ and $u \in C$. Let $(x_{n+1}, u_{n+1})$ be the solution of (9.11). One can assume Lipschitz continuity of solutions to (9.13):
\[ |x_{n+1}-x^*| + |u_{n+1}-u^*| + |\lambda_{n+1} - \lambda^*| \le C\,|\Delta|. \quad (9.14) \]
From (9.12) and assumption (9.14) we obtain the quadratic convergence of the sequential programming method:
\[ |x_{n+1}-x^*| + |u_{n+1}-u^*| + |\lambda_{n+1} - \lambda^*| \le M(|x_n-x^*|^2 + |u_n-u^*|^2 + |\lambda_n-\lambda^*|^2). \quad (9.15) \]
Assuming $H$ is convex, the necessary optimality condition is given by
\[ \begin{aligned} & F'(x) + (E_x(x,u)^* - E_x(x_n,u_n)^*)\lambda_n + E_x(x_n,u_n)^*\lambda = 0, \\ & H(v) - H(u) + ((E_u(x,u)^* - E_u(x_n,u_n)^*)\lambda_n + E_u(x_n,u_n)^*\lambda,\, v-u) \ge 0 \ \text{ for all } v \in C, \\ & E_x(x_n,u_n)(x-x_n) + E_u(x_n,u_n)(u-u_n) + E(x_n,u_n) = 0. \end{aligned} \quad (9.16) \]
As in the Newton method we use the approximations
\[ (E_x(x,u)^* - E_x(x_n,u_n)^*)\lambda_n \sim \lambda_n(E_{xx}(x_n,u_n)(x-x_n) + E_{xu}(x_n,u_n)(u-u_n)), \qquad F'(x) \sim F''(x_n)(x-x_n) + F'(x_n), \]
and obtain
\[ \begin{aligned} & F''(x_n)(x-x_n) + F'(x_n) + \lambda_n(E_{xx}(x_n,u_n)(x-x_n) + E_{xu}(x_n,u_n)(u-u_n)) + E_x(x_n,u_n)^*\lambda = 0, \\ & E_x(x_n,u_n)x + E_u(x_n,u_n)u = E_x(x_n,u_n)x_n + E_u(x_n,u_n)u_n - E(x_n,u_n), \end{aligned} \]
which is a linear system of equations for $(x,\lambda)$. Thus $x = x(u)$, $\lambda = \lambda(u)$ is a continuous affine function of $u$:
\[ \begin{pmatrix} x(u) \\ \lambda(u) \end{pmatrix} = \begin{pmatrix} F''(x_n) + \lambda_n E_{xx}(x_n,u_n) & E_x(x_n,u_n)^* \\ E_x(x_n,u_n) & 0 \end{pmatrix}^{-1}\mathrm{RHS} \quad (9.17) \]
with
\[ \mathrm{RHS} = \begin{pmatrix} F''(x_n)x_n - F'(x_n) + \lambda_n E_{xx}(x_n,u_n)x_n - \lambda_n E_{xu}(x_n,u_n)(u-u_n) \\ E_x(x_n,u_n)x_n - E_u(x_n,u_n)(u-u_n) - E(x_n,u_n) \end{pmatrix}, \]
and the second inequality of (9.16) becomes a variational inequality for $u \in C$. Consequently we obtain the following algorithm:

Sequential Programming II (implicit form)
1. Given $u \in C$, solve (9.17) for $(x(u), \lambda(u))$.
2. Solve the variational inequality for $u \in C$:
\[ H(v) - H(u) + ((E_u(x(u),u)^* - E_u(x_n,u_n)^*)\lambda_n + E_u(x_n,u_n)^*\lambda(u),\, v-u) \ge 0 \ \text{ for all } v \in C. \quad (9.18) \]
3. Set $u_{n+1} = u$ and $x_{n+1} = x(u)$. Iterate until convergence.

Example (Optimal control problem) Let $(x,u) \in H^1(0,T;\mathbb{R}^n) \times L^2(0,T;\mathbb{R}^m)$. Consider the optimal control problem
\[ \min \int_0^T (\ell(x(t)) + h(u(t)))\,dt \]
subject to the dynamical constraint
\[ \frac{d}{dt}x(t) = f(x(t)) + Bu(t), \quad x(0) = x_0, \]
and the control constraint $u \in C = \{u : u(t) \in U \text{ a.e.}\}$, where $U$ is a closed convex set in $\mathbb{R}^m$. Then the necessary optimality condition for (9.11) is
\[ \begin{aligned} & \frac{d}{dt}x = f(x_n) + f'(x_n)(x-x_n) + Bu, \quad x(0) = x_0, \\ & -\frac{d}{dt}\lambda = f'(x_n)^t\lambda + \ell'(x), \quad \lambda(T) = 0, \\ & u(t) = \mathrm{argmin}_{v\in U}\,\{h(v) + (B^t\lambda(t), v)\}. \end{aligned} \]
If $h(u) = \frac{\alpha}{2}|u|^2$ and $U = \mathbb{R}^m$, then $u(t) = -\frac{B^t\lambda(t)}{\alpha}$. Thus (9.11) is equivalent to solving the two point boundary value problem for $(x,\lambda)$:
\[ \begin{aligned} & \frac{d}{dt}x = f(x_n) + f'(x_n)(x-x_n) - \frac{1}{\alpha}BB^t\lambda, \quad x(0) = x_0, \\ & -\frac{d}{dt}\lambda = (f'(x) - f'(x_n))^*\lambda_n + f'(x_n)^*\lambda + \ell'(x), \quad \lambda(T) = 0. \end{aligned} \]

Example (Inverse medium problem) Let $(y,v) \in H^1(\Omega) \times H^1(\Omega)$. Consider the inverse medium problem
\[ \min \frac12\int_\Gamma |y - z|^2\,ds_x + \frac{\alpha}{2}\int_\Omega (\beta\,|v| + |\nabla v|^2)\,dx \]
subject to
\[ -\Delta y + vy = 0 \ \text{ in } \Omega, \qquad \frac{\partial y}{\partial n} = g \ \text{ on the boundary } \Gamma, \]
and $v \in C = \{0 \le v \le \gamma < \infty\}$, where $\Omega$ is a bounded open domain in $\mathbb{R}^2$, $z$ is a boundary measurement of the voltage $y$ and $g \in L^2(\Gamma)$ is a given current. The problem is to determine the potential function $v \in C$ from $z \in L^2(\Gamma)$ in a class of sparse and clustered media. Then the necessary optimality condition for (9.11) is given by
\[ \begin{aligned} & -\Delta y + v_n(y - y_n) + v\,y_n = 0, \quad \frac{\partial y}{\partial n} = g, \\ & -\Delta\lambda + v_n\lambda = 0, \quad \frac{\partial\lambda}{\partial n} = y - z, \\ & \int_\Omega \big(\alpha(\nabla v, \nabla\phi - \nabla v) + \beta(|\phi| - |v|) + \lambda y_n(\phi - v)\big)\,dx \ge 0 \ \text{ for all } \phi \in C. \end{aligned} \]
9.2 Exact penalty method

Consider the penalty method for the inequality-constrained minimization
\[ \min F(x) \quad \text{subject to } Gx \le c, \quad (9.19) \]
where $G \in \mathcal{L}(X, L^2(\Omega))$. If $x^*$ is a minimizer of (9.19), the necessary optimality condition is given by
\[ F'(x^*) + G^*\mu = 0, \qquad \mu = \max(0, \mu + Gx^* - c) \ \text{ a.e. in } \Omega. \quad (9.20) \]
For $\beta > 0$ the penalty method is defined by
\[ \min F(x) + \beta\,\psi(Gx - c), \quad \text{where } \psi(y) = \int_\Omega \max(0, y)\,d\omega. \quad (9.21) \]
The necessary optimality condition is given by
\[ -F'(x) \in \beta\,G^*\partial\psi(Gx - c), \quad (9.22) \]
where
\[ \partial\max(0,\cdot)(s) = \begin{cases} 0 & s < 0 \\ [0,1] & s = 0 \\ 1 & s > 0. \end{cases} \]
Suppose the Lagrange multiplier $\mu$ satisfies
\[ \sup_{\omega\in\Omega} |\mu(\omega)| \le \beta; \quad (9.23) \]
then $\mu \in \beta\,\partial\psi(Gx^* - c)$ and thus $x^*$ satisfies (9.22). Moreover, if (9.21) has a unique minimizer in a neighborhood of $x^*$, then $x = x^*$, where $x$ is a minimizer of (9.21). Due to the singularity and the non-uniqueness of the subdifferential $\partial\psi$, the direct treatment of condition (9.22) may not be feasible for numerical computation. We define a regularized version of $\max(0,s)$: for $\epsilon > 0$,
\[ {\max}_\epsilon(0,s) = \begin{cases} 0 & s \le 0 \\ \dfrac{s^2}{2\epsilon} & 0 \le s \le \epsilon \\ s - \dfrac{\epsilon}{2} & s \ge \epsilon, \end{cases} \]
and consider the regularized problem of (9.21):
\[ \min J_\epsilon(x) = F(x) + \beta\,\psi_\epsilon(Gx - c), \quad \text{where } \psi_\epsilon(y) = \int_\Omega {\max}_\epsilon(0,y)\,d\omega. \quad (9.24) \]

Theorem (Consistency) Assume $F$ is weakly lower semi-continuous. For an arbitrary $\beta > 0$, any weak cluster point of the solutions $x_\epsilon$, $\epsilon > 0$, of the regularized problem (9.24) converges to a solution $x$ of the nonsmooth penalized problem (9.21) as $\epsilon \to 0$.

Proof: First note that $0 \le \max(0,s) - \max_\epsilon(0,s) \le \frac{\epsilon}{2}$ for all $s \in \mathbb{R}$. Let $x$ be a solution of (9.21) and $x_\epsilon$ a solution of the regularized problem (9.24). Then (writing $\psi(x)$ for $\psi(Gx-c)$)
\[ F(x_\epsilon) + \beta\,\psi_\epsilon(x_\epsilon) \le F(x) + \beta\,\psi_\epsilon(x), \qquad F(x) + \beta\,\psi(x) \le F(x_\epsilon) + \beta\,\psi(x_\epsilon). \]
Adding these inequalities, $\psi_\epsilon(x_\epsilon) - \psi(x_\epsilon) \le \psi_\epsilon(x) - \psi(x)$. Since $F$ is weakly lower semi-continuous and
\[ \psi_\epsilon(x_\epsilon) - \psi(\bar x) = \psi_\epsilon(x_\epsilon) - \psi(x_\epsilon) + \psi(x_\epsilon) - \psi(\bar x) \le \psi_\epsilon(x) - \psi(x) + \psi(x_\epsilon) - \psi(\bar x), \]
with $\lim_{\epsilon\to 0^+}(\psi_\epsilon(x) - \psi(x) + \psi(x_\epsilon) - \psi(\bar x)) \le 0$, we obtain, for every weak cluster point $\bar x$ of $x_\epsilon$,
\[ F(\bar x) + \beta\,\psi(\bar x) \le F(x) + \beta\,\psi(x). \]

As a consequence of the theorem, we have:

Corollary If (9.21) has a unique minimizer in a neighborhood of $x^*$ and (9.23) holds, then $x_\epsilon$ converges to the solution $x^*$ of (9.19) as $\epsilon \to 0$.

The necessary optimality condition for (9.24) is given by
\[ F'(x) + \beta\,G^*\psi'_\epsilon(Gx - c) = 0. \quad (9.25) \]
Although the non-uniqueness of the subdifferential in the optimality condition (9.22) is now bypassed through regularization, the optimality condition (9.25) is still nonlinear. To treat it, we first observe that $\max_\epsilon'$ has an alternative representation:
\[ {\max}_\epsilon'(0,s) = \chi_\epsilon(s)\,s, \qquad \chi_\epsilon(s) = \begin{cases} 0 & s \le 0 \\ \dfrac{1}{\max(\epsilon, s)} & s > 0. \end{cases} \quad (9.26) \]
This suggests the following semi-implicit fixed point iteration:
\[ \alpha P(x_{k+1} - x_k) + F'(x_k) + \beta\,G^*\chi_k\,(Gx_{k+1} - c) = 0, \qquad \chi_k = \chi_\epsilon(Gx_k - c), \quad (9.27) \]
where $P$ is a positive self-adjoint operator that serves as a pre-conditioner for $F'$, and $\alpha > 0$ serves as a stabilizing and accelerating stepsize (see Theorem 3 below). Let $d_k = x_{k+1} - x_k$. Equation (9.27) for $x_{k+1}$ is equivalent to the equation for $d_k$
\[ \alpha P d_k + F'(x_k) + \beta\,G^*\chi_k(Gx_k - c + G d_k) = 0, \]
which gives us
\[ (\alpha P + \beta\,G^*\chi_k G)\,d_k = -J'_\epsilon(x_k). \quad (9.28) \]
The direction $d_k$ is a descent direction for $J_\epsilon$ at $x_k$; indeed,
\[ (d_k, J'_\epsilon(x_k)) = -((\alpha P + \beta\,G^*\chi_k G)d_k, d_k) = -\alpha(P d_k, d_k) - \beta(\chi_k G d_k, G d_k) < 0, \]
where we used the fact that $P$ is strictly positive definite. So the iteration (9.27) can be seen as a descent method:

Algorithm (Fixed point iteration)
Step 0. Set parameters $\beta, \alpha, \epsilon, P$.
Step 1. Compute the direction from $(\alpha P + \beta\,G^*\chi_k G)\,d_k = -J'_\epsilon(x_k)$.
Step 2. Update $x_{k+1} = x_k + d_k$.
Step 3. If $|J'_\epsilon(x_k)| < \mathrm{TOL}$, stop; otherwise repeat Steps 1--2.

A NumPy sketch of this iteration is given below, followed by some remarks on the algorithm.
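The sketch below implements the fixed point iteration with dense linear algebra; `Fp` returns $F'(x)$, and the matrices `G`, `P` together with all parameters are illustrative placeholders rather than part of the text.

```python
import numpy as np

def exact_penalty_fixed_point(Fp, G, c, P, alpha, beta, eps, x0,
                              tol=1e-8, max_iter=200):
    """Fixed point iteration for the regularized exact penalty problem
    (9.24), using the direction equation (9.28)."""
    x = x0.copy()
    for _ in range(max_iter):
        s = G @ x - c
        # chi_eps(s) = 0 for s <= 0 and 1/max(eps, s) for s > 0, cf. (9.26)
        chi = np.where(s > 0, 1.0 / np.maximum(eps, s), 0.0)
        grad = Fp(x) + beta * G.T @ (chi * s)          # J_eps'(x_k)
        if np.linalg.norm(grad) < tol:
            break
        M = alpha * P + beta * G.T @ (chi[:, None] * G)
        x += np.linalg.solve(M, -grad)                 # descent step d_k
    return x
```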
In many applications the structures of $F'$ and $G$ are sparse block diagonals, and the resulting system (9.28) for the direction $d_k$ then becomes a linear system with a sparse symmetric positive definite matrix; it can be solved efficiently by the Cholesky decomposition method, for example. If $F(x) = \frac12(x,Ax) - (b,x)$, then $F'(x) = Ax - b$. For this case we may use the alternative update
\[ \alpha P(x_{k+1} - x_k) + Ax_{k+1} - b + \beta\,G^*\chi_k(Gx_{k+1} - c) = 0, \]
assuming that this fully implicit step is not much more expensive to perform.

For the bilateral inequality constraint $c - g \le Gx \le c + g$ the update (9.27) becomes
\[ \alpha P(x_{k+1} - x_k) + F'(x_k) + \beta\,G(j,:)^*\chi_k(Gx_{k+1} - c \mp g)_j = 0, \quad j \in A_k, \quad (9.29) \]
where the active index set $A_k = A_k^+ \cup A_k^-$ and the diagonal matrix $\chi_k$ on $A_k$ are defined by
\[ A_k^+ = \{j : (Gx_k - c - g)_j \ge 0\}, \quad A_k^- = \{j : (Gx_k - c + g)_j \le 0\}, \qquad (\chi_k)_{jj} = \frac{1}{\max(\epsilon,\, |(Gx_k - c)_j| - g_j)}. \]
The algorithm is globally convergent in practice, and the following results support this fact.

Theorem 3 Define
\[ R(x, \hat x) = -(F(x) - F(\hat x) - F'(\hat x)(x - \hat x)) + \alpha(P(x - \hat x), x - \hat x). \]
Then we have
\[ R(x_{k+1}, x_k) + F(x_{k+1}) - F(x_k) + \frac{\beta}{2}\Big((\chi_k G d_k, G d_k) + \sum_{j'}(\chi_k, |Gx_{k+1} - c - g|^2 - |Gx_k - c - g|^2) + \sum_{j''}(\chi_k, |Gx_{k+1} - c + g|^2 - |Gx_k - c + g|^2)\Big) = 0, \]
where $j' \in \{j : (Gx_k - c - g)_j \ge 0\}$ and $j'' \in \{j : (Gx_k - c + g)_j \le 0\}$.

Proof: Multiplying (9.29) by $d_k = x_{k+1} - x_k$,
\[ \alpha(P d_k, d_k) - (F(x_{k+1}) - F(x_k) - F'(x_k)d_k) + F(x_{k+1}) - F(x_k) + E_k = 0, \]
where
\[ E_k = \beta(\chi_k(G(j,:)x_{k+1} - c), G(j,:)d_k) = \frac{\beta}{2}\Big((\chi_k G(j,:)d_k, G(j,:)d_k) + \sum_{j'}(\chi_k, |Gx_{k+1}-c-g|^2 - |Gx_k-c-g|^2) + \sum_{j''}(\chi_k, |Gx_{k+1}-c+g|^2 - |Gx_k-c+g|^2)\Big). \]

Corollary 2 Suppose all inactive indices remain inactive; then
\[ \omega\,|x_{k+1} - x_k|^2 + J_\epsilon(x_{k+1}) - J_\epsilon(x_k) \le 0 \]
and $\{x_k\}$ is globally convergent.

Proof: Since $s \mapsto \psi_\epsilon(\sqrt{|s - g|})$ on $s \ge g$ and $s \mapsto \psi_\epsilon(\sqrt{|s + g|})$ on $s \le -g$ are concave, we have
\[ (\chi_k, |(Gx_{k+1}-c-g)_{j'}|^2 - |(Gx_k-c-g)_{j'}|^2) \ge \psi_\epsilon((Gx_{k+1}-c-g)_{j'}) - \psi_\epsilon((Gx_k-c-g)_{j'}) \]
and
\[ (\chi_k, |(Gx_{k+1}-c+g)_{j''}|^2 - |(Gx_k-c+g)_{j''}|^2) \ge \psi_\epsilon((Gx_{k+1}-c+g)_{j''}) - \psi_\epsilon((Gx_k-c+g)_{j''}). \]
Thus we obtain
\[ F(x_{k+1}) + \beta\,\psi_\epsilon(x_{k+1}) + R(x_{k+1},x_k) + \frac{\beta}{2}(\chi_k G(j,:)(x_{k+1}-x_k), G(j,:)(x_{k+1}-x_k)) \le F(x_k) + \beta\,\psi_\epsilon(x_k). \]
If we assume $R(x,\hat x) \ge \omega\,|x - \hat x|^2$ for some $\omega > 0$ and all $x, \hat x$, then $J_\epsilon(x_k)$ is monotonically decreasing and $\sum_k |x_{k+1}-x_k|^2 < \infty$.

Corollary 3 Suppose $i \in A_k^c = I_k$ is inactive but $\psi_\epsilon((Gx_{k+1} - c)_i) > 0$. Assume
\[ \frac{\beta}{2}(\chi_k G(j,:)(x_{k+1} - x_k), G(j,:)(x_{k+1} - x_k)) - \beta\sum_{i\in I_k}\psi_\epsilon((Gx_{k+1} - c)_i) \ge -\omega'\,|x_{k+1} - x_k|^2 \]
with $0 \le \omega' < \omega$; then the algorithm is globally convergent.

10 Sensitivity analysis

In this section sensitivity analysis is discussed for parameter-dependent optimization. Consider the parametric optimization problem
\[ \min F(x,p), \quad x \in C. \quad (10.1) \]
For given $p \in P$, a complete metric space, let $x = x(p) \in C$ be a minimizer of (10.1). For $\bar p \in P$ and $p = \bar p + t\dot p \in P$ with $t \in \mathbb{R}$ and increment $\dot p$ we have
\[ F(x(p), p) \le F(x(\bar p), p), \qquad F(x(\bar p), \bar p) \le F(x(p), \bar p). \]
Thus
\[ F(x(p),p) - F(x(\bar p),\bar p) = F(x(p),p) - F(x(\bar p),p) + F(x(\bar p),p) - F(x(\bar p),\bar p) \le F(x(\bar p),p) - F(x(\bar p),\bar p), \]
\[ F(x(p),p) - F(x(\bar p),\bar p) = F(x(p),\bar p) - F(x(\bar p),\bar p) + F(x(p),p) - F(x(p),\bar p) \ge F(x(p),p) - F(x(p),\bar p), \]
and therefore
\[ F(x(p),p) - F(x(p),\bar p) \le F(x(p),p) - F(x(\bar p),\bar p) \le F(x(\bar p),p) - F(x(\bar p),\bar p). \]
Hence for $t > 0$ we have
\[ \frac{F(x(\bar p+t\dot p), \bar p+t\dot p) - F(x(\bar p+t\dot p), \bar p)}{t} \le \frac{F(x(\bar p+t\dot p), \bar p+t\dot p) - F(x(\bar p),\bar p)}{t} \le \frac{F(x(\bar p), \bar p+t\dot p) - F(x(\bar p),\bar p)}{t}, \]
\[ \frac{F(x(\bar p-t\dot p), \bar p) - F(x(\bar p-t\dot p), \bar p-t\dot p)}{t} \le \frac{F(x(\bar p),\bar p) - F(x(\bar p-t\dot p), \bar p-t\dot p)}{t} \le \frac{F(x(\bar p),\bar p) - F(x(\bar p), \bar p-t\dot p)}{t}. \]
Based on these inequalities we have:

Theorem (Sensitivity I) Assume:
(H1) there exists a continuous graph $p \to x(p) \in C$ in a neighborhood of $(x(\bar p), \bar p) \in C \times P$; define the value function $V(p) = F(x(p), p)$;
(H2) $(x,p) \in C \times P \to F_p(x,p)$ is continuous in a neighborhood of $(x(\bar p), \bar p)$.
Then the G-derivative of $V$ at $\bar p$ exists and is given by $V'(\bar p)\dot p = F_p(x(\bar p), \bar p)\dot p$.

Next, consider the implicit function case. Consider a functional
\[ V(p) = F(x(p)), \qquad E(x(p), p) = 0, \]
where we assume the constraint $E$ is $C^1$ and $E(x,p) = 0$ defines a continuous solution graph $p \to x(p)$ in a neighborhood of $(x(\bar p), \bar p)$.

Theorem (Sensitivity II) Assume there exists $\lambda$ satisfying the adjoint equation
\[ E_x(x(\bar p), \bar p)^*\lambda + F_x(x(\bar p)) = 0 \]
and assume $\epsilon_1 + \epsilon_2 = o(|p - \bar p|)$, where
\[ \epsilon_1 = (E(x(p), \bar p) - E(x(\bar p), \bar p) - E_x(x(\bar p), \bar p)(x(p) - x(\bar p)),\, \lambda), \qquad \epsilon_2 = F(x(p)) - F(x(\bar p)) - F_x(x(\bar p))(x(p) - x(\bar p)). \quad (10.2) \]
Then $p \to V(p)$ is differentiable at $\bar p$ and
\[ V'(\bar p)\dot p = (E_p(x(\bar p), \bar p)\dot p,\, \lambda). \]

Proof: Note that
\[ V(p) - V(\bar p) = F_x(x(\bar p))(x(p) - x(\bar p)) + \epsilon_2 \]
and
\[ (E_x(x(\bar p), \bar p)(x(p) - x(\bar p)), \lambda) + F_x(x(\bar p))(x(p) - x(\bar p)) = 0. \]
Since
\[ 0 = (E(x(p), p) - E(x(\bar p), \bar p), \lambda) = (E_x(x(\bar p), \bar p)(x(p) - x(\bar p)), \lambda) + (E(x(p), p) - E(x(p), \bar p), \lambda) + \epsilon_1, \]
we have
\[ V(p) - V(\bar p) = (E_p(x(p), \bar p)(p - \bar p), \lambda) + \epsilon_1 + \epsilon_2 + o(|p - \bar p|), \]
and thus $V$ is differentiable with the desired derivative. If we assume that $E$ and $F$ are $C^2$, then the Hölder continuity
\[ |x(p) - x(\bar p)|_X \sim o(|p - \bar p|_P^{1/2}) \]
is sufficient for condition (10.2) to hold. A finite-difference verification of this adjoint formula on a toy problem is sketched below.
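The following sketch checks the formula of Theorem (Sensitivity II) numerically on an assumed toy problem with linear constraint $E(x,p) = Ax - p = 0$ and quadratic cost $F(x) = \frac12|x - x_d|^2$, so that $E_p\dot p = -\dot p$ and the adjoint-based derivative is $-(\dot p, \lambda)$; all data are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned toy E_x
x_d = rng.standard_normal(n)

def V(p):
    """Value function F(x(p)) with E(x, p) = A x - p = 0."""
    x = np.linalg.solve(A, p)
    return 0.5 * np.sum((x - x_d) ** 2)

p = rng.standard_normal(n)
x = np.linalg.solve(A, p)
lam = np.linalg.solve(A.T, -(x - x_d))            # adjoint equation
p_dot = rng.standard_normal(n)
adjoint_dV = -(p_dot @ lam)                       # (E_p p_dot, lam), E_p = -I
fd_dV = (V(p + 1e-6 * p_dot) - V(p)) / 1e-6       # finite-difference check
print(adjoint_dV, fd_dV)                          # agree to ~1e-6
```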
Next, consider the parametric constrained optimization
\[ \min F(x,p) \quad \text{subject to } E(x,p) \in K. \]
We assume:
(H3) $\lambda \in Y$ satisfies the adjoint equation $E_x(x(\bar p), \bar p)^*\lambda + F_x(x(\bar p), \bar p) = 0$;
(H4) condition (10.2) holds;
(H5) $(x,p) \in C \times P \to E_p(x,p) \in Y$ is continuous in a neighborhood of $(x(\bar p), \bar p)$.

Eigenvalue problem Let $A(p)$ be a linear operator on a Hilbert space $X$. For $x = (\mu, y) \in \mathbb{R} \times X$ consider
\[ F(x) = \mu \quad \text{subject to } A(p)y = \mu y, \]
that is, $(\mu, y)$ is an eigenpair of $A(p)$. The adjoint system is
\[ A(p)^*\lambda - \mu\lambda = 0, \qquad (y, \lambda) = 1, \]
and thus
\[ \mu'(p)\dot p = \Big(\frac{d}{dp}A(p)\dot p\;y,\; \lambda\Big). \]

Next, consider the parametric constrained optimization: given $p \in P$,
\[ \min_{x,u} F(x,u,p) \quad \text{subject to } E(x,u,p) = 0. \quad (10.3) \]
Let $(\bar x, \bar u)$ be a minimizer given $\bar p \in P$. We assume (H3)--(H5), with (H3) reading: $\lambda \in Y$ satisfies the adjoint equation $E_x(\bar x, \bar u, \bar p)^*\lambda + F_x(\bar x, \bar u, \bar p) = 0$.

Theorem (Sensitivity III) Under assumptions (H1)--(H5) the necessary optimality system for (10.3) is given by
\[ E_x^*\lambda + F_x(\bar x, \bar u, \bar p) = 0, \qquad E_u^*\lambda + F_u(\bar x, \bar u, \bar p) = 0, \qquad E(\bar x, \bar u, \bar p) = 0, \]
and
\[ V'(\bar p)\dot p = F_p(\bar x, \bar u, \bar p)\dot p + (\lambda, E_p(\bar x, \bar u, \bar p)\dot p). \]

Proof: We have
\[ F(x(p),p) - F(x(\bar p),\bar p) \ge F_x(x(\bar p),\bar p)(x(p) - x(\bar p)) + F(x(p),p) - F(x(p),\bar p), \]
\[ F(x(p),p) - F(x(\bar p),\bar p) \le F_x(x(p),p)(x(p) - x(\bar p)) + F(x(\bar p),p) - F(x(\bar p),\bar p). \]
Note that
\[ E(x(p),p) - E(x(\bar p),\bar p) = E_x(x(\bar p),\bar p)(x(p) - x(\bar p)) + E(x(p),p) - E(x(p),\bar p) + o(|x(p) - x(\bar p)|) \]
and
\[ (\lambda, E_x(\bar x,\bar p)(x(p) - x(\bar p))) + F_x(\bar x,\bar p)(x(p) - x(\bar p)) = 0. \]
Combining the above inequalities and letting $t \to 0$, we obtain the claim.

11 Augmented Lagrangian Method

In this section we discuss the augmented Lagrangian method for the constrained minimization
\[ \min F(x) \quad \text{subject to } E(x) = 0, \; G(x) \le 0, \quad (11.1) \]
over $x \in C$. For the equality-constrained case we consider the augmented Lagrangian functional
\[ L_c(x,\lambda) = F(x) + (\lambda, E(x)) + \frac{c}{2}|E(x)|^2 \]
for $c > 0$. The augmented Lagrangian method is the iterative update
\[ x_{n+1} = \mathrm{argmin}_{x\in C}\, L_c(x, \lambda_n), \qquad \lambda_{n+1} = \lambda_n + c\,E(x_{n+1}). \]
It is a combination of the multiplier method and the penalty method: $c = 0$ with $\lambda_{n+1} = \lambda_n + \alpha\,E(x_{n+1})$ and stepsize $\alpha > 0$ gives the multiplier method, and $\lambda_n = 0$ gives the penalty method. The multiplier method converges (locally) if $F$ is (locally) uniformly convex. The penalty method requires $c \to \infty$ and may suffer if $c > 0$ is too large. It can be shown that the augmented Lagrangian method is locally convergent provided that $L''_c(x,\lambda)$ is uniformly positive near the solution pair $(\bar x, \bar\lambda)$. Since
\[ L''_c(\bar x, \bar\lambda) = F''(\bar x) + (\bar\lambda, E''(\bar x)) + c\,E'(\bar x)^*E'(\bar x), \]
the augmented Lagrangian method has an enlarged class of convergence compared to the multiplier method. Unlike the penalty method it is not necessary to let $c \to \infty$, and the Lagrange multiplier update speeds up the convergence.

For the inequality constraint $G(x) \le 0$ we consider the equivalent formulation
\[ \min F(x) + (\mu, G(x) - z) + \frac{c}{2}|G(x) - z|^2 \quad \text{over } (x,z) \in C \times Z \quad \text{subject to } G(x) = z, \; z \le 0. \]
Minimizing over $z \le 0$, the minimum is attained at
\[ z^* = \min\Big(0,\ \frac{\mu + c\,G(x)}{c}\Big), \]
and thus we obtain
\[ \min F(x) + \frac{1}{2c}\big(|\max(0, \mu + c\,G(x))|^2 - |\mu|^2\big) \quad \text{over } x \in C. \]
For (11.1), given $(\lambda, \mu)$ and $c > 0$, define the augmented Lagrangian functional
\[ L_c(x,\lambda,\mu) = F(x) + (\lambda, E(x)) + \frac{c}{2}|E(x)|^2 + \frac{1}{2c}\big(|\max(0, \mu + c\,G(x))|^2 - |\mu|^2\big). \quad (11.2) \]
The first order augmented Lagrangian method is a sequential minimization of $L_c(x,\lambda,\mu)$ over $x \in C$ (a sketch follows at the end of this section):

Algorithm (First order Augmented Lagrangian method)
1. Initialize $(\lambda_0, \mu_0)$.
2. Let $x_n$ be a solution of $\min_{x\in C} L_c(x, \lambda_n, \mu_n)$.
3. Update the Lagrange multipliers
\[ \lambda_{n+1} = \lambda_n + c\,E(x_n), \qquad \mu_{n+1} = \max(0, \mu_n + c\,G(x_n)). \quad (11.3) \]

Remark (1) Define the value function $\Phi(\lambda,\mu) = \min_{x\in C} L_c(x,\lambda,\mu)$. Then one can prove that
\[ \Phi_\lambda = E(x), \qquad \Phi_\mu = \frac{1}{c}\big(\max(0, \mu + c\,G(x)) - \mu\big). \]
(2) The Lagrange multiplier update (11.3) is a gradient ascent step with stepsize $c$ for maximizing $\Phi(\lambda,\mu)$.
(3) For the equality-constrained case,
\[ L_c(x,\lambda) = F(x) + (\lambda, E(x)) + \frac{c}{2}|E(x)|^2 \quad\text{and}\quad L'_c(x,\lambda) = L'_0(x, \lambda + c\,E(x)). \]
The necessary optimality for $\max_\lambda\min_x L_c(x,\lambda)$ is given by
\[ L'_0(x, \lambda + c\,E(x)) = 0, \qquad E(x) = 0. \quad (11.4) \]
If we apply the Newton method to system (11.4), we obtain
\[ \begin{pmatrix} L''_0(x, \lambda + cE(x)) + c\,E'(x)^*E'(x) & E'(x)^* \\ E'(x) & 0 \end{pmatrix}\begin{pmatrix} x^+ - x \\ \lambda^+ - \lambda \end{pmatrix} = -\begin{pmatrix} L'_0(x, \lambda + cE(x)) \\ E(x) \end{pmatrix}. \]
The term $c > 0$ adds coercivity to the Hessian $L''_0$.
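A minimal driver for the first order augmented Lagrangian update (11.3) is sketched below. The inner minimization `minimize_Lc` and the constraint maps `E`, `G` are assumed user-supplied; only the multiplier updates are spelled out.

```python
import numpy as np

def augmented_lagrangian(minimize_Lc, E, G, lam0, mu0, c, n_iter=30):
    """First order augmented Lagrangian method.  minimize_Lc(lam, mu) is
    assumed to return argmin_x L_c(x, lam, mu) over C, cf. (11.2)."""
    lam, mu = lam0.copy(), mu0.copy()
    for _ in range(n_iter):
        x = minimize_Lc(lam, mu)
        lam = lam + c * E(x)                    # equality multiplier update
        mu = np.maximum(0.0, mu + c * G(x))     # inequality multiplier update
    return x, lam, mu
```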
11.1 Primal-Dual Active set method

For the inequality constraint $G(x) \le 0$ (e.g. $Gx - \tilde c \le 0$),
\[ L_c(x,\mu) = F(x) + \frac{1}{2c}\big(|\max(0, \mu + c\,G(x))|^2 - |\mu|^2\big) \]
and the necessary optimality condition is
\[ F'(x) + G'(x)^*\mu = 0, \qquad \mu = \max(0, \mu + c\,G(x)), \]
where $c > 0$ is arbitrary. Based on this complementarity condition we have the primal-dual active set method. For the affine case $G(x) = Gx - \tilde c$:

Primal-Dual Active Set method
1. Define the active and inactive index sets
\[ A = \{k : (\mu + c(Gx - \tilde c))_k > 0\}, \qquad I = \{j : (\mu + c(Gx - \tilde c))_j \le 0\}. \]
2. Set $\mu^+ = 0$ on $I$ and $Gx^+ = \tilde c$ on $A$.
3. Solve for $(x^+, \mu^+_A)$:
\[ F''(x)(x^+ - x) + F'(x) + G_A^*\mu^+ = 0, \qquad G_A x^+ - \tilde c_A = 0, \]
where $G_A = \{G_k\}$, $k \in A$.
4. Stop, or set $n = n+1$ and return to step 1.

For nonlinear $G$ one uses $G'(x)(x^+ - x) + G(x) = 0$ on $A = \{k : (\mu + c\,G(x))_k > 0\}$.

The primal-dual active set method is a semismooth Newton method. That is, we define a generalized derivative of $s \to \max(0,s)$ by $0$ on $(-\infty, 0]$ and $1$ on $(0,\infty)$. Note that $s \to \max(0,s)$ is not differentiable at $s = 0$; we define the derivative at $s = 0$ as the limit of the derivatives for $s < 0$. With this choice the generalized Newton update of $\mu = \max(0, \mu + c(Gx - \tilde c))$ is
\[ \mu^+ = 0 \ \text{ if } \mu + c(Gx - \tilde c) \le 0, \qquad \mu^+ = \mu^+ + c(Gx^+ - \tilde c), \text{ i.e. } Gx^+ - \tilde c = 0, \ \text{ if } \mu + c(Gx - \tilde c) > 0, \]
which results in the active set strategy $\mu_j^+ = 0$, $j \in I$, and $(Gx^+ - \tilde c)_k = 0$, $k \in A$. A toy implementation for a quadratic cost is sketched below.

We will introduce the class of semismooth functions and the semismooth Newton method in function spaces in Chapter 7. Specifically, the pointwise (coordinate-wise) operation $s \to \max(0,s)$ defines a semismooth function from $L^p(\Omega) \to L^q(\Omega)$, $p > q$.
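The following dense sketch runs the primal-dual active set loop for the quadratic program $\min \frac12 x^tAx - b^tx$ subject to $Gx \le \tilde c$, for which $F''(x)(x^+ - x) + F'(x) = Ax^+ - b$; the data and the termination rule (stable active set) are illustrative assumptions.

```python
import numpy as np

def pdas_qp(A, b, G, c_tilde, c=1.0, max_iter=50):
    """Primal-dual active set method for min 1/2 x'Ax - b'x s.t. Gx <= c_tilde,
    following steps 1-3 above (toy dense implementation)."""
    n, m = A.shape[0], G.shape[0]
    x, mu = np.linalg.solve(A, b), np.zeros(m)
    for _ in range(max_iter):
        active = mu + c * (G @ x - c_tilde) > 0
        GA = G[active]
        k = GA.shape[0]
        # KKT system: A x + G_A' mu_A = b,  G_A x = c_A,  mu_I = 0
        KKT = np.block([[A, GA.T], [GA, np.zeros((k, k))]])
        sol = np.linalg.solve(KKT, np.concatenate([b, c_tilde[active]]))
        x_new, mu_new = sol[:n], np.zeros(m)
        mu_new[active] = sol[n:]
        if np.array_equal(active, mu_new + c * (G @ x_new - c_tilde) > 0):
            return x_new, mu_new       # active set is stable: converged
        x, mu = x_new, mu_new
    return x, mu
```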
12 Lagrange multiplier theory for nonsmooth convex optimization

In this section we present the Lagrange multiplier theory for nonsmooth optimization of the form
\[ \min f(x) + \varphi(\Lambda x) \quad \text{over } x \in C, \quad (12.1) \]
where $X$ is a Banach space, $H$ is a Hilbert space lattice, $f : X \to \mathbb{R}$ is $C^1$, $\Lambda \in \mathcal{L}(X,H)$, and $C$ is a closed convex set in $X$. The nonsmoothness is represented by $\varphi$, a proper, lower semi-continuous convex functional on $H$. This problem class encompasses a wide variety of optimization problems, including variational inequalities of the first and second kind. We will specialize it to our concrete examples. Moreover, the exact penalty formulation of the constrained problem is
\[ \min f(x) + c\,|E(x)|_{L^1} + c\,|(G(x))^+|_{L^1}, \quad (12.2) \]
where $(G(x))^+ = \max(0, G(x))$.

We describe the Lagrange multiplier theory to deal with the nonsmoothness of $\varphi$. Note that (12.1) is equivalent to
\[ \min f(x) + \varphi(\Lambda x - u) \quad \text{subject to } x \in C \text{ and } u = 0 \text{ in } H. \quad (12.3) \]
Treating the equality constraint $u = 0$ in (12.3) by the augmented Lagrangian method results in the minimization problem
\[ \min_{x\in C,\, u\in H} f(x) + \varphi(\Lambda x - u) + (\lambda, u)_H + \frac{c}{2}|u|_H^2, \quad (12.4) \]
where $\lambda \in H$ is a multiplier and $c$ is a positive scalar penalty parameter. By (12.6) below, (12.4) reduces to
\[ \min_{x\in C} L_c(x,\lambda) = f(x) + \varphi_c(\Lambda x, \lambda), \quad (12.5) \]
where
\[ \varphi_c(u,\lambda) = \inf_{v\in H}\{\varphi(u - v) + (\lambda, v)_H + \tfrac{c}{2}|v|_H^2\} = \inf_{z\in H}\{\varphi(z) + (\lambda, u - z)_H + \tfrac{c}{2}|u - z|_H^2\} \quad (u - v = z). \quad (12.6) \]
For $u, \lambda \in H$ and $c > 0$, $\varphi_c(u,\lambda)$ is called the generalized Yosida--Moreau approximation of $\varphi$. It can be shown that $u \to \varphi_c(u,\lambda)$ is continuously Fréchet differentiable with Lipschitz continuous derivative. Then the augmented Lagrangian functional of (12.1) is given by
\[ L_c(x,\lambda) = f(x) + \varphi_c(\Lambda x, \lambda). \quad (12.7) \]
Let $\varphi^*$ be the convex conjugate of $\varphi$. Then
\[ \varphi_c(u,\lambda) = \inf_{z\in H}\Big\{\sup_{y\in H}\{(y,z) - \varphi^*(y)\} + (\lambda, u-z)_H + \frac{c}{2}|u-z|_H^2\Big\} = \sup_{y\in H}\Big\{-\frac{1}{2c}|y - \lambda|^2 + (y,u) - \varphi^*(y)\Big\}, \quad (12.8) \]
where we used
\[ \inf_{z\in H}\Big\{(y,z)_H + (\lambda, u-z)_H + \frac{c}{2}|u-z|_H^2\Big\} = (y,u) - \frac{1}{2c}|y-\lambda|_H^2. \]
Thus
\[ \varphi'_c(u,\lambda) = y_c(u,\lambda) = \mathrm{argmax}_{y\in H}\Big\{-\frac{1}{2c}|y-\lambda|^2 + (y,u) - \varphi^*(y)\Big\} = \lambda + c\,v_c(u,\lambda), \quad (12.9) \]
where
\[ v_c(u,\lambda) = \mathrm{argmin}_{v\in H}\Big\{\varphi(u - v) + (\lambda, v) + \frac{c}{2}|v|^2\Big\}. \]
Moreover, if
\[ p_c(\lambda) = \mathrm{argmin}_p\Big\{\frac{1}{2c}|p - \lambda|^2 + \varphi^*(p)\Big\}, \]
then $\varphi'_c(u,\lambda) = p_c(\lambda + c\,u)$.

Theorem (Lipschitz complementarity) (1) If $\lambda \in \partial\varphi(x)$ for $x, \lambda \in H$, then $\lambda = \varphi'_c(x,\lambda)$ for all $c > 0$. (2) Conversely, if $\lambda = \varphi'_c(x,\lambda)$ for some $c > 0$, then $\lambda \in \partial\varphi(x)$.

Proof: If $\lambda \in \partial\varphi(x)$, then from (12.6) and (12.8)
\[ \varphi(x) \ge \varphi_c(x,\lambda) \ge \langle\lambda, x\rangle - \varphi^*(\lambda) = \varphi(x). \]
Thus $\lambda \in H$ attains the supremum in (12.8) and we have $\lambda = \varphi'_c(x,\lambda)$. Conversely, if $\lambda \in H$ satisfies $\lambda = \varphi'_c(x,\lambda)$ for some $c > 0$, then $v_c(x,\lambda) = 0$ by (12.9). Hence it follows that
\[ \varphi(x) = \varphi_c(x,\lambda) = (\lambda, x) - \varphi^*(\lambda), \]
which implies $\lambda \in \partial\varphi(x)$.

If $x_c \in C$ denotes the solution to (12.5), then it satisfies
\[ \langle f'(x_c) + \Lambda^*\lambda_c,\, x - x_c\rangle_{X^*,X} \ge 0 \ \text{ for all } x \in C, \qquad \lambda_c = \varphi'_c(\Lambda x_c, \lambda). \quad (12.10) \]
Under appropriate conditions the pair $(x_c, \lambda_c) \in C \times H$ has a (strong-weak) cluster point $(\bar x, \bar\lambda)$ as $c \to \infty$ such that $\bar x \in C$ is the minimizer of (12.1) and $\bar\lambda \in H$ is a Lagrange multiplier in the sense that
\[ \langle f'(\bar x) + \Lambda^*\bar\lambda,\, x - \bar x\rangle_{X^*,X} \ge 0 \ \text{ for all } x \in C, \quad (12.11) \]
with the complementarity condition
\[ \bar\lambda = \varphi'_c(\Lambda\bar x, \bar\lambda) \quad \text{for each } c > 0. \quad (12.12) \]
System (12.11)--(12.12) is for the primal-dual variable $(\bar x, \bar\lambda)$. The advantage here is that the frequently employed differential inclusion $\bar\lambda \in \partial\varphi(\Lambda\bar x)$ is replaced by the equivalent nonlinear equation (12.12). The first order augmented Lagrangian method for (12.1) is given by:
• Select $\lambda_0$ and set $n = 0$.
• Let $x_n = \mathrm{argmin}_{x\in C}\, L_c(x, \lambda_n)$.
• Update the Lagrange multiplier by $\lambda_{n+1} = \varphi'_c(\Lambda x_n, \lambda_n)$.
• Stop, or set $n = n+1$ and return to the second step.

In many applications the convex conjugate functional $\varphi^*$ of $\varphi$ is given by $\varphi^*(v) = I_{K^*}(v)$, where $K^*$ is a closed convex set in $H$ and $I_S$ is the indicator function of a set $S$. From (12.8),
\[ \varphi_c(u,\lambda) = \sup_{y\in K^*}\Big\{-\frac{1}{2c}|y-\lambda|^2 + (y,u)\Big\}, \]
and (12.12) is equivalent to
\[ \bar\lambda = \mathrm{Proj}_{K^*}(\bar\lambda + c\,\Lambda\bar x), \quad (12.13) \]
which is the basis of our approach.

12.1 Examples

In this section we discuss applications of the Lagrange multiplier theorem and the complementarity condition; pointwise evaluations of $\varphi'_c$ for the first two examples are sketched in code below.

Example (Inequality constraint) Let $H = L^2(\Omega)$ and $\varphi = I_C$ with $C = \{z \in L^2(\Omega) : z \le 0 \text{ a.e.}\}$, i.e.
\[ \min f(x) \quad \text{over } \Lambda x - c \le 0. \quad (12.14) \]
In this case $\varphi^*(v) = I_{K^*}(v)$ with $K^* = -C$ and the complementarity condition (12.12) implies that
\[ f'(\bar x) + \Lambda^*\bar\lambda = 0, \qquad \bar\lambda = \max(0, \bar\lambda + c(\Lambda\bar x - c)). \]

Example ($L^1(\Omega)$ optimization) Let $H = L^2(\Omega)^d$ and $\varphi(v) = \int_\Omega |v|_2\,dx$, i.e.
\[ \min f(y) + \int_\Omega |\Lambda y|_2\,dx. \]
In this case $\varphi^*(v) = I_{K^*}(v)$ with $K^* = \{z \in L^2(\Omega)^d : |z|_2 \le 1 \text{ a.e.}\}$ and the complementarity condition (12.13) implies that
\[ f'(\bar x) + \Lambda^*\bar\lambda = 0, \qquad \bar\lambda = \frac{\bar\lambda + c\,\Lambda\bar x}{\max(1, |\bar\lambda + c\,\Lambda\bar x|_2)} \quad \text{a.e.} \quad (12.15) \]

Example (Constrained optimization) Let $X$ be a Hilbert space and consider
\[ \min f(x) \quad \text{subject to } \Lambda x \in C, \quad (12.16) \]
where $f : X \to \mathbb{R}$ is $C^1$ and $C$ is a closed convex set. In this case $\varphi = I_C$, and
\[ \varphi_c(u,\lambda) = \inf_{z\in C}\Big\{(\lambda, u - z)_H + \frac{c}{2}|u - z|_H^2\Big\}. \quad (12.17) \]
The minimum in (12.17) is attained at
\[ z^* = \mathrm{Proj}_C\Big(u + \frac{\lambda}{c}\Big) \quad (12.18) \]
and
\[ \varphi'_c(u,\lambda) = \lambda + c\,(u - z^*). \quad (12.19) \]
If $H = L^2(\Omega)$ and $C$ is the bilateral constraint $\{y : \phi \le \Lambda y \le \psi\}$, the complementarity conditions (12.18)--(12.19) imply that for $c > 0$
\[ f'(x^*) + \Lambda^*\mu = 0, \qquad \mu = \max(0, \mu + c(\Lambda x^* - \psi)) + \min(0, \mu + c(\Lambda x^* - \phi)) \quad \text{a.e.} \quad (12.20) \]
If either $\phi = -\infty$ or $\psi = \infty$ the constraint is unilateral and defines the generalized obstacle problem. The cost functional can represent the performance index of an optimal control or design problem, the fidelity of data-to-fit for an inverse problem, or the deformation and restoring energy of a variational problem.
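The two maps below evaluate $\varphi'_c$ pointwise for the $L^1$ example (12.15) and the bilateral example (12.20); they are elementwise NumPy sketches for scalar-valued fields (for vector-valued $v$, replace $|z|$ by the pointwise Euclidean norm), and the function names are ours.

```python
import numpy as np

def phi_c_prime_L1(u, lam, c):
    """phi_c'(u, lam) for phi(v) = integral of |v|: projection of lam + c*u
    onto the dual unit ball K* = {|z| <= 1 a.e.}, cf. (12.13) and (12.15)."""
    z = lam + c * u
    return z / np.maximum(1.0, np.abs(z))

def phi_c_prime_box(u, lam, c, lo, hi):
    """phi_c'(u, lam) for phi = indicator of the bilateral constraint
    lo <= u <= hi, cf. (12.20)."""
    return (np.maximum(0.0, lam + c * (u - hi))
            + np.minimum(0.0, lam + c * (u - lo)))
```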
13 Semismooth Newton method

In this section we present the semismooth Newton method for nonsmooth necessary optimality conditions, e.g., the complementarity condition (12.10): $\lambda = \varphi'_c(\Lambda x, \lambda)$. Consider the nonlinear equation $F(y) = 0$ in a Banach space $X$. The generalized Newton update is given by
\[ y^{k+1} = y^k - V_k^{-1}F(y^k), \quad (13.1) \]
where $V_k$ is a generalized derivative of $F$ at $y^k$. In the finite dimensional space, for a locally Lipschitz continuous function $F$ let $D_F$ denote the set of points at which $F$ is differentiable. For $x \in X = \mathbb{R}^n$ we define the Bouligand derivative $\partial_B F(x)$ as
\[ \partial_B F(x) = \Big\{ J : J = \lim_{x_i\to x,\; x_i\in D_F} \nabla F(x_i) \Big\}, \quad (13.2) \]
where $D_F$ is dense by Rademacher's theorem, which states that every locally Lipschitz continuous function on a finite dimensional space is differentiable almost everywhere. Thus we take $V_k \in \partial_B F(y^k)$.

In infinite dimensional spaces, notions of generalized derivatives for functions which are not $C^1$ cannot rely on Rademacher's theorem. Here, instead, we shall mainly utilize a concept of generalized derivative that is sufficient to guarantee superlinear convergence of Newton's method. This notion of differentiability is called Newton differentiability and is defined below. Let $X, Z$ be real Banach spaces and let $D \subset X$ be an open set.

Definition (Newton differentiable) (1) $F : D \subset X \to Z$ is called Newton differentiable at $x$ if there exist an open neighborhood $N(x) \subset D$ and mappings $G : N(x) \to \mathcal{L}(X,Z)$ such that
\[ \lim_{|h|\to 0} \frac{|F(x+h) - F(x) - G(x+h)h|_Z}{|h|_X} = 0. \]
The family $\{G(y) : y \in N(x)\}$ is called an N-derivative of $F$ at $x$.
(2) $F$ is called semismooth at $x$ if it is Newton differentiable at $x$ and $\lim_{t\to 0^+} G(x + th)h$ exists uniformly in $|h| = 1$.

Semismoothness was originally introduced for scalar-valued functions. Convex functions and real-valued $C^1$ functions are examples of semismooth functions in the finite dimensional space. For example, if $F(y)(s) = \psi(y(s))$ pointwise, then $G(y)(s) \in \partial_B\psi(y(s))$ is an N-derivative from $L^p(\Omega) \to L^q(\Omega)$, $p > q$, under appropriate conditions. We often use $\psi(s) = |s|$ and $\max(0,s)$ for the necessary optimality conditions.

Suppose $F(y^*) = 0$. Then $y^* = y^k + (y^* - y^k)$ and
\[ |y^{k+1} - y^*| = |V_k^{-1}(F(y^*) - F(y^k) - V_k(y^* - y^k))| \le |V_k^{-1}|\,o(|y^k - y^*|). \]
Thus the semismooth Newton method is q-superlinearly convergent provided that the Jacobian sequence $V_k$ is uniformly invertible as $y^k \to y^*$. That is, if one can select a sequence of quasi-Jacobians $V_k$ that is consistent, i.e. $|V_k - G(y^*)| \to 0$ as $y^k \to y^*$, and invertible, then (13.1) is still q-superlinearly convergent.

13.1 Bilateral constraint

Consider the bilateral constraint problem
\[ \min f(x) \quad \text{subject to } \phi \le \Lambda x \le \psi. \quad (13.3) \]
We have the optimality condition (12.20):
\[ f'(x) + \Lambda^*\mu = 0, \qquad \mu = \max(0, \mu + c(\Lambda x - \psi)) + \min(0, \mu + c(\Lambda x - \phi)) \quad \text{a.e.} \]
Since $\partial_B\max(0,s) = \{0,1\}$ and $\partial_B\min(0,s) = \{0,1\}$ at $s = 0$, the semismooth Newton method for the bilateral constraint (13.3) takes the form of the primal-dual active set method:

Primal-dual active set method
1. Initialize $y^0 \in X$ and $\mu^0 \in H$. Set $k = 0$.
2. Set the active set $A = A^+ \cup A^-$ and the inactive set $I$ by
\[ A^+ = \{x \in \Omega : \mu^k + c(\Lambda y^k - \psi) > 0\}, \quad A^- = \{x \in \Omega : \mu^k + c(\Lambda y^k - \phi) < 0\}, \quad I = \Omega \setminus A. \]
3. Solve for $(y^{k+1}, \mu^{k+1})$:
\[ f''(y^k)(y^{k+1} - y^k) + f'(y^k) + \Lambda^*\mu^{k+1} = 0, \]
\[ \Lambda y^{k+1}(x) = \psi(x),\ x \in A^+, \qquad \Lambda y^{k+1}(x) = \phi(x),\ x \in A^-, \qquad \mu^{k+1}(x) = 0,\ x \in I. \]
4. Stop, or set $k = k+1$ and return to the second step.

Thus the algorithm involves solving a linear system of the form
\[ \begin{pmatrix} A & \Lambda^*_{A^+} & \Lambda^*_{A^-} \\ \Lambda_{A^+} & 0 & 0 \\ \Lambda_{A^-} & 0 & 0 \end{pmatrix}\begin{pmatrix} y \\ \mu_{A^+} \\ \mu_{A^-} \end{pmatrix} = \begin{pmatrix} f \\ \psi \\ \phi \end{pmatrix}. \]
A generic driver for the generalized Newton iteration (13.1), with a scalar example of the max-complementarity, is sketched below.
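The following sketch is a generic driver for (13.1); the caller supplies `F` and an N-derivative map `G`. The toy example is ours: it solves $x + \max(0,x) - a = 0$ componentwise, whose exact solution is $a/2$ where $a > 0$ and $a$ where $a \le 0$, using the Bouligand derivative of $\max(0,\cdot)$.

```python
import numpy as np

def semismooth_newton(F, G, y0, tol=1e-12, max_iter=50):
    """Generalized Newton iteration (13.1): y <- y - G(y)^{-1} F(y),
    where G(y) returns an N-derivative (a matrix) of F at y."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        Fy = F(y)
        if np.linalg.norm(Fy) < tol:
            break
        y = y - np.linalg.solve(G(y), Fy)
    return y

a = np.array([2.0, -3.0])
F = lambda x: x + np.maximum(0.0, x) - a
G = lambda x: np.diag(1.0 + (x > 0).astype(float))  # N-derivative of F
print(semismooth_newton(F, G, np.zeros(2)))          # -> [1.0, -3.0]
```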
In order to gain stability, the following one-parameter family of regularizations can be used:
\[ \mu = \alpha\,\big(\max(0, \mu + c(\Lambda y - \psi)) + \min(0, \mu + c(\Lambda y - \phi))\big), \quad \alpha \to 1. \]

13.2 $L^1$ optimization

As an important example we discuss the case $\varphi(v) = \int_\Omega |v|_2\,dx$ ($L^1$-type optimization). The complementarity condition is reduced to the form
\[ \lambda\,\max(1 + \epsilon, |\lambda + c\,v|) = \lambda + c\,v \quad (13.4) \]
for $\epsilon \ge 0$. It is often convenient (but not essential) to use a very small $\epsilon > 0$ to avoid the singularity in the implementation of the algorithm. Let
\[ \varphi_\epsilon(s) = \begin{cases} \dfrac{s^2}{2\epsilon} + \dfrac{\epsilon}{2} & |s| \le \epsilon \\ |s| & |s| \ge \epsilon. \end{cases} \quad (13.5) \]
Then (13.4) corresponds to the regularized problem
\[ \min J_\epsilon(y) = F(y) + \varphi_\epsilon(\Lambda y). \]
The semismooth Newton method is given by $F'(y^+) + \Lambda^*\lambda^+ = 0$ together with, writing $v = \Lambda y$, $v^+ = \Lambda y^+$,
\[ \lambda^+ = \frac{v^+}{\epsilon} \ \text{ if } |\lambda + c\,v| \le 1 + \epsilon, \]
\[ |\lambda + cv|\,\lambda^+ + \lambda\Big(\frac{\lambda + cv}{|\lambda + cv|}\Big)^t(\lambda^+ + c\,v^+) = \lambda^+ + c\,v^+ + |\lambda + cv|\,\lambda \ \text{ if } |\lambda + c\,v| > 1 + \epsilon. \quad (13.6) \]
There is no guarantee that (13.6) is solvable for $\lambda^+$ or that it is stable. In order to obtain a compact and unconditionally stable formula we use the damped and regularized algorithm with $\beta \le 1$:
\[ \lambda^+ = \frac{v^+}{\epsilon} \ \text{ if } |\lambda + c\,v| \le 1 + \epsilon, \]
\[ |\lambda + cv|\,\lambda^+ + \beta\,\frac{\lambda}{\max(1, |\lambda|)}\Big(\frac{\lambda + cv}{|\lambda + cv|}\Big)^t(\lambda^+ + c\,v^+) = \lambda^+ + c\,v^+ + \beta\,|\lambda + cv|\,\frac{\lambda}{\max(1,|\lambda|)} \ \text{ if } |\lambda + c\,v| > 1 + \epsilon. \quad (13.7) \]
Here the purpose of the regularization $\lambda/(|\lambda| \wedge 1)$ is to automatically constrain the dual variable $\lambda$ to the unit ball. The damping factor $\beta$ is selected automatically to achieve stability. Let
\[ d = |\lambda + cv|, \quad \eta = d - 1, \quad a = \frac{\lambda}{|\lambda| \wedge 1}, \quad b = \frac{\lambda + cv}{|\lambda + cv|}, \quad F = ab^t. \]
Then (13.7) is equivalent to
\[ \lambda^+ = (\eta I + \beta F)^{-1}\big((I - \beta F)(c\,v^+) + \beta d\,a\big), \]
where by the Sherman--Morrison formula
\[ (\eta I + \beta F)^{-1} = \frac{1}{\eta}\Big(I - \frac{\beta}{\eta + \beta\,a\cdot b}F\Big). \]
Then
\[ (\eta I + \beta F)^{-1}\beta d\,a = \frac{\beta d}{\eta + \beta\,a\cdot b}\,a, \]
and since $F^2 = (a\cdot b)F$,
\[ (\eta I + \beta F)^{-1}(I - \beta F) = \frac{1}{\eta}\Big(I - \frac{\beta d}{\eta + \beta\,a\cdot b}F\Big). \]
In order to achieve stability, we let
\[ \frac{\beta d}{\eta + \beta\,a\cdot b} = 1, \quad \text{i.e. } \beta = \frac{d-1}{d - a\cdot b} \le 1. \]
Consequently we obtain the compact Newton step
\[ \lambda^+ = \frac{1}{d-1}(I - F)(c\,v^+) + \frac{\lambda}{|\lambda| \wedge 1}, \quad (13.8) \]
which results in:

Primal-dual active set method ($L^1$-optimization)
1. Initialize $\lambda^0 = 0$ and solve $F'(y^0) = 0$ for $y^0$. Set $k = 0$.
2. Set the active set $A_k$ and inactive set $I_k$ by
\[ A_k = \{|\lambda^k + c\,\Lambda y^k| > 1 + \epsilon\}, \qquad I_k = \{|\lambda^k + c\,\Lambda y^k| \le 1 + \epsilon\}. \]
3. Solve for $(y^{k+1}, \lambda^{k+1}) \in X \times H$:
\[ F'(y^{k+1}) + \Lambda^*\lambda^{k+1} = 0, \]
\[ \lambda^{k+1} = \frac{1}{d^k - 1}(I - F^k)(c\,\Lambda y^{k+1}) + \frac{\lambda^k}{|\lambda^k| \wedge 1} \ \text{ on } A_k, \qquad \Lambda y^{k+1} = \epsilon\,\lambda^{k+1} \ \text{ on } I_k. \]
4. If converged, stop; otherwise set $k = k+1$ and return to step 2.

This algorithm is unconditionally stable and converges rapidly for our test examples.

Remark (1) Note that
\[ (\lambda^+, v^+) = \frac{|v^+|^2 - (a\cdot v^+)(b\cdot v^+)}{d-1} + a\cdot v^+, \quad (13.9) \]
which implies stability of the algorithm. In fact, since
\[ F'(y^{k+1}) - F'(y^k) + \Lambda^*(\lambda^{k+1} - \lambda^k) = 0, \qquad \lambda^{k+1} - \lambda^k = \frac{1}{d^k-1}(I - F^k)(c\,\Lambda y^{k+1}) + \Big(\frac{1}{|\lambda^k| \wedge 1} - 1\Big)\lambda^k, \]
we have
\[ (F'(y^{k+1}) - F'(y^k), y^{k+1} - y^k) + \frac{c}{d^k-1}\big((I - F^k)\Lambda y^{k+1}, \Lambda y^{k+1}\big) + \Big(\frac{1}{|\lambda^k| \wedge 1} - 1\Big)(\lambda^k, \Lambda y^{k+1}) = 0. \]
Suppose $F(y) - F(x) - (F'(x), y - x) \ge \omega\,|y - x|_X^2$; then we have
\[ F(y^k) + \sum_{j=1}^k \Big(\omega\,|y^j - y^{j-1}|^2 + \Big(\frac{1}{|\lambda^{j-1}| \wedge 1} - 1\Big)(\lambda^{j-1}, v^j)\Big) \le F(y^0), \]
where $v^j = \Lambda y^j$. Note that $(\lambda^k, v^{k+1}) > 0$ implies $(\lambda^{k+1}, v^{k+1}) > 0$ and thus the point is inactive at step $k+1$, pointwise. This fact is a key step in proving global convergence of the algorithm.
(2) If $a\cdot b \to 1$, then $\beta \to 1$. Suppose $(\lambda, v)$ is a fixed point of (13.8). A direct computation then shows that the angle between $\lambda$ and $\lambda + cv$ is zero and $|\lambda| = 1$. It follows that
\[ \lambda = \frac{\lambda + cv}{\max(|\lambda + cv|, 1 + \epsilon)}. \]
That is, if the algorithm converges, then $a\cdot b \to 1$ and $|\lambda| \to 1$, and the scheme is consistent with (13.4).
(3) Consider the substitution iterate
\[ F'(y^+) + \Lambda^*\lambda^+ = 0, \qquad \lambda^+ = \frac{v^+}{\max(\epsilon, |v|)}, \quad v = \Lambda y. \quad (13.10) \]
Note that
\[ \Big(\frac{v^+}{|v|},\, v^+ - v\Big) = \frac{|v^+|^2 - |v|^2 + |v^+ - v|^2}{2|v|} = |v^+| - |v| + \frac{(|v^+| - |v|)^2 + |v^+ - v|^2}{2|v|}. \]
Thus $J_\epsilon(v^+) \le J_\epsilon(v)$, where equality holds only if $v^+ = v$. This fact can be used to prove that the iterative method (13.10) is globally convergent. It also suggests the hybrid method: for $0 < \mu < 1$,
\[ \lambda^+ = \mu\,\frac{v^+}{\max(\epsilon, |v|)} + (1-\mu)\Big(\frac{1}{d-1}(I - F)(c\,v^+) + \frac{\lambda}{\max(|\lambda|, 1)}\Big), \]
in order to gain the global convergence property without losing the fast convergence of the Newton method.

14 Exercise

Problem 1 Let $A$ be a symmetric matrix on $\mathbb{R}^n$ and $b \in \mathbb{R}^n$ a vector, and let $F(x) = \frac12 x^tAx - b^tx$. Show that $F'(x) = Ax - b$.

Problem 2 Show that the sequences $\{f_n\}, \{g_n\}, \{h_n\}$ are Cauchy in $L^2(0,1)$ but not in $C[0,1]$.

Problem 3 Show that $||x| - |y|| \le |x - y|$ and thus that the norm is continuous.

Problem 4 Show that all norms are equivalent on a finite dimensional vector space.

Problem 5 Show that for a closed linear operator $T$ the domain $D(T)$ is a Banach space when equipped with the graph norm $|x|_{D(T)} = (|x|_X^2 + |Tx|_Y^2)^{1/2}$.

Problem 6 Complete the DC motor problem.

Problem 7 Consider the spline interpolation problem (3.8). Find the Riesz representations $F_i$ and $\tilde F_j$ in $H_0^2(0,1)$ and complete the spline interpolation problem (3.8).

Problem 8 Derive the necessary optimality condition for the $L^1$ optimization (3.32) with $U = [-1,1]$.

Problem 9 Solve the constrained minimization in a Hilbert space $X$:
\[ \min \tfrac12|x|^2 - (a,x) \quad \text{subject to } (y_i,x) = b_i,\ 1 \le i \le m_1, \quad (\tilde y_j, x) \le c_j,\ 1 \le j \le m_2, \]
over $x \in X$. We assume $\{y_i\}_{i=1}^{m_1}$, $\{\tilde y_j\}_{j=1}^{m_2}$ are linearly independent.

Problem 10 Solve the pointwise obstacle problem
\[ \min \int_0^1 \Big(\frac12\Big|\frac{du}{dx}\Big|^2 - f(x)u(x)\Big)dx \quad \text{subject to } u(\tfrac13) \le c_1,\ u(\tfrac23) \le c_2, \]
over the Hilbert space $X = H_0^1(0,1)$ for $f = 1$.

Problem 11 Find the differential form of the necessary optimality condition for
\[ \min \int_0^1 \Big(\frac12\big(a(x)\Big|\frac{du}{dx}\Big|^2 + c(x)|u|^2\big) - f(x)u(x)\Big)dx + \frac12(\alpha_1|u(0)|^2 + \alpha_2|u(1)|^2) - (g_1, u(0)) - (g_2, u(1)) \]
over the Hilbert space $X = H^1(0,1)$.

Problem 12 Let $B \in \mathbb{R}^{m\times n}$. Show that $R(B) = \mathbb{R}^m$ if and only if $BB^t$ is positive definite on $\mathbb{R}^m$. Show that $x^* = B^t(BB^t)^{-1}c$ defines a minimum norm solution to $Bx = c$.

Problem 13 Consider the optimal control problem (3.44) (Example Control problem) with $\hat U = \{u \in \mathbb{R}^m : |u|_\infty \le 1\}$. Derive the optimality condition (3.45).

Problem 14 Consider the optimal control problem (3.44) (Example Control problem) with $\hat U = \{u \in \mathbb{R}^m : |u|_2 \le 1\}$. Find the orthogonal projection $P_C$, derive the necessary optimality condition, and develop the gradient method.

Problem 15 Let $J(x) = \int_0^1 |x(t)|\,dt$ be a functional on $C(0,1)$. Show that $J'(x)d = \int_0^1 \mathrm{sign}(x(t))\,d(t)\,dt$.

Problem 16 Develop the Newton method for the optimal control problem (3.44) without the constraint on $u$ (i.e., $\hat U = \mathbb{R}^m$).

Problem 17 (1) Show that the conjugate functional $h^*(p) = \sup_u\{(p,u) - h(u)\}$ is convex. (2) Show that the set-valued function $\Phi(p) = \mathrm{argmax}_u\{(p,u) - h(u)\}$ is monotone.

Problem 18 Find the graph $\Phi(p)$ for
\[ \max_{|u|\le\gamma}\{pu - |u|_0\} \quad\text{and}\quad \max_{|u|\le\gamma}\{pu - |u|_1\}. \]

15 Non-convex nonsmooth optimization

In this section we consider a general class of nonsmooth optimization problems. Let $X$ be a Banach space continuously embedded in $H = L^2(\Omega)^m$. Let $J : X \to \mathbb{R}^+$ be $C^1$ and let $N$ be the nonsmooth functional on $H$ given by
\[ N(y) = \int_\Omega h(y(\omega))\,d\omega, \]
where $h : \mathbb{R}^m \to \mathbb{R}$ is lower semi-continuous. Consider the minimization on $X$:
\[ \min J(x) + N(x) \quad \text{subject to } x \in C = \{x(\omega) \in U \text{ a.e.}\}, \quad (15.1) \]
where $U$ is a closed convex set in $\mathbb{R}^m$. For example, we consider $h(u) = \frac{\alpha}{2}|u|^2 + |u|_0$, where $|s|_0 = 1$ for $s \ne 0$ and $|0|_0 = 0$. For an integrable function $u$,
\[ \int_\Omega |u(\omega)|_0\,d\omega = \mathrm{meas}(\{u \ne 0\}). \]
Thus one can formulate volume control and scheduling problems as (15.1).

15.1 Existence

In this section we develop an existence method for a general class of minimizations
\[ \min_{x\in X} G(x) \quad (15.2) \]
on a Banach space $X$, including (15.1). Here the constrained problem $x \in C$ is formulated as $G(x) + I_C(x)$ using the indicator function of the constraint set $C$. Let $X$ be a reflexive space or $X = \tilde X^*$; thus $x_n \rightharpoonup x$ means the weak or weak-star convergence in $X$. To prove the existence of solutions to (15.2) we modify a general existence result given in [?].

Theorem (Existence) Let $G$ be an extended real-valued functional on a Banach space $X$ satisfying the following properties:
(i) $G$ is proper and weakly lower semi-continuous, and $G(x) + \epsilon|x|^p$ is coercive for each $\epsilon > 0$, for some $p$;
(ii) for any sequence $\{x_n\}$ in $X$ with $|x_n| \to \infty$, $\frac{x_n}{|x_n|} \rightharpoonup \bar x$ weakly in $X$, and $\{G(x_n)\}$ bounded from above, we have $\frac{x_n}{|x_n|} \to \bar x$ strongly in $X$, and there exist $\rho_n \in (0, |x_n|]$ and $n_0 = n_0(\{x_n\})$ such that
\[ G(x_n - \rho_n\bar x) \le G(x_n) \quad \text{for all } n \ge n_0. \quad (15.3) \]
Then $\min_{x\in X} G(x)$ admits a minimizer.

Proof: Let $\{\epsilon_n\} \subset (0,1)$ be a sequence converging to 0 from above and consider the family of auxiliary problems
\[ \min_{x\in X} G(x) + \epsilon_n|x|^p. \quad (15.4) \]
By (i), every minimizing sequence for (15.4) is bounded. Extracting a weakly convergent subsequence, the existence of a solution $x_n \in X$ of (15.4) can be argued in a standard manner. Suppose $\{x_n\}$ is bounded. Then there exist a weakly convergent subsequence, denoted by the same symbols, and $x^*$ such that $x_n \rightharpoonup x^*$. Passing to the limit $\epsilon_n \to 0^+$ in
\[ G(x_n) + \epsilon_n|x_n|^p \le G(x) + \epsilon_n|x|^p \quad \text{for all } x \in X, \]
and using again (i), we have $G(x^*) \le G(x)$ for all $x \in X$, and thus $x^*$ is a minimizer of $G$.

We now argue that $\{x_n\}$ is bounded, and assume to the contrary that $\lim_{n\to\infty}|x_n| = \infty$. Then a weakly convergent subsequence can be extracted from $\frac{x_n}{|x_n|}$ such that, again dropping indices, $\frac{x_n}{|x_n|} \rightharpoonup \bar x$ for some $\bar x \in X$. Since $G(x_n)$ is bounded from above, by (ii)
\[ \frac{x_n}{|x_n|} \to \bar x \quad \text{strongly in } X. \quad (15.5) \]
Next choose $\rho_n > 0$ and $n_0$ according to (15.3); thus for all $n \ge n_0$
\[ G(x_n) + \epsilon_n|x_n|^p \le G(x_n - \rho_n\bar x) + \epsilon_n|x_n - \rho_n\bar x|^p \le G(x_n) + \epsilon_n|x_n - \rho_n\bar x|^p. \]
It follows that
\[ |x_n| \le |x_n - \rho_n\bar x| = \Big|x_n - \rho_n\frac{x_n}{|x_n|} + \rho_n\Big(\frac{x_n}{|x_n|} - \bar x\Big)\Big| \le |x_n|\Big(1 - \frac{\rho_n}{|x_n|}\Big) + \rho_n\Big|\frac{x_n}{|x_n|} - \bar x\Big|. \]
This implies
\[ 1 \le \Big|\frac{x_n}{|x_n|} - \bar x\Big|, \]
which contradicts (15.5) and concludes the proof.

The first half of condition (ii) is a compactness assumption and (15.3) is a compatibility condition [?].

Remark (1) Suppose $G$ is not bounded below. Define the (sequential) recession functional $G^\infty$ associated with $G$:
\[ G^\infty(x) = \inf_{x_n\rightharpoonup x,\ t_n\to\infty}\ \liminf_{n\to\infty}\ \frac{1}{t_n}G(t_nx_n), \quad (15.6) \]
and assume $G^\infty(x) \ge 0$. Then $G(x) + \epsilon|x|^2$ is coercive. If not, there exists a sequence $\{x_n\}$ with $G(x_n) + \epsilon|x_n|^2 \le M$ bounded but $|x_n| \to \infty$; letting $\bar x = w\text{-}\lim \frac{x_n}{|x_n|}$, we find
\[ \frac{1}{|x_n|}G\Big(|x_n|\frac{x_n}{|x_n|}\Big) \le \frac{M}{|x_n|} - \epsilon|x_n| \to -\infty, \]
while $G^\infty(\bar x) \le \liminf_n \frac{1}{|x_n|}G(|x_n|\frac{x_n}{|x_n|})$, which yields a contradiction. Moreover, in the proof of the theorem note that
\[ G^\infty(\bar x) \le \liminf_{n\to\infty}\frac{1}{|x_n|}G(x_n) = \liminf_{n\to\infty}\frac{1}{|x_n|}G\Big(|x_n|\frac{x_n}{|x_n|}\Big) \le 0. \]
If we assume $G^\infty \ge 0$, then $G^\infty(\bar x) = 0$. Thus (15.3) needs to hold only under the condition $G^\infty(\bar x) = 0$. In [?], (15.3) is assumed for all $x \in X$ with $G^\infty(\bar x) = 0$, i.e. $G(x - \rho\bar x) \le G(x)$; thus our condition is much weaker. In [?] we also find a more general version of (15.3): there exist $\rho_n \in (0,|x_n|]$ and $z_n \in X$ such that
\[ G(x_n - \rho_n z_n) \le G(x_n), \qquad |z_n - z| \to 0 \ \text{ and } \ |z - \bar x| < 1. \]
(2) We have also shown that the sequence $\{x_n\}$ of minimizers of the regularized problems has a subsequence converging weakly to a minimizer of $G$.

Example 1 If $\ker(G^\infty) = 0$, then $\bar x = 0$, while the compactness assumption implies $|\bar x| = 1$, which is a contradiction. Thus the theorem applies.

Example 2 ($\ell^0$ optimization) Let $X = \ell^2$ and consider the $\ell^0$ minimization
\[ G(x) = \frac12|Ax - b|_2^2 + \beta\,|x|_0, \quad (15.7) \]
where $|x|_0$ is the number of nonzero elements of $x \in \ell^2$ (the counting measure of $x$). Assume $N(A)$ is finite dimensional and $R(A)$ is closed. Then one can apply the theorem to prove the existence of solutions to (15.7). Indeed, suppose $|x_n| \to \infty$, $G(x_n)$ is bounded from above, and $\frac{x_n}{|x_n|} \rightharpoonup z$. First we show that $z \in N(A)$ and $\frac{x_n}{|x_n|} \to z$. Since $\{G(x_n)\}$ is bounded from above, there exists $M$ such that
\[ G(x_n) = \frac12|Ax_n - b|_2^2 + \beta\,|x_n|_0 \le M \quad \text{for all } n. \quad (15.8) \]
Since $R(A)$ is closed, by the closed range theorem $X = R(A^*) \oplus N(A)$; let $x_n = x_n^1 + x_n^2$ accordingly. Consequently
\[ 0 \le |Ax_n^1|^2 - 2(b, Ax_n^1) + |b|_2^2 \le K, \]
so $|Ax_n^1|$ is bounded, and hence
\[ 0 \le \Big|A\Big(\frac{x_n^1}{|x_n|_2}\Big)\Big|^2 - \frac{2}{|x_n|_2}\Big(A^*b,\, \frac{x_n^1}{|x_n|_2}\Big) + \frac{|b|_2^2}{|x_n|_2^2} \to 0. \quad (15.9) \]
By the closed range theorem this implies that $\frac{x_n^1}{|x_n|_2} \to \bar x^1 = 0$ in $\ell^2$. Since $\frac{x_n^2}{|x_n|_2} \rightharpoonup \bar x^2$ and by assumption $\dim N(A) < \infty$, it follows that $\frac{x_n}{|x_n|_2} \to z = \bar x^2 \in N(A)$ strongly in $\ell^2$, and thus $|z|_0 < \infty$.

Next, (15.3) holds. Since $z \in N(A)$, the quadratic term is unchanged by the shift, and since $|x_n|_0$ is invariant under scaling, (15.3) is equivalent to showing that
\[ |(x_n^1 + x_n^2 - \rho z)_i|_0 \le |(x_n^1 + x_n^2)_i|_0 \]
for all $i$ and all $n$ sufficiently large. Only the coordinates for which $(x_n^1 + x_n^2)_i = 0$ with $z_i \ne 0$ require our attention. Since $|z|_0 < \infty$ there exists $\tilde i$ such that $z_i = 0$ for all $i > \tilde i$. For $i \in \{1,\dots,\tilde i\}$ we define
\[ I_i = \{n : (x_n^1 + x_n^2)_i = 0,\ z_i \ne 0\}. \]
These sets are finite. In fact, if $I_i$ is infinite for some $i \in \{1,\dots,\tilde i\}$, then $\lim_{n\to\infty,\,n\in I_i}\frac{1}{|x_n|_2}(x_n^1 + x_n^2)_i = 0$; since $\lim_{n\to\infty}\frac{1}{|x_n|_2}(x_n^1)_i = 0$, this implies $z_i = 0$, which is a contradiction.

Example 3 (Obstacle problem) Consider
\[ \min G(u) = \int_0^1\Big(\frac12|u_x|^2 - fu\Big)dx \quad \text{subject to } u(\tfrac12) \le 1. \quad (15.10) \]
Let $X = H^1(0,1)$. If $|u_n| \to \infty$ and $v_n = \frac{u_n}{|u_n|} \rightharpoonup v$, then by the bound in (ii)
\[ \frac12\int_0^1|(v_n)_x|^2\,dx \le \frac{1}{|u_n|}\int_0^1 f v_n\,dx + \frac{G(u_n)}{|u_n|^2} \to 0, \]
and thus $(v_n)_x \to 0$ and $v_n \to v = c$ for some constant $c$. But since $v_n(\tfrac12) \le \frac{1}{|u_n|} \to 0$, we have $c = v(\tfrac12) \le 0$. Thus
\[ G(u_n - \rho v) = G(u_n) + \rho c\int_0^1 f(x)\,dx \le G(u_n) \quad \text{if } \int_0^1 f(x)\,dx \ge 0. \]
That is, if $\int_0^1 f(x)\,dx \ge 0$, then (15.10) has a minimizer.

Example 4 (Friction problem)
\[ \min \int_0^1\Big(\frac12|u_x|^2 - fu\Big)dx + |u(1)|. \quad (15.11) \]
Using exactly the same argument as above, we have $(v_n)_x \to 0$ and $v_n \to v = c$. But we have
\[ G(u_n - \rho v) = G(u_n) + \rho c\int_0^1 f(x)\,dx + |u_n(1) - \rho c| - |u_n(1)|, \]
where
\[ \frac{|u_n(1) - \rho c| - |u_n(1)|}{|u_n|} = \Big|v_n(1) - \frac{\rho}{|u_n|}c\Big| - |v_n(1)| \to |c - \hat\rho c| - |c| = -\hat\rho\,|c| \]
for sufficiently large $n$, where $\hat\rho = \lim_{n\to\infty}\frac{\rho_n}{|u_n|}$. Thus, if $|\int_0^1 f(x)\,dx| \le 1$, then (15.11) has a minimizer.

Example 5 ($L^\infty$ Laplacian)
\[ \min \int_0^1|u - f|\,dx \quad \text{subject to } |u_x| \le 1. \]
Similarly, we have $(v_n)_x \to 0$ and $v_n \to v = c$, and
\[ \frac{G(u_n - \rho c) - G(u_n)}{|u_n|} = G\Big(v_n - \hat\rho c - \frac{f}{|u_n|}\Big) - G\Big(v_n - \frac{f}{|u_n|}\Big) \le 0 \]
for sufficiently large $n$.

Example 6 (Elastic contact problem) Let $X = H^1(\Omega)^d$ and let $\vec u$ be the deformation field.
Given the boundary body force $\vec g$, consider the elastic contact problem
\[ \min \frac12\int_\Omega \epsilon(\vec u) : \sigma(\vec u)\,dx - \int_{\Gamma_1}\vec g\cdot\vec u\,ds_x + \int_{\Gamma_2}|\tau\cdot\vec u|\,ds_x, \quad \text{subject to } n\cdot\vec u \le \psi, \quad (15.12) \]
where we assume the linear strain
\[ \epsilon_{ij}(u) = \frac12\Big(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\Big) \]
and Hooke's law $\sigma = \mu\,\epsilon + \lambda\,\mathrm{tr}(\epsilon)\,I$ with positive Lamé constants $\mu, \lambda$. If $|\vec u_n| \to \infty$ and $G(\vec u_n)$ is bounded, we have $\epsilon(\vec v_n) \to 0$ for $\vec v_n = \frac{\vec u_n}{|\vec u_n|} \rightharpoonup v$. Thus one can show that $\vec v_n \to v = (v^1, v^2) \in \{(-Ax_2 + C_1,\ Ax_1 + C_2)\}$ (the rigid motions) for the two dimensional problem. Consider the case $\Omega = (0,1)^2$, $\Gamma_1 = \{x_2 = 1\}$ and $\Gamma_2 = \{x_2 = 0\}$. As shown in the previous examples,
\[ \frac{G(\vec u_n - \rho v) - G(\vec u_n)}{|\vec u_n|} = \hat\rho\int_{\Gamma_1}(n\cdot\vec g)\,v^2\,ds_x + \hat\rho\int_{\Gamma_1}(\tau\cdot\vec g)\,v^1\,ds_x + \int_{\Gamma_2}\big(|(v_n)^1 - \hat\rho v^1| - |(v_n)^1|\big)\,ds_x. \]
Since $v^2 \in C = \{Ax_1 + C_2 \le 0\}$ and for $v^1 = C_1$
\[ |(v_n)^1 - \hat\rho v^1| - |(v_n)^1| \to |v^1 - \hat\rho v^1| - |v^1| = -\hat\rho\,|v^1|, \]
if we assume $|\int_{\Gamma_1}\tau\cdot\vec g\,ds_x| \le 1$ and $\int_{\Gamma_1}(n\cdot\vec g)\,v^2\,ds_x \ge 0$ for all $v^2 \in C$, then (15.12) has a minimizer.

Example 7
\[ \min \int_0^1\Big(\frac12\big(a(x)|u_x|^2 + |u|^2\big) - fu\Big)dx \]
with $a(x) > 0$ except $a(\tfrac12) = 0$.

15.2 Necessary optimality

Let $X$ be a Banach space continuously embedded in $H = L^2(\Omega)^m$. Let $J : X \to \mathbb{R}^+$ be $C^1$ and let $N$ be the nonsmooth functional on $H = L^2(\Omega)^m$ given by
\[ N(y) = \int_\Omega h(y(\omega))\,d\omega, \]
where $h : \mathbb{R}^m \to \mathbb{R}$ is a lower semi-continuous functional. Consider the minimization on $H$:
\[ \min J(y) + N(y) \quad \text{subject to } y \in C = \{y(\omega) \in U \text{ a.e.}\}, \quad (15.13) \]
where $U$ is a closed convex set in $\mathbb{R}^m$. Given $y^* \in C$, define needle perturbations of (the optimal solution) $y^*$ by
\[ y = \begin{cases} y^* & \text{on } |\omega - s| > \delta \\ u & \text{on } |\omega - s| \le \delta \end{cases} \]
for $\delta > 0$, $s \in \Omega$, and $u \in U$. Assume that
\[ |J(y) - J(y^*) - (J'(y^*), y - y^*)| \sim o(\mathrm{meas}(\{|\omega - s| < \delta\})). \quad (15.14) \]
Let $\mathcal{H}$ be the Hamiltonian defined by
\[ \mathcal{H}(y,\lambda) = (\lambda, y) - h(y), \quad y, \lambda \in \mathbb{R}^m. \]
Then we have the pointwise maximum principle:

Theorem (Pointwise optimality) If an integrable function $y^*$ attains the minimum and (15.14) holds, then for a.e. $\omega \in \Omega$
\[ J'(y^*) + \lambda = 0, \qquad \mathcal{H}(u, \lambda(\omega)) \le \mathcal{H}(y^*(\omega), \lambda(\omega)) \ \text{ for all } u \in U. \quad (15.15) \]

Proof: Since
\[ 0 \le J(y) + N(y) - (J(y^*) + N(y^*)) = J(y) - J(y^*) - (J'(y^*), y - y^*) + \int_{|\omega-s|<\delta}\big((J'(y^*), u - y^*) + h(u) - h(y^*)\big)\,d\omega, \]
and $h$ is lower semi-continuous, it follows from (15.14) that (15.15) holds at every Lebesgue point $s = \omega$ of $y^*$.

For $\lambda \in \mathbb{R}^m$ define the set-valued function
\[ \Phi(\lambda) = \mathrm{argmax}_{y\in U}\{(\lambda, y) - h(y)\}. \]
Then (15.15) is written as $y(\omega) \in \Phi(-J'(y)(\omega))$. The graph of $\Phi$ is monotone:
\[ (\lambda_1 - \lambda_2,\, y_1 - y_2) \ge 0 \ \text{ for all } y_1 \in \Phi(\lambda_1),\ y_2 \in \Phi(\lambda_2), \]
but it is not necessarily maximal monotone. Let $h^*$ be the conjugate function of $h$ defined by
\[ h^*(\lambda) = \max_{u\in U}\mathcal{H}(u, \lambda). \]
Then $h^*$ is necessarily convex. If $u \in \Phi(\lambda)$, then
\[ h^*(\hat\lambda) \ge (\hat\lambda, u) - h(u) = (\hat\lambda - \lambda, u) + h^*(\lambda), \]
thus $u \in \partial h^*(\lambda)$. Since $h^*$ is convex, $\partial h^*$ is maximal monotone and is thus the maximal monotone extension of $\Phi$. Let $h^{**}$ be the bi-conjugate function of $h$:
\[ h^{**}(x) = \sup_\lambda\Big\{(x,\lambda) - \sup_y\mathcal{H}(y,\lambda)\Big\}, \]
whose epigraph is the closed convex envelope of the epigraph of $h$:
\[ \mathrm{epi}(h^{**}) = \overline{\{\theta z_1 + (1-\theta)z_2 : z_1, z_2 \in \mathrm{epi}(h),\ \theta \in [0,1]\}}. \]
Then $h^{**}$ is necessarily convex and is the convexification of $h$, and $\lambda \in \partial h^{**}(u)$ if and only if $u \in \partial h^*(\lambda)$ and $h^*(\lambda) + h^{**}(u) = (\lambda, u)$. Consider the relaxed problem
\[ \min J(y) + \int_\Omega h^{**}(y(\omega))\,d\omega. \quad (15.16) \]
Since $h^{**}$ is convex, $N^{**}(y) = \int_\Omega h^{**}(y)\,d\omega$ is weakly lower semi-continuous and thus there exists a minimizer $y$ of (15.16). It thus follows from Theorem (Pointwise optimality) that the necessary optimality condition for (15.16) is given by
\[ -J'(y)(\omega) \in \partial h^{**}(y(\omega)), \quad \text{or equivalently} \quad y \in \partial h^*(-J'(y)). \quad (15.17) \]
Assume $J(\hat y) - J(y) - J'(y)(\hat y - y) \ge 0$ for all $\hat y \in C$, where $y(\omega) \in \Phi(-J'(y)(\omega))$. Then $y$ minimizes the original cost functional (15.13). One can construct a $C^1$ realization of $h^*$ by
\[ h^*_\epsilon(\lambda) = \min_p\Big\{\frac{1}{2\epsilon}|p - \lambda|^2 + h^*(p)\Big\}. \]
That is, $h^*_\epsilon$ is $C^1$ and $(h^*_\epsilon)'$ is Lipschitz; $(h^*_\epsilon)'$ is the Yosida approximation of $\partial h^*$. Then we let $h_\epsilon = (h^*_\epsilon)^*$, with
\[ h_\epsilon(u) = \min_y\Big\{\frac{1}{2\epsilon}|y - u|^2 + h(y)\Big\}, \]
and consider the relaxed problem
\[ \min J(y) + \int_\Omega h_\epsilon(y(\omega))\,d\omega. \quad (15.18) \]
The necessary optimality condition becomes
\[ y(\omega) = (h^*_\epsilon)'(-J'(y)(\omega)). \quad (15.19) \]
Then one can develop the semismooth Newton method to solve (15.19). Since $|h_\epsilon - h^{**}|_\infty \le C\epsilon$, one can argue that a sequence of minimizers $y_\epsilon$ of (15.18) converges to a minimizer of (15.16). In fact,
\[ J(y_\epsilon) + \int_\Omega h_\epsilon(y_\epsilon)\,d\omega \le J(y) + \int_\Omega h_\epsilon(y)\,d\omega \quad \text{for all } y \in C. \]
If $|y_\epsilon|$ is bounded, then $y_\epsilon$ has a subsequence converging weakly to $\bar y \in C$ and
\[ J(\bar y) + \int_\Omega h^{**}(\bar y)\,d\omega \le J(y) + \int_\Omega h^{**}(y)\,d\omega \quad \text{for all } y \in C. \]

Example ($L^0(\Omega)$ optimization) Based on our analysis one can treat the case when $h$ involves the volume control by $L^0(\Omega)$: let $h$ on $\mathbb{R}$ be given by
\[ h(u) = \frac{\alpha}{2}|u|^2 + \beta\,|u|_0, \qquad |u|_0 = \begin{cases} 0 & u = 0 \\ 1 & u \ne 0, \end{cases} \quad (15.20) \]
so that
\[ \int_\Omega |u|_0\,d\omega = \mathrm{meas}(\{u(\omega) \ne 0\}). \]
In this case the theorem applies with
\[ \Phi(q) := \mathrm{argmin}_{u\in\mathbb{R}}\,(h(u) - qu) = \begin{cases} \dfrac{q}{\alpha} & |q| \ge \sqrt{2\alpha\beta} \\ 0 & |q| < \sqrt{2\alpha\beta} \end{cases} \quad (15.21) \]
(a pointwise hard-thresholding operator; a sketch is given below), and the conjugate function $h^*$ of $h$ is
\[ h^*(q) = q\,\Phi(q) - h(\Phi(q)) = \begin{cases} \dfrac{|q|^2}{2\alpha} - \beta & |q| \ge \sqrt{2\alpha\beta} \\ 0 & |q| < \sqrt{2\alpha\beta}. \end{cases} \]
The bi-conjugate function $h^{**}(u)$ is given by
\[ h^{**}(u) = \begin{cases} \dfrac{\alpha}{2}|u|^2 + \beta & |u| > \sqrt{\dfrac{2\beta}{\alpha}} \\ \sqrt{2\alpha\beta}\,|u| & |u| \le \sqrt{\dfrac{2\beta}{\alpha}}. \end{cases} \]
Clearly $\Phi : \mathbb{R} \to \mathbb{R}$ is monotone, but it is not maximal monotone. The maximal monotone extension $\tilde\Phi = (\partial h^{**})^{-1} = \partial h^*$ of $\Phi$ is given by
\[ \tilde\Phi(q) \in \begin{cases} \Big\{\dfrac{q}{\alpha}\Big\} & |q| > \sqrt{2\alpha\beta} \\ \{0\} & |q| < \sqrt{2\alpha\beta} \\ \Big[\dfrac{q}{\alpha},\, 0\Big] & q = -\sqrt{2\alpha\beta} \\ \Big[0,\, \dfrac{q}{\alpha}\Big] & q = \sqrt{2\alpha\beta}. \end{cases} \]
Thus
\[ (h^*_\epsilon)'(\lambda) = \frac{\lambda - p_\epsilon(\lambda)}{\epsilon}, \qquad p_\epsilon(\lambda) = \mathrm{argmin}_p\Big\{\frac{1}{2\epsilon}|p - \lambda|^2 + h^*(p)\Big\}. \]
We have
\[ p_\epsilon(\lambda) = \begin{cases} \lambda & |\lambda| < \sqrt{2\alpha\beta} \\ \sqrt{2\alpha\beta} & \sqrt{2\alpha\beta} \le \lambda \le (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta} \\ -\sqrt{2\alpha\beta} & \sqrt{2\alpha\beta} \le -\lambda \le (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta} \\ \dfrac{\lambda}{1 + \epsilon/\alpha} & |\lambda| \ge (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta}, \end{cases} \]
and
\[ (h^*_\epsilon)'(\lambda) = \begin{cases} 0 & |\lambda| < \sqrt{2\alpha\beta} \\ \dfrac{\lambda - \sqrt{2\alpha\beta}}{\epsilon} & \sqrt{2\alpha\beta} \le \lambda \le (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta} \\ \dfrac{\lambda + \sqrt{2\alpha\beta}}{\epsilon} & \sqrt{2\alpha\beta} \le -\lambda \le (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta} \\ \dfrac{\lambda}{\alpha + \epsilon} & |\lambda| \ge (1 + \tfrac{\epsilon}{\alpha})\sqrt{2\alpha\beta}. \end{cases} \quad (15.22) \]
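The two maps (15.21)--(15.22) are spelled out below as elementwise NumPy sketches; the function names are ours and the inputs are assumed to be arrays.

```python
import numpy as np

def hard_threshold(q, alpha, beta):
    """Pointwise Phi(q) of (15.21): argmin_u alpha/2 u^2 + beta |u|_0 - q u."""
    return np.where(np.abs(q) >= np.sqrt(2 * alpha * beta), q / alpha, 0.0)

def h_star_eps_prime(lam, alpha, beta, eps):
    """Yosida approximation (h*_eps)'(lam) of (15.22), elementwise."""
    t = np.sqrt(2 * alpha * beta)
    out = np.zeros_like(lam)                      # |lam| < t: value 0
    mid_pos = (lam >= t) & (lam <= (1 + eps / alpha) * t)
    mid_neg = (-lam >= t) & (-lam <= (1 + eps / alpha) * t)
    far = np.abs(lam) >= (1 + eps / alpha) * t
    out[mid_pos] = (lam[mid_pos] - t) / eps
    out[mid_neg] = (lam[mid_neg] + t) / eps
    out[far] = lam[far] / (alpha + eps)
    return out
```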
Example (Authority optimization) We analyze the authority optimization of two unknowns $u_1$ and $u_2$, in which at least one of $u_1$ and $u_2$ must vanish. We formulate such a problem using
\[ h(u) = \frac{\alpha}{2}(|u_1|^2 + |u_2|^2) + \beta\,|u_1u_2|_0. \quad (15.23) \]
Then we have
\[ h^*(p_1,p_2) = \begin{cases} \dfrac{1}{2\alpha}(|p_1|^2 + |p_2|^2) - \beta & |p_1|, |p_2| \ge \sqrt{2\alpha\beta} \\ \dfrac{1}{2\alpha}|p_1|^2 & |p_1| \ge |p_2|,\ |p_2| \le \sqrt{2\alpha\beta} \\ \dfrac{1}{2\alpha}|p_2|^2 & |p_2| \ge |p_1|,\ |p_1| \le \sqrt{2\alpha\beta}. \end{cases} \]
Also, with $\kappa = 1 + \epsilon/\alpha$ and $t = \sqrt{2\alpha\beta}$, the proximal map is given case by case:
\[ p_\epsilon(\lambda_1,\lambda_2) = \begin{cases} \frac{1}{\kappa}(\lambda_1,\lambda_2) & |\lambda_1|, |\lambda_2| \ge \kappa t \\ (\frac{\lambda_1}{\kappa},\, \pm t) & |\lambda_1| \ge \kappa t,\ t \le |\lambda_2| \le \kappa t \\ (\pm t,\, \frac{\lambda_2}{\kappa}) & |\lambda_2| \ge \kappa t,\ t \le |\lambda_1| \le \kappa t \\ (\lambda_2,\lambda_2) & |\lambda_2| \le |\lambda_1| \le \kappa|\lambda_2|,\ t \le |\lambda_2| \le \kappa t \\ (\lambda_1,\lambda_1) & |\lambda_1| \le |\lambda_2| \le \kappa|\lambda_1|,\ t \le |\lambda_1| \le \kappa t \\ (\frac{\lambda_1}{\kappa},\, \lambda_2) & |\lambda_1| \ge \kappa|\lambda_2|,\ |\lambda_2| \le t \\ (\lambda_1,\, \frac{\lambda_2}{\kappa}) & |\lambda_2| \ge \kappa|\lambda_1|,\ |\lambda_1| \le t \\ (\lambda_2,\lambda_2) & |\lambda_2| \le |\lambda_1| \le \kappa|\lambda_2|,\ |\lambda_2| \le t \\ (\lambda_1,\lambda_1) & |\lambda_1| \le |\lambda_2| \le \kappa|\lambda_1|,\ |\lambda_1| \le t. \end{cases} \quad (15.24) \]

15.3 Augmented Lagrangian method

Given $\lambda \in H$ and $c > 0$, consider the augmented Lagrangian functional
\[ L(x,y,\lambda) = J(x) + (\lambda, x - y)_H + \frac{c}{2}|x - y|_H^2 + N(y). \]
Since
\[ L(x,y,\lambda) = J(x) + \int_\Omega\Big(\frac{c}{2}|x - y|^2 + (\lambda, x - y) + h(y)\Big)d\omega, \]
it follows from the theorem that if $(\bar x, \bar y)$ minimizes $L(x,y,\lambda)$, then the necessary optimality condition is given by
\[ -J'(\bar x) = \lambda + c(\bar x - \bar y), \qquad \bar y \in \Phi_c(\lambda + c\,\bar x), \]
where
\[ \Phi_c(p) = \mathrm{argmax}_y\Big\{(p,y) - \frac{c}{2}|y|^2 - h(y)\Big\}. \]
Thus the augmented Lagrangian update is given by $\lambda^{n+1} = \lambda^n + c(x^n - y^n)$, where $(x^n, y^n)$ solves
\[ -J'(x^n) = \lambda^n + c(x^n - y^n), \qquad y^n \in \Phi_c(\lambda^n + c\,x^n). \]
If we convexify $N$ by the bi-conjugate of $h$, we have
\[ L(x,y,\lambda) = J(x) + (\lambda, x-y)_H + \frac{c}{2}|y - x|_H^2 + N^{**}(y). \]
Since
\[ h_c(x,\lambda) = \inf_y\Big\{h^{**}(y) + (\lambda, x - y) + \frac{c}{2}|y - x|^2\Big\} = \inf_y\Big\{\sup_p\{(y,p) - h^*(p)\} + (\lambda, x-y) + \frac{c}{2}|y-x|^2\Big\} = \sup_p\Big\{-\frac{1}{2c}|p - \lambda|^2 + (x,p) - h^*(p)\Big\}, \]
we have
\[ h'_c(x,\lambda) = \mathrm{argmax}_p\Big\{-\frac{1}{2c}|p-\lambda|^2 + (x,p) - h^*(p)\Big\}. \quad (15.25) \]
If we let
\[ p_c(\lambda) = \mathrm{argmin}_p\Big\{\frac{1}{2c}|p-\lambda|^2 + h^*(p)\Big\}, \]
then $h'_c(x,\lambda) = p_c(\lambda + cx)$ and
\[ (h^*_c)'(\lambda) = \frac{\lambda - p_c(\lambda)}{c}. \quad (15.26) \]
Thus the maximal monotone extension $\tilde\Phi_c$ of the graph $\Phi_c$ is given by $(h^*_c)'(\lambda)$, the Yosida approximation of $\partial h^*(\lambda)$, and results in the update:
\[ -J'(x^n) = \lambda^n + c(x^n - y^n), \qquad y^n = (h^*_c)'(\lambda^n + c\,x^n). \]
In the case $c = 0$ we have the multiplier method
\[ \lambda^{n+1} = \lambda^n + \alpha(x^n - y^n), \quad \alpha > 0, \]
where $(x^n, y^n)$ solves $-J'(x^n) = \lambda^n$, $y^n \in \tilde\Phi(\lambda^n)$. In the case $\lambda = 0$ we have the proximal iterate of the form
\[ x^{n+1} = (h^*_c)'(x^n - \alpha\,J'(x^n)). \]

15.4 Semismooth Newton method

The necessary optimality condition is written as the semismooth equation
\[ -J'(x) = \lambda, \qquad \lambda = h'_c(x,\lambda) = p_c(\lambda + c\,x). \]
For the case $h = \frac{\alpha}{2}|x|^2 + \beta\,|x|_0$ we have from (15.22) and (15.25)
\[ h'_c(x,\lambda) = \begin{cases} \lambda + cx & |\lambda + cx| \le \sqrt{2\alpha\beta} \\ \sqrt{2\alpha\beta} & \sqrt{2\alpha\beta} \le \lambda + cx \le (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta} \\ -\sqrt{2\alpha\beta} & \sqrt{2\alpha\beta} \le -(\lambda + cx) \le (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta} \\ \dfrac{\alpha(\lambda + cx)}{\alpha + c} & |\lambda + cx| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}. \end{cases} \]
Thus the semismooth Newton update is $-J'(x^{n+1}) = \lambda^{n+1}$ together with, pointwise,
\[ \begin{cases} x^{n+1} = 0 & \text{if } |\lambda^n + cx^n| \le \sqrt{2\alpha\beta} \\ \lambda^{n+1} = \sqrt{2\alpha\beta} & \text{if } \sqrt{2\alpha\beta} < \lambda^n + cx^n < (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta} \\ \lambda^{n+1} = -\sqrt{2\alpha\beta} & \text{if } \sqrt{2\alpha\beta} < -(\lambda^n + cx^n) < (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta} \\ x^{n+1} = \dfrac{\lambda^{n+1}}{\alpha} & \text{if } |\lambda^n + cx^n| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}. \end{cases} \]
For the case $h(u_1,u_2) = \frac{\alpha}{2}(|u_1|^2 + |u_2|^2) + \beta\,|u_1u_2|_0$, let $\mu^n = \lambda^n + cu^n$. From (15.24) the semismooth Newton update is given, pointwise, by the analogous case distinction on $(\mu_1^n, \mu_2^n)$; in particular,
\[ \begin{aligned} & u^{n+1} = \frac{\lambda^{n+1}}{\alpha} && \text{if } |\mu_1^n|, |\mu_2^n| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}, \\ & u_1^{n+1} = \frac{\lambda_1^{n+1}}{\alpha},\ \lambda_2^{n+1} = \pm\sqrt{2\alpha\beta} && \text{if } |\mu_1^n| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta},\ \sqrt{2\alpha\beta} \le |\mu_2^n| \le (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}, \\ & \lambda_1^{n+1} = \pm\sqrt{2\alpha\beta},\ u_2^{n+1} = \frac{\lambda_2^{n+1}}{\alpha} && \text{if } |\mu_2^n| \ge (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta},\ \sqrt{2\alpha\beta} \le |\mu_1^n| \le (1+\tfrac{c}{\alpha})\sqrt{2\alpha\beta}, \end{aligned} \]
together with the intermediate cases of (15.24), in which one or both components are set to zero ($u_i^{n+1} = 0$) or the multipliers are equalized ($\lambda_1^{n+1} = \lambda_2^{n+1}$), according to the relative magnitudes of $|\mu_1^n|$ and $|\mu_2^n|$.

15.5 Constrained optimization

Consider the constrained minimization problem of the form
\[ \min J(x,u) = \int_\Omega(\ell(x) + h(u))\,d\omega \quad \text{over } u \in \mathcal{U}, \quad (15.27) \]
subject to the equality constraint
\[ \tilde E(x,u) = Ex + f(x) + Bu = 0. \quad (15.28) \]
Here $\mathcal{U} = \{u \in L^2(\Omega)^m : u(\omega) \in U \text{ a.e. in } \Omega\}$, where $U$ is a closed convex subset of $\mathbb{R}^m$. Let $X$ be a closed subspace of $L^2(\Omega)^n$ and $\tilde E : X \times \mathcal{U} \to X^*$ with $E \in \mathcal{L}(X,X^*)$ boundedly invertible, $B \in \mathcal{L}(L^2(\Omega)^m, L^2(\Omega)^n)$ and $f : L^2(\Omega)^n \to L^2(\Omega)^n$ Lipschitz.
15.5 Constrained optimization

Consider the constrained minimization problem of the form

min J(x, u) = ∫_Ω ( ℓ(x) + h(u) ) dω   over u ∈ U,   (15.27)

subject to the equality constraint

Ẽ(x, u) = Ex + f(x) + Bu = 0.   (15.28)

Here U = {u ∈ L²(Ω)^m : u(ω) ∈ U a.e. in Ω}, where U is a closed convex subset of R^m. Let X be a closed subspace of L²(Ω)^n and Ẽ : X × U → X*, with E ∈ L(X, X*) boundedly invertible, B ∈ L(L²(Ω)^m, L²(Ω)^n), and f : L²(Ω)^n → L²(Ω)^n Lipschitz. We assume that Ẽ(x, u) = 0 has a unique solution x = x(u) ∈ X for each given u ∈ U, and that there exists a solution (x̄, ū) to problem (15.27). Also, we assume that the adjoint equation

(E + f_x(x̄))* p + ℓ′(x̄) = 0   (15.29)

has a solution in X. This is a special case of (15.1) when J(u) is the implicit functional of u:

J(u) = ∫_Ω ℓ(x(ω)) dω,

where x = x(u) ∈ X for u ∈ U is the unique solution to Ex + f(x) + Bu = 0.

To derive a necessary condition for this class of (nonconvex) problems we use a maximum principle approach. For arbitrary s ∈ Ω we utilize needle perturbations of the optimal control ū, defined by

v(ω) = u on {ω : |ω − s| < δ},   v(ω) = ū(ω) otherwise,   (15.30)

where u ∈ U is constant and δ > 0 is sufficiently small so that {|ω − s| < δ} ⊂ Ω. The following additional properties of the optimal state x̄ and each perturbed state x(v) will be used:

1/2 |x(v) − x̄|²_{L²(Ω)} = o(meas({|ω − s| < δ})),
∫_Ω ( ℓ(·, x(v)) − ℓ(·, x̄) − ℓ_x(·, x̄)(x(v) − x̄) ) dω = O(|x(v) − x̄|²_{L²(Ω)}),   (15.31)
⟨f(·, x(v)) − f(·, x̄) − f_x(·, x̄)(x(v) − x̄), p⟩_{X*,X} = O(|x(v) − x̄|²_X).

Theorem Suppose (x̄, ū) ∈ X × U is optimal for problem (15.27), that p ∈ X satisfies the adjoint equation (15.29), and that (15.31) holds. Then we have the necessary optimality condition that ū(ω) minimizes h(u) + (p, Bu) pointwise a.e. in Ω.

Proof: By the second property in (15.31) we have

0 ≤ J(v) − J(ū) = ∫_Ω ( ℓ(·, x(v)) − ℓ(·, x(ū)) + h(v) − h(ū) ) dω
                = ∫_Ω ( ℓ_x(·, x̄)(x − x̄) + h(v) − h(ū) ) dω + O(|x − x̄|²),

where, by the adjoint equation,

(ℓ_x(·, x̄), x − x̄) = −⟨(E + f_x(·, x̄))(x − x̄), p⟩,

and

0 = ⟨E(x − x̄) + f(·, x) − f(·, x̄) + B(v − ū), p⟩
  = ⟨E(x − x̄) + f_x(·, x̄)(x − x̄) + B(v − ū), p⟩ + O(|x − x̄|²).

Thus we obtain

0 ≤ J(v) − J(ū) ≤ ∫_Ω ( (−B*p, v − ū) + h(v) − h(ū) ) dω.

Dividing this by meas({|ω − s| < δ}) > 0, letting δ → 0, and using the first property in (15.31), we obtain the desired result at every Lebesgue point s of

ω ↦ (−B*p, v − ū) + h(v) − h(ū),

since h is lower semicontinuous.

Consider the relaxed problem of (15.27):

min J(x, u) = ∫_Ω ( ℓ(x) + h**(u) ) dω   over u ∈ U.   (15.32)

We obtain the pointwise necessary optimality

Ex̄ + f(x̄) + Bū = 0,
(E + f′(x̄))* p + ℓ′(x̄) = 0,
ū ∈ ∂h*(−B*p),

or equivalently

λ = h′_c(u, λ),   λ = −B*p,

where h_c is the Moreau-Yosida approximation of h**.

15.6 Algorithm

Based on the complementarity condition derived above we have the primal-dual active set method for L^p optimization (h(u) = α/2 |u|² + β|u|^p):

Primal-Dual Active Set method (L^p(Ω) optimization)

1. Initialize u⁰ ∈ X and λ⁰ ∈ H. Set n = 0.
2. Solve for (y^{n+1}, u^{n+1}, p^{n+1}):

   E y^{n+1} + B u^{n+1} = g,   E* p^{n+1} + ℓ′(y^{n+1}) = 0,   λ^{n+1} = B* p^{n+1},

   and

   u^{n+1} = −λ^{n+1}/(α + βp|uⁿ|^{p−2})   on {|λⁿ + c uⁿ| > μ_p + c u_p},
   λ^{n+1} = μ_p                            on {μ_p ≤ λⁿ + c uⁿ ≤ μ_p + c u_p},
   λ^{n+1} = −μ_p                           on {μ_p ≤ −(λⁿ + c uⁿ) ≤ μ_p + c u_p},
   u^{n+1} = 0                              on {|λⁿ + c uⁿ| ≤ μ_p}.

3. Stop, or set n = n + 1 and return to step 2.
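As an illustration of step 2, the following sketch (an assumption-laden toy example, not from the text) implements the loop for the L⁰ case, where the thresholds specialize to μ_p = √(2αβ) and μ_p + c u_p = (1 + c/α)√(2αβ), cf. (15.22). The data E, B, g, y_d are invented for the example, ℓ(y) = 1/2 |y − y_d|², and the multiplier is taken as λ = −B*p so that λ = −J′(u), consistent with the sign convention of the preceding sections.

import numpy as np

n = 50
alpha, beta, c = 1.0, 0.02, 1.0
s = np.sqrt(2 * alpha * beta)                    # mu_p for the L^0 case
kappa = 1 + c / alpha                            # (mu_p + c u_p)/mu_p
E = np.eye(n) + 0.1 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.eye(n)
g = np.zeros(n)
y_d = np.sin(2 * np.pi * np.linspace(0.0, 1.0, n))   # desired state
Id, Z = np.eye(n), np.zeros((n, n))

u, lam = np.zeros(n), np.zeros(n)
for it in range(30):
    mu = lam + c * u
    A0 = np.abs(mu) <= s                         # active set: u = 0
    Ap = (s < mu) & (mu < kappa * s)             # lambda = +s
    Am = (s < -mu) & (-mu < kappa * s)           # lambda = -s
    In = np.abs(mu) >= kappa * s                 # inactive: alpha u = lambda
    # case rows, written in terms of lambda = -B^T p
    row3 = np.zeros((n, 3 * n)); rhs3 = np.zeros(n)
    row3[A0, n + np.where(A0)[0]] = 1.0          # u_i = 0
    row3[Ap, 2 * n:] = B.T[Ap]; rhs3[Ap] = -s    # (B^T p)_i = -s, i.e. lambda_i = +s
    row3[Am, 2 * n:] = B.T[Am]; rhs3[Am] = s     # lambda_i = -s
    row3[In, n + np.where(In)[0]] = alpha        # alpha u_i + (B^T p)_i = 0
    row3[In, 2 * n:] += B.T[In]
    # coupled system: E y + B u = g, E^T p + (y - y_d) = 0, case rows
    K = np.vstack([np.hstack([E, B, Z]), np.hstack([Id, Z, E.T]), row3])
    sol = np.linalg.solve(K, np.concatenate([g, y_d, rhs3]))
    y, u_new, p = sol[:n], sol[n:2 * n], sol[2 * n:]
    lam_new = -B.T @ p
    if np.allclose(u_new, u) and np.allclose(lam_new, lam):
        break
    u, lam = u_new, lam_new
print("PDAS iterations:", it, " nonzeros in u:", int(np.sum(np.abs(u) > 1e-12)))

The loop terminates when the active sets, and hence (u, λ), no longer change; the sparsity of the resulting control is governed by the threshold √(2αβ).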
16 Exercise

Problem 1 Let A be a symmetric matrix on Rⁿ and b ∈ Rⁿ a vector, and let F(x) = 1/2 xᵗAx − bᵗx. Show that F′(x) = Ax − b.

Problem 2 Show that the sequences {fₙ}, {gₙ}, {hₙ} are Cauchy in L²(0, 1) but not in C[0, 1].

Problem 3 Show that ||x| − |y|| ≤ |x − y|, and thus that the norm is continuous.

Problem 4 Show that all norms on a finite-dimensional vector space are equivalent.

Problem 5 Show that for a closed linear operator T the domain D(T) is a Banach space if it is equipped with the graph norm |x|_{D(T)} = (|x|²_X + |Tx|²_Y)^{1/2}.

Problem 6 Consider the spline interpolation problem (3.8). Find the Riesz representations F_i and F̃_j in H₀²(0, 1) and complete the spline interpolation problem (3.8).

Problem 7 Derive the necessary optimality condition for the L¹ optimization (3.32) with U = [−1, 1].

Problem 8 Solve the constrained minimization in a Hilbert space X:

min 1/2 |x|² − (a, x)   over x ∈ X

subject to

(y_i, x) = b_i, 1 ≤ i ≤ m₁,   (ỹ_j, x) ≤ c_j, 1 ≤ j ≤ m₂.

We assume {y_i}_{i=1}^{m₁}, {ỹ_j}_{j=1}^{m₂} are linearly independent.

Problem 9 Solve the pointwise obstacle problem:

min ∫₀¹ ( 1/2 |du/dx|² − f(x)u(x) ) dx

subject to

u(1/3) ≤ c₁,   u(2/3) ≤ c₂,

over the Hilbert space X = H₀¹(0, 1) for f = 1.

Problem 10 Find the differential form of the necessary optimality condition for

min ∫₀¹ ( 1/2 (a(x)|du/dx|² + c(x)|u|²) − f(x)u(x) ) dx + 1/2 (α₁|u(0)|² + α₂|u(1)|²) − (g₁, u(0)) − (g₂, u(1))

over the Hilbert space X = H¹(0, 1).

Problem 11 Let B ∈ R^{m×n}. Show that R(B) = R^m if and only if BBᵗ is positive definite on R^m. Show that x* = Bᵗ(BBᵗ)⁻¹c defines the minimum norm solution to Bx = c.

Problem 12 Consider the optimal control problem (3.44) (Example Control problem) with Û = {u ∈ R^m : |u|_∞ ≤ 1}. Derive the optimality condition (3.45).

Problem 13 Consider the optimal control problem (3.44) (Example Control problem) with Û = {u ∈ R^m : |u|₂ ≤ 1}. Find the orthogonal projection P_C, derive the necessary optimality condition, and develop the gradient method.

Problem 14 Let J(x) = ∫₀¹ |x(t)| dt, a functional on C(0, 1). Show that J′(x)d = ∫₀¹ sign(x(t)) d(t) dt.

Problem 15 Develop the Newton method for the optimal control problem (3.44) without the constraint on u (i.e., Û = R^m).

Problem 16 (1) Show that the conjugate functional h*(p) = sup_u {(p, u) − h(u)} is convex. (2) Show that the set-valued function Φ(p) = argmax_u {(p, u) − h(u)} is monotone.

Problem 17 Find the graph Φ(p) for

max_{|u| ≤ γ} { pu − |u|₀ }   and   max_{|u| ≤ γ} { pu − |u|₁ }.