Dynamic Programming

Dynamic Programming
•
Dynamic programming is a technique for solving
problems with a recursive structure with the following
characteristics:
1. optimal substructure (principle of optimality): An
optimal solution to a problem can be decomposed
into optimal solutions for subproblems, This is the
defining characteristic of problems solving by DP.
Not every problem has this property.
2. a small number of subproblems: The total number
of sub-instances to be solved is small
3. overlapping subproblems: During the computation
same instances are referred to over and over again
Illustration
y
x
u
If
then
﹌﹌﹌
﹌﹌﹌
x
v
y
﹌﹌﹌
﹌﹌﹌
u
is the shortest path between u and v
x
y
is the shortest path between x and y
v
A Negative Example
• Consider the problem of finding shortest paths on graphs
that allow negative edge weights. Referring to the
following graph, observe that a-b-c is a shortest path
between a and c of length -2. However, unlike for the
positive weight shortest path problem, the subpaths of
this shortest path are not necessarily shortest paths. Thus,
the shortest path between and and b is a-c-b (of length -1)
not a-b (of length 1). The principle of optimality fails to
hold for this problem.
+2
a
+1
-3
b
c
The Airplane Problem
• You live in a country with n+1 cities, all equally spaced
along a line. The cities are numbered from 0 to n. You are
in city n and want to get to 0.
1. You can only travel by plane.
2. You can only take one plane per day. The fare is
determined by the distance (start city # - stop city
#) you fly. Flying distance i costs air_fare[i].
3. Spending the night in city i costs hotel_cost[i].
• Goal: Minimize the total cost to get to city 0. Ignore
how many days it takes.
Example (n=4)
i hotel_cost[i] air_fare[i]
1
2
1
2
2
4
3
5
9
4
0
16
A Divide and Conquer Solution
function mincost(i: integer): integer;
var k: integer;
begin
if i = 0 then return(0)
else return( 0min
[air_fare[i-k]+hotel_cost[k]+mincost(k)])
k i-1
end; /* mincost(n) is the answer */
• Let T(n) be the time required to solve a problem of size n.
n 1
T(n) =  T(i) + O(n), T(0) = O(1)
i 0
T(n) - T(n-1) = T(n-1) + k, T(n) = 2T(n-1) + k for n > 1
T(n) = O(2n) !!!
Discussion
A DP Solution
var mincost: array[0..n] of integers
begin
mincost[0] := 0;
for i := 1 to n do
min
mincost[i]:=
{(air_fare[i-k]+hotel_cost[k])+mincost[k]}
0 k i-1
end; /* the answer is in mincost[n] */
• This solution requires time Q(n2).
Observations
•
•
•
•
Original problem: Go from city n to city 0.
Generalized Problem: Go from city i to city 0.
Another generalized problem: Go from city n to city i.
Note: A more generalized problem is: "Go from city i to
j (i ≥ j)". But this idea leads to a less efficient solution.
• The method calculating one solution from some others is
exactly the same as in the recursive version (except that
function calls are replaced by array references).
• Usually we get the idea for this method by thinking
recursively. However, the final solution is iterative.
– Divide and conquer: top-down.
– Dynamic programming: bottom-up.
Matrix Multiplication
• With the standard matrix multiplication method, how
many scalar multiplications are needed for computing
AB where A and B are of dimensions pq and qr?
1. pqr
2. pq2r
Chained Matrix Multiplication
• Need to compute the product M = A1  A2    An of
matrices A1, , An . By associativity (i.e., (AB)C =
A(BC)) of multiplication we can compute M in various
order. Use parentheses to describe the order.
• The matrix-chain multiplication problem asks: What is
the minimum cost for computing M?
Example
• A is 10100, B 10010, and C 10100
(AB)C
• How many operations for A(BC)? 200,000
Problem Formulation
• A product of matrices is fully parenthesized if it is either
a single matrix or the product of two fully parenthesized
matrix products.
• For example, for the product ABCD, there are five
possible ways to fully parenthesize the product:
(A(B(CD))), (A((BC)D)), ((A(BC))D),
((AB)(CD)), (((AB)C)D).
Problem Formulation
• Now the matrix-chain multiplication problem is
equivalent to:
Given a list p = (p0, p1,, pn) of positive integers,
compute the optimal-cost full-parenthesization
of the chain (A1, A2,, An),
where for each i, the ith matrix is of dimension
pi-1 pi and the cost is measured by # of scalar
multiplications.
Exhaustive Search
• Is brute-force search efficient for this problem?
• Let P(n) be the number of distinct full parenthesizations
of n matrices. Then
• Solving this, we obtain P(n) = C(n – 1), where
• Thus, the brute-force search is too inefficient.
DP Approach: O(n3)
• We need to cut the chain A1,, An somewhere in the
middle, i.e., pick some k; 1 k  n-1 and compute B(k) =
A1,, Ak, C(k) = Ak+1,, An, and B(k)C(k).
• Now, if we already know the optimal cost of computing
B(k) and C(k) for all k; 1 k  n-1, then we can compute
the optimal cost for the entire product by finding a k that
minimize “the optimal cost for computing B(k)"
+ “the optimal cost for computing C(k)"
+ p0pkpn
• Use the same idea to compute the minimum costs for all
the subproducts. This results in a bottom-up computation
of optimal costs.
Generalization
•
•
Original problem: Compute the cost for A1A2 An
Generalized problem: Compute the cost for AiAi+1 Aj
1. For any k such that i ≤ k < j, divide product into two
products: (AiAi+1 Ak), (Ak+1Ak+2 Aj). Let m[i, j] be
the optimal cost for computing AiAi+1 Aj. Then this
grouping has cost
m[i, k] + m[k+1, j] + pi-1pkpj
2. Then choose minimum for all such k.
•
For each i; j, 1  i  j  n, let s[i; j] be the smallest cutpoint that provides the optimal cost for m[i, j].
Problem Ordering
–
To get a DP solution, order the problems:
m[1,1] m[2,2]... m[n-1,n-1] m[n,n]
m[1,2] m[2,3]... m[n-1,n]
•••
m[1,n-1] m[2,n]
m[1,n]
1. for i = 1 to n, set m[i; i] = 0.
2. for ℓ = 2 to n,
compute s- and m-values
for all length ℓ subchains
(Ai,, Aj).
Computing Size ℓ Chains
The Parenthesization
•
Once all the entries of s have been computed, we can
compute the parenthesization that provides the optimal
cost:
1. Print-Chain(i; j)
2. /* print the parenthesization for Ai Aj */
3.
print("(")
4.
Print-Chain(i; s[i; j])
5.
Print-Chain(s[i; j] +1; j)
6.
print(")")
Example
•
M1  M2  M3  M4
1020 2050 501 1100
m[1,1]=0
m[2,2]=0
m[3,3]=0
m[1,2]=10000
m[2,3]=1000 m[3,4]=5000
m[1,3]=1200
m[2,4]=3000
m[1.4]=2200
m[4,4]=0
• For example to compute m[1,4] choose the best of
20000
M1  (M2  M3  M4)
0 + 3000 + 20000 = 23000
0
3000
(M1  M2) (M3  M4)
10000
5000
10000 + 5000 + 50000 = 65000
(M1  M2  M3)  M4
1200
0
1200 + 0 + 1000 = 2200
((M1  (M2  M3))  M4)
Characteristics
1. optimal substructure: If s[1, n] = k, then an optimal full
parenthesization contains those of (A1 Ak) and (Ak+1
An).
2. a small number of subproblems: The number of
subproblems is the number of (i; j) with 1 i  j  n,
which is n(n+1)/2 .
3. overlapping subproblems: m[i, j’] and m[i’, j] are
referred to during the computation of m[i, j], for every
i < i’  j and i  j’ < j.
• Because of the last property, computing an m-entry by
recursive calls takes exponentially many steps.
Longest Common Subsequence
• A sequence Z = <z1, z2,  , zk> is a subsequence of a
sequence X = <x1, x2,  , xm> if Z can be generated by
striking out some (or none) elements from X.
• For example, <b, c, d, b> is a subsequence of <a, b, c, a,
d, c, a, b>
• The longest common subsequence problem is the
problem of finding, for given two sequences
X = <x1, x2,  , xm> and Y = <y1, y2,  , yn>, a maximumlength common subsequence of X and Y.
• Brute-force search for LCS requires exponentially many
steps because if m  n, there are
subsequences
n
  candidate

i 1  i 
m
Example
The Optimal-Substructure of LCS
•
For a sequence Z = <z1, z2,  , zk> and i, 1  i  k. let Zi
denote the length i prefix of Z, i.e., Zi = <z1, z2,  , zi>.
•
Theorem: Let X = <x1,  , xm> and Y= <y1,  , yn>
1. If xm = yn, then an LCS of Xm-1 and Yn-1 followed by
xm (= yn) is an LCS of X and Y.
2. If xm  yn, then an LCS of X and Y is either an LCS
of Xm-1 and Y or an LCS of X and Yn-1.
Proof of the Theorem
1. Suppose xm and yn are the same symbol, say s. Take an
LCS Z of X and Y. Generation of Z should need either
xm or yn. Otherwise, appending s to Z would make a
LCS. If necessary, modify the production of Z from X
(from Y) so that its last element is xm (yn). Then Z is a
common subsequence W of Xm-1 or Yn-1 followed by a s.
By the maximality of Z, W should be an LCS.
2. If xm  yn, then for any LCS Z of X and Y, generation of
Z cannot use both xm and yn. So, Z is either an LCS of
Xm-1 and Y or an LCS of X and Yn-1.
Computation Strategy
•
If xm = yn, then append xm to an LCS of Xm-1 and Yn-1.
Otherwise, compute the longer of an LCS of X and Yn-1
and Xm-1 and Y.
• Let c[i, j] be the length of an LCS of Xi and Yj. We get
the recurrence:
0
if i  0 or j  0

c[i, j]  c[i - 1, j - 1]  1
if i, j  0 and x i  y j
max(c[i, j - 1], c[i - 1, j]) if i, j  0 and x  y
i
j

•
Let b[i, j] maintain the choice made for (Xi, Yj). With
the b-table we can reconstruct an LCS.
Example:
• Here numeric
entries are
c-values and
Arrows are
b-values.
The Other 2 Characteristics of DP
• a small number of subproblems: There are only (m+1)(n
+1) entries in c (and in b).
• overlapping subproblems: c[i; j] may be eventually
referenced to in the process of computing c[i’, j’] for any
i’  i and j’  j.
• Time (and space) complexity of the algorithm : O(mn).
Optimal Polygon Triangulation
• A polygon is a closed collection of lines (called sides) in
the plane. A point joining two sides is a vertex. The line
segment between two nonadjacent nodes is a chord. A
polygon is convex if any chord is either on the boundary
or in the interior of the polygon.
• A polygon is represented by listing its vertices in
counterclockwise
order.
• <v0, v1,  , vn-1>
represents the right
polygon.
Triangulation
• A triangulation of a polygon is a set T of chords of the
polygon that divides the polygon into disjoint triangles.
Every triangulation of an n-vertex polygon has n – 3
chords and divides the polygon into n – 2 triangles.
Optimal Triangulation
• The weight of a triangle vivjvk, denoted by w(vivjvk),
is |vivj| + |vjvk|+ |vkvi|, where |vivj| is the Euclidean
distance between vi and vj.
• The optimal (polygon) triangulation problem is the
problem of finding a triangulation of a convex polygon
that minimizes the sum of the weights of the triangles in
the triangulation.
• Given a triangulation with chords ℓ1, …, ℓn-3, the weightsum can be rewritten as:
n
n 3
• Note that vn = v0.
v v 2 

i 1
i -1 i

i 1
i
Observation
• For each i, j, 0  i < j  n, let Pij denote the polygon <vi,
vi+1,  , vj-1 , vj >, where either i > 0 or j < n.
• The polygon Pij consists of j - i consecutive sides vivi+1,
 , v v and one more line v v which is a side if (i, j) = (0,
j-1 j
j i
n-1) or (1, n) and a chord otherwise.
• Let t[i, j] be the sum of the
chord length in an optimal
triangulation of Pij.
• The ultimate goal is to
compute t[0, n-1].
The DP Approach
• Idea: In any triangulation, there is a unique triangle that
has vivj as a side. So, try each k, i < k < j, and cut the
polygon by lines vivk and vjvk (one of these lines can be a
side, but not both) thereby generating a triangle and the
rest, which consists of one or two polygons.
The Key Step
•
Then the sum of the chord lengths are:
1. k = i +1: t[i +1, j] + |vi+1vj|.
2. k = j - 1: t[i, j -1] + |vivj-1|.
3. i +1 < k < j –1: t[i, k] + t[k, j] + |vivk| + |vjvk|
– Pick a k that provides the smallest value.
The Algorithm
• Let t(i, j) be the optimal cost of a triangulation of Pij.
Observe: t(i; i) = 0 and for i < j,
• Now a dynamic program algorithm is straightforward:
just compute t(i, j)'s from small intervals (j – i) to large
intervals for all i, j, and t(0, n-1) is the solution.
Optimal Binary Search Tree
• Given a sorted list with n elements and the probability for
accessing the keys (and leaves). Find the binary search
tree with smallest cost.
• Here, "leaves" are "fictitious" nodes, added to account for
unsuccessful searches.
• If p[i] is the prob. of node i, and q[i] is the prob. of an
unsuccessful search between node i and i+1. The cost of a
tree is:
n
n
i 1
i 1
p[i]*level[node i]+q[i]*(level[leaf i]-1)
Example
The Idea
• Let c(i, j) = minimum cost for tree consisting of nodes
i, …, j.
• In calculating c(i, j), we must find the best chance for the
root. How to determine this?
– Try all possibilities, k, i ≤ k ≤ j
• Clearly, the 2 subtrees should be the
optimal trees for the given set of nodes
• Clearly, the cost of this tree is related
to c(i, k-1) and c(k+1, j). However,
the two subtrees have been "pushed
down" one level.
• Tree does not have to be balanced to be optimal
• Analysis of the algorithm
– Body of the loop takes time O(n) (finding minimun of
≤ n elements), and it is evaluated O(n2) times
– Total O(n3)
CFG Parsing: CYK Algorithm
• Problem: Given a context-free grammar G in Chomsky
Normal Form, and a word w, decide if w is in L(G), the
language generated by G.
• Example: Given G, L(G) ={12n: n > 1}, by
S  AA, A  BB | AA, B  1.
• 16 is in G by derivation:
S  AA  AAA * BBBBBB * 111111.
• Dynamic Programming solution: Exercise.