CSE 101: Design and Analysis of Algorithms, Winter 2015
Homework 3
Due: February 12, 2015

1. Given an array of non-negative integers, you are initially positioned at the first index of the array (we will call this index 0). Each element in the array represents your maximum jump length at that position. Your goal is to reach the last index in the minimum number of jumps.

For example: Suppose you are given array A = [2, 3, 1, 1, 4]. The minimum number of jumps to reach the last index is 2. (Starting with index 0, move to index 1 with the first jump. Move 3 indices with the second jump and reach index 4.)

Give a linear time algorithm to find the minimum number of jumps required to reach the last index.

-make a graph with each index being a vertex and an edge from each vertex to each of its possible jump destinations
-then you have a DAG and can find the shortest path

(a) Algorithm Description
Use a greedy algorithm. With each jump, jump to the index that would allow you to reach the highest index possible with the next jump. That is, jump to the index i with the highest i + A[i], where i ranges over the set of indices you can jump to from your current index. If the last index itself is within reach, jump straight to it.

(b) Pseudocode
Array of non-negative integers A
return minimumJumpSequence(A, 0)

define minimumJumpSequence(A, startIndex)
1  currentIndex = startIndex
2  solution = []
3  while currentIndex != len(A) − 1
4    if currentIndex + A[currentIndex] ≥ len(A) − 1   //the last index is within reach
5      currentIndex = len(A) − 1
6    else
7      bestIndex = null
8      bestReachability = 0
9      for possibleIndex in indices reachable from currentIndex
10       currentReachability = possibleIndex + A[possibleIndex]
11       if currentReachability > bestReachability
12         bestIndex = possibleIndex
13         bestReachability = currentReachability
14     currentIndex = bestIndex
15   solution.append(currentIndex)
16 return solution
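As a concrete check on the linear-time claim argued in part (d) below, here is a minimal Python sketch of the same greedy idea in its equivalent "window" form, where the indices reachable within k jumps form a contiguous window and each index is examined exactly once. It returns only the number of jumps (the quantity the problem asks for) rather than the jump sequence, and the function name is my own.

```python
def minimum_jumps(A):
    """Minimum number of jumps to reach the last index (greedy sketch).

    Window formulation of the greedy strategy above: the indices reachable
    within `jumps` jumps form a contiguous window, and each new jump extends
    the window to the farthest reach seen inside it. Assumes the last index
    is reachable, as in the problem setup.
    """
    jumps = 0
    window_end = 0   # last index reachable within `jumps` jumps
    farthest = 0     # last index reachable within `jumps` + 1 jumps
    for i in range(len(A) - 1):
        farthest = max(farthest, i + A[i])
        if i == window_end:      # current window exhausted:
            jumps += 1           # another jump is unavoidable
            window_end = farthest
            if window_end >= len(A) - 1:
                break
    return jumps

assert minimum_jumps([2, 3, 1, 1, 4]) == 2   # the example from the statement
```

Because the loop visits each index once, the O(n) bound is immediate in this formulation; the amortized argument in part (d) shows the same bound for the pseudocode above.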
(c) Proof of Correctness
Assume for contradiction that an optimal solution exists that is strictly better than our greedy solution. This means the two solutions must be different. Let i be the last index that both solutions reach before diverging. Say the greedy solution jumps from i to some index j_g and the optimal solution jumps to some index j_o. Because of the way our greedy solution is set up, no index reachable from i can reach farther in one jump than j_g, otherwise it would have been chosen instead of j_g. In particular, every index reachable from j_o must also be reachable from j_g, because the greedy algorithm chose, out of the reachable indices, the one that could reach the furthest index in the array. This means that on the next jump, the greedy algorithm gets to choose from the same indices as the optimal solution, and possibly more. Since the greedy algorithm always chooses the index with the largest reach, it is never possible for the optimal solution to reach indices that the greedy solution cannot. Therefore, it is impossible for the optimal solution to reach the end of the array in fewer jumps than the greedy solution, as that would require the optimal solution to have a larger reachable range than the greedy solution at some step. This contradicts our assumption that an optimal solution exists that is strictly better than the greedy solution. Therefore, the greedy algorithm must be optimal.

(d) Running Time analysis
You only need to do one i + A[i] calculation per index in the array. That takes O(n) time. In addition, before each jump, you must calculate the maximum of the i + A[i] values over the indices reachable by a jump from the current index. However, any such index needs to be used in at most two maximum calculations. Suppose you start at some index j, and you can jump a maximum of A[j] indices. Consider some index k that can be jumped to from j. k + A[k] will be used in calculating the maximum that dictates which index to jump to next. There are two possible scenarios:

i. The next jump lands at an index that is k or greater. Then k + A[k] will not be used in another maximum calculation for the rest of the algorithm's run, since it is only possible to jump forward.

ii. The next jump lands at an index less than k. Call that index l. Then k + A[k] will be used in the next maximum calculation. However, since our algorithm chose l over k, it must be that l + A[l] is greater than k + A[k]. The same logic applies to any index between l and k. This means the algorithm will never again need to consider k after this second calculation: from l it is possible to jump to index l + A[l], which already exceeds k + A[k], regardless of the value of A[l + A[l]].

Therefore, each of the n indices is used in at most two maximum calculations. Overall, this means the algorithm has an O(n) running time.

2. A feedback edge set of an undirected graph G = (V, E) is a subset of edges E′ ⊆ E that intersects every cycle of the graph. Thus, removing the edges E′ will render the graph acyclic. Give an efficient algorithm for the following problem:
Input: Undirected graph G = (V, E) with positive edge weights w_e.
Output: A feedback edge set E′ ⊆ E of minimum total weight ∑_{e∈E′} w_e.

-negate the edge weights and run Kruskal's. Take the rejected edges and everything left over at the end.

(a) Algorithm Description
First of all, negate the weights of all of the edges in the graph. Then, run Kruskal's algorithm. The feedback edge set is every edge not in the resulting minimum spanning tree.

(b) Pseudocode
Set of vertices V
Set of edges E
return getFeedbackEdgeSet(V, E)

define getFeedbackEdgeSet(V, E)
1  for edge in E
2    edge.weight = −edge.weight
3  mst = Kruskals(V, E)
4  solution = E
5  for edge in mst.edges:
6    solution.remove(edge)
7  return solution
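A minimal Python sketch of this algorithm follows. The (weight, u, v) edge-list representation and the function name are my own choices; instead of literally negating the weights, the sketch sorts the edges in decreasing order of weight, which has the same effect on Kruskal's.

```python
def min_weight_feedback_edge_set(n, edges):
    """Return a minimum-weight feedback edge set of an undirected graph.

    A sketch of the algorithm above. Vertices are 0..n-1 and `edges` is a
    list of (weight, u, v) tuples; this representation is an assumption of
    the sketch. Sorting by decreasing weight is equivalent to negating the
    weights and sorting increasingly, so Kruskal's keeps a maximum spanning
    forest and rejects exactly the minimum-weight feedback edge set.
    """
    parent = list(range(n))

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    feedback = []
    for w, u, v in sorted(edges, key=lambda e: -e[0]):
        ru, rv = find(u), find(v)
        if ru == rv:
            feedback.append((w, u, v))  # would close a cycle: reject it
        else:
            parent[ru] = rv             # keep it in the spanning forest
    return feedback

# Triangle 0-1-2 plus a pendant edge to 3: the lightest triangle edge
# (weight 1) is the cheapest way to break the only cycle.
print(min_weight_feedback_edge_set(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (5, 2, 3)]))
# -> [(1, 0, 1)]
```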
(c) Proof of Correctness
By negating the edge weights and running Kruskal's algorithm, we ensure that the heaviest edges are considered first for the minimum spanning tree (or forest, if the graph is not assumed to be connected). Kruskal's rejects an edge if adding it would connect two vertices that are already in the same connected component; that is, it rejects exactly the edges that would create a cycle if added to the current partial spanning tree (or forest). So by negating the edge weights and running Kruskal's, we find the edges that form a maximum spanning tree (or forest) of the original graph.

Call G′ the graph with the feedback edge set removed. A minimum-weight feedback edge set must have the property that if any of its edges is added back into G′, the graph gains a cycle containing that edge. We know this because all edges have positive weights: if the set contained an edge that does not break any cycle, removing that edge from the set would leave a feedback edge set of strictly smaller weight. This means that every edge in the feedback edge set must have as its endpoints two vertices that are already connected in G′. Therefore G′ must be a spanning forest, since adding back any removed edge creates a cycle, while G′ itself has no cycles.

Since the graph G has a fixed set of edges with fixed weights, in order for the feedback edge set to have minimum total weight, G′ must have maximum total weight. By negating the edge weights and running Kruskal's, we find exactly such a maximum spanning forest, so its complement must be the minimum weight feedback edge set.

Do not take off points for the tree/forest distinction.

(d) Running Time analysis
Negating all edge weights takes O(E) time. Running Kruskal's takes O(E log V) time. Finally, finding all edges not in the MST takes O(E) time. This means the overall running time is O(E log V).

3. (DPV textbook, Problem 5.20.) Give a linear-time algorithm that takes as input a tree and determines whether it has a perfect matching: a set of edges that touches each node exactly once.

(a) Algorithm Description
We will use a greedy approach to solve this problem. The leaves of a tree by definition have exactly one incident edge: the edge to their parent. Therefore, in order for the tree to have a perfect matching, the matching must contain the edges that connect the leaves to the tree. So: choose a leaf and add the edge connecting to it to our solution. Then, remove the vertex on the other side of that edge from the graph, along with any other edges that connect to it. Choose another leaf and repeat until either all vertices are gone from the graph or some vertex is left with no remaining incident edges (in which case no perfect matching exists).

(b) Pseudocode
Set of vertices V
Set of edges E
return hasPerfectMatching(V, E)

define hasPerfectMatching(V, E)
1  while !V.isEmpty()
2    if E.isEmpty()
3      return false
4    leaf = V.getALeaf()
5    parent = leaf.parent
6    for edge in all edges connected to parent
7      E.remove(edge)
8    V.remove(leaf)
9    V.remove(parent)
10 return true
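Here is a minimal Python sketch of this greedy procedure. The representation (vertices 0..n−1 plus an edge list) is my own choice, and a queue of current leaves stands in for the repeated getALeaf step, which keeps the whole routine linear time.

```python
from collections import deque, defaultdict

def has_perfect_matching(n, edges):
    """Decide whether a tree has a perfect matching (greedy sketch).

    Tree given as n vertices 0..n-1 and a list of (u, v) edges; this
    representation is an assumption of the sketch. Repeatedly match a leaf
    to its only neighbour and delete both endpoints with all their edges.
    """
    if n % 2 == 1:
        return False                      # odd vertex count: no perfect matching
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    removed = [False] * n
    queue = deque(v for v in range(n) if len(adj[v]) == 1)
    matched = 0
    while queue:
        leaf = queue.popleft()
        if removed[leaf]:
            continue
        if not adj[leaf]:                 # leaf's only neighbour was matched away
            return False
        (parent,) = adj[leaf]             # a leaf has exactly one incident edge
        removed[leaf] = removed[parent] = True
        matched += 1
        for w in list(adj[parent]):       # delete the parent's other edges...
            adj[w].discard(parent)
            if not removed[w] and len(adj[w]) == 1:
                queue.append(w)           # ...which may create new leaves
        adj[leaf].clear()
        adj[parent].clear()
    return matched * 2 == n

print(has_perfect_matching(4, [(0, 1), (1, 2), (2, 3)]))  # path on 4 vertices: True
print(has_perfect_matching(4, [(0, 1), (0, 2), (0, 3)]))  # star on 4 vertices: False
```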
(c) Proof of Correctness
A leaf of a tree is incident to only one edge: the edge to its parent. Therefore, any perfect matching must contain the edges incident to the leaves of the tree, or else the matching would not touch the leaves. So in constructing a solution, we can add those edges to our potential solution without loss of generality, because any perfect matching must contain them; by choosing such an edge, we are not eliminating any possible solutions.

By definition, a perfect matching must touch each vertex exactly once. This means that we can remove the vertices touching the edges we have added to our solution, as well as all other edges incident to those vertices: since we have already selected an edge that touches those vertices, we cannot use any other edge that touches them in our solution. As a result, given the edges we have chosen so far, this removal also eliminates no possible solutions.

Since our algorithm never eliminates a possible solution, if a perfect matching exists our algorithm is guaranteed to find one.

(d) Running Time Analysis
For every vertex in the graph, we add at most one edge to our solution and check whether the vertex has any edges connected to it. We remove each edge from the graph at most once. Therefore, our algorithm requires a constant amount of work per vertex and a constant amount of work per edge, and runs in O(V + E) time.

4. (DPV textbook, Problem 5.32.) A server has n customers waiting to be served. The service time required by each customer is known in advance: it is t_i minutes for customer i. So if, for example, the customers are served in order of increasing i, then the i-th customer has to wait ∑_{j=1}^{i} t_j minutes. We wish to minimize the total waiting time

T = ∑_{i=1}^{n} (time spent waiting by customer i).

Give an efficient algorithm for computing the optimal order in which to process the customers.

-greedy algorithm. Serve the people who take the shortest amount of time to be served first.

(a) Algorithm Description
Use a greedy algorithm. Serve the customers who take the shortest amount of time to serve first. That is, sort the customers by their t values and serve them in that order.

(b) Pseudocode
1 mergesort(customers) //sort by increasing amount of time it takes to serve them
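A short Python sketch, with names of my own choosing; it returns both the serving order and the total waiting time T, counting each customer's own service time as part of their wait, as in the problem statement.

```python
def optimal_order(t):
    """Serve customers in increasing service time; a sketch of part (a).

    `t` is the list of service times. Returns the serving order (as
    customer indices) together with the total waiting time T.
    """
    order = sorted(range(len(t)), key=lambda i: t[i])
    total = elapsed = 0
    for i in order:
        elapsed += t[i]     # customer i finishes waiting at time `elapsed`
        total += elapsed
    return order, total

# Service times 3, 1, 2: serve customers 1, 2, 0 for T = 1 + 3 + 6 = 10.
print(optimal_order([3, 1, 2]))  # ([1, 2, 0], 10)
```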
(c) Proof of Correctness
Assume for contradiction that there exists some optimal solution that is strictly better than our greedy solution. To be strictly better than our greedy solution, it must be different from it. Since our greedy solution sorts the customers by the amount of time it takes to serve them, the optimal solution must have at least one pair of consecutive customers where the second customer takes less time to serve than the first. From now on, we will say that such a pair of customers is "out of order". Call these customers c_i and c_{i+1}, and call their service times t_i and t_{i+1}. By our assumption, we know t_i > t_{i+1}. We can try flipping their order around so that c_{i+1} is served before c_i. This will not change the waiting times of any other customers, as switching them changes neither the time the schedule starts serving the pair nor the time it finishes serving them. If we switch them, c_i's waiting time increases by t_{i+1}, but c_{i+1}'s waiting time decreases by t_i; since we assumed t_i > t_{i+1}, the overall waiting time goes down. Therefore, switching a pair of out-of-order customers can only decrease the total waiting time. By switching out-of-order customers until none are left, we eventually arrive at the sorted order found by our greedy algorithm, and every swap along the way decreases the total waiting time. This raises a contradiction, as we assumed the optimal solution was strictly better than the greedy solution. Therefore, it is impossible for a solution better than the greedy one to exist, and the greedy solution must be optimal.

(d) Running Time analysis
Mergesorting the customers will take O(n log n) time, where n is the total number of customers.

5. (CLRS textbook, Problem 22.1-4.) Suppose that a weighted, directed graph G = (V, E) has a negative-weight cycle. Give an efficient algorithm to list the vertices of one such cycle. Prove that your algorithm is correct.

-assume only one negative weight cycle
-run Bellman-Ford for 2V − 1 iterations and see which vertices change between iterations V − 1 and 2V − 1
-a negative cycle means that the distance values will not converge
-every iteration increases the maximum number of edges in the path by 1
-2V − 1 iterations is enough to make two full loops around the cycle
-still O(V E) because you only increase your work by a factor of 2

(a) Algorithm Description
To solve this problem, we will run Bellman-Ford on the graph, but instead of stopping at V − 1 iterations, we will stop at 2V − 1 iterations. We will compare the distance values at V − 1 iterations and at 2V − 1 iterations. For any vertex v whose distance value changed, there must exist some path from some vertex on the negative cycle to v. Therefore, if we start at v and follow our previous-vertex pointers backwards until we have visited a vertex twice, we will have found the vertices of a negative cycle.

(b) Pseudocode
Set of vertices V
Set of edges E
source vertex s
return findNegativeCycle(V, E, s)

define findNegativeCycle(V, E, s)
1  run V − 1 iterations of Bellman-Ford
2  let dist1(v) be the distance after V − 1 iterations of Bellman-Ford for every v ∈ V
3  run another V iterations of Bellman-Ford
4  let dist2(v) be the distance after 2V − 1 iterations of Bellman-Ford for every v ∈ V
5  for every vertex in V
6    if dist1(vertex) != dist2(vertex)
7      possibleCycleVertices = [vertex]
8      nextVertex = vertex.prev //the previous vertex as set by Bellman-Ford
9      while !possibleCycleVertices.contains(nextVertex)
10       possibleCycleVertices.append(nextVertex)
11       nextVertex = nextVertex.prev
12     return the vertices in possibleCycleVertices from the repeated vertex onward
13 return []
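Below is a minimal Python sketch of this procedure. The (u, v, w) edge-list representation and the function name are mine, and the sketch assumes, as the notes above do, that the negative cycle is reachable from s. One detail made explicit in code: the vertex whose distance changed need not itself lie on the cycle, so we first walk n previous-vertex pointers back to be sure we are standing on the cycle before collecting it.

```python
def find_negative_cycle(n, edges, s):
    """List the vertices of a negative cycle; a sketch of the pseudocode above.

    `edges` is a list of (u, v, w) for a directed edge u -> v of weight w,
    with vertices 0..n-1; this representation is an assumption of the sketch,
    as is reachability of the negative cycle from s.
    """
    INF = float("inf")
    dist = [INF] * n
    prev = [None] * n
    dist[s] = 0

    def relax(rounds):
        for _ in range(rounds):
            for u, v, w in edges:
                if dist[u] != INF and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    prev[v] = u

    relax(n - 1)
    dist1 = dist[:]
    relax(n)                              # n more rounds: 2n - 1 in total
    changed = next((v for v in range(n) if dist[v] != dist1[v]), None)
    if changed is None:
        return []                         # distances converged: no cycle found

    v = changed
    for _ in range(n):                    # n hops back lands inside the cycle
        v = prev[v]
    cycle, u = [v], prev[v]
    while u != v:                         # collect until the start reappears
        cycle.append(u)
        u = prev[u]
    return cycle[::-1]                    # vertices in the cycle's edge order

# Cycle 1 -> 2 -> 3 -> 1 has total weight 1 + 1 - 3 = -1.
print(find_negative_cycle(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 1, -3)], 0))
# -> [1, 2, 3]
```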
(c) Proof of Correctness
In the absence of negative cycles, the distance values obtained by running Bellman-Ford for V − 1 iterations on a graph will have converged to their true values, meaning that if more iterations are run, the distances will remain the same. In the presence of a negative cycle, however, it is still possible to achieve a shorter path by going around the negative cycle. Recall that after the k-th iteration of Bellman-Ford, we have found the shortest path using at most k edges. Furthermore, a cycle in a graph with V vertices can have at most V edges, in the case where it runs through every single vertex in the graph before returning to a previous vertex. Therefore, by running an additional V iterations of Bellman-Ford, we add V edges to the maximum length of the shortest paths we find. This ensures that, for any vertex v on the negative cycle, the path found within 2V − 1 iterations improves on the path found within V − 1 iterations by going around the negative cycle at least once, which in turn ensures that by following the previous-vertex pointers starting at v we are able to find every vertex on the negative cycle.

It is possible for vertices not on the negative cycle to change between iterations V − 1 and 2V − 1, but the only way for that to happen is if one of their ancestors is on the negative cycle. Therefore, even if we start at a vertex not on the negative cycle, following the previous-vertex pointers will eventually lead us onto the negative cycle.

(d) Running Time analysis
Recall that every iteration of Bellman-Ford takes O(E) time. We run 2V − 1 iterations, so just running Bellman-Ford takes O((2V − 1)E) = O(V E) time. To recover the solution, we must follow the previous-vertex pointers backwards through the graph for at most V vertices. This adds an additional O(V) to our running time, but that term is dominated by O(V E), so our overall running time is O(V E).

6. Suppose you are only interested in finding shortest paths in directed acyclic graphs. Modify the Bellman-Ford algorithm so that it finds the shortest paths from the starting vertex s ∈ V to all other vertices in one iteration.

-sort vertices in topological order
-relax vertices in topological order, using values from the current iteration
-O(V + E) for the sort, O(E) for the one iteration of Bellman-Ford, for O(V + E) overall

(a) Algorithm Description
The first thing we have to do is sort the vertices in topological order. Next, iterating through every vertex in topological order, we update the distance of each vertex based on the distances of its parents, the same way updates are done in Bellman-Ford. At the end of this single iteration, each vertex will have its true minimum distance from the source vertex.

(b) Pseudocode
Set of vertices V
Set of edges E
source vertex s
Sort V in topological order
modifiedBellmanFord(sorted version of V, E, s)

define modifiedBellmanFord(V, E, s)
1 for vertex in V
2   dist(vertex) = ∞
3 dist(s) = 0
4 for vertex in V − {s}, in topological order
5   dist(vertex) = min over parent ∈ parents(vertex) of (dist(parent) + length(parent, vertex))
6   prev(vertex) = whichever parent led to the minimum distance

(c) Proof of Correctness
Because we go through the vertices in topological order, we set the distances of a vertex's parents before we set the distance of the vertex itself. Furthermore, since we are given that the graph is a DAG, we can be sure that no negative cycles exist, so when we update the distance of a vertex we are considering every possible path from the source vertex to that vertex: any such path must come through one of the parents of that vertex, and updating the vertices in topological order guarantees that all of a vertex's parents are updated before the vertex itself. When we update a vertex, we can be sure that there are no shorter paths to any of its parents that have not been considered yet. This logic is similar to the logic that allows us to run Dijkstra's algorithm with one pass through every vertex instead of with multiple iterations like Bellman-Ford. As a result, when we set the distance of each vertex to the minimum, over its parents, of the distance to the parent plus the length of the edge from the parent to the vertex, we can be assured that the distances of the parents will not change afterwards, so the minimum distance we set for the current vertex will not change either.

(d) Running Time analysis
Sorting the vertices in topological order takes O(V + E) time. Running one iteration of Bellman-Ford takes O(E) time. Thus, the overall running time is O(V + E).
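To close, a runnable Python sketch of this one-pass scheme (the representation and names are mine). It uses Kahn's algorithm for the topological sort and relaxes each vertex's outgoing edges in topological order, a "push" formulation equivalent to the parent-based update in the pseudocode above.

```python
from collections import defaultdict, deque

def dag_shortest_paths(n, edges, s):
    """Single-pass shortest paths on a DAG; a sketch of the algorithm above.

    Vertices are 0..n-1 and `edges` is a list of (u, v, w) for an edge
    u -> v of weight w; this representation is an assumption of the sketch.
    Returns (dist, prev).
    """
    adj = defaultdict(list)
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Kahn's algorithm: repeatedly peel off vertices with no remaining parents.
    order, queue = [], deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    INF = float("inf")
    dist = [INF] * n
    prev = [None] * n
    dist[s] = 0
    for u in order:                  # parents always precede children here
        if dist[u] == INF:
            continue                 # u is unreachable from s
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u
    return dist, prev

# Small DAG with s = 0: the shortest path to vertex 3 is 0 -> 2 -> 3, length 3.
dist, _ = dag_shortest_paths(4, [(0, 1, 1), (0, 2, 2), (1, 3, 5), (2, 3, 1)], 0)
print(dist)  # [0, 1, 2, 3]
```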