

Maximum sum increasing subsequence

Maximum sum increasing subsequence: Given a sequence of positive numbers S[1..n], find an increasing subsequence maximizing the sum of the elements in the subsequence.
In other words, the goal is to find some k indices, i_1, \cdots, i_k, 1 \leq i_1 < i_2 < \cdots < i_k \leq n, such that
(1) S[i_j] < S[i_{j+1}] for 1 \leq j \leq k-1,
(2) \sum_{j = 1}^k S[i_j] is maximum over all increasing subsequences of S[1..n].

One can use dynamic programming to achieve time complexity O(n^2): let {\rm MIS}(i), for 1 \leq i \leq n, denote the maximum sum of an increasing subsequence of S[1..i] that ends at S[i]. Then we have the recursion
{\rm MIS}(i) = S[i] + \max_{1 \leq j \leq i-1, S[j] < S[i]}\{{\rm MIS}(j)\} for 2 \leq i \leq n, with {\rm MIS}(1) = S[1] (when no index j qualifies, the maximum is taken to be 0).

The time complexity is \sum_{i = 1}^n O(i) = O(n^2), since for each {\rm MIS}(i) we spend O(i) time to compute it.
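
To make the recursion concrete, here is a minimal C++ sketch of the O(n^2) algorithm (the function name MaxSumIS is ours, not from the original problem):

#include <vector>
#include <algorithm>
using namespace std;

// O(n^2) maximum sum increasing subsequence: MIS[i] holds the largest sum of
// an increasing subsequence of S[0..i] that ends exactly at S[i].
long long MaxSumIS(const vector<int> &S)
{
    size_t n = S.size();
    vector<long long> MIS(n);
    long long best = 0;
    for (size_t i = 0; i < n; ++i) {
        MIS[i] = S[i];                     // the subsequence consisting of S[i] alone
        for (size_t j = 0; j < i; ++j)     // try extending a subsequence ending at a smaller S[j]
            if (S[j] < S[i])
                MIS[i] = max(MIS[i], S[i] + MIS[j]);
        best = max(best, MIS[i]);
    }
    return best;
}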

Is there any faster algorithm for this problem? The answer is yes. In the following, we describe an algorithm with time complexity O(n \log n). The algorithm is from here. The key to the speedup is computing \max_{1 \leq j \leq i-1, S[j] < S[i]}\{{\rm MIS}(j)\} in O(\log n) time.

We first describe a slow algorithm, then show how to use a balanced binary tree to improve the time complexity. Let A[1..n] be an array whose values are all 0 initially. Define {\rm rank}(i) to be the rank of S[i] in the sorted order of S[1..n]. At the end of the algorithm, the entry A[j] will contain {\rm MIS}(k), where k = {\rm rank}^{-1}(j), that is, the index k such that S[k] has rank j in the sorted order.
We now iterate through the elements of S[1..n] and fill up A[1..n]. In the i^{\rm th} iteration, for 1 \leq i \leq n, we set A[{\rm rank}(i)] \leftarrow S[i] + \max(A[1..{\rm rank}(i)-1]) (note that we might be taking the maximum over an empty range, in which case it is 0). The value \max(A[1..n]) is the final answer.

Before giving the formal proof, let’s see an example first. Suppose we have the sequence S[1..6] = {5,7,4,2,3,8}. After sorting, we get the rank of each element as {\rm Rank}[1..6] = {4,5,3,1,2,6}, where {\rm Rank}[i] is the rank of S[i]. One can easily verify this: S[1] = 5 is the 4^{\rm th} element of the sorted sequence 2,3,4,5,7,8, so its rank is 4; S[2] = 7 is the 5^{\rm th} element, so its rank is 5; and so on.

Initially, the array A[1..6] = {0,0,0,0,0,0}.
i = 1
{\rm rank}(i) = 4, and \max(A[1..3]) = 0, so we set A[4] \leftarrow S[1] + 0 = 5. So A becomes 0,0,0,5,0,0;

i = 2
{\rm rank}(i) = 5, we compute \max(A[1..4]) to be 5, so we set A[5] to be S[2] + 5, which is 12. So A becomes 0,0,0,5,12,0;

i = 3
{\rm rank}(i) = 3, we compute \max(A[1..2]) to be 0, so we set A[3] to be S[3]+0 = 4. So A becomes 0,0,4,5,12,0;

i = 4
{\rm rank}(i) = 1, we compute \max(A[1..0]) to be 0 (the maximum of an empty set is 0), so we set A[1] to be S[4]+0 = 2. So A becomes 2,0,4,5,12,0;

i = 5
{\rm rank}(i) = 2, we compute \max(A[1..1]) to be 2, so we set A[2] to be S[5]+2 = 5. So A becomes 2,5,4,5,12,0.

i = 6
{\rm rank}(i) = 6, we compute \max(A[1..5]) to be 12, so we set A[6] to be S[6]+12 = 20. So A becomes 2,5,4,5,12,20.

Hence, the final answer is \max(A[1..6]) = 20.

The correctness of the algorithm is now clear: when we compute {\rm MIS}(i), each entry A[j] with 1 \leq j \leq {\rm rank}(i)-1 corresponds to an element smaller than S[i]. If that element is some S[k] with k < i, that is, S[k] appears before S[i] in the input sequence, then A[j] = {\rm MIS}(k); otherwise we have not encountered S[k] yet, so the entry is still 0 and does not affect the value \max(A[1..{\rm rank}(i)-1]). Therefore, we are effectively computing S[i] + \max_{1 \leq k \leq i-1, S[k] < S[i]}\{{\rm MIS}(k)\}.

The time complexity of this algorithm is clearly O(n^2). But it is now easy to see how to improve it using a balanced binary tree (in effect, a segment tree over A): in the computation \max(A[1..i]), we are repeatedly querying the maximum over a prefix of the array A. We can build a balanced binary tree over A whose leaves correspond to the entries A[i], for 1 \leq i \leq n, and where each internal node stores the maximum value of all the leaves in the subtree rooted at it. For our example, the root of the tree contains the maximum value of A[1..6], its left child is in charge of the sub-range A[1..3], its right child is in charge of the sub-range A[4..6], and so on. Each leaf corresponds to a degenerate range A[i..i], for 1 \leq i \leq 6. Both query and update take O(\log n) time (the tree height), hence the time complexity of the algorithm is O(n \log n).

Naturally, to implement this, we can use a Fenwick tree.
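
For concreteness, here is one possible C++ implementation of the O(n \log n) algorithm (the names FenwickMax and MaxSumIS are ours). A Fenwick tree supports prefix-maximum queries correctly here because the entries of A only ever grow and every query is over a prefix of the array:

#include <vector>
#include <algorithm>
using namespace std;

// Fenwick tree specialized for prefix maxima over A[1..n].
struct FenwickMax {
    vector<long long> t;
    FenwickMax(int n): t(n + 1, 0) {}
    void update(int i, long long v)        // A[i] <- max(A[i], v), 1-indexed
    {
        for (; i < (int)t.size(); i += i & -i)
            t[i] = max(t[i], v);
    }
    long long query(int i)                 // max(A[1..i]); 0 when i == 0 (empty range)
    {
        long long m = 0;
        for (; i > 0; i -= i & -i)
            m = max(m, t[i]);
        return m;
    }
};

long long MaxSumIS(const vector<int> &S)
{
    int n = S.size();
    vector<int> sorted(S);
    sort(sorted.begin(), sorted.end());    // used to compute ranks
    FenwickMax A(n);
    long long best = 0;
    for (int i = 0; i < n; ++i) {
        // rank of S[i] in the sorted order, 1-indexed
        int r = lower_bound(sorted.begin(), sorted.end(), S[i]) - sorted.begin() + 1;
        long long v = S[i] + A.query(r - 1);   // MIS value ending at S[i]
        A.update(r, v);
        best = max(best, v);
    }
    return best;
}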

Dynamic Programming in Practice: Optimal Data Segmentation

I have been reviewing dynamic programming recently, and today I saw a nice application of dynamic programming in astrophysics: Dynamic Programming in Python: Bayesian Blocks.

The following is the problem in astrophysics phrased in a purely layman’s way: given a set of data points in the plane, each point a pair (time point, observation value), we would like to extract as much information as possible from these observations, which, roughly speaking, involves computing a histogram of the observations. The difficulty is determining the bin size: if the bins are too large, we lose important information; at the same time, we would like to suppress the noise in the observations, which too small a bin size fails to do. One approach used in the past is to adjust the bin size manually in a trial-and-error fashion: if the histogram is too jagged, increase the bin size; if it is too smooth, decrease it.

The novel idea in this paper is to first decide on a fitness function/cost function over partitions of the data using a Bayesian likelihood framework, and then to compute an optimum partition of the data points according to this function.

Here is the same problem phrased in the language of computer science:

Let P be a set of n points in the plane. A block B of points in P is a set of points consecutive in x-coordinate. A partition of P into blocks is specified by a sequence of blocks B_1, B_2, \cdots, B_k, where the blocks are pairwise disjoint and \cup_{i=1}^k B_i = P. Let {\rm fit}(B) be the fitness of the block B. The goal is to find an optimum partition B_1, \cdots, B_k maximizing \sum_{i=1}^k {\rm fit}(B_i) (the value k is not given; it is also determined by the algorithm).

First renumber the points in P by their x-coordinates. The recursion is straightforward: let {\rm BestFit}(i) denote the value of an optimum partition of the points in P[1..i] (the leftmost i points of P), for 1 \leq i \leq n, and let {\rm BestFit}(0) = 0. Then {\rm BestFit}(i+1) = \max_{1 \leq j \leq i+1}\{{\rm BestFit}(j-1) + {\rm fit}(P[j..i+1])\}. There are n entries in total, and we spend O(i) time on the i^{\rm th} entry, so the overall time complexity is O(n^2). Note that we assume the value of {\rm fit} for a block can be obtained in O(1) time, which is usually true (possibly after preprocessing).
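
Here is a sketch of this recursion in C++, assuming the fitness function is supplied as a black box fit(j, i) that evaluates the block P[j..i] in O(1) time (the name BestPartition is ours):

#include <vector>
#include <functional>
#include <algorithm>
using namespace std;

// O(n^2) segmentation DP. fit(j, i) is assumed to return the fitness of the
// block P[j..i] (1-indexed, inclusive) in O(1) time.
double BestPartition(int n, const function<double(int, int)> &fit)
{
    vector<double> BestFit(n + 1);
    BestFit[0] = 0.0;                      // the empty prefix
    for (int i = 1; i <= n; ++i) {
        BestFit[i] = -1e300;
        for (int j = 1; j <= i; ++j)       // the last block is P[j..i]
            BestFit[i] = max(BestFit[i], BestFit[j - 1] + fit(j, i));
    }
    return BestFit[n];
}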

An almost identical problem is segmented least squares. Given a set P of n points in the plane, we would like to partition P into k blocks and fit each block with a line. The overall cost of the partition is the sum of the costs of the blocks.

This problem is a generalization of linear regression: in linear regression, the cost of fitting a point set P of size n to a line l_{a,b}: y = ax + b is
{\rm fit}(P, l_{a,b}) = \sum_{p \in P}(p_y - a p_x - b)^2, where p = (p_x, p_y). The “best” coefficients a and b, that is, those minimizing {\rm fit}(P, l_{a,b}), have closed forms:
a = \frac{n \sum_{p \in P}p_x p_y - (\sum_{p \in P}p_x)(\sum_{p \in P} p_y)}{n \sum_{p \in P}p_x^2 - (\sum_{p \in P}p_x)^2},

b = \frac{\sum_{p\in P} p_y - a\sum_{p \in P}p_x}{n}.

However, sometimes the data points lie on more than one line. Hence we would like to discover this fact (for example, we might be interested in (1) the number of lines, and (2) the points where the line changes direction; such points are called change points in the Bayesian block partition problem above) and find the optimum cost. A human being can sometimes determine this easily by eye, but naturally there are situations where the number of lines needed to “best fit” the point set is quite unclear.

We now formulate this problem as an optimization problem. Given a partition \mathcal{P} = \{B_1, \cdots, B_k\} of the point set into segments/blocks, where each segment consists of a subset of P contiguous in x-coordinate, let l_i^{\ast} be the optimum line fitting B_i. The cost of the partition is
{\rm cost}(\mathcal{P}) = Ck + \sum_{i=1}^k {\rm fit}(B_i, l_i^{\ast}),
where C is a positive real number encoding the penalty for partitioning P into k parts (without this term, we could fit each single point with its own line, yielding a trivial and meaningless optimum of cost 0). The goal is to find an optimum partition \mathcal{P} minimizing {\rm cost}(\mathcal{P}).

The dynamic programming approach is the following. First, we renumber the points in P according to their x-coordinates (in ascending order). Define {\rm BestFit}(0) = 0, and define {\rm BestFit}(i) to be the cost of an optimum partition of the first i points of P. The recursion is almost identical to the Bayesian block partition problem:
{\rm BestFit}(i+1) = \min_{1 \leq j \leq i+1}\{{\rm BestFit}(j-1) + C + {\rm fit}(P[j..i+1])\}, for 0 \leq i \leq n-1, where {\rm fit}(P[j..i+1]) is the optimal cost of fitting the points in P[j..i+1] with a single line (we are slightly abusing the notation here).

The time complexity is O(n^2), following the same reasoning. One thing to note is that we need to do some preprocessing on P, so that we can compute {\rm fit}(P[i..j]) in O(1) time.
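
The following C++ sketch shows one way this might be done (the name SegmentedLS is ours): prefix sums of \sum p_x, \sum p_y, \sum p_x p_y, \sum p_x^2 and \sum p_y^2 are enough to evaluate both the closed-form coefficients and the squared error of any block in O(1) time.

#include <vector>
#include <algorithm>
using namespace std;

// Segmented least squares: O(n) preprocessing, O(n^2) DP. C is the per-block penalty.
double SegmentedLS(vector<pair<double, double>> P, double C)
{
    sort(P.begin(), P.end());              // order the points by x-coordinate
    int n = P.size();
    vector<double> sx(n+1), sy(n+1), sxy(n+1), sxx(n+1), syy(n+1);
    for (int i = 1; i <= n; ++i) {         // prefix sums over the first i points
        double x = P[i-1].first, y = P[i-1].second;
        sx[i] = sx[i-1] + x;       sy[i] = sy[i-1] + y;
        sxy[i] = sxy[i-1] + x*y;   sxx[i] = sxx[i-1] + x*x;   syy[i] = syy[i-1] + y*y;
    }
    // Least-squares error of fitting P[j..i] (1-indexed) with a single line.
    auto fit = [&](int j, int i) {
        double m = i - j + 1;
        double X = sx[i]-sx[j-1], Y = sy[i]-sy[j-1];
        double XY = sxy[i]-sxy[j-1], XX = sxx[i]-sxx[j-1], YY = syy[i]-syy[j-1];
        double denom = m*XX - X*X;
        if (denom == 0)                    // single point (or all x equal): zero error
            return 0.0;
        double a = (m*XY - X*Y) / denom, b = (Y - a*X) / m;
        // sum of squared residuals, expanded so only the prefix sums are needed
        return YY + a*a*XX + m*b*b - 2*a*XY - 2*b*Y + 2*a*b*X;
    };
    vector<double> Best(n + 1);
    for (int i = 1; i <= n; ++i) {
        Best[i] = 1e300;
        for (int j = 1; j <= i; ++j)       // the last segment is P[j..i]
            Best[i] = min(Best[i], Best[j-1] + C + fit(j, i));
    }
    return Best[n];
}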

Dynamic Programming Practice (4)

Shuffle of strings A shuffle of two strings X and Y is formed by interspersing their characters into a new string, keeping the characters of X and Y in the same order.

Example: the string “prodgyrnamammiincg” is a shuffle of “dynamic” and “programming”.

Given three strings S_1[1..m], S_2[1..n] and S[1..n+m], describe and analyze an algorithm to determine whether S is a shuffle of S_1 and S_2.

This problem is from these lecture notes.

This is a dynamic programming problem. The recursion is straightforward: define {\rm Shuffle}(i, j) to be true if S[1..i+j] is a shuffle of S_1[1..i] and S_2[1..j], and false otherwise, for 0 \leq i \leq m and 0 \leq j \leq n.

(1) {\rm Shuffle}(0,0) = {\rm true}. (An empty string is a shuffle of two empty strings).
(2) If j = 0, then {\rm Shuffle}(i,0) is true if and only if the string S_1[1..i] is a prefix of S[1..n+m], for 1 \leq i \leq m.
(3) If i = 0, then {\rm Shuffle}(0,j) is true if and only if the string S_2[1..j] is a prefix of the string S, for 1 \leq j \leq n.
(4) Now assume that both S_1[1..i] and S_2[1..j] are non-empty. Then:

  • If S_1[i] \neq S_2[j], then {\rm Shuffle}(i,j) equals {\rm Shuffle}(i-1, j) if S[i+j] = S_1[i], and {\rm Shuffle}(i, j-1) if S[i+j] = S_2[j]; otherwise it is false.
  • If S_1[i] = S_2[j], then {\rm Shuffle}(i,j) is false unless S[i+j] = S_1[i], in which case {\rm Shuffle}(i,j) = {\rm Shuffle}(i-1, j) || {\rm Shuffle}(i, j-1).

{\rm Shuffle}(m,n) is the final answer.

The time complexity of the algorithm is O(mn), and the space complexity is O(\min\{m,n\}). Here is the result of executing the algorithm on the example {\rm Shuffle}("{\rm prodgyrnamammiincg}", "{\rm dynamic}", "{\rm programming}").
[Figure: the DP table for the shuffle example above]
Note that we do not need to explicitly maintain the whole table in the implementation: we fill up the table column by column from left to right, and within each column, the entries are filled from top to bottom. We only need to maintain one column of the table at a time (the “sliding window” method).
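
Here is a possible C++ implementation of this space-saving version (the name IsShuffle is ours); to get the O(\min\{m,n\}) bound, pass the shorter of the two strings as S2:

#include <string>
#include <vector>
using namespace std;

// Is S a shuffle of S1 and S2? O(mn) time, O(n) extra space: we keep only the
// current column col[j] = Shuffle(i, j) of the DP table.
bool IsShuffle(const string &S1, const string &S2, const string &S)
{
    size_t m = S1.size(), n = S2.size();
    if (S.size() != m + n)
        return false;
    vector<char> col(n + 1);
    col[0] = true;                                    // Shuffle(0,0)
    for (size_t j = 1; j <= n; ++j)                   // column i = 0: prefixes of S2
        col[j] = col[j-1] && S2[j-1] == S[j-1];
    for (size_t i = 1; i <= m; ++i) {
        col[0] = col[0] && S1[i-1] == S[i-1];         // row j = 0: prefixes of S1
        for (size_t j = 1; j <= n; ++j)
            col[j] = (col[j]   && S1[i-1] == S[i+j-1])    // next character taken from S1
                  || (col[j-1] && S2[j-1] == S[i+j-1]);   // next character taken from S2
    }
    return col[n];
}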

One thing to note in this table is that most of the entries seem to be false; is there any linear-time (that is, O(n+m)) algorithm for this problem?

A probability puzzle: loaded dice

I came across this question on Interview Street:

Random number generator
There is an ideal random number generator which, given a positive integer M, can generate any real number between 0 and M with equal probability.
Suppose we generate two numbers x and y via the generator by giving it two positive integers A and B. What is the probability that x + y is less than C, where C is a positive integer?

Input Format
The first line of the input is an integer N, the number of test cases.
N lines follow. Each line contains 3 positive integers A, B and C.
All the integers are no larger than 10000.

Output Format
For each test case, output a fraction that indicates the probability. The numerator and denominator of each fraction should have greatest common divisor 1.

Input
3
1 1 1
1 1 2
1 1 3

Output
1/2
1/1
1/1

One way to consider this problem is via generating functions. Suppose the generated numbers are integer-valued and the distribution is not uniform: is there any efficient way to compute the probability that the sum of the two randomly generated numbers is no greater than a given bound?

Suppose that, for the first random number generator, the probability that the generated number X equals i is p_i, for 0 \leq i \leq A; similarly, for the second generator, the probability that the generated number Y equals i is q_i, for 0 \leq i \leq B. Define two polynomials P(x) = \sum_{j=0}^A p_j x^j and Q(x) = \sum_{j = 0}^B q_j x^j, and consider the product P(x)Q(x):
P(x)Q(x) = \sum_{k = 0}^{A+B}\left(\sum_{i+j = k,\ 0 \leq i \leq A,\ 0 \leq j \leq B} p_i q_j\right) x^k.
The coefficient of x^k on the right hand side is precisely the probability that X+Y = k, so the probability that X+Y \leq k is the sum of the first k+1 coefficients (those of x^0 through x^k). In the Interview Street problem, with the uniform distribution over the integers 0, \cdots, A and 0, \cdots, B, each p_i is 1/(A+1) and each q_i is 1/(B+1), so we can easily obtain the probability that X+Y < C by summing the coefficients of x^0 through x^{C-1}.
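
For the integer-valued variant, a direct O(AB) convolution already suffices; the following C++ sketch (the name ProbSumLess is ours) computes the coefficients of P(x)Q(x) and sums the first C of them:

#include <vector>
using namespace std;

// p[i] = Pr(X = i), q[j] = Pr(Y = j). Convolve the two distributions to get
// the distribution of X + Y, then sum the coefficients of x^0, ..., x^{C-1}.
double ProbSumLess(const vector<double> &p, const vector<double> &q, int C)
{
    vector<double> conv(p.size() + q.size() - 1, 0.0);
    for (size_t i = 0; i < p.size(); ++i)
        for (size_t j = 0; j < q.size(); ++j)
            conv[i + j] += p[i] * q[j];    // coefficient of x^{i+j} in P(x)Q(x)
    double prob = 0.0;
    for (int k = 0; k < C && k < (int)conv.size(); ++k)
        prob += conv[k];                   // Pr(X + Y = k)
    return prob;
}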

The probability puzzle (using generating functions) is the following:
Loaded dice: No matter what two loaded dice we have, it cannot happen that each of the sums 2,3, …, 12 comes up with the same probability.
(I first came across this puzzle in the book “The art of mathematics”.)

We can reason as follows: let p_i denote the probability that the first die shows i, for 1 \leq i \leq 6, and similarly let q_i denote the probability that the second die shows i, for 1 \leq i \leq 6. Suppose, for contradiction, that every sum occurs with probability 1/11. The probability that the sum is 2 is p_1 q_1 = 1/11 (both dice must show 1), and the probability that the sum is 12 is p_6 q_6 = 1/11 (both dice must show 6). Now consider the probability that the sum is 7: it is at least p_1 q_6 + p_6 q_1 (the first die shows 1 and the second shows 6, or the reverse). But substituting q_6 = 1/(11 p_6) and q_1 = 1/(11 p_1) gives p_1 q_6 + p_6 q_1 = (1/11) \times (p_1/p_6 + p_6/p_1) \ge 2/11 by the AM-GM inequality, a contradiction.

This reasoning works, but the book provides a much nicer proof using generating functions. As before, let P(x) = \sum_{i = 1}^6 p_i x^{i-1} and Q(x) = \sum_{i=1}^6 q_i x^{i-1}. Now if the sums 2, 3, …, 12 all come up with the same probability 1/11, then we have
11 P(x)Q(x) = 1 + x + x^2 + \cdots + x^{10} = (x^{11}-1)/(x-1).
Since p_6 q_6 = 1/11 > 0, both P(x) and Q(x) are polynomials of degree exactly 5, and every real polynomial of odd degree has a real root, so 11 P(x)Q(x) has real roots. However, the right hand side is a cyclotomic polynomial with no real root (it is positive for every real x). Thus the identity cannot hold.

Here is another application of generating functions for computing probabilities.

Counting heads Given integers n and k, along with p_1, ..., p_n \in [0,1], you want to determine the probability of obtaining exactly k heads when n biased coins are tossed independently at random, where p_i is the probability that the i^{\rm th} coin comes up heads. Assume that you can multiply and add two numbers in [0,1] in O(1) time.

(This problem is from this book.)

This problem can be solved using dynamic programming. Let X_i be the indicator random variable for the event that the i^{\rm th} coin comes up heads, for 1 \leq i \leq n, and let q_i = 1 - p_i. By the independence of the tosses,
{\rm Pr}(\sum_{i=1}^n X_i = k) = {\rm Pr}(\sum_{i = 1}^{n-1} X_i = k-1 \wedge X_n = 1) + {\rm Pr}(\sum_{i=1}^{n-1} X_i = k \wedge X_n = 0) = p_n {\rm Pr}(\sum_{i=1}^{n-1} X_i = k-1) + q_n {\rm Pr}(\sum_{i=1}^{n-1} X_i = k).
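
This recursion translates directly into an O(nk) dynamic program (which is O(n^2) in the worst case, since k \leq n). A C++ sketch (the name CountingHeads is ours):

#include <vector>
using namespace std;

// Pr(exactly k heads among n independent biased coins); p[i] is the head
// probability of coin i. table[j] = Pr(coins processed so far show j heads).
double CountingHeads(const vector<double> &p, int k)
{
    vector<double> table(k + 1, 0.0);
    table[0] = 1.0;                        // zero coins show zero heads
    for (double pi : p)
        for (int j = k; j >= 0; --j)       // downward, so table[j-1] is still the old value
            table[j] = (j > 0 ? pi * table[j-1] : 0.0) + (1.0 - pi) * table[j];
    return table[k];
}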

The time complexity is O(n^2). However, with the generating-function approach we can get a faster algorithm using the fast Fourier transform:
Let P_i(x) = q_i + p_i x, for 1 \leq i \leq n, and consider the product of these n polynomials of degree 1: \prod_{i = 1}^n P_i(x) = \sum_{j = 0}^n c_j x^j. The probability of obtaining exactly k heads is the coefficient c_k. Therefore, the problem is reduced to how fast we can multiply polynomials. Using the fast Fourier transform, we can multiply two polynomials of degree at most n in time O(n \log n). Hence, we can use divide and conquer to multiply these n polynomials:
\prod_{i=1}^n P_i(x) = \left(\prod_{i=1}^{n/2} P_i(x)\right) \times \left(\prod_{i=n/2+1}^n P_i(x)\right).
Let T(n) denote the time to multiply n polynomials of degree 1. Then we have the recursion
T(n) = 2 T(n/2) + O(n \log n),
since in the “conquer” step we are multiplying two polynomials of degree n/2. Hence T(n) = O(n (\log n)^2).
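
The divide-and-conquer structure is sketched below in C++; for brevity, Multiply is the naive O(d^2) product, and to obtain the stated O(n (\log n)^2) bound one would replace it with an FFT-based multiplication:

#include <vector>
using namespace std;

typedef vector<double> Poly;

// Naive polynomial product; an FFT-based multiply goes here for the fast bound.
Poly Multiply(const Poly &a, const Poly &b)
{
    Poly c(a.size() + b.size() - 1, 0.0);
    for (size_t i = 0; i < a.size(); ++i)
        for (size_t j = 0; j < b.size(); ++j)
            c[i + j] += a[i] * b[j];
    return c;
}

// Product of the linear factors (q_i + p_i x) over coins [lo, hi); assumes lo < hi.
// The coefficient of x^k in the result is the probability of exactly k heads.
Poly Product(const vector<double> &p, size_t lo, size_t hi)
{
    if (hi - lo == 1)
        return {1.0 - p[lo], p[lo]};       // q_i + p_i x
    size_t mid = lo + (hi - lo) / 2;
    return Multiply(Product(p, lo, mid), Product(p, mid, hi));
}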

Coin Exchange: Dynamic Programming Practice (3)

Coin Exchange: Given a set of m denominations \{d_0, \cdots, d_{m-1}\} and a target value k, find the smallest number of bills whose values sum to k, where each denomination may be used any number of times. For example, if the denominations are \{1,4,7,13,28,52,91,365\} and the target is 122, then the optimum way to pick the denominations is 3 \times 1 + 28 + 91, so the number of bills we need is 5.

This is a very common dynamic programming problem. To simplify things a bit, we assume that the smallest denomination is 1, which ensures that every target value can be reached (by a sequence of 1’s). W.l.o.g., assume that d_0 < d_1 < \cdots < d_{m-1}. The recursion is the following: define {\rm MinBill}(x, i) to be the smallest number of bills needed to reach x under the constraint that the largest denomination we may use is d_i, for 0 \leq i \leq m-1. The base cases are {\rm MinBill}(x, 0) = x for x \in \mathbb{N} (only 1’s are available), and {\rm MinBill}(x, i) = 0 for x \leq 0. For x \geq 1 and 1 \leq i \leq m-1, if x < d_i, then {\rm MinBill}(x, i) = {\rm MinBill}(x, i'), where i' is the largest index satisfying d_{i'} \leq x; otherwise {\rm MinBill}(x, i) = \min\{1 + {\rm MinBill}(x - d_i, i), {\rm MinBill}(x, i-1)\}.

Time complexity/space complexity: there are several ways to implement the dynamic programming algorithm. If we use the table-filling approach, the time complexity is O(m N), where m is the number of denominations and N is the target value, so the algorithm is pseudo-polynomial. The space complexity is O(N). In the code below, I used the “recursion + memoization” approach (the reason is given here).

#include <vector>
#include <algorithm>
#include <unordered_map>
#include <iostream>
using namespace std;

// A simple hash for the (target, largest) pairs used as cache keys.
size_t hasher(const pair<int, int> &p)
{
    return hash<int>()(p.first) + hash<int>()(p.second);
}

class CoinExchange {
public:
    CoinExchange(const vector<int> &Denominations): Denom {Denominations}, cache(Denom.size(), hasher) {sort(Denom.begin(), Denom.end());}
    int operator()(int k);
    int MinCoin(int target, int largest);
private:
    vector<int> Denom;
    unordered_map<pair<int, int>, int, decltype(hasher) *> cache;
};

int CoinExchange::MinCoin(int target, int largest)
{
    if (target <= 0)        // nothing left to pay
        return 0;

    if (!largest)           // only the denomination 1 remains
        return target;

    if (target < Denom[largest]) {
        // Skip down to the largest denomination not exceeding the target.
        largest = upper_bound(Denom.begin(), Denom.end(), target) - Denom.begin() - 1;
        return MinCoin(target, largest);
    }

    auto it = cache.find({target, largest});
    if (it != cache.end())
        return it->second;
    // Either spend one bill of Denom[largest], or stop using that denomination.
    return cache[{target, largest}] = min(1 + MinCoin(target - Denom[largest], largest),
                                          MinCoin(target, largest - 1));
}

int CoinExchange::operator()(int target)
{
    return MinCoin(target, Denom.size()-1);
}

int main()
{
    vector<int> denominations {1,4,7,13,28,52,91,365};
    int target;
    cin >> target;
    CoinExchange coinexchange {denominations};
    cout << coinexchange(target) << endl;
}

Update:
See here for a very well-explained article on solving the coin exchange problem in O(1) time.

Dynamic Programming Practice (2)

Maximum set of non-intersecting lines Consider two horizontal lines l_1 and l_2 in the plane. Suppose we are given the x-coordinates of n distinct points a_1, a_2, \cdots, a_n on l_1 and n distinct points b_1, b_2, \cdots, b_n on l_2. Design an algorithm to compute the largest set S of non-intersecting line segments satisfying the following restrictions:

(a) Each segment in S connects some point a_i to the corresponding point b_i.
(b) No two segments in S intersect.

This problem is from the dynamic programming chapter of these lecture notes.

It is quite easy to derive a recursion for this problem: let P denote the set of n segments s_i, where s_i connects a_i to b_i, for 1 \leq i \leq n. Then the size of a maximum set of non-intersecting segments of P is {\rm OPT}(P) = \max\{{\rm OPT}(P\setminus \{s_1\}),\ 1+{\rm OPT}(P\setminus {\rm Intersect}(s_1))\}, where {\rm Intersect}(s_i) is the set of segments in P intersecting s_i, together with s_i itself. However, in order to use the dynamic programming approach, it is not clear how the subproblems overlap and in what order we should evaluate them.

An alternative approach is to formulate this as a graph problem: each segment s_i corresponds to a vertex, and two vertices are connected by an edge if the corresponding segments intersect. The problem is then equivalent to finding a maximum independent set in this graph. However, only when the graph has some special structure can we compute a maximum independent set in polynomial time via dynamic programming, for example, when the underlying graph is a tree. The intersection graph of the segments clearly need not have such a property.

So how can we use dynamic programming to solve this problem? The key observation is the following: suppose we renumber the indices of \{a_1, \cdots, a_n\} so that they are ordered from left to right, and we renumber the corresponding indices of the b_i‘s accordingly (so when we change the index of some a_k to a_i, we also change b_k to b_i). Now look at the sequence of indices of the b_i‘s: two segments s_i and s_j with a_i to the left of a_j do not intersect exactly when b_i is to the left of b_j, so a maximum set of non-intersecting segments corresponds to a longest increasing subsequence of this sequence. The renumbering step can be done in O(n \log n) time, and it takes O(n \log n) time to find a longest increasing subsequence, so the overall time complexity is O(n \log n).
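
Here is a C++ sketch of the whole reduction (the name MaxNonIntersecting is ours), using the standard patience-sorting method for the longest increasing subsequence:

#include <vector>
#include <algorithm>
using namespace std;

// Maximum set of pairwise non-intersecting segments, O(n log n): sort the
// segments by their endpoint on l_1; segments i and j with a_i < a_j do not
// cross exactly when b_i < b_j, so the answer is the length of a longest
// increasing subsequence of the b-endpoints.
int MaxNonIntersecting(vector<pair<double, double>> segs)  // (a_i, b_i) pairs
{
    sort(segs.begin(), segs.end());        // renumber by the a_i's
    vector<double> tails;                  // tails[k] = smallest possible last element
                                           // of an increasing subsequence of length k+1
    for (const auto &s : segs) {
        auto it = lower_bound(tails.begin(), tails.end(), s.second);
        if (it == tails.end())
            tails.push_back(s.second);     // extend the longest subsequence found so far
        else
            *it = s.second;                // improve an existing length
    }
    return (int)tails.size();
}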