A tale of dynamic programming

I spent some time on Code Jam lately and tried an interesting problem from the set of past problems: Welcome to Code Jam

Welcome to Code Jam

The problem is simple: given a sentence, count the number of times the string "welcome to code jam" appears as a sub-sequence of the sentence. More precisely: find a sequence s of increasing indices into the input sentence such that the concatenation of input[s[0]], input[s[1]], ..., input[s[18]] is the string "welcome to code jam". The number of such sequences can be surprisingly high. For example, in the sentence "wweellccoommee to code qps jam", the string "welcome to code jam" appears 256 times!

I started off with the dummy recursive approach: for each 'w' found in the sentence, count how many times 'elcome to code jam' appears in the remaining sentence, and continue recursively for each letter until reaching the 'm' of 'jam'. It worked well... for small examples, of course! This dummy approach has exponential complexity. The best way to 'feel' this complexity is to run it on a sentence like 'welcome to code jamwelcome to code jamwelcome to code jam...': just concatenate 'welcome to code jam' n times and check how long the algorithm takes. For n > 5, the dummy approach takes forever. That's right, 5, and it's not such a long sentence!
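For reference, here is a minimal Python sketch of that dummy recursion (the function name and structure are mine, not the contest code):

    def count_naive(sentence, target="welcome to code jam"):
        """Count how many times `target` appears as a sub-sequence of `sentence`."""
        if not target:
            return 1  # the empty string matches exactly once
        total = 0
        for i, c in enumerate(sentence):
            if c == target[0]:
                # match the first letter of `target` here, recurse on the rest
                total += count_naive(sentence[i + 1:], target[1:])
        return total

    print(count_naive("wweellccoommee to code qps jam"))  # 256

Every matching letter forks a new recursive call, which is exactly where the exponential blow-up comes from.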

I was really into the challenge and tried several improvements: changing the recursive algorithm into an iterative one, indexing the letters, reducing the number of passes... Nothing broke the exponential complexity. A very good lesson!

Fortunately, Code Jam provides an analysis of past problems along with other people's solutions. I didn't want a fully laid-out analysis in plain English, so I downloaded one of the solutions, written in C, and tried to figure out the approach at a glance. A matrix was used to store intermediate results, as if the algorithm was solving the problem for sub-sentences and sub-strings and storing the results. It reminded me of... Dynamic programming, of course!!

Dynamic programming

So, what is dynamic programming?

From Wikipedia:

dynamic programming (also known as dynamic optimization) is a method for solving a complex
problem by breaking it down into a collection of simpler subproblems, solving each of
those subproblems just once, and storing their solutions

Ok, great... How about an example? Topcoder provides a nice introduction to dynamic programming, and their first example, the Change-making problem, is rather easy to understand.

You're given an unlimited supply of n types of coins with values (V1, ..., Vn), and you have to use a minimum number of coins so that they sum up to S. Now, building a dynamic programming algorithm is done in two steps:

  • State a sub-problem along with its optimal solution, and memorize that solution
  • Make explicit a relation between the solutions to sub-problems and the solution to a larger problem

i.e. state an equation in which, given the solutions to all sub-problems, you can easily find the solution to the problem. In the Change-making problem, the sub-problems would be: find the minimum number of coins that sum up to s < S. Let's memorize the solutions to sub-problems in an array: count[s] is the minimum number of coins that sum up to s. Now, given all solutions to sub-problems from 1 to s, how can we find the solution for s + 1? The equation is: count[s + 1] = min_j(count[s + 1 - Vj] + 1). Try all coins Vj <= s + 1, take the optimal solution count[s + 1 - Vj] and add 1.

Repeat the described procedure until reaching S. Dynamic programming is fast because of the memoization of solutions to sub-problems, but implementing the memoization is rarely the hard part; the real challenge is to find the right relation between the sub-solutions and the next solution. By the way, the Change-making problem is a component of a much more interesting problem: the Optimal denomination problem.
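Coming back to the Change-making problem, a minimal bottom-up Python sketch of the procedure could look like this (the coin values in the example are mine, just for illustration):

    def min_coins(values, S):
        """Minimum number of coins (unlimited supply of each value) summing to S."""
        INF = float("inf")
        count = [0] + [INF] * S              # count[s] = min coins summing to s
        for s in range(1, S + 1):
            for v in values:                 # try every coin Vj <= s
                if v <= s and count[s - v] + 1 < count[s]:
                    count[s] = count[s - v] + 1
        return count[S]

    print(min_coins([1, 3, 4], 6))  # 2, i.e. 3 + 3

Note that for these values a greedy largest-coin-first choice would answer 3 (4 + 1 + 1), which is exactly why the sub-problem relation matters.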

Back to code jam

Back to our code jam problem. How do we solve it with dynamic programming?

Let's start from our dummy approach and reverse the process: for each 'm' found in the sentence, count how many times 'welcome to code ja' appears before it. This gives us the sub-problem and its relation to the larger problem: count how many times a prefix of 'welcome to code jam' appears in a prefix of the input!

This time, the memoization structure is a matrix, not an array. Let's call the input sentence T and the string to find S; then count[i][j] is the number of times we find S[:j] in T[:i] (Python slice notation). Assuming we know count[i][j] for every j, how can we compute count[i+1][j]? Going from T[:i] to T[:i+1] adds the character T[i], and there are two cases:

  • T[i] != S[j-1]: then count[i+1][j] = count[i][j]
  • T[i] == S[j-1]: then count[i+1][j] = count[i][j] + count[i][j-1]
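Here is how this recurrence could look in Python, as a minimal sketch rather than the contest solution. The base case is count[i][0] = 1: the empty prefix S[:0] is found exactly once in any prefix of T.

    def count_occurrences(T, S="welcome to code jam"):
        """count[i][j] = number of times S[:j] appears as a sub-sequence of T[:i]."""
        count = [[0] * (len(S) + 1) for _ in range(len(T) + 1)]
        for i in range(len(T) + 1):
            count[i][0] = 1                  # the empty prefix is always found once
        for i in range(len(T)):              # extend T[:i] to T[:i+1]
            for j in range(1, len(S) + 1):
                count[i + 1][j] = count[i][j]
                if T[i] == S[j - 1]:         # the new character can complete S[:j]
                    count[i + 1][j] += count[i][j - 1]
        return count[len(T)][len(S)]

    print(count_occurrences("wweellccoommee to code qps jam"))  # 256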

Come back to the definition of count (the number of times we find S[:j] in T[:i]) to really understand the relation. And that's it! One pass through the input gives the solution! It was a great refresher on dynamic programming for me; it seemed trivial for code jammers, though, as the website called it a 'classic dynamic programming problem'...