Dynamic Progamming: First Principles (flawlessrhetoric.com)
300 points by foxh0und on Oct 24, 2017 | hide | past | favorite | 98 comments


I'm fond of this old RAND report from Dreyfus, which is worth skimming if you're mathematically inclined: Dynamic Programming and the Calculus of Variations, https://www.rand.org/content/dam/rand/pubs/reports/2006/R441...

One important takeaway is that dynamic programming in the Bellman formulation is a discrete analogue of Hamilton-Jacobi theory in how it writes down an equation for the optimal value as a function of a given endpoint rather than writing down an equation for the path as with the Euler-Lagrange equations. (You can reconstruct the path from the value function after the fact by gradient descent.) The relationship between Hamilton-Jacobi and Euler-Lagrange is the classical version of wave-particle duality. A concrete example in geometrical optics is the eikonal equation, a Hamilton-Jacobi type PDE, versus the geodesic equation, an Euler-Lagrange type ODE. Not coincidentally, one common numerical method for the eikonal equation called the fast marching method is a dynamic programming algorithm, very similar to Dijkstra's algorithm for shortest paths.

It should be mentioned that any "local" equation like a PDE or ODE cannot describe a globally optimal solution without strong assumptions such as convexity. In fact, satisfying the Euler-Lagrange equation isn't even sufficient for local optimality without further qualifications (Weierstrass conditions). But the Bellman dynamic programming equation, being recursive, can describe globally optimal solutions.
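To make the Bellman recursion concrete, here is a toy value-function computation in Python on a small made-up DAG (the graph, node names, and weights are purely illustrative, not from the report):

```python
from functools import lru_cache

# A small weighted DAG, invented for illustration: edges[u] lists (v, weight).
edges = {
    'a': [('b', 1), ('c', 4)],
    'b': [('c', 2), ('d', 6)],
    'c': [('d', 3)],
    'd': [],
}

@lru_cache(maxsize=None)
def value(u):
    """Bellman value function: cost of the cheapest path from u to 'd'.
    The recursion is over subproblems (nodes), not over individual paths."""
    if u == 'd':
        return 0
    costs = [w + value(v) for v, w in edges[u]]
    return min(costs) if costs else float('inf')
```

The value function is computed once per node rather than once per path, which is what makes the recursive formulation tractable; the optimal path itself can then be read off by greedily following the cheapest successor.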


I understand basically all of what you've said here except this:

>The relationship between Hamilton-Jacobi and Euler-Lagrange is the classical version of wave-particle duality.

in what sense are two formalisms (hamiltonian and lagrange) related to the relationship between time and freq space for fourier solutions to pdes?

also holy shit i would pay pretty good money for a service that typeset these old monographs using latex.


Seems like in a world where there's on-demand photoshop editing through a gig economy, something like this should be feasible, if not already exist. Maybe a separate cost per page for text, diagram or both, and a system to pay other people to review pages for correctness (mechanical turk style, or just actually use mechanical turk), and you have a nifty way to update old scanned documents while helping put food on some poor grad student's table.


While dynamic programming is taught in almost all algorithms classes, I think I finally grokked it after implementing it in a few practice problems. Would strongly recommend giving a few exercises listed here a shot: https://leetcode.com/tag/dynamic-programming/


Personally, I prefer these more than the leet code problems: https://people.cs.clemson.edu/~bcdean/dp_practice/


Great resource, thank you!


I've grokked it a couple of times, implemented solutions for some problems (knapsack, matrix multiplication, etc...), but I always keep forgetting...


You're not alone! Went down the whole nine yards in college, and come exam time knapsack was toast!

However I would definitely struggle with it today. My best simplistic explanation is that it is a method in which you cache values in order to not duplicate work.

I then tell them about the only example I could still bang out on a whiteboard, which is Fibonacci with a cache.
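For what it's worth, that whiteboard staple looks something like this in Python (a sketch using the standard library's `lru_cache` as the cache):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Base cases: fib(0) = 0, fib(1) = 1
    if n < 2:
        return n
    # Each subproblem is solved once; later calls hit the cache
    return fib(n - 1) + fib(n - 2)
```

Without the cache the same recursion takes exponential time; with it, each `fib(k)` is computed exactly once.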

Oh yeah and another important detail is something something solving subproblems :p


Yes, divide-and-conquer where subproblems overlap.


It’s odd that there would be algorithms classes that don’t require some implementations.


When I did Algorithms in college ~18yrs ago, I don't recall it being called DP. We certainly learned it, but without a distinct name for it, it didn't hold as a unique concept, just an obvious strategy/approach.


When I took Algorithms II in college at a top 10 CS program, our entire class involved ZERO coding or programming of any sort.


Same. Probably the worst class I ever took was my algorithms and data structures class. I swear the professor transcribed most of CLRS onto the blackboard over 10 weeks verbatim, and we never so much as touched a keyboard.


Is paper programming in pseudocode bad, though? I mean, computing science is not about computers, and all that.


Honestly, it's hard to completely grok a concept until you have solved at least a couple or three problems about the topic, no matter how many examples or lines of pseudocode you have seen. I think that's his complaint, and I have to agree with him. I also took a class some years ago that was purely theoretical and I didn't really learn much besides some general concepts.


I was actually referring to solving problems yourself in pseudocode on paper.

In any case, I'm happy with a computer science course that doesn't involve actual computer implementations, and it's not just because it drives home the point that they are different things, but also because you don't have to get bogged down in the practicalities of wrestling with a particular language or a particular toolchain.


Although there is a lot of merit in implementing an algorithm and not just writing pseudo code, doing it on paper has some advantages.

- It's a lot quicker to sketch an algorithm on paper (you can ignore some details which are either trivial or irrelevant to the problem)

- At a certain level you are expected to be able to convert pseudo code into actual code

- The most important part of an algorithm is knowing about it and what problems it solves (and variations), as well as the "trick" that makes it solve something particularly well - dynamic programming for solving subproblems, etc... Even if I implement an algorithm or just write the pseudocode, I will forget the details fairly quickly, but the takeaway is that I know that for problems of type X I can use algorithms of type Y (and sometimes I'll remember I can use Y because of fact C related to that particular problem or algorithm)


> you can ignore some details which are either trivial, or irrelevant to the problem

The most important insight of the old saw that teaching someone builds your understanding, but being able to code it ensures you have actual, deep understanding is this: the details you ignore as "trivial" or "irrelevant to the problem" are quite likely the crucial details to understand it and make it work. You can't safely handwave away parts of the problem until you have a good understanding of the entire problem.

I can't even count the cases in which I thought I understood some algorithm (either in uni or, more recently, through reading a paper), then sat down to implement it and realized I didn't really understand shit about it.


I would say, at least speaking for myself, that is what happens most of the time.


Sometimes I program in an actual programming language using paper and pen, solely for the pleasure of commenting my code in mathematics rather than in English. I make sure that the code is syntactically and semantically valid - stripped of the comments, it's a perfectly valid program that you can run.


I'm not sure it is an awful format for some people, but it sure didn't work for me.

I mean, I ended up implementing most of the problem sets in Python anyway - then had to transpile them into the bizarre undocumented form of pseudocode that was apparently the only thing the instructor and TAs could understand.

Combined with the complete lack of value added by the in-class lectures, compared to just reading the textbook, and the draconian attendance policy, it was far and away the worst class I endured in college. Particularly galling for a subject so central to the course of study and one that had material that could be so interesting if handled differently.


All our classes required delivering some kind of working code, either at the end of semester, or throughout it.

It was also the class that introduced us to unit tests, by having the code delivered as C library, that the teacher would link against to run her tests.


They're now using Mooshak to upload and run the tests automatically. The algorithms class is now mostly taught in Java, though.


Do you also mean FCT/UNL? Interesting, I didn't know about it.


Yeah, I attended it until a few years ago. They're also teaching OCaml and Prolog, but Java is the main language for algos, and for the AI class too. C is only now used for the Systems class.


Quite happy to see ML languages and Prolog still around.

Back when I was there, we used Caml Light.

Miss the campus. :)


I had to reproduce quicksort or binary search (I don't recall which) by hand, on pen and paper, in an exam :)


Interesting that you should mention that because not being able to do those problems is why I want to learn it!


Favorite examples for applications of dynamic programming? Mine are sequence alignment/BLAST in bioinformatics, but I'm sure there are many of which I am not aware in other fields.


Ditto, my first "Aha!" moment for DP was actually in a bioinformatics class, implementing Needleman–Wunsch global sequence alignment using a Blosum64 matrix for transition weights. (Details that aren't as impressive as they sound.)

I ended up using it as the basis for another extra-credit project that demonstrated the algorithm with a GUI: Select inputs, choose speed, hit Play, watch the numbers and lines, etc.

Anyway, it's stayed with me as my go-to DP example.


Line breaking. The Knuth-Plass algorithm I implemented is my all-time favorite piece of code.


That’s one of my favorites too, although I remember it being one of the easier implementations from my college course. We just did it with fixed width fonts, of course, but the results were still impressive.


There are lots of interesting competitive programming problems (and their related algorithms) that involve it. For example, it was important in the solution to a problem in the UKIEPC (UK & Irish programming contest to gear up for the ACM ICPC) this past weekend: https://domjudge.bath.ac.uk/domjudge/public/problem.php?id=3... :)


I guess someone has to mention finding an optimal line wrap for a paragraph of text.

https://doi.org/10.1002%2Fspe.4380111102


Don't need DP here since there is an obviously better iterative approach, BUT the easiest demonstration of DP's value is the naive recursive algorithm for Fibonacci sequence generation vs the DP+recursive algorithm.


Stereo image matching - specifically semi-global matching. The general idea is still considered a state of the art method.


All of Reinforcement Learning :)


Image stitching is pretty fun!


That sounds like it could have some things in common with the DNA-sequence alignment of the parent comment, both dealing with overlap. Is that so?


Yes, it has a lot in common.


Reinforcement Learning


I think really just "Value Iteration" (which isn't just used in RL). Reinforcement Learning itself is a problem setting and there are solutions in RL that don't use dynamic programming (for example, policy gradient methods).


Well of course :) but it’s a cute method to solve certain MDPs


In O.R. graduate school, Professor Gene Woolsey told us that he'd rise from the grave and stand on our desks screaming 'No! No! No!' if we ever actually used it to solve a practical problem.

IIRC, his complaints were about the speed of formulation, difficulty to understand and communicate the model to others, and the processing required to regenerate answers when the model changed.

I believe Optiant used Dynamic Programming for supply chain optimization.. So people do or did use it for practical problem solving. ..I think.


You might be surprised how far you can get with DP on a modern computer! For example, in https://www.gwern.net/Coin-flip we consider a gambling game with 300 rounds and at least as many options each round, and while it initially sounds infeasible, you can still compute the optimal strategy in under a minute with some optimized C/C++ implementing the DP solution. Modern computers are fast. (And you can define a solution in like 5-10 lines of R/Python if you don't mind them being a lot slower. I don't know how you would solve it with LP or MIP but I suspect it will be both slower and longer.)


What were you supposed to use instead? My impression is that dynamic programming is sometimes sluggish but often better than the alternatives.


Most optimization problems use Linear Programming if the problem has linear/continuous variables and Mixed Integer Programming if binary values (turning something on or off) are part of the solution. These are used in production pretty much everywhere. LP (linear programming) problems are extremely fast too. My company has HUGE LP & MIP problems, and the LPs still take only a few minutes to solve on nice hardware. MIP can be much trickier.


Unfortunately MIP (and ILP in general) is potentially really, really slow (and proved to be NP-hard). That extra constraint of a value that must be an integer really complicates things. Still, it's a really nice way to express many optimization problems.


I don't think it's that common to solve, say, shortest paths by solving the LP/MIP formulation (unless maybe the problem at hand isn't strictly a shortest paths problem). Is it?


For something as well known as shortest path, no. One thing about LP/MIP formulations though, is that it's arguably easier to extend. Not to say there aren't tricks to modifying existing algorithms, but most of the time, there is no need to spend all that time, when adding in a few variables or constraints will do the trick for this one problem that you will probably not need to revisit for quite a while.


That's for a certain definition of "optimization problem" right? One could equally say "Most optimization problems use a numerical hill-climbing algorithm".


That's not a drop-in replacement for dynamic programming though, is it? Can you do something like edit distance with IP?


Yea, I'm not familiar with Dynamic Programming, but it sounds different. I'm talking about garden variety optimization problems (minimize or maximize something).


Dynamic Programming is one of the most common tools when doing optimization.

Generally, shortest path algorithms rely on dynamic programming for a reasonable solution. Examples include the Traveling Salesman Problem.
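As a sketch of DP applied to TSP, the classic Held-Karp recurrence can be written as follows in Python (the distance matrix below is invented purely for illustration):

```python
from itertools import combinations

def held_karp(dist):
    """Held-Karp dynamic program for TSP.
    dp[(S, j)] = cheapest path starting at city 0, visiting every
    city in S exactly once, and ending at city j (j in S)."""
    n = len(dist)
    # Base case: go straight from 0 to each city j
    dp = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for S in combinations(range(1, n), size):
            S = frozenset(S)
            for j in S:
                # Best way to reach j: come from some k in S \ {j}
                dp[(S, j)] = min(dp[(S - {j}, k)] + dist[k][j]
                                 for k in S if k != j)
    full = frozenset(range(1, n))
    # Close the tour by returning to city 0
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))
```

The DP solves the problem in O(n² 2ⁿ) time instead of the O(n!) of brute-force enumeration - still exponential, but a huge improvement.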


> I believe Optiant used Dynamic Programming for supply chain optimization.. So people do or did use it for practical problem solving. ..I think.

Yes, the least cost path problem is not just a classic DP problem but is an example in this article.


Edit distance (née Levenshtein distance) is solved with dynamic programming, and is used by spell-checkers.

I agree reformulating the problem can be confusing. It wouldn't be worth it, but for the incredible efficiency gains (not always needed).
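A minimal bottom-up sketch of that edit-distance DP in Python:

```python
def edit_distance(a, b):
    """Levenshtein distance via bottom-up dynamic programming.
    d[i][j] = edit distance between the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]
```

Filling the table takes O(mn) time, versus the exponential blowup of the naive recursion over all edit sequences.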


For the last part about economic optimization, I would not approach it with Dynamic Programming. As evidenced by Go and many games, making the "local" best move does not guarantee the best result in the end. If brute-forcing is intractable, the state of the art is Monte Carlo Tree Search, as evidenced by its dominance in board games, Magic: The Gathering, poker, robot competitions, RTS, etc.


Constraint solving works pretty well in these domains too, and AFAIK, top-level artificial players in board games are a mix of constraint solving and MCTS.


I learned it as part of speech processing, first for Dynamic Time Warping and then as the Viterbi and Baum Welch algorithms. Together with Hidden Markov Models it's a thing of beauty how it is used to model speech.

All of this is explained very intuitively in speech.zone

In the Wikipedia entry there is a fun (really) explanation of why its creator named it that.


Please note, the last name of the author referenced in the article is "Trick", not Tick.

http://mat.gsia.cmu.edu/classes/dynamic/dynamic.html


Together with memo(r)ization, and the typo in the title, I'm starting to think the letter 'R' is having a bad day.


There appears to be a missing return statement in the Fibonacci Number example in the if block. Should look like this: https://repl.it/NLIf/1


Why use monospace font?????


I agree, it's hard to read


Best font ever... why not?


Totally love the font!


> Tail recursion [3], a variant of traditional recursion implements memoisation, which uses memoisation very economically.

I don't understand this part, can anybody explain?


In "traditional" recursion, each recursive call is pushed onto the function call stack. This means that your recursive solution will eventually run out of memory when the number of recursive calls is more than what the function call stack can handle.

e.g., consider the following recursive solution for calculating the sum of all positive integers up to a given `n`:

  fn sum(n):
    if n == 0:
      return 0
    else:
      return n + sum(n - 1)    // A

When n = 1000, your function call stack will have to hold ~1000 calls (sum(1000), sum(999), sum(998), sum(997), and so on...) before it's able to compute the sum. This is because the return statement in line A above needs to keep track of the variable `n` from the calling function in order to be able to compute the return value. If only there were some way to eliminate that variable `n`...

That's where tail-recursion or tail-call optimization comes in:

  fn sum_tail(n, pre_sum):
    if n == 0:
      return pre_sum
    return sum_tail(n - 1, pre_sum + n)    // B
    
  fn sum(n):
    return sum_tail(n, 0)

In this solution, the return statement at line B does not depend on any variable from the calling function (it simply passes that value on to next function call), and so the calling function can immediately be popped off the stack, saving stack space and making your solution more memory-efficient.

It's useful to know that whether any stack space can actually be saved depends on whether your language of choice implements tail-call optimization. e.g., JavaScript recently started supporting tail-call optimization after ES6 [1], and Python does not support it [2].

[1] http://2ality.com/2015/06/tail-call-optimization.html [2] https://stackoverflow.com/questions/13591970/does-python-opt...


Thanks for the great explanation of tail recursion.

But I still don't see how tail recursion implements memoisation? One might even argue that tail recursion is the opposite of memoisation; while tail recursion saves memory because it eliminates the need to remember previous results of function calls, memoisation on the other hand uses extra memory to save results of earlier function calls.


The memoization occurs in the fact that we are "remembering" the previous value (pre_sum). It's a little different in that we aren't memoizing/caching all previous values, but we are still caching the last computed values aka the "tail".


rishabhparikh is correct, but since the point is quite subtle it might help to be a bit more elaborate.

The crux of the misunderstanding is here: [tail recursion] eliminates the need to remember previous results of function calls.

Emphasis: results. The thing is: tail call recursion essentially means "finish all intermediate computations and pass the results of these to the next function call".

In the non-tail call version, the sum function has no results until you hit the deepest level of recursion; it is the equivalent of writing the sum out in full, which causes all that extra overhead:

    sum = n + (n-1) + (n-2) + ... + 0

You have to recurse until you reach n == 0, and only then does the whole sum "collapse".

The `pre_sum + n` bit in the second example is what fixes this. All intermediate calculations are finished and then "stored" by passing them as arguments to recursive calls. This gives the functional equivalent of a for loop:

    sum = 0
    for i in n..0:
      sum += i
This is why it can be considered memoization: each recursive call builds on the "stored" finished result of the previous call.


> This is why it can be considered memoization: each recursive call builds on the "stored" finished result of the previous call.

Ok, but isn't this then also the case for non-tail recursive calls? An ordinary recursive call builds just as well on the finished result of the previous call, and no result is really "stored" in either case.


No, if you look at vanderZwan's explanation for the non-tail recursive call, you'll see that it doesn't build on a finished result of the previous call.

> You have to recurse until you reach n == 0, and only then does the whole sum "collapse".


NB: Although the ES6 spec requires TCO, it turned out to present significant implementation challenges, so it's only implemented in WebKit (Safari), and is, IIRC, marked WONTFIX in V8 (Chrome, Node), SpiderMonkey (Firefox) and Chakra (Edge). There's a good chance it'll get removed from a later edition of the standard.


Glaring typo in the title: it should say progRamming.


> Memorisation

It is memoization, not memorisation! Although for two weeks in algorithms class I did in fact think our prof was just pronouncing memorize in a cutesy manner.


Honestly though, we should just switch to using "memorization". It's a less obscure word that communicates the intended meaning better, IMHO.


Obscure words are very well suited to esoteric subject matter where precision is needed. A particular brand of function optimization is exactly that.


Indeed... the way information is being organized is specific enough to be given a proper term - memorization indicates just storing things in whatever form


What precise meaning does "memoize" denote that "memorize" wouldn't? Sometimes jargon is just jargon.


"Memoization" is only used in the context of caching results of previous function calls.

So if you see the word, you instantly know that this is the context. You wouldn't know that if you see "memorization".


Exactly. Memorization is the process of committing something to memory. Things like spaced repetition, flashcards, using mnemonics, etc. Memoization is the technique of caching expensive function calls and returning those cached values upon subsequent invocations.

Seeing the word memorize implies a process that an actor is undergoing and says nothing about the data being memorized. Seeing the word memoize implies the existence of an expensive function and implies a process of repeated calls to the function. It also implies the function is pure, i.e. you can't memoize calls to fread().
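That caching behaviour can be sketched as a tiny Python decorator (illustrative only; it assumes a pure function with hashable positional arguments):

```python
def memoize(fn):
    """Cache results of a pure function, keyed by its arguments."""
    cache = {}
    def wrapped(*args):
        if args not in cache:
            cache[args] = fn(*args)  # the expensive call happens once per key
        return cache[args]           # later calls are served from the cache
    return wrapped

@memoize
def square(n):
    # Stands in for an expensive pure function
    return n * n
```

Note the purity assumption: memoizing an impure function like `fread()` would silently return stale results.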


Memoization implies caching specifically, which implies its own things, such as cache size and eviction methodology. Not something that's necessarily thought of when using the term "memory".

Memoization is close enough to "memorization" that you pretty much know what it is without having heard of it before and easily rememberable, while being specific enough to the actual concept as to be easily searchable and imply specific concerns of its own. That's a win-win in my book.


Sadly, the older I get the more I am aware of my memory's aggressive cache eviction policy.


You and me both. :

But that's part of it, you generally don't really associate that with memory until you get older...


I think of memoization in terms of memos. You're leaving memos behind to avoid recalculating, but you're going to throw them out when you have your result; it's scratch paper to refer back to your previous work. The word itself denotes the reason for storing the data.

Memorization makes me think of a repetitive process of committing something permanently to memory.


Memoization includes writing to disk such as a materialized view.


Given the sad state of programming interviews, people mostly end up memorizing.


Thanks all!


I came on here to say that.


Me too!


I didn't.


That's simply the British English way of writing the word. The author is a Swede and most probably got educated in the official way of writing English.


If you think the parent is talking about the s/z, look back earlier in the word. Memoization (regardless of locale) doesn't have an "r".


According to the author's LinkedIn he is, as far as I can tell, born and educated in Australia. Why do you say he is from Sweden?


Obligatory Demaine lectures - https://youtu.be/OQ5jsbhAv_M


that's the best set of videos about DP !!


Would it have killed him to use a proportional font? We have the technology to accomplish that now.



