Off the top of my head I can think of a doubtless unsophisticated way to recover the "stack" in this situation: namely, the same way you recover the "stack" in the factorial example, by using an accumulator hidden in the interpreter. When foo calls bar(), the interpreter passes along, when it makes the tail call, an explicit list of predecessors (in this case ["foo()"]); when bar calls baz(), the invocation of baz receives ["bar()", "foo()"], etc.
Some Schemes (like MIT Scheme) do this with a circular buffer of bounded capacity. Another Scheme interpreter (a little one by Jake Donham) had a switch to turn off TCO to help debugging. Yet another approach is full omniscient debugging (being able to go backward in time).
Yeah. But you could decide how much of the stack to keep, at least. Nevertheless, I know that some smart persons (I just can't remember who!) have addressed this issue in a significantly subtler way---wish I could find the reference.
Parent wasn't suggesting to keep full stack frames, but something like (file/function name/line number) tuples, that's not the end of the world. By making it a fixed-size list, you can ensure your memory usage won't blow up.