In the case of the former, I think you're misunderstanding Wirth. His statement isn't predicated on the idea that functional programming requires the language to eliminate state entirely; just that functional programming discourages state in favor of primitives less aligned with the machine.
For instance, functional programmers would almost all tell you that `map (x => ...) xs` is "better" than `for i from 0..len(xs): xs[i] = ...`. But the former, implemented trivially, is very slow: the allocation of the closure, the allocation of the new list, the function call on each iteration, and the lack of tail recursion in `map`'s implementation (it's a trivial implementation, remember?).
Of course, the functional programmer would tell you, "Well, it's easy to optimize that, the performance issues are just because your implementation is too trivial", and Wirth would rejoin, "Too trivial? What's that?"
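To make the "implemented trivially" point concrete, here's what such a naive `map` might look like, sketched in TypeScript (my own illustration; `naiveMap` is a hypothetical name, not any real library's implementation):

```typescript
// A deliberately trivial map: it relies on a closure allocated at the
// call site, allocates a fresh array on every recursive step, makes a
// function call per element, and is not tail recursive.
function naiveMap<A, B>(f: (a: A) => B, xs: A[]): B[] {
  if (xs.length === 0) return [];
  // slice() copies the tail and the spread copies the result again,
  // so this does O(n^2) copying and uses O(n) stack frames overall.
  return [f(xs[0]), ...naiveMap(f, xs.slice(1))];
}

const doubled = naiveMap((x: number) => x * 2, [1, 2, 3]);
// doubled is [2, 4, 6]
```

None of those costs are inherent to `map` as an idea; they're exactly the kind of thing a non-trivial implementation optimizes away, which is where the exchange below picks up.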
I don't think this is a fair point, since the two snippets do different things. One creates a new list, and the other does not. If, as a functional programmer, I were actually interested in mutating the existing sequence (e.g. for performance reasons), I would definitely write the loop.
If you're interested in maximum constness (which I tend to be, because I find it's almost always easier to read code where values don't unexpectedly change in a branch somewhere), then you'd be comparing
let ys = map f xs
to
let ys = []
for i from 0 .. len(xs):
    ys.push(f(xs[i]))
where the former obviously makes it much clearer what's going on.
Sure, it's using "primitives further from the physical machine" but that is exactly what programming is about! You create a new layer of primitives on top of the old ones, where the new layer makes it slightly easier to express the solution to the problem you're solving. You do this incrementally.
When someone has built a more easily handled set of primitives for you, it would be silly not to use them, all else equal.
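As a sketch of what "building a new layer of primitives" means here (TypeScript, names mine): you write the explicit loop exactly once, inside `map`, and every caller from then on works at the higher layer.

```typescript
// Lower layer: the index-based loop, written once and then hidden.
function map<A, B>(f: (a: A) => B, xs: A[]): B[] {
  const ys: B[] = [];
  for (let i = 0; i < xs.length; i++) {
    ys.push(f(xs[i]));
  }
  return ys;
}

// Higher layer: callers state intent ("transform every element")
// without restating the mechanics of iteration.
const lengths = map((s: string) => s.length, ["map", "xs"]);
// lengths is [3, 2]
```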
----
In other words: the only real reason to mutate values is to improve performance, at the cost of readability and of the ability to safely share that data across concurrent threads.
If, indeed, that is a cost you're willing to pay for the additional performance, no functional programmer I know would shy away from the imperative mutating loop.
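For example (an illustrative sketch, not anyone's actual code), both versions below are ones a functional programmer might write, depending on whether the extra allocation matters:

```typescript
// Pure version: allocates a new array on every call; the input is
// untouched, so it stays safe to share across threads.
function scaled(xs: number[], k: number): number[] {
  return xs.map(x => x * k);
}

// Mutating version: no allocation, but the caller's array changes
// underneath them - trading safety and readability for performance.
function scaleInPlace(xs: number[], k: number): void {
  for (let i = 0; i < xs.length; i++) {
    xs[i] *= k;
  }
}

const xs = [1, 2, 3];
const ys = scaled(xs, 10);   // ys is [10, 20, 30]; xs is still [1, 2, 3]
scaleInPlace(xs, 10);        // xs is now [10, 20, 30]
```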
They do two different things, but each does what the style it's written in encourages. Performance considerations aside, the functional programmer would rather create a new list (or, as you said, they're "interested in maximum constness" - the precise preference for statelessness Wirth is calling out); the imperative programmer would mutate the existing list.
Wirth is not talking about clarity in the sense of "can I look at the code and understand the high-level intent of the programmer"; he is interested in clarity in the sense of "can I look at the code and understand exactly what it's doing, at every level?"
For Wirth, programming is not about using an endless stack of primitives that get you further and further from the physical machine, so much that they start to obfuscate what's happening at lower layers. It's about building the smallest, simplest stack of primitives such that you can express yourself effectively while still understanding the entirety of the system. The Oberon system includes everything from the HDL for the silicon all the way up to the OS and compiler in around 10,000 lines of code because you're supposed to be able to keep all of it in your head.
I'm not saying that any of this is correct, per se, nor am I arguing for it - I'm sympathetic to it in some ways and disagree with it in others (I am, in fact, very much into FP). I'm just trying to give a charitable and clear interpretation of his perspective. FP may not want to get rid of state in one sense, as you've pointed out; but it wants to get rid of state in another, and Wirth doesn't like that because it necessitates complexity - and Wirth hates that.
For Wirth, programming is not about using an endless stack of primitives that get you further and further from the physical machine, so much that they start to obfuscate what's happening at lower layers. It's about building the smallest, simplest stack of primitives such that you can express yourself effectively while still understanding the entirety of the system.
Now that is an interesting perspective I hadn't even considered. Also not sure I would agree – but if I found the OP unconvincing and wanted to find out more, where would I go?