"In sum, something "fundamental" changes when you want to go fault-tolerant and tolerate node failure in an asynchronous system. When you combine faults and full-asynchrony, you get the FLP impossibility result. That means you lose progress! That is why Paxos does not guarantee making progress under a full asynchronous model with a crash failure."
This is unclear to me. Egalitarian Paxos guarantees progress under a full asynchronous model and doesn't have the dueling leaders problem. So this looks like a weakness of standard Paxos itself, not a fundamental problem.
Though it might not be explicitly stated in the paper, EPaxos has the same liveness guarantee as all other consensus protocols: commands will eventually commit if a long enough period of synchrony occurs. As the author of this post notes, this is a fundamental limitation of the specification of the consensus problem - no protocol can get around the limitation while solving consensus by definition.
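To make the liveness point concrete, here is a minimal sketch of the "dueling proposers" schedule, assuming a toy in-memory model: the `Acceptor` class, the `duel` and `deliver` helpers, and the proposer names "A" and "B" are all hypothetical, and no acceptor ever holds an accepted value during the duel, so the value-adoption rule of phase 1 is left out. It illustrates one bad schedule, not FLP itself: as long as the network always delivers the rival's prepare before the pending accept, nothing is chosen, and once a single proposer gets an undisturbed round (the "period of synchrony"), a value is chosen.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot this acceptor has promised
        self.accepted = None  # (ballot, value) it last accepted, if any

    def prepare(self, ballot):
        # Phase 1: promise only to ballots higher than any seen so far.
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def accept(self, ballot, value):
        # Phase 2: reject if a higher ballot has been promised in the meantime.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def deliver(acceptors, pending):
    """Deliver a pending accept request; return the value if a majority accepts."""
    majority = len(acceptors) // 2 + 1
    acks = sum(a.accept(*pending) for a in acceptors)
    return pending[1] if acks >= majority else None


def duel(acceptors, rounds):
    """Adversarial schedule: each rival prepare lands before the pending accept."""
    pending, chosen = None, None
    for i in range(rounds):
        ballot = i + 1
        name = "A" if i % 2 == 0 else "B"
        for a in acceptors:
            a.prepare(ballot)            # rival's higher ballot arrives first
        if pending is not None and chosen is None:
            chosen = deliver(acceptors, pending)  # stale ballot, so rejected
        pending = (ballot, name)
    return pending, chosen


acceptors = [Acceptor() for _ in range(3)]
pending, chosen_during_duel = duel(acceptors, rounds=1000)

# "Synchrony": the last proposer finally gets its accept through undisturbed.
chosen_after_synchrony = deliver(acceptors, pending)
```

In this sketch the duel can run for any number of rounds without a decision, and safety is never violated along the way; only progress is lost.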
Similarly, Max Zorn used to ask people whether they recalled what Zorn's Lemma was introduced as a lemma to. (I haven't a clue, and I doubt most of them did either.)
Zorn introduced a "maximal principle" as an axiom. It appears Tukey called a generalization of it Zorn's Lemma for unknown reasons, though there is some version of the statement that really is proved, but apparently by Chevalley (a fact Zorn knew).
One of the most popular problems to be solved in modern IT systems is that of keeping the state of your application distributed across a number of machines.
Paxos is one of the algorithms that solves this problem: it gives you a protocol that allows a set of machines to agree on a sequence of operations that each of them applies to its state, so that all the machines end up in the same state.
A simple example: suppose you had a set of 3 machines starting with a state of “0” and wanted to add “1” to their state. Paxos defines how they should communicate so that in the end, even if one of the machines failed during execution, they would all end up with a state of “1”.
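As a rough illustration of that example, here is a minimal single-decree Paxos sketch, assuming synchronous in-memory calls in place of a real network. The `Acceptor` class, the `propose` function, and the `OPS` table are hypothetical names made up for this sketch, and a failed machine is simply modelled as one that never answers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    alive: bool = True
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if not self.alive or ballot <= self.promised:
            return None
        self.promised = ballot
        return ("promise", self.accepted)

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot has been promised since."""
        if not self.alive or ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True


def propose(acceptors, ballot, value):
    """Run one proposer round; return the chosen value, or None if no majority."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, the proposer must adopt the
    # one with the highest ballot instead of its own value.
    prior = [acc for _, acc in promises if acc is not None]
    if prior:
        value = max(prior, key=lambda acc: acc[0])[1]
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


# Operations the machines can agree on (hypothetical table for this sketch).
OPS = {"add 1": lambda s: s + 1, "add 5": lambda s: s + 5}

machines = [0, 0, 0]
acceptors = [Acceptor(), Acceptor(), Acceptor(alive=False)]  # third machine failed

chosen = propose(acceptors, ballot=1, value="add 1")
machines = [OPS[chosen](s) if a.alive else s for s, a in zip(machines, acceptors)]

# A later proposer with a different value still learns the value already chosen.
later = propose(acceptors, ballot=2, value="add 5")
```

With two of the three machines alive, a majority still exists, so "add 1" is chosen and the two live machines move from 0 to 1. The failed machine stays at 0 in this sketch; it would have to learn the chosen operation after recovering, which is not modelled here.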
Are we there yet? Do we need paxos-like consensus protocols? Hardware is becoming cheaper and commoditised and with all the hype around blockchain, it looks like people are ready to pay extra for the redundant hardware needed for 100% fault tolerance.
Still, it feels to me that in almost all cases, including financial transactions, it's good enough to be right 99.999% of the time and just amortise the costs of the very rare bit flip...
You bring up a good point, although I'm not sure if I agree with (or understand) the premises. I can't imagine a world where hardware and the protocols running on top will be immune to physical sources of faults, such as natural disasters, human intervention, or cosmic radiation(!). As a result, dealing with failures will always be a consideration in building distributed software systems, and Paxos or protocols that solve the same problem as Paxos will always be relevant.
I do think you make a good point that we don't always need Paxos-like protocols. Paxos is a very strong tool: it solves a difficult problem, but it is heavy-handed in many scenarios. There is a lot of space to explore lighter-weight alternatives to Paxos that still provide similarly strong properties.
https://ug93tad.github.io/consensus/
And on the topic of Paxos, some recent HN discussion:
https://news.ycombinator.com/item?id=16003662 - WPaxos: a wide area network Paxos protocol
https://news.ycombinator.com/item?id=13923949 - Paxos in 25 Lines
https://news.ycombinator.com/item?id=13950493 - Gryadka is not Paxos, so it's probably wrong [RETRACTED]