Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
The Raft Consensus Algorithm (raftconsensus.github.io)
87 points by ForHackernews on Oct 29, 2014 | hide | past | favorite | 27 comments


One of the most lucid explanations of the Raft Consensus Algorithm that I've ever seen is the one in John Ousterhout and Diego Ongaro's Raft User Study[0]. The 2 videos do an excellent job of explaining both the Paxos and Raft algorithms and the key differences between them. John's narration is extremely clear and the slides are very informative. Definitely recommended!

Raft is a consensus algorithm designed to be easier to understand than Paxos. To measure Raft's understandability, we conducted an experimental study using CS students at two universities. We recorded a video lecture of Raft and another of Paxos, and created corresponding quizzes. This page makes our materials available for anyone interested. We think these are valuable resources for anyone learning consensus (whether Raft or Paxos or both).

[0] - https://ramcloud.stanford.edu/~ongaro/userstudy/

Raft Video - https://www.youtube.com/watch?v=YbZ3zDzDnrw

Paxos Video - http://www.youtube.com/watch?v=JEpsBg0AO6o&feature=youtu.be



The only issue I have with raft (besides bugs in the various implementations I have to deal with at work) is that it is not yet proven to truly be simpler than Paxos. In other words, is it a genuinely simpler algorithm that provides the the same guarantees as Paxos, or is it simpler because it leaves edge cases unspecified, forcing implementors to grind through the edge cases with heuristics? If it's the second, it's not actually simpler, it's just worse.

I'm not saying that it isn't genuinely simpler. Just that in discussions about raft, I see most people blindly assuming that it is. But I don't believe that it has been proven yet. I think this is the nuclear power plant part of the bikeshed discussion. [1]

If you scrub to the 50 minute mark of this talk[2] by Leslie Lamport (creator of Paxos), my coworker David Varvel asks him this question and he gives more or less the same answer.

[1] http://en.wikipedia.org/wiki/Parkinson's_law_of_triviality [2] http://channel9.msdn.com/Events/Build/2014/3-642


Is there a good open-source Paxos implementation? There are many open-source Raft implementations, some of which are good. goraft powers etcd, for example.

I believe most developers just want to consume a working library. It seems likely that the plethora of Raft libraries and dearth of Paxos libraries is because it is easier to write a Raft library, but I also don't think it matters _why_, it only matters that there are good Raft libraries.


CoreOs (etcd team) is currently rewriting their raft implementation because it had a lot of problems. I don't believe there is a "good" implementation out there yet, because it's all very new. And should an implantation be "good", AKA implement the spec without introducing incidental errors, will it still have edge cases because the spec has edge cases? This is all currently unknown. It's precisely these assumptions that I'm pointing out.

A quick google search brings up libpaxos and other open paxos implementations, so it appears that they exist.


The new etcd raft implementation is working well and has a solid set of deterministic tests. We recently released an alpha of etcd that utilizes this improved raft implementation: https://github.com/coreos/etcd/releases.

The core raft state machine comes in under 600 LOC last time I checked and the docs are here: https://godoc.org/github.com/coreos/etcd/raft. There are a few non-etcd projects that have started working with it, most notably the cockroachdb folks have started to use it to implement the "multi-raft" that they need for their DB. You can find the discussion and work on github.com/cockroachdb.

It turns out getting a raft implementation that is easy to audit with a copy of the paper in hand and easy to test in a fast and consistent way requires a significant engineering effort. I hope that based on the lessons learned from goraft that this implementation is something that a number of Go projects can use and leverage. Particularly since we carefully separated the WAL, RPC and state machine from each other.


If you are looking for a good one. Checkout ours https://github.com/cloud-software-foundation/c5-replicator


Wow - this looks like a great implementation. Who is cloud-software-foundation? Is this the software that powers WANdisco?


This does not power wandisco. WanDISCO uses DConE.


Interesting - thanks for the libpaxos link. Is it any good?

I'm curious: what are these edge cases you're talking about? I've worked on a Raft implementation, and I don't remember any particular edge cases (though it's been a while).

There's definitely some interesting choices for when to replace nodes in the face of failure, but I don't think that's a Raft algorithm shortcoming.


Is there a good open-source Paxos implementation?

Try Corosync/Pacemaker.

I believe most developers just want to consume a working library.

Right. Maybe they'd even prefer not to have to bother, they just want the availability thing solved for parallel service instances. Perhaps the more interesting question is, "how do you neatly fit raft/paxos like functionality at deployment time in to the average SOA service development process?" (subtext: without making devs learn either algorithm)

I think the answer there is an improved devops process that discourages any form of assumption around infrastructure type and can easily run failure tests at various levels of the deployment topology.

Some of my thinking in the area is documented at http://stani.sh/walter/pfcts


Yes there is -- Basho Riak's riak_ensemble module implements a version of Multi-Paxos.

Description:

https://github.com/basho/riak_ensemble/blob/develop/doc/Read...

Code:

https://github.com/basho/riak_ensemble

Ensemble module, the way I understand it, provides ability to have "consistent" CP (in CAP theorem sense) updates in the otherwise default AP database setup in Riak.

Disclaimer: I am not affiliated in any way with Basho, just follow riak_core module development because I think it is a piece of very cool engineering.


To me it's clearly simpler. Paxos involces special client numbers (to guarantee ordering), but you still have to fall back to randomized delays to avoid livelock; raft just starts with randomized delays up front.

If you can read the raft paper and come away also knowing how group membership changes (wow) and log compaction works, in additional to the "basic" log replication consensus, that's huge.

edit: I'm sure it's still 'hard' to implement, like bringing a brand new replacement machine up from scratch (copying the full data set over, to warm him up, and then starting replication efficiently), i.e. there's still a lot of coding to do


I doubt any consensus algorithm can be simpler than single decree paxos. But what people usually care about is deciding on a sequence of operations. Paxos made live, and others have engineered their own solutions to make multi-paxos efficient at this. Raft solves much of this for you, right from the start.


> In other words, is it a genuinely simpler algorithm that provides the the same guarantees as Paxos, or is it simpler because it leaves edge cases unspecified, forcing implementors to grind through the edge cases with heuristics?

Isn't it formally proven correct? And "simpler" would be subjective right? So what do you mean by "proven to truly be simpler than Paxos"? Thanks, and also that talk looks really great.


I don't believe it has been formally proven correct. I would love to see evidence that it has been proven to be mathematically equivalent to Paxos.


My understanding is that Raft has been proven formally correct in one of the author's PhD:

https://ramcloud.stanford.edu/~ongaro/thesis.pdf (Chapter 8)


That's great, they hadn't done this when I last talked to them.


naw dog, they got tla+ and shiznick


This is pretty awesome and I think only real hardcore folks who need time-guaranteed leader failover (like, I don't know, a heart pacemaker or something) would stick with paxos. (Assuming paxos can provide that).


hmm, actually watching the paxos video posted above - even basic paxos needs randomized delays to avoid livelock with multiple proposers. ouch. so why would people choose it over raft?


Can the Raft Consensus works with only two nodes?


Technically speaking, it should work, however it is no longer available during partitions since the only 2 nodes partition creates two minority partitions, so the algorithm can't promote a new leader, nor make progresses since new log entries can't reach the majority. Basically you have the same properties using a single node without consensus, if not for the data redundancy. However it is not something like: one node dies, the other can continue. TLDR: 3 nodes is the practical minimum.


The RAFT paper is incredibly comprehensible, and clearly written with an audience of engineers in mind (it even specs out signatures for RPC calls). Definitely worth a read, if you have't read it already.

I saw a good talk about RAFT given by a Basho engineer, video and slides are here: http://www.meetup.com/Erlang-NYC/events/131394712/

The speakers Erlang implementation: https://github.com/andrewjstone/rafter


Have the authors seen [ http://en.wikipedia.org/wiki/Paxos_(computer_science) ](PAXOS)? It seems related.


PAXOS is very complicated, Raft is direct response to it by simplifying the consensus algorithm. From the article:

  "It's equivalent to Paxos in fault-tolerance and performance. The difference is that it's decomposed into relatively independent subproblems, and it cleanly addresses all major pieces needed for practical systems."
One of the authors of Raft is Prof. John Ousterhout(TCL creator, among other things)


Yes. They even reference it on their website:

"Raft is a consensus algorithm that is designed to be easy to understand. It's equivalent to Paxos in fault-tolerance and performance. The difference is that it's decomposed into relatively independent subproblems, and it cleanly addresses all major pieces needed for practical systems."




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: