Go has a nice set of polished libraries now for distributed computing (hashicorp/raft, etc.) that make it easy to start new projects. I wish other languages had as many; I might have to start porting.
Raft is incredibly easy to write correctly, with tests, since basically everything you need is in Figure 2. I've done it once, but I lost all of the code: I got mugged, my computer was stolen, and I had forgotten to make a GitHub repo. I'm rebuilding it now (the current version clocks in at about 700 SLOC, including tests, and it's about 3/4 done).
Raft is not easy to implement. People think it is. But that’s only because they have implemented the happy paths, which are hard enough to get right as it is.
The alternative to the happy paths would be all the unexpected things that might happen. They would be easy to implement only if we could foresee them. I think the difficulty comes from our limited ability to predict all the situations where things may go wrong.
A classic example of this is when, in 2006, it was discovered that nearly all implementations of binary search and mergesort were broken, more than 50 years after these algorithms were invented and after they had been implemented thousands of times by the brightest minds in computer science.
Joshua Bloch (of Java fame) blogged about this [1], and here is the main takeaway:
> The key lesson was to carefully consider the invariants in your programs.
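For anyone who hasn't seen the bug: as far as I recall, it is the midpoint calculation overflowing a fixed-width integer once the array gets large enough. Here is a minimal Go sketch of it. The function names (`midNaive`, `midSafe`, `search`) are mine, purely for illustration, and I use `int32` so the wraparound shows up with small numbers.

```go
package main

import "fmt"

// midNaive computes the midpoint the way many textbook binary searches do.
// With fixed-width integers, low+high can wrap around to a negative number.
func midNaive(low, high int32) int32 {
	return (low + high) / 2
}

// midSafe avoids the overflow: high-low always fits when 0 <= low <= high.
func midSafe(low, high int32) int32 {
	return low + (high-low)/2
}

// search is an ordinary binary search over a sorted slice using midSafe.
// It returns the index of target, or -1 if target is absent.
func search(sorted []int32, target int32) int32 {
	low, high := int32(0), int32(len(sorted))-1
	for low <= high {
		mid := midSafe(low, high)
		switch {
		case sorted[mid] < target:
			low = mid + 1
		case sorted[mid] > target:
			high = mid - 1
		default:
			return mid
		}
	}
	return -1
}

func main() {
	// Two indices that are each valid int32 values but whose sum is not.
	low, high := int32(1<<30), int32(1<<30+2)
	fmt.Println(midNaive(low, high)) // negative: the sum wrapped around
	fmt.Println(midSafe(low, high))  // lands between low and high
	fmt.Println(search([]int32{1, 3, 5, 7}, 5))
}
```

The invariant that matters is `0 <= low <= mid <= high`; the naive version silently breaks it, which is exactly the kind of thing a happy-path test never hits.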
Raft is easy to understand. It may be easier to write than, e.g., Paxos, but that's it.
The consensus module and log are not very useful on their own. You also need integration with multiple concurrent clients and some state machine (even a key-value store will be complicated). This is still hard and doesn't get any easier with Raft.
It might be simple to implement the various operations, but it's not easy to get them perfect and reliable. I'd rather have a production-ready library used by major projects than something I wrote myself.
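To make the "even a key-value store" point concrete, here is a rough Go sketch of one complication: a client that times out and retries must not have its write applied twice. Everything here (`Command`, `KVStore`, the per-client sequence number scheme) is a hypothetical illustration of one common approach, not the API of hashicorp/raft or any other library, and it assumes commands arrive already committed and in log order.

```go
package main

import "fmt"

// Command is a hypothetical log entry payload: one client write.
type Command struct {
	ClientID string
	Seq      uint64 // per-client sequence number, assumed to start at 1 and increase
	Key      string
	Value    string
}

// KVStore is the state machine that committed commands are applied to.
type KVStore struct {
	data    map[string]string
	lastSeq map[string]uint64 // highest Seq applied so far, per client
}

// NewKVStore returns an empty store.
func NewKVStore() *KVStore {
	return &KVStore{data: map[string]string{}, lastSeq: map[string]uint64{}}
}

// Apply applies a committed command at most once.
// It returns false when the command is a duplicate, such as a client retry.
func (s *KVStore) Apply(c Command) bool {
	if c.Seq <= s.lastSeq[c.ClientID] {
		return false
	}
	s.data[c.Key] = c.Value
	s.lastSeq[c.ClientID] = c.Seq
	return true
}

// Get reads a key from the current state.
func (s *KVStore) Get(key string) (string, bool) {
	v, ok := s.data[key]
	return v, ok
}

func main() {
	kv := NewKVStore()
	write := Command{ClientID: "c1", Seq: 1, Key: "x", Value: "1"}
	fmt.Println(kv.Apply(write)) // first delivery is applied
	fmt.Println(kv.Apply(write)) // a retry of the same request is ignored
	fmt.Println(kv.Apply(Command{ClientID: "c2", Seq: 1, Key: "x", Value: "2"}))
	fmt.Println(kv.Get("x"))
}
```

And this is only deduplication. Linearizable reads, snapshots, and session expiry are all still on you.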
Wouldn't it be nice if this were actually part of the OS networking stack?
Wouldn't that make it easier to write correct (for some form of correctness) distributed applications,
leaving the messy details to a proven lower-level stack?
I wrote my master's thesis on putting Raft inside the RPC layer for exactly that reason. We concluded that this was indeed a very good way to easily provide distributed consensus to the application layer.
Ah, nice. My first introduction to systems with replayable command logs at their heart was Prevayler, and it really changed the way I think about system design for the better. I'm excited to try this out.
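For anyone unfamiliar with the replayable-command-log idea, here is a minimal Go sketch of it: state lives in memory, every change is first appended to a journal as a command, and the state can be rebuilt at any time by replaying the journal from the start. The names (`Deposit`, `Bank`, `Execute`, `Replay`, `roundTrip`) and the JSON-lines journal format are assumptions made for illustration; this is not how Prevayler or any Raft library actually stores its log.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Deposit is a hypothetical command; a real system would have several kinds.
type Deposit struct {
	Account string `json:"account"`
	Amount  int64  `json:"amount"`
}

// Bank is the in-memory state. It only ever changes by applying commands.
type Bank struct {
	balances map[string]int64
}

// NewBank returns an empty Bank.
func NewBank() *Bank { return &Bank{balances: map[string]int64{}} }

// apply is deterministic: the same commands in the same order give the same state.
func (b *Bank) apply(d Deposit) { b.balances[d.Account] += d.Amount }

// Execute appends the command to the journal first, then applies it in memory.
func (b *Bank) Execute(journal io.Writer, d Deposit) error {
	line, err := json.Marshal(d)
	if err != nil {
		return err
	}
	if _, err := journal.Write(append(line, '\n')); err != nil {
		return err
	}
	b.apply(d)
	return nil
}

// Replay rebuilds a Bank from scratch by re-applying every journaled command in order.
func Replay(journal io.Reader) (*Bank, error) {
	b := NewBank()
	sc := bufio.NewScanner(journal)
	for sc.Scan() {
		var d Deposit
		if err := json.Unmarshal(sc.Bytes(), &d); err != nil {
			return nil, err
		}
		b.apply(d)
	}
	return b, sc.Err()
}

// roundTrip runs two deposits, then rebuilds the state from the journal alone.
// It returns alice's balance in the live state and in the rebuilt state.
func roundTrip() (int64, int64, error) {
	var journal bytes.Buffer
	bank := NewBank()
	for _, d := range []Deposit{
		{Account: "alice", Amount: 100},
		{Account: "alice", Amount: 25},
	} {
		if err := bank.Execute(&journal, d); err != nil {
			return 0, 0, err
		}
	}
	rebuilt, err := Replay(bytes.NewReader(journal.Bytes()))
	if err != nil {
		return 0, 0, err
	}
	return bank.balances["alice"], rebuilt.balances["alice"], nil
}

func main() {
	live, replayed, err := roundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(live, replayed) // the two balances should match
}
```

A replicated log is the same trick with the journal agreed on by several machines instead of written to one disk, which is why `apply` being deterministic matters so much.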