> Redistribution and use in source and binary forms, without modification,
is permitted for the sole purpose of using the "tarsnap" backup service
provided by Tarsnap Backup Inc.
The codebase is a jewel. I love the design, the way it's organized, the coding style, the algorithms, everything.
Then I started making a mental map of tarsnap: How does it build its deduplication index? How does it decide where block boundaries start within a file? Etc.
Eventually I started coding the algorithms in Python, mostly as a way of understanding the code. It's not actually as hard as it sounds, but you have to be rigorous. (It's a C -> Python conversion, after all, so there's not much room for error.)
My process was basically: Copy the C code into a Python file; comment out the code; for each line, write the corresponding Python; try to get something running as quickly as possible.
It worked pretty well, but I eventually lost interest.
Over the years, I've wanted a deduplication library, and 2021 is no exception. Someday I'll just roll up my sleeves and finish porting it.