> IMHO a process which is lossy should never be described as deduplication.
Depends. There are going to be some cases where files aren't literally duplicates, but the duplicates don't add any value -- for example, MOBI conversions of EPUB files, or multiple versions of an EPUB with different publisher-inserted content (like adding a preview of a sequel, or updating an author's bibliography).
Splitting those into two cases: I think getting rid of format conversions (which can, after all, be performed again) is worthwhile, but it isn't deduplication; that's more like pruning.
Multiple versions of an EPUB with slightly different content is exactly the case where a compression algorithm with an attention span, and some metadata to work with, can get the multiple copies down enough in size that there's no point in disposing of the unique parts.
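A quick sketch of the effect, using Python's stdlib `lzma` (whose dictionary is large enough to "remember" one whole copy while compressing the next). The byte strings here are hypothetical stand-ins for two EPUB payloads that share most of their content; real EPUBs are zip containers, so in practice you'd compress the extracted content, not the zips themselves:

```python
import random
import lzma

random.seed(0)
# Incompressible stand-in for the shared book text (~200 KB),
# plus a small publisher-inserted extra (e.g. a sequel preview).
book = bytes(random.getrandbits(8) for _ in range(200_000))
extra = bytes(random.getrandbits(8) for _ in range(5_000))

v1 = book           # original edition
v2 = book + extra   # revised edition with added content

# Compressing each version separately pays for the shared text twice.
solo = len(lzma.compress(v1)) + len(lzma.compress(v2))

# Compressing them in one stream lets the second copy be encoded
# almost entirely as back-references into the first.
together = len(lzma.compress(v1 + v2))

print(f"separate: {solo} bytes, combined: {together} bytes")
```

The combined stream comes out close to the size of a single copy plus the unique extra, which is the point: keep both versions, and let the compressor absorb the redundancy.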