Good article, but there are a lot of claims of '100% certainty' that aren't necessarily true. The author even states that hash functions only guarantee no collisions to a high probability.
It looks like there are about as many Earth-like planets in the universe as grains of sand on the Earth. Write your name on a grain of sand on one of those planets. Now have someone else randomly pick a single grain of sand from some planet in the universe. How certain are you that they won't pick yours?
The chance of that happening is roughly equal to the chance of a collision randomly occurring somewhere in a few quadrillion SHA256 hashes.
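To put rough numbers on the analogy (the grain and planet counts are order-of-magnitude guesses, and the collision formula is the standard birthday-bound approximation, p ≈ n(n−1)/2^257 for n hashes), a quick sketch:

```python
import math

# Hedged back-of-envelope estimates -- both are order-of-magnitude guesses.
GRAINS_OF_SAND_ON_EARTH = 7.5e18   # common rough estimate
EARTHLIKE_PLANETS = 7.5e18         # "about as many planets as grains of sand"

# Chance a randomly picked grain, out of all grains on all those planets,
# is the one with your name on it.
p_grain = 1.0 / (GRAINS_OF_SAND_ON_EARTH * EARTHLIKE_PLANETS)

# Birthday-bound chance of at least one collision among n random
# 256-bit hashes: p ~= n*(n-1) / 2^257 (valid when p is small).
n = 4e15  # "a few quadrillion" hashes
p_collision = n * (n - 1) / 2**257

print(f"grain pick:     {p_grain:.1e}")
print(f"hash collision: {p_collision:.1e}")
```

The two probabilities differ by some orders of magnitude depending on which estimates you plug in, but the point of the analogy holds either way: both are vanishingly small.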
Yes, the chance that the hash will be broken is much higher than the chance of collisions occurring randomly. I'm just responding to "hash functions only guarantee no collisions to a high probability." People really underestimate how strong that probabilistic guarantee is.
As long as the hash function remains unbroken, untrusted sources can't screw with you.
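To make that concrete: in a content-addressed system the address *is* the hash, so the consumer re-hashes whatever bytes arrive and rejects anything that doesn't match. A minimal sketch of that verification pattern (the names here are illustrative, not any particular system's API):

```python
import hashlib

def verify_content(expected_hex: str, data: bytes) -> bool:
    """Accept data from an untrusted source only if it hashes to the address."""
    return hashlib.sha256(data).hexdigest() == expected_hex

# The address is derived from the content itself...
original = b"hello, content-addressed world"
address = hashlib.sha256(original).hexdigest()

# ...so genuine bytes pass and tampered bytes fail, no trust required.
assert verify_content(address, original)
assert not verify_content(address, b"tampered bytes!")
```

The only way an untrusted source can serve you bad data under a given address is to find a second input with the same SHA-256 digest, i.e. to break the hash function.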
Hash functions tend to be broken gradually and publicly, and we migrate to new ones as they start to look shaky. It's theoretically possible for someone to privately break a function that everyone else thinks is secure, but it would be an extremely impressive achievement since lots of full-time cryptographers work on breaking these things and publish every little bit of progress they make.
Let's suppose everyone has moved to content addressing. Considering the amount of content generated every day, how much time would it take before real hash collisions start to emerge?
The number of 256-bit hashes you would have to generate in order to have a 50% chance that there's a duplicate in there is ~4*10^38. If we had a billion machines each generating a billion new hashes a second, it would take over 12 trillion years to get that many.
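The ~4*10^38 figure is the standard birthday-bound estimate, n ≈ sqrt(2 ln 2 · 2^256). A quick check of the arithmetic (assuming the stated rate of 10^18 hashes per second, i.e. a billion machines at a billion hashes each):

```python
import math

BITS = 256
# Number of random hashes needed for a 50% chance of at least one
# collision (birthday bound): n ~= sqrt(2 * ln(2) * 2^bits).
n_50 = math.sqrt(2 * math.log(2) * 2**BITS)
print(f"hashes for 50% collision odds: {n_50:.2e}")   # ~4.0e38

rate = 1e9 * 1e9                  # a billion machines * a billion hashes/sec
seconds = n_50 / rate
years = seconds / (365.25 * 24 * 3600)
print(f"years at 10^18 hashes/sec: {years:.2e}")      # ~1.3e13
```

That works out to on the order of 10^13 years, roughly a thousand times the current age of the universe.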
I don't assume computing power is constant, just that the rate of content generation is more or less constant (I think a billion devices publishing a billion new pieces of content every second is a pretty reasonable upper bound). OP asked about the odds of collisions occurring by accident due to the sheer volume of content generated and published, not about attacker scenarios.
Obviously we wouldn't use the same hash algorithm and setup for trillions of years, but the sheer absurdity of that length of time at that pace of content production shows this method will last, at least until flaws are found in SHA256.
The 50% chance he's referring to is the chance that, out of all the hashes that exist, there will be two hashes that match. He's saying that in his scenario it would be trillions of years before there's even a single duplicate.