You need to watch his video, very cool, it really helps to understand how this w...

chias · on May 13, 2020

There are a few things in there that are factually incorrect -- in particular, the false notion that "every input has a unique output" can be quite dangerous in some cryptographic settings.

That said, the purpose of this talk is about the mechanics of the function, and not its properties or how to use it safely. So don't let that detract from what is, really, an awesome presentation.

tridentboy · on May 13, 2020

I'm sorry, could you please elaborate? I was always under the assumption that hash functions have to be deterministic, and thus, that "every input has a unique output" was a correct statement.

AFAIK it's the converse that doesn't hold: "not every output is the result of one and only one input".

chias · on May 13, 2020

A function being deterministic means that any input will have a single output. But it is not unique for any hash function, SHA-256 included. The definition of a hash function is any function which takes an arbitrary length input and outputs an n-bit output for some fixed value of n. By virtue of the fact that you have infinite inputs and finite outputs, the outputs cannot be unique.

Generally when people make this claim, what they're actually referring to is what's called Collision Resistance (CR) and/or Weak Collision Resistance (WCR), which instead make claims on difficulty of finding such collisions (of which infinitely many exist).

WCR, necessary for almost any cryptographic use, states that for any given input it should be difficult to find a different input which hashes to the same value. CR, generally desirable for cryptographic hash functions, states that it should be difficult to find two different inputs such that their hashes are equal. CR implies WCR, but WCR does not imply CR -- for example, SHA-256 (currently) exhibits CR but SHA-1 only exhibits WCR.
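
To make the CR/WCR distinction concrete, here is a rough sketch on a toy hash. The 16-bit truncation of SHA-256 (`toy_hash`), the `m<i>` message format and both helper names are assumptions made up for illustration, not anything from the video. With an n-bit digest you would expect the second-preimage search to need on the order of 2^n tries, and the collision search only on the order of 2^(n/2).

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size (assumption), small enough that both searches finish quickly


def toy_hash(data: bytes) -> int:
    """Top BITS bits of SHA-256, standing in for an n-bit hash."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - BITS)


def second_preimage(target: bytes):
    """WCR game: given a fixed input, find a different input with the same hash."""
    want = toy_hash(target)
    for i in count():
        msg = b"m%d" % i
        if msg != target and toy_hash(msg) == want:
            return msg, i + 1  # the match and the number of tries


def collision():
    """CR game: find any two different inputs with the same hash."""
    seen = {}  # hash value -> first message that produced it
    for i in count():
        msg = b"m%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


m2, tries_wcr = second_preimage(b"hello")
x, y, tries_cr = collision()
print("second preimage tries:", tries_wcr)
print("collision tries:", tries_cr)
```

The attacker in the CR game gets to choose both inputs, which is why it is the easier game to win and why a function can lose CR while still keeping WCR.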

apeescape · on May 13, 2020

There are 2^256 potential outputs for SHA-256, while the number of potential inputs is infinite. Therefore, the same output can be generated with different inputs, although finding such "collisions" by chance is extremely unlikely.
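
A minimal sketch of the counting argument, shrunk to a size you can actually run: keep only the first byte of SHA-256, so there are 256 possible outputs, and hash 257 distinct inputs. The `tiny_hash` function and the choice of inputs are assumptions for illustration only.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """First byte of SHA-256: a toy hash with only 256 possible outputs."""
    return hashlib.sha256(data).digest()[0]


def find_forced_collision():
    """Hash 257 distinct inputs; with only 256 outputs, two must share a value."""
    seen = {}  # output value -> first input that produced it
    for i in range(257):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    raise AssertionError("unreachable: 257 inputs cannot fit in 256 outputs")


a, b, h = find_forced_collision()
print(a, b, h)
```

The same argument applies to full SHA-256 with 2^256 + 1 inputs; the only difference is that nobody can enumerate that many.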

surye · on May 13, 2020

The claim is not that every output has a unique input, which would not be correct, and seems to be what you are addressing.

chias · on May 13, 2020

At 1:08 in the video, that is exactly what he claims:

"So every piece of data in the world has its own unique hash digest."

This is false for the reasons apeescape describes: every piece of data in the world has its own hash digest, but these hash digests are not unique.

infogulch · on May 13, 2020

Yes that sentence is technically incorrect, but practically correct. We've never found a collision and though we expect it to be theoretically possible, even common if you consider "all possible inputs" and the pigeonhole principle, for practical purposes hash outputs are unique because nobody considers "all possible inputs" when evaluating probabilities.

I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. Because following that with "technically, it's more 'practically' unique, theoretically there are collisions but you won't encounter them with probability > 2^-256" (or whatever it is) just confuses the topic to them more than just summarizing. You have to admit that most people won't go on a 200h adventure to learn about the state space of 256+ bits and how to conceptualize tiny statistical probabilities, so there must be a point where you have to cut the explanation to an approximation of the truth. This is true in every field.

tialaramex · on May 13, 2020

I don't like to leave holes like this in people's comprehension. It's OK if people don't end up with an intuitive feeling for how relatively unlikely different things that don't actually happen are, but I want them to be aware of that category as distinct from things which can't happen because the type of argument needed is different.

The air molecules in the room you're in can't all gather in one corner because that's not possible, it's forbidden by conservation rules.

But they won't gather in two opposite corners only because that's so tremendously unlikely, it would be allowed by conservation but statistically it's ludicrous.

The same is true at the opposite end of the spectrum. Almost all real numbers are normal (in all bases), but the nature of "almost all" in mathematics is different in an important way from "all", and I want people to grasp this difference when I'm discussing properties of numbers. It definitely is not true that all real numbers are normal; in fact, you probably rarely think about any normal numbers at all.

infogulch · on May 13, 2020

> I don't like to leave holes like this in people's comprehension.

I agree. I think this wording would be better than in my previous comment, what do you think?

    it's reasonable to say that hash outputs are *almost surely* unique

chias · on May 14, 2020

> I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. [...] theoretically there are collisions but you won't encounter them

You could have said exactly the same thing about MD5 right up until you couldn't. Then you could have said "oh yeah well MD5 is broken, but it's safe to assume you'll never find one for SHA-1", right up until we did. So if you say "oh yeah well SHA-1 is broken, but it's safe to assume you'll never find one for SHA-256", I disagree.

It would be one thing if collisions in hash functions were found by just repeatedly hashing things until you find a collision. If that were the case, then yes, I'd agree with you on those 1-in-2^256 odds, at least for a while. But by and large, that's not what happens. Over time, weaknesses are found in algorithms which allow you to shrink the search space, which significantly changes your odds.
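
As a back-of-the-envelope illustration of how much the search space can shrink: the generic birthday attack on an n-bit hash costs about 2^(n/2) hash evaluations, while the 2017 SHAttered collision on SHA-1 is usually quoted at roughly 2^63 SHA-1 computations. That 2^63 figure is a rounded value quoted from memory, and the helper name is made up for this sketch.

```python
def generic_collision_work_bits(digest_bits: int) -> float:
    """log2 of the expected work for a generic birthday collision search."""
    return digest_bits / 2


SHA1_GENERIC_BITS = generic_collision_work_bits(160)  # 2^80 for a 160-bit digest
SHA1_ATTACK_BITS = 63  # rounded figure for the SHAttered attack (assumption, see above)

speedup_bits = SHA1_GENERIC_BITS - SHA1_ATTACK_BITS
print(f"SHA-1: generic 2^{SHA1_GENERIC_BITS:.0f}, attack ~2^{SHA1_ATTACK_BITS}, "
      f"shortcut factor ~2^{speedup_bits:.0f}")
print(f"SHA-256 generic bound today: 2^{generic_collision_work_bits(256):.0f}")
```

Nothing comparable is publicly known for full SHA-256, so its 2^128 generic bound is the current figure, not a guarantee about the future.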

chrisweekly · on May 14, 2020

Kind of agree w you, but still feel adding a few words by way of a disclaimer about collisions is much better than presenting as plain truth something that merely approaches it.

jhardy54 · on May 13, 2020

On the other hand, if we can count "every piece of data in the world" then we can estimate the probability of having a collision.
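
A rough sketch of that estimate, using the standard birthday approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)). It assumes hash outputs behave like uniform random values (so it says nothing about structural attacks), and the item count of 2^80 is a deliberately generous placeholder, not a measured number.

```python
from math import expm1, log2


def collision_probability(n_items: int, digest_bits: int) -> float:
    """Birthday approximation for at least one collision among n_items hashes."""
    pairs = n_items * (n_items - 1) // 2  # number of distinct pairs of items
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -expm1(-pairs / 2**digest_bits)


n = 2**80  # hypothetical count of "every piece of data in the world"
p = collision_probability(n, 256)
print(f"p is about 2^{log2(p):.1f}")
```

Under these assumptions the probability only becomes appreciable once the number of hashed items approaches 2^128.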

riquito · on May 13, 2020

I see what you mean, but it sounds like the output is unique, and we probably agree that in this field you need to use sentences that cannot be easily misinterpreted.

kebman · on May 13, 2020

That video is really, really awesome! And it won't leave you feeling "Japanese" either. (Which is a great people, btw. I'd really like to go there someday, mostly for the food and language and history. And Anime also, I'm forced to admit.)

kebman · on May 13, 2020

[flagged]

Phenomenit · on May 13, 2020

I think you're being down-voted because your comment doesn't really add anything to the discussion at hand.

kebman · on May 13, 2020

Oh, so this site punishes people for adding a positive remark. Great... I'll keep that in mind then. Anyway, thanks for letting me know!

Dylan16807 · on May 14, 2020

Just being positive and nothing else is what the upvote button is for. And turning your comment slightly gray is not really a punishment.

tomhoward · on May 15, 2020

Positive, supportive comments have always been welcome on HN.

Dylan16807 · on May 16, 2020

Being positive and supportive is a good quality, but it is not enough to make a comment good. Comments are supposed to have thought and substance too.

Shallow praise is better than a shallow dismissal, but not by enough.

dang · on May 16, 2020

That's too harsh, and not in the spirit of the site. Right from the beginning, pg made this distinction:

> Empty comments can be ok if they're positive. There's nothing wrong with submitting a comment saying just "Thanks." What we especially discourage are comments that are empty and negative—comments that are mere name-calling.

https://news.ycombinator.com/newswelcome.html

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

tomhoward · on May 16, 2020

I'm pretty sure I've seen dang or pg say that quick, low-effort comments that express enthusiasm for a comment or post are fine (whereas low-effort, drive-by dismissals are very much not fine). I've tried searching but it's not obvious what search terms would turn up such a comment.

But I hope my recollection about that is right. Yes we want comments to be substantive in general, but we don't want to be surly or even Grinch-like when someone is just expressing excitement and affirmation for someone else's contribution. I'm sure that's not what pg or dang would want here.