There are a few things in there that are factually incorrect -- in particular, the false notion that "every input has a unique output" can be quite dangerous in some cryptographic settings.
That said, this talk is about the mechanics of the function, not its properties or how to use it safely. So don't let that detract from what is, really, an awesome presentation.
I'm sorry, could you please elaborate? I was always under the assumption that hash functions have to be deterministic, and thus, that "every input has a unique output" was a correct statement.
AFAIK it's only the converse that fails: not every output is the result of one and only one input.
A function being deterministic means that any given input always produces the same single output. But that output is not unique to that input, for any hash function, SHA-256 included. The definition of a hash function is any function which takes an arbitrary-length input and produces an n-bit output for some fixed value of n. Because you have infinitely many inputs and only finitely many outputs, different inputs must share outputs, so the outputs cannot be unique.
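A minimal sketch of that pigeonhole argument, assuming a toy hash that keeps only the first byte of SHA-256 (the name `tiny_hash` is made up for illustration). With 256 possible outputs, any 257 distinct inputs are guaranteed to contain a collision, even though the function stays perfectly deterministic:

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Toy hash (illustration only): the first byte of SHA-256, so 256 outputs."""
    return hashlib.sha256(data).digest()[0]

# Deterministic: the same input always gives the same output.
assert tiny_hash(b"hello") == tiny_hash(b"hello")

# Pigeonhole: 257 distinct inputs into 256 possible outputs must collide.
buckets = {}
for i in range(257):
    msg = str(i).encode()
    buckets.setdefault(tiny_hash(msg), []).append(msg)

# Keep only the outputs that were produced by more than one input.
collisions = {h: msgs for h, msgs in buckets.items() if len(msgs) > 1}
print(len(collisions) >= 1)  # True, by the pigeonhole principle
```

The same argument applies to full SHA-256 with 2^256 + 1 inputs; it just isn't something you can enumerate.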
Generally when people make this claim, what they're actually referring to is what's called Collision Resistance (CR) and/or Weak Collision Resistance (WCR), which instead make claims about the difficulty of finding such collisions (of which infinitely many exist).
WCR, necessary for almost any cryptographic use, states that for any given input it should be difficult to find a different input which hashes to the same value. CR, generally desirable for cryptographic hash functions, states that it should be difficult to find two different inputs such that their hashes are equal. CR implies WCR, but WCR does not imply CR -- for example, SHA-256 (currently) exhibits CR but SHA-1 only exhibits WCR.
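To see why CR is the stronger requirement, here is a rough sketch of the two search games against a deliberately weakened hash. It assumes a toy 16-bit function (`h16`, a hypothetical name: the first two bytes of SHA-256) and plain brute force; real attacks on MD5 or SHA-1 do not work this way. Generically, a second-preimage search needs on the order of 2^n hashes, while a collision search needs only on the order of 2^(n/2) because of the birthday effect:

```python
import hashlib
from itertools import count

def h16(data: bytes) -> bytes:
    """Toy 16-bit hash (illustration only): first two bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]

def second_preimage(target_msg: bytes):
    """WCR game: given one fixed message, find a different one with the same hash."""
    target = h16(target_msg)
    for tries in count(1):
        candidate = b"try-%d" % tries
        if candidate != target_msg and h16(candidate) == target:
            return candidate, tries

def collision():
    """CR game: find any two different messages with the same hash."""
    seen = {}  # digest -> first message that produced it
    for tries in count(1):
        msg = b"msg-%d" % tries
        digest = h16(msg)
        if digest in seen:
            return seen[digest], msg, tries
        seen[digest] = msg

fixed = b"pay Alice 10 dollars"
other, wcr_tries = second_preimage(fixed)
m1, m2, cr_tries = collision()
print("second preimage found after", wcr_tries, "hashes")
print("collision found after", cr_tries, "hashes")
```

On a 16-bit hash you'd expect the first count to be in the tens of thousands and the second in the hundreds, though the exact numbers depend on the inputs tried.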
There are 2^256 potential outputs for SHA-256, while the number of potential inputs is infinite. Therefore, the same output can be generated with different inputs, although finding such "collisions" by chance is extremely unlikely.
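For a sense of "extremely unlikely", here is a small sketch using the standard birthday approximation, p ≈ 1 - exp(-k(k-1) / 2^(n+1)), for k random inputs and an n-bit hash. It assumes the hash behaves like an ideal random function, which says nothing about structural weaknesses; `collision_probability` is a made-up helper name:

```python
import math

def collision_probability(k: int, n_bits: int) -> float:
    """Approximate chance of at least one collision among k random inputs
    to an ideal n_bits-bit hash (birthday approximation)."""
    exponent = k * (k - 1) / 2 ** (n_bits + 1)
    # expm1 keeps precision when the probability is astronomically small.
    return -math.expm1(-exponent)

# 2**64 hashes is already far beyond casual computation.
print(collision_probability(2**64, 256))
# Around 2**128 hashes the chance becomes substantial (roughly 39%).
print(collision_probability(2**128, 256))
```

So "by chance" only gets interesting around 2^128 hashes, not 2^256, which is why SHA-256 is usually quoted as offering 128 bits of collision resistance.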
Yes, that sentence is technically incorrect, but practically correct. We've never found a SHA-256 collision, and though we know collisions must exist, and are even common if you consider "all possible inputs" and the pigeonhole principle, for practical purposes hash outputs are unique because nobody considers "all possible inputs" when evaluating probabilities.
I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. Following that with "technically, it's more 'practically' unique, theoretically there are collisions but you won't encounter them with probability > 2^-256" (or whatever it is) just confuses the topic for them more than summarizing would. You have to admit that most people won't go on a 200h adventure to learn about the state space of 256+ bits and how to conceptualize tiny statistical probabilities, so there must be a point where you have to cut the explanation to an approximation of the truth. This is true in every field.
I don't like to leave holes like this in people's comprehension. It's OK if people don't end up with an intuitive feeling for how relatively unlikely different things that don't actually happen are, but I want them to be aware of that category as distinct from things which can't happen because the type of argument needed is different.
The air molecules in the room you're in can't all gather in one corner because that's not possible; it's forbidden by conservation rules.
But they won't gather in two opposite corners only because that's so tremendously unlikely; it would be allowed by conservation, but statistically it's ludicrous.
The same is true at the opposite end of the spectrum. Almost all real numbers are normal (in all bases), but the nature of "almost all" in mathematics is different in an important way from "all", and I want people to grasp this difference when I'm discussing properties of numbers. It definitely is not true that all real numbers are normal; in fact, you probably rarely think about any normal numbers at all.
> I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. [...] theoretically there are collisions but you won't encounter them
You could have said exactly the same thing about MD5 right up until you couldn't. Then you could have said "oh yeah well MD5 is broken, but it's safe to assume you'll never find one for SHA-1", right up until we did. So if you say "oh yeah well SHA-1 is broken, but it's safe to assume you'll never find one for SHA-256", I disagree.
It would be one thing if collisions in hash functions were found by just repeatedly hashing things until you find a collision. If that were the case, then yes, I'd agree with you on those 1-in-2^256 odds, at least for a while. But by and large, that's not what happens. Over time, weaknesses are found in algorithms which allow you to shrink the search space, which significantly changes your odds.
Kind of agree with you, but I still feel that adding a few words by way of a disclaimer about collisions is much better than presenting as plain truth something that merely approaches it.
I see what you mean, but that wording makes it sound like the output really is unique, and we probably agree that in this field you need to use sentences that cannot be easily misinterpreted.
That video is really, really awesome! And it won't leave you feeling "Japanese" either. (Which is a great people, btw. I'd really like to go there someday, mostly for the food and language and history. And Anime also, I'm forced to admit.)
That's too harsh, and not in the spirit of the site. Right from the beginning, pg made this distinction:
Empty comments can be ok if they're positive. There's nothing wrong with submitting a comment saying just "Thanks." What we especially discourage are comments that are empty and negative—comments that are mere name-calling.
I'm pretty sure I've seen dang or pg say that quick, low-effort comments that express enthusiasm for a comment or post are fine (whereas low-effort, drive-by dismissals are very much not fine). I've tried searching but it's not obvious what search terms would turn up such a comment.
But I hope my recollection about that is right. Yes we want comments to be substantive in general, but we don't want to be surly or even Grinch-like when someone is just expressing excitement and affirmation for someone else's contribution. I'm sure that's not what pg or dang would want here.