> Yes. I started writing my test program to see if hash
> collisions actually happen - and are not just a theoretical construct.
This should be no surprise. The birthday paradox, as Knuth points in in TAoCP,
makes it clear why collisions happen. Given 23 people in a room, there is a greater
that 50% chance that at least two have the same birthday.
Placing only 23 keys into a hash table with 365 entries with a perfectly
random hash function will probably result in at least one collision.
Placing only 23 keys into a hash table with 365 entries with
a perfectly random hash function will probably result in at
least one collision.
No, it won't "probably" result in at least one collision, it will result in at least one collision with probability about 1/2. As quoted elsewhere, placing k objects in a hash table with N entries will result in no collisions with probability about exp(-(k^2)/(2N)).
Interesting. I was going to complain, as I thought "probably" meant the probability of occurrence was greater than 50%.. That doesn't seem to be the case.
Probably: almost certainly; as far as one knows or can tell
I would guess waynecochran thought something similar.
Curious. I've just taken a straw poll in the office. I asked "if someone said something was probably going to happen, what probability would you give it?"
While mathematically correct, this does not seem to be how people use with that phrase. "X is probably going to happen" implies there is some evidence in favor of X, while "X is probably not going to happen" implies evidence against. Now, in regular speech, people do not feel compelled to force themselves to choose one or the other, but are just as likely to say "I have no idea", implying no evidence in either direction.
From an Information Theory point of view, 50% probability means zero information, and 25%/75% probabilities mean one bit of information against/in-favor-of the event. This sounds roughly consistent with the results of the poll, with people assigning higher probabilities implying higher information density of the evidence.
This should be no surprise. The birthday paradox, as Knuth points in in TAoCP, makes it clear why collisions happen. Given 23 people in a room, there is a greater that 50% chance that at least two have the same birthday.
Placing only 23 keys into a hash table with 365 entries with a perfectly random hash function will probably result in at least one collision.