Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Kinda pointless that they're hashing the urls, as to tell if that url was malicious google/tencent would have had to generate that hash themselves as well. The associations are already there, so the hashing step adds nothing (except to prevent a MITM attack where the bad actor didn't have a lookup table).

If Apple wants to create a privacy-oriented safe browsing experience, they need to ship that dataset with the phones themselves.



By using the prefix of the hash, it adds some uncertainty about which URL you're requesting


This assumes there can be many prefix matches for any given url hash.

How many times (probability) do you think a 32-bit hash prefix matches more than one 256-bit hash? Very very unlikely; it's one-to-one practically speaking.


Nearly 100% probability. Someone else[0] already did the ballpark math and came up with ~10,000 unique URLs per 32-bit hash prefix if you use the number of pages indexed by Google. My math has it coming out to be ~10 billion per 32-bit hash prefix. I simply did: 30,000,000,000,000/(36*8), but I might be doing something wrong, happy to be corrected.

The issue is, as mentioned in that comment, with visiting a handful of pages you can get a clearer picture of similar URLs that are being returned among these hash prefixes, which can be used to build a profile of your browsing history.

[0]: https://news.ycombinator.com/item?id=21255223


Why are you doing 36*8? It look likes 26 + 10?

You got 32 bits, thus 2^32 = 4 294 967 296 possibilities

30 000 000 000 000 / 4 294 967 296 = 6984.9193

That doesn't seems to me like a 100% probability at all.


You're right, I was using 0-9A-Z, when it should have been 0-9A-F. I still need some coffee :)

So for 6,984 URLs per 32-bit hash, wouldn't that be evenly distributed since it's the result of a hashing function? Therefore we'd expect fairly close to 6,900 URLs per prefix? In what situation would you expect a 1-to-1 of 32-bit prefix to URL? Note: happy to be disproven, this is not my specialty at all.


> when it should have been 0-9A-F.

We are working in bits, use bits instead, that will avoid theses kinds of mistakes.

> In what situation would you expect a 1-to-1 of 32-bit prefix to URL?

Oh yeah sorry I misunderstood it, yeah it's pretty unlikely that you would get 1 url for a prefix(but still possible). I would have to get out my old probability books to find that out but it's not worth it, the probability would be way too tiny.

I thought it was about certainty to be able to match it with the real URL. In theory it would takes only a few page hit to be certain of the domain and thus the URL (if there's no unknown string in the URL).


> We are working in bits, use bits instead, that will avoid theses kinds of mistakes.

K.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: