Hacker News

I have a suspicion that what you'll find is that Bing uses the toolbar to match $current_page_content with $clicked_page_content. When $current_page_content contains obscure words, those become the only signal, and so Bing's engine will naturally associate them with $clicked_page.

In other words, there's a relationship between Page A and Page B if there exists a link between them (== PageRank). But the strength of the relationship is increased by how many users click on that link. I think that's the information Bing was trying to capture (or if they weren't, they should have been).
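A minimal sketch of the idea in this comment: a link creates a baseline edge (as in PageRank), and each observed toolbar click strengthens it. All class and parameter names here are illustrative assumptions; nothing is known about Bing's actual internals.

```python
from collections import defaultdict

class ClickWeightedGraph:
    """Hypothetical: link existence gives a base edge weight,
    observed clicks add to it."""

    def __init__(self, base_weight=1.0, click_weight=0.1):
        self.base_weight = base_weight    # weight from the link existing at all
        self.click_weight = click_weight  # extra weight per observed click
        self.links = set()                # (source, target) pairs
        self.clicks = defaultdict(int)    # (source, target) -> click count

    def add_link(self, source, target):
        self.links.add((source, target))

    def record_click(self, source, target):
        # A toolbar user viewing `source` clicked through to `target`.
        self.clicks[(source, target)] += 1

    def strength(self, source, target):
        base = self.base_weight if (source, target) in self.links else 0.0
        return base + self.click_weight * self.clicks[(source, target)]

g = ClickWeightedGraph()
g.add_link("google.com/search?q=foo", "example.com/foo")
for _ in range(5):
    g.record_click("google.com/search?q=foo", "example.com/foo")
print(g.strength("google.com/search?q=foo", "example.com/foo"))  # 1.5
```

Under this model a popular referrer like Google dominates simply because it generates the most clicks, which matches the "unintentional side-effect" reading below.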

What I'm saying is that it's probably an unintentional side-effect. At scale though, the effect is that Bing gradually uses Google as a signal, simply because Google is a popular site.

edit: Yet another way of saying it: I think it's not just clicks on Google searches that are captured by Bing, but clicks anywhere. Google is a large site, so its influence on Bing is measurable. This is what we're seeing. Just my theory; I don't work in search.



Exactly. If they are just matching (even more simply) $search_term_entered to $clicked_link then you would expect that they are "copying" from any search engine configured in the toolbar.

Now the interesting thing to reverse engineer is what other information might be passed along to give relevance to the term/click pair. If Google could establish that there is a third piece of information in the tuple, such as "originating search domain", and that Bing uses it to weight term/click pairs based on the authority of the source, Google's claims would hold more water. I suspect that Bing has to apply some kind of validation to the term/click pairs (for instance, only accepting pairs where the clicked link actually appeared on a results page from an accredited engine); otherwise they would be subject to "Bing bomb" attacks where users or botnets vote up lower-ranked (or even unranked) links for a given term. (And if they don't validate or detect gaming, there would be ample opportunity to inject all kinds of synthetic behavior into Bing's search results. Based on the relatively small number of users and clicks it took to own a long-tail term, the protection they have seems very weak or simple.)
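The validation this comment speculates about could look something like the sketch below: accept a (term, clicked_url) pair only if the URL actually appeared on the results page of an accredited engine, and count each user's vote at most once per pair. The domain list, field names, and rules are all assumptions for illustration, not anything known about Bing.

```python
from collections import defaultdict

# Assumed list of "accredited" engines; purely illustrative.
ACCREDITED_ENGINES = {"google.com", "bing.com", "search.yahoo.com"}

class PairValidator:
    """Hypothetical filter on toolbar-reported (term, clicked_url) pairs."""

    def __init__(self):
        self.voters = defaultdict(set)  # (term, clicked_url) -> user ids seen

    def accept(self, user_id, engine_domain, term, clicked_url, urls_on_page):
        if engine_domain not in ACCREDITED_ENGINES:
            return False  # pair didn't originate from an accredited engine
        if clicked_url not in urls_on_page:
            return False  # clicked URL wasn't actually shown for this term
        seen = self.voters[(term, clicked_url)]
        if user_id in seen:
            return False  # each user's vote counts only once per pair
        seen.add(user_id)
        return True

v = PairValidator()
shown = ["example.com/a", "example.com/b"]
print(v.accept("u1", "google.com", "foo", "example.com/a", shown))  # True
print(v.accept("u1", "google.com", "foo", "example.com/a", shown))  # False
```

Even simple checks like these would raise the cost of a "Bing bomb" from a handful of clicks to a coordinated botnet, which is why the small number of clicks needed in the sting suggests weak or absent validation.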


This makes a lot of sense, and would have been easy enough for Google to test as well: create some tiny, brand new, never-before-heard-of test search engine that Bing would have no reason to copy, and see if the same thing happened.


That would be a nice way of testing it.

edit: I'm not even sure whether it's only search engines that Bing analyses, or all pages, but it's possible that it is just SEs - they could be capturing query terms distinctly.
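"Capturing query terms distinctly" could be as simple as the sketch below: a toolbar pulls the query out of any search-engine URL it recognises and treats everything else as an ordinary page. The query parameter names (`q` for Google and Bing, `p` for Yahoo) are real conventions; the rest is assumed for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Assumed map of known engines to their query-string parameter.
QUERY_PARAMS = {
    "www.google.com": "q",
    "www.bing.com": "q",
    "search.yahoo.com": "p",
}

def extract_query(url):
    """Return the search query if `url` is a recognised results page, else None."""
    parts = urlparse(url)
    param = QUERY_PARAMS.get(parts.netloc)
    if param is None:
        return None  # not a known search engine; treat as an ordinary page
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None

print(extract_query("https://www.google.com/search?q=hiybbprqag"))  # hiybbprqag
print(extract_query("https://example.com/page?q=foo"))              # None
```

If the toolbar does something like this, search-engine clicks would arrive with a clean query term attached, making them a much stronger signal than clicks from arbitrary pages.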


They also probably should have tested to see if it happened with results other than the ones in the #1 spot.


Also, why did the experiment succeed for only 6 or 7 of the 100 terms that they tried? There's more than meets the eye here, regardless of the hype and everyone jumping on the bandwagon.



