Hacker News
Stackoverflow abuses nofollow (2011) (brianbondy.com)
126 points by nerfhammer on Nov 12, 2012 | 69 comments


This was dealt with more than a year ago, when we removed nofollow from links in trusted posts: http://meta.stackoverflow.com/questions/111279/remove-nofoll....

Would have been dealt with even earlier, but our experiments with it (http://meta.stackoverflow.com/a/69032/130213) led us to believe Google was starting to treat us as a link farm.

What we were seeing in our earlier attempt wasn't pages appearing lower in results (which could reasonably be expected in some cases as a result of the change; and we wouldn't care, 2nd on a search under an equally good resource is fine by us), but pages being removed entirely; almost universally pages that had newly un-nofollow-ed links to stores, or ad-laden sites (mostly legitimate posts, but some spam that stuck around for a bit before the community deleted it). So, "classified as link farm" seemed like the most likely problem. Google naturally won't tell you why your ranking drops, so it was (and remains) just an educated guess as to what happened.

So we stopped, putting it on the "wish we could, but reality doesn't let us"-list until Google reached out to us (among thousands of others, I'm sure) to change our nofollow practices. Google didn't describe any changes in their algorithm, but it seems reasonable to me that there would have been some tweaks around nofollow to accompany a new policy; again just an educated guess.

Basically, this is an old post complaining about a long since addressed concern that we had tried to address even earlier but ran into practical problems with.

While the exact details of our algorithm are secret by necessity, I will say that we've had to consider posts individually to prevent spammers from kiting a single account up to post nofollow-less spam content. People still try, it's kind of astounding how much spammers try (I suspect SEO's opaqueness cuts both ways here), but it doesn't work (well, you can get one link in your profile; but there's less SEO juice to pass and you are hard-capped at one, no matter how long it takes someone to delete your account).

Disclaimer: Stack Exchange employee. I was on all the relevant calls, but it has been a couple of years, so grain of salt and all that.


I'm the original author of the article.

I'm not sure why this made HN 2 years after posting it, but I wanted to make a comment.

SO has implemented a way to remove the nofollow links, but it is way too strict, and probably only affects a very very very small percentage of answers. I'd bet less than 0.1%.

For example see this answer with 74 upvotes from a user with almost 100k reputation. The links are to MSDN (which is probably not spam by definition) and to a quoted source on techbubbles.com. http://stackoverflow.com/questions/2660355/net-4-0-has-a-new...


Those should have been un-nofollow-ed (went and looked the algorithm up), so that's probably a bug.

I'll dig into it, should have a fix in the next deploy (assuming it is a bug, but I don't see what else it'd be).


Hm. I took a look at a few of my highly voted comments. Some of them have nofollow, some don't. I can't really tell why. For instance, this answer: http://stackoverflow.com/questions/1990464/efficiency-of-pur... has lots of links in it, citing sources, all of which are nofollowed. A few more of my lower voted answers do not have nofollow. But this one does: http://stackoverflow.com/questions/1946426/html-5-is-it-br-b.... As I go through my history, I find that most of my links are nofollowed; even ones that get more than 25 upvotes, have been around for years, and I'm within the top 1% of reputation site-wide: http://stackoverflow.com/questions/5724522/creating-github-r....

So, either the algorithm is really strict and fairly capricious, or it's quite badly broken.


You're trying to decide on your end how many upvotes/downvotes should trigger nofollow - have you talked to Google about a standard (which presumably sites like Reddit could also use) to expose that information so Google's crawlers can reach their own conclusions on a case-by-case basis if need be?


This was also the exact time of all the Google Panda tweaks and the fallout from Google starting to penalize creative commons scrapers of Stack Overflow content. There were a lot of balls in the air and only a few of them were ours, so we wanted to be careful.

When Google is 90% of your traffic, you REALLY REALLY REALLY do not want to get on their bad side, even accidentally, so I hope you can understand why we wanted to be cautious to the extreme here. If Google decides you're not doing things right, it is literally a business-ending move.

Unless your VCs and investors are cool with you losing, y'know, ninety percent of your traffic.


Somebody please remind the community about this post the next time someone says "You only have to care about Google algorithms if you're a fly-by-night spam site."


Of course, if/when there is a SO-penalizing bug in Stack Overflow or Google code, no-one will be able to fix it since they can't Google for Stack Overflow answers! ;)


I've seen this happen to other sites, something angers the Google algo and organic traffic falls off a cliff.

History seems to bear out that SO correctly identified the source of the "problem", and SO is such a valuable resource that I'd rather it be Google-visible over passing pagerank.

This is the problem when you have a single player dominating the search space with an opaque algo and a very limited appeals process.


I think "dealt with" is a strong way to phrase it.

The site still applies rel=nofollow to what seems like the majority of outgoing links. Since I would call the overwhelming majority of content on Stackoverflow not spam, there's a lot of good links getting nothing.


I guess I'm in the minority, but I'm not really buying that this is abusive.

Yes, it would be nice if SO removed the nofollow from known-good posts (which, indeed, it seems they are now doing), but adding nofollow is a pretty simple and reasonable way to make the site a whole lot less attractive to some of the most obnoxious sort of spammers. I would hope nobody is adding links to their SO answers with the expectation of an SEO benefit.


I thought that the entire point of nofollow was to apply it to links that come from user content so that spamming your site becomes less useful. Before this article, I never even heard of the idea that you're supposed to apply nofollow selectively based on your own judgment of how spammy a post was. I always thought it was a simple dichotomy: links created by users get nofollow, and links you create yourself don't.


That's typically the strategy for blogs. Links in a blog post are known good and don't have nofollow. Links in comments have nofollow to deter comment spam. That works for blogs where the site owner is creating the main content and the comments are secondary.

Sites like StackOverflow only have comments from users; there is no original content from the site owners (except inasmuch as they might participate as users). Using the blog model, it does seem reasonable to not use nofollow on content created by users who have already been vetted by your reputation system.


I agree that it does seem reasonable to be more granular with nofollow. But refraining from doing so and just applying nofollow to all user content hardly seems like abuse, even so.


StackOverflow has the philosophy that high-ranked users are just about indistinguishable from staff. So in that sense, high-ranked user profiles are similar to the staff page and the strict dichotomy of user-generatedness is blurred.


If there's a link in a stackoverflow answer of mine, it's there because there's really good information at the end of the link. That's the sort of information Google & Bing use to make their searches useful, it's the sort of information I want Google et al to know.


Google and Bing are free to decide to just disregard 'nofollow' tags on stackoverflow.com if they want.

Remember the tag was only put in place to help search engines in the first place. (Well, to stop people from spamming to get good search engine results. I think you get my point.)


I remember a court case where a website's terms of use precluded search engine bots from indexing the site. The fact that the site owner did not use the industry standard of the nofollow tag ultimately worked against them.

So yes google can disregard nofollow on a case by case basis and face the associated potential legal consequences. I am sure that this would not be a problem in the case of stackoverflow, but I am not sure this is true in general.


I'm going to assume you actually mean robots.txt instead of nofollow, since that's the only accepted way to automatically state "do not index this site" (or portions of it). The big difference here is that the site in question controls the content of robots.txt, while it's the owners of the sites with inbound links that control the presence of rel=nofollow.

As such, I don't think nofollow could have any possible legal consequences. Its intent is to indicate that you haven't vetted the links in question and as such are specifying a lack of trust in their content (i.e., user-submitted links); though like the article says, many people attempt to micromanage SEO through them, which dilutes their usefulness (I've been told we have some links that do the same thing; no doubt someone thinking they were smarter than Google... so now we have to track them down and undo these pointless additions).
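To make the robots.txt vs. rel=nofollow distinction above concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the "ExampleBot" user agent, and the example.com URLs are all made up for illustration; the point is only that robots.txt is published by the site being crawled, while rel=nofollow sits on a link in the page of the site doing the linking.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, published by the site that is being crawled.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The crawled site decides which of its own paths a bot may fetch.
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/page")
allowed = parser.can_fetch("ExampleBot", "https://example.com/public/page")

# rel=nofollow, by contrast, is written by the *linking* site on an outbound link.
outbound_link = '<a href="https://example.com/public/page" rel="nofollow">source</a>'

print(blocked, allowed)
```

Nothing in the second mechanism tells a crawler it may not fetch or index the target page; it only annotates how much the linking site vouches for it.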


I almost turned my head all the way upside-down to figure out how Google is under some kind of requirement to obey nofollow when deciding what pages to recommend.

"You can browse my pages but you can't get useful information from them" is bizarre. I'm not saying that no site anywhere will ever sue over it, because this is America and you can sue over anything.


"google can disregard nofollow on a case by case basis"

This is somewhat correct. While Google won't disregard noindex/nofollow directives (or robots.txt Disallow), it can, and will, show "blocked" pages in SERPs if the query is specific enough or the page is "strong" (lots of inbound links, strong social signals, etc.).

However, it will not show the page's own info (it hides the description and title) and uses alternative sources of information to fill those in instead.

For example: it can use DMoZ info instead of the real one.

Source: http://www.incapsula.com/the-incapsula-blog/item/395-new-sni...


They can disregard the nofollow tags, but they don't have as much information about the links as SO does.


Yes, I agree. StackOverflow is harming the quality of results in Google because they want people to filter through them for the information. Anyone that's used SO knows that answers can be misleading, incomplete or straight up misinformed. At the very least they're generally less comprehensive and less keyword-full than source material that would appear as links.

I suppose it's selfish of me to want SO to share them and selfish of SO to mark them as nofollow and not allow the source articles to rise in ranking.


Original topic is not explicitly about content, but speaking of SO's content:

> StackOverflow is harming the quality of results in Google because they want people to filter through them for the information.

Well, no, actually. StackOverflow may be shooting itself in the foot. SO cannot dictate the rules of the search game to Google, and Google adjusts its SERPs pretty quickly, especially nowadays, when user behavior has a bigger influence on SERPs with every ranking algo update. Poor quality content leads to a higher bounce rate and lower avg. time on site and visit depth, which will inevitably lead to lower ranking and less traffic for SO. This (and perhaps a thousand other factors) works for every site, and even more so for heavy traffic content projects like StackOverflow. So don't be afraid, the system will adjust itself.

Overall, I agree with your sentiment: I love SO, but nofollowing the Web looks kind of lame and selfish. After all, doesn't SO have its staff of moderators and admins to fight spammy comments?


Yeah right, that's going to scale. NOT. UGC spam can cause your site to tank; I have seen this happen to major authority sites (I work for a FTSE 100 publisher).

Our suspicion is that when we clamped down on the spammers by banning accounts, they reported our site for the spam that they had created. We lost 20% of our traffic. Major brands like us can tough it out, but this could kill smaller sites that don't have the runway to survive it or "friends" inside the wire at Google.

I have had to help a small company completely rename and start again on a new domain after someone hacked the site and inserted thousands of pages with links to porn sites.


Yes, I would be happy with the traffic from SO, though the SEO benefit from a "follow" link would not hurt. I have seen the tricks that spammers use to get follow links and am sure SO would be spammed like crazy if it allowed them.


I agree. The article incidentally misses the most obvious reason which is spam prevention.


Sites are free to overuse 'rel=nofollow', and search engines are free to selectively ignore it (for either link-discovery or ranking purposes).

The strongest point I see made by this author is about SO's hypocrisy: SO requires attribution to be with a link, and specifies that the link must not have 'rel=nofollow'. Yet their content relies heavily on references to elsewhere which are all 'rel=nofollow'ed.

A sense of fair play in attribution, and spirit of mutual assistance between reliable authorities, would suggest allowing at least some well-vetted outlinks to be unencumbered.


Quoting Jeff Atwood:

If you republish this content, we require that you: [...]

Hyperlink directly to the original question on the source site [...]

By “directly”, I mean each hyperlink must point directly to our domain in standard HTML visible even with JavaScript disabled, and not use a tinyurl or any other form of obfuscation or redirection. Furthermore, the links must not be nofollowed.

This is about the spirit of fair attribution. Attribution to the website, and more importantly, to the individuals who so generously contributed their time to create that content in the first place!

Anyway, I hope that clears up any confusion — feel free to remix and reuse to your heart’s content, as long as a good faith effort is made to attribute the content!

http://blog.stackoverflow.com/2009/06/attribution-required/


If the link to stackexchange was user-generated content I seriously doubt they would actually expect it to have nofollow.

The outgoing links to any site from the stackexchange blog do not have nofollow. Any content that they are explicitly curating does not have nofollow, any user content does. I think that is a reasonably consistent policy.


The author doesn't address this response by Jeff Atwood:

URGENT UPDATE We were seeing a significant drop in Google (organic) traffic for Server Fault after instituting this "follow links if enough upvotes post-edit or post-create" policy.

We traced it back to what we currently think are a string of posts on Server Fault that got nofollow removed through "trust", but were being interpreted by Google as link farms or spammy pages.

[....]

http://meta.stackoverflow.com/a/51156


There's a follow-up to the URGENT UPDATE on SO: http://meta.stackoverflow.com/questions/111279/remove-nofoll...

So it appears that they do remove rel=nofollow from reputable links, although their threshold for reputable appears to be very high.


How is that not the expected result?

When the sites that informed various well regarded answers on the sites get their due, their search engine rank rises. With such a well ranked site now pointing towards them, they rise in the results, often higher than the SO answer referencing it.

As a secondary issue, there's a relatively common black-hat SEO practice of buying a well regarded domain and soaking its links for all they're worth to promote something. Their switch-flipping may have run afoul of systems designed to mitigate that.


The author posted the original article on 17th Dec 2010 and an update on 11th Dec 2011.

So not sure how this made it to the front page of HN. Not surprised the author doesn't address the response.


Although I get the concern here, it would seem that if you removed the nofollows, it would open up the site to all kinds of abuse and diminish the value of the content very quickly. It doesn't strike me as a very difficult task to create a few thousand profiles and have them vote for each other (and also give a ton of votes to random people to hide better), and then use the collected power for SEO spam.

I agree with you that there has to be a better solution, but it doesn't strike me as a very trivial one... any thoughts on how one would approach this?


>It doesn't strike me as a very difficult task to create a few thousand profiles and have them vote for each other (and also give you a ton of votes to random people to hide better), and then use the collected power for SEO spam.

I have to disagree. I bet this kind of abuse is well studied by any major site relying on a reputation system/user content. I can't point to anything specific, but off the top of my head discovering that kind of abuse seems exactly like finding strongly connected components: http://en.wikipedia.org/wiki/Strongly_connected_component
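As a rough illustration of the strongly connected components idea above, here is a minimal sketch using Kosaraju's algorithm over a directed "who voted for whom" graph. The (voter, recipient) data shape, the function names, and the min_size cutoff are assumptions made up for this example; it says nothing about how Stack Overflow actually detects voting rings, and a real system would need far more signal than graph structure alone.

```python
from collections import defaultdict

def strongly_connected_components(votes):
    """Kosaraju's algorithm; votes is an iterable of (voter, recipient) pairs."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for voter, recipient in votes:
        graph[voter].append(recipient)
        reverse[recipient].append(voter)
        nodes.update((voter, recipient))

    # Pass 1: iterative DFS, recording nodes in order of completion.
    seen, order = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components

def suspicious_rings(votes, min_size=3):
    """Components where every account can reach every other through votes."""
    return [c for c in strongly_connected_components(votes) if len(c) >= min_size]

# Hypothetical votes: a, b and c upvote each other in a cycle; d and e do not.
example_votes = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "a")]
print(suspicious_rings(example_votes))
```

Legitimate users who answer in the same tag will also end up in large components, so a component like this is at best a starting point for closer inspection.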

So you could probably prevent this algorithmically, but even if you couldn't... the site is heavily moderated. Bad questions and answers are down voted or closed. In my experience, getting lots of reputation on SO actually requires a lot of persistent effort. I have just over 1000 rep myself and it felt like it took forever to get there. Even if you could get lots of upvotes, you'd still be looking at daily reputation limits. And finally, you'd waste all that on one or two spam links only to get perma-banned and have all your links deleted?

A combination of high reputation + an initial nofollow time period (to allow spam links to be discovered) seems like it would be pretty effective.
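The rule proposed above (high reputation plus an initial nofollow period) could be sketched like this. The function name, the 1000-reputation threshold, and the 30-day quarantine are purely illustrative assumptions, not values used by Stack Overflow.

```python
from datetime import datetime, timedelta

def link_rel(author_reputation, posted_at, now,
             min_reputation=1000, quarantine=timedelta(days=30)):
    """Return the rel value for links in a post: 'nofollow' or '' (followed).

    Thresholds are hypothetical. A link is followed only when the author is
    above the reputation bar AND the post has survived the quarantine window,
    which gives moderators time to delete spam before it passes any rank.
    """
    old_enough = now - posted_at >= quarantine
    trusted = author_reputation >= min_reputation
    return "" if (old_enough and trusted) else "nofollow"

now = datetime(2012, 11, 12)
print(repr(link_rel(5000, datetime(2012, 1, 1), now)))    # established author, old post
print(repr(link_rel(5000, datetime(2012, 11, 10), now)))  # established author, fresh post
print(repr(link_rel(50, datetime(2012, 1, 1), now)))      # new author, old post
```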

There's really only one route to taking advantage of the google juice: http://xkcd.com/810/


While I agree that it would be possible to get these networks exposed, it is still non-trivial effort. That's really the point I am making. Personally, I'd rather see them allocate more resources towards making developer QA even better, than chasing protective features.

My 1781 reputation was not that hard to come by - it happened organically with me trying to help out in the Google App Engine section. I'm not a developer by trade, just a hobbyist.

Based on that experience, I'd imagine there to be areas that are much more obscure, with little moderation oversight, being decent sources of easy reputation points...


"it would seem that if you removed the nofollows, it would open up the site to all kinds of abuse"

That's why the author of the article suggested that they remove "nofollow" for users who have an established reputation, like Slashdot does. That algorithm would seem to be very easy to implement.


Interestingly they already remove nofollow from links in your profile after you reach a certain karma threshold


What happens when a less established user edits the post of a more established one (and maybe also vice versa)?


You need a good amount of karma to edit someone else's posts, so that's a non-issue. If it was an issue, the rel=nofollow should be based on the rep of the user creating the link, whether that was the poster or the editor.


You need 2000 rep to do this, which is not so easy to gain, but certainly achievable for a highly determined person. However, what @wookietrader wrote applies -- each edit is marked with the person who has done it which solves the problem.


Anyone can edit. If your rep is low, it has to be approved.


Since you can check what user added what link (using diff), you can still use the reputation threshold. If that does not scale, just use the minimum reputation of the editors' reputations.
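Both options above can be sketched in a few lines. The revision format here (a chronological list of (editor_reputation, links_in_that_revision) pairs) and the 2000 threshold are assumptions for illustration only; a real implementation would work from actual diffs of the post body.

```python
def reputation_per_link(revisions):
    """Map each link to the reputation of the editor who most recently added it.

    revisions: chronological list of (editor_reputation, links) pairs
    (a hypothetical shape standing in for real revision diffs).
    """
    adder = {}
    previous = set()
    for reputation, links in revisions:
        current = set(links)
        for link in current - previous:  # links this revision introduced
            adder[link] = reputation
        previous = current
    return adder

def followed_links(revisions, threshold=2000):
    """Per-link policy: follow links in the latest revision added by trusted editors."""
    if not revisions:
        return set()
    adder = reputation_per_link(revisions)
    return {link for link in revisions[-1][1] if adder[link] >= threshold}

def all_editors_trusted(revisions, threshold=2000):
    """Coarse fallback: judge the whole post by its lowest-reputation editor."""
    return min((rep for rep, _ in revisions), default=0) >= threshold

# Hypothetical history: a 5000-rep author, then a 100-rep editor adds a second link.
history = [
    (5000, ["http://a.example"]),
    (100, ["http://a.example", "http://b.example"]),
]
print(followed_links(history))
print(all_editors_trusted(history))
```

The per-link version keeps the original author's link followed, while the coarse minimum-reputation fallback nofollows the whole post as soon as one low-reputation edit lands.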


As the article mentions; at a certain reputation level StackExchange users are granted a followed link in their profile. Similar metrics could be applied to questions and answers with the "nofollow" being removed from highly rated user content which is unlikely to be spam.

Still a hard problem but author reputation and question/answer votes give a good set of data from which we might identify links to sources which deserve attribution.


The article seems to argue that because there is no perfect solution to spam (i.e. spammers will still want their links on SO even if they are nofollowed), then there is no point in marginally reducing the incentive to spam.


I think the point is myopic. Human moderators will eventually clean up spam, but Google is freakishly fast at indexing things like StackOverflow. Omitting rel=nofollow would mean that SO ends up lending linkjuice to spammers at least some of the time, thereby providing spammers with an incentive to relentlessly spam even if their spam is cleaned up by humans.

My general rule of thumb is "any content that comes from untrusted users gets a nofollow stamp". It's "XSS prevention" for your pagerank.


But SO knows which users are trusted, and which aren't. So why is it adding rel=nofollow to trusted users?


When I say "trusted users", I mean "people who have the product's interests before their own". I don't think this applies to users, even high-reputation users.

Arguably, it eliminates incentive to link to anything except the most useful resource. I have >11k reputation on SO, and I'll easily admit that I'd be tempted to link to my blog or whatnot for an answer whenever I could wedge it in, if I could siphon off some of SO's linkjuice. I'm not interested in spamming or anything, but that's how the game is played; SO has such a massive amount of clout with Google that any non-nofollow links from it confer an awful lot of weight.

By just nofollowing everything, they disincentivize people from even being tempted to game the system and promote their own stuff, even if it were done in a helpful way. Instead, I am incentivized to provide the most useful, comprehensive link I can find to answer a question. All my incentive is in earning an acceptance of my answer, rather than just in having an answer good enough to not get spam-binned that sits there and feeds me pagerank.


I have seen reputable open source software developers put paid linkspam on their personal websites. When I called them out about it, they said "ha ha, I was wondering if anyone would notice that, yeah, they paid me a pretty good amount of money for that."

Trusted contributors could easily be bought off. Someone who was sick and tired of StackOverflow after putting way too much time into it, who was offered the right amount of money? Hey, you know, it might be worth it to make a few bucks instead of a few more meaningless internet points.

If you just make the threshold so high that only the very top contributors are exempt from nofollow, then only that small cabal controls where SO's considerable linkjuice goes. If you broaden your criteria, there are more and more potential corruptible people; and it becomes easier to generate a little quick and easy reputation with a sockpuppet that you can spam with a few times before you're caught and move on to another sockpuppet.


I'm confident that stackoverflow already utilizes voting ring detection logic. High karma on SO is quite valuable, I'm sure there are many that have already tried to game the system, people who are probably a lot better at evading the voting ring detection than the average spammer...


I'm curious how karma on SO is valuable. Not in the "you seem smart i'll give you a job" sense of value, but in the actual converted into dollars value that spammers go after.


It's valuable in the "I'm a leet hacker" sense.


This topic is discussed quite frequently in the meta pages on StackOverflow. For example, http://meta.stackoverflow.com/questions/111279/remove-nofoll... covers exactly this subject.

SO gets lots of googlejuice, and because it's gameable, they appear to want to give some of that googlejuice only to "reputable" links (which they do for links in high-rep users' profiles). That's a really hard thing to judge, but the fact that they're thinking about it, talking about it, and soliciting feedback on it means that they're trying to do the right thing, even if the article's author doesn't think their "right thing" is good enough for him.


I can understand why they do it - diminishing the page rank of sites that give answers to questions on Stack Overflow causes Stack Overflow to gain more traffic - but it still seems kind of underhanded. They rely on others for the content of the site, so they should give credit to the producers of that content - without it, they would have no service at all.


This nofollow stuff has always seemed very weird to me for several reasons.

- Search engines are supposed to be analyzing the web, figuring out what's important. They have the incentive to do this well. The websites they are analyzing do not. Websites have an incentive to nofollow everything. They might be worried about their own rankings, but why should they care about the sites they link to? Nofollow is safe and harmless. No-nofollow can get you in trouble. What does a site gain from not using nofollow links on everything?

- To enforce nofollow rules, Google are supposedly disciplining sites by hurting their organic rankings. But a page with nofollow links appears exactly the same to a user. It's just as right an answer as it was before. Is Google lowering the quality of its results to police the web?

- If Google are able to detect "link farms" and user generated content that should have been nofollow-ed, why don't they just treat those links as nofollow and ignore them. Use this detection to analyze the web, not police it.

- Are they ignoring important information sources? A huge, information-rich portion of the web is user generated, e.g. Wikipedia and StackOverflow. How can Google really be ignoring the links here as data sources? The links on a Wikipedia page, for example, are very informative. If some webpage is mentioned frequently in StackOverflow questions and answers, it's probably important and it's probably a good answer to a lot of questions people are asking Google.


How is this news?

1. This post is from 2011.

2. Google ended the ability to link sculpt a long time ago.

I would be interested in studies showing cases where link sculpting still worked.


Agree. The original post is actually from Dec 2010, with an update in Dec 2011!


I really don't see SO as being the problem here. I happen to think that secret SEO/page rank algorithms are the culprit in many ways. Google is in the unique position of being able to write a "how to behave" guide for the internet. Play by their rules and you are golden; start to violate them and you start to be "punished" in a gradual (but exponentially more severe) way. The whole time, full feedback and data on what you are doing wrong is provided. As this approach evolves, spammers would be less and less successful and might just have to resort to legitimate means of marketing.


Furthermore, the search engines could nip this kind of abuse in the bud, by ignoring the "nofollow" attribute of the links on a site if it determines that site is abusing it (e.g. if all or some high percentage of the links on the site have "nofollow").
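A rough sketch of the heuristic suggested above: measure what fraction of a page's links carry rel=nofollow and treat a very high ratio as a sign of blanket use. The class and function names and the 0.9 cutoff are invented for illustration; nothing here reflects how any search engine actually behaves.

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Counts <a href=...> tags and how many of them carry rel=nofollow."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" not in attributes:
            return
        self.total += 1
        # rel is a space-separated token list, e.g. rel="nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow += 1

def nofollow_ratio(html):
    counter = LinkCounter()
    counter.feed(html)
    return counter.nofollow / counter.total if counter.total else 0.0

def looks_like_blanket_nofollow(html, cutoff=0.9):
    """Hypothetical rule: nearly every link is nofollowed."""
    return nofollow_ratio(html) >= cutoff

sample_page = """
<a href="http://a.example" rel="nofollow">a</a>
<a href="http://b.example" rel="NoFollow noopener">b</a>
<a href="http://c.example">c</a>
"""
print(nofollow_ratio(sample_page))
```

A per-page ratio is a crude signal; the comment's idea is about site-wide behavior, so a crawler would presumably aggregate over many pages before drawing any conclusion.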


They're technically using it for its purpose, or they can claim they are even if they have a different motive. Stackoverflow allows links in comments and posts and this could be abused.


Sites with high pagerank that allow user submitted links are few and far between. If they didn't put nofollow (at least for new accounts), SO is popular enough that it would have to deal with targeted spam attacks aimed specifically at them. That is a sucky problem to deal with.


Will be abused sadly ;-(


I feel the exact same way about Wikipedia.


Especially since the revert brigade makes it very difficult to make any legitimate changes, let alone spammy ones, on any page that would be worth it.


This is an interesting reality. I know that some of the SO guys frequent HN. I wonder if we'll see a valid response.

The only possible reason I could see is spam avoidance.


Esprit d'escalier: "In my day (usenet), we didn't have links--and we liked it." (old SNL joke for those who don't get it).

This conversation has 2 angles I find very interesting. The first is the whole question of quality of the content. Google is a huge AI trying to give you the same quality of information that you'd select if you looked at the same (millions of) sources. At first, I liked netcan's answer[1], which says essentially, nofollow them all or none, Google will sort out its own. But the more I think about it, the more context that sites can provide automatically, the better. If a site can say "this content provided by untrusted source" (because we're a community-driven site, and we can't police everything), that's a help to Google's algorithm. Google is free to still follow the link and perhaps, using its other knowledge of the target, assign a trustworthiness score to the nofollow quality on the original site. If a good community is providing good links, but the website owner still marks them nofollow, that might become a badge of honor in Google's eyes (all-crawling robot tentacles?).

As a side note, I love how everybody, from slimy SEO to internet ethicist, is trying to guess Google's secret formula (it used to be Coca-Cola that had a secret formula that was relevant). Google is essentially the unknowable proto-God of this new information universe.

The other interesting topic is getting paid for content. My first reaction to the OP was "wah, wah, OP doesn't want to pay for his little corner of the internet." Let's face it, the internet doesn't run for free (hardware and electricity), and the content that attracts people doesn't write itself. StackOverflow provides a community for people to deliver their content into and receive social rewards. If SO can't do that because of some quirk of SEO, then it's likely it wouldn't be able to host the community anymore. The internet has still not figured out a way to pay for that other than advertising (the forcing of unwanted content on readers). But advertising only works if you can keep track of views and stay on top of the SEO game. I totally agree they're hypocrites for hoarding the follows (must follow to us, but we nofollow to everyone), but how's that different from a corporate board's fiduciary duty to shareholders to maximize the profit of the company? I also think cheald[2] makes a very good point about why the dynamics must be towards nofollow. The question is then how involved or committed are you to the community, and how do you feel rewarded for the time you contribute to it.

[1] http://news.ycombinator.com/item?id=4777260 [2] http://news.ycombinator.com/item?id=4774888


Someone have a mirror of the article?




