I suppose I could try to go to each linked article, find its RSS feed, and see whether it normally exposes the full content, but that seems quite complicated. I don't know if missed click-throughs are why pg doesn't do something similar for the existing HN RSS feed. I suspect that it's hard to make it always have the correct content. That's evident from the fact that my own page doesn't get parsed correctly. :)
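For what it's worth, the "find their RSS feed" step might look roughly like this. It's only a sketch: it assumes the site advertises its feed with a standard `<link rel="alternate">` tag in the page HTML, and `FeedLinkFinder` and `find_feeds` are names I made up for illustration. It doesn't fetch anything or check whether the feed carries full content.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        a = {name: (value or "") for name, value in attrs}
        rel = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and a.get("href"):
            self.feeds.append(a["href"])


def find_feeds(html):
    """Return feed URLs (possibly relative) advertised in an HTML document."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
        '<link rel="stylesheet" href="style.css">'
        '</head><body></body></html>'
    )
    print(find_feeds(page))
```

Sites that don't advertise a feed this way would just return an empty list, which is part of why the whole thing gets complicated.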
I'm not super familiar with pg's RSS feed but doesn't it just provide a link to the article? That doesn't bypass the click-through to the content provider. It encourages it, just like the HN homepage.
Right, so maybe that's why pg doesn't do something like this feed for the regular feed. Also, there's the fact that no heuristic will be 100% reliable in determining the actual content area.
That doesn't mean you shouldn't do it. I see you're a PhD student. What if I wrote a program that would let people republish your research in full without your prior consent? Yes, I could add a feature that limits this program to only working with research that's been approved for it, but that seems quite complicated.
This Readability service is wrong. I guess that's why it's popular?
I'm sorry that you mistook my description of the task difficulty as meaning that I wouldn't do something complicated. Like you said, I'm a PhD student, which incidentally means that I have many other demands on my time and do many difficult things every day.
"What if I wrote a program that would let people republish your research in full without your prior consent"
Nirmal, that Readability service is incredible. I tried it on my blog, and I hate to admit it, but it does look a lot better. One thing that makes it much easier to read is that it strips out the comments. I am not sure how I feel about this, but I suspect for 80% of readers (and likely many of the people who use the bookmarklet) the comments are just noise.
Similarly, stripping the navigation also makes it easier to read, but then it loses its value as a website.
I wonder if there is a Readability WordPress plugin that could display the Readability version in a JS pop-up overlay. I think if I can find one, I'll add that to my site.
Edit: I just tried it on DaringFireball. Beautiful.
Well, the traditional approach is to contact them and secure their permission. But as you alluded, that doesn't scale.
So perhaps there's a startup opportunity here, for a system that secures permission from people who are ok with their content being reproduced without their prior consent.
Perhaps there could be a service that maintains a list of web sites whose content is licensed under Creative Commons. Then, if someone was going to create a web app like this, they could check each linked site against that list and only include sites whose content has been licensed under Creative Commons.
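A rough sketch of the filtering side, assuming such a list existed and was just a set of hostnames. The hosts in `CC_LICENSED_HOSTS` are placeholders, and `is_allowed` and `filter_links` are made-up names; no such service is being described here.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hostnames whose owners have licensed their
# content under Creative Commons. These entries are placeholders.
CC_LICENSED_HOSTS = {"example.org", "blog.example.com"}


def is_allowed(url, allowed_hosts=CC_LICENSED_HOSTS):
    """Return True if the URL's host (ignoring a leading 'www.') is on the list."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in allowed_hosts


def filter_links(urls, allowed_hosts=CC_LICENSED_HOSTS):
    """Keep only the links whose content may be reproduced in full."""
    return [u for u in urls if is_allowed(u, allowed_hosts)]


if __name__ == "__main__":
    links = [
        "https://www.example.org/post/1",
        "http://news.example.net/story",
        "https://blog.example.com/article",
    ]
    print(filter_links(links))
```

Links that fail the check could still appear in the feed as a plain link to the article, so the click-through goes to the original site.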