I believe the main library behind reader mode is called Readability. I played around with a Python implementation a while back: you just pipe in your raw HTML as one step of the process. It's good, but not flawless. If I remember correctly, it included some quotes and image text as part of the body for the site I tried it on.
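
To give a rough idea of what "pipe in raw HTML, get the body text out" looks like, here is a toy sketch using only Python's standard library. It is not the readability library and not its algorithm, just a crude stand-in I made up for illustration: it keeps `<p>` text and skips a few boilerplate containers. The names `ParagraphCollector`, `extract_text`, and `SAMPLE` are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Toy extractor: collect <p> text outside obvious boilerplate tags."""

    # Assumption: these containers rarely hold article text.
    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped container
        self.in_p = False     # True while inside a kept <p>
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_p:
            # Collapse runs of whitespace before storing the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self._buf.append(data)


def extract_text(raw_html: str) -> str:
    """Return the kept paragraphs joined by blank lines."""
    parser = ParagraphCollector()
    parser.feed(raw_html)
    parser.close()
    return "\n\n".join(parser.paragraphs)


SAMPLE = """
<html><body>
  <nav><p>Home | About</p></nav>
  <article>
    <p>Reader mode keeps the main text.</p>
    <blockquote><p>A pull quote sneaks in.</p></blockquote>
  </article>
  <footer><p>Copyright notice</p></footer>
</body></html>
"""

print(extract_text(SAMPLE))
```

Even in this toy version the pull quote ends up in the output alongside the real paragraph, which is the same kind of leak I remember seeing. A real extractor scores and prunes content far more carefully, so treat this only as a picture of the input and output shape.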