You can do the pre-rendering during the previous line if you can spare two buffers. Just render into a buffer, then latch it into a shift register during the blanking interval. That gives you a lot more time to render: roughly a full line period.
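In case it helps to see the timing, here is a rough software model of the scheme above. It is only a sketch, not real hardware: `WIDTH`, `HEIGHT`, `render_line` and `scan_frame` are made-up names, and the render step that would run in parallel with the shift-out on real hardware is simply done sequentially here.

```python
WIDTH = 8    # pixels per scanline (hypothetical value)
HEIGHT = 4   # visible scanlines (hypothetical value)

def render_line(y):
    """Stand-in renderer: produces a simple checker pattern for line y."""
    return [(x + y) % 2 for x in range(WIDTH)]

def scan_frame():
    """Model one frame; returns the pixels as shifted out, one list per line."""
    frame = []
    line_buffer = render_line(0)  # prefill before the first visible line
    for y in range(HEIGHT):
        # Blanking interval: latch the finished buffer into the shift register.
        shift_register = list(line_buffer)
        # Active period: the renderer fills the buffer for the next line
        # while the shift register is clocked out (sequential in this model).
        if y + 1 < HEIGHT:
            line_buffer = render_line(y + 1)
        frame.append([shift_register.pop(0) for _ in range(WIDTH)])
    return frame
```

The point of the copy at the latch step is that the renderer can overwrite `line_buffer` without disturbing the pixels already on their way out.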
You're right, and I think I'm actually doing that on my 60%-complete prototype. I haven't touched it for a couple of months and I've forgotten some of the details!