At some point it might become useful to feed the audio into speech recognition, then feed the resulting text into a text-to-speech engine. You will lose all of the prosody and speaker characteristics, but blind people run their screen readers at crazy speeds, so the synthesized speech will stay intelligible even when sped up.
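
A rough sketch of that pipeline, in Python. Everything named here is hypothetical: `respeak`, the `Transcriber` and `Synthesizer` signatures, and the `rate` multiplier are placeholders for whatever speech recognition and text-to-speech engines you would actually plug in, not the API of any real library.

```python
from typing import Callable

# Hypothetical interfaces: wrap your real ASR and TTS engines to match these.
Transcriber = Callable[[bytes], str]         # audio in, text out
Synthesizer = Callable[[str, float], bytes]  # text and rate multiplier in, audio out


def respeak(audio: bytes, transcribe: Transcriber, synthesize: Synthesizer,
            rate: float = 2.0) -> bytes:
    """Replace the original speech with synthetic speech at `rate` times normal speed."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    text = transcribe(audio)       # prosody and speaker characteristics are lost here
    return synthesize(text, rate)  # the TTS voice sets the pace, not the original speaker
```

Passing the engines in as plain functions keeps the sketch independent of any particular recognizer or synthesizer; the default `rate` of 2.0 is an arbitrary starting point, not a recommendation.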