
Author here; I'll check this every so often and try to answer any questions…


Very cool project, and good job on releasing a new version! Libraries like this are a huge amount of work.

One question: compared to the native text-to-speech on macOS, the synthesized speech sounds, for lack of a better word, robotic. Is this an inherent property of the approach you used, or a result of trying to squeeze something as complex as this into a JavaScript library?


This is based on eSpeak [1] for Un*x environments, which in turn is based on an application for Acorn/RISC OS. So, yes, it's quite dated. On the other hand, it's lean enough to run in real time when compiled with Emscripten...

However, all the configuration data, including the phoneme tables, may be overridden (but you would have to install eSpeak on your machine first in order to compile these).

Another approach would be to actually port this to JS (instead of cross-compiling), which would give full access to the internals. But I simply do not have the resources for this. (Meanwhile, there's the Web Speech Synthesis API. With this being available on most modern clients, it's probably not worth the effort.)

[1] http://espeak.sourceforge.net/
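For anyone curious, the Web Speech Synthesis API mentioned above is a standard browser API. Here's a minimal sketch; the `pickVoice()` helper is my own illustration, not part of any library, and the snippet is guarded so it's a no-op outside a browser.

```javascript
// Select the first available voice matching a BCP 47 language prefix, if any.
// Hypothetical helper for illustration only.
function pickVoice(voices, langPrefix) {
  return voices.find(function (v) {
    return v.lang && v.lang.indexOf(langPrefix) === 0;
  }) || null;
}

// Browser-only: speechSynthesis does not exist in Node.js.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  var utterance = new SpeechSynthesisUtterance("Hello from the browser.");
  var voice = pickVoice(window.speechSynthesis.getVoices(), "en");
  if (voice) utterance.voice = voice;
  utterance.rate = 1.0;   // 0.1 .. 10, default 1
  utterance.pitch = 1.0;  // 0 .. 2, default 1
  window.speechSynthesis.speak(utterance);
}
```

Note that `getVoices()` may return an empty list until the browser has loaded its voices; listening for the `voiceschanged` event handles that.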


I don't have a question, just wanted to say I found the Stereo Spanning example to be a genuine piece of art. The choice of voice and script were truly excellent, having such a robotic voice read the bot's lines was great. Their reading had me chuckling in a few places I would not have if I'd read it simply as text.


:-)


Is this 100% client-side, and would it work offline?


Yes, it's 100% client-side, but you have to cache some additional files. (A working set consists of the main script, a worker script for the application core, and at least one voice definition to be used.)
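To make the working set concrete, here's a sketch of the three kinds of files you'd need to cache for offline use. The file names are assumptions for illustration; the actual paths depend on the library's build, which the comment doesn't specify.

```javascript
// Hypothetical asset list for offline caching. File names are
// placeholders, not the library's real paths.
function workingSet(voiceId) {
  return [
    "speak.js",                    // main script (assumed name)
    "speak-core-worker.js",        // worker script with the application core (assumed)
    "voices/" + voiceId + ".json"  // at least one voice definition (assumed layout)
  ];
}
```

In practice, these URLs would be handed to a Service Worker's Cache API (or an older AppCache manifest) so the app keeps working without a network connection.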

Mind that on mobile devices the core won't run concurrently as a worker, but rather as an instance in the main/UI thread. This is because mobile devices will mute any playback triggered by a message from a worker, as there is no immediate user interaction. Therefore, longer utterances are likely to block the UI noticeably while the internal sound file is processed. This is a bit sad, but how things are.
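The fallback described above could be sketched roughly as follows. The `isMobileUA()` heuristic is a naive illustration of my own, not the library's actual detection logic.

```javascript
// Naive user-agent check, for illustration only.
function isMobileUA(ua) {
  return /Mobi|Android|iPhone|iPad/.test(ua);
}

// Decide where to run the synthesis core: in a Worker (concurrent,
// non-blocking) on desktop clients, or inline in the main/UI thread on
// mobile clients, where audio triggered from a Worker message would be
// muted for lack of a direct user gesture.
function executionMode(ua, workersSupported) {
  if (!workersSupported || isMobileUA(ua)) {
    return "main-thread"; // may block the UI on long utterances
  }
  return "worker";
}
```

A real implementation would likely combine this with feature detection rather than rely on the user-agent string alone.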



