
Right, but the claim was this:

if you have been handling Unicode and using wide characters, you have not been handling Unicode properly

I agree that UTF-8 is a better encoding overall for the majority of cases. I don't think that means UTF-16, which is what Delphi UnicodeStrings use[1], for example, is not proper.

edit: maybe this is a language confusion thing. For historically tragic reasons, we're stuck with "char" as the basic element of string types in lots of languages. In Delphi a "widechar" is technically a code unit[2], and may or may not represent a code point. This is how I interpreted the OP. Maybe he meant wide characters as code points, in which case I would agree.

[1]: http://docwiki.embarcadero.com/RADStudio/Sydney/en/Unicode_i...

[2]: https://en.wikipedia.org/wiki/Character_encoding#Terminology



Yeah I hear you. It's definitely possible to write correct code using UTF-16 (where each "char" sometimes represents only half of a codepoint). But it's easy to end up with subtly broken code that only breaks for non-English speakers who don't know enough English to file a bug report.

The ergonomics of the language guide you in that direction when, as you say, a "char" doesn't actually represent a character, or even an atomic Unicode codepoint, and when string.length gives you an essentially meaningless value.

Luckily, code like this will also break when encountering emoji. That's great, because it means my local users will complain about these bugs and they're easy for me to reproduce. As a result these problems are slowly being fixed.



