
Right, but the claim was this:

if you have been handling Unicode and using wide characters, you have not been handling Unicode properly

I agree that UTF-8 is a better encoding overall for the majority of cases. I don't think that means UTF-16, which is what Delphi UnicodeStrings use[1], for example, is not proper.

edit: maybe this is a language confusion thing. For historically tragic reasons, we're stuck with "char" as the basic element of string types in lots of languages. In Delphi a "widechar" is technically a code unit[2], and may or may not represent a code point. This is how I interpreted the OP. Maybe he meant wide characters as code points, in which case I would agree.

[1]: http://docwiki.embarcadero.com/RADStudio/Sydney/en/Unicode_i...

[2]: https://en.wikipedia.org/wiki/Character_encoding#Terminology



Yeah I hear you. It's definitely possible to write correct code using UTF-16 (where each "char" sometimes represents only half of a codepoint). But it's easy to end up with subtly broken code that only breaks for non-English speakers who don't know enough English to file a bug report.

The ergonomics of the language guide you in that direction when, as you say, a "char" doesn't actually represent a character, or even an atomic Unicode codepoint, and when string.length gives you an essentially meaningless value.

Luckily, code like this will also break when encountering emoji. That's great, because it means my local users will complain about these bugs and they're easy for me to reproduce. As a result these problems are slowly being fixed.



