
Hmm, interesting. I guess they'd likely end up as three bytes each in UTF-8. Do you have any details you can point me at?


Nothing like a paper, just an anecdote: I downloaded 西遊記 (Journey to the West) from Project Gutenberg [1] and measured it. In UTF-8 it's 2,236,564 bytes; in UTF-16 it's 1,554,998 bytes.

[1] https://www.gutenberg.org/files/23962/23962-0.txt
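For anyone who wants to try the comparison themselves, here is a minimal Python sketch of this kind of measurement. It is not the exact procedure used for the numbers above; the helper name `encoded_sizes` is made up for illustration, and it encodes UTF-16 without a byte order mark, which is an assumption about how you'd want to count.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (no BOM).

    Hypothetical helper for illustration only.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used because plain "utf-16" prepends a 2-byte BOM.
        "utf-16": len(text.encode("utf-16-le")),
    }


# CJK characters in the Basic Multilingual Plane: 3 bytes each in UTF-8,
# 2 bytes each in UTF-16.
print(encoded_sizes("西遊記"))  # {'utf-8': 9, 'utf-16': 6}

# ASCII goes the other way: 1 byte each in UTF-8, 2 bytes each in UTF-16.
print(encoded_sizes("hello"))  # {'utf-8': 5, 'utf-16': 10}
```

To apply it to a whole book, read the downloaded file as text and pass the string in. The measured ratio above is about 1.44 rather than a clean 1.5, which would be consistent with the file containing some ASCII (line breaks, the English-language Gutenberg header), though I haven't checked that.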



