Hacker News
Using vectorization to speed up UTF-8 character counting (daemonology.net)
17 points by cperciva on June 5, 2008 | 3 comments


I thought the title meant it was using Altivec or SSE, but it's merely operating on a chunk of 4 bytes at a time, dealing with misaligned data up front. Still a good article, despite my initial disappointment.

A similar article which originally taught me these tenets is:

http://rentzsch.com/papers/straightenUpAndFlyRight


it's merely operating on a chunk of 4 bytes at a time

If you have a modern CPU, that code operates on 8 bytes at a time. :-)
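For anyone who hasn't read the article, here is a rough sketch of the word-at-a-time idea, not the article's actual code. It assumes an explicit length rather than a NUL terminator (so it never reads past the buffer), and the name utf8_count_sketch is made up for illustration. It counts continuation bytes (10xxxxxx) one size_t at a time and subtracts them from the byte length; size_t is 4 bytes on a 32-bit target and 8 bytes on a typical 64-bit one, which is where "4 vs. 8 bytes at a time" comes from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 0x01 repeated in every byte of a size_t (0x0101...01). */
#define ONEMASK ((size_t)-1 / 0xFF)

/* Hypothetical helper: number of UTF-8 code points in len bytes,
 * assuming the input is valid UTF-8. */
static size_t utf8_count_sketch(const char *str, size_t len)
{
    const unsigned char *p = (const unsigned char *)str;
    const unsigned char *end = p + len;
    size_t continuation = 0;

    /* Deal with misaligned bytes up front, one at a time. */
    while (p < end && ((uintptr_t)p % sizeof(size_t)) != 0) {
        continuation += ((*p & 0xC0) == 0x80);
        p++;
    }

    /* Main loop: one machine word per iteration. */
    while ((size_t)(end - p) >= sizeof(size_t)) {
        size_t w;
        memcpy(&w, p, sizeof w);
        /* Low bit of each byte becomes 1 iff that byte is 10xxxxxx. */
        w = ((w >> 7) & (~w >> 6)) & ONEMASK;
        /* The multiply sums all the per-byte flags into the top byte. */
        continuation += (w * ONEMASK) >> ((sizeof(size_t) - 1) * 8);
        p += sizeof(size_t);
    }

    /* Leftover tail bytes. */
    while (p < end) {
        continuation += ((*p & 0xC0) == 0x80);
        p++;
    }

    return len - continuation;
}
```

Calling it as utf8_count_sketch(s, strlen(s)) costs an extra pass over the string, so treat this as an illustration of the bit trick rather than something to benchmark.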


I don't think I believe those benchmarks. Many of those numbers are significantly slower (!) than main memory bandwidth, and that's for code like strlen(), which can be naively implemented in about three instructions per byte.

Something is wrong with the testing, I think.
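By "naively implemented" I mean something like the loop below (naive_strlen is just an illustrative name, and the exact instruction count will depend on the compiler and flags): a load, a compare-and-branch, and an increment per byte.

```c
#include <stddef.h>

/* Illustrative byte-at-a-time strlen: one load, one test, one
 * increment per byte of input. */
static size_t naive_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;

    return (size_t)(p - s);
}
```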



