Well, look on the bright side, at least you're not calculating
bucket = hash(x) % buckets
Unless you're stuck supporting a non-power-of-two number of shards, in which case you need modulus. And shoving more randomness into the LSBs matters more than shoving it into the MSBs.
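A quick sketch of the two strategies (function names are just illustrative). The mask version only works for power-of-two bucket counts, and it makes clear why low-bit randomness is what counts:

```python
def bucket_modulo(h: int, num_buckets: int) -> int:
    # Works for any bucket count, but the result is still dominated
    # by the low-order bits of h for small bucket counts.
    return h % num_buckets

def bucket_mask(h: int, num_buckets: int) -> int:
    # Power-of-two only: h & (n - 1) keeps exactly the low log2(n)
    # bits, so a hash with weak LSBs distributes poorly here.
    assert num_buckets & (num_buckets - 1) == 0, "requires a power of two"
    return h & (num_buckets - 1)
```

For a power-of-two count the two agree; the mask just trades the division for an AND.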
I've thought about it some and I think anyone with a networking background is automatically going to "subnet" their data from MSB down out of pure raw habit. Of course you split top down, just like IP addrs.
I've also seen non-technical popular explanations of sharding which stereotypically use something like phone numbers and start at the most significant digit.
And a stereotypical test question is why this does not work well with things like customer IDs or transaction IDs: they tend toward sequential-ish, unless you're talking about some kind of data mining workload rather than daily live transactions.
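To make the failure mode concrete, here's a toy version of phone-number-style MSB sharding applied to sequential IDs (the ID range is made up):

```python
def msb_shard(customer_id: int, digits: int = 1) -> int:
    # Shard by the most significant decimal digit(s), subnet-style.
    return int(str(customer_id)[:digits])

# Sequentially issued IDs all share a prefix for a long stretch,
# so today's live traffic piles onto a single shard:
todays_ids = range(100000, 100100)
shards = {msb_shard(i) for i in todays_ids}
# len(shards) == 1 here — every current write hits the same shard.
```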
On the other hand it works well if you shard (sorta) by date, if you use the right (usually non-native) format. So if you have dates like 2013-10-14 and shard by YYYYMM somehow, then it can be easy to wipe 201210 from the database, because whichever shard it's on, it's probably not impacting the latency figures for 201310 today. Unless you wanted to speed up the delete by smoothing it across all shards, in which case sharding by hash ends up being the smart idea.
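A minimal sketch of that date-prefix idea, with a dict standing in for the shard layout (purely hypothetical storage, just to show why the old month drops cleanly):

```python
from collections import defaultdict

def yyyymm(date_str: str) -> str:
    # "2013-10-14" -> "201310"; the month prefix is the shard key.
    return date_str[:4] + date_str[5:7]

# Toy "database": one list per shard key.
shards = defaultdict(list)
for d in ["2012-10-01", "2012-10-31", "2013-10-14"]:
    shards[yyyymm(d)].append(d)

# Wiping an old month is one bulk drop on one shard key,
# never touching the shard serving current traffic:
dropped = shards.pop("201210", [])
```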
Trying to do tricky things can turn into a mess when the tricky thing changes mid project, too.
This is actually an important topic for me. I am implementing a sharding distribution based on consistent hashing, with MurmurHash3 as the hash function.
I am taking the first 4 bytes of the hash output and using that. I checked, and MurmurHash3 mixes the first and last 8 bytes of the output as a final step, but I am not sure how much differentiation there is in the first 4 bytes.
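For what it's worth, here's a minimal consistent-hash ring along those lines. Two loudly labeled assumptions: stdlib MD5 stands in for MurmurHash3 purely so the sketch runs without a third-party package, and the vnode count and key format are made up:

```python
import bisect
import hashlib

def first4_as_int(key: bytes) -> int:
    # Take the first 4 bytes of the digest as an unsigned int, as
    # described above. NOTE: MD5 is a stand-in for MurmurHash3 here.
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big")

class Ring:
    # Minimal consistent-hash ring: each node owns several virtual
    # points so load spreads more evenly around the ring.
    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (first4_as_int(f"{n}:{v}".encode()), n)
            for n in nodes for v in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def node_for(self, key: str) -> str:
        h = first4_as_int(key.encode())
        # First ring point clockwise of the key's hash (wrap at the end).
        i = bisect.bisect(self._keys, h) % len(self._keys)
        return self._points[i][1]
```

The upside of the ring over plain `hash % shards` is that adding or removing a node only remaps the keys adjacent to its points, not everything.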