Reading bits in far too many ways (fgiesen.wordpress.com)
117 points by makmanalp on Feb 20, 2018 | 15 comments


The first major decision to make is whether fields are packed MSB-first or LSB-first

Unfortunately, in many cases like image and video codecs, that decision has already been made for you --- and not in the direction that's most efficient for a CPU (though it's pretty much immaterial for hardware). I suspect "It’s easy to fool yourself into thinking that one variant is better than another because it looks prettier" was what drove those decisions.

The big-endian/MSB-first order requires extra "bit movement" if you are maintaining a bit buffer in a register, since the bits of interest are always going to reside in the topmost bits, but you often need them in the lowest bits to use them. With little-endian/LSB-first, the bits of interest are already in the lowest position, and naturally "fall off the end" as you shift right and replenish into the upper bits. IMHO another reason to prefer little-endian: it's very logically consistent.


I don't think it's as cut and dried as you describe. With the MSB-first approach, sure, the "current" bits are at the end you don't want, so you need a shift to move them into the low bits - but this (logical) shift does double duty, since it also zeros all the higher bits, which you usually need to do before using the value anyway.

With the LSB-first approach, where the current bits are already at the low end, you usually can't use them directly (e.g., to do arithmetic on them, or use them in an address calculation), since you need to mask off the higher bits[1] - how do you do that? You can use two shifts: << then >>, or generate a mask and AND it, or ...? If the number of bits to read is a compile-time constant it's probably as simple as a single AND, but the non-constant case seems a bit worse for LSB-first.

You say that the current bits just naturally fall off the end after shifting in the LSB-first case, but the same is true in the MSB-first case: they fall off the left side after a left shift, instead of off the right side after a right shift.

[1] Yes, there are exceptions such as if you want to use say exactly 8 bits and your hardware has some efficient way to refer to the bottom 8 bits of a register (see: x86) or if having garbage in the high bits doesn't matter for your subsequent operations (or perhaps you can amortize the zeroing over several bit reads: e.g,. you read a bunch of 7-bit fields, add them, then zero the high bits).


Your reasoning about which end has interesting bits has no basis in how CPUs actually do any of their work. The byte ordering does not fundamentally affect how operations are implemented.

We humans often find little-endian confusing because we write numbers big-endian. If you have the number 258 (0x102) in memory as a little-endian (which most computers use) 32-bit value, it is hex 02 01 00 00, binary 00000010 00000001 00000000 00000000. If you bit-shift that to the right without care, you end up with 00000001 00000000 10000000 00000000, hex 01 00 80 00, which is 8,388,609.

To fix it, write your digits also in little-endian order (least significant bit first), so the first number is 01000000 10000000 00000000 00000000. Then the shift operations match expectations.


> Your reasoning about which end has interesting bits has no basis in how CPUs actually do any of their work. The byte ordering does not fundamentally affect how operations are implemented.

Yes it does. Consider an MSB-first ordering, and say you want to extract 4 bits to use as an integer or whatever: in the bit buffer, that would be "aaaaxxxx...", where "a" marks the bits you're interested in. You'll need to copy from the bit buffer to a "work" register, then shift right to put them in the right place. Furthermore, to "eject" those bits, you need to shift the buffer left and insert new bits from below, i.e. the bit buffer becomes "xxxx..." in its most significant bits.

With LSB-first, "...xxxxaaaa", the bits do not need any shifting --- they're already in the right place.


Just no. CPUs don't have an endianness on internal registers and operations - it only applies to memory access (and occasionally some simple conversion operations).


They are talking about bit-endianness here, not the usual byte-endianness, when they mention MSB-first and LSB-first.

The bit-endianness you choose greatly impacts how your bit-reading loop will be implemented. In particular, if you want to use some bits, you want them in the low bits: if you write the value 42 into a 7-bit field, and then later I give you back (42 << 56) or something, you are going to be very confused: you expect to get back 42 (i.e., the return value equals 42, or to be pedantic, the 7-bit field should be right-aligned in the uint32_t or uint64_t return value).


Oh my. You're right, I entirely misunderstood. How embarrassing. I wish I could delete my comments.


Personally I don't care so much which is better, I just wish people would choose one and stick with it. I mean eventually 2's complement math managed to win, why can't LSB?

Of course the goddamn Internet decided to go MSB, so we'll forever be stuck with that friction.

And don't get me started on line endings. Gah!


This article is awesome. I don't work on this kind of thing at all, and I feel like it's a good exercise for my brain.



Fun small educational project: If you want to learn how to work with units smaller than a byte, implement Base64 from scratch. It’s really fun!


In case anyone might be misled given the topic of this article: you shouldn't do base64 encoding with a general-purpose routine like getbits(). Since 3 * 8 / 6 = 4, you can process the input bytes in groups of 3 to avoid straddling issues entirely (ideally with an unaligned 32-bit load, if those are fast on your platform). Then your inner loop will look like this:

    uint32_t triple;                      /* only the low 24 bits matter */
    memcpy(&triple, in, sizeof triple);   /* unaligned-safe 32-bit load */
    *out++ = base64[triple & MASK];       /* MASK == 0x3F: one 6-bit group */
    *out++ = base64[(triple >> 6) & MASK];
    *out++ = base64[(triple >> 12) & MASK];
    *out++ = base64[(triple >> 18) & MASK];
    in += 3;
(The 32-bit load can read one byte past the end of the buffer, so allocate extra space or stop the loop early. Also note this extracts the 6-bit groups low bits first; standard base64 takes them MSB-first from the byte stream, so a conforming encoder assembles the 24-bit value big-endian and shifts by 18, 12, 6, 0.)


A better exercise would be Huffman coding; there one has to deal with variable bitfield widths.


In fact, that appears to be where the author is coming from. Part 2 mentions techniques coming from the Kraken Huffman decoder (http://www.radgametools.com/oodlecompressors.htm)


Almost everything on this guy's blog is amazing if you are interested in hardware, optimization and that kind of thing.



