Wednesday, June 24, 2009

Byte ordering endianness mayhem! Aahh!

Working with binary data structures written to files can be confusing, because a hexdump of a binary file may appear to be out of order.

The following example illustrates the source of the confusion. This simple Python script creates three binary files. To each file we write the bytes 01 02 03 04, split up in different ways.
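import struct

# '=' keeps the machine's native byte order but uses standard sizes,
# so 'L' is always 4 bytes and 'H' is always 2, with no padding.
open('1long.bin','wb').write(struct.pack('=L', 0x01020304))
open('2short.bin','wb').write(struct.pack('=HH', 0x0102, 0x0304))
open('4char.bin','wb').write(struct.pack('=BBBB', 0x01, 0x02, 0x03, 0x04))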
We can hexdump them to see their contents:
$ hexdump 1long.bin
0000000 0304 0102 (this is the long: 0x01020304)

$ hexdump 2short.bin
0000000 0102 0304 (these are the 2 shorts: 0x0102, 0x0304)

$ hexdump 4char.bin
0000000 0201 0403 (these are the 4 chars: 0x01, 0x02, 0x03, 0x04)
Before hexdumping, one might expect the three outputs to be identical, but they're not. Each shows a different ordering.

These examples were run on a run-of-the-mill 32-bit Intel machine. In Intel terminology a "word" is 16 bits (weird, yeah, I know), and hexdump's default output happens to group bytes into 16-bit units as well. These machines use little-endian byte ordering, which means that the least significant byte of any given data type goes in the lowest-addressed memory location (or slot in a file on disk).
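To make the definition concrete, here is a minimal sketch (Python 3, not part of the original script) that lays out the same 32-bit value in both byte orders. The variable names are mine and only for illustration.

```python
import sys

value = 0x01020304

# Least significant byte (0x04) first: this is little-endian.
little = value.to_bytes(4, byteorder='little')
# Most significant byte (0x01) first: this is big-endian.
big = value.to_bytes(4, byteorder='big')

print(sys.byteorder)   # 'little' on Intel machines
print(list(little))    # [4, 3, 2, 1]
print(list(big))       # [1, 2, 3, 4]
```

Index 0 of each result corresponds to the lowest address, which is where the little-endian layout puts the 04.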

The least significant byte of the long 0x01020304 is the "04". Why doesn't it appear on the far left in the hexdump such as this: 04 03 02 01?

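In the file itself, it does: the bytes on disk are 04 03 02 01. By default, though, hexdump does not print single bytes. It reads the file two bytes at a time and prints each pair as one 16-bit number in the host's byte order. On a little-endian machine the pair 04 03 is the number 0x0304 and the pair 02 01 is 0x0102, which gives 0304 0102. The same rule turns the bytes 02 01 04 03 of 2short.bin into 0102 0304, and the bytes 01 02 03 04 of 4char.bin into 0201 0403.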
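One way to see where 0304 0102 comes from is to mimic hexdump's default grouping by hand. The sketch below is my own approximation, not hexdump's actual code: hexdump_words is a hypothetical helper that assumes a little-endian host and an even number of bytes, and it leaves out the offset column.

```python
import struct

def hexdump_words(data):
    """Format bytes as 16-bit units, each read as a little-endian number."""
    # Assumption: an even number of bytes; any trailing odd byte is ignored.
    count = len(data) // 2
    words = struct.unpack('<%dH' % count, data[:count * 2])
    return ' '.join('%04x' % w for w in words)

# The raw bytes of the three files as they sit on disk.
print(hexdump_words(b'\x04\x03\x02\x01'))  # 0304 0102  (1long.bin)
print(hexdump_words(b'\x02\x01\x04\x03'))  # 0102 0304  (2short.bin)
print(hexdump_words(b'\x01\x02\x03\x04'))  # 0201 0403  (4char.bin)
```

Each pair of bytes gets swapped on display because the second byte of the pair is the more significant one.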

Diagram that I drew showing how reordering the addresses so they increase from right to left can help in understanding little-endian byte ordering:

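After googling around a bit, I found a thread which talks about how to reformat your dump so that it prints one byte at a time, in the order the bytes actually appear in the file (lowest address first). That way no 16-bit grouping gets in the way. It's quite useful: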

$ od -tx1 -w16 -Ax 1long.bin
000000 04 03 02 01
$ od -tx1 -w16 -Ax 2short.bin
000000 02 01 04 03
$ od -tx1 -w16 -Ax 4char.bin
000000 01 02 03 04
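The same byte-by-byte view can be had from Python without touching the files. This is a small sketch under my own assumptions: dump_bytes is a hypothetical helper, and the '<' prefix forces little-endian packing so the output matches the Intel machine used above regardless of where it runs.

```python
import struct

def dump_bytes(data):
    """Return the bytes of data as two-digit hex values, lowest address first."""
    return ' '.join('%02x' % b for b in data)

print(dump_bytes(struct.pack('<L', 0x01020304)))       # 04 03 02 01
print(dump_bytes(struct.pack('<HH', 0x0102, 0x0304)))  # 02 01 04 03
print(dump_bytes(struct.pack('<BBBB', 1, 2, 3, 4)))    # 01 02 03 04
```

Only the multi-byte types get their bytes reversed. The four single chars come out in the order they were written, since a one-byte value has no byte order.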