Endianness

By the end of this chapter you will be able to choose between native, big-endian, and little-endian byte order; use the portable n / N / v / V shortcuts; and apply the < / > modifiers to any integer or float.

A multi-byte integer has to be written somehow. 0x12345678 is four bytes — but in which order? Two conventions dominate:

  • Big-endian (highest-order byte first): 12 34 56 78. Used by most network protocols (including the TCP/IP headers), by SPARC and classic PowerPC CPUs, and by file formats that care about portability (PNG, Java .class).

  • Little-endian (lowest-order byte first): 78 56 34 12. Used by x86, x86-64, ARM (in its common modes), and by file formats from the Intel lineage (BMP, WAV, TIFF in its common variant, all of FAT).

Mix these up and you will read nonsense. pack gives you three ways to pin down the byte order.
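As a quick illustration of the two layouts, the sketch below packs 0x12345678 both ways and prints the resulting bytes. The helper name hex_bytes is made up for this example; it is not part of pack.

```perl
use strict;
use warnings;

my $value = 0x12345678;

# Hypothetical helper: render a packed string as space-separated hex bytes.
sub hex_bytes { join ' ', map { sprintf '%02x', $_ } unpack 'C*', $_[0] }

my $big    = hex_bytes(pack('N', $value));   # big-endian 32-bit
my $little = hex_bytes(pack('V', $value));   # little-endian 32-bit

print "big-endian:    $big\n";      # 12 34 56 78
print "little-endian: $little\n";   # 78 56 34 12

# Reading big-endian bytes with the little-endian directive
# silently yields a different number.
my $misread = unpack('V', pack('N', $value));
printf "misread: 0x%08X\n", $misread;   # 0x78563412
```

Note that the misread raises no error: the bytes are valid either way, only the number changes.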

The portable shortcuts: n / N / v / V

Four letters cover the most common wire-format cases:

Directive  Bytes  Endian         Meaning
n          2      big-endian     Network (port numbers)
N          4      big-endian     Network (IPv4 addrs)
v          2      little-endian  “VAX”
V          4      little-endian  “VAX”

“Network byte order” is big-endian. If you are reading an RFC, the answer is almost always n or N.

my $port = pack "n", 443;          # "\x01\xbb"
my $ip   = pack "N", 0x7F000001;   # "\x7f\x00\x00\x01" — 127.0.0.1

These four are unsigned by default. With the ! modifier they become signed:

my $neg = pack "n!", -1;            # "\xff\xff"
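The effect of ! is easiest to see when reading the bytes back. A minimal sketch, assuming a Perl recent enough to support n! (5.10 or later):

```perl
use strict;
use warnings;

# Two bytes, all bits set.
my $bytes = pack 'n!', -1;               # "\xff\xff"

# The same bytes decode differently depending on the modifier.
my $unsigned = unpack 'n',  $bytes;      # 65535
my $signed   = unpack 'n!', $bytes;      # -1

print "unsigned: $unsigned, signed: $signed\n";
```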

If your field fits one of these four shapes, use them. They are portable, they are readable, and their meaning does not depend on the CPU running the code.

The general modifiers: < and >

Any integer directive (s, S, l, L, i, I, q, Q, j, J) and every float (f, d, F, D) accepts an endianness modifier:

Modifier  Meaning        Mnemonic
>         Big-endian     The “big end” touches the directive letter
<         Little-endian  The “little end” touches the directive letter

Read l> as “long, big end” and l< as “long, little end”.

my $be32 = pack "l>", -1_000_000;    # big-endian 32-bit signed
my $le64 = pack "q<", 2 ** 40;       # little-endian 64-bit signed
my $bed  = pack "d>", 3.14159;       # big-endian IEEE 754 double

For the common 16- and 32-bit unsigned cases, n / N / v / V are shorter and should be preferred. Use < / > when:

  • the field is signed — n and N are unsigned by default

  • the field is 64-bit — there are no n/N equivalents for q/Q

  • the field is a float — n/N cover integers only

  • you want to apply endianness to a whole group (see below)
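The relationship between the shortcuts and the modifiers can be checked directly. The sketch below is illustrative only; the sample values are arbitrary.

```perl
use strict;
use warnings;

# Signed 32-bit, big-endian: a negative value survives the round trip.
my $be32 = pack 'l>', -1_000_000;
my $back = unpack 'l>', $be32;           # -1000000

# For unsigned 16- and 32-bit fields the shortcuts and modifiers agree.
my $same_be = pack('N', 0x7F000001) eq pack('L>', 0x7F000001);
my $same_le = pack('v', 443)        eq pack('S<', 443);

# A float with explicit byte order; 0.5 is exactly representable.
my $half = unpack 'd>', pack('d>', 0.5);

print "back=$back same_be=$same_be same_le=$same_le half=$half\n";
```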

Applying endianness to a group

If every integer in a sub-structure shares the same byte order, write the endianness modifier once on a group:

my $rec = pack "(s l l)<", $flags, $x, $y;
# same as "s<l<l<"

A group-level modifier cascades into every byte-ordered directive inside, including nested groups. For directives that do not accept a byte-order modifier (such as C), the modifier is silently ignored; the directive itself still packs as usual.
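A short sketch of the cascade, assuming Perl 5.10 or later (where group modifiers are available). The values of $flags, $x and $y are made up for the example.

```perl
use strict;
use warnings;

my ($flags, $x, $y) = (3, -20, 100_000);   # hypothetical sample values

my $grouped  = pack '(s l l)<', $flags, $x, $y;
my $explicit = pack 's< l< l<', $flags, $x, $y;

# Nested groups inherit the modifier as well.
my $nested = pack '(s (l l))<', $flags, $x, $y;

# C is a single byte with no byte order, so the group modifier
# leaves it alone while still applying to s.
my $with_c = pack '(C s)>', 0xAB, 0x0102;   # "\xab\x01\x02"

print "grouped and explicit match\n" if $grouped eq $explicit;
```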

Native order: when it is what you want

The bare directives s / S / l / L / i / I / q / Q produce the host CPU’s native byte order. That is useful when:

  • You are building a C struct in memory for an ioctl call on the same machine.

  • You are writing a scratch file that only this program will read back.

  • The protocol spec explicitly says “host byte order” (rare).

It is wrong when data leaves the process: a file written on a laptop, read on a server, must not use native order unless both machines happen to agree.

Detecting the host’s byte order

Useful once, as a sanity check. Pack 1 as a 16-bit integer and look at the first byte:

my $little_endian = unpack("c", pack("s", 1)) == 1;
my $big_endian    = unpack("c", pack("s", 1)) == 0;

On x86 the first is true; on a big-endian host the second is true.
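One way to cross-check the result is to compare native output against the two explicit orders. The sketch below also prints $Config{byteorder} from the core Config module, which typically begins with "1234" on little-endian builds and "4321" or "87654321" on big-endian ones; treat that reading as informational rather than something to branch on.

```perl
use strict;
use warnings;
use Config;

# First byte of a native 16-bit 1: 1 on little-endian, 0 on big-endian.
my $little_endian = unpack('C', pack('s', 1)) == 1;

# Native output should match exactly one of the explicit orders.
my $native     = pack 'L', 0x12345678;
my $matches_le = $native eq pack('L<', 0x12345678);
my $matches_be = $native eq pack('L>', 0x12345678);

print "little-endian host\n" if $matches_le;
print "big-endian host\n"    if $matches_be;
print "Config byteorder: $Config{byteorder}\n";
```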

Floats have endianness too

IEEE 754 does not pin down byte order. A double written on a little-endian machine comes out reversed on a big-endian one:

# send a float in big-endian order — safe across machines that use IEEE 754
my $buf = pack "d>", 2.718281828;

Mismatched hardware (non-IEEE float formats, exotic long double layouts) cannot be repaired by < / > alone. For portability across odd platforms, send floats as text, for example in the exact hexadecimal floating-point notation produced by sprintf "%a" (available in Perl 5.22 and later).
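The byte reversal can be observed without a second machine. This sketch assumes the host stores doubles in a plain IEEE 754 layout, which is the common case but not guaranteed.

```perl
use strict;
use warnings;

my $e = 2.718281828;

my $be = pack 'd>', $e;      # 8 bytes, big-endian
my $le = pack 'd<', $e;      # the same 8 bytes in the opposite order

my $reversed = $be eq scalar(reverse $le);

# Unpacking with the matching modifier returns the same 64 bits.
my $roundtrip = pack('d>', unpack('d>', $be)) eq $be;

# Unpacking with the wrong modifier yields some unrelated value.
my $misread = unpack 'd<', $be;

print "reversed=$reversed roundtrip=$roundtrip\n";
```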

Worked example: parsing a BMP header

The BMP bitmap format begins with the ASCII magic BM followed by twelve bytes of little-endian fields: a 32-bit file size, two 16-bit reserved fields (read here as a single 32-bit value), and a 32-bit pixel data offset:

my ($magic, $filesize, $reserved, $pixoff) =
    unpack "A2 V V V", $bmp_header;

die "not a BMP" unless $magic eq "BM";

V is little-endian 32-bit unsigned, exactly what the spec demands. A2 reads two bytes as a text string and strips trailing spaces and NULs, which is harmless for the BM magic; use a2 if you need the bytes verbatim. Every field is the right shape; no byte-swapping required, no platform-dependent behaviour.
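To try the parser without a real file, one option is to build a synthetic 14-byte header with pack and feed it back through unpack. The size and offset values below are made up for illustration.

```perl
use strict;
use warnings;

# Synthetic header (hypothetical values): magic, file size,
# two 16-bit reserved fields, pixel data offset.
my $bmp_header = pack 'a2 V v v V', 'BM', 70, 0, 0, 54;

my ($magic, $filesize, $reserved, $pixoff) =
    unpack 'A2 V V V', $bmp_header;

die "not a BMP" unless $magic eq 'BM';
print "size=$filesize offset=$pixoff\n";   # size=70 offset=54

# A strips trailing spaces and NULs on unpack; a keeps the bytes as-is.
my $stripped = unpack 'A2', "B\0";   # "B"
my $verbatim = unpack 'a2', "B\0";   # "B\0"
```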

What to remember

  • Wire format? Use n / N / v / V — or < / > if you need signed, 64-bit, or float.

  • Stay inside one process? Native is fine — but flag the choice deliberately.

  • IEEE 754 is not enough. If a float crosses a machine, pin the byte order.

With byte counts and byte order under control, the remaining directives — strings, positioning, groups — fall out naturally.