Endianness
By the end of this chapter you will be able to choose between
native, big-endian, and little-endian byte order; use the portable
n / N / v / V shortcuts; and apply the < / > modifiers
to any integer or float.
A multi-byte integer has to be written somehow. 0x12345678 is
four bytes — but in which order? Two conventions dominate:
- Big-endian: highest-order byte first, 12 34 56 78. Used by most network protocols (including most TCP/IP headers), by PowerPC and SPARC CPUs, and by file formats that care about portability (PNG, Java .class).
- Little-endian: lowest-order byte first, 78 56 34 12. Used by x86, x86-64, ARM (in its common modes), and by file formats from the Intel lineage (BMP, WAV, TIFF in its common variant, all of FAT).
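To see the two layouts side by side, the sketch below packs 0x12345678 once in each order and dumps the resulting bytes as hex. It uses the N and V directives described in the next section; the helper name hex_bytes is purely illustrative and not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02x", $_ } unpack("C*", $bytes);
}

my $value = 0x12345678;

my $big    = hex_bytes(pack("N", $value));   # big-endian 32-bit
my $little = hex_bytes(pack("V", $value));   # little-endian 32-bit

print "big-endian:    $big\n";      # 12 34 56 78
print "little-endian: $little\n";   # 78 56 34 12
```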
Mix these up and you will read nonsense. pack gives you three ways
to pin down the byte order.
The portable shortcuts: n / N / v / V
Four letters cover the most common wire-format cases:
| Directive | Bytes | Endian | Meaning |
|---|---|---|---|
| n | 2 | big-endian | Network (port numbers) |
| N | 4 | big-endian | Network (IPv4 addrs) |
| v | 2 | little-endian | “VAX” |
| V | 4 | little-endian | “VAX” |
“Network byte order” is big-endian. If you are reading an RFC, the
answer is almost always n or N.
my $port = pack "n", 443; # "\x01\xbb"
my $ip = pack "N", 0x7F000001; # "\x7f\x00\x00\x01" — 127.0.0.1
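These four are unsigned by default. With the ! modifier (n!, N!, v!, V!) they are treated as signed. The packed bytes are the same either way; the difference shows on unpack, where "n!" returns -1 for the bytes below and "n" returns 65535: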
my $neg = pack "n!", -1; # "\xff\xff"
If your field fits one of these four shapes, use them. They are portable, they are readable, and their meaning does not depend on the CPU running the code.
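As a rough illustration of the shortcuts in a round trip, the sketch below packs a port and an IPv4 address in network order, reads them back, and then decodes the same two bytes both unsigned and signed. The variable names are illustrative only, and the n! form assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Pack a port and an IPv4 address in network (big-endian) order.
# Whitespace inside a template is ignored.
my $wire = pack("n N", 443, 0x7F000001);

my ($port, $addr) = unpack("n N", $wire);

# Split the 32-bit address into its four bytes for dotted-quad display.
my $dotted = join ".", unpack("C4", pack("N", $addr));

# The same two bytes decode as 65535 unsigned or -1 signed.
my $bytes    = pack("n!", -1);
my $unsigned = unpack("n",  $bytes);
my $signed   = unpack("n!", $bytes);

print "$port $dotted $unsigned $signed\n";   # 443 127.0.0.1 65535 -1
```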
The general modifiers: < and >
Any integer directive (s, S, l, L, i, I, q, Q, j,
J) and every float (f, d, F, D) accepts an endianness
modifier:
| Modifier | Meaning | Mnemonic |
|---|---|---|
| > | Big-endian | The “big end” touches the directive letter |
| < | Little-endian | The “little end” touches the directive |
Read l> as “long, big end” and l< as “long, little end”.
my $be32 = pack "l>", -1_000_000; # big-endian 32-bit signed
my $le64 = pack "q<", 2 ** 40; # little-endian 64-bit signed
my $bed = pack "d>", 3.14159; # big-endian IEEE 754 double
For the common 16- and 32-bit unsigned cases, n / N / v / V
are shorter and should be preferred. Use < / > when:
- the field is signed: n and N are unsigned by default
- the field is 64-bit: there are no n / N equivalents for q / Q
- the field is a float: n / N cover integers only
- you want to apply endianness to a whole group (see below)
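The first three cases in that list can be sketched as follows. This assumes a perl built with 64-bit integer support (required for Q) and a host whose single-precision floats follow IEEE 754; the variable names are illustrative.

```perl
use strict;
use warnings;

# Signed 16-bit, big-endian: s> where n would be unsigned.
my $signed16 = pack("s>", -2);

# Unsigned 64-bit, little-endian: no n/N/v/V shortcut exists for this width.
# Assumes this perl was built with 64-bit integers.
my $u64 = pack("Q<", 1);

# Single-precision float, big-endian. Assuming IEEE 754, 1.0 is 0x3f800000.
my $f32 = pack("f>", 1.0);

printf "%s %s %s\n", map { unpack("H*", $_) } $signed16, $u64, $f32;
# fffe 0100000000000000 3f800000
```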
Applying endianness to a group
If every integer in a sub-structure shares the same byte order, write the endianness modifier once on a group:
my $rec = pack "(s l l)<", $flags, $x, $y;
# same as "s<l<l<"
A group-level modifier cascades into every byte-ordered directive
inside, including nested groups. For directives that do not accept a
byte-order modifier (such as C), the group modifier is silently ignored.
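A small sketch of both claims, assuming Perl 5.10 or later (where group modifiers are available): the grouped and per-directive templates produce identical bytes, and a C inside a big-endian group is packed as a plain byte while a nested group inherits the order. The values are arbitrary.

```perl
use strict;
use warnings;

my ($flags, $x, $y) = (1, 2, 3);

my $grouped  = pack("(s l l)<", $flags, $x, $y);
my $explicit = pack("s< l< l<", $flags, $x, $y);

# C takes no byte-order modifier, so the group modifier does not affect it;
# the nested (s l) group still inherits big-endian order.
my $mixed = pack("(C (s l))>", 0xAA, 1, 2);

print $grouped eq $explicit ? "same\n" : "different\n";   # same
print unpack("H*", $mixed), "\n";                         # aa000100000002
```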
Native order: when it is what you want
The bare directives s / S / l / L / i / I / q / Q
produce the host CPU’s native byte order. That is useful when:
- You are building a C struct in memory for an ioctl call on the same machine.
- You are writing a scratch file that only this program will read back.
- The protocol spec explicitly says “host byte order” (rare).
It is wrong when data leaves the process: a file written on a laptop, read on a server, must not use native order unless both machines happen to agree.
Detecting the host’s byte order
Useful once, as a sanity check. Pack 1 as a 16-bit integer and
look at the first byte:
my $little_endian = unpack("c", pack("s", 1)) == 1;
my $big_endian = unpack("c", pack("s", 1)) == 0;
On x86 the first is true; on a big-endian host the second is true.
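As an optional cross-check, the probe can be compared with the byte order recorded when perl was built, which the core Config module exposes as $Config{byteorder}. The sketch below assumes the usual values for that entry ("1234" or "12345678" on little-endian hosts, "4321" or "87654321" on big-endian ones).

```perl
use strict;
use warnings;
use Config;

# Probe: pack 1 as a native 16-bit integer and inspect the first byte.
my $first_byte    = unpack("C", pack("s", 1));
my $little_endian = $first_byte == 1;

# Cross-check against the build-time record (assumed to start with
# "1234" on little-endian hosts).
my $config_little = $Config{byteorder} =~ /^1234/;

printf "host is %s-endian (Config agrees: %s)\n",
    $little_endian ? "little" : "big",
    ($little_endian xor $config_little) ? "no" : "yes";
```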
Floats have endianness too
IEEE 754 does not pin down byte order. A double written on a little-endian machine comes out reversed on a big-endian one:
# send a float in big-endian order — safe across machines that use IEEE 754
my $buf = pack "d>", 2.718281828;
Mismatched hardware (non-IEEE float formats, exotic long double
layouts) cannot be repaired by < / > alone. For portability
across odd platforms, send floats as text, or use an exact textual
form such as the hexadecimal float output of sprintf "%a".
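The "reversed" relationship between the two byte orders can be checked directly. The sketch below packs one value both ways and confirms that the encodings are mirror images and that each decodes correctly with its matching modifier; the variable names are illustrative.

```perl
use strict;
use warnings;

my $value = 2.718281828;

my $big    = pack("d>", $value);
my $little = pack("d<", $value);

# The two encodings hold the same eight bytes in opposite order.
my $mirrored = scalar(reverse $little) eq $big;

# Decoding each buffer with its own modifier yields the same number.
my $agree = unpack("d>", $big) == unpack("d<", $little);

print "mirrored: ", ($mirrored ? "yes" : "no"), "\n";   # yes
print "agree:    ", ($agree    ? "yes" : "no"), "\n";   # yes
```

Decoding $big with "d<" instead would silently yield an unrelated number, which is why the modifier has to be pinned on both the writing and the reading side.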
Worked example: parsing a BMP header
The BMP bitmap format begins with the ASCII magic BM followed by
three little-endian 32-bit fields (file size, reserved, pixel data
offset):
my ($magic, $filesize, $reserved, $pixoff) =
unpack "A2 V V V", $bmp_header;
die "not a BMP" unless $magic eq "BM";
V is little-endian 32-bit unsigned, exactly what the spec
demands. A2 reads the two magic bytes as a text string; in unpack, A
strips trailing spaces and NULs, which is harmless for BM (use a2 to
keep the bytes verbatim). Every field is the right shape; no
byte-swapping required, no platform-dependent behaviour.
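To try the parse without a real image at hand, a header can be synthesised with the mirror-image pack template. The field values below (70 and 54) are made up for illustration and do not come from an actual file.

```perl
use strict;
use warnings;

# Build a synthetic 14-byte BMP file header; the sizes are invented values.
my $bmp_header = pack("a2 V V V", "BM", 70, 0, 54);

my ($magic, $filesize, $reserved, $pixoff) =
    unpack("A2 V V V", $bmp_header);

die "not a BMP" unless $magic eq "BM";

printf "%d bytes, pixel data at offset %d\n", $filesize, $pixoff;
# 70 bytes, pixel data at offset 54
```

Because both templates name the byte order explicitly, the same bytes are produced and parsed identically on any host.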
What to remember
- Wire format? Use n / N / v / V, or < / > if you need signed, 64-bit, or float.
- Stay inside one process? Native is fine, but flag the choice deliberately.
- IEEE 754 is not enough. If a float crosses a machine, pin the byte order.
With byte counts and byte order under control, the remaining directives — strings, positioning, groups — fall out naturally.