---
name: Endianness
---

# Endianness

**By the end of this chapter you will be able to** choose between native, big-endian, and little-endian byte order; use the portable `n` / `N` / `v` / `V` shortcuts; and apply the `<` / `>` modifiers to any integer or float.

A multi-byte integer has to be stored as a sequence of bytes, and something has to decide their order. `0x12345678` is four bytes, but in which order? Two conventions dominate:

- **Big-endian**: highest-order byte first, `12 34 56 78`. Used by most network protocols, by PowerPC and SPARC CPUs, and by file formats that care about portability (PNG, Java `.class`, most TCP/IP headers).
- **Little-endian**: lowest-order byte first, `78 56 34 12`. Used by x86, x86-64, ARM (in its common modes), and by file formats from the Intel lineage (BMP, WAV, TIFF in its common variant, all of FAT).

Mix these up and you will read nonsense. `pack` gives you three ways to pin down the byte order: the `n` / `N` / `v` / `V` shortcuts, the `<` / `>` modifiers on a single directive, and the same modifiers on a whole group.

## The portable shortcuts: `n` / `N` / `v` / `V`

Four letters cover the most common wire-format cases:

| Directive | Bytes | Endian        | Meaning                |
|-----------|-------|---------------|------------------------|
| `n`       | 2     | big-endian    | Network (port numbers) |
| `N`       | 4     | big-endian    | Network (IPv4 addrs)   |
| `v`       | 2     | little-endian | "VAX"                  |
| `V`       | 4     | little-endian | "VAX"                  |

"Network byte order" is big-endian. If you are reading an RFC, the answer is almost always `n` or `N`.

```perl
my $port = pack "n", 443;          # "\x01\xbb"
my $ip   = pack "N", 0x7F000001;   # "\x7f\x00\x00\x01" (127.0.0.1)
```

These four are unsigned by default. With the `!` modifier they are treated as signed, which matters when you `unpack`:

```perl
my $neg  = pack "n!", -1;      # "\xff\xff"
my $back = unpack "n!", $neg;  # -1 (plain "n" would give 65535)
```

If your field fits one of these four shapes, use them. They are portable, they are readable, and their meaning does not depend on the CPU running the code.
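The sketch below packs one value with `N` and with `V`, prints the resulting bytes as hex with `unpack "H*"`, and then reads the same two bytes as unsigned and signed. It is a minimal illustration rather than anything from a real protocol; the `!` form assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Pack the same 32-bit value in both byte orders and show the bytes as hex.
my $value  = 0x12345678;
my $big    = pack "N", $value;   # big-endian:    12 34 56 78
my $little = pack "V", $value;   # little-endian: 78 56 34 12

printf "N: %s\n", unpack("H*", $big);      # N: 12345678
printf "V: %s\n", unpack("H*", $little);   # V: 78563412

# Unpacking with the same letter recovers the value on any CPU.
printf "round trip: 0x%x\n", unpack("N", $big);   # round trip: 0x12345678

# Signedness only changes how the bytes are read back:
# ff ff is 65535 with plain "n" and -1 with "n!".
my $bytes = "\xff\xff";
printf "n: %d, n!: %d\n", unpack("n", $bytes), unpack("n!", $bytes);   # n: 65535, n!: -1
```

Because the directive letter carries the byte order, the same script produces the same bytes on an x86 laptop and on a big-endian server.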
## The general modifiers: `<` and `>`

Any integer directive (`s`, `S`, `l`, `L`, `i`, `I`, `q`, `Q`, `j`, `J`) and every float directive (`f`, `d`, `F`, `D`) accepts an endianness modifier:

| Modifier | Meaning       | Mnemonic                                      |
|----------|---------------|-----------------------------------------------|
| `>`      | Big-endian    | The "big end" touches the directive letter    |
| `<`      | Little-endian | The "little end" touches the directive letter |

Read `l>` as "long, big end" and `l<` as "long, little end".

```perl
my $be32 = pack "l>", -1_000_000;   # big-endian 32-bit signed
my $le64 = pack "q<", 2 ** 40;      # little-endian 64-bit signed
my $bed  = pack "d>", 3.14159;      # big-endian IEEE 754 double
```

For the common 16- and 32-bit unsigned cases, `n` / `N` / `v` / `V` are shorter and should be preferred. Use `<` / `>` when:

- the field is signed (`n` and `N` are unsigned unless you add `!`)
- the field is 64-bit: the shortcuts have no counterpart for `q` / `Q`
- the field is a float: the shortcuts cover integers only
- you want to apply one byte order to a whole group (see below)

## Applying endianness to a group

If every integer in a sub-structure shares the same byte order, write the modifier once on the group:

```perl
my $rec = pack "(s l l)<", $flags, $x, $y;   # same as "s< l< l<"
```

## Floats

IEEE 754 fixes how a float's bits are laid out, not the order in which its bytes are stored, so a float that leaves the machine needs `<` or `>` just like an integer:

```perl
my $e = pack "d<", 2.718281828;   # little-endian IEEE 754 double
```

Mismatched hardware (non-IEEE float formats, exotic long double layouts) cannot be repaired by `<` / `>` alone. For portability across odd platforms, send floats as text, for example the hexadecimal output of `sprintf "%a"`, which round-trips exactly.

## Worked example: parsing a BMP header

The BMP bitmap format begins with the ASCII magic `BM` followed by three little-endian 32-bit fields: file size, reserved, and pixel data offset. (The reserved field is really two 16-bit words; reading them as one `V` is fine when you only need to skip them.)

```perl
my ($magic, $filesize, $reserved, $pixoff) = unpack "a2 V V V", $bmp_header;
die "not a BMP" unless $magic eq "BM";
```

`V` is little-endian 32-bit unsigned, exactly what the spec demands. `a2` reads two bytes verbatim; `A2` would also strip trailing spaces and nulls, which happens to be harmless for `BM` but is the wrong habit for binary data.
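To try the `unpack` line without a real bitmap on disk, you can build a header with `pack` first. The sizes below (a 70-byte file with pixel data at offset 54) are hypothetical values chosen for illustration, and the two 16-bit reserved words are packed as a single zero `V`.

```perl
use strict;
use warnings;

# Hypothetical 14-byte BMP file header: magic, file size,
# reserved words (one 32-bit zero), pixel data offset.
my $bmp_header = pack "a2 V V V", "BM", 70, 0, 54;

my ($magic, $filesize, $reserved, $pixoff) = unpack "a2 V V V", $bmp_header;
die "not a BMP" unless $magic eq "BM";

print length($bmp_header), " bytes\n";                # 14 bytes
print "size=$filesize offset=$pixoff\n";              # size=70 offset=54
print unpack("H*", substr($bmp_header, 2, 4)), "\n";  # 46000000
```

The last line shows the file size stored lowest byte first: 70 is `0x46`, so the bytes are `46 00 00 00`.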
Every field is the right shape: no byte-swapping required, no platform-dependent behaviour.

## What to remember

- **Wire format? Use `n` / `N` / `v` / `V`**, or `<` / `>` if you need signed, 64-bit, or float fields.
- **Staying inside one process? Native is fine**, but make that choice deliberately and say so in the code.
- **IEEE 754 is not enough.** If a float crosses a machine, pin the byte order.

With byte counts and byte order under control, the remaining directives (strings, positioning, groups) fall out naturally.