---
name: Strings, bits, and nybbles
---

# Strings, bits, and nybbles

**By the end of this chapter you will be able to** pack and unpack the three text-ish families of directives: strings (fixed-width and C-style NUL-terminated), bit strings, and hex strings.

Binary formats carry string data in several distinct flavours: raw bytes, space-padded text, NUL-terminated C strings, bit fields, and hex-encoded numbers. [`pack`](../../p5/core/perlfunc/pack) has a directive for each.

## The three string directives: `a` / `A` / `Z`

All three pack exactly one value into a fixed width. They differ in what they pad with and, crucially, in what [`unpack`](../../p5/core/perlfunc/unpack) strips back off:

| Letter | Pad byte      | Unpack returns                       | Typical use             |
|--------|---------------|--------------------------------------|-------------------------|
| `a`    | `"\0"` (NUL)  | All bytes unchanged                  | Arbitrary binary        |
| `A`    | `" "` (space) | Trailing whitespace and NUL stripped | ASCII fixed-width text  |
| `Z`    | `"\0"` (NUL)  | Bytes up to the first NUL            | C-style NUL-terminated  |

The width is the **repeat count**, not a count of values:

```perl
pack "a4", "hi"        # "hi\0\0"
pack "A4", "hi"        # "hi  "
pack "Z4", "hi"        # "hi\0\0"
pack "a4", "abcdef"    # "abcd" (truncated)
pack "A*", "hello"     # "hello" (width taken from the value's length)
```

### `Z` guarantees a trailing NUL, with a caveat

`Z` always reserves space for at least one terminating NUL:

```perl
pack "Z*", "hello"     # "hello\0" (a NUL is appended)
pack "Z5", "hello"     # "hell\0"  (data truncated to make room!)
```

If a C program on the other side expects a zero-terminated `char[32]`, use `Z32`. It packs up to 31 bytes of data plus the terminator.

### Picking between `a` and `A` for text

- The data is ASCII, padded with spaces on disk → `A`.
- The data is arbitrary bytes (might contain spaces or NULs as valid content) → `a`.

The classic trap: you pack a name with `A20`, but the wire format uses NUL padding.
`unpack "A20"` strips both spaces *and* NULs, so reading looks fine, but `pack "A20", "bob"` writes `bob` followed by 17 spaces, not NULs. If the spec says NUL padding, use `a` on the pack side (and keep `A` or `Z` for unpacking if you want the padding stripped).

## Worked example: fixed-width text records

A ledger file lays data out in columns:

```text
0         1         2         3         4
0123456789012345678901234567890123456789012345678
2026-04-22 coffee at the station           3.50
2026-04-23 train to Brussels              42.00
```

Counting from 1, columns 1-10 hold the date, 12-38 the description, and 40-47 the amount (the ruler counts from 0, so its positions are one lower). The gaps are single spaces (bytes 11 and 39). `unpack` with `A10 x A27 x A*` peels each record apart:

```perl
while (<$ledger>) {
    chomp;
    my ($date, $desc, $amount) = unpack "A10 x A27 x A*", $_;
    print " $date | $desc | $amount\n";
}
```

`x` skips one byte; `A27` takes the description column and strips trailing spaces; `A*` takes all remaining bytes as the amount. `A` strips only trailing whitespace, so a right-aligned amount keeps its leading spaces.

On the output side, widen each leading `A` field by one byte so the gap is written as part of the field's space padding:

```perl
# A11 = 10-byte date + gap, A28 = 27-byte description + gap,
# then the amount right-aligned in its 8-byte column.
my $line = pack "A11 A28 A*",
    $date, $desc, sprintf("%8.2f", $amount);
```

Notice the extra byte on the `A11` and `A28` widths: that single extra byte is the column gap. A consistent byte budget per column is what makes fixed-width records tractable at all.

## Bit strings: `b` and `B`

A bit string is a string of `"0"` and `"1"` characters that packs into actual bits. Two directives handle them, differing only in the bit order within each byte:

| Directive | Bit order within each byte             |
|-----------|----------------------------------------|
| `b`       | LSB first: bit 0, 1, 2, 3, 4, 5, 6, 7  |
| `B`       | MSB first: bit 7, 6, 5, 4, 3, 2, 1, 0  |

The repeat count is the number of **bits**, not bytes:

```perl
pack "B8", "10001100"   # "\x8c": MSB first, so bits 7, 3, and 2 are set
pack "b8", "00110001"   # "\x8c": LSB first, same byte
```

`B` matches the usual "left-to-right is high-to-low" convention you see in binary dumps; `b` matches the convention used by some hardware registers and the [`vec`](../../p5/core/perlfunc/vec) built-in.
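
The correspondence between `b` and `vec` can be checked directly. This is a minimal sketch, assuming a one-byte buffer; the variable names are illustrative and not part of the chapter:

```perl
use strict;
use warnings;

# Start with a single zero byte and set bit 0 through vec().
my $buf = "\0";
vec($buf, 0, 1) = 1;                 # bit 0 is the least significant bit

my $lsb_first = unpack "b8", $buf;   # bit 0 is the first character
my $msb_first = unpack "B8", $buf;   # bit 0 is the last character

print "b8: $lsb_first\n";            # b8: 10000000
print "B8: $msb_first\n";            # B8: 00000001
printf "byte: 0x%02x\n", ord($buf);  # byte: 0x01
```

The `b8` string lists bits in the same order `vec` numbers them, which is why the two pair naturally.
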
Use `B` when you are writing out a "normal" binary number, `b` when you are mirroring a data sheet or register layout that lists bit 0 first.

### Counting set bits

One of the more surprising uses of `unpack`: count the set bits in a buffer with a single call. The `%32` prefix replaces the unpacked values with their 32-bit checksum; applied to `b*`, the values are the individual bits, so the sum is the number of set bits:

```perl
my $n_bits = unpack "%32b*", $mask;
```

The `%N` prefix is unpack-only; see the [positioning chapter](positioning) for the other uses.

## Hex strings: `h` and `H`

Hex strings are the text representation most developers read: two hex digits per byte. The repeat count is the number of hex digits (nybbles), not bytes. Two directives, differing in nybble order:

| Directive | Nybble order      |
|-----------|-------------------|
| `h`       | Low nybble first  |
| `H`       | High nybble first |

`H` is the "normal" hex dump, read left to right, high digit then low:

```perl
pack   "H*", "deadbeef"            # "\xde\xad\xbe\xef"
unpack "H*", "\xde\xad\xbe\xef"    # "deadbeef"
```

`h` reverses each byte's nybbles, which is rarely what you want unless a spec explicitly says so. If the round-trip you expect is *not* what you get, try the other letter.

```perl
pack "h*", "ef"    # "\xfe" (nybbles swapped)
pack "H*", "ef"    # "\xef"
```

## Choosing between them

| You have                                     | Reach for |
|----------------------------------------------|-----------|
| ASCII in a fixed-width column on disk        | `A`       |
| Arbitrary bytes in a fixed-width slot        | `a`       |
| A C-style NUL-terminated string              | `Z`       |
| A binary number as a string of `0` / `1`     | `B`       |
| Same, with the register-convention bit order | `b`       |
| A hex string in the normal reading order     | `H`       |
| A hex string with nybbles swapped            | `h`       |

Bits, nybbles, and string directives all share one rule: the repeat count specifies the width of the *single* value being packed (in bytes, bits, or hex digits respectively), not the number of values. That's the one fact to carry forward.

Next up: `()` groups and repeat counts, the tools for applying a pattern of directives to a list of values of unknown length.
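
Before moving on, the shared repeat-count rule can be checked with a short sketch. The values and variable names here are illustrative assumptions, chosen to reuse the examples from this chapter:

```perl
use strict;
use warnings;

# Each directive takes ONE value; its count is a width in its own unit.
my $text = pack "A4", "hi";          # 4 bytes, space-padded
my $bits = pack "B8", "10001100";    # 8 bits, so 1 byte
my $hex  = pack "H8", "deadbeef";    # 8 hex digits, so 4 bytes

printf "%d %d %d\n",
    length($text), length($bits), length($hex);   # 4 1 4

# Three directives therefore consume exactly three values.
my $record = pack "A4 B8 H8", "hi", "10001100", "deadbeef";
my ($txt, $bit_str, $hex_str) = unpack "A4 B8 H8", $record;

print length($record), "\n";          # 9
print "$txt|$bit_str|$hex_str\n";     # hi|10001100|deadbeef
```

The template lists three directives and the record holds three values, however large the counts are; only the packed length changes with the widths.
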