Strings, bits, and nybbles#
By the end of this chapter you will be able to pack and unpack fixed-width strings (including C-style NUL-terminated strings), bit strings, and hex strings: the three text-ish families of directives.
Binary formats carry string data in several distinct flavours: raw bytes, space-padded text, NUL-terminated C strings, bit fields, hex-encoded numbers. pack has a directive for each.
The three string directives: a / A / Z#
All three pack exactly one value into a fixed width. They differ in what they pad with, and — crucially — in what unpack strips back off:
| Letter | Pad byte | Unpack returns | Typical use |
|---|---|---|---|
| a | NUL | All bytes unchanged | Arbitrary binary |
| A | Space | Trailing whitespace and NUL stripped | ASCII fixed-width text |
| Z | NUL | Bytes up to the first NUL | C-style NUL-terminated |
The width is the repeat count, not a count of values:
pack "a4", "hi" # "hi\0\0"
pack "A4", "hi" # "hi "
pack "Z4", "hi" # "hi\0\0"
pack "a4", "abcdef" # "abcd" — truncated
pack "A*", "hello" # "hello" — whatever the value's length is
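The unpack side of the same three letters can be sketched as follows. The sample field and variable names are made up for illustration; the field deliberately mixes a trailing space with NUL padding so the three results differ.

```perl
use strict;
use warnings;

# One 8-byte field: "hi", one space, then five NUL bytes of padding.
my $field = "hi \0\0\0\0\0";

my ($raw)     = unpack "a8", $field;   # nothing stripped
my ($trimmed) = unpack "A8", $field;   # trailing spaces and NULs stripped
my ($cstr)    = unpack "Z8", $field;   # everything before the first NUL

printf "a8: %d bytes\n", length $raw;       # a8: 8 bytes
printf "A8: %d bytes\n", length $trimmed;   # A8: 2 bytes
printf "Z8: %d bytes\n", length $cstr;      # Z8: 3 bytes (the space survives)
```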
Z guarantees a trailing NUL — with a caveat#
Z always reserves space for at least one terminating NUL:
pack "Z*", "hello" # "hello\0" — a free NUL appended
pack "Z5", "hello" # "hell\0" — truncated to make room!
If a C program on the other side expects a zero-terminated char[32], use Z32. It will pack up to 31 bytes of data plus the terminator.
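A minimal sketch of the char[32] case, assuming a hypothetical C struct with a single 32-byte name field. It checks that the packed slot is always 32 bytes and always ends in a NUL, even when the input is too long.

```perl
use strict;
use warnings;

# Hypothetical C layout on the other side: struct { char name[32]; }
my $short = pack "Z32", "alice";
my $long  = pack "Z32", "x" x 40;      # longer than the slot

printf "%d %d\n", length $short, length $long;   # 32 32

# The last byte is the terminator, so only 31 data bytes fit.
print "terminated\n" if substr($long, -1) eq "\0";
my ($name) = unpack "Z32", $long;
print length($name), "\n";             # 31
```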
Picking between a and A for text#
The data is ASCII, padded with spaces on disk → A.
The data is arbitrary bytes (might contain spaces or NULs as valid content) → a.
The classic trap: you pack a name with A20 and the wire format uses NUL padding. Reading looks fine, because unpack "A20" strips both spaces and NULs, but pack "A20", "bob" writes "bob" followed by 17 spaces, not NULs. If the spec says NUL padding, use a on the pack side.
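The trap is easy to reproduce. In this sketch (variable names are illustrative) both fields unpack to the same text, yet the bytes that would go on the wire differ.

```perl
use strict;
use warnings;

my $spaces = pack "A20", "bob";   # "bob" followed by 17 spaces
my $nuls   = pack "a20", "bob";   # "bob" followed by 17 NULs

# unpack "A20" hides the difference between the two paddings...
my ($from_spaces) = unpack "A20", $spaces;
my ($from_nuls)   = unpack "A20", $nuls;
print "same text\n" if $from_spaces eq $from_nuls;

# ...but the packed bytes are not the same.
print "different bytes\n" if $spaces ne $nuls;
```

Comparing the packed output against a known-good sample from the spec, rather than only round-tripping through unpack, is one way to catch this early.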
Worked example: fixed-width text records#
A ledger file lays data out in columns:
0         1         2         3         4
0123456789012345678901234567890123456789012345678
2026-04-22 coffee at the station           3.50
2026-04-23 train to Brussels              42.00
Counting from 1, columns 1-10 hold the date, 12-38 the description, and 40-47 the amount. The gaps are single spaces (bytes 11 and 39). unpack with A10 x A27 x A* peels each record apart:
while (<$ledger>) {
    chomp;
    my ($date, $desc, $amount) = unpack "A10 x A27 x A*", $_;
    print " $date | $desc | $amount\n";
}
x skips one byte; A27 eats the description column and strips trailing spaces; A* greedy-eats the remaining bytes as the amount. The output side uses wider fields so spaces survive the round-trip:
my $line = pack "A11 A28 A8 A*", $date, $desc,
    sprintf("%.2f", $amt_left),
    sprintf("%12.2f", $amt_right);
Notice the extra byte on each A width — that single extra byte is the column gap. A consistent byte budget per column is what makes fixed-width records tractable at all.
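To see both directions in one place, here is a self-contained sketch that builds records in the layout described above and then peels them apart again. It is a simplified variant with a single amount column; the in-memory records and the right-aligned 8-byte amount are assumptions made for this example, not part of the ledger format itself.

```perl
use strict;
use warnings;

# Sample data standing in for the ledger file (illustrative only).
my @records = (
    [ "2026-04-22", "coffee at the station",  3.50 ],
    [ "2026-04-23", "train to Brussels",     42.00 ],
);

# Write side: 10 + 1 gap, 27 + 1 gap, then an 8-byte amount = 47 bytes.
my @lines = map {
    my ($date, $desc, $amt) = @$_;
    pack "A11 A28 A*", $date, $desc, sprintf("%8.2f", $amt);
} @records;

# Read side: the same template the loop above uses.
for my $line (@lines) {
    my ($date, $desc, $amount) = unpack "A10 x A27 x A*", $line;
    $amount =~ s/^\s+//;    # A strips trailing padding only, not leading
    print "$date | $desc | $amount\n";
}
```

Note the leading-space cleanup on the amount: A only strips at the end of a field, so a right-aligned number keeps its leading spaces after unpack.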
Bit strings: b and B#
A bit string is a string of "0" and "1" characters that packs into actual bits. Two directives, differing only in which direction you read each byte:
| Directive | Bit order within each byte |
|---|---|
| b | LSB first: bit 0, 1, 2, 3, 4, 5, 6, 7 |
| B | MSB first: bit 7, 6, 5, 4, 3, 2, 1, 0 |
The repeat count is the number of bits, not bytes:
pack "B8", "10001100" # "\x8c" — MSB first, bit 7 set, bit 3 & 2 set
pack "b8", "00110001" # "\x8c" — LSB first, same byte
B matches the usual «left-to-right is high-to-low» convention you see in binary dumps; b matches the ascending bit numbering used by the vec built-in and by some hardware register tables. Use B when you are writing out a «normal» binary number, b when you are mirroring a data sheet that lists bit 0 first.
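A short sketch of the two orders on the unpack side, using the same byte as above. The variable names are illustrative. With b, character N of the string is bit N of the byte, which makes it convenient for testing individual flags.

```perl
use strict;
use warnings;

my $byte = "\x8c";                      # binary 1000 1100

my $msb_first = unpack "B8", $byte;     # "10001100"
my $lsb_first = unpack "b8", $byte;     # "00110001"

# Split the b-string so the array index equals the bit number.
my @bit = split //, $lsb_first;
print "bit 7 set\n"   if $bit[7];
print "bit 0 clear\n" unless $bit[0];

# vec numbers bits within a byte the same way as b.
print "vec agrees\n" if vec($byte, 7, 1) == $bit[7];
```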
Counting set bits#
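One of the more surprising uses of unpack: count set bits in a buffer with a single call. The %32 prefix asks unpack to return a 32-bit sum of the unpacked values instead of the values themselves; with b* those values are the individual bits, so the sum is the number of set bits: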
my $n_bits = unpack "%32b*", $mask;
The %N prefix is unpack-only — see the positioning chapter for the other uses.
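As a sanity check, the checksum form can be compared with the long way round: expand the buffer to a string of "0" and "1" characters and count the ones. The sample mask below is made up for this sketch.

```perl
use strict;
use warnings;

# Sample mask chosen for illustration: 8 + 4 + 1 = 13 set bits.
my $mask = "\xff\x0f\x01";

# Checksum form: sum the unpacked bits instead of returning them.
my $n_bits = unpack "%32b*", $mask;

# Long form: expand to a bit string, then count the "1" characters.
my $bits   = unpack "b*", $mask;
my $manual = ($bits =~ tr/1//);

print "$n_bits $manual\n";    # 13 13
```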
Hex strings: h and H#
Hex strings are the text representation most developers read — two hex digits per byte. Two directives, differing in nybble order:
| Directive | Nybble order |
|---|---|
| h | Low nybble first |
| H | High nybble first |
H is the «normal» hex dump — read left to right, high digit then low:
pack "H*", "deadbeef" # "\xde\xad\xbe\xef"
unpack "H*", "\xde\xad\xbe\xef" # "deadbeef"
h reverses each byte’s nybbles — rarely what you want unless a spec explicitly says so. If the round-trip you expect is not what you get, try the other letter.
pack "h*", "ef" # "\xfe" (nybbles swapped)
pack "H*", "ef" # "\xef"
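The same comparison on the unpack side, as a small sketch with illustrative variable names: H gives the familiar hex dump and round-trips through pack, while h shows each byte with its two digits swapped.

```perl
use strict;
use warnings;

my $bytes = "\xde\xad\xbe\xef";

my $hex     = unpack "H*", $bytes;    # "deadbeef"
my $swapped = unpack "h*", $bytes;    # "eddaebfe"

# Each letter round-trips with itself, not with the other one.
print "H round-trips\n" if pack("H*", $hex)     eq $bytes;
print "h round-trips\n" if pack("h*", $swapped) eq $bytes;
print "mixing letters does not\n" if pack("H*", $swapped) ne $bytes;
```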
Choosing between them#
| You have | Reach for |
|---|---|
| ASCII in a fixed-width column on disk | A |
| Arbitrary bytes in a fixed-width slot | a |
| A C-style NUL-terminated string | Z |
| A binary number as a string of "0" and "1" characters | B |
| Same, with the register-convention bit order | b |
| A hex string in the normal reading order | H |
| A hex string with nybbles swapped | h |
Bits, nybbles, and string directives all share one rule: the repeat count specifies the width of the single value being packed, not the number of values. That’s the one fact to carry forward.
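A quick way to convince yourself of that rule is to measure the output. In this sketch (values chosen for illustration) each directive consumes exactly one argument, and the count sets its width in bytes, bits, or nybbles respectively.

```perl
use strict;
use warnings;

my $text = pack "a4",  "hi";                 # width: 4 bytes
my $bits = pack "B16", "1000110000000001";   # width: 16 bits   = 2 bytes
my $hex  = pack "H8",  "deadbeef";           # width: 8 nybbles = 4 bytes

printf "%d %d %d\n", length $text, length $bits, length $hex;   # 4 2 4

# Two values need two directives; each still consumes a single value.
my $pair = pack "a2 a2", "ab", "cd";
print "$pair\n";                             # abcd
```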
Next up: () groups and repeat counts, the tools for applying a pattern of directives to a list of values of unknown length.