Strings, bits, and nybbles
By the end of this chapter you will be able to pack and unpack fixed-width and C-style NUL-terminated strings, bit strings, and hex strings: the three text-ish families of directives.
Binary formats carry string data in several distinct flavours: raw
bytes, space-padded text, NUL-terminated C strings, bit fields,
hex-encoded numbers. pack has a
directive for each.
The three string directives: a / A / Z
All three pack exactly one value into a fixed width. They differ in
what they pad with, and — crucially — in what unpack
strips back off:
| Letter | Pad byte | Unpack returns | Typical use |
|---|---|---|---|
| a | NUL (\0) | All bytes unchanged | Arbitrary binary |
| A | Space | Trailing whitespace and NUL stripped | ASCII fixed-width text |
| Z | NUL (\0) | Bytes up to the first NUL | C-style NUL-terminated |
The width is the repeat count, not a count of values:
pack "a4", "hi" # "hi\0\0"
pack "A4", "hi" # "hi  " (two trailing spaces)
pack "Z4", "hi" # "hi\0\0"
pack "a4", "abcdef" # "abcd" — truncated
pack "A*", "hello" # "hello" — whatever the value's length is
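The unpack side of the table can be checked with a small sketch. The 6-byte field and its contents below are made-up test data, chosen so that each directive returns a different length:

```perl
use strict;
use warnings;

# Pack the same value into a 6-byte field with each directive.
my $raw_a = pack "a6", "hi";    # NUL padded
my $raw_A = pack "A6", "hi";    # space padded
my $raw_Z = pack "Z6", "hi";    # NUL padded, last byte always NUL

# Hypothetical field: "hi", one space, then three NULs.
my $field = "hi \0\0\0";

my $got_a = unpack "a6", $field;    # all six bytes, untouched
my $got_A = unpack "A6", $field;    # trailing spaces and NULs removed
my $got_Z = unpack "Z6", $field;    # everything before the first NUL

printf "a: %d bytes, A: %d bytes, Z: %d bytes\n",
    length($got_a), length($got_A), length($got_Z);
```

The three lengths differ because a keeps the padding, A strips every trailing space and NUL, and Z stops at the first NUL but keeps the space in front of it.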
Z guarantees a trailing NUL — with a caveat
Z always reserves space for at least one terminating NUL:
pack "Z*", "hello" # "hello\0" — a free NUL appended
pack "Z5", "hello" # "hell\0" — truncated to make room!
If a C program on the other side expects a zero-terminated
char[32], use Z32. It will pack up to 31 bytes of data plus the
terminator.
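A minimal sketch of the char[32] case, assuming a hypothetical name field; the sample values are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical C-side field: char name[32];
my $short = pack "Z32", "alice";
my $long  = pack "Z32", "x" x 40;    # longer than the field

# Both results are exactly 32 bytes, and the last byte is always NUL.
print length($short), " ", length($long), "\n";

# The over-long value keeps only 31 data bytes.
my $back = unpack "Z32", $long;
print length($back), "\n";
```

Truncation is silent, so a caller that cares about losing data has to compare lengths before packing.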
Picking between a and A for text
The data is ASCII, padded with spaces on disk → A.
The data is arbitrary bytes (might contain spaces or NULs as valid content) → a.
The classic trap: you pack a name with A20 and the wire format
uses NUL padding. unpack "A20" strips both spaces and NULs, so
reading looks fine, but pack "A20", "bob" writes "bob" followed by
17 spaces, not NULs. If the spec says NUL padding, use a on the
pack side.
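The trap is easy to reproduce. In this sketch $wire stands in for whatever a NUL-padding peer would send; it is built by hand here rather than read from a real connection:

```perl
use strict;
use warnings;

# Assumed wire data: "bob" in a 20-byte, NUL-padded field.
my $wire = "bob" . ("\0" x 17);

my $name   = unpack "A20", $wire;    # "bob": reading looks fine
my $with_A = pack "A20", $name;      # padded with spaces
my $with_a = pack "a20", $name;      # padded with NULs

print $with_A eq $wire ? "A matches the wire\n" : "A differs from the wire\n";
print $with_a eq $wire ? "a matches the wire\n" : "a differs from the wire\n";
```

Reading with A and writing with a is a reasonable pairing for NUL-padded text: A cleans up the padding on the way in, and a restores it on the way out.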
Worked example: fixed-width text records
A ledger file lays data out in columns:
0         1         2         3         4
0123456789012345678901234567890123456789012345678
2026-04-22 coffee at the station       3.50
2026-04-23 train to Brussels           42.00
Counting from 1 (the ruler above counts from 0), columns 1-10 hold the date, 12-38 the description, and 40-47 the amount.
The gaps are single spaces (byte 11 and byte 39). unpack with
A10 x A27 x A* peels each record apart:
while (<$ledger>) {
    chomp;
    my ($date, $desc, $amount) = unpack "A10 x A27 x A*", $_;
    print " $date | $desc | $amount\n";
}
x skips one byte; A27 eats the description column and strips
trailing spaces; A* greedy-eats the remaining bytes as the amount.
The output side uses wider fields so spaces survive the round-trip:
my $line = pack "A11 A28 A8 A*", $date, $desc,
    sprintf("%.2f", $amt_left),
    sprintf("%12.2f", $amt_right);
Notice the extra byte on the first two widths: A11 for the 10-byte
date and A28 for the 27-byte description. That single extra byte is
the column gap. A consistent byte budget per column is what makes
fixed-width records tractable at all.
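The whole round-trip fits in a short, self-contained sketch. It builds the two sample records in memory with pack instead of reading them from the $ledger handle, and the running total is an addition for illustration only:

```perl
use strict;
use warnings;

# Build the sample records with the wider output widths: 11 + 28 + 8 = 47 bytes.
my @records = (
    pack("A11 A28 A8", "2026-04-22", "coffee at the station", "3.50"),
    pack("A11 A28 A8", "2026-04-23", "train to Brussels",     "42.00"),
);

my $total = 0;
for my $rec (@records) {
    # Same template as the reader loop: skip the one-byte gaps with x.
    my ($date, $desc, $amount) = unpack "A10 x A27 x A*", $rec;
    $total += $amount;
    print "$date | $desc | $amount\n";
}
printf "total %.2f\n", $total;
```

Note that x cannot be used for the gaps on the pack side if the gaps must be spaces, because pack writes a NUL byte for x. Widening the A fields by one byte is what produces a space there.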
Bit strings: b and B
A bit string is a string of "0" and "1" characters that packs
into actual bits. Two directives, differing only in which direction
you read each byte:
| Directive | Bit order within each byte |
|---|---|
| b | LSB first: bit 0, 1, 2, 3, 4, 5, 6, 7 |
| B | MSB first: bit 7, 6, 5, 4, 3, 2, 1, 0 |
The repeat count is the number of bits, not bytes:
pack "B8", "10001100" # "\x8c" — MSB first, bit 7 set, bit 3 & 2 set
pack "b8", "00110001" # "\x8c" — LSB first, same byte
B matches the usual “left-to-right is high-to-low” convention you
see in binary dumps; b matches the convention used by some
hardware registers and the vec
built-in. Use B when you are writing out a “normal” binary
number, b when you are mirroring a data sheet that lists bit 0
first, on the left.
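A short sketch makes the two bit orders, and the link to vec, concrete. It reuses the byte \x8c from the example above; the variable names are illustrative:

```perl
use strict;
use warnings;

my $byte = "\x8c";

my $msb_first = unpack "B8", $byte;    # "10001100"
my $lsb_first = unpack "b8", $byte;    # "00110001"

# For a single byte the two strings are mirror images of each other.
my $mirrored = $msb_first eq scalar reverse($lsb_first);

# vec() numbers bits the same way b does: index 0 is the low bit.
my $from_vec = join "", map { vec($byte, $_, 1) } 0 .. 7;

print "B8:  $msb_first\n";
print "b8:  $lsb_first\n";
print "vec: $from_vec\n";
```

The mirror relationship holds per byte only. Over a multi-byte buffer the bytes stay in the same order and only the bits within each byte are flipped.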
Counting set bits
One of the more surprising uses of unpack: count set bits in a
buffer with a single call. %32b* unpacks the bits and asks for a
32-bit sum, which is the count:
my $n_bits = unpack "%32b*", $mask;
The %N prefix is unpack-only — see the
positioning chapter for the other uses.
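To see that the checksum really is the bit count, it can be compared against a slower, more obvious count. The three-byte $mask below is arbitrary test data:

```perl
use strict;
use warnings;

# Arbitrary test mask: 8 + 4 + 1 = 13 bits set.
my $mask = "\xff\x0f\x01";

my $n_bits = unpack "%32b*", $mask;

# Cross-check: expand to a "0"/"1" string and count the "1" characters.
my $bits  = unpack "b*", $mask;
my $check = ($bits =~ tr/1//);

print "checksum: $n_bits, counted: $check\n";
```

B would work equally well here, since the order of the bits does not affect their sum.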
Hex strings: h and H
Hex strings are the text representation most developers read — two hex digits per byte. Two directives, differing in nybble order:
| Directive | Nybble order |
|---|---|
| h | Low nybble first |
| H | High nybble first |
H is the “normal” hex dump — read left to right, high digit then
low:
pack "H*", "deadbeef" # "\xde\xad\xbe\xef"
unpack "H*", "\xde\xad\xbe\xef" # "deadbeef"
h reverses each byte’s nybbles — rarely what you want unless a
spec explicitly says so. If the round-trip you expect is not what
you get, try the other letter.
pack "h*", "ef" # "\xfe" (nybbles swapped)
pack "H*", "ef" # "\xef"
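Unpacking the same bytes with both letters shows exactly what "swapped" means. This is a small sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $bytes = pack "H*", "deadbeef";    # four bytes

my $hex_H = unpack "H*", $bytes;      # "deadbeef"
my $hex_h = unpack "h*", $bytes;      # "eddaebfe": digits swapped within each pair

# Each letter round-trips with itself, never with the other one.
my $back_h = pack "h*", $hex_h;

print "H: $hex_H\n";
print "h: $hex_h\n";
print "h round-trips\n" if $back_h eq $bytes;
```

The byte order is the same in both strings. Only the two digits inside each byte change places.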
Choosing between them
| You have | Reach for |
|---|---|
| ASCII in a fixed-width column on disk | A |
| Arbitrary bytes in a fixed-width slot | a |
| A C-style NUL-terminated string | Z |
| A binary number as a string of "0" and "1" characters | B |
| Same, with the register-convention bit order | b |
| A hex string in the normal reading order | H |
| A hex string with nybbles swapped | h |
Bits, nybbles, and string directives all share one rule: the repeat count specifies the width of the single value being packed, not the number of values. That’s the one fact to carry forward.
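The unit of that width depends on the family: bytes for a, A and Z, bits for b and B, and hex digits for h and H. A quick sketch with arbitrary sample values:

```perl
use strict;
use warnings;

my $as_bytes   = pack "a4", "x";       # count = 4 bytes
my $as_bits    = pack "B4", "1010";    # count = 4 bits, rounded up to 1 byte
my $as_nybbles = pack "H4", "dead";    # count = 4 hex digits, i.e. 2 bytes

# Two values need two directives; a count never consumes a second value.
my $two = pack "a4 a4", "ab", "cd";

printf "%d %d %d %d\n",
    map { length($_) } $as_bytes, $as_bits, $as_nybbles, $two;
```

The four lengths printed are 4, 1, 2 and 8 bytes.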
Next up: () groups and repeat counts, the tools for applying a
pattern of directives to a list of values of unknown length.