Strings, bits, and nybbles

By the end of this chapter you will be able to pack and unpack fixed-width strings, C-style NUL-terminated strings, bit strings, and hex strings. These fall into the three text-ish families of directives: a/A/Z for strings, b/B for bits, and h/H for hex.

Binary formats carry string data in several distinct flavours: raw bytes, space-padded text, NUL-terminated C strings, bit fields, hex-encoded numbers. pack has a directive for each.

The three string directives: a / A / Z

All three pack exactly one value into a fixed width. They differ in what they pad with, and — crucially — in what unpack strips back off:

| Letter | Pad byte | Unpack returns | Typical use |
|---|---|---|---|
| a | "\0" (NUL) | All bytes unchanged | Arbitrary binary |
| A | " " (space) | Trailing whitespace and NULs stripped | ASCII fixed-width text |
| Z | "\0" (NUL) | Bytes up to the first NUL | C-style NUL-terminated strings |

The width is the repeat count, not a count of values:

pack "a4", "hi"           # "hi\0\0"
pack "A4", "hi"           # "hi  "
pack "Z4", "hi"           # "hi\0\0"

pack "a4", "abcdef"       # "abcd"  — truncated
pack "A*", "hello"        # "hello" — whatever the value's length is
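The table's "Unpack returns" column is easiest to see by reading one buffer with all three letters. The sketch below does that. The five-byte buffer is made up for illustration.

```perl
use strict;
use warnings;

# Five bytes: "hi", one space, then two NULs (illustrative data).
my $buf = "hi \0\0";

my $raw  = unpack "a5", $buf;   # all five bytes, untouched
my $text = unpack "A5", $buf;   # trailing spaces and NULs stripped
my $cstr = unpack "Z5", $buf;   # everything before the first NUL

printf "a5 -> %d bytes\n", length $raw;   # 5 bytes
printf "A5 -> '%s'\n", $text;             # 'hi'
printf "Z5 -> '%s'\n", $cstr;             # 'hi ' (the space is kept)
```

Z stops at the first NUL, so it keeps the space in front of it. A strips from the end and removes spaces and NULs alike.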

Z guarantees a trailing NUL, with a caveat

Z always reserves space for at least one terminating NUL:

pack "Z*", "hello"        # "hello\0"   — a free NUL appended
pack "Z5", "hello"        # "hell\0"    — truncated to make room!

If a C program on the other side expects a zero-terminated char[32], use Z32. It will pack up to 31 bytes of data plus the terminator.
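Here is a short sketch of the char[32] case. It checks that the field is always 32 bytes, that the last byte is always NUL, and that short values are NUL-padded. The 40-byte name is hypothetical.

```perl
use strict;
use warnings;

# A hypothetical 40-byte name: too long for a C char[32].
my $long  = "x" x 40;
my $field = pack "Z32", $long;

printf "field length: %d\n", length $field;          # 32
printf "last byte NUL: %s\n",
    substr($field, -1) eq "\0" ? "yes" : "no";       # yes

# Only 31 data bytes fit in front of the terminator.
my $back = unpack "Z32", $field;
printf "data bytes kept: %d\n", length $back;        # 31

# A short value is padded with NULs out to the full width.
my $short = pack "Z32", "bob";
printf "short: %d bytes, %d NULs\n",
    length $short, ($short =~ tr/\0//);              # 32 bytes, 29 NULs
```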

Picking between a and A for text

  • The data is ASCII, padded with spaces on disk → A.

  • The data is arbitrary bytes (might contain spaces or NULs as valid content) → a.

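The classic trap: the wire format uses NUL padding, but you pack a name with A20. Your own round-trip looks fine, because unpack "A20" strips trailing spaces and NULs alike. The bytes you wrote are still wrong: pack "A20", "bob" produces "bob" followed by 17 spaces, not 17 NULs. A reader that expects NUL padding will treat those spaces as part of the name. If the spec says NUL padding, use a on the pack side.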
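This sketch shows why the trap is easy to miss. unpack with A cannot tell the two paddings apart, but the packed bytes differ.

```perl
use strict;
use warnings;

my $spaced = pack "A20", "bob";   # "bob" + 17 spaces
my $nulled = pack "a20", "bob";   # "bob" + 17 NULs

# unpack "A20" strips both kinds of padding, so both come back as "bob"...
my ($x) = unpack "A20", $spaced;
my ($y) = unpack "A20", $nulled;
print "A20 sees '$x' and '$y'\n";

# ...yet the bytes that would go on the wire are different.
print "same bytes? ", ($spaced eq $nulled ? "yes" : "no"), "\n";   # no
print "NULs in the a20 field: ", ($nulled =~ tr/\0//), "\n";       # 17
```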

Worked example: fixed-width text records

A ledger file lays data out in columns:

0         1         2         3         4
0123456789012345678901234567890123456789012345678
2026-04-22 coffee at the station           3.50
2026-04-23 train to Brussels              42.00

Counting byte offsets from 0, as the ruler does: bytes 0-9 hold the date, 11-37 the description, and 39-46 the amount. The gaps are single spaces at offsets 10 and 38. unpack with A10 x A27 x A* peels each record apart:

while (<$ledger>) {
    chomp;
    my ($date, $desc, $amount) = unpack "A10 x A27 x A*", $_;
    print "  $date | $desc | $amount\n";
}

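x skips the one-byte gap. A27 takes the description column and strips its trailing spaces. A* takes all remaining bytes as the amount. A strips only trailing whitespace, so the right-aligned amount keeps its leading spaces; that is harmless once the value is used as a number. On the output side, A pads each field with spaces, so making each text field one byte wider than its column regenerates the gap:

# date + gap, description + gap, amount right-aligned in 8 bytes
my $line = pack "A11 A28 A8", $date, $desc,
                              sprintf("%8.2f", $amount);

A11 and A28 are each one byte wider than the field they carry, and the space that pads that extra byte is the column gap. The amount is the last column, so it needs no gap, and %8.2f right-aligns it within its 8 bytes. A consistent byte budget per column is what makes fixed-width records tractable at all.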
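This sketch runs a full parse-and-repack over the two sample records and checks that each comes back byte for byte. It assumes the layout described above: a 10-byte date, a gap, a 27-byte description, a gap, and an 8-byte right-aligned amount, for 47 bytes per line.

```perl
use strict;
use warnings;

# The two sample records from the ledger above (47 bytes each).
my @records = (
    "2026-04-22 coffee at the station           3.50",
    "2026-04-23 train to Brussels              42.00",
);

my @rebuilt;
for my $line (@records) {
    my ($date, $desc, $amount) = unpack "A10 x A27 x A*", $line;

    # $amount still has leading spaces ("    3.50"); %8.2f reads it
    # as a number and re-right-aligns it in 8 bytes.
    my $out = pack "A11 A28 A8", $date, $desc, sprintf("%8.2f", $amount);

    push @rebuilt, $out;
    printf "%-4s %s\n", ($out eq $line ? "same" : "DIFF"), $out;
}
```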

Bit strings: b and B

A bit string is a string of "0" and "1" characters that packs into actual bits. Two directives, differing only in which direction you read each byte:

| Directive | Bit order within each byte |
|---|---|
| b | LSB first: bit 0, 1, 2, 3, 4, 5, 6, 7 |
| B | MSB first: bit 7, 6, 5, 4, 3, 2, 1, 0 |

The repeat count is the number of bits, not bytes:

pack "B8", "10001100"      # "\x8c" — MSB first, bit 7 set, bit 3 & 2 set
pack "b8", "00110001"      # "\x8c" — LSB first, same byte

B matches the usual “left-to-right is high-to-low” convention of binary dumps and printf’s %b. Use B when you are writing out a “normal” binary number. b lists bit 0 first, the same ascending order the vec built-in uses. Use b when your bit string is indexed from bit 0 upward, for example a spec that lists flags starting at bit 0.
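Both claims can be checked on the same byte used above. B8 agrees with sprintf's %08b, and character $i of a b8 string is the bit that vec($byte, $i, 1) returns. This is a small sketch, not an exhaustive test.

```perl
use strict;
use warnings;

my $byte = "\x8c";

# B reads like an ordinary binary literal: most significant bit first.
my $msb = unpack "B8", $byte;                         # "10001100"
print "B8 matches %08b\n" if $msb eq sprintf("%08b", ord $byte);

# b lists bit 0 first, so character $i is the bit vec() calls bit $i.
my $lsb = unpack "b8", $byte;                         # "00110001"
for my $i (0 .. 7) {
    die "bit $i differs" unless substr($lsb, $i, 1) == vec($byte, $i, 1);
}
print "b8 agrees with vec(): $lsb\n";

# Within a single byte, one order is simply the reverse of the other.
print "reversed\n" if $lsb eq scalar reverse $msb;
```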

Counting set bits

One of the more surprising uses of unpack: count set bits in a buffer with a single call. %32b* unpacks the bits and asks for a 32-bit sum, which is the count:

my $n_bits = unpack "%32b*", $mask;
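Here is a self-contained version with a made-up three-byte mask. It cross-checks the checksum against a plain count of the "1" characters in the bit string.

```perl
use strict;
use warnings;

my $mask = "\x8c\x01\xff";            # hypothetical mask: 3 + 1 + 8 set bits

my $n_bits = unpack "%32b*", $mask;   # 32-bit sum of all the bits

# Cross-check: count the "1" characters in the unpacked bit string.
my $bitstr = unpack "b*", $mask;
my $by_tr  = ($bitstr =~ tr/1//);

print "set bits: $n_bits (tr/// says $by_tr)\n";   # 12 and 12
```

Bit order does not matter for a count, so "%32B*" gives the same answer.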

The %N prefix is unpack-only — see the positioning chapter for the other uses.

Hex strings: h and H

Hex strings are the text representation most developers read — two hex digits per byte. Two directives, differing in nybble order:

| Directive | Nybble order |
|---|---|
| h | Low nybble first |
| H | High nybble first |

H is the “normal” hex dump — read left to right, high digit then low:

pack "H*", "deadbeef"      # "\xde\xad\xbe\xef"
unpack "H*", "\xde\xad\xbe\xef"   # "deadbeef"
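To make "the normal hex dump" concrete, this sketch compares unpack "H*" with a hand-built dump that formats each byte as two lowercase hex digits.

```perl
use strict;
use warnings;

my $bytes = pack "H*", "deadbeef";    # four raw bytes
my $hex   = unpack "H*", $bytes;      # back to "deadbeef"

# The same dump built by hand: two hex digits per byte, high digit first.
my $by_hand = join "", map { sprintf("%02x", ord($_)) } split //, $bytes;

print "$hex\n$by_hand\n";
```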

h reverses each byte’s nybbles — rarely what you want unless a spec explicitly says so. If the round-trip you expect is not what you get, try the other letter.

pack "h*", "ef"            # "\xfe"  (nybbles swapped)
pack "H*", "ef"            # "\xef"
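The relationship between the two letters is mechanical. An h string is the H string with each pair of digits swapped, as this short sketch on two arbitrary bytes shows.

```perl
use strict;
use warnings;

my $data = "\x12\xab";

my $hi_first = unpack "H*", $data;    # "12ab"
my $lo_first = unpack "h*", $data;    # "21ba"

# Swapping each two-digit pair of the H* output yields the h* output.
(my $swapped = $hi_first) =~ s/(.)(.)/$2$1/g;

print "H*: $hi_first\nh*: $lo_first\nswapped H*: $swapped\n";
```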

Choosing between them

| You have | Reach for |
|---|---|
| ASCII in a fixed-width column on disk | A |
| Arbitrary bytes in a fixed-width slot | a |
| A C-style NUL-terminated string | Z |
| A binary number as a string of 0 / 1 | B |
| Same, but listed from bit 0 upward (vec order) | b |
| A hex string in the normal reading order | H |
| A hex string with nybbles swapped | h |

Bits, nybbles, and string directives all share one rule: the repeat count specifies the width of the single value being packed, not the number of values. That’s the one fact to carry forward.
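To see that rule at work across all three families at once, this sketch packs a made-up record. The record layout is hypothetical and illustrative only. Each count is the width of one value: 6 bytes for Z6, 8 bits for B8, 4 nybbles for H4, and 3 bytes for A3.

```perl
use strict;
use warnings;

# Hypothetical record: Z6 name, B8 flag byte, H4 two-byte id, A3 code.
my $rec = pack "Z6 B8 H4 A3", "bob", "10000001", "beef", "ok";

printf "length: %d\n", length $rec;   # 6 + 1 + 2 + 3 = 12 bytes

my ($name, $flags, $id, $code) = unpack "Z6 B8 H4 A3", $rec;
print "$name $flags $id '$code'\n";   # bob 10000001 beef 'ok'
```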

Next up: () groups and repeat counts, the tools for applying a pattern of directives to a list of values of unknown length.