Positioning and padding#

By the end of this chapter you will be able to skip bytes, insert padding, jump to absolute offsets, and align fields to arbitrary boundaries, using the four directives x, X, @, and . (dot).

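Most directives pair one value with a run of bytes. The directives in this chapter break that pairing. In pack, x produces pad bytes without consuming a value from the list, . consumes a value only to decide where the next byte goes, and X and @ just move the current position. Taken together they let you model the gaps and alignments that real binary formats contain.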

x — skip forward / insert NUL#

x is the simplest of the four. In pack it inserts one NUL byte without consuming a value; in unpack it skips one byte without producing a value.

pack   "C x C", 65, 66       # "A\0B"
unpack "C x C", "A\0B"       # (65, 66)

With a repeat count, it inserts or skips that many bytes:

pack "C x3 C", 65, 66        # "A\0\0\0B"

x* means “skip to the end of the string” in unpack, and is equivalent to x0 (no-op) in pack.
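
A minimal sketch of x with a repeat count in both directions. The record layout (a one-byte tag, three pad bytes, a 32-bit big-endian count) is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical record: 1-byte tag, 3 pad bytes, 32-bit big-endian count.
my $rec = pack "C x3 N", 7, 1000;

# The same template reads it back: x3 skips the pad bytes and
# contributes nothing to the returned list.
my ($tag, $count) = unpack "C x3 N", $rec;

printf "%d bytes: %s\n", length($rec), unpack("H*", $rec);
print "tag=$tag count=$count\n";
```

Using one template for both directions keeps the pad bytes from drifting out of sync between the writer and the reader.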

x does not mean “a space”#

The confusing thing about x is that it pads with NUL, not with space. If you want space padding for text, use A with a wider repeat count instead:

# Wrong — Perl inserts a NUL, not a space
pack "A10 x A10", "date", "label"
# "date      \0label     "

# Right — widen the field
pack "A11 A10", "date", "label"
# "date       label     "
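
NUL and space both tend to be invisible on a terminal, so a hex dump is a quick way to tell them apart. A small sketch with made-up field values:

```perl
use strict;
use warnings;

my $with_nul   = pack "A4 x A4", "ab", "cd";   # x adds a NUL between the fields
my $with_space = pack "A5 A4",   "ab", "cd";   # the wider A field pads with a space

# In the dumps, 20 is a space and 00 is a NUL.
my $hex_nul   = unpack "H*", $with_nul;     # 616220200063642020
my $hex_space = unpack "H*", $with_space;   # 616220202063642020

print "$hex_nul\n$hex_space\n";
```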

X — back up#

X moves the read/write position backward by one byte; with a repeat count, by that many bytes. In unpack the bytes stay where they are and are simply read again. In pack, X removes the bytes it backs over from the output, so a following directive writes in their place:

pack "CC X C", 65, 66, 67    # "AC": X drops the 66, then the last C appends 67

X is mostly useful with unpack: read a field, then rewind and re-read the same bytes with a different directive:

# Read a 16-bit length, then re-read the same 2 bytes as two separate bytes
my ($len, $hi, $lo) = unpack "n XX CC", $frame;

Backing up past the start of the string is a fatal error ('X' outside of string).
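
The sketch below exercises X in both directions and shows one way to trap the fatal error with eval. The frame contents are invented for the example:

```perl
use strict;
use warnings;

# unpack: read a 16-bit big-endian length, rewind two bytes, read them singly.
my $frame = pack "n", 0x0102;
my ($len, $hi, $lo) = unpack "n X2 C C", $frame;   # 258, 1, 2

# pack: a trailing X removes the last byte written so far.
my $chopped = pack "A* X", "hello";                # "hell"

# Rewinding past the start of the string dies; eval turns that into a value.
my $ok  = eval { my @v = unpack "C X2", "A"; 1 };
my $err = $ok ? "" : $@;

print "len=$len hi=$hi lo=$lo chopped=$chopped\n";
print "error: $err" if $err;
```

X2 is the same as writing XX; the counted form is easier to read once the rewind is more than a couple of bytes.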

@ — jump to absolute position#

@N sets the current position to byte N within the innermost () group (or from the start of the string if there is no enclosing group). In pack, this zero-fills from the current position forward, or truncates if the current position is past N. Put templates that contain @ in single quotes so that Perl does not try to interpolate @4 as an array:

pack 'C @4 C', 65, 66        # "A\0\0\0B": byte 0 is 'A', byte 4 is 'B'

This is the template-level equivalent of offsetof(struct, field): if a C header file tells you field 2 lives at offset 12, @12 parks you there regardless of what the preceding directives did.
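
As a sketch of the offsetof idea, assume a hypothetical record with a type byte at offset 0, a 32-bit id at offset 4, and 16-bit flags at offset 12. None of these fields come from a real format:

```perl
use strict;
use warnings;

# Hypothetical layout: type at offset 0, id at offset 4, flags at offset 12.
# Single quotes keep Perl from interpolating @4 and @12 as arrays.
my $template = 'C @4 N @12 n';

my $rec = pack $template, 1, 0xDEADBEEF, 42;
my $hex = unpack "H*", $rec;          # 01000000deadbeef00000000002a

# The same template reads the fields back from their fixed offsets.
my ($type, $id, $flags) = unpack $template, $rec;

# Jumping to a position before the current one truncates in pack.
my $cut = pack 'A5 @2', "hello";      # "he"

print "$hex\n";
printf "type=%d id=%#x flags=%d cut=%s\n", $type, $id, $flags, $cut;
```

Because every field is pinned to an absolute offset, a field can be added or removed without recounting the padding around the others.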

@ inside a group#

Within each repetition of a group, @ restarts at 0. That can surprise you:

pack '@1A((@2A)@3A)', qw(X Y Z)
# "\0X\0\0YZ"

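Walking through: the outer @1A puts "X" at byte 1, with a NUL at byte 0. The outer group then opens at byte 2, immediately after "X", and the inner group opens at the same place, so both measure from byte 2. Inside the inner group, @2A zero-fills to that group's offset 2 (absolute byte 4, two NULs) and writes "Y" there. Back in the outer group, @3A moves to the outer group's offset 3 (absolute byte 5), which is already the current position, and writes "Z". Read the spec carefully before using @ inside groups.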
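
A hex dump makes the group-relative offsets easier to check. The second template below is an extra illustration, not part of the original example; it repeats a group three times to show @ restarting on each repetition:

```perl
use strict;
use warnings;

# Hex dump of the nested example: 00 58 00 00 59 5a
my $nested = unpack "H*", pack('@1A((@2A)@3A)', qw(X Y Z));

# A repeated group restarts @ at 0 on every repetition, so each value
# lands at offset 2 of its own three-byte slot.
my $slots = unpack "H*", pack('(@2 C)3', 1, 2, 3);

print "$nested\n";   # 00580000595a
print "$slots\n";    # 000001000002000003
```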

. — absolute position from the data#

. is the value-driven cousin of @. Instead of a fixed repeat count, the position comes from the next value in the list:

pack "C . C", 65, 4, 66     # "A\0\0\0B" — position 4 from the next value

Useful when the offset is computed at runtime — say, from a header field you just read:

my $buf = pack "C . C", $type, $offset, $value;
# NUL-fills up to absolute byte $offset, then writes $value there

In unpack, . returns the current position rather than zero-filling:

my ($byte, $pos) = unpack "C .", "hello";
# $byte = 104 ('h'), $pos = 1

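A repeat count on . selects the origin the offset is measured from. With .*, it is the start of the whole string; with .N for an integer N ≥ 1, the start of the N-th innermost enclosing group (or of the string, if there are fewer than N groups); with .0, the current position. A bare . behaves like .1.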
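
A short sketch of . on both sides. The offset value 6 and the field contents are assumptions chosen only to make the output easy to read:

```perl
use strict;
use warnings;

# pack: the value 4 is consumed as the target position, not written out.
my $packed = unpack "H*", pack("C . C", 65, 4, 66);   # 4100000042

# unpack: . yields the offset reached so far.
my ($word, $pos) = unpack "a3 .", "hello";            # "hel", 3

# A runtime offset (hypothetical value) used to place a 16-bit field.
my $offset  = 6;
my $buf     = pack "C . n", 1, $offset, 0xBEEF;
my $buf_hex = unpack "H*", $buf;                      # 010000000000beef

print "$packed\n$word at $pos\n$buf_hex\n";
```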

The ! modifier: alignment#

x!N and X!N turn x and X into alignment commands: x!N advances to the next multiple of N, X!N rewinds to the previous one, and neither moves if the position is already aligned. This is how you respect C struct padding rules.

# struct { char c; double d; char cc[2]; }
my $s = pack "c x![d] d c2", $c, $d, $c1, $c2;

Reading x![d]: “align forward to a multiple of the width a packed d would take”, which is 8 bytes on most platforms. That matches what most C compilers do for a double field after a char, on the assumption that a type's alignment equals its size. Use [l!], [d], [Q] to name the alignment requirement without having to know the numeric width.

A bare x!0 or x!1 is a no-op.
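
The following sketch checks the padding that x! inserts, measuring the double width at run time rather than assuming 8. The field values are arbitrary:

```perl
use strict;
use warnings;

# Width of a packed native double on this platform (8 on most).
my $dsize = length(pack("d", 0));

# struct { char c; double d; char cc[2]; }
my $s = pack "c x![d] d c2", 1, 2.5, 3, 4;
my ($c, $d, @cc) = unpack "c x![d] d c2", $s;

# Numeric form: pad to a 4-byte boundary before a 32-bit field.
my $aligned = unpack "H*", pack("C x!4 N", 1, 2);   # 0100000000000002

# Already on a boundary, so x!4 adds nothing here.
my $no_pad = length(pack("N x!4 N", 1, 2));         # 8

printf "double=%d bytes, struct=%d bytes\n", $dsize, length($s);
```

The packed string ends right after cc[2]. A C compiler would normally also pad the end of the struct so that sizeof is a multiple of the double's alignment; if the bytes must match sizeof, a trailing x![d] in the template is one way to add that padding.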

Checksums — %N on unpack only#

%N is a prefix on a single directive, not a standalone directive. It says “instead of returning the values the next directive would produce, return their sum, truncated to N bits”. If N is omitted, the checksum is 16 bits wide. Common uses:

my $bytesum = unpack "%32C*", $buf;    # sum every byte
my $setbits = unpack "%32b*", $mask;   # count set bits

% exists only in unpack. There is no % in pack: you cannot pack a checksum directly; compute it yourself and pack the result with whatever integer directive you need.
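
One possible way to combine the two steps, assuming a made-up message format in which a 16-bit big-endian byte sum trails the payload:

```perl
use strict;
use warnings;

my $payload = "hello";

# 16-bit sum of the payload bytes: 104+101+108+108+111 = 532.
my $sum = unpack "%16C*", $payload;

# pack has no %, so the checksum is appended as an ordinary 16-bit field.
my $msg = pack "a* n", $payload, $sum;

# Receiver side: split off the trailing checksum and recompute.
my ($body, $claimed) = unpack "a" . (length($msg) - 2) . " n", $msg;
my $valid = (unpack("%16C*", $body) == $claimed);

# Number of set bits in a two-byte mask: 8 + 1 = 9.
my $bits = unpack "%32b*", "\xFF\x01";

print "sum=$sum valid=", ($valid ? "yes" : "no"), " bits=$bits\n";
```

A plain byte sum catches only simple corruption; it is used here because it is what %N computes, not as a recommendation over a CRC or a hash.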

Worked example: reading a variable-offset payload#

Some formats start with a header of fixed size but put the actual data at an offset the header contains. The classic recipe:

# Header: magic (4 bytes), version (1 byte), data-offset (4 bytes BE),
# then data starts at the byte-offset the header pointed at.
my ($magic, $ver, $off) = unpack "A4 C N", $file;

die "bad magic" unless $magic eq "MAGI";

# Jump to the data section
my $data = unpack "x$off a*", $file;

Two separate unpack calls — the first reads the offset, the second uses it in the template. Template strings are not regular expressions; they have no way to look ahead and use a computed value within the same call. Unpack in stages when the shape of the data depends on what you have just read.
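
To try the staged approach end to end, the sketch below first builds a sample file in the layout described above and then parses it. The 16-byte data offset, the payload text, and the bounds check are assumptions added for the example:

```perl
use strict;
use warnings;

# Build a sample file: 9-byte header, NUL fill up to offset 16, then data.
my $file = pack 'a4 C N @16 a*', "MAGI", 1, 16, "payload bytes";

# Stage 1: fixed-size header.
my ($magic, $ver, $off) = unpack "a4 C N", $file;
die "bad magic" unless $magic eq "MAGI";

# Guard the jump: x dies if asked to skip past the end of the string.
die "offset $off beyond end of file" if $off > length($file);

# Stage 2: the offset becomes part of the template.
my $data = unpack "x$off a*", $file;

print "version $ver, data at $off: $data\n";
```

Checking the offset before the second call turns a corrupt header into a clear error message instead of an exception from inside unpack.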

With positioning and alignment in the toolbox, the two worked examples in the next chapters should read as straightforward applications: parsing a real network header and a real file format.