Grouping and counts
By the end of this chapter you will be able to repeat a pattern
of directives across a list of values, use the * and […] forms
of repeat count, and write self-describing length-prefixed records
with the / directive.
A bare repeat count applies to one directive letter: C4 packs
four bytes. The moment the repeating unit is more than one directive
— “pack a short and two bytes, many times” — you need a group.
(): a group is a sub-template
Parentheses gather a sequence of directives so that a repeat count or endianness modifier applies to the whole. Compare:
pack "C S C S C S", $a, $b, $c, $d, $e, $f # repeat by hand
pack "(CS)3", $a, $b, $c, $d, $e, $f # same thing, grouped
pack "(CS)*", @pairs # repeat as often as values last
A group has no byte cost of its own — it is a syntactic device. The total packed length is what the directives inside it would produce without the parentheses.
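As a quick check of the claim that a group adds no bytes, the sketch below packs the same six values three ways and compares the results. The values in @vals are made up for illustration; any bytes and 16-bit integers would do.

```perl
use strict;
use warnings;

# Three (byte, short) pairs; illustrative values only.
my @vals = (1, 0x0102, 2, 0x0304, 3, 0x0506);

my $by_hand = pack "C S C S C S", @vals;
my $grouped = pack "(CS)3",       @vals;
my $starred = pack "(CS)*",       @vals;

# Each pair is 1 + 2 bytes, so all three results should be 9 bytes long.
printf "%d %d %d\n", length($by_hand), length($grouped), length($starred);
print "identical\n" if $by_hand eq $grouped and $grouped eq $starred;
```

The parentheses change how the template is written, not what ends up in the buffer.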
Grouping with endianness
The single most practical use of a group is a sub-structure whose integers all share one byte order:
my $rec = pack "(l s s)<", $id, $x, $y;
# same as "l<s<s<"
The < cascades into every byte-ordered directive inside, including
nested groups. Directives that do not accept a byte-order modifier
(like C, a, Z) are silently unaffected.
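The cascade can be observed directly by comparing the grouped template with its spelled-out equivalent. This is a minimal sketch with made-up values, assuming a perl recent enough to support byte-order modifiers on groups (5.10 or later).

```perl
use strict;
use warnings;

my ($id, $x, $y) = (0x01020304, 0x0506, 0x0708);   # illustrative values

my $grouped = pack "(l s s)<", $id, $x, $y;
my $spelled = pack "l< s< s<", $id, $x, $y;

print "same bytes\n" if $grouped eq $spelled;
print unpack("H*", $grouped), "\n";   # 0403020106050807

# C accepts no byte-order modifier, so the group's > only affects the s.
my $mixed = pack "(C s)>", 0xAB, 0x0102;
print unpack("H*", $mixed), "\n";     # ab0102
```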
Repeat counts in full
After any directive or group, you may write:
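| Form | Meaning |
|---|---|
| N (an integer) | Apply the directive/group N times |
| * | Apply as often as values last. For string directives such as a, A, and Z, use the full length of the value |
| (omitted) | Same as 1 |
| [template] | Repeat count is the packed-byte length of the bracketed template |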
The [template] form is the tool for expressing “as many bytes as a
foo takes”:
pack "x[L]" # skip 4 bytes (sizeof a packed long)
pack "x[d]" # skip 8 bytes (sizeof a packed double)
pack "a[Q]" # one string 8 bytes wide
It is especially useful for alignment (see the positioning chapter) and for templates whose widths must track the platform-dependent size of a native integer:
pack "a[l!]", $native_long_buf # room for one native long
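One way to see what a bracketed count resolves to is to measure the result. The sketch below compares x[d] against an actual packed double instead of hard-coding its width, and uses a[L] rather than a[Q] so that it does not depend on 64-bit integer support.

```perl
use strict;
use warnings;

# x[T] emits as many NUL bytes as one packed T occupies.
my $skip_long   = pack "x[L]";
my $skip_double = pack "x[d]";
my $skip_native = pack "x[l!]";

printf "x[L]  -> %d bytes\n", length $skip_long;
printf "x[d]  -> %d bytes (a packed d is %d)\n",
    length $skip_double, length pack("d", 0);
printf "x[l!] -> %d bytes (platform-dependent)\n", length $skip_native;

# a[L] is one string field as wide as a packed L; short input is NUL-padded.
my $field = pack "a[L]", "abc";
print unpack("H*", $field), "\n";   # 61626300
```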
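* is per-value-group, not “swallow everything”
What * means depends on the directive. For a string directive such as A it
means “the full length of this one value”, so two A*s in a row do not race:
pack "A*A*", "hello", "world" # "helloworld"
The first A* packs all of "hello"; the second packs all of
"world". Each consumes exactly one value from the list, at that
value’s full length. This is the general rule for string directives
(a, A, Z): one directive corresponds to one piece of data, whatever its
repeat count. For numeric directives the count is a number of values
instead, so C4 takes four values and C* takes every value that remains.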
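The difference between a count on a string directive and a count on a numeric directive shows up in a small side-by-side sketch; the inputs are arbitrary.

```perl
use strict;
use warnings;

# String directives: each A* takes exactly one value, at its full length.
my $strings = pack "A*A*", "hello", "world";
print "$strings\n";            # helloworld

# Numeric directives: C* takes one value per byte until the list runs out.
my $bytes = pack "C*", 1, 2, 3;
print length($bytes), "\n";    # 3

# Mixed: A* takes "hi", then C* takes the two remaining numbers.
my $mixed = pack "A* C*", "hi", 65, 66;
print "$mixed\n";              # hiAB
```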
The / directive: length and data together
Wire formats frequently store a count immediately before the thing
being counted: “a 2-byte length, then that many bytes of payload”.
The / directive ties the two into one step.
In pack: length-item/sequence-item
Write two directives separated by a slash. The first packs the
length; the second packs the payload. pack computes the length for
you:
my $msg = pack "n/a*", "hello, world";
# "\x00\x0chello, world"
#   \x00\x0c      big-endian 16-bit length = 12
#   hello, world  the payload itself
The length-item may be any numeric directive — C, n, N, w,
S<, and so on — or a string directive such as A4 when the
protocol writes the length as ASCII:
my $buf = pack "A4/A*", "Humpty-Dumpty";
# "13  Humpty-Dumpty": the length as 4 space-padded ASCII characters, then the string
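The sketch below repeats both examples and adds a template with two length-prefixed fields, to show that each / pairs one length with one payload. The sample strings are arbitrary.

```perl
use strict;
use warnings;

# 16-bit big-endian length, then the payload.
my $msg = pack "n/a*", "hello, world";
print unpack("H4", $msg), "\n";   # 000c (the two length bytes, in hex)

# Two independent length-prefixed fields in one template.
my $two = pack "C/a* C/a*", "ab", "cde";
print unpack("H*", $two), "\n";   # 02616203636465

# ASCII length: A4 pads the digits "13" with spaces to four characters.
my $ascii = pack "A4/A*", "Humpty-Dumpty";
print "[$ascii]\n";               # [13  Humpty-Dumpty]
```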
In unpack: /item
The unpack form is simpler: a bare / before the item. The directive
just before the slash is read first, and its value becomes the count
for the item that follows:
my ($payload) = unpack "n/a*", $msg; # "hello, world"
Reading that template: “read a n integer, call it L; then read
L bytes as an a*-style string.” The length itself does not
appear in the output list.
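A round trip makes the point about the output list concrete. In this sketch the trailing C flag is an invented extra field, added only to show that unpacking resumes right after the payload.

```perl
use strict;
use warnings;

# Length-prefixed payload followed by a one-byte flag (hypothetical layout).
my $buf = pack "n/a* C", "hello, world", 7;

my @out = unpack "n/a* C", $buf;
my ($payload, $flag) = @out;

# Two values come back, not three: the length is consumed, not returned.
printf "%d values: '%s', %d\n", scalar(@out), $payload, $flag;
```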
The common mistake: A* without a /
A bare A* or a* in the middle of an unpack template is greedy: it
takes everything up to the end of the buffer, and the directives after
it get nothing:
# Wrong: $sm swallows the rest of the buffer, $prio will be undef
my ($src, $dst, $len, $sm, $prio) = unpack "Z* Z* C A* C", $buf;
# Right — use /A* so $sm knows where to stop
my ($src, $dst, $sm, $prio) = unpack "Z* Z* C/A* C", $buf;
In the second template, C/A* reads a one-byte count and then exactly
that many bytes. The A* stops at that boundary, so the trailing C
picks up the next byte as expected.
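To watch the two templates diverge, the sketch below builds a buffer with the length-prefixed layout and unpacks it both ways. The field values (sender, recipient, message text, priority) are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical message: source, destination, text, priority.
my $buf = pack "Z* Z* C/A* C", "alice", "bob", "see you at 5", 3;

# Greedy: C returns the length byte, then A* runs to the end of the buffer,
# taking the priority byte with it. The final C has nothing left to read.
my @greedy = unpack "Z* Z* C A* C", $buf;
printf "greedy:  %d values, text is %d bytes\n",
    scalar(@greedy), length $greedy[3];

# Counted: C/A* stops after the stated length, so the final C gets its byte.
my @counted = unpack "Z* Z* C/A* C", $buf;
printf "counted: %d values, priority %d\n", scalar(@counted), $counted[3];
```

In the greedy case the text comes back one byte too long, because it includes the priority byte.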
Worked example: key-value pairs
A protocol stores a dictionary as count followed by count pairs
of (length, key, length, value):
my %env = ( HOST => "example.com",
            PORT => "443",
            USER => "alice" );
my $blob = pack "S (S/A* S/A*)*",
                scalar(keys %env),
                %env;
Reading it back:
my %parsed = unpack "S/(S/A* S/A*)", $blob;
The pack template says: “one 16-bit count (of pairs), then repeat
the sub-template S/A* S/A* for each pair.” The unpack template
reads the count and applies it to the group directly — the count is
no longer in the output list.
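Put together as one runnable script, the round trip looks roughly like this. It reuses the templates from the text unchanged; the printing loop is added for illustration.

```perl
use strict;
use warnings;

my %env = ( HOST => "example.com",
            PORT => "443",
            USER => "alice" );

# Pair count, then one (length, key, length, value) record per pair.
my $blob = pack "S (S/A* S/A*)*", scalar(keys %env), %env;

# The leading S supplies the repeat count for the group.
my %parsed = unpack "S/(S/A* S/A*)", $blob;

printf "%d bytes\n", length $blob;
printf "%s=%s\n", $_, $parsed{$_} for sort keys %parsed;
```

One caveat worth knowing: A* strips trailing spaces and NULs when unpacking, so a* is the safer choice if values may legitimately end in whitespace.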
Edge cases and constraints
/ is meant for variable-length data. In unpack the count comes from the buffer, so write the second item as a*, A*, Z*, or a group rather than giving it a fixed width of its own. Pack is more lenient: with an explicit count such as a4, the string is adjusted to that length.
()* behaves differently in pack and unpack. Pack has the values, so it can say “repeat until done”. Unpack repeats the group until the buffer runs out, so ()* only works there when the repeated records are the last thing in the data; otherwise a count directive must precede the group.
Nested groups are legitimate and common:
pack "((CC)(S))<", @records
Endianness modifiers cascade through every nesting level.
Repeat count on a group applies to the whole group and takes that many repetitions’ worth of values: (CS)3 consumes six list values, not three.
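The group-related cases above can be tried out in a few lines. This is a sketch with invented values: the trailing "END" marker and the leading one-byte count are assumptions made for the example, not part of any real format.

```perl
use strict;
use warnings;

my @pairs = (1, 100, 2, 200, 3, 300);   # three (C, S) pairs
my $n     = @pairs / 2;

# A counted group consumes two values per repetition: six in total.
my $tail = pack "(CS)$n", @pairs;

# With nothing after the records, unpack can repeat until the buffer ends.
my @all = unpack "(CS)*", $tail;
print scalar(@all), " values\n";         # 6 values

# With data after the records, store a count and let unpack apply it.
my $framed = pack "C (CS)$n a*", $n, @pairs, "END";
my @fields = unpack "C/(CS) a*", $framed;
print "last field: $fields[-1]\n";       # last field: END

# Nested groups: the < reaches the S inside the inner group.
print unpack("H*", pack "((CC)(S))<", 1, 2, 0x0304), "\n";   # 01020403
```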
Next chapter: the directives that move around inside a template
without producing a value: x, X, @, and . (a literal dot).