Grouping and counts
By the end of this chapter you will be able to repeat a pattern of directives across a list of values, use the * and […] forms of repeat count, and write self-describing length-prefixed records with the / directive.
A bare repeat count applies to one directive letter: C4 packs four bytes. The moment the repeating unit is more than one directive — «pack a short and two bytes, many times» — you need a group.
(): a group is a sub-template
Parentheses gather a sequence of directives so that a repeat count or endianness modifier applies to the whole. Compare:
my @v = (1, 1000, 2, 2000, 3, 3000);   # byte, short, byte, short, ...
pack "C S C S C S", @v   # repeat by hand
pack "(CS)3", @v         # same thing, grouped
pack "(CS)*", @v         # repeat as often as values last
A group has no byte cost of its own — it is a syntactic device. The total packed length is what the directives inside it would produce without the parentheses.
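One quick check is to pack the same six values with all three templates and compare the results. This is a minimal sketch. The array @v and its contents are arbitrary values chosen for illustration.

```perl
use strict;
use warnings;

# Six arbitrary values: byte, short, byte, short, byte, short.
my @v = (1, 1000, 2, 2000, 3, 3000);

my $by_hand = pack "C S C S C S", @v;   # repeat by hand
my $grouped = pack "(CS)3",       @v;   # the group repeated three times
my $greedy  = pack "(CS)*",       @v;   # the group repeated until @v runs out

# Each pair is 1 + 2 bytes, so all three strings are 9 bytes long.
printf "%d %d %d\n", length($by_hand), length($grouped), length($greedy);   # 9 9 9
print "identical\n" if $by_hand eq $grouped && $grouped eq $greedy;        # identical
```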
Grouping with endianness
The single most practical use of a group is a sub-structure whose integers all share one byte order:
my $rec = pack "(l s s)<", $id, $x, $y;
# same as "l<s<s<"
The < cascades into every byte-ordered directive inside, including nested groups. Directives that do not accept a byte-order modifier (like C, a, Z) are silently unaffected.
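You can check the cascade directly. This minimal sketch packs one hypothetical record (the values 1, 2, 3 are arbitrary) both ways and prints the bytes in hex with unpack "H*". It also shows that a C inside a > group stays a single byte and is not affected by the modifier. Byte-order modifiers need Perl 5.10 or later.

```perl
use strict;
use warnings;

# A hypothetical record: a 32-bit id and two 16-bit coordinates.
my ($id, $x, $y) = (1, 2, 3);

my $grouped  = pack "(l s s)<", $id, $x, $y;   # one modifier on the group
my $explicit = pack "l< s< s<", $id, $x, $y;   # a modifier on each directive

# Little-endian puts the least significant byte first.
print unpack("H*", $grouped), "\n";    # 0100000002000300

# C takes no byte-order modifier, so the group's > only reorders the s.
my $mixed = pack "(C s)>", 1, 2;
print unpack("H*", $mixed), "\n";      # 010002
```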
Repeat counts in full
After any directive or group, you may write:
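| Form | Meaning |
|---|---|
| N | Apply the directive/group N times |
| * | Apply as often as values last. For the string directives a, A and Z, use the whole value (in unpack, the rest of the buffer) |
| [N] | Same as N |
| [template] | Repeat count is the packed-byte length of the bracketed template |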
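Here is a small sketch of the table's forms, using C on an arbitrary five-element list. The bracketed n in the last line only illustrates the [template] form. Its count is the packed size of one n, which is 2.

```perl
use strict;
use warnings;

my @bytes = (10, 20, 30, 40, 50);   # arbitrary values

my $n     = pack "C3",   @bytes;    # three values; 40 and 50 are ignored
my $brack = pack "C[3]", @bytes;    # [3] means the same as 3
my $star  = pack "C*",   @bytes;    # as many values as the list holds
my $tmpl  = pack "C[n]", @bytes;    # count = packed size of one n, i.e. 2

print join(" ", length($n), length($brack), length($star), length($tmpl)), "\n";   # 3 3 5 2
```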
The [template] form is the tool for expressing «as many bytes as a foo takes»:
pack "x[L]"          # 4 null bytes (a packed L is always 32 bits)
pack "x[d]"          # as many null bytes as a packed double, usually 8
pack "a[Q]", $str    # $str null-padded or truncated to 8 bytes (needs 64-bit integer support)
It is especially useful for alignment (see the positioning chapter) and for templates whose widths must track the platform-dependent size of a native integer:
pack "a[l!]", $native_long_buf # room for one native long
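To confirm that the bracketed forms track packed sizes, the sketch below compares them with the length of a real packed item. The widths of d and l! depend on the platform, so the example compares lengths instead of hard-coding them. The one exception is L, which is always 4 bytes. The string "abc" is an arbitrary stand-in for a buffer.

```perl
use strict;
use warnings;

# x[...] emits as many null bytes as one packed item of the bracketed type.
my $pad_l = pack "x[L]";            # L is always 32 bits
my $pad_d = pack "x[d]";            # a native double: platform-dependent
my $long  = pack "a[l!]", "abc";    # "abc" null-padded to a native long's width

print length($pad_l), "\n";                                      # 4
print length($pad_d) == length(pack "d", 0)  ? "ok\n" : "no\n";  # ok
print length($long)  == length(pack "l!", 0) ? "ok\n" : "no\n";  # ok
```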
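* on a string directive takes one value, not «swallow everything»
For a numeric directive, * means «as many values as remain»: C* packs the whole rest of the list. For the string directives a, A and Z the count is a width, not a number of values, so * means «the full length of this one value». Two A*s in a row therefore do not race:
pack "A*A*", "hello", "world" # "helloworld"
The first A* packs all of "hello"; the second packs all of "world". Each A* consumes exactly one value from the list, at that value’s full length. This is the general rule for string directives: one directive, one value, whatever its repeat count. Numeric directives work the other way round: C3 consumes three values.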
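The difference between a width and a value count is clearest when string and numeric directives sit side by side. This is a minimal sketch with made-up strings.

```perl
use strict;
use warnings;

# String directives: the count is a width, and each directive takes one value.
my $two   = pack "A*A*", "hello", "world";   # the full length of each value
my $short = pack "A3",   "hello";            # truncated to 3 bytes
my $wide  = pack "A7",   "hello";            # space-padded to 7 bytes

# Numeric directives: the count is a number of values.
my $nums  = pack "C*", 1, 2, 3;              # three values, three bytes

print "$two|$short|$wide|", length($nums), "\n";   # helloworld|hel|hello  |3
```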
The / directive: length and data together
Wire formats frequently store a count immediately before the thing being counted: «a 2-byte length, then that many bytes of payload». The / directive ties the two into one step.
In pack: length-item/sequence-item
Write two directives separated by a slash. The first packs the length; the second packs the payload. pack computes the length for you:
my $msg = pack "n/a*", "hello, world";
# "\x00\x0chello, world"
#  ^^^^^^^^ big-endian 16-bit length = 12
#          ^^^^^^^^^^^^ the payload itself
The length-item may be any numeric directive — C, n, N, w, S<, and so on — or a string directive such as A4 when the protocol writes the length as ASCII:
my $buf = pack "A4/A*", "Humpty-Dumpty";
# "13  Humpty-Dumpty": the length 13 as ASCII, space-padded to 4 chars, then the string
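One way to confirm what / does in pack is to build the same bytes by hand and compare them. This sketch reuses the two examples above and needs nothing beyond core pack and unpack.

```perl
use strict;
use warnings;

my $payload = "hello, world";

# One step with /: pack computes and writes the length.
my $with_slash = pack "n/a*", $payload;

# The same bytes by hand: a 16-bit big-endian length, then the payload.
my $by_hand = pack("n", length $payload) . $payload;

my $verdict = $with_slash eq $by_hand ? "same" : "different";
print "$verdict\n";                          # same
print unpack("H4", $with_slash), "\n";       # 000c

# An ASCII length field: "13" space-padded to four characters.
my $ascii = pack "A4/A*", "Humpty-Dumpty";
print "[$ascii]\n";                          # [13  Humpty-Dumpty]
```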
In unpack: /item
unpack takes the same template. The item after the slash gets its repeat count from the value unpacked just before the slash, which is normally the length-item written there:
my ($payload) = unpack "n/a*", $msg; # "hello, world"
Reading that template: «read an n (16-bit big-endian) integer, call it L; then read L bytes as an a string.» The length itself does not appear in the output list.
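Several length-prefixed fields can share one template, and unpack accepts exactly the template that pack used. The record layout here is hypothetical: a 1-byte length, a 2-byte length, then a 32-bit number. The field values are invented.

```perl
use strict;
use warnings;

# A hypothetical record layout: a name behind a 1-byte length, a note
# behind a 2-byte big-endian length, then a 32-bit big-endian number.
my $template = "C/a* n/a* N";
my $rec = pack $template, "alice", "likes pack", 42;

# The same template reads it back. Both lengths are consumed by /,
# so only three values come out.
my @fields = unpack $template, $rec;

print scalar(@fields), ": @fields\n";   # 3: alice likes pack 42
```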
The common mistake: a bare A* where / belongs
In unpack, A* and a* are greedy: they take everything left in the buffer. A length-prefixed string in the middle of a record therefore needs the /, or every field after it goes missing:
# Wrong: $len gets the length byte, A* swallows the rest of the buffer
# (priority byte included), and $prio stays undef
my ($src, $dst, $len, $sm, $prio) = unpack "Z* Z* C A* C", $buf;
# Right: C/A* reads the length byte, then exactly that many bytes
my ($src, $dst, $sm, $prio) = unpack "Z* Z* C/A* C", $buf;
In the second template, C/A* reads a one-byte count and then exactly that many bytes, so the trailing C picks up the priority byte as expected.
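The following sketch builds a buffer of the shape described above and runs both templates over it, so you can see the failure mode directly. The field values ("src", "dst", "hi there", 7) are invented.

```perl
use strict;
use warnings;

# Hypothetical message: source, destination, short message, priority.
my $buf = pack "Z* Z* C/A* C", "src", "dst", "hi there", 7;

# Wrong: C yields the length byte, A* swallows the rest (priority byte
# included), and the final C finds no data left, so it yields nothing.
my @wrong = unpack "Z* Z* C A* C", $buf;

# Right: C/A* reads the length byte and then exactly that many bytes.
my ($src, $dst, $sm, $prio) = unpack "Z* Z* C/A* C", $buf;

print scalar(@wrong), " values from the wrong template\n";   # 4 values from the wrong template
print "$src $dst [$sm] $prio\n";                             # src dst [hi there] 7
```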
Worked example: key-value pairs
A protocol stores a dictionary as a count of pairs, followed by that many (length, key, length, value) records:
my %env = ( HOST => "example.com",
            PORT => "443",
            USER => "alice" );
my $blob = pack "S (S/A* S/A*)*",
                scalar keys %env,
                %env;
Reading it back:
my %parsed = unpack "S/(S/A* S/A*)", $blob;
The pack template says: «one 16-bit count (of pairs), then repeat the sub-template S/A* S/A* for each pair.» The unpack template reads the count and applies it to the group directly — the count is no longer in the output list.
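A round trip is a useful sanity check for the worked example. This sketch packs %env, unpacks the blob, and prints the pairs in sorted order, because Perl hash order is not fixed.

```perl
use strict;
use warnings;

my %env = ( HOST => "example.com",
            PORT => "443",
            USER => "alice" );

# pack: the number of pairs, then the group repeated until the values run out.
my $blob = pack "S (S/A* S/A*)*", scalar(keys %env), %env;

# unpack: the leading count drives the group and is not returned.
my %parsed = unpack "S/(S/A* S/A*)", $blob;

for my $key (sort keys %parsed) {
    print "$key=$parsed{$key}\n";
}
# HOST=example.com
# PORT=443
# USER=alice
```

A strips trailing spaces and nulls on unpack, so a value that ends in whitespace would not survive this round trip unchanged. Using a* in place of A* keeps the bytes exactly.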
Edge cases and constraints
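- The count that / supplies depends on the item after the slash. For a string directive (a, A, Z) it is a length in bytes. For any other item it is a number of repetitions, so C/n means «a byte N, then N big-endian shorts» and S/(...) means «a short N, then N copies of the group».
- ()* means different things on the two sides. In pack it means «repeat until the values run out»; in unpack it means «repeat until the buffer runs out», which only works when the group is the last thing in the record. Otherwise unpack needs a count in front of the group, as in S/(...), and pack cannot use that form because it cannot determine a repeat count for a group. That is why the worked example needs different templates for pack and unpack.
- A repeat count on a group applies to the whole group and takes that many repetitions’ worth of values: (CS)3 consumes six list values, not three.
- Nested groups are legitimate and common, and endianness modifiers cascade through every nesting level:
pack "((CC)(S))<*", @records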
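Two of these points are easy to check. In this sketch (all values arbitrary), C/n stores a count of items rather than bytes, and a group count of 3 takes six values. unpack with (C n)* then repeats until the buffer runs out. That is safe here only because the group fills the whole buffer.

```perl
use strict;
use warnings;

# After a non-string item, the / count means "this many items":
# a byte holding 3, then three 16-bit big-endian numbers.
my $list = pack "C/n*", 100, 200, 300;
my @nums = unpack "C/n", $list;
print length($list), ": @nums\n";       # 7: 100 200 300

# A count on a group counts whole repetitions of the group.
my @pairs = (1, 10, 2, 20, 3, 30, 4, 40);
my $three = pack "(C n)3", @pairs;      # uses six values; 4 and 40 are left over
print length($three), "\n";             # 9

# In unpack, * on a group repeats until the buffer runs out.
my @back = unpack "(C n)*", $three;
print "@back\n";                        # 1 10 2 20 3 30
```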
The next chapter covers the directives that move around inside a template without producing a value: x, X, @ and . (dot).