Grouping and counts

By the end of this chapter you will be able to repeat a pattern of directives across a list of values, use the * and […] forms of repeat count, and write self-describing length-prefixed records with the / directive.

A bare repeat count applies to one directive letter: C4 packs four bytes. The moment the repeating unit is more than one directive — “pack a short and two bytes, many times” — you need a group.

(): a group is a sub-template

Parentheses gather a sequence of directives so that a repeat count or endianness modifier applies to the whole. Compare:

pack "C S C S C S", $c1, $s1, $c2, $s2, $c3, $s3   # repeat by hand
pack "(CS)3",       $c1, $s1, $c2, $s2, $c3, $s3   # same thing, grouped
pack "(CS)*",       @pairs                        # repeat as often as values last

A group has no byte cost of its own — it is a syntactic device. The total packed length is what the directives inside it would produce without the parentheses.
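A short sketch that checks both claims: the three spellings produce identical bytes, and the parentheses add nothing to the length. The values in @pairs are made up for illustration.

```perl
use strict;
use warnings;

# Three (byte, short) pairs flattened into one list: C value, S value, ...
my @pairs = (1, 256, 2, 512, 3, 768);

my $by_hand = pack "C S C S C S", @pairs;   # spelled out
my $grouped = pack "(CS)3",       @pairs;   # group repeated 3 times
my $starred = pack "(CS)*",       @pairs;   # repeat while values remain

# Each pair is 1 + 2 bytes, so three pairs take 9 bytes; the group adds nothing.
print length($grouped), "\n";                                        # 9
print(($by_hand eq $grouped && $grouped eq $starred) ? "same\n" : "different\n");   # same
```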

Grouping with endianness

The most practical use of a group is a sub-structure whose integers all share one byte order:

my $rec = pack "(l s s)<", $id, $x, $y;
# same as "l<s<s<"

The < cascades into every byte-ordered directive inside, including nested groups. Directives that do not accept a byte-order modifier (like C, a, Z) are silently unaffected.
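A quick sketch of the cascade, with arbitrary example values. The hex dump shows every integer stored low byte first, and the last line checks that a C inside a little-endian group is simply left alone.

```perl
use strict;
use warnings;

my ($id, $x, $y) = (1, 2, -3);

my $grouped  = pack "(l s s)<", $id, $x, $y;
my $explicit = pack "l<s<s<",   $id, $x, $y;

# The < on the group reaches every integer inside it.
print $grouped eq $explicit ? "same\n" : "different\n";   # same

# l< 1 -> 01 00 00 00, s< 2 -> 02 00, s< -3 -> fd ff
print unpack("H*", $grouped), "\n";                        # 010000000200fdff

# C takes no byte order, so the group modifier does not touch it.
my $with_byte = pack "(C s)<", 1, 2;                       # "\x01\x02\x00"
```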

Repeat counts in full

After any directive or group, you may write:

Form      Meaning
N         Apply the directive or group N times.
*         Apply as often as values last. For string directives (a, A, Z, b, B, h, H):
          use the full length of the one value. For x, X, @: equivalent to 0.
          For .: relative to the start of the string. For u: 45.
[N]       Same as N.
[templ]   The repeat count is the packed-byte length of the bracketed template.
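A small sketch of the N, [N] and * rows, using arbitrary values. Note how * means something different for numeric, positioning and string directives.

```perl
use strict;
use warnings;

# N and [N] are the same count.
my $plain     = pack "C2",   7, 8;
my $bracketed = pack "C[2]", 7, 8;

# * on a numeric directive: use every remaining value.
my $all = pack "C*", 1, 2, 3, 4;     # four bytes

# * on x is a count of 0: no padding at all.
my $no_pad = pack "C x* C", 1, 2;    # "\x01\x02"

# * on a string directive: the whole of that one value.
my $whole = pack "a*", "hello";      # "hello"
```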

The [template] form is the tool for expressing “as many bytes as a foo takes”:

pack "x[L]"          # 4 null bytes (the size of a packed L)
pack "x[d]"          # 8 null bytes (the size of a packed double)
pack "a[Q]", $str    # $str null-padded or truncated to 8 bytes

It is especially useful for alignment (see the positioning chapter) and for templates whose widths must track the platform-dependent size of a native integer:

pack "a[l!]", $native_long_buf      # room for one native long
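A sketch that measures the bracketed forms directly. In pack, x writes null bytes; the native-long width is platform dependent, so it is compared against pack "l!" rather than a fixed number.

```perl
use strict;
use warnings;

# [L] means "as many bytes as one packed L"; x writes that many nulls.
my $pad4 = pack "x[L]";             # 4 null bytes
my $pad8 = pack "x[d]";             # as wide as a packed double

# A string field exactly as wide as one native long (platform dependent).
my $field = pack "a[l!]", "abc";    # "abc" followed by null padding

printf "%d %d %d\n", length $pad4, length $pad8, length $field;
```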

* on a string directive means one whole value

For string directives (a, A, Z and the bit and hex forms b, B, h, H), * does not mean “all remaining values”. It means the full length of the current value, so two A*s in a row each take one value:

pack "A*A*", "hello", "world"     # "helloworld"

The first A* packs all of "hello"; the second packs all of "world". This is the general rule for string directives: each consumes exactly one value from the list, whatever its repeat count. Numeric directives behave differently: C3 consumes three values and C* consumes every value that is left.
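A short sketch contrasting the two behaviours, with arbitrary example values:

```perl
use strict;
use warnings;

# String directives: each A* takes one whole value.
my $joined = pack "A*A*", "hello", "world";   # "helloworld"

# Numeric directives: C* takes every value that is left.
my $bytes = pack "C*", 104, 105, 33;          # "hi!"

# Mixed: A* takes one value, C* takes the remaining three.
my $mixed = pack "A* C*", "id:", 1, 2, 3;     # "id:\x01\x02\x03"
```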

The / directive: length and data together

Wire formats frequently store a count immediately before the thing being counted: “a 2-byte length, then that many bytes of payload”. The / directive ties the two into one step.

In pack: length-item/sequence-item

Write two directives separated by a slash. The first packs the length; the second packs the payload. pack computes the length for you:

my $msg = pack "n/a*", "hello, world";
# "\x00\x0chello, world"
#  ^^^^^^^^ big-endian 16-bit length = 12
#          ^^^^^^^^^^^ the payload itself
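A sketch showing what / saves you: the same bytes can be built by measuring the payload yourself and packing the length as an ordinary value.

```perl
use strict;
use warnings;

my $payload = "hello, world";

# With /: pack measures the payload itself.
my $msg = pack "n/a*", $payload;

# Without /: compute and pass the length yourself.
my $manual = pack "n a*", length($payload), $payload;

print $msg eq $manual ? "same\n" : "different\n";   # same
print unpack("H4", $msg), "\n";                     # 000c
```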

The length-item may be any numeric directive — C, n, N, w, S<, and so on — or a string directive such as A4 when the protocol writes the length as ASCII:

my $buf = pack "A4/A*", "Humpty-Dumpty";
# "13  Humpty-Dumpty"  — 4-char ASCII length, then the string
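A sketch of other length-items. The second line mixes a 16-bit length with a w (BER compressed integer) length in one template; both payload strings are example values.

```perl
use strict;
use warnings;

# ASCII length: A4 pads "13" with two spaces.
my $ascii = pack "A4/A*", "Humpty-Dumpty";       # "13  Humpty-Dumpty"

# n gives a 2-byte length, w a BER-compressed one (a single byte for 5).
my $two = pack "n/a* w/a", "hello,", "world";    # "\x00\x06hello,\x05world"
```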

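In unpack: /item

The unpack form uses the same template. The / takes its count from the value unpacked immediately before it (usually an integer such as n or C, though a numeric string such as A4 also works), and the item after the slash is read that many times, or as that many bytes for a string directive: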

my ($payload) = unpack "n/a*", $msg;     # "hello, world"

Reading that template: “read an n integer, call it L; then read L bytes as an a*-style string.” The length itself does not appear in the output list.
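A sketch showing that the stored length, not the end of the buffer, decides where the string stops. The "TAIL" bytes are an arbitrary example of data that follows the record.

```perl
use strict;
use warnings;

# A length-prefixed record followed by unrelated bytes.
my $msg = pack("n/a*", "hello, world") . "TAIL";

# n/a* stops after the 12 bytes the prefix announces; a* takes the rest.
my ($payload, $rest) = unpack "n/a* a*", $msg;

# The length itself is not returned: one field in, one value out.
my @fields = unpack "n/a*", pack("n/a*", "hi");
```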

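The common mistake: a greedy A* where / belongs

In unpack, a plain A* or a* is greedy: it takes everything up to the end of the buffer, so nothing after it can be read. When the buffer stores a length before the string, use / so the string field stops where the length says:

# Wrong: C returns the length byte as $sm, and A* swallows the message plus the priority byte into $prio
my ($src, $dst, $sm, $prio) = unpack "Z* Z* C A* C", $buf;

# Right: C/A* reads the length byte, so $sm stops where it should
my ($src, $dst, $sm, $prio) = unpack "Z* Z* C/A* C", $buf;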

In the second template, C/A* reads a one-byte count and then exactly that many bytes, so the string field ends with the message and the trailing C picks up the priority byte as expected.
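A sketch that builds such a buffer and unpacks it both ways. The addresses, message and priority are made-up example values.

```perl
use strict;
use warnings;

# Two NUL-terminated names, a length-prefixed message, a priority byte.
my $buf = pack "Z* Z* C/A* C", "alice", "bob", "hello", 5;

# Greedy A*: the length byte comes back as a field, A* eats the rest.
my @wrong = unpack "Z* Z* C A* C", $buf;

# C/A*: the length byte bounds the message; the last C gets the priority.
my @right = unpack "Z* Z* C/A* C", $buf;
```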

Worked example: key-value pairs

A protocol stores a dictionary as count followed by count pairs of (length, key, length, value):

my %env = ( HOST => "example.com",
            PORT => "443",
            USER => "alice" );

my $blob = pack "S (S/A* S/A*)*",
                scalar keys %env,
                %env;

Reading it back:

my %parsed = unpack "S/(S/A* S/A*)", $blob;

The pack template says: “one 16-bit count (of pairs), then repeat the sub-template S/A* S/A* for each pair.” The unpack template reads the count and applies it to the group directly — the count is no longer in the output list.
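A round-trip sketch of the worked example, with one extra unpack that peeks at the pair count to confirm what the leading S holds:

```perl
use strict;
use warnings;

my %env = ( HOST => "example.com",
            PORT => "443",
            USER => "alice" );

# pack: explicit pair count, then length-prefixed key and value per pair.
my $blob = pack "S (S/A* S/A*)*", scalar(keys %env), %env;

# unpack: the count drives the group; it does not appear in the result.
my %parsed = unpack "S/(S/A* S/A*)", $blob;

# Reading only the leading S shows the stored pair count.
my ($pairs) = unpack "S", $blob;
```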

Edge cases and constraints

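  • The / count replaces the repeat count of the item that follows it. In unpack, write that item bare or with * (n/a, n/a*) and let the stored length decide how much is read. In pack, a string item (a, A, Z) records the string's length, while any other item records how many values were packed: pack "C/S*", 10, 20, 30 writes the count 3 and then three shorts.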

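  • pack and unpack read ()* differently. pack repeats the group until the values run out; unpack repeats it until the input runs out. That is fine when the repeated records end the buffer, but if anything follows them, unpack cannot tell where they stop unless a count precedes the group, as in S/(...). pack, for its part, cannot work out a /-count for a group, which is why the worked example passes the pair count explicitly.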

  • Nested groups are legitimate and common:

    pack "((CC)(S))<", @records

    Endianness modifiers cascade through every nesting level.

  • Repeat count on a group applies to the whole group and takes that many repetitions’ worth of values. (CS)3 consumes six list values, not three.
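The constraints above, sketched with arbitrary example values:

```perl
use strict;
use warnings;

# / with a numeric item: the count records how many values were packed.
my $shorts = pack "C/S*", 10, 20, 30;             # count 3, then three shorts
my @back   = unpack "C/S", $shorts;               # (10, 20, 30)

# ()* in unpack repeats while input remains, so it belongs at the end.
my @chunks = unpack "(A2)*", "aabbcc";            # ("aa", "bb", "cc")

# A repeat count on a group takes that many repetitions' worth of values.
my $three = pack "(CS)3", 1, 100, 2, 200, 3, 300; # six values, 9 bytes

# Nested groups inherit the outer byte order; 258 is 0x0102.
my $nested = pack "((CC)(S))<", 1, 2, 258;        # "\x01\x02\x02\x01"
```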

Next chapter: the directives that move around inside a template without producing a value: x, X, @ and the . directive.