Bytes and widths#

By the end of this chapter you will be able to pick the right integer directive for a given field, count the bytes a template produces, and tell when you need a fixed width versus a native width.

Binary formats are specified by byte counts. “A 16-bit unsigned length field followed by 4-byte packets” is an instruction about bytes, not about numbers. The first job of a pack template is to match those counts exactly.

The fixed-width integer directives#

The spine of every template:

Bytes   Unsigned   Signed
1       C          c
2       S          s
4       L          l
8       Q          q

Each directive consumes one value from the list and produces exactly that many bytes:

length pack "C",  65                  # 1
length pack "S",  12345               # 2
length pack "L",  1_000_000_000       # 4
length pack "Q",  1_000_000_000_000   # 8 (requires a Perl built with 64-bit integer support)

A repeat count packs that many of the same directive in a row:

length pack "C4", 1, 2, 3, 4          # 4
length pack "L3", 10, 20, 30          # 12

Compose them freely; the total length is the sum:

length pack "C S L", 65, 1000, 2_000_000   # 1 + 2 + 4 = 7
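
A quick way to convince yourself of the byte counts is to round-trip the data. The following is a minimal sketch (the variable names are illustrative only) that packs three values and reads them back with the same template:

```perl
use strict;
use warnings;

# Pack one value per directive; whitespace in a template is ignored.
my $template = "C S L";
my $bytes    = pack $template, 65, 1000, 2_000_000;

# Unpacking with the same template recovers the original values.
my @fields = unpack $template, $bytes;

printf "%d bytes, fields: %s\n", length($bytes), join(", ", @fields);
# prints "7 bytes, fields: 65, 1000, 2000000"
```

S and L use the machine's byte order here, so the raw bytes differ between platforms even though the length and the round-tripped values do not.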

C is your default one-byte directive#

For protocol bytes (flag fields, version numbers, opcodes, type tags) C (unsigned 8-bit, 0 to 255) is almost always the right choice. c (signed, −128 to 127) turns up only when the spec says “signed”. W is a variant of C that accepts values above 255 and packs each one as a character rather than a byte, so it matters only for Unicode text and U0 (UTF-8) mode; ignore it until you hit that case.

IP addresses are the textbook example:

my $raw = pack "C4", 192, 168, 1, 42;
#         ^-- four unsigned bytes, in the order the list gives them
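
Going the other way is just as short. As a small sketch (variable names are illustrative), unpack with the same template turns the four raw bytes back into a dotted quad, and split feeds a dotted quad into pack:

```perl
use strict;
use warnings;

my $raw = pack "C4", 192, 168, 1, 42;

# Four unsigned bytes back out, joined with dots.
my $dotted = join ".", unpack("C4", $raw);
print "$dotted\n";    # prints "192.168.1.42"

# Text form to raw bytes: split on the dots, pack the four pieces.
my $again = pack "C4", split(/\./, $dotted);
print "round trip ok\n" if $again eq $raw;
```

This sketch does no validation: a malformed address such as "192.168.1" would pack a short list without complaint.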

Native vs. fixed width#

The directives s / S are specified to be exactly 2 bytes and l / L exactly 4 bytes, regardless of the C compiler. Their native-width counterparts add !:

length pack "l",  0          # 4  (always)
length pack "l!", 0          # sizeof(long): 4 on 32-bit systems and 64-bit
                             # Windows, 8 on most 64-bit Unix (Linux, macOS).

i and I are always native (whatever sizeof(int) is on this machine, with a 32-bit minimum). i! and I! are accepted as aliases for i and I.

Rule of thumb:

  • For interoperable binary (wire formats, on-disk files, data crossing machines) use fixed width (l, L, …) plus an explicit endianness, or the portable shortcuts (n, N, v, V).

  • For in-process work that matches some local C struct you are about to hand to ioctl or syscall, use native width (l!, L!, i, I).

Mixing the two is fine, but stop and ask yourself why each time.
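
To see what "native" means on the machine in front of you, compare the packed lengths with the sizes Perl recorded at build time. This is a sketch that assumes the standard Config module and its intsize and longsize entries:

```perl
use strict;
use warnings;
use Config;

# Fixed-width directives give the same answer on every platform.
printf "s  = %d bytes, l  = %d bytes\n",
    length(pack("s", 0)), length(pack("l", 0));

# Native-width directives follow the C compiler that built this Perl.
printf "i  = %d bytes (Config intsize  = %d)\n",
    length(pack("i", 0)), $Config{intsize};
printf "l! = %d bytes (Config longsize = %d)\n",
    length(pack("l!", 0)), $Config{longsize};
```

No expected output is shown for the last two lines because it depends on the platform, which is the point of the exercise.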

Counting the length of a template#

Before you send a record, check the byte count is what the spec says. length pack(TEMPLATE, ...) with placeholder values is the simplest way:

my $sz = length pack "n N L C", 0, 0, 0, 0;
# 2 + 4 + 4 + 1 = 11

For templates that use native-width directives the result varies by platform, so length pack TEMPLATE, 0, 0, ... tells you the size on the current machine only; run the same check on every platform you target.
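
The check can be wrapped in a small helper. template_size below is a hypothetical name, not a built-in, and the sketch assumes a template made only of numeric directives with explicit repeat counts and fewer than 64 values in total. It relies on pack ignoring surplus arguments:

```perl
use strict;
use warnings;

# Hypothetical helper: byte count of a numeric-only template.
# Passes 64 zeros as placeholders; pack ignores the ones it does not need.
sub template_size {
    my ($template) = @_;
    return length(pack($template, (0) x 64));
}

# Fail early if the template drifts away from the size the spec gives.
my $expected = 11;
my $actual   = template_size("n N L C");
die "header is $actual bytes, spec says $expected\n"
    unless $actual == $expected;

print "n N L C -> $actual bytes\n";    # prints "n N L C -> 11 bytes"
```

A template that uses * as a repeat count would swallow all 64 placeholders, so this helper is not suitable for variable-length templates.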

Worked example: a minimal packet header#

A hypothetical protocol specifies:

  • 8-bit version

  • 8-bit flags

  • 16-bit big-endian length (payload follows)

Three fields, four bytes total. The template writes itself:

my $hdr = pack "C C n", $version, $flags, length $payload;
my $pkt = $hdr . $payload;

length $hdr    # 4

Two things to note:

  1. We did not spell out C1 C1 n1 — a bare directive letter has an implicit repeat count of 1.

  2. The length in the third field is computed from $payload in the same expression. A later chapter shows the / form that ties the two together so you cannot forget.
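
The receiving side mirrors the sender. The following sketch uses made-up field values and assumes the whole packet is already in memory; it unpacks the header with the same template and uses the length field to slice out the payload:

```perl
use strict;
use warnings;

# Sender: illustrative values only.
my ($version, $flags, $payload) = (1, 0x80, "hello");
my $pkt = pack("C C n", $version, $flags, length $payload) . $payload;

# Receiver: the header is the first 4 bytes; unpack ignores what follows.
my ($v, $f, $len) = unpack "C C n", $pkt;
die "truncated packet\n" if length($pkt) < 4 + $len;
my $body = substr $pkt, 4, $len;

printf "version=%d flags=0x%02x len=%d body=%s\n", $v, $f, $len, $body;
# prints "version=1 flags=0x80 len=5 body=hello"
```

The truncation check matters because substr would otherwise quietly return a shorter string than the header promised.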

Edge cases worth remembering#

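  • Values out of range: pack "C", 256 wraps to the low 8 bits (you get \x00). With warnings enabled Perl reports “Character in 'C' format wrapped in pack”; without them nothing is reported, and the wider directives (S, L, …) wrap with no warning either way. Check ranges yourself if the data is untrusted.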

  • Signed vs. unsigned round-trip: a value packed as C and unpacked as c (or vice versa) flips sign for values ≥ 128:

    unpack "c", pack "C", 200        # -56
    
  • Too few values: missing values pack as 0 for numeric directives, "" for strings. No warning.
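
Because out-of-range values wrap rather than fail, untrusted input is worth checking before it reaches pack. The sketch below is one possible shape for such a check. pack_checked and the %range table are hypothetical names, and the table covers only the 1-, 2- and 4-byte directives from this chapter:

```perl
use strict;
use warnings;

# Inclusive value ranges for the fixed-width directives (Q/q omitted).
my %range = (
    C => [0, 255],            c => [-128, 127],
    S => [0, 65_535],         s => [-32_768, 32_767],
    L => [0, 4_294_967_295],  l => [-2_147_483_648, 2_147_483_647],
);

# Hypothetical helper: pack a single value, dying instead of wrapping.
sub pack_checked {
    my ($directive, $value) = @_;
    my $r = $range{$directive}
        or die "unsupported directive '$directive'\n";
    die "value $value out of range for '$directive'\n"
        if $value < $r->[0] || $value > $r->[1];
    return pack $directive, $value;
}

my $ok = pack_checked("C", 255);              # one byte, \xff
my $bad = eval { pack_checked("C", 256) };    # dies inside the eval
print "rejected: $@" unless defined $bad;
# prints "rejected: value 256 out of range for 'C'"
```

The helper handles one directive and one value at a time; a full template would need the same check applied field by field.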

The next chapter handles the question every multi-byte integer raises: which end of the number goes first?