Bytes and widths#
By the end of this chapter you will be able to pick the right integer directive for a given field, count the bytes a template produces, and tell when you need a fixed width versus a native width.
Binary formats are specified by byte counts. “A 16-bit unsigned
length field followed by 4-byte packets” is an instruction about
bytes, not about numbers. The first job of a pack
template is to match those counts exactly.
The fixed-width integer directives#
The spine of every template:
Bytes |
Unsigned |
Signed |
|---|---|---|
1 |
|
|
2 |
|
|
4 |
|
|
8 |
|
|
Each directive consumes one value from the list and produces exactly that many bytes:
length pack "C", 65 # 1
length pack "S", 12345 # 2
length pack "L", 1_000_000_000 # 4
length pack "Q", 1_000_000_000_000 # 8 (if 64-bit Perl)
A repeat count packs that many of the same directive in a row:
length pack "C4", 1, 2, 3, 4 # 4
length pack "L3", 10, 20, 30 # 12
Compose them freely; the total length is the sum:
length pack "C S L", 65, 1000, 2_000_000 # 1 + 2 + 4 = 7
C is your default one-byte directive#
For protocol bytes — flag fields, version numbers, opcodes, type
tags — C (unsigned 8-bit, 0 to 255) is almost always the right
choice. c (signed, −128 to 127) turns up only when the spec says
“signed”. W is a variant of C that allows values above 255 when
your template is in U0 (UTF-8) mode — ignore it until you hit that
case.
IP addresses are the textbook example:
my $raw = pack "C4", 192, 168, 1, 42;
# ^-- four unsigned bytes, in the order the list gives them
Native vs. fixed width#
The directives s / S / l / L are specified to be exactly 2
and 4 bytes regardless of the C compiler. Their native-width
counterparts add !:
length pack "l", 0 # 4 (always)
length pack "l!", 0 # 4 on most 32-bit and 64-bit systems,
# 8 on 64-bit Alpha, some legacy Unixes.
i and I are always native (whatever sizeof(int) returns on
this machine, with a 32-bit minimum). i! is an alias for i.
Rule of thumb:
For interoperable binary (wire formats, on-disk files, data crossing machines) use fixed width (
l,L, …) plus an explicit endianness — or the portable shortcuts.For in-process work that matches some local C struct you are about to hand to
ioctlorsyscall, use native width (l!,L!,i,I).
Mixing the two is fine, but stop and ask yourself why each time.
Counting the length of a template#
Before you send a record, check the byte count is what the spec
says. length pack(TEMPLATE, ...) with placeholder values is the
simplest way:
my $sz = length pack "n N L C", 0, 0, 0, 0;
# 2 + 4 + 4 + 1 = 11
For templates whose length depends on native sizes, length pack TEMPLATE, 0, 0, ... gives you the answer on the current machine.
Worked example: a minimal packet header#
A hypothetical protocol specifies:
8-bit version
8-bit flags
16-bit big-endian length (payload follows)
Three fields, four bytes total. The template writes itself:
my $hdr = pack "C C n", $version, $flags, length $payload;
my $pkt = $hdr . $payload;
length $hdr # 4
Two things to note:
We did not spell out
C1 C1 n1— a bare directive letter has an implicit repeat count of 1.The length in the third field is computed from
$payloadin the same expression. A later chapter shows the/form that ties the two together so you cannot forget.
Edge cases worth remembering#
Values out of range:
pack "C", 256silently truncates (you get\x00). There is no warning. Check ranges yourself if the data is untrusted.Signed vs. unsigned round-trip: a value packed as
Cand unpacked asc(or vice versa) flips sign for values ≥ 128:unpack "c", pack "C", 200 # -56
Too few values: missing values pack as
0for numeric directives,""for strings. No warning.
The next chapter handles the question every multi-byte integer raises: which end of the number goes first?