pack#

Convert a list of Perl values into a binary string according to a template.

pack is the low-level serialiser: you describe the byte layout with a TEMPLATE, hand it a LIST of values, and it returns a scalar whose characters are the concatenated machine-level encoding of those values. It is the inverse of unpack and the standard way to build fixed-width records, wire protocols, C struct layouts, raw IP addresses for sockaddr_in, and any other byte-level payload that Perl does not model as a first-class type.

This page is a directive reference. For a narrative introduction — “I have this wire protocol, how do I parse it?” — start at the pack/unpack tutorial.

Synopsis#

my $bytes = pack TEMPLATE, LIST;
my $ip    = pack "C4",     split /\./, "192.168.1.1";
my $rec   = pack "Z8 Z8 L", $user, $host, $ts;

What you get back#

A plain Perl scalar holding the packed bytes. Its length is the sum of the widths of every template directive; its contents are binary and may contain embedded NUL bytes. Treat it as a byte string, not as text — writing it to a handle with a :utf8 or :encoding(…) layer will re-encode it. Use binmode $fh or open with ">:raw" before emitting.

By default the result is in character mode (C0). A template that starts with U, or switches to U0 mid-template, produces a UTF-8-encoded Unicode string instead. Do not use this as a substitute for the Encode module.
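As a minimal sketch of the points above, the following packs two fields, inspects the resulting byte string, and writes it through a raw handle. The in-memory handle and the variable names are illustrative stand-ins for a real file, not part of any API described here.

```perl
use strict;
use warnings;

# One unsigned 32-bit big-endian integer, then a 4-byte NUL-padded tag.
my $bytes = pack "N a4", 0xDEADBEEF, "ab";

# The result is a byte string: 4 + 4 = 8 bytes, with embedded NULs at the end.
printf "%d bytes: %s\n", length($bytes), unpack("H*", $bytes);
# prints "8 bytes: deadbeef61620000"

# Write through a raw handle so no layer re-encodes the bytes.
# An in-memory buffer stands in for a real file here (assumption for the demo).
my $buffer = "";
open my $out, ">:raw", \$buffer or die "open: $!";
print {$out} $bytes;
close $out or die "close: $!";
```

With a `:raw` handle the buffer ends up byte-for-byte identical to `$bytes`.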

Template syntax#

A TEMPLATE is a sequence of directives. Each directive is an ASCII letter, optionally followed by, in this order:

one or more modifiers: !, <, >
a repeat count: a decimal integer, *, or […]

Directives can also be gathered into a group: ( … ) collects directives so that a repeat count or endianness modifier applies to the whole group.

Whitespace between directives is ignored. A # introduces a comment running to end-of-line — the same convention as Perl source.
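Because whitespace and comments are allowed, a template can document its own layout. The sketch below uses a made-up four-field header (the field meanings are hypothetical) to show the idea:

```perl
use strict;
use warnings;

# A template spread over several lines, with a comment per field.
my $template = <<'END';
    n     # 16-bit big-endian message type
    C     # 8-bit flags
    x     # one NUL pad byte
    N     # 32-bit big-endian sequence number
END

my $header = pack $template, 7, 0x01, 123456;
print unpack("H*", $header), "\n";   # prints "000701000001e240"
```

The packed header is 2 + 1 + 1 + 4 = 8 bytes long.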

Directive table#

Every pack directive in one place. W is the width in bytes per scalar consumed (a C consumes one value and produces one byte; an a3 consumes one value and produces three bytes). Endian column: native means the CPU’s byte order; big / little are fixed regardless of host.

Directive	Consumes	W	Signed	Endian	Modifiers	Notes
`a`	1 string	count	—	—	—	NUL-padded to width, truncates if too long
`A`	1 string	count	—	—	—	Space-padded to width
`Z`	1 string	count	—	—	—	NUL-terminated; `Z*` always appends a trailing `NUL`
`b`	1 string	count/8	—	—	—	Bit string, LSB first within each byte
`B`	1 string	count/8	—	—	—	Bit string, MSB first within each byte
`h`	1 string	count/2	—	—	—	Hex string, low nybble first
`H`	1 string	count/2	—	—	—	Hex string, high nybble first
`c`	1 integer	1	yes	—	—	Signed char
`C`	1 integer	1	no	—	—	Unsigned char (octet)
`W`	1 integer	1	no	—	—	Unsigned char value; may be greater than 255 (one character, not necessarily one byte)
`s`	1 integer	2	yes	native	`!` `<` `>`	`s!` = native `short`
`S`	1 integer	2	no	native	`!` `<` `>`	`S!` = native `unsigned short`
`l`	1 integer	4	yes	native	`!` `<` `>`	`l!` = native `long`
`L`	1 integer	4	no	native	`!` `<` `>`	`L!` = native `unsigned long`
`i`	1 integer	native	yes	native	`!` `<` `>`	`sizeof(int)`; at least 32 bits
`I`	1 integer	native	no	native	`!` `<` `>`	`sizeof(unsigned int)`
`q`	1 integer	8	yes	native	`<` `>`	Requires 64-bit-integer Perl
`Q`	1 integer	8	no	native	`<` `>`	Requires 64-bit-integer Perl
`n`	1 integer	2	no	big	`!`	Portable network order; `n!` = signed
`N`	1 integer	4	no	big	`!`	Portable network order; `N!` = signed
`v`	1 integer	2	no	little	`!`	“VAX” order; `v!` = signed
`V`	1 integer	4	no	little	`!`	“VAX” order; `V!` = signed
`j`	1 integer	IV size	yes	native	`<` `>`	Perl-internal signed integer
`J`	1 integer	UV size	no	native	`<` `>`	Perl-internal unsigned integer
`f`	1 number	4	—	native	`<` `>`	Single-precision IEEE 754
`d`	1 number	8	—	native	`<` `>`	Double-precision IEEE 754
`F`	1 number	NV size	—	native	`<` `>`	Perl-internal float (`NV`)
`D`	1 number	varies	—	native	`<` `>`	Long double; format varies by platform
`p`	1 string / `undef`	ptr	—	native	`<` `>`	Pointer to NUL-terminated string; `undef` → null pointer
`P`	1 string / `undef`	ptr	—	native	`<` `>`	Pointer to fixed-length buffer; count = buffer length
`u`	1 string	varies	—	—	—	Uuencoded; count = max bytes per output line (default 45)
`U`	1 codepoint	varies	—	—	—	Unicode character number; encodes to UTF-8 in `U0` mode
`w`	1 integer ≥ 0	varies	no	—	—	BER-compressed integer, big-endian base-128
`x`	nothing	1	—	—	`!`	Insert one `NUL`; `x!N` aligns to multiple of `N`
`X`	nothing	−1	—	—	`!`	Back up one byte; `X!N` aligns backward
`@`	nothing	absolute	—	—	`!`	Zero-fill or truncate to position N within group
`.`	1 integer	absolute	—	—	`!`	Zero-fill or truncate to position given by the value
`( … )`	—	—	—	—	`<` `>`	Group: a repeat count repeats the whole group; `<` / `>` propagate to the directives inside
`/`	—	—	—	—	—	Not a standalone directive; joins a length item to a sequence item, written length-item`/`sequence-item

The modifiers:

Modifier	Effect
`!`	On `s` / `S` / `l` / `L` / `i` / `I`: use native `sizeof(…)` instead of fixed width. On `x` / `X`: turn into alignment commands. On `n` / `N` / `v` / `V`: interpret as signed. On `@` / `.`: count bytes of the internal representation, not characters.
`>`	Force big-endian byte order (“big end touches the construct”).
`<`	Force little-endian byte order.

Applied to a group, < / > cascade into every byte-ordered directive inside the group and are silently ignored by directives that do not accept them.
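A short sketch of how the modifiers change the output. The values are arbitrary; the size printed for `l!` depends on the platform, so no expected output is given for it.

```perl
use strict;
use warnings;

# Explicit byte order on fixed-width directives.
my $be = pack "s>", -2;              # 16-bit signed, big-endian
my $le = pack "l<", 1;               # 32-bit signed, little-endian
print unpack("H*", $be), "\n";       # prints "fffe"
print unpack("H*", $le), "\n";       # prints "01000000"

# The same modifier applied to a whole group.
my $grp = pack "(s l)>", -2, 1;
print unpack("H*", $grp), "\n";      # prints "fffe00000001"

# "!" on n makes the 16-bit network-order value signed when unpacking.
my $signed = unpack "n!", pack("n", 0xFFFE);
print "$signed\n";                   # prints "-2"

# "!" on l switches from a fixed 32 bits to the platform's sizeof(long).
printf "l is %d bytes, l! is %d bytes\n",
    length(pack("l", 0)), length(pack("l!", 0));
```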

Repeat counts#

A directive letter may be followed by:

a number N — apply the directive that many times, consuming N values from LIST
* — consume all remaining values; for x / X / @ this is equivalent to 0; for u it selects the default of 45
[N] — equivalent to a bare N
[template] — the repeat count is the packed byte length of template. x[L] skips as many bytes as a packed long; x![d] aligns to a double boundary

String and bit/nybble directives treat the count as width of a single value, not count-of-values: pack "A4", "abcdef" produces "abcd", not four copies of "abcdef".
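The difference between count-of-values and width-of-value is easiest to see side by side. This is an illustrative sketch with arbitrary values:

```perl
use strict;
use warnings;

# Numeric directives: the count is the number of values consumed.
my $three = pack "C3", 65, 66, 67;       # "ABC"
my $all   = pack "C*", 72, 105;          # "Hi"

# String directives: the count is the width of one value.
my $cut   = pack "A4", "abcdef";         # "abcd" (truncated)
my $pad   = pack "A4", "ab";             # "ab  " (space-padded)

# [template] as a count: x[N] skips as many bytes as a packed N (4).
my $skip  = pack "C x[N] C", 1, 2;
print unpack("H*", $skip), "\n";         # prints "010000000002"
```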

Grouping#

Parentheses group directives. A group may take a repeat count or an endianness modifier:

pack "(sl)<", -42, 4711     # same as "s<l<", -42, 4711
pack "(CCS)*", @triplets    # repeat group for every triplet

Within each repetition of a group, @ positioning starts over at 0 — pack '@1A((@2A)@3A)', qw(X Y Z) produces "\0X\0\0YZ".
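A small sketch of both behaviours: a repeated group packing a list of records (the record contents are made up for illustration), and the @ example from the paragraph above.

```perl
use strict;
use warnings;

# Each record is an 8-bit id followed by a 16-bit big-endian value.
my @records = ([1, 500], [2, 1000]);
my $packed  = pack "(C n)*", map { @$_ } @records;
print unpack("H*", $packed), "\n";       # prints "0101f40203e8"

# The same group template unpacks back into a flat list.
my @flat = unpack "(C n)*", $packed;     # (1, 500, 2, 1000)

# @ counts from the start of the innermost group.
my $pos = pack '@1A((@2A)@3A)', qw(X Y Z);
print $pos eq "\0X\0\0YZ" ? "ok\n" : "not ok\n";   # prints "ok"
```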

Length-prefixed payloads (`/`)#

Write length-item/sequence-item. The length is computed from the sequence value and packed according to length-item; then the sequence itself is packed according to sequence-item:

my $msg = pack "n/a*", "hello, world";
# "\x00\x0chello, world" — 16-bit big-endian length, then the bytes

The length directive may be any integer directive (n, N, w, C, …) or even a string directive (A4, Z*) when you want the length written as ASCII.
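The same template works in both directions, which makes round-tripping length-prefixed fields straightforward. A minimal sketch with two fields, one with a 16-bit prefix and one with an 8-bit prefix (the field names are hypothetical):

```perl
use strict;
use warnings;

# Two length-prefixed strings back to back.
my $msg = pack "n/a* C/a*", "hello, world", "ok";
printf "%d bytes\n", length $msg;        # prints "17 bytes" (2 + 12 + 1 + 2)

# unpack reads each length, then takes that many bytes.
my ($greeting, $status) = unpack "n/a* C/a*", $msg;
print "$greeting / $status\n";           # prints "hello, world / ok"
```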

Examples#

Build a fixed-width record with two NUL-terminated strings and a 32-bit native timestamp:

my $rec = pack "Z8 Z8 L", "alice", "srv01", $now;
# 8 + 8 + 4 = 20 bytes; each Z8 field holds at most 7 characters plus the NUL

Portable byte order: prefer n / N (big-endian, network order) or v / V (little-endian) over native-order directives when the bytes leave the machine:

my $be = pack "n N", 42, 4711;      # big-endian 16 + 32 bit
my $le = pack "v V", 42, 4711;      # little-endian
my $sx = pack "s>l>", -42, 4711;    # signed big-endian via modifier
my $gx = pack "(sl)<", -42, 4711;   # signed little-endian via group

Align a field inside a C struct. x![d] inserts just enough NUL bytes to reach the next multiple of a double’s width:

# struct { char c; double d; char cc[2]; }
my $s = pack "c x![d] d c2", $c, $d, $c1, $c2;
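To check where the padding lands, the sketch below packs the struct with sample values and unpacks it with the same template. It assumes nothing about the platform beyond what `pack "d"` itself reports, and it does not add any trailing padding a C compiler might append to the struct.

```perl
use strict;
use warnings;

my ($c, $d, $c1, $c2) = (1, 1.5, 2, 3);      # sample values
my $s = pack "c x![d] d c2", $c, $d, $c1, $c2;

# The double starts at the first multiple of its own packed size.
my $dsize = length pack("d", 0);
printf "double: %d bytes, packed struct: %d bytes\n", $dsize, length $s;

# The same template, alignment included, reads the fields back.
my @fields = unpack "c x![d] d c2", $s;
print "@fields\n";                           # prints "1 1.5 2 3"
```

Where a double is 8 bytes, the layout is 1 byte of data, 7 bytes of padding, 8 bytes for the double, and 2 trailing chars.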

Round-trip through a hex string:

my $raw = pack "H*", "deadbeef";    # "\xde\xad\xbe\xef"
my $hex = unpack "H*", $raw;        # "deadbeef"

Edge cases#

Too few values: missing values are treated as "". pack "A4 A4", "hi" yields "hi" followed by six spaces (both fields are space-padded), not a fatal error.
Too many values: extras are silently ignored.
a vs A vs Z: all three pad to an exact width. a pads with NUL, A pads with space, Z pads with NUL and guarantees a trailing NUL — so Z8 encodes at most 7 data bytes plus the terminator.
Character width vs byte width — the single most common trap. Counts for a / A / Z and offsets for @ / . are in characters of the packed string, not bytes. In C0 mode a character is one byte; in U0 mode a character may span multiple UTF-8 bytes. Use the ! modifier on @ / . when you need byte offsets regardless of mode.
Endianness on s / S / l / L / i / I / q / Q: native only, not portable. Use n / N / v / V or apply > / < explicitly. f / d are likewise native; IEEE 754 alone does not pin down endianness.
Inf / NaN packed as integers: fatal error. No sensible mapping exists.
q / Q on non-64-bit Perl: raises an exception.
p and P capture pointers into the caller’s memory. The referent must remain live until the packed string is consumed — a temporary string passed to p may be freed before you read it back. Avoid outside XS code.
Grouping and @: positioning with @ starts over at 0 inside every repetition of a group. pack '@1A((@2A)@3A)', qw(X Y Z) produces "\0X\0\0YZ", not what a naive reading suggests.
Floating-point round-trip loss: Perl stores numbers as doubles, so unpack("f", pack("f", $x)) generally does not equal $x. Packing through single precision rounds the value to the nearest representable float.
Writing packed bytes to a text handle: a handle with :utf8 or :encoding(…) will re-encode the bytes. Use binmode $fh or open with ">:raw".
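Several of these cases can be checked directly. The following is an illustrative sketch with arbitrary values, not an exhaustive test:

```perl
use strict;
use warnings;

# Too few values: the missing A4 field is packed from "".
my $short = pack "A4 A4", "hi";
printf "[%s] %d chars\n", $short, length $short;   # prints "[hi      ] 8 chars"

# a / A / Z given the same input and width.
my $a = pack "a4", "ab";     # "ab\0\0"
my $A = pack "A4", "ab";     # "ab  "
my $Z = pack "Z4", "abcd";   # "abc\0" (last data byte gives way to the NUL)

# Single precision cannot hold 0.1 exactly, so the round trip changes it.
my $x    = 0.1;
my $back = unpack "f", pack("f", $x);
print $back == $x ? "same\n" : "different\n";      # prints "different"

# A value that is exactly representable as a float survives.
my $half = unpack "f", pack("f", 0.5);             # 0.5
```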

Differences from upstream#

Fully compatible with upstream Perl 5.42.