pack
Convert a list of Perl values into a binary string according to a template.
pack is the low-level serialiser: you describe the byte layout with
a TEMPLATE, hand it a LIST of values, and it returns a scalar
whose characters are the concatenated machine-level encoding of those
values. It is the inverse of unpack and the standard way
to build fixed-width records, wire protocols, C struct layouts, raw
IP addresses for sockaddr_in, and any other byte-level payload that
Perl does not model as a first-class type.
This page is a directive reference. For a narrative introduction — “I have this wire protocol, how do I parse it?” — start at the pack/unpack tutorial.
Synopsis
```perl
my $bytes = pack TEMPLATE, LIST;
my $ip = pack "C4", split /\./, "192.168.1.1";
my $rec = pack "Z8 Z8 L", $user, $host, $ts;
```
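The `C4` line in the synopsis builds the same four-byte packed address that the core `Socket` module's `inet_aton` returns for a dotted quad. That is why it can feed `sockaddr_in`. The following is a small check and assumes `Socket` is available:

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Four unsigned chars, one per octet of the dotted quad.
my $ip = pack "C4", split /\./, "192.168.1.1";

printf "%d bytes: %s\n", length $ip, unpack("H*", $ip);   # 4 bytes: c0a80101
print inet_aton("192.168.1.1") eq $ip ? "matches inet_aton\n" : "differs\n";
print inet_ntoa($ip), "\n";                                 # 192.168.1.1
```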
What you get back
A plain Perl scalar holding the packed bytes. If the template never repositions with `X`, `@`, `.` or `x!`, the length of this scalar is the sum of the widths of its directives. Its contents are binary and may contain embedded NUL bytes. Treat it as a byte string, not as text: writing it to a handle with a `:utf8` or `:encoding(…)` layer will re-encode it. Use `binmode $fh` or open with `">:raw"` before emitting.
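The re-encoding is easy to observe with in-memory file handles, which stand in for real files here. The following sketch writes the same three packed bytes through a `:raw` layer and through a `:utf8` layer. The byte 0xE9 comes out as two bytes on the text handle:

```perl
use strict;
use warnings;

my $packed = pack "C n", 0xE9, 0x1234;    # 3 bytes: e9 12 34

open my $raw, '>:raw', \my $raw_buf or die "open raw: $!";
print {$raw} $packed;                      # written unchanged
close $raw;

open my $txt, '>:utf8', \my $txt_buf or die "open utf8: $!";
print {$txt} $packed;                      # 0xE9 is re-encoded as c3 a9
close $txt;

printf "raw:  %s\n", unpack "H*", $raw_buf;   # e91234
printf "utf8: %s\n", unpack "H*", $txt_buf;   # c3a91234
```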
By default the result is in character mode (`C0`). A template that starts with `U`, or switches to `U0` mid-template, produces a UTF-8-encoded Unicode string instead. Do not use `U` or `U0` as a substitute for the Encode module when you need encoded output.
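Compare what `pack "U"` returns with what `Encode::encode` returns for the same code point. The `U` result is a one-character string, not three octets, so producing UTF-8 bytes for output remains `Encode`'s job. The code point chosen here is arbitrary:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char  = pack "U", 0x263A;          # one character: U+263A
my $bytes = encode("UTF-8", $char);    # explicit encoding to octets

printf "pack U: %d char(s)\n", length $char;                              # 1
printf "Encode: %d byte(s), %s\n", length $bytes, unpack("H*", $bytes);   # 3, e298ba
```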
Template syntax
A TEMPLATE is a sequence of directives. Each directive is a single template character, usually an ASCII letter, and may be followed by:

- a repeat count: a decimal integer, `*`, or a bracketed `[…]` form;
- one or more modifiers: `!`, `<`, `>`.

A group, `( … )`, gathers directives so that a repeat count or endianness modifier applies to the whole group.
Whitespace between directives is ignored. A # introduces a comment
running to end-of-line — the same convention as Perl source.
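Because whitespace and `#` comments are ignored, a long template can be laid out one field per line. The following sketch uses a heredoc for the template. The field names in the comments are hypothetical:

```perl
use strict;
use warnings;

# Whitespace separates directives; '#' starts a comment that runs to end of line.
my $commented = pack <<'TEMPLATE', 7, 1024;
    n    # message id: 16-bit, big-endian
    N    # payload length: 32-bit, big-endian
TEMPLATE

my $compact = pack "nN", 7, 1024;
print $commented eq $compact ? "same bytes\n" : "different bytes\n";
```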
Directive table
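Every pack directive in one place. The W column (not to be confused with the `W` directive) is the width in bytes per scalar consumed. For example, a `C` consumes one value and produces one byte, and an `a3` consumes one value and produces three bytes. "count" means the width follows the repeat count. "native", "IV size", "NV size" and "ptr" depend on how perl was built. In the Endian column, native means the CPU's byte order; big and little are fixed regardless of host.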
| Directive | Consumes | W | Signed | Endian | Modifiers | Notes |
|---|---|---|---|---|---|---|
| `a` | 1 string | count | — | — | — | NUL-padded to width, truncates if too long |
| `A` | 1 string | count | — | — | — | Space-padded to width |
| `Z` | 1 string | count | — | — | — | NUL-terminated; `Z*` appends one trailing NUL |
| `b` | 1 string | count/8 | — | — | — | Bit string, LSB first within each byte |
| `B` | 1 string | count/8 | — | — | — | Bit string, MSB first within each byte |
| `h` | 1 string | count/2 | — | — | — | Hex string, low nybble first |
| `H` | 1 string | count/2 | — | — | — | Hex string, high nybble first |
| `c` | 1 integer | 1 | yes | — | — | Signed char |
| `C` | 1 integer | 1 | no | — | — | Unsigned char (octet) |
| `W` | 1 integer | 1 | no | — | — | Unsigned char; unlike `C`, may exceed 255 |
| `s` | 1 integer | 2 | yes | native | `!` `<` `>` | `s!`: native C `short` |
| `S` | 1 integer | 2 | no | native | `!` `<` `>` | `S!`: native C `unsigned short` |
| `l` | 1 integer | 4 | yes | native | `!` `<` `>` | `l!`: native C `long` |
| `L` | 1 integer | 4 | no | native | `!` `<` `>` | `L!`: native C `unsigned long` |
| `i` | 1 integer | native | yes | native | `!` `<` `>` | Native C `int` |
| `I` | 1 integer | native | no | native | `!` `<` `>` | Native C `unsigned int` |
| `q` | 1 integer | 8 | yes | native | `<` `>` | Requires 64-bit-integer Perl |
| `Q` | 1 integer | 8 | no | native | `<` `>` | Requires 64-bit-integer Perl |
| `n` | 1 integer | 2 | no | big | `!` | Portable network order; `n!` unpacks as signed |
| `N` | 1 integer | 4 | no | big | `!` | Portable network order; `N!` unpacks as signed |
| `v` | 1 integer | 2 | no | little | `!` | "VAX" order; `v!` unpacks as signed |
| `V` | 1 integer | 4 | no | little | `!` | "VAX" order; `V!` unpacks as signed |
| `j` | 1 integer | IV size | yes | native | `<` `>` | Perl-internal signed integer (IV) |
| `J` | 1 integer | UV size | no | native | `<` `>` | Perl-internal unsigned integer (UV) |
| `f` | 1 number | 4 | — | native | `<` `>` | Single-precision IEEE 754 |
| `d` | 1 number | 8 | — | native | `<` `>` | Double-precision IEEE 754 |
| `F` | 1 number | NV size | — | native | `<` `>` | Perl-internal float (NV) |
| `D` | 1 number | varies | — | native | `<` `>` | Long double; format varies by platform |
| `p` | 1 string or `undef` | ptr | — | native | `<` `>` | Pointer to NUL-terminated string; `undef` packs a null pointer |
| `P` | 1 string or `undef` | ptr | — | native | `<` `>` | Pointer to fixed-length buffer; count = buffer length |
| `u` | 1 string | varies | — | — | — | Uuencoded; count = max bytes per output line (default 45) |
| `U` | 1 codepoint | varies | — | — | — | Unicode character number; encodes to UTF-8 in `U0` mode |
| `w` | 1 integer ≥ 0 | varies | no | — | — | BER-compressed integer, big-endian base-128 |
| `x` | nothing | 1 | — | — | `!` | Insert one NUL byte; `x!N` pads to a multiple of N |
| `X` | nothing | −1 | — | — | `!` | Back up one byte; `X!N` backs up to a multiple of N |
| `@` | nothing | absolute | — | — | `!` | Zero-fill or truncate to position N within group |
| `.` | 1 integer | absolute | — | — | `!` | Zero-fill or truncate to position given by the value |
| `( … )` | — | — | — | — | `<` `>` | Group: repeat count and endianness propagate inside |
| `%` | — | — | — | — | — | Checksum prefix; unpack-only, not valid in pack |
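The W column can be checked directly, since `length` of a packed value is its width. The following sketch prints a few fixed widths and compares the platform-dependent ones with the `intsize`, `longsize`, `ivsize` and `nvsize` entries of the core `Config` module:

```perl
use strict;
use warnings;
use Config;

# Fixed widths: the same on every platform.
my %fixed = (
    'a3'      => length pack("a3", "x"),             # count bytes
    'b8'      => length pack("b8", "10000000"),      # count/8 bytes
    'H4'      => length pack("H4", "beef"),          # count/2 bytes
    'n N v V' => length pack("n N v V", 1, 2, 3, 4), # 2 + 4 + 2 + 4
);
printf "%-8s %d\n", $_, $fixed{$_} for sort keys %fixed;

# Platform-dependent widths, compared with perl's build configuration.
printf "%-3s %d (Config says %d)\n", @$_ for
    [ 'i',  length(pack "i",  0), $Config{intsize}  ],
    [ 'l!', length(pack "l!", 0), $Config{longsize} ],
    [ 'j',  length(pack "j",  0), $Config{ivsize}   ],
    [ 'F',  length(pack "F",  0), $Config{nvsize}   ];
```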
The modifiers:
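| Modifier | Effect |
|---|---|
| `!` | On `s S l L i I`: use native C sizes instead of fixed 16/32-bit widths. On `x X`: align to a multiple of the count. On `n N v V`: unpack as signed. On `@ .`: measure the position as a byte offset into the packed string's internal representation. |
| `>` | Force big-endian byte order ("big end touches the construct"). |
| `<` | Force little-endian byte order. |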
Applied to a group, < / > cascade into every byte-ordered directive
inside the group and are silently ignored by directives that do not
accept them.
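Each modifier can be seen in the resulting bytes. The following sketch covers `<` and `>` on a single directive and on a group, `!` on `x` as alignment, and `!` on `n` when unpacking. The numeric values are arbitrary:

```perl
use strict;
use warnings;

my $le  = pack "s<", 258;               # 258 = 0x0102, little-endian: 02 01
my $be  = pack "s>", 258;               # big-endian: 01 02
my $grp = pack "(s l)>", 258, 1;        # same bytes as "s> l>"

# x!4 pads with NULs up to the next multiple of 4.
my $aligned = pack "C x!4 N", 1, 2;     # 01 000000 00000002

# n! packs the same bits as n; unpack reads them back as signed.
my $signed = unpack "n!", pack("n", 0xFFFF);   # -1

printf "%s %s %s %s %d\n",
    unpack("H*", $le), unpack("H*", $be), unpack("H*", $grp),
    unpack("H*", $aligned), $signed;
```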
Repeat counts
A directive letter may be followed by:

- a number `N`: apply the directive that many times, consuming `N` values from `LIST`;
- `*`: consume all remaining values; for `x`, `X` and `@` this is equivalent to `0`, and for `u` it selects the default of 45;
- `[N]`: equivalent to a bare `N`;
- `[template]`: the repeat count is the packed byte length of `template`. `x[L]` emits as many NUL bytes as a packed long occupies (and skips that many in unpack); `x![d]` aligns to a double boundary.
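The four forms can be compared on the same input. In the following sketch, the three-element list is arbitrary:

```perl
use strict;
use warnings;

my @v = (1, 2, 3);

my $n    = pack "C3",   @v;   # explicit count: three values
my $star = pack "C*",   @v;   # * takes every remaining value
my $brk  = pack "C[3]", @v;   # [N] is the same as a bare N
my $pad  = pack "x[L]";       # [template]: as many NULs as a packed L (4)

print join(" ", map { unpack "H*", $_ } $n, $star, $brk, $pad), "\n";
# 010203 010203 010203 00000000
```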
String and bit/nybble directives treat the count as the width of a single value rather than as a number of values: `pack "A4", "abcdef"` produces `"abcd"`, not four copies of `"abcdef"`.
Grouping
Parentheses group directives. A group may take a repeat count or an endianness modifier:
```perl
pack "(sl)<", -42, 4711;    # same as "s<l<", -42, 4711
pack "(CCS)*", @triplets;   # repeat group for every triplet
```
Inside a group, `@` positions are measured from the start of the innermost enclosing group, so they start over at 0 in each repetition: `pack '@1A((@2A)@3A)', qw(X Y Z)` produces `"\0X\0\0YZ"`.
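The following sketch exercises the grouping rules with hypothetical data. It packs a repeated group over a flat list, reproduces the `@` example from the text, and compares a group-level endianness modifier with per-directive modifiers:

```perl
use strict;
use warnings;

# Hypothetical data: a flat list of (byte, byte, 16-bit word) triplets.
my @triplets = (1, 2, 0x0304, 5, 6, 0x0708);

# The * group repeats until the list runs out; S> keeps each word big-endian.
my $packed = pack "(C C S>)*", @triplets;
print unpack("H*", $packed), "\n";      # 0102030405060708

# @ offsets are relative to the innermost enclosing group.
my $at = pack '@1A((@2A)@3A)', qw(X Y Z);
print $at eq "\0X\0\0YZ" ? "matches\n" : "differs\n";

# A modifier on the group applies to each byte-ordered member.
my $same = pack("(sl)<", -42, 4711) eq pack("s<l<", -42, 4711);
print $same ? "group modifier == per-directive modifiers\n" : "differs\n";
```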
Length-prefixed payloads (/)
Write length-item/sequence-item. The length is computed from the
sequence value and packed according to length-item; then the
sequence itself is packed according to sequence-item:
```perl
my $msg = pack "n/a*", "hello, world";
# "\x00\x0chello, world": 16-bit big-endian length, then the bytes
```
The length directive may be any integer directive (n, N, w,
C, …) or even a string directive (A4, Z*) when you want the
length written as ASCII.
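A round trip through `unpack` shows why the prefix is useful: the reader learns each field's length before reading it. The following sketch packs two prefixed fields with made-up contents:

```perl
use strict;
use warnings;

# A 16-bit big-endian length for the first field, a one-byte length for the second.
my $frame = pack "n/a* C/a*", "hello, world", "ok";
print unpack("H*", $frame), "\n";
# 000c68656c6c6f2c20776f726c64026f6b

# unpack reads each length first, then exactly that many bytes.
my ($body, $status) = unpack "n/a* C/a*", $frame;
print "$body / $status\n";      # hello, world / ok
```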
Examples
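Build a fixed-width record with two NUL-terminated strings and a 32-bit native timestamp. `Z8` keeps at most 7 data bytes plus the terminator, so an 8-character name such as `"server01"` would be silently truncated to `"server0"`:
```perl
my $now = time;
my $rec = pack "Z8 Z8 L", "alice", "srv01", $now;
# 8 + 8 + 4 = 20 bytes
```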
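Unpacking a record with the same template recovers the fields, because `Z` stops at the first NUL. The following sketch uses hypothetical values, including a fixed timestamp so the result is reproducible:

```perl
use strict;
use warnings;

my $rec = pack "Z8 Z8 L", "alice", "srv01", 1_700_000_000;

# Z drops the NUL padding on unpack, so the names come back clean.
my ($user, $host, $ts) = unpack "Z8 Z8 L", $rec;
printf "%s on %s at %d (%d bytes)\n", $user, $host, $ts, length $rec;
```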
Portable network byte order: prefer `n`/`N`/`v`/`V` over native-width directives when the bytes leave the machine:
```perl
my $be = pack "n N", 42, 4711;       # big-endian 16 + 32 bit
my $le = pack "v V", 42, 4711;       # little-endian
my $sx = pack "s>l>", -42, 4711;     # signed big-endian via modifier
my $gx = pack "(sl)<", -42, 4711;    # signed little-endian via group
```
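Printing each result as hex makes the byte order visible. The following sketch repacks the same four cases. The hex in the comments follows from 42 = 0x2a, 4711 = 0x1267, and -42 = 0xffd6 in 16-bit two's complement:

```perl
use strict;
use warnings;

my @cases = (
    [ 'n N',   pack("n N",    42, 4711) ],   # 002a00001267
    [ 'v V',   pack("v V",    42, 4711) ],   # 2a0067120000
    [ 's>l>',  pack("s>l>",  -42, 4711) ],   # ffd600001267
    [ '(sl)<', pack("(sl)<", -42, 4711) ],   # d6ff67120000
);
printf "%-6s %s\n", $_->[0], unpack("H*", $_->[1]) for @cases;
```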
Align a field inside a C struct. x![d] inserts just enough NUL
bytes to reach the next multiple of a double’s width:
```perl
# struct { char c; double d; char cc[2]; }
my $s = pack "c x![d] d c2", $c, $d, $c1, $c2;
```
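The following sketch fills the struct example with hypothetical field values and unpacks it with the same template. It covers only the fields named in the template. A C compiler on common 64-bit ABIs also adds trailing padding after `cc`, which this template does not emit:

```perl
use strict;
use warnings;

# Hypothetical values for: struct { char c; double d; char cc[2]; }
my ($c, $d, $c1, $c2) = (1, 2.5, 3, 4);

my $s = pack "c x![d] d c2", $c, $d, $c1, $c2;

# 1 (c) + 7 (NUL padding to an 8-byte boundary) + 8 (d) + 2 (cc) = 18 bytes
printf "%d bytes, padding: %s\n", length $s, unpack("H*", substr $s, 1, 7);

my ($rc, $rd, @rcc) = unpack "c x![d] d c2", $s;
print "$rc $rd @rcc\n";   # 1 2.5 3 4
```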
Round-trip through a hex string:
```perl
my $raw = pack "H*", "deadbeef";    # "\xde\xad\xbe\xef"
my $hex = unpack "H*", $raw;        # "deadbeef"
```
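In each pair of hex and bit-string directives, the case of the letter selects the order within each byte. The following sketch shows this with the low-nybble-first `h` and with the bit-string pair `B`/`b`:

```perl
use strict;
use warnings;

# H: high nybble first (how hex dumps read); h: low nybble first.
my $hi_first = pack "H*", "deadbeef";   # bytes de ad be ef
my $lo_first = pack "h*", "deadbeef";   # bytes ed da eb fe

# B: most significant bit first; b: least significant bit first.
my $msb = pack "B8", "10000000";        # byte 0x80
my $lsb = pack "b8", "10000000";        # byte 0x01

print join(" ", map { unpack "H*", $_ } $hi_first, $lo_first, $msb, $lsb), "\n";
# deadbeef eddaebfe 80 01
```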
Edge cases
- Too few values: missing values are treated as `""`, which numeric directives read as 0. `pack "A4 A4", "hi"` yields `"hi"` followed by six spaces. This is not a fatal error.
- Too many values: extras are silently ignored.
- `a` vs `A` vs `Z`: all three pad to an exact width. `a` pads with NUL, `A` pads with spaces, and `Z` pads with NUL and guarantees a trailing NUL, so `Z8` encodes at most 7 data bytes plus the terminator.
- Character width vs byte width, the single most common trap: counts for `a`/`A`/`Z` and offsets for `@`/`.` are in characters of the packed string, not bytes. In `C0` mode a character is one byte; in `U0` mode a character may span multiple UTF-8 bytes. Use the `!` modifier on `@`/`.` when you need byte offsets regardless of mode.
- Endianness on `s`/`S`/`l`/`L`/`i`/`I`/`q`/`Q`: native only, not portable. Use `n`/`N`/`v`/`V` or apply `>`/`<` explicitly. `f`/`d` are likewise native; IEEE 754 alone does not pin down endianness.
- `Inf`/`NaN` packed as integers: fatal error, since no sensible mapping exists.
- `q`/`Q` on a Perl without 64-bit integers: raises an exception.
- `p` and `P` capture pointers into the caller's memory. The referent must remain live until the packed string is consumed; a temporary string passed to `p` may be freed before you read it back. Avoid them outside XS code.
- Grouping and `@`: positioning with `@` starts over at 0 inside every repetition of a group. `pack '@1A((@2A)@3A)', qw(X Y Z)` produces `"\0X\0\0YZ"`, not what a naive reading suggests.
- Floating-point round-trip loss: Perl stores floating-point numbers as NVs (usually doubles), so `unpack("f", pack("f", $x))` generally does not equal `$x`. Packing through single precision rounds away precision.
- Writing packed bytes to a text handle: a handle with `:utf8` or `:encoding(…)` will re-encode the bytes. Use `binmode $fh` or open with `">:raw"`.
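Several of these cases can be observed directly. The values in the following sketch are hypothetical. The `Inf` case relies on the fatal error described in the list and catches it with `eval`; `9**9**9` overflows to infinity:

```perl
use strict;
use warnings;

my $few = pack "A4 A4", "hi";             # missing value packs as "": "hi" + 6 spaces
my %pad = (
    a => pack("a4", "ab"),                # ab\0\0
    A => pack("A4", "ab"),                # "ab  "
    Z => pack("Z4", "ab"),                # ab\0\0
);
my $z8 = pack "Z8", "server01";           # truncated to 7 bytes + NUL

my $x  = 0.1;
my $rt = unpack "f", pack("f", $x);       # rounded to single precision

my $ok = eval { pack "N", 9**9**9; 1 };   # expected to die

print "[$few]\n";
print "$_: ", unpack("H*", $pad{$_}), "\n" for qw(a A Z);
print unpack("H*", $z8), "\n";            # 7365727665723000
print $rt == $x ? "f round trip exact\n" : "f round trip lost precision\n";
print $ok ? "Inf packed\n" : "Inf rejected: $@";
```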
Differences from upstream
Fully compatible with upstream Perl 5.42.
See also
- `unpack`: the inverse operation; same template language
- `sprintf`: formatted text output rather than binary bytes
- `vec`: bit-level access to a string without a template
- `syswrite`: write a packed byte string without PerlIO surprises
- pack/unpack tutorial: task-oriented walkthrough with worked protocol and file-format examples