pack#
Convert a list of Perl values into a binary string according to a template.
pack is the low-level serialiser: you describe the byte layout with a TEMPLATE, hand it a LIST of values, and it returns a scalar whose characters are the concatenated machine-level encoding of those values. It is the inverse of unpack and the standard way to build fixed-width records, wire protocols, C struct layouts, raw IP addresses for sockaddr_in, and any other byte-level payload that Perl does not model as a first-class type.
This page is a directive reference. For a narrative introduction ("I have this wire protocol, how do I parse it?"), start at the pack/unpack tutorial.
Synopsis#
my $bytes = pack TEMPLATE, LIST;
my $ip = pack "C4", split /\./, "192.168.1.1";
my $rec = pack "Z8 Z8 L", $user, $host, $ts;
What you get back#
A plain Perl scalar holding the packed bytes. Its length is the sum of the widths of every template directive; its contents are binary and may contain embedded NUL bytes. Treat it as a byte string, not as text — writing it to a handle with a :utf8 or :encoding(…) layer will re-encode it. Use binmode $fh or open with ">:raw" before emitting.
By default the result is in character mode (C0). A template that starts with U, or switches to U0 mid-template, produces a UTF-8-encoded Unicode string instead. Do not use this as a substitute for the Encode module.
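As a minimal sketch of the points above, the following checks the result length and writes the bytes through a :raw handle. The variable names and the in-memory file are illustrative choices, not something pack requires.

```perl
use strict;
use warnings;

# Three directives: 1-byte C, 2-byte n, 4-byte N, so 7 bytes in total.
my $bytes = pack "C n N", 1, 2, 3;
my $len   = length $bytes;

# Write to an in-memory file opened with :raw so no layer re-encodes the bytes.
my $buffer = '';
open my $fh, '>:raw', \$buffer or die "open failed: $!";
print {$fh} $bytes;
close $fh;

# The buffer should hold exactly the packed bytes, embedded NULs included.
my $same = $buffer eq $bytes ? 1 : 0;
```

The same pattern applies to a real file: open it with ">:raw" or call binmode on the handle before printing.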
Template syntax#
A TEMPLATE is a sequence of directives. Each directive is an ASCII letter, optionally followed by:
- a repeat count: a decimal integer, *, or […]
- one or more modifiers: !, <, >
- a group: ( … ) gathers directives so a repeat count or endianness modifier applies to the whole group
Whitespace between directives is ignored. A # introduces a comment running to end-of-line — the same convention as Perl source.
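A short sketch of the whitespace and comment rules: the two templates below are intended to be equivalent, one written compactly and one spread over several lines. The values are arbitrary.

```perl
use strict;
use warnings;

# Compact form: two unsigned chars followed by a 16-bit big-endian integer.
my $compact = pack "C2n", 1, 2, 515;

# The same template with whitespace and end-of-line comments inside it.
my $verbose = pack q{
    C2   # two unsigned chars
    n    # one 16-bit big-endian integer
}, 1, 2, 515;

# 515 is 0x0203, so both strings are the four bytes 01 02 02 03.
my $equal = $compact eq $verbose ? 1 : 0;
```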
Directive table#
Every pack directive in one place. W is the width in bytes per scalar consumed (a C consumes one value and produces one byte; an a3 consumes one value and produces three bytes). Endian column: native means the CPU’s byte order; big / little are fixed regardless of host.
| Directive | Consumes | W | Signed | Endian | Modifiers | Notes |
|---|---|---|---|---|---|---|
| a | 1 string | count | — | — | — | NUL-padded to width, truncates if too long |
| A | 1 string | count | — | — | — | Space-padded to width |
| Z | 1 string | count | — | — | — | NUL-terminated |
| b | 1 string | count/8 | — | — | — | Bit string, LSB first within each byte |
| B | 1 string | count/8 | — | — | — | Bit string, MSB first within each byte |
| h | 1 string | count/2 | — | — | — | Hex string, low nybble first |
| H | 1 string | count/2 | — | — | — | Hex string, high nybble first |
| c | 1 integer | 1 | yes | — | — | Signed char |
| C | 1 integer | 1 | no | — | — | Unsigned char (octet) |
| W | 1 integer | 1 | no | — | — | Unsigned char; allows values above 255 |
| s | 1 integer | 2 | yes | native | ! < > | |
| S | 1 integer | 2 | no | native | ! < > | |
| l | 1 integer | 4 | yes | native | ! < > | |
| L | 1 integer | 4 | no | native | ! < > | |
| i | 1 integer | native | yes | native | ! < > | |
| I | 1 integer | native | no | native | ! < > | |
| q | 1 integer | 8 | yes | native | < > | Requires 64-bit-integer Perl |
| Q | 1 integer | 8 | no | native | < > | Requires 64-bit-integer Perl |
| n | 1 integer | 2 | no | big | ! | Portable network order |
| N | 1 integer | 4 | no | big | ! | Portable network order |
| v | 1 integer | 2 | no | little | ! | "VAX" order |
| V | 1 integer | 4 | no | little | ! | "VAX" order |
| j | 1 integer | IV size | yes | native | < > | Perl-internal signed integer |
| J | 1 integer | UV size | no | native | < > | Perl-internal unsigned integer |
| f | 1 number | 4 | — | native | < > | Single-precision IEEE 754 |
| d | 1 number | 8 | — | native | < > | Double-precision IEEE 754 |
| F | 1 number | NV size | — | native | < > | Perl-internal float (NV) |
| D | 1 number | varies | — | native | < > | Long double; format varies by platform |
| p | 1 string / undef | ptr | — | native | < > | Pointer to NUL-terminated string |
| P | 1 string / undef | ptr | — | native | < > | Pointer to fixed-length buffer; count = buffer length |
| u | 1 string | varies | — | — | — | Uuencoded; count = max bytes per output line (default 45) |
| U | 1 codepoint | varies | — | — | — | Unicode character number; encodes to UTF-8 in U0 mode |
| w | 1 integer ≥ 0 | varies | no | — | — | BER-compressed integer, big-endian base-128 |
| x | nothing | 1 | — | — | ! | Insert one NUL byte |
| X | nothing | −1 | — | — | ! | Back up one byte |
| @ | nothing | absolute | — | — | ! | Zero-fill or truncate to position N within group |
| . | 1 integer | absolute | — | — | ! | Zero-fill or truncate to position given by the value |
| ( ) | — | — | — | — | < > | Group: repeat count and endianness propagate inside |
| | — | — | — | — | — | Unpack-only. In pack, use length-item/sequence-item |
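To make a few rows of the table concrete, here is a small sketch that prints nothing but builds hex dumps of several directives. The hex_bytes helper is hypothetical, introduced only for this illustration; the input values are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as space-separated hex octets.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord $_ } split //, $bytes;
}

# String directives: the count is the field width.
my $a4 = hex_bytes(pack 'a4', 'ab');        # NUL-padded
my $A4 = hex_bytes(pack 'A4', 'ab');        # space-padded
my $Z4 = hex_bytes(pack 'Z4', 'abcd');      # last byte forced to NUL

# Fixed-endian integers: same value, opposite byte order.
my $n = hex_bytes(pack 'n', 0x1234);
my $v = hex_bytes(pack 'v', 0x1234);
my $N = hex_bytes(pack 'N', 0x12345678);
my $V = hex_bytes(pack 'V', 0x12345678);

# Bit and nybble order within a byte.
my $B8 = hex_bytes(pack 'B8', '11000000');  # first character is the top bit
my $b8 = hex_bytes(pack 'b8', '11000000');  # first character is the bottom bit
my $H2 = hex_bytes(pack 'H2', '1f');
my $h2 = hex_bytes(pack 'h2', '1f');

# BER-compressed integer: 300 is 2 * 128 + 44.
my $w = hex_bytes(pack 'w', 300);
```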
The modifiers:
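| Modifier | Effect |
|---|---|
| ! | On s, S, l, L, i, I: use the native C size (short, long, int). On x, X: align instead of skip. On n, N, v, V: treat the value as signed. On @, .: count in bytes rather than characters. |
| > | Force big-endian byte order ("big end touches the construct"). |
| < | Force little-endian byte order. |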
Applied to a group, < / > cascade into every byte-ordered directive inside the group and are silently ignored by directives that do not accept them.
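The following sketch exercises the modifiers on small values so the byte order is easy to see. It assumes nothing beyond core pack and unpack; the values are arbitrary.

```perl
use strict;
use warnings;

# The same 32-bit value with each byte order forced explicitly.
my $be = pack 'l>', 1;    # most significant byte first
my $le = pack 'l<', 1;    # least significant byte first

# A modifier on a group applies to each byte-ordered directive inside it.
my $grouped  = pack '(s l)>', 1, 2;
my $explicit = pack 's> l>',  1, 2;

# "!" on n reads the same two bytes as a signed 16-bit value.
my $unsigned = unpack 'n',  "\xff\xfe";
my $signed   = unpack 'n!', "\xff\xfe";
```

Here $unsigned is 65534 and $signed is -2: the bytes are identical, only the interpretation changes.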
Repeat counts#
A directive letter may be followed by:
- a number N: apply the directive that many times, consuming N values from LIST
- *: consume all remaining values; for x/X/@ this is equivalent to 0; for u it selects the default of 45
- [N]: equivalent to a bare N
- [template]: the repeat count is the packed byte length of template. x[L] skips as many bytes as a packed long; x![d] aligns to a double boundary
String and bit/nybble directives treat the count as width of a single value, not count-of-values: pack "A4", "abcdef" produces "abcd", not four copies of "abcdef".
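A brief sketch of the repeat-count forms described in this section. It uses x[N] rather than x[L] only so that the skipped width is visibly four bytes; the values are arbitrary.

```perl
use strict;
use warnings;

# Numeric counts on an integer directive consume that many values.
my $three   = pack 'C3',   65, 66, 67;        # "ABC"
my $bracket = pack 'C[3]', 65, 66, 67;        # same as C3
my $all     = pack 'C*',   65, 66, 67, 68;    # every remaining value

# On a string directive the count is a width, not a number of values.
my $width = pack 'A4', 'abcdef';              # truncated to "abcd"
my $star  = pack 'A*', 'abcdef';              # whole string, no padding

# [template] as a count: skip as many bytes as a packed N occupies.
my $skip = pack 'x[N] C', 7;                  # four NULs, then 0x07
```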
Grouping#
Parentheses group directives. A group may take a repeat count or an endianness modifier:
pack "(sl)<", -42, 4711 # same as "s<l<", -42, 4711
pack "(CCS)*", @triplets # repeat group for every triplet
Within each repetition of a group, @ positioning starts over at 0 — pack '@1A((@2A)@3A)', qw(X Y Z) produces "\0X\0\0YZ".
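As a sketch of group repetition and of @ restarting inside a group, the example below packs two records with a repeated group and reproduces the positioning case quoted above. The record layout (two octets and a 16-bit big-endian integer) is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# Two records of (octet, octet, 16-bit big-endian integer), flattened.
my @triplets = (1, 2, 0x0304, 5, 6, 0x0708);

# The group repeats until the list is exhausted.
my $packed = pack '(C C n)*', @triplets;

# The same template reads the records back as a flat list.
my @back = unpack '(C C n)*', $packed;

# @ counts from the start of the innermost group, not from the string start.
my $positioned = pack '@1A((@2A)@3A)', qw(X Y Z);
```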
Length-prefixed payloads (/)#
Write length-item/sequence-item. The length is computed from the sequence value and packed according to length-item; then the sequence itself is packed according to sequence-item:
my $msg = pack "n/a*", "hello, world";
# "\x00\x0chello, world" — 16-bit big-endian length, then the bytes
The length directive may be any integer directive (n, N, w, C, …) or even a string directive (A4, Z*) when you want the length written as ASCII.
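A minimal sketch of a round trip through two length-prefixed fields, one with a 16-bit length and one with a single-octet length. The field contents are arbitrary; unpack with the same template consumes each length and returns only the payloads.

```perl
use strict;
use warnings;

# Two fields, each preceded by its own length.
my $msg = pack 'n/a* C/a*', 'hello, world', 'hi';

# Reading back: each length-item is consumed, only the strings are returned.
my ($first, $second) = unpack 'n/a* C/a*', $msg;

# 2 length bytes + 12 payload bytes + 1 length byte + 2 payload bytes.
my $total = length $msg;
```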
Examples#
Build a fixed-width record with two NUL-terminated strings and a 32-bit native timestamp:
my $rec = pack "Z8 Z8 L", "alice", "host01", $now; # each Z8 holds at most 7 data bytes plus the NUL
# 8 + 8 + 4 = 20 bytes
Portable network byte order — prefer n / N / v / V over native-width directives when the bytes leave the machine:
my $be = pack "n N", 42, 4711; # big-endian 16 + 32 bit
my $le = pack "v V", 42, 4711; # little-endian
my $sx = pack "s>l>", -42, 4711; # signed big-endian via modifier
my $gx = pack "(sl)<", -42, 4711; # signed little-endian via group
Align a field inside a C struct. x![d] inserts just enough NUL bytes to reach the next multiple of a double’s width:
# struct { char c; double d; char cc[2]; }
my $s = pack "c x![d] d c2", $c, $d, $c1, $c2;
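To see where the padding lands, the sketch below packs concrete values with the same template and measures the result against the packed width of a double, which is computed rather than assumed. Note that it does not model any trailing padding a C compiler might add to the struct.

```perl
use strict;
use warnings;

# Packed width of a double on this platform (commonly 8).
my $dw = length(pack('d', 0));

# One char, padding up to the next multiple of $dw, a double, two chars.
my $s = pack 'c x![d] d c2', 1, 3.5, 2, 3;

# The double starts at offset $dw, so the total is $dw + $dw + 2 bytes.
my $total = length $s;

# Unpacking with the same template skips the same padding.
my @fields = unpack 'c x![d] d c2', $s;
```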
Round-trip through a hex string:
my $raw = pack "H*", "deadbeef"; # "\xde\xad\xbe\xef"
my $hex = unpack "H*", $raw; # "deadbeef"
Edge cases#
- Too few values: missing values are treated as "". pack "A4 A4", "hi" yields "hi" followed by six spaces (two to pad the first field, four for the empty second field), not a fatal error.
- Too many values: extras are silently ignored.
- a vs A vs Z: all three pad to an exact width. a pads with NUL, A pads with space, Z pads with NUL and guarantees a trailing NUL, so Z8 encodes at most 7 data bytes plus the terminator.
- Character width vs byte width: the single most common trap. Counts for a/A/Z and offsets for @/. are in characters of the packed string, not bytes. In C0 mode a character is one byte; in U0 mode a character may span multiple UTF-8 bytes. Use the ! modifier on @/. when you need byte offsets regardless of mode.
- Endianness on s/S/l/L/i/I/q/Q: native only, not portable. Use n/N/v/V or apply >/< explicitly. f/d are likewise native; IEEE 754 alone does not pin down endianness.
- Inf/NaN packed as integers: fatal error. No sensible mapping exists.
- q/Q on non-64-bit Perl: raises an exception.
- p and P capture pointers into the caller's memory. The referent must remain live until the packed string is consumed; a temporary string passed to p may be freed before you read it back. Avoid outside XS code.
- Grouping and @: positioning with @ starts over at 0 inside every repetition of a group. pack '@1A((@2A)@3A)', qw(X Y Z) produces "\0X\0\0YZ", not what a naive reading suggests.
- Floating-point round-trip loss: Perl stores numbers as doubles, so unpack("f", pack("f", $x)) generally does not equal $x; packing through single precision loses precision.
- Writing packed bytes to a text handle: a handle with :utf8 or :encoding(…) will re-encode the bytes. Use binmode $fh or open with ">:raw".
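A few of these edge cases can be checked directly. The sketch below is illustrative only: the values are arbitrary, and the fatal error on infinity assumes a reasonably recent Perl.

```perl
use strict;
use warnings;

# Z8 keeps at most 7 data bytes; the 8th byte is always NUL.
my $z = pack 'Z8', 'server01';

# Extra values beyond what the template consumes are ignored.
my $extra = pack 'C2', 1, 2, 3, 4;

# Single precision cannot hold 0.1 exactly, so the round trip changes it.
my $x       = 0.1;
my $via_f   = unpack 'f', pack('f', $x);
my $f_lossy = $via_f != $x ? 1 : 0;

# Packing infinity with an integer directive dies; eval traps the error.
my $died = eval { my $p = pack 'C', 9**9**9; 1 } ? 0 : 1;
```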
Differences from upstream#
Fully compatible with upstream Perl 5.42.
See also#
- unpack: the inverse operation; same template language
- sprintf: formatted text output rather than binary bytes
- vec: bit-level access to a string without a template
- syswrite: write a packed byte string without PerlIO surprises
- pack/unpack tutorial: task-oriented walkthrough with worked protocol and file-format examples