
pack

Convert a list of Perl values into a binary string according to a template.

pack is the low-level serialiser: you describe the byte layout with a TEMPLATE, hand it a LIST of values, and it returns a scalar whose characters are the concatenated machine-level encoding of those values. It is the inverse of unpack and the standard way to build fixed-width records, wire protocols, C struct layouts, raw IP addresses for sockaddr_in, and any other byte-level payload that Perl does not model as a first-class type.

This page is a directive reference. For a narrative introduction — “I have this wire protocol, how do I parse it?” — start at the pack/unpack tutorial.

Synopsis

my $bytes = pack TEMPLATE, LIST;
my $ip    = pack "C4",     split /\./, "192.168.1.1";
my $rec   = pack "Z8 Z8 L", $user, $host, $ts;
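The synopsis lines can be run as-is once the variables have values. This sketch uses only core modules. The user, host and timestamp values are made up for illustration. `Socket::inet_aton` appears only to cross-check the hand-packed IPv4 address.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Four unsigned octets, one per dotted-quad component.
my $ip = pack "C4", split /\./, "192.168.1.1";

# inet_aton on a dotted quad returns the same 4-byte network-order address.
print "same bytes as inet_aton\n" if $ip eq inet_aton("192.168.1.1");

# Hypothetical record fields: two NUL-terminated 8-byte strings and a 32-bit native timestamp.
my ($user, $host, $ts) = ("alice", "db01", 1_700_000_000);
my $rec = pack "Z8 Z8 L", $user, $host, $ts;

# unpack with the same template recovers the values (Z stops at the first NUL).
my @fields = unpack "Z8 Z8 L", $rec;
printf "%d bytes: %s\n", length $rec, join ",", @fields;
```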

What you get back

The result is a plain Perl scalar holding the packed bytes. Its length is the total width of all directives after repeat counts are applied. Its contents are binary and may contain embedded NUL bytes. Treat it as a byte string, not as text: writing it to a handle with a :utf8 or :encoding(…) layer will re-encode it. Call binmode $fh, or open the handle with ">:raw", before writing.
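Here is a minimal sketch of writing a packed value to disk and reading it back through raw handles, so that no layer re-encodes the bytes. The temporary file comes from the core `File::Temp` module. The three-field payload is an arbitrary example chosen to include a NUL and a byte above 0x7F.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Arbitrary payload: a 16-bit big-endian value, a NUL byte, and a byte above 0x7F.
my $packed = pack "n C C", 0x1234, 0, 0xE9;   # "\x12\x34\x00\xe9"

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                 # binary-safe output (matters where :crlf is the default)
print {$fh} $packed;
close $fh or die "close: $!";

# Read back without any encoding layer.
open my $in, "<:raw", $path or die "open $path: $!";
my $got = read($in, my $buf, 1024);
close $in;

printf "wrote %d bytes, read %d bytes, identical: %s\n",
    length $packed, $got, ($buf eq $packed ? "yes" : "no");
```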

By default the result is in character mode (C0). A template that starts with U, or switches to U0 mid-template, produces a UTF-8-encoded Unicode string instead. Do not use this as a substitute for the Encode module.
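The following sketch illustrates the mode switch with an arbitrary code point, U+263A. When U starts the template, the result is one character. With a leading C0, the same U directive writes the UTF-8 bytes instead. The core `Encode` module is shown alongside because it is the recommended way to get those bytes from text.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $cp = 0x263A;   # WHITE SMILING FACE, an arbitrary non-ASCII example

# Template starts with U: U0 mode, the result is a one-character string.
my $chars = pack "U", $cp;

# Template starts with C0: U writes the UTF-8 bytes into a byte string.
my $bytes = pack "C0U", $cp;

# The recommended way to turn text into UTF-8 bytes.
my $via_encode = encode("UTF-8", chr $cp);

printf "U: %d char(s); C0U: %d byte(s); matches Encode: %s\n",
    length $chars, length $bytes, ($bytes eq $via_encode ? "yes" : "no");
```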

Template syntax

A TEMPLATE is a sequence of directives. Each directive is a letter (or one of the symbols @ and .), optionally followed by:

  • one or more modifiers: !, <, >

  • a repeat count: a decimal integer, *, [N], or [template]

Modifiers come before the repeat count, as in l<2 or x!8. Parentheses ( ) group directives so that a repeat count or endianness modifier applies to the whole group.

Whitespace between directives is ignored. A # introduces a comment running to end-of-line — the same convention as Perl source.
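Whitespace and comments make a long template readable. In the sketch below, the 10-byte header layout is hypothetical. It also shows a modifier placed before a repeat count.

```perl
use strict;
use warnings;

# Hypothetical 10-byte message header; whitespace and # comments are ignored.
my $template = q{
    n      # message type, 16-bit big-endian
    N      # sequence number, 32-bit big-endian
    a4     # four-byte tag, NUL-padded
};
my $hdr = pack $template, 1, 42, "PING";

# The modifier follows the letter; the repeat count follows the modifier.
my $pair = pack "L<2", 1, 2;   # two 32-bit little-endian unsigned ints

printf "%d-byte header: %s\n", length $hdr, unpack("H*", $hdr);
```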

Directive table

Every pack directive in one place. Width is the number of bytes produced per value consumed: a C consumes one value and produces one byte, and an a3 consumes one value and produces three bytes. For string, bit and hex directives, the width is derived from the count. In the Endian column, native means the CPU's byte order; big and little are fixed regardless of host.

| Directive | Consumes | Width (bytes) | Signed | Endian | Modifiers | Notes |
|---|---|---|---|---|---|---|
| a | 1 string | count | | | | NUL-padded to width; truncated if too long |
| A | 1 string | count | | | | Space-padded to width |
| Z | 1 string | count | | | | NUL-terminated; Z* always appends a trailing NUL |
| b | 1 string | count/8, rounded up | | | | Bit string, LSB first within each byte |
| B | 1 string | count/8, rounded up | | | | Bit string, MSB first within each byte |
| h | 1 string | count/2, rounded up | | | | Hex string, low nybble first |
| H | 1 string | count/2, rounded up | | | | Hex string, high nybble first |
| c | 1 integer | 1 | yes | | | Signed char |
| C | 1 integer | 1 | no | | | Unsigned char (octet) |
| W | 1 integer | 1 | no | | | Unsigned char value; may exceed 255, producing one wide character |
| s | 1 integer | 2 | yes | native | ! < > | s! = native C short |
| S | 1 integer | 2 | no | native | ! < > | S! = native C unsigned short |
| l | 1 integer | 4 | yes | native | ! < > | l! = native C long |
| L | 1 integer | 4 | no | native | ! < > | L! = native C unsigned long |
| i | 1 integer | sizeof(int) | yes | native | ! < > | C int; at least 32 bits |
| I | 1 integer | sizeof(unsigned int) | no | native | ! < > | C unsigned int; at least 32 bits |
| q | 1 integer | 8 | yes | native | < > | Requires a Perl built with 64-bit integers |
| Q | 1 integer | 8 | no | native | < > | Requires a Perl built with 64-bit integers |
| n | 1 integer | 2 | no | big | ! | Portable network order; n! = signed |
| N | 1 integer | 4 | no | big | ! | Portable network order; N! = signed |
| v | 1 integer | 2 | no | little | ! | “VAX” order; v! = signed |
| V | 1 integer | 4 | no | little | ! | “VAX” order; V! = signed |
| j | 1 integer | IV size | yes | native | < > | Perl-internal signed integer (IV) |
| J | 1 integer | UV size | no | native | < > | Perl-internal unsigned integer (UV) |
| f | 1 number | 4 | | native | < > | Single-precision float in native format (IEEE 754 on common platforms) |
| d | 1 number | 8 | | native | < > | Double-precision float in native format (IEEE 754 on common platforms) |
| F | 1 number | NV size | | native | < > | Perl-internal float (NV) |
| D | 1 number | varies | | native | < > | Long double; format varies by platform; raises an exception if unsupported |
| p | 1 string / undef | pointer size | | native | < > | Pointer to a NUL-terminated string; undef packs a null pointer |
| P | 1 string / undef | pointer size | | native | < > | Pointer to a fixed-length buffer; count = buffer length |
| u | 1 string | varies | | | | Uuencoded; count = maximum input bytes encoded per output line (default 45) |
| U | 1 codepoint | varies | | | | Unicode character number; encodes to UTF-8 in U0 mode |
| w | 1 integer ≥ 0 | varies | no | | | BER-compressed integer: base 128, most significant digit first |
| x | nothing | 1 | | | ! | Insert one NUL; x!N aligns to a multiple of N |
| X | nothing | −1 | | | ! | Back up one byte; X!N aligns backward |
| @ | nothing | absolute | | | ! | Zero-fill or truncate to position N, counted from the start of the innermost group |
| . | 1 integer | absolute | | | ! | Zero-fill or truncate to the position given by the value |
| ( ) | group contents | | | | < > | Group: a repeat count or endianness modifier applies to the whole group |
| / | | | | | | Not a standalone directive: joins length-item/sequence-item (see Length-prefixed payloads) |
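Widths in the table can be checked on the running Perl by measuring `length pack(...)`. This sketch compares native-width directives against the core `Config` module's `intsize`, `longsize`, `ivsize` and `nvsize` entries. It leaves out q/Q because they raise an exception on Perls without 64-bit integers.

```perl
use strict;
use warnings;
use Config;

# Fixed-width directives are the same size on every platform.
my %fixed = (s => 2, l => 4, n => 2, N => 4, v => 2, V => 4);

# Native-width directives follow the C compiler and the Perl build.
my %native = (
    'i'  => $Config{intsize},
    'l!' => $Config{longsize},
    'j'  => $Config{ivsize},
    'F'  => $Config{nvsize},
);

for my $d ((sort keys %fixed), (sort keys %native)) {
    printf "%-3s packs to %d bytes\n", $d, length pack($d, 0);
}
```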

The modifiers:

| Modifier | Effect |
|---|---|
| ! | On s / S / l / L / i / I: use the native C sizeof instead of the fixed width. On x / X: turn the directive into an alignment command. On n / N / v / V: interpret the value as signed. On @ / .: count bytes of the internal representation, not characters. |
| > | Force big-endian byte order (“the big end touches the construct”). |
| < | Force little-endian byte order. |

Applied to a group, < / > cascade into every byte-ordered directive inside the group and are silently ignored by directives that do not accept them.
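Each modifier is shown on a small, arbitrary value in the sketch below. For n!, packing produces the same bytes either way, so the effect is shown by reading those bytes back with unpack.

```perl
use strict;
use warnings;

# < and > pin the byte order of a fixed-width directive.
my $le = pack "s<", 0x0102;   # "\x02\x01"
my $be = pack "s>", 0x0102;   # "\x01\x02"

# A modifier on a group cascades into every byte-ordered directive inside it.
my $grouped  = pack "(s l)>", 1, 2;
my $explicit = pack "s> l>",  1, 2;

# ! on n: the bytes are identical, but unpack reads them back as signed.
my $raw = "\xff\xfe";
my ($u) = unpack "n",  $raw;   # 65534
my ($s) = unpack "n!", $raw;   # -2

# ! on x: pad with NULs up to the next multiple of 4.
my $aligned = pack "C x!4 N", 7, 9;   # 1 + 3 padding + 4 = 8 bytes

printf "group == explicit: %s; n=%d n!=%d\n",
    ($grouped eq $explicit ? "yes" : "no"), $u, $s;
```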

Repeat counts

A directive letter may be followed by:

  • a number N — apply the directive that many times, consuming N values from LIST

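  • * — use all remaining values; for string, bit and hex directives, use the whole of the single value; for x / X / @ it is equivalent to 0; for . it makes the offset relative to the start of the string; for u it selects the default of 45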

  • [N] — equivalent to a bare N

  • [template] — the repeat count is the packed byte length of template. x[L] skips as many bytes as a packed long; x![d] aligns to a double boundary

String and bit/nybble directives treat the count as width of a single value, not count-of-values: pack "A4", "abcdef" produces "abcd", not four copies of "abcdef".
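Each repeat-count form in the list above is shown below on arbitrary inputs.

```perl
use strict;
use warnings;

my $three = pack "C3", 1, 2, 3;       # count N: three values, three bytes
my $all   = pack "C*", 1 .. 5;        # *: every remaining value
my $cut   = pack "A4", "abcdef";      # on a string the count is a width: "abcd"
my $whole = pack "A*", "abcdef";      # * on a string: the whole value
my $skip  = pack "x[N]";              # [template]: as many NULs as a packed N (4)
my $same  = pack "C[3]", 1, 2, 3;     # [N] is the same as a bare N

printf "%d %d %s %s %d\n", length $three, length $all, $cut, $whole, length $skip;
```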

Grouping

Parentheses group directives. A group may take a repeat count or an endianness modifier:

pack "(sl)<", -42, 4711     # same as "s<l<", -42, 4711
pack "(CCS)*", @triplets    # repeat group for every triplet

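Inside a group, @ offsets are measured from the start of the innermost enclosing group (for a repeated group, from the start of the current repetition), not from the start of the string. For example, pack '@1A((@2A)@3A)', qw(X Y Z) produces "\0X\0\0YZ": @2 pads the inner group to two bytes past its start, and @3 is already satisfied when the outer group reaches it.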
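The sketch below uses a hypothetical (id, flags, port) list. It swaps the synopsis's native S for n so the output bytes are the same on every host. It also uses @ inside a repeated group to pad each record to a fixed width.

```perl
use strict;
use warnings;

# Hypothetical (id, flags, port) triplets, flattened into one list.
my @triplets = (1, 0, 80, 2, 1, 443);
my $recs = pack "(C C n)*", @triplets;       # 4 bytes per triplet

# @ counts from the start of each repetition, so every record is padded to 4 bytes.
my $fixed = pack "(A2 @4)2", "ab", "cd";     # "ab\0\0cd\0\0"

# The nested example from the text.
my $nested = pack '@1A((@2A)@3A)', qw(X Y Z);   # "\0X\0\0YZ"

print unpack("H*", $recs), "\n";
```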

Length-prefixed payloads (/)

Write length-item/sequence-item. The length is computed from the sequence value and packed according to length-item; then the sequence itself is packed according to sequence-item:

my $msg = pack "n/a*", "hello, world";
# "\x00\x0chello, world" — 16-bit big-endian length, then the bytes

The length directive may be any integer directive (n, N, w, C, …) or even a string directive (A4, Z*) when you want the length written as ASCII.
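The sketch below uses three different length-items: a 16-bit length, a BER-compressed length, and a length written as one ASCII character. In the last case the sequence item carries its own count, and that count (capped by the number of values available) is what gets written.

```perl
use strict;
use warnings;

my $msg = pack "n/a*", "hello, world";          # 16-bit length 12, then 12 bytes

my $two = pack "n/a* w/a", "hello,", "world";   # n length for the first, w for the second

my $abc = pack "a/W2", ord("a") .. ord("z");    # count 2 written as "2", then two chars

printf "%s | %s | %s\n", unpack("H*", $msg), unpack("H*", $two), $abc;
```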

Examples

Build a fixed-width record with two NUL-terminated strings and a 32-bit native timestamp:

my $rec = pack "Z8 Z8 L", "alice", "srv01", $now;   # each Z8 field holds at most 7 characters
# 8 + 8 + 4 = 20 bytes
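The sketch below unpacks that record with the same template and shows what happens when a name does not fit. The timestamp value is hypothetical.

```perl
use strict;
use warnings;

my $now = 1_700_000_000;                            # hypothetical timestamp
my $rec = pack "Z8 Z8 L", "alice", "srv01", $now;

my ($user, $host, $ts) = unpack "Z8 Z8 L", $rec;    # Z stops at the first NUL

# An 8-character name does not fit: Z8 keeps 7 bytes plus the terminator.
my $long = pack "Z8", "server01";                   # "server0\0"

printf "%d bytes: %s %s %d\n", length $rec, $user, $host, $ts;
```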

Portable byte order: prefer n / N / v / V, or explicit > / < modifiers, over native-order directives when the bytes leave the machine:

my $be = pack "n N", 42, 4711;      # big-endian 16 + 32 bit
my $le = pack "v V", 42, 4711;      # little-endian
my $sx = pack "s>l>", -42, 4711;    # signed big-endian via modifier
my $gx = pack "(sl)<", -42, 4711;   # signed little-endian via group
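The exact bytes each of those four lines produces are printed below as hex. 42 is 0x2a, 4711 is 0x1267, and -42 as a 16-bit two's-complement value is 0xffd6.

```perl
use strict;
use warnings;

my %out = (
    be => pack("n N",    42, 4711),
    le => pack("v V",    42, 4711),
    sx => pack("s>l>",  -42, 4711),
    gx => pack("(sl)<", -42, 4711),
);
printf "%s %s\n", $_, unpack("H*", $out{$_}) for sort keys %out;
```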

Align a field inside a C struct. x![d] inserts just enough NUL bytes to reach the next multiple of a double’s width:

# struct { char c; double d; char cc[2]; }
my $s = pack "c x![d] d c2", $c, $d, $c1, $c2;
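The padding can be checked by measuring the result. The sketch assumes an ABI where a double is aligned to its own width, which is common but not universal; some 32-bit ABIs align doubles to 4 bytes inside structs. A trailing x![d] adds the tail padding a C compiler would normally include in sizeof. The field values are arbitrary.

```perl
use strict;
use warnings;

my ($c, $d, $c1, $c2) = (1, 1.5, 2, 3);   # arbitrary field values

# With an 8-byte double: 1 + 7 padding + 8 + 2 = 18 bytes.
my $s = pack "c x![d] d c2", $c, $d, $c1, $c2;

# Trailing x![d] pads the total to a multiple of the double width (24 bytes here).
my $padded = pack "c x![d] d c2 x![d]", $c, $d, $c1, $c2;

printf "%d bytes, %d with tail padding\n", length $s, length $padded;
```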

Round-trip through a hex string:

my $raw = pack "H*", "deadbeef";    # "\xde\xad\xbe\xef"
my $hex = unpack "H*", $raw;        # "deadbeef"
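The lower-case h and b directives reverse the order within each byte, and an odd hex count is rounded up to a whole byte. The inputs below are arbitrary.

```perl
use strict;
use warnings;

my $raw  = pack "H*", "deadbeef";   # high nybble first: "\xde\xad\xbe\xef"
my $swap = pack "h*", "deadbeef";   # low nybble first:  "\xed\xda\xeb\xfe"

my $msb = pack "B8", "00000001";    # MSB first: "\x01"
my $lsb = pack "b8", "00000001";    # LSB first: "\x80"

my $odd = pack "H3", "abc";         # three nybbles round up to two bytes: "\xab\xc0"

print join(" ", map { unpack "H*", $_ } $raw, $swap, $msb, $lsb, $odd), "\n";
```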

Edge cases

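  • Too few values: missing values are treated as "", so numeric directives pack 0 and string directives pack padding only. pack "A4 A4", "hi" yields "hi      " (eight bytes, the last six of them spaces) rather than raising an error.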

  • Too many values: extras are silently ignored.

  • a vs A vs Z: all three pad to an exact width. a pads with NUL, A pads with space, Z pads with NUL and guarantees a trailing NUL — so Z8 encodes at most 7 data bytes plus the terminator.

  • Character width vs byte width: the single most common trap. In the default C0 (character) mode, counts for a / A / Z and offsets for @ / . are measured in characters of the packed string, not bytes, so once the string holds a character above 255 the two no longer match. In U0 mode the string is processed byte by byte in its UTF-8 encoded form. Use the ! modifier on @ / . when you need byte offsets regardless of mode.

  • Endianness on s / S / l / L / i / I / q / Q: native only, not portable. Use n / N / v / V or apply > / < explicitly. f / d are likewise native; IEEE 754 alone does not pin down endianness.

  • Inf / NaN packed as integers: fatal error. No sensible mapping exists.

  • q / Q on non-64-bit Perl: raises an exception.

  • p and P capture pointers into the caller’s memory. The referent must remain live until the packed string is consumed — a temporary string passed to p may be freed before you read it back. Avoid outside XS code.

  • Grouping and @: @ offsets are relative to the start of the innermost enclosing group (and of each repetition of it), not the start of the string. pack '@1A((@2A)@3A)', qw(X Y Z) therefore produces "\0X\0\0YZ", not the "\0XYZ" a naive reading suggests.

  • Floating-point round-trip loss: Perl numbers (NVs) are usually doubles, so unpack("f", pack("f", $x)) generally does not equal $x, because packing through single precision rounds away the extra bits. Use d, or F for the full NV, when the value must survive exactly.

  • Writing packed bytes to a text handle: a handle with :utf8 or :encoding(…) will re-encode the bytes. Use binmode $fh or open with ">:raw".
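Several of the edge cases above are reproduced below on small, arbitrary inputs: missing and extra values, the three string padding styles, Z truncation, and single-precision rounding.

```perl
use strict;
use warnings;

# Too few values: the missing one is taken as "", which A4 pads with spaces.
my $short = pack "A4 A4", "hi";          # "hi" followed by six spaces

# Too many values: the extra one is ignored.
my $extra = pack "C2", 1, 2, 3;          # "\x01\x02"

# Same input, three padding styles.
my ($pa, $pA, $pZ) = map { pack($_, "ab") } qw(a4 A4 Z4);   # "ab\0\0", "ab  ", "ab\0\0"
my $full = pack "Z4", "abcd";            # "abc\0": Z keeps room for the NUL

# Single precision rounds: 0.5 survives exactly, 0.1 does not.
my ($half)  = unpack "f", pack("f", 0.5);
my ($tenth) = unpack "f", pack("f", 0.1);

printf "[%s] %.10f\n", $short, $tenth;
```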

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • unpack — the inverse operation; same template language

  • sprintf — formatted text output rather than binary bytes

  • vec — bit-level access to a string without a template

  • chr / ord — single-character conversions

  • syswrite — write a packed byte string without PerlIO surprises

  • pack/unpack tutorial — task-oriented walkthrough with worked protocol and file-format examples