Fixed-length data

vec#

Read or write a fixed-width slot inside a string treated as a packed bit vector.

vec views EXPR as an array of unsigned integers, each BITS wide, packed back-to-back from the start of the string. OFFSET indexes into that array: not bytes, not bits, but BITS-sized elements. The rvalue form returns the integer in that slot; the lvalue form writes one. Supported widths are 1, 2, 4, 8, 16, 32, and, on 64-bit builds, 64.
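As a minimal sketch of how OFFSET counts elements rather than bytes, the snippet below reads one four-byte string at three widths. The buffer contents and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Four bytes: 0x01 0x02 0x03 0x04 (illustrative data)
my $buf = "\x01\x02\x03\x04";

# The same string, indexed as elements of three different widths.
my $as8  = vec($buf, 1, 8);    # element 1 of 8-bit slots:  byte 1
my $as16 = vec($buf, 1, 16);   # element 1 of 16-bit slots: bytes 2..3
my $as32 = vec($buf, 0, 32);   # element 0 of 32-bit slots: bytes 0..3

printf "%#x %#x %#x\n", $as8, $as16, $as32;   # 0x2 0x304 0x1020304
```

OFFSET 1 selects byte 1 at width 8 but bytes 2 and 3 at width 16, because the element size changes what "1" means.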

Synopsis#

my $n = vec($buf, $offset, $bits);
vec($buf, $offset, $bits) = $n;

What you get back#

In rvalue context, an unsigned integer — the contents of the selected slot, zero-extended to a Perl number. In lvalue context, an assignable slot; assigning truncates the right-hand value to BITS and writes it into the string, extending the string with zero bytes if OFFSET lies past the current end.

The parentheses around vec(...) in the lvalue form are required — without them, vec $buf, $o, $b = 3 parses the = as part of the argument list.

How the bits are laid out#

The layout depends on BITS, and is chosen so code is portable across big- and little-endian machines:

  • BITS == 8: each slot is one byte of the string. vec($s, $i, 8) is the unsigned value of substr($s, $i, 1).

  • BITS == 16, 32, 64: bytes of the string are grouped into chunks of BITS/8 and interpreted in big-endian order, equivalent to unpack with n, N, or (on 64-bit builds) Q>. vec($s, 0, 32) reads the first four bytes as a big-endian uint32.

  • BITS == 4, 2, 1: the string is broken into bytes, and each byte is split into 8/BITS groups, numbered upward from the least-significant end of the byte. The bit values from low to high are 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, so slot 0 of a byte always covers its lowest bits. So for chr(0x36) (0b00110110):

    • BITS == 4 gives the two nibbles (0x6, 0x3).

    • BITS == 2 gives the four 2-bit groups (0x2, 0x1, 0x3, 0x0).

    • BITS == 1 gives the eight bits (0, 1, 1, 0, 1, 1, 0, 0).
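The layout rules above can be checked with a short script. This is only a sketch: the sub-byte part reuses the chr(0x36) example, while the four-byte $word used for the big-endian comparison is an illustrative value, not something from this page.

```perl
use strict;
use warnings;

my $byte = chr(0x36);                             # 0b00110110

# Sub-byte widths: slot 0 is the low end of the byte.
my @nibbles = map { vec($byte, $_, 4) } 0 .. 1;   # (6, 3)
my @pairs   = map { vec($byte, $_, 2) } 0 .. 3;   # (2, 1, 3, 0)
my @bits    = map { vec($byte, $_, 1) } 0 .. 7;   # (0, 1, 1, 0, 1, 1, 0, 0)

# Multi-byte widths: big-endian, the same reading unpack gives for n and N.
my $word  = "\x12\x34\x56\x78";
my $hi16  = vec($word, 0, 16);                    # 0x1234
my $all32 = vec($word, 0, 32);                    # 0x12345678
my $same  = $hi16  == unpack("n", $word)
         && $all32 == unpack("N", $word);

print "@nibbles | @pairs | @bits | ",
      ($same ? "matches unpack" : "differs from unpack"), "\n";
```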

A slot entirely off the end of the string reads as 0. Writing past the end grows the string with zero bytes to reach the slot. A negative OFFSET is a fatal error when writing; when reading, it yields 0.
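A small sketch of the off-the-end behaviour, using a made-up two-byte string: the read leaves the string alone, while the write pads it with zero bytes up to the target slot.

```perl
use strict;
use warnings;

my $s = "ab";                          # two bytes

my $missing        = vec($s, 100, 8);  # slot is past the end: reads as 0
my $len_after_read = length $s;        # still 2; reading never grows the string

vec($s, 5, 8) = 0x21;                  # write byte slot 5 ("!")
my $len_after_write = length $s;       # 6: slots 2..4 were filled with "\0"

print "$missing $len_after_read $len_after_write ",
      ($s eq "ab\0\0\0!" ? "zero-padded" : "unexpected"), "\n";
```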

Global state it touches#

None. vec operates purely on its arguments.

Examples#

Read a byte at a given index:

my $s = "Perl";
print vec($s, 0, 8);            # 80   (== ord 'P')
print vec($s, 3, 8);            # 108  (== ord 'l')

Build a string by writing 32-bit big-endian words:

my $buf = '';
vec($buf, 0, 32) = 0x5065726C;  # "Perl"
vec($buf, 1, 32) = 0x50657270;  # "PerlPerp"
print $buf;                     # PerlPerp

Use vec as a compact boolean array — one bit per flag:

my $flags = '';
vec($flags, 17, 1) = 1;
vec($flags, 42, 1) = 1;
print vec($flags, 17, 1);       # 1
print vec($flags, 18, 1);       # 0  (slot never set, still zero)
print length $flags;            # 6  (string auto-extended to fit bit 42)
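Building on the flag example, one way to enumerate which flags are set is to probe every bit slot the string currently holds. The helper name set_bits is hypothetical and introduced only for this sketch; for large vectors a loop like this is slow compared with the unpack-based techniques.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indexes of all set bits in a bit vector.
sub set_bits {
    my ($vector) = @_;
    return grep { vec($vector, $_, 1) } 0 .. 8 * length($vector) - 1;
}

my $flags = '';
vec($flags, $_, 1) = 1 for 17, 42;     # set two flags

my @on_before = set_bits($flags);
print "@on_before\n";                  # 17 42

vec($flags, 17, 1) = 0;                # clear a flag by assigning 0

my @on_after = set_bits($flags);
print "@on_after\n";                   # 42
```

Clearing a flag does not shrink the string; only the bit value changes.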

Count the set bits in a bit vector without looping bit by bit — the idiomatic pattern uses unpack:

my $ones = unpack("%32b*", $flags);   # population count

Convert a bit vector into a string of 0s and 1s for display:

my $bits = unpack("b*", $flags);      # "00...010...010..."
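To see that the two unpack idioms agree with vec's own numbering, the following sketch sets three arbitrary flags (3, 17, and 42, chosen only for illustration) and cross-checks the population count against an explicit loop.

```perl
use strict;
use warnings;

my $flags = '';
vec($flags, $_, 1) = 1 for 3, 17, 42;

# "%32b*" sums the individual bits as a 32-bit checksum: a population count.
my $ones = unpack("%32b*", $flags);

# Cross-check by probing every bit slot; grep in scalar context counts matches.
my $looped = grep { vec($flags, $_, 1) } 0 .. 8 * length($flags) - 1;

# "b*" emits bits in ascending order within each byte, so character
# position N of the result corresponds to vec($flags, N, 1).
my $bits = unpack("b*", $flags);

print "$ones $looped\n$bits\n";
```

Note that the B template (descending bit order within each byte) would not line up with vec's slot numbers; b is the one that does.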

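Combine two bit vectors with the bitwise string operators, which treat string operands as bit vectors of the same shape vec reads and writes. Under the bitwise feature (enabled by use v5.28 or later), | & ^ always act numerically, so use the string forms |. &. ^. there: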

my $union        = $flags_a | $flags_b;
my $intersection = $flags_a & $flags_b;
my $sym_diff     = $flags_a ^ $flags_b;   # symmetric difference: set in exactly one
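A worked instance of these set operations, as a sketch. It assumes the bitwise feature is not enabled (so | & ^ ~ operate on strings), and the flag indexes are arbitrary. The $only_a line shows one way to get a true set difference, which ^ alone does not give.

```perl
use strict;
use warnings;
# Assumption: no "use v5.28" or "use feature 'bitwise'" is in effect,
# so | & ^ ~ act as string operators on these operands.

my ($flags_a, $flags_b) = ('', '');
vec($flags_a, $_, 1) = 1 for 1, 2, 3;
vec($flags_b, $_, 1) = 1 for 3, 4;

my $union        = $flags_a | $flags_b;     # bits 1 2 3 4
my $intersection = $flags_a & $flags_b;     # bit  3
my $sym_diff     = $flags_a ^ $flags_b;     # bits 1 2 4
my $only_a       = $flags_a & ~$flags_b;    # bits 1 2 (in a but not in b)

print join(" ", map { unpack("b8", $_) }
           $union, $intersection, $sym_diff, $only_a), "\n";
# 01111000 00010000 01101000 01100000
```

When the operands differ in length, & truncates to the shorter string while | and ^ treat the shorter one as zero-extended, so results can differ in length from the inputs.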

Edge cases#

  • Lvalue precedence: vec EXPR, O, B = N parses as vec(EXPR, O, (B = N)), so the assignment targets B rather than the slot. Always write vec(EXPR, O, B) = N.

  • Off-the-end read: vec($short, 1_000_000, 8) returns 0, never dies, never warns.

  • Off-the-end write: the string is zero-padded up to the slot. For BITS == 1, writing bit 94 grows the string to 12 bytes.

  • Negative OFFSET: fatal in lvalue context with "Negative offset to vec in lvalue context". In rvalue context it is not an error; the read returns 0.

  • BITS not a supported power of two: fatal with "Illegal number of bits in vec". Valid widths are 1, 2, 4, 8, 16, 32, and 64 on 64-bit builds.

  • UTF-8 encoded strings: vec wants a byte string. If the scalar is flagged UTF-8, Perl first tries to downgrade it to a one-byte-per-character representation. If any character has a code point above 0xFF, the downgrade fails and vec dies with "Use of strings with code points over 0xFF as arguments to vec is forbidden". Before reaching for vec, call utf8::downgrade deliberately, or build the buffer as bytes in the first place, for example with pack "C*".

  • Read on undef: under use warnings, triggers an uninitialized warning on the string argument; returns 0.

  • Assignment value wider than BITS: the value is masked to the low BITS bits. vec($s, 0, 4) = 0x1F stores 0xF.
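Several of these edge cases can be observed directly. The sketch below traps the fatal errors with eval so the script keeps running; the variable names are illustrative, and the exact wording after the quoted message (file and line) depends on where the code lives.

```perl
use strict;
use warnings;

my $s = '';

# Values wider than BITS are masked to the low BITS bits.
vec($s, 0, 4) = 0x1F;
my $stored = vec($s, 0, 4);                          # 0xF

# An unsupported width is fatal; eval traps the error for inspection.
my $ok_width  = eval { my $n = vec($s, 0, 3); 1 };
my $width_err = $@;

# A negative OFFSET is fatal when assigning.
my $ok_neg  = eval { vec($s, -1, 8) = 1; 1 };
my $neg_err = $@;

printf "stored=%#x\nwidth: %sneg:   %s", $stored, $width_err, $neg_err;
```

Both eval blocks return undef, and $@ carries the diagnostic text quoted in the list above.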

Differences from upstream#

Fully compatible with upstream Perl 5.42.

See also#

  • pack — build multi-field binary structures; vec is the random-access counterpart when every field has the same width

  • unpack — pull fields out of a binary string; use unpack("b*", $v) or unpack("%32b*", $v) to render or popcount a vec bit vector

  • substr — byte-level random access when BITS would be 8 and you also want the lvalue to grow or shrink the string

  • sprintf — format the integer a vec read returns, e.g. sprintf "%08b", vec($s, $i, 8)

  • ord — one-shot equivalent of vec($s, $i, 8) when you only need the byte value and never assign back