---
name: Bytes and widths
---

# Bytes and widths

**By the end of this chapter you will be able to** pick the right integer directive for a given field, count the bytes a template produces, and tell when you need a fixed width versus a native width.

Binary formats are specified by *byte counts*. "A 16-bit unsigned length field followed by 4-byte packets" is an instruction about bytes, not about numbers. The first job of a [`pack`](../../p5/core/perlfunc/pack) template is to match those counts exactly.

## The fixed-width integer directives

The spine of every template:

| Bytes | Unsigned | Signed |
|-------|----------|--------|
| 1     | `C`      | `c`    |
| 2     | `S`      | `s`    |
| 4     | `L`      | `l`    |
| 8     | `Q`      | `q`    |

Each directive consumes one value from the list and produces exactly that many bytes:

```perl
length pack "C", 65                  # 1
length pack "S", 12345               # 2
length pack "L", 1_000_000_000       # 4
length pack "Q", 1_000_000_000_000   # 8 (if 64-bit Perl)
```

A repeat count packs that many of the same directive in a row:

```perl
length pack "C4", 1, 2, 3, 4     # 4
length pack "L3", 10, 20, 30     # 12
```

Compose them freely; the total length is the sum:

```perl
length pack "C S L", 65, 1000, 2_000_000   # 1 + 2 + 4 = 7
```

## `C` is your default one-byte directive

For protocol bytes (flag fields, version numbers, opcodes, type tags), `C` (unsigned 8-bit, 0 to 255) is almost always the right choice. `c` (signed, −128 to 127) turns up only when the spec says "signed". `W` is a variant of `C` that accepts values above 255, producing a wide character instead of a single byte; ignore it until you hit that case.

IP addresses are the textbook example:

```perl
my $raw = pack "C4", 192, 168, 1, 42;
#               ^-- four unsigned bytes, in the order the list gives them
```

## Native vs. fixed width

The directives `s` / `S` / `l` / `L` are specified to be exactly 2 and 4 bytes regardless of the C compiler.
Their native-width counterparts add `!`:

```perl
length pack "l",  0   # 4 (always)
length pack "l!", 0   # sizeof(long): 4 on 32-bit systems and 64-bit Windows (LLP64),
                      # 8 on most 64-bit Unix-like systems (LP64)
```

`i` and `I` are *always* native (whatever `sizeof(int)` is on this machine, with a 32-bit minimum). `i!` is an alias for `i`.

**Rule of thumb:**

- For interoperable binary (wire formats, on-disk files, data crossing machines) use **fixed width** (`l`, `L`, …) plus an explicit endianness, or use the [portable shortcuts](endianness).
- For in-process work that matches some local C struct you are about to hand to `ioctl` or `syscall`, use **native width** (`l!`, `L!`, `i`, `I`).

Mixing the two is fine, but stop and ask yourself why each time.

## Counting the length of a template

Before you send a record, check that the byte count is what the spec says. `length pack(TEMPLATE, ...)` with placeholder values is the simplest way:

```perl
my $sz = length pack "n N L C", 0, 0, 0, 0;   # 2 + 4 + 4 + 1 = 11
```

For templates whose length depends on native sizes, the same check (`length pack TEMPLATE, 0, 0, ...`) gives you the answer for the current machine only.

## Worked example: a minimal packet header

A hypothetical protocol specifies:

- 8-bit version
- 8-bit flags
- 16-bit big-endian length (payload follows)

Three fields, four bytes total. The template writes itself:

```perl
my $hdr = pack "C C n", $version, $flags, length $payload;
my $pkt = $hdr . $payload;

length $hdr   # 4
```

Two things to note:

1. We did not spell out `C1 C1 n1`: a bare directive letter has an implicit repeat count of 1.
2. The length in the third field is computed from `$payload` in the same expression. A later chapter shows the [`/` form](grouping-and-counts) that ties the two together so you cannot forget.

## Edge cases worth remembering

- **Values out of range**: `pack "C", 256` wraps to the low 8 bits (you get `\x00`). The only diagnostic is the warning `Character in 'C' format wrapped in pack`, and only when warnings are enabled; the pack still succeeds. Check ranges yourself if the data is untrusted.
- **Signed vs.
  unsigned round-trip**: a value packed as `C` and unpacked as `c` (or vice versa) is the same byte reinterpreted, so values ≥ 128 come back negative (value − 256):

  ```perl
  unpack "c", pack "C", 200   # -56
  ```

- **Too few values**: missing values pack as `0` for numeric directives, `""` for strings. No warning.

The next chapter handles the question every multi-byte integer raises: which end of the number goes first?
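
As a recap of this chapter's worked example and its out-of-range edge case, here is a minimal sketch that builds the 4-byte header with explicit range checks and then unpacks it again. The helpers `check_u8` and `make_header` are hypothetical names invented for this illustration, not part of Perl or of any module; the field layout is the hypothetical protocol from the worked example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): die if a value does not fit in
# an unsigned 8-bit field, instead of letting pack "C" wrap it.
sub check_u8 {
    my ($name, $value) = @_;
    die "$name out of range for C: $value\n"
        unless $value =~ /\A[0-9]+\z/ && $value <= 255;
    return $value;
}

# Hypothetical helper: build the version/flags/length header (C C n).
sub make_header {
    my ($version, $flags, $payload) = @_;
    die "payload too long for a 16-bit length\n"
        if length($payload) > 0xFFFF;
    return pack "C C n",
        check_u8(version => $version),
        check_u8(flags   => $flags),
        length $payload;
}

my $payload = "hello";
my $hdr     = make_header(1, 0x80, $payload);

# Round-trip: unpack with the same template that packed the header.
my ($version, $flags, $len) = unpack "C C n", $hdr;

printf "header bytes: %d\n", length $hdr;                        # 4
printf "version=%d flags=%d len=%d\n", $version, $flags, $len;   # 1, 128, 5

# An out-of-range value is rejected instead of wrapping to 0.
my $ok = eval { make_header(256, 0, $payload); 1 };
print "rejected: $@" unless $ok;
```

Dying on bad input is only one possible policy; clamping or returning an error value would work equally well. The point is that the range check happens before `pack` sees the value.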