---
name: read
signature: 'read FILEHANDLE,SCALAR,LENGTH,OFFSET'
signature_alt: 'read FILEHANDLE,SCALAR,LENGTH'
since: 5.0
status: documented
categories: ["I/O", "Fixed-length data"]
---
```{index}
single: read; Perl built-in
```

*[I/O](../perlfunc-by-category) · [Fixed-length data](../perlfunc-by-category)*

# read

Read a fixed amount of buffered input from a filehandle into a scalar.

`read` pulls up to `LENGTH` characters from `FILEHANDLE` and stores them in `SCALAR`, returning how many were actually read. It goes through the handle's PerlIO stack and is therefore buffered on top of the underlying OS read — contrast with [`sysread`](sysread), which bypasses the buffer and calls `read(2)` directly. The optional `OFFSET` argument lets you place the incoming data at a chosen position in `SCALAR` instead of at the beginning.

## Synopsis

```perl
read FILEHANDLE, SCALAR, LENGTH
read FILEHANDLE, SCALAR, LENGTH, OFFSET
```

## What you get back

- The **number of characters read**, which may be less than `LENGTH`.
- `0` at end of file.
- [`undef`](undef) on error, with [`$!`](../perlvar) set.

`SCALAR` is grown or shrunk so that the last character actually read becomes the last character of the scalar. This also holds when `OFFSET` is given: characters before `OFFSET` are preserved, but anything that previously followed the newly read data is discarded (see *The OFFSET argument* below).

A short read is **not** an error. On a regular file it normally means you reached end of file. On a pipe, socket, or terminal it can also mean that the other end closed the stream or, on a non-blocking handle, that no more data is available right now. Loop until you either have the bytes you need or `read` returns `0` / [`undef`](undef):

```perl
my $buf  = "";
my $want = 4096;
while ($want > 0) {
    my $got = read($fh, $buf, $want, length $buf);
    die "read error: $!" unless defined $got;
    last if $got == 0;    # EOF
    $want -= $got;
}
```

## Global state it touches

- [`$!`](../perlvar) — set when `read` returns [`undef`](undef).
- The handle's PerlIO layers — determine whether `LENGTH` is counted in bytes or in characters (see *Character vs byte semantics* below).

`read` does not interact with [`$_`](../perlvar), [`$/`](../perlvar), [`$\`](../perlvar), or [`$,`](../perlvar). Unlike [`readline`](readline), it does not care about the input record separator.

## The OFFSET argument

`OFFSET` controls **where in `SCALAR`** the incoming data lands. It does not seek the filehandle.

- **Omitted** — data replaces the entire contents of `SCALAR`.
- **Positive, within length** — data is written starting at position `OFFSET`. Characters before `OFFSET` are preserved; everything from `OFFSET` onward is replaced, and `SCALAR` ends where the newly read data ends.
- **Positive, beyond length** — `SCALAR` is first padded with `"\0"` bytes out to `OFFSET`, then the read is appended. Useful for reading into a fixed slot inside a larger buffer you are assembling.
- **Negative** — counts backwards from the end of `SCALAR`. `-1` means "overwrite the last character and append from there."

```perl
my $buf = "HEADER";
read($fh, $buf, 16, length $buf);   # append up to 16 chars after "HEADER"

my $slab = "";
read($fh, $slab, 512, 1024);        # pad with "\0" out to position 1024,
                                    # then read 512 chars; after a full
                                    # read $slab is 1536 chars long
```

## Character vs byte semantics

`LENGTH` is measured in whatever unit the handle deals in:

- **Byte-mode handle** (the default, and every handle opened without an encoding layer): `LENGTH` is a byte count. `read($fh, $buf, 10)` pulls 10 bytes and `length $buf` is 10.
- **`:utf8` layer**: `LENGTH` is a **character** count. Perl decodes UTF-8 on the way in, and `$buf` holds decoded codepoints. The number of bytes consumed from the file can be anywhere from `LENGTH` to `4 * LENGTH`, depending on the text.
- **`:encoding(...)` layer**: same rule as `:utf8`, for any encoding the layer knows.
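
The difference between the two counting modes can be checked without touching the filesystem. The following is a minimal sketch, assuming in-memory filehandles and a hypothetical five-letter Greek sample string; the variable names are illustrative only and do not come from the Perl documentation.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: five Greek letters, each two bytes long in UTF-8.
my $bytes = encode("UTF-8", "\x{3b1}\x{3b2}\x{3b3}\x{3b4}\x{3b5}");

# Byte-mode handle: LENGTH counts bytes, so 4 bytes is two letters.
open my $raw, "<:raw", \$bytes or die "open: $!";
my $n_raw = read($raw, my $raw_buf, 4);
close $raw;

# Decoding handle: LENGTH counts characters, so 4 letters use 8 bytes.
open my $txt, "<:encoding(UTF-8)", \$bytes or die "open: $!";
my $n_txt = read($txt, my $txt_buf, 4);
close $txt;

printf "raw:  read=%d length=%d\n", $n_raw, length($raw_buf);
# raw:  read=4 length=4
printf "text: read=%d length=%d utf8_bytes=%d\n",
    $n_txt, length($txt_buf), length(encode("UTF-8", $txt_buf));
# text: read=4 length=4 utf8_bytes=8
```

Both calls ask for `4`, yet the decoding handle consumes twice as many bytes from the underlying data, which is why a byte budget and a character budget cannot be used interchangeably.
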

```perl
open my $fh, "<:utf8", "greek.txt" or die $!;
read($fh, my $buf, 5);    # 5 characters, not 5 bytes
```

A byte-mode read of UTF-8 data can stop in the middle of a multi-byte character, which produces mojibake and, under `use warnings`, may raise a `Malformed UTF-8` warning when you later decode the result. Pick the layer at [`open`](open) time and stick with it.

## Buffered vs unbuffered I/O

`read` is **stdio-buffered** through PerlIO — internally it calls `fread(3)` (or PerlIO's replacement) against the handle's buffer. That has two consequences worth remembering:

- You can mix `read`, [`readline`](readline) / `<$fh>`, [`getc`](getc), and [`seek`](seek) freely on the same handle. They all see the same buffer.
- You must **not** mix `read` with [`sysread`](sysread) on the same handle. [`sysread`](sysread) bypasses the buffer and goes straight to `read(2)`; any bytes already pulled into the buffer by a previous `read` become invisible to [`sysread`](sysread), and vice versa. If you need raw syscall semantics, use [`sysread`](sysread) exclusively on that handle.

For byte-accurate, non-buffered input — for example on a non-blocking socket, or when implementing a protocol where a short read is meaningful rather than "try again" — reach for [`sysread`](sysread).

## Examples

Read a fixed-size header from a binary file:

```perl
open my $fh, "<", "packet.bin" or die "open: $!";
binmode $fh;
my $header;
my $n = read($fh, $header, 16);
die "read: $!" unless defined $n;    # undef means an I/O error
die "short header: got $n bytes" unless $n == 16;
```

Append up to 16 bytes to the end of an existing buffer by using `OFFSET` equal to the current length:

```perl
my $buf = "PRELUDE:";
read($fh, $buf, 16, length $buf);
# $buf is now "PRELUDE:" .
#   16 new bytes
```

Read into position 1024 of a scalar, padding the gap with `"\0"`:

```perl
my $slot = "";
read($fh, $slot, 64, 1024);
# after a full 64-byte read: length($slot) == 1088
# substr($slot, 0, 1024) is "\0" x 1024
```

Loop until you have exactly `N` bytes or hit EOF. This is the safe pattern for pipes and sockets, where a single `read` can return fewer bytes than requested:

```perl
sub read_exact {
    my ($fh, $n) = @_;
    my $buf = "";
    while (length($buf) < $n) {
        # Ask only for what is still missing and append it to $buf.
        my $got = read($fh, $buf, $n - length($buf), length $buf);
        return undef unless defined $got;    # I/O error; $! is set
        return $buf if $got == 0;            # EOF; caller inspects length
    }
    return $buf;
}
```

Character-counted read through a UTF-8 layer:

```perl
open my $fh, "<:encoding(UTF-8)", "notes.txt" or die $!;
read($fh, my $chunk, 100);    # up to 100 characters
printf "chars=%d bytes=%d\n",
    length $chunk,
    do { use bytes; length $chunk; };
```

## Edge cases

- **Closed filehandle**: returns [`undef`](undef) and sets [`$!`](../perlvar) to `"Bad file descriptor"`. Under `use warnings` a `read() on closed filehandle` warning is emitted.
- **Unopened filehandle**: same as closed — [`undef`](undef) and [`$!`](../perlvar) set.
- **`LENGTH` of `0`**: `read` returns `0` without consuming any input. `SCALAR` is still truncated at `OFFSET`, which means it becomes the empty string when `OFFSET` is omitted. A zero-length read is **not** a reliable EOF probe; use [`eof`](eof) for that.
- **Negative `LENGTH`**: a fatal runtime error (`Negative length at ...`). Validate `LENGTH` before calling.
- **Negative `OFFSET` whose magnitude exceeds the current length of `SCALAR`**: a fatal runtime error (`Offset outside string`). When the offset is computed, clamp it with `max($offset, -length $buf)`, where `max` comes from `List::Util`.
- **Short read on a pipe or socket**: not an error. `read` can return fewer characters than requested when the stream ends or, on a non-blocking handle, when no more data is available yet. Loop if you need the full count.
- **EOF mid-read**: returns the partial count. The next call returns `0`. After that, `$fh` stays at EOF until you call [`seek`](seek) or `$fh->clearerr` (an `IO::Handle` method).
- **Reading from a tied handle**: `read` dispatches to the tie class's `READ` method, which is responsible for honouring `LENGTH` and `OFFSET`. Misbehaving tie classes can violate the "grow `SCALAR` so the last character read is the last character" contract.
- **Interaction with [`sysread`](sysread)**: do not mix them on one handle. `read` fills the PerlIO buffer in chunks of its own choosing; [`sysread`](sysread) ignores that buffer entirely.
- **Binary data on a text-mode handle**: on Unix-like systems there is no distinct text mode, but an encoding layer still transforms bytes. Call `binmode $fh` (or open the file with `"<:raw"`) before reading binary data.
- **`FILEHANDLE` as an expression**: a bareword, a simple scalar, or any expression that yields a filehandle works. Unlike [`print`](print), `read` needs no block or extra parentheses around a complex handle expression: `read($handles[$i], $buf, $len)`.

## Differences from upstream

Fully compatible with upstream Perl 5.42.

## See also

- [`open`](open) — acquires the filehandle and decides whether subsequent `read`s are byte-counted or character-counted via the I/O layer
- [`sysread`](sysread) — the unbuffered counterpart, a direct `read(2)` system call; use it for non-blocking I/O or when a short read is meaningful
- [`readline`](readline) / `<$fh>` — record-oriented input that respects [`$/`](../perlvar) instead of a byte/character count
- [`getc`](getc) — read a single character; roughly `read($fh, $c, 1)` but with different EOF/undef reporting
- [`binmode`](binmode) — remove or add I/O layers so that `LENGTH` is unambiguously a byte count or a character count
- [`eof`](eof) — the right way to test for end of file, rather than reading a zero-length chunk