read

Read a fixed amount of buffered input from a filehandle into a scalar.

read pulls up to LENGTH characters from FILEHANDLE and stores them in SCALAR, returning how many were actually read. It goes through the handle’s PerlIO stack and is therefore buffered on top of the underlying OS read — contrast with sysread, which bypasses the buffer and calls read(2) directly. The optional OFFSET argument lets you splice the incoming data into the middle of SCALAR rather than overwriting it.

Synopsis

read FILEHANDLE, SCALAR, LENGTH
read FILEHANDLE, SCALAR, LENGTH, OFFSET

What you get back

  • The number of characters read, which may be less than LENGTH.

  • 0 at end of file.

  • undef on error, with $! set.

SCALAR is grown or shrunk so that the last character actually read becomes the last character of the scalar. This holds with OFFSET too: everything before OFFSET is kept, the new data is written starting at OFFSET, and whatever used to lie beyond the end of the new data is discarded (see The OFFSET argument below).
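The truncation rule is easy to check with an in-memory filehandle (open on a scalar reference), which stands in for a real file in this minimal sketch. The literal input string and variable names are illustrative only.

```perl
use strict;
use warnings;

# In-memory handle over a literal string; it stands in for a real file.
open my $fh, '<', \"abcdef" or die "open: $!";

my $buf = "0123456789";
my $got = read($fh, $buf, 2, 4);     # write 2 characters starting at offset 4
defined $got or die "read: $!";

print "$got $buf\n";                 # 2 0123ab  (the old "456789" is gone)
```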

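A short read is not an error. Like fread(3), read keeps refilling the handle's buffer until it has LENGTH characters, so on an ordinary blocking handle (regular file, pipe, socket, or terminal) a short count almost always means end of file. A non-blocking handle with nothing more available, or a signal handler interrupting the read, can also cut it short. If you need the full count, loop until you either have the bytes you need or read returns 0 / undef: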

my $buf = "";
my $want = 4096;
while ($want > 0) {
    my $got = read($fh, $buf, $want, length $buf);
    die "read error: $!" unless defined $got;
    last if $got == 0;                 # EOF
    $want -= $got;
}

Global state it touches

  • $! — set when read returns undef.

  • The handle’s PerlIO layers: they determine whether LENGTH is counted in bytes or in characters (see Character vs byte semantics below).

read does not interact with $_, $/, $\, or $,. Unlike readline, it does not care about the input record separator.

The OFFSET argument

OFFSET controls where in SCALAR the incoming data lands. It does not seek the filehandle.

  • Omitted: data replaces the entire contents of SCALAR.

  • Positive, within length: data is written starting at position OFFSET. Characters before OFFSET are preserved; whatever was at OFFSET and beyond is replaced by the incoming data, and SCALAR ends at the last character read.

  • Positive, beyond length: SCALAR is first padded with "\0" bytes out to OFFSET, then the data read is placed there. Useful for reading into a fixed slot inside a larger buffer you are assembling.

  • Negative: counts backwards from the end of SCALAR, and its magnitude must not exceed the current length of SCALAR. -1 means “overwrite the last character and append from there.”

my $buf = "HEADER";
read($fh, $buf, 16, length $buf);      # append 16 chars after "HEADER"

my $slab = "";
read($fh, $slab, 512, 1024);           # pad to 1024 "\0" bytes, then
                                       # read up to 512 chars; if all 512
                                       # arrive, $slab is 1536 chars long
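The negative and past-the-end cases can be observed directly with an in-memory filehandle. This is a sketch under that assumption; the strings are arbitrary, and %vd prints each character's ordinal so the "\0" padding is visible.

```perl
use strict;
use warnings;

# In-memory handle over a literal string; it stands in for a real file.
open my $fh, '<', \"XYZ" or die "open: $!";

# Negative OFFSET: -1 starts writing at the last character of $buf.
my $buf = "abc";
read($fh, $buf, 2, -1) // die "read: $!";
print "$buf\n";                  # abXY

# OFFSET past the end: the gap is filled with "\0" before the data lands.
my $slot = "ab";
read($fh, $slot, 1, 4) // die "read: $!";
printf "%vd\n", $slot;           # 97.98.0.0.90
```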

Character vs byte semantics

LENGTH is measured in whatever unit the handle deals in:

  • Byte-mode handle (the default, and every handle opened without an encoding layer): LENGTH is a byte count. read($fh, $buf, 10) pulls 10 bytes and length $buf is 10.

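  • :utf8 layer: LENGTH is a character count. Perl decodes UTF-8 on the way in, and $buf holds decoded codepoints. For valid UTF-8 the number of bytes consumed from the file can be anywhere from LENGTH to 4 * LENGTH, depending on the text. Note that :utf8 marks the data as UTF-8 without checking it, while :encoding(UTF-8) validates it.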

  • :encoding(...) layer: same rule as :utf8, for any encoding the layer knows.

open my $fh, "<:utf8", "greek.txt" or die $!;
read($fh, my $buf, 5);                 # 5 characters, not 5 bytes

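Reading UTF-8 data through a byte-mode handle gives you raw bytes, not characters: a fixed LENGTH can split a multi-byte sequence across two reads, and printing those bytes through an encoding layer encodes them a second time, producing mojibake. Pick the layer at open time and stick with it.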
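To see the difference in units, the sketch below reads the same six UTF-8 bytes (the Greek letters αβγ) through a :raw handle and through an :encoding(UTF-8) handle, using in-memory filehandles as stand-ins for files. With the same LENGTH of 3, the byte-mode read stops in the middle of β, while the character-mode read returns three whole characters.

```perl
use strict;
use warnings;

# The UTF-8 encoding of "αβγ": three characters, six bytes.
my $utf8 = "\xCE\xB1\xCE\xB2\xCE\xB3";

open my $raw, '<:raw', \$utf8 or die "open: $!";
read($raw, my $bytes_in, 3) // die "read: $!";     # 3 bytes; splits "β" in half

open my $text, '<:encoding(UTF-8)', \$utf8 or die "open: $!";
read($text, my $chars_in, 3) // die "read: $!";    # 3 characters, 6 bytes consumed

printf "%vX\n", $bytes_in;    # CE.B1.CE
printf "%vX\n", $chars_in;    # 3B1.3B2.3B3
```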

Buffered vs unbuffered I/O

read is buffered through PerlIO: it is implemented in terms of Perl’s own or the system’s fread(3), applied through the handle’s PerlIO layers. That has two consequences worth remembering:

  • You can mix read, readline / <$fh>, getc, and seek freely on the same handle. They all see the same buffer.

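  • You must not mix read with sysread on the same handle. sysread bypasses the buffer and goes straight to read(2). A buffered read typically pulls a whole buffer’s worth of data (often more than LENGTH) from the OS, and those bytes are invisible to a later sysread; conversely, bytes consumed by sysread never pass through the buffer, so buffered reads never see them. If you need raw syscall semantics, use sysread exclusively on that handle.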
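The shared buffer described in the first point can be seen with an in-memory filehandle standing in for a file: a fixed-length read, a readline, tell, and a seek all operate on one position. The record layout here (a 4-byte tag followed by lines) is a made-up example.

```perl
use strict;
use warnings;

my $data = "HDR1first line\nsecond line\n";
open my $fh, '<', \$data or die "open: $!";

read($fh, my $tag, 4) // die "read: $!";   # fixed-length tag
my $line = <$fh>;                           # readline resumes right after it
my $pos  = tell $fh;                        # position after both calls

seek($fh, 0, 0) or die "seek: $!";         # rewind; the buffer follows
read($fh, my $again, 4) // die "read: $!";

print "$tag|$line";      # HDR1|first line
print "$pos $again\n";   # 15 HDR1
```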

For unbuffered input that comes back as soon as any data is available, for example on a non-blocking socket, or when implementing a protocol where a short read is meaningful rather than “try again”, reach for sysread.
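Because sysread returns whatever is available, code that needs an exact count has to loop itself. The following is a minimal sketch of such a loop; sysread_exact is a hypothetical helper, not a core function, and the pipe written and read in one process exists only to make the example self-contained.

```perl
use strict;
use warnings;
use Errno qw(EINTR);

# Hypothetical helper (not a core function): collect exactly $n bytes with
# sysread, retrying when a signal interrupts the call.
sub sysread_exact {
    my ($fh, $n) = @_;
    my $buf = '';
    while (length($buf) < $n) {
        my $got = sysread($fh, $buf, $n - length($buf), length $buf);
        if (!defined $got) {
            next if $! == EINTR;   # interrupted before any data: try again
            return;                # real error (or EAGAIN if non-blocking); $! is set
        }
        last if $got == 0;         # EOF: return what we have; caller checks length
    }
    return $buf;
}

pipe(my $r, my $w) or die "pipe: $!";
syswrite($w, "abcdef") // die "syswrite: $!";
close $w or die "close: $!";       # writer closed, so the reader sees EOF after "abcdef"

my $head = sysread_exact($r, 4);   # "abcd"
my $tail = sysread_exact($r, 10);  # "ef" (short because of EOF)
print "$head $tail\n";
```

A real event loop on a non-blocking socket would wait for readability (for example with select) instead of treating EAGAIN as an error, as this sketch does.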

Examples

Read a fixed-size header from a binary file:

open my $fh, "<", "packet.bin" or die "open: $!";
binmode $fh;
my $header;
my $n = read($fh, $header, 16);
defined $n or die "read: $!";
die "short header: got $n bytes" unless $n == 16;

Append 16 bytes to the end of an existing buffer by using OFFSET equal to the current length:

my $buf = "PRELUDE:";
read($fh, $buf, 16, length $buf);      # $buf is now "PRELUDE:" . 16 new bytes

Read into position 1024 of a scalar, padding the gap with "\0":

my $slot = "";
read($fh, $slot, 64, 1024);            # if all 64 bytes arrive,
                                       # length($slot) == 1088;
                                       # substr($slot, 0, 1024) is "\0" x 1024

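Loop until you have exactly N characters or hit EOF. Because read already keeps refilling until LENGTH is satisfied, a single call on a blocking handle normally comes back short only at EOF; the loop still earns its keep when a signal handler or a non-blocking handle cuts a read short, and it gives callers one place to detect a truncated stream: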

sub read_exact {
    my ($fh, $n) = @_;
    my $buf = "";
    while (length($buf) < $n) {
        my $got = read($fh, $buf, $n - length($buf), length $buf);
        return undef unless defined $got;   # I/O error; $! is set
        return $buf if $got == 0;           # EOF; caller inspects length
    }
    return $buf;
}

Character-counted read through a UTF-8 layer:

use Encode qw(encode);

open my $fh, "<:encoding(UTF-8)", "notes.txt" or die $!;
read($fh, my $chunk, 100) // die "read: $!";   # up to 100 characters
printf "chars=%d bytes=%d\n", length $chunk, length encode("UTF-8", $chunk);

Edge cases

  • Closed filehandle: returns undef and sets $! to "Bad file descriptor". Under use warnings a read() on closed filehandle warning is emitted.

  • Unopened filehandle: likewise returns undef with $! set; under use warnings the message is read() on unopened filehandle.

  • LENGTH of 0: read returns 0 without consuming input, but SCALAR is still truncated to OFFSET characters (emptied if no OFFSET is given). It is not a reliable EOF probe; use eof for that.

  • Negative LENGTH: a fatal runtime error (Negative length at ...). Validate LENGTH before calling.

  • Negative OFFSET whose magnitude exceeds the current length of SCALAR: a fatal runtime error (Offset outside string). When the offset is computed, clamp it first, e.g. with List::Util’s max($offset, -length($buf)).

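  • Short read on a pipe or socket: not an error, but rarer than with sysread. read does not stop when the PerlIO buffer empties; it refills and waits until LENGTH characters have arrived or the other end is closed. A short count therefore means EOF, a read interrupted by a signal handler, or a non-blocking handle with no more data available. Loop if you need the full count; use sysread if you want whatever is available right now.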

  • EOF mid-read: returns the partial count, and the next call returns 0. After that, $fh stays at EOF until you seek or call clearerr (an IO::Handle method).

  • Reading from a tied handle: read dispatches to the tie class’s READ method, which is responsible for honouring LENGTH and OFFSET. Misbehaving tie classes can violate the “grow SCALAR so the last character read is the last character” contract.

  • Interaction with sysread: do not mix them on one handle. read fills the PerlIO buffer in chunks of its own choosing; sysread ignores that buffer entirely.

  • Binary data on a text-mode handle: on Unix-like systems there is no distinct text mode, but an encoding layer still transforms bytes, and on Windows the default :crlf layer translates line endings. Call binmode $fh (or open ..., "<:raw", ...) before reading binary data.

  • FILEHANDLE as an expression: read accepts any expression that yields a handle, so a bareword, a scalar holding a handle, or an array element all work, e.g. read($handles[$i], $buf, $len).
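Several of the cases above can be reproduced in a few lines. This sketch assumes an in-memory filehandle in place of a file, traps the two fatal errors with eval, and silences the closed-handle warning lexically so only $! is inspected; the variable names are illustrative.

```perl
use strict;
use warnings;
use Errno qw(EBADF);

open my $fh, '<', \"data" or die "open: $!";
my $buf = "keep me";

# Fatal errors: trap them with eval and keep the message.
my $len_err = eval { read($fh, $buf, -1); 1 } ? '' : $@;
my $off_err = eval { read($fh, $buf, 4, -20); 1 } ? '' : $@;

# LENGTH 0 returns 0, but SCALAR is still truncated (to "" without OFFSET).
my $zero       = read($fh, $buf, 0);
my $after_zero = $buf;

# EOF mid-read: the partial count first, then 0.
my $first     = read($fh, $buf, 10);    # 4 ("data")
my $second    = read($fh, $buf, 10);    # 0
my $after_eof = $buf;                   # "" (truncated again)

# Closed handle: undef with $! set; the 'closed' warning is silenced here.
close $fh;
my $closed       = do { no warnings 'closed'; read($fh, $buf, 1) };
my $closed_errno = $! + 0;

print "len_err: $len_err";
print "off_err: $off_err";
printf "zero=%d first=%d second=%d closed=%s ebadf=%s\n",
    $zero, $first, $second,
    defined $closed ? $closed : 'undef',
    $closed_errno == EBADF ? 'yes' : 'no';
```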

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • open — acquires the filehandle and decides whether subsequent reads are byte-counted or character-counted via the I/O layer

  • sysread — the unbuffered counterpart, a direct read(2) system call; use it for non-blocking I/O or when a short read is meaningful

  • readline / <$fh> — record-oriented input that respects $/ instead of a byte/character count

  • getc — read a single character; roughly read($fh, $c, 1) but with different EOF/undef reporting

  • binmode — remove or add I/O layers so that LENGTH is unambiguously a byte count or a character count

  • eof — the right way to test for end of file, rather than reading a zero-length chunk