read#
Read a fixed amount of buffered input from a filehandle into a scalar.
read pulls up to LENGTH characters from FILEHANDLE and stores
them in SCALAR, returning how many were actually read. It goes
through the handle’s PerlIO stack and is therefore buffered on top of
the underlying OS read — contrast with sysread, which bypasses the
buffer and calls read(2) directly. The optional OFFSET argument
lets you splice the incoming data into the middle of SCALAR rather
than overwriting it.
Synopsis#
read FILEHANDLE, SCALAR, LENGTH
read FILEHANDLE, SCALAR, LENGTH, OFFSET
What you get back#
The number of characters read, which may be less than LENGTH.
0 at end of file.
undef on error, with $! set.
SCALAR is grown or shrunk so that the last character actually read
becomes the last character of the scalar. This holds when OFFSET is
given too: everything before OFFSET is preserved, the incoming data is
written from OFFSET onward, and the scalar is cut off immediately after
the last character read (see The OFFSET argument below).
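A small sketch of the resizing rule, using an in-memory filehandle so that no external file is needed. The variable names and sample data are illustrative assumptions, not part of the original text.

```perl
use strict;
use warnings;

# In-memory filehandle: the "file" is just the string in $data.
my $data = "ABCDEFGHIJ";
open my $fh, "<", \$data or die "open: $!";

# Without OFFSET the previous contents are discarded entirely.
my $plain = "previous contents";
my $n1 = read($fh, $plain, 4);        # consumes "ABCD"

# With OFFSET the prefix is kept, and the scalar ends right after
# the last character read, even though it used to be longer.
my $spliced = "0123456789";
my $n2 = read($fh, $spliced, 2, 3);   # consumes "EF", lands at position 3

print "$n1 [$plain]\n";
print "$n2 [$spliced]\n";
```

Here `$plain` ends up as `ABCD` and `$spliced` as `012EF`: the seven characters that used to follow position 3 are gone.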
A short read is not an error. On a regular file it normally means
you reached end of file; on a pipe, socket, or terminal it means no
more data is available right now. Loop until you either have the bytes
you need or read returns 0 / undef:
my $buf = "";
my $want = 4096;
while ($want > 0) {
    my $got = read($fh, $buf, $want, length $buf);
    die "read error: $!" unless defined $got;
    last if $got == 0;    # EOF
    $want -= $got;
}
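To see the three kinds of return value around end of file, the following sketch reads repeatedly from a 10-character in-memory handle. The sample data and the `@log` array are assumptions made for illustration.

```perl
use strict;
use warnings;

my $data = "0123456789";                 # only 10 characters available
open my $fh, "<", \$data or die "open: $!";

my @log;
for (1 .. 3) {
    my $got = read($fh, my $buf, 8);     # ask for 8 characters each time
    die "read: $!" unless defined $got;  # undef would signal a real error
    push @log, "$got:$buf";              # record the count and what arrived
}
print join(" ", @log), "\n";
```

This prints `8:01234567 2:89 0:`: a full read, then a partial read that stops at end of file, then `0` with an empty buffer.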
Global state it touches#
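The handle’s PerlIO layers determine whether LENGTH is counted in bytes or in characters (see Character vs byte semantics below). ${^UTF8CACHE} only controls Perl’s internal cache of character offsets within UTF-8 strings; it does not change the unit LENGTH is counted in.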
read does not interact with $_, $/, $\, or $,. Unlike
readline, it does not care about the input record separator.
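A brief sketch of that independence from `$/`, assuming an in-memory handle and made-up sample text: `read` crosses a newline without stopping, and a later `readline` simply continues from wherever `read` left off.

```perl
use strict;
use warnings;

my $data = "line one\nline two\n";
open my $fh, "<", \$data or die "open: $!";

my $chunk;
{
    local $/ = "\n";            # only readline pays attention to this
    read($fh, $chunk, 12);      # 12 characters, newline included
}
my $rest = <$fh>;               # readline resumes where read stopped

print "chunk=[$chunk]\n";
print "rest=[$rest]\n";
```

`$chunk` holds `"line one\nlin"` and `$rest` holds `"e two\n"`.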
The OFFSET argument#
OFFSET controls where in SCALAR the incoming data lands. It
does not seek the filehandle.
Omitted: data replaces the entire contents of SCALAR.
Positive, within length: data is written starting at position OFFSET. Characters before OFFSET are preserved; everything from OFFSET onward is replaced by the incoming data, so the scalar ends at the last character read.
Positive, beyond length: SCALAR is first padded with "\0" bytes out to OFFSET, then the read is appended. Useful for reading into a fixed slot inside a larger buffer you are assembling.
Negative: counts backwards from the end of SCALAR. -1 means “overwrite the last character and append from there.”
my $buf = "HEADER";
read($fh, $buf, 16, length $buf);   # append 16 chars after "HEADER"

my $slab = "";
read($fh, $slab, 512, 1024);        # pad to 1024 "\0" bytes, then
                                    # read 512 chars; $slab is
                                    # now 1536 chars long
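The negative and beyond-length cases are easier to see with concrete strings. This is a minimal sketch using an in-memory handle; the buffers and data are invented for the example.

```perl
use strict;
use warnings;

my $data = "XYZ";
open my $fh, "<", \$data or die "open: $!";

# Negative OFFSET: -2 counts back from the end of "abcdef",
# so writing starts at the "e".
my $tail = "abcdef";
read($fh, $tail, 3, -2);

# Rewind the handle (OFFSET itself never seeks), then read past the
# end of a short scalar: the gap is filled with "\0".
seek($fh, 0, 0) or die "seek: $!";
my $padded = "ab";
read($fh, $padded, 3, 5);

print "[$tail]\n";
printf "length=%d\n", length $padded;
```

`$tail` becomes `abcdXYZ`, and `$padded` becomes `ab`, three `"\0"` characters, then `XYZ`, for a length of 8.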
Character vs byte semantics#
LENGTH is measured in whatever unit the handle deals in:
Byte-mode handle (the default, and every handle opened without an encoding layer): LENGTH is a byte count. read($fh, $buf, 10) pulls 10 bytes and length $buf is 10.
:utf8 layer: LENGTH is a character count. Perl decodes UTF-8 on the way in, and $buf holds decoded codepoints. The number of bytes consumed from the file can be anywhere from LENGTH to 4 * LENGTH, depending on the text.
:encoding(...) layer: same rule as :utf8, for any encoding the layer knows.
open my $fh, "<:utf8", "greek.txt" or die $!;
read($fh, my $buf, 5); # 5 characters, not 5 bytes
Mixing a byte-mode read with UTF-8 data produces mojibake and, under
use warnings, a Malformed UTF-8 warning if you later decode the
result. Pick the layer at open time and stick with it.
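The difference between the two units can be checked side by side. The sketch below reads the same six UTF-8 bytes (the Greek letters alpha, beta, gamma) through a byte-mode handle and through an encoding layer; the handles are in-memory and all names are illustrative.

```perl
use strict;
use warnings;

# Three characters, six bytes, when encoded as UTF-8.
my $bytes = "\xCE\xB1\xCE\xB2\xCE\xB3";

# Byte-mode handle: LENGTH 2 means two bytes, i.e. only the first letter.
open my $raw, "<", \$bytes or die "open: $!";
my $n_raw = read($raw, my $as_bytes, 2);

# Encoding layer: LENGTH 2 means two decoded characters (four bytes).
open my $txt, "<:encoding(UTF-8)", \$bytes or die "open: $!";
my $n_txt = read($txt, my $as_chars, 2);

printf "raw:  got %d, first ord 0x%X\n", $n_raw, ord $as_bytes;
printf "text: got %d, first ord 0x%X\n", $n_txt, ord $as_chars;
```

Both calls return 2, but the byte-mode buffer starts with the byte 0xCE while the decoded buffer starts with the codepoint 0x3B1.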
Buffered vs unbuffered I/O#
read is stdio-buffered through PerlIO — internally it calls
fread(3) (or PerlIO’s replacement) against the handle’s buffer.
That has two consequences worth remembering:
You can mix read, readline/<$fh>, getc, and seek freely on the same handle. They all see the same buffer.
You must not mix read with sysread on the same handle. sysread bypasses the buffer and goes straight to read(2); any bytes already pulled into the buffer by a previous read become invisible to sysread, and vice versa. If you need raw syscall semantics, use sysread exclusively on that handle.
For byte-accurate, non-buffered input — for example on a non-blocking
socket, or when implementing a protocol where a short read is
meaningful rather than “try again” — reach for sysread.
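The hazard of mixing the two calls can be reproduced with a small temporary file. This sketch assumes a default build where the handle has the usual buffering layer and the buffer is larger than the 26-byte file; under a different PerlIO configuration the numbers may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A real file is needed: sysread works on the OS-level descriptor.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "abcdefghijklmnopqrstuvwxyz";
close $out or die "close: $!";

open my $fh, "<", $path or die "open: $!";

# The buffered read returns one character, but PerlIO has already
# pulled the whole small file into its buffer.
read($fh, my $first, 1);

# sysread asks the OS directly, whose file position is already at EOF,
# so the 25 characters still sitting in the buffer are skipped.
my $n = sysread($fh, my $direct, 5);

print "first=[$first] sysread returned $n\n";
```

Under those assumptions `$first` is `a` and `sysread` returns 0 rather than `bcdef`.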
Examples#
Read a fixed-size header from a binary file:
open my $fh, "<", "packet.bin" or die "open: $!";
binmode $fh;
my $header;
my $n = read($fh, $header, 16);
die "read: $!" unless defined $n;
die "short header: got $n bytes" unless $n == 16;
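A fixed-size header often carries the size of what follows it. The sketch below assumes a hypothetical wire format (a 2-byte big-endian length followed by that many payload bytes) and an illustrative helper named `next_record`; neither comes from the original text. It reads from an in-memory handle, where one call per field is enough.

```perl
use strict;
use warnings;

# Hypothetical format: 2-byte big-endian length, then the payload.
my $wire = pack("n", 5) . "hello" . pack("n", 3) . "abc";
open my $fh, "<", \$wire or die "open: $!";
binmode $fh;

sub next_record {
    my ($in) = @_;
    my $got = read($in, my $prefix, 2);
    die "read: $!" unless defined $got;
    return if $got == 0;                        # clean EOF between records
    die "truncated length prefix" if $got < 2;

    my $len = unpack "n", $prefix;              # decode the payload size
    $got = read($in, my $payload, $len);
    die "read: $!" unless defined $got;
    die "truncated payload" if $got < $len;
    return $payload;
}

my @records;
while (defined(my $rec = next_record($fh))) {
    push @records, $rec;
}
print join(",", @records), "\n";
```

This collects `hello` and `abc`. On a pipe or socket, each `read` would need to be wrapped in a retry loop so that a short read is not mistaken for truncation.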
Append 16 bytes to the end of an existing buffer by using OFFSET
equal to the current length:
my $buf = "PRELUDE:";
read($fh, $buf, 16, length $buf); # $buf is now "PRELUDE:" . 16 new bytes
Read into position 1024 of a scalar, padding the gap with "\0":
my $slot = "";
read($fh, $slot, 64, 1024); # length($slot) == 1088
# substr($slot, 0, 1024) is "\0" x 1024
Loop until you have exactly N bytes or hit EOF — the correct pattern
for pipes and sockets where a single read often returns fewer bytes
than requested:
sub read_exact {
    my ($fh, $n) = @_;
    my $buf = "";
    while (length($buf) < $n) {
        my $got = read($fh, $buf, $n - length($buf), length $buf);
        return undef unless defined $got;    # read error; $! is set
        return $buf if $got == 0;            # EOF; caller inspects length
        # otherwise keep reading until $buf holds $n characters
    }
    return $buf;
}
Character-counted read through a UTF-8 layer:
open my $fh, "<:encoding(UTF-8)", "notes.txt" or die $!;
read($fh, my $chunk, 100);    # 100 characters
printf "chars=%d bytes=%d\n", length $chunk, do {
    use bytes;
    length $chunk;
};
Edge cases#
Closed filehandle: returns undef and sets $! to "Bad file descriptor". Under use warnings a read() on closed filehandle warning is emitted.
LENGTH of 0: read returns 0 without consuming any input, but SCALAR is still resized to end at OFFSET, so it becomes empty when OFFSET is omitted. It is not a reliable EOF probe; use eof for that.
Negative LENGTH: a fatal runtime error (Negative length at ...). Validate LENGTH before calling.
Negative OFFSET whose magnitude exceeds the current length of SCALAR: a fatal runtime error (Offset outside string). Clamp with max($offset, -length $buf), using max from List::Util, when the offset is computed.
Short read on a pipe or socket: not an error. read returns fewer characters than requested when the handle reaches EOF or cannot supply more data before LENGTH is reached. Loop if you need the full count.
EOF mid-read: returns the partial count. The next call returns 0. After that, $fh stays at EOF until you seek or clearerr.
Reading from a tied handle: read dispatches to the tie class’s READ method, which is responsible for honouring LENGTH and OFFSET. Misbehaving tie classes can violate the “grow SCALAR so the last character read is the last character” contract.
Interaction with sysread: do not mix them on one handle. read fills the PerlIO buffer in chunks of its own choosing; sysread ignores that buffer entirely.
Binary data on a text-mode handle: on Unix-like systems there is no distinct text mode, but an encoding layer still transforms bytes. Call binmode $fh (or open with "<:raw") before reading binary data.
FILEHANDLE as an expression: read takes the handle as an ordinary first argument, so a bareword, a simple scalar, or a more complex expression such as read($handles[$i], $buf, $len) all work. The block syntax that print needs for complex handles does not apply here.
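The two fatal cases can be trapped with `eval` when LENGTH or OFFSET come from untrusted input. This is a minimal sketch with an in-memory handle; the test values and the `@errors` array are assumptions made for the example.

```perl
use strict;
use warnings;

my $data = "payload";
open my $fh, "<", \$data or die "open: $!";

my $buf = "abc";
my @errors;
for my $case ([ -1, 0 ], [ 1, -10 ]) {      # [LENGTH, OFFSET] pairs
    my ($len, $off) = @$case;
    # eval traps the exception; the trailing 1 marks success.
    my $ok = eval { read($fh, $buf, $len, $off); 1 };
    push @errors, $ok ? "no error" : $@;
}
print for @errors;
```

The first case reports `Negative length` and the second `Offset outside string`; `$buf` is left as `abc` in both.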
Differences from upstream#
Fully compatible with upstream Perl 5.42.
See also#
open: acquires the filehandle and decides whether subsequent reads are byte-counted or character-counted via the I/O layer
sysread: the unbuffered counterpart, a direct read(2) system call; use it for non-blocking I/O or when a short read is meaningful
readline / <$fh>: record-oriented input that respects $/ instead of a byte/character count
getc: read a single character; roughly read($fh, $c, 1) but with different EOF/undef reporting
binmode: remove or add I/O layers so that LENGTH is unambiguously a byte count or a character count
eof: the right way to test for end of file, rather than reading a zero-length chunk