read
Read a fixed amount of buffered input from a filehandle into a scalar.
read pulls up to LENGTH characters from FILEHANDLE and stores them in SCALAR, returning how many were actually read. It goes through the handle’s PerlIO stack and is therefore buffered on top of the underlying OS read — contrast with sysread, which bypasses the buffer and calls read(2) directly. The optional OFFSET argument lets you splice the incoming data into the middle of SCALAR rather than overwriting it.
Synopsis
read FILEHANDLE, SCALAR, LENGTH
read FILEHANDLE, SCALAR, LENGTH, OFFSET
What you get back
The number of characters read, which may be less than LENGTH.
0 at end of file.
undef if an error occurred, with $! set to the reason.
SCALAR is grown or shrunk so that the last character actually read becomes the last character of the scalar. With OFFSET, the characters before OFFSET are kept, the new data is written starting at OFFSET, and anything that used to follow it is discarded (see The OFFSET argument below).
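You can see all three return cases without touching the disk by reading from an in-memory filehandle (a reference to a scalar passed to open). This is a minimal sketch that uses only core Perl. The five-character input and the starting contents of $buf are arbitrary.

```perl
use strict;
use warnings;

# Arbitrary test data served from an in-memory filehandle.
my $data = "abcde";
open my $fh, '<', \$data or die "open: $!";

my $buf = "old contents";
my $n1 = read($fh, $buf, 3);   # full read: 3, $buf is "abc" (old contents dropped)
my $b1 = $buf;
my $n2 = read($fh, $buf, 3);   # short read at the end of the data: 2, $buf is "de"
my $b2 = $buf;
my $n3 = read($fh, $buf, 3);   # end of file: 0, and $buf shrinks to ""

printf "%d:%s %d:%s %d:[%s]\n", $n1, $b1, $n2, $b2, $n3, $buf;   # prints "3:abc 2:de 0:[]"
```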
A short read is not an error. With the default buffered layers, read keeps refilling the handle's PerlIO buffer until it has LENGTH characters. A short count on an ordinary blocking handle (file, pipe, socket, or terminal) therefore normally means end of file was reached. On a non-blocking handle, it can also mean that no more data was ready. If you need an exact amount, loop until you have it or read returns 0 / undef:
my $buf  = "";
my $want = 4096;
while ($want > 0) {
    my $got = read($fh, $buf, $want, length $buf);
    die "read error: $!" unless defined $got;
    last if $got == 0;    # EOF
    $want -= $got;
}
Global state it touches
The handle's PerlIO layers determine whether LENGTH is counted in bytes or in characters (see Character vs byte semantics below).
$! is set whenever read returns undef.
read does not interact with $_, $/, $\, or $,. Unlike readline, it does not care about the input record separator.
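Here is a quick check that $/ has no effect. It is a sketch that uses only core Perl and an in-memory filehandle with arbitrary data. The separator is set to a newline, yet read still returns exactly the five characters asked for, with the newline in the middle.

```perl
use strict;
use warnings;

my $data = "ab\ncd\nef\n";
open my $fh, '<', \$data or die "open: $!";

my ($n, $buf);
{
    local $/ = "\n";            # would end a readline at the first newline...
    $n = read($fh, $buf, 5);    # ...but read only counts characters
}

(my $shown = $buf) =~ s/\n/\\n/g;
print "$n $shown\n";            # prints: 5 ab\ncd
```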
The OFFSET argument
OFFSET controls where in SCALAR the incoming data lands. It does not seek the filehandle.
Omitted: data replaces the entire contents of SCALAR.
Positive, within length: data is written starting at position OFFSET. Characters before OFFSET are preserved. Whatever followed OFFSET is replaced by the new data, and SCALAR ends at the last character read.
Positive, beyond length: SCALAR is first padded with "\0" bytes out to OFFSET, then the read is appended. This is useful for reading into a fixed slot inside a larger buffer you are assembling.
Negative: counts backwards from the end of SCALAR. -1 means "overwrite the last character and append from there."
my $buf = "HEADER";
read($fh, $buf, 16, length $buf); # append 16 chars after "HEADER"
my $slab = "";
read($fh, $slab, 512, 1024);   # pad to 1024 "\0" bytes, then read up to
                               # 512 chars: length $slab is 1024 + chars read
                               # (1536 if the full 512 arrive)
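You can exercise the four OFFSET cases against an in-memory filehandle. This sketch uses only core Perl. The ten-character input and the starting value of each buffer are arbitrary, and each read consumes the next slice of input. Note that in the within-length case, the scalar ends right after the new data.

```perl
use strict;
use warnings;

my $data = "abcdefghij";
open my $fh, '<', \$data or die "open: $!";

# Omitted: the whole scalar is replaced.
my $omitted = "0123456789";
read($fh, $omitted, 3);            # "abc"

# Positive, within length: prefix kept, new data written at OFFSET,
# and the scalar now ends after the last character read.
my $within = "0123456789";
read($fh, $within, 3, 2);          # "01" . "def"

# Positive, beyond length: padded with "\0" out to OFFSET, then appended.
my $beyond = "xy";
read($fh, $beyond, 2, 4);          # "xy\0\0" . "gh"

# Negative: counted back from the end; -1 starts at the last character.
my $negative = "HELLO";
read($fh, $negative, 2, -1);       # "HELL" . "ij"

for my $s ($omitted, $within, $beyond, $negative) {
    (my $shown = $s) =~ s/\0/\\0/g;   # make the NUL padding visible
    print "$shown\n";
}
```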
Character vs byte semantics
LENGTH is measured in whatever unit the handle deals in:
Byte-mode handle (the default, and every handle opened without an encoding layer): LENGTH is a byte count. read($fh, $buf, 10) pulls up to 10 bytes, so length $buf is at most 10.
:utf8 layer: LENGTH is a character count. Perl decodes UTF-8 on the way in, and $buf holds decoded codepoints. For well-formed UTF-8, the number of bytes consumed from the file can be anywhere from LENGTH to 4 * LENGTH, depending on the text.
:encoding(...) layer: same rule as :utf8, for any encoding the layer knows.
open my $fh, "<:utf8", "greek.txt" or die $!;
read($fh, my $buf, 5); # 5 characters, not 5 bytes
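Reading UTF-8 text through a byte-mode handle gives you undecoded bytes. length counts bytes, not characters, and printing the result to a handle with a UTF-8 layer encodes it a second time (mojibake). Decoding fixed-size chunks yourself is also fragile, because a read can end in the middle of a multi-byte character. Pick the layer at open time and stick with it.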
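To see the two counting rules side by side, this sketch asks a byte-mode handle and a :encoding(UTF-8) handle for 3 units of the same six bytes. It uses only core Perl, and the in-memory handles and the three Greek letters are arbitrary test data. The byte-mode read stops in the middle of the second character.

```perl
use strict;
use warnings;

# U+03B1 U+03B2 U+03B3 (Greek alpha, beta, gamma) encoded as UTF-8:
# six bytes, three characters.
my $bytes = "\xCE\xB1\xCE\xB2\xCE\xB3";

# Byte-mode in-memory handle: LENGTH counts bytes.
open my $raw, '<', \$bytes or die "open: $!";
my $n_raw = read($raw, my $from_raw, 3);   # 3 bytes: all of alpha, half of beta

# Decoding layer: LENGTH counts characters.
open my $enc, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $n_enc = read($enc, my $from_enc, 3);   # 3 characters: all six bytes consumed

# %vx prints each character's ordinal in hex, joined by dots.
printf "raw: %d (%vx)  decoded: %d (%vx)\n", $n_raw, $from_raw, $n_enc, $from_enc;
```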
Buffered vs unbuffered I/O
read is buffered through PerlIO. Like fread(3), it copies data out of the handle's buffer and refills that buffer from the OS as often as needed to satisfy LENGTH, stopping early only at end of file or on an error. That has two consequences worth remembering:
You can mix read, readline/<$fh>, getc, and seek freely on the same handle. They all see the same buffer.
You must not mix read with sysread on the same handle. sysread bypasses the buffer and goes straight to read(2). Any bytes already pulled into the buffer by a previous read become invisible to sysread, and data consumed by sysread never reaches the buffer. If you need raw syscall semantics, use sysread exclusively on that handle.
For unbuffered input, reach for sysread. Typical cases are a non-blocking socket, or a protocol where you want whatever data has already arrived instead of waiting for the full LENGTH.
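You can observe the shared buffer by interleaving the buffered calls on one handle. This is a minimal sketch that uses only core Perl. File::Temp supplies a scratch file, which is written and read with :raw so that line endings are one byte on every platform. sysread is deliberately left out because it must not be mixed in.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample data in a temporary file, written raw so "\n" is one byte everywhere.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "alpha\nbeta\ngamma\n";
close $out or die "close: $!";

open my $fh, '<:raw', $path or die "open: $!";

read($fh, my $head, 3);            # "alp"
my $rest = readline($fh);          # "ha\n": readline resumes where read stopped
my $ch   = getc($fh);              # "b"
my $pos  = tell($fh);              # 7: the six bytes of "alpha\n" plus "b"
seek($fh, 0, 0) or die "seek: $!";
read($fh, my $again, 5);           # "alpha": seek moved the position all of them share
close $fh;

(my $shown = $rest) =~ s/\n/\\n/;
print join('|', $head, $shown, $ch, $pos, $again), "\n";
```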
Examples
Read a fixed-size header from a binary file:
open my $fh, "<", "packet.bin" or die "open: $!";
binmode $fh;
my $header;
my $n = read($fh, $header, 16);
die "read: $!" unless defined $n;    # I/O error
die "short header: got $n bytes" unless $n == 16;
Append up to 16 bytes to the end of an existing buffer by using OFFSET equal to the current length:
my $buf = "PRELUDE:";
read($fh, $buf, 16, length $buf);   # $buf is now "PRELUDE:" . up to 16 new bytes
Read into position 1024 of a scalar, padding the gap with "\0":
my $slot = "";
read($fh, $slot, 64, 1024);   # length($slot) == 1024 + bytes read (1088 if all 64 arrive)
                              # substr($slot, 0, 1024) is "\0" x 1024
Loop until you have exactly N bytes or hit EOF. This is the defensive pattern whenever a single read may return fewer bytes than requested, such as at end of file or on a non-blocking handle:
sub read_exact {
    my ($fh, $n) = @_;
    my $buf = "";
    while (length($buf) < $n) {
        my $got = read($fh, $buf, $n - length($buf), length $buf);
        return undef unless defined $got;   # I/O error; $! is set
        return $buf if $got == 0;           # EOF; caller inspects length
    }
    return $buf;
}
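read_exact is a natural fit for length-prefixed framing, where a fixed-size header says how many bytes follow. The framing here (a 4-byte big-endian length from pack "N", then the payload) and the helper read_record are hypothetical and only for illustration. The stream is an in-memory filehandle.

```perl
use strict;
use warnings;

# read_exact as above: loop until $n characters or EOF; undef on error.
sub read_exact {
    my ($fh, $n) = @_;
    my $buf = "";
    while (length($buf) < $n) {
        my $got = read($fh, $buf, $n - length($buf), length $buf);
        return undef unless defined $got;   # I/O error; $! is set
        return $buf if $got == 0;           # EOF; caller inspects length
    }
    return $buf;
}

# Hypothetical helper: one record is a 4-byte big-endian length
# followed by that many payload bytes. Returns undef at a clean EOF.
sub read_record {
    my ($fh) = @_;
    my $hdr = read_exact($fh, 4);
    die "read error: $!\n" unless defined $hdr;
    return undef if length $hdr == 0;              # clean EOF between records
    die "truncated header\n" unless length $hdr == 4;
    my $len     = unpack "N", $hdr;
    my $payload = read_exact($fh, $len);
    die "read error: $!\n" unless defined $payload;
    die "truncated record\n" unless length $payload == $len;
    return $payload;
}

# Two records back to back; "N/a*" packs the length, then the string.
my $stream = pack("N/a*", "hello") . pack("N/a*", "world!");
open my $fh, '<', \$stream or die "open: $!";

my @records;
while (defined(my $rec = read_record($fh))) {
    push @records, $rec;
}
print "@records\n";   # prints: hello world!
```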
Character-counted read through a UTF-8 layer:
open my $fh, "<:encoding(UTF-8)", "notes.txt" or die $!;
read($fh, my $chunk, 100);   # up to 100 characters, not 100 bytes
printf "chars=%d bytes=%d\n", length $chunk, do {
    use bytes;               # length of Perl's internal UTF-8 form
    length $chunk;
};
Edge cases
Closed filehandle: returns undef and sets $! to "Bad file descriptor" (EBADF). Under use warnings, a "read() on closed filehandle" warning is emitted.
LENGTH of 0: read returns 0 without consuming any input. Because 0 is also the end-of-file return value, this is not a reliable EOF probe; use eof for that.
Negative LENGTH: a fatal runtime error (Negative length at ...). Validate LENGTH before calling.
Negative OFFSET whose magnitude exceeds the current length of SCALAR: a fatal runtime error (Offset outside string). When the offset is computed, clamp it with max($offset, -length $buf), using max from List::Util.
Short read: not an error. With the default buffered layers, read refills the PerlIO buffer as many times as it needs. It returns fewer characters than requested only at end of file, after an I/O error, or when a non-blocking handle has no more data ready. Loop if you need the full count.
EOF mid-read: returns the partial count. The next call returns 0. After that, the handle's end-of-file condition is typically sticky. Clear it with seek($fh, 0, 1) or $fh->clearerr (from IO::Handle) before trying again, for example when following a growing file.
Reading from a tied handle: read dispatches to the tie class's READ method, which is responsible for honouring LENGTH and OFFSET. Misbehaving tie classes can violate the "grow SCALAR so the last character read is the last character" contract.
Interaction with sysread: do not mix them on one handle. read fills the PerlIO buffer in chunks of its own choosing; sysread ignores that buffer entirely.
Binary data on a text-mode handle: on Windows, the default :crlf layer turns CRLF into LF. On any platform, an encoding layer transforms bytes. Call binmode $fh (or open ..., "<:raw", ...) before reading binary data.
FILEHANDLE as an expression: unlike print, read takes FILEHANDLE as an ordinary argument. A bareword, a scalar, or any expression that yields a handle works, for example read($handles[$i], $buf, $len) or read($self->{fh}, $buf, $len).
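Several of these cases can be checked directly. This sketch uses only core modules (List::Util, Errno) and an in-memory filehandle with arbitrary data. The fatal cases are caught with eval so that the script keeps running.

```perl
use strict;
use warnings;
use Errno qw(EBADF);
use List::Util qw(max);

my $data = "abcdef";
open my $fh, '<', \$data or die "open: $!";

# Negative LENGTH dies before anything is read.
my $neg_len = eval { read($fh, my $buf, -1); 1 } ? "" : $@;

# A negative OFFSET that reaches past the start of SCALAR dies too.
my $short = "xy";
my $neg_off = eval { read($fh, $short, 2, -5); 1 } ? "" : $@;

# Clamping a computed offset keeps it inside the string: -5 becomes -2 (position 0).
my $offset = -5;
my $got = read($fh, $short, 2, max($offset, -(length $short)));   # $short is now "ab"

# LENGTH 0 returns 0 without consuming input, so it says nothing about EOF.
my $zero   = read($fh, my $ignored, 0);
my $at_eof = eof($fh) ? 1 : 0;     # 0: "cdef" is still unread

# A closed handle returns undef and sets $! to EBADF.
close $fh;
my $closed = do { no warnings qw(closed unopened); read($fh, my $tmp, 1) };
my $ebadf  = ($! == EBADF) ? 1 : 0;

print "negative LENGTH: $neg_len";
print "negative OFFSET: $neg_off";
print "clamped read: $got -> '$short'\n";
print "zero-length read: $zero, eof: $at_eof\n";
print "closed handle: ", (defined $closed ? $closed : 'undef'), ", EBADF: $ebadf\n";
```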
Differences from upstream
Fully compatible with upstream Perl 5.42.
See also
open: acquires the filehandle and, through its I/O layers, decides whether subsequent reads are byte-counted or character-counted
sysread: the unbuffered counterpart, a direct read(2) system call; use it for non-blocking I/O or when you want whatever data is available right now
readline / <$fh>: record-oriented input that respects $/ instead of a byte/character count
getc: reads a single character; roughly read($fh, $c, 1), but it returns undef rather than 0 at end of file
binmode: adds or removes I/O layers so that LENGTH is unambiguously a byte count or a character count
eof: the right way to test for end of file, rather than reading a zero-length chunk