sysread#

Read raw bytes from a filehandle by calling the underlying read(2) system call.

sysread tries to read LENGTH bytes from FILEHANDLE and store them in SCALAR. It bypasses the PerlIO buffering stack, which is the whole point — reads go straight to the kernel and return whatever a single read(2) returns, which may be fewer bytes than requested. Use it for sockets, pipes, devices, and any other situation where you want control over individual system calls rather than line- or record-oriented input.

Synopsis#

sysread FILEHANDLE, SCALAR, LENGTH
sysread FILEHANDLE, SCALAR, LENGTH, OFFSET

What you get back#

  • The number of bytes actually read, which may be less than LENGTH on sockets, pipes, and ttys, or when a signal interrupts a read that has already received some data.

  • 0 at end of file.

  • undef on error, with $! set to the errno from the failed read(2).

SCALAR is grown or shrunk so that the last byte read becomes the last byte of the scalar. After a short read, the scalar holds only the bytes that were actually received — nothing is zero-padded to LENGTH.

my $n = sysread $sock, my $buf, 4096;
defined $n           or die "read failed: $!";
$n == 0              and return;          # peer closed
process_chunk($buf);                      # length($buf) == $n
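
The return values above can be observed with a pipe standing in for a socket. This is a minimal sketch, assuming a POSIX-like system; the pipe and the "hello" payload are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical setup: a pipe with 5 bytes queued, standing in for a socket.
pipe(my $r, my $w) or die "pipe: $!";
syswrite($w, "hello") // die "syswrite: $!";
close $w;                              # writer gone, so EOF follows the data

# Short read: 4096 bytes requested, only 5 are available.
my $n1 = sysread $r, my $buf, 4096;
defined $n1 or die "read failed: $!";
my $got = $buf;                        # length($buf) == $n1

# Nothing left and no writer: sysread returns 0, not undef.
my $n2 = sysread $r, $buf, 4096;
defined $n2 or die "read failed: $!";
# $buf is now empty, because SCALAR always ends at the last byte read.
```

Keeping a copy of the data in `$got` before the second call matters here, since the EOF read resizes `$buf` to zero length.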

Global state it touches#

  • $! — set when sysread returns undef. Interrupted reads show up as EINTR; would-block reads on non-blocking handles show up as EAGAIN / EWOULDBLOCK.

  • SCALAR itself — always assigned to, even on failure (it is truncated to OFFSET bytes when OFFSET is given and the read fails or returns zero).

  • The filehandle’s byte position advances by the number of bytes returned; this is the same kernel-level position manipulated by sysseek.

sysread does not consult or touch $/, $\, $,, or $_. It is a byte-level primitive; record semantics do not apply.
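
The shared position and the absence of record semantics can be checked on a scratch file. The sketch below is illustrative only and assumes File::Temp is available; the file contents are invented.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical scratch file holding two short lines.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "one\ntwo\n";
close $out or die "close: $!";

open my $fh, "<", $path or die "open: $!";

# $/ is "\n" by default, yet the read runs straight across the newline.
my $n = sysread $fh, my $buf, 6;
defined $n or die "sysread: $!";

# sysseek with a zero offset from SEEK_CUR reports the kernel-level position.
my $pos = sysseek $fh, 0, SEEK_CUR;
defined $pos or die "sysseek: $!";

# Move that same position, then read from the new location.
sysseek $fh, 2, SEEK_SET or die "sysseek: $!";
my $m = sysread $fh, my $again, 3;
defined $m or die "sysread: $!";
```

After the first read, `$pos` equals the number of bytes returned, and the second read picks up at byte 2 because `sysseek` and `sysread` operate on the same position.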

OFFSET — where the bytes land inside SCALAR#

Without OFFSET, SCALAR is replaced by the bytes read.

With OFFSET, the bytes are written into SCALAR starting at that position:

  • OFFSET >= 0 — start writing at byte OFFSET. If OFFSET is greater than the current length of SCALAR, the string is first padded with "\0" bytes up to OFFSET, then the new data is appended.

  • OFFSET < 0 — counted backwards from the end of SCALAR. -1 means “overwrite the last byte”, -10 means “overwrite the last 10 bytes”.

This is how you grow a buffer incrementally without repeatedly concatenating:

my $buf = "";
my $n;                                   # declared outside the loop so it can be checked afterwards
while ($n = sysread $fh, $buf, 8192, length $buf) {
    # each call appends at the current end of $buf
}
defined $n or die "read failed: $!";     # undef means error, 0 means EOF
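
To make the OFFSET rules concrete, the following sketch reads from a scratch file into three pre-filled scalars. The file contents and variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file with six known bytes.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "ABCDEF";
close $out or die "close: $!";
open my $fh, "<", $path or die "open: $!";

# OFFSET past the end: the gap is filled with "\0" before the data lands.
my $pad = "abc";
sysread($fh, $pad, 2, 5) // die "sysread: $!";      # "abc\0\0AB"

# Negative OFFSET: counted back from the current end of the scalar.
my $tail = "abcdef";
sysread($fh, $tail, 2, -2) // die "sysread: $!";    # "abcdCD"

# The scalar always ends at the last byte read, so it can also shrink.
my $shrunk = "abcdef";
sysread($fh, $shrunk, 2, -4) // die "sysread: $!";  # "abEF"
```

The last case is the one that tends to surprise: the two bytes land four from the end, and everything after them is dropped.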

Examples#

Read up to 64 bytes from a file. $buf ends up exactly $n bytes long:

open my $fh, "<", "input.bin" or die $!;
my $n = sysread $fh, my $buf, 64;
defined $n or die "sysread: $!";

Read up to 32 bytes and place them starting at position 512 in $buf, padding with "\0" if $buf was shorter:

my $buf = "header";
my $n = sysread $fh, $buf, 32, 512;   # $buf is now 512 + $n bytes

Drain a socket until the peer closes. Checking for 0 — not eof — is the only correct end-of-stream test for sysread:

while (1) {
    my $n = sysread $sock, my $chunk, 4096;
    defined $n or die "read: $!";
    last if $n == 0;                  # orderly shutdown from peer
    handle($chunk);
}

Restart a read interrupted by a signal. On a blocking handle, EINTR is the one failure you typically retry rather than propagate:

use Errno qw(EINTR);
my ($n, $buf);                  # declared outside the block so $buf survives it
RETRY: {
    $n = sysread $fh, $buf, $want;
    redo RETRY if !defined $n && $! == EINTR;   # interrupted before any data arrived
}
defined $n or die "read: $!";

Fill a fixed-size record, tolerating short reads. A single sysread can return fewer bytes than asked for even on a regular file near EOF, and routinely does on sockets and pipes:

sub read_exact {
    my ($fh, $want) = @_;
    my $buf = "";
    while (length($buf) < $want) {
        my $n = sysread $fh, $buf, $want - length($buf), length $buf;
        defined $n        or die "read: $!";
        $n == 0 and die "short read: got ", length $buf, " of $want";
    }
    return $buf;
}

Edge cases#

  • Never mix with buffered I/O on the same handle. readline, read, <$fh>, getc, eof, seek, and tell all go through PerlIO buffers; sysread goes around them. Interleaving them on the same handle leaves data stranded in the buffer that sysread will never see, or skips over data the buffered reader already consumed. Use one style per handle.

  • :utf8 is forbidden. A handle with a :utf8 layer (including the implicit layer added by :encoding(...)) makes sysread throw an exception. Strip the layer with binmode (binmode $fh with no second argument restores raw bytes) before calling sysread.

  • :crlf / :perlio still buffer. Even without :utf8, the default :perlio stack buffers reads and translates line endings under :crlf. sysread ignores those layers, so bytes the buffered reader already pulled into the PerlIO buffer are silently lost to sysread. Open handles destined for sysread with sysopen or call binmode $fh to disable :crlf.

  • Short reads are normal, not errors. A return of $n with 0 < $n < LENGTH is success. Only undef indicates failure; 0 indicates EOF. Loop until you have what you need.

  • No syseof. There is no separate end-of-file test for sysread; eof looks at the PerlIO buffer and is meaningless here. The return value of 0 is the end-of-file signal.

  • OFFSET beyond current length. sysread $fh, $buf, 10, 1_000_000 first pads $buf with "\0" bytes out to offset 1,000,000, then writes the data there. This is occasionally useful for preallocating, but usually a bug.

  • Negative OFFSET counts from the current end of SCALAR, not from LENGTH. sysread $fh, $buf, 4, -4 overwrites the last four bytes of whatever $buf currently holds.

  • Non-blocking handles. On a handle with O_NONBLOCK set, a read with nothing available returns undef with $! equal to EAGAIN or EWOULDBLOCK. Treat that as “try again later”, not as an error.

  • Signals. A blocking sysread interrupted by a signal returns undef with $! equal to EINTR. Retry the call.
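
Three of these edge cases are easy to reproduce. The sketch below assumes a POSIX-like system, Perl 5.30 or later for the :utf8 behaviour, and a file small enough to fit in one PerlIO buffer; the scratch file and its contents are invented.

```perl
use strict;
use warnings;
use Errno qw(EAGAIN EWOULDBLOCK);
use File::Temp qw(tempfile);
use IO::Handle;

# Hypothetical scratch file with two short lines.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "line1\nline2\n";
close $out or die "close: $!";

# 1. Mixing styles: readline pulls the whole small file into the PerlIO
#    buffer, so the kernel position is already at EOF when sysread runs.
open my $fh, "<", $path or die "open: $!";
my $first    = readline $fh;                 # "line1\n"
my $stranded = sysread $fh, my $rest, 4096;  # 0: "line2\n" sits in the buffer
close $fh;

# 2. A :utf8 layer makes sysread die before any bytes are read.
open my $u, "<:utf8", $path or die "open: $!";
my $ok = eval { sysread $u, my $tmp, 16; 1 };
my $utf8_error = $ok ? "" : $@;
binmode $u;                                  # drop back to raw bytes
my $raw = sysread $u, my $bytes, 5;          # now allowed
close $u;

# 3. Non-blocking handle with nothing to read: undef plus EAGAIN/EWOULDBLOCK.
pipe(my $r, my $w) or die "pipe: $!";
$r->blocking(0) // die "cannot set non-blocking: $!";
my $n = sysread $r, my $buf, 4096;
my $would_block = !defined $n && ($! == EAGAIN || $! == EWOULDBLOCK);
```

In the first case, `sysread` reports end of file even though the program has only seen the first line, which is the stranded-data problem in its simplest form.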

Differences from upstream#

Fully compatible with upstream Perl 5.42.

See also#

  • syswrite — the write-side counterpart; same buffering rules, same short-I/O semantics

  • sysopen — opens a handle with raw open(2) flags, the natural partner for sysread on files

  • sysseek — byte-accurate seek that composes correctly with sysread; do not mix seek with sysread

  • read — the buffered counterpart; use it for fixed-length, record-oriented input on buffered handles, not for sockets or partial reads

  • binmode — strip :utf8 or :crlf from a handle before using sysread

  • $! — the errno value after a failed sysread, including EINTR, EAGAIN, and EWOULDBLOCK