I/O

getc

Read the next single character from a filehandle.

getc returns one character of input from FILEHANDLE, advancing the handle’s read position by that character. If FILEHANDLE is omitted, getc reads from STDIN. At end-of-file or on a read error, getc returns undef and — in the error case — sets $!. The unit is a character, not a byte: under a :utf8 or :encoding(...) PerlIO layer a single call consumes however many bytes make up one decoded codepoint.

Synopsis

getc FILEHANDLE
getc

What you get back

A one-character string on success, or undef at end-of-file or on error. Distinguish the two by checking $!:

my $ch = getc $fh;
if (!defined $ch) {
    die "read error: $!" if $!;
    # otherwise clean EOF
}

A successful getc never returns the empty string, but it can return "0", which is false in boolean context. Always test the result with defined, never for plain truth: while (my $ch = getc $fh) stops early at the first "0" in the input.
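To make the difference between a truth test and a defined test concrete, here is a minimal sketch that reads the same three characters both ways. The slurp_chars helper and the in-memory filehandle are illustrative assumptions, not part of getc itself:

```perl
use strict;
use warnings;

# slurp_chars is a hypothetical helper: read every character of $text with
# getc, ending the loop on either a truth test or a defined test.
sub slurp_chars {
    my ($text, $use_defined) = @_;
    open my $fh, "<", \$text or die "open: $!";   # in-memory filehandle
    my @chars;
    if ($use_defined) {
        # Correct: only undef (EOF or error) ends the loop.
        while (defined(my $ch = getc $fh)) { push @chars, $ch }
    }
    else {
        # Buggy: the character "0" is false, so the loop ends early.
        while (my $ch = getc $fh) { push @chars, $ch }
    }
    close $fh;
    return join "", @chars;
}

print slurp_chars("a0b", 0), "\n";   # prints "a"
print slurp_chars("a0b", 1), "\n";   # prints "a0b"
```

The truth-tested loop silently drops everything from the first "0" onward, which is why the defined form is the one to use.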

Global state it touches

  • $! — set when getc returns undef because of a read error (not set for a clean end-of-file).

  • The current input filehandle — the no-argument form reads from STDIN, not from the handle selected with select. select governs default output, not default input.

  • The handle’s PerlIO layer stack — determines whether the returned scalar is a byte or a decoded character, and whether CRLF translation is applied on the way in.
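The effect of the layer stack can be seen by reading the same two bytes through two different open modes. This is a small sketch using an in-memory filehandle; first_getc is a hypothetical helper introduced only for the illustration:

```perl
use strict;
use warnings;

# first_getc is a hypothetical helper: open $bytes with the given mode
# (which selects the layer stack) and return the ordinal of the first
# character that getc produces, or undef if there is none.
sub first_getc {
    my ($mode, $bytes) = @_;
    open my $fh, $mode, \$bytes or die "open: $!";
    my $ch = getc $fh;
    close $fh;
    return defined $ch ? ord $ch : undef;
}

my $bytes = "\xC3\xA4";   # the UTF-8 encoding of U+00E4 (a with diaeresis)

printf "raw:   0x%X\n", first_getc("<:raw", $bytes);              # raw:   0xC3
printf "utf-8: 0x%X\n", first_getc("<:encoding(UTF-8)", $bytes);  # utf-8: 0xE4
```

Under :raw the first call returns only the lead byte; under the decoding layer it returns the whole codepoint.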

Examples

Read one character from STDIN:

my $ch = getc;
print "got: $ch\n" if defined $ch;

Read from an explicit filehandle and detect end-of-file:

open my $fh, "<", "input.txt" or die "open: $!";
while (defined(my $ch = getc $fh)) {
    print $ch;
}
close $fh;
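The same defined-guarded loop is the building block for character-at-a-time parsing. As a sketch only, the hypothetical read_until helper below collects characters up to a delimiter; the sample input string and the helper name are assumptions made for the example:

```perl
use strict;
use warnings;

# read_until is a hypothetical helper: return the characters read from $fh
# up to (but not including) $delim. Returns undef once the input is
# exhausted and nothing more was collected.
sub read_until {
    my ($fh, $delim) = @_;
    my $token = "";
    while (defined(my $ch = getc $fh)) {
        return $token if $ch eq $delim;
        $token .= $ch;
    }
    return length($token) ? $token : undef;
}

my $input = "GET /index.html HTTP/1.0";
open my $fh, "<", \$input or die "open: $!";   # in-memory filehandle
my @tokens;
while (defined(my $tok = read_until($fh, " "))) {
    push @tokens, $tok;
}
close $fh;

print join("|", @tokens), "\n";   # prints "GET|/index.html|HTTP/1.0"
```

Note that this sketch treats end-of-file and a read error alike; a real parser would also inspect $! as described under What you get back.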

Read characters, not bytes, from a UTF-8 file. Each call returns one decoded codepoint even when the underlying byte stream uses multi-byte sequences:

open my $fh, "<:encoding(UTF-8)", "utf8.txt" or die $!;
my $ch = getc $fh;      # e.g. "ä" as one character, even though
                        # it occupies two bytes on disk

Consume a yes/no prompt without waiting for a whole line — requires turning off terminal line buffering first (see Edge cases):

print "continue? [y/n] ";
system "stty", "-icanon", "eol", "\001";   # leave canonical mode so one key is enough
my $key = getc(STDIN);
system "stty", "icanon", "eol", "^@";      # restore canonical mode; ^@ is ASCII NUL
print "\n";
exit unless defined $key and lc $key eq "y";
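The same prompt can be handled without shelling out to stty by using POSIX::Termios from the POSIX module. The following is a sketch under stated assumptions rather than a definitive implementation: read_key is a hypothetical helper, it assumes a POSIX-style terminal on STDIN, and it does not install signal handlers to restore the terminal if the program is interrupted.

```perl
use strict;
use warnings;
use POSIX qw(:termios_h);

# read_key is a hypothetical helper: return one keypress from STDIN, or
# undef if STDIN is not a terminal or its attributes cannot be read.
sub read_key {
    return undef unless -t STDIN;

    my $fd   = fileno STDIN;
    my $term = POSIX::Termios->new;
    $term->getattr($fd) or return undef;

    # Remember the settings we are about to change.
    my $saved_lflag = $term->getlflag;
    my $saved_vmin  = $term->getcc(VMIN);
    my $saved_vtime = $term->getcc(VTIME);

    $term->setlflag($saved_lflag & ~ICANON);   # non-canonical: no line assembly
    $term->setcc(VMIN, 1);                     # a read completes after one byte
    $term->setcc(VTIME, 0);                    # no inter-byte timeout
    $term->setattr($fd, TCSANOW);

    my $key = getc STDIN;

    # Put the terminal back the way we found it.
    $term->setlflag($saved_lflag);
    $term->setcc(VMIN, $saved_vmin);
    $term->setcc(VTIME, $saved_vtime);
    $term->setattr($fd, TCSANOW);

    return $key;
}

# Usage (commented out so that loading this file never blocks):
#   $| = 1;                          # flush the prompt before waiting
#   print "continue? [y/n] ";
#   my $key = read_key();
#   print "\n";
#   exit unless defined $key and lc $key eq "y";
```

Because no external command runs between the prompt and the read, the prompt has to be flushed explicitly, for example by setting $| as in the usage comment.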

Edge cases

  • Terminal line buffering: on an interactive STDIN, the terminal driver hands Perl nothing until the user presses Enter, so getc blocks until a full line is available and then returns only its first character. getc itself cannot bypass this. To read a single keypress, put the terminal into non-canonical (“cbreak”) mode first: shell out to stty, use POSIX::Termios from the POSIX module for a portable in-process solution, or use the CPAN module Term::ReadKey for a friendlier interface. Remember to restore the terminal mode on exit, including from signal handlers.

  • Bytes vs characters under I/O layers: the return value is whatever the handle’s layer stack produces. A bytes-mode handle yields one byte per call; a :utf8 or :encoding(...) handle yields one decoded character per call, consuming as many bytes off the stream as that character takes. Malformed input under a decoding layer triggers the layer’s usual warning or error rather than silently returning garbage.

  • End-of-file vs error: both return undef. $! is cleared before the call is entered but only set when an actual read error occurred, so defined $ch or $! and die ... is the idiom.

  • Closed or unopened filehandle: returns undef and sets $! to Bad file descriptor. Under use warnings a getc() on closed filehandle warning is emitted.

  • Efficiency: getc is one-character-at-a-time, which crosses the PerlIO layer stack per call. For bulk scanning, read, sysread, or readline are substantially faster; reserve getc for interactive prompts and protocol parsers that genuinely need character-at-a-time control.

  • Default handle is STDIN, not ARGV: unlike the diamond operator <>, getc without an argument does not walk @ARGV or fall back to the magic ARGV handle — it reads STDIN directly.

  • Wide terminal input: reading a character from a terminal in raw mode still returns one codepoint only when the input layer decodes UTF-8. A paste containing a multi-byte character under a bytes-mode handle surfaces as several separate getc calls, one per byte.
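To illustrate the efficiency point above, the sketch below counts a character in two ways: one getc call per character, and chunked read calls followed by an in-memory scan. Both helper names and the 64 KiB chunk size are arbitrary choices for the example; no timings are claimed here.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of $want with one getc per character.
sub count_with_getc {
    my ($fh, $want) = @_;
    my $n = 0;
    while (defined(my $ch = getc $fh)) {
        $n++ if $ch eq $want;
    }
    return $n;
}

# Hypothetical helper: same count, but pull the input in 64 KiB chunks with
# read and scan each chunk in memory. read returns 0 at end-of-file and
# undef on error, so either one ends the loop.
sub count_with_read {
    my ($fh, $want) = @_;
    my $n = 0;
    while (read($fh, my $buf, 65536)) {
        my $hits = () = $buf =~ /\Q$want\E/g;   # number of matches in this chunk
        $n += $hits;
    }
    return $n;
}

my $data = "a,b,,c\n" x 1000;   # 3 commas per line, 1000 lines

open my $fh1, "<", \$data or die "open: $!";
open my $fh2, "<", \$data or die "open: $!";
print count_with_getc($fh1, ","), "\n";   # prints 3000
print count_with_read($fh2, ","), "\n";   # prints 3000
close $fh1;
close $fh2;
```

Both functions return the same count; the read version simply makes far fewer calls through the layer stack. The Benchmark module could be used to measure the difference on a given system.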

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • read — read a fixed number of characters or bytes in one call, much faster than looping getc

  • readline — read one whole record (line) at a time, governed by $/

  • sysread — unbuffered read straight from the OS, bypassing the PerlIO layer stack

  • eof — test whether the next getc / readline would return undef for end-of-file

  • $! — errno set when getc returns undef because of a read error