SCALARs and strings

chr#

Return the character whose codepoint is the given number.

chr takes a non-negative integer and returns a one-character string holding the character at that codepoint in the current character set. chr(65) is "A" in both ASCII and Unicode; chr(0x263a) is the Unicode smiley face ☺. If NUMBER is omitted, chr operates on $_. The inverse operation is ord.

Synopsis#

chr NUMBER
chr
chr($n)

What you get back#

A string of length one character. With Unicode semantics, which are the default for any codepoint above 0xff, that character is encoded internally as several bytes of UTF-8, so length reports 1 while bytes::length can report up to four for codepoints within the Unicode range. For codepoints in the range 0..127 the result is a single-byte ASCII string and the two lengths agree; codepoints 128..255 are also stored as a single byte by default (see Edge cases).
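
The character count and the internal byte count can be compared directly. The following is a minimal sketch, assuming default semantics with no use bytes in scope. The helper name lengths_for is hypothetical, and bytes::length is called by its full name so that the pragma itself is never enabled:

```perl
use strict;
use warnings;
require bytes;    # load the module so bytes::length is callable; the pragma stays off

# Hypothetical helper: (character length, internal byte length) of chr($n).
sub lengths_for {
    my ($n) = @_;
    my $s = chr $n;
    return (length($s), bytes::length($s));
}

for my $n (0x41, 0xe9, 0x263a, 0x1f600) {
    my ($chars, $bytes) = lengths_for($n);
    printf "U+%04X: %d char, %d byte(s)\n", $n, $chars, $bytes;
}
# U+0041: 1 char, 1 byte(s)
# U+00E9: 1 char, 1 byte(s)
# U+263A: 1 char, 3 byte(s)
# U+1F600: 1 char, 4 byte(s)
```

The 0xe9 row shows the 128..255 case: one character and, by default, still one byte.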

Global state it touches#

With no argument, chr reads $_. It writes nothing. The bytes pragma changes what chr does with out-of-range or negative input (see Edge cases); no other pragma or special variable affects it.

Examples#

ASCII character from its codepoint:

print chr(65);              # A
print chr(97), "\n";        # a

Build a string from a list of codepoints with map and join:

my @code = (72, 101, 108, 108, 111);
print join("", map { chr } @code), "\n";   # Hello

Unicode codepoint above the ASCII range. The result is one character that occupies three bytes in its internal UTF-8 form:

my $smiley = chr(0x263a);
print length($smiley), "\n";                 # 1
{
    use bytes;                               # lexically scoped to this block
    print length($smiley), "\n";             # 3
}

Round-trip with ord — for any single character $c, the identity chr(ord($c)) eq $c holds:

my $c = "Z";
print chr(ord($c)) eq $c ? "yes" : "no";     # yes

Default-argument form inside a loop over $_:

for (65, 66, 67) {
    print chr;                               # prints ABC
}

Negative input yields the Unicode replacement character U+FFFD:

print chr(-1) eq "\x{fffd}" ? "repl" : "?";  # repl
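
The related functions listed under See also can stand in for chr or feed it. The following is a minimal sketch rather than a definitive recipe; the helper name char_from_uplus and its accepted input format are assumptions made for illustration:

```perl
use strict;
use warnings;

# sprintf "%c" produces the same character as chr.
my $via_sprintf = sprintf("%c", 65);            # "A"

# pack "U*" builds a whole string from a list of codepoints in one call.
my @code     = (72, 101, 108, 108, 111);
my $via_pack = pack("U*", @code);               # "Hello"
my $via_chr  = join("", map { chr } @code);     # same string, one chr per codepoint

# Hypothetical helper: decode "U+XXXX" notation with hex, then chr.
sub char_from_uplus {
    my ($notation) = @_;
    my ($hex) = $notation =~ /\AU\+([0-9A-Fa-f]{1,6})\z/
        or return;                              # input is not in U+XXXX form
    return chr(hex($hex));
}

my $smiley = char_from_uplus("U+263A");

print "$via_sprintf\n";                                   # A
print $via_pack eq $via_chr ? "same\n" : "different\n";   # same
printf "%X\n", ord($smiley);                              # 263A
```

pack is the more compact choice when the whole list is already at hand; chr inside map is easier to combine with per-codepoint filtering.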

Edge cases#

  • Fractional argument: truncated toward zero before lookup. chr(65.9) returns "A".

  • Non-numeric string: converted by the normal numeric-coercion rules, with a warning under use warnings. chr("65abc") returns "A".

  • chr(undef) returns chr(0), the NUL character "\0", and warns under use warnings about use of an uninitialized value.

  • Negative values: outside the bytes pragma, negative input returns the Unicode replacement character "\x{fffd}". Under use bytes, the low eight bits of the integer value are used: chr(-1) yields the byte "\xff".

  • Values above 0x10ffff (the maximum Unicode codepoint) still return a single-character string under default semantics. Such values, the non-character codepoints (for example 0xfffe and 0xffff), and the surrogate range 0xd800..0xdfff can each draw a warning in the utf8 category (non_unicode, nonchar, and surrogate respectively), typically when the string is later output or encoded rather than at the chr call itself.

  • The 128..255 range: characters in this range are by default not stored as UTF-8 internally, for backward compatibility with byte strings. They compare equal to their single-byte form and to the same Unicode codepoint; the internal representation only matters when inspecting with bytes::length or interacting with PerlIO layers.

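  • chr does not croak on ordinary numeric input: negative, fractional, and non-numeric arguments surface only as warnings. In upstream Perl, non-finite input is an exception: chr of an infinity or NaN dies with a Cannot chr error.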

  • Not an lvalue: chr returns a fresh scalar. chr(65) = "B" fails at compile time (Can't modify chr in scalar assignment).
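
Several of the edge cases above can be checked in a few lines. This is a minimal sketch: the variable names are illustrative, and the expected warnings are silenced locally only to keep the demonstration quiet.

```perl
use strict;
use warnings;

# Fractional input is truncated toward zero.
my $frac = chr(65.9);

# Leading digits of a non-numeric string are used.
my $raw = "65abc";
my $str = do { no warnings 'numeric'; chr($raw) };

# undef is treated as 0.
my $undef;
my $nul = do { no warnings 'uninitialized'; chr($undef) };

# Negative input outside "use bytes" yields U+FFFD.
my $minus_one = -1;
my $neg = do { no warnings 'utf8'; chr($minus_one) };

# Under "use bytes" (scoped to the do block) the low eight bits are used.
my $neg_bytes = do { use bytes; chr($minus_one) };

print "$frac $str\n";                      # A A
printf "%d\n",   ord($nul);                # 0
printf "%#x\n",  ord($neg);                # 0xfffd
printf "%#x\n",  ord($neg_bytes);          # 0xff
```

Because bytes is lexically scoped, wrapping it in a block keeps byte semantics from leaking into the rest of the file.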

Differences from upstream#

Fully compatible with upstream Perl 5.42.

See also#

  • ord — inverse of chr: codepoint of the first character of a string

  • sprintf — sprintf "%c", $n is equivalent to chr $n; use it to splice a character into a larger formatted string in one call

  • pack — build a multi-character string from a list of codepoints in one call (pack "U*", @code for Unicode, pack "C*", @code for bytes)

  • hex — parse a hex string into the integer you then pass to chr; pairs with chr when decoding U+XXXX notation

  • use bytes — pragma that switches chr to byte-oriented semantics: for negative values and values above 0xff, the low eight bits are used

  • perlunicode — background on how Perl stores and compares Unicode strings