SCALARs and strings

ord

Return the Unicode code point of the first character of a string.

ord takes a string, looks at its first character, and returns that character’s numeric code point as an integer. The unit is the character, not the byte: if the string begins with é (U+00E9), ord returns 233, regardless of how many bytes é occupies in memory. For the empty string, ord returns 0. If EXPR is omitted, ord operates on $_.

For the reverse direction — turning a code point back into a character — see chr.

Synopsis

ord EXPR
ord
ord($str)

What you get back

A non-negative integer. For an ASCII input it is in the range 0..127; for a Latin-1 input 0..255; for a general Unicode string anything up to 0x10FFFF. ord never returns a negative number and never returns a non-integer. Only the first character of the argument is consulted — trailing characters are ignored. To walk every code point of a string, combine with split // or unpack "U*".
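
As a rough illustration of the two approaches just mentioned, the sketch below collects every code point of a string once with split // plus ord and once with unpack "U*". The helper names code_points_split and code_points_unpack are made up for this example and are not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: one ord call per character.
sub code_points_split {
    my ($s) = @_;
    return map { ord($_) } split //, $s;
}

# Hypothetical helper: let unpack walk the whole string in one call.
sub code_points_unpack {
    my ($s) = @_;
    return unpack "U*", $s;
}

# "h", e-acute (U+00E9), "i", euro sign (U+20AC), written with escapes so
# the example does not depend on the encoding of the source file.
my $s = "h\x{E9}i\x{20AC}";

print join(",", code_points_split($s)),  "\n";   # 104,233,105,8364
print join(",", code_points_unpack($s)), "\n";   # 104,233,105,8364
```

Both helpers return a list, so call them in list context; in scalar context the results would differ from what is shown here.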

Global state it touches

With no argument, ord reads $_. It neither writes nor reads any other special variable. The interpretation of the input as characters (not bytes) depends on whether the scalar is internally flagged as UTF-8 — see Edge cases below.

Examples

Basic ASCII lookup:

print ord("A");             # 65
print ord("0");             # 48
print ord("\n");            # 10

Empty string returns zero, matching the “no first character” contract:

print ord("");              # 0
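
Because ord("\0") is also 0, the return value alone cannot tell an empty string apart from one that starts with a NUL character. One possible way to keep the two apart is to check length first. The sketch below does that in a hypothetical helper, first_code_point, which is an assumption of this example and not a Perl builtin.

```perl
use strict;
use warnings;

# Hypothetical helper: code point of the first character, or undef when
# there is no first character (undef or empty input).
sub first_code_point {
    my ($s) = @_;
    return undef unless defined $s && length $s;
    return ord $s;
}

for my $input ("A", "\0", "") {
    my $cp = first_code_point($input);
    print defined $cp ? "code point $cp\n" : "no first character\n";
}
# code point 65
# code point 0
# no first character
```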

Default-argument form inside a while loop over $_:

while (<STDIN>) {
    last if ord == 4;       # stop on EOT (Ctrl-D) as first char
    print;
}

Non-ASCII characters. With use utf8 in effect, ord sees the character, not its UTF-8 bytes:

use utf8;
print ord("é");             # 233        (U+00E9)
print ord("€");             # 8364       (U+20AC)
print ord("😀");            # 128512     (U+1F600)

Walk every code point in a string — useful for debugging encoding surprises:

use utf8;
my $s = "héllo";
print join(",", map { ord } split //, $s), "\n";
# 104,233,108,108,111
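
When debugging, U+XXXX hexadecimal notation is often easier to match against Unicode charts than decimal. A small sketch, assuming a made-up helper name dump_code_points, that formats each ord result with sprintf:

```perl
use strict;
use warnings;

# Hypothetical helper: render a string as a space-separated list of
# U+XXXX labels, one per character.
sub dump_code_points {
    my ($s) = @_;
    return join " ", map { sprintf("U+%04X", ord $_) } split //, $s;
}

print dump_code_points("h\x{E9}llo"), "\n";
# U+0068 U+00E9 U+006C U+006C U+006F

# A string whose UTF-8 bytes were never decoded stands out immediately:
# the two bytes of e-acute show up as two separate characters.
print dump_code_points("h\xC3\xA9llo"), "\n";
# U+0068 U+00C3 U+00A9 U+006C U+006C U+006F
```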

Pair with chr for a no-op round-trip on any single character:

my $c = "Z";
print chr(ord($c)) eq $c ? "same\n" : "changed\n";   # same

Edge cases

  • Character vs byte semantics: ord returns the code point of the first character, which for a UTF-8-flagged scalar is not the same as the first byte. ord("\x{100}") returns 256, even though that character occupies two bytes in memory. To inspect the first byte instead, force byte semantics:

    use bytes;
    print ord("\x{100}");       # 196   (first UTF-8 byte, 0xC4)
    no bytes;
    print ord("\x{100}");       # 256
    

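    Reach for use bytes only when you genuinely need byte-level inspection. The pragma is lexically scoped, so it can be confined to a small block, and the upstream bytes documentation strongly discourages it for anything other than debugging.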

  • Non-UTF-8 scalar containing high bytes: if the scalar is a plain byte string (no UTF-8 flag) and its first byte is 0xC4, ord returns 196, not 256. The byte is interpreted as Latin-1. Whether a given scalar is UTF-8-flagged depends on how it was built — use utf8 in source, decode_utf8, a :utf8 I/O layer, chr of a value above 255, and so on.

  • undef: ord(undef) returns 0 and, under use warnings, emits Use of uninitialized value in ord.

  • Numeric argument: ord(65) stringifies 65 to "65" first, then takes the first character, so it returns ord("6") == 54, not 65. This surprises people. If you already have the integer, you do not need ord at all; use chr when you want the corresponding character.

  • Multi-character argument: only the first character matters. ord("ABC") returns 65; the B and C are never consulted.

  • Surrogate and non-character code points: ord will happily return 0xD800..0xDFFF or 0xFFFE/0xFFFF if those appear in the input. Perl does not refuse to store them; validation, if needed, is the caller’s job.

  • Maximum value: on a standard build, ord can return up to 0x7FFFFFFF (31-bit) for a string produced by chr of a value in that range. Real-world Unicode text tops out at 0x10FFFF.

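  • Precedence: ord is a named unary operator, so it binds tighter than the comma and the comparison operators but looser than the arithmetic operators. ord $x + 1 therefore parses as ord($x + 1), not ord($x) + 1: with $x set to "9" it returns ord("10"), which is 49, rather than 58. A comparison such as ord $x == 65 does parse as ord($x) == 65. Parenthesise when in doubt.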
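
The byte-level and numeric-argument cases above are easy to check directly. The sketch below is illustrative only: first_utf8_byte is a hypothetical helper that encodes a copy of the string with utf8::encode and inspects its first byte, as one possible alternative to use bytes. Unlike use bytes, it does not depend on how the scalar happens to be stored internally.

```perl
use strict;
use warnings;

# Hypothetical helper: first byte of the string's UTF-8 encoding.
# Works on a copy, so the caller's string is left untouched.
sub first_utf8_byte {
    my ($s) = @_;
    my $bytes = $s;
    utf8::encode($bytes);            # characters -> UTF-8 octets, in place
    return length($bytes) ? ord($bytes) : undef;
}

printf "%d\n", ord("\x{100}");              # 256: the code point
printf "%d\n", first_utf8_byte("\x{100}");  # 196: 0xC4, first encoded byte

# U+00E9 fits in one byte as Latin-1 but takes two bytes (C3 A9) in UTF-8.
printf "%d\n", ord("\xE9");                 # 233
printf "%d\n", first_utf8_byte("\xE9");     # 195

# Numeric argument: 65 is stringified to "65" before ord looks at it.
printf "%d\n", ord(65);                     # 54, the code point of "6"
printf "%s\n", chr(65);                     # A
```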

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • chr — the inverse operation; turns a code point back into a one-character string

  • sprintf — use %c to format an integer as its character, or %x / %04x to render an ord result in hex

  • unpack: use unpack "U*" for every code point of a string, or unpack "C*" for every byte

  • lc / uc — case-fold a character before taking its code point when you want case-insensitive comparisons

  • $_ — the default argument when ord is called without one