ord#
Return the Unicode code point of the first character of a string.
ord takes a string, looks at its first character, and returns that
character’s numeric code point as an integer. Note character, not
byte: if the string is a decoded character string that starts with
é (U+00E9), ord returns 233, regardless of how many bytes é
occupies in memory. For the empty string, ord returns 0. If
EXPR is omitted, ord operates on $_.
For the reverse direction — turning a code point back into a
character — see chr.
Synopsis#
ord EXPR
ord
ord($str)
What you get back#
A non-negative integer. For an ASCII input it is in the range
0..127; for a Latin-1 input 0..255; for a general Unicode string
anything up to 0x10FFFF. ord never returns a negative number and
never returns a non-integer. Only the first character of the argument
is consulted — trailing characters are ignored. To walk every code
point of a string, combine with split // or unpack "U*".
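As a minimal sketch of the two approaches just mentioned, the snippet below walks the same string once with split // plus ord and once with unpack "U*". The sample string is a hypothetical one; it includes a character above 255 so that Perl certainly stores it as a character string.

```perl
use strict;
use warnings;

# Hypothetical sample: "h", e-acute, "llo", euro sign. The \x{20AC} escape
# forces a character string regardless of the source file's encoding.
my $s = "h\x{E9}llo\x{20AC}";

# One ord call per character.
my @via_split = map { ord($_) } split //, $s;

# One unpack call for the whole string.
my @via_unpack = unpack "U*", $s;

print join(",", @via_split), "\n";    # 104,233,108,108,111,8364
print join(",", @via_unpack), "\n";   # 104,233,108,108,111,8364
```

Both lines print the same list; unpack avoids building a temporary list of one-character strings, which may matter for long inputs.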
Global state it touches#
With no argument, ord reads $_. It neither writes nor reads any
other special variable. The interpretation of the input as characters
(not bytes) depends on whether the scalar is internally flagged as
UTF-8 — see Edge cases below.
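The difference between a byte string and a character string can be seen with a short sketch. It assumes the core Encode module is available and uses its decode function to turn the two UTF-8 bytes of é into a single character; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two UTF-8 bytes that encode e-acute (U+00E9), held as a plain byte string.
my $bytes = "\xC3\xA9";

# decode() returns a character string built from those bytes.
my $chars = decode('UTF-8', $bytes);

my $byte_ord = ord($bytes);   # first byte, read as Latin-1
my $char_ord = ord($chars);   # first character

printf "bytes: length %d, ord %d\n", length($bytes), $byte_ord;   # 2, 195
printf "chars: length %d, ord %d\n", length($chars), $char_ord;   # 1, 233
```

The same input therefore yields 195 or 233 depending on whether it was decoded before ord saw it.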
Examples#
Basic ASCII lookup:
print ord("A"); # 65
print ord("0"); # 48
print ord("\n"); # 10
Empty string returns zero, matching the “no first character” contract:
print ord(""); # 0
Default-argument form inside a while loop over $_:
while (<STDIN>) {
last if ord == 4; # stop on EOT (Ctrl-D) as first char
print;
}
Decode a wide character. ord sees the character, not the bytes:
use utf8;
print ord("é"); # 233 (U+00E9)
print ord("€"); # 8364 (U+20AC)
print ord("😀"); # 128512 (U+1F600)
Walk every code point in a string — useful for debugging encoding surprises:
use utf8;
my $s = "héllo";
print join(",", map { ord } split //, $s), "\n";
# 104,233,108,108,111
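When chasing encoding problems, hexadecimal U+XXXX labels are often easier to compare against Unicode charts than decimal numbers. The following sketch formats each ord result with sprintf; the sample string is hypothetical and written with \x{...} escapes so the snippet does not depend on the source file's encoding.

```perl
use strict;
use warnings;

# Hypothetical sample: "h", e-acute, "llo", a space, and U+1F600.
my $s = "h\x{E9}llo \x{1F600}";

# Format every code point as U+XXXX, padded to at least four hex digits.
my @labels = map { sprintf("U+%04X", ord($_)) } split //, $s;

print join(" ", @labels), "\n";
# U+0068 U+00E9 U+006C U+006C U+006F U+0020 U+1F600
```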
Pair with chr for a no-op round-trip on any single character:
my $c = "Z";
print chr(ord($c)) eq $c ? "same\n" : "changed\n"; # same
Edge cases#
Character vs byte semantics: ord returns the code point of the first character, which for a UTF-8-flagged scalar is not the same as the first byte. ord("\x{100}") returns 256, even though that character occupies two bytes in memory. To inspect the first byte instead, force byte semantics:
use bytes;
print ord("\x{100}"); # 196 (first UTF-8 byte, 0xC4)
no bytes;
print ord("\x{100}"); # 256
Reach for use bytes only when you genuinely need byte-level inspection; it is a local, surgical pragma.
Non-UTF-8 scalar containing high bytes: if the scalar is a plain byte string (no UTF-8 flag) and its first byte is 0xC4, ord returns 196, not 256. The byte is interpreted as Latin-1. Whether a given scalar is UTF-8-flagged depends on how it was built: use utf8 in source, decode_utf8, a :utf8 I/O layer, chr of a value above 255, and so on.
undef: ord(undef) returns 0 and, under use warnings, emits Use of uninitialized value in ord.
Numeric argument: ord(65) stringifies 65 to "65" first, then takes the first character, so it returns ord("6") == 54, not 65. This surprises people; if you have an integer and want to round-trip through a character, use chr first or skip ord entirely.
Multi-character argument: only the first character matters. ord("ABC") returns 65; the B and C are never consulted.
Surrogate and non-character code points: ord will happily return 0xD800..0xDFFF or 0xFFFE/0xFFFF if those appear in the input. Perl does not refuse to store them; validation, if needed, is the caller’s job.
Maximum value: on a standard build, ord can return up to 0x7FFFFFFF (31-bit) for a string produced by chr of a value in that range. Real-world Unicode text tops out at 0x10FFFF.
Precedence: as a named unary operator, ord binds tighter than comparison operators and the comma, but looser than arithmetic operators, so ord $x + 1 parses as ord($x + 1), not ord($x) + 1. Parenthesise when in doubt.
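Several of these edge cases can be checked in a few lines. The sketch below is illustrative only: the variable names are made up, and the uninitialized warning is silenced purely to keep the demo output clean. It uses explicit parentheses throughout so that each call's argument is unambiguous.

```perl
use strict;
use warnings;

# A number is stringified first, so ord sees "65" and reports its "6".
my $from_number = ord(65);         # 54
my $from_chr    = ord(chr(65));    # 65

# Only the first character is consulted.
my $from_multi = ord("ABC");       # 65

# undef yields 0; the warning is suppressed here just for the demo.
my $from_undef;
{
    no warnings 'uninitialized';
    $from_undef = ord(undef);      # 0
}

# Parentheses decide whether the addition happens before or after ord.
my $x = "9";
my $ord_then_add = ord($x) + 1;    # ord("9") is 57, plus 1 gives 58
my $add_then_ord = ord($x + 1);    # 9 + 1 is 10, and ord("10") is 49

print "$from_number $from_chr $from_multi $from_undef\n";   # 54 65 65 0
print "$ord_then_add $add_then_ord\n";                      # 58 49
```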
Differences from upstream#
Fully compatible with upstream Perl 5.42.
See also#
chr: the inverse operation; turns a code point back into a one-character string
sprintf: use %c to format an integer as its character, or %x / %04x to render an ord result in hex
unpack: unpack "U*" for every code point of a string; unpack "C*" for every byte
lc / uc: case-fold a character before taking its code point when you want case-insensitive comparisons
$_: the default argument when ord is called without one