uc#
Return an uppercased copy of a string.
uc walks EXPR character by character and returns a new string in
which every cased character has been replaced by its uppercase
counterpart. Characters that have no uppercase mapping — digits,
punctuation, already-uppercase letters, symbols — are passed through
unchanged. If EXPR is omitted, uc operates on $_. The input is
never modified; uc always returns a fresh string.
uc is the internal function implementing the \U escape in
double-quoted strings, so "\Ufoo\E" and uc("foo") produce the
same result.
Synopsis#
uc EXPR
uc
What you get back#
A new scalar string containing the uppercased version of EXPR. For
most input the length in characters is unchanged, but a few characters
map to a longer sequence under Unicode rules (see Edge cases), so both
the byte length and the character length of the result can grow.
my $s = uc("Perl is GREAT"); # "PERL IS GREAT"
Global state it touches#
- $_: used as the implicit argument when EXPR is omitted.
- LC_CTYPE locale: consulted when use locale is active. uc otherwise ignores the environment.
- use bytes / use feature 'unicode_strings' / use locale: the lexical pragmas in scope at the call site select which of the casing rule sets below is applied.
Which rules apply#
uc picks one of three rule sets based on the pragmas in effect and
the internal representation of the string. The selection logic is the
same as for lc; only the direction of the mapping differs. The cases
are checked in the order listed, and the first one that matches wins.
- use bytes in effect: ASCII rules. Only a-z change, to A-Z respectively. Every byte outside 0x61..0x7A is passed through unchanged, regardless of what it would mean as Latin-1 or UTF-8. Use this only when you have deliberately opted out of Unicode handling.
- use locale for LC_CTYPE in effect: the current locale governs code points below 256; Unicode rules govern the rest (the latter only reachable if the string has the UTF-8 flag set). From v5.20 onward, a UTF-8 locale gives full Unicode rules for the whole string. Under non-UTF-8 locales, case changes that cross the 255/256 boundary are not well-defined and, since v5.22, trigger a locale warning. See perllocale.
- String has the UTF-8 flag set: Unicode rules.
- use feature 'unicode_strings' or use locale ':not_characters' in effect: Unicode rules, even for byte strings.
- Otherwise: ASCII rules. Anything outside a-z is passed through unchanged. This is the historic default for strings that have neither the UTF-8 flag nor a lexical pragma opting into Unicode.
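The effect of the rule selection can be seen with a single Latin-1 byte. The following is a minimal sketch, not a definitive test: hex_of is a hypothetical display helper, and the sample string is chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for display only: code points of a string as hex.
sub hex_of { join " ", map { sprintf("%02X", ord($_)) } split(//, $_[0]) }

# Byte string without the UTF-8 flag; 0xE9 is "é" in Latin-1.
my $bytes = "caf\xE9";

# No pragma, no UTF-8 flag: ASCII rules, 0xE9 passes through.
my $plain = uc $bytes;

# unicode_strings: Unicode rules even for a byte string, 0xE9 becomes 0xC9.
my $unicode = do { use feature 'unicode_strings'; uc $bytes };

# UTF-8 flag set: Unicode rules without any pragma.
my $flagged = $bytes;
utf8::upgrade($flagged);
my $upgraded = uc $flagged;

# use bytes: ASCII rules on the underlying bytes.
my $ascii_only = do { use bytes; uc $bytes };

print "plain:      ", hex_of($plain),      "\n";   # 43 41 46 E9
print "unicode:    ", hex_of($unicode),    "\n";   # 43 41 46 C9
print "upgraded:   ", hex_of($upgraded),   "\n";   # 43 41 46 C9
print "ascii_only: ", hex_of($ascii_only), "\n";   # 43 41 46 E9
```

The same input yields two different results depending only on the pragma in scope and the internal flag, which is the behavior the list above describes.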
The practical takeaway: if you want predictable Unicode uppercasing
of arbitrary input, put use feature 'unicode_strings'; (or a
modern use v5.12; or higher) at the top of the file and stop
worrying about which representation the string happens to have.
Examples#
Basic ASCII:
my $s = uc("hello"); # "HELLO"
Omitted argument operates on $_:
for ("alpha", "beta") {
    print uc, "\n"; # "ALPHA\n", "BETA\n"
}
\U in a double-quoted string is the same operation:
my $s = "Perl is \Ugreat\E"; # "Perl is GREAT"
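Unicode uppercasing under unicode_strings (use utf8 is needed so that the non-ASCII literal in the source is read as characters rather than bytes):
use utf8;
use feature 'unicode_strings';
my $s = uc("straße"); # "STRASSE"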
The default Unicode uppercase mapping of the German sharp s is the
two-character sequence SS (a capital sharp s, U+1E9E, exists, but uc
does not produce it). The returned string is therefore one character
longer than the input.
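A short sketch of the length change, assuming the source file is saved as UTF-8 (hence use utf8); the variable names are illustrative only:

```perl
use strict;
use warnings;
use utf8;                          # the literal below is UTF-8 source text
use feature 'unicode_strings';

my $in  = "straße";
my $out = uc $in;

# The sharp s expands to two characters, so the result is longer.
printf "%d -> %d characters\n", length($in), length($out);   # 6 -> 7 characters

# U+0390 expands to three code points when uppercased.
my $greek = uc "\x{0390}";
printf "1 -> %d characters\n", length($greek);               # 1 -> 3 characters
```

Code that preallocates buffers or computes column widths from the input length should measure the result of uc instead.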
Greek lowercase letters are uppercased to capitals, while punctuation and whitespace pass through unchanged (again with use utf8 for the non-ASCII literal):
use utf8;
use feature 'unicode_strings';
my $s = uc("Καλημέρα, world!"); # "ΚΑΛΗΜΈΡΑ, WORLD!"
Byte mode confines the operation to ASCII, which is occasionally what you want for protocol tokens:
use bytes;
my $s = uc("héllo"); # "HéLLO" — only the ASCII letters change
Edge cases#
- undef: uc(undef) returns "" and triggers an uninitialized warning under use warnings. Guard inputs that may be undefined if that warning matters.
- Empty string: uc("") returns "". No warning.
- One-to-many mappings: A small number of characters expand to multiple characters on uppercase. The best-known is U+00DF (ß) → SS, along with the Greek U+0390 / U+03B0 expansions. The result string is longer than the input in those cases. Do not assume length(uc($s)) == length($s).
- Titlecase vs uppercase: uc does not titlecase the first letter. Titlecase is a distinct Unicode category; ucfirst applies it to the first character and leaves the rest alone.
- Non-cased characters: Digits, punctuation, whitespace, symbols, and CJK ideographs are returned unchanged. uc("42!") is "42!".
- In-place update is not a thing: uc returns a new value; it never modifies its argument. To uppercase in place, assign back: $s = uc $s.
- Context: uc is always scalar. Calling it in list context still produces a single string.
- Byte-string / Unicode-flag surprises: A string built from a read without a :utf8 layer has no UTF-8 flag, so uc without use feature 'unicode_strings' will apply ASCII rules even if the bytes are valid UTF-8. Decode first, or enable the pragma, to get the casing the bytes look like they should get.
- use locale and non-UTF-8 locales (v5.22+): Case changes that would cross the 255/256 boundary emit a Can't do uc("…") on non-UTF-8 locale warning and return the input character unchanged.
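Two of the cases above, the undef guard and decoding before uppercasing, might look like this in practice. This is a sketch under stated assumptions: the raw bytes are written as a literal here to stand in for data read from a handle without a :utf8 layer, and the core Encode module does the decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Guard a possibly undefined value so uc does not warn.
my $maybe;                                   # undef
my $safe = uc($maybe // "");                 # "" and no "uninitialized" warning

# Stand-in for undecoded input: "caf" followed by the UTF-8 bytes of "é".
my $raw = "caf\xC3\xA9";

# No UTF-8 flag, no pragma: ASCII rules, the two high bytes are untouched.
my $as_is = uc $raw;

# Decoded first: the string now holds characters, so Unicode rules apply.
my $decoded = uc(decode("UTF-8", $raw));

printf "as-is:   %d characters\n", length($as_is);     # 5
printf "decoded: %d characters\n", length($decoded);   # 4
```

After decoding, the last character of the result is U+00C9 (É); without decoding, the bytes C3 A9 come back exactly as they went in.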
Differences from upstream#
Fully compatible with upstream Perl 5.42.
See also#
- lc: the inverse operation; lowercases every cased character using the same rule selection
- ucfirst: uppercase (titlecase) only the first character of the string; use it in a Unicode-aware ucfirst . lc $rest pattern when titlecasing a whole word
- lcfirst: lowercase only the first character
- fc: Unicode casefolding for case-insensitive comparison; use this, not uc or lc, when comparing strings for equivalence
- sprintf: the %s conversion does not casefold; combine with uc when a format needs an uppercased field
- tr///: the tr/a-z/A-Z/ idiom uppercases only ASCII and is faster than uc for guaranteed-ASCII input
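As an illustration of two of the related functions, the sketch below compares strings with fc and uppercases an ASCII token with tr///. It assumes Perl 5.16 or later (for fc) and a UTF-8 source file; the sample values are hypothetical.

```perl
use strict;
use warnings;
use utf8;             # the literal below is UTF-8 source text
use feature 'fc';     # fc is available from Perl 5.16

my ($x, $y) = ("Straße", "STRASSE");

# lc leaves the sharp s alone, so a lowercase comparison misses this pair.
my $lc_equal = lc($x) eq lc($y) ? 1 : 0;     # 0

# fc folds the sharp s to "ss", so the folded forms match.
my $fc_equal = fc($x) eq fc($y) ? 1 : 0;     # 1

# tr with /r returns a modified copy; only ASCII a-z are affected.
my $token = "content-length";
my $upper = $token =~ tr/a-z/A-Z/r;          # "CONTENT-LENGTH"

print "lc equal: $lc_equal, fc equal: $fc_equal, token: $upper\n";
```

Without the /r flag, tr/// modifies its operand in place and returns a count rather than the new string.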