SCALARs and strings

uc#

Return an uppercased copy of a string.

uc walks EXPR character by character and returns a new string in which every cased character has been replaced by its uppercase counterpart. Characters that have no uppercase mapping — digits, punctuation, already-uppercase letters, symbols — are passed through unchanged. If EXPR is omitted, uc operates on $_. The input is never modified; uc always returns a fresh string.

uc is the internal function implementing the \U escape in double-quoted strings, so "\Ufoo\E" and uc("foo") produce the same result.

Synopsis#

uc EXPR
uc

What you get back#

A new scalar string containing the uppercased version of EXPR. For most input the length in characters is unchanged, but a few characters map to a longer sequence under Unicode rules (see Edge cases), so both the character length and the byte length of the result can grow.

my $s = uc("Perl is GREAT");   # "PERL IS GREAT"
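
To make the length caveat concrete, here is a minimal sketch that compares character counts before and after uppercasing. It assumes the source file is saved as UTF-8 (hence use utf8) and a Perl new enough for the unicode_strings feature (v5.12 or later); the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;                        # assumption: this source file is UTF-8
use feature 'unicode_strings';

my $in  = "straße";
my $out = uc $in;                # ß maps to the two characters "SS"

# Only ASCII is printed, so no output layer is needed.
printf "%d -> %d characters\n", length($in), length($out);   # 6 -> 7 characters
```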

Global state it touches#

  • $_ — used as the implicit argument when EXPR is omitted.

  • LC_CTYPE locale — consulted when use locale is active. uc otherwise ignores the environment.

  • use bytes / use feature 'unicode_strings' / use locale — the lexical pragmas in scope at the call site select which of the casing rule sets below is applied.
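
Because the pragmas are lexical, the rule set can change from one block to the next in the same file. The following sketch illustrates this with a single Latin-1 byte string; it assumes no locale pragma is in effect, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $byte_str = "\xe9";           # é as one Latin-1 byte, UTF-8 flag off

my $default = uc $byte_str;      # no pragma in scope: ASCII rules, unchanged

my $unicode;
{
    use feature 'unicode_strings';
    $unicode = uc $byte_str;     # Unicode rules inside this block: U+00C9
}

my $after = uc $byte_str;        # the pragma's scope has ended: ASCII rules again

printf "%02X %02X %02X\n", map { ord($_) } $default, $unicode, $after;   # E9 C9 E9
```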

Which rules apply#

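uc picks one of three rule sets (ASCII, locale, or Unicode) based on the pragmas in effect and the internal representation of the string. The conditions below are checked in order, and the first one that matches wins. The selection logic matches lc exactly; only the direction of the mapping differs.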

  • use bytes in effect — ASCII rules. Only a-z change, to A-Z respectively. Every byte outside 0x61..0x7A is passed through unchanged, regardless of what it would mean as Latin-1 or UTF-8. Use this only when you have deliberately opted out of Unicode handling.

  • use locale for LC_CTYPE in effect — the current locale governs code points below 256; Unicode rules govern the rest (the latter only reachable if the string has the UTF-8 flag set). From v5.20 onward, a UTF-8 locale gives full Unicode rules for the whole string. Under non-UTF-8 locales, case changes that cross the 255/256 boundary are not well-defined and, since v5.22, trigger a locale warning. See perllocale.

  • String has the UTF-8 flag set — Unicode rules.

  • use feature 'unicode_strings' or use locale ':not_characters' in effect — Unicode rules, even for byte strings.

  • Otherwise — ASCII rules. Anything outside a-z is passed through unchanged. This is the historic default for strings that have neither the UTF-8 flag nor a lexical pragma opting into Unicode.

The practical takeaway: if you want predictable Unicode uppercasing of arbitrary input, put use feature 'unicode_strings'; (or use v5.12; or later, which enables the feature implicitly) at the top of the file and stop worrying about which representation the string happens to have.
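
The dependence on internal representation is easiest to see by uppercasing the same characters stored both ways. This is a sketch under the assumption that neither use bytes nor use locale is in effect; utf8::upgrade is used only to switch the representation, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $bytes = "caf\xe9";           # Latin-1 bytes, UTF-8 flag off
my $chars = $bytes;
utf8::upgrade($chars);           # same characters, UTF-8 flag on

# No pragma in scope: the internal representation decides.
my $from_bytes = uc $bytes;      # ASCII rules:   "CAF\xe9"
my $from_chars = uc $chars;      # Unicode rules: "CAF\xc9"

# With the pragma, both representations get Unicode rules.
my ($b2, $c2);
{
    use feature 'unicode_strings';
    ($b2, $c2) = (uc $bytes, uc $chars);
}

print "without pragma: ", ($from_bytes eq $from_chars ? "same" : "differ"), "\n";  # differ
print "with pragma:    ", ($b2 eq $c2 ? "same" : "differ"), "\n";                  # same
```

Two strings that compare equal before the call can therefore compare unequal after it, which is the surprise the pragma removes.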

Examples#

Basic ASCII:

my $s = uc("hello");           # "HELLO"

Omitted argument operates on $_:

for ("alpha", "beta") {
    print uc, "\n";            # "ALPHA\n", "BETA\n"
}

\U in a double-quoted string is the same operation:

my $s = "Perl is \Ugreat\E";   # "Perl is GREAT"

Unicode uppercasing under unicode_strings:

use utf8;                      # the literal below is UTF-8 source text
use feature 'unicode_strings';
my $s = uc("straße");          # "STRASSE"

Unicode's default uppercase mapping for the German sharp s is the two-character sequence SS (a capital sharp s, U+1E9E, exists, but uc does not map to it). The returned string is therefore one character longer than the input.

Greek lowercase letters uppercase into capitals, with unchanged characters passed through:

use utf8;                       # the literal below is UTF-8 source text
use feature 'unicode_strings';
my $s = uc("Καλημέρα, world!"); # "ΚΑΛΗΜΈΡΑ, WORLD!"

Byte mode confines the operation to ASCII, which is occasionally what you want for protocol tokens:

use bytes;
my $s = uc("héllo");           # "HéLLO" — only the ASCII letters change

Edge cases#

  • undef: uc(undef) returns "" and triggers an uninitialized warning under use warnings. Guard inputs that may be undefined if that warning matters.

  • Empty string: uc("") returns "". No warning.

  • One-to-many mappings: A small number of characters expand to multiple characters when uppercased. The best-known is U+00DF (ß) → SS; the Greek letters U+0390 and U+03B0 each expand to three characters. The result string is longer than the input in those cases. Do not assume length(uc($s)) == length($s).

  • Titlecase vs uppercase: uc does not titlecase the first letter. Titlecase is a distinct Unicode category; ucfirst applies it to the first character and leaves the rest alone.

  • Non-cased characters: Digits, punctuation, whitespace, symbols, and CJK ideographs are returned unchanged. uc("42!") is "42!".

  • In-place update is not a thing: uc returns a new value; it never modifies its argument. To uppercase in place, assign back: $s = uc $s.

  • Context: uc is always scalar. Calling it in list context still produces a single string.

  • Byte-string / Unicode-flag surprises: A string built from a read without a :utf8 layer has no UTF-8 flag, so uc without use feature 'unicode_strings' will apply ASCII rules even if the bytes are valid UTF-8. Decode first, or enable the pragma, to get the casing the bytes look like they should get.

  • use locale and non-UTF-8 locales (v5.22+): Case changes that would cross the 255/256 boundary emit a Can't do uc("…") on non-UTF-8 locale warning and return the input character unchanged.
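
Two of the cases above, undefined input and undecoded UTF-8 bytes, can be handled as in the following sketch. safe_uc is a hypothetical helper, not part of Perl, and the raw bytes stand in for data read from a handle without a :utf8 layer; Encode's decode is the standard core-module call.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: treat undef as the empty string so that no
# "uninitialized" warning is raised.
sub safe_uc {
    my ($value) = @_;
    return uc($value // '');
}

# Raw UTF-8 bytes for "é", UTF-8 flag off (as read without a :utf8 layer).
my $raw = "\xc3\xa9";

my $undecoded = uc($raw);                    # ASCII rules: both bytes unchanged
my $decoded   = uc(decode('UTF-8', $raw));   # one character, uppercased to U+00C9

print safe_uc(undef) eq '' ? "empty\n" : "not empty\n";   # empty
printf "%d %d\n", length($undecoded), length($decoded);    # 2 1
printf "U+%04X\n", ord($decoded);                          # U+00C9
```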

Differences from upstream#

Fully compatible with upstream Perl 5.42.

See also#

  • lc — the counterpart operation; lowercases every cased character using the same rule selection (not a strict inverse: lc(uc("ß")) is "ss")

  • ucfirst — uppercases (titlecases) only the first character of the string; combine as ucfirst(lc $word) to titlecase a whole word

  • lcfirst — lowercase only the first character

  • fc — Unicode casefolding for case-insensitive comparison; use this, not uc or lc, when comparing strings for equivalence

  • sprintf — the %s conversion does not change case; combine with uc when a format needs an uppercased field

  • tr/// — the tr/a-z/A-Z/ idiom uppercases only ASCII letters, regardless of pragmas or the UTF-8 flag, and can be faster than uc for guaranteed-ASCII input
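
As a rough illustration of how fc and tr/// differ from uc, the sketch below compares two spellings that use the lowercase and capital sharp s, then uppercases a string both ways. It assumes Perl v5.16 or later for the fc feature; the strings are written with \x escapes so the file itself stays ASCII, and all variable names are illustrative.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);

my $lower = "stra\x{DF}e";       # "straße" with U+00DF
my $upper = "STRA\x{1E9E}E";     # "STRAẞE" with capital sharp s, U+1E9E

# uc maps U+00DF to "SS" but leaves U+1E9E as it is, so the results differ.
my $uc_equal = uc($lower) eq uc($upper);
# fc folds both to "strasse".
my $fc_equal = fc($lower) eq fc($upper);

# tr/a-z/A-Z/ touches only ASCII letters; uc follows the Unicode rules.
(my $ascii_only = "h\x{E9}llo") =~ tr/a-z/A-Z/;   # "H\x{E9}LLO"
my $unicode     = uc "h\x{E9}llo";                # "H\x{C9}LLO"

print "uc comparison: ", ($uc_equal ? "equal" : "different"), "\n";   # different
print "fc comparison: ", ($fc_equal ? "equal" : "different"), "\n";   # equal
print "tr and uc agree: ", ($ascii_only eq $unicode ? "yes" : "no"), "\n";   # no
```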