---
name: uc
signature: 'uc EXPR'
signatures:
  - 'uc EXPR'
  - 'uc'
since: 5.0
status: documented
categories: ["SCALARs and strings"]
---

```{index}
single: uc; Perl built-in
```

*[SCALARs and strings](../perlfunc-by-category)*

# uc

Return an uppercased copy of a string.

`uc` walks `EXPR` character by character and returns a new string in which every cased character has been replaced by its uppercase counterpart. Characters that have no uppercase mapping (digits, punctuation, already-uppercase letters, symbols) are passed through unchanged.

If `EXPR` is omitted, `uc` operates on [`$_`](../perlvar). The input is never modified; `uc` always returns a fresh string.

`uc` is the internal function implementing the `\U` escape in double-quoted strings, so `"\Ufoo\E"` and `uc("foo")` produce the same result.

## Synopsis

```perl
uc EXPR
uc
```

## What you get back

A new scalar string containing the uppercased version of `EXPR`. The length in characters is unchanged for most input, but a few characters map to a longer sequence under Unicode rules (see *Edge cases*), so the byte length and character length of the result can both grow.

```perl
my $s = uc("Perl is GREAT");    # "PERL IS GREAT"
```

## Global state it touches

- [`$_`](../perlvar): used as the implicit argument when `EXPR` is omitted.
- `LC_CTYPE` locale: consulted when `use locale` is active. `uc` otherwise ignores the environment.
- `use bytes` / `use feature 'unicode_strings'` / `use locale`: the lexical pragmas in scope at the **call site** select which of the casing rule sets below is applied.

## Which rules apply

`uc` picks one of three rule sets (ASCII, locale, or Unicode) based on the pragmas in effect and the internal representation of the string. The conditions below are checked in order, and the first one that matches wins. The selection logic matches [`lc`](lc) exactly; only the direction of the mapping differs.

- **`use bytes` in effect**: ASCII rules. Only `a-z` change, to `A-Z` respectively.
Every byte outside `0x61..0x7A` is passed through unchanged, regardless of what it would mean as Latin-1 or UTF-8. Use this only when you have deliberately opted out of Unicode handling.
- **`use locale` for `LC_CTYPE` in effect**: the current locale governs code points below 256; Unicode rules govern the rest (the latter only reachable if the string has the UTF-8 flag set). From v5.20 onward, a UTF-8 locale gives full Unicode rules for the whole string. Under non-UTF-8 locales, case changes that cross the 255/256 boundary are not well-defined and, since v5.22, trigger a locale warning. See `perllocale`.
- **String has the UTF-8 flag set**: Unicode rules.
- **`use feature 'unicode_strings'` or `use locale ':not_characters'` in effect**: Unicode rules, even for byte strings.
- **Otherwise**: ASCII rules. Anything outside `a-z` is passed through unchanged. This is the historic default for strings that have neither the UTF-8 flag nor a lexical pragma opting into Unicode.

The practical takeaway: if you want predictable Unicode uppercasing of arbitrary input, put `use feature 'unicode_strings';` (or `use v5.12;` or higher, which enables the feature) at the top of the file and stop worrying about which representation the string happens to have.

## Examples

Basic ASCII:

```perl
my $s = uc("hello");            # "HELLO"
```

Omitted argument operates on [`$_`](../perlvar):

```perl
for ("alpha", "beta") {
    print uc, "\n";             # "ALPHA\n", "BETA\n"
}
```

`\U` in a double-quoted string is the same operation:

```perl
my $s = "Perl is \Ugreat\E";    # "Perl is GREAT"
```

Unicode uppercasing under `unicode_strings`:

```perl
use utf8;                       # the source literal contains non-ASCII characters
use feature 'unicode_strings';
my $s = uc("straße");           # "STRASSE"
```

The default Unicode uppercase mapping of the German sharp s is the two-character sequence `SS`, not a single character. The returned string is therefore one character longer than the input.

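
The rule selection described under *Which rules apply* can be observed directly on a byte string. The following is a minimal sketch rather than an exhaustive test: the variable names are illustrative, and it assumes an ASCII platform with neither `use locale` nor `use bytes` in scope. Non-ASCII characters are written as `\x` escapes so the result does not depend on the source file's encoding.

```perl
use strict;
use warnings;

# "\xE9" is e-acute stored as a single byte; the string has no UTF-8 flag.
my $byte_str = "caf\xE9";

my $ascii_result = do {
    no feature 'unicode_strings';    # historic default: ASCII rules
    uc $byte_str;
};

my $unicode_result = do {
    use feature 'unicode_strings';   # Unicode rules even for byte strings
    uc $byte_str;
};

# One-to-many mapping: "\xDF" is the sharp s, which uppercases to "SS".
my $sharp = "stra\xDFe";
my $upper = do {
    use feature 'unicode_strings';
    uc $sharp;
};

print "ASCII rules leave \\xE9 alone: ",
    ($ascii_result eq "CAF\xE9" ? "yes" : "no"), "\n";
print "Unicode rules map it to \\xC9: ",
    ($unicode_result eq "CAF\xC9" ? "yes" : "no"), "\n";
print "length before and after: ", length($sharp), " and ", length($upper), "\n";
```

Both `do` blocks operate on the same input; only the lexical pragma at the call site differs, which is what decides the outcome for a string without the UTF-8 flag.
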

Greek lowercase letters map to their capitals; characters without an uppercase mapping pass through unchanged:

```perl
use utf8;                       # the source literal contains non-ASCII characters
use feature 'unicode_strings';
my $s = uc("Καλημέρα, world!"); # "ΚΑΛΗΜΈΡΑ, WORLD!"
```

Byte mode confines the operation to ASCII, which is occasionally what you want for protocol tokens:

```perl
use bytes;
my $s = uc("héllo");            # "HéLLO": only the ASCII letters change
```

## Edge cases

- **[`undef`](undef)**: `uc(undef)` returns `""` and triggers an `uninitialized` warning under `use warnings`. Guard inputs that may be undefined if that warning matters.
- **Empty string**: `uc("")` returns `""`. No warning.
- **One-to-many mappings**: A small number of characters expand to multiple characters when uppercased. The best-known is U+00DF (`ß`), which becomes `SS`; the Greek letters U+0390 and U+03B0 also expand. The result string is longer than the input in those cases. Do not assume `length(uc($s)) == length($s)`.
- **Titlecase vs uppercase**: `uc` does **not** titlecase the first letter. Titlecase is a distinct Unicode category; [`ucfirst`](ucfirst) applies it to the first character and leaves the rest alone.
- **Non-cased characters**: Digits, punctuation, whitespace, symbols, and CJK ideographs are returned unchanged. `uc("42!")` is `"42!"`.
- **In-place update is not a thing**: `uc` returns a new value; it never modifies its argument. To uppercase in place, assign back: `` $s = uc $s ``.
- **Context**: `uc` is always scalar. Calling it in list context still produces a single string.
- **Byte-string / Unicode-flag surprises**: A string filled by [`read`](read) from a handle without a `:utf8` layer has no UTF-8 flag, so `uc` without `use feature 'unicode_strings'` will apply **ASCII rules** even if the bytes are valid UTF-8. Decode the bytes first (for example with `Encode::decode`) to get the casing the text should have. Enabling the pragma alone does not help here, because it makes `uc` treat each undecoded byte as a separate Latin-1 character.
- **`use locale` and non-UTF-8 locales (v5.22+)**: Case changes that would cross the 255/256 boundary emit a `Can't do uc("…") on non-UTF-8 locale` warning and return the input character unchanged.

## Differences from upstream

Fully compatible with upstream Perl 5.42.

## See also

- [`lc`](lc): the inverse operation; lowercases every cased character using the same rule selection
- [`ucfirst`](ucfirst): uppercases (titlecases) only the first character of the string; combine it with [`lc`](lc), as in `ucfirst lc $word`, when titlecasing a whole word
- [`lcfirst`](lcfirst): lowercases only the first character
- [`fc`](fc): Unicode casefolding for case-insensitive comparison; use this, not `uc` or [`lc`](lc), when comparing strings for equivalence
- [`sprintf`](sprintf): the `%s` conversion does not change case; combine with `uc` when a format needs an uppercased field
- [`tr///`](../perlop): the `tr/a-z/A-Z/` idiom uppercases only ASCII and can be faster than `uc` for guaranteed-ASCII input; unlike `uc`, it modifies its target in place unless the `/r` flag is given
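
The [`fc`](fc) entry above advises against comparing strings through `uc`. As a minimal sketch of why, the example below compares U+1E9E (LATIN CAPITAL LETTER SHARP S) with the plain string `ss`. It assumes Perl 5.16 or later, where the `fc` feature is available; the variable names are illustrative only.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);

# U+1E9E is written as an escape so the example does not depend on the
# encoding of the source file.
my $capital_sharp_s = "\x{1E9E}";
my $plain           = "ss";

# uc leaves U+1E9E alone (it is already uppercase) and turns "ss" into "SS",
# so the uppercased copies do not match.
my $equal_by_uc = uc($capital_sharp_s) eq uc($plain) ? 1 : 0;

# fc folds both strings to "ss", so the casefolded copies compare equal.
my $equal_by_fc = fc($capital_sharp_s) eq fc($plain) ? 1 : 0;

print "equal after uc: $equal_by_uc\n";
print "equal after fc: $equal_by_fc\n";
```

Uppercasing is meant for display; casefolding is the operation designed for equivalence tests, which is why the two disagree on inputs like this one.
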