---
name: ucfirst
signature: 'ucfirst EXPR'
signatures:
  - 'ucfirst EXPR'
  - 'ucfirst'
since: 5.0
status: documented
categories: ["SCALARs and strings"]
---

```{index}
single: ucfirst; Perl built-in
```

*[SCALARs and strings](../perlfunc-by-category)*

# ucfirst

Return a copy of a string with its first character titlecased.

`ucfirst` takes a string, converts its **first character** to the Unicode **titlecase** form, and returns the result. Every other character is left untouched. The original is never modified. If the argument is omitted, `ucfirst` operates on [`$_`](../perlvar).

Titlecase usually coincides with uppercase — `"a"` becomes `"A"` — but differs for a small number of digraphs in scripts that distinguish the two (see *Titlecase vs uppercase*). The same pragma, locale, and UTF-8-flag rules that govern [`lc`](lc) and [`uc`](uc) govern `ucfirst`.

## Synopsis

```perl
ucfirst EXPR
ucfirst
```

## What you get back

A new string, normally of the **same length in characters** as the input — only the first character is altered (the `ß` case under *Edge cases* is the exception). The return value is always a fresh scalar; modifying it does not affect the argument, and modifying the argument afterward does not affect the returned string. The byte length may change by a few bytes if the first character's titlecase form has a different UTF-8 encoding from its original form.

```perl
my $str = ucfirst("hello world!");   # "Hello world!"
```

## Global state it touches

- Reads [`$_`](../perlvar) when called without an argument.
- Observes `use locale` for `LC_CTYPE` — with locale in effect the current locale's uppercasing table applies to a first character whose code point is below 256.
- Observes `use bytes` — under `use bytes` only a leading `a`-`z` changes, to `A`-`Z`.
- Observes `use feature 'unicode_strings'` — forces Unicode rules regardless of the UTF-8 flag on the input.

## Which casing rules apply

`ucfirst` picks one of five rulesets for the **first character**, in priority order. The first one whose condition holds wins.
Subsequent characters are always passed through unchanged, regardless of which ruleset applied to the first.

1. **`use bytes` in effect** — ASCII rules. A leading `a`-`z` maps to `A`-`Z`; every other leading byte is left alone, including bytes in the 128-255 range that would otherwise be Latin-1 letters.
2. **`use locale` for `LC_CTYPE` in effect** — the current locale's tables apply to a first character below code point 256; a first character at 256 or above (only reachable when the string already carries the UTF-8 flag) uses Unicode rules. From Perl 5.20 onward, a UTF-8 locale uses full Unicode rules throughout.
3. **The argument has the UTF-8 flag set** — Unicode titlecase rules apply to the first character.
4. **`use feature 'unicode_strings'` or `use locale ':not_characters'` in effect** — Unicode titlecase rules apply to the first character, regardless of the UTF-8 flag.
5. **Otherwise** — ASCII rules. A leading character outside `a`-`z` is returned unchanged, including Latin-1 lowercase letters like `à`-`þ`.

The upshot: if you want predictable Unicode behaviour on every string regardless of how it was constructed, enable `use feature 'unicode_strings'` (or `use v5.12` and above, which turns it on for you).

## Titlecase vs uppercase

For the overwhelming majority of characters, titlecase and uppercase are the same code point — `ucfirst("abc")` and `uc(substr("abc", 0, 1)) . substr("abc", 1)` produce identical results. They differ only for a handful of Unicode characters whose uppercase form is a sequence of two capital letters but whose titlecase form is a single capital-followed-by-lowercase letter. The standard example is the Latin digraph `dz` (U+01F3):

- uppercase is `DZ` (U+01F1) — `"DZ"` rendered as one capital glyph.
- titlecase is `Dz` (U+01F2) — `"Dz"`, capital `D` followed by small `z`.

`ucfirst` returns titlecase, which is the correct form when the character stands at the start of a capitalised word.
[`uc`](uc) on the same first character would return the two-capital form, which reads as if the whole word were in caps.

## Examples

Basic ASCII first-character uppercasing:

```perl
my $s = ucfirst("hello, world!");   # "Hello, world!"
```

Default to [`$_`](../perlvar):

```perl
for ("foo", "bar", "baz") {
    print ucfirst, "\n";            # Foo / Bar / Baz
}
```

Unicode characters only titlecase when the rules allow it:

```perl
use utf8;                           # decode the UTF-8 source literal below as characters
use feature 'unicode_strings';
my $s = ucfirst("äpfel");           # "Äpfel"
```

Without `unicode_strings` and without the UTF-8 flag on the input, a non-ASCII first character passes through unchanged:

```perl
my $bytes = "\xE4pfel";             # ä as a single Latin-1 byte
my $cap   = ucfirst $bytes;         # unchanged: "\xE4pfel"
```

`ucfirst` is what backs the `\u` escape inside double-quoted strings:

```perl
my $str = "\uperl\E is great";      # "Perl is great"
```

Capitalise every word — combine with [`split`](split), [`map`](map), and [`join`](join):

```perl
my $title = join " ", map { ucfirst lc $_ } split / /, "HELLO world";
# "Hello World"
```

Titlecase a digraph that distinguishes titlecase from uppercase:

```perl
use feature 'unicode_strings';
my $t = ucfirst("\x{01F3}xyz");     # "\x{01F2}xyz" — Dz, not DZ
```

## Edge cases

- **[`undef`](undef) argument** raises an `uninitialized` warning under `use warnings` and returns the empty string.
- **Empty string** returns the empty string — there is no first character to change.
- **Single-character string** behaves exactly like [`uc`](uc) would for that one character under the same ruleset, except in the digraph cases described above where titlecase and uppercase differ.
- **Numbers are stringified first**: `ucfirst(42)` returns `"42"` — the leading `"4"` is not a cased character, so nothing changes.
- **Leading whitespace or punctuation** is **not** skipped. `ucfirst(" hello")` returns `" hello"` — the first character is the space, which has no titlecase form.
  If you need to capitalise the first *letter*, strip leading non-letters first or use a regex (`s/\b(\w)/\u$1/`).
- **`LATIN SMALL LETTER SHARP S` (U+00DF, `ß`)** titlecases to `"Ss"` under full Unicode rules — a length change that contradicts the usual "same length in characters" guarantee. Under `use locale` on a non-UTF-8 locale, Perl leaves the character unchanged rather than guessing, and from Perl 5.22 onward this raises a locale warning.
- **Locale and the UTF-8 flag interact**: under `use locale` on a non-UTF-8 locale, a first character below 256 follows the locale while a first character 256 or above follows Unicode. In practice `ucfirst` only ever touches one character, so the interaction is subtler than for [`lc`](lc) / [`uc`](uc), but it is still observable in strings that begin with a character at code point 256 or above.
- **Tied variables** have their `FETCH` called once. `ucfirst` does not modify the tied value; it returns a plain (non-tied) scalar.
- **`ucfirst` is a unary named operator**, not a list operator. `ucfirst $a, $b` parses as `(ucfirst $a), $b`: only `$a` is transformed and `$b` passes through untouched (in scalar context the comma operator then discards the `ucfirst` result entirely). Use parentheses or [`map`](map) when you mean to titlecase multiple strings:

  ```perl
  my @capped = map { ucfirst } @words;
  ```

## Differences from upstream

Fully compatible with upstream Perl 5.42.

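
One way to spot-check that claim against a local interpreter is a short script that exercises a few of the behaviours documented on this page. The following is a minimal sketch, assuming only core Perl with the bundled `Test::More` module; the test descriptions are illustrative and the script is not taken from the upstream test suite.

```perl
use strict;
use warnings;
use Test::More tests => 7;

# ASCII input: only the first character changes.
is(ucfirst("hello world!"), "Hello world!", 'ASCII first letter');

# Leading whitespace is not skipped; the space has no titlecase form.
is(ucfirst(" hello"), " hello", 'leading space left alone');

# The empty string has no first character to change.
is(ucfirst(""), "", 'empty string');

# A code point above 255 forces the UTF-8 flag, so Unicode rules apply:
# U+01F3 titlecases to U+01F2, not to the uppercase U+01F1.
is(ucfirst("\x{01F3}xyz"), "\x{01F2}xyz", 'digraph uses titlecase');

# No unicode_strings in this scope and no UTF-8 flag on the string:
# ASCII rules apply and the Latin-1 byte 0xE4 passes through.
is(ucfirst("\xE4pfel"), "\xE4pfel", 'Latin-1 byte unchanged');

{
    use feature 'unicode_strings';

    # Same byte string, but Unicode rules are now forced.
    is(ucfirst("\xE4pfel"), "\xC4pfel", 'Latin-1 letter titlecased');

    # Sharp s expands to two characters under full Unicode rules.
    is(ucfirst("\xDFa"), "Ssa", 'sharp s becomes Ss');
}
```

Run it with `perl` or `prove`; any failing line points at the ruleset or edge case where the interpreter under test diverges from the description above.
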
## See also

- [`lcfirst`](lcfirst) — the lowercasing counterpart; turns the first character into its lowercase form
- [`uc`](uc) — uppercase every character in the string; use when you want the whole string capitalised, not just the first letter
- [`fc`](fc) — Unicode casefold, the correct choice for case-insensitive comparison when input may contain non-ASCII
- [`$_`](../perlvar) — the default subject `ucfirst` reads when called with no argument
- `use locale` — controls whether locale tables or ASCII/Unicode rules apply to the first character when it is below code point 256
- `use feature 'unicode_strings'` — forces full Unicode titlecasing regardless of the UTF-8 flag on the input