SCALARs and strings

ucfirst

Return a copy of a string with its first character titlecased.

ucfirst takes a string, converts its first character to its Unicode titlecase form, and returns the result. Every other character is left untouched, and the original string is never modified. If the argument is omitted, ucfirst operates on $_. Titlecase usually coincides with uppercase ("a" becomes "A") but differs for a small number of characters: chiefly digraphs such as dz in scripts that distinguish the two, plus a few characters like ß whose full mappings expand to more than one letter (see Titlecase vs uppercase and Edge cases). The same pragma, locale, and UTF-8-flag rules that govern lc and uc govern ucfirst.

Synopsis

ucfirst EXPR
ucfirst

What you get back

Usually a new string of the same length in characters as the input, since only the first character is altered. The exception is a first character whose titlecase form is more than one character, such as ß becoming "Ss" (see Edge cases). The return value is always a fresh scalar; modifying it does not affect the argument, and modifying the argument afterward does not affect the returned string. The byte length may also change if the first character's titlecase form has a different UTF-8 encoding length from the original.

my $str = ucfirst("hello world!");   # "Hello world!"
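
A minimal sketch of the copy semantics above; the variable names are illustrative only. The call leaves its argument untouched, and later changes to the argument do not reach the returned string.

```perl
use strict;
use warnings;

my $orig = "hello world!";
my $copy = ucfirst $orig;         # a new scalar: "Hello world!"

# ucfirst did not touch $orig, and changing $orig now does not
# reach back into $copy.
$orig .= " again";

print "$orig\n";                  # hello world! again
print "$copy\n";                  # Hello world!
```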

Global state it touches

  • Reads $_ when called without an argument.

  • Observes use locale for LC_CTYPE — with locale in effect the current locale’s uppercasing table applies to a first character whose code point is below 256.

  • Observes use bytes — under use bytes only a leading a-z changes, to A-Z.

  • Observes use feature 'unicode_strings' — forces Unicode rules regardless of the UTF-8 flag on the input.
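
The pragmas in this list are lexically scoped and are fixed at compile time, while $_ is read at run time. The sketch below shows both. It assumes no use v5.12 or later at file scope (that would turn unicode_strings on everywhere), and cap_plain is a hypothetical helper, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper, compiled outside any unicode_strings scope.
sub cap_plain { return ucfirst $_[0] }

my $latin1 = "\xE9t\xE9";            # "été" as Latin-1 bytes, UTF-8 flag off

my ($inside, $via_sub) = do {
    use feature 'unicode_strings';   # lexical: applies to code in this block only
    (ucfirst($latin1), cap_plain($latin1));
};
# $inside  is "\xC9t\xE9" (É): Unicode rules applied.
# $via_sub is "\xE9t\xE9": cap_plain was compiled without the feature.

# $_, by contrast, is read at run time by a bare ucfirst.
my $topic = do { local $_ = "perl"; ucfirst };   # "Perl"
```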

Which casing rules apply

ucfirst picks one of five rulesets for the first character, in priority order. The first one whose condition holds wins. Subsequent characters are always passed through unchanged, regardless of which ruleset applied to the first.

  1. use bytes in effect — ASCII rules. A leading a-z maps to A-Z; every other leading byte is left alone, including bytes in the 128-255 range that would otherwise be Latin-1 letters.

  2. use locale for LC_CTYPE in effect — the current locale’s tables apply to a first character below code point 256; a first character at 256 or above (only reachable when the string already carries the UTF-8 flag) uses Unicode rules. From Perl 5.20 onward, a UTF-8 locale uses full Unicode rules throughout.

  3. The argument has the UTF-8 flag set — Unicode titlecase rules apply to the first character.

  4. use feature 'unicode_strings' or use locale ':not_characters' in effect — Unicode titlecase rules apply to the first character, regardless of the UTF-8 flag.

  5. Otherwise — ASCII rules. A leading character outside a-z is returned unchanged, including Latin-1 lowercase letters like à-þ.

The upshot: if you want predictable Unicode behaviour on every string regardless of how it was constructed, enable use feature 'unicode_strings' (or use v5.12 and above, which turns it on for you).
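
To see the rules side by side, here is a sketch that runs one Latin-1 input through rules 5, 3, and 4. It uses the core utf8::upgrade function, which switches on the internal UTF-8 flag without changing the string's characters. Locale is left out because its result depends on the system's locale settings, and rule 1 (use bytes) gives the same result as rule 5 on this input.

```perl
use strict;
use warnings;

my $bytes = "\xE4pfel";              # Latin-1 "äpfel", UTF-8 flag off

# Rule 5 (fallback): ASCII rules leave the leading \xE4 alone.
my $r5 = ucfirst $bytes;             # "\xE4pfel"

# Rule 3: the same characters with the UTF-8 flag on get Unicode rules.
my $flagged = $bytes;
utf8::upgrade($flagged);             # changes the internal encoding, not the characters
my $r3 = ucfirst $flagged;           # "\xC4pfel" (Äpfel)

# Rule 4: unicode_strings gives Unicode rules whatever the flag says.
my $r4 = do { use feature 'unicode_strings'; ucfirst $bytes };   # "\xC4pfel"
```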

Titlecase vs uppercase

For the overwhelming majority of characters, titlecase and uppercase are the same code point: ucfirst("abc") and uc(substr("abc", 0, 1)) . substr("abc", 1) produce identical results. They differ only for a handful of characters. The most visible are the Latin digraphs, whose uppercase form is a single code point showing two capital letters while the titlecase form shows a capital followed by a lowercase letter; a few others, such as ß, differ because their full mappings expand to more than one character. The standard example is the Latin digraph dz (U+01F3):

  • uppercase is DZ (U+01F1) — "DZ" rendered as one capital glyph.

  • titlecase is Dz (U+01F2) — "Dz", capital D followed by small z.

ucfirst returns titlecase, which is the correct form when the character stands at the start of a capitalised word. uc on the same first character would return the two-capital form, which reads as if the whole word were in caps.
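
The difference is easy to check directly. This sketch compares uc and ucfirst on the dz digraph and on ß; unicode_strings is enabled so that the non-UTF-8 "\x{DF}" literal is also cased by Unicode rules.

```perl
use strict;
use warnings;
use feature 'unicode_strings';

my $dz = "\x{01F3}";                 # LATIN SMALL LETTER DZ
my $uc = uc $dz;                     # "\x{01F1}", DZ: both letters capital
my $tc = ucfirst $dz;                # "\x{01F2}", Dz: capital D, small z

# ß differs too: Unicode's full mappings expand it to two letters.
my $ss_uc = uc "\x{DF}";             # "SS"
my $ss_tc = ucfirst "\x{DF}";        # "Ss"

printf "uc: U+%04X  ucfirst: U+%04X\n", ord $uc, ord $tc;
# uc: U+01F1  ucfirst: U+01F2
```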

Examples

Basic ASCII first-character uppercasing:

my $s = ucfirst("hello, world!");     # "Hello, world!"

Default to $_:

for ("foo", "bar", "baz") {
    print ucfirst, "\n";              # Foo / Bar / Baz
}

Unicode characters only titlecase when the rules allow it:

use utf8;                              # source is UTF-8, so "ä" is one character
use feature 'unicode_strings';
my $s = ucfirst("äpfel");             # "Äpfel"

Without unicode_strings and without the UTF-8 flag on the input, a non-ASCII first character passes through unchanged:

my $bytes = "\xE4pfel";               # ä as a single Latin-1 byte
my $cap   = ucfirst $bytes;           # unchanged: "\xE4pfel"

ucfirst is what backs the \u escape inside double-quoted strings:

my $str = "\uperl\E is great";        # "Perl is great"
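
Because \u is backed by ucfirst, the two spellings below produce the same string. \u affects only the character that follows it, and it combines with \L (which lowercases up to the end of the string or a \E) to give the effect of ucfirst(lc ...) inside an interpolated string. The names here are illustrative.

```perl
use strict;
use warnings;

my $name     = "larry";
my $greeting = "Hello, \u$name!";                # \u titlecases only the next character
my $same     = "Hello, " . ucfirst($name) . "!"; # the explicit spelling

my $shout = "WALL";
my $fixed = "\u\L$shout";                        # lc first, then ucfirst: "Wall"
```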

Capitalise every word — combine with split, map, and join:

my $title = join " ", map { ucfirst lc $_ } split / /, "HELLO world";
# "Hello World"
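
The one-liner above splits only on single spaces, so a word that follows a tab or newline is not capitalised. A sketch of a variant that handles any whitespace and keeps it as it was, using a capturing split so the separators come back as list elements; title_words is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: title-case each word but keep the original whitespace.
# The capture group makes split return the separators as list elements too;
# ucfirst lc leaves whitespace-only elements as they were.
sub title_words {
    my ($text) = @_;
    return join '', map { ucfirst lc } split /(\s+)/, $text;
}

my $t = title_words("HELLO   big\tworld");   # "Hello   Big\tWorld"
```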

Titlecase a digraph that distinguishes titlecase from uppercase:

use feature 'unicode_strings';
my $t = ucfirst("\x{01F3}xyz");        # "\x{01F2}xyz" — Dz, not DZ

Edge cases

  • undef argument raises an uninitialized warning under use warnings and returns the empty string.

  • Empty string returns the empty string — there is no first character to change.

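  • Single-character string behaves exactly like uc would for that one character under the same ruleset, except where titlecase and uppercase differ: the digraphs described above, and characters such as ß whose full mappings differ (under Unicode rules uc("ß") gives "SS" while ucfirst("ß") gives "Ss").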

  • Numbers are stringified first: ucfirst(42) returns "42" — the leading "4" is not a cased character, so nothing changes.

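  • Leading whitespace or punctuation is not skipped. ucfirst(" hello") returns " hello": the first character is the space, which has no titlecase form. If you need to capitalise the first letter, strip leading non-letters first or use a regex such as s/(\p{L})/\u$1/, which without /g changes only the first letter in the string. A \w-based pattern like s/\b(\w)/\u$1/ also matches digits and underscores, so it leaves a string such as "42 apples" unchanged.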

  • LATIN SMALL LETTER SHARP S (U+00DF, ß) titlecases to "Ss" under full Unicode rules, a length change that breaks the usual "same length in characters" rule. Under use locale on a non-UTF-8 locale, ß is below code point 256, so the locale's own single-character table applies instead; typical single-byte locales map ß to itself, and the character comes back unchanged.

  • Locale and the UTF-8 flag interact: under use locale on a non-UTF-8 locale, a first character below 256 follows the locale while a first character at 256 or above follows Unicode. Because ucfirst touches only one character, the interaction is subtler than for lc and uc, but it is observable when a first character at 256 or above would titlecase to characters below 256, such as the ligature ﬀ (U+FB00), whose Unicode titlecase is "Ff". Perl will not mix the two rulesets, so it returns such a character unchanged, and from Perl 5.22 onward it raises a locale warning.

  • Tied variables have their FETCH called once. ucfirst does not modify the tied value; it returns a plain (non-tied) scalar.

  • ucfirst is a named unary operator, not a list operator. ucfirst $a, $b parses as (ucfirst $a), $b, so only $a is transformed and $b passes through unchanged. Use parentheses or map when you mean to titlecase multiple strings:

    my @capped = map { ucfirst } @words;
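
The edge cases above can be checked in a few lines. ucfirst_letter is a hypothetical helper showing the regex approach for strings that start with non-letters; s///r needs Perl 5.14 or later.

```perl
use strict;
use warnings;
use feature 'unicode_strings';

my $empty = ucfirst "";                          # ""
my $num   = ucfirst 42;                          # "42": digits have no case
my $space = ucfirst " hello";                    # " hello": the space comes first
my $sharp = ucfirst "\x{DF}";                    # "Ss": one character became two

# undef is treated as "" (with an uninitialized warning unless silenced).
my $from_undef = do { no warnings 'uninitialized'; ucfirst undef };   # ""

# Hypothetical helper: titlecase the first letter wherever it appears.
# Without /g, s/// changes only the first match; /r returns a modified copy.
sub ucfirst_letter { return $_[0] =~ s/(\p{L})/\u$1/r }
my $fixed = ucfirst_letter(" hello");            # " Hello"

# Named unary operator: only the first operand is transformed.
my @pair = (ucfirst "a", "b");                   # ("A", "b")
```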
    

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • lcfirst — the lowercasing counterpart; turns the first character into its lowercase form

  • uc — uppercase every character in the string; use when you want the whole string capitalised, not just the first letter

  • fc — Unicode casefold, the correct choice for case-insensitive comparison when input may contain non-ASCII

  • $_ — the default subject ucfirst reads when called with no argument

  • use locale — controls whether locale tables or ASCII/Unicode rules apply to the first character when it is below code point 256

  • use feature 'unicode_strings' — forces full Unicode titlecasing regardless of the UTF-8 flag on the input