SCALARs and strings

lc#

Return a lowercased copy of a string.

lc takes a string, converts every cased character to its lowercase equivalent, and returns the result. The original is never modified. If the argument is omitted, lc operates on $_. The exact set of characters that change depends on the string’s encoding and the pragmas in scope at the call site — ASCII-only by default, full Unicode under use feature 'unicode_strings' or when the input is already a character string with the UTF-8 flag.

Synopsis#

lc EXPR
lc

What you get back#

A new string. The return value is always a fresh scalar: modifying it does not affect the argument, and modifying the argument afterward does not affect the returned string. Length in characters is usually preserved, but not always: under Unicode rules lc applies full case mappings, so LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) lowercases to the two-character sequence U+0069 U+0307. See fc for case folding, which changes length more often.

my $str = lc("Perl is GREAT");   # "perl is great"
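
The independence of argument and result can be checked directly. The following is a minimal sketch; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $original = "Perl is GREAT";
my $lowered  = lc $original;    # fresh copy: "perl is great"

$lowered  .= "!";               # changes only the copy
$original  = "changed";         # does not reach back into $lowered

print "$original\n";            # changed
print "$lowered\n";             # perl is great!
```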

Global state it touches#

  • Reads $_ when called without an argument.

  • Observes use locale for LC_CTYPE — with locale in effect the current locale’s lowercasing table applies to code points below 256.

  • Observes use bytes — under use bytes only A-Z change, to a-z.

  • Observes use feature 'unicode_strings' — forces Unicode rules regardless of the UTF-8 flag on the input.
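
As a rough illustration of how these lexically scoped pragmas change the result for the same input, the sketch below lowercases one Latin-1 string under use feature 'unicode_strings' and under use bytes. The variable names are made up for the example, and the byte values assume an ASCII platform:

```perl
use strict;
use warnings;

my $s = "ABC\xC4";              # \xC4 is Ä in Latin-1; no UTF-8 flag

my ($with_feature, $with_bytes);
{
    use feature 'unicode_strings';
    $with_feature = lc $s;      # Unicode rules: "abc\xE4"
}
{
    use bytes;
    $with_bytes = lc $s;        # ASCII rules only: "abc\xC4"
}

printf "%vX\n", $with_feature;  # 61.62.63.E4
printf "%vX\n", $with_bytes;    # 61.62.63.C4
```

Each pragma is confined to its enclosing block, so the same lc call can behave differently a few lines apart.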

Which casing rules apply#

Perl checks five conditions in priority order and applies the ruleset of the first one that holds:

  1. use bytes in effect — ASCII rules. Only A-Z map to a-z; every other byte is left alone, including bytes in the 128-255 range that would otherwise be Latin-1 letters.

  2. use locale for LC_CTYPE in effect — the current locale’s tables apply to code points below 256; code points 256 and above (only reachable when the string already carries the UTF-8 flag) use Unicode rules. From Perl 5.20 onward, a UTF-8 locale uses full Unicode rules throughout.

  3. The argument has the UTF-8 flag set — Unicode rules apply to every character.

  4. use feature 'unicode_strings' or use locale ':not_characters' in effect — Unicode rules apply to every character, regardless of the UTF-8 flag.

  5. Otherwise — ASCII rules. Characters outside A-Z are returned unchanged, including Latin-1 uppercase letters like À-Þ.
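
Cases 3, 4 and 5 can be observed on a single character. The sketch below assumes no locale pragma and no use v5.12 (or later) line in the file, since either would change which case applies; utf8::upgrade is a core function that sets the UTF-8 flag without changing the characters:

```perl
use strict;
use warnings;

my $native  = "\xC4";           # Ä as a native string, no UTF-8 flag
my $default = lc $native;       # case 5, ASCII rules: stays "\xC4"

my $upgraded = "\xC4";
utf8::upgrade($upgraded);       # same character, UTF-8 flag now set
my $flagged = lc $upgraded;     # case 3, Unicode rules: "\xE4"

my $featured;
{
    use feature 'unicode_strings';
    $featured = lc $native;     # case 4, Unicode rules: "\xE4"
}

printf "%vX %vX %vX\n", $default, $flagged, $featured;   # C4 E4 E4
```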

The upshot: if you want predictable Unicode behaviour on every string regardless of how it was constructed, enable use feature 'unicode_strings' (use v5.12 or any later version bundle turns it on for you).

Examples#

Basic ASCII lowercasing:

my $s = lc("Hello, World!");        # "hello, world!"

Default to $_:

for ("FOO", "Bar", "BAZ") {
    print lc, "\n";                 # foo / bar / baz
}

Unicode characters only lowercase when the rules allow it:

use utf8;                           # decode non-ASCII literals in this source file
use feature 'unicode_strings';
my $s = lc("ÄÖÜ");                  # "äöü"

Without unicode_strings and without the UTF-8 flag on the input, non-ASCII characters pass through unchanged:

my $bytes = "\xC4\xD6\xDC";         # Ä Ö Ü as Latin-1 bytes
my $lower = lc $bytes;              # unchanged: "\xC4\xD6\xDC"

lc is what backs the \L...\E escape inside double-quoted strings:

my $str = "Perl is \LGREAT\E";      # "Perl is great"
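
The same equivalence holds for interpolated variables. A small sketch with hypothetical variable names, comparing the escape with an explicit lc call:

```perl
use strict;
use warnings;

my $name = "WORLD";

my $via_escape = "Hello, \L$name\E!";           # \L ... \E lowercases the interpolated text
my $via_lc     = "Hello, " . lc($name) . "!";   # same string built by hand

print $via_escape eq $via_lc ? "same\n" : "different\n";   # same
```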

Case-insensitive compare by lowercasing both sides:

sub ieq { lc($_[0]) eq lc($_[1]) }
ieq("Perl", "PERL");                # true

For correct case-insensitive comparison across Unicode, prefer fc (the Unicode folding function) over lc — see See also.
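
To see where lc falls short, consider German ß, which has no single-character uppercase in common use. The sketch below is illustrative only; it assumes Perl 5.16 or later for fc and writes U+00DF as an escape so that the file encoding does not matter:

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);   # fc needs Perl 5.16 or later

my $lower_form = "stra\x{DF}e";       # "straße"
my $upper_form = "STRASSE";

# lc leaves ß as ß, so the two lowercased strings differ.
my $lc_equal = lc($lower_form) eq lc($upper_form) ? 1 : 0;

# fc folds ß to "ss", so the two folded strings match.
my $fc_equal = fc($lower_form) eq fc($upper_form) ? 1 : 0;

print "lc: $lc_equal, fc: $fc_equal\n";   # lc: 0, fc: 1
```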

Edge cases#

  • undef argument raises an uninitialized warning under use warnings and returns the empty string.

  • Empty string returns the empty string.

  • Numbers are stringified first: lc(42) returns "42".

  • LATIN CAPITAL LETTER SHARP S (U+1E9E) lowercases to U+00DF (ß) under Unicode rules. That mapping crosses the 255/256 boundary, which is not well defined under use locale with a non-UTF-8 locale, because Perl cannot know whether 0xDF is ß in the current locale. In that case Perl returns a result above 255, almost always the input character unchanged, and from Perl 5.22 onward it also raises a locale warning.

  • Locale and the UTF-8 flag interact: under use locale on a non-UTF-8 locale, characters below 256 follow the locale while characters 256 and above follow Unicode. A string containing both ranges will get two different casing tables applied to it.

  • Tied variables have their FETCH called once. lc does not modify the tied value; it returns a plain (non-tied) scalar.

  • lc is a named unary operator, not a list operator. lc $a, $b parses as (lc $a), $b, so only $a is lowercased and $b passes through unchanged. Write lc($a), lc($b) or use map when you mean to lowercase multiple strings:

    my @lower = map { lc } @words;
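
A few of the edge cases above can be sketched as a runnable script. The variable names and the warning-capturing handler are illustrative choices, not part of lc itself:

```perl
use strict;
use warnings;

# Numbers are stringified before lowercasing.
my $num = lc(42);                          # "42"

# undef yields "" plus an "uninitialized" warning when warnings are enabled.
my @caught;
my $missing;
my $empty = do {
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    lc($missing);
};

# Named unary operator: the comma ends lc's argument, so $second is untouched.
my ($first, $second) = ("FOO", "BAR");
my @pair = (lc $first, $second);           # ("foo", "BAR")

print "num=$num empty=[$empty] warnings=", scalar(@caught), " pair=@pair\n";
```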
    

Differences from upstream#

Fully compatible with upstream Perl 5.42.

See also#

  • uc — the uppercasing counterpart, with the same pragma and locale rules

  • lcfirst — lowercase only the first character; handy for de-capitalising sentences or identifiers

  • fc — Unicode casefold, the correct choice for case-insensitive comparison when input may contain non-ASCII

  • $_ — the default subject lc reads when called with no argument

  • use locale — controls whether locale tables or ASCII/Unicode rules apply to characters below 256

  • use feature 'unicode_strings' — forces full Unicode casing regardless of the UTF-8 flag on the input