SCALARs and strings

fc

Return the Unicode casefolded form of a string for case-insensitive comparison.

fc is the string operation behind case-insensitive string equality. It returns a form of the string in which case distinctions have been erased, so two strings are case-insensitively equal if and only if their fc results compare equal with eq. It is also the operation behind the \F escape in double-quoted strings.

Synopsis

use feature 'fc';           # or: use v5.16;

fc EXPR
fc                          # operates on $_
"\F...\E"                   # same casefold, inside a double-quoted string

fc is gated behind the fc feature. Enable it explicitly with use feature 'fc', pull it in via a version bundle (use v5.16 or newer, use feature ':5.16'), or call it fully qualified as CORE::fc without any pragma.

What you get back

A string containing the casefolded form of EXPR. The result is a fresh string; the argument is never modified. Length may change — casefolding "\x{1E9E}" (LATIN CAPITAL LETTER SHARP S) normally expands to "ss", two characters from one.

Treat the return value as opaque: it is a key suitable for equality comparison, not for display. fc("Hello") is "hello" for ASCII, but in the general case the output is not something you’d show to a user.
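The two properties above (a fresh result, and a length that may differ from the input) can be checked with a short sketch. The variable names are invented for the illustration.

```perl
use strict;
use warnings;
use feature 'fc';

my $original = "\x{1E9E}";          # LATIN CAPITAL LETTER SHARP S, one character
my $folded   = fc($original);       # the full casefold expands it to "ss"

my $original_length = length $original;
my $folded_length   = length $folded;

# The argument itself is untouched; only the returned copy is folded.
my $argument_unchanged = $original eq "\x{1E9E}" ? 1 : 0;

printf "%d -> %d characters, argument unchanged: %d\n",
    $original_length, $folded_length, $argument_unchanged;
```

Because the folded string can be longer than the input, offsets computed on the result should not be applied back to the original string.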

Why not lc or uc?

Lowercasing and uppercasing are not reliable for case-insensitive comparison. Both of these are wrong:

lc($a) eq lc($b)            # Wrong
uc($a) eq uc($b)            # Also wrong

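They fail on characters whose case mappings are not symmetric, most famously the German sharp S. lc("ß") is "ß" while lc("SS") is "ss", so the lc comparison treats "ß" and "SS" as different. uc("ß") is "SS", but uc("\x{1E9E}") (LATIN CAPITAL LETTER SHARP S) stays "\x{1E9E}", so the uc comparison treats "ß" and its own capital form as different. Casefolding sidesteps this by mapping every variant into a dedicated equality form, here "ss":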

fc($a) eq fc($b)            # Right

The regex-based equivalent that was correct before fc existed:

$a =~ /^\Q$b\E\z/i

fc is the direct, non-regex way to get the same answer.
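The asymmetry can be checked directly. The sketch below compares three spellings of the sharp S with lc, uc, and fc. The variable names are invented for the example, and unicode_strings is enabled so that the 8-bit string "\x{DF}" is handled under Unicode rules.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);

my $small_sharp_s   = "\x{DF}";       # LATIN SMALL LETTER SHARP S
my $capital_sharp_s = "\x{1E9E}";     # LATIN CAPITAL LETTER SHARP S
my $double_s        = "SS";

# lc: the small sharp s is already lowercase, while "SS" becomes "ss".
my $lc_says_equal = lc($small_sharp_s) eq lc($double_s) ? 1 : 0;

# uc: the small sharp s becomes "SS", while the capital form stays as it is.
my $uc_says_equal = uc($small_sharp_s) eq uc($capital_sharp_s) ? 1 : 0;

# fc: all three spellings fold to "ss".
my $fc_says_equal = ( fc($small_sharp_s) eq fc($double_s)
                   && fc($small_sharp_s) eq fc($capital_sharp_s) ) ? 1 : 0;

print "lc: $lc_says_equal, uc: $uc_says_equal, fc: $fc_says_equal\n";
```

Each of lc and uc gets one of the pairs wrong, while fc agrees on all of them.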

Global state it touches

  • $_ — used as the argument when EXPR is omitted.

  • use locale — inside a use locale scope, casefolding of characters crossing the 255/256 boundary is disabled (see Edge cases below for the U+1E9E rule).

  • use feature 'unicode_strings' — affects fc the same way it affects lc: forces full Unicode semantics on byte strings that would otherwise be treated under legacy 8-bit rules.
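To make the unicode_strings item concrete, here is a minimal sketch that folds the same native 8-bit string with and without the feature. The variable names are invented, and the example assumes no use locale is in effect.

```perl
use strict;
use warnings;
use feature 'fc';                     # only fc; unicode_strings is not enabled here

my $byte_string = "\xC9";             # LATIN CAPITAL LETTER E WITH ACUTE, 8-bit string

# Legacy 8-bit rules: only ASCII letters are folded, so this stays "\xC9".
my $legacy_fold = fc($byte_string);

my $unicode_fold;
{
    use feature 'unicode_strings';    # lexically scoped to this block
    $unicode_fold = fc($byte_string); # Unicode rules: folds to "\xE9"
}

printf "legacy: %02X, unicode_strings: %02X\n",
    ord($legacy_fold), ord($unicode_fold);
```

Version bundles from use v5.12 onward also switch on unicode_strings, so the legacy behaviour mostly appears in code that enables fc on its own or calls CORE::fc.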

Examples

Case-insensitive equality, the canonical use:

use utf8;                               # the literals below are UTF-8 source text
use feature 'fc';
fc("Hello") eq fc("HELLO");             # true
fc("café")  eq fc("CAFÉ");              # true

The German sharp S, the textbook case where lc fails but fc succeeds:

use utf8;                               # the literals below are UTF-8 source text
use feature 'fc';
my $a = "straße";
my $b = "STRASSE";
fc($a) eq fc($b);                       # true
lc($a) eq lc($b);                       # false: "straße" ne "strasse"

Using fc as a hash key for case-insensitive lookup:

use feature 'fc';
my %seen;
for my $word (@words) {
    $seen{ fc $word }++;                # groups "Foo", "FOO", "foo"
}
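A common variation on the loop above is case-insensitive de-duplication that keeps the first spelling seen. The following is a sketch with invented sample data, not the only way to do it.

```perl
use strict;
use warnings;
use feature 'fc';

my @words = qw(Foo bar FOO Bar foo);   # sample input, invented for the example

my %seen;
my @unique;
for my $word (@words) {
    # Key on the folded form, but keep the spelling that was seen first.
    push @unique, $word unless $seen{ fc $word }++;
}

print "@unique\n";                     # prints "Foo bar"
```

The folded form is used only as the hash key. The values shown to the user stay in their original case.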

No argument — operates on $_:

use feature 'fc';
for (@lines) {
    next unless fc eq "quit";           # matches "QUIT", "Quit", ...
    last;
}

Inside a double-quoted string via \F (same operation, inline):

use feature 'fc';
my $name = "Alice";
my $key  = "\F$name\E";                 # "alice" — same as fc($name)

Calling without the feature pragma, fully qualified:

my $folded = CORE::fc($input);          # works in any scope

Edge cases

  • No argument: fc with no argument folds $_.

  • undef argument: stringifies to the empty string, which folds to the empty string. Emits an uninitialized warning under use warnings.

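  • Feature not enabled: without use feature 'fc' (or a use v5.16+ bundle), fc is not a keyword, so the name is parsed like any other identifier. Depending on how it is written, that is typically a call to a user-defined sub (which fails at run time with an undefined-subroutine error if no such sub exists) or a syntax error. CORE::fc(EXPR) always works.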

  • Length can change: full casefolds may expand one character to several. fc("\x{1E9E}") is "ss". Never rely on length(fc $s) == length($s).

  • Not a round-trip: fc is one-way. There is no “uncasefold” operation; the original case is gone.

  • U+1E9E under use locale: fc of LATIN CAPITAL LETTER SHARP S (U+1E9E) normally folds to "ss". Under use locale that mapping is suppressed because it crosses the 255/256 code point boundary, which locale rules do not handle cleanly. Instead fc returns "\x{17F}\x{17F}" (two LATIN SMALL LETTER LONG S). Each long s itself folds to "s" under Unicode rules, so the pair is equivalent to the usual "ss" once it is folded again, but the two results are not eq as returned. Avoid comparing a key folded inside a use locale scope directly against one folded outside it.

  • Turkic and “simple” folds are not provided: Perl implements only the full, non-Turkic form of casefolding. For the simple form or the Turkic variant, use Unicode::UCD::casefold or the CPAN module Unicode::Casing.

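  • Not the same as NFKC or NFC: fc erases case, not compatibility or composition differences. fc("\x{FF21}") (FULLWIDTH LATIN CAPITAL LETTER A) is "\x{FF41}" (the fullwidth small a), not ASCII "a", and a precomposed "é" does not fold to the same string as "e" followed by a combining acute accent. A few compatibility characters do expand as a side effect of full casefolding (the ligature U+FB03 folds to "ffi"), but that is no substitute for normalization. Combine with Unicode::Normalize if you need both.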
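For the last item, one possible way to combine folding with normalization is sketched below. The helper name caseless_compat_key is hypothetical, and normalizing both before and after the fold is a cautious choice for the illustration rather than a documented recipe.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);
use Unicode::Normalize qw(NFKC);

# Hypothetical helper: a comparison key that ignores both case and
# compatibility differences. The second NFKC pass is there because a
# folded string is not guaranteed to still be normalized.
sub caseless_compat_key {
    my ($string) = @_;
    return NFKC( fc( NFKC($string) ) );
}

my $fullwidth_a = "\x{FF21}";          # FULLWIDTH LATIN CAPITAL LETTER A
my $decomposed  = "E\x{301}";          # E followed by COMBINING ACUTE ACCENT
my $precomposed = "\x{E9}";            # LATIN SMALL LETTER E WITH ACUTE

# fc alone keeps the fullwidth form, so it does not match ASCII "a".
my $fc_alone_matches = fc($fullwidth_a) eq fc("a") ? 1 : 0;

# With normalization, both comparisons succeed.
my $key_matches    = caseless_compat_key($fullwidth_a) eq caseless_compat_key("a") ? 1 : 0;
my $accent_matches = caseless_compat_key($decomposed) eq caseless_compat_key($precomposed) ? 1 : 0;

print "fc alone: $fc_alone_matches, with NFKC: $key_matches, accents: $accent_matches\n";
```

Whether NFKC is appropriate depends on the application. Where compatibility characters must stay distinct, NFC or NFD from the same module can be swapped in.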

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • lc — lowercases a string; use for display, not for case-insensitive comparison

  • uc — uppercases a string; same caveat as lc

  • lcfirst — lowercases only the first character

  • ucfirst — uppercases only the first character

  • index — substring search; pair with fc on both operands for a case-insensitive variant

  • $_ — the default argument when EXPR is omitted
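As a sketch of the index pairing mentioned in the list above, the hypothetical helper below folds both operands before searching. The helper name and sample strings are invented for the illustration.

```perl
use strict;
use warnings;
use feature 'fc';

# Hypothetical helper: true if $needle occurs in $haystack, ignoring case.
sub contains_caseless {
    my ($haystack, $needle) = @_;
    return index( fc($haystack), fc($needle) ) >= 0 ? 1 : 0;
}

my $found     = contains_caseless("Hello, World", "WORLD");
my $not_found = contains_caseless("Hello, World", "perl");

print "found: $found, not found: $not_found\n";
```

The helper returns only a yes/no answer on purpose: the offset reported by index refers to the folded haystack, which can be longer than the original string.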