String comparison operators

These are the string-flavoured counterparts of the numeric comparison operators: same shape and same precedence row, but they compare lexicographically (in Unicode code-point order by default) rather than numerically.

Operator   Question            Returns
lt         less than           true / false
le         less or equal       true / false
eq         equal               true / false
ge         greater or equal    true / false
gt         greater than        true / false
ne         not equal           true / false
cmp        three-way (sort)    -1, 0, or +1

Both operands are coerced to strings before comparing.

$name eq "John"               # exact string match
$kind ne "guest"              # negated match
$word lt "m"                  # alphabetically before "m"
$a    cmp  $b                 # sort comparator
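
Because of that coercion, the same pair of values can be equal under == and unequal under eq. The following is a minimal sketch of the difference; the variable names are illustrative only.

```perl
use strict;
use warnings;

# A numeric literal is stringified first: 1.0 becomes "1".
my $literal_eq = (1.0 eq 1)     ? "true" : "false";

# String operands are compared exactly as written: "1.0" and "1" differ.
my $string_eq  = ("1.0" eq "1") ? "true" : "false";

# The numeric operator converts both sides to numbers instead.
my $numeric_eq = ("1.0" == "1") ? "true" : "false";

print qq{1.0 eq 1     : $literal_eq\n};   # true
print qq{"1.0" eq "1" : $string_eq\n};    # false
print qq{"1.0" == "1" : $numeric_eq\n};   # true
```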

Lexicographic order

Strings compare character by character, in code-point order. The first differing character decides; if one string is a proper prefix of the other, the shorter string sorts first.

"abc" lt "abd"        # TRUE  -- 'c' < 'd' at position 2
"abc" lt "abcd"       # TRUE  -- a proper prefix sorts before the longer string
"ABC" lt "abc"        # TRUE  -- ASCII: uppercase < lowercase
"10"  lt "9"          # TRUE  -- '1' < '9' at position 0 (lex, not numeric!)

The last example is the canonical reason the two operator families exist side by side: lt, gt, and cmp always compare text, so digit strings sort numerically only if you use <=> or convert them to numbers first.
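
To make the difference concrete, here is a small sketch that sorts the same list with both comparators. The list contents are made up for illustration.

```perl
use strict;
use warnings;

my @ids = (10, 9, 100, 1);

my @lexical = sort { $a cmp $b } @ids;   # compares "10", "9", ... as strings
my @numeric = sort { $a <=> $b } @ids;   # compares 10, 9, ... as numbers

print "cmp: @lexical\n";   # cmp: 1 10 100 9
print "<=>: @numeric\n";   # <=>: 1 9 10 100
```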

Unicode

Code-point order is not the same as locale-aware "alphabetical" order:

  • "ä" gt "z" is TRUE under code-point order because U+00E4 is beyond U+007A.

  • Under German DIN 5007-1 ("dictionary") order, "ä" should sort with "a", long before "z".

For locale-correct collation, use the Unicode::Collate module (part of the core distribution) or use locale with a suitable locale set. The bare lt/gt/cmp give you ordered, stable, language-independent comparison, which is exactly the right thing for sort keys, hash bucketing, deterministic test output, and so on. It is the wrong thing for human-facing alphabetical listings in any non-English language.
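
The contrast can be sketched as follows. This example assumes Unicode::Collate is installed and uses its default collation table, which is a general-purpose ordering and not a DIN 5007-1 tailoring; language-specific rules would need additional configuration that is not shown here.

```perl
use strict;
use warnings;
use Unicode::Collate;

# "\N{U+00E4}" is a-umlaut, written as an escape so the example does not
# depend on the encoding of the source file.
my @words = ("z", "\N{U+00E4}", "a");

# Plain cmp: code-point order, so U+00E4 lands after "z".
my @by_code_point = sort { $a cmp $b } @words;

# Unicode::Collate with its default table: a-umlaut sorts next to "a".
my $collator     = Unicode::Collate->new;
my @by_collation = $collator->sort(@words);

binmode STDOUT, ":encoding(UTF-8)";
print "code point: @by_code_point\n";   # a, z, a-umlaut
print "collated:   @by_collation\n";    # a, a-umlaut, z
```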

cmp for sorting

cmp is the string-comparison counterpart of the spaceship operator <=>. It returns -1, 0, or +1 and chains with || the same way <=> does:

my @sorted = sort { $a cmp $b } @names;        # ascending lex order
my @cased  = sort {
    lc($a) cmp lc($b) || $a cmp $b             # case-insensitive,
                                               # ties broken by case
} @names;
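
For a runnable version of the same idea, the sketch below adds sample data (the names are invented) and prints both the raw return values of cmp and the result of the two-level sort.

```perl
use strict;
use warnings;

# The three possible return values of cmp.
my @results = ("a" cmp "b", "a" cmp "a", "b" cmp "a");
print "@results\n";   # -1 0 1

my @names = qw(bob Alice alice Bob);

my @cased = sort {
    lc($a) cmp lc($b)    # primary key: case-insensitive
        ||
    $a cmp $b            # tie-break: code-point order, uppercase first
} @names;

print "@cased\n";     # Alice alice Bob bob
```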

Mixing flavours: a worked bug

The compound-key sort idiom from numeric comparison showed ||-chaining of <=> and cmp. The bug to avoid is using the wrong operator for the type of the key:

# version strings like "1.10", "1.2", "1.20", ...
sort @versions                             # ASCII order:  "1.10","1.2","1.20"
sort { $a <=> $b } @versions               # numeric: right only by accident
                                           # (strings are read as decimals, so
                                           # "1.10" == 1.1 and "1.2" == "1.20")
sort { sortkey($a) <=> sortkey($b) } @versions   # parse first, then compare

The right answer for version strings is a parser like Sort::Versions or hand-rolled (\d+)-tokenisation; neither cmp nor <=> does it correctly on its own.
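
One way to hand-roll the tokenising approach is sketched below. The helper name compare_versions is hypothetical, and the sketch assumes purely numeric dotted components with missing components treated as 0; it is not the algorithm used by Sort::Versions.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two dotted version strings component by
# component, numerically. Missing components count as 0 (an assumption),
# so "1.2" and "1.2.0" compare equal.
sub compare_versions {
    my ($left, $right) = @_;
    my @l = $left  =~ /(\d+)/g;   # "1.10" -> (1, 10)
    my @r = $right =~ /(\d+)/g;
    while (@l || @r) {
        my $lc = shift(@l) // 0;
        my $rc = shift(@r) // 0;
        my $result = $lc <=> $rc;
        return $result if $result;   # first differing component decides
    }
    return 0;
}

my @versions = ("1.10", "1.2", "1.20", "1.9");
my @sorted   = sort { compare_versions($a, $b) } @versions;

print "@sorted\n";   # 1.2 1.9 1.10 1.20
```

Comparing component lists directly avoids building a packed sort key, at the cost of re-tokenising each string on every comparison; for large lists, precomputing the token lists once would be the usual refinement.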

Precedence

String comparison shares row 11 of the precedence table with numeric comparison. The operators are non-associative, and the chaining caveat from numeric comparison applies here too. Read with left-to-right grouping, a chain does not test what it appears to:

"a" lt "b" lt "c"     # read as ("a" lt "b") lt "c"
                      #       = 1            lt "c"
                      #       = TRUE  (because "1" < "c" lexically)
                      # accidentally right for the wrong reason.

Write the conjunction explicitly with &&.
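
A short sketch of why the explicit form matters: with values that are not in order, the left-grouped reading (written out here with explicit parentheses so it runs the same everywhere) still comes out true, while the conjunction gives the intended answer. Variable names are illustrative.

```perl
use strict;
use warnings;

my ($x, $y, $z) = ("b", "a", "c");   # deliberately not in ascending order

# Left-to-right grouping, spelled out with parentheses:
# ("b" lt "a") is false (the empty string), and "" lt "c" is true.
my $grouped = (($x lt $y) lt $z) ? "true" : "false";

# Explicit conjunction: both adjacent pairs must be in order.
my $ordered = ($x lt $y && $y lt $z) ? "true" : "false";

print "grouped: $grouped\n";   # true  (misleading)
print "ordered: $ordered\n";   # false (what was meant)
```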

See also