# String comparison operators

The string-flavoured counterparts of the numeric comparison family. Same shape, same precedence row, but operating lexicographically (Unicode code-point order by default) instead of numerically.

| Operator | Question         | Returns            |
|----------|------------------|--------------------|
| `lt`     | less than        | true / false       |
| `le`     | less or equal    | true / false       |
| `eq`     | equal            | true / false       |
| `ge`     | greater or equal | true / false       |
| `gt`     | greater than     | true / false       |
| `ne`     | not equal        | true / false       |
| `cmp`    | three-way (sort) | `-1`, `0`, or `+1` |

Both operands are coerced to strings before comparing.

```perl
$name eq "John"     # exact string match
$kind ne "guest"    # negated match
$word lt "m"        # alphabetically before "m"
$a cmp $b           # sort comparator
```

## Lexicographic order

Strings compare character by character, code point by code point. The first differing character decides; if one string is a prefix of the other, the shorter one sorts first.

```perl
"abc" lt "abd"     # TRUE -- 'c' < 'd' at position 2
"abc" lt "abcd"    # TRUE -- a proper prefix sorts first
"ABC" lt "abc"     # TRUE -- ASCII: uppercase < lowercase
"10"  lt "9"       # TRUE -- '1' < '9' at position 0 (lex, not numeric!)
```

The last example is the canonical reason the string and numeric families are kept separate: to sort digit strings *numerically*, use [`<=>`](numeric-comparison.md) or convert the strings to numbers first.

## Unicode

Code-point order is **not** the same as locale-aware "alphabetical" order:

- `"ä" gt "z"` is TRUE under code-point order because U+00E4 is greater than U+007A.
- Under German DIN 5007-1 ("dictionary") order, `"ä"` should sort with `"a"`, long before `"z"`.

For locale-correct collation, use the `Unicode::Collate` module (see [perlfunc](../perlfunc.md)) or `use locale` with a suitable locale set. The bare `lt`/`gt`/`cmp` give you a deterministic, language-independent ordering, which is exactly the right thing for `sort` keys, hash bucketing, deterministic test output, and so on.
It is the *wrong* thing for human-facing alphabetical listings in any non-English language.

## `cmp` for sorting

`cmp` is the string-comparison spaceship. It returns `-1`, `0`, or `+1` and chains with `||` the same way [`<=>`](numeric-comparison.md) does:

```perl
my @sorted = sort { $a cmp $b } @names;    # ascending lex order

my @cased  = sort {
    lc($a) cmp lc($b)    # case-insensitive,
        || $a cmp $b     # ties broken by case
} @names;
```

## Mixing flavours: a worked bug

The compound-key sort idiom from [numeric comparison](numeric-comparison.md) showed `||`-chaining of `<=>` and `cmp`. The bug to avoid is using the *wrong* operator for the *type* of the key:

```perl
# version strings like "1.10", "1.2", "1.20", ...
sort @versions                  # ASCII order: "1.10", "1.2", "1.20"
sort { $a <=> $b } @versions    # numeric: each string is read as a decimal,
                                # so "1.10" (1.1) sorts before "1.2" and
                                # "1.2" == "1.20"; right only by accident
sort { sortkey($a) <=> sortkey($b) } @versions
                                # parse first, then compare
                                # (sortkey is a placeholder for your parser)
```

The right answer for version strings is a parser such as `Sort::Versions` or hand-rolled `(\d+)`-tokenisation; neither `cmp` nor `<=>` does it correctly on its own.

## Precedence

String comparison shares row 11 of the [precedence](precedence.md) table with numeric comparison. The chaining caveat from [numeric comparison](numeric-comparison.md) applies here too: do not rely on a chained form to express a range test. Perl releases before 5.32 treat these operators as non-associative and reject `"a" lt "b" lt "c"` as a syntax error; 5.32 and later evaluate it as a chained comparison; `cmp` never chains. A left-to-right grouping would be misleading:

```perl
"a" lt "b" lt "c"    # if grouped as ("a" lt "b") lt "c"
                     #   = 1 lt "c"
                     #   = TRUE (because "1" < "c" lexically),
                     #   accidentally right for the wrong reason.
```

Write the conjunction explicitly with `&&`.

## See also

- [Numeric comparison](numeric-comparison.md): the parallel family.
- [`sort`](../perlfunc/sort.md), [`reverse`](../perlfunc/reverse.md), [`lc`](../perlfunc/lc.md), [`uc`](../perlfunc/uc.md), [`fc`](../perlfunc/fc.md): perlfunc tools that pair with string comparison.
- [Unicode in Perl](../../../tutorial/unicode/index.md): the locale / code-point-order distinction in depth.
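
## Example: version-aware comparison

The hand-rolled `(\d+)`-tokenisation mentioned under "Mixing flavours" could look roughly like the sketch below. The name `version_cmp` is a hypothetical helper invented for this illustration, not part of `Sort::Versions` or any other module, and it assumes versions consist only of dot-separated decimal fields.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two dotted version strings field by field,
# numerically. Returns -1, 0, or 1, like cmp and <=>.
sub version_cmp {
    my ($x, $y) = @_;
    my @x = $x =~ /(\d+)/g;    # "1.10" -> (1, 10)
    my @y = $y =~ /(\d+)/g;
    while (@x && @y) {
        my $order = shift(@x) <=> shift(@y);
        return $order if $order;    # first differing field decides
    }
    # All shared fields are equal: the version with fewer fields sorts first.
    return scalar(@x) <=> scalar(@y);
}

my @versions = ("1.10", "1.2", "1.20", "1.2.1");
my @sorted   = sort { version_cmp($a, $b) } @versions;
print "@sorted\n";    # 1.2 1.2.1 1.10 1.20
```

Each field is compared with `<=>` because the fields are numbers; the strings as a whole are never handed to `cmp` or `<=>` directly. Suffixes such as `-rc1` are outside the scope of this sketch.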