# String comparison operators

The string-flavoured counterparts of the numeric comparison
family. Each has the same shape and the same precedence as its
numeric twin, but compares lexicographically (Unicode code-point
order by default) instead of numerically.

| Operator   | Question         | Returns            |
|------------|------------------|--------------------|
| `lt`       | less than        | true / false       |
| `le`       | less or equal    | true / false       |
| `eq`       | equal            | true / false       |
| `ge`       | greater or equal | true / false       |
| `gt`       | greater than     | true / false       |
| `ne`       | not equal        | true / false       |
| `cmp`      | three-way (sort) | `-1`, `0`, or `+1` |

Both operands are coerced to strings before comparing.

```perl
$name eq "John"               # exact string match
$kind ne "guest"              # negated match
$word lt "m"                  # alphabetically before "m"
$a    cmp  $b                 # sort comparator
```
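Because both operands are stringified, numeric-looking values can be surprising. The following small script uses only core Perl. It contrasts string and numeric equality on the same values and shows the three values `cmp` can return:

```perl
use strict;
use warnings;

# Each operand is stringified before comparing, so numbers
# compare by their text form under lt/eq.
my @checks = (
    [ '10 lt 9',      10 lt 9      ],   # "10" lt "9": '1' < '9'
    [ '1.0 eq 1',     1.0 eq 1     ],   # the literal 1.0 stringifies as "1"
    [ '"1.0" eq "1"', "1.0" eq "1" ],   # different text, so not equal
    [ '"1.0" == 1',   "1.0" == 1   ],   # numeric comparison: equal
);
for my $check (@checks) {
    my ($label, $result) = @$check;
    printf "%-14s %s\n", $label, $result ? 'true' : 'false';
}

# cmp returns a sort-ready -1, 0 or 1.
printf "%d %d %d\n", "a" cmp "b", "b" cmp "b", "c" cmp "b";   # prints "-1 0 1"
```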

## Lexicographic order

Strings compare character by character, code point by code point.
The first differing character decides. If one string is a prefix
of the other, the shorter one (the prefix) sorts first.
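To make the rule concrete, here is an illustrative re-implementation of code-point comparison in plain Perl. The name `lex_cmp` is made up for this sketch. It is not how perl implements `cmp` internally, and it ignores `use locale`. The loop checks it against the built-in operator:

```perl
use strict;
use warnings;

# Illustrative only: mimics code-point `cmp` (no `use locale`).
sub lex_cmp {
    my ($x, $y) = @_;
    my $n = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $n - 1) {
        # Compare one code point from each string.
        my $c = ord(substr($x, $i, 1)) <=> ord(substr($y, $i, 1));
        return $c if $c;                  # first difference decides
    }
    return length($x) <=> length($y);     # equal so far: the prefix sorts first
}

for my $pair (["abc", "abd"], ["abc", "abcd"], ["ABC", "abc"], ["10", "9"]) {
    my ($x, $y) = @$pair;
    printf "%-4s vs %-4s lex_cmp=%d cmp=%d\n", $x, $y, lex_cmp($x, $y), $x cmp $y;
}
```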

```perl
"abc" lt "abd"        # TRUE  -- 'c' < 'd' at position 2
"abc" lt "abcd"       # TRUE  -- prefix sorts first
"ABC" lt "abc"        # TRUE  -- ASCII: uppercase < lowercase
"10"  lt "9"          # TRUE  -- '1' < '9' at position 0 (lex, not numeric!)
```

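The last example is the canonical reason Perl keeps two operator
families: the operator, not the value, decides whether `"10"` is
compared as text or as a number. To compare digit strings
*numerically*, use [`<=>`](numeric-comparison.md) and its relatives.
Converting the operands to numbers first does not help, because
`lt` and `cmp` stringify them again.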
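A short demonstration with made-up ID strings. The third sort numifies each operand with `+ 0`, yet still comes out in text order, because `cmp` turns the numbers back into strings:

```perl
use strict;
use warnings;

my @ids = ("10", "9", "100", "2");

my @as_text    = sort { $a cmp $b } @ids;               # 10 100 2 9
my @as_numbers = sort { $a <=> $b } @ids;               # 2 9 10 100

# Numifying first does not help: cmp stringifies its operands again.
my @still_text = sort { ($a + 0) cmp ($b + 0) } @ids;   # 10 100 2 9

print "@as_text\n@as_numbers\n@still_text\n";
```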

## Unicode

Code-point order is **not** the same as locale-aware
"alphabetical" order:

- `"ä" gt "z"` is TRUE under code-point order because U+00E4 is
  beyond U+007A.
- Under German DIN 5007-1 ("dictionary") order, `"ä"` sorts
  with `"a"`, long before `"z"`.

For locale-correct collation, use the core module `Unicode::Collate`
(or `Unicode::Collate::Locale` for language-specific tailorings), or
`use locale` with a suitable `LC_COLLATE` locale set. Without
`use locale`, the bare `lt`/`gt`/`cmp` give you deterministic,
locale-independent comparison. That is exactly the right thing for
`sort` keys, reproducible output such as hash keys listed in sorted
order, deterministic test output, and so on. It is the *wrong* thing
for human-facing alphabetical listings in any non-English language.
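A minimal side-by-side of the two orderings follows. It uses `Unicode::Collate` with its default table, the Unicode Collation Algorithm's DUCET. That table is not German-specific, but for `ä` it happens to agree with DIN 5007-1. Language tailorings via `Unicode::Collate::Locale` are not shown here:

```perl
use strict;
use warnings;
use utf8;                             # the string literals below are UTF-8
use Unicode::Collate;                 # core module
binmode STDOUT, ':encoding(UTF-8)';

my @words = ("z", "ä", "a", "b");

# Plain cmp: code-point order, so U+00E4 lands after "z".
my @by_codepoint = sort { $a cmp $b } @words;   # a b z ä

# Default Unicode Collation Algorithm table: "ä" sorts next to "a".
my $collator     = Unicode::Collate->new();
my @by_collation = $collator->sort(@words);     # a ä b z

print "cmp:     @by_codepoint\n";
print "collate: @by_collation\n";
```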

## `cmp` for sorting

`cmp` is the string-comparison spaceship. It returns `-1`, `0`,
or `+1` and chains with `||` the same way [`<=>`](numeric-comparison.md) does:

```perl
my @sorted = sort { $a cmp $b } @names;        # ascending lex order
my @cased  = sort {
    lc($a) cmp lc($b) || $a cmp $b             # case-insensitive,
                                               # ties broken by case
} @names;
```
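The `||` chaining also works across flavours, as long as each key gets the operator that matches its type. The sketch below uses made-up records with illustrative `name` and `age` fields. Swapping `$a` and `$b` in one key reverses only that key:

```perl
use strict;
use warnings;

my @people = (
    { name => "bob",   age => 30 },
    { name => "alice", age => 25 },
    { name => "bob",   age => 22 },
);

# String key first (cmp), numeric key second (<=>), chained with ||.
my @by_name_then_age = sort {
    $a->{name} cmp $b->{name} || $a->{age} <=> $b->{age}
} @people;

# Names descending ($b before $a), ages still ascending.
my @name_desc = sort {
    $b->{name} cmp $a->{name} || $a->{age} <=> $b->{age}
} @people;

print join(", ", map { "$_->{name}/$_->{age}" } @by_name_then_age), "\n";
print join(", ", map { "$_->{name}/$_->{age}" } @name_desc), "\n";
```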

## Mixing flavours: a worked bug

The compound-key sort idiom from
[numeric comparison](numeric-comparison.md) showed `||`-chaining of
`<=>` and `cmp`. The bug to avoid is using the *wrong* operator
for the *type* of the key:

```perl
# version strings like "1.10", "1.2", "1.20", ...
sort @versions                             # ASCII order: "1.10", "1.2", "1.20"
sort { $a <=> $b } @versions               # decimal reading: "1.10" is 1.1, so it sorts
                                           # before "1.2"; "1.2" ties with "1.20"; and
                                           # anything after a second dot is dropped
sort { sortkey($a) <=> sortkey($b) } @versions   # parse first, then compare
                                                 # (sortkey: your own parser)
```

The right answer for version strings is a dedicated comparator such
as CPAN's `Sort::Versions` (its `versioncmp` function) or hand-rolled
`(\d+)`-tokenisation. Neither `cmp` nor `<=>` does it correctly on
its own.
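A hand-rolled tokeniser along those lines might look like the sketch below. The name `version_cmp` is hypothetical. The sketch assumes purely numeric dotted versions. Suffixes such as `-rc1` are not handled correctly: their digits are simply treated as extra components.

```perl
use strict;
use warnings;

# Hypothetical helper: compare dotted numeric versions component by component.
sub version_cmp {
    my ($x, $y) = @_;
    my @x = $x =~ /(\d+)/g;           # "1.10" -> (1, 10)
    my @y = $y =~ /(\d+)/g;
    while (@x && @y) {
        my $c = shift(@x) <=> shift(@y);   # numeric per component
        return $c if $c;
    }
    return @x <=> @y;                  # fewer components sorts first: 1.2 < 1.2.1
}

my @versions = ("1.10", "1.2", "1.20", "1.9", "1.2.1");
my @sorted   = sort { version_cmp($a, $b) } @versions;
print "@sorted\n";                     # 1.2 1.2.1 1.9 1.10 1.20
```

Comparing components with `<=>` and only then falling back to the component count is the same "first difference decides, prefix sorts first" rule as `cmp`, applied to numbers instead of characters.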

## Precedence

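String comparison sits on the same rows of the
[precedence](precedence.md) table as numeric comparison: `lt`,
`gt`, `le`, `ge` share the relational row with `<`, `>`, `<=`,
`>=`, and `eq`, `ne`, `cmp` share the equality row just below it
with `==`, `!=`, `<=>`. The chaining caveat from
[numeric comparison](numeric-comparison.md) applies here too.
Before Perl 5.32 both rows were non-associative. Since 5.32 the
relational operators (and `eq`/`ne`) chain, while `cmp` still
does not:

```perl
"a" lt "b" lt "c"     # Perl < 5.32: syntax error (non-associative)
                      # Perl 5.32+:  means "a" lt "b" && "b" lt "c"
```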

Write the conjunction explicitly with `&&`.
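A small sketch of the explicit form, which works on any Perl version. The helper name `str_between` is made up here. Because `le` binds tighter than `&&`, neither test needs its own parentheses:

```perl
use strict;
use warnings;

# Hypothetical helper: is $s between $lo and $hi (inclusive), in code-point order?
sub str_between {
    my ($lo, $s, $hi) = @_;
    return $lo le $s && $s le $hi;     # parses as ($lo le $s) && ($s le $hi)
}

for my $s ("b", "d", "B") {
    printf "%s: %s\n", $s, str_between("a", $s, "c") ? "in range" : "out of range";
}
```

Note that `"B"` is out of range: uppercase letters have lower code points than `"a"`.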

## See also

- [Numeric comparison](numeric-comparison.md) — the parallel family.
- [`sort`](../perlfunc/sort.md), [`reverse`](../perlfunc/reverse.md),
  [`lc`](../perlfunc/lc.md), [`uc`](../perlfunc/uc.md), [`fc`](../perlfunc/fc.md)
  — perlfunc tools that pair with string comparison.
- [Unicode in Perl](../../../tutorial/unicode/index.md) — the locale
  / code-point-order distinction in depth.