# Alternation

Alternation is the `|` operator. It picks between two or more sub-patterns at the same position.

```perl
"cats and dogs" =~ /cat|dog|bird/;   # matches 'cat'
"cats and dogs" =~ /dog|cat|bird/;   # matches 'cat'
```

The order of the alternatives does not change *where* the overall pattern matches. The engine still honours «earliest position wins»: both patterns above match at position 0 because that is the earliest position where any alternative can match.

## Leftmost alternative wins at a given position

Within a single starting position, alternatives are tried left to right and the first one that succeeds is used:

```perl
"cats" =~ /c|ca|cat|cats/;   # matches 'c': the first alternative wins
"cats" =~ /cats|cat|ca|c/;   # matches 'cats': the first alternative wins, and it is the longest
```

If one alternative is a prefix of another and you want the longer match, put the longer one first. The engine does not look past the first successful alternative at the current position.

This is *Traditional NFA* behaviour, and it is what Perl, PCRE2, Python, and most modern engines implement. POSIX-conformant engines (notably some `awk` and `grep` implementations) follow the *leftmost-longest* rule instead: they would match `cats` regardless of alternative order. The [cross-engine](cross-engine.md) chapter has the comparison.

Friedl puts it succinctly:

> Greedy alternation is non-greedy in a Traditional NFA.

The pattern `tour|to|tournament` against `three tournaments won` matches `tour`, not `tournament`. The first alternative succeeds and the engine commits to it; longer alternatives further along the list are never tried.

## Implications for ordering

The order of alternatives affects both the correctness and the performance of a pattern:

- **Specificity first.** When you want the longest of several prefixes, put the longest first. On input `website`, `/web|website|websites/` matches only `web`, while `/websites|website|web/` matches `website`.
- **Common case first.** Alternatives are tried left to right; if 90% of your inputs hit alternative 3, the engine wastes time on alternatives 1 and 2 every time. Reorder by frequency, as long as that does not conflict with the specificity rule.
- **Sibling captures.** When the alternatives capture, the order affects which `$n` is set; see *Alternation and capturing* below.

## Combining-pieces formal rule

`perlre`’s «Combining RE Pieces» gives the precise statement underlying «leftmost wins». For two pattern pieces `S` and `T`:

> When `S` can match, it is a better match than when only `T` can match.

That is the formal version of the rule. «Better» means the engine prefers it. Among several `S` matches, the usual internal ordering applies (the greediness rules within the alternative); likewise among `T` matches. Across alternatives, a successful `S` match takes precedence over any `T` match: `T` is considered only when no `S` match lets the overall pattern succeed.

This is why the engine cannot reorder `S|T` on its own: Perl does not search for the *best* alternative across the disjunction. It settles on `S` whenever `S` leads to an overall match.

## Grouping vs. alternation precedence

`|` has very low precedence. It splits the pattern at the *outermost* level containing it:

```perl
/ab|cd/;        # 'ab' OR 'cd'
/^ab|cd$/;      # '^ab' OR 'cd$' (probably not what you meant!)
/^(ab|cd)$/;    # '^' + ('ab' or 'cd') + '$' (what you meant)
```

To constrain alternation to part of a pattern, wrap it in a group. Non-capturing `(?:…)` is preferred unless you need the capture:

```perl
/house(?:cat|keeper)/;   # 'housecat' or 'housekeeper'
/house(cat|keeper)/;     # same, but $1 will be 'cat' or 'keeper'
```

The group creates a local scope for `|`.
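
The effect of the anchored forms above can be checked directly. The following is a minimal sketch, assuming a handful of illustrative input strings that are not part of the original examples; the variable names `@inputs`, `@ungrouped`, and `@grouped` are likewise made up for the demonstration.

```perl
use strict;
use warnings;

# Illustrative inputs (assumed for this sketch, not taken from the text).
my @inputs = ('ab', 'cd', 'abX', 'Xcd', 'XabX');

my (@ungrouped, @grouped);
for my $s (@inputs) {
    # Ungrouped: parsed as (^ab) OR (cd$), so each anchor guards one branch only.
    push @ungrouped, $s if $s =~ /^ab|cd$/;
    # Grouped: both anchors apply to whichever branch matches.
    push @grouped, $s if $s =~ /^(?:ab|cd)$/;
}

print "ungrouped: @ungrouped\n";   # ungrouped: ab cd abX Xcd
print "grouped:   @grouped\n";     # grouped:   ab cd
```

Only the grouped form rejects `abX` and `Xcd`, because there the `^` and `$` anchors constrain both alternatives rather than one each.
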
Outside the group, `|` resumes its top-level role:

```perl
/^(?:foo|bar|baz)$|^xyz$/;   # ('foo'/'bar'/'baz') or 'xyz'
```

## Empty alternatives

An empty alternative matches the empty string, which is a useful trick for «this or nothing»:

```perl
/house(cat|)/;     # 'housecat' or 'house'
/(19|20|)\d\d/;    # '19xx', '20xx', or just 'xx'
```

Modern style prefers `(?:…)?` over `(?:…|)`; they are equivalent, but the `?` form is clearer:

```perl
/house(?:cat)?/;   # same as house(cat|), no capture
```

Watch for the backtracking cost when an empty alternative is combined with a quantifier: the engine can re-explore the same position many times. See the [performance](performance.md) chapter on zero-length-match termination.

### Cross-engine note

POSIX leaves the behaviour of empty alternatives undefined, and `lex` and most older `awk` implementations *disallow* them: `(this|that|)` is a syntax error there. Perl 5.42, PCRE2, Rust’s `regex` crate, Python `re`, and most other modern engines accept them. If a pattern needs to be portable to older POSIX tools, write `(this|that)?` instead: it expresses the same idea and works in engines that reject empty alternatives. Use a plain group there, because POSIX extended regular expressions have no `(?:…)` syntax.

## Alternation inside character classes

Character classes are almost always what you want when alternating between single characters. `/a|b|c/` and `/[abc]/` match the same strings, but `[abc]` is faster, terser, and clearer:

```perl
/a|b|c/;   # works, but verbose
/[abc]/;   # use this
```

Alternation is for alternatives longer than one character (or ones that are themselves patterns). When each alternative is a single character, reach for a class. The engine treats a class as a *single* atomic choice, and an alternation as N choices to be explored in order.

## Common-prefix factoring

A pattern like `/this|that|then|those/` defeats the engine’s fixed-string-check optimisation: there is no literal prefix the engine can scan for cheaply.
Refactoring to expose the common prefix turns the alternation into something the optimiser can work with:

```perl
/this|that|then|those/;    # no common prefix visible
/th(?:is|at|en|ose)/;      # common prefix 'th' exposed
```

The two patterns match the same strings, and the second can be materially faster on large inputs: the engine can scan for the literal `th` with a Boyer-Moore-style substring search, then run the small alternation only at candidate positions. How much you gain depends on the engine and its version, so benchmark before relying on it.

The general technique:

1. Find the longest literal prefix common to all alternatives.
2. Lift it outside the alternation.
3. Wrap the remainder in `(?:…)` so the alternation stays localised.

For lists of words with a few different prefixes, you can apply the rewrite recursively. For very long lists, see the [performance](performance.md) chapter’s section on matching many strings: once the list is in the thousands, a specialised matcher beats any alternation.

## Alternation and capturing

Only one alternative inside a group can match at a time, so the group captures the matching alternative:

```perl
if ("bert" =~ /(cat|dog|bert|ernie)/) {
    print "matched $1\n";   # matched bert
}
```

Sibling groups outside the alternation retain their normal numbering:

```perl
/^(\w+):\s*(yes|no|maybe)$/;
# $1 = the key, $2 = the verdict
```

When capturing groups sit inside the branches of an alternation, they are numbered left to right by opening parenthesis, even across branches:

```perl
/(a)|(b)/;
# On match of 'a': $1 = 'a', $2 undef
# On match of 'b': $1 undef, $2 = 'b'
```

Check with `defined $n`, not truth: an empty capture is false but defined, which is different from an absent capture.

## Branch reset: `(?|…)`

Parallel-capture patterns are the usual reason to reach for `(?|…)`. Inside `(?|…)`, every branch starts numbering its captures at the same slot. After the group, numbering resumes at one past the maximum across all branches.

```perl
# Without (?|…): need to know which branch matched.
if ($time =~ /(\d\d|\d):(\d\d)|(\d\d)(\d\d)/) {
    my ($h, $m) = ($1, $2);
    ($h, $m) = ($3, $4) unless defined $h;
}

# With (?|…): $1 and $2 come from whichever branch matched.
if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))/) {
    my ($h, $m) = ($1, $2);
}
```

With a trailing fixed piece:

```perl
if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})/) {
    # $1 = hours, $2 = minutes, $3 = zone (numbered after the group)
    print "hour=$1 minute=$2 zone=$3\n";
}
```

Rules inside `(?|…)`:

- Each branch independently numbers its capturing groups, starting one past the number of groups opened before the `(?|…)`.
- After the group, the outer numbering continues at one higher than the maximum count reached in any branch.
- Named groups keep their names; you can repeat a name across branches. Use the *same names in the same order* in every branch, or surprises ensue (see the [groups and captures](groups-and-captures.md) chapter).

Branch reset is the cleanest way to express «parse X in one of several equivalent formats, then reach for the same variables afterwards.»

## Alternation in `split`

`split` takes a regexp pattern, so alternation works there too:

```perl
my @words = split /\s+|-/, "one-two three four-five";
# ('one', 'two', 'three', 'four', 'five')
```

If the separator pattern contains capturing groups, `split` includes the captured text in the output list, which is often surprising. Use `(?:…)` unless you want that:

```perl
split /(?:\s+|-)/, "a-b c";   # ('a', 'b', 'c')
split /(\s+|-)/, "a-b c";     # ('a', '-', 'b', ' ', 'c')
```

The capturing form is occasionally what you want, for example to preserve the exact separators between fields. Mostly you want the non-capturing form.

## Summary

- `|` separates alternatives; the leftmost alternative that matches at the current position wins.
- Wrap alternations in `(?:…)` to localise them.
- Prefer character classes over single-character alternations.
- Lift common prefixes outside the alternation when you want the engine’s literal-scan optimisation to fire.
- `(?|…)` resets capture numbering across branches; use it when the branches capture the same conceptual fields.

## See also

- The [groups and captures](groups-and-captures.md) chapter: capture numbering inside `(?|…)`, and the rest of the named-capture machinery.
- The [performance](performance.md) chapter: ordering by likelihood, common-prefix factoring, and matching many strings.
- The [cross-engine](cross-engine.md) chapter: alternation syntax differences across engine families.
- [`split`](../../p5/core/perlfunc/split.md): alternation as separator syntax.