---
name: regex alternation
---

# Alternation

Alternation is the `|` operator. It picks between two or more sub-patterns at the same position.

```perl
"cats and dogs" =~ /cat|dog|bird/;   # matches 'cat'
"cats and dogs" =~ /dog|cat|bird/;   # matches 'cat'
```

The order of the alternatives does not change *where* the overall pattern matches. The engine still honours "earliest position wins": both patterns above match at position 0 because that is the earliest position where any alternative can match.

## Leftmost alternative wins at a given position

Within a single starting position, alternatives are tried left to right and the first one that succeeds is used:

```perl
"cats" =~ /c|ca|cat|cats/;   # matches 'c': first alternative wins
"cats" =~ /cats|cat|ca|c/;   # matches 'cats': first alternative wins, and it is the longest
```

If one alternative is a prefix of another and you want the longer match, put the longer one first. Once an alternative succeeds at the current position, the engine does not try the later ones unless the rest of the pattern fails and forces it to backtrack.

An implication: in a complex pattern, order alternatives by likelihood and specificity. Put rare, specific alternatives first and broad catch-alls last, so that a catch-all cannot shadow a more specific branch.

## Grouping vs. alternation precedence

`|` has the lowest precedence of any regex operator. It splits the pattern at the *outermost* level containing it:

```perl
/ab|cd/;       # 'ab' OR 'cd'
/^ab|cd$/;     # '^ab' OR 'cd$': probably not what you meant!
/^(ab|cd)$/;   # '^' + ('ab' or 'cd') + '$': what you meant
```

To constrain alternation to part of a pattern, wrap it in a group. Non-capturing `(?:…)` is preferred unless you need the capture:

```perl
/house(?:cat|keeper)/;   # 'housecat' or 'housekeeper'
/house(cat|keeper)/;     # same, but $1 will be 'cat' or 'keeper'
```

The group creates a local scope for `|`.
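
The precedence difference between `/^ab|cd$/` and its grouped form is easiest to see by running both against inputs that only one of them should accept. The following is a minimal sketch, not part of the original chapter; the `%accepts` hash and the sample strings are hypothetical names and data chosen for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen so that the two patterns disagree.
my @inputs = ('ab', 'cd', 'abXYZ', 'XYZcd');

my %accepts;
for my $s (@inputs) {
    # Ungrouped: parsed as (^ab) OR (cd$), so each anchor guards one branch only.
    $accepts{ungrouped}{$s} = ($s =~ /^ab|cd$/)     ? 1 : 0;
    # Grouped: both anchors apply to whichever alternative is chosen.
    $accepts{grouped}{$s}   = ($s =~ /^(?:ab|cd)$/) ? 1 : 0;
}

for my $s (@inputs) {
    printf "%-6s ungrouped=%d grouped=%d\n",
        $s, $accepts{ungrouped}{$s}, $accepts{grouped}{$s};
}
```

The ungrouped pattern accepts all four strings, because `abXYZ` satisfies `^ab` and `XYZcd` satisfies `cd$`. The grouped pattern accepts only `ab` and `cd`.
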
Outside the group `|` resumes its top-level role:

```perl
/^(?:foo|bar|baz)$|^xyz$/;   # ('foo'/'bar'/'baz') or 'xyz'
```

## Empty alternatives

An empty alternative matches the empty string, a useful trick for "this or nothing":

```perl
/house(cat|)/;     # 'housecat' or 'house'
/(19|20|)\d\d/;    # '19xx', '20xx', or just 'xx'
```

Modern style prefers `(?:…)?` over `(?:…|)`; they are equivalent, but the `?` form is clearer:

```perl
/house(?:cat)?/;   # same as house(cat|), no capture
```

Watch for the backtracking cost when an empty alternative is combined with a quantifier: the engine can re-explore the same position many times. See the performance chapter.

## Alternation inside character classes

Character classes are almost always what you want when alternating between single characters. `/a|b|c/` and `/[abc]/` match the same strings, but `[abc]` is faster, terser, and clearer:

```perl
/a|b|c/;   # works, but verbose
/[abc]/;   # use this
```

Alternation is for alternatives longer than one character (or ones that are themselves patterns). When each alternative is a single character, reach for a class. Note that `|` is a literal character inside a class, so `[a|b]` also matches a pipe.

## Alternation and capturing

Only one alternative inside a group can match at a time, so the group captures the matching alternative:

```perl
if ("bert" =~ /(cat|dog|bert|ernie)/) {
    print "matched $1\n";   # matched bert
}
```

Sibling groups outside the alternation retain their normal numbering:

```perl
/^(\w+):\s*(yes|no|maybe)$/;   # $1 = the key, $2 = the verdict
```

When the alternatives themselves contain groups, those groups are numbered left to right by opening paren, even across branches:

```perl
/(a)|(b)/;
# On match of 'a': $1 = 'a', $2 undef
# On match of 'b': $1 undef, $2 = 'b'
```

Check with `defined $n`, not truth: an empty capture is different from an absent capture.

## Branch reset: (?|…)

Parallel-capture patterns are the usual reason you pick up `(?|…)`. Inside `(?|…)`, every branch starts numbering its captures at the same slot.
After the group, numbering resumes at one past the maximum across all branches.

```perl
# Without (?|…): need to know which branch matched.
if ($time =~ /(\d\d|\d):(\d\d)|(\d\d)(\d\d)/) {
    my ($h, $m) = ($1, $2);
    ($h, $m) = ($3, $4) unless defined $h;
}

# With (?|…): $1 and $2 come from whichever branch matched.
if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))/) {
    my ($h, $m) = ($1, $2);
}
```

With a trailing fixed piece:

```perl
if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})/) {
    # $1 = hours, $2 = minutes, $3 = zone (numbered after the group)
    print "hour=$1 minute=$2 zone=$3\n";
}
```

Rules inside `(?|…)`:

- Each branch independently numbers its capturing groups from the current group count.
- After the group, the outer numbering continues at one higher than the maximum count reached in any branch.
- Named groups keep their names, and you can repeat a name across branches. Because a name is an alias for a numbered slot, use the same names in the same order in every branch.

Branch reset is the cleanest way to express "parse X in one of several equivalent formats, then reach for the same variables afterwards."

## Alternation in split

`split` takes a regexp pattern, so alternation works there too:

```perl
my @words = split /\s+|-/, "one-two three four-five";
# ('one', 'two', 'three', 'four', 'five')
```

If the separator pattern contains capturing groups, `split` includes the captured text in the output list, which is often surprising. Use `(?:…)` unless you want that:

```perl
split /(?:\s+|-)/, "a-b c";   # ('a', 'b', 'c')
split /(\s+|-)/,   "a-b c";   # ('a', '-', 'b', ' ', 'c')
```

## Summary

- `|` separates alternatives; the leftmost alternative that matches at the current position wins.
- Wrap alternations in `(?:…)` to localise them.
- Prefer character classes over single-character alternations.
- `(?|…)` resets capture numbering across branches. Use it when the branches capture the same conceptual fields.

## See also

- [`perlre`](../../p5/core/perlre): complete alternation semantics.
- [`split`](../../p5/core/perlfunc/split): alternation as separator syntax.
- The [groups and captures](groups-and-captures) chapter: interaction with capture numbering.