Alternation
Alternation is the | operator. It picks between two or more
sub-patterns at the same position.
"cats and dogs" =~ /cat|dog|bird/; # matches 'cat'
"cats and dogs" =~ /dog|cat|bird/; # matches 'cat'
The order of the alternatives does not change where the overall pattern matches. The engine still honours “earliest position wins” — both patterns above match at position 0 because that’s the earliest position where any alternative can match.
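One way to check this claim is to record where each match starts. The sketch below is illustrative only: the names `$text` and `@results` are made up for the example, and it relies on the built-in `@-` array (match start offsets) and `$&` (matched text).

```perl
use strict;
use warnings;

my $text = "cats and dogs";
my @results;

# Try the same alternatives in two different orders.
for my $re (qr/cat|dog|bird/, qr/dog|cat|bird/) {
    if ($text =~ $re) {
        # $-[0] is the offset where the match starts; $& is the matched text.
        my ($start, $matched) = ($-[0], $&);
        push @results, [ $start, $matched ];
        print "matched '$matched' at offset $start\n";
    }
}
```

Both iterations report 'cat' at offset 0: 'dog' cannot match at position 0, so the engine falls through to 'cat' there before it ever considers a later position.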
Leftmost alternative wins at a given position
Within a single starting position, alternatives are tried left to right and the first one that succeeds is used:
"cats" =~ /c|ca|cat|cats/; # matches 'c' — first alternative wins
"cats" =~ /cats|cat|ca|c/; # matches 'cats' — first wins, longer
If one alternative is a prefix of another and you want the longer match, put it first. The engine does not look past the first successful alternative at the current position.
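One implication: in a complex pattern, order the alternatives deliberately. Put specific alternatives before broad catch-alls, because a catch-all listed first would shadow them. Among alternatives that cannot shadow one another, putting the likelier ones first can save failed attempts.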
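When the alternatives come from a list of literal words, one possible way to get the "longer first" ordering is to sort by length before joining. This is a minimal sketch, not a prescribed method; `@words`, `$alt`, `$re` and `$longest` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical word list; some entries are prefixes of others.
my @words = qw(c ca cat cats);

# Sort longest first so a short prefix cannot shadow a longer word,
# and quotemeta each entry so it is matched literally.
my $alt = join '|',
          map  { quotemeta($_) }
          sort { length($b) <=> length($a) } @words;
my $re  = qr/(?:$alt)/;

my $longest = "cats" =~ /($re)/ ? $1 : undef;
print "matched '$longest'\n";    # matched 'cats'
```

The `(?:…)` wrapper keeps the generated alternation from leaking into whatever pattern `$re` is later embedded in.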
Grouping vs. alternation precedence
| has very low precedence. It splits the pattern at the outermost
level containing it:
/ab|cd/; # 'ab' OR 'cd'
/^ab|cd$/; # '^ab' OR 'cd$' — probably not what you meant!
/^(ab|cd)$/; # '^' + ('ab' or 'cd') + '$' — what you meant
To constrain alternation to part of a pattern, wrap it in a group.
Non-capturing (?:…) is preferred unless you need the capture:
/house(?:cat|keeper)/; # 'housecat' or 'housekeeper'
/house(cat|keeper)/; # same, but $1 will be 'cat' or 'keeper'
The group creates a local scope for |. Outside the group |
resumes its top-level role:
/^(?:foo|bar|baz)$|^xyz$/; # ('foo'/'bar'/'baz') or 'xyz'
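The precedence difference is easiest to see on inputs that only the ungrouped pattern accepts. In this sketch the names `$loose`, `$tight` and `%seen` are illustrative.

```perl
use strict;
use warnings;

my $loose = qr/^ab|cd$/;       # '^ab' OR 'cd$'
my $tight = qr/^(?:ab|cd)$/;   # the whole string is 'ab' or 'cd'

my %seen;
for my $s ("ab", "cd", "abXYZ", "XYZcd") {
    # Record 1 or 0 for each pattern: [loose, tight].
    $seen{$s} = [ ($s =~ $loose ? 1 : 0), ($s =~ $tight ? 1 : 0) ];
    printf "%-6s loose=%d tight=%d\n", $s, @{ $seen{$s} };
}
```

The ungrouped pattern accepts all four strings, because "abXYZ" satisfies `^ab` and "XYZcd" satisfies `cd$`. The grouped pattern accepts only "ab" and "cd".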
Empty alternatives
An empty alternative matches the empty string — a useful trick for “this or nothing”:
/house(cat|)/; # 'housecat' or 'house'
/(19|20|)\d\d/; # '19xx', '20xx', or just 'xx'
Modern style prefers (?:…)? over (?:…|); they are equivalent,
but the ? form is clearer:
/house(?:cat)?/; # same as house(cat|), no capture
Watch for the backtracking cost when an empty alternative is combined with a quantifier — the engine can re-explore the same position many times. See the performance chapter.
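To see that the two spellings of "this or nothing" behave alike, the following sketch runs both over a few sample strings. The inputs and the array names `@with_empty` and `@with_optional` are assumptions made for the example.

```perl
use strict;
use warnings;

my @inputs = ("housecat", "house", "houseboat");
my (@with_empty, @with_optional);

for my $s (@inputs) {
    # Record the text matched under each spelling.
    push @with_empty,    ($s =~ /(house(?:cat|))/  ? $1 : undef);
    push @with_optional, ($s =~ /(house(?:cat)?)/ ? $1 : undef);
}

print join(", ", @with_empty),    "\n";   # housecat, house, house
print join(", ", @with_optional), "\n";   # housecat, house, house
```

Both forms try 'cat' first and fall back to matching nothing, so "houseboat" yields 'house' either way.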
Alternation inside character classes
Character classes are almost always what you want when alternating
between single characters. /a|b|c/ and /[abc]/ match the same
strings, but [abc] is faster, terser, and clearer:
/a|b|c/; # works, but verbose
/[abc]/; # use this
Alternation is for alternatives longer than one character (or ones that are themselves patterns). When each alternative is a single character, reach for a class.
Alternation and capturing
Only one alternative inside a group can match at a time, so the group captures the matching alternative:
if ("bert" =~ /(cat|dog|bert|ernie)/) {
print "matched $1\n"; # matched bert
}
Sibling groups outside the alternation retain their normal numbering:
/^(\w+):\s*(yes|no|maybe)$/;
# $1 = the key, $2 = the verdict
When the alternatives themselves contain capturing groups, the groups are numbered left to right by opening parenthesis, even across branches:
/(a)|(b)/;
# On match of 'a': $1 = 'a', $2 undef
# On match of 'b': $1 undef, $2 = 'b'
Check with defined $n, not truth — an empty capture is
different from an absent capture.
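The difference between an empty capture and an absent one can be sketched as follows. The pattern and the names `@report`, `$branch` and `$truthy` are made up for illustration.

```perl
use strict;
use warnings;

my @report;
for my $s ("x", "b") {
    if ($s =~ /^(a*)x|^(b)/) {
        # For "x", group 1 took part in the match but captured the empty
        # string: defined, yet false. For "b", group 1 is undef.
        my $branch = defined $1 ? 1 : 2;
        my $truthy = $1 ? "true" : "false";
        push @report, "$s: branch $branch, \$1 is $truthy";
    }
}
print "$_\n" for @report;
# x: branch 1, $1 is false
# b: branch 2, $1 is false
```

A truth test reports "false" in both cases, so only `defined` tells the two branches apart.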
Branch reset: (?|…)
Parallel-capture patterns are the usual reason you pick up
(?|…). Inside (?|…), every branch starts numbering its captures
at the same slot. After the group, numbering resumes at one past
the maximum across all branches.
# Without (?|…): need to know which branch matched.
if ($time =~ /(\d\d|\d):(\d\d)|(\d\d)(\d\d)/) {
my ($h, $m) = ($1, $2);
($h, $m) = ($3, $4) unless defined $h;
}
# With (?|…): $1 and $2 come from whichever branch matched.
if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))/) {
my ($h, $m) = ($1, $2);
}
With a trailing fixed piece:
if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})/) {
# $1 = hours, $2 = minutes, $3 = zone (numbered after the group)
print "hour=$1 minute=$2 zone=$3\n";
}
Rules inside (?|…):
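Each branch numbers its capturing groups independently, starting from the group count in effect where (?|…) opens.
After the group, the outer numbering continues at one higher than the maximum count reached in any branch.
Named groups keep their names; you can repeat a name across branches. Because a name is an alias for a numbered slot, it is safest to use the same names in the same order in every branch.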
Branch reset is the cleanest way to express “parse X in one of several equivalent formats, then reach for the same variables afterwards.”
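As a sketch of that idea, the time patterns above can be wrapped in a small helper. `parse_time` is a hypothetical function, not part of any library, and the anchors and numeric conversion are choices made for this example. Branch reset needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# parse_time is a hypothetical helper: it accepts "H:MM", "HH:MM" or
# "HHMM", followed by whitespace and a three-letter zone.
sub parse_time {
    my ($time) = @_;
    return unless $time =~ /^(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})$/;
    # Both branches fill $1 and $2; the zone is $3, one past the
    # highest number used inside the (?|...) group.
    return ($1 + 0, $2 + 0, $3);
}

my @colon   = parse_time("9:05 UTC");       # (9, 5, 'UTC')
my @compact = parse_time("0905 UTC");       # (9, 5, 'UTC')
my @bad     = parse_time("nine o'clock");   # empty list

print "@colon\n@compact\n";
```

The caller never needs to know which format matched; it reads the same three values either way.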
Alternation in split
split takes a regexp pattern, so alternation works there too:
my @words = split /\s+|-/, "one-two three four-five";
# ('one', 'two', 'three', 'four', 'five')
If the separator pattern contains capturing groups, split includes
the captured text in the output list — often surprising. Use
(?:…) unless you want that:
split /(?:\s+|-)/, "a-b c"; # ('a', 'b', 'c')
split /(\s+|-)/, "a-b c"; # ('a', '-', 'b', ' ', 'c')
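If you do want the separators, a capturing group hands them back interleaved with the fields. The following sketch separates the two again; `$line`, `@fields`, `@pieces` and `@seps` are illustrative names.

```perl
use strict;
use warnings;

my $line = "one-two three  four-five";

my @fields = split /\s+|-/,   $line;   # separators discarded
my @pieces = split /(\s+|-)/, $line;   # separators kept in the list

# With a single capturing group, fields sit at even indices and
# separators at odd indices.
my @seps = @pieces[ grep { $_ % 2 } 0 .. $#pieces ];

print scalar(@fields), " fields, ", scalar(@seps), " separators\n";
# 5 fields, 4 separators
```

This index arithmetic assumes exactly one capturing group in the separator pattern. With more groups, each separator contributes more than one list element.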
Summary
| separates alternatives; the leftmost alternative that matches at the current position wins.
Wrap alternations in (?:…) to localise them.
Prefer character classes over single-character alternations.
(?|…) resets capture numbering across branches; use it when the branches capture the same conceptual fields.
See also
perlre: complete alternation semantics.
split: alternation as separator syntax.
The groups and captures chapter: interaction with capture numbering.