Groups and captures#
Grouping does two things, which are easy to confuse: it turns a sequence of pattern elements into a single unit for quantification and alternation, and it saves what that unit matched for later use.
/house(cat|keeper)/; # 'house' followed by 'cat' or 'keeper'
/(ab){3}/; # 'ababab'
/(\d{3})-(\d{4})/; # capture two groups separated by '-'
Capturing groups: $1, $2, …#
Every pair of plain, unescaped parentheses in a pattern forms a capturing
group. After a successful match, the text matched by the nth group
is in $n:
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {
    my ($hours, $minutes, $seconds) = ($1, $2, $3);
}
In list context, a match returns the list of captured strings directly:
my ($h, $m, $s) = $time =~ /(\d\d):(\d\d):(\d\d)/;
If the pattern fails, the list is empty — a useful idiom for “parse or give up”:
my ($h, $m, $s) = $time =~ /(\d\d):(\d\d):(\d\d)/
    or die "not a time: $time";
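The list-context behaviour can be wrapped in a small helper. The following is a minimal sketch; parse_time is a hypothetical name introduced for illustration, and the \A and \z anchors are an added assumption so that trailing junk is rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (hours, minutes, seconds) in list context,
# or an empty list when the text is not a time.
sub parse_time {
    my ($text) = @_;
    return $text =~ /\A(\d\d):(\d\d):(\d\d)\z/;
}

my @ok  = parse_time("12:34:56");   # three captured strings
my @bad = parse_time("noon");       # empty list: the match failed

print "parsed: @ok\n";
print "fields from bad input: ", scalar(@bad), "\n";
```

Note that a pattern with no capturing groups returns the one-element list (1) on success, so this idiom only makes sense when the pattern captures something.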
Nested groups are numbered by the position of their opening (, in
left-to-right order:
/(ab(cd|ef)((gi)|j))/
 1  2      34
$1 captures the outer group, $2 the first inner, $3 the next,
$4 the innermost.
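To see the numbering in action, the sketch below runs the same pattern against two illustrative inputs (the input strings are assumptions chosen for this example).

```perl
use strict;
use warnings;

my $re = qr/(ab(cd|ef)((gi)|j))/;

# All four groups participate here.
my @first = "abcdgi" =~ $re;      # ('abcdgi', 'cd', 'gi', 'gi')

# The 'j' branch matches, so group 4 never participates and is undef.
my @second = "abefj" =~ $re;      # ('abefj', 'ef', 'j', undef)

print join(",", map { $_ // "undef" } @first),  "\n";
print join(",", map { $_ // "undef" } @second), "\n";
```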
Unset capture groups — ones that did not participate in the match —
have $n undefined. Check with defined, not truth:
if ("x" =~ /(a)?(x)/) {
    print "1 is $1\n" if defined $1;   # $1 is undef here
    print "2 is $2\n" if defined $2;
}
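A short sketch of why the test should be defined rather than truth: a group can capture the string "0", which is defined but false. The inputs and the '(unset)' label are illustrative assumptions; the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my @report;
for my $input ("ax", "x") {
    if ($input =~ /(a)?(x)/) {
        # $1 is undef when the optional group was skipped; // supplies a label.
        my $first = $1 // '(unset)';
        push @report, "input=$input group1=$first group2=$2";
    }
}

# A captured "0" is defined, yet false in boolean context.
my ($is_defined, $is_true);
if ("0" =~ /(\d)/) {
    $is_defined = defined $1 ? 1 : 0;
    $is_true    = $1         ? 1 : 0;
}

print "$_\n" for @report;
print "defined=$is_defined true=$is_true\n";
```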
Non-capturing groups#
If you only need the grouping for quantification or alternation, and
don’t want the capture, use (?:…):
/(?:ab){3}/; # 'ababab', no capture
/(?:\d+\.)*\d+/; # a dotted decimal, no captures at all
Non-capturing groups are a small speed win and a larger clarity win. They signal “this grouping is for syntax, not data”. They also prevent renumbering of the capturing groups you do care about:
# match a number — $1 = whole, $2 = optional exponent value
/([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE]([+-]?\d+))?)/;
Without the (?:…) wrappings, $2, $3, $4 would all be set
and the intended $2 (the exponent) would shift to $5.
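As a quick check of the numbering, the sketch below applies the pattern to two sample inputs (the inputs and the %exponent_of hash are assumptions made for this example).

```perl
use strict;
use warnings;

my $number = qr/([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE]([+-]?\d+))?)/;

my %exponent_of;
for my $text ("-12.5e3", ".5") {
    if ($text =~ $number) {
        # $1 is the whole number, $2 the exponent value (undef if absent).
        $exponent_of{$1} = $2;
    }
}

for my $whole (sort keys %exponent_of) {
    printf "%s has exponent %s\n", $whole, $exponent_of{$whole} // "none";
}
```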
The split function also benefits from (?:…). split /(?:\s+)/ separates on
runs of whitespace without inserting the separators into the output;
split /(\s+)/ leaves them in alternating positions.
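The difference is easiest to see side by side. A minimal sketch, with an input string chosen for illustration:

```perl
use strict;
use warnings;

my $line = "alpha  beta\tgamma";

# Non-capturing group: only the fields come back.
my @fields = split /(?:\s+)/, $line;    # ('alpha', 'beta', 'gamma')

# Capturing group: each separator is returned between the fields.
my @with_sep = split /(\s+)/, $line;    # ('alpha', '  ', 'beta', "\t", 'gamma')

print scalar(@fields), " fields, ", scalar(@with_sep), " items with separators\n";
```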
Named captures#
(?<name>…) or (?'name'…) names a group. Its match is accessible
through the hash %+:
if ("2026-04-23" =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    print "year = $+{year}\n";     # 2026
    print "month = $+{month}\n";   # 04
    print "day = $+{day}\n";       # 23
}
Named groups also populate $1, $2, … in the usual left-to-right
order, so code that uses both conventions works. Inside the pattern
itself, reference a named group with \k<name> (or \k'name'):
/(?<quote>["'])(.*?)\k<quote>/; # same quote at start and end
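Combining %+ with \k<name> gives a compact way to pull quoted values out of text. This is a sketch only: the attribute-style input and the group names key, quote and value are assumptions for the example, not a real attribute parser.

```perl
use strict;
use warnings;

my $text = q{title="It's here" lang='en'};

my %attr;
while ($text =~ /(?<key>\w+)=(?<quote>["'])(?<value>.*?)\k<quote>/g) {
    # \k<quote> forces the closing quote to be the same character as the
    # opening one, so the apostrophe inside "It's here" does not end the value.
    $attr{ $+{key} } = $+{value};
}

print "$_ => $attr{$_}\n" for sort keys %attr;
```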
Backreferences#
A backreference in a pattern demands that a later position match the same text an earlier group captured — not the same pattern, the same actual characters.
| Form | Refers to |
|---|---|
| \1 | first capturing group |
| \g1 | same as \1 |
| \g{1} | same; braces disambiguate |
| \g{-1} | immediately previous capturing group (relative) |
| \g{-2} | second-previous capturing group |
| \k<name> | named capturing group |
| \g{name} | same, alternate brace form |
Examples:
# Match a three-letter word followed by a space and the same word.
"the the other day" =~ /\b(\w{3})\s\1\b/; # $1 eq 'the'
# Match a four-letter, three-letter, two-letter, or one-letter
# run followed by itself.
/^(\w{1,4})\1$/; # 'beriberi', 'booboo', 'coco', 'mama', 'papa'
Use \g{…} when digits follow the reference to avoid ambiguity:
/(\d)abc\g{1}23/; # the '1' refers to group 1, '23' is literal
/(\d)abc\123/; # '\123' is the octal escape for character 0x53 ('S'), not group 1
Relative backreferences (\g-1, \g{-2}) refer to the nth-most
recently opened group. They survive when the pattern is embedded
inside another that adds outer groups in front:
my $pair = '([a-z])(\d)\g{-1}\g{-2}'; # a11a, g22g, x33x, ...
# Embed it: outer group shifts numbering by 1, but relative
# backreferences still work:
"code=e99e" =~ /^(\w+)=$pair$/; # matches
Named and relative references keep long patterns correct when pieces are cut and pasted into, or embedded in, other patterns.
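The contrast with absolute numbering can be sketched as follows. The variable names $absolute and $relative are illustrative, and \A and \z are used as anchors here in place of ^ and $.

```perl
use strict;
use warnings;

my $absolute = '([a-z])(\d)\2\1';            # counts from the start of the whole pattern
my $relative = '([a-z])(\d)\g{-1}\g{-2}';    # counts back from the reference itself

my %matched = (
    absolute_alone    => ("e99e"      =~ /\A$absolute\z/        ? 1 : 0),
    relative_alone    => ("e99e"      =~ /\A$relative\z/        ? 1 : 0),
    # Embedded after (\w+), \1 now means the outer group and \2 the letter.
    absolute_embedded => ("code=e99e" =~ /\A(\w+)=$absolute\z/  ? 1 : 0),
    relative_embedded => ("code=e99e" =~ /\A(\w+)=$relative\z/  ? 1 : 0),
);

print "$_: $matched{$_}\n" for sort keys %matched;
```

Only the embedded absolute form fails, because its \1 and \2 silently point at different groups once an outer group is added in front.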
The position arrays: @- and @+#
After a successful match, @- and @+ hold the start and end
offsets of the whole match and of each capture group:
- $-[0], $+[0]: offsets of the whole match.
- $-[n], $+[n]: offsets of the nth capture, or undef if the group did not participate.
my $s = "Mmm...donut, thought Homer";
if ($s =~ /^(Mmm|Yech)\.\.\.(donut|peas)/) {
    for my $i (1 .. $#-) {
        printf "Match %d: %s at (%d,%d)\n",
            $i,
            substr($s, $-[$i], $+[$i] - $-[$i]),
            $-[$i], $+[$i];
    }
}
# Match 1: Mmm at (0,3)
# Match 2: donut at (6,11)
Offsets are often easier than substrings when you need to modify the original string at the matched position.
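One way to use the offsets for in-place edits is four-argument substr. The sketch below wraps each captured span in brackets; it assumes every group participated in the match (otherwise the offsets would be undef), and it works from the last span to the first so that earlier offsets stay valid after each edit.

```perl
use strict;
use warnings;

my $s = "Mmm...donut, thought Homer";

if ($s =~ /^(Mmm|Yech)\.\.\.(donut|peas)/) {
    # Copy the offsets first: @- and @+ belong to the last successful match.
    my @spans = map { [ $-[$_], $+[$_] ] } (1 .. $#-);

    for my $span (reverse @spans) {
        my ($start, $end) = @$span;
        my $text = substr($s, $start, $end - $start);
        substr($s, $start, $end - $start, "[$text]");
    }
}

print "$s\n";   # [Mmm]...[donut], thought Homer
```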
Prematch, match, postmatch#
Perl sets three special scalars after each match that expose the surrounding text:
- $`: everything before the match (the pre-match).
- $&: the match itself.
- $': everything after the match (the post-match).
"the cat caught the mouse" =~ /cat/;
# $` = 'the '
# $& = 'cat'
# $' = ' caught the mouse'
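On Perl 5.20 and later these carry no performance penalty, because the copy-on-write mechanism enabled in that release removed the cost of saving the matched string. Older guidance said to avoid them; it matters only for code that must still run on earlier versions.
The named variants ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH}
also exist. On Perl 5.20 and later they are set whether or not the /p modifier is present
(the /p modifier itself has been a no-op since Perl 5.20).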
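A small sketch using the named variants. The /p modifier is kept here as an assumption about portability: it is harmless on 5.20 and later, and it is what makes these variables available on Perl 5.10 through 5.18.

```perl
use strict;
use warnings;

my $text = "the cat caught the mouse";

my $marked;
if ($text =~ /cat/p) {
    # Rebuild the string with the matched part bracketed.
    $marked = ${^PREMATCH} . "[" . ${^MATCH} . "]" . ${^POSTMATCH};
}

print "$marked\n";   # the [cat] caught the mouse
```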
Alternative numbering across branches: (?|…)#
Parallel alternatives sometimes want to capture into the same
numbered slots regardless of which branch matches. (?|…) reuses
the same group numbers across each alternative inside it:
if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})/) {
    # $1 is hour from whichever branch matched
    # $2 is minute from whichever branch matched
    # $3 is zone, numbered after both branches
    print "hour=$1 minute=$2 zone=$3\n";
}
Without (?|…) you would have to check $1-vs-$3 or use named
captures. It is covered further in the alternation chapter.
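The sketch below runs the branch-reset pattern on one input per branch, then shows what the same pattern returns with an ordinary (?:…) group instead. The input strings are illustrative assumptions; (?|…) needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my @out;
for my $time ("9:30 EST", "0930 EST") {
    if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})/) {
        push @out, "hour=$1 minute=$2 zone=$3";
    }
}

# Same pattern with (?:...) instead: five groups, and the branch that did
# not match leaves its two slots undef.
my @plain = "0930 EST" =~ /(?:(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})/;

print "$_\n" for @out;
print join(",", map { $_ // "undef" } @plain), "\n";
```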
$+ and $^N#
$+ holds the text matched by the highest-numbered capture group that
participated in the match. $^N holds the text matched by the most recently
closed capture group (the one whose closing parenthesis was passed last), which
is the one you want inside a (?{…}) code assertion.
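The two variables differ once groups nest. The following sketch is illustrative only: the inputs are made up, and the package variable @seen is an assumption used to collect values from inside the code assertions.

```perl
use strict;
use warnings;

# $+ : only one alternative matches, and its capture is the highest set group.
my $rev;
if ("Revision: 42" =~ /Version: (.*)|Revision: (.*)/) {
    $rev = $+;                               # '42' ($1 is undef, $2 is '42')
}

# Nested groups: group 3 is highest-numbered, but group 1 closes last.
my ($highest, $last_closed);
if ("ab" =~ /((a)(b))/) {
    ($highest, $last_closed) = ($+, $^N);    # ('b', 'ab')
}

# Inside a code assertion, $^N is the group that has just closed.
our @seen;
my $matched = "ab" =~ /(a)(?{ push @seen, $^N })(b)(?{ push @seen, $^N })/;

print "rev=$rev highest=$highest last_closed=$last_closed seen=@seen\n";
```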
Summary#
- (…) captures; use $1, $2, … or @-, @+.
- (?:…) groups without capturing; prefer this when you don’t need the match.
- (?<name>…) names the capture; access via %+, reference with \k<name>.
- Backreferences are \1, \g{1}, \g{-1}, \k<name>.
- Prematch, match, postmatch in $`, $&, $'.
See also#
- perlre: the full capture semantics, including the \K (keep) assertion.
- perlvar: the full list of capture-related special variables, including %+, %-, $&, $^N.
- The alternation chapter: (?|…) and branch reset.