Groups and captures

Grouping does two things, which are easy to confuse: it turns a sequence of pattern elements into a single unit for quantification and alternation, and it saves what that unit matched for later use.

/house(cat|keeper)/;       # 'house' followed by 'cat' or 'keeper'
/(ab){3}/;                 # 'ababab'
/(\d{3})-(\d{4})/;         # capture two groups separated by '-'

Capturing groups: $1, $2, …

Every pair of plain, unescaped parentheses in a pattern (that is, one that does not begin with (?) forms a capturing group. After a successful match, the text matched by the nth group is in $n:

if ($time =~ /(\d\d):(\d\d):(\d\d)/) {
    my ($hours, $minutes, $seconds) = ($1, $2, $3);
}

In list context, a match returns the list of captured strings directly:

my ($h, $m, $s) = $time =~ /(\d\d):(\d\d):(\d\d)/;

If the pattern fails, the list is empty — a useful idiom for “parse or give up”:

my ($h, $m, $s) = $time =~ /(\d\d):(\d\d):(\d\d)/
    or die "not a time: $time";

Nested groups are numbered by the position of their opening (, in left-to-right order:

/(ab(cd|ef)((gi)|j))/
 1  2      34

$1 is the whole outer group, $2 is (cd|ef), $3 is ((gi)|j), and $4 is the innermost (gi).
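As a quick check of the numbering rule, the sketch below runs that pattern against a sample string and prints each capture. The input "abcdgi" and the variable name @groups are chosen here for illustration only.

```perl
use strict;
use warnings;

# In list context the match returns ($1, $2, $3, $4).
# Groups are numbered by the position of their opening parenthesis.
my @groups = "abcdgi" =~ /(ab(cd|ef)((gi)|j))/;

# Prints:
#   $1 = abcdgi
#   $2 = cd
#   $3 = gi
#   $4 = gi
printf "\$%d = %s\n", $_ + 1, $groups[$_] for 0 .. $#groups;
```

Here $3 and $4 hold the same text because the first alternative of group 3 is itself group 4.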

A capture group that did not participate in the match leaves its $n undefined. Test with defined rather than truthiness, because a capture of "0" or the empty string is false but still set:

if ("x" =~ /(a)?(x)/) {
    print "1 is $1\n" if defined $1;   # $1 is undef here
    print "2 is $2\n" if defined $2;
}

Non-capturing groups

If you only need the grouping for quantification or alternation, and don’t want the capture, use (?:…):

/(?:ab){3}/;            # 'ababab', no capture
/(?:\d+\.)*\d+/;        # a dotted decimal, no captures at all

Non-capturing groups are a small speed win and a larger clarity win. They signal “this grouping is for syntax, not data”. They also prevent renumbering of the capturing groups you do care about:

# match a number — $1 = whole, $2 = optional exponent value
/([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE]([+-]?\d+))?)/;

Without the (?:…) wrappings, every group would capture: $2, $3 and $4 would be taken by the inner groups, and the exponent you actually want would move from $2 to $5.
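A minimal sketch of that pattern in use, assuming two sample inputs picked for illustration ("-12.5e+3" and "42"); the names $number, $whole and $exponent are not from the original text.

```perl
use strict;
use warnings;

# Same pattern as above: $1 is the whole number, $2 the exponent value.
my $number = qr/([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE]([+-]?\d+))?)/;

my @results;
for my $input ("-12.5e+3", "42") {
    my ($whole, $exponent) = $input =~ $number;
    push @results, [ $whole, $exponent ];

    # The exponent group is undef when the input has no exponent part.
    printf "%-9s whole=%s exponent=%s\n",
        $input, $whole, defined $exponent ? $exponent : '(none)';
}
```

Only two values come back from the match, however many (?:…) groups the pattern uses internally.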

split also benefits from (?:…). split /(?:\s+)/ separates on runs of whitespace and returns only the fields; split /(\s+)/ also returns each separator, interleaved between the fields.
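The difference is easiest to see side by side. This is a small sketch with a made-up input line; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "alpha  beta\tgamma";

# Non-capturing separator: only the fields come back.
my @fields = split /(?:\s+)/, $line;    # ('alpha', 'beta', 'gamma')

# Capturing separator: each separator is returned between the fields.
my @with_ws = split /(\s+)/, $line;     # ('alpha', '  ', 'beta', "\t", 'gamma')

print scalar(@fields), " fields, ", scalar(@with_ws), " items with separators\n";
```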

Named captures

(?<name>…) or (?'name'…) names a group. Its match is accessible through the hash %+:

if ("2026-04-23" =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    print "year = $+{year}\n";    # 2026
    print "month = $+{month}\n";  # 04
    print "day = $+{day}\n";      # 23
}

Named groups also populate $1, $2, … in the usual left-to-right order, so code that uses both conventions works. Inside the pattern itself, reference a named group with \k<name> (or \k'name'):

/(?<quote>["'])(.*?)\k<quote>/;   # same quote at start and end
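To illustrate the quote-matching pattern, here is a sketch that names both groups and reads them back through %+ and through $1/$2. The second group name (body) and the sample strings are assumptions added for this example.

```perl
use strict;
use warnings;

# \k<quote> must match the same character that (?<quote>...) captured.
my $quoted = qr/(?<quote>["'])(?<body>.*?)\k<quote>/;

my @bodies;
for my $text (q{say "it's fine"}, q{say 'a "b" c'}) {
    if ($text =~ $quoted) {
        push @bodies, $+{body};
        # Named groups are also available by number.
        print "quote=$+{quote} body=$+{body} (\$1=$1, \$2=$2)\n";
    }
}
```

In both strings the other kind of quote inside the body does not end the match, because the backreference only accepts the opening quote character.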

Backreferences

A backreference in a pattern demands that a later position match the same text an earlier group captured — not the same pattern, the same actual characters.

Form        Refers to
\1          first capturing group
\g1         same as \1
\g{1}       same as \1; the braces disambiguate when digits follow
\g-1        immediately previous capturing group (relative)
\g{-2}      second-previous capturing group
\k<name>    named capturing group
\k{name}    same, alternate brace form

Examples:

# Match a three-letter word followed by a space and the same word.
"the the other day" =~ /\b(\w{3})\s\1\b/;   # $1 eq 'the'

# Match a four-letter, three-letter, two-letter, or one-letter
# run followed by itself.
/^(\w{1,4})\1$/;    # 'beriberi', 'booboo', 'coco', 'mama', 'papa'

Use \g{…} when digits follow the reference to avoid ambiguity:

/(\d)abc\g{1}23/;    # '\g{1}' refers to group 1, '23' is literal
/(\d)abc\123/;       # '\123' is the octal escape for 'S' (0x53), not group 1

Relative backreferences count backwards from the point of reference: \g{-1} (or \g-1) is the group opened most recently before it, \g{-2} the one before that. They keep working when the pattern is embedded inside another that adds outer groups in front:

my $pair = '([a-z])(\d)\g{-1}\g{-2}';   # a11a, g22g, x33x, ...

# Embed it: outer group shifts numbering by 1, but relative
# backreferences still work:
"code=e99e" =~ /^(\w+)=$pair$/;   # matches

Named and relative references make long patterns robust against cut-and-paste.

The position arrays: @- and @+

After a successful match, @- and @+ hold the start and end offsets of the whole match and of each capture group:

  • $-[0], $+[0] — offsets of the whole match.

  • $-[n], $+[n] — offsets of the nth capture, or undef if the group did not participate.

my $s = "Mmm...donut, thought Homer";
if ($s =~ /^(Mmm|Yech)\.\.\.(donut|peas)/) {
    for my $i (1 .. $#-) {
        printf "Match %d: %s at (%d,%d)\n",
               $i,
               substr($s, $-[$i], $+[$i] - $-[$i]),
               $-[$i], $+[$i];
    }
}
# Match 1: Mmm at (0,3)
# Match 2: donut at (6,11)

Offsets are often easier than substrings when you need to modify the original string at the matched position.
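A minimal sketch of in-place modification by offset, reusing the string from the example above. Upper-casing the second capture is an arbitrary edit chosen for illustration, and $start and $end are hypothetical names.

```perl
use strict;
use warnings;

my $s = "Mmm...donut, thought Homer";

if ($s =~ /^(Mmm|Yech)\.\.\.(donut|peas)/) {
    # Copy the offsets of group 2 first: @- and @+ always describe
    # the most recent successful match.
    my ($start, $end) = ($-[2], $+[2]);

    # Four-argument substr replaces that span of the original string.
    substr($s, $start, $end - $start, uc substr($s, $start, $end - $start));
}

print "$s\n";   # Mmm...DONUT, thought Homer
```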

Prematch, match, postmatch

Perl sets three special scalars after each match that expose the surrounding text:

  • $` — everything before the match (the pre-match).

  • $& — the match itself.

  • $' — everything after the match (the post-match).

"the cat caught the mouse" =~ /cat/;
# $`   = 'the '
# $&   = 'cat'
# $'   = ' caught the mouse'

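Since Perl 5.20, which enabled copy-on-write strings by default, these carry essentially no performance penalty. Older guidance said to avoid them because on earlier versions their mere presence slowed down every match in the program; on current Perl that guidance no longer applies.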

The named variants ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} were added in Perl 5.10 together with the /p modifier. From Perl 5.20 on they are set whether or not /p is present, and /p itself is a no-op; on older versions they are only guaranteed to be set when /p is used.
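If code has to run on Perl versions before 5.20, one portable option is to use the named variables together with /p. A sketch, assuming Perl 5.10 or later; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "the cat caught the mouse";
my ($pre, $match, $post);

# /p is harmless on 5.20+ and needed on 5.10 through 5.18.
if ($text =~ /cat/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "[$pre][$match][$post]\n";   # [the ][cat][ caught the mouse]
```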

Alternative numbering across branches: (?|…)

Parallel alternatives sometimes want to capture into the same numbered slots regardless of which branch matches. (?|…) reuses the same group numbers across each alternative inside it:

if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})/) {
    # $1 is hour from whichever branch matched
    # $2 is minute from whichever branch matched
    # $3 is zone, numbered after both branches
    print "hour=$1 minute=$2 zone=$3\n";
}

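Without (?|…) the second branch's groups would be numbered $3 and $4 and the zone $5, so you would have to test whether $1 or $3 was set, or switch to named captures. Branch reset is covered further in the alternation chapter.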
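The following sketch runs the branch-reset pattern against one input for each branch. The two sample strings and the names $re and @parsed are assumptions made for the example.

```perl
use strict;
use warnings;

my $re = qr/(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z]{3})/;

my @parsed;
for my $time ("9:30 UTC", "0930 EST") {
    # Either branch fills the same three slots.
    if (my ($hour, $minute, $zone) = $time =~ $re) {
        push @parsed, "$hour/$minute/$zone";
    }
}

# Prints:
#   9/30/UTC
#   09/30/EST
print "$_\n" for @parsed;
```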

$+ and $^N

$+ holds the text of the highest-numbered capture group that took part in the match. $^N holds the text of the most recently closed capture group (the one whose closing parenthesis was passed last), which is the one you want inside a (?{…}) code assertion.
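A short sketch of both variables. The first half follows the Version/Revision idiom from perlvar; the second half uses (?{…}) blocks in a literal pattern and assumes Perl 5.18 or later, where lexical variables inside such blocks behave reliably. The sample strings and the names @revs and @seen are illustrative.

```perl
use strict;
use warnings;

# $+ : whichever alternative matched, its capture is the highest one set.
my @revs;
for my $line ("Version: 5.36", "Revision: 1042") {
    push @revs, $+ if $line =~ /Version: (.*)|Revision: (.*)/;
}

# $^N : inside a code block, the text of the group that has just closed.
my @seen;
"2026-04" =~ /(\d+)(?{ push @seen, $^N })-(\d+)(?{ push @seen, $^N })/;

print "@revs | @seen\n";   # 5.36 1042 | 2026 04
```

Code blocks run as the engine passes them, so a pattern that backtracks can execute a block more than once; this pattern matches without backtracking past either block.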

Summary

  • (…) captures; use $1, $2, … or @-, @+.

  • (?:…) groups without capturing; prefer this when you don’t need the match.

  • (?<name>…) names the capture; access via %+, reference with \k<name>.

  • Backreferences are \1, \g{1}, \g{-1}, \k<name>.

  • Prematch, match, postmatch in $`, $&, $'.

See also

  • perlre — the full capture semantics including the \K (keep) assertion.

  • perlvar — the full list of capture-related special variables, including %+, %-, $&, $^N.

  • The alternation chapter — (?|…) and branch reset.