Regex match variables

After every successful pattern match, Perl populates a fixed set of variables with information about what matched and where. These are read-only — you observe them, you do not write to them. The «successful» qualification is critical: an unsuccessful match leaves the variables holding whatever the previous successful match in the same dynamic scope set them to. Always gate your access with the boolean result of the match itself.

Variable         Holds
$1..$N           Text captured by the Nth capturing group
$&               The entire matched substring
$`               The string preceding the match
$'               The string following the match
$+               Text captured by the highest-numbered group
$^N              Text captured by the most recently closed group
@-               Start offsets: $-[0] = match, $-[N] = $N start
@+               End offsets: $+[0] = match end, $+[N] = $N end
%+               Hash of named captures: $+{name} = (?<name>...)
%-               Hash of all captures by name (arrayref values)
@{^CAPTURE}      Array of captures: ${^CAPTURE}[0] = $1, etc.
${^MATCH}        Same as $&, populated only with the /p flag
${^PREMATCH}     Same as $`, only with /p
${^POSTMATCH}    Same as $', only with /p

The basic pattern

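if ("Mr. Smith, age 47" =~ /(\w+)\.?\s+(\w+),\s+age\s+(\d+)/) {
    # \.? lets the title carry a trailing period; without it the pattern
    # cannot get past the "." after "Mr" and the match fails.
    print "title: $1\n";          # Mr
    print "name:  $2\n";          # Smith
    print "age:   $3\n";          # 47
    print "match: $&\n";          # Mr. Smith, age 47
}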

The if is what makes the access safe. Without it, on a non-matching string, $1/$& would still hold values from some earlier successful match — there is no «match failed → clear» rule.
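A minimal sketch of that failure mode, with made-up input strings and variable names (@report, $stale) chosen only for illustration. The loop gates every read of $1 on the match result; the last three statements show what an ungated read would pick up.

```perl
use strict;
use warnings;

my @report;
for my $line ("id=42", "no id here") {
    # Gate on the match result: $1 is only trusted inside the true branch.
    if ($line =~ /id=(\d+)/) {
        push @report, "$line: id $1";
    } else {
        push @report, "$line: no match";
    }
}

# The failure mode the gate prevents: a failed match leaves $1 untouched.
"id=42"      =~ /id=(\d+)/;   # succeeds, $1 is '42'
"no id here" =~ /id=(\d+)/;   # fails, $1 is not cleared
my $stale = $1;               # still '42', left over from the earlier match

print "$_\n" for @report;
print "stale: $stale\n";
```

The stale value looks perfectly plausible, which is why this bug tends to survive casual testing.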

Numbered captures — $1..$N

Each (...) capturing group fills one variable. The numbering is left-to-right by opening parenthesis:

"alpha-beta=42" =~ /^(\w+)-(\w+)=(\d+)$/;
print "$1 / $2 / $3\n";           # alpha / beta / 42

Non-capturing groups (?:...) and lookarounds (?=...)/(?!...) do not consume a number; they are invisible to the count.
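The counting rule is easiest to see with nested groups. The following sketch uses an illustrative date string and variable names that are not part of the original text; the outer group opens first and therefore becomes $1, while the (?:...) group is skipped.

```perl
use strict;
use warnings;

my ($year_month, $year, $day);
#                     1 2        (?: not counted)  3
if ("2024-03-15" =~ /((\d{4})-(?:\d{2}))-(\d{2})/) {
    $year_month = $1;   # outer group, opened first
    $year       = $2;   # inner group, opened second
    $day        = $3;   # the non-capturing month group took no number
}
print "$year_month / $year / $day\n";   # 2024-03 / 2024 / 15
```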

For s/// substitutions, $1..$N are visible inside the replacement. The sed-style \1 form is also accepted there for historical reasons, but $1 is the preferred spelling:

my $s = "John Smith";
$s =~ s/(\w+)\s+(\w+)/$2, $1/;    # "Smith, John"

Named captures — (?<name>...) and %+

Named captures are clearer than counting parentheses, especially in patterns with many groups:

my $log = "2024-03-15 14:32:01 ERROR connection refused";
if ($log =~ /^(?<date>\d{4}-\d{2}-\d{2})
              \s+
              (?<time>\d{2}:\d{2}:\d{2})
              \s+
              (?<level>\w+)
              \s+
              (?<msg>.*)$/x) {
    print "[$+{level}] $+{date} $+{time}: $+{msg}\n";
}

%+ is the named-capture hash. The numbered forms still work (named captures are also assigned numbers in the order they appear), so $1 would be $+{date} here.
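Because %+ is tied to the current match state, a later successful match replaces its contents. One possible way to keep the fields around is to copy the hash immediately after the match. This is a sketch only; the log line and the %fields name are illustrative assumptions.

```perl
use strict;
use warnings;

my %fields;
if ("2024-03-15 ERROR disk full" =~ /^(?<date>\S+)\s+(?<level>\w+)\s+(?<msg>.*)$/) {
    %fields = %+;          # snapshot: plain hash, safe from later matches
}

"unrelated" =~ /(?<other>\w+)/;   # %+ now describes this match instead

print join(", ", map { "$_=$fields{$_}" } sort keys %fields), "\n";
# date=2024-03-15, level=ERROR, msg=disk full
```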

%- is similar but its values are array references, holding every capture under that name (relevant when using branch-reset groups (?|...|...) or when a name is reused across alternation branches):

"abc" =~ /(?|(?<x>a)|(?<x>b)|(?<x>c))/;
print "%-{x} has @{$-{x}}\n";    # captured value(s) under name 'x'

Most code only ever reads %+. %- is for the corner cases.
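To illustrate the difference, the sketch below reuses each name twice within one pattern. The names A and B and the %all hash are illustrative only. %+ reports a single value per name, while %- keeps one array slot per group that carries the name.

```perl
use strict;
use warnings;

my (%all, $first_a);
if ('1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
    $first_a = $+{A};                      # %+ gives one value per name
    for my $name (sort keys %-) {
        $all{$name} = [ @{ $-{$name} } ];  # %- gives every capture under it
    }
}

print "\$+{A} = $first_a\n";                       # $+{A} = 1
print "$_: @{ $all{$_} }\n" for sort keys %all;    # A: 1 3, then B: 2 4
```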

Match boundaries — @- and @+

The match’s start and end offsets in the matched string:

"hello world" =~ /(\w+)\s+(\w+)/;
print "match started at $-[0], ended at $+[0]\n";   # 0 .. 11
print "group 1: $-[1] .. $+[1]\n";                  # 0 .. 5
print "group 2: $-[2] .. $+[2]\n";                  # 6 .. 11

$N holds the same text that substr($var, $-[N], $+[N] - $-[N]) would extract. The offsets are how you recover positions, not just values, after a match.

The classic use: building a new string around a match while leaving the original untouched (and without the s/// operator):

my $s = "the quick brown fox";
$s =~ /quick (brown)/;
my $before = substr($s, 0, $-[0]);
my $after  = substr($s, $+[0]);
my $g1     = substr($s, $-[1], $+[1] - $-[1]);
print "$before|FOUND $g1|$after\n";
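The same offsets can also be collected across repeated matches. As a sketch, with an illustrative input string and a hypothetical @spans array, a /g loop can record the start and end of every word:

```perl
use strict;
use warnings;

my $text = "one two three";
my @spans;
while ($text =~ /\w+/g) {
    # $-[0] and $+[0] describe the match that just succeeded.
    push @spans, [ $-[0], $+[0] ];
}

print join(", ", map { "$_->[0]..$_->[1]" } @spans), "\n";
# 0..3, 4..7, 8..13
```

Each pair is a half-open range: the end offset points one past the last matched character, so the length is simply end minus start.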

$&, $`, $' — match, prematch, postmatch

"hello world" =~ /wo\w+/;
print "before: '$`'\n";          # 'hello '
print "match:  '$&'\n";          # 'world'
print "after:  '", $', "'\n";    # ''

These three are Perl’s oldest match variables and historically the most expensive — see Performance: the $& story below.

$+ is the text captured by the highest-numbered group that participated in the match:

"abc" =~ /(a)(b)?(c)?/;          # $1='a', $2='b', $3='c', $+='c'
"a"   =~ /(a)(b)?(c)?/;          # $1='a', $2=undef, $3=undef, $+='a'
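One situation where $+ earns its keep is an alternation in which each branch has its own group and you do not care which branch matched. A small sketch, with illustrative input lines and a hypothetical @found array:

```perl
use strict;
use warnings;

my @found;
for my $line ("Version: 5.38", "Revision: r1204", "Author: nobody") {
    # Only one of the two groups participates in any given match;
    # $+ picks up whichever one did.
    if ($line =~ /Version: (.*)|Revision: (.*)/) {
        push @found, $+;
    }
}

print "@found\n";   # 5.38 r1204
```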

$^N is similar but holds the most-recently-closed group’s text — useful inside complex patterns that need the value of «the group that just finished»:

my $s = "tag:value";
my $cb;
$s =~ /(\w+):(\w+)(?{ $cb = $^N })/;   # no literal space before the code block
# $cb is the text most recently captured (here: "value")

Performance: the $& story

Historical caveat that you will find in old code and in books:

Don’t mention $&, $`, or $' anywhere in your code, including in modules you require — they cause every successful match to copy the whole matched string, slowing the program down.

This was true through Perl 5.16. From Perl 5.18 onward, the runtime tracks which of the three variables your code actually mentions and only copies what is needed. From Perl 5.20 a copy-on-write scheme makes them effectively free.

PetaPerl follows the modern behaviour: $& and friends are safe to use anywhere. The /p flag and ${^MATCH} / ${^PREMATCH} / ${^POSTMATCH} exist for the era between 5.10 and 5.20 when /p-gated copying was a useful optimisation. There is no reason to write new code with them.

Scoping — they are dynamically scoped

The match variables are not lexicals; they behave like dynamically-scoped variables. Every successful match localises a global match state to the current dynamic scope. Crucially:

  • An unsuccessful match does not clear the variables.

  • A successful match inside an inner block overrides them, but only within that block — when the inner block exits, the outer scope’s match state is restored.

"alpha" =~ /(\w+)/;              # $1 = 'alpha'
{
    "1234" =~ /(\d+)/;           # $1 = '1234' (inside this block)
    print "inner: $1\n";          # 1234
}
print "outer: $1\n";              # alpha — restored

This has a useful consequence: a function you call from inside a regex-handling block does not clobber your $1, even when it runs its own successful matches. Those matches belong to the function's own scope, and your match state is back in place once the call returns.
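A short sketch of that guarantee. The helper first_word_lc is hypothetical and exists only to perform a successful match of its own; the caller reads $1 before and after the call.

```perl
use strict;
use warnings;

# Hypothetical helper that runs its own successful match internally.
sub first_word_lc {
    my ($text) = @_;
    return $text =~ /(\w+)/ ? lc $1 : '';
}

my ($mine_before, $helper_result, $mine_after);
if ("Alpha-Beta" =~ /(\w+)-(\w+)/) {
    $mine_before   = $1;                       # 'Alpha'
    $helper_result = first_word_lc("GAMMA");   # helper sets its own $1
    $mine_after    = $1;                       # the caller's $1 again
}

print "$mine_before / $helper_result / $mine_after\n";   # Alpha / gamma / Alpha
```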

${^LAST_SUCCESSFUL_PATTERN}

A read-only reference to the regex that produced the current match state — useful for diagnostics:

"hello" =~ /(\w+)/;
print "last pattern was: ${^LAST_SUCCESSFUL_PATTERN}\n";

When matches fail — pos

The current position in a string after a /g match is held not in one of the variables on this page, but in pos:

my $s = "1 2 3 4";
while ($s =~ /(\d+)/g) {
    print "matched $1 at ", pos($s) - length($1), "\n";
}
# After the loop, pos($s) is undef.

pos is per-string, settable, and is what \G anchors to.
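As an illustration of pos and \G working together, here is a minimal tokenizer sketch. The token names, the input string, and the use of the /c flag (which keeps pos in place when a match fails) are choices made for this example rather than anything prescribed above.

```perl
use strict;
use warnings;

my $src = "x=42";
my @tokens;

pos($src) = 0;                       # pos is settable: start at the beginning
while (pos($src) < length $src) {
    # \G anchors each attempt at the current pos; /c keeps pos on failure
    # so the next alternative starts from the same offset.
    if    ($src =~ /\G([a-z]+)/gc) { push @tokens, "IDENT($1)" }
    elsif ($src =~ /\G(\d+)/gc)    { push @tokens, "NUM($1)"   }
    elsif ($src =~ /\G(=)/gc)      { push @tokens, "OP($1)"    }
    else  { die "unexpected character at offset " . pos($src) . "\n" }
}

print "@tokens\n";   # IDENT(x) OP(=) NUM(42)
```

Without /c, the first failed alternative would reset pos to undef and the loop condition would no longer be meaningful.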

@{^CAPTURE} — captures as an array

The numbered captures, also exposed as a zero-indexed array:

"alpha=42" =~ /(\w+)=(\d+)/;
print "name = ${^CAPTURE}[0]\n"; # 'alpha' (same as $1)
print "val  = ${^CAPTURE}[1]\n"; # '42'    (same as $2)
print "n    = ", scalar(@{^CAPTURE}), "\n";  # 2

This is occasionally easier to iterate than $1, $2, ..., but most code uses the numbered or named forms directly.
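A sketch of the iteration case, assuming an interpreter recent enough to provide @{^CAPTURE} (Perl 5.26 or later in stock Perl). The input string and the @pairs name are illustrative. The captures are copied into an ordinary array first, then consumed two at a time.

```perl
use strict;
use warnings;

my @pairs;
if ("host=example port=8080" =~ /(\w+)=(\w+)\s+(\w+)=(\w+)/) {
    my @caps = @{^CAPTURE};              # ($1, $2, $3, $4) as a plain list
    while (my ($key, $value) = splice(@caps, 0, 2)) {
        push @pairs, "$key => $value";
    }
}

print "$_\n" for @pairs;
# host => example
# port => 8080
```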

See also

  • m//, s/// — the operators that populate every variable on this page.

  • qr// — compiles a pattern; the resulting regex object can later be matched against and produce these same captures.

  • pos — the per-string offset for /g and \G.

  • Regex binding (=~), the operator that decides which string the match runs against.

  • Regular expressions guide — the regex language itself.

  • Groups and captures — the chapter on the capture variables.