Regular expressions and pattern matching
m//
Search a string for a pattern and report whether — and what — it matched.
m// is the match operator. It compiles PATTERN as a regular
expression (see perlre), runs it against a target string,
and returns a value shaped by calling context and by the modifiers
you apply. The target is whatever sits on the left of
=~ or !~; without a binding operator the
target is $_. The leading m is optional when the
delimiter is /, so /PATTERN/ and m/PATTERN/ mean the same
thing.
Synopsis
$str =~ m/PATTERN/flags
$str =~ /PATTERN/flags
m/PATTERN/flags # target is $_
/PATTERN/flags # target is $_
$str =~ m{PATTERN}flags # any paired non-word delimiters
What you get back
Context decides the shape of the return value.
Scalar context, no /g: 1 on match, the empty string on no match. Both are usable as booleans; the empty string is a dual-value false that also equals 0 numerically.
List context, no /g: the list of capture values ($1, $2, $3, …) on a successful match; if the pattern has no capture groups, the singleton (1); on failure, the empty list. This is how if (my ($x, $y) = $s =~ /(\w+)=(\w+)/) works.
Scalar context, /g: each call advances through the string, returning true for the next match and false once there are no more. pos on the target tracks where the next attempt will begin.
List context, /g: every match in one shot. With capture groups, the flat list of all captures from every match. Without captures, the list of every full match.
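The four cases above can be sketched side by side. This is a minimal illustration, not an exhaustive test; the sample string and variable names are made up for the example.

```perl
use strict;
use warnings;

my $s = "a=1 b=2";

# Scalar context, no /g: 1 on success, the empty string on failure.
my $hit  = $s =~ /b=/;
my $miss = $s =~ /z=/;

# List context, no /g: the captures, or (1) when there are no groups.
my @pair  = $s =~ /(\w)=(\d)/;    # ("a", "1")
my @plain = $s =~ /a=/;           # (1)
my @none  = $s =~ /z=/;           # ()

# Scalar context with /g: one match per call, pos() advances each time.
my @offsets;
while ($s =~ /\d/g) {
    push @offsets, pos($s);       # 3, then 7
}

# List context with /g: every match at once.
my @all_caps = $s =~ /(\w)=(\d)/g;    # ("a", "1", "b", "2")
my @all_full = $s =~ /\w=\d/g;        # ("a=1", "b=2")
```

Assigning to a scalar or to an array is what selects the context here; the pattern itself is the same in each pair.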
Successful matches also populate the regex special variables
($1, $2, and so on, plus $&,
$`, $', $+,
%+, %-) for the enclosing dynamic
scope. A failed match leaves them holding their previous values, so
always test the match itself, never a capture variable, to decide
whether a match happened.
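A short sketch of why the capture variable is the wrong thing to test. The strings and variable names are invented for the illustration.

```perl
use strict;
use warnings;

# First match succeeds and sets $1.
my $first = "id=42" =~ /id=(\d+)/;

# Second match fails; $1 is left untouched.
my $second = "nothing useful" =~ /id=(\d+)/;
my $stale  = $1;    # still "42", left over from the earlier success

# Safer: read the capture only when the match itself succeeded.
my $id;
if ("nothing useful" =~ /id=(\d+)/) {
    $id = $1;
}
# $id is still undef here, which is the honest answer.
```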
Global state it touches
$1, $2, …: numbered captures, set on success, unchanged on failure.
$+: the highest-numbered capture that actually matched (useful with alternations).
${^LAST_SUCCESSFUL_PATTERN}: the last pattern that matched in the current dynamic scope; also the pattern the empty form m// reuses (see Edge cases).
pos on the target string: read and updated by /g matching; reset on failure unless /c is also set.
Locale / Unicode rule sources when /l, /u, or /d are in effect.
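The effect of /c on pos can be seen by recording pos after each attempt. A minimal sketch with made-up strings; 'undef' is just a printable stand-in for an unset position.

```perl
use strict;
use warnings;

# Plain /g: the third attempt fails and pos() is reset.
my $s = "aab";
my @trace;
for my $attempt (1 .. 3) {
    my $ok = $s =~ /a/g;
    push @trace, pos($s) // 'undef';
}
# @trace is (1, 2, 'undef')

# /gc: the failed attempt leaves pos() where it was.
my $t = "aab";
my @trace_c;
for my $attempt (1 .. 3) {
    my $ok = $t =~ /a/gc;
    push @trace_c, pos($t) // 'undef';
}
# @trace_c is (1, 2, 2)
```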
Delimiters
With m, any pair of non-whitespace characters works as the
delimiter, and bracketing pairs nest:
m/pattern/
m{pattern}
m[pattern]
m(pattern)
m<pattern>
m!pattern!
m#pattern#
m,pattern,
Picking a delimiter that does not appear in the pattern avoids
backslash-clutter — known as LTS, leaning toothpick syndrome. A
path-matching pattern reads cleanly with m{…} or m!…! and badly
with m/…/.
Two delimiter choices change semantics:
' (single quote): no variable interpolation inside PATTERN. The regex engine still sees the raw characters, so in m'$foo' the $ is the end-of-string anchor rather than a literal dollar sign; write m'\$foo' to match the four characters $foo.
?: m?PATTERN? matches only once between calls to reset. The leading m is mandatory; since Perl 5.22 the bare ?…? form is a syntax error.
When the delimiter is a word character (a letter, digit, or underscore), a
space is required after m: m q foo q is legal, mqfooq is not.
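The two semantics-changing delimiters can be sketched as follows. The helper starts_with_a and the sample strings are hypothetical, chosen only to make the behaviour visible.

```perl
use strict;
use warnings;

# Single-quote delimiter: @home is not interpolated, so no array
# named @home has to exist, even under strict.
my $at = 'user@home' =~ m'@home' ? 1 : 0;          # 1

# $ is still a regex metacharacter; escape it to match it literally.
my $dollar = 'price: $foo' =~ m'\$foo' ? 1 : 0;    # 1

# m?...? fires once, then stays false until reset is called.
sub starts_with_a {    # hypothetical helper for this sketch
    my @hits;
    for my $w (@_) {
        push @hits, $w if $w =~ m?^a?;
    }
    return @hits;
}

my @one   = starts_with_a(qw(apple avocado));    # ("apple")
my @two   = starts_with_a(qw(apricot));          # () because m?? already fired
reset;                                           # re-arm m?? in this package
my @three = starts_with_a(qw(apricot));          # ("apricot")
```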
Modifiers
Pattern-compile modifiers (also accepted by qr,
s, and split):
m (multi-line): ^ matches after, and $ before, every embedded newline, not only at string ends.
s (single-line): . matches every character including newline.
i: case-insensitive matching.
x: ignore whitespace and #-comments in the pattern; xx additionally ignores spaces and tabs inside bracketed character classes.
p: preserve copies of the matched string. Since 5.20 this is a no-op; ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} are always available after a successful match.
a, u, l, d: character-set rules for \d, \s, \w, and the POSIX classes. /a restricts them to ASCII; /aa additionally forbids ASCII/non-ASCII matching under /i.
n (non-capturing): (…) behaves like (?:…) and does not populate $1, $2, ….
o: compile the pattern exactly once even if interpolated variables change. Almost always the wrong tool; use qr to build a reusable compiled pattern instead.
Match-process modifiers (specific to m// and s///):
g (global): find every match rather than only the first; the shape of the result depends on context (see What you get back).
c: together with g on m//, keep pos where it was after a failed match instead of resetting it.
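A brief sketch of how a few of the compile-time modifiers change what matches. The input strings are invented for the example, and /n requires Perl 5.22 or later.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\n";

# Without /m, ^ anchors only at the start of the string.
my @starts   = $text =~ /^(\w)/g;     # ("a")
# With /m, ^ also matches after each embedded newline.
my @starts_m = $text =~ /^(\w)/mg;    # ("a", "b")

# Without /s, . refuses to match a newline.
my $dot   = "a\nb" =~ /a.b/  ? 1 : 0;    # 0
my $dot_s = "a\nb" =~ /a.b/s ? 1 : 0;    # 1

# With /n the groups do not capture, so list context yields (1).
my @no_caps = "k=v" =~ /(\w)=(\w)/n;     # (1)
```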
Examples
Test whether a string contains a pattern:
if ($line =~ /error/i) {
warn "matched: $line";
}
Bind a capture in one go:
if (my ($key, $val) = $line =~ /^(\w+)\s*=\s*(.*)$/) {
$config{$key} = $val;
}
Pull every number out of a string with list-context /g:
my @nums = "x=1 y=22 z=333" =~ /(\d+)/g;
# @nums = (1, 22, 333)
Iterate matches one at a time with scalar-context /g, using
pos to see where the engine is:
my $s = "foo 1 bar 22 baz 333";
while ($s =~ /(\d+)/g) {
printf "matched %s at offset %d\n", $1, pos($s) - length($1);
}
Extended form with the x modifier and named captures:
if ($ts =~ m{
^ (?<year>\d{4}) -
(?<mon> \d{2}) -
(?<day> \d{2}) $
}x) {
printf "year=%s mon=%s day=%s\n", $+{year}, $+{mon}, $+{day};
}
Avoid LTS by picking a delimiter that does not appear in the pattern:
next if $path =~ m{^/usr/local/};
Use \G with m//gc to walk a string token-by-token without
losing position on a failed arm:
while (1) {
if ($s =~ /\G(\d+)/gc) { push @tok, ['num', $1] }
elsif ($s =~ /\G(\w+)/gc) { push @tok, ['word', $1] }
elsif ($s =~ /\G(\s+)/gc) { next }
else { last }
}
Edge cases
Empty pattern: // and m// reuse the last successfully matched pattern in the current dynamic scope. If nothing has matched yet, an empty pattern matches everywhere. Passing user input straight into m/$pat/ when $pat might be empty is a sharp edge; wrap it in a non-capturing group: m/(?:$pat)/. The last successful pattern is also readable as ${^LAST_SUCCESSFUL_PATTERN}.
Defined-or ambiguity: Perl resolves $x // $y as the defined-or operator, never as two empty matches. In pathological positions (print $fh //) Perl still assumes defined-or; force a match by writing m// explicitly, or disambiguate with parentheses or whitespace around the operands.
Failed match leaves captures stale: $1 after a failing /…/ still holds the capture from the previous successful match, so always gate capture use on the match result.
/o with changing variables: m/$x/o locks the first value of $x into the compiled pattern. Later changes to $x are silently ignored. Reach for qr instead when you want an explicit, reusable compilation.
Interpolation when the delimiter is ': m'$var' does not interpolate $var, but the regex engine still treats the $ as the end-of-string anchor, so the pattern is not a literal match for the text $var. This is rarely what you want; to match a literal dollar sign, escape it as \$ under any delimiter.
/g plus target modification: modifying the target between /g iterations resets pos to the start. Iterate over a copy if you need to mutate the original as you go.
\G outside /g: without /g, \G anchors at the pos the target had at call time and matches at most once. On a string that has never had a /g applied, \G is equivalent to \A.
m?…? reset scope: reset clears m?? state only for the current package. A m?? in one package is not affected by reset called from another.
Assignment operators near an empty regex: $x //= 1 is always the defined-or assignment; if you genuinely want the empty regex, write m// rather than //.
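The empty-pattern edge can be reproduced in a few lines. This is a sketch with made-up strings; $pat stands in for user input that happens to be empty.

```perl
use strict;
use warnings;

my $pat = "";    # pretend this came from user input

# A successful match makes /b/ the last successful pattern.
my $first = "abc" =~ /b/;

# The interpolated pattern is empty, so Perl reuses /b/ here,
# and "xyz" does not contain a "b".
my $surprise = "xyz" =~ /$pat/ ? 1 : 0;        # 0

# Wrapped in a group, the pattern is no longer empty text, so it
# behaves as a genuine empty pattern and matches anywhere.
my $guarded = "xyz" =~ /(?:$pat)/ ? 1 : 0;     # 1
```

The guard costs nothing when $pat is non-empty, which is why it is a reasonable default for any pattern built from external input.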
Differences from upstream
Fully compatible with upstream Perl 5.42.
See also
qr: compile a pattern once and reuse it; avoids /o and keeps the pattern a first-class value
s: same pattern syntax, replaces what it matches
tr: character-by-character translation; a different tool with a superficially similar shape
split: when you want the pieces between matches rather than the matches themselves
pos: read or set the position /g matching resumes from
perlre: the regex language itself (assertions, character classes, backreferences, named captures)