Regular expressions and pattern matching

m//

Search a string for a pattern and report whether — and what — it matched.

m// is the match operator. It compiles PATTERN as a regular expression (see perlre), runs it against a target string, and returns a value shaped by calling context and by the modifiers you apply. The target is whatever sits on the left of =~ or !~; without a binding operator the target is $_. The leading m is optional when the delimiter is /, so /PATTERN/ and m/PATTERN/ mean the same thing.

Synopsis

$str =~ m/PATTERN/flags
$str =~ /PATTERN/flags
m/PATTERN/flags              # target is $_
/PATTERN/flags               # target is $_
$str =~ m{PATTERN}flags      # other delimiters work too; bracketing pairs nest
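A minimal sketch of the synopsis forms side by side, assuming a made-up target string. It shows explicit binding with =~, the optional leading m, a brace delimiter, negation with !~, and the implicit $_ target.

```perl
use strict;
use warnings;

# Hypothetical target string used throughout this sketch.
my $str = "path=/usr/local/bin";

# Explicit binding with =~; the leading m is optional with / as delimiter.
my $hit_m    = ($str =~ m/local/) ? 1 : 0;
my $hit_bare = ($str =~ /local/)  ? 1 : 0;

# Braces as delimiters, so the slashes in the pattern need no escaping.
my $hit_braces = ($str =~ m{/usr/local/}) ? 1 : 0;

# !~ is =~ with the result negated.
my $absent = ($str !~ /windows/) ? 1 : 0;

# With no binding operator, the match runs against $_.
my $default_hit;
for ($str) {                      # foreach aliases $_ to $str
    $default_hit = /^path=/ ? 1 : 0;
}

print "$hit_m $hit_bare $hit_braces $absent $default_hit\n";   # prints "1 1 1 1 1"
```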

What you get back

Context decides the shape of the return value.

  • Scalar context, no /g: 1 on match, the empty string on no match. Both are usable as booleans; the empty string is a dual-value false that also equals 0 numerically.

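  • List context, no /g: the list of capture values ($1, $2, $3, …) on a successful match; if the pattern has no capture groups, the singleton (1); on failure, the empty list. This is how if (my ($x, $y) = $s =~ /(\w+)=(\w+)/) works: a list assignment in scalar context evaluates to the number of elements on its right-hand side, so the condition is true exactly when the match succeeded.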

  • Scalar context, /g: each call advances through the string, returning true for the next match and false once there are no more. pos on the target tracks where the next attempt will begin.

  • List context, /g: every match in one shot. With capture groups, the flat list of all captures from every match. Without captures, the list of every full match.
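The four context rules above can be checked directly. This sketch uses an arbitrary sample string; the commented values are what each form returns for it.

```perl
use strict;
use warnings;

my $s = "a=1 b=2";

# Scalar context, no /g: 1 on success, Perl's special false value on failure.
my $hit  = ($s =~ /b=/);          # 1
my $miss = ($s =~ /z=/);          # "" as a string, 0 as a number

# List context, no /g: captures of the first match only.
my @first = $s =~ /(\w)=(\d)/;    # ("a", "1")

# List context, no captures: the singleton (1) on success.
my @flag = $s =~ /a=/;            # (1)

# List context, failed match: the empty list.
my @none = $s =~ /(z)=/;          # ()

# List context with /g: all captures from every match, flattened.
my @all = $s =~ /(\w)=(\d)/g;     # ("a", "1", "b", "2")

# List context with /g and no captures: every full match.
my @whole = $s =~ /\w=\d/g;       # ("a=1", "b=2")

# Scalar context with /g: one match per call; pos records where it ended.
my @ends;
while ($s =~ /\d/g) {
    push @ends, pos($s);          # 3, then 7
}
```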

Successful matches also populate the regex special variables ($1, $2, …, $&, $`, $', $+, %+, %-), which stay visible until the end of the enclosing block or the next successful match, whichever comes first. A failed match leaves them holding their previous values, so always test the match itself, never a capture variable, to decide whether a match happened.
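A small sketch of the stale-capture trap, assuming two sample strings where only the first contains an id. The failed second match does not clear $1; gating the read on the match result does.

```perl
use strict;
use warnings;

my $first  = "id=42";
my $second = "no id here";

$first =~ /id=(\d+)/;                  # succeeds: $1 is "42"
my $ok = ($second =~ /id=(\d+)/);      # fails: $1 is NOT cleared

my $stale = $1;                        # still "42", left over from $first

# Safe form: read the capture only when this match succeeded.
my $id = ($second =~ /id=(\d+)/) ? $1 : undef;   # undef
```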

Global state it touches

  • $_ — the default target when no =~ binding is given.

  • $1, $2, … — numbered captures, set on success, unchanged on failure.

  • $&, $`, $' — match, prematch, postmatch.

  • $+ — the highest-numbered capture that actually matched (useful with alternations).

  • %+, %- — named-capture hashes.

  • ${^LAST_SUCCESSFUL_PATTERN} — the last pattern that matched in the current dynamic scope; also the pattern the empty form m// reuses (see Edge cases).

  • pos on the target string — read and updated by /g matching; reset on failure unless /c is also set.

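  • Locale and encoding state: under /l, the current locale, consulted at match time rather than compile time; under /d, the internal UTF-8 encoding of the target or pattern, which is one of the triggers that switch on Unicode rules.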
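A sketch of the match-related variables from the list above after one successful match, using a made-up "<key: value>" string. The second part shows why $+ is handy when only one branch of an alternation can participate.

```perl
use strict;
use warnings;

my $s = "<key: value>";
my ($match, $pre, $post, $last, $named);
if ($s =~ /(?<k>\w+):\s*(\w+)/) {
    $match = $&;       # "key: value"
    $pre   = $`;       # "<"
    $post  = $';       # ">"
    $last  = $+;       # "value": group 2 is the highest group that matched
    $named = $+{k};    # "key": the named group is also numbered as $1
}

# With alternation, $+ picks whichever branch actually captured.
my $rev = ("Revision: 7" =~ /Version: (\d+)|Revision: (\d+)/) ? $+ : undef;   # "7"
```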

Delimiters

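With m, any non-whitespace character can serve as the delimiter. The bracketing pairs (), [], {}, and <> open and close with different characters and nest inside the pattern; every other delimiter uses the same character at both ends: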

m/pattern/
m{pattern}
m[pattern]
m(pattern)
m<pattern>
m!pattern!
m#pattern#
m,pattern,

Picking a delimiter that does not appear in the pattern avoids the backslash clutter known as LTS, leaning toothpick syndrome. A path-matching pattern reads cleanly with m{…} or m!…! and badly with m/…/.

Two delimiter choices change semantics:

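  • ' (single quote) — no variable interpolation inside PATTERN, but the regex engine still reads the text as a pattern. m'$foo' therefore does not match the four characters $foo: $ remains an end-of-line anchor, so that pattern can never match. Write m'\$foo' (or /\$foo/) to match those characters literally.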

  • ? (question mark) — m?PATTERN? matches only once between calls to reset. The leading m is mandatory; since Perl 5.22 the bare ?…? form is a syntax error.

When the delimiter is a word character (a letter, digit, or underscore), whitespace is required after m: m q foo q is legal (the pattern is " foo ", spaces included), while mqfooq is just a bareword.
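A sketch of the two delimiters that change semantics. The helper first_a is hypothetical; it exists only so that the same m?? operator runs several times. The sample strings are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: one m?a? op, reused across calls.
sub first_a { return ($_[0] =~ m?a?) ? 1 : 0 }

my @before = map { first_a($_) } qw(a a a);   # (1, 0, 0): matches only once
reset;                                        # re-arms m?? ops in package main
my @after  = map { first_a($_) } qw(a a);     # (1, 0)

# With ' as the delimiter nothing is interpolated, but $ is still an anchor.
my $var = "xyz";
my $anchor  = ('$var' =~ m'$var')  ? 1 : 0;   # 0: "var" can never follow the $ anchor
my $literal = ('$var' =~ m'\$var') ? 1 : 0;   # 1: escaped sigil matches literally
my $interp  = ('xyz'  =~ m/$var/)  ? 1 : 0;   # 1: $var interpolated to "xyz"
```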

Modifiers

Pattern-compile modifiers (also accepted by qr, s, and split):

  • m — multi-line: ^ and $ match at the start and end of every line (after and before each embedded newline), not only at the start and end of the string.

  • s — single-line: . matches every character including newline.

  • i — case-insensitive matching.

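  • x — ignore unescaped whitespace and #-comments in the pattern outside bracketed character classes; xx (Perl 5.26 and later) additionally ignores unescaped spaces and tabs inside bracketed character classes.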

  • p — preserve copies of the matched string. Since 5.20 this is a no-op — ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} are always available after a successful match.

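  • a, u, l, d — character-set rules for \d, \s, \w, and the POSIX classes. /u applies Unicode rules, /l the current locale, and /d the legacy default, which uses Unicode rules only when something such as a UTF-8 encoded target forces them. /a restricts those classes to ASCII; /aa additionally forbids ASCII/non-ASCII matches under /i.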

  • n — non-capturing: plain (…) groups behave like (?:…) and do not populate $1, $2, …; named groups (?<name>…) still capture.

  • o — compile the pattern exactly once even if interpolated variables change. Almost always the wrong tool; use qr to build a reusable compiled pattern instead.
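A sketch exercising the pattern-compile modifiers listed above on a made-up two-line string. The /n line needs Perl 5.22 or later; the last two lines use U+0663 (ARABIC-INDIC DIGIT THREE) to show what /a removes from \d.

```perl
use strict;
use warnings;

my $text = "first line\nSecond line\n";

my @starts = $text =~ /^(\w+)/mg;                 # ("first", "Second"): /m, ^ at each line
my $no_s   = ($text =~ /line.Second/)  ? 1 : 0;   # 0: . stops at "\n"
my $with_s = ($text =~ /line.Second/s) ? 1 : 0;   # 1: /s lets . match "\n"
my $ci     = ($text =~ /^second/im)    ? 1 : 0;   # 1: /i ignores case, /m finds line 2

# /x: layout and comments inside the pattern are ignored.
my ($w1, $w2) = $text =~ m{
    (\w+)    # first word
    \s+      # whitespace between them
    (\w+)    # second word
}x;                                               # ("first", "line")

# /n: plain groups stop capturing; the named group becomes group 1.
my @n = "ab" =~ /(a)(?<second>b)/n;              # ("b")

# /a: \d is limited to ASCII 0-9.
my $uni   = ("\x{663}" =~ /\d/)  ? 1 : 0;         # 1: a Unicode decimal digit
my $ascii = ("\x{663}" =~ /\d/a) ? 1 : 0;         # 0
```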

Match-process modifiers (specific to m// and s///):

  • g — global matching. Scalar-context behavior is iterative (advances pos on each call); list-context behavior returns every match at once.

  • c — only meaningful with /g. A failed /g match keeps pos where it was instead of resetting to the start; required for lex-style scanners built around \G.
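The difference between /g and /gc shows up only on a failed match. This sketch, on an arbitrary four-character string, records pos after each step.

```perl
use strict;
use warnings;

my $s = "aaXb";

my $hit1       = $s =~ /a+/g;     # matches "aa"
my $after_hit  = pos($s);         # 2
my $miss       = $s =~ /a/g;      # no "a" at or after offset 2
my $after_miss = pos($s);         # undef: plain /g forgets the position

pos($s) = 2;
my $miss_c     = $s =~ /a/gc;     # fails again, but /c keeps pos
my $kept       = pos($s);         # 2
my $x          = $s =~ /\GX/gc;   # matches "X" exactly at offset 2
my $after_x    = pos($s);         # 3
```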

Examples

Test whether a string contains a pattern:

if ($line =~ /error/i) {
    warn "matched: $line";
}

Bind a capture in one go:

if (my ($key, $val) = $line =~ /^(\w+)\s*=\s*(.*)$/) {
    $config{$key} = $val;
}

Pull every number out of a string with list-context /g:

my @nums = "x=1 y=22 z=333" =~ /(\d+)/g;
# @nums = (1, 22, 333)

Iterate matches one at a time with scalar-context /g, using pos to see where the engine is:

my $s = "foo 1 bar 22 baz 333";
while ($s =~ /(\d+)/g) {
    printf "matched %s at offset %d\n", $1, pos($s) - length($1);
}

Extended form with the x modifier and named captures:

if ($ts =~ m{
        ^ (?<year>\d{4}) -
          (?<mon> \d{2}) -
          (?<day> \d{2}) $
    }x) {
    printf "year=%s mon=%s day=%s\n", $+{year}, $+{mon}, $+{day};
}

Avoid LTS by picking a delimiter that does not appear in the pattern:

next if $path =~ m{^/usr/local/};

Use \G with m//gc to walk a string token-by-token without losing position on a failed arm:

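my $s = "width 42 height 7";
my @tok;
while (1) {
    if    ($s =~ /\G(\d+)/gc) { push @tok, ['num',  $1] }
    elsif ($s =~ /\G(\w+)/gc) { push @tok, ['word', $1] }
    elsif ($s =~ /\G\s+/gc)   { next }    # skip whitespace
    else                      { last }    # end of input, or no arm matched
}
# Stopping short of the end means an unexpected character.
die "unexpected input at offset ", pos($s) // 0, "\n" if (pos($s) // 0) < length $s;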

Edge cases

  • Empty pattern: // and m// reuse the last successfully matched pattern in the current dynamic scope. If nothing has matched yet, an empty pattern matches everywhere. Passing user input straight into m/$pat/ when $pat might be empty is a sharp edge — wrap it in a non-capturing group: m/(?:$pat)/. The last successful pattern is also readable as ${^LAST_SUCCESSFUL_PATTERN}.

  • Defined-or ambiguity: after a term, Perl reads // as the defined-or operator, so $x // $y is never parsed as a match. In pathological positions such as $x/// or print $fh // Perl still assumes defined-or; to get an empty match, write m// explicitly or add parentheses or spaces that make the intended parse unambiguous.

  • Failed match leaves captures stale: $1 after a failing /…/ still holds the capture from the previous successful match — always gate capture use on the match result.

  • /o with changing variables: m/$x/o locks the first value of $x into the compiled pattern. Later changes to $x are silently ignored. Reach for qr instead when you want an explicit, reusable compilation.

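  • Interpolation when the delimiter is ': m'$var' skips interpolation, but $ is still an end-of-line anchor to the regex engine, so the pattern can never match, not even the literal text $var. Escape the sigil (m'\$var' or /\$var/) to match those characters. \Q…\E and quotemeta solve a different problem: they interpolate the variable's value and then match that value literally.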

  • /g plus target modification: modifying the target between /g iterations resets pos to the start. Iterate over a copy if you need to mutate the original as you go.

  • \G outside /g: without /g, \G anchors at the pos the target had when the match started, the match is attempted only once, and pos is not advanced afterwards. On a string that has never had a /g applied, \G is equivalent to \A.

  • m?…? reset scope: reset clears m?? state only for the current package. A m?? in one package is not affected by reset called from another.

  • Defined-or assignment near an empty regex: $x //= 1 is always the defined-or assignment; if you genuinely want the empty regex, write m// rather than //.
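A sketch of two edge cases from the list above: an interpolated pattern that turns out empty, and \G without /g. The empty $pat stands in for hypothetical user input; the strings are arbitrary.

```perl
use strict;
use warnings;

my $pat = "";                                    # hypothetical user input that is empty

my $primed  = ("cat" =~ /c(a)t/);                # last successful pattern is now /c(a)t/
my $reused  = ("dog" =~ /$pat/)     ? 1 : 0;     # 0: empty /$pat/ reuses /c(a)t/
my $wrapped = ("dog" =~ /(?:$pat)/) ? 1 : 0;     # 1: a genuinely empty pattern matches

my $t = "abc";
pos($t) = 1;
my $g_hit     = ($t =~ /\Gb/) ? 1 : 0;           # 1: \G anchors at pos 1 even without /g
my $pos_after = pos($t);                         # 1: a match without /g leaves pos alone
```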

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • qr — compile a pattern once and reuse it; avoids /o and keeps the pattern a first-class value

  • s — same pattern syntax, replaces what it matches

  • tr — character-by-character translation; a different tool with a superficially similar shape

  • split — when you want the pieces between matches rather than the matches themselves

  • pos — read or set the position /g matching resumes from

  • perlre — the regex language itself: assertions, character classes, backreferences, named captures