Modifiers#

Modifiers change how a pattern is matched without changing the pattern itself. They appear after the closing delimiter of a match, substitution, or qr//:

"Hello" =~ /hello/i;        # case-insensitive match
$x =~ s/foo/bar/g;          # global substitution
my $re = qr/\d+/i;          # compiled pattern with /i

The same modifiers can be embedded inside a pattern with (?flags) or (?flags:…), localising their effect to part of the pattern.

The full modifier set#

| Modifier | Effect |
|----------|--------|
| /i | case-insensitive matching |
| /m | multi-line: ^ and $ match at every \n |
| /s | single-line: . matches \n too |
| /x | extended: ignore whitespace and # comments |
| /xx | extended-extended: also ignore whitespace inside […] |
| /g | global: match as many times as possible |
| /c | do not reset pos on failure (with /g) |
| /r | s///r returns the result instead of modifying |
| /e | s///e evaluates the replacement as Perl code |
| /ee | s///ee evaluates, then evaluates the result too |
| /n | non-capturing: (…) act like (?:…) |
| /p | no-op; accepted for compatibility |
| /o | compile the pattern once (rarely needed; prefer qr//) |
| /a | \d, \w, \s restricted to ASCII |
| /aa | /a, plus /i does not cross the ASCII/non-ASCII boundary |
| /u | Unicode semantics regardless of use utf8 |
| /l | use current locale |
| /d | dual-mode semantics; avoid in new code |

The first eight are the everyday modifiers; the charset modifiers /a, /u, /l, /d are covered in detail in the unicode chapter and summarised again here.

/i — case-insensitive#

"Hello" =~ /hello/i;          # matches
"HELLO" =~ /[A-Z]/i;          # matches — class is insensitive too
"Grüße" =~ /GRÜSSE/i;         # matches under Unicode semantics

Unicode case folding includes multi-character mappings such as ß → ss (the German Eszett). The full table lives in the Unicode standard; for ASCII, /i does what you expect.

/i carries a negligible performance penalty for ASCII. The implementation folds case at compile time: the pattern carries the case-folded character set, and the input is read once. Unicode case folding is more involved (multi-character sequences like ß → ss), but the extra work is bounded and rarely shows up in practice.

/m — multi-line#

Changes where ^ and $ match. Without /m, ^ matches only at the start of the string and $ only at the end (or just before a final newline). With /m, ^ also matches after every embedded newline and $ before every embedded newline.

my $x = "first\nsecond\nthird";

$x =~ /^second/;    # does not match
$x =~ /^second/m;   # matches — second is at start of a line
$x =~ /first$/m;    # matches
$x =~ /third$/m;    # matches

\A, \z, \Z remain absolute string anchors even under /m — see the anchors and assertions chapter.
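As a minimal sketch of that distinction, the following uses a hypothetical three-line string ($log is an illustrative name, not something from this guide). It contrasts the line-aware ^ and $ under /m with \z and \Z, which keep pointing at the end of the whole string:

```perl
use strict;
use warnings;

my $log = "alpha\nbeta\ngamma\n";   # hypothetical sample input

# Under /m (with /g), ^ and $ anchor at every line, so each line is captured.
my @lines = $log =~ /^(\w+)$/mg;

# \z ignores /m: it matches only at the absolute end of the string,
# and here a "\n" still follows 'gamma'.
my $z_matches = ($log =~ /gamma\z/m) ? 1 : 0;

# \Z also ignores /m, but tolerates one trailing newline.
my $Z_matches = ($log =~ /gamma\Z/m) ? 1 : 0;

print "lines: @lines\n";                   # lines: alpha beta gamma
print "z=$z_matches Z=$Z_matches\n";       # z=0 Z=1
```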

/s — single-line#

Makes . match newline characters too.

my $x = "a\nb";
$x =~ /a.b/;        # does not match — . does not cross \n
$x =~ /a.b/s;       # matches

/m and /s are independent. They can both be used on the same match. Despite the names, they do not conflict:

$x =~ /^a.b$/sm;    # . matches newline AND ^,$ are line-aware
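
One place the combination is handy is pulling a block of lines out from between two marker lines. The sketch below assumes a hypothetical input with BEGIN and END markers; neither the markers nor the variable names come from this guide:

```perl
use strict;
use warnings;

# Hypothetical input: a block delimited by BEGIN and END lines.
my $text = "BEGIN\nline one\nline two\nEND\nafter\n";

# /m lets ^ and $ anchor on the marker lines;
# /s lets .*? run across the newlines in between.
my ($body) = $text =~ /^BEGIN\n(.*?)^END$/sm;

print $body;   # prints "line one" and "line two", each on its own line
```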

/x — extended pattern#

Ignores literal whitespace and lets # introduce end-of-line comments. Crucial for any pattern more than a line long.

Before /x:

/^[+-]?\d+(\.\d*)?([eE][+-]?\d+)?$/;

After:

/^
    [+-]?           # optional sign
    \d+             # integer part
    (\.\d*)?        # optional fraction
    ([eE][+-]?\d+)? # optional exponent
 $/x;

Whitespace in the pattern is ignored; whitespace you want to match becomes \s, \ , or [ ]:

/\w+ \s+ \w+/x;      # three tokens: word, space, word (literal spaces ignored)
/\w+\s+\w+/x;        # equivalent
/key:[ ]value/x;     # literal space via bracket class
/key:\ value/x;      # literal space via backslash

# starts a comment that ends at the next newline. To match a literal # under /x, escape it or put it in a class.
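
A short sketch of both ways to match a literal # under /x, using a hypothetical input string:

```perl
use strict;
use warnings;

my $ticket = "see issue #42 for details";   # hypothetical input

my ($escaped)  = $ticket =~ / \# (\d+) /x;   # backslash-escaped #
my ($in_class) = $ticket =~ / [#] (\d+) /x;  # # inside a bracketed class

print "$escaped $in_class\n";   # 42 42
```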

The «Pattern White Space» set under /x follows Unicode UAX#31: CHARACTER TABULATION, LINE FEED, LINE TABULATION, FORM FEED, CARRIAGE RETURN, SPACE, NEXT LINE, LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR. In practice you only meet ASCII space, tab, and newline.

/xx — extended in classes too#

Inside […], whitespace is not ignored under plain /x: [ab c] matches a, b, ' ', or c. To also ignore whitespace inside classes, use /xx:

/[ab c]/xx;          # matches 'a', 'b', or 'c' — space ignored
/[ab\ c]/xx;         # the \ is needed to match a literal space
/[ ! @ ]/xx;         # matches '!' or '@' — visible spacing

/xx is a superset of /x. It is convenient for character classes laid out for readability, especially with shorthand classes:

/[ \d \s \-_,. ]+/xx;   # digits, whitespace, common punctuation

/xx was added in Perl 5.26. Patterns written for earlier versions are unaffected; new code can reach for it freely.

/g — global#

In scalar context, keeps a position in the string (pos $x) and advances each time the pattern matches, allowing iteration:

my $x = "cat dog house";
while ($x =~ /(\w+)/g) {
    print "$1 at ", pos($x), "\n";
}
# cat at 3
# dog at 7
# house at 13

In list context, returns all matches at once:

my @words = $x =~ /(\w+)/g;   # ('cat', 'dog', 'house')

If the pattern contains no captures, list context returns the whole matched text for each match:

my @digits = "abc123def456" =~ /\d+/g;   # ('123', '456')

With multiple captures, each match contributes its captures to the list in order:

my @pairs = "a=1,b=2,c=3" =~ /(\w)=(\d)/g;
# ('a', '1', 'b', '2', 'c', '3')
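
Because the captures come back as one flat list, a two-capture pattern can be assigned straight to a hash. The sketch below shows that alongside the scalar-context loop, where $1 and $2 are set on each iteration. The variable names are illustrative only:

```perl
use strict;
use warnings;

my $str = "a=1,b=2,c=3";

# List context: the flat list of captures pairs up naturally into a hash.
my %value_of = $str =~ /(\w)=(\d)/g;

# Scalar context: one match per iteration, with $1 and $2 set each time.
my @seen;
while ($str =~ /(\w)=(\d)/g) {
    push @seen, "$1:$2";
}

print join(",", map { "$_=$value_of{$_}" } sort keys %value_of), "\n";  # a=1,b=2,c=3
print "@seen\n";                                                         # a:1 b:2 c:3
```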

/c — preserve position on failure#

By default, a failed /g match resets pos to undef. Under /gc, pos stays at its previous value — crucial for hand-rolled lexers:

my $s = "123abc";
while (1) {
    if ($s =~ /\G(\d+)/gc)    { print "num $1\n"; next; }
    if ($s =~ /\G([a-z]+)/gc) { print "word $1\n"; next; }
    last;   # nothing matched; exit
}

See the anchors and assertions chapter for \G.
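
To see the difference in isolation, this small sketch inspects pos after a failed match with and without /c. The scalar assignments to $ok are there only to force scalar context:

```perl
use strict;
use warnings;

my $s = "123abc";

my $ok = $s =~ /\d+/g;        # scalar context: matches "123"
my $after_match = pos($s);    # 3

$ok = $s =~ /\G\d+/g;         # fails at offset 3; without /c, pos is reset
my $without_c = pos($s);      # undef

$ok = $s =~ /\d+/g;           # pos was reset, so this matches "123" again
$ok = $s =~ /\G\d+/gc;        # fails at offset 3, but /c keeps pos
my $with_c = pos($s);         # 3

printf "%s %s %s\n",
    $after_match, (defined $without_c ? $without_c : "undef"), $with_c;
# 3 undef 3
```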

/r — non-destructive substitution#

s/// normally modifies the target string and returns the number of substitutions made. s///r leaves the target alone and returns the result:

my $name = "  Alice  ";
my $trimmed = $name =~ s/^\s+|\s+$//gr;
# $name is still "  Alice  "
# $trimmed is "Alice"

Enables substitution chains without intermediate variables:

my $clean = $input
    =~ s/\s+/ /gr          # collapse runs of whitespace
    =~ s/^ | $//gr         # trim ends
    =~ s/[^\x00-\x7f]//gr; # drop non-ASCII

Each s///r returns the transformed string, which the next one receives.
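
The same property makes /r a natural fit inside map, where a plain s/// would modify the original elements through the $_ alias and return counts rather than strings. A minimal sketch with hypothetical input:

```perl
use strict;
use warnings;

my @raw = ("  Alice ", "Bob  ", " Carol");   # hypothetical input

# s///r returns the trimmed copy; @raw itself is left untouched.
my @trimmed = map { s/^\s+|\s+$//gr } @raw;

print join("|", @trimmed), "\n";   # Alice|Bob|Carol
```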

/e — evaluate the replacement#

The replacement half of s///e is Perl code, not a double-quoted string. The return value of the code replaces the match:

my $x = "numbers: 1 2 3 4";
$x =~ s/(\d+)/$1 * 2/ge;
# $x is now "numbers: 2 4 6 8"
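
Since the replacement is ordinary code, it can call functions or look values up. The sketch below assumes a hypothetical %price table and order string, both invented for illustration:

```perl
use strict;
use warnings;

my %price = (apple => 3, pear => 5);   # hypothetical lookup table
my $order = "2 apple, 4 pear";         # hypothetical input

# Replace each "quantity item" pair with quantity times unit price.
(my $costed = $order) =~ s/(\d+) (\w+)/$1 * $price{$2}/ge;

# Function calls work too: zero-pad every number to three digits.
(my $padded = $order) =~ s/(\d+)/sprintf("%03d", $1)/ge;

print "$costed\n";   # 6, 20
print "$padded\n";   # 002 apple, 004 pear
```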

/ee evaluates twice: the code returns a string, which is then evaluated as Perl again. Rarely useful and easy to misuse — only reach for it when you are sure. The substitution chapter has worked examples.

/n — non-capturing#

Makes every ordinary (…) behave like (?:…). Useful when a long pattern has parentheses purely for grouping and you want to keep $1, $2, … unset:

"hello" =~ /(hi|hello)/n;    # matches, but $1 is not set

Named captures (?<name>…) still capture under /n. Added in Perl 5.22.
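
One consequence worth seeing in a sketch: under /n a named group is the only kind that captures, so it is also the one that gets numbered. The date string here is a hypothetical input, and the example needs Perl 5.22 or later:

```perl
use strict;
use warnings;

my ($month, $first);
if ("2024-05-17" =~ /(\d{4})-(?<month>\d\d)-(\d\d)/n) {
    $month = $+{month};   # the named group still captures
    $first = $1;          # and is numbered 1, since the plain (...) do not count
}

print "$month $first\n";   # 05 05
```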

/p — no-op#

/p is a silent no-op. ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} are always available. Accepted for backward compatibility, but contributes nothing in new code.

/o — compile once#

/o disables re-compilation of a pattern that interpolates variables. Compiled patterns are also cached automatically, and qr// gives explicit control. /o is rarely the right tool; qr// is clearer and composes.
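
A minimal sketch of the qr// alternative, assuming a hypothetical search term in $word: the pattern is interpolated and compiled once at the qr// line, and the resulting object can be reused or embedded in a larger pattern.

```perl
use strict;
use warnings;

my $word = "cat";                    # hypothetical search term
my $re   = qr/\b\Q$word\E\b/i;       # interpolated and compiled once, here

my @hits = grep { /$re/ } ("Cat nap", "concatenate", "the CAT");

# A qr// object composes into a larger pattern and keeps its own /i.
my $phrase = qr/$re\s+nap/;
my $found  = ("Cat nap" =~ $phrase) ? 1 : 0;

print scalar(@hits), " $found\n";   # 2 1
```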

Charset modifiers — /a, /aa, /u, /l, /d#

These control how the shorthand classes (\d, \w, \s), case folding, and the POSIX class set behave under Unicode and locale considerations. Brief summary here; the full treatment is in the unicode chapter.

| Modifier | Behaviour |
|----------|-----------|
| /u | full Unicode semantics. Default under use v5.12+. |
| /a | restrict \d, \w, \s, [[:…:]] to ASCII. |
| /aa | /a, plus /i does not cross the ASCII/non-ASCII boundary. |
| /l | follow the current POSIX locale (as under use locale). |
| /d | dual-mode semantics. Avoid in new code. |

/a, /u, /l, /d are mutually exclusive — at most one can be in effect at a time. /aa is a refinement of /a. They are set-once modifiers: an inner (?d:…) cannot un-set an outer /u.

The everyday choice in modern code: use v5.12 (or later) selects /u automatically, which is what you want for Unicode-correct text. Add /a only when the pattern must reject non-ASCII characters — typically because the input is known to be ASCII-only and you want to enforce that.
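
The effect is easiest to see on a couple of non-ASCII characters. This sketch uses an Arabic-Indic digit and the Kelvin sign, both chosen here purely for illustration, to compare /u, /a, and /aa:

```perl
use strict;
use warnings;

my $digit  = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $kelvin = "\x{212A}";   # KELVIN SIGN, which case-folds to "k"

my $d_u  = ($digit  =~ /^\d$/u)   ? 1 : 0;   # \d is Unicode-aware under /u
my $d_a  = ($digit  =~ /^\d$/a)   ? 1 : 0;   # \d is ASCII-only under /a
my $k_a  = ($kelvin =~ /^k$/ia)   ? 1 : 0;   # /i still folds across the boundary
my $k_aa = ($kelvin =~ /^k$/iaa)  ? 1 : 0;   # /aa forbids that fold

print "$d_u $d_a $k_a $k_aa\n";   # 1 0 1 0
```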

Inline modifiers#

(?flags) turns flags on for the rest of the enclosing group (or pattern):

/(?i)yes/;         # case-insensitive, same as /yes/i

(?flags:…) scopes the flags to the inner group only and is non-capturing:

/Answer: ((?i)yes)/;   # only the 'yes' is case-insensitive
/Answer: ((?i:yes))/;  # clearer: scope is the group's contents

(?-flags) turns flags off. They can be combined:

/(?i-m:pattern)/;   # turn on /i, turn off /m, within this group

Inline modifiers are the right tool when different parts of a long pattern need different modifiers. They are also useful in patterns built by interpolation, where the embedded (?i) travels with the pattern fragment.
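
As a sketch of the interpolation case, the fragment below is a plain string (the name $unit and the "Size:" format are hypothetical) that carries its own (?i:…), while the surrounding pattern stays case-sensitive:

```perl
use strict;
use warnings;

my $unit = '(?i:kb|mb|gb)';                  # fragment with its own /i
my $size = qr/^Size:\s*(\d+)\s*($unit)$/;    # the rest is case-sensitive

my @results;
for my $line ("Size: 512 mb", "Size: 2GB", "SIZE: 2GB") {
    push @results, $line =~ $size ? "$1 $2" : "no match";
}

print join("; ", @results), "\n";   # 512 mb; 2 GB; no match
```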

Caret form: (?^…)#

(?^flags) is shorthand for «reset all flags, then apply these». The expansion is d-imnsx followed by the listed flags. So (?^x:…) means «default flags except /x is on, regardless of what flags were in scope outside».

The caret form is what Perl uses when stringifying compiled patterns; you may see it in error messages or qr// output. In hand-written code it is occasionally useful when interpolating a pattern fragment that should not inherit modifiers from its surroundings:

my $strict = qr/(?^:foo|bar)/;   # always default flags
# ... no matter how this is interpolated.

A negative flag is not legal after the caret (it would be redundant — the caret already cleared everything).
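
A small sketch of both observations. It assumes a reasonably recent Perl (5.14 or later) without the unicode_strings feature in scope; with that feature on, the stringified form would also carry a u flag.

```perl
use strict;
use warnings;

my $ci = qr/foo|bar/i;
my $as_string = "$ci";            # stringified form uses the caret
print "$as_string\n";             # (?^i:foo|bar)

# The compiled fragment keeps its own /i inside a case-sensitive pattern...
my $inner_i = ("xFOOx" =~ /x${ci}x/) ? 1 : 0;

# ...and a fragment compiled without /i does not pick up an outer /i.
my $strict  = qr/foo/;
my $outer_i = ("FOO" =~ /\A$strict\z/i) ? 1 : 0;

print "$inner_i $outer_i\n";      # 1 0
```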

Mutually-exclusive flags#

/a and /aa override each other (last one wins); same for /x and /xx. They are not additive. (?xx-x:…) turns all x behaviour off, not «subtract one x from two».

The charset family /a, /d, /l, /u is mutually exclusive; specifying one un-specifies the others. They cannot be turned off with a leading -, only switched between. So (?-d:…) is a fatal error; (?dl:…) is also fatal (two charset flags together).

/p is special: its presence anywhere in a pattern has a global effect (and that effect is a no-op).

Order of modifiers#

On a match or substitution, the order of trailing modifiers is not significant. /mgis and /sgmi are identical. Pick a house style and stick to it.

How a regex is read — the four phases#

To predict how modifiers and inline forms interact with interpolation and quoting, it helps to know that Perl reads a regex in phases:

  1. Phase A: parser identifies the delimiter, finds the end of the pattern. (?#…) comments are removed here.

  2. Phase B: pattern is parsed as a double-quotish string. Variables interpolate, escape sequences cook, \Q…\E translates to quotemeta-style escaping.

  3. Phase C: under /x or /xx, unescaped whitespace and #-comments are stripped.

  4. Phase D: the regex compiler reads the cooked, stripped pattern and turns it into the engine’s internal form.

The order matters because phases B and D operate on different representations:

  • \Q$dir\E cooks at Phase B, before the regex compiler sees it. By Phase D the variable’s contents have already been metaquoted; the regex compiler sees a literal pattern.

  • \U…\E is interpreted by Phase B as a string-cook directive (uppercase the contents), which is almost certainly not what you want inside a regex. Use \Q and \E for regex purposes; \U only when you want the pattern’s literal text to be uppercased.

  • (?#comment) is removed before Phase B sees the pattern at all. A literal # inside (?#…) is fine. A literal ) is not — the comment ends at the first ).
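
The \Q…\E point can be sketched with a hypothetical directory path whose "." would otherwise act as a metacharacter. The paths and variable names are illustrative only:

```perl
use strict;
use warnings;

my $dir   = "/usr/lib/perl5.36";             # hypothetical path; "." is a metacharacter
my $other = "/usr/lib/perl5x36/strict.pm";   # differs exactly where the "." was

# Unquoted, the "." in $dir matches any character, so this is a false positive.
my $unquoted = ($other =~ m{^$dir/})     ? 1 : 0;

# With \Q...\E the contents are escaped before the regex compiler sees them.
my $quoted   = ($other =~ m{^\Q$dir\E/}) ? 1 : 0;

print "$unquoted $quoted\n";   # 1 0
```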

The /x modifier applies at Phase C, after interpolation and before compilation. That means whitespace introduced by an interpolated variable is stripped along with the whitespace written in the pattern itself:

my $sub = " a b ";        # contains literal spaces
"xabx" =~ / x $sub x /x;  # matches: the spaces in $sub are stripped too

If the interpolated whitespace must match literally, protect it at the source: wrap the variable in \Q…\E, or compile the fragment separately with qr// (without /x) so that it carries its own flags.

use re 'strict'#

use re 'strict' raises a number of normally tolerated regex sloppiness conditions to compile-time errors. It is lexically scoped. In strict mode:

  • A { in a non-quantifier position is an error, not a literal.

  • [a-] (dash at end of class) is an error.

  • A useless negation of an always-on flag is an error.

  • A (?-d) (trying to turn off /d) is an error rather than a warning.

  • Unmatched [ in a string is an error.

Strict mode is recommended for new regex-heavy code, especially generated patterns where small typos go unnoticed in a normal build. It is not default because it would break working older patterns.

use re 'strict';

/abc{,1/;           # error in strict; warning otherwise
/[a-]/;             # error in strict

See also#

  • The unicode chapter — /a, /aa, /u, /l, /d in detail, plus what they do to character classes and case folding.

  • The substitution chapter — /e, /r, /g, and /c in their substitution-specific roles.

  • The basics chapter — the four-phase parsing model in its first appearance.

  • The anchors and assertions chapter — \A, \z, \Z interactions with /m.

  • The performance chapter — /o discussion and why qr// superseded it.

  • m, s, qr — operator references.