Modifiers#

Modifiers change how a pattern is matched without changing the pattern itself. They appear after the closing delimiter of a match, substitution, or qr//:

"Hello" =~ /hello/i;        # case-insensitive match
$x =~ s/foo/bar/g;          # global substitution
my $re = qr/\d+/i;          # compiled pattern with /i

The same modifiers can be embedded inside a pattern with (?flags) or (?flags:…), localising their effect to part of the pattern.

The full modifier set#

Modifier	Effect
`/i`	case-insensitive matching
`/m`	multi-line: `^` and `$` match at every `\n`
`/s`	single-line: `.` matches `\n` too
`/x`	extended: ignore whitespace and `#` comments
`/xx`	extended-extended: also ignore whitespace inside `[…]`
`/g`	global: match as many times as possible
`/c`	do not reset `pos` on failure (with `/g`)
`/r`	`s///r` returns the result instead of modifying
`/e`	`s///e` evaluates the replacement as Perl code
`/ee`	`s///ee` evaluates, then evaluates the result too
`/n`	non-capturing: `(…)` act like `(?:…)`
`/p`	no-op; accepted for backward compatibility
`/o`	compile the pattern once (rarely needed; prefer `qr//`)
`/a`	`\d`, `\w`, `\s` restricted to ASCII
`/aa`	`/a`, plus `/i` does not cross ASCII/non-ASCII boundary
`/u`	Unicode semantics regardless of `use utf8`
`/l`	use current locale
`/d`	dual-mode semantics - avoid in new code

The first eight are the everyday modifiers; the charset modifiers /a, /u, /l, /d are covered in detail in the unicode chapter and summarised again here.

`/i` - case-insensitive#

"Hello" =~ /hello/i;          # matches
"HELLO" =~ /[A-Z]/i;          # matches - class is insensitive too
"Grüße" =~ /GRÜSSE/i;         # matches under Unicode semantics

Unicode case folding includes multi-character mappings such as ß → ss (the German Eszett folds to two letters). The full table lives in the Unicode standard; for ASCII, /i does what you expect.

/i carries a negligible performance penalty for ASCII. The implementation folds case at compile time - the pattern carries the case-folded character set, the input is read once. Unicode case-folding is more involved (sequences like ß ↔ ss), but the extra work is bounded and rarely shows up in practice.

`/m` - multi-line#

Changes where ^ and $ match. Without /m, ^ matches only at the start of the string and $ only at the end (or just before a final newline). With /m, ^ also matches immediately after every embedded newline and $ immediately before one.

my $x = "first\nsecond\nthird";

$x =~ /^second/;    # does not match
$x =~ /^second/m;   # matches - second is at start of a line
$x =~ /first$/m;    # matches
$x =~ /third$/m;    # matches

\A, \z, and \Z remain absolute string anchors even under /m; see the anchors and assertions chapter.
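A minimal sketch of the difference between ^ under /m and \A, using a made-up three-line string; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird";

# Under /m, ^ matches at the start of every line, so list context
# collects the first word of each line.
my @line_starts = $text =~ /^(\w+)/mg;

# \A ignores /m and only ever matches at the very start of the string.
my @string_starts = $text =~ /\A(\w+)/mg;

print "line starts:   @line_starts\n";     # first second third
print "string starts: @string_starts\n";   # first
```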

`/s` - single-line#

Makes . match newline characters too.

my $x = "a\nb";
$x =~ /a.b/;        # does not match - . does not cross \n
$x =~ /a.b/s;       # matches

/m and /s are independent. They can both be used on the same match. Despite the names, they do not conflict:

$x =~ /^a.b$/sm;    # . matches newline AND ^,$ are line-aware
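To see that the two modifiers really are independent, the following sketch tries a pattern that needs both: the . has to cross the newline (/s) and the second ^ has to match after it (/m). The hash name %matched is only for illustration.

```perl
use strict;
use warnings;

my $x = "a\nb";

# /^a.^b$/ needs . to match "\n" AND the inner ^ to match after it.
my %matched = (
    none => ($x =~ /^a.^b$/   ? 1 : 0),
    m    => ($x =~ /^a.^b$/m  ? 1 : 0),
    s    => ($x =~ /^a.^b$/s  ? 1 : 0),
    sm   => ($x =~ /^a.^b$/sm ? 1 : 0),
);

print "$_: $matched{$_}\n" for sort keys %matched;
# m: 0
# none: 0
# s: 0
# sm: 1
```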

`/x` - extended pattern#

Ignores literal whitespace and lets # introduce end-of-line comments. Crucial for any pattern more than a line long.

Before /x:

/^[+-]?\d+(\.\d*)?([eE][+-]?\d+)?$/;

After:

/^
    [+-]?           # optional sign
    \d+             # integer part
    (\.\d*)?        # optional fraction
    ([eE][+-]?\d+)? # optional exponent
 $/x;

Whitespace in the pattern is ignored; whitespace you want to match becomes \s, \ , or [ ]:

/\w+ \s+ \w+/x;      # three tokens: word, space, word (literal spaces ignored)
/\w+\s+\w+/x;        # equivalent
/key:[ ]value/x;     # literal space via bracket class
/key:\ value/x;      # literal space via backslash

# starts a comment that ends at the next newline. To match a literal # under /x, escape it or put it in a class.
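Both spellings of a literal # can be sketched as follows; the input string and variable names are invented for the example:

```perl
use strict;
use warnings;

my $line = "color: #ff8800";

# Escaped form: \# is a literal hash; the trailing # starts a comment.
my ($hex_escaped) = $line =~ /
    \# ([0-9a-f]{6})    # six hex digits after a literal hash
/x;

# Bracketed form: [#] is a one-character class containing the hash.
my ($hex_class) = $line =~ /
    [#] ([0-9a-f]{6})   # same match, different spelling
/x;

print "$hex_escaped $hex_class\n";   # ff8800 ff8800
```

One thing to keep in mind when the delimiter is /: a comment inside the pattern must not contain a slash, because the end of the pattern is located before comments are recognised.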

The “Pattern White Space” set under /x follows Unicode UAX#31: SPACE, CHARACTER TABULATION, LINE FEED, LINE TABULATION, FORM FEED, CARRIAGE RETURN, NEXT LINE, LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR. In practice you only meet ASCII space, tab, and newline.

`/xx` - extended in classes too#

Inside […], whitespace is not ignored under plain /x: [ab c] matches a, b, ' ', or c. To also ignore spaces and tabs inside classes, use /xx:

/[ab c]/xx;          # matches 'a', 'b', or 'c' - space ignored
/[ab\ c]/xx;         # the \ is needed to match a literal space
/[ ! @ ]/xx;         # matches '!' or '@' - visible spacing

/xx is a superset of /x. It is convenient for character classes laid out for readability, especially with shorthand classes:

/[ \d \s \-_,. ]+/xx;   # digits, whitespace, common punctuation

/xx was added in Perl 5.26. Patterns built before that read fine; new code can reach for it freely.

`/g` - global#

In scalar context, keeps a position in the string (pos $x) and advances each time the pattern matches, allowing iteration:

my $x = "cat dog house";
while ($x =~ /(\w+)/g) {
    print "$1 at ", pos($x), "\n";
}
# cat at 3
# dog at 7
# house at 13

In list context, returns all matches at once:

my @words = $x =~ /(\w+)/g;   # ('cat', 'dog', 'house')

If the pattern contains no captures, list context returns the whole matched text for each match:

my @digits = "abc123def456" =~ /\d+/g;   # ('123', '456')

With multiple captures, each match contributes its captures to the flat result list, in order:

my @pairs = "a=1,b=2,c=3" =~ /(\w)=(\d)/g;
# ('a', '1', 'b', '2', 'c', '3')
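Two common uses of list-context /g, sketched with an invented $config string: filling a hash from the flat key/value list, and counting matches.

```perl
use strict;
use warnings;

my $config = "a=1,b=2,c=3";

# List context: the flat (key, value, key, value, ...) list fills a hash.
my %setting = $config =~ /(\w)=(\d)/g;

# Counting matches: assigning to an empty list forces list context,
# and the outer scalar assignment yields the number of elements.
my $count = () = $config =~ /\d/g;

print "b = $setting{b}, $count digits\n";   # b = 2, 3 digits
```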

`/c` - preserve position on failure#

By default, a failed /g match resets pos to undef. Under /gc, pos stays at its previous value - crucial for hand-rolled lexers:

my $s = "123abc";
while (1) {
    if ($s =~ /\G(\d+)/gc)    { print "num $1\n"; next; }
    if ($s =~ /\G([a-z]+)/gc) { print "word $1\n"; next; }
    last;   # nothing matched; exit
}

See the anchors and assertions chapter for \G.
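The effect of /c on pos can be observed directly. This sketch runs one successful match and then two failing ones, with and without /c; the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "123abc";

my $ok = $s =~ /\G\d+/g;     # succeeds, consumes "123"
my $after_digits = pos($s);  # 3

$ok = $s =~ /\G\d+/gc;       # fails at "a", but /c keeps pos
my $kept = pos($s);          # still 3

$ok = $s =~ /\G\d+/g;        # fails again, this time without /c
my $reset = pos($s);         # undef

printf "%s %s %s\n", $after_digits, $kept, defined $reset ? $reset : "undef";
# 3 3 undef
```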

`/r` - non-destructive substitution#

s/// normally modifies the target string and returns the count. s///r leaves the target alone and returns the result:

my $name = "  Alice  ";
my $trimmed = $name =~ s/^\s+|\s+$//gr;
# $name is still "  Alice  "
# $trimmed is "Alice"

Enables substitution chains without intermediate variables:

my $clean = $input
    =~ s/\s+/ /gr          # collapse runs of whitespace
    =~ s/^ | $//gr         # trim ends
    =~ s/[^\x00-\x7f]//gr; # drop non-ASCII

Each s///r returns the transformed string, which the next one receives.
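Because s///r is an expression that leaves its target alone, it also fits naturally inside map. A small sketch with invented data (stock Perl needs 5.14 or later for /r):

```perl
use strict;
use warnings;

my @raw = ("  Alice ", "Bob  ", " Carol");

# Inside map, $_ is aliased to each element; /r leaves it untouched
# and hands the trimmed copy to the result list.
my @clean = map { s/^\s+|\s+$//gr } @raw;

print join("|", @clean), "\n";   # Alice|Bob|Carol
print "[$raw[0]]\n";             # [  Alice ]  (original unchanged)
```

Without /r the same map would modify @raw in place and return substitution counts instead of strings.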

`/e` - evaluate the replacement#

The replacement half of s///e is Perl code, not a double-quoted string. The return value of the code replaces the match:

my $x = "numbers: 1 2 3 4";
$x =~ s/(\d+)/$1 * 2/ge;
# $x is now "numbers: 2 4 6 8"

/ee evaluates twice: the code returns a string, which is then evaluated as Perl again. Rarely useful and easy to misuse - only reach for it when you are sure. The substitution chapter has worked examples.
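The replacement under /e can be any expression, including a function call. As an illustration, here is a rough percent-encoder built on sprintf and ord; the input string is made up, and real code would more likely use a dedicated URI module.

```perl
use strict;
use warnings;

my $query = "a b&c=d";

# The replacement is an expression: sprintf() builds a %XX escape from
# the character code of each character that is not a letter or digit.
(my $encoded = $query) =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/ge;

print "$encoded\n";   # a%20b%26c%3Dd
```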

`/n` - non-capturing#

Makes every ordinary (…) behave like (?:…). Useful when a long pattern has parentheses purely for grouping and you want to keep $1, $2, … unset:

"hello" =~ /(hi|hello)/n;    # matches, but $1 is not set

Named captures (?<name>…) still capture under /n. Added in Perl 5.22.
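A sketch of /n combined with a named capture: only the named group records anything, while the plain parentheses group without capturing. The date string is invented; stock Perl needs 5.22 or later for /n.

```perl
use strict;
use warnings;

my ($year, $second_group);

if ("2024-05-17" =~ /(?<year>\d{4})-(\d{2})-(\d{2})/n) {
    $year         = $+{year};   # named group still captures
    $second_group = $2;         # plain (...) did not capture
}

print "year: $year\n";                                   # year: 2024
print "second group: ",
      (defined $second_group ? $second_group : "undef"), "\n";
# second group: undef
```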

`/p` - no-op#

/p is a silent no-op. ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} are always available. Accepted for backward compatibility, but contributes nothing in new code.
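A quick check of that claim, assuming an interpreter where /p is a no-op as described above (for stock Perl, that means 5.20 or later); the match below deliberately omits /p:

```perl
use strict;
use warnings;

my ($pre, $match, $post);

# No /p on the pattern; the three variables are filled in anyway.
if ("hello world" =~ /o w/) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "[$pre][$match][$post]\n";   # [hell][o w][orld]
```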

`/o` - compile once#

/o disables re-compilation of a pattern that interpolates variables. Compiled patterns are also cached automatically, and qr// gives explicit control. /o is rarely the right tool; qr// is clearer and composes.
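What "composes" means in practice: a qr// object can be interpolated into a larger pattern, which is itself compiled once and reused. A minimal sketch with hypothetical names $int and $pair:

```perl
use strict;
use warnings;

# A building block, compiled once.
my $int = qr/[+-]?\d+/;

# A larger pattern built from it, also compiled once.
my $pair = qr/\(\s*($int)\s*,\s*($int)\s*\)/;

my ($x, $y) = "(3, -14)" =~ $pair;

print "x=$x y=$y\n";   # x=3 y=-14
```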

Charset modifiers - `/a`, `/aa`, `/u`, `/l`, `/d`#

These control how the shorthand classes (\d, \w, \s), case folding, and the POSIX class set behave under Unicode and locale considerations. Brief summary here; the full treatment is in the unicode chapter.

Modifier	Behaviour
`/u`	full Unicode semantics. Default under `use v5.12` or later.
`/a`	restrict `\d`, `\w`, `\s`, `[[:…:]]` to ASCII.
`/aa`	`/a` plus `/i` does not cross ASCII/non-ASCII.
`/l`	follow the current `use locale` POSIX locale.
`/d`	dual-mode semantics. Avoid in new code.

/a, /u, /l, /d are mutually exclusive - at most one can be in effect at a time. /aa is a refinement of /a. They are set-once modifiers: an inner (?d:…) cannot un-set an outer /u.

The everyday choice in modern code: use v5.12 (or later) selects /u automatically, which is what you want for Unicode-correct text. Add /a only when the pattern must reject non-ASCII characters - typically because the input is known to be ASCII-only and you want to enforce that.
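The effect of /a on \d can be sketched with a single non-ASCII digit. U+0663 (ARABIC-INDIC DIGIT THREE) is chosen only as an example; the string is built from an escape so the source file stays ASCII.

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE

# A string containing a code point above 255 is matched with Unicode
# rules, so \d accepts the digit.
my $unicode_digit = $arabic_three =~ /^\d$/  ? 1 : 0;

# /a restricts \d to 0-9, so the same character is rejected.
my $ascii_digit   = $arabic_three =~ /^\d$/a ? 1 : 0;

print "unicode: $unicode_digit, ascii-only: $ascii_digit\n";
# unicode: 1, ascii-only: 0
```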

Inline modifiers#

(?flags) turns flags on for the rest of the enclosing group (or pattern):

/(?i)yes/;         # case-insensitive, same as /yes/i

(?flags:…) scopes the flags to the inner group only and is non-capturing:

/Answer: ((?i)yes)/;   # only the 'yes' is case-insensitive
/Answer: ((?i:yes))/;  # clearer: scope is the group's contents

(?-flags) turns flags off. They can be combined:

/(?i-m:pattern)/;   # turn on /i, turn off /m, within this group

Inline modifiers are the right tool when different parts of a long pattern need different modifiers. They are also useful in patterns built by interpolation, where the embedded (?i) travels with the pattern fragment.
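As a sketch of a fragment that carries its own modifier, the string $unit below (a hypothetical name) embeds (?i:…), so only the unit suffix is case-insensitive once it is interpolated:

```perl
use strict;
use warnings;

# A plain string fragment that brings its own case-insensitivity.
my $unit = '(?i:kb|mb|gb)';

# The surrounding pattern stays case-sensitive.
my $size = qr/\Asize=\d+$unit\z/;

my @verdicts = map { /$size/ ? "ok" : "no" } ("size=10Mb", "SIZE=10mb");

print "@verdicts\n";   # ok no
```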

Caret form: `(?^…)`#

(?^flags) is shorthand for “reset all flags, then apply these”. The expansion is d-imnsx followed by the listed flags. So (?^x:…) means “default flags except /x is on, regardless of what flags were in scope outside”.

The caret form is what Perl uses when stringifying compiled patterns; you may see it in error messages or qr// output. In hand-written code it is occasionally useful when interpolating a pattern fragment that should not inherit modifiers from its surroundings:

my $strict = qr/(?^:foo|bar)/;   # always default flags
# ... no matter how this is interpolated.

A negative flag is not legal after the caret (it would be redundant - the caret already cleared everything).
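Both points can be observed on a stock Perl 5.14 or later (assumed here, with no unicode_strings feature in scope, which would add a u to the stringified flags). The sketch stringifies a compiled pattern and then interpolates another one into an /x pattern:

```perl
use strict;
use warnings;

# Stringifying a compiled pattern shows the caret form.
my $re   = qr/foo/i;
my $text = "$re";
print "$text\n";   # (?^i:foo)

# The caret resets /x inside the fragment, so its space stays literal
# even though the surrounding pattern is compiled with /x.
my $inner = qr/a b/;
my $hit   = "a b" =~ /\A $inner \z/x ? 1 : 0;
print "$hit\n";    # 1
```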

Mutually-exclusive flags#

/a and /aa override each other (last one wins); same for /x and /xx. They are not additive. (?xx-x:…) turns all x behaviour off, not “subtract one x from two”.

The charset family /a, /d, /l, /u is mutually exclusive; specifying one un-specifies the others. They cannot be turned off with a leading -, only switched between. So (?-d:…) is a fatal error; (?dl:…) is also fatal (two charset flags together).
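These rules can be probed at run time by compiling pattern strings inside eval. The sketch below only records whether each one compiles; it does not rely on the wording of any error message.

```perl
use strict;
use warnings;

my %compiles;

for my $src ('(?a:x)', '(?-d:x)', '(?dl:x)') {
    # eval traps the fatal compile error, if any.
    $compiles{$src} = eval { my $re = qr/$src/; 1 } ? 1 : 0;
}

print "$_ => $compiles{$_}\n" for sort keys %compiles;
# (?-d:x) => 0
# (?a:x) => 1
# (?dl:x) => 0
```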

/p is special: its presence anywhere in a pattern has a global effect (and that effect is a no-op).

Order of modifiers#

On a match or substitution, the order of trailing modifiers is not significant. /mgis and /sgmi are identical. Pick a house style and stick to it.

How a regex is read - the four phases#

To predict how modifiers and inline forms behave, it helps to know that Perl reads a regex in phases. The phases:

Phase A: parser identifies the delimiter, finds the end of the pattern. (?#…) comments are removed here.
Phase B: pattern is parsed as a double-quotish string. Variables interpolate, escape sequences cook, \Q…\E translates to quotemeta-style escaping.
Phase C: under /x or /xx, unescaped whitespace and #-comments are stripped.
Phase D: the regex compiler reads the cooked, stripped pattern and turns it into the engine’s internal form.

The order matters because phases B and D operate on different representations:

\Q$dir\E cooks at Phase B, before the regex compiler sees it. By Phase D the variable’s contents have already been metaquoted; the regex compiler sees a literal pattern.
\U…\E is interpreted by Phase B as a string-cook directive (uppercase the contents), which is almost certainly not what you want inside a regex. Use \Q and \E for regex purposes; \U only when you want the pattern’s literal text to be uppercased.
(?#comment) is removed before Phase B sees the pattern at all. A literal # inside (?#…) is fine. A literal ) is not - the comment ends at the first ).
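The \Q point can be sketched with a directory name that contains a regex metacharacter. The paths are invented; m{…} is used only to avoid escaping slashes.

```perl
use strict;
use warnings;

my $dir = "/usr/local/lib.d";

# Without \Q, the "." in $dir is a wildcard and matches "X".
my $loose  = "/usr/local/libXd/foo.pm" =~ m{\A$dir/}       ? 1 : 0;

# With \Q...\E, the contents are escaped before the regex compiler
# sees them, so "." only matches a literal dot.
my $strict = "/usr/local/libXd/foo.pm" =~ m{\A\Q$dir\E/}   ? 1 : 0;

# The intended path still matches, and the tail is captured.
my ($file) = "/usr/local/lib.d/foo.pm" =~ m{\A\Q$dir\E/(.+)\z};

print "loose=$loose strict=$strict file=$file\n";
# loose=1 strict=0 file=foo.pm
```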

The /x modifier applies at Phase C, after interpolation and before compilation. That means whitespace introduced by an interpolated variable is stripped by /x along with the whitespace written in the pattern:

my $sub = " a b ";        # contains literal spaces
"xabx" =~ / x $sub x /x;  # matches - the spaces in $sub are stripped too

If you want the interpolated content to match literally, wrap it in \Q…\E: the spaces are backslash-escaped at Phase B, so Phase C leaves them alone.

`use re 'strict'`#

use re 'strict' raises a number of normally-tolerated regex sloppiness conditions to compile-time errors. It is lexically scoped. In strict mode:

A { in a non-quantifier position is an error, not a literal.
[a-] (dash at end of class) is an error.
A useless negation of an always-on flag is an error.
A (?-d) (trying to turn off /d) is an error rather than a warning.
Unmatched [ in a string is an error.

Strict mode is recommended for new regex-heavy code, especially generated patterns where small typos go unnoticed in a normal build. It is not default because it would break working older patterns.

use re 'strict';

/abc{,1/;           # error in strict; warning otherwise
/[a-]/;             # error in strict
