Modifiers
Modifiers change how a pattern is matched without changing the pattern itself. They appear after the closing delimiter of a match, substitution, or qr//:
"Hello" =~ /hello/i; # case-insensitive match
$x =~ s/foo/bar/g; # global substitution
my $re = qr/\d+/i; # compiled pattern with /i
The same modifiers can be embedded inside a pattern with (?flags) or (?flags:…), localising their effect to part of the pattern.
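The full modifier set
| Modifier | Effect |
|---|---|
| /i | case-insensitive matching |
| /m | multi-line: ^ and $ also match at line boundaries inside the string |
| /s | single-line: . also matches newline |
| /x | extended: ignore whitespace and # comments in the pattern |
| /xx | extended-extended: also ignore spaces and tabs inside […] classes |
| /g | global: match as many times as possible |
| /c | do not reset pos when a /g match fails |
| /r | return the result of s/// and leave the target unchanged |
| /e | evaluate the replacement as Perl code (/ee evaluates twice) |
| /n | non-capturing: (…) behaves like (?:…) |
| /p | no-op; accepted for compatibility |
| /o | compile the pattern once (rarely needed; prefer qr//) |
| /a | restrict \d, \s, \w and the POSIX classes to ASCII |
| /aa | like /a, and also forbid ASCII/non-ASCII matches under /i |
| /u | Unicode semantics regardless of the string's internal encoding |
| /l | use current locale |
| /d | dual-mode semantics; avoid in new code |
The charset modifiers /a, /aa, /u, /l, /d (the last five rows) are covered in detail in the unicode chapter and summarised again below; the others are covered one by one here.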
/i — case-insensitive
"Hello" =~ /hello/i; # matches
"hello" =~ /[A-Z]/i; # matches: the class is case-insensitive too
"Grüße" =~ /GRÜSSE/i; # matches under Unicode semantics
Unicode case folding includes multi-character mappings such as ß → ss (the German Eszett) and ligatures such as ﬁ → fi. The full table lives in the Unicode standard; for ASCII, /i does what you expect.
/i carries a negligible performance penalty for ASCII. The implementation folds case at compile time — the pattern carries the case-folded character set, the input is read once. Unicode case-folding is more involved (sequences like ß ↔ ss), but the extra work is bounded and rarely shows up in practice.
/m — multi-line
Changes where ^ and $ match. Without /m, ^ matches only at the start of the string and $ only at the end (or just before a final newline). With /m, they also match at the start and end of every line inside the string.
my $x = "first\nsecond\nthird";
$x =~ /^second/; # does not match
$x =~ /^second/m; # matches — second is at start of a line
$x =~ /first$/m; # matches
$x =~ /third$/m; # matches
\A, \z, \Z remain absolute string anchors even under /m — see the anchors and assertions chapter.
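A quick way to see the difference is to collect the first character of every line. This is a sketch with a made-up three-line string; it also shows that \A keeps meaning «start of string» when /m is on.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";   # sample input

# With /m, ^ matches at the start of every line, so /g finds three.
my @firsts = $text =~ /^(\w)/mg;

# Without /m, ^ only matches at the start of the string.
my $without_m = () = $text =~ /^(\w)/g;

# \A ignores /m: it still means "start of string".
my $with_anchor = () = $text =~ /\A(\w)/mg;

print "@firsts | $without_m | $with_anchor\n";   # a b g | 1 | 1
```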
/s — single-line
Makes . match newline characters too.
my $x = "a\nb";
$x =~ /a.b/; # does not match — . does not cross \n
$x =~ /a.b/s; # matches
Despite the names «multi-line» and «single-line», /m and /s are independent and do not conflict; they can be used together on the same match:
$x =~ /^a.b$/sm; # . matches newline AND ^,$ are line-aware
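The typical case for /s is pulling a block that spans lines out of a larger string. A minimal sketch, assuming a made-up HTML-like snippet; it illustrates . versus newline and is not a suggestion to parse HTML with regexes.

```perl
use strict;
use warnings;

my $html = "<p>one\ntwo</p>";   # sample input

# Without /s, .* stops at the newline, so the closing tag is never reached.
my ($no_s) = $html =~ m{<p>(.*)</p>};

# With /s, .* can cross the newline.
my ($with_s) = $html =~ m{<p>(.*)</p>}s;

print defined $no_s ? "matched\n" : "no match\n";   # no match
print "[$with_s]\n";
```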
/x — extended pattern
Ignores unescaped whitespace outside bracketed character classes and lets # introduce end-of-line comments. Crucial for any pattern more than a line long.
Before /x:
/^[+-]?\d+(\.\d*)?([eE][+-]?\d+)?$/;
After:
/^
[+-]? # optional sign
\d+ # integer part
(\.\d*)? # optional fraction
([eE][+-]?\d+)? # optional exponent
$/x;
Whitespace in the pattern is ignored; whitespace you want to match becomes \s, \ , or [ ]:
/\w+ \s+ \w+/x; # word, whitespace, word (the literal spaces are ignored)
/\w+\s+\w+/x; # equivalent
/key:[ ]value/x; # literal space via bracket class
/key:\ value/x; # literal space via backslash
An unescaped # starts a comment that ends at the next newline. To match a literal # under /x, escape it (\#) or put it in a class ([#]).
The whitespace ignored under /x is the Unicode Pattern_White_Space set defined in UAX #31: SPACE, CHARACTER TABULATION, LINE FEED, LINE TABULATION, FORM FEED, CARRIAGE RETURN, NEXT LINE, LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR. In practice you only meet ASCII space, tab, and newline.
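The /x number pattern above can be compiled once with qr// and tried against a few inputs. In this sketch the capturing groups are swapped for (?:…) because nothing reads the captures; the sample inputs are invented for illustration. Note that the comments avoid the / character, since the closing delimiter is found before /x comments are stripped.

```perl
use strict;
use warnings;

my $number = qr/
    ^
    [+-]?              # optional sign
    \d+                # integer part
    (?:\.\d*)?         # optional fraction
    (?:[eE][+-]?\d+)?  # optional exponent
    $
/x;

my @inputs = ('42', '-3.14', '6.02e23', '1.', 'abc', '1e', '.5', '1 2');
for my $in (@inputs) {
    printf "%-8s %s\n", $in, ($in =~ $number ? 'ok' : 'no');
}
```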
/xx — extended in classes too
Inside […], whitespace is not ignored under plain /x: [ab c] matches a, b, ' ', or c. To also ignore unescaped spaces and tabs inside classes, use /xx:
/[ab c]/xx; # matches 'a', 'b', or 'c' — space ignored
/[ab\ c]/xx; # the \ is needed to match a literal space
/[ ! @ ]/xx; # matches '!' or '@' — visible spacing
/xx is a superset of /x. It is convenient for character classes laid out for readability, especially with shorthand classes:
/[ \d \s \-_,. ]+/xx; # digits, whitespace, common punctuation
/xx was added in Perl 5.26. It does not change the meaning of existing /x patterns, so new code can reach for it freely; code that must run on older perls should stick to /x.
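Counting matches makes the class behaviour visible. A small sketch, assuming Perl 5.26 or later and an invented sample string:

```perl
use v5.26;        # /xx needs Perl 5.26 or later; this also enables strict and say
use warnings;

my $s = "a b c";

# Under /x the space inside [a c] is still part of the class: a, ' ', ' ', c.
my $count_x = () = $s =~ /[a c]/xg;

# Under /xx the space inside the class is ignored: only a and c remain.
my $count_xx = () = $s =~ /[a c]/xxg;

say "$count_x $count_xx";   # 4 2
```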
/g — global
In scalar context, keeps a position in the string (pos $x) and advances each time the pattern matches, allowing iteration:
my $x = "cat dog house";
while ($x =~ /(\w+)/g) {
print "$1 at ", pos($x), "\n";
}
# cat at 3
# dog at 7
# house at 13
In list context, returns all matches at once:
my @words = $x =~ /(\w+)/g; # ('cat', 'dog', 'house')
If the pattern contains no captures, list context returns the whole matched text for each match:
my @digits = "abc123def456" =~ /\d+/g; # ('123', '456')
With multiple captures, each iteration returns the tuple in order:
my @pairs = "a=1,b=2,c=3" =~ /(\w)=(\d)/g;
# ('a', '1', 'b', '2', 'c', '3')
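Both contexts in one sketch: list context with two captures returns a flat key/value list that can load a hash directly, and a scalar-context loop records pos() after each match. The input strings are the ones used above.

```perl
use strict;
use warnings;

# List context with two captures: pairs come back flat, so they load a hash.
my %config = "a=1,b=2,c=3" =~ /(\w)=(\d)/g;

# Scalar context: pos() records where each match ended.
my $s = "cat dog house";
my @ends;
while ($s =~ /\w+/g) {
    push @ends, pos($s);
}

print join(',', map { "$_=$config{$_}" } sort keys %config), "\n";  # a=1,b=2,c=3
print "@ends\n";                                                     # 3 7 13
```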
/c — preserve position on failure
By default, a failed /g match resets pos to undef. Under /gc, pos stays at its previous value — crucial for hand-rolled lexers:
my $s = "123abc";
while (1) {
if ($s =~ /\G(\d+)/gc) { print "num $1\n"; next; }
if ($s =~ /\G([a-z]+)/gc) { print "word $1\n"; next; }
last; # nothing matched; exit
}
See the anchors and assertions chapter for \G.
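The loop above stops silently on input it does not recognise. A slightly fuller sketch of a /gc lexer for a toy expression language follows; the token names (NUM, IDENT, OP) and the tokenize function are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical token kinds for a toy expression language.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length($src)) {
        if    ($src =~ /\G\s+/gc)            { }                      # skip whitespace
        elsif ($src =~ /\G(\d+)/gc)          { push @tokens, [NUM   => $1] }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, [IDENT => $1] }
        elsif ($src =~ m{\G([-+*/=])}gc)     { push @tokens, [OP    => $1] }
        else  { die "unexpected character at offset " . pos($src) . "\n" }
    }
    return @tokens;
}

my @toks = tokenize("x = 42 + y");
print join(' ', map { "$_->[0]:$_->[1]" } @toks), "\n";
# IDENT:x OP:= NUM:42 OP:+ IDENT:y
```

Each branch tries its pattern at the current position. Because of /c, a failed attempt leaves pos where it was, so the next branch starts from the same place; without /c, the first failed branch would reset pos and the lexer would lose its place.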
/r — non-destructive substitution
s/// normally modifies the target string and returns the count. s///r leaves the target alone and returns the result:
my $name = " Alice ";
my $trimmed = $name =~ s/^\s+|\s+$//gr;
# $name is still " Alice "
# $trimmed is "Alice"
Enables substitution chains without intermediate variables:
my $clean = $input
=~ s/[^\x00-\x7f]//gr # drop non-ASCII first, so no gaps are left behind
=~ s/\s+/ /gr # collapse runs of whitespace
=~ s/^ | $//gr; # trim ends
Each s///r returns the transformed string, which the next one receives.
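s///r also fits naturally inside map, where modifying $_ would otherwise alter the source array. A sketch with made-up file names:

```perl
use strict;
use warnings;

my @files = ('report.txt', 'notes.txt', 'data.csv');

# s///r returns the edited copy (or the original if nothing matched),
# so map can transform the list while @files itself stays untouched.
my @backups = map { s/\.txt\z/.bak/r } @files;

print "@files\n";     # report.txt notes.txt data.csv
print "@backups\n";   # report.bak notes.bak data.csv
```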
/e — evaluate the replacement
The replacement half of s///e is Perl code, not a double-quoted string. The return value of the code replaces the match:
my $x = "numbers: 1 2 3 4";
$x =~ s/(\d+)/$1 * 2/ge;
# $x is now "numbers: 2 4 6 8"
/ee evaluates twice: the code returns a string, which is then evaluated as Perl again. It is rarely useful and dangerous with untrusted input, since the second pass runs whatever code the first pass produced. The substitution chapter has worked examples.
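A common use of /e is filling placeholders from a hash, with a fallback when a key is missing. The {name} placeholder syntax, %vars and the sample template are assumptions for this sketch:

```perl
use strict;
use warnings;

my %vars     = (name => 'Ann', amount => 5);          # hypothetical values
my $template = 'Hi {name}, you owe {amount} {currency}.';

# The replacement is code: look the name up, or put the placeholder back.
(my $out = $template) =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;

print "$out\n";   # Hi Ann, you owe 5 {currency}.
```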
/n — non-capturing
Makes every ordinary (…) behave like (?:…). Useful when a long pattern has parentheses purely for grouping and you want to keep $1, $2, … unset:
"hello" =~ /(hi|hello)/n; # matches, but $1 is not set
Named captures (?<name>…) still capture under /n. Added in Perl 5.22.
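Comparing the capture lists with and without /n shows the effect. A sketch assuming Perl 5.22 or later and a made-up date string:

```perl
use v5.22;      # /n needs Perl 5.22 or later; this also enables strict and say
use warnings;

my $date = "2024-05-17";   # sample input

# Without /n, list context returns every capture group.
my @all = $date =~ /(\d{4})-(\d\d)-(?<day>\d\d)/;

# With /n, the plain groups only group; the named one still captures.
my @named_only = $date =~ /(\d{4})-(\d\d)-(?<day>\d\d)/n;

say scalar(@all), " ", scalar(@named_only), " $named_only[0]";   # 3 1 17
```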
/p — no-op
Since Perl 5.20, /p is a silent no-op: ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} are available after every successful match. It is accepted for backward compatibility but contributes nothing in new code.
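/o — compile once
/o disables re-compilation of a pattern that interpolates variables: the values interpolated the first time are used from then on, even if the variables later change. Perl already skips recompiling when the interpolated text is unchanged since the last run, and qr// gives explicit control. /o is rarely the right tool; qr// is clearer and composes.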
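The qr// approach in practice: build the pattern once, reuse it, and interpolate it into larger patterns. A sketch with a hypothetical keyword list; quotemeta keeps any metacharacters in the keywords literal.

```perl
use strict;
use warnings;

my @words = ('error', 'fail');   # hypothetical keyword list

# Build the alternation once; quotemeta guards against metacharacters.
my $alt   = join '|', map { quotemeta } @words;
my $alert = qr/\b(?:$alt)\b/i;

my @lines = ('all good', 'Disk FAIL on sda', 'error: x', 'failure');
my @hits  = grep { $_ =~ $alert } @lines;

# A qr// object keeps its own /i when interpolated into a larger pattern.
my $shouted = 'FAIL!' =~ /^$alert!$/ ? 1 : 0;

print scalar(@hits), " $shouted\n";   # 2 1
```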
Charset modifiers — /a, /aa, /u, /l, /d
These control how the shorthand classes (\d, \w, \s), case folding, and the POSIX class set behave under Unicode and locale considerations. Brief summary here; the full treatment is in the unicode chapter.
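| Modifier | Behaviour |
|---|---|
| /u | full Unicode semantics. Default under use v5.12 or later (the unicode_strings feature) |
| /a | restrict \d, \s, \w and the POSIX classes to ASCII |
| /aa | as /a, and under /i also forbid matches between ASCII and non-ASCII characters (such as k and KELVIN SIGN) |
| /l | follow the current locale |
| /d | dual-mode semantics. Avoid in new code. |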
/a, /u, /l, /d are mutually exclusive: at most one can be in effect at a time. /aa is a refinement of /a. They can be switched but never turned off: an inner (?a:…) overrides an outer /u for that group, while a leading - (as in (?-u:…)) is a fatal error.
The everyday choice in modern code: use v5.12 (or later) selects /u automatically, which is what you want for Unicode-correct text. Add /a only when the pattern must reject non-ASCII characters — typically because the input is known to be ASCII-only and you want to enforce that.
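The difference between /u, /a and /aa shows up with non-ASCII digits and case folds. A sketch using two real code points, ARABIC-INDIC DIGIT THREE (U+0663) and KELVIN SIGN (U+212A, which case-folds to k):

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $kelvin       = "\x{212A}";   # KELVIN SIGN

my $d_under_u  = $arabic_three =~ /^\d\z/u ? 1 : 0;   # 1: \d is any Unicode digit
my $d_under_a  = $arabic_three =~ /^\d\z/a ? 1 : 0;   # 0: \d is [0-9] only

my $k_under_u  = $kelvin =~ /^k\z/iu  ? 1 : 0;        # 1: Unicode case folding
my $k_under_aa = $kelvin =~ /^k\z/iaa ? 1 : 0;        # 0: no ASCII/non-ASCII folds

print "$d_under_u $d_under_a $k_under_u $k_under_aa\n";   # 1 0 1 0
```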
Inline modifiers
(?flags) turns flags on for the rest of the enclosing group (or pattern):
/(?i)yes/; # case-insensitive, same as /yes/i
(?flags:…) scopes the flags to the inner group only and is non-capturing:
/Answer: ((?i)yes)/; # only the 'yes' is case-insensitive
/Answer: ((?i:yes))/; # clearer: scope is the group's contents
(?-flags) turns flags off. On and off flags can be combined in one group:
/(?i-m:pattern)/; # turn on /i, turn off /m, within this group
Inline modifiers are the right tool when different parts of a long pattern need different modifiers. They are also useful in patterns built by interpolation, where the embedded (?i) travels with the pattern fragment.
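Both uses side by side: a scoped (?i:…) group inside an otherwise case-sensitive pattern, and a plain-string fragment whose (?i) travels with it into another pattern. The sample strings and the colour fragment are invented:

```perl
use strict;
use warnings;

# (?i:…) makes only the group case-insensitive.
my $scoped = qr/Answer: ((?i:yes))/;
my $m1 = "Answer: YES" =~ $scoped ? $1 : '';     # 'YES'
my $m2 = "ANSWER: yes" =~ $scoped ? 1 : 0;       # 0: 'Answer' is still case-sensitive

# A fragment held in a plain string carries its (?i) into any pattern.
my $frag = '(?i)colou?r';                        # hypothetical fragment
my $m3 = "COLOR" =~ /^$frag$/ ? 1 : 0;           # 1

print "$m1 $m2 $m3\n";   # YES 0 1
```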
Caret form: (?^…)
(?^flags) is shorthand for «reset all flags, then apply these». The expansion is d-imnsx followed by the listed flags. So (?^x:…) means «default flags except /x is on, regardless of what flags were in scope outside».
The caret form is what Perl uses when stringifying compiled patterns; you may see it in error messages or qr// output. In hand-written code it is occasionally useful when interpolating a pattern fragment that should not inherit modifiers from its surroundings:
my $frag = '(?^:foo|bar)'; # always default flags inside the group
"FOO" =~ /$frag/i; # no match: the caret cancels the outer /i
A negative flag is not legal after the caret (it would be redundant — the caret already cleared everything).
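The caret form can be seen directly by printing a qr// object, and it explains why a compiled pattern keeps its own flags when interpolated. A sketch, assuming Perl 5.14 or later (when the caret stringification appeared):

```perl
use strict;
use warnings;

# qr// objects stringify using the caret form.
my $re = qr/foo/i;
print "$re\n";   # (?^i:foo)

# Interpolated into a pattern without /i, the fragment keeps its own /i,
# while the surrounding text stays case-sensitive.
my $hit  = "FOO bar" =~ /^$re bar$/ ? 1 : 0;   # 1
my $miss = "FOO BAR" =~ /^$re bar$/ ? 1 : 0;   # 0
```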
Mutually-exclusive flags
/a and /aa override each other (last one wins); same for /x and /xx. They are not additive. (?xx-x:…) turns all x behaviour off, not «subtract one x from two».
The charset family /a, /d, /l, /u is mutually exclusive; specifying one un-specifies the others. They cannot be turned off with a leading -, only switched between. So (?-d:…) is a fatal error; (?dl:…) is also fatal (two charset flags together).
/p is special: its presence anywhere in a pattern has a global effect (and that effect is a no-op).
Order of modifiers
On a match or substitution, the order of trailing modifiers is not significant. /mgis and /sgmi are identical. Pick a house style and stick to it.
How a regex is read — the four phases
To see how modifiers and inline forms interact, you need to know that Perl reads a regex in phases:
- Phase A: the parser identifies the delimiter and finds the end of the pattern. (?#…) comments are removed here.
- Phase B: the pattern is parsed as a double-quotish string. Variables interpolate, escape sequences are cooked, and \Q…\E translates to quotemeta-style escaping.
- Phase C: under /x or /xx, unescaped whitespace and #-comments are stripped.
- Phase D: the regex compiler reads the cooked, stripped pattern and turns it into the engine's internal form.
The order matters because phases B and D operate on different representations:
- \Q$dir\E cooks at Phase B, before the regex compiler sees it. By Phase D the variable's contents have already been metaquoted, so the regex compiler sees a literal pattern.
- \U…\E is interpreted by Phase B as a string-cook directive (uppercase the contents), which is almost certainly not what you want inside a regex. Use \Q and \E for regex purposes, and \U only when you want the pattern's literal text uppercased.
- (?#comment) is removed before Phase B sees the pattern at all. A literal # inside (?#…) is fine. A literal ) is not: the comment ends at the first ).
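Phase B in action: the same variable, interpolated with and without \Q…\E, plus a (?#…) comment that never reaches the matcher. The value of $dir is made up for this sketch:

```perl
use strict;
use warnings;

my $dir = "a.b";   # hypothetical value containing a metacharacter

my $raw    = "axb" =~ /^$dir$/     ? 1 : 0;   # 1: the '.' reached the compiler as a wildcard
my $quoted = "axb" =~ /^\Q$dir\E$/ ? 1 : 0;   # 0: \Q escaped it before compilation
my $exact  = "a.b" =~ /^\Q$dir\E$/ ? 1 : 0;   # 1

# (?#…) is a comment; it never reaches the matcher.
my $commented = "ab" =~ /a(?# skip me )b/ ? 1 : 0;   # 1

print "$raw $quoted $exact $commented\n";   # 1 0 1 1
```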
The /x modifier applies at Phase C, between interpolation and compilation. That means whitespace introduced by an interpolated variable is stripped by /x just like the pattern's own, and a # inside the value would start a comment:
my $sub = " a b "; # contains literal spaces
"xabx" =~ / x $sub x /x; # matches: the spaces in $sub are stripped too
If you want the interpolated content kept literal, wrap it in \Q…\E or interpolate a qr// object compiled without /x.
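A runnable version of the example, checking each route. The value of $sub is the one above; the target strings are chosen for illustration:

```perl
use strict;
use warnings;

my $sub = " a b ";   # contains literal spaces

# The interpolated spaces are stripped along with the pattern's own.
my $stripped = "xabx"    =~ / x $sub x /x ? 1 : 0;       # 1
my $literal  = "x a b x" =~ / x $sub x /x ? 1 : 0;       # 0

# \Q…\E turns each space into "\ ", which /x leaves alone.
my $quoted   = "x a b x" =~ / x \Q$sub\E x /x ? 1 : 0;   # 1

# A qr// built without /x keeps its spaces: it interpolates as (?^: a b ).
my $frag     = qr/$sub/;
my $via_qr   = "x a b x" =~ / x $frag x /x ? 1 : 0;      # 1

print "$stripped $literal $quoted $via_qr\n";   # 1 0 1 1
```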
use re 'strict'
use re 'strict' raises a number of normally tolerated regex sloppiness conditions to compile-time errors. It is lexically scoped, and it is experimental: using it emits a warning in the experimental::re_strict category. In strict mode:
- A { in a non-quantifier position is an error, not a literal.
- [a-] (dash at end of class) is an error.
- A useless negation of an always-on flag is an error.
Two related conditions are fatal with or without strict: (?-d) (trying to turn off /d) and an unmatched [.
Strict mode is recommended for new regex-heavy code, especially generated patterns where small typos go unnoticed in a normal build. It is not the default because it would break working older patterns.
use re 'strict';
/abc{,1/; # error in strict; warning otherwise
/[a-]/; # error in strict
See also
- The unicode chapter: /a, /aa, /u, /l, /d in detail, plus what they do to character classes and case folding.
- The substitution chapter: /e, /r, and /g in their substitution-specific roles.
- The basics chapter: the four-phase parsing model in its first appearance.
- The anchors and assertions chapter: \A, \z, \Z interactions with /m.
- The performance chapter: /o discussion and why qr// superseded it.