# Modifiers

Modifiers change how a pattern is matched without changing the pattern itself. They appear after the closing delimiter of a match, substitution, or `qr//`:

```perl
"Hello" =~ /hello/i;     # case-insensitive match
$x =~ s/foo/bar/g;       # global substitution
my $re = qr/\d+/i;       # compiled pattern with /i
```

The same modifiers can be embedded inside a pattern with `(?flags)` or `(?flags:…)`, localising their effect to part of the pattern.

## The full modifier set

| Modifier   | Effect                                                  |
|------------|---------------------------------------------------------|
| `/i`       | case-insensitive matching                               |
| `/m`       | multi-line: `^` and `$` match at every `\n`             |
| `/s`       | single-line: `.` matches `\n` too                       |
| `/x`       | extended: ignore whitespace and `#` comments            |
| `/xx`      | extended-extended: also ignore whitespace inside `[…]`  |
| `/g`       | global: match as many times as possible                 |
| `/c`       | do not reset `pos` on failure (with `/g`)               |
| `/r`       | `s///r` returns the result instead of modifying         |
| `/e`       | `s///e` evaluates the replacement as Perl code          |
| `/ee`      | `s///ee` evaluates, then evaluates the result too       |
| `/n`       | non-capturing: `(…)` act like `(?:…)`                   |
| `/p`       | no-op; accepted for compat                              |
| `/o`       | compile the pattern once (rarely needed; prefer `qr//`) |
| `/a`       | `\d`, `\w`, `\s` restricted to ASCII                    |
| `/aa`      | `/a`, plus `/i` does not cross ASCII/non-ASCII boundary |
| `/u`       | Unicode semantics regardless of `use utf8`              |
| `/l`       | use current locale                                      |
| `/d`       | dual-mode semantics — avoid in new code                 |

The first eight are the everyday modifiers; the charset modifiers `/a`, `/u`, `/l`, `/d` are covered in detail in the [unicode](unicode.md) chapter and summarised again here.

## `/i` — case-insensitive

```perl
"Hello" =~ /hello/i;      # matches
"HELLO" =~ /[A-Z]/i;      # matches — class is insensitive too
"Grüße" =~ /GRÜSSE/i;     # matches under Unicode semantics
```

Unicode case folding includes multi-character mappings such as `ß → ss` (the German Eszett).
The full table lives in the Unicode standard; for ASCII, `/i` does what you expect.

`/i` carries a negligible performance penalty for ASCII. The implementation folds case at *compile time* — the pattern carries the case-folded character set, the input is read once. Unicode case-folding is more involved (sequences like `ß ↔ ss`), but the extra work is bounded and rarely shows up in practice.

## `/m` — multi-line

Changes where `^` and `$` match. Without `/m`, `^` matches only at the start of the string and `$` only at the end (or just before a final newline). With `/m`, `^` also matches after every embedded newline and `$` before every one.

```perl
my $x = "first\nsecond\nthird";
$x =~ /^second/;      # does not match
$x =~ /^second/m;     # matches — second is at start of a line
$x =~ /first$/m;      # matches
$x =~ /third$/m;      # matches
```

`\A`, `\z`, `\Z` remain absolute string anchors even under `/m` — see the [anchors and assertions](anchors-and-assertions.md) chapter.

## `/s` — single-line

Makes `.` match newline characters too.

```perl
my $x = "a\nb";
$x =~ /a.b/;      # does not match — . does not cross \n
$x =~ /a.b/s;     # matches
```

`/m` and `/s` are independent. They can both be used on the same match. Despite the names, they do not conflict:

```perl
$x =~ /^a.b$/sm;      # . matches newline AND ^,$ are line-aware
```

## `/x` — extended pattern

Ignores literal whitespace and lets `#` introduce end-of-line comments. Crucial for any pattern more than a line long.

Before `/x`:

```perl
/^[+-]?\d+(\.\d*)?([eE][+-]?\d+)?$/;
```

After:

```perl
/^
    [+-]?              # optional sign
    \d+                # integer part
    (\.\d*)?           # optional fraction
    ([eE][+-]?\d+)?    # optional exponent
$/x;
```

Whitespace in the *pattern* is ignored; whitespace you want to match becomes `\s`, `\ `, or `[ ]`:

```perl
/\w+ \s+ \w+/x;       # three tokens: word, space, word (literal spaces ignored)
/\w+\s+\w+/x;         # equivalent
/key:[ ]value/x;      # literal space via bracket class
/key:\ value/x;       # literal space via backslash
```

`#` starts a comment that ends at the next newline.
To match a literal `#` under `/x`, escape it or put it in a class.

The «Pattern White Space» set under `/x` follows Unicode UAX#31: SPACE, CHARACTER TABULATION, LINE FEED, LINE TABULATION, FORM FEED, CARRIAGE RETURN, NEXT LINE, LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR. In practice you only meet ASCII space, tab, and newline.

## `/xx` — extended in classes too

Inside `[…]` whitespace is *not* ignored under plain `/x` — `[ab c]` matches `a`, `b`, `' '`, or `c`. To also ignore unescaped spaces and tabs inside classes, use `/xx`:

```perl
/[ab c]/xx;       # matches 'a', 'b', or 'c' — space ignored
/[ab\ c]/xx;      # the \ is needed to match a literal space
/[ ! @ ]/xx;      # matches '!' or '@' — visible spacing
```

`/xx` is a superset of `/x`. It is convenient for character classes laid out for readability, especially with shorthand classes:

```perl
/[ \d \s \-_,. ]+/xx;     # digits, whitespace, common punctuation
```

`/xx` was added in Perl 5.26. Patterns built before that read fine; new code can reach for it freely.

## `/g` — global

In scalar context, keeps a position in the string (`pos $x`) and advances each time the pattern matches, allowing iteration:

```perl
my $x = "cat dog house";
while ($x =~ /(\w+)/g) {
    print "$1 at ", pos($x), "\n";
}
# cat at 3
# dog at 7
# house at 13
```

In list context, returns all matches at once:

```perl
my @words = $x =~ /(\w+)/g;
# ('cat', 'dog', 'house')
```

If the pattern contains no captures, list context returns the whole matched text for each match:

```perl
my @digits = "abc123def456" =~ /\d+/g;
# ('123', '456')
```

With multiple captures, each match contributes its captures in order:

```perl
my @pairs = "a=1,b=2,c=3" =~ /(\w)=(\d)/g;
# ('a', '1', 'b', '2', 'c', '3')
```

## `/c` — preserve position on failure

By default, a failed `/g` match resets `pos` to undef.
Under `/gc`, `pos` stays at its previous value — crucial for hand-rolled lexers:

```perl
my $s = "123abc";
while (1) {
    if ($s =~ /\G(\d+)/gc)    { print "num $1\n";  next; }
    if ($s =~ /\G([a-z]+)/gc) { print "word $1\n"; next; }
    last;    # nothing matched; exit
}
```

See the [anchors and assertions](anchors-and-assertions.md) chapter for `\G`.

## `/r` — non-destructive substitution

`s///` normally modifies the target string and returns the count. `s///r` leaves the target alone and returns the result:

```perl
my $name = " Alice ";
my $trimmed = $name =~ s/^\s+|\s+$//gr;
# $name is still " Alice "
# $trimmed is "Alice"
```

Enables substitution chains without intermediate variables:

```perl
my $clean = $input
    =~ s/\s+/ /gr             # collapse runs of whitespace
    =~ s/^ | $//gr            # trim ends
    =~ s/[^\x00-\x7f]//gr;    # drop non-ASCII
```

Each `s///r` returns the transformed string, which the next one receives.

## `/e` — evaluate the replacement

The replacement half of `s///e` is Perl code, not a double-quoted string. The return value of the code replaces the match:

```perl
my $x = "numbers: 1 2 3 4";
$x =~ s/(\d+)/$1 * 2/ge;
# $x is now "numbers: 2 4 6 8"
```

`/ee` evaluates twice: the code returns a string, which is then evaluated as Perl again. Rarely useful and easy to misuse — only reach for it when you are sure the string being evaluated is trusted. The [substitution](substitution.md) chapter has worked examples.

## `/n` — non-capturing

Makes every ordinary `(…)` behave like `(?:…)`. Useful when a long pattern has parentheses purely for grouping and you want to keep `$1`, `$2`, … unset:

```perl
"hello" =~ /(hi|hello)/n;     # matches, but $1 is not set
```

Named captures `(?<name>…)` still capture under `/n`. Added in Perl 5.22.

## `/p` — no-op

`/p` is a silent no-op. `${^PREMATCH}`, `${^MATCH}`, `${^POSTMATCH}` are always available. Accepted for backward compatibility, but contributes nothing in new code.

## `/o` — compile once

`/o` disables re-compilation of a pattern that interpolates variables.
Compiled patterns are also cached automatically, and `qr//` gives explicit control. `/o` is rarely the right tool; `qr//` is clearer and composes.

## Charset modifiers — `/a`, `/aa`, `/u`, `/l`, `/d`

These control how the shorthand classes (`\d`, `\w`, `\s`), case folding, and the POSIX class set behave under Unicode and locale considerations. Brief summary here; the full treatment is in the [unicode](unicode.md) chapter.

| Modifier   | Behaviour                                           |
|------------|-----------------------------------------------------|
| `/u`       | full Unicode semantics. Default under `use v5.12+`. |
| `/a`       | restrict `\d`, `\w`, `\s`, `[[:…:]]` to ASCII.      |
| `/aa`      | `/a` plus `/i` does not cross ASCII/non-ASCII.      |
| `/l`       | follow the current `use locale` POSIX locale.       |
| `/d`       | dual-mode semantics. Avoid in new code.             |

`/a`, `/u`, `/l`, `/d` are mutually exclusive — at most one can be in effect at a time. `/aa` is a refinement of `/a`. They are *set-once* modifiers: an inner `(?d:…)` cannot un-set an outer `/u`.

The everyday choice in modern code: `use v5.12` (or later) selects `/u` automatically, which is what you want for Unicode-correct text. Add `/a` only when the pattern must reject non-ASCII characters — typically because the input is known to be ASCII-only and you want to enforce that.

## Inline modifiers

`(?flags)` turns flags on for the rest of the enclosing group (or pattern):

```perl
/(?i)yes/;      # case-insensitive, same as /yes/i
```

`(?flags:…)` scopes the flags to the inner group only and is non-capturing:

```perl
/Answer: ((?i)yes)/;      # only the 'yes' is case-insensitive
/Answer: ((?i:yes))/;     # clearer: scope is the group's contents
```

`(?-flags)` turns flags off. They can be combined:

```perl
/(?i-m:pattern)/;     # turn on /i, turn off /m, within this group
```

Inline modifiers are the right tool when different parts of a long pattern need different modifiers. They are also useful in patterns built by interpolation, where the embedded `(?i)` travels with the pattern fragment.
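
As a minimal sketch of that last point, the example below interpolates a fragment that carries its own scoped `(?i:…)` into an otherwise case-sensitive pattern. The `$unit` fragment and the `size=` input format are hypothetical, invented for illustration and not taken from this chapter.

```perl
use strict;
use warnings;

# Hypothetical fragment: the inline flag is part of the string itself,
# so it applies wherever the fragment is interpolated.
my $unit = '(?i:kb|mb|gb)';

# The enclosing pattern has no /i; only the unit is case-insensitive.
my $re = qr/\Asize=(\d+)($unit)\z/;

my @results;
for my $input ('size=10MB', 'size=10mb', 'SIZE=10mb') {
    push @results, $input =~ $re ? "match" : "no match";
}

print join(", ", @results), "\n";
# match, match, no match
```

Because the flag lives inside the fragment, the enclosing `qr//` needs no `/i` of its own, and the literal `size=` prefix stays case-sensitive.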
### Caret form: `(?^…)`

`(?^flags)` is shorthand for «reset all flags, then apply these». The expansion is `d-imnsx` followed by the listed flags. So `(?^x:…)` means «default flags except `/x` is on, regardless of what flags were in scope outside».

The caret form is what Perl uses when stringifying compiled patterns; you may see it in error messages or `qr//` output. In hand-written code it is occasionally useful when interpolating a pattern fragment that should not inherit modifiers from its surroundings:

```perl
my $strict = qr/(?^:foo|bar)/;    # always default flags
# ... no matter how this is interpolated.
```

A negative flag is not legal after the caret (it would be redundant — the caret already cleared everything).

### Mutually-exclusive flags

`/a` and `/aa` override each other (last one wins); same for `/x` and `/xx`. They are not additive. `(?xx-x:…)` turns *all* `x` behaviour off, not «subtract one `x` from two».

The charset family `/a`, `/d`, `/l`, `/u` is mutually exclusive; specifying one un-specifies the others. They cannot be turned *off* with a leading `-`, only switched between. So `(?-d:…)` is a fatal error; `(?dl:…)` is also fatal (two charset flags together).

`/p` is special: its presence anywhere in a pattern has a global effect (and that effect is a no-op).

## Order of modifiers

On a match or substitution, the order of trailing modifiers is not significant. `/mgis` and `/sgmi` are identical. Pick a house style and stick to it.

## How a regex is read — the four phases

To predict how modifiers and inline forms behave, it helps to know that Perl reads a regex in *phases*:

1. **Phase A**: parser identifies the delimiter, finds the end of the pattern. `(?#…)` comments are removed here.
2. **Phase B**: pattern is parsed as a *double-quotish string*. Variables interpolate, escape sequences cook, `\Q…\E` translates to `quotemeta`-style escaping.
3. **Phase C**: under `/x` or `/xx`, unescaped whitespace and `#`-comments are stripped.
4. **Phase D**: the regex compiler reads the cooked, stripped pattern and turns it into the engine’s internal form.

The order matters because phases B and D operate on different representations:

- `\Q$dir\E` cooks at Phase B, before the regex compiler sees it. By Phase D the variable’s contents have already been metaquoted; the regex compiler sees a literal pattern.
- `\U…\E` is interpreted by Phase B as a string-cook directive (uppercase the contents), which is almost certainly *not* what you want inside a regex. Use `\Q` and `\E` for regex purposes; `\U` only when you want the pattern’s *literal text* to be uppercased.
- `(?#comment)` is removed before Phase B sees the pattern at all. A literal `#` inside `(?#…)` is fine. A literal `)` is not — the comment ends at the first `)`.

The `/x` modifier applies at Phase C, after interpolation and before compilation. That means whitespace introduced by an interpolated string variable *is* stripped by `/x`, just like whitespace typed directly in the pattern:

```perl
my $sub = " a b ";             # contains literal spaces
"xabx" =~ / x $sub x /x;       # matches: the spaces in $sub are stripped too
```

If you want the interpolated whitespace to match literally, protect it at the source: interpolate with `\Q$sub\E`, which backslash-escapes the spaces, or write them as `\ ` or `[ ]`.

## `use re 'strict'`

`use re 'strict'` turns a number of normally tolerated regex sloppiness conditions into compile-time errors. It is lexically scoped. In strict mode:

- A `{` in a non-quantifier position is an error, not a literal.
- `[a-]` (dash at end of class) is an error.
- A useless negation of an always-on flag is an error.
- A `(?-d)` (trying to turn off `/d`) is an error rather than a warning.
- Unmatched `[` in a string is an error.

Strict mode is recommended for new regex-heavy code, especially generated patterns where small typos go unnoticed in a normal build. It is not default because it would break working older patterns.
```perl
use re 'strict';
/abc{,1/;     # error in strict; warning otherwise
/[a-]/;       # error in strict
```

## See also

- The [unicode](unicode.md) chapter — `/a`, `/aa`, `/u`, `/l`, `/d` in detail, plus what they do to character classes and case folding.
- The [substitution](substitution.md) chapter — `/e`, `/r`, `/g`, and `/c` in their substitution-specific roles.
- The [basics](basics.md) chapter — the four-phase parsing model in its first appearance.
- The [anchors and assertions](anchors-and-assertions.md) chapter — `\A`, `\z`, `\Z` interactions with `/m`.
- The [performance](performance.md) chapter — `/o` discussion and why `qr//` superseded it.
- [`m`](../../p5/core/perlfunc/m.md), [`s`](../../p5/core/perlfunc/s.md), [`qr`](../../p5/core/perlfunc/qr.md) — operator references.