---
name: regular expressions guide
---

# Regular expressions

A regular expression (regexp, regex) is a pattern that decides whether a string has a given shape, or pulls pieces out of a string that does. Perl treats regexps as a first-class sublanguage: they appear wherever you match (`m//`), substitute (`s///`), quote a pattern (`qr//`), or split on a separator (`split`).

This guide teaches the language from the ground up. Each chapter covers one topic, starts with the common case, and moves on to the edge cases you hit once the common case stops working.

## Who this is for

Readers who know Perl well enough to use scalars, arrays, and hashes, but treat regexps as something to copy from elsewhere and hope works. After reading, you will read an unfamiliar pattern and know what it will match — and when it will refuse.

## How this guide is organised

The chapters are meant to be read in order on a first pass, but each stands on its own for later reference.

```{toctree}
:maxdepth: 1

basics
character-classes
anchors-and-assertions
quantifiers
groups-and-captures
alternation
modifiers
substitution
unicode
performance
```

- **Basics** — `m//`, `s///`, the binding operators `=~` and `!~`, what counts as a metacharacter, how to escape.
- **Character classes** — bracketed classes `[…]`, negated classes, shorthand `\d` `\w` `\s`, POSIX classes, Unicode properties.
- **Anchors and assertions** — `^`, `$`, `\b`, `\A`, `\z`, `\G`, and lookahead / lookbehind.
- **Quantifiers** — `*`, `+`, `?`, `{n,m}`, greedy vs. non-greedy vs. possessive.
- **Groups and captures** — `(...)`, `(?:...)`, named captures, backreferences, position arrays.
- **Alternation** — `|`, precedence, branch reset `(?|...)`.
- **Modifiers** — `/i`, `/m`, `/s`, `/x`, `/g`, `/c`, `/r`, `/n`, `/p`, `/a`, `/u`, `/l`, `/d`, and inline forms `(?i)`…`(?-i)`.
- **Substitution** — `s///` in depth: the replacement string, `/e`, chained `/r`.
- **Unicode** — `\p{…}`, `\P{…}`, `\X`, scripts, the charset modifiers.
- **Performance** — why some regexps blow up, atomic groups, possessive quantifiers, recursion and `(?{…})` assertions.

## A first round-trip

The shortest useful regexp program — find three-letter words repeated back to back, separated by a single whitespace character:

```perl
my $text = "I said the the other day";
if ($text =~ /\b(\w{3})\s\1\b/) {
    print "Repeated: $1\n";    # Repeated: the
}
```

- `\b` is a word boundary — the pattern only fires at word edges.
- `(\w{3})` captures exactly three word characters into `$1`.
- `\s` is one whitespace character between the two copies.
- `\1` is the backreference: match the same three characters again.

Every more complicated pattern in the tutorial is layered on that idea — anchor where you need to, describe what you want, capture what you want back.

## Conventions in the examples

- Example output appears as an inline `# …` comment beside the expression that produces it.
- Examples assume default modifiers unless they show `/i`, `/x`, etc.
- Where a pattern is shown on its own (no `=~`), treat it as a fragment the surrounding text will combine with a string.
- When a Unicode character is needed, examples use `\x{263a}` or `\N{GREEK SMALL LETTER SIGMA}` so the rendered text matches the source.

## Related reference pages

The tutorial teaches; the reference pages define.

- [`perlre`](../../p5/core/perlre) — the complete regexp syntax reference, including features skipped here.
- [`m`](../../p5/core/perlfunc/m) — the match operator.
- [`s`](../../p5/core/perlfunc/s) — substitution.
- [`qr`](../../p5/core/perlfunc/qr) — compile a pattern for reuse.
- [`split`](../../p5/core/perlfunc/split) — split a string on a regexp.
- [`pos`](../../p5/core/perlfunc/pos) — read or set the `/g` match position.
- [`quotemeta`](../../p5/core/perlfunc/quotemeta) — escape metacharacters in a string.
- [`tr`](../../p5/core/perlfunc/tr) — character-by-character translation, often confused with regexp substitution but unrelated in semantics.
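
The difference between `tr` and `s///` noted in the last entry is easiest to see side by side. The following is a minimal sketch, not taken from the reference pages: the sample string and variable names are illustrative, and the `/r` modifier on both operators assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $word = "banana";    # illustrative sample string

# tr/// takes a list of characters, not a pattern: every "a" becomes "A"
# and every "n" becomes "N", each character handled independently.
my $translated = $word =~ tr/an/AN/r;

# s///g takes a regexp: only the two-character sequence "an" is replaced,
# so the final "a" (not followed by "n") is left alone.
my $substituted = $word =~ s/an/AN/gr;

# With an empty replacement list and no /d, tr/// leaves the string
# unchanged and returns how many characters it matched.
my $count = ($word =~ tr/a//);

print "$translated\n";     # bANANA
print "$substituted\n";    # bANANa
print "$count\n";          # 3
```

Because `tr` never compiles a pattern, metacharacters such as `.` or `*` in its lists are ordinary characters; only `-` (ranges) and `\` keep a special meaning there.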