---
name: regular expressions guide
---
# Regular expressions

A regular expression (regexp, regex) is a pattern that decides whether
a string has a given shape, or pulls pieces out of a string that does.
Perl treats regexps as a first-class sublanguage: they appear wherever
you match (`m//`), substitute (`s///`), quote a pattern (`qr//`), or
split on a separator (`split`).
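
As a rough illustration of those four contexts, the sketch below runs each of them against one string. The sample string and variable names are made up for this example, and the `/r` flag assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $line = "alpha, beta,gamma";

# m// tests whether the string has a given shape.
my $has_beta = $line =~ m/beta/ ? 1 : 0;

# s/// rewrites the part that matched; with /r it returns the
# result and leaves $line unchanged.
my $shouted = $line =~ s/beta/BETA/r;

# qr// compiles a pattern once so it can be reused.
my $separator = qr/,\s*/;

# split uses the pattern as the separator between fields.
my @fields = split $separator, $line;

print "$has_beta\n";     # 1
print "$shouted\n";      # alpha, BETA,gamma
print "@fields\n";       # alpha beta gamma
```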

This guide teaches the language from the ground up. Each chapter
covers one topic, starts with the common case, and moves on to the
edge cases you hit once the common case stops working.

## Who this is for

This guide is for readers who know Perl well enough to use scalars,
arrays, and hashes, but who treat regexps as something to copy from
elsewhere and hope it works. After working through it, you should be
able to read an unfamiliar pattern and know what it will match, and
what it will refuse to match.

## How this guide is organised

The chapters are meant to be read in order on a first pass, but each
stands on its own for later reference.

```{toctree}
:maxdepth: 1

basics
character-classes
anchors-and-assertions
quantifiers
groups-and-captures
alternation
modifiers
substitution
unicode
performance
```

- **Basics** — `m//`, `s///`, the binding operators `=~` and `!~`,
  what counts as a metacharacter, how to escape.
- **Character classes** — bracketed classes `[…]`, negated classes,
  shorthand `\d` `\w` `\s`, POSIX classes, Unicode properties.
- **Anchors and assertions** — `^`, `$`, `\b`, `\A`, `\z`, `\G`, and
  lookahead / lookbehind.
- **Quantifiers** — `*`, `+`, `?`, `{n,m}`, greedy vs. non-greedy vs.
  possessive.
- **Groups and captures** — `(...)`, `(?:...)`, named captures,
  backreferences, the position arrays `@-` and `@+`.
- **Alternation** — `|`, precedence, branch reset `(?|...)`.
- **Modifiers** — `/i`, `/m`, `/s`, `/x`, `/g`, `/c`, `/r`, `/n`,
  `/p`, `/a`, `/u`, `/l`, `/d`, and inline forms `(?i)`…`(?-i)`.
- **Substitution** — `s///` in depth: the replacement string, `/e`,
  chained `/r`.
- **Unicode** — `\p{…}`, `\P{…}`, `\X`, scripts, the charset modifiers.
- **Performance** — why some regexps blow up, atomic groups, possessive
  quantifiers, recursion and `(?{…})` assertions.

## A first round-trip

A short but useful regexp program finds a three-letter word repeated
back to back, with a single whitespace character between the two copies:

```perl
my $text = "I said the the other day";
if ($text =~ /\b(\w{3})\s\1\b/) {
    print "Repeated: $1\n";        # Repeated: the
}
```

- `\b` is a word boundary — the pattern only fires at word edges.
- `(\w{3})` captures exactly three word characters into `$1`.
- `\s` is one whitespace character between the two copies.
- `\1` is the backreference: match the same three characters again.

Every more complicated pattern in this guide is layered on that
idea: anchor where you need to, describe what you want, and capture
what you want back.
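
To see what the anchors contribute, the sketch below compares the pattern above with a copy that has both `\b` removed. The sample strings and the `%result` hash are invented for this illustration.

```perl
use strict;
use warnings;

# The pattern from the example above, and a copy without the two \b.
my $anchored = qr/\b(\w{3})\s\1\b/;
my $loose    = qr/(\w{3})\s\1/;

my @samples = (
    "I said the the other day",   # a genuine repeated word
    "breathe the air",            # "the" also ends "breathe"
    "the then",                   # "the" also starts "then"
);

my %result;
for my $text (@samples) {
    $result{$text} = {
        anchored => ($text =~ $anchored ? 1 : 0),
        loose    => ($text =~ $loose    ? 1 : 0),
    };
    printf "%-28s anchored=%d loose=%d\n",
        qq{"$text"}, $result{$text}{anchored}, $result{$text}{loose};
}
```

Only the first sample satisfies the anchored pattern. The unanchored copy accepts all three, because it is free to find `the the` inside `breathe the` and `the then`.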

## Conventions in the examples

- Example output appears as an inline `# …` comment beside the
  expression that produces it.
- Examples assume default modifiers unless they show `/i`, `/x`, etc.
- Where a pattern is shown on its own (no `=~`), treat it as a
  fragment the surrounding text will combine with a string.
- When a Unicode character is needed, examples write it as an escape
  such as `\x{263a}` or `\N{GREEK SMALL LETTER SIGMA}` rather than as a
  literal character, so the example reads the same however the page is
  rendered.
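
As a small sketch of the last convention, the two escape forms can be used in strings and in patterns alike. The variable names are illustrative, and the explicit `use charnames` line is an assumption that only matters on Perl versions older than 5.16, where `\N{...}` does not load the module automatically.

```perl
use strict;
use warnings;
use charnames ':full';   # required for \N{...} before Perl 5.16

my $smiley = "\x{263a}";                        # code point written in hex
my $sigma  = "\N{GREEK SMALL LETTER SIGMA}";    # code point written by name

# Both forms denote a single character.
my $sigma_code  = sprintf "U+%04X", ord $sigma;
my $smiley_name = charnames::viacode(0x263A);

# The same escapes work inside a pattern.
my $found = "hello $smiley" =~ /\x{263a}/ ? 1 : 0;

print "$sigma_code\n";    # U+03C3
print "$smiley_name\n";   # WHITE SMILING FACE
print "$found\n";         # 1
```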

## Related reference pages

This guide teaches; the reference pages define.

- [`perlre`](../../p5/core/perlre) — the complete regexp syntax
  reference, including features skipped here.
- [`m`](../../p5/core/perlfunc/m) — the match operator.
- [`s`](../../p5/core/perlfunc/s) — substitution.
- [`qr`](../../p5/core/perlfunc/qr) — compile a pattern for reuse.
- [`split`](../../p5/core/perlfunc/split) — split a string on a regexp.
- [`pos`](../../p5/core/perlfunc/pos) — read or set the `/g` match
  position.
- [`quotemeta`](../../p5/core/perlfunc/quotemeta) — escape
  metacharacters in a string.
- [`tr`](../../p5/core/perlfunc/tr) — character-by-character
  translation, often confused with `s///` even though it does not use
  regexps at all.
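
The difference between `tr`, `s`, and a `quotemeta`-escaped pattern can be sketched on one string. The sample text and variable names below are made up for this illustration.

```perl
use strict;
use warnings;

my $text = "a-c and a.c";

# tr/// maps characters one-to-one; "a-c" here is a character range,
# not a regexp.
(my $translated = $text) =~ tr/a-c/A-C/;

# s/// treats its left side as a regexp: "." matches any character
# except newline, so both "a-c" and "a.c" are replaced.
(my $substituted = $text) =~ s/a.c/X/g;

# quotemeta escapes the ".", so only the literal "a.c" is replaced.
my $literal = quotemeta "a.c";
(my $quoted = $text) =~ s/$literal/X/g;

print "$translated\n";    # A-C And A.C
print "$substituted\n";   # X and X
print "$quoted\n";        # a-c and X
```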