---
name: regex quantifiers
---
# Quantifiers
A quantifier says "this previous thing, N times." It attaches to a
single character, a character class, or a group, and turns a
single-character match into a match of many.
```perl
/a*/; # zero or more 'a'
/\d+/; # one or more digits
/colou?r/; # 'color' or 'colour'
```
## The four basic quantifiers
| Quantifier | Meaning |
|------------|------------------------------------------|
| `a?` | 0 or 1 times |
| `a*` | 0 or more times |
| `a+` | 1 or more times |
| `a{n}` | exactly n times |
| `a{n,m}` | at least n, at most m |
| `a{n,}` | at least n |
| `a{,m}` | at most m |
Whitespace inside `{…}` is tolerated but not required. `a{2, 4}` and
`a{2,4}` are identical.
```perl
/[a-z]+\s+\d*/; # word, spaces, any number of digits
/y(es)?/i; # 'y', 'Y', 'yes', or 'YES'
$year =~ /^\d{2,4}$/; # 2, 3, or 4 digits
$year =~ /^\d{4}$|^\d{2}$/;# better: exactly 2 or exactly 4
```
The quantifier applies to the atom immediately before it:
- `ab?` — `a` then optional `b`.
- `(ab)?` — optional `ab`.
- `[ab]?` — optional `a` or `b`.
Group or class, then quantify.
## Greedy by default
Basic quantifiers are *greedy*: they grab as much as the string
allows and only give back if the rest of the pattern cannot match
otherwise. Consider:
```perl
my $x = "the cat in the hat";
$x =~ /^(.*)(cat)(.*)$/;
# $1 = 'the '
# $2 = 'cat'
# $3 = ' in the hat'
```
Here `.*` in the middle works as you'd expect — it stops at `cat`
because `cat` is the only place where the rest of the pattern can
match. Now compare:
```perl
$x =~ /^(.*)(at)(.*)$/;
# $1 = 'the cat in the h'
# $2 = 'at'
# $3 = ''
```
There are *two* places where `at` occurs — at the end of `cat` and
at the end of `hat`. The first `.*` grabs as much as possible while
still leaving room for `at` somewhere, so it grabs up to the last
`at`. The leftmost quantifier wins the contest.
## Principles of the match
With greedy quantifiers, the engine follows four principles in
order:
1. **Earliest position wins.** The whole pattern is tried starting
at position 0, then 1, then 2, until a match is found. The
earliest starting position always wins.
2. **Leftmost alternative wins** (in `a|b|c`).
3. **Greedy quantifiers grab as much as possible** while still
letting the rest of the pattern match.
4. **Leftmost greedy quantifier gets priority.** If two `.*` compete
for the same characters, the first one takes them.
Examples:
```perl
my $x = "The programming republic of Perl";
$x =~ /^(.+)(e|r)(.*)$/;
# $1 = 'The programming republic of Pe'
# $2 = 'r'
# $3 = 'l'
# .+ is leftmost greedy; grabs everything it can while leaving
# room for (e|r) somewhere.
$x =~ /(m{1,2})(.*)$/;
# $1 = 'mm' -- m{1,2} matches at first 'm' in 'programming',
# $2 = 'ing republic of Perl'
# takes the maximum 2.
$x =~ /.*(m{1,2})(.*)$/;
# $1 = 'm'
# $2 = 'ing republic of Perl'
# .* grabs all the way to the last 'm', leaving only one 'm' for
# m{1,2}.
```
## Non-greedy quantifiers
Appending `?` to any quantifier flips it from *grab as much as
possible* to *grab as little as possible*.
| Non-greedy | Meaning |
|------------|----------------------------------------------|
| `a??` | 0 or 1, prefer 0 |
| `a*?` | 0 or more, prefer fewer |
| `a+?` | 1 or more, prefer fewer |
| `a{n,m}?` | n to m, prefer n |
The overall match still has to succeed, so the engine will
*expand* the non-greedy quantifier one step at a time until it
does.
```perl
my $x = "The programming republic of Perl";
$x =~ /^(.+?)(e|r)(.*)$/;
# $1 = 'Th'
# $2 = 'e'
# $3 = ' programming republic of Perl'
# .+? grabs as little as possible while allowing the match.
```
Non-greedy is usually what you want when scanning between
delimiters:
```perl
my $html = 'bold and italic';
while ($html =~ /<(\w+)>(.*?)<\/\1>/g) {
print "$1: $2\n";
}
# b: bold
# i: italic
```
With greedy `.*` the match would swallow the first `` and stop
at the final ``, mangling the capture.
## Possessive quantifiers
Adding `+` after a quantifier (not `?`) makes it *possessive*: it is
greedy, but it refuses to give anything back on backtracking. Once
it matches, those characters are out of play for the rest of the
pattern.
| Possessive | Meaning |
|------------|----------------------------------------------|
| `a?+` | 0 or 1, do not give back |
| `a*+` | 0 or more, do not give back |
| `a++` | 1 or more, do not give back |
| `a{n,m}+` | n to m, do not give back |
The payoff is speed — on patterns that *should* fail, possessive
quantifiers fail *immediately* instead of exhaustively backtracking.
```perl
# Ordinary greedy: backtracks once per character on 'abc ' when
# the second \w+ cannot match.
/^\w+\s+\w+$/;
# Possessive: once \w+ claims the word characters, it keeps them.
# No backtracking, no quadratic blowup on pathological input.
/^\w++\s+\w+$/;
```
A common idiom for matching quoted strings without backtracking
catastrophe:
```perl
/"(?:[^"\\]++|\\.)*+"/;
```
Each iteration of the inner group either gobbles an unlimited run
of non-quote, non-backslash characters (possessively), or a
single escape. Neither alternative is willing to give back, so
failure is fast.
Possessive quantifiers are covered further in the performance
chapter. They are not always equivalent to their greedy cousins —
if the greedy form *does* need to give back, the possessive form
will fail the match. Use them when you can prove nothing later in
the pattern depends on those characters.
## Zero-width quantification
A quantifier on a zero-width assertion is almost always a mistake.
`\b*` is syntactically legal but matches the empty string at any
word boundary, which is not useful. The engine rejects some of these
explicitly; for the rest, results are usually not what the author
meant.
## Choosing the right quantifier
- **Greedy** — the default, and right most of the time.
- **Non-greedy** — when scanning between a start and end marker that
can both appear legitimately multiple times.
- **Possessive** — when the match either works in exactly one way or
fails, and speed on failure matters.
When in doubt, write it greedy and add a test that fails if you
got it wrong.
## See also
- [`perlre`](../../p5/core/perlre) — complete syntax, including
the `{n,m}+` corner cases.
- The [performance](performance) chapter — nested quantifiers and
catastrophic backtracking.