---
name: regex character classes
---
# Character classes

A character class matches exactly one character, chosen from a set you
define. Where a literal `a` matches only `a`, the class `[abc]`
matches any one of `a`, `b`, or `c`.

```perl
/cat/;               # matches 'cat'
/[bcr]at/;           # matches 'bat', 'cat', or 'rat'
/item[0123456789]/;  # matches 'item0' through 'item9'
```

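The class always consumes exactly one character of the string. On its
own, `[bcr]at` cannot match `at` (no letter for the class to consume),
and it cannot match the whole of `brat` (two letters where the class
expects one). An unanchored match against `brat` still succeeds,
though, because the pattern finds `rat` inside it.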
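
The one-character rule can be checked with a small sketch. The sample words are made up for illustration, and the second pattern uses the `^` and `$` anchors (not otherwise covered in this section) to force the pattern to span the whole string:

```perl
use strict;
use warnings;

# Sample inputs chosen for illustration only.
my @words = ('cat', 'at', 'brat');

# Unanchored: the pattern may match anywhere inside the string.
my @unanchored = grep { /[bcr]at/ } @words;

# Anchored: the whole string must be one class character plus 'at'.
my @anchored = grep { /^[bcr]at$/ } @words;

print "unanchored: @unanchored\n";   # unanchored: cat brat
print "anchored:   @anchored\n";     # anchored:   cat
```

`brat` passes the unanchored test only because `rat` occurs inside it; the class itself never consumes more than one character.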

## Ranges

Inside `[…]`, a dash between two characters denotes a contiguous
range in the underlying character set:

```perl
/[0-9]/;         # any ASCII digit
/[a-z]/;         # any ASCII lowercase letter
/[a-zA-Z]/;      # any ASCII letter
/[0-9a-fA-F]/;   # any hex digit
```

Ranges can be combined with individual characters:

```perl
/[0-9bx-z]aa/;   # matches '0aa'..'9aa', 'baa', 'xaa', 'yaa', 'zaa'
```

A dash that is first or last inside the class is literal:

```perl
/[-ab]/;         # matches '-', 'a', or 'b'
/[ab-]/;         # same
```
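
As a rough illustration of ranges and the literal dash, the sketch below filters a handful of hypothetical one-character tokens through three of the classes shown above. The anchors `^` and `$` are added only so that each token must consist of a single class character:

```perl
use strict;
use warnings;

# Hypothetical sample tokens, one character each.
my @tokens = ('7', 'f', 'G', '-', 'y');

my @hex   = grep { /^[0-9a-fA-F]$/ } @tokens;   # hex digits
my @mixed = grep { /^[0-9bx-z]$/ }   @tokens;   # range plus single chars
my @dash  = grep { /^[-ab]$/ }       @tokens;   # leading dash is literal

print "hex:   @hex\n";     # hex:   7 f
print "mixed: @mixed\n";   # mixed: 7 y
print "dash:  @dash\n";    # dash:  -
```

Note that `-` is rejected by `[0-9bx-z]`: there the dashes sit between two characters and so denote ranges, not a literal.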

## Negation

A caret `^` as the first character inside `[…]` inverts the class:

```perl
/[^a]/;          # any character except 'a'
/[^0-9]/;        # any non-digit
```

A caret elsewhere is literal:

```perl
/[a^]/;          # matches 'a' or '^'
```

A negated class still matches *one* character — `[^a]` does not match
the empty string; it requires one non-`a`.
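
A minimal sketch of that point, using made-up test strings; each result is stored as 1 (match) or 0 (no match):

```perl
use strict;
use warnings;

my $empty  = (''    =~ /[^a]/) ? 1 : 0;   # nothing to consume
my $only_a = ('aaa' =~ /[^a]/) ? 1 : 0;   # every character is 'a'
my $mixed  = ('aab' =~ /[^a]/) ? 1 : 0;   # 'b' satisfies the class
my $caret  = ('^'   =~ /[a^]/) ? 1 : 0;   # caret not first: literal

print "$empty $only_a $mixed $caret\n";   # 0 0 1 1
```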

## Special characters inside a class

Inside `[…]` the special set shrinks to `- ] \ ^ $` (and the pattern
delimiter). The others — `.`, `*`, `+`, `?`, `(`, `)`, `{`, `}`, `|`
— are literals in a class:

```perl
/[.+*]/;         # matches a literal '.', '+', or '*'
/[()]/;          # matches '(' or ')'
```

To match `]` inside the class, either escape it or put it first
(after any leading `^`):

```perl
/[\]]/;          # matches ']'
/[]ab]/;         # matches ']', 'a', or 'b'
```
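
The following sketch applies these rules to a made-up string. The variable names are hypothetical, and both brackets are escaped in the substitution simply to keep the class easy to read:

```perl
use strict;
use warnings;

# Illustrative sample text.
my $text = 'f(x) = a.b + c*d [1]';

# Inside a class, '.', '+', '*', '(' and ')' are plain literals.
my $meta_count = () = $text =~ /[.+*()]/g;

# Strip square brackets; each one is escaped inside the class.
(my $plain = $text) =~ s/[\[\]]//g;

print "$meta_count\n";   # 5
print "$plain\n";        # f(x) = a.b + c*d 1
```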

`$` and `\` are slightly awkward because they interact with
interpolation and escaping:

```perl
my $x = 'bcr';
/[$x]at/;        # matches 'bat', 'cat', or 'rat' — interpolated
/[\$x]at/;       # matches '$at' or 'xat' — '$' is literal
/[\\$x]at/;      # matches '\at', 'bat', 'cat', or 'rat': literal '\', then $x interpolated
```
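
When the interpolated variable may itself contain characters that are special inside a class, one possible approach is to wrap it in `\Q…\E`, which backslash-escapes every non-word character. This is a sketch rather than a recommendation from this chapter; `\Q…\E` is not covered elsewhere in this section, and `$chars` is a hypothetical variable:

```perl
use strict;
use warnings;

# Hypothetical set of characters to match literally; all three are
# special inside a class when left unescaped.
my $chars = '^-]';

# \Q...\E turns the interpolated text into \^\-\] before the class is parsed.
my @hits = grep { /^[\Q$chars\E]$/ } ('^', '-', ']', 'a', '\\');

print "@hits\n";   # ^ - ]
```

Without `\Q…\E`, the same variable would produce the class `[^-]]`, which means something quite different.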

## Shorthand classes

Several common classes have shorthand names usable both inside and
outside `[…]`:

| Shorthand | Matches                                          |
|-----------|--------------------------------------------------|
| `\d`      | a digit                                          |
| `\D`      | a non-digit                                      |
| `\w`      | a word character (alphanumeric or `_`)           |
| `\W`      | a non-word character                             |
| `\s`      | whitespace (space, tab, `\r`, `\n`, `\f`, more)  |
| `\S`      | non-whitespace                                   |
| `\h`      | horizontal whitespace (space, tab, Unicode)      |
| `\H`      | non-horizontal-whitespace                        |
| `\v`      | vertical whitespace (`\n`, `\r`, `\f`, `\x0B`…)  |
| `\V`      | non-vertical-whitespace                          |

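Unless the `/a` modifier is in effect, `\d`, `\w`, and `\s` match more
than just ASCII. `\d` matches any Unicode decimal digit (Devanagari
digits, Arabic-Indic digits, and many more), `\w` matches any letter
in any script plus marks and connector punctuation, and `\s` adds
Unicode space characters such as non-breaking space. For code points
between 0x80 and 0xFF (non-breaking space is one of them), the result
depends on Unicode rules being in effect, for example through the
`/u` modifier or `use feature 'unicode_strings'`.

To restrict these to ASCII, add the `/a` modifier (available since
Perl 5.14) or use explicit ranges like `[0-9]` and `[A-Za-z_0-9]`.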

```perl
"item0" =~ /\w\w\w\w\d/;       # matches
"abc\x{0660}" =~ /\w\w\w\d/;   # matches: U+0660 is an Arabic-Indic zero
"abc\x{0660}" =~ /\w\w\w\d/a;  # does not match under /a
```
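
To see the difference in numbers, the sketch below counts digits in a two-character sample string three ways. The string is illustrative only, and the `/a` modifier needs Perl 5.14 or later:

```perl
use strict;
use warnings;

# '7' followed by U+0663 ARABIC-INDIC DIGIT THREE (illustrative sample).
my $s = "7\x{0663}";

my $unicode_digits = () = $s =~ /\d/g;      # both characters count
my $ascii_digits   = () = $s =~ /\d/ag;     # /a: only '7' counts
my $range_digits   = () = $s =~ /[0-9]/g;   # explicit range: only '7'

print "$unicode_digits $ascii_digits $range_digits\n";   # 2 1 1
```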

## The period

`.` matches any single character except newline. Under the `/s`
modifier (covered in the modifiers chapter), `.` also matches
newline:

```perl
"a\nb" =~ /a.b/;        # does not match
"a\nb" =~ /a.b/s;       # matches
```

`\N` is no help here: it matches any character except newline
regardless of `/s`, which is the opposite of what you want. To match
any character including newline without `/s`, use `[\s\S]` or
`[\d\D]`, the classic "match anything" idiom:

```perl
"a\nb" =~ /a[\s\S]b/;   # matches without /s
```
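
The four variants can be compared side by side. This is a small sketch on a made-up string; each result is 1 for a match and 0 otherwise:

```perl
use strict;
use warnings;

my $s = "a\nb";

my $dot      = ($s =~ /a.b/)      ? 1 : 0;   # '.' refuses the newline
my $dot_s    = ($s =~ /a.b/s)     ? 1 : 0;   # /s lets '.' match it
my $big_n    = ($s =~ /a\Nb/s)    ? 1 : 0;   # \N ignores /s
my $anything = ($s =~ /a[\s\S]b/) ? 1 : 0;   # class matches any character

print "$dot $dot_s $big_n $anything\n";   # 0 1 0 1
```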

## Composing classes

You can mix shorthands, ranges, and individual characters inside one
class:

```perl
/[\d\s]/;        # a digit or whitespace
/[A-Z\d_]/;      # uppercase letter, digit, or underscore
/[a-zA-Z\d]/;    # ASCII letter or any digit (\d is ASCII-only under /a)
```

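De Morgan's law matters: `[^\d\w]` is *not* `[\D\W]`. The first
requires the character to be *both* non-digit and non-word. But every
digit is a word character, so `[^\d\w]` simplifies to `[^\w]`, i.e.
`\W`. The second accepts a character that is *either* non-digit or
non-word; every non-word character is also a non-digit, so `[\D\W]`
simplifies to `\D`. Be careful when combining negated shorthands.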
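
The difference is easy to check empirically. In this sketch the five sample characters are chosen for illustration, and the results are joined into strings so the sets can be compared directly:

```perl
use strict;
use warnings;

# One digit, one letter, underscore, dash, space.
my @chars = ('5', 'a', '_', '-', ' ');

my $neg_union = join '', grep { /[^\d\w]/ } @chars;   # neither digit nor word
my $union_neg = join '', grep { /[\D\W]/ }  @chars;   # non-digit or non-word
my $non_word  = join '', grep { /\W/ }      @chars;
my $non_digit = join '', grep { /\D/ }      @chars;

print "[^\\d\\w] behaves like \\W\n" if $neg_union eq $non_word;
print "[\\D\\W] behaves like \\D\n"  if $union_neg eq $non_digit;
```

On this sample both messages are printed: `[^\d\w]` keeps only the dash and the space, while `[\D\W]` keeps everything except `5`.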

## POSIX classes

POSIX character classes use the form `[:name:]` and only work inside
`[…]`:

| POSIX         | Matches                        |
|---------------|--------------------------------|
| `[:alpha:]`   | alphabetic                     |
| `[:alnum:]`   | alphanumeric                   |
| `[:digit:]`   | digit (like `\d`)              |
| `[:word:]`    | word char (Perl extension)     |
| `[:space:]`   | whitespace (like `\s`)         |
| `[:upper:]`   | uppercase                      |
| `[:lower:]`   | lowercase                      |
| `[:xdigit:]`  | hex digit                      |
| `[:ascii:]`   | 0x00–0x7F                      |
| `[:cntrl:]`   | control character              |
| `[:graph:]`   | printable, not space           |
| `[:print:]`   | printable, including space     |
| `[:punct:]`   | punctuation                    |
| `[:blank:]`   | space or tab                   |

Negate a POSIX class with `^` *inside* the colons:

```perl
/[[:^digit:]]/;   # same as \D
/[[:alpha:][:digit:]]/;  # letter or digit; under /a this is \w minus '_'
```

POSIX classes follow the same Unicode-vs-ASCII rules as the
shorthands: without `/a`, `[:alpha:]` is the Unicode alphabetic set.
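
A short sketch of POSIX classes in use, counting character kinds in a hypothetical log-style string (ASCII only, so the Unicode-vs-ASCII question does not arise here):

```perl
use strict;
use warnings;

# Illustrative sample: letters, digits, a colon, a tab, and '!'.
my $line = "id42:\tOK!";

my $punct     = () = $line =~ /[[:punct:]]/g;            # ':' and '!'
my $blank     = () = $line =~ /[[:blank:]]/g;            # the tab
my $alnum     = () = $line =~ /[[:alpha:][:digit:]]/g;   # i d 4 2 O K
my $non_digit = () = $line =~ /[[:^digit:]]/g;           # all but 4 and 2

print "$punct $blank $alnum $non_digit\n";   # 2 1 6 7
```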

## Unicode properties

Unicode defines thousands of properties. The notation is
`\p{Name}` for "has this property" and `\P{Name}` for "does not
have this property".

```perl
/\p{Lu}/;              # any uppercase letter, any script
/\p{Greek}/;           # any character in the Greek script
/\p{Number}/;          # any numeric character
/\P{ASCII}/;           # any non-ASCII character
```

Short single-letter aliases exist for common properties and drop the
braces: `\pL` is a letter, `\pN` a number, `\pP` punctuation, and so
on. `\p{L}` is the same as `\pL`.

The Unicode chapter covers properties in detail, including the
compound form `\p{Name=Value}` and the `\X` grapheme cluster.
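
As a rough sketch, the properties above can be counted in a short mixed-script string. The sample is illustrative; the non-ASCII characters are written as `\x{…}` escapes so the source file needs no particular encoding:

```perl
use strict;
use warnings;

# 'A', 'b', GREEK CAPITAL LETTER DELTA, GREEK SMALL LETTER ALPHA, '3'.
my $s = "Ab\x{0394}\x{03B1}3";

my $upper     = () = $s =~ /\p{Lu}/g;       # 'A' and the capital delta
my $greek     = () = $s =~ /\p{Greek}/g;    # the two Greek letters
my $letters   = () = $s =~ /\pL/g;          # everything except '3'
my $numbers   = () = $s =~ /\p{Number}/g;   # just '3'
my $non_ascii = () = $s =~ /\P{ASCII}/g;    # the two Greek letters

print "$upper $greek $letters $numbers $non_ascii\n";   # 2 2 4 1 2
```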

## A useful habit

Named and shorthand classes are almost always clearer than explicit
ranges. `\d{4}-\d{2}-\d{2}` reads; `[0-9]{4}-[0-9]{2}-[0-9]{2}`
needs a moment. Use the ranges only when you have a concrete reason
— usually performance in a hot loop, or deliberately restricting to
ASCII.

## See also

- [`perlre`](../../p5/core/perlre) — full class syntax, including
  extended bracketed character classes, `(?[ … ])`, which support set
  operations such as `&`, `+`, and `-`.
- [`m`](../../p5/core/perlfunc/m) — the match operator.
- [`qr`](../../p5/core/perlfunc/qr) — compile a pattern for reuse.