Basics

The smallest useful regexp is a plain string. "Hello World" =~ /World/ asks: does the string on the left contain the pattern on the right? It does, so the expression is true.

if ("Hello World" =~ /World/) {
    print "matched\n";
}

The slashes // enclose the pattern. The =~ operator binds the pattern to the string you want to test. Without a binding operator, Perl applies the pattern to $_ instead.
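A match is an ordinary expression, so its result can be stored as well as tested in an if. The following is a minimal sketch; the variable names $text, $found and $missing are illustrative and not part of the text above.

```perl
use strict;
use warnings;

my $text = "Hello World";

# A match yields a true value on success and a false value on failure.
# The ternary turns that into a plain 1 or 0 for printing.
my $found   = ($text =~ /World/) ? 1 : 0;
my $missing = ($text =~ /Mars/)  ? 1 : 0;

print "World: $found, Mars: $missing\n";
```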

The match operator

The long form is m//:

"Hello World" =~ m/World/;
"Hello World" =~ m!World!;    # alternate delimiters
"Hello World" =~ m{World};    # paired delimiters

With m you can choose almost any non-whitespace character as the delimiter. That matters when the pattern itself contains the default delimiter /. Compare:

"/usr/bin/perl" =~ /\/usr\/bin\/perl/;  # "leaning toothpick syndrome"
"/usr/bin/perl" =~ m!/usr/bin/perl!;    # clearer

Paired delimiters ({}, (), [], <>) nest, so a pattern can contain balanced pairs of the delimiter characters without escaping them. An unbalanced delimiter character inside the pattern still needs a backslash.

The m may be omitted only when the delimiter is a slash: /pat/ is shorthand for m/pat/. With any other delimiter the m is required: m{pat}, not {pat}.
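The delimiter rules can be checked side by side. This sketch matches the same path with three delimiter choices and then shows braces nesting inside m{...}; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $path = "/usr/bin/perl";

# Same pattern, three delimiter choices; all three match.
my $slash  = $path =~ /\/usr\/bin/ ? 1 : 0;
my $bang   = $path =~ m!/usr/bin!  ? 1 : 0;
my $braces = $path =~ m{/usr/bin}  ? 1 : 0;

# Balanced braces inside m{...} need no escaping:
# the inner {3} is read as a quantifier, not as the end of the pattern.
my $nested = "abc123" =~ m{\d{3}} ? 1 : 0;

print "$slash $bang $braces $nested\n";
```

Only the first form needs backslashes, because only there does the pattern contain its own delimiter.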

Binding: =~ and !~

=~ asks “does it match?”. !~ asks “does it fail to match?”.

$s = "Hello World";

print "yes\n" if $s =~ /World/;   # yes
print "no\n"  if $s !~ /planet/;  # no

!~ is not a separate regexp construct — it is the negated binding. It is equivalent to not ($s =~ /pat/).
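To see that the two spellings agree, the sketch below evaluates both on a string that contains the pattern and on one that does not. The sample strings and variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my @agree;
for my $s ("Hello World", "Hello planet") {
    # The negated binding, and the explicit negation of a positive match.
    my $via_binding = ($s !~ /planet/)  ? 1 : 0;
    my $via_not     = !($s =~ /planet/) ? 1 : 0;

    # Record whether the two forms gave the same answer for this string.
    push @agree, ($via_binding == $via_not ? 1 : 0);
    print "$s: !~ gives $via_binding, negated =~ gives $via_not\n";
}
```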

Matching against $_

If you omit the binding, the match is against $_:

for ("cat", "dog", "bird") {
    print "has an 'o'\n" if /o/;   # implicit: $_ =~ /o/
}

This is idiomatic in while (<>) loops, inside grep and map, and inside for loops that set $_.
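As one illustration of the grep case, the sketch below filters a list with a bare pattern. The list contents and variable names are made up for the example.

```perl
use strict;
use warnings;

my @animals = ("cat", "dog", "bird", "mole");

# grep sets $_ to each element in turn, so a bare /o/ tests that element.
my @with_o = grep { /o/ } @animals;

# To negate an implicit match, put ! in front of the pattern.
my @without_o = grep { !/o/ } @animals;

print "with o: @with_o\n";
print "without o: @without_o\n";
```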

Case sensitivity and the default anchor

Matches are case-sensitive and unanchored:

"Hello" =~ /hello/;    # does not match — case differs
"Hello" =~ /ell/;      # matches — inside the string is fine

To match case-insensitively, append /i. To constrain the match to the start or end of the string, use anchors. Both are covered in their own chapters.

When a pattern could match at several positions, Perl tries from the left and takes the first one that works:

"That hat is red" =~ /hat/;   # matches 'hat' in 'That', not in 'hat'
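One way to confirm which occurrence matched is to look at the match offsets. This sketch relies on Perl's built-in @- and @+ arrays, which the text above does not introduce: after a successful match, $-[0] is the offset where the match starts and $+[0] is the offset just past its end.

```perl
use strict;
use warnings;

my $text = "That hat is red";
my ($start, $end);

if ($text =~ /hat/) {
    # Offsets are counted from 0, so the 'h' of "That" is at offset 1.
    ($start, $end) = ($-[0], $+[0]);
    print "matched at offsets $start..$end\n";
}
```

The standalone word "hat" begins at offset 5, so a start offset of 1 shows that the earlier occurrence inside "That" won.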

Metacharacters

Most characters in a pattern match themselves. These do not:

{ } [ ] ( ) ^ $ . | * + ? - # \

Each has a special meaning covered later. To match a literal copy of one, put a backslash in front:

"2+2=4" =~ /2+2/;    # fails — '+' is a quantifier, needs escaping
"2+2=4" =~ /2\+2/;   # matches

"end." =~ /end\./;   # matches a literal dot
"end." =~ /end./;    # also matches — but . matches any character,
                     # so this would also match "endx", "end ", etc.

The backslash itself is a metacharacter, so a literal backslash in a pattern needs \\:

'C:\WIN32' =~ /C:\\WIN/;    # matches

A metacharacter that has nothing special to do in its context reverts to matching itself. For example, } only closes a {…} quantifier; outside that context it is a literal }. This is convenient but easy to misread; the experimental use re 'strict' pragma catches many such cases.

Escape sequences

Non-printing characters use the same escapes as in double-quoted strings:

Sequence   Matches
\t         tab
\n         newline
\r         carriage return
\f         form feed
\e         escape (\x1B)
\0         NUL byte
\xHH       byte with hex value HH
\x{…}      Unicode codepoint with hex value
\o{…}      octal codepoint
\cX        control-X

"1000\t2000" =~ /0\t2/;      # matches
"a\x{263a}b" =~ /\x{263a}/;  # matches U+263A, WHITE SMILING FACE
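Several of these sequences are different spellings of the same character, which the following sketch checks. It assumes Perl 5.14 or later for \o{…}; the variable names are illustrative.

```perl
use strict;
use warnings;

# Three spellings of capital A: hex 41 is octal 101.
my $hex    = "A" =~ /\x41/    ? 1 : 0;
my $braced = "A" =~ /\x{41}/  ? 1 : 0;
my $octal  = "A" =~ /\o{101}/ ? 1 : 0;

# Control-I is the tab character, and \e is the same character as \x1B.
my $ctrl_i = "\t" =~ /\cI/  ? 1 : 0;
my $escape = "\e" =~ /\x1B/ ? 1 : 0;

print "$hex $braced $octal $ctrl_i $escape\n";
```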

Variables in patterns

A pattern is (by default) interpolated like a double-quoted string, so variables are substituted before matching:

my $word = "house";
"housecat" =~ /$word/;       # matches
"housecat" =~ /${word}cat/;  # matches — braces disambiguate

To match a literal $ or @, escape it:

'price: $10' =~ /\$10/;      # matches a literal dollar sign

If a user-supplied string will be interpolated into a pattern and you want its metacharacters treated literally, use quotemeta — or its in-pattern equivalent \Q…\E:

my $input = "1+1";
"1+1=2" =~ /\Q$input\E/;     # matches the literal string

Without \Q…\E the + would be read as a quantifier.
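The difference is easy to observe. This sketch compares the quoted and unquoted forms on the same input, and adds a second test string, "11=2", chosen only to show what the unquoted pattern really matches.

```perl
use strict;
use warnings;

my $input = "1+1";

# quotemeta backslash-escapes the '+', giving the four characters 1\+1.
my $quoted = quotemeta($input);

my $via_function = "1+1=2" =~ /$quoted/    ? 1 : 0;   # literal match
my $via_escape   = "1+1=2" =~ /\Q$input\E/ ? 1 : 0;   # same thing, in-pattern

# Unquoted, the pattern is /1+1/: one or more '1' followed by '1'.
my $unquoted = "1+1=2" =~ /$input/ ? 1 : 0;   # no "11" in the string
my $surprise = "11=2"  =~ /$input/ ? 1 : 0;   # "11" is present

print "quoted pattern: $quoted\n";
print "$via_function $via_escape $unquoted $surprise\n";
```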

Substitution at a glance

Replacing text uses the s/// operator, which takes a pattern and a replacement string:

my $x = "feed the cat";
$x =~ s/cat/dog/;            # $x is now "feed the dog"

Substitution is covered in depth in its own chapter; it is mentioned here so you can combine it with the facts above. Almost everything that applies to m// patterns applies inside s/// patterns too.
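As a small illustration of that overlap, the sketch below uses interpolation and \Q…\E inside s///. The search text "c.t" and the variable names are hypothetical, picked so that the unescaped dot would otherwise match "cat".

```perl
use strict;
use warnings;

my $old = "c.t";   # hypothetical search text containing a metacharacter
my $x   = "feed the cat, not the c.t";

# \Q...\E makes the dot literal, so "cat" is skipped and only "c.t" is replaced.
# s/// returns a true value when it made a replacement.
my $changed = ($x =~ s/\Q$old\E/dog/) ? 1 : 0;

# When nothing matches, s/// returns false and leaves the string alone.
my $y = "feed the cat";
my $untouched = ($y =~ s/bird/dog/) ? 1 : 0;

print "$x\n";
print "changed: $changed, untouched: $untouched\n";
```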

Where to go next

Literal matches get you surprisingly far, but most real regexps use character classes, anchors, or quantifiers. Character classes come next: they let one position in the pattern accept any of several characters.

See also

  • m — full reference for the match operator.

  • s — full reference for substitution.

  • quotemeta — escape a string for safe pattern interpolation.

  • perlre — complete regexp syntax.