Basics
The smallest useful regexp is a plain string. "Hello World" =~ /World/ asks: does the string on the left contain the pattern on the
right? It does, so the expression is true.
if ("Hello World" =~ /World/) {
    print "matched\n";
}
The // enclose the pattern. The =~ operator binds the pattern to
the string you want to test. Without a binding operator, Perl applies
the pattern to $_ instead.
The match operator
The long form is m//:
"Hello World" =~ m/World/;
"Hello World" =~ m!World!; # alternate delimiters
"Hello World" =~ m{World}; # paired delimiters
m lets you pick any delimiter. That matters when the pattern itself
contains the default delimiter / — compare
"/usr/bin/perl" =~ /\/usr\/bin\/perl/; # "leaning toothpick syndrome"
"/usr/bin/perl" =~ m!/usr/bin/perl!; # clearer
Paired delimiters ({}, (), [], <>) nest: a balanced inner pair does
not end the pattern, which is useful when the pattern itself contains
the delimiter characters.
The m may be omitted only when the delimiters are slashes: /pat/ is
shorthand for m/pat/. Any other delimiter requires the leading m:
m{pat}, not {pat}.
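To see nesting in action, here is a minimal sketch. The sample strings are made up for illustration, and the inner {2} and (a|u) rely on a quantifier and a group, which are covered in later chapters. The only point here is that a balanced inner pair does not terminate the pattern.

```perl
use strict;
use warnings;

# The inner {2} is balanced, so the pattern ends at the final }.
# The regexp Perl sees is a{2}b.
my $braces = ("aab" =~ m{a{2}b}) ? 1 : 0;

# Same idea with parentheses: the regexp Perl sees is c(a|u)t.
my $parens = ("cut" =~ m(c(a|u)t)) ? 1 : 0;

print "braces: $braces, parens: $parens\n";
```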
Binding: =~ and !~
=~ asks “does it match?”. !~ asks “does it fail to match?”.
my $s = "Hello World";
print "yes\n" if $s =~ /World/; # prints "yes"
print "no\n" if $s !~ /planet/; # prints "no"
!~ is not a separate regexp construct — it is the negated binding.
It is equivalent to not ($s =~ /pat/).
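The equivalence can be checked directly. This is a small sketch with hypothetical variable names; it converts each result to 1 or 0 so the two forms can be compared.

```perl
use strict;
use warnings;

my $s = "Hello World";

# Negated binding: true when the pattern does NOT match.
my $negated_binding = ($s !~ /planet/) ? 1 : 0;

# Ordinary binding, negated afterwards.
my $explicit_not = (not ($s =~ /planet/)) ? 1 : 0;

print "both forms agree\n" if $negated_binding == $explicit_not;
```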
Matching against $_
If you omit the binding, the match is against $_:
for ("cat", "dog", "bird") {
    print "has an 'o'\n" if /o/; # implicit: $_ =~ /o/
}
This is idiomatic in while (<>) loops, inside grep and map,
and inside for loops that set $_.
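As an illustration of the grep case, the sketch below filters a list with a bare pattern. The list and the variable names are invented for the example.

```perl
use strict;
use warnings;

my @animals = ("cat", "dog", "bird");

# grep sets $_ to each element in turn; the bare /o/ tests $_.
my @with_o = grep { /o/ } @animals;

print "@with_o\n";
```

Only "dog" contains an o, so that is the single element kept.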
Case sensitivity and the default anchor
Matches are case-sensitive and unanchored:
"Hello" =~ /hello/; # does not match — case differs
"Hello" =~ /ell/; # matches — inside the string is fine
To match case-insensitively, add the i modifier after the closing
delimiter, as in /hello/i. To constrain the match to the start or end
of the string, use anchors. Both are covered in their own chapters.
When a pattern could match at several positions, Perl tries from the left and takes the first one that works:
"That hat is red" =~ /hat/; # matches 'hat' in 'That', not in 'hat'
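One way to confirm which occurrence matched is to look at the match offset. The sketch below relies on the built-in @- array, which this chapter has not introduced: after a successful match, $-[0] holds the offset where the match began.

```perl
use strict;
use warnings;

my $text = "That hat is red";
my $start;

if ($text =~ /hat/) {
    # $-[0] is the offset of the start of the last successful match.
    $start = $-[0];
}

# Offset 1 is inside "That"; the standalone word "hat" begins at offset 5.
print "match starts at offset $start\n";
```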
Metacharacters
Most characters in a pattern match themselves. These do not:
{ } [ ] ( ) ^ $ . | * + ? - # \
Each has a special meaning covered later. To match a literal copy of one, put a backslash in front:
"2+2=4" =~ /2+2/; # fails — '+' is a quantifier, needs escaping
"2+2=4" =~ /2\+2/; # matches
"end." =~ /end\./; # matches a literal dot
"end." =~ /end./; # also matches — but . matches any character,
# so this would also match "endx", "end ", etc.
The backslash itself is a metacharacter, so a literal backslash in a
pattern needs \\:
'C:\WIN32' =~ /C:\\WIN/; # matches
A metacharacter that has nothing special to do in its context reverts
to matching itself. For example, } only closes a {…} quantifier;
outside that context it is a literal }. This is convenient but easy to
misread; the experimental use re 'strict' pragma (Perl 5.22 and later)
catches many such cases.
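A quick sketch of the } case, using a made-up sample string: the unescaped and the escaped form both match a literal closing brace, because no quantifier is open at that point in the pattern.

```perl
use strict;
use warnings;

# No {...} quantifier is open here, so } matches itself.
my $bare    = ("a}b" =~ /a}b/)  ? 1 : 0;

# Escaping it is equivalent and makes the intent explicit.
my $escaped = ("a}b" =~ /a\}b/) ? 1 : 0;

print "bare: $bare, escaped: $escaped\n";
```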
Escape sequences
Non-printing characters use the same escapes as in double-quoted strings:
| Sequence | Matches |
|---|---|
| \t | tab |
| \n | newline |
| \r | carriage return |
| \f | form feed |
| \e | escape (ESC) |
| \0 | NUL byte |
| \xHH | byte with hex value HH |
| \x{HHHH} | Unicode codepoint with hex value HHHH |
| \o{NNN} | octal codepoint |
| \cX | control-X |
"1000\t2000" =~ /0\t2/; # matches
"a\x{263a}b" =~ /\x{263a}/; # matches U+263A, WHITE SMILING FACE
Variables in patterns
A pattern is (by default) interpolated like a double-quoted string, so variables are substituted before matching:
my $word = "house";
"housecat" =~ /$word/; # matches
"housecat" =~ /${word}cat/; # matches — braces disambiguate
To match a literal $ or @, escape it:
'price: $10' =~ /\$10/; # matches a literal dollar sign
If a user-supplied string will be interpolated into a pattern and you
want its metacharacters treated literally, use quotemeta or its
in-pattern equivalent \Q…\E:
my $input = "1+1";
"1+1=2" =~ /\Q$input\E/; # matches the literal string
Without \Q…\E the + would be read as a quantifier.
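The function form looks like this. It is a minimal sketch with hypothetical variable names: quotemeta returns a copy of the string with its metacharacters backslashed, which can then be interpolated safely.

```perl
use strict;
use warnings;

my $input   = "1+1";
my $escaped = quotemeta($input);   # puts a backslash before the '+'

# Interpolating the escaped copy gives the pattern 1\+1.
my $literal = ("1+1=2" =~ /$escaped/) ? 1 : 0;

# Interpolating the raw string gives 1+1, which needs "11" in the target.
my $raw     = ("1+1=2" =~ /$input/)   ? 1 : 0;

print "escaped: $escaped, literal: $literal, raw: $raw\n";
```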
Substitution at a glance
Replacing text uses the s/// operator, which takes a pattern and a
replacement string:
my $x = "feed the cat";
$x =~ s/cat/dog/; # $x is now "feed the dog"
Substitution is covered in depth in its own chapter; it is mentioned
here so you can combine it with the facts above. Almost everything that
applies to m// patterns applies inside s/// patterns too.
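As a sketch of how the earlier points carry over, the example below uses paired delimiters with s and then omits the binding so that s/// works on $_. The strings are invented for illustration.

```perl
use strict;
use warnings;

# Paired delimiters avoid escaping the slashes in a path.
my $path = "/usr/bin/perl";
$path =~ s{/usr/bin}{/usr/local/bin};

# With no binding, s/// edits $_. In a for loop $_ aliases each
# element, so the array itself is modified.
my @lines = ("feed the cat", "walk the dog");
s/the/a/ for @lines;

print "$path\n";
print "$_\n" for @lines;
```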
Where to go next
Literal matches get you surprisingly far, but every real regexp uses character classes, anchors, or quantifiers. Character classes come next — they let one position in the pattern accept any of several characters.