---
name: regex basics
---

# Basics

The smallest useful regexp is a plain string. `"Hello World" =~ /World/` asks: does the string on the left contain the pattern on the right? It does, so the expression is true.

```perl
if ("Hello World" =~ /World/) {
    print "matched\n";
}
```

The `//` enclose the pattern. The `=~` operator binds the pattern to the string you want to test. Without a binding operator, Perl applies the pattern to `$_` instead.

## The match operator

The long form is `m//`:

```perl
"Hello World" =~ m/World/;
"Hello World" =~ m!World!;     # alternate delimiters
"Hello World" =~ m{World};     # paired delimiters
```

`m` lets you pick any delimiter. That matters when the pattern itself contains the default delimiter `/`. Compare:

```perl
"/usr/bin/perl" =~ /\/usr\/bin\/perl/;   # "leaning toothpick syndrome"
"/usr/bin/perl" =~ m!/usr/bin/perl!;     # clearer
```

Paired delimiters (`{}`, `()`, `[]`, `<>`) nest, so a pattern can contain balanced pairs of its own delimiter without escaping them.

Without `m`, the delimiters must be slashes: `/pat/` only. With any other delimiter, the leading `m` is required: `m{pat}`, not `{pat}`.

## Binding: =~ and !~

`=~` asks "does it match?". `!~` asks "does it fail to match?".

```perl
my $s = "Hello World";
print "yes\n" if $s =~ /World/;      # prints "yes"
print "no\n"  if $s !~ /planet/;     # prints "no"
```

`!~` is not a separate regexp construct; it is the negated binding. It is equivalent to `not ($s =~ /pat/)`.

## Matching against $_

If you omit the binding, the match is against `$_`:

```perl
for ("cat", "dog", "bird") {
    print "has an 'o'\n" if /o/;     # implicit: $_ =~ /o/
}
```

This is idiomatic in `while (<>)` loops, inside `grep` and `map`, and inside `for` loops that set `$_`.

## Case sensitivity and the default anchor

Matches are case-sensitive and unanchored:

```perl
"Hello" =~ /hello/;    # does not match: case differs
"Hello" =~ /ell/;      # matches: anywhere inside the string is fine
```

To match case-insensitively, append `/i`.
To constrain the match to the start or end of the string, use anchors. Both the `/i` modifier and anchors are covered in their own chapters.

When a pattern could match at several positions, Perl tries each position from left to right and takes the first one where the pattern matches:

```perl
"That hat is red" =~ /hat/;    # matches 'hat' in 'That', not in 'hat'
```

## Metacharacters

Most characters in a pattern match themselves. These do not:

    { } [ ] ( ) ^ $ . | * + ? - # \

Each has a special meaning covered later. To match a literal copy of one, put a backslash in front:

```perl
"2+2=4" =~ /2+2/;      # fails: '+' is a quantifier, needs escaping
"2+2=4" =~ /2\+2/;     # matches
"end."  =~ /end\./;    # matches a literal dot
"end."  =~ /end./;     # also matches, but . matches any character,
                       # so this would also match "endx", "end ", etc.
```

The backslash itself is a metacharacter, so a literal backslash in a pattern needs `\\`:

```perl
'C:\WIN32' =~ /C:\\WIN/;    # matches
```

A metacharacter that has nothing special to do in its context reverts to matching itself. For example, `}` is special only when it closes a `{…}` construct such as a quantifier; outside that context it is a literal `}`. This is convenient but easy to misread; the experimental `use re 'strict'` pragma (Perl 5.22 and later) catches many such cases.

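The escaping rules above can be checked with a small script. This is only a sketch: `matches` is a hypothetical helper introduced for illustration, and `qr//` (Perl's operator for compiling a pattern into a value) is used so patterns can be stored in a list. It is not otherwise covered in this chapter.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the text above):
# returns 1 if $string matches $pattern, otherwise 0.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# Each case pairs a string with a compiled pattern.
my @cases = (
    [ '2+2=4', qr/2+2/   ],   # 0: '+' quantifies the first 2, so this wants "22"
    [ '22',    qr/2+2/   ],   # 1: one or more 2s followed by a 2
    [ '2+2=4', qr/2\+2/  ],   # 1: escaped '+' is a literal plus sign
    [ 'end.',  qr/end\./ ],   # 1: escaped '.' is a literal dot
    [ 'endx',  qr/end\./ ],   # 0: 'x' is not a dot
    [ 'endx',  qr/end./  ],   # 1: unescaped '.' matches any character
    [ 'a}b',   qr/a}b/   ],   # 1: '}' with no opening '{' is a literal
);

for my $case (@cases) {
    my ($string, $pattern) = @$case;
    print "$string: ", matches($string, $pattern), "\n";
}
```

Keeping the cases in a list makes it easy to add a pattern and see at a glance whether a metacharacter is being read literally or not.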
## Escape sequences

Non-printing characters use the same escapes as in double-quoted strings:

| Sequence | Matches                                       |
|----------|-----------------------------------------------|
| `\t`     | tab                                           |
| `\n`     | newline                                       |
| `\r`     | carriage return                               |
| `\f`     | form feed                                     |
| `\e`     | escape (`\x1B`)                               |
| `\0`     | NUL byte                                      |
| `\xHH`   | character with hex value HH                   |
| `\x{…}`  | Unicode code point with the given hex value   |
| `\o{…}`  | code point with the given octal value         |
| `\cX`    | control-X                                     |

```perl
"1000\t2000"  =~ /0\t2/;         # matches
"a\x{263a}b"  =~ /\x{263a}/;     # matches U+263A, WHITE SMILING FACE
```

## Variables in patterns

A pattern is (by default) interpolated like a double-quoted string, so variables are substituted before matching:

```perl
my $word = "house";
"housecat" =~ /$word/;           # matches
"housecat" =~ /${word}cat/;      # matches: braces disambiguate
```

To match a literal `$` or `@`, escape it:

```perl
'price: $10' =~ /\$10/;          # matches a literal dollar sign
```

If a user-supplied string will be interpolated into a pattern and you want its metacharacters treated literally, use [`quotemeta`](../../p5/core/perlfunc/quotemeta) or its in-pattern equivalent `\Q…\E`:

```perl
my $input = "1+1";
"1+1=2" =~ /\Q$input\E/;         # matches the literal string
```

Without `\Q…\E` the `+` would be read as a quantifier, and the match would fail.

## Substitution at a glance

Replacing text uses the `s///` operator, which takes a pattern and a replacement string:

```perl
my $x = "feed the cat";
$x =~ s/cat/dog/;                # $x is now "feed the dog"
```

Substitution is covered in depth in its own chapter; it is mentioned here so you can combine it with the facts above. Almost everything that applies to `m//` patterns applies inside `s///` patterns too.

## Where to go next

Literal matches get you surprisingly far, but most real regexps also use character classes, anchors, or quantifiers. Character classes come next: they let one position in the pattern accept any of several characters.

## See also

- [`m`](../../p5/core/perlfunc/m): full reference for the match operator.
- [`s`](../../p5/core/perlfunc/s): full reference for substitution.
- [`quotemeta`](../../p5/core/perlfunc/quotemeta): escape a string for safe pattern interpolation.
- [`perlre`](../../p5/core/perlre): complete regexp syntax.