Regex binding operators#
Two operators that connect a string with a regular expression operation. They do not perform the match themselves; they merely say «this expression’s input is that string».
| Operator | Reads as | Use |
|---|---|---|
| =~ | matches | bind a regex op (m//, s///, tr///) to a string |
| !~ | doesn’t match | same, with the boolean result negated |
$str =~ /pattern/ # match: TRUE if $str contains a match
$str =~ s/foo/bar/ # substitution: returns count of changes
$str =~ tr/a-z/A-Z/ # transliteration: returns count
$str !~ /pattern/ # match negated: TRUE if NO match
Without an explicit binding, regex ops act on $_:
$_ = "hello";
print "match\n" if /h/; # implicit $_ binding
=~ and !~ are how you redirect that input to a different variable.
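As a small illustration of the difference, the sketch below tests the same pattern against $_ and against an explicitly bound variable. The variable names ($name, $implicit, $explicit, $negated) are made up for this example.

```perl
use strict;
use warnings;

$_ = "hello";
my $name = "world";    # hypothetical second string, contains no "h"

my $implicit = /h/          ? 1 : 0;   # no binding: the pattern is tested against $_
my $explicit = $name =~ /h/ ? 1 : 0;   # =~ points the same pattern at $name
my $negated  = $name !~ /h/ ? 1 : 0;   # !~ negates the result of the match

print "implicit=$implicit explicit=$explicit negated=$negated\n";
# prints: implicit=1 explicit=0 negated=1
```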
What they actually return#
=~ returns whatever the regex op on the right would return:
- m//: boolean (true on match, false on no match) in scalar context; the captured groups in list context.
- s///: the number of substitutions performed (which is boolean-true when ≥ 1).
- tr///: the number of characters matched, that is, replaced or deleted.
!~ returns the boolean negation, regardless of the underlying op. It is mostly used with m//:
print "no digit" if $str !~ /\d/;
if (my $n = $str =~ s/foo/bar/g) { print "$n changes" } # capture the count before printing it
my @hits = $str =~ /(\w+)/g; # list context: captures
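The return values described above can be checked side by side. This is a minimal sketch with a made-up input string; all variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "foo 12 foo 345";

my $matched  = $str =~ /\d+/;          # scalar context: true on a match
my ($first)  = $str =~ /(\d+)/;        # list context: the captured group, "12"
my @all      = $str =~ /(\d+)/g;       # list context with /g: every capture
my $letters  = ($str =~ tr/a-z//);     # empty replacement list: counts, changes nothing
my $changes  = ($str =~ s/foo/bar/g);  # number of substitutions; $str is modified
my $no_digit = $str !~ /\d/;           # negated match: false, the string has digits

print "first=$first all=@all letters=$letters changes=$changes str=$str\n";
# prints: first=12 all=12 345 letters=6 changes=2 str=bar 12 bar 345
```

Note that the tr/// line runs before the s/// line, so it counts the letters of the original string.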
Three op partners#
=~ accepts three regex operations on its right:
- m//: match. The m is optional when the delimiters are slashes: $s =~ /pattern/ and $s =~ m{pattern} both work.
- s///: substitution. Three pieces: pattern, replacement, flags. Returns the count of replacements.
- tr/// (also spellable y///): transliteration. Replaces characters one-for-one between two character sets. Returns the count of characters matched.
$line =~ /(\d+)/ # extract first run of digits
$line =~ s/^\s+// # strip leading whitespace
$line =~ s/\s+/ /g # collapse all whitespace to single spaces
$line =~ tr/A-Z/a-z/ # ASCII lowercase
!~ only with m//#
!~ is meaningful only with the match operation, since substitution and transliteration return counts and the «no-changes-made» case is a meaningful zero, not a «didn’t match» boolean. Perl will let you write $s !~ s/.../.../, but you almost never want it: the substitution still runs, and the result is «true if zero substitutions were made», which reads strangely. Stick to !~ /.../.
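To see why pairing !~ with s/// is confusing, here is a short sketch with made-up strings. It shows that the substitution is still carried out and that only the returned value is negated.

```perl
use strict;
use warnings;

my $s = "aaa";
my $odd = $s !~ s/a/b/g;     # three substitutions happen; the result is the negation of 3
# $s is now "bbb", yet $odd is false

my $t = "aaa";
my $none = $t !~ s/x/y/g;    # nothing to replace, so the negated result is true
# $t is unchanged

print "s=$s odd=", ($odd ? "true" : "false"), "\n";     # prints: s=bbb odd=false
print "t=$t none=", ($none ? "true" : "false"), "\n";   # prints: t=aaa none=true
```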
Lvalue vs rvalue#
=~ itself does not assign. It only points the regex op on its right at the string on its left. The op itself may then mutate that string (s/// and tr/// do, m// does not), but the mutation comes from the op, not from =~:
my $s = "hello";
$s =~ s/l/L/g; # mutates $s — now "heLLo"
my $n = $s =~ /(\w+)/; # does NOT mutate $s; $n is the boolean result
The string on the left of =~ must be modifiable for s/// and tr///. A string literal is rejected at compile time with «Can't modify constant item in substitution (s///)», and $1 (a regex capture variable) fails at run time with «Modification of a read-only value attempted»:
"hello" =~ s/l/L/g; # FATAL at compile time: a literal cannot be modified
$1 =~ s/x/y/; # FATAL at run time: capture variable is read-only
Copy first if you need to modify a read-only source:
(my $copy = $1) =~ s/x/y/; # idiom for "modify a copy of $1"
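A slightly fuller sketch of the copy idiom follows, next to the /r flag, which returns the modified string instead of changing the target. The /r variant assumes Perl 5.14 or later; the input string and variable names are hypothetical. The match is repeated before the second variant because a successful s/// resets the capture variables.

```perl
use strict;
use warnings;

my $input = "id=x42";    # hypothetical input
my ($copy, $via_r);

if ($input =~ /=(\w+)/) {
    ($copy = $1) =~ s/x/y/;     # assign $1 to $copy first, then substitute on $copy
}

if ($input =~ /=(\w+)/) {       # match again: the s/// above overwrote $1
    $via_r = $1 =~ s/x/y/r;     # /r leaves $1 alone and returns the changed string
}

print "copy=$copy via_r=$via_r input=$input\n";
# prints: copy=y42 via_r=y42 input=id=x42
```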
Precedence#
=~ and !~ sit at row 6 of the precedence table — quite tight, between unary and the multiplicative operators. This is why you can write $s =~ /foo/ && $t =~ /bar/ without parens around either match.
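The tight binding cuts both ways. In the sketch below (made-up strings and variable names), the && case needs no parentheses, but the concatenation operator binds more loosely than =~, so an unparenthesized left-hand expression is not what gets matched.

```perl
use strict;
use warnings;

my ($s, $t) = ("foobar", "rebar");

# =~ binds tighter than &&, so each match is evaluated before the &&.
my $both = ($s =~ /foo/ && $t =~ /bar/) ? 1 : 0;

# =~ also binds tighter than ".", so only "cd" is matched against /bc/ here.
my $unparenthesized = "ab" . "cd" =~ /bc/;     # "ab" . (failed match) gives "ab"
my $parenthesized   = ("ab" . "cd") =~ /bc/;   # "abcd" contains "bc": true

print "both=$both unparenthesized=$unparenthesized parenthesized=$parenthesized\n";
# prints: both=1 unparenthesized=ab parenthesized=1
```

When the left-hand side is anything more than a single variable or term, parenthesizing it is the safer habit.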
Tutorial cross-reference#
The boolean-logic tutorial covers regex-as-set-algebra in its applications chapter — alternation as union, lookarounds as intersection and complement:
Boolean Logic — Applications (the Regular expressions: logic on sets of strings section)
The full regex language reference lives in the regex guide: