Regular expressions and pattern matching
quotemeta#
Return a copy of a string with every regex-significant character backslash-escaped, so the result can be interpolated into a pattern and match its own literal content.
quotemeta exists to bridge string data and regex syntax. When user
input, filenames, or configuration values end up on the right-hand
side of =~, any character that happens to be a regex metacharacter
(., *, +, ?, (, [, \, |, …) would otherwise fire its
special behaviour. quotemeta preempts that by inserting a backslash
before every ASCII non-word character, leaving letters, digits, and
underscore untouched.
Synopsis#
quotemeta EXPR
quotemeta
What you get back#
A new string. Every ASCII character that does not match
/[A-Za-z_0-9]/ is preceded by a backslash; word characters pass
through unchanged. The result is always safe to paste into a regex
as a literal pattern fragment.
quotemeta is a pure value producer — it does not modify its
argument.
my $safe = quotemeta 'a.b*c'; # 'a\.b\*c'
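A minimal sketch of the return-value rules described above; the sample strings are illustrative only.

```perl
use strict;
use warnings;

# Word characters pass through; every other ASCII character gains a backslash.
my $word  = quotemeta 'foo_Bar42';   # 'foo_Bar42'
my $mixed = quotemeta 'a b-c';       # 'a\ b\-c' (space and hyphen are escaped)

# The argument itself is left alone; only the returned copy is escaped.
my $orig = 'x.y';
my $copy = quotemeta $orig;          # $orig stays 'x.y', $copy is 'x\.y'

print "$word\n$mixed\n$orig\n$copy\n";
```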
Global state it touches#
Reads $_ when called with no argument. Within the
scope of use locale, all non-ASCII Latin‑1 code points
are quoted, to protect against locales that treat
punctuation like | as a word character.
Examples#
The bare form defaults to $_, as lc, uc, and length do:
for ('a+b', 'c.d') {
    print quotemeta, "\n";   # prints a\+b, then c\.d
}
Interpolating user input into a substitution. Without quotemeta,
.*? in $substring would be regex-active and match far more than
intended:
my $sentence = 'The quick brown fox jumped over the lazy dog';
my $substring = 'quick.*?fox';
my $quoted = quotemeta $substring;
$sentence =~ s{$quoted}{big bad wolf};
# $sentence unchanged — the literal text 'quick.*?fox' is not present
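For contrast, here is a sketch that runs the same substitution with and without quoting. The second sentence is a made-up input that actually contains the literal text, so that the quoted pattern has something to match.

```perl
use strict;
use warnings;

my $substring = 'quick.*?fox';

# Without quoting, .*? is live and the pattern spans 'quick brown fox'.
my $raw = 'The quick brown fox jumped over the lazy dog';
$raw =~ s{$substring}{big bad wolf};
# $raw is now 'The big bad wolf jumped over the lazy dog'

# With quoting, only the literal text 'quick.*?fox' can match.
my $literal = 'Search for quick.*?fox here';
my $quoted  = quotemeta $substring;
$literal =~ s{$quoted}{big bad wolf};
# $literal is now 'Search for big bad wolf here'

print "$raw\n$literal\n";
```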
The \Q...\E escape in a double-quoted string is exactly
quotemeta applied to the enclosed region. The two forms below are
equivalent:
my $pat1 = "\Q$substring\E";
my $pat2 = quotemeta($substring);
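The equivalence can be checked directly, and \Q...\E can also cover just part of a larger pattern. In this sketch $prefix and the version strings are hypothetical values chosen for illustration.

```perl
use strict;
use warnings;

my $substring = 'quick.*?fox';
my $pat1 = "\Q$substring\E";
my $pat2 = quotemeta($substring);
print $pat1 eq $pat2 ? "identical\n" : "different\n";

# Quote only the variable part; the rest of the pattern stays regex-active.
my $prefix = 'v1.';
my $ok  = 'v1.23' =~ /^\Q$prefix\E\d+\z/ ? 1 : 0;   # 1: literal 'v1.' then digits
my $bad = 'v1x23' =~ /^\Q$prefix\E\d+\z/ ? 1 : 0;   # 0: 'x' is not a literal '.'
print "ok=$ok bad=$bad\n";
```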
Building a regex that matches a literal filename at the end of a path:
my $name = 'my.config[prod].ini';
if ($path =~ /\Q$name\E\z/) {
    # $path ends with the literal filename; '.', '[' and ']' are not treated as metacharacters
}
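To see what the quoting buys here, the sketch below runs the quoted and unquoted patterns against a few hypothetical paths (the paths and the %quoted_hit / %unquoted_hit hashes are assumptions, not part of the original example).

```perl
use strict;
use warnings;

my $name = 'my.config[prod].ini';

# Hypothetical paths, chosen to show where the two patterns disagree.
my @paths = (
    '/etc/app/my.config[prod].ini',      # ends with the literal name
    '/etc/app/myXconfigp.ini',           # matches only if metacharacters are live
    '/etc/app/my.config[prod].ini.bak',  # name present, but not at the end
);

my (%quoted_hit, %unquoted_hit);
for my $path (@paths) {
    $quoted_hit{$path}   = $path =~ /\Q$name\E\z/ ? 1 : 0;
    $unquoted_hit{$path} = $path =~ /$name\z/     ? 1 : 0;
    print "$path quoted=$quoted_hit{$path} unquoted=$unquoted_hit{$path}\n";
}
```

Only the quoted pattern accepts the first path. The unquoted one reads [prod] as a character class, so it rejects the real filename and accepts the second path instead.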
Escaping a list of terms for an alternation:
my @terms = ('C++', 'C#', '.NET');
my $alt = join '|', map { quotemeta } @terms;
# 'C\+\+|C\#|\.NET'
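One possible way to use the resulting alternation is to wrap it in a non-capturing group and compile it with qr. The sample sentence is made up for illustration.

```perl
use strict;
use warnings;

my @terms = ('C++', 'C#', '.NET');
my $alt   = join '|', map { quotemeta } @terms;

# Group the alternation so it can be embedded safely in a larger pattern.
my $re = qr/(?:$alt)/;

# Collect every term that appears in the text, in order of appearance.
my @found = 'I use C++ and .NET daily' =~ /($re)/g;
# @found is ('C++', '.NET')

print join(', ', @found), "\n";
```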
Edge cases#
No argument:
quotemetawith no expression quotes$_. Inside a loop likefor (@input) { push @safe, quotemeta }this is the idiomatic form.Already-safe strings are untouched: pure word-character input (
[A-Za-z0-9_]+) round-trips identically.Literal backslashes inside
\Q...\Einteract with double-quotish backslash interpolation before\Qsees the text. A sequence like"\Q\t\E"quotes a literal tab character, not a backslash followed byt. When you need actual backslashes in the quoted region, keep the data in a variable and usequotemeta $varrather than embedding it in a double-quoted literal.No way to inject a literal
$or@inside\Q...\E: a protected\$becomes the four-character sequence\\\$in the output, and an unprotected$starts scalar interpolation before\Qruns. Drop out of the quoted region to splice in sigils.Unicode-aware quoting (Perl 5.16+): on UTF‑8 strings — and on byte strings under
use feature 'unicode_strings'oruse v5.12or greater — non-ASCII characters are quoted only when they carry the Unicode propertiesPattern_Syntax,Pattern_White_Space,White_Space,Default_Ignorable_Code_Point, orGeneral_Category=Control. Identifier-class characters (letters, marks, digits) are left alone. This is the stable contract: Perl promises that any future regex metacharacter will havePattern_Syntaxset, so strings safe today stay safe.Legacy non-UTF‑8 strings outside
unicode_stringsscope have every upper-Latin‑1 code point (\x80–\xFF) quoted, for backwards compatibility with pre-5.16 behaviour.Locale quoting: within
use locale, all non-ASCII Latin‑1 code points are quoted whether the string is UTF‑8 or not. ASCII-range quoting is unaffected by locale.
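The first three edge cases can be sketched as follows; the input values are illustrative assumptions.

```perl
use strict;
use warnings;

# Bare form inside a loop: each element is quoted via $_.
my @input = ('a+b', 'plain_word');
my @safe;
for (@input) { push @safe, quotemeta }
# @safe is ('a\+b', 'plain_word'); the all-word string is unchanged

# "\t" becomes a tab before \Q sees it, so it is the tab that gets quoted.
my $from_literal = "\Q\t\E";
my @codes = map { ord } split //, $from_literal;   # (92, 9): backslash, tab

# Keeping the text in a variable preserves the backslash itself.
my $text     = '\t';              # two characters: backslash and 't'
my $from_var = quotemeta $text;   # three characters: backslash, backslash, 't'

print "@safe\n@codes\n$from_var\n";
```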
Differences from upstream#
Fully compatible with upstream Perl 5.42.
See also#
- m: the match operator; quotemeta exists to feed safe literals into its pattern
- qr: compiled regex objects; qr/\Q$var\E/ is the usual way to pin a pattern around a literal string
- s: substitution; the most common site of quotemeta-style escaping on the left-hand pattern
- \Q and \E: the double-quotish escape pair that is just quotemeta with different syntax
- $_: default input when quotemeta is called without an argument