Regular expressions and pattern matching

quotemeta#

Return a copy of a string with every regex-significant character backslash-escaped, so the result can be interpolated into a pattern and match its own literal content.

quotemeta exists to bridge string data and regex syntax. When user input, filenames, or configuration values end up on the right-hand side of =~, any character that happens to be a regex metacharacter (., *, +, ?, (, [, \, |, …) would otherwise fire its special behaviour. quotemeta preempts that by inserting a backslash before every ASCII non-word character, leaving letters, digits, and underscore untouched.

Synopsis#

quotemeta EXPR
quotemeta

What you get back#

A new string. Every ASCII character that does not match /[A-Za-z_0-9]/ is preceded by a backslash; word characters pass through unchanged. The result is always safe to paste into a regex as a literal pattern fragment.

quotemeta is a pure value producer — it does not modify its argument.

my $safe = quotemeta 'a.b*c';     # 'a\.b\*c'
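
The two claims above can be checked with a minimal sketch. The variable names and the sample input `aXbbc` are illustrative assumptions, not part of the built-in's interface:

```perl
use strict;
use warnings;

my $raw  = 'a.b*c';              # hypothetical input for illustration
my $safe = quotemeta $raw;       # new string; $raw itself is not modified

my $raw_unchanged   = $raw eq 'a.b*c';           # true
my $matches_literal = $raw    =~ /\A$safe\z/;    # true: the pattern matches its own source text
my $matches_other   = 'aXbbc' =~ /\A$safe\z/;    # false: . and * are no longer wildcards

print "safe: $safe\n";           # safe: a\.b\*c
```

Interpolating `$raw` directly instead of `$safe` would let `aXbbc` match, because `.` and `b*` would then act as regex operators.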

Global state it touches#

Reads $_ when called with no argument. Within the scope of use locale, all non-ASCII Latin‑1 code points are quoted, while the quoting of ASCII-range characters does not vary with the locale. This protects against locales that treat punctuation such as | as a word character.

Examples#

Bare form defaults to $_, like many string built-ins:

for ('a+b', 'c.d') {
    print quotemeta, "\n";     # a\+b
}                              # c\.d

Interpolating user input into a substitution. Without quotemeta, .*? in $substring would be regex-active and match far more than intended:

my $sentence  = 'The quick brown fox jumped over the lazy dog';
my $substring = 'quick.*?fox';
my $quoted    = quotemeta $substring;
$sentence =~ s{$quoted}{big bad wolf};
# $sentence unchanged — the literal text 'quick.*?fox' is not present
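
For contrast, here is a short sketch of the same substitution run with and without quoting. The names `$unquoted`, `$quoted_copy`, and `$pattern` are illustrative only:

```perl
use strict;
use warnings;

my $substring = 'quick.*?fox';

# Without quoting: .*? is an active non-greedy wildcard.
my $unquoted = 'The quick brown fox jumped over the lazy dog';
$unquoted =~ s{$substring}{big bad wolf};
# $unquoted is now 'The big bad wolf jumped over the lazy dog'

# With quoting: the pattern matches only the literal text 'quick.*?fox'.
my $quoted_copy = 'The quick brown fox jumped over the lazy dog';
my $pattern     = quotemeta $substring;
$quoted_copy =~ s{$pattern}{big bad wolf};
# $quoted_copy is unchanged

print "$unquoted\n$quoted_copy\n";
```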

The \Q...\E escape in a double-quoted string is exactly quotemeta applied to the enclosed region. The two forms below are equivalent:

my $pat1 = "\Q$substring\E";
my $pat2 = quotemeta($substring);
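
As a quick check of that equivalence, and of the one practical difference in scope, consider the sketch below. The `$prefix` value and the version-string inputs are assumptions made up for the example:

```perl
use strict;
use warnings;

my $substring = 'quick.*?fox';
my $pat1 = "\Q$substring\E";
my $pat2 = quotemeta($substring);
my $same = $pat1 eq $pat2;       # true: both are 'quick\.\*\?fox'

# \Q...\E can quote just part of a pattern, leaving the rest regex-active.
my $prefix     = 'v1.';          # hypothetical literal prefix
my $ok_literal = 'v1.42' =~ /\A\Q$prefix\E\d+\z/;   # true
my $ok_other   = 'v1x42' =~ /\A\Q$prefix\E\d+\z/;   # false: the dot is literal

print $same ? "same\n" : "different\n";
```

The function form tends to read better when the quoted value is stored or passed around; the escape form is convenient when only a slice of a larger pattern is literal.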

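Building a regex that matches a literal filename at the end of a path:

my $name = 'my.config[prod].ini';
my $path = '/etc/app/my.config[prod].ini';    # example path, assumed for illustration
if ($path =~ /\Q$name\E\z/) {
    # $path ends with the literal text of $name; ., [ and ] are not metacharacters here
}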

Escaping a list of terms for an alternation:

my @terms = ('C++', 'C#', '.NET');
my $alt   = join '|', map { quotemeta } @terms;
# 'C\+\+|C\#|\.NET'
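
One way the resulting alternation might be used is sketched below. The sample `$text` and the name `$re` are hypothetical:

```perl
use strict;
use warnings;

my @terms = ('C++', 'C#', '.NET');
my $alt   = join '|', map { quotemeta } @terms;
my $re    = qr/(?:$alt)/;        # group so the alternation stays self-contained

my $text  = 'Skills: C++, .NET and C#; not DOTNET';   # hypothetical input
my @found = $text =~ /($re)/g;
# @found is ('C++', '.NET', 'C#'); 'DOTNET' does not match because the dot is literal

print join(', ', @found), "\n";
```

When terms overlap (say `C` and `C++`), Perl tries alternatives left to right, so sorting the list longest-first before joining is a common precaution.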

Edge cases#

  • No argument: quotemeta with no expression quotes $_. Inside a loop like for (@input) { push @safe, quotemeta } this is the idiomatic form.

  • Already-safe strings are untouched: input consisting only of word characters ([A-Za-z0-9_]+) is returned unchanged.

  • Literal backslashes inside \Q...\E interact with double-quotish backslash interpolation before \Q sees the text. A sequence like "\Q\t\E" quotes a literal tab character, not a backslash followed by t. When you need actual backslashes in the quoted region, keep the data in a variable and use quotemeta $var rather than embedding it in a double-quoted literal.

  • No way to inject a literal $ or @ inside \Q...\E: a protected \$ becomes the four-character sequence \\\$ in the output, and an unprotected $ starts scalar interpolation before \Q runs. Drop out of the quoted region to splice in sigils.

  • Unicode-aware quoting (Perl 5.16+): on UTF‑8 strings — and on byte strings under use feature 'unicode_strings' or use v5.12 or greater — non-ASCII characters are quoted only when they carry the Unicode properties Pattern_Syntax, Pattern_White_Space, White_Space, Default_Ignorable_Code_Point, or General_Category=Control. Identifier-class characters (letters, marks, digits) are left alone. This is the stable contract: Perl promises that any future regex metacharacter will have Pattern_Syntax set, so strings safe today stay safe.

  • Legacy non-UTF‑8 strings outside unicode_strings scope have every upper-Latin‑1 code point (\x80 through \xFF) quoted, for backwards compatibility with pre-5.16 behaviour.

  • Locale quoting: within use locale, all non-ASCII Latin‑1 code points are quoted whether the string is UTF‑8 or not. ASCII-range quoting is unaffected by locale.
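
Two of the edge cases above can be sketched in a few lines. This assumes Perl 5.16 or later and no `use locale` in scope; the variable names and the choice of sample code points (U+03B1 and U+2028) are illustrative:

```perl
use v5.16;
use warnings;

# "\t" becomes a tab during double-quote processing, before \Q applies,
# so the result is a backslash followed by a tab character.
my $from_literal = "\Q\t\E";

# Keeping the data in a variable preserves the backslash itself.
my $data          = '\t';               # two characters: backslash, t
my $from_variable = quotemeta $data;    # three characters: backslash, backslash, t

# Strings holding code points above 0xFF are stored as UTF-8 internally,
# so the Unicode quoting rules apply to them.
my $alpha   = quotemeta "\x{3B1}";      # GREEK SMALL LETTER ALPHA: a letter, left alone
my $linesep = quotemeta "\x{2028}";     # LINE SEPARATOR: white space, gets a backslash

printf "%d %d %d %d\n",
    length $from_literal, length $from_variable, length $alpha, length $linesep;
```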

Differences from upstream#

Fully compatible with upstream Perl 5.42.

See also#

  • m — the match operator; quotemeta exists to feed safe literals into its pattern

  • qr — compiled regex objects; qr/\Q$var\E/ is the usual way to pin a pattern around a literal string

  • s — substitution; the most common site of quotemeta-style escaping on the left-hand pattern

  • \Q and \E — the double-quotish escape pair that is just quotemeta with different syntax

  • $_ — default input when quotemeta is called without an argument