Cross-engine comparison#
Perl regexps are not the only kind. A reader who knows Perl well will, sooner or later, meet a regex written for another tool — sed, awk, an Emacs Lisp buffer search, a Go service, an embedded PCRE2 library — and be expected to read it. The syntactic differences are usually small, the semantic differences are sometimes substantial, and the architectural differences (which inputs make the engine catastrophically slow) can be decisive.
This chapter is the single page that answers «what does X look like elsewhere?» for the five engine families a Perl-trained reader is most likely to encounter. It is not a porting tutorial; it is the comparison reference.
Engines covered#
Five families, picked because each is widespread in its niche and materially different from Perl. Engines that are widespread but largely Perl-shaped (Java’s java.util.regex, Python’s re, ECMAScript, .NET) are out of scope; the differences are minor and a Perl reader can read them on sight.
| Engine | Where you meet it | Family |
|---|---|---|
| PCRE2 | nginx, PHP | Perl-extended |
| Emacs | Emacs Lisp regex search | Its own family |
| POSIX BRE | sed, grep, ed | POSIX Basic |
| POSIX ERE | awk, egrep, sed -E | POSIX Extended |
| RE2 / Go | Go’s standard library, Cloud Bigtable, code search | Linear-time hybrid |
Versions this chapter is anchored against: PCRE2 10.44, Emacs 30.x, POSIX 1003.1-2024, Go 1.22 (Go’s regexp and RE2 share one syntax and design, so the Go column also stands for RE2). The values in the tables below were verified against these versions; they will drift over time, so re-verify a row against the engine’s current documentation before relying on it.
What’s not in scope#
The following engines are deliberately omitted from the tables. Each is widespread enough that you may meet it; in each case, the differences from Perl are small enough that a Perl reader can read the regex with little adjustment.
- Java java.util.regex: Perl-derived; differences are flag spellings and a few escape conventions.
- Python re: Perl-derived; the biggest delta is lookbehind, which must be fixed-width in re, whereas Perl also accepts bounded variable-length lookbehind.
- ECMAScript (JavaScript): Perl-derived; lookbehind only since ES2018, no possessive quantifiers, sticky flag y instead of \G. Modern V8 / SpiderMonkey converge on Perl’s feature set.
- .NET regex: Perl-shaped, plus balanced groups ((?<name-pop>…)), which are unique. Niche enough to skip.
- Rust regex: RE2 lineage; the story is essentially Go’s with a slightly different feature surface (Unicode-aware classes by default, for one; like RE2, it has neither lookaround nor backreferences).
- Vim: different magic-mode semantics; Vim-internal.
- Boost.Regex: C++ niche; closely tracks PCRE2.
The five families, in a paragraph each#
PCRE2#
Perl-Compatible Regular Expressions, version 2 (PCRE2 replaced the original PCRE library in 2015). The reference implementation for «Perl regex outside of Perl». Reads almost identically to Perl 5.42 patterns and supports nearly the same feature set: backreferences, lookaround, atomic groups, possessive quantifiers, recursive subpatterns, named captures, conditional patterns, callouts. PCRE2 also has a JIT compiler, which perl’s own engine lacks, so PCRE2 is sometimes the faster of the two.
Where PCRE2 differs from Perl 5.42:
- (?{...}) and (??{...}): Perl runs arbitrary Perl code; PCRE2 has callouts ((?C) / (?Cn)) which call back into the host application. Not a security difference; just a different interface.
- A handful of flag spellings differ (PCRE2_CASELESS vs /i, etc.) at the API level, not in pattern syntax.
- Recursive subpattern atomicity: PCRE2 releases before 10.30 treated (?R) as atomic, whereas Perl allows backtracking into a recursed group. From 10.30 on, PCRE2 backtracks into recursion as Perl does, so the difference only matters for older embedded builds.
- The (*VERB) set is largely the same, with PCRE2 adding (*UTF), (*UCP), and other startup options.
For most patterns, «looks the same» is correct.
Emacs#
GNU Emacs’s regex engine is its own family. On the surface it looks BRE-shaped — \(...\) for grouping, \{m,n\} for counts — but inside it has Perl-lite features (non-greedy quantifiers, word boundaries) and Emacs-specific features (syntax classes \sw, category classes \cC) that no other engine has.
Notable Emacs-specific constructs:
- Buffer anchors: \` (backtick) is start-of-buffer, \' is end-of-buffer. Closer to Perl’s \A / \z than to ^ / $.
- Syntax classes: \sw matches a «word-syntax» character as defined by the current buffer’s syntax table; \s- matches whitespace-syntax. This is mode-dependent: the same regex matches different characters in C-mode vs Lisp-mode.
- Category classes: \cC matches a character in category C. Used heavily for CJK text classification.
- No \d shorthand: Emacs requires [0-9] or [[:digit:]]. (\w, \s, \b exist, but \d does not.)
- No lookahead, lookbehind, or \K.
If you read Emacs Lisp in 2026 you will see \\( and \\) constantly. The backslashes are doubled because Emacs Lisp string literals consume one level of backslash before the regex parser sees the rest.
POSIX BRE — sed, grep, ed#
POSIX Basic Regular Expressions. The engine you get from sed, grep, and ed with no flags. The most surprising thing for a Perl reader: the metacharacter set is smaller. Bare +, ?, {m,n}, (, ), and | are literal characters. The metacharacter forms are the backslashed variants: \+, \?, \{m,n\}, \(, \), \| (the last is a GNU extension; strict POSIX BRE has no alternation at all).
# POSIX BRE, finds the literal text "a+b":
echo 'a+b' | grep 'a+b'
# To match "one or more a", under BRE:
echo 'aaab' | grep 'a\+b'
Backreferences (\1–\9) work in BRE (this is older than POSIX ERE’s no-backreferences rule). POSIX classes [[:digit:]] etc. work. \< and \> (GNU extension) provide word boundaries. Shortened character escapes like \d, \s, \w are not supported.
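A minimal sketch of those BRE rules, using default grep. The helper name bre_match and the sample strings are made up for illustration, and the behaviour assumes a POSIX-conformant grep.

```shell
# bre_match PATTERN TEXT (hypothetical helper): prints "match" or "no match"
# after testing TEXT against PATTERN with default grep, which reads BRE.
bre_match() {
    if printf '%s\n' "$2" | grep -q -- "$1"; then
        echo 'match'
    else
        echo 'no match'
    fi
}

# \( \) groups, and \1 must repeat exactly what the group matched.
bre_match '\(ab\)\1' 'xababx'     # "abab" is present
bre_match '\(ab\)\1' 'xabcabx'    # "ab" occurs twice, but never back to back

# [[:digit:]] is the portable spelling of what Perl writes as \d.
bre_match '^[[:digit:]][[:digit:]]*$' '2024'
bre_match '^[[:digit:]][[:digit:]]*$' '20x4'
```

The four calls should print match, no match, match, no match, in that order.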
grep and sed share the same engine and the same default: both read BRE unless a flag says otherwise. grep -E and sed -E select ERE, and GNU grep -P invokes PCRE2.
POSIX ERE — awk, egrep, sed -E#
POSIX Extended Regular Expressions, the engine awk defaults to and the one selected by egrep (now grep -E) and sed -E / sed -r. Bare +, ?, {m,n}, (...), and | are metacharacters here; this is the family closest to «what a Perl-trained programmer expects» among the POSIX-conformant set.
What strict POSIX ERE does not have, by spec:
- Backreferences in the pattern (\1, \2, …). They are legal in sed substitution replacements but not in match patterns. GNU extensions add them; portable scripts should not rely on this.
- Lookaround.
- Possessive quantifiers, atomic groups.
- \d, \s, \w, \b (use [0-9], [[:space:]], etc.).
The most common surprise: an ERE \( is a literal opening paren. A backslash before punctuation creates a metacharacter in one family and removes one in the other. Memorise: BRE backslash enables metacharacters; ERE backslash disables them.
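The backslash rule can be checked directly by feeding the same two patterns to both families. This is a sketch only: sample_lines, bre_count, and ere_count are hypothetical helper names, and the two input lines are invented for the demonstration.

```shell
# Two input lines: a bare "a", and an "a" wrapped in literal parentheses.
sample_lines() { printf '%s\n' 'a' '(a)'; }

# Count matching lines under each family. grep -c exits non-zero when the
# count is 0, so "|| true" keeps the helpers usable under "set -e".
bre_count() { sample_lines | grep -c -- "$1" || true; }
ere_count() { sample_lines | grep -E -c -- "$1" || true; }

bre_count '\(a\)'    # BRE: backslash enables grouping, so both lines match
ere_count '\(a\)'    # ERE: backslash makes the parens literal, one line matches
bre_count '(a)'      # BRE: bare parens are literal, one line matches
ere_count '(a)'      # ERE: bare parens group, so both lines match
```

The counts come out as 2, 1, 1, 2: the same pattern text flips meaning when the family changes.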
RE2 / Go regexp#
Google’s RE2 is a fundamentally different architecture. RE2 constructs an NFA at compile time and uses a lazy DFA (constructing DFA states on demand and caching them) at match time. The result is guaranteed linear time in the input length, no matter the pattern.
The trade-off: RE2 leaves out every feature that an automaton cannot match within that linear bound. Specifically:
- No backreferences in patterns. \1, \g{1}, \k<name>: none. A pattern like (.)\1 simply cannot be compiled.
- No general lookaround. RE2 supports anchors and \b (zero-width assertions), but not (?=...), (?<=...), (?!...), or (?<!...).
- No atomic groups, no possessive quantifiers, no recursive subpatterns, no embedded code.
What RE2 does support, and what makes it pleasant for the common case:
- Named captures with the Python-shaped syntax (?P<name>...).
- ASCII-only \d, \w, \s. There is no (?u) switch in RE2 or Go; Unicode matching is spelled out with property classes such as \pL or \p{Greek}.
- The (?flags) and (?flags:...) modifier syntaxes.
- A swap-default-greediness flag (?U): under it, * is non-greedy and *? is greedy. The opposite of Perl.
If you have ever written a Perl regex that catastrophically backtracked, RE2 is what you write next. The cost is the missing features above; the upside is the engine cannot be DoSed by a malicious input.
Go’s standard regexp package is a separate, pure-Go implementation of the same design and syntax as RE2 rather than a wrapper around the C++ library; its Go-isms do not change the fundamentals.
Feature comparison tables#
The remainder of this chapter is row-major: rows are features, columns are engines. This layout optimises for the «I know one engine, what does X look like in another?» use case. Cell values are concise and may simplify edge cases — verify against the engine’s own reference for production code.
Quantifiers, grouping, alternation#
| Feature | Perl 5.42 | PCRE2 | Emacs | POSIX BRE | POSIX ERE | RE2 / Go |
|---|---|---|---|---|---|---|
| `*` | meta | meta | meta | meta | meta | meta |
| `+` | meta | meta | meta | literal (use `\+`) | meta | meta |
| `?` | meta | meta | meta | literal (use `\?`) | meta | meta |
| `{m,n}` | meta | meta | `\{m,n\}` | `\{m,n\}` | meta | meta |
| `(...)` | meta | meta | `\(...\)` | `\(...\)` | meta | meta |
| Alternation bar | meta | meta | backslash-bar | backslash-bar (GNU only) | meta | meta |
|  | yes | yes | no | no | no | yes |
| Possessive | yes | yes | no | no | no | no |
| Non-greedy | yes | yes | yes | no | no | yes |
| Atomic group | yes | yes | no | no | no | no |
The single highest-value row for a Perl reader meeting BRE: + and ? are literal. A regex like colou?r reads as «matches colou?r exactly» under BRE. The rule is consistent — BRE’s metacharacter set is small; ERE expanded it; Perl expanded it further.
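The colou?r case is easy to reproduce. The sketch below assumes a POSIX-style grep; spellings, bre_hits, and ere_hits are hypothetical helper names, and the three sample lines are invented.

```shell
# Three candidate lines; the last one contains a literal question mark.
spellings() { printf '%s\n' 'color' 'colour' 'colou?r'; }

# Default grep reads the pattern as BRE: ? is an ordinary character.
bre_hits() { spellings | grep -- 'colou?r'; }
# grep -E reads the same text as ERE: ? makes the preceding u optional.
ere_hits() { spellings | grep -E -- 'colou?r'; }

echo 'BRE:'; bre_hits    # only the line with the literal "?"
echo 'ERE:'; ere_hits    # "color" and "colour", but not "colou?r"
```

Under ERE the line colou?r itself no longer matches, because the ? has stopped being a character to find.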
Character shorthand classes#
What does \d match? It depends. The columns below show what character set each engine treats as digits, word characters, and whitespace under default settings.
| Shorthand | Perl 5.42 (default) | Perl 5.42 (Unicode rules) | PCRE2 (default) | Emacs | POSIX BRE | POSIX ERE | RE2 / Go (default) |
|---|---|---|---|---|---|---|---|
| `\d` | ASCII or Unicode | Unicode | ASCII | NO | NO | NO | ASCII (Unicode via `\p{...}` classes) |
| `\w` | ASCII or Unicode | Unicode | ASCII | yes (syntax-table-driven) | NO | NO | ASCII (Unicode via `\p{...}` classes) |
| `\s` | ASCII or Unicode | Unicode | ASCII | yes | NO | NO | ASCII (Unicode via `\p{...}` classes) |
| `\b` | yes | yes | yes | yes | NO | NO | yes |
|  | yes | yes | yes | NO | NO | NO | NO |
|  | yes | yes | yes | NO | NO | NO | NO |
Two surprises here for a Perl reader:
- POSIX BRE and ERE both lack \d, \w, \s. The portable forms are [0-9], [[:alnum:]_], [[:space:]]. Tools that accept some of these shorthands (GNU grep and GNU awk take \w and \s, but not \d outside grep -P) do so as a non-portable extension.
- Emacs has \w and \s but no \d. The \s character is followed by a syntax-class character (\s- for whitespace, \sw for word), which is unique to Emacs.
POSIX bracket classes like [[:digit:]] and [[:alpha:]] work in every engine in the table; they are the portable spelling.
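As a small check of that portability claim for the POSIX tools only (the other engines are not exercised here), the sketch below runs one bracket class through grep, grep -E, and sed. The helper names and the sample lines are hypothetical.

```shell
# Three sample lines; two of them contain at least one digit.
sample() { printf '%s\n' 'abc' 'a1c' '42'; }

digits_bre() { sample | grep '[[:digit:]]'; }         # grep, BRE
digits_ere() { sample | grep -E '[[:digit:]]+'; }     # grep -E, ERE
digits_sed() { sample | sed -n '/[[:digit:]]/p'; }    # sed, BRE

digits_bre
digits_ere
digits_sed
```

Each helper should print the same two lines, a1c and 42, with no change to the class itself.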
Anchors and newline handling#
Feature | Perl 5.42 | PCRE2 | Emacs | POSIX BRE / ERE | RE2 / Go |
|---|---|---|---|---|---|
| yes | yes | start-of-buffer | yes | yes |
|
|
| always (line-oriented) | not specified |
|
| yes | yes | end-of-buffer | yes | yes |
| yes | yes | no | varies | yes |
|
|
| no | no |
|
| yes | yes |
| no | yes ( |
| yes | yes | no | no | no (use |
| yes | yes | no | no | no |
Emacs uses \` and \' for the buffer anchors, where most modern engines use \A and \z. The Emacs spelling is older; it predates POSIX. Mixing up ^ and \` matters more in Emacs code than mixing up ^ and \A does in Perl: Emacs searches usually scan an entire buffer, where ^ matches at the start of every line.
\G is specific to Perl and PCRE2. Go’s regexp has no equivalent inside the pattern: the caller tracks the position, using the index-returning methods (FindStringIndex and its relatives) to learn where a match ended and resuming from there.
Lookaround and atomic constructs#
| Feature | Perl 5.42 | PCRE2 | Emacs | POSIX | RE2 / Go |
|---|---|---|---|---|---|
| Lookahead | yes | yes | NO | NO | NO |
| Fixed-width lookbehind | yes | yes | NO | NO | NO |
| Variable-length lookbehind | yes (experimental) | yes | NO | NO | NO |
| `\K` | yes | yes | NO | NO | NO |
| Atomic group | yes | yes | NO | NO | NO |
| Possessive quantifiers | yes | yes | NO | NO | NO |
Effectively, of the non-Perl engines in this table only PCRE2 supports lookaround (as do the Perl-derived engines outside it: Java, Python, .NET). POSIX tools, Emacs, and RE2 / Go simply do not have it. Patterns that need lookaround do not port across this divide.
Captures and backreferences#
| Feature | Perl 5.42 | PCRE2 | Emacs | POSIX BRE | POSIX ERE | RE2 / Go |
|---|---|---|---|---|---|---|
| Numbered captures | yes | yes | yes | yes | yes | yes |
| Backreferences in pattern (`\1`) | yes (1–99) | yes | yes | yes (`\1`–`\9`) | strict no, GNU yes | NO |
| Named captures | yes | yes | NO | NO | NO | yes (`(?P<name>...)`) |
| Branch reset | yes | yes | NO | NO | NO | NO |
| Recursive subpatterns | yes | yes | NO | NO | NO | NO |
| Conditional patterns | yes | yes | NO | NO | NO | NO |
| Embedded code `(?{...})` | yes | callouts | NO | NO | NO | NO |
|  | yes | NO | NO | NO | NO | NO (different API) |
The single most important takeaway: RE2 / Go regexp does not support backreferences. Patterns like /(.)\1/ cannot be compiled. This is not an engine bug; it is the price RE2 pays for guaranteed linear time. A pattern with backreferences can describe a language that is not regular, and no finite automaton, DFA or NFA, can recognise such a language.
If you are porting Perl regexps to a Go service, the audit question is «does this pattern use backrefs?» — if so, the conversion is not mechanical.
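That audit question can be given a rough first pass with grep itself. The sketch below is a textual heuristic under stated assumptions, not a regex parser: uses_backref is a hypothetical helper, it only knows the \1, \g{...}, and \k<name> spellings, and it reports an escaped backslash followed by a digit (\\1) as a false positive.

```shell
# uses_backref PATTERN (hypothetical helper): prints "yes" if PATTERN appears
# to contain a Perl backreference, else "no". Heuristic only; see lead-in.
uses_backref() {
    # \1..\9 | \g1, \g{1}, \g-1, \g{-1} | \g{name} | \k<name>, \k{name}
    if printf '%s\n' "$1" |
        grep -E -q '\\([1-9]|g\{?-?[0-9]|g\{[A-Za-z_]|k[<{])'; then
        echo 'yes'
    else
        echo 'no'
    fi
}

uses_backref '(.)\1'              # numbered backreference
uses_backref '(?<q>a+)b\k<q>'     # named backreference
uses_backref '\d+\.\d+'           # escapes, but no backreference
```

The three calls should print yes, yes, no. Anything flagged still needs a human to decide how the pattern should be restructured for Go.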
Engine-architecture summary#
Property | PCRE2 / Perl 5.42 | Emacs | POSIX BRE / ERE (typical impl) | RE2 / Go |
|---|---|---|---|---|
Algorithm | Traditional NFA + JIT (PCRE2) | Traditional NFA | DFA (GNU) / NFA (others) | Hybrid NFA → lazy DFA |
Worst-case time | Exponential | Exponential | Linear (DFA) | Linear (guaranteed) |
Backtracking model | yes | yes | DFA: no | no |
Catastrophic backtracking risk | yes | yes | no (DFA) | no |
Backreferences supported | yes | yes | varies | NO |
POSIX BRE/ERE is implementation-dependent: GNU grep and GNU awk use a hybrid DFA-with-NFA-fallback that is fast and not subject to catastrophic backtracking; older traditional implementations (historical awk, traditional grep on some BSDs) use a backtracking NFA. The portable assumption is «linear time, but verify if your input may be hostile».
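One way to act on «verify if your input may be hostile» is to probe the local grep with a classic near-miss input. This is a sketch under assumptions: hostile_line and probe are hypothetical names, the length of 5000 is arbitrary, and no timing is asserted.

```shell
# A line of 5000 "a" characters followed by one "b" (the size is arbitrary).
hostile_line() {
    printf '%5000s' '' | tr ' ' 'a'
    printf 'b\n'
}

# ^(a+)+$ is a textbook catastrophic pattern for backtracking engines when
# the input almost matches. grep -E here is whatever the system provides.
probe() {
    if hostile_line | grep -E -q '^(a+)+$'; then
        echo 'match'
    else
        echo 'no match'
    fi
}

probe
```

The answer is no match, since the trailing b breaks the pattern. On an automaton-based grep the call should return promptly; prefixing it with the shell's time keyword is one way to check a particular system.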
Selecting an engine when you have the choice#
If you are writing the program (not reading code that already chose), the engine you pick is mostly determined by the language you are programming in. The exceptions are when the same library ships multiple engines, and when an external regex tool is your choice:
- A Perl program: Perl’s built-in. No reason to reach outside.
- A C program embedding regex: PCRE2. The standard library’s regex.h is POSIX BRE/ERE; PCRE2 is what you want for Perl-like behaviour.
- A Go service handling untrusted input: regexp (RE2). The linear-time guarantee is the headline feature; the missing backreferences and lookaround are usually acceptable.
- A Rust program: the regex crate. RE2-lineage; same linear-time guarantee.
- Shell scripting: prefer grep -E (ERE) and sed -E (ERE) over the BRE defaults for Perl-readability. Use grep -P (PCRE2) only when ERE genuinely is not enough.
- An Emacs Lisp function: Emacs’s. Otherwise leave Emacs.
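To make the readability argument for shell scripts concrete, here is one substitution written in both families. It is a sketch: swap_bre and swap_ere are hypothetical names, and the name=number input format is invented for the example.

```shell
# Task: turn "name=number" into "number=name". Same logic, two families.
swap_bre() { sed 's/\([a-z]\{1,\}\)=\([0-9]\{1,\}\)/\2=\1/'; }   # default sed: BRE
swap_ere() { sed -E 's/([a-z]+)=([0-9]+)/\2=\1/'; }              # sed -E: ERE

echo 'port=8080' | swap_bre
echo 'port=8080' | swap_ere
```

Both commands print 8080=port. The ERE version is the one a Perl reader can check at a glance.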
See also#
The character classes chapter — the shorthand-by-engine row appears in its L3 sidebar.
The anchors and assertions chapter — the lookaround and anchor rows appear in its L3 sidebar.
The groups and captures chapter — the captures-and-backreferences row appears in its L3 sidebar.
The performance chapter — Traditional NFA model and catastrophic backtracking, the architectural reason RE2 exists.