regmust#

Return the longest anchored and longest floating fixed strings the optimiser extracted from a compiled pattern.

A fixed string is a substring that must appear in any string that matches. An anchored fixed string is one whose offset from the start of the match is known; a floating fixed string can appear anywhere in a range of positions. The optimiser uses both to skip positions that cannot match, preferring the longer one, or the floating one if they are the same length.
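To illustrate why a fixed string is useful, here is a minimal sketch of a cheap pre-filter built on the definition above: if a required substring is absent, the pattern cannot match. The helper name `cannot_match` is hypothetical and not part of `re`, and this shows only the idea, not how the optimiser itself is implemented.

```perl
use strict;
use warnings;
use re 'regmust';

# Hypothetical helper (not part of re): returns true when $text lacks a
# fixed string that every match of $qr is required to contain.
sub cannot_match {
    my ($qr, $text) = @_;
    my ($anchored, $floating) = regmust($qr);

    # Skip slots that are undef (not a pattern) or empty (no fixed string).
    for my $must (grep { defined($_) && length($_) } $anchored, $floating) {
        return 1 if index($text, $must) < 0;   # required text is absent
    }
    return 0;   # inconclusive: the regex engine still has to run
}

my $qr = qr/here .* there/x;
print cannot_match($qr, "nothing to see")  ? "skip\n" : "try match\n";
print cannot_match($qr, "here and there")  ? "skip\n" : "try match\n";
```

A false return only means the fixed strings are present; it says nothing about whether the full pattern matches.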

Synopsis#

use re 'regmust';
my ($anchored, $floating) = regmust(qr/here .* there/x);

# $anchored = 'here', $floating = 'there'

What you get back#

A two-element list ($anchored, $floating). Each element is either the fixed-string text as an SV or &PL_sv_no (the false-but-defined sentinel) when the optimiser has no fixed string of that kind. Returns undef if the argument is not a compiled pattern or its engine is unknown.
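The three possible outcomes (a string, a false-but-defined empty value, or undef) can be told apart with `defined` and `length`. The sketch below assumes the return conventions described above; `describe_must` is a hypothetical helper name, and the guard also tolerates an empty list in case a build returns nothing at all.

```perl
use strict;
use warnings;
use re 'regmust';

# Hypothetical helper: summarise what regmust() reports for its argument.
sub describe_must {
    my ($arg) = @_;
    my @got = regmust($arg);

    # undef (or nothing at all) means the argument was not a usable pattern.
    return 'not a compiled pattern' if !@got || !defined $got[0];

    # An empty but defined value marks a slot with no fixed string.
    my ($anchored, $floating) = map { length($_) ? "'$_'" : '(none)' } @got;
    return "anchored: $anchored, floating: $floating";
}

print describe_must($_), "\n" for qr/here .* there/x, qr/\d+/, "not a regex";
```

Testing with `length` rather than plain truthiness avoids misreading a fixed string such as `'0'` as "no fixed string".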

Examples#

my ($anchored, $floating) = regmust(qr/foo\s+bar/);  # $anchored = 'foo', $floating = 'bar'
($anchored, $floating)    = regmust(qr/\d+/);        # no fixed strings: both false but defined
my @got                   = regmust("not a regex");  # (undef): the argument wasn't a qr//

Edge cases#

  • Argument is not a qr// — returns undef.

  • Pattern’s engine is not the core engine — returns undef.

  • Pattern has only one kind of fixed string — the other slot is &PL_sv_no (false but defined).

  • UTF-8 substrings are returned as-is; the optimiser stores the UTF-8 form in a separate slot and this function prefers the byte form when both are present.

Differences from upstream#

  • The returned fixed strings are the values picked by this build’s optimiser; upstream’s POD notes that the exact result is optimiser-dependent and may change between Perl versions. The same caveat applies here.