Regular expressions and pattern matching
pos#
Report or set where the next /g regex match will resume in a string.
pos reads (or, as an lvalue, writes) the offset that the regex engine
stores on a scalar after a global match (m//g). The offset
counts characters, not bytes, and is the position just after the last
successful match, which is where the next /g iteration will start
scanning. With no argument pos operates on $_.
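To make the character-versus-byte distinction concrete, here is a minimal sketch. The sample string and the names $str, @ends and $byte_len are illustrative assumptions; Encode is used only to show the UTF-8 byte length for comparison.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Six characters; U+263A takes three bytes when encoded as UTF-8.
my $str      = "\x{263A}ab\x{263A}cd";
my $byte_len = length(encode_utf8($str));   # 10 bytes as UTF-8

my @ends;
while ($str =~ /\x{263A}/g) {
    push @ends, pos($str);   # offset just past each match, in characters
}

print "ends: @ends (byte length $byte_len)\n";
# ends: 1 4 (byte length 10)
```

If the offsets were counted in bytes, the same two matches would end at 3 and 8 instead.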
Synopsis#
pos SCALAR
pos $str = N
pos
What you get back#
An integer offset, or undef when no position is recorded. 0 is
a valid offset and means “start of string”; it is not the same as
undef, which means “no /g match has run, or the last one
failed and reset the position.” Always distinguish the two with
defined:
if (defined pos $str) {
    # a /g scan is in progress
}
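The three states described above (no position yet, a defined offset of 0, and a position reset by failure) can be observed directly. This is a small sketch; the variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $str = "abc";

# No /g match has run yet, so no position is recorded.
my $before = defined(pos($str)) ? 1 : 0;       # 0

# A zero-width match at the start records offset 0, which is defined.
$str =~ /(?=a)/g;
my $at_zero = pos($str);                       # 0

# A failed /g match (without /c) resets the position to undef.
$str =~ /z/g;
my $after_fail = defined(pos($str)) ? 1 : 0;   # 0

print "before=$before at_zero=$at_zero after_fail=$after_fail\n";
# before=0 at_zero=0 after_fail=0
```

Testing pos for truth instead of using defined would treat the second case like the other two.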
Used as an lvalue, pos SCALAR returns an assignable location:
pos($str) = 5; # next /g match starts at char offset 5
Global state it touches#
pos reads and writes the per-scalar regex position attached to its
operand. With no argument it targets $_. The stored offset is
what \G anchors against in the next match, so every assignment to
pos changes where \G binds.
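A short sketch of how an assignment to pos moves the point where \G binds. The string and variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $str = "aaa-bbb-ccc";

# \G binds at offset 4, which is the start of "bbb".
pos($str) = 4;
my $anchored = $str =~ /\G(\w+)/g ? $1 : undef;     # "bbb"
my $end      = pos($str);                           # 7

# Without \G the engine may scan forward from offset 3 (the "-").
pos($str) = 3;
my $unanchored = $str =~ /(\w+)/g ? $1 : undef;     # "bbb"

# With \G the match must begin exactly at offset 3, so it fails.
pos($str) = 3;
my $strict = $str =~ /\G(\w+)/g ? $1 : undef;       # undef

printf "%s %d %s %s\n", $anchored, $end, $unanchored,
    defined $strict ? $strict : "undef";
# bbb 7 bbb undef
```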
Examples#
Walk every word in a string with /g in scalar context, using pos
to report progress:
my $s = "one two three";
while ($s =~ /(\w+)/g) {
    printf "%-5s ends at %d\n", $1, pos $s;
}
# one   ends at 3
# two   ends at 7
# three ends at 13
Skip ahead before starting the scan. The first match begins at offset
4, not 0:
my $s = "AAA BBB CCC";
pos($s) = 4;
$s =~ /(\w+)/g;
print $1, "\n"; # BBB
Anchor a follow-up match to the previous one with \G. Without
\G the engine would scan forward past any gap; with it, the match
must start exactly where the last one ended:
my $s = "12ab34cd";
while ($s =~ /\G(\d+)(\w+?)(?=\d|\z)/g) {
    print "num=$1 tail=$2\n";
}
# num=12 tail=ab
# num=34 tail=cd
Restart a scan by clearing the position:
pos($s) = undef; # next /g starts from offset 0 again
Edge cases#
- Bare pos targets $_. pos inside while (<>) { ... } therefore reports the position on the current input line.
- Characters, not bytes. For a string containing multi-byte characters, pos returns the character offset. The (no longer recommended) use bytes pragma switches to byte offsets; new code should not rely on it.
- Failed /g match resets the position to undef. The next /g then starts over at offset 0. Add the /c modifier (m//gc) to preserve the position on failure, which is the usual idiom when trying several alternative \G-anchored patterns against the same string.
- Assignments during a match apply to the next match. Setting pos from inside the pattern or the replacement, as in (?{ pos() = 5 }) or s//pos() = 5/e, influences the next match, not the one currently running.
- Zero-length match flag. Setting pos also clears the internal "matched with zero-length" flag, so a subsequent zero-width match at the same position is allowed again. See perlre, under "Repeated Patterns Matching a Zero-length Substring".
- Non-lvalue operand. pos requires a real scalar variable for the lvalue form; pos("literal") = 3 is a compile-time error.
- Offset 0 vs undef. pos of 0 means the next /g starts at the beginning; undef means no position is set. They behave identically for the first match but differ once the state of the zero-length-match flag is considered.
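The /gc idiom with several alternative \G-anchored patterns can be sketched as a tiny tokenizer. The function name tokenize, the token labels and the patterns are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: each branch tries one \G-anchored pattern.
# /c keeps pos where it was when a branch fails, so the next branch
# retries from the same offset.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length($src)) {
        if    ($src =~ /\G\s+/gc)             { next }
        elsif ($src =~ /\G(\d+)/gc)           { push @tokens, "NUM:$1" }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, "ID:$1" }
        elsif ($src =~ /\G([-+*\/=])/gc)      { push @tokens, "OP:$1" }
        else {
            die sprintf "unexpected character at offset %d\n", pos($src);
        }
    }
    return @tokens;
}

my @tokens = tokenize("x = 42 + y1");
print join(" ", @tokens), "\n";
# ID:x OP:= NUM:42 OP:+ ID:y1
```

Without /c, the first failing branch would reset the position to undef and the loop condition would no longer see a usable offset.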
Differences from upstream#
Fully compatible with upstream Perl 5.42.
See also#
- m: the /g modifier is what creates and advances the position pos reads; the /c modifier preserves it on failure
- qr: precompile a pattern once, then reuse it in \G-anchored /g loops without reparsing
- split: the other everyday way to walk a string in pieces; no pos involved, but often a cleaner choice when the delimiters are simple
- \G: zero-width anchor that binds to the current pos; the main reason to touch pos in the first place
- $_: default target of bare pos