Regex-driven one-liners#
The previous chapter kept regexps at the level sed or awk users
already know: literal patterns inside m// and s///. This chapter
covers one-liners whose shape depends on a regex feature beyond plain
matching — extractions, captures, lookarounds, code-in-replacement,
and non-destructive rewrites.
Readers who already know the constructs ((?:...), (?<name>...),
(?=...), the /e and /r modifiers) can skim for the command-line
framing. Readers who know what /g does but have never written /e
will get the most out of reading straight through.
For the regex language itself, see the regular expressions guide.
Extracting all matches#
/g in list context returns every match, not just the first.
# Every integer in the input
pperl -ne 'print "$_\n" for /-?\d+/g' file.txt
# Every IPv4-shaped substring
pperl -lne 'print for /\b(?:\d{1,3}\.){3}\d{1,3}\b/g' access.log
# Every HTTP header value (naive: no folded headers)
pperl -nE 'say $1 if /^[\w-]+:\s*(.+)$/' headers.txt
say (enabled by -E) appends a
newline, saving one "\n" and keeping the program short.
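The list-context behaviour is easy to check in a short script. The sketch below uses an invented sample string and shows one detail the one-liners above rely on: without capture groups /g returns whole matches, and with capture groups it returns the captures instead.

```perl
use strict;
use warnings;

# Made-up sample line, for illustration only.
my $line = "x=10, y=-3, z=42";

# No capture groups: /g in list context returns each whole match.
my @ints = $line =~ /-?\d+/g;

# With capture groups: /g returns the captures of every match, flattened.
my @pairs = $line =~ /(\w+)=(-?\d+)/g;

print "ints:  @ints\n";     # ints:  10 -3 42
print "pairs: @pairs\n";    # pairs: x 10 y -3 z 42
```

This is why the IPv4 recipe uses a non-capturing group (?:...): a capturing group there would make /g return only the captured fragment rather than the whole address.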
Non-greedy quantifiers#
*?, +?, ??, {n,m}? — match as few characters as possible.
# Text between each < and the next > (tag by tag)
pperl -lne 'print for /<(.+?)>/g' markup.html
# Every quoted string (double quotes, no escape handling)
pperl -lne 'print for /"([^"]*)"/g' config.txt
The negated character class [^"]* is usually a faster and safer
alternative to the non-greedy .*? when you know which character must
not appear inside the match.
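To see the difference side by side, the following sketch runs a greedy, a non-greedy, and a negated-class pattern over the same invented markup string.

```perl
use strict;
use warnings;

# Made-up sample input.
my $html = '<b>bold</b> and <i>italic</i>';

my @greedy  = $html =~ /<(.+)>/g;      # one match: runs to the last ">"
my @lazy    = $html =~ /<(.+?)>/g;     # stops at the nearest ">"
my @negated = $html =~ /<([^>]+)>/g;   # same result, ">" can never be inside

print scalar(@greedy), " greedy match: $greedy[0]\n";
print "lazy:    @lazy\n";       # lazy:    b /b i /i
print "negated: @negated\n";    # negated: b /b i /i
```

The greedy form yields a single capture, b>bold</b> and <i>italic</i, because .+ first consumes the rest of the line and only then backs off to the last ">".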
Captures in substitution#
The right-hand side of s/// can reference left-hand captures.
# Swap every "word1 word2" pair
pperl -pe 's/(\w+)\s+(\w+)/$2 $1/g' file.txt
# Quote every unquoted value in key=value pairs
pperl -pe 's/(\w+)=([^"\s]+)/$1="$2"/g' config.txt
# Increase every trailing number by one
pperl -pe 's/(\d+)$/$1 + 1/e' file.txt
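As a quick check of how captures flow into the replacement, here is a small script with invented sample strings. The quoting step captures key and value in a single substitution, and the last line borrows the /e modifier described in the next section.

```perl
use strict;
use warnings;

# Made-up sample lines.
my $swap = "hello world foo bar";
my $conf = "host=example port=8080";
my $ver  = "build 41";

$swap =~ s/(\w+)\s+(\w+)/$2 $1/g;        # swap each pair of words
$conf =~ s/(\w+)=([^"\s]+)/$1="$2"/g;    # quote each bare value
$ver  =~ s/(\d+)$/$1 + 1/e;              # /e: replacement is an expression

print "$swap\n";    # world hello bar foo
print "$conf\n";    # host="example" port="8080"
print "$ver\n";     # build 42
```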
/e — replacement is Perl code#
The /e modifier treats the replacement as code; its return value is
what gets substituted in.
# Number every word in the file (global counter)
pperl -pe 's/(\w+)/++$i . ".$1"/ge' file.txt
# Number words per line (counter reset each line)
pperl -pe '$i = 0; s/(\w+)/++$i . ".$1"/ge' file.txt
# Rewrite every integer in hexadecimal
pperl -pe 's/\d+/sprintf "0x%x", $&/ge' file.txt
# Convert Celsius-labelled numbers to Fahrenheit
# (brace delimiters, so the "/" in the arithmetic does not end the replacement)
pperl -pe 's{(\d+)C\b}{sprintf "%.1fF", $1 * 9 / 5 + 32}ge' weather.txt
/ge together: evaluate the right-hand side as code, do it for every
match. $& is the whole matched text; $1, $2, … are captures.
/ee evaluates twice — the first e produces a string, the second
evaluates that string. Useful when the replacement string is itself
dynamic code read from input; needed rarely, a hazard often.
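The sketch below exercises /e, a shared counter, and /ee on invented data. The /ee case uses a fixed string on purpose: evaluating text that arrives from real input would run whatever code that input contains.

```perl
use strict;
use warnings;

my $text = "10 apples and 255 pears";   # made-up sample

# /e: the replacement is an expression; its value is substituted.
(my $hex = $text) =~ s/(\d+)/sprintf "0x%x", $1/ge;

# A counter that persists across matches.
my $i = 0;
(my $numbered = $text) =~ s/(\w+)/++$i . ".$1"/ge;

# /ee: the first pass evaluates $1 to the string "2 + 3",
# the second pass evaluates that string as Perl code.
(my $sum = "2 + 3") =~ s/(.+)/$1/ee;

print "$hex\n";        # 0xa apples and 0xff pears
print "$numbered\n";   # 1.10 2.apples 3.and 4.255 5.pears
print "$sum\n";        # 5
```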
/r — non-destructive substitution#
/r returns the modified string instead of rewriting $_. The
original stays intact.
# Print each filename and its .bak variant side by side
ls *.txt | pperl -lpe '$_ = "$_ " . s/$/.bak/r'
# Compute a transformed version without touching $_
pperl -ne 'my $upper = $_ =~ tr/a-z/A-Z/r; print $_, $upper' file.txt
Reach for /r when a single line needs to be printed in two forms, or
when the transformation is conditional and you want the untouched line
back in the failure branch.
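A minimal sketch of those properties, using invented strings: /r leaves its operand alone, returns the original text when nothing matches, and chains because each step returns a string.

```perl
use strict;
use warnings;

my $name  = "report.txt";         # made-up filename
my $title = "  Hello, World  ";   # made-up title

# /r returns the changed copy; $name itself is untouched.
my $backup = $name =~ s/\.txt$/.bak/r;

# No match: /r hands back the original string unchanged.
my $same = $name =~ s/\.csv$/.bak/r;

# Each /r returns a string, so the steps chain left to right:
# trim, replace non-word runs with "-", lowercase.
my $slug = $title =~ s/^\s+|\s+$//gr =~ s/\W+/-/gr =~ tr/A-Z/a-z/r;

print "$name -> $backup\n";   # report.txt -> report.bak
print "$same\n";              # report.txt
print "$slug\n";              # hello-world
```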
Named captures#
(?<name>...) captures into the %+ hash; read an individual capture as $+{name}.
# Parse combined-log-format access lines
pperl -nE '
if (/^(?<ip>\S+).*?"(?<method>\S+) (?<path>\S+)/) {
say "$+{ip} $+{method} $+{path}";
}
' access.log
Named captures pay for themselves once the pattern has more than two groups, or when the same group needs to be referenced both in the pattern and in the replacement code.
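The script below is a small illustration with an invented log line. It copies %+ right after the match, since the hash reflects only the most recent successful match, and it shows \k<name>, the in-pattern way to refer back to a named group.

```perl
use strict;
use warnings;

# Made-up log line in combined-log style.
my $line = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512';

my %hit;
if ($line =~ /^(?<ip>\S+).*?"(?<method>\S+) (?<path>\S+)/) {
    %hit = %+;    # copy now: a later successful match replaces the contents of %+
}

# \k<w> matches the same text that group "w" captured.
my ($doubled) = "it was was fine" =~ /\b(?<w>\w+) \k<w>\b/;

print "$hit{ip} $hit{method} $hit{path}\n";   # 203.0.113.7 GET /index.html
print "$doubled\n";                            # was
```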
Lookahead and lookbehind#
Zero-width assertions: match a position where the surrounding text satisfies a condition, without consuming the condition itself.
# Digits preceded by $ (e.g. "$42" → 42, ignore "42")
pperl -lne 'print for /(?<=\$)\d+/g' prices.txt
# Words NOT followed by "!"
pperl -lne 'print for /\b\w+\b(?!!)/g' shouts.txt
# Insert CRLF only where LF is not already preceded by CR
pperl -pe 's/(?<!\r)\n/\r\n/g' unix.txt
Common pairs:
| Lookaround | Meaning |
|---|---|
| (?=...) | Next characters match |
| (?!...) | Next characters do not match |
| (?<=...) | Preceding characters match |
| (?<!...) | Preceding characters do not match |
Lookbehind in pperl supports variable-length patterns; see regular expressions: anchors and assertions.
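As a sketch of the assertions working together, the script below runs them over invented strings. The last statement combines a lookbehind and a lookahead to find the zero-width positions where a thousands separator belongs; only fixed-length lookbehind is used here.

```perl
use strict;
use warnings;

my $text = 'cost $42, qty 7, was $100';   # made-up sample

my @dollars = $text =~ /(?<=\$)\d+/g;       # digits preceded by "$"
my @plain   = $text =~ /(?<![\$\d])\d+/g;   # digits not preceded by "$" or a digit
my @quiet   = "stop! go wait!" =~ /\b\w+\b(?!!)/g;   # words not followed by "!"

# Match the empty gap that has a digit behind it and a whole
# number of three-digit groups ahead of it, then insert a comma there.
(my $grouped = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+(?!\d))/,/g;

print "@dollars\n";   # 42 100
print "@plain\n";     # 7
print "@quiet\n";     # go
print "$grouped\n";   # 1,234,567
```

Because the assertions consume nothing, the substitution replaces an empty string: no digit is removed or rewritten, and commas appear only in the gaps.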
Multi-line matches across slurped input#
-0777 slurps the entire file into $_. The regex modifiers that
matter in that mode:
- /s: . matches \n.
- /m: ^ and $ match at every line start/end, not just the start/end of the whole string.
# Every BEGIN...END block (non-greedy), each on one output line
pperl -0777 -nE '
    while (/BEGIN(.*?)END/sg) {
        (my $body = $1) =~ s/\n/ /g;
        say $body;
    }
' text.txt
# Paragraphs where every line starts with ">"
# (\n* absorbs the blank lines that end a paragraph-mode record)
pperl -00 -ne 'print if /\A(?:^>.*\n?)+\n*\z/m' mail.txt
\A and \z anchor the whole string start and end, unaffected by
/m. Use them when /m is on but you really do mean the whole
string.
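The effect of each modifier can be checked on a string that stands in for a slurped file. The sample text below is invented; in a one-liner, -0777 would put the file contents in $_ instead.

```perl
use strict;
use warnings;

# Stand-in for a slurped file (what -0777 would place in $_).
my $text = "intro\nBEGIN one\ntwo END\nmiddle\nBEGIN three END\n";

# /s lets "." cross newlines, so a block may span lines.
my @blocks = map { s/\n/ /gr } $text =~ /BEGIN\s*(.*?)\s*END/sg;

# /m makes ^ match at every line start ...
my @line_starts = $text =~ /^(\w+)/mg;

# ... while \A still means the start of the whole string.
my @string_start = $text =~ /\A(\w+)/mg;

print join("|", @blocks), "\n";   # one two|three
print "@line_starts\n";           # intro BEGIN two middle BEGIN
print "@string_start\n";          # intro
```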
Transliteration with tr#
tr (also spelt
y) is character-by-character
substitution. Not a regex, but a close neighbour — and the right tool
for character-set jobs.
pperl -pe 'tr/A-Za-z/N-ZA-Mn-za-m/' # ROT13
pperl -pe 'tr/A-Za-z/a-zA-Z/' # swap case
pperl -pe 'tr/a-z//d' # delete lowercase letters
pperl -pe 'tr/ \t/ /s' # squeeze whitespace runs to one space
# Count commas on each line (tr returns the count)
pperl -ne 'print tr/,//, "\n"' data.csv
The d flag deletes characters in the left set that have no
counterpart on the right. The s flag squeezes runs. The c flag
complements the left set (everything NOT listed).
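The flags are easiest to compare on one string. This sketch uses an invented sample and the /r flag of tr, which returns the result instead of modifying the variable, so every line starts from the same input.

```perl
use strict;
use warnings;

my $s = "Hello,   World, 2024";   # made-up sample

my $commas   = ($s =~ tr/,//);                      # empty right side, no d: count only
my $rot13    = $s =~ tr/A-Za-z/N-ZA-Mn-za-m/r;      # map letter to letter
my $no_lower = $s =~ tr/a-z//dr;                    # d: delete the listed characters
my $squeezed = $s =~ tr/ \t/ /sr;                   # s: squeeze runs to one character
my $digits   = $s =~ tr/0-9//cdr;                   # c + d: delete everything NOT a digit

print "$commas\n";     # 2
print "$rot13\n";      # Uryyb,   Jbeyq, 2024
print "$no_lower\n";   # H,   W, 2024
print "$squeezed\n";   # Hello, World, 2024
print "$digits\n";     # 2024
```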
Parsing delimited substrings#
CSV with quoted commas#
A quick-and-dirty split that honours double-quoted fields:
# -l strips the newline so it cannot end up inside the last field
pperl -lne '
my @F = /"([^"]*)"|([^,]+)/g;
@F = grep defined, @F;
print join("|", @F);
' quoted.csv
Good enough for a one-off, though empty unquoted fields (a,,b) are
silently dropped. For anything that must handle
embedded quotes ("he said ""hi"""), use Text::CSV; see
numeric.
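To make the trade-off concrete, the sketch below applies the same two-alternative pattern to two invented records: one with a quoted comma, which it handles, and one with an empty field, which it loses.

```perl
use strict;
use warnings;

# Made-up sample records.
my @lines = (
    'alice,"Paris, France",30',
    'bob,,41',                    # empty middle field
);

my @out;
for my $line (@lines) {
    # Each match sets exactly one of the two groups; drop the undef one.
    my @fields = grep { defined } $line =~ /"([^"]*)"|([^,]+)/g;
    push @out, join("|", @fields);
}

print "$_\n" for @out;
# alice|Paris, France|30
# bob|41
```

The second record comes out with two fields instead of three, which is the kind of silent data loss that makes a real CSV parser worth the dependency.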
URL query strings#
# Each key=value pair on its own line
# (no capture groups, so /g returns each whole pair)
pperl -lne 'print for /[^?&=]+=[^&]*/g' urls.txt
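When the keys and values are needed separately, capture groups plus a hash assignment do the splitting. A minimal sketch, assuming an invented URL and no percent-decoding or repeated keys:

```perl
use strict;
use warnings;

my $url = 'https://example.com/search?q=perl&page=2&lang=';   # made-up URL

# Two capture groups with /g in list context yield key, value, key, value, ...
# which is exactly the shape a hash assignment expects.
my %param = $url =~ /([^?&=]+)=([^&]*)/g;

print "$_=$param{$_}\n" for sort keys %param;
# lang=
# page=2
# q=perl
```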
Practical extraction one-liners#
Words by length#
# Words of exactly 5 characters
pperl -lne 'print for /\b\w{5}\b/g' file.txt
Numbers with units#
pperl -lne 'print for /\b\d+(?:\.\d+)?\s*(?:ms|s|kb|mb|gb)\b/gi' timings.txt
HTTP User-Agent from a request log#
pperl -nE 'say $1 if /^User-Agent: (.+)$/' request.log
Balanced braces (as many as pperl’s regex engine supports)#
pperl’s regex engine supports recursion via (?R) and (?1), so
matching balanced, nested braces is a one-liner:
pperl -0777 -nE 'say $& while /\{(?:[^{}]|(?R))*\}/g' code.txt
For deeply nested structures, reach for a parser. Regex is the right tool up to the complexity where it stops being one.
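The following sketch shows both recursion forms on invented input, written in the standard Perl syntax for (?R) and (?1). Braces inside strings or comments are not handled, which is one of the points where a parser takes over.

```perl
use strict;
use warnings;

my $code = 'a { b { c } d } e { f }';   # made-up sample

# (?R) re-enters the whole pattern, so an inner {...} is matched
# by the same rule as the outer one.
my @groups;
push @groups, $& while $code =~ /\{(?:[^{}]|(?R))*\}/g;

# (?1) recurses into group 1 only, so the pattern can carry a prefix
# that is not part of the recursion.
my ($body) = 'if (x) { y { z } }' =~ /\(x\)\s*(\{(?:[^{}]|(?1))*\})/;

print "$_\n" for @groups;
# { b { c } d }
# { f }
print "$body\n";   # { y { z } }
```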
Find out more#
- regular expressions guide: the language underneath every recipe here, including modifiers, performance, and Unicode.
- m: the match operator.
- s: substitution, the full set of modifiers.
- tr: transliteration reference.
- progression: simpler s/// recipes without captures or code.
- perlop: the qr// form for precompiling patterns referenced by name inside a one-liner.