Regex-driven one-liners

The previous chapter kept regexps at the level sed or awk users already know: literal patterns inside m// and s///. This chapter covers one-liners whose shape depends on a regex feature beyond plain matching - extractions, captures, lookarounds, code-in-replacement, and non-destructive rewrites.

Readers who already know the constructs ((?:...), (?<name>...), (?=...), the /e and /r modifiers) can skim for the command-line framing. Readers who know what /g does but have never written /e will get the most out of reading straight through.

For the regex language itself, see the regular expressions guide.

Extracting all matches

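/g in list context returns every match, not just the first. When the pattern contains capture groups, it returns the captured text of each match instead of the whole match.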

# Every integer in the input
pperl -ne 'print "$_\n" for /-?\d+/g' file.txt

# Every IPv4-shaped substring
pperl -lne 'print for /\b(?:\d{1,3}\.){3}\d{1,3}\b/g' access.log

# Every HTTP header value (naive: no folded headers)
pperl -nE 'say $1 if /^[\w-]+:\s*(.+)$/' headers.txt

say (enabled by -E) appends a newline, saving one "\n" and keeping the program short.

Non-greedy quantifiers

*?, +?, ??, {n,m}? - match as few characters as possible.

# Everything between the first < and the next > (tag-by-tag)
pperl -lne 'print for /<(.+?)>/g' markup.html

# Every quoted string (double quotes, no escape handling)
pperl -lne 'print for /"([^"]*)"/g' config.txt

When you know which character must not appear inside, the negated character class [^"]* is usually a faster and safer alternative to the non-greedy .*?.

Captures in substitution

The right-hand side of s/// can reference left-hand captures.

# Swap every "word1 word2" pair
pperl -pe 's/(\w+)\s+(\w+)/$2 $1/g' file.txt

# Quote every unquoted value in key=value pairs (naive: values contain no whitespace)
pperl -pe 's/(\w+)=([^"\s]+)/$1="$2"/g' config.txt

# Increase the trailing number on each line by one (uses /e, covered in the next section)
pperl -pe 's/(\d+)$/$1 + 1/e' file.txt

`/e` - replacement is Perl code

The /e modifier treats the replacement as code; its return value is what gets substituted in.

# Number every word in the file (global counter)
pperl -pe 's/(\w+)/++$i . ".$1"/ge' file.txt

# Number words per line (counter reset each line)
pperl -pe '$i = 0; s/(\w+)/++$i . ".$1"/ge' file.txt

# Rewrite every integer in base 16
pperl -pe 's/\d+/sprintf "0x%x", $&/ge' file.txt

# Convert Celsius-labelled numbers to Fahrenheit
# (s{}{} delimiters, because the replacement code itself contains a /)
pperl -pe 's{(\d+)C\b}{sprintf "%.1fF", $1 * 9 / 5 + 32}ge' weather.txt

/ge together: evaluate the right-hand side as code, do it for every match. $& is the whole matched text; $1, $2, … are captures.

/ee evaluates twice - the first e produces a string, the second evaluates that string. Useful when the replacement string is itself dynamic code read from input; needed rarely, a hazard often.

`/r` - non-destructive substitution

/r returns the modified string instead of rewriting $_. The original stays intact.

# Print filename and its .bak variant side by side
ls *.txt | pperl -lpe '$_ .= " " . s/$/.bak/r'

# Compute a transformed version without touching $_
pperl -ne 'my $upper = $_ =~ tr/a-z/A-Z/r; print $_, $upper' file.txt

Reach for /r when a single line needs to be printed in two forms, or when the transformation is conditional and you want the untouched line back in the failure branch.

Named captures

(?<name>...) stores its capture in the %+ hash, so the captured text is available as $+{name}.

# Parse combined-log-format access lines
pperl -nE '
    if (/^(?<ip>\S+).*?"(?<method>\S+) (?<path>\S+)/) {
        say "$+{ip} $+{method} $+{path}";
    }
' access.log

Named captures pay for themselves once the pattern has more than two groups, or when the same group needs to be referenced both in the pattern and in the replacement code.

Lookahead and lookbehind

Zero-width assertions: match a position where the surrounding text satisfies a condition, without consuming the condition itself.

# Digits preceded by $ (e.g. "$42" → 42, ignore "42")
pperl -lne 'print for /(?<=\$)\d+/g' prices.txt

# Words NOT followed by "!"
pperl -lne 'print for /\b\w+\b(?!!)/g' shouts.txt

# Convert LF to CRLF, leaving existing CRLF line endings alone
pperl -pe 's/(?<!\r)\n/\r\n/g' unix.txt

The four lookaround forms:

Lookaround	Meaning
`(?=X)`	Next characters match `X`
`(?!X)`	Next characters do not match `X`
`(?<=X)`	Preceding characters match `X`
`(?<!X)`	Preceding characters do not match `X`

Lookbehind in pperl supports variable-length patterns; see regular expressions: anchors and assertions.

Multi-line matches across slurped input

-0777 slurps the entire file into $_. The regex modifiers that matter in that mode:

/s - . matches \n.
/m - ^ and $ match at every line start/end, not just the start/end of the whole string.

# Every BEGIN...END block (non-greedy), each on one output line
pperl -0777 -nE '
    while (/BEGIN(.*?)END/sg) {
        (my $body = $1) =~ s/\n/ /g;
        say $body;
    }
' text.txt

# Paragraphs where every line starts with ">"
# (the trailing \n* absorbs the blank line that ends each paragraph record)
pperl -00 -ne 'print if /\A(?:^>.*\n?)+\n*\z/m' mail.txt

\A and \z anchor the whole string start and end, unaffected by /m. Use them when /m is on but you really do mean the whole string.

Transliteration with `tr`

tr (also spelt y) is character-by-character substitution. Not a regex, but a close neighbour - and the right tool for character-set jobs.

pperl -pe 'tr/A-Za-z/N-ZA-Mn-za-m/'               # ROT13
pperl -pe 'tr/A-Za-z/a-zA-Z/'                     # swap case
pperl -pe 'tr/a-z//d'                             # delete lowercase letters
pperl -pe 'tr/ \t/ /s'                            # squeeze whitespace runs to one space

# Count commas on each line (tr returns the count)
pperl -ne 'print tr/,//, "\n"' data.csv

The d flag deletes characters in the left set that have no counterpart on the right. The s flag squeezes each run of the same translated character down to one. The c flag complements the left set (everything NOT listed).

Parsing delimited substrings

CSV with quoted commas

A quick-and-dirty split that honours double-quoted fields:

pperl -lne '
    my @F = /"([^"]*)"|([^,]+)/g;
    @F = grep defined, @F;
    print join("|", @F);
' quoted.csv

Fast enough for a one-off, but it silently drops empty fields. For anything that must keep them or handle embedded quotes ("he said ""hi"""), use Text::CSV; see numeric.

URL query strings

# Each key=value pair on its own line
pperl -lne 'print "$1=$2" while /([^?&=]+)=([^&]*)/g' urls.txt

Practical extraction one-liners

Words by length

# Words of exactly 5 characters
pperl -lne 'print for /\b\w{5}\b/g' file.txt

Numbers with units

pperl -lne 'print for /\b\d+(?:\.\d+)?\s*(?:ms|s|kb|mb|gb)\b/gi' timings.txt

HTTP User-Agent from a request log

pperl -nE 'say $1 if /^User-Agent: (.+)$/' request.log

Balanced braces (as many levels as pperl’s regex engine supports)

pperl’s regex engine supports recursion via (?R) and (?1), so matching balanced, nested braces is a one-liner:

pperl -0777 -nE 'say $& while /\{(?:[^{}]|(?R))*\}/g' code.txt

For deeply nested structures, reach for a parser. Regex is the right tool up to the complexity where it stops being one.

Find out more

regular expressions guide - the language underneath every recipe here, including modifiers, performance, and Unicode.
m - the match operator.
s - substitution, the full set of modifiers.
tr - transliteration reference.
progression - simpler s/// recipes without captures or code.
perlop - the qr// form for precompiling patterns referenced by name inside a one-liner.