# Regex-driven one-liners

The previous chapter kept regexps at the level `sed` or `awk` users already know: literal patterns inside `m//` and `s///`. This chapter covers one-liners whose shape depends on a regex feature beyond plain matching: extractions, captures, lookarounds, code-in-replacement, and non-destructive rewrites.

Readers who already know the constructs (`(?:...)`, `(?<name>...)`, `(?=...)`, the `/e` and `/r` modifiers) can skim for the command-line framing. Readers who know what `/g` does but have never written `/e` will get the most out of reading straight through.

For the regex language itself, see the [regular expressions guide](../regular-expression/index).

## Extracting all matches

`/g` in list context returns every match, not just the first.

```bash
# Every integer in the input
pperl -ne 'print "$_\n" for /-?\d+/g' file.txt

# Every IPv4-shaped substring
pperl -lne 'print for /\b(?:\d{1,3}\.){3}\d{1,3}\b/g' access.log

# Every HTTP header value (naive: no folded headers)
pperl -nE 'say $1 if /^[\w-]+:\s*(.+)$/' headers.txt
```

[`say`](../../p5/core/perlfunc/say) (enabled by `-E`) appends a newline, saving one `"\n"` and keeping the program short.

## Non-greedy quantifiers

`*?`, `+?`, `??`, `{n,m}?`: match as few characters as possible.

```bash
# Everything between the first < and the next > (tag-by-tag)
pperl -lne 'print for /<(.+?)>/g' markup.html

# Every quoted string (double quotes, no escape handling)
pperl -lne 'print for /"([^"]*)"/g' config.txt
```

The negated character class `[^"]*` is usually a faster and safer alternative to the non-greedy `.*?` when you know what must not appear inside.

## Captures in substitution

The right-hand side of `s///` can reference left-hand captures.

```bash
# Swap every "word1 word2" pair
pperl -pe 's/(\w+)\s+(\w+)/$2 $1/g' file.txt

# Quote every unquoted value in key=value pairs
pperl -pe 's/(\w+)=([^"\s]+)/$1="$2"/g' config.txt

# Increment the number at the end of each line (/e is covered next)
pperl -pe 's/(\d+)$/$1 + 1/e' file.txt
```

(e-modifier)=
## `/e`: replacement is Perl code

The `/e` modifier treats the replacement as code; its return value is what gets substituted in.

```bash
# Number every word in the file (global counter)
pperl -pe 's/(\w+)/++$i . ".$1"/ge' file.txt

# Number words per line (counter reset each line)
pperl -pe '$i = 0; s/(\w+)/++$i . ".$1"/ge' file.txt

# Rewrite every integer in hexadecimal
pperl -pe 's/\d+/sprintf "0x%x", $&/ge' file.txt

# Convert Celsius-labelled numbers to Fahrenheit
pperl -pe 's/(\d+)C\b/sprintf "%.1fF", $1 * 9 / 5 + 32/ge' weather.txt
```

`/ge` together: evaluate the right-hand side as code, and do it for every match. `$&` is the whole matched text; `$1`, `$2`, … are captures.

`/ee` evaluates twice: the first `e` produces a string, and the second evaluates that string as code. It is useful when the replacement is itself dynamic code read from input; it is rarely needed and often a hazard.

## `/r`: non-destructive substitution

`/r` returns the modified string instead of rewriting `$_`. The original stays intact.

```bash
# Print filename and its .bak variant side by side
ls *.txt | pperl -lne 'print "$_ ", s/$/.bak/r'

# Compute a transformed version without touching $_
pperl -ne 'my $upper = $_ =~ tr/a-z/A-Z/r; print $_, $upper' file.txt
```

Reach for `/r` when a single line needs to be printed in two forms, or when the transformation is conditional and you want the untouched line back in the failure branch.

## Named captures

`(?<name>...)` stores the captured text in `$+{name}`, an entry in the `%+` hash.

```bash
# Parse combined-log-format access lines
pperl -nE '
    if (/^(?<ip>\S+).*?"(?<method>\S+) (?<path>\S+)/) {
        say "$+{ip} $+{method} $+{path}";
    }
' access.log
```

Named captures pay for themselves once the pattern has more than two groups, or when the same group needs to be referenced both in the pattern and in the replacement code.

(lookaround)=
## Lookahead and lookbehind

Zero-width assertions match a position where the surrounding text satisfies a condition, without consuming the text that satisfies it.

```bash
# Digits preceded by $ (e.g. "$42" → 42, ignore "42")
pperl -lne 'print for /(?<=\$)\d+/g' prices.txt

# Words NOT followed by "!"
pperl -lne 'print for /\b\w+\b(?!!)/g' shouts.txt

# Insert CRLF only where LF is not already preceded by CR
pperl -pe 's/(?<!\r)\n/\r\n/'

# Paragraphs in which every line starts with ">"
pperl -00 -ne 'print if /\A(?:>.*\n?)+\z/m' mail.txt
```

`\A` and `\z` anchor the start and end of the whole string, unaffected by `/m`. Use them when `/m` is on but you really do mean the whole string.

## Transliteration with `tr`

[`tr`](../../p5/core/perlfunc/tr) (also spelt [`y`](../../p5/core/perlfunc/y)) is character-by-character substitution. It is not a regex but a close neighbour, and the right tool for character-set jobs.

```bash
pperl -pe 'tr/A-Za-z/N-ZA-Mn-za-m/'   # ROT13
pperl -pe 'tr/A-Za-z/a-zA-Z/'         # swap case
pperl -pe 'tr/a-z//d'                 # delete lowercase letters
pperl -pe 'tr/ \t/ /s'                # squeeze whitespace runs to one space

# Count commas on each line (tr returns the count)
pperl -ne 'print tr/,//, "\n"' data.csv
```

The `d` flag deletes characters in the left set that have no counterpart on the right. The `s` flag squeezes runs. The `c` flag complements the left set (everything NOT listed).

## Parsing delimited substrings

### CSV with quoted commas

A quick-and-dirty split that honours double-quoted fields:

```bash
pperl -ne '
    my @F = /"([^"]*)"|([^,\n]+)/g;   # one capture per match is undef
    @F = grep defined, @F;            # keep whichever alternative matched
    print join("|", @F), "\n";
' quoted.csv
```

Fast enough for a one-off, though empty unquoted fields are silently dropped.
For anything that must handle embedded quotes (`"he said ""hi"""`), use `Text::CSV`; see [numeric](csv-basics).

### URL query strings

```bash
# Each key=value pair on its own line
pperl -lne 'print for /[^?&=]+=[^&]*/g' urls.txt
```

## Practical extraction one-liners

### Words by length

```bash
# Words of exactly 5 characters
pperl -lne 'print for /\b\w{5}\b/g' file.txt
```

### Numbers with units

```bash
pperl -lne 'print for /\b\d+(?:\.\d+)?\s*(?:ms|s|kb|mb|gb)\b/gi' timings.txt
```

### HTTP User-Agent from a request log

```bash
pperl -nE 'say $1 if /^User-Agent: (.+)$/' request.log
```

### Balanced braces (as many levels as pperl's regex engine supports)

pperl's regex engine supports recursion via `(?R)` and `(?1)`, so balancing nested braces is a one-liner:

```bash
pperl -0777 -nE 'say $& while /\{(?:[^{}]|(?R))*\}/g' code.txt
```

For deeply nested structures, reach for a parser. Regex is the right tool up to the complexity where it stops being one.

## Find out more

- [regular expressions guide](../regular-expression/index): the language underneath every recipe here, including modifiers, performance, and Unicode.
- [`m`](../../p5/core/perlfunc/m): the match operator.
- [`s`](../../p5/core/perlfunc/s): substitution, the full set of modifiers.
- [`tr`](../../p5/core/perlfunc/tr): transliteration reference.
- [progression](substitution): simpler `s///` recipes without captures or code.
- [`perlop`](../../p5/core/perlop): the `qr//` form for precompiling patterns referenced by name inside a one-liner.