Traps and surprises#

The ways pperl one-liners silently do something other than what the shell user meant. Each section names a trap, shows its symptom, and gives the fix. Read straight through once to prime the pattern-match; come back when a recipe is misbehaving.

Quoting hell#

The single biggest source of wasted time in one-liner work.

Single quotes are your friend#

Under bash and zsh, text inside single quotes is not interpolated: no $VAR expansion, no backtick execution, no ! history substitution. (History expansion is an interactive-shell feature in both bash and zsh; it bites inside double quotes, not single quotes.) Use single quotes for Perl programs unless you have a reason not to.

# Correct
pperl -ne 'print if /$var/' file              # Perl sees literal $var

# Trap
pperl -ne "print if /$var/" file              # shell expands $var before pperl sees it
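To see what the program text looks like once the shell has finished with it, hand the same string to printf instead of pperl. A minimal sketch; the variable name and its value are made up for illustration:

```shell
# Hypothetical shell variable; name and value are only for illustration.
var='a.*b'

# printf stands in for pperl: it prints the program text exactly as received.
single=$(printf '%s' 'print if /$var/')
double=$(printf '%s' "print if /$var/")

echo "$single"    # print if /$var/   (Perl would look up its own $var)
echo "$double"    # print if /a.*b/   (the shell substituted its value first)
```

The same printf trick works for any one-liner that behaves oddly: swap the interpreter for printf '%s\n' and read what actually arrives.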

The ! surprise in interactive bash#

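$ pperl -E "say 'hi !there you'"
bash: !there: event not found

Bash's history expansion fires on any ! that is not protected by single quotes or a backslash, and that includes ! inside double quotes. It only applies to interactive shells; scripts are unaffected. Workarounds:

  • set +H in ~/.bashrc disables history expansion.

  • Escape the ! with a backslash. Inside double quotes the backslash stays in the string ("hello\!" reaches the program as hello\!), which is harmless only where Perl ignores the extra backslash, such as inside a Perl double-quoted string.

  • Keep the program in the shell's single quotes: ! is inert inside '...' whether or not history expansion is on.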

A single quote inside a single-quoted string#

Bash has no escape for it. The workaround is the four-character close-open dance: '\'' reads as close-quote, literal-quote, open-quote.

# Want Perl to see:  print "it's here"
pperl -e 'print "it'\''s here\n"'

For any program with more than one embedded apostrophe, give up on '...' and use q/.../ inside double quotes:

pperl -e "print q/it's here/, qq/\n/"

or move the program into a file.

The $ escape cascade under double quotes#

If you put the Perl program in double quotes (because you want a shell variable such as $col to interpolate), Perl's own variables need \$:

col=3
pperl -lane "\$s += \$F[$col]; END { print \$s }"      # works

That is legible once. After the third such one-liner in a row, move to environment variables:

col=3
col=$col pperl -lane '$s += $F[$ENV{col}]; END { print $s }'

See aliases#parametrised-functions.

Locale and encoding#

pperl’s default I/O is bytes. If your input is UTF-8, Perl still sees a byte string until you ask for otherwise.

The symptom#

$ echo 'héllo' | pperl -lne 'print length'
6                                             # "h" + 2 bytes for é + 3 more = 6
$ echo 'héllo' | pperl -CSD -lne 'print length'
5                                             # correct

The fix is -C (see switches):

  • -CSD — STDIN/STDOUT/STDERR and file opens all UTF-8.

  • -CS — stdio only.

  • -CA — treat @ARGV as UTF-8.

Regex matching on UTF-8#

Without -C, \w matches ASCII word characters only. With -C, \w matches Unicode letters/digits as defined by Perl’s Unicode property tables.

# Count "word" characters in a multilingual file
pperl -CSD -lne '$c += () = /\w/g; END { print $c }' file.txt

Mojibake on the way out#

If pperl is reading UTF-8 cleanly but the terminal shows garbled characters, the problem is further down, most likely LC_ALL or the terminal's own encoding. Running locale shows what the shell believes, and locale charmap should report UTF-8; this is not a pperl problem.

-i in-place editing#

Always .bak until verified#

# Destructive if the pattern is wrong
pperl -i -pe 's/OLD/NEW/g' *.conf

# Safe
pperl -i.bak -pe 's/OLD/NEW/g' *.conf
# ...verify...
rm *.conf.bak

A wrong pattern with bare -i replaces the file’s contents with something useless. There is no recovery short of version control. Use -i.bak, confirm, delete.

The empty-file trap#

-i writes whatever -p (or -n plus explicit prints) emits. If the program prints nothing, the file becomes empty.

# Wrong: -n with no print, so nothing is written and the file ends up empty
pperl -i -ne 's/OLD/NEW/' file.txt

# Right: keep only the lines matching KEEP (two equivalent spellings)
pperl -i -ne '/KEEP/ && print' file.txt
pperl -i -ne 'print if /KEEP/' file.txt

The last two do the same thing, and even they empty the file if no line matches KEEP. The point: under -n, you own every byte of output.

In-place across many files with ownership caveats#

-i creates a new file and renames. On Linux, this means the new file inherits the calling user’s ownership and umask. Root-owned configs edited by non-root users will fail; root-owned configs edited by root will keep the mode but may lose SELinux context. Know your threat model before pointing -i at /etc.

The diamond operator <> and magic open#

<> (the “diamond”) is what -n / -p reads from. It reads from each argument in @ARGV as a filename, falling back to STDIN if @ARGV is empty.

The “-” is STDIN#

If an argument is "-", <> reads STDIN at that point. Useful for interleaving:

echo prefix | pperl -ne 'print' - trailer.txt
# prints: prefix, then contents of trailer.txt

Filename metacharacters (and why they matter less than they used to)#

Historically, bare <> treated ">foo" as "open foo for writing": a two-argument open that read the mode from the filename. pperl's <> uses three-argument open internally, so filenames starting with <, >, |, or containing pipes are treated as plain filenames, and input filenames are not an attack surface. Do not carry that assumption over to a stock perl 5: its <> still performs the magic open, and the safe spelling there is the double diamond <<>> (5.22 and later).

Filename is in $ARGV#

pperl -lne 'print "$ARGV: $_" if /\bERROR\b/' *.log

$ARGV updates as each file opens. "-" is the special value for STDIN.

$. keeps counting across files#

$. is not reset when <> moves to the next file. If you want per-file line numbers:

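pperl -lne 'print "$ARGV:$.: $_" if /TODO/; close ARGV if eof' *.pl

close ARGV if eof resets $. when the current file ends. Keep it after the print: closing first would reset $. before the last line of each file is reported.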

Record-separator traps#

-l removes then restores — once#

-l chomps input and sets $\ to whatever $/ holds at the moment the switch is processed. Switches are processed left to right, so the position of -l relative to -0 matters:

# Read NUL-delimited input, output newline-terminated
pperl -l -0 -pe ''                            # correct: -l copies "\n" into $\, then -0 sets $/ to NUL

# -l after -0 copies the NUL instead:
pperl -0 -lpe ''                              # $\ becomes NUL, so output is NUL-terminated too

# And if you write -l0, Perl parses that as -l with argument 0:
pperl -l0 -pe ''                              # $\ becomes NUL, $/ stays "\n"; probably NOT what you wanted

Put -l before -0, as a separate switch, when you want NUL-delimited input and newline-terminated output.

Paragraph mode keeps trailing blanks#

-00 (paragraph mode) sets $/ = "". Perl reads up to and including the blank-line separator. Without -l, the separator stays attached to each paragraph; printing it back out preserves the blank line. With -l given before -00, $\ is already "\n" when paragraph mode is switched on: the separator is chomped, one newline is appended on output, and the paragraphs collapse. (With -l given after -00, perl 5 takes $\ from paragraph mode as two newlines, so the blank lines come back.)

$ printf 'a\n\nb\n\nc\n' | pperl -00 -pe ''
a

b

c
$ printf 'a\n\nb\n\nc\n' | pperl -l -00 -pe ''
a
b
c

Know which one you want.

-0777 and $.#

Under -0777, each file is one record. $. increments per record, so $. is the number of files processed, not the number of lines.

pperl -0777 -e 'while (<>) { print "file $.: length=", length, "\n" }' *.txt
# file 1: length=1024
# file 2: length=2048

The -F interpretation of its argument#

-F takes a regex. Perl versions before 5.10 required you to write the delimiters; modern pperl accepts both forms:

pperl -F:      -lane '...'                    # bare: Perl quotes it
pperl -F/:/    -lane '...'                    # explicit delimiters
pperl -F'\t'   -lane '...'                    # tab — regex form

The common mistake is a literal tab in a shell where the terminal swallowed it:

pperl -F'	' -lane '...'                   # fragile: tab may be a space in your paste

Write -F'\t' unless you are pasting from known-good source.

Operator precedence around print#

print has lower precedence than most operators. This trips people whose awk reflex is wrong for Perl:

# Wrong: awk-style juxtaposition does not concatenate; this is a syntax error
pperl -ne 'print "out.txt" $_'

# Right: parentheses or commas
pperl -ne 'print $_, "\n"'
pperl -ne 'print("foo\n")'

# Wrong: || binds tighter than print's argument list, so die is never reached
pperl -e 'print "ok\n" || die "write failed"' # use: print("ok\n") or die ...

# One particular case worth memorising:
pperl -ne 'print (1+1)*2'                     # prints 2, not 4
# ^ print(1+1) is the function call; *2 is applied to its return value

When in doubt, wrap the whole argument list in parentheses, as in print((1+1)*2), and use the low-precedence or rather than || after a print.

tr is not a regex#

tr / y take character sets, not patterns. tr/\d/X/ does NOT match digits: \d is not a class there, just an escaped d, so it replaces the letter d.

pperl -pe 'tr/0-9/X/'                         # replace digits with X (character range)
pperl -pe 's/\d/X/g'                          # replace digits with X (regex)

Use tr for character-set work (case conversion, deletion, ranges) and s/// when you need any regex machinery.

BEGIN and END scope#

BEGIN { ... } and END { ... } blocks are part of the program, not the loop body. Variables declared with my inside BEGIN are not visible to the loop body. Use our or global variables for state that needs to persist from BEGIN into the loop:

# Wrong: $greeting is not in scope inside the loop
pperl -ne 'BEGIN { my $greeting = "Hi"; } print "$greeting $_"'

# Right: declare without my, or use our
pperl -ne 'BEGIN { $greeting = "Hi"; } print "$greeting $_"'

In practice, most BEGIN blocks do setup with side effects ($\ = "\n", setting a format) rather than bindings.

die and the \n convention#

die without a trailing \n appends " at -e line N.\n" to its message. With a trailing \n, it prints the message as-is. Choose deliberately:

# Developer diagnostic — want the location
pperl -E 'die "unreachable" if $x > 100'

# User-facing message — don't leak program structure
pperl -E 'die "bad input\n" unless /\d+/'

The same rule applies to warn.

Missing newline at end of file#

If the last line of an input file has no trailing newline, $_ for that line lacks a \n. Without -l, -p reprints it as-is, so the output also ends without a newline and the next shell prompt runs into it. With -l, chomp finds nothing to remove but $\ still appends "\n", so the output gains a final newline the input did not have. Rarely a correctness problem; often a display problem.

pperl -pe '' no-final-newline.txt             # output lacks trailing \n

Find out more#

  • switches — the mechanical expansions of every flag mentioned here.

  • perlvar — the full definition of every special variable that the traps above depend on ($/, $\, $., $ARGV, @ARGV).

  • perlop — the diamond operator, tr, and the quoting forms referenced above.