# Traps and surprises

The ways pperl one-liners silently do something other than what the shell user meant. Each section names a trap, shows its symptom, and gives the fix. Read straight through once to prime the pattern-match; come back when a recipe is misbehaving.

(quoting)=
## Quoting hell

The single biggest source of wasted time in one-liner work.

### Single quotes are your friend

Under bash and zsh, text inside single quotes is not interpolated: no `$VAR` expansion, no backtick execution, and no `!` history substitution. (Both shells expand `!` inside double quotes in an interactive session, and it bites.) Use single quotes for Perl programs unless you have a reason not to.

```bash
# Correct
pperl -ne 'print if /$var/' file   # pperl sees the literal text $var
# Trap
pperl -ne "print if /$var/" file   # the shell expands $var before pperl sees it
```

### The `!` surprise in interactive bash

```bash
# The ! is inside double quotes, so interactive bash tries history expansion
$ pperl -ne "print if !/^#/" file
# bash reports "event not found" and pperl never runs
```

Bash's history expansion fires on a `!` that is unquoted or inside double quotes. It never fires inside single quotes, and it is off by default in non-interactive shells, so scripts are unaffected. Workarounds:

- `set +H` disables history expansion; put it in `~/.bashrc` to make that permanent.
- Single-quote the program: `pperl -ne 'print if !/^#/' file`. A `!` is inert inside `'...'`.
- Escape the `!` as `\!`. Inside double quotes bash leaves the backslash in place, so pperl sees `\!`: harmless in a string (`"hello\!"` still prints `hello!`), but it changes the meaning of code such as `!/^#/`.

### A single quote inside a single-quoted string

Bash has no escape for it. The workaround is the four-character close-open dance: `'\''` reads as close-quote, escaped literal quote, open-quote.

```bash
# Want Perl to see: print "it's here\n"
pperl -e 'print "it'\''s here\n"'
```

For any program with more than one embedded apostrophe, give up on `'...'` and use `q/.../` inside double quotes:

```bash
pperl -e "print q/it's here/, qq/\n/"
```

or move the program into a file.

### The `$` escape cascade under double quotes

If you quote the Perl program in double quotes (because you want a shell variable such as `$1` to interpolate), every one of Perl's own variables needs `\$`:

```bash
col=3
pperl -lane "\$s += \$F[$col]; END { print \$s }"    # works
```

That is legible once.
After the third such one-liner in a row, switch to environment variables:

```bash
col=3
# The prefix assignment exports col into pperl's environment for this command only
col=$col pperl -lane '$s += $F[$ENV{col}]; END { print $s }'
```

See [aliases#parametrised-functions](parametrised-functions).

(encoding)=
## Locale and encoding

pperl's default I/O is bytes. If your input is UTF-8, pperl still sees a byte string until you ask otherwise.

### The symptom

```bash
$ echo 'héllo' | pperl -lne 'print length'
6           # "h" (1) + "é" (2 bytes) + "llo" (3) = 6
$ echo 'héllo' | pperl -CSD -lne 'print length'
5           # correct
```

The fix is `-C` (see [switches](c-unicode)):

- `-CSD`: STDIN, STDOUT, STDERR, and files the program opens are all UTF-8.
- `-CS`: the three standard streams only.
- `-CA`: treat `@ARGV` as UTF-8.

### Regex matching on UTF-8

Without `-C`, `\w` matches ASCII word characters only. With `-C`, `\w` matches Unicode word characters (letters, marks, digits, and connector punctuation such as `_`) as defined by Perl's Unicode property tables.

```bash
# Count "word" characters in a multilingual file
pperl -CSD -lne '$c += () = /\w/g; END { print $c }' file.txt
```

### Mojibake on the way out

If pperl is reading UTF-8 cleanly but the terminal shows garbled characters, first check the output side: characters decoded with `-C` must also be encoded on the way out (`-CS` or `-CO`), or they leave as Latin-1 bytes or trigger `Wide character` warnings. If the output side is covered, the problem is further down, most likely `LC_ALL` or the terminal's own encoding; `locale` shows what the environment declares. That part is not a pperl problem.

(in-place-edit)=
## `-i` in-place editing

### Always `.bak` until verified

```bash
# Destructive if the pattern is wrong
pperl -i -pe 's/OLD/NEW/g' *.conf
# Safe
pperl -i.bak -pe 's/OLD/NEW/g' *.conf
# ...verify...
rm *.conf.bak
```

A wrong pattern with bare `-i` replaces the file's contents with something useless, and there is no recovery short of version control. Use `-i.bak`, confirm, delete.

### The empty-file trap

`-i` writes back whatever `-p` (or `-n` plus explicit prints) emits. If the program prints nothing, the file becomes empty.

```bash
# Wrong: -n with no print discards every line, so the file ends up empty
pperl -i -ne 's/OLD/NEW/g' file.txt
# Right: -p prints each (possibly edited) line back
pperl -i -pe 's/OLD/NEW/g' file.txt
# Also right: -n with an explicit print keeps only the lines matching KEEP
pperl -i -ne 'print if /KEEP/' file.txt
```

Under `-n`, you own every byte of output: whatever you do not print is gone.

### In-place across many files: ownership caveats

`-i` creates a new file and renames it over the original. On Linux, this means the new file inherits the calling user's ownership and umask, and the rename needs write permission on the directory, not just on the file. Root-owned configs edited by non-root users will fail; root-owned configs edited by root will keep the mode but may lose their SELinux context. Know your threat model before pointing `-i` at `/etc`.

## The diamond operator `<>` and magic open

`<>` (the "diamond") is what `-n` and `-p` read from. It treats each argument in `@ARGV` as a filename and reads the files in turn, falling back to STDIN if `@ARGV` is empty.

### `-` is STDIN

If an argument is `-`, `<>` reads STDIN at that point. Useful for interleaving:

```bash
echo prefix | pperl -ne 'print' - trailer.txt
# prints: prefix, then the contents of trailer.txt
```

### Filename metacharacters (and why they matter less than they used to)

Historically, `<>` opened each argument with two-argument `open`, which reads the mode from the filename: an argument `">foo"` meant "open foo for writing", and `"cmd |"` ran `cmd`. Stock Perl 5 still does this for `<>`, and therefore for `-n` and `-p`; its double diamond `<<>>` (Perl 5.22 and later) is the form that uses three-argument `open`. pperl's `<>` uses three-argument `open` internally, so filenames starting with `<`, `>`, or `|`, or containing pipes, are treated as plain filenames. Input filenames are not an attack surface under pperl, but the same one-liner run under stock `perl` does not get that guarantee.

### The filename is in `$ARGV`

```bash
pperl -lne 'print "$ARGV: $_" if /\bERROR\b/' *.log
```

`$ARGV` updates as each file opens; it holds `-` while reading STDIN.

### `$.` keeps counting across files

`$.` is not reset when `<>` moves to the next file, because `<>` never explicitly closes one. If you want per-file line numbers:

```bash
pperl -lne 'print "$ARGV:$.: $_" if /TODO/; close ARGV if eof' *.pl
```

`close ARGV if eof` resets `$.` when the current file ends. It must come after the `print`: placed first, it resets `$.` before the last line of each file is reported, and that line prints as line 0. Use `eof` without parentheses; `eof()` tests only for the end of the last file.

## Record-separator traps

### `-l` and `-0`: order matters

`-l` chomps each input record under `-n` / `-p` and sets `$\` for output. With no argument, `-l` sets `$\` to whatever `$/` is at the moment the switch is processed, so its position relative to `-0` decides what it restores:

```bash
# Read NUL-delimited input, write newline-terminated output
pperl -l -0 -pe ''     # correct: -l copies $/ = "\n" into $\, then -0 sets $/ = "\0"
# Reversed order
pperl -0 -l -pe ''     # -l copies the new $/, so $\ becomes NUL
# -l0 is -l with the argument 0, not -l followed by -0
pperl -l0 -pe ''       # $\ becomes NUL and $/ stays "\n": probably NOT what you wanted
```

Put `-l` before `-0`, with no argument of its own, when you want NUL in and newline out.

### Paragraph mode keeps trailing blanks

`-00` (paragraph mode) sets `$/ = ""`. Perl reads up to and including the blank-line separator. Without `-l`, the separator stays attached to each paragraph, and printing it back out preserves the blank line. With `-l` placed before `-00`, the separator is chomped and `$\ = "\n"` appends a single newline on output, so the paragraphs collapse:

```bash
$ printf 'a\n\nb\n\nc\n' | pperl -00 -pe ''
a

b

c
$ printf 'a\n\nb\n\nc\n' | pperl -l -00 -pe ''
a
b
c
```

If `-l` comes after `-00`, it sees paragraph mode and sets `$\` to `"\n\n"` instead, so the blank lines come back. Know which one you want.

### `-0777` and `$.`

Under `-0777`, each file is one record. `$.` increments per record, so `$.` is the number of files processed, not the number of lines.

```bash
pperl -0777 -e 'while (<>) { print "file $.: length=", length, "\n" }' *.txt
# file 1: length=1024   (sizes illustrative)
# file 2: length=2048
```

## How `-F` interprets its argument

`-F` takes a regex. The pattern may be written bare or wrapped in `//`, `""`, or `''`; a bare pattern is treated as if it were in single quotes:

```bash
pperl -F: -lane '...'      # bare: pperl quotes it
pperl -F/:/ -lane '...'    # explicit delimiters
pperl -F'\t' -lane '...'   # tab, in regex form
```

The common mistake is a literal tab that the terminal swallowed:

```bash
pperl -F' ' -lane '...'   # fragile: a pasted tab may arrive as spaces
```

Write `-F'\t'` unless you are pasting from a known-good source.

## Operator precedence around `print`

`print` is a list operator: without parentheses it takes everything to its right as its argument list, and with parentheses right after its name it becomes a function call that binds tighter than any operator.
This trips up people whose `awk` reflexes don't carry over to Perl:

```bash
# Wrong: missing comma; this is a syntax error, not a print to a filehandle
pperl -ne 'print "out.txt" $_'
# Right: commas between arguments, or parentheses
pperl -ne 'print $_, "\n"'
pperl -ne 'print("foo\n")'
# Wrong: || binds to $_, so the die fires on a false line, never on a failed print
pperl -ne 'print $_ || die "write failed: $!"'
# Right: the low-precedence `or` tests print's return value
pperl -ne 'print $_ or die "write failed: $!"'
# One particular case worth memorising:
pperl -ne 'print (1+1)*2'     # prints 2, not 4
#                ^ print(1+1) is the function call; *2 is applied to its return value
```

When in doubt, wrap the whole argument list in parentheses, `print((1+1)*2)`, or start it with a unary plus, `print +(1+1)*2`.

## `tr` is not a regex

[`tr`](../../p5/core/perlfunc/tr) / [`y`](../../p5/core/perlfunc/y) take character lists, not patterns. `tr/\d/X/` does NOT match digits: `\d` has no special meaning to `tr`, so it translates the letter `d`.

```bash
pperl -pe 'tr/0-9/X/'     # replace digits with X (character range)
pperl -pe 's/\d/X/g'      # replace digits with X (regex)
```

Use `tr` for character-level work (case conversion, deletion, ranges) and `s///` when you need any regex machinery.

## BEGIN and END scope

`BEGIN { ... }` and `END { ... }` blocks run once, before and after the loop; they are not part of the per-line body. Variables declared with `my` inside `BEGIN` are scoped to that block and are not visible to the loop body. Use `our` or a global variable for state that needs to persist from `BEGIN` into the loop:

```bash
# Wrong: my scopes $greeting to the BEGIN block; the loop sees an empty global
pperl -ne 'BEGIN { my $greeting = "Hi"; } print "$greeting $_"'
# Right: declare without my, or use our
pperl -ne 'BEGIN { $greeting = "Hi"; } print "$greeting $_"'
```

In practice, most `BEGIN` blocks do setup with side effects (`$\ = "\n"`, setting a format) rather than bindings.

## `die` and the `\n` convention

[`die`](../../p5/core/perlfunc/die) without a trailing `\n` appends `" at -e line N."` to its message (plus the input line number once input has been read). With a trailing `\n`, it prints the message as-is.
Choose deliberately:

```bash
# Developer diagnostic: you want the location
pperl -E 'die "unreachable" if $x > 100'
# User-facing message: don't leak program structure
pperl -nE 'die "bad input\n" unless /\d+/'
```

The same rule applies to [`warn`](../../p5/core/perlfunc/warn).

## Missing newline at end of file

If the last line of an input file has no trailing newline, `$_` for that line lacks a `\n` and `-l` has nothing to chomp. Without `-l`, `-p` reprints it exactly, so the output also ends without a newline and the next shell output runs into it; with several input files, that unterminated line runs straight into the first line of the next file. With `-l`, `$\` still appends `"\n"`, so the output gains the newline the input lacked. Usually a display problem; occasionally a correctness one.

```bash
pperl -pe '' no-final-newline.txt     # output lacks the trailing \n
pperl -lpe '' no-final-newline.txt    # -l supplies it
```

## Find out more

- [switches](switches): the mechanical expansions of every flag mentioned here.
- [`perlvar`](../../p5/core/perlvar): the full definition of every special variable the traps above depend on (`$/`, `$\`, `$.`, `$ARGV`, `@ARGV`).
- [`perlop`](../../p5/core/perlop): the diamond operator, `tr`, and the quoting forms referenced above.