tr///

Character-by-character substitution. tr scans a string and replaces every occurrence of a character from SEARCHLIST with the positionally corresponding character from REPLACEMENTLIST, returning the number of characters it matched.

It is not a regex. SEARCHLIST is a set of literal characters (with range shorthand like a-z), not a pattern. No metacharacters, no \d, no variable interpolation. The translation table is built once, at compile time.

Synopsis

tr/SEARCHLIST/REPLACEMENTLIST/cdsr
y/SEARCHLIST/REPLACEMENTLIST/cdsr
$str =~ tr/abc/xyz/;
$count = ($str =~ tr/aeiou//);

y/// is a sed-compatibility synonym for tr/// — identical behaviour, identical modifiers. See y///.

What you get back

Without the /r modifier, tr mutates the target string in place and returns the number of characters matched by SEARCHLIST. That count includes characters removed under /d, and under /s it is taken before runs are squashed. The return value is an integer and is commonly used to count characters without changing anything (see the tr/*/*/ idiom below).

With /r, tr leaves the original string alone, builds a transliterated copy, and returns that copy. The returned string is always a plain string — even if the target was a tied variable or a blessed object.

Without =~, tr operates on $_. With =~, the left-hand side must be an lvalue (a scalar variable, an array element, a hash element, or an assignment to one of those) unless /r is used, in which case any expression works.
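
The two return modes and the $_ default can be seen side by side in a minimal sketch. The variable names and sample strings are illustrative only:

```perl
use strict;
use warnings;

# Without /r: the target is modified in place and a count comes back.
my $text  = "banana";
my $count = ($text =~ tr/a/A/);     # $count is 3, $text is now "bAnAnA"

# With /r: the source is left alone and the new string comes back.
my $orig = "banana";
my $copy = $orig =~ tr/a/A/r;       # $orig still "banana", $copy is "bAnAnA"

# No =~ at all: tr works on $_ (here aliased to $word by the loop).
my $word = "transliterate";
my $vowels;
for ($word) {
    $vowels = tr/aeiou//;           # count-only form, $word is unchanged
}

print "$count $text $copy $vowels\n";   # prints "3 bAnAnA bAnAnA 5"
```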

Modifiers

The four trailing flags change what tr does with each character it sees. They can be combined:

  • c — complement SEARCHLIST. The set of characters acted on becomes every character NOT in SEARCHLIST. Under /c, SEARCHLIST is sorted by codepoint after complementing, and any REPLACEMENTLIST is applied to that sorted set — so the order of characters you wrote in SEARCHLIST no longer matters.

  • d — delete. Characters matched by SEARCHLIST that have no position in REPLACEMENTLIST are removed from the target entirely rather than replaced with the final replacement character.

  • s — squash. A run of characters that all translate to the same output character collapses to a single instance of that output.

  • r — return modified copy. Do not touch the target. Return the transliterated string instead of a count.
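
As a rough illustration, each flag can be applied to the same input. The sample string and variable names below are made up for the demonstration:

```perl
use strict;
use warnings;

my $in = "aabbccdd";

# /c: act on everything NOT in SEARCHLIST (here: everything except a and b).
(my $c = $in) =~ tr/ab/./c;      # c and d become ".": "aabb...."

# /d: delete matched characters that have no counterpart in REPLACEMENTLIST.
(my $d = $in) =~ tr/abc/A/d;     # a -> A, b and c removed: "AAdd"

# /s: squash runs that translate to the same output character.
(my $s = $in) =~ tr/a-d/x/s;     # everything maps to x, the run collapses: "x"

# /r: return the result and leave $in untouched.
my $r = $in =~ tr/a-d/A-D/r;     # "AABBCCDD"

print join(" ", $c, $d, $s, $r), "\n";
```

The first three lines copy $in before binding so that each flag is shown against the same starting string; /r makes that copy unnecessary.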

Replacement list length rules

The relationship between SEARCHLIST and REPLACEMENTLIST changes with the modifiers:

  • If REPLACEMENTLIST is shorter than SEARCHLIST and /d is not in effect, the final replacement character is replicated to fill the gap. tr/abcd/AB/ is the same as tr/abcd/ABBB/.

  • If REPLACEMENTLIST is shorter than SEARCHLIST and /d is in effect, surplus SEARCHLIST characters are deleted. tr/abcd/AB/d does tr/ab/AB/ plus s/[cd]//g.

  • If REPLACEMENTLIST is empty and /d is not in effect, REPLACEMENTLIST becomes a copy of SEARCHLIST. This is the count-only form: tr/abc// counts a, b, c without changing anything.

  • If REPLACEMENTLIST is empty and /d is in effect, every character in SEARCHLIST is deleted. tr/abc//d removes all a, b, c from the target.
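
The four rules above can be checked against one sample string. This is a small sketch with illustrative names, using /r so the source stays intact between cases:

```perl
use strict;
use warnings;

my $src = "abcdabcd";

# Shorter REPLACEMENTLIST, no /d: last character is replicated (AB -> ABBB).
my $padded = $src =~ tr/abcd/AB/r;      # "ABBBABBB"

# Shorter REPLACEMENTLIST with /d: the surplus c and d are deleted.
my $cut    = $src =~ tr/abcd/AB/dr;     # "ABAB"

# Empty REPLACEMENTLIST, no /d: count only, nothing changes.
my $count  = ($src =~ tr/abc//);        # 6

# Empty REPLACEMENTLIST with /d: every listed character is removed.
my $gone   = $src =~ tr/abc//dr;        # "dd"

print "$padded $cut $count $gone\n";
```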

Character ranges

A hyphen between two characters specifies an inclusive codepoint range: tr/A-J/0-9/ is tr/ABCDEFGHIJ/0123456789/. A hyphen at the start, at the end, or preceded by a backslash is taken literally — tr/-ab-/xyz/ and tr/\-abc/wxyz/ treat the hyphen as the literal character -.

For portability, restrict ranges to ASCII letters of a single case or ASCII digits. a-z, A-Z, 0-9, and subsets such as h-k or B-E are portable and mean exactly what they appear to mean. Mixed-case ranges, \x-ranges, and ranges straddling the alphabet/digit boundary look obvious but produce different results on non-ASCII platforms; spell them out instead.

Portable Unicode ranges use \N{U+...} endpoints:

$str =~ tr/\N{U+20}-\N{U+7E}//d;    # delete all ASCII printables
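
A short sketch of the range and hyphen rules, with made-up sample strings. It assumes a Perl recent enough to accept \N{U+...} range endpoints in tr:

```perl
use strict;
use warnings;

# An inclusive range on each side: A-J lines up with 0-9.
my $letters = "ABCDEFGHIJ";
my $digits  = $letters =~ tr/A-J/0-9/r;             # "0123456789"

# A hyphen at the end of a list, or escaped, is a literal hyphen.
my $dashed  = "a-b";
my $at_end  = $dashed =~ tr/ab-/xy_/r;              # "x_y"
my $escaped = $dashed =~ tr/a\-b/x_y/r;             # "x_y"

# Portable range with \N{U+...} endpoints: drop the ASCII printables.
my $mixed   = "a\tb\n";
my $control = $mixed =~ tr/\N{U+20}-\N{U+7E}//dr;   # tab and newline remain

print "$digits $at_end $escaped ", length($control), "\n";
```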

Delimiters

The delimiters around SEARCHLIST and REPLACEMENTLIST need not be /; other printable characters work too. If SEARCHLIST is delimited by a bracketing pair ({}, (), [], <>), REPLACEMENTLIST gets its own pair of delimiters, which may or may not be bracketing:

tr(aeiouy)(yuoiea);
tr[+\-*/]"ABCD";

When the delimiter is a single quote (tr'...'...'), the lists are treated almost literally — only \\ pairs are collapsed. Hyphens inside single-quoted tr are not range specifiers; they are literal hyphens.
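
The effect of the delimiter choice can be sketched on one input. The string "abc-" is chosen only because it makes the single-quote case visibly different:

```perl
use strict;
use warnings;

my $in = "abc-";

# Ordinary and bracketing delimiters: a-c is a range in all three.
my $slash  = $in =~ tr/a-c/A-C/r;     # "ABC-"
my $square = $in =~ tr[a-c][A-C]r;    # "ABC-"
my $mixed  = $in =~ tr(a-c)"A-C"r;    # "ABC-"

# Single-quote delimiters: the hyphen is literal, so the lists are the
# three characters a, -, c and A, -, C. The b is not in the list.
my $quoted = $in =~ tr'a-c'A-C'r;     # "AbC-"

print "$slash $square $mixed $quoted\n";
```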

Global state it touches

  • $_ — default target when neither =~ nor !~ is used. tr/// alone reads and (unless /r is used) mutates $_.

  • No other special variables. tr does not set $&, $1, etc. — those belong to regex matching, and tr is not regex.
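
Both points can be illustrated briefly. The array contents and the date-like string are placeholders for the demonstration:

```perl
use strict;
use warnings;

# No =~, so tr acts on $_, which the loop aliases to each element in turn.
my @words = ("alpha", "beta");
for (@words) {
    tr/a-z/A-Z/;
}
# @words is now ("ALPHA", "BETA")

# A regex match sets $1; a later tr leaves it alone.
"2024-01" =~ /(\d+)/;
my $before = $1;                    # "2024"
(my $tmp = "abc") =~ tr/a-c/A-C/;   # $tmp is "ABC"
my $after = $1;                     # still "2024"

print "@words $before $after\n";
```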

Examples

ASCII lowercase to uppercase:

my $name = "Perl";
$name =~ tr/a-z/A-Z/;           # $name is now "PERL"

Count without changing — the tr/X/X/ idiom. The replacement is the same character, so the string is unchanged; the return value is the count:

my $stars = "**hello**";
my $cnt   = ($stars =~ tr/*/*/);    # $cnt == 4

Empty REPLACEMENTLIST without /d counts, too:

my $digits = ($text =~ tr/0-9//);   # count ASCII digits in $text

Delete characters with /d:

my $s = "(555) 123-4567";
$s =~ tr/0-9//cd;               # keep only digits: "5551234567"

Squash runs with /s:

my $line = "hello    world";
$line =~ tr/ //s;               # "hello world"

Non-destructive /r — build a copy, leave the original:

my $host  = "example.com";
my $HOST  = $host =~ tr/a-z/A-Z/r;    # $host unchanged, $HOST uppercased

Chain with other /r operators:

my $tag = $host =~ tr/a-z/A-Z/r
               =~ s/\./_/gr;     # "EXAMPLE_COM"

map with /r — turn a list of strings into a transformed list:

my @upper = map tr/a-z/A-Z/r, @names;

Collapse each run of non-alphabetic characters to a single space:

$sentence =~ tr/a-zA-Z/ /cs;    # "foo, bar; BAZ!" -> "foo bar BAZ "

Strip the high bit off every byte:

$bytes =~ tr[\200-\377][\000-\177];

Edge cases

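  • Not a regex. tr/\d/X/ does not mean “any digit”: \d is not a recognised escape in the lists, so the backslash is dropped and only the literal character d is translated. Character classes, anchors, quantifiers, and backreferences have no special meaning either; their characters are taken literally. If you want regex, use s///.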

  • No variable interpolation. $var and @var inside the lists are literal $ / @ plus the following name. The table is compiled once, when the program is parsed. To build the lists dynamically, wrap in eval:

    eval "tr/$from/$to/";
    die $@ if $@;
    
  • Repeated characters in SEARCHLIST — only the first mapping counts. tr/AAA/XYZ/ translates every A to X; the Y and Z positions are unreachable.

  • No rescan. Transliterated output is not re-fed into the table, even if the replacement character appears in SEARCHLIST too. tr/ox/xo/ turns "oxxo" into "xoox", not some fixed point.

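  • !~ with tr negates the result: it returns false if any character matched and true otherwise. The transliteration is still carried out on the target, and combining !~ with /r is a syntax error. Useful only for quick “nothing matched” checks: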

    $foo !~ tr/A/a/     # true iff no 'A' was in $foo
    
  • Lvalue requirement. Without /r, the =~ target must be assignable. "constant" =~ tr/a/b/ is a compile-time error.

  • /r always returns a plain string. If the target is a tied scalar or a blessed string-overloaded object, /r strips the magic and gives you a fresh plain string.

  • tr is not the shell tr(1). Similar syntax, overlapping semantics, but case-folding beyond ASCII belongs to lc / uc / lcfirst / ucfirst or to s/// with \U / \L — not to tr.

  • /c with a multi-character REPLACEMENTLIST is non-portable across character sets. Under /c the search set is sorted by codepoint, and that sort order differs between ASCII and EBCDIC. Stick to single-character REPLACEMENTLIST when combining with /c, or use /c only with /d (where the replacement doesn’t matter).
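
Several of the edge cases above can be reproduced in a few lines. This is a sketch only: the sample strings are invented, and the validation step before the eval is an added precaution (an assumption about what a caller might want), not something tr itself requires:

```perl
use strict;
use warnings;

# Repeated SEARCHLIST characters: only the first mapping (A -> X) is used.
my $aaa = "AAA";
my $dup = $aaa =~ tr/AAA/XYZ/r;        # "XXX"

# No rescan: o -> x and x -> o are applied once per character.
my $oxxo = "oxxo";
my $swap = $oxxo =~ tr/ox/xo/r;        # "xoox"

# !~ negates the count, but the transliteration still happens.
my $foo  = "banana";
my $bar  = "BANANA";
my $no_A_in_foo = ($foo !~ tr/A/a/);   # true, $foo unchanged
my $no_A_in_bar = ($bar !~ tr/A/a/);   # false, $bar is now "BaNaNa"

# Dynamic lists via string eval. $from and $to are hypothetical run-time
# values; restricting them to letters and digits keeps delimiters, hyphens,
# and backslashes out of the generated code.
my ($from, $to) = ("abc", "xyz");
my $str = "aabbcc";
die "unsafe transliteration list" if "$from$to" =~ /[^A-Za-z0-9]/;
eval "\$str =~ tr/$from/$to/; 1" or die $@;
# $str is now "xxyyzz"

print "$dup $swap $bar $str\n";
```

The string eval recompiles the operator each time it runs, so for a hot loop it may be worth building the eval'd code once as an anonymous sub and calling that instead.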

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • y/// — sed-compatibility synonym for tr///; identical behaviour, pick whichever reads better in context

  • s/// — full regex substitution; reach for it when you need patterns, captures, or variable interpolation in the search side

  • lc — Unicode-aware lowercase; use instead of tr/A-Z/a-z/ when the input may contain non-ASCII letters

  • uc — Unicode-aware uppercase; complement to lc

  • tr in perlop — the upstream reference for the quote-like operator form, delimiter rules, and range portability

  • $_ — default target when tr is used without =~ or !~