SCALARs and strings

length

Return the number of characters in a string.

length stringifies its argument and returns how many characters that string contains. “Characters” means logical Perl characters, not bytes: a string holding ten codepoints has length 10 regardless of how many bytes those codepoints occupy when encoded as UTF-8 on disk or on the wire.

Synopsis

length EXPR
length

What you get back

A non-negative integer: the count of characters in the stringified value of EXPR. If EXPR is omitted, length operates on $_. If EXPR is undef, the result is undef, not 0. This is the one case where length does not return a number, and it is deliberate: length never warns about an undefined argument, yet its result still tells an undefined scalar (undef) apart from an empty string (0), so one defined test on the result replaces a separate defined check on the input:

length(undef)       # undef
length("")          # 0
length("abc")       # 3
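A minimal sketch of how calling code might rely on that undef result. The helper name describe_value is hypothetical, not part of Perl. It calls length once and branches on the result, and because length does not warn on an undefined argument, it runs cleanly under use warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a scalar by calling length exactly once.
# length(undef) returns undef without an "uninitialized" warning,
# so defined() on the result separates undef from "".
sub describe_value {
    my ($value) = @_;
    my $len = length $value;
    return 'undef' unless defined $len;
    return 'empty' if $len == 0;
    return "$len chars";
}

# Prints "undef", "empty" and "3 chars", one per line.
print describe_value($_), "\n" for undef, '', 'abc';
```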

Characters, not bytes

Unless use bytes is in effect, length counts characters. The byte count of the same string encoded as UTF-8 differs from that whenever the string contains a character above U+007F, and it is the byte count that external systems (file sizes, network payloads, Content-Length headers) need. Get it by encoding first:

use Encode;
my $s     = "na\x{EF}ve";          # five characters
my $chars = length $s;              # 5
my $bytes = length(encode('UTF-8', $s));  # 6

See Encode and perlunicode for the full story on how Perl strings carry an internal character-vs-byte flag and when that flag matters.
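To see when that internal flag does and does not matter to length, a minimal sketch builds two copies of the same four-character string and upgrades one with the core utf8::upgrade function, so that copy is stored internally as UTF-8. Plain length reports 4 for both. Only inside a use bytes scope, where length measures the internal buffer, do the two disagree, which is why use bytes is not a reliable way to get a byte count.

```perl
use strict;
use warnings;

my $plain    = "caf\x{E9}";    # four characters, stored internally as Latin-1 bytes
my $upgraded = $plain;
utf8::upgrade($upgraded);      # same four characters, stored internally as UTF-8

# Normal length ignores the internal representation.
my @chars = (length $plain, length $upgraded);    # (4, 4)

# Under "use bytes", length counts the internal buffer instead.
my @raw;
{
    use bytes;
    @raw = (length $plain, length $upgraded);     # (4, 5)
}

print "chars: @chars\nunder use bytes: @raw\n";
print "same string? ", ($plain eq $upgraded ? "yes" : "no"), "\n";   # yes
```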

Examples

Basic count:

length "hello"                      # 5
length ""                           # 0

Default argument is $_:

for ("a", "bb", "ccc") {
    print length, "\n";             # 1, 2, 3
}

Distinguishing undefined from empty:

my $x;
my $y = "";
print defined(length $x) ? "set" : "undef", "\n";   # undef
print defined(length $y) ? "set" : "undef", "\n";   # set

A Unicode string — character count vs byte count:

use Encode;
my $s = "caf\x{E9}";                # four characters, one of them non-ASCII
length $s;                          # 4
length encode('UTF-8', $s);         # 5
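When loading Encode is not wanted, the core utf8::encode function offers another route: it converts a string's characters to UTF-8 octets in place, after which length counts octets. A minimal sketch, assuming a hypothetical helper named utf8_byte_length that works on a copy so the caller's string is untouched; Encode is loaded here only to cross-check the answer. Note that utf8::encode produces Perl's own extended UTF-8, which matches standard UTF-8 for ordinary Unicode text but not for surrogates or codepoints above U+10FFFF.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: UTF-8 byte length without modifying the caller's string.
sub utf8_byte_length {
    my ($s) = @_;          # a copy, so the in-place encode below stays local
    utf8::encode($s);      # characters -> UTF-8 octets (core function, no module needed)
    return length $s;      # each octet now counts as one character
}

for my $s ("caf\x{E9}", "na\x{EF}ve", "\x{263A}") {
    printf "%d chars, %d bytes (Encode agrees: %d)\n",
        length $s, utf8_byte_length($s), length encode('UTF-8', $s);
}
```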

Numbers are stringified before counting, which is occasionally useful and occasionally surprising:

length 12345                        # 5
length 3.14                         # 4  ("3.14")
length 1e6                          # 7  ("1000000")
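Because the count is taken on the string form, a minus sign or a decimal point counts as a character too. A minimal sketch, using a hypothetical digit_count helper that drops the sign with abs before counting, and sprintf to pin down a float's string form before measuring it. It assumes integers small enough to stringify without an exponent.

```perl
use strict;
use warnings;

# Hypothetical helper: count the decimal digits of an integer.
# abs() drops the leading "-" that length would otherwise count.
# Assumes integers small enough to stringify without an exponent.
sub digit_count {
    my ($n) = @_;
    return length(abs($n));
}

my $with_sign = length(-42);                        # 3: "-42" includes the sign
my $digits    = digit_count(-42);                   # 2
my $fixed     = length(sprintf('%.2f', 3.14159));   # 4: "3.14"

print "$with_sign $digits $fixed\n";
```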

Edge cases

  • undef in, undef out. length(undef) returns undef, not 0. Under use warnings, length $x with an undefined $x does not emit an "uninitialized" warning, unlike most other operators. Using that undef result in arithmetic or a numeric comparison (length($x) > 0, say) does still warn.

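  • Whole array or hash: length does not count elements. It evaluates its argument in scalar context and stringifies the result, so length @arr takes the element count of @arr and counts the digits of that number (length %hash does the same with the key count). Under use warnings Perl flags this at compile time with a message suggesting scalar(@arr). For the actual size use scalar @arr or scalar keys %hash: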

    my @arr = (1) x 100;
    length @arr;                      # 3  (digits of "100")
    scalar @arr;                      # 100
    
  • Tied or magical scalars: length triggers FETCH, so the counted value is whatever the tie layer returns at that moment.

  • Overloaded objects: length $obj invokes the "" stringification overload (or falls back to the default reference form) and counts characters in the resulting string.

  • Undecoded byte input: a string read from a handle without an :encoding layer is a byte string, so length counts bytes, because each byte is one character to Perl. Decode first if you want character counts:

    use Encode;
    open my $fh, "<:raw", $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };
    my $text  = decode('UTF-8', $bytes);
    length $bytes;                    # byte count
    length $text;                     # character count
    
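  • Locale and pragmas: unlike lc, uc and fc, length does not vary under use locale; a character is a character regardless. The pragma that does change it is use bytes: inside its scope, length returns the size of the string's internal buffer in bytes, which depends on whether Perl happens to store that string internally as UTF-8. To get the UTF-8 byte count of a string, encode it instead.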
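The tie and overload bullets above can be checked with a small sketch. Both classes below, Tagged and Growing, are hypothetical and exist only for illustration: Tagged overloads "" so length measures its string form, and Growing is a tied scalar whose FETCH returns a longer string on every call, so two consecutive length calls on the same variable give different answers.

```perl
use strict;
use warnings;

# Hypothetical class whose objects stringify as "<name>".
package Tagged {
    use overload '""' => sub { '<' . $_[0]{name} . '>' }, fallback => 1;
    sub new { my ($class, $name) = @_; return bless { name => $name }, $class }
}

# Hypothetical tied scalar: every FETCH returns a string one "x" longer.
package Growing {
    sub TIESCALAR { my ($class) = @_; my $calls = 0; return bless \$calls, $class }
    sub FETCH     { my ($self) = @_; $$self++; return 'x' x $$self }
    sub STORE     { return }    # assignments are ignored in this sketch
}

package main;

my $obj     = Tagged->new('abc');
my $obj_len = length $obj;    # 5: length of the overloaded string "<abc>"

tie my $t, 'Growing';
my $first  = length $t;       # FETCH runs and returns some x's
my $second = length $t;       # FETCH runs again and returns a longer string

print "overloaded: $obj_len, tied: $first then $second\n";
```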

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • scalar — force scalar context; the correct way to get an element count from an array (scalar @arr) instead of the length of its stringified count

  • substr — extract a substring by character offset; pairs naturally with length for slicing work

  • index — find the position of a substring; returns a character offset measured the same way length measures

  • sprintf, whose %s field widths and precisions are counted in characters, the same notion as length

  • reverse — in scalar context reverses a string character-by-character; same character model as length

  • pos — regex-match position, also in characters
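To illustrate the shared character model, a minimal sketch that measures, searches, slices and pads one string containing non-ASCII characters; every count and offset below is in characters. The truncate_chars helper is hypothetical, not a core function; it shows length and substr working together on character offsets.

```perl
use strict;
use warnings;

my $s = "r\x{E9}sum\x{E9} \x{263A}";   # eight characters, twelve bytes in UTF-8

# Hypothetical helper: keep at most $max characters (assumes $max >= 1),
# replacing the tail with a single U+2026 ellipsis character when it cuts.
sub truncate_chars {
    my ($str, $max) = @_;
    return $str if length($str) <= $max;
    return substr($str, 0, $max - 1) . "\x{2026}";
}

my $len    = length $s;                # 8
my $pos    = index($s, "\x{263A}");    # 7: a character offset, not a byte offset
my $head   = substr($s, 0, 6);         # "r\x{E9}sum\x{E9}", six characters
my $padded = sprintf('%-10s|', $s);    # padded to 10 characters, then "|"
my $short  = truncate_chars($s, 4);    # "r\x{E9}s" plus the ellipsis

# Print counts only, so no output encoding layer is needed.
printf "length %d, index %d, padded %d, truncated %d\n",
    $len, $pos, length $padded, length $short;
```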