length#
Return the number of characters in a string.
length stringifies its argument and returns how many characters
that string contains. “Characters” means logical Perl characters, not
bytes: a string holding ten codepoints has length 10 regardless of
how many bytes those codepoints occupy when encoded as UTF-8 on
disk or on the wire.
Synopsis#
length EXPR
length
What you get back#
A non-negative integer — the count of characters in the stringified
value of EXPR. If EXPR is omitted, length operates on $_. If
EXPR is undef, the result is undef, not 0; this is the one
case where length does not return a number, and it is deliberate —
it lets you distinguish an undefined scalar from an empty string
without a separate defined check:
length(undef) # undef
length("") # 0
length("abc") # 3
Characters, not bytes#
On a normal Perl string, length counts characters. The byte count of
the same string encoded as UTF-8 is generally different and is what
external systems (file sizes, network payloads, Content-Length
headers) want. Get it by encoding first:
use Encode;
my $s = "na\x{EF}ve"; # five characters
my $chars = length $s; # 5
my $bytes = length(encode('UTF-8', $s)); # 6
See Encode and perlunicode for
the full story on how Perl strings carry an internal
character-vs-byte flag and when that flag matters.
Examples#
Basic count:
length "hello" # 5
length "" # 0
Default argument is $_:
for ("a", "bb", "ccc") {
print length, "\n"; # 1, 2, 3
}
Distinguishing undefined from empty:
my $x;
my $y = "";
print defined(length $x) ? "set" : "undef", "\n"; # undef
print defined(length $y) ? "set" : "undef", "\n"; # set
A Unicode string — character count vs byte count:
use Encode;
my $s = "caf\x{E9}"; # four characters, one of them non-ASCII
length $s; # 4
length encode('UTF-8', $s); # 5
Numbers are stringified before counting, which is occasionally useful and occasionally surprising:
length 12345 # 5
length 3.14 # 4 ("3.14")
length 1e6 # 7 ("1000000")
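A few more cases, as a rough sketch, show where the stringification can surprise: what gets counted is Perl's string form of the number, not the literal as typed. The variable names are illustrative only. Formatting with sprintf first is one way to make the counted form explicit.

```perl
use strict;
use warnings;

my $trailing_zero = length 1.50;                  # 3: the number stringifies as "1.5"
my $as_string     = length "1.50";                # 4: a string keeps every character
my $octal         = length 007;                   # 1: octal literal 7 stringifies as "7"
my $fixed         = length sprintf("%.2f", 1.5);  # 4: "1.50", format chosen explicitly
```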
Edge cases#
undef in, undef out. length(undef) returns undef, not 0. Under
use warnings, reading an undefined scalar through length $x where $x is
undef does not emit an uninitialized warning, unlike most other
operators.
Whole array or hash: length does not count elements. It stringifies its
argument, so length @arr first evaluates @arr in scalar context (its
element count) and then counts the digits of that number. For the actual
size use scalar @arr or scalar keys %hash:
my @arr = (1) x 100;
length @arr; # 3 (digits of "100")
scalar @arr; # 100
Tied or magical scalars: length triggers FETCH, so the counted value is
whatever the tie layer returns at that moment.
Overloaded objects: length $obj invokes the "" stringification overload
(or falls back to the default reference form) and counts characters in
the resulting string.
Wide characters from byte input: a string read from a handle without an
encoding layer is a byte string. length then counts bytes, because each
byte is one character to Perl. Decode first if you want character
counts:
use Encode;
open my $fh, "<:raw", $path or die $!;
my $bytes = do { local $/; <$fh> };
my $text = decode('UTF-8', $bytes);
length $bytes; # byte count
length $text; # character count
Locale and pragmas: unlike lc/uc/fc, length does not vary by use locale.
A character is a character regardless.
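The overloaded-object and tied-scalar cases can be illustrated with a short sketch. The classes `Temperature` and `CountingScalar` are hypothetical and exist only for this example; the `package NAME { ... }` block syntax assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical class whose objects stringify through the "" overload.
package Temperature {
    use overload '""' => sub { sprintf "%.1f C", $_[0]{deg} };
    sub new { my ($class, $deg) = @_; return bless { deg => $deg }, $class }
}

# Hypothetical tied scalar that records how often it is read.
package CountingScalar {
    sub TIESCALAR {
        my ($class, $value) = @_;
        return bless { value => $value, reads => 0 }, $class;
    }
    sub FETCH { my ($self) = @_; $self->{reads}++; return $self->{value} }
    sub STORE { my ($self, $value) = @_; $self->{value} = $value }
}

my $temp     = Temperature->new(21.5);
my $temp_len = length $temp;        # 6: counts the characters of "21.5 C"

my $tie_obj  = tie my $tracked, 'CountingScalar', 'hello';
my $tied_len = length $tracked;     # 5: FETCH runs, then "hello" is measured
my $reads    = $tie_obj->{reads};   # non-zero: length went through FETCH
```

Without the overload, `length $temp` would instead measure the default reference string (something of the form `Temperature=HASH(0x...)`), whose length depends on the address and is rarely what is wanted.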
Differences from upstream#
Fully compatible with upstream Perl 5.42.
See also#
scalar: force scalar context; the correct way to get an element count from an array (scalar @arr) instead of the length of its stringified count
substr: extract a substring by character offset; pairs naturally with length for slicing work
index: find the position of a substring; returns a character offset measured the same way length measures
sprintf: %s field widths are character counts, same notion as length
reverse: in scalar context reverses a string character-by-character; same character model as length
pos: regex-match position, also in characters