---
name: length
signature: 'length EXPR'
since: 5.0
status: documented
categories: ["SCALARs and strings"]
---

```{index} single: length; Perl built-in
```

*[SCALARs and strings](../perlfunc-by-category)*

# length

Return the number of characters in a string.

`length` stringifies its argument and returns how many **characters** that string contains. "Characters" means logical Perl characters, not bytes. A string holding ten codepoints has length `10`, regardless of how many bytes those codepoints occupy when encoded as UTF-8 on disk or on the wire.

## Synopsis

```perl
length EXPR
length
```

## What you get back

A non-negative integer: the count of characters in the stringified value of `EXPR`. If `EXPR` is omitted, `length` operates on [`$_`](../perlvar).

If `EXPR` is [`undef`](undef), the result is [`undef`](undef), not `0`. This is the one case where `length` does not return a number, and it is deliberate. Because `length $x` is defined exactly when `$x` is, one call tells an undefined scalar apart from an empty string, with no separate [`defined`](defined) check on the variable itself:

```perl
length(undef)   # undef
length("")      # 0
length("abc")   # 3
```

## Characters, not bytes

`length` counts characters. The byte count of the same string encoded as UTF-8 differs whenever the string contains non-ASCII characters, and it is the number external systems (file sizes, network payloads, `Content-Length` headers) want. Get it by encoding first:

```perl
use Encode;

my $s = "na\x{EF}ve";                       # five characters
my $chars = length $s;                      # 5
my $bytes = length(encode('UTF-8', $s));    # 6
```

See [`Encode`](../../Encode) and `perlunicode` for the full story on how Perl strings carry an internal character-vs-byte flag and when that flag matters.

## Examples

Basic count:

```perl
length "hello"   # 5
length ""        # 0
```

Default argument is [`$_`](../perlvar):

```perl
for ("a", "bb", "ccc") {
    print length, "\n";    # 1, 2, 3
}
```

Distinguishing undefined from empty:

```perl
my $x;
my $y = "";
print defined(length $x) ? "set" : "undef", "\n";   # undef
print defined(length $y) ? "set" : "undef", "\n";   # set
```

A Unicode string, character count vs byte count:

```perl
use Encode;

my $s = "caf\x{E9}";           # four characters, one of them non-ASCII
length $s;                     # 4
length encode('UTF-8', $s);    # 5
```

Numbers are stringified before counting, which is occasionally useful and occasionally surprising:

```perl
length 12345   # 5
length 3.14    # 4 ("3.14")
length 1e6     # 7 ("1000000")
```

## Edge cases

- **[`undef`](undef) in, [`undef`](undef) out.** `length(undef)` returns [`undef`](undef), not `0`. Under `use warnings`, `length $x` with an undefined `$x` does **not** emit an `uninitialized` warning, unlike most other operators.
- **Whole array or hash**: `length` does not count elements. It stringifies its argument, so `length @arr` first evaluates `@arr` in scalar context (its element count) and then counts the digits of that number. For the actual size use `scalar @arr` or `scalar keys %hash`:

  ```perl
  my @arr = (1) x 100;
  length @arr;   # 3 (digits of "100")
  scalar @arr;   # 100
  ```

- **Tied or magical scalars**: `length` triggers FETCH, so the counted value is whatever the tie layer returns at that moment.
- **Overloaded objects**: `length $obj` invokes the `""` stringification overload (or falls back to the default reference form) and counts characters in the resulting string.
- **Byte strings from unlayered input**: a string read from a handle without an encoding layer is a byte string. `length` then counts bytes, because each byte is one character to Perl. Decode first if you want character counts:

  ```perl
  use Encode qw(decode);

  open my $fh, "<:raw", $path or die $!;
  my $bytes = do { local $/; <$fh> };
  my $text  = decode('UTF-8', $bytes);
  length $bytes;   # byte count
  length $text;    # character count
  ```

- **Locale and pragmas**: unlike [`lc`](lc) / [`uc`](uc) / [`fc`](fc), `length` does not vary with `use locale`. A character is a character regardless of locale.

## Differences from upstream

Fully compatible with upstream Perl 5.42.

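## Sketch: trimming a string to a byte budget

The character/byte split matters most when an outside limit is stated in bytes but the cut must fall on a character boundary. What follows is a minimal sketch, not a library routine: `truncate_to_bytes` is a hypothetical helper written for this page, and it assumes the target encoding is UTF-8. It uses `length` for the character count, [`substr`](substr) to take prefixes, and `Encode::encode` to measure each prefix in bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not a Perl built-in): return the longest prefix of
# $text whose UTF-8 encoding fits in $max_bytes bytes. It drops one whole
# character per step, so it never splits a multi-byte character.
sub truncate_to_bytes {
    my ($text, $max_bytes) = @_;
    my $n = length $text;    # start from the full character count
    $n-- while $n > 0
        && length(encode('UTF-8', substr($text, 0, $n))) > $max_bytes;
    return substr($text, 0, $n);
}

my $s = "caf\x{E9} au lait";    # 12 characters, 13 bytes as UTF-8
my $t = truncate_to_bytes($s, 4);
printf "%d chars, %d bytes\n", length $t, length encode('UTF-8', $t);
# prints: 3 chars, 3 bytes ("caf\x{E9}" would need 5 bytes)
```

Shrinking one character per step keeps the logic easy to check but re-encodes the prefix every time. Because a prefix's byte length never decreases as the prefix grows, a binary search over the character count would need far fewer `encode` calls on long strings.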
## See also

- [`scalar`](scalar): force scalar context; the correct way to get an element count from an array (`scalar @arr`) instead of the length of its stringified count
- [`substr`](substr): extract a substring by character offset; pairs naturally with `length` for slicing work
- [`index`](index): find the position of a substring; returns a character offset measured the same way `length` measures
- [`sprintf`](sprintf): `%s` field widths are character counts, the same notion as `length`
- [`reverse`](reverse): in scalar context reverses a string character by character, using the same character model as `length`
- [`pos`](pos): regex-match position, also in characters