---
name: substr
signatures:
- 'substr EXPR, OFFSET'
- 'substr EXPR, OFFSET, LENGTH'
- 'substr EXPR, OFFSET, LENGTH, REPLACEMENT'
- 'substr EXPR, OFFSET, LENGTH (as lvalue)'
since: 5.0
status: documented
categories: ["SCALARs and strings"]
---

```{index}
single: substr; Perl built-in
```

*[SCALARs and strings](../perlfunc-by-category)*

# substr

Extract, replace, or alias a contiguous slice of a string.

`substr` picks out a run of characters from `EXPR` starting at `OFFSET` and spanning `LENGTH` characters, and returns that slice. The same call can also *modify* the original string in three distinct ways: as an lvalue on the left of `=`, through the four-argument `REPLACEMENT` form, or through an aliased lvalue stored in a variable that keeps tracking the slice across subsequent writes.

Character zero is the first character. Negative `OFFSET` counts from the end; negative `LENGTH` stops that many characters short of the end. If `LENGTH` is omitted, the slice runs to the end of the string.

## Synopsis

```perl
substr EXPR, OFFSET
substr EXPR, OFFSET, LENGTH
substr EXPR, OFFSET, LENGTH, REPLACEMENT
substr(EXPR, OFFSET, LENGTH) = NEWVALUE    # lvalue form
```

## What you get back

In rvalue position, a new scalar containing the extracted slice. In lvalue position, a magic scalar that, when assigned to, splices the new value into `EXPR` at the recorded position; the returned scalar does **not** hold a copy of the original characters.

The four-argument form assigns `REPLACEMENT` into `EXPR` in place and returns the substring that was there before, so the extract and the replace happen in one call.

If the requested span lies partially outside the string, only the in-bounds portion comes back. If it lies entirely outside the string, the rvalue forms return [`undef`](undef) and warn; the lvalue forms raise an exception.

## Global state it touches

None. `substr` is pure with respect to interpreter globals.
It does not read or write [`$_`](../perlvar), [`$/`](../perlvar), [`$\`](../perlvar), [`$!`](../perlvar), or any other special variable. Warnings about out-of-range access are governed by the usual `use warnings` / `$^W` scope.

## Positions and negative indices

`OFFSET` is measured from the start of the string when non-negative, and from the end when negative: `-1` is the position of the last character, `-2` the second-to-last, and so on.

`LENGTH`, when non-negative, is the number of characters to include. When negative, it is a right-anchor: the slice stops that many characters short of the end of the string. Omitting `LENGTH` is the same as "to the end."

```perl
my $s = "The black cat climbed the green tree";
my $color  = substr $s, 4, 5;      # "black"
my $middle = substr $s, 4, -11;    # "black cat climbed the"
my $end    = substr $s, 14;        # "climbed the green tree"
my $tail   = substr $s, -4;        # "tree"
my $z      = substr $s, -4, 2;     # "tr"
```

## Using substr as an lvalue

If `EXPR` is itself an lvalue, `substr(EXPR, OFFSET, LENGTH)` can appear on the left-hand side of an assignment. The right-hand value replaces the slice. The replacement does not have to match `LENGTH`: a shorter replacement shortens the string, a longer one extends it.

```perl
my $name = 'fred';
substr($name, 0, 2) = 'SH';        # $name is now "SHed"
substr($name, 1, 2) = 'ayfiel';    # $name is now "Sayfield"
```

Assigning at an `OFFSET` equal to the current length of the string is allowed and appends to it. An `OFFSET` beyond that raises an exception.

```perl
my $name = 'fred';
substr($name, 4) = 'dy';     # $name is now "freddy"
substr($name, 7) = 'gap';    # exception: 7 is past the end of "freddy"
```

To keep the string length fixed, pad or truncate the replacement explicitly, for example with `sprintf "%-*.*s", $len, $len, $val`, which pads short values and truncates long ones.

## The four-argument replacement form

`substr EXPR, OFFSET, LENGTH, REPLACEMENT` replaces the slice in place and returns the substring that *was* there.
This is usually preferable to the lvalue form: the operation is a single expression, and the old value is available as the return value.

```perl
my $s = "The black cat climbed the green tree";
my $z = substr $s, 14, 7, "jumped from";
# $z is "climbed"
# $s is now "The black cat jumped from the green tree"
```

This mirrors the extract-and-replace idiom of [`splice`](splice) on arrays.

## The aliased-lvalue trick

The scalar returned by the three-argument form is **not** a copy of the characters; it is a "magic bullet" that remembers which region of the original string it refers to. Aliasing that lvalue (with `for`, or by taking a reference to it with `\`) lets repeated writes keep splicing into the same slot of the original string, even as that string changes size:

```perl
my $x = '1234';
for (substr($x, 1, 2)) {
    $_ = 'a';   print $x, "\n";    # 1a4
    $_ = 'xyz'; print $x, "\n";    # 1xyz4
    $x = '56789';
    $_ = 'pq';  print $x, "\n";    # 5pq9
}
```

With a negative `OFFSET`, the alias remembers its position relative to the end of the string:

```perl
my $x = '1234';
for (substr($x, -3, 2)) {
    $_ = 'a';   print $x, "\n";    # 1a4
    $x = 'abcdefg';
    print $_, "\n";                # f
}
```

This is useful when repeatedly rewriting the same slot of a buffer (header fields in a fixed-layout record, for instance), but it is also a classic source of surprises. Prefer the four-argument form unless you specifically want the aliasing.
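
As a rough sketch of that advice, the four-argument form can be wrapped so that a fixed-width field is overwritten without changing the length of the buffer. The helper name `set_field`, the record layout, and the field values below are assumptions made up for illustration; they are not part of Perl or of this page's API.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl built-in): overwrite a fixed-width field
# in the string referenced by $buf_ref and return the text that was there.
sub set_field {
    my ($buf_ref, $offset, $width, $value) = @_;

    # "%-*.*s" left-justifies, pads to $width, and truncates to $width,
    # so the replacement is always exactly $width characters long.
    my $padded = sprintf "%-*.*s", $width, $width, $value;

    # Four-argument substr: replace in place, return the old slice.
    return substr $$buf_ref, $offset, $width, $padded;
}

# Assumed layout: 6-character id, 10-character status, 3-character trailer.
my $record = "ID0042" . "pending   " . "EOL";
my $old    = set_field(\$record, 6, 10, "done");

print "[$old]\n";       # [pending   ]
print "[$record]\n";    # [ID0042done      EOL]
```

Because the replacement is padded and truncated to the field width, the record keeps its length and no alias has to outlive the call.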
## Examples

Extract a fixed field from a record:

```perl
my $line = "2026-04-22 richard.jelinek\@example.com";
my $date = substr $line, 0, 10;    # "2026-04-22"
```

Strip a leading prefix of known length:

```perl
my $msg  = "ERROR: disk full";
my $rest = substr $msg, length("ERROR: ");    # "disk full"
```

Overwrite a fixed field in place with the four-argument form, and keep the old value:

```perl
# Layout: 12-character label, 10-character name field, then the rest.
my $record = "name:       John      age: 42";
my $old = substr $record, 12, 10, sprintf("%-10s", "Jane");
# $record is now "name:       Jane      age: 42"
# $old is "John      "
```

Walk a string in windows of three characters without copying the whole string each time:

```perl
my $s = "ABCDEFGH";
for (my $i = 0; $i + 3 <= length $s; $i++) {
    print substr($s, $i, 3), "\n";    # ABC BCD CDE DEF ...
}
```

Rewrite the last character of a buffer repeatedly via an aliased lvalue:

```perl
my $buf  = "packet: 0";
my $tail = \substr($buf, -1, 1);
$$tail = '1';    # $buf is now "packet: 1"
$$tail = '2';    # $buf is now "packet: 2"
```

## Edge cases

- **Out-of-range rvalue, partially in bounds**: returns just the in-bounds portion, no warning.

  ```perl
  my $s = "abcd";
  substr($s, 2, 100);    # "cd"
  ```

- **Out-of-range rvalue, entirely past the end**: returns [`undef`](undef) and warns under `use warnings`. The empty-tail case (exactly at the end) returns `""` without a warning:

  ```perl
  my $name = 'fred';
  my $null = substr $name, 4, 2;    # "" (no warning)
  my $oops = substr $name, 7;       # undef, with warning
  ```

- **Out-of-range lvalue**: raises an exception. `substr` refuses to silently extend the string by a gap of unspecified contents.
- **Negative `OFFSET` larger than the string**: counts as "before the start". In current Perls the start is clamped to position `0` when the span still reaches into the string; a span that lies entirely before the start is treated as out of range and, under `use warnings`, triggers a "substr outside of string" warning. Prefer to clamp explicitly.
- **Negative `LENGTH` that exceeds the remaining tail**: yields an empty string, because the right anchor falls at or before the left anchor.
  ```perl
  my $s = "abcde";
  substr($s, 1, -10);    # ""
  ```

- **Unicode**: `substr` operates in characters, not bytes. On a string with the UTF-8 flag on, positions count codepoints; on a byte string, positions count bytes. Indexing undecoded UTF-8 bytes as if they were characters produces wrong offsets, so decode first, then index.
- **Tied or magical scalars**: lvalue `substr` honours `STORE`-time magic on the target, so tied variables see the mutated whole string, not the slice.
- **Aliased lvalue kept beyond its target's lifetime**: if the original variable goes out of scope while the alias is still alive, later writes through the alias act on a freed string; keep the alias lifetime inside the target's.
- **Pre-5.10 corner**: repeated writes through a single lvalue had unspecified behaviour before Perl 5.10, as did negative offsets before 5.16. Modern Perl (and pperl) follow the semantics documented on this page.

## Differences from upstream

Fully compatible with upstream Perl 5.42.

## See also

- [`index`](index): find the position of a substring, to feed into `substr` as `OFFSET`
- [`rindex`](rindex): same, searching from the right
- [`length`](length): the upper bound on any meaningful `OFFSET` or `LENGTH` you pass to `substr`
- [`splice`](splice): the array-level analogue of the four-argument replacement form
- [`pack`](pack) / [`unpack`](unpack): for fixed-layout records where many fields are read or rewritten in one shot, prefer these over a chain of `substr` calls
- [`tr///`](../perlop): bulk character-level substitution that doesn't need positional indices