SCALARs and strings

substr#

Extract, replace, or alias a contiguous slice of a string.

substr picks out a run of characters from EXPR starting at OFFSET and spanning LENGTH characters, and returns that slice. The same call can also modify the original string in three distinct ways: as an lvalue on the left of =, through the four-argument REPLACEMENT form, or through an aliased lvalue stored in a variable that keeps tracking the slice across subsequent writes. Character zero is the first character. Negative OFFSET counts from the end; negative LENGTH stops that many characters short of the end. If LENGTH is omitted, the slice runs to the end of the string.

Synopsis#

substr EXPR, OFFSET
substr EXPR, OFFSET, LENGTH
substr EXPR, OFFSET, LENGTH, REPLACEMENT
substr(EXPR, OFFSET, LENGTH) = NEWVALUE       # lvalue form

What you get back#

In rvalue position, a new scalar containing the extracted slice. In lvalue position, a magic scalar that, when assigned to, splices the new value into EXPR at the recorded position — the returned scalar does not hold a copy of the original characters. The four-argument form assigns REPLACEMENT into EXPR in place and returns the substring that was there before, so the extract and the replace happen in one call.

If the requested span lies partially outside the string, only the in-bounds portion comes back. If it lies entirely outside the string, the rvalue forms return undef and warn; the lvalue forms raise an exception.

Global state it touches#

None. substr is pure with respect to interpreter globals. It does not read or write $_, $/, $\, $!, or any other special variable. Warnings about out-of-range access are governed by the usual use warnings / $^W scope.
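As a rough illustration of the warning scope described above, the sketch below counts warnings with and without the `substr` warnings category enabled. The `$SIG{__WARN__}` collector and the variable names are only scaffolding for the demonstration, not part of substr itself.

```perl
use strict;
use warnings;

my @seen;                                     # collected warning messages
local $SIG{__WARN__} = sub { push @seen, $_[0] };

my $s = "abc";

my $loud = substr($s, 10);                    # entirely out of range: warns
my $count_with = @seen;

{
    no warnings 'substr';                     # silence only this category
    my $quiet = substr($s, 10);               # still undef, but no warning
}
my $count_without = @seen - $count_with;

print "warnings with category on:  $count_with\n";
print "warnings with category off: $count_without\n";
```

Disabling just the one category in a small block keeps the rest of `use warnings` active for the surrounding code.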

Positions and negative indices#

OFFSET is measured from the start of the string when non-negative, and from the end when negative: -1 is the position of the last character, -2 the second-to-last, and so on.

LENGTH, when non-negative, is the number of characters to include. When negative, it is a right-anchor: the slice stops that many characters short of the end of the string. Omitting LENGTH is the same as “to the end.”

my $s = "The black cat climbed the green tree";
my $color  = substr $s, 4, 5;       # "black"
my $middle = substr $s, 4, -11;     # "black cat climbed the"
my $end    = substr $s, 14;         # "climbed the green tree"
my $tail   = substr $s, -4;         # "tree"
my $z      = substr $s, -4, 2;      # "tr"

Using substr as an lvalue#

If EXPR is itself an lvalue, substr(EXPR, OFFSET, LENGTH) can appear on the left-hand side of an assignment. The right-hand value replaces the slice. The replacement does not have to match LENGTH: a shorter replacement shortens the string, a longer one extends it.

my $name = 'fred';
substr($name, 0, 2) = 'SH';         # $name is now "SHed"
substr($name, 1, 2) = 'ayfiel';     # $name is now "Sayfield"

Assigning at an OFFSET equal to the current length of the string is allowed and appends to it. An OFFSET beyond that raises an exception.

my $name = 'fred';
substr($name, 4) = 'dy';            # $name is now "freddy"
substr($name, 7) = 'gap';           # exception
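When the offset comes from untrusted input, one possible way to survive the exception is to wrap the assignment in eval. This is a minimal sketch; the `$ok` and `$err` names are illustrative only.

```perl
use strict;
use warnings;

my $name = 'fred';

# Appending at OFFSET == length is allowed.
substr($name, length($name)) = 'dy';          # $name is now "freddy"

# Writing beyond that dies, so trap it and leave the string untouched.
my $ok = eval {
    substr($name, 7) = 'gap';                 # offset 7 > length 6
    1;
};
my $err = $@;

print "name: $name\n";                        # freddy
print "assignment failed: $err" unless $ok;
```

Checking the offset against `length` before assigning avoids the exception entirely and is usually the simpler choice.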

To keep the string length fixed, pad or truncate the replacement explicitly, for example with sprintf "%-*.*s", $len, $len, $val. The width pads short values and the precision cuts long ones to $len characters.
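A minimal sketch of fixed-width replacement follows. The helper name `set_field` and the record layout are hypothetical; the format uses both a width and a precision so that short values are padded and long ones are cut.

```perl
use strict;
use warnings;

# Hypothetical helper: overwrite a fixed-width field without changing
# the total length of the record.
sub set_field {
    my ($record_ref, $offset, $width, $value) = @_;
    # "%-*.*s": left-justify, pad to $width, drop anything beyond $width.
    substr($$record_ref, $offset, $width)
        = sprintf("%-*.*s", $width, $width, $value);
    return;
}

my $record = "id:0001 name:fred      ";       # name field: offset 13, width 10
my $before = length $record;

set_field(\$record, 13, 10, "al");                  # shorter value: padded
print "[$record]\n";
set_field(\$record, 13, 10, "bartholomew-jones");   # longer value: truncated
print "[$record]\n";
```

Passing the record by reference keeps the helper from copying the string, and the record length stays the same after both calls.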

The four-argument replacement form#

substr EXPR, OFFSET, LENGTH, REPLACEMENT replaces the slice in place and returns the substring that was there. This is usually preferable to the lvalue form: the operation is a single expression, and the old value is available as the return value.

my $s = "The black cat climbed the green tree";
my $z = substr $s, 14, 7, "jumped from";    # $z is "climbed"
# $s is now "The black cat jumped from the green tree"

This mirrors the extract-and-replace idiom of splice on arrays.
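To make the parallel concrete, here is a small side-by-side sketch with made-up data: the same offset, length, and replacement applied to a string with substr and to an array with splice.

```perl
use strict;
use warnings;

my $str   = "abcdef";
my @chars = ('a' .. 'f');

# Both calls replace two elements starting at index 2
# and return what was removed.
my $old_str   = substr($str, 2, 2, "XYZ");
my @old_chars = splice(@chars, 2, 2, qw(X Y Z));

print "$old_str -> $str\n";                                   # cd -> abXYZef
print join('', @old_chars), " -> ", join('', @chars), "\n";   # cd -> abXYZef
```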

The aliased-lvalue trick#

The scalar returned by the three-argument form is not a copy of the characters: it is a magic lvalue that remembers which region of the original string it refers to. Keeping hold of that lvalue (by aliasing it to $_ in a for loop, or by taking a reference to it with \) lets repeated writes keep splicing into the same slot of the original string, even as that string changes size:

my $x = '1234';
for (substr($x, 1, 2)) {
    $_ = 'a';    print $x, "\n";    # 1a4
    $_ = 'xyz';  print $x, "\n";    # 1xyz4
    $x = '56789';
    $_ = 'pq';   print $x, "\n";    # 5pq9
}

With a negative OFFSET, the alias tracks the end of the string:

my $x = '1234';
for (substr($x, -3, 2)) {
    $_ = 'a';    print $x, "\n";    # 1a4
    $x = 'abcdefg';
    print $_, "\n";                 # f
}

This is useful when repeatedly rewriting the same slot of a buffer (header fields in a fixed-layout record, for instance) — but it is also a classic source of surprises. Prefer the four-argument form unless you specifically want the aliasing.
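The sketch below shows both sides of that trade-off on an invented header layout: same-width writes through the alias keep the record intact, while a shorter write shrinks the slot and shifts everything after it. The field names and offsets are assumptions made for the example.

```perl
use strict;
use warnings;

my $header = "SEQ=0000 LEN=0000";
my $seq    = \substr($header, 4, 4);      # alias for the 4-character SEQ field

# Same-width writes leave the layout intact.
$$seq = sprintf("%04d", $_) for 1 .. 3;
my $after_fixed = $header;
print "$after_fixed\n";                   # SEQ=0003 LEN=0000

# A shorter write shrinks the slot, and the LEN field shifts left.
$$seq = "7";
my $after_short = $header;
print "$after_short\n";                   # SEQ=7 LEN=0000
```

Formatting every value to the field width before writing through the alias is what keeps the later fields at their expected offsets.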

Examples#

Extract a fixed field from a record:

my $line = "2026-04-22  richard.jelinek\@example.com";
my $date = substr $line, 0, 10;         # "2026-04-22"

Strip a leading prefix of known length:

my $msg = "ERROR: disk full";
my $rest = substr $msg, length("ERROR: ");   # "disk full"

Overwrite a fixed field in place with the four-argument form, and keep the old value:

my $record = "name:       John      age: 42";
my $old    = substr $record, 12, 10, sprintf("%-10s", "Jane");
# $record is now "name:       Jane      age: 42"
# $old is "John      "

Walk a string in windows of three characters without copying the whole string each time:

my $s = "ABCDEFGH";
for (my $i = 0; $i + 3 <= length $s; $i++) {
    print substr($s, $i, 3), "\n";  # ABC BCD CDE DEF ...
}

Rewrite the last character of a buffer repeatedly via an aliased lvalue:

my $buf = "packet: 0";
my $tail = \substr($buf, -1, 1);
$$tail = '1';  # $buf is now "packet: 1"
$$tail = '2';  # $buf is now "packet: 2"

Edge cases#

  • Out-of-range rvalue, partially in bounds: returns just the in-bounds portion, no warning.

    my $s = "abcd";
    substr($s, 2, 100);                 # "cd"
    
  • Out-of-range rvalue, entirely past the end: returns undef and warns under use warnings. The empty-tail case (exactly at the end) returns "" without a warning:

    my $name = 'fred';
    my $null = substr $name, 4, 2;      # "" (no warning)
    my $oops = substr $name, 7;         # undef, with warning
    
  • Out-of-range lvalue: raises an exception. substr refuses to silently extend the string by a gap of unspecified contents.

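  • Negative OFFSET larger than the string: the start lies before the beginning of the string. If the requested span still overlaps the string, the start is clamped to position 0 and only the in-bounds portion is returned; if the span ends before position 0, the call counts as entirely outside the string and follows the out-of-range rules above. Prefer to clamp explicitly rather than rely on this.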

  • Negative LENGTH that exceeds the remaining tail: yields an empty string, because the right anchor falls at or before the left anchor.

    my $s = "abcde";
    substr($s, 1, -10);                 # ""
    
  • Unicode: substr operates in characters, not bytes. On a decoded string, positions count code points; on a string of undecoded bytes, positions count bytes. Indexing undecoded UTF-8 bytes as if they were characters produces wrong offsets and can split a multi-byte sequence, so decode first, then index.

  • Tied or magical scalars: lvalue substr honours STORE-time magic on the target, so tied variables see the mutated whole string, not the slice.

  • Aliased lvalue kept beyond its target’s lifetime: the alias holds its own reference to the target scalar, so if the original variable goes out of scope while the alias is still alive, later writes through the alias still succeed but modify a string that nothing else can reach. Keep the alias lifetime inside the target’s.

  • Pre-5.10 corner: writing repeatedly through a single lvalue had unspecified behaviour before Perl 5.10, and aliased lvalues with negative offsets were unspecified before 5.16. Modern Perl (and pperl) follow the semantics documented on this page.
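For the Unicode case in the list above, a short sketch of the difference between indexing bytes and indexing characters may help. It assumes the input arrives as UTF-8 octets and uses the core Encode module; the sample text is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "é" is one character but two bytes in UTF-8.
my $bytes = "caf\xC3\xA9 au lait";            # raw UTF-8 octets, e.g. from a file
my $chars = decode('UTF-8', $bytes);          # decoded character string

my $from_bytes = substr($bytes, 0, 4);        # "caf" plus the first half of "é"
my $from_chars = substr($chars, 0, 4);        # "caf" plus the whole "é"

printf "byte string:      length %d\n", length($bytes);
printf "character string: length %d\n", length($chars);
printf "4-unit slice of bytes ends in 0x%02X\n", ord(substr($from_bytes, -1));
printf "4-unit slice of chars ends in U+%04X\n", ord(substr($from_chars, -1));
```

The same offset and length select different text in the two strings, which is why decoding should happen before any position arithmetic.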

Differences from upstream#

Fully compatible with upstream Perl 5.42.

See also#

  • index — find the position of a substring, to feed into substr as OFFSET

  • rindex — same, searching from the right

  • length — the upper bound on any meaningful OFFSET or LENGTH you pass to substr

  • splice — the array-level analogue of the four-argument replacement form

  • pack / unpack — for fixed-layout records where many fields are read or rewritten in one shot, prefer these over a chain of substr calls

  • tr/// — bulk character-level substitution that doesn’t need positional indices