SCALARs and strings

index

Find the position of a substring inside a string.

index scans STR left-to-right looking for the first occurrence of SUBSTR and returns the zero-based position where it starts. No regular-expression metacharacters, no case folding, no wildcards — SUBSTR is matched literally, character for character. When the search fails, index returns -1.

Synopsis

index STR, SUBSTR
index STR, SUBSTR, POSITION

What you get back

An integer. On a match, the zero-based offset of the first character of SUBSTR inside STR. On no match, -1. The sentinel is the idiomatic way to test:

if (index($line, $needle) >= 0) { ... }   # found
if (index($line, $needle) == -1) { ... }  # not found
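
The explicit comparison matters because the two interesting return values have the "wrong" truthiness: offset 0 is false and -1 is true. A minimal sketch of the pitfall, using made-up sample strings and variable names:

```perl
use strict;
use warnings;

my $line = "error: disk full";           # sample data, purely illustrative

my $at_start = index($line, "error");    # 0  (match at the very start)
my $missing  = index($line, "timeout");  # -1 (no match)

# Treating the result as a boolean gets both cases wrong.
my $naive_found_start   = $at_start ? 1 : 0;   # 0: a real match looks like a miss
my $naive_found_missing = $missing  ? 1 : 0;   # 1: a miss looks like a match

# Comparing against the sentinel gets both right.
my $found_start   = $at_start >= 0 ? 1 : 0;    # 1
my $found_missing = $missing  >= 0 ? 1 : 0;    # 0

print "naive:    $naive_found_start $naive_found_missing\n";
print "explicit: $found_start $found_missing\n";
```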

POSITION and the return value use the same zero-based scale, so you can feed one back into the other to walk every occurrence:

my @hits;
my $pos = -1;
while (($pos = index($text, $needle, $pos + 1)) != -1) {
    push @hits, $pos;
}
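
One caveat with this loop: an empty needle never produces -1 (it matches at every offset, and past the end it keeps matching at the clamped position), so the loop would not terminate. A sketch of a wrapper that guards against that; the name all_positions and the choice to return an empty list for an empty needle are assumptions made for illustration, not part of index itself:

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which $needle starts in $text.
sub all_positions {
    my ($text, $needle) = @_;

    # An empty needle always matches, so the loop below would spin forever.
    # Returning nothing is one possible policy; pick whatever suits the caller.
    return () if !defined $needle || $needle eq '';

    my @hits;
    my $pos = -1;
    while (($pos = index($text, $needle, $pos + 1)) != -1) {
        push @hits, $pos;
    }
    return @hits;
}

my @dashes = all_positions("a-b-c", "-");   # (1, 3)
my @empty  = all_positions("a-b-c", "");    # ()
print "dashes at: @dashes\n";
```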

How POSITION is interpreted

POSITION is the earliest offset at which a match is allowed to start. It does not bound the search, which still runs to the end of STR; it only moves the starting point.

  • POSITION omitted or undef — search from offset 0.

  • POSITION negative or otherwise before the start — treated as 0.

  • POSITION past the end of STR — treated as the end, so the only way to match is if SUBSTR is the empty string (which matches at any offset, including the end).

An empty SUBSTR always matches, and matches at POSITION (clamped into range). This follows from the “first position where SUBSTR occurs” rule: the empty string occurs everywhere.

index("hello", "");      # 0
index("hello", "", 3);   # 3
index("hello", "", 99);  # 5   (clamped to end of string)
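
The clamping rules can be checked against a few lines of plain Perl. The helper clamp_position below is hypothetical; it only restates the rules listed above so they can be compared with what index returns for an empty SUBSTR:

```perl
use strict;
use warnings;

# Hypothetical helper restating the clamping rules: below 0 becomes 0,
# past the end becomes length($str), anything else is left alone.
sub clamp_position {
    my ($str, $position) = @_;
    $position = 0 unless defined $position;
    return 0 if $position < 0;
    my $len = length $str;
    return $position > $len ? $len : $position;
}

for my $p (-5, 0, 3, 99) {
    my $got = index("hello", "", $p);
    printf "%3d -> %d (clamped: %d)\n", $p, $got, clamp_position("hello", $p);
}

# A non-empty SUBSTR follows the same clamping but can still miss.
my $from_negative = index("hello", "l", -5);   # 2: searched from offset 0
my $past_end      = index("hello", "o", 99);   # -1: nothing left to match
```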

Examples

Find a single character or a whole word:

index("Perl is great", "P");     # 0
index("Perl is great", "g");     # 8
index("Perl is great", "great"); # 8

Report a miss:

index("Perl is great", "Z");     # -1

Skip past an earlier match with POSITION to find the second occurrence:

index("Perl is great", "e", 5);  # 10

Walk every occurrence of a substring:

my $s = "abcabcabc";
my $p = -1;
while (($p = index($s, "bc", $p + 1)) != -1) {
    print "hit at $p\n";
}
# hit at 1
# hit at 4
# hit at 7
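
Advancing by one character, as above, also finds overlapping occurrences. When matches must not overlap, one option is to resume just past the end of each hit instead. A short sketch contrasting the two, with an illustrative input chosen so the results differ:

```perl
use strict;
use warnings;

my $s      = "aaaa";
my $needle = "aa";

# Overlapping: resume one character after the start of each hit.
my @overlapping;
for (my $p = index($s, $needle); $p != -1; $p = index($s, $needle, $p + 1)) {
    push @overlapping, $p;
}

# Non-overlapping: resume just past the end of each hit.
my @disjoint;
for (my $p = index($s, $needle); $p != -1;
     $p = index($s, $needle, $p + length($needle))) {
    push @disjoint, $p;
}

print "overlapping: @overlapping\n";   # overlapping: 0 1 2
print "disjoint:    @disjoint\n";      # disjoint:    0 2
```

Both loops assume a non-empty needle; with an empty one neither would terminate.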

A common idiom — test for containment without building a regex:

if (index($path, "/tmp/") != -1) {
    warn "path touches /tmp";
}
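
Wrapped in a small predicate, the idiom also makes the literal-matching behaviour easy to see. The helper name contains is an assumption for illustration; the point of the sketch is that "." is just a dot to index, whereas an unescaped "." in a regex matches any character:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $haystack contains $needle, matched literally.
sub contains {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) != -1;
}

my $dotted   = contains("file.txt", ".");   # true: there is a literal dot
my $undotted = contains("filetxt",  ".");   # false: "." is not a wildcard here
my $regex    = "filetxt" =~ /./;            # true: regex "." matches any character

print "dotted=",   ($dotted   ? "yes" : "no"), "\n";
print "undotted=", ($undotted ? "yes" : "no"), "\n";
print "regex=",    ($regex    ? "yes" : "no"), "\n";
```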

Pairs naturally with substr to split on the first occurrence of a separator:

my $line = "key=value=with=equals";
my $eq   = index($line, "=");
my ($k, $v) = $eq >= 0
    ? (substr($line, 0, $eq), substr($line, $eq + 1))
    : ($line, undef);
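
The same pattern generalises to separators longer than one character, as long as the tail starts length(separator) past the match rather than one past it. A sketch under that assumption; split_first is a hypothetical name, and returning undef for the second element on a miss simply mirrors the snippet above:

```perl
use strict;
use warnings;

# Hypothetical helper: split $line at the first occurrence of $sep.
# Returns ($line, undef) when $sep does not occur.
sub split_first {
    my ($line, $sep) = @_;
    my $at = index($line, $sep);
    return ($line, undef) if $at < 0;
    return (substr($line, 0, $at), substr($line, $at + length($sep)));
}

my ($k1, $v1) = split_first("key=value=with=equals", "=");   # "key", "value=with=equals"
my ($k2, $v2) = split_first("host::port", "::");             # "host", "port"
my ($k3, $v3) = split_first("novalue", "=");                 # "novalue", undef

print "$k1 | $v1\n";
print "$k2 | $v2\n";
```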

Edge cases

  • Empty SUBSTR matches at POSITION (clamped into STR). index($s, "") is 0; index($s, "", $n) is $n capped at length $s. Never -1.

  • Empty STR with non-empty SUBSTR returns -1. Empty STR with empty SUBSTR returns 0.

  • Negative POSITION is clamped to 0. index does not interpret negative offsets as “from the end”; that is how substr treats a negative OFFSET. rindex does not count from the end either: it clamps a negative POSITION to 0 just as index does.

  • POSITION past the end of STR is clamped to length STR, so only an empty SUBSTR can match.

  • Undef arguments stringify to "" and trigger an uninitialized warning under use warnings. index(undef, "x") is -1; index("abc", undef) is 0 (empty-substring rule).

  • Characters, not bytes. index operates on the logical character sequence of the string. For a string of wide characters, the returned offset is a character offset, not a byte offset. If you need byte offsets, encode both STR and SUBSTR first (for example with Encode::encode_utf8) and search the resulting octet strings. use bytes also gives a lexical byte view, but its own documentation discourages it for anything other than debugging.

  • Case sensitivity: index is case-sensitive. Lowercase both arguments first if you want a case-insensitive search, or use =~ /\Q$needle\E/i and @- / $-[0] to recover the position.
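
The last two points can be illustrated together. This is a sketch with made-up sample strings: the first half contrasts character and byte offsets for a string containing one wide character, the second half shows the two case-insensitive approaches mentioned above.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# "\x{263A}" (a smiley) is one character but three bytes in UTF-8.
my $text   = "\x{263A} ok";
my $needle = "ok";

# Character offset: smiley (0), space (1), then "ok" at 2.
my $char_off = index($text, $needle);

# Byte offset: encode both strings and search the octets (3 + 1 bytes precede "ok").
my $byte_off = index(encode_utf8($text), encode_utf8($needle));

# Case-insensitive search, two ways.
my $hay    = "Perl Is Great";
my $find   = "IS";
my $lc_off = index(lc($hay), lc($find));            # lowercase both sides
my $re_off = $hay =~ /\Q$find\E/i ? $-[0] : -1;     # regex, offset from @-

print "char=$char_off byte=$byte_off lc=$lc_off re=$re_off\n";
# char=2 byte=4 lc=5 re=5
```

Lowercasing can change the length of some strings (a few characters lowercase to more than one character), so for arbitrary Unicode text the regex route is likely the safer way to get an offset that applies to the original string.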

Differences from upstream

Fully compatible with upstream Perl 5.42.

See also

  • rindex — same matching rules, scans from the right and returns the offset of the last occurrence at or before POSITION

  • substr — extract the matched region once index has located it, or replace it in place

  • length — upper bound for a valid POSITION; returns character length on the same scale index uses

  • pos — position tracking for regex-based scanning; use together with m//g when you need captures rather than a raw offset

  • sprintf — build the search string when SUBSTR is assembled from parts; index accepts any expression but matches its value literally, so construct the exact string first