---
name: split
signatures:
  - 'split /PATTERN/, EXPR, LIMIT'
  - 'split /PATTERN/, EXPR'
  - 'split /PATTERN/'
  - 'split'
since: 5.0
status: documented
categories: ["Regular expressions and pattern matching"]
---

```{index} single: split; Perl built-in
```

*[Regular expressions and pattern matching](../perlfunc-by-category)*

# split

Cut a string into a list of fields using a regex separator.

`split` scans `EXPR` for matches of `PATTERN` and returns the pieces between the matches as a list. The matched text itself is the separator and is **not** included in the result, unless the pattern contains capture groups, in which case each capture becomes an extra element in the output. In scalar context the list size is returned.

If `PATTERN` is omitted, `split` behaves like the `awk` default: runs of whitespace are the separator and any leading whitespace in `EXPR` is stripped. If `EXPR` is omitted, [`$_`](../perlvar) is split.

## Synopsis

```perl
split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split
```

## What you get back

A list of substrings (fields) in list context; the count of fields in scalar context. The separator text is never part of a field.

```perl
my @fields = split /,/, "a,b,c";   # ("a", "b", "c")
my $n      = split /,/, "a,b,c";   # 3
```

Splitting an `EXPR` that is the empty string always yields zero fields, regardless of `LIMIT`:

```perl
my @x = split /,/, "", -1;   # ()
```

Prior to Perl 5.11, `split` in void or scalar context overwrote [`@_`](../perlvar). Modern Perl does not; never rely on that old side effect.

## The LIMIT argument

`LIMIT` controls how many fields are produced and, crucially, whether trailing empty fields are kept.

- **Positive `LIMIT`**: an upper bound on the number of fields. `EXPR` is split at most `LIMIT - 1` times; the final field holds the rest of the string verbatim. `LIMIT = 1` means no splits at all: you get the whole string back as a single element.
  ```perl
  my @x = split /,/, "a,b,c", 1;   # ("a,b,c")
  my @x = split /,/, "a,b,c", 2;   # ("a", "b,c")
  my @x = split /,/, "a,b,c", 3;   # ("a", "b", "c")
  my @x = split /,/, "a,b,c", 4;   # ("a", "b", "c")
  ```

- **Negative `LIMIT`**: treated as arbitrarily large. As many fields as possible are produced, including **all** trailing empty fields.

  ```perl
  my @x = split /,/, "a,b,c,,,", -1;   # ("a", "b", "c", "", "", "")
  ```

- **Omitted or zero `LIMIT`**: like negative, **except** that trailing empty fields are stripped. Leading empty fields are always kept.

  ```perl
  my @x = split /,/, "a,b,c,,,";   # ("a", "b", "c")
  my @x = split /,/, ",,a,b";      # ("", "", "a", "b")
  ```

- **Implicit `LIMIT` in list assignment.** When assigning to a fixed list of targets, Perl sets `LIMIT` to one more than the number of targets, so the remainder of the string past the last target is not split any further. The following gets `LIMIT = 3` automatically:

  ```perl
  my ($login, $passwd) = split /:/;
  ```

In time-critical code, pass an explicit `LIMIT` rather than letting the whole string be scanned.

## The awk-style whitespace case: `split " "`

A literal single space as the pattern (`split " "`, not `split / /`) triggers a special case: `PATTERN` is treated as `/\s+/` **and** any leading whitespace in `EXPR` is stripped before splitting. This matches classic `awk` behaviour.

```perl
my @x = split " ", "  Quick brown fox\n";   # ("Quick", "brown", "fox")
my @x = split " ", "RED\tGREEN\tBLUE";      # ("RED", "GREEN", "BLUE")
```

To split on a *single* literal space only, use the regex form `/ /`, which is not special-cased:

```perl
my @x = split / /, " abc";   # ("", "abc")
```

Since Perl 5.18, the trigger is any expression whose value is the single-character string `" "` (not just a literal), and since Perl 5.28 the rule works correctly under `use feature 'unicode_strings'`.

From Perl 5.39.9 on, the default `/x` modifier does **not** affect `split STRING`, so `split " "` still means awk emulation even inside `use re "/x"`.
If you want to split on one literal space under `use re "/x"`, write `split /(?-x: )/` or `split /\x{20}/`.

If `PATTERN` is omitted entirely, it defaults to `" "`, so bare `split` is the awk-style form.

## The empty-pattern case: splitting into characters

If `PATTERN` matches the empty string, the split happens at every match position, i.e. between characters.

```perl
my @x = split //, "abc";   # ("a", "b", "c")
```

As a `split`-specific rule, the bare match operator `//` is **not** the "repeat the last successful match" form here; it is the literal empty pattern.

A zero-width match at the very start of `EXPR` never produces an empty leading field, so a leading space simply becomes the first field:

```perl
my @x = split //, " abc";       # (" ", "a", "b", "c"), 4 fields, not 5
my @x = split //, " abc", -1;   # (" ", "a", "b", "c", ""), trailing empty field with -1
```

A positive-width match at the start *does* produce an empty leading field:

```perl
my @x = split / /, " abc";   # ("", "abc")
```

## Capture groups become extra elements

If `PATTERN` contains capturing groups, each capture is inserted into the output list for every separator match, in the order the groups are declared. A group that does not participate in the match contributes [`undef`](undef). These extras do **not** count toward `LIMIT`.

```perl
my @x = split /-|,/    , "1-10,20", 3;   # ("1", "10", "20")
my @x = split /(-|,)/  , "1-10,20", 3;   # ("1", "-", "10", ",", "20")
my @x = split /-|(,)/  , "1-10,20", 3;   # ("1", undef, "10", ",", "20")
my @x = split /(-)|,/  , "1-10,20", 3;   # ("1", "-", "10", undef, "20")
my @x = split /(-)|(,)/, "1-10,20", 3;   # ("1", "-", undef, "10", undef, ",", "20")
```

Use this when you want the separators preserved alongside the fields, as is typical for tokenisers where the delimiters are themselves part of the output stream.

## Global state it touches

- [`$_`](../perlvar): `EXPR` defaults to it when no second argument is given.
- [`@_`](../perlvar): modern Perl does **not** touch it.
  Only relevant if you support pre-5.11 perls.
- The regex engine's match variables (`$1`, `$2`, …, `$&`, `$'`, `` $` ``) are **not** set as a side effect of `split`, even when `PATTERN` captures; the captures go into the returned list.

## Examples

Classic CSV-style line, no embedded commas:

```perl
my @cells = split /,/, "alice,bob,carol";   # ("alice", "bob", "carol")
```

Parse a `/etc/passwd` line into known fields. With seven targets the implicit `LIMIT` is 8, so at most seven splits are performed:

```perl
my ($user, $pw, $uid, $gid, $gecos, $home, $shell) = split /:/, $line;
```

Awk-style whitespace tokenising, leading whitespace discarded:

```perl
my @words = split " ", "  one\ttwo   three\n";   # ("one", "two", "three")
```

Characters of a string, using the empty pattern:

```perl
my @chars = split //, "hello";   # ("h", "e", "l", "l", "o")
```

Keep every trailing empty field for a strict column-count reader:

```perl
my @cols = split /\t/, $line, -1;
```

Preserve separators by capturing, so the input can be reconstructed exactly:

```perl
my @parts = split /([,;])/, "a,b;c";   # ("a", ",", "b", ";", "c")
my $same  = join "", @parts;           # "a,b;c"
```

Default form reads [`$_`](../perlvar) and does awk-style whitespace splitting:

```perl
for (@lines) {
    my @f = split;   # same as split " ", $_
    ...
}
```

## Edge cases

- **Empty input**: `split /X/, ""` returns the empty list for every `LIMIT`, including `-1`.
- **`LIMIT = 1`**: no splitting occurs; the whole string is the one and only field. Useful when you want `split` semantics conditionally without losing the input.
- **`/^/` pattern**: treated as if it were `/^/m`, so it matches at every line start, which is the only useful behaviour. Splits a string into lines, keeping the newlines.
- **Zero-width match at position 0**: never produces an empty leading field (see `split //, " abc"` above).
- **Positive-width match at position 0**: always produces an empty leading field.
- **Match at end of string**: produces an empty trailing field, which is stripped when `LIMIT` is omitted or zero and kept when `LIMIT` is negative, or positive and large enough to reach it.
- **Patterns with modifiers**: the usual `qr//` modifiers (`/i`, `/m`, `/s`, `/x`, `/u`, `/a`, `/l`, `/n`) apply. The pattern need not be a literal; any expression producing a regex or string works, and `qr//` objects are accepted.
- **Single-space string that is not a literal**: from Perl 5.18 on, any expression evaluating to `" "` triggers the awk-style special case, not just the literal `" "`.
- **`split " "` vs `split / /`**: the first is awk-style whitespace (`\s+` with leading-whitespace stripping), the second is a single literal space. The two are not interchangeable.
- **Unicode whitespace**: within the scope of `use feature 'unicode_strings'` (honoured by the awk-style case since Perl 5.28), `split " "` treats Unicode whitespace as a separator too. Outside that scope the behaviour is affected by the "Unicode Bug".

## Differences from upstream

Fully compatible with upstream Perl 5.42.

## See also

- [`join`](join): inverse operation; stitches a list back into a string with a chosen glue
- [`m`](m): the match operator; when you want to *find* something rather than cut the string around it
- [`qr`](qr): pre-compile a pattern once for repeated `split` calls in a hot loop
- [`index`](index): locate a fixed substring without building the full list of fields; cheaper when you only need the first split
- [`substr`](substr): extract by byte/character offset when the field boundaries are positional, not delimited
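
As a closing illustration, the `LIMIT = -1` rule and the capture-group rule described on this page can be combined in a small script. This is a minimal sketch, not a recommended API: `parse_record` and `tokens_with_separators` are hypothetical helper names invented for the example, and the sample record is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split a colon-separated record and accept it only
# if it has exactly $want columns. LIMIT = -1 keeps trailing empty fields,
# so empty columns at the end of the line still count.
sub parse_record {
    my ($line, $want) = @_;
    chomp $line;
    my @cols = split /:/, $line, -1;
    return @cols == $want ? \@cols : undef;
}

# Hypothetical helper: tokenise on "," or ";" and keep each separator as
# its own element by capturing it in the pattern.
sub tokens_with_separators {
    my ($text) = @_;
    return split /([,;])/, $text;
}

my $rec = parse_record("root:x:0:0:::\n", 7);
print scalar(@$rec), "\n";                                 # 7
print join("|", tokens_with_separators("a,b;c")), "\n";    # a|,|b|;|c
```

Without the `-1`, the three trailing empty columns of the sample record would be stripped, the count would drop to 4, and `parse_record` would reject the line.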