Regular expressions and pattern matching
split#
Cut a string into a list of fields using a regex separator.
split scans EXPR for matches of PATTERN and returns the pieces
between the matches as a list. The matched text itself is the
separator and is not included in the result — unless the pattern
contains capture groups, in which case each capture becomes an extra
element in the output. In scalar context the list size is returned.
If PATTERN is omitted, split behaves like the awk default:
runs of whitespace are the separator and any leading whitespace in
EXPR is stripped. If EXPR is omitted, $_ is split.
Synopsis#
split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split
What you get back#
A list of substrings (fields) in list context; the count of fields in scalar context. The separator text is never part of a field.
my @fields = split /,/, "a,b,c"; # ("a", "b", "c")
my $n = split /,/, "a,b,c"; # 3
Splitting an EXPR that is the empty string always yields zero
fields, regardless of LIMIT:
my @x = split /,/, "", -1; # ()
Prior to Perl 5.11, split in void or scalar context overwrote
@_. Modern Perl does not; never rely on that old
side effect.
The LIMIT argument#
LIMIT controls how many fields are produced and, crucially, whether
trailing empty fields are kept.
Positive LIMIT: upper bound on the number of fields. EXPR is split at most LIMIT - 1 times; the final field holds the rest of the string verbatim. LIMIT = 1 means no splits at all: you get the whole string back as a single element.
my @x = split /,/, "a,b,c", 1; # ("a,b,c")
my @x = split /,/, "a,b,c", 2; # ("a", "b,c")
my @x = split /,/, "a,b,c", 3; # ("a", "b", "c")
my @x = split /,/, "a,b,c", 4; # ("a", "b", "c")
Negative LIMIT: treated as arbitrarily large. As many fields as possible are produced, including all trailing empty fields.
my @x = split /,/, "a,b,c,,,", -1; # ("a", "b", "c", "", "", "")
Omitted or zero LIMIT: like negative, except trailing empty fields are stripped. Leading empty fields are always kept.
my @x = split /,/, "a,b,c,,,"; # ("a", "b", "c")
my @x = split /,/, ",,a,b"; # ("", "", "a", "b")
Implicit LIMIT in list assignment: when assigning to a fixed list, Perl sets LIMIT to one more than the number of targets, so the trailing fields do not need to be scanned. The following gets LIMIT = 3 automatically:
my ($login, $passwd) = split /:/;
In time-critical code, pass an explicit LIMIT rather than letting the whole string be scanned.
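The difference between the implicit LIMIT and an explicit one is easiest to see side by side. The following is a minimal sketch; the record string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical passwd-style record, used only for illustration.
my $record = "root:x:0:0:root:/root:/bin/sh";

# Two targets, no LIMIT: Perl supplies LIMIT = 3, so a third field
# ("0:0:root:/root:/bin/sh") is produced unsplit and then discarded.
my ($login, $passwd) = split /:/, $record;

# Explicit LIMIT = 2: the second field keeps the rest of the record.
my ($name, $rest) = split /:/, $record, 2;

print "$login $passwd\n";   # root x
print "$name $rest\n";      # root x:0:0:root:/root:/bin/sh
```

In both calls only the start of the record is scanned; the explicit form is the one to use when the remainder of the string is wanted intact.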
The awk-style whitespace case: split " "#
A literal single space as the pattern — split " ", not split / /
— triggers a special case: PATTERN is treated as /\s+/ and
any leading whitespace in EXPR is stripped before splitting. This
matches classic awk behaviour.
my @x = split " ", " Quick brown fox\n";
# ("Quick", "brown", "fox")
my @x = split " ", "RED\tGREEN\tBLUE";
# ("RED", "GREEN", "BLUE")
To split on a single literal space only, use the regex form
/ / — it is not special-cased:
my @x = split / /, " abc"; # ("", "abc")
Since Perl 5.18, the trigger is any expression whose value is the
single-character string " " (not just a literal), and since Perl
5.28 the rule works correctly under use feature 'unicode_strings'.
Under Perl 5.39.9+ the /x default modifier does not affect
split STRING, so split " " still means awk-emulation even inside
use re "/x". If you want to split on one literal space under
use re "/x", write split /(?-x: )/ or split /\x{20}/.
If PATTERN is omitted entirely, it defaults to " ", so bare
split is the awk-style form.
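As a rough illustration of the two forms, the sketch below splits the same string with a separator held in a variable and with the regex form. It assumes Perl 5.18 or later, where a variable holding " " triggers the special case; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "  a  b";   # two leading spaces, two spaces between the words

# The separator comes from a variable holding " ". On Perl 5.18 or later
# this selects awk-style splitting just like the literal would.
my $sep = " ";
my @awk = split $sep, $text;     # ("a", "b")

# The regex form splits on each individual space, keeping empty fields.
my @literal = split / /, $text;  # ("", "", "a", "", "b")

print scalar(@awk), " vs ", scalar(@literal), "\n";   # 2 vs 5
```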
The empty-pattern case: splitting into characters#
If PATTERN matches the empty string, the split happens at every
match position — i.e. between characters.
my @x = split //, "abc"; # ("a", "b", "c")
As a split-specific rule, the bare match operator // is not
the “repeat the last successful match” form here; it is the literal
empty pattern. A zero-width match at the very start of EXPR never
produces an empty leading field, so a string with a leading space
splits into exactly its characters:
my @x = split //, " abc"; # (" ", "a", "b", "c") — 4, not 5
my @x = split //, " abc", -1; # (" ", "a", "b", "c", "") — trailing empty with -1
A positive-width match at the start does produce an empty leading field:
my @x = split / /, " abc"; # ("", "abc")
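To check that // inside split really is the empty pattern and not the last successful match, a small sketch can run a match first and then split. The strings here are arbitrary examples.

```perl
use strict;
use warnings;

# A successful match, so an ordinary // would now mean "reuse /y/".
my $matched = "xyz" =~ /y/;

# Inside split, // is still the empty pattern: one field per character.
my @chars = split //, "ayb";     # ("a", "y", "b")

print join("|", @chars), "\n";   # a|y|b
```

Had the earlier /y/ been reused, the result would have been two fields with the "y" removed.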
Capture groups become extra elements#
If PATTERN contains capturing groups, each capture is inserted
into the output list for every separator match, in the order the
groups are declared. A group that does not participate in the match
contributes undef. These extras do not count toward
LIMIT.
my @x = split /-|,/ , "1-10,20", 3;
# ("1", "10", "20")
my @x = split /(-|,)/ , "1-10,20", 3;
# ("1", "-", "10", ",", "20")
my @x = split /-|(,)/ , "1-10,20", 3;
# ("1", undef, "10", ",", "20")
my @x = split /(-)|,/ , "1-10,20", 3;
# ("1", "-", "10", undef, "20")
my @x = split /(-)|(,)/, "1-10,20", 3;
# ("1", "-", undef, "10", undef, ",", "20")
Use this when you want the separators preserved alongside the fields — typical for tokenisers where the delimiters are themselves part of the output stream.
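A tokeniser along those lines might look like the sketch below. The input expression and the operator set are assumptions chosen for illustration, not part of the split documentation.

```perl
use strict;
use warnings;

# Hypothetical input expression, chosen only to exercise the pattern.
my $expr = "3 + 4*(2 - 1)";

# The capture group keeps each operator or parenthesis; the surrounding
# \s* is outside the group, so the whitespace is discarded.
my @raw = split /\s*([-+*\/()])\s*/, $expr;

# Adjacent delimiters such as "*(" leave an empty field between them;
# drop those (and any undef) to get the token stream.
my @tokens = grep { defined($_) && length($_) } @raw;

print join(" ", @tokens), "\n";   # 3 + 4 * ( 2 - 1 )
```

Filtering after the split keeps the pattern simple; the alternative of matching tokens directly with m//g is often clearer once the grammar grows.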
Global state it touches#
$_: EXPR defaults to it when no second argument is given.
@_: modern Perl does not touch it. Only relevant if you support pre-5.11 perls.
The regex engine’s match variables ($1, $2, …, $&, $', $`) are not set as a side effect of split, even when PATTERN captures; the captures go into the returned list.
Examples#
Classic CSV-style line, no embedded commas:
my @cells = split /,/, "alice,bob,carol"; # ("alice", "bob", "carol")
Parse an /etc/passwd line into known fields; with seven targets the
implicit LIMIT is 8, so at most seven splits are performed:
my ($user, $pw, $uid, $gid, $gecos, $home, $shell)
= split /:/, $line;
Awk-style whitespace tokenising, leading whitespace discarded:
my @words = split " ", " one\ttwo three\n";
# ("one", "two", "three")
Characters of a string — the empty pattern:
my @chars = split //, "hello"; # ("h", "e", "l", "l", "o")
Keep every trailing empty field for a strict column-count reader:
my @cols = split /\t/, $line, -1;
Preserve separators by capturing — round-trip reconstructible:
my @parts = split /([,;])/, "a,b;c"; # ("a", ",", "b", ";", "c")
my $same = join "", @parts; # "a,b;c"
Default form reads $_ and does awk-style
whitespace splitting:
for (@lines) {
my @f = split; # split " ", $_
...
}
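For the strict column-count reader shown earlier in these examples, the effect of LIMIT = -1 can be seen by comparing both forms on a row with empty trailing columns. This is a sketch only; the row and the expected width of 4 are assumptions.

```perl
use strict;
use warnings;

# Hypothetical tab-separated row whose last two columns are empty.
my $line = "a\tb\t\t";

my @loose  = split /\t/, $line;       # ("a", "b")
my @strict = split /\t/, $line, -1;   # ("a", "b", "", "")

# Assumed schema width for this sketch.
my $expected = 4;
printf "loose=%d strict=%d ok=%s\n",
    scalar(@loose), scalar(@strict),
    (@strict == $expected ? "yes" : "no");   # loose=2 strict=4 ok=yes
```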
Edge cases#
Empty input: split /X/, "" returns the empty list for every LIMIT, including -1.
LIMIT = 1: no splitting occurs; the whole string is the one and only field. Useful when you want split semantics conditionally without losing the input.
/^/ pattern: treated as if it were /^/m, so it matches at every line start, which is the only useful behaviour. Splits a string into lines, keeping the newlines.
Zero-width match at position 0: never produces an empty leading field (see split //, " abc" above).
Positive-width match at position 0: always produces an empty leading field.
Match at end of string: produces an empty trailing field, which is then stripped when LIMIT is omitted or zero.
Patterns with modifiers: the usual qr// modifiers (/i, /m, /s, /x, /u, /a, /l, /n) apply. The pattern need not be a literal; any expression producing a regex or string works, and qr// objects are accepted.
Single-space string that is not a literal: from Perl 5.18 on, any expression evaluating to " " triggers the awk-style special case, not just the literal " ".
split " " vs split / /: the first is awk-style whitespace (\s+ with leading-whitespace stripping), the second is a single literal space. The two are not interchangeable.
Unicode whitespace: under use feature 'unicode_strings' (which the awk-style case honours since Perl 5.28), split " " treats Unicode whitespace as a separator too. Outside that scope, and on earlier perls, the behaviour is affected by the “Unicode Bug”.
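The /^/ case is the one most often used in practice. A minimal sketch, with an arbitrary three-line string, shows that the newlines stay attached to their lines:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# /^/ is treated as /^/m by split, so each line start is a split point
# and every field keeps its own newline.
my @lines = split /^/, $text;    # ("one\n", "two\n", "three\n")

print scalar(@lines), "\n";      # 3
print join("", @lines) eq $text ? "round-trips\n" : "differs\n";   # round-trips
```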
Differences from upstream#
Fully compatible with upstream Perl 5.42.
See also#
join: inverse operation; stitches a list back into a string with a chosen glue
m: the match operator; when you want to find something rather than cut the string around it
qr: pre-compile a pattern once for repeated split calls in a hot loop
index: locate a fixed substring without building the full list of fields; cheaper when you only need the first split
substr: extract by byte/character offset when the field boundaries are positional, not delimited