Alternatives to ithreads#

pperl does not implement Perl’s interpreter threads. That sounds like a loss until you look at why most programs reach for threads. Almost every upstream ithreaded program falls into one of three shapes:

  • Parallel compute over data — “apply this function to each element of this large array, as fast as you can.”

  • Parallel I/O — “perform these N independent network or filesystem calls concurrently.”

  • Background work with isolated state — “run this task to the side, with no shared memory, and wait for the result.”

pperl has a dedicated, better-performing answer for each.

Decision table#

| Your goal | Upstream tool | pperl-native answer |
| --- | --- | --- |
| Parallelise a compute-heavy loop | threads + work-crew pattern | Auto-parallelisation (JIT + Rayon) |
| Parallelise map / grep over large data | Work-crew thread pool | Parallel map / grep |
| Run I/O-bound tasks concurrently | threads + Thread::Queue | fork with pipes, or event-driven I/O |
| Isolate a subtask with its own state | Detached thread | fork |
| Produce/consume pipeline | threads + Thread::Queue chain | fork + pipes, or sequential iterator |
| Synchronised counter | :shared + lock | Sequential accumulator + parallel reduction |

Auto-parallelisation for compute loops#

pperl’s JIT detects loops whose bodies are free of I/O, global writes, and impure calls, then dispatches the loop body across a Rayon work-stealing pool. A scalar accumulator such as $sum += ... is recognised as a reduction and combined after all iterations finish.

my $sum = 0;
for my $i (1 .. 10_000_000) {
    $sum += sqrt($i);
}
print $sum, "\n";

Under pperl this compiles to a single Rayon-dispatched loop body. On a typical 8-core machine the speedup is in the 5-6x range over the sequential JIT, with no user-visible thread objects, no synchronisation code, and no shared state.

See Parallel Execution for the full story — what qualifies, how reduction detection works, and the CLI flags (--no-parallel, --threads=N, --parallel-threshold=N) that control the behaviour.

Porting a work-crew ithreaded loop typically means deleting the threading code. The sequential form is often the pperl-optimal form already.

Before:

use threads;
use threads::shared;

my $sum :shared = 0;
my @chunks = chunk_data(\@big);
my @workers = map {
    my $c = $_;
    threads->create(sub {
        my $local = 0;
        $local += process($_) for @$c;
        lock $sum; $sum += $local;
    });
} @chunks;
$_->join for @workers;

After:

my $sum = 0;
$sum += process($_) for @big;

The chunking, the per-thread accumulator, the lock — all gone. The JIT does the chunking. The reduction detector handles the accumulator. There is no shared variable to lock.

Parallel map and grep#

For list-shaped transformations, the built-ins are already the natural idiom:

my @results  = map  { expensive($_) } @input;
my @filtered = grep { test($_)     } @input;

When the callback has no detectable side effects and the input size exceeds --parallel-threshold, pperl dispatches the callback in parallel. Result order is preserved.

Equivalent ithreaded code would involve a thread pool, an input queue, an output queue, a sentinel value to signal end-of-input, and per-thread accumulators. The pperl version is one line.
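To make the side-effect condition concrete, here is a sketch of the distinction, assuming the same I/O rule described for auto-parallelised loops also governs callbacks (the print is what would disqualify the second map):

```perl
use strict;
use warnings;

my @input = (1 .. 5);

# Pure callback: no I/O, no global writes -- eligible for parallel
# dispatch once @input exceeds --parallel-threshold.
my @squares = map { $_ * $_ } @input;

# Impure callback: the print is a detectable side effect, so this
# map runs sequentially regardless of input size.
my @traced = map { print "seen $_\n"; $_ * $_ } @input;
```

Either way the results are identical; only the dispatch strategy differs, which is why no source change is needed to opt in.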

fork — process-level concurrency#

When work is not a pure compute loop — it involves I/O, subprocess management, or genuinely independent state — reach for fork. A forked child has:

  • A full copy of the parent’s memory, copy-on-write at the OS level.

  • Its own file descriptor table (with shared underlying file descriptions).

  • Complete isolation at the interpreter level — no shared Perl heap, ever.

my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    # child
    exec 'processing-tool', @args
        or die "exec: $!";
}
# parent
waitpid $pid, 0;

Communication uses pipes, sockets, or the filesystem — the same inter-process primitives you would use between unrelated programs. That sounds heavier than in-process threading, and at the raw syscall level it is; but for program structure it is often simpler because there is no shared memory to guard.
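A minimal sketch of the simplest such channel — a single pipe from child to parent (the multiplication stands in for real work):

```perl
use strict;
use warnings;

# One pipe: the child writes, the parent reads.
pipe my $reader, my $writer or die "pipe: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    # child: write one result down the pipe and exit
    close $reader;
    print {$writer} 6 * 7, "\n";
    close $writer;
    exit 0;
}
# parent: close the unused write end so <$reader> sees EOF
close $writer;
chomp(my $answer = <$reader> // '');
close $reader;
waitpid $pid, 0;
print "child computed: $answer\n";   # child computed: 42
```

There is nothing to lock: the pipe is the only point of contact between the two address spaces.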

Forking a work crew#

The upstream threads::shared work-crew in the previous section translates to a fork-based equivalent when the work involves I/O:

my @pids;
for my $chunk (@chunks) {
    my $pid = fork // die "fork: $!";
    if ($pid == 0) {
        process_chunk($chunk);
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;

Results flow back through pipes, files, or a named collection point in /tmp. Whatever you would have used between separate programs.
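One concrete shape for that return path, assuming one pipe per worker and a single line-oriented result each (process_chunk here is a stand-in for real work):

```perl
use strict;
use warnings;
use List::Util qw(sum);

sub process_chunk { sum @{ $_[0] } }   # stand-in for real work

my @chunks = ([1 .. 10], [11 .. 20], [21 .. 30]);
my (@readers, @pids);
for my $chunk (@chunks) {
    pipe my $r, my $w or die "pipe: $!";
    my $pid = fork // die "fork: $!";
    if ($pid == 0) {
        # child: compute, write one line, exit
        close $r;
        print {$w} process_chunk($chunk), "\n";
        close $w;
        exit 0;
    }
    close $w;                  # parent keeps only the read end
    push @readers, $r;
    push @pids,    $pid;
}
my $total = 0;
for my $r (@readers) {
    chomp(my $line = <$r>);
    $total += $line;
    close $r;
}
waitpid $_, 0 for @pids;
print "total: $total\n";       # total: 465 (sum of 1..30)
```

Each worker's result line replaces the :shared accumulator and its lock; the parent does the final reduction sequentially.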

Pipeline via fork and pipes#

For pipeline-shaped workloads, Perl’s open with a |- or -| form spawns a child with a pipe already attached:

open my $producer, '-|', 'find', '/data', '-type', 'f'
    or die "fork: $!";
while (my $path = <$producer>) {
    chomp $path;
    # filter / process / forward
}
close $producer;

Each pipeline stage is its own process, scheduled by the OS, with no shared Perl state.
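The example above reads from a child; the |- form is the writer-side counterpart, where the child command's stdin is the pipe. A sketch — File::Temp and sort(1) are used only to make it self-contained and observable:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $out) = tempfile(UNLINK => 1);

# '|-': the parent feeds lines into the child's stdin. Here the
# child is a shell pipeline that sorts and deduplicates its input.
open my $consumer, '|-', "sort -u > $out"
    or die "fork: $!";
print {$consumer} "$_\n" for qw(banana apple banana cherry);
close $consumer or die "sort exited nonzero";

open my $in, '<', $out or die "open: $!";
chomp(my @unique = <$in>);
close $in;
# @unique is now ('apple', 'banana', 'cherry')
```

Chaining several such opens gives a multi-stage pipeline with one OS process per stage.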

Isolated background work#

The upstream pattern of threads->create(sub { ... })->detach — fire and forget — maps to a double-fork:

my $pid = fork // die "fork: $!";
if ($pid == 0) {
    # first child: fork again and exit immediately, orphaning
    # the grandchild to init so the parent does not have to wait
    my $grand = fork // die "fork: $!";
    exit 0 if $grand != 0;
    background_work();
    exit 0;
}
waitpid $pid, 0;    # reap the first child, not the grandchild

The grandchild runs to completion independently of the original program, no zombie is left behind, and state isolation is absolute.

Choosing between auto-parallelisation and fork#

  • Auto-parallelisation wins for compute-bound loops over in-memory data: no process startup cost, no serialisation of results, JIT-compiled body.

  • fork wins for I/O, subprocess work, and anything where the task should not inherit the parent’s global state changes.

  • Neither is the right answer for low-cost task dispatch in tight loops — the usual culprit there is a sequential loop that does not actually benefit from concurrency. Measure before adding either layer.
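A minimal way to take that measurement before committing to either layer, using the core Time::HiRes module (the workload sub is a placeholder):

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Time a code block and report wall-clock seconds.
sub timed {
    my ($label, $code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    my $elapsed = tv_interval($t0);
    printf "%-12s %.4fs\n", $label, $elapsed;
    return $elapsed;
}

# Establish the sequential baseline first; only add fork or tune
# the parallel flags if this number is actually the bottleneck.
my $baseline = timed('sequential', sub {
    my $sum = 0;
    $sum += sqrt($_) for 1 .. 1_000_000;
});
```

If the baseline is already fast enough, neither fork overhead nor parallel dispatch buys you anything.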

See also#

  • Parallel Execution — the auto-parallelisation chapter: what qualifies, CLI flags, reduction detection

  • ithreads basics — the upstream model for context on what these alternatives replace

  • Shared data — the upstream sharing primitives, for reading existing ithreaded code

  • fork — full reference for the process-level primitive

  • wait — reap a child process

  • lock — the no-op ithreads primitive under pperl

  • Reference · P5 and Reference · PP — threading support is a runtime concern