Alternatives to ithreads#
pperl does not implement Perl’s interpreter threads. That sounds like a loss until you look at why most programs reach for threads. Almost every upstream ithreaded program falls into one of three shapes:
Parallel compute over data — “apply this function to each element of this large array, as fast as you can.”
Parallel I/O — “perform these N independent network or filesystem calls concurrently.”
Background work with isolated state — “run this task to the side, with no shared memory, and wait for the result.”
pperl has a dedicated, better-performing answer for each.
Decision table#
| Your goal | Upstream tool | pperl-native answer |
|---|---|---|
| Parallelise a compute-heavy loop | Work-crew thread pool | Auto-parallelisation (JIT + Rayon) |
| Parallelise a list transformation | Thread pool + queues | Parallel map/grep |
| Run I/O-bound tasks concurrently | One thread per task | fork |
| Isolate a subtask with its own state | Detached thread | Double fork |
| Produce/consume pipeline | Shared queue between threads | fork + pipes (open '-|' / '|-') |
| Synchronised counter | :shared variable + lock | Sequential accumulator + parallel reduction |
Auto-parallelisation for compute loops#
pperl’s JIT detects loops whose bodies are free of I/O, global writes, and impure calls, then dispatches the loop body across a Rayon work-stealing pool. A scalar accumulator like $sum += ... is recognised as a reduction and combined after all iterations finish.
my $sum = 0;
for my $i (1 .. 10_000_000) {
    $sum += sqrt($i);
}
print $sum, "\n";
Under pperl this compiles to a single Rayon-dispatched loop body. On a typical 8-core machine the speedup is in the 5-6x range over the sequential JIT, with no user-visible thread objects, no synchronisation code, and no shared state.
See Parallel Execution for the full story — what
qualifies, how reduction detection works, and the CLI flags
(--no-parallel, --threads=N, --parallel-threshold=N) that
control the behaviour.
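As a sketch of how those flags combine on the command line (the `pperl script.pl` invocation form shown here is an assumption; only the flag names come from the reference above):

```shell
# Use 4 worker threads instead of the default
pperl --threads=4 crunch.pl

# Only parallelise loops whose trip count is at least 100000
pperl --parallel-threshold=100000 crunch.pl

# Force fully sequential execution, e.g. when bisecting a bug
pperl --no-parallel crunch.pl
```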
Porting a work-crew ithreaded loop typically means deleting the threading code. The sequential form is often the pperl-optimal form already.
Before:
use threads;
use threads::shared;
my $sum :shared = 0;
my @chunks = chunk_data(\@big);
my @workers = map {
    my $c = $_;
    threads->create(sub {
        my $local = 0;
        $local += process($_) for @$c;
        lock $sum; $sum += $local;
    });
} @chunks;
$_->join for @workers;
After:
my $sum = 0;
$sum += process($_) for @big;
The chunking, the per-thread accumulator, the lock — all gone. The JIT does the chunking. The reduction detector handles the accumulator. There is no shared variable to lock.
Parallel map and grep#
For list-shaped transformations, the built-ins are already the natural idiom:
my @results = map { expensive($_) } @input;
my @filtered = grep { test($_) } @input;
When the callback has no detectable side effects and the input size
exceeds --parallel-threshold, pperl dispatches the callback in
parallel. Result order is preserved.
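To make the eligibility rule concrete, here is a minimal sketch in plain Perl (runnable under any interpreter; whether pperl actually parallelises a given callback is decided by its own analysis, not by anything visible in the snippet):

```perl
use strict;
use warnings;

my @input = (1 .. 8);

# Pure callback: no I/O, no writes to globals -- a candidate for
# parallel dispatch once @input exceeds the threshold.
my @squares = map { $_ * $_ } @input;

# Impure callback: the print is a visible side effect, so this
# form stays sequential.
my @echoed = map { print "saw $_\n"; $_ } @input;

print "@squares\n";
```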
Equivalent ithreaded code would involve a thread pool, an input queue, an output queue, a sentinel value to signal end-of-input, and per-thread accumulators. The pperl version is one line.
fork — process-level concurrency#
When work is not a pure compute loop — it involves I/O,
subprocess management, or genuinely independent state — reach for
fork. A forked child has:
A full copy of the parent’s memory, copy-on-write at the OS level.
Its own file descriptor table (with shared underlying file descriptions).
Complete isolation at the interpreter level — no shared Perl heap, ever.
my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    # child
    exec 'processing-tool', @args
        or die "exec: $!";
}
# parent
waitpid $pid, 0;
Communication uses pipes, sockets, or the filesystem — the same inter-process primitives you would use between unrelated programs. That sounds heavier than in-process threading, and at the raw syscall level it is; but for program structure it is often simpler because there is no shared memory to guard.
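For the common case of a child handing one computed value back to its parent, a single pipe is enough. A minimal sketch (the multiplication stands in for real work):

```perl
use strict;
use warnings;

pipe my $reader, my $writer or die "pipe: $!";

my $pid = fork // die "fork: $!";
if ($pid == 0) {
    # child: do the work, write the answer up the pipe
    close $reader;
    print {$writer} 6 * 7, "\n";
    close $writer;
    exit 0;
}

# parent: read the child's result, then reap it
close $writer;
chomp(my $result = <$reader>);
close $reader;
waitpid $pid, 0;

print "child says: $result\n";
```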
Forking a work crew#
The upstream threads::shared work-crew in the previous section
translates to a fork-based equivalent when the work involves I/O:
my @pids;
for my $chunk (@chunks) {
    my $pid = fork // die "fork: $!";
    if ($pid == 0) {
        process_chunk($chunk);
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;
Results flow back through pipes, files, or a named collection point
in /tmp. Whatever you would have used between separate programs.
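One concrete collection scheme is a pipe per child. In this sketch the chunks are small integer lists and the "processing" is a sum, both stand-ins for real data and real work:

```perl
use strict;
use warnings;

my @chunks = ([1 .. 3], [4 .. 6], [7 .. 10]);

my (@pids, %reader_for);
for my $chunk (@chunks) {
    pipe my $r, my $w or die "pipe: $!";
    my $pid = fork // die "fork: $!";
    if ($pid == 0) {
        # child: process the chunk, send the partial result back
        close $r;
        my $partial = 0;
        $partial += $_ for @$chunk;
        print {$w} "$partial\n";
        close $w;
        exit 0;
    }
    close $w;
    push @pids, $pid;
    $reader_for{$pid} = $r;
}

# parent: gather one partial result per child, then reap
my $total = 0;
for my $pid (@pids) {
    my $fh = $reader_for{$pid};
    chomp(my $partial = <$fh>);
    close $fh;
    $total += $partial;
    waitpid $pid, 0;
}

print "total: $total\n";
```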
Pipeline via fork and pipes#
For pipeline-shaped workloads, Perl’s open with a |- or -|
form spawns a child with a pipe already attached:
open my $producer, '-|', 'find', '/data', '-type', 'f'
    or die "fork: $!";
while (my $path = <$producer>) {
    chomp $path;
    # filter / process / forward
}
close $producer;
Each pipeline stage is its own process, scheduled by the OS, with no shared Perl state.
Isolated background work#
The upstream pattern of threads->create(sub { ... })->detach —
fire and forget — maps to a double-fork:
my $pid = fork // die "fork: $!";
if ($pid == 0) {
    # first child: fork again and exit immediately, orphaning
    # the grandchild to init so the parent does not have to wait
    my $grand = fork // die "fork: $!";
    exit 0 if $grand != 0;
    background_work();
    exit 0;
}
waitpid $pid, 0; # reap the first child, not the grandchild
The grandchild runs to completion independently of the original program, no zombie is left behind, and state isolation is absolute.
Choosing between auto-parallelisation and fork#
Auto-parallelisation wins for compute-bound loops over in-memory data: no process startup cost, no serialisation of results, JIT-compiled body.
fork wins for I/O, subprocess work, and anything where the task should not inherit the parent’s global state changes.
Neither is the right answer for low-cost task dispatch in tight loops — the usual culprit there is a sequential loop that does not actually benefit from concurrency. Measure before adding either layer.
See also#
Parallel Execution — the auto-parallelisation chapter: what qualifies, CLI flags, reduction detection
ithreads basics — the upstream model for context on what these alternatives replace
Shared data — the upstream sharing primitives, for reading existing ithreaded code
fork — full reference for the process-level primitive
wait — reap a child process
lock — the no-op ithreads primitive under pperl
Reference · P5 and Reference · PP — threading support is a runtime concern