# Alternatives to ithreads

pperl does not implement Perl's interpreter threads. That sounds like a loss until you look at *why* most programs reach for threads. Almost every upstream ithreaded program falls into one of three shapes:

- **Parallel compute over data** — "apply this function to each element of this large array, as fast as you can."
- **Parallel I/O** — "perform these N independent network or filesystem calls concurrently."
- **Background work with isolated state** — "run this task to the side, with no shared memory, and wait for the result."

pperl has a dedicated, better-performing answer for each.

## Decision table

| Your goal                                    | Upstream tool                      | pperl-native answer                      |
|----------------------------------------------|------------------------------------|------------------------------------------|
| Parallelise a compute-heavy loop             | `threads` + work-crew pattern      | Auto-parallelisation (JIT + Rayon)       |
| Parallelise `map` / `grep` over large data   | Work-crew thread pool              | Parallel `map` / `grep`                  |
| Run I/O-bound tasks concurrently             | `threads` + `Thread::Queue`        | [`fork`](../../p5/core/perlfunc/fork) with pipes, or event-driven I/O |
| Isolate a subtask with its own state         | Detached thread                    | [`fork`](../../p5/core/perlfunc/fork)    |
| Produce/consume pipeline                     | `threads` + `Thread::Queue` chain  | [`fork`](../../p5/core/perlfunc/fork) + pipes, or sequential iterator |
| Synchronised counter                         | `:shared` + [`lock`](../../p5/core/perlfunc/lock) | Sequential accumulator + parallel reduction |

## Auto-parallelisation for compute loops

pperl's JIT detects loops whose bodies are free of I/O, global writes, and impure calls, then dispatches the loop body across a Rayon work-stealing pool. A scalar accumulator like `$sum += ...` is recognised as a reduction and combined after all iterations finish.

```perl
my $sum = 0;
for my $i (1 .. 10_000_000) {
    $sum += sqrt($i);
}
print $sum, "\n";
```

Under pperl this compiles to a single Rayon-dispatched loop body.
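Conceptually, the reduction the runtime performs is equivalent to this hand-rolled chunked form — a sketch only: the real chunk sizes, scheduling, and combine step are internal to the JIT, and here the chunks run sequentially so the illustration stays plain Perl.

```perl
use strict;
use warnings;

# Split the iteration space into fixed-size chunks, compute one
# partial sum per chunk, then combine the partials at the end.
# pperl would run the per-chunk step on Rayon workers; the point
# of the shape is that no chunk touches shared state, so no lock
# is ever needed.
my $n      = 10_000_000;
my $chunks = 8;
my $size   = int($n / $chunks);

my @partials;
for my $c (0 .. $chunks - 1) {
    my $lo = $c * $size + 1;
    my $hi = $c == $chunks - 1 ? $n : ($c + 1) * $size;
    my $local = 0;
    $local += sqrt($_) for $lo .. $hi;
    push @partials, $local;    # each chunk owns its accumulator
}

my $sum = 0;
$sum += $_ for @partials;      # the combine step of the reduction
print $sum, "\n";
```

Floating-point addition is not associative, so the chunked total can differ from the sequential total in the last few bits — the same caveat applies to the automatic reduction.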
On a typical 8-core machine the speedup is in the 5-6x range over the sequential JIT, with no user-visible thread objects, no synchronisation code, and no shared state.

See [Parallel Execution](parallel) for the full story — what qualifies, how reduction detection works, and the CLI flags (`--no-parallel`, `--threads=N`, `--parallel-threshold=N`) that control the behaviour.

Porting a work-crew ithreaded loop typically means **deleting the threading code**. The sequential form is often the pperl-optimal form already.

Before:

```perl
use threads;
use threads::shared;

my $sum :shared = 0;
my @chunks  = chunk_data(\@big);
my @workers = map {
    my $c = $_;
    threads->create(sub {
        my $local = 0;
        $local += process($_) for @$c;
        lock $sum;
        $sum += $local;
    });
} @chunks;
$_->join for @workers;
```

After:

```perl
my $sum = 0;
$sum += process($_) for @big;
```

The chunking, the per-thread accumulator, the lock — all gone. The JIT does the chunking. The reduction detector handles the accumulator. There is no shared variable to lock.

## Parallel `map` and `grep`

For list-shaped transformations, the built-ins are already the natural idiom:

```perl
my @results  = map  { expensive($_) } @input;
my @filtered = grep { test($_) } @input;
```

When the callback has no detectable side effects and the input size exceeds `--parallel-threshold`, pperl dispatches the callback in parallel. Result order is preserved.

Equivalent ithreaded code would involve a thread pool, an input queue, an output queue, a sentinel value to signal end-of-input, and per-thread accumulators. The pperl version is one line.

## fork — process-level concurrency

When work is **not** a pure compute loop — when it involves I/O, subprocess management, or genuinely independent state — reach for [`fork`](../../p5/core/perlfunc/fork). A forked child has:

- A full copy of the parent's memory, copy-on-write at the OS level.
- Its own file descriptor table (with shared underlying file descriptions).
- Complete isolation at the interpreter level — no shared Perl heap, ever.

```perl
my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # child
    exec 'processing-tool', @args or die "exec: $!";
}

# parent
waitpid $pid, 0;
```

Communication uses pipes, sockets, or the filesystem — the same inter-process primitives you would use between unrelated programs. That sounds heavier than in-process threading, and at the raw syscall level it is; but for program structure it is often simpler, because there is no shared memory to guard.

### Forking a work crew

The upstream `threads::shared` work crew in the previous section translates to a fork-based equivalent when the work involves I/O:

```perl
my @pids;
for my $chunk (@chunks) {
    my $pid = fork // die "fork: $!";
    if ($pid == 0) {
        process_chunk($chunk);
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;
```

Results flow back through pipes, files, or a named collection point in `/tmp` — whatever you would have used between separate programs.

### Pipeline via fork and pipes

For pipeline-shaped workloads, Perl's `open` with a `|-` or `-|` form spawns a child with a pipe already attached:

```perl
open my $producer, '-|', 'find', '/data', '-type', 'f'
    or die "fork: $!";
while (my $path = <$producer>) {
    chomp $path;
    # filter / process / forward
}
close $producer;
```

Each pipeline stage is its own process, scheduled by the OS, with no shared Perl state.

## Isolated background work

The upstream pattern of `threads->create(sub { ... })->detach` — fire and forget — maps to a double fork:

```perl
my $pid = fork // die "fork: $!";
if ($pid == 0) {
    # first child: fork again and exit immediately, orphaning
    # the grandchild to init so the parent does not have to wait
    my $grand = fork // die "fork: $!";
    exit 0 if $grand != 0;
    background_work();
    exit 0;
}
waitpid $pid, 0;    # reap the first child, not the grandchild
```

The grandchild runs to completion independently of the original program, no zombie is left behind, and state isolation is absolute.

## Choosing between auto-parallelisation and fork

- **Auto-parallelisation** wins for compute-bound loops over in-memory data: no process startup cost, no serialisation of results, JIT-compiled body.
- **fork** wins for I/O, subprocess work, and anything where the task should not inherit the parent's global state changes.
- **Neither** is the right answer for low-cost task dispatch in tight loops — the usual culprit there is a sequential loop that does not actually benefit from concurrency. Measure before adding either layer.

## See also

- [Parallel Execution](parallel) — the auto-parallelisation chapter: what qualifies, CLI flags, reduction detection
- [ithreads basics](ithreads-basics) — the upstream model, for context on what these alternatives replace
- [Shared data](shared-data) — the upstream sharing primitives, for reading existing ithreaded code
- [`fork`](../../p5/core/perlfunc/fork) — full reference for the process-level primitive
- [`wait`](../../p5/core/perlfunc/wait) — reap a child process
- [`lock`](../../p5/core/perlfunc/lock) — the no-op ithreads primitive under pperl
- [Reference · P5](../../p5/index) and [Reference · PP](../../pp/index) — threading support is a runtime concern