Applications#
The previous chapters were about boolean logic in the abstract. This one is about where it shows up most often in working Perl: bitwise arithmetic, regular expressions, and the short-circuit idioms that take the place of explicit conditionals.
Bitwise: logic on integer bits#
The bitwise operators apply boolean operations to each pair of bits in two integers in parallel. A 32-bit integer is, from the operators’ point of view, thirty-two parallel one-bit values.
| Op | Reads as | Per-bit rule |
|---|---|---|
| `&` | bitwise AND | each output bit = a ∧ b |
| `\|` | bitwise OR | each output bit = a ∨ b |
| `^` | bitwise XOR | each output bit = a ⊕ b |
| `~` | bitwise NOT | each output bit = ¬a |
| `<<` | left shift | shift bits left, zero-fill from right |
| `>>` | right shift | shift bits right |
The same boolean operators — ∧, ∨, ⊕, ¬ — appear here in
purely numeric form. That is not a coincidence; it is the
definition.
Setting, clearing, toggling, testing a flag#
The four basic flag operations on a single bit:
use constant FLAG_VERBOSE => 0x01;
use constant FLAG_DRY_RUN => 0x02;
use constant FLAG_RECURSE => 0x04;
use constant FLAG_FORCE => 0x08;
my $flags = 0;
$flags |= FLAG_VERBOSE; # SET -- OR with the bit
$flags |= FLAG_DRY_RUN; # SET another
$flags &= ~FLAG_DRY_RUN; # CLEAR -- AND with the inverted bit
$flags ^= FLAG_VERBOSE; # TOGGLE -- XOR with the bit
my $on = $flags & FLAG_RECURSE; # TEST -- AND, then test truthiness
Each of these is a one-bit application of a boolean operator. Set
is bit ∨ flag, clear is bit ∧ ¬flag, toggle is bit ⊕ flag,
test is bit ∧ flag.
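The four operations can also be wrapped in small helpers. A minimal sketch, assuming hypothetical helper names (`set_flag`, `clear_flag`, `toggle_flag`, `has_flag`) that come from neither the text above nor any library:

```perl
use strict;
use warnings;

use constant {
    FLAG_VERBOSE => 0x01,
    FLAG_DRY_RUN => 0x02,
    FLAG_RECURSE => 0x04,
    FLAG_FORCE   => 0x08,
};

# Hypothetical helpers: each takes the current flag word and a mask
# and returns the new flag word (or 1/0, for has_flag).
sub set_flag    { my ($flags, $mask) = @_; return $flags |  $mask }
sub clear_flag  { my ($flags, $mask) = @_; return $flags & ~$mask }
sub toggle_flag { my ($flags, $mask) = @_; return $flags ^  $mask }
sub has_flag    { my ($flags, $mask) = @_; return ($flags & $mask) ? 1 : 0 }

my $flags = 0;
$flags = set_flag($flags, FLAG_VERBOSE | FLAG_FORCE);   # set two bits at once
$flags = clear_flag($flags, FLAG_FORCE);                # clear one of them
$flags = toggle_flag($flags, FLAG_RECURSE);             # flip a third

printf "flags = %04b\n", $flags;                        # flags = 0101
print  "verbose on\n" if has_flag($flags, FLAG_VERBOSE);
```

Because a mask can carry several bits, the same helpers set or clear a group of flags in one call.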
XOR-swap: swapping without a temporary#
XOR has two properties that combine into a memorable trick:
a ⊕ a = 0, and a ⊕ b ⊕ b = a. Apply them in sequence and you
can swap two integers without a third variable:
my ($a, $b) = (0xFEED, 0xBEEF);
$a ^= $b; # a := a ⊕ b
$b ^= $a; # b := b ⊕ (a ⊕ b) = a
$a ^= $b; # a := (a ⊕ b) ⊕ a = b
print "a=$a b=$b\n"; # a=48879 b=65261 (0xBEEF, 0xFEED)
The trick is not actually useful in Perl — ($a, $b) = ($b, $a)
is faster, clearer, and works on any scalar including strings and
references. But it shows up in two places that matter:
Embedded code without a free register. A microcontroller routine that must exchange two registers and has no third one to spare reaches for this.
Interview folklore. Knowing it exists and why it works (XOR is its own inverse) is worth the thirty seconds it takes to read.
The reason it works is exactly the boolean identity from the
truth-table chapter: x ⊕ y ⊕ y = x. The first line stores a ⊕ b;
the second and third lines each apply the identity to recover one
of the original values.
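One caveat the three-line form hides: if both operands are the same storage location, the first XOR zeroes it (a ⊕ a = 0) and the value is lost. A minimal sketch, assuming a hypothetical helper `xor_swap` that modifies its arguments in place through Perl's `@_` aliasing:

```perl
use strict;
use warnings;

# Hypothetical helper: swaps two integer scalars in place.
# @_ aliases the caller's variables, so assigning to $_[0] changes them.
sub xor_swap {
    $_[0] ^= $_[1];    # first  := first XOR second
    $_[1] ^= $_[0];    # second := second XOR (first XOR second) = first
    $_[0] ^= $_[1];    # first  := (first XOR second) XOR first  = second
    return;
}

my ($p, $q) = (0xFEED, 0xBEEF);
xor_swap($p, $q);
printf "p=%#x q=%#x\n", $p, $q;    # p=0xbeef q=0xfeed

my $same = 42;
xor_swap($same, $same);            # both arguments alias one scalar
print "same=$same\n";              # same=0 -- the value is gone
```

The list-assignment swap has no such failure mode, which is one more reason to prefer it in Perl.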
Common bit tricks#
A handful of patterns you will see in performance-sensitive code:
$x & ($x - 1) # $x with its lowest set bit cleared
($x & ($x - 1)) == 0 # true when $x is a power of two; also true for 0, so check $x != 0 separately
$x | -$x # zero iff $x is zero; otherwise the top (sign) bit is set
($x >> 31) & 1 # the sign bit, on a 32-bit signed integer
1 << $n # the integer with only bit $n set
$x & (1 << $n) # is bit $n set in $x?
Each of these is an exercise in tracking what the bits do — pure boolean reasoning applied 32 (or 64) times in parallel.
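The "clear the lowest set bit" pattern is enough to build two small utilities. A sketch, assuming non-negative integer inputs and hypothetical function names (`is_power_of_two`, `popcount`):

```perl
use strict;
use warnings;

# True only for 1, 2, 4, 8, ...; the explicit $x > 0 test excludes zero,
# which would otherwise pass the ($x & ($x - 1)) == 0 check.
sub is_power_of_two {
    my ($x) = @_;
    return ($x > 0 && ($x & ($x - 1)) == 0) ? 1 : 0;
}

# Counts set bits by clearing the lowest one until none remain.
sub popcount {
    my ($x) = @_;
    my $count = 0;
    while ($x) {
        $x &= $x - 1;    # clear the lowest set bit
        $count++;
    }
    return $count;
}

printf "%d: power of two? %s, bits set: %d\n",
    $_, (is_power_of_two($_) ? 'yes' : 'no'), popcount($_)
    for 0, 1, 6, 64, 255;
```

The loop in `popcount` runs once per set bit rather than once per bit position, which is why this form is popular for sparse values.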
Regular expressions: logic on sets of strings#
A regex matches a set of strings. The empty set, the singleton
{"foo"}, the infinite set “anything that begins with a digit” —
all are sets, and the regex denotes one of them.
Once you see a regex as a set, each boolean operation has a direct set-theoretic meaning:
| Boolean operation | On sets | In regex syntax |
|---|---|---|
| OR (∨) | union | alternation: `A\|B` |
| AND (∧) | intersection | lookahead pair: `(?=A)(?=B)` |
| NOT (¬) | complement | negative lookahead: `(?!A)` |
| AND-NOT (a ∧ ¬b) | difference | `(?=A)(?!B)` |
Two specifics worth pulling out.
Alternation is OR over sets#
$s =~ /yes|no|maybe/;
Matches the union of three singleton sets. There is nothing more
to it; | in regex syntax is precisely the boolean ∨.
A character class expresses the same union over single characters, with no | needed:
$s =~ /[abc]/; # union of {"a"}, {"b"}, {"c"}
$s =~ /[^abc]/; # complement: anything NOT in {"a","b","c"}
[^...] is the regex syntax for ¬ applied to a character set.
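The equivalence between alternation and a character class can be checked directly. A small sketch, using anchors so that each pattern describes the whole string rather than a substring; the `%result` hash is only there to collect the outcomes:

```perl
use strict;
use warnings;

my %result;
for my $s (qw(a b c d ab)) {
    my $alt   = $s =~ /^(?:a|b|c)$/ ? 1 : 0;   # union written as alternation
    my $class = $s =~ /^[abc]$/     ? 1 : 0;   # the same union as a character class
    my $neg   = $s =~ /^[^abc]$/    ? 1 : 0;   # complement, within single characters
    $result{$s} = "$alt$class$neg";
    printf "%-2s alt=%d class=%d neg=%d\n", $s, $alt, $class, $neg;
}
```

Note that `ab` fails all three: the complement in `[^abc]` is taken within the set of single characters, not within the set of all strings.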
Lookarounds are AND and NOT#
A regex without lookaround consumes characters as it matches; the
two branches of an alternation A|B cannot both apply to the same
span (only one branch wins). To require both A and B at the
same position, you need lookaround: a zero-width assertion that
demands a property without consuming.
# matches only if the whole string is BOTH all digits AND at most 4 chars
$s =~ /^(?=\d+$)(?=.{1,4}$)/;
#       \______/\_________/
#          A         B        intersection: A ∧ B
(?=...) is positive lookahead; (?!...) is negative lookahead
(the boolean NOT). Combining them gives you the full set algebra
on regex predicates:
# "starts with a digit but is not the literal '0'":
# (digit) ∧ ¬(literal "0")
$s =~ /^(?=\d)(?!0$)/;
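Both patterns can be exercised against a few inputs. A minimal sketch; the variable names and sample strings are illustrative only:

```perl
use strict;
use warnings;

my $short_number = qr/^(?=\d+$)(?=.{1,4}$)/;   # all digits AND at most four characters
my $digit_not_0  = qr/^(?=\d)(?!0$)/;          # starts with a digit AND NOT exactly "0"

for my $s ('42', '12345', '12ab', '0', '007', 'x1') {
    printf "%-6s short_number=%d digit_not_0=%d\n",
        $s,
        ($s =~ $short_number ? 1 : 0),
        ($s =~ $digit_not_0  ? 1 : 0);
}
```

`007` passes the second pattern because the negative lookahead rejects only a `0` that is immediately followed by the end of the string.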
Why the framing helps#
Once you read regex this way, the readability of complicated patterns improves dramatically. A regex with two lookaheads is not “two regexes glued together” — it is the intersection of two sets. A negative lookahead followed by a positive match is not “first reject, then match” — it is set difference. The boolean algebra you already know is the algebra of regexes; only the notation changes.
Control flow: short-circuit logic as conditional execution#
The chapter on operators introduced the operand-return rule for
&&, ||, and //. That rule, plus precedence, generates a small
family of idioms that replace explicit if/else for short
conditional logic:
# defaulting
my $port = $cfg{port} // 8080;
# guard with side-effect
open my $fh, '<', $path or die "open $path: $!";
# lazy initialisation (set if currently false)
$cache{$key} ||= compute($key);
# lazy initialisation (set if currently undef — a real `0` is kept)
$cfg{retries} //= 3;
# guard chain: do step 2 only if step 1 succeeded
my $rc = step_one() && step_two();
# logical selection inside an expression
my $label = $count == 1 ? "1 item" : "$count items";
Each of these is a boolean expression chosen for its side-effects
— the short-circuit decides whether the right operand even runs.
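open ... or die works for two reasons: or does not evaluate its
right side when the left is true, and or binds more loosely than
the argument list of open, so it tests the return value of open
rather than $path. Written with ||, the same line would parse as
$path || die ... and a failed open would go unreported.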
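The short-circuit can be observed by recording which operands actually run. A sketch, assuming hypothetical subs `step_one` and `step_two` that stand in for real work:

```perl
use strict;
use warnings;

my @ran;    # records which steps actually executed

sub step_one { push @ran, 'one'; return $_[0] }   # returns whatever it is given
sub step_two { push @ran, 'two'; return 'done' }

@ran = ();
my $rc_ok = step_one(1) && step_two();    # left is true, so the right side runs
print "ok:   rc=$rc_ok ran=@ran\n";       # ok:   rc=done ran=one two

@ran = ();
my $rc_fail = step_one(0) && step_two();  # left is false, so step_two never runs
print "fail: rc=$rc_fail ran=@ran\n";     # fail: rc=0 ran=one
```

In both cases the result is one of the operands themselves, not a normalised true or false: that is the operand-return rule at work.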
Two pieces of advice.
Use // and //= when 0 or "" is a valid value. || treats every
false value as missing, so defaulting with it silently overwrites
a configured zero or empty string, one of the classic ways to lose
data; // (available since Perl 5.10) replaces only undef.
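The difference shows up as soon as a setting is legitimately zero. A minimal sketch, assuming a hypothetical `%cfg` hash in which `retries` is deliberately set to 0 and `timeout` is absent (requires Perl 5.10 or later for `//`):

```perl
use strict;
use warnings;

# Hypothetical configuration: retries is deliberately 0, timeout is not set.
my %cfg = (retries => 0);

my $retries_or  = $cfg{retries} || 3;    # 3: the configured 0 is treated as missing
my $retries_dor = $cfg{retries} // 3;    # 0: only undef triggers the default
my $timeout     = $cfg{timeout} // 30;   # 30: the key is absent, so the value is undef

printf "||: %d  //: %d  timeout: %d\n", $retries_or, $retries_dor, $timeout;
```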
Reach for ?: when you would otherwise build an
if/else/scalar-assign triplet. The ternary keeps the
expression as an expression and the assignment in one line:
# verbose
my $kind;
if ($n == 1) { $kind = 'singular' }
else { $kind = 'plural' }
# idiomatic
my $kind = $n == 1 ? 'singular' : 'plural';
Don’t nest ?: more than two levels deep. Past two, the chained
form is harder to read than the if/elsif/else. The ternary
is for choosing; sequential decisions belong in a block.
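At the two-level limit, a chained ternary still reads as a table of cases when laid out one case per line. A sketch, assuming a hypothetical `size_label` function:

```perl
use strict;
use warnings;

# Hypothetical helper: two chained ?: levels, one case per line.
sub size_label {
    my ($n) = @_;
    return $n == 0 ? 'none'
         : $n == 1 ? 'one'
         :           'many';
}

printf "%d: %s\n", $_, size_label($_) for 0, 1, 7;
```

Anything that needs a third condition, or a side effect in one of the branches, is usually clearer as if/elsif/else.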
What you should remember from this chapter#
Bitwise &, |, ^, ~ are the boolean operators applied bit-by-bit in parallel; flag set/clear/toggle/test are the four one-bit applications.
A regex denotes a set of strings; | is union, [^...] is complement, (?=) and (?!) make intersection and difference.
&&, ||, //, ?: give you most of conditional control flow without writing if. Use // when 0 is real; use ?: for short selection inside an expression.