Hashes

A hash is an unordered collection of key-value pairs where keys are strings and values are scalars. The sigil is % for the whole hash and for key/value slices, @ for value slices, and $ for one value:

my %user = (
    name  => 'John',
    age   => 30,
    email => 'john@example.com',
);

%user                # the whole hash (six elements: key1, val1, ...)
$user{name}          # one value          — 'John'
@user{qw(name age)}  # value slice        — ('John', 30)
%user{qw(name age)}  # key/value slice    — (name => 'John', age => 30)
keys %user           # the list of keys

Hash keys are always coerced to strings. Storing under the integer key 42 and reading under the string key "42" address the same slot, because both are the key "42".
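As a small sketch of that coercion (the hash %h and the variable $float are made up for illustration), the following stores under a number, a string, and a floating-point value and then inspects which keys actually exist:

```perl
use strict;
use warnings;

my %h;
$h{42}   = 'stored under the number 42';
$h{"42"} = 'stored under the string "42"';   # overwrites: same slot

my $slots = keys %h;                  # keys in scalar context gives the count
print "slots: $slots\n";              # slots: 1

# A number is stringified before lookup, so 42.0 also lands on "42",
# while the string "42.0" is a different key.
my $float = 42.0;
$h{$float} = 'via 42.0';
$h{"42.0"} = 'a separate key';
print join(', ', sort keys %h), "\n"; # 42, 42.0
```

The practical consequence is that numeric formatting matters only when the key is already a string; a numeric value is normalised before it is used as a key.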

Initialisation: pairs, fat comma, and %h = LIST

A hash is initialised from a list of even length, in key/value order:

my %h = ('a', 1, 'b', 2);                        # legal but ugly
my %h = (a => 1, b => 2);                        # idiomatic

The fat comma => is a comma that also auto-quotes a left-hand bareword that looks like an identifier. So a => 1 is exactly 'a', 1. Auto-quoting requires the bareword to be a simple identifier — 2.0 => 'x' is parsed as the number 2, not the string "2.0":

my %h = (a => 1);                # ('a', 1)         — auto-quoted
my %h = ('a' => 1);              # ('a', 1)         — same
my %h = (2.0 => 'x');            # (2, 'x')         — not ('2.0', 'x')!
my %h = ("2.0" => 'x');          # ('2.0', 'x')     — explicit quote
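Auto-quoting also applies when the bareword happens to be the name of a constant, which is a common surprise. A minimal sketch, assuming a hypothetical constant named KEY declared with the core constant pragma:

```perl
use strict;
use warnings;
use constant KEY => 'colour';   # hypothetical constant for illustration

my %quoted = (KEY   => 'red');  # key is the literal string 'KEY'
my %called = (KEY() => 'red');  # key is 'colour': the () stops auto-quoting

print join(',', keys %quoted), "\n";   # KEY
print join(',', keys %called), "\n";   # colour
```

Writing the constant as a call, KEY(), is one way to make the fat comma see an expression rather than a bareword.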

If a key appears more than once in the initialiser list, the last occurrence wins. The standard idiom for “merge with overrides” exploits that:

my %config = (%defaults, %overrides);
# %config has every default key, with %overrides values where they collide
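A concrete run of the merge idiom might look like this; the contents of %defaults and %overrides are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical settings, for illustration only.
my %defaults  = (host => 'localhost', port => 80, debug => 0);
my %overrides = (port => 8080, debug => 1);

# Both hashes flatten into one list; later pairs replace earlier ones.
my %config = (%defaults, %overrides);

print "$_ = $config{$_}\n" for sort keys %config;
# debug = 1
# host = localhost
# port = 8080
```

Neither source hash is modified; %config is a fresh shallow copy.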

Access, with the four sigil shapes

Reading and writing one slot is $-sigil:

my $name = $user{name};               # read
$user{city} = 'NYC';                  # write
$user{age}++;                         # arithmetic on a hash value
delete $user{email};                  # remove the slot entirely

The two slice shapes, together with the keys and values functions, return a list shaped by what you ask for:

@user{qw(name age city)}              # values: ('John', 31, 'NYC')
%user{qw(name age)}                   # pairs:  (name => 'John', age => 31)
keys %user                            # keys:   'name', 'age', 'city' in unspecified order
values %user                          # values: 'John', 31, 'NYC' in the same order as keys

See subscript for the whole twelve-way matrix and the bareword-auto-quote rule inside {}.

exists vs defined vs truthiness

Three distinct questions about a key:

exists  $h{key}      # is the slot present at all?
defined $h{key}      # is the slot present AND non-undef?
        $h{key}      # is the value true (non-empty, non-zero, non-"0")?

These differ when:

$h{a} = undef;
$h{b} = 0;
delete $h{c};

exists  $h{a}        # TRUE  — slot exists, value is undef
defined $h{a}        # FALSE — value is undef
        $h{a}        # FALSE — undef is false

exists  $h{b}        # TRUE
defined $h{b}        # TRUE
        $h{b}        # FALSE — 0 is false

exists  $h{c}        # FALSE — never set, or deleted
defined $h{c}        # FALSE
        $h{c}        # FALSE

Pick by intent: exists for presence, defined for value-not-undef, truthiness for meaningful value. Reaching for if ($h{key}) when exists was meant is a bug source — 0 and "" are valid values that the test rejects.
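The difference shows up most often when applying defaults. The sketch below uses a hypothetical option hash %opt in which 0 is a deliberate setting, and compares a truthiness test, an exists test, and the defined-or operator // (available since Perl 5.10):

```perl
use strict;
use warnings;

# Hypothetical option hash: 0 is a deliberate, valid setting.
my %opt = (retries => 0, label => undef);

my $buggy   = $opt{retries} ? $opt{retries} : 3;          # 3: the 0 is lost
my $present = exists($opt{retries}) ? $opt{retries} : 3;  # 0: slot is present
my $label   = $opt{label}   // 'none';                    # 'none': value is undef
my $timeout = $opt{timeout} // 30;                        # 30: no such slot

print "$buggy $present $label $timeout\n";                # 3 0 none 30
```

Reading $opt{timeout} here does not create the slot; only the default is returned.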

Iteration: keys, values, each, while (each)

for my $k (keys %user) {
    print "$k = $user{$k}\n";
}

for my $v (values %user) { ... }            # values only

while (my ($k, $v) = each %user) {          # keys + values, one pair per call
    print "$k = $v\n";
}

each carries an iterator state on the hash itself; calling keys (or values) on the hash resets that iterator. Mixing the two produces hard-to-debug looping bugs:

while (my ($k, $v) = each %h) {
    if (some_condition($k)) {
        print "size: ", scalar keys %h, "\n";   # resets each() iterator!
        # each() restarts from the first key, so the loop may revisit pairs or never finish
    }
}

Fix: either accumulate the list with keys once at the top, or avoid each entirely. Most hash iteration is best written with keys:

for my $k (sort keys %h) {            # bonus: defined ordering
    ...
}
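One way to apply the first fix is to take the key list once, before the loop, and derive anything else (such as the size) from that snapshot. A minimal sketch with invented data:

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);   # hypothetical data

my @keys = sort keys %h;      # snapshot taken once, before the loop
my $size = @keys;             # size computed from the snapshot
my $seen = 0;

for my $k (@keys) {
    # Calling keys %h in here would be harmless: the loop walks @keys,
    # not the hash's internal each() iterator.
    $seen++;
    print "$k = $h{$k} (of $size)\n";
}
```

Because the loop no longer depends on iterator state stored in the hash, helper code called from the body can use keys, values, or each on %h freely.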

Key ordering

Hash keys come out in an order set by the internal hash table and per-process hash randomisation, not by insertion order; there is no guaranteed order. Two runs of the same program with the same input may iterate keys in different orders. The randomisation is a security feature (it hardens the hash function against algorithmic-complexity attacks); it is also a recurring source of brittle tests:

my %h = (a => 1, b => 2, c => 3);
print "$_ " for keys %h;        # output order is undefined

If you need a stable order, sort:

print "$_ " for sort keys %h;          # alphabetical
print "$_ " for sort { $h{$a} <=> $h{$b} } keys %h;   # by value
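Sorting by value alone still leaves ties in unspecified order. A sketch of a fully deterministic ranking, using a hypothetical %count hash, sorts by value descending and breaks ties by key:

```perl
use strict;
use warnings;

my %count = (apple => 3, banana => 2, cherry => 3);   # hypothetical data

# Highest count first; equal counts fall back to alphabetical key order,
# so the result does not depend on hash ordering.
my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

printf "%s: %d\n", $_, $count{$_} for @ranked;
# apple: 3
# cherry: 3
# banana: 2
```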

Hash references

A hash, like an array, flattens into a plain list of keys and values when it is passed to a function or placed in a list. To pass a hash without flattening, take a reference:

my %h = (a => 1, b => 2);

my $href = \%h;                # reference to the existing %h
my $anon = { x => 1, y => 2 }; # anonymous hash reference

$href->{a}                     # access through the arrow      — 1
${$href}{a}                    # fully bracketed deref         — 1
keys %$href                    # deref then keys
%{$href}                       # deref to flat key/value list

See references for the full picture.
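To illustrate why the reference matters when calling a function, here is a minimal sketch; the subroutine describe and its separator argument are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash reference plus one extra argument.
sub describe {
    my ($user, $sep) = @_;
    return join $sep, map { $_ . '=' . $user->{$_} } sort keys %$user;
}

my %user = (name => 'John', age => 30);

print describe(\%user, ', '), "\n";   # age=30, name=John

# Passing %user directly would flatten it: four scalars plus the
# separator, with nothing to mark where the hash ends.
my @flat = (%user, ', ');
print scalar(@flat), "\n";            # 5
```

The reference arrives as a single scalar in @_, so further arguments can follow it unambiguously.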

Real example: counting and grouping

A frequency count is the textbook small example for hashes:

my @words = qw(apple banana apple cherry apple banana);
my %count;
$count{$_}++ for @words;
# %count = (apple => 3, banana => 2, cherry => 1)

Grouping is the next step up — building a hash whose values are arrayrefs:

my @people = (
    { name => 'Alice',   dept => 'eng' },
    { name => 'Bob',     dept => 'sales' },
    { name => 'Carol',   dept => 'eng' },
    { name => 'Dan',     dept => 'sales' },
);

my %by_dept;
push @{ $by_dept{$_->{dept}} }, $_->{name} for @people;
# %by_dept = (eng => ['Alice', 'Carol'], sales => ['Bob', 'Dan'])

The push @{ $by_dept{$_->{dept}} }, ... line autovivifies: when the eng key isn’t there yet, the hash slot is created and an anonymous array goes into it, then the value is pushed. See references for the autoviv story.
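Autovivification is convenient when building structures, but it also fires when merely probing a nested key. The following sketch (with a hypothetical %seen hash) shows both sides:

```perl
use strict;
use warnings;

my %by_dept;
push @{ $by_dept{eng} }, 'Alice';      # slot created, now holds ['Alice']
push @{ $by_dept{eng} }, 'Carol';      # slot reused

# Probing a second-level key creates the first-level slot as a side effect.
my %seen;
my $found = exists $seen{x}{y};        # false, but $seen{x} now exists

print scalar(@{ $by_dept{eng} }), "\n";                      # 2
print(exists($seen{x}) ? "created\n" : "absent\n");          # created
```

Checking the outer key first, as in exists $seen{x} && exists $seen{x}{y}, avoids creating the intermediate slot.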

Hash slice as multi-value get/set

my %config = (host => 'localhost', port => 80, debug => 0);

my ($h, $p) = @config{'host', 'port'};       # ('localhost', 80)

@config{qw(host port debug)} = ('elsewhere', 8080, 1);   # set three at once

Combine with the %h{...} slice for “extract a record”:

my %fields = %config{qw(host port)};
# with %config as first declared: %fields = (host => 'localhost', port => 80)

This is the canonical way to project a hash down to a subset of its keys. Key/value slices need Perl 5.20 or later; without %h{...} you’d hand-roll it with a loop or with map.
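For comparison, a hand-rolled projection might look like the sketch below. The key list @want and the result hashes are names chosen for the example; the first two forms should give the same result as a key/value slice, and the third shows a variant that skips keys absent from the source:

```perl
use strict;
use warnings;

my %config = (host => 'localhost', port => 80, debug => 0);
my @want   = qw(host port);

# Projection with map: one key/value pair per wanted key.
my %fields = map { ($_ => $config{$_}) } @want;

# Projection with a value slice on both sides of the assignment.
my %copy;
@copy{@want} = @config{@want};

# Variant: keep only keys that actually exist in the source hash.
my %present = map  { ($_ => $config{$_}) }
              grep { exists $config{$_} } qw(host missing);

print join(',', sort keys %fields),  "\n";   # host,port
print join(',', sort keys %present), "\n";   # host
```

Without the grep, a key that is missing from %config would still appear in the result with an undef value.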

See also

  • Arrays — when the index is an integer.

  • References — autovivification, hash-of-arrays, hash-of-hashes; the $h->{a}{b} shape.

  • Subscript and slice operators — the bareword-auto-quote rule and the four slice shapes.

  • exists, defined, delete — the presence/value/removal trio.

  • keys, values, each — whole-hash iteration.

  • sort — for ordered iteration.

  • tie — make a hash backed by something other than the in-memory hash table (DBM file, lazy database, …).