Hashes and mixed structures

A hash’s values are scalars, so a value cannot literally be an array or another hash. It can, however, be a reference to one — and that is enough to build every record shape real programs need: hashes of arrays, arrays of hashes, hashes of hashes, and nested combinations.

This chapter walks through the three most common shapes and then combines them. The patterns compose: once you have a hash of arrays and an array of hashes, you have “rows of records with multi-valued fields” for free.

Hash of arrays — grouping values under a key

The motivating example: you read lines of the form city, country and want to collect, for each country, the list of its cities.

Chicago, USA
Frankfurt, Germany
Berlin, Germany
Washington, USA
Helsinki, Finland
New York, USA

The natural shape is a hash whose keys are country names and whose values are references to arrays of cities:

my %cities_by_country;
while (<>) {
    chomp;
    my ($city, $country) = split /, /;
    push @{$cities_by_country{$country}}, $city;
}

for my $country (sort keys %cities_by_country) {
    my @cities = sort @{$cities_by_country{$country}};
    print "$country: ", join(', ', @cities), ".\n";
}

Two things to notice:

  • push @{$cities_by_country{$country}}, $city works on the very first iteration for every new country. The hash entry does not exist yet, so Perl creates an anonymous array for it, stores a reference, and pushes the city in. That’s autovivification again — and why you almost never need $h{$k} = [] unless exists $h{$k} as a separate line.

  • The dereference @{$cities_by_country{$country}} is mechanical: “the array that this hash value points at.” Wherever you’d have written @cities if it were a named array, you write the curly form instead.

The output is:

Finland: Helsinki.
Germany: Berlin, Frankfurt.
USA: Chicago, New York, Washington.
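To run the grouping without an input file, here is a self-contained sketch that feeds the same six lines from an in-program list instead of <>. The @lines array is only a stand-in for the input and is not part of the original program. The report also prints each value's element count and its ref type, which confirms that every value created by push's autovivification is an ARRAY reference.

```perl
use strict;
use warnings;

# Stand-in for the input file: the same six "city, country" lines.
my @lines = (
    "Chicago, USA",    "Frankfurt, Germany", "Berlin, Germany",
    "Washington, USA", "Helsinki, Finland",  "New York, USA",
);

my %cities_by_country;
for my $line (@lines) {
    my ($city, $country) = split /, /, $line;
    # First sight of a country: push autovivifies an anonymous array for it.
    push @{ $cities_by_country{$country} }, $city;
}

for my $country (sort keys %cities_by_country) {
    my $ref = $cities_by_country{$country};
    # scalar(@$ref) counts the elements; ref($ref) names the referent type.
    printf "%s (%d, %s): %s.\n",
        $country, scalar(@$ref), ref($ref), join(', ', sort @$ref);
}
```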

Array of hashes — a table of records

When each item in a list carries several named fields, store each record as a hash reference and put the references in an array:

my @people = (
    { name => 'Ada',   born => 1815, field => 'computing' },
    { name => 'Hedy',  born => 1914, field => 'signals'   },
    { name => 'Grace', born => 1906, field => 'compilers' },
);

$people[0] is a hash reference; $people[0]->{name} or (with the arrow-between-subscripts shortcut) $people[0]{name} is 'Ada'.

Typical operations:

# Sort by a field
my @by_year = sort { $a->{born} <=> $b->{born} } @people;

# Select rows
my @early = grep { $_->{born} < 1900 } @people;

# Project one field out
my @names = map { $_->{name} } @people;

# Mutate a record in place
$people[1]{field} = 'radio';

Inside grep and map each record arrives in $_, and inside the sort comparator it arrives in $a and $b. In every case the variable holds a hash reference, not a hash, so you reach the fields through the arrow: $_->{born}, $a->{born}.
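The operations above can be run together as one script. This sketch reuses the @people data and makes explicit one point the snippets leave implicit: sort and grep return the same hash references that @people holds, not copies of the records, so a change made through @by_year is visible through @people too. The names $first_a and $first_b are illustrative only.

```perl
use strict;
use warnings;

my @people = (
    { name => 'Ada',   born => 1815, field => 'computing' },
    { name => 'Hedy',  born => 1914, field => 'signals'   },
    { name => 'Grace', born => 1906, field => 'compilers' },
);

# The arrow between subscripts is optional: both read the same slot.
my $first_a = $people[0]->{name};
my $first_b = $people[0]{name};

my @by_year = sort { $a->{born} <=> $b->{born} } @people;
my @early   = grep { $_->{born} < 1900 } @people;
my @names   = map  { $_->{name} } @by_year;

# sort and grep hand back the same references, not copies of the records,
# so a change made through @by_year shows up through @people as well.
$by_year[-1]{field} = 'radio';                   # Hedy, the latest-born
print "$people[1]{name}: $people[1]{field}\n";   # Hedy: radio
print join(', ', @names), "\n";                  # Ada, Grace, Hedy
```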

Hash of hashes — lookup tables of records

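When each record has a unique identifier, a hash of hashes keyed on that identifier is usually more convenient than an array of hashes: you fetch a record directly by its key instead of scanning the array for it.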

my %people = (
    ada   => { born => 1815, field => 'computing' },
    hedy  => { born => 1914, field => 'signals'   },
    grace => { born => 1906, field => 'compilers' },
);

print $people{ada}{born};             # 1815
$people{grace}{field} = 'COBOL';      # mutate one record

Iterate in a definite order by sorting the keys:

for my $key (sort keys %people) {
    my $rec = $people{$key};
    print "$key: born $rec->{born}, $rec->{field}\n";
}

Binding the inner reference to a named lexical ($rec) is a habit worth forming: each field access stays short ($rec->{born}) instead of spelling out the full $people{$key}{...} path every time.
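Sorting on the key is only one choice. The comparator can instead look through each key to a field of its record; here is a sketch that orders the same %people by birth year while still binding each record to $rec inside the loop. The @by_born name is illustrative.

```perl
use strict;
use warnings;

my %people = (
    ada   => { born => 1815, field => 'computing' },
    hedy  => { born => 1914, field => 'signals'   },
    grace => { born => 1906, field => 'compilers' },
);

# Order the keys by a field of the inner record rather than by the key itself.
my @by_born = sort { $people{$a}{born} <=> $people{$b}{born} } keys %people;

for my $key (@by_born) {
    my $rec = $people{$key};    # bind the inner hash ref once per record
    print "$key: born $rec->{born}, $rec->{field}\n";
}
```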

Mixed — records with multi-valued fields

Put it all together: each record is a hash, and one of its fields is a reference to an array.

my %library = (
    'The Road'           => {
        author  => 'Cormac McCarthy',
        year    => 2006,
        genres  => ['fiction', 'post-apocalyptic'],
    },
    'Thinking in Systems' => {
        author  => 'Donella Meadows',
        year    => 2008,
        genres  => ['non-fiction', 'systems'],
    },
);

for my $title (sort keys %library) {
    my $rec = $library{$title};
    my $genres = join ', ', @{$rec->{genres}};
    print "$title ($rec->{year}, $rec->{author}): $genres\n";
}

@{$rec->{genres}} is the same pattern as before: reach into the hash, get the array reference, dereference it with @{...}.
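Because the genres field is an ordinary array reference, the hash-of-arrays pattern from the start of the chapter applies to it directly. A sketch that inverts %library into a genre-to-titles index (the %titles_by_genre name is hypothetical), and that counts one record's genres by dereferencing the array in scalar context:

```perl
use strict;
use warnings;

my %library = (
    'The Road' => {
        author => 'Cormac McCarthy',
        year   => 2006,
        genres => ['fiction', 'post-apocalyptic'],
    },
    'Thinking in Systems' => {
        author => 'Donella Meadows',
        year   => 2008,
        genres => ['non-fiction', 'systems'],
    },
);

# Invert the records: for each genre, the list of titles carrying it.
my %titles_by_genre;
for my $title (sort keys %library) {
    for my $genre (@{ $library{$title}{genres} }) {
        push @{ $titles_by_genre{$genre} }, $title;   # autovivifies per genre
    }
}

# An array dereference in scalar context gives the element count.
my $n = scalar @{ $library{'The Road'}{genres} };    # 2

for my $genre (sort keys %titles_by_genre) {
    print "$genre: ", join(', ', @{ $titles_by_genre{$genre} }), "\n";
}
```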

Checking what you have

Before acting on a nested slot it’s often useful to know whether the slot is actually there. Test the levels from the outside in, joined with the low-precedence and operator, so that each deeper test only runs once the level above it is known to exist:

if (exists $library{$title} and exists $library{$title}{genres}) {
    push @{$library{$title}{genres}}, 'essential';
}

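Any expression that reaches through a missing level creates that level, whether it reads or writes. Using $library{$title}{genres} as an lvalue (the left side of an assignment, or the argument of push) autovivifies every missing level, down to the array itself. Merely reading or testing it spares only the last level: evaluating $library{$title}{genres}, even inside exists, still creates an empty hash at $library{$title} if that title was absent. Testing exists $library{$title} first, and letting and short-circuit, means the deeper lookup never runs for a missing title, so the check adds nothing to %library.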
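A short experiment shows which lookups create entries. This sketch uses a cut-down %library and made-up missing titles, tries three lookups, and then reports which titles now exist as keys: a plain read and a lone exists on the deep level both leave an empty record behind, while the guarded test leaves %library unchanged.

```perl
use strict;
use warnings;

my %library = ('The Road' => { genres => ['fiction'] });

# 1. A plain read through a missing title creates the intermediate hash.
my $genres = $library{'No Such Book'}{genres};

# 2. So does exists on the deep level alone; only the last level is spared.
my $has = exists $library{'Also Missing'}{genres};

# 3. Testing the outer level first: and short-circuits, so the deep
#    lookup never runs and nothing is created.
my $title = 'Still Missing';
if (exists $library{$title} and exists $library{$title}{genres}) {
    push @{ $library{$title}{genres} }, 'essential';
}

for my $t ('No Such Book', 'Also Missing', 'Still Missing') {
    printf "%-13s %s\n", $t, exists($library{$t}) ? 'created' : 'absent';
}
```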

To test the type of a reference without pulling values out of it, use ref:

ref $library{'The Road'}              # HASH
ref $library{'The Road'}{genres}      # ARRAY
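ref is what lets code walk a structure whose shape it does not know in advance. As a sketch, a hypothetical count_leaves helper recurses into hash and array references and counts everything else as one leaf. For quick inspection rather than processing, the core Data::Dumper module's Dumper function prints a nested structure directly.

```perl
use strict;
use warnings;

# Hypothetical helper: count the leaf values in any mix of hashes and arrays,
# using ref to decide whether (and how) to descend.
sub count_leaves {
    my ($data) = @_;
    my $type = ref $data;
    # Anything that is not a hash or array ref counts as a single leaf.
    return 1 unless $type eq 'HASH' || $type eq 'ARRAY';
    my @children = $type eq 'HASH' ? values %$data : @$data;
    my $total = 0;
    $total += count_leaves($_) for @children;
    return $total;
}

my %library = (
    'The Road' => {
        author => 'Cormac McCarthy',
        year   => 2006,
        genres => ['fiction', 'post-apocalyptic'],
    },
    'Thinking in Systems' => {
        author => 'Donella Meadows',
        year   => 2008,
        genres => ['non-fiction', 'systems'],
    },
);

# 8: each record has author, year, and two genres.
print count_leaves(\%library), "\n";
```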

Where to go next

  • Anonymous references — the [...] / {...} constructors used throughout this chapter, and when a named variable is the better choice.

  • Subroutine references — the remaining reference flavour, needed for callback tables and dispatch.