Hashes and mixed structures
A hash’s values are scalars, so a value cannot literally be an array or another hash. It can, however, be a reference to one — and that is enough to build every record shape real programs need: hashes of arrays, arrays of hashes, hashes of hashes, and nested combinations.
This chapter walks through the three most common shapes. The patterns compose: once you have a hash of arrays and an array of hashes, you have “rows of records with multi-valued fields” for free.
Hash of arrays: grouping values under a key
The motivating example: you read lines of the form city, country and want to collect, for each country, the list of its cities.
Chicago, USA
Frankfurt, Germany
Berlin, Germany
Washington, USA
Helsinki, Finland
New York, USA
The natural shape is a hash whose keys are country names and whose values are references to arrays of cities:
my %cities_by_country;
while (<>) {
    chomp;
    my ($city, $country) = split /, /;
    push @{$cities_by_country{$country}}, $city;
}
for my $country (sort keys %cities_by_country) {
    my @cities = sort @{$cities_by_country{$country}};
    print "$country: ", join(', ', @cities), ".\n";
}
Two things to notice:
push @{$cities_by_country{$country}}, $city works on the very first iteration for every new country. The hash entry does not exist yet, so Perl creates an anonymous array for it, stores a reference, and pushes the city in. That’s autovivification again, and it is why you almost never need $h{$k} = [] unless exists $h{$k} as a separate line.
The dereference @{$cities_by_country{$country}} is mechanical: “the array that this hash value points at.” Wherever you’d have written @cities if it were a named array, you write the curly form instead.
The output is:
Finland: Helsinki.
Germany: Berlin, Frankfurt.
USA: Chicago, New York, Washington.
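The autovivification step can be observed directly. The following is a minimal sketch, assuming a hypothetical hash %seen that is not part of the example above; it checks that the slot is absent before the first push and holds an array reference afterwards.

```perl
use strict;
use warnings;

# %seen is a hypothetical name used only for this illustration.
my %seen;

# Nothing has been stored yet, so the key is absent.
my $before = exists $seen{Germany} ? 'yes' : 'no';

# The first push creates the anonymous array and stores a reference to it;
# the second push reuses that same array.
push @{ $seen{Germany} }, 'Frankfurt';
push @{ $seen{Germany} }, 'Berlin';

my $kind  = ref $seen{Germany};            # 'ARRAY'
my $count = scalar @{ $seen{Germany} };    # 2

print "exists before push: $before\n";
print "type after push: $kind, elements: $count\n";
```

No explicit initialisation to [] appears anywhere, which is the point of the pattern.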
Array of hashes: a table of records
When each item in a list carries several named fields, store each record as a hash reference and put the references in an array:
my @people = (
    { name => 'Ada',   born => 1815, field => 'computing' },
    { name => 'Hedy',  born => 1914, field => 'signals'   },
    { name => 'Grace', born => 1906, field => 'compilers' },
);
$people[0] is a hash reference; $people[0]->{name} or (with the
arrow-between-subscripts shortcut) $people[0]{name} is 'Ada'.
Typical operations:
# Sort by a field
my @by_year = sort { $a->{born} <=> $b->{born} } @people;
# Select rows
my @early = grep { $_->{born} < 1900 } @people;
# Project one field out
my @names = map { $_->{name} } @people;
# Mutate a record in place
$people[1]{field} = 'radio';
Inside sort, grep, and map the record is a hash reference, so you reach its fields through the arrow. $a and $b inside a sort comparator are hash references here, not hashes.
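Because all three operations take a list and return a list, they can be chained. The sketch below is one possible combination, not the only way to write it: it selects records born before 1910 (an arbitrary cut-off chosen for illustration), orders them by year, and projects the names. The variable @names_by_year is a name introduced here.

```perl
use strict;
use warnings;

my @people = (
    { name => 'Ada',   born => 1815, field => 'computing' },
    { name => 'Hedy',  born => 1914, field => 'signals'   },
    { name => 'Grace', born => 1906, field => 'compilers' },
);

# Select, sort, then project. The pipeline runs from the bottom line up:
# grep filters @people, sort orders the survivors, map extracts the names.
my @names_by_year =
    map  { $_->{name} }
    sort { $a->{born} <=> $b->{born} }
    grep { $_->{born} < 1910 }
    @people;

print join(', ', @names_by_year), "\n";    # Ada, Grace
```

The intermediate lists hold the same hash references as @people, so no record is copied along the way.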
Hash of hashes: lookup tables keyed by structured values
When each record is keyed by a unique identifier, a hash of hashes is usually nicer than an array of hashes:
my %people = (
    ada   => { born => 1815, field => 'computing' },
    hedy  => { born => 1914, field => 'signals'   },
    grace => { born => 1906, field => 'compilers' },
);
print $people{ada}{born}; # 1815
$people{grace}{field} = 'COBOL'; # mutate one record
Iterate in a definite order by sorting the keys:
for my $key (sort keys %people) {
    my $rec = $people{$key};
    print "$key: born $rec->{born}, $rec->{field}\n";
}
Binding the inner reference to a named lexical ($rec) is a
habit worth forming — it makes the code read top-to-bottom instead
of repeating $people{$key}{field} five times.
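Sorting the keys alphabetically is only one choice of order. As a hedged sketch, the comparator can also look through each key into its record, which orders the keys by an inner field; @by_year is a name introduced for this example.

```perl
use strict;
use warnings;

my %people = (
    ada   => { born => 1815, field => 'computing' },
    hedy  => { born => 1914, field => 'signals'   },
    grace => { born => 1906, field => 'compilers' },
);

# $a and $b are keys (plain strings) here, so the comparator
# looks each one up in %people before comparing the born field.
my @by_year = sort { $people{$a}{born} <=> $people{$b}{born} } keys %people;

for my $key (@by_year) {
    my $rec = $people{$key};
    print "$key: born $rec->{born}, $rec->{field}\n";
}
```

This visits ada, then grace, then hedy. Note the contrast with the array-of-hashes case, where $a and $b are the records themselves.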
Mixed: records with multi-valued fields
Put it all together: each record is a hash, and one of its fields is a reference to an array.
my %library = (
    'The Road' => {
        author => 'Cormac McCarthy',
        year   => 2006,
        genres => ['fiction', 'post-apocalyptic'],
    },
    'Thinking in Systems' => {
        author => 'Donella Meadows',
        year   => 2008,
        genres => ['non-fiction', 'systems'],
    },
);
for my $title (sort keys %library) {
    my $rec = $library{$title};
    my $genres = join ', ', @{$rec->{genres}};
    print "$title ($rec->{year}, $rec->{author}): $genres\n";
}
@{$rec->{genres}} is the same pattern as before: reach into the
hash, get the array reference, dereference it with @{...}.
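The shapes also convert into one another. As an illustration, the sketch below inverts the genres field into a hash of arrays keyed by genre, reusing the push idiom from the first section. The name %titles_by_genre is introduced here and is not part of the original example.

```perl
use strict;
use warnings;

my %library = (
    'The Road' => {
        author => 'Cormac McCarthy',
        year   => 2006,
        genres => ['fiction', 'post-apocalyptic'],
    },
    'Thinking in Systems' => {
        author => 'Donella Meadows',
        year   => 2008,
        genres => ['non-fiction', 'systems'],
    },
);

# %titles_by_genre is a hypothetical name: genre => [titles].
my %titles_by_genre;
for my $title (sort keys %library) {
    # Inner loop walks the array that the genres field points at.
    for my $genre (@{ $library{$title}{genres} }) {
        push @{ $titles_by_genre{$genre} }, $title;
    }
}

for my $genre (sort keys %titles_by_genre) {
    print "$genre: ", join('; ', @{ $titles_by_genre{$genre} }), "\n";
}
```

Each title lands under every genre it lists, so a book with two genres appears twice in the inverted table.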
Checking what you have
Before acting on a nested slot it’s often useful to know whether the slot is actually there. Test each level with exists, outermost first, so that the check itself does not autovivify empty intermediate structures:
if (exists $library{$title} and exists $library{$title}{genres}) {
    push @{$library{$title}{genres}}, 'essential';
}
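Using $library{$title}{genres} in lvalue context (the left side of an assignment, or the argument of push) autovivifies every missing level. Reading it in rvalue context does not create the genres slot, but it still autovivifies the intermediate $library{$title} entry if that is missing, and so does a bare exists $library{$title}{genres}. Testing exists $library{$title} first, and short-circuiting with and, keeps the scan read-only.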
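A minimal sketch of the difference between a guarded and an unguarded push follows. It assumes a cut-down %library and a hypothetical title, 'Missing Book', that is deliberately absent from the hash.

```perl
use strict;
use warnings;

my %library = (
    'The Road' => { genres => ['fiction'] },
);

# 'Missing Book' is a hypothetical title that is not in the hash.
my $title = 'Missing Book';

# Guarded: the first exists fails, so the second test and the push never run.
if (exists $library{$title} and exists $library{$title}{genres}) {
    push @{ $library{$title}{genres} }, 'essential';
}
my $after_guarded = exists $library{$title} ? 'present' : 'absent';

# Unguarded: push needs an array to push onto, so Perl creates both
# the record hash and the genres array on the spot.
push @{ $library{$title}{genres} }, 'essential';
my $after_unguarded = exists $library{$title} ? 'present' : 'absent';

print "after guarded push:   $after_guarded\n";      # absent
print "after unguarded push: $after_unguarded\n";    # present
```

Whether the unguarded behaviour is a bug or a convenience depends on the program; the guard is for the cases where a phantom record would be wrong.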
To test the type of a reference without pulling values out of it,
use ref:
ref $library{'The Road'} # HASH
ref $library{'The Road'}{genres} # ARRAY
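One common use of ref is to branch on the kind of value found in a slot. The helper below, describe(), is a hypothetical function written for this sketch; it relies only on ref returning 'HASH', 'ARRAY', or the empty string for a non-reference.

```perl
use strict;
use warnings;

# describe() is a hypothetical helper, not part of the chapter's examples.
sub describe {
    my ($value) = @_;
    my $type = ref $value;
    return 'hash with keys: ' . join(', ', sort keys %$value) if $type eq 'HASH';
    return 'array of ' . scalar(@$value) . ' items'            if $type eq 'ARRAY';
    # ref returns '' for a non-reference; any other type is reported by name.
    return $type eq '' ? 'plain scalar' : "other reference ($type)";
}

my %library = (
    'The Road' => {
        author => 'Cormac McCarthy',
        year   => 2006,
        genres => ['fiction', 'post-apocalyptic'],
    },
);

print describe($library{'The Road'}), "\n";            # hash with keys: author, genres, year
print describe($library{'The Road'}{genres}), "\n";    # array of 2 items
print describe($library{'The Road'}{year}), "\n";      # plain scalar
```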
Where to go next
Anonymous references: the [...] and {...} constructors used throughout this chapter, and when a named variable is the better choice.
Subroutine references: the remaining reference flavour, needed for callback tables and dispatch.