---
name: hashes and mixed structures
---

# Hashes and mixed structures

A hash's values are scalars, so a value cannot literally **be** an array or another hash. It can, however, be a **reference** to one, and that is enough to build every record shape real programs need: hashes of arrays, arrays of hashes, hashes of hashes, and nested combinations.

This chapter walks the three most common shapes. The patterns compose: once you have a hash of arrays and an array of hashes, you have "rows of records with multi-valued fields" for free.

## Hash of arrays: grouping values under a key

The motivating example: you read lines of the form `city, country` and want to collect, for each country, the list of its cities.

```
Chicago, USA
Frankfurt, Germany
Berlin, Germany
Washington, USA
Helsinki, Finland
New York, USA
```

The natural shape is a hash whose keys are country names and whose values are references to arrays of cities:

```perl
my %cities_by_country;
while (<>) {
    chomp;
    my ($city, $country) = split /, /;
    push @{$cities_by_country{$country}}, $city;
}

for my $country (sort keys %cities_by_country) {
    my @cities = sort @{$cities_by_country{$country}};
    print "$country: ", join(', ', @cities), ".\n";
}
```

Two things to notice:

- `push @{$cities_by_country{$country}}, $city` works on the very first iteration for every new country. The hash entry does not exist yet, so Perl creates an anonymous array for it, stores a reference, and pushes the city in. That's autovivification again, and it is why you almost never need `$h{$k} = [] unless exists $h{$k}` as a separate line.
- The dereference `@{$cities_by_country{$country}}` is mechanical: "the array that this hash value points at." Wherever you'd have written `@cities` if it were a named array, you write the curly form instead.

The output is:

```
Finland: Helsinki.
Germany: Berlin, Frankfurt.
USA: Chicago, New York, Washington.
```

## Array of hashes: a table of records

When each item in a list carries several named fields, store each record as a hash reference and put the references in an array:

```perl
my @people = (
    { name => 'Ada',   born => 1815, field => 'computing' },
    { name => 'Hedy',  born => 1914, field => 'signals'   },
    { name => 'Grace', born => 1906, field => 'compilers' },
);
```

`$people[0]` is a hash reference; `$people[0]->{name}` or (with the arrow-between-subscripts shortcut) `$people[0]{name}` is `'Ada'`.

Typical operations:

```perl
# Sort by a field
my @by_year = sort { $a->{born} <=> $b->{born} } @people;

# Select rows
my @early = grep { $_->{born} < 1900 } @people;

# Project one field out
my @names = map { $_->{name} } @people;

# Mutate a record in place
$people[1]{field} = 'radio';
```

Inside [`sort`](../../p5/core/perlfunc/sort), [`grep`](../../p5/core/perlfunc/grep), and [`map`](../../p5/core/perlfunc/map) the record is a hash reference, so you reach its fields through the arrow. `$a` and `$b` inside a `sort` comparator are hash references here, not hashes.

## Hash of hashes: lookup tables keyed by structured values

When each record is keyed by a unique identifier, a hash of hashes is usually nicer than an array of hashes:

```perl
my %people = (
    ada   => { born => 1815, field => 'computing' },
    hedy  => { born => 1914, field => 'signals'   },
    grace => { born => 1906, field => 'compilers' },
);

print $people{ada}{born};           # 1815
$people{grace}{field} = 'COBOL';    # mutate one record
```

Iterate in a definite order by sorting the keys:

```perl
for my $key (sort keys %people) {
    my $rec = $people{$key};
    print "$key: born $rec->{born}, $rec->{field}\n";
}
```

Binding the inner reference to a named lexical (`$rec`) is a habit worth forming: it makes the code read top-to-bottom instead of repeating `$people{$key}{field}` five times.

## Mixed: records with multi-valued fields

Put it all together: each record is a hash, and one of its fields is a reference to an array.
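A record of this shape is often built up piece by piece from input rather than typed out. The following is a minimal sketch under stated assumptions: the `@rows` data, its `|`-separated layout, and the `%catalog` name are hypothetical and chosen only for illustration. It relies on autovivification to create both the inner hash and the `genres` array the first time a title is seen.

```perl
use strict;
use warnings;

# Hypothetical input: one "title|author|genre" triple per entry.
my @rows = (
    'The Road|Cormac McCarthy|fiction',
    'The Road|Cormac McCarthy|post-apocalyptic',
    'Thinking in Systems|Donella Meadows|non-fiction',
);

my %catalog;
for my $row (@rows) {
    my ($title, $author, $genre) = split /\|/, $row;

    # Assigning through two subscripts creates the inner hash on first use.
    $catalog{$title}{author} = $author;

    # push creates the genres array on first use, then appends to it.
    push @{ $catalog{$title}{genres} }, $genre;
}

for my $title (sort keys %catalog) {
    my $rec = $catalog{$title};
    print "$title by $rec->{author}: ", join(', ', @{ $rec->{genres} }), "\n";
}
```

When the records are known up front, the whole structure can instead be written as one literal: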
```perl
my %library = (
    'The Road' => {
        author => 'Cormac McCarthy',
        year   => 2006,
        genres => ['fiction', 'post-apocalyptic'],
    },
    'Thinking in Systems' => {
        author => 'Donella Meadows',
        year   => 2008,
        genres => ['non-fiction', 'systems'],
    },
);

for my $title (sort keys %library) {
    my $rec    = $library{$title};
    my $genres = join ', ', @{$rec->{genres}};
    print "$title ($rec->{year}, $rec->{author}): $genres\n";
}
```

`@{$rec->{genres}}` is the same pattern as before: reach into the hash, get the array reference, dereference it with `@{...}`.

## Checking what you have

Before acting on a nested slot it's often useful to know whether the slot is actually there. Use [`exists`](../../p5/core/perlfunc/exists) to test for the key itself ([`defined`](../../p5/core/perlfunc/defined) would also reject a key whose value is `undef`), and check each level from the outside in so that the test does not autovivify empty intermediate structures:

```perl
if (exists $library{$title} and exists $library{$title}{genres}) {
    push @{$library{$title}{genres}}, 'essential';
}
```

Using `$library{$title}{genres}` in lvalue context (the left side of an assignment, or the argument of [`push`](../../p5/core/perlfunc/push)) autovivifies every missing level. Reading it in rvalue context never creates the final `genres` slot, but it still creates the intermediate `$library{$title}` entry if that is missing, because Perl has to dereference that entry to look inside it. Checking `exists $library{$title}` first, as above, short-circuits before that happens and keeps the scan read-only.

To test the type of a reference without pulling values out of it, use [`ref`](../../p5/core/perlfunc/ref):

```perl
ref $library{'The Road'}            # HASH
ref $library{'The Road'}{genres}    # ARRAY
```

## Where to go next

- *Anonymous references*: the `[...]` / `{...}` constructors used throughout this chapter, and when a named variable is the better choice.
- *Subroutine references*: the remaining reference flavour, needed for callback tables and dispatch.
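As a closing sketch tied to the "Checking what you have" section, the snippet below shows one way to observe which accesses create entries. The titles `Dune`, `Emma`, and `Ulysses` are hypothetical keys used only for illustration, and the trimmed-down `%library` is an assumption rather than the chapter's full data.

```perl
use strict;
use warnings;

my %library = (
    'The Road' => { genres => ['fiction'] },
);

# Guarded check: the outer exists fails, so the nested lookup never runs.
my $guarded = (exists $library{'Dune'}) && (exists $library{'Dune'}{genres});
my $after_guarded = exists $library{'Dune'} ? 1 : 0;

# Unguarded nested check: Perl creates $library{'Emma'} in order to look inside it.
my $unguarded = exists $library{'Emma'}{genres};
my $after_unguarded = exists $library{'Emma'} ? 1 : 0;

# push on a missing slot creates both the inner hash and the array.
push @{ $library{'Ulysses'}{genres} }, 'modernist';

print "Dune present after guarded check: $after_guarded\n";      # 0
print "Emma present after unguarded check: $after_unguarded\n";  # 1
print "Ulysses genres is a ", ref($library{'Ulysses'}{genres}), " reference\n";  # ARRAY
```

The two-step guard costs one extra hash lookup, which is usually a fair price for not leaving empty records behind in a table that later code iterates over.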