# Hashes

A hash is an unordered collection of key-value pairs where keys are strings and values are scalars. The sigil is `%` for the whole hash and for key/value slices, `@` for value slices, and `$` for one value:

```perl
my %user = (
    name  => 'John',
    age   => 30,
    email => 'john@example.com',
);

%user                  # the whole hash (six elements: key1, val1, ...)
$user{name}            # one value — 'John'
@user{qw(name age)}    # value slice — ('John', 30)
%user{qw(name age)}    # key/value slice — (name => 'John', age => 30)
keys %user             # the list of keys
```

Hash keys are always coerced to strings. Storing under the integer key `42` and reading under the string key `"42"` retrieve the same slot — they are the same key.

## Initialisation: pairs, fat comma, and `%h = LIST`

A hash is initialised from a list of even length, in key/value order:

```perl
my %h = ('a', 1, 'b', 2);    # legal but ugly
my %h = (a => 1, b => 2);    # idiomatic
```

The fat comma `=>` is a comma that *also* auto-quotes a left-hand bareword that looks like an identifier. So `a => 1` is exactly `'a', 1`. Auto-quoting requires the bareword to be a simple identifier — `2.0 => 'x'` is parsed as the number `2`, not the string `"2.0"`:

```perl
my %h = (a => 1);          # ('a', 1) — auto-quoted
my %h = ('a' => 1);        # ('a', 1) — same
my %h = (2.0 => 'x');      # (2, 'x') — not ('2.0', 'x')!
my %h = ("2.0" => 'x');    # ('2.0', 'x') — explicit quote
```

If a key appears more than once in the initialiser list, the **last occurrence wins**.
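The key-coercion and duplicate-key rules can be checked in a few lines. This is a minimal sketch; the names `%seen`, `%version`, and `%colour` are made up for illustration and do not appear elsewhere in this document:

```perl
use strict;
use warnings;

# Keys are stringified: the integer 42 and the string "42" name the same slot.
my %seen;
$seen{42} = 'stored under the integer';
my $via_string = $seen{"42"};

# The numeric literal 2.0 stringifies to "2", so it is a different key from "2.0".
my %version = (2.0 => 'number', "2.0" => 'string');
my @version_keys = sort keys %version;

# Duplicate keys in the initialiser list: the last occurrence wins.
my %colour = (sky => 'grey', grass => 'green', sky => 'blue');

print "$via_string\n";     # stored under the integer
print "@version_keys\n";   # 2 2.0
print "$colour{sky}\n";    # blue
```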
The standard idiom for «merge with overrides» exploits that:

```perl
my %config = (%defaults, %overrides);
# %config has every default key, with %overrides values where they collide
```

## Access, with the four sigil shapes

Reading and writing one slot is `$`-sigil:

```perl
my $name = $user{name};    # read
$user{city} = 'NYC';       # write
$user{age}++;              # arithmetic on a hash value
delete $user{email};       # remove the slot entirely
```

The slice forms take the sigil of what you want back, and `keys` and `values` cover the whole hash:

```perl
@user{qw(name age city)}    # ('John', 31, 'NYC') — values
%user{qw(name age)}         # (name => 'John', age => 31) — pairs
keys %user                  # ('name', 'age', 'city') — keys, in some order
values %user                # ('John', 31, 'NYC') — values, in the matching order
```

See [subscript](../perlop/subscript.md) for the whole twelve-way matrix and the bareword-auto-quote rule inside `{}`.

## `exists` vs `defined` vs truthiness

Three distinct questions about a key:

```perl
exists  $h{key}    # is the slot present at all?
defined $h{key}    # is the slot present AND non-undef?
        $h{key}    # is the value true (non-empty, non-zero, non-"0")?
```

These differ when:

```perl
$h{a} = undef;
$h{b} = 0;
delete $h{c};

exists  $h{a}    # TRUE — slot exists, value is undef
defined $h{a}    # FALSE — value is undef
        $h{a}    # FALSE — undef is false

exists  $h{b}    # TRUE
defined $h{b}    # TRUE
        $h{b}    # FALSE — 0 is false

exists  $h{c}    # FALSE — never set, or deleted
defined $h{c}    # FALSE
        $h{c}    # FALSE
```

Pick by intent: `exists` for *presence*, `defined` for *value-not-undef*, truthiness for *meaningful value*. Reaching for `if ($h{key})` when `exists` was meant is a bug source — `0` and `""` are valid values that the test rejects.

## Iteration: `keys`, `values`, `each`, `while (each)`

```perl
for my $k (keys %user) {
    print "$k = $user{$k}\n";
}

for my $v (values %user) { ... }       # values only

while (my ($k, $v) = each %user) {     # keys + values, one pair per call
    print "$k = $v\n";
}
```

`each` carries an iterator state on the hash itself; calling [`keys`](../perlfunc/keys.md) (or [`values`](../perlfunc/values.md)) on the hash *resets* that iterator. Mixing the two produces hard-to-debug looping bugs:

```perl
while (my ($k, $v) = each %h) {
    if (some_condition($k)) {
        print "size: ", scalar keys %h, "\n";    # resets each() iterator!
        # next iteration of while() starts over from the top
    }
}
```

Fix: either accumulate the list with `keys` once at the top, or avoid `each` entirely. Most hash iteration is best written with `keys`:

```perl
for my $k (sort keys %h) {    # bonus: defined ordering
    ...
}
```

## Key ordering

Hash keys come out in an order that depends on insertion history and on hash randomisation, so there is no guaranteed order. Two runs of the same program with the same input may iterate keys in different orders. This is a security feature (it makes algorithmic-complexity attacks against the hash function harder); it is also a recurring source of brittle tests:

```perl
my %h = (a => 1, b => 2, c => 3);
print "$_ " for keys %h;    # output order is not guaranteed
```

If you need a stable order, sort:

```perl
print "$_ " for sort keys %h;                          # alphabetical
print "$_ " for sort { $h{$a} <=> $h{$b} } keys %h;    # by value
```

## Hash references

A hash, like an array, flattens when passed through a list. To pass a hash without flattening, take a **reference**:

```perl
my %h = (a => 1, b => 2);
my $href = \%h;                   # reference to the existing %h
my $anon = { x => 1, y => 2 };    # anonymous hash reference

$href->{a}      # access through the arrow — 1
${$href}{a}     # fully bracketed deref — 1
keys %$href     # deref then keys
%{$href}        # deref to flat key/value list
```

See [references](references.md) for the full picture.
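The difference between passing a hash and passing a reference to it shows up most clearly at a subroutine call. The sketch below assumes two hypothetical helpers, `count_args` and `add_default_port`, and a made-up `%server` hash; none of them come from this document:

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

# Hypothetical helper: takes a hash reference and fills in a missing port
# directly in the caller's hash, then returns the number of keys.
sub add_default_port {
    my ($conf) = @_;
    $conf->{port} //= 80;          # only set when missing or undef
    return scalar keys %$conf;
}

my %server = (host => 'localhost');

my $flattened = count_args(%server);           # hash flattens to key, value
my $by_ref    = count_args(\%server);          # one reference, one argument
my $key_count = add_default_port(\%server);    # modifies %server itself

print "$flattened $by_ref $key_count $server{port}\n";    # 2 1 2 80
```

Because `add_default_port` receives a reference, the new `port` slot is visible in `%server` after the call; had the hash been passed flattened, the subroutine would only have seen a copy of its keys and values.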
## Real example: counting and grouping

A frequency count is the textbook small example for hashes:

```perl
my @words = qw(apple banana apple cherry apple banana);
my %count;
$count{$_}++ for @words;
# %count = (apple => 3, banana => 2, cherry => 1)
```

Grouping is the next step up — building a hash whose values are arrayrefs:

```perl
my @people = (
    { name => 'Alice', dept => 'eng'   },
    { name => 'Bob',   dept => 'sales' },
    { name => 'Carol', dept => 'eng'   },
    { name => 'Dan',   dept => 'sales' },
);

my %by_dept;
push @{ $by_dept{$_->{dept}} }, $_->{name} for @people;
# %by_dept = (eng => ['Alice', 'Carol'], sales => ['Bob', 'Dan'])
```

The `push @{ $by_dept{$_->{dept}} }, ...` line autovivifies: when the `eng` key isn’t there yet, the hash slot is created and an anonymous array goes into it, then the value is pushed. See [references](references.md) for the autoviv story.

## Hash slice as multi-value get/set

```perl
my %config = (host => 'localhost', port => 80, debug => 0);

my ($h, $p) = @config{'host', 'port'};                     # ('localhost', 80)
@config{qw(host port debug)} = ('elsewhere', 8080, 1);     # set three at once
```

Combine with the `%h{...}` slice for «extract a record»:

```perl
my %fields = %config{qw(host port)};
# with the initial values of %config: %fields = (host => 'localhost', port => 80)
```

This is the canonical way to project a hash down to a subset of its keys. Without `%h{...}` you’d hand-roll it with a loop or with `map`.

## See also

- [Arrays](arrays.md) — when the index is an integer.
- [References](references.md) — autovivification, hash-of-arrays, hash-of-hashes; the `$h->{a}{b}` shape.
- [Subscript and slice operators](../perlop/subscript.md) — the bareword-auto-quote rule and the four slice shapes.
- [`exists`](../perlfunc/exists.md), [`defined`](../perlfunc/defined.md), [`delete`](../perlfunc/delete.md) — the presence/value/removal trio.
- [`keys`](../perlfunc/keys.md), [`values`](../perlfunc/values.md), [`each`](../perlfunc/each.md) — whole-hash iteration.
- [`sort`](../perlfunc/sort.md) — for ordered iteration.
- [`tie`](../perlfunc/tie.md) — make a hash backed by something other than the in-memory hash table (DBM file, lazy database, …).