# Data Types

Perl has three fundamental data types: scalars, arrays, and hashes. PetaPerl implements these with identical semantics to Perl 5.

## Scalars

Scalars are Perl's basic data type, holding a single value. The variable name begins with `$`.

```perl
my $number = 42;
my $string = "hello";
my $float = 3.14;
my $ref = \@array;
```

### Scalar Types

Internally, PetaPerl represents scalars as an enum with these variants:

| Type | Description | Example |
|------|-------------|---------|
| `Undef` | Undefined value | `undef` |
| `Iv` | Integer (i64) | `42`, `-17` |
| `Uv` | Unsigned integer (u64) | Large positive numbers |
| `Nv` | Float (f64) | `3.14`, `1.5e10` |
| `Pv` | Immutable string (`Arc<str>`) | Constants, literals |
| `PvBuf` | Mutable string (`Arc<String>`) | `.=` targets, `$_` |
| `Rv` | Reference | `\$x`, `\@arr`, `\%hash` |
| `Av` | Array | Created by `[]` |
| `Hv` | Hash | Created by `{}` |

**Dynamic typing**: A scalar can change type during execution.

```perl
my $x = 42;        # Integer
$x = "hello";      # Now a string
$x = 3.14;         # Now a float
```

### Scalar Context

Operations that expect a single value force scalar context:

```perl
my $count = @array;        # Array length
my $last = (1, 2, 3);      # Last element (3)
if (@array) { ... }        # True if non-empty
```

### Special Scalars

| Value | Description |
|-------|-------------|
| `undef` | Undefined value |
| `0` | Numeric zero, string "0" |
| `""` | Empty string |

**Truth values**: These are false in boolean context: `undef`, `0`, `"0"`, `""`. Everything else is true.

```perl
if ($value) { ... }            # False if undef, 0, "0", or ""
if (defined $value) { ... }    # False only if undef
```

## Arrays

Arrays are ordered lists of scalars. Variable names begin with `@`.

```perl
my @numbers = (1, 2, 3, 4, 5);
my @words = qw(foo bar baz);
my @empty = ();
```

### Array Access

```perl
my $first = $numbers[0];     # First element (index 0)
my $last = $numbers[-1];     # Last element
$numbers[5] = 6;             # Set element
```

**Note the sigil change**: Use `$` to access a single element because you're getting a scalar.

### Array Length

```perl
my $length = @array;           # Scalar context
my $length = scalar @array;    # Explicit scalar context
my $max_index = $#array;       # Highest index (length - 1)
```

### Array Operations

```perl
push @arr, $value;                          # Append
my $value = pop @arr;                       # Remove last
my $value = shift @arr;                     # Remove first
unshift @arr, $value;                       # Prepend
splice @arr, $offset, $len, @replacement;   # General removal/insertion
```

### Array Slices

Extract multiple elements at once:

```perl
my @subset = @arr[0, 2, 4];    # Elements 0, 2, 4
my @range = @arr[0..5];        # Elements 0 through 5
@arr[1, 3] = (10, 20);         # Assign to multiple elements
```

### List Context

Operations that expect multiple values force list context:

```perl
my @copy = @original;            # Array copy
my ($a, $b, $c) = (1, 2, 3);     # List assignment
my @results = function();        # Function returns list
```

### Array References

Create references to arrays:

```perl
my $aref = \@array;       # Reference to existing array
my $aref = [1, 2, 3];     # Anonymous array reference
```

Access via reference:

```perl
my $elem = $aref->[0];    # First element
my @copy = @$aref;        # Dereference to array
push @$aref, $value;      # Push via reference
```

## Hashes

Hashes are unordered key-value pairs. Variable names begin with `%`.

```perl
my %user = (
    name  => "John",
    age   => 30,
    email => "john@example.com",
);
```

### Hash Access

```perl
my $name = $user{name};    # Get value
$user{city} = "NYC";       # Set value
```

**Sigil change**: Use `$` for single element access (getting a scalar).

### Hash Operations

```perl
my @keys = keys %hash;                          # All keys
my @values = values %hash;                      # All values
while (my ($k, $v) = each %hash) { ... }        # Iterate
if (exists $hash{key}) { ... }                  # Check existence
my $val = delete $hash{key};                    # Remove and return
```

### Hash Slices

Extract multiple values at once:

```perl
my @vals = @hash{qw(name age)};    # Values for keys
@hash{qw(x y)} = (10, 20);         # Assign multiple
```

**Note**: Hash slices use `@` sigil because they return a list.

### Hash References

Create references to hashes:

```perl
my $href = \%hash;               # Reference to existing hash
my $href = { key => "val" };     # Anonymous hash reference
```

Access via reference:

```perl
my $val = $href->{key};      # Get value
$href->{new} = "value";      # Set value
my @keys = keys %$href;      # Dereference to hash
```

## References

References are scalars that point to other data.

### Creating References

```perl
my $scalar_ref = \$scalar;
my $array_ref = \@array;
my $hash_ref = \%hash;
my $code_ref = \&sub;

my $anon_array = [1, 2, 3];
my $anon_hash = { a => 1, b => 2 };
my $anon_sub = sub { ... };
```

### Dereferencing

```perl
my $value = $$scalar_ref;       # Scalar dereference
my @array = @$array_ref;        # Array dereference
my %hash = %$hash_ref;          # Hash dereference
my $result = &$code_ref();      # Code dereference
```

**Arrow notation** (preferred for clarity):

```perl
my $elem = $array_ref->[0];
my $val = $hash_ref->{key};
my $result = $code_ref->(@args);
```

### Reference Types

Check reference type with `ref`:

```perl
my $type = ref $ref;
# Returns: 'SCALAR', 'ARRAY', 'HASH', 'CODE', 'REF', or '' (not a ref)
```

### Nested Data Structures

References enable complex data structures:

```perl
my $data = {
    users => [
        { name => "John", age => 30 },
        { name => "Jane", age => 25 },
    ],
    config => {
        debug   => 1,
        timeout => 30,
    },
};

my $name = $data->{users}->[0]->{name};    # "John"
```

### Autovivification

Perl automatically creates intermediate references:

```perl
my %hash;
$hash{a}{b}{c} = 1;        # Creates nested hashes automatically
my $val = $hash{x}[0];     # Creates array ref at $hash{x}
```

## Type Conversions

Perl performs automatic type conversions based on context.

### String to Number

```perl
my $x = "42";
my $y = $x + 10;    # 52 (string → number)
```

### Number to String

```perl
my $x = 42;
my $s = "Value: $x";    # "Value: 42" (number → string)
```

### String Concatenation

```perl
my $result = 10 . 20;    # "1020" (both → string)
```

### Boolean Context

```perl
if ("0") { ... }      # False
if ("00") { ... }     # True
if (0) { ... }        # False
if (0.0) { ... }      # False
if ("") { ... }       # False
if (undef) { ... }    # False
```

## Typeglobs

Typeglobs are a special type that can hold entries for all variable types with the same name.

```perl
*name = \$scalar;    # Alias glob to scalar
*name = \&sub;       # Alias glob to subroutine
```

Primarily used for symbol table manipulation and importing.

## PetaPerl-Specific Implementation

### Memory Representation

PetaPerl uses efficient internal representations:

**Scalars**: Rust enum with variants for each type. Two string representations optimize for different access patterns.

```rust
pub enum Sv {
    Undef,
    Iv(i64),              // Integer
    Uv(u64),              // Unsigned
    Nv(f64),              // Float
    Pv(Arc<str>, u32),    // Immutable string + virtual length (O(1) chomp)
    PvBuf(Arc<String>),   // Mutable string (COW via Arc::make_mut)
    Rv(RvInner),          // Reference
    Av(Av),               // Array
    Hv(Hv),               // Hash
    // ... additional types
}
```

**Dual string representation**: `Pv` is used for constants and literals. Its `u32` virtual length enables O(1) chomp without modifying the shared string. `PvBuf` is used for mutable strings (the default for `new_string()`). `Arc::make_mut()` gives it copy-on-write semantics with zero allocation when the string is unshared.

**Arrays**: Dynamic vectors with efficient push/pop operations.

**Hashes**: Optimized hash tables with fast key lookup.

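
The dual string representation described above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not PetaPerl's actual code: the `Str` enum, its `as_str`, `chomp`, and `append` methods, and the promotion of `Pv` to `PvBuf` on append are all hypothetical and exist only for this example.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for the two string variants of `Sv`.
#[derive(Clone, Debug)]
enum Str {
    // Shared immutable text plus a "virtual length" in bytes.
    Pv(Arc<str>, u32),
    // Shared mutable buffer, copied lazily on write.
    PvBuf(Arc<String>),
}

impl Str {
    // View the visible part of the string.
    fn as_str(&self) -> &str {
        match self {
            Str::Pv(s, len) => {
                let full: &str = s;
                &full[..*len as usize]
            }
            Str::PvBuf(s) => s.as_str(),
        }
    }

    // Remove one trailing newline, if present.
    fn chomp(&mut self) {
        match self {
            // O(1): shrink the virtual length; the shared bytes are untouched.
            Str::Pv(s, len) => {
                let n = *len as usize;
                if s.as_bytes()[..n].last() == Some(&b'\n') {
                    *len -= 1;
                }
            }
            // Copy-on-write: make_mut clones only if the Arc is shared.
            Str::PvBuf(s) => {
                if s.ends_with('\n') {
                    Arc::make_mut(s).pop();
                }
            }
        }
    }

    // Append text. Assumption for this sketch: an immutable `Pv`
    // is first promoted to a `PvBuf` holding a copy of its visible part.
    fn append(&mut self, tail: &str) {
        if matches!(*self, Str::Pv(..)) {
            let owned = self.as_str().to_owned();
            *self = Str::PvBuf(Arc::new(owned));
        }
        if let Str::PvBuf(s) = self {
            Arc::make_mut(s).push_str(tail);
        }
    }
}
```

In this sketch, chomping a cloned `Pv` leaves every other holder of the same `Arc<str>` seeing the original text, because only the per-value length changes. Appending to a `PvBuf` that has been cloned triggers one copy in `Arc::make_mut`, while appending to an unshared one mutates in place.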
### Shared Ownership

String scalars use `Arc` (atomic reference counting):

- Cheap cloning (just increment counter)
- Thread-safe sharing for parallel execution
- Common strings can share storage

### Aliasing Support

PetaPerl implements `@_` aliasing correctly:

- Arguments are aliases to caller's variables
- Modifications write through to original
- Uses `SvCell` for mutable indirection

```perl
sub modify {
    $_[0] = "changed";    # Modifies caller's variable
}

my $x = "original";
modify($x);
print $x;    # "changed"
```

### Performance Characteristics

| Operation | Complexity | Notes |
|-----------|-----------|-------|
| Scalar assignment | O(1) | String uses Arc (no copy) |
| Array push/pop | O(1) amortized | Dynamic growth |
| Array shift/unshift | O(n) | Must move elements |
| Hash access | O(1) average | Optimized hash function |
| Hash insert | O(1) average | With growth |

### Parallel Execution

PetaPerl's parallelization model:

- Each thread gets its own lexical pad (no shared mutation)
- Array and hash operations are thread-safe
- `Arc` strings share across threads safely
- Loop-level parallelism doesn't require global locks

## Context Sensitivity

Perl's context system determines how expressions evaluate.

### Scalar Context

Forces single-value evaluation:

```perl
my $count = @array;         # Length
my $last = (1, 2, 3);       # Last element
my $concat = (1, 2, 3);     # 3
```

### List Context

Forces multiple-value evaluation:

```perl
my @copy = @array;            # All elements
my @results = func();         # All return values
my ($a, $b) = (1, 2, 3);      # First two elements
```

### Void Context

Result is discarded:

```perl
func();            # Return value ignored
print "hello";     # No assignment
```

### Context Propagation

```perl
my @arr = (1, 2, 3);

# Scalar context
my $x = keys @arr;       # keys in scalar context → count

# List context
my @k = keys @arr;       # keys in list context → all keys

# Function argument context
func(@arr);              # List context
func(scalar @arr);       # Scalar context (explicit)
```

## Constants

Constants are immutable values.

### Literal Constants

```perl
42           # Integer
3.14         # Float
"string"     # String
qw(a b c)    # List of strings
```

### Named Constants

```perl
use constant PI => 3.14159;
use constant MAX => 100;

use constant {
    RED   => 0xFF0000,
    GREEN => 0x00FF00,
    BLUE  => 0x0000FF,
};
```

### Compile-Time Constant Folding

PetaPerl performs constant folding at compile time:

```perl
my $x = 2 + 3;                     # Folded to 5
my $len = length("hello");         # Folded to 5
my $sub = substr("text", 0, 2);    # Folded to "te"
```

This optimization eliminates runtime computation for constant expressions.

## Special Types

### Code References

Subroutines can be referenced and called dynamically:

```perl
my $coderef = sub { return $_[0] * 2 };
my $result = $coderef->(21);    # 42

my $coderef = \&existing_sub;
$coderef->(@args);
```

### Filehandles

Filehandles are special scalars:

```perl
open my $fh, '<', $file or die $!;
my $line = <$fh>;
close $fh;
```

### Globs

Typeglobs reference symbol table entries:

```perl
*alias = *original;     # Alias all types
*func = sub { ... };    # Install subroutine
```

## See Also

- [perlop](perlop.md) - Operators that work with these types
- [perlfunc](perlfunc.md) - Functions for manipulating data