Data Types
Perl has three fundamental data types: scalars, arrays, and hashes. PetaPerl implements them with the same semantics as Perl 5.
Scalars
Scalars are Perl’s basic data type, holding a single value. The variable name begins with $.
my $number = 42;
my $string = "hello";
my $float = 3.14;
my $ref = \@array;
Scalar Types
Internally, PetaPerl represents scalars as an enum with these variants:
| Type | Description | Example |
|---|---|---|
| Undef | Undefined value | undef |
| Iv | Integer (i64) | 42, -17 |
| Uv | Unsigned integer (u64) | Large positive numbers |
| Nv | Float (f64) | 3.14, 1.5e10 |
| Pv | Immutable string (Arc<str>) | Constants, literals |
| PvBuf | Mutable string (Arc<String>) | .= targets, $_ |
| Rv | Reference | \$x, \@arr, \%hash |
| Av | Array | Created by [] |
| Hv | Hash | Created by {} |
Dynamic typing: A scalar can change type during execution.
my $x = 42; # Integer
$x = "hello"; # Now a string
$x = 3.14; # Now a float
Scalar Context
Operations that expect a single value force scalar context:
my $count = @array; # Array length
my $last = (1, 2, 3); # Comma operator yields the last value (3)
if (@array) { ... } # True if non-empty
Special Scalars
| Value | Description |
|---|---|
| undef | Undefined value |
| 0 | Numeric zero, string "0" |
| "" | Empty string |
Truth values: These are false in boolean context: undef, numeric zero (0, 0.0), the string "0", and the empty string "". Everything else is true, including "00", "0.0", and " ".
if ($value) { ... } # False if undef, 0, "0", or ""
if (defined $value) { ... } # False only if undef
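The truth rule fits in a few lines of Rust. This is a minimal sketch over a simplified four-variant scalar type. The type is an assumption made for illustration and is not PetaPerl's actual Sv. The `is_true` function is also hypothetical.

```rust
// Simplified scalar model (assumption: PetaPerl's real Sv has more variants).
#[derive(Debug)]
enum Scalar {
    Undef,
    Iv(i64),
    Nv(f64),
    Pv(String),
}

/// Perl's boolean rule: undef, numeric zero, "" and "0" are false; all else is true.
fn is_true(sv: &Scalar) -> bool {
    match sv {
        Scalar::Undef => false,
        Scalar::Iv(i) => *i != 0,
        // 0.0 and -0.0 both compare equal to 0.0, so both are false.
        Scalar::Nv(n) => *n != 0.0,
        // Only the exact strings "" and "0" are false; "00" and "0.0" are true.
        Scalar::Pv(s) => !(s.is_empty() || s.as_str() == "0"),
    }
}

fn main() {
    let samples = [
        Scalar::Undef,
        Scalar::Iv(0),
        Scalar::Nv(0.0),
        Scalar::Pv(String::new()),
        Scalar::Pv("0".to_string()),
        Scalar::Pv("00".to_string()),
        Scalar::Pv("0.0".to_string()),
        Scalar::Iv(-1),
    ];
    for sv in &samples {
        println!("{:?} => {}", sv, is_true(sv));
    }
}
```

The string cases are checked before any numeric conversion. This is why "0.0" is true as a string, while the number 0.0 is false.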
Arrays
Arrays are ordered lists of scalars. Variable names begin with @.
my @numbers = (1, 2, 3, 4, 5);
my @words = qw(foo bar baz);
my @empty = ();
Array Access
my $first = $numbers[0]; # First element (index 0)
my $last = $numbers[-1]; # Last element
$numbers[5] = 6; # Set element (index 5 is new, so the array grows)
Note the sigil change: Use $ to access a single element because you’re getting a scalar.
Array Length
my $length = @array; # Scalar context
my $length = scalar @array; # Explicit scalar context
my $max_index = $#array; # Highest index (length - 1)
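Negative indices and `$#array` come down to simple arithmetic on the array length. Below is a minimal Rust sketch. `perl_index` and `last_index` are hypothetical helper names, and `None` stands in for the undef that Perl returns for an out-of-range element.

```rust
/// Map a Perl-style index (negative counts from the end) to a Vec position.
/// Returns None when the index falls outside the array.
fn perl_index(len: usize, idx: i64) -> Option<usize> {
    let pos = if idx < 0 { len as i64 + idx } else { idx };
    if pos >= 0 && (pos as usize) < len {
        Some(pos as usize)
    } else {
        None
    }
}

/// Equivalent of $#array: the highest index, which is -1 for an empty array.
fn last_index(len: usize) -> i64 {
    len as i64 - 1
}

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];
    let first = perl_index(numbers.len(), 0).map(|i| numbers[i]); // $numbers[0]
    let last = perl_index(numbers.len(), -1).map(|i| numbers[i]); // $numbers[-1]
    println!("first={:?} last={:?} $#numbers={}", first, last, last_index(numbers.len()));
}
```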
Array Operations
push @arr, $value; # Append
my $value = pop @arr; # Remove last
my $value = shift @arr; # Remove first
unshift @arr, $value; # Prepend
splice @arr, $offset, $len, @replacement; # General removal/insertion
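Each array operation above corresponds roughly to a method on Rust's `Vec`. This sketch assumes a plain `Vec<i64>` as the backing store. `shift` and `splice` are hypothetical helpers, and the `splice` helper handles only non-negative offsets and lengths.

```rust
/// shift @arr: remove and return the first element (None stands in for undef).
/// Vec::remove(0) moves every remaining element, so this is O(n).
fn shift(arr: &mut Vec<i64>) -> Option<i64> {
    if arr.is_empty() {
        None
    } else {
        Some(arr.remove(0))
    }
}

/// splice @arr, $offset, $len, @replacement (non-negative offset and length only).
/// Returns the removed elements; an offset past the end is clamped to the end.
fn splice(arr: &mut Vec<i64>, offset: usize, len: usize, replacement: &[i64]) -> Vec<i64> {
    let start = offset.min(arr.len());
    let end = offset.saturating_add(len).min(arr.len());
    arr.splice(start..end, replacement.iter().copied()).collect()
}

fn main() {
    let mut arr: Vec<i64> = vec![1, 2, 3];
    arr.push(4); // push @arr, 4;
    let popped = arr.pop(); // my $value = pop @arr;
    let shifted = shift(&mut arr); // my $value = shift @arr;
    arr.insert(0, 0); // unshift @arr, 0;
    let removed = splice(&mut arr, 1, 1, &[10, 11]); // splice @arr, 1, 1, 10, 11;
    println!("{:?} {:?} {:?} {:?}", arr, popped, shifted, removed);
}
```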
Array Slices
Extract multiple elements at once:
my @subset = @arr[0, 2, 4]; # Elements 0, 2, 4
my @range = @arr[0..5]; # Elements 0 through 5
@arr[1, 3] = (10, 20); # Assign to multiple elements
List Context
Operations that expect multiple values force list context:
my @copy = @original; # Array copy
my ($x, $y, $z) = (1, 2, 3); # List assignment
my @results = function(); # Function returns list
Array References
Create references to arrays:
my $aref = \@array; # Reference to existing array
my $aref = [1, 2, 3]; # Anonymous array reference
Access via reference:
my $elem = $aref->[0]; # First element
my @copy = @$aref; # Dereference to array
push @$aref, $value; # Push via reference
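A reference shares the underlying array, but dereferencing into a new array (`my @copy = @$aref`) copies it. The sketch below models this with `Rc<RefCell<Vec<i64>>>`. That type is an assumed single-threaded stand-in, not PetaPerl's own reference type. `push_via_ref` and `copy_of` are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical model: the array lives in a shared, mutable cell, and a
/// reference is simply another handle to that same cell.
type ArrayRef = Rc<RefCell<Vec<i64>>>;

/// push @$aref, $value;
fn push_via_ref(aref: &ArrayRef, value: i64) {
    aref.borrow_mut().push(value);
}

/// my @copy = @$aref;  (dereferencing in list context copies the elements)
fn copy_of(aref: &ArrayRef) -> Vec<i64> {
    aref.borrow().clone()
}

fn main() {
    let array: ArrayRef = Rc::new(RefCell::new(vec![1, 2, 3])); // my @array = (1, 2, 3);
    let aref = Rc::clone(&array); // my $aref = \@array;
    push_via_ref(&aref, 4); // visible through @array as well
    let first = aref.borrow()[0]; // $aref->[0]
    let mut copy = copy_of(&aref);
    copy.push(99); // does not affect @array
    println!("array={:?} first={} copy={:?}", array.borrow(), first, copy);
}
```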
Hashes
Hashes are unordered collections of key-value pairs. Variable names begin with %.
my %user = (
name => "John",
age => 30,
email => "john@example.com",
);
Hash Access
my $name = $user{name}; # Get value
$user{city} = "NYC"; # Set value
Sigil change: Use $ for single element access (getting a scalar).
Hash Operations
my @keys = keys %hash; # All keys
my @values = values %hash; # All values
while (my ($k, $v) = each %hash) { ... } # Iterate
if (exists $hash{key}) { ... } # Check existence
my $val = delete $hash{key}; # Remove and return
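`exists` and `defined` answer different questions: a key can exist while its value is undef. Below is a minimal Rust sketch. It assumes a `HashMap<String, Option<String>>` in which `None` plays the role of undef. `exists`, `defined`, and `delete` are hypothetical helpers named after the Perl builtins.

```rust
use std::collections::HashMap;

/// Values are Option<String>; None stands in for undef.
type PerlHash = HashMap<String, Option<String>>;

/// exists $hash{key}: true if the key is present, even when its value is undef.
fn exists(h: &PerlHash, key: &str) -> bool {
    h.contains_key(key)
}

/// defined $hash{key}: true only if the key is present and its value is not undef.
fn defined(h: &PerlHash, key: &str) -> bool {
    matches!(h.get(key), Some(Some(_)))
}

/// delete $hash{key}: remove the entry and return its value (undef if absent).
fn delete(h: &mut PerlHash, key: &str) -> Option<String> {
    h.remove(key).flatten()
}

fn main() {
    let mut user: PerlHash = HashMap::new();
    user.insert("name".to_string(), Some("John".to_string())); // $user{name} = "John";
    user.insert("nickname".to_string(), None); // $user{nickname} = undef;
    println!("exists={} defined={}", exists(&user, "nickname"), defined(&user, "nickname"));
    println!("deleted={:?}", delete(&mut user, "name"));
}
```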
Hash Slices
Extract multiple values at once:
my @vals = @hash{qw(name age)}; # Values for keys
@hash{qw(x y)} = (10, 20); # Assign multiple
Note: Hash slices use @ sigil because they return a list.
Hash References
Create references to hashes:
my $href = \%hash; # Reference to existing hash
my $href = { key => "val" }; # Anonymous hash reference
Access via reference:
my $val = $href->{key}; # Get value
$href->{new} = "value"; # Set value
my @keys = keys %$href; # Dereference to hash
References
References are scalars that point to other data.
Creating References
my $scalar_ref = \$scalar;
my $array_ref = \@array;
my $hash_ref = \%hash;
my $code_ref = \&sub;
my $anon_array = [1, 2, 3];
my $anon_hash = { a => 1, b => 2 };
my $anon_sub = sub { ... };
Dereferencing
my $value = $$scalar_ref; # Scalar dereference
my @array = @$array_ref; # Array dereference
my %hash = %$hash_ref; # Hash dereference
my $result = &$code_ref(); # Code dereference
Arrow notation (preferred for clarity):
my $elem = $array_ref->[0];
my $val = $hash_ref->{key};
my $result = $code_ref->(@args);
Reference Types
Check reference type with ref:
my $type = ref $ref;
# Returns 'SCALAR', 'ARRAY', 'HASH', 'CODE', 'REF', 'GLOB', etc.;
# the class name for a blessed reference; '' (empty string) for a non-reference
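`ref` works by inspecting what a reference points at. This Rust sketch uses hypothetical `Value` and `Referent` types, not PetaPerl's `RvInner`. It ignores blessed references, which would return their class name instead.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

#[allow(dead_code)]
#[derive(Clone)]
enum Value {
    Undef,
    Int(i64),
    Str(String),
    Ref(Rc<RefCell<Referent>>),
}

/// What a reference can point at (simplified).
#[allow(dead_code)]
enum Referent {
    Scalar(Value),
    Array(Vec<Value>),
    Hash(HashMap<String, Value>),
    Code(fn(&[Value]) -> Value),
}

fn new_ref(target: Referent) -> Value {
    Value::Ref(Rc::new(RefCell::new(target)))
}

/// Rough equivalent of Perl's `ref` for unblessed references.
fn perl_ref(v: &Value) -> &'static str {
    match v {
        Value::Ref(cell) => match &*cell.borrow() {
            Referent::Scalar(Value::Ref(_)) => "REF", // reference to a reference
            Referent::Scalar(_) => "SCALAR",
            Referent::Array(_) => "ARRAY",
            Referent::Hash(_) => "HASH",
            Referent::Code(_) => "CODE",
        },
        _ => "", // not a reference
    }
}

fn noop(_args: &[Value]) -> Value {
    Value::Undef
}

fn main() {
    let array_ref = new_ref(Referent::Array(vec![Value::Int(1)]));
    let ref_ref = new_ref(Referent::Scalar(array_ref.clone()));
    let code_ref = new_ref(Referent::Code(noop));
    let not_a_ref = Value::Str("x".to_string());
    for v in [&array_ref, &ref_ref, &code_ref, &not_a_ref] {
        println!("{:?}", perl_ref(v));
    }
}
```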
Nested Data Structures
References enable complex data structures:
my $data = {
users => [
{ name => "John", age => 30 },
{ name => "Jane", age => 25 },
],
config => {
debug => 1,
timeout => 30,
},
};
my $name = $data->{users}->[0]->{name}; # "John"
Autovivification
Perl automatically creates missing intermediate hashes and arrays when an undefined value is dereferenced, even when a nested element is only read:
my %hash;
$hash{a}{b}{c} = 1; # Creates nested hashes automatically
my $val = $hash{x}[0]; # Creates array ref at $hash{x}
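Autovivification on assignment amounts to walking down nested hashes and creating each missing level along the way. Below is a minimal Rust sketch using the `HashMap` entry API. `Node` and `store_path` are hypothetical, and only hash levels are modelled. The read-side case (`$hash{x}[0]`) is not covered.

```rust
use std::collections::HashMap;

/// Simplified nested Perl data: a leaf number or a hash of further nodes.
#[derive(Debug, PartialEq)]
enum Node {
    Leaf(i64),
    Hash(HashMap<String, Node>),
}

/// Rough equivalent of `$hash{k1}{k2}...{kn} = value`: every missing
/// intermediate level is created as an empty hash on the way down.
fn store_path(root: &mut HashMap<String, Node>, path: &[&str], value: i64) {
    let (last, parents) = path.split_last().expect("path must not be empty");
    let mut current = root;
    for key in parents {
        let slot = current
            .entry(key.to_string())
            .or_insert_with(|| Node::Hash(HashMap::new()));
        current = match slot {
            Node::Hash(inner) => inner,
            // Perl under `use strict` would die here too.
            Node::Leaf(_) => panic!("{key} holds a plain value, not a hash reference"),
        };
    }
    current.insert(last.to_string(), Node::Leaf(value));
}

fn main() {
    let mut hash: HashMap<String, Node> = HashMap::new();
    store_path(&mut hash, &["a", "b", "c"], 1); // $hash{a}{b}{c} = 1;
    println!("{:?}", hash);
}
```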
Type Conversions
Perl performs automatic type conversions based on context.
String to Number
my $x = "42";
my $y = $x + 10; # 52 (string → number)
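String-to-number conversion uses the longest leading part of the string that looks like a number. So "42abc" becomes 42 and "abc" becomes 0, and Perl warns about both under `use warnings`. The Rust function below is a simplified sketch of that rule, not PetaPerl's converter. It ignores "Inf" and "NaN" strings, and a string such as "0x1A" numifies to 0, as it does in Perl.

```rust
/// Simplified Perl-style numification (a sketch): skip leading whitespace,
/// take the longest prefix shaped like a decimal number (sign, digits,
/// optional fraction, optional exponent), and ignore the rest.
fn numify(s: &str) -> f64 {
    let t = s.trim_start();
    let b = t.as_bytes();
    let mut i = 0;

    // Optional sign.
    if i < b.len() && (b[i] == b'+' || b[i] == b'-') {
        i += 1;
    }

    // Integer digits.
    let int_start = i;
    while i < b.len() && b[i].is_ascii_digit() {
        i += 1;
    }
    let int_digits = i - int_start;

    // Optional fractional part.
    let mut frac_digits = 0;
    if i < b.len() && b[i] == b'.' {
        let mut k = i + 1;
        while k < b.len() && b[k].is_ascii_digit() {
            k += 1;
        }
        frac_digits = k - (i + 1);
        if int_digits + frac_digits > 0 {
            i = k;
        }
    }

    // No digits at all: the string numifies to 0.
    if int_digits + frac_digits == 0 {
        return 0.0;
    }

    // Optional exponent, accepted only if at least one exponent digit follows.
    if i < b.len() && (b[i] == b'e' || b[i] == b'E') {
        let mut k = i + 1;
        if k < b.len() && (b[k] == b'+' || b[k] == b'-') {
            k += 1;
        }
        let exp_start = k;
        while k < b.len() && b[k].is_ascii_digit() {
            k += 1;
        }
        if k > exp_start {
            i = k;
        }
    }

    // The prefix is plain ASCII, so slicing at byte index i is safe.
    t[..i].parse::<f64>().unwrap_or(0.0)
}

fn main() {
    let x = "42";
    let y = numify(x) + 10.0; // my $y = $x + 10;
    println!("{y}");
    for s in ["42abc", "  3.5x", "1e3", "abc", "0x1A"] {
        println!("{s:?} -> {}", numify(s));
    }
}
```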
Number to String
my $x = 42;
my $s = "Value: $x"; # "Value: 42" (number → string)
String Concatenation
my $result = 10 . 20; # "1020" (both → string)
Boolean Context
if ("0") { ... } # False
if ("00") { ... } # True
if (0) { ... } # False
if (0.0) { ... } # False
if ("") { ... } # False
if (undef) { ... } # False
Typeglobs
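Typeglobs (*name) are symbol-table entries holding one slot for each variable type that shares a name: scalar, array, hash, subroutine, filehandle, and format.
*name = \$scalar; # $name becomes an alias for $scalar (only the scalar slot changes)
*name = \&sub; # &name becomes an alias for &sub (only the code slot changes)
Typeglobs are primarily used for symbol table manipulation and importing.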
PetaPerl-Specific Implementation
Memory Representation
PetaPerl uses efficient internal representations:
Scalars: Rust enum with variants for each type. Two string representations optimize for different access patterns.
use std::sync::Arc;

pub enum Sv {
    Undef,
    Iv(i64),            // Integer
    Uv(u64),            // Unsigned
    Nv(f64),            // Float
    Pv(Arc<str>, u32),  // Immutable string + virtual length (O(1) chomp)
    PvBuf(Arc<String>), // Mutable string (COW via Arc::make_mut)
    Rv(RvInner),        // Reference
    Av(Av),             // Array
    Hv(Hv),             // Hash
    // ... additional types
}
Dual string representation: Pv is used for constants and literals. Its u32 virtual length lets chomp run in O(1) by shortening the visible length instead of modifying the shared string. PvBuf is used for mutable strings and is the default for new_string(). Arc::make_mut() gives it copy-on-write semantics: when the string is unshared, it is mutated in place without allocating a copy.
Arrays: Dynamic vectors with efficient push/pop operations.
Hashes: Optimized hash tables with fast key lookup.
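The two string strategies can be sketched with standard `Arc` calls. In this hypothetical Rust model, `Pv` pairs a shared `Arc<str>` with a visible length, so `chomp` only shrinks the length. `PvBuf` wraps `Arc<String>` and uses `Arc::make_mut`, which clones the `String` only when another handle still shares it. The field names and methods are assumptions made for illustration.

```rust
use std::sync::Arc;

/// Hypothetical immutable string with a virtual length (after Pv(Arc<str>, u32)).
#[derive(Clone)]
struct Pv {
    data: Arc<str>,
    len: u32,
}

impl Pv {
    fn new(s: &str) -> Self {
        Pv { data: Arc::from(s), len: s.len() as u32 }
    }

    fn as_str(&self) -> &str {
        &self.data[..self.len as usize]
    }

    /// chomp: drop one trailing "\n" by shrinking the visible length.
    /// The shared Arc<str> buffer is never touched, so this is O(1).
    fn chomp(&mut self) -> usize {
        if self.as_str().ends_with('\n') {
            self.len -= 1;
            1
        } else {
            0
        }
    }
}

/// Hypothetical mutable string buffer with copy-on-write (after PvBuf(Arc<String>)).
#[derive(Clone)]
struct PvBuf(Arc<String>);

impl PvBuf {
    /// $s .= suffix: Arc::make_mut clones the String only if the Arc is shared.
    fn append(&mut self, suffix: &str) {
        Arc::make_mut(&mut self.0).push_str(suffix);
    }
}

fn main() {
    let literal = Pv::new("line\n");
    let mut copy = literal.clone(); // shares the same Arc<str>
    copy.chomp();
    println!("{:?} {:?} shared={}", literal.as_str(), copy.as_str(),
             Arc::ptr_eq(&literal.data, &copy.data));

    let a = PvBuf(Arc::new(String::from("hello")));
    let mut b = a.clone(); // both handles point at one String
    b.append(", world"); // shared, so make_mut clones before writing
    println!("{} / {}", a.0, b.0);
}
```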
Shared Ownership
String scalars use atomic reference counting (Arc<str> for Pv, Arc<String> for PvBuf):
- Cheap cloning (only the reference count is incremented)
- Thread-safe sharing for parallel execution
- Common strings can share storage
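You can observe the cost model of `Arc` sharing directly with `Arc::strong_count` and `Arc::ptr_eq`. Below is a minimal sketch. `lengths_in_threads` is a hypothetical helper showing that each thread receives its own handle to the same bytes.

```rust
use std::sync::Arc;
use std::thread;

/// Hand the same string to several threads. Each thread gets its own Arc
/// handle; only the reference count changes, never the text itself.
fn lengths_in_threads(s: &Arc<str>, n: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let local = Arc::clone(s); // atomic increment, no copy of the bytes
            thread::spawn(move || local.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let text: Arc<str> = Arc::from("shared across threads");
    let copy = Arc::clone(&text);
    println!("strong_count = {}", Arc::strong_count(&text)); // text + copy
    println!("same buffer = {}", Arc::ptr_eq(&text, &copy));
    println!("{:?}", lengths_in_threads(&text, 3));
}
```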
Aliasing Support
PetaPerl implements @_ aliasing correctly:
- Arguments are aliases to caller’s variables
- Modifications write through to original
- Uses SvCell for mutable indirection
sub modify {
$_[0] = "changed"; # Modifies caller's variable
}
my $x = "original";
modify($x);
print $x; # "changed"
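You can model the write-through behavior by passing handles to the caller's storage instead of copies. In this sketch, `ScalarCell = Rc<RefCell<String>>` is an assumed single-threaded stand-in for PetaPerl's SvCell, whose real definition is not shown here.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Assumed stand-in for a shared, mutable scalar slot.
type ScalarCell = Rc<RefCell<String>>;

/// sub modify { $_[0] = "changed" }
/// The argument slice holds handles to the caller's cells, not copies of their values.
fn modify(args: &[ScalarCell]) {
    *args[0].borrow_mut() = "changed".to_string();
}

fn main() {
    let x: ScalarCell = Rc::new(RefCell::new("original".to_string())); // my $x = "original";
    modify(&[Rc::clone(&x)]); // modify($x);
    println!("{}", x.borrow()); // print $x;
}
```

Copying the value out first, as `my ($arg) = @_;` does in Perl, corresponds to cloning the `String` and breaks the alias.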
Performance Characteristics
| Operation | Complexity | Notes |
|---|---|---|
| Scalar assignment | O(1) | String uses Arc (no copy) |
| Array push/pop | O(1) amortized | Dynamic growth |
| Array shift/unshift | O(n) | Must move elements |
| Hash access | O(1) average | Optimized hash function |
| Hash insert | O(1) average | With growth |
Parallel Execution
PetaPerl’s parallelization model:
- Each thread gets its own lexical pad (no shared mutation)
- Array and hash operations are thread-safe
- Arc<str> strings are shared safely across threads
- Loop-level parallelism doesn’t require global locks
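The per-thread-pad idea can be sketched for a simple summing loop. Each thread keeps a private accumulator, and the partial results are combined after `join`, so no lock is involved. `parallel_sum_of_squares` is a hypothetical example, and how PetaPerl actually splits loops is not shown here.

```rust
use std::thread;

/// Hypothetical model of `for my $i (0 .. $n - 1) { $sum += $i * $i }` run in parallel:
/// the range is split into chunks, each thread keeps its own accumulator
/// (its private pad), and the partial sums are combined after join.
fn parallel_sum_of_squares(n: u64, threads: u64) -> u64 {
    assert!(threads > 0, "need at least one thread");
    let chunk = (n + threads - 1) / threads; // ceiling division
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            thread::spawn(move || {
                let start = (t * chunk).min(n);
                let end = ((t + 1) * chunk).min(n);
                let mut local_sum = 0u64; // thread-private, never shared
                for i in start..end {
                    local_sum += i * i;
                }
                local_sum
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", parallel_sum_of_squares(1000, 4));
}
```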
Context Sensitivity
Perl’s context system determines how expressions evaluate.
Scalar Context
Forces single-value evaluation:
my $count = @array; # Length
my $last = (1, 2, 3); # Comma operator yields the last value (3)
List Context
Forces multiple-value evaluation:
my @copy = @array; # All elements
my @results = func(); # All return values
my ($x, $y) = (1, 2, 3); # $x = 1, $y = 2; the 3 is discarded
Void Context
Result is discarded:
func(); # Return value ignored
print "hello"; # print's return value is discarded
Context Propagation
my @arr = (1, 2, 3);
# Scalar context
my $x = keys @arr; # keys in scalar context → count
# List context
my @k = keys @arr; # keys in list context → all indices (0, 1, 2)
# Function argument context
func(@arr); # List context
func(scalar @arr); # Scalar context (explicit)
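Context can be modeled as an extra argument that the operation inspects, much as a Perl sub inspects `wantarray`. Below is a minimal Rust sketch. `Context`, `Value`, and `keys_of_array` are hypothetical names, and void context is omitted.

```rust
/// Calling context (void omitted for brevity).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Context {
    Scalar,
    List,
}

/// An operation's result: a single value or a list.
#[derive(Debug, PartialEq)]
enum Value {
    Scalar(usize),
    List(Vec<usize>),
}

/// `keys @arr`: the indices in list context, their count in scalar context.
fn keys_of_array<T>(arr: &[T], cx: Context) -> Value {
    match cx {
        Context::Scalar => Value::Scalar(arr.len()),
        Context::List => Value::List((0..arr.len()).collect()),
    }
}

fn main() {
    let arr = vec![1, 2, 3];
    println!("{:?}", keys_of_array(&arr, Context::Scalar)); // my $x = keys @arr;
    println!("{:?}", keys_of_array(&arr, Context::List)); // my @k = keys @arr;
}
```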
Constants
Constants are immutable values.
Literal Constants
42 # Integer
3.14 # Float
"string" # String
qw(a b c) # List of strings
Named Constants
use constant PI => 3.14159;
use constant MAX => 100;
use constant {
RED => 0xFF0000,
GREEN => 0x00FF00,
BLUE => 0x0000FF,
};
Compile-Time Constant Folding
PetaPerl performs constant folding at compile time:
my $x = 2 + 3; # Folded to 5
my $len = length("hello"); # Folded to 5
my $sub = substr("text", 0, 2); # Folded to "te"
This optimization eliminates runtime computation for constant expressions.
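A constant folder typically rewrites the expression tree bottom-up, replacing any node whose operands are all literals with its computed value. The Rust sketch below folds the three examples above on a hypothetical `Expr` tree, not PetaPerl's real op tree. It leaves overflowing additions and negative or out-of-range `substr` arguments unfolded rather than reproduce Perl's full rules.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Str(String),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Length(Box<Expr>),
    Substr(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Fold children first, then replace a node with its value when all operands are literals.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Int(x), Expr::Int(y)) if x.checked_add(y).is_some() => Expr::Int(x + y),
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Length(s) => match fold(*s) {
            Expr::Str(text) => Expr::Int(text.chars().count() as i64),
            other => Expr::Length(Box::new(other)),
        },
        Expr::Substr(s, off, len) => match (fold(*s), fold(*off), fold(*len)) {
            (Expr::Str(text), Expr::Int(o), Expr::Int(l))
                if o >= 0 && l >= 0 && (o as usize) <= text.chars().count() =>
            {
                Expr::Str(text.chars().skip(o as usize).take(l as usize).collect())
            }
            (s, o, l) => Expr::Substr(Box::new(s), Box::new(o), Box::new(l)),
        },
        leaf => leaf, // literals and variables are already folded
    }
}

fn main() {
    let lit = |n: i64| Box::new(Expr::Int(n));
    let s = |t: &str| Box::new(Expr::Str(t.to_string()));
    let examples = vec![
        Expr::Add(lit(2), lit(3)),               // 2 + 3
        Expr::Length(s("hello")),                // length("hello")
        Expr::Substr(s("text"), lit(0), lit(2)), // substr("text", 0, 2)
        Expr::Add(Box::new(Expr::Var("y".to_string())), lit(1)), // $y + 1 is left alone
    ];
    for e in examples {
        println!("{:?}", fold(e));
    }
}
```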
Special Types
Code References
Subroutines can be referenced and called dynamically:
my $coderef = sub { return $_[0] * 2 };
my $result = $coderef->(21); # 42
my $coderef = \&existing_sub;
$coderef->(@args);
Filehandles
Lexical filehandles are scalars that hold a reference to an anonymous glob (ref returns 'GLOB'):
open my $fh, '<', $file or die $!;
my $line = <$fh>;
close $fh;
Globs
Typeglobs reference symbol table entries:
*alias = *original; # Alias all types
*func = sub { ... }; # Install subroutine