Program input — @ARGV, %ENV, @INC

Three globals carry input into the program: command-line arguments, environment variables, and the module search path. They are closely linked. @ARGV is paired with $0, Perl-specific %ENV keys such as PERL5LIB feed into @INC, and @INC's first-match search order works much like the shell's search of PATH.

Variable    Holds
@ARGV       Command-line arguments (excluding the script name)
$0          Script name (paired with @ARGV)
%ENV        Process environment
@INC        Module search path (used by use/require)
%INC        Already-loaded modules
$INC        Current @INC index during a hook (added in 5.37.7; first stable release 5.38)

@ARGV and its relationship to $0

After Perl starts, the script name is in $0 and the script’s arguments are in @ARGV:

$ pperl myscript.pl -v --output /tmp/x foo bar

# where myscript.pl contains:
print "$0\n";                    # myscript.pl
print "@ARGV\n";                 # -v --output /tmp/x foo bar
print scalar(@ARGV), " args\n";  # 5 args

$ARGV[0] is the first script argument, not the script name itself. This differs from C’s argv (where argv[0] is the program name); Perl exposes the program name separately via $0.
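If code ported from C expects the program name at index 0, a C-style argv can be rebuilt from the two variables. The sketch below also uses the core File::Basename module to strip any directory part from $0 (for example ./bin/myscript.pl), which is usually what usage and error messages want. The variable names are illustrative.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Rebuild a C-style argv: program name first, then the script arguments.
my @c_style_argv = ($0, @ARGV);
my $argc         = @c_style_argv;      # like C's argc, counts the program name too

# $0 may carry a directory part; basename() keeps only the file name.
my $prog = basename($0);
print "$prog: argc=$argc, first script argument: ", ($ARGV[0] // '(none)'), "\n";
```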

Standard argument handling

Hand-written arg parsing is fine for trivial scripts:

my %opt;
while (@ARGV && $ARGV[0] =~ /^-./) {    # a bare "-" is left alone (conventionally STDIN)
    my $flag = shift @ARGV;
    last if $flag eq '--';              # "--" ends option processing
    if    ($flag eq '-v')      { $opt{verbose} = 1 }
    elsif ($flag eq '--output'){ $opt{output}  = shift(@ARGV) // die "--output needs a value\n" }
    else                       { die "unknown flag: $flag\n" }
}
my @files = @ARGV;

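For anything beyond toy scripts, the core Getopt::Long module handles the common patterns (long options with short aliases, unique-prefix abbreviation, options that require a value, repeated flags, and the -- separator); help output can be added through its auto_help setting or with Pod::Usage: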

use Getopt::Long;
my %opt;
GetOptions(
    'verbose|v'    => \$opt{verbose},
    'output|o=s'   => \$opt{output},
    'help|h'       => \$opt{help},
) or die "bad options; try --help\n";
my @files = @ARGV;               # what's left after option processing

GetOptions mutates @ARGV in place — when it returns, @ARGV contains only the non-option arguments.
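The in-place behaviour is easy to observe by localising @ARGV to a made-up command line before calling GetOptions. In this sketch the argument values are hypothetical; the option specifications match the example above. The "--" is consumed by GetOptions, and everything after it is left in @ARGV untouched, even if it looks like an option.

```perl
use strict;
use warnings;
use Getopt::Long;

# Simulated command line (hypothetical values) so the effect is visible.
local @ARGV = ('-v', '--output', '/tmp/x', '--', 'foo', '--not-an-option');

my %opt;
GetOptions(
    'verbose|v'  => \$opt{verbose},
    'output|o=s' => \$opt{output},
) or die "bad options\n";

print "verbose=$opt{verbose} output=$opt{output}\n";   # verbose=1 output=/tmp/x
print "left: @ARGV\n";                                  # left: foo --not-an-option
```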

@ARGV and the diamond operator

The <> (diamond) operator iterates lines from the files named in @ARGV, falling back to STDIN if @ARGV is empty:

while (<>) {
    # one line from one of the @ARGV files (or STDIN)
    chomp;
    process($_);
}

This is the standard input contract for awk-style filter programs: <> opens each file in turn, sets $ARGV to the name of the file being read ("-" when reading STDIN), and closes it at EOF before moving on to the next one.
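One consequence is that the line counter $. keeps counting across files, because <> treats them as one continuous stream. The idiom documented under eof resets it per file by closing ARGV at the end of each file. The sketch below creates two throwaway input files with the core File::Temp module so it runs on its own; in a real filter, @ARGV would come from the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two small input files so the example is self-contained.
my @names;
for my $content ("a\nb\n", "c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "close $name: $!";
    push @names, $name;
}

@ARGV = @names;
my @seen;
while (<>) {
    chomp;
    push @seen, "$ARGV:$.:$_";    # $ARGV names the file currently being read
} continue {
    close ARGV if eof;            # eof without parens: end of the current file; resets $.
}
print "$_\n" for @seen;           # line numbers run 1, 2 for the first file, 1 for the second
```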

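The -n and -p switches wrap the script in a while (<>) loop, and -i, -l, and -a modify that loop (in-place editing, line-ending handling, and autosplit), so all of them build on this @ARGV/<> pattern.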
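The same machinery is available from inside a script: -i.bak simply sets the special variable $^I, so localising $^I and @ARGV together gives in-place editing without the command-line switch (perlfaq5 shows this idiom). The sketch below wraps it in a hypothetical edit_in_place helper and demonstrates it on a throwaway file; the .bak suffix and the helper's name are illustrative choices.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Programmatic equivalent of running with -p and -i.bak (hypothetical helper).
sub edit_in_place {
    my ($from, $to, @files) = @_;
    local ($^I, @ARGV) = ('.bak', @files);   # $^I holds the -i backup suffix
    while (<>) {
        s/\Q$from\E/$to/g;
        print;    # with $^I set, this goes to the file being rewritten, not STDOUT
    }
}

# Demonstration on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.txt";
open my $out, '>', $file or die "write $file: $!";
print {$out} "foo bar foo\n";
close $out or die "close $file: $!";

edit_in_place('foo', 'baz', $file);

open my $in, '<', $file or die "read $file: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;    # baz bar baz
```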

Modifying @ARGV before <>

You can pre-process @ARGV to alter what <> reads:

# Add an extra file to the front of the argv list:
unshift @ARGV, 'header.txt';

# Process compressed files transparently by rewriting argv:
@ARGV = map { /\.gz$/ ? "gunzip -c $_ |" : $_ } @ARGV;

while (<>) {
    process($_);
}

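The "cmd |" form works because <> opens each @ARGV element with two-argument open, which treats a trailing | as a pipe from a command; this is a long-standing Perl idiom for transparent decompression in awk-style scripts. The same behaviour means a file name ending in | is run as a shell command, so do not rely on it for untrusted names. The double-diamond <<>> operator (Perl 5.22+) uses three-argument open and treats every element as a literal file name.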
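When the file names are not trusted, a safer sketch is to skip the <> magic and open each file explicitly, using a list-form pipe open for compressed files so that gunzip receives the name as a separate argument and no shell parses it. This assumes a Unix-like system with gunzip on PATH; read_lines is a hypothetical helper name, and a real filter would process lines as they arrive rather than collecting them.

```perl
use strict;
use warnings;

# Read all lines from the named files, decompressing *.gz via gunzip.
sub read_lines {
    my @lines;
    for my $file (@_) {
        my $fh;
        if ($file =~ /\.gz\z/) {
            # List form: no shell involved, so odd file names are harmless.
            open($fh, '-|', 'gunzip', '-c', $file) or die "cannot run gunzip on $file: $!";
        } else {
            open($fh, '<', $file) or die "cannot open $file: $!";
        }
        push @lines, <$fh>;
        close($fh) or warn "error reading $file\n";   # for a pipe, false if gunzip failed
    }
    return @lines;
}

print for read_lines(@ARGV);
```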

%ENV — the process environment

%ENV is the hash of environment variables inherited from the parent process (typically the shell). Reading is straightforward:

my $home    = $ENV{HOME};
my $term    = $ENV{TERM} // 'dumb';
my $path    = $ENV{PATH};

Writing changes the environment seen by child processes that this Perl script subsequently spawns:

$ENV{LC_ALL}     = 'C';          # POSIX locale for child commands
$ENV{TZ}         = 'UTC';
$ENV{LANG}       = 'C';
delete $ENV{LC_NUMERIC};         # remove a key entirely

system('date');                   # child sees the modified ENV
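When a change should apply to a single child only, local works on individual %ENV elements: the child sees the new value and the old one is restored when the block exits. In this sketch DEMO_MODE is a hypothetical variable name, and the child is a second perl ($^X) started with list-form system so no shell is involved.

```perl
use strict;
use warnings;

$ENV{DEMO_MODE} = 'outer';    # hypothetical variable for the demonstration
{
    local $ENV{DEMO_MODE} = 'inner';    # changed only inside this block
    system($^X, '-e', 'print "child sees $ENV{DEMO_MODE}\n"') == 0
        or die "child failed: $?";      # prints: child sees inner
}
print "parent sees $ENV{DEMO_MODE}\n";  # parent sees outer
```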

The change does not propagate back to the parent: the shell that started the script keeps its original environment. If the parent shell needs the new values, write them to a file the shell can source, or print shell commands for it to eval (the approach eval "$(ssh-agent -s)" uses).
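A minimal sketch of the eval approach, assuming a POSIX shell on the receiving end: the script prints export lines, quoting each value in single quotes and rewriting any embedded ' as '\''. The variable names and values are hypothetical.

```perl
use strict;
use warnings;

# Quote a value for a POSIX shell: wrap it in single quotes and turn each
# embedded ' into '\'' (close quote, escaped quote, reopen quote).
sub sh_quote {
    my ($value) = @_;
    (my $q = $value) =~ s/'/'\\''/g;
    return "'$q'";
}

# Hypothetical settings the calling shell should adopt.
my %settings = (APP_HOME => '/opt/my app', APP_NOTE => "it's live");

print "export $_=", sh_quote($settings{$_}), "\n" for sort keys %settings;
```

Saved as setenv.pl, it would be used from the shell as eval "$(pperl setenv.pl)".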

Stringification of values

As of Perl 5.18, %ENV values are always stringified at assignment time. Storing a reference no longer round-trips:

my $arr = [1, 2, 3];
$ENV{DATA} = $arr;
print $ENV{DATA};                # "ARRAY(0x...)" — stringified

Environment variables are a string-typed interface to the operating system; preserving structure across a fork/exec boundary would not work anyway. Marshal anything non-trivial as JSON or similar before storing.
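A sketch of the marshalling approach using JSON::PP, which has shipped with Perl since 5.14. MYAPP_CONFIG is a hypothetical variable name; in practice the decode step would run in the child process.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);

# Store structured data as a JSON string (hypothetical variable name).
$ENV{MYAPP_CONFIG} = encode_json({ retries => 3, hosts => ['a', 'b'] });

# Later, typically in a child process, turn the string back into a structure.
my $config = decode_json($ENV{MYAPP_CONFIG});
print "$config->{retries} retries, hosts: @{ $config->{hosts} }\n";   # 3 retries, hosts: a b
```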

Perl-specific environment variables

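Perl itself reads a small set of environment variables (the PERL* entries below), independently of anything your script does with %ENV. HOME and PATH are not Perl-specific but are listed because Perl consults them indirectly: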

Variable              Effect
PERL5LIB              Paths prepended to @INC, separated by : (; on Windows); ignored under taint checks
PERLLIB               Older variable with the same effect; used only if PERL5LIB is unset
PERL5OPT              Switches applied as if given on every command line (subset only)
PERLDB_OPTS           Options for the debugger
PERL_UNICODE          Default Unicode (UTF-8) I/O settings; equivalent to -C
PERL_USE_UNSAFE_INC   Re-adds . to @INC (removed by default since 5.26; strongly discouraged)
PERL_HASH_SEED        Sets the hash-randomisation seed (random per run by default); mainly for debugging
PERL_SIGNALS          Set to "unsafe" to opt out of deferred (safe) signal handling
HOME                  Used by ~ expansion in glob; not Perl-specific but Perl-relevant
PATH                  Searched by system and exec for commands given without a directory

With taint checks enabled (-T or -t), PERL5LIB, PERLLIB, and PERL5OPT are ignored to prevent privilege escalation. See the command-line switches guide for the full security model.

delete $ENV{KEY} versus $ENV{KEY} = undef

The two are not equivalent:

delete $ENV{KEY};                # KEY is not in the environment
$ENV{KEY} = undef;               # KEY is in the environment, value ""

A child process started after delete will not see KEY at all: getenv("KEY") returns NULL. After undef assignment, the child sees KEY="" (empty string). The difference matters to any program that treats «unset» and «set to empty» differently, such as a shell script choosing between ${KEY-default} and ${KEY:-default}.
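The difference can be checked from the child's side. This sketch starts a second perl ($^X) through a list-form pipe open and asks whether the key exists in its inherited environment; DEMO_KEY and the child_view helper are hypothetical names chosen for the illustration.

```perl
use strict;
use warnings;

# Ask a fresh perl whether $key exists in the environment it inherited.
sub child_view {
    my ($key) = @_;
    open my $fh, '-|', $^X, '-e', 'print exists $ENV{$ARGV[0]} ? "set" : "unset"', $key
        or die "cannot run $^X: $!";
    my $answer = <$fh>;
    close $fh;
    return $answer;
}

delete $ENV{DEMO_KEY};
my $after_delete = child_view('DEMO_KEY');    # "unset"

$ENV{DEMO_KEY} = undef;
my $after_undef = child_view('DEMO_KEY');     # "set" (with an empty value)

print "after delete: $after_delete\nafter undef: $after_undef\n";
```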

@INC — the module search path

When use and require look up a module, they walk @INC left to right and stop at the first match:

print "$_\n" for @INC;
# /usr/local/lib/perl5/site_perl/5.42.0/x86_64-linux
# /usr/local/lib/perl5/site_perl/5.42.0
# /usr/local/lib/perl5/5.42.0/x86_64-linux
# /usr/local/lib/perl5/5.42.0
# (no trailing "." since Perl 5.26)

@INC is initialised at startup from a combination of:

  1. Compiled-in defaults (the build-time installprivlib, installsitelib, …).

  2. -I /path switches on the command line, prepended in order.

  3. The PERL5LIB (or PERLLIB) environment variable, also prepended.

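After startup @INC is an ordinary array. Order matters because the first match wins, and the resulting order is: the -I paths first, then the PERL5LIB (or PERLLIB) entries, then the compiled-in defaults.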
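The ordering can be checked by starting a child perl with one -I directory and one PERL5LIB directory and printing the front of its @INC. The two paths are hypothetical and do not need to exist. The sketch assumes no PERL5OPT or site customisation in the environment that adds its own directories.

```perl
use strict;
use warnings;

local $ENV{PERL5LIB} = '/tmp/from-perl5lib';    # hypothetical path
open my $child, '-|', $^X, '-I/tmp/from-dash-I', '-e', 'print "$_\n" for @INC'
    or die "cannot run $^X: $!";
chomp(my @child_inc = <$child>);
close $child or die "child perl failed: $?";

print "$_\n" for @child_inc[0, 1];   # /tmp/from-dash-I, then /tmp/from-perl5lib
```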

use lib versus unshift @INC versus push @INC

Three ways to add a path. They are not equivalent:

use lib '/my/lib';                       # unshift at compile-time
unshift @INC, '/my/lib';                 # unshift at runtime
push    @INC, '/my/lib';                 # push    at runtime

The differences:

  • use lib runs at compile time, so it takes effect before any use statement that follows it in the file. This is almost always what you want when you're saying «this script needs a custom library directory»:

    use lib '/opt/myapp/lib';
    use MyApp::Module;                     # found because lib was unshifted first
    

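    Roughly, use lib '/path' amounts to BEGIN { unshift @INC, '/path' } (lib.pm also adds an architecture-specific subdirectory if one exists and removes duplicate entries), so the new path is searched first and a copy of a module in /opt/myapp/lib shadows one elsewhere in @INC.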

  • unshift @INC, '/path' at the top level has runtime semantics. It works for require calls that happen after the unshift, but a use statement on a later line of the same file will not see it (because use is itself compile-time):

    unshift @INC, '/my/lib';
    use Some::Module;                      # NOT found in /my/lib: use runs at compile time, before the unshift
    

    This is a frequent gotcha. BEGIN { unshift @INC, '/my/lib' } fixes it; use lib is the cleaner spelling of exactly that.

  • push @INC, '/path' appends. The new path is searched last. This is the right choice when you want your library to be a fallback, used only if no other location has the module: for example, bundling a fallback copy of a CPAN module that the user may already have installed on the system.

The rule of thumb: reach for use lib whenever your code's module path must take effect at compile time, which is almost every case.
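A common companion to use lib is the core FindBin module, which locates the directory of the running script so that a project-relative library path works no matter where the script is started from. This sketch assumes a layout with the script in bin/ and modules in lib/; MyApp::Module is hypothetical.

```perl
use strict;
use warnings;
use FindBin qw($Bin);      # directory containing the running script
use lib "$Bin/../lib";     # compile time, so later use lines already see it

# A hypothetical module under ../lib would now be found first:
# use MyApp::Module;

print "first \@INC entry: $INC[0]\n";
```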

@INC hooks

@INC may contain not only paths but also code references, array references, and blessed objects. When require reaches one of these, it calls it as a hook, passing the hook itself and the requested file name (e.g. Foo/Bar.pm). The hook returns nothing to let the search continue, or a list that supplies the source: typically an open filehandle, optionally with a generator or filter subroutine. This is how tools such as PAR and App::FatPacker serve modules from archives or from strings embedded in a script.

The full hook protocol is documented under require. Most users never write one; you read about them when debugging «why does this module not load from where I expect» problems.
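A minimal sketch of a code-reference hook that serves one module from memory, similar in spirit to what App::FatPacker does. The Greeting module and the %virtual table are hypothetical. The hook declines (returns nothing) for every other file, so normal loading is unaffected.

```perl
use strict;
use warnings;

# Hypothetical in-memory module source, keyed by the path require asks for.
my %virtual = (
    'Greeting.pm' => "package Greeting;\nsub hello { 'hello from memory' }\n1;\n",
);

unshift @INC, sub {
    my ($self, $filename) = @_;
    return unless exists $virtual{$filename};    # decline: search continues
    open my $fh, '<', \$virtual{$filename} or die "cannot open in-memory source: $!";
    return $fh;                                   # require reads the source from here
};

require Greeting;
print Greeting::hello(), "\n";    # hello from memory
```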

%INC — the loaded-modules cache

After a module is successfully loaded, %INC records it. The key is the path Perl was asked to find (e.g. Foo/Bar.pm); the value is the file path that satisfied the request, formed from the matching @INC entry, so it is relative if that entry was relative:

use Data::Dumper;
print "Data::Dumper loaded from $INC{'Data/Dumper.pm'}\n";
# e.g. /usr/local/lib/perl5/5.42.0/x86_64-linux/Data/Dumper.pm (varies by install)

# Show every loaded module:
for my $key (sort keys %INC) {
    print "$key  →  $INC{$key}\n";
}

require checks %INC before walking @INC — a second require Foo; is a no-op because Foo.pm is already in the cache. To force a re-require:

delete $INC{'Foo.pm'};
require Foo;                     # actually re-runs Foo.pm

This is the standard idiom for testing reload behaviour. Be aware that re-running a module file does not un-define what the first run created — packages, subs, and globals stay around.
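The cache behaviour can be seen with a throwaway module written to a temporary directory: rewriting the file does nothing for a plain second require, while deleting the %INC entry makes require read the new version. ReloadDemo and $VERSION_SEEN are hypothetical names used only here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'ReloadDemo.pm');   # hypothetical module

sub write_module {
    my ($version) = @_;
    open my $fh, '>', $file or die "write $file: $!";
    print {$fh} "package ReloadDemo;\nour \$VERSION_SEEN = $version;\n1;\n";
    close $fh or die "close $file: $!";
}

unshift @INC, $dir;
write_module(1);
require ReloadDemo;
write_module(2);
require ReloadDemo;                        # no-op: ReloadDemo.pm is already in %INC
my $first = $ReloadDemo::VERSION_SEEN;     # still 1

delete $INC{'ReloadDemo.pm'};
require ReloadDemo;                        # re-reads the file
print "$first then $ReloadDemo::VERSION_SEEN\n";   # 1 then 2
```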

$INC — the index inside an @INC hook

Added in Perl 5.37.7 (first stable release: 5.38). When an @INC hook is being called, $INC is set to the index of the hook in @INC. After the hook returns, the search continues from index $INC + 1, so the hook can rewrite @INC and steer the search to a specific position afterwards. This is an advanced feature; everyday code never touches it.

See also

  • use, require — the consumers of @INC/%INC.

  • use lib — the canonical compile-time spelling for prepending to @INC.

  • Getopt::Long — the standard parser for @ARGV.

  • readline — the function behind the diamond <> operator, which drives @ARGV/STDIN filtering.

  • -i, -n, -p, -a — command-line switches built on @ARGV.

  • Process variables: $0 is the script name and $$ is the PID; pair these with @ARGV for self-restart and logging.

  • Command-line one-liners — the tutorial showing every @ARGV/<>/switch combination in context.