Encode#

📦 std

Convert between Perl character strings and bytes in a named encoding: UTF-8, UTF-16, Latin-1, CP1252, Shift_JIS, EUC-JP, and the other character sets the encoding registry supports.

Encode works in two directions. decode($name, $bytes) takes raw bytes in the named encoding and returns a Perl character string (the SVf_UTF8 flag is set on the result). encode($name, $string) goes the other way: it takes a character string and returns raw bytes in the target encoding. Keep the two operations mentally distinct — strings are sequences of Unicode codepoints, bytes are what you read from and write to files, sockets, and pipes.
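As a minimal sketch of the two directions, assuming this module behaves like the stock Perl Encode for these calls, the following decodes UTF-8 bytes to characters and then encodes the same characters to Latin-1. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Five raw bytes: "caf" plus the two-byte UTF-8 sequence for U+00E9.
my $bytes = "caf\xC3\xA9";

# decode: bytes in the named encoding in, character string out.
my $chars = decode('UTF-8', $bytes);

# encode: character string in, bytes in the target encoding out.
my $latin1 = encode('ISO-8859-1', $chars);

printf "%d UTF-8 bytes, %d characters, %d Latin-1 bytes\n",
    length($bytes), length($chars), length($latin1);
# prints "5 UTF-8 bytes, 4 characters, 4 Latin-1 bytes"
```

The differing lengths are the point: length counts bytes on the undecoded input and characters on the decoded string.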

Because Perl’s internal form for flagged character strings is UTF-8-based, UTF-8 has a fast path: encode_utf8 and decode_utf8 skip the full encoding machinery and, for data already stored as UTF-8, only flip or validate the SVf_UTF8 flag. Three low-level helpers let you poke at that flag directly: is_utf8 queries it, _utf8_on forces it on, _utf8_off forces it off. Use those only when you know what you are doing: they change how Perl interprets the bytes already in the scalar without touching the bytes themselves, and _utf8_on performs no validation.
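The sketch below illustrates the fast-path functions and the effect of the flag helpers, assuming the behaviour of the stock Perl Encode. It calls _utf8_on by its fully qualified name and applies it to a copy, since forcing the flag on data that is not valid UTF-8 is unsafe.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8 is_utf8);

my $chars = "\x{263A}";            # one character, U+263A
my $bytes = encode_utf8($chars);   # three bytes: E2 98 BA

printf "%d character, %d bytes, flag on bytes: %s\n",
    length($chars), length($bytes), is_utf8($bytes) ? 'on' : 'off';

# decode_utf8 turns the bytes back into the original character.
my $back = decode_utf8($bytes);
print "round trip ok\n" if $back eq $chars;

# _utf8_on reinterprets a copy of the same three bytes as one character.
# No validation happens, so this is only safe on known-good UTF-8.
my $forced = $bytes;
Encode::_utf8_on($forced);
printf "after _utf8_on: length %d\n", length($forced);
```

In ordinary code decode_utf8 is the safer choice; the flag helpers are shown only to make the "same bytes, different interpretation" point concrete.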

Every conversion takes an optional $check bitmask that controls what happens when a character cannot be represented in the target encoding or, when decoding, when the input bytes are malformed. The predefined values are FB_DEFAULT (substitute with ? or the encoding’s replacement character), FB_CROAK (die), FB_QUIET (stop and return the converted prefix), FB_WARN (warn, then stop and return the converted prefix), FB_HTMLCREF (substitute with &#NNNN;), FB_XMLCREF (substitute with &#xHHHH;), and FB_PERLQQ (substitute with \x{HHHH}). LEAVE_SRC, STOP_AT_PARTIAL, PERLQQ, WARN_ON_ERR, and ONLY_PRAGMA_WARNINGS are the raw bits you OR together to build custom check values.
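To make the fallback modes concrete, here is a small sketch that encodes a string containing U+20AC, which ISO-8859-1 cannot represent, under several check values. It assumes the constants are importable by name as in the stock Perl Encode; the sample text and hash keys are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK FB_PERLQQ FB_HTMLCREF FB_XMLCREF LEAVE_SRC);

# U+20AC EURO SIGN has no ISO-8859-1 equivalent.
my $text = "price: \x{20AC}5";

my %out = (
    default => encode('ISO-8859-1', $text),               # FB_DEFAULT
    perlqq  => encode('ISO-8859-1', $text, FB_PERLQQ),
    html    => encode('ISO-8859-1', $text, FB_HTMLCREF),
    xml     => encode('ISO-8859-1', $text, FB_XMLCREF),
);

printf "%-8s %s\n", $_, $out{$_} for qw(default perlqq html xml);

# FB_CROAK dies instead of substituting; LEAVE_SRC keeps $text intact.
my $ok = eval { encode('ISO-8859-1', $text, FB_CROAK | LEAVE_SRC); 1 };
print "FB_CROAK died: $@" unless $ok;
```

Wrapping the FB_CROAK call in eval is the usual way to turn a conversion failure into something the caller can handle.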

Encoding names are resolved through a registry. find_encoding($name) returns a blessed encoding object you can call methods on; resolve_alias($name) returns the canonical name as a string; encodings() lists every name the registry knows about.
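A short sketch of the registry lookups, assuming the stock Perl Encode conventions: the alias latin1 resolves to iso-8859-1, and the listing is exposed as a class method that accepts ':all'. The unknown name used here is made up for the demonstration.

```perl
use strict;
use warnings;
use Encode qw(find_encoding resolve_alias);

# find_encoding returns an object; name() gives its canonical name.
my $enc = find_encoding('latin1');
print "object name:    ", $enc->name, "\n";

# resolve_alias returns just the canonical name as a string.
print "resolved alias: ", resolve_alias('latin1'), "\n";

# An unknown name resolves to undef rather than dying.
my $unknown = resolve_alias('x-no-such-encoding');
print "unknown name:   ", (defined $unknown ? $unknown : 'undef'), "\n";

# Stock Encode exposes the listing as a class method; ':all' asks for
# every available encoding, not only the ones already loaded.
my @all = Encode->encodings(':all');
printf "%d encodings available\n", scalar @all;
```

Resolving a user-supplied name once with find_encoding and reusing the object avoids repeating the alias lookup on every conversion.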

from_to($octets, $from, $to) is a one-shot in-place conversion, useful when all you want is to reencode a byte string: for example, rewriting a file body from Latin-1 to UTF-8 without handling the intermediate character string yourself. It handles BOM-tagged and MIME-tagged inputs when paired with find_mime_encoding.
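The in-place conversion can be sketched as follows, assuming the stock Perl Encode behaviour, where from_to overwrites its first argument and returns the new length in bytes. The sample data is illustrative.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# "naive" spelled with i-diaeresis, as Latin-1: that letter is byte 0xEF.
my $data = "na\xEFve";

# Converts $data in place; the return value is the new length in bytes.
my $len = from_to($data, 'ISO-8859-1', 'UTF-8');

printf "%d bytes: %s\n", $len,
    join ' ', map { sprintf '%02X', ord($_) } split //, $data;
# prints "6 bytes: 6E 61 C3 AF 76 65"
```

Because the scalar is overwritten, copy it first if the original bytes are still needed.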

Functions#

Encode/decode#

native_encode#

Turn a character string into a byte string in the named encoding.

native_decode#

Turn a byte string in the named encoding into a Perl character string.

native_encode_utf8#

Fast path for encoding a string to UTF-8 bytes.

native_decode_utf8#

Fast path for decoding UTF-8 bytes to a Perl character string.

UTF-8 flags#

native_is_utf8#

Return true if the scalar carries the SVf_UTF8 flag.

native_utf8_on#

Force SVf_UTF8 on in place, without touching the underlying bytes.

native_utf8_off#

Force SVf_UTF8 off in place, without touching the underlying bytes.

Encoding registry#

native_find_encoding#

Look up an encoding by name and return an object you can call methods on.

native_resolve_alias#

Return the canonical encoding name for an alias, or undef if unknown.

native_encodings#

Return the list of encoding names the registry knows about.

native_obj_encode#

Method form of encode on an encoding object.

native_obj_decode#

Method form of decode on an encoding object.

native_obj_name#

Return the canonical name of an encoding object as a string.

native_obj_renew#

Return an encoding object ready for a fresh conversion; effectively a no-op that returns $self.

native_obj_perlio_ok#

Return true if the encoding is safe to stack as a PerlIO layer.
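The method forms can be sketched like this, assuming the object interface of the stock Perl Encode (encode, decode, and name called on the result of find_encoding). UTF-16BE is chosen only because its byte layout is easy to check by eye.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

my $enc = find_encoding('UTF-16BE')
    or die "UTF-16BE is not available";

my $text  = "hi";
my $bytes = $enc->encode($text);    # two bytes per character, big-endian
my $chars = $enc->decode($bytes);

printf "%s: %d bytes, round trip %s\n",
    $enc->name, length($bytes), $chars eq 'hi' ? 'ok' : 'failed';
# prints "UTF-16BE: 4 bytes, round trip ok"
```

Holding on to the object is convenient when many strings go through the same encoding, since the name is resolved only once.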

MIME/XML helpers#

native_fb_htmlcref#

CHECK value 520 — replace unencodable characters with HTML decimal character references (&#NNNN;).

native_fb_xmlcref#

CHECK value 1032 — replace unencodable characters with XML hexadecimal character references (&#xHHHH;).

Conversion#

native_from_to#

Reencode a byte string in place from one encoding to another.

Utilities#

native_fb_default#

CHECK value 0 — substitute unencodable characters with the encoding’s default replacement (usually ? or U+FFFD).

native_fb_croak#

CHECK value 1 — die on the first unencodable character or invalid byte sequence.

native_fb_quiet#

CHECK value 4 — stop at the first unencodable character or invalid byte sequence and return the prefix converted so far; the input scalar is truncated to the part that was not consumed (method form only).
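One common use of this behaviour is decoding input that arrives in chunks. The sketch below uses the method form with FB_QUIET and assumes the stock Perl Encode behaviour, where the unconsumed tail stays in the buffer. The chunk boundaries are contrived to split a multi-byte sequence.

```perl
use strict;
use warnings;
use Encode qw(find_encoding FB_QUIET);

my $utf8 = find_encoding('UTF-8');

# UTF-8 input split in the middle of the two-byte sequence for U+00E9.
my @chunks = ("caf\xC3", "\xA9!");

my ($buffer, $text) = ('', '');
for my $chunk (@chunks) {
    $buffer .= $chunk;
    # Decodes as much as it can; $buffer keeps only what was not consumed.
    $text .= $utf8->decode($buffer, FB_QUIET);
}

printf "decoded %d characters, %d bytes left over\n",
    length($text), length($buffer);
# prints "decoded 5 characters, 0 bytes left over"
```

Note that a genuinely invalid byte also stops the decoder, so a real reader would need to detect a buffer that never drains.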

native_fb_warn#

CHECK value 6 — like FB_QUIET, but emit a warning before stopping at the first unencodable character or invalid byte sequence.

native_fb_perlqq#

CHECK value 264 — replace unencodable characters with Perl \x{HHHH} escape sequences.

native_leave_src#

CHECK bit 8 — when OR’d into $check, keeps the input scalar untouched; without it, a nonzero $check consumes the successfully converted prefix of the input.
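The effect of this bit is easiest to see side by side. The following sketch uses the method form and assumes the stock Perl Encode behaviour; the two source variables are illustrative and start out identical.

```perl
use strict;
use warnings;
use Encode qw(find_encoding FB_QUIET LEAVE_SRC);

my $latin1 = find_encoding('ISO-8859-1');

# U+20AC cannot be encoded in ISO-8859-1, so encoding stops there.
my $consumed = "ab\x{20AC}cd";
my $kept     = "ab\x{20AC}cd";

my $out1 = $latin1->encode($consumed, FB_QUIET);              # source is cut down
my $out2 = $latin1->encode($kept,     FB_QUIET | LEAVE_SRC);  # source is preserved

printf "without LEAVE_SRC: %d characters left in source\n", length($consumed);
printf "with LEAVE_SRC:    %d characters left in source\n", length($kept);
```

Both calls return the same encoded prefix; only the state of the source scalar differs.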

native_stop_at_partial#

CHECK bit 2048 — stop at a partial trailing multi-byte sequence rather than reporting it as an error. Useful for streaming decoders.

native_perlqq#

CHECK bit 256 — the raw bit behind FB_PERLQQ; OR it into your own check mask for \x{HHHH} substitution.

native_warn_on_err#

CHECK bit 2 — emit a warning on encoding errors. Combine it with a substitution bit to build custom fallback behaviour.
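A custom mask might be built as in the sketch below, which assumes the raw bits are importable by name as in the stock Perl Encode. It substitutes \x{HHHH}, warns, and leaves the source alone; the warning handler is there only to count the warnings.

```perl
use strict;
use warnings;
use Encode qw(encode FB_PERLQQ PERLQQ WARN_ON_ERR LEAVE_SRC);

# The predefined values are combinations of the raw bits.
printf "FB_PERLQQ equals PERLQQ | LEAVE_SRC: %s\n",
    (FB_PERLQQ == (PERLQQ | LEAVE_SRC)) ? 'yes' : 'no';

# Custom mask: substitute \x{HHHH}, warn about it, keep the source intact.
my $check = PERLQQ | WARN_ON_ERR | LEAVE_SRC;

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $text = "5 \x{20AC}";
my $out  = encode('ascii', $text, $check);

printf "output: %s (%d warning(s))\n", $out, scalar @warnings;
```

Leaving out LEAVE_SRC here would make the nonzero mask modify $text, which is rarely what a caller wants when substituting.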

native_only_pragma_warnings#

CHECK bit 16 — emit encoding warnings only when the caller has use warnings 'utf8' (or equivalent) active, rather than unconditionally. Has no effect unless WARN_ON_ERR is also set.