```{index} single: Encode; Perl module
```

# Encode

```{pperl-module-badges} Encode
```

Convert between Perl character strings and bytes in any named encoding: UTF-8, UTF-16, Latin-1, CP1252, Shift_JIS, EUC-JP, and every other IANA-registered character set.

Encode works in two directions. `decode($name, $bytes)` takes raw bytes in the named encoding and returns a Perl character string (the `SVf_UTF8` flag is set on the result). `encode($name, $string)` goes the other way: it takes a character string and returns raw bytes in the target encoding. Keep the two operations mentally distinct: strings are sequences of Unicode codepoints, bytes are what you read from and write to files, sockets, and pipes.

Because UTF-8 is the internal form Perl uses for character strings, it has a fast path: `encode_utf8` and `decode_utf8` skip the full encoding machinery and just flip or validate the `SVf_UTF8` flag. Three low-level helpers let you poke at that flag directly: `is_utf8` queries it, `_utf8_on` forces it on, `_utf8_off` forces it off. Use those only when you know what you are doing, because they change how Perl interprets the bytes already in the scalar without touching the bytes themselves.

Every conversion takes an optional `$check` bitmask that controls what happens when a character cannot be represented in the target encoding. The predefined values are `FB_DEFAULT` (substitute with `?` or the encoding's replacement character), `FB_CROAK` (die), `FB_QUIET` (stop and return the converted prefix), `FB_WARN` (warn and substitute), `FB_HTMLCREF` (substitute with `&#NNNN;`), `FB_XMLCREF` (substitute with `&#xHHHH;`), and `FB_PERLQQ` (substitute with `\x{HHHH}`). `LEAVE_SRC`, `STOP_AT_PARTIAL`, `PERLQQ`, `WARN_ON_ERR`, and `ONLY_PRAGMA_WARNINGS` are the raw bits you OR together to build custom check values.

Encoding names are resolved through a registry.
`find_encoding($name)` returns a blessed encoding object you can call methods on; `resolve_alias($name)` returns the canonical name as a string; `encodings()` lists every name the registry knows about.

`from_to($octets, $from, $to)` is a one-shot in-place conversion useful when all you want is to reencode a byte string, for example rewriting a file body from Latin-1 to UTF-8 without unpacking it into characters first. It handles BOM-tagged and MIME-tagged inputs when paired with `find_mime_encoding`.

## Functions

### Encode/decode

#### [`native_encode`](Encode/native_encode)

Turn a character string into a byte string in the named encoding.

#### [`native_decode`](Encode/native_decode)

Turn a byte string in the named encoding into a Perl character string.

#### [`native_encode_utf8`](Encode/native_encode_utf8)

Fast path for encoding a string to UTF-8 bytes.

#### [`native_decode_utf8`](Encode/native_decode_utf8)

Fast path for decoding UTF-8 bytes to a Perl character string.

### UTF-8 flags

#### [`native_is_utf8`](Encode/native_is_utf8)

Return true if the scalar carries the `SVf_UTF8` flag.

#### [`native_utf8_on`](Encode/native_utf8_on)

Force `SVf_UTF8` on in place, without touching the underlying bytes.

#### `native_utf8_off`

Force `SVf_UTF8` off in place, without touching the underlying bytes.

### Encoding registry

#### [`native_find_encoding`](Encode/native_find_encoding)

Look up an encoding by name and return an object you can call methods on.

#### `native_resolve_alias`

Return the canonical encoding name for an alias, or `undef` if unknown.

#### [`native_encodings`](Encode/native_encodings)

Return the list of encoding names the registry knows about.

#### [`native_obj_encode`](Encode/native_obj_encode)

Method form of `encode` on an encoding object.

#### [`native_obj_decode`](Encode/native_obj_decode)

Method form of `decode` on an encoding object.

#### `native_obj_name`

Return the canonical name of an encoding object as a string.

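
The registry lookups and the object method forms listed above can be sketched with the stock `Encode` interface. This is a minimal illustration under assumptions: the `latin1` alias and the sample string are chosen only for the example, and the canonical name noted in the comments is what the standard `Encode` distribution reports.

```perl
use strict;
use warnings;
use Encode qw(find_encoding resolve_alias encodings);

# Look up the encoding object once and reuse it for several conversions.
my $enc = find_encoding('latin1')
    or die "unknown encoding: latin1";

# name() gives the canonical name; resolve_alias() maps an alias to it.
# With the standard Encode distribution both are expected to be 'iso-8859-1'.
my $canonical = $enc->name;
my $resolved  = resolve_alias('latin1');

# Method forms of encode and decode on the object.
my $chars = "caf\x{e9}";              # four characters, the last is U+00E9
my $bytes = $enc->encode($chars);     # one byte per character in Latin-1
my $back  = $enc->decode($bytes);     # back to a character string

# encodings() lists the names the registry currently knows about.
my @names = encodings();

printf "canonical=%s resolved=%s\n", $canonical, $resolved;
printf "bytes=%d round_trip=%s known=%d\n",
    length($bytes), ($back eq $chars ? 'ok' : 'failed'), scalar @names;
```

Holding on to the object avoids repeating the name lookup when the same encoding is used for many conversions.
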
#### `native_obj_renew`

Return a fresh encoding object (effectively a no-op returning `$self`).

#### `native_obj_perlio_ok`

Return true if the encoding is safe to stack as a PerlIO layer.

### MIME/XML helpers

#### `native_fb_htmlcref`

`CHECK` value `520`: replace unencodable characters with HTML decimal character references (`&#NNNN;`).

#### `native_fb_xmlcref`

`CHECK` value `1032`: replace unencodable characters with XML hexadecimal character references (`&#xHHHH;`).

### Conversion

#### [`native_from_to`](Encode/native_from_to)

Reencode a byte string in place from one encoding to another.

### Utilities

#### `native_fb_default`

`CHECK` value `0`: substitute unencodable characters with the encoding's default replacement (usually `?` or U+FFFD).

#### `native_fb_croak`

`CHECK` value `1`: die on the first unencodable character or invalid byte sequence.

#### `native_fb_quiet`

`CHECK` value `4`: stop at the first unencodable character and return the encoded prefix; truncate the input to what was not consumed (method form only).

#### `native_fb_warn`

`CHECK` value `6`: warn and substitute on unencodable input.

#### `native_fb_perlqq`

`CHECK` value `264`: replace unencodable characters with Perl `\x{HHHH}` escape sequences.

#### `native_leave_src`

`CHECK` bit `8`: when OR'd into `$check`, keeps the input scalar untouched; the default is to consume its successfully-encoded prefix.

#### `native_stop_at_partial`

`CHECK` bit `2048`: stop at a partial trailing multi-byte sequence rather than reporting it as an error. Useful for streaming decoders.

#### `native_perlqq`

`CHECK` bit `256`: the raw bit behind `FB_PERLQQ`; OR it into your own check mask for `\x{HHHH}` substitution.

#### `native_warn_on_err`

`CHECK` bit `2`: emit a warning on encoding errors. Combined with a substitution bit to build custom fallback behaviour.

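
As a rough sketch of how the raw bits combine, the example below builds a custom check mask from `WARN_ON_ERR`, `PERLQQ`, and `LEAVE_SRC` and compares the predefined `FB_PERLQQ` against its component bits. It assumes the constants are importable through the `:fallback_all` export tag, as in the standard `Encode` distribution; the sample string is made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode :fallback_all);

my $chars = "smile \x{263A}";    # U+263A has no ASCII mapping

# FB_PERLQQ is the PERLQQ bit plus LEAVE_SRC (256 + 8 = 264).
my $same_mask = (FB_PERLQQ == (PERLQQ | LEAVE_SRC)) ? 1 : 0;

# Custom mask: ask for a warning, substitute \x{HHHH}, and keep $chars intact.
my @warnings;
my $bytes;
{
    # Collect warnings instead of printing them to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $bytes = encode('ascii', $chars, WARN_ON_ERR | PERLQQ | LEAVE_SRC);
}

printf "same_mask=%d\n", $same_mask;
printf "encoded=%s\n",   $bytes;
printf "warnings=%d\n",  scalar @warnings;
```

Passing the input in a variable and including `LEAVE_SRC` keeps the call from trying to modify its source argument, which matters when a non-zero check value is used.
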
#### `native_only_pragma_warnings`

`CHECK` bit `16`: emit encoding warnings only when the caller has `use warnings 'utf8'` (or equivalent) active, rather than unconditionally.

```{toctree}
:hidden:
:maxdepth: 1

Encode/native_encode
Encode/native_decode
Encode/native_encode_utf8
Encode/native_decode_utf8
Encode/native_find_encoding
Encode/native_is_utf8
Encode/native_utf8_on
Encode/native_encodings
Encode/native_from_to
Encode/native_obj_encode
Encode/native_obj_decode
```
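
To tie together the two directions and `from_to` described in the introduction, here is a minimal sketch of a Latin-1 to UTF-8 conversion done both ways. The sample bytes are an assumption standing in for data read from a file or socket.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

# Bytes as they might arrive from a Latin-1 source: "caf" followed by 0xE9.
my $latin1 = "caf\xE9";

# decode: bytes in the named encoding -> Perl character string.
my $chars = decode('iso-8859-1', $latin1);

# encode: character string -> bytes in the target encoding.
# U+00E9 becomes the two bytes C3 A9 in UTF-8.
my $utf8 = encode('UTF-8', $chars);

# from_to: reencode a copy of the byte string in place, with no
# intermediate character string visible to the caller.
my $octets = $latin1;
from_to($octets, 'iso-8859-1', 'UTF-8');

printf "chars=%d utf8_bytes=%d from_to_matches=%s\n",
    length($chars), length($utf8), ($octets eq $utf8 ? 'yes' : 'no');
```

The explicit `decode` then `encode` pair is the better fit when the text needs processing as characters in between; `from_to` suits a straight byte-to-byte conversion.
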