# Encode
📦 std
Convert between Perl character strings and bytes in any named encoding:
UTF-8, UTF-16, Latin-1, CP1252, Shift_JIS, EUC-JP, and the other
encodings known to the registry.
Encode works in two directions. `decode($name, $bytes)` takes raw bytes
in the named encoding and returns a Perl character string (the `SVf_UTF8`
flag is set on the result). `encode($name, $string)` goes the other
way: it takes a character string and returns raw bytes in the target
encoding. Keep the two operations mentally distinct — strings are
sequences of Unicode codepoints, bytes are what you read from and
write to files, sockets, and pipes.
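
The distinction between the two directions can be seen in a minimal sketch like the one below. It assumes the stock `encode` and `decode` functions with the argument order described above; the sample word and variable names are arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Five bytes as they might arrive from a file: "caf" plus U+00E9 as C3 A9.
my $bytes = "caf\xC3\xA9";

# decode: bytes in, characters out.
my $string = decode('UTF-8', $bytes);

# encode: characters in, bytes out. The byte count depends on the target.
my $latin1 = encode('ISO-8859-1', $string);
my $utf8   = encode('UTF-8', $string);

printf "%d bytes in, %d characters\n", length($bytes), length($string);
printf "%d bytes as Latin-1, %d bytes as UTF-8\n", length($latin1), length($utf8);
```

The same four characters occupy four bytes in Latin-1 and five in UTF-8, which is why `length` only means "characters" after decoding.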
Because UTF-8 is the internal form Perl uses for character strings,
it has a fast path: `encode_utf8` and `decode_utf8` skip the full
encoding machinery and just flip or validate the `SVf_UTF8` flag.
Three low-level helpers let you poke at that flag directly:
`is_utf8` queries it, `_utf8_on` forces it on, `_utf8_off` forces
it off. Use those only when you know what you are doing — they
change how Perl interprets the bytes already in the scalar without
touching the bytes themselves.
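
A short sketch of the UTF-8 fast path, assuming the stock `encode_utf8`, `decode_utf8`, and `is_utf8` functions. It only inspects the flag and never forces it.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8 is_utf8);

my $chars = "\x{263A}";            # one character: U+263A
my $bytes = encode_utf8($chars);   # its three UTF-8 bytes: E2 98 BA
my $back  = decode_utf8($bytes);   # one character again

printf "chars: length %d, flag %s\n",
    length($chars), (is_utf8($chars) ? 'on' : 'off');
printf "bytes: length %d, flag %s\n",
    length($bytes), (is_utf8($bytes) ? 'on' : 'off');
printf "back:  length %d, flag %s\n",
    length($back), (is_utf8($back) ? 'on' : 'off');
```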
Every conversion takes an optional `$check` bitmask that controls
what happens when a character cannot be represented in the target
encoding or, when decoding, when the input bytes are malformed.
The predefined values are `FB_DEFAULT` (substitute with
`?` or the encoding’s replacement character), `FB_CROAK` (die),
`FB_QUIET` (stop and return the converted prefix), `FB_WARN`
(like `FB_QUIET`, but also warn), `FB_HTMLCREF` (substitute with
`&#NNNN;`), `FB_XMLCREF` (substitute with `&#xHHHH;`), and
`FB_PERLQQ` (substitute with `\x{HHHH}`). `LEAVE_SRC`,
`STOP_AT_PARTIAL`, `PERLQQ`, `WARN_ON_ERR`, and
`ONLY_PRAGMA_WARNINGS` are the raw bits you OR together to build
custom check values.
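
The fallback modes are easiest to compare side by side. The sketch below assumes the stock `Encode` constants and encodes a string containing U+2603, which has no ASCII equivalent. `FB_CROAK` is combined with `LEAVE_SRC` here so that the input scalar is left alone; the constants are written fully qualified to avoid relying on what is exported.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $string = "a\x{2603}b";   # U+2603 cannot be represented in ASCII

my $default = encode('ascii', $string);                       # FB_DEFAULT
my $perlqq  = encode('ascii', $string, Encode::FB_PERLQQ);
my $html    = encode('ascii', $string, Encode::FB_HTMLCREF);
my $xml     = encode('ascii', $string, Encode::FB_XMLCREF);

# FB_CROAK dies, so trap the exception instead of letting it propagate.
my $ok = eval {
    encode('ascii', $string, Encode::FB_CROAK | Encode::LEAVE_SRC);
    1;
};
my $error = $ok ? '' : $@;

print "default: $default\n";   # a?b
print "perlqq:  $perlqq\n";    # a\x{2603}b
print "html:    $html\n";      # a&#9731;b
print "xml:     $xml\n";       # a&#x2603;b
print "croak:   $error";
```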
Encoding names are resolved through a registry. `find_encoding($name)`
returns a blessed encoding object you can call methods on;
`resolve_alias($name)` returns the canonical name as a string;
`encodings()` lists every name the registry knows about.
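
A brief sketch of the three registry lookups, assuming the stock `Encode` behaviour in which `encodings` is called as a class method and `':all'` asks for every available encoding rather than only the loaded ones.

```perl
use strict;
use warnings;
use Encode qw(find_encoding resolve_alias);

my $enc       = find_encoding('latin1');            # object, or undef if unknown
my $canonical = resolve_alias('latin1');            # plain string
my $unknown   = resolve_alias('no-such-encoding');  # false for unknown names

print "object name: ", $enc->name, "\n";   # iso-8859-1
print "canonical:   $canonical\n";         # iso-8859-1
print "unknown alias resolves to a false value\n" unless $unknown;

my @names = Encode->encodings(':all');
print scalar(@names), " encodings available\n";
```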
`from_to($octets, $from, $to)` is a one-shot in-place conversion
useful when all you want is to re-encode a byte string, for
example rewriting a file body from Latin-1 to UTF-8 without
handling the intermediate character string yourself. It handles BOM-tagged and
MIME-tagged inputs when paired with `find_mime_encoding`.
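
A minimal sketch of the in-place conversion, assuming the stock `from_to`, which overwrites its first argument and returns the new length in bytes.

```perl
use strict;
use warnings;
use Encode qw(from_to);

my $octets = "caf\xE9";   # "café" as four Latin-1 bytes

# $octets itself is rewritten; the return value is the new byte length.
my $length = from_to($octets, 'ISO-8859-1', 'UTF-8');

printf "now %d bytes: %s\n", $length, join ' ', map { sprintf '%02X', ord } split //, $octets;
```

Because the scalar is modified in place, pass a copy if the original bytes are still needed.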
## Functions
### Encode/decode
#### [`native_encode`](Encode/native_encode.md)
Turn a character string into a byte string in the named encoding.
#### [`native_decode`](Encode/native_decode.md)
Turn a byte string in the named encoding into a Perl character string.
#### [`native_encode_utf8`](Encode/native_encode_utf8.md)
Fast path for encoding a string to UTF-8 bytes.
#### [`native_decode_utf8`](Encode/native_decode_utf8.md)
Fast path for decoding UTF-8 bytes to a Perl character string.
### UTF-8 flags
#### [`native_is_utf8`](Encode/native_is_utf8.md)
Return true if the scalar carries the `SVf_UTF8` flag.
#### [`native_utf8_on`](Encode/native_utf8_on.md)
Force `SVf_UTF8` on in place, without touching the underlying bytes.
#### `native_utf8_off`
Force `SVf_UTF8` off in place, without touching the underlying bytes.
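
The sketch below shows what "without touching the underlying bytes" means in practice. It assumes the stock `Encode::_utf8_off` and `Encode::_utf8_on` helpers and works on a copy so the original string is not disturbed.

```perl
use strict;
use warnings;
use Encode ();

my $string = "\x{263A}";   # one character, stored as three UTF-8 bytes
my $copy   = $string;

Encode::_utf8_off($copy);          # same bytes, now read as three bytes
my $length_off = length($copy);

Encode::_utf8_on($copy);           # same bytes, read as one character again
my $length_on = length($copy);

print "flag off: length $length_off\n";   # 3
print "flag on:  length $length_on\n";    # 1
```

Forcing the flag on is only safe when the bytes are already valid UTF-8, as they are here; for untrusted input, `decode` is the checked alternative.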
### Encoding registry
#### [`native_find_encoding`](Encode/native_find_encoding.md)
Look up an encoding by name and return an object you can call methods on.
#### `native_resolve_alias`
Return the canonical encoding name for an alias, or `undef` if unknown.
#### [`native_encodings`](Encode/native_encodings.md)
Return the list of encoding names the registry knows about.
#### [`native_obj_encode`](Encode/native_obj_encode.md)
Method form of `encode` on an encoding object.
#### [`native_obj_decode`](Encode/native_obj_decode.md)
Method form of `decode` on an encoding object.
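
The method forms are convenient when the same encoding is applied many times, since the name is resolved once. A sketch, assuming the stock `find_encoding` object interface; the sample words are arbitrary.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Resolve the name once, then reuse the object.
my $utf8 = find_encoding('UTF-8') or die "UTF-8 encoding not available";

my @words   = ("na\x{EF}ve", "\x{2603}");
my @encoded = map { $utf8->encode($_) } @words;     # character strings to bytes
my @decoded = map { $utf8->decode($_) } @encoded;   # bytes back to characters

printf "%d chars -> %d bytes\n", length($words[$_]), length($encoded[$_])
    for 0 .. $#words;
```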
#### `native_obj_name`
Return the canonical name of an encoding object as a string.
#### `native_obj_renew`
Return an encoding object that is safe to use independently; for stateless encodings this is effectively a no-op that returns `$self`.
#### `native_obj_perlio_ok`
Return true if the encoding is safe to stack as a PerlIO layer.
### MIME/XML helpers
#### `native_fb_htmlcref`
`CHECK` value `520`: replace unencodable characters with HTML decimal character references (`&#NNNN;`).
#### `native_fb_xmlcref`
`CHECK` value `1032`: replace unencodable characters with XML hexadecimal character references (`&#xHHHH;`).
### Conversion
#### [`native_from_to`](Encode/native_from_to.md)
Reencode a byte string in place from one encoding to another.
### Utilities
#### `native_fb_default`
`CHECK` value `0` — substitute unencodable characters with the encoding’s default replacement (usually `?` or U+FFFD).
#### `native_fb_croak`
`CHECK` value `1` — die on the first unencodable character or invalid byte sequence.
#### `native_fb_quiet`
`CHECK` value `4` — stop at the first unencodable character and return the encoded prefix; truncate the input to what was not consumed (method form only).
#### `native_fb_warn`
`CHECK` value `6`: warn on unencodable input, then stop and return the prefix converted so far, as `FB_QUIET` does.
#### `native_fb_perlqq`
`CHECK` value `264` — replace unencodable characters with Perl `\x{HHHH}` escape sequences.
#### `native_leave_src`
`CHECK` bit `8` — when OR’d into `$check`, keeps the input scalar untouched; the default is to consume its successfully-encoded prefix.
#### `native_stop_at_partial`
`CHECK` bit `2048` — stop at a partial trailing multi-byte sequence rather than reporting it as an error. Useful for streaming decoders.
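
A sketch of a streaming decoder built on this bit, assuming the stock `decode` function and `Encode::STOP_AT_PARTIAL`. The chunk boundary is placed inside a three-byte character on purpose; the buffer names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# U+2603 is E2 98 83 in UTF-8; the chunk boundary falls inside it.
my @chunks = ("a\xE2\x98", "\x83b");

my $pending = '';   # undecoded bytes carried over between chunks
my $text    = '';   # characters decoded so far

for my $chunk (@chunks) {
    $pending .= $chunk;
    # LEAVE_SRC is not set, so consumed bytes are removed from $pending
    # and only an incomplete trailing sequence stays behind.
    $text .= decode('UTF-8', $pending, Encode::STOP_AT_PARTIAL);
}

printf "%d characters decoded, %d bytes pending\n", length($text), length($pending);
```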
#### `native_perlqq`
`CHECK` bit `256` — the raw bit behind `FB_PERLQQ`; OR it into your own check mask for `\x{HHHH}` substitution.
#### `native_warn_on_err`
`CHECK` bit `2`: emit a warning on encoding errors. Combine it with a substitution bit to build custom fallback behaviour.
#### `native_only_pragma_warnings`
`CHECK` bit `16` — emit encoding warnings only when the caller has `use warnings 'utf8'` (or equivalent) active, rather than unconditionally.
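
As an illustration of how the raw bits compose, the sketch below rebuilds `FB_PERLQQ` from its parts and then adds `WARN_ON_ERR` to get a custom mask. It assumes the stock `Encode` constants; the warning handler is only there to make the warning observable.

```perl
use strict;
use warnings;
use Encode qw(encode);

# FB_PERLQQ is the PERLQQ substitution bit plus LEAVE_SRC (256 | 8 = 264).
my $rebuilt = Encode::PERLQQ | Encode::LEAVE_SRC;
print "FB_PERLQQ rebuilt from bits\n" if $rebuilt == Encode::FB_PERLQQ;

# Custom mask: substitute with \x{HHHH}, keep the source, and warn as well.
my $check = Encode::PERLQQ | Encode::LEAVE_SRC | Encode::WARN_ON_ERR;

my @warnings;
my $out = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };   # collect any warnings
    encode('ascii', "a\x{2603}b", $check);
};

print "output:   $out\n";
print "warnings: ", scalar(@warnings), "\n";
```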