native_encode#
Turn a character string into a byte string in the named encoding.
Synopsis#
my $bytes = encode($encoding, $string);
my $bytes = encode($encoding, $string, $check);
What you get back#
A scalar holding raw bytes. The SVf_UTF8 flag on the result is
always off: this is the form you write to files, sockets, and
pipes. The input $string is treated as a sequence of Unicode
codepoints regardless of how it is internally represented.
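A minimal sketch of the two guarantees above, assuming the native implementation is loaded under the usual Encode name and matches upstream for UTF-8 (the variable names are illustrative). The same text, stored with and without the internal UTF-8 flag, should encode to identical bytes, and the result should not carry the flag.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same four characters, two internal representations.
my $native   = "caf\x{e9}";   # stored one byte per character, SVf_UTF8 off
my $upgraded = "caf\x{e9}";
utf8::upgrade($upgraded);     # stored as UTF-8 internally, SVf_UTF8 on

my $bytes_a = encode('UTF-8', $native);
my $bytes_b = encode('UTF-8', $upgraded);

print "identical bytes\n" if $bytes_a eq $bytes_b;
print "flag off\n"        if !utf8::is_utf8($bytes_a);
printf "%d bytes\n", length $bytes_a;
```

utf8::upgrade and utf8::is_utf8 are core functions that are available without loading the utf8 pragma; they are used here only to inspect and change the internal representation.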
If $encoding is unknown, encode croaks with
Unknown encoding '...'.
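Because an unknown name is fatal, callers that take encoding names from configuration or user input may want to trap the error. A sketch using a plain eval block; the name no-such-encoding is made up for the example and is assumed not to be registered:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $name  = 'no-such-encoding';   # hypothetical name, assumed unregistered
my $bytes = eval { encode($name, "hello") };
my $error = $@;

if (!defined $bytes) {
    # croak appends the caller's file and line to the message
    print "encode failed: $error";
}
```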
The optional $check argument controls what happens when a
character cannot be represented in the target encoding:
- FB_DEFAULT (0, the default): substitute with ? or the encoding’s replacement character.
- FB_CROAK: die on the first unencodable character.
- FB_QUIET: stop at the first unencodable character and return the encoded prefix. In the method form, the consumed prefix is also removed from $string.
- FB_WARN: warn and substitute.
- FB_PERLQQ: substitute with \x{HHHH}.
- FB_HTMLCREF: substitute with &#NNNN;.
- FB_XMLCREF: substitute with &#xHHHH;.
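The substitution modes and the in-place behaviour of FB_QUIET can be compared side by side. This is a sketch that assumes upstream-compatible behaviour for ASCII. find_encoding is the upstream way to obtain an encoding object for the method form; its availability in the native build is an assumption.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding FB_QUIET FB_PERLQQ FB_HTMLCREF FB_XMLCREF);

my $text = "caf\x{e9}";   # U+00E9 has no ASCII byte

# Substitution modes: each call gets its own copy, so $text is untouched
# whatever the check mode does to its argument.
my %subst;
for my $mode ([perlqq => FB_PERLQQ], [html => FB_HTMLCREF], [xml => FB_XMLCREF]) {
    my ($label, $check) = @$mode;
    my $copy = $text;
    $subst{$label} = encode('ascii', $copy, $check);
    print "$label: $subst{$label}\n";
}

# FB_QUIET, method form: returns the encodable prefix and leaves the
# unconsumed remainder in the source variable.
my $buffer = "abc\x{e9}def";
my $prefix = find_encoding('ascii')->encode($buffer, FB_QUIET);
printf "prefix=%s, %d character(s) left in buffer\n", $prefix, length $buffer;
```

The remainder left in $buffer starts at the character that could not be encoded, so a caller can inspect it, handle it separately, and resume encoding from there.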
Examples#
Encode a string to UTF-8 bytes for writing to a file:
my $bytes = encode('UTF-8', "caf\x{e9}");
## $bytes is "caf\xc3\xa9" (5 bytes, no SVf_UTF8)
Encode to Latin-1, losing characters that don’t fit:
my $bytes = encode('iso-8859-1', "\x{20ac}"); # Euro sign
## $bytes is "?" — U+20AC has no Latin-1 byte
Die if anything can’t be encoded:
use Encode qw(encode FB_CROAK);
my $bytes = encode('ascii', "caf\x{e9}", FB_CROAK);
## dies: "\x{e9}" does not map to ascii
Edge cases#
- undef input returns an empty byte string.
- Input without SVf_UTF8 is treated as Latin-1: each byte is taken as the codepoint of the same value and encoded accordingly.
- Encoding "null" passes input bytes through unchanged.
Differences from upstream#
Fully compatible with upstream for ASCII, Latin-1, CP1252, the
ISO-8859 family, and UTF-8. Shift_JIS, EUC-JP, and other
multi-byte encodings are not yet registered in the static table
and fall back to the Latin-1 identity mapping. Covered by
t/81-xs-native/Encode/040-encode-utf8-latin1.t and
t/81-xs-native/Encode/060-check-parameter.t.
See also#
- decode: the inverse, bytes to string.
- encode_utf8: the UTF-8-only fast path.
- from_to: reencode in place without materialising a character string in between.