native_decode#
Turn a byte string in the named encoding into a Perl character string.
Synopsis#
use Encode qw(decode);
my $string = decode($encoding, $bytes);
my $string = decode($encoding, $bytes, $check);
What you get back#
A scalar holding a Perl character string — the SVf_UTF8 flag is
set on the result. Each element of that string is one Unicode
codepoint; indexing with substr and counting with length
both operate on characters, not bytes.
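As a rough illustration of the byte/character distinction, the sketch below compares the same data before and after decoding. It assumes decode is imported from Encode, as in the examples on this page; utf8::is_utf8 is the core Perl function for inspecting the flag.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes  = "caf\xc3\xa9";             # five bytes: "café" encoded as UTF-8
my $string = decode('UTF-8', $bytes);   # four characters

my $byte_len = length $bytes;           # length of the undecoded byte string
my $char_len = length $string;          # length of the decoded character string
my $last     = substr $string, 3, 1;    # one character, not one byte
my $flagged  = utf8::is_utf8($string);  # true when SVf_UTF8 is set

printf "%d bytes, %d chars, last is U+%04X, flagged: %s\n",
    $byte_len, $char_len, ord $last, $flagged ? 'yes' : 'no';
```

The same substr call on $bytes would return the lone byte \xc3, which is why decoding before any text processing matters.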
If $encoding is unknown, decode croaks with
Unknown encoding '...'.
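A minimal sketch of handling that failure, assuming the error is raised as an ordinary Perl exception that eval can trap. The encoding name 'no-such-encoding' is made up purely to trigger the error.

```perl
use strict;
use warnings;
use Encode qw(decode);

# 'no-such-encoding' is a hypothetical name that no encoding is registered under.
my $string = eval { decode('no-such-encoding', "abc") };
my $error  = $@;

if (!defined $string) {
    # $error holds the croak message, including the offending name.
    warn "decode failed: $error";
}
```

Trapping the error this way is useful when the encoding name comes from untrusted input, such as a charset parameter in a header.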
The optional $check argument controls what happens when the
bytes are not valid in the source encoding:
- FB_DEFAULT (0, the default): substitute invalid sequences with U+FFFD (the Unicode replacement character).
- FB_CROAK: die on the first invalid byte.
- FB_QUIET: decode the valid prefix and stop. In the method form, the consumed prefix is also removed from $bytes.
- FB_WARN: warn and substitute with U+FFFD.
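The difference between the default and FB_QUIET can be sketched as follows. This is an illustration under assumptions, not a statement of the implementation: it uses find_encoding from upstream Encode to reach the method form, and assumes that function is available here as well.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_QUIET);

# FB_DEFAULT: the stray byte \xff is replaced and decoding continues.
my $lenient = decode('UTF-8', "a\xffb");

# FB_QUIET, method form: decoding stops at the first invalid byte and
# the consumed prefix is removed from $bytes.
my $bytes  = "ab\xffcd";
my $prefix = find_encoding('UTF-8')->decode($bytes, FB_QUIET);

# Show the code points of the lenient result and what FB_QUIET left behind.
print join(' ', map { sprintf 'U+%04X', ord } split //, $lenient), "\n";
printf "decoded '%s', %d byte(s) left unconsumed\n", $prefix, length $bytes;
```

Because FB_QUIET leaves the undecoded remainder in $bytes, it suits streaming input where a multi-byte sequence may be split across reads.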
Examples#
Decode UTF-8 bytes, such as those read from a file:
my $string = decode('UTF-8', "caf\xc3\xa9");
## length($string) == 4, fourth char is U+00E9
Decode CP1252 bytes, turning Windows “smart quotes” into Unicode:
my $string = decode('cp1252', "\x93hi\x94");
## $string is "\x{201c}hi\x{201d}"
Die on malformed UTF-8:
use Encode qw(decode FB_CROAK);
my $string = decode('UTF-8', "\xc3\x28", FB_CROAK);
## dies: utf8 "\xC3" does not map to Unicode
Edge cases#
- undef input returns an empty character string.
- Encoding "null" passes input bytes through unchanged.
- A byte string that is already valid UTF-8 is re-tagged with SVf_UTF8 without reallocating.
Differences from upstream#
Fully compatible with upstream for ASCII, Latin-1, CP1252, the
ISO-8859 family, and UTF-8. Covered by
t/81-xs-native/Encode/010-basic.t and
t/81-xs-native/Encode/090-decode-inplace.t.
See also#
- encode: the inverse, string to bytes.
- decode_utf8: the UTF-8-only fast path.
- from_to: reencode in place without materialising a character string in between.