native_decode_utf8#

Fast path for decoding UTF-8 bytes to a Perl character string.

Synopsis#

my $string = decode_utf8($bytes);
my $string = decode_utf8($bytes, $check);

Equivalent to decode('UTF-8', $bytes), but skips the encoding registry lookup. The input is fully validated as UTF-8. By default, each invalid sequence is replaced with U+FFFD; pass a $check value such as FB_CROAK to die on invalid input instead.
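The equivalence and the default replacement behaviour can be checked with a short script. This is a minimal sketch that assumes decode and decode_utf8 are both importable from Encode, as in the examples on this page; the sample byte strings are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

my $bytes = "caf\xc3\xa9";

# The fast path and the general form should agree on well-formed input.
my $fast    = decode_utf8($bytes);
my $general = decode('UTF-8', $bytes);
print "same result\n" if $fast eq $general;

# With no $check, the stray lead byte \xc3 is replaced instead of being fatal.
my $repaired = decode_utf8("\xc3\x28");
print "contains U+FFFD\n" if index($repaired, "\x{FFFD}") >= 0;
```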

What you get back#

A character string with the SVf_UTF8 flag set. length on the result counts characters, not bytes.
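To see the difference between the byte count of the input and the character count of the result, something like the following can be used. It is a sketch that assumes is_utf8 is imported from Encode alongside decode_utf8; the input contains one two-byte sequence so the two lengths differ.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 is_utf8);

my $bytes  = "caf\xc3\xa9";          # 5 bytes: c, a, f, and a two-byte sequence
my $string = decode_utf8($bytes);    # 4 characters

printf "bytes: %d, characters: %d\n", length($bytes), length($string);
print "flag set\n" if is_utf8($string);
```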

Examples#

use Encode qw(decode_utf8);
my $string = decode_utf8("caf\xc3\xa9");
# length($string) == 4

use Encode qw(decode_utf8 FB_CROAK);
my $string = decode_utf8("\xc3\x28", FB_CROAK);
# dies on the invalid sequence
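When the input comes from an untrusted source, the FB_CROAK form is usually wrapped so the failure can be handled. The sketch below uses a plain eval block; the variable names and the message text are illustrative, and the input is held in a variable, not passed as a literal, as a precaution in case the call writes back to its argument when $check is set.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 FB_CROAK);

my $bytes = "\xc3\x28";    # \xc3 must be followed by a continuation byte

# eval returns undef if decode_utf8 dies; the reason is left in $@.
my $string = eval { decode_utf8($bytes, FB_CROAK) };
my $error  = $@;

if (!defined $string) {
    print "decode failed\n";
}
```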

Edge cases#

  • undef returns an empty character string.

  • Input that is already valid UTF-8 is copied and flagged; no byte-level conversion happens.
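For well-formed input, one way to observe that the octets pass through unchanged is to re-encode the result and compare it with the original bytes. This is only a sketch, assuming encode_utf8 is available from Encode as listed under See also; it shows the observable round trip, not the internal copy itself.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

my $bytes  = "caf\xc3\xa9";
my $string = decode_utf8($bytes);

# No $check was passed, so the source scalar is left as it was.
print "input unchanged\n" if $bytes eq "caf\xc3\xa9";

# Re-encoding the character string gives back the same octets.
my $round_trip = encode_utf8($string);
print "round trip ok\n" if $round_trip eq $bytes;
```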

Differences from upstream#

Fully compatible with upstream. Covered by t/81-xs-native/Encode/010-basic.t.

See also#

  • encode_utf8 — the inverse.

  • decode — the general form.

  • is_utf8 — check whether a scalar already carries the flag.