native_encode_utf8#

Fast path for encoding a string to UTF-8 bytes.

Synopsis#

my $bytes = encode_utf8($string);

Equivalent to encode('UTF-8', $string) but skips the encoding registry lookup. Prefer this when you know UTF-8 is what you want.

What you get back#

A byte string (no SVf_UTF8). If $string already had the SVf_UTF8 flag, the bytes are copied as-is and the flag is cleared on the copy. If $string was a raw byte string (Latin-1), its bytes are first upgraded to their UTF-8 multi-byte form.

Examples#

my $bytes = encode_utf8("caf\x{e9}");

## $bytes is "caf\xc3\xa9"

my $bytes = encode_utf8("\xe9");   # raw Latin-1 input

## $bytes is "\xc3\xa9"

Edge cases#

  • undef returns an empty byte string.

  • Idempotent only when applied to a character string; applying it twice mojibakes the result.

Differences from upstream#

Fully compatible with upstream.

See also#

  • decode_utf8 — the inverse.

  • encode — the general form.

  • is_utf8 — check whether a scalar already carries the flag.