Does "endian-ness" apply to UTF-8 characters that use multiple bytes?

Clive Hohberger via Unicode unicode at unicode.org
Mon Feb 4 13:47:49 CST 2019


Asmus,
I believe it also applies to the bit order in the bytes.
I believe UTF-16 and UTF-32 are transmitted as single 16-bit or 32-bit numbers.
UTF-8 is a stream of 8-bit numbers, so there is no multi-byte unit whose bytes could be swapped.
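
A minimal sketch of that difference, assuming Python 3.8+ and its standard codec names ("utf-8", "utf-16-be", and so on). The variable names are only illustrative. It encodes é (U+00E9) in each form and prints the bytes in the order they would be written to a file:

```python
# Encode U+00E9 with Python's standard codecs and show the bytes in stream order.
text = "\u00e9"  # é

encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # bytes.hex(" ") separates the bytes with spaces (Python 3.8+).
    print(f"{name:10} {data.hex(' ')}")

# utf-8      c3 a9
# utf-16-be  00 e9
# utf-16-le  e9 00
# utf-32-be  00 00 00 e9
# utf-32-le  e9 00 00 00
```

Only the UTF-16 and UTF-32 rows change with byte order. There is a single "utf-8" row because the encoding fixes the order of its bytes.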

Clive

*Clive P. Hohberger, PhD MBA*
Managing Director
Clive Hohberger, LLC
+1 847 910 8794
cph13 at case.edu

*Inventor of the Ultracode Bar Code Symbology*
*2017 Label Industry Global Award for Innovation*


On Mon, Feb 4, 2019 at 1:29 PM Asmus Freytag via Unicode <unicode at unicode.org> wrote:

> On 2/4/2019 11:21 AM, Costello, Roger L. via Unicode wrote:
>
> Hello Unicode Experts!
>
> As I understand it, endian-ness applies to multi-byte words.
>
> Endian-ness does not apply to ASCII characters because each character is a single byte.
>
> Endian-ness does apply to UTF-16BE (Big-Endian), UTF-16LE (Little-Endian), UTF-32BE and UTF-32LE because each character uses multiple bytes.
>
> Clearly endian-ness does not apply to single-byte UTF-8 characters. But what about UTF-8 characters that use multiple bytes, such as the character é, which uses two bytes C3 and A9; does endian-ness apply? For example, if a file is in Little Endian would the character é appear in a hex editor as A9 C3 whereas if the file is in Big Endian the character é would appear in a hex editor as C3 A9?
>
> /Roger
>
>
>
> UTF-8 is a byte stream. Therefore, the order of bytes in a multiple byte
> integer does not come into it.
>
> A./
>
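
To make the byte-stream point concrete for Roger's example, here is a rough sketch of how a two-byte UTF-8 sequence is decoded. The function name decode_two_byte_utf8 is hypothetical, and the sketch skips the overlong-form checks a real decoder needs. The lead byte and the continuation byte have different bit patterns, so their order is part of the encoding itself:

```python
def decode_two_byte_utf8(lead: int, cont: int) -> int:
    """Return the code point for a two-byte UTF-8 sequence (illustrative only)."""
    # A two-byte lead byte has the form 110xxxxx.
    if lead & 0b1110_0000 != 0b1100_0000:
        raise ValueError(f"{lead:#04x} is not a two-byte lead byte")
    # A continuation byte has the form 10xxxxxx.
    if cont & 0b1100_0000 != 0b1000_0000:
        raise ValueError(f"{cont:#04x} is not a continuation byte")
    # Five payload bits from the lead byte, six from the continuation byte.
    # Note: overlong forms (lead bytes C0 and C1) are not rejected here.
    return ((lead & 0b0001_1111) << 6) | (cont & 0b0011_1111)


code_point = decode_two_byte_utf8(0xC3, 0xA9)
print(f"U+{code_point:04X}")  # U+00E9

# Swapping the bytes does not give a "little-endian" é; it is simply invalid.
try:
    decode_two_byte_utf8(0xA9, 0xC3)
    swapped_ok = True
except ValueError:
    swapped_ok = False
print(swapped_ok)  # False
```

So a hex editor shows C3 A9 for é in a UTF-8 file on any machine. A9 C3 is not an alternative byte order, because A9 cannot start a sequence.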