<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 10/11/2020 9:28 PM, James Kass via
Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:40ba9ef0-6618-24f0-f3e6-fb06c0dbf753@code2001.com">
<br>
<br>
On 2020-10-12 3:37 AM, Tom Honermann via Unicode wrote:
<br>
<blockquote type="cite">On 10/11/20 11:32 PM, JF Bastien wrote:
<br>
<blockquote type="cite">It’s a bit odd: if you assume the
default is ascii then you don’t need this. If you assume the
default is utf8 then you don’t need this... so when do you
need the BOM? It seems like making bad prior choices more
acceptable... even though they were bad choices. I’m not sure
it’s a good idea.
<br>
</blockquote>
<br>
A BOM would be needed when:
<br>
<br>
1. The default encoding is ASCII-based (ISO-8859-1,
Windows-1252, etc.) and the UTF-8 text to be produced contains
non-ASCII characters. Or,
<br>
2. The default encoding is not ASCII-based (e.g., EBCDIC).
<br>
<br>
Both of these cases presume that the default encoding can't be
made UTF-8 for backward compatibility reasons.
<br>
<br>
Tom.
<br>
</blockquote>
<br>
<br>
1. UTF-8 text consists only of ASCII characters. Even if some
ASCII strings reference non-ASCII characters. <br>
</blockquote>
<p>Come again?</p>
<p>The UTF-8 encoding of an ASCII character is that same single
ASCII byte. Every other character, however, is encoded as a
sequence consisting only of non-ASCII code units: bytes with the
high bit set.</p>
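<p>A minimal sketch in Python of the point above; the helper names
<code>utf8_bytes</code> and <code>high_bit_set</code> are made up
for illustration. It encodes a few strings and checks which code
units have the high bit set.</p>

```python
def utf8_bytes(text):
    """Return the UTF-8 code units (bytes) of text as a list of ints."""
    return list(text.encode("utf-8"))


def high_bit_set(byte):
    """True for non-ASCII code units, i.e. byte values 0x80..0xFF."""
    return byte >= 0x80


# ASCII characters encode to themselves, one byte each.
ascii_units = utf8_bytes("BOM")        # [0x42, 0x4F, 0x4D]

# U+00E9 (e with acute) becomes two bytes, both with the high bit set.
e_acute_units = utf8_bytes("\u00e9")   # [0xC3, 0xA9]

# U+20AC (euro sign) becomes three bytes, all with the high bit set.
euro_units = utf8_bytes("\u20ac")      # [0xE2, 0x82, 0xAC]

for label, units in [("BOM", ascii_units),
                     ("U+00E9", e_acute_units),
                     ("U+20AC", euro_units)]:
    flags = [high_bit_set(b) for b in units]
    print(label, [hex(b) for b in units], flags)
```

<p>No byte in the encoding of a non-ASCII character falls in the
ASCII range, so such text cannot be said to consist "only of ASCII
characters".</p>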
<p>Given that your premise is incorrect, do any of the conclusions
that follow still apply?</p>
<p>A./<br>
</p>
<blockquote type="cite"
cite="mid:40ba9ef0-6618-24f0-f3e6-fb06c0dbf753@code2001.com">
It's the same idea as HTML numeric character references which
point to non-ASCII characters while being composed of ASCII
characters. It shouldn't matter whether a string of ASCII digits
forms the character number or a string of UTF-8 hex bytes forms that
number. A Unicode-aware application will display the string as a
special character while legacy applications will show the string
as mojibake. Either way, UTF-8 remains an ASCII-preserving
encoding format.
<br>
<br>
2. Files using non-standard encodings should be converted to
Unicode.
<br>
<br>
Any plain-text file should be presumed to be UTF-8 unless marked
otherwise.
<br>
<br>
Years ago, the UTF-8 signature was sometimes considered helpful.
Nowadays it seems to be more of an anachronism.
<br>
</blockquote>
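<p>For reference, a small Python sketch of what the UTF-8 signature
looks like at the byte level. The helper names
<code>has_utf8_signature</code> and <code>decode_utf8</code> are
hypothetical; <code>codecs.BOM_UTF8</code> and the
<code>utf-8-sig</code> codec are part of the Python standard
library. Windows-1252 stands in here for an ASCII-based default
encoding.</p>

```python
import codecs

# The UTF-8 signature (BOM) is U+FEFF encoded in UTF-8: EF BB BF.
UTF8_SIGNATURE = codecs.BOM_UTF8


def has_utf8_signature(data):
    """True if the byte string starts with the UTF-8 signature."""
    return data.startswith(UTF8_SIGNATURE)


def decode_utf8(data):
    """Decode UTF-8 bytes, dropping a leading signature if present."""
    # "utf-8-sig" behaves like "utf-8" when there is no signature.
    return data.decode("utf-8-sig")


without_sig = "caf\u00e9".encode("utf-8")
with_sig = UTF8_SIGNATURE + without_sig

print(has_utf8_signature(with_sig), has_utf8_signature(without_sig))
print(decode_utf8(with_sig) == decode_utf8(without_sig))

# A consumer that assumes Windows-1252 sees mojibake for both the
# signature and the non-ASCII character.
print(with_sig.decode("cp1252"))
```

<p>The signature is itself three non-ASCII bytes, so a consumer that
does not recognize it will display it as stray characters at the
start of the text.</p>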
</body>
</html>