Italics get used to express important semantic meaning, so unicode should support them

Kent Karlsson kent.b.karlsson at bahnhof.se
Wed Dec 16 18:47:02 CST 2020



> On 16 Dec 2020, at 06:49, Asmus Freytag via Unicode <unicode at unicode.org> wrote:
> 
> On 12/15/2020 8:19 PM, David Starner via Unicode wrote:
>> On Tue, Dec 15, 2020 at 4:47 PM Sławomir Osipiuk via Unicode
>> <unicode at unicode.org> wrote:
>>> "Implementations of Unicode that already make use of out-of-band
>>> mechanisms for language [or format] tagging or “heavy-weight” in-band
>>> mechanisms such as XML or HTML will continue to do exactly what they
>>> are doing and will ignore the tag characters completely. They may even
>>> prohibit their use to prevent conflicts with the equivalent markup."
>> So every single thing that interfaces with HTML now has to handle
>> Unicode italics on any plain text input, or silently dump them into
>> the stream, and the web browser may have to handle them or not.
> ^^^That.

Let me paraphrase:

“So every single thing that interfaces with HTML now has to handle RTF italics on any plain text input,
or silently dump them into the stream, and the web browser may have to handle them or not.”

You would not use that as an argument to say that RTF (which I picked just because it is well known)
should be wiped from the face of the Earth, would you? I would think not… (You may want to wipe RTF from
the face of the Earth, I don’t know, but you would not use that argument even if you did want that.)

Even though the term “plain text formatting” (or, worse, “Unicode formatting”) is used in these threads,
it is of course somewhat misleading. I don’t think these proposals should be applied to text data of the
type “text/plain” (or, as a filename suffix, “.txt”), nor to things such as filenames themselves, and
certainly not to “text/html”/“.html”, “application/pdf”/“.pdf”, “application/rtf”/“.rtf”, etc. One should
instead use (a) new file type(s), POSSIBLY (if one can agree on a single one) even applying it to
“text/plain”/“.txt”, but not to HTML, RTF, etc. Nor, I would say, should it apply to filenames or similar;
such markup should not even be permitted in filenames and similar (note: “should...”, not “are...”).

The point is that the markup would be default-ignorable, and thus normally “invisible” when not
interpreted, even in a “plain” text file. Granted, the ECMA-48 approach (if not mapped to TAG
characters) would require somewhat “extending” the default-ignorability property to certain follow-on
characters (which are normally printable) after ESC and CSI. Terminal emulators do that all the time,
and have done so for decades, so it is nothing revolutionary. In other words, the aim is that the markup
should not “hijack” normal printable characters for its syntax; if ECMA-48 were designed today, I think
it would use default-ignorable characters throughout the ESC and CSI sequences, not just for the lead
character. (Another point, I think, is that no out-of-band style sheets are used. Some also argue for
excessive “bare-bones-ness”, but I don’t agree with that.)
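A minimal Python sketch may make the “hijacking” point concrete. It is only an illustration under
stated assumptions, not part of any proposal. It uses the ECMA-48 SGR parameters 3 (italicized) and 23
(not italicized), introduced by the 7-bit CSI form ESC [. The function names are hypothetical, and the
“renderer” is a toy that interprets no markup, hiding only ESC and characters in the Tags block
(U+E0000..U+E007F, which are default-ignorable). The TAG variant keeps ESC as the lead character and maps
each following printable ASCII character to its TAG counterpart (U+E0000 plus the ASCII code), which is
just one possible reading of “mapping to TAG characters”.

```python
ESC = "\x1b"
CSI = ESC + "["  # 7-bit representation of CSI (Control Sequence Introducer)

# ECMA-48 SGR (Select Graphic Rendition) sequences.
ITALIC_ON = CSI + "3m"    # SGR 3: italicized
ITALIC_OFF = CSI + "23m"  # SGR 23: not italicized, not fraktur


def to_tags(s: str) -> str:
    """Hypothetical mapping: each printable ASCII character (0x20..0x7E)
    becomes its TAG counterpart at U+E0000 + its code point."""
    out = []
    for c in s:
        if not 0x20 <= ord(c) <= 0x7E:
            raise ValueError(f"no TAG counterpart for {c!r}")
        out.append(chr(0xE0000 + ord(c)))
    return "".join(out)


def is_tag(c: str) -> bool:
    """True for code points in the Tags block (U+E0000..U+E007F)."""
    return 0xE0000 <= ord(c) <= 0xE007F


def naive_render(text: str) -> str:
    """Toy renderer that interprets no markup: it hides ESC (not shown
    as a glyph) and TAG characters (default-ignorable), and shows every
    other character as-is."""
    return "".join(c for c in text if c != ESC and not is_tag(c))


# The same italic markup, in plain ECMA-48 form and in the TAG-mapped form.
plain = "This is " + ITALIC_ON + "important" + ITALIC_OFF + "."
tagged = ("This is " + ESC + to_tags("[3m") + "important"
          + ESC + to_tags("[23m") + ".")

if __name__ == "__main__":
    print(naive_render(plain))   # This is [3mimportant[23m.
    print(naive_render(tagged))  # This is important.
```

Without interpretation, the plain ECMA-48 text leaks the follow-on characters “[3m” and “[23m” into the
visible output, while the TAG-mapped version degrades to just the text.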

That is my take on this issue at least.

----
> hardcoding visual appearance is really the least helpful, because that totally
> undercuts the ability for style sheets to address presentation.

Yes, but… Regarding ECMA-48 (which we touched on in this thread): there the styling really is
“hardcoded”, and there are no style sheets. For ECMA-48 (which is still very much in use, and for which
extensions are being implemented), I don’t think it would be a good idea to introduce any (separate)
style sheets of any kind. It is not at all geared for that, and re-gearing it for that would not be a
good idea (IMHO). The same goes for any “plain text” (really, “low-level”) formatting proposal other than
ECMA-48. But for HTML and similar, fine; style sheets are great!

/Kent Karlsson
