Italics get used to express important semantic meaning, so Unicode should support them

Tue Dec 15 18:41:09 CST 2020

On Tue, Dec 15, 2020 at 6:26 PM Mark E. Shoulson <mark at kli.org> wrote:
>
> But how is that different from anything being proposed?  If this idea were accepted as part of Unicode, then it *would* be a feature of Unicode, just as whatever is being proposed would be if it were accepted.  How does it matter if italicizing something is marked by some new U+DEADBF characters or by existing tag characters?

- Rather than a completely new method, it's "just" an extension of an
existing feature. (Tag syntax, scope, and default ignorability are
already defined in the Unicode Standard.)
- The syntax "naturally" discourages complicated format nesting.
Unicode may also formally restrict format combinations.
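
To make the "existing feature" concrete: the tag characters are the
printable ASCII range shifted up by 0xE0000 (U+E0020 to U+E007E), and
U+E007F is CANCEL TAG. A minimal Python sketch of that mapping follows.
The function names are made up for illustration, and the "i" used as an
italic marker is purely hypothetical; the Standard defines no format
tags.

```python
TAG_BASE = 0xE0000          # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG


def to_tag_chars(ascii_text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to U+E0020..U+E007E."""
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(chr(TAG_BASE + ord(ch)) for ch in ascii_text)


def from_tag_chars(tag_text: str) -> str:
    """Spell a run of tag characters back out as ASCII."""
    for ch in tag_text:
        if not 0xE0020 <= ord(ch) <= 0xE007E:
            raise ValueError(f"not a tag character: U+{ord(ch):04X}")
    return "".join(chr(ord(ch) - TAG_BASE) for ch in tag_text)


# Hypothetical format tag: "i" as an italic marker (an assumption,
# not something the Unicode Standard defines).
italic_tag = to_tag_chars("i")
```

Because the mapping is a fixed offset, a spelled-out tag such as "en-US"
round-trips through from_tag_chars(to_tag_chars("en-US")) unchanged.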

> If you insist that Unicode-compliant text readers must show italics or bold when marked with such-and-such characters,

Absolutely not!

> Conversely, if you're okay with pseudo-markup, this should sound fine to you.  Why doesn't it?

"Not my first choice" is what I said. It's not bad, but its similarity
to HTML is not a good thing in my eyes, because it raises the question
"I can do this in HTML, why can't I do it in UnicodeML™?" and invites a
push for more and more HTML features to be included. It encourages
feature creep, which I said I'm against. Familiarity is not always a
good thing.

> (how would this markup interact with other markup, like HTML, I wonder?)

(From the Unicode Standard, page 916, with [] additions by me; notice
how little the text changes)

"The rules for Unicode conformance for the tag characters are exactly
the same as those for any other Unicode characters. A conformant
process is not required to interpret the tag characters. If it does
interpret them, it should interpret them according to the standard—
that is, as spelled-out tags. However, there is no requirement to
provide a particular interpretation of the text because it is tagged
with a given language [or formatting]. If an application does not
interpret tag characters, it should leave their values undisturbed and
do whatever it does with any other uninterpreted characters.
[...]
"Implementations of Unicode that already make use of out-of-band
mechanisms for language [or format] tagging or “heavy-weight” in-band
mechanisms such as XML or HTML will continue to do exactly what they
are doing and will ignore the tag characters completely. They may even
prohibit their use to prevent conflicts with the equivalent markup."
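
The behaviour the quoted text describes for a process that does not
interpret tags can be sketched in Python as below. This is only an
illustration under stated assumptions: the helper names are invented,
the "i" italic marker is hypothetical, and dropping tag characters at
display time is one possible reading of "default ignorable", not a
required rendering.

```python
TAGS_FIRST, TAGS_LAST = 0xE0000, 0xE007F   # the Tags block
CANCEL_TAG = "\U000E007F"                  # U+E007F CANCEL TAG


def is_tag_char(ch: str) -> bool:
    """True if ch lies in the Tags block (U+E0000..U+E007F)."""
    return TAGS_FIRST <= ord(ch) <= TAGS_LAST


def display_text(text: str) -> str:
    """What a non-interpreting renderer might show: tags are invisible."""
    return "".join(ch for ch in text if not is_tag_char(ch))


def store_text(text: str) -> str:
    """A non-interpreting process leaves the tag characters undisturbed."""
    return text


# Hypothetical markup: tag "i" opens the run, CANCEL TAG closes it.
hypothetical_italic = chr(0xE0000 + ord("i"))
marked = "an " + hypothetical_italic + "important" + CANCEL_TAG + " word"

shown = display_text(marked)   # "an important word"
kept = store_text(marked)      # identical to marked, tags intact
```

The point of the sketch is that the stored text and the displayed text
differ only by the invisible tag characters, so a reader that knows
nothing about format tags still shows the plain words.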

Sławomir Osipiuk