Encoding italic

Philippe Verdy via Unicode unicode at unicode.org
Mon Jan 28 02:19:18 CST 2019

So you used
<U+E003C,U+E0062,U+E003E>bold<U+E003C,U+E002F,U+E0062,U+E003E>,
i.e., you converted the full HTML sequences "<b>" and "</b>", including
the HTML element name, from ASCII to tag characters. I see little value
in that approach.
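
For reference, the conversion discussed here can be sketched in a few lines of Python. This is only an illustration, assuming the mapping is a plain offset of 0xE0000 added to each printable ASCII character; the names to_tag_chars and describe are hypothetical helpers, not part of any library or of the developer's software.

```python
TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at U+E0000 + code point


def to_tag_chars(ascii_text: str) -> str:
    """Map each printable ASCII character (U+0020..U+007E) to its tag counterpart."""
    out = []
    for ch in ascii_text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"no tag character for U+{cp:04X}")
        out.append(chr(cp + TAG_OFFSET))
    return "".join(out)


def describe(text: str) -> str:
    """Render a string in the <U+XXXX,...> notation used in this message."""
    return "<" + ",".join(f"U+{ord(ch):04X}" for ch in text) + ">"


print(describe(to_tag_chars("<b>")))   # <U+E003C,U+E0062,U+E003E>
print(describe(to_tag_chars("</b>")))  # <U+E003C,U+E002F,U+E0062,U+E003E>
```

ASCII "b" is U+0062, so its tag counterpart is U+E0062.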

Additionally, this means that U+E003C acts as the tag identifier, and its
scope never ends in the rest of the text: the HTML close tag closes the
previous Unicode tag but opens a new one, because the second sequence is
not <U+E003C,U+E007F>, i.e. the Unicode tag-cancel.

I bet that Unicode-conforming code that handles some tag characters could
choose to remove everything in a Unicode tag that it does not understand
(e.g. U+E003C is not an understood identifier; only U+E0001 is understood,
as a language tag) or does not want to parse. Without the tag-cancel, all
the rest of your email could then have been truncated, instead of just the
tagged text "bold".
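
To make the risk concrete, here is a minimal Python sketch of such a pessimistic filter. It is an assumption for illustration only: it models the worst case described above (tagged text extends until U+E007F CANCEL TAG and is dropped wholesale), not behavior required by the Unicode Standard or observed in any real mail agent. The function names are hypothetical.

```python
TAG_OFFSET = 0xE0000
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG


def to_tag_chars(ascii_text: str) -> str:
    """Shift printable ASCII into the tag block (illustrative helper)."""
    return "".join(chr(ord(ch) + TAG_OFFSET) for ch in ascii_text)


def is_tag_char(ch: str) -> bool:
    return 0xE0000 <= ord(ch) <= 0xE007F


def strip_unknown_tags(text: str) -> str:
    """Drop everything from a tag character up to the next CANCEL TAG.

    If no CANCEL TAG follows, the tagged scope never ends, so the rest
    of the text is dropped as well.
    """
    kept = []
    i = 0
    while i < len(text):
        if not is_tag_char(text[i]):
            kept.append(text[i])
            i += 1
            continue
        cancel_at = text.find(CANCEL_TAG, i)
        if cancel_at == -1:
            break  # unterminated tag: everything after it is lost
        i = cancel_at + 1
    return "".join(kept)


opening = to_tag_chars("<b>")
without_cancel = "This " + opening + "bold" + to_tag_chars("</b>") + " new concept was not mine."
with_cancel = "This " + opening + "bold" + to_tag_chars("<") + CANCEL_TAG + " new concept was not mine."

print(repr(strip_unknown_tags(without_cancel)))  # 'This '
print(repr(strip_unknown_tags(with_cancel)))     # 'This  new concept was not mine.'
```

With the tag-cancel only the tagged word disappears; without it, the filter in this sketch discards the remainder of the message.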

Given how HTML tags nest (or do not), I don't think this approach is
desirable.

And I'm not sure that everyone on this list actually received your mail
with this tag: your mail may have been truncated, or all U+E00nn
characters may have been silently removed by an intermediate agent that
does not want to support any Unicode tag character.
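
The milder outcome, silent removal, is easy to picture. The following Python sketch is an assumption about how an intermediate agent might do it, not a description of any actual mail software; drop_tag_chars and has_tag_chars are hypothetical names.

```python
import re

# Every code point in the tag block U+E0000..U+E007F.
TAG_CHARS = re.compile("[\U000E0000-\U000E007F]")


def has_tag_chars(text: str) -> bool:
    """Report whether any tag character survived in the text."""
    return TAG_CHARS.search(text) is not None


def drop_tag_chars(text: str) -> str:
    """Silently remove all tag characters, keeping everything else."""
    return TAG_CHARS.sub("", text)


tagged = (
    "This \U000E003C\U000E0062\U000E003E"
    "bold\U000E003C\U000E002F\U000E0062\U000E003E new concept"
)

print(has_tag_chars(tagged))                  # True
print(drop_tag_chars(tagged))                 # This bold new concept
print(has_tag_chars(drop_tag_chars(tagged)))  # False
```

A recipient could run a check like has_tag_chars on the raw message body to see whether the tags arrived intact.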

On Mon, 28 Jan 2019 at 03:03, James Kass via Unicode <unicode at unicode.org>
wrote:

> On 2019-01-27 11:44 PM, Philippe Verdy wrote:
>  > You're not very explicit about the Tag encoding you use for these
>  > styles.
> This <U+E003C,U+E0062,U+E003E>bold<U+E003C,U+E002F,U+E0062,U+E003E> new
> concept was not mine.  When I tested it here, I was using the tag
> encoding recommended by the developer.
>  > Of course it must not be a language tag, so the introducer is not
>  > U+E0001, or a cancel-all tag, so it is not prefixed by U+E007F.  It
>  > also cannot use letter-like, digit-like and hyphen-like tag
>  > characters for its introduction.  So probably you use some prefix in
>  > U+E0002..U+E001F and some additional tag (tag "I" for italic, tag "B"
>  > for bold, tag "U" for underline, tag "S" for strikethrough?) and the
>  > cancel tag to return to normal text (terminate the tagged sequence).
> Yes, U+E0001 remains deprecated and its use is strongly discouraged.
>  > Or maybe you just use standard HTML encoding by adding U+E0000 to
>  > each character of the HTML tag syntax (including attributes and close
>  > tags, allowing embedding?)  So you use the "<" and ">" tag characters
>  > (possibly also the space tag U+E0020, or TAB tag U+E0009 for
>  > separating attributes, and the quotation tags for attribute values)?
>  > Is your proposal also allowing the embedding of other HTML objects
>  > (such as SVG)?
> AFAICT, this beta release supports the tag sequences <i></i>, <b></b>,
> <s></s>, & <u></u> expressed here in ASCII.  I don’t know if the
> software developer has plans to expand the enhancements in the future.
>  > And what is then the interest compared to standard HTML (it is not
>  > more compact, ...
> This was one of the ideas which surfaced earlier in this thread. Some
> users have expressed an interest in preserving, for example, italics in
> plain-text and are uncomfortable using the math alphanumerics for this,
> although the math alphanumerics seem well qualified for the purpose.
> One of the advantages given for this approach earlier is that it can be
> made to work without any official sanction and with no action necessary
> by the Consortium.
>  > I bet in fact that all tag characters are most often restricted in
>  > text input forms, and will be silently discarded or the whole text
>  > will be rejected.
> In this e-mail, I used the tags <b> & </b> around the word “bold” in the
> first sentence of my reply in order to test your bet.
>  > We were told that these tag characters were deprecated, and in fact
>  > even their use for language tags has not found any significant use
>  > except some trials (but there are now better technologies available
>  > in a lot of software, APIs and services, and application
>  > design/development tools, or document editing/publishing tools).
> Indeed, these tags were deprecated.  At the time the tags were
> deprecated, there was such sorrow on this list that some list members
> were even inspired to compose haiku lamenting their passing and did post
> those haiku to this list.  Now, thanks to emoji requirements, many of
> those tags are experiencing a resurrection/renaissance.  I wonder if
> anyone is composing limericks in joyful celebration…