Encoding italic

Mark E. Shoulson via Unicode unicode at unicode.org
Wed Jan 23 20:32:55 CST 2019

There is something deliciously simple, elegant... and kinda... 
rebellious? about doing this.  And it wouldn't even be in the purview of
Unicode.  "Yep, my HTML renderer treats characters E0020..E007F just
exactly the same as 0020..007F, 'cept that it won't render 'em."  And you
can send HTML text that looks for all the world like plain text to any 
normal Unicode-conformant viewer.  Now, the security issues of being 
able to write "invisible" JavaScript, or rather, Yet Another way you 
need to look at and reveal possible code, are a headache for someone 
else.  Viewed like this, you might do better taking this suggestion to
the W3C and having them amend the HTML/XML specs so that E0020..E007F are
non-rendering synonyms for 0020..007F.  It wouldn't be a Unicode thing 
anymore, just changing the definition of HTML.  (I'm not saying it would 
be a GOOD idea, mind you.)
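To make the "two views" idea concrete, here is a minimal Python sketch. The function names are hypothetical, and the tag-aware "renderer" is the imagined one described above, not anything HTML or Unicode actually specifies. An ordinary viewer simply drops the tag characters (they are default-ignorable), while the tag-aware renderer maps each of E0020..E007F back to its ASCII twin by subtracting 0xE0000.

```python
# Sketch of the hypothetical renderer: tag characters U+E0020..U+E007F
# mirror ASCII U+0020..U+007F at a fixed offset of 0xE0000.
TAG_OFFSET = 0xE0000


def is_tag(ch: str) -> bool:
    """True if ch is one of the tag characters U+E0020..U+E007F."""
    return 0xE0020 <= ord(ch) <= 0xE007F


def plain_view(text: str) -> str:
    """What an ordinary viewer shows: tag characters are ignored (dropped)."""
    return "".join(ch for ch in text if not is_tag(ch))


def html_view(text: str) -> str:
    """What the imagined renderer parses: each tag char becomes its ASCII twin."""
    return "".join(chr(ord(ch) - TAG_OFFSET) if is_tag(ch) else ch for ch in text)


# Build "<i>" and "</i>" out of tag characters (invisible in a normal viewer).
opening = "".join(chr(TAG_OFFSET + ord(c)) for c in "<i>")
closing = "".join(chr(TAG_OFFSET + ord(c)) for c in "</i>")
sample = "a " + opening + "very" + closing + " good idea"

print(plain_view(sample))  # a very good idea
print(html_view(sample))   # a <i>very</i> good idea
```

The same string reads as plain text to any conformant viewer and as HTML to the hypothetical renderer, which is also why hidden markup (or script) would need to be surfaced for security review.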


On 1/22/19 10:43 PM, James Kass via Unicode wrote:
> Nobody has really addressed Andrew West's suggestion about using the 
> tag characters.
> It seems conformant, unobtrusive, requiring no official sanction, and 
> could be supported by third-partiers in the absence of corporate 
> interest if deemed desirable.
> One argument against it might be:  Whoa, that's just HTML.  Why not 
> just use HTML?  SMH
> One argument for it might be:  Whoa, that's just HTML!  Most everybody 
> already knows about HTML, so a simple subset of HTML would be 
> recognizable.
> After revisiting the concept, it does seem elegant and workable. It 
> would provide support for elements of writing in plain-text for anyone 
> desiring it, enabling essential (or frivolous) preservation of 
> editorial/authorial intentions in plain-text.
> Am I missing something?  (Please be kind if replying.)
> On 2019-01-20 10:35 AM, Andrew West wrote:
>> A possibility that I don't think has been mentioned so far would be to
>> use the existing tag characters (E0020..E007F). These are no longer
>> deprecated, and as they are used in emoji flag tag sequences, software
>> already needs to support them, and they should just be ignored by
>> software that does not support them. The advantages are that no new
>> characters need to be encoded, and they are flexible so that tag
>> sequences for start/end of italic, bold, fraktur, double-struck,
>> script, sans-serif styles could be defined. For example start and end
>> of italic styling could be defined as the tag sequences <i> and </i>
>> (E003C E0069 E003E and E003C E002F E0069 E003E).
>> Andrew
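Andrew's proposed sequences can be checked mechanically: each tag character is the corresponding printable ASCII character shifted up by 0xE0000. The Python sketch below is illustrative only. The helper names are hypothetical, and restricting input to printable ASCII (U+0020..U+007E) is an assumption made here so that U+E007F CANCEL TAG is never produced by accident.

```python
TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at this offset


def to_tag_sequence(ascii_markup: str) -> str:
    """Shift each printable ASCII char (U+0020..U+007E) into the tag block."""
    if not all(0x20 <= ord(c) <= 0x7E for c in ascii_markup):
        raise ValueError("tag sequences here only mirror printable ASCII")
    return "".join(chr(TAG_OFFSET + ord(c)) for c in ascii_markup)


def italicize(text: str) -> str:
    """Wrap text in the proposed (hypothetical) start/end-italic tag sequences."""
    return to_tag_sequence("<i>") + text + to_tag_sequence("</i>")


def code_points(s: str) -> str:
    """Format a string as space-separated hex code points, e.g. 'E003C E0069'."""
    return " ".join(f"{ord(c):04X}" for c in s)


print(code_points(to_tag_sequence("<i>")))   # E003C E0069 E003E
print(code_points(to_tag_sequence("</i>")))  # E003C E002F E0069 E003E
```

The printed code points match the sequences given in the quoted message for start and end of italic.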
