Encoding italic (was: A last missing link)

Mark E. Shoulson via Unicode unicode at unicode.org
Wed Jan 23 20:21:39 CST 2019


On 1/22/19 6:26 PM, Kent Karlsson via Unicode wrote:
> Ok. One thing to note is that escape sequences (including control sequences,
> for those who care to distinguish those) probably should be "default
> ignorable" for display. Requiring, or even recommending, them to be default
> ignorable for other processing (like sorting, searching, and other things)
> may be a tall order. So, for display, (maximal) substrings that match:
>
> \u001B[\u0020-\u002F]*[\u0030-\u007E]|
> (\u001B'['|\u009B)[\u0030-\u003F]*[\u0020-\u002F]*[\u0040-\u007E]
>
> should be default ignorable (i.e. invisible, but a "show invisibles" mode
> would show them; not interpreted ones should be kept, even if interpreted
> ones need not, just (re)generated on save). That is as far as Unicode
> should go.
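
For readers who want to try the quoted pattern, here is a minimal sketch of it as a Python regular expression, assuming the intended ranges are the usual ECMA-48 ones (intermediate bytes 0x20-0x2F, parameter bytes 0x30-0x3F, final bytes 0x40-0x7E for control sequences). The names ESCAPE_SEQUENCE and visible_text are made up for illustration and do not come from any standard or library. Because Python's alternation takes the first branch that matches rather than the longest, the control-sequence branch is listed first to get the "maximal" match the quoted text asks for.

```python
import re

# Sketch of the quoted pattern; names here are hypothetical.
# The CSI branch comes first: Python's `|` is ordered, so listing the
# shorter "ESC + final byte" branch first would match only "ESC [".
ESCAPE_SEQUENCE = re.compile(
    r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"  # control sequences
    r"|\x1b[\x20-\x2f]*[\x30-\x7e]"                        # other escape sequences
)


def visible_text(s: str) -> str:
    """Return s with escape sequences dropped, i.e. treated as ignorable for display."""
    return ESCAPE_SEQUENCE.sub("", s)


if __name__ == "__main__":
    sample = "\x1b[1mbold\x1b[0m plain"
    print(repr(visible_text(sample)))
```

This only covers the display side; it says nothing about sorting or searching, which the quoted text deliberately leaves alone.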

So it isn't just "these characters should be default ignorable", but 
"this regular expression is default ignorable."  This gets back to 
"things that span more than a character" again, only this time the 
"span" isn't the text being styled, it's the annotation to style it.  
The "bash" shell has special escape-sequences (\[ and \]) to use in 
defining its prompt that tell the system that the text enclosed by them 
is not rendered and should not be counted when it comes to doing 
cursor-control and line-editing stuff (so you put them around, yep, the 
escape sequences for coloring or boldfacing or whatever that you want in 
your prompt). That would seem to be at least simpler than a big ol' 
regexp, but really not that much of an improvement.  It also goes to 
show how things like this require all kinds of special handling, 
even/especially in a "simple" shell prompt (which could make a strong 
case for being "plain text", though, yes, terminal escape codes are a 
thing).
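
To make the bash point concrete, here is a rough sketch of the bookkeeping those \[ and \] markers allow: drop everything between them, then count what is left. It assumes the prompt string has no other backslash escapes that still need expanding (such as \u or \h), and it counts code points rather than terminal columns. The names NON_PRINTING and visible_prompt_width are hypothetical, not anything bash or readline exposes.

```python
import re

# Everything from a literal \[ up to the next literal \] is declared
# non-printing by the prompt's author (hypothetical helper names).
NON_PRINTING = re.compile(r"\\\[.*?\\\]", re.DOTALL)


def visible_prompt_width(prompt: str) -> int:
    """Count the characters of a PS1-style string that lie outside \\[ ... \\] spans."""
    return len(NON_PRINTING.sub("", prompt))


if __name__ == "__main__":
    # Bold green "user@host", then reset, then "$ ".
    ps1 = r"\[\e[1;32m\]user@host\[\e[0m\]$ "
    print(visible_prompt_width(ps1))
```

The markers move the burden onto whoever writes the prompt: nothing here recognizes an escape sequence, it only trusts the brackets.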

~mark

