Unicode "no-op" Character?
Richard Wordingham via Unicode
unicode at unicode.org
Sun Jun 23 03:24:50 CDT 2019
On Sat, 22 Jun 2019 23:56:50 +0000
Shawn Steele via Unicode <unicode at unicode.org> wrote:
> + the list. For some reason the list's reply header is confusing.
> From: Shawn Steele
> Sent: Saturday, June 22, 2019 4:55 PM
> To: Sławomir Osipiuk <sosipiuk at gmail.com>
> Subject: RE: Unicode "no-op" Character?
> The original comment about putting it between the base character and
> the combining diacritic seems peculiar. I'm having a hard time
> visualizing how that kind of markup could be interesting?
There are a number of possible interesting scenarios:
1) Chopping the string into user perceived characters. For example,
the Khmer sequences of COENG plus letter are named sequences. Akin to
this is identifying resting places for a simple cursor, e.g. allowing it
to be positioned between a base character and a spacing, unreordered
subscript. (This last possibility overlaps with rendering.)
2) Chopping the string into collating elements. (This can require
renormalisation, and may raise a rendering issue with HarfBuzz, where
renormalisation is required to get marks into a suitable order for
shaping. I suspect no-op characters would disrupt this
renormalisation; CGJ may legitimately be used to affect rendering this
way, even though it is supposed to have no other effect* on rendering.)
3) Chopping the string into default grapheme clusters. That
separates a coeng from the following character with which it
*Is a Unicode-compliant *renderer* allowed to distinguish diaeresis
from the umlaut mark?
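
To illustrate point 2, here is a minimal sketch in Python (standard
library only) of how CGJ, whose canonical combining class is 0, stops
canonical reordering of the marks on either side of it. The choice of
marks (diaeresis, ccc 230, and dot below, ccc 220) and the variable
names are mine, purely for illustration. Whether a hypothetical no-op
character would disrupt renormalisation in the same way is an
assumption; it would depend on the properties such a character were
given.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, ccc 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, ccc 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, ccc 220


def codepoints(s):
    """Render a string as a space-separated list of code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Marks in non-canonical order: the ccc 230 mark precedes the ccc 220 mark.
plain = "a" + DIAERESIS + DOT_BELOW
# The same marks with CGJ between them.
blocked = "a" + DIAERESIS + CGJ + DOT_BELOW

nfd_plain = unicodedata.normalize("NFD", plain)
nfd_blocked = unicodedata.normalize("NFD", blocked)

# Without CGJ the two marks are swapped into canonical order.
print(codepoints(nfd_plain))
# With CGJ the sequence is left exactly as typed, because a character
# with ccc 0 ends the run of marks that may be reordered.
print(codepoints(nfd_blocked))
```

A shaper that relies on normalisation to put marks into its preferred
order would therefore see the two sequences differently.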