Unicode "no-op" Character?

Sun Jun 23 03:24:50 CDT 2019

On Sat, 22 Jun 2019 23:56:50 +0000
Shawn Steele via Unicode <unicode at unicode.org> wrote:

> + the list.  For some reason the list's reply header is confusing.
> 
> From: Shawn Steele
> Sent: Saturday, June 22, 2019 4:55 PM
> To: Sławomir Osipiuk <sosipiuk at gmail.com>
> Subject: RE: Unicode "no-op" Character?
> 
> The original comment about putting it between the base character and
> the combining diacritic seems peculiar.  I'm having a hard time
> visualizing how that kind of markup could be interesting?

There are a number of possible interesting scenarios:

1) Chopping the string into user-perceived characters.  For example,
the Khmer sequences of COENG plus letter are named sequences.  Akin to
this is identifying resting places for a simple cursor, e.g. allowing it
to be positioned between a base character and a spacing, unreordered
subscript.  (This last possibility overlaps with rendering.)

2) Chopping the string into collating elements.  (This can require
renormalisation, and may raise a rendering issue with HarfBuzz, where
renormalisation is required to get marks into a suitable order for
shaping.  I suspect no-op characters would disrupt this
renormalisation; CGJ may legitimately be used to affect rendering this
way, even though it is supposed to have no other effect* on rendering.)

3) Chopping the string into default grapheme clusters.  That
separates a coeng from the following character with which it
interacts.
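
As a rough illustration of the remark in (2) about CGJ and
renormalisation, here is a minimal Python sketch using only the standard
unicodedata module.  The choice of marks (diaeresis, ccc 230, and dot
below, ccc 220) and the helper name nfd are my own, picked for
illustration; the sketch shows only canonical reordering and says nothing
about what HarfBuzz or any other shaper actually does.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, canonical combining class 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, canonical combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220


def nfd(text: str) -> str:
    """Hypothetical helper: return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# Marks typed in the order "above, then below".
plain = "a" + DIAERESIS + DOT_BELOW
# The same marks with CGJ placed between them.
blocked = "a" + DIAERESIS + CGJ + DOT_BELOW

# Canonical ordering sorts adjacent marks by combining class, so the
# dot below (220) moves in front of the diaeresis (230).
reordered = nfd(plain)

# CGJ has combining class 0, so the marks on either side of it are no
# longer in one reorderable run and keep their typed order.
kept = nfd(blocked)

if __name__ == "__main__":
    for label, s in (("plain", reordered), ("with CGJ", kept)):
        print(label, " ".join(f"U+{ord(c):04X}" for c in s))
```

Any character with combining class 0 dropped between two marks would
interrupt the reordering in the same way, which is why I suspect a no-op
character in that position would not be invisible to normalisation.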

*Is a Unicode-compliant *renderer* allowed to distinguish diaeresis
from the umlaut mark?

Richard.