Unicode "no-op" Character?

Mark E. Shoulson via Unicode unicode at unicode.org
Wed Jul 3 16:51:29 CDT 2019


I think the idea being considered at the outset was not as complex as
these scenarios (indeed, the point of the character was to avoid making
these kinds of decisions). For some reason there was a desire to chop a
string into equal-length pieces or something similar, and some of those
divisions might land between bases and diacritics or who knows where
else. Rather than having to work out acceptable places to insert the
characters, the request was for a no-op character that could safely be
plopped *anywhere*, even in the middle of combinations like that.
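To make the problem concrete, here is a minimal Python sketch. It assumes the pieces are cut every N code points with no regard for character boundaries, and the helper names `chop_fixed` and `cuts_inside_combinations` are hypothetical, invented for illustration. It shows a cut landing between a base letter and its combining acute accent, which is exactly where the requested no-op character would have to go.

```python
import unicodedata

def chop_fixed(text: str, width: int) -> list[str]:
    # Hypothetical helper: cut every `width` code points, ignoring
    # base/diacritic combinations entirely.
    return [text[i:i + width] for i in range(0, len(text), width)]

def cuts_inside_combinations(pieces: list[str]) -> list[int]:
    # Hypothetical helper: report the index of every piece (after the
    # first) that begins with a combining mark (general category M*),
    # i.e. every cut that separated a mark from the base before it.
    return [
        i for i, piece in enumerate(pieces)
        if i > 0 and piece and unicodedata.category(piece[0]).startswith("M")
    ]

# "cafés" written with a decomposed é: e followed by U+0301 COMBINING ACUTE ACCENT.
text = "cafe\u0301s"
pieces = chop_fixed(text, 4)  # ["cafe", "\u0301s"]
print(cuts_inside_combinations(pieces))  # [1]
```

Cutting at width 4 separates U+0301 from its base "e"; any marker dropped at that cut would sit inside the combining sequence.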

~mark

On 6/23/19 4:24 AM, Richard Wordingham via Unicode wrote:
> On Sat, 22 Jun 2019 23:56:50 +0000
> Shawn Steele via Unicode <unicode at unicode.org> wrote:
>
>> + the list.  For some reason the list's reply header is confusing.
>>
>> From: Shawn Steele
>> Sent: Saturday, June 22, 2019 4:55 PM
>> To: Sławomir Osipiuk <sosipiuk at gmail.com>
>> Subject: RE: Unicode "no-op" Character?
>>
>> The original comment about putting it between the base character and
>> the combining diacritic seems peculiar.  I'm having a hard time
>> visualizing how that kind of markup could be interesting.
> There are a number of possible interesting scenarios:
>
> 1) Chopping the string into user-perceived characters.  For example,
> the Khmer sequences of COENG plus letter are named sequences.  Akin to
> this is identifying resting places for a simple cursor, e.g. allowing it
> to be positioned between a base character and a spacing, unreordered
> subscript.  (This last possibility overlaps with rendering.)
>
> 2) Chopping the string into collating elements.  (This can require
> renormalisation, and may raise a rendering issue with HarfBuzz, where
> renormalisation is required to get marks into a suitable order for
> shaping.  I suspect no-op characters would disrupt this
> renormalisation; CGJ may legitimately be used to affect rendering this
> way, even though it is supposed to have no other effect* on rendering.)
>
> 3) Chopping the string into default grapheme clusters.  That
> separates a coeng from the following character with which it
> interacts.
>
> *Is a Unicode-compliant *renderer* allowed to distinguish diaeresis
> from the umlaut mark?
>
> Richard.
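Two of the scenarios above (collating elements with renormalisation, and default grapheme clusters around a Khmer COENG) can be illustrated with Python's standard `unicodedata` module. This is a sketch under stated assumptions: the CGJ part assumes a hypothetical no-op character would, like U+034F COMBINING GRAPHEME JOINER, have canonical combining class 0 and therefore block canonical reordering in the same way; `simple_clusters` is a hypothetical, deliberately simplified stand-in (attach every mark to the preceding character), not the full UAX #29 grapheme cluster algorithm.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER; canonical combining class 0

def codepoints(s: str) -> list[str]:
    # Render a string as a list of U+XXXX labels for readable output.
    return [f"U+{ord(c):04X}" for c in s]

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

# Scenario 2: normalisation puts adjacent non-zero-class marks into
# canonical order. U+0301 (ccc 230) followed by U+0316 (ccc 220) is swapped.
plain = "a\u0301\u0316"
print(codepoints(nfd(plain)))  # ['U+0061', 'U+0316', 'U+0301']

# A class-0 character between the marks blocks the reordering; a no-op
# character is ASSUMED here to behave like CGJ in this respect.
blocked = "a\u0301" + CGJ + "\u0316"
print(nfd(blocked) == blocked)  # True

# Scenario 3: hypothetical, simplified clustering that attaches each mark
# (general category Mn, Mc or Me) to the preceding cluster. NOT full UAX #29.
def simple_clusters(s: str) -> list[str]:
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# KHMER LETTER KA, KHMER SIGN COENG (Mn), KHMER LETTER KA: a subscript KA.
khmer = "\u1780\u17d2\u1780"
print([codepoints(c) for c in simple_clusters(khmer)])
# [['U+1780', 'U+17D2'], ['U+1780']]
```

With this simplified rule the COENG stays with the preceding KA and the following KA starts a new cluster, which is the separation of a coeng from the character it interacts with described in scenario 3.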
