Should unassigned code points in blocks reserved for combining marks, etc be GCB extended?

Richard Wordingham richard.wordingham at
Mon Dec 12 14:19:29 CST 2016

On Mon, 12 Dec 2016 09:30:31 -0800
Ken Whistler <kenwhistler at> wrote:

> On 12/12/2016 6:59 AM, Karl Williamson wrote:

> > These are currently GCB Other, but when assigned, don't we know that
> > they will be Extended?  So this could be done now.

> Any proposal like this then also has hidden costs on the committees,
> because it sets up implied requirements for what can be encoded where
> and what properties it has to have. Every time such defaults are set
> up, it makes the documentation of what is already "pre-assigned" more
> complicated and fragile. Already, a large proportion of the
> participants in the maintenance committees have very murky
> understandings about what can and cannot be put where in the future,
> and why. And that is a recipe for mistakes in encoding.

How does this differ from U+0816 SAMARITAN MARK IN changing from
bidi_class=R to bidi_class=NSM upon assignment?

The idea is to reduce the damage done by the use of obsolete versions of
the Unicode database.

> Finally, like it or not, there currently is no actual contract
> guaranteeing that the remaining open ranges in blocks "reserved" for
> combining marks will all end up gc=Mn or gc=Me, anyway. The relevant
> ranges are 1ABF..1AFF, 1DF6..1DFA, and 20F1..20FF. There is nothing to
> prevent the committees from deciding that one (or more) spacing
> combining marks might be appropriate to encode there, or possibly even
> spacing non-combining marks of some strange sort, like the spacing
> Arabic letter diacritics that ended up at FBB2..FBC1. Trying to keep
> those ranges free of characters that would not be Grapheme_Extend=Yes
> would require some guy on the committee to be aware of the arcane
> dependencies for segmentation properties, and then to police such
> decisions in perpetuity -- or at least until the blocks in question
> filled up with non-problematical characters.

What is the downside of a code point changing from Grapheme_Extend=Yes
to Grapheme_Extend=No when it is assigned?
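
To make the trade-off concrete, here is a minimal Python sketch (not from the thread) of how a segmenter using an out-of-date property table treats a code point in one of the reserved ranges listed above. It is heavily simplified. It models only the UAX #29 rule against breaking before Extend (GB9) plus the default "break everywhere else". The function names `gcb` and `clusters` are hypothetical. U+20F1 stands in for a hypothetical future nonspacing mark.

```python
# Hypothetical, heavily simplified Grapheme_Cluster_Break lookup.
# Only GB9 (no break before Extend) and GB999 (break everywhere else)
# are modelled; real UAX #29 segmentation has many more rules.

# Reserved ranges as quoted in the thread (late 2016).
RESERVED_MARK_RANGES = [(0x1ABF, 0x1AFF), (0x1DF6, 0x1DFA), (0x20F1, 0x20FF)]


def gcb(cp, known_extend, default_reserved_to_extend):
    """Return 'Extend' or 'Other' for code point cp.

    known_extend: code points the (possibly stale) database lists as Extend.
    default_reserved_to_extend: apply the proposed default to reserved ranges.
    """
    if cp in known_extend:
        return "Extend"
    if default_reserved_to_extend and any(
        lo <= cp <= hi for lo, hi in RESERVED_MARK_RANGES
    ):
        return "Extend"
    return "Other"


def clusters(text, known_extend, default_reserved_to_extend):
    """Split text into grapheme clusters using the simplified rules above."""
    out = []
    for ch in text:
        if out and gcb(ord(ch), known_extend, default_reserved_to_extend) == "Extend":
            out[-1] += ch  # GB9: no break before Extend
        else:
            out.append(ch)  # GB999: otherwise break
    return out


# A stale table that knows U+0301 COMBINING ACUTE ACCENT but not U+20F1.
STALE = frozenset({0x0301})
print(clusters("e\u0301\u20F1", STALE, False))  # break before U+20F1
print(clusters("e\u0301\u20F1", STALE, True))   # U+20F1 kept with its base
```

With the proposed default, the stale table keeps a later-assigned nonspacing mark with its base. That is the damage reduction described above. If U+20F1 were instead assigned as a character that is not Grapheme_Extend, the same stale table would wrongly suppress a break, which is the case the question above asks about.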

