Corrigendum #9

Philippe Verdy verdy_p at
Sun Jun 1 00:06:34 CDT 2014

I have not proposed moving these characters elsewhere (or reencoding them);
why do you think that?

I am only challenging your statement that a block cannot be discontinuous.
Such a constraint would be unique among Unicode properties, and it is
completely absent from ISO 10646, which does not define any real properties
besides a name for a specific code point and an informative glyph, plus
historic reference links documenting its intended usage. (Where is it
written in the Unicode-only stability rules that a block is continuous,
when allocations of code points in these blocks have always been
discontinuous?) Other properties are much more important than this legacy
one, which has absolutely no use in regexps, as you stated.

Even the set of non-characters is discontinuous, as are the blocks for the
Arabic script, the blocks for presentation forms, and the blocks for
compatibility characters. Every property in Unicode is fragmented over
multiple ranges (which are themselves very frequently discontinuous within
each block, or even within the same encoding column).
In other words, IsInArabicPresentation(x) would still remain true for all
assigned characters in that block; it would just be false for the
non-characters, which would be considered outside of it. But non-characters
don't have any useful property except being non-characters (the block where
they are allocated does not matter at all).
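
As a rough sketch of what I mean, assuming the predicate only covers Arabic
Presentation Forms-A (U+FB50..U+FDFF) and that the non-characters
U+FDD0..U+FDEF are carved out of it, the test could look like this. The
Python name is_in_arabic_presentation and the two constants are hypothetical;
they only mirror the IsInArabicPresentation(x) expression discussed in this
thread and are not an existing regex or library API.

```python
# Assumed block range for Arabic Presentation Forms-A (inclusive).
ARABIC_PF_A = (0xFB50, 0xFDFF)
# Noncharacters allocated inside that range (inclusive).
NONCHARS_IN_BLOCK = (0xFDD0, 0xFDEF)


def is_in_arabic_presentation(cp: int) -> bool:
    """Hypothetical predicate: in the block range, but not a noncharacter."""
    first, last = ARABIC_PF_A
    hole_first, hole_last = NONCHARS_IN_BLOCK
    in_range = first <= cp <= last
    in_hole = hole_first <= cp <= hole_last
    return in_range and not in_hole


if __name__ == "__main__":
    for cp in (0xFB50, 0xFDC7, 0xFDD0, 0xFDEF, 0xFDF0, 0xFE00):
        print(f"U+{cp:04X}: {is_in_arabic_presentation(cp)}")
```

The result is unchanged for every code point that can carry an assigned
character; only the 32 non-characters answer differently.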

The alternative is to stop restricting these code points as non-characters
and to allow them to be present in files without raising any error, i.e. to
treat them like PUA characters, with a few possible default properties (this
would still make them somewhat interoperable under limited private
agreements, possibly implicit in the transport interface or envelope format).

2014-06-01 4:15 GMT+02:00 Asmus Freytag <asmusf at>:

> More importantly, while a regex that uses an expression that is
> equivalent to "IsInArabicPresentation(x)" may or may not be well-defined,
> there is no reason to break it by splitting the block.
> As blocks cannot be discontiguous (unlike other properties), some Arabic
> Presentation forms would have to be put into a new block (Arabic
> Presentation Forms C). This is what would break such expressions - it has,
> in fact, nothing to do with the status of the noncharacters.
> There's no reason to contemplate breaking changes of any kind at this
> point.
> A./