Unclear text in the UBA (UAX#9) of Unicode 6.3

Philippe Verdy verdy_p at wanadoo.fr
Thu Apr 24 10:11:23 CDT 2014

2014-04-24 16:39 GMT+02:00 Eli Zaretskii <eliz at gnu.org>:

> In addition, assuming that by "guillemets" Philippe means U+00AB and
> U+00BB,

"guillemet" is THE correct name, even in English. "guillemot" comes from an
old typo. If you don't want this term in English you can still use
"double angle bracket", which is unnecessarily long.

> they cannot possibly form a bracketed pair, because their
> General Category is not Ps and Pe.  For that reason, you will never
> find them in BidiBrackets.txt.

Forget the general category; we know that it does not solve any
internationalization issue correctly. All past versions of Unicode
algorithms that initially attempted to use it now use it only in
informative rules (which are not stabilized) to help generate new "derived"
properties (which should be used verbatim from the content of the UCD,
because new exceptions to the rules are added rapidly).

The guillemets evidently form a pair, even if their use depends on the
language, which may swap their roles (this is the main reason why they are
not assigned Ps and Pe: Ps and Pe would be swapped). They are still a pair,
which works even better than """ that can be paired in 3 different ways and
not just two (meaning that you don't know which one to look for).
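
For anyone who wants to check the General Category values under discussion,
here is a minimal sketch. It assumes Python's standard unicodedata module,
which reflects whatever UCD version the interpreter ships; the helper name
describe is only for illustration. The module does not expose the
Bidi_Paired_Bracket property, so BidiBrackets.txt itself would still have to
be read from the UCD.

```python
import unicodedata

def describe(ch):
    """Return (code point label, General Category, Bidi_Mirrored) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),         # e.g. Ps, Pe, Pi, Pf, Po
        bool(unicodedata.mirrored(ch)),   # Bidi_Mirrored property
    )

# Parentheses, the two guillemets, and the ASCII double quote.
for ch in "()\u00ab\u00bb\"":
    print(describe(ch))
```

The parentheses come out as Ps and Pe, the guillemets as Pi and Pf, and the
ASCII quote as Po, which is the distinction being argued about here.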

Also read my example for what it says explicitly: a demonstration of
the problem, just an example (there are many other similar examples of
cases where nesting is not hierarchical but still maintains pairs).

So nothing (at least not the reason of the GC which is just an intermediate
but incomplete helper) forbids the guillemets to be listed in