Unclear text in the UBA (UAX#9) of Unicode 6.3

Eli Zaretskii eliz at gnu.org
Thu Apr 24 10:20:31 CDT 2014


> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Thu, 24 Apr 2014 17:11:23 +0200
> Cc: Asmus Freytag <asmusf at ix.netcom.com>, Ilya Zakharevich <nospam-abuse at ilyaz.org>, ken at unicode.org, 
> 	James Clark <jjc at jclark.com>, unicode Unicode Discussion <unicode at unicode.org>
> 
> > In addition, assuming that by "guillemets" Philippe means U+00AB and
> > U+00BB,
> 
> 
> "guillemet" is THE correct name, even in English. "guillemot" comes from an
> old typo error.

I didn't mean to say "guillemet" was a typo; I just wasn't sure which
Unicode codepoint you had in mind, since you didn't show its full
official name or its codepoint.  And your original message used the
"<<" and ">>" transliterations, not the actual characters.

> > they cannot possibly form a bracketed pair, because their
> > General Category is not Ps and Pe.  For that reason, you will never
> > find them in BidiBrackets.txt.
> >
> 
> Forget the general category, we know that it does not solve any
> internationalization issue correctly. All past versions of Unicode
> algorithms that initially attempted to use them now use them only as
> informative rules (which are not stabilized) to help generate new "derived"
> properties (which should be used verbatim from the content of the UCD,
> because rapidly new exceptions are added to the rules).
> 
> The guillemets evidently form a pair even if their use depends on languages
> which may swap their role (and this is the main reason why they are not
> assigned Ps and Pe, because Ps and Pe will be swapped). They are still a pair
> which works even better than """ that can be paired in 3 different ways and
> not just two (meaning that you don't know which one to look for).

They are not a pair for the purposes of the PBA, which is the subject
of this discussion.  Your message, viz.:

> - later the closing guillemet matches the opening guillemet remaining on
> the stack, even if the second opening bracket was pushed on top of it :
> pair of guillemets is matched, the opening guillemet is dropped from the
> stack but the second bracket on top of it remains there and can also match
> now the following closing bracket.

indicated that you thought the guillemets could form a bracket pair,
which they cannot, according to the UBA.

> So nothing (at least not the reason of the GC which is just an intermediate
> but incomplete helper) forbids the guillemets to be listed in
> BidiBrackets.txt.

They don't satisfy the conditions for that.  From BidiBrackets.txt:

  # This file lists the set of code points with Bidi_Paired_Bracket_Type
  # property values Open and Close. The set is derived from the character
  # properties General_Category (gc), Bidi_Class (bc), Bidi_Mirrored (Bidi_M),
  # and Bidi_Mirroring_Glyph (bmg), as follows: two characters, A and B,
  # form a bracket pair if A has gc=Ps and B has gc=Pe, both have bc=ON and
  # Bidi_M=Y, and bmg of A is B. Bidi_Paired_Bracket (bpb) maps A to B and
  # vice versa, and their Bidi_Paired_Bracket_Type (bpt) property values are
  # Open (o) and Close (c), respectively.

As you see, Ps and Pe are explicitly required.
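
The conditions quoted above can be checked roughly with Python's
standard unicodedata module.  This is only a sketch: the function name
satisfies_bracket_conditions is made up for illustration, and since
unicodedata does not expose Bidi_Mirroring_Glyph, the "bmg of A is B"
condition is not tested at all.  The results also depend on the
Unicode version bundled with the Python in use.

```python
import unicodedata


def satisfies_bracket_conditions(a, b):
    """Partial check of the BidiBrackets.txt derivation for characters A and B.

    Tests gc=Ps / gc=Pe, bc=ON and Bidi_M=Y.  The Bidi_Mirroring_Glyph
    condition is NOT checked, because unicodedata does not provide it.
    """
    return (
        unicodedata.category(a) == "Ps"
        and unicodedata.category(b) == "Pe"
        and unicodedata.bidirectional(a) == "ON"
        and unicodedata.bidirectional(b) == "ON"
        and unicodedata.mirrored(a) == 1
        and unicodedata.mirrored(b) == 1
    )


# Parentheses meet the checked conditions; the guillemets do not,
# because their General Category is Pi and Pf rather than Ps and Pe.
print(satisfies_bracket_conditions("(", ")"))
print(satisfies_bracket_conditions("\u00ab", "\u00bb"))
print(unicodedata.category("\u00ab"), unicodedata.category("\u00bb"))
```

Note that U+00AB and U+00BB do have bc=ON and Bidi_M=Y; it is only the
General Category requirement that they fail.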