UAX 29 questions

Karl Williamson public at
Thu Jan 29 23:25:14 CST 2015

On 01/29/2015 08:19 PM, Philippe Verdy wrote:
> 2015-01-29 19:52 GMT+01:00 Karl Williamson <public at>:
>     Rule WB4 is
>     "Ignore Format and Extend characters, except when they appear at the
>     beginning of a region of text.".
>     Not clearly stated, but it appears to me that the ZWJ must be
>     considered here to be the beginning of a region of text, as we are
>     looking at the boundary between it and the "A".  No rule
>     specifically mentions ALetter followed by an Extend, so by the
>     default rule, WB14
>     "Otherwise, break everywhere (including around ideographs)"
> All the text is targeted at finding candidate positions for breaks. It
> is not very clear that "ignore" is definitive and means that there
> cannot be any further breaks before the Format and Extend characters,
> except at the beginning of text. So all the rest of the rules are ignored:
> there was a match and you stop there; no break before:
>    Any  × (Format | Extend)
> This is confirmed in other rules that state the word "otherwise",
> including the last one (WB14) you quote, which is explicitly not applicable.

I don't understand you here.  I understand all the words, but I don't 
see what you're trying to say.  My claim is that there should be a rule
like the one you give,

  Any  × (Format | Extend)

but there isn't.  I think you are maybe trying to say that the word 
"ignore" in this UAX is tantamount to such a rule.  I am a native 
English speaker, and would never have drawn that inference from the 
text.  There are a lot of passages in the Standard that sound like 
gibberish to me.  I know the words' meanings, but the combination doesn't 
make any sense.  I don't recall ever having this issue in other 
standards I've looked at.
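
The two readings discussed above can be compared in a minimal sketch.
Everything in it is an assumption made for illustration: the tiny
character table (with ZWJ classed as Extend, as in this thread), the
helper names toy_class and toy_breaks, and the fact that only WB4, WB5
and WB14 are modelled.  It is not the UAX 29 algorithm, only a way to
see how the answer for "A", ZWJ, "B" changes depending on whether
"ignore" is taken to imply  Any × (Format | Extend).

```python
# Toy Word_Break classes (hypothetical table, for illustration only).
# A real implementation would read WordBreakProperty.txt from the UCD.
TOY_CLASS = {
    "\u200d": "Extend",  # ZERO WIDTH JOINER, treated as Extend as in the thread
    "\u0301": "Extend",  # COMBINING ACUTE ACCENT
    "\u00ad": "Format",  # SOFT HYPHEN
}
IGNORED = ("Extend", "Format")


def toy_class(ch):
    """Return the toy Word_Break class of a single character."""
    if ch in TOY_CLASS:
        return TOY_CLASS[ch]
    return "ALetter" if ch.isalpha() else "Other"


def toy_breaks(text, ignore_implies_no_break):
    """Return offsets i (0 < i < len(text)) where a break is allowed
    between text[i-1] and text[i], modelling only WB4, WB5 and WB14."""
    result = []
    for i in range(1, len(text)):
        right = toy_class(text[i])
        j = i - 1
        if ignore_implies_no_break:
            # Reading 1: "ignore" acts like  Any × (Format | Extend).
            if right in IGNORED:
                continue
            # Skip back over ignored characters, except at start of text.
            while j > 0 and toy_class(text[j]) in IGNORED:
                j -= 1
        left = toy_class(text[j])
        if left == "ALetter" and right == "ALetter":
            continue  # WB5: ALetter × ALetter
        result.append(i)  # WB14: otherwise, break everywhere
    return result


sample = "A\u200dB"
print(toy_breaks(sample, ignore_implies_no_break=True))
print(toy_breaks(sample, ignore_implies_no_break=False))
```

Under the first reading the sample has no internal break; under the
literal reading, where no rule mentions ALetter followed by Extend, WB14
applies on both sides of the ZWJ.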
