Trying to understand Line_Break property apparent discrepancy

Karl Williamson public at khwilliamson.com
Mon Jan 11 18:16:56 CST 2016


On 01/11/2016 03:42 PM, Karl Williamson wrote:
> It appears that
> http://www.unicode.org/Public/8.0.0/ucd/auxiliary/LineBreakTest.txt is
> testing a tailoring rather than the default line break algorithm,
> contrary to its heading "# Default Line Break Test".  And
> http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/LineBreakTest.html follows
> along.
>
> For example, the default algorithm as shown in
> http://www.unicode.org/reports/tr14/#Table2 follows LB25, which is an
> approximation of the desired behavior.  But the test and html don't
> follow this.  I suspect they are looking for the tailoring described in
> http://www.unicode.org/reports/tr14/#Examples example 7.
>
> For example, the test file tests for, and the html says, that a class CL
> code point followed by a class PO one is an unconditional line break
> opportunity, based on rule 999 (which is the same as LB31 in TR14).
>
> Whereas, http://www.unicode.org/reports/tr14/#Table2 says that a class
> CL code point followed by a class PO one is an
>
>       "indirect break opportunity     B % A is equivalent to B × A and B
> SP+ ÷ A; in other words, do not break before A, unless one or more
> spaces follow B."  This follows from LB25 and LB18.
>
> There is a discrepancy here, which could be resolved either by changing
> the tests and html to follow LB25, or documenting that these are for
> something above and beyond the default algorithm.  (There may also be
> other discrepancies that I haven't stumbled across.)
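The difference between the two readings of CL followed by PO can be shown with a minimal pair-table sketch, loosely modeled on the pair-table approach described in TR14. It assumes a hypothetical one-entry table (`PAIR_TABLE`) holding only the CL/PO pair and treats every other pair as a direct break purely to keep it small; `break_before` and `TAILORED` are likewise illustrative names, not part of any Unicode deliverable.

```python
# Sketch of pair-table handling of an indirect break opportunity
# ("B % A" in UAX #14 Table 2), restricted to the CL/PO pair.
# Hypothetical table and names; a real implementation needs the full
# Table 2 plus the rules that precede the pair table.

DIRECT = "_"    # break allowed between B and A
INDIRECT = "%"  # B x A, but B SP+ / A (break only after spaces)

# Assumption: only the pair under discussion; all other pairs default
# to DIRECT just to keep the sketch short (this is NOT what Table 2 says).
PAIR_TABLE = {("CL", "PO"): INDIRECT}


def break_before(classes, table=PAIR_TABLE):
    """Return indices i such that a line break is allowed before classes[i]."""
    if not classes:
        return []
    breaks = []
    before = classes[0]  # class of the last non-space character seen
    for i in range(1, len(classes)):
        cur = classes[i]
        if cur == "SP":
            continue  # LB7: never break before a space; keep 'before' as is
        action = table.get((before, cur), DIRECT)
        if action == DIRECT or (action == INDIRECT and classes[i - 1] == "SP"):
            breaks.append(i)
        before = cur
    return breaks


# Default reading (LB25 + LB18): no break in CL PO, break in CL SP PO.
print(break_before(["CL", "PO"]))        # []
print(break_before(["CL", "SP", "PO"]))  # [2]

# The test file's expectation for CL PO (rule 999 / LB31), modeled here
# by simply making the pair a direct break:
TAILORED = {("CL", "PO"): DIRECT}
print(break_before(["CL", "PO"], TAILORED))  # [1]
```

The indirect entry is what lets a single table encode both "B × A" and "B SP+ ÷ A": the lookup always uses the last non-space class, and only the presence of a space immediately before A decides whether the break is taken.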

Ooops.  I didn't see this statement in the html file:
"The Line Break tests use tailoring of numbers described in Example 7 of 
Section 8.2 Examples of Customization. They also differ from the results 
produced by a pair table implementation in sequences like: ZW SP CL."

This explains everything.  Please disregard the earlier email from me.
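For anyone else checking an implementation against LineBreakTest.txt, here is a sketch of reading its data lines. It assumes the layout of alternating break markers (÷ for a break opportunity, × for no break) and hexadecimal code points, with an optional trailing "#" comment; the function name `parse_test_line` and its return shape are hypothetical.

```python
# Sketch of parsing one data line of LineBreakTest.txt.
# Markers: U+00F7 DIVISION SIGN = break allowed,
#          U+00D7 MULTIPLICATION SIGN = no break.

BREAK = "\u00f7"     # the division sign marker
NO_BREAK = "\u00d7"  # the multiplication sign marker


def parse_test_line(line):
    """Return (code_points, breaks), or None for blank/comment-only lines.

    breaks[i] is True if a break is allowed before code_points[i];
    the final entry, breaks[len(code_points)], is the end-of-text position.
    """
    data = line.split("#", 1)[0].split()  # drop the comment, then tokenize
    if not data:
        return None
    markers = data[0::2]                               # even tokens: markers
    code_points = [int(tok, 16) for tok in data[1::2]]  # odd tokens: hex
    if len(markers) != len(code_points) + 1:
        raise ValueError(f"malformed test line: {line!r}")
    for m in markers:
        if m not in (BREAK, NO_BREAK):
            raise ValueError(f"unexpected marker {m!r} in: {line!r}")
    return code_points, [m == BREAK for m in markers]


# A line in the file's format (constructed for illustration, not quoted
# from the file): U+007D (class CL) followed by U+0025 (class PO).
print(parse_test_line("\u00d7 007D \u00f7 0025 \u00f7\t# example"))
# ([125, 37], [False, True, True])
```

Since the file is UTF-8, it should be opened with `encoding="utf-8"` before feeding lines to a parser like this. Because the tests use the Example 7 number tailoring, an implementation of only the default rules may report mismatches on pairs such as CL PO.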