Trying to understand Line_Break property apparent discrepancy

Karl Williamson public at khwilliamson.com
Mon Jan 11 16:42:37 CST 2016


It appears that 
http://www.unicode.org/Public/8.0.0/ucd/auxiliary/LineBreakTest.txt is 
testing a tailoring rather than the default line break algorithm, 
contrary to its heading "# Default Line Break Test".  
http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/LineBreakTest.html 
follows suit.

The default algorithm, as shown in the pair table at 
http://www.unicode.org/reports/tr14/#Table2, follows LB25, which is an 
approximation of the desired behavior.  The test file and the HTML do 
not follow it.  I suspect they instead implement the tailoring described 
in example 7 of http://www.unicode.org/reports/tr14/#Examples.

For example, the test file expects, and the HTML states, that a class CL 
code point followed by a class PO one is an unconditional line break 
opportunity, based on rule 999 (which is the same as LB31 in TR14).
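
To make the test file's claim concrete, here is a minimal Python sketch 
that reads one line in the format LineBreakTest.txt uses: ÷ marks a 
break opportunity, × marks no break, hex code points sit in between, and 
a # starts a trailing comment.  The name parse_test_line and the sample 
line are my own illustrations, not copied from the file; the sample 
assumes U+007D is class CL and U+0025 is class PO.

```python
def parse_test_line(line):
    """Parse one LineBreakTest.txt-style line.

    Returns (text, breaks), where breaks[i] is True if a break is
    expected at position i (0 = start of text, len(text) = end of text).
    Returns None for blank or comment-only lines.
    """
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data:
        return None
    tokens = data.split()
    marks = tokens[0::2]                   # the break / no-break marks
    code_points = [int(t, 16) for t in tokens[1::2]]
    if len(marks) != len(code_points) + 1:
        raise ValueError("marks and code points do not alternate")
    if any(m not in ("×", "÷") for m in marks):
        raise ValueError("unexpected mark in line")
    text = "".join(chr(cp) for cp in code_points)
    breaks = [m == "÷" for m in marks]
    return text, breaks


# Illustrative line only (hypothetical, in the file's format):
# a CL code point followed directly by a PO code point, with a break
# expected between them, which is what the test file is reported to assert.
sample = "× 007D ÷ 0025 ÷\t# illustrative CL PO case"
text, breaks = parse_test_line(sample)
print(repr(text), breaks)   # prints '}%' [False, True, True]
```

The True in the middle of the list is the break between CL and PO that 
is at issue here.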

By contrast, http://www.unicode.org/reports/tr14/#Table2 says that a 
class CL code point followed by a class PO one is an indirect break 
opportunity, which it defines as:

	"B % A is equivalent to B × A and B SP+ ÷ A; in other words, do not 
	break before A, unless one or more spaces follow B."

This follows from LB25 and LB18.

This discrepancy could be resolved either by changing the tests and HTML 
to follow LB25, or by documenting that they cover something beyond the 
default algorithm.  (There may also be other discrepancies that I 
haven't stumbled upon.)




