Question about the Sentence_Break property

Richard Wordingham richard.wordingham at ntlworld.com
Thu Feb 19 23:14:30 CST 2015


On Thu, 19 Feb 2015 19:55:20 -0700
Karl Williamson <public at khwilliamson.com> wrote:

> UAX 29 says this:
> 
> Break after paragraph separators.
> SB4.  Sep | CR | LF  ÷
> 
> Why are CR and LF considered to be paragraph separators?  NEL and
> Line Break are as well.
> 
> My mental model of plain text has it containing embedded characters, 
> which I'll call \n, to allow it to be displayed in a terminal window
> of a given width.  Not all text is like that, of course, but there is
> an awful lot that is.  This rule makes no sense to me.

There are two types of plain text: that which requires explicit
line-breaking, and that which does not.  The default rule assumes the
latter; hard-wrapped text of the kind you describe is a case where a
non-linguistic tailoring is required.

TUS has a whole section on the issue, namely TUS 7.0.0 Section 5.8,
"Newline Guidelines".  One thing missing from it is any mention of the
convention that a single newline character (or CRLF pair) is a line
break, whereas a doubled newline character denotes a paragraph break.
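
One way such a tailoring might be applied is as a preprocessing step,
so that the sentence-break rules only ever see real paragraph breaks.
The following is a minimal sketch under that assumption, not anything
specified by UAX 29 or TUS; the function name unwrap_hard_wrapped is
hypothetical, and it assumes blank lines contain no stray spaces.

```python
import re

def unwrap_hard_wrapped(text):
    """Split hard-wrapped plain text into paragraphs.

    Assumed convention (not mandated by any standard): a single
    newline is only a line break; a doubled newline, or U+2029,
    is a paragraph break.
    """
    # Normalize CRLF, lone CR and NEL (U+0085) to LF.
    text = re.sub(r"\r\n|\r|\x85", "\n", text)
    # Two or more consecutive newlines, or PARAGRAPH SEPARATOR,
    # end a paragraph.
    paragraphs = re.split(r"\n{2,}|\u2029", text)
    # Any newline left over (or LINE SEPARATOR, U+2028) is a mere
    # line break, so fold it and its surrounding space to one space.
    return [
        re.sub(r"\s*[\n\u2028]\s*", " ", p).strip()
        for p in paragraphs
        if p.strip()
    ]

sample = "The quick brown\nfox jumps.\n\nNext paragraph\r\nhere."
for paragraph in unwrap_hard_wrapped(sample):
    print(paragraph)
```

Each returned string could then be handed to an untailored sentence
segmenter, which would no longer see the wrapping newlines at all.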

Richard.

