Sentence_Break, Semi-colons, and Apparent Miscategorization
Mark Davis ☕️ via Unicode
unicode at unicode.org
Thu Mar 8 09:04:44 CST 2018
From the first line, I take it that all three questions concern the
Sentence_Break property values. Namely:
On Thu, Mar 8, 2018 at 9:25 AM, fantasai via Unicode <unicode at unicode.org>
wrote:
> Given that the comma and colon are categorized as SContinue,
> why is the semicolon also not SContinue?
> Also, why is the Greek Question Mark not categorized with
> the rest of the question marks?
As I recall, it is because the semicolon can also represent a Greek
question mark (they are canonically equivalent), so you can't reliably
distinguish between them.
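A minimal sketch of that equivalence, using Python's standard unicodedata
module (the variable names are just for illustration). It only shows that
U+037E normalizes to U+003B under NFC; it says nothing about how any given
segmenter treats either character.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"  # U+037E GREEK QUESTION MARK
SEMICOLON = "\u003b"            # U+003B SEMICOLON

# U+037E has a singleton canonical decomposition to U+003B, so NFC
# replaces it with the plain semicolon.
greek_nfc = unicodedata.normalize("NFC", GREEK_QUESTION_MARK)
same_after_nfc = greek_nfc == SEMICOLON

print(unicodedata.decomposition(GREEK_QUESTION_MARK))
print(same_after_nfc)
```

Once text has been through NFC, the original distinction is gone, which is
why the two cannot be reliably told apart downstream.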
BTW, here is a table of property differences for codepoint X, toNfc(X) (if
a single character) and toNfkc(X) (again, if a single character).
It was a quick dump so no guarantees that all the dots are crossed. It
skips comparing properties that are purposefully different across NFC (like
Decomposition_Mapping) or different code points (like Name or Block), and
most CJK properties (ones starting with 'k').
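For anyone who wants to try a rough version of this comparison themselves,
here is a sketch under stated assumptions. It is not the tool that produced
the table. Python's unicodedata module does not expose Sentence_Break, so
this only compares the handful of properties the standard library offers,
and the helper names (single_char_form, property_diffs) are made up for
this example.

```python
import unicodedata

# Properties available in the standard library; Sentence_Break is not
# among them, so it cannot be compared with this sketch.
PROPERTIES = {
    "General_Category": unicodedata.category,
    "Bidi_Class": unicodedata.bidirectional,
    "East_Asian_Width": unicodedata.east_asian_width,
    "Canonical_Combining_Class": unicodedata.combining,
}

def single_char_form(ch, form):
    """Return the normalized form of ch if it is one different character."""
    mapped = unicodedata.normalize(form, ch)
    if mapped != ch and len(mapped) == 1:
        return mapped
    return None

def property_diffs(ch, form="NFC"):
    """Map property name to (value for ch, value for its normalized form)."""
    mapped = single_char_form(ch, form)
    if mapped is None:
        return {}
    return {
        name: (get(ch), get(mapped))
        for name, get in PROPERTIES.items()
        if get(ch) != get(mapped)
    }

# Example: compare U+037E with its NFC form, then a vertical form with NFKC.
print(property_diffs("\u037e", "NFC"))
print(property_diffs("\ufe13", "NFKC"))
```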
> Why aren't the vertical presentation forms categorized with
> the things they are presenting?
At least some of them are:
U+FE10 ( ︐ ) PRESENTATION FORM FOR VERTICAL COMMA
U+FE11 ( ︑ ) PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
U+FE13 ( ︓ ) PRESENTATION FORM FOR VERTICAL COLON
U+FE31 ( ︱ ) PRESENTATION FORM FOR VERTICAL EM DASH
U+FE32 ( ︲ ) PRESENTATION FORM FOR VERTICAL EN DASH
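A quick way to see which character each of the forms listed above presents
is to look at its NFKC form. The sketch below does that with Python's
unicodedata module; the names VERTICAL_FORMS and presented_character are
made up for this example. It shows the compatibility mapping only, not the
Sentence_Break value of either side.

```python
import unicodedata

# The five vertical presentation forms listed above.
VERTICAL_FORMS = ["\ufe10", "\ufe11", "\ufe13", "\ufe31", "\ufe32"]

def presented_character(ch):
    """Return the NFKC form, i.e. the character the vertical form presents."""
    return unicodedata.normalize("NFKC", ch)

# Map each form to its presented character, both written as U+XXXX.
mapping = {
    f"U+{ord(ch):04X}": f"U+{ord(presented_character(ch)):04X}"
    for ch in VERTICAL_FORMS
}

for form, target in mapping.items():
    print(form, "->", target, unicodedata.name(chr(int(target[2:], 16))))
```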