Word_Break for Hieroglyphs
Michael Everson via Unicode
unicode at unicode.org
Thu Dec 14 08:22:54 CST 2017
On 14 Dec 2017, at 14:14, Mark Davis ☕️ via Unicode <unicode at unicode.org> wrote:
> The Word_Break property doesn't have a value Complex_Context, but I think that was just a typo in your message.
>
> The word break and line break properties for 1,057 [:Script=Egyp:] characters are currently
>
> Word_Break=ALetter
> Line_Break=Alphabetic
>
> Off the top of my head, I think the best course would be to make them both the same as for most of [:Script=Hani:]:
>
> Word_Break=Other
> Line_Break=Ideographic
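As a rough illustration of what this change would mean for line breaking, the sketch below applies only two rules from UAX #14: LB28 (no break between two AL characters) and LB31 (break everywhere else, which is what governs a pair of ID characters). It is a deliberately tiny subset. It assumes the Line_Break classes are already known and ignores spaces, marks, and every other class. The function name is hypothetical.

```python
def break_opportunities(classes):
    """Return indices i where a line break is allowed before classes[i].

    Simplified subset of UAX #14 (not a conforming implementation):
    only LB28 (AL x AL) is modeled; every other pair falls through
    to LB31, which allows a break.
    """
    breaks = []
    for i in range(1, len(classes)):
        before, after = classes[i - 1], classes[i]
        if before == "AL" and after == "AL":
            continue  # LB28: no break between two alphabetic characters
        breaks.append(i)  # LB31: break everywhere else
    return breaks


# Four unspaced hieroglyphs under the current assignment (Line_Break=Alphabetic)
print(break_opportunities(["AL"] * 4))  # []
# The same four under the proposed assignment (Line_Break=Ideographic)
print(break_opportunities(["ID"] * 4))  # [1, 2, 3]
```

Under Alphabetic, an unspaced run of hieroglyphs has no internal break opportunities; under Ideographic, every boundary between two signs becomes one.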
Egyptian is not ideographic and is certainly not fixed-width. CJK does not cluster. Why should you want to make them the same? Moreover, these properties were defined at the beginning, were they not? Bob Richmond and others will certainly have a view on this.
> We would only need to use Complex_Context [:lb=SA:] for scripts that keep some letters together and break others apart (typically needing dictionary lookup). I would suspect for modern use of Egyp, that is not the case;
Please do not “suspect”. It is not hard to ask experts.
> most people would expect the characters to just flow like ideographs, breaking between any pair:
NO. Clusters cannot be broken up just anywhere.
> you wouldn't need to disallow breaks between a <man whose head is hit with an axe> and a <head of hippopotamus>, for example.
>
> Also, I noticed that the 14 Egyp characters with Line_Break≠Alphabetic have Line_Break and General_Category values that seem odd and inconsistent to me.
>
> Line_Break=Close_Punctuation
> General_Category=Other_Letter (8 items)
> Egyptian Hieroglyphs: O. Buildings, parts of buildings, etc. (6 items)
>
> U+1325B EGYPTIAN HIEROGLYPH O006D
> U+1325C EGYPTIAN HIEROGLYPH O006E
> U+1325D EGYPTIAN HIEROGLYPH O006F
> U+13282 EGYPTIAN HIEROGLYPH O033A
> U+13287 EGYPTIAN HIEROGLYPH O036B
> U+13289 EGYPTIAN HIEROGLYPH O036D
> Egyptian Hieroglyphs: V. Rope, fiber, baskets, bags, etc. (2 items)
>
> U+1337A EGYPTIAN HIEROGLYPH V011B
> U+1337B EGYPTIAN HIEROGLYPH V011C
>
> Line_Break=Open_Punctuation
> General_Category=Other_Letter (6 items)
> Egyptian Hieroglyphs: O. Buildings, parts of buildings, etc. (5 items)
>
> U+13258 EGYPTIAN HIEROGLYPH O006A
> U+13259 EGYPTIAN HIEROGLYPH O006B
> U+1325A EGYPTIAN HIEROGLYPH O006C
> U+13286 EGYPTIAN HIEROGLYPH O036A
> U+13288 EGYPTIAN HIEROGLYPH O036C
> Egyptian Hieroglyphs: V. Rope, fiber, baskets, bags, etc. (1 item)
>
> U+13379 EGYPTIAN HIEROGLYPH V011A
These properties were chosen explicitly when Egyptian was first defined. Those are enclosing punctuation characters.
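A minimal sketch of why Open_Punctuation and Close_Punctuation suit enclosing signs, using only three UAX #14 rules: LB13 (no break before CL), LB14 (no break after OP; the optional intervening spaces are ignored here), and LB31 (break everywhere else). This is an illustrative subset, not a conforming line breaker. The function name is hypothetical, and the sequence assumes the interior signs have been reassigned to ID as proposed above.

```python
def enclosure_break_opportunities(classes):
    """Return indices i where a line break is allowed before classes[i].

    Simplified subset of UAX #14 covering only:
      LB13: x CL  (no break before closing punctuation)
      LB14: OP x  (no break after opening punctuation; spaces not modeled)
      LB31: break everywhere else (e.g. between two ID characters)
    """
    breaks = []
    for i in range(1, len(classes)):
        before, after = classes[i - 1], classes[i]
        if after == "CL":
            continue  # LB13: keep a closing enclosure sign with what precedes it
        if before == "OP":
            continue  # LB14: keep an opening enclosure sign with what follows it
        breaks.append(i)  # LB31: break allowed
    return breaks


# Hypothetical sequence: an opening enclosure sign with lb=OP (e.g. U+13258),
# two interior signs assumed to be ID, a closing enclosure sign with lb=CL
# (e.g. U+1325B), then one more sign assumed to be ID.
seq = ["OP", "ID", "ID", "CL", "ID"]
print(enclosure_break_opportunities(seq))  # [2, 4]
```

The OP and CL values keep breaks away from the inner edges of the enclosure, but with the interior signs classed as ID, a break is still permitted between them (index 2).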
Michael Everson.