Word_Break for Hieroglyphs

Michael Everson via Unicode unicode at unicode.org
Thu Dec 14 08:22:54 CST 2017


On 14 Dec 2017, at 14:14, Mark Davis ☕️ via Unicode <unicode at unicode.org> wrote:

> The Word_Break property doesn't have a value Complex_Context, but I think that was just a typo in your message.
> 
> The word break and line break properties for 1,057 [:Script=Egyp:] characters are currently
> 
> Word_Break=ALetter
> Line_Break=Alphabetic
> 
> Off the top of my head, I think the best course would be to make them both the same as for most of [:Script=Hani:]
> 
> Word_Break=Other
> Line_Break=Ideographic

Egyptian is not ideographic and is certainly not fixed-width. CJK does not cluster. Why should you want to make them the same? Moreover, these properties were defined at the beginning, were they not? Bob Richmond and others will certainly have a view on this. 
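
To make concrete what is at stake between the current and the proposed Line_Break values, here is a minimal sketch in Python. It models only a small subset of the UAX #14 pair rules (LB13, LB14, LB28, and the AL-before-OP part of LB30) and treats everything else as a permitted break. The function name and the list-of-classes input are assumptions made for illustration. The sketch says nothing about hieroglyphic clusters or quadrats, which neither property value models.

```python
# Simplified illustration only: a handful of UAX #14 pair rules,
# not a conforming line-break implementation.

def break_opportunities(classes):
    """Return the indices i at which a line break is permitted
    between classes[i - 1] and classes[i].

    `classes` is a list of Line_Break short names such as
    "AL" (Alphabetic), "ID" (Ideographic), "OP", "CL".
    """
    allowed = []
    for i in range(1, len(classes)):
        prev, cur = classes[i - 1], classes[i]
        if cur == "CL":                   # LB13: no break before CL
            continue
        if prev == "OP":                  # LB14: no break after OP
            continue
        if prev == "AL" and cur == "AL":  # LB28: AL x AL
            continue
        if prev == "AL" and cur == "OP":  # LB30 (partial): AL x OP
            continue
        allowed.append(i)                 # otherwise a break is permitted
    return allowed


# Current assignment: a run of hieroglyphs is one unbreakable run.
print(break_opportunities(["AL", "AL", "AL", "AL"]))
# Proposed assignment: a break is permitted between every pair.
print(break_opportunities(["ID", "ID", "ID", "ID"]))
```

Under this simplified model, a run of Alphabetic characters offers no break opportunity at all, while a run of Ideographic characters can break between any two signs, including in the middle of what a reader would treat as a group.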

> We would only need to use Complex_Context [:lb=SA:] for scripts that keep some letters together and break others apart (typically needing dictionary lookup). I would suspect for modern use of Egyp, that is not the case;

Please do not “suspect”. It is not hard to ask experts.

> most people would expect the characters to just flow like ideographs, breaking between any pair:

NO. Clusters cannot be broken up just anywhere. 

> you wouldn't need to disallow breaks between a <man whose head is hit with an axe> and a <head of hippopotamus>, for example.
> 
> Also, I noticed that the 14 Egyp characters with Line_Break≠Alphabetic have Line_Break and General_Category properties that seem odd and inconsistent to me.
> 
> Line_Break=Close_Punctuation
> General_Category=Other_Letter (items: 8)
> Egyptian Hieroglyphs - O. Buildings, parts of buildings, etc. (items: 6)
> 
>  U+1325B	EGYPTIAN HIEROGLYPH O006D
>  U+1325C	EGYPTIAN HIEROGLYPH O006E
>  U+1325D	EGYPTIAN HIEROGLYPH O006F
>  U+13282	EGYPTIAN HIEROGLYPH O033A
>  U+13287	EGYPTIAN HIEROGLYPH O036B
>  U+13289	EGYPTIAN HIEROGLYPH O036D
> Egyptian Hieroglyphs - V. Rope, fiber, baskets, bags, etc. (items: 2)
> 
>  U+1337A	EGYPTIAN HIEROGLYPH V011B
>  U+1337B	EGYPTIAN HIEROGLYPH V011C
> Line_Break=Open_Punctuation
> General_Category=Other_Letter (items: 6)
> Egyptian Hieroglyphs - O. Buildings, parts of buildings, etc. (items: 5)
> 
>  U+13258	EGYPTIAN HIEROGLYPH O006A
>  U+13259	EGYPTIAN HIEROGLYPH O006B
>  U+1325A	EGYPTIAN HIEROGLYPH O006C
>  U+13286	EGYPTIAN HIEROGLYPH O036A
>  U+13288	EGYPTIAN HIEROGLYPH O036C
> Egyptian Hieroglyphs - V. Rope, fiber, baskets, bags, etc. (items: 1)
> 
>  U+13379	EGYPTIAN HIEROGLYPH V011A

These properties were chosen explicitly when Egyptian was first defined. Those are enclosing punctuation characters. 
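
For anyone who wants to inspect the 14 characters listed above, a short Python sketch follows. It relies on the standard `unicodedata` module, which exposes character names and General_Category but not Line_Break, so it can only confirm the Other_Letter half of the pairing; the Line_Break values would have to be read from LineBreak.txt or a library such as ICU. The list names `OPENERS` and `CLOSERS` and the helper `describe` are made up for this illustration.

```python
import unicodedata

# Code points copied from the quoted lists above.
OPENERS = [0x13258, 0x13259, 0x1325A, 0x13286, 0x13288, 0x13379]
CLOSERS = [0x1325B, 0x1325C, 0x1325D, 0x13282, 0x13287, 0x13289,
           0x1337A, 0x1337B]


def describe(cp):
    """Return (U+XXXX, character name, General_Category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))


for label, group in (("open", OPENERS), ("close", CLOSERS)):
    for cp in group:
        print(label, *describe(cp))
```

Each line printed should show the category `Lo`, matching the General_Category=Other_Letter reported in the quoted message.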

Michael Everson.

