Word_Break for Hieroglyphs

Mark Davis ☕️ via Unicode unicode at unicode.org
Thu Dec 14 08:14:31 CST 2017


The Word_Break property doesn't have a value Complex_Context, but I think
that was just a typo in your message.

The word break and line break properties for 1,057 [:Script=Egyp:]
characters are currently

Word_Break=ALetter
Line_Break=Alphabetic

Off the top of my head, I think the best course would be to make them both
the same as for most of [:Script=Hani:]

Word_Break=Other
Line_Break=Ideographic

We would only need to use Complex_Context [:lb=SA:] for scripts that keep
some letters together and break others apart (typically needing dictionary
lookup). I would suspect for modern use of Egyp, that is not the case; most
people would expect the characters to just flow like ideographs,
breaking between any pair: you wouldn't need to disallow breaks between a
<man whose head is hit with an axe> and a <head of hippopotamus>, for
example.
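
To make the practical difference concrete, here is a toy sketch in Python. It is not the UAX #14 algorithm: it applies only a handful of pair rules (no break before CL, no break after OP, no break between two AL, no break between AL and a following OP, break everywhere else), and the function name and the short class labels are my own for illustration.

```python
def break_opportunities(classes):
    """Return the indices i where a line break is allowed between
    classes[i - 1] and classes[i].

    Toy subset of line-breaking pair rules, for illustration only.
    """
    breaks = []
    for i in range(1, len(classes)):
        before, after = classes[i - 1], classes[i]
        if after == "CL":
            continue  # no break before closing punctuation
        if before == "OP":
            continue  # no break after opening punctuation
        if before == "AL" and after in ("AL", "OP"):
            continue  # alphabetic runs stay together, also before an opener
        breaks.append(i)  # otherwise a break is allowed (as between ideographs)
    return breaks


# Four hieroglyphs under the current assignment (Alphabetic):
print(break_opportunities(["AL", "AL", "AL", "AL"]))
# The same four under the suggested assignment (Ideographic):
print(break_opportunities(["ID", "ID", "ID", "ID"]))
```

With AL the run offers no break opportunity at all; with ID every position between two signs is one, which is the "flow like ideographs" behavior described above.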


Also, I noticed that the 14 Egyp characters with Line_Break≠Alphabetic have
Line_Break and General_Category values that seem odd and inconsistent
to me.

Line_Break=Close_Punctuation

General_Category=Other_Letter
items: 8

Egyptian Hieroglyphs
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{Block=Egyptian%20Hieroglyphs}>
 — *O. Buildings, parts of buildings, etc.
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{subhead=O.%20Buildings,%20parts%20of%20buildings,%20etc.}>*
items: 6
 U+1325B <https://unicode.org/cldr/utility/character.jsp?a=1325B> EGYPTIAN HIEROGLYPH O006D
 U+1325C <https://unicode.org/cldr/utility/character.jsp?a=1325C> EGYPTIAN HIEROGLYPH O006E
 U+1325D <https://unicode.org/cldr/utility/character.jsp?a=1325D> EGYPTIAN HIEROGLYPH O006F
 U+13282 <https://unicode.org/cldr/utility/character.jsp?a=13282> EGYPTIAN HIEROGLYPH O033A
 U+13287 <https://unicode.org/cldr/utility/character.jsp?a=13287> EGYPTIAN HIEROGLYPH O036B
 U+13289 <https://unicode.org/cldr/utility/character.jsp?a=13289> EGYPTIAN HIEROGLYPH O036D
Egyptian Hieroglyphs
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{Block=Egyptian%20Hieroglyphs}>
 — *V. Rope, fiber, baskets, bags, etc.
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{subhead=V.%20Rope,%20fiber,%20baskets,%20bags,%20etc.}>*
items: 2
 U+1337A <https://unicode.org/cldr/utility/character.jsp?a=1337A> EGYPTIAN HIEROGLYPH V011B
 U+1337B <https://unicode.org/cldr/utility/character.jsp?a=1337B> EGYPTIAN HIEROGLYPH V011C

Line_Break=Open_Punctuation

General_Category=Other_Letter
items: 6

Egyptian Hieroglyphs
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{Block=Egyptian%20Hieroglyphs}>
 — *O. Buildings, parts of buildings, etc.
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{subhead=O.%20Buildings,%20parts%20of%20buildings,%20etc.}>*
items: 5
 U+13258 <https://unicode.org/cldr/utility/character.jsp?a=13258> EGYPTIAN HIEROGLYPH O006A
 U+13259 <https://unicode.org/cldr/utility/character.jsp?a=13259> EGYPTIAN HIEROGLYPH O006B
 U+1325A <https://unicode.org/cldr/utility/character.jsp?a=1325A> EGYPTIAN HIEROGLYPH O006C
 U+13286 <https://unicode.org/cldr/utility/character.jsp?a=13286> EGYPTIAN HIEROGLYPH O036A
 U+13288 <https://unicode.org/cldr/utility/character.jsp?a=13288> EGYPTIAN HIEROGLYPH O036C
Egyptian Hieroglyphs
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{Block=Egyptian%20Hieroglyphs}>
 — *V. Rope, fiber, baskets, bags, etc.
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{subhead=V.%20Rope,%20fiber,%20baskets,%20bags,%20etc.}>*
items: 1
 U+13379 <https://unicode.org/cldr/utility/character.jsp?a=13379> EGYPTIAN HIEROGLYPH V011A
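
For anyone who wants to check the General_Category side of this locally, a minimal sketch using Python's standard unicodedata module follows. That module exposes General_Category and character names but not Line_Break or Word_Break, so the line-break classes are simply copied from the listing above; the list names and the describe helper are my own.

```python
import unicodedata

# Code points copied from the two lists above.
CLOSE_PUNCTUATION = [0x1325B, 0x1325C, 0x1325D, 0x13282,
                     0x13287, 0x13289, 0x1337A, 0x1337B]
OPEN_PUNCTUATION = [0x13258, 0x13259, 0x1325A, 0x13286, 0x13288, 0x13379]


def describe(code_points):
    """Return (U+XXXX, General_Category, name) for each code point."""
    return [
        (f"U+{cp:04X}", unicodedata.category(chr(cp)), unicodedata.name(chr(cp)))
        for cp in code_points
    ]


for label, group in (("lb=CL", CLOSE_PUNCTUATION), ("lb=OP", OPEN_PUNCTUATION)):
    for code, gc, name in describe(group):
        print(label, code, gc, name)
```

Each row should report gc=Lo (Other_Letter) next to a punctuation line-break class, which is the mismatch in question.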



Mark <https://twitter.com/mark_e_davis>

On Thu, Dec 14, 2017 at 9:09 AM, Richard Wordingham via Unicode <
unicode at unicode.org> wrote:

> Is there any valid reason for Egyptian hieroglyphs to have
> Word_Break=ALetter rather than Complex_Context?  So far as I am aware,
> hieroglyphs lack visible word breaks in both inscriptions and in modern
> transcriptions.
>
> Richard.
>