Word_Break for Hieroglyphs

Andrew Glass via Unicode unicode at unicode.org
Thu Dec 14 12:11:33 CST 2017


We’ve made a lot of progress on Hieroglyphs this year with the addition of the quadrat-forming controls (thanks again to everyone involved in that effort and in the preceding 13 documents). I like to think that that part of the model is no longer in flux. Certainly, there is more work to be done on correct breaking. At this point we know that quadrat breaks != word breaks, but quadrat boundaries must align with line breaks. We had some discussion on the sidelines of the August UTC meeting, at which time it became clear that more work is needed, as the current property values are not entirely correct.

Currently, my Hieroglyphic energies are focused on completing font documentation and a reference font. I think it will be most helpful to understand the properties when we have a font that fully supports the quadrat controls, so that we have specific examples we can look at and discuss with specialists. So I’m happy to take Ken’s suggestion that we don’t rush in here.

Cheers,

Andrew

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Ken Whistler via Unicode
Sent: Thursday, December 14, 2017 8:27 AM
To: mark <mark at macchiato.com>
Cc: unicode at unicode.org
Subject: Re: Word_Break for Hieroglyphs


Gentlemen,

On 12/14/2017 6:53 AM, Mark Davis ☕️ via Unicode wrote:
Thus I would like people who are both knowledgeable about hieroglyphs and Unicode properties to weigh in. I know that people like Andrew Glass are on this list, who satisfy both criteria.

And what constitutes a cluster?

This entire discussion is premature. The model for Egyptian is in flux right now. What constitutes a "quadrat", which is significantly relevant to any determination of how other segmentation properties should work for Egyptian hieroglyphics, will depend on the details of the model and how quadrat formation interacts with the exact set of format controls eventually agreed upon. See:

http://www.unicode.org/L2/L2017/17112r-quadrat-encoding.pdf

(And please note that that has a reference list of 13 *other* documents. This is not simple stuff.)

When we get closure on the Egyptian model, *then* will be the time to make suggestions for how Egyptian values for GCB, WB, and LB might be adjusted for possible better default behavior.

--Ken