Word_Break for Hieroglyphs

Richard Wordingham via Unicode unicode at unicode.org
Wed Dec 20 02:46:33 CST 2017


On Mon, 18 Dec 2017 15:15:11 +0100
Serge Rosmorduc via Unicode <unicode at unicode.org> wrote:

> Hence, you have things like (lines 5-6): the word ẖsy « small »,
> is cut between the two lines. The phonetic part is line 5, and the
> bird determinative is alone on line 5, above the preposition « m »,
> which is itself above the consonant « m » which is the first
> consonant of the following word. I have written the three words in
> different colours to display their intrication.

In an implementation that offered genuine whole word selection, and
thus tackled the challenges of Chinese, Japanese, Korean and
Vietnamese (both scripts, not just CJKV) as well as Thai, I would
expect the selections to be bounded by word boundaries.  Thus, if the
cited line break (labelled by '6') were not in the text, I would expect
double-clicking on the quadrat G37:Aa13:Aa13 to select all three words.

Looking at the rendering in
https://mjn.host.cs.st-andrews.ac.uk/egyptian/texts/corpus/pdf/Merneptah.pdf,
it is worth noting that the cartouche in Line 4 of the inscription is
not broken between lines.  I don't know whether this is to avoid
breaking the cartouche or to avoid separating the facing figure therein.

Richard.


