Word_Break for Hieroglyphs

Martin J. Dürst via Unicode unicode at unicode.org
Wed Dec 20 03:06:28 CST 2017

On 2017/12/20 17:46, Richard Wordingham via Unicode wrote:

> In an implementation that offered genuine whole word selection, and
> thus tackled the challenges of Chinese, Japanese, Korean and
> Vietnamese (both scripts, not just CJKV) as well as Thai, I would
> expect the selections to be bounded by word boundaries.  Thus, if the
> cited line break (labelled by '6') were not in the text, I would expect
> double-clicking on the quadrat G37:Aa13:Aa13 to select all three words.

This may be common knowledge to some, but I just had a Japanese document 
open in MS Word and tried double-clicking to see what happens. What it 
does is select same-script runs: a run of kanji, a run of hiragana, or 
a run of katakana (interestingly, the (kata)kana length mark is treated 
as a fourth script). This is of course not the same as selecting words, 
but the two can coincide, and it offers reasonable editorial convenience 
while being easy to implement.
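
To illustrate how little it takes, here is a minimal sketch of same-script
run selection in Python. It only mimics the behaviour I observed and is not
MS Word's actual implementation. The function names (script_class,
select_run) are hypothetical, and the code point ranges are a rough
approximation based on Unicode blocks, not the full Script property.

```python
from typing import Tuple

def script_class(ch: str) -> str:
    """Classify a character into a coarse script class (assumed ranges)."""
    cp = ord(ch)
    if cp == 0x30FC:
        # KATAKANA-HIRAGANA PROLONGED SOUND MARK, kept as its own class
        # to mirror the observed "fourth script" behaviour.
        return "length-mark"
    if 0x3041 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
        return "kanji"
    # Everything else (Latin, digits, spaces, punctuation) is lumped together.
    return "other"

def select_run(text: str, index: int) -> Tuple[int, int]:
    """Return (start, end) of the same-class run containing text[index]."""
    cls = script_class(text[index])
    start = index
    while start > 0 and script_class(text[start - 1]) == cls:
        start -= 1
    end = index + 1
    while end < len(text) and script_class(text[end]) == cls:
        end += 1
    return start, end

if __name__ == "__main__":
    sample = "東京タワーに行く"
    s, e = select_run(sample, 0)   # "double-click" on the first kanji
    print(sample[s:e])
```

Double-clicking anywhere in the kanji run at the start of the sample selects
just that run, and the length mark ends up as a run of its own, separate from
the katakana before it. A real implementation would want the Unicode Script
property instead of hard-coded ranges.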

Regards,   Martin.
