Word_Break for Hieroglyphs
Martin J. Dürst via Unicode
unicode at unicode.org
Wed Dec 20 03:06:28 CST 2017
On 2017/12/20 17:46, Richard Wordingham via Unicode wrote:
> In an implementation that offered genuine whole word selection, and
> thus tackled with the challenges of Chinese, Japanese, Korean and
> Vietnamese (both scripts, not just CJKV) as well as Thai, I would
> expect the selections to be bounded by word boundaries. Thus, if the
> cited line break (labelled by '6') were not in the text, I would expect
> double-clicking on the quadrat G37:Aa13:Aa13 to select all three words.
This may be common knowledge to some, but I just had a Japanese document
open in MS Word, and tried double-clicking to see what happens. What it
does is select same-script runs. This means that a run of kanji, a run
of hiragana, or a run of katakana (interestingly, the (kata)kana length
mark is treated as a fourth script) is selected. This is of course not
the same as words, but it can match, and it comes close in terms of
offering something for editorial convenience while being easy to implement.
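
To make the "easy to implement" point concrete, here is a minimal sketch of same-script run selection. It is only an illustration of the behaviour described above, not MS Word's actual code, which I have not seen. The names script_class and select_run are made up for this example, and the script classes are approximated with code point ranges rather than the real Unicode Script property. Everything outside those ranges is lumped into one "other" class, which is a simplifying assumption.

```python
def script_class(ch):
    """Rough script class of one character, from code point ranges.

    Assumption: this approximates the Script property with block ranges.
    The kana length mark (U+30FC) gets its own class, mirroring the
    observed behaviour of being treated as a separate "script".
    """
    cp = ord(ch)
    if cp == 0x30FC:                 # KATAKANA-HIRAGANA PROLONGED SOUND MARK
        return "length-mark"
    if 0x3041 <= cp <= 0x309F:       # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:       # Katakana block (U+30FC handled above)
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:       # CJK Unified Ideographs (basic block only)
        return "kanji"
    return "other"                   # everything else, lumped together


def select_run(text, index):
    """Return (start, end) of the same-class run containing text[index]."""
    cls = script_class(text[index])
    start = index
    while start > 0 and script_class(text[start - 1]) == cls:
        start -= 1
    end = index + 1
    while end < len(text) and script_class(text[end]) == cls:
        end += 1
    return start, end


text = "東京でコーヒーを飲む"
start, end = select_run(text, 0)     # "double-click" on the first kanji
selected = text[start:end]           # "東京"
```

With this sketch, clicking on the first katakana of コーヒー selects only コ, because the length mark that follows starts a new run. That is the kind of place where the result does not match a word.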
Regards, Martin.