Chinese Word Breaking
Richard Wordingham
richard.wordingham at ntlworld.com
Tue Jul 21 18:33:34 CDT 2015
On Tue, 21 Jul 2015 18:10:14 +0800
gfb hjjhjh <c933103 at gmail.com> wrote:
> When you write text in modern Chinese, there are no breaks between
> words, so if you segment on runs of ideographic characters, what
> gets grouped together is either a clause or a sentence, or even a
> whole paragraph if you are handling older text without punctuation.
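To make the quoted point concrete, here is a minimal Python sketch. It
is only an illustration, not any standard algorithm: the function name
han_runs and the sample sentence are made up for this example, and the
character class covers only the basic CJK Unified Ideographs block
(U+4E00..U+9FFF), which is a simplifying assumption.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is treated as
# "ideographic"; extension blocks and compatibility ideographs are ignored.
_HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def han_runs(text: str) -> list[str]:
    """Return maximal runs of consecutive ideographic characters."""
    return _HAN_RUN.findall(text)


# Hypothetical sample: two clauses separated by fullwidth punctuation.
sample = "我喜欢学习中文，因为它很有趣。"

for run in han_runs(sample):
    # Each run is a whole clause, not a word: no word boundaries are found.
    print(len(run), run)
```

Only the punctuation splits the text, so each run is a complete clause.
Finding the words inside a run needs a dictionary or a statistical
segmenter.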
I had another look at Chinese word breaking algorithms today and saw
that their practical purposes were mostly indexing and machine
translation. Consequently, I suspect that authors have little
incentive to mark word boundaries in the texts they originate. This
differs from the Thai situation where marking word boundaries improves
layout and spell-checking.
Richard.