Chinese Word Breaking

Richard Wordingham richard.wordingham at ntlworld.com
Tue Jul 21 18:33:34 CDT 2015


On Tue, 21 Jul 2015 18:10:14 +0800
gfb hjjhjh <c933103 at gmail.com> wrote:

> When you write text in modern Chinese, there are no breaks between
> words, so if you segment the text into runs of ideographic
> characters, what gets grouped together is a clause or a sentence, or
> even a whole paragraph if you are handling older text without
> punctuation.
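
To make the quoted point concrete, here is a minimal sketch in Python.
It assumes that "ideographic" can be approximated by the basic CJK
Unified Ideographs block (U+4E00..U+9FFF); the helper name han_runs and
the sample sentence are illustrative only and are not taken from any
library or from the original message.

```python
import re

# Maximal runs of characters from the basic CJK Unified Ideographs block.
# Assumption: this single block stands in for "ideographic characters";
# extension blocks and compatibility ideographs are ignored for brevity.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def han_runs(text):
    """Return the maximal runs of consecutive ideographs found in text."""
    return HAN_RUN.findall(text)


# Illustrative sentence: "I went to the library today, then went home."
sample = "我今天去了图书馆，然后回家。"

# Only the punctuation breaks the text, so each run is a whole clause.
print(han_runs(sample))
```

Each run returned for the sample is a clause rather than a word, so
finding the words inside a run needs a dictionary or a statistical
model on top of this.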

I had another look at Chinese word breaking algorithms today and saw
that their practical purposes were mostly indexing and machine
translation.  Consequently, I suspect that authors have little
incentive to mark word boundaries in the texts they originate.  This
differs from the Thai situation, where marking word boundaries improves
layout and spell-checking.

Richard.

