Mark-up to Indicate Words

Martin J. Dürst duerst at it.aoyama.ac.jp
Wed Jul 15 06:18:09 CDT 2015


Hello Richard,

On 2015/07/15 16:49, Richard Wordingham wrote:
> What mark-up schemes exist to show that a sequence of letters and
> combining marks constitutes a single word?
>
> Such mark-up would be useful when using spell checkers. At present, I
> use U+2060 WORD JOINER (WJ) to indicate the absence of a word boundary.
> (Systematic marking of boundaries using ZWSP is not popular with
> users, and is normally not used in Thai - it's not supported in
> their national or Windows 8-bit encodings.) However, it seems likely
> that when Unicode 8.00 is defined in August, WJ will suppress line
> breaks but not word breaks.  There would still be the limitation that
> mark-up is not available in plain text.
>
> It appears that, for example, Open Document Format has no mark-up to
> indicate word boundaries, relying instead on the overrides of
> the word boundary detection algorithms being stored at character level.

I'd suggest looking at higher-end formats such as DITA or TEI (Text 
Encoding Initiative).
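
As an illustration of the TEI route: TEI provides a <w> (word) element
that can wrap each word explicitly, so the word boundaries live in the
mark-up instead of in characters such as WJ or ZWSP. The Python sketch
below assumes the text has already been segmented into words by some
other means. The helper names mark_words and plain_text are hypothetical
and not part of any TEI tooling, and the enclosing <s> element is used
only as a convenient wrapper for parsing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def mark_words(words):
    """Wrap each pre-segmented word in a TEI-style <w> element.

    No separator is inserted between elements, which suits scripts
    such as Thai that do not put spaces between words.
    """
    return "".join(f"<w>{escape(word)}</w>" for word in words)


def plain_text(marked):
    """Recover the unmarked text by dropping the <w> tags."""
    # Wrap the fragment in a single root element so it parses as XML.
    root = ET.fromstring(f"<s>{marked}</s>")
    return "".join(root.itertext())


if __name__ == "__main__":
    marked = mark_words(["ภาษา", "ไทย"])
    print(marked)
    print(plain_text(marked))
```

A spell checker that understands the mark-up could then treat the
content of each <w> element as one word, and the plain text remains
recoverable by stripping the tags.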

Regards,   Martin.

> Richard.

