Hyphenation

Jukka K. Korpela jkorpela at cs.tut.fi
Wed Feb 4 12:57:38 CST 2015


2015-02-04, 19:58, Cameron Dutro wrote:

> It is often the case, especially on smaller screens, that long words
> must be hyphenated so they wrap in a natural way. As far as I can tell,
> the CLDR data set does not define hyphenation rules.

That is correct. Moreover, hyphenation rules cannot really be expressed 
with the techniques that CLDR currently uses.

> I'm not even really
> sure what the hyphenation rules should be for English.

They vary by variety of English (and by authority).

> The implementation I've seen uses a dictionary - maybe it's identifying
> potential breaks at syllable boundaries?

Some simple hyphenators are dictionary-driven. But this approach works 
poorly even for English, since any word not in the dictionary remains 
unhyphenated. It works very poorly for languages that have, say, a 
thousand inflected forms for each verb or noun but may have simple 
algorithmic rules for hyphenation.
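To make the weakness of the dictionary approach concrete, here is a minimal sketch of a dictionary-driven hyphenator. The dictionary contents and the function name are hypothetical, chosen only for illustration: a word that is listed gets its stored break points, while anything else, including a simple inflected form of a listed word, gets none.

```python
# Hypothetical hand-made dictionary: each entry lists the offsets at which
# the word may be broken. Real hyphenation dictionaries are far larger.
HYPHEN_POINTS = {
    "hyphenation": [2, 6],  # hy-phen-ation
}


def hyphenate_by_dictionary(word: str) -> list[str]:
    """Split a word at its dictionary break points; unknown words stay whole."""
    points = HYPHEN_POINTS.get(word.lower())
    if points is None:
        # Any word missing from the dictionary, including an inflected
        # form of a listed word, gets no break points at all.
        return [word]
    bounds = [0, *points, len(word)]
    # Slice the original word so its capitalization is preserved.
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


print("-".join(hyphenate_by_dictionary("hyphenation")))   # hy-phen-ation
print("-".join(hyphenate_by_dictionary("hyphenations")))  # hyphenations
```

The plural "hyphenations" is not hyphenated at all. For a language with a thousand forms per noun, every one of them would need its own entry.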

Hyphenation strategies vary greatly by language. At present, the best 
you can do is to try to find suitable hyphenation software for the 
languages that are relevant to you.
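As an illustration of an algorithmic alternative, TeX hyphenates with Frank Liang's pattern method: a language-independent algorithm driven by a language-specific set of patterns. The sketch below is a simplified version of that technique. The four patterns are toy examples invented here, not TeX's real English patterns, and the function and parameter names are hypothetical. Each pattern puts digits between letters; at every gap in the word the highest matching digit wins, and an odd value allows a break while an even value forbids one.

```python
import re


def parse_pattern(pattern):
    """Turn a Liang-style pattern like 'a2t' into ('at', [0, 2, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)  # value for the gap before letter `pos`
        else:
            pos += 1
    return letters, values


# Toy patterns invented for this example; they are NOT taken from TeX's
# real English pattern set, which has thousands of entries.
PATTERNS = dict(parse_pattern(p) for p in ["y1p", "n1a", "a2t", "1tio"])


def hyphenate_by_patterns(word, patterns, left_min=2, right_min=2):
    """Split `word` wherever the highest matching pattern value is odd."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    points = [0] * (len(padded) + 1)   # points[p]: gap before padded[p]
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            values = patterns.get(padded[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    # A break before word[w] is the gap before padded[w + 1]; odd = allowed.
    # left_min/right_min keep very short fragments from being split off.
    for w in range(left_min, len(word) - right_min + 1):
        if points[w + 1] % 2 == 1:
            pieces.append(word[start:w])
            start = w
    pieces.append(word[start:])
    return pieces


print("-".join(hyphenate_by_patterns("hyphenation", PATTERNS)))   # hy-phen-ation
print("-".join(hyphenate_by_patterns("hyphenations", PATTERNS)))  # hy-phen-ations
```

Unlike the dictionary lookup, the same patterns also handle the unseen plural. Here "a2t" outweighs "1tio", so no break is made before "tion". Supporting another language means supplying another pattern set rather than changing the algorithm.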

Yucca





More information about the CLDR-Users mailing list