Vietnamese Collation Wrong for Polysyllables
Richard Wordingham via CLDR-Users
cldr-users at unicode.org
Thu Jan 11 14:51:51 CST 2018
On Thu, 11 Jan 2018 10:03:31 +0100
Mark Davis ☕️ via CLDR-Users <cldr-users at unicode.org> wrote:
> The first question I'd have is whether the lexical ordering described
> in https://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks is
> expected by average Vietnamese. We have seen before cases where a
> formal government specification (French accent ordering) is expected
> by nobody outside of a small group of mavens.
How small is small? The official Thai way seems not to be how Thais
usually compare manually!
> Assuming it is required ...
>
> This may just be a case where the UCA doesn't work well enough without
> preprocessing. The standard does allow for such preprocessing, and the
> question is how to allow for that in CLDR data. One way I can think
> of is by allowing a transform for text that is applied before sorting to
> be specified in the UCA description.
That method may assist greatly with other mainland SE Asian languages
that compare syllable by syllable - Lao, Tai Lue and Burmese at least.
Indeed, it might make the Lao CCVT syllable-based ordering (i.e.
initial, final, vowel and then tone) much easier to implement, as well
as greatly simplifying the Lao CVCT order I was struggling to implement
in March 2017.
> It appears that
> Vietnamese syllables are well structured, which could allow a
> relatively simple transform to do the job, along the lines of what
> you are suggesting.
In vi.xml, I find "St. Barthélemy", which is a little worrying. Even
more important are words like "a-xít" or "axít" 'acid' and several other
partially assimilated words, such as "phốtpho" 'phosphorus'.
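
For concreteness, here is a rough Python sketch of the sort of
syllable-by-syllable key such a preprocessing step might produce. It is
only an illustration under stated assumptions, not CLDR data or an ICU
API: the names TONE_MARKS, ALPHABET, syllable_key and sort_key are
hypothetical, syllables are assumed to be separated by whitespace, the
tone order is taken to be a à ả ã á ạ, and case is simply folded away.

```python
import unicodedata

# Assumed tone order: no mark, grave, hook above, tilde, acute, dot below.
TONE_MARKS = "\u0300\u0309\u0303\u0301\u0323"

# Assumed alphabet order; normalized so lookups match NFC base letters.
ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêghiklmnoôơpqrstuưvxy")


def syllable_key(syllable):
    """Return (letters, tone) for one whitespace-delimited syllable."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = 0
    base_chars = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            # Record the tone and drop the mark from the base letters.
            tone = TONE_MARKS.index(ch) + 1
        else:
            base_chars.append(ch)
    # Recompose so that â, ă, ê, ô, ơ, ư are single letters again.
    base = unicodedata.normalize("NFC", "".join(base_chars))
    letters = tuple(
        # Letters outside the assumed alphabet sort after it, by code point.
        (0, ALPHABET.index(ch)) if ch in ALPHABET else (1, ord(ch))
        for ch in base
    )
    return (letters, tone)


def sort_key(text):
    """Compare syllable by syllable: letters first, then tone, per syllable."""
    return tuple(syllable_key(s) for s in text.split())


if __name__ == "__main__":
    print(sorted(["tuần chay", "tuân thủ"], key=sort_key))
```

With this key the tone difference in the first syllable is decided
before the letters of the second syllable are looked at. It does nothing
sensible for "a-xít", "axít" or "phốtpho", where the syllable breaks are
not marked by spaces, which is exactly the worry above.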
> In any event, we'd want to involve Vietnamese experts before going any
> further.
Glad to have someone else do the work!
Richard.