What constitutes an abstract character?

Kent Karlsson kent.b.karlsson at bahnhof.se
Fri Jun 19 18:06:34 CDT 2020

> On 15 June 2020 at 17:04, Peter Constable via Unicode <unicode at unicode.org> wrote:
> Unicode doesn’t give one answer since there’s more than one way that might be appropriate to answer it.
> […] An Old Hangul syllable might have a count of 1, 2 or 3, depending on the syllable.

A bit peripheral to this thread, but:

1) There is no need to limit that to Old Hangul; it is equally valid for Modern Hangul. The only difference is that for SOME Old Hangul syllables there is no (canonically equivalent) single-character form. That is for historical encoding reasons, nothing deep. Hindsight is (now) nowhere near a sufficient reason to radically change the encoding. (It was considered sufficient reason long ago, which resulted in the “Hangul mess” in Unicode...)
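As a minimal sketch of point 1, assuming Python's standard `unicodedata` module: the function name `code_point_counts` and the three sample syllables are illustrative choices, not anything from the thread. It compares the code point count of each syllable in its composed (NFC) and decomposed (NFD) forms.

```python
import unicodedata

def code_point_counts(text):
    """Return (length in NFC, length in NFD), counted in code points."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    return len(nfc), len(nfd)

# Sample syllables (illustrative choices):
modern_open = "\uAC00"    # modern syllable GA, no final consonant
modern_closed = "\uD55C"  # modern syllable HAN, with a final consonant
old = "\u1100\u119E"      # choseong KIYEOK + jungseong ARAEA (Old Hangul)

for label, syllable in [("modern, open", modern_open),
                        ("modern, closed", modern_closed),
                        ("old", old)]:
    print(label, code_point_counts(syllable))
```

The modern syllables come out as 1 code point composed and 2 or 3 decomposed, while the Old Hangul sequence stays at 2 in both forms because it has no precomposed equivalent.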

2) For (I guess) practical reasons, clusters of consonants and clusters of vowels are treated as single indivisible entities. However, Hangul is an alphabetic script, and its basic letter inventory has no consonant or vowel “clusters”; each cluster consists of one to three letters. So the (canonical) decomposition into at most three components is also an artifact of the encoding: a Hangul syllable can often consist of more than three Hangul letters. And no, the compatibility decompositions of the Hangul Jamo are of no help; they are basically wrong for Hangul. DO NOT USE! Completely different decompositions are needed to decompose into the letters originally designed for the script. Furthermore, the consonants are (basically) double encoded, but that is for technical encoding reasons: there are not really two different consonants of each kind, just two different positions in a syllable.
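Point 2 can be illustrated with a rough sketch, again assuming Python's `unicodedata` module. The `LETTERS` table and the `to_letters` helper are hypothetical and hand-written for this one example; they are not Unicode data, and treating A as a single letter is a simplifying assumption. The syllable U+AE4E decomposes canonically into three jamo, two of which are doubled consonants.

```python
import unicodedata

# Hypothetical hand-written table (NOT from the Unicode Character Database):
# maps a few conjoining jamo to the alphabet letters they are built from.
LETTERS = {
    "\u1100": ("KIYEOK",),           # choseong (initial) kiyeok
    "\u1101": ("KIYEOK", "KIYEOK"),  # choseong ssangkiyeok
    "\u1161": ("A",),                # jungseong a (assumed to be one letter)
    "\u11A8": ("KIYEOK",),           # jongseong (final) kiyeok
    "\u11A9": ("KIYEOK", "KIYEOK"),  # jongseong ssangkiyeok
}

def to_letters(syllable):
    """Decompose to jamo with NFD, then to letters via the table above."""
    letters = []
    for jamo in unicodedata.normalize("NFD", syllable):
        letters.extend(LETTERS[jamo])
    return letters

syllable = "\uAE4E"
nfd = unicodedata.normalize("NFD", syllable)
nfkd = unicodedata.normalize("NFKD", syllable)

print([f"U+{ord(c):04X}" for c in nfd])  # three jamo, clusters left intact
print(nfd == nfkd)                       # NFKD splits the clusters no further
print(to_letters(syllable))              # five letters under the assumed table

# The same consonant is encoded once per syllable position.
print(unicodedata.name("\u1100"))
print(unicodedata.name("\u11A8"))

# A compatibility jamo maps to one position-specific jamo, not to letters.
print(f"U+{ord(unicodedata.normalize('NFKD', chr(0x3132))):04X}")
```

Neither normalization form reaches the letter level; a separate table of the kind assumed here is needed for that.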

This just shows that the mapping from “abstract characters” (in this example, the letters of the Hangul alphabet) to encoded characters can sometimes be non-trivial.

/Kent Karlsson
