What constitutes an abstract character?

Peter Constable pgcon6 at msn.com
Mon Jun 15 10:04:59 CDT 2020


Unicode doesn’t give a single answer, since there is more than one appropriate way to count.

You might want a count of Unicode code points. If a buffer contained a UTF-32 sequence, that would be the same as the buffer's length in bytes divided by 4. (Counting in UTF-16 or UTF-8 requires walking the sequence, obviously.) It would also mean that the text element _a-diaeresis_ could have a count of 1 in some cases but a count of 2 in other cases. An Old Hangul syllable might have a count of 1, 2 or 3, depending on the syllable.
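As a rough illustration of the code point count, here is a minimal Python sketch. The function names are made up for this example, and it assumes well-formed, BOM-less input in each encoding form; it does no validation beyond a length check.

```python
def count_code_points_utf32(buf: bytes) -> int:
    """UTF-32: every code point is exactly 4 bytes, so just divide."""
    if len(buf) % 4:
        raise ValueError("UTF-32 buffer length must be a multiple of 4")
    return len(buf) // 4


def count_code_points_utf8(buf: bytes) -> int:
    """UTF-8: walk the bytes, skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in buf if (b & 0xC0) != 0x80)


def count_code_points_utf16le(buf: bytes) -> int:
    """UTF-16LE: walk the code units, skipping trailing (low) surrogates."""
    count = 0
    for i in range(0, len(buf) - 1, 2):
        unit = buf[i] | (buf[i + 1] << 8)
        if not 0xDC00 <= unit <= 0xDFFF:
            count += 1
    return count


precomposed = "\u00E4"             # a-diaeresis as a single code point
decomposed = "a\u0308"             # a + COMBINING DIAERESIS
old_hangul = "\u1112\u119E\u11AB"  # an Old Hangul syllable as three conjoining jamo

for text in (precomposed, decomposed, old_hangul):
    print(
        count_code_points_utf32(text.encode("utf-32-le")),
        count_code_points_utf8(text.encode("utf-8")),
        count_code_points_utf16le(text.encode("utf-16-le")),
    )
```

All three functions should agree for any given string; only the amount of work needed to get the number differs.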

You might want a count of NFC-composable entities. In that case, _a-diaeresis_ would always have a count of 1, but an Old Hangul syllable would have a count of 1, 2 or 3 depending on the syllable.

You might want a count of grapheme clusters, as defined in UAX #29. In that case, _a-diaeresis_ or any Old Hangul Syllable would always have a count of 1.
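
To make the difference between the NFC count and the grapheme cluster count concrete, here is a hedged Python sketch. The NFC count uses the standard library's unicodedata.normalize. The grapheme cluster count is only an approximation written for this example: it covers the Hangul rules (GB6 to GB8) and treats any character with a Mark general category, plus ZWJ, as extending the previous cluster. It is not a full UAX #29 implementation (no handling of CR LF, controls, Prepend, emoji sequences or regional indicators), and the function names are hypothetical.

```python
import unicodedata


def count_after_nfc(text: str) -> int:
    """Count code points after normalizing to NFC."""
    return len(unicodedata.normalize("NFC", text))


def hangul_type(ch: str):
    """Hangul_Syllable_Type by code point range (None for non-Hangul)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None


def joins_previous(prev: str, cur: str) -> bool:
    """Approximate 'no break between prev and cur' test (assumption: simplified rules)."""
    # Rough stand-in for GB9/GB9a: marks and ZWJ attach to what precedes them.
    if unicodedata.category(cur).startswith("M") or cur == "\u200D":
        return True
    p, c = hangul_type(prev), hangul_type(cur)
    if p == "L" and c in ("L", "V", "LV", "LVT"):  # GB6
        return True
    if p in ("LV", "V") and c in ("V", "T"):       # GB7
        return True
    if p in ("LVT", "T") and c == "T":             # GB8
        return True
    return False


def count_grapheme_clusters_approx(text: str) -> int:
    """Count clusters by counting positions where a new cluster starts."""
    count = 0
    prev = None
    for ch in text:
        if prev is None or not joins_previous(prev, ch):
            count += 1
        prev = ch
    return count


samples = {
    "precomposed a-diaeresis": "\u00E4",
    "decomposed a-diaeresis": "a\u0308",
    "Old Hangul syllable (3 jamo)": "\u1112\u119E\u11AB",
}
for name, text in samples.items():
    print(name, len(text), count_after_nfc(text), count_grapheme_clusters_approx(text))
```

For real work, a library that implements UAX #29 in full (ICU's break iterators, for instance) would be a safer choice than the approximation above.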

Which way to count depends on one’s purpose for counting.


Peter

From: Unicode <unicode-bounces at unicode.org> On Behalf Of Slawomir Osipiuk via Unicode
Sent: Monday, June 15, 2020 7:33 AM
To: 'Corentin' <corentin.jabot at gmail.com>; 'unicode Unicode Discussion' <unicode at unicode.org>
Subject: RE: What constitutes an abstract character?

I believe the underlying question is:
How does one programmatically identify and/or count the abstract characters in a Unicode text?

Sławomir Osipiuk