Script-Specific Composition Exclusions

Charlotte Eiffel Lilith Buff irgendeinbenutzername at gmail.com
Sun Oct 20 03:43:18 CDT 2024


This question on Stack Overflow
<https://stackoverflow.com/questions/79104685/in-unicode-why-%e0%a5%98-is-excluded-from-composition-whereas-%c3%85-is-not>
sent me on a wild Google spree yesterday trying to find the reason why
certain characters are included in the Composition_Exclusion set,
particularly the Devanagari, Bengali, Gurmukhi, and Oriya letters with
nukta, but I wasn’t able to locate any relevant documents from the time
these exclusions were decided.

As I understand it (and I believe this was even the wording used in
previous versions of UAX #15), the script-specific exclusions exist because
for a handful of characters the fully decomposed form is the preferred
representation in regular usage. This makes sense to me for the precomposed
Hebrew letters because with so many combining marks with unique CCC values,
it just seems easier to deal exclusively with combining character sequences
and not have some random marks “glue” themselves to the base letter. The
two-part Tibetan subjoined letters are similar in this regard.

However, the Indic nuktas seem entirely unproblematic and in fact not all
precomposed letters with nukta are composition-excluded: Devanagari has ऩ,
ऱ, and ऴ for example.
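
For anyone who wants to see the asymmetry directly, here is a minimal Python
sketch, assuming the standard library's unicodedata module (whose data tracks
whatever Unicode version the interpreter ships with). The helper name
is_recomposed is just for illustration; it decomposes a precomposed character
with NFD and checks whether NFC puts it back together.

```python
import unicodedata

def is_recomposed(ch: str) -> bool:
    """True if NFC(NFD(ch)) yields the original precomposed character."""
    decomposed = unicodedata.normalize("NFD", ch)
    return unicodedata.normalize("NFC", decomposed) == ch

samples = [
    "\u00C5",  # LATIN CAPITAL LETTER A WITH RING ABOVE
    "\u0929",  # DEVANAGARI LETTER NNNA (NA + nukta)
    "\u0931",  # DEVANAGARI LETTER RRA (RA + nukta)
    "\u0934",  # DEVANAGARI LETTER LLLA (LLA + nukta)
    "\u0958",  # DEVANAGARI LETTER QA (KA + nukta)
    "\uFB2A",  # HEBREW LETTER SHIN WITH SHIN DOT
    "\u0F43",  # TIBETAN LETTER GHA
]

for ch in samples:
    status = "recomposed" if is_recomposed(ch) else "stays decomposed"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}")
```

The first four characters should come back as single code points under NFC,
while U+0958, U+FB2A, and U+0F43 should stay as combining sequences, which is
exactly the inconsistency among the nukta letters I am asking about.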

Does anyone remember what led to these specific decisions, or know where
to find the relevant documents if they exist?