<div dir="ltr">Excellent research! Thanks a lot!<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Am So., 20. Okt. 2024 um 16:14 Uhr schrieb Robin Leroy <<a href="mailto:egg.robin.leroy@gmail.com">egg.robin.leroy@gmail.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Le dim. 20 oct. 2024 à 10:48, Charlotte Eiffel Lilith Buff via Unicode <<a href="mailto:unicode@corp.unicode.org" target="_blank">unicode@corp.unicode.org</a>> a écrit :</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">As I understand it (and I believe this was even the wording used in previous versions of UAX #15), the script-specific exclusions exist because for a handful of characters the fully decomposed form is the preferred representation in regular usage. This makes sense to me for the precomposed Hebrew letters because with so many combining marks with unique CCC values, it just seems easier to deal exclusively with combining character sequences and not have some random marks “glue” themselves to the base letter. 
The two-part Tibetan subjoined letters are similar in this regard.</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">However, the Indic nuktas seem entirely unproblematic and in fact not all precomposed letters with nukta are composition-excluded: Devanagari has ऩ, ऱ, and ऴ for example.<br><br></blockquote><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Does anyone remember what led to these specific decisions or know where to find the relevant documents if they exist?<br></div></div></blockquote><div>I certainly wasn’t involved in Unicode when the relevant documents were discussed, as I was busy learning the letters in the Basic Latin block¹, but I looked at some of them a couple of years ago.</div><div><ul><li>Revision 9 of then-DUTR² #15 <a href="https://www.unicode.org/reports/tr15/tr15-9.html" target="_blank">https://www.unicode.org/reports/tr15/tr15-9.html</a>, dated 1998-11-23, and entered into the <a href="https://www.unicode.org/L2/L1998/Register-1998.html" target="_blank">registry</a> as L2/98-404, does not mention composition exclusions.</li><li>The first revision (10) that mentions characters <i>excluded from being primary composites</i> is <a href="https://www.unicode.org/reports/tr15/tr15-10.html#Definitions" target="_blank">https://www.unicode.org/reports/tr15/tr15-10.html#Definitions</a>, dated 1998-12-16. 
The stated rationale is indeed: <i>This would be to match common practice for scripts that use fully decomposed forms.</i> The sole example given is FB31.</li><li>The next revision (11) includes a list of composition exclusions: <a href="https://www.unicode.org/reports/tr15/tr15-11.html#Primary%20Exclusion%20List%20Table" target="_blank">https://www.unicode.org/reports/tr15/tr15-11.html#Primary%20Exclusion%20List%20Table</a>, dated 1999-02-25. This list includes 0958..095F.</li></ul><div>Between revisions 9 and 10, we have UTC #78, whose minutes are <a href="https://www.unicode.org/L2/L1998/98419.pdf" target="_blank">L2/98-419</a>. See the discussion in the section titled “Normalization [Document L2/98-404]”, and in particular the last comment from Ken Whistler.<br></div><div>Between revisions 10 and 11, we have UTC #79; its minutes, <a href="https://www.unicode.org/L2/L1999/99054r.htm#79-0" target="_blank">L2/99-054R</a>, contain a similar comment from Ken towards the end of the section “Proposed Draft UTR #15, Unicode Normalization”.</div><div>The minutes of UTC #80, <a href="https://www.unicode.org/L2/L1999/99176.htm" target="_blank">L2/99-176</a>, include some discussion of normalization, and motion 80-M25 letting the editorial committee change the composition exclusions table; but by that point 0958 was already in the table, so digging there won’t help.</div><div><br></div><div>However, some later documents provide relevant context:</div><ul><li><a href="https://www.unicode.org/L2/L2001/01304-feedback.pdf" target="_blank">L2/01-304</a> (p. 17, in the section on Devanagari).</li><li><a href="https://www.unicode.org/L2/L2001/01305-india-resp.txt" target="_blank">L2/01-305</a> (section on Devanagari).</li></ul></div><div>So there was clear feedback from India that U+0958 क़ and friends should be discouraged; presumably the UTC was already aware of that in 1999. On the distinction between क़ and 
ऴ, I guess this is related to ऴ being atomic in ISCII; in turn that is because while ऴ is decomposable, corresponding letters in other ISCII scripts (ழ, ఴ, ഴ) are not. See also point (viii) of <a href="https://www.unicode.org/L2/L2001/01304-feedback.pdf" target="_blank">L2/01-304</a>; there still was a desire to make the encodings similar between the scripts.</div><div><br></div><div>I am sure Ken can provide more details.</div><div><br></div><div>Best regards,</div><div><br></div><div>Robin Leroy</div><div><br></div><div>―</div><div>¹ As well as a few from the Latin-1 Supplement and Latin Extended-A blocks.<br></div><div>² This predates <a href="https://www.unicode.org/L2/L2000/00118-parts.txt" target="_blank">L2/00-118</a> and UTC decision <a href="https://www.unicode.org/L2/L2000/00115.htm#83-C6" target="_blank">83-C6</a> which gave us the terms UAX and UTS.</div></div></div>
</div>
</div>
</blockquote></div>
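<div>To make the distinction raised in the quoted question concrete (U+0958 क़ is composition-excluded while ऩ, ऱ and ऴ are not), here is a minimal sketch using Python’s standard <code>unicodedata</code> module; that any conformant normalizer behaves the same way is an assumption of this illustration. The helper <code>nfc_keeps_precomposed</code> is hypothetical, introduced only here. For a character with a canonical decomposition, a <code>False</code> result means it is not a primary composite; that covers the composition exclusions discussed in the thread, but also singletons and non-starter decompositions, so it is not a direct lookup in CompositionExclusions.txt.</div>

```python
import unicodedata


def nfc_keeps_precomposed(ch: str) -> bool:
    """Hypothetical helper: True if NFC maps the precomposed character ch to itself.

    For a character with a canonical decomposition, False means it is not a
    primary composite (composition exclusions, but also singletons and
    non-starter decompositions).
    """
    return unicodedata.normalize("NFC", ch) == ch


# Letters mentioned in the thread: QA (0958), NNNA (0929), RRA (0931),
# LLLA (0934), and the Hebrew example FB31 from revision 10 of UTR #15.
for ch in ["\u0958", "\u0929", "\u0931", "\u0934", "\uFB31"]:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),  # canonical decomposition as hex code points
        nfc_keeps_precomposed(ch),
    )

# Starting from the decomposed side: NFC leaves KA + NUKTA as two code points,
# but recomposes NA + NUKTA into the precomposed NNNA.
print(unicodedata.normalize("NFC", "\u0915\u093C") == "\u0915\u093C")  # True
print(unicodedata.normalize("NFC", "\u0928\u093C") == "\u0929")  # True
```

<div>Running it should report <code>False</code> for U+0958 and U+FB31 and <code>True</code> for U+0929, U+0931 and U+0934, even though all five have a two-code-point canonical decomposition.</div>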