Re: Alternative encodings for Malayalam “nta”

Mon Oct 7 15:05:08 CDT 2019

[Putting the public mailing list back to the recipient list.]

Cibu,

Thanks for your L2/19-348 <https://www.unicode.org/L2/L2019/19348-malayalam-response.pdf> (Response to L2/19-345). My comments:

> I am curious to know the reference for the phonetic analysis described in section A chillu-less analysis in the proposal L2/19-345. How can a phonetic analysis be the basis for an important double encoding decision?

The basis is not the phonetic analysis (which the document provides only as an FYI, so readers understand why many people use this encoding), but the fact of a widespread alternative encoding.

Basically we need to properly recognize the failure to ensure a single, ideal encoding. It’s not helpful to keep the Core Spec detached from reality.

> Anycase, the sequence implied by this particular analysis is an artifact of the evolution of Unicode for Malayalam; it is not grounded in any prior writing traditions or academic literature.

We’re not talking about the legitimacy of the phonetic encoding.

> In Malayalam, dental /n̪/ and alveolar /n/ are not allophones as implied in the proposal.

I actually didn’t suggest any allophone relationship, on purpose. If it’s helpful, I can change the “~” notation in “[n̪a ~ na]” (and [ra ~ ta]) to “/” or “,” in a revision.

> So using <NA, VIRAMA> for CHILLU N is not phonetically accurate.

This is not a valid argument (see the next paragraph), although accuracy is not relevant anyway (as I said, I was trying to explain why people use <NA, VIRAMA, RRA>, not trying to legitimize it).

The written form ൻ is the syllable-coda specific form of the written form ന, and the pronunciation of ൻ being limited to [n] is a result of Malayalam’s phonology ([n̪] not usually appearing in a syllable-coda position, unless preceding another dental sound).

ന് is used in the phonetic encoding mostly because ൻ is not considered eligible for conjunct forming, and ന് is the natural fallback. Again, I’m not trying to legitimize the encoding, only explaining my observation of the widespread encoding.

> Moreover, if you show the visual ന്‌റ (<<NA, visual VIRAMA, RRA>>) to a native user (who is unaware of Unicode particulars), they will not identify it as (<<chillu N, subscript RRA>> /ntʌ/); instead, they would read it as /nərʌ/.

Not relevant. I avoided “ന്‌റ” precisely to prevent this kind of argument. The ് was only there to mark an inherent-vowel-suppressed ന. I almost avoided ് altogether because of its ambiguity, but didn’t, because that would make the document too obscure. The point is that an inherent-vowel-suppressed ന is used in the phonetic encoding, and ് just happens to be used there.

> This proposal does not address the remaining chillu conjuncts described in L2/19-086R.

The document doesn’t propose any productive encoding rule. Why does it need to address other cases?

> It also does not address the legacy sequence supported by MS Windows <NA, VIRAMA, ZWJ, RRA> for (<<chillu N, subscript RRA>>).

I can make it clearer in Section 4, Real-world encodings, that <NA, VIRAMA, ZWJ, RRA> is plainly unacceptable, as it clashes with our general rule that a chillu does not form a conjunct with its following letter automatically (without a conjoiner).
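
For readers who want to see the code points side by side, here is a minimal Python sketch (not part of L2/19-345) listing the three sequences mentioned in this thread. The dictionary keys and the helper names `describe` and `nfc_distinct` are made up for this illustration. It also checks that NFC normalization leaves the three sequences distinct, so plain normalization does not unify them.

```python
import unicodedata

# The three sequences discussed in this thread for the "nta" written form.
# The keys are informal labels for this sketch, not Unicode terminology.
SEQUENCES = {
    "graphic": "\u0D7B\u0D4D\u0D31",           # <CHILLU N, VIRAMA, RRA>
    "phonetic": "\u0D28\u0D4D\u0D31",          # <NA, VIRAMA, RRA>
    "legacy_zwj": "\u0D28\u0D4D\u200D\u0D31",  # <NA, VIRAMA, ZWJ, RRA>
}

def describe(seq):
    """Return 'U+XXXX NAME' for every code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]

def nfc_distinct(seqs):
    """True if the sequences are still pairwise distinct after NFC."""
    seqs = list(seqs)
    normalized = {unicodedata.normalize("NFC", s) for s in seqs}
    return len(normalized) == len(set(seqs))

if __name__ == "__main__":
    for label, seq in SEQUENCES.items():
        print(label)
        for line in describe(seq):
            print("   ", line)
    print("distinct after NFC:", nfc_distinct(SEQUENCES.values()))
```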

> I am not sure how this proposal is going to solve the issue of inadequate support for <CHILLU N, VIRAMA, RRA>, without explicitly rescinding this sequence. Double encoding for (<<chillu N, subscript RRA>>) is not going to solve any issue, if not, making the issue more acute. Double encoding is never a desirable quality for Unicode. So the decision should not be taken lightly or hastly. It needs to be clearly thought through, probably through a PRI.

Double encoding will not be solved. The proposal is about recognizing the reality of failure. With Windows on the loose for so many years, we’ve already missed the opportunity of ensuring a single encoding for the written form.

Now the standard needs to first recognize the widespread encoding that won’t go away, so implementers are informed. Then we can see in which direction we should push Microsoft and Apple to converge.

I agree that the Unicode Standard might need to state a clear disposition/preference between the graphic and phonetic encodings, so that the two are not considered simply equal and we have a direction in which to push the implementations to converge.
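
Until such a preference exists, an implementation that compares or searches Malayalam text may have to treat the two encodings as equivalent itself. The following is only a sketch of that idea in Python, not something L2/19-345 proposes. The names `fold_nta` and `same_text` are hypothetical, the default folding direction is an arbitrary choice for the example, and the code assumes every <NA, VIRAMA, RRA> in the input is meant as "nta".

```python
CHILLU_N, NA, VIRAMA, RRA = "\u0D7B", "\u0D28", "\u0D4D", "\u0D31"

GRAPHIC = CHILLU_N + VIRAMA + RRA   # <CHILLU N, VIRAMA, RRA>
PHONETIC = NA + VIRAMA + RRA        # <NA, VIRAMA, RRA>

def fold_nta(text, preferred=GRAPHIC):
    """Rewrite the non-preferred "nta" sequence to the preferred one.

    Assumption for this sketch: every occurrence of the other sequence
    in text is intended as "nta". The default direction is arbitrary.
    """
    if preferred not in (GRAPHIC, PHONETIC):
        raise ValueError("preferred must be GRAPHIC or PHONETIC")
    other = PHONETIC if preferred == GRAPHIC else GRAPHIC
    return text.replace(other, preferred)

def same_text(a, b):
    """Compare two strings while ignoring which "nta" encoding they use."""
    return fold_nta(a) == fold_nta(b)
```

Folding only for comparison, rather than rewriting stored text, keeps the original data intact regardless of which direction the standard eventually prefers.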

> Prior to Unicode 5.2, the encoding of the cluster [glyph] (<<chillu N, subscript RRA>> /ntʌ/) was not clearly defined. …

You mean 5.1, right? The encoding has been specified since 5.1.

> … and <NA, VIRAMA, ZWJ, RRA> …

How can implementations support this encoding without breaking the side-by-side form ൻറ though?

Best,
梁海 Liang Hai
https://lianghai.github.io/

>> On Oct 6, 2019, at 15:10, Cibu <cibucj at gmail.com> wrote:
>> 
>> Yes; it is now available as L2/19-348 <http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/19-348>.
>> 
>> On Sun, Oct 6, 2019 at 11:03 PM Asmus Freytag (c) <asmusf at ix.netcom.com> wrote:
>> Have you submitted that response as a UTC document?
>> A./
>> 
>> On 10/6/2019 2:08 PM, Cibu wrote:
>>> Thanks for addressing this. Here is my response: https://docs.google.com/document/d/1K6L82VRmCGc9Fb4AOitNk4MT7Nu4V8aKUJo_1mW5X1o/
>>> 
>>> In summary, my take is:
>>> 
>>> The sequence <NA, VIRAMA, RRA> for ൻ്റ (<<chillu N, subscript RRA>>) should not be legitimized as an alternate encoding; but should be recognized as a prevailing non-standard legacy encoding.
>>> 
>>> 
>>> On Sun, Oct 6, 2019 at 7:57 PM 梁海 Liang Hai <lianghai at gmail.com> wrote:
>>> Folks,
>>> 
>>> (Microsoft Peter and Andrew, search for “Windows” in the document.)
>>> 
>>> (Asmus, in the document there’s a section 5, ICANN RZ-LGR situation—let me know if there’s some news.)
>>> 
>>> This is a pretty straightforward document about the notoriously problematic encoding of Malayalam <chillu n, bottom-side sign of rra>. I always wanted to properly document this, so finally here it is:
>>> 
>>> L2/19-345 <http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/19-345>
>>> Alternative encodings for Malayalam "nta"
>>> Liang Hai
>>> 2019-10-06
>>> 
>>> Unfortunately, as <NA, VIRAMA, RRA> has already become the de facto standard encoding, now we have to recognize it in the Core Spec. It’s a bit like another Tamil srī situation.
>>> 
>>> An excerpt of the proposal:
>>> 
>>> Document the following widely used encoding in the Core Specification as an alternative representation for Malayalam [glyph] (<chillu n, bottom-side sign of rra>) that is a special case and does not suggest any productive rule in the encoding model:
>>> 
>>> <U+0D28 ന MALAYALAM LETTER NA, U+0D4D ◌് MALAYALAM SIGN VIRAMA, U+0D31 റ MALAYALAM LETTER RRA>
>>> 
>>> Best,
>>> 梁海 Liang Hai
>>> https://lianghai.github.io/
>> 
> 
