From rick at unicode.org Thu Jan 2 11:25:35 2014
From: rick at unicode.org (Rick McGowan)
Date: Thu, 02 Jan 2014 09:25:35 -0800
Subject: Mail list changes for 2014
In-Reply-To: <52C2F5D4.30909@unicode.org>
References: <529E6194.5020103@unicode.org> <52C2F5D4.30909@unicode.org>
Message-ID: <52C5A10F.7060501@unicode.org>

The Indic mail list has now been re-activated.

Regards,
Rick

On 12/31/2013 8:50 AM, Rick McGowan wrote:
> As mentioned, this list will be taken off-line shortly, and be
> restored after the new year. (A note will be sent when it is back.)
> Regards,
> Rick
>
> On 12/3/2013 2:56 PM, Rick McGowan wrote:
>> At the end of the year, we will be changing the mail list server for
>> the public-access mail lists, including this one. The new system will
>> be GNU "Mailman", an interface familiar to many. This should make it
>> easier for users to handle their subscriptions and options in one
>> place, via the web interface.
>>
>> We will thus be shutting down the public mail lists over the "holiday
>> break" in the final days of 2013, and re-open with the new system in
>> January 2014.
>>
>> Affected mail lists are those listed on the Mail Lists page here:
>> http://www.unicode.org/consortium/distlist.html
>> including Unicode, CLDR-Users, ULI-Users, and Indic.
>>
>> The new mail list system is documented here:
>> http://www.gnu.org/software/mailman/
>>
>

From pravin.d.s at gmail.com Fri Jan 10 04:15:00 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Fri, 10 Jan 2014 15:45:00 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
Message-ID:

Hi All,

We are working on the lohit2 [1] project, whose plan is to create standard and reusable OpenType tables with additional improvements. Lohit, as a default system font in most open source distros, always follows the standards around language technology.
(Font specification, storage, guidelines related to languages.)

Recently we started working on the Lohit Malayalam font [2] with some planned improvements and came across a couple of bugs [3][4] related to the well-known "NTA" issue, which was introduced with the addition of atomic chillu characters in Unicode 5.1.

The dilemma is this:

A. A number of users were already using U+0D28 + U+0D4D + U+0D31 to get the NTA character, even before Unicode 5.1.

B. But Unicode from 5.1 onward says (TUS 6.2, chapter 9.9, p. 321) to use U+0D7B + U+0D4D + U+0D31 to get the same "NTA".

In my humble opinion, one thing is very clear here: Unicode forgot to add normalization (backward compatibility) for the newly added sequence in (B). I still have not seen any improvement on this after a long time.

The dilemma for lohit2 development is:

- Lohit 1 has supported sequence (A) for a long time (even before Unicode 5.1), so for backward compatibility lohit2 should support the same.
- Since Lohit follows standards, it is important to support sequence (B) in order to follow Unicode 6.3. But following Unicode 6.3 in this case clearly invites dual encoding, with no normalization rules at hand.

Good documentation on the NTA issue is available at [5].

Presently I am in favour of not supporting the Unicode-defined sequence (B) in lohit2 and continuing to use (A), which has been used in the Lohit font family for a long time.

Please let me know your views on this. Is there any chance of getting this mentioned in chapter 9 of the Unicode Standard? Is there any chance of a normalization rule for this?

Regards,
Pravin Satpute

1. http://pravin-s.blogspot.in/2013/08/project-creating-standard-and-reusable.html
2. http://pravin-s.blogspot.in/2013/12/lohit2-lohit-malayalam-development-plans.html
3. https://bugzilla.redhat.com/show_bug.cgi?id=1016984
4. https://bugzilla.redhat.com/show_bug.cgi?id=1016989
5. http://thottingal.in/documents/Malayalam-NTA.pdf
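[Editorial illustration, not part of the original message.] The two encodings (A) and (B) described above can be inspected at the code point level. The following is only a minimal sketch, assuming Python 3 and its standard unicodedata module; the names SEQ_A, SEQ_B, describe and equivalent_under are made up for this illustration. It lists the character names in each sequence and checks whether any of the four Unicode normalization forms makes the two sequences identical.

```python
import unicodedata

# Sequence (A): the legacy NTA spelling, in use before Unicode 5.1.
SEQ_A = "\u0D28\u0D4D\u0D31"  # NA + VIRAMA + RRA
# Sequence (B): the NTA spelling described from Unicode 5.1 onward.
SEQ_B = "\u0D7B\u0D4D\u0D31"  # CHILLU N + VIRAMA + RRA


def describe(seq):
    """Return the Unicode character name of each code point in seq."""
    return [unicodedata.name(ch) for ch in seq]


def equivalent_under(form):
    """True if both sequences become identical under normalization form `form`."""
    return unicodedata.normalize(form, SEQ_A) == unicodedata.normalize(form, SEQ_B)


if __name__ == "__main__":
    print(describe(SEQ_A))
    print(describe(SEQ_B))
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(form, equivalent_under(form))
```

Since the atomic chillu has no decomposition mapping, none of the standard normalization forms is expected to unify the two spellings, so software that wants to treat them alike has to do so with its own rules.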
From samjnaa at gmail.com Fri Jan 10 06:24:46 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Fri, 10 Jan 2014 17:54:46 +0530
Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2
In-Reply-To:
References:
Message-ID:

On Fri, Jan 10, 2014 at 3:45 PM, pravin.d.s at gmail.com wrote:
> In my humble opinion here one thing is very clear that Unicode forgot to
> add normalization (backward compatibility) for newly added sequence in (B).

Dear Pravin,

If by normalization you mean http://www.unicode.org/glossary/#normalization -- then it is not possible in this case, since the individually encoded chillus do not have canonical decompositions to their related consonants. Indeed, that would defeat the purpose of the separate encoding, which was to provide semantically distinct chillus!

The recent additional chillus trickling into the standard seem to indicate that one should have encoded a CHILLU MARKER back then, but there's no going back now, so chillus galore! ;-)

On a more serious note, I think it is important to adhere to the standard, as it is good for you in the long run even though it is difficult at first. If you delay the adoption of the standard, it only gets all the harder as time passes, since in the interim even more people continue to assume the old behaviour...

--
Shriramana Sharma

From paivakil at gmail.com Fri Jan 10 11:46:30 2014
From: paivakil at gmail.com (Mahesh T. Pai)
Date: Fri, 10 Jan 2014 23:16:30 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
In-Reply-To:
References:
Message-ID: <20140110174630.GA18104@localhost>

pravin.d.s at gmail.com said on Fri, Jan 10, 2014 at 03:45:00PM +0530,:

> - Lohit 1 is supporting sequence (A) from long time (even before
> Unicode 5.1), so for the backward compatibility lohit2 should support the
> same.
>

I believe that the UTC wanted to maintain compatibility with some _beta_ version of some Microsoft software in making the choice that it did regarding the /nta/ sequence.

> Presently i am in favour of not supporting Unicode defined
> sequence (B) in lohit2 and keep on using (A) which is used in Lohit
> fonts family from long time.

Allow me to go on a nostalgia trip. Almost a decade back, the then SMC team came across what was an obvious lack of clarity in the UTS. They decided, against my advice, to follow the suggestions in the OpenType definition. To be fair, I had no alternative to offer then, except not to implement the suggestion in the OpenType pages. Microsoft ultimately waited for some clarity in the UTS before implementing anything, and the community efforts went (mostly) in vain.

Right now, given a choice between supporting legacy data and standards, I will choose the latter, with some kind of jugaad based on the PUA / glyph name to enable support for legacy data.

Not the ideal situation, but when politics gets the upper hand over merits, efficiency and appropriateness always take a backseat.

--
Mahesh T. Pai || free - (adj) able to act at will; not hampered; not under compulsion or restraint; free from obligations or duties; not bound to servitude; at liberty.

From pravin.d.s at gmail.com Mon Jan 13 00:04:33 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Mon, 13 Jan 2014 11:34:33 +0530
Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2
In-Reply-To:
References:
Message-ID:

On 10 January 2014 17:54, Shriramana Sharma wrote:

> On Fri, Jan 10, 2014 at 3:45 PM, pravin.d.s at gmail.com wrote:
> > In my humble opinion here one thing is very clear that Unicode forgot to
> > add normalization (backward compatibility) for newly added sequence in (B).
>
> Dear Pravin,
>
> If by normalization you mean
> http://www.unicode.org/glossary/#normalization -- then it is not
> possible in this case since the individually encoded chillus do not
> have canonical decomposition to their related consonants. Indeed, that
> would defeat the purpose of the separate encoding, which was to
> provide semantically distinct chillus!

OK, not normalization, but Unicode should at least mention the old habit of writing NTA alongside the new one that came with the addition of atomic chillus. It will definitely help people working on NLP to handle data containing these two different sequences.

> On a more serious note, I think it is important to adhere to the
> standard, as it is good for you in the long run even though it is
> difficult at first. If you delay the adoption of the standard, it only
> gets all the harder as time passes, since in the interim even more
> people continue to assume the old behaviour...

From the font perspective, if we accept that the NTA sequence is present in both forms (A) and (B) in existing data, we have to add the required rules for both. Unfortunately, Unicode has not considered backward compatibility in this case, but the Lohit project definitely does.

So, to be on the safer side, I am now in favour of having both rules in the font.

Regards,
Pravin Satpute

From pravin.d.s at gmail.com Mon Jan 13 00:28:52 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Mon, 13 Jan 2014 11:58:52 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
In-Reply-To: <20140110174630.GA18104@localhost>
References: <20140110174630.GA18104@localhost>
Message-ID:

On 10 January 2014 23:16, Mahesh T. Pai wrote:

> pravin.d.s at gmail.com said on Fri, Jan 10, 2014 at 03:45:00PM +0530,:
> > - Lohit 1 is supporting sequence (A) from long time (even before
> > Unicode 5.1), so for the backward compatibility lohit2 should support the
> > same.
> >
>
> I believe that the UTC wanted to maintain compatibility with some
> _beta_ version of Microsoft's some software in making the choice that
> it did regarding the /nta/ sequence.
>
> > Presently i am in favour of not supporting Unicode defined
> > sequence (B) in lohit2 and keep on using (A) which is used in Lohit
> > fonts family from long time.
>
> Allow me to go on a nostalgia trip. Almost a decade back, the then SMC
> team came across what was obvious lack of clarity in the UTS. They
> decided, against my advice, to follow the suggestions in OpenType
> definition. To be fair, then, I had no alternative to offer, except
> not to implement the suggestion in the OpenType pages. Microsoft
> ultimately waited for some clarity in the UTS before implementing
> anything. and the community efforts went (mostly) in vain.

I was wondering how ISCII handled this.

> Right now, given a choice between supporting legacy data and
> standards, I will choose the latter, with some kind of jugaad based on
> the PUA / glyph name to enable support for legacy data.

Yeah, as said above, we will support both the legacy and the standard sequences.

> Not the ideal situation, but when politics get the upper hand over
> merits, efficiency and appropriateness always takes a backseat.

That is the pain point of standardization activities.

Thanks & Regards,
Pravin Satpute

From cibucj at gmail.com Mon Jan 13 00:32:16 2014
From: cibucj at gmail.com (സിബു സി ജെ)
Date: Sun, 12 Jan 2014 22:32:16 -0800
Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2
In-Reply-To:
References:
Message-ID:

In fact, there is one more sequence to consider. Kartika in Windows follows for NTA. However, there is very little existing data in that sequence.

In the case of chillus, the standard asks display software to be prepared for data in both sequences.
I agree, it could document NTA's legacy Vs standard sequences, likewise.
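[Editorial illustration, not part of the original thread.] The point made above, that NLP and search tools will meet NTA in both the legacy and the standard spelling, could be handled along the following lines. This is a hedged sketch only: the helper names nta_match_key and same_modulo_nta are hypothetical, and folding the standard sequence onto the legacy one (rather than the other way round) is an arbitrary choice made for the example. The third sequence mentioned for Kartika is not spelled out in the thread, so it is not handled here.

```python
# Hypothetical helpers for search or NLP pipelines; not taken from the thread.
LEGACY_NTA = "\u0D28\u0D4D\u0D31"    # sequence (A): NA + VIRAMA + RRA
STANDARD_NTA = "\u0D7B\u0D4D\u0D31"  # sequence (B): CHILLU N + VIRAMA + RRA


def nta_match_key(text: str) -> str:
    """Return a comparison key in which both NTA spellings look the same.

    The key is meant for matching only; the stored text is left untouched.
    Folding (B) onto (A) is an assumption of this sketch, not a rule from
    the Unicode Standard.
    """
    return text.replace(STANDARD_NTA, LEGACY_NTA)


def same_modulo_nta(a: str, b: str) -> bool:
    """Compare two strings while ignoring the (A)/(B) encoding difference."""
    return nta_match_key(a) == nta_match_key(b)
```

Keeping the fold in a derived key rather than rewriting the text means the original encoding of each document is preserved, which matters if sequence (A) is ever meant to stand for something other than NTA in a given text.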
From pavanaja at vishvakannada.com Sat Jan 18 06:38:25 2014
From: pavanaja at vishvakannada.com (Pavanaja U B)
Date: Sat, 18 Jan 2014 18:08:25 +0530
Subject: Tulu Unicode
Message-ID: <001501cf144a$316a28d0$943e7a70$@vishvakannada.com>

What are the steps involved to add Tulu language to Unicode?

Regards,
Pavanaja

From sisrivas at yahoo.com Sat Jan 18 07:26:52 2014
From: sisrivas at yahoo.com (Sinnathurai Srivas)
Date: Sat, 18 Jan 2014 05:26:52 -0800 (PST)
Subject: Tulu Unicode
In-Reply-To: <001501cf144a$316a28d0$943e7a70$@vishvakannada.com>
References: <001501cf144a$316a28d0$943e7a70$@vishvakannada.com>
Message-ID: <1390051612.3615.YahooMailNeo@web125803.mail.ne1.yahoo.com>

I would like to interact with experts involved in encoding Tulu.

The use of the original scientific base for grammatising the alphabet, which is scalable and covers the entire spectrum with a simplified representation, needs to be considered, as Tulu is a branch of such original foundations.

Thanks
Sinnathurai Srivas

On Saturday, 18 January 2014, 12:47, Pavanaja U B wrote:
> What are the steps involved to add Tulu language to Unicode?

From samjnaa at gmail.com Sun Jan 19 00:53:50 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Sun, 19 Jan 2014 12:23:50 +0530
Subject: Tulu Unicode
In-Reply-To: <001501cf144a$316a28d0$943e7a70$@vishvakannada.com>
References: <001501cf144a$316a28d0$943e7a70$@vishvakannada.com>
Message-ID:

You cannot add a language to Unicode -- you can only add a script, for which you need to prepare a technically correct proposal with sufficient attestations.
Or do you mean adding data about the Tulu language written in the Kannada script (such as weekday names etc.) to the related standard, CLDR? See the CLDR section on unicode.org. (I'm not very knowledgeable about CLDR.)

--
Shriramana Sharma

From naa.ganesan at gmail.com Sun Jan 19 01:05:29 2014
From: naa.ganesan at gmail.com (N. Ganesan)
Date: Sat, 18 Jan 2014 23:05:29 -0800
Subject: Tulu Unicode
In-Reply-To: <001501cf144a$316a28d0$943e7a70$@vishvakannada.com>
References: <001501cf144a$316a28d0$943e7a70$@vishvakannada.com>
Message-ID:

On Sat, Jan 18, 2014 at 4:38 AM, Pavanaja U B wrote:
> What are the steps involved to add Tulu language to Unicode?

There is already a detailed proposal to add the Tulu script to the Unicode Standard, M. Everson's document on Tulu encoding:
http://www.unicode.org/L2/L2011/11120-n4025-tulu.pdf

Tulu, like many Indian and other languages, is written in two scripts. For example, Tevaram, sacred scriptures from Tamil, gets written in Tamil as well as Grantha scripts.

Regards
N. Ganesan