From samjnaa at gmail.com Wed May 6 03:22:36 2015
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Wed, 6 May 2015 13:52:36 +0530
Subject: Bengali Vedic characters
Message-ID:

This is w.r.t. Srinidhi's preliminary review of non-Devanagari Vedic
characters L2/15-101, my comments on it in L2/15-113, and the script
review committee's report L2/15-149 p 5.

I had located the Arcika (verse) part of the Kauthuma Sama Veda printed
in Bengali script via DLI:

http://www.dli.ernet.in/cgi-bin/DBscripts/allmetainfo.cgi?barcode=4990010095079

(Obviously Srinidhi had obtained the samples from this same site but had
neglected to provide the link in his document.)

However no scans are available on DLI for the Gana (melody) part of the
same, whereas it is this part which requires a greater number of svara
markers.

First I was wondering whether the Gana part was published in Bengali
script at all, but I hunted down the phone number of a qualified scholar
of the Kauthuma Sama Veda who resides in Varanasi and who is a native
Bengali, and had a telephonic conversation with him an hour ago.

He informs me that the entire Kauthuma Sama Veda including the Arcika
(verse) and Gana (melody) forms was published by Satyavrata Samashrami
in Calcutta in the previous century. However he has no printed copies on
hand and one has to go to the National Library in Calcutta to locate
them. I am not sure when I will have the time and occasion to travel to
Calcutta from Tamil Nadu for this.

If anyone can help in locating digital copies of the Gana part, a
comprehensive proposal for Bengali Sama Vedic svara markers can be
prepared...

--
Shriramana Sharma

From elie.roux at telecom-bretagne.eu Tue May 12 08:28:04 2015
From: elie.roux at telecom-bretagne.eu (=?UTF-8?B?w4lsaWUgUm91eA==?=)
Date: Tue, 12 May 2015 15:28:04 +0200
Subject: Deterministic sorting impossible for Tibetan with current state
Message-ID: <5551FFE4.7070605@telecom-bretagne.eu>

Dear all,

I'm not sure I'm sending this to the correct list; please tell me if
I'm not.

I'm currently working on Tibetan sorting. It mostly works, except for
this case:

མངས་

This Unicode sequence can be interpreted in two very different ways,
both valid in Tibetan:

- prefix མ, main letter ང, suffix ས
- main letter མ, suffix ང, second suffix ས

Both have their entries in a Tibetan dictionary: one in the entries for
letter ང, another (with a different meaning) in the entries for
letter མ.

It is thus currently impossible to determine the place of the string
"མངས་" in a dictionary (Tibetans guess from the context).

Are there other languages where this indeterminacy happens? Did they
solve that problem? If not, what I propose is a new, invisible
character with the meaning "previous letter is the main letter in case
of indeterminacy". This would, of course, not solve the problem
entirely, as the string "མངས་" would still be undetermined, but at least
it would be possible for users to force its determination.

What do you think?

Thank you,
--
Elie Roux

From richard.wordingham at ntlworld.com Tue May 12 13:24:11 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 12 May 2015 19:24:11 +0100
Subject: Deterministic sorting impossible for Tibetan with current state
In-Reply-To: <5551FFE4.7070605@telecom-bretagne.eu>
References: <5551FFE4.7070605@telecom-bretagne.eu>
Message-ID: <20150512192411.2f7a4c69@JRWUBU2>

On Tue, 12 May 2015 15:28:04 +0200
Élie Roux wrote:

> I'm currently working on Tibetan sorting. It mostly works, except for
> this case:
>
> མངས་
>
> This Unicode sequence can be interpreted in two very different ways,
> both valid in Tibetan:
>
> - prefix མ, main letter ང, suffix ས
> - main letter མ, suffix ང, second suffix ས
>
> Both have their entries in a Tibetan dictionary: one in the entries
> for letter ང, another (with a different meaning) in the entries for
> letter མ.
>
> It is thus currently impossible to determine the place of the string
> "མངས་" in a dictionary (Tibetans guess from the context).
>
> Are there other languages where this indeterminacy happens?

Certain examples are rare; it's been claimed that there are none in
Tibetan. Welsh has this problem, but the closest I could come is
_englyna_ 'to compose', between eg- and eh-, versus _engrafu_ 'to
engrave', between enf- and enh-.

> Did they solve that problem?

Where one is a digraph, as with the Welsh letter 'ng', which comes
between 'g' and 'h', the Unicode Collation Algorithm recommends
inserting U+034F COMBINING GRAPHEME JOINER (CGJ). Soft hyphen will
often do as well, as in the Welsh place name Llangollen, which does not
include the letter 'ng'.

So for your example, I would suggest that as in a lean Tibetan
collation table, <...> would be a collating element, that you write
_mangs_ as <...> and reserve <...> for _mngas_.

Richard.

From ake.persson at mimer.se Tue May 12 13:31:02 2015
From: ake.persson at mimer.se (=?UTF-8?Q?=C3=85ke_Persson?=)
Date: Tue, 12 May 2015 20:31:02 +0200
Subject: Deterministic sorting impossible for Tibetan with current state
In-Reply-To: <5551FFE4.7070605@telecom-bretagne.eu>
References: <5551FFE4.7070605@telecom-bretagne.eu>
Message-ID: <1DBC21ECC96246E7A0E59106A9BACEC7@upright.nu>

Dear Élie,

The combination
- prefix མ, main letter ང, suffix ས
does not exist in the dictionaries referenced from
http://developer.mimer.com/charts/tibetan.htm.

Where did you find it?

Best regards,
Åke Persson

> I'm currently working on Tibetan sorting. It mostly works, except for
> this case:
>
> མངས་
>
> This Unicode sequence can be interpreted in two very different ways,
> both valid in Tibetan:
>
> - prefix མ, main letter ང, suffix ས
> - main letter མ, suffix ང, second suffix ས
>
> Both have their entries in a Tibetan dictionary: one in the entries for
> letter ང, another (with a different meaning) in the entries for letter མ.
>
> It is thus currently impossible to determine the place of the string
> "མངས་" in a dictionary (Tibetans guess from the context).
>
> Are there other languages where this indeterminacy happens? Did they
> solve that problem? If not, what I propose is a new, invisible
> character with the meaning "previous letter is the main letter in case
> of indeterminacy". This would, of course, not solve the problem
> entirely, as the string "མངས་" would still be undetermined, but at least
> it would be possible for users to force its determination.
>
> What do you think?
>
> Thank you,
> --
> Elie Roux

From elie.roux at telecom-bretagne.eu Tue May 12 16:07:23 2015
From: elie.roux at telecom-bretagne.eu (=?UTF-8?B?w4lsaWUgUm91eA==?=)
Date: Tue, 12 May 2015 23:07:23 +0200
Subject: Deterministic sorting impossible for Tibetan with current state
In-Reply-To: <1DBC21ECC96246E7A0E59106A9BACEC7@upright.nu>
References: <5551FFE4.7070605@telecom-bretagne.eu> <1DBC21ECC96246E7A0E59106A9BACEC7@upright.nu>
Message-ID: <55526B8B.8090809@telecom-bretagne.eu>

> The combination
> - prefix མ, main letter ང, suffix ས
> does not exist in the dictionaries referenced from
> http://developer.mimer.com/charts/tibetan.htm.
>
> Where did you find it?

There are a few examples of these on page 48 of "Manuel de Tibétain
Standard" by Nicolas Tournadre. It exists in English under the name
"Manual of Standard Tibetan", but the page might not be the same.

The examples he cites are (in EWTS):

- dabs vs.
dbas
- mangs vs. mngas
- dangs vs. dngas
- dgas vs. dags (this one is often disambiguated with dwags)

"mngas" seems rare indeed; I can only find it in the word "gzugs
mngas". But I'm no expert in Tibetan, so I can ask some people with
more knowledge if you want confirmation.

Thank you,
--
Elie

From elie.roux at telecom-bretagne.eu Tue May 12 16:19:01 2015
From: elie.roux at telecom-bretagne.eu (=?UTF-8?B?w4lsaWUgUm91eA==?=)
Date: Tue, 12 May 2015 23:19:01 +0200
Subject: Deterministic sorting impossible for Tibetan with current state
In-Reply-To: <20150512192411.2f7a4c69@JRWUBU2>
References: <5551FFE4.7070605@telecom-bretagne.eu> <20150512192411.2f7a4c69@JRWUBU2>
Message-ID: <55526E45.6010609@telecom-bretagne.eu>

> So for your example, I would suggest that as in a lean Tibetan
> collation table, <...> would be a collating element, that you write
> _mangs_ as <... U+0F44, U+0F66 TIBETAN LETTER SA> and reserve <...>
> for _mngas_.

You're right, I think this would work! I think I understand the
COMBINING GRAPHEME JOINER better now with your example.

Thank you very much for your help!
--
Elie
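
The CGJ approach discussed in this thread can be sketched with a toy
sort-key function. This is only an illustration under stated
assumptions, not the Unicode Collation Algorithm or any real Tibetan
tailoring: the weights are simply positions in Tibetan alphabetical
order, and the table is assumed to contain one hypothetical contraction
that treats adjacent MA + NGA as prefix + main letter. The names
PRIMARY, CONTRACTIONS and sort_key are invented for this example.

```python
# Toy illustration only: the weights and the contraction table are
# assumptions, not data from DUCET, CLDR, or a real Tibetan tailoring.
CGJ = "\u034F"    # U+034F COMBINING GRAPHEME JOINER
MA = "\u0F58"     # U+0F58 TIBETAN LETTER MA
NGA = "\u0F44"    # U+0F44 TIBETAN LETTER NGA
SA = "\u0F66"     # U+0F66 TIBETAN LETTER SA

# Primary weight = position in Tibetan alphabetical order.
PRIMARY = {NGA: 4, MA: 16, SA: 28}

# Hypothetical contraction: prefix MA + main letter NGA files under NGA,
# so the main letter's weight comes first.
CONTRACTIONS = {MA + NGA: [PRIMARY[NGA], PRIMARY[MA]]}


def sort_key(text):
    """Return a list of primary weights for a string of MA/NGA/SA/CGJ."""
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in CONTRACTIONS:
            # Adjacent MA + NGA: read as prefix + main letter.
            key.extend(CONTRACTIONS[pair])
            i += 2
        elif text[i] == CGJ:
            # CGJ carries no weight; its presence already kept the
            # surrounding letters from matching the contraction.
            i += 1
        else:
            key.append(PRIMARY[text[i]])
            i += 1
    return key


mngas = MA + NGA + SA          # plain sequence: prefix MA, main letter NGA
mangs = MA + CGJ + NGA + SA    # CGJ keeps MA as the main letter

print(sort_key(mngas))  # [4, 16, 28]
print(sort_key(mangs))  # [16, 4, 28]
```

With these assumed weights the plain sequence files under NGA and the
CGJ-marked one under MA, so the two readings no longer share a sort
key. Which reading gets the plain sequence is a choice made by the
table; the sketch follows the suggestion of reserving it for _mngas_.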