abstract characters, semantics, meaningful transformations ... Was: Tibetan Paluta

Naena Guru via Unicode unicode at unicode.org
Mon May 1 09:19:27 CDT 2017

This whole attempt to turn the digitizing of Indic scripts into something 
esoteric, 'abstract', a 'semantic representation' and so on seems to me 
an attempt to make Unicode the realm of some superhumans.

The purpose of writing is to represent speech. It is not some secret 
that demi-gods created and that we are trying to explain with 'modern' 
linguistic gymnastics. Sound => letter: that is the basis of writing. 
English writing was massacred when printing was brought in from Europe. 
A similar thing is being done to Indic by all this mumbo-jumbo.

I call on NATIVE users of Indic scripts to explain what, apparently, 
Europeans or Americans are discussing here.

On 5/1/2017 10:47 AM, Philippe Verdy wrote:
> 2017-04-29 21:21 GMT+02:00 Naena Guru via Unicode <unicode at unicode.org>:
>     Just about the name paluta:
>     In Sanskrit, the length of vowels is measured in maaþra (a
>     cognate of the word 'meter'). It is the spoken length of a short
>     vowel. In Latin it is termed mora. Usually, you have only single
>     and double length vowels. A paluþa length is like when you call
>     out to somebody from a distance. Pluta is a careless spelling.
>     Virama and Halanta are two other terms that are loosely used.
>     Anyway, Unicode is only about DISPLAYING a script: There's a shape
>     here; Let's find how to get it by assembling other shapes or by
>     creating a code point for it. What is short, long or longer in
>     speech is no concern for Unicode.
> Wrong. Unicode is absolutely not about how to "display" any script 
> (except symbols and notational symbols). Unicode does not encode 
> glyphs. Unicode encodes "abstract characters" according to their 
> semantics, in order to assign them properties allowing meaningful 
> transformations of text and in order to allow performing searches 
> (with collation algorithms). What is important is their properties 
> (something that ISO 10646 did not care about when it started the UCS 
> as a separate project, ignoring how it would be used, focusing too 
> much on apparent glyphs, introducing a lot of "compatibility 
> characters" that would not have been encoded otherwise, and creating 
> some havoc in logical processing).
> Anyway, Unicode makes some exceptions to the logical model, only for 
> round-trip compatibility with other standards that used another, 
> widely used encoding model, notably in Thai: these are the exceptions 
> where there are "prepended" letters. There was also some havoc for 
> some scripts in India because of round-trip compatibility with an 
> Indian standard (criticized by many users of Tamil and some other 
> Southern Indic scripts that don't directly follow the paradigm created 
> for getting some limited transliteration with Devanagari: that initial 
> desire was abandoned, but the legacy Indic scripts in India were 
> imported as is into Unicode).
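
For readers who want to see what the quoted point about "properties" and 
the Thai "prepended" letters looks like in practice, here is a minimal 
sketch in Python using only the standard unicodedata module. The sample 
strings (Devanagari "ki", Thai "ke", and the e-acute pair) and the helper 
names describe and same_text are assumptions chosen for illustration; 
they are not taken from the thread.

```python
import unicodedata

# Illustrative samples (assumptions, not from the thread):
# KA + VOWEL SIGN I: the sign is drawn left of KA but stored after it.
DEVANAGARI_KI = "\u0915\u093F"
# SARA E + KO KAI: the vowel is stored first, matching its visual position.
THAI_KE = "\u0E40\u0E01"


def describe(text):
    """Return (code point, character name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


def same_text(a, b):
    """Compare two strings after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for label, sample in (("Devanagari ki", DEVANAGARI_KI), ("Thai ke", THAI_KE)):
        print(label)
        for row in describe(sample):
            print("  ", *row)
    # Precomposed e-acute and e + combining acute differ in code points
    # but are canonically equivalent.
    print(same_text("\u00E9", "e\u0301"))
```

In this sketch the Devanagari vowel sign is reported with general 
category Mc (a spacing combining mark stored after its consonant), while 
the Thai vowel is reported as Lo (a letter stored before the consonant). 
The same_text helper only covers canonical equivalence; real searching 
would normally go through a collation library, which is not shown here.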
