abstract characters, semantics, meaningful transformations ... Was: Tibetan Paluta

Alastair Houghton via Unicode unicode at unicode.org
Mon May 1 10:26:04 CDT 2017

On 1 May 2017, at 15:19, Naena Guru via Unicode <unicode at unicode.org> wrote:
> This whole attempt to make digitizing Indic script some esoteric, 'abstract', 'semantic representation' and so on seems to me is an attempt to make Unicode the realm of the some super humans.

No.  It’s important so that the standard Unicode algorithms function acceptably for Indic languages.  The design of Unicode is such that, compatibility characters and some other special cases aside, it encodes semantics as opposed to graphic representations.
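
To make that concrete, here is a minimal sketch using Python's standard unicodedata module.  The Bengali example and the helper name canonically_equal are my own choices for illustration, not anything taken from the thread.  The same vowel sign can be stored as one code point or as two, and only the normalization algorithm (not the raw code points) tells software that the two strings are the same text:

```python
import unicodedata

# Two encodings of the same Bengali syllable "ko".
# Both are expected to render identically.
precomposed = "\u0995\u09CB"        # KA + VOWEL SIGN O
decomposed = "\u0995\u09C7\u09BE"   # KA + VOWEL SIGN E + VOWEL SIGN AA


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A naive code point comparison sees two different strings...
raw_equal = precomposed == decomposed

# ...while comparison after normalization treats them as the same text.
normalized_equal = canonically_equal(precomposed, decomposed)

print(raw_equal, normalized_equal)
```

Searching, sorting and comparison all depend on this kind of equivalence being defined in the character data, which is only possible when the encoding describes what the characters are rather than how they happen to look.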

> The purpose of writing is to represent speech.

Yes, and Unicode is intended to give us a representation of speech *that is amenable to machine processing*.

The other extreme is what used to happen on many Chinese and Japanese websites, namely “representing speech” by way of an image - if you want to process the text in one of those images, well, good luck with that (you’ll want to start with some kind of OCR).

Perhaps part of the problem here is that Unicode sits at the intersection of linguistics and software engineering.  Discussion of either side is likely to be quite technical, and some of the vocabulary used might well seem like “mumbo jumbo”, just as some of the design decisions might not make sense if your expertise is mainly on one side or the other (or, for that matter, if you have little exposure to other languages or the challenges inherent in encoding or rendering them).  However, for all that it might *sound* like “mumbo jumbo” to you, it is not.

Kind regards,


