A last missing link for interoperable representation

Richard Wordingham via Unicode unicode at unicode.org
Mon Jan 14 19:09:08 CST 2019

On Mon, 14 Jan 2019 06:24:46 +0000
James Kass via Unicode <unicode at unicode.org> wrote:

> Unicode doesn't enforce any spelling or punctuation rules.  Unicode 
> doesn't tell human beings how to pronounce strings of text or how to 
> interpret them.

These are not statements that are both honest and true.  Unicode lays
down rules and recommendations which others may then enforce.

In Indic scripts where LETTER A is not also a consonant, Unicode
forbids writing <LETTER A, SIGN AA> where LETTER AA would do the same
job, and most renderers enforce that rule.  Similarly, in phonetically
ordered LTR scripts, one can't write a dependent vowel as the first
character even if it is the leftmost character.
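
As a minimal sketch of the first rule, taking Devanagari as the example
script (my choice of script, and the helper name is hypothetical), the
two spellings are distinct code sequences, and normalization does not
turn one into the other:

```python
import unicodedata

LETTER_A = "\u0905"   # DEVANAGARI LETTER A
SIGN_AA = "\u093E"    # DEVANAGARI VOWEL SIGN AA
LETTER_AA = "\u0906"  # DEVANAGARI LETTER AA


def uses_discouraged_aa(text: str) -> bool:
    """Hypothetical check: True if text spells AA as <LETTER A, SIGN AA>."""
    return LETTER_A + SIGN_AA in text


sequence = LETTER_A + SIGN_AA

# The two spellings are not canonically equivalent, so NFC leaves the
# two-character sequence as it is rather than composing it to LETTER AA.
nfc = unicodedata.normalize("NFC", sequence)

print([unicodedata.name(ch) for ch in nfc])
print("same as LETTER AA after NFC:", nfc == LETTER_AA)
print("flagged:", uses_discouraged_aa(sequence))
```

Whether a given renderer shows a dotted circle for the sequence is up to
that renderer; the sketch only shows that the text itself differs.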

There is a subtler rule about not spelling negative numbers with a
hyphen-minus - if one does, one may suddenly find a line break just
after what is being used as a negative sign.
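
A rough sketch of the distinction, assuming the intended alternative is
U+2212 MINUS SIGN.  Python's standard library does not expose the
Line_Break property (UAX #14 gives U+002D the class HY and U+2212 the
class PR), so the code only shows the general category, and the
replacement helper is hypothetical:

```python
import re
import unicodedata

HYPHEN_MINUS = "\u002D"
MINUS_SIGN = "\u2212"

# Dash punctuation versus math symbol: different properties, so
# different treatment by line breakers and other processes.
for ch in (HYPHEN_MINUS, MINUS_SIGN):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def use_minus_sign(text: str) -> str:
    """Hypothetical helper: turn a hyphen-minus into U+2212 when it
    directly precedes a digit and does not follow a word character or
    another hyphen (so '10-5' and 'well-known' are left alone)."""
    return re.sub(r"(?<![\w-])-(?=\d)", MINUS_SIGN, text)


print(use_minus_sign("The low was -5 and the range 10-15."))
```

The heuristic is deliberately conservative; it will miss cases such as a
negative sign before a currency symbol.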

In scripts where Sanskrit grv and gvr may be rendered identically,
Unicode tells us what the two code sequences are, and therefore
indirectly what the range of pronunciations is for a given spelling.

Now, sometimes the enforcers overstep the mark.  For example, the USE
tells us that when we write Northern Thai /pʰiaʔ/ 'sound of a
smack' which visually is <gSIGN_E, gMEDIAL_RA (/ʰ/), gLOW_PA (/p/),
gSAKOT_LOW_YA, gSIGN_A (/ʔ/)>, with <gSIGN_E,...gSAKOT_LOW_YA>
denoting /ia/, we should write it ᨻ᩠ᨿᩕᩮᩡ <LOW PA, SAKOT, LOW YA, MEDIAL
RA, SIGN E, SIGN A>.  So much for phonetic order!
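
To make the encoded order easy to inspect, here is a small sketch that
lists the six characters of that spelling.  The code points are my
reading of the character names given above, not something stated in the
message:

```python
import unicodedata

# <LOW PA, SAKOT, LOW YA, MEDIAL RA, SIGN E, SIGN A> in the encoded
# order quoted above (code points assumed from the names).
encoded = "\u1A3B\u1A60\u1A3F\u1A55\u1A6E\u1A61"

names = [unicodedata.name(ch) for ch in encoded]

# Print each character's position in the stored sequence with its name.
for position, (ch, name) in enumerate(zip(encoded, names), start=1):
    print(position, f"U+{ord(ch):04X}", name)
```

Comparing the printed order with the visual order given above shows how
far the stored sequence is from what the eye sees left to right.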

Enforcement can be more subtle.  TUS says that Farsi should use U+06CC
rather than U+064A, although they are identical in initial and medial
positions.  In this case, the enforcer will be the spell-checker.
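
A minimal sketch of what such a spell-checker rule might look like,
assuming the character contrasted with U+06CC ARABIC LETTER FARSI YEH is
U+064A ARABIC LETTER YEH.  Both function names are hypothetical:

```python
import unicodedata

FARSI_YEH = "\u06CC"   # ARABIC LETTER FARSI YEH
ARABIC_YEH = "\u064A"  # ARABIC LETTER YEH


def find_arabic_yeh(text: str) -> list:
    """Hypothetical check: indexes where Farsi text uses U+064A."""
    return [i for i, ch in enumerate(text) if ch == ARABIC_YEH]


def prefer_farsi_yeh(text: str) -> str:
    """Hypothetical fix: replace every U+064A with U+06CC."""
    return text.replace(ARABIC_YEH, FARSI_YEH)


# The same five-letter word spelled with the Arabic yeh in medial position.
word = "\u0627\u064A\u0631\u0627\u0646"

print(find_arabic_yeh(word))
print([unicodedata.name(ch) for ch in prefer_farsi_yeh(word)])
```

A reader would not see the difference in this word, which is why only a
tool that looks at the code points can enforce the recommendation.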

