Specification of Encoding of Plain Text

Richard Wordingham richard.wordingham at ntlworld.com
Mon Jan 9 16:24:14 CST 2017


Where, if anywhere, is the encoding of plain text specified?  I am
particularly concerned with the arrangement of the code sequences for
non-spacing abstract characters once one has determined an encoding for
the abstract characters.

For example, a naive reading of TUS 9.0 Section 16.4 Subsection
"Ordering of Syllable Components" would lead one to believe that the
word _khnyom_ 'I' shall be encoded as <U+1781 KHMER LETTER KHA,
U+17D2 KHMER SIGN COENG, U+1789 KHMER LETTER NYO, U+17BB KHMER VOWEL
SIGN U, U+17C6 KHMER SIGN NIKAHIT>.  However, on further investigation,
I cannot find any text that says that <U+1781, U+17C6, U+17D2, U+1789,
U+17BB> would not be compliant with the Unicode standard.  Have I
missed anything?
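
As a rough illustration of why the question matters, here is a minimal sketch using Python's standard unicodedata module. It assumes that the interpreter's bundled Unicode data agrees with TUS 9.0 for these five characters. The names `expected`, `alternative`, `describe` and `canonically_equivalent` are my own and do not come from the standard. The sketch only checks whether normalization relates the two orderings. It says nothing about conformance as such.

```python
import unicodedata

# Order suggested by TUS 9.0 Section 16.4:
# KHA, COENG, NYO, VOWEL SIGN U, NIKAHIT
expected = "\u1781\u17D2\u1789\u17BB\u17C6"

# Alternative order from the question:
# KHA, NIKAHIT, COENG, NYO, VOWEL SIGN U
alternative = "\u1781\u17C6\u17D2\u1789\u17BB"


def describe(text):
    """List (code point, character name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Both strings contain exactly the same five characters, only in a different order.
same_characters = sorted(expected) == sorted(alternative)

# Canonical reordering only moves marks with a non-zero combining class,
# so this shows whether normalization would merge the two spellings.
equivalent = canonically_equivalent(expected, alternative)

if __name__ == "__main__":
    for row in describe(expected):
        print(row)
    print("same characters:", same_characters)
    print("canonically equivalent:", equivalent)
```

If the two strings remain distinct after normalization, then canonical equivalence does not treat them as the same text, and any preference for one order over the other would have to come from some other part of the standard.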

One might hope that the subsection about 'logical order' in TUS 9.0
Section 2.2 Unicode Design Principles would help, but:

1) Chapter 3 'Conformance' says nothing about logical order; and
2) The subsection about 'logical order' seems to assume that there
exists a common practice; it does not actually place any requirement
on this common practice.

Richard.


