Specification of Encoding of Plain Text
Richard Wordingham
richard.wordingham at ntlworld.com
Mon Jan 9 16:24:14 CST 2017
Where, if anywhere, is the encoding of plain text specified? I am
particularly concerned with the arrangement of the code sequences for
non-spacing abstract characters once one has determined an encoding for
the abstract characters.
For example, a naive reading of TUS 9.0 Section 16.4 Subsection
"Ordering of Syllable Components" would lead one to believe that the
word _khnyom_ 'I' shall be encoded as <U+1781 KHMER LETTER KHA,
U+17D2 KHMER SIGN COENG, U+1789 KHMER LETTER NYO, U+17BB KHMER VOWEL
SIGN U, U+17C6 KHMER SIGN NIKAHIT>. However, on further investigation,
I cannot find any text that says that <U+1781, U+17C6, U+17D2, U+1789,
U+17BB> would not be compliant with the Unicode standard. Have I
missed anything?
One might hope that the subsection about 'logical order' in TUS 9.0
Section 2.2 Unicode Design Principles would help, but:
1) Section 3 'Conformance' says nothing about logical order; and
2) The subsection about 'logical order' seems to assume that there
exists a common practice; it does not actually place any requirement
on this common practice.
Richard.
More information about the Unicode
mailing list