Terminology (was: Latin glottal stop in ID in NWT, Canada)
richard.wordingham at ntlworld.com
Fri Oct 23 17:16:32 CDT 2015
On Fri, 23 Oct 2015 13:34:26 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:
> On Fri, 23 Oct 2015 08:53:15 +0100, Richard Wordingham wrote:
> > I would like an English translation of Chapter 3 'Conformance',
> I guess that there may be some need of a *manual*, in the spirit that
> led the French translator to adding annotations. May you please quote
> some examples of what you wish to see expressed in a different way?
"C5: A process shall not assume that it is required to interpret any
particular coded character sequence."
I think this is meant to mean that processes do not have to interpret
every coded character sequence presented to them, but this appears to
be a concession and not a requirement, and I cannot derive it from the
text. An example of a non-compliant process would be helpful. I could
interpret this requirement as prohibiting the generation of a 'missing
glyph' glyph, for that is an error report that it has failed to
interpret a coded character sequence. I hope this is not an intended
"C6: A process shall not assume that the interpretations of two
canonical-equivalent character sequences are distinct."
Firstly, I have grave difficulties assigning mental activities to
Secondly, it may be possible to interpret "A process shall not assume X"
as "A process shall function correctly regardless of whether X holds."
However, let image(Y) be the bitmap depicting the string Y. Then the
following logic would be non-compliant:
if A and B are canonically equivalent and image(A) and image(B) are
different then
write(A, " and ", B, " are canonically equivalent but have different
images ", image(A), " and ", image(B));
The logic is non-compliant, for if it is invoked then the write
statement will only work correctly if image(A) and image(B) are
different, i.e. if A and B are interpreted differently. Apparently it
is permissible to render canonically equivalent sequences differently, so
image(A) and image(B) might be different even though canonically
I therefore conclude that C6 is in some language that I do not
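For concreteness, the logic above can be sketched in Python. This is only an illustration under stated assumptions: canonical equivalence is tested by comparing NFD forms via the standard unicodedata module, and image() is replaced by two hypothetical stand-in renderers that return bytes rather than bitmaps. The function names are mine and appear in no specification.

```python
import unicodedata
from typing import Callable, Optional


def canonically_equivalent(a: str, b: str) -> bool:
    """Treat two strings as canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def report_difference(a: str, b: str,
                      image: Callable[[str], bytes]) -> Optional[str]:
    """Mirror the pseudocode: report only when equivalent strings render differently."""
    if canonically_equivalent(a, b) and image(a) != image(b):
        return (f"{a!r} and {b!r} are canonically equivalent "
                f"but have different images")
    return None


# Hypothetical renderer 1: the "image" depends on the exact code points,
# so canonically equivalent sequences can look different.
def naive_image(s: str) -> bytes:
    return s.encode("utf-8")


# Hypothetical renderer 2: normalises first, so canonically equivalent
# sequences always get the same "image".
def normalising_image(s: str) -> bytes:
    return unicodedata.normalize("NFC", s).encode("utf-8")


A = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE (precomposed)
B = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(report_difference(A, B, naive_image))
print(report_difference(A, B, normalising_image))
```

With the first renderer the report is produced; with the second it never is. The question raised above is whether C6 is supposed to forbid writing report_difference at all, since the message it emits is only meaningful when the two interpretations differ.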
> Again, I do know nothing about Thai, but if in TUS an abugida can be
> addressed to as an alphabet if the same is used as such, it seems to
> me that the word 'alphabet' has a pretty extended meaning in TUS.
TUS tries to make accurate use of the distinction between 'alphabet',
'abugida' and 'abjad', 20th century jargon promoted if not invented by
Peter Daniels. The distinction lies in the way vowels are indicated -
always / with a default / not at all. The distinction may be useful
for a writing system, i.e. a way of using the 'script', but it rapidly
encounters the problem that a script may have several different writing
systems. For example, the presence or absence of vowel marks switches
the Arabic and Hebrew scripts, as used for those languages, between
being an abjad and being an alphabet.
> In any case, isolating an arbitrary subset inside our Latin script
> and promoting it as the so-called Roman alphabet to get some pretext
> for refusing that compatriots or strangers bear their real and
> chosen names, [quote] IS A SERIOUS INSULT [/quote].
It is not a matter of an 'arbitrary' subset. I expect the relevant
subset is the *French* alphabet, assuming Quebec has followed (or
preceded?) France and added 'w' to the alphabet. That this subset
should be confused with the concept of 'Roman' is not surprising, even
though the Romans lacked 'J', 'j', 'U', 'v', 'W' and 'w'.
> Additionally, at the age of Unicode, this results in being as well an
> insult to the whole work of the Consortium.
Unicode does not dictate what is accepted as 'the alphabet'; it is only
recently that 'j' has been accepted as part of the Welsh alphabet.
When I was a child, I learnt that there was no 'j' in the Welsh
alphabet - and wondered how the Joneses were supposed to write their
names in Welsh! (One partial answer, of course, is that the very
common Welsh surname 'Jones' is English for 'Evans'.)