Standardized variation sequences for the Deseret alphabet?

Tue Mar 28 07:10:58 CDT 2017

On 2017/03/27 21:59, Michael Everson wrote:
> On 27 Mar 2017, at 08:05, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>
>>> Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are apparently really supposed to have identical glyphs, though we use an old-fashioned style in the charts for the former. (Yes, I am of course aware that there are other reasons for distinguishing these, but as far as glyphs go, even our standard distinguishes them artificially.)
>>
>> "apparently", maybe. Let's for a moment leave aside the radicals themselves, which are to a large extent artificial constructs.
>
> I do stipulate not being a CJK expert. But those are indeed different due to their origins, however similar their shapes are.

Except for the radicals themselves, I haven't found a contrasting pair. 
What I think we would need to find to influence the current 
argumentation (apart from the general "history is important", on which I 
think we all agree) is a case of a character that originally existed both 
with a MEAT radical and with a MOON radical, but has only a single usage. 
Then whether there were one or two code points would provide an analog 
for the situation we have at hand.

Also note that there is a difference in meaning. The characters with 
MEAT radicals mostly refer to body parts and organs. The characters with 
MOON radicals are mostly time-related.
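
For anybody who wants to check what is actually encoded, here is a 
minimal Python sketch using the standard unicodedata module. The 
describe() helper is just an illustrative name of mine. It only shows 
that the two radicals are separate code points with separate names; it 
says nothing about glyph shapes, which depend on the font.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The two radicals from the discussion, plus the first example
# ideograph mentioned for each (U+808A for MEAT, U+6709 for MOON).
for ch in ("\u2EBC", "\u2E9D", "\u808A", "\u6709"):
    print(describe(ch))
```

The unified ideographs only get algorithmic names, so which radical a 
character is filed under cannot be read off this output.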

>> Let's look at the actual characters with these radicals (e.g. U+6709,... for MOON and U+808A,... for MEAT), in the multi-column code charts of ISO 10646. There are some exceptions, but in most cases, the G/J/K columns show no difference (i.e. always the ⺝ shape, with two horizontal bars), whereas the H/T/V columns show the ⺼ shape (two downwards slanted bars) for the "MEAT" radical and the ⺝ shape for the moon radical. So whether these radicals have identical glyphs depends on typographic tradition/font/…
>
> They are still always very similar, right?

Similarity is in the eye of the beholder (or the script).

Sometimes, a little dot or hook is irrelevant. Sometimes it's the single 
difference that makes it a totally different character.

>> In Japan, many people may be rather unaware of the difference, whereas in Taiwan, it may be that school children get drilled on the difference.
>
> That’s interesting.

Not necessarily for the poor Taiwanese students, and not necessarily for 
the Japanese who try to find a character in a dictionary ordered by 
radical :-(.

>>> Changing to a different font in order to change one or two glyphs is a mechanism that we have actually rejected many times in the past. We have encoded variant and alternate characters for many scripts.
>>
>> Well, yes, rejected many times in cases where that was appropriate. But also accepted many times, in cases that we may not even remember, because they may not even have been made explicitly.
>
> Do come up with examples if you have any.

I had the following in mind:

>> The roman/italic a/ɑ and g/ɡ distinctions (the latter code points only used to show the distinction in plain text, which could as well be done descriptively),
>
> Aa and Ɑɑ are used contrastively for different sounds in some languages and in the IPA. Ɡɡ is not, to my knowledge, used contrastively with Gg (except that ɡ can only mean /ɡ/, while orthographic g can mean /ɡ/, /dʒ/, /x/ etc.). But g vs ɡ is reasonably analogous to �� and <lig>����</lig> being used for /juː/.

The contrastive use *in some languages or notations* (IPA) is the reason 
these are separately encoded. The fact that these are not contrastively 
used in most major languages is responsible for the fact that they don't 
use different code points when used in these languages. It would be a 
real hassle to have to change from g to ɡ when switching e.g. from Times 
Roman to Times Italic.
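
To make the plain-text side of this concrete, here is a small Python 
sketch (again only the standard unicodedata module; describe() and PAIRS 
are names I made up for illustration). It shows that the ordinary 
letters and their IPA counterparts are distinct code points, so a 
search or comparison will not treat them as the same letter.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Ordinary letter first, separately encoded counterpart second.
PAIRS = [("a", "\u0251"), ("g", "\u0261")]

for plain, special in PAIRS:
    print(describe(plain), "|", describe(special), "| equal:", plain == special)
```

Switching a font from roman to italic changes none of this; the code 
points stay whatever the author typed.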

In Deseret, we are still missing any contrastive usage, which suggests 
that we should be careful about encoding.

>> as well as a large number of distinctions in Han fonts, come to my mind.

It's difficult to show these distinctions, because they are NOT 
separately encoded, but the three-stroke and four-stroke forms of the 
grass radical are the best-known example.

> And the same goes for the /juː/ ligatures. The word tube /tjuːb/ can be written TYŪB �������� or ������ or ��<����>��. But the unligated sequences would be pronounced differently: �������� /tjuːb/ and �������� /tɪuːb/ and �������� /tɪʊb/.

Ah, I see. So we seem to have five different ways (counting the two 
ligature variants) of writing the same word, with three different 
pronunciations. The important question is whether the two ligatures do 
imply any difference in pronunciation (as opposed to time of writing or 
author/printer preference), i.e. whether the ligated sequences ������ or 
��<����>�� are pronounced differently (not by a phonologist but by an 
average user).

>> Is the choice of variant up to the author (for which variants), or is it the editor or printer who makes the choice (for which variants)?
>
> In a handwritten manuscript obviously the choice is the author’s. As to historical printing, printers may have

Did you want to write something more here?

>> And what informs this choice? If we have any historic metal types, are there examples where a font contains both ligature variants?
>
> Ken Beesley has samples of a metal font (the 1857 St Louis punches) which had both �� and ����; I don’t know what other sorts were in that font.

As I explained in another post, that may just be an 1855/1859 hybrid.

Regards,   Martin.