Standardized variation sequences for the Deseret alphabet?

Martin J. Dürst duerst at it.aoyama.ac.jp
Tue Mar 28 05:39:13 CDT 2017


Hello Michael, others,

On 2017/03/27 21:07, Michael Everson wrote:
> On 27 Mar 2017, at 06:42, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>
>>> The characters in question have different and undisputed origins, undisputed.
>>
>> If you change that to the somewhat more neutral "the shapes in question have different and undisputed origins", then I'm with you. I actually have said as much (in different words) in an earlier post.
>
> And what would the value of this be? Why should I (who have been doing this for two decades) not be able to use the word “character” when I believe it correct? Sometimes you people who have been here for a long time behave as though we had no precedent, as though every time a character were proposed for encoding it’s as though nothing had ever been encoded before.

I didn't say that you have to change words. I just said that I could 
agree to a slightly differently worded phrase.

And as for precedent, the fact that we have encoded a lot of characters 
in Unicode doesn't mean that we can encode more characters without 
checking each and every single case very carefully, as we are doing in 
this discussion.


> The sharp s analogy wasn’t useful because whether ſs or ſz users can’t tell either and don’t care.

Sorry, but that was exactly the point of this analogy. As to "can't 
tell", it's easy to ask somebody to look at an actual ß letter and say 
whether the right part looks more like an s or like a z. On the other 
hand, users of Deseret may or may not ignore the difference between the 
1855 and 1859 shapes when they read. Of course they will easily see 
different shapes, but what's important isn't the shapes, it's what they 
associate them with. If for them, it's just two shapes for one and the 
same 40th letter of the Deseret alphabet, then that is a strong 
suggestion for not encoding separately, even if the shapes look really 
different.
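
To make the "one letter, two shapes" option concrete, here is a minimal 
sketch of what a variation sequence looks like at the code point level. 
It assumes, purely for illustration, that VARIATION SELECTOR-1 would be 
paired with DESERET CAPITAL LETTER EW; no such sequence is claimed to be 
registered, and the helper name is hypothetical.

```python
import unicodedata

# VARIATION SELECTOR-1; pairing it with a Deseret letter is an
# assumption made for illustration only, not a registered sequence.
VS1 = "\uFE00"

# Look the base letter up by its Unicode character name.
base = unicodedata.lookup("DESERET CAPITAL LETTER EW")

plain = base                 # the letter with no shape preference
with_selector = base + VS1   # same letter, a particular shape requested


def strip_variation_selectors(text):
    """Drop U+FE00..U+FE0F so that only the base characters remain."""
    return "".join(ch for ch in text if not 0xFE00 <= ord(ch) <= 0xFE0F)


# A process that ignores the selector sees the same letter in both strings.
print(strip_variation_selectors(with_selector) == plain)
print([f"U+{ord(ch):04X}" for ch in with_selector])
```

With separately encoded characters, by contrast, the two shapes would 
be two unrelated code points, and treating them as the same letter 
would need an explicit mapping.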


> No Fraktur fonts, for instance, offer a shape for U+00DF that looks like an ſs. And what Antiqua fonts do, well, you get this:
>
> https://en.wikipedia.org/wiki/%C3%9F#/media/File:Sz_modern.svg

Yes. And we are just starting to collect evidence for Deseret fonts.


> And there’s nothing unrecognizable about the ſɜ (< ſꝫ (= ſz)) ligature there.

Well, not to somebody used to it. But non-German users quite often use a 
Greek β where they should use a ß, so it's no surprise people don't 
distinguish the ſs and ſz derived glyphs.
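
For reference, a small sketch showing that the two look-alikes are 
unrelated characters as far as the encoding is concerned, however 
similar they may appear to some readers. The helper name is my own; the 
calls are from Python's standard unicodedata module.

```python
import unicodedata


def describe(ch):
    """Return the code point and Unicode character name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


sharp_s = "\u00DF"  # the German sharp s
beta = "\u03B2"     # the Greek letter often substituted for it

print(describe(sharp_s))
print(describe(beta))

# Compatibility normalization does not unify the two characters.
same_after_nfkc = (
    unicodedata.normalize("NFKC", sharp_s)
    == unicodedata.normalize("NFKC", beta)
)
print(same_after_nfkc)
```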


> The situation in Deseret is different.

The graphic difference is definitely bigger, so to an outsider, it's 
close to impossible to match up the pairs of shapes. But that in no 
way means that these have to be seen as different characters 
(rather than just different glyphs) by insiders (actual users).

To use another analogy, many people these days (me included) would have 
difficulties identifying Fraktur letters, in particular if they show up 
just as individual letters. Similar for many fantasy fonts, and for 
people not very familiar with the Latin script.


> Underlying ligature difference is indicative of character identity. Particularly when two resulting ligatures are SO different from one another as to be unrecognizable. And that is the case with EW on the left and OI on the right here:
> https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg
>
> The lower two letterforms are in no way “glyph variants” of the upper two letterforms. Apart from the stroke of the SHORT I they share nothing in common — because they come from different sources and are therefore different characters.

The range of what can be a glyph variant is quite wide across scripts 
and font styles. The mere fact that the shapes differ widely, or that 
their origins differ, doesn't make this conclusive.


> Character origin is intimately related to character identity.

In most cases, yes. But it's not a foregone conclusion.


> I don’t think that ANY user of Deseret is all that “average”. Certainly some users of Deseret are experts interested in the script origin, dating, variation, and so on — just as we have medievalists who do the same kind of work. I’m about to publish a volume full of characters from Latin Extended-D. My work would have been impossible had we not encoded those characters.

No, your work wouldn't have been impossible. It might have been quite a 
bit more difficult, but not impossible. I have written papers about Han 
ideographs and Japanese text processing where I had to create my own 
fonts (8-bit, with mostly random assignments of characters because these 
were one-off jobs), or fake things with inline bitmap images (trying to 
get information on the final printer resolution and how many black 
pixels wide a stem or crossbar would have to be to avoid dropouts, and 
not being very successful).

I have heard the argument that some character variant is needed because 
of research, history,... quite a few times. If a character has indeed 
been historically used in a contrasting way, this is definitely a good 
argument for encoding. But if a character just looked somewhat different 
a few (hundreds of) years ago, that doesn't make such a good argument. 
Otherwise, somebody may want to propose new codepoints for Bodoni and 
Helvetica,...


Regards,    Martin.

