Standardized variation sequences for the Deseret alphabet?

Wed Mar 29 09:04:21 CDT 2017

Martin,

thanks for the careful summary.

As in all these cases, it is possible to argue from different premises, 
so I would, unfortunately, not expect this discussion to reach a 
consensus among all parties.

In the end, Unicode is made for the modern user, whether they are native 
users of a script, or modern users archiving or discussing historic texts.

The specific principles used in each encoding decision matter, but only 
insofar as the result works for the modern (and future!) users of the 
standard.

A./

PS: as to modern use of Fraktur -- many fonts for black-letter logos are 
modified to help modern readers recognize the words.

On 3/29/2017 3:12 AM, Martin J. Dürst wrote:
> Hello everybody,
>
> Let me start with a short summary of where I think we are at, and how 
> we got there.
>
> - The discussion started out with two letters,
>   with two letter forms each. There is explicit talk of the
>   40-letter alphabet and glyphs in the Wikipedia page, not
>   of two different letters.
> - That suggests that IF this script is in current use, and the
>   shapes for these diphthongs are interchangeable (for those
>   who use the script day-to-day, not for meta-purposes such
>   as historic and typographic texts), keeping things unified
>   is preferable.
> - As far as we have heard (in the course of the discussion,
>   after questioning claims made without such information),
>   it seems that:
>   - There may not be enough information to understand how the
>     creators and early users of the script saw this issue,
>     on a scale that may range between "everybody knows these
>     are the same, and nobody cares too much who uses which,
>     even if individual people may have their preferences in
>     their handwriting" to something like "these are different
>     choices, and people wouldn't want their texts to be changed
>     in any way when published".
>   - Similarly, there seem not to be enough modern practitioners
>     of the script using the ligatures that could shed any
>     light on the question asked in the previous item in a
>     historical context, first apparently because there are not
>     that many modern practitioners at all, and second because
>     modern practitioners seem to prefer spelling with
>     individual letters rather than using the ligatures.
> - IF the above is true, then it may be that these ligatures
>   are mostly used for historic purposes only, in which case
>   it wouldn't do any harm to present-day users if they were separated.
>
> If the above is roughly correct, then it's important that we reached 
> that conclusion after explicitly considering the potential of a split 
> to create inconvenience and confusion for modern practitioners, not 
> after just looking at the shapes only, coming up with separate 
> historical derivations for each of them, and deciding to split because 
> history is way more important than modern practice.
>
> In that light, some more comments lower down.
>
> On 2017/03/28 22:56, Michael Everson wrote:
>> On 28 Mar 2017, at 11:39, Martin J. Dürst <duerst at it.aoyama.ac.jp> 
>> wrote:
>
>> An æ ligature is a ligature of a and of e. It is not some sort of 
>> pretzel.
>
> Yes. But it's important that we know that because we have been faced 
> with many cases where "æ" and "ae" were used interchangeably. For 
> somebody not knowing the (extended) Latin alphabet and its usages, 
> they might easily see more of a pretzel and less of 'a' and 'e'. I 
> might try some experiments with some of my students (although I'm 
> using "formulæ" in my lecture notes, and so they might already be too 
> familiar with the "æ").
>
> Also, if it were the case that shapes like "æ" and "œ" were used 
> interchangeably across all uses of the Latin alphabet, I'm quite sure 
> we would encode it with one code point rather than two, even if some 
> researchers might claim that the latter was derived from an "o" rather 
> than an "ɑ", or even if we knew it was derived from an "o" (as we know 
> for the ß).
>
>
>> What Deseret has is this:
>>
>> 10426 DESERET CAPITAL LETTER LONG OO WITH STROKE
>>     * officially named “ew” in the code chart
>>     * used for ew in earlier texts
>> 10427 DESERET CAPITAL LETTER SHORT AH WITH STROKE
>>     * officially named “oi” in the code chart
>>     * used for oi in earlier texts
>> 1xxxx DESERET CAPITAL LETTER LONG AH WITH STROKE
>>     * used for oi in later texts
>> 1xxxx DESERET CAPITAL LETTER SHORT OO WITH STROKE
>>     * used for ew in later texts
>
> Currently, it has this:
>
> 10426 𐐦 DESERET CAPITAL LETTER OI
>
> 10427 𐐧 DESERET CAPITAL LETTER EW
>
> My personal opinion is that names are mostly hints, and not too much 
> should be read into them, but if anything, the names in the current 
> charts would suggest that the encoding is for the 39th/40th letter of 
> the Deseret alphabet, whatever its shape, not for some particular shape.
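
The current names of these two code points can be checked against the Unicode Character Database. The following is only a minimal sketch using Python's standard unicodedata module; the helper name current_deseret_names is made up for illustration, and the result reflects whatever Unicode version the local Python build ships with.

```python
import unicodedata

def current_deseret_names():
    """Return {code point: (character name, lowercase code point)} for
    U+10426 and U+10427 as known to this Python build's Unicode data.
    (Hypothetical helper, for illustration only.)"""
    result = {}
    for cp in (0x10426, 0x10427):
        ch = chr(cp)
        # unicodedata.name() gives the formal character name;
        # str.lower() applies the simple case mapping.
        result[cp] = (unicodedata.name(ch), ord(ch.lower()))
    return result

if __name__ == "__main__":
    for cp, (name, lower_cp) in current_deseret_names().items():
        print(f"U+{cp:04X} {name} (lowercase: U+{lower_cp:04X})")
```

The names returned describe the letters (OI, EW), not any particular shape, and because character names are immutable they would stay the same whether or not further characters are added.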
>
> And you know as well as I do that we can't change names. So if we 
> split, we might end up with something like:
>
> 10426 𐐦 DESERET CAPITAL LETTER OI
>
> 10427 𐐧 DESERET CAPITAL LETTER EW
>
> 1xxxx <����> DESERET CAPITAL LETTER VARIANT OI
>
> 1xxxx <����> DESERET CAPITAL LETTER VARIANT EW
>
>
>> Don’t go trying to tell me that LONG OO WITH STROKE and SHORT OO WITH 
>> STROKE are glyph variants of the same character.
>>
>> Don’t go trying to tell me that LONG AH WITH STROKE and SHORT AH WITH 
>> STROKE are glyph variants of the same character.
>
> We have just established that there are no characters with such names 
> in the standard. It's not the names or the history that I'm arguing.
>
>
>> To do so is to show no understanding of the history of writing 
>> systems at all.
>
> What I'd agree to is that cases where shapes with different historical 
> origins merge and get treated as one and the same character are quite 
> a lot rarer than cases where they don't merge. But we have seen cases 
> where such a merge happens. ß is one of them. There are quite a few in 
> Han (not surprising because there are tons of ideographs there to 
> begin with).
>
> But that experience doesn't mean that we have to rush to a conclusion 
> without examining as much of the evidence as we can get hold of.
>
>
>> You’re smarter than that. So are Asmus and Mark and Erkki and any of 
>> the other sceptics who have chimed in here.
>
> Skepticism when presented with options without background facts is 
> a virtue in my opinion.
>
>
>>> And as for precedent, the fact that we have encoded a lot of 
>>> characters in Unicode doesn't mean that we can encode more 
>>> characters without checking each and every single case very 
>>> carefully, as we are doing in this discussion.
>>
>> The UTC encodes a great many characters without checking them at all, 
>> or even offering documentation on them to SC2. Don’t think we haven’t 
>> observed this.
>
> As for BROCCOLI that you mention later and other emoji, first I would 
> like to make clear that I don't use emoji personally nor do I push for 
> their encoding.
>
> But what's important for the discussion at hand is that when it comes 
> to emoji, the question of whether we should unify or disunify BROCCOLI 
> and CAULIFLOWER (just a hypothetical example) isn't as important. 
> That's because there is no preexisting user community that would be 
> seriously inconvenienced the way it would happen if we suddenly 
> disunified the ſs/ſz ligature, or suddenly unified "æ" and "œ". Emoji 
> are a hopeless hodgepodge, where users click on what they see, and 
> hope that it shows close enough to what they meant at the other end or 
> after a few years.
>
>
>>> Of course they will easily see different shapes, but what's 
>>> important isn't the shapes, it's what they associate it with. If for 
>>> them, it's just two shapes for one and the same 40th letter of the 
>>> Deseret alphabet, then that is a strong suggestion for not encoding 
>>> separately, even if the shapes look really different.
>>
>> Martin, there is no answer to this unless you can read the minds of 
>> people who are dead a century or more.
>
> Thanks for telling us, finally.
>
>
>>> To use another analogy, many people these days (me included) would 
>>> have difficulties identifying Fraktur letters, in particular if they 
>>> show up just as individual letters.
>>
>> I do not believe you.
>
> It's true. When younger, I tried to read some old books written in 
> Fraktur. It was hard work. Most of the lower-case letters were okay, but 
> the ſ and the f were easy to confuse, and the k is also confusing. A 
> lot of guessing was needed for upper case. I'm quite sure most people 
> these days couldn't easily identify upper case letters in isolation. 
> Of course, context helps a lot.
>
>> If this were true, menus in restaurants and public signage on shops 
>> wouldn’t have Fraktur at all. It’s true that sometimes the 
>> orthography on such things is bad, as where they don’t use ligatures 
>> correctly or the ſ at all.
>
> Shops and newspapers (e.g. NYT) and the like rely a lot on a logo 
> effect. And the situation may be slightly different in Germany and in 
> Switzerland.
>
>> I’ll stipulate that few Germans can read Sütterlin or similar hands. :-)
>
> Definitely agreed!
>
>
>> On 28 Mar 2017, at 11:59, Mark Davis ☕️ <mark at macchiato.com> wrote:
>>
>>> I agree with Martin.
>
>>> Simply because someone used a particular shape at some time to mean 
>>> a letter doesn't mean that Unicode should encode a letter for that 
>>> shape.
>>
>> Coming to a forum like this out of a concern for the corpus of 
>> Deseret literature is not some sort of attempt to encode things for 
>> encoding’s sake.
>
> And coming to a discussion like this out of a concern for modern 
> practitioners of the script (even if it seems, after a lot of 
> discussion, that there aren't that many of these, and the issue at 
> hand may indeed not concern them that much) is not some sort of 
> attempt to unify things for unification's sake.
>
>
> Regards,    Martin.
>