Standardized variation sequences for the Deseret alphabet?

Michael Everson everson at evertype.com
Wed Mar 29 08:08:59 CDT 2017


Martin,

It’s as though you’d not participated in this work for many years, really. 

> On 29 Mar 2017, at 11:12, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
> 
> Hello everybody,
> 
> Let me start with a short summary of where I think we are at, and how we got there.
> 
> - The discussion started out with two letters, with two letter forms each. There is explicit talk of the 40-letter alphabet and glyphs in the Wikipedia page, not of two different letters.

SO WHAT? Alphabets have “letters” in them. “Letters” are not “characters”. In Welsh, “ch” and “dd” and “ll” are “letters”. 

> - That suggests that IF this script is in current use,

You don’t even know? You’re kidding, right?

> and the shapes for these diphthongs are interchangeable

It does NOT “suggest” that at all. 

> (for those who use the script day-to-day, not for meta-purposes such as historic and typographic texts), keeping things unified is preferable.

Deseret was a spelling reform replacement alphabet used for a period of time by the Mormons in what is now Utah. It is structurally very similar to Pitman’s Phonotypic alphabets. Alphabets. There were many revisions of those. Some of them used letterforms we have encoded today, for IPA for instance. Some used letterforms we’d hardly recognize, and we’d never, ever consider them to be glyph variants of the IPA letters. 

> - As far as we have heard (in the course of the discussion, after questioning claims made without such information), it seems that:

Yeah, it doesn’t “seem” anything but a whole lot of special pleading to bolster your rigid view that the glyphs in question can be interchangeable because of the sounds they may represent. 

>  - There may not be enough information to understand how the creators and early users of the script saw this issue, 

Um, yeah. As if there were for Phoenician, or Luwian hieroglyphs, right?

> on a scale that may range from "everybody knows these are the same, and nobody cares too much who uses which, even if individual people may have their preferences in their handwriting" to something like "these are different choices, and people wouldn't want their texts to be changed in any way when published".

We know what the diphthongs were. We know that the script had a spelling reform where some characters were abandoned in favour of other characters. There was at least one font wh

And there is lots of handwriting in which people write what they want to write, in the non-Latin alphabet they learned. 

As far as your guessing what people had in their minds about what they were writing, and as to your speculation about what the very few printers who had Deseret type might have done with such manuscripts, well, it is all reine Phantasie on your part. 

Oh! Look! There was a spelling reform. I should write “Fantasie”, shouldn’t I? Wait! I can have spell-check dictionaries suit my preference! Wow! That’s amazing!

>  - Similarly, there seem to be not enough modern practitioners of the script using the ligatures that could shed any light on the question asked in the previous item in a historical context,

Completely irrelevant. Nobody worried about the number of modern users of the Insular letters we encoded. Why put such constraints on users of Deseret? Ꝺꝺ Ꝼꝼ Ᵹᵹ Ꝿ Ꞃꞃ Ꞅꞅ Ꞇꞇ. 

> first apparently because there are not that many modern practitioners at all, and second because modern practitioners seem to prefer spelling with individual letters rather than using the ligatures.

This is equally ridiculous. John Jenkins chooses not to write the digraphs in the works which he transcribed, because that’s what *he* chooses. He doesn’t speak for anyone else who may choose to write in Deseret, and your assumption that “modern practitioners” do this is groundless. 

It also ignores the fact that the script had a reform and that separate encodings for the various characters are of value to those studying the provenance and orthographic practices of those who wrote Deseret when it was in active use. 

This is exactly the same thing as the medievalist Latin abbreviation and other characters we encoded. There is neither sense nor logic nor utility in trying to argue for why editors of Deseret documents shouldn’t have the same kinds of tools that medievalists have. And as far as medievalist concerns go, many of the characters are used by relatively few researchers. Some of the characters we encoded are used all over Europe at many times. Some are used only by Nordicists, some by Celticists, and some by subsets within the Nordicist and Celticist communities. 

> - IF the above is true, then it may be that these ligatures are mostly used for historic purposes only, in which case it wouldn't do any harm to present-day users if they were separated.

Harm? What harm? Recently the UTC looked at a proposal for capital letters for ʂ and ʐ. Evidence for their existence was shown. One person on the call to the UTC said he didn’t think anyone needed them. Two of us do need them. I needed them last weekend and I had to use awkward workarounds. They weren’t accepted. There wasn’t any good rationale for the rejection. I mean, the letters exist. Case is a normal function of the script. But they weren’t accepted. For the guy who didn’t think he needed them, well, so what? If they’re encoded, he doesn’t have to use them. 

Harm to present-day users? I agree with you. Any modern-day user creating new texts who doesn’t like to use the diphthong letters doesn’t have to use them. Any modern-day user trying to represent historic texts accurately, however, can’t, because not all the letters are encoded. 

> If the above is roughly correct, then it's important that we reached that conclusion after explicitly considering the potential of a split to create inconvenience and confusion for modern practitioners,

People who use Deseret use it for historical purposes and for cultural reasons. Everybody in Utah reads English in standard Latin orthography. 

> not after just looking at the shapes only, coming up with separate historical derivations for each of them, and deciding to split because history is way more important than modern practice.

I didn’t “come up” with separate historical derivations for the four characters in question. It is entirely obvious that LONG AH, SHORT AH, LONG OO, and SHORT OO are variously combined with the stroke of SHORT I. 

Entirely obvious. There is no other interpretation. 

> In that light, some more comments lower down.
> 
> On 2017/03/28 22:56, Michael Everson wrote:
>> On 28 Mar 2017, at 11:39, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
> 
>> An æ ligature is a ligature of a and of e. It is not some sort of pretzel.
> 
> Yes. But it's important that we know that because we have been faced with many cases where "æ" and "ae" were used interchangeably.

Irrelevant. This is just spelling. It’s no different than colour/color or maximize/maximise or aluminium/aluminum. 

> For somebody not knowing the (extended) Latin alphabet and its usages, they might easily see more of a pretzel and less of 'a' and 'e'. I might try some experiments with some of my students (although I'm using "formulæ" in my lecture notes, and so they might already be too familiar with the "æ").

You have missed the point fabulously. The point was that the æ ligature can be easily identified as being made of A and of E. And the four Deseret characters can easily be identified as being made of LONG AH, SHORT AH, LONG OO, and SHORT OO with the stroke of SHORT I. 

> Also, if it were the case that shapes like "æ" and "œ" were used interchangeably across all uses of the Latin alphabet, I'm quite sure we would encode it with one code point rather than two, even if some researchers might claim that the latter was derived from an "o" rather than an "ɑ", or even if we knew it was derived from an "o" (as we know for the ß).

I don’t agree, and there are hundreds of 

>> What Deseret has is this:
>> 
>> 10426 DESERET CAPITAL LETTER LONG OO WITH STROKE
>> 	* officially named “ew” in the code chart
>> 	* used for ew in earlier texts
>> 10427 DESERET CAPITAL LETTER SHORT AH WITH STROKE
>> 	* officially named “oi” in the code chart
>> 	* used for oi in earlier texts
>> 1xxxx DESERET CAPITAL LETTER LONG AH WITH STROKE
>> 	* used for oi in later texts
>> 1xxxx DESERET CAPITAL LETTER SHORT OO WITH STROKE
>> 	* used for ew in later texts
> 
> Currently, it has this:
> 
> 10426 𐐦 DESERET CAPITAL LETTER OI
> 
> 10427 𐐧 DESERET CAPITAL LETTER EW

You are being deliberately obtuse. Note that I stated clearly “officially named ‘ew/oi’ in the code chart”. 
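For readers who want to check the names as they currently stand in the standard, here is a minimal sketch using Python's standard-library `unicodedata` module. The helper name `describe` is made up for illustration, and the output depends on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

# The two diphthong letters currently encoded in the Deseret block.
CODE_POINTS = (0x10426, 0x10427)

def describe(cp):
    """Return (name of the capital letter, name of its lowercase counterpart)."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.name(ch.lower())

if __name__ == "__main__":
    for cp in CODE_POINTS:
        capital, small = describe(cp)
        print(f"U+{cp:05X} {capital} / {small}")
```

The lookup reports only the formal character names; it says nothing about which glyph a given font draws at those code points, which is the point under dispute here.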

> My personal opinion is that names are mostly hints, and not too much should be read into them, 

I do not share this opinion.

> but if anything, the names in the current charts would suggest that the encoding is for the 39th/40th letter of the Deseret alphabet, whatever its shape, not for some particular shape.

You make too much of these numbers, but then there are charts of the 38-letter alphabet and charts of the 40-letter alphabet, but those numbers have to do with the number of English phonemes represented in Phonotypy and in Deseret, and with the augmentation of that by the addition of letters which represent phonemes. 

> And you know as well as I do that we can't change names. So if we split, we might end up with something like:
> 
> 10426 𐐦 DESERET CAPITAL LETTER OI
> 
> 10427 𐐧 DESERET CAPITAL LETTER EW
> 
> 1xxxx <����> DESERET CAPITAL LETTER VARIANT OI
> 
> 1xxxx <����> DESERET CAPITAL LETTER VARIANT EW

I’m pretty sure we will propose the names LONG AH WITH STROKE and SHORT OO WITH STROKE. The two un-encoded characters are used for the *diphthongs* oi and ew but they are not “variants” of the other letters. 

We do not require matching names here. Compare LATIN LETTER YR and LATIN LETTER SMALL CAPITAL R. Compare LATIN CAPITAL LETTER HWAIR and LATIN SMALL LETTER HV. 
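The two comparisons can be checked with a short Python sketch, again using only the standard-library `unicodedata` module. The helper name `case_pair_names` is hypothetical; the code points are U+01A6/U+0280 and U+01F6/U+0195. It shows that the members of each case pair map to each other even though their names do not match.

```python
import unicodedata

# (uppercase, lowercase) pairs whose character names do not mirror each other.
PAIRS = [("\u01A6", "\u0280"), ("\u01F6", "\u0195")]

def case_pair_names(upper, lower):
    """Confirm the two characters are a case pair, then return both names."""
    if upper.lower() != lower or lower.upper() != upper:
        raise ValueError("not a case pair in this Unicode version")
    return unicodedata.name(upper), unicodedata.name(lower)

if __name__ == "__main__":
    for upper, lower in PAIRS:
        print(" <-> ".join(case_pair_names(upper, lower)))
```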

>> Don’t go trying to tell me that LONG OO WITH STROKE and SHORT OO WITH STROKE are glyph variants of the same character.
>> 
>> Don’t go trying to tell me that LONG AH WITH STROKE and SHORT AH WITH STROKE are glyph variants of the same character.
> 
> We have just established that there are no characters with such names in the standard. It's not the names or the history that I'm arguing.

You’re being obtuse again. Fine. 

Don’t go trying to tell me that EW and SHORT OO WITH STROKE are glyph variants of the same character.

Don’t go trying to tell me that LONG AH WITH STROKE and OI are glyph variants of the same character.

They’re not. The origin of all those letterforms is obvious, and we do not encode sounds, we encode the elements of writing systems. 

>> To do so is to show no understanding of the history of writing systems at all.
> 
> What I'd agree to is that cases where shapes with different historical origins merge and get treated as one and the same character are quite a lot rarer than cases where they don't merge. 

They didn’t merge in Deseret. They had a reform, removing some characters and adding some other characters. 

> But we have seen cases where such a merge happens. ß is one of them.

That’s even arguable because ſʒ only really occurs in the whole-font Fraktur style. It’s pretty rare to see it in Antiqua. Of course it must be attested there, but it’s by no means common. 

> There are quite a few in Han (not surprising because there are tons of ideographs there to begin with).
> 
> But that experience doesn't mean that we have to rush to a conclusion without examining as much of the evidence as we can get hold of.

I haven’t rushed to a conclusion. I’ve made a thorough analysis. 

>> You’re smarter than that. So are Asmus and Mark and Erkki and any of the other sceptics who have chimed in here.
> 
> Skepticism when presented with options without background facts is a virtue in my opinion.

Your argument seemed to be based solely on the use of the letters for the sounds, ignoring the historical derivation and the facts of the spelling reform in Deseret. 

>> The UTC encodes a great many characters without checking them at all, or even offering documentation on them to SC2. Don’t think we haven’t observed this.
> 
> As for BROCCOLI that you mention later and other emoji, first I would like to make clear that I don't use emoji personally nor do I push for their encoding.

I *do* use emoji and I have devised many emoji which are now in current use. I do find that the process for adding symbols to the UCS (which is not the same thing as giving symbols the emoji property) is not functioning particularly well at present. 

> But what's important for the discussion at hand is that when it comes to emoji, the question of whether we should unify or disunify BROCCOLI and CAULIFLOWER (just a hypothetical example) isn't as important.

Eventually we will have CABBAGE, and then some people will need to use ZWJ to join CABBAGE and KNIFE so that sauerkraut can be represented, and then other people will need to use ZWJ to join CABBAGE and HOT PEPPER for kimchi, and in Ireland we’ve got bacon and cabbage of course, and...

Heh. 

> That's because there is no preexisting user community that would be seriously inconvenienced the way it would happen if we suddenly disunified the ſs/ſz ligature, or suddenly unified "æ" and "œ". Emoji are a hopeless hodgepodge, where users click on what they see, and hope that it shows close enough to what they meant at the other end or after a few years.

No one using Deseret will be inconvenienced by adding additional historical characters for the already historical script. Anyone using modern Deseret fonts *would* be inconvenienced by unifying the LONG-AH-WITH-STROKE and SHORT-AH-WITH-STROKE characters and the LONG-OO-WITH-STROKE and SHORT-OO-WITH-STROKE characters, I think. No current fonts that I know of have the 1859 glyphs, apart from private fonts Ken Beesley used for his own work. 

>>> Of course they will easily see different shapes, but what's important isn't the shapes, it's what they associate it with. If for them, it's just two shapes for one and the same 40th letter of the Deseret alphabet, then that is a strong suggestion for not encoding separately, even if the shapes look really different.
>> 
>> Martin, there is no answer to this unless you can read the minds of people who are dead a century or more.
> 
> Thanks for telling us, finally.

What on earth do you mean? I have withheld no secrets. I’ve objected to your wilful unification of characters with obviously different origins. 

>>> To use another analogy, many people these days (me included) would have difficulties identifying Fraktur letters, in particular if they show up just as individual letters.
>> 
>> I do not believe you.
> 
> It's true. When younger, I tried to read some old books written in Fraktur. It was hard work. Most of the lower letters were okay, but the ſ and the f were easy to confuse, and the k is also confusing. A lot of guessing was needed for upper case. I'm quite sure most people these days couldn't easily identify upper case letters in isolation. Of course, context helps a lot.

It’s not the easiest thing but it does not take all that much to accustom oneself to it. 

>> If this were true menus in restaurants and public signage on shops wouldn’t have Fraktur at all. It’s true that sometimes the orthography on such things is bad, as where they don’t use ligatures correctly or the ſ at all.
> 
> Shops and newspapers (e.g. NYT) and the like rely a lot on a logo effect. And the situation may be slightly different in Germany and in Switzerland.

People can read the menus and the public signage nevertheless. Fraktur is not so unbelievably different that it’s entirely opaque. 

>> I’ll stipulate that few Germans can read Sütterlin or similar hands. :-)
> 
> Definitely agreed!

I learned to write Sütterlin. Going back and reading something written takes work too… 

> 
> 
>> On 28 Mar 2017, at 11:59, Mark Davis ☕️ <mark at macchiato.com> wrote:
>> 
>>> I agree with Martin.
> 
>>> Simply because someone used a particular shape at some time to mean a letter doesn't mean that Unicode should encode a letter for that shape.
>> 
>> Coming to a forum like this out of a concern for the corpus of Deseret literature is not some sort of attempt to encode things for encoding’s sake.
> 
> And coming to a discussion like this out of a concern for modern practitioners of the script (even if it seems, after a lot of discussion, that there aren't that many of these, and the issue at hand may indeed not concern them that much) is not some sort of attempt to unify things for unification's sake.

I think you made a lot of assumptions about “modern practitioners” which you didn’t disclose.

A proposal will be forthcoming. I want to thank several people who have written to me privately supporting my position with regard to this topic on this list. I can only say that supporting me in public is more useful than supporting me in private. 

Michael

