Standardized variation sequences for the Deseret alphabet?
Martin J. Dürst
duerst at it.aoyama.ac.jp
Thu Apr 6 02:01:38 CDT 2017
[I started to write this mail quite some time ago. I decided to try to
let things cool down a bit by waiting a day or two, but it has become
more than a week now.]
On 2017/03/29 22:08, Michael Everson wrote:
> It’s as though you’d not participated in this work for many years, really.
Well, looking back, my time commitment to Unicode has definitely varied
over the years. But that might be true for everybody.
What's more important is that Unicode covers such a wide range of areas,
and not everybody has the same experience or knowledge. If we did, we
wouldn't need to work together; it would be okay to just have one of us.
Indeed, what's really very valuable and interesting in this work is the
many very varied backgrounds and experiences everybody has.
In addition to variations in background, we also have a wide variety of
ways of thinking, e.g. ranging from abstract to concrete, and so on.
>> On 29 Mar 2017, at 11:12, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>> - That suggests that IF this script is in current use,
> You don’t even know? You’re kidding, right?
Everything is relative. And without being part of the user community,
it's difficult to make any guesses.
>> - As far as we have heard (in the course of the discussion, after questioning claims made without such information), it seems that:
> Yeah, it doesn’t “seem” anything but a whole lot of special pleading to bolster your rigid view that the glyphs in question can be interchangeable because of the sounds they may represent.
I don't remember ever claiming that the glyphs must be used
interchangeably, only that we should carefully examine whether they are
or not, and that because they represent the same sound (in a phonetic
alphabet, as it is) and are shown in the same position in alphabet
tables, we shouldn't a priori exclude such a possibility.
>> - There may not be enough information to understand how the creators and early users of the script saw this issue,
> Um, yeah. As if there were for Phoenician, or Luwian hieroglyphs, right?
Well, there's well over an order of magnitude difference in the time
scales involved. The language that Deseret is used to write is still in
active use, including in this very discussion. Quite different from
Phoenician or Luwian hieroglyphs.
In addition, we have meta-information such as alphabet tables, which we
may not have for the scripts you mention, as well as the fact that
printing technology may have forced a better identification of what's a
character and what not than inscriptions and other older technologies.
>> - Similarly, there seem to be not enough modern practitioners of the script using the ligatures that could shed any light on the question asked in the previous item in a historical context,
> Completely irrelevant. Nobody worried about the number of modern users of the Insular letters we encoded. Why put such constraints on users of Deseret? Ꝺꝺ Ꝼꝼ Ᵹᵹ Ꝿ Ꞃꞃ Ꞅꞅ Ꞇꞇ.
Because it's modern users, and future users, not users some hundred
years or so ago, that will use the encoding. In the case of Insular
letters, my guess is that nobody wants to translate/transcribe xkcd, for
example, whereas there is such a transcription for Deseret:
>> first apparently because there are not that many modern practitioners at all, and second because modern practitioners seem to prefer spelling with individual letters rather than using the ligatures.
> This is equally ridiculous. John Jenkins chooses not to write the digraphs in the works which he transcribed, because that’s what *he* chooses. He doesn’t speak for anyone else who may choose to write in Deseret, and your assumption that “modern practitioners” do this is groundless.
Most readers and writers of Deseret today use the shapes that are in
their fonts, which are those in the Unicode charts, and most texts
published today don’t use the EW and OI ligatures at all, because that’s
John Jenkins’ editorial practice.
So I was wrong to write "modern practitioners", and should have written
"modern publishers" or "modern published texts". Or is my impression from
what you wrote above wrong, namely that most texts published these days
are edited by John, or by people following his practice?
> It also ignores the fact that the script had a reform and that the value of separate encodings for the various characters is of value to those studying the provenance and orthographic practices of those who wrote Deseret when it was in active use.
I don't remember denying the value of separate encodings for historic
research. I only wanted to make sure that present-day use isn't
inconvenienced to make historic research easier. If the claims are
correct that present-day usage is mostly a reconstruction based on the
Unicode encoding and the Unicode sample glyphs, then I'm fine with
helping historic research.
> This is exactly the same thing as the medievalist Latin abbreviation and other characters we encoded. There is neither sense nor logic nor utility in trying to argue for why editors of Deseret documents shouldn’t have the same kinds of tools that medievalists have. And as far as medievalist concerns go, many of the characters are used by relatively few researchers. Some of the characters we encoded are used all over Europe at many times. Some are used only by Nordicists, some by Celticists, and some by subsets within the Nordicist and Celticist communities.
Maybe, maybe not. If e.g. somebody came and said that they wanted to
disunify the ſs and ſz ligatures for (German) ß in order to better
analyze some old manuscripts, and the modern users from here on had to
make sure they used the right one depending on the font they used, then
I'm sure a lot of Germans would complain quite clearly, because it
would make their current use more complicated.
>> - IF the above is true, then it may be that these ligatures are mostly used for historic purposes only, in which case it wouldn't do any harm to present-day users if they were separated.
> Harm? What harm? Recently the UTC looked at a proposal for capital letters for ʂ and ʐ. Evidence for their existence was shown. One person on the call to the UTC said he didn’t think anyone needed them. Two of us do need them. I needed them last weekend and I had to use awkward workarounds. They weren’t accepted. There wasn’t any good rationale for the rejection. I mean, the letters exist. Case is a normal function of the script. But they weren’t accepted. For the guy who didn’t think he needed them, well, so what? If they’re encoded, he doesn’t have to use them.
I have no idea what the reasons for this were, because I wasn't involved
in the discussion.
>> If the above is roughly correct, then it's important that we reached that conclusion after explicitly considering the potential of a split to create inconvenience and confusion for modern practitioners,
> People who use Deseret use it for historical purposes and for cultural reasons. Everybody in Utah reads English in standard Latin orthography.
I haven't been in Utah except for a one-time flight change in Salt Lake
City more than 10 years ago. So please don't assume that everybody on
this list knows the state of usage for all the scripts that get discussed.
>> not after just looking at the shapes only, coming up with separate historical derivations for each of them, and deciding to split because history is way more important than modern practice.
> I didn’t “come up” with separate historical derivations for the four characters in question.
I didn't mean "come up" in the sense of "make up out of thin air", but
in the sense of "discover". If it wasn't you but somebody else who
discovered these derivations, please let us know.
>> On 2017/03/28 22:56, Michael Everson wrote:
>>> On 28 Mar 2017, at 11:39, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>>> An æ ligature is a ligature of a and of e. It is not some sort of pretzel.
>> Yes. But it's important that we know that because we have been faced with many cases where "æ" and "ae" were used interchangeably.
> Irrelevant. This is just spelling. It’s no different than colour/color or maximize/maximise or aluminium/aluminum.
Whether we use "æ" or "ae" is indeed a matter of spelling. But I meant
something else, namely that we know that what may look like a "pretzel"
to the uninitiated is a ligature of 'a' and 'e' exactly because we use
it as a spelling variant for "ae".
>>> What Deseret has is this:
>>> 10426 DESERET CAPITAL LETTER LONG OO WITH STROKE
>>> * officially named “ew” in the code chart
>>> * used for ew in earlier texts
>>> 10427 DESERET CAPITAL LETTER SHORT AH WITH STROKE
>>> * officially named “oi” in the code chart
>>> * used for oi in earlier texts
>>> 1xxxx DESERET CAPITAL LETTER LONG AH WITH STROKE
>>> * used for oi in later texts
>>> 1xxxx DESERET CAPITAL LETTER SHORT OO WITH STROKE
>>> * used for ew in later texts
>> Currently, it has this:
>> 10426 DESERET CAPITAL LETTER OI
>> 10427 DESERET CAPITAL LETTER EW
> You are being deliberately obtuse. Note that I stated clearly “officially named ‘ew/oi’ in the code chart”.
Well, if you think I'm deliberately obtuse, then I'd have to say that I
think you're (deliberately?) obscure. You repeat hypothetical,
non-existing names such as "DESERET CAPITAL LETTER LONG OO WITH STROKE"
over and over, using capitals to make them look like the actual names,
and bury the actual names (such as "DESERET CAPITAL LETTER OI") by
shortening and lowercasing them.
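For anyone who wants to check which names are actually assigned, here is a
minimal sketch, assuming Python 3 and the Unicode Character Database that
ships with it. The helper names `encoded_name` and `is_encoded_name` are
made up for this illustration. The result for the hypothetical names depends
on the Unicode version bundled with the interpreter, so no output is claimed
for them here.

```python
import unicodedata

def encoded_name(cp: int) -> str:
    """Return the character name assigned to a code point in the bundled UCD."""
    return unicodedata.name(chr(cp))

def is_encoded_name(name: str) -> bool:
    """Return True if the bundled UCD assigns this exact character name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False

if __name__ == "__main__":
    # The two code points under discussion, with their lowercase partners.
    for cp in (0x10426, 0x10427):
        lower = ord(chr(cp).lower())
        print(f"U+{cp:04X} {encoded_name(cp)} (lowercase U+{lower:04X})")

    # Names as they are written in this thread; the last one is hypothetical.
    for name in ("DESERET CAPITAL LETTER OI",
                 "DESERET CAPITAL LETTER LONG OO WITH STROKE"):
        print(f"{name}: encoded = {is_encoded_name(name)}")
```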
> Don’t go trying to tell me that EW and SHORT OO WITH STROKE are glyph variants of the same character.
> Don’t go trying to tell me that LONG AH WITH STROKE and OI are glyph variants of the same character.
> They’re not. The origin of all those letterforms is obvious,
You don't have to repeat that. I clearly said, maybe even more than
once, that I can agree with your hypothesis on the origin of these
letterforms.
> and we do not encode sounds, we encode the elements of writing systems.
Yes. And we know that individual elements of a writing system sometimes
can have multiple origins.
>> But we have seen cases where such a merge happens. ß is one of them.
> That’s even arguable because ſʒ only really occurs in the whole-font Fraktur style. It’s pretty rare to see it in Antiqua. Of course it must be attested there, but it’s by no means common.
Do you mean that the merge didn't happen style-wise? That we therefore
don't need separate code points because historians don't need to
distinguish between the two; they can just rely on the font used?
But even if that weren't the case, we would still want to treat it as
one and the same character, with a single code point. It would still be
hopelessly impractical for Germans to use two different characters, when
they only can decide which character to type once they have seen the
actual character in the font they type, and have to potentially change
the character if they change the font.
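To make the ß case concrete, here is a small sketch, again assuming Python 3
and its bundled Unicode data. It only illustrates that the text carries a
single code point for ß, whichever of the ſs or ſz shapes a font happens to
draw; the helper name `describe` is invented for this example.

```python
import unicodedata

SHARP_S = "\u00df"  # ß, one code point regardless of the glyph's ligature origin

def describe(ch: str) -> str:
    """Format a single character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

if __name__ == "__main__":
    word = "Stra" + SHARP_S + "e"
    # The encoded text does not change when the font changes.
    print(describe(SHARP_S))
    print(len(word), [describe(c) for c in word])
    # Case folding maps ß to "ss", independent of any font.
    print(word.casefold())
```

The point of the sketch is only that the choice between the two shapes lives
in the font, not in the stored text.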
And while we currently have no evidence that Deseret had developed a
typographic tradition where some type styles would use one set of
ligatures, and other styles would use another set, it wouldn't be
possible to reject this possibility without actually trying to find
evidence one way or another.
>> There are quite a few in Han (not surprising because there are tons of ideographs there to begin with).
>> But that experience doesn't mean that we have to rush to a conclusion without examining as much of the evidence as we can get hold of.
> I haven’t rushed to a conclusion. I’ve made a thorough analysis.
You made a thorough analysis of the graphic shapes.
You may have made some analysis with respect to usage, but you didn't
present it initially, and it took quite some time to get to it in this
discussion.
>>> You’re smarter than that. So are Asmus and Mark and Erkki and any of the other sceptics who have chimed in here.
>> Skepticism when presented with options without background facts is a virtue in my opinion.
> Your argument seemed to be based solely on the use of the letters for the sounds, ignoring the historical derivation and the facts of the spelling reform in Deseret.
The spelling reform is fine. What is important is what happened after
the spelling reform. Were the 1855 variants replaced by the 1859
variants? Was it two different traditions, separated in some way or
other? Or was it in effect more like a mixture of both?
(or maybe we don't know, or it's a little of everything?)
Examining these questions and bringing the available data to light and
clarifying the limits of our data and our understanding is very
important. Only in this way can we make decisions that will hopefully be
valid for the rest of the existence of Unicode (which might be quite a
few decades at least), or decisions that at a minimum might be evaluated
as "well, they didn't know better then", rather than as "they definitely
should have known better, even then".
>>> On 28 Mar 2017, at 11:59, Mark Davis ☕️ <mark at macchiato.com> wrote:
>>>> I agree with Martin.
>>>> Simply because someone used a particular shape at some time to mean a letter doesn't mean that Unicode should encode a letter for that shape.
>>> Coming to a forum like this out of a concern for the corpus of Deseret literature is not some sort of attempt to encode things for encoding’s sake.
>> And coming to a discussion like this out of a concern for modern practitioners of the script (even if it seems, after a lot of discussion, that there aren't that many of these, and the issue at hand may indeed not concern them that much) is not some sort of attempt to unify things for unification's sake.
> I think you made a lot of assumptions about “modern practitioners” which you didn’t disclose.
Maybe. But likewise, you made a lot of assumptions about (the absence
of) modern practitioners which you didn't disclose.
> A proposal will be forthcoming. I want to thank several people who have written to me privately supporting my position with regard to this topic on this list. I can only say that supporting me in public is more useful than supporting me in private.
I'm looking forward to your proposal. I hope it clearly indicates why
(you think) there's no danger of inconveniencing modern practitioners.
I'd also like to thank the people who supported me, all of them on the list.