From ismeta.wikt at gmail.com  Tue Nov  1 08:51:53 2016
From: ismeta.wikt at gmail.com (IS META)
Date: Tue, 1 Nov 2016 13:51:53 +0000
Subject: U+1FBD GREEK KORONIS: ᾽
Message-ID: <CAFV=Ffg2niceFzt8zbCGO+mp_KD5hBCcCw1GH-y4O2mh_Y541g@mail.gmail.com>

Dear subscribers to the Unicode public general mail list,
Can anyone tell me what the intended use(s) of the character ᾽ (U+1FBD
GREEK KORONIS) is/are, please? Or, failing that, where I can find out?

Many thanks in advance for any help you can provide. Apologies if this is
not the right forum in which to ask this question.

Yours faithfully,
I.S.M.E.T.A.

From doug at ewellic.org  Tue Nov  1 09:46:15 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 01 Nov 2016 07:46:15 -0700
Subject: U+1FBD GREEK KORONIS: ᾽
Message-ID: <20161101074615.665a7a7059d7ee80bb4d670165c8327d.08018b0db2.wbe@email03.godaddy.com>

IS META wrote:

> Can anyone tell me what the intended use(s) of the character ᾽ (U+1FBD
> GREEK KORONIS) is/are, please? Or, failing that, where I can find out?

Section 7.2, "Greek" in TUS 9.0 says:

> Greek Extended: U+1F00–U+1FFF
> [...]
> Spacing Diacritics. Sixteen additional spacing diacritical marks are
> provided in this character block for use in the representation of
> polytonic Greek texts. Each has an alternative representation for use
> with systems that support nonspacing marks. The nonspacing
> alternatives appear in Table 7-3. The spacing forms are meant for
> keyboards and pedagogical use and are not to be used in the
> representation of titlecase words. The compatibility decompositions of
> these spacing forms consist of the sequence U+0020 SPACE followed by
> the nonspacing form equivalents shown in Table 7-3.

Source: http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf
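
The compatibility decomposition described in that passage can be checked
directly. The following is a minimal sketch, assuming Python's standard
unicodedata module (which reflects whatever Unicode version the interpreter
ships with); the variable names are only illustrative.

```python
import unicodedata

spacing_koronis = "\u1fbd"  # U+1FBD GREEK KORONIS (the spacing form)

# Raw decomposition field from the Unicode Character Database.
decomposition = unicodedata.decomposition(spacing_koronis)
print(decomposition)

# Compatibility normalization (NFKD) applies that mapping,
# while canonical normalization (NFC) leaves the character untouched.
nfkd = unicodedata.normalize("NFKD", spacing_koronis)
nfc = unicodedata.normalize("NFC", spacing_koronis)
print([f"U+{ord(ch):04X}" for ch in nfkd])
print(nfc == spacing_koronis)
```

The decomposition is SPACE followed by U+0313 COMBINING COMMA ABOVE, which
matches the "U+0020 SPACE followed by the nonspacing form" wording quoted
above.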
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From jtauber at jtauber.com  Tue Nov  1 09:51:07 2016
From: jtauber at jtauber.com (James Tauber)
Date: Tue, 1 Nov 2016 10:51:07 -0400
Subject: Re: U+1FBD GREEK KORONIS: ᾽
In-Reply-To: <CAFV=Ffg2niceFzt8zbCGO+mp_KD5hBCcCw1GH-y4O2mh_Y541g@mail.gmail.com>
References: <CAFV=Ffg2niceFzt8zbCGO+mp_KD5hBCcCw1GH-y4O2mh_Y541g@mail.gmail.com>
Message-ID: <CAJdVgGKtKf7qFNpXrr6h2VvsWgYY_q48d6cgNTw7E2xuWfc=Qw@mail.gmail.com>

The koronis (often latinized as coronis) is a diacritic used in Ancient
Greek texts (although later, not at the time they were written).

It's written over a vowel to indicate contraction by crasis.

See https://en.wikipedia.org/wiki/Crasis#Greek

James


On Tue, Nov 1, 2016 at 9:51 AM, IS META <ismeta.wikt at gmail.com> wrote:

> Dear subscribers to the Unicode public general mail list,
> Can anyone tell me what the intended use(s) of the character ᾽ (U+1FBD
> GREEK KORONIS) is/are, please? Or, failing that, where I can find out?
>
> Many thanks in advance for any help you can provide. Apologies if this is
> not the right forum in which to ask this question.
>
> Yours faithfully,
> I.S.M.E.T.A.
>


-- 
James Tauber
http://jtauber.com/
@jtauber on Twitter

From doug at ewellic.org  Tue Nov  1 11:05:49 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 01 Nov 2016 09:05:49 -0700
Subject: U+1FBD GREEK KORONIS: ᾽
Message-ID: <20161101090549.665a7a7059d7ee80bb4d670165c8327d.50b0e7457d.wbe@email03.godaddy.com>

James Tauber wrote:

> The koronis (often latinized as coronis) is a diacritic used in
> Ancient Greek texts (although later, not at the time they were
> written).
>
> It's written over a vowel to indicate contraction by crasis.

Sorry, I thought the OP was asking about the use of the specific
character at U+1FBD, not about the koronis generally (which should
normally be coded as U+0343). You are correct about the function of the
koronis in Greek. Apologies if my answer was misleading.
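
For anyone processing such text, it may help to see how the nonspacing
koronis behaves under normalization. This is a small sketch, assuming
Python's standard unicodedata module: U+0343 COMBINING GREEK KORONIS has a
canonical (singleton) decomposition to U+0313 COMBINING COMMA ABOVE, so
normalized text will not retain U+0343 itself. The alpha used as a base
letter is just an example.

```python
import unicodedata

combining_koronis = "\u0343"  # U+0343 COMBINING GREEK KORONIS
alpha = "\u03b1"              # U+03B1 GREEK SMALL LETTER ALPHA

# Canonical decomposition replaces U+0343 with U+0313.
nfd_mark = unicodedata.normalize("NFD", combining_koronis)
print(f"U+{ord(nfd_mark):04X}", unicodedata.name(nfd_mark))

# Alpha plus koronis composes to the precomposed "alpha with psili".
composed = unicodedata.normalize("NFC", alpha + combining_koronis)
print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```

So a vowel typed with U+0343 ends up, after NFC, indistinguishable from the
same vowel with a smooth breathing (psili).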
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From jtauber at jtauber.com  Tue Nov  1 11:13:20 2016
From: jtauber at jtauber.com (James Tauber)
Date: Tue, 1 Nov 2016 12:13:20 -0400
Subject: Re: U+1FBD GREEK KORONIS: ᾽
In-Reply-To: <20161101090549.665a7a7059d7ee80bb4d670165c8327d.50b0e7457d.wbe@email03.godaddy.com>
References: <20161101090549.665a7a7059d7ee80bb4d670165c8327d.50b0e7457d.wbe@email03.godaddy.com>
Message-ID: <CAJdVgGKSA0K04c8BkLLver+ONiMV3hZb82qeGvudwe2hj8DNiQ@mail.gmail.com>

On Tue, Nov 1, 2016 at 12:05 PM, Doug Ewell <doug at ewellic.org> wrote:

> James Tauber wrote:
>
> > The koronis (often latinized as coronis) is a diacritic used in
> > Ancient Greek texts (although later, not at the time they were
> > written).
> >
> > It's written over a vowel to indicate contraction by crasis.
>
> Sorry, I thought the OP was asking about the use of the specific
> character at U+1FBD, not about the koronis generally (which should
> normally be coded as U+0343). You are correct about the function of the
> koronis in Greek. Apologies if my answer was misleading.
>

I wasn't sure so I thought I'd complement your answer to cover all bases :-)

From mats.gbproject at gmail.com  Wed Nov  2 19:05:13 2016
From: mats.gbproject at gmail.com (Mats Blakstad)
Date: Thu, 3 Nov 2016 01:05:13 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <1368697159.3867.1456389325215.JavaMail.www@wwinf1p04>
References: <20160223102509.665a7a7059d7ee80bb4d670165c8327d.2a091675e5.wbe@email03.secureserver.net>
 <CAGa7JC06wx=-fABHjpREYhuCUdey25tmz86QERYPLG9ZPX4qpQ@mail.gmail.com>
 <1368697159.3867.1456389325215.JavaMail.www@wwinf1p04>
Message-ID: <CAP=1PAXvk3OMCsnAJFhF2V_HJ39B3wrnCtsFX2ErLPTPGHAqgQ@mail.gmail.com>

After managing to add the keyboard to XKB, I started on a new venture:
trying to make a Windows version of the keyboard using this:
https://msdn.microsoft.com/en-us/globalization/keyboardlayouts.aspx

The layout is nearly impossible to replicate, as it seems you can only add
a dead key combination if a precomposed character exists for it.

Also, double tone marks like these are used in Togo:

"Ɛ̃́"   LATIN CAPITAL LETTER EPSILON WITH TILDE AND ACUTE
"Ɛ̃̀"   LATIN CAPITAL LETTER EPSILON WITH TILDE AND GRAVE

And Windows does not even allow dead keys with double symbols...

So I wonder whether a precomposed double tone mark could be a solution?
That is, one Unicode code point for tilde+acute and another for tilde+grave?
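
As background for this question, here is a small sketch of how such a
letter is represented today, assuming Python's standard unicodedata module
and assuming that the "epsilon" meant here is U+0190 LATIN CAPITAL LETTER
OPEN E. The sequence stays as three code points under NFC because no
precomposed form exists, whereas a plain "E" with the same marks is
partially composed.

```python
import unicodedata

OPEN_E = "\u0190"  # LATIN CAPITAL LETTER OPEN E (assumed to be the letter meant)
TILDE = "\u0303"   # COMBINING TILDE
ACUTE = "\u0301"   # COMBINING ACUTE ACCENT
GRAVE = "\u0300"   # COMBINING GRAVE ACCENT

tilde_acute = OPEN_E + TILDE + ACUTE
tilde_grave = OPEN_E + TILDE + GRAVE

# No precomposed open E exists, so NFC keeps all three code points.
nfc_open_e = unicodedata.normalize("NFC", tilde_acute)
print([f"U+{ord(ch):04X}" for ch in nfc_open_e])

# A plain "E" composes with the tilde, and the acute stays separate.
nfc_plain_e = unicodedata.normalize("NFC", "E" + TILDE + ACUTE)
print([f"U+{ord(ch):04X}" for ch in nfc_plain_e])

# Both marks share combining class 230, so their typed order is preserved.
print(unicodedata.combining(TILDE), unicodedata.combining(ACUTE))
```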

The only way we have managed to make the keyboard work so far is to type
the tone marks after the letters instead of before them.
In fact I think that is easier than on the French keyboard, but it also
breaks compatibility with the French keyboard in the order in which you
press keys to add tone marks.
I also think it would be a benefit to have the keyboard on Windows and
Ubuntu work mostly the same.

Are there any other good ideas for how to solve this?

On 25 February 2016 at 09:35, Marcel Schneider <charupdate at orange.fr> wrote:

> On Tue, 23 Feb 2016 12:10:51 +0100, Philippe Verdy  wrote:
>
> > 2016-02-23 11:21 GMT+01:00 Marcel Schneider :
> >
> > > I feel that people coming from, or studying languages of, countries and
> > > communities on other continents should become able to type their language
> > > in that script on any computer in France as well as in any other Latin
> > > script using countries,
> […]
> > > The only difference
> > > between keyboard layouts of Latin script using countries should be varying
> > > accessibility depending on frequencies of use.
> > >
> >
> > There will remain a resistance for the base layout of letters (basically
> > QWERTY vs. AZERTY vs QWERTZ) and basic punctuation.
> > For all other characters (including shifted or non-shifted digits, because
> > this is only an issue on mechanical keyboards, not touch-on-screen
> > keyboard, and mechanical keyboards almost always have a numeric keypad
> > anyway), people can adapt easily, provided that the less frequent but
> > essential punctuation (parentheses, apostrophe, hyphen) can be found on the
> > key labels, as well as the location of dead keys for all the essential
> > diacritics.
> >
> > Indeed, if there's a new standard for French, there will be new physical
> > keyboards placing the labels correctly for the essential punctuation, plus
> > the essential letters combined with diacritics with a single keystroke:
> > but the latter letters are language-dependent and not script-dependent, so
> > people writing in other languages for the same script may not find them
> > useful, but should be able to locate the dead keys to get the full coverage
> > they need. If a standard is adopted, the set of essential letters combined
> > with diacritics should be located on a small part of the keyboard that is
> > the same across all languages of the script, but tuned specifically for a
> > language (or a few languages of one country).
> > There will remain keyboard layouts per country differing only on those
> > locations in this small part, probably reduced to only 5 language-dependent
> > keys (only designed for ease of access, e.g. "éèçàù" in French are very
> > frequent and will be located in that part, but Italians would like to have
> > all vowels with acute, Spanish will want to have the "ñ" in this part).
>
> On Tue, 23 Feb 2016 10:25:09 -0700, Doug Ewell  replied:
>
> > Philippe Verdy wrote:
> >
> > > There will remain a resistance for the base layout of letters
> > > (basically QWERTY vs. AZERTY vs QWERTZ) and basic punctuation
> >
> > Philippe is absolutely right here. Most of us on this list are
> > character-set and i18n wonks, and some of us have customized our own
> > keyboard layouts, but we should not delude ourselves into thinking we
> > represent ordinary users. Many people are emotionally tied to a
> > particular keyboard layout and become very confused when faced with
> > something different. Trying to persuade them to adopt a "universal"
> > keyboard, so they can type characters in a language they may not know,
> > is an exercise in social frustration.
>
> On Wed, 24 Feb 2016 01:38:59 +0100, Philippe Verdy  replied:
>
> > And this has long been demonstrated by the experience of alternate
> > "ergonomic" layouts, used by very few people.
> >
> […]
> >
> > We'll continue to live for long with the 3 basic layouts for Latin (QWERTY,
> > AZERTY, QWERTZ). And nothing will really change without a strong national
> > standard that will convince manufacturers to propose it at normal prices,
> > and force software vendors to include it in the builtin layouts for their
> > OSes.
>
> When I wrote: "The only difference […] should be […]", I swapped over into
> an ideal world… let alone that the historic swap from QWERTY to AZERTY was
> triggered by an "accessibility" issue based "on frequencies of use". My
> purpose being not to *enforce* ergonomics as about the alphabetical layout,
> I fully agree with Mats Blakstad, whose "method of extending the main
> layout is likely to be the only useful one" as I wrote in the same
> e-mail, and with Doug Ewell and Philippe Verdy, whose valuable contributions
> came on to sustain.
>
> All parts of the Latin script as provided by Unicode, that are not used to
> write local and national languages e.g. of Togo, or of France, may be
> hidden as on keytops, but accessible on software side, i.e. in the layout
> driver or in the configuration files. One other challenge in Togo would be
> how to give easy access to the seven supplemental letters ?, ?, ?, ?, ?, ?
> and ?, while the five French precomposed letters are to be maintained, let
> alone ? and ? (the latter being rather seldom in French, however) that are
> part of the new governmental requirements in France, among other characters
> like the angle quotation marks, called guillemets-chevrons[1].
>
> Generally speaking, I can't help believing that providing the ability to type
> any Latin script using language on any Latin keyboard would be a good idea.
> Again, that is feasible without overloading the keyboard with dead keys,
> just providing the most frequently used ones, six in Togo as I can see.
>
> Marcel
>
> [1] Vers une norme française pour les claviers informatiques - Langue
> française et langues de France - Ministère de la Culture et de la
> Communication. (2016, January 15). Retrieved January 19, 2016, from
> http://www.culturecommunication.gouv.fr/Politiques-ministerielles/Langue-francaise-et-langues-de-France/Politiques-de-la-langue/Langues-et-numerique/Les-technologies-de-la-langue-et-la-normalisation/Vers-une-norme-francaise-pour-les-claviers-informatiques
>

From verdy_p at wanadoo.fr  Wed Nov  2 19:27:32 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 3 Nov 2016 01:27:32 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <CAP=1PAXvk3OMCsnAJFhF2V_HJ39B3wrnCtsFX2ErLPTPGHAqgQ@mail.gmail.com>
References: <20160223102509.665a7a7059d7ee80bb4d670165c8327d.2a091675e5.wbe@email03.secureserver.net>
 <CAGa7JC06wx=-fABHjpREYhuCUdey25tmz86QERYPLG9ZPX4qpQ@mail.gmail.com>
 <1368697159.3867.1456389325215.JavaMail.www@wwinf1p04>
 <CAP=1PAXvk3OMCsnAJFhF2V_HJ39B3wrnCtsFX2ErLPTPGHAqgQ@mail.gmail.com>
Message-ID: <CAGa7JC3u4Awx6VyTH6Lmpk_iFy_rWUfi60SNr4Sso79ZLWvhkQ@mail.gmail.com>

My opinion is that MSKLC should be updated to support chained dead keys
(internally they are supported by the OS), using more keyboard maps.
This way we could enter diacritics in any order and, when the base letter
is typed, the result would still be the whole combination of characters in
NFC form.
I don't think a limitation of MSKLC is a good reason for encoding the double
diacritic itself: the Windows keymap compiler supports chained dead keys,
and it is only the visual editor that does not allow them.
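
The behaviour described above (dead keys chained in any order, with NFC
output once the base letter arrives) can be modelled in a few lines. This
is only an illustrative simulation in Python, not the Windows driver format
and not subject to its limits; the key assignments in DEAD_KEYS, the
STACK_ORDER list, and the function name are all hypothetical.

```python
import unicodedata

# Hypothetical dead keys and the combining marks they stand for.
DEAD_KEYS = {"~": "\u0303", "'": "\u0301", "`": "\u0300"}

# Assumed stacking order: the tilde sits nearest the base letter.
STACK_ORDER = ["\u0303", "\u0301", "\u0300"]

def type_keys(keys):
    """Simulate chained dead keys typed before the base letter."""
    out = []
    pending = []  # combining marks waiting for a base character
    for key in keys:
        if key in DEAD_KEYS:
            pending.append(DEAD_KEYS[key])  # chain another dead key
            continue
        # Any other key is treated as the base: attach the pending marks
        # in a fixed order, so the typing order of dead keys is irrelevant.
        marks = sorted(pending, key=STACK_ORDER.index)
        out.append(unicodedata.normalize("NFC", key + "".join(marks)))
        pending = []
    # Dead keys still pending at the end are dropped in this sketch.
    return "".join(out)

print(type_keys("~'\u025b") == type_keys("'~\u025b"))
```

Because both marks have the same combining class, normalization alone would
not reorder them; the explicit STACK_ORDER is what makes the two typing
orders give the same text.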



From moyogo at gmail.com  Thu Nov  3 01:36:26 2016
From: moyogo at gmail.com (Denis Jacquerye)
Date: Thu, 03 Nov 2016 06:36:26 +0000
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <CAGa7JC3u4Awx6VyTH6Lmpk_iFy_rWUfi60SNr4Sso79ZLWvhkQ@mail.gmail.com>
References: <20160223102509.665a7a7059d7ee80bb4d670165c8327d.2a091675e5.wbe@email03.secureserver.net>
 <CAGa7JC06wx=-fABHjpREYhuCUdey25tmz86QERYPLG9ZPX4qpQ@mail.gmail.com>
 <1368697159.3867.1456389325215.JavaMail.www@wwinf1p04>
 <CAP=1PAXvk3OMCsnAJFhF2V_HJ39B3wrnCtsFX2ErLPTPGHAqgQ@mail.gmail.com>
 <CAGa7JC3u4Awx6VyTH6Lmpk_iFy_rWUfi60SNr4Sso79ZLWvhkQ@mail.gmail.com>
Message-ID: <CAJKta0zTvW9zUiOgU8BJuei6SmHZ0cVPjX-x3Mx68xQO5AMp7w@mail.gmail.com>

2016-11-03 1:05 GMT+01:00 Mats Blakstad <mats.gbproject at gmail.com>:

> So I wonder if it could be a solution for a precomposed double tone?
> So one unicode for tilde+acute and another for tilde+grave?
>
> The only way we manage to make the keyboard now is to add all the tones
> behind the letters instead of before the letters.
> I think in fact it seems easier than on French keyboard, but it will also
> break the French keyboard when it comes to what order you click buttons to
> add tones.
> I also think it would be a benefit to have the keyboard on windows and
> Ubuntu work mostly the same.
>
> Not sure if there are any other good ideas for how to solve it?


Don't use dead keys in the keyboard layout; then you can have the same
keyboard on Windows and Ubuntu.
Even if MSKLC could handle outputting multiple characters, why are dead
keys a requirement?

Shouldn't you already have broken the French layout by reassigning keys to
Togo language letters ?, ?, ?, ?, ?, ?, ??
If not, it sounds like it will slow down typing in those languages.

From moyogo at gmail.com  Thu Nov  3 01:45:51 2016
From: moyogo at gmail.com (Denis Jacquerye)
Date: Thu, 03 Nov 2016 06:45:51 +0000
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <CAJKta0zTvW9zUiOgU8BJuei6SmHZ0cVPjX-x3Mx68xQO5AMp7w@mail.gmail.com>
References: <20160223102509.665a7a7059d7ee80bb4d670165c8327d.2a091675e5.wbe@email03.secureserver.net>
 <CAGa7JC06wx=-fABHjpREYhuCUdey25tmz86QERYPLG9ZPX4qpQ@mail.gmail.com>
 <1368697159.3867.1456389325215.JavaMail.www@wwinf1p04>
 <CAP=1PAXvk3OMCsnAJFhF2V_HJ39B3wrnCtsFX2ErLPTPGHAqgQ@mail.gmail.com>
 <CAGa7JC3u4Awx6VyTH6Lmpk_iFy_rWUfi60SNr4Sso79ZLWvhkQ@mail.gmail.com>
 <CAJKta0zTvW9zUiOgU8BJuei6SmHZ0cVPjX-x3Mx68xQO5AMp7w@mail.gmail.com>
Message-ID: <CAJKta0wSxJiJBX=9_-XQNPq69emV_O+x09QwP8xN1qAqrqcHMQ@mail.gmail.com>

You can also do dead keys in reverse: instead of having the diacritic
key as a dead key that one presses before a letter key, you have the letter
key as a dead key that you press before the diacritic key.
That way, your key order is the same whether or not a system handles
outputting multiple characters, and you can use precomposed characters when
available if that is a requirement.
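
A rough model of this "reverse" scheme, written in Python purely for
illustration (it is not a keyboard driver, and the key assignments in
DIACRITIC_KEYS and the function name are hypothetical): the letter is held
back until the next key shows whether a diacritic follows, and NFC picks
the precomposed character when one exists.

```python
import unicodedata

# Hypothetical diacritic keys and the combining marks they stand for.
DIACRITIC_KEYS = {"~": "\u0303", "'": "\u0301", "`": "\u0300"}

def type_keys_reverse(keys):
    """Simulate letter-first input: the letter waits for diacritics."""
    out = []
    pending = ""  # held-back letter plus any marks typed after it
    for key in keys:
        if key in DIACRITIC_KEYS and pending:
            pending += DIACRITIC_KEYS[key]  # attach mark to the held letter
            continue
        if pending:
            # A new base arrives: emit the previous one, composed if possible.
            out.append(unicodedata.normalize("NFC", pending))
            pending = ""
        if key in DIACRITIC_KEYS:
            out.append(key)  # diacritic key with no letter before it: literal
        else:
            pending = key
    if pending:
        out.append(unicodedata.normalize("NFC", pending))
    return "".join(out)

print(type_keys_reverse("e'"))
```

The key order is the same whether the result is a single precomposed
character or a base letter followed by combining marks.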


From charupdate at orange.fr  Thu Nov  3 02:56:39 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 3 Nov 2016 08:56:39 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <CAJKta0wSxJiJBX=9_-XQNPq69emV_O+x09QwP8xN1qAqrqcHMQ@mail.gmail.com>
References: <20160223102509.665a7a7059d7ee80bb4d670165c8327d.2a091675e5.wbe@email03.secureserver.net>
 <CAGa7JC06wx=-fABHjpREYhuCUdey25tmz86QERYPLG9ZPX4qpQ@mail.gmail.com>
 <1368697159.3867.1456389325215.JavaMail.www@wwinf1p04>
 <CAP=1PAXvk3OMCsnAJFhF2V_HJ39B3wrnCtsFX2ErLPTPGHAqgQ@mail.gmail.com>
 <CAGa7JC3u4Awx6VyTH6Lmpk_iFy_rWUfi60SNr4Sso79ZLWvhkQ@mail.gmail.com>
 <CAJKta0zTvW9zUiOgU8BJuei6SmHZ0cVPjX-x3Mx68xQO5AMp7w@mail.gmail.com>
 <CAJKta0wSxJiJBX=9_-XQNPq69emV_O+x09QwP8xN1qAqrqcHMQ@mail.gmail.com>
Message-ID: <935090593.1415.1478159799533.JavaMail.www@wwinf1j20>

On Thu, 3 Nov 2016 01:05:13 +0100, Mats Blakstad wrote:

> After managing to add the keyboard to XKB I started on a new venture of 
> trying to make a windows version of the keyboard using this: 
> https://msdn.microsoft.com/en-us/globalization/keyboardlayouts.aspx 
> 
> It is nearly impossible to replicate as it seems like you can only add dead 
> keys if they have a precomposed character.

This Windows limitation is indeed a significant drawback. You may wish to browse 
the archive back and forth starting from here:
http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0040.html

> 
> Also, in Togo it is used double tones like these: 
> 
> "Ɛ̃́" LATIN CAPITAL LETTER EPSILON WITH TILDE AND ACUTE
> "Ɛ̃̀" LATIN CAPITAL LETTER EPSILON WITH TILDE AND GRAVE
> 
> And windows do not even allow dead keys with double symbols... 

I second Philippe Verdy's reply. Serial dead keys are a Windows feature,
and implementing them is feasible around MSKLC although not in the GUI, as
its developer Michael Kaplan explained in a blog post that Doug Ewell shared in:
http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0214.html

I am currently localizing into English an interactive, self-explaining batch
script to facilitate generating the sources and layout drivers. It will soon be
available as a free download here:
http://charupdate.info#drivers

Even the EULA issue is settled, as you may read there.

Furthermore, I recommend programming the deadtrans list in C, because this has the
advantage of working on a flat list, while in the .klc source it is grouped.
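
To make the grouped-versus-flat distinction concrete, here is a small
sketch in Python. It does not reproduce the actual .klc or C source syntax;
the sample table, the tuple layout (base, dead key, result), and the
function name are assumptions made only for illustration.

```python
# Hypothetical sample data, grouped per dead key as in a grouped source:
# dead key character -> {base letter: composed character}.
grouped = {
    "\u00b4": {"a": "\u00e1", "e": "\u00e9"},  # acute accent dead key
    "\u0060": {"a": "\u00e0", "e": "\u00e8"},  # grave accent dead key
}

def flatten(grouped_table):
    """Turn the grouped table into one flat list of (base, dead, result)."""
    return [
        (base, dead, composed)
        for dead, pairs in grouped_table.items()
        for base, composed in pairs.items()
    ]

flat = flatten(grouped)
flat.sort()  # a flat list can be sorted or filtered as a whole

# A flat list also converts directly into a single lookup table.
lookup = {(base, dead): composed for base, dead, composed in flat}
print(lookup[("e", "\u00b4")])
```

With a flat list, every entry is one self-contained row, which makes it
easy to sort, deduplicate, or generate entries programmatically.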

> 
> So I wonder if it could be a solution for a precomposed double tone? 
> So one unicode for tilde+acute and another for tilde+grave? 
> 
> The only way we manage to make the keyboard now is to add all the tones 
> behind the letters instead of before the letters. 
> I think in fact it seems easier than on French keyboard, but it will also 
> break the French keyboard when it comes to what order you click buttons to 
> add tones. 
> I also think it would be a benefit to have the keyboard on windows and 
> Ubuntu work mostly the same. 
> 
> Not sure if there are any other good ideas for how to solve it? 

In addition to Denis Jacquerye's replies, I would again mention a piece of
software that I believe is best suited to get what you need on Windows:
Keyman.
Keyman is now a part of SIL and is being made available for free.
http://keyman.com/

Best regards,

Marcel

> 
> On 25 February 2016 at 09:35, Marcel Schneider  wrote: 
> 
[…]


From mats.gbproject at gmail.com  Thu Nov  3 10:01:56 2016
From: mats.gbproject at gmail.com (Mats Blakstad)
Date: Thu, 3 Nov 2016 16:01:56 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <935090593.1415.1478159799533.JavaMail.www@wwinf1j20>
References: <20160223102509.665a7a7059d7ee80bb4d670165c8327d.2a091675e5.wbe@email03.secureserver.net>
 <CAGa7JC06wx=-fABHjpREYhuCUdey25tmz86QERYPLG9ZPX4qpQ@mail.gmail.com>
 <1368697159.3867.1456389325215.JavaMail.www@wwinf1p04>
 <CAP=1PAXvk3OMCsnAJFhF2V_HJ39B3wrnCtsFX2ErLPTPGHAqgQ@mail.gmail.com>
 <CAGa7JC3u4Awx6VyTH6Lmpk_iFy_rWUfi60SNr4Sso79ZLWvhkQ@mail.gmail.com>
 <CAJKta0zTvW9zUiOgU8BJuei6SmHZ0cVPjX-x3Mx68xQO5AMp7w@mail.gmail.com>
 <CAJKta0wSxJiJBX=9_-XQNPq69emV_O+x09QwP8xN1qAqrqcHMQ@mail.gmail.com>
 <935090593.1415.1478159799533.JavaMail.www@wwinf1j20>
Message-ID: <CAP=1PAVUnZWu0HQ2cth4b+f3_V+iWk2sYx1Mqsv5rWJ-aEdVVA@mail.gmail.com>

> Don't use dead keys on the keyboard layout, then you can have the same
> keyboard on Windows and Ubuntu.

As we try to keep the French keyboard 1:1 and only extend it with extra
functionality, I guess we need to keep the dead keys already present
there?

> Shouldn't you already have broken the French layout by reassigning keys
> to Togo language letters ?, ?, ?, ?, ?, ?, ??
> If not, it sounds like it will slow down typing in those languages.

No, in XKB we managed to keep the French keyboard 1:1 and only extend it with
extra symbols.
We can't reassign keys, as the local languages in Togo also use all the letters
of the French alphabet.
Besides, people there mostly use the French keyboard, so it will be a lot easier
and faster for them if they just get extensions to a keyboard they already
know.

> You can also do dead keys in reverse where, instead of having the
> diacritic key as a dead key that one pressed before a letter key, you have
> the letter key as a dead key that you press before the diacritic key.

I managed to make such a solution, but then the keyboard is no longer
1:1 with the French keyboard, and users cannot use it exactly as they are
used to using the French keyboard.
What I am trying to achieve is to keep the French keyboard unchanged, extend it with
symbols for Togolese local languages, and keep the assignment of diacritics
consistent with that of the French keyboard.

> Windows keymap compiler supports chained dead keys, it's only the visual
> editor that does not allow it
> Serial dead keys are a Windows feature, and implementing them is feasible
> around MSKLC although not in the GUI

Is there any framework other than MSKLC that is simple and easy to use?
Or do we need to build from scratch?

> http://charupdate.info#drivers
> Further I recommend to program the deadtrans list in C because this has
> the advantage of working on a flat list, while in the .klc source it is
> grouped.
> http://keyman.com/

Thanks for these great leads! I guess Keyman would require the user to
install extra software? And the charupdate page is not available.
It now seems to me that the best approach is to do it in C; I will try to
investigate this further.

Thanks for all the helpful feedback!

On 3 November 2016 at 08:56, Marcel Schneider <charupdate at orange.fr> wrote:

> On Thu, 3 Nov 2016 01:05:13 +0100, Mats Blakstad wrote:
>
> > After managing to add the keyboard to XKB I started on a new venture of
> > trying to make a windows version of the keyboard using this:
> > https://msdn.microsoft.com/en-us/globalization/keyboardlayouts.aspx
> >
> > It is nearly impossible to replicate as it seems like you can only add
> dead
> > keys if they have a precomposed character.
>
> This Windows limitation is indeed a significant drawback. You may wish to
> browse
> the archive back and forth starting from here:
> http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0040.html
>
> >
> > Also, in Togo it is used double tones like these:
> >
> > "Ɛ̃́" LATIN CAPITAL LETTER EPSILON WITH TILDE AND ACUTE
> > "Ɛ̃̀" LATIN CAPITAL LETTER EPSILON WITH TILDE AND GRAVE
> >
> > And windows do not even allow dead keys with double symbols...
>
> I top on Philippe Verdy's reply. Serial dead keys are a Windows feature,
> and implementing them is feasible around MSKLC although not in the GUI, as
> its developer Michael Kaplan explained in a blog post that Doug Ewell
> shared in:
> http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0214.html
>
> Actually I'm localizing in English an interactive, self-explaining script
> in batch
> to facilitate generating the sources and layout drivers. It will soon be
> for free download here:
> http://charupdate.info#drivers
>
> Even the EULA issue is settled, as you may read there.
>
> Further I recommend to program the deadtrans list in C because this has the
> advantage of working on a flat list, while in the .klc source it is
> grouped.
>
> >
> > So I wonder if it could be a solution for a precomposed double tone?
> > So one unicode for tilde+acute and another for tilde+grave?
> >
> > The only way we manage to make the keyboard now is to add all the tones
> > behind the letters instead of before the letters.
> > I think in fact it seems easier than on French keyboard, but it will also
> > break the French keyboard when it comes to what order you click buttons
> to
> > add tones.
> > I also think it would be a benefit to have the keyboard on windows and
> > Ubuntu work mostly the same.
> >
> > Not sure if there are any other good ideas for how to solve it?
>
> Additionally to Denis Jacquerye's replies, I would mention again a software
> that I believe is best fit to get what you need on Windows:
> Keyman.
> Keyman is now a part of SIL and is being made available for free.
> http://keyman.com/
>
> Best regards,
>
> Marcel
>
> >
> > On 25 February 2016 at 09:35, Marcel Schneider  wrote:
> >
> [...]
>
>

From doug at ewellic.org  Thu Nov  3 15:44:12 2016
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 03 Nov 2016 13:44:12 -0700
Subject: Possible to add new precomposed characters for local language in
 Togo?
Message-ID: <20161103134412.665a7a7059d7ee80bb4d670165c8327d.0e1cab67b3.wbe@email03.godaddy.com>

I think we are talking about two different issues here. It's important
to keep these separate, to avoid talking past each other.

Mats Blakstad wrote:

> After managing to add the keyboard to XKB I started on a new venture
> of trying to make a windows version of the keyboard using this:
>
> [link to Microsoft Keyboard Layout Creator]
>
> It is nearly impossible to replicate as it seems like you can only add
> dead keys if they have a precomposed character.

Mats is talking about the fact that a dead key combination (of any
length) under Windows can generate only a single UTF-16 code unit. This
is a Windows architectural limitation, and cannot be fixed by updating
MSKLC. It can only be circumvented by using Keyman or another
third-party solution that runs at a layer above the Windows
architecture.

Philippe Verdy wrote:

> My opinion is that MSKLC should be updated to support chained dead
> keys (internally they are supported by the OS), using more keyboard
> maps.

The fact that MSKLC does not support chained dead keys is perhaps
related to the problem Mats is experiencing, but it is a different
issue.

Even if MSKLC were updated to allow chaining of dead keys, Mats still
could not use this capability to type a TILDE dead key, then an ACUTE
dead key, and then an EPSILON key and get "Ɛ̃́" LATIN CAPITAL LETTER
EPSILON WITH TILDE AND ACUTE. The reason, as Mats said, is that the NFC
form of this double-accented letter is still 3 code units in length, 2
more than the Windows architecture supports.
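
The code-unit count above can be checked with a short Python sketch. It assumes that "EPSILON" here means U+0190 LATIN CAPITAL LETTER OPEN E and that the tone marks are U+0303 COMBINING TILDE and U+0301 COMBINING ACUTE ACCENT; the helper name `nfc_utf16_units` is only illustrative. For contrast, o with tilde and acute does have a precomposed form (U+1E4D), so it fits in a single code unit.

```python
import unicodedata


def nfc_utf16_units(text):
    """Return the number of UTF-16 code units in the NFC form of text."""
    nfc = unicodedata.normalize("NFC", text)
    # Each UTF-16 code unit is two bytes in the "utf-16-le" encoding (no BOM).
    return len(nfc.encode("utf-16-le")) // 2


# Assumed spelling of the letter: open E + combining tilde + combining acute.
epsilon_tilde_acute = "\u0190\u0303\u0301"
# Contrast case: o + combining tilde + combining acute composes to U+1E4D.
o_tilde_acute = "o\u0303\u0301"

print(nfc_utf16_units(epsilon_tilde_acute))  # 3: nothing composes
print(nfc_utf16_units(o_tilde_acute))        # 1: a precomposed letter exists
```

The sketch says nothing about what the Windows keyboard engine can emit; it only shows why normalization alone cannot shrink the open-E sequence to one code unit.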

Furthermore, even though many of us would like for MSKLC to be updated,
the reality is that its developer (Michael Kaplan) is no longer with us,
and Microsoft had already terminated MSKLC development (a source of
frequent frustration to Michael). We can all wish that Microsoft would
reverse itself and start devoting resources to this project of
Michael's, but it's probably not going to happen.

A more realistic course of action might be for someone outside of
Microsoft, maybe someone on this list, to create their own GUI wrapper
around the Microsoft engine, a "new MSKLC" so to speak. That new project
could remove the MSKLC limitation, but not the Windows one.

Mats wrote:

> So I wonder if it could be a solution for a precomposed double tone?
> So one unicode for tilde+acute and another for tilde+grave?

If Unicode policy is what it used to be, then Philippe is correct:
vendor limits are not an adequate justification for encoding double
diacritics. Doing so would introduce new ambiguities, just like encoding
new precomposed versions of characters that already have decomposed
representations.

Denis Jacquerye suggested using the letter as the dead key instead of
the diacritic. Perhaps a more straightforward approach would be to give
the diacritical marks their own normal keys, so the user could type
EPSILON, (combining) TILDE, (combining) ACUTE.

Marcel Schneider's suggestion of using Keyman instead might be the best,
if it is mandatory for the Windows version of this layout to be
identical to the Ubuntu version, for reasons I don't understand (many
keyboard layouts are already not constant across platforms).
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From charupdate at orange.fr  Thu Nov  3 15:44:56 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 3 Nov 2016 21:44:56 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <CAP=1PAVUnZWu0HQ2cth4b+f3_V+iWk2sYx1Mqsv5rWJ-aEdVVA@mail.gmail.com>
References: <20160223102509.665a7a7059d7ee80bb4d670165c8327d.2a091675e5.wbe@email03.secureserver.net>
 <CAGa7JC06wx=-fABHjpREYhuCUdey25tmz86QERYPLG9ZPX4qpQ@mail.gmail.com>
 <1368697159.3867.1456389325215.JavaMail.www@wwinf1p04>
 <CAP=1PAXvk3OMCsnAJFhF2V_HJ39B3wrnCtsFX2ErLPTPGHAqgQ@mail.gmail.com>
 <CAGa7JC3u4Awx6VyTH6Lmpk_iFy_rWUfi60SNr4Sso79ZLWvhkQ@mail.gmail.com>
 <CAJKta0zTvW9zUiOgU8BJuei6SmHZ0cVPjX-x3Mx68xQO5AMp7w@mail.gmail.com>
 <CAJKta0wSxJiJBX=9_-XQNPq69emV_O+x09QwP8xN1qAqrqcHMQ@mail.gmail.com>
 <935090593.1415.1478159799533.JavaMail.www@wwinf1j20>
 <CAP=1PAVUnZWu0HQ2cth4b+f3_V+iWk2sYx1Mqsv5rWJ-aEdVVA@mail.gmail.com>
Message-ID: <2090428085.16301.1478205896676.JavaMail.www@wwinf1e27>

On 3 Nov 2016 16:01:56 +0100, Mats Blakstad wrote:

> > Don't use dead keys on the keyboard layout, then you can have the same
> > keyboard on Windows and Ubuntu.
> 
> As we try to keep the French keyboard 1:1 

There are many French keyboards. A standard is now being written, which will
be subject to public enquiry from December through February 2017. Base
letters remain unchanged.

> and only extend it with extra functionalities,
> I guess we need to keep the dead keys already present there?

E.g. with combining diacritics by [dead key] followed by [space bar], as on:
http://uscustom.sourceforge.net/

[...]
> 
> > Windows keymap compiler supports chained dead keys, it's only the visual
> > editor that does not allow it
> > Serial dead keys are a Windows feature,and implementing them is feasible
> > around MSKLC although not in the GUI
> 
> Are there any other framework than MSKLC that is simple and easy to use?

I know people who use KbdEdit and like it, but it still has extra limitations.
http://www.kbdedit.com

> Or do we need to build from scratch?

No. KbdUTool generates the C sources from any KLC file, and MSKLC can
generate a KLC file from any keyboard layout that ships with Windows, except
the Canadian Standard Keyboard, because this uses a modifier (0x08) that is
unsupported in MSKLC.

> 
> > http://charupdate.info#drivers
> > Further I recommend to program the deadtrans list in C because this has
> > the advantage of working on a flat list, while in the .klc source it is
> > grouped.
> > http://keyman.com/
> 
> Thanks for these great leads! I guess keyman will make it dependent for the
> user to install extra softwares?

Yes, but IMHO installing custom keyboard layout drivers on Windows is not
essentially different from installing extra software. However, if it is to be
shipped with Windows and distributed through Windows Update, Windows
limitations apply, i.e. only one code unit per dead key. In this case, even
high surrogates must be entered separately (a not very intuitive workaround).
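
To illustrate the surrogate remark, here is a small Python sketch showing how a character outside the BMP splits into two UTF-16 code units, so that a one-code-unit limit cannot cover it in a single step. The helper name `utf16_units` and the sample character U+1F600 are chosen only for illustration.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A BMP character is one code unit; a non-BMP character is a surrogate pair.
print([f"{u:04X}" for u in utf16_units("\u0190")])      # ['0190']
print([f"{u:04X}" for u in utf16_units("\U0001F600")])  # ['D83D', 'DE00']
```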

> And the charupdate is not available.

Now it is, though a huge part is still in French. My apologies.

> To me now it seems like the best approach to do it in C, I will try
> investigate more on this.
> 
> Thanks for all the helpful feedbacks!

You are welcome.

> 
> On 3 November 2016 at 08:56, Marcel Schneider  wrote: 
> 
[...]


From verdy_p at wanadoo.fr  Thu Nov  3 17:56:13 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 3 Nov 2016 23:56:13 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <20161103134412.665a7a7059d7ee80bb4d670165c8327d.0e1cab67b3.wbe@email03.godaddy.com>
References: <20161103134412.665a7a7059d7ee80bb4d670165c8327d.0e1cab67b3.wbe@email03.godaddy.com>
Message-ID: <CAGa7JC2EFzXmVWz68KS6vZh8h0Z2GOB4E25_nqc+G32x19azhw@mail.gmail.com>

2016-11-03 21:44 GMT+01:00 Doug Ewell <doug at ewellic.org>:

> I think we are talking about two different issues here. It's important
> to keep these separate, to avoid talking past each other.
>
> Mats Blakstad wrote:
>
> > After managing to add the keyboard to XKB I started on a new venture
> > of trying to make a windows version of the keyboard using this:
> >
> > [link to Microsoft Keyboard Layout Creator]
> >
> > It is nearly impossible to replicate as it seems like you can only add
> > dead keys if they have a precomposed character.
>
> Mats is talking about the fact that a dead key combination (of any
> length) under Windows can generate only a single UTF-16 code unit.
>
That's wrong. Windows can perfectly generate multiple code units (in fact
it does it for non BMP characters, including in MSKLC!) from its KLC tables
using the default system driver.

Only the GUI editor MSKLC cannot use this possibility and it does not
understand chained tables (note: you can perfectly assign another table
index instead of a character to the combination of a dead key state and
another dead key, so that you can type another key which will be mapped in
the combined state; the combined state can then accept the space bar to
force the output of the NFC form for SPACE+diacritic1+diacritic2, which
should be, if possible, a spacing-diacritic1 followed by a
combining-diacritic2, or the reverse if both diacritics have a non-zero
combining class but the second one has a lower combining class than the
first one).

In summary, MSKLC is unable to edit **visually** the combined state produced
by typing two dead keys. But the .klc file is compilable and works. It is
trivial to make such a transform to generate the C source of the tables and
compile it to a driver; you should not need to know C/C++ to do that, and
the .klc source containing the chained keys should be enough.

From doug at ewellic.org  Thu Nov  3 18:24:57 2016
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 03 Nov 2016 16:24:57 -0700
Subject: Possible to add new precomposed characters for local language in
 Togo?
Message-ID: <20161103162457.665a7a7059d7ee80bb4d670165c8327d.c28f73703f.wbe@email03.godaddy.com>

Philippe Verdy wrote:

>> Mats is talking about the fact that a dead key combination (of any
>> length) under Windows can generate only a single UTF-16 code unit.
>
> That's wrong. Windows can perfectly generate multiple code units (in
> fact it does it for non BMP characters, including in MSKLC!) from its
> KLC tables using the default system driver.

From a dead key combination? Can you provide an example?

> Only the GUI editor MSKLC cannot use this possibility and it does not
> understand chained tables (note: you can perfectly assign another
> table index instead of a character to the combination of a dead key
> state and another dead key, so that you can type another key which
> will be mapped in the combined state; the combined state can then
> accept the space bar to force the output of the NFC form for
> SPACE+diacritic1+diacritic2, which should be, if possible, a
> spacing-diacritic1 followed by a combining-diacritic2, or the reverse
> if both diacritics have a non-zero combining class but the second one
> has a lower combining class than the first one).

Even if true -- and I doubt that the Windows keyboard engine knows
anything about Unicode combining classes -- it doesn't solve Mats's
problem. He doesn't want to generate the two diacritical marks in
isolation. He could do that without dead keys.

If a user types a dead key, followed by a character not listed in the
dead key table, Windows gives up and outputs the characters associated
with the two keys. That's not at all the same thing as what Mats wants.

What Mats wants is to enter <dead key>, <dead key>, <base letter> and
have the keyboard generate <letter with two diacritical marks>. That is
the sequence of 3 output code units that the Windows architecture -- not
just MSKLC -- does not support. If you disagree, please provide an
example.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From mark at kli.org  Thu Nov  3 18:43:43 2016
From: mark at kli.org (Mark Shoulson)
Date: Thu, 3 Nov 2016 19:43:43 -0400
Subject: The (Klingon) Empire Strikes Back
Message-ID: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>

At the time of writing this letter it has not yet hit the UTC Document 
Register, but I have recently submitted a document revisiting the 
ever-popular issue of the encoding of Klingon "pIqaD".  The reason 
always given why it could not be encoded was that it did not enjoy 
enough usage, and so I've collected a bunch of examples to demonstrate 
that this is not true (scans and also web pages, etc.)  So the issue 
comes back up, and time to talk about it again.

Michael Everson: I basically copied your 1997 proposal into the 
document, with some minor changes.  I hope you don't mind.  And if you 
don't want to be on the hook for providing the glyphs to UTC, I can do 
that.  I think that proposal should serve as a starting-point for 
discussion anyway.  There are some things that maybe should be different:

1. the "SYMBOL FOR EMPIRE" also known as the "MUMMIFICATION GLYPH".  I 
don't know where the second name comes from, I don't know how important 
it is to encode it, and I don't know how much of a trademark headache it 
will cause with Paramount, as it is used pretty heavily in their 
imagery.  Something we'll have to talk about.

2. I put in the COMMA and FULL STOP, which were not in the original 
proposal but were in the ConScript registry entry.  The examples I have 
show them clearly being used.  UTC may decide to unify them with 
existing triangular shapes, which may or may not be a good idea.

3. For my part, I've invented a pair of ampersands for Klingon (Klingon 
has two words for "and": one for joining verbs/sentences and one for 
joining nouns (the former goes between its "conjunctands", the latter 
after them)), from ligatures of the letters in question.  They pretty 
much have NO usage, of course (and are not in the proposal), but maybe 
they should be presented to the community.

Document is available at http://web.meson.org/downloads/pIqaDReturns.pdf

Let the bickering begin!

~mark


From verdy_p at wanadoo.fr  Thu Nov  3 18:53:57 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 4 Nov 2016 00:53:57 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <20161103162457.665a7a7059d7ee80bb4d670165c8327d.c28f73703f.wbe@email03.godaddy.com>
References: <20161103162457.665a7a7059d7ee80bb4d670165c8327d.c28f73703f.wbe@email03.godaddy.com>
Message-ID: <CAGa7JC2=zmcKVQG8sKYteKTqV4pUMtDTeXT5SX_tUqQycjLotw@mail.gmail.com>

2016-11-04 0:24 GMT+01:00 Doug Ewell <doug at ewellic.org>:

> Philippe Verdy wrote:
>
> >> Mats is talking about the fact that a dead key combination (of any
> >> length) under Windows can generate only a single UTF-16 code unit.
> >
> > That's wrong. Windows can perfectly generate multiple code units (in
> > fact it does it for non BMP characters, including in MSKLC!) from its
> > KLC tables using the default system driver.
>
> From a dead key combination? Can you provide an example?
>
> > Only the GUI editor MSKLC cannot use this possibility and it does not
> > understand chained tables (note: you can perfectly assign another
> > table index instead of a character to the combination of a dead key
> > state and another dead key, so that you can type another key which
> > will be mapped in the combined state; the combined state can then
> > accept the space bar to force the output of the NFC form for
> > SPACE+diacritic1+diacritic2, which should be, if possible, a
> > spacing-diacritic1 followed by a combining-diacritic2, or the reverse
> > if both diacritics have a non-zero combining class but the second one
> > has a lower combining class than the first one).
>
> Even if true -- and I doubt that the Windows keyboard engine knows
> anything about Unicode combining classes -- it doesn't solve Mats's
> problem. He doesn't want to generate the two diacritical marks in
> isolation. He could do that without dead keys.
>

Windows does not have to know that: the order will be the one you have used
in your keymap tables.

> If a user types a dead key, followed by a character not listed in the
> dead key table, Windows gives up and outputs the characters associated
> with the two keys. That's not at all the same thing as what Mats wants.
>

Windows does not do that magically: for characters missing in a table, it
uses by default the position assigned to the space bar, which must be
mapped in all keymaps to generate a sequence for the "isolated" dead keys;
it then resets the state to initial and tries to find a mapping for that
character in the table for the initial state.

>
> What Mats wants is to enter <dead key>, <dead key>, <base letter> and
> have the keyboard generate <letter with two diacritical marks>. That is
> the sequence of 3 output code units that the Windows architecture -- not
> just MSKLC -- does not support. If you disagree, please provide an
> example.


I had perfectly understood that! And my response addressed exactly this need:

Pseudo-code:

Table[Initialstate] [<deadkey1>,<modifiers1>] = StateDeadKey1
Table[StateDeadKey1] [<deadkey2>,<modifiers2>] = StateDeadKey1And2
Table[StateDeadKey1And2] [<base letter>,<modifiers3>] = NFC(<base letter; deadkey1; deadkey2>)
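
The pseudo-code above can be modelled as a small state machine. The following Python sketch is only a model of the idea, not the Windows driver table format, and it does not settle whether Windows accepts such tables: the key names (`dead_tilde`, `dead_acute`), the state names, and the `max_units` parameter are all hypothetical. `max_units` is there so the disputed output limit can be switched on and off.

```python
import unicodedata

# Hypothetical dead-key names; any other key is treated as a base letter.
DEAD_KEYS = {"dead_tilde", "dead_acute"}

# (current state, dead key) -> next state, like the "table index" entries.
TRANSITIONS = {
    ("initial", "dead_tilde"): "tilde",
    ("initial", "dead_acute"): "acute",
    ("tilde", "dead_acute"): "tilde+acute",
}

# Combining marks accumulated in each state (U+0303 tilde, U+0301 acute).
PENDING_MARKS = {
    "initial": "",
    "tilde": "\u0303",
    "acute": "\u0301",
    "tilde+acute": "\u0303\u0301",
}


def type_keys(keys, max_units=None):
    """Run key presses through the chained tables and return the text typed.

    max_units optionally models a cap on UTF-16 code units per combination;
    a combination whose NFC form is longer raises ValueError.
    """
    state = "initial"
    output = []
    for key in keys:
        if key in DEAD_KEYS:
            next_state = TRANSITIONS.get((state, key))
            if next_state is None:
                raise ValueError(f"no chained state for {key} after {state}")
            state = next_state
            continue
        # Base letter: emit NFC(letter + pending marks), then reset the state.
        text = unicodedata.normalize("NFC", key + PENDING_MARKS[state])
        units = len(text.encode("utf-16-le")) // 2
        if max_units is not None and units > max_units:
            raise ValueError(f"{units} code units exceed the cap of {max_units}")
        output.append(text)
        state = "initial"
    return "".join(output)


typed = type_keys(["dead_tilde", "dead_acute", "\u0190"])
print([f"U+{ord(ch):04X}" for ch in typed])
```

With `max_units=1` the same key sequence raises `ValueError` for U+0190, while `o` still works because its NFC form is the single code point U+1E4D; that is the difference the two sides of this thread are arguing about.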

Each table entry can contain either a special value for a table index
(representing the current state), or a sequence of UTF-16 code units (the
number of code units depends on the table format, whose header indicates
how many code units are stored, and how many modifiers are mapped or
masked), or a null entry for unmapped keys. The maximum number of UTF-16
code units depends on the OS version, with newer versions supporting more
formats (I think it is now up to 6 code units; in past versions it was 4).
There is also an extra format where table entries are in fact positions in
a string table, where strings have variable lengths: the string table just
follows the tables of keymaps. There is actually no code at all in most
keyboard drivers that don't need a special UI.

Newer drivers for Windows, however, contain additional data with a geometric
layout for touch screens. Some drivers will contain code (notably for CJK
keyboards that need a UI for their IME, and for typing emojis, or to use
assistive technologies based on linguistic dictionary lookups, such as "T9"
input methods on smartphones/tablets/remote controls).

From verdy_p at wanadoo.fr  Thu Nov  3 19:06:49 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 4 Nov 2016 01:06:49 +0100
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>
References: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>
Message-ID: <CAGa7JC1wTKa0YVJJgtE-3LeVE7rfcy2Gu6OrVzzs3N0UWH9tHQ@mail.gmail.com>

2016-11-04 0:43 GMT+01:00 Mark Shoulson <mark at kli.org>:

> 3. For my part, I've invented a pair of ampersands for Klingon (Klingon
> has two words for "and": one for joining verbs/sentences and one for
> joining nouns (the former goes between its "conjunctands", the latter after
> them)), from ligatures of the letters in question.
>
That is not new to Klingon, and it exists also in Classical Latin :

- the coordinator "et" between words, for simple cases; this translates as
"and" in English...
- the "-que" suffix at end of the second word which may be far after the
first one (which could be in another prior sentence, or implied by the
context and not given explicitly); this translates as the adverb "also" in
English... I've seen that suffix abbreviated as a "q" with a tilde above,
or a slanted tilde mark attached above, or an horizontal tilde crossing the
leg of the q below... Sorry I can't remember the name of these abbreviation
marks.

From mark at kli.org  Thu Nov  3 19:51:11 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Thu, 3 Nov 2016 20:51:11 -0400
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <CAGa7JC1wTKa0YVJJgtE-3LeVE7rfcy2Gu6OrVzzs3N0UWH9tHQ@mail.gmail.com>
References: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>
 <CAGa7JC1wTKa0YVJJgtE-3LeVE7rfcy2Gu6OrVzzs3N0UWH9tHQ@mail.gmail.com>
Message-ID: <43b93f7c-dcf2-9315-5c2d-cde9896ca931@kli.org>

Yes, it isn't unique to Klingon, I never said it was, and who cares that 
Latin also has it??  We weren't talking about Latin!

~mark

On 11/03/2016 08:06 PM, Philippe Verdy wrote:
> 2016-11-04 0:43 GMT+01:00 Mark Shoulson <mark at kli.org 
> <mailto:mark at kli.org>>:
>
>     3. For my part, I've invented a pair of ampersands for Klingon
>     (Klingon has two words for "and": one for joining verbs/sentences
>     and one for joining nouns (the former goes between its
>     "conjunctands", the latter after them)), from ligatures of the
>     letters in question.
>
> That is not new to Klingon, and it exists also in Classical Latin :
>
> - the coordinator "et" between words, for simple cases; this 
> translates as "and" in English...
> - the "-que" suffix at end of the second word which may be far after 
> the first one (which could be in another prior sentence, or implied by 
> the context and not given explicitly); this translates as the adverb 
> "also" in English... I've seen that suffix abbreviated as a "q" with a 
> tilde above, or a slanted tilde mark attached above, or an horizontal 
> tilde crossing the leg of the q below... Sorry I can't remember the 
> name of these abbreviation marks.
>
>


From verdy_p at wanadoo.fr  Thu Nov  3 22:29:40 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 4 Nov 2016 04:29:40 +0100
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <43b93f7c-dcf2-9315-5c2d-cde9896ca931@kli.org>
References: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>
 <CAGa7JC1wTKa0YVJJgtE-3LeVE7rfcy2Gu6OrVzzs3N0UWH9tHQ@mail.gmail.com>
 <43b93f7c-dcf2-9315-5c2d-cde9896ca931@kli.org>
Message-ID: <CAGa7JC1Q8fEorNRzDGG_qqW=uqCDVFE266tAgZUTw8T2NgYewA@mail.gmail.com>

Maybe, but it is still relevant: what is the purpose of these invented
Klingon ampersands? Aren't they ligatures or abbreviation marks like the
"-que", different from the "et" (&) ligature in Latin? We have "&" encoded
only because it exists in ASCII and is used as a distinctive isolated
symbol. But why would we not have the "-que" ligature encoded for Latin, yet
have two invented ligatures for Klingon?


2016-11-04 1:51 GMT+01:00 Mark E. Shoulson <mark at kli.org>:

> Yes, it isn't unique to Klingon, I never said it was, and who cares that
> Latin also has it??  We weren't talking about Latin!
>
> ~mark
>
>
> On 11/03/2016 08:06 PM, Philippe Verdy wrote:
>
> 2016-11-04 0:43 GMT+01:00 Mark Shoulson <mark at kli.org>:
>
>> 3. For my part, I've invented a pair of ampersands for Klingon (Klingon
>> has two words for "and": one for joining verbs/sentences and one for
>> joining nouns (the former goes between its "conjunctands", the latter after
>> them)), from ligatures of the letters in question.
>>
> That is not new to Klingon, and it exists also in Classical Latin :
>
> - the coordinator "et" between words, for simple cases; this translates as
> "and" in English...
> - the "-que" suffix at end of the second word which may be far after the
> first one (which could be in another prior sentence, or implied by the
> context and not given explicitly); this translates as the adverb "also" in
> English... I've seen that suffix abbreviated as a "q" with a tilde above,
> or a slanted tilde mark attached above, or an horizontal tilde crossing the
> leg of the q below... Sorry I can't remember the name of these abbreviation
> marks.
>
>
>
>

From charupdate at orange.fr  Fri Nov  4 11:47:16 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 4 Nov 2016 17:47:16 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <20161103162457.665a7a7059d7ee80bb4d670165c8327d.c28f73703f.wbe@email03.godaddy.com>
References: <20161103162457.665a7a7059d7ee80bb4d670165c8327d.c28f73703f.wbe@email03.godaddy.com>
Message-ID: <130831862.11144.1478278036475.JavaMail.www@wwinf1j20>

On Thu, 03 Nov 2016 13:44:12 -0700, Doug Ewell wrote:

> I think we are talking about two different issues here. It's important 
> to keep these separate, to avoid talking past each other. 

Thank you for the clarification.

> A more realistic course of action might be for someone outside of 
> Microsoft, maybe someone on this list, to create their own GUI wrapper 
> around the Microsoft engine, a "new MSKLC" so to speak. That new project 
> could remove the MSKLC limitation, but not the Windows one. 

From my own point of view, I can tell that creating big keyboard layouts 
(above 500 characters) in a GUI is really inefficient, hence the demand 
for an "import table" feature as expressed in the cited 2010 thread:
http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0020.html
# 6.

The MSKLC is OK, it provides all that is needed, gives a detailed insight 
into the first few shift states, and is well documented. What we can do:
1) Follow Michael's invitation to automate with a batch script, as his 
cited blog post seems to intend; see the link at the bottom of:
http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0213.html
2) Share source templates (as I'm doing on http://dispoclavier.com
already commented in English, but still under development);
3) Share spreadsheet folders that are automated for efficient layout table 
editing (allocation table, deadtrans list, ligatures table, NamesList.txt 
or UnicodeData.txt in a spreadsheet, for multiple purpose).

> Denis Jacquerye suggested using the letter as the dead key instead of 
> the diacritic. Perhaps a more straightforward approach would be to give 
> the diacritical marks their own normal keys, so the user could type 
> EPSILON, (combining) TILDE, (combining) ACUTE. 

This is found also for Bambara on a French-layout-based Malian layout:
http://www.mali-pense.net/IMG/pdf/le-clavier_francais-bambara.pdf
Linked on:
http://www.mali-pense.net/Ressources-pour-la-pratique-du.html
On this layout, the grave and circumflex accents are duplicated as combining 
diacritics to be used throughout as tone marks for consistency, because 
rendering differences were experienced between composed and precomposed.
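
The composed/precomposed distinction behind this design choice can be shown in a few lines of Python. The letters below are only examples (the open e, U+025B, is assumed here as a typical Bambara letter without a precomposed accented form), and the helper name `same_after_nfc` is hypothetical.

```python
import unicodedata


def same_after_nfc(a, b):
    """True if a and b are equal once both are normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


e_grave_precomposed = "\u00e8"  # e with grave as one code point
e_grave_combining = "e\u0300"   # e followed by combining grave accent
open_e_grave = "\u025b\u0300"   # open e + combining grave: no precomposed form

print(e_grave_precomposed == e_grave_combining)                # False
print(same_after_nfc(e_grave_precomposed, e_grave_combining))  # True
print(len(unicodedata.normalize("NFC", open_e_grave)))         # 2
```

Since letters such as the open e can only carry a tone mark as a combining character, typing every tone mark as a combining diacritic keeps the encoding uniform across all vowels.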

> Marcel Schneider's suggestion of using Keyman instead might be the best, 
> if it is mandatory for the Windows version of this layout to be 
> identical to the Ubuntu version, for reasons I don't understand (many 
> keyboard layouts are already not constant across platforms). 

Yes, e.g. Apple does provide a French (France) layout that allows writing 
French, while Microsoft does not, although the charsets had been completed.
As soon as a standard layout does exist, it should be cross-platform.
So Mats Blakstad would hardly be willing to maintain two diverging 
implementations while standardization is under way.

Marcel


From charupdate at orange.fr  Fri Nov  4 11:56:00 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 4 Nov 2016 17:56:00 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <CAGa7JC2=zmcKVQG8sKYteKTqV4pUMtDTeXT5SX_tUqQycjLotw@mail.gmail.com>
References: <20161103162457.665a7a7059d7ee80bb4d670165c8327d.c28f73703f.wbe@email03.godaddy.com>
 <CAGa7JC2=zmcKVQG8sKYteKTqV4pUMtDTeXT5SX_tUqQycjLotw@mail.gmail.com>
Message-ID: <583905832.11286.1478278560491.JavaMail.www@wwinf1j20>

On Fri, 4 Nov 2016 00:53:57 +0100, Philippe Verdy wrote:

> > What Mats wants is to enter <dead key>, <dead key>, <base letter> and 
> > have the keyboard generate <letter with two diacritical marks>. That is 
> > the sequence of 3 output code units that the Windows architecture -- not 
> > just MSKLC -- does not support. If you disagree, please provide an 
> > example. 
> 
> I had perfectly understood that ! And my response was in line for this need: 
> 
> Pseudo-code: 
> 
> Table[Initialstate] [<deadkey1>,<modifiers1>] = StateDeadKey1 
> Table[StateDeadKey1] [<deadkey2>,<modifiers2>] = StateDeadKey1And2 
> Table[StateDeadKey1And2] [<base letter>,<modifiers3>] = NFC(<base letter; deadkey1; deadkey2>) 
> 
> Each table entry can contain either a special value for a table index 
> (representing the current state), or a sequence of UTF-16 code units (the 
> number of code units depends on the table format, whose header indicates 
> how many code units are stored, and how many modifiers are mapped or 
> masked), or a null entry for unmapped keys). The maximum number of UTF-16 
> code units depends on the OS version which supports more formats (I think 
> it is now up to 6 code units in past versions it was 4, but there's an 
> extra format where table entries are in fact positions in a string table, 
> where strings have variable lengths: the string table just follows the 
> tables of keymaps, there's actually no code at all in most keyboard drivers 
> that don't need a special UI. 
> 
[...]

Does this work on Windows? Not being a programmer, I mainly copy and adapt 
existing code, so to test this I need the exact spelling of the header and 
one complete line of the DEADTRANS function. Would you please provide a link 
to a source file or to a how-to page?

BTW when reading your comment, I suspect there is a mix of several sections.

Michael Kaplan knew that what you are claiming does not work:
"Every sequence of chained dead keys must end up pointing to a single 
UTF-16 code point; no sequence can be created;"
http://archives.miloush.net/michkap/archive/2011/04/16/10154700.html
(Michael's blog post about chained dead keys, again.)

Having said that, your announcement (if true) would cut short an enormous 
battle and greatly improve Microsoft's standing on Unicode support and i18n.
I am puzzled that this feature is being hidden instead of promoted.
On reflection, though, I am less surprised given these two precedents:

1) When, relying on MSKLC's GUI, I was likewise unaware that Windows supports 
chained dead keys, I posted requests on Microsoft forums in vain...
http://answers.microsoft.com/en-us/insider/forum/insider_wintp-insider_devices/how-to-implement-multiple-deadkey-strokes/4ff38c09-b58c-490a-963e-3cc745dfb396
https://social.technet.microsoft.com/Forums/windows/en-US/e61dad3a-dbe5-4c5e-88af-7fc33cbb2e6a/multiple-deadkey-strokes-still-not-implemented-on-windows?forum=w7itproappcompat
...until I found full explanations on the keyboarding page of MNA's website:
http://accentuez.mon.nom.free.fr/Clavier-CreationClavier.php

2) The issue about the maximum number of code units input by a single key press.

So we look forward to any supplemental information, in the hope that Windows 
will end up with a keyboard input framework as capable as those of its 
competitors.

Marcel


From doug at ewellic.org  Fri Nov  4 12:03:42 2016
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 04 Nov 2016 10:03:42 -0700
Subject: Possible to add new precomposed characters for local language in
 Togo?
Message-ID: <20161104100342.665a7a7059d7ee80bb4d670165c8327d.70e1439568.wbe@email03.godaddy.com>

Philippe Verdy wrote:

>>> the combined state can then
>>> accept the space bar to force the output of the NFC form for
>>> SPACE+diacritic1+diacritic2, which should be, if possible, a
>>> spacing-diacritic1 followed by a combining-diacritic2, or the
>>> reverse if both diacritics have a non-zero combining class but the
>>> second one has a lower combining class than the first one).
>>
>> Even if true -- and I doubt that the Windows keyboard engine knows
>> anything about Unicode combining classes -- it doesn't solve Mats's
>> problem. He doesn't want to generate the two diacritical marks in
>> isolation. He could do that without dead keys.
>
> Windows does not have to know that: the order will be the one you have
> used in your keymap tables.

Then combining classes have nothing to do with this after all, and it
was misleading to mention them.

>> If a user types a dead key, followed by a character not listed in the
>> dead key table, Windows gives up and outputs the characters
>> associated with the two keys. That's not at all the same thing as
>> what Mats wants.
>
> Windows does not do that magically: for characters missing in a table,
> it uses by default the position assigned to the space bar, which must
> be mapped in all keymaps to generate a sequence for the "isolated" dead
> keys, then it will reset the state to initial, and then will try to
> find a mapping for that character from the table for the initial
> state.

Nope. Try typing <acute accent>, <b> on any Windows keyboard you like.
You will get 'b' followed by whatever base character is associated with
the <acute accent> dead key. This is often apostrophe or U+00B4, but the
space bar has *nothing to do with this*. It is the code point that has
the @ sign before it in the main LAYOUT table.

Here is a snippet you can actually copy and paste into a KLC file to
illustrate this:

<begin code>

LAYOUT  ;an extra '@' at the end is a dead key
//SC VK_  Cap 0 1 2
//-- ----  ---- ---- ---- ----
28 OEM_7  0 0027@ -1 -1  // APOSTROPHE, <none>, <none>
30 B  0 b -1 -1  // LATIN SMALL LETTER B, <none>, <none>
39 SPACE  0 0020 0020 -1  // SPACE, SPACE, <none>
53 DECIMAL 0 -1 -1 -1  // 

DEADKEY 0027
0061 00e1 // a -> á

<end code>
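
For readers who want the fallback behaviour spelled out, here is a minimal
sketch in C of what the KLC snippet above is meant to demonstrate. It is a
model, not Windows code: the names dead_row, apostrophe_rows and
resolve_dead_key are hypothetical, and the output order follows the
correction posted just after this message (the dead key's character first,
then the typed character).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical model of one dead-key table row: base char -> composed char. */
struct dead_row { wchar_t base; wchar_t composed; };

/* Rows for the dead key 0027 from the KLC snippet: a -> U+00E1. */
static const struct dead_row apostrophe_rows[] = { { L'a', 0x00E1 } };

/* Resolves <dead key>, <next> into out (room for 3 wchar_t, NUL-terminated)
   and returns the number of characters produced. */
size_t resolve_dead_key(wchar_t dead_char, wchar_t next, wchar_t out[3])
{
    size_t n_rows = sizeof apostrophe_rows / sizeof apostrophe_rows[0];

    assert(out != NULL);
    for (size_t i = 0; i < n_rows; i++) {
        if (apostrophe_rows[i].base == next) {
            /* Listed pair: exactly one composed character comes out. */
            out[0] = apostrophe_rows[i].composed;
            out[1] = L'\0';
            return 1;
        }
    }
    /* Unlisted pair: the dead key's own character, then the typed one.
       The space bar plays no part in this. */
    out[0] = dead_char;
    out[1] = next;
    out[2] = L'\0';
    return 2;
}
```

Under these assumptions, resolve_dead_key(0x0027, L'a', out) yields the single
character U+00E1, while resolve_dead_key(0x0027, L'b', out) yields the
apostrophe followed by b.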

> Pseudo-code:
>
> Table[Initialstate] [<deadkey1>,<modifiers1>] = StateDeadKey1
> Table[StateDeadKey1] [<deadkey2>,<modifiers2>] = StateDeadKey1And2
> Table[StateDeadKey1And2] [<base letter>,<modifiers3>] =
> NFC(<base letter; deadkey1; deadkey2>)

This is not an example of how it actually works, which someone else can
duplicate. It is a description of how you imagine it works.

The chained dead key part is fine, as I said before, but the part where
NFC(<base letter; deadkey1; deadkey2>) adds up to two or more code units
is NOT fine. You can't do that. It won't compile the way you expect, if
at all. Try it and see, and send or post the *actual code* if you get it
to work.

> Each table entry can contain either a special value for a table index
> (representing the current state), or a sequence of UTF-16 code units
> (the number of code units depends on the table format, whose header
> indicates how many code units are stored, and how many modifiers are
> mapped or masked), or a null entry for unmapped keys). The maximum
> number of UTF-16 code units depends on the OS version which supports
> more formats (I think it is now up to 6 code units in past versions it
> was 4, but there's an extra format where table entries are in fact
> positions in a string table, where strings have variable lengths: the
> string table just follows the tables of keymaps, there's actually no
> code at all in most keyboard drivers that don't need a special UI.

Very little of this is demonstrably true. Take the part where the limit
of 4 UTF-16 code units was somehow increased to 6: Kaplan often said
this had not happened. And dead key mappings don't follow this at all;
they are limited to ONE code unit.

Again, if you can't demonstrate otherwise, but can only assert it, you
may as well assert that the sun revolves around the earth.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From doug at ewellic.org  Fri Nov  4 12:09:54 2016
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 04 Nov 2016 10:09:54 -0700
Subject: Possible to add new precomposed characters for local language in
 Togo?
Message-ID: <20161104100954.665a7a7059d7ee80bb4d670165c8327d.ca380c1ebd.wbe@email03.godaddy.com>

I wrote:
 
> You will get 'b' followed by whatever base character is associated
> with the <acute accent> dead key.

Sorry, should be "preceded by".

--
Doug Ewell | Thornton, CO, US | ewellic.org


From davidj_faulks at yahoo.ca  Fri Nov  4 12:41:44 2016
From: davidj_faulks at yahoo.ca (David Faulks)
Date: Fri, 4 Nov 2016 17:41:44 +0000 (UTC)
Subject: The (Klingon) Empire Strikes Back
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
Message-ID: <42101413.334282.1478281304520@mail.yahoo.com>

> On Thu, 11/3/16, Mark Shoulson <mark at kli.org> wrote:
> Subject: The (Klingon) Empire Strikes Back
  
> At the time of writing this letter it has not yet hit the UTC
> Document Register, but I have recently submitted a document
> revisiting the ever-popular issue of the encoding of Klingon
> "pIqaD".  The reason always given why it could not be
> encoded was that it did not enjoy enough usage, and so I've
> collected a bunch of examples to demonstrate that this is not
> true (scans and also web pages, etc.)  So the issue comes
> back up, and time to talk about it again.

There is another issue of course, which I think could be a huge obstacle: the Trademark/Copyright issue. Paramount claims copyright over the entire Klingon language (presumably including the script). The issue has recently gone to court. The encoding criteria for symbols (and this likely extends to letters) are against encoding them without the permission of the Copyright/Trademark holder.

Is Paramount endorsing your proposal?

<snip>

> ~mark

David Faulks
 
     
From verdy_p at wanadoo.fr  Fri Nov  4 13:06:07 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 4 Nov 2016 19:06:07 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <20161104100342.665a7a7059d7ee80bb4d670165c8327d.70e1439568.wbe@email03.godaddy.com>
References: <20161104100342.665a7a7059d7ee80bb4d670165c8327d.70e1439568.wbe@email03.godaddy.com>
Message-ID: <CAGa7JC1XBMjTZXFNQaEAV8oqSF-Yz76zyddOjVpYQ6dtniD=Dw@mail.gmail.com>

2016-11-04 18:03 GMT+01:00 Doug Ewell <doug at ewellic.org>:

> Philippe Verdy wrote:
>
> > Pseudo-code:
> >
> > Table[Initialstate] [<deadkey1>,<modifiers1>] = StateDeadKey1
> > Table[StateDeadKey1] [<deadkey2>,<modifiers2>] = StateDeadKey1And2
> > Table[StateDeadKey1And2] [<base letter>,<modifiers3>] =
> > NFC(<base letter; deadkey1; deadkey2>)
>
> This is not an example of how it actually works, which someone else can
> duplicate. It is a description of how you imagine it works.
>

It is the way it is documented in MSDN, which explains the formats of keymap
tables. Note that there are several table formats, each allowing a different
number of code units.

You seem to see only the basic historic format (the one used in
Win16/Win9x), which stores a single code unit. There are others, and
they are documented, including the fact that table entries hold values of
two kinds: either code units, or specific values for chaining to a dead
key table, plus the special NULL value to fill gaps, because table entries
have a static length.

From doug at ewellic.org  Fri Nov  4 13:52:24 2016
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 04 Nov 2016 11:52:24 -0700
Subject: Possible to add new precomposed characters for local language in
 Togo?
Message-ID: <20161104115224.665a7a7059d7ee80bb4d670165c8327d.30cff46423.wbe@email03.godaddy.com>

Philippe Verdy wrote:

>> This is not an example of how it actually works, which someone else
>> can duplicate. It is a description of how you imagine it works.
>
> It is the way it is documented in MSDN that explains the formats of
> keymap tables (you have to notice that there are several table
> formats, each format allowing more or less code units).

Well, gee, I'd like to look that up and see how to apply it, but you
didn't supply a link. Does one exist?

> You seem to only see the basic historic format (the one used in Win16/
> Win9x) that only stores a single code unit, there are others, and they
> are documented, including the fact that the values of table entries
> are two kinds: either code units, or specific values for chaining to a
> dead key table, and the special NULL value to fill gaps, because table
> entries have a static length.

Where is the reference to these new formats? Where are the guidelines
and specifications on how to build a Windows keyboard layout, or even a
"new MSKLC," taking these new formats and tables into account? Are they
available anywhere? (Don't just say "MSDN," which is big. Be specific.)
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From verdy_p at wanadoo.fr  Fri Nov  4 14:44:55 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 4 Nov 2016 20:44:55 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <20161104115224.665a7a7059d7ee80bb4d670165c8327d.30cff46423.wbe@email03.godaddy.com>
References: <20161104115224.665a7a7059d7ee80bb4d670165c8327d.30cff46423.wbe@email03.godaddy.com>
Message-ID: <CAGa7JC2dkT_G1ooZKfTeDs4vbJQJx55MaPg_ySwRJde4S_LiZA@mail.gmail.com>

Consider this source code (based on Microsoft's "kbd.h", even though it is
ported to ReactOS):

https://doxygen.reactos.org/d7/df4/kbd_8h_source.html

Look for the structures named with "LIGATURE".

Now look at the special entry value "WCH_LGTR"=0xF002 (i.e. a PUA code
unit), which indicates that these keys are mapped using those "LIGATUREn"
structures (which have arbitrary lengths in WCHAR/UTF-16 code units),
instead of storing a 16-bit code unit directly.

<kbd.h> predefines LIGATURE1 to LIGATURE5, but longer lengths are possible
(see the cbLgEntry and nLgMax members of the KBDTABLES structure).

The table of ligatures is linked from the pLigature member of the KBDTABLES
structure, which points to the first set of LIGATURE1 mappings.

Now study more precisely how _KBDTABLES is defined in <kbd.h> and documented
in MSDN...
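
The lookup described above can be sketched in portable C as follows. This is
only a model of the table walk, under stated assumptions: struct ligature3
stands in for LIGATURE3, sample_table and lookup_ligature are hypothetical
names, the single table row is invented for illustration, and padding unused
slots with WCH_NONE is assumed. It says nothing about what the Windows
runtime itself accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WCH_NONE 0xF000u  /* unused slot in a table entry */
#define WCH_LGTR 0xF002u  /* keymap value meaning "see the ligature table" */

/* Portable stand-in for LIGATURE3 from kbd.h. */
struct ligature3 {
    uint8_t  virtual_key;
    uint16_t modification_number;
    uint16_t wch[3];
};

/* Hypothetical table: key 0x41, modifier column 0 -> U+0061 U+0300. */
static const struct ligature3 sample_table[] = {
    { 0x41, 0, { 0x0061, 0x0300, WCH_NONE } },
};

/* Called when the keymap entry for (vk, mod) holds WCH_LGTR.
   Copies the code units of the matching row into out and returns how
   many were copied; returns 0 if no row matches. */
size_t lookup_ligature(const struct ligature3 *table, size_t n,
                       uint8_t vk, uint16_t mod, uint16_t out[3])
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].virtual_key == vk &&
            table[i].modification_number == mod) {
            size_t len = 0;
            while (len < 3 && table[i].wch[len] != WCH_NONE) {
                out[len] = table[i].wch[len];
                len++;
            }
            return len;
        }
    }
    return 0;
}
```

With the sample row, looking up key 0x41 in column 0 returns two code units,
and any other key returns none.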


2016-11-04 19:52 GMT+01:00 Doug Ewell <doug at ewellic.org>:
>
> Philippe Verdy wrote:
>
> >> This is not an example of how it actually works, which someone else
> >> can duplicate. It is a description of how you imagine it works.
> >
> > It is the way it is documented in MSDN that explains the formats of
> > keymap tables (you have to notice that there are several table
> > formats, each format allowing more or less code units).
>
> Well, gee, I'd like to look that up and see how to apply it, but you
> didn't supply a link. Does one exist?

From mark at kli.org  Fri Nov  4 15:17:37 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Fri, 4 Nov 2016 16:17:37 -0400
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <42101413.334282.1478281304520@mail.yahoo.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
Message-ID: <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>

I know of the Axanar flap.  I'm not sure that Paramount was *seriously* 
saying "we own everything anyone ever says or will say in this 
language."  What they said was more "you used Klingon in your story, and 
Klingon is our language, therefore your story is infringing on our 
stuff."  So while it's true they *might* make that claim, I don't know 
that they *have*.

All of which is neither here nor there; it's something they could say.  
The LCS wrote an amicus brief, which is linked to from my document, by 
the way, arguing that very point, which the judge dismissed without 
prejudice on the grounds that he wasn't going to be addressing that 
issue (so he may not have seen it as critical to Paramount's case 
either).  A claim as bald and universal as the way I worded it above is 
practically indefensible logically, intuitively, and legally (Sun 
invented Java, but can they claim every Java program???)  At any rate, 
this isn't Unicode's problem.  Unicode would not be creating anything in 
Klingon anyway!  Just encoding letters used to write it.  Now, those 
letter-shapes might (for all I know) have legal strings attached, and 
what's more, the word "Klingon" is definitely owned and claimed by 
Paramount, which might cause problems with naming the block.

Really, though, that isn't what UTC should be deciding.  The question is 
whether or not to encode pIqaD: is it a writing system that people use 
or have used in the past to communicate (that's the main criterion, 
right?  Unicode is supposed to contain "all" alphabets).  If there are 
additional issues outside of UTC's purview that raise difficulties, 
those will have to be heard and addressed. But decide to act first, 
*then* see what obstacles need to be overcome.

~mark

On 11/04/2016 01:41 PM, David Faulks wrote:
>> On Thu, 11/3/16, Mark Shoulson <mark at kli.org> wrote:
>> Subject: The (Klingon) Empire Strikes Back
>    
>> At the time of writing this letter it has not yet hit the UTC
>> Document Register, but I have recently submitted a document
>> revisiting the ever-popular issue of the encoding of Klingon
>> "pIqaD".  The reason always given why it could not be
>> encoded was that it did not enjoy enough usage, and so I've
>> collected a bunch of examples to demonstrate that this is not
>> true (scans and also web pages, etc.)  So the issue comes
>> back up, and time to talk about it again.
> There is another issue of course, which I think could be a huge obstacle: the Trademark/Copyright issue. Paramount claims copyright over the entire Klingon language (presumably including the script). The issue has recently gone to court. The encoding criteria for symbols (and this likely extends to letters) are against encoding them without the permission of the Copyright/Trademark holder.
>
> Is Paramount endorsing your proposal?
>
> <snip>
>
>> ~mark
> David Faulks

From doug at ewellic.org  Fri Nov  4 15:52:52 2016
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 04 Nov 2016 13:52:52 -0700
Subject: Possible to add new precomposed characters for local language in
 Togo?
Message-ID: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>

Philippe Verdy wrote:

> Consider this source code (based on Microsoft "kbd.h", even if it is
> ported to ReactOS)
>
> https://doxygen.reactos.org/d7/df4/kbd_8h_source.html
>
> Look for the structures named with "LIGATURE"
>
> And now look at the special entry value "WCH_LGTR"=0xF002 (i.e. a
> PUA), which indicates these keys are mapped using those "LIGATUREn"
> structures (which have arbitrary lengths in WCHAR/UTF-16 code units),
> instead of storing a 16-bit code unit directly.
>
> <kbd.h> predefines LIGATURE1 to LIGATURE5 but longer lengths are
> possible (see cbLgEntry and nLgMax members in the KBDTABLES structure)
>
> The table of ligatures is linked from the pLigature member of the
> KBDTABLES structure, which points to the first set of LIGATURE1
> mappings.

OK, I understand now. We are rehashing the discussion on this list from
August 2015, in which Marcel claimed that the presence of these lines in
kbd.h:

#define TYPEDEF_LIGATURE(i) \
typedef struct _LIGATURE ## i { \
	BYTE VirtualKey; \
	WORD ModificationNumber; \
	WCHAR wch[i]; \
} LIGATURE ## i, *PLIGATURE ## i;

	TYPEDEF_LIGATURE(1)
	TYPEDEF_LIGATURE(2)
	TYPEDEF_LIGATURE(3)
	TYPEDEF_LIGATURE(4)
	TYPEDEF_LIGATURE(5)

was proof that some version of Windows actually supported ligatures
longer than 4 code units (WCHARs). But no such proof ever materialized.
There is still no documentation and no examples of any native Windows
keyboard that generates more than 4 code units from one keystroke.

kbd.h could declare:

	TYPEDEF_LIGATURE(8192)

and a user could compile it, and that would have nothing to do with
whether the Windows runtime could actually handle a LIGATURE structure
of that size.
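
A small sketch makes the point concrete. It assumes portable stand-ins for
BYTE, WORD and WCHAR (so it builds without the Windows headers) and adds a
hypothetical helper macro, LIGATURE_CAPACITY. Everything it checks is decided
by the compiler alone, which is exactly why it proves nothing about the
runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-ins for the Windows types; assumptions for this sketch. */
typedef unsigned char  BYTE;
typedef unsigned short WORD;
typedef unsigned short WCHAR;

/* Same shape as the macro quoted from kbd.h. */
#define TYPEDEF_LIGATURE(i) \
typedef struct _LIGATURE ## i { \
    BYTE VirtualKey; \
    WORD ModificationNumber; \
    WCHAR wch[i]; \
} LIGATURE ## i, *PLIGATURE ## i;

TYPEDEF_LIGATURE(5)
TYPEDEF_LIGATURE(8192)

/* Number of WCHAR slots declared in a LIGATUREn type (hypothetical helper). */
#define LIGATURE_CAPACITY(type) (sizeof(((type *)0)->wch) / sizeof(WCHAR))

/* Both declarations compile; neither says what the runtime will accept. */
static_assert(LIGATURE_CAPACITY(LIGATURE5) == 5, "five slots declared");
static_assert(LIGATURE_CAPACITY(LIGATURE8192) == 8192, "8192 slots declared");
```

The compiler is equally happy with 5 and with 8192, so the presence of
LIGATURE5 in the header carries no more evidential weight than LIGATURE8192
would.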

Going beyond 4 seems like such a useful and intriguing enhancement, for
some folks anyway, that if it were possible, it should be easy to find
at least one example where some DDK developer has utilized it.

And once again, that is not what Mats was talking about. He was talking
about dead-key combinations not being able to generate more than ONE
code unit. And if you go back and look at kbd.h, you will see this:

typedef struct _DEADKEY {
	DWORD dwBoth;
	WCHAR wchComposed;
	USHORT uFlags;
} DEADKEY, *PDEADKEY;

typedef WCHAR *DEADKEY_LPWSTR;

Notice the absence of any array of 4, 6, or 8192 WCHARs. Only one WCHAR
can be composed from a dead-key sequence. This is why Mats was unable to
create a keyboard for double-accented letters that don't map to a single
BMP code point using dead keys. (Correct, Mats?)
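
To illustrate, here is a hedged sketch of a dead-key table in portable C.
struct deadkey mirrors the DEADKEY layout with fixed-width types; the BOTH
macro, the dead_table rows, the private state value 0xE000 and find_dead are
hypothetical, and the packing (accent in the high word, typed character in
the low word) and the DKF_DEAD chaining flag are stated as my understanding
of kbd.h rather than as verified runtime behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DKF_DEAD 0x0001u  /* composed value is itself a dead key (chaining) */

/* Portable stand-in for DEADKEY from kbd.h. */
struct deadkey {
    uint32_t both;      /* accent in the high 16 bits, typed char in the low */
    uint16_t composed;  /* exactly one UTF-16 code unit */
    uint16_t flags;
};

/* The result slot holds a single code unit; there is no array to fill. */
static_assert(sizeof(((struct deadkey *)0)->composed) == sizeof(uint16_t),
              "one code unit per dead-key result");

#define BOTH(ch, accent) (((uint32_t)(accent) << 16) | (uint32_t)(ch))

/* Hypothetical table: circumflex, then acute, then 'a' -> U+1EA5. */
static const struct deadkey dead_table[] = {
    { BOTH(0x00B4, 0x005E), 0xE000, DKF_DEAD }, /* ^ then acute -> state */
    { BOTH(0x0061, 0xE000), 0x1EA5, 0 },        /* state then a -> U+1EA5 */
    { BOTH(0x0061, 0x005E), 0x00E2, 0 },        /* ^ then a -> U+00E2 */
};

/* Returns the row for (accent, ch), or NULL if the pair is not listed. */
const struct deadkey *find_dead(uint16_t accent, uint16_t ch)
{
    for (size_t i = 0; i < sizeof dead_table / sizeof dead_table[0]; i++)
        if (dead_table[i].both == BOTH(ch, accent))
            return &dead_table[i];
    return NULL;
}
```

Chaining works in this model because each step still ends in one code unit;
a base letter plus two combining marks with no precomposed form has nowhere
to go.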

A clarification: When I said "send or post the *actual code*", I assumed
you were creating KLC files and running them through kbdutool (bypassing
MSKLC), as you implied yesterday, not examining C++ code from the DDK. I
apologize for this unstated assumption and the confusion it caused, but
I still don't see any facts to support either the claim that a single
keystroke can generate more than 4 code units, or the claim that a dead
key combination can generate more than 1.

I'm currently trying to see if there is a Microsoft employee or business
unit that can resolve these questions for us once and for all.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From doug at ewellic.org  Fri Nov  4 16:02:36 2016
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 04 Nov 2016 14:02:36 -0700
Subject: The (Klingon) Empire Strikes Back
Message-ID: <20161104140236.665a7a7059d7ee80bb4d670165c8327d.ef5253d96e.wbe@email03.godaddy.com>

Mark E. Shoulson wrote:

> At any rate, this isn't Unicode's problem. Unicode would not be
> creating anything in Klingon anyway!

Well, to be fair, I thought IPR was the primary reason Unicode had never
encoded the Apple logo either. I doubt that whether Unicode intended to
use such a character themselves was a factor. (Of course, users who
really wanted that character encoded are probably using 🍎 or 🍏
now.) 
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From verdy_p at wanadoo.fr  Fri Nov  4 17:16:30 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 4 Nov 2016 23:16:30 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
References: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
Message-ID: <CAGa7JC0s89A9m+QV_SNcWGXcaQidtOu_YH_or=CxCjihoASVqw@mail.gmail.com>

2016-11-04 21:52 GMT+01:00 Doug Ewell <doug at ewellic.org>:

> OK, I understand now. We are rehashing the discussion on this list from
> August 2015, in which Marcel claimed that the presence of these lines in
> kbd.h:
>
> #define TYPEDEF_LIGATURE(i) \
> typedef struct _LIGATURE ## i { \
>         BYTE VirtualKey; \
>         WORD ModificationNumber; \
>         WCHAR wch[i]; \
> } LIGATURE ## i, *PLIGATURE ## i;
>
>         TYPEDEF_LIGATURE(1)
>         TYPEDEF_LIGATURE(2)
>         TYPEDEF_LIGATURE(3)
>         TYPEDEF_LIGATURE(4)
>         TYPEDEF_LIGATURE(5)
>
> was proof that some version of Windows actually supported ligatures
> longer than 4 code units (WCHARs).


Why, then, does the SDK predefine a structure with 5 code units?


> But no such proof ever materialized.
>

You'll find examples in the ReactOS sources (the link I gave), which
provide drivers for many more languages than the two example drivers
provided with the SDK.


> And once again, that is not what Mats was talking about. He was talking
> about dead-key combinations not being able to generate more than ONE
> code unit. And if you go back and look at kbd.h, you will see this:
>
> typedef struct _DEADKEY {
>         DWORD dwBoth;
>         WCHAR wchComposed;
>         USHORT uFlags;
> } DEADKEY, *PDEADKEY;
>
> typedef WCHAR *DEADKEY_LPWSTR;
>

Here again, the support of 4 code units in structures allows binding
"ligatures" in keymaps, even if their entries contain a single WCHAR, by
using the special value for "ligatures" (which are looked up in a separate
table).

>
> Notice the absence of any array of 4, 6, or 8192 WCHARs.


You don't need one! You assign the value WCH_LGTR=0xF002 (the PUA code unit),
which triggers a lookup in the "LIGATUREn" tables.


> Only one WCHAR
> can be composed from a dead-key sequence.


Wrong: you assign a WCH_LGTR and then the ligature tables are used, and they
are not limited to just one code unit.

From verdy_p at wanadoo.fr  Fri Nov  4 17:22:42 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 4 Nov 2016 23:22:42 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <CAGa7JC0s89A9m+QV_SNcWGXcaQidtOu_YH_or=CxCjihoASVqw@mail.gmail.com>
References: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
 <CAGa7JC0s89A9m+QV_SNcWGXcaQidtOu_YH_or=CxCjihoASVqw@mail.gmail.com>
Message-ID: <CAGa7JC3-ODFAs8unEu1y76c+PkF9wd0bq2XnTqgu8KXh+hrJrg@mail.gmail.com>

Look at this example using LIGATURE3 (kbdinasa.dll : "ASSAMESE - INSCRIPT"):

https://doxygen.reactos.org/da/dc5/kbdinasa_8c_source.html


From doug at ewellic.org  Fri Nov  4 17:30:48 2016
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 04 Nov 2016 15:30:48 -0700
Subject: Possible to add new precomposed characters for local language in
 Togo?
Message-ID: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>

I am seeking technical information from a Microsoft team member.
Hopefully we will soon have definitive answers to replace all the
controversy.

--
Doug Ewell | Thornton, CO, US | ewellic.org


From lang.support at gmail.com  Fri Nov  4 18:17:30 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Sat, 5 Nov 2016 10:17:30 +1100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
Message-ID: <CAGJ7U-WpEO0DaO1UXoidTCmhdH+ucFtpvBV=sNSNbMWLX3iM1Q@mail.gmail.com>

Thanks Doug,

That would be welcome.


On Saturday, 5 November 2016, Doug Ewell <doug at ewellic.org> wrote:
> I am seeking technical information from a Microsoft team member.
> Hopefully we will soon have definitive answers to replace all the
> controversy.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>

-- 
Andrew Cunningham
lang.support at gmail.com

From charupdate at orange.fr  Fri Nov  4 22:33:00 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 5 Nov 2016 04:33:00 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
References: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
Message-ID: <64711057.25674.1478316780673.JavaMail.www@wwinf1d31>

Sorry, while trying to look things up on MSDN, I lost touch with the discussion
and didn't notice that my information about "more than 4 code units",
more precisely "16 code units", generated by a live key press had been questioned
again. Even if primarily off-topic, it is a rather useful subject, along
with the input of several code units by dead keys (which admittedly is
more important).

To see the requested proof materialize, you are welcome to follow
these steps:
1) Open http://dispoclavier.com
2) Click the download button [Télécharger]
3) Unzip the folder
4) Browse to "DTM_Dispoclavier_v0.9.0.44\DTMD_v0.9.0.44_(installation)\
kbdfrf81 azerty déployé capitales et chiffres v0.9.0.44 installation"
5) Read the "Note"
6) Run "setup.exe" (note that it was produced by MSKLC)
7) Click the Language button in the Language bar and select "French (France)"
8) If necessary, click the Keyboard button and select "DTMD France azerty déployé
capitales et chiffres"
9) Make sure to use an ISO keyboard with a key for VK_OEM_105;
or remap the left Windows key to it: if no key is already remapped, merge this:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout]
"Scancode Map"=hex:00,00,00,00,00,00,00,00,04,00,00,00,56,00,5b,e0,5b,e0,56,00,\
00,00,00,00

(When this is used with an ISO keyboard, the two keys are swapped.)
10) Press the following three keys together: Left Shift; the ISO key or, if
remapped, the Left Windows key; Q (on AZERTY) or A (on QWERTY).
The expected keyboard input is: " ↑q_n’existe_pas" ["superscript small q
does not exist"], preceded by a white space for quick erasure by Ctrl+Backspace.
Please note that in the next version, "↑" will be replaced by "^", since "^"
will be the character of the superscript dead key, while the character of the
circumflex dead key is ???.

Along with this test, you may wish to look up the sources in the other main
folder. 16 is the empirically determined maximum number of inserted code units.

Best regards,

Marcel

> Message of 04/11/16 21:58
> From: "Doug Ewell"
> To: verdy_p at wanadoo.fr
> Cc: "Marcel Schneider", "Denis Jacquerye", "Mats Blakstad", "Unicode Mailing List"
> Subject: RE: Possible to add new precomposed characters for local language in Togo?
> 
> Philippe Verdy wrote:
> 
> > Consider this source code (based on Microsoft "kbd.h", even if it is
> > ported to ReactOS)
> >
> > https://doxygen.reactos.org/d7/df4/kbd_8h_source.html
> >
> > Look for the structures named with "LIGATURE"
> >
> > And now look at the special entry value "WCH_LGTR"=0xF002 (i.e. a
> > PUA), which indicate these keys are mapped using those "LIGATUREn"
> > structures (which have arbitrary lengths in WCHAR/UTF-16 code units),
> > instead of storing a 16-bit code unit directly.
> >
> > kbd.h predefines LIGATURE1 to LIGATURE5 but longer lengths are
> > possible (see cbLgEntry and nLgMaxd members in the KBDTABLES structure)
> >
> > The table of ligatures is linked from the pLigature member of the
> > KBDTABLES structure, which points to the first set of LIGATURE1
> > mappings.
> 
> OK, I understand now. We are rehashing the discussion on this list from
> August 2015, in which Marcel claimed that the presence of these lines in
> kbd.h:
> 
> #define TYPEDEF_LIGATURE(i) \
> typedef struct _LIGATURE ## i { \
> BYTE VirtualKey; \
> WORD ModificationNumber; \
> WCHAR wch[i]; \
> } LIGATURE ## i, *PLIGATURE ## i;
> 
> TYPEDEF_LIGATURE(1)
> TYPEDEF_LIGATURE(2)
> TYPEDEF_LIGATURE(3)
> TYPEDEF_LIGATURE(4)
> TYPEDEF_LIGATURE(5)
> 
> was proof that some version of Windows actually supported ligatures
> longer than 4 code units (WCHARs). But no such proof ever materialized.
> There is still no documentation and no examples of any native Windows
> keyboard that generates more than 4 code units from one keystroke.
> 
> kbd.h could declare:
> 
> TYPEDEF_LIGATURE(8192)
> 
> and a user could compile it, and that would have nothing to do with
> whether the Windows runtime could actually handle a LIGATURE structure
> of that size.
> 
> Going beyond 4 seems like such a useful and intriguing enhancement, for
> some folks anyway, that if it were possible, it should be easy to find
> at least one example where some DDK developer has utilized it.
> 
> And once again, that is not what Mats was talking about. He was talking
> about dead-key combinations not being able to generate more than ONE
> code unit. And if you go back and look at kbd.h, you will see this:
> 
> typedef struct _DEADKEY {
> DWORD dwBoth;
> WCHAR wchComposed;
> USHORT uFlags;
> } DEADKEY, *PDEADKEY;
> 
> typedef WCHAR *DEADKEY_LPWSTR;
> 
> Notice the absence of any array of 4, 6, or 8192 WCHARs. Only one WCHAR
> can be composed from a dead-key sequence. This is why Mats was unable to
> create a keyboard for double-accented letters that don't map to a single
> BMP code point using dead keys. (Correct, Mats?)
> 
> A clarification: When I said "send or post the *actual code*", I assumed
> you were creating KLC files and running them through kbdutool (bypassing
> MSKLC), as you implied yesterday, not examining C++ code from the DDK. I
> apologize for this unstated assumption and the confusion it caused, but
> I still don't see any facts to support either the claim that a single
> keystroke can generate more than 4 code units, or the claim that a dead
> key combination can generate more than 1.
> 
> I'm currently trying to see if there is a Microsoft employee or business
> unit that can resolve these questions for us once and for all.
> 
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
> 
>


From charupdate at orange.fr  Fri Nov  4 22:41:21 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 5 Nov 2016 04:41:21 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
References: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
Message-ID: <1415950225.25679.1478317281693.JavaMail.www@wwinf1d31>

I'm sorry for the typo: "VK_OEM_105" should read "VK_OEM_102".
(The registry key is tested and OK.)

A few minutes ago, I wrote:

> 9) Make sure to use an ISO keyboard with a key for VK_OEM_105; 
> or remap the left Windows key to it: if no key is already remapped, merge this:
> Windows Registry Editor Version 5.00

> [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout]
> "Scancode Map"=hex:00,00,00,00,00,00,00,00,04,00,00,00,56,00,5b,e0,5b,e0,56,00,\
> 00,00,00,00

> (When this is used with an ISO keyboard, the two keys are swapped.)


From charupdate at orange.fr  Sat Nov  5 11:51:21 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 5 Nov 2016 17:51:21 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <CAGa7JC3-ODFAs8unEu1y76c+PkF9wd0bq2XnTqgu8KXh+hrJrg@mail.gmail.com>
References: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
 <CAGa7JC0s89A9m+QV_SNcWGXcaQidtOu_YH_or=CxCjihoASVqw@mail.gmail.com>
 <CAGa7JC3-ODFAs8unEu1y76c+PkF9wd0bq2XnTqgu8KXh+hrJrg@mail.gmail.com>
Message-ID: <1751824501.10460.1478364682456.JavaMail.www@wwinf1p03>

Sorry not to have found time sooner to look closely at the material
that is claimed to support code unit sequences through dead keys.
It's all about live keys, nothing about dead keys.
Yet another case of talking past each other.

IMHO that happened because one simple question was not answered prior to 
sharing links to sources: How will the API know what line of aLigature 
(the ligature table) to look up, if the 0xf002 alias WCH_LGTR is not found 
in aVkToWch<n> (the allocation table)?

Indeed, column 1 of the ligature table contains the virtual key, and 
column 2 contains the modification number, that refers to the column of 
the allocation table where each 0xf002 or WCH_LGTR is mapped to a key and 
shift state:

static ALLOC_SECTION_LDATA VK_TO_WCHARS38 aVkToWch38[] = {
// Modification_# >>>|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|
{'Q'/*T1E C01*/,0x01,'q','Q','#',0x2126,0x00f7,LGTR,0x0331,NONE,NONE,NONE,NONE,NONE,0x0634,'\\',0x0447,0x0427,0x0447,0x0427,'&','%',0x03c2,0x2211,'&','%',0x05e7,'*','&','%',LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,NONE,NONE}, //
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
};

static ALLOC_SECTION_LDATA LIGATURE16 aLigature[] = {
// |Virtual_Key|SC|ISO_#|Modif#|Char0|Char1|Char2|Char3|Char4|Char5|Char6|Char7|Char8|Char9|Char10|Char11|Char12|Char13|Char14|Char15|
{'Q'/*T1E C01*/,5,' ',0x2191,'q','_','n',0x2019,'e','x','i','s','t','e','_','p','a','s'}, // ^q doesn't exist
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
};

This leads again to the (off-topic) concern about some examples found on
the internet. I'll try to search for web pages in English, while
we are looking forward to the advice that Doug Ewell kindly requested.

Marcel

On Fri, 4 Nov 2016 23:22:42 +0100, Philippe Verdy wrote:

> Look at this example using LIGATURE3 (kbdinasa.dll : "ASSAMESE - INSCRIPT"):
>
> https://doxygen.reactos.org/da/dc5/kbdinasa_8c_source.html
>
> 2016-11-04 23:16 GMT+01:00 Philippe Verdy :
>
>> 2016-11-04 21:52 GMT+01:00 Doug Ewell :
>>>
>>> OK, I understand now. We are rehashing the discussion on this list from
>>> August 2015, in which Marcel claimed that the presence of these lines in
>>> kbd.h:
>>> 
>>> #define TYPEDEF_LIGATURE(i) \
>>> typedef struct _LIGATURE ## i { \
>>>         BYTE VirtualKey; \
>>>         WORD ModificationNumber; \
>>>         WCHAR wch[i]; \
>>> } LIGATURE ## i, *PLIGATURE ## i;
>>> 
>>> TYPEDEF_LIGATURE(1)
>>> TYPEDEF_LIGATURE(2)
>>> TYPEDEF_LIGATURE(3)
>>> TYPEDEF_LIGATURE(4)
>>> TYPEDEF_LIGATURE(5)
>>> 
>>> was proof that some version of Windows actually supported ligatures
>>> longer than 4 code units (WCHARs).
>>
>> Why then does the SDK predefine a structure with 5 code units?
>>
>>> But no such proof ever materialized.
>>
>> You'll find examples in the ReactOS sources (the link I gave), which provide
>> drivers for many more languages than the two example drivers provided with the SDK.
>>
>>> And once again, that is not what Mats was talking about. He was talking
>>> about dead-key combinations not being able to generate more than ONE
>>> code unit. And if you go back and look at kbd.h, you will see this:
>>> 
>>> typedef struct _DEADKEY {
>>>         DWORD dwBoth;
>>>         WCHAR wchComposed;
>>>         USHORT uFlags;
>>> } DEADKEY, *PDEADKEY;
>>> 
>>> typedef WCHAR *DEADKEY_LPWSTR;
>>>
>> Here again, the support of 4 code points in structures allows binding
>> "ligatures" in keymaps, even if their entries contain a single WCHAR, using the
>> special value for "ligatures" (which are looked up in a separate table).
>>
>>> Notice the absence of any array of 4, 6, or 8192 WCHARs.
>>
>> You don't need to! You assign the value WCH_LGTR=0xF002 (the PUA code unit),
>> which triggers a lookup in the "LIGATUREn" tables.
>>
>>> Only one WCHAR can be composed from a dead-key sequence.
>>
>> Wrong, you assign a WCH_LGTR and then ligature tables are used; they are not
>> limited to just one code unit.


From verdy_p at wanadoo.fr  Sat Nov  5 15:52:17 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 5 Nov 2016 21:52:17 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <1751824501.10460.1478364682456.JavaMail.www@wwinf1p03>
References: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
 <CAGa7JC0s89A9m+QV_SNcWGXcaQidtOu_YH_or=CxCjihoASVqw@mail.gmail.com>
 <CAGa7JC3-ODFAs8unEu1y76c+PkF9wd0bq2XnTqgu8KXh+hrJrg@mail.gmail.com>
 <1751824501.10460.1478364682456.JavaMail.www@wwinf1p03>
Message-ID: <CAGa7JC0G9wFpnKu4KTaJ+Cf9EbO6wN0Eoa-mKX015B_OwHenxg@mail.gmail.com>

2016-11-05 17:51 GMT+01:00 Marcel Schneider <charupdate at orange.fr>:

> Sorry not to have found time sooner to look close at the stuff
> that is claimed to support code unit sequences through dead keys.
> It's all about live keys, none about dead keys.
> Yet another case of talking past each other.
>
> IMHO that happened because one simple question was not answered prior to
> sharing links to sources: How will the API know what line of aLigature
> (the ligature table) to look up, if the 0xf002 alias WCH_LGTR is not found
> in aVkToWch<n> (the allocation table)?
>
> Indeed, column 1 of the ligature table contains the virtual key, and
> column 2 contains the modification number, that refers to the column of
> the allocation table where each 0xf002 or WCH_LGTR is mapped to a key and
> shift state:
>
> static ALLOC_SECTION_LDATA VK_TO_WCHARS38 aVkToWch38[] = {
> // Modification_# >>>|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|
> {'Q'/*T1E C01*/,0x01,'q','Q','#',0x2126,0x00f7,LGTR,0x0331,NONE,NONE,NONE,NONE,NONE,0x0634,'\\',0x0447,0x0427,0x0447,0x0427,'&','%',0x03c2,0x2211,'&','%',0x05e7,'*','&','%',LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,NONE,NONE}, //
> {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
> };
>
> static ALLOC_SECTION_LDATA LIGATURE16 aLigature[] = {
> // |Virtual_Key|SC|ISO_#|Modif#|Char0|Char1|Char2|Char3|Char4|Char5|Char6|Char7|Char8|Char9|Char10|Char11|Char12|Char13|Char14|Char15|
> {'Q'/*T1E C01*/,5,' ',0x2191,'q','_','n',0x2019,'e','x','i','s','t','e','_','p','a','s'}, // ^q doesn't exist
> {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
> };
>

Your structures do not seem to be correctly formatted (or it is just random
data):
>>> typedef struct _LIGATURE ## i { \
>>>         BYTE VirtualKey; \
>>>         WORD ModificationNumber; \
>>>         WCHAR wch[i]; \
>>> } LIGATURE ## i, *PLIGATURE ## i;
Here you set:
 VirtualKey = 'Q' /*T1E C01*/,
 ModificationNumber = 5,
 wch[0]=' ', wch[1]=0x2191 /* up arrow */, wch[2]='q', ... wch[15]='s'
to define a very long "ligature" (the wrong term used in kbd.h, where it
should just be a "string", even if those strings have a fixed length and
are null-padded).

What it says is that VK_Q in modification number 5 (as defined in the
MODIFIERS modifier_bits table, which maps the set of virtual modifiers
listed in the VK_TO_BIT modifier_keys table to modification numbers)
should generate your string. The string can contain up to 16 WCHARs but no
null chars: it is not possible to include a NULL char in a LIGATURE table.
A keyboard never has to do that anyway, as NULL chars are not mapped in any
ligature table but individually in a VK mapping table, directly as a single
WCHAR code unit.

Note also that the definition in <kbd.h> of the SDK:
typedef struct _KBDTABLES {
  PMODIFIERS pCharModifiers;
  PVK_TO_WCHAR_TABLE pVkToWcharTable;
  PDEADKEY pDeadKey;
  VSC_LPWSTR *pKeyNames;
  VSC_LPWSTR *pKeyNamesExt;
  LPWSTR *pKeyNamesDead;
  USHORT *pusVSCtoVK;
  BYTE bMaxVSCtoVK;
  PVSC_VK pVSCtoVK_E0;
  PVSC_VK pVSCtoVK_E1;
  DWORD fLocaleFlags;
  BYTE nLgMaxd;
  BYTE cbLgEntry;
  PLIGATURE1 pLigature;
} KBDTABLES, *PKBDTABLES;

may be misleading with regard to its last members:
- nLgMaxd indicates the maximum length of the null-padded strings in a
pLigature table entry, whose entry size is stored in cbLgEntry. This size
acts as versioning info for the ligature table format, and most probably
it is there so that keyboard drivers compiled on another architecture will
still be usable even if the size of a WCHAR changes.
- The type of an entry is of course not LIGATURE1 but at least
LIGATURE2. LIGATURE1 has no use in any table, given that 1-WCHAR strings
will be stored directly in one of the VK_TO_WCHAR_TABLE
<https://doxygen.reactos.org/d1/df3/struct__VK__TO__WCHAR__TABLE.html> tables.
LIGATURE1 is just there to allow pointer typecasts in C/C++
independently of the LIGATURE(n) table format you need.
- Windows provably works with LIGATURE2, LIGATURE3, LIGATURE4 and LIGATURE5
(I've never tested whether it works for longer strings or whether it really
works with a LIGATURE1 table format).

The LIGATURE(n) format also uses internal padding between members, notably
between "BYTE VirtualKey;" and "WORD ModificationNumber;": there's a hidden
alignment BYTE between them, which could be considered as additional flags
for the effective LIGATURE(n) format (C/C++ compilers are supposed to fill
these padding bytes with zeroes). Given that WORD and WCHAR have the same
16-bit size, the whole structure is an array of 16-bit blocks: in a
LIGATURE1 there are two WORDs, so it is also aligned on a DWORD; in a
LIGATURE2, this would take 3 useful words, but due to alignment constraints,
the entry will be 4 words and sizeof(wch[0]) will be 16, just like for a
LIGATURE3; so LIGATURE2 has no use: there will be an extra padding null
WORD in the wchar array, and that's why "cbLgEntry" is there. But this
makes "nLgMaxd" completely unneeded, except to make sure that the extra
padding WCHAR in wch[] will be discarded even if it is not filled with
zeroes (a NULL WCHAR is ignored anyway and acts as an early
terminator).
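
The size and padding figures above are easy to check on a given toolchain.
The following is a minimal sketch, not SDK code: the three typedefs are
stand-ins that assume BYTE is 8 bits and WORD/WCHAR are 16 bits, and
modification_offset() and entry_size() are hypothetical helper names. With
the natural 2-byte alignment that mainstream compilers apply to 16-bit
members, an entry should come out at 4 + 2*n bytes, which does not match
every count given above, so it is worth running on the compiler actually
used to build the layout DLL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows types (assumption: 8-bit BYTE, 16-bit WORD and WCHAR). */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;

/* Same shape as the macro quoted from kbd.h in this thread. */
#define TYPEDEF_LIGATURE(i) \
typedef struct _LIGATURE ## i { \
    BYTE  VirtualKey; \
    WORD  ModificationNumber; \
    WCHAR wch[i]; \
} LIGATURE ## i, *PLIGATURE ## i;

TYPEDEF_LIGATURE(1)
TYPEDEF_LIGATURE(2)
TYPEDEF_LIGATURE(3)
TYPEDEF_LIGATURE(16)

/* Byte offset of ModificationNumber; a value of 2 means that one
   padding byte sits between VirtualKey and ModificationNumber. */
static size_t modification_offset(void)
{
    return offsetof(LIGATURE2, ModificationNumber);
}

/* Size in bytes of one table entry for the lengths declared above
   (hypothetical helper; returns 0 for lengths not declared here). */
static size_t entry_size(int n)
{
    assert(n >= 1);
    switch (n) {
    case 1:  return sizeof(LIGATURE1);
    case 2:  return sizeof(LIGATURE2);
    case 3:  return sizeof(LIGATURE3);
    case 16: return sizeof(LIGATURE16);
    default: return 0;
    }
}
```

Printing entry_size(n) for each n, for instance with printf and the %zu
format, gives the value that cbLgEntry would have to carry for that table.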

Now comes the question about how ligatures are matched: they are looked up
in the LIGATURE(n) tables by looking only at the first two members
VirtualKey and ModificationNumber (ignoring the extra padding BYTE?) but
most probably by grouping them as a single DWORD (the LO WORD contains the
VKEY, the HIWORD contains the modifiers). The lookup is apparently linear
(there's apparently no requirement for this table to be sorted to perform a
binary search, and anyway these LIGATURE tables are generally short).

If a [KEY,modifiers] pair is not found in the ligature table (even if the
VK_TO_WCHAR_TABLE says it should be there by assigning a WCH_LGTR value to
the entry for that VKEY in the modifier column number), the behavior should
probably be the same as if the entry in the VK_TO_WCHAR_TABLE  contained
WCH_NONE (i.e. key not mapped), but in my opinion the table data has a bug:
it should contain WCH_NONE instead of WCH_LGTR. I think that the keyboard
compiler tool should detect this error (it should also detect the use of an
unneeded LIGATURE1 instead of mapping directly in a VK_TO_WCHAR_TABLE or in a
DEADKEY table).
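
The lookup described in the last two paragraphs can be sketched as follows.
This is only an illustration under stated assumptions, not Windows code:
resolve_key() is a hypothetical helper, the typedefs are stand-ins for the
Windows types, the linear search and the fallback to "key not mapped" follow
the guesses made above, strings are assumed to be null-padded, and
WCH_NONE = 0xF000 is the value recalled from kbd.h (only WCH_LGTR = 0xF002
is quoted in this thread). The sample entry is the 16-unit one posted
earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows types (assumption). */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;

#define WCH_NONE 0xF000  /* assumed value: key not mapped */
#define WCH_LGTR 0xF002  /* look the key up in the ligature table */
#define LG_MAX   16      /* entry length used in the posted example */

typedef struct {
    BYTE  VirtualKey;
    WORD  ModificationNumber;
    WCHAR wch[LG_MAX];
} LIGATURE16;

/* The entry posted in this thread, followed by an all-zero terminator. */
static const LIGATURE16 aLigature[] = {
    {'Q', 5, {' ', 0x2191, 'q', '_', 'n', 0x2019,
              'e', 'x', 'i', 's', 't', 'e', '_', 'p', 'a', 's'}},
    {0, 0, {0}}
};

/* Hypothetical helper: 'mapped' is the value found in the allocation
   table for (vk, mod). Writes the generated code units to 'out' and
   returns how many were written (0 means the key produces nothing). */
static size_t resolve_key(BYTE vk, WORD mod, WCHAR mapped, WCHAR out[LG_MAX])
{
    assert(out != NULL);
    if (mapped == WCH_NONE)
        return 0;
    if (mapped != WCH_LGTR) {       /* ordinary single code unit */
        out[0] = mapped;
        return 1;
    }
    /* Linear search on the (VirtualKey, ModificationNumber) pair. */
    for (const LIGATURE16 *p = aLigature; p->VirtualKey != 0; ++p) {
        if (p->VirtualKey == vk && p->ModificationNumber == mod) {
            size_t n = 0;
            while (n < LG_MAX && p->wch[n] != 0) {  /* stop at null padding */
                out[n] = p->wch[n];
                ++n;
            }
            return n;
        }
    }
    return 0;  /* assumed fallback: behave as if the entry were WCH_NONE */
}
```

With this table, resolve_key('Q', 5, WCH_LGTR, out) copies all 16 code units,
while the same key with a modification number that has no ligature entry
yields nothing.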

---- Speculation follows about possible extensions for dead keys mapped to
"ligatures", and arbitrary-length ligatures in general mapped from
DEADKEY(n) and VK_WCHAR_TABLE(n) tables ---

Note also the presence of a "flags" BYTE in entries of a DEADKEY table.
Could this BYTE be used as well in the LIGATURE table entries (between BYTE
VirtualKey and WORD ModificationNumber) when the "comp" member of a DEADKEY
entry contains a "WCH_LGTR", for example to store an identifier of
the dead key state for lookup in LIGATURE(n) tables? This lookup would still
work by grouping <VirtualKey, DeadKeyState, modifiers> in a
single DWORD instead of comparing them individually.

Also the "nLgMaxd" member of KBDTABLES has no real use if it just contains
2, 3, 4 or 5. Setting its value to 0 could instead indicate that
a LIGATURE(0) entry no longer contains a null-padded string "WCHAR wch[]"
but a pointer to a real string, "PWSTR pwch;".
"cbLgEntry" would still be used: on a 32-bit architecture it would be 8
(2 BYTEs + 1 WORD for the composite key, 1 DWORD for the target pointer); on
a 64-bit architecture it would be 16 (2 BYTEs + 1 WORD for the composite key,
1 DWORD of alignment, 1 QWORD for the 64-bit pointer). The alternative would
be to store even shorter pointers using a single DWORD offset into a
null-terminated strings table stored just at the end of the LIGATURE(0) lookup
table, these offsets being relative to the start of the LIGATURE(0) table
(whose pointer just has to be typecast to a WORD[] array).

From charupdate at orange.fr  Sat Nov  5 21:31:58 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 6 Nov 2016 03:31:58 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
Message-ID: <213971739.7366.1478399518373.JavaMail.www@wwinf2212>

On Fri, 04 Nov 2016 13:52:52 -0700, Doug Ewell wrote:

> Going beyond 4 seems like such a useful and intriguing enhancement, for 
> some folks anyway, that if it were possible, it should be easy to find 
> at least one example where some DDK developer has utilized it. 

Yes indeed, it's finally rather easy to find:
http://accentuez.mon.nom.free.fr/Clavier-CreationClavier.php
(again) notably states (regarding the present topic; translation follows below):

| aLigature
| […]
| Le fichier kbd.h ne contient que 4 types LIGATURE2, LIGATURE3, LIGATURE4,
| LIGATURE5. Mais en réalité on n’est pas limité à cinq unités de code : si on a
| une touche Alt-Gr + espace qui renvoie dix unités de code, par exemple *LIGATURE*,
| on peut déclarer la table précédente comme suit :
| 
| TYPEDEF_LIGATURE(10) // LIGATURE10, *PLIGATURE10;
| static ALLOC_SECTION_LDATA LIGATURE10 aLigature[] = {
| […]
| };
| 
| On peut donc créer des touches renvoyant des mots, voire des phrases.
| On est toutefois limité à seize unités de code TYPEDEF_LIGATURE(16).

"The kbd.h file contains only 4 types: LIGATURE2, LIGATURE3, LIGATURE4, LIGATURE5.
But in reality one is not limited to five code units: if AltGr + space generates
ten code units, e.g. as in '*LIGATURE*', the table above can be declared as follows:
TYPEDEF_LIGATURE(10) // LIGATURE10, *PLIGATURE10;
static ALLOC_SECTION_LDATA LIGATURE10 aLigature[] = {
[…]
};
Thus we can create keys generating words or even sentences. However we are
limited to sixteen code units: TYPEDEF_LIGATURE(16)."

At the time of retrieving this page, it is the 18th result of a Bing search for
'keyboard layout creation', with results in all languages enabled.

Marcel


From charupdate at orange.fr  Sat Nov  5 21:40:51 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 6 Nov 2016 03:40:51 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <CAGa7JC0G9wFpnKu4KTaJ+Cf9EbO6wN0Eoa-mKX015B_OwHenxg@mail.gmail.com>
References: <20161104135252.665a7a7059d7ee80bb4d670165c8327d.812c5c1fe0.wbe@email03.godaddy.com>
 <CAGa7JC0s89A9m+QV_SNcWGXcaQidtOu_YH_or=CxCjihoASVqw@mail.gmail.com>
 <CAGa7JC3-ODFAs8unEu1y76c+PkF9wd0bq2XnTqgu8KXh+hrJrg@mail.gmail.com>
 <1751824501.10460.1478364682456.JavaMail.www@wwinf1p03>
 <CAGa7JC0G9wFpnKu4KTaJ+Cf9EbO6wN0Eoa-mKX015B_OwHenxg@mail.gmail.com>
Message-ID: <1931445883.7378.1478400051028.JavaMail.www@wwinf2212>

On Sat, 5 Nov 2016 21:52:17 +0100, Philippe Verdy wrote:

> Your structures do not seem to be correctly formatted (or it is just random 
> data): 

Maybe there are formal flaws, and perhaps it is written in a non-standard
way. What I can tell at least is that on my machine it works (Windows 7 Starter).
And I'm not in the habit of publishing random data as if it were real code.

Marcel


From charupdate at orange.fr  Sat Nov  5 22:11:02 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 6 Nov 2016 04:11:02 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
Message-ID: <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>

On Fri, 04 Nov 2016 15:30:48 -0700, Doug Ewell wrote:

> I am seeking technical information from a Microsoft team member. 
> Hopefully we will soon have definitive answers to replace all the 
> controversy. 

I'm aware that discussions sometimes have a way of going off the road, and I
experience this also on the mailing list of a keyboarding community I'm
currently very involved in. But I understand that when the layout driver
architecture of some OS impacts numerous local user communities, analyzing
code snippets on the Unicode List may sometimes end up meeting a real demand,
because at some point the discrepancy between the ongoing development of
the Unicode Standard and its implementation in the real world is going to
heavily compromise the usability and the usefulness of the scheme.

Having said that, I'm further aware that code development is typically best
done on collaborative repositories such as GitHub, GitLab, Sourceforge. I've
tried some of them and do have accounts. Perhaps I've missed something: I
don't find the neat display and nice syntax highlighting found on ReactOS.
And above all, I'm unable to figure out efficient layout driver development
there. A big part is done in huge workbooks. This is best done in Excel.
When my workbook is up to date, I'll be in a position to share it in public.

Now since we are at it, allow me to discuss other snippets, in the hope
that Microsoft (or a programmer on this List) will find a way to make the
Windows APIs understand multiple code units by dead keys:

/*TEMPLATE */ DEADTRANS( BASECHAR ,DEADKEY ,COMBICHAR ,DEADKEYFLAG), // UNICODE NAME

- This is how it can work without dead keys:
/*COMPOSE */ DEADTRANS( L'\"' ,0x00a9 ,0x0151 ,CHAIN ), // LATIN SMALL LETTER O WITH DOUBLE ACUTE
/*DOUBLE_AIGU*/ DEADTRANS( L'o' ,0x0151 ,0x0151 ,DKF_0 ), // LATIN SMALL LETTER O WITH DOUBLE ACUTE
/*COMPOSE */ DEADTRANS( L':' ,0x00a9 ,0x00eb ,CHAIN ), // LATIN SMALL LETTER E WITH DIAERESIS
/*TREMA */ DEADTRANS( L'a' ,0x00eb ,0x00e4 ,DKF_0 ), // LATIN SMALL LETTER A WITH DIAERESIS

- Now the acute and tilde dead keys:

- In the allocation table:
{VK_OEM_1 /*T1B D12*/ ,0x08 ,DEAD ,DEAD /*snip*/
{0xff,0 ,/*acute:*/0x00e1 ,/*tilde:*/0x00f5 /*snip*/

- In the deadtrans list:
/*TILDE */ DEADTRANS( 0x00e1 ,0x00f5 ,0x1e4d ,CHAIN ), // LATIN SMALL LETTER O WITH TILDE AND ACUTE
/*TILDE&AIGU */ DEADTRANS( L'O' ,0x1e4d ,0x1e4c ,DKF_0 ), // LATIN CAPITAL LETTER O WITH TILDE AND ACUTE

- And with LATIN CAPITAL LETTER OPEN E? Why not this way (as has been suggested):
/*TILDE&AIGU */ DEADTRANS( 0x0190 ,0x1e4d ,{0x0190,0x0303,0x0301} ,DKF_0 ), // *LATIN CAPITAL LETTER OPEN E WITH TILDE AND ACUTE
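
To make the chaining in the entries above easier to follow, here is a
minimal sketch that traces them. It rests on several assumptions: the
typedefs are stand-ins for the Windows types, DEADTRANS is assumed to pack
the base character into the low word and the dead character into the high
word of dwBoth (as the template line above suggests), CHAIN and DKF_0 are
assumed to be 1 and 0, and find_deadtrans() and compose() are hypothetical
helpers, not Windows APIs. It also shows why the last snippet cannot be
stored as written: wchComposed holds exactly one code unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows types (assumption). */
typedef uint16_t WCHAR;
typedef uint16_t USHORT;
typedef uint32_t DWORD;

/* Same members as the DEADKEY structure quoted from kbd.h. */
typedef struct {
    DWORD  dwBoth;       /* base character (low word) + dead character (high word) */
    WCHAR  wchComposed;  /* a single code unit: no room for a sequence */
    USHORT uFlags;
} DEADKEY;

#define MAKELONG(lo, hi) ((DWORD)(((DWORD)(WCHAR)(lo)) | ((DWORD)(WCHAR)(hi) << 16)))
#define DEADTRANS(ch, accent, comp, flags) { MAKELONG(ch, accent), comp, flags }
#define DKF_0 0x0000  /* assumed: the result is a final character */
#define CHAIN 0x0001  /* assumed: the result is itself a dead character */

static const DEADKEY aDeadKey[] = {
    DEADTRANS('"',    0x00a9, 0x0151, CHAIN),  /* compose, then '"': double acute state */
    DEADTRANS('o',    0x0151, 0x0151, DKF_0),  /* then 'o': U+0151 */
    DEADTRANS(0x00e1, 0x00f5, 0x1e4d, CHAIN),  /* tilde, then acute: combined state */
    DEADTRANS('O',    0x1e4d, 0x1e4c, DKF_0),  /* then 'O': U+1E4C */
    {0, 0, 0}
};

/* Linear search for the entry matching (dead state, base character). */
static const DEADKEY *find_deadtrans(WCHAR dead, WCHAR base)
{
    DWORD key = MAKELONG(base, dead);
    for (const DEADKEY *p = aDeadKey; p->dwBoth != 0; ++p)
        if (p->dwBoth == key)
            return p;
    return NULL;
}

/* Feeds 'count' key characters through the table, starting in dead state
   'dead'. Returns the final composed code unit, or 0 if no entry matches
   or the sequence ends while still chained. */
static WCHAR compose(WCHAR dead, const WCHAR *keys, size_t count)
{
    assert(keys != NULL);
    for (size_t i = 0; i < count; ++i) {
        const DEADKEY *p = find_deadtrans(dead, keys[i]);
        if (p == NULL)
            return 0;
        if ((p->uFlags & CHAIN) == 0)
            return p->wchComposed;   /* final character */
        dead = p->wchComposed;       /* chained: continue in the new dead state */
    }
    return 0;
}
```

Starting in the tilde state 0x00f5 and feeding 0x00e1 then 'O' ends on
U+1E4C. A result such as U+0190 U+0303 U+0301 would need three code units,
which the single wchComposed member cannot hold.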

Hopefully,

Marcel


From verdy_p at wanadoo.fr  Sat Nov  5 23:32:23 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 6 Nov 2016 05:32:23 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
 <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>
Message-ID: <CAGa7JC2QTYYPSf0UGcXRvvnpO+PVoU8802gxKSFFzNOYakjGLA@mail.gmail.com>

2016-11-06 4:11 GMT+01:00 Marcel Schneider <charupdate at orange.fr>:

> On Fri, 04 Nov 2016 15:30:48 -0700, Doug Ewell wrote:
>
> - And with LATIN CAPITAL LETTER OPEN E? Why not this way (as has been
> suggested):
> /*TILDE&AIGU */ DEADTRANS( 0x0190 ,0x1e4d ,{0x0190,0x0303,0x0301} ,DKF_0 ), // *LATIN CAPITAL LETTER OPEN E WITH TILDE AND ACUTE
>

This snippet cannot work as is, because the DEADTRANS() macro
generates an 8-byte structure that has only a single WCHAR for storing the
result of the mapping of a (VKEY+modifier number):

  typedef struct _DEADKEY {
    DWORD dwBoth;
    WCHAR wchComposed;
    USHORT uFlags;
  } DEADKEY, *PDEADKEY;

So it will need to map a WCH_LGTR instead, and then use a "ligature" table
to store the string containing the 3 code units you want.

Then there's an unused BYTE in the DEADTRANS structure for the flags, which
can be used (specifically for entries mapped to WCH_LGTR) to pass flags to
the LIGATURE(n) table, where there's also a free BYTE in the indexing key
that could carry an identifier needed for the lookup in the LIGATURE(n)
table. Alternatively, instead of mapping WCH_LGTR (a PUA), you could as
well map another PUA there in 0xE001..0xE0FF to pass a byte for the
dead key state into the lookup of ligatures:

#define TYPEDEF_LIGATURE(i) \
typedef struct _LIGATURE ## i { \
 BYTE VirtualKey; \
 WORD ModificationNumber; \
 WCHAR wch[i]; \
} LIGATURE ## i, *PLIGATURE ## i;

which can safely be changed to:

#define TYPEDEF_LIGATURE(i) \
typedef struct _LIGATURE ## i { \
 BYTE VirtualKey, DeadKeyState; \
 WORD ModificationNumber; \
 WCHAR wch[i]; \
} LIGATURE ## i, *PLIGATURE ## i;

(in the current definition of <kbd.h> the extra byte is implicit for the
alignment, but not declared explicitly, it is implicitly filled with zeroes
by C compilers when declaring the structure, but in my opinion this extra
byte should have been declared explicitly.)
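
Whether naming the padding byte really leaves the layout untouched can be
checked with a small sketch like the one below. It is only an illustration:
the typedefs are stand-ins that assume an 8-bit BYTE and 16-bit WORD/WCHAR,
the two struct names and same_layout() are hypothetical, and the length 3
is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows types (assumption). */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;

/* Current shape: the byte after VirtualKey is compiler padding. */
typedef struct {
    BYTE  VirtualKey;
    WORD  ModificationNumber;
    WCHAR wch[3];
} LIGATURE3_IMPLICIT;

/* Proposed shape: the same byte is declared as DeadKeyState. */
typedef struct {
    BYTE  VirtualKey, DeadKeyState;
    WORD  ModificationNumber;
    WCHAR wch[3];
} LIGATURE3_EXPLICIT;

/* Returns 1 when both shapes have the same size and member offsets,
   i.e. when existing tables would still be read at the same positions. */
static int same_layout(void)
{
    return sizeof(LIGATURE3_IMPLICIT) == sizeof(LIGATURE3_EXPLICIT)
        && offsetof(LIGATURE3_IMPLICIT, ModificationNumber)
           == offsetof(LIGATURE3_EXPLICIT, ModificationNumber)
        && offsetof(LIGATURE3_IMPLICIT, wch)
           == offsetof(LIGATURE3_EXPLICIT, wch)
        && offsetof(LIGATURE3_EXPLICIT, DeadKeyState) == 1;
}

/* Aborts if the two layouts differ on this compiler. */
static void check_layout(void)
{
    assert(same_layout());
}
```

If check_layout() passes, tables compiled against either declaration are
byte-compatible on that compiler; it says nothing about whether the Windows
runtime would actually read the extra byte.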

But now it's up to the OS to support it. Maybe it already works if the
lookup in the LIGATURE(n) table already compares a whole DWORD,
including this free padding byte; however, some kernel-level code would need
to change in order to check the PUA values mapped in DEADKEY
structures and extract a DeadKeyState from them.

The alternative is to map the combination of two deadkeys to a bit in the
modifier number (this can be instructed by the uFlags, which will set the
modifier bit number specified in the mapped PUA). In all cases there's
still space for extension there.

The last alternative is to extend the KBDTABLES structure to append new
members for a table of extended DEADKEYS, and a separate table of LIGATURE
for DEADKEYs (the KBDTABLES structure does not specify its own size, but it
has a fLocaleFlags field just before the table of ligatures, which can
indicate the presence of these extensions).

From verdy_p at wanadoo.fr  Sat Nov  5 23:37:12 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 6 Nov 2016 05:37:12 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <CAGa7JC2QTYYPSf0UGcXRvvnpO+PVoU8802gxKSFFzNOYakjGLA@mail.gmail.com>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
 <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>
 <CAGa7JC2QTYYPSf0UGcXRvvnpO+PVoU8802gxKSFFzNOYakjGLA@mail.gmail.com>
Message-ID: <CAGa7JC00ScDYO6yAJWK1qjkrBoEBKkPo765_0P76D5e=McCP7w@mail.gmail.com>

Note: such an extension is absolutely necessary for scripts not encoded in the
BMP (e.g. Gothic or Deseret, or larger scripts that will absolutely need
mechanisms like dead keys if they want to have a usable keyboard layout!)

2016-11-06 5:32 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

>
>
> 2016-11-06 4:11 GMT+01:00 Marcel Schneider <charupdate at orange.fr>:
>
>> On Fri, 04 Nov 2016 15:30:48 -0700, Doug Ewell wrote:
>>
>> ? And with LATIN CAPITAL LETTER OPEN E? Why not this way (as has been
>> suggested):
>> /*TILDE&AIGU */ DEADTRANS( 0x0190 ,0x1e4d ,{0x0190,0x0303,0x0301} ,DKF_0
>> ), // *LATIN CAPITAL LETTER OPEN E WITH
>> TILDE AND ACUTE
>>
>
> This snippet cannot work as is, because the DEADTRANS() macro maps
> gernerates a 8-BYTE structure only has a single WCHAR for storing the
> result of the map of a (VKEY+modifier number):
>
>   typedef struct _DEADKEY {
>     DWORD dwBoth;
>     WCHAR wchComposed;
>     USHORT uFlags;
>   } DEADKEY, *PDEADKEY;
>
> So it will need to map a WCH_LGTR instead, and then use a "ligature" table
> to store the string containing the 3 code units you want.
>
> Then there's an unused BYTE in the DEADTRANS structure for the flags, that
> can be used (specifically for entries mapped to WCH_LGTR) to pass flags to
> the LIGATURE(n) table (where there's also a free BYTE in the indexing key,
> allowing to pass an identifier needed for the lookup in the LIGATURE(n)
> table; alternatively, instead of mapping WCH_LGTR (a PUA), you could as
> well map another PUA there in 0xE001.0xE0FF for passing a byte for the
> deadkey state into the lookup of ligatures:
>
> #define TYPEDEF_LIGATURE(i) \
> typedef struct _LIGATURE ## i { \
>  BYTE VirtualKey; \
>  WORD ModificationNumber; \
>  WCHAR wch[i]; \
> } LIGATURE ## i, *PLIGATURE ## i;
>
> which can safely be changed to:
>
> #define TYPEDEF_LIGATURE(i) \
> typedef struct _LIGATURE ## i { \
>  BYTE VirtualKey, DeadKeyState; \
>  WORD ModificationNumber; \
>  WCHAR wch[i]; \
> } LIGATURE ## i, *PLIGATURE ## i;
>
> (In the current definition in <kbd.h>, the extra byte is implied by the
> alignment but not declared explicitly. It is implicitly filled with zeroes
> by C compilers when declaring the structure, but in my opinion this extra
> byte should have been declared explicitly.)
>
> But now it's up to the OS to support it. Maybe it already works, if the
> lookup in the LIGATURE(n) table already compares a whole DWORD including
> this free padding byte. However, some kernel-level code would need to
> change in order to check the PUA values mapped in DEADKEY structures and
> extract a DeadKeyState from them.
>
> The alternative is to map the combination of two dead keys to a bit in the
> modifier number (this can be signalled by the uFlags, which would set the
> modifier bit number specified in the mapped PUA code point). In all cases
> there's still room for extension there.
>
> The last alternative is to extend the KBDTABLES structure by appending new
> members for a table of extended DEADKEYs and a separate table of LIGATUREs
> for DEADKEYs (KBDTABLES does not specify its own size, but it has an
> fLocaleFlags field just before the table of ligatures, which can indicate
> the presence of these extensions).
>
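The layout argument quoted above can be checked with a small sketch. It is only an illustration: the integer typedefs are stand-ins so that it builds without <windows.h>, LIGATURE3X and its DeadKeyState member are the hypothetical variant proposed in the quoted message (not part of kbd.h), and the sizes hold on typical ABIs where a 16-bit member is 2-byte aligned and no packing pragma is in effect.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows typedefs, so the sketch builds without <windows.h>. */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;
typedef uint16_t USHORT;
typedef uint32_t DWORD;

typedef struct _DEADKEY {
    DWORD  dwBoth;       /* lookup key */
    WCHAR  wchComposed;  /* room for exactly one UTF-16 code unit */
    USHORT uFlags;
} DEADKEY;

/* What TYPEDEF_LIGATURE(3) expands to in the quoted definition. */
typedef struct _LIGATURE3 {
    BYTE  VirtualKey;
    WORD  ModificationNumber;
    WCHAR wch[3];
} LIGATURE3;

/* The proposed variant; DeadKeyState is hypothetical, not part of kbd.h. */
typedef struct _LIGATURE3X {
    BYTE  VirtualKey, DeadKeyState;
    WORD  ModificationNumber;
    WCHAR wch[3];
} LIGATURE3X;

/* Returns 1 when naming the padding byte leaves size and offsets unchanged. */
int ligature_layout_unchanged(void) {
    return sizeof(LIGATURE3) == sizeof(LIGATURE3X)
        && offsetof(LIGATURE3, ModificationNumber)
               == offsetof(LIGATURE3X, ModificationNumber)
        && offsetof(LIGATURE3, wch) == offsetof(LIGATURE3X, wch);
}
```

If ligature_layout_unchanged() returns 1 on the target compiler, existing tables stay binary-compatible with the extended declaration. Whether the OS would ever read the extra byte is a separate question that this sketch does not answer.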

From verdy_p at wanadoo.fr  Sat Nov  5 23:40:59 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 6 Nov 2016 05:40:59 +0100
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <CAGa7JC00ScDYO6yAJWK1qjkrBoEBKkPo765_0P76D5e=McCP7w@mail.gmail.com>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
 <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>
 <CAGa7JC2QTYYPSf0UGcXRvvnpO+PVoU8802gxKSFFzNOYakjGLA@mail.gmail.com>
 <CAGa7JC00ScDYO6yAJWK1qjkrBoEBKkPo765_0P76D5e=McCP7w@mail.gmail.com>
Message-ID: <CAGa7JC14_GVbmuHZierM-WHFLSazDA5hAz53-Ymx+2YAjP+Gow@mail.gmail.com>

Another use case: being able to type Bopomofo along with Cyrillic or
Kana. New extensions will also be needed for the 2012 German layout and
other layouts made according to the ISO standard: you cannot do everything
you want with just a few modifier bits, with Windows implementing only a Kana
modifier key and limiting the number of supported modifiers even below the
capacity of the WORD ModificationNumber!


From charupdate at orange.fr  Sun Nov  6 01:22:25 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 6 Nov 2016 08:22:25 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <CAGa7JC14_GVbmuHZierM-WHFLSazDA5hAz53-Ymx+2YAjP+Gow@mail.gmail.com>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
 <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>
 <CAGa7JC2QTYYPSf0UGcXRvvnpO+PVoU8802gxKSFFzNOYakjGLA@mail.gmail.com>
 <CAGa7JC00ScDYO6yAJWK1qjkrBoEBKkPo765_0P76D5e=McCP7w@mail.gmail.com>
 <CAGa7JC14_GVbmuHZierM-WHFLSazDA5hAz53-Ymx+2YAjP+Gow@mail.gmail.com>
Message-ID: <143705401.185.1478416945629.JavaMail.www@wwinf2212>

On Sun, 6 Nov 2016 05:40:59 +0100, Philippe Verdy wrote:

> Another use case: being able to type Bopomofo along with Cyrillic or 
> Kana. New extensions will also be needed for the 2012 German layout and 
> other layouts made according to the ISO standard: you cannot do everything 
> you want with just a few modifier bits, with Windows implementing only a Kana 
> modifier key and limiting the number of supported modifiers even below the 
> capacity of the WORD ModificationNumber! 

This does not match my experience. I'm actually using modifiers 0x10, 0x20, 
0x40 and 0x80 too, and kbd.h even has names for most of them: [kbd.h(51)]

/*
* Keyboard Shift State defines. These correspond to the bit mask defined
* by the VkKeyScan() API.
*/
#define KBDBASE 0
#define KBDSHIFT 1
#define KBDCTRL 2
#define KBDALT 4
// three symbols KANA, ROYA, LOYA are for FE
#define KBDKANA 8
#define KBDROYA 0x10
#define KBDLOYA 0x20
#define KBDGRPSELTAP 0x80
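
To illustrate how these bits combine, here is a minimal sketch of a lookup from a shift-state bitmask to a column of a layout table. The constants are the ones listed above; the table contents, the NO_COLUMN marker and the function name are assumptions made for this example and are not taken from kbd.h.

```c
#include <stddef.h>

#define KBDBASE      0
#define KBDSHIFT     1
#define KBDCTRL      2
#define KBDALT       4
#define KBDKANA      8
#define KBDROYA      0x10
#define KBDLOYA      0x20
#define KBDGRPSELTAP 0x80

#define NO_COLUMN (-1)  /* hypothetical marker: combination not used by the layout */

/* Hypothetical layout: each combination of modifier bits in use gets a column. */
struct mod_column { unsigned mask; int column; };

static const struct mod_column layout_columns[] = {
    { KBDBASE,            0 },
    { KBDSHIFT,           1 },
    { KBDCTRL | KBDALT,   2 },  /* AltGr */
    { KBDROYA,            3 },
    { KBDSHIFT | KBDROYA, 4 },
    { KBDGRPSELTAP,       5 },
};

/* Returns the column for a shift-state bitmask, or NO_COLUMN if unmapped. */
int column_for_shift_state(unsigned mask) {
    size_t n = sizeof layout_columns / sizeof layout_columns[0];
    for (size_t i = 0; i < n; i++) {
        if (layout_columns[i].mask == mask)
            return layout_columns[i].column;
    }
    return NO_COLUMN;
}
```

With this table, Shift together with the 0x10 modifier (mask 0x11) selects column 4, while a mask the layout does not list, such as KBDLOYA alone, yields NO_COLUMN.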

0x40 proves to be usable too. What I cannot understand, and others 
are puzzled too, is the name KBDGRPSELTAP. It sounds as if it were an 
acronym of "GRouP SELecTor APing" or the like, hence my suspicion that 
the developers were asked to ape the *then new* ISO/IEC 9995-3 group 
selector by implementing it as a dead key, as a *remnant* group selector.

That's about the name only. Much more annoying is that I've been unable 
to get any result from the application of the related attribute: [kbd.h(364)]

#define CAPLOK 0x01
#define SGCAPS 0x02
#define CAPLOKALTGR 0x04
// KANALOK is for FE
#define KANALOK 0x08
#define GRPSELTAP 0x80

And there is even NO COMMENT, as only the first two are mentioned in the 
preceding comment: [kbd.h(46)]

* Special values for Attributes:
* CAPLOK - The CAPS-LOCK key affects this key like SHIFT
* SGCAPS - CapsLock uppercases the unshifted char (Swiss-German)

So I added 0x80 to the attribute of a key, expecting that this would 
make it sensitive to the CapsLock toggle key VK_CAPITAL, because this 
would match the ISO/IEC 9995 intent of having a secondary group that is 
subject to CapsLock. But it did not work.
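
For clarity, the expectation can be written down as a toy model. This is not how Windows is known to behave: the CAPLOK branch follows the kbd.h comment quoted above, the CAPLOKALTGR branch is my reading of the name, and the GRPSELTAP branch is only the hypothesis being tested (assuming it was meant to apply on the 0x80 shift state), which the experiment did not confirm. The function name is made up for this sketch.

```c
#define KBDSHIFT     1
#define KBDCTRL      2
#define KBDALT       4
#define KBDGRPSELTAP 0x80

#define CAPLOK       0x01
#define CAPLOKALTGR  0x04
#define GRPSELTAP    0x80

/* Toy model: returns the shift state after taking CapsLock into account.
   Where an attribute applies, CapsLock acts like SHIFT (it toggles the bit). */
unsigned apply_capslock(unsigned state, unsigned attributes, int caps_on) {
    if (!caps_on)
        return state;

    unsigned without_shift = state & ~(unsigned)KBDSHIFT;
    int affected =
        (without_shift == 0 && (attributes & CAPLOK)) ||
        (without_shift == (KBDCTRL | KBDALT) && (attributes & CAPLOKALTGR)) ||
        /* Hypothesis only: GRPSELTAP as "CAPLOK for the 0x80 state". */
        (without_shift == KBDGRPSELTAP && (attributes & GRPSELTAP));

    return affected ? (state ^ KBDSHIFT) : state;
}
```

Under this model, a key carrying the GRPSELTAP attribute would move from state 0x80 to 0x81 while CapsLock is on; the test described above suggests that the real implementation does not do this.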

Thank you for the instructions in your previous message. I hope that the 
programmers on this List know exactly how this must be translated into C 
so that it compiles and the API can read the compiled binaries, and that 
Microsoft will make and ship the kernel-level update you mention 
with one of the very next Windows Updates, so that all users whose 
Windows version is still maintained will be able to use keyboard layouts 
that can input WCHAR strings through dead keys.

Best regards,

Marcel


From charupdate at orange.fr  Sun Nov  6 12:33:39 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 6 Nov 2016 19:33:39 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <143705401.185.1478416945629.JavaMail.www@wwinf2212>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
 <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>
 <CAGa7JC2QTYYPSf0UGcXRvvnpO+PVoU8802gxKSFFzNOYakjGLA@mail.gmail.com>
 <CAGa7JC00ScDYO6yAJWK1qjkrBoEBKkPo765_0P76D5e=McCP7w@mail.gmail.com>
 <CAGa7JC14_GVbmuHZierM-WHFLSazDA5hAz53-Ymx+2YAjP+Gow@mail.gmail.com>
 <143705401.185.1478416945629.JavaMail.www@wwinf2212>
Message-ID: <1565548445.5480.1478457219235.JavaMail.www@wwinf2212>

To complete this thread prior to Microsoft's response, I'd quote in extenso 
the relevant part of the Standard. Though it basically matches the current state 
of the discussion, quoting it here seems useful since it highlights the fact that 
if end-users are used to dead keys (as in the francophone regions of Africa), 
urging them to swap base characters and diacritics is *not* straightforward. 
Keyboard layouts without dead keys and with combining diacritics on live keys 
are thus to be promoted in the anglophone regions of Africa, not the francophone 
ones, where layouts with string-generating dead keys seem to be mandatory:

TUS 9.0, §5.12 (Implementation Guidelines: Strategies for Handling Nonspacing Marks), p. 222:
|
| Keyboard Input
|
| A common implementation for the input of combining character sequences is the use of
| dead keys. These keys match the mechanics used by typewriters to generate such sequences
| through overtyping the base character after the nonspacing mark. In computer implementations,
| keyboards enter a special state when a dead key is pressed for the accent and emit a
| precomposed character only when one of a limited number of "legal" base characters is
| entered. It is straightforward to adapt such a system to emit combining character sequences
| or precomposed characters as needed.
|
| Typists, especially in the Latin script, are trained on systems that work using dead keys.
| However, many scripts in the Unicode Standard (including the Latin script) may be implemented
| according to the handwriting sequence, in which users type the base character first,
| followed by the accents or other nonspacing marks (see Figure 5-4).
|
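
The mechanism described in this passage can be sketched as a small state machine. This is a simplified model under stated assumptions, not the Windows implementation: the two tables, the struct names and key_press() are invented for the example, and the fallback (base character followed by the combining mark when no precomposed character exists) is one possible way of emitting combining character sequences "as needed".

```c
#include <stddef.h>
#include <stdint.h>

/* Dead keys, with the combining mark each one stands for. */
struct dead_key    { uint16_t spacing; uint16_t combining; };
/* The "legal" (dead key, base) pairs that have a precomposed character. */
struct composition { uint16_t dead; uint16_t base; uint16_t composed; };

static const struct dead_key dead_keys[] = {
    { 0x00B4, 0x0301 },  /* ACUTE ACCENT -> COMBINING ACUTE ACCENT */
    { 0x007E, 0x0303 },  /* TILDE        -> COMBINING TILDE */
};

static const struct composition compositions[] = {
    { 0x00B4, 0x0065, 0x00E9 },  /* acute + e -> U+00E9 */
    { 0x007E, 0x006E, 0x00F1 },  /* tilde + n -> U+00F1 */
};

struct keyboard { uint16_t pending; };  /* pending dead key, or 0 for none */

static uint16_t combining_for(uint16_t dead) {
    for (size_t i = 0; i < sizeof dead_keys / sizeof dead_keys[0]; i++)
        if (dead_keys[i].spacing == dead)
            return dead_keys[i].combining;
    return 0;
}

/* Processes one key; writes 0, 1 or 2 UTF-16 code units to out. */
size_t key_press(struct keyboard *kb, uint16_t ch, uint16_t out[2]) {
    if (kb->pending) {
        uint16_t dead = kb->pending;
        kb->pending = 0;
        for (size_t i = 0; i < sizeof compositions / sizeof compositions[0]; i++) {
            if (compositions[i].dead == dead && compositions[i].base == ch) {
                out[0] = compositions[i].composed;  /* precomposed character */
                return 1;
            }
        }
        out[0] = ch;                   /* no precomposed form: emit the */
        out[1] = combining_for(dead);  /* base, then the combining mark */
        return 2;
    }
    if (combining_for(ch)) {  /* a dead key: enter the special state */
        kb->pending = ch;
        return 0;
    }
    out[0] = ch;  /* ordinary key */
    return 1;
}
```

Pressing the acute dead key and then "e" yields the single code unit U+00E9, while the acute dead key followed by U+0190 yields the two code units U+0190 U+0301. The second case is exactly the kind of output that a protocol limited to one code unit per event cannot deliver.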

In another part, TUS mentions the downside (outdated legacy keyboard protocols):

TUS 9.0, §2.7 (General Structure: Unicode strings), p. 43:
|
| [...] While an ideal protocol would allow keyboard events to contain complete strings,
| many allow only a single UTF-16 code unit per event. [...]
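
To make the one-code-unit limit concrete: any character outside the BMP, such as a Gothic or Deseret letter, already needs two UTF-16 code units on its own. The following sketch shows the standard surrogate-pair computation; the function name utf16_encode is just a name chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point as UTF-16. Returns the number of code units
   written (1 or 2), or 0 if cp is not a Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                        /* out of range or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;           /* BMP: a single code unit */
        return 1;
    }
    cp -= 0x10000;                       /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For U+10330 GOTHIC LETTER AHSA this gives D800 DF30, and for U+10400 DESERET CAPITAL LETTER LONG I it gives D801 DC00, so neither fits into a single-code-unit keyboard event.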


BTW there is an obvious error in my last e-mail (quoted below):
> [...] that the developers were asked to ape the *then new* ISO/IEC 9995-3 group 
> selector by implementing it as a dead key, as a *remnant* group selector.
This is not about a dead key, but about a modifier key.

There is also a flaw in that I didn't mention that I pressed the related modifier:
> So I added 0x80 to the attribute of a key, expecting that this would 
> make it sensitive to the CapsLock toggle key VK_CAPITAL, because this 
> would match the ISO/IEC 9995 intent of having a secondary group that is 
> subject to CapsLock. But it did not work.
Should read:
"expecting that this would make it sensitive to CapsLock *on the 0x80 shift state*."

Lastly, subscribers who had trouble downloading the folder from dispoclavier.com 
are welcome to e-mail me off-list so that I can send sources and/or drivers without 
the script that is available at charupdate.info#drivers (translation to be completed), 
although I'm aware that developers using KbdUTool have typically already scripted 
the automation of the process.

Marcel


From mark at kli.org  Sun Nov  6 13:17:02 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Sun, 6 Nov 2016 14:17:02 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <20161104140236.665a7a7059d7ee80bb4d670165c8327d.ef5253d96e.wbe@email03.godaddy.com>
References: <20161104140236.665a7a7059d7ee80bb4d670165c8327d.ef5253d96e.wbe@email03.godaddy.com>
Message-ID: <4a495df6-9245-c5b2-1a49-abcde8496032@kli.org>

On 11/04/2016 05:02 PM, Doug Ewell wrote:
> Mark E. Shoulson wrote:
>
>> At any rate, this isn't Unicode's problem. Unicode would not be
>> creating anything in Klingon anyway!
> Well, to be fair, I thought IPR was the primary reason Unicode had never
> encoded the Apple logo either. I doubt that whether Unicode intended to
> use such a character themselves was a factor. (Of course, users who
> really wanted that character encoded are probably using ?? or ??
> now.)
>   
> --
> Doug Ewell | Thornton, CO, US | ewellic.org

The Apple logo is just that: a logo.  Unicode is/used to be explicitly 
NOT in the business of encoding logos, and only peripherally in the 
business of encoding cute Wingdings and icons.  pIqaD is an *alphabet* 
for writing a *language*; that's a whole different situation, and one 
that is squarely in what Unicode is all about doing.  "Should" the Apple 
logo have been encoded?  Possibly, though there are a lot of reasons not 
to which do not depend specifically on IP (we'd have to encode all the 
other emblems of all the other computer companies also... not to mention 
gasoline companies, cereal companies...) Should pIqaD be encoded?  It is 
my claim that it should, and that reasons not to are (mainly) limited to 
IP considerations.  In which case, IP considerations need to be 
addressed, yes, but they should not pre-determine the decision of 
whether or not it's worthy of inclusion.


~mark


From prosfilaes at gmail.com  Sun Nov  6 16:22:16 2016
From: prosfilaes at gmail.com (David Starner)
Date: Sun, 06 Nov 2016 22:22:16 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <42101413.334282.1478281304520@mail.yahoo.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
Message-ID: <CAMZ=zj6LOMsoU2vwM=1mE6zvTEOMxfqE2wq9UZ2WORY4CTU=yQ@mail.gmail.com>

On Fri, Nov 4, 2016 at 10:42 AM David Faulks <davidj_faulks at yahoo.ca> wrote:

> There is another issue of course, which I think could be a huge obstacle:
> the Trademark/Copyright issue. Paramount claims copyright over the entire
> Klingon language (presumably including the script). The issue has recently
> gone to court. Encoding criteria for symbols (and this likely extends to
> letters) is against encoding them without the permission of the
> Copyright/Trademark holder.
>

The US copyright office will not register letters for copyright: cf.
http://web.archive.org/web/20160304062736/http://www.ipmall.info/hosted_resources/CopyrightAppeals/2004/Mark%20Hendricksen.pdf
So the copyright issue is not relevant here.

From asmusf at ix.netcom.com  Sun Nov  6 20:16:37 2016
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sun, 6 Nov 2016 18:16:37 -0800
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <CAMZ=zj6LOMsoU2vwM=1mE6zvTEOMxfqE2wq9UZ2WORY4CTU=yQ@mail.gmail.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <CAMZ=zj6LOMsoU2vwM=1mE6zvTEOMxfqE2wq9UZ2WORY4CTU=yQ@mail.gmail.com>
Message-ID: <d9b3402a-0298-36fe-12d1-342d4aea5e29@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20161106/b30fcff9/attachment.html>

From Shawn.Steele at microsoft.com  Mon Nov  7 16:36:32 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Mon, 7 Nov 2016 22:36:32 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>
References: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>
Message-ID: <MWHPR03MB2813B27D4071C048600CE7B482A70@MWHPR03MB2813.namprd03.prod.outlook.com>

I guess for this thread I should subscribe to the list with a personal email address.  Please don't confuse my personal and professional opinions here ;)  (Of course I'll probably confuse them myself).


<Microsoft hat off>


Personally, as myself, no Microsoft hat, I would be interested to see the base characters encoded, excluding the "mummification glyph" and your 2 created characters.  The mummification glyph seems decorative and I haven't seen the others in use.  I would include the pIqaD comma and full stop, as they seem to have fairly consistent use.  Their meaning is also more specific than the triangle glyph suggestions you mentioned as possible alternatives.  Since these are used in plaintext conversations and not merely as decoration, I think that attempting to overload the meaning of the non-pIqaD triangle glyphs would be inappropriate.

The enthusiasts using pIqaD, and the businesses targeting that community, have, in my opinion, reached a level of adoption that requires proper Unicode encoding to make further progress.  The current ConScript PUA practice is a decent hack to get things to work, but in practice there can be strange behaviors, particularly in more advanced aspects of character behavior, such as the fact that the PUA range doesn't properly describe the character properties of these letters and digits.

For example, Qurgh and others figured out how to get pIqaD to behave in Facebook posts.  The current Klingon word of the day posts include the pIqaD spelling, and some discussion happens in pIqaD as well.  However getting it all to behave is unnecessarily awkward given some of the current restrictions requiring using the PUA for pIqaD.

Mark, you missed that pIqaD has an ISO script code now (Piqd).  That might be worth mentioning.  The PUA encoding makes it difficult or hacky to integrate some features for the Piqd script in computing libraries, such as digit conversion routines.
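
As an illustration of the digit-conversion point, here is a minimal sketch of what such a special case looks like today. It assumes the ConScript Unicode Registry assignment of the pIqaD digits zero to nine to U+F8F0..U+F8F9; the constant and function names are made up for the example. Since generic libraries see these code points only as private-use characters, every application has to carry a table or range check of this kind itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: ConScript Unicode Registry places pIqaD digits 0-9 here. */
#define PIQAD_DIGIT_ZERO 0xF8F0u
#define PIQAD_DIGIT_NINE 0xF8F9u

/* Returns 0..9 for a pIqaD digit in the ConScript PUA range, else -1. */
int piqad_digit_value(uint32_t cp) {
    if (cp >= PIQAD_DIGIT_ZERO && cp <= PIQAD_DIGIT_NINE)
        return (int)(cp - PIQAD_DIGIT_ZERO);
    return -1;
}

/* Parses a run of pIqaD digits as a decimal number.
   Returns -1 if the run is empty or contains anything else. */
long piqad_parse_number(const uint32_t *text, size_t length) {
    long value = 0;
    if (length == 0)
        return -1;
    for (size_t i = 0; i < length; i++) {
        int digit = piqad_digit_value(text[i]);
        if (digit < 0)
            return -1;
        value = value * 10 + digit;
    }
    return value;
}
```

With a real encoding, the digit values would come from the Unicode Character Database and this special case would disappear.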


<Microsoft hat on>


Professionally, I'm not sure if Microsoft has a current position on pIqaD.  As noted by Mark, the Bing Translator allows the use of pIqaD (tlh-Piqd), both for input and output.  I chose to use the ConScript PUA for that feature.  Had the pIqaD script been included in Unicode, we would have used the assigned Unicode codepoints instead of the ConScript PUA.

-Shawn

???? ?????
http://blogs.msdn.com/shawnste
http://bb-8.blogspot.com

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark Shoulson
Sent: ???????, ???????? 03, ??? 2016 16:44
To: unicode at unicode.org
Subject: The (Klingon) Empire Strikes Back


At the time of writing this letter it has not yet hit the UTC Document Register, but I have recently submitted a document revisiting the ever-popular issue of the encoding of Klingon "pIqaD".  The reason always given why it could not be encoded was that it did not enjoy enough usage, and so I've collected a bunch of examples to demonstrate that this is not true (scans and also web pages, etc.)  So the issue comes back up, and time to talk about it again.

Michael Everson: I basically copied your 1997 proposal into the document, with some minor changes.  I hope you don't mind.  And if you don't want to be on the hook for providing the glyphs to UTC, I can do that.  I think that proposal should serve as a starting-point for discussion anyway.  There are some things that maybe should be different:

1. the "SYMBOL FOR EMPIRE" also known as the "MUMMIFICATION GLYPH".  I don't know where the second name comes from, I don't know how important it is to encode it, and I don't know how much of a trademark headache it will cause with Paramount, as it is used pretty heavily in their imagery.  Something we'll have to talk about.

2. I put in the COMMA and FULL STOP, which were not in the original proposal but were in the ConScript registry entry.  The examples I have show them clearly being used.  UTC may decide to unify them with existing triangular shapes, which may or may not be a good idea.

3. For my part, I've invented a pair of ampersands for Klingon (Klingon has two words for "and": one for joining verbs/sentences and one for joining nouns (the former goes between its "conjunctands", the latter after them)), from ligatures of the letters in question.  They pretty much have NO usage, of course (and are not in the proposal), but maybe they should be presented to the community.

Document is available at http://web.meson.org/downloads/pIqaDReturns.pdf

Let the bickering begin!

~mark

From doug at ewellic.org  Mon Nov  7 16:59:36 2016
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 07 Nov 2016 15:59:36 -0700
Subject: The (Klingon) Empire Strikes Back
Message-ID: <20161107155936.665a7a7059d7ee80bb4d670165c8327d.1b935c880e.wbe@email03.godaddy.com>

Shawn Steele wrote:

> The PUA encoding makes it difficult or hacky to integrate some
> features for the pIqaD script in computing libraries, such as digit
> conversion routines.

Although somebody did create a Ewellic calculator for iOS that uses the
ConScript encoding:

https://itunes.apple.com/us/app/calculator-ewellic/id850838052
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From mark at kli.org  Mon Nov  7 18:46:09 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Mon, 7 Nov 2016 19:46:09 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <d9b3402a-0298-36fe-12d1-342d4aea5e29@ix.netcom.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <CAMZ=zj6LOMsoU2vwM=1mE6zvTEOMxfqE2wq9UZ2WORY4CTU=yQ@mail.gmail.com>
 <d9b3402a-0298-36fe-12d1-342d4aea5e29@ix.netcom.com>
Message-ID: <933c21bf-89ea-5078-eef7-7e0453cf02b6@kli.org>

Thanks, Asmus.

The document from the copyright office is pretty explicit and final, and 
it is pretty clear that you can't copyright an *alphabet*, that is 
*characters*.  You can copyright *glyphs* (a font), but that is another 
matter entirely.

I've heard that there are similar questions regarding tengwar and cirth, 
but it is notable that UTC *did* see fit to consider this question for 
them and determine that they were worthy of encoding (they are on the 
roadmap), even though they have not actually followed through on that 
yet, perhaps because of these very IP concerns.  Notably, pIqaD is not
only not on the roadmap, it is specifically listed on the "Not on the
Roadmap" page as an example of something that was not deemed worthy of
being on the roadmap.  If it's an IP issue, then someone will have to
explain to me why it applies so asymmetrically to Tolkien and Klingon
(and Blissymbolics, for that matter).  And yes, these are not the only
writing systems with these issues and will not be the last.  One way or
another, the question will have to be faced and dealt with; ignoring it
won't help.

~mark

On 11/06/2016 09:16 PM, Asmus Freytag wrote:
> On 11/6/2016 2:22 PM, David Starner wrote:
>>
>>
>> On Fri, Nov 4, 2016 at 10:42 AM David Faulks <davidj_faulks at yahoo.ca 
>> <mailto:davidj_faulks at yahoo.ca>> wrote:
>>
>>     There is another issue of course, which I think could be a huge
>>     obstacle: the Trademark/Copyright issue. Paramount claims
>>     copyright over the entire Klingon language (presumably including
>>     the script). The issue has recently gone to court. Encoding
>>     criteria for symbols (and this likely extends to letters) is
>>     against encoding them without the permission of the
>>     Copyright/Trademark holder.
>>
>>
>> The US copyright office will not register letters for copyright: cf. 
>> http://web.archive.org/web/20160304062736/http://www.ipmall.info/hosted_resources/CopyrightAppeals/2004/Mark%20Hendricksen.pdf
>> So the copyright issue is not relevant here.
>
> On the face of it, the cited statement seems to very broadly reject 
> the copyrightability of alphabets and writing systems, tracing that 
> decision back to statements of intent around the copyright legislation.
>
> Given that, I'd tend to concur with Doug that UTC should feel free to 
> discuss this on the merit, but that in the case of a positive outcome 
> the Consortium would of course have counsel review this issue. Given 
> that this won't be the only writing system for which the original 
> invention post-dates modern IP laws, it would probably be good to have 
> some clarity here.
>
> A./
>


From richard.wordingham at ntlworld.com  Tue Nov  8 02:30:25 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 8 Nov 2016 08:30:25 +0000
Subject: Multiple Preposed Marks
Message-ID: <20161108083025.47a4784c@JRWUBU2>

TUS Section 2.11 says, "If the combining characters can interact
typographically - for example, U+0304 combining macron and U+0308
combining diaeresis - then the order of graphic display is
determined by the order of coded characters (see Table 2-5).
By default, the diacritics or other combining characters are
positioned from the base character's glyph outward".

So, if I have two spacing combining marks E and O that are each
positioned to the left of the base (say X) in a left-to-right script,
so that the encodings <X, E> and <X, O> appear with the glyph orders
<gE, gX> and <gO, gX>, and codings <X, E, O> and <X, O, E>, if not
total gibberish, represent a horizontal sequence of the glyphs with
gX on the right, should <X, E, O> render as <gE, gO, gX> or <gO, gE,
gX>?  The phonetics and collation (in so far as it is meaningful) of
the words provide no cue as to the order of the encoded characters.  I
have encountered both renderings.

The issue came up when I was checking, in both the Firefox and MS Edge
browsers, that my OpenType Tai Tham font Da Lekh could handle all the
headwords of two Northern Thai dictionaries. (Sparing dotted circle
deletion and orthographic syllable reunification are tricky.)  One
of the dictionaries spells a few words with a combination of the Tai and
Pali notations for the vowel /o:/ in open syllables where one might
expect to see an independent vowel.

I'm down to two other rendering engine issues - a combination of tone
mark and then vowel in 4 words, where the dictionary probably has a
misspelling, and the need for an OpenType feature (probably a cvXX) for
inconsistent handling of U+1A58 MAI KANG LAI.  The latter may be a
challenge - I couldn't persuade MS Edge to use the font's Lao shaping
for the Tai Tham script or for the Latin script in a transliteration
mode.  (That mode is triggered by feature ss02 for the Latin script, and
that works well enough in browsers.)

Richard.


From richard.wordingham at ntlworld.com  Tue Nov  8 03:09:45 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 8 Nov 2016 09:09:45 +0000
Subject: Suppressing Ligation of Spacing Marks
Message-ID: <20161108090945.2f92771d@JRWUBU2>

Should it be possible to suppress the ligation of a base character and
a visually following spacing mark in plain text?

The example I have in mind is the sequence <U+1A36 TAI THAM LETTER NA,
U+1A63 TAI THAM VOWEL SIGN AA>.  It may be desirable to suppress the
ligation because both ligands have subscript consonants.  However, if
I write <NA, ..., ZWNJ, SIGN AA, ...>, the Universal Shaping Engine
decides that the ZWNJ triggers a new syllable, and inserts a dotted
circle before SIGN AA.  (The dotted circle after SIGN AA results from a
failure to read the proposal for the Lanna script as it was then
called.)

Richard.


From jcb+unicode at inf.ed.ac.uk  Tue Nov  8 05:58:26 2016
From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield)
Date: Tue,  8 Nov 2016 11:58:26 +0000 (GMT)
Subject: The (Klingon) Empire Strikes Back
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <CAMZ=zj6LOMsoU2vwM=1mE6zvTEOMxfqE2wq9UZ2WORY4CTU=yQ@mail.gmail.com>
 <d9b3402a-0298-36fe-12d1-342d4aea5e29@ix.netcom.com>
 <933c21bf-89ea-5078-eef7-7e0453cf02b6@kli.org>
Message-ID: <slrno23ff1.e9d.jcb@home.stevens-bradfield.com>

On 2016-11-08, Mark E. Shoulson <mark at kli.org> wrote:
> I've heard that there are similar questions regarding tengwar and cirth, 
> but it is notable that UTC *did* see fit to consider this question for 
> them and determine that they were worthy of encoding (they are on the 
> roadmap), even though they have not actually followed through on that 
> yet, perhaps because of these very IP concerns.  Notably, pIqaD is not 

The Tolkien Estate considers that the tengwar constitute a work of
art, and it's not willing to see them in Unicode, because this would
hinder its ability to pursue people using tengwar for what it
considers inappropriate purposes. (I finally asked them a couple of
years ago for permission to encode, based on Michael Everson's draft
proposal from yonks ago, and that's the summary of their reply.)

Several years ago, I was told on this list that it would be up to the
proposers to deal with this, and that the Unicode Consortium would
have no interest in taking on the 800lb legal gorilla that is the
Tolkien Estate. (Now a 24M? gorilla with what it got from New Line
Cinema.)

If some wealthy Unicode Consortium member feels like paying for an
American counsel's opinion that the Estate is just trying it on, feel
free!

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From c933103 at gmail.com  Tue Nov  8 08:02:05 2016
From: c933103 at gmail.com (gfb hjjhjh)
Date: Tue, 8 Nov 2016 22:02:05 +0800
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <42101413.334282.1478281304520@mail.yahoo.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
Message-ID: <CAGHjPP+CnjeMxQ-9rdC9FTm1pD14u7WLUJFqUL9Mu8jaV1n8vg@mail.gmail.com>

I believe there's already a court ruling that says languages and words are
not copyrightable, in the case about Loglan, although the trademarkability
of a language is another matter.

On 2016-11-05 01:42, "David Faulks" <davidj_faulks at yahoo.ca> wrote:

> > On Thu, 11/3/16, Mark Shoulson <mark at kli.org> wrote:
> > Subject: The (Klingon) Empire Strikes Back
>
> > At the time of writing this letter it has not yet hit the UTC
> > Document Register, but I have recently submitted a document
> > revisiting the ever-popular issue of the encoding of Klingon
> > "pIqaD".  The reason always given why it could not be
> > encoded was that it did not enjoy enough usage, and so I've
> > collected a bunch of examples to demonstrate that this is not
> > true (scans and also web pages, etc.)  So the issue comes
> > back up, and time to talk about it again.
>
> There is another issue of course, which I think could be a huge obstacle:
> the Trademark/Copyright issue. Paramount claims copyright over the entire
> Klingon language (presumably including the script). The issue has recently
> gone to court. Encoding criteria for symbols (and this likely extends to
> letters) is against encoding them without the permission of the
> Copyright/Trademark holder.
>
> Is Paramount endorsing your proposal?
>
> <snip>
>
> > ~mark
>
> David Faulks
>

From richard.wordingham at ntlworld.com  Tue Nov  8 14:30:26 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 8 Nov 2016 20:30:26 +0000
Subject: Multiple Preposed Marks
In-Reply-To: <20161108083025.47a4784c@JRWUBU2>
References: <20161108083025.47a4784c@JRWUBU2>
Message-ID: <20161108203026.2a56cb1d@JRWUBU2>

On Tue, 8 Nov 2016 08:30:25 +0000
Richard Wordingham <richard.wordingham at ntlworld.com> wrote:

> and the need for an OpenType feature (probably a cvXX)
> for inconsistent handling of U+1A58 MAI KANG LAI.  The latter may be a
> challenge - I couldn't persuade MS Edge to use the font's Lao shaping

General features (e.g. 'ss01') for Tai Tham work a treat in MS Edge, and
seem to be executed at the same time as the 'standard typographical
presentation', e.g. feature 'psts'.  Thank you!  That makes things much
easier.  (There seems to be quite a bit of variation in layout in Chiang
Mai province, never mind the rest of the region.)

Richard.

From charupdate at orange.fr  Tue Nov  8 15:02:16 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Tue, 8 Nov 2016 22:02:16 +0100 (CET)
Subject: Multiple Preposed Marks
In-Reply-To: <20161108203026.2a56cb1d@JRWUBU2>
References: <20161108083025.47a4784c@JRWUBU2> <20161108203026.2a56cb1d@JRWUBU2>
Message-ID: <1173503927.19308.1478638936401.JavaMail.www@wwinf1c23>

On Tue, 8 Nov 2016 21:36, Richard Wordingham wrote:
> 
> On Tue, 8 Nov 2016 08:30:25 +0000
> Richard Wordingham  wrote:
> 
> > and the need for an OpenType feature (probably a cvXX)
> > for inconsistent handling of U+1A58 MAI KANG LAI. The latter may be a
> > challenge - I couldn't persuade MS Edge to use the font's Lao shaping
> 
> General features (e.g. 'ss01') for Tai Tham work a treat in MS Edge, and
> seem to be executed at the same time as the 'standard typographical
> presentation', e.g. feature 'psts'. Thank you! That makes things much
> easier. [...]

"Where there's a will, there's a way!"

Marcel


From verdy_p at wanadoo.fr  Tue Nov  8 17:00:01 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 9 Nov 2016 00:00:01 +0100
Subject: Multiple Preposed Marks
In-Reply-To: <20161108083025.47a4784c@JRWUBU2>
References: <20161108083025.47a4784c@JRWUBU2>
Message-ID: <CAGa7JC17+oyL6evcyFRYd98BZzHQXYhb2ybApG4svasuL1wM6w@mail.gmail.com>

2016-11-08 9:30 GMT+01:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> TUS Section 2.11 says, "If the combining characters can interact
> typographically - for example, U+0304 combining macron and U+0308
> combining diaeresis - then the order of graphic display is
> determined by the order of coded characters (see Table 2-5).
> By default, the diacritics or other combining characters are
> positioned from the base character's glyph outward".
>

The phrase "If the combining characters can interact
typographically" would be better read as "If the combining characters have
the same non-zero combining class or any one of them has a zero combining
class".

Effectively the combining classes were historically intended to track these
possible graphic interactions, in order to allow or disable reordering and
detect canonical equivalences.

But now normalization is everywhere and causes the pairs using the
condition above to be freely reordered (or decomposed and recomposed,
meaning that the encoding order is NOT significant at all).

But it turned out that some diacritics may be positioned differently
according to their base character. For example, the cedilla attaches
below, where no interaction is expected with combining characters that
normally attach above (so reordering to canonical equivalents is
permitted, and in fact happens automatically when documents are encoded
or decoded), yet with some Latin letters these interactions do occur.
The only way to block the reordering (if you don't want the positions
inferred from the encoding order of normalized strings) is to insert a
COMBINING GRAPHEME JOINER (CGJ), which has combining class zero.

This sentence in TUS should have been updated long ago, because TUS does
not really know how characters will be positioned, and Unicode permits
reordering of pairs of diacritics if they do not block each other for
normalization.

This is important for the cedilla, but even more important for Hebrew
diacritics, whose combining classes do not correctly track their
relative positioning (as discussed on this list years ago, and known as
the "Hebrew points bug"). This will never change: the combining classes
are assigned permanently and continue to work for simple cases, but will
cause problems with some pairs that need a CGJ inserted.

This is also important for several Indic scripts that have complex
positioning rules if you use combining characters with non-zero combining
classes (initially intended for simple cases in Latin/Greek/Cyrillic).
Thankfully, the most critical diacritics in Indic scripts for such complex
cases have a combining class of zero (meaning that they block each other
and their relative encoding order is not affected by normalization), but
there are many cases where CGJ is needed.
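
The reordering behaviour described above can be checked with a minimal
Python sketch. It assumes only the standard `unicodedata` module; the
sample strings and the helper name `ccc_list` are illustrative choices,
not something taken from TUS or from this thread.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def ccc_list(s):
    """Return the canonical combining class of each code point in s."""
    return [unicodedata.combining(ch) for ch in s]


# Different non-zero classes: acute (230) encoded before cedilla (202).
# Canonical ordering sorts by class, so NFD swaps the two marks.
mixed = "c\u0301\u0327"
mixed_nfd = unicodedata.normalize("NFD", mixed)

# Same class: macron (230) then diaeresis (230). The order is preserved.
same = "a\u0304\u0308"
same_nfd = unicodedata.normalize("NFD", same)

# A CGJ between the two marks blocks the reordering.
blocked = "c\u0301" + CGJ + "\u0327"
blocked_nfd = unicodedata.normalize("NFD", blocked)


def show(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


for label, before, after in [
    ("mixed classes", mixed, mixed_nfd),
    ("same class", same, same_nfd),
    ("CGJ inserted", blocked, blocked_nfd),
]:
    print(f"{label}: {show(before)} -> {show(after)}  ccc={ccc_list(before)}")
```

Only the first string changes under NFD. The same-class pair keeps its
encoded order, which is the case the TUS sentence most directly covers.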

From richard.wordingham at ntlworld.com  Tue Nov  8 17:42:53 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 8 Nov 2016 23:42:53 +0000
Subject: Multiple Preposed Marks
In-Reply-To: <CAGa7JC17+oyL6evcyFRYd98BZzHQXYhb2ybApG4svasuL1wM6w@mail.gmail.com>
References: <20161108083025.47a4784c@JRWUBU2>
 <CAGa7JC17+oyL6evcyFRYd98BZzHQXYhb2ybApG4svasuL1wM6w@mail.gmail.com>
Message-ID: <20161108234253.52544213@JRWUBU2>

On Wed, 9 Nov 2016 00:00:01 +0100
Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> 2016-11-08 9:30 GMT+01:00 Richard Wordingham <
> richard.wordingham at ntlworld.com>:
> 
> > TUS Section 2.11 says, "If the combining characters can interact
> > typographically - for example, U+0304 combining macron and U+0308
> > combining diaeresis - then the order of graphic display is
> > determined by the order of coded characters (see Table 2-5).
> > By default, the diacritics or other combining characters are
> > positioned from the base character's glyph outward".
 
> The interpretation of   "If the combining characters can interact
> typographically" should be better read as "If the combining
> characters have the same non-zero combining class or any one of them
> has a zero combining class".

The combining marks in question both have canonical combining class 0.
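
A quick way to confirm this kind of property is to query the Unicode
Character Database from Python. This is a minimal sketch: the two marks
are not named at this point, so it uses three Tai Tham marks named
elsewhere in this thread (U+1A55, U+1A63, U+1A6F) as stand-ins, and it
relies only on the standard `unicodedata` module.

```python
import unicodedata

# Stand-in marks: MEDIAL RA, VOWEL SIGN AA, VOWEL SIGN AE.
marks = [0x1A55, 0x1A63, 0x1A6F]

for cp in marks:
    ch = chr(cp)
    print(
        f"U+{cp:04X} {unicodedata.name(ch)}: "
        f"gc={unicodedata.category(ch)} ccc={unicodedata.combining(ch)}"
    )

# With class 0 on both marks, normalization leaves either order untouched,
# so it cannot be used to decide which encoding order is the right one.
order_a = "\u1A38\u1A55\u1A6F"
order_b = "\u1A38\u1A6F\u1A55"
unchanged_a = unicodedata.normalize("NFC", order_a) == order_a
unchanged_b = unicodedata.normalize("NFC", order_b) == order_b
print(unchanged_a, unchanged_b)
```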

> But now normalization is everywhere and causes the pairs using the
> condition above to be freely reordered (or decomposed and recomposed,
> meaning that the encoding order is NOT significant at all).

I believe a renderer is permitted to treat canonically equivalent
sequences differently so long as it does not believe it should treat
them differently.  However, that is irrelevant to this case.

Richard.


From verdy_p at wanadoo.fr  Tue Nov  8 20:26:51 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 9 Nov 2016 03:26:51 +0100
Subject: Multiple Preposed Marks
In-Reply-To: <20161108234253.52544213@JRWUBU2>
References: <20161108083025.47a4784c@JRWUBU2>
 <CAGa7JC17+oyL6evcyFRYd98BZzHQXYhb2ybApG4svasuL1wM6w@mail.gmail.com>
 <20161108234253.52544213@JRWUBU2>
Message-ID: <CAGa7JC0MWUTMY+5CaZPcfCQ+BJWYLAgVw7bg_j8NJ0no4aaqDw@mail.gmail.com>

2016-11-09 0:42 GMT+01:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> On Wed, 9 Nov 2016 00:00:01 +0100
> Philippe Verdy <verdy_p at wanadoo.fr> wrote:
>
> > 2016-11-08 9:30 GMT+01:00 Richard Wordingham <
> > richard.wordingham at ntlworld.com>:
> >
> > > TUS Section 2.11 says, "If the combining characters can interact
> > > typographically - for example, U+0304 combining macron and U+0308
> > > combining diaeresis - then the order of graphic display is
> > > determined by the order of coded characters (see Table 2-5).
> > > By default, the diacritics or other combining characters are
> > > positioned from the base character's glyph outward".
>
> > The interpretation of   "If the combining characters can interact
> > typographically" should be better read as "If the combining
> > characters have the same non-zero combining class or any one of them
> > has a zero combining class".
>
> The combining marks in question both have canonical combining class 0.
>
> > But now normalization is everywhere and causes the pairs using the
> > condition above to be freely reordered (or decomposed and recomposed,
> > meaning that the encoding order is NOT significant at all).
>
> I believe a renderer is permitted to treat canonically equivalent
> sequence differently so long as it does not believe it should treat
> them differently.  However, that is irrelevant to this case.
>

This is DIRECTLY relevant to the sentence in TUS you quoted, which is all
about combining characters that are encoded after the base letter, often
have non-zero combining classes, and are reorderable.

But evidently this sentence in TUS is not relevant to "prepended" combining
marks that all have combining class 0, here "prepended" meaning:
encoded before the base character, not after it, even if they
visually combine before it (as is the case for well-known Indic vowels
that now have non-zero combining classes that allow them to be reordered
before other combining marks when normalizing, while still remaining
encoded after the base consonant).

What I want to say is that this sentence in TUS is quite ambiguous: it
speaks about graphic interaction, but this is not really encoded in text
sequences, and it forgets the effect of combining classes on combining
sequences, which NEVER considers any actual graphic interaction. That is
simply because it is not specified, and the actual graphic interactions
may depend on font styles (notably in honorific Arabic typography using
very complex layouts, but even within the Latin script when using
decorated font styles or custom ligatures, where complex interactions also
occur, including over larger spans than clusters, such as full words).

From verdy_p at wanadoo.fr  Tue Nov  8 20:42:07 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 9 Nov 2016 03:42:07 +0100
Subject: Suppressing Ligation of Spacing Marks
In-Reply-To: <20161108090945.2f92771d@JRWUBU2>
References: <20161108090945.2f92771d@JRWUBU2>
Message-ID: <CAGa7JC0sa-1XNRoFG7ZrdgThMqNgu47SLwKMj44WTS=JETxQUQ@mail.gmail.com>

Inserting some zero-width word joiner or disjoiner should work with this...
But if you see a dotted circle, you need to encode some zero-width space as
the base holder for the combining vowel sign following it.

However, I wonder whether fonts accept zero-width holders for combining
vowels; they could still assume that there's no matching base consonant and
thus insert another base dotted circle.

There's no consensus across scripts for using the same null-base holder
acting as a pseudo-consonant for vowels encoded after them (e.g. Hangul
has its own jamo holder for this because of its specific algorithmic
composition, but some other scripts also use such null holders for their
own orthography). In alphabetic scripts, the ZWNJ should work.

But in Indic scripts we are all depending on the capability of renderers to
support specific scripts with only specific subsets of base letters and
every other character outside this subset will trigger the insertion of a
dotted circle glyph, and ZWJ/ZWNJ is already specific for being used in
script-specific clusters for some distinctions (notably to control how
parts of clusters are subgrouped ...)

You'll need to "bug" the maintainers of the renderer if they forgot
necessary cases described earlier for the script when it was initially
approved for encoding.

2016-11-08 10:09 GMT+01:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> Should it be possible to suppress the ligation of a base character and
> a visually following spacing mark in plain text?
>
> The example I have in mind is the sequence <U+1A36 TAI THAM LETTER NA,
> U+1A63 TAI THAM VOWEL SIGN AA>.  It may be desirable to suppress the
> ligation because both ligands have subscript consonants.  However, if
> I write <NA, ..., ZWNJ, SIGN AA, ...>, the Universal Shaping Engine
> decides that the ZWNJ triggers a new syllable, and inserts a dotted
> circle before SIGN AA.  (The dotted circle after SIGN AA results from a
> failure to read the proposal for the Lanna script as it was then
> called.)
>
> Richard.
>
>

From unicode at lindenbergsoftware.com  Wed Nov  9 07:10:34 2016
From: unicode at lindenbergsoftware.com (Norbert Lindenberg)
Date: Wed, 9 Nov 2016 22:10:34 +0900
Subject: Suppressing Ligation of Spacing Marks
In-Reply-To: <20161108090945.2f92771d@JRWUBU2>
References: <20161108090945.2f92771d@JRWUBU2>
Message-ID: <F64E3236-9B9C-475E-AE5E-D7AA322281A3@lindenbergsoftware.com>

The part of the specification of the Universal Shaping Engine [1] that deals with ZWNJ is a bit unclear, but I read it to mean that ZWNJ should not cause the insertion of a dotted circle if the character following it has general category Mn or Mc.

The USE specification says: "The zero-width non-joiner is used to prevent a fusion of two characters. It continues a preceding cluster but causes a cluster break after itself when the following character is not a mark character (gc=Mn or gc=Mc)."

The specification does not say how this character should be handled in cluster validation. I assume first that the statement about the combining grapheme joiner also applies to ZWNJ: "CGJ has been omitted from the above schema in order to avoid unnecessary complexity". I further interpret the little the spec does say about ZWNJ to imply that it should be allowed before any character with general category Mn or Mc, without affecting the validity of the cluster. Inserting a dotted circle would be equivalent to causing a cluster break, which the spec rules out when the following character has general category Mn or Mc.

U+1A63 has gc=Mc, so it shouldn't be preceded by a dotted circle in the sequence <NA, ZWNJ, SIGN AA, ...>. Note that I omitted the first "..." from the sequence you provided, because an intervening character might trigger the dotted circle.
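
The reading above can be sketched in a few lines of Python. This is only
an illustration of the quoted sentence, not of any shaping engine's real
code: the function name `zwnj_breaks_cluster` is hypothetical, and
treating a ZWNJ at the end of the text as a break is an assumption.

```python
import unicodedata

ZWNJ = "\u200C"
MARK_CATEGORIES = ("Mn", "Mc")


def zwnj_breaks_cluster(text, i):
    """Return True if the ZWNJ at index i causes a cluster break after itself.

    Follows the quoted wording: the break happens when the following
    character is not a mark (general category Mn or Mc).
    """
    if text[i] != ZWNJ:
        raise ValueError("text[i] is not ZWNJ")
    if i + 1 >= len(text):
        return True  # assumption: nothing follows, so the cluster ends here
    return unicodedata.category(text[i + 1]) not in MARK_CATEGORIES


# <NA, ZWNJ, SIGN AA>: SIGN AA has gc=Mc, so no break and no dotted circle.
print(zwnj_breaks_cluster("\u1A36\u200C\u1A63", 1))

# <NA, ZWNJ, NA>: the second NA is a letter (gc=Lo), so the cluster breaks.
print(zwnj_breaks_cluster("\u1A36\u200C\u1A36", 1))
```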

So this may just be a bug in the implementation of the USE that you're using. I see this bug in Safari (CoreText), but not in Firefox (Harfbuzz); haven't tried Edge. Which one are you using?

[1] http://www.microsoft.com/typography/OpenTypeDev/USE/intro.htm

Best regards,
Norbert


> On Nov 8, 2016, at 18:09 , Richard Wordingham <richard.wordingham at ntlworld.com> wrote:
> 
> Should it be possible to suppress the ligation of a base character and
> a visually following spacing mark in plain text?
> 
> The example I have in mind is the sequence <U+1A36 TAI THAM LETTER NA,
> U+1A63 TAI THAM VOWEL SIGN AA>.  It may be desirable to suppress the
> ligation because both ligands have subscript consonants.  However, if
> I write <NA, ..., ZWNJ, SIGN AA, ...>, the Universal Shaping Engine
> decides that the ZWNJ triggers a new syllable, and inserts a dotted
> circle before SIGN AA.  (The dotted circle after SIGN AA results from a
> failure to read the proposal for the Lanna script as it was then
> called.)
> 
> Richard.
> 


From richard.wordingham at ntlworld.com  Wed Nov  9 13:53:35 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 9 Nov 2016 19:53:35 +0000
Subject: Suppressing Ligation of Spacing Marks
In-Reply-To: <F64E3236-9B9C-475E-AE5E-D7AA322281A3@lindenbergsoftware.com>
References: <20161108090945.2f92771d@JRWUBU2>
 <F64E3236-9B9C-475E-AE5E-D7AA322281A3@lindenbergsoftware.com>
Message-ID: <20161109195335.349183b3@JRWUBU2>

On Wed, 9 Nov 2016 22:10:34 +0900
Norbert Lindenberg <unicode at lindenbergsoftware.com> wrote:

> The part of the specification of the Universal Shaping Engine [1]
> that deals with ZWNJ is a bit unclear, but I read it to mean that
> ZWNJ should not cause the insertion of a dotted circle if the
> character following it has general category Mn or Mc.
> 
> The USE specification says: "The zero-width non-joiner is used to
> prevent a fusion of two characters. It continues a preceding cluster
> but causes a cluster break after itself when the following character
> is not a mark character (gc=Mn or gc=Mc)."
> 
> The specification does not say how this character should be handled
> in cluster validation. I assume first that the statement about the
> combining grapheme joiner also applies to ZWNJ: "CGJ has been omitted
> from the above schema in order to avoid unnecessary complexity". I
> further interpret the little the spec does say about ZWNJ to imply
> that it should be allowed before any character with general category
> Mn or Mc, without affecting the validity of the cluster. Inserting a
> dotted circle would be equivalent to causing a cluster break, which
> the spec rules out when the following character has general category
> Mn or Mc.

That makes sense, but I was hoping for an opinion independent of the
Microsoft policy.

>  U+1A63 has gc=Mc, so it shouldn't be preceded by a dotted circle in
> the sequence <NA, ZWNJ, SIGN AA, ...>. Note that I omitted the first
> "..." from the sequence you provided, because an intervening character
> might trigger the dotted circle.

The word, meaning 'to foretell', can be seen at
http://www.wrdingham.co.uk/lanna/renderer_test.htm .  The full encoding
of the syllable is <U+1A36 NA, U+1A60 SAKOT, U+1A45 WA, U+200C ZWNJ,
U+1A63 SIGN AA, U+1A60 SAKOT, U+1A3F LOW YA>.  MS Edge, running on an
evaluation copy of Windows 10 kindly provided for checking web page
displays in MS Edge, inserts dotted circles after* ZWNJ and before the
second SAKOT.  The second insertion is because USE does not recognise
Indic CVC orthographic syllables, which make up about half the native
vocabulary in the region.  Pali is less badly affected, though one
can't write _nibbāna_ 'nirvana' properly and the Tai Khuen may be
unhappy with how they have to write _dhamma_ 'dharma' and its compounds
in Pali.
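
For reference, the properties of each code point in that syllable can be
listed with a short Python sketch. It assumes only the standard
`unicodedata` module; the helper name `describe` is illustrative, and the
output reflects whatever Unicode version the Python build ships with.

```python
import unicodedata

# The syllable as encoded above: NA, SAKOT, WA, ZWNJ, SIGN AA, SAKOT, LOW YA.
syllable = [0x1A36, 0x1A60, 0x1A45, 0x200C, 0x1A63, 0x1A60, 0x1A3F]


def describe(code_points):
    """Return (U+XXXX, name, general category, combining class) per code point."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append(
            (
                f"U+{cp:04X}",
                unicodedata.name(ch),
                unicodedata.category(ch),
                unicodedata.combining(ch),
            )
        )
    return rows


rows = describe(syllable)
for code, name, gc, ccc in rows:
    print(f"{code}  gc={gc}  ccc={ccc:<3} {name}")
```

The character straight after ZWNJ is a spacing mark (gc=Mc), which is the
property the quoted USE wording depends on.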

*I know it's after because of the 'shaping' in the Da Lekh font, which
eliminates the vast bulk of the dotted circles misinserted by USE,
whose specification is wrong.

> So this may just be a bug in the implementation of the USE that
> you're using. I see this bug in Safari (CoreText), but not in Firefox
> (Harfbuzz); haven't tried Edge. Which one are you using?

MS Edge (see above).  The dotted circle behaviour of HarfBuzz and MS
Edge is different - I have dotted circle lookups in my font dedicated to
HarfBuzz patterns that don't occur in MS Edge.  I haven't checked my
font to destruction yet (6 marks will generally overwhelm it); I've
just thrown two Northern Thai dictionaries at it.

Richard.


From richard.wordingham at ntlworld.com  Wed Nov  9 14:27:42 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 9 Nov 2016 20:27:42 +0000
Subject: Multiple Preposed Marks
In-Reply-To: <CAGa7JC0MWUTMY+5CaZPcfCQ+BJWYLAgVw7bg_j8NJ0no4aaqDw@mail.gmail.com>
References: <20161108083025.47a4784c@JRWUBU2>
 <CAGa7JC17+oyL6evcyFRYd98BZzHQXYhb2ybApG4svasuL1wM6w@mail.gmail.com>
 <20161108234253.52544213@JRWUBU2>
 <CAGa7JC0MWUTMY+5CaZPcfCQ+BJWYLAgVw7bg_j8NJ0no4aaqDw@mail.gmail.com>
Message-ID: <20161109202742.06df65c6@JRWUBU2>

On Wed, 9 Nov 2016 03:26:51 +0100
Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> 2016-11-09 0:42 GMT+01:00 Richard Wordingham <
> richard.wordingham at ntlworld.com>:

> > I believe a renderer is permitted to treat canonically equivalent
> > sequence differently so long as it does not believe it should treat
> > them differently.  However, that is irrelevant to this case.
 
> This is DIRECTLY relevant to the sentence in TUS you quoted, which is
> all about combining characters encoded after the base letter and
> often have non-zero combining classes and are reorderable

As you pointed out, it most clearly addresses the case of two combining
marks with the same canonical combining class, and obviously in such a
case the sequence is not reorderable.
 
> But evidently this sentence in TUS is not relevant to "prepended"
> combining marks that are all with combining class 0, here "prepended"
> meaning: encoded before the base character, but not after it even if
> they are visually combining before it, as is the case for wellknown
> Indic vowels that have now non-zero combining classes that allow them
> to be reordered before other combining marks when normalizing, but
> still remaining encoded after the base consonnant).

I can't guess what you mean:
(a) The combining marks in question *follow* the base consonant, but are
rendered before it.  'Preposition' is a property of abstract
characters, not of codepoints.

(b) All characters with an Indic Positional Category of 'left' (or
similar) have canonical combining class 0.

There is a simple example of the base outwards rule in the Tai Tham
script.  The only way of encoding Northern Thai /p???/ 'to change' with
the glyphs of U+1A38 TAI THAM LETTER HIGH PA, U+1A55 TAI THAM CONSONANT
SIGN MEDIAL RA and U+1A6F TAI THAM VOWEL SIGN AE acceptable to the
Universal Shaping engine is <U+1A38, U+1A55, U+1A6F>, and the visual
order is the reverse of the encoding order.  Unfortunately, it could be
argued that the encoding order is independent of the visual order.
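
For anyone who wants to check point (b) and the Tai Tham sequence for
themselves, here is a minimal sketch using Python's standard unicodedata
module.  The helper name ccc_table is made up for illustration.  It only
shows that the marks involved have canonical combining class 0, so
normalization cannot reorder them; it says nothing about what a shaping
engine will accept.

```python
import unicodedata

# HIGH PA, MEDIAL RA, VOWEL SIGN AE, in the encoding order given above.
word = "\u1A38\u1A55\u1A6F"

def ccc_table(text):
    """Return (code point, name, canonical combining class) per character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
            for c in text]

for row in ccc_table(word):
    print(row)

# All classes are 0, so canonical reordering leaves the sequence untouched.
print(unicodedata.normalize("NFD", word) == word)

# A preposed Indic vowel sign, DEVANAGARI VOWEL SIGN I, is also class 0.
print(unicodedata.combining("\u093F"))
```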

Richard.


From verdy_p at wanadoo.fr  Wed Nov  9 15:23:28 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 9 Nov 2016 22:23:28 +0100
Subject: Multiple Preposed Marks
In-Reply-To: <20161109202742.06df65c6@JRWUBU2>
References: <20161108083025.47a4784c@JRWUBU2>
 <CAGa7JC17+oyL6evcyFRYd98BZzHQXYhb2ybApG4svasuL1wM6w@mail.gmail.com>
 <20161108234253.52544213@JRWUBU2>
 <CAGa7JC0MWUTMY+5CaZPcfCQ+BJWYLAgVw7bg_j8NJ0no4aaqDw@mail.gmail.com>
 <20161109202742.06df65c6@JRWUBU2>
Message-ID: <CAGa7JC0_3uRkjo6k48r9YHs9KDBcXLLftX7Z=NiLbvTeOcWi=g@mail.gmail.com>

2016-11-09 21:27 GMT+01:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> On Wed, 9 Nov 2016 03:26:51 +0100
> Philippe Verdy <verdy_p at wanadoo.fr> wrote:
>
> > 2016-11-09 0:42 GMT+01:00 Richard Wordingham <
> > richard.wordingham at ntlworld.com>:
>
> > > I believe a renderer is permitted to treat canonically equivalent
> > > sequence differently so long as it does not believe it should treat
> > > them differently.  However, that is irrelevant to this case.
>
> > This is DIRECTLY relevant to the sentence in TUS you quoted, which is
> > all about combining characters encoded after the base letter and
> > often have non-zero combining classes and are reorderable
>
> As you pointed out, it most clearly addresses the case of two combining
> marks with the same canonical combining class, and obviously in such a
> case the sequence is not reorderable.
>
> > But evidently this sentence in TUS is not relevant to "prepended"
> > combining marks that are all with combining class 0, here "prepended"
> > meaning: encoded before the base character, but not after it even if
> > they are visually combining before it, as is the case for wellknown
> > Indic vowels that have now non-zero combining classes that allow them
> > to be reordered before other combining marks when normalizing, but
> > still remaining encoded after the base consonnant).
>
> I can't guess what you mean:
> (a) The combining marks in question *follow* the base consonant, but are
> rendered before it.  'Preposition' is a property of abstract
> characters, not of codepoints.
>
> (b) All characters with an Indic Positional Category of 'left' (or
> similar) have canonical combining class 0.
>

Reread, I was very clear between these two cases, explicitly saying that
"PREPENDED" meant case (b). And yes I also said explicitly these had
combining class 0 and that they were then not subject to mutual reordering.

But the TUS sentence that YOU quoted falls completely in case (a),
where "combining marks" may still appear before the base but are always
encoded after it, and where they are freely (indistinctly) reorderable if
they have distinct non-zero combining classes: these combining characters
then have no well-defined mutual positions. In these cases they are
"supposed" not to "interact typographically" (because they were encoded
with distinct positional combining classes), but this turns out to be wrong
in various cases, notably for Hebrew diacritics (between vowel points and
other points modifying the consonant) and for several kinds of Indic
diacritics (mixes of vowels, half-vowels, and "liquid" half-consonants, and
within consonant clusters). There are also some complex cases when using
non-Indic diacritics over Indic letters/clusters.

For all these cases (a), CGJ must be used to block the possible reorderings,
so that the layout of clusters can be composed with the expected
typographic interactions when such interactions can effectively occur.
The reason is that the **effective** relative position is DEFINITELY NOT
explicitly encoded in any of these combining characters with non-zero
combining classes. The names of those classes, like "above" or "below", are
counter-intuitive: they only work in the most frequent simple cases, where
there is a single diacritic after a base letter, and for most base letters
but not all, and they take no account of the possible creation of
ligatures and complex clusters, notably in traditional Arabic, or in
decorative typographies for almost all scripts, including Latin!
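
The blocking effect of CGJ can be seen in a small sketch with Python's
standard unicodedata module. The choice of LAMED with PATAH and HIRIQ is
only an illustration picked for this sketch, and the helper name classes
is hypothetical; the point is just that two marks with distinct non-zero
classes get reordered by normalization unless U+034F COMBINING GRAPHEME
JOINER (class 0) sits between them.

```python
import unicodedata

LAMED, PATAH, HIRIQ, CGJ = "\u05DC", "\u05B7", "\u05B4", "\u034F"

def classes(text):
    """Canonical combining class of each character in text."""
    return [unicodedata.combining(c) for c in text]

plain = LAMED + PATAH + HIRIQ          # marks typed in the intended order
blocked = LAMED + PATAH + CGJ + HIRIQ  # same, with CGJ between the marks

print(classes(plain))

# Without CGJ, normalization sorts the marks by class: HIRIQ moves first.
print(unicodedata.normalize("NFC", plain) == LAMED + HIRIQ + PATAH)

# With CGJ, the typed order survives normalization.
print(unicodedata.normalize("NFC", blocked) == blocked)
```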

If you're still not convinced, look at how complex typographies are used
for "the name of God" in various religions and denominations (it's not just
the case of the Hebrew "tetragram"). You can also look at "calligrammes"
where the usual script layout is completely relaxed and where diacritics
may be moved anywhere around words and not necessarily near the base
letter; it is impossible to represent this typography with characters and
their Unicode properties. Indic scripts however have formalized some of
these freedoms of placements using complex positioning rules that are part
of their most common form.

From petercon at microsoft.com  Wed Nov  9 22:49:12 2016
From: petercon at microsoft.com (Peter Constable)
Date: Thu, 10 Nov 2016 04:49:12 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
Message-ID: <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark E. Shoulson
Sent: Friday, November 4, 2016 1:18 PM

> At any rate, this isn't Unicode's problem?

You saying that potential IP issues are not Unicode's problem does not in fact make it not a problem. A statement in writing from authorized Paramount representatives stating it would not be a problem for either Unicode, its members or implementers of Unicode would make it not a problem for Unicode.


Peter

From mats.gbproject at gmail.com  Thu Nov 10 05:47:41 2016
From: mats.gbproject at gmail.com (Mats Blakstad)
Date: Thu, 10 Nov 2016 12:47:41 +0100
Subject: Dataset for all ISO639 code sorted by country/territory?
In-Reply-To: <20160920093425.665a7a7059d7ee80bb4d670165c8327d.219e1cf756.wbe@email03.godaddy.com>
References: <20160920093425.665a7a7059d7ee80bb4d670165c8327d.219e1cf756.wbe@email03.godaddy.com>
Message-ID: <CAP=1PAXKmvXfOqHo2Yp8ZXLAo-=AaW3rUPScZZiN3122S4AF9g@mail.gmail.com>

On 20 September 2016 at 18:34, Doug Ewell <doug at ewellic.org> wrote:

> > Is there any dataset that contains all languages in the world sorted
> > by country/territory?
>
> As others have pointed out, be careful about how slippery this slope can
> get. Everyone has his or her own opinion about how many speakers of
> Language X in country Y need to be identified, estimated, or conjectured
> in order to say that "language X is spoken in country Y."
>

For myself, I was not actually considering the number of speakers in each
country, but rather mapping languages to the countries/territories where the
language originated or has been spoken traditionally.
For instance, in Norway we have many immigrants from Pakistan, but I
doubt any of them would expect to see Urdu sorted under Norway, even though
there are many people in Norway who speak Urdu.
They would expect to see it under Pakistan, which is their heritage
country. I guess this is also very much an identity issue.

I do understand that it is not easy to get a perfect language-country
mapping, and I guess the mapping also depends on the use.
For myself, I want people to be able to sort languages by
country/territory to make it easier to make lists of translations. I
think it can be good to be able to sort by territory instead of providing
a looong list of languages.
So I guess what matters is which languages people mostly expect to find
under the country/territory.


>
> > I manage to find a dataset on the website of Ethnologue, though it
> > doesn't look like open source, need to check with them exactly how I'm
> > allowed to use it:
> > http://www.ethnologue.com/codes/download-code-tables
>
> The readme file included in the downloadable zip file makes SIL's terms
> very clear. Basically you need to credit SIL as the source of the data,
> not change it, and not make the data directly available for others to
> download. It's best not to get caught up in "open source" as if any
> other terms would make the data totally unusable.
>
>
I agree that a dataset is not unusable just because it is not open source,
but for myself I in fact need a downloadable file!

I tried contacting SIL, but they will only sell the dataset for a fee and will
not give an open source license.

Would it be possible to extend this dataset to all languages and start
building an open source dataset for language-territory mapping?
http://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html

From doug at ewellic.org  Thu Nov 10 11:56:58 2016
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 10 Nov 2016 10:56:58 -0700
Subject: Dataset for all ISO639 code sorted by country/territory?
Message-ID: <20161110105658.665a7a7059d7ee80bb4d670165c8327d.a8ff034ef1.wbe@email03.godaddy.com>

Mats Blakstad wrote:

> For myself I was not actually considering the amount of speakers in
> each country, but to map languages with countries/territories where
> the language originated or have been spoken traditionally.

And that is where I think you'll have disagreement on the details.

> So I guess what matters is which language people mostly expect to find
> under the country/territory.

Yep, that's the challenge.

> Would it be possible to extend this dataset to all languages and start
> build an open source data set for language-territory mapping?
> http://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html 

That's a good question for the CLDR folks, who have their own mailing
list.

Keep in mind that the CLDR table documents 675 of the world's best-known
languages, counting variants such as three different orthographies of
Uzbek. While anything is possible, extending this to "all languages,"
e.g. the other 6,300 lesser-known living languages, might require a bit
of time and money.

There is also a resource in the "UDHR in Unicode" project that might be
worth investigating, though it too is an imperfect match with what you
seem to be looking for.

--
Doug Ewell | Thornton, CO, US | ewellic.org


From Shawn.Steele at microsoft.com  Thu Nov 10 12:33:55 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Thu, 10 Nov 2016 18:33:55 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
Message-ID: <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>

More generally, does that mean that alphabets with perceived owners will only be considered for encoding with permission from those owner(s)?  What if the ownership is ambiguous or unclear?

Getting permission may be a lot of work, or cost money, in some cases.  Will applications be considered pending permission, perhaps being provisionally approved until such permission is received?

Is there specific language that Unicode would require from owners to be comfortable in these cases?  It makes little sense for a submitter to go through a complex exercise to request permission if Unicode is not comfortable with the wording of the permission that is garnered.  Are there other such agreements that could perhaps be used as templates?

Historically, the message pIqaD supporters have heard from Unicode has been that pIqaD is a toy script that does not have enough use.  The new proposal attempts to respond to those concerns, particularly since there is more interest in the script now.  Now, additional (valid) concerns are being raised.

In Mark's case it seems like it would be nice if Unicode could consider the rest of the proposal and either tentatively approve it pending Paramount's approval, or to provide feedback as to other defects in the proposal that would need addressed for consideration.  Meanwhile Mark can figure out how to get Paramount's agreement.

-Shawn

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Peter Constable
Sent: Wednesday, November 9, 2016 8:49 PM
To: Mark E. Shoulson <mark at kli.org>; David Faulks <davidj_faulks at yahoo.ca>
Cc: Unicode Mailing List <unicode at unicode.org>
Subject: RE: The (Klingon) Empire Strikes Back

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark E. Shoulson
Sent: Friday, November 4, 2016 1:18 PM
> At any rate, this isn't Unicode's problem?

You saying that potential IP issues are not Unicode's problem does not in fact make it not a problem. A statement in writing from authorized Paramount representatives stating it would not be a problem for either Unicode, its members or implementers of Unicode would make it not a problem for Unicode.


Peter

From andrewcwest at gmail.com  Thu Nov 10 13:25:50 2016
From: andrewcwest at gmail.com (Andrew West)
Date: Thu, 10 Nov 2016 19:25:50 +0000
Subject: Dataset for all ISO639 code sorted by country/territory?
In-Reply-To: <20161110105658.665a7a7059d7ee80bb4d670165c8327d.a8ff034ef1.wbe@email03.godaddy.com>
References: <20161110105658.665a7a7059d7ee80bb4d670165c8327d.a8ff034ef1.wbe@email03.godaddy.com>
Message-ID: <CALgEMhx2Dj_qn0yr8jA07epDkJV5o_yXqPyiUZfN=xqssWzkSA@mail.gmail.com>

On 10 November 2016 at 17:56, Doug Ewell <doug at ewellic.org> wrote:
>
> Keep in mind that the CLDR table documents 675 of the world's best-known
> languages, counting variants such as three different orthographies of
> Uzbek.

Oddly, it seems that there are over 1.2 billion speakers of Cantonese
in China, but no speakers of Mandarin (the biggest language by number
of speakers in the world).

Andrew

From Shawn.Steele at microsoft.com  Thu Nov 10 13:34:53 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Thu, 10 Nov 2016 19:34:53 +0000
Subject: Dataset for all ISO639 code sorted by country/territory?
In-Reply-To: <20161110105658.665a7a7059d7ee80bb4d670165c8327d.a8ff034ef1.wbe@email03.godaddy.com>
References: <20161110105658.665a7a7059d7ee80bb4d670165c8327d.a8ff034ef1.wbe@email03.godaddy.com>
Message-ID: <MWHPR03MB2813833E365234420070DF3582B80@MWHPR03MB2813.namprd03.prod.outlook.com>

I didn't really say anything because this is kinda a hopeless task, but it seems like some realities are being overlooked.  I'm as curious about cataloguing everything as the next OCD guy, but a general solution doesn't seem practical.

* There are a *lot* of languages
* Many countries have speakers of several languages.
	* In the US it's "obvious" that a list of languages for the US should include "English"
	* Spanish in the US is less obvious, however it is often considered important.
	* However, that's a slippery slope as there are many other languages with large groups of speakers in the US.  If such a list includes Spanish, should it not include some of the others?  San Francisco requires documents in 4 languages but provides telephone help for 200 languages.  Where's the line?
* Some languages happen in many places.  There are a disproportionate # of Englishes in CLDR, however Chinese is also spoken in lots of the countries that have English available in CLDR.  Yet CLDR doesn't provide data for those.
* Some language/region combinations could encounter geopolitical issues.  Like "it's not legal for that language to be spoken in XX" (but it happens).  Or "that language isn't YY country's language, it's ours!!!"

* The requirement "where the language has been spoken traditionally" is really, really subjective.  "Traditionally" the US is an English speaking country.  However, "Traditionally", there are hundreds of languages that have been spoken in the US.  What could be more "traditional" than the native American languages?  Yet those often have low numbers of speakers in the modern world, many are even dying languages.  There are also a number of "traditional" languages spoken by the original settlers, which differ from the set of languages spoken by modern immigrants.  So your data is going to be very skewed depending on the person collecting the data's definition of "traditional".

Ethnologue has done a decent job of identifying languages and the number of speakers in various areas, but it would be very difficult to draw a line that selected "English and Spanish in the US" and was consistent with similar real-life impacts across the other languages.  Do you pick the top n languages for each country?  Languages with > x million speakers (that would be very different in small and big countries).  Languages with > y% of the speakers in the different countries?

And then you end up with each application having to figure out its own bar.  Applications will have different market considerations and other reasons to target different regions/languages.  That would skew any list for their purposes.
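
To make the "where's the line" problem concrete, here is a rough Python sketch of the three cut-offs mentioned above (top n, more than x speakers, more than y% of speakers).  Everything in it is hypothetical: the language names are placeholders, the speaker counts are invented and not real statistics, and the share is computed against the sum of the listed counts as a stand-in for population.

```python
# Invented placeholder figures for one territory; NOT real statistics.
speakers = {"lang_a": 900_000, "lang_b": 80_000,
            "lang_c": 15_000, "lang_d": 5_000}

def top_n(counts, n):
    """The n languages with the most speakers, largest first."""
    return sorted(counts, key=counts.get, reverse=True)[:n]

def at_least(counts, minimum):
    """Languages with at least `minimum` speakers, in name order."""
    return sorted(lang for lang, c in counts.items() if c >= minimum)

def at_least_share(counts, share):
    """Languages whose speakers make up at least `share` of the listed total."""
    total = sum(counts.values())
    return sorted(lang for lang, c in counts.items() if c / total >= share)

print(top_n(speakers, 2))
print(at_least(speakers, 10_000))
print(at_least_share(speakers, 0.10))
```

The same data gives three different lists depending on the rule and its parameter, which is the skew described above.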

-Shawn


From mark at macchiato.com  Thu Nov 10 13:34:51 2016
From: mark at macchiato.com (Mark Davis ☕️)
Date: Thu, 10 Nov 2016 11:34:51 -0800
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
Message-ID: <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>

The committee doesn't "tentatively approve, pending X".

But the good news is that I think it was the sense of the committee that
the evidence of use for Klingon is now sufficient, and the rest of the
proposal was in good shape (other than the lack of a date), so really only
the IP stands in the way.

I would suggest that the Klingon community work towards getting Paramount
to engage with us, so that any IP issues could be settled.

Mark

On Thu, Nov 10, 2016 at 10:33 AM, Shawn Steele <Shawn.Steele at microsoft.com>
wrote:

> More generally, does that mean that alphabets with perceived owners will
> only be considered for encoding with permission from those owner(s)?  What
> if the ownership is ambiguous or unclear?
>
>
>
> Getting permission may be a lot of work, or cost money, in some cases.
> Will applications be considered pending permission, perhaps being
> provisionally approved until such permission is received?
>
>
>
> Is there specific language that Unicode would require from owners to be
> comfortable in these cases?  It makes little sense for a submitter to go
> through a complex exercise to request permission if Unicode is not
> comfortable with the wording of the permission that is garnered.  Are there
> other such agreements that could perhaps be used as templates?
>
>
>
> Historically, the message pIqaD supporters have heard from Unicode has
> been that pIqaD is a toy script that does not have enough use.  The new
> proposal attempts to respond to those concerns, particularly since there is
> more interest in the script now.  Now, additional (valid) concerns are
> being raised.
>
>
>
> In Mark's case it seems like it would be nice if Unicode could consider
> the rest of the proposal and either tentatively approve it pending
> Paramount's approval, or to provide feedback as to other defects in the
> proposal that would need addressed for consideration.  Meanwhile Mark can
> figure out how to get Paramount's agreement.
>
>
>
> -Shawn
>
>
>
> *From:* Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of *Peter
> Constable
> *Sent:* Wednesday, November 9, 2016 8:49 PM
> *To:* Mark E. Shoulson <mark at kli.org>; David Faulks <
> davidj_faulks at yahoo.ca>
> *Cc:* Unicode Mailing List <unicode at unicode.org>
> *Subject:* RE: The (Klingon) Empire Strikes Back
>
>
>
> *From:* Unicode [mailto:unicode-bounces at unicode.org
> <unicode-bounces at unicode.org>] *On Behalf Of *Mark E. Shoulson
> *Sent:* Friday, November 4, 2016 1:18 PM
>
> > At any rate, this isn't Unicode's problem?
>
>
>
> You saying that potential IP issues are not Unicode's problem does not in
> fact make it not a problem. A statement in writing from authorized
> Paramount representatives stating it would not be a problem for either
> Unicode, its members or implementers of Unicode would make it not a problem
> for Unicode.
>
>
>
>
>
>
>
> Peter
>

From verdy_p at wanadoo.fr  Fri Nov 11 03:31:17 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 11 Nov 2016 10:31:17 +0100
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
Message-ID: <CAGa7JC0K_v+8Z7c-azxb-XYS2vzwk4RxEnBfy2XFq7n7=gzY-A@mail.gmail.com>

As Unicode will not actually encode the language itself, but just the
characters, there's no problem at all in terms of IP, except for the
representative glyphs if they use the protected graphic designs.

Everything else is free, including the name that Unicode will choose for
designating the character names, or the single English term for designating
the script itself.

Then what will be challenging is not supporting the script in software, but
rendering it with fonts. If people use the script to create their own texts,
those texts will be free, but it will not be possible to get them
rendered with the protected glyph designs. But supporters will be inventive
and will create their own designs.

So the final thing which will be difficult for encoding the script will be
to produce a glyph chart in the standard and publish it under the Unicode
or ISO copyright. I assume that this chart will require approval by the IP
holder or some fair (but permanent) licencing to Unicode and ISO.

For other users of the standard, they are in a position equivalent to other
scripts, where charts are **also** protected by the copyright of the
standard and the rights attached to the fonts used and embedded in the PDF
documents: they cannot use the glyphs directly to derive their fonts. They
have to create and support fonts with their own designs.

Then whether the script is used in texts conveying protected works in
the matching language, or for representing texts in unrelated languages,
will have no importance:

The IP rights supposedly attached to the "language" are in the works
published, and these must be large and inventive enough to be
subject to a copyright, or a patent right, or a "sui generis" database
right, or must have a valid registration in an applicable registry to be
subject to a trademark right. But even if these rights exist, they won't
cover the individual characters, and the Unicode character database or
standard (that will reference some elements related to the original work
covered by IP) are separate creations/inventions not covered by any earlier
rights: this is only a very small set of external references, and if
Paramount claims that these references are infringing, they can just as well
be removed: we don't really need direct references to Paramount (not even by
a URL or some other hypertext link).

If Paramount refuses to be cited, then it could just stop its own
activities, as no one will be able to talk and advertize their works that
will be unsellable... I doubt it will ever occur, however we should honor
the correct credits (fair and anyway required for any citations).

From charupdate at orange.fr  Fri Nov 11 16:35:09 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 11 Nov 2016 23:35:09 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
In-Reply-To: <1565548445.5480.1478457219235.JavaMail.www@wwinf2212>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
 <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>
 <CAGa7JC2QTYYPSf0UGcXRvvnpO+PVoU8802gxKSFFzNOYakjGLA@mail.gmail.com>
 <CAGa7JC00ScDYO6yAJWK1qjkrBoEBKkPo765_0P76D5e=McCP7w@mail.gmail.com>
 <CAGa7JC14_GVbmuHZierM-WHFLSazDA5hAz53-Ymx+2YAjP+Gow@mail.gmail.com>
 <143705401.185.1478416945629.JavaMail.www@wwinf2212>
 <1565548445.5480.1478457219235.JavaMail.www@wwinf2212>
Message-ID: <1031067964.11192.1478903709471.JavaMail.www@wwinf1e26>

On Fri, 04 Nov 2016 15:30:48 -0700, Doug Ewell wrote:

> I am seeking technical information from a Microsoft team member. 
> Hopefully we will soon have definitive answers to replace all the 
> controversy. 

For lack of anything better, and faced with Microsoft's week-long silence, I
now suggest making wider use of the text representation scheme
that Microsoft implemented for Vietnamese, that is documented in TUS [1], and
that might be of wider interest for all languages using tone marks, including
but not limited to Ga and other languages of Togo and other countries of Africa,
and Lithuanian:

• Vowels with diacritics that are not tone marks, e.g. 6 out of the 12 Vietnamese
vowels as shown in Figure 7-3 of TUS 9.0 [2], are represented in NFC and entered
either with live keys or with a dead key - live key combination;

• Tone marks are added as combining diacritics with live keys after the vowels.
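
The two points above can be sketched in Python with the standard
unicodedata module. This is only an illustration under stated assumptions,
not Microsoft's implementation: the function name tone_marks_last and the
set TONE_MARKS (the five Vietnamese tone marks) are chosen here, and the
code is meant for Latin-script text only. Note that the result is
deliberately not in NFC, since NFC would recompose the tone mark with the
vowel.

```python
import unicodedata

# Assumed set of tone marks: grave, acute, tilde, hook above, dot below.
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}

def tone_marks_last(text):
    """Precompose non-tone diacritics; leave tone marks as combining marks."""
    clusters = []  # each entry: [base plus non-tone marks, tone marks]
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0 or not clusters:
            clusters.append([ch, ""])   # a new base character
        elif ch in TONE_MARKS:
            clusters[-1][1] += ch       # keep the tone mark separate
        else:
            clusters[-1][0] += ch       # circumflex, breve, horn, ...
    return "".join(unicodedata.normalize("NFC", base) + tones
                   for base, tones in clusters)

sample = "Vi\u1EC7t"                    # 'Việt' with precomposed U+1EC7
result = tone_marks_last(sample)
print([f"U+{ord(c):04X}" for c in result])
```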

Based on what I got and found, I believe that languages in Anglophone African
countries use digraphs rather than diacritics, and that adding tone marks after
the base letter could make for a consistent and already partially implemented [3]
worldwide standard.

Still we don't know why Microsoft isn't willing to upgrade its input framework
to support strings through dead keys, since Philippe Verdy's findings show that
there must be a way of doing it even without upgrading to XML layout definitions?


Marcel

[1] The Unicode Standard 9.0, ch. 7 Europe-I, §7.1 Latin, sh. Vietnamese:
http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G19663

[2] http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G17544

[3] Cf. the already cited Unified Bambara-French keyboard layout (in French):
http://www.mali-pense.net/IMG/pdf/le-clavier_francais-bambara.pdf
Linked on the Resources for Bambara Practice page of Mali-Pense (in French):
http://www.mali-pense.net/Ressources-pour-la-pratique-du.html


From mark at kli.org  Sun Nov 13 15:56:30 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Sun, 13 Nov 2016 16:56:30 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <slrno23ff1.e9d.jcb@home.stevens-bradfield.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <CAMZ=zj6LOMsoU2vwM=1mE6zvTEOMxfqE2wq9UZ2WORY4CTU=yQ@mail.gmail.com>
 <d9b3402a-0298-36fe-12d1-342d4aea5e29@ix.netcom.com>
 <933c21bf-89ea-5078-eef7-7e0453cf02b6@kli.org>
 <slrno23ff1.e9d.jcb@home.stevens-bradfield.com>
Message-ID: <64982934-ca5b-5d29-7210-7e8fa3a27e50@kli.org>

On 11/08/2016 06:58 AM, Julian Bradfield wrote:
> On 2016-11-08, Mark E. Shoulson <mark at kli.org> wrote:
>> I've heard that there are similar questions regarding tengwar and cirth,
>> but it is notable that UTC *did* see fit to consider this question for
>> them and determine that they were worthy of encoding (they are on the
>> roadmap), even though they have not actually followed through on that
>> yet, perhaps because of these very IP concerns.  Notably, pIqaD is not
> The Tolkien Estate considers that the tengwar constitute a work of
> art, and it's not willing to see them in Unicode, because this would
> hinder its ability to pursue people using tengwar for what it
> considers inappropriate purposes. (I finally asked them a couple of
> years ago for permission to encode, based on Michael Everson's draft
> proposal from yonks ago, and that's the summary of their reply.)

I've said it before: if we could get pIqaD at least on the same footing
as tengwar, that would be a step in the right direction. Saying they're 
in a similar fix is (currently) blatantly contradicted by the facts, and 
we might as well clear up whatever *else* it is that's holding pIqaD 
back, and then see about IP problems.

It sounds like some progress is being made on this front.

~mark

From mark at kli.org  Sun Nov 13 15:59:25 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Sun, 13 Nov 2016 16:59:25 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
Message-ID: <eec45bf9-a3d6-b60b-db86-4e10559385a6@kli.org>

On 11/09/2016 11:49 PM, Peter Constable wrote:
>
> *From:*Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of 
> *Mark E. Shoulson
> *Sent:* Friday, November 4, 2016 1:18 PM
>
> > At any rate, this isn't Unicode's problem?
>
> You saying that potential IP issues are not Unicode's problem does not
> in fact make it not a problem. A statement in writing from authorized 
> Paramount representatives stating it would not be a problem for either 
> Unicode, its members or implementers of Unicode would make it not a 
> problem for Unicode.
>
> Peter
>
That's a fair point; any problems arising from this *would* affect 
Unicode.  I guess what I was trying to say is that such an issue, while 
a problem once encoding proceeds, should not affect the determination of 
whether or not the encoding is *warranted*.

~mark


From mark at kli.org  Sun Nov 13 16:10:22 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Sun, 13 Nov 2016 17:10:22 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
Message-ID: <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>

On 11/10/2016 02:34 PM, Mark Davis ☕️ wrote:
> The committee doesn't "tentatively approve, pending X".
>
> But the good news is that I think it was the sense of the committee 
> that the evidence of use for Klingon is now sufficient, and the rest 
> of the proposal was in good shape (other than the lack of a date), so 
> really only the IP stands in the way.

Fair enough.  There have, I think, been other cases of this sort of 
informal "tentative approval", usually involving someone from UTC 
telling the proposer, "your proposal is okay, but you probably need to 
change this..."  And that's about the best I could hope for at this 
point anyway.  So it sounds like (correct me if I'm wrong) there is at 
least unofficial recognition that pIqaD *should* be encoded, and that 
it's mainly an IP problem now (like with tengwar), and possibly some 
minor issues that maybe hadn't been addressed properly in the proposal.

Can we get pIqaD removed from 
http://www.unicode.org/roadmaps/not-the-roadmap/ then?  And (dare I ask) 
perhaps enshrined someplace in http://www.unicode.org/roadmaps/smp/ 
pending further progress with Paramount?

> I would suggest that the Klingon community work towards getting 
> Paramount to engage with us, so that any IP issues could be settled.

I'll see what we can come up with; have to start somewhere.  There is a 
VERY good argument to be made that Paramount doesn't actually have the 
right to stop the encoding, as you can't copyright an alphabet (as we 
have seen), and they don't have a current copyright to "Klingon" in this 
domain, etc., and it may eventually come down to these arguments.  
However, I recognize that having a good argument on your side, and 
indeed even having the law on your side, does not guarantee smooth 
sailing when the other guys have a huge well-funded legal department on 
their side, and thus I understand UTC's reluctance to move forward 
without better legal direction. But at least we can say we've made 
progress, can't we?

~mark



From duerst at it.aoyama.ac.jp  Tue Nov 15 02:23:58 2016
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Tue, 15 Nov 2016 17:23:58 +0900
Subject: Possible to add new precomposed characters for local language in
 Togo?
In-Reply-To: <1031067964.11192.1478903709471.JavaMail.www@wwinf1e26>
References: <20161104153048.665a7a7059d7ee80bb4d670165c8327d.4f17bdb7bd.wbe@email03.godaddy.com>
 <1745350033.7396.1478401863009.JavaMail.www@wwinf2212>
 <CAGa7JC2QTYYPSf0UGcXRvvnpO+PVoU8802gxKSFFzNOYakjGLA@mail.gmail.com>
 <CAGa7JC00ScDYO6yAJWK1qjkrBoEBKkPo765_0P76D5e=McCP7w@mail.gmail.com>
 <CAGa7JC14_GVbmuHZierM-WHFLSazDA5hAz53-Ymx+2YAjP+Gow@mail.gmail.com>
 <143705401.185.1478416945629.JavaMail.www@wwinf2212>
 <1565548445.5480.1478457219235.JavaMail.www@wwinf2212>
 <1031067964.11192.1478903709471.JavaMail.www@wwinf1e26>
Message-ID: <f4827199-9856-1aed-c847-77c879d2c29e@it.aoyama.ac.jp>

Hello Marcel,

On 2016/11/12 07:35, Marcel Schneider wrote:

> For lack of anything better, and faced with Microsoft's one week's silence, I
> now suggest to make a wider use of the Vietnamese text representation scheme
> that Microsoft implemented for Vietnamese, that is documented in TUS [1], and
> that might be of wider interest for all tone mark using languages, including
> but not limited to Ga and other languages of Togo and other countries of Africa,
> and Lithuanian:
>
> • Vowels with diacritics that are not tone marks, e. g. 6 out of the 12 Vietnamese
> vowels as shown in Figure 7-3. of TUS 9.0 [2] are represented in NFC and entered
> either with live keys or with a dead key - live key combination;

> [1] The Unicode Standard 9.0, ch. 7 Europe-I, §7.1 Latin, sh. Vietnamese:
> http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G19663
>
> [2] http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G17544

I'm sorry, but I didn't get the fragment identifiers (#G19663, #G17544) 
to work. Can you tell me which pages/paragraphs you refer to here?

Thanks and regards,   Martin.

From charupdate at orange.fr  Tue Nov 15 04:38:23 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Tue, 15 Nov 2016 11:38:23 +0100 (CET)
Subject: Possible to add new precomposed characters for local language
 in Togo?
Message-ID: <1902272544.4809.1479206303384.JavaMail.www@wwinf1g20>

Hi Martin,

On Tue, 15 Nov 2016 17:23:58 +0900, Martin J. Dürst wrote:
[…]
> I'm sorry, but I didn't get the fragment identifiers (#G19663, #G17544) 
> to work. Can you tell me which pages/paragraphs you refer to here? 

Sorry for the omission of the page number!

In the document pagination of TUS 9.0 it's on page 296, that is page 332 of 
the full PDF, or page 7 of the Chapter 7 PDF. On this page, I'm referring to 
the first three paragraphs, and to the second figure.

As for working with PDF fragment identifiers, I must confess that I'm unable 
to do this in Adobe Reader, either to grab an identifier or to enter one, 
even though the side pane works fine for browsing. To open a fragment by its 
identifier in a local copy, I must open the PDF in a web browser, add the ID 
in the URL bar, and refresh the document; that works fine in Chrome. To grab 
the ID of a TUS fragment, I open the PDF in Firefox, display the side pane in 
TOC mode, and copy the URI of the bookmark.
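
For illustration only, here is a minimal Python sketch of the "add the ID in 
the URL bar" step: it takes the fragment identifier off a published TUS link 
and attaches it to the URL of a local copy. The local path 
file:///tmp/ch07.pdf and the helper name retarget_fragment are hypothetical, 
and whether a given viewer honours the fragment is a separate question.

```python
from urllib.parse import urldefrag


def retarget_fragment(source_url, local_url):
    """Copy the #fragment of source_url onto local_url (hypothetical helper)."""
    # urldefrag splits "base#fragment" into (base, fragment).
    _, fragment = urldefrag(source_url)
    # Drop any fragment the local URL may already carry.
    local_base, _ = urldefrag(local_url)
    return local_base + "#" + fragment if fragment else local_base


published = "http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G19663"
# Hypothetical location of a downloaded copy of the chapter.
local_copy = "file:///tmp/ch07.pdf"

print(retarget_fragment(published, local_copy))
```

The resulting URL can then be pasted into the browser's address bar as 
described above.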

Best regards,

Marcel


From doug at ewellic.org  Tue Nov 15 10:47:00 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 15 Nov 2016 09:47:00 -0700
Subject: Possible to add new precomposed characters for local language in
 Togo?
Message-ID: <20161115094700.665a7a7059d7ee80bb4d670165c8327d.e8e451dcd6.wbe@email03.godaddy.com>

Marcel Schneider wrote:

> For lack of anything better, and faced with Microsoft's one week's
> silence, I now suggest to make a wider use of the Vietnamese text
> representation scheme that Microsoft implemented for Vietnamese, that
> is documented in TUS [1], 

The entire "documentation" of this approach in Section 7.1 of TUS is:

"Some widely used implementations prefer storing the vowel letter and
the tone mark separately."

That said,

> and that might be of wider interest for all tone mark using languages,
> including but not limited to Ga and other languages of Togo and other
> countries of Africa, and Lithuanian: 
>
> • Vowels with diacritics that are not tone marks, e. g. 6 out of the
> 12 Vietnamese vowels as shown in Figure 7-3. of TUS 9.0 [2] are
> represented in NFC and entered either with live keys or with a dead
> key - live key combination; 
>
> • Tone marks are added as combining diacritics with live keys after
> the vowels.

As long as implementations can deal with text that is not strictly NFC,
this seems like a sensible way to support multiple diacritical marks
while remaining compatible with existing dead-key implementations (Mats
stated that compatibility with the existing French layout was a
requirement) and existing architectural constraints.
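
A minimal Python sketch of what "not strictly NFC" means here, using only the 
standard unicodedata module. The Vietnamese letter chosen (e with circumflex 
and acute) and the helper name same_text are illustrative assumptions, not 
taken from the proposal: the vowel with its non-tonal diacritic is stored 
precomposed, and the tone mark follows as a combining character.

```python
import unicodedata

# Mixed scheme: precomposed vowel + separate combining tone mark.
vowel = "\u00EA"      # LATIN SMALL LETTER E WITH CIRCUMFLEX
tone = "\u0301"       # COMBINING ACUTE ACCENT, used as the tone mark
mixed = vowel + tone  # two code points, as typed with a live key

# Strict NFC composes the pair into a single code point (U+1EBF).
nfc = unicodedata.normalize("NFC", mixed)
# NFD splits everything: e + circumflex + acute.
nfd = unicodedata.normalize("NFD", mixed)


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print([f"U+{ord(c):04X}" for c in mixed])
print([f"U+{ord(c):04X}" for c in nfc])
print(mixed == nfc, same_text(mixed, nfc))
```

An implementation that compares or searches raw code points sees the mixed 
and NFC forms as different strings; one that normalizes first treats them as 
the same text.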
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From petercon at microsoft.com  Tue Nov 15 11:22:41 2016
From: petercon at microsoft.com (Peter Constable)
Date: Tue, 15 Nov 2016 17:22:41 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
Message-ID: <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>

Klingon _should not_ be encoded so long as there are open IP issues. For that reason, I think it would be premature to place it in the roadmap.


Peter




From doug at ewellic.org  Tue Nov 15 11:39:37 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 15 Nov 2016 10:39:37 -0700
Subject: The (Klingon) Empire Strikes Back
Message-ID: <20161115103937.665a7a7059d7ee80bb4d670165c8327d.455ad40709.wbe@email03.godaddy.com>

Peter Constable wrote:

> Klingon _should not_ be encoded so long as there are open IP issues.
> For that reason, I think it would be premature to place it in the
> roadmap.

But Mark's point about removing it from the "Not the Roadmap" page,
which categorizes it among "Scripts (or pseudoscripts) which have been
investigated and rejected as unsuitable for encoding," may be a valid
one. There is a difference between "unsuitable for encoding" and "might
turn out to be unencodable due to IP issues."
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From Shawn.Steele at microsoft.com  Tue Nov 15 11:44:13 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Tue, 15 Nov 2016 17:44:13 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <20161115103937.665a7a7059d7ee80bb4d670165c8327d.455ad40709.wbe@email03.godaddy.com>
References: <20161115103937.665a7a7059d7ee80bb4d670165c8327d.455ad40709.wbe@email03.godaddy.com>
Message-ID: <MWHPR03MB2813A03B00D4B5C3629606A282BF0@MWHPR03MB2813.namprd03.prod.outlook.com>

I'm a little confused.  I thought that the primary reason Cirth and Tengwar were on the roadmap - and not actually encoded - was the IP concerns?  (I confess to not following them very closely, so I may be wrong.)



From petercon at microsoft.com  Tue Nov 15 11:49:11 2016
From: petercon at microsoft.com (Peter Constable)
Date: Tue, 15 Nov 2016 17:49:11 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <20161115103937.665a7a7059d7ee80bb4d670165c8327d.455ad40709.wbe@email03.godaddy.com>
References: <20161115103937.665a7a7059d7ee80bb4d670165c8327d.455ad40709.wbe@email03.godaddy.com>
Message-ID: <SN1PR0301MB19666119105578D91E3FF75AD5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>

I was responding to this:

> And (dare I ask) perhaps enshrined someplace in http://www.unicode.org/roadmaps/smp/ pending further progress with Paramount?


Peter



From asmusf at ix.netcom.com  Tue Nov 15 12:21:09 2016
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 15 Nov 2016 10:21:09 -0800
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
Message-ID: <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20161115/ba7208b3/attachment.html>

From mark at macchiato.com  Tue Nov 15 18:31:42 2016
From: mark at macchiato.com (Mark Davis ☕️)
Date: Tue, 15 Nov 2016 17:31:42 -0700
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
Message-ID: <CAJ2xs_F6G9n6zoBJimBuAUrwgGKqngk4r5Y64gdi5bhXr-p8sg@mail.gmail.com>

> However, it appears relatively settled that one cannot claim copyright in
> an alphabet...

We know that these parties tend to be litigious, so we have to be careful.
"relatively settled" is not good enough.

We do not want to be the ones responsible (and liable) for making a
determination as to whether that is settled. Nor do we want to pay the
legal fees necessary to make a water-tight determination.

That is why if there is any question as to the IP issues, we leave it up to
the proposers to get absolutely rock-solid clearance (eg from the Tolkien
estate for Tengwar, or from Paramount for Klingon). The only other
alternative I can think of is if the proposers provide indemnification for
any legal costs that could arise from a lawsuit against us or our vendors.

Mark

On Tue, Nov 15, 2016 at 11:21 AM, Asmus Freytag <asmusf at ix.netcom.com>
wrote:

> On 11/15/2016 9:22 AM, Peter Constable wrote:
>
> Klingon _should not_ be encoded so long as there are open IP issues.
> For that reason, I think it would be premature to place it in the roadmap.
>
>
>
> Peter,
>
> I certainly sympathize with the fact that the Consortium wants to avoid
> being drawn into litigation, and that even litigation based on unsustained
> IP claims could be costly.
>
> However, it appears relatively settled that one cannot claim copyright in
> an alphabet; one of the roles of the Unicode Consortium in this regard
> would be to reach a formal decision whether this is, in fact, an
> alphabet/script (and one that, based on the usual criteria of usage, is
> acceptable for encoding).
>
> Ducking this particular determination serves no-one.
>
> This does not mean that publication would have to be immediate; there's
> certainly room for something like an approval to include a script in "some"
> future version of the standard, which would allow all parties to figure out
> how to deal with any IP issues. (Note that this would not be a decision,
> "pending" anything, merely separating approval of a script proposal from a
> decision of the contents for a particular version - something that used to
> be rather routine in earlier years).
>
> I would also like to point out that Unicode would be well served by taking
> a stronger position on the issue of IP claims on writing systems, in
> particular copyright claims. These seem to be unfounded at least under US
> law; should Unicode nevertheless allow such unfounded claims to become a way
> to veto the encoding of any script/writing system (or script extension)?
>
> As we move on, the number of cases where writing systems, or innovations
> in writing systems may be subject to unfounded claims of copyright may
> become more mainstream (think national writing systems, rather than
> fan-based ones). Already, the emoji are a good example how, now that the
> bulk of living/historic writing systems has been encoded, the "novelties"
> come to the forefront.
>
> Finally, I really can't understand the reluctance to place anything in the
> roadmap. An entry in the roadmap is not a commitment to anything - many
> scripts listed there face enormous obstacles before they could even reach
> the stage of a well-founded proposal. And, until such a proposal exists,
> there's no formal determination that a script has a truly separate identity
> and meets the bar for encoding.
>
> A./
>
> PS: the "real" reason that Klingon was never put in the roadmap (as I
> recall discussions in the early years) was not so much the question whether
> IP issues existed/could be resolved, but the fear that adding such an
> "invented" and "frivolous" script would undermine the acceptance of
> Unicode. Given the way Unicode is invested in "frivolous" communication
> systems of very recent origin (emoji), that original argument surely
> doesn't apply :)

From mark at kli.org  Tue Nov 15 18:39:36 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 15 Nov 2016 19:39:36 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
Message-ID: <d6d97678-5ca0-f45d-b096-f8b75b9d5ed8@kli.org>

On 11/15/2016 12:22 PM, Peter Constable wrote:
>
> Klingon _/should not/_ be encoded so long as there are open IP issues. 
> For that reason, I think it would be premature to place it in the roadmap.
>
Then why is tengwar there, and Klingon proclaimed "unsuitable" for 
encoding?  Everyone's telling me the situation is the same with tengwar, 
and yet it isn't.  What is it about Tolkien scripts that makes them 
suitable and pIqaD not?  Artistic interest doesn't count.

I'm not trying to get tengwar/cirth *demoted*, but I would like someone 
to explain to me why some fandoms/scripts seem to be better than others.


~mark


From everson at evertype.com  Tue Nov 15 18:47:32 2016
From: everson at evertype.com (Michael Everson)
Date: Wed, 16 Nov 2016 00:47:32 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <d6d97678-5ca0-f45d-b096-f8b75b9d5ed8@kli.org>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <d6d97678-5ca0-f45d-b096-f8b75b9d5ed8@kli.org>
Message-ID: <54D68C57-FB87-46D8-A822-3A1848CDD611@evertype.com>

A body of a particular kind of scholarship surrounds Tolkien's oeuvre. That's probably the reason.

Michael Everson

From mark at kli.org  Tue Nov 15 19:04:09 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 15 Nov 2016 20:04:09 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
Message-ID: <33c1ba77-2ba0-dd62-327f-edd32b3efa23@kli.org>

On 11/15/2016 01:21 PM, Asmus Freytag wrote:
> On 11/15/2016 9:22 AM, Peter Constable wrote:
>>
>> Klingon _/should not/_ be encoded so long as there are open IP 
>> issues. For that reason, I think it would be premature to place it in 
>> the roadmap.
>>
> Peter,
>
> I certainly sympathize with the fact that the Consortium wants to 
> avoid being drawn into litigation, and that even litigation based on 
> unsustained IP claims could be costly.
>
> However, it appears relatively settled that one cannot claim copyright 
> in an alphabet; one of the roles of the Unicode Consortium in this 
> regard would be to reach a formal decision whether this is, in fact, 
> an alphabet/script (and one that, based on the usual criteria of 
> usage) is acceptable for encoding.
>
> Ducking this particular determination serves no-one.

Thanks, Asmus.

I can understand the UTC's caution: you don't want to open yourself up 
to litigation -- even if you eventually win.  But this also is likely not 
going to be the first time that there is this kind of legal hold on 
something encodable.  I note that Blissymbolics, according to Wikipedia, 
*does* have a copyright (as opposed to "maybe they might think they do") 
and yet it, too, is roadmapped. If I didn't know better (and I don't), I 
might think there was some sort of bias against Klingon.

> Finally, I really can't understand the reluctance to place anything in 
> the roadmap. An entry in the roadmap is not a commitment to anything - 
> many scripts listed there face enormous obstacles before they could 
> even reach the stage of a well-founded proposal. And, until such a 
> proposal exists, there's no formal determination that a script has a 
> truly separate identity and meets the bar for encoding.

NOT being called out for being unencodable would be a step up for 
Klingon, at least, let alone the roadmap.

> PS: the "real" reason that Klingon was never put in the roadmap (as I 
> recall discussions in the early years) was not so much the question 
> whether IP issues existed/could be resolved, but the fear that adding 
> such an "invented" and "frivolous" script would undermine the 
> acceptance of Unicode. Given the way Unicode is invested in 
> "frivolous" communication systems of very recent origin (emoji), that 
> original argument surely doesn't apply :)

Yes, of course, though it's nice to have someone say it out loud. You do 
of course realize that that sentiment is *precisely* as offensive as 
"Unicode shouldn't encode African scripts, because only darkies use them 
anyway, and we wouldn't want to be seen as supporting *those* people."  
Bigotry is bigotry, even when applied to fans.  Essentially, the claim 
is "we shouldn't encode those, not because nobody uses them, but because 
nobody *important* uses them."

I was talking to someone once about Unicode, and explained that they 
were responsible for encoding emoji, etc.  And he scoffed at that, "why 
encode those?  who uses those anyway?"  I said, "Millions of people 
around the world use them every day in tweets and instant messages..." 
"Yeah, but I mean, aside from that!"  The question is, who out there who 
is *important* is using them for *important* things.  And if the UTC has 
to get in the business of judging what qualifies as "important" 
communication, you're going to need a lot more members, just to go 
through everything being printed. (Why encode chess pieces?  Only chess 
nerds use them, and I don't care about chess.  Go piece signs?  Nobody 
*I* talk to uses those.  And don't even get me started on pictures of 
baseballs.  And only goyim would need a picture of a breaded shrimp...)

It's refreshing to hear it finally admitted in full.  I always felt that 
if people are going to act unfairly, they should at least say "yes, 
we're acting unfairly, because you don't deserve fairness." Then they 
can explain why fairness is undeserved.

~mark

From kenwhistler at att.net  Tue Nov 15 19:15:58 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Tue, 15 Nov 2016 17:15:58 -0800
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
Message-ID: <90d16ff4-eef9-28df-3d9a-51a8011339ce@att.net>


On 11/15/2016 10:21 AM, Asmus Freytag wrote:
> Finally, I really can't understand the reluctance to place anything in 
> the roadmap. An entry in the roadmap is not a commitment to anything - 
> many scripts listed there face enormous obstacles before they could 
> even reach the stage of a well-founded proposal. And, until such a 
> proposal exists, there's no formal determination that a script has a 
> truly separate identity and meets the bar for encoding.

The barrier to putting it in the roadmap is that pIQaD is 
currently listed on *not*-the-roadmap:

http://www.unicode.org/roadmaps/not-the-roadmap/

as Mark Shoulson has been repeatedly pointing out.

It would be inconsistent to add it to the SMP roadmap unless we delete 
it from not-the-roadmap.

And the reason that step has been stuck is because the UTC is still on 
record with a nonapproval notice for the Klingon script from 2001. 
(Based on Consensus 87-M3.)

http://www.unicode.org/alloc/nonapprovals.html

So figure it out, folks. First bring to the UTC a proposal to reverse 
87-M3. (Not to *encode* pIQaD yet -- just, on the basis of the new, more 
mature proposal, to *entertain* appropriate discussion about suitability 
for encoding, by rescinding the prior determination of nonapproval.) If 
*that* proposal passed, then the nonapproval notice would also be 
dropped. If the nonapproval notice is dropped, the not-the-roadmap entry 
would be dropped. And if that is dropped, then the Roadmap committee 
would dig around for a tentative allocation slot, pending the 
determination of outcome for any other issues. Which then could focus on 
the next obstacle, which is IP and the unresolved risk of litigation.

In any case, folks should stop with the "Unfair! Unfair!" stuff, and 
just set to work, step-by-step, to deal with the items noted above. "A 
Klingon is trained to use everything around them to their advantage." 
O.k., I've just provided something useful -- go for it. And you won't 
even need a cloaking device.

--Ken


From mark at kli.org  Tue Nov 15 19:19:23 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 15 Nov 2016 20:19:23 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <CAJ2xs_F6G9n6zoBJimBuAUrwgGKqngk4r5Y64gdi5bhXr-p8sg@mail.gmail.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
 <CAJ2xs_F6G9n6zoBJimBuAUrwgGKqngk4r5Y64gdi5bhXr-p8sg@mail.gmail.com>
Message-ID: <db9b94ad-af41-8c0d-4d21-d6661ee49777@kli.org>

On 11/15/2016 07:31 PM, Mark Davis wrote:
> > However, it appears relatively settled that one cannot claim 
> copyright in an alphabet...
>
> We know that these parties tend to be litigious, so we have to be 
> careful. "relatively settled" is not good enough.
>
> We do not want to be the ones responsible (and liable) for making a 
> determination as to whether that is settled. Nor do we want to pay the 
> legal fees necessary to make a water-tight determination.
>
> That is why if there is any question as to the IP issues, we leave it 
> up to the proposers to get absolutely rock-solid clearance (eg from 
> the Tolkien estate for Tengwar, or from Paramount for Klingon). The 
> only other alternative I can think of is if the proposers provide 
> indemnification for any legal costs that could obtain from a legal 
> suit of us or our vendors.
>
> Mark
> //

How about legal counsel on the matter?

We're a little hesitant of asking Paramount/CBS about this, because of 
course, asking means that we think maybe they can say no, and we don't 
want to imply that.  So I'm thinking/hoping maybe we can do some 
research by a qualified legal expert (and not us armchair-lawyers, 
"yeah, it looks pretty settled to me...") to make a determination.

I'm trying to find out some more information about the KLI's pIqaD font, 
which it has been using and distributing for decades, during some of 
which time it was licensed by Paramount, and which apparently was *not* 
covered in the licensing agreements -- precisely because typefaces are 
*not* copyrightable in the US!  (I thought they were, though... like I 
said, I'm trying to find out more about this.)  And all that time 
without objection from Paramount.  Not a slam-dunk argument, but it's 
something.

~mark

From mark at kli.org  Tue Nov 15 19:22:36 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 15 Nov 2016 20:22:36 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <54D68C57-FB87-46D8-A822-3A1848CDD611@evertype.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <d6d97678-5ca0-f45d-b096-f8b75b9d5ed8@kli.org>
 <54D68C57-FB87-46D8-A822-3A1848CDD611@evertype.com>
Message-ID: <a4e1a635-028e-2ce0-d058-ec870d54d3bd@kli.org>

On 11/15/2016 07:47 PM, Michael Everson wrote:
> A body of a particular kind of scholarship surrounds Tolkien's oeuvre. That's probably the reason.
>
> Michael Everson

Ah.  So it *is* a matter of "some literature is better than others."  I 
repeat here all the stuff I said in my response to Asmus' letter.  Since 
when did Unicode get in the business of deciding whose literature was 
important and whose wasn't?  And what do they base their decisions on?  
How much Klingon correspondence and conversation did the UTC sift 
through in order to reach its learned conclusion that Klingon-speakers 
don't do anything "scholarly"?

Do you guys even hear how ridiculously bigoted this all sounds?

~mark


From Shawn.Steele at microsoft.com  Tue Nov 15 19:26:16 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Wed, 16 Nov 2016 01:26:16 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <db9b94ad-af41-8c0d-4d21-d6661ee49777@kli.org>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
 <CAJ2xs_F6G9n6zoBJimBuAUrwgGKqngk4r5Y64gdi5bhXr-p8sg@mail.gmail.com>
 <db9b94ad-af41-8c0d-4d21-d6661ee49777@kli.org>
Message-ID: <MWHPR03MB28139D2CCBE02F274D1AABE882BE0@MWHPR03MB2813.namprd03.prod.outlook.com>

As I understand the issue, the problem is less whether or not it is legal than whether or not Paramount might sue.  Whether Unicode wins or not, it would still cost money to defend.

I was wondering, as Mark Davis mentioned, whether there were companies that sold bonds for this kind of thing (though that might be out of KLI's budget).

Being afraid of a no answer probably isn't going to inspire confidence.  But maybe you could do a combination of the above.  Get someone to give you a legal opinion and then present that to Paramount with a "hey, they said this was probably legal anyway, but we wanted to ask nicely to be sure."

-Shawn

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark E. Shoulson
Sent: Tuesday, 15 November 2016 5:19 PM
To: unicode at unicode.org
Subject: Re: The (Klingon) Empire Strikes Back

On 11/15/2016 07:31 PM, Mark Davis wrote:
> > However, it appears relatively settled that one cannot claim
> copyright in an alphabet...
>
> We know that these parties tend to be litigious, so we have to be 
> careful. "relatively settled" is not good enough.
>
> We do not want to be the ones responsible (and liable) for making a 
> determination as to whether that is settled. Nor do we want to pay the 
> legal fees necessary to make a water-tight determination.
>
> That is why if there is any question as to the IP issues, we leave it 
> up to the proposers to get absolutely rock-solid clearance (eg from 
> the Tolkien estate for Tengwar, or from Paramount for Klingon). The 
> only other alternative I can think of is if the proposers provide 
> indemnification for any legal costs that could obtain from a legal 
> suit of us or our vendors.
>
> Mark
> //

How about legal counsel on the matter?

We're a little hesitant of asking Paramount/CBS about this, because of course, asking means that we think maybe they can say no, and we don't want to imply that.  So I'm thinking/hoping maybe we can do some research by a qualified legal expert (and not us armchair-lawyers, "yeah, it looks pretty settled to me...") to make a determination.

I'm trying to find out some more information about the KLI's pIqaD font, which it has been using and distributing for decades, during some of which time it was licensed by Paramount, and which apparently was *not* covered in the licensing agreements -- precisely because typefaces are
*not* copyrightable in the US!  (I thought they were, though... like I said, I'm trying to find out more about this.)  And all that time without objection from Paramount.  Not a slam-dunk argument, but it's something.

~mark


From everson at evertype.com  Tue Nov 15 19:29:14 2016
From: everson at evertype.com (Michael Everson)
Date: Wed, 16 Nov 2016 01:29:14 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <a4e1a635-028e-2ce0-d058-ec870d54d3bd@kli.org>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <d6d97678-5ca0-f45d-b096-f8b75b9d5ed8@kli.org>
 <54D68C57-FB87-46D8-A822-3A1848CDD611@evertype.com>
 <a4e1a635-028e-2ce0-d058-ec870d54d3bd@kli.org>
Message-ID: <B26188E1-773B-43F3-BBBA-0D608B6E4A42@evertype.com>

Mark, 

No need to be defensive.

Tengwar and Cirth are in there because *I* put them there *long ago*, and the argument made was the nature of Tolkien's work and study of it. That remains valid for keeping them there, for one day the Tolkien Estate may revise its view on the matter.

Maybe a version of the Roadmap had Klingon in it. I don't recall. I'd've been the one to have put it there. There are records. It doesn't matter, though. When lack of use of Klingon made UTC remove it from consideration, it would have been removed.

The Roadmaps are really of no consequence. They're useful, but they have no status and are subject to any kind of change before ballotting ends.

Michael

> On 16 Nov 2016, at 01:22, Mark E. Shoulson <mark at kli.org> wrote:
> 
> On 11/15/2016 07:47 PM, Michael Everson wrote:
>> A body of a particular kind of scholarship surrounds Tolkien's oeuvre. That's probably the reason.
>> 
>> Michael Everson
> 
> Ah.  So it *is* a matter of "some literature is better than others."  I repeat here all the stuff I said in my response to Asmus' letter.  Since when did Unicode get in the business of deciding whose literature was important and whose wasn't?  And what do they base their decisions on?  How much Klingon correspondence and conversation did the UTC sift through in order to reach its learned conclusion that Klingon-speakers don't do anything "scholarly"?
> 
> Do you guys even hear how ridiculously bigoted this all sounds?
> 
> ~mark
> 


From mark at kli.org  Tue Nov 15 19:31:21 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 15 Nov 2016 20:31:21 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <90d16ff4-eef9-28df-3d9a-51a8011339ce@att.net>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
 <90d16ff4-eef9-28df-3d9a-51a8011339ce@att.net>
Message-ID: <b89c3620-4be4-25d4-7fc2-2dcb505eaaa8@kli.org>

On 11/15/2016 08:15 PM, Ken Whistler wrote:
>
> On 11/15/2016 10:21 AM, Asmus Freytag wrote:
>> Finally, I really can't understand the reluctance to place anything 
>> in the roadmap. An entry in the roadmap is not a commitment to 
>> anything - many scripts listed there face enormous obstacles before 
>> they could even reach the stage of a well-founded proposal. And, 
>> until such a proposal exists, there's no formal determination that a 
>> script has a truly separate identity and meets the bar for encoding.
>
> The barrier to putting it in the roadmap is that pIQaD is 
> currently listed on *not*-the-roadmap:
>
> http://www.unicode.org/roadmaps/not-the-roadmap/
>
> as Mark Shoulson has been repeatedly pointing out.
>
> It would be inconsistent to add it to the SMP roadmap unless we delete 
> it from not-the-roadmap.
>
> And the reason that step has been stuck is because the UTC is still on 
> record with a nonapproval notice for the Klingon script from 2001. 
> (Based on Consensus 87-M3.)
>
> http://www.unicode.org/alloc/nonapprovals.html
>
> So figure it out, folks. First bring to the UTC a proposal to reverse 
> 87-M3. (Not to *encode* pIQaD yet -- just, on the basis of the new, 
> more mature proposal, to *entertain* appropriate discussion about 
> suitability for encoding, by rescinding the prior determination of 
> nonapproval.) If *that* proposal passed, then the nonapproval notice 
> would also be dropped. If the nonapproval notice is dropped, the 
> not-the-roadmap entry would be dropped. And if that is dropped, then 
> the Roadmap committee would dig around for a tentative allocation 
> slot, pending the determination of outcome for any other issues. Which 
> then could focus on the next obstacle, which is IP and the unresolved 
> risk of litigation.

So.... now the problem *isn't* the IP.  All along I've been saying that 
UTC needs to decide that pIqaD *should* be encoded first, without 
consideration of the IP issues, and *then* we can worry about dealing 
with the IP.  And the answers I got were all about how we can't do 
*anything* until this IP stuff is dealt with.  And now Ken Whistler 
comes and says what I said in the first place!  At least someone was 
paying attention.

So... Now it's not enough to propose that pIqaD get encoded, like any 
other script would need.  First we need a proposal to *permit* a 
proposal for encoding?  Um.  OK.  What should such a thing look like?  
Perhaps something like the document I submitted, showing lots of usage 
and asking if it could be considered now?  I originally wasn't going to 
append the full proposal to the document, but it was suggested to me 
that it would be expected.

Should I split the document up into two pieces and re-submit the two 
halves, one as a proposal, and one for permission to consider the 
proposal?  Would that satisfy the requirements?

> In any case, folks should stop with the "Unfair! Unfair!" stuff, and 
> just set to work, step-by-step, to deal with the items noted above. "A 
> Klingon is trained to use everything around them to their advantage." 
> O.k., I've just provided something useful -- go for it. And you won't 
> even need a cloaking device.

I've been working with whatever I could find all along.  The unfairness 
is a recognized fact, apparently, that can finally be faced and fixed, 
or so I hope.  I'm trying to get this done; best I can do is answer the 
questions put to me and look how other scripts in similar situations 
(like Tolkien scripts) have done what they did.

~mark

From mark at kli.org  Tue Nov 15 19:41:02 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 15 Nov 2016 20:41:02 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <MWHPR03MB28139D2CCBE02F274D1AABE882BE0@MWHPR03MB2813.namprd03.prod.outlook.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
 <CAJ2xs_F6G9n6zoBJimBuAUrwgGKqngk4r5Y64gdi5bhXr-p8sg@mail.gmail.com>
 <db9b94ad-af41-8c0d-4d21-d6661ee49777@kli.org>
 <MWHPR03MB28139D2CCBE02F274D1AABE882BE0@MWHPR03MB2813.namprd03.prod.outlook.com>
Message-ID: <713bb56b-1e37-1c16-d5e6-72828032bf28@kli.org>

On 11/15/2016 08:26 PM, Shawn Steele wrote:
> As I understand the issue, the problem is less whether or not it is legal than whether or not Paramount might sue.  Whether Unicode wins or not, it would still cost money to defend.

There ought to be laws against suits brought just to intimidate.  I 
think there are.  But yes, they aren't easy to prove or enforce.
> I was wondering, as Mark Davis mentioned, whether there were companies that sold bonds for this kind of thing (though that might be out of KLI's budget).
>
> Being afraid of a no answer probably isn't going to inspire confidence.  But maybe you could do a combination of the above.  Get someone to give you a legal opinion and then present that to Paramount with a "hey, they said this was probably legal anyway, but we wanted to ask nicely to be sure."

Not so much "afraid" of a no answer, but would rather not give the sense 
that we even thought that one was an option.  And for a company that 
makes its living from IP, they usually don't even have to bother 
listening to the whole question: "Say, can we use your?" "No!"  (This is 
probably also partly due to the way the laws are structured).

Your idea is a good one, though.  Get a legal opinion and maybe *inform* 
Paramount of it, and ask if they'd like to be involved in sanctioning 
it.  If spun right, it could even be sold as offering them the 
opportunity to get in on this, magnanimously offering them the privilege 
of giving their blessing...

~mark

From mark at kli.org  Tue Nov 15 19:47:42 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 15 Nov 2016 20:47:42 -0500
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <B26188E1-773B-43F3-BBBA-0D608B6E4A42@evertype.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <d6d97678-5ca0-f45d-b096-f8b75b9d5ed8@kli.org>
 <54D68C57-FB87-46D8-A822-3A1848CDD611@evertype.com>
 <a4e1a635-028e-2ce0-d058-ec870d54d3bd@kli.org>
 <B26188E1-773B-43F3-BBBA-0D608B6E4A42@evertype.com>
Message-ID: <0ca0b95c-fa07-daae-7f18-23e7b66915eb@kli.org>

On 11/15/2016 08:29 PM, Michael Everson wrote:
> Mark,
>
> No need to be defensive.
>
> Tengwar and Cirth are in there because *I* put them there *long ago*, and the argument made was the nature of Tolkien's work and study of it. That remains valid for keeping them there, for one day the Tolkien Estate may revise its view on the matter.
>
> Maybe a version of the Roadmap had Klingon in it. I don't recall. I'd've been the one to have put it there. There are records. It doesn't matter, though. When lack of use of Klingon made UTC remove it from consideration, it would have been removed.

The defensiveness was not that Tolkienian scholarship was deemed 
"worthy", but more that Klingon's apparently was not.  There was a 
Roadmap with pIqaD on it, and indeed you were the one who put it there.  
Nick Nicholas, in 
https://web.archive.org/web/20120307231609fw_/http://www.tlg.uci.edu/~opoudjis/Klingon/piqad.html 
credits you with a "delightful move of defiance" for replacing pIqaD 
with Sarati when it was removed.

> The Roadmaps are really of no consequence. They're useful, but they have no status and are subject to any kind of change before ballotting ends.

Getting pIqaD off the "not-roadmapped" list is more important, both 
symbolically and, as Ken Whistler says, practically.

~mark

From everson at evertype.com  Tue Nov 15 20:18:49 2016
From: everson at evertype.com (Michael Everson)
Date: Wed, 16 Nov 2016 02:18:49 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <0ca0b95c-fa07-daae-7f18-23e7b66915eb@kli.org>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <d6d97678-5ca0-f45d-b096-f8b75b9d5ed8@kli.org>
 <54D68C57-FB87-46D8-A822-3A1848CDD611@evertype.com>
 <a4e1a635-028e-2ce0-d058-ec870d54d3bd@kli.org>
 <B26188E1-773B-43F3-BBBA-0D608B6E4A42@evertype.com>
 <0ca0b95c-fa07-daae-7f18-23e7b66915eb@kli.org>
Message-ID: <82CC2935-D61A-4956-BC75-79DC578E0871@evertype.com>

On 16 Nov 2016, at 01:47, Mark E. Shoulson <mark at kli.org> wrote:
> 
> The defensiveness was not that Tolkienian scholarship was deemed "worthy", but more that Klingon's apparently was not.

Back in the day? No. It wasn't.

> There was a Roadmap with pIqaD on it, and indeed you were the one who put it there.  Nick Nicholas, in https://web.archive.org/web/20120307231609fw_/http://www.tlg.uci.edu/~opoudjis/Klingon/piqad.html credits you with a "delightful move of defiance" for replacing pIqaD with Sarati when it was removed.

That would be me. 

>> The Roadmaps are really of no consequence. They're useful, but they have no status and are subject to any kind of change before ballotting ends.
> 
> Getting pIqaD off the "not-roadmapped" list is more important, both symbolically and, as Ken Whistler says, practically.

Ha' ruch.

Michael

From everson at evertype.com  Tue Nov 15 21:57:23 2016
From: everson at evertype.com (Michael Everson)
Date: Wed, 16 Nov 2016 03:57:23 +0000
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>
References: <01275881-d53b-269d-fde9-330e7d94be37@kli.org>
Message-ID: <01C190B4-7440-4085-B723-AC7EED0444AF@evertype.com>

On 3 Nov 2016, at 23:43, Mark Shoulson <mark at kli.org> wrote:

> Michael Everson: I basically copied your 1997 proposal into the document, with some minor changes.  I hope you don't mind.

I do not.

> And if you don't want to be on the hook for providing the glyphs to UTC, I can do that.  I think that proposal should serve as a starting-point for discussion anyway.

I'm in.

> 1. the "SYMBOL FOR EMPIRE" also known as the "MUMMIFICATION GLYPH".  I don't know where the second name comes from, I don't know how important it is to encode it, and I don't know how much of a trademark headache it will cause with Paramount, as it is used pretty heavily in their imagery.  Something we'll have to talk about.

I'd leave it out for now.

> 2. I put in the COMMA and FULL STOP, which were not in the original proposal but were in the ConScript registry entry.

Yes, those have been adopted since 1997. 

>  The examples I have show them clearly being used.  UTC may decide to unify them with existing triangular shapes, which may or may not be a good idea.

As they are punctuation, I think it unlikely. 

> 3. For my part, I've invented a pair of ampersands for Klingon (Klingon has two words for "and": one for joining verbs/sentences and one for joining nouns (the former goes between its "conjunctands", the latter after them)), from ligatures of the letters in question.  They pretty much have NO usage, of course (and are not in the proposal), but maybe they should be presented to the community.

That's up to you. Adoption is a matter for the user community.

> Let the bickering begin!

may' malujpu'. veS maQap.

Michael Everson


From petercon at microsoft.com  Thu Nov 17 17:10:41 2016
From: petercon at microsoft.com (Peter Constable)
Date: Thu, 17 Nov 2016 23:10:41 +0000
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
Message-ID: <CY4PR03MB27909BD6A79380D31804660DD5B10@CY4PR03MB2790.namprd03.prod.outlook.com>

Somewhat interesting: a paper from a conference in Italy a couple of months ago:

http://discovery.dundee.ac.uk/portal/en/research/oh-thats-what-you-meant(20b8923c-28da-49ed-bc78-fcc741db3187).html

I anticipated old news about misunderstanding based on presentation differences on the level of water gun vs. etc. But it focuses on subtleties in emotional reactions that different users associate with different smileys. E.g., how does U+1F624 😤 compare with U+1F62C 😬? A given user may perceive the two differently, and for either one a given user's perception may differ when evaluating the depiction used in one app/platform versus another. They suggest that, if users gave a characterization of reactions to different emoji on a given platform (e.g., degree of emotion, how positive or negative) then an automated system could translate one user's message to display an emoji to a second user that more closely reflects the emotion intended by the first user.


Peter

From doug at ewellic.org  Thu Nov 17 17:31:34 2016
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 17 Nov 2016 16:31:34 -0700
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
Message-ID: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>

Peter Constable wrote:

> E.g., how does U+1F624 😤 compare with U+1F62C 😬? A given user may
> perceive the two differently, and for either one a given user's
> perception may differ when evaluating the depiction used in one app/
> platform versus another. They suggest that, if users gave a
> characterization of reactions to different emoji on a given platform
> (e.g., degree of emotion, how positive or negative) then an automated
> system could translate one user's message to display an emoji to a
> second user that more closely reflects the emotion intended by the
> first user. 

Or, people could just say what they mean, using language.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From verdy_p at wanadoo.fr  Thu Nov 17 21:46:07 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 18 Nov 2016 04:46:07 +0100
Subject: The (Klingon) Empire Strikes Back
In-Reply-To: <MWHPR03MB28139D2CCBE02F274D1AABE882BE0@MWHPR03MB2813.namprd03.prod.outlook.com>
References: <42101413.334282.1478281304520.ref@mail.yahoo.com>
 <42101413.334282.1478281304520@mail.yahoo.com>
 <27f48fcf-9ebc-8363-3b27-6540a242d375@kli.org>
 <SN1PR0301MB19664572E150BF938CABF4BDD5B80@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <MWHPR03MB28130FD3D83578F3A1CC82A482B80@MWHPR03MB2813.namprd03.prod.outlook.com>
 <CAJ2xs_E_vf26U3H_A6xTSLEssB3HkfxS1u6ydQsu9Bb7qs97VQ@mail.gmail.com>
 <59a89be5-1359-f5b7-905f-108c22c6e189@kli.org>
 <SN1PR0301MB1966C43A40333BEF0691AF3ED5BF0@SN1PR0301MB1966.namprd03.prod.outlook.com>
 <5b602cd1-53c8-181a-5ca4-0470ce36b92e@ix.netcom.com>
 <CAJ2xs_F6G9n6zoBJimBuAUrwgGKqngk4r5Y64gdi5bhXr-p8sg@mail.gmail.com>
 <db9b94ad-af41-8c0d-4d21-d6661ee49777@kli.org>
 <MWHPR03MB28139D2CCBE02F274D1AABE882BE0@MWHPR03MB2813.namprd03.prod.outlook.com>
Message-ID: <CAGa7JC0nTqnzu6O-1J_Dp0Qq8QLqR35J6g6209sZBjmqtKO4sA@mail.gmail.com>

Fonts, even when they are not copyrightable, are still patentable. The complexity
of IP rights is growing, and so is their scope of application (sometimes
with retroactive effects, including on the "public domain"). I would
not bet anything on a past decision by a US court, and anyway we're not
building just a US standard but an international standard: maybe the
Unicode Consortium or ISO would not be liable for infringements or subject
to claims of IP rights in the US, but this does not mean that there won't be
claims elsewhere, if the standards bodies cannot themselves assert their own
IP rights (which are what allow them to license the standards "for free"
to anyone in the world).

In this complex world, all that can be done is to have a fair procedure for
litigation, and to get some security by allowing enough time for such claims,
after which a local (but applicable) law enforcement body will be able to
decide that these claims come too late to be valid (in the IP world,
such time limits for "too late" claims can be extremely long, up to 70 years or
more for claims by individual people, or 10 years for tangible property
and appropriation of the public domain or the private domain of someone
else). On the Internet this fair system is known as the "UDRP" procedure
(which applies as well to claims for domain names).

But once this time is exhausted and IP rights are no longer exclusive,
someone else could build a new claim (e.g. by registering new patents
against what should be the public domain; it is then costly to counter
these attacks, which are all too common with patents and trademarks). And when
there's uncertainty about the preservation of the public domain or
legitimate use of it, some countries prefer redefining the time limits
(including retroactively, for example Russia): they can do
that with national laws unless they are bound by international
treaties. This occurred notably before WIPO became a mostly
worldwide body enforcing the applicability or non-applicability of IP
rights in more than just one country. But WIPO is now concerned with new
kinds of rights.

Historically there was the patent system (derived from industrial rights
and artistic rights), then the copyright system; now there's the new
database IP system, and the moral rights of natural persons are starting to be
extended to legal persons... In fact, with these constant extensions, I am
not sure that all existing publications of the standard are not partly
covered now by new claims that we have not officially opposed in due
time. This means that this goes beyond the single case of Klingon. We know
that historic human language is now being appropriated (notably by
trademarks). In fact, all existing standards are concerned.

2016-11-16 2:26 GMT+01:00 Shawn Steele <Shawn.Steele at microsoft.com>:

> As I understand the issue, the problem is less whether or not it is
> legal than whether or not Paramount might sue.  Whether Unicode wins or
> not, it would still cost money to defend.
>
> I was wondering like Mark Davis mentioned if there were some sort of
> companies that sold bonds for this kind of thing (though that might be out
> of KLI's budget.)
>
> Being afraid of a no answer probably isn't going to inspire confidence.
> But maybe you could do a combination of the above.  Get someone to give you
> a legal opinion and then present that to Paramount with a "hey, they said
> this was probably legal anyway, but we wanted to ask nicely to be sure."
>
> -Shawn
>
> -----Original Message-----
> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark E.
> Shoulson
> Sent: Tuesday, 15 November 2016 5:19 PM
> To: unicode at unicode.org
> Subject: Re: The (Klingon) Empire Strikes Back
>
> On 11/15/2016 07:31 PM, Mark Davis ☕️ wrote:
> > > However, it appears relatively settled that one cannot claim
> > copyright in an alphabet...
> >
> > We know that these parties tend to be litigious, so we have to be
> > careful. "relatively settled" is not good enough.
> >
> > We do not want to be the ones responsible (and liable) for making a
> > determination as to whether that is settled. Nor do we want to pay the
> > legal fees necessary to make a water-tight determination.
> >
> > That is why if there is any question as to the IP issues, we leave it
> > up to the proposers to get absolutely rock-solid clearance (eg from
> > the Tolkien estate for Tengwar, or from Paramount for Klingon). The
> > only other alternative I can think of is if the proposers provide
> > indemnification for any legal costs that could obtain from a legal
> > suit of us or our vendors.
> >
> > Mark
> > //
>
> How about legal counsel on the matter?
>
> We're a little hesitant of asking Paramount/CBS about this, because of
> course, asking means that we think maybe they can say no, and we don't want
> to imply that.  So I'm thinking/hoping maybe we can do some research by a
> qualified legal expert (and not us armchair-lawyers, "yeah, it looks pretty
> settled to me...") to make a determination.
>
> I'm trying to find out some more information about the KLI's pIqaD font,
> which it has been using and distributing for decades, during some of which
> time it was licensed by Paramount, and which apparently was *not* covered
> in the licensing agreements -- precisely because typefaces are
> *not* copyrightable in the US!  (I thought they were, though... like I
> said, I'm trying to find out more about this.)  And all that time without
> objection from Paramount.  Not a slam-dunk argument, but it's something.
>
> ~mark
>
>

From jameskasskrv at gmail.com  Thu Nov 17 21:55:01 2016
From: jameskasskrv at gmail.com (James Kass)
Date: Thu, 17 Nov 2016 19:55:01 -0800
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
Message-ID: <CABPY6Z2JURyPrSnqm8E26thDHV5zRGtNG1z8J3QnHyN55sedJg@mail.gmail.com>

Doug Ewell responded to Peter Constable,

>> then an automated system could translate one user's message to
>> display an emoji to a second user that more closely reflects
>> the emotion intended by the first user.
>
> Or, people could just say what they mean, using language.

How about some kind of automated system for translating icons into words?

>> E.g., how does U+1F624 😤 compare with U+1F62C 😬?

They display identically in Notepad using Lucida Console, but I'm OK
with that.  So if anyone seeks an easy method for translating emoji
characters into meaningless little rectangles, there you go!

Best regards,

James Kass


From verdy_p at wanadoo.fr  Thu Nov 17 22:27:25 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 18 Nov 2016 05:27:25 +0100
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <CABPY6Z2JURyPrSnqm8E26thDHV5zRGtNG1z8J3QnHyN55sedJg@mail.gmail.com>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
 <CABPY6Z2JURyPrSnqm8E26thDHV5zRGtNG1z8J3QnHyN55sedJg@mail.gmail.com>
Message-ID: <CAGa7JC3q76m5=rpbLMjNN5Zaee0SHzC3je9X2D8UE=DvnNRRjw@mail.gmail.com>

Such systems have long existed in various forums and chats: you
write a word between colons and get the emoji without having to
select it from a list or remember its code point and use complex input, and
there's a way to reverse this conversion if needed. The conversion of
":colon-bracketed-words:" to emojis has frequent false positives, notably
with punctuation: I've regularly seen false conversions of "-)" or similar
into undesired emojis.

There's no evident and universal way to convert emojis to natural language,
and you'll sometimes collide with non-emoji meanings as well. I've seen some
forums substituting programming code (properly tagged as such using
surrounding markup such as <code>...</code> or <pre>...</pre> or
<kbd>...</kbd>) and replacing it with nonsense emojis. The same could
happen in the reverse direction (even if you surround the ":word:" with
additional spaces). Even if you choose some keywords or markup such as
"<emoji>smiley</emoji>" instead of " :-) " or " :smiley: ", you may break
tabular data (using ":" as column separators).
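
A minimal sketch of the false-positive problem described above, in Python.
The shortcode table and the function name naive_replace are hypothetical,
made up for illustration; they are not taken from any particular forum
software. The point is only that a context-free ":word:" substitution also
rewrites colon-separated data.

```python
import re

# Hypothetical shortcode table; real forums use much larger ones.
SHORTCODES = {"smiley": "\U0001F603", "x": "\u274C"}

def naive_replace(text):
    """Replace every :word: found in SHORTCODES, with no context check."""
    return re.sub(
        r":([a-z_]+):",
        lambda m: SHORTCODES.get(m.group(1), m.group(0)),
        text,
    )

print(naive_replace("hello :smiley:"))  # intended use
print(naive_replace("a:x:b"))           # colon-separated data is altered too
print(naive_replace("10:30:45"))        # digits do not match, left alone
```

A less naive converter would at least skip text inside code markup and
require whitespace around the shortcode, though as noted above even that
does not remove every collision.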


2016-11-18 4:55 GMT+01:00 James Kass <jameskasskrv at gmail.com>:

> Doug Ewell responded to Peter Constable,
>
> >> then an automated system could translate one user's message to
> >> display an emoji to a second user that more closely reflects
> >> the emotion intended by the first user.
> >
> > Or, people could just say what they mean, using language.
>
> How about some kind of automated system for translating icons into words?
>
> >> E.g., how does U+1F624 😤 compare with U+1F62C 😬?
>
> They display identically in Notepad using Lucida Console, but I'm OK
> with that.  So if anyone seeks an easy method for translating emoji
> characters into meaningless little rectangles, there you go!
>
> Best regards,
>
> James Kass
>
>

From jameskasskrv at gmail.com  Fri Nov 18 00:06:32 2016
From: jameskasskrv at gmail.com (James Kass)
Date: Thu, 17 Nov 2016 22:06:32 -0800
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <CAGa7JC3q76m5=rpbLMjNN5Zaee0SHzC3je9X2D8UE=DvnNRRjw@mail.gmail.com>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
 <CABPY6Z2JURyPrSnqm8E26thDHV5zRGtNG1z8J3QnHyN55sedJg@mail.gmail.com>
 <CAGa7JC3q76m5=rpbLMjNN5Zaee0SHzC3je9X2D8UE=DvnNRRjw@mail.gmail.com>
Message-ID: <CABPY6Z0NmBxmLu9mG24-Opgn+=WOxDfHadZRL1uFdFCUv0b-fQ@mail.gmail.com>

Philippe Verdy wrote,

> There's no evident and universal way to convert
> emojis to natural language ...

Indeed.  Emoji characters apparently mean whatever their users want them to
mean.  Such meanings may be perceived differently by various users or
communities, as the subject line indicates, and these meanings are subject
to change without notice.  Any effort to standardize such a conversion
seems doomed, but someone with funding would probably try it anyway.

Best regards,
James Kass

From christoph.paeper at crissov.de  Fri Nov 18 00:27:55 2016
From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=)
Date: Fri, 18 Nov 2016 07:27:55 +0100
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
Message-ID: <82FB3E37-F903-4BCF-8320-2DBDE0BC41F8@crissov.de>

Doug Ewell <doug at ewellic.org>:
> 
> Or, people could just say what they mean, using language.

That's not how language (or communication in general) works. At all.

From jameskasskrv at gmail.com  Fri Nov 18 00:55:20 2016
From: jameskasskrv at gmail.com (James Kass)
Date: Thu, 17 Nov 2016 22:55:20 -0800
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <82FB3E37-F903-4BCF-8320-2DBDE0BC41F8@crissov.de>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
 <82FB3E37-F903-4BCF-8320-2DBDE0BC41F8@crissov.de>
Message-ID: <CABPY6Z1Y1zjDqcsCLcBfaB9nebK-vp1+Saqnv=8Whp-gtujbWg@mail.gmail.com>

Christoph Päper wrote,

>> Or, people could just say what they mean, using language.
>
> That's not how language (or communication in general) works. At all.

Language works best when people say what they mean and mean what they
say, just as democracy works best with an informed electorate.  The
absence of either factor would tend to break down communication in
general.  Are we communicating with language here?

Best regards,

James Kass


From Shawn.Steele at microsoft.com  Fri Nov 18 01:30:53 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Fri, 18 Nov 2016 07:30:53 +0000
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
Message-ID: <MWHPR03MB2813FF6603BC1FAF2174110882B00@MWHPR03MB2813.namprd03.prod.outlook.com>

> Or, people could just say what they mean, using language.

Hmm, some languages don't have words to express what one means (or feels) in every circumstance.  I've used emoji when the concept would be tough, or impossible, to convey accurately in English.

-Shawn


From verdy_p at wanadoo.fr  Fri Nov 18 01:40:09 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 18 Nov 2016 08:40:09 +0100
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <CABPY6Z0NmBxmLu9mG24-Opgn+=WOxDfHadZRL1uFdFCUv0b-fQ@mail.gmail.com>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
 <CABPY6Z2JURyPrSnqm8E26thDHV5zRGtNG1z8J3QnHyN55sedJg@mail.gmail.com>
 <CAGa7JC3q76m5=rpbLMjNN5Zaee0SHzC3je9X2D8UE=DvnNRRjw@mail.gmail.com>
 <CABPY6Z0NmBxmLu9mG24-Opgn+=WOxDfHadZRL1uFdFCUv0b-fQ@mail.gmail.com>
Message-ID: <CAGa7JC3vLMMLrsNcjk+1ZTCd3KrDgdB0uwk5ZCO5u78Bc=tSgg@mail.gmail.com>

I would even add that emojis are in fact a new separate language, written
with its own script, its own grammar/syntax, and its specific layout and
combinations (ligatured clusters, partly documented in Unicode) and
sometimes specifics about rendering colors (e.g. human skin
colors, or national flags if they are colorized).

I think it would merit a language code of its own. But you could use some
special language codes for notations, if "zxx" (no linguistic content) is
not appropriate. (The same remark applies to musical notations.)

2016-11-18 7:06 GMT+01:00 James Kass <jameskasskrv at gmail.com>:

>
> Philippe Verdy wrote,
>
> > There's no evident and universal way to convert
> > emojis to natural language ...
>
> Indeed.  Emoji characters apparently mean whatever their users want them
> to mean.  Such meanings may be perceived differently by various users or
> communities, as the subject line indicates, and these meanings are subject
> to change without notice.  Any effort to standardize such a conversion
> seems doomed, but someone with funding would probably try it anyway.
>
> Best regards,
> James Kass
>
>

From A.Schappo at lboro.ac.uk  Fri Nov 18 03:26:06 2016
From: A.Schappo at lboro.ac.uk (Andre Schappo)
Date: Fri, 18 Nov 2016 09:26:06 +0000
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <CAGa7JC3vLMMLrsNcjk+1ZTCd3KrDgdB0uwk5ZCO5u78Bc=tSgg@mail.gmail.com>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
 <CABPY6Z2JURyPrSnqm8E26thDHV5zRGtNG1z8J3QnHyN55sedJg@mail.gmail.com>
 <CAGa7JC3q76m5=rpbLMjNN5Zaee0SHzC3je9X2D8UE=DvnNRRjw@mail.gmail.com>
 <CABPY6Z0NmBxmLu9mG24-Opgn+=WOxDfHadZRL1uFdFCUv0b-fQ@mail.gmail.com>
 <CAGa7JC3vLMMLrsNcjk+1ZTCd3KrDgdB0uwk5ZCO5u78Bc=tSgg@mail.gmail.com>
Message-ID: <AD38B14D-7C58-4591-8882-5C25CCD37B4A@lboro.ac.uk>


As Richard Ishida insightfully points out: should Emoji sequences/phrases/sentences adhere to the human language context, e.g. a Japanese Emoji sequence could/should be in Japanese "Subject - Object - Verb" order https://twitter.com/r12a/status/798151134963757056

André Schappo

On 18 Nov 2016, at 07:40, Philippe Verdy <verdy_p at wanadoo.fr<mailto:verdy_p at wanadoo.fr>> wrote:

I would even add that Emojis are in fact a new separate language, written with its own script, its own grammar/syntax, and its specific layout and combinations (ligatured clusters, partly documented in Unicode) and sometimes specificities about colors of rendering (e.g. the human skin colors, or national flags if they are colorized).

I think it would merit a language code for itself. But you could use some special language codes for notations, if "zxx" (no linguistic content) is not appropriate. (same remark about musical notations)

2016-11-18 7:06 GMT+01:00 James Kass <jameskasskrv at gmail.com<mailto:jameskasskrv at gmail.com>>:

Philippe Verdy wrote,

> There's no evident and universal way to convert
> emojis to natural language ...

Indeed.  Emoji characters apparently mean whatever their users want them to mean.  Such meanings may be perceived differently by various users or  communities, as the subject line indicates, and these meanings are subject to change without notice.  Any effort to standardize such a conversion seems doomed, but someone with funding would probably try it anyway.

Best regards,

James Kass



From duerst at it.aoyama.ac.jp  Fri Nov 18 04:24:40 2016
From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J._D=c3=bcrst?=)
Date: Fri, 18 Nov 2016 19:24:40 +0900
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <AD38B14D-7C58-4591-8882-5C25CCD37B4A@lboro.ac.uk>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
 <CABPY6Z2JURyPrSnqm8E26thDHV5zRGtNG1z8J3QnHyN55sedJg@mail.gmail.com>
 <CAGa7JC3q76m5=rpbLMjNN5Zaee0SHzC3je9X2D8UE=DvnNRRjw@mail.gmail.com>
 <CABPY6Z0NmBxmLu9mG24-Opgn+=WOxDfHadZRL1uFdFCUv0b-fQ@mail.gmail.com>
 <CAGa7JC3vLMMLrsNcjk+1ZTCd3KrDgdB0uwk5ZCO5u78Bc=tSgg@mail.gmail.com>
 <AD38B14D-7C58-4591-8882-5C25CCD37B4A@lboro.ac.uk>
Message-ID: <e148a1a0-fd5e-60cd-4c97-7b85ff417242@it.aoyama.ac.jp>

In many cases, emoji communication is a lot more complicated than just 
copying word order from the host language. See e.g.
https://www.wired.com/2016/08/how-teens-use-social-media/ for some examples.

Regards,   Martin.

On 2016/11/18 18:26, Andre Schappo wrote:
>
> As Richard Ishida insightfully points out: should Emoji sequences/phrases/sentences adhere to the human language context, e.g. a Japanese Emoji sequence could/should be in Japanese "Subject - Object - Verb" order https://twitter.com/r12a/status/798151134963757056
>
> André Schappo
>
> On 18 Nov 2016, at 07:40, Philippe Verdy <verdy_p at wanadoo.fr<mailto:verdy_p at wanadoo.fr>> wrote:
>
> I would even add that Emojis are in fact a new separate language, written with its own script, its own grammar/syntax, and its specific layout and combinations (ligatured clusters, partly documented in Unicode) and sometimes specificities about colors of rendering (e.g. the human skin colors, or national flags if they are colorized).
>
> I think it would merit a language code for itself. But you could use some special language codes for notations, if "zxx" (no linguistic content) is not appropriate. (same remark about musical notations)
>
> 2016-11-18 7:06 GMT+01:00 James Kass <jameskasskrv at gmail.com<mailto:jameskasskrv at gmail.com>>:
>
> Philippe Verdy wrote,
>
>> There's no evident and universal way to convert
>> emojis to natural language ...
>
> Indeed.  Emoji characters apparently mean whatever their users want them to mean.  Such meanings may be perceived differently by various users or  communities, as the subject line indicates, and these meanings are subject to change without notice.  Any effort to standardize such a conversion seems doomed, but someone with funding would probably try it anyway.
>
> Best regards,
>
> James Kass
>
>
>

-- 
Prof. Dr.sc. Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara
252-5258 Japan

From otto.stolz at uni-konstanz.de  Fri Nov 18 05:49:49 2016
From: otto.stolz at uni-konstanz.de (Otto Stolz)
Date: Fri, 18 Nov 2016 12:49:49 +0100
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
Message-ID: <582EEADD.2070302@uni-konstanz.de>

Am 18.11.2016 um 00:31 schrieb Doug Ewell:
> Or, people could just say what they mean, using language.

This is not so easy, as Lewis Carroll had already seen;
cf. this snippet from "Alice in Wonderland":
> "Then you should say what you mean," the March Hare went on.
> "I do," Alice hastily replied; "at least--at least I mean what I say--
> that's the same thing, you know."
> "Not the same thing a bit!" said the Hatter.


Best wishes,
    Otto


From wjgo_10009 at btinternet.com  Fri Nov 18 09:41:36 2016
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 18 Nov 2016 15:41:36 +0000 (GMT)
Subject: "Oh that's what you meant!: reducing emoji misunderstanding"
In-Reply-To: <AD38B14D-7C58-4591-8882-5C25CCD37B4A@lboro.ac.uk>
References: <20161117163134.665a7a7059d7ee80bb4d670165c8327d.e5ed1c79b6.wbe@email03.godaddy.com>
 <CABPY6Z2JURyPrSnqm8E26thDHV5zRGtNG1z8J3QnHyN55sedJg@mail.gmail.com>
 <CAGa7JC3q76m5=rpbLMjNN5Zaee0SHzC3je9X2D8UE=DvnNRRjw@mail.gmail.com>
 <CABPY6Z0NmBxmLu9mG24-Opgn+=WOxDfHadZRL1uFdFCUv0b-fQ@mail.gmail.com>
 <CAGa7JC3vLMMLrsNcjk+1ZTCd3KrDgdB0uwk5ZCO5u78Bc=tSgg@mail.gmail.com>
 <AD38B14D-7C58-4591-8882-5C25CCD37B4A@lboro.ac.uk>
Message-ID: <2292097.44071.1479483696494.JavaMail.defaultUser@defaultHost>

André Schappo wrote:

> As Richard Ishida insightfully points out: should Emoji sequences/phrases/sentences adhere to the human language context, e.g. a Japanese Emoji sequence could/should be in Japanese "Subject - Object - Verb" order https://twitter.com/r12a/status/798151134963757056

As it happens I have recently been designing some emoji grammatical operator characters. They are abstract emoji.

The concept is that the emoji grammatical operator operates on the emoji character that follows it, so as to provide a grammatical context for the emoji character. 

Each of the characters is designed to be on a 7 by 7 grid, and is one contiguous piece with no inner hole. 

Lines are always one unit wide and only corners and T junctions are allowed. 

I have now added images of glyph designs for fifteen emoji grammatical operator characters to the web.

They are included on the following web page.

http://www.users.globalnet.co.uk/~ngo/abstract_emoji.htm

That page is linked from the following web page.

http://www.users.globalnet.co.uk/~ngo/library.htm

I have attached copies of two of the images to this email as examples.

They are as follows.

emoji_grammatical_operator_verb_pluperfect_tense.png

emoji_grammatical_operator_noun_direct_object.png

William Overington

Friday 18 November 2016

-------------- next part --------------
A non-text attachment was scrubbed...
Name: emoji_grammatical_operator_noun_direct_object.png
Type: image/png
Size: 3013 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161118/7a22e874/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emoji_grammatical_operator_verb_pluperfect_tense.png
Type: image/png
Size: 3022 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161118/7a22e874/attachment-0001.png>

From verdy_p at wanadoo.fr  Sun Nov 20 02:46:14 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 20 Nov 2016 09:46:14 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
Message-ID: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>

How should the following Japanese **paragraph** be displayed when inserted
in a RTL context (Arabic/Farsi/...) ?

    ?Japanese1?Japanese2

What I see in browsers is:

    Japanese1?Japanese2 ?

Why don't the Japanese brackets pair together to avoid having one mirrored
and not the other one?

From simon at simon-cozens.org  Sun Nov 20 04:22:46 2016
From: simon at simon-cozens.org (Simon Cozens)
Date: Sun, 20 Nov 2016 21:22:46 +1100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
Message-ID: <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>

On 20/11/2016 19:46, Philippe Verdy wrote:
> Why don't the Japanese brackets pair together to avoid having one
> mirrored and not the other one?

Isn't this the classic bidi brackets problem? The ? is assumed to belong
to the base level because it's bidi neutral, but the ? is assumed to be
part of the LTR text, so they end up in different isolating runs.

I don't think there's anything special about Japanese here. The same
happens for () brackets and English text.
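
A minimal sketch of the character properties involved, assuming Python's
standard unicodedata module. It only looks up the Bidi_Class and
Bidi_Mirrored properties; it does not run the bidirectional algorithm.
The helper name bidi_info is hypothetical, and the corner brackets
U+300C/U+300D are an assumption, since the brackets in the original
message did not survive archiving.

```python
import unicodedata

def bidi_info(ch):
    """Return (bidi class, mirrored flag) for one character."""
    return unicodedata.bidirectional(ch), unicodedata.mirrored(ch)

# Sample characters chosen for illustration: ASCII parentheses, CJK corner
# brackets (assumed), a Latin letter, and an Arabic letter.
samples = ["(", ")", "\u300C", "\u300D", "A", "\u0627"]

for ch in samples:
    bidi_class, mirrored = bidi_info(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bidi_class}, mirrored={mirrored}")
```

Both kinds of bracket report class ON (other neutral) with the mirrored
flag set, which is why their resolved direction depends on the
surrounding text rather than on the brackets themselves.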

From verdy_p at wanadoo.fr  Sun Nov 20 04:52:01 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 20 Nov 2016 11:52:01 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
Message-ID: <CAGa7JC3rXdXs_y-8sdhWSDf=f8NpaiKh6h_=z4wEh7c1cY0A7A@mail.gmail.com>

Wasn't this corrected so that the direction of such "bidi neutral" pairs
should match, i.e. the leading character would adopt the direction of the
trailing one in the same pair, rather than inheriting the direction from
the outer context?

2016-11-20 11:22 GMT+01:00 Simon Cozens <simon at simon-cozens.org>:

> On 20/11/2016 19:46, Philippe Verdy wrote:
> > Why don't the Japanese brackets pair together to avoid having one
> > mirrored and not the other one?
>
> Isn't this the classic bidi brackets problem? The ? is assumed to belong
> to the base level because it's bidi neutral, but the ? is assumed to be
> part of the LTR text, so they end up in different isolating runs.
>
> I don't think there's anything special about Japanese here. The same
> happens for () brackets and English text.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20161120/cf97a459/attachment.html>

From ismeta.wikt at gmail.com  Sun Nov 20 05:14:18 2016
From: ismeta.wikt at gmail.com (IS META)
Date: Sun, 20 Nov 2016 11:14:18 +0000
Subject: Unicode Digest, Vol 35, Issue 16
In-Reply-To: <mailman.0.1479578401.11142.unicode@unicode.org>
References: <mailman.0.1479578401.11142.unicode@unicode.org>
Message-ID: <CAFV=FfhUWVjzYTMSm3zqMBZ_SmwyPfD+kzkT-FU3P7nkbB9png@mail.gmail.com>

Dear William Overington,
Your abstract emoji are interesting. I am especially pleased that your *noun
brown* emoji express a number of grammatical cases. However, your *Some
designs for emoji of personal pronouns* is less flexible, wherein the
pronouns can only express singular and plural grammatical numbers. Is there
any chance that the system may be modified to enable the expression of dual
grammatical number? Though the dual number is rarer than the
singular–plural distinction, it occurs in many languages, including major
ones like Classical Greek, Sanskrit, and Modern Standard Arabic, and it is
far more widespread in pronominal systems. Perhaps the way American Sign
Language expresses the dual number could provide some inspiration for this.

Yours sincerely,
I.S.M.E.T.A.

On Sat, Nov 19, 2016 at 6:00 PM, <unicode-request at unicode.org> wrote:

> ---------- Forwarded message ----------
> From: William_J_G Overington <wjgo_10009 at btinternet.com>
> To: A.Schappo at lboro.ac.uk, unicode at unicode.org
> Cc:
> Date: Fri, 18 Nov 2016 15:41:36 +0000 (GMT)
> Subject: Re: "Oh that's what you meant!: reducing emoji misunderstanding"
> André Schappo wrote:
>
> > As Richard Ishida insightfully points out ? should Emoji
> sequences/phrases/sentences adhere to the human language context eg a
> Japanese Emoji sequence could/should be in Japanese "Subject - Object -
> Verb" order https://twitter.com/r12a/status/798151134963757056
>
> As it happens I have recently been designing some emoji grammatical
> operator characters. They are abstract emoji.
>
> The concept is that the emoji grammatical operator operates on the emoji
> character that follows it, so as to provide a grammatical context for the
> emoji character.
>
> Each of the characters is designed to be on a 7 by 7 grid, and is one
> contiguous piece with no inner hole.
>
> Lines are always one unit wide and only corners and T junctions are
> allowed.
>
> I have now added images of glyph designs for fifteen emoji grammatical
> operator characters to the web.
>
> They are included on the following web page.
>
> http://www.users.globalnet.co.uk/~ngo/abstract_emoji.htm
>
> That page is linked from the following web page.
>
> http://www.users.globalnet.co.uk/~ngo/library.htm
>
> I have attached copies of two of the images to this email as examples.
>
> They are as follows.
>
> emoji_grammatical_operator_verb_pluperfect_tense.png
>
> emoji_grammatical_operator_noun_direct_object.png
>
> William Overington
>
> Friday 18 November 2016

From eliz at gnu.org  Sun Nov 20 09:27:41 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Sun, 20 Nov 2016 17:27:41 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 (message from Philippe Verdy on Sun, 20 Nov 2016 09:46:14 +0100)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
Message-ID: <83twb29no2.fsf@gnu.org>

> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Sun, 20 Nov 2016 09:46:14 +0100
> 
> How should the following Japanese **paragraph** be displayed when inserted in a RTL context
> (Arabic/Farsi/...) ?
> 
> ?Japanese1?Japanese2
> 
> What I see in browsers is:
> 
> Japanese1?Japanese2 ?
> 
> Why don't the Japanese brackets pair together to avoid having one mirrored and not the other one?

I guess your browser doesn't support the full Unicode 9.0 UBA.
Emacs 25, for example, does the right thing: I see

                                                      Japanese2 ?Japanese1?

(flushed all the way to the right margin of the window), as expected.

P.S. I assume that by "RTL context" you mean right-to-left base
paragraph direction.

From eliz at gnu.org  Sun Nov 20 09:29:35 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Sun, 20 Nov 2016 17:29:35 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org> (message
 from Simon Cozens on Sun, 20 Nov 2016 21:22:46 +1100)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
Message-ID: <83shqm9nkw.fsf@gnu.org>

> From: Simon Cozens <simon at simon-cozens.org>
> Date: Sun, 20 Nov 2016 21:22:46 +1100
> 
> On 20/11/2016 19:46, Philippe Verdy wrote:
> > Why don't the Japanese brackets pair together to avoid having one
> > mirrored and not the other one?
> 
> Isn't this the classic bidi brackets problem? The opening bracket is assumed
> to belong to the base level because it's bidi neutral, but the closing bracket
> is assumed to be part of the LTR text, so they end up in different isolating runs.

The UBA was changed in Unicode 6.3 to process mirrored bracket pairs
specially, to avoid this issue.  But not all browsers have caught up with
that yet.
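
To see which characters are candidates for this special treatment, one can
inspect their properties. The sketch below uses only Python's standard
unicodedata module. It assumes that the Japanese brackets in the example are
the corner brackets U+300C and U+300D, which the archived text does not
confirm; the helper name is hypothetical.

```python
import unicodedata

def is_neutral_mirrored(ch):
    """True if ch is bidi-neutral (class ON) and has the Bidi_Mirrored property."""
    return unicodedata.bidirectional(ch) == "ON" and unicodedata.mirrored(ch) == 1

# Assumption: corner brackets stand in for the brackets lost in the archive;
# ASCII parentheses are listed for comparison.
for ch in "\u300c\u300d()":
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),  # bidi class
        unicodedata.mirrored(ch),       # 1 if Bidi_Mirrored
    )
```

Both kinds of bracket are neutral and mirrored, so nothing here is specific
to Japanese.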

From verdy_p at wanadoo.fr  Sun Nov 20 10:20:49 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 20 Nov 2016 17:20:49 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <83shqm9nkw.fsf@gnu.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
Message-ID: <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>

So it is an issue with Chrome, which still does not use the new rules. I
thought it was already using them.

The alignment of the paragraph to the right is optional; it is less
essential. It would still be satisfactory to see:

Japanese2 ?Japanese1?

That alignment is preferred only when it is a separate paragraph, but if the
Japanese citation is within an Arabic paragraph encoded as:

ARABIC-ONE "?Japanese1?Japanese2" ARABIC-TWO

I expect to see

                         OWT-CIBARA Japanese2 ?Japanese1?"" ENO-CIBARA

aligned to the right margin, or:

   OWT-CIBARA Japanese2 ?Japanese1?"" ENO-CIBARA

if it occurs in an Arabic document.

There's still the problem of surrounding quotation marks that don't form
matching pairs (unlike brackets). That's why authors will likely use
mirrorable quotation marks, or will need to surround the Japanese citation
and the quotation marks with some isolation, using <bdi>...</bdi> or
equivalent bidi isolate controls, or an LTR override control for the leading
quotation mark, to get:

   OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA

Some bidi processors may opt for matching quotation mark pairs such
as "..." or ?...? or ?...? or ?...? or ?...? or ?...?, but it is well known
that this won't work if quotation marks are not paired or use the same
mirrorable character for the leading and trailing quotation marks, as in
?...?.

The same problem arises if quotations span multiple paragraphs, where an
additional quotation mark leads each additional paragraph in the same
quotation (to say that the quotation continues), with only one quotation
mark at the end of the last paragraph. These can't be paired easily without
ambiguities, or without a more complex resolution which will be
language-dependent and would probably require additional markup of the
language used in the citation text itself, or for the whole container
including the quotation marks. An example of this complex case is

   ? CITATION1
   ? CITATION2
   ? CITATION3 ?, Author

This style above is parsable by considering that any "trailing" quotation
mark leading a line cannot really be a trailing mark (it is then a
continuation mark), and that to match the trailing quotation mark you need
to look further, possibly across multiple paragraphs.

As far as I know, there's no easy way to encode in plain-text Unicode only
(without markup) that continuation marks should be ignored by Bidi
processors when matching pairs, except by putting these continuation marks
in isolates (e.g. the continuation marks just before CITATION2 and
CITATION3 above would be encoded as <LRI,?,PDI>, or in HTML as <bdi>?</bdi>).

There's no easy solution for this case except by using some isolation with
an explicit direction set to surround the whole (<bdi dir="ltr">...</bdi>
or LRI...PDI). It is notable that most quotation marks are also not
mirrorable, but pseudo-mirroring by replacing these marks may be done in
language-dependent processors.
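
As a rough illustration of the plain-text isolation mentioned above, the
following Python sketch wraps a string in the Unicode isolate controls. The
code points are the standard ones (LRI U+2066, RLI U+2067, FSI U+2068,
PDI U+2069); the helper function isolate() is hypothetical, not a library
API.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(text, direction="ltr"):
    """Wrap text in a bidi isolate (hypothetical helper for illustration)."""
    opener = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return opener + text + PDI

# Plain-text counterpart of <bdi dir="ltr">...</bdi> around a quoted citation.
wrapped = isolate('"Japanese1 Japanese2"', "ltr")
```

A renderer that implements isolates treats the wrapped span as a single
neutral unit in the surrounding RTL text, so the quotation marks inside it
are resolved against the LTR direction rather than against the Arabic
context.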


From eliz at gnu.org  Sun Nov 20 10:37:23 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Sun, 20 Nov 2016 18:37:23 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 (message from Philippe Verdy on Sun, 20 Nov 2016 17:20:49 +0100)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
Message-ID: <83h9729kfw.fsf@gnu.org>

> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Sun, 20 Nov 2016 17:20:49 +0100
> Cc: Simon Cozens <simon at simon-cozens.org>, 
> 	unicode Unicode Discussion <unicode at unicode.org>
> 
> The alignment of the paragraph to the right is optional, it is less essential.

It's essential for people who speak those languages.  Not seeing the
alignment would cause some brows to be raised (and can also cause
incorrect reading in some marginal cases).

> That alignment is preferred only when it is a separate paragraph, but if the Japanese citation is within an Arabic
> paragraph encoded as:
> 
> ARABIC-ONE "?Japanese1?Japanese2" ARABIC-TWO
> 
> I expect to see
> 
> OWT-CIBARA Japanese2 ?Japanese1?"" ENO-CIBARA
> 
> aligned to the right margin

No, you should see this:

 OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA

That's what Emacs shows me.

> There's still the problem of surrounding quotation marks that don't form matching pairs (unlike brackets), that's
> why authors will likely use mirrorable quotation marks, or will need to surround the Japanese citation and the
> quotations using some isolation using <bdi>...</bdi> or equivalent bidi isolate controls, or an LTR override
> control for the leading quotation mark to get:

I don't see any problems with quotes in Emacs, see above.

> Some bidi processors may opt for matching quotation mark pairs such as "..." or ?...? or ?...? or ?...? or
> ?...? or ?...?, but it is well known that this won't work if quotation marks are not paired or use the same
> mirrorable character for the leading and trailing quotation marks, as in ?...?.

They do match here without any problems.

From verdy_p at wanadoo.fr  Sun Nov 20 10:58:54 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 20 Nov 2016 17:58:54 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <83h9729kfw.fsf@gnu.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
Message-ID: <CAGa7JC2+fNZNFfmwmQYpjeyrQ97P38vG975Pvz4EAMa0wtC0+Q@mail.gmail.com>

2016-11-20 17:37 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:

> > From: Philippe Verdy <verdy_p at wanadoo.fr>
> > Date: Sun, 20 Nov 2016 17:20:49 +0100
> > Cc: Simon Cozens <simon at simon-cozens.org>,
> >       unicode Unicode Discussion <unicode at unicode.org>
> >
> > The alignment of the paragraph to the right is optional, it is less
> > essential.
>
> It's essential for people who speak those languages.  Not seeing the
> alignment would cause some brows to be raised (and can also cause
> incorrect reading in some marginal cases).
>
> > That alignment is preferred only when it is a separate paragraph, but if
> > the Japanese citation is within an Arabic paragraph encoded as:
> >
> > ARABIC-ONE "?Japanese1?Japanese2" ARABIC-TWO
> >
> > I expect to see
> >
> > OWT-CIBARA Japanese2 ?Japanese1?"" ENO-CIBARA
> >
> > aligned to the right margin
>
> No, you should see this:
>
>  OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA
>
> That's what Emacs shows me.
>

That's because Emacs uses some "smart quote" processing, but it is
absolutely not part of the Unicode Bidi standard. This is an extension:
"smart quote" matching is known to be defective in all processors in many
cases because they assume rules used for specific languages, but they DO
NOT work properly, notably with multilingual text where various
languages use quotation marks very differently and in incompatible ways!!!

The ASCII quotes are neither opening nor closing; they do not form
**clear** pairs (e.g. when I speak about the two characters ' " ' and " ' ",
smart processors are unable to correctly guess how single and double quotes
are pairing, or whether they are really pairing or not!!!).
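
The point about ASCII quotes carrying no opening or closing role can be
checked against the Unicode general categories. This is a minimal sketch
using Python's standard unicodedata module; describe() is a hypothetical
helper name.

```python
import unicodedata

def describe(ch):
    """Return (general category, Bidi_Mirrored flag) for one character."""
    return unicodedata.category(ch), unicodedata.mirrored(ch)

# ASCII quotes versus curly quotes versus a real bracket.
for ch in ['"', "'", "\u201c", "\u201d", "("]:
    print(f"U+{ord(ch):04X}", describe(ch))
```

The ASCII quotes are in category Po (other punctuation), whereas U+201C and
U+201D are Pi and Pf (initial and final quote punctuation), and only the
parenthesis is both Ps and mirrored.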

Emacs will be as stupid as other word processors here if it uses its "smart
quotes" to tune the behavior of the Bidi algorithm (IMHO this is clearly a
real BUG of Emacs if it does that; this will never be portable and this
behavior is completely unpredictable).

Here I was speaking about the standard Bidi algorithm (also part of HTML
and SVG, and implemented in browsers). None of them can use any "smart
quote" processing; only some word processors may do that, but with
interaction with users during editing, and NEVER for rendering a read-only
document, because those "smart quotes" are just guesses for the most frequent
cases, and there are many exceptions, notably in multilingual documents
like here.

From eliz at gnu.org  Sun Nov 20 11:24:05 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Sun, 20 Nov 2016 19:24:05 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2+fNZNFfmwmQYpjeyrQ97P38vG975Pvz4EAMa0wtC0+Q@mail.gmail.com>
 (message from Philippe Verdy on Sun, 20 Nov 2016 17:58:54 +0100)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2+fNZNFfmwmQYpjeyrQ97P38vG975Pvz4EAMa0wtC0+Q@mail.gmail.com>
Message-ID: <83d1hq9ia2.fsf@gnu.org>

> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Sun, 20 Nov 2016 17:58:54 +0100
> Cc: Simon Cozens <simon at simon-cozens.org>, 
> 	unicode Unicode Discussion <unicode at unicode.org>
> 
>  No, you should see this:
> 
>  OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA
> 
>  That's what Emacs shows me.
> 
> That's because EMACS uses some "smart quote" processing

It doesn't.  It might have bugs in its UBA implementation, but
otherwise it just implements the UBA.  I wrote it, so I should know.

I believe in this case there's no bug, since each quote is between an
LTR and an RTL character, so they both take the base paragraph level.

FWIW, I see the same behavior in Notepad.

From eliz at gnu.org  Sun Nov 20 11:48:00 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Sun, 20 Nov 2016 19:48:00 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <83d1hq9ia2.fsf@gnu.org> (message from Eli Zaretskii on Sun, 20
 Nov 2016 19:24:05 +0200)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2+fNZNFfmwmQYpjeyrQ97P38vG975Pvz4EAMa0wtC0+Q@mail.gmail.com>
 <83d1hq9ia2.fsf@gnu.org>
Message-ID: <837f7y9h67.fsf@gnu.org>

> Date: Sun, 20 Nov 2016 19:24:05 +0200
> From: Eli Zaretskii <eliz at gnu.org>
> Cc: simon at simon-cozens.org, unicode at unicode.org
> 
> > From: Philippe Verdy <verdy_p at wanadoo.fr>
> > Date: Sun, 20 Nov 2016 17:58:54 +0100
> > Cc: Simon Cozens <simon at simon-cozens.org>, 
> > 	unicode Unicode Discussion <unicode at unicode.org>
> > 
> >  No, you should see this:
> > 
> >  OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA
> > 
> >  That's what Emacs shows me.
> > 
> > That's because EMACS uses some "smart quote" processing
> 
> It doesn't.  It might have bugs in its UBA implementation, but
> otherwise it just implements the UBA.  I wrote it, so I should know.
> 
> I believe in this case there's no bug, since each quote is between an
> LTR and an RTL character, so they both take the base paragraph level.
> 
> FWIW, I see the same behavior in Notepad.

I've now double-checked this in the Reference Implementation, and it
also exhibits the same behavior I see in Emacs.  So I believe there's
no bug, and the display should be as shown above.

From verdy_p at wanadoo.fr  Sun Nov 20 11:50:18 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 20 Nov 2016 18:50:18 +0100
Subject: Unicode Digest, Vol 35, Issue 16
In-Reply-To: <CAFV=FfhUWVjzYTMSm3zqMBZ_SmwyPfD+kzkT-FU3P7nkbB9png@mail.gmail.com>
References: <mailman.0.1479578401.11142.unicode@unicode.org>
 <CAFV=FfhUWVjzYTMSm3zqMBZ_SmwyPfD+kzkT-FU3P7nkbB9png@mail.gmail.com>
Message-ID: <CAGa7JC3u0vmf_TCXrJ3FEOzq4WuX0kiUiqpimgHDpT9ds20zeg@mail.gmail.com>

2016-11-20 12:14 GMT+01:00 IS META <ismeta.wikt at gmail.com>:

> Dear William Overington,
> Your abstract emoji are interesting. I am especially pleased that your *noun
> brown* emoji express a number of grammatical cases. However, your *Some
> designs for emoji of personal pronouns* is less flexible, wherein the
> pronouns can only express singular and plural grammatical numbers. Is there
> any chance that the system may be modified to enable the expression of dual
> grammatical number? Though the dual number is rarer than the
> singular–plural distinction, it occurs in many languages, including major
> ones like Classical Greek, Sanskrit, and Modern Standard Arabic, and it is
> far more widespread in pronominal systems. Perhaps the way American Sign
> Language expresses the dual number could provide some inspiration for this.
>

For such graphical notations, there's absolutely no need to distinguish
singular and plural (many Asian languages do not have distinctive
grammatical numbers): if the numerical quantity is important, it should just
be represented directly by its value (e.g. by showing hands with a number of
fingers raised, but most probably by using digits directly).

On the contrary, I think it is much more important to be able to designate
the 1st person speaking, and whether she speaks for herself or in the name of
a group; the person(s) she is speaking to (either directly, or as the
representative of a group, but this could be a separate "privately" or
"alone" attribute); and a generic undesignated/impersonal 3rd person not
designating anyone (he/she/it/they), possibly with an additional attribute
(a number? an adjective for "near" versus "far", like in the distinction of
"this" and "that" or "here" and "there" in English, or "left" vs. "right",
or "front" vs. "back") to distinguish several entities.

But once again this discussion is about a long-standing personal invention by
William, who has long attempted to push it as a "standard", when he is
actually alone and not qualified on his own to be an academic source
representing an active community, and where he has never demonstrated the
existence of any active community supporting his "inventions" (often
self-contradictory and constantly changing).

In other words it is out of scope for the Unicode standard.

Emojis are definitely NOT used in the world the way that William thinks.
William has in fact long been inventing another script (which has nothing
in common with Emojis) but has not been able to convince a community to
use and support it. Borrowing Emojis inside his personally invented script
does not mean that Emojis are part of William's script.

But there's a very active community using Emojis (notably in Japan), with
active support from local providers of communication channels, which
initially developed separate incompatible solutions before thinking about
standardizing their usage on a commonly agreed set (because their users
wanted interoperability across providers and urged them to use compatible
schemes, without losing their freedom to use Emojis as they want, i.e.
without any strong "grammatical" rules).

However, there are much more promising scripts to think about, notably
SignWriting (but here also some Emojis could be borrowed; this does not mean
that Emojis are fully part of SignWriting, just as they are not directly
part of Han sinograms, or Kanas, or Latin)!

Emojis are and will remain a specific script that will never be able to
express a full human language, only some small isolated items whose
interpretation will remain very fuzzy, with an extremely minimalist
grammar and a minimalist orthography (the "ligature" clusters documented
in Unicode), so that they can be used in various languages having very
different grammars or conceptual models: the interpretation of emojis is
left to readers in some linguistic, territorial, cultural, or social
community that DOESN'T want any strong grammar: they really love the freedom
of speech and composition offered by Emojis, and certainly don't want such
a grammar!

So please keep William's proposed (unsupported) script completely out of
the way of the encoding of Emojis, which are and will remain isolated symbols,
with minimal interactions among themselves or with other scripts. I also
note that Emojis **should** all have neutral directionality, and
should all be mirrorable where appropriate (so that they'll be usable in LTR
or RTL contexts), unless they explicitly express "left" vs. "right"
semantics (but they could also express "start" vs. "end" semantics, which
MUST be mirrorable, and possibly even "rotatable" in vertical script
presentations).

From verdy_p at wanadoo.fr  Sun Nov 20 11:51:01 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 20 Nov 2016 18:51:01 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <83h9729kfw.fsf@gnu.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
Message-ID: <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>

>> I expect to see
>>
>>   OWT-CIBARA Japanese2 ?Japanese1?"" ENO-CIBARA
>>
>> aligned to the right margin
>No, you should see this:
>   OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA

Correction: I expect to see:

   OWT-CIBARA Japanese2" ?Japanese1?" ENO-CIBARA



From eliz at gnu.org  Sun Nov 20 12:19:04 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Sun, 20 Nov 2016 20:19:04 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 (message from Philippe Verdy on Sun, 20 Nov 2016 18:51:01 +0100)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
Message-ID: <834m329fqf.fsf@gnu.org>

> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Sun, 20 Nov 2016 18:51:01 +0100
> Cc: Simon Cozens <simon at simon-cozens.org>, 
> 	unicode Unicode Discussion <unicode at unicode.org>
> 
> Correction: I expect to see:
> 
> OWT-CIBARA Japanese2" ?Japanese1?" ENO-CIBARA

I don't understand why.

What do you expect with the brackets removed?  I expect this:

 OWT-CIBARA "Japanese1 Japanese2" ENO-CIBARA

because N0 and N1 are no-ops, and N2 clearly says that a neutral
character that is surrounded by text of different directionalities
takes the embedding direction.
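
The reasoning above can be written as a small Python sketch of rules N1 and
N2. It is deliberately simplified: every character is reduced to 'L', 'R' or
'N' (neutral); numbers, isolates and bracket pairs (N0) are ignored; and the
embedding direction is assumed to apply at both ends of the text. The
function name is hypothetical.

```python
def resolve_neutrals(types, embedding):
    """Simplified sketch of UBA rules N1 and N2.

    types: list of 'L', 'R' or 'N' (neutral), one entry per character.
    embedding: 'L' or 'R'; also used as the direction at both text boundaries.
    """
    out = list(types)
    i = 0
    while i < len(out):
        if out[i] != "N":
            i += 1
            continue
        # Find the end of this run of neutrals.
        j = i
        while j < len(out) and out[j] == "N":
            j += 1
        before = out[i - 1] if i > 0 else embedding
        after = out[j] if j < len(out) else embedding
        # N1: same strong direction on both sides, so the neutrals take it.
        # N2: otherwise the neutrals take the embedding direction.
        resolved = before if before == after else embedding
        out[i:j] = [resolved] * (j - i)
        i = j
    return out

# R = Arabic, N = space or ASCII quote, L = Japanese text, in an RTL paragraph.
print(resolve_neutrals(list("RNNLLNNR"), "R"))
```

In this reduced model each quote sits between an L and an R character, so
both quotes resolve to R, the embedding direction of the Arabic paragraph.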

From verdy_p at wanadoo.fr  Sun Nov 20 13:58:58 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 20 Nov 2016 20:58:58 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <834m329fqf.fsf@gnu.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org>
Message-ID: <CAGa7JC3Bfr9enEVZ6+_g-BWUdGbc8kfGxhyKDu4mmvdphbdVoA@mail.gmail.com>

2016-11-20 19:19 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:

> > From: Philippe Verdy <verdy_p at wanadoo.fr>
> > Date: Sun, 20 Nov 2016 18:51:01 +0100
> > Cc: Simon Cozens <simon at simon-cozens.org>,
> >       unicode Unicode Discussion <unicode at unicode.org>
> >
> > Correction: I expect to see:
> >
> > OWT-CIBARA Japanese2" ?Japanese1?" ENO-CIBARA
>
> I don't understand why.
>
> What do you expect with the brackets removed?  I expect this:
>
>  OWT-CIBARA "Japanese1 Japanese2" ENO-CIBARA
>
> because N0 and N1 are no-ops, and N2 clearly says that a neutral
> character that is surrounded by text of different directionalities
> takes the embedding direction.
>

With ASCII quotes, which are hard to match unambiguously in pairs, they would
normally inherit the direction of their prior context if they cannot be
paired. So the first quotation mark would take the RTL direction of
ARABIC-ONE. The second quotation mark would likewise inherit the LTR
direction of "Japanese2" and would go to its right.

The final effect would be that the quotes would appear glued side by side.
But note that the two Japanese brackets match each other, so no quotation
mark can end up between them: the whole bracketed section, including the
brackets, should behave as its own isolate. A quotation mark landing between
them occurs only with the old Bidi algorithm, which did not take bracket
pairs into account.
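
For illustration, bracket pairs can be identified with a stack, in the
spirit of definition BD16 of the Bidi algorithm. This Python sketch is not
the reference implementation: it uses a tiny hard-coded bracket table
(the real algorithm reads BidiBrackets.txt), assumes corner brackets
U+300C/U+300D for the Japanese brackets, and omits the stack depth limit and
canonical-equivalence handling.

```python
# Hypothetical, minimal bracket table for the example.
PAIRS = {"(": ")", "[": "]", "\u300c": "\u300d"}
CLOSERS = set(PAIRS.values())

def bracket_pairs(text):
    """Return sorted (open_index, close_index) pairs found in text."""
    stack = []   # entries: (expected closing bracket, index of the opener)
    found = []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((PAIRS[ch], i))
        elif ch in CLOSERS:
            # Search from the top of the stack for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == ch:
                    found.append((stack[depth][1], i))
                    del stack[depth:]  # drop the opener and everything above it
                    break
    return sorted(found)

print(bracket_pairs('"\u300cJapanese1\u300dJapanese2"'))
```

Only the two brackets are paired; the ASCII quotes never enter the stack,
which is why they are resolved by the ordinary neutral rules instead.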

So the [Japanese1] bracketed section should be OK with new renderers (this
is not the case with Chrome, which still uses the old algorithm), just after
the ARABIC-ONE and the leading quotation mark of the Japanese section.

But probably the correct rendering should rather be:

   OWT-CIBARA ?Japanese1? Japanese2"" ENO-CIBARA

unless ASCII quotation marks are paired, in which case you'll get:

   OWT-CIBARA "?Japanese1? Japanese2" ENO-CIBARA

which is most probably what is expected.

All this is about deciding whether a quotation mark is "leading" or
"trailing". This is not clear at all for ASCII quotation marks, and it has a
consequence on the final rendering made by the Bidi algorithm.

From verdy_p at wanadoo.fr  Sun Nov 20 14:19:40 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 20 Nov 2016 21:19:40 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC3Bfr9enEVZ6+_g-BWUdGbc8kfGxhyKDu4mmvdphbdVoA@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org>
 <CAGa7JC3Bfr9enEVZ6+_g-BWUdGbc8kfGxhyKDu4mmvdphbdVoA@mail.gmail.com>
Message-ID: <CAGa7JC3ZikqCHDTbZum1mLmYRwab1_hVBCtx+fXGtnvNzXqSig@mail.gmail.com>

Note that if you get :

   OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA

this means that the first quotation mark is "transparent" and preserves the
RTL direction.
And then I don't see how you can pair the final quotation mark, unless you
consider it as "leading" the ARABIC-TWO part (meaning that you don't pair
these quotation marks at all: only brackets are paired, and the fragment
?Japanese1? is correct because you are using the new Bidi algorithm).

There are still ambiguities in handling pairs of quotation marks. This is
not evident at all, and it is language-dependent: some languages do not
distinguish the glyphs for the leading and trailing marks, or swap them, for
example ?Deutsch? as opposed to ?Italiano? or ? français?. It is a
difficult problem in multilingual documents, not only those that mix RTL and
LTR scripts and need the Bidi algorithm, but also those in which different
LTR languages occur.

For citing Japanese in Arabic text, I would suggest using Asian
quotation marks by encoding:

  ARABIC-ONE ??Japanese1? Japanese2? ARABIC-TWO

so that Asian quotation marks will unambiguously pair together and you'll
get:

   OWT-CIBARA ?Japanese2 ?Japanese1?? ENO-CIBARA

Or, because ?? (like ??) are unambiguously LTR, which gives them a strong
LTR direction, you would then get the best result:

   OWT-CIBARA ??Japanese1? Japanese2? ENO-CIBARA

But if there are line wraps in the middle of the Japanese section:

 ??Japanese1?  ENO-CIBARA
   OWT-CIBARA Japanese2?

notably if you can't mirror the CJK quotation marks

Otherwise if you can mirror these marks :

 ?Japanese1?? ENO-CIBARA
   OWT-CIBARA ?Japanese2

or without any line-break in the middle of the Japanese quotation :

    OWT-CIBARA ?Japanese2?Japanese1?? ENO-CIBARA

(here I use? ? only as aliases for the mirrored??, which are not encoded)


2016-11-20 20:58 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

>
>
> 2016-11-20 19:19 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:
>
>> > From: Philippe Verdy <verdy_p at wanadoo.fr>
>> > Date: Sun, 20 Nov 2016 18:51:01 +0100
>> > Cc: Simon Cozens <simon at simon-cozens.org>,
>> >       unicode Unicode Discussion <unicode at unicode.org>
>> >
>> > Correction: I expect to see:
>> >
>> > OWT-CIBARA Japanese2" ?Japanese1?" ENO-CIBARA
>>
>> I don't understand why.
>>
>> What do you expect with the brackets removed?  I expect this:
>>
>>  OWT-CIBARA "Japanese1 Japanese2" ENO-CIBARA
>>
>> because N0 and N1 are no-ops, and N2 clearly says that a neutral
>> character that is surrounded by text of different directionalities
>> takes the embedding direction.
>>
>
> With ASCII quotes that are hard to match unambiguously in pairs, they
> would normally inherit what is in their prior context if they cannot be
> paired.
> So the first quotation mark would take the RTL direction of ARABIC-ONE.
> The second quotation mark would also inherit the LTR direction of
> "Japanese2" and would appear to its right.
>
> The final effect would be that quotes would appear glued side-by-side. But
> note that the two Japanese brackets match each other, so no quotation
> mark can come between them: the whole bracketed section, including the
> brackets, should form its own isolate. The glued quotes occur only with the
> old Bidi algorithm, which did not take bracket pairs into account.
>
> So the [Japanese1] bracketed section should be OK with new renderers (this
> is not the case with Chrome that still uses the old algorithm), just after
> the ARABIC-ONE and the leading quotation mark of the Japanese section.
>
> But probably the correct rendering should rather be:
>
>    OWT-CIBARA ?Japanese1? Japanese2"" ENO-CIBARA
>
> unless ASCII quotation marks are paired, in which case you'll get:
>
>    OWT-CIBARA "?Japanese1? Japanese2" ENO-CIBARA
>
> which is most probably what is expected.
>
> All this is about deciding whether a quotation mark is "leading" or
> "trailing". This is not clear at all for ASCII quotation marks, and it
> affects the final rendering produced by the Bidi algorithm.
>
>

From eliz at gnu.org  Sun Nov 20 21:39:56 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Mon, 21 Nov 2016 05:39:56 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC3ZikqCHDTbZum1mLmYRwab1_hVBCtx+fXGtnvNzXqSig@mail.gmail.com>
 (message from Philippe Verdy on Sun, 20 Nov 2016 21:19:40 +0100)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org>
 <CAGa7JC3Bfr9enEVZ6+_g-BWUdGbc8kfGxhyKDu4mmvdphbdVoA@mail.gmail.com>
 <CAGa7JC3ZikqCHDTbZum1mLmYRwab1_hVBCtx+fXGtnvNzXqSig@mail.gmail.com>
Message-ID: <83r3658prn.fsf@gnu.org>

> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Sun, 20 Nov 2016 21:19:40 +0100
> Cc: Simon Cozens <simon at simon-cozens.org>, 
> 	unicode Unicode Discussion <unicode at unicode.org>
> 
> Note that if you get :
> 
> OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA
> 
> this means that the first quotation mark is "transparent" and preserves the RTL direction.

Yes.  It takes the direction of the paragraph, which is RTL.

> And I don't see then how you can pair the final quotation mark, unless you consider it as "leading" the
> ARABIC-TWO part (meaning that you don't pair these quotation marks at all: only brackets are paired and the
> fragment?Japanese1? is correct (you are using the new Bidi algorithm).

The quotes don't need to pair, they just need both to have the
paragraph direction.  And that's what happens, because text on each
side of each quote has different directionality.  The UBA mandates
that the quote (which is ON) takes the embedding direction in that
case, and the embedding direction here is the base paragraph
direction.
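
The behaviour described above (a neutral such as the ASCII quote takes
the embedding direction when the text on its two sides differs in
direction) can be illustrated with a small sketch. This is a heavily
simplified model, not the full UBA: it assumes a single embedding level
(so the embedding direction equals the paragraph direction), it ignores
weak types, explicit formatting characters and bracket pairs (N0), and it
uses the UPPERCASE = RTL, lowercase = LTR test convention. The function
name resolve_neutrals is hypothetical.

```python
def resolve_neutrals(text, paragraph_dir="R"):
    """Resolve neutrals with a much-simplified form of UBA rules N1/N2.

    Assumed test convention: UPPERCASE letters are strong RTL, lowercase
    letters are strong LTR, and every other character is neutral.
    """
    def strong(ch):
        if ch.isalpha():
            return "R" if ch.isupper() else "L"
        return None  # neutral

    classes = [strong(ch) for ch in text]
    resolved = list(classes)
    i, n = 0, len(text)
    while i < n:
        if classes[i] is not None:
            i += 1
            continue
        # Find the end of this run of neutral characters.
        j = i
        while j < n and classes[j] is None:
            j += 1
        # At the edges of the text, fall back to the paragraph direction.
        before = classes[i - 1] if i > 0 else paragraph_dir
        after = classes[j] if j < n else paragraph_dir
        # N1: same strong direction on both sides, so take it.
        # N2: otherwise take the embedding (here: paragraph) direction.
        direction = before if before == after else paragraph_dir
        for k in range(i, j):
            resolved[k] = direction
        i = j
    return "".join(resolved)


print(resolve_neutrals('ABC "def" GHI'))
```

In an RTL paragraph, both quotes in 'ABC "def" GHI' sit between an RTL
run and an LTR run, so under this model both resolve to R, without any
pairing of the quotes being needed.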

From wjgo_10009 at btinternet.com  Mon Nov 21 05:55:54 2016
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 21 Nov 2016 11:55:54 +0000 (GMT)
Subject: Unicode Digest, Vol 35, Issue 16
In-Reply-To: <CAFV=FfhUWVjzYTMSm3zqMBZ_SmwyPfD+kzkT-FU3P7nkbB9png@mail.gmail.com>
References: <mailman.0.1479578401.11142.unicode@unicode.org>
 <CAFV=FfhUWVjzYTMSm3zqMBZ_SmwyPfD+kzkT-FU3P7nkbB9png@mail.gmail.com>
Message-ID: <25253037.25408.1479729354126.JavaMail.defaultUser@defaultHost>

Thank you for your email and for your comments.

> Your abstract emoji are interesting.

Thank you.

> I am especially pleased that your noun brown emoji express a number of grammatical cases.

Thank you. I designed the glyphs with both the Latin case system in mind and the way that Esperanto uses a subject form, an inflected version of that form for the direct object, and a preposition followed by the subject form for all other grammatical cases.

> However, your "Some designs for emoji of personal pronouns" is less flexible, wherein the pronouns can only express singular and plural grammatical numbers. Is there any chance that the system may be modified to enable the expression of dual grammatical number?

Yes. I have added some more designs for personal pronouns. I have added designs for "two" and also designs for "three or more".

I have also added some designs so as to give the option of expressing "we" either basically or with specifying one or other of "inclusive we" or "exclusive we".

I have also added a design for the form of you that is expressed by the word "tu" of French.

At the time of writing this note I have got thirty-one designs all in a document produced using the Serif PagePlus version X7 desktop publishing package.

I am hoping to export each of the thirty-one designs as an individual graphic file and add the graphic files to the following web page.

http://www.users.globalnet.co.uk/~ngo/abstract_emoji.htm

William Overington

Monday 21 November 2016

From wjgo_10009 at btinternet.com  Mon Nov 21 06:46:18 2016
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 21 Nov 2016 12:46:18 +0000 (GMT)
Subject: Unicode Digest, Vol 35, Issue 16
In-Reply-To: <CAGa7JC3u0vmf_TCXrJ3FEOzq4WuX0kiUiqpimgHDpT9ds20zeg@mail.gmail.com>
References: <mailman.0.1479578401.11142.unicode@unicode.org>
 <CAFV=FfhUWVjzYTMSm3zqMBZ_SmwyPfD+kzkT-FU3P7nkbB9png@mail.gmail.com>
 <CAGa7JC3u0vmf_TCXrJ3FEOzq4WuX0kiUiqpimgHDpT9ds20zeg@mail.gmail.com>
Message-ID: <22985217.31526.1479732378425.JavaMail.defaultUser@defaultHost>

> On the contrary, I think it is much more important to be able to designate the 1st person speaking, and whether she speaks for herself or in the name of a group, the person(s) she is speaking to (either directly, or as the representative of a group, but this could be a separate "privately" or "alone" attribute), and a generic undesignated/impersonal 3rd person not designating anyone (he/she/it/they), possibly with an additional attribute (a number? an adjective for "near" versus "far", like in the distinction of "this" and "that" or "here" and "there" in English, or "left" vs. "right", or "front" vs. "back") to distinguish several entities.
Well, yes, there could be a design for an emoji that means ", speaking for myself," and a design for an emoji that means ", speaking on behalf of ..." and they could be useful in some circumstances. Also, there could be abstract emoji to distinguish several entities as you suggest. The way that emoji are becoming a script upon which language is built is fascinating. I wonder if there are any parallels with how picture writing turned into scripts in the past.
> But once again this discussion is about a long-standing personal invention by William, who has long attempted to push it as a "standard", when he is actually alone and not qualified on his own to be an academic source representing an active community, and where he has never demonstrated the existence of any active community supporting his "inventions" (often self-contradictory and constantly changing):
No. I have been researching on an invention at times since 2009, but this discussion is not about that at all. This discussion is about conveying meaning using a direct display of emoji characters. In some circumstances that conveying of meaning could go through the language barrier. However, the items in this discussion are abstract emoji and are not part of the other project at all.
> In other words it is out of scope for the Unicode standard.
Well, emoji are part of the Unicode Standard and there can be abstract emoji.
Please note item 13 of the following document.
http://www.unicode.org/L2/L2016/16356-esc-cmt-feedback.pdf
> Emojis are definitely NOT used in the world the way that William thinks.
Oh, what do you opine that I think?
> William has in fact long been inventing another script (which has nothing in common with Emojis) but has not been able to convince a community to use and support it. Borrowing Emojis inside his personally invented script does not mean that Emojis are part of William's script.
Well, although I would not call it a script, I have been researching on an invention at times since 2009, but this discussion is not about that invention at all.
In fact, emoji are not used at all in that collection of items due to the lack of precision of meaning of emoji characters.
This discussion is about emoji.
William Overington
Monday 21 November 2016

From verdy_p at wanadoo.fr  Mon Nov 21 12:27:10 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 21 Nov 2016 19:27:10 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <83r3658prn.fsf@gnu.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org>
 <CAGa7JC3Bfr9enEVZ6+_g-BWUdGbc8kfGxhyKDu4mmvdphbdVoA@mail.gmail.com>
 <CAGa7JC3ZikqCHDTbZum1mLmYRwab1_hVBCtx+fXGtnvNzXqSig@mail.gmail.com>
 <83r3658prn.fsf@gnu.org>
Message-ID: <CAGa7JC1rKWR0q1O7Cys+GJ6rUFLWgaJq9-+xCOa8o2aXMis7+w@mail.gmail.com>

2016-11-21 4:39 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:

> > From: Philippe Verdy <verdy_p at wanadoo.fr>
> > Date: Sun, 20 Nov 2016 21:19:40 +0100
> > Cc: Simon Cozens <simon at simon-cozens.org>,
> >       unicode Unicode Discussion <unicode at unicode.org>
> >
> > Note that if you get :
> >
> > OWT-CIBARA "Japanese2 ?Japanese1?" ENO-CIBARA
> >
> > this means that the first quotation mark is "transparent" and preserves
> > the RTL direction.
>
> Yes.  It takes the direction of the paragraph, which is RTL.
>
> > And I don't see then how you can pair the final quotation mark, unless
> > you consider it as "leading" the ARABIC-TWO part (meaning that you don't
> > pair these quotation marks at all: only brackets are paired and the
> > fragment?Japanese1? is correct (you are using the new Bidi algorithm).
>
> The quotes don't need to pair, they just need both to have the
> paragraph direction.  And that's what happens, because text on each
> side of each quote has different directionality.  The UBA mandates
> that the quote (which is ON) takes the embedding direction in that
> case, and the embedding direction here is the base paragraph
> direction.
>

This is a reasonable rule for the most frequent cases, but I'm not sure it
works in the case of multiple levels of inclusion (with different
directions), where the paragraph direction is not relevant for quotation
marks at the inner levels.

From asmusf at ix.netcom.com  Mon Nov 21 14:23:06 2016
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 21 Nov 2016 12:23:06 -0800
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <834m329fqf.fsf@gnu.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org>
Message-ID: <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>

Can we get that example with actual code points, for testing?

A./

From verdy_p at wanadoo.fr  Mon Nov 21 15:17:39 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 21 Nov 2016 22:17:39 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
Message-ID: <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>

Examples were in the initial post I sent in this thread, or in other
replies.

In encoded order, it should be testing this:

ARABIC-ONE "?japanese1?japanese2: ?english1, ? french1 ?, or? japanese3???" ARABIC-TWO



From asmusf at ix.netcom.com  Mon Nov 21 15:40:21 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Mon, 21 Nov 2016 13:40:21 -0800
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
Message-ID: <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>

On 11/21/2016 1:17 PM, Philippe Verdy wrote:
> Examples were in the initial post I sent in this thread, or in other 
> replies.
>
> In encoded order, it should be testing this:
>
> ARABIC-ONE "?japanese1?japanese2: ?english1, ? french1 ?, or? 
> japanese3???" ARABIC-TWO
I don't see any actual Arabic or Japanese letters.
A./
>
>
> 2016-11-21 21:23 GMT+01:00 Asmus Freytag <asmusf at ix.netcom.com 
> <mailto:asmusf at ix.netcom.com>>:
>
>     Can we get that example with actual code points, for testing?
>
>     A./
>
>


From verdy_p at wanadoo.fr  Mon Nov 21 16:02:32 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 21 Nov 2016 23:02:32 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
Message-ID: <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>

You don't need them, I just used lowercase letters for strong LTR
characters and uppercase for RTL, just like in the existing Bidi test page.
Use some random Arabic or Japanese words if you prefer.
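
For anyone who does want a string with real code points to paste into a
mailer or browser, the uppercase/lowercase convention can be converted
mechanically. The following is only a sketch: the mapping of A..Z onto
26 Arabic letters is an arbitrary, hypothetical choice (it produces
letters of the right Bidi class, not real words), lowercase letters are
left as they are because Latin letters are already strong LTR, and the
names to_real_rtl and code_points are made up for illustration.

```python
import unicodedata

# Hypothetical mapping: A..Z -> the first 26 letters of this list
# (U+0627..U+063A followed by U+0641..U+064A), all of Bidi class AL.
ARABIC_LETTERS = (
    [chr(c) for c in range(0x0627, 0x063B)]
    + [chr(c) for c in range(0x0641, 0x064B)]
)


def to_real_rtl(text):
    """Replace each uppercase ASCII letter with an Arabic letter."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(ARABIC_LETTERS[ord(ch) - ord("A")])
        else:
            out.append(ch)
    return "".join(out)


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


sample = to_real_rtl('ABC "def" GHI')
print(sample)
print(code_points(sample))
# Sanity check: every substituted letter is strong RTL (Arabic Letter).
print({unicodedata.bidirectional(ch) for ch in to_real_rtl("ABCXYZ")})
```

The code point listing is what makes a test case reproducible across
mail clients, since the visual order shown on screen depends on each
client's Bidi implementation.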


From verdy_p at wanadoo.fr  Mon Nov 21 16:17:15 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 21 Nov 2016 23:17:15 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
Message-ID: <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>

2016-11-21 23:02 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> You don't need them, I just used lowercase letters for strong LTR
> characters and uppercase for RTL, just like in the existing Bidi test page.
> Use some random Arabic or Japanese words if you prefer.
>
> 2016-11-21 22:40 GMT+01:00 Asmus Freytag (c) <asmusf at ix.netcom.com>:
>
>> On 11/21/2016 1:17 PM, Philippe Verdy wrote:
>>
>> Examples were in the initial post I sent in this thread, or in other
>> replies.
>>
>> In encoded order, it should be testing this:
>>
>> ARABIC-ONE "?japanese1?japanese2: ?english1, ? french1 ?, or? japanese3??
>> ?" ARABIC-TWO
>>

Replacing "japanese" by its translation in Japanese, and translating
ARABIC-ONE and TWO into Arabic (note: japanese3 has also been translated
into Arabic):

??????? ????? "????1????2: ?english1, ? french1 ?, or?????????? ??????"
???????-?????


The CJK square quotes are not mirrored, they are just swapped, but they
still do not embed their content as pairs...
This is an example of where simply assigning a direction to quotes from
the paragraph direction alone does not work, and where detecting pairs
of quotes would be necessary to fix their enclosure as isolates at inner
levels.
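
The character properties that matter here can be checked directly. The
sketch below uses Python's standard unicodedata module to print the
Bidi_Class and Bidi_Mirrored values of a few characters. The corner
brackets U+300C and U+300D are an assumption, since the actual CJK quote
characters in the examples did not survive in the archive; the helper
name bidi_info is hypothetical, and the values reported depend on the
Unicode version bundled with the Python build.

```python
import unicodedata


def bidi_info(ch):
    """Return (Bidi_Class, Bidi_Mirrored) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


samples = {
    "ASCII quotation mark": "\u0022",
    "left corner bracket": "\u300c",   # assumed CJK quote character
    "right corner bracket": "\u300d",  # assumed CJK quote character
    "left guillemet": "\u00ab",
    "Arabic letter alef": "\u0627",
    "CJK ideograph": "\u65e5",
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} {name}: {bidi_info(ch)}")
```

The ASCII quote is a neutral (ON) with no mirrored property, so the
character data alone says nothing about whether a given quote is
"leading" or "trailing"; any pairing of quotes would have to come from
the algorithm or from higher-level markup.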

From chris.jacobs at xs4all.nl  Mon Nov 21 16:38:15 2016
From: chris.jacobs at xs4all.nl (Chris Jacobs)
Date: Mon, 21 Nov 2016 23:38:15 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org>
 <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
Message-ID: <40450033fc19abe42a4ed83ff9adc5f7@xs4all.nl>

The CJK quotes display here just fine in XS4ALL webmail, but not in
Outlook. 

Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2016-11-21.png
Type: image/png
Size: 200905 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161121/ffd9ef06/attachment-0001.png>

From asmusf at ix.netcom.com  Mon Nov 21 16:58:40 2016
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 21 Nov 2016 14:58:40 -0800
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
Message-ID: <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>

On 11/21/2016 2:17 PM, Philippe Verdy wrote:
>
>
> 2016-11-21 23:02 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr 
> <mailto:verdy_p at wanadoo.fr>>:
>
>     You don't need them, I just used lowercase letters for strong LTR
>     characters and uppercase for RTL, just like in the existing Bidi
>     test page. Use some random Arabic or Japanese words if you prefer.
>
>     2016-11-21 22:40 GMT+01:00 Asmus Freytag (c) <asmusf at ix.netcom.com
>     <mailto:asmusf at ix.netcom.com>>:
>
>         On 11/21/2016 1:17 PM, Philippe Verdy wrote:
>>         Examples were in the initial post I sent in this thread, or
>>         in other replies.
>>
>>         In encoded order, it should be testing this:
>>
>>         ARABIC-ONE "?japanese1?japanese2: ?english1, ? french1 ?, or?
>>         japanese3???" ARABIC-TWO
>
> Replacing "japanese" by its translation in Japanese, and translating
> ARABIC-ONE and TWO into Arabic (note: japanese3 has also been
> translated into Arabic):
>
>     ??????? ????? "????1????2: ?english1, ? french1 ?,
>     or?????????? ??????" ???????-?????
>
>
> The CJK square quotes are not mirrored, they are just swapped, but they
> still do not embed their content as pairs...
> This is an example of where simply assigning a direction to quotes from
> the paragraph direction alone does not work, and where detecting pairs
> of quotes would be necessary to fix their enclosure as isolates at inner
> levels.

I get


where is the problem?
A./


-------------- next part --------------
A non-text attachment was scrubbed...
Name: fhmbobjniphfamjk.png
Type: image/png
Size: 1707 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161121/3846a135/attachment.png>

From asmusf at ix.netcom.com  Mon Nov 21 17:00:28 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Mon, 21 Nov 2016 15:00:28 -0800
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
Message-ID: <0ec66454-1dce-c926-9c35-fa6d3613e3ce@ix.netcom.com>

On 11/21/2016 2:02 PM, Philippe Verdy wrote:
> You don't need them, I just used lowercase letters for strong LTR 
> characters and uppercase for RTL, just like in the existing Bidi test 
> page. Use some random Arabic or Japanese words if you prefer.
The difference is that I can cut/paste an actual string into my 
mailer/browser/whatever and observe what's happening.

I see that you sent me something. I'll try and mail back a screenshot, 
but the list is so super-restrictive on images that you may only get it 
cc'd directly to you.
A./


From verdy_p at wanadoo.fr  Mon Nov 21 19:47:10 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 22 Nov 2016 02:47:10 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
Message-ID: <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>

Look at where the Asian quotes are partially "moved" by the ASCII quotes in
Chrome.

Maybe the reason is that Chrome still does not use the new rules.

You are probably using another browser that implements other rules.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fhmbobjniphfamjk.png
Type: image/png
Size: 1707 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161122/0250ca90/attachment.png>

From asmusf at ix.netcom.com  Tue Nov 22 09:15:08 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Tue, 22 Nov 2016 07:15:08 -0800
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
Message-ID: <d200243d-2572-e155-e702-273ab8dd8ffe@ix.netcom.com>

On 11/21/2016 5:47 PM, Philippe Verdy wrote:
> Look at where the Asian quotes are partially "moved" by the ASCII 
> quotes in Chrome.

How does Chrome enter into this? (What I posted is a screenshot from 
Thunderbird on Windows 7).
It seems to fully match up with the example using the UPPER/lower case 
convention.

A./
>
> May be the reason is that Chrome still does not use the new rules.
>
> You probably use another browser that implement other rules.
>
> 2016-11-21 23:58 GMT+01:00 Asmus Freytag <asmusf at ix.netcom.com 
> <mailto:asmusf at ix.netcom.com>>:
>
>     On 11/21/2016 2:17 PM, Philippe Verdy wrote:
>>
>>
>>     2016-11-21 23:02 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr
>>     <mailto:verdy_p at wanadoo.fr>>:
>>
>>         You don't need them, I just used lowercase letters for strong
>>         LTR characters and uppercase for RTL, just like in the
>>         existing Bidi test page. Use some random Arabic or Japanese
>>         words if you prefer.
>>
>>         2016-11-21 22:40 GMT+01:00 Asmus Freytag (c)
>>         <asmusf at ix.netcom.com <mailto:asmusf at ix.netcom.com>>:
>>
>>             On 11/21/2016 1:17 PM, Philippe Verdy wrote:
>>>             Examples were in the initial post I sent in this thread,
>>>             or in other replies.
>>>
>>>             In encoded order, it should be testing this:
>>>
>>>             ARABIC-ONE "?japanese1?japanese2: ?english1, ? french1
>>>             ?, or? japanese3???" ARABIC-TWO
>>
>>     Replacing "japanese" by its translation in Japanese, and
>>     translating ARABIC-ONE and TWO into Arabic (Note: japanese3 is
>>     been also translated in Arabic):
>>
>>         ??????? ????? "????1????2: ?english1, ? french1 ?,
>>         or?????????? ??????" ???????-?????
>>
>>
>>     The CJK square quote are not mirrored, they are just swapped, but
>>     still do not embed their content as pairs...
>>     This is an example of where the simple assignement of direction
>>     for quotes from the paragraph direction only does not work, and
>>     where detecting pairs or quotes would be necessary to fix their
>>     enclosure as isolates at inner levels.n
>
>     I get
>
>
>     where is the problem?
>     A./
>
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 1707 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161122/486e8bcf/attachment.png>

From tom at osg.samsung.com  Tue Nov 22 06:07:16 2016
From: tom at osg.samsung.com (Tom Hacohen)
Date: Tue, 22 Nov 2016 12:07:16 +0000
Subject: Potential contradiction between the WordBreak test data and UAX #29
Message-ID: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>

Dear,

I recently updated libunibreak[1] to Unicode 9.0.0. I thought I had 
implemented it correctly, but it fails two of the tests in the 
reference test data:

÷ 200D × 0308 ÷ 2764 ÷ #  ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0] 
COMBINING DIAERESIS (Extend_FE) ÷ [999.0] HEAVY BLACK HEART 
(Glue_After_Zwj) ÷ [0.3]

and

÷ 200D × 0308 ÷ 1F466 ÷ #  ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0] 
COMBINING DIAERESIS (Extend_FE) ÷ [999.0] BOY (EBG) ÷ [0.3]


More specifically, it fails in both after the "combining diaeresis". My 
implementation marks no break there, whereas the test data marks a break 
(rule 999). The reference implementation, as expected, agrees with the 
test data.


However, looking at the test case and the UAX[2], this does not look 
correct. More specifically, because of rule 4:
ZWJ Extend GAZ -> ZWJ GAZ
And then according to rule 3c, there should be no break opportunity 
between them. The reference implementation, however, uses rule 999 here, 
which I believe is incorrect.


Am I missing anything, or is this an issue with the reference test data 
and reference implementation?

Thanks,
Tom.

[1]: https://github.com/adah1972/libunibreak
[2]: http://www.unicode.org/reports/tr29/#WB1
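
For readers who want to reproduce the comparison, here is a minimal sketch of
how one line of the test data quoted above could be read. It assumes the
layout shown in those two lines: break marks (÷ for a break, × for no break)
alternating with hexadecimal code points, followed by a "#" comment. The
function name parse_test_line is hypothetical and is not part of libunibreak
or of the reference implementation.

```python
BREAK = "\u00f7"     # the division sign used for "break here"
NO_BREAK = "\u00d7"  # the multiplication sign used for "no break here"


def parse_test_line(line):
    """Return (code_points, marks) for one test line.

    marks[i] is True when a break is expected before code_points[i];
    the final entry of marks describes the end of the text.
    """
    data = line.split("#", 1)[0].split()
    mark_tokens = data[0::2]
    for tok in mark_tokens:
        if tok not in (BREAK, NO_BREAK):
            raise ValueError("unexpected boundary mark: %r" % tok)
    code_points = [int(tok, 16) for tok in data[1::2]]
    marks = [tok == BREAK for tok in mark_tokens]
    return code_points, marks


line = "\u00f7 200D \u00d7 0308 \u00f7 2764 \u00f7 #  comment is ignored"
code_points, marks = parse_test_line(line)
print([hex(cp) for cp in code_points])  # ['0x200d', '0x308', '0x2764']
print(marks)                            # [True, False, True, True]
```

The third mark (True) is the position under discussion: the test data expects
a break between U+0308 and U+2764.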

From verdy_p at wanadoo.fr  Tue Nov 22 20:49:08 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 23 Nov 2016 03:49:08 +0100
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
Message-ID: <CAGa7JC2vLOX+SqH78v+tn0pW-Bfs9ziYHAcp0+VQVbbHpO5DGg@mail.gmail.com>

IMHO, the ZWJ should glue with the last symbol following your examples.
But the combining diaeresis following the ZWJ extends it (even if in my
opinion it is "defective" and would likely display on a dotted circle in
renderers, but not defective for the string definition of combining
sequences).
So ignore it and test whether the last symbol glues with ZWJ (it should, so
there's no break in the reference implementation).

WB4: X (Extend | Format | ZWJ)* → X

Extend: Grapheme_Extend = Yes. This includes:
  General_Category = Nonspacing_Mark (this includes the combining diaeresis)
  General_Category = Enclosing_Mark
  U+200C ZERO WIDTH NON-JOINER
  plus a few General_Category = Spacing_Mark needed for canonical
equivalence.

So yes, we have: ZWJ "COMBINING DIAERESIS" (EBG|Glue_After_Zwj) → ZWJ (EBG|
Glue_After_Zwj): rule WB4 eliminates the combining mark from the input
queue.

But rule WB3c comes before and prohibits it:

WB3c: ZWJ × (Glue_After_Zwj | EBG)

This means that you have first:

ZWJ "COMBINING DIAERESIS" GAZ →  ZWJ × "COMBINING DIAERESIS" EBG

and this does not match rule WB4, which does not match:

X × (Extend | Format | ZWJ)* → X

(It cannot remove the extenders if there's a no-break before them; it is
valid only when the break opportunity is still unspecified. As soon as a
rule has produced a "break here" or "no break here" at a given position, you
must advance past this position: the rules are based on a small finite
state machine.) So after:

ZWJ "COMBINING DIAERESIS" GAZ →  ZWJ × "COMBINING DIAERESIS" EBG

all that remains in your input queue is:

"COMBINING DIAERESIS" EBG  (because "ZWJ ×" is already processed, and so ZWJ
is eliminated)

Now comes WB4: X (Extend | Format | ZWJ)* → X

There is no longer any "X" to match before the combining diaeresis: your
input queue starts with the combining diaeresis, which matches "X"; the
following character (EBG) does not match "(Extend | Format | ZWJ)*" (which
matches an empty string and does not contain the combining diaeresis already
matched by "X"), so rule WB4 has no replacement effect and preserves the
initial "X" (i.e. the combining diaeresis).



From verdy_p at wanadoo.fr  Tue Nov 22 20:56:39 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 23 Nov 2016 03:56:39 +0100
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <CAGa7JC2vLOX+SqH78v+tn0pW-Bfs9ziYHAcp0+VQVbbHpO5DGg@mail.gmail.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <CAGa7JC2vLOX+SqH78v+tn0pW-Bfs9ziYHAcp0+VQVbbHpO5DGg@mail.gmail.com>
Message-ID: <CAGa7JC2KzpQdpT+BP3tRz1Y4NxHViSJ8mZ4oU9LwhXs8aMxsUA@mail.gmail.com>

Note also this statement at the beginning of the specification:

Single boundaries. Each rule has exactly one boundary position. This
restriction is more a limitation on the specification methods, because a
rule with multiple boundaries could be expressed instead as multiple rules.
For example:
 *  "a b ÷ c d × e f" could be broken into two rules "a b ÷ c d e f" and "a
b c d × e f"
 *  "a b × c d × e f" could be broken into two rules "a b × c d e f" and "a
b c d × e f"

The rules are not built to allow keeping and processing multiple boundary
positions. Only one is considered: once a break or no-break decision is
made at a position, everything before that position is discarded from the
input and is no longer used by any further rule. The engine loops back to
the first rule and looks for matching rules starting from that new boundary
position, without ever looking backward.


From richard.wordingham at ntlworld.com  Wed Nov 23 03:05:11 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 23 Nov 2016 09:05:11 +0000
Subject: Line-Breaking Hyphenation
Message-ID: <20161123090511.1b691ece@JRWUBU2>

What is 'line-breaking hyphenation'?  In particular, I am trying to
determine the meaning of the TUS statement 'There is no
line-breaking hyphenation' referring to the Lanna script at the end of
TUS Section 16.7.

One possibility is that it means that visible text does not
distinguish line breaks within words from line breaks at word
boundaries, which would be a statement about the prevalent style.

Another possibility is that it means that automatic line-breaking does
not split words.  I am not sure if 'opportunities for line breaking are
lexical' nevertheless allows for the use of hyphenation dictionaries.

The statement 'Opportunities for line breaking are lexical, but a line
break may not be inserted between a base letter and a combining
diacritic' confuses me.  Is it saying that a clitic may not be separated
from a word if so doing would break a vertical stack?  Perhaps it is also
saying that there is no line break between words if they share a vertical
stack, as can happen in Pali.

Richard. 

From tom at osg.samsung.com  Wed Nov 23 03:13:28 2016
From: tom at osg.samsung.com (Tom Hacohen)
Date: Wed, 23 Nov 2016 09:13:28 +0000
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <CAGa7JC2vLOX+SqH78v+tn0pW-Bfs9ziYHAcp0+VQVbbHpO5DGg@mail.gmail.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <CAGa7JC2vLOX+SqH78v+tn0pW-Bfs9ziYHAcp0+VQVbbHpO5DGg@mail.gmail.com>
Message-ID: <ababb5e2-61f7-9ca6-89ec-bba1a7eea80d@osg.samsung.com>

You said:
 > So ignore it and test whever the last symbols glues with ZWJ (it should,
 > so there's no break in the reference implementation).

Which makes me think you misread the example I quoted. There is a break 
in the reference implementation, though I argue (like you just did) that 
there shouldn't be. So I think you agree with me and also think it's broken.

Otherwise, I'm not sure I fully understand what you are saying, but if 
what you are saying is correct, then following the same logic, other 
rules would fail, specifically:

÷ 0061 × 2060 × 0030 ÷  #  ÷ [0.2] LATIN SMALL LETTER A (ALetter) × 
[4.0] WORD JOINER (Format_FE) × [9.0] DIGIT ZERO (Numeric) ÷ [0.3]

After the FE here there's no BREAK because:
ALetter Format Numeric -> ALetter Numeric
Which then following rule 9.0 is a no-break.

This is exactly the rule (4) as described in my previous email, just 
with a different follow-up rule (9 instead of 3c). I don't see how rule 
precedence would matter here, as there is no case for which two rules apply.

--
Tom.



From daniel.buenzli at erratique.ch  Wed Nov 23 04:01:59 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Wed, 23 Nov 2016 11:01:59 +0100
Subject: Potential contradiction between the WordBreak test data and
 UAX #29
In-Reply-To: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
Message-ID: <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>

On Tuesday 22 November 2016 at 13:07, Tom Hacohen wrote:
> However, looking at the test case and the UAX[2], this does not look
> correct. More specifically, because of rule 4:
> ZWJ Extended GAZ -> ZWJ GAZ
> And then according to rule 3c, there should be no break opportunity 
> between them. 

I'd say this is not the right operational model. From [1]: 

"The rules are processed from top to bottom. As soon as a rule matches and produces a boundary status (boundary or no boundary) for that offset, the process is terminated."

So in this case, between COMBINING DIAERESIS and HEAVY BLACK HEART, rule WB4 kicks in. It does not produce a boundary status; it only changes your offset context to ZWJ GAZ, as you mention. Now you continue applying the rules sequentially with WB6, which does not match, with WB7, which does not match, ... and you'll get to WB999, which matches and produces a boundary status. 

After WB4 you do not restart the matching process from the beginning; restarting is what leads you to say that WB3c should apply.

Best, 

Daniel


[1] http://www.unicode.org/reports/tr29/#Notation
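
To make the two readings concrete, here is a small sketch that contrasts
them. It is only an illustration under stated assumptions: it models just
WB3c, WB4, WB9 and WB999, works on lists of Word_Break class names rather
than on characters, and ignores the sot/CR/LF exceptions to WB4. The function
names are hypothetical and do not come from any library.

```python
IGNORABLE = {"Extend", "Format", "ZWJ"}


def wb3c(left, right):
    # WB3c: ZWJ x (Glue_After_Zwj | EBG)
    return left == "ZWJ" and right in {"Glue_After_Zwj", "EBG"}


def wb9(left, right):
    # WB9: (ALetter | Hebrew_Letter) x Numeric
    return left in {"ALetter", "Hebrew_Letter"} and right == "Numeric"


def effective_left(classes, i):
    # WB4: X (Extend | Format | ZWJ)* is treated as X.
    j = i - 1
    while j > 0 and classes[j] in IGNORABLE:
        j -= 1
    return classes[j]


def is_break_sequential(classes, i):
    """Top to bottom, no restart: WB3c sees the raw left neighbour."""
    left, right = classes[i - 1], classes[i]
    if wb3c(left, right):
        return False
    if right in IGNORABLE:               # WB4: no break before an ignorable
        return False
    left = effective_left(classes, i)    # WB4 rewrites the context ...
    if wb9(left, right):                 # ... and matching continues below it
        return False
    return True                          # WB999: Any / Any, break


def is_break_restart(classes, i):
    """Restart after WB4: WB3c is retried on the rewritten context."""
    right = classes[i]
    if right in IGNORABLE:
        return False
    left = effective_left(classes, i)
    if wb3c(left, right) or wb9(left, right):
        return False
    return True


heart = ["ZWJ", "Extend", "Glue_After_Zwj"]  # 200D 0308 2764
digit = ["ALetter", "Format", "Numeric"]     # 0061 2060 0030
print(is_break_sequential(heart, 2), is_break_restart(heart, 2))  # True False
print(is_break_sequential(digit, 2), is_break_restart(digit, 2))  # False False
```

Under these assumptions the sequential reading reproduces both test lines
(a break before the heart via WB999, no break before the digit via WB9),
while the restart reading differs only in the ZWJ case.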


From tom at osg.samsung.com  Wed Nov 23 04:22:59 2016
From: tom at osg.samsung.com (Tom Hacohen)
Date: Wed, 23 Nov 2016 10:22:59 +0000
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
Message-ID: <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>

On 23/11/16 10:01, Daniel Bünzli wrote:
> On Tuesday 22 November 2016 at 13:07, Tom Hacohen wrote:
>> However, looking at the test case and the UAX[2], this does not look
>> correct. More specifically, because of rule 4:
>> ZWJ Extended GAZ -> ZWJ GAZ
>> And then according to rule 3c, there should be no break opportunity
>> between them.
>
> I'd say this is not the right operational model. From [1]:
>
> "The rules are processed from top to bottom. As soon as a rule matches and produces a boundary status (boundary or no boundary) for that offset, the process is terminated."
>
> So in this case between COMBINING DIAERESIS and HEAVY BLACK HEART rule WB4 quicks in. It does not produce a boundary status, it only changes your offset context to ZWJ GAZ, as you mention. Now you continue applying the rules sequentially with WB6 which does not match, with WB7 which does not match,... and you'll get to WB999 which matches and produces a boundary status.
>
> After WB4 you do not restart the matching process from the beginning, as you do, leading you to say that WB3c should apply.

Hey Daniel,

Thank you for your reply, but I don't think the UAX, and specifically the 
line you quoted, implies that. The line you quoted says that the process 
is terminated when a rule matches and produces a boundary status. In 
Table 1[1], the right arrow (which is used in rule 4) is listed as a 
boundary symbol, so I would argue that one should stop the process and 
start it again from the top.

Furthermore, the clarification to rule 4[2] clearly states: "The 
main purpose of this rule is to always treat a grapheme cluster as a 
single character -- that is, as if it were simply the first character of 
the cluster".
This again sides with my understanding that:
X Extend Y
should behave exactly the same as
X Y
after the Extend part.
Which is exactly what I'm arguing for.

--
Tom

[1] http://www.unicode.org/reports/tr29/#Table_Boundary_Symbols
[2] http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules

From daniel.buenzli at erratique.ch  Wed Nov 23 04:52:56 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Wed, 23 Nov 2016 11:52:56 +0100
Subject: Potential contradiction between the WordBreak test data and
 UAX #29
In-Reply-To: <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
 <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
Message-ID: <012E41802C7842F386529FBA99969391@erratique.ch>

On Wednesday 23 November 2016 at 11:22, Tom Hacohen wrote:
> Thank you for your reply, but I don't think the UAX, specifically the
> line you quoted implies that. The line you quoted says that the process 
> is terminated when a rule matches and produces a boundary status. In 
> Table 1[1], the right-arrow (which is used in rule 4) is listed as a 
> boundary symbol, 

Precisely: rules with this *symbol* do not produce a boundary *status*, which is either boundary or no boundary, as mentioned in parentheses in the line I quoted.
 
> so I would argue that one should stop the process and start it again from the start.

At least in the current UAX there is no mention of an idea of stopping and restarting the process at all.

Best, 

Daniel

From tom at osg.samsung.com  Wed Nov 23 05:00:53 2016
From: tom at osg.samsung.com (Tom Hacohen)
Date: Wed, 23 Nov 2016 11:00:53 +0000
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <012E41802C7842F386529FBA99969391@erratique.ch>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
 <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
 <012E41802C7842F386529FBA99969391@erratique.ch>
Message-ID: <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>

On 23/11/16 10:52, Daniel Bünzli wrote:
> On Wednesday 23 November 2016 at 11:22, Tom Hacohen wrote:
>> Thank you for your reply, but I don't think the UAX, specifically the
>> line you quoted implies that. The line you quoted says that the process
>> is terminated when a rule matches and produces a boundary status. In
>> Table 1[1], the right-arrow (which is used in rule 4) is listed as a
>> boundary symbol,
>
> Precisely, rules with this *symbol* do not produce a boundary *status* which is either boundary or not boundary as mentioned in parens in the line I quoted.

This looks like a misstatement rather than a binding rule.

>
>> so I would argue that one should stop the process and start it again from the start.
>
> At least in the current UAX there is no mention of an idea of stopping and restarting the process at all.

Even if that's true, look at my second statement (which you cut from 
your reply):

Furthermore, the clarification to rule 4[2] clearly states: "The 
main purpose of this rule is to always treat a grapheme cluster as a 
single character -- that is, as if it were simply the first character of 
the cluster".
This again sides with my understanding that:
X Extend Y
should behave exactly the same as
X Y
after the Extend part.
Which is exactly what I'm arguing for.


Also take another look at 
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules 
specifically the table that shows another way of writing the ignore 
rule. This again shows my understanding of rule 4 is correct.

Specifically, look at the following equivalence:
X Y × Z W 	⇒ 	X (Extend | Format)* Y (Extend | Format)* × Z (Extend | 
Format)* W

--
Tom

From daniel.buenzli at erratique.ch  Wed Nov 23 05:11:55 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Wed, 23 Nov 2016 12:11:55 +0100
Subject: Potential contradiction between the WordBreak test data and
 UAX #29
In-Reply-To: <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
 <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
 <012E41802C7842F386529FBA99969391@erratique.ch>
 <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>
Message-ID: <1A9F4765A03D446AA6D6DD4B31465072@erratique.ch>


On Wednesday 23 November 2016 at 12:00, Tom Hacohen wrote:
> This looks like a mistake statement rather than a binding rule.
Well at least to me it's pretty clear that this is not the case.


> Even if that's true, look at my second statement (which you redacted in
> your reply):

I'm not arguing whether the boundaries produced by this process are good or not. I'm just saying that, to me, the test data is consistent with the operational model and rules of UAX #29 as it exists. 

Best, 

Daniel


From tom at osg.samsung.com  Wed Nov 23 05:14:09 2016
From: tom at osg.samsung.com (Tom Hacohen)
Date: Wed, 23 Nov 2016 11:14:09 +0000
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <1A9F4765A03D446AA6D6DD4B31465072@erratique.ch>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
 <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
 <012E41802C7842F386529FBA99969391@erratique.ch>
 <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>
 <1A9F4765A03D446AA6D6DD4B31465072@erratique.ch>
Message-ID: <b5840524-2de3-4421-f059-ffd9fad3fcfc@osg.samsung.com>

On 23/11/16 11:11, Daniel Bünzli wrote:
>
> On Wednesday 23 November 2016 at 12:00, Tom Hacohen wrote:
>> This looks like a mistake statement rather than a binding rule.
> Well at least to me it's pretty clear that this is not the case.
>
>
>> Even if that's true, look at my second statement (which you redacted in
>> your reply):
>
> I'm not arguing whether the boundaries produced by this process is good or not. I'm just saying that to me, the test data is consistent with the operational model and rules of UAX#29 as it exists.

I'm arguing it's not, and I still don't agree with your understanding of 
the operational model. Again, take a look at what I wrote in my last email:

Also take another look at 
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules 
specifically the table that shows another way of writing the ignore 
rule. This again shows my understanding of rule 4 is correct.

Specifically, look at the following equivalence:
X Y × Z W     ⇒     X (Extend | Format)* Y (Extend | Format)* × Z 
(Extend | Format)* W

--
Tom

From verdy_p at wanadoo.fr  Wed Nov 23 05:14:51 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 23 Nov 2016 12:14:51 +0100
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <ababb5e2-61f7-9ca6-89ec-bba1a7eea80d@osg.samsung.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <CAGa7JC2vLOX+SqH78v+tn0pW-Bfs9ziYHAcp0+VQVbbHpO5DGg@mail.gmail.com>
 <ababb5e2-61f7-9ca6-89ec-bba1a7eea80d@osg.samsung.com>
Message-ID: <CAGa7JC0e-GVjW8yMSB_q_sKvEdOX2u0RMoPdQ-enOYfRUZX=cA@mail.gmail.com>

You say "there's no case where two rules apply". I don't think this is
right: rules apply in precedence order as long as none has yet produced
a "break here" or "no break here" decision. This is especially important
for rules that generate only a replacement, which are executed in the
displayed order, because multiple rules may have their left-hand side
match simultaneously.

You have to read them as if they were a chain:

if (condition1) then (replacement1)
else if (condition2) then (replacement2)
else if (condition3) then (replacement3)
...
else if (conditionN) then (replacementN)

The order of conditions (i.e. the order of rules) is significant when
several of them may be true simultaneously.
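
A minimal Python sketch of that first-match-wins reading. The rule table is
only an illustrative subset of the word break rules, and the names `RULES`
and `decide` are hypothetical, not taken from UAX #29 or any implementation:

```python
from typing import Callable, List, Tuple

# (rule name, predicate over the classes left and right of a boundary,
#  decision: True = "break here", False = "no break here")
Rule = Tuple[str, Callable[[str, str], bool], bool]

NEWLINES = {"Newline", "CR", "LF"}

# Illustrative subset only; the real rule list is longer.
RULES: List[Rule] = [
    ("WB3",   lambda l, r: l == "CR" and r == "LF",           False),
    ("WB3a",  lambda l, r: l in NEWLINES,                     True),
    ("WB3b",  lambda l, r: r in NEWLINES,                     True),
    ("WB5",   lambda l, r: l == "ALetter" and r == "ALetter", False),
    ("WB9",   lambda l, r: l == "ALetter" and r == "Numeric", False),
    ("WB999", lambda l, r: True,                              True),
]

def decide(left: str, right: str) -> Tuple[str, bool]:
    """Return the first rule that matches, together with its decision."""
    for name, matches, is_break in RULES:
        if matches(left, right):
            return name, is_break
    raise AssertionError("unreachable: WB999 matches every pair")
```

For the pair (CR, LF) the predicates of WB3, WB3a, WB3b and WB999 are all
true, yet only WB3 takes effect because it is listed first.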

Then when handling the replacement, of course you restart from the
beginning. But what happens to the input stream is very different depending
on whether the rule contains a "break here" or "no break here" (e.g. rule
WB3c) or not (e.g. rule WB4): in the latter case, the substitution does not
advance the input stream, it just transforms it (it changes the internal
parser state only); in the former case, the state is transformed but all
elements in the input stream before the "break here" or "no break here" are
discarded from the input stream, leaving only those after the "break
here"/"no break here".

The input state is a FIFO stack where each element contains:
  { a text buffer (or equivalently an index pointing to the relative end
position in the input stream buffer) accumulating all characters (or bytes)
from the input to which the WB class was assigned;
    a WB class (a small integer) to which this input string was mapped
  }
and the input stream buffer.

The automaton processes each rule in the listed order: to see whether a rule
matches, it uses only the second field (the WB class) of the elements,
starting from the bottom of the stack.

If there are not enough elements in the FIFO stack to match a rule
completely (in "hungry" mode if that rule contains "*" or "+"), it reads
additional bytes or a character from the input stream and appends them to
the top of the input buffer until it can assign a WB class; a new element
containing just that character and that WB class is then pushed to the top
of the FIFO stack.

When a tested rule matches one or more elements starting from the bottom of
the FIFO,
* the replacement will transform only these elements in the FIFO: the
characters in their internal text buffers are combined if the replacement
reduces the number of WB class items; otherwise the WB class is just
replaced in the relevant element of the FIFO stack, and the characters are
kept unchanged.
* Then, if the replacement in that matched rule contains a "break here" or
"no break here" item, all characters in the bottom of the FIFO up to that
position are output: they are popped out of the FIFO, but the other items
in the FIFO are kept.
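
One possible Python sketch of the queue elements and the two operations
just described. This is only an interpretation of the description above;
`Item`, `absorb_ignored` and `emit_through` are hypothetical names, and
the refusal to merge after a newline class is an added assumption:

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque

IGNORED = {"Extend", "Format", "ZWJ"}
NEWLINES = {"Newline", "CR", "LF"}

@dataclass
class Item:
    text: str       # characters accumulated under this class
    wb_class: str   # WB class assigned to that text

def absorb_ignored(queue: Deque[Item]) -> bool:
    """Replacement-only step: merge the second item into the first when the
    second is ignorable. Nothing is output. Returns True if a merge happened."""
    if len(queue) < 2:
        return False
    first, second = queue[0], queue[1]
    if first.wb_class in NEWLINES or second.wb_class not in IGNORED:
        return False
    queue.popleft()
    queue.popleft()
    queue.appendleft(Item(first.text + second.text, first.wb_class))
    return True

def emit_through(queue: Deque[Item], count: int) -> str:
    """Decision step: pop the first `count` items and return their text;
    the remaining items stay in the queue."""
    return "".join(queue.popleft().text for _ in range(count))
```

With the queue holding ALetter, Format, Numeric, one call to
`absorb_ignored` leaves two elements, the first still classed ALetter but
carrying both characters in its text buffer.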

An automaton can optimize this FIFO so that the set of rules (equivalent to
an ordered set of regexps) becomes a finite state automaton. But as the set
of regexps is ordered, it is possible that for some input a common
prefix in multiple regexps will match simultaneously: their order is
significant.

This is more complex than in the initial specification of word breakers,
where there were no "hungry" regexps and matching occurred only on pairs of
characters, so that you did not need a FIFO (or the FIFO always contained a
single element, never more, and the text buffer in that element was reduced
to just one character or its encoded bytes): in that case the order of
rules was still significant, but only when multiple rules potentially
matched the input pair did their order in the specification determine their
precedence (in that case it was possible to summarize the ordered set of
rules with a simple 2D lookup table).
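
A minimal Python sketch of such a pair table, assuming a reduced,
hypothetical set of classes (the real property has many more values).
Precedence is resolved once, when the table is filled in:

```python
CLASSES = ["CR", "LF", "ALetter", "Numeric", "Other"]
INDEX = {name: i for i, name in enumerate(CLASSES)}

# BREAK_TABLE[left][right] is True where a break is allowed.
# Every cell starts as "break" (the catch-all last rule); the
# higher-precedence no-break rules then clear individual cells.
BREAK_TABLE = [[True] * len(CLASSES) for _ in CLASSES]
for left, right in [
    ("CR", "LF"),            # WB3
    ("ALetter", "ALetter"),  # WB5
    ("Numeric", "Numeric"),  # WB8
    ("ALetter", "Numeric"),  # WB9
    ("Numeric", "ALetter"),  # WB10
]:
    BREAK_TABLE[INDEX[left]][INDEX[right]] = False

def breaks_between(left: str, right: str) -> bool:
    """Look up the decision for one pair of adjacent classes."""
    return BREAK_TABLE[INDEX[left]][INDEX[right]]
```

The lookup needs nothing but the two adjacent classes, which is the
property that stops holding once a rule can span an unbounded run of
characters.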

But if you look at rule WB4: X (Extend | Format | ZWJ)* → X
(which is "hungry" and not bounded in length, and which does not pop any
characters out of the input FIFO but still accumulates them in the input
state until "X (Extend | Format | ZWJ)*" no longer matches longer inputs),
the simple 2D lookup table approach does not work: it will match
partial input at the same time as other concurrent rules, but concurrent
rules must be ignored if their precedence is lower (because their rule
number is higher).

So the automaton cannot be a finite-state automaton whose state is
represented only by a single integer in a small bounded set (the set of WB
class values).

Note also that the input stream is complemented with additional
pseudo-characters "sot" and "eot" surrounding it: the automaton is
initialized by pushing a {"", sot} element onto the FIFO, and when the end
of the stream is reached, it pushes a {"", eot} element onto the FIFO. This
is needed for rules WB1 and WB2 (which have the highest precedence in the
set of regexps to match).

The last rule "WB999: Any ÷ Any" is not "hungry" but is equivalent to a
match-all pairs regexp "..", and because it is the last rule, it has the
lowest precedence: it will always match simultaneously with other rules
matching pairs, but will be ignored unless none of the previous rules match.

Not all rules match pairs (or longer sequences): rules WB3a and WB3b match
isolated newlines, whereas all other rules match at least a pair of
characters. This means that rules WB3a and WB3b are in fact those that
have the highest precedence.

These rules not matching pairs are:
  WB3a: (Newline | CR | LF) ÷
  WB3b: ÷ (Newline | CR | LF)

They are in compact form but are equivalent to the expanded form showing
their replacement:
  WB3a: (Newline | CR | LF) ÷  → sot
Effectively this is the only rule that matches a single character; all
other rules match pairs.

Rule WB999 will match "sot eot" and will discard "sot" from the FIFO,
leaving "eot" alone. ("Any eot" is matched in rule WB2). There's an
additional final (implicit) rule needed to match "eot" alone: it will
terminate the automaton. So all other rules consider at least one
pair, and WB999 will match all of them.


2016-11-23 10:13 GMT+01:00 Tom Hacohen <tom at osg.samsung.com>:

> You said:
> > So ignore it and test whether the last symbol glues with ZWJ (it should,
> > so there's no break in the reference implementation).
>
> Which makes me think you misread the example I quoted. There is a break in
> the reference implementation, though I argue (like you just did) that there
> shouldn't be. So I think you agree with me and also think it's broken.
>
> Otherwise, I'm not sure I fully understand what you are saying, but if
> what you are saying is correct, then following the same logic, other rules
> would fail, specifically:
>
> ÷ 0061 × 2060 × 0030 ÷  #  ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0]
> WORD JOINER (Format_FE) × [9.0] DIGIT ZERO (Numeric) ÷ [0.3]
>
> After the FE here there's no BREAK because:
> ALetter Format Numeric -> ALetter Numeric
> Which then following rule 9.0 is a no-break.
>
> This is exactly the rule (4) as described in my previous email, just with
> a different follow-up rule (9 instead of 3c). I don't see how rule
> precedence would matter here, as there is no case for which two rules apply.
>
> --
> Tom.
>
>
> On 23/11/16 02:49, Philippe Verdy wrote:
>
>> IMHO, the ZWJ should glue with the last symbol following your examples.
>> But the combining diaeresis following the ZWJ extends it (even if in my
>> opinion it is "defective" and would likely display on a dotted circle
>> in renderers, but not defective for the string definition of combining
>> sequences).
>> So ignore it and test whether the last symbol glues with ZWJ (it should,
>> so there's no break in the reference implementation).
>>
>> WB4: X (Extend | Format | ZWJ)* → X
>>
>> Extend: [ExtendGrapheme_Extend=Yes]  This includes:
>>   General_Category = Nonspacing_Mark (this includes the combining
>> diaeresis)
>>   General_Category = Enclosing_Mark
>>   U+200C ZERO WIDTH NON-JOINER
>>   plus a few General_Category = Spacing_Mark needed for canonical
>> equivalence.
>>
>> So yes we have: ZWJ "COMBINING DIAERESIS" (EBG|Glue_After_Zwj) → ZWJ
>> (EBG|Glue_After_Zwj) from rule WB4, which eliminates the combining mark
>> from the input queue
>>
>> But rule WB3c comes before and prohibits it:
>>
>> WB3c: ZWJ × (Glue_After_Zwj | EBG)
>>
>> This means that you have first:
>>
>> ZWJ "COMBINING DIAERESIS" GAZ →  ZWJ × "COMBINING DIAERESIS" EBG
>>
>> and this does not match the rule WB4 which is not matching for:
>>
>> X × (Extend | Format | ZWJ)* → X
>>
>> (it cannot remove the extenders if there's a no-break before them; it is
>> valid only when the break opportunity is still unspecified. As soon as a
>> rule has produced a "break here" or "nobreak here" at a given position,
>> you must advance after this position (the rules are based on a small
>> finite state machine)). So after:
>>
>> ZWJ "COMBINING DIAERESIS" GAZ →  ZWJ × "COMBINING DIAERESIS" EBG
>>
>> it just remains in your input queue:
>>
>> "COMBINING DIAERESIS" EBG  (because "ZWJ ×" is already processed, and so
>> ZWJ is eliminated)
>>
>> Now comes WB4: X (Extend | Format | ZWJ)* → X
>>
>> There's no longer any "X" to match before the combining diaeresis: your
>> input queue starts with the combining diaeresis matching "X", the
>> following character (EBG) does not match within "(Extend | Format |
>> ZWJ)*" (which matches an empty string and does not contain the combining
>> diaeresis already matched in "X"), rule WB4 has then no replacement
>> effect and preserves the initial "X" (i.e. the combining diaeresis)
>>
>> .
>>
>>
>>
>>
>>
>>
>>
>> 2016-11-22 13:07 GMT+01:00 Tom Hacohen <tom at osg.samsung.com
>> <mailto:tom at osg.samsung.com>>:
>>
>>
>>     Dear,
>>
>>     I recently updated libunibreak[1] according to unicode 9.0.0. I
>>     thought I implemented it correctly, however it fails against two of
>>     the tests in the reference test data:
>>
>>     ÷ 200D × 0308 ÷ 2764 ÷ #  ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0]
>>     COMBINING DIAERESIS (Extend_FE) ÷ [999.0] HEAVY BLACK HEART
>>     (Glue_After_Zwj) ÷ [0.3]
>>
>>     and
>>
>>     ÷ 200D × 0308 ÷ 1F466 ÷ #  ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) ×
>>     [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] BOY (EBG) ÷ [0.3]
>>
>>
>>     More specifically, it fails in both after the "combining diaeresis".
>>     My implementation marks it as a break, whereas the test data as not.
>>     The reference implementation, as expected, agrees with the test data.
>>
>>
>>     However, looking at the test case and the UAX[2], this does not look
>>     correct. More specifically, because of rule 4:
>>     ZWJ Extended GAZ -> ZWJ GAZ
>>     And then according to rule 3c, there should be no break opportunity
>>     between them. The reference implementation, however, uses rule 999
>>     here, which I believe is incorrect.
>>
>>
>>     Am I missing anything, or is this an issue with the reference test
>>     data and reference implementation?
>>
>>     Thanks,
>>     Tom.
>>
>>     [1]: https://github.com/adah1972/libunibreak
>>     <https://github.com/adah1972/libunibreak>
>>     [2]: http://www.unicode.org/reports/tr29/#WB1
>>     <http://www.unicode.org/reports/tr29/#WB1>
>>
>>
>>
>

From verdy_p at wanadoo.fr  Wed Nov 23 05:20:44 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 23 Nov 2016 12:20:44 +0100
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
 <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
 <012E41802C7842F386529FBA99969391@erratique.ch>
 <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>
Message-ID: <CAGa7JC106jhM4dwkEqJx6nTRnt-Fjqkwep2heDS_SqRE8M_fMQ@mail.gmail.com>

2016-11-23 12:00 GMT+01:00 Tom Hacohen <tom at osg.samsung.com>:

>
> Also take another look at http://www.unicode.org/reports
> /tr29/#Grapheme_Cluster_and_Format_Rules specifically the table that
> shows another way of writing the ignore rule. This again shows my
> understanding of rule 4 is correct.
>
> Specially look at the following equivalence:
> X Y × Z W       ⇒       X (Extend | Format)* Y (Extend | Format)* × Z
> (Extend | Format)* W
>

This expansion does not occur before rule WB4; it cannot be used to
transform rules WB1 to WB3c; this is explicitly stated in the algorithm.
And because the rule WB3c handles your case, you are misinterpreting the
specs as if it was applying there too...
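
As a rough Python sketch of that restriction: the rewriting inserts the
ignorable run after every class except the last one, and leaves the rules
listed before WB4 untouched. The function name `expand` and its arguments
are hypothetical, and the ignorable set is written as in the equivalence
quoted above:

```python
from typing import List

IGNORE = "(Extend | Format)*"

# Rules listed before WB4; the rewriting is not applied to them.
BEFORE_WB4 = {"WB1", "WB2", "WB3", "WB3a", "WB3b", "WB3c"}

def expand(rule: str, left: List[str], op: str, right: List[str]) -> str:
    """Render a rule with the ignorable run inserted where it applies."""
    if rule in BEFORE_WB4:
        return f"{' '.join(left)} {op} {' '.join(right)}"
    lhs = " ".join(f"{c} {IGNORE}" for c in left)
    rhs = f" {IGNORE} ".join(right)
    return f"{lhs} {op} {rhs}"

print(expand("WB3c", ["ZWJ"], "×", ["(Glue_After_Zwj | EBG)"]))
print(expand("WB9", ["ALetter"], "×", ["Numeric"]))
```

Under this reading WB3c stays a rule about literally adjacent characters,
so an Extend between ZWJ and the following symbol stops it from matching.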

From tom at osg.samsung.com  Wed Nov 23 05:28:41 2016
From: tom at osg.samsung.com (Tom Hacohen)
Date: Wed, 23 Nov 2016 11:28:41 +0000
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <CAGa7JC106jhM4dwkEqJx6nTRnt-Fjqkwep2heDS_SqRE8M_fMQ@mail.gmail.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
 <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
 <012E41802C7842F386529FBA99969391@erratique.ch>
 <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>
 <CAGa7JC106jhM4dwkEqJx6nTRnt-Fjqkwep2heDS_SqRE8M_fMQ@mail.gmail.com>
Message-ID: <941085bf-5c67-e4d4-9263-2d897fd8915b@osg.samsung.com>

On 23/11/16 11:20, Philippe Verdy wrote:
> 2016-11-23 12:00 GMT+01:00 Tom Hacohen <tom at osg.samsung.com
> <mailto:tom at osg.samsung.com>>:
>
>
>     Also take another look at
>     http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules
>     <http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules>
>     specifically the table that shows another way of writing the ignore
>     rule. This again shows my understanding of rule 4 is correct.
>
>     Specially look at the following equivalence:
>     X Y × Z W       ⇒       X (Extend | Format)* Y (Extend | Format)* ×
>     Z (Extend | Format)* W
>
>
> This expansion does not occur before rule WB4; it cannot be used to
> transform rules WB1 to WB3c; this is explicitly stated in the algorithm.
> And because the rule WB3c handles your case, you are misinterpreting the
> specs as if it was applying there too...
>

I took a look at the ICU sources, and they explicitly mention this case, 
so it seems I was mistaken with interpreting the intention of the UAX. I 
still find it confusing, but based on this thread, it seems to just be me.

Sorry for the noise.

The comment from the ICU source code:
# Rule 3c   ZWJ x (Extended_Pict | EmojiNRK).  Precedes WB4, so no 
intervening Extend chars allowed.
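
For the archive, a minimal Python sketch of that ordering over WB class
names. It is not the ICU or libunibreak code: `decide_boundaries` is a
hypothetical name, only a handful of rules are modelled, and the class
names are the ones used in this thread:

```python
from typing import List, Tuple

IGNORED = {"Extend", "Format", "ZWJ"}
NEWLINES = {"Newline", "CR", "LF"}
NO_BREAK_PAIRS = {
    ("ALetter", "ALetter"): "WB5",
    ("Numeric", "Numeric"): "WB8",
    ("ALetter", "Numeric"): "WB9",
    ("Numeric", "ALetter"): "WB10",
}

def decide_boundaries(classes: List[str]) -> List[Tuple[str, bool]]:
    """For each interior boundary, return (rule, is_break)."""
    decisions = []
    for i in range(1, len(classes)):
        prev, cur = classes[i - 1], classes[i]
        # Rules before WB4 see the literally adjacent characters.
        if prev == "CR" and cur == "LF":
            decisions.append(("WB3", False))
        elif prev in NEWLINES:
            decisions.append(("WB3a", True))
        elif cur in NEWLINES:
            decisions.append(("WB3b", True))
        elif prev == "ZWJ" and cur in ("Glue_After_Zwj", "EBG"):
            decisions.append(("WB3c", False))
        elif cur in IGNORED:
            # WB4: no break before an ignored character.
            decisions.append(("WB4", False))
        else:
            # Later rules look past ignored characters on the left,
            # but WB3c is not tried again at this point.
            j = i - 1
            while j > 0 and classes[j] in IGNORED and classes[j - 1] not in NEWLINES:
                j -= 1
            rule = NO_BREAK_PAIRS.get((classes[j], cur))
            decisions.append((rule, False) if rule else ("WB999", True))
    return decisions
```

Run on ZWJ, Extend, Glue_After_Zwj this gives WB4 (no break) and then
WB999 (break), the same rule numbers as the test data line; run on
ALetter, Format, Numeric it gives WB4 and then WB9, both no break.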

Thanks for your help,
Tom

From daniel.buenzli at erratique.ch  Wed Nov 23 05:45:04 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Wed, 23 Nov 2016 12:45:04 +0100
Subject: Potential contradiction between the WordBreak test data and
 UAX #29
In-Reply-To: <941085bf-5c67-e4d4-9263-2d897fd8915b@osg.samsung.com>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
 <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
 <012E41802C7842F386529FBA99969391@erratique.ch>
 <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>
 <CAGa7JC106jhM4dwkEqJx6nTRnt-Fjqkwep2heDS_SqRE8M_fMQ@mail.gmail.com>
 <941085bf-5c67-e4d4-9263-2d897fd8915b@osg.samsung.com>
Message-ID: <A6D5307A6C7748E7BB58C5DCC02E1C6C@erratique.ch>

On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote:
> I took a look at the ICU sources, and they explicitly mention this case,
> so it seems I was mistaken with interpreting the intention of the UAX. I 
> still find it confusing, but based on this thread, it seems to just be me.

It's not only you, I also sometimes get confused by it (see for example [1] and subsequent messages). Maybe the operational model could be clarified a bit. 

I also think it would be better if the UAX29 didn't use ignore rules at all, so that going from rules to implementation is more straightforward --- though I understand it may make the spec harder to maintain.

Best,

Daniel

[1] http://www.unicode.org/mail-arch/unicode-ml/y2016-m06/0088.html

From tom at osg.samsung.com  Wed Nov 23 06:04:30 2016
From: tom at osg.samsung.com (Tom Hacohen)
Date: Wed, 23 Nov 2016 12:04:30 +0000
Subject: Potential contradiction between the WordBreak test data and UAX
 #29
In-Reply-To: <A6D5307A6C7748E7BB58C5DCC02E1C6C@erratique.ch>
References: <ed2d35b3-ca26-a7fa-bdaa-50bbc23548c5@osg.samsung.com>
 <34DEC7A2F6EC43DD9766B06D8E558CD7@erratique.ch>
 <11941b77-414c-4831-f02a-179f6582a522@osg.samsung.com>
 <012E41802C7842F386529FBA99969391@erratique.ch>
 <fa2d8452-6ccf-8ea3-9ed6-28fde4fc4257@osg.samsung.com>
 <CAGa7JC106jhM4dwkEqJx6nTRnt-Fjqkwep2heDS_SqRE8M_fMQ@mail.gmail.com>
 <941085bf-5c67-e4d4-9263-2d897fd8915b@osg.samsung.com>
 <A6D5307A6C7748E7BB58C5DCC02E1C6C@erratique.ch>
Message-ID: <ea03ee6c-fae5-dbd9-f733-9d7292f112cd@osg.samsung.com>

On 23/11/16 11:45, Daniel Bünzli wrote:
> On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote:
>> I took a look at the ICU sources, and they explicitly mention this case,
>> so it seems I was mistaken with interpreting the intention of the UAX. I
>> still find it confusing, but based on this thread, it seems to just be me.
>
> It's not only you, I also sometimes get confused by it (see for example [1] and subsequent messages). Maybe the operational model could be clarified a bit.

The comment I quoted from the ICU sources clarifies the intention. Maybe 
a comment similar to that one would be helpful?

Also, thinking about it a bit more, the operational order makes sense 
when you consider the CR LF case and extended characters, however it is 
still not obvious from the wording.

Thanks again.

--
Tom.


From everson at evertype.com  Wed Nov 23 07:13:02 2016
From: everson at evertype.com (Michael Everson)
Date: Wed, 23 Nov 2016 13:13:02 +0000
Subject: Line-Breaking Hyphenation
In-Reply-To: <20161123090511.1b691ece@JRWUBU2>
References: <20161123090511.1b691ece@JRWUBU2>
Message-ID: <C1373218-CED4-4622-881E-1CAE02309E92@evertype.com>

On 23 Nov 2016, at 09:05, Richard Wordingham <richard.wordingham at ntlworld.com> wrote:
> 
> What is 'line-breaking hyphenation'?  In particular, I am trying to determine the meaning of the TUS statement 'There is no line-breaking hyphenation' referring to the Lanna script at the end of TUS Section 16.7.

“inserting a visible hyphen at a line boundary”

Michael Everson

From jameskasskrv at gmail.com  Wed Nov 23 09:15:55 2016
From: jameskasskrv at gmail.com (James Kass)
Date: Wed, 23 Nov 2016 07:15:55 -0800
Subject: Manatee emoji?
Message-ID: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>

http://patch.com/florida/southtampa/petition-drive-aims-raise-manatee-awareness-adorable-way

If enough people sign the petition, will Unicode add a manatee emoji?
And, how about wolverines and lemmings?  Are any petitions underway
for them?  How many signatures on a petition would be needed before
Unicode would consider adding a non-existent character to the
repertoire?

Best regards,

James Kass

From Shawn.Steele at microsoft.com  Wed Nov 23 10:38:56 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Wed, 23 Nov 2016 16:38:56 +0000
Subject: Manatee emoji?
In-Reply-To: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
Message-ID: <MWHPR03MB2813CFE86FCA9F03C06BE50882B70@MWHPR03MB2813.namprd03.prod.outlook.com>

I'm not sure I've ever heard of a "save the lemmings" campaign.

Considering how much effort Florida puts into protecting Manatees and their occurrence on signs, I'm actually sort of surprised there isn't already a Manatee emoji.  Had emoji been "invented" in Florida there certainly would've been one already!

-Shawn

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of James Kass
Sent: Wednesday, November 23, 2016 7:16 AM
To: Unicode Public <unicode at unicode.org>
Subject: Manatee emoji?

http://patch.com/florida/southtampa/petition-drive-aims-raise-manatee-awareness-adorable-way

If enough people sign the petition, will Unicode add a manatee emoji?
And, how about wolverines and lemmings?  Are any petitions underway for them?  How many signatures on a petition would be needed before Unicode would consider adding a non-existent character to the repertoire?

Best regards,

James Kass


From kenwhistler at att.net  Wed Nov 23 10:39:49 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Wed, 23 Nov 2016 08:39:49 -0800
Subject: Manatee emoji?
In-Reply-To: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
Message-ID: <4f5a2ef4-f815-fc02-3a39-73e51ac20f8d@att.net>

James,


On 11/23/2016 7:15 AM, James Kass wrote:
> How many signatures on a petition would be needed before
> Unicode would consider adding a non-existent character to the
> repertoire?

I would say somewhat more than zero (which could hardly be considered a 
petition) and less than 7,466,363,069 (current estimate of the world 
population).

BTW, from the selection factors page:

http://www.unicode.org/emoji/selection.html#Selection_Factors_Requested

"Petitions are only considered as possible indications of potential 
frequency of usage, among the other selection factors."

BTW, U+1F984 UNICORN FACE was a "non-existent character" for a 
non-existent animal before it made the selection review cut and was 
actually encoded as a new emoji. That doesn't mean, a priori, that it 
was a bad choice to encode. Nor did the existence or non-existence of a 
petition to encode this particular non-existent animal as an emoji 
character make much difference, anyway.

--Ken


From Shawn.Steele at microsoft.com  Wed Nov 23 10:59:57 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Wed, 23 Nov 2016 16:59:57 +0000
Subject: Manatee emoji?
In-Reply-To: <4f5a2ef4-f815-fc02-3a39-73e51ac20f8d@att.net>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
 <4f5a2ef4-f815-fc02-3a39-73e51ac20f8d@att.net>
Message-ID: <MWHPR03MB2813E7C934B8F9B691596D8282B70@MWHPR03MB2813.namprd03.prod.outlook.com>

Well, I'd suggest "more than one" as the lower limit since change.org counts the original person as #1 and Unicode'd probably want at least one other person to agree with them ;-)

If I knew how to draw a Manatee glyph, I'd propose it for them ;0)  

However preemptively proposing this emoji wouldn't help address their concern of "raising awareness."  Their change.org petition is probably doing at least as much to raise awareness as encoding an emoji without any hubbub would be.  To help raise the most awareness, Unicode should probably deny it a few times so that they can raise awareness even more.  (I'm joking about the last in case that wasn't obvious).  But, more seriously, it's a fair point and we shouldn't use their Manatee proposal to try to preemptively encode emoji for other similar scenarios.  Let them petition for each one.

*I* personally would find a Manatee emoji more useful than many of the other ones that are already encoded.  That said, I've never missed having it in the repertoire (until now).

Encoding glyphs for all fauna (& flora) obviously can't happen though.  I wonder where the line is?  Waiting for petitions seems like a reasonable gating factor, at least until that proves problematic somehow.

-Shawn

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Ken Whistler
Sent: Wednesday, November 23, 2016 8:40 AM
To: James Kass <jameskasskrv at gmail.com>
Cc: unicode at unicode.org
Subject: Re: Manatee emoji?

James,


On 11/23/2016 7:15 AM, James Kass wrote:
> How many signatures on a petition would be needed before Unicode would 
> consider adding a non-existent character to the repertoire?

I would say somewhat more than zero (which could hardly be considered a
petition) and less than 7,466,363,069 (current estimate of the world population).

BTW, from the selection factors page:

http://www.unicode.org/emoji/selection.html#Selection_Factors_Requested

"Petitions are only considered as possible indications of potential frequency of usage, among the other selection factors."

BTW, U+1F984 UNICORN FACE was a "non-existent character" for a non-existent animal before it made the selection review cut and was actually encoded as a new emoji. That doesn't mean, a priori, that it was a bad choice to encode. Nor did the existence or non-existence of a petition to encode this particular non-existent animal as an emoji character make much difference, anyway.

--Ken


From andrewcwest at gmail.com  Wed Nov 23 12:47:51 2016
From: andrewcwest at gmail.com (Andrew West)
Date: Wed, 23 Nov 2016 18:47:51 +0000
Subject: Manatee emoji?
In-Reply-To: <4f5a2ef4-f815-fc02-3a39-73e51ac20f8d@att.net>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
 <4f5a2ef4-f815-fc02-3a39-73e51ac20f8d@att.net>
Message-ID: <CALgEMhxS=j4OQmsra28yyeLHyem3kk_Pmf1y+tkwaE05LxUG=w@mail.gmail.com>

On 23 November 2016 at 16:39, Ken Whistler <kenwhistler at att.net> wrote:
> On 11/23/2016 7:15 AM, James Kass wrote:
>>
>> How many signatures on a petition would be needed before
>> Unicode would consider adding a non-existent character to the
>> repertoire?
>
> I would say somewhat more than zero (which could hardly be considered a
> petition) and less than 7,466,363,069 (current estimate of the world
> population).

Well, based on http://www.unicode.org/L2/L2016/16295r-animal-emoji.pdf
I would say between 4,737 and 6,941.

Andrew

From leoboiko at namakajiri.net  Wed Nov 23 13:10:03 2016
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Wed, 23 Nov 2016 17:10:03 -0200
Subject: Manatee emoji?
In-Reply-To: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
Message-ID: <CAJ6uix6nUu2AcR5baGbRk3igbJFsSjFu1uVbKoSB_hzAP5fNOQ@mail.gmail.com>

I support the creation of manatee emoji, but only if it’s accompanied
by a new modifier for emoji size, coming in the varieties: TINY,
SMALL, LARGE, HUGE.

This would allow us to say "oh, the [HUGE MANATEE]" in emoji.

2016-11-23 13:15 GMT-02:00 James Kass <jameskasskrv at gmail.com>:
> http://patch.com/florida/southtampa/petition-drive-aims-raise-manatee-awareness-adorable-way
>
> If enough people sign the petition, will Unicode add a manatee emoji?
> And, how about wolverines and lemmings?  Are any petitions underway
> for them?  How many signatures on a petition would be needed before
> Unicode would consider adding a non-existent character to the
> repertoire?
>
> Best regards,
>
> James Kass


From doug at ewellic.org  Wed Nov 23 13:44:58 2016
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 23 Nov 2016 12:44:58 -0700
Subject: Manatee =?UTF-8?Q?emoji=3F?=
Message-ID: <20161123124458.665a7a7059d7ee80bb4d670165c8327d.7ac8a1b9e0.wbe@email03.godaddy.com>

Leonardo Boiko wrote:

> I support the creation of manatee emoji, but only if it’s accompanied
> by a new modifier for emoji size, coming in the varieties: TINY,
> SMALL, LARGE, HUGE.
>
> This would allow us to say "oh, the [HUGE MANATEE]" in emoji.

Leonardo immediately wins the award for best sort-of-Unicode-related pun
ever. Just retire the trophy now.

But I am expecting a full array of modifiers and ZWJ sequences, to meet
the user need for a female factory-worker manatee with dark skin and red
hair, or families of manatees with arbitrary combinations of attributes.

 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From christoph.paeper at crissov.de  Wed Nov 23 15:30:08 2016
From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=)
Date: Wed, 23 Nov 2016 22:30:08 +0100
Subject: Manatee emoji?
In-Reply-To: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
Message-ID: <ACF7B20B-688A-408D-BEDB-FC2853568D3E@crissov.de>

James Kass <jameskasskrv at gmail.com>:
> 
> And, how about [other emoji]?  Are any petitions underway for them?

For what it’s worth, several weeks ago (before UTC149), I collected all emoji petitions I could find online (and that were in languages I can at least somewhat decipher). I’m excluding anything moot added in or before Unicode 9.0 and Emoji 4.0, but am including current candidate emoji in the list below (Markdown format). In some cases, I think, it’s at least as valuable to see how many people are proposing some emoji character independently as how many co-sign a single public petition.

Emoticons, Actions, People, Body and Clothing/Fashion Emojis
============================================================

- [Itching](http://www.ipetitions.com/petition/demand-an-itching-emoji)

Emoticon Faces
--------------

- [Grimacing face with smiling eyes](https://www.change.org/p/apple-change-the-grinning-emoji-back-to-how-it-was-in-ios-9)
- [Face wearing makeup](https://www.change.org/p/kik-team-help-kik-com-http-www-kik-com-contact-kik-interactive-inc-this-drawing-needs-to-be-a-real-kik-emoji)

Puke Emoji, Vomit Emoji, Barf Emoji, Disgust Emoji, Sick Emoji
--------------------------------------------------------------

[FACE WITH OPEN MOUTH VOMITING](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f92e)

- [Face vomiting](https://www.change.org/p/you-vomit-emoji)
- [Face vomiting](https://www.change.org/p/facebook-add-puke-or-ill-to-available-reactions-for-posts)
- [Face vomiting](https://www.change.org/p/mark-zuckerberg-add-the-barf-disgust-reaction-to-facebook)
- [Face vomiting](https://www.change.org/p/apple-an-emoji-symbolizing-someone-throwing-up)
- [Face vomiting](https://www.openpetition.de/petition/online/wir-moechten-ein-kotzendes-smiley-bei-whatsapp)

Professions, Roles, Costumes, Clichés Emoji
-------------------------------------------

- [Emo](https://www.change.org/p/emoji-makers-emo-emoji)
- [Ninja](https://www.change.org/p/apple-create-a-ninja-emoji)
- [Pet lover](https://www.change.org/p/mark-davis-queremos-el-petloveremoji-we-want-a-petloveremoji)
- [Alien laughing](https://www.change.org/p/facebook-facebook-needs-a-alien-laughing-emoji)
- [Bachelor, Bachelorette etc.](https://www.change.org/p/apple-lets-make-these-hen-and-stag-emojis-happen)
- [Rabbi](http://www.thepetitionsite.com/148/741/454/make-a-rabbi-emoji/)
- [Viking](http://www.ipetitions.com/petition/apple-needs-to-add-a-viking-emoji)
- [Stoner](http://www.ipetitions.com/petition/stoner-emojis)

[MAGE](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f9d9)

- [Wizard](https://www.change.org/p/the-get-a-wizard-emoji-for-skype)

### Fandom Emoji

- [Fanboy and Fangirl](https://www.change.org/p/google-fangirl-fanboy-emoji)
- [Fangirl](http://www.ipetitions.com/petition/fangirl-emoji)
- [Fandom](http://www.ipetitions.com/petition/fandom-emoji-added-to-apples-emojis)

Hair and Skin Colors, Ethnicity
-------------------------------

- [Curly hair](https://www.change.org/p/unicode-consortium-there-should-be-curly-people-emojis)

### Redhead or Ginger and Freckles Emoji

- [Red hair](https://www.change.org/p/apple-redheads-should-have-emoji-too)
- [Red hair](https://www.change.org/p/apple-redhead-emoji-ae5c74fe-1429-4e72-a835-2508d189132c)
- [Red hair](https://www.change.org/p/apple-redhead-emoji-684224ac-e260-4f2d-92ff-8653323d5675)
- [Red hair](https://www.change.org/p/apple-a-red-head-emoji)
- [Red hair](https://www.change.org/p/apple-a-ginger-hair-emoji-girl)
- [Red hair](https://www.change.org/p/apple-make-apple-create-an-emoji-of-a-person-with-ginger-hair)
- [Red hair](https://www.change.org/p/apple-make-red-haired-emoji-s-happen)
- [Red hair](https://www.change.org/p/apple-redhead-representation-f8628a12-1d26-475c-bb8d-20ee166854c1)
- [Red hair](https://www.change.org/p/apple-fighting-for-red-headed-emojis)
- [Red hair](https://www.change.org/p/apple-justice-for-gingers-with-a-ginger-emoji)
- [Red hair](https://www.change.org/p/us-redhead-emojis)
- [Red hair](http://www.thepetitionsite.com/258/258/593/apple-needs-a-ginger-emoji/)
- [Red hair](http://www.gopetition.com/petitions/redhead-emoji-needed.html)
- [Red hair](http://www.ipetitions.com/petition/red-haired-emoji)
- [Red hair](http://www.ipetitions.com/petition/redheads-should-have-emojitoo)
- [Red hair](http://www.ipetitions.com/petition/things-the-world-needs)
- [Red hair](http://www.petitions24.com/we_want_redhead_emojis)

?? Body Part Emojis
-------------------

- [Leg](https://www.change.org/p/apple-to-make-a-leg-emoji)
- [Kidney](https://www.change.org/p/apple-samsung-android-kidney-emoji)
- [Vagina](http://www.ipetitions.com/petition/add-a-vagina-emoji) (seems to have been deleted from site)

[BRAIN](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f9e0)

- [Brain](https://www.change.org/p/whatsapp-volem-l-emoticona-d-un-cervell-a-whatsapp)

Beard and Mustache Emoji
------------------------

[BEARDED PERSON](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f9d4)

- [Beard](https://www.change.org/p/unicode-please-release-a-beardemoji)
- [Beard](https://www.change.org/p/the-unicode-consortium-apple-create-a-beard-emoji)
- [not mustache](https://www.change.org/p/apple-new-beard-emoji)
- [Beards](https://www.change.org/p/unicode-apple-blackberry-google-microsoft-emojis-need-beards-too)
- [Bearded person and bald person](https://www.change.org/p/whatsapp-queremos-un-emoji-de-whatsapp-calvo-y-barbudo)
- [Beard](http://www.beardemoji.com)
- [Bearded person](http://www.ipetitions.com/petition/we-want-a-bearded-emoji)

Health and Illness Emoji
------------------------

- [Tumor](http://www.ipetitions.com/petition/facebook-should-add-a-tumor-emoji-to-emojis)

?? Headwear and Hats Emoji
--------------------------

- [Person wearing sombrero](https://www.change.org/p/google-inc-mexican-wearing-sombrero-emoji)

### Fedora Emoji

- [Fedora](https://www.change.org/p/my-friend-justin-approve-this-fedora-emoji)
- [Fedora](https://www.change.org/p/president-of-the-united-states-obtain-a-fedora-emoji-for-all-mobile-devices)

### Headscarf or Hijab Emoji

[PERSON WITH HEADSCARF](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f9d5)

- [Hijab](https://www.change.org/p/apple-add-a-hijab-emoji-for-muslims)
- [Hijab](https://www.change.org/p/unicode-consortium-mark-davis-rachel-martin-i-want-the-hijab-emoji-i-want-diversity)
- [Hijab](http://www.ipetitions.com/petition/hijabi-emojis)

?? Footwear and Shoe Emoji
--------------------------

- [Ballet shoes](https://www.change.org/p/whats-app-create-a-ballerina-emoji-shoe-emojis-are-either-red-high-heels-sex-heavy-boots-masculine-or-frumpy-sandals)

### Socks and Stockings

[SOCKS](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f9e6)

- [Sock](https://www.change.org/p/apple-or-android-or-whoever-create-a-sock-emoji)
- [Sock](https://www.change.org/p/national-rifle-association-add-a-sock-emoji-to-iphone)

Waistwear and Belts
-------------------

- [Championship belt](https://www.change.org/p/apple-a-championship-belt-emoji-should-be-a-standard-emoji)

Clothing Emoji
--------------

- [Cardigan](https://www.change.org/p/tim-cook-apple-needs-a-party-cardigan-emoji)

Gestures and Poses Emoji
========================

- [Left-handed](https://www.change.org/p/apple-apple-make-a-left-handed-emoji) → 🖎 U+1F58E (not an emoji yet)

?? Greeting or Salute Emoji
---------------------------

- [Tip of the hat](https://www.change.org/p/apple-add-a-tip-of-the-hat-emoji)

?? Two Fingers Emoji
-------------------

### Finger Gun Emoji

Hand with Thumb and Index Finger Extended, Pointing Sidewards

- [Finger gun](https://www.change.org/p/all-of-those-who-support-awkward-finger-guns-as-answers-to-all-questions-there-needs-to-be-a-finger-guns-emoji)
- [Finger gun](https://www.change.org/p/skype-make-a-finger-guns-emoji-on-skype)
- [Finger gun](http://www.ipetitions.com/petition/we-need-a-finger-guns-emoji)
- [Finger gun](http://www.petitions24.com/apple_give_us_a_finger_gun_emoji)

### Shaka Emoji

Hand with Thumb and Pinky Finger Extended, Pointing Sidewards

- [Shaka](https://www.change.org/p/the-unicode-consortium-let-shaka-be-in-emoji)
- [Shaka](https://www.change.org/p/apple-add-the-shaka-emoji)
- [Shaka](https://www.change.org/p/apple-apple-to-put-out-a-thumb-and-pinky-emoji)
- [Shaka](https://www.change.org/p/all-the-shakka-people-steph-saves-shakkas)
- [Shaka](https://www.change.org/p/apple-shakas-emoji-for-ios)
- [Shaka](http://www.ipetitions.com/petition/shaka-brah-emoji)

?? Three Fingers Emoji
----------------------

- [Scout sign: Index, Middle and Ring Fingers Extended](https://www.change.org/p/whatsapp-queremos-un-emoji-para-la-se?a-scout-i-we-want-an-emoji-of-the-scout-signal)
- [Shocker: Index, Middle and Pinky Fingers Extended](https://www.change.org/p/apple-make-the-shocker-hand-an-emoji)

[I LOVE YOU HAND SIGN](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f91f)

- [ASL ILY: Thumb, Index and Pinky Fingers extended](https://www.change.org/p/unicode-consortium-we-want-the-i-love-you-asl-handshape-emoji)

Phan, Ladders Gesture Emoji
---------------------------
?

- [Ladders hand](https://www.change.org/p/apple-phan-ladders-emoji)
- [Ladders hand](https://www.change.org/p/apple-make-ladders-pinof-7-an-emoji)

?? Poses Emoji
--------------

- [Person with Hand under Chin](https://www.change.org/p/emoji-people-made-a-hand-under-chin-pose-emoji)

### Dab Emoji
Dab is probably more appropriately filed under Fad, though.

- [Dab](https://www.change.org/p/apple-dab-emote-for-whatsapp)
- [Dab](https://www.change.org/p/google-make-a-dab-emoji-46f1e448-7cfd-4359-bcad-78630bfbf55f)
- [Dab](https://www.change.org/p/emoji-a-dab-emoji-must-be-added-to-the-emoji-keyboard)
- [Dab](https://www.change.org/p/mark-zuckerberg-pour-la-cr?ation-d-un-emoji-qui-dab)
- [Dab](https://www.change.org/p/whatsapp-whatsapp-incluya-el-dab-smiley)
- [Dab](http://www.ipetitions.com/petition/make-a-dab-emoji)

?? Food Emojis
==============

- [Dip](https://www.change.org/p/apple-inc-make-an-onion-dip-emoji-2)
- [Brunch](https://www.change.org/p/lovers-join-the-campaign-to-make-a-brunching-emoji-happen)
- [Cheese curd](https://www.change.org/p/apple-people-for-an-iphone-cheese-curd-emoji)
- [Soup](https://www.change.org/p/apple-for-apple-co-to-make-a-soup-emoji) → 🍲 U+1F372 / 🍜 U+1F35C
- [Jam](https://www.change.org/p/the-creator-of-emojis-the-creation-of-the-jar-of-jam-emoji)
- [Corndog](https://www.change.org/p/apple-inc-we-need-corn-dog-emojis)
- [Lasagna](https://www.change.org/p/whatsapp-mark-zuckerberg-queremos-emoji-de-lasanha-no-whatsapp)
- [Sausage](http://www.ipetitions.com/petition/sausage-emoji) → 🌭 U+1F32D
- [Dolma: grape/wine leaves](http://www.ipetitions.com/petition/add-a-dolma-emoji-to-ios)

?? Chicken Nuggets or Wings Emoji
---------------------------------

- [Chicken Nugget](https://www.change.org/p/to-provide-society-with-a-long-awaited-chicken-nugget-emoji)
- [Chicken Nugget](https://www.change.org/p/emoji-create-a-chicken-nugget-emoji)
- [Chicken Nugget](https://www.change.org/p/apple-apple-please-produce-a-chicken-nugget-emoji)
- [Chicken Nugget](https://www.change.org/p/apple-chicken-dildos-as-emojis-now)
- [Chicken Nugget](https://www.change.org/p/me-apple-to-make-a-chicken-nugget-emoji)
- [Chicken Nugget](http://www.ipetitions.com/petition/petition-for-a-chicken-nugget-emoji)

Waffle Emoji
------------

- [Waffle](https://www.change.org/p/apple-add-a-waffle-emoji)
- [Waffle](https://www.change.org/p/waffle-emoji-help-us-create-a-waffle-emoji-b3a4aef9-9508-4562-954b-98135c404320)

?? Pastries Emoji
-----------------

- [Dough](https://www.change.org/p/steve-jobs-apple-dough-quality)
- [Chocolate cake](https://www.change.org/p/apple-add-a-chocolate-cake-emoji)

[PIE](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f967)

- [Pie](https://www.change.org/p/unicode-consortium-create-a-pie-emoji-seriously)

### Muffin and Cupcake Emoji

- [Cupcake](https://www.change.org/p/unicode-consortium-we-need-a-cupcake-emoji-stat)
- [Cupcake](https://www.change.org/p/apple-cupcake-emoji)
- [Muffin](http://www.ipetitions.com/petition/we-want-a-muffin-emoji)
- [Muffin](http://www.ipetitions.com/petition/muffin-emoji-2)
- [Muffin](http://www.ipetitions.com/petition/muffin-emoji-3)
- [Muffin](http://www.ipetitions.com/petition/fabio-needs-a-muffin-emoji)

?? Bread Emoji
--------------

### Bagel Emoji

- [Bagel](http://www.ipetitions.com/petition/we-need-a-bagel-emoji)
- [Bacon Bagel](http://www.ipetitions.com/petition/bakel-emoji-needed) → U+1F953 Bacon 🥓

### Garlic Bread Emoji

- [Garlic Bread](https://www.change.org/p/apple-garlic-bread-emoji)
- [Garlic Bread](https://www.change.org/p/garlic-bread-fanatics-garlic-bread-emoji)
- [Garlic Bread](https://www.change.org/p/google-make-garlic-bread-an-emoji-on-all-platforms)

### Sandwich or Sub Emoji

[SANDWICH](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f96a)

- [Smore](https://www.change.org/p/apple-create-a-s-more-emoji)
- [Meatball sub](https://www.change.org/p/apple-meatball-sub-emoji)
- [Croque Monsieur](https://www.change.org/p/emoji-pour-un-emoji-croque-monsieur)

Pretzel Emoji
-------------

[PRETZEL](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f968)

- [Pretzel](https://www.change.org/p/unicode-consortium-pretzel-emoji-the-perfect-twist)

Dumpling Emoji and similar Stuffed Pasta
----------------------------------------

[DUMPLING](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f95f)

- [Samosa](https://www.change.org/p/apple-samosa-emoji)
- [Dumpling](https://www.change.org/p/unicode-consortium-we-need-a-dumpling-emoji)
- [Ravioli](https://www.change.org/p/everyone-a-ravioli-emoji)
- [Pizza roll](https://www.change.org/p/dani-add-a-pizza-roll-emoji)
- [Dumpling](https://www.change.org/p/the-peopo-have-apple-add-a-dumpring-emoji)
- [Empanada](https://www.change.org/p/apple-a?adan-el-emoji-de-empanada)

?? Edible Fruit, Plants and Seeds
---------------------------------

- [Zucchini](https://www.change.org/p/apple-create-the-courgette-emoji)
- [Papaya](https://www.change.org/p/apple-make-a-papaya-emoji)
- [Guava](https://www.change.org/p/people-who-work-for-the-emoji-company-there-should-be-a-guava-emoji)
- [Grapefruit](http://www.ipetitions.com/petition/grapefruit-emoji-4-messenger)
- [Macadamia nut (and peanut)](http://www.thepetitionsite.com/204/444/591/demand-peanut-and-macadamia-nut-emoji-now/) → U+1F95C Peanuts 🥜

[COCONUT](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f965)

- [Coconut](http://www.ipetitions.com/petition/coconut-emoji)

### Broccoli Emoji

[BROCCOLI](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f966)

- [Broccoli](https://www.change.org/p/apple-give-broccoli-the-emoji-it-deserves)
- [Broccoli](http://www.ipetitions.com/petition/officially-add-broccoli-as-an-emoji)

### Mango Emoji

- [Mango](https://www.change.org/p/apple-create-a-mango-emoji)
- [Mango](https://www.change.org/p/jamal-geeemawl-create-a-mango-emoji)

### (Baked) Bean Emoji

- [Bean](https://www.change.org/p/apple-bean-emoji-bae32ca4-3545-494d-a9f5-284d876286a9)
- [Bean](https://www.change.org/p/apple-give-us-a-bean-emoji)
- [Bean](https://www.change.org/p/apple-get-apple-to-create-a-bean-emoji)
- [Baked beans](http://www.ipetitions.com/petition/petition-for-baked-bean-emoji)

### Garlic or Onion Emoji

- [Garlic](https://www.change.org/p/unicode-add-a-garlic-emoji-to-the-emoji-library-to-help-us-better-express-our-culinary-lives)

### Blueberry

- [Blueberry](https://www.change.org/p/donald-trump-blueberry-emoji)
- [Blueberry](http://www.ipetitions.com/petition/blueberry-emoji)

?? Beverages and Drinks
-----------------------

- [Mate](https://www.change.org/p/unicode-unicode-consortium-mate-chimarr?o-emoji)
- [White wine](https://www.change.org/p/unicode-create-a-white-wine-emoji) → U+1F377 Wine Glass 🍷, U+1F347 Grapes 🍇
- [Porró](https://www.change.org/p/jan-koum-volem-que-el-porr?-sigui-una-emoticona-de-whatsapp-porr?emoji) (Catalan wine glass)
- [Gin Tonic](https://www.change.org/p/jan-koum-queremos-el-emoji-de-gin-tonic-en-whatsapp-we-want-gin-tonic-emoji-in-whatsapp)
- [Raki (shot) glass](https://www.change.org/p/rak?-barda??-emojisi-istiyoruz-unicode)

Seasoning Emoji
---------------

- [Pepper and Carrot](http://www.ipetitions.com/petition/no-pepper-or-carrot-emoji) → U+1F955 Carrot 🥕
- [Salt shaker, Mustard and Ketchup](https://www.change.org/p/you-petition-to-apple-to-add-salt-shaker-mustard-and-ketchup-emoji)

### Salt (Shaker) Emoji

- [Salt shaker](https://www.change.org/p/apple-i-want-to-bring-a-salt-emoji-to-the-emoji-keyboard-on-apple-devices)
- [Salt shaker](https://www.change.org/p/apple-help-us-get-apple-to-give-us-a-salt-shaker-emoji)
- [Salt](http://www.ipetitions.com/petition/make-a-salt-emoji)

Cooking Emoji
-------------

- [Kettle](http://www.ipetitions.com/petition/introduce-a-kettle-emoji-on-apple-phones)

Sports, Hobbies and Activities
==============================

?? Sport Emoji
--------------

- [Hula hoop](https://www.change.org/p/apple-make-a-hula-hooper-emoji)
- [Marching band](https://www.change.org/p/apple-recognition-for-marching-band-emoji)
- [Skateboard](https://www.change.org/p/apple-unicode-emoji-a-skateboard-skateboarder-emoji-should-be-included-amongst-the-many-other-sports)
- [Roller skates](https://www.change.org/p/unicode-consortium-unicode-consortium-give-us-roller-skates)
- [Australian Football](http://www.ipetitions.com/petition/bring-the-afl-football-emoji-to-life)

### Lacrosse Emoji

- [Lacrosse](https://www.change.org/p/apple-lacrosse-emoji-845dd849-b7b2-4615-add0-c965a4021923)
- [Lacrosse stick](https://www.change.org/p/apple-apple-needs-to-add-a-lacrosse-stick)
- [Lacrosse](http://www.ipetitions.com/petition/lacrosse-emoji)

### Frisbee Emoji

- [Frisbee](https://www.change.org/p/apple-apple-add-a-frisbee-emoji)
- [Frisbee](https://www.change.org/p/apple-please-make-a-frisbee-emoji)

### Softball

→ U+26BE Baseball ⚾

- [Softball](https://www.change.org/p/apple-softball-needs-an-emoji-before-2020)
- [Softball](http://www.ipetitions.com/petition/softball-emoji-like-now)

### Gym Emoji

- [Ergometer](https://www.change.org/p/make-apple-have-an-erg-emoji)

?? Dance, Song and Music Emoji
-----------------------------

- [Vinyl record, LP](https://www.change.org/p/unicode-create-a-vinyl-emoji-for-music-lovers)
- [Bellydancer](https://www.change.org/p/snapchat-snapchat-bellydancer-emoji)
- [Ballet](https://www.change.org/p/whats-app-create-a-ballerina-emoji-shoe-emojis-are-either-red-high-heels-sex-heavy-boots-masculine-or-frumpy-sandals)

?? Activity
-----------

### Breastfeeding Emoji

[BREAST-FEEDING](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f931)

- [Breastfeeding](https://www.change.org/p/for-more-breastfeeding-on-the-world-we-wish-an-emoji-a-mum-breastfeeding-her-baby-por-un-emotic?n-pro-lactancia-materna)
- [Breastfeeding](https://www.change.org/p/apple-where-is-the-breastfeeding-emoji)

?? Machines, Tools and Objects
==============================

- [Passport](https://www.change.org/p/steve-dowling-vice-president-of-communications-apple-inc-create-a-passport-emoji)
- [Noose](https://www.change.org/p/mark-zuckerberg-noose-emoji-on-facebook)
- [Typewriter](https://www.change.org/p/apple-typewriter-emoji-for-the-ios)
- [Spork](https://www.change.org/p/apple-we-as-a-union-ad-people-need-a-spork-emoji-now)
- [Treasure chest](http://www.thepetitionsite.com/741/585/286/we-want-a-treasure-chest-emoji/)
- [Bucket](http://www.ipetitions.com/petition/bucket-emoji)

?? Gavel vs. Hammer Emoji
-------------------------

- [Gavel](https://www.change.org/p/apple-bring-back-the-gavel-emoji)
- [Gavel](https://www.change.org/p/apple-bring-back-the-gavel-emoji-3e238579-4d95-44ed-9e7b-74453ac2f56e)

?? Crafts Emoji
---------------

- [Sewing machine](https://www.change.org/p/http-www-emojifoundation-com-sewing-machine-emoji-emoji-machine-?-coudre)
- [Sewing](https://www.change.org/p/apple-create-craft-emoji-scissors-only-sew-unfair)

?? Musical Instrument Emoji
---------------------------

- [Euphonium](https://www.change.org/p/apple-samsung-make-a-euphonium-emoji)

### Flute Emoji

- [Flute](http://www.ipetitions.com/petition/flute-emoji)
- [Flute](http://www.petitions24.com/the_flute_emoji)

? Weapons Emoji
----------------

- [Lightsaber](https://www.change.org/p/the-unicode-consortium-facebook-apple-google-inc-google-htc-lightsaber-emojis-we-would-love-them-let-s-make-it-happen)

?? Vehicle Emoji
----------------

- [Caravan](https://www.change.org/p/apple-make-a-caravan-emoji)
- [Tank](https://www.change.org/p/tim-cook-necessitem-l-emojitanc-necesitamos-el-emojitanc-we-need-the-emojitanc-69e2bf66-62e6-4b17-a2a6-f7d83d4539bc)

?? Furniture Emoji
------------------

- [Magic Carpet](https://www.change.org/p/apple-help-motivate-apple-to-design-a-magic-carpet-emoji-think-of-the-possibilities)
- [Stool](https://www.change.org/p/android-pour-un-emoji-tabouret)
- [Pillow](https://www.change.org/p/a-internet-el-emoji-de-almohada)

Animal Emoji
============

- [Elephant with tusks](https://www.change.org/p/the-perfect-world-foundation-give-the-emoji-elephant-back-its-tusks)
- [Chameleon](http://www.ipetitions.com/petition/I-need-a-chameleon-emoji) → 🦎 U+1F98E Lizard

?? Cat Emoji
------------

- [More cats](https://www.change.org/p/apple-more-cat-emoji-s)
- [Black cat](https://www.emojirequest.com/r/BlackCatEmoji)

?? Insects and Bugs Emoji
-------------------------

[CRICKET](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f997)

- [Crickets](https://www.change.org/p/facebook-new-crickets-emoji)

?? Birds Emoji
--------------

- [Seagull](https://www.change.org/p/apple-get-a-seagull-emoji)
- [Crying vulture](https://www.change.org/p/emoji-companies-there-should-be-a-crying-vulture-emoji)
- [Parakeet](https://www.change.org/p/computer-people-make-a-parakeet-emoji-for-a-girl-i-m-interested-in)

### Flamingo Emoji

- [Flamingo](https://www.change.org/p/apple-flamingo-emoji)
- [Flamingo](https://www.change.org/p/apple-add-a-flamingo-emoji)
- [Flamingo](https://www.sophiawebster.com/flamingo-emoji-petition)
- [Flamingo](https://www.emojirequest.com/r/FlamingoEmoji)

### Ostrich Emoji

- [Ostrich](https://www.change.org/p/instagram-create-an-ostrich-emoji)

### Swan Emoji

- [Swan and Goose](https://www.change.org/p/michelle-obama-i-wany-a-goose-emoji)
- [Swan](https://www.change.org/p/unicode-consortium-unicode-consortium-p-o-box-391476-mountain-view-ca-94039-1476-u-s-a-whattsapp-hinzuf?gen-des-schwan-emojis-including-the-swan-emoji)

Dog Breeds Emoji
----------------

### Pug

- [Pug](https://www.change.org/p/everyone-make-a-sloth-and-pug-emoji)
- [Pug](http://www.ipetitions.com/petition/pugs-and-emojis)

### Shiba

- [Shibs](https://www.change.org/p/facebook-shibs-4-messenger)
- [Shiba and Husky](https://www.change.org/p/mark-zuckerberg-emojis-shiba-et-huksy-sur-facebook)

Ferret and Weasel Emoji
-----------------------

- [Ferret](https://www.change.org/p/shigetaka-kurita-add-ferret-emoji)
- [Ferret](https://www.change.org/p/make-a-ferret-emoji-happen)
- [Ferret](https://www.change.org/p/apple-ferret-emoji-please-unicode)

?? Lobster Emoji
----------------
→ U+1F980

- [Lobster](https://www.change.org/p/unicode-a-lobster-emoji)
- [Lobster](https://www.change.org/p/we-have-a-crab-emoji-now-it-s-time-for-a-lobster-emoji)

Giraffe Emoji
-------------

[GIRAFFE FACE](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f992)

- [Giraffe](https://www.change.org/p/apple-microsoft-facebook-giraffe-emoji)
- [Giraffe](https://www.change.org/p/apple-apple-to-create-a-giraffe-emoji)
- [Giraffe](https://www.change.org/p/tim-cook-a-giraffe-emoji)
- [Giraffe](https://www.change.org/p/apple-make-a-giraffe-emoji-a5a0e90f-45c5-44dc-aa52-674a884dda45)
- [Giraffe](https://www.change.org/p/apple-apple-make-a-giraffe-emoji)
- [Giraffe](https://www.change.org/p/apple-petition-to-have-a-giraffe-emoji)
- [Giraffe](https://www.change.org/p/apple-get-apple-to-make-a-giraffe-emoji)
- [Giraffe](https://www.change.org/p/unicode-consortium-there-should-be-a-giraffe-emoji)
- [Giraffe](https://www.change.org/p/emoji-there-needs-to-be-a-giraffe-emoji-who-s-with-me)
- [Giraffe](https://www.change.org/p/whatsapp-inc-para-que-whatsapp-inlcuya-un-emoji-de-jirafa-en-la-secci?n-de-animales)
- [Giraffe](https://www.change.org/p/apple-we-want-giraffe-emoji-s)
- [Giraffe](http://www.thepetitionsite.com/701/400/453/make-a-giraffe-emoji-apple/)
- [Giraffe](http://www.ipetitions.com/petition/giraffe-emoji-2)
- [Giraffe](http://www.ipetitions.com/petition/giraffe-emoji-3)
- [Giraffe](http://www.ipetitions.com/petition/giraffe-emoji-4)
- [Giraffe](http://www.ipetitions.com/petition/help-create-the-giraffe-emoji)
- [Giraffe](http://www.ipetitions.com/petition/we-need-a-giraffe-emoji)
- [Giraffe Face](https://www.emojirequest.com/r/GiraffeFaceEmoji)

Hedgehog Emoji
--------------

[HEDGEHOG](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f994)

- [Hedgehog](http://www.ipetitions.com/petition/hedgehog-emoji)
- [Hedgehog](https://www.emojirequest.com/r/HedgehogFaceEmoji)

Zebra Emoji
-----------

[ZEBRA FACE](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f993)

- [ZEBRA FACE](https://www.emojirequest.com/r/ZebraFaceEmoji)

?? Dinosaurs Emoji
------------------

[SAUROPOD](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f995)

[T-REX](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f996)

- [Dinosaur](https://www.change.org/p/apple-dinosaur-emoji)
- [Dinosaur](https://www.change.org/p/apple-have-a-dinosaur-emoji)
- [Dinosaur](https://www.change.org/p/apple-let-s-get-a-dinosaur-emoji)
- [Dinosaur](https://www.change.org/p/apple-help-us-get-a-velociraptor-emoji)
- [Dinosaur](https://www.change.org/p/apple-apple-where-tf-is-the-dinosaur-emoji-tho)
- [Dinosaur](https://www.change.org/p/apple-let-s-get-a-dinosaur-emoji-on-apple-devices)
- [Dinosaur](https://www.change.org/p/apple-get-apple-to-give-us-a-dinosaur-emoji-06440e0b-d456-464c-a1e7-4d48e44d9f19)
- [Dinosaur](http://www.ipetitions.com/petition/dino-emoji)
- [Dinosaur](https://www.emojirequest.com/r/DinosaurEmoji)

Llama or Alpaca Emoji
---------------------

- [Llama](https://www.change.org/p/apple-llama-emoji-335d92b3-baf3-4447-99a9-a24ce3853db0)
- [Llama](https://www.change.org/p/apple-we-demand-apple-to-make-a-llama-emoji)
- [Alpaca](https://www.change.org/p/unicode-wir-fordern-einen-alpaka-emoji)
- [Llama](https://www.change.org/p/whatsapp-emoji-de-una-llama)
- [Llama](http://www.ipetitions.com/petition/llama-emoji)
- [Llama](http://www.ipetitions.com/petition/llama-emoji-2)
- [Alpaca](http://www.ipetitions.com/petition/alpaca-emoji-for-whatsapp)

Otter Emoji
-----------

- [Otter](https://www.change.org/p/apple-unicode-consortium-we-need-an-otter-emoji)
- [Otter](https://www.change.org/p/apple-let-s-get-otters-their-own-emoji)
- [Otter](https://www.change.org/p/whatsapp-bring-an-otter-to-whatsapp-emojis)

Manatee or Walrus Emoji
-----------------------

- [Manatee](https://www.change.org/p/apple-com-add-a-manatee-emoji)
- [Manatee](https://www.change.org/p/anyone-donald-trump-manatee-emoji)

?? Whale and Dolphin Emoji
--------------------------
→ 🐋 U+1F40B, 🐳 U+1F433, 🐬 U+1F42C

- [Orca](https://www.change.org/p/apple-make-a-killer-whale-emoji-in-apple-s-emoji-board)

### Narwhal Emoji

- [Narwhal](https://www.change.org/p/unicode-consortium-a-petition-to-create-a-narwhal-emoji-proposal-to-unicode-consortrium)
- [Narwhal](https://www.change.org/p/michellebyang111-gmail-com-can-we-add-a-narwhal-emoji-in-the-new-ios)

Hippopotamus Emoji
------------------

- [Hippo](http://www.ipetitions.com/petition/HippoEmoji)

Kangaroo Emoji
--------------

- [Kangaroo](https://www.change.org/p/apple-add-a-kangaroo-emoji-to-the-iphone)
- [Kangaroo](https://www.change.org/p/where-s-our-kangaroo-emoji)

?? Polar Bear Emoji
-------------------

- [Polar bear](https://www.change.org/p/apple-inc-make-a-polar-bear-emoji)
- [Polar bear](http://www.ipetitions.com/petition/polar-bear-emoji)

?? Squirrel or Rodent Emoji
---------------------------
→ U+1F43F

- [Squirrel](https://www.change.org/p/unicode-give-us-the-squirrel-emoji-now)

Opossum Emoji
-------------

- [Opossum](https://www.change.org/p/apple-possum-emoticon)

Raccoon Emoji
-------------

- [Raccoon](https://www.change.org/p/apple-have-apple-make-a-raccoon-emoji)
- [Raccoon](https://www.change.org/p/unicode-consortium-enough-is-enough-raccoons-need-equal-representation-in-the-emoji-keyboard-now)
- [Raccoon](http://www.ipetitions.com/petition/racoon-emojis-to-make-a-difference)
- [Raccoon](http://www.ipetitions.com/petition/raccoon-emojis)
- [Raccoon](http://www.ipetitions.com/petition/raccoon-emojis-for-freedom)
- [Raccoon](https://www.gopetition.com/petitions/raccoon-emoji.html)

(Honey) Badger Emoji
--------------------

- [(Hufflepuff) Badger](https://www.change.org/p/facebook-we-want-a-badger-emoji-for-hufflepuff-facebook-group-chats-and-we-want-it-now)
- [Honey Badger](http://www.gopetition.com/petitions/petition-to-make-honey-badger-emoji-on-fb-messenger.html)

Sloth Emoji
-----------

- [Sloth](https://www.change.org/p/apple-verizon-sprint-apple-should-make-a-sloth-emoji)
- [Sloth](https://www.change.org/p/steve-jobs-apple-apple-inc-make-a-sloth-emoji-for-the-ios-devices)
- [Sloth](https://www.change.org/p/everyone-make-a-sloth-and-pug-emoji)

?? Plants and Flowers
=====================
not for eating

Poppy
-----

- [Poppy](https://www.change.org/p/make-a-poppy-emoji-for-rememberance-day-petition)
- [Poppy](https://www.change.org/p/a-poppy-emoji-for-remembrance-day)

Recreational Drugs
------------------

- [Weed](https://www.change.org/p/apple-inc-add-a-weed-emoji-to-iphone-and-android-devices)
- [Blunt](https://www.change.org/p/apple-make-a-blunt-emoji-cafd1ce2-f01a-4f89-b0ea-5af7da25a018)
- [Marijuana](http://www.ipetitions.com/petition/create-a-marijuana-emoji)
- [Stoner](http://www.ipetitions.com/petition/stoner-emojis)

?? Flags
========

???? Countries of United Kingdom
-------------------------------

### Scotland/Alba or Saltire or St. Andrew Cross Flag

- [Scotland](https://www.change.org/p/nicola-sturgeon-saltire-flag-emoji)
- [Scotland](https://www.change.org/p/facebook-create-a-scottish-saltire-flag-emoticon-standrewsday)
- [Scotland](https://www.change.org/p/international-organisation-we-want-a-scotland-emoji-flag)
- [Scotland](http://www.ipetitions.com/petition/change-the-icon-on-facebook-for-burns-supper-from)
- [Scotland and Northern Ireland](http://www.petitions24.com/scotland_and_northern_ireland_to_get_an_emoji)

### Wales/Cymru or Dragon Flag

- [Wales](https://www.change.org/p/apple-get-apple-to-add-a-welsh-flag-emoji)
- [Wales](https://www.change.org/p/the-unicode-consortium-welsh-flag-emoji-appeal)
- [Wales](https://www.change.org/p/apple-i-really-want-apple-to-aknowledge-wales-as-a-country-and-give-us-our-flag-emoji)
- [Wales](https://www.change.org/p/kane-to-add-the-welsh-flag-to-emoji)
- [Wales](http://www.gopetition.com/petitions/get-the-welsh-flag-on-the-emoji-lists.html)
- [Wales](http://www.ipetitions.com/petition/welsh-flag-emoji-for-apple-emoji-keyboard)
- [Wales](http://www.ipetitions.com/petition/facebook-welsh-flag-emoji)

### Northern Ireland Flag

- [Northern Ireland](https://www.change.org/p/apple-why-isn-t-there-an-northern-ireland-flag-emoji-let-s-ensure-apple-knows-we-exist)
- [Northern Ireland and Scotland](http://www.petitions24.com/scotland_and_northern_ireland_to_get_an_emoji)

### England or St. George Cross Flag

- [England](https://www.gopetition.com/petitions/england-flag-emoji.html)

???? US States Flags
--------------------

- [Texas](https://www.change.org/p/ted-cruz-need-texas-emoji)
- [Confederate States of America](http://www.ipetitions.com/petition/have-apple-make-a-confederate-flag-emoji)

Natives and Aboriginals Flags
-----------------------------

- [Torres Strait Islander](https://www.change.org/p/shigetaka-kurita-the-aboriginal-torres-strait-islander-flag-emojis)
- [Aboriginal](https://www.change.org/p/apple-an-aboriginal-flag-emoji-needs-to-be-released)

Independence Movements Transnational
------------------------------------

- [No Israel](https://www.change.org/p/tim-cook-ceo-of-apple-justify-the-addition-of-the-israeli-flag-on-the-new-ios-emoji-keyboard)
- [Kurdistan](http://www.thepetitionsite.com/689/689/556/kurdish-flag-emoji/)

Independence Movements Intranational
------------------------------------

- [Oromo](https://www.change.org/p/apple-i-want-to-have-an-oromo-flag-emoji)
- [South Vietnam](https://www.change.org/p/apple-unicode-representation-vietnamese-heritage-and-freedom-flag-emoji)
- [Aramea](https://www.change.org/p/apple-inc-we-want-apple-to-add-the-syriac-aramean-flag-in-ios)
- [Sicily](http://www.thepetitionsite.com/781/027/921/making-the-sicilian-flag-an-emoji/)

? Equality, Diversity, Sexuality and Gender
===========================================

- [Transgender flag](https://www.change.org/p/unicode-add-transgender-pride-flag-emoji)
- https://www.change.org/p/obama-ya-mama-stop-emoji-racism

Gender Equality (New Versions of Existing Emojis)
-------------------------------------------------

- [Gender equality](https://www.change.org/p/apple-gender-equality-emojis-9a6b29ca-98f3-4bb9-bbb7-14ee6a50ff65)
- [Gender equality](https://www.change.org/p/emojiquality)
- [Runner](http://www.ipetitions.com/petition/athletic-running-woman-emoji)
- [Runner](http://www.ipetitions.com/petition/petition-for-female-runnig-emoji)
- [King](http://www.ipetitions.com/petition/King-Emoji) → 🤴 U+1F934 Prince
- https://www.change.org/p/petici?n-para-por-el-derecho-de-las-mujeres-a-ir-solas-por-el-mundo

Femojis
-------

- https://www.change.org/p/femojis-uk-2
- https://www.change.org/p/femojis-uk
- https://www.change.org/p/femojis-fr
- https://www.change.org/p/femojis-it

Complexion and Ethnicity
------------------------

- [African-American](http://www.thepetitionsite.com/450/195/279/demand-apple-to-include-african-american-emoji/)
- [African-American](http://www.ipetitions.com/petition/african-american-emojis)
- [Ethnicity](http://www.ipetitions.com/petition/represent-all-ethnicities-as-emojis)
- http://www.ipetitions.com/petition/love-doesnt-see-color-it-just-knows-how-to-mix

?? Faith, Religion, Belief
-------------------------

- [Illuminati](http://www.ipetitions.com/petition/illuminati-emoji)
- [Pentagram](http://www.ipetitions.com/petition/make-a-pentagram-emoji)

### Khanda

→ ☬ U+262C (Adi Shakti; Sikh symbol)

- [Khanda](https://www.change.org/p/apple-create-the-khanda-emoji)
- [Khanda](http://www.ipetitions.com/petition/khandaemoji)

Symbols, Signs and Icons
========================

- [Sponsored Content “#Ad”](http://corp.izea.com/emoji/)
- [Planets (not symbols)](http://www.petitions24.com/planet_emojis)

? Hearts Emoji
--------------

### Orange Heart Emoji

[ORANGE HEART](http://unicode.org/emoji/charts-beta/emoji-candidates.html#1f9e1)

- [Orange heart](https://www.change.org/p/apple-make-an-orange-heart-emoji)
- [Orange heart](https://www.change.org/p/apple-make-apple-and-android-create-an-orange-heart)
- [Orange heart](http://www.ipetitions.com/petition/add-an-orange-heart-emoji-to-apples-emoji)
- [Orange heart](http://www.ipetitions.com/petition/heart-rainbow)

?? Money Emoji
--------------

- [Bitcoin](http://www.ipetitions.com/petition/bitcoin-emoji)

# New and unsorted as of 14 Nov 2016

- https://www.change.org/p/door-hinge-memes-petition-to-have-facebook-font-makers-add-a-door-hinge-emoji-to-their-pictograph-library
- https://www.change.org/p/green-balloon-emoji-for-glenn
- https://www.change.org/p/apple-get-apple-to-make-a-match-emoji
- https://www.change.org/p/zoe-taylor-make-the-eggplant-emoji-respectable-again
- https://www.change.org/p/unicode-dat-boi-emoji-added-to-uni-code-character-set
- https://www.change.org/p/facebook-add-praying-emoticon-to-facebook-like


From mark at kli.org  Wed Nov 23 17:59:17 2016
From: mark at kli.org (Mark E. Shoulson)
Date: Wed, 23 Nov 2016 18:59:17 -0500
Subject: Manatee emoji?
In-Reply-To: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
Message-ID: <9b7010bf-ccf9-69a3-4d06-64db9b8faad1@kli.org>

On 11/23/2016 10:15 AM, James Kass wrote:
> http://patch.com/florida/southtampa/petition-drive-aims-raise-manatee-awareness-adorable-way
>
> If enough people sign the petition, will Unicode add a manatee emoji?
> And, how about wolverines and lemmings?  Are any petitions underway
> for them?  How many signatures on a petition would be needed before
> Unicode would consider adding a non-existent character to the
> repertoire?
Aren't many emoji "non-existent[sic]" characters prior to their adoption?

~mark

From Shawn.Steele at microsoft.com  Wed Nov 23 18:13:07 2016
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Thu, 24 Nov 2016 00:13:07 +0000
Subject: Manatee emoji?
In-Reply-To: <9b7010bf-ccf9-69a3-4d06-64db9b8faad1@kli.org>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
 <9b7010bf-ccf9-69a3-4d06-64db9b8faad1@kli.org>
Message-ID: <MWHPR03MB2813899CB9FCE65D76270DDA82B60@MWHPR03MB2813.namprd03.prod.outlook.com>

Short answer: not really :)

Most emoji (particularly the initial batch) were used in other contexts before Unicode.  Most notably, the Japanese mobile telephone companies added them.  The carriers also differentiated themselves by the types and features of the characters they supported.  That led to incompatibilities between the companies and an incentive to standardize them in Unicode.

Since then, there are other systems that provide emoji outside of Unicode or other mechanisms.  (Like gifs or special codes in messaging software).  So they keep evolving.  

I'm probably skipping other ways these shapes get created that evolve into Unicode emoji.

-Shawn


From zelpahd at gmail.com  Thu Nov 24 02:23:51 2016
From: zelpahd at gmail.com (zelpa)
Date: Thu, 24 Nov 2016 19:23:51 +1100
Subject: Manatee emoji?
In-Reply-To: <ACF7B20B-688A-408D-BEDB-FC2853568D3E@crissov.de>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
 <ACF7B20B-688A-408D-BEDB-FC2853568D3E@crissov.de>
Message-ID: <CAFcYzHkhM4-gLyMx-N_58OAmEiErXNoiVePzx0601RBsjWaNeQ@mail.gmail.com>

On Thu, Nov 24, 2016 at 8:30 AM, Christoph Päper <
christoph.paeper at crissov.de> wrote:

> James Kass <jameskasskrv at gmail.com>:
> >
> > And, how about [other emoji]?  Are any petitions underway for them?
>
> For what it's worth, several weeks ago (before UTC149), I collected all
> emoji petitions I could find online (and that were in languages I can at
> least somewhat decipher). I'm excluding anything moot added in or before
> Unicode 9.0 and Emoji 4.0, but am including current candidate emoji in the
> list below (Markdown format). In some cases, I think, it's at least as
> valuable to see how many people are proposing some emoji character
> independently as how many co-sign a single public petition.
>
> ### Finger Gun Emoji
>
> Hand with Thumb and Index Finger Extended, Pointing Sidewards
>
> - [Finger gun](https://www.change.org/p/all-of-those-who-support-awkward-finger-guns-as-answers-to-all-questions-there-needs-to-be-a-finger-guns-emoji)
> - [Finger gun](https://www.change.org/p/skype-make-a-finger-guns-emoji-on-skype)
> - [Finger gun](http://www.ipetitions.com/petition/we-need-a-finger-guns-emoji)
> - [Finger gun](http://www.petitions24.com/apple_give_us_a_finger_gun_emoji)
>

 Wow I don't think I realised how much I've wanted a finger gun emoji until
this point. Might consider writing up a proper proposal for it.

From wjgo_10009 at btinternet.com  Thu Nov 24 05:59:20 2016
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 24 Nov 2016 11:59:20 +0000 (GMT)
Subject: Manatee emoji?
In-Reply-To: <CAJ6uix6nUu2AcR5baGbRk3igbJFsSjFu1uVbKoSB_hzAP5fNOQ@mail.gmail.com>
References: <CABPY6Z14neq2hNOHSAGYoqM_VsW2OeHASmBuKtM+ya4ZhAzSLg@mail.gmail.com>
 <CAJ6uix6nUu2AcR5baGbRk3igbJFsSjFu1uVbKoSB_hzAP5fNOQ@mail.gmail.com>
Message-ID: <6371660.22581.1479988760419.JavaMail.defaultUser@defaultHost>

Leonardo Boiko wrote:

> I support the creation of manatee emoji, but only if it's accompanied
> by a new modifier for emoji size, coming in the varieties: TINY,
> SMALL, LARGE, HUGE.

> This would allow us to say "oh, the [HUGE MANATEE]" in emoji.

I have produced some designs for tiny, small, large and huge and also for medium size.

The designs and some notes about how I produced them are in the following web page.

http://www.users.globalnet.co.uk/~ngo/abstract_emoji.htm

The web page is listed in the form of a diary and the designs are within some text headed Thursday 24 November 2016.

I have attached the designs to this post as well so that they will be conserved in the archive.

The design with the most purple in the upper right quadrant is for the adjective huge and the design with the most purple in the lower right quadrant is for the adjective tiny.

In use, the emoji of the adjective would be after the emoji of the noun in a piece of text.

William Overington

Thursday 24 November 2016

-------------- next part --------------
A non-text attachment was scrubbed...
Name: emoji_tiny.png
Type: image/png
Size: 3032 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161124/23cbbf1a/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emoji_small.png
Type: image/png
Size: 3029 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161124/23cbbf1a/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emoji_medium_size.png
Type: image/png
Size: 3064 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161124/23cbbf1a/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emoji_large.png
Type: image/png
Size: 3065 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161124/23cbbf1a/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emoji_huge.png
Type: image/png
Size: 3094 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161124/23cbbf1a/attachment-0004.png>

From richard.wordingham at ntlworld.com  Thu Nov 24 23:49:51 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 25 Nov 2016 05:49:51 +0000
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org>
 <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
Message-ID: <20161125054951.6a4825b6@JRWUBU2>

On Tue, 22 Nov 2016 02:47:10 +0100
Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> Look at where the Asian quotes are partially "moved" by the ASCII
> quotes in Chrome.

I presume this is referring to the attached file 00000007.fhmbobjniphfamjk.png.  There are two problems with using this example.
(1) The closing curved quote U+201D appears to have gone missing.
(2) The paragraph is a LTR paragraph.

Remember that the overall directionality of a paragraph can be determined by a "higher level protocol" rather than the content.  In the text shown in the attachment, a higher level protocol is specifying LTR - the leftmost text is 'ARABIC-ONE'.
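As a rough illustration of that point, the sketch below picks a paragraph direction from the first strong character, in the spirit of rules P2 and P3 of UAX #9, unless the caller supplies a direction of its own. The function name and the `override` parameter are hypothetical stand-ins for whatever a higher-level protocol (for example an HTML `dir` attribute) provides, and the handling of isolates is deliberately left out.

```python
import unicodedata


def paragraph_direction(text, override=None):
    """Return 'ltr' or 'rtl' for a paragraph.

    `override` models a higher-level protocol: when it is given, the
    content of the paragraph is not consulted at all.
    """
    if override in ("ltr", "rtl"):
        return override
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # first strong character is left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):    # first strong character is right-to-left
            return "rtl"
    return "ltr"                         # no strong character: default to LTR


if __name__ == "__main__":
    arabic_first = "\u0628\u0629 ARABIC-ONE"
    print(paragraph_direction(arabic_first))                  # decided by content
    print(paragraph_direction(arabic_first, override="ltr"))  # decided by the protocol
```

With an override in place, the same text is laid out in an LTR paragraph even though its first strong character is Arabic, which is the situation described for the attachment.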

Richard.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 00000007.fhmbobjniphfamjk.png
Type: image/png
Size: 1707 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20161125/01b2045b/attachment.png>

From jsbien at mimuw.edu.pl  Fri Nov 25 08:38:44 2016
From: jsbien at mimuw.edu.pl (Janusz S. =?utf-8?Q?Bie=C5=84?=)
Date: Fri, 25 Nov 2016 15:38:44 +0100
Subject: The usage of Z WITH STROKE
Message-ID: <86wpfrzksr.fsf@mimuw.edu.pl>


Hi!

There are two comments to the character(s) in the U0180 chart:

1. Pan-Turkic Latin orthography
2. handwritten variant of Latin "z"

Ad 1.

Do I understand correctly that the Pan-Turkic Latin orthography
refers to the initiative described in the post to the Linguist list:

https://linguistlist.org/issues/4/4-187.html

If so, where to find more information about it? I found already another
post to the Linguist list

https://linguistlist.org/issues/5/5-739.html

but it contains only very general information.

Ad 2.

I'm curious how widespread, in time and space, is/was this
convention. Can you suggest to me where to search for this information?

Best regards

Janusz


-- 
                           ,   
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


From jknappen at web.de  Fri Nov 25 09:05:50 2016
From: jknappen at web.de (=?UTF-8?Q?=22J=C3=B6rg_Knappen=22?=)
Date: Fri, 25 Nov 2016 16:05:50 +0100
Subject: Aw: The usage of Z WITH STROKE
In-Reply-To: <86wpfrzksr.fsf@mimuw.edu.pl>
References: <86wpfrzksr.fsf@mimuw.edu.pl>
Message-ID: <trinity-e5285e6a-0373-455e-8358-6c6f73717140-1480086350326@3capp-webde-bs25>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20161125/947dd1a2/attachment.html>

From frederic.grosshans at gmail.com  Fri Nov 25 10:18:37 2016
From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=)
Date: Fri, 25 Nov 2016 17:18:37 +0100
Subject: The usage of Z WITH STROKE
In-Reply-To: <86wpfrzksr.fsf@mimuw.edu.pl>
References: <86wpfrzksr.fsf@mimuw.edu.pl>
Message-ID: <b1d6ca6c-012a-de8b-f869-829ef38ffe31@gmail.com>

On 25/11/2016 at 15:38, Janusz S. Bień wrote:
> Hi!
>
> There are two comments to the character(s) in the U0180 chart:
>
> 1. Pan-Turkic Latin orthography
> 2. handwritten variant of Latin "z"
>
> Ad 1.
>
> Do I understand correctly that the Pan-Turkic Latin orthography
> refers to the initiative described in the post to the Linguist list:
>
> https://linguistlist.org/issues/4/4-187.html
>
> If so, where to find more information about it? I found already another
> post to the Linguist list
>
> https://linguistlist.org/issues/5/5-739.html
>
> but it contains only very general information.
The use of Latin (vs. Arabic or Cyrillic) alphabets in Turkic languages
was a heavily political subject throughout the 20th century. You can
find a lot of information on the pre-1991 situation in Mark Dickens's
article "Soviet Language Policy in Central Asia",
http://www.oxuscom.com/lang-policy.htm#alphabet . The end of the USSR in
1991 was the occasion for new reforms, but some were cancelled, as for
Tatar, since the only official alphabet allowed in Russia is Cyrillic
(see https://en.wikipedia.org/wiki/Tatar_alphabet).

However, the modern (1990s) Turkic alphabets do not contain ƶ
https://en.wikipedia.org/wiki/Common_Turkic_Alphabet . It was used for
what is now written with j in the 1930s USSR's uniform Turkic alphabet,
aka Jaꞑalif https://en.wikipedia.org/wiki/Ya%C3%B1alif.
The Wikipedia pages on the Azerbaijani, Turkmen, Crimean Tatar and Uzbek
alphabets mention this historical use:
https://en.wikipedia.org/wiki/Azerbaijani_alphabet ,
https://en.wikipedia.org/wiki/Turkmen_alphabet ,
https://en.wikipedia.org/wiki/Crimean_Tatar_alphabet ,
https://en.wikipedia.org/wiki/Uzbek_alphabet .

This letter was also used for other orthographies: the 1931–41 Latin
Mongolian orthography
(https://en.wikipedia.org/wiki/Mongolian_Latin_alphabet), and a 1992
Latin orthography used by secessionist Chechens.
>
> Ad 2.
>
> I'm curious how widespread, in time and space, is/was this
> convention. Can you suggest to me where to search for this information?
I was told in elementary (French) school to write Z this way. I guess
you should look at elementary schoolbooks for various languages or,
since it's a handwriting convention, at references on calligraphy
and/or paleography.


From verdy_p at wanadoo.fr  Fri Nov 25 10:35:53 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 25 Nov 2016 17:35:53 +0100
Subject: The usage of Z WITH STROKE
In-Reply-To: <trinity-e5285e6a-0373-455e-8358-6c6f73717140-1480086350326@3capp-webde-bs25>
References: <86wpfrzksr.fsf@mimuw.edu.pl>
 <trinity-e5285e6a-0373-455e-8358-6c6f73717140-1480086350326@3capp-webde-bs25>
Message-ID: <CAGa7JC3oeVyMtXzQvhdK78+NpL-U3vbVsD4cNFzWBS2zNwvO6w@mail.gmail.com>

And the cursive form of uppercase Z also has a stroke to distinguish it
from the cursive form of uppercase L... So this is not just for maths.


2016-11-25 16:05 GMT+01:00 "Jörg Knappen" <jknappen at web.de>:

> Some anecdotal evidence:
>
> I was taught by my math teacher (Germany, 1970s) to stroke all  z's (upper
> or lowercase) in order to
> distinguish them from the digit "2"
>

From verdy_p at wanadoo.fr  Fri Nov 25 16:34:07 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 25 Nov 2016 23:34:07 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <20161125054951.6a4825b6@JRWUBU2>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
 <20161125054951.6a4825b6@JRWUBU2>
Message-ID: <CAGa7JC0WRVgYh-cJoJEDdsrQRGuN0X0c-c1P5DzoaA_27U322Q@mail.gmail.com>

Initially my thread was really about Japanese in Arabic documents (or
Arabic paragraph), where Asian quotation marks were swapped (but not
mirrored), and where other Arabic contents had their own quotation marks
misplaced. The result was unreadable, including pairs of Arabic quotes with
empty content. The Japanese citation was broken, as well as the overall
Arabic one containing it.

And once again you're testing it in Firefox (which apparently uses its own
higher-level protocol): I said the problem occurred in Chrome (which apparently
still does not use the updated Bidi algorithm).

This also raises a question about Asian quotes, which are not mirrorable but
are still swapped by Bidi! If they are not mirrorable, they should have a
strong LTR direction (like other kana or kanji characters).


2016-11-25 6:49 GMT+01:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> On Tue, 22 Nov 2016 02:47:10 +0100
> Philippe Verdy <verdy_p at wanadoo.fr> wrote:
>
> > Look at where the Asian quotes are partially "moved" by the ASCII
> > quotes in Chrome.
>
> I presume this is referring to the attached file
> 00000007.fhmbobjniphfamjk.png.  There are two problems with using this
> example.
> (1) The closing curved quote U+201D appears to have gone missing.
> (2) The paragraph is a LTR paragraph.
>
> Remember that the overall directionality of a paragraph can be determined
> by a "higher level protocol" rather than the content.  In the text shown in
> the attachment, a higher level protocol is specifying LTR - the leftmost
> text is 'ARABIC-ONE'.
>
> Richard.

From jsbien at mimuw.edu.pl  Sat Nov 26 00:20:55 2016
From: jsbien at mimuw.edu.pl (Janusz S. =?utf-8?Q?Bie=C5=84?=)
Date: Sat, 26 Nov 2016 07:20:55 +0100
Subject: The usage of Z WITH STROKE
In-Reply-To: <86wpfrzksr.fsf@mimuw.edu.pl> ("Janusz S. =?utf-8?Q?Bie=C5=84?=
 =?utf-8?Q?=22's?= message of "Fri, 25 Nov 2016 15:38:44 +0100")
References: <86wpfrzksr.fsf@mimuw.edu.pl>
Message-ID: <86polizrqw.fsf@mimuw.edu.pl>


Thanks for all the interesting answers. I will focus now on my first
question.

On Fri, Nov 25 2016 at 15:38 CET, jsbien at mimuw.edu.pl writes:
> Hi!
>
> There are two comments to the character(s) in the U0180 chart:
>
> 1. Pan-Turkic Latin orthography
> 2. handwritten variant of Latin "z"
>
> Ad 1.
>
> Do I understand correctly that the Pan-Turkic Latin orthography
> refers to the initiative described in the post to the Linguist list:
>
> https://linguistlist.org/issues/4/4-187.html

[...]

The initiative dates from March 1993, yet the character already appeared in
Unicode 1.1.0 in June 1993. Do you think it is possible and/or probable
that the comment refers to that very initiative?

On Fri, Nov 25 2016 at 16:05 CET, jknappen at web.de writes:

[...]

> P.S. As far as pan-Turkic orthography is concerned, there were also a lot
> of pan-Turkic Latin alphabets in the revolutionary
> Soviet Union (1920s) before Cyrillic alphabets were introduced in the
> Stalin era.
> P.P.S. You are certainly aware of this article:
> https://en.wikipedia.org/wiki/Z_with_stroke

On Fri, Nov 25 2016 at 17:18 CET, frederic.grosshans at gmail.com writes:

> The use of Latin (vs. Arabic or Cyrillic) alphabets in Turkic
> languages was a heavily political subject throughout the 20th
> century. You can find a lot of information on the pre-1991 situation
> in Mark Dickens's article "Soviet Language Policy in Central Asia",
> http://www.oxuscom.com/lang-policy.htm#alphabet . The end of the USSR in
> 1991 was the occasion for new reforms, but some were cancelled, as for
> Tatar, since the only official alphabet allowed in Russia is Cyrillic
> (see https://en.wikipedia.org/wiki/Tatar_alphabet).
>
> However, the modern (1990s) Turkic alphabets do not contain ƶ
> https://en.wikipedia.org/wiki/Common_Turkic_Alphabet . It was used for
> what is now written with j in the 1930s USSR's uniform Turkic
> alphabet, aka Jaꞑalif https://en.wikipedia.org/wiki/Ya%C3%B1alif.
> The Wikipedia pages on the Azerbaijani, Turkmen, Crimean Tatar and Uzbek
> alphabets mention this historical use:
> https://en.wikipedia.org/wiki/Azerbaijani_alphabet ,
> https://en.wikipedia.org/wiki/Turkmen_alphabet ,
> https://en.wikipedia.org/wiki/Crimean_Tatar_alphabet ,
> https://en.wikipedia.org/wiki/Uzbek_alphabet .
>
> This letter was also used for other orthographies: the 1931–41 Latin
> Mongolian orthography
> (https://en.wikipedia.org/wiki/Mongolian_Latin_alphabet), and a 1992
> Latin orthography used by secessionist Chechens.

Thanks for all the information and the links (I was familiar with some
of them, but not all).

Now there is a follow-up question: why was the character included in
Unicode 1.1.0?  And there are also two other related questions:

1. Is there an easy way to check whether the character already existed
in pre-Unicode character sets? I'm aware of a difficult way,
i.e. browsing the International Register of Coded Character Sets to be Used
with Escape Sequences.

2. Which character codes were included in the Unicode round-trip test?
Was the list ever published somewhere? Files containing mappings from
some legacy codes to Unicode used to be available, but I can't
find them now. Perhaps the mappings were prepared just for the
round-trip codes?
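
One partial shortcut for question 1, offered only as a sketch: try to encode the character with the legacy codecs that ship with a programming language. The Python example below assumes a small, hand-picked list of codec names (`LEGACY_CODECS` is a hypothetical selection, not a complete inventory). Python's codec tables cover only a fraction of the registered coded character sets, so an empty result proves nothing about the sets that are missing.

```python
# Hypothetical candidate list: a handful of legacy codecs bundled with Python.
LEGACY_CODECS = [
    "latin_1", "iso8859_2", "iso8859_3", "iso8859_9",
    "cp1250", "cp1254", "cp857", "mac_turkish", "koi8_r",
]


def codecs_containing(ch, codec_names=LEGACY_CODECS):
    """Return the codecs from `codec_names` that can encode `ch`."""
    found = []
    for name in codec_names:
        try:
            ch.encode(name)
        except UnicodeEncodeError:
            continue  # the character has no code in this legacy set
        found.append(name)
    return found


if __name__ == "__main__":
    # U+01B5 LATIN CAPITAL LETTER Z WITH STROKE, the character in question
    print(codecs_containing("\u01B5"))
    # U+015F LATIN SMALL LETTER S WITH CEDILLA, for comparison
    print(codecs_containing("\u015F"))
```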

Best regards

Janusz

-- 
                           ,   
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


From eliz at gnu.org  Sat Nov 26 01:10:14 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 26 Nov 2016 09:10:14 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC0WRVgYh-cJoJEDdsrQRGuN0X0c-c1P5DzoaA_27U322Q@mail.gmail.com>
 (message from Philippe Verdy on Fri, 25 Nov 2016 23:34:07 +0100)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
 <20161125054951.6a4825b6@JRWUBU2>
 <CAGa7JC0WRVgYh-cJoJEDdsrQRGuN0X0c-c1P5DzoaA_27U322Q@mail.gmail.com>
Message-ID: <83poli3eeh.fsf@gnu.org>

> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Fri, 25 Nov 2016 23:34:07 +0100
> Cc: unicode Unicode Discussion <unicode at unicode.org>
> 
> This also brings a question about Asian quotes, that are not mirrorable but still swapped by Bidi ! If they are
> not mirrorable, they should have a strong LTR direction (like other kana or kanji characters).

That's not how this stuff works in RTL locales.  It works by changing
the character produced by the keyboard keys assigned to these
characters, when the keyboard is configured for an RTL language.
E.g., a key labeled 「 should produce 」 when the current language is
RTL.  That's how this works with mirrored characters as well, because
when you type in an RTL language, you will press ) when you want an
opening parenthesis, since that's what you expect to see on display.

With a suitably configured keyboard (or input method, for that
matter), the problem you mention doesn't exist, and therefore there's
no relation between whether characters are swapped and whether they
are mirrored.

From verdy_p at wanadoo.fr  Sat Nov 26 02:25:16 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 26 Nov 2016 09:25:16 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <83poli3eeh.fsf@gnu.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
 <20161125054951.6a4825b6@JRWUBU2>
 <CAGa7JC0WRVgYh-cJoJEDdsrQRGuN0X0c-c1P5DzoaA_27U322Q@mail.gmail.com>
 <83poli3eeh.fsf@gnu.org>
Message-ID: <CAGa7JC0Pbkv+D-SU+oMOCfqdr5_StVAZY0FhVDdxcTwwE96WfQ@mail.gmail.com>

No, I was speaking at the encoding level. Even if your Arabic keyboard
displays a ")", and you type it, it will output/encode an open parenthesis
"(", that will then be mirrored to display a ")" glyph, matching your key
input.
The Bidi algorithm will still render it RTL (i.e. it will reorder it/"swap
it" so that it will render to the right of Arabic characters entered after
it). That encoded open parenthesis character is then both reordered and
rendered mirrored.
However, Asian parentheses in this context are also reordered...
but not mirrored, when in fact they should be treated as strong LTR, and not
reordered (and not mirrored at all).

For Asian parentheses this is less of a problem (you do not see the difference
if the two parentheses are already symmetric) than with Asian square-angle
quotation marks, where the absence of mirroring when swapping them
is evidently wrong: they are still reordered ("swapped" visually)
as if they were Bidi-neutral, but as they are not symmetric and not
mirrored, they are oriented the wrong way.
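
For reference, the property values at issue can be looked up rather than inferred from a rendering. The snippet below is a minimal sketch using Python's standard `unicodedata` module; the helper name `bidi_properties` and the sample list are illustrative choices, and the values reported are those of the Unicode Character Database version bundled with the interpreter.

```python
import unicodedata

# Parentheses, CJK corner brackets, a hiragana letter and an Arabic letter.
SAMPLES = ["\u0028", "\u0029", "\u300C", "\u300D", "\u3042", "\u0628"]


def bidi_properties(ch):
    """Return (bidi class, Bidi_Mirrored flag) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


for ch in SAMPLES:
    bidi_class, mirrored = bidi_properties(ch)
    # Print code points and names only, so the output stays plain ASCII.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"class={bidi_class} mirrored={mirrored}")
```

A class of `L`, `R` or `AL` marks a strong character, while `ON` marks a neutral one whose direction is resolved from its surroundings.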

2016-11-26 8:10 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:

> > From: Philippe Verdy <verdy_p at wanadoo.fr>
> > Date: Fri, 25 Nov 2016 23:34:07 +0100
> > Cc: unicode Unicode Discussion <unicode at unicode.org>
> >
> > This also brings a question about Asian quotes, that are not mirrorable
> but still swapped by Bidi ! If they are
> > not mirrorable, they should have a strong LTR direction (like other kana
> or kanji characters).
>
> That's not how this stuff works in RTL locales.  It works by changing
> the character produced by the keyboard keys assigned to these
> characters, when the keyboard is configured for an RTL language.
> E.g., a key labeled 「 should produce 」 when the current language is
> RTL.  That's how this works with mirrored characters as well, because
> when you type in an RTL language, you will press ) when you want an
> opening parenthesis, since that's what you expect to see on display.
>
> With a suitably configured keyboard (or input method, for that
> matter), the problem you mention doesn't exist, and therefore there's
> no relation between whether characters are swapped and whether they
> are mirrored.
>

From eliz at gnu.org  Sat Nov 26 02:57:29 2016
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 26 Nov 2016 10:57:29 +0200
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC0Pbkv+D-SU+oMOCfqdr5_StVAZY0FhVDdxcTwwE96WfQ@mail.gmail.com>
 (message from Philippe Verdy on Sat, 26 Nov 2016 09:25:16 +0100)
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <7be20863-8c28-221f-d240-0cc5e9531352@simon-cozens.org>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
 <20161125054951.6a4825b6@JRWUBU2>
 <CAGa7JC0WRVgYh-cJoJEDdsrQRGuN0X0c-c1P5DzoaA_27U322Q@mail.gmail.com>
 <83poli3eeh.fsf@gnu.org>
 <CAGa7JC0Pbkv+D-SU+oMOCfqdr5_StVAZY0FhVDdxcTwwE96WfQ@mail.gmail.com>
Message-ID: <837f7q39fq.fsf@gnu.org>

> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Sat, 26 Nov 2016 09:25:16 +0100
> Cc: Richard Wordingham <richard.wordingham at ntlworld.com>, 
> 	unicode Unicode Discussion <unicode at unicode.org>
> 
> No, I was speaking at the encoding level. Even if your Arabic keyboard displays a ")", and you type it, it will
> output/encode an open parenthesis "(", that will then be mirrored to display a ")" glyph, matching your key
> input.

Yes.

> The Bidi algorithm will still render it RTL (i.e. it will reorder it/"swap it" so that it will render to the right of Arabic
> characters entered after it. That encoded open parenthesis character is then both reordered and rendered
> mirrored.
> However with Asian parentheses in this context, they are also reordered... but not mirrored when in fact they
> should be treated as strong LTR, and not reordered (and not mirrored at all)

You were originally talking about quotes, not parentheses.  Which one
is it?  I responded to the quotes issue.

> For Asian parentheses this is less a problem (you do not see the difference if the two parentheses are already
> symmetric) than with Asian square-angle quotation marks: the effect of the absence of mirroring when
> swapping them becomes evidently wrong: but they are still reordered ("swapped" visually) as if they were
> Bidi-neutral, but as they are not symmetric and not mirrored, they are oriented the wrong way.

They will be effectively "mirrored" by the keyboard, as I described.

From richard.wordingham at ntlworld.com  Sun Nov 27 08:09:17 2016
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 27 Nov 2016 14:09:17 +0000
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <837f7q39fq.fsf@gnu.org>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org>
 <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
 <20161125054951.6a4825b6@JRWUBU2>
 <CAGa7JC0WRVgYh-cJoJEDdsrQRGuN0X0c-c1P5DzoaA_27U322Q@mail.gmail.com>
 <83poli3eeh.fsf@gnu.org>
 <CAGa7JC0Pbkv+D-SU+oMOCfqdr5_StVAZY0FhVDdxcTwwE96WfQ@mail.gmail.com>
 <837f7q39fq.fsf@gnu.org>
Message-ID: <20161127140917.5cee547d@JRWUBU2>

On Sat, 26 Nov 2016 10:57:29 +0200
Eli Zaretskii <eliz at gnu.org> wrote:

> > From: Philippe Verdy <verdy_p at wanadoo.fr>
> > Date: Sat, 26 Nov 2016 09:25:16 +0100

> > For Asian parentheses this is less of a problem (you do not see the
> > difference if the two parentheses are already symmetric) than with
> > Asian square-angle quotation marks: the effect of the absence of
> > mirroring when swapping them becomes evidently wrong: they are
> > still reordered ("swapped" visually) as if they were Bidi-neutral,
> > but as they are not symmetric and not mirrored, they are oriented
> > the wrong way.

They (U+300C LEFT CORNER BRACKET and U+300D RIGHT CORNER BRACKET)
are bidi-neutral (bidi class ON) and have bidi-mirroring, as you should
see from the nonsense string

ب「ةذظظ」ة

(0628 300C 0629 0630 0638 0638 300D 0629), whichever the paragraph-level
embedding.  Whether a top-left corner (「) should be mirrored to a
bottom-right corner (」) is a matter of taste, which will probably not
bother those who think that bidi-mirroring is a matter of character
substitution.  They are listed as a pair in both BidiBrackets.txt and
BidiMirroring.txt.
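
The properties claimed above can be checked against whatever version of the Unicode Character Database a given Python build ships with. The following is only a minimal sketch: the helper name `describe` is made up for illustration, and the standard `unicodedata` module exposes the bidi class and the Bidi_Mirrored flag but not the pairings listed in BidiBrackets.txt or BidiMirroring.txt.

```python
import unicodedata

# Code points of the nonsense test string, in logical (storage) order.
CODE_POINTS = [0x0628, 0x300C, 0x0629, 0x0630, 0x0638, 0x0638, 0x300D, 0x0629]


def describe(cp):
    """Return (name, bidi class, Bidi_Mirrored flag) for one code point.

    Hypothetical helper for illustration; it only wraps unicodedata lookups.
    """
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )


for cp in CODE_POINTS:
    name, bidi_class, mirrored = describe(cp)
    # The corner brackets should report class ON with the mirrored flag set;
    # the Arabic letters should report class AL with the flag unset.
    print(f"U+{cp:04X} {bidi_class:<3} mirrored={mirrored} {name}")
```

This only confirms the character properties; whether a particular renderer actually applies rule L4 to these brackets is a separate question that has to be tested in that renderer.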

> They will be effectively "mirrored" by the keyboard, as I described.

Except that a visual keyboard for an RTL writing system is highly unlikely to
have U+300C and U+300D.

Richard.


From verdy_p at wanadoo.fr  Sun Nov 27 10:33:12 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 27 Nov 2016 17:33:12 +0100
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <20161127140917.5cee547d@JRWUBU2>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
 <20161125054951.6a4825b6@JRWUBU2>
 <CAGa7JC0WRVgYh-cJoJEDdsrQRGuN0X0c-c1P5DzoaA_27U322Q@mail.gmail.com>
 <83poli3eeh.fsf@gnu.org>
 <CAGa7JC0Pbkv+D-SU+oMOCfqdr5_StVAZY0FhVDdxcTwwE96WfQ@mail.gmail.com>
 <837f7q39fq.fsf@gnu.org> <20161127140917.5cee547d@JRWUBU2>
Message-ID: <CAGa7JC2RM-C0p4Xs-pODftbh4sqUv3L+M-9+-sdDFHAJKXY_DA@mail.gmail.com>

2016-11-27 15:09 GMT+01:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

>
> > They will be effectively "mirrored" by the keyboard, as I described.
>
> Except that a visual keyboard for an RTL writing system is highly unlikely
> to
> have U+300C and U+300D.
>

I spoke about multilingual documents where you'll mix Japanese into Arabic
(or the reverse).

The keyboard capability does not matter at all because a "keyboard for an
RTL writing system" will also not support any one of the characters needed
for a Japanese citation (this is not just these two punctuation characters).

From kojiishi at gmail.com  Mon Nov 28 07:45:36 2016
From: kojiishi at gmail.com (Koji Ishii)
Date: Mon, 28 Nov 2016 22:45:36 +0900
Subject: Bidi: inserting Japanese paragraphs in Arabic/Farsi document
In-Reply-To: <CAGa7JC2RM-C0p4Xs-pODftbh4sqUv3L+M-9+-sdDFHAJKXY_DA@mail.gmail.com>
References: <CAGa7JC1EWaEknjV8Q22rhUdGk9qmo_N2_rJFudi3CkYOhGRL+A@mail.gmail.com>
 <83shqm9nkw.fsf@gnu.org>
 <CAGa7JC29mag74Z-_fHYHmus3y5FEows-ZSk8BAF8_tVQaqkHnw@mail.gmail.com>
 <83h9729kfw.fsf@gnu.org>
 <CAGa7JC2Z0Ci8-TcaLNvBdaE6WZDK=z2=TYQwsS9QGefKP4K_QA@mail.gmail.com>
 <834m329fqf.fsf@gnu.org> <cb7ee3a9-7998-43f8-f550-39faecd379b8@ix.netcom.com>
 <CAGa7JC3H9WhA4uc4W-Onjs3-jvRUJeP7xOg+qmpfVAip_3aF9Q@mail.gmail.com>
 <e5fcbb35-66fc-763d-456c-dbaf089e5df8@ix.netcom.com>
 <CAGa7JC2dAOU9pmTFRdSfg=FrHp8ce_5M4f4G94XhJL2Z=L6uYQ@mail.gmail.com>
 <CAGa7JC2s5N0nph-ftNnaqZ0OBLfV=kGYCogLm4g7y=mhfcGpvw@mail.gmail.com>
 <31c61d2a-8911-9503-139b-9497137e2dff@ix.netcom.com>
 <CAGa7JC2NOMafm-9A+LShFGwvutSbrdhz+M6DaDaD-=XpEDi98g@mail.gmail.com>
 <20161125054951.6a4825b6@JRWUBU2>
 <CAGa7JC0WRVgYh-cJoJEDdsrQRGuN0X0c-c1P5DzoaA_27U322Q@mail.gmail.com>
 <83poli3eeh.fsf@gnu.org>
 <CAGa7JC0Pbkv+D-SU+oMOCfqdr5_StVAZY0FhVDdxcTwwE96WfQ@mail.gmail.com>
 <837f7q39fq.fsf@gnu.org> <20161127140917.5cee547d@JRWUBU2>
 <CAGa7JC2RM-C0p4Xs-pODftbh4sqUv3L+M-9+-sdDFHAJKXY_DA@mail.gmail.com>
Message-ID: <CAN9ydbVFG+4MTLhoWt2Qox1C471VsBbTr3h-7+wFgFsfW6YOUw@mail.gmail.com>

Hi, I work on Chrome. I have to acknowledge that our implementation of UBA
6.3 is not yet complete <http://crbug.com/242238>, nor is our support for
paired brackets <http://crbug.com/302469>. We're working on improving it.

It's not very clear to me whether this thread is discussing paired
brackets (BD14-16) or mirrored glyphs (L4). I tried to reproduce the issue,
but the steps and expectations are not very clear to me, since my
understanding of UBA is still not good enough. But as long as it's an
implementation issue, we're happy to investigate further. It'd be great if
you could provide HTML that reproduces it at <http://crbug.com/new>.

/koji

2016-11-28 1:33 GMT+09:00 Philippe Verdy <verdy_p at wanadoo.fr>:

>
> 2016-11-27 15:09 GMT+01:00 Richard Wordingham <
> richard.wordingham at ntlworld.com>:
>
>>
>> > They will be effectively "mirrored" by the keyboard, as I described.
>>
>> Except that a visual keyboard for an RTL writing system is highly
>> unlikely to
>> have U+300C and U+300D.
>>
>
> I spoke about multilingual documents where you'll mix Japanese into Arabic
> (or the reverse).
>
> The keyboard capability does not matter at all because a "keyboard for an
> RTL writing system" will also not support any one of the characters needed
> for a Japanese citation (this is not just these two punctuation characters).
>
>

From kenwhistler at att.net  Mon Nov 28 09:48:44 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Mon, 28 Nov 2016 07:48:44 -0800
Subject: The usage of Z WITH STROKE
In-Reply-To: <86polizrqw.fsf@mimuw.edu.pl>
References: <86wpfrzksr.fsf@mimuw.edu.pl> <86polizrqw.fsf@mimuw.edu.pl>
Message-ID: <c3ac59b0-1caa-0d58-cca5-92f7d99ecb59@att.net>


On 11/25/2016 10:20 PM, Janusz S. Bień wrote:
> Now there is a follow-up question: why was the character included in
> Unicode 1.1.0?

Well, it was included in Unicode 1.1 because it was published in Unicode 
1.0 already. So that is the proximate reason.

That inevitably will raise the question, "Why was it included in Unicode 
1.0?"

Well, the proximate cause for that was the presence of z with stroke in 
the XCCS character set, which was the source for a lot of the early 
Unicode 1.0 repertoire. More precisely:

XCCS (= Xerox Character Code Standard) 1990 contained:

0x23 0x48 Azerbaijani capital letter Z
0x23 0x68 Azerbaijani small letter Z

So that also answers the next question, "Why was it included in XCCS?" 
Note that XCCS 1990 is the 2.0 version. The 1.0 version of XCCS was 
dated 1980. I don't have access to that one, so cannot tell for sure 
whether it contained the "character set 43 (octal)" content (i.e. the 0x23 ..
character block) or not.
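
As a small sanity check of the numbering, the octal character set label and the 0x23 lead byte are the same value, and the Unicode names of the resulting characters can be looked up directly. This is a sketch only: the function name `charset_label_to_byte` is made up for illustration, and the code points U+01B5 and U+01B6 come from the Unicode Character Database, not from the XCCS document.

```python
import unicodedata


def charset_label_to_byte(octal_label):
    """Convert an octal character set label such as "43" to its byte value.

    Hypothetical helper for illustration only.
    """
    return int(octal_label, 8)


# Octal 43 is the same value as the 0x23 lead byte in the XCCS codes above.
print(hex(charset_label_to_byte("43")))

# Unicode code points for Z WITH STROKE, per the Unicode Character Database.
for cp in (0x01B5, 0x01B6):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```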

At any rate, see here:

https://en.wikipedia.org/wiki/Azerbaijani_alphabet

The additions from the XCCS "character set 43 (octal)" included the schwa, 
the gha, and the z-stroke from the old Azerbaijani Latin alphabet, 
documented there as in use from 1929 until 1939. And from XCCS, all of 
them made it into Unicode 1.0.

So that should pretty definitively answer the origin question for z with 
stroke.

> And there are also two other related questions:
>
> 1. Is there an easy way to check whether the character existed already
> in pre-Unicode character sets? I'm aware about a difficult way,
> i.e. browsing International Register of Coded Character Sets to be Used
> with Escape Sequences.

The International Register is *not* a particularly fruitful source. Much 
more of the Unicode 1.0 material actually came from corporate sets, 
including, but not limited to XCCS and the large collection of IBM code 
pages.
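
A quick, partial check is possible with the legacy codecs bundled with a programming language. The sketch below uses Python's standard codecs and a made-up helper, `legacy_codecs_containing`; the codec list is an arbitrary sample chosen for illustration. Python does not ship an XCCS codec, so a negative result says nothing about XCCS or about any other set outside the list.

```python
def legacy_codecs_containing(ch, codec_names):
    """Return the names of the codecs in codec_names that can encode ch.

    Hypothetical helper for illustration; it covers only codecs that
    Python itself ships with.
    """
    found = []
    for name in codec_names:
        try:
            ch.encode(name)
        except UnicodeEncodeError:
            # The character has no code in this legacy set.
            continue
        found.append(name)
    return found


# An arbitrary sample of 8-bit legacy sets; extend as needed.
SAMPLE_CODECS = ["latin-1", "iso8859-2", "iso8859-9", "cp1250", "cp1252"]

# U+01B5 LATIN CAPITAL LETTER Z WITH STROKE
print(legacy_codecs_containing("\u01b5", SAMPLE_CODECS))
# U+017C LATIN SMALL LETTER Z WITH DOT ABOVE, for comparison
print(legacy_codecs_containing("\u017c", SAMPLE_CODECS))
```

For sets that have no bundled codec, the published mapping tables remain the thing to consult.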

>
> 2. Which character codes were included in the Unicode round-trip test?
> Was the list ever published somewhere? The files containing mappings
> from some legacy codes to Unicode used to be available; I can't
> find them now. Perhaps the mappings were prepared just for the
> round-trip codes?

Currently maintained mappings (and some historic materials) are posted at:

http://www.unicode.org/Public/MAPPINGS/

For the really old mappings pertinent to the original decisions about 
inclusion in Unicode 1.0, the mapping data for East Asian were 
distributed on a 3.5" floppy diskette on request. Probably very hard to 
locate (or read) one of those now.

But you can refer to the *scanned* version of Chapter 6 of Unicode 1.0, 
which is available online. That was a printed copy of many of the 
cross-mapping tables to external standards. See:

http://www.unicode.org/versions/Unicode1.0.0/ch06.pdf

For the cross-mapping of the Unicode 1.0, Volume 2 unified CJK, that is 
also scanned and available online:

http://www.unicode.org/versions/Unicode1.0.0/HanCharts2.pdf

That table is known to have errors in it, so for CJK it should not be 
considered currently definitive in any meaningful way -- it is of 
historic interest.

--Ken


From asmusf at ix.netcom.com  Mon Nov 28 10:30:22 2016
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 28 Nov 2016 08:30:22 -0800
Subject: Manatee emoji?
In-Reply-To: <20161123124458.665a7a7059d7ee80bb4d670165c8327d.7ac8a1b9e0.wbe@email03.godaddy.com>
References: <20161123124458.665a7a7059d7ee80bb4d670165c8327d.7ac8a1b9e0.wbe@email03.godaddy.com>
Message-ID: <1f211e28-4dc2-02c5-bf8d-732197dfecf1@ix.netcom.com>

On 11/23/2016 11:44 AM, Doug Ewell wrote:
> Leonardo Boiko wrote:
>
>> I support the creation of manatee emoji, but only if it's accompanied
>> by a new modifier for emoji size, coming in the varieties: TINY,
>> SMALL, LARGE, HUGE.
>>
>> This would allow us to say "oh, the [HUGE MANATEE]" in emoji.
> Leonardo immediately wins the award for best sort-of-Unicode-related pun
> ever. Just retire the trophy now.
>
> But I am expecting a full array of modifiers and ZWJ sequences, to meet
> the user need for a female factory-worker manatee with dark skin and red
> hair, or families of manatees with arbitrary combinations of attributes.
>
>   
Manatee families are where it's at.

A./

From asmusf at ix.netcom.com  Mon Nov 28 10:32:55 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Mon, 28 Nov 2016 08:32:55 -0800
Subject: Manatee emoji?
In-Reply-To: <20161123124458.665a7a7059d7ee80bb4d670165c8327d.7ac8a1b9e0.wbe@email03.godaddy.com>
References: <20161123124458.665a7a7059d7ee80bb4d670165c8327d.7ac8a1b9e0.wbe@email03.godaddy.com>
Message-ID: <bd690eb5-c081-8f63-bf2d-62c64ca36ef9@ix.netcom.com>

On 11/23/2016 11:44 AM, Doug Ewell wrote:
> Leonardo Boiko wrote:
>
>> I support the creation of manatee emoji, but only if it's accompanied
>> by a new modifier for emoji size, coming in the varieties: TINY,
>> SMALL, LARGE, HUGE.
>>
>> This would allow us to say "oh, the [HUGE MANATEE]" in emoji.
> Leonardo immediately wins the award for best sort-of-Unicode-related pun
> ever. Just retire the trophy now.
>
> But I am expecting a full array of modifiers and ZWJ sequences, to meet
> the user need for a female factory-worker manatee with dark skin and red
> hair, or families of manatees with arbitrary combinations of attributes.
>
>
Manatee families are where it's at.

A./

PS: "experts agree that the manatee is more developed than any other 
marine mammal in the world" (from: 
http://www.manatee-world.com/manatee-social-structure/)

From jsbien at mimuw.edu.pl  Tue Nov 29 05:57:32 2016
From: jsbien at mimuw.edu.pl (Janusz S. Bień)
Date: Tue, 29 Nov 2016 12:57:32 +0100
Subject: The usage of Z WITH STROKE
In-Reply-To: <86wpfrzksr.fsf@mimuw.edu.pl> ("Janusz S. Bień"'s
 message of "Fri, 25 Nov 2016 15:38:44 +0100")
References: <86wpfrzksr.fsf@mimuw.edu.pl>
Message-ID: <86zikifqhf.fsf@mimuw.edu.pl>

On Fri, Nov 25 2016 at 15:38 CET, jsbien at mimuw.edu.pl writes:
> Hi!
>
> There are two comments to the character(s) in the U0180 chart:
>
> 1. Pan-Turkic Latin orthography

[...]

On Mon, Nov 28 2016 at 16:48 CET, kenwhistler at att.net writes:
> On 11/25/2016 10:20 PM, Janusz S. Bień wrote:
>
>     Now there is a follow-up question: why the character was included in
> Unicode 1.1.0?  

Thank you very much for the detailed answer!

[...]

> Well, the proximate cause for that was the presence of z with stroke
> in the XCCS character set, which was the source for a lot of the early
> Unicode 1.0 repertoire. More precisely:
>
> XCCS (= Xerox Character Code Standard) 1990 contained:
>
> 0x23 0x48 Azerbaijani capital letter Z
> 0x23 0x68 Azerbaijani small letter Z

[...]

> https://en.wikipedia.org/wiki/Azerbaijani_alphabet

So "Pan-Turkic Latin orthography" in the comment should be understood as
the Uniform Turkic Alphabet
(https://en.wikipedia.org/wiki/Common_Turkic_Alphabet#In_the_USSR)?

Best regards

Janusz


-- 
                           ,   
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/