From c933103 at gmail.com  Wed Jul  1 03:15:28 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Wed, 1 Jul 2015 16:15:28 +0800
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
Message-ID: <CAGHjPPJ3gGWt8bq4=xK17bK81V0U342Tt5qmhGDK_2+x687g9g@mail.gmail.com>

<http://unicode.org/announcements/flag-snippets.jpg>

The UTC is considering
a proposal to extend the types of flags which can be reliably represented
by certain sequences of Unicode characters. In addition to the current
mechanism using pairs of regional indicator symbols (already widely
implemented), the proposal would use sequences of the TAG characters in the
range U+E0030..U+E005A to represent other types of flags. The proposal also
provides guidelines to specify valid sequences of TAG characters and how to
interpret them. Full details of the proposal are provided in the background
document <http://www.unicode.org/review/pri299/pri299-additional-flags-background.html>.
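The TAG characters in that range mirror ASCII: each one sits at U+E0000 plus the code point of the corresponding ASCII digit or capital letter (TAG DIGIT ZERO is U+E0030, TAG LATIN CAPITAL LETTER A is U+E0041). A minimal Python sketch of that mapping follows. The helper name `to_tag_chars` is hypothetical, it accepts only digits and capital letters, and it deliberately leaves out the base flag character and any other parts of the sequence, whose exact form is defined in the PRI background document, not here.

```python
import unicodedata

TAG_OFFSET = 0xE0000  # each tag character sits at U+E0000 + its ASCII code point


def to_tag_chars(code: str) -> str:
    """Map ASCII digits and capital letters to TAG characters (U+E0030..U+E005A).

    Hypothetical helper for illustration only: it does not add the base
    flag character or any other part of the sequence defined in the PRI.
    """
    out = []
    for ch in code:
        if not ("0" <= ch <= "9" or "A" <= ch <= "Z"):
            raise ValueError(f"{ch!r} is not an ASCII digit or capital letter")
        out.append(chr(TAG_OFFSET + ord(ch)))
    return "".join(out)


if __name__ == "__main__":
    # List the tag characters that would spell the subdivision code CN54.
    for tag in to_tag_chars("CN54"):
        print(f"U+{ord(tag):04X} {unicodedata.name(tag)}")
```

For "CN54" this lists TAG LATIN CAPITAL LETTER C, TAG LATIN CAPITAL LETTER N, TAG DIGIT FIVE and TAG DIGIT FOUR.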

The UTC welcomes feedback on this proposed new mechanism. Feedback could
consist of an indication of support or opposition to the proposal, with
reasons why, or could consist of suggestions for improvement of the
proposal.

For further information, please see the Public Review Issues
<http://www.unicode.org/review/> page.
http://blog.unicode.org/2015/06/representing-additional-types-of-flags.html

From charupdate at orange.fr  Wed Jul  1 03:47:55 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 1 Jul 2015 10:47:55 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <20150630223305.67b8da0f@JRWUBU2>
References: <1851400009.9981.1435420121813.JavaMail.www@wwinf1d10>
 <20150630074746.79ff7cf7@JRWUBU2>
 <1430770470.10024.1435656344025.JavaMail.www@wwinf1m18>
 <20150630223305.67b8da0f@JRWUBU2>
Message-ID: <1398757226.8479.1435740475172.JavaMail.www@wwinf1m18>

On Tue, Jun 30, 2015, Richard Wordingham  wrote:

> On Tue, 30 Jun 2015 11:25:43 +0200 (CEST)
> Marcel Schneider  wrote:
> 
> > On Mon, Jun 30, 2015, Richard Wordingham  wrote:
> 
> > I tested on Microsoft Word 2010 Starter running on Windows 7 Starter,
> > on a netbook. This software being based on the full versions, the
> > interpretation of U+FEFF must be the standard behavior. I tested in
> > Latin script. You may wish to redo the tests, so please open a new
> > document, input two words, replace the blank with whatever character
> > the word boundaries behavior is to be checked of, and search for one
> > of the two words with the 'whole word' option enabled. If the result
> > is none, the test character indicates the absence of word boundaries;
> > if there is a result, the test character indicates the presence of
> > word boundaries.

Already yesterday (Tue, Jun 30, 2015), I wondered how my text could have been altered with needlessly removed and added line breaks.
Now I would like everybody to note that, at least on this public list, I have *never* quoted anybody this way:
> At some time in June 2015, Richard Wordingham wrote:

This is why, to start this reply, I replaced that line with the accurate one, which can be checked at http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0279.html (except for the e-mail address, which the list software removes when archiving, as it will here again):

On Tue, Jun 30, 2015, Richard Wordingham  wrote:
_______

> I did my own tests in word 2010 with Windows 7. Although U+FEFF and
> U+2060 displayed differently when I enabled the display of
> 'non-printing' characters (spaces, inactive soft hyphens, non-breaking
> hyphens, paragraph ends etc.), the behaved the same when embedded in
> French l'eau and Thai ?? - they changed each word to two words, as
> detected by ctrl/rt-arrow. However, this is wrong. 

At the same time, Doug Ewell (to whom I'll reply soon, as well as to Khaled Hosny) was writing exactly what I see on my display: a .notdef box. For everyday display, I have personally enabled paragraph marks, manual line breaks, tab characters, and text boundaries. (Unfortunately I cannot enable the display of style separators separately; to see them, I must enable all, as Richard did for his test.)

Ctrl + RIGHT steps over APOSTROPHEs and in-word single closing quotes, and therefore cannot be used to detect word boundaries.
Perhaps you might consider running the test as I did. It goes as follows:

1 Open a new document.
2 Input two words with a blank between them.
3 Replace the blank with the character whose word-boundary behavior is to be checked.
4 Search for one of the two words with the 'whole word' option enabled.
  - If the result is 'No instance found', the test character indicates the absence of word boundaries.
  - If the result is 'One instance found', the test character indicates the presence of word boundaries.

This way, you will be told by Microsoft Word that the word 'eau' is found, because you used U+0027. The result is the same with U+2019. Only when you use U+02BC is U+006C U+02BC U+0065 U+0061 U+0075 considered a single word. With U+006C U+02BC U+FEFF U+0065 U+0061 U+0075, you will find the word 'eau' again. This is not wrong, given that a word joiner is expected to join words, so that neither NBSP nor any other no-break space is needed to prevent line breaks between them. However, the words remain words. This is why Ctrl + RIGHT stops at U+FEFF, detecting a word boundary. Stepping over in-word punctuation during quick cursor movement is for editing convenience only, in English as well as in French and other languages. In your example, when 'l'eau' (the water) is to be replaced with its counterpart 'la terre' (the land), placing the cursor at the end and pressing Ctrl + BACKSPACE deletes both words, so you can immediately retype the unelided article and the new word. But, as I say, that is not a test for word boundaries.
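For comparison with the Word results above, here is a Python sketch of a tiny, simplified subset of the default word-boundary rules of UAX #29 (Unicode Text Segmentation): WB4 (Format characters such as U+00AD, U+2060 and U+FEFF are ignored), WB5 (no break between letters) and WB6/WB7 (no break around U+0027 or U+2019 between letters). It is an illustration under stated assumptions, not Word's algorithm: it treats any general category L* as a letter (only an approximation of the ALetter property), drops Format characters everywhere, and does not handle Thai, for which UAX #29 notes that more than the default rules (typically dictionary lookup) is needed. Applications may tailor word boundaries, so this does not predict what any particular product does.

```python
import unicodedata

# Simplified subset of the UAX #29 default word-boundary rules (illustrative only).
FORMAT_CHARS = {"\u00ad", "\u2060", "\ufeff"}  # SOFT HYPHEN, WORD JOINER, ZWNBSP: Word_Break=Format
MID_CHARS = {"\u0027", "\u2019"}  # APOSTROPHE (Single_Quote), RIGHT SINGLE QUOTATION MARK (MidNumLet)


def is_letter(ch: str) -> bool:
    # Approximation: any general category L* counts as ALetter.
    return unicodedata.category(ch).startswith("L")


def is_single_word(text: str) -> bool:
    """Return True if the simplified rules find no word boundary inside text."""
    chars = [c for c in text if c not in FORMAT_CHARS]  # WB4 (approximated): ignore Format
    for i in range(1, len(chars)):
        prev, cur = chars[i - 1], chars[i]
        if is_letter(prev) and is_letter(cur):
            continue  # WB5: letter x letter
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if cur in MID_CHARS and is_letter(prev) and nxt and is_letter(nxt):
            continue  # WB6: letter x (mid) letter
        if prev in MID_CHARS and i >= 2 and is_letter(chars[i - 2]) and is_letter(cur):
            continue  # WB7: letter (mid) x letter
        return False  # any other position is treated as a boundary
    return True


if __name__ == "__main__":
    samples = {
        "U+0027": "l\u0027eau",
        "U+2019": "l\u2019eau",
        "U+02BC": "l\u02bceau",
        "U+02BC U+FEFF": "l\u02bc\ufeffeau",
        "U+2060": "l\u2060eau",
        "SPACE": "l eau",
    }
    for label, word in samples.items():
        print(f"{label:14} single word: {is_single_word(word)}")
```

Under these simplified default rules, every variant except the one with an ordinary space comes out as a single word.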

> >> No, this doesn't work.
> 
> Clarification: It doesn't work in correct software. Correct software
> would have treated the modified words as single words.

As far as the French example is concerned, the elided article and the noun are *already* treated as two words in correct software. There are spell-checkers that don't recognize a word when it is preceded by an elided article with an apostrophe, but these are *not* correct software. And they are *not* from Microsoft. I have no knowledge of Thai, but I guess that ?? is a correct word, and therefore correct software will take notice of the U+FEFF or U+2060 you add between the two characters and assume that you mean *two* words but simply do not want any space between them. This is not wrong, again, and it is consistent with the fact that correct software complies with the Standards, that the Standards are designed to be useful, and that correct software is useful software.

Where software is concerned, what other use is there in being correct?

Marcel 

From verdy_p at wanadoo.fr  Wed Jul  1 03:57:49 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 1 Jul 2015 10:57:49 +0200
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CAGHjPPJ3gGWt8bq4=xK17bK81V0U342Tt5qmhGDK_2+x687g9g@mail.gmail.com>
References: <CAGHjPPJ3gGWt8bq4=xK17bK81V0U342Tt5qmhGDK_2+x687g9g@mail.gmail.com>
Message-ID: <CAGa7JC0pFjCQYwCz1nXVPrjV4JA9c-yLd6-dBMsw0oBBCPgfcQ@mail.gmail.com>

I oppose this proposal for the simple reason that it assumes hyphen
separators are not necessary. That may be true today, but some future
extensions will need more than 2 letters or 3 digits in the primary
subtag. Even for ISO 3166-2, the regional subtags are very likely to
change, and without separators the extensions will become ambiguous.
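The ambiguity argument can be made concrete. If a tag string concatenates a primary subtag and a subdivision subtag with no separator, a parser has to know their lengths in advance. The Python sketch below assumes, purely for illustration, that a future scheme might allow 2- or 3-character primary subtags and 1- to 3-character subdivision subtags (these lengths and the name `possible_splits` are hypothetical, not taken from the proposal), and lists every way a separator-free string could be read.

```python
def possible_splits(tag, primary_lengths=(2, 3), sub_lengths=(1, 2, 3)):
    """List every (primary, subdivision) reading of a separator-free tag string.

    primary_lengths and sub_lengths are hypothetical parameters chosen for
    illustration; they are not taken from the proposal.
    """
    return [
        (tag[:n], tag[n:])
        for n in primary_lengths
        if len(tag) - n in sub_lengths
    ]


if __name__ == "__main__":
    # With only 2-letter primary subtags, "USCA" has a single reading.
    print(possible_splits("USCA", primary_lengths=(2,)))  # [('US', 'CA')]
    # Allow 3-letter primary subtags as well, and the same string becomes ambiguous.
    print(possible_splits("USCA"))  # [('US', 'CA'), ('USC', 'A')]
```

With an explicit separator, as in the ISO 3166-2 form US-CA, the split is carried by the data itself rather than inferred from assumed lengths.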

From dzo at bisharat.net  Wed Jul  1 08:50:17 2015
From: dzo at bisharat.net (dzo at bisharat.net)
Date: Wed, 1 Jul 2015 13:50:17 +0000
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <55915B2C.3060809@att.net>
References: <CAAAFA70B47E445D9D2FE1389040C726@DougEwell>
 <CA+Y+447XjiFQ-DEvs-MkF7ijdGzQxYrmnHEFnS5_7ML2+XcrGQ@mail.gmail.com>
 <84968C090B5F47409EF2006CF5309985@DougEwell>
 <CA+Y+444dDacnK9sbjH-YP4mJdaP2Y+QEipnhZ+RSraVGW7E=Sw@mail.gmail.com>
 <55915B2C.3060809@att.net>
Message-ID: <78734928-1435758617-cardhu_decombobulator_blackberry.rim.net-1181414367-@b27.c4.bise6.blackberry>

Whatever annotation accompanies the decision ultimately made on this should probably mention the historic use of the rainbow flag by the peace movement. See for example:

https://en.wikipedia.org/wiki/Peace_flag#Rainbow_flag

Sent via BlackBerry by AT&T

-----Original Message-----
From: Ken Whistler <kenwhistler at att.net>
Sender: "Unicode" <unicode-bounces at unicode.org>
Date: Mon, 29 Jun 2015 07:50:20
To: Noah Slater <nslater at tumbolia.org>
Cc: <unicode at unicode.org>
Subject: Re: Adding RAINBOW FLAG to Unicode

Noah,

Additional information you should have is that the UTC is about to
publish a new Public Review Issue on the topic of an extended mechanism
for the representation of more flag emoji with sequences of tag characters.
(Note: *not* representation as encoded single character symbols.)

That PRI, when it is available (should be quite soon -- early this week),
will be explicitly addressing concerns about state, regional, and
international flags. I don't think it will explicitly address "or otherwise",
but additional flag emoji that don't happen to be covered by the
regional and sub-regional tag mechanisms in the PRI would certainly
be in scope for discussion and feedback on the PRI.

Other short notes on comments in this long thread:

1. The claim that Twitter is including a RAINBOW FLAG would be taken
into consideration by the Emoji Subcommittee. Compatibility with
existing systems in wide use is a strong factor in favor of additions:

http://www.unicode.org/reports/tr51/#Selection_Factors_Compatibility

2. But on the other hand the offhand note: "When I mentioned my email to a
queer friend, they asked if I might propose other pride flags (*as there are many*)."
(emphasis added) illustrates the fundamental problem here. There is no
effective end to the "or otherwise" case for flags as symbols, and that
is why they are "generally not amenable to representation by encoded
characters".

Any simple image search for "pride flag" or "pride flag list" illustrates the
problem amply:

https://s-media-cache-ak0.pinimg.com/236x/69/83/f3/6983f3b9a4f68468bb101383006aa565.jpg
https://s-media-cache-ak0.pinimg.com/236x/61/88/95/618895059533cb5b52c55cecd641881d.jpg

That is not the realm of *characters* -- it is the realm of graphic design of
flags, emblems, and frankly, at this point, heraldry. ;-)

So, to sum up, I suggest that this thread about the RAINBOW FLAG be
directed to the soon-to-be-posted Public Review Issue about extending
the generative mechanisms for representing emoji symbols for flags,
but that that feedback carefully consider how such an addition would
coexist with other mechanisms for extensions of flag representation
*and* how it could be reasonably limited to one instead of 28 (... or
500) more flags.

--Ken

P.S. While I do think there might be a strong case made for the RAINBOW
FLAG to be added to the list of emoji flags representable by *some* kind
of extension mechanism in Unicode, there really, really is no end to
the "or otherwise" case. I happen to live in the city of Oakland, California.
Try an image search on "Oakland flag". You start with a more-or-less
official City flag, which kind of fits in the city as sub-region of region
paradigm, and which can be spotted flying at the Oakland City Hall,
but this quickly tails off into a gazillion variants, and various
flags as sports memorabilia. I'm quite certain that an Oakland A's flag
emoji would be locally quite popular if it were available on people's
phones, for example.

On 6/28/2015 3:36 PM, Noah Slater wrote:
>
> I really wish they'd provided a justification for this statement! :) I 
> guess that this is the right list for a UTC officer to give some sort 
> of feedback.
>
> On Sun, 28 Jun 2015 at 21:23 Doug Ewell <doug at ewellic.org 
> <mailto:doug at ewellic.org>> wrote:
>
>
>     Additionally, the domain of flags is generally not amenable to
>     representation by encoded characters, and the UTC does not wish to
>     entertain further proposals for encoding of symbol characters for
>     flags, whether national, state, regional, international, or
>     otherwise. References to UTC Minutes: [134-C2], January 28, 2013."
>
>     The last clause is the relevant one here: "whether national, state,
>     regional, international, or otherwise." The words "or otherwise" could
>     be interpreted as saying that no *specific* flag of any kind will be
>     encoded in the future as a single character, partly because the domain
>     of flags is so open-ended. That would include flags associated with or
>     representing specific groups of individuals or social causes.
>



From dzo at bisharat.net  Wed Jul  1 08:55:02 2015
From: dzo at bisharat.net (dzo at bisharat.net)
Date: Wed, 1 Jul 2015 13:55:02 +0000
Subject: Unencoded Latin capitals
Message-ID: <239490429-1435758902-cardhu_decombobulator_blackberry.rim.net-451267535-@b27.c4.bise6.blackberry>

Michael, is there a list of lowercase Latin letters needing capital equivalents?

TIA, 

Don
 
------Original Message------
From: Michael Everson
Sender: Unicode
To: Unicode Public
Subject: Re: Adding RAINBOW FLAG to Unicode
Sent: Jun 27, 2015 5:56 PM

On 27 Jun 2015, at 22:46, Konstantin Ritt <ritt.ks at gmail.com> wrote:
> 
> U+1F3F3, U+200D, U+2620
> WAVING WHITE FLAG, ZERO WIDTH JOINER, SKULL AND CROSSBONES
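For readers unfamiliar with the notation, the sequence quoted above is two symbols joined by ZERO WIDTH JOINER. A system that supports such a sequence may render it as a single glyph; one that does not will typically show the two symbols side by side. The short Python sketch below only builds the string and lists its code points with their character names; whether it displays as one image depends entirely on the fonts and rendering system in use.

```python
import unicodedata

# WAVING WHITE FLAG, ZERO WIDTH JOINER, SKULL AND CROSSBONES, as quoted above.
sequence = "\U0001F3F3\u200D\u2620"

if __name__ == "__main__":
    for ch in sequence:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```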

And thus the slippery slope is well and truly discovered.

Gosh, I wish we could add capital equivalents to all (or most of) the un-cased lower-case letters we've got for Latin.

That at least would be practical. 

Michael Everson * http://www.evertype.com/




From dzo at bisharat.net  Wed Jul  1 09:02:08 2015
From: dzo at bisharat.net (dzo at bisharat.net)
Date: Wed, 1 Jul 2015 14:02:08 +0000
Subject: Unicode & the architecture of ICT
Message-ID: <1836500250-1435759328-cardhu_decombobulator_blackberry.rim.net-1230244883-@b27.c4.bise6.blackberry>

FYI, a quick reflection on Unicode and enabling the use of African languages in ICT, addressed mainly to people who are not experts on the subject:

http://niamey.blogspot.com/2015/06/unicode-and-architecture-of-ict.html

From nslater at tumbolia.org  Wed Jul  1 11:20:08 2015
From: nslater at tumbolia.org (Noah Slater)
Date: Wed, 01 Jul 2015 16:20:08 +0000
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CAGa7JC0pFjCQYwCz1nXVPrjV4JA9c-yLd6-dBMsw0oBBCPgfcQ@mail.gmail.com>
References: <CAGHjPPJ3gGWt8bq4=xK17bK81V0U342Tt5qmhGDK_2+x687g9g@mail.gmail.com>
 <CAGa7JC0pFjCQYwCz1nXVPrjV4JA9c-yLd6-dBMsw0oBBCPgfcQ@mail.gmail.com>
Message-ID: <CA+Y+447_NPWFU+Hp2Po9XdUgJmdhZo77Xajwh4tpZo8RerMKWQ@mail.gmail.com>

Can someone help me understand what this means for my rainbow flag proposal?

On Wed, 1 Jul 2015 at 10:02 Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> I oppose this proposal for the simple reason that it thinks hyphen
> separations are not necessary. Possibly true today but there will be
> extensions in some future needing more than 2 letters or 3 digits in the
> primary subtag. even for iso 3166-2 the regional  subtags are very likely
> to change and without separators the extension,s will become ambiguous

From doug at ewellic.org  Wed Jul  1 11:45:25 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 01 Jul 2015 09:45:25 -0700
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
Message-ID: <20150701094525.665a7a7059d7ee80bb4d670165c8327d.036fd1878f.wbe@email03.secureserver.net>

Noah Slater <nslater at tumbolia dot org> wrote:

> Can someone help me understand what this means for my rainbow flag
> proposal?

You may want to go back and read Ken Whistler's suggestion from Monday:

> I suggest that this thread about the RAINBOW FLAG be
> directed to the soon-to-be-posted Public Review Issue about extending
> the generative mechanisms for representing emoji symbols for flags,
> but that that feedback carefully consider how such an addition would
> coexist with other mechanisms for extensions of flag representation
> *and* how it could be reasonably limited to one instead of 28 (... or
> 500) more flags.

I posted feedback yesterday on this PRI that was intended to be
consistent with what Ken wrote:

> Any proposal to extend the mechanism to cover the many other types of
> flags -- for historical regions, NGOs, maritime, sports, or social or
> political causes -- must be systematic and well-planned, not ad-hoc or
> haphazard, to assure interoperability and extensibility.

In other words, to the extent you wish to pursue encoding the rainbow
flag as a flag-tag sequence, I suggest this is part of a broader problem
space (how to encode flags for non-geopolitical entities) and requires a
broader solution that can apply to any arbitrary number of such flags.

In other, other words, something like "[flag]LGBT" should be a
non-starter.

If you are still suggesting a single character, this thread doesn't
affect that suggestion at all.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From shervinafshar at gmail.com  Wed Jul  1 11:49:53 2015
From: shervinafshar at gmail.com (Shervin Afshar)
Date: Wed, 1 Jul 2015 09:49:53 -0700
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CA+Y+447_NPWFU+Hp2Po9XdUgJmdhZo77Xajwh4tpZo8RerMKWQ@mail.gmail.com>
References: <CAGHjPPJ3gGWt8bq4=xK17bK81V0U342Tt5qmhGDK_2+x687g9g@mail.gmail.com>
 <CAGa7JC0pFjCQYwCz1nXVPrjV4JA9c-yLd6-dBMsw0oBBCPgfcQ@mail.gmail.com>
 <CA+Y+447_NPWFU+Hp2Po9XdUgJmdhZo77Xajwh4tpZo8RerMKWQ@mail.gmail.com>
Message-ID: <CA+ONODkL_i_Hw+zDPjAj3JuXbjYVz_Z_gYRoCh5WMNRGN2KK0g@mail.gmail.com>

On Wed, Jul 1, 2015 at 9:20 AM, Noah Slater <nslater at tumbolia.org> wrote:

> Can someone help me understand what this means for my rainbow flag
> proposal?
>

AFAIK, it's not going to have any effect on what you're proposing. This is
a mechanism for flags of sub-regions with ISO 3166-2 codes; e.g. US States,
countries and provinces of the UK, Tibet, etc.

From doug at ewellic.org  Wed Jul  1 12:33:45 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 01 Jul 2015 10:33:45 -0700
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
Message-ID: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>

Shervin Afshar <shervinafshar at gmail dot com> wrote:

> This is a mechanism for flags of sub-regions with ISO 3166-2 codes;
> e.g. US States, countries and provinces of the UK, Tibet, etc.

The Tibet Autonomous Region (CN-54), like other regions in China except
Hong Kong and Macao, has no official flag. 

Although this is what some users might expect, implementing or
interpreting "[flag]CN54" as the snow-lion flag, associated with the
Free Tibet movement, could be controversial and problematic in the
extreme. You know how China is.



From doug at ewellic.org  Wed Jul  1 12:38:18 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 01 Jul 2015 10:38:18 -0700
Subject: Adding RAINBOW FLAG to Unicode
Message-ID: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net>

<dzo at bisharat dot net> wrote:

> Whatever notation that might be added to whatever decision is
> ultimately made on this should probably mention historic use of the
> rainbow flag by the peace movement. See for example:
>
> https://en.wikipedia.org/wiki/Peace_flag#Rainbow_flag

The colors of the rainbow peace flag (purple on top) are often inverted
with respect to the LGBT flag (red on top), making them essentially two
different flags.



From shervinafshar at gmail.com  Wed Jul  1 12:46:27 2015
From: shervinafshar at gmail.com (Shervin Afshar)
Date: Wed, 1 Jul 2015 10:46:27 -0700
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
Message-ID: <CA+ONODkJVoSRkS8UAoCNhm_zfVutEP=piuE5nc+HwbGA+invhQ@mail.gmail.com>

On Wed, Jul 1, 2015 at 10:33 AM, Doug Ewell <doug at ewellic.org> wrote:

>
> The Tibet Autonomous Region (CN-54), like other regions in China except
> Hong Kong and Macao, has no official flag.
>
> Although this is what some users might expect, implementing or
> interpreting "[flag]CN54" as the snow-lion flag, associated with the
> Free Tibet movement, could be controversial and problematic in the
> extreme. You know how China is.


That's correct. I intentionally used that example, as implementations
can decide how they want to represent "[flag]CN54". Technically it would
just be "flag for ISO 3166-2:CN-54".

-- Shervin

From nslater at tumbolia.org  Wed Jul  1 13:38:54 2015
From: nslater at tumbolia.org (Noah Slater)
Date: Wed, 01 Jul 2015 18:38:54 +0000
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <20150701094525.665a7a7059d7ee80bb4d670165c8327d.036fd1878f.wbe@email03.secureserver.net>
References: <20150701094525.665a7a7059d7ee80bb4d670165c8327d.036fd1878f.wbe@email03.secureserver.net>
Message-ID: <CA+Y+447ABX3TR-OskmmW0f8_WhrRS5sM9Wa7X19RQxiYCjfvrw@mail.gmail.com>

Thanks Doug.

On Wed, 1 Jul 2015 at 17:45 Doug Ewell <doug at ewellic.org> wrote:

>
> In other, other words, something like "[flag]LGBT" should be a
> non-starter.
>

Followed until this bit. Why would it be a non-starter?


> If you are still suggesting a single character, this thread doesn't
> affect that suggestion at all.
>

I don't know enough about how the Consortium functions to understand my
best course of action. Looking for advisement on (a) what is most likely to
pass UTC muster, and (b) what is most likely to result in rainbow flag
emojis being available widely in the near future.

From doug at ewellic.org  Wed Jul  1 14:26:44 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 01 Jul 2015 12:26:44 -0700
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
Message-ID: <20150701122644.665a7a7059d7ee80bb4d670165c8327d.73a24430a0.wbe@email03.secureserver.net>

Noah Slater <nslater at tumbolia dot org> wrote:

>> In other, other words, something like "[flag]LGBT" should be a
>> non-starter.
>
> Followed until this bit. Why would it be a non-starter?

First, because under the proposal described in the PRI, it would
unequivocally stand for "region LG, subdivision BT". As it happens,
there is no region LG, so the sequence might simply be ignored as
undefined.
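The first point follows from the fixed layout described in this thread: the first two tag letters name a region and the remainder names a subdivision (so "[flag]CN54" is read as CN-54). A minimal Python sketch of that reading is below. The region set is a tiny hand-picked sample of ISO 3166-1 alpha-2 codes used only for illustration (a real implementation would need the full list), and the function name `read_flag_tag` is hypothetical.

```python
# Tiny illustrative sample of ISO 3166-1 alpha-2 codes, not the full list.
SAMPLE_REGIONS = {"CN", "GB", "US"}


def read_flag_tag(letters: str):
    """Split a flag tag string into (region, subdivision), per the reading described above.

    Returns None when the region part is not a known region code,
    i.e. the sequence would be treated as undefined.
    """
    region, subdivision = letters[:2], letters[2:]
    if region not in SAMPLE_REGIONS:
        return None
    return region, subdivision


if __name__ == "__main__":
    print(read_flag_tag("CN54"))  # ('CN', '54'), i.e. ISO 3166-2 CN-54
    print(read_flag_tag("LGBT"))  # None: LG is not a region code here, so the sequence is undefined
```

The point of the sketch is that "LGBT" never reaches any notion of a rainbow flag: a parser following this layout either finds a region LG or gives up.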

Second, and more generally, because it would not be part of any sort of
structured extension to the geopolitical-entity encoding mechanism. It
would provide no orderly path to encoding additional, similar flags for
other social groups or causes, including others also focusing on
sexuality. It would be strictly ad-hoc. It would rely solely on "this
combination of letters isn't in use right now, so let's snag it," which
is poor standardization, as Michael pointed out on Monday.

Using an ad-hoc "land grab" approach to registering flag tags, how would
the following flags be represented?

1. The flag of Chicago
2. The flag of the U.S. Army
3. The flag of ASEAN
4. The Olympic flag
5. The flag of UNICEF
6. The Christian flag
7. The Esperanto flag
8. The Confederate battle flag
9. The Gadsden flag ("Don't Tread On Me")
10. The Jolly Roger (pirate flag of Edward England)
11. The flag of ISIS (ISIL, AQMI, Da'esh)
12. The flag of Germany from 1933 to 1945

(Hint: all of these would have to be eligible, once the doors are
opened.)

Simply coming up with a combination of letters and digits for each of
these that happens to be unused in ISO 3166 won't do. There would have
to be something with much better structure and organization.

This is my suggestion, anyway.

> I don't know enough about how the Consortium functions to understand
> my best course of action. Looking for advisement on (a) what is most
> likely to pass UTC muster, and (b) what is most likely to result in
> rainbow flag emojis being available widely in the near future.

Other list participants and/or UTC members will have to help you here.
I'm the last one you want to ask about how to get a random emoji into
the Unicode Standard.



From verdy_p at wanadoo.fr  Wed Jul  1 21:12:53 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 2 Jul 2015 04:12:53 +0200
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
Message-ID: <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>

And today's Chinese province of Tibet is different from the historic Tibet,
as China incorporated other surrounding areas, including some parts taken
from Bhutan (a small part around Legaru, and a larger part to the north)
and India (some parts to the west, from the states of Jammu and Kashmir,
which is itself also claimed by Pakistan, and of Uttarakhand, and to the
east, from Arunachal Pradesh), and also modified its internal borders with
the Chinese provinces of Xinjiang in the north-west and Sichuan in the east.
The whole new province is still named Tibet, but it is much larger than the
historic country of Tibet before its annexation.

The Chinese claims in India and Bhutan are contested and still subject
to very active military tensions with India. This question is therefore more
important than the Free Tibet movement alone, which claims nothing from
India or Bhutan (in fact, these two countries host Tibetan refugees and
the Free Tibet movement itself) and claims nothing in the Chinese areas
previously part of Sichuan and Xinjiang provinces.

China also has border conflicts with Tajikistan and a small part of
Afghanistan, to extend its current province of Xinjiang to the West. The
international borders of China are therefore extremely fuzzy. With India
and Bhutan, the claims exist in theory, but India has kept its presence.
The situation is much less clear, however, with Jammu and Kashmir (which
has its own separatist movement in addition to the Pakistani claims), and
it is now becoming more critical with Tajikistan and in the troubled area
bordering Afghanistan, both areas having autonomist Islamic movements in
Xinjiang (some of them now allied with the Taliban operating in
Afghanistan and Tajikistan since the dissolution of the former USSR;
before that dissolution, this was also a region of border conflicts
between China and the USSR).

China also has maritime border conflicts in the South China Sea, from
Vietnam to the Philippines, Malaysia and Brunei, as China wants to extend
its maritime borders to the south to include various small islands. It
also has conflicts with Taiwan to the north of that maritime area.

Defining the borders of China is really complicated, and this also has
consequences for the interpretation of the Chinese provincial subdivisions
in ISO 3166-2. I would not associate flags with these official Chinese
provinces, given that even China does not claim any flag for them. And I
would certainly not use these ISO 3166-2 Chinese subdivisions to refer to
historic regions annexed by China, or claimed by China from other
countries (which are still a source of active conflicts and military
actions or political tensions by China against Vietnam, Taiwan, the
Philippines, Malaysia and Brunei, as well as with South Korea and Japan).
All countries around China have to protect their borders with China,
whose power and influence are growing (even in the easternmost part of
Russia, with an important Chinese community supporting China rather than
Russia in the historic conflicts with Japan).

We have not seen any sign of stabilization; in fact the number of
territorial conflicts is growing, as is the Chinese military presence in
all these bordering regions. Many of these countries have also had
internal troubles for a long time (e.g. Myanmar, and even Vietnam, due to
past wars and China's military support for North Vietnam against South
Vietnam; Vietnam now has a significant Chinese community within its own
borders, which could support the Chinese claims in the South China Sea).
It seems that China wants to create a huge maritime area connecting the
sea routes from Hong Kong to Singapore, and new conflicts could appear
with Indonesia.

2015-07-01 19:33 GMT+02:00 Doug Ewell <doug at ewellic.org>:

> Shervin Afshar <shervinafshar at gmail dot com> wrote:
>
> > This is a mechanism for flags of sub-regions with ISO 3166-2 codes;
> > e.g. US States, countries and provinces of the UK, Tibet, etc.
>
> The Tibet Autonomous Region (CN-54), like other regions in China except
> Hong Kong and Macao, has no official flag.
>
> Although this is what some users might expect, implementing or
> interpreting "[flag]CN54" as the snow-lion flag, associated with the
> Free Tibet movement, could be controversial and problematic in the
> extreme. You know how China is.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO ????
>
>
>
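Doug's "[flag]CN54" shorthand in the quote above can be made concrete with a short Python sketch. It relies only on what the PRI 299 announcement states (TAG characters drawn from U+E0030..U+E005A, i.e. tag digits and tag capital letters) and on the fixed 0xE0000 offset between printable ASCII and the TAG block. The base character (U+1F3F4 WAVING BLACK FLAG) and the terminator (U+E007F CANCEL TAG) are placeholders assumed for illustration, not the proposal's text, and the function names are hypothetical. Note what the output cannot express: it identifies the code CN-54, and nothing in it says whether a renderer should show no flag, some official flag, or the snow-lion flag.

```python
# Sketch only. ASSUMPTIONS, not taken from the proposal: the base character is
# U+1F3F4 WAVING BLACK FLAG and the tag run ends with U+E007F CANCEL TAG.
TAG_OFFSET = 0xE0000        # TAG characters mirror printable ASCII at U+E0020..U+E007E
BASE_FLAG = "\U0001F3F4"    # assumed base character
CANCEL_TAG = "\U000E007F"   # assumed terminator


def to_tags(code: str) -> str:
    """Map the ASCII digits and letters of a code like 'CN-54' to TAG characters."""
    out = []
    for ch in code.replace("-", ""):
        if not (ch.isascii() and ch.isalnum()):
            raise ValueError(f"unexpected character {ch!r}")
        # e.g. 'C' (U+0043) -> TAG LATIN CAPITAL LETTER C (U+E0043)
        out.append(chr(TAG_OFFSET + ord(ch.upper())))
    return "".join(out)


def subdivision_flag(code: str) -> str:
    """Build [base flag] + tag characters + terminator for an ISO 3166-2 code."""
    return BASE_FLAG + to_tags(code) + CANCEL_TAG


seq = subdivision_flag("CN-54")
print(" ".join(f"U+{ord(c):04X}" for c in seq))
# U+1F3F4 U+E0043 U+E004E U+E0035 U+E0034 U+E007F
```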
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/dea04b4f/attachment.html>

From mark at macchiato.com  Thu Jul  2 00:16:41 2015
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Thu, 2 Jul 2015 07:16:41 +0200
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
Message-ID: <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>

*Please take political discussions elsewhere; they do not belong on this
list.*

The point about the boundaries of regions changing over time, and flags
being associated with a former set of boundaries could have been made in a
few sentences. Not only would it have avoided politics, it would have been
more likely that people would actually read it (the likelihood being
inversely proportional to the length).


Mark <https://google.com/+MarkDavis>

*Il meglio è l’inimico del bene* (the best is the enemy of the good)

On Thu, Jul 2, 2015 at 4:12 AM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/5b225389/attachment.html>

From verdy_p at wanadoo.fr  Thu Jul  2 04:01:46 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 2 Jul 2015 11:01:46 +0200
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
Message-ID: <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>

The political subject is directly related to the designation of flags
and their association with ISO 3166-1 and -2 encoded entities. Even if you
don't like it, this is very political, and for a standard seeking
stability, I wonder how any flag (directly bound to specific political
entities at specific dates and within boundaries that may be contested)
can be related to ISO 3166 and its instability (and to the fact that ISO
3166 entities in fact have no defined borders either, so that ISO 3166-2
is just the political point of view of the current ruler of the current
ISO 3166-1 entity).

This whole topic is political. In fact the real flags are not even
encoded with RIS, not even for current nations (and there's still the
problem of knowing what a recognized nation is, even when just considering
the UN definition. Political entities are defined but with fuzzy borders;
they in fact just represent some local governments, not necessarily their
lands, people, or cultures, and in some cases they are in exile or not
even ruling: their seat in the UN is vacant and they exist only on paper,
but even UN members disagree about which treaties they recognize).

Consider the case of Western Sahara (which no longer exists except on
paper, as a dependency of Spain, which has abandoned it completely), with
two governments competing to control the territory (Morocco controlling
most of it, another part claimed by Mauritania and then abandoned, another
part left without infrastructure, and many refugees left de facto in
Mauritania or Algeria). Neither of the two authorities designates that
territory as "Western Sahara". So it no longer exists (and will likely
never exist again).

The frozen status of Antarctica has not created any new country or
territory, even if there's a sort of joint administration: that
administration does not suppress the existing claims (and new claims made
since its creation). So this area has no well-defined flag; various flags
are used informally, plus national flags for each claim and sometimes
specific regional flags created ad hoc. The use of RIS for ISO 3166-1 and
its limited extension for ISO 3166-2 (slightly modified) does not resolve
the problem.
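The RIS mechanism referred to here is pure spelling: each letter of an ISO 3166-1 alpha-2 code maps to one of the 26 regional indicator symbols (U+1F1E6..U+1F1FF), and the font or platform decides what, if anything, to draw for the pair. A minimal Python sketch (function names are illustrative) shows that a pair round-trips to its two letters and carries nothing more, for example for EH (Western Sahara) and AQ (Antarctica).

```python
RIS_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A (Z is U+1F1FF)


def ris_pair(alpha2: str) -> str:
    """Spell a two-letter code with regional indicator symbols."""
    if len(alpha2) != 2 or not all(c.isascii() and c.isalpha() for c in alpha2):
        raise ValueError(f"not a two-letter code: {alpha2!r}")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in alpha2.upper())


def ris_letters(pair: str) -> str:
    """Recover the letters; this is all the encoding itself carries."""
    return "".join(chr(ord(c) - RIS_A + ord("A")) for c in pair)


for code in ("EH", "AQ"):  # Western Sahara, Antarctica
    pair = ris_pair(code)
    print(code, " ".join(f"U+{ord(c):04X}" for c in pair), ris_letters(pair))
# EH U+1F1EA U+1F1ED EH
# AQ U+1F1E6 U+1F1F6 AQ
```

Which flag, if any, appears for such a pair is left entirely to the implementation, which is the gap the paragraph above points to.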

In reality there's still no standard way to encode flags unambiguously
and in a stable way. We'd like FOTW (Flags of the World) contributors to
propose their own scheme. But it will not be compatible with the current
RIS solution or the proposed extension. If such a standard ever emerges,
it will require encoding a new set of characters.

An alternative would be to embed a URN (not re-encoded) between a pair of
delimiting controls (to embed an object by reference) and use that
sequence after a WHITE FLAG symbol with a joiner.

The URN scheme would be the best long-term solution (preferable to URLs
bound to specific servers), but we could in fact use a generic URI
encapsulation (supporting both URNs and URLs).

It could then be used to represent various kinds of entities and link
them to specific forms: flags, banners, flying flags, a flag over a
person's face, mini location maps, "flag maps"... Programs not recognizing
the encoded entities would have a very simple way to scan over the
encapsulated URI representing some unspecified object. Other programs
would recognize some specific URI schemes. RIS would then be a thing of
the past, obsoleted because it was non-neutral, politically and culturally
oriented, incomplete, and fundamentally unstable from the beginning... For
now we just have a set of flags promoted only for the immediate purpose of
interconnecting proprietary messaging services. But all this came without
a proper review of what was really needed.
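The message does not name the delimiting controls, so the following Python sketch is illustrative only. It uses U+1F3F3 WAVING WHITE FLAG and U+200D ZERO WIDTH JOINER as described, and STX (U+0002) and ETX (U+0003) purely as hypothetical stand-ins for the unspecified pair of controls; the function names are likewise hypothetical. It shows the two behaviours the paragraph describes: a program that knows the convention extracts the URI, and one that does not can skip over the encapsulated span and keep only the flag.

```python
import re

# Every delimiter choice below is HYPOTHETICAL; the message says only
# "a pair of controls". STX/ETX are stand-ins for illustration.
WHITE_FLAG = "\U0001F3F3"  # U+1F3F3 WAVING WHITE FLAG
ZWJ = "\u200D"             # U+200D ZERO WIDTH JOINER
OPEN = "\u0002"            # hypothetical opening control (STX stand-in)
CLOSE = "\u0003"           # hypothetical closing control (ETX stand-in)

_EMBEDDED = re.compile(
    re.escape(WHITE_FLAG + ZWJ + OPEN) + "([^" + CLOSE + "]*)" + re.escape(CLOSE)
)


def embed(uri: str) -> str:
    """Wrap a URI (kept as plain ASCII, not re-encoded) after WHITE FLAG + ZWJ."""
    if OPEN in uri or CLOSE in uri:
        raise ValueError("URI must not contain the delimiters")
    return WHITE_FLAG + ZWJ + OPEN + uri + CLOSE


def extract_uris(text: str) -> list:
    """A program that knows the convention pulls out each embedded URI."""
    return _EMBEDDED.findall(text)


def fallback(text: str) -> str:
    """A program that does not skips each URI and keeps only the white flag."""
    return _EMBEDDED.sub(WHITE_FLAG, text)


msg = "Flag: " + embed("urn:example:flag:1") + "!"
print(extract_uris(msg))  # ['urn:example:flag:1']
print(fallback(msg) == "Flag: " + WHITE_FLAG + "!")  # True
```

Because the URI stays in plain ASCII between the two delimiters, a scanner needs no knowledge of any URI scheme to find the end of the span.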


2015-07-02 7:16 GMT+02:00 Mark Davis ☕️ <mark at macchiato.com>:

> *Please take political discussions elsewhere; they do not belong on this
> list.*
>
> The point about the boundaries of regions changing over time, and flags
> being associated with a former set of boundaries could have been made in a
> few sentences. Not only would it have avoided politics, it would have been
> more likely that people would actually read it (the likelihood being
> inversely proportional to the length).
>
>
> Mark <https://google.com/+MarkDavis>
>
> *Il meglio è l’inimico del bene* (the best is the enemy of the good)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/55afaf9e/attachment.html>

From charupdate at orange.fr  Thu Jul  2 04:29:06 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 2 Jul 2015 11:29:06 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <20150630194129.GA16879@khaled-laptop>
References: <552516479.6107.1435315719474.JavaMail.www@wwinf2229>
 <20150626110243.GB18139@ebed.etf.cuni.cz>
 <BLUPR03MB1207D2C08CC6E2A84AF0F0CD5AC0@BLUPR03MB120.namprd03.prod.outlook.com>
 <2104451852.9023.1435654939028.JavaMail.www@wwinf1m18>
 <20150630194129.GA16879@khaled-laptop>
Message-ID: <1413925467.11206.1435829346235.JavaMail.www@wwinf1m18>

On Tue, Jun 30, 2015, Khaled Hosny  wrote:

> On Tue, Jun 30, 2015 at 11:02:18AM +0200, Marcel Schneider wrote:
> > On Sun, Jun 28, 2015, Peter Constable 
> > wrote:
> > 
> > > Marcel: Can you please clarify in what way Windows 7 is not supporting U+2060.
> > 
> > On my netbook, which is running Windows 7 Starter, U+2060 is not a
> > part of any of the shipped fonts.
> 
> It is a control character, it does not need to have a glyph in the font
> to be properly supported.

As Doug explained to us, this is both true and false: the full version of Windows ships three fonts that include U+2060, and all other fonts misbehave with U+2060. However, that too is only an application issue, and Hosny's advice holds for OpenOffice and LibreOffice, if my test results are accurate (please refer to the e-mail I sent just before).

The WORD JOINER vs ZWNBSP issue can be resolved in conformance with Unicode recommendations on the condition that the preferred word processor is LibreOffice Writer or OpenOffice Writer, not Microsoft Office Word. This follows from three facts:
1 In OpenOffice/LibreOffice, the WJ is displayed with zero width and with a visible mark (resembling that of NBSP):

[screenshot]


2 The WJ works with whatever font is selected (here, Aharoni).

3 No format character is destroyed by OpenOffice/LibreOffice on conversion to plain text (pasting into a text editor).

This is why, at present, users must switch between applications depending on the task and the characters used. Sticking with the application we are used to would be counter-productive.

About the WJ being a control character, I would add that it is of general category Cf, which in practice means Other (Format), while control characters belong to Cc, named Other (Control). The difference may be slight and merely a matter of terminology, but given the poor handling of some format characters by the world's most widely used word processors, I think something must change. Perhaps the WJ has been forgotten, on the assumption that it's only a control. If the WJ has deliberately been left poorly implemented in Word, that may be to keep people from using Word for what they should use Publisher for. However, I believe that since WJs are part of plain text, they should be properly supported in all text-handling applications. And they should be on the keyboard.
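The category distinction drawn here can be checked against the Unicode Character Database, for instance with Python's standard `unicodedata` module (a quick sketch, not a conformance test; `describe` is an illustrative name). WORD JOINER and ZERO WIDTH NO-BREAK SPACE both come out as Cf, NO-BREAK SPACE is a space separator (Zs), and only a true control such as TAB is Cc.

```python
import unicodedata


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME (category)' from Python's Unicode database."""
    ch = chr(cp)
    # Cc characters have no character name, so fall back to a label.
    return f"U+{cp:04X} {unicodedata.name(ch, '<control>')} ({unicodedata.category(ch)})"


for cp in (0x2060, 0xFEFF, 0x00A0, 0x0009):
    print(describe(cp))
# U+2060 WORD JOINER (Cf)
# U+FEFF ZERO WIDTH NO-BREAK SPACE (Cf)
# U+00A0 NO-BREAK SPACE (Zs)
# U+0009 <control> (Cc)
```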

The solution I suggest is therefore to put the word joiner (and the sequences containing it) on Ctrl+Alt or Kana, and the zero width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users working efficiently with good software can access the preferred character a bit more easily than users who must use the deprecated character because their word processor does not properly support the preferred one.

I'm sorry to have asked Unicode to remove the recommendation for U+2060. I'm accustomed to Microsoft's word processor, where I've got my huge autoexpand list. (This is written *without* autoexpand.) And I hadn't yet tested this in OpenOffice/LibreOffice. Now that's done.

Regards,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/fc20eb1e/attachment.html>

From charupdate at orange.fr  Thu Jul  2 05:22:30 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 2 Jul 2015 12:22:30 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
Message-ID: <2143588033.13070.1435832550669.JavaMail.www@wwinf1m18>

I'm sorry for the name mistake in this mail (it's corrected below); I also became aware of a number of problems with sending screenshots. As I just learned that links are preferred for images, I posted them on Postimage.


On Tue, Jun 30, 2015, Khaled Hosny  wrote:

> On Tue, Jun 30, 2015 at 11:02:18AM +0200, Marcel Schneider wrote:
> > On Sun, Jun 28, 2015, Peter Constable 
> > wrote:
> > 
> > > Marcel: Can you please clarify in what way Windows 7 is not supporting U+2060.
> > 
> > On my netbook, which is running Windows 7 Starter, U+2060 is not a
> > part of any of the shipped fonts.
> 
> It is a control character, it does not need to have a glyph in the font
> to be properly supported.

As Doug explained to us, this is both true and false: the full version of Windows ships three fonts that include U+2060, and all other fonts misbehave with U+2060. However, that too is only an application issue, and Khaled's advice holds for OpenOffice and LibreOffice, if my test results are accurate (please refer to the e-mail I sent just before).

The WORD JOINER vs ZWNBSP issue can be resolved in conformance with Unicode recommendations on the condition that the preferred word processor is LibreOffice Writer or OpenOffice Writer, not Microsoft Office Word. This follows from three facts:
1 In OpenOffice/LibreOffice, the WJ is displayed with zero width and with a visible mark (resembling that of NBSP):

http://s24.postimg.org/5ujkak28l/screen_m_2015_07_02_04_08.jpg


2 The WJ works with whatever font is selected (here, Aharoni).

3 No format character is destroyed by OpenOffice/LibreOffice on conversion to plain text (pasting into a text editor).
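Fact 3 can also be checked without relying on how an editor renders invisible characters: take the pasted plain text and list every character of general category Cf. A minimal Python sketch; the sample string is a hypothetical stand-in for the pasted text, and `format_chars` is an illustrative name.

```python
import unicodedata


def format_chars(text: str) -> list:
    """List (index, 'U+XXXX NAME') for every general-category Cf character."""
    return [
        (i, f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


pasted = "no\u2060break and zero\ufeffwidth"  # stand-in for text pasted into an editor
for index, label in format_chars(pasted):
    print(index, label)
# 2 U+2060 WORD JOINER
# 17 U+FEFF ZERO WIDTH NO-BREAK SPACE
```

One caveat if the text is saved and read back: Python's `utf-8-sig` codec treats a U+FEFF at the very start of the data as a byte order mark and drops it, which illustrates the BOM ambiguity that makes U+2060 the preferred joiner.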

This is why, at present, users must switch between applications depending on the task and the characters used. Sticking with the application we are used to would be counter-productive.

If you wish to view some more screenshots, here they are (I switched the UI to English where possible, including in LibreOffice Writer):


http://s6.postimg.org/mfn27wthd/screen_m_2015_07_02_04_19.jpg


http://s6.postimg.org/6wpmasl6p/screen_m_2015_07_02_04_32.png


http://s6.postimg.org/bz6y5kugx/screen_m_2015_07_02_04_42.jpg


About the WJ being a control character, I would add that it is of general category Cf, which in practice means Other (Format), while control characters belong to Cc, named Other (Control). The difference may be slight and merely a matter of terminology, but given the poor handling of some format characters by the world's most widely used word processors, I think something must change. Perhaps the WJ has been forgotten, on the assumption that it's only a control. If the WJ has deliberately been left poorly implemented in Word, that may be to keep people from using Word for what they should use Publisher for. However, I believe that since WJs are part of plain text, they should be properly supported in all text-handling applications. And they should be on the keyboard.

The solution I suggest is therefore to put the word joiner (and the sequences containing it) on Ctrl+Alt or Kana, and the zero width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users working efficiently with good software can access the preferred character a bit more easily than users who must use the deprecated character because their word processor does not properly support the preferred one.

I'm sorry to have asked Unicode to remove the recommendation for U+2060. I'm accustomed to Microsoft's word processor, where I've got my huge autoexpand list. (This is written *without* autoexpand.) And I hadn't yet tested this in OpenOffice/LibreOffice. Now that's done.

Regards,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/3dea96d0/attachment.html>

From eik at iki.fi  Thu Jul  2 06:57:11 2015
From: eik at iki.fi (Erkki I Kolehmainen)
Date: Thu, 2 Jul 2015 14:57:11 +0300
Subject: VS: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
Message-ID: <003401d0b4be$3af16970$b0d43c50$@fi>

I cannot but agree with Mark! Thus, please?

 
Sincerely, Erkki

 
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Philippe Verdy
Sent: 2 July 2015 12:02
To: Mark Davis ☕️
Cc: Doug Ewell; Unicode Mailing List
Subject: Re: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags)

 
The political subject is immediately related to the designation of flags and their association to ISO 3166-1 and -2 encoded entities. Even if you don't like it, this is very political and for a standard seeking for stability, I wonder how any flag (directly bound to specific political entities at specific dates and within some boundaries which may be contested) can be related to ISO 3166 and its instability (and the fact that ISO 3166 entities have in fact also no defined borders, so that ISO 3166-2 is just a political point of view from the current ruler of the current ISO 3166-1 entity).

 
All this topic is political. In fact the real flags are not even encoded with RIS, not even for current nations (and there's still a problem to know what is a recognized nation, even when just considering the UN definition. Political entities are defined but with fuzzy borders, they just represent in fact some local governments, not necessarily their lands, people, or cultures, and in some cases they are in exil or not even ruling: their seat in the UN is vacant and they exist only on the paper, but even UN members disagree about which treaty they recognize).

 
Consider the case of Western Sahara (which no longer exists except on the paper as a dependency of Spain that has abandoned it completely) and with two governments competing to control the territory (Morocco controlling most of it, another part claimed by Mauritania then abandonned, another part left without infrastructures, and many refugees left de facto in Mauritania or Algeria). None of the two autorities designate that territory as "Western Sahara". So it no longer exists (and will likely never exist again).

 
The frozen status of Antarctica has not created any new country or territory, even if there's a sort of joint administration: that adminsitration does not suppresses the existing claims (and new claims that have been made since its creation). So this area has no well defined flag and various falgs are used informally plus national flags for each claim and sometimes specific regional flags created ad hoc. The use of RIS for ISO 3166-1 and its limited extension for ISO3166-2 (slightly modified) does not resolve the problem.

 
In really there's still no standard way to encode flags unambiguously and in a stable way. We'd like to have FOTW (Flags of the World) contributors to propose their own scheme. But it will not be compatible with the current RIS solution or the proposed extension. If ever such standard emerges, it will require encoding a new set of characters.

 
An alternative would be to embed an URN (not reencoded) between some pairs of controls (to embed an object by reference) and use that sequence after a White flag symbol with a joiner.

 
The URN scheme being the best long term solution (and preferable to URLs bound to specific servers), but we could in fact a generic URI encapsulation (supporting URNs and URLs).

 
It could be used then for representing various kinds of entities, and then link them to specific forms: flags, banners, flying flag, flag over a person face, micni location maps, "flag maps"... Programs not recognizing the encoded entities would have a very simply way to scan over the encasulated URI representing some an specified objects. OTher programs will recognize some specific URI schemes. RIS will then be something of the past, obsoleted because it was non neutral, politcally and culturally oriented, incomplete, and fundamentally unstable since the begining... For now we just have some set of flags promoted only to support the immediate support for interconnecting propriatary messaging services. But all this came without a correct review of what was really needed.

 
2015-07-02 7:16 GMT+02:00 Mark Davis ☕️ <mark at macchiato.com>:

Please take political discussions elsewhere; they do not belong on this list.

 
The point about the boundaries of regions changing over time, and flags being associated with a former set of boundaries could have been made in a few sentences. Not only would it have avoided politics, it would have been more likely that people would actually read it (the likelihood being inversely proportional to the length).


Mark <https://google.com/+MarkDavis> 

 
« Il meglio è l’inimico del bene » (The best is the enemy of the good.)

 
On Thu, Jul 2, 2015 at 4:12 AM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:


 

From charupdate at orange.fr  Thu Jul  2 06:58:40 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 2 Jul 2015 13:58:40 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
Message-ID: <687953989.15645.1435838320486.JavaMail.www@wwinf1m18>

This message originally contained one pasted screenshot and several attached screenshots, which prevented it from being forwarded to the List. I removed them all; for the screenshots, readers may refer to the links I added in the e-mail I resent today to Khaled Hosny.



On Tue, Jun 30, 2015, Doug Ewell  wrote:

> Khaled Hosny wrote:
> 
> >> On my netbook, which is running Windows 7 Starter, U+2060 is not a
> >> part of any of the shipped fonts.
> >
> > It is a control character, it does not need to have a glyph in the
> > font to be properly supported.

Thank you Khaled, I will respond soon after this.

> The problem is the word "supported." Marcel is seeing a visible glyph (a
> .notdef box) for what is supposed to be an invisible, zero-width
> character, and that is leading him to conclude that Windows doesn't
> "support" this character.

The .notdef box is exactly what I sometimes see in Notepad, and every time in the Word dialogs, when I use U+2060. But what I see in the document itself is a particular glyph: a tall, full-height empty box with a wide space to its right, even though the font is proportional; in Notepad text it is the same box without the space. Only when I switch the font to the one you indicate below does the word joiner display correctly in my version of Microsoft Word. Please see the attached screenshots (I wanted to paste them into this e-mail).

> On my Win 7 machine at work, when I enter the string "one⁠two"
> ("one\u2060two") and click on either word, both words are selected. That
> is exactly what I would expect WJ to do. This works on the built-in
> Notepad as well as Notepad++ and BabelPad (but not on GoDaddy's
> Web-based email client).

Selection by double-click corresponds to what Richard did with the quick cursor move. These behaviors are text-processing features that give little evidence of the presence or absence of word boundaries. So I redid your test using the search tool, with the "Whole words only" option enabled. This gives an idea of how the application perceives words as entities, or rather, how developers expect users to expect search results. That isn't really a better way to put it... What I want to say is that what we see is normally what we are expected to expect. Personally, I wouldn't want only part of a compound selected when I most probably want to mark it up as a whole, and neither would you, Doug. This is why double-clicking anywhere in the sequence selects the whole sequence. By contrast, since we took care to insert word joiners where we normally aren't expected to (simply typing the words one after another with nothing in between is enough to get them as *one* word), the software engineers assume we want to join what must remain a sequence of separate words. Consequently, the built-in search engine recognizes each word as a word in itself.
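
Whether the string one<U+2060>two counts as one word or two depends on the segmentation rules a tool applies. As a rough Python sketch (the helper names are hypothetical, and this is a simplification, not a full UAX #29 implementation), a tokenizer that keeps only word characters splits at the WORD JOINER, while one following the spirit of UAX #29 rule WB4, under which format characters never introduce a word boundary, keeps the sequence together.

```python
import re
import unicodedata


def naive_words(text: str) -> list[str]:
    """Split on anything that is not a word character (WJ is not one)."""
    return re.findall(r"\w+", text)


def _attaches(ch: str) -> bool:
    # Rough stand-in for Word_Break=Format (plus ZWNJ/ZWJ): every Cf character
    # attaches, except U+200B ZERO WIDTH SPACE, which UAX #29 does not treat
    # as Format. Combining marks (Extend) are not handled in this sketch.
    return unicodedata.category(ch) == "Cf" and ch != "\u200B"


def words_wb4(text: str) -> list[str]:
    """Words where format characters never create a boundary (cf. UAX #29 WB4)."""
    words, current = [], ""
    for ch in text:
        if ch.isalnum() or ch == "_" or (current and _attaches(ch)):
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


sample = "one\u2060two"  # "one" WORD JOINER "two"
print(unicodedata.name("\u2060"), unicodedata.category("\u2060"))  # WORD JOINER Cf
print(naive_words(sample))                                        # ['one', 'two']
print([w.replace("\u2060", "<WJ>") for w in words_wb4(sample)])   # ['one<WJ>two']
```

Which of the two a given "Whole words only" search follows is an application choice; the sketch only makes the difference visible.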

This is where good software shows its benefits. Some software does not recognize the ZWNBSP or the NBSP (I don't know which, or both) as indicating the presence of a word boundary, and therefore does not work correctly. It also depends on the PDF conversion tool. Please check the screenshots (I switched the UIs to English wherever possible, that is, in LibreOffice). [This e-mail was blocked because it contained several attached screenshots, so I am resending it without the images.]

> But out of more than 500 fonts on that machine, the only stock Microsoft
> fonts that show WJ with zero-width, instead of a .notdef glyph, are
> Javanese Text, Myanmar Text, and Segoe UI Symbol. So while it's
> inaccurate to extrapolate this to "Microsoft doesn't support WJ," the
> font support is definitely lacking.

I wish to thank you personally, Doug, for this very valuable hint. Indeed, in Microsoft Word 2010 Starter on Windows 7 Starter, the WJ is not displayed correctly unless the font is switched to Segoe UI Symbol (the only one of the three that shipped with my OS). If the Segoe typeface is not appropriate for the document, we can have Word find all instances of U+2060 and replace them with the same character formatted in Segoe UI Symbol. This may be what Word users are expected to do every time, even if that isn't really what we expect of a productivity suite. Perhaps, or most probably, this problem does not occur in other high-end software such as Microsoft Publisher (this needs to be confirmed). But anyone who buys Microsoft Office Premium or Professional should be safe from this malfunction. So should everybody using Microsoft software, in fact.

> The bit about characters being converted to other characters, of course,
> has nothing to do with Windows and everything to do with particular
> applications.

Based on this hint, I did more tests and found that, for a proper conversion to plain text, any segment containing U+00A0, U+FEFF, or other format characters that is copied from a Microsoft Word document must first be pasted into a LibreOffice document, then copied again, and only then pasted into the text editor. I should avoid venting further about this issue and had better wait for official comments; I simply suppose that there is an algorithm (presumably part of Microsoft Word) that detects where the clipboard item is going and possibly destroys the format characters. Everybody can guess for what purpose...
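
To check which invisible characters survive a copy-and-paste round trip like the one described above, one can list them before and after. A minimal Python sketch; the function name is hypothetical, and the "after" string is a made-up example, not a measured result from Word or LibreOffice.

```python
import unicodedata


def invisible_inventory(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each format (Cf) character and
    each space separator (Zs) other than U+0020."""
    found = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat == "Cf" or (cat == "Zs" and ch != " "):
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


before = "a\u00A0b\uFEFFc\u2060d"
for entry in invisible_inventory(before):
    print(entry)
# (1, 'U+00A0', 'NO-BREAK SPACE')
# (3, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')
# (5, 'U+2060', 'WORD JOINER')

# Hypothetical pasted result in which both zero-width characters were lost:
after = "a\u00A0bcd"
print(invisible_inventory(before) == invisible_inventory(after))  # False
```

Comparing the two lists shows exactly which code points were dropped or substituted along the way, independently of how the fonts render them.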

Thanks a lot!

Marcel


[originally one pasted screenshot]


From mark at macchiato.com  Thu Jul  2 07:05:44 2015
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Thu, 2 Jul 2015 14:05:44 +0200
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
Message-ID: <CAJ2xs_HtQ1MizmPN-=gvxcJZkQ5pW0CU++cuPUm5T+jRmjB3XA@mail.gmail.com>

Ok. I wasn't clear enough. Certainly boundaries are political and relevant,
as is the fact that they change. What is not relevant is talking about
particular country's motivations and actions.

Moreover, you insist on writing a tome about this. In other words, TL;DR.

Mark <https://google.com/+MarkDavis>

*« Il meglio è l’inimico del bene »*

On Thu, Jul 2, 2015 at 11:01 AM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> The political subject is immediately related to the designation of flags
> and their association to ISO 3166-1 and -2 encoded entities. Even if you
> don't like it, this is very political, and for a standard seeking
> stability, I wonder how any flag (directly bound to specific political
> entities at specific dates and within some boundaries which may be
> contested) can be related to ISO 3166 and its instability (and the fact
> that ISO 3166 entities have in fact also no defined borders, so that ISO
> 3166-2 is just a political point of view from the current ruler of the
> current ISO 3166-1 entity).
>
> All this topic is political. In fact the real flags are not even encoded
> with RIS, not even for current nations (and there's still a problem to know
> what is a recognized nation, even when just considering the UN definition.
> Political entities are defined but with fuzzy borders, they just represent
> in fact some local governments, not necessarily their lands, people, or
> cultures, and in some cases they are in exile or not even ruling: their seat
> in the UN is vacant and they exist only on paper, but even UN members
> disagree about which treaty they recognize).
>
...

From verdy_p at wanadoo.fr  Thu Jul  2 07:20:50 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 2 Jul 2015 14:20:50 +0200
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CAJ2xs_HtQ1MizmPN-=gvxcJZkQ5pW0CU++cuPUm5T+jRmjB3XA@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <CAJ2xs_HtQ1MizmPN-=gvxcJZkQ5pW0CU++cuPUm5T+jRmjB3XA@mail.gmail.com>
Message-ID: <CAGa7JC0q79iQj7Ui9TBab9MacSOCV0qzm6NYLBWKY+n37qhg9g@mail.gmail.com>

It was not just about that, but about the fact that nothing is solved. For
things that Unicode does not want to support, there should be a better way,
using existing standards, to bind some object to semantics taken from a
blind but easily parsable object (here a URI, without the need to reinvent
a way to encode it: just a plain URI surrounded by a pair of
controls). There would then be no need to describe what is in that URI; it
would just need to be interpreted as a unique identifier within some namespace.
With that, it would be possible to create catalogs and standardize a few of
them. The system would not be limited to geopolitical entities. And nobody
would need to support all the namespaces, or even to perform any external
query to some rogue server delivering malicious content. The URI could
still embed a small image using the "data:" URI scheme.
I also criticize the use of RIS to describe a "standard" feature in
the UCS, when they will be bound to unstable ISO standards which are
already politically biased. RIS was a bad choice the way it was specified,
and even its specification does not fully conform to these ISO standards.
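
The point that a renderer need not support every namespace, nor query any server, can be sketched as a purely local policy check. In this Python sketch the set of known URN namespaces and the returned labels are hypothetical; only the URI syntax (scheme, URN namespace identifier, data: media type) comes from the relevant standards, and `urllib.parse.urlsplit` is used only for parsing, never for fetching.

```python
from urllib.parse import urlsplit

# HYPOTHETICAL renderer policy: only these URN namespaces have a local
# catalog; nothing is ever fetched over the network.
KNOWN_URN_NAMESPACES = {"example"}


def classify(uri: str) -> str:
    """Decide how a renderer would treat an embedded URI."""
    parts = urlsplit(uri)  # the scheme comes back lowercased
    if parts.scheme == "urn":
        nid = parts.path.split(":", 1)[0].lower()  # URN namespace identifier
        return "catalog" if nid in KNOWN_URN_NAMESPACES else "fallback"
    if parts.scheme == "data":
        # data:[<media type>][;base64],<data>
        return "inline-image" if parts.path.lower().startswith("image/") else "fallback"
    # Any other scheme (http, https, ...) is treated as opaque text.
    return "fallback"


for uri in ("urn:example:flag:aq",
            "data:image/png;base64,iVBORw0KGgo=",
            "https://example.org/flag.png"):
    print(uri, "->", classify(uri))
```

Anything the renderer does not recognize degrades to the same fallback, which is what keeps unknown namespaces harmless.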

2015-07-02 14:05 GMT+02:00 Mark Davis ☕️ <mark at macchiato.com>:

> Ok. I wasn't clear enough. Certainly boundaries are political and
> relevant, as is the fact that they change. What is not relevant is talking
> about particular country's motivations and actions.
>
> Moreover, you insist on writing a tome about this. In other words,
> TL;DR.
>
> Mark <https://google.com/+MarkDavis>
>
> *« Il meglio è l’inimico del bene »*
>
> On Thu, Jul 2, 2015 at 11:01 AM, Philippe Verdy <verdy_p at wanadoo.fr>
> wrote:
>
>> The political subject is immediately related to the designation of flags
>> and their association to ISO 3166-1 and -2 encoded entities. Even if you
>> don't like it, this is very political, and for a standard seeking
>> stability, I wonder how any flag (directly bound to specific political
>> entities at specific dates and within some boundaries which may be
>> contested) can be related to ISO 3166 and its instability (and the fact
>> that ISO 3166 entities have in fact also no defined borders, so that ISO
>> 3166-2 is just a political point of view from the current ruler of the
>> current ISO 3166-1 entity).
>>
>> All this topic is political. In fact the real flags are not even encoded
>> with RIS, not even for current nations (and there's still a problem to know
>> what is a recognized nation, even when just considering the UN definition.
>> Political entities are defined but with fuzzy borders, they just represent
>> in fact some local governments, not necessarily their lands, people, or
>> cultures, and in some cases they are in exile or not even ruling: their seat
>> in the UN is vacant and they exist only on paper, but even UN members
>> disagree about which treaty they recognize).
>>
> ...
>
>

From nslater at tumbolia.org  Thu Jul  2 07:33:03 2015
From: nslater at tumbolia.org (Noah Slater)
Date: Thu, 02 Jul 2015 12:33:03 +0000
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CAGa7JC0q79iQj7Ui9TBab9MacSOCV0qzm6NYLBWKY+n37qhg9g@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <CAJ2xs_HtQ1MizmPN-=gvxcJZkQ5pW0CU++cuPUm5T+jRmjB3XA@mail.gmail.com>
 <CAGa7JC0q79iQj7Ui9TBab9MacSOCV0qzm6NYLBWKY+n37qhg9g@mail.gmail.com>
Message-ID: <CA+Y+444dKoh28JsNi3o=xn-HrXhr9-40PU8LOs6nog8U_9-x-A@mail.gmail.com>

Correct me if I'm wrong, but it seems like Philippe's core argument is that
geopolitical entities and flags (as specific instances of a design, in
the heraldic sense) are disjoint. And that using geopolitical codes to
refer to these designs is inherently unstable.

On Thu, 2 Jul 2015 at 13:26 Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> It was not just about it but on the fact that nothing is solved and for
> things that Unicode does not want to support, there should be a better way
> using existing standards to bind some object with semantics taken from a
> blind but easily parsable object (here a URI, without the need to reinvent
> a way to encode it, just a plain URI just surrounded by a couple of
> controls). No need then to describe what will be in that URI, it will just
> need to be interpreted as a unique identifier within some namespace.
> With that it will be possible to create catalogs and standardize a few of
> them. The system will not be limited to geopolitical entities. And nobody
> will need to support all the namespaces or even to perform any external
> query to some rogue server delivering malicious content. The URI could
> still embed a small image using the "data:" URI scheme.
> Also I criticize the fact of using RIS to describe a "standard" feature in
> the UCS, when they will be bound to unstable ISO standards which are
> already politically biased. RIS was a bad choice the way it was specified,
> and even its specification does not fully conform to these ISO standards.
>
> 2015-07-02 14:05 GMT+02:00 Mark Davis ☕️ <mark at macchiato.com>:
>
>> Ok. I wasn't clear enough. Certainly boundaries are political and
>> relevant, as is the fact that they change. What is not relevant is talking
>> about particular country's motivations and actions.
>>
>> Moreover, you insist on writing a tome about this. In other words,
>> TL;DR.
>>
>> Mark <https://google.com/+MarkDavis>
>>
>> *« Il meglio è l’inimico del bene »*
>>
>> On Thu, Jul 2, 2015 at 11:01 AM, Philippe Verdy <verdy_p at wanadoo.fr>
>> wrote:
>>
>>> The political subject is immediately related to the designation of flags
>>> and their association to ISO 3166-1 and -2 encoded entities. Even if you
>>> don't like it, this is very political, and for a standard seeking
>>> stability, I wonder how any flag (directly bound to specific political
>>> entities at specific dates and within some boundaries which may be
>>> contested) can be related to ISO 3166 and its instability (and the fact
>>> that ISO 3166 entities have in fact also no defined borders, so that ISO
>>> 3166-2 is just a political point of view from the current ruler of the
>>> current ISO 3166-1 entity).
>>>
>>> All this topic is political. In fact the real flags are not even encoded
>>> with RIS, not even for current nations (and there's still a problem to know
>>> what is a recognized nation, even when just considering the UN definition.
>>> Political entities are defined but with fuzzy borders, they just represent
>>> in fact some local governments, not necessarily their lands, people, or
>>> cultures, and in some cases they are in exile or not even ruling: their seat
>>> in the UN is vacant and they exist only on paper, but even UN members
>>> disagree about which treaty they recognize).
>>>
>> ...
>>
>>
>
>

From doug at ewellic.org  Thu Jul  2 08:04:13 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 2 Jul 2015 07:04:13 -0600
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <CA+Y+444dKoh28JsNi3o=xn-HrXhr9-40PU8LOs6nog8U_9-x-A@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <CAJ2xs_HtQ1MizmPN-=gvxcJZkQ5pW0CU++cuPUm5T+jRmjB3XA@mail.gmail.com>
 <CAGa7JC0q79iQj7Ui9TBab9MacSOCV0qzm6NYLBWKY+n37qhg9g@mail.gmail.com>
 <CA+Y+444dKoh28JsNi3o=xn-HrXhr9-40PU8LOs6nog8U_9-x-A@mail.gmail.com>
Message-ID: <417C8958513D4476884721AB0881975A@DougEwell>

Noah Slater wrote:

> Correct me if I'm wrong, but it seems like Philippe's core argument is
> that geopolitical entities and flags (as specific instances of a
> design, in the heraldic sense) are disjoint. And that using
> geopolitical codes to refer to these designs is inherently unstable.

But the only alternative is to encode about 200 discrete emoji for what 
we think of as "country" flags, plus somewhere between 0 and 5000 for 
flags of what we think of as "subdivisions."

And in the end, when users see these emoji, they will still think "Oh, 
that's the US flag" or "the French flag" or "the Japanese flag" or 
whatever. They will still associate them with geopolitical entities. 
That's the whole purpose of such flags.

(Either that or they will associate them with languages, which is far 
more unstable than anything else being discussed here.)

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From doug at ewellic.org  Thu Jul  2 08:12:23 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 2 Jul 2015 07:12:23 -0600
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
Message-ID: <ADFAED45AB884663A541DF939130216D@DougEwell>

I wrote:

> But the only alternative is to encode about 200 discrete emoji [...]

Here I am assuming that UTC will not shift gears and approve an 
"embedded URI" scheme, which sounds way too much like localizable 
you-know-whats.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From charupdate at orange.fr  Thu Jul  2 03:37:17 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 2 Jul 2015 10:37:17 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net>
References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net>
Message-ID: <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18>

On Tue, Jun 30, 2015, Doug Ewell  wrote:

> Khaled Hosny wrote:
> 
> >> On my netbook, which is running Windows 7 Starter, U+2060 is not a
> >> part of any of the shipped fonts.
> >
> > It is a control character, it does not need to have a glyph in the
> > font to be properly supported.

Thank you Khaled, I will respond soon after this.

> The problem is the word "supported." Marcel is seeing a visible glyph (a
> .notdef box) for what is supposed to be an invisible, zero-width
> character, and that is leading him to conclude that Windows doesn't
> "support" this character.

The .notdef box is exactly what I sometimes see in Notepad, and every time in the Word dialogs, when I use U+2060. But what I see in the document itself is a particular glyph: a tall, full-height empty box with a wide space to its right, even though the font is proportional; in Notepad text it is the same box without the space. Only when I switch the font to the one you indicate below does the word joiner display correctly in my version of Microsoft Word. Please see the attached screenshots (I wanted to paste them into this e-mail).

> On my Win 7 machine at work, when I enter the string "one⁠two"
> ("one\u2060two") and click on either word, both words are selected. That
> is exactly what I would expect WJ to do. This works on the built-in
> Notepad as well as Notepad++ and BabelPad (but not on GoDaddy's
> Web-based email client).

Selection by double-click corresponds to what Richard did with the quick cursor move. These behaviors are text-processing features that give little evidence of the presence or absence of word boundaries. So I redid your test using the search tool, with the "Whole words only" option enabled. This gives an idea of how the application perceives words as entities, or rather, how developers expect users to expect search results. That isn't really a better way to put it... What I want to say is that what we see is normally what we are expected to expect. Personally, I wouldn't want only part of a compound selected when I most probably want to mark it up as a whole, and neither would you, Doug. This is why double-clicking anywhere in the sequence selects the whole sequence. By contrast, since we took care to insert word joiners where we normally aren't expected to (simply typing the words one after another with nothing in between is enough to get them as *one* word), the software engineers assume we want to join what must remain a sequence of separate words. Consequently, the built-in search engine recognizes each word as a word in itself.

This is where good software shows its benefits. Some software does not recognize the ZWNBSP or the NBSP (I don't know which, or both) as indicating the presence of a word boundary, and therefore does not work correctly. It also depends on the PDF conversion tool. Please check the screenshots (I switched the UIs to English wherever possible, that is, in LibreOffice).

> But out of more than 500 fonts on that machine, the only stock Microsoft
> fonts that show WJ with zero-width, instead of a .notdef glyph, are
> Javanese Text, Myanmar Text, and Segoe UI Symbol. So while it's
> inaccurate to extrapolate this to "Microsoft doesn't support WJ," the
> font support is definitely lacking.

I wish to thank you personally, Doug, for this very valuable hint. Indeed, in Microsoft Word 2010 Starter on Windows 7 Starter, the WJ is not displayed correctly unless the font is switched to Segoe UI Symbol (the only one of the three that shipped with my OS). If the Segoe typeface is not appropriate for the document, we can have Word find all instances of U+2060 and replace them with the same character formatted in Segoe UI Symbol. This may be what Word users are expected to do every time, even if that isn't really what we expect of a productivity suite. Perhaps, or most probably, this problem does not occur in other high-end software such as Microsoft Publisher (this needs to be confirmed). But anyone who buys Microsoft Office Premium or Professional should be safe from this malfunction. So should everybody using Microsoft software, in fact.

> The bit about characters being converted to other characters, of course,
> has nothing to do with Windows and everything to do with particular
> applications.

Based on this hint, I did more tests and found that, for a proper conversion to plain text, any segment containing U+00A0, U+FEFF, or other format characters that is copied from a Microsoft Word document must first be pasted into a LibreOffice document, then copied again, and only then pasted into the text editor. I should avoid venting further about this issue and had better wait for official comments; I simply suppose that there is an algorithm (presumably part of Microsoft Word) that detects where the clipboard item is going and possibly destroys the format characters. Everybody can guess for what purpose...

Thanks a lot!

Marcel



From charupdate at orange.fr  Thu Jul  2 04:39:54 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 2 Jul 2015 11:39:54 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
Message-ID: <1396690673.11696.1435829994863.JavaMail.www@wwinf1m18>

On Tue, Jun 30, 2015, Doug Ewell  wrote:

> Khaled Hosny wrote:
> 
> >> On my netbook, which is running Windows 7 Starter, U+2060 is not a
> >> part of any of the shipped fonts.
> >
> > It is a control character, it does not need to have a glyph in the
> > font to be properly supported.

Thank you Khaled, I will respond soon after this.

> The problem is the word "supported." Marcel is seeing a visible glyph (a
> .notdef box) for what is supposed to be an invisible, zero-width
> character, and that is leading him to conclude that Windows doesn't
> "support" this character.

The .notdef box is exactly what I sometimes see in Notepad, and every time in the Word dialogs, when I use U+2060. But what I see in the document itself is a particular glyph: a tall, full-height empty box with a wide space to its right, even though the font is proportional; in Notepad text it is the same box without the space. Only when I switch the font to the one you indicate below does the word joiner display correctly in my version of Microsoft Word. Please see the attached screenshots (I wanted to paste them into this e-mail).

> On my Win 7 machine at work, when I enter the string "one⁠two"
> ("one\u2060two") and click on either word, both words are selected. That
> is exactly what I would expect WJ to do. This works on the built-in
> Notepad as well as Notepad++ and BabelPad (but not on GoDaddy's
> Web-based email client).

Selection by double-click corresponds to what Richard did with the quick cursor move. These behaviors are text-processing features that give little evidence of the presence or absence of word boundaries. So I redid your test using the search tool, with the "Whole words only" option enabled. This gives an idea of how the application perceives words as entities, or rather, how developers expect users to expect search results. That isn't really a better way to put it... What I want to say is that what we see is normally what we are expected to expect. Personally, I wouldn't want only part of a compound selected when I most probably want to mark it up as a whole, and neither would you, Doug. This is why double-clicking anywhere in the sequence selects the whole sequence. By contrast, since we took care to insert word joiners where we normally aren't expected to (simply typing the words one after another with nothing in between is enough to get them as *one* word), the software engineers assume we want to join what must remain a sequence of separate words. Consequently, the built-in search engine recognizes each word as a word in itself.

This is where good software shows its benefits. Some software does not recognize the ZWNBSP or the NBSP (I don't know which, or both) as indicating the presence of a word boundary, and therefore does not work correctly. It also depends on the PDF conversion tool. Please check the screenshots (I switched the UIs to English wherever possible, that is, in LibreOffice). [This e-mail was blocked because it contained several attached screenshots, so I am resending it without the images.]

> But out of more than 500 fonts on that machine, the only stock Microsoft
> fonts that show WJ with zero-width, instead of a .notdef glyph, are
> Javanese Text, Myanmar Text, and Segoe UI Symbol. So while it's
> inaccurate to extrapolate this to "Microsoft doesn't support WJ," the
> font support is definitely lacking.

I wish to thank you personally, Doug, for this very valuable hint. Indeed, in Microsoft Word 2010 Starter on Windows 7 Starter, the WJ is not displayed correctly unless the font is switched to Segoe UI Symbol (the only one of the three that shipped with my OS). If the Segoe typeface is not appropriate for the document, we can have Word find all instances of U+2060 and replace them with the same character formatted in Segoe UI Symbol. This may be what Word users are expected to do every time, even if that isn't really what we expect of a productivity suite. Perhaps, or most probably, this problem does not occur in other high-end software such as Microsoft Publisher (this needs to be confirmed). But anyone who buys Microsoft Office Premium or Professional should be safe from this malfunction. So should everybody using Microsoft software, in fact.

> The bit about characters being converted to other characters, of course,
> has nothing to do with Windows and everything to do with particular
> applications.

Based on this hint, I did more tests and found that, for a proper conversion to plain text, any segment including U+00A0, U+FEFF and other format characters, when copied from a Microsoft Word document, must first be pasted into a LibreOffice document, then copied again, and finally pasted into the text editor. I should avoid venting further about that issue, and I'd better wait for official comments; I simply suppose that there is an algorithm (say, as part of Microsoft Word) that detects where the clipboard item is going and possibly destroys the format characters. Everybody can guess to what end...
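A simple way to find out whether NBSP, ZWNBSP or WJ survived a copy-and-paste round trip is to list the pasted text code point by code point. The Python sketch below is only such a diagnostic aid; the helper name and the set of watched characters are illustrative choices, and it says nothing about how Word or LibreOffice actually handle the clipboard.

```python
import unicodedata
from typing import List, Tuple

# Characters discussed in this thread. NBSP is a space separator (Zs);
# ZWNBSP and WJ are format characters (Cf).
WATCHED = {"\u00a0", "\ufeff", "\u2060"}


def report_special_chars(text: str) -> List[Tuple[int, str, str]]:
    """List (index, 'U+XXXX', character name) for watched or Cf characters."""
    found = []
    for index, ch in enumerate(text):
        if ch in WATCHED or unicodedata.category(ch) == "Cf":
            code = f"U+{ord(ch):04X}"
            found.append((index, code, unicodedata.name(ch, "<unnamed>")))
    return found


if __name__ == "__main__":
    before_paste = "a\u00a0b\u2060c"  # text as typed in the source document
    after_paste = "a b c"              # hypothetical result of a lossy paste
    print(report_special_chars(before_paste))
    print(report_special_chars(after_paste))
```

Comparing the two reports shows at a glance which characters were lost or replaced in transit.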

Thanks a lot!

Marcel


[one pasted screenshot]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/c87a93b7/attachment.html>

From doug at ewellic.org  Thu Jul  2 09:54:28 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 02 Jul 2015 07:54:28 -0700
Subject: Representing Additional Types of Flags
Message-ID: <20150702075428.665a7a7059d7ee80bb4d670165c8327d.f848ab7c97.wbe@email03.secureserver.net>

There must be a problem with my browser. When it displays the PRI #299
background document, there is text about using CLDR entities to define
regions and subdivisions, to preclude stability problems in ISO 3166-1.
Apparently that text doesn't appear on other people's browsers.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From doug at ewellic.org  Thu Jul  2 10:07:34 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 02 Jul 2015 08:07:34 -0700
Subject: Representing Additional Types of Flags
Message-ID: <20150702080734.665a7a7059d7ee80bb4d670165c8327d.ee02e411cb.wbe@email03.secureserver.net>

Also posted as formal feedback to the PRI:

6. What is the policy on generating flag tags with unicode_region_subtag
values corresponding to private-use BCP 47 subtags, other than those
given special semantics by CLDR? Are they invalid or merely discouraged?
Should tools allow users to create such a tag? Is there any provision
for a "private agreement," similar to that defined in Unicode for PUA
usage?


--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From mark at macchiato.com  Thu Jul  2 11:10:50 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Thu, 2 Jul 2015 18:10:50 +0200
Subject: Representing Additional Types of Flags
In-Reply-To: <20150630145719.665a7a7059d7ee80bb4d670165c8327d.06f042790e.wbe@email03.secureserver.net>
References: <20150630145719.665a7a7059d7ee80bb4d670165c8327d.06f042790e.wbe@email03.secureserver.net>
Message-ID: <CAJ2xs_Gx5tUut9_CXdRaAfvR2PNutUjnk6xV5Ac=KNiaiCd17w@mail.gmail.com>

I'll try to answer a few of these.


Mark <https://google.com/+MarkDavis>

*« Il meglio è l’inimico del bene »* (The best is the enemy of the good)

On Tue, Jun 30, 2015 at 11:57 PM, Doug Ewell <doug at ewellic.org> wrote:

> Re-posting my comments and questions on this PRI to the list. I've
> already submitted them as formal feedback.
>
> .
>
> I support this proposal. I have the following questions:
>
> 1. The existing RIS-based flag mechanism is based on ISO 3166-1 (TUS 7.0
> §22.10). In this proposal, "valid" tag sequences would instead be
> determined by CLDR data and LDML specification. Is there any precedent
> for CLDR to define the validity of Unicode character sequences?
>

We already have, in tr51, the unicode_region_codes being used for validity
testing of flags:
http://unicode.org/reports/tr51/#Encoding
http://unicode.org/reports/tr51/#Flags

Those are typically the same as the ISO codes, but do add XK
http://unicode.org/reports/tr35/#unicode_region_subtag


> 2. What is the policy on generating flag tags with deprecated
> unicode_region_subtag or unicode_subdivision_subtag values, such as
> "[flag]UK"? How "discouraged" would such a tag be? Should tools allow
> users to create such a tag?
>

CLDR treats UK as deprecated. When a code is deprecated, we strongly
discourage its use in new data, but normally allow it for old data. But the
UK is somewhat different, since it really shouldn't ever be valid as it
stands. The purpose for UK in CLDR metadata is so that locale ID
canonicalization can map en-UK (which occurs quite often) to en-GB, and so
on. (We do this also for overlong codes like eng-GB => en-GB.)
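The kind of canonicalization described here can be sketched in a few lines of Python. The alias tables below are hypothetical two-entry excerpts written for illustration, not CLDR's actual data files, and the function only handles simple language-REGION identifiers.

```python
# Hypothetical excerpts of alias data, for illustration only.
LANGUAGE_ALIASES = {"eng": "en"}  # overlong three-letter code -> two-letter code
REGION_ALIASES = {"UK": "GB"}     # deprecated region code -> replacement


def canonicalize_locale_id(locale_id: str) -> str:
    """Canonicalize a simple language[-REGION] identifier via alias tables."""
    parts = locale_id.replace("_", "-").split("-")
    language = parts[0].lower()
    language = LANGUAGE_ALIASES.get(language, language)
    subtags = []
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():  # treat as a region subtag
            part = part.upper()
            part = REGION_ALIASES.get(part, part)
        subtags.append(part)
    return "-".join([language] + subtags)


if __name__ == "__main__":
    for raw in ("en-UK", "eng-GB", "en-GB"):
        print(raw, "=>", canonicalize_locale_id(raw))  # each prints ... => en-GB
```

Keeping UK in the alias data serves exactly this mapping; it does not by itself make UK a valid region code for new data.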

But you're right; we need to be able to distinguish this case (and ones
like it). I filed
http://unicode.org/cldr/trac/ticket/8736


>
> 3. The subdivisions.xml file contains a "subtype" hierarchy, reflecting
> the "parent subdivision" relationship in ISO 3166-2. So region 'FR'
> contains subdivision 'J' (Île-de-France), which itself contains
> subdivision '75' (Paris). Is there any significance to the "subtype"
> hierarchy as far as flag tags are concerned, or are "[flag]FRJ" and
> "[flag]FR75" equally valid?
>

No, there isn't. But see also E.5 in
http://www.unicode.org/review/pri299/pri299-additional-flags-background.html
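For readers unfamiliar with TAG characters: the TAG block mirrors printable ASCII, so each tag is simply the ASCII value plus 0xE0000 (U+E0046 is TAG LATIN CAPITAL LETTER F, and so on), and the digits and capital letters fall inside the U+E0030..U+E005A range given in the PRI #299 announcement. The Python sketch below spells a code such as "FR75" with TAG characters. The "[flag]" prefix is kept as a caller-supplied placeholder, because this thread does not spell out which character, if any, introduces or terminates a flag tag.

```python
TAG_OFFSET = 0xE0000  # TAG characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def to_tag_chars(code: str) -> str:
    """Spell an ASCII code made of digits and capital letters with TAG characters."""
    if not code or not all(c.isascii() and (c.isdigit() or c.isupper()) for c in code):
        raise ValueError(f"expected ASCII digits and capital letters, got {code!r}")
    return "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def flag_tag(code: str, flag_prefix: str) -> str:
    """Prefix the TAG spelling with a placeholder standing in for "[flag]"."""
    return flag_prefix + to_tag_chars(code)


if __name__ == "__main__":
    tags = to_tag_chars("FR75")
    print([f"U+{ord(c):04X}" for c in tags])
    # ['U+E0046', 'U+E0052', 'U+E0037', 'U+E0035']
```

Whether "FRJ" and "FR75" are both valid is a question for the CLDR data, not for this spelling step.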


>
> 4. The entry for "001" in subdivisions.xml contains each of the
> two-letter codes for regions (countries) that have their own
> subdivisions. This is less than the set of all regions; for example,
> Anguilla (AI) does not have ISO 3166-2 subdivisions and so is not
> listed. This implies that a tag like "[flag]001US" is valid (and
> equivalent to "US" spelled with RIS, which is preferred) but
> "[flag]001AI" is not valid. Is this intended? If not, can it be
> clarified?
>

Good catch; the 001 shouldn't even exist in the subdivisionContainment.
This is now fixed in trunk.

(The subdivision addition will only be final in September, so feedback on
it now would be great. People can file tickets at
http://unicode.org/cldr/trac/newticket)


>
> 5. Will any preliminary examples of CLDR 4-character subdivision codes
> be made available before any such codes are actually assigned?
>

The only purpose for the 4-character subdivision codes is stability. So
let's suppose that Colorado decides to join Canada (thereby deprecating CO
in ISO 3166-2), and British Columbia decides to join the US (getting the
code CO in ISO 3166-2). In that case, CLDR would keep the old code CO (but
deprecated) and create a new 4-letter code for BC, such as XXCO. This is
just for illustration, of course; I've heard no rumors about either
political shift...


> .
>
> The PRI #299 mechanism is clearly and intentionally oriented toward
> representing flags of well-defined geopolitical entities.
>
> Any proposal to extend the mechanism to cover the many other types of
> flags -- for historical regions, NGOs, maritime, sports, or social or
> political causes -- must be systematic and well-planned, not ad-hoc or
> haphazard, to assure interoperability and extensibility.
>

Firmly agreed.


>
> The documentation for the PRI #299 mechanism should state clearly that
> (e.g.) the Confederate battle flag, the Olympic flag, the Esperanto
> flag, the LGBT rainbow flag, and the naval flags used to spell out
> "ENGLAND EXPECTS" can be represented only via a proper extension to the
> mechanism, not by ad-hoc means such as the use of unassigned or
> private-use combinations. This is at least as important as ensuring the
> stable coding of geopolitical flags.
>

Yes, again a good point.

>
> 6. What is the policy on generating flag tags with unicode_region_subtag
> values corresponding to private-use BCP 47 subtags, other than those
> given special semantics by CLDR? Are they invalid or merely discouraged?
> Should tools allow users to create such a tag? Is there any provision
> for a "private agreement," similar to that defined in Unicode for PUA
> usage?

We'll have to address that. My view is that they should not be valid: if
someone wants a PU flag, of any source, they have over 130,000 Unicode PU
characters to play with.
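The "over 130,000" figure is easy to check: the Unicode Standard has three private-use ranges, one in the BMP and two filling planes 15 and 16 (each minus its last two code points, which are noncharacters). A short Python check, which also cross-checks the arithmetic against the General_Category values in Python's unicodedata module:

```python
import unicodedata

# The three private-use ranges of the Unicode Standard (inclusive bounds).
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def count_from_ranges() -> int:
    """Sum the sizes of the private-use ranges."""
    return sum(high - low + 1 for low, high in PRIVATE_USE_RANGES)


def count_from_database() -> int:
    """Count code points whose General_Category is Co (Private_Use)."""
    return sum(1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Co")


if __name__ == "__main__":
    print(count_from_ranges())  # 6400 + 65534 + 65534 = 137468
    print(count_from_database())
```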



>
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/b6a9442a/attachment.html>

From kenwhistler at att.net  Thu Jul  2 12:04:38 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 02 Jul 2015 10:04:38 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
Message-ID: <55956F26.9030607@att.net>


On 7/2/2015 2:01 AM, Philippe Verdy wrote:
>
> The frozen status of Antarctica ...

... will be addressed separately by global warming. But be that as it may...

>
> In reality there's still no standard way to encode flags unambiguously 
> and in a stable way. We'd like to have FOTW (Flags of the World) 
> contributors to propose their own scheme. But it will not be 
> compatible with the current RIS solution or the proposed extension. If 
> ever such standard emerges, it will require encoding a new set of 
> characters.

The UTC is neither responsible for nor interested in a "standard way
to encode flags unambiguously". I suspect one of the reasons this
discussion is tending to derail into political topics and too much detail
about particular flags and their stability and the stability of geopolitical
entities they represent and yadda yadda, is that people seem ineluctably
drawn to the misapprehension that this is all about standard encoding
of flags.

It is not.

Rather, it is about a standard way to represent recognizable and
interchangeable emoji (colorful little pictographs) of flags, using
defined sequences of Unicode characters.

The existing mechanism using regional indicator symbol (RIS) pairs was
originally aimed at solving the following problems:

1. Enabling the reliable interchange of the legacy 10 flag emoji from
Japanese carrier sets.

2. Enabling the completion of the encoding of emoji to cover the rest
of the Japanese carrier sets without all progress dragging to a
complete halt as national bodies in SC2 would argue interminably over
a "standard way to encode flags unambiguously" in an ISO standard.

3. Dealing with the inevitable hue and cry: "China and Japan and the US
got their flag! Why can't I get my country's flag??!"

And it appears that the RIS mechanism succeeded spectacularly well in
addressing all of those design goals.

In the middle of last year, for example, there was a major media and
internet campaign to "encode the flag of India". Well, the RIS mechanism
handled the real issue there just fine -- when the new phones started
coming out with support for display and interchange of emoji for flags
using the RIS sequences, there was the emoji for the flag of India for
everybody to use. Problem solved.

And the problem which was solved was /not/ the determination that
the <1F1EE, 1F1F3> RIS sequence "IN" meant /precisely/ the current
national flag of India, the saffron, white and green tricolor with the
Ashoka Chakra, and *not* any other flag of India (the flag of the
Indian army, the flag of the Mughal Empire, the flag of British
India, etc.). The RIS sequence "IN" was just mapped to the colorful
little emoji glyph for the Indian flag that everybody wanted to interchange.
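The RIS spelling itself is purely mechanical: the 26 regional indicator symbols U+1F1E6..U+1F1FF correspond to the letters A..Z, so "IN" becomes <1F1EE, 1F1F3>. A minimal Python sketch of the mapping in both directions follows; as stressed above, it only produces the character sequence, and which flag glyph a font shows for it is up to the implementation.

```python
RIS_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def region_to_ris(region: str) -> str:
    """Spell a two-letter region code with regional indicator symbols."""
    if len(region) != 2 or not all("A" <= c <= "Z" for c in region):
        raise ValueError(f"expected two ASCII capital letters, got {region!r}")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in region)


def ris_to_region(seq: str) -> str:
    """Recover the letters from a sequence of regional indicator symbols."""
    letters = []
    for ch in seq:
        offset = ord(ch) - RIS_A
        if not 0 <= offset < 26:
            raise ValueError(f"U+{ord(ch):04X} is not a regional indicator symbol")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


if __name__ == "__main__":
    seq = region_to_ris("IN")
    print([f"{ord(c):X}" for c in seq])  # ['1F1EE', '1F1F3']
    print(ris_to_region(seq))            # IN
```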

The Unicode Standard is not a vexillology standard -- nor will it ever be.
It is a standard for the encoding and interchange of characters.

The *character* problem we are faced with here is that people want
to use and interchange colorful little emoji pictographs of various
flags in text streams. The RIS mechanism addresses a significant
part of that problem, but is not extensible to cover the full scope of the
demand.

And what is the scope of the additional demand?

1. The first part can be summed up as: *the flag of Scotland problem*.

In other words, there are a number of high visibility, high demand,
widely recognized /regional/ flags that would be interchanged as just
more emoji pictographs, if a mechanism for that were available.

People who want to use an emoji for the flag of Scotland just as
easily as someone can use an emoji for the flag of Great Britain
are not going to accept an argument that says, "Well, we can't do
that on your phones because there is no 3166-1 country code registered, so
we can't map a Scotland flag emoji glyph to a RIS pair."

Hence the PRI #299 proposal: for an extension mechanism that would
address the flag of Scotland problem in a generic and reasonably
stable way.

2. The second part can be summed up as: *the rainbow flag problem*.

In other words, there are a number of high visibility, high demand,
widely recognized /non-governmental/ flags that would be interchanged
as just more emoji pictographs, if a mechanism for that were available.

From the public's point of view, this is another no-brainer: if the
flag of Japan and the flag of Scotland, why not the rainbow flag??!
They aren't interested in the limitations of the underlying representation
mechanisms, nor should they be, IMO.

The problem the UTC faces here is that there are a number of
reasonable and popular candidates, which the rainbow flag amply
exemplifies, for more colorful little emoji pictographs for flags that
people would like to interchange -- but there is no obvious and
extensible way to do so reliably in terms of sequences of Unicode
characters in a plain text stream. The PRI #299 proposal does not
extend into this realm, for many of the reasons pointed
out by Doug Ewell.

There are a number of potential approaches to address the rainbow
flag problem. For example:

a. use private-use characters
b. pursue one-by-one encoding of each newly desired flag pictograph as a symbol
c. extend the unicode_region_subtag and unicode_subdivision_subtag
scheme in CLDR to add some new subtag addressing a separate,
non-geopolitical hierarchy
d. create a separate extension using TAG characters but with a
syntax not dependent on CLDR subtag definitions
e. create a registry of flag entities suitable for representation as
emoji, together with a "c" or "d" style syntax
f. something else?
g. do nothing (and perhaps hope that stickers will solve the problem)

If we are to make any progress here in addressing the actual scope
of "the rainbow flag problem", I suggest we focus on the details and
pros and cons of suggestions like those of a through g above, rather than
pursuing more discussion recapitulating the history of the borders of
Tibet -- which truly are out of scope here.

--Ken


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/e1072797/attachment.html>

From doug at ewellic.org  Thu Jul  2 12:33:30 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 02 Jul 2015 10:33:30 -0700
Subject: Representing Additional Types of Flags
Message-ID: <20150702103330.665a7a7059d7ee80bb4d670165c8327d.b24129c345.wbe@email03.secureserver.net>

Mark Davis ☕️ <mark at macchiato dot com> wrote:

>> Is there any precedent for CLDR to define the validity of Unicode
>> character sequences?
>
> We already have, in tr51, the unicode_region_codes being used for
> validity testing of flags:
> http://unicode.org/reports/tr51/#Encoding
> http://unicode.org/reports/tr51/#Flags

the second of which (Annex B) says:

"The valid region sequences are specified by Unicode region subtags as
defined in [CLDR], excluding those that are designated private-use or
deprecated in [CLDR]."

In that case, the wording in TUS needs to be corrected, because TUS 7.0
§22.10 says:

"The regional indicator symbols in the range U+1F1E6..U+1F1FF can be
used in pairs to represent an ISO 3166 region code."

It doesn't say anything about valid pairs being defined by CLDR instead
of ISO. I wonder how many users actually know this.

> Those are typically the same as the ISO codes, but do add XK
> http://unicode.org/reports/tr35/#unicode_region_subtag

So QO, QU, and ZZ would be excluded, since those are private-use in BCP
47 and hence also in CLDR. But XK is included, even though it is also
private-use. Is this correct? Can an application tell that XK is in and
the others are out, just by looking at CLDR data?
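The question can be made concrete with a sketch of the Annex B exclusion rule. The private-use pattern comes from BCP 47, which reserves AA, QM..QZ, XA..XZ and ZZ; the exception set (XK) and the deprecated set (UK) are hand-written assumptions taken from this thread, which is exactly the information an application would instead need to obtain from CLDR data. The sketch also does not check that a code is actually assigned.

```python
import re

# Region subtags that BCP 47 reserves for private use.
PRIVATE_USE = re.compile(r"(?:AA|Q[M-Z]|X[A-Z]|ZZ)")

# Hand-written assumptions for illustration, not read from CLDR:
PRIVATE_USE_EXCEPTIONS = {"XK"}  # private-use in BCP 47 but used by CLDR
DEPRECATED = {"UK"}              # deprecated in CLDR


def is_valid_flag_region(code: str) -> bool:
    """Apply the 'exclude private-use and deprecated' rule to a region code.

    Only the exclusions are checked; a real implementation would also
    require the code to appear in CLDR's list of region codes.
    """
    if not re.fullmatch(r"[A-Z]{2}", code):
        return False
    if code in DEPRECATED:
        return False
    if PRIVATE_USE.fullmatch(code) and code not in PRIVATE_USE_EXCEPTIONS:
        return False
    return True


if __name__ == "__main__":
    for code in ("IN", "XK", "QO", "QU", "ZZ", "UK"):
        print(code, is_valid_flag_region(code))
```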

Also, I assume all of the same include/exclude rules apply both to RIS
combinations and to PRI #299-style flag tags. Please let me know if
that's not true.

> CLDR treats UK as deprecated.
> [...]
> But you're right; we need to be able to distinguish this case (and
> ones like it.) I filed
> http://unicode.org/cldr/trac/ticket/8736

OK, so UK is not valid in RIS combinations or flag tags either. Glad to
see that clarified.

>> Is there any significance to the "subtype" hierarchy as far as flag
>> tags are concerned, or are "[flag]FRJ" and "[flag]FR75" equally
>> valid?
>
> No, there isn't. But see also E.5 in
> http://www.unicode.org/review/pri299/pri299-additional-flags-background.html

Right, clearly flags don't exist for many of the subdivisions. But I'm
not sure this is the same question as whether the three-level hierarchy
is relevant. In my example, Île-de-France and Paris both have flags,
and they aren't the same. (Wikipedia says the Île-de-France flag is
"non-official and unused," but they do have a page for it, and in any
case there are probably better examples.)

> The only purpose for the 4-character subdivision codes is stability.
> So let's suppose that Colorado decides to join Canada (thereby
> deprecating CO in ISO 3166-2), and British Columbia decides to join
> the US (getting the code CO in ISO 3166-2). In that case, CLDR would
> keep the old code CO (but deprecated) and create a new 4-letter code
> for BC, such as XXCO. This is just for illustration, of course, I've
> heard no rumors about either political shift...

Thanks for the 'XXCO' example; this is different from tending toward
'COXX' and was what I was looking for.

The exact scenario would not apply, of course, due to the agreement to
keep subdivision codes unique across the US/Canada border. I'd suppose
this would be preserved, and 3166-2 would assign US-BC to "British
Columbia as US state," and there would be no coding conflict to resolve.
But again, additional examples could easily be dreamed up: replace BC
with the Central Abaco region of the Bahamas (currently BS-CO), which
isn't that far away.

>> (private-use flag tags)
>
> We'll have to address that. My view is that they should not be valid:
> if someone wants a PU flag, of any source, they have over 130,000
> Unicode PU characters to play with.

I concur, and this is consistent with Annex B.

Thanks,

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From mark at macchiato.com  Thu Jul  2 12:44:23 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Thu, 2 Jul 2015 19:44:23 +0200
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <55956F26.9030607@att.net>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <55956F26.9030607@att.net>
Message-ID: <CAJ2xs_HC5sFzJjPH7VbLFpFn5Lm9QbmGs93gVOFF-gyJ0qBM2w@mail.gmail.com>

To add some information that people like Noah may not be aware of:

This email list is an open, public list for arbitrary discussions about
Unicode and software internationalization. It is *not* an email list for
consortium business: the vast majority of the people on it are *not* members
of the Unicode consortium, and are simply expressing their opinions on a
particular topic, as individuals. Members of the consortium are *not*
necessarily active on this list. Those who are do *not* necessarily engage
in every topic.

It can be a useful place to talk about possible proposals, but any opinions
provided here (or that appear in random blogs, news articles, or change.org
petitions) are *not* taken into account by the consortium. Anyone wanting a
proposal to be considered *should* submit it via
http://unicode.org/reporting.html. Those submissions *are* considered by
the relevant technical body in the consortium.

People proposing new *emoji* characters *should* read
http://unicode.org/reports/tr51/#Selection_Factors beforehand, and follow
the guidelines there. Proposals about emoji are directed to the emoji
subcommittee, which meets weekly. It makes recommendations to the UTC,
which meets quarterly.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/fa9a5572/attachment.html>

From leob at mailcom.com  Thu Jul  2 12:46:47 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Thu, 2 Jul 2015 10:46:47 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <55956F26.9030607@att.net>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <55956F26.9030607@att.net>
Message-ID: <CAFmvRsdP++jfmB00N43BfD1rvWotFaTYCsZz9VD+DcV_raUQ8w@mail.gmail.com>

Why not add another 26 A-Z characters, call them "regional
supplementary symbols", and let carriers decide what to encode and how
to encode what they want with sequences <RIS> <RSS>* <RIS> to their
hearts' content?

Leo


From mark at macchiato.com  Thu Jul  2 12:55:53 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Thu, 2 Jul 2015 19:55:53 +0200
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <CAFmvRsdP++jfmB00N43BfD1rvWotFaTYCsZz9VD+DcV_raUQ8w@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <55956F26.9030607@att.net>
 <CAFmvRsdP++jfmB00N43BfD1rvWotFaTYCsZz9VD+DcV_raUQ8w@mail.gmail.com>
Message-ID: <CAJ2xs_HBvGskHfo88tSsRH1DfVZNMQSRMwSw8+t6WaD4oENMSw@mail.gmail.com>

Again, that has no advantage over PUA characters. Carriers/vendors can
*already* add whatever PUA characters they want to fonts and keyboards. But
of course, the problem is interoperability; you send a flag to a friend for
your favorite vacation spot, Florida, and the friend sees a flag for New
Jersey.


Mark <https://google.com/+MarkDavis>

*« Il meglio è l’inimico del bene »* (The best is the enemy of the good)

On Thu, Jul 2, 2015 at 7:46 PM, Leo Broukhis <leob at mailcom.com> wrote:

> Why not add another 26 A-Z characters, call them "regional
> supplementary symbols", and let carriers decide what to encode and how
> to encode what they want with sequences <RIS> <RSS>* <RIS> to their
> hearts' content?
>
> Leo
>
> On Thu, Jul 2, 2015 at 10:04 AM, Ken Whistler <kenwhistler at att.net> wrote:
> >
> > On 7/2/2015 2:01 AM, Philippe Verdy wrote:
> >
> >
> > The frozen status of Antarctica ...
> >
> >
> > ... will be addressed separately by global warming. But be that as it may...
> >
> >
> > In reality there's still no standard way to encode flags unambiguously
> > and in a stable way. We'd like to have FOTW (Flags of the World)
> > contributors to propose their own scheme. But it will not be compatible
> > with the current RIS solution or the proposed extension. If ever such
> > standard emerges, it will require encoding a new set of characters.
> >
> >
> > The UTC is neither responsible for nor interested in a "standard way
> > to encode flags unambiguously". I suspect one of the reasons this
> > discussion is tending to derail into political topics and too much detail
> > about particular flags and their stability and the stability of
> > geopolitical entities they represent and yadda yadda, is that people
> > seem ineluctably drawn to the misapprehension that this is all about
> > standard encoding
> > of flags.
> >
> > It is not.
> >
> > Rather, it is about a standard way to represent recognizable and
> > interchangeable emoji (colorful little pictographs) of flags, using
> > defined sequences of Unicode characters.
> >
> > The existing mechanism using regional indicator symbol (RIS) pairs was
> > originally aimed at solving the following problems:
> >
> > 1. Enabling the reliable interchange of the legacy 10 flag emoji from
> > Japanese carrier sets.
> >
> > 2. Enabling the completion of the encoding of emoji to cover the rest
> > of the Japanese carrier sets without all progress dragging to a
> > complete halt as national bodies in SC2 would argue interminably over
> > a "standard way to encode flags unambiguously" in an ISO standard.
> >
> > 3. Dealing with the inevitable hue and cry: "China and Japan and the US
> > got their flag! Why can't I get my country's flag??!"
> >
> > And it appears that the RIS mechanism succeeded spectacularly well in
> > addressing all of those design goals.
> >
> > In the middle of last year, for example, there was a major media and
> > internet campaign to "encode the flag of India". Well, the RIS mechanism
> > handled the real issue there just fine -- when the new phones started
> > coming out with support for display and interchange of emoji for flags
> > using the RIS sequences, there was the emoji for the flag of India for
> > everybody to use. Problem solved.
> >
> > And the problem which was solved was not the determination that
> > the <1F1EE, 1F1F3> RIS sequence "IN" meant precisely the current
> > national flag of India, the saffron, white and green tricolor with the
> > Ashoka Chakra, and *not* any other flag of India (the flag of the
> > Indian army, the flag of the Mughal Empire, the flag of British
> > India, etc.). The RIS sequence "IN" was just mapped to the colorful
> > little emoji glyph for the Indian flag that everybody wanted to
> > interchange.
> >
> > The Unicode Standard is not a vexillology standard -- nor will it ever
> > be. It is a standard for the encoding and interchange of characters.
> >
> > The *character* problem we are faced with here is that people want
> > to use and interchange colorful little emoji pictographs of various
> > flags in text streams. The RIS mechanism addresses a significant
> > part of that problem, but is not extensible to cover the full scope
> > of the demand.
> >
> > And what is the scope of the additional demand?
> >
> > 1. The first part can be summed up as: the flag of Scotland problem.
> >
> > In other words, there are a number of high visibility, high demand,
> > widely recognized regional flags that would be interchanged as just
> > more emoji pictographs, if a mechanism for that were available.
> >
> > People who want to use an emoji for the flag of Scotland just as
> > easily as someone can use an emoji for the flag of Great Britain
> > are not going to accept an argument that says, "Well, we can't do
> > that on your phones because there is no 3166-1 country code
> > registered, so we can't map a Scotland flag emoji glyph to a RIS pair."
> >
> > Hence the PRI #299 proposal: for an extension mechanism that would
> > address the flag of Scotland problem in a generic and reasonably
> > stable way.
> >
> > 2. The second part can be summed up as: the rainbow flag problem.
> >
> > In other words, there are a number of high visibility, high demand,
> > widely recognized non-governmental flags that would be interchanged
> > as just more emoji pictographs, if a mechanism for that were available.
> >
> > From the public's point of view, this is another no brainer: if the
> > flag of Japan and the flag of Scotland, why not the rainbow flag??!
> > They aren't interested in the limitations of the underlying
> > representation mechanisms, nor should they be, IMO.
> >
> > The problem the UTC faces here is that there are a number of
> > reasonable and popular candidates, which the rainbow flag amply
> > exemplifies, for more colorful little emoji pictographs for flags that
> > people would like to interchange -- but there is no obvious and
> > extensible way to do so reliably in terms of sequences of Unicode
> > characters in a plain text stream. The PRI #299 proposal does not
> > extend into this realm, for many of the reasons pointed
> > out by Doug Ewell.
> >
> > There are a number of potential approaches to address the rainbow
> > flag problem. For example:
> >
> > a. use private-use characters
> > b. pursue one-by-one encoding of each newly desired flag pictograph as a
> > symbol
> > c. extend the unicode_region_subtag and unicode_subdivision_subtag
> > scheme in CLDR to add some new subtag addressing a separate,
> > non-geopolitical hierarchy
> > d. create a separate extension using TAG characters but with a
> > syntax not dependent on CLDR subtag definitions
> > e. create a registry of flag entities suitable for representation as
> > emoji, together with a "c" or "d" style syntax
> > f. something else?
> > g. do nothing (and perhaps hope that stickers will solve the problem)
> >
> > If we are to make any progress here in addressing the actual scope
> > of "the rainbow flag problem", I suggest we focus on the details and
> > pros and cons of suggestions like those of a through g above, rather than
> > pursuing more discussion recapitulating the history of the borders of
> > Tibet -- which truly are out of scope here.
> >
> > --Ken
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150702/729c9aec/attachment.html>

From richard.wordingham at ntlworld.com  Thu Jul  2 13:02:44 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 2 Jul 2015 19:02:44 +0100
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18>
References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net>
 <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18>
Message-ID: <20150702190244.789e44af@JRWUBU2>

On Thu, 2 Jul 2015 10:37:17 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> (because it is
> sufficient to simply type the words one after each other without
> anything between, to get them as *one* word)

This only applies where it is traditional to separate words, a habit
the Romans got out of and the Irish revived.

Unicode Word Boundary Rule WB4 (in UAX #29, 'Unicode Text
Segmentation') decrees that U+2060 and U+FEFF be ignored in
word-boundary determination, except that there is still a break between
a newline and a following U+2060 or U+FEFF, and that inserting one of
them between <CR> and <LF> creates an extra word boundary.
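Richard's description of WB4 can be made concrete with a small Python sketch of that one rule. It assumes a simplified Format class containing only U+2060 and U+FEFF (the real Word_Break=Format class in WordBreakProperty.txt is larger, and WB4's handling of Extend and ZWJ is omitted). It only shows which units the later rules (WB5 onward) would see. It is not a full word segmenter.

```python
# CR, LF and the Word_Break=Newline characters: WB4 does not absorb
# anything that follows these.
HARD_BREAK = {"\r", "\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"}

# Simplification (assumption): only the two characters discussed here are
# treated as Format; Extend and ZWJ are ignored entirely.
FORMAT_SUBSET = {"\u2060", "\ufeff"}  # WORD JOINER, ZERO WIDTH NO-BREAK SPACE


def apply_wb4(text: str) -> list[str]:
    """Group text into the units that the rules after WB4 operate on.

    WB4: X (Extend | Format | ZWJ)* -> X, unless X is sot, CR, LF or Newline.
    A Format character is folded into the preceding unit, unless it starts
    the text or directly follows a hard line break.
    """
    units: list[str] = []
    for ch in text:
        if ch in FORMAT_SUBSET and units and units[-1][0] not in HARD_BREAK:
            units[-1] += ch  # ignored: it travels with the preceding unit
        else:
            units.append(ch)
    return units
```

For "ab\u2060cd", the WORD JOINER stays attached to "b", so later rules see the letters as adjacent and do not break. For "\r\u2060\n", the joiner stands alone between CR and LF, and the rule that keeps CR and LF together no longer applies.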

Richard.

From leob at mailcom.com  Thu Jul  2 13:10:27 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Thu, 2 Jul 2015 11:10:27 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <CAJ2xs_HBvGskHfo88tSsRH1DfVZNMQSRMwSw8+t6WaD4oENMSw@mail.gmail.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <55956F26.9030607@att.net>
 <CAFmvRsdP++jfmB00N43BfD1rvWotFaTYCsZz9VD+DcV_raUQ8w@mail.gmail.com>
 <CAJ2xs_HBvGskHfo88tSsRH1DfVZNMQSRMwSw8+t6WaD4oENMSw@mail.gmail.com>
Message-ID: <CAFmvRscGukSV7-nJsERjMovcxMMrzjJT0Z2OuoYQwq8txvfJRQ@mail.gmail.com>

With extensible self-delimited regional indicator sequences the
carriers will be able to come to an agreement and to petition Unicode
to register them as named character sequences symbolizing flags not
encoded by an ISO entity, like various rainbow flags, making sure that
the format of such sequences is guaranteed not to clash with any
existing ISO 3166 format.
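Here is a minimal Python sketch of why the <RIS> <RSS>* <RIS> syntax from Leo's earlier message is "self-delimited". The regional supplementary symbols (RSS) were only proposed and never encoded. The sketch therefore uses an assumed stand-in notation: uppercase letters represent existing regional indicator symbols, and lowercase letters represent the hypothetical RSS. The sequence name "RainboW" is likewise invented for illustration.

```python
import re

# Stand-in notation (hypothetical): "A"-"Z" = existing regional indicator
# symbols (RIS); "a"-"z" = proposed regional supplementary symbols (RSS),
# which do not exist in Unicode.
SEQUENCE = re.compile(r"[A-Z][a-z]*[A-Z]")


def split_sequences(stream: str) -> list[str]:
    """Split a symbol run into <RIS> <RSS>* <RIS> sequences.

    Every sequence both starts and ends with a RIS, so a reader can find
    where each one ends without consulting a table of valid codes.
    Symbols that fit no sequence are skipped.
    """
    return SEQUENCE.findall(stream)


print(split_sequences("USRainboWGB"))  # ['US', 'RainboW', 'GB']
```

When a sequence has no supplementary symbols in the middle, it is an ordinary RIS pair. Existing two-letter sequences such as "USGB" would therefore still split into "US" and "GB".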

Also, ISO 3166-2 can have 2 or 3 letters after the dash; it makes
sense to have the letters after the dash self-delimited, if/when
REGIONAL INDICATOR DASH is added to facilitate encoding of ISO 3166-2
codes.

Leo


On Thu, Jul 2, 2015 at 10:55 AM, Mark Davis <mark at macchiato.com> wrote:
> Again, that has no advantage over PUA characters. Carriers/vendors can
> *already* add whatever PUA characters they want to fonts and keyboards. But
> of course, the problem is interoperability; you send a flag to a friend for
> your favorite vacation spot, Florida, and the friend sees a flag for New
> Jersey.
>
>
> Mark
>
> Il meglio è l'inimico del bene (the best is the enemy of the good)
>
> On Thu, Jul 2, 2015 at 7:46 PM, Leo Broukhis <leob at mailcom.com> wrote:
>>
>> Why not add another 26 A-Z characters, call them "regional
>> supplementary symbols", and let carriers decide what to encode and how
>> to encode what they want with sequences <RIS> <RSS>* <RIS> to their
>> hearts' content?
>>
>> Leo
>>
>> On Thu, Jul 2, 2015 at 10:04 AM, Ken Whistler <kenwhistler at att.net> wrote:
>> >
>> > On 7/2/2015 2:01 AM, Philippe Verdy wrote:
>> >
>> >
>> > The frozen status of Antarctica ...
>> >
>> >
>> > ... will be addressed separately by global warming. But be that as it
>> > may...
>> >
>> >
>> > In reality there's still no standard way to encode flags unambiguously
>> > and in a stable way. We'd like to have FOTW (Flags of the World)
>> > contributors propose their own scheme. But it will not be compatible
>> > with the current RIS solution or the proposed extension. If ever such a
>> > standard emerges, it will require encoding a new set of characters.
>> >
>> >
>> > The UTC is neither responsible for nor interested in a "standard way
>> > to encode flags unambiguously". I suspect one of the reasons this
>> > discussion is tending to derail into political topics and too much
>> > detail about particular flags and their stability and the stability
>> > of geopolitical entities they represent and yadda yadda, is that
>> > people seem ineluctably drawn to the misapprehension that this is all
>> > about standard encoding of flags.
>> >
>> > It is not.
>> >
>> > Rather, it is about a standard way to represent recognizable and
>> > interchangeable emoji (colorful little pictographs) of flags, using
>> > defined sequences of Unicode characters.
>> >
>> > The existing mechanism using regional indicator symbol (RIS) pairs was
>> > originally aimed at solving the following problems:
>> >
>> > 1. Enabling the reliable interchange of the legacy 10 flag emoji from
>> > Japanese carrier sets.
>> >
>> > 2. Enabling the completion of the encoding of emoji to cover the rest
>> > of the Japanese carrier sets without all progress dragging to a
>> > complete halt as national bodies in SC2 would argue interminably over
>> > a "standard way to encode flags unambiguously" in an ISO standard.
>> >
>> > 3. Dealing with the inevitable hue and cry: "China and Japan and the US
>> > got their flag! Why can't I get my country's flag??!"
>> >
>> > And it appears that the RIS mechanism succeeded spectacularly well in
>> > addressing all of those design goals.
>> >
>> > In the middle of last year, for example, there was a major media and
>> > internet campaign to "encode the flag of India". Well, the RIS mechanism
>> > handled the real issue there just fine -- when the new phones started
>> > coming out with support for display and interchange of emoji for flags
>> > using the RIS sequences, there was the emoji for the flag of India for
>> > everybody to use. Problem solved.
>> >
>> > And the problem which was solved was not the determination that
>> > the <1F1EE, 1F1F3> RIS sequence "IN" meant precisely the current
>> > national flag of India, the saffron, white and green tricolor with the
>> > Ashoka Chakra, and *not* any other flag of India (the flag of the
>> > Indian army, the flag of the Mughal Empire, the flag of British
>> > India, etc.). The RIS sequence "IN" was just mapped to the colorful
>> > little emoji glyph for the Indian flag that everybody wanted to
>> > interchange.
>> >
>> > The Unicode Standard is not a vexillology standard -- nor will it ever
>> > be. It is a standard for the encoding and interchange of characters.
>> >
>> > The *character* problem we are faced with here is that people want
>> > to use and interchange colorful little emoji pictographs of various
>> > flags in text streams. The RIS mechanism addresses a significant
>> > part of that problem, but is not extensible to cover the full scope
>> > of the demand.
>> >
>> > And what is the scope of the additional demand?
>> >
>> > 1. The first part can be summed up as: the flag of Scotland problem.
>> >
>> > In other words, there are a number of high visibility, high demand,
>> > widely recognized regional flags that would be interchanged as just
>> > more emoji pictographs, if a mechanism for that were available.
>> >
>> > People who want to use an emoji for the flag of Scotland just as
>> > easily as someone can use an emoji for the flag of Great Britain
>> > are not going to accept an argument that says, "Well, we can't do
>> > that on your phones because there is no 3166-1 country code
>> > registered, so we can't map a Scotland flag emoji glyph to a RIS pair."
>> >
>> > Hence the PRI #299 proposal: for an extension mechanism that would
>> > address the flag of Scotland problem in a generic and reasonably
>> > stable way.
>> >
>> > 2. The second part can be summed up as: the rainbow flag problem.
>> >
>> > In other words, there are a number of high visibility, high demand,
>> > widely recognized non-governmental flags that would be interchanged
>> > as just more emoji pictographs, if a mechanism for that were available.
>> >
>> > From the public's point of view, this is another no brainer: if the
>> > flag of Japan and the flag of Scotland, why not the rainbow flag??!
>> > They aren't interested in the limitations of the underlying
>> > representation mechanisms, nor should they be, IMO.
>> >
>> > The problem the UTC faces here is that there are a number of
>> > reasonable and popular candidates, which the rainbow flag amply
>> > exemplifies, for more colorful little emoji pictographs for flags that
>> > people would like to interchange -- but there is no obvious and
>> > extensible way to do so reliably in terms of sequences of Unicode
>> > characters in a plain text stream. The PRI #299 proposal does not
>> > extend into this realm, for many of the reasons pointed
>> > out by Doug Ewell.
>> >
>> > There are a number of potential approaches to address the rainbow
>> > flag problem. For example:
>> >
>> > a. use private-use characters
>> > b. pursue one-by-one encoding of each newly desired flag pictograph as a
>> > symbol
>> > c. extend the unicode_region_subtag and unicode_subdivision_subtag
>> > scheme in CLDR to add some new subtag addressing a separate,
>> > non-geopolitical hierarchy
>> > d. create a separate extension using TAG characters but with a
>> > syntax not dependent on CLDR subtag definitions
>> > e. create a registry of flag entities suitable for representation as
>> > emoji, together with a "c" or "d" style syntax
>> > f. something else?
>> > g. do nothing (and perhaps hope that stickers will solve the problem)
>> >
>> > If we are to make any progress here in addressing the actual scope
>> > of "the rainbow flag problem", I suggest we focus on the details and
>> > pros and cons of suggestions like those of a through g above, rather
>> > than pursuing more discussion recapitulating the history of the
>> > borders of Tibet -- which truly are out of scope here.
>> >
>> > --Ken
>> >
>> >
>
>


From doug at ewellic.org  Thu Jul  2 13:59:52 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 02 Jul 2015 11:59:52 -0700
Subject: Adding RAINBOW FLAG to Unicode
Message-ID: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net>

Leo Broukhis <leob at mailcom dot com> wrote:

> With extensible self-delimited regional indicator sequences the
> carriers will be able to come to an agreement and to petition Unicode
> to register them as named character sequences symbolizing flags not
> encoded by an ISO entity, like various rainbow flags, making sure that
> the format of such sequences is guaranteed not to clash with any
> existing ISO 3166 format.

There are already plenty of ways for companies and groups and
individuals to request new emoji. This way would have the disadvantage
of conflating non-regional flags with a coding system for regions, which
doesn't seem like a good idea.

> Also, ISO 3166-2 can have 2 or 3 letters

or 1, or digits or a combination

> after the dash; it makes sense to have the letters after the dash
> self-delimited, if/when REGIONAL INDICATOR DASH is added to
> facilitate encoding of ISO 3166-2 codes.

I don't understand the significance of this part.
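Doug's correction can be checked against the general shape of an ISO 3166-2 code. Such a code consists of an ISO 3166-1 alpha-2 country code, a hyphen, and a subdivision part of one to three letters or digits. The minimal Python sketch below tests form only. It cannot tell whether a given code is actually assigned, which would require the ISO 3166-2 data itself.

```python
import re

# ISO 3166-1 alpha-2 country code, hyphen, then 1 to 3 letters or digits.
# Shape check only: this does not know which codes are actually assigned.
SUBDIVISION_SHAPE = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def has_subdivision_shape(code: str) -> bool:
    return SUBDIVISION_SHAPE.fullmatch(code) is not None


# "GB-SCOT" fails: four characters after the hyphen.
for code in ["GB-SCT", "US-CO", "JP-13", "GB-SCOT"]:
    print(code, has_subdivision_shape(code))
```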

--
Doug Ewell | http://ewellic.org | Thornton, CO


From doug at ewellic.org  Thu Jul  2 14:09:15 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 02 Jul 2015 12:09:15 -0700
Subject: Adding RAINBOW FLAG to Unicode
Message-ID: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net>

Ken Whistler <kenwhistler at att dot net> wrote:

> The UTC is neither responsible for nor interested in a "standard way
> to encode flags unambiguously".
>
> [...]
>
> The Unicode Standard is not a vexillology standard -- nor will it ever
> be. It is a standard for the encoding and interchange of characters.

Even though I continue to believe there *should* be a vexillology
standard for encoding flags as unambiguously as practicable, I'm in
strong agreement that this is not a Unicode problem, or a character
problem, or even a CLDR problem.

If there were such a standard today, it might make sense for Unicode
and/or CLDR to adapt it for the emoji purposes we are discussing here.
But there isn't.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From c933103 at gmail.com  Thu Jul  2 14:22:34 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Fri, 3 Jul 2015 03:22:34 +0800
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <417C8958513D4476884721AB0881975A@DougEwell>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <CAJ2xs_HtQ1MizmPN-=gvxcJZkQ5pW0CU++cuPUm5T+jRmjB3XA@mail.gmail.com>
 <CAGa7JC0q79iQj7Ui9TBab9MacSOCV0qzm6NYLBWKY+n37qhg9g@mail.gmail.com>
 <CA+Y+444dKoh28JsNi3o=xn-HrXhr9-40PU8LOs6nog8U_9-x-A@mail.gmail.com>
 <417C8958513D4476884721AB0881975A@DougEwell>
Message-ID: <CAGHjPP+OfVvWHg_SvHJWkggQtT_xDQvFKOOvbb+_37KuC7qQ4A@mail.gmail.com>

As I read it, should those flags be versioned when used? The current
implementation makes it sound as if those flags could change at any time.
If people use the emoji with country X's flag on it to show support for
its current government, and that government is then overthrown and the
change is internationally recognized along with a new flag, then what
appears on their social media timelines would have its meaning shifted to
the opposite of their original intention simply because they updated
their devices; those who haven't updated their devices would see the same
effect in messages written by those who have. A potential way to handle
this might be to add RIS for digits and append those digits after the
alphabetical RIS to show the starting year, while retaining the unnumbered
alphabetical RIS as they are today?
On 2 July 2015 at 9:09, "Doug Ewell" <doug at ewellic.org> wrote:

> Noah Slater wrote:
>
>> Correct me if I'm wrong, but it seems like Philippe's core argument is
>> that geopolitical entities and flags (as specific instances of a
>> design, in the heraldic sense) are disjoint. And that using
>> geopolitical codes to refer to these designs is inherently unstable.
>>
>
> But the only alternative is to encode about 200 discrete emoji for what we
> think of as "country" flags, plus somewhere between 0 and 5000 for flags of
> what we think of as "subdivisions."
>
> And in the end, when users see these emoji, they will still think "Oh,
> that's the US flag" or "the French flag" or "the Japanese flag" or
> whatever. They will still associate them with geopolitical entities. That's
> the whole purpose of such flags.
>
> (Either that or they will associate them with languages, which is far more
> unstable than anything else being discussed here.)
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150703/203c72f4/attachment.html>

From leob at mailcom.com  Thu Jul  2 14:33:31 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Thu, 2 Jul 2015 12:33:31 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net>
References: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net>
Message-ID: <CAFmvRschL-uLN0W8v9pnUfLb6mrVsj_1j_VgDkzhHRMsE5y-bA@mail.gmail.com>

Currently a sequence of regional indicator symbols is parsed
unambiguously by greedily taking pairs of RIS characters and interpreting
them as ISO 3166-1 alpha-2 codes.
If REGIONAL INDICATOR DASH (RID) and REGIONAL INDICATOR digits are added,
along with regional supplementary symbols (RSS), then sequences
<RIS><RIS><RID><RSS>*<RIS> can be parsed unambiguously as ISO 3166-2,
whereas <RIS><RSS>+<RIS> can be parsed as a named sequence signifying
a flag of a non-governmental entity (or <RIS><RSS><RIS> as ISO
3166-1 alpha-3, and longer sequences as non-governmental).
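Both halves of this description can be sketched in Python.

The first part models the current behavior: regional indicator symbols U+1F1E6..U+1F1FF stand for A..Z, and a run of them is read as greedy pairs. Under this scheme "IN" is <1F1EE, 1F1F3>.

The second part is purely hypothetical. RID and RSS were never encoded, so it matches the proposed grammar over an assumed stand-in notation: A-Z for RIS, "-" for RID, and a-z for RSS. The RI digits are left out because the grammar above does not say where they go. "GB-scT" stands in for the Scotland subdivision code GB-SCT, and "RainboW" is an invented name.

```python
import re

RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A; Z is U+1F1FF


def to_ris(code: str) -> str:
    """Map an ASCII letter code such as "IN" to regional indicator symbols."""
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code.upper())


def greedy_pairs(text: str) -> list[str]:
    """Decode runs of RIS characters by greedily taking pairs."""
    codes: list[str] = []
    pending = None  # first letter of a pair still waiting for its partner
    for ch in text:
        if RIS_A <= ord(ch) <= RIS_A + 25:
            letter = chr(ord(ch) - RIS_A + ord("A"))
            if pending is None:
                pending = letter
            else:
                codes.append(pending + letter)
                pending = None
        else:
            pending = None  # a non-RIS character ends the run; a lone RIS yields no code
    return codes


# Hypothetical grammar over stand-in notation: A-Z = RIS, "-" = RID, a-z = RSS.
GRAMMAR = [
    (re.compile(r"[A-Z]{2}"), "ISO 3166-1 alpha-2"),
    (re.compile(r"[A-Z]{2}-[a-z]*[A-Z]"), "ISO 3166-2"),
    (re.compile(r"[A-Z][a-z][A-Z]"), "ISO 3166-1 alpha-3"),
    (re.compile(r"[A-Z][a-z]{2,}[A-Z]"), "non-governmental named sequence"),
]


def classify(sequence: str):
    """Return the kind of sequence the proposed grammar assigns, or None."""
    for pattern, kind in GRAMMAR:
        if pattern.fullmatch(sequence):
            return kind
    return None


print(greedy_pairs(to_ris("INUS")))  # ['IN', 'US']
print(classify("GB-scT"))            # ISO 3166-2
```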

Leo



From kenwhistler at att.net  Thu Jul  2 14:57:23 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 02 Jul 2015 12:57:23 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <CAFmvRschL-uLN0W8v9pnUfLb6mrVsj_1j_VgDkzhHRMsE5y-bA@mail.gmail.com>
References: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net>
 <CAFmvRschL-uLN0W8v9pnUfLb6mrVsj_1j_VgDkzhHRMsE5y-bA@mail.gmail.com>
Message-ID: <559597A3.8090205@att.net>


On 7/2/2015 12:33 PM, Leo Broukhis wrote:
> If REGIONAL INDICATOR DASH and REGIONAL INDICATOR digits are added,
> along with regional supplementary symbols, then sequences
> <RIS><RIS><RID><RSS>*<RIS> can be parsed unambiguously as ISO 3166-2,
> whereas <RIS><RSS>+<RIS> can be parsed as a named sequence signifying
> a flag of a non-governmental entity (or <RIS><RSS><RIS> - as ISO
> 3166-1 alpha 3, and longer sequences as non-governmental).
>
>

The point of switching to the TAG characters for an extension
mechanism beyond what the RIS pairs can handle is that
TAG characters for letters *and* digits *and* dash already exist
and do not have to be encoded yet again before they could be used.

Any proposal that depends on getting agreement to encode and
publish some *further* set of meta-characters for representing
letters, digits, and ASCII punctuation marks would at this point
push out any possible solution to the time frame of Unicode 10.0
(June, 2017). And even that would depend on first coming to
agreement that *more* sets of meta-characters for dealing with
the same kind of function that TAG characters could already serve
would be a good idea. The potential for significant disagreement could
push such a solution out even further. Remember that any
solution involving encoding more characters with "funny behavior"
would need not only to gain consensus in the UTC, but would
also have to pass muster in SC2 and pass two formal ballots by
the national bodies.

You could create an equivalent proposal to what you are suggesting
above by simply substituting <TAG-DASH> and <TAG-[0..9]> for your
RID and RSS above -- and you could do it *now*, instead of in 2017.

But once we look to TAG characters for an extension mechanism,
why mess with the existing RIS pair syntax and break the existing
implementations using them? Hence, the direction taken in
PRI #299, which suggests an extension syntax based entirely on
the TAG characters.
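Ken's point that TAG letters, digits and dash already exist follows from the layout of the Tags block. Each printable ASCII character 0x20..0x7E has a TAG counterpart at U+E0000 plus its ASCII value (U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG are separate). Digits and capital letters therefore land in U+E0030..U+E005A, the range named in the PRI announcement. The minimal Python sketch below only spells ASCII text with TAG characters. Which base character introduces such a run, and how the run is terminated, are defined by the proposal itself and are not modeled here.

```python
TAG_OFFSET = 0xE0000  # TAG characters U+E0020..U+E007E mirror printable ASCII


def to_tag(text: str) -> str:
    """Spell printable ASCII text with the corresponding TAG characters."""
    out = []
    for ch in text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"no TAG counterpart for {ch!r}")
        out.append(chr(TAG_OFFSET + ord(ch)))
    return "".join(out)


def from_tag(tags: str) -> str:
    """Inverse of to_tag; assumes every character is a TAG character."""
    return "".join(chr(ord(t) - TAG_OFFSET) for t in tags)


print(" ".join(f"U+{ord(c):04X}" for c in to_tag("GB-SCT")))
# U+E0047 U+E0042 U+E002D U+E0053 U+E0043 U+E0054
```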

--Ken

From leob at mailcom.com  Thu Jul  2 15:58:22 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Thu, 2 Jul 2015 13:58:22 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <559597A3.8090205@att.net>
References: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net>
 <CAFmvRschL-uLN0W8v9pnUfLb6mrVsj_1j_VgDkzhHRMsE5y-bA@mail.gmail.com>
 <559597A3.8090205@att.net>
Message-ID: <CAFmvRsfNJ6sjMge-yxJniDskFmgGiO_DKC25jfgEZ6LRYrGL+w@mail.gmail.com>

What I don't like about PRI #299 is that it proposes using
default-ignorable characters. On a non-vexillology-aware platform, I'd
like to see something informative, albeit not resembling a flag, but
indicative of the intention to display a flag, like RIS can be, as
opposed to nondescript white flags.

Leo


From doug at ewellic.org  Thu Jul  2 15:58:54 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 02 Jul 2015 13:58:54 -0700
Subject: [OT] Versioning flags (was: Re: Adding RAINBOW FLAG to Unicode)
Message-ID: <20150702135854.665a7a7059d7ee80bb4d670165c8327d.0c4b4865bb.wbe@email03.secureserver.net>

gfb hjjhjh <c933103 at gmail dot com> wrote:

> As I read, should those flag be versioned when being use? As the
> current implementation sound like those flag would change all over the
> time, and if people using the emoticon with country X's flag on it to
> show support for its current government, once the government have been
> overthrown

Or not: http://www.newfijiflag.com

> and the overthrown is internationally recognized with new flags and
> thus being accepted, then what appear on one's timeline of their
> social media would have their meaning shifted to the opposing side of
> their original intention by simply updating their device, and for
> those who haven't update their device they would see same effect from
> message written by those who have already updated their devices. a
> potential way to do it might be adding RIS for number and then append
> those numbers after alphabetical RIS to show year of start while
> retaining the unnumbered alphabetical RIS as they are today?

This would be a great reason to remind users of emoji flags not to try
to use them to indicate an "intention" that can have an "opposing side,"
such as loyalty or support for a particular political party or
government. They aren't for that.

A proper coding standard for flags (NOT in scope for Unicode) might have
this sort of versioning feature, but even then, I would think the
default (unversioned) behavior should be to select the "current" flag,
whatever that is.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From gwalla at gmail.com  Thu Jul  2 16:59:37 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Thu, 2 Jul 2015 14:59:37 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net>
References: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net>
Message-ID: <CA+p4_H3goDi5Acs_KzBZoePtsJks4fYTCW4v0=ocV96f4tH6vQ@mail.gmail.com>

On Thu, Jul 2, 2015 at 12:09 PM, Doug Ewell <doug at ewellic.org> wrote:
> Ken Whistler <kenwhistler at att dot net> wrote:
>
>> The UTC is neither responsible for nor interested in a "standard way
>> to encode flags unambiguously".
>>
>> [...]
>>
>> The Unicode Standard is not a vexillology standard -- nor will it ever
>> be. It is a standard for the encoding and interchange of characters.
>
> Even though I continue to believe there *should* be a vexillology
> standard for encoding flags as unambiguously as practicable, I'm in
> strong agreement that this is not a Unicode problem, or a character
> problem, or even a CLDR problem.
>
> If there were such a standard today, it might make sense for Unicode
> and/or CLDR to adapt it for the emoji purposes we are discussing here.
> But there isn't.

Tangentially, I recently ran across something called International
Flag Identification Symbols. It's a symbolic notation used in vexillology
that describes how a flag is used and some aspects of its design, but
not enough to reproduce it. They're described on the Flags of the
World site <https://flagspot.net/flags/xf-fis.html#fiavcode> and the
usage symbols at least are used inline with text on that site, e.g. in
the article on German flags <https://flagspot.net/flags/de.html> and
as a quotation from a reference in the article on Guinea-Bissau
<https://flagspot.net/flags/gw.html>. The site uses small black &
white GIFs but there are apparently a couple of TrueType fonts that
put those symbols in the PUA.

From petercon at microsoft.com  Thu Jul  2 19:56:32 2015
From: petercon at microsoft.com (Peter Constable)
Date: Fri, 3 Jul 2015 00:56:32 +0000
Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional
 Types of Flags)
In-Reply-To: <003401d0b4be$3af16970$b0d43c50$@fi>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <003401d0b4be$3af16970$b0d43c50$@fi>
Message-ID: <BL2PR03MB1148BF6281CB167BC400730D5960@BL2PR03MB114.namprd03.prod.outlook.com>

Erkki, in this case, I think Philippe is making valid points.


-          For the proposal to be workable requires some means of ensuring stability of encoded representations. The way this would be done would be for CLDR to provide data with all valid sequences --- effectively becoming a registry.

-          The concepts being denoted are inherently political, often unstable, and sometimes highly sensitive.

Sensitive issues aside, a better approach would be to have a URN tagging scheme --- which IMO begs the question why this is a Unicode topic as it clearly crosses outside the limits of plain text.

Sensitive issues considered, though, it begs the question as to whether Unicode should be considering any of this at all, no matter what the scheme for encoded representation may be. Someone helpfully reminded us of this:


>> [...] the UTC does not wish to entertain further proposals for
>> encoding of symbol characters for flags, whether national, state,
>> regional, international, or otherwise. References to UTC Minutes:
>> [134-C2], January 28, 2013.


Peter

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Erkki I Kolehmainen
Sent: Thursday, July 2, 2015 5:42 PM
To: verdy_p at wanadoo.fr; 'Mark Davis ??'
Cc: 'Doug Ewell'; 'Unicode Mailing List'
Subject: RE: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags)

I cannot but agree with Mark! Thus, please?

Sincerely, Erkki

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Philippe Verdy
Sent: 2 July 2015 12:02
To: Mark Davis ??
Cc: Doug Ewell; Unicode Mailing List
Subject: Re: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags)

The political subject is directly related to the designation of flags and their association with ISO 3166-1 and -2 encoded entities. Even if you don't like it, this is very political, and for a standard that seeks stability, I wonder how any flag (directly bound to specific political entities at specific dates and within boundaries that may be contested) can be tied to ISO 3166 and its instability (and to the fact that ISO 3166 entities have no defined borders, so that ISO 3166-2 is just the political point of view of the current ruler of the current ISO 3166-1 entity).

This whole topic is political. In fact, real flags are not even encoded with RIS, not even for current nations (and there is still the problem of knowing what a recognized nation is, even when considering only the UN definition. Political entities are defined, but with fuzzy borders; they in fact represent only some local governments, not necessarily their lands, people, or cultures, and in some cases those governments are in exile or not even ruling: their seat in the UN is vacant and they exist only on paper, and even UN members disagree about which treaties they recognize).

Consider the case of Western Sahara (which no longer exists except on paper, as a dependency of Spain, which has abandoned it completely), with two governments competing to control the territory (Morocco controlling most of it; another part claimed by Mauritania and then abandoned; another part left without infrastructure; and many refugees left de facto in Mauritania or Algeria). Neither of the two authorities designates that territory as "Western Sahara". So it no longer exists (and will likely never exist again).

The frozen status of Antarctica has not created any new country or territory, even if there is a sort of joint administration: that administration does not suppress the existing claims (or new claims made since its creation). So this area has no well-defined flag: various flags are used informally, plus national flags for each claim and sometimes specific regional flags created ad hoc. The use of RIS for ISO 3166-1 and its limited extension to ISO 3166-2 (slightly modified) does not resolve the problem.

In reality there is still no standard way to encode flags unambiguously and in a stable way. We'd like to have FOTW (Flags of the World) contributors propose their own scheme. But it will not be compatible with the current RIS solution or the proposed extension. If such a standard ever emerges, it will require encoding a new set of characters.

An alternative would be to embed a URN (not re-encoded) between a pair of controls (to embed an object by reference) and use that sequence after a white flag symbol with a joiner.

The URN scheme would be the best long-term solution (and preferable to URLs bound to specific servers), but we could in fact use a generic URI encapsulation (supporting both URNs and URLs).

It could then be used to represent various kinds of entities and link them to specific forms: flags, banners, flying flags, a flag over a person's face, mini location maps, "flag maps"... Programs not recognizing the encoded entities would have a very simple way to scan over the encapsulated URI representing some unspecified object. Other programs would recognize some specific URI schemes. RIS would then be a thing of the past, obsoleted because it was non-neutral, politically and culturally oriented, incomplete, and fundamentally unstable from the beginning... For now we just have a set of flags promoted only for the immediate purpose of interconnecting proprietary messaging services. But all this came without a proper review of what was really needed.


2015-07-02 7:16 GMT+02:00 Mark Davis ?? <mark at macchiato.com>:
Please take political discussions elsewhere; they do not belong on this list.

The point about the boundaries of regions changing over time, and flags being associated with a former set of boundaries could have been made in a few sentences. Not only would it have avoided politics, it would have been more likely that people would actually read it (the likelihood being inversely proportional to the length).


Mark

« Il meglio è l’inimico del bene » (The best is the enemy of the good)

On Thu, Jul 2, 2015 at 4:12 AM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
[...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150703/07de359d/attachment.html>

From charupdate at orange.fr  Fri Jul  3 10:19:13 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 3 Jul 2015 17:19:13 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <20150702190244.789e44af@JRWUBU2>
References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net>
 <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18>
 <20150702190244.789e44af@JRWUBU2>
Message-ID: <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36>

On Thu, Jul 02, 2015, Richard Wordingham  wrote:

> On Thu, 2 Jul 2015 10:37:17 +0200 (CEST)
> Marcel Schneider  wrote:
> 
> > (because it is
> > sufficient to simply type the words one after each other without
> > anything between, to get them as *one* word)
> 
> This only applies where it is traditional to separate words, a habit
> the Romans got out of and the Irish revived.

IMHO the case is a bit different in handwritten or engraved text vs word processing.

> Unicode Word Boundary Rule WB4 (in UAX #29 'Unicode Text
> Segmentation') decrees that U+2060 and U+FEFF be ignored in
> word-boundary determination except that newline breaks before them and
> that inserting them between <CR> and <LF> creates an extra word
> boundary.

When we look at the set of existing format characters (Cf), ZWSP, ZWNBSP and WJ stand apart from the group in that they are used to detect word boundaries in cases like whole word search and spell checking. (They indicate word boundaries.) This is why, in reality, they are remapped to another category, a practice expressly allowed by UAX #29. So in fact, the WB4 rule scarcely ever (say, *never*) applies to them. One can discover this by following the hints given at the very beginning of UAX #29.
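A minimal Python sketch, using only the standard `unicodedata` module, makes the categories concrete: ZWSP, WJ and ZWNBSP all have General Category Cf. It then applies a heavily simplified WB4 (default rules, before any tailoring of the kind described above) to a toy word splitter. One assumption matters here: UAX #29 keeps U+200B ZERO WIDTH SPACE out of Word_Break=Format, so the sketch does not ignore ZWSP even though it is Cf. The helper names are hypothetical, and this is not a full UAX #29 implementation, because only letters and digits count as word characters.

```python
import unicodedata

# The three zero-width characters discussed in this thread.
ZWSP, WJ, ZWNBSP = "\u200b", "\u2060", "\ufeff"


def general_category(ch: str) -> str:
    """Return the Unicode General Category, e.g. 'Cf' for format characters."""
    return unicodedata.category(ch)


def is_wb4_ignorable(ch: str) -> bool:
    # Simplification: any Cf character except ZWSP, which UAX #29 (as assumed
    # here) excludes from Word_Break=Format so that it is not ignored.
    return unicodedata.category(ch) == "Cf" and ch != ZWSP


def words_wb4(text: str) -> list[str]:
    """Toy word splitter: an ignorable format character that follows a word
    character is absorbed into that word instead of ending it (simplified WB4)."""
    words: list[str] = []
    current = ""
    for ch in text:
        if ch.isalnum():
            current += ch
        elif current and is_wb4_ignorable(ch):
            current += ch  # WB4: the format char does not create a boundary
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


for ch in (ZWSP, WJ, ZWNBSP):
    print(f"U+{ord(ch):04X}", general_category(ch))
print(words_wb4("one" + WJ + "two three"))
print(words_wb4("one" + ZWSP + "two"))
```

Under these default rules, WJ glues "one" and "two" into a single word and ZWSP separates them. That is exactly the behaviour an implementation would have to tailor if it wanted WJ to mark a boundary for whole word search.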

I believe that the UAXes, like the whole Standard, are not there to decree, as Richard puts it, but to promote knowledge and to share a number of useful rules, given in accordance with practice and real needs. Perhaps some sentences could be rewritten for clarification, to match reality even more closely.

Perhaps, too, we should reconsider what we mean by the expression “word boundary”. It is a bit ambiguous because UIs are designed to meet different needs, and because in English the apostrophe is often part of the sequences it stands between. If I'm right, U+2019 or U+02BC in _month’s_ is expected to indicate a word boundary, and a search for the whole word _month_ will succeed, while _won’t_ in the UAX #29 example is *one* word, and searching for a supposed word _won_ makes no sense (and will fail). However, both are selected as a whole by Shift+Ctrl+LEFT/RIGHT ARROW.


[For the archive: Please refer to last month’s thread _A new take on the English apostrophe in Unicode_. About the difference between quick cursor movement and double-click selection vs "whole word" search, please refer to my previous e-mails.]

Definitely, word boundaries are found with a whole word search (see UAX #29, again).


Marcel

From kenwhistler at att.net  Fri Jul  3 13:23:42 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Fri, 03 Jul 2015 11:23:42 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <BL2PR03MB1148BF6281CB167BC400730D5960@BL2PR03MB114.namprd03.prod.outlook.com>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <003401d0b4be$3af16970$b0d43c50$@fi>
 <BL2PR03MB1148BF6281CB167BC400730D5960@BL2PR03MB114.namprd03.prod.outlook.com>
Message-ID: <5596D32E.9030403@att.net>


On 7/2/2015 5:56 PM, Peter Constable wrote:
>
> Erkki, in this case, I think Philippe is making valid points.
>
> -For the proposal to be workable requires some means of ensuring 
> stability of encoded representations. The way this would be done would 
> be for CLDR to provide data with all valid sequences --- effectively 
> becoming a registry.
>

I think that is wrong on a couple of grounds.

First, detailed stability of reference to actual defined geopolitical
entities or particular detailed flag designs is not *required* for a
proposal to represent *pictographs* of flags by some sequence of Unicode
characters to be "workable". Sure, more stability of reference is
desirable. But the current RIS pair mechanism for representing flag
pictographs for countries is already "workable" -- it works and is
widely deployed and widely used -- without having guarantees that some
particular country may not decide tomorrow to change its official flag
and hence result in some particular pictographic display being obsolete
in some sense, for example.
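The RIS pair mechanism referred to here fits in a few lines of Python. Each ASCII letter A..Z maps to REGIONAL INDICATOR SYMBOL LETTER A..Z (U+1F1E6..U+1F1FF), so the pair encodes only the two letters of the region code. Which pictograph a font shows for the pair, and whether that pictograph follows later changes to the flag, is left to the vendor. The function name `region_to_ris` is hypothetical.

```python
RIS_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def region_to_ris(code: str) -> str:
    """Map a two-letter region code such as 'FR' to its RIS pair."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    # Shift each letter from the ASCII block into the regional indicator block.
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code)


pair = region_to_ris("FR")
print(" ".join(f"U+{ord(c):04X}" for c in pair))  # prints: U+1F1EB U+1F1F7
```

Nothing in the encoded pair records the flag's design. That is why a country changing its flag does not break the encoding, only some particular rendering of it.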

Second, the horse is already out of the barn regarding the particular
data that CLDR would be referring to. This works by reference to
the ISO 3166-2 scheme of subdivisions:

https://en.wikipedia.org/wiki/ISO_3166-2

and *that* becomes the registry required for stability of representations,
plus whatever grandfathering stability-of-code mechanism BCP 47
adds on top of that. We don't require a further detailed level of
registration, I think, to make this workable. If the New Zealand
Hawke's Bay Regional Council (NZ-HKB) decided it needed a district
flag (or decided to change one it may already have), I'm not going to be
overly concerned about the details there. As long as
<base, tag-N, tag-Z, tag-H, tag-K, tag-B> has a stable definition as
a Unicode extended flag tag sequence, it is up to somebody else to
decide if they want to actually map a Hawke's Bay flag /pictograph/ in a
font to that sequence -- or update the flag pictograph they may have
been using.
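The <base, tag-N, tag-Z, tag-H, tag-K, tag-B> notation can be sketched in Python. This rests on several assumptions. The base character is taken to be U+1F3F3 WAVING WHITE FLAG, as mentioned elsewhere in this thread. Each letter or digit of the ISO 3166-2 code becomes the TAG character at U+E0000 plus its ASCII value, which lands in U+E0030..U+E005A. The hyphen is dropped, and no terminator is appended, following Ken's notation. The helper `flag_tag_sequence` is hypothetical, and the final form of any such mechanism may differ.

```python
# Assumptions (not confirmed by the thread): base U+1F3F3 WAVING WHITE FLAG,
# followed by TAG characters for the code's letters/digits, no terminator.
WAVING_WHITE_FLAG = 0x1F3F3
TAG_OFFSET = 0xE0000  # TAG characters mirror ASCII: U+E0000 + ord(ch)


def flag_tag_sequence(code: str, base: int = WAVING_WHITE_FLAG) -> str:
    """Build a hypothetical flag tag sequence for a code such as 'NZ-HKB'."""
    letters = code.replace("-", "").upper()
    for ch in letters:
        # Accept only characters that map to TAG DIGIT ZERO..NINE or
        # TAG LATIN CAPITAL LETTER A..Z.
        if not ("0" <= ch <= "9" or "A" <= ch <= "Z"):
            raise ValueError(f"{ch!r} has no TAG counterpart in U+E0030..U+E005A")
    return chr(base) + "".join(chr(TAG_OFFSET + ord(ch)) for ch in letters)


seq = flag_tag_sequence("NZ-HKB")
print(" ".join(f"U+{ord(c):04X}" for c in seq))
# prints: U+1F3F3 U+E004E U+E005A U+E0048 U+E004B U+E0042
```

The sequence's definition depends only on the code string. Whether any font maps it to a Hawke's Bay pictograph is a separate, vendor-side decision.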

Yeah, this could be a giant headache for any vendor that felt they
had to support *every possible* region/subdivision sequence
and keep the exact representations of flag pictographs stable. But
I predict this will very, very quickly result in people making a
"let's cover the 99% case" set of decisions, and then issues like
"Should we display a flag pictograph for the Hawke's Bay Regional
Council?" will be dealt with by the normal methods of triage for
feature requests.


> -The concepts being denoted are inherently political, often unstable, 
> and sometimes highly sensitive.
>

> Sensitive issues aside, a better approach would be to have a URN 
> tagging scheme --- which IMO begs the question why this is a Unicode 
> topic as it clearly crosses outside the limits of plain text.
>

A URN tagging scheme might make sense if what we were trying to
do was delegating all identity concerns to external authority,
and if we didn't care about efficiency of representation, either.

I don't think that is what this is about, as I tried to make clear yesterday.
I don't think we are encoding *flags* -- we are creating a mechanism
for the reliable representation of a set of *pictographs (emoji) for flags*.
And those pictographs for flags need an efficient representation that
can coexist comfortably with the rest of plain text -- the way the RIS
pairs already do.

> Sensitive issues considered, though, it begs the question as to 
> whether Unicode should be considering any of this at all, no matter 
> what the scheme for encoded representation may be. Someone helpfully 
> reminded us of this:
>
> >> [...] the UTC does not wish to entertain further proposals for
> >> encoding of symbol characters for flags, whether national, state,
> >> regional, international, or otherwise. References to UTC Minutes:
> >> [134-C2], January 28, 2013.
>

I believe that that statement (and the referenced decision) refer
specifically to the unwillingness of the UTC to entertain proposals
for encoding an indefinite number of pictographs for flags (of
whatever variety) *as symbol characters* -- that is,
one-by-one encodings as a single, gc=So code point in the standard.
Heading that direction is clearly not an efficient way to deal with
the concern, and would waste everybody's time in one-by-one
proposals and ad hoc decisions for each individual flag pictograph
to be added.

The UTC has a long history of putting a stake in the ground when it
encounters a character encoding problem which requires a *general*
solution, rather than a dribbling in of one-off decisions an item
at a time. And I think the tag proposal for dealing with
the representation of flag pictographs for regional subdivisions
shows precisely the kind of generality that we are looking for --
dealing with hundreds of potentially representable entities
with a general mechanism, rather than trying to encode them
all one-by-one.

Incidentally, back to the ostensible topic of this thread -- I don't
think the extended flag tag proposal currently addresses the
issue of how to represent a pictograph for a rainbow flag.
In that case I think a new registry mechanism might in fact make
sense -- and I have spelled out details of how one could reasonably
work in conjunction with the extended flag tag proposal in
feedback submitted on PRI #299.

--Ken


> Peter
>
>


From richard.wordingham at ntlworld.com  Fri Jul  3 13:31:43 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 3 Jul 2015 19:31:43 +0100
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36>
References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net>
 <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18>
 <20150702190244.789e44af@JRWUBU2>
 <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36>
Message-ID: <20150703193143.1fa823db@JRWUBU2>

On Fri, 3 Jul 2015 17:19:13 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> On Thu, Jul 02, 2015, Richard Wordingham  wrote:

> > This only applies where it is traditional to separate words, a habit
> > the Romans got out of and the Irish revived.
 
> IMHO the case is a bit different in handwritten or engraved text vs
> word processing.

For your information, the Thais, Burmese and Cambodians use word
processors. Look up line-breaking category SA for modern, mainstream
examples of writing systems where words are not separated by spaces or
any other character. 

Richard.

From doug at ewellic.org  Fri Jul  3 14:50:51 2015
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 3 Jul 2015 13:50:51 -0600
Subject: PRI #299 (was: Re: Adding RAINBOW FLAG to Unicode)
Message-ID: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell>

Leo Broukhis <leob at mailcom dot com> wrote:

> What I don't like about PRI #299 is its proposing to use default-
> ignorable characters. On a non-vexillology-aware platform, I'd like
> to see something informative, albeit not resembling a flag, but
> indicative of the intention to display a flag, like RIS can be, as
> opposed to nondescript white flags.

This is just a personal prediction, but I'd guess that once the PRI #299 
mechanism hits the streets, U+1F3F3 WAVING WHITE FLAG will be used 
overwhelmingly for tag sequences and comparatively seldom on its own. 
When a reader sees 🏳, it might be relatively safe to assume the writer 
intended to display a specific flag.

I don't know what the original impetus for adding U+1F3F3 was. That 
might help us predict how popular U+1F3F3 will be on its own. Maybe one 
of the Emoji Gurus can help out here.

--
Doug Ewell | http://ewellic.org | Thornton, CO ????


From asmus-inc at ix.netcom.com  Fri Jul  3 18:28:43 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Fri, 3 Jul 2015 16:28:43 -0700
Subject: PRI #299 (was: Re: Adding RAINBOW FLAG to Unicode)
In-Reply-To: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell>
References: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell>
Message-ID: <55971AAB.4040203@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150703/1790e992/attachment.html>

From leob at mailcom.com  Fri Jul  3 23:14:07 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Fri, 3 Jul 2015 21:14:07 -0700
Subject: PRI #299 (was: Re: Adding RAINBOW FLAG to Unicode)
In-Reply-To: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell>
References: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell>
Message-ID: <CAFmvRseB_E3cetoPyu=74c2SdTxBipmT8Z5Ks=-PmNNUK=7D1g@mail.gmail.com>

On Fri, Jul 3, 2015 at 12:50 PM, Doug Ewell <doug at ewellic.org> wrote:
> Leo Broukhis <leob at mailcom dot com> wrote:
>
>> What I don't like about PRI #299 is its proposing to use default-
>> ignorable characters. On a non-vexillology-aware platform, I'd like
>> to see something informative, albeit not resembling a flag, but
>> indicative of the intention to display a flag, like RIS can be, as
>> opposed to nondescript white flags.
>
>
> This is just a personal prediction, but I'd guess that once the PRI #299
> mechanism hits the streets, U+1F3F3 WAVING WHITE FLAG will be used
> overwhelmingly for tag sequences and comparatively seldom on its own. When a
> reader sees 🏳, it might be relatively safe to assume the writer intended to
> display a specific flag.

But then a reader will have to look at the raw Unicode bytestream to
find out *which* specific flag was intended.
How convenient is that?
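The region code is not lost in such a sequence, though. Each TAG character is an ASCII character shifted by U+E0000, so a program could surface the code as plain text instead of showing a bare white flag. A Python sketch, assuming the same layout as the PRI #299 proposal described at the top of this thread (U+1F3F3 followed by TAG characters in U+E0030..U+E005A); `decode_flag_tags` is a hypothetical helper, not part of any library.

```python
import re

# Assumed layout: U+1F3F3 WAVING WHITE FLAG followed by one or more TAG
# characters in U+E0030..U+E005A (TAG DIGIT ZERO .. TAG LATIN CAPITAL LETTER Z).
FLAG_TAGS = re.compile("\U0001F3F3([\U000E0030-\U000E005A]+)")


def decode_flag_tags(text: str) -> list[str]:
    """Return the ASCII code carried by each tagged flag found in `text`."""
    return [
        "".join(chr(ord(tag) - 0xE0000) for tag in tags)
        for tags in FLAG_TAGS.findall(text)
    ]


sample = "Go \U0001F3F3\U000E004E\U000E005A\U000E0048\U000E004B\U000E0042 team!"
print(decode_flag_tags(sample))
```

A bare U+1F3F3 with no TAG characters after it yields nothing, so an ordinary white flag is left alone.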

Leo


From kenwhistler at att.net  Fri Jul  3 23:38:16 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Fri, 03 Jul 2015 21:38:16 -0700
Subject: PRI #299
In-Reply-To: <CAFmvRseB_E3cetoPyu=74c2SdTxBipmT8Z5Ks=-PmNNUK=7D1g@mail.gmail.com>
References: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell>
 <CAFmvRseB_E3cetoPyu=74c2SdTxBipmT8Z5Ks=-PmNNUK=7D1g@mail.gmail.com>
Message-ID: <55976338.4010500@att.net>


On 7/3/2015 9:14 PM, Leo Broukhis wrote:
> On Fri, Jul 3, 2015 at 12:50 PM, Doug Ewell <doug at ewellic.org> wrote:
>> Leo Broukhis <leob at mailcom dot com> wrote:
>>
>>> What I don't like about PRI #299 is its proposing to use default-
>>> ignorable characters. On a non-vexillology-aware platform, I'd like
>>> to see something informative, albeit not resembling a flag, but
>>> indicative of the intention to display a flag, like RIS can be, as
>>> opposed to nondescript white flags.
> But then a reader will have to look at the raw Unicode bytestream to
> find out *which* specific flag was intended.
> How convenient is that?
>

Ah, but on a "non-vexillology-aware platform", if it is just ignoring
all this vexatious trouble of mapping the tag sequences to identifiable
flag pictographs, it's just as likely that the fonts/renderers
involved won't do anything comprehensible with any new
non-default-ignorable metacharacter additions, either -- particularly as
they would be Unicode 10.0+ additions to the standard. So the most
likely display would end up looking more like: ? ? ? ? ?

How convenient is that?

--Ken



From leob at mailcom.com  Fri Jul  3 23:52:52 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Fri, 3 Jul 2015 21:52:52 -0700
Subject: PRI #299
In-Reply-To: <55976338.4010500@att.net>
References: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell>
 <CAFmvRseB_E3cetoPyu=74c2SdTxBipmT8Z5Ks=-PmNNUK=7D1g@mail.gmail.com>
 <55976338.4010500@att.net>
Message-ID: <CAFmvRsenYTfo+SZxn4f21XdrYKqpkeBz_0aoOcv=zBouFfJ3Rw@mail.gmail.com>

Most platforms display unknown printable characters as white
rectangles with hex digits in them.
In Doug's message, I saw a rectangle with 01F in the upper row, and
3F3 in the lower row.
Moreover, on any platform when users see unknown characters, they
search for a font, install it and are able to see in cleartext at
least something they can make sense of. For a RIS or any other
non-default-ignorable character on a non-vexillology-aware platform, a
font with stylized letters would be sufficient to read the intent of
the writer, and, as a free extra, to tell apart Liechtenstein and
Haiti without squinting.
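Both fallbacks described here are mechanical, as a Python sketch shows. `hex_box_rows` reproduces the two-row hex layout reported for U+1F3F3, with six hex digits split 3+3. That layout is an assumption based on this one description, and real last-resort glyphs vary by platform. `ris_as_letters` stands in for a font with stylized letters by mapping each regional indicator symbol back to its letter, so LI and HT stay distinct. Both helper names are hypothetical.

```python
RIS_A, RIS_Z = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z


def hex_box_rows(ch: str) -> tuple[str, str]:
    """Split a code point's six hex digits into two 3-digit rows."""
    digits = f"{ord(ch):06X}"
    return digits[:3], digits[3:]


def ris_as_letters(text: str) -> str:
    """Replace each regional indicator symbol with its plain ASCII letter."""
    return "".join(
        chr(ord("A") + ord(c) - RIS_A) if RIS_A <= ord(c) <= RIS_Z else c
        for c in text
    )


print(hex_box_rows("\U0001F3F3"))
print(ris_as_letters("\U0001F1F1\U0001F1EE vs \U0001F1ED\U0001F1F9"))
```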



From charupdate at orange.fr  Sat Jul  4 10:02:00 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 4 Jul 2015 17:02:00 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <20150703193143.1fa823db@JRWUBU2>
References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net>
 <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18>
 <20150702190244.789e44af@JRWUBU2>
 <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36>
 <20150703193143.1fa823db@JRWUBU2>
Message-ID: <486299455.13899.1436022120042.JavaMail.www@wwinf1j32>

On Fri, Jul 03, 2015, Richard Wordingham  wrote:

> On Fri, 3 Jul 2015 17:19:13 +0200 (CEST)
> Marcel Schneider  wrote:
> 
> > On Thu, Jul 02, 2015, Richard Wordingham wrote:
> 
> > > This only applies where it is traditional to separate words, a habit
> > > the Romans got out of and the Irish revived.
> 
> > IMHO the case is a bit different in handwritten or engraved text vs
> > word processing.
> 
> For your information, the Thais, Burmese and Cambodians use word
> processors. Look up line-breaking category SA for modern, mainstream
> examples of writing systems where words are not separated by spaces or
> any other character. 

I considered not replying any more in this unfair dialogue, where, after bringing up some historic examples to make me think about them, Richard switches back to the present and makes people believe I could suppose that any country could prefer to use means other than the world standard.
I already mentioned in this thread that I do not have any knowledge of Thai, and in another thread, that my scope is *latin* keyboard layouts.
Now let's come to the core: Why on earth do we need word boundaries for whole word search in Latin script, while the Thai, Burmese and Cambodian scripts Richard mentions as examples use implementations that can find whole words without any need of "spaces or any other [separating] character"?

Best wishes,
Marcel

From doug at ewellic.org  Sat Jul  4 12:13:21 2015
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 4 Jul 2015 11:13:21 -0600
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <mailman.1.1436029201.9470.unicode@unicode.org>
References: <mailman.1.1436029201.9470.unicode@unicode.org>
Message-ID: <4FC92E8938644B21AA3587A879E9378F@DougEwell>

Ken Whistler <kenwhistler at att dot net> wrote:

> But the current RIS pair mechanism for representing flag pictographs
> for countries is already "workable" -- it works and is widely deployed
> and widely used -- without having guarantees that some particular
> country may not decide tomorrow to change its official flag and hence
> result in some particular pictographic display being obsolete in some
> sense, for example.

Which brings up a counterpoint to gfb hjjhjh's earlier point:

Suppose a Twitter user wants to use "the emoticon with country X's flag 
on it to show support for its current government," then the government 
is overthrown by an enemy which KEEPS the existing flag, forcing the 
government-in-exile to adopt a different flag? Now, the user who put the 
existing flag in her tweets appears to be showing support for the enemy.

This is what happened in France during World War II, except of course 
for the emoticon and Twitter and that.

--
Doug Ewell | http://ewellic.org | Thornton, CO ???? 


From richard.wordingham at ntlworld.com  Sat Jul  4 13:20:05 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 4 Jul 2015 19:20:05 +0100
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <486299455.13899.1436022120042.JavaMail.www@wwinf1j32>
References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net>
 <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18>
 <20150702190244.789e44af@JRWUBU2>
 <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36>
 <20150703193143.1fa823db@JRWUBU2>
 <486299455.13899.1436022120042.JavaMail.www@wwinf1j32>
Message-ID: <20150704192005.474faf4a@JRWUBU2>

On Sat, 4 Jul 2015 17:02:00 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> On Fri, Jul 03, 2015, Richard Wordingham  wrote:
> 
> > On Fri, 3 Jul 2015 17:19:13 +0200 (CEST)
> > Marcel Schneider  wrote:
 
> I considered not to reply any more in this unfaithful dialogue, where
> after bringing up some historic examples to make me think about them,
> Richard switches back to present and makes people believe I could
> suppose that any country could prefer the use of other means than
> what's world standard.

I cannot work out what you think I am making people believe you might
suppose.  I was pointing out that not everyone uses visible word
boundaries.  I will also note that people are reluctant to type
invisible characters if they don't have immediate benefits.

> Now lets come to the core: Why on earth
> do we need word boundaries for whole word search in Latin script,
> while Thai, Burmese and Cambodian scripts Richard mentions as
> examples, use implementations that can find whole words without any
> need of "spaces or any other [separating] character"?

The Thai and Cambodian implementations are far from perfect, even when
applied to the Thai and Cambodian languages.  Using a dictionary for
the national languages on text of other languages naturally has even
worse performance.  A quick experiment suggests that for whole word
search in Thai, LibreOffice simply ignores any boundaries between Thai
word characters.  Double click and ctrl/arrow use different rules.
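Dictionary-based segmentation of unspaced text can be illustrated with a toy greedy longest-match segmenter in Python. It is deliberately naive and uses a hypothetical two-word dictionary. It does not show how LibreOffice or any other product works, and real segmenters use large dictionaries and smarter matching. Text that the dictionary does not cover falls apart into single characters, which is the kind of degradation described above for text in other languages.

```python
def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation of text written without spaces."""
    longest = max(map(len, dictionary))
    pieces: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in dictionary:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit a single character.
            pieces.append(text[i])
            i += 1
    return pieces


thai_words = {"ภาษา", "ไทย"}  # "language", "Thai"
print(segment("ภาษา" + "ไทย", thai_words))
print(segment("ภาษา" + "xyz", thai_words))
```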

It's quite possible that we are misinterpreting the results of whole
word searches.  One way of implementing whole word search is to do a
general search and then check whether the word found is part of a
larger word.  To do that, one might simply ask whether the
characters before and after the string found are permitted in words.
One might easily set things up so that by omission U+2060 is not
considered part of a word - the code could have been written before
U+2060 was assigned and not updated since.
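The implementation strategy described here can be sketched in Python: search for the string, then reject hits whose neighbours are word characters. With a word-character test written without U+2060 in mind, a search for the whole word "one" succeeds inside `"one\u2060two"`. If format characters count as word-internal, the same search fails. `whole_word_find` and both predicates are hypothetical, and neither is claimed to be what Word, LibreOffice or Adobe Reader actually does.

```python
import unicodedata
from collections.abc import Callable


def is_word_char_legacy(ch: str) -> bool:
    # Hypothetical older rule: only letters and digits belong to words.
    return ch.isalnum()


def is_word_char_wb4(ch: str) -> bool:
    # Also treat format characters such as U+2060 WORD JOINER as word-internal.
    return ch.isalnum() or unicodedata.category(ch) == "Cf"


def whole_word_find(text: str, word: str, is_word_char: Callable[[str], bool]) -> int:
    """Plain search, then check the characters just before and after each hit."""
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            return start
        start = text.find(word, start + 1)
    return -1


text = "one\u2060two"
print(whole_word_find(text, "one", is_word_char_legacy))
print(whole_word_find(text, "one", is_word_char_wb4))
```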

Richard.


From charupdate at orange.fr  Mon Jul  6 06:36:31 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Mon, 6 Jul 2015 13:36:31 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <20150704192005.474faf4a@JRWUBU2>
References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net>
 <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18>
 <20150702190244.789e44af@JRWUBU2>
 <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36>
 <20150703193143.1fa823db@JRWUBU2>
 <486299455.13899.1436022120042.JavaMail.www@wwinf1j32>
 <20150704192005.474faf4a@JRWUBU2>
Message-ID: <413346311.9685.1436182591794.JavaMail.www@wwinf1h21>

On Sat, Jul 04, 2015, Richard Wordingham wrote:

> I will also note that people are reluctant to type
> invisible characters if they don't have immediate benefits.

This might be the reason why U+2060 was not properly implemented from the outset in word processors, whose users were not expected to use it. As has already been pointed out, in my version of Word U+2060 is font-dependent, which it should not be, and the fallback isn't well set (nor is it for U+205D TRICOLON, BTW). Meanwhile, in typography, where the usefulness of a word joiner is obvious, other software is used. By contrast, later versions of word processing applications, no matter which software house they come from, would have undergone in-depth changes, including text segmentation tailoring.

> The Thai and Cambodian implementations are far from perfect, even when
> applied to the Thai and Cambodian languages.  Using a dictionary for
> the national languages on text of other languages naturally has even
> worse performance.  A quick experiment suggests that for whole word
> search in Thai, LibreOffice simply ignores any boundaries between Thai
> word characters.  Double click and ctrl/arrow use different rules.

When Doug Ewell wrote on Tue Jun 30, 2015 that clicking on either part of 'one\u2060two' selects the whole, I didn't check on my version, taking it as a matter of fact. Now I have, and I'm astonished to see only *one* part selected. Consequently, between Word 97 (the full version on which Word 2010 Starter is based, if I remember correctly what I've read somewhere) and Word 2010, even the rules for double-click and ctrl/arrow must have been changed to better meet users' needs and expectations. From this, and from some of the bugs having been fixed prior to Word 2013 (as I've been told on Microsoft Community), I extrapolate, without hasty generalization, that Word 2016 could eventually be the well-performing version I have been expecting ever since I started word processing.

> It's quite possible that we are misinterpreting the results of whole
> word searches. One way of implementing whole word search is to do a
> general search and then check whether the word found is part of a
> larger word. ?To do that, one might simply ask whether the
> characters before and after the string found are permitted in words.
> One might easily set things up so that by omission U+2060 is not
> considered part of a word - the code could have been written before
> U+2060 was assigned and not updated since.
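The check Richard describes (search first, then test the neighbouring characters) can be sketched in Python. This is a hypothetical illustration, not code taken from Word or LibreOffice: `is_word_char` is an assumed predicate that, by omission, treats U+2060 (General_Category Cf) as a non-word character, so a whole-word search for "one" succeeds inside 'one\u2060two'.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    # Hypothetical, deliberately naive predicate: letters (L*), marks (M*),
    # numbers (N*) and connector punctuation (Pc) count as word characters.
    # U+2060 WORD JOINER is Cf, so it is excluded "by omission".
    cat = unicodedata.category(ch)
    return cat[0] in "LMN" or cat == "Pc"


def find_whole_word(text: str, word: str) -> list[int]:
    """Return the start offsets where `word` occurs as a whole word."""
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        # Accept the match only if the characters on either side
        # (if any) are not word characters.
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(word, start + 1)
    return hits


print(find_whole_word("one\u2060two", "one"))  # the joiner acts as a boundary
```

With a predicate like this, U+2060 ends up behaving as a word boundary for search even though its purpose is to prevent one.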

Indeed, perhaps we are dealing with obsolete behavior. I wonder whether Word 2010, which already overrides U+2060 for word selection and quick cursor movement, does the same in whole word search. Personally I'd prefer that it did not, because I believe that this isn't useful. So I agree with OpenOffice/LibreOffice (tested version of the latter: 4.2.4.2), which don't. Nor does Adobe Reader. By deduction, I now suppose that Microsoft Word actually doesn't either.

Thank you for the information about the Thai and Cambodian implementations. I think it would be right to prioritize updates for those implementations which "are far from perfect", given that they still exist(!), so that everybody on earth can benefit from truly high-performing tools.

Marcel

From doug at ewellic.org  Mon Jul  6 10:18:59 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jul 2015 08:18:59 -0700
Subject: PRI #299
Message-ID: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net>

Leo Broukhis <leob at mailcom dot com> wrote:

> Most platforms display unknown printable characters as white
> rectangles with hex digits in them.
> In Doug's message, I saw a rectangle with 01F in the upper row, and
> 3F3 in the lower row.

This is a handy feature, at least for character geeks like us, but "most
platforms" might be a bit misleading here. There is a rather commonly
used platform that starts with the letter W which does not do this.
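For reference, the hex-box fallback Leo describes can be sketched as follows, assuming the two-row layout he reports: six hex digits split three per row for a supplementary code point, and (an assumption not stated in the thread) four digits split two per row for a BMP code point. `hex_box_rows` is a hypothetical name; real renderers differ in the details.

```python
def hex_box_rows(ch: str) -> tuple[str, str]:
    """Split a character's code point into the two rows of a hex box."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        digits = f"{cp:04X}"  # BMP: four digits, two per row
    else:
        digits = f"{cp:06X}"  # supplementary: six digits, three per row
    half = len(digits) // 2
    return digits[:half], digits[half:]


# U+1F3F3 WAVING WHITE FLAG, the character in Leo's report
print(hex_box_rows("\U0001F3F3"))  # ('01F', '3F3')
```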

> Moreover, on any platform when users see unknown characters, they
> search for a font, install it and are able to see in cleartext at
> least something they can make sense of. For a RIS or any other
> non-default-ignorable character on a non-vexillology-aware platform, a
> font with stylized letters would be sufficient to read the intent of
> the writer, and, as a free extra, to tell apart Liechtenstein and
> Haiti without squinting.

I think a useful bit of feedback on PRI #299 would be to inquire whether
it is, in fact, a design goal to handle this use case of transparency of
the individual letters on platforms, rendering engines, and/or fonts
that don't support flag-tag composition. (Please, not
"non-vexillology-aware." None of these platforms studies or analyzes
flags. They assemble multiple characters into a single image.)

If transparency on flag-tag-unaware platforms is not a design goal, it
might be difficult to make the case that default-ignorable tag
characters are a poor choice because they don't support transparency.
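For comparison, the RIS mechanism Leo mentions maps the letters A-Z onto U+1F1E6..U+1F1FF, so a font that draws each regional indicator as a stylized letter (instead of composing a flag) lets the reader recover the two-letter code, e.g. LI versus HT. A minimal Python sketch; `ris_pair` and `ris_as_letters` are hypothetical helper names.

```python
RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def ris_pair(region: str) -> str:
    """Encode a two-letter region code such as "LI" as a pair of RIS."""
    if len(region) != 2 or not region.isascii() or not region.isalpha():
        raise ValueError(f"expected two ASCII letters, got {region!r}")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in region.upper())


def ris_as_letters(text: str) -> str:
    """Show each RIS as its plain letter, as a letter-style font might."""
    return "".join(
        chr(ord(c) - RIS_A + ord("A")) if RIS_A <= ord(c) <= RIS_A + 25 else c
        for c in text
    )


print(ris_as_letters(ris_pair("LI") + " / " + ris_pair("HT")))  # LI / HT
```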

--
Doug Ewell | http://ewellic.org | Thornton, CO


From doug at ewellic.org  Mon Jul  6 10:26:10 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jul 2015 08:26:10 -0700
Subject: Adding RAINBOW FLAG to Unicode
Message-ID: <20150706082610.665a7a7059d7ee80bb4d670165c8327d.23bc5880f2.wbe@email03.secureserver.net>

Ken Whistler <kenwhistler at att dot net> wrote:

> Incidentally, back to the ostensible topic of this thread -- I don't
> think the extended flag tag proposal currently addresses the issue
> of how to represent a pictograph for a rainbow flag.

It doesn't.

> In that case I think a new registry mechanism might in fact make sense
> -- and I have spelled out details of how one could reasonably work in
> conjunction with the extended flag tag proposal in feedback submitted
> on PRI #299.

Is this list the right place to discuss that proposal?

--
Doug Ewell | http://ewellic.org | Thornton, CO


From leob at mailcom.com  Mon Jul  6 10:53:27 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Mon, 6 Jul 2015 08:53:27 -0700
Subject: PRI #299
In-Reply-To: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net>
References: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net>
Message-ID: <CAFmvRsdCz2tQEeXbFbTbciPz3tiBzJwqrTOOVrOcVnkZujfHrw@mail.gmail.com>

On Mon, Jul 6, 2015 at 8:18 AM, Doug Ewell <doug at ewellic.org> wrote:
> Leo Broukhis <leob at mailcom dot com> wrote:
>
>> Most platforms display unknown printable characters as white
>> rectangles with hex digits in them.
>> In Doug's message, I saw a rectangle with 01F in the upper row, and
>> 3F3 in the lower row.
>
> This is a handy feature, at least for character geeks like us, but "most
> platforms" might be a bit misleading here. There is a rather commonly
> used platform that starts with the letter W which does not do this.

I was a little surprised myself when I saw it in Firefox under W7
Enterprise, but here we are.

>> Moreover, on any platform when users see unknown characters, they
>> search for a font, install it and are able to see in cleartext at
>> least something they can make sense of. For a RIS or any other
>> non-default-ignorable character on a non-vexillology-aware platform, a
>> font with stylized letters would be sufficient to read the intent of
>> the writer, and, as a free extra, to tell apart Liechtenstein and
>> Haiti without squinting.
>
> I think a useful bit of feedback on PRI #299 would be to inquire whether
> it is, in fact, a design goal to handle this use case of transparency of

Huh? What kind of deliberate design goal would it be to forgo semantics
in favor of presentation, even as a fallback behavior?
In an ideal world, where all platforms are actively maintained, and
all maintainers rush to implement the cool new features,
it could have been acceptable, but not in our world, I'm afraid.

> the individual letters on platforms, rendering engines, and/or fonts
> that don't support flag-tag composition. (Please, not
> "non-vexillology-aware." None of these platforms studies or analyzes
> flags. They assemble multiple characters into a single image.)

"Vexillology awareness" was, of course, mostly in jest.

> If transparency on flag-tag-unaware platforms is not a design goal, it
> might be difficult to make the case that default-ignorable tag
> characters are a poor choice because they don't support transparency.

Right. Then the objection should be interpreted with regard to the design goal.

Leo

From kenwhistler at att.net  Mon Jul  6 10:53:35 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Mon, 06 Jul 2015 08:53:35 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <20150706082610.665a7a7059d7ee80bb4d670165c8327d.23bc5880f2.wbe@email03.secureserver.net>
References: <20150706082610.665a7a7059d7ee80bb4d670165c8327d.23bc5880f2.wbe@email03.secureserver.net>
Message-ID: <559AA47F.5090906@att.net>


On 7/6/2015 8:26 AM, Doug Ewell wrote:
> Ken Whistler <kenwhistler at att dot net> wrote:
>
>
>> In that case I think a new registry mechanism might in fact make sense
>> -- and I have spelled out details of how one could reasonably work in
>> conjunction with the extended flag tag proposal in feedback submitted
>> on PRI #299.
> Is this list the right place to discuss that proposal?
>
>

It is fair game for discussion on this list, of course.

On the other hand, it might make sense to wait and see if it gains
any traction when the UTC meets later this month and considers
all of the feedback on the extended flag tag PRI #299 proposal together.
If the concept of a Unicode flag pictograph registry garners no
interest there, it is unlikely it would go further after that.

--Ken

From doug at ewellic.org  Mon Jul  6 10:59:14 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jul 2015 08:59:14 -0700
Subject: Adding RAINBOW FLAG to Unicode
Message-ID: <20150706085914.665a7a7059d7ee80bb4d670165c8327d.3ea7e67602.wbe@email03.secureserver.net>

Ken Whistler <kenwhistler at att dot net> wrote:

> On the other hand, it might make sense to wait and see if it gains any
> traction when the UTC meets later this month and considers all of the
> feedback on the extended flag tag PRI #299 proposal together. If the
> concept of a Unicode flag pictograph registry garners no interest
> there, it is unlikely it would go further after that.

I'll wait, since most of my comments are about details.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From doug at ewellic.org  Mon Jul  6 11:15:57 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jul 2015 09:15:57 -0700
Subject: PRI #299
Message-ID: <20150706091557.665a7a7059d7ee80bb4d670165c8327d.a63c6e403b.wbe@email03.secureserver.net>

Leo Broukhis <leob at mailcom dot com> wrote:

>> This is a handy feature, at least for character geeks like us, but
>> "most platforms" might be a bit misleading here. There is a rather
>> commonly used platform that starts with the letter W which does not
>> do this.
>
> I was a little surprised myself when I saw it in Firefox under W7
> Enterprise, but here we are.

I'm surprised too; I hadn't tried using Firefox to view these sequences.
Thanks for demonstrating this.

We may once again be stumbling over different interpretations of the
word "platform": does it refer to an operating system in general, a
specific version thereof, or a specific editor, word processor, or
browser under that OS and version?

>> I think a useful bit of feedback on PRI #299 would be to inquire
>> whether it is, in fact, a design goal to handle this use case of
>> transparency of
>
> Huh? What kind of deliberate design goal would it be to forgo semantics
> in favor of presentation, even as a fallback behavior?
> In an ideal world, where all platforms are actively maintained, and
> all maintainers rush to implement the cool new features,
> it could have been acceptable, but not in our world, I'm afraid.

I questioned whether it was a (positive) design goal to handle the
fallback case in the way you described. I did not suggest that it was a
(negative) design goal NOT to handle it, or to obscure the tag
characters, and I would suggest there is a huge difference between the
two.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From mark at macchiato.com  Mon Jul  6 11:16:17 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Mon, 6 Jul 2015 18:16:17 +0200
Subject: PRI #299
In-Reply-To: <CAFmvRsdCz2tQEeXbFbTbciPz3tiBzJwqrTOOVrOcVnkZujfHrw@mail.gmail.com>
References: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net>
 <CAFmvRsdCz2tQEeXbFbTbciPz3tiBzJwqrTOOVrOcVnkZujfHrw@mail.gmail.com>
Message-ID: <CAJ2xs_HOrnDqO7fRc8zyi_2Wo8pbkcNZRDhdMv11w+MgibKxOg@mail.gmail.com>

On Mon, Jul 6, 2015 at 5:53 PM, Leo Broukhis <leob at mailcom.com> wrote:
>> Most platforms display unknown printable characters as white
>> rectangles with hex digits in them.
>> In Doug's message, I saw a rectangle with 01F in the upper row, and
>> 3F3 in the lower row.


> > This is a handy feature, at least for character geeks like us, but "most
> > platforms" might be a bit misleading here. There is a rather commonly
> > used platform that starts with the letter W which does not do this.
>
> I was a little surprised myself when I saw it in Firefox under W7
> Enterprise, but here we are.


"Most platforms" is quite misleading.

Rather the converse: for the vast majority of people, the programs that
they use on the devices they have will *not* show unknown printable
characters in a format with readable hex digits.

Mark
<https://google.com/+MarkDavis>

From steve at swales.us  Mon Jul  6 11:42:05 2015
From: steve at swales.us (Steve Swales)
Date: Mon, 6 Jul 2015 09:42:05 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net>
References: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net>
Message-ID: <62291D50-840E-4451-A4EE-53A006CCBCD5@swales.us>

Or a flag inversion modifier? Recently I discovered that the Philippines flag, for example, has a special meaning (we are at war) when inverted. Just a thought.

-steve

> On Jul 1, 2015, at 10:38 AM, Doug Ewell <doug at ewellic.org> wrote:
> 
> <dzo at bisharat dot net> wrote:
> 
>> Whatever notation that might be added to whatever decision is
>> ultimately made on this should probably mention historic use of the
>> rainbow flag by the peace movement. See for example:
>> 
>> https://en.wikipedia.org/wiki/Peace_flag#Rainbow_flag
> 
> The colors of the rainbow peace flag (purple on top) are often inverted
> with respect to the LGBT flag (red on top), making them essentially two
> different flags.
> 
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
> 
> 


From doug at ewellic.org  Mon Jul  6 11:53:48 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jul 2015 09:53:48 -0700
Subject: Adding RAINBOW FLAG to Unicode
Message-ID: <20150706095348.665a7a7059d7ee80bb4d670165c8327d.9ec08d1f9c.wbe@email03.secureserver.net>

Steve Swales <steve at swales dot us> wrote:

> Or a flag inversion modifier? Recently I discovered that the
> Philippines flag, for example, has a special meaning (we are at war)
> when inverted. Just a thought.

An inverted ensign on a ship was formerly used as a distress signal:
http://www.crwflags.com/fotw/flags/xf-flip.html

I'd argue strongly against adding "modifiers" to Unicode flag tags to
indicate inverted, waving, half-staff, folded, or any other transient
state.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From asmus-inc at ix.netcom.com  Mon Jul  6 14:24:18 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Mon, 6 Jul 2015 12:24:18 -0700
Subject: PRI #299
In-Reply-To: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net>
References: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net>
Message-ID: <559AD5E2.5020204@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150706/6d0aebb2/attachment.html>

From asmus-inc at ix.netcom.com  Mon Jul  6 14:52:04 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Mon, 6 Jul 2015 12:52:04 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <62291D50-840E-4451-A4EE-53A006CCBCD5@swales.us>
References: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net>
 <62291D50-840E-4451-A4EE-53A006CCBCD5@swales.us>
Message-ID: <559ADC64.80505@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150706/51951106/attachment.html>

From doug at ewellic.org  Mon Jul  6 15:11:35 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jul 2015 13:11:35 -0700
Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode)
Message-ID: <20150706131135.665a7a7059d7ee80bb4d670165c8327d.b0fde2cbd7.wbe@email03.secureserver.net>

Asmus Freytag (t) <asmus dash inc at ix dot netcom dot com> wrote:

> Rather than modifiers, I think a more natural thing would be to have
> different base characters that reflect whether it's a flag, a pennant,
> waving, flying from a flag stock or whatever other variety.
>
> Base characters could be limited to an "approved" list, which could be
> extended as needed to cater to actual demand.
>
> In this context, I dislike the current proposal to use a WAVING flag
> as a base character for non-waving plain and rectangular images of
> flags.

Is it your belief that users who wish to display an emoji flag care
whether the flag is shown stationary versus flapping in the wind?

What would be the compatibility solution for the existing set of emoji
flags supported by RIS? Some carriers already show them rectangular,
while others already show them waving:

http://unicode.org/emoji/charts/full-emoji-list.html#1f1e6_1f1eb

--
Doug Ewell | http://ewellic.org | Thornton, CO


From leoboiko at namakajiri.net  Mon Jul  6 15:20:58 2015
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 6 Jul 2015 17:20:58 -0300
Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to
 Unicode)
In-Reply-To: <20150706131135.665a7a7059d7ee80bb4d670165c8327d.b0fde2cbd7.wbe@email03.secureserver.net>
References: <20150706131135.665a7a7059d7ee80bb4d670165c8327d.b0fde2cbd7.wbe@email03.secureserver.net>
Message-ID: <CAJ6uix6hLxdnBYCiujViCqu2Rs-KjqF7GZ95fXphiDyGGC8Fbg@mail.gmail.com>

2015-07-06 17:11 GMT-03:00 Doug Ewell <doug at ewellic.org>:
> Is it your belief that users who wish to display an emoji flag care
> whether the flag is shown stationary versus flapping in the wind?

I think a waving white flag is an emoji symbol for
"truce/surrender/come in peace", whereas a white rectangle doesn't
easily transmit the same idea.

From doug at ewellic.org  Mon Jul  6 15:40:22 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jul 2015 13:40:22 -0700
Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to
 Unicode)
Message-ID: <20150706134022.665a7a7059d7ee80bb4d670165c8327d.304da8751c.wbe@email03.secureserver.net>

Leonardo Boiko <leoboiko at namakajiri dot net> wrote:

>> Is it your belief that users who wish to display an emoji flag care
>> whether the flag is shown stationary versus flapping in the wind?
>
> I think a waving white flag is an emoji symbol for
> "truce/surrender/come in peace", whereas a white rectangle doesn't
> easily transmit the same idea.

I don't know how many other flags have different semantics depending on
whether they are waving or not. I note that neither RIS pairs nor PRI
#299 sequences can encode a plain white flag (but of course the user can
simply choose between U+2690 and U+1F3F3 for that).

I hear Asmus's concern about using WAVING WHITE FLAG as the base
character for emoji flags which might not be depicted as waving.
However, in that case the solution would be to choose a different,
*single* base character. What Asmus wanted was

> to have different base characters that reflect whether it's a flag, a
> pennant, waving, flying from a flag stock or whatever other variety

and this is the problem I don't think can be solved, either with RIS
flags or with PRI #299 flags, regardless of the choice of base
character. Different platforms already show (e.g.) the French flag as
either waving or not waving.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From gwalla at gmail.com  Mon Jul  6 15:55:20 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 6 Jul 2015 13:55:20 -0700
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <559ADC64.80505@ix.netcom.com>
References: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net>
 <62291D50-840E-4451-A4EE-53A006CCBCD5@swales.us>
 <559ADC64.80505@ix.netcom.com>
Message-ID: <CA+p4_H1RwN9YVVQGR2+JLivsxaPUD1AEvbpJg2t39zYtKbgNrw@mail.gmail.com>

On Mon, Jul 6, 2015 at 12:52 PM, Asmus Freytag (t)
<asmus-inc at ix.netcom.com> wrote:
> On 7/6/2015 9:42 AM, Steve Swales wrote:
>
> Or a flag inversion modifier? Recently I discovered that the Philippines
> flag, for example, has a special meaning (we are at war) when inverted.
> Just a thought.
>
>
> Rather than modifiers, I think a more natural thing would be to have
> different base characters that reflect whether it's a flag, a pennant,
> waving, flying from a flag stock or whatever other variety.
>
> Base characters could be limited to an "approved" list, which could be
> extended as needed to cater to actual demand.
>
> In this context, I dislike the current proposal to use a WAVING flag as a
> base character for non-waving plain and rectangular images of flags.
> A./

I'm concerned that the proposed base is a white flag, which usually
means "surrender". It seems like there's some potential for
miscommunication there.


From doug at ewellic.org  Mon Jul  6 16:31:07 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jul 2015 14:31:07 -0700
Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to
 Unicode)
Message-ID: <20150706143107.665a7a7059d7ee80bb4d670165c8327d.c065cd7fa2.wbe@email03.secureserver.net>

I wrote:

> I hear Asmus's concern about using WAVING WHITE FLAG as the base
> character for emoji flags which might not be depicted as waving.

I suppose there's no particular reason why U+2690 can't be the base
character instead.

But then Garth Wallace <gwalla at gmail dot com> wrote:

> I'm concerned that the proposed base is a white flag, which usually
> means "surrender". It seems like there's some potential for
> miscommunication there.

If the intrinsic meaning of the base character in isolation is a problem
-- people using flag-tag-unaware systems will see a white flag and
assume it means "surrender" -- then there aren't any existing encoded
flag characters that are any better.

Black flags have historically had a wide variety of meanings as well --
mourning, anarchy, Italian fascism, race car driver disqualified, etc.
So substituting U+1F3F4 or U+2691 won't help. All of the other existing
flag symbol characters have even more specific meanings, usually
annotated in TUS.

Folks who consider this a problem are probably intrigued by item 2 under
"Discussion" in the background document: encode an all-new base
character. This would delay the rollout of the mechanism, and if the new
character has a glyph that looks at all like a flag, it will likely face
the same criticism (e.g. "looks too much like the Portuguese flag").

--
Doug Ewell | http://ewellic.org | Thornton, CO


From gwalla at gmail.com  Mon Jul  6 17:35:43 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 6 Jul 2015 15:35:43 -0700
Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to
 Unicode)
In-Reply-To: <20150706143107.665a7a7059d7ee80bb4d670165c8327d.c065cd7fa2.wbe@email03.secureserver.net>
References: <20150706143107.665a7a7059d7ee80bb4d670165c8327d.c065cd7fa2.wbe@email03.secureserver.net>
Message-ID: <CA+p4_H32y8pUN-0JDNpHc3j6ca6DRTw08Cuf6MtBYfvJQLt2Bg@mail.gmail.com>

On Mon, Jul 6, 2015 at 2:31 PM, Doug Ewell <doug at ewellic.org> wrote:
> I wrote:
>
>> I hear Asmus's concern about using WAVING WHITE FLAG as the base
>> character for emoji flags which might not be depicted as waving.
>
> I suppose there's no particular reason why U+2690 can't be the base
> character instead.

I suspect it's because WAVING WHITE FLAG is defined as having an emoji
representation and WHITE FLAG isn't.

> But then Garth Wallace <gwalla at gmail dot com> wrote:
>
>> I'm concerned that the proposed base is a white flag, which usually
>> means "surrender". It seems like there's some potential for
>> miscommunication there.
>
> If the intrinsic meaning of the base character in isolation is a problem
> -- people using flag-tag-unaware systems will see a white flag and
> assume it means "surrender" -- then there aren't any existing encoded
> flag characters that are any better.
>
> Black flags have historically had a wide variety of meanings as well --
> mourning, anarchy, Italian fascism, race car driver disqualified, etc.
> So substituting U+1F3F4 or U+2691 won't help. All of the other existing
> flag symbol characters have even more specific meanings, usually
> annotated in TUS.

That's true, none of the existing flag characters are neutral.

> Folks who consider this a problem are probably intrigued by item 2 under
> "Discussion" in the background document: encode an all-new base
> character. This would delay the rollout of the mechanism, and if the new
> character has a glyph that looks at all like a flag, it will likely face
> the same criticism (e.g. "looks too much like the Portuguese flag").

I think crosshatching would be neutral. I'm not aware of any flags
with a field of diagonal stripes; they usually only have one. Although
I suppose heraldry enthusiasts might interpret them as tinctures.

From nslater at tumbolia.org  Mon Jul  6 18:34:20 2015
From: nslater at tumbolia.org (Noah Slater)
Date: Mon, 06 Jul 2015 23:34:20 +0000
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <20150706095348.665a7a7059d7ee80bb4d670165c8327d.9ec08d1f9c.wbe@email03.secureserver.net>
References: <20150706095348.665a7a7059d7ee80bb4d670165c8327d.9ec08d1f9c.wbe@email03.secureserver.net>
Message-ID: <CA+Y+447EVAqyEBbRCRuE9+ocO9KpJ9ho1iUXNt4FyjxrJJJOGw@mail.gmail.com>

Previously in this thread, it was suggested that I make a formal proposal
to the UTC. I have held back from doing this because it's not at all clear
what implementation I should be proposing, or whether I can propose
something WITHOUT an implementation. (Some advice there would be handy!)

Should I trust that the UTC will be aware of the informal proposal of the
rainbow flag when they meet to discuss PRI #299, or should I do something
to properly bring it to their attention?

On Mon, 6 Jul 2015 at 17:58 Doug Ewell <doug at ewellic.org> wrote:

> Steve Swales <steve at swales dot us> wrote:
>
> > Or a flag inversion modifier? Recently I discovered that the
> > Philippines flag, for example, has a special meaning (we are at war)
> > when inverted. Just a thought.
>
> An inverted ensign on a ship was formerly used as a distress signal:
> http://www.crwflags.com/fotw/flags/xf-flip.html
>
> I'd argue strongly against adding "modifiers" to Unicode flag tags to
> indicate inverted, waving, half-staff, folded, or any other transient
> state.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
>
>
>

From rscook at wenlin.com  Tue Jul  7 09:53:03 2015
From: rscook at wenlin.com (Richard Cook)
Date: Tue, 7 Jul 2015 07:53:03 -0700
Subject: vexillology, was: Adding RAINBOW FLAG to Unicode
In-Reply-To: <CA+p4_H3goDi5Acs_KzBZoePtsJks4fYTCW4v0=ocV96f4tH6vQ@mail.gmail.com>
References: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net>
 <CA+p4_H3goDi5Acs_KzBZoePtsJks4fYTCW4v0=ocV96f4tH6vQ@mail.gmail.com>
Message-ID: <63B729C6-B57B-40F6-8852-6164F7F99361@wenlin.com>

Ken Whistler wrote:
>> vexillology


> Garth Wallace wrote:
> 
> Tangentially, I recently ran across something called International
> Flag Identification Symbols. It's a symbolic notation for vexillology
> that describes their use of flags and some aspects of their design but
> not enough to reproduce them.

Ken,

Hasn't any vexillogist defined a full blown FDL (Flag Description Language) yet? That would be a sub-discipline of heraldic arms blazoning, I guess.

-Richard

<http://wenlin.com/cdl>

From rscook at wenlin.com  Tue Jul  7 09:56:26 2015
From: rscook at wenlin.com (Richard Cook)
Date: Tue, 7 Jul 2015 07:56:26 -0700
Subject: vexillology, was: Adding RAINBOW FLAG to Unicode
In-Reply-To: <63B729C6-B57B-40F6-8852-6164F7F99361@wenlin.com>
References: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net>
 <CA+p4_H3goDi5Acs_KzBZoePtsJks4fYTCW4v0=ocV96f4tH6vQ@mail.gmail.com>
 <63B729C6-B57B-40F6-8852-6164F7F99361@wenlin.com>
Message-ID: <E9B8298E-967D-4D38-BAFC-527FFA0876DF@wenlin.com>

On Jul 7, 2015, at 7:53 AM, Richard Cook <rscook at wenlin.com> wrote:
> 
> Ken Whistler wrote:
>>> vexillology
> 
> 
>> Garth Wallace wrote:
>> 
>> Tangentially, I recently ran across something called International
>> Flag Identification Symbols. It's a symbolic notation for vexillology
>> that describes their use of flags and some aspects of their design but
>> not enough to reproduce them.
> 
> Ken,
> 
> Hasn't any vexillogist

=> vexillologist

> defined a full blown FDL (Flag Description Language) yet? That would be a sub-discipline of heraldic arms blazoning, I guess.
> 
> -Richard
> 
> <http://wenlin.com/cdl>

From petercon at microsoft.com  Tue Jul  7 10:11:37 2015
From: petercon at microsoft.com (Peter Constable)
Date: Tue, 7 Jul 2015 15:11:37 +0000
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <5596D32E.9030403@att.net>
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
 <CAGa7JC3BHO+ei4kgW2a9jwz1jXYE7v60y1X5yDxOYnTEtzkUOQ@mail.gmail.com>
 <CAJ2xs_EsY6WduoeZ9vB5uQEVNkjOJ06yC=zserTin6scHx98mA@mail.gmail.com>
 <CAGa7JC3EyK2CqOkrxGstB1ix9BOET9ZX0_=-FkNV-o6-fnVewQ@mail.gmail.com>
 <003401d0b4be$3af16970$b0d43c50$@fi>
 <BL2PR03MB1148BF6281CB167BC400730D5960@BL2PR03MB114.namprd03.prod.outlook.com>
 <5596D32E.9030403@att.net>
Message-ID: <BLUPR03MB120546B81867BBC92C7E9F5D5920@BLUPR03MB120.namprd03.prod.outlook.com>

I never said anything about stability of geopolitical entities. I only mentioned stability of encoded character sequences.

Peter

From: Ken Whistler [mailto:kenwhistler at att.net]
Sent: Friday, July 3, 2015 11:24 AM
To: Peter Constable
Cc: unicode at unicode.org
Subject: Re: Adding RAINBOW FLAG to Unicode


On 7/2/2015 5:56 PM, Peter Constable wrote:
Erkki, in this case, I think Philippe is making valid points.


- For the proposal to be workable, some means of ensuring stability of encoded representations is required. The way this would be done would be for CLDR to provide data with all valid sequences --- effectively becoming a registry.

I think that is wrong on a couple of grounds.

First, detailed stability of reference to actual defined geopolitical entities
or particular detailed flag designs is
not *required* for a proposal to represent *pictographs* of flags by some
sequence of Unicode characters to be "workable". Sure, more stability
of reference is desirable. But the current RIS pair mechanism for representing
flag pictographs for countries is already "workable" -- it works and is widely deployed
and widely used -- without having guarantees that some particular country may
not decide tomorrow to change its official flag and hence result in some
particular pictographic display being obsolete in some sense, for example.

Second, the horse is already out of the barn regarding the particular
data that CLDR would be referring to. This works by reference to
the ISO 3166-2 scheme of subdivisions:

https://en.wikipedia.org/wiki/ISO_3166-2

and *that* becomes the registry required for stability of representations,
plus whatever grandfathering stability-of-code mechanism BCP 47
adds on top of that. We don't require a further detailed level of
registration, I think, to make this workable. If the New Zealand
Hawke's Bay Regional Council (NZ-HKB) decided it needed a district
flag (or decided to change one it may already have), I'm not going to be
overly concerned about the details there. As long as
<base, tag-N, tag-Z, tag-H, tag-K, tag-B> has a stable definition as
a Unicode extended flag tag sequence, it is up to somebody else to
decide if they want to actually map a Hawke's Bay flag pictograph in a font to
that sequence -- or update the flag pictograph they may have been
using.
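Ken's example sequence can be spelled out in Python. This sketch rests on assumptions drawn from this thread rather than from the proposal text: the base is U+1F3F3 WAVING WHITE FLAG; tags are limited to the digit and capital-letter TAG characters (U+E0030..U+E0039, U+E0041..U+E005A), each mirroring its ASCII counterpart at offset 0xE0000; and the ISO 3166-2 hyphen is dropped, as in <base, tag-N, tag-Z, tag-H, tag-K, tag-B>. Any other detail of PRI #299 is not modelled. It also shows what a flag-tag-unaware renderer that hides default-ignorable characters would display: just the base flag. The function names are hypothetical.

```python
from string import ascii_uppercase, digits

BASE = "\U0001F3F3"   # WAVING WHITE FLAG, the base discussed in this thread
TAG_OFFSET = 0xE0000  # TAG characters mirror ASCII: 'N' (U+004E) -> U+E004E
ALLOWED = set(digits + ascii_uppercase)  # ASCII 0-9 and A-Z only


def flag_tag_sequence(code: str) -> str:
    """Build <base, tag-X, ...> for a subdivision code such as "NZ-HKB"."""
    letters = code.replace("-", "").upper()  # drop the ISO 3166-2 hyphen
    if not letters or not set(letters) <= ALLOWED:
        raise ValueError(f"unsupported code: {code!r}")
    return BASE + "".join(chr(TAG_OFFSET + ord(c)) for c in letters)


def tag_unaware_view(text: str) -> str:
    """Drop TAG characters (U+E0000..U+E007F), which are default ignorable."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


seq = flag_tag_sequence("NZ-HKB")
print(" ".join(f"U+{ord(c):04X}" for c in seq))
# U+1F3F3 U+E004E U+E005A U+E0048 U+E004B U+E0042
print(tag_unaware_view(seq) == BASE)  # True
```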

Yeah, this could be a giant headache for any vendor that felt they
had to support *every possible* region/subdivision sequence
and keep the exact representations of flag pictographs stable. But
I predict this will very, very quickly result in people making a
"let's cover the 99% case" set of decisions, and then issues like
"Should we display a flag pictograph for the Hawke's Bay Regional
Council?" will be dealt with by the normal methods of triage for
feature requests.


-          The concepts being denoted are inherently political, often unstable, and sometimes highly sensitive.


Sensitive issues aside, a better approach would be to have a URN tagging scheme --- which IMO begs the question why this is a Unicode topic as it clearly crosses outside the limits of plain text.

A URN tagging scheme might make sense if what we were trying to
do was delegating all identity concerns to external authority,
and if we didn't care about efficiency of representation, either.

I don't think that is what this is about, as I tried to make clear yesterday.
I don't think we are encoding *flags* -- we are creating a mechanism
for the reliable representation of a set of *pictographs (emoji) for flags*.
And those pictographs for flags need an efficient representation that
can coexist comfortably with the rest of plain text -- the way the RIS
pairs already do.


Sensitive issues considered, though, it begs the question as to whether Unicode should be considering any of this at all, no matter what the scheme for encoded representation may be. Someone helpfully reminded us of this:


>> [...] the UTC does not wish to entertain further proposals for
>> encoding of symbol characters for flags, whether national, state,
>> regional, international, or otherwise. References to UTC Minutes:
>> [134-C2], January 28, 2013.

I believe that that statement (and the referenced decision) refer
specifically to the unwillingness of the UTC to entertain proposals
for encoding an indefinite number of pictographs for flags (of
whatever variety) *as symbol characters* -- that is,
one-by-one encodings as a single, gc=So code point in the standard.
Heading that direction is clearly not an efficient way to deal with
the concern, and would waste everybody's time in one-by-one
proposals and ad hoc decisions for each individual flag pictograph
to be added.

The UTC has a long history of putting a stake in the ground when it
encounters a character encoding problem which requires a *general*
solution, rather than a dribbling in of one-off decisions an item
at a time. And I think the tag proposal for dealing with
the representation of flag pictographs for regional subdivisions
shows precisely the kind of generality that we are looking for --
dealing with hundreds of potentially representable entities
with a general mechanism, rather than trying to encode them
all one-by-one.

Incidentally, back to the ostensible topic of this thread -- I don't
think the extended flag tag proposal currently addresses the
issue of how to represent a pictograph for a rainbow flag.
In that case I think a new registry mechanism might in fact make
sense -- and I have spelled out details of how one could reasonably
work in conjunction with the extended flag tag proposal in
feedback submitted on PRI #299.

--Ken


Peter



From doug at ewellic.org  Tue Jul  7 11:07:22 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 07 Jul 2015 09:07:22 -0700
Subject: Adding RAINBOW FLAG to Unicode
Message-ID: <20150707090722.665a7a7059d7ee80bb4d670165c8327d.90c2185143.wbe@email03.secureserver.net>

Disclaimer: These are only suggestions. I've never submitted a character
proposal. You should prefer the advice of people who have, or of UTC
members who evaluate proposals.

Noah Slater <nslater at tumbolia dot org> wrote:

> Previously in this thread, it was suggested that I make a formal
> proposal to the UTC. I have held back from doing this because it's not
> at all clear what implementation I should be proposing, or whether I
> can propose something WITHOUT an implementation. (Some advice there
> would be handy!)

If by "implementation" you mean a suggestion for how Unicode should
encode this flag (single character, extension to the PRI #299 mechanism
similar to what Ken proposed, or something else), it might be a good
idea to summarize the options and choose at least one "preferred"
option.

> Should I trust that the UTC will be aware of the informal proposal of
> the rainbow flag when they meet to discuss PRI #299, or should I do
> something to properly bring it to their attention?

As Mark Davis wrote [1], this list is not a venue for formally proposing
anything, and it's not safe to assume that UTC members have read this
list and have any background. If you want to state something, make sure
you state it in the proposal. You can quote and paraphrase list
discussions, but don't just insert links to the list archive.

[1] http://www.unicode.org/mail-arch/unicode-ml/y2015-m07/0033.html

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From everson at evertype.com  Tue Jul  7 11:09:48 2015
From: everson at evertype.com (Michael Everson)
Date: Tue, 7 Jul 2015 17:09:48 +0100
Subject: vexillology, was: Adding RAINBOW FLAG to Unicode
In-Reply-To: <E9B8298E-967D-4D38-BAFC-527FFA0876DF@wenlin.com>
References: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net>
 <CA+p4_H3goDi5Acs_KzBZoePtsJks4fYTCW4v0=ocV96f4tH6vQ@mail.gmail.com>
 <63B729C6-B57B-40F6-8852-6164F7F99361@wenlin.com>
 <E9B8298E-967D-4D38-BAFC-527FFA0876DF@wenlin.com>
Message-ID: <DCBC0F39-9D45-49E5-BD51-0CCF41508C3F@evertype.com>

As I recall, António Martins-Tuválkin and Anshuman Pandey both submitted proposals on this subject in 2007 or 2008 and in 2012 respectively.

Michael Everson * http://www.evertype.com/


From nslater at tumbolia.org  Tue Jul  7 11:29:29 2015
From: nslater at tumbolia.org (Noah Slater)
Date: Tue, 07 Jul 2015 16:29:29 +0000
Subject: Adding RAINBOW FLAG to Unicode
In-Reply-To: <20150707090722.665a7a7059d7ee80bb4d670165c8327d.90c2185143.wbe@email03.secureserver.net>
References: <20150707090722.665a7a7059d7ee80bb4d670165c8327d.90c2185143.wbe@email03.secureserver.net>
Message-ID: <CA+Y+447Z7qjFmDe1igm6mjxsyNWHX9=NgoPC-nKhE2-+_QOK3g@mail.gmail.com>

Thanks Doug. That's very helpful.

On Tue, 7 Jul 2015 at 17:07 Doug Ewell <doug at ewellic.org> wrote:

> Disclaimer: These are only suggestions. I've never submitted a character
> proposal. You should prefer the advice of people who have, or of UTC
> members who evaluate proposals.
>
> Noah Slater <nslater at tumbolia dot org> wrote:
>
> > Previously in this thread, it was suggested that I make a formal
> > proposal to the UTC. I have held back from doing this because it's not
> > at all clear what implementation I should be proposing, or whether I
> > can propose something WITHOUT an implementation. (Some advice there
> > would be handy!)
>
> If by "implementation" you mean a suggestion for how Unicode should
> encode this flag (single character, extension to the PRI #299 mechanism
> similar to what Ken proposed, or something else), it might be a good
> idea to summarize the options and choose at least one "preferred"
> option.
>
> > Should I trust that the UTC will be aware of the informal proposal of
> > the rainbow flag when they meet to discuss PRI #299, or should I do
> > something to properly bring it to their attention?
>
> As Mark Davis wrote [1], this list is not a venue for formally proposing
> anything, and it's not safe to assume that UTC members have read this
> list and have any background. If you want to state something, make sure
> you state it in the proposal. You can quote and paraphrase list
> discussions, but don't just insert links to the list archive.
>
> [1] http://www.unicode.org/mail-arch/unicode-ml/y2015-m07/0033.html
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
>
>

From doug at ewellic.org  Thu Jul  9 10:53:26 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 09 Jul 2015 08:53:26 -0700
Subject: Precomposed Cyrillic letters
Message-ID: <20150709085326.665a7a7059d7ee80bb4d670165c8327d.7125f102ec.wbe@email03.secureserver.net>

From http://www.unicode.org/L2/L2015/15169-montenegro-cyrillic.pdf,
"Addition of two letters from Montenegrin language, CYRILLIC script":

> 9. Can any of the proposed characters be encoded using a composed
> character sequence of either existing characters or other proposed
> characters?
> No

Saying it doesn't make it so:

> Annex 1: Character shapes (related to section B, item 4b)
> Cyrillic small letter SJ
> ??

<0441 0301>

> Cyrillic capital letter SJ
> ??

<0421 0301>

> Cyrillic small letter ZJ
> ??

<0437 0301>

> Cyrillic capital letter ZJ
> ??

<0417 0301>

Quite a few fonts don't display these well (and quite a few do), but of
course that's a font problem, not an encoding problem.

Cf. http://www.unicode.org/faq/char_combmark.html#11
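The point can be checked mechanically. Unicode normalization has no precomposed character to map these sequences onto, so NFC leaves each of them as two code points. By contrast, a letter that is already encoded as a precomposed character, such as KJE (U+045C), composes from <U+043A, U+0301>. This is a minimal sketch using Python's standard `unicodedata` module; the dictionary labels are ours.

```python
import unicodedata

# Base Cyrillic letter + U+0301 COMBINING ACUTE ACCENT, as listed above.
sequences = {
    "small SJ": "\u0441\u0301",
    "capital SJ": "\u0421\u0301",
    "small ZJ": "\u0437\u0301",
    "capital ZJ": "\u0417\u0301",
}

for label, seq in sequences.items():
    nfc = unicodedata.normalize("NFC", seq)
    # No precomposed form exists, so NFC keeps both code points.
    print(label, [f"U+{ord(c):04X}" for c in nfc])

# Contrast: U+045C CYRILLIC SMALL LETTER KJE is encoded with a canonical
# decomposition <U+043A, U+0301>, so NFC composes the pair into it.
print(unicodedata.normalize("NFC", "\u043a\u0301") == "\u045c")  # True
```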


--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From doug at ewellic.org  Thu Jul  9 10:58:08 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 09 Jul 2015 08:58:08 -0700
Subject: Tamil-Latin proposal
Message-ID: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net>

http://www.unicode.org/L2/L2015/15153-tamil-latin-proposal.pdf

I suppose the response to this proposal won't be made public.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From markus.icu at gmail.com  Thu Jul  9 11:37:21 2015
From: markus.icu at gmail.com (Markus Scherer)
Date: Thu, 9 Jul 2015 09:37:21 -0700
Subject: Precomposed Cyrillic letters
In-Reply-To: <20150709085326.665a7a7059d7ee80bb4d670165c8327d.7125f102ec.wbe@email03.secureserver.net>
References: <20150709085326.665a7a7059d7ee80bb4d670165c8327d.7125f102ec.wbe@email03.secureserver.net>
Message-ID: <CAN49p6pGmSVqF7b+n_=pLTbLto0m7SbPWQSAVDNZ0aegWLVnTg@mail.gmail.com>

On Thu, Jul 9, 2015 at 8:53 AM, Doug Ewell <doug at ewellic.org> wrote:

>  From http://www.unicode.org/L2/L2015/15169-montenegro-cyrillic.pdf,
> "Addition of two letters from Montenegrin language, CYRILLIC script":
>
> > 9. Can any of the proposed characters be encoded using a composed
> > character sequence of either existing characters or other proposed
> > characters?
> > No
>
> Saying it doesn't make it so:
>

Right, although I doubt that the proposers monitor this mailing list...

In case an interested party is listening: If sr-ME needs different locale
data than sr, then one could contribute such data to CLDR
<http://cldr.unicode.org/>.
See the current state:
http://unicode.org/cldr/trac/browser/trunk/common/main/sr_Cyrl_ME.xml

markus

From richard.wordingham at ntlworld.com  Thu Jul  9 13:25:05 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 9 Jul 2015 19:25:05 +0100
Subject: Tamil-Latin proposal
In-Reply-To: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net>
References: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net>
Message-ID: <20150709192505.6ef5db34@JRWUBU2>

On Thu, 09 Jul 2015 08:58:08 -0700
"Doug Ewell" <doug at ewellic.org> wrote:

> http://www.unicode.org/L2/L2015/15153-tamil-latin-proposal.pdf
> 
> I suppose the response to this proposal won't be made public.

It's a shame there's no precedent for proposals being rejected for
lying. However, it might be rejected for being a 'contemporary'
script with no users - that much is admitted to!

Richard.

From verdy_p at wanadoo.fr  Thu Jul  9 14:08:09 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 9 Jul 2015 21:08:09 +0200
Subject: Tamil-Latin proposal
In-Reply-To: <20150709192505.6ef5db34@JRWUBU2>
References: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net>
 <20150709192505.6ef5db34@JRWUBU2>
Message-ID: <CAGa7JC0kT=iY_X+nxzQCWsUwkJmzrr5b0wEwgVd5vMm3roBCKA@mail.gmail.com>

Also it's clearly not needed to duplicate Latin letters (or Cyrillic too)
to borrow them into the Tamil script, just in order to add Tamil vowel
diacritics on top of them.
If that proposer wants to create a font allowing combining Latin/Cyrillic
letters with Tamil vowel signs, there's no need to duplicate the encoding
of these base letters. Nothing prohibits a font from mapping those combinations,
even if it's not needed for other languages using the Latin and Cyrillic
scripts: that could be done by extending an existing Tamil font (most of
them already map Basic Latin, even if none of them currently map
combinations with Tamil vowel signs).
For the usage purpose described, in fact a good font for Latin and IPA would
work, with just a few additions for allowing the Tamil vowel signs. And there's no
need to create specific encodings for Latin+generic diacritics, even if the
precombined letters are not encoded (why would those additional "base letters"
be restricted to Tamil?)
Given there are no users of this extended script, the Unicode policy will
require first experimenting and creating a user community, and demonstrating
that for this usage, the existing encodings cannot work reliably. But for
now there's no need for it, no compatibility issues to resolve, no
dictionaries or old books for which this encoding would be useful.
And it's definitely not a problem of chicken and egg: this is an attempt to
bypass the UCS encoding policies specifically for a script that really does
not need these duplicate extra base letters and combining vowels. And it's
definitely not a new script but a proposed "new" script whose characters are
in fact badly named! There are no such "Tamil-Latin" letters; the real
standard is about transliterations of Tamil using standard Latin letters
(romanizations), or IPA symbols, and for that there are already standards
that do not need any of these additions, which would in fact add more
complications and would solve no practical problems. Let's just focus on
the Tamil romanization standards, and romanized IMEs for Tamil, which already
work as is.


2015-07-09 20:25 GMT+02:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> On Thu, 09 Jul 2015 08:58:08 -0700
> "Doug Ewell" <doug at ewellic.org> wrote:
>
> > http://www.unicode.org/L2/L2015/15153-tamil-latin-proposal.pdf
> >
> > I suppose the response to this proposal won't be made public.
>
> It's a shame there's no precedent for proposals being rejected for
> lying. However, it might be rejected for being a 'contemporary'
> script with no users - that much is admitted to!
>
> Richard.
>

From richard.wordingham at ntlworld.com  Thu Jul  9 15:18:30 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 9 Jul 2015 21:18:30 +0100
Subject: Tamil-Latin proposal
In-Reply-To: <CAGa7JC0kT=iY_X+nxzQCWsUwkJmzrr5b0wEwgVd5vMm3roBCKA@mail.gmail.com>
References: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net>
 <20150709192505.6ef5db34@JRWUBU2>
 <CAGa7JC0kT=iY_X+nxzQCWsUwkJmzrr5b0wEwgVd5vMm3roBCKA@mail.gmail.com>
Message-ID: <20150709211830.736197d4@JRWUBU2>

I did wonder if part of the idea was to get consonant + pulli accepted
as basic.

On Thu, 9 Jul 2015 21:08:09 +0200
Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> Also it's clearly not needed to duplicate Latin letters (or Cyrillic
> too) to borrow them into the Tamil script, just in order to add Tamil
> vowel diacritics on top of them.

Actually, this touches on a very real issue.  U+0BC0 TAMIL VOWEL SIGN
II has a script property of Tamil, and there is a very strong tendency
for <U+006D LATIN SMALL LETTER M, U+0BC0> to be split between two script
runs and consequently to be rendered as containing a defective sequence
- the cursed dotted circle of the literal grammar police appears.  I
confirmed this in LibreOffice using the Code2000 font, which I know
supports Tamil.
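Here is a minimal sketch of the failure described above. It assumes a toy script lookup: the `simple_script` helper is hypothetical and only knows ASCII letters and the Tamil block, U+0B80..U+0BFF. If text is split purely on the script property, the vowel sign is cut off from its Latin base. The `attach_marks` option shows one possible mitigation, which keeps any combining mark (general category M*) in the preceding run. This is an illustration only, not a description of what LibreOffice or any other rendering engine actually does.

```python
import unicodedata
from typing import List, Tuple

def simple_script(ch: str) -> str:
    """HYPOTHETICAL toy lookup: only the Tamil block and ASCII letters are known."""
    cp = ord(ch)
    if 0x0B80 <= cp <= 0x0BFF:
        return "Tamil"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Common"

def script_runs(text: str, attach_marks: bool = False) -> List[Tuple[str, str]]:
    """Split text into (script, substring) runs.

    With attach_marks=True, a combining mark stays in the preceding run
    even if its own script property differs.
    """
    runs: List[List[str]] = []
    for ch in text:
        script = simple_script(ch)
        is_mark = unicodedata.category(ch).startswith("M")
        if runs and (runs[-1][0] == script or (attach_marks and is_mark)):
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(s, t) for s, t in runs]

# <U+006D LATIN SMALL LETTER M, U+0BC0 TAMIL VOWEL SIGN II>
print(script_runs("m\u0bc0"))                     # naive: the mark gets its own Tamil run
print(script_runs("m\u0bc0", attach_marks=True))  # the mark stays with its Latin base
```

In the naive case, the Tamil run starts with a bare vowel sign. That is exactly the defective sequence a shaper renders with a dotted circle.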

Richard.

From everson at evertype.com  Thu Jul  9 16:06:36 2015
From: everson at evertype.com (Michael Everson)
Date: Thu, 9 Jul 2015 22:06:36 +0100
Subject: ISO 15924
Message-ID: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>

Please see http://www.unicode.org/iso15924/codechanges.html for today's updates.

Michael Everson
Registrar, ISO 15924


From richard.wordingham at ntlworld.com  Thu Jul  9 16:59:29 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 9 Jul 2015 22:59:29 +0100
Subject: Precomposed Cyrillic letters
In-Reply-To: <CAN49p6pGmSVqF7b+n_=pLTbLto0m7SbPWQSAVDNZ0aegWLVnTg@mail.gmail.com>
References: <20150709085326.665a7a7059d7ee80bb4d670165c8327d.7125f102ec.wbe@email03.secureserver.net>
 <CAN49p6pGmSVqF7b+n_=pLTbLto0m7SbPWQSAVDNZ0aegWLVnTg@mail.gmail.com>
Message-ID: <20150709225929.1f3b029a@JRWUBU2>

On Thu, 9 Jul 2015 09:37:21 -0700
Markus Scherer <markus.icu at gmail.com> wrote:

> On Thu, Jul 9, 2015 at 8:53 AM, Doug Ewell <doug at ewellic.org> wrote:
> 
> >  From http://www.unicode.org/L2/L2015/15169-montenegro-cyrillic.pdf,
> > "Addition of two letters from Montenegrin language, CYRILLIC
> > script":
> >
> > > 9. Can any of the proposed characters be encoded using a composed
> > > character sequence of either existing characters or other proposed
> > > characters?
> > > No
> >
> > Saying it doesn't make it so:

Is there a requirement to answer those questions truthfully?

> Right, although I doubt that the proposers monitor this mailing
> list...
> 
> In case an interested party is listening: If sr-ME needs different
> locale data than sr, then one could contribute such data to CLDR
> <http://cldr.unicode.org/>.
> See the current state:
> http://unicode.org/cldr/trac/browser/trunk/common/main/sr_Cyrl_ME.xml

Presumably http://cldr.unicode.org/index/survey-tool/accounts is the
most relevant page for someone with credibility.  However, as
Montenegro has an army and a navy, you have the wrong locale.  It's
still waiting for a language code.  See the language family panels
at https://en.wikipedia.org/wiki/Eastern_Herzegovinian_dialect and
https://en.wikipedia.org/wiki/Montenegrin_language for the extreme
Balkanisation.

But in short, yes we need the extra Cyrillic letters с́ and з́ and
Latin letters ś and ź for the exemplar characters in sr_Cyrl_ME and
sr_Latn_ME (or should that be sr_ME?).  I can't work out the status of
Montenegrin Latin {sj} and {zj}.

Richard.


From doug at ewellic.org  Thu Jul  9 17:23:50 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 09 Jul 2015 15:23:50 -0700
Subject: Precomposed Cyrillic letters
Message-ID: <20150709152350.665a7a7059d7ee80bb4d670165c8327d.539fcf5c76.wbe@email03.secureserver.net>

Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:

> Presumably http://cldr.unicode.org/index/survey-tool/accounts is the
> most relevant page for someone with credibility. However, as
> Montenegro has an army and a navy, you have the wrong locale. It's
> still waiting for a language code. See the language family panels
> at https://en.wikipedia.org/wiki/Eastern_Herzegovinian_dialect and
> https://en.wikipedia.org/wiki/Montenegrin_language for the extreme
> Balkanisation.

Montenegro could have all the military power in the world, but that
doesn't make "Montenegrin" a distinct language. It's a dialect of
Serbian.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From richard.wordingham at ntlworld.com  Thu Jul  9 18:03:20 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 10 Jul 2015 00:03:20 +0100
Subject: Precomposed Cyrillic letters
In-Reply-To: <20150709152350.665a7a7059d7ee80bb4d670165c8327d.539fcf5c76.wbe@email03.secureserver.net>
References: <20150709152350.665a7a7059d7ee80bb4d670165c8327d.539fcf5c76.wbe@email03.secureserver.net>
Message-ID: <20150710000320.19415118@JRWUBU2>

On Thu, 09 Jul 2015 15:23:50 -0700
"Doug Ewell" <doug at ewellic.org> wrote:

> Montenegro could have all the military power in the world, but that
> doesn't make "Montenegrin" a distinct language. It's a dialect of
> Serbian.

"A language is a dialect with an army and a navy." - Variously
attributed, including to Antoine Meillet, who may not have required a
navy.

Richard.

From markus.icu at gmail.com  Thu Jul  9 21:18:15 2015
From: markus.icu at gmail.com (Markus Scherer)
Date: Thu, 9 Jul 2015 19:18:15 -0700
Subject: ISO 15924
In-Reply-To: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>
References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>
Message-ID: <CAN49p6qoeDLntMZOpsi8-5QZ+bFiXXpQrFiCpxVaw7vpZzc7sA@mail.gmail.com>

Thanks!
markus

From jcb+unicode at inf.ed.ac.uk  Sat Jul 11 08:48:05 2015
From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield)
Date: Sat, 11 Jul 2015 14:48:05 +0100 (BST)
Subject: a mug
Message-ID: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>

I feel the following mug says something about a popular topic of
debate on this list...


http://www.redbubble.com/people/insider/works/15315362-i-3-unicode

(do look at the picture, don't just infer from the url)

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From charupdate at orange.fr  Sat Jul 11 10:26:13 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 11 Jul 2015 17:26:13 +0200 (CEST)
Subject: a mug
In-Reply-To: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
References: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
Message-ID: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>

On Sat, Jul 11, 2015, Julian Bradfield  wrote:

> I feel the following mug says something about a popular topic of
> debate on this list...

As I feel concerned too, I'd like (I ?) to underscore that the designer of this mug seems to be insulting Unicode implementers and developers. 
Given the mass of popular characters that are already well rendered across platforms, and the huge sets of *new* items that are constantly being added, blaming people for not having done their job does no good. 
And above all, regardless of personal opinions and personality of mug designers, I think that the name of UNICODE should be left aside in such messages, because linking implementation issues with Unicode's corporate image is simply dishonest.

Thank you, however, for the information; it's always good to know what ideas are out there...

Marcel Schneider

> Message of 11/07/15 15:58
> From: "Julian Bradfield" 
> To: unicode at unicode.org
> Cc: 
> Subject: a mug
> 
> I feel the following mug says something about a popular topic of
> debate on this list...
> 
> 
> http://www.redbubble.com/people/insider/works/15315362-i-3-unicode
> 
> (do look at the picture, don't just infer from the url)
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
>


From daniel.buenzli at erratique.ch  Sat Jul 11 11:15:33 2015
From: daniel.buenzli at erratique.ch (Daniel Bünzli)
Date: Sat, 11 Jul 2015 17:15:33 +0100
Subject: a mug
In-Reply-To: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>
References: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
 <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>
Message-ID: <CDC832E6771D4B2D886BB38B8562160A@erratique.ch>

On Saturday, 11 July 2015 at 16:26, Marcel Schneider wrote:
> As I feel concerned too, I'd like (I ?) to underscore that the designer of this mug seems to be insulting Unicode implementers

Being one of these, I would like to tell you that I do not feel insulted by this mug at all.  

I find it rather funny as it actually reflects a reality you can expect to see more and more. Given the sheer volume of characters that are being added to the standard you can't expect font designers to cater for all of them. And this is actually due to the very definition of Unicode itself whether you like it or not.  

Best,

Daniel


From johannes at bergerhausen.com  Sat Jul 11 11:36:30 2015
From: johannes at bergerhausen.com (Johannes Bergerhausen)
Date: Sat, 11 Jul 2015 18:36:30 +0200
Subject: a mug
In-Reply-To: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>
References: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
 <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>
Message-ID: <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com>

Yes, the mug is funny.

It doesn't show a Unicode problem; it points at a general font problem of operating systems.

Dear Apple, Dear Google, Dear Microsoft: please give us *all* missing Unicode glyphs right inside your operating systems!

As I said at TEDx in Vienna:
www.youtube.com/watch?v=IRdupNXpm8k

So, better would be:

I [] Apple.
I [] Google.
I [] Microsoft.

All the best,
Johannes

From public at khwilliamson.com  Sat Jul 11 12:33:54 2015
From: public at khwilliamson.com (Karl Williamson)
Date: Sat, 11 Jul 2015 11:33:54 -0600
Subject: a mug
In-Reply-To: <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com>
References: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
 <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>
 <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com>
Message-ID: <55A15382.9000608@khwilliamson.com>

On 07/11/2015 10:36 AM, Johannes Bergerhausen wrote:
> Yes, the mug is funny.


>
> It doesn't show a Unicode problem; it points at a general font problem of operating systems.
>
> Dear Apple, Dear Google, Dear Microsoft: please give us *all* missing Unicode glyphs right inside your operating systems!
>
> As I said at TEDx in Vienna:
> www.youtube.com/watch?v=IRdupNXpm8k
>
> So, better would be:
>
> I [] Apple.
> I [] Google.
> I [] Microsoft.
>
> All the best,
> Johannes
>


http://i1.cpcache.com/product/27297813/utf8_value_tshirt.jpg

From shervinafshar at gmail.com  Sat Jul 11 13:08:56 2015
From: shervinafshar at gmail.com (Shervin Afshar)
Date: Sat, 11 Jul 2015 11:08:56 -0700
Subject: a mug
In-Reply-To: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
References: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
Message-ID: <CA+ONOD=K4HsUifbwv6wcY0Q5ThVtTJswdwGBTvdm5AKahVmuiA@mail.gmail.com>

????????. ????, Unicode ??? ??? ????? ; ?? vs. ??. ????????. ????? ??
Unicode ????. ??????  ??????.  ? ????? ????.

? Shervin

On Sat, Jul 11, 2015 at 6:48 AM, Julian Bradfield <jcb+unicode at inf.ed.ac.uk>
wrote:

> I feel the following mug says something about a popular topic of
> debate on this list...
>
>
> http://www.redbubble.com/people/insider/works/15315362-i-3-unicode
>
> (do look at the picture, don't just infer from the url)
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>

From haberg-1 at telia.com  Sat Jul 11 13:54:34 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Sat, 11 Jul 2015 20:54:34 +0200
Subject: a mug
In-Reply-To: <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com>
References: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
 <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>
 <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com>
Message-ID: <6E678419-0F79-4DD0-BC05-2833BA23E66D@telia.com>


> On 11 Jul 2015, at 18:36, Johannes Bergerhausen <johannes at bergerhausen.com> wrote:
> 
> As I said at TEDx in Vienna:
> [https://www.youtube.com/watch?v=IRdupNXpm8k]

The keyboards for different languages are essentially the same nowadays: each sends a code indicating which key is acted on and whether it is pressed or released. The computer then translates using a key map. So for a Cherokee keyboard, as discussed in the video, one would need different images on the keys, if one bothers, and a key map.
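The keyboard pipeline described above can be sketched minimally. The hardware reports only (scan code, pressed/released) events, and a per-language key map turns key presses into characters. The scan codes and the tiny Cherokee layout below are illustrative assumptions and are not taken from any real Cherokee keyboard standard. Real input methods also have to handle modifiers, dead keys, and composition.

```python
from typing import Dict, Iterable, NamedTuple

class KeyEvent(NamedTuple):
    scancode: int  # which physical key
    pressed: bool  # True for key-down, False for key-up

# Illustrative scan codes (assumed); a real layout would cover every key.
CHEROKEE_KEYMAP: Dict[int, str] = {
    0x1E: "\u13A0",  # CHEROKEE LETTER A
    0x12: "\u13A1",  # CHEROKEE LETTER E
    0x17: "\u13A2",  # CHEROKEE LETTER I
    0x18: "\u13A3",  # CHEROKEE LETTER O
    0x16: "\u13A4",  # CHEROKEE LETTER U
    0x2F: "\u13A5",  # CHEROKEE LETTER V
}

def translate(events: Iterable[KeyEvent], keymap: Dict[int, str]) -> str:
    """Map key-down events to characters; key-up and unmapped keys emit nothing."""
    return "".join(keymap[ev.scancode] for ev in events
                   if ev.pressed and ev.scancode in keymap)

events = [KeyEvent(0x1E, True), KeyEvent(0x1E, False),
          KeyEvent(0x2F, True), KeyEvent(0x2F, False)]
print(translate(events, CHEROKEE_KEYMAP))
```

Swapping the dictionary is all it takes to change the language. This is why the time-consuming part is designing the map, not the hardware.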

One problem here is that it is very time-consuming to design such key maps. This is another shortcoming of Unicode usage: lack of input methods, in addition to the font issue.


From petercon at microsoft.com  Sun Jul 12 01:09:03 2015
From: petercon at microsoft.com (Peter Constable)
Date: Sun, 12 Jul 2015 06:09:03 +0000
Subject: ISO 15924
In-Reply-To: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>
References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>
Message-ID: <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com>

Is there a significance to the colours in the table?


Peter

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Michael Everson
Sent: Thursday, July 9, 2015 2:07 PM
To: unicode Unicode Discussion; UnicoRe Mailing List
Subject: ISO 15924

Please see http://www.unicode.org/iso15924/codechanges.html for today's updates.

Michael Everson
Registrar, ISO 15924


From everson at evertype.com  Sun Jul 12 06:19:57 2015
From: everson at evertype.com (Michael Everson)
Date: Sun, 12 Jul 2015 12:19:57 +0100
Subject: ISO 15924
In-Reply-To: <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com>
References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>
 <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com>
Message-ID: <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com>

Yes, and this usage is explained on the page (as it has been since 2006).

> On 12 Jul 2015, at 07:09, Peter Constable <petercon at microsoft.com> wrote:
> 
> Is there a significance to the colours in the table?
> 
> Peter

Michael Everson * http://www.evertype.com/


From umesh.p.nair at gmail.com  Sat Jul 11 11:17:23 2015
From: umesh.p.nair at gmail.com (Umesh P N)
Date: Sat, 11 Jul 2015 09:17:23 -0700
Subject: a mug
In-Reply-To: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>
References: <slrnmq27kk.59b.jcb@home.stevens-bradfield.com>
 <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>
Message-ID: <CANnXk+VZxOuyJMrUOorqmVQPoThZq=bNwR-s6qSHMzpgi-2NYQ@mail.gmail.com>

On Sat, Jul 11, 2015 at 8:26 AM, Marcel Schneider <charupdate at orange.fr>
wrote:

> On Sat, Jul 11, 2015, Julian Bradfield  wrote:
>
> > I feel the following mug says something about a popular topic of
> > debate on this list...
>
> As I feel concerned too, I'd like (I ?) to underscore that the designer of
> this mug seems to be insulting Unicode implementers and developers.
> Given the mass of popular characters that are already well rendered across
> platforms, and the huge sets of *new* items that are constantly being added,
> blaming people for not having done their job does no good.
>

Henri Bergson has observed:

> Laughter is purely cerebral: being able to laugh seems to require a
> detached attitude, an emotional distance to the object of laughter.

(A well-known example is laughing when somebody falls down over a banana
peel. We can't laugh if the fall was serious and causes the person some
injury, thus making us emotionally attached to the person.)


So, having a strong emotional attachment to Unicode can make this kind of
joke offensive.  I found it as funny as the CSS mug
<http://www.zazzle.com/cheap_css_is_awesome_mug-168565401817501350>. (Some version
<http://www.zazzle.com/css_is_awesome_with_overflow_mug-168685521846695550>
of this mug has the pun overflow:hidden also specified.) I don't know if the
people who maintain the CSS standards and the developers of various
browsers and tools get heavily offended by that mug.

Satire and cartoons exaggerate minor things, which helps make the object
better and healthier.  We are not dictators who cannot tolerate criticism
and satire.

- Umesh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150711/04fdf910/attachment.html>

From charupdate at orange.fr  Mon Jul 13 04:15:54 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Mon, 13 Jul 2015 11:15:54 +0200 (CEST)
Subject: a mug
Message-ID: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12>

On Sat, Jul 11, 2015, 18:15, Daniel Bünzli wrote:

> On Sat, Jul 11, 2015, 16:26, Marcel Schneider wrote:
> > As I feel concerned too, I'd like (I ?) to underscore that the designer of this mug seems to be insulting Unicode implementers
> 
> Being one of these, I would like to tell you that I do not feel insulted at all by this mug. 
> 
> I find it rather funny, as it actually reflects a reality you can expect to see more and more. Given the sheer volume of characters that are being added to the standard, you can't expect font designers to cater to all of them. And this is actually due to the very definition of Unicode itself, whether you like it or not. 


On Sat, Jul 11, 2015, 18:17, Umesh P N  wrote:

> Henri Bergson has observed:
> Laughter is purely cerebral: being able to laugh seems to require a detached attitude, an emotional distance to the object of laughter.
>
> (A well-known example is laughing when somebody slips on a banana peel. We can't laugh if the fall is serious and causes the person some injury, because that makes us emotionally attached to the person.)
>
> So, having a strong emotional attachment to Unicode can make this kind of joke offensive. I found it as funny as the CSS mug. (One version of this mug also specifies the pun overflow:hidden.) I don't think the people who maintain the CSS standards, or the developers of various browsers and tools, are heavily offended by that mug.
>
> Satire and cartoons exaggerate minor flaws, which helps make the object better and healthier. We are not dictators who cannot tolerate criticism and satire.


I see that I was very wrong to take it seriously, and I thank everyone who answered on this thread for helping to put things into perspective.

Of course everybody is free to laugh. There are just two problems with that. First, as Umesh points out by quoting Bergson, laughing implies some lack of empathy. Abbé Pierre never laughed, as he discovered about himself in an interview. Personally, I do laugh, unfortunately even too much. Second, one should not mix up responsibilities and then laugh at the wrong party, because that is where satire ends and injustice begins.

As Johannes Bergerhausen pointed out a little later:

On Sat, Jul 11, 2015, 18:44, Johannes Bergerhausen wrote:

> Yes, the mug is funny.
> 
> It shows not a Unicode problem, it points at a general font problem of operating systems.
> 
> Dear Apple, Dear Google, Dear Microsoft: please give us *all* missing Unicode glyphs right inside your operating systems!
> 
> As I said at TEDx in Vienna:
> www.youtube.com/watch?v=IRdupNXpm8k
> 
> So, better would be:
> 
> I [] Apple.
> I [] Google.
> I [] Microsoft.

If people (including me) took the trouble to install some complete fonts and to set the application's fallback behavior (where feasible), they would no longer experience the oddities this satirist seems to be laughing at while making (hateful?) insinuations. But they're too busy designing mugs...

It's roughly the same with the CSS and UTF-8 malfunctions that are laughed at on the other merchandise brought up by Umesh:
http://www.zazzle.com/cheap_css_is_awesome_mug-168565401817501350
http://www.zazzle.com/css_is_awesome_with_overflow_mug-168685521846695550
and by Karl Williamson
(On Sat, Jul 11, 2015, 19:42):
http://i1.cpcache.com/product/27297813/utf8_value_tshirt.jpg

Personally, the only time CSS was "awesome" to me was when I had written bad code. In truth, CSS is very smart and lets browsers adapt the box width to the content, unless a fixed width prevents it. We can write bad code in any language, but then we should rather laugh at our own incompetence.

The same goes for charsets. The only time I saw UTF-8 rendered like on the T-shirt was when opening UTF-8 files that didn't specify charset=UTF-8. The fix was to add the charset to the file header. Of course one can make T-shirts about that, but people who wear them meaning to laugh at the Unicode Transformation Format are more likely to get others laughing at them for not knowing how to begin an HTML file, aren't they?

I feel concerned because I recently published on this list (WORD JOINER vs ZWNBSP) some harsh criticism of a word processor that hadn't implemented U+2060 WORD JOINER, which displays as a kind of .notdef box unless the font is set to Segoe UI Symbol. I must mention that this very valuable workaround was provided on this list by Mr Doug Ewell (on Tue, Jun 30, 2015). I wouldn't have thought by myself of looking for U+2060 in Segoe UI Symbol. This also works for U+205D TRICOLON. When I insert the tricolon and the quadricolon U+205E side by side in Segoe UI Symbol, and then switch the font to Arial, the tricolon is replaced with a .notdef box in my version of Word. This time LibreOffice 4.2.4.2 behaves exactly the same, except for the .notdef box, which in that case is *not* displayed, leaving me unaware of the missing tricolon! Well, I'm likely to start over, and my first reply may turn out to have been a kind of lenification...

As for why I bring up the tricolon/quadricolon (VERTICAL FOUR DOTS) issue: I wanted to use ⁝ as a representation of U+2060, and ⁞ for U+FEFF. Now I must use a common colon for this. (|, ? and ?? are already taken. Alternative ideas are welcome.)

All these problems, I fully agree, are clearly about implementation, and particularly about font support and fallback handling, and have nothing to do with Unicode.

Best regards,

Marcel

P.S.: In case future readers stumble on this thread through a Google or Bing search (and because I hope such a mean mug won't find many buyers), I should have mentioned the topic: an "I ♥ UNICODE" mug where the heart symbol (U+2665) is replaced with a .notdef box:


The product name is "I <3 UNICODE!", insinuating that, for example, emoji still aren't rendered as pictures. The message, as I decode it, is: "Unicode implementations are so incomplete that I can't use the Unicode characters I'd like to; consequently I cannot like/love Unicode." By the way, I find the expression rather clumsy, as this very heart is inserted (and displayed!) by Alt+3 on every Windows numpad.
And here are CSS and UTF-8:


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150713/4772cb91/attachment.html>

From verdy_p at wanadoo.fr  Mon Jul 13 05:53:25 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 13 Jul 2015 12:53:25 +0200
Subject: a mug
In-Reply-To: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12>
References: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12>
Message-ID: <CAGa7JC1fAAL+dDNAZi+99FqvR5sQd8PHcFY99nFHnVFQTPgDyA@mail.gmail.com>

2015-07-13 11:15 GMT+02:00 Marcel Schneider <charupdate at orange.fr>:

> It's roughly the same problem with the CSS and UTF-8 malfunctioning that
> is laughed at with the other merchandising items brought in by Umesh:
> http://www.zazzle.com/cheap_css_is_awesome_mug-168565401817501350
>
> http://www.zazzle.com/css_is_awesome_with_overflow_mug-168685521846695550
> and Karl Williamson <public at khwilliamson.com> (On Sat, Jul 11, 2015,
> 19:42):
> http://i1.cpcache.com/product/27297813/utf8_value_tshirt.jpg
>
> Personally the only time CSS was awesome to me is when I'd written bad
> code. In truth, CSS is very smart and allows browsers to adapt the box
> width to the content, if not hindered in doing so by some fixed-width. We
> can write bad code in any language, but then we should rather laugh at our
> own incapacity.
>
> Idem with charsets. The only time I saw UTF-8 like on the T-shirt, was
> when opening UTF-8 files that didn't specify charset=UTF-8. The thing to do
> was to add the charset in the file header.
>
Or simply add a leading BOM. All browsers will autodetect it. This only
concerns HTML files (on a local filesystem).

BOMs are not recommended for UTF-8 encoded JavaScript files: if your local
HTML file references a local JavaScript file, it can specify the expected
file type (and charset) in addition to the local URL of the script file
itself, through attributes of the HTML "script" element. If your page needs
to perform JSON requests, the JSON is normally served by a web server that
will deliver the MIME type and charset in metadata. Some JSON parsers can
also be set to autodetect the BOM and then discard it from the visible
content.

Detecting it only requires checking the first 3 bytes of the input stream
before passing the data to the parser, which can then be instantiated and
initialized directly with the correct charset.
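A minimal sketch of that check, in Python. It assumes only the three BOMs that the WHATWG Encoding Standard's BOM sniff recognizes (UTF-8, UTF-16LE, UTF-16BE) and a hypothetical Windows-1252 fallback; `sniff_bom` and `decode_with_bom` are illustrative names, not an existing API.

```python
import codecs

# The three byte-order marks recognized by the WHATWG Encoding Standard's
# BOM sniff. Neither UTF-16 BOM is a prefix of the UTF-8 one, so the
# order of this tuple does not matter.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def sniff_bom(head: bytes):
    """Return (encoding, bom_length) if `head` starts with a BOM, else (None, 0)."""
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "cp1252") -> str:
    """Decode `data`, letting a leading BOM override the fallback charset.

    The BOM itself is dropped, so it never reaches the parser as U+FEFF.
    """
    encoding, bom_length = sniff_bom(data[:3])
    return data[bom_length:].decode(encoding or fallback)
```

Only the first three bytes are examined, which is the point: the charset is known before any parser is created.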

For pages served by web servers, you declare the charset in the metadata of
your shared folder, associating files with MIME types. This can even be a
global server setting if all your pages and scripts are UTF-8 encoded, or it
can be set on the main folder and overridden on specific folders for files
that should be sent not with the UTF-8 MIME metadata but with another
charset.

Or you can add a BOM autodetection feature to Apache, which will detect
the BOM in the file, then serve the UTF-8 file without this leading BOM but
with the corrected file size and the correct MIME type with its charset
parameter.
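The three steps of that server-side feature (strip the BOM, correct the size, declare the charset) can be sketched as a plain Python function. This is not an Apache module or configuration; `prepare_utf8_response` is a hypothetical helper that assumes the whole file is already in memory and only handles the UTF-8 BOM.

```python
import codecs


def prepare_utf8_response(body: bytes, mime_type: str = "text/html"):
    """Strip a leading UTF-8 BOM and build matching response headers.

    Returns (payload, headers). If a BOM was present, the charset is
    declared as UTF-8; Content-Length always matches the payload sent.
    """
    headers = {}
    if body.startswith(codecs.BOM_UTF8):
        body = body[len(codecs.BOM_UTF8):]
        headers["Content-Type"] = f"{mime_type}; charset=utf-8"
    else:
        # No BOM: leave the charset to whatever the server is configured with.
        headers["Content-Type"] = mime_type
    # Computed after stripping, so the 3 removed bytes are not counted.
    headers["Content-Length"] = str(len(body))
    return body, headers
```

Computing the length after stripping is the detail that is easy to get wrong: a stale Content-Length would make the client wait for 3 bytes that never arrive.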

It is more complicated for files hosted on FTP, as there's no MIME metadata:
there the BOM is still the easiest option (but it will be up to the FTP
client to perform the autodetection). Autodetecting a BOM is much more
efficient than autodetecting an HTML meta tag in the header: that requires
aborting the current parsing midway and restarting it, which uses more
memory that will need to be garbage collected, and costs some
milliseconds and more CPU resources, as HTML parsers are very costly in
terms of CPU processing.

If you place the charset in a meta tag of the HTML page, make sure that
this tag is near the beginning of the HTML header (it should be entirely
within the first 4KB, and even before the mandatory <title> element). In my
opinion this meta tag should be the first child element of the <head>
element, which is itself the first element of the <html> element that
immediately follows the optional HTML doctype declaration. If your page is
XHTML, you should put that charset indication in the leading XML
declaration line. Putting the indication in the first 4KB allows some
charset guessers to identify the charset faster, without actually
instantiating a parser and aborting it midway. 4KB is typically the size
of a single memory page, so that page will remain in CPU/bus caches without
paging I/O. The CPU cost will be minimal if the charset can be autodetected
very early, in a few nanoseconds, by just scanning the content of a single
memory page. 4KB is large enough that any placement of the autodetected
signatures will succeed without a long wait.
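A simplified sketch of such a bounded scan, in Python. The regular expression is an assumption that only approximates the real algorithm (the HTML Standard defines a more careful byte-level "prescan", and requires the declaration to fit in the first 1024 bytes); the `limit` parameter lets you try the 1024-byte, 1400-byte or 4KB budgets discussed here. `prescan_charset` is a hypothetical name.

```python
import re

# Simplified pattern covering both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def prescan_charset(data: bytes, limit: int = 1024):
    """Return the declared charset if it appears within the first `limit`
    bytes, else None. Bytes beyond `limit` are never examined."""
    match = _META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Because the scan works on raw bytes and never builds a DOM, nothing has to be thrown away and restarted when the declaration is found.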

Actually, I even think that the tag should be within the first 1400 bytes,
to fit in a single TCP packet with common MTUs. This minimizes networking
I/O delays: aborting a parser and restarting it takes significant
processing time, which could further delay the processing of the next TCP
packet, which could then be paged out by the OS if there are concurrent
network streams used by concurrent processes, such as large file downloads
or an active video stream.

I just wonder why HTML5 did not deprecate the old meta tag of HTML4 in
favor of an attribute directly on the <html> root element, or even in its
recommended DOCTYPE declaration. In my view, if you use the abbreviated
HTML5 doctype line, the default ought to be UTF-8 so that no indication is
necessary. Charset guessers should not be used with HTML5, except as a
possible recovery solution after a parsing failure, in which case the meta
tag may be processed. If there's no parsing error in the main document
(excluding all other referenced documents such as scripts or inner frames),
the meta tag should rather be ignored, even if it is present and specifies
something else.

Maybe some day there will be an HTML6 that enforces the use of a single
charset and possibly a more compact encoding. We've seen similarly radical
changes even in core protocols such as HTTP(S) itself. This could become a
single unified protocol mixing this new generation of HTTP and HTML
capabilities, with further capabilities such as dynamic parallel streams,
encryption, authentication, simplified and more efficient data signatures,
real-time constraints and QoS management of streams for web applications,
and more efficient support for encapsulated binary data (notably
audio/video/images, or even nearly native executable scripts, precompiled
by the server for the target client when its processing capabilities are
constrained, notably smartphones, to save energy in their battery). That
future HTML will focus much more on its API; the effective encoding may be
auto-adapted or negotiated and cached. Given that we now need security
everywhere on the web, negotiation protocols are already in use; for now
they only authenticate and exchange encryption keys, but they could
negotiate in the same round trip some presentation formats such as the MIME
type and charset encoding, compression levels, and the binary compatibility
of clients for receiving precompiled executable content, or for sharing
tasks and CPU/GPU resources or local/remote storage, or synchronizing
cached data.

---

We will soon need a true "network-centered OS" where
applications can run on one or more devices in parallel, owned by the
client or by the service provider, and allowing on-demand allocation and
sharing of processing resources available locally or remotely. On that OS,
there will no longer be the concept of a host (or it will just be a virtual
delocalized host), and the concept of "local" may be replaced by the
concept of a personal user environment that auto-adapts to the
capabilities of the devices around the user and to the available network
bandwidth.

At that time, this virtualized OS will certainly be 128-bit (and not 64-bit
as today), and it will manage many terabytes of virtual memory,
including the environments of other users located anywhere. Clients and
servers will share or request resources from that network dynamically, and
the core element of this OS will be managing caches, automatic
synchronization, and bandwidth allocation, and nobody will know "where"
the code is actually running physically. All devices will then exchange
code and data indiscriminately, or perform computing tasks delegated to
them by other members of the network (including transformation codecs).
The network OS will provide the necessary isolation for security, and the
architecture will be more peer-to-peer, working as a collaborative grid
computing architecture. It will also be failure-resistant, with implicit
backup/replication.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150713/2baf72c4/attachment.html>

From richard.wordingham at ntlworld.com  Wed Jul 15 02:49:13 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 15 Jul 2015 08:49:13 +0100
Subject: Mark-up to Indicate Words
Message-ID: <20150715084913.2b66392e@JRWUBU2>

What mark-up schemes exist to show that a sequence of letters and
combining marks constitutes a single word?

Such mark-up would be useful when using spell checkers. At present, I
use U+2060 WORD JOINER (WJ) to indicate the absence of a word boundary.
(Systematic marking of boundaries using ZWSP is not popular with
users, and is normally not used in Thai - it's not supported in
their national or Windows 8-bit encodings.) However, it seems likely
that when Unicode 8.00 is defined in August, WJ will suppress line
breaks but not word breaks.  There would still be the limitation that
mark-up is not available in plain text.
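If, as expected here, WJ stops suppressing word breaks in the Unicode word-boundary rules, a spell checker would have to honour the convention itself. A minimal Python sketch of that, assuming candidate break offsets come from some word segmenter (for Thai, typically dictionary-based, not shown); `split_words` is a hypothetical helper, not part of any spell-checker API.

```python
WJ = "\u2060"    # WORD JOINER
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def split_words(text: str, candidate_breaks: list[int]) -> list[str]:
    """Split `text` into words for a spell checker.

    `candidate_breaks` are offsets proposed by some word segmenter. A break
    touching a WORD JOINER is suppressed; a ZERO WIDTH SPACE always forces
    a break. WJ and ZWSP are removed from the returned words.
    """
    cuts = set()
    for i in candidate_breaks:
        # Offset i means "between text[i - 1] and text[i]".
        if 0 < i < len(text) and WJ not in (text[i - 1], text[i]):
            cuts.add(i)
    for i, ch in enumerate(text):
        if ch == ZWSP:
            cuts.update((i, i + 1))  # break on both sides of the ZWSP
    words, start = [], 0
    for cut in sorted(cuts) + [len(text)]:
        piece = text[start:cut].replace(WJ, "").replace(ZWSP, "")
        if piece:
            words.append(piece)
        start = cut
    return words
```

For example, with `"abc" + WJ + "def" + ZWSP + "ghi"` and a segmenter proposing breaks at offsets 3 and 4, both proposals touch the WJ and are dropped, so the checker sees `["abcdef", "ghi"]`.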

It appears that, for example, Open Document Format has no mark-up to
indicate word boundaries, relying instead on the overrides of
the word boundary detection algorithms being stored at character level.

Richard.

From charupdate at orange.fr  Wed Jul 15 04:06:41 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 15 Jul 2015 11:06:41 +0200 (CEST)
Subject: Input methods at the age of Unicode (was: a mug)
Message-ID: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>

On Sat, Jul 11, 2015, at 20:54, Hans Aberg  wrote:

> On 11 Jul 2015, at 18:36, Johannes Bergerhausen  wrote:
>>
>> As I said at TEDx in Vienna:
>> [https://www.youtube.com/watch?v=IRdupNXpm8k]

> The keyboards for different languages are essentially the same nowadays: it sends a code indicating which button is acted on and whether it is depressed or released. The computer then translates using a key map. So for a Cherokee keyboard, as discussed in the video, one would need different images on the keys if one bothers, and a key map.

> One problem here is that it is very time consuming to design such key maps. This is another shortcoming of Unicode usage: lack of input methods, in addition to the font issue.

I fully agree. These keyboard updates are consistent with Microsoft's new corporate ambition, which is to empower people to achieve more, as Microsoft's CEO Satya Nadella wrote to All Employees on July 10, 2014 at 6:00 a.m. PT: http://bit.ly/1wRIBqD
If we understand the goal as a relative one, users will be able to do more than during the past few decades. Obviously, better keyboard UIs are essential in this process.

Today we are still mainly using inherited ANSI keyboards, despite using Unicode characters. Overcoming this discrepancy is urgent, and I believe that at the development level this is very easy (though it may be time consuming, as Hans warns us). Whether it is easy at the users' level too depends on the amount of novelty packed into the keymap. In Cherokee, users would now probably be learning to use casing, since the script has newly been extended to bicamerality.

By contrast, to convert all US American Standard keyboards to Unicode keyboards, nothing more is needed than replacing the spacing Grave with the Letter Apostrophe, and the right-hand Alt key with a Compose key, operated by the right thumb. The need for U+02BC in English is evidenced in last month's thread "A new take on the English apostrophe in Unicode". 

For example, users who want to input smart quotes without an algorithm may then type Compose, {, " for an opening quotation mark, or Compose, ], ' for a closing single quote. Compose, Letter Apostrophe, a brings á. This principle extends to all Latin letters and punctuation marks (about two thousand, if my estimate is correct). There will then be no more need for a separate US-International keyboard layout. That layout seems to have been shaped not by efficiency but by its creation environment (which apparently excluded dead-key chaining), as well as by IBM's choice not to copy Digital's Compose key (IBM copied only the inverted-T arrow keys and six miscellaneous keys). The US-International layout is so bad that it cannot be kept in use as it is, as Mark Davis explained on Sun Jul 18 1999 - 13:47:47 EDT (http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html).

The set of all Latin letters is thus made available thanks to the chained dead-key implementation of the Compose functionality. Furthermore, designing key maps for any alphabetic language on earth appears to be rather easy: much easier, and probably far less time consuming in any case, than writing most other software. Writing keyboard drivers is essentially editing key defines, allocation tables, and deadtrans function lists. The latter two are best done with spreadsheet software. Provided that spreadsheet software (e.g. Excel 2010 Starter) is used, the job is much less complicated than its reputation suggests, because good keyboard layouts have long dead-key lists, and these are not efficiently edited in the UIs of ordinary keyboard editors. Layout sources in a keyboard editor's own format may also be edited in spreadsheets, with good results, if the dead-key chaining flag is accessible. On Windows this is the case in KbdEdit, but the object modules (drivers) compiled by this software are proprietary and therefore cannot be effectively shared.
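The logic behind chained dead keys is small enough to model in a few lines. The Python sketch below is not a keyboard driver and does not reproduce the Windows deadtrans format; it only shows the lookup a Compose key performs, using the sequences quoted above. The table, the `ComposeMachine` class and the `"COMPOSE"` key name are hypothetical, and the output for Compose, Letter Apostrophe, a is assumed to be á.

```python
# Hypothetical compose table: keys typed after Compose -> output.
COMPOSE_TABLE = {
    ("{", '"'): "\u201c",       # LEFT DOUBLE QUOTATION MARK
    ("]", "'"): "\u2019",       # RIGHT SINGLE QUOTATION MARK
    ("\u02bc", "a"): "\u00e1",  # assumed: MODIFIER LETTER APOSTROPHE + a -> á
}

# Every proper prefix of a sequence is an intermediate dead-key state;
# moving from one such state to the next is what "chaining" means.
PREFIXES = {seq[:i] for seq in COMPOSE_TABLE for i in range(1, len(seq))}


class ComposeMachine:
    """Minimal dead-key chaining state machine (a sketch, not a driver)."""

    def __init__(self):
        self.pending = None  # None: not composing; tuple: keys typed so far

    def press(self, key: str) -> str:
        """Feed one key and return the text to emit ("" while composing)."""
        if key == "COMPOSE":
            self.pending = ()
            return ""
        if self.pending is None:
            return key
        seq = self.pending + (key,)
        if seq in COMPOSE_TABLE:
            self.pending = None
            return COMPOSE_TABLE[seq]
        if seq in PREFIXES:
            self.pending = seq
            return ""
        # Unknown sequence: leave compose mode and emit the keys as typed.
        self.pending = None
        return "".join(seq)
```

Adding a character is just adding a row to the table, which is why a spreadsheet is a convenient place to maintain such lists.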

Editing keyboard layouts is a job anybody can tackle who is willing to spend some time on useful work (as opposed to leisure activities like gaming, chasing and the like). Everything needed is publicly available. There's nothing to wait for.

Good luck,

Marcel 

P.S.: There's a new version of the Compose Key article in Wikipedia:
https://en.wikipedia.org/wiki/Compose_key

To quickly summarize the advantages of the new US English Unicode keyboard layout and the similar UK English Unicode keyboard layout:

- Backward compatibility: Simply consider that the engraved Grave now stands for a curly apostrophe (which is very approximate but avoids keycap stickers).

- Application compatibility: The smart quotes algorithm keeps working for what it is made for, and is no longer solicited for what it isn't made for: simulating apostrophes in all positions, including leading apostrophes.

- Adaptability: The user recovers full autonomy and can now decide for himself whether he wants an apostrophe or a quotation mark. No more workarounds are needed.

- Efficiency: The reintroduced Compose key, on right Alt, is a super dead key that lets you type huge sets of characters without much memorization, while the nearly useless** Grave accent key position suddenly becomes useful again.

- Efficacy: No more spaces are needed to type apostrophes and quotes, and no key is hijacked for a dead key any longer, except the otherwise rather useless right Alt key (a duplicate of the left one, and on the wrong side of the space bar for Alt+NumPad). There is no more confusion with Ctrl+Alt application shortcuts, such as AltGr used to create on Windows, while AltGr can still be made available through a safe emulation with a Shift+Right Alt dead key.

- Quality: The resulting text files are much more useful than versions that mix up apostrophes and closing single quotes. For computer processing, paired and unpaired punctuation must be clearly distinct, regardless of any glyph resemblance, all the more so since in real English the apostrophe has letter status, not punctuation status.


**I know that, because the Grave is on the keyboard, it is used in markup and perhaps in programming (seemingly not in C/C++). On a Unicode keyboard, a Space following a diacritic dead-key chain inserts the combining diacritic (which goes against the inherited rule, dating from before combining diacritics were encoded). Since on a Unicode keyboard Shift+Space should be NBSP, spacing diacritics are inserted when the diacritic is followed by NBSP. Both behaviors are already implemented for Mac OS X: http://uscustom.sourceforge.net/. In current writing, spacing diacritics are generally much less useful than combining ones. To speed up the insertion of the spacing Grave, we might use Compose, s (for Spacing), g (for Grave). Likewise we would have spacing Acute (sa), Cedilla (sc) and Little Tilde ("st" or "slt", not "lt", which is already taken).
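The proposed Space/NBSP rule can be stated as a small function. This Python sketch assumes a hypothetical three-entry table and function name; for a letter it falls back to the usual behaviour of composing base plus mark with NFC normalization, which yields a precomposed character only where Unicode has one.

```python
import unicodedata

NBSP = "\u00a0"

# Hypothetical table: dead key -> (combining mark, spacing diacritic).
DIACRITICS = {
    "grave": ("\u0300", "\u0060"),    # COMBINING GRAVE ACCENT / GRAVE ACCENT
    "acute": ("\u0301", "\u00b4"),    # COMBINING ACUTE ACCENT / ACUTE ACCENT
    "cedilla": ("\u0327", "\u00b8"),  # COMBINING CEDILLA / CEDILLA
}


def resolve_dead_key(diacritic: str, next_key: str) -> str:
    """Return the output for a diacritic dead key followed by `next_key`."""
    combining, spacing = DIACRITICS[diacritic]
    if next_key == " ":
        return combining  # Space -> combining diacritic (the proposed rule)
    if next_key == NBSP:
        return spacing    # Shift+Space (NBSP) -> spacing diacritic
    # Letter: base + combining mark, composed to a precomposed form if one exists.
    return unicodedata.normalize("NFC", next_key + combining)
```

For instance, acute followed by e gives é (U+00E9), while grave followed by q stays as q + U+0300 because no precomposed form exists.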


Along with this, word processor updates must extend the smart quotes algorithms to support the correct handling of the apostrophe. This too is rather easy to implement:

* Extended autocorrect settings will allow users to specify whether the most used squiggle is the apostrophe or the single quotation mark, and whether the apostrophe should be U+02BC or U+2019. These toggles should be actionable by customizable keyboard shortcuts, and an info bubble and/or a flag will show which setting is active.

* In line with Ted Clancy's proposal, a new Option setting will let users dedicate the Apostrophe key to the apostrophe *exclusively*, and use the Quotation mark key for *all* quotation marks, whether double or single. This is indeed feasible in English (contrary to what I thought when replying in the thread "A new take on the English apostrophe in Unicode", and unlike good French and German usage, where angle quotation marks are used for quotations and comma quotation marks for scare quotes [using angle quotes as scare quotes is bad practice]).

* Automatic quote pairing will then insert matching characters on input, and check the pairing during revision.

* Multiple keystrokes with cyclic output: the most used quotation mark is inserted when the Quotation mark key is hit once, and the other one when it is hit twice. The most used one is set in the options. For example, in American English, a user may choose to get single quotes first because he's a scientist and needs to mark many words, while he may switch to double quotes first when writing literary text. The same should be available for the Apostrophe key: the leading apostrophe or the quotation mark after one stroke, the other one after two strokes, and an appropriate sequence of both after three keystrokes. Hitting the key again restarts the cycle, and so forth. An info bubble, or a colored display as suggested by William Overington on Fri, Jun 05, 2015, 11:48, could disambiguate apostrophe and quote. Alternatively, the letter apostrophe may be displayed with the customizable "field" shading color, as NBSP and WJ are in LibreOffice.

* New Help sections may be invoked for ready information about the usefulness of the Letter Apostrophe and the features that facilitate its use. We must abandon the comfortable idea that users are unwilling to spend any thought on why and how to distinguish two characters that look identical. This idea should be considered disrespectful (contemptuous, I would say), and IMO it is probably just a mean pretext for reducing production costs by lowering product quality. (The product being the word processor, e.g. Microsoft Word.)

* An optional dialog will be displayed every time there is an ambiguity, that is, when a leading apostrophe is typed, and also when a trailing apostrophe is typed while a quotation is open (after an opening single quotation mark). This dialog may ask "Do you wish to type an apostrophe?" or alternatively "Is this a quotation mark?". The choice may be set with Tab and confirmed with Space.

* Users who wish to keep mixing them up will be welcome to do so ("Don't ask me again"). This choice may be reverted in the Settings ("Distinguish apostrophes and quotation marks"; "Display the apostrophe dialog").
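The cyclic multi-stroke behaviour described in these items reduces to modular indexing. In this Python sketch, `cycled_output` and the two choice lists are hypothetical; the third Apostrophe-key entry (opening single quote followed by apostrophe) is only an assumed reading of "an appropriate sequence of both".

```python
APOSTROPHE = "\u02bc"   # MODIFIER LETTER APOSTROPHE
LEFT_SINGLE = "\u2018"  # LEFT SINGLE QUOTATION MARK
LEFT_DOUBLE = "\u201c"  # LEFT DOUBLE QUOTATION MARK


def cycled_output(press_count: int, choices: list[str]) -> str:
    """Return what a key produces after `press_count` consecutive presses.

    Each further press replaces the previous output, cycling through
    `choices` and restarting after the last one.
    """
    if press_count < 1:
        raise ValueError("press_count must be at least 1")
    return choices[(press_count - 1) % len(choices)]


# Quotation mark key in a hypothetical "single quotes first" setting.
QUOTE_KEY = [LEFT_SINGLE, LEFT_DOUBLE]
# Apostrophe key at the start of a word: apostrophe, quotation mark, then
# (assumed ordering) an opening quote followed by an apostrophe.
APOSTROPHE_KEY = [APOSTROPHE, LEFT_SINGLE, LEFT_SINGLE + APOSTROPHE]
```

Switching to "double quotes first" would just mean reordering `QUOTE_KEY`, which matches the idea of an option that sets the most used mark.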


For subscribers who have read this far and are willing to read on, I must note that criticism is rather easily uttered as long as the fault seems to lie with Unicode, which would explain why Unicode bashing is so popular that we find it even on mugs (see the parent thread), as if we were meant to take pleasure in telling ourselves every morning at breakfast that our universal charset is still useless and won't work for a long time. By contrast, as soon as the responsibility is shifted from the Consortium to its most powerful members, such as Apple, Google and Microsoft, especially the latter, very few people carry on.

In this paragraph I would like to vent more and try to debrief the Apostrophe thread, but I fear that would be too long and tiresome. I will just mention that many people monitoring this mailing list know exactly why Unicode decided to recommend U+02BC for the English apostrophe, and exactly how things happened when U+02BC was discarded in favor of U+2019, but nobody agreed to disclose this information, neither when the information written up by Ted Clancy was submitted by a mailing list subscriber, nor when I shared the results of my decoding of early NamesList versions. Consequently, I ended up being blamed for knowing little about it. 

Now I try again to learn more by submitting the following three questions:

1. Why had the UTC recommended U+02BC as apostrophe?

2. Why has the UTC withdrawn its recommendation?

3. At whose request did the UTC move the information about the preferred character for the apostrophe from U+02BC to U+2019?

Answering these three questions is essential for a thorough understanding of the history, which will strengthen the foundations of keyboard re-engineering as it must be carried out at this juncture, with the Windows 10 release imminent.

Best regards,

Marcel


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150715/b6fdd057/attachment.html>

From duerst at it.aoyama.ac.jp  Wed Jul 15 06:18:09 2015
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Wed, 15 Jul 2015 20:18:09 +0900
Subject: Mark-up to Indicate Words
In-Reply-To: <20150715084913.2b66392e@JRWUBU2>
References: <20150715084913.2b66392e@JRWUBU2>
Message-ID: <55A64171.9070600@it.aoyama.ac.jp>

Hello Richard,

On 2015/07/15 16:49, Richard Wordingham wrote:
> What mark-up schemes exist to show that a sequence of letters and
> combining marks constitutes a single word?
>
> Such mark-up would be useful when using spell checkers. At present, I
> use U+2060 WORD JOINER (WJ) to indicate the absence of a word boundary.
> (Systematic marking of boundaries using ZWSP is not popular with
> users, and is normally not used in Thai - it's not supported in
> their national or Windows 8-bit encodings.) However, it seems likely
> that when Unicode 8.00 is defined in August, WJ will suppress line
> breaks but not word breaks.  There would still be the limitation that
> mark-up is not available in plain text.
>
> It appears that, for example, Open Document Format has no mark-up to
> indicate word boundaries, relying instead on the overrides of
> the word boundary detection algorithms being stored at character level.

I'd suggest looking at higher-end formats such as DITA or TEI (Text 
Encoding Initiative).

Regards,   Martin.

> Richard.
> .
>

From haberg-1 at telia.com  Wed Jul 15 09:07:12 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Wed, 15 Jul 2015 16:07:12 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
Message-ID: <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>


> On 15 Jul 2015, at 11:06, Marcel Schneider <charupdate at orange.fr> wrote:

> Editing keyboard layouts is a job anybody can tackle who is willing to spend some time for a useful work (as opposed to a set of leisures like gaming, chasing and the like). 

In mathematics there are a couple of thousand characters, including the styled Latin and Greek alphabets, for which developing a key map would take some time.


From petercon at microsoft.com  Wed Jul 15 16:03:08 2015
From: petercon at microsoft.com (Peter Constable)
Date: Wed, 15 Jul 2015 21:03:08 +0000
Subject: ISO 15924
In-Reply-To: <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com>
References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>
 <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com>
 <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com>
Message-ID: <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com>

I don't see an explanation of the pale yellow or pale green shading. 

Also, re this:

"All changes are displayed in color and italics..."

Every row is a change record, yet not every row (in fact no row) is entirely coloured and in italics. If what is meant is "All changed values are displayed in color and italics...", then that is still not the case: there are lots of coloured cells that do not have italics text. 

To me, it's all rather unclear.


Peter

-----Original Message-----
From: Unicore [mailto:unicore-bounces at unicode.org] On Behalf Of Michael Everson
Sent: Sunday, July 12, 2015 4:20 AM
To: unicode Unicode Discussion; UnicoRe Mailing List
Subject: Re: ISO 15924

Yes, and this usage is explained on the page (as it has been since 2006).

> On 12 Jul 2015, at 07:09, Peter Constable <petercon at microsoft.com> wrote:
> 
> Is there a significance to the colours in the table?
> 
> Peter

Michael Everson * http://www.evertype.com/


From verdy_p at wanadoo.fr  Wed Jul 15 16:31:09 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 15 Jul 2015 23:31:09 +0200
Subject: ISO 15924
In-Reply-To: <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com>
References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>
 <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com>
 <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com>
 <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com>
Message-ID: <CAGa7JC33NZjufTWY964O1Lp_uRd96XJ+w-wjg1u-tjDZhce=oA@mail.gmail.com>

Pale yellow marks cells that have changed since the first publication
(most of them to replace names with better, less ambiguous ones, to
change the order of names when there are synonyms so as to put the most
common one first, or to fix minor typos when the first publication was
an approximate translation that did not match the most common name:
they have a history you can look at, and the date indicated is the date
of the last modification, which differs from their first release).
The history is not in the table itself.

2015-07-15 23:03 GMT+02:00 Peter Constable <petercon at microsoft.com>:

> I don't see an explanation of the pale yellow or pale green shading.
>
> Also, re this:
>
> "All changes are displayed in color and italics..."
>
> Every row is a change record, yet not every row (in fact no row) is
> entirely coloured and in italics. If what is meant is "All changed values
> are displayed in color and italics...", then that is still not the case:
> there are lots of coloured cells that do not have italics text.
>
> To me, it's all rather unclear.
>
>
> Peter
>
> -----Original Message-----
> From: Unicore [mailto:unicore-bounces at unicode.org] On Behalf Of Michael
> Everson
> Sent: Sunday, July 12, 2015 4:20 AM
> To: unicode Unicode Discussion; UnicoRe Mailing List
> Subject: Re: ISO 15924
>
> Yes, and this usage is explained on the page (as it has been since 2006).
>
> > On 12 Jul 2015, at 07:09, Peter Constable <petercon at microsoft.com>
> wrote:
> >
> > Is there a significance to the colours in the table?
> >
> > Peter
>
> Michael Everson * http://www.evertype.com/
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150715/b15fa3f7/attachment.html>

From everson at evertype.com  Wed Jul 15 17:47:06 2015
From: everson at evertype.com (Michael Everson)
Date: Wed, 15 Jul 2015 23:47:06 +0100
Subject: ISO 15924
In-Reply-To: <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com>
References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com>
 <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com>
 <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com>
 <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com>
Message-ID: <6A8D873D-8812-4098-B3B5-ED5C130DBF01@evertype.com>


> On 15 Jul 2015, at 22:03, Peter Constable <petercon at microsoft.com> wrote:
> 
> I don't see an explanation of the pale yellow or pale green shading. 
> 
> Also, re this:
> 
> "All changes are displayed in color and italics..."

Please read the next clause:

"...entry additions are not given in italics."

The Category of Change Key is found at the bottom of the page. 

> Every row is a change record, yet not every row (in fact no row) is entirely coloured and in italics.

Nor should they be. A full row is an addition. Only changes are in italics. 

> If what is meant is "All changed values are displayed in color and italics...", then that is still not the case: there are lots of coloured cells that do not have italics text. 

Are any of those cells in a row marked "Add"?

Michael Everson * http://www.evertype.com/


From charupdate at orange.fr  Thu Jul 16 03:29:11 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 16 Jul 2015 10:29:11 +0200 (CEST)
Subject: Input methods at the age of Unicode
In-Reply-To: <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
Message-ID: <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>

On Sat, Jul 11, 2015, at 20:54, Hans Aberg  wrote:

> So for a Cherokee keyboard, as discussed in the video, one would need different images on the keys if one bothers, and a key map.
> One problem here is [...] that it is very time consuming to design such key maps.

On Wed, Jul 15, 2015, at 16:07, Hans Aberg  wrote:

> > On 15 Jul 2015, at 11:06, Marcel Schneider  wrote:
> 
> > Editing keyboard layouts is a job anybody can tackle who is willing to spend some time on useful work (as opposed to leisure activities like gaming, hunting and the like). 
> 
> In mathematics, there are a couple of thousand characters, including Latin and Greek styles, which would take some time to develop a key map for.

That is of course a hard piece of work. For mathematical symbols, rather than a keymap, I'd prefer a Compose tree.
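A Compose tree can be pictured as a prefix tree: each keystroke after Compose selects a branch, and a leaf holds the output character. The minimal Python sketch below assumes a nested-dict representation; the sequences in it (for example Compose, =, / for U+2260) are hypothetical illustrations, not a published standard table.

```python
from typing import Optional, Union

# A node maps one typed key to either a deeper node (dict) or an output string.
Node = dict

# Hypothetical sequences, for illustration only.
COMPOSE_TREE: Node = {
    "=": {"/": "\u2260"},                 # Compose, =, /    -> NOT EQUAL TO
    "<": {"=": "\u2264"},                 # Compose, <, =    -> LESS-THAN OR EQUAL TO
    ">": {"=": "\u2265"},                 # Compose, >, =    -> GREATER-THAN OR EQUAL TO
    "g": {"a": "\u03b1", "b": "\u03b2"},  # Compose, g, a/b  -> Greek alpha / beta
    "i": {"n": {"f": "\u221e"}},          # Compose, i, n, f -> INFINITY (three levels)
}


def compose(keys: str, tree: Node = COMPOSE_TREE) -> Optional[str]:
    """Follow the keys typed after Compose; return the output, or None."""
    node: Union[Node, str] = tree
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None  # undefined sequence, or extra keys after a leaf
        node = node[key]
    # A dict here means the sequence is still incomplete.
    return node if isinstance(node, str) else None
```

Adding a new mathematical symbol then means adding one more path to the tree; the number of physical key positions stays the same.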

For natural languages like Cherokee, Spanish, Welsh, English, or all languages together that use a given script, like Cyrillic, Greek, or Latin, developing keymaps is a very rewarding job, regardless of the time we finally spend on it, because the results will be useful to many people, provided that the results are good. Now, the better a keymap, the more likely it is to need time and personal investment (that is, we need to spend extra thinking time in addition to the working time). Obviously we can't rely on Apple, Google and Microsoft to do this job; they simply *cannot* afford to spend so much time, which in this case is money, to develop absolutely free products that will never pay back all that money. 

By "pay back all that money" I mean that e.g. Microsoft would sell more Windows licenses for the sake of all the ultra-performative keyboard layouts the OS would be shipped with. I don't believe that things could happen this way. First, Windows will now be distributed as a free update; second, OEMs *cannot* afford to raise computer prices for the sake of keyboard layouts either; third, these keyboard drivers are so transparent by nature that de facto they're open source; fourth, since the goal is for *everybody* to benefit from those keyboard layouts, they *must* be shared for free; and last but not least, a keyboard driver is not a good spot to place ads.

This is why *everybody* is invited to tackle this job. The idea is that when we agree to do some good with our personal time (as opposed to gaming or hunting, which are just two examples of time-consuming activities that I personally consider as doing no good), then time will stop being in the foreground when talking about key maps and Compose trees.

Best,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150716/8aeaed4f/attachment.html>

From haberg-1 at telia.com  Thu Jul 16 03:35:35 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Thu, 16 Jul 2015 10:35:35 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
Message-ID: <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>

On 16 Jul 2015, at 10:29, Marcel Schneider <charupdate at orange.fr> wrote:
> 
> On Sat, Jul 11, 2015, at 20:54, Hans Aberg <haberg-1 at telia.com> wrote:

> > > On 15 Jul 2015, at 11:06, Marcel Schneider <charupdate at orange.fr> wrote:
> > 
> > > Editing keyboard layouts is a job anybody can tackle who is willing to spend some time on useful work (as opposed to leisure activities like gaming, hunting and the like). 
> > 
> > In mathematics, there are a couple of thousand characters, including Latin and Greek styles, which would take some time to develop a key map for.
> 
> That is of course a hard piece of work. For mathematical symbols, rather than a keymap, I'd prefer a Compose tree.

One still has to figure out a good map.

Using Unicode helps the readability of the input file, though. One can use for example ConTeXt with LuaLaTeX, which comes with the TeX Live installation.


From charupdate at orange.fr  Thu Jul 16 04:21:23 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 16 Jul 2015 11:21:23 +0200 (CEST)
Subject: Input methods at the age of Unicode
In-Reply-To: <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
Message-ID: <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>

On 16 Jul 2015, at 10:35, Hans Aberg  wrote:

> One still has to figure out a good map.
> 
> Using Unicode helps the readability of the input file, though. One can use for example ConTeXt with LuaLaTeX, which comes with the TeX Live installation.

Thank you very much for these hints, I'll try to apply them. Currently I stick with a rather common set of characters on the key map, except that I've added U+2610, which is very useful, even more so when it is part of the dead-key lists as a base character, and several additional exotic currency symbols as a mark of respect. Backwards compatibility leads me to limit the number of key positions. From eight per key I've come back down to four, and from a dozen or more dead keys (and a maximum of about twenty-five or thirty) back to five plus the Compose key (one key with four dead-key positions: Compose, AltGr, Greek, and Secondary group in the ISO 9995 sense). But with one Compose key we potentially have as many dead keys as there are key positions on the rest of the keyboard, and each one of them can give access to as many again. I believe that the future of keyboards lies as much in the Compose tree as in the key map, or even more.
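To put rough numbers on "as many again", here is a worked count under hypothetical assumptions (not figures from this thread): a main block of 48 character keys with the four positions per key mentioned above gives k key positions; Compose plus one key reaches up to k dead-key states, and Compose plus two keys reaches up to k squared outputs.

```latex
k = 48 \times 4 = 192, \qquad
N_{\text{Compose}+1} = k = 192, \qquad
N_{\text{Compose}+2} = k^{2} = 192^{2} = 36\,864
```

More generally, n keystrokes after Compose can address up to k^n outputs, which is why the depth of the tree, rather than extra key positions, carries the load.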

The file format of my source files is UTF-8; however, the compiler accepts literal characters only up to U+008F. From U+00A0 upwards, we must use code points. For readability I add the Unicode characters in trailing comments, as well as automatically added Unicode character identifiers (names), along with as many comments as we want. Doing everything in spreadsheets allows semi-automatically deriving HTML tables without needing any software other than a text editor.
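The "code point plus trailing comment" convention can be generated automatically. The Python sketch below is a hypothetical helper, not part of any keyboard compiler: it assumes literals are allowed only for printable ASCII up to U+007E, and that the target syntax is a C-style character literal or a 0x hex value followed by a // comment (loosely modelled on the 0x02bc notation used in this thread).

```python
import unicodedata


def keymap_entry(ch: str) -> str:
    """Format one character for a hypothetical keymap source line.

    Printable ASCII (U+0020..U+007E) is written as a quoted literal; anything
    else is written as a hex code point, with the character itself and its
    Unicode name added in a trailing comment for readability.
    """
    cp = ord(ch)
    if 0x20 <= cp <= 0x7E:
        literal = "\\" + ch if ch in "'\\" else ch  # escape quote and backslash
        return f"'{literal}'"
    name = unicodedata.name(ch, "<unnamed>")
    return f"0x{cp:04x}\t// {ch} {name}"


for ch in "a\u2610\u00e9":
    print(keymap_entry(ch))
```

Because the names come from Python's own Unicode database, the comments stay consistent with the code points without manual lookup.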

Now I've just downloaded the two versions of ConTEXT, which might well be the enhanced text editor I've been looking for for a while. LuaLaTeX will be very interesting too if I can edit source files with it (however, the bulk of the job is done in spreadsheet software, which is Unicode-aware; current versions even include the UNICAR and UNICODE functions). I'll see whether ConTeXt recognizes the Kana shift states (Gedit seemingly doesn't).

Have a great day,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150716/083f134f/attachment.html>

From charupdate at orange.fr  Thu Jul 16 04:53:54 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 16 Jul 2015 11:53:54 +0200 (CEST)
Subject: Input methods at the age of Unicode
In-Reply-To: <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
Message-ID: <105482380.7936.1437040434492.JavaMail.www@wwinf1f21>

On 16 Jul 2015, at 11:30, I wrote:

> the compiler admits clear characters only up to U+008F.

Up to U+007E, of course. 

On 16 Jul 2015, at 10:35, Hans Aberg  wrote:

> One still has to figure out a good map.

Yes, this is the primary issue for every newly encoded script, and it remains important with respect to ergonomics. 

I just wanted to say that I'm focussing on the Compose tree of a Latin keyboard layout.

Do you mean that the US American English keymap should be thoroughly reengineered too, in addition to the solutions of ANSI, ISO, and August Dvorak? I think that on the ANSI/ISO keyboards it would be sufficient to remove the dead keys, change T29/E00 from 0x0060 to 0x02bc, and replace VK_RMENU with a Compose key. It's a bit more complicated, however, to get a simple *and* complete keymap for France, and surely for a number of other countries using accented characters.



Marcel 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150716/20b8cd6a/attachment.html>

From haberg-1 at telia.com  Thu Jul 16 06:12:48 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Thu, 16 Jul 2015 13:12:48 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
Message-ID: <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>


> On 16 Jul 2015, at 11:21, Marcel Schneider <charupdate at orange.fr> wrote:
> 
> On 16 Jul 2015, at 10:35, Hans Aberg <haberg-1 at telia.com> wrote:
> 
> > Using Unicode helps the readability of the input file, though. One can use for example ConTeXt with LuaLaTeX, which comes with the TeX Live installation.
> 
> Thank you very much for these hints, I'll try to apply them. 

> Now I've just downloaded the two versions of ConTEXT, which might well be the enhanced text editor I've been looking for for a while. LuaLaTeX will be very interesting too if I can edit source files with it (however, the bulk of the job is done in spreadsheet software, which is Unicode-aware; current versions even include the UNICAR and UNICODE functions).

It is simplest to just download the whole TeX Live:
  https://www.tug.org/texlive/
There is a special package for OS X.

Though large, the main distribution lives in a single directory, so it is easy to throw away.

> I'll see whether ConTeXt recognizes the Kana shift states (Gedit seemingly doesn't).

It seems to depend on the font:

When trying an OS X system Arabic font, the ligatures were broken. However, when trying Khaled Hosny's Amiri font <http://www.amirifont.org/>, it seemed to work.

There is a ConTeXt users list <http://www.ntg.nl/mailman/listinfo/ntg-context>, as well as support pages <http://wiki.contextgarden.net/>


From haberg-1 at telia.com  Thu Jul 16 08:20:11 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Thu, 16 Jul 2015 15:20:11 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <105482380.7936.1437040434492.JavaMail.www@wwinf1f21>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <105482380.7936.1437040434492.JavaMail.www@wwinf1f21>
Message-ID: <D5B9129B-544D-4F6B-AA7F-4BE19E60C6BF@telia.com>


> On 16 Jul 2015, at 11:53, Marcel Schneider <charupdate at orange.fr> wrote:

> On 16 Jul 2015, at 10:35, Hans Aberg <haberg-1 at telia.com> wrote:
> 
> > One still has to figure out a good map.
> 
> Yes, this is the primary issue for every newly encoded script, and it remains important with respect to ergonomics. 
> 
> I just wanted to say that I'm focussing on the Compose tree of a Latin keyboard layout.
> 
> Do you mean that the US American English keymap should be thoroughly reengineered too, in addition to the solutions of ANSI, ISO, and August Dvorak?

It may suffice to have a logical layout, with letters in alphabetical order. The traditional layouts were designed for speed typing on physical typewriters, specifically with fixed finger positioning, so as not to have to look at the keyboard while typing.

Speed typing is not so important these days, as it is mostly for secretaries who write down material in another format. And the computer keyboard does not have the physical limitations of mechanical typewriters.

It is also considerably faster with moving finger positioning, which can be done if one does not have to look at some text while typing.


From haberg-1 at telia.com  Thu Jul 16 08:26:19 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Thu, 16 Jul 2015 15:26:19 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <5596365.26347.1437045209046.JavaMail.defaultUser@defaultHost>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <5596365.26347.1437045209046.JavaMail.defaultUser@defaultHost>
Message-ID: <68383FA8-F314-4439-862D-59E03710FE2F@telia.com>


> On 16 Jul 2015, at 13:13, William_J_G Overington <wjgo_10009 at btinternet.com> wrote:

> I do not know if it is of interest, but some time ago I produced some pdf files that can each be used as a typecase so as to copy a character from the pdf, then paste into a Unicode-aware wordprocessor or desktop publishing program and then formatted to the desired font and font size.

On OS X there is a "Character Viewer", which has a similar purpose. One has access to all of Unicode, and can click on characters to get them pasted into the text. One can use special categories and also make one's own table. But it is slow.


From charupdate at orange.fr  Thu Jul 16 09:44:20 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 16 Jul 2015 16:44:20 +0200 (CEST)
Subject: Input methods at the age of Unicode
In-Reply-To: <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
Message-ID: <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>

On 16 Jul 2015, at 13:21, Hans Aberg  wrote:

> On 16 Jul 2015, at 11:21, Marcel Schneider  wrote:
>> 
>> Now I've just downloaded the two versions of ConTEXT, which might well be the enhanced text editor I've been looking for for a while. LuaLaTeX will be very interesting too if I can edit source files with it (however, the bulk of the job is done in spreadsheet software, which is Unicode-aware; current versions even include the UNIC[H]AR and UNICODE functions).

Knowing nothing about it, I mixed up ConTeXt, which you referred to, and ConTEXT, and ended up downloading and installing a new text editor. At least this time it is very useful to me, as ConTEXT will replace Gedit for me, because ConTEXT correctly handles the Kana shift states (about half of my keyboard layout). However, as it is new, support for characters like U+2610, or simply precomposed letters with macron or double acute, is not yet ensured. When I have some time left I'll write to them, because the project is very promising.

> 
> It is simplest to just download the whole TeX Live:
> https://www.tug.org/texlive/
> There is a special package for OS X.

Unfortunately I have no OS X machine at home or elsewhere, nor do I have Linux at home. Where I use Ubuntu I cannot install this. I'll check whether there is a Windows version, but it seems to take me away from my urgent goal, so it'll be for a bit later.
> 
> Though large, the main distribution lives in a single directory, so it is easy to throw away.

Nor would I throw away this software, if I could install it.
> 
>> I'll see whether ConTeXt recognizes the Kana shift states (Gedit seemingly doesn't).
> 
> It seems to depend on the font:
> 
> When trying an OS X system Arabic font, the ligatures were broken. However, when trying Khaled Hosny's Amiri font, it seemed to work.

First I'll have to learn the language. This is a very valuable goal, but it needs time I don't currently have.
> 
> There is a ConTeXt users list , as well as support pages 

I'll save, thank you.


On 16 Jul 2015, at 15:20, Hans Aberg  wrote:

> On 16 Jul 2015, at 11:53, Marcel Schneider  wrote:
>> 
>> Do you mean that the US American English keymap should be thoroughly reengineered too, in addition to the solutions of ANSI, ISO, and August Dvorak?

> It may suffice to have a logical layout, with letters in alphabetical order. The traditional layouts were designed for speed typing on physical typewriters, specifically with fixed finger positioning, so as not to have to look at the keyboard while typing.

This is an important point: not having to look at the keyboard. Even with alphabetical order, one *must* learn typing. Often suggested for computers, the alphabetical order is also often rejected, because it needs much more finger movement than its counterpart, the ergonomic order as proposed by August Dvorak, and very actively promoted in a French version by the association ERGODIS [http://bepo.fr/].

> Speed typing is not so important these days, as it is mostly for secretaries who write down material in another format. And the computer keyboard does not have the physical limitations of mechanical typewriters.

Yes for the hardware, but no for the need for speed typing. In the past, secretaries were almost the only people doing typewriting. Today, more and more managers do their own mailing themselves, without dictating to a secretary, while their employee manages much more than writing (as was already the case back then). Personally I don't want to look at my keyboard when typing text, and neither do you nor anybody else.

> It is also considerably faster with moving finger positioning, which can be done if one does not have to look at some text while typing.

I don't quite understand how to speed up with moving fingers, except towards the dedicated keys (the little fingers having many more of these), with the thumbs acting as the central modifiers, if any, and/or the central Compose key, in addition to the space bar. Central means on the Alt keys. Alt itself at this favorite position is counter-productive; it should be moved to Left Windows, and Left Windows to Apps (Menu), which not every netbook manufacturer keeps. If it is missing, then use the mouse/trackpad.

I believe that at this juncture of imminent climate change and global destruction, we should stick with the existing hardware. For France, I am not going to propose a completely *new* layout either; I will bring something you can use by simply thinking about the small set of useful modifications, even without needing keyboard stickers. A reuse-what-you've-got concept.

Best,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150716/014e9215/attachment.html>

From wjgo_10009 at btinternet.com  Thu Jul 16 06:13:29 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 16 Jul 2015 12:13:29 +0100 (BST)
Subject: Input methods at the age of Unicode
In-Reply-To: <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
Message-ID: <5596365.26347.1437045209046.JavaMail.defaultUser@defaultHost>

Hi
I do not know if it is of interest, but some time ago I produced some pdf files that can each be used as a typecase so as to copy a character from the pdf, then paste into a Unicode-aware wordprocessor or desktop publishing program and then formatted to the desired font and font size.
The following might be of particular interest.
http://www.users.globalnet.co.uk/~ngo/typecase_accented_characters_for_Latvian.pdf
http://www.users.globalnet.co.uk/~ngo/typecase_esperanto.pdf
http://www.users.globalnet.co.uk/~ngo/typecase_hot_beverage.pdf
http://www.users.globalnet.co.uk/~ngo/typecase_maltese.pdf
http://www.users.globalnet.co.uk/~ngo/typecase_quotation_marks.pdf
http://www.users.globalnet.co.uk/~ngo/typecase_spaces.pdf
http://www.users.globalnet.co.uk/~ngo/typecase_welsh_accented_characters.pdf
These and some others are linked from the following web page.
http://www.users.globalnet.co.uk/~ngo/outlinks.htm
That page is linked from another web page.
http://www.users.globalnet.co.uk/~ngo/library.htm
Best regards,
William Overington
16 July 2015
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150716/9104f766/attachment.html>

From charupdate at orange.fr  Thu Jul 16 10:49:45 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 16 Jul 2015 17:49:45 +0200 (CEST)
Subject: Input methods at the age of Unicode
Message-ID: <1804867121.15811.1437061785195.JavaMail.www@wwinf1n11>

On 16 Jul 2015, at 13:12, William_J_G Overington  wrote:

> Hi

> I do not know if it is of interest, but some time ago I produced some pdf files that can each be used as a typecase so as to copy a character from the pdf, then paste into a Unicode-aware wordprocessor or desktop publishing program and then formatted to the desired font and font size.

This is a nice piece of work. If you use these characters very often, a solution using a Compose tree may be interesting too. It allows you to type a sequence of characters available on the keyboard to obtain precomposed characters, punctuation and symbols. I'll insert some suggestions in between, and I'm curious to know if you like them.

> The following might be of particular interest.

> http://www.users.globalnet.co.uk/~ngo/typecase_accented_characters_for_Latvian.pdf

To input a letter with macron, it is common to type 'Compose, _' and then the letter. For hacek, there is 'Compose, v' or 'Compose, <', but the latter is taken for "subscript", so I prefer 'v' and 'V'. You can find 'Compose, c' because of the ISO name of this diacritic (caron), which was adopted at the merger (Unicode called it HACEK, which is the true name). So it is better to choose 'v', a mnemonic derived from the shape. For comma below, take 'Compose, <, Comma', and for turned comma above, 'Compose, >, #, Comma' (I'm not quite sure, because I have not yet implemented these). But in fact, AFAIK the turned comma above is a preferred glyphic variant of the hacek on the g.
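One way to realise such mnemonics without listing every letter by hand is to map each mnemonic to a combining mark and let Unicode normalization (NFC) find the precomposed letter. The Python sketch below assumes a hypothetical one-key table; the ',' entry is a simplification of the 'Compose, <, Comma' sequence above. Note that Unicode names the Latvian letters with comma below (and turned comma above on g) as "WITH CEDILLA", e.g. U+0137 LATIN SMALL LETTER K WITH CEDILLA, so the cedilla is the mark that composes to them.

```python
import unicodedata

# Hypothetical mnemonic table: the key typed after Compose -> combining mark.
MNEMONICS = {
    "_": "\u0304",  # COMBINING MACRON   (a -> a with macron)
    "v": "\u030C",  # COMBINING CARON    (s -> s with caron), i.e. the hacek
    ",": "\u0327",  # COMBINING CEDILLA  (k -> k with cedilla); hypothetical shortcut
}


def compose_diacritic(mnemonic: str, base: str) -> str:
    """Return the precomposed letter for Compose, <mnemonic>, <base>."""
    composed = unicodedata.normalize("NFC", base + MNEMONICS[mnemonic])
    if len(composed) != 1:
        # Unicode has no precomposed character for this pair.
        raise ValueError(f"no precomposed form for {base!r} with {mnemonic!r}")
    return composed


# The Latvian letters, built from their base letters.
latvian = (
    [compose_diacritic("_", c) for c in "aeiu"]
    + [compose_diacritic("v", c) for c in "csz"]
    + [compose_diacritic(",", c) for c in "gkln"]
)
print(" ".join(latvian))
```

The design choice is that the table stays three entries long; uppercase letters and other languages using the same diacritics come for free.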

> http://www.users.globalnet.co.uk/~ngo/typecase_esperanto.pdf

These are easy, you need 'Compose, ^' and 'Compose, v'.

> http://www.users.globalnet.co.uk/~ngo/typecase_hot_beverage.pdf

This may be obtained by typing 'Compose, h, o, t' or 'Compose, h, b'.

> http://www.users.globalnet.co.uk/~ngo/typecase_maltese.pdf

For dot above, it is usually 'Compose, Full stop'; and the Latin letter h with stroke is 'Compose, -, h'.

> http://www.users.globalnet.co.uk/~ngo/typecase_quotation_marks.pdf

You may type 'Compose, Grave' as a grave accent dead key, then go on with 'Apostrophe' or 'Quotation mark' for single or double opening quotation marks. Or 'Compose, Apostrophe' for the acute, then likewise for the closing ones. That matches old ASCII practice, hence the mnemonics. For the low ones, type 'Compose, <', and for the reversed ones, 'Compose, \'.

> http://www.users.globalnet.co.uk/~ngo/typecase_spaces.pdf

There is an ultra-performative way to get *all* Unicode spaces (perhaps without the two doubles) with 'Compose, Space' and then any mnemonic letter, digit (1; 2; 3; 4; 6), and even < or > for the unpaired directional marks (very useful to correct the display when RTL characters are used in an LTR context and vice versa).
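As an illustration, here is what the second level of such a 'Compose, Space' branch might contain. The assignments are hypothetical: the digits follow the "n per em" naming of the Unicode fixed-width spaces, and the choice of > for LEFT-TO-RIGHT MARK and < for RIGHT-TO-LEFT MARK is an assumption, not something stated above.

```python
import unicodedata

# Hypothetical second-key table for "Compose, Space, <key>".
SPACE_TREE = {
    "1": "\u2003",  # EM SPACE (one per em)
    "2": "\u2002",  # EN SPACE (half an em)
    "3": "\u2004",  # THREE-PER-EM SPACE
    "4": "\u2005",  # FOUR-PER-EM SPACE
    "6": "\u2006",  # SIX-PER-EM SPACE
    "n": "\u00A0",  # NO-BREAK SPACE
    "t": "\u2009",  # THIN SPACE
    ">": "\u200E",  # LEFT-TO-RIGHT MARK (assumed mnemonic)
    "<": "\u200F",  # RIGHT-TO-LEFT MARK (assumed mnemonic)
}

for key, ch in SPACE_TREE.items():
    print(f"Compose, Space, {key} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Printing the names is a cheap check that each mnemonic points at the intended character, since most of these spaces are indistinguishable on screen.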

> http://www.users.globalnet.co.uk/~ngo/typecase_welsh_accented_characters.pdf

For the letters with diaeresis one can use the usual 'Compose, "', or the alternative 'Compose, :'. The latter helps disambiguate the use of quotation marks, because 'Compose, Apostrophe, Quotation mark' is already used for the closing double quote, so "diaeresis and acute" may interfere. For acute, grave, and circumflex, we use 'Compose, '/`/^'. (Alternatively, if the apostrophe risks interfering, one can use the vertical bar instead, a solution that should have been implemented on the US International keyboard to prevent it from mixing up the apostrophe, single quotes, and the acute dead key. Instead of the quotation mark for diaeresis, IMO one could have chosen the number sign or some other less frequently used character. I know that ASCII used ' and " after Backspace to add diacritics to letters, hence the choice of the dead keys on the US International.) 

> These and some others are linked from the following web page.
> http://www.users.globalnet.co.uk/~ngo/outlinks.htm
> That page is linked from another web page.
> http://www.users.globalnet.co.uk/~ngo/library.htm

I'm confident in extrapolating that for each of the other PDF typecases there will be Compose solutions too.
To implement a two-character Compose sequence, program the following:
DEADTRANS(first character, compose, first character, 0x0001),
DEADTRANS(second character, first character, target character, 0x0000)
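To make the two DEADTRANS lines easier to follow, here is a small Python simulation of the chained lookup. It assumes the Windows kbd.h convention as I understand it: each entry maps (typed character, current dead state) to (result, flags), and flag 0x0001 (DKF_DEAD) means the result is another dead state rather than a final character. The Compose placeholder character and the fallback for undefined sequences are simplifications, not the real driver behaviour.

```python
DKF_DEAD = 0x0001    # flag: the result is a further dead state
COMPOSE = "\u2384"   # placeholder character standing in for the Compose dead key

# (typed character, current dead state) -> (result, flags)
DEADTRANS = {
    ("_", COMPOSE): ("_", DKF_DEAD),   # Compose, _  -> new dead state "_"
    ("a", "_"): ("\u0101", 0x0000),    # then a -> LATIN SMALL LETTER A WITH MACRON
    ("e", "_"): ("\u0113", 0x0000),    # then e -> LATIN SMALL LETTER E WITH MACRON
}


def type_keys(keys: str) -> str:
    """Return the text produced by a sequence of keystrokes."""
    out, state = [], None
    for ch in keys:
        if state is None:
            if ch == COMPOSE:
                state = COMPOSE        # Compose itself is a dead key
            else:
                out.append(ch)         # ordinary key
            continue
        result = DEADTRANS.get((ch, state))
        if result is None:
            out.append(ch)             # undefined sequence: emit key as typed (simplified)
            state = None
        elif result[1] & DKF_DEAD:
            state = result[0]          # chain to the next dead state
        else:
            out.append(result[0])      # final character
            state = None
    return "".join(out)
```

The first table entry corresponds to the first DEADTRANS line above and the other two to the second line, one per target character.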

Best, 

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150716/0a26a74c/attachment.html>

From haberg-1 at telia.com  Thu Jul 16 11:21:59 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Thu, 16 Jul 2015 18:21:59 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
Message-ID: <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com>


> On 16 Jul 2015, at 16:44, Marcel Schneider <charupdate at orange.fr> wrote:
> 
> On 16 Jul 2015, at 13:21, Hans Aberg <haberg-1 at telia.com> wrote:

> Knowing nothing about it, I mixed up ConTeXt, which you referred to, and ConTEXT, and ended up downloading and installing a new text editor. At least this time it is very useful to me, as ConTEXT will replace Gedit for me, because ConTEXT correctly handles the Kana shift states (about half of my keyboard layout). However, as it is new, support for characters like U+2610, or simply precomposed letters with macron or double acute, is not yet ensured. When I have some time left I'll write to them, because the project is very promising.

One needs a good UTF-8 text editor as well.

> > It is simplest to just download the whole Tex Live:
> > https://www.tug.org/texlive/
> > There is special package for OS X.
> 
> Unfortunately I have no OS X machine, at home or elsewhere, nor do I have Linux at home. Where I use Ubuntu, I cannot install this. I'll check whether there is a Windows version, but it seems to move me away from my urgent goal, so it will be for a bit later.

The link above has an entry for that, too.

> > Though large, the main distribution lives in a single directory, so it is easy to throw away.
> 
> Nor would I throw this software away, if I could install it.

It is updated yearly, and there is usually no need to keep the old version, but one can: they end up in different directories.

> > There is a ConTeXt users list <http://www.ntg.nl/mailman/listinfo/ntg-context>, as well as support pages <http://wiki.contextgarden.net/>
> 
> I'll save, thank you.

It is hard to figure out from the documentation, so it might be better to ask there.


From eliz at gnu.org  Thu Jul 16 11:33:34 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Thu, 16 Jul 2015 19:33:34 +0300
Subject: Input methods at the age of Unicode
In-Reply-To: <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com>
Message-ID: <838uag6p0h.fsf@gnu.org>

> From: Hans Aberg <haberg-1 at telia.com>
> Date: Thu, 16 Jul 2015 18:21:59 +0200
> Cc: Unicode Mailing List <unicode at unicode.org>
> 
> One needs a good UTF-8 text editor as well.

Emacs is one possibility, of course.

From haberg-1 at telia.com  Thu Jul 16 11:35:49 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Thu, 16 Jul 2015 18:35:49 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
Message-ID: <A2B794E6-03BA-4AB4-A9C4-5570D2C21F8A@telia.com>


> On 16 Jul 2015, at 16:44, Marcel Schneider <charupdate at orange.fr> wrote:

> On 16 Jul 2015, at 15:20, Hans Aberg <haberg-1 at telia.com> wrote:

> > A logical layout, with the letters in alphabetical order, may suffice. The traditional layouts were designed for speed typing on physical typewriters, specifically with fixed finger positioning, so that one does not have to look at the keyboard while typing.
> 
> This is an important point: not looking at the keyboard. Even with alphabetical order, one *must* learn typing. Often suggested for computers, the alphabetical order is also often rejected, because it requires much more finger movement than its counterpart, the ergonomic order as proposed by August Dvorak, and very actively promoted in a French version by the association ERGODIS [http://bepo.fr/].

It depends on the objective. Languages may have a number of layouts, which may be efficient for just that.

But if one would want to have a single layout for the Latin scripts, it would be hard to have special letter orders.

> > It is also considerably faster with moving finger positioning, which can be done if one does not have to look at some text while typing.
> 
> I don't quite understand how to speed up with moving fingers, except towards the dedicated keys: the little fingers have many more of these, and the thumbs act as the central modifiers if any, and/or the central Compose key, in addition to the space bar. Central means on the Alt keys. Alt itself at this favorite position is counter-productive; it should be moved onto Left Windows, and Left Windows onto Apps (Menu), a key that some netbook manufacturers omit. If it is missing, then use the mouse/trackpad.

It is used on music keyboards. For example, one can use more than one finger on the same key if it should be pressed rapidly in succession. If the hand needs to move, one shifts the fingers, which will avoid the stretching that would occur in fixed hand positioning.

> I believe that at this juncture of imminent climate change and global destruction, we should stick with the existing hardware. For France, I am not going to propose a completely *new* layout either; I will bring something you can use by simply thinking of the small set of useful modifications, even without needing keyboard stickers. A reuse-what-you've-got concept.

There are physical keyboards with displays on the keys that can be changed, e.g., [1], thus able to display different key layouts, but currently they are expensive, and the keys require more force when depressed.

1. http://www.artlebedev.com/everything/optimus/


From haberg-1 at telia.com  Thu Jul 16 11:36:39 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Thu, 16 Jul 2015 18:36:39 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <838uag6p0h.fsf@gnu.org>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org>
Message-ID: <7A8C61A0-C4AC-4D08-BA1D-BF46850D0BB8@telia.com>


> On 16 Jul 2015, at 18:33, Eli Zaretskii <eliz at gnu.org> wrote:
> 
>> From: Hans Aberg <haberg-1 at telia.com>
>> Date: Thu, 16 Jul 2015 18:21:59 +0200
>> Cc: Unicode Mailing List <unicode at unicode.org>
>> 
>> One needs a good UTF-8 text editor as well.
> 
> Emacs is one possibility, of course.

And on OS X, Xcode has a good text editor as well.


From richard.wordingham at ntlworld.com  Thu Jul 16 17:59:24 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 16 Jul 2015 23:59:24 +0100
Subject: Input methods at the age of Unicode
In-Reply-To: <838uag6p0h.fsf@gnu.org>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com>
 <838uag6p0h.fsf@gnu.org>
Message-ID: <20150716235924.2dfc406b@JRWUBU2>

On Thu, 16 Jul 2015 19:33:34 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > One needs a good UTF-8 text editor as well.

> Emacs is one possibility, of course.

If you're prepared to cut and paste, it's easy to extend its own
keyboards.  (Creating the first one was a bit stressful - the ones
that come with Emacs were almost all set up using ISO-2022, before
Emacs adopted Unicode.)

Richard.

From jsbien at mimuw.edu.pl  Thu Jul 16 22:41:11 2015
From: jsbien at mimuw.edu.pl (Janusz S. Bien)
Date: Fri, 17 Jul 2015 05:41:11 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <20150716235924.2dfc406b@JRWUBU2>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org>
 <20150716235924.2dfc406b@JRWUBU2>
Message-ID: <20150717054111.129625mbnpjlz5uf@mail.mimuw.edu.pl>

Quote/Cytat - Richard Wordingham <richard.wordingham at ntlworld.com>  
(Fri 17 Jul 2015 12:59:24 AM CEST):

> On Thu, 16 Jul 2015 19:33:34 +0300
> Eli Zaretskii <eliz at gnu.org> wrote:
>
>> > One needs a good UTF-8 text editor as well.
>
>> Emacs is one possibility, of course.
>
> If you're prepared to cut and paste,

Why is it relevant?

> it's easy to extend its own
> keyboards.  (Creating the first one was a bit stressful

It is not clear to me what you mean by "own keyboards".

> - the ones
> that come with Emacs were almost all set up using ISO-2022, before
> Emacs adopted Unicode.)

In my opinion creating a new Emacs input method is extremely easy, and I
solve my problems by modifying 'polish-slash'.

You can associate an input method with a file by using an appropriate
Emacs local variable.

Best regards

Janusz


-- 
Prof. dr hab. Janusz S. Bień -  Uniwersytet Warszawski (Katedra
Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


From richard.wordingham at ntlworld.com  Fri Jul 17 01:39:57 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 17 Jul 2015 07:39:57 +0100
Subject: Input methods at the age of Unicode
In-Reply-To: <20150717054111.129625mbnpjlz5uf@mail.mimuw.edu.pl>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com>
 <838uag6p0h.fsf@gnu.org> <20150716235924.2dfc406b@JRWUBU2>
 <20150717054111.129625mbnpjlz5uf@mail.mimuw.edu.pl>
Message-ID: <20150717073957.1290cd32@JRWUBU2>

On Fri, 17 Jul 2015 05:41:11 +0200
"Janusz S. Bien" <jsbien at mimuw.edu.pl> wrote:

> Quote/Cytat - Richard Wordingham <richard.wordingham at ntlworld.com>  
> (Fri 17 Jul 2015 12:59:24 AM CEST):

Perhaps I'm missing a trick.  My conception was that to use an Emacs
keyboard for, say, word processor input, one would have to type into
an Emacs buffer and then copy the text to the word processor
application.

> > it's easy to extend its own
> > keyboards.  (Creating the first one was a bit stressful
 
> It is not clear to me what you mean by "own keyboards".

Except possibly for Windows (last time I looked into it, Emacs there was
built as an ANSI application rather than as a Unicode application),
Emacs can use the user-specified system keyboards (and general-purpose
user keyboards) as well as the Emacs-specific keyboards.  By "own
keyboards" I meant the ones defined for Emacs, specifically the ones
set up by quail-define-package and quail-define-rules.

There was a period when, due to an external error, Emacs launched with
an English locale couldn't use keyboards made available by ibus.

> - the ones
> > that come with Emacs were almost all set up using ISO-2022, before
> > Emacs adopted Unicode.)

> In my opinion creating a new Emacs input method is extremely easy, and
> I solve my problems by modifying 'polish-slash'.

I see latin-pre.el and latin-post.el in particular are now defined in
UTF-8, which simplifies adaptation.  My exemplar was thai.el, which
at the time was in ISO-2022.

> You can associate an input method with a file by using an appropriate
> Emacs local variable.

Another example of the first keyboard being difficult and the rest
easy.  Once one starts using that trick it is easy to modify it for
other keyboards.

Richard.

From eliz at gnu.org  Fri Jul 17 01:57:46 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Fri, 17 Jul 2015 09:57:46 +0300
Subject: Input methods at the age of Unicode
In-Reply-To: <20150716235924.2dfc406b@JRWUBU2>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org>
 <20150716235924.2dfc406b@JRWUBU2>
Message-ID: <831tg76zkl.fsf@gnu.org>

> Date: Thu, 16 Jul 2015 23:59:24 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> On Thu, 16 Jul 2015 19:33:34 +0300
> Eli Zaretskii <eliz at gnu.org> wrote:
> 
> > > One needs a good UTF-8 text editor as well.
> 
> > Emacs is one possibility, of course.
> 
> If you're prepared to cut and paste, it's easy to extend its own
> keyboards.

FWIW, I do that a lot, because the number of convenient input methods
in Emacs far outnumbers what I have on MS-Windows.  For example, if I
have to type Russian with no Russian keyboard available, the
cyrillic-translit input method is a life savior.

From marc at keyman.com  Fri Jul 17 03:01:46 2015
From: marc at keyman.com (Marc Durdin)
Date: Fri, 17 Jul 2015 08:01:46 +0000
Subject: Input methods at the age of Unicode
In-Reply-To: <831tg76zkl.fsf@gnu.org>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org>
 <20150716235924.2dfc406b@JRWUBU2>,<831tg76zkl.fsf@gnu.org>
Message-ID: <D2057BA2-75F0-4328-AAEB-4B078A448875@keyman.com>

On Windows, you can always use Keyman and Keyman Developer to create very flexible input methods that work across pretty much any app, FWIW. Both of these are available free these days at least in basic editions (www.keyman.com/desktop<http://www.keyman.com/desktop> and www.keyman.com/developer<http://www.keyman.com/developer>). Just providing another alternative.

Marc

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Eli Zaretskii
Sent: Friday, 17 July 2015 4:58 PM
To: Richard Wordingham
Cc: unicode at unicode.org<mailto:unicode at unicode.org>
Subject: Re: Input methods at the age of Unicode

Date: Thu, 16 Jul 2015 23:59:24 +0100
From: Richard Wordingham <richard.wordingham at ntlworld.com<mailto:richard.wordingham at ntlworld.com>>

On Thu, 16 Jul 2015 19:33:34 +0300
Eli Zaretskii <eliz at gnu.org<mailto:eliz at gnu.org>> wrote:

One needs a good UTF-8 text editor as well.

Emacs is one possibility, of course.

If you're prepared to cut and paste, it's easy to extend its own
keyboards.

FWIW, I do that a lot, because the number of convenient input methods in Emacs far outnumbers what I have on MS-Windows.  For example, if I have to type Russian with no Russian keyboard available, the cyrillic-translit input method is a life savior.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150717/b3bd1bfd/attachment.html>

From eliz at gnu.org  Fri Jul 17 03:28:10 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Fri, 17 Jul 2015 11:28:10 +0300
Subject: Input methods at the age of Unicode
In-Reply-To: <D2057BA2-75F0-4328-AAEB-4B078A448875@keyman.com>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org>
 <20150716235924.2dfc406b@JRWUBU2> <831tg76zkl.fsf@gnu.org>
 <D2057BA2-75F0-4328-AAEB-4B078A448875@keyman.com>
Message-ID: <83zj2v5gth.fsf@gnu.org>

> From: Marc Durdin <marc at keyman.com>
> CC: Richard Wordingham <richard.wordingham at ntlworld.com>,
> 	"unicode at unicode.org" <unicode at unicode.org>
> Date: Fri, 17 Jul 2015 08:01:46 +0000
> 
> On Windows, you can always use Keyman and Keyman Developer to create very
> flexible input methods that work across pretty much any app, FWIW. Both of
> these are available free these days at least in basic editions
> (www.keyman.com/desktop and www.keyman.com/developer). Just providing another
> alternative.

I'm surprised there isn't such an input method already.  I think it's
available only with some East Asian language packs, or something.

From charupdate at orange.fr  Fri Jul 17 04:33:12 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 17 Jul 2015 11:33:12 +0200 (CEST)
Subject: Input methods at the age of Unicode
In-Reply-To: <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com>
References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12>
 <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com>
 <784036309.5510.1437035352057.JavaMail.www@wwinf1f21>
 <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com>
 <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21>
 <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com>
 <748934299.14990.1437057860253.JavaMail.www@wwinf1e21>
 <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com>
Message-ID: <1139160524.7966.1437125592262.JavaMail.www@wwinf1f21>

On 16 Jul 2015, at 18:22, Hans Aberg  wrote:

> One needs a good UTF-8 text editor as well.

ConTEXT displays "UTF-8" in the status bar. I'm pretty confident that it has the potential of becoming the world's best text editor. It's not yet 1.0, still 0.98.6, and many users are already enthusiastic.

> The link above has an entry for that, too.

Thank you. I just can't work with TeX right now; I know it needs some skill.

> It is updated yearly, and there is usually no need to keep the old, but one can - they end up different directories.

> It hard to figure out from the documentation, so it might be better to ask there.

Thank you.


On 16 Jul 2015, at 18:35, Hans Aberg  wrote:

> It depends on the objective. Languages may have a number of layouts, which may be efficient for just that.
> But if one would want to have a single layout for the Latin scripts, it would be hard to have special letter orders.

My goal is not a single Latin layout, just a universal Latin one adapted to each locale: now French for France, then fr-BE, de-..., en-... and so on, implementing the same principles across locales.

> It is used on music keyboards. For example, one can use more than one finger on the same key if it should be pressed rapidly in succession. If the hand needs to move, one shifts the fingers, which will avoid the stretching that would occur in fixed hand positioning.

I have little idea of music keyboards, as I primarily learned other instruments, but AFAIK the keystroke dynamics are quite different from those of a conventional computer keyboard, be it ergonomic or standard.

> There are physical keyboards with displays on the keys that can be changed, e.g., [1], thus able to display different key layouts, but currently they are expensive, and the keys require more force when depressed.
> 1. http://www.artlebedev.com/everything/optimus/

I think that is an idea for users who have to toggle between a lot of locales and have no time to learn them all. Very heavy, very much technology. Alternatively, an on-screen keyboard with visual real-time feedback may give the same effect without looking at the fingers on the keycaps. This is much cheaper, as HD screens are already available where needed (I have none, nor do I need any).


On 16 Jul 2015, at 18:36, Hans Aberg  wrote:

>> On 16 Jul 2015, at 18:33, Eli Zaretskii  wrote:
>>
>>> From: Hans Aberg 
>>> Date: Thu, 16 Jul 2015 18:21:59 +0200
>>> Cc: Unicode Mailing List 
>>>
>>> One needs a good UTF-8 text editor as well.
>>
>> Emacs is one possibility, of course.

Almost everybody, including me, has heard of Emacs and that it is very hard to use.

> And on OS X, Xcode has a good text editor as well.

And on Xfce we have MousePad. Now I'll try Notepad++, which reduces the environmental impact of text editing.

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150717/df7ffeb2/attachment.html>

From doug at ewellic.org  Fri Jul 17 09:31:37 2015
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 17 Jul 2015 07:31:37 -0700
Subject: Keyman Developer for
 =?UTF-8?Q?free=3F=20=28was=3A=20Re=3A=20Input=20meth?=
 =?UTF-8?Q?ods=20at=20the=20age=20of=20Unicode=29?=
Message-ID: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net>

Marc Durdin <marc at keyman dot com> wrote:

> On Windows, you can always use Keyman and Keyman Developer to create
> very flexible input methods that work across pretty much any app,
> FWIW. Both of these are available free these days at least in basic
> editions (www.keyman.com/desktop and www.keyman.com/developer). Just
> providing another alternative.

Can you provide a specific link to a freely available version? I hadn't
heard before that there was such a thing, and the links above don't say
anything about free. Limited-time evaluation versions don't count, of
course.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From doug at ewellic.org  Fri Jul 17 09:36:51 2015
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 17 Jul 2015 07:36:51 -0700
Subject: Keyman Developer for
 =?UTF-8?Q?free=3F=20=28was=3A=20Re=3A=20Input=20?=
 =?UTF-8?Q?methods=20at=20the=20age=20of=20Unicode=29?=
Message-ID: <20150717073651.665a7a7059d7ee80bb4d670165c8327d.faa2fb2b36.wbe@email03.secureserver.net>

I wrote:
 
>> (www.keyman.com/desktop and www.keyman.com/developer)
>
> the links above don't say anything about free

s/links/link/

The first link does offer a free version of Desktop, but that's for end
users only. Creating a keyboard requires Developer.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From charupdate at orange.fr  Fri Jul 17 10:26:26 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 17 Jul 2015 17:26:26 +0200 (CEST)
Subject: Input methods at the age of Unicode
Message-ID: <1146739442.16626.1437146786363.JavaMail.www@wwinf1k33>

On 30 Jun 2015, at 23:28, Doug Ewell  wrote:

> This works on the built-in Notepad as well as Notepad++ and BabelPad

Notepad++ is great software. It supports Kana shift states and all of Unicode, as far as I can infer from what I've tested.
The bit on process garbage found on the homepage might target other text editors that would then not be streamlined for efficiency, I suppose.

As a text editor, I recommend Notepad++.

Thank you for this information.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150717/6c857324/attachment.html>

From marc at keyman.com  Fri Jul 17 17:55:27 2015
From: marc at keyman.com (Marc Durdin)
Date: Fri, 17 Jul 2015 22:55:27 +0000
Subject: Keyman Developer for free? (was: Re: Input methods at the age
 of Unicode)
In-Reply-To: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net>
References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net>
Message-ID: <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com>


> On 18 Jul 2015, at 12:32 am, Doug Ewell <doug at ewellic.org> wrote:
> 
> Marc Durdin <marc at keyman dot com> wrote:
> 
>> On Windows, you can always use Keyman and Keyman Developer to create
>> very flexible input methods that work across pretty much any app,
>> FWIW. Both of these are available free these days at least in basic
>> editions (www.keyman.com/desktop and www.keyman.com/developer). Just
>> providing another alternative.
> 
> Can you provide a specific link to a freely available version? I hadn't
> heard before that there was such a thing, and the links above don't say
> anything about free. Limited-time evaluation versions don't count, of
> course.
> 
http://tavultesoft.com/beta has the free download of Developer 9. The beta has the license key requirement but you can obtain a free perpetual license key on that page as well. 

While Keyman Developer version 9 is still in beta, it is stable, and we are finalising the documentation and a few loose ends. The release version will continue to be free.

Version 9 includes support for building keyboards for Windows, web, mobile web, iOS and Android, with Mac OS X coming shortly. The web and mobile web versions run with KeymanWeb 2.0 which is open source at http://www.keyman.com/developer/keymanweb. Keyman apps for mobile platforms can be found at keyman.com as well.

Sorry if this sounds a bit like a commercial, but I wanted to clear up some uncertainty about where Keyman is today.

Marc

From charupdate at orange.fr  Sat Jul 18 09:33:23 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST)
Subject: Input methods at the age of Unicode
Message-ID: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>

On 17 Jul 2015, at 09:57:46 +0300, Eli Zaretskii wrote:

> FWIW, I do that a lot, because the number of convenient input methods
> in Emacs far outnumbers what I have on MS-Windows. For example, if I
> have to type Russian with no Russian keyboard available, the
> cyrillic-translit input method is a life savior. 

You might also wish to use the Windows on-screen keyboard, which lets you see exactly what is on each key while typing on any physical keyboard, without any need for the keycap labels to match the layout. This on-screen keyboard is built in; only, it does not support Kana shift states.
Likewise, Windows came to me with all that is needed to type ?? ???? ?? ? ?????, so I can't really believe that users need Emacs as a savior.

When process garbage is an environmental issue, one might consider that our real savior is Notepad++, thanks to its energy-saving algorithms. Indeed, I do not think that we should get supplemental input facilities at any price. This is also why the goal should be to pack a reasonably large subset of Unicode into the very core of the keyboard driver of every locale, and make it accessible right there with a Compose tree. Every time we open charmap dialogs, or even go on the internet to pick a character, we're consuming some energy, and if it's a routine task that could be done with a memorized Compose sequence, that energy is wasted. I don't know whether it's a real issue, but I'm inclined to believe it is.

Of course we need some software as a savior, but this software is consequently called Zotero and helps us save and manage our research results (“Search, not re-search!” https://www.zotero.org).

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150718/c0e703da/attachment.html>

From charupdate at orange.fr  Sat Jul 18 09:47:09 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 18 Jul 2015 16:47:09 +0200 (CEST)
Subject: Keyman Developer for free? (was: Re: Input methods at the age
 of Unicode)
In-Reply-To: <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com>
References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net>
 <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com>
Message-ID: <279207082.12069.1437230829393.JavaMail.www@wwinf1k02>

On 18 Jul 2015, at 00:55:27, Marc Durdin  wrote:

> http://tavultesoft.com/beta has the free download of Developer 9. The beta has the license key requirement but you can obtain a free perpetual license key on that page as well. 

> While Keyman Developer 9 is version still in beta, it is stable and we are finalising the documentation and a few loose ends. The release version will continue to be free.

> Version 9 includes support for building keyboards for Windows, web, mobile web, iOS and Android, with Mac OS X coming shortly. The web and mobile web versions run with KeymanWeb 2.0 which is open source at http://www.keyman.com/developer/keymanweb. Keyman apps for mobile platforms can be found at keyman.com as well.

Faced with this very elaborate keyboard mapping solution, which I knew nothing about, I'm very astonished. If it helps make the missing layouts available and, incidentally, improve a number of Windows keyboard layouts where I found some oddities, I welcome it and am considering trying it.
In the meantime however, I would ask a couple of questions:

1. Does Keyman allow placing a Kana toggle? This feature, available at least on Windows, is useful for locales like Czech and French that use so many precomposed characters that the upper row is filled with them to some extent. When the Kana toggle is on, the digits are in the Base (Kana) level there. The preferred place for this toggle is E00 (ISO 9995-1).

2. Does Keyman support extended Compose trees? An extended Compose tree allows “Compose” to be used as part of Compose sequences. In fact, “Compose” can convert *any* key on the keyboard into a dead key, including the Compose key itself (even though it is already a dead key). This makes sequences more user-friendly. For example, the háček dead key may be “Compose, v”, while ʒ may be “Compose, z, h”. With an extended Compose tree, users may input ǯ by typing “Compose, v, Compose, z, h”. Otherwise it must be typed “Compose, z, v, h”, because “Compose, v, z” is already ž. With “Compose” operated by the right thumb, the first option may be appealing: one keystroke more, but one memorization less. However, I know that the second order matches the principle of double combining marks as stated in TUS §7.9. It would be interesting to know user preferences about these Compose sequences, as implementing both is needless if one is disliked.

3. Does Keyman propose a spreadsheet-like UI? The use of spreadsheets for keyboard layout programming helps streamlining the development process.

4. Are Keyman layouts programmable in C? Windows drivers (at least, as I know little about other OSes) are. The syntax of C and C++ allows developers to use spreadsheets, from where allocation tables, deadtrans lists, and ligatures tables (that is, in keyboard driver language, Unicode character [WCHAR] sequences tables) are copied and pasted into the source.

5. Does Keyman allow such ligatures (sequences) to be accessed by dead keys? On Windows I don't see this possibility, and I never knew how to program it. But Unicode recommends that implementations provide this facility.
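The extended Compose tree described in question 2 can be modelled as a small recursive evaluator: when a Compose sequence yields a dead key, the next "character" is read recursively, and it may itself be a complete Compose sequence. The C sketch below is one possible model under that assumption, not how Keyman or Windows actually implement Compose. KEY_COMPOSE and DEAD_CARON are hypothetical codes, and the tables hold only the háček/ezh example from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical codes: the Compose key itself and the caron dead key. */
#define KEY_COMPOSE 0xFFFFu
#define DEAD_CARON  0xFFFEu

/* Compose sequences (ASCII keys typed after Compose) and their results.
   The table must be prefix-free: no sequence may start another one. */
static const struct { const char *keys; uint32_t result; } sequences[] = {
    { "v",  DEAD_CARON }, /* Compose, v    -> caron (hacek) dead key */
    { "zh", 0x0292 },     /* Compose, z, h -> U+0292 LATIN SMALL LETTER EZH */
};

/* What a pending dead key does to the following character. */
static const struct { uint32_t dead, base, result; } dead_table[] = {
    { DEAD_CARON, 'z',    0x017E }, /* z with caron */
    { DEAD_CARON, 0x0292, 0x01EF }, /* ezh with caron */
};

static uint32_t apply_dead(uint32_t dead, uint32_t base) {
    for (size_t i = 0; i < sizeof dead_table / sizeof dead_table[0]; i++)
        if (dead_table[i].dead == dead && dead_table[i].base == base)
            return dead_table[i].result;
    return 0;
}

/* 1 if buf equals a sequence (*result set), -1 if it is only a prefix, 0 otherwise. */
static int match_sequence(const char *buf, uint32_t *result) {
    int prefix = 0;
    size_t len = strlen(buf);
    for (size_t i = 0; i < sizeof sequences / sizeof sequences[0]; i++) {
        if (strcmp(sequences[i].keys, buf) == 0) {
            *result = sequences[i].result;
            return 1;
        }
        if (strncmp(sequences[i].keys, buf, len) == 0)
            prefix = -1;
    }
    return prefix;
}

/* Reads one output character starting at keys[*pos] and advances *pos.
   A plain key stands for itself; Compose starts a sequence; a sequence
   yielding a dead key recursively reads the next character, which may
   itself begin with Compose (the "extended" part). Returns 0 on failure. */
static uint32_t read_char(const uint32_t *keys, size_t n, size_t *pos) {
    assert(keys != NULL && pos != NULL);
    if (*pos >= n)
        return 0;
    if (keys[*pos] != KEY_COMPOSE)
        return keys[(*pos)++];
    (*pos)++; /* consume Compose */
    char buf[8];
    size_t len = 0;
    while (*pos < n && len + 1 < sizeof buf) {
        uint32_t k = keys[(*pos)++];
        if (k > 0x7F)
            return 0; /* sketch: sequences are spelled with ASCII keys only */
        buf[len++] = (char)k;
        buf[len] = '\0';
        uint32_t r = 0;
        int m = match_sequence(buf, &r);
        if (m == 0)
            return 0; /* neither a sequence nor a prefix of one */
        if (m < 0)
            continue; /* keep reading keys */
        if (r == DEAD_CARON) {
            uint32_t base = read_char(keys, n, pos);
            return base ? apply_dead(r, base) : 0;
        }
        return r;
    }
    return 0;
}

/* Evaluates a keystroke run that must yield exactly one character. */
static uint32_t compose_keys(const uint32_t *keys, size_t n) {
    size_t pos = 0;
    uint32_t c = read_char(keys, n, &pos);
    return pos == n ? c : 0;
}
```

With these tables, Compose, v, z gives U+017E, Compose, z, h gives U+0292, and Compose, v, Compose, z, h gives U+01EF. The recursion is what lets Compose reappear after a dead key without adding a separate table entry for every nested combination.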
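Question 4 mentions copying deadtrans lists from a spreadsheet into C source. Here is a minimal sketch of that step, assuming the sheet is exported as CSV with four hexadecimal columns: typed character, dead key, result, and flags. The column order and the function name row_to_deadtrans are hypothetical. The output follows the DEADTRANS(ch, accent, comp, flags) argument order used in Windows kbd.c sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Converts one CSV row "typed,dead key,result,flags" (hexadecimal code
   points such as 00E9) into a DEADTRANS line for a kbd.c-style source.
   Returns 1 on success, 0 if the row is malformed or out is too small. */
static int row_to_deadtrans(const char *row, char *out, size_t outsz) {
    unsigned ch, accent, comp, flags;
    size_t commas = 0;
    assert(row != NULL && out != NULL);

    /* Require exactly four columns. */
    for (const char *p = strchr(row, ','); p != NULL; p = strchr(p + 1, ','))
        commas++;
    if (commas != 3)
        return 0;

    /* %x accepts hexadecimal digits, with or without a 0x prefix. */
    if (sscanf(row, "%x,%x,%x,%x", &ch, &accent, &comp, &flags) != 4)
        return 0;

    int written = snprintf(out, outsz, "DEADTRANS(0x%04X, 0x%04X, 0x%04X, 0x%04X),",
                           ch, accent, comp, flags);
    return written > 0 && (size_t)written < outsz;
}
```

For instance, the row 0065,0301,00E9,0000 becomes DEADTRANS(0x0065, 0x0301, 0x00E9, 0x0000), which can be pasted into the dead-key list. Generating the lines from the sheet avoids the copy-and-paste drift between the spreadsheet and the source.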

Regards,

Marcel Schneider
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150718/86ab442a/attachment.html>

From jsbien at mimuw.edu.pl  Sat Jul 18 09:51:24 2015
From: jsbien at mimuw.edu.pl (Janusz S. Bien)
Date: Sat, 18 Jul 2015 16:51:24 +0200
Subject: Input methods at the age of Unicode
In-Reply-To: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
Message-ID: <20150718165124.24201ih6cra1y224@mail.mimuw.edu.pl>

Quote - Marcel Schneider <charupdate at orange.fr> (Sat 18 Jul 2015
04:33:23 PM CEST):

> On 16 Jul 2015, at 23:59:24 +0100, Eli Zaretskii wrote:
>
>> FWIW, I do that a lot, because the number of convenient input methods
>> in Emacs far outnumbers what I have on MS-Windows. For example, if I
>> have to type Russian with no Russian keyboard available, the
>> cyrillic-translit input method is a life savior.
>
> You might also wish to use the Windows on-screen keyboard, which
> lets you see exactly what is on each key while typing on any
> physical keyboard, without any need for the keycap labels to match
> the layout. This on-screen keyboard is built in; it just does not
> support Kana shift states.
> Likewise, Windows came to me with all that is needed to type ??
> ???? ?? ? ?????, so I can't really believe that users need Emacs as
> a savior.

cyrillic-translit and most other Emacs input methods are more
convenient than the on-screen keyboard, especially if you don't like
using the mouse and your goal is to get the text into Emacs :-)

>
> When process garbage is an environmental issue, one might consider
> that our real savior is Notepad++, thanks to its energy-saving
> algorithms. Indeed, I do not think that we should get supplemental
> input facilities at any price. This is also why the goal should be
> to pack a reasonably large subset of Unicode into the very core of
> the keyboard driver of every locale, and make it accessible right
> there with a Compose tree.

I don't think it would be practical.

> Every time we open charmap dialogs or even go on the internet to  
> pick a character, we're consuming some energy,

Agreed.

> and if it's a routine task that could be done with a memorized

Memorizing also requires some effort and energy.

> Compose sequence, that energy is wasted. I don't know if it's a real
> issue, but I'm inclined to believe it is.
>
> Of course we need some software as a savior, but this software is  
> consequently called Zotero and helps us save and manage our research  
> results ("Search, not re-search!" https://www.zotero.org).

I have nothing against Zotero, but its mention here seems completely  
irrelevant.

Best regards

Janusz


-- 
Prof. dr hab. Janusz S. Bień -  Uniwersytet Warszawski (Katedra
Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


From eliz at gnu.org  Sat Jul 18 10:31:02 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 18 Jul 2015 18:31:02 +0300
Subject: Input methods at the age of Unicode
In-Reply-To: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
Message-ID: <83bnf95vpl.fsf@gnu.org>

> Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST)
> From: Marcel Schneider <charupdate at orange.fr>
> Cc: UnicodeMailingList <unicode at unicode.org>
> 
> > FWIW, I do that a lot, because the number of convenient input methods
> > in Emacs far outnumbers what I have on MS-Windows. For example, if I
> > have to type Russian with no Russian keyboard available, the
> > cyrillic-translit input method is a life savior. 
> 
> You might also wish to use the Windows on-screen keyboard, which lets you see
> exactly what is on each key while typing on any physical keyboard, without
> any need for the keycap labels to match the layout. This on-screen keyboard is
> built in; it just does not support Kana shift states.

That makes typing much slower, unless you already know, at least
approximately, where the keys are.  You are talking to someone who is
almost a touch typist in English, but cannot for the life of me remember
the Russian keyboard.  Transliteration is the way to go in such cases,
and it's strange that transliteration-based input methods are not
readily available on Windows out of the box.
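A transliteration-based input method of the kind described here can be sketched as greedy longest-match replacement of Latin key sequences by Cyrillic letters. This is a Python illustration only.

The small table is a hypothetical subset (lowercase, no hard or soft sign). It is not the actual table used by Emacs's cyrillic-translit input method.

```python
# Hypothetical, deliberately small Latin -> Cyrillic table (lowercase only).
TRANSLIT = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "j": "й", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т",
    "u": "у", "f": "ф", "y": "ы",
    "zh": "ж", "kh": "х", "ts": "ц", "ch": "ч", "sh": "ш", "shch": "щ",
    "yu": "ю", "ya": "я",
}
LONGEST = max(len(key) for key in TRANSLIT)


def transliterate(text):
    """Convert Latin input to Cyrillic, always trying the longest key first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(min(LONGEST, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in TRANSLIT:
                out.append(TRANSLIT[chunk])
                i += size
                break
        else:
            out.append(text[i])  # characters not in the table pass through
            i += 1
    return "".join(out)


print(transliterate("privet"))   # привет
print(transliterate("borshch"))  # борщ
```

Longest-match is what lets "shch" win over "sh" followed by "ch". A real input method would also need uppercase, the hard and soft signs, and some way to type a literal "s" followed by "h".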


From doug at ewellic.org  Sat Jul 18 12:14:48 2015
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 18 Jul 2015 11:14:48 -0600
Subject: Keyman Developer for free? (was: Re: Input methods at the age of
 Unicode)
In-Reply-To: <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com>
References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net>
 <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com>
Message-ID: <B0ADB1717C6244B2A16364FEBF8CF5A1@DougEwell>

Marc Durdin wrote:

> http://tavultesoft.com/beta has the free download of Developer 9. The
> beta has the license key requirement but you can obtain a free
> perpetual license key on that page as well.

Thanks for the additional link. I'll try this.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From charupdate at orange.fr  Sat Jul 18 15:34:37 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 18 Jul 2015 22:34:37 +0200 (CEST)
Subject: Input methods at the age of Unicode
In-Reply-To: <83bnf95vpl.fsf@gnu.org>
References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
 <83bnf95vpl.fsf@gnu.org>
Message-ID: <1182315476.15127.1437251677051.JavaMail.www@wwinf1g36>

On 18 Jul 2015, at 17:30, Eli Zaretskii  wrote:

> > Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST)
> > From: Marcel Schneider 

> > You might also wish to use the Windows on-screen keyboard, which lets you see
> > exactly what is on each key while typing on any physical keyboard, without
> > any need for the keycap labels to match the layout. This on-screen keyboard is
> > built in; it just does not support Kana shift states.
> 
> That makes typing much slower, unless you already know, at least
> approximately, where the keys are. You are talking to someone who is
> almost a touch typist in English, but cannot for the life of me remember
> the Russian keyboard. Transliteration is the way to go in such cases,
> and it's strange that transliteration-based input methods are not
> readily available on Windows out of the box.

The new-style Chinese IME is a very smart tool based on transliteration. You just type the syllables as they sound in English, and you get plenty of suggestions to choose from. The old-style Chinese IME still ships with Windows, too. I don't know Chinese, so I can't say more, but from what I can see these tools perform very well. Perhaps no transliteration-based input tool was built for Russian on Windows because users are meant to use the keyboard directly. Now, osk.exe should probably show, on each key, the letter printed on the current physical keyboard. That is what I have often missed in such UIs: you cannot relate the keys to the base layout as the user knows it. I would add that when the OS is in Russian, the OSK should display Cyrillic letters following the Russian keyboard when it displays a QWERTY keyboard layout. Since the OSK can be kept always on top, you just look at it and see the keys you're striking.

There is also the old solution of a keymap on paper. You can open the Russian layout in MSKLC and choose a nice font, font size, window size (to get square keys; don't keep the default rectangles) and background colors. Then save it as a picture with File > Save as image. Open this in Paint or GIMP and add the Latin letters.
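A paper keymap like this, with Latin keycap labels added to the Russian layout, comes down to pairing each physical key with two characters.

Here is a minimal Python sketch. It assumes a US QWERTY keyboard and the standard Russian ЙЦУКЕН letter keys. Digits, shifted levels and the key left of 1 are left out. The names `KEYMAP`, `cheat_sheet` and `type_on` are hypothetical.

```python
# Letter keys of a US QWERTY keyboard, row by row (top, home, bottom).
QWERTY_ROWS = ["qwertyuiop[]", "asdfghjkl;'", "zxcvbnm,."]
# Characters the Russian (ЙЦУКЕН) layout puts on the same physical keys.
RUSSIAN_ROWS = ["йцукенгшщзхъ", "фывапролджэ", "ячсмитьбю"]

# Physical key (identified by its QWERTY keycap label) -> Russian character.
KEYMAP = {}
for latin_row, cyr_row in zip(QWERTY_ROWS, RUSSIAN_ROWS):
    assert len(latin_row) == len(cyr_row), "rows must cover the same keys"
    KEYMAP.update(zip(latin_row, cyr_row))


def cheat_sheet():
    """Text keymap with each key printed as 'keycap label:layout character'."""
    lines = []
    for indent, row in enumerate(QWERTY_ROWS):
        cells = [f"{key}:{KEYMAP[key]}" for key in row]
        lines.append(" " * indent * 2 + " ".join(cells))  # stagger rows like a keyboard
    return "\n".join(lines)


def type_on(keycaps):
    """Text produced by pressing the given QWERTY keycaps with the Russian layout active."""
    return "".join(KEYMAP.get(key, key) for key in keycaps)


print(cheat_sheet())
print(type_on("ghbdtn"))  # привет
```

The same key-to-key table is what an OSK would need in order to show the physical keycap label next to the layout character on each key.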

Marcel

From charupdate at orange.fr  Sat Jul 18 15:44:49 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 18 Jul 2015 22:44:49 +0200 (CEST)
Subject: Input methods at the age of Unicode
In-Reply-To: <20150718165124.24201ih6cra1y224@mail.mimuw.edu.pl>
References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
 <20150718165124.24201ih6cra1y224@mail.mimuw.edu.pl>
Message-ID: <1769733270.15174.1437252289729.JavaMail.www@wwinf1g36>

On 18 Jul 2015, at 16:58, Janusz S. Bien  wrote:

> cyrillic-translit and most other Emacs input methods are more
> convenient than the on-screen keyboard, especially if you don't like
> using the mouse and your goal is to get the text into Emacs :-)

Although the OSK also works by mouse click, it does not require the use of the mouse or trackpad.

> > This is why, too, the goal should be 
> > to pack a reasonably large subset of Unicode into the very core of 
> > the keyboard driver of every locale, and make it accessible right 
> > there with a Compose tree.
> 
> I don't think it would be practical.

Could you please explain why a Compose key, or a huge Compose tree, wouldn't be practical? I'm interested in knowing more about this issue.

> > Every time we open charmap dialogs or even go on the internet to 
> > pick a character, we're consuming some energy,
> 
> Agreed.
> 
> > and if it's a routine task that could be done with a memorized
> 
> Memorizing also requires some effort and energy.

Like using a bicycle instead of a car...

> > Of course we need some software as a savior, but this software is 
> > consequently called Zotero and helps us save and manage our research 
> > results ("Search, not re-search!" https://www.zotero.org).
> 
> I have nothing against Zotero, but its mention here seems completely 
> irrelevant.

We were just talking about saviors.

Marcel 

From marc at keyman.com  Sun Jul 19 01:16:44 2015
From: marc at keyman.com (Marc Durdin)
Date: Sun, 19 Jul 2015 06:16:44 +0000
Subject: Keyman Developer for free? (was: Re: Input methods at the age
 of Unicode)
In-Reply-To: <279207082.12069.1437230829393.JavaMail.www@wwinf1k02>
References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net>
 <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com>
 <279207082.12069.1437230829393.JavaMail.www@wwinf1k02>
Message-ID: <1CEDD746887FFF4B834688E7AF5FDA5A82164B69@federation.tavultesoft.local>


From: Marcel Schneider [mailto:charupdate at orange.fr]
Sent: Sunday, 19 July 2015 12:47 AM
Subject: Re: Keyman Developer for free? (was: Re: Input methods at the age of Unicode)

1. Does Keyman allow placing a Kana toggle? This feature, available at least on Windows, is useful for locales like Czech and French, which use so many precomposed characters that the top row is partly filled with them. When the Kana toggle is on, the digits are on the base level of the Kana layer there. The preferred place for this toggle is E00 (ISO 9995-1).

Yes. See http://help.keyman.com/developer/9.0/docs/guide/guide_lang_options.php for one way to implement this. Note: URLs I refer to are from the beta and so are subject to change shortly, but the details will still be found on http://help.keyman.com/developer/ after the site is updated.

2. Does Keyman support extended Compose trees? An extended Compose tree allows "Compose" to be used as part of Compose sequences. In fact, "Compose" can turn *any* key on the keyboard into a dead key, including the Compose key itself (even though it is already a dead key). This makes sequences more user-friendly. For example, the háček dead key may be "Compose, v", while ? may be "Compose, z, h". With an extended Compose tree, users may input ? by typing "Compose, v, Compose, z, h". Otherwise it must be typed "Compose, z, v, h", because "Compose, v, z" is already ž. With "Compose" operated by the right thumb, the first option may be appealing: one keystroke more, but one memorization less. However, I know that the second order matches the principle of double combining marks as stated in TUS §7.9. It would be interesting to know users' preferences about these Compose sequences, as implementing both is needless if one is disliked.

Yes, although not in the way you understand Compose trees. Keyman uses a more powerful context-based mechanism. See http://help.keyman.com/developer/9.0/docs/tutorial/tutorial_keyboard.php for a starter on how the Keyman keyboard language works.

3. Does Keyman offer a spreadsheet-like UI? Using spreadsheets for keyboard layout programming helps streamline the development process.

Not really. Table-based setups tend to constrain the design of keyboards. Keyman uses a rule-based model: see the tutorial link above for more detail.

4. Are Keyman layouts programmable in C? Windows keyboard drivers are (I know little about other OSes). The syntax of C and C++ lets developers work from spreadsheets, from which allocation tables, deadtrans lists, and ligature tables (that is, in keyboard driver language, tables of Unicode character [WCHAR] sequences) are copied and pasted into the source.

No, this would not be cross-platform. Keyman layouts compile down to Javascript (web, mobile web, Android, iOS) or a proprietary binary format (Windows, Mac OS X). Keyman layouts can be extended with C/C++ (Windows) or Javascript (other platforms) to add more complex behaviours that cannot be represented in the Keyman keyboard language.

5. Does Keyman allow such ligatures (sequences) to be accessed through dead keys? On Windows I don't see this possibility, and I never knew how to program it. But Unicode recommends that implementations provide this facility.

Yes, although dead keys are typically not the best choice for the majority of the world's languages. See the tutorial again, e.g. step 8.
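Question 4 mentions pasting spreadsheet data into the C source of a Windows keyboard driver. Here is a rough sketch of that workflow in Python. It turns a CSV export into entries in the style of the DEADTRANS macro from the Windows kbd.h header, as seen in MSKLC-generated sources. The column names `base`, `dead` and `result` and the function name `deadtrans_lines` are hypothetical.

The sketch also reflects the limitation behind question 5. In the kbd.h DEADKEY structure the composed result is a single WCHAR. So, as far as that table goes, a multi-character sequence or a character outside the BMP cannot be a dead key's output.

```python
import csv
import io


def deadtrans_lines(csv_text):
    """Convert CSV rows (columns: base, dead, result) into C DEADTRANS entries.

    Each cell must hold exactly one BMP character, because the composed
    result of a Windows dead-key entry is a single WCHAR (UTF-16 code unit).
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        chars = [row["base"], row["dead"], row["result"]]
        for ch in chars:
            if len(ch) != 1 or ord(ch) > 0xFFFF:
                raise ValueError(f"not a single BMP character: {ch!r}")
        base, dead, result = (ord(ch) for ch in chars)
        lines.append(
            f"DEADTRANS( 0x{base:04x}, 0x{dead:04x}, 0x{result:04x}, 0x0000),"
        )
    return lines


# Hypothetical spreadsheet export: caron dead key (U+02C7) on a and z.
SHEET = """base,dead,result
a,\u02c7,\u01ce
z,\u02c7,\u017e
"""

for line in deadtrans_lines(SHEET):
    print(line)
# DEADTRANS( 0x0061, 0x02c7, 0x01ce, 0x0000),
# DEADTRANS( 0x007a, 0x02c7, 0x017e, 0x0000),
```

A row whose result is a sequence, such as n followed by U+0303 COMBINING TILDE, is rejected. That is exactly the case question 5 asks about.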

The help site for Keyman has a stack of documentation and examples and is the best place to start, but if you don't find answers to your queries there, I am happy to answer additional questions about the specifics of Keyman off-list, or you can simply download and try the development tools yourself from http://tavultesoft.com/beta/

Regards,

Marc

From charupdate at orange.fr  Sun Jul 19 07:37:55 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 19 Jul 2015 14:37:55 +0200 (CEST)
Subject: Keyman Developer for free? (was: Re: Input methods at the age
 of Unicode)
In-Reply-To: <1CEDD746887FFF4B834688E7AF5FDA5A82164B69@federation.tavultesoft.local>
References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net>
 <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com>
 <279207082.12069.1437230829393.JavaMail.www@wwinf1k02>
 <1CEDD746887FFF4B834688E7AF5FDA5A82164B69@federation.tavultesoft.local>
Message-ID: <730296831.6627.1437309475345.JavaMail.www@wwinf2221>

On 19 Jul 2015, 08:17, Marc Durdin  wrote:

>> 1. Does Keyman allow to place a Kana toggle?
> Yes. See http://help.keyman.com/developer/9.0/docs/guide/guide_lang_options.php for one way to implement this.

[...]

> The help site for Keyman has a stack of documentation and examples and is the best place to start, but if you don't find answers to your queries there, I am happy to answer additional questions about the specifics of Keyman off-list, or you can simply download and try the development tools yourself from http://tavultesoft.com/beta/ 

Thank you for answering my questions. It's a new universe for me. I understand that end users cannot install and use these layouts the way they would a Windows keyboard driver. I confess that I don't feel ready to go down this road, even though I can see that it performs very well.

Thank you for the information.
Best regards,

Marcel 

From c933103 at gmail.com  Sun Jul 19 07:52:57 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Sun, 19 Jul 2015 20:52:57 +0800
Subject: Input methods at the age of Unicode
In-Reply-To: <CAGHjPP+_pjvAi9GxhqBqHXy=J7Xhph3cf-=4aPoqoxq0piUAfg@mail.gmail.com>
References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
 <83bnf95vpl.fsf@gnu.org>
 <1182315476.15127.1437251677051.JavaMail.www@wwinf1g36>
 <CAGHjPP+_pjvAi9GxhqBqHXy=J7Xhph3cf-=4aPoqoxq0piUAfg@mail.gmail.com>
Message-ID: <CAGHjPPLJTbJks4g7Hg3+oPv1RmMZbR+sZk5+19SoEeX1RVncCw@mail.gmail.com>

I forgot to add the Unicode mailing list to the recipients of my previous
mail... adding it back and resending.
---------- Forwarded message ----------
From: "gfb hjjhjh" <c933103 at gmail.com>
Date: Jul 19, 2015 9:38 AM
Subject: Re: Input methods at the age of Unicode
To: "Marcel Schneider" <charupdate at orange.fr>
Cc:

Input methods where you type the sound and pick the corresponding
characters have been developed for more than 20 years by many Chinese
companies. Features include prioritizing the characters offered for
selection according to usage frequency; offering the best-fitting
vocabulary if multiple syllables are entered together without
selection, with the database constantly updated from a network
database, and the word bank analyzed and personalized from social
applications, the contact list, email, SMS and what you type; and, if
you enter even more syllables together, offering candidates that fit
natural sentence structure. For the more commonly used characters or
vocabulary, entering the first Latin letter of each character's
romanization is already enough for the input method to provide a list
of best-fitting words, which saves typing time, as the romanization of
a single Chinese character can be up to six or seven letters long.
Input methods have also gained some autocorrection capability, so that
even if you have not mastered Mandarin Chinese pronunciation and make
some common mistakes during romanization, the program can still
understand what you want to type. On the other hand, to increase speed,
since typing each Chinese character directly by its romanization often
involves typing up to 6 letters, people map each initial and each
vowel part of a syllable onto individual keys, so that only 2
keystrokes are needed before they start selecting which characters
they want. However, as all the methods mentioned above involve
hand-eye coordination to select the word you want, those who really
emphasize speed stick to some older input methods that decompose
characters based on the glyph's shape and convert that into a string
of keys. If designed properly, those strings can be unique for most
characters; and even when codes do repeat, or when using a scheme with
shorter codes that yields a higher repeat rate, people memorize the
candidate number as part of the string so that they can type without
looking at the screen. Typing speeds with such methods (on a regular
keyboard) have been recorded at more than 220 characters per minute,
which already exceeds the Chinese national standard for stenographers
using specialized stenotype machines. On the other hand, it appears
that some Chinese stenotype machines [in mainland China] use the sound
of characters for typing, just like the methods mentioned at the
beginning, and some of them even use an application compatible with
the one used in desktop environments... So it's hard to say whether
letting the typist rely on visual hints helps or hinders typing speed...
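Two of the techniques described above can be illustrated with a toy lookup: ranking candidates by frequency, and abbreviated input that uses only the first letter of each syllable. Selecting a candidate by number also mirrors the memorized candidate numbers mentioned for shape-based methods.

This Python sketch does not claim to show how any particular commercial IME works. The five-word lexicon is hypothetical, and its list order simply stands in for a usage-frequency ranking. The names `candidates` and `select` are hypothetical too.

```python
# Toy lexicon: (word, pinyin syllables). List order stands in for a
# frequency ranking (hypothetical; real IMEs use large, adaptive statistics).
LEXICON = [
    ("中国", ("zhong", "guo")),
    ("中文", ("zhong", "wen")),
    ("你好", ("ni", "hao")),
    ("拼音", ("pin", "yin")),
    ("输入法", ("shu", "ru", "fa")),
]


def candidates(typed):
    """Words whose full pinyin starts with `typed`, or whose initials equal it."""
    typed = typed.lower()
    found = []
    for word, syllables in LEXICON:
        full = "".join(syllables)
        initials = "".join(s[0] for s in syllables)
        if full.startswith(typed) or initials == typed:
            found.append(word)
    return found


def select(typed, number):
    """Pick the numbered candidate (1-based), as typists do with memorized numbers."""
    return candidates(typed)[number - 1]


print(candidates("zhong"))  # ['中国', '中文']
print(candidates("srf"))    # ['输入法']
print(select("zhong", 2))   # 中文
```

Because the candidate order is fixed here, "zhong" followed by 2 always yields the same word. That stability is what makes typing by memorized candidate number possible. Adaptive reordering would trade it away for better first guesses.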
On Jul 19, 2015 4:39 AM, "Marcel Schneider" <charupdate at orange.fr> wrote:

> On 18 Jul 2015, at 17:30, Eli Zaretskii <eliz at gnu.org> wrote:
>
> > > Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST)
> > > From: Marcel Schneider <charupdate at orange.fr>
>
> > > You might also wish to use the Windows on-screen keyboard, which lets
> > > you see exactly what is on each key while typing on any physical
> > > keyboard, without any need for the keycap labels to match the layout.
> > > This on-screen keyboard is built in; it just does not support Kana
> > > shift states.
> >
> > That makes typing much slower, unless you already know, at least
> > approximately, where the keys are. You are talking to someone who is
> > almost a touch typist in English, but cannot for the life of me
> > remember the Russian keyboard. Transliteration is the way to go in
> > such cases, and it's strange that transliteration-based input methods
> > are not readily available on Windows out of the box.
>
> The new-style Chinese IME is a very smart tool based on transliteration.
> You just type the syllables as they sound in English, and you get plenty
> of suggestions to choose from. The old-style Chinese IME still ships
> with Windows, too. I don't know Chinese, so I can't say more, but from
> what I can see these tools perform very well. Perhaps no
> transliteration-based input tool was built for Russian on Windows
> because users are meant to use the keyboard directly. Now, osk.exe
> should probably show, on each key, the letter printed on the current
> physical keyboard. That is what I have often missed in such UIs: you
> cannot relate the keys to the base layout as the user knows it. I would
> add that when the OS is in Russian, the OSK should display Cyrillic
> letters following the Russian keyboard when it displays a QWERTY
> keyboard layout. Since the OSK can be kept always on top, you just look
> at it and see the keys you're striking.
>
> There is also the old solution of a keymap on paper. You can open the
> Russian layout in MSKLC and choose a nice font, font size, window size
> (to get square keys; don't keep the default rectangles) and background
> colors. Then save it as a picture with File > Save as image. Open this
> in Paint or GIMP and add the Latin letters.
>
> Marcel
>

From c933103 at gmail.com  Sun Jul 19 08:04:10 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Sun, 19 Jul 2015 21:04:10 +0800
Subject: UDHR in Unicode: 400 translations in text form!
In-Reply-To: <CAGHjPPJ_s_n0yLCBTPdsfuD+We4h=F1mZHH0YxgyiR-z3nisvA@mail.gmail.com>
References: <55903CBC.9050900@efele.net>
 <CAGHjPPJ_s_n0yLCBTPdsfuD+We4h=F1mZHH0YxgyiR-z3nisvA@mail.gmail.com>
Message-ID: <CAGHjPPLztb3U_E+Q7iw-xnGwp=miz0PQ=M+9KrCFZqRZ9RgVkw@mail.gmail.com>

Resending a previously sent mail in which I forgot to add the mailing list
as a recipient.
---------- Forwarded message ----------
From: "gfb hjjhjh" <c933103 at gmail.com>
Date: Jun 29, 2015 4:35
Subject: Re: UDHR in Unicode: 400 translations in text form!
To: "Eric Muller" <eric.muller at efele.net>
Cc:

I've just used the web report form to report the discovery of its
translation (or partial translation) into Classical Chinese, Yue Chinese,
and Min Nan Chinese (ISO 639-3 codes: lzh, yue, nan); all of them
are from Wikipedia. Please try digging into Wikipedia to see if you can find
more translations.
On Jun 29, 2015 2:30 AM, "Eric Muller" <eric.muller at efele.net> wrote:

> I am pleased to announce that the UDHR in Unicode project (
> http://unicode.org/udhr) has reached a notable milestone: we now have 400
> translations of the Universal Declaration of Human Rights in text form.
>
> The latest translation is in Sinhala, thanks to Keshan Sodimana, Pasundu
> de Silva and Sascha Brawer. Many thanks to them and to all the contributors.
>
> There is still plenty of work: most translations would benefit from a
> review, and there are 55 translations for which we have PDFs or images, but
> not yet the text form (look for stage 2 translations).
>
> The site has also been revamped a bit, with a more functional map, and a
> more functional table of the translations. The mappings to ISO 639-3 and BCP
> 47 have been updated to take into account the evolution of those standards.
>
> Again, thanks to all the contributors, past, present and future,
>
> Eric.
>
> PS: I believe I have taken care of all the backlog of contributions and
> comments. If I missed something, sorry, and please ping me again.
>

From c933103 at gmail.com  Sun Jul 19 08:05:43 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Sun, 19 Jul 2015 21:05:43 +0800
Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to
 Unicode)
In-Reply-To: <CAGHjPPJWU08vEiEunGvkjGE0LrmYvwaa+R37-2NtS7TkEcj02Q@mail.gmail.com>
References: <20150706131135.665a7a7059d7ee80bb4d670165c8327d.b0fde2cbd7.wbe@email03.secureserver.net>
 <CAJ6uix6hLxdnBYCiujViCqu2Rs-KjqF7GZ95fXphiDyGGC8Fbg@mail.gmail.com>
 <CAGHjPPJWU08vEiEunGvkjGE0LrmYvwaa+R37-2NtS7TkEcj02Q@mail.gmail.com>
Message-ID: <CAGHjPPLW+vFm8Je_4S_M0mgfMCYBzqg0=LEt7ZgS5NOL4QqW2Q@mail.gmail.com>

Resending mails that were not sent correctly.
---------- Forwarded message ----------
From: "gfb hjjhjh" <c933103 at gmail.com>
Date: Jul 7, 2015 4:30 AM
Subject: Re: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode)
To: <unicode at unicode.prg>
Cc:

How about a transparent flag?
On Jul 7, 2015 4:24 AM, "Leonardo Boiko" <leoboiko at namakajiri.net> wrote:

> 2015-07-06 17:11 GMT-03:00 Doug Ewell <doug at ewellic.org>:
> > Is it your belief that users who wish to display an emoji flag care
> > whether the flag is shown stationary versus flapping in the wind?
>
> I think a waving white flag is an emoji symbol for
> "truce/surrender/come in peace", whereas a white rectangle doesn't
> easily transmit the same idea.
>

From charupdate at orange.fr  Sun Jul 19 08:10:40 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 19 Jul 2015 15:10:40 +0200 (CEST)
Subject: On-screen keyboards (was: Re: Input methods at the age of Unicode)
In-Reply-To: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
Message-ID: <228083042.7008.1437311440268.JavaMail.www@wwinf2221>

On 18 Jul 2015, at 16:44, I wrote:

> You might also wish to use the Windows on-screen keyboard, which lets you see exactly what is on each key while typing on any physical keyboard, without any need for the keycap labels to match the layout. This on-screen keyboard is built in; it just does not support Kana shift states.

Although the Windows OSK's support of Kana shift states is not complete, it is *not* completely missing. Moreover, it works fully if the Kana modifier is on Left Alt (as on my current French "delta" layout), as I found out when testing the OSK again today. My opinion was formed when testing the OSK with a Windows keyboard layout where the Kana modifier is implemented on Right Control. When you hit or click Right Ctrl, nothing happens except that the Ctrl key turns white. However, when you then hit or click a letter key, you do get the Kana-layer character. On my delta layout, the key labels are even updated with the Kana characters when Kana (Left Alt) is pressed.

Please do not take the following as mere criticism. I think that suggestions on Microsoft products are particularly useful because of how widely these products are used. So I would add some suggestions that, if agreed, may help improve the user experience.

+ The dead keys are currently not highlighted on the OSK. Perhaps it would be useful to make them look somewhat different.

+ When you hit a letter key, no visual feedback is provided. I suggest that the feedback be the same when pressing the key as when clicking it.

+ A few optional settings should be provided, among them an additional display of the physical keycap labels (see my e-mail of 18 Jul 2015 at 22:34), highlighting of the homing keys (F and J), display of the middle line, and other aids allowing users to see which finger to use for a given key.

There are probably other suggestions too. As this is a UI issue, however, it might not be followed up on this List.

Marcel

From charupdate at orange.fr  Sun Jul 19 08:15:59 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 19 Jul 2015 15:15:59 +0200 (CEST)
Subject: Input methods at the age of Unicode
In-Reply-To: <CAGHjPPLJTbJks4g7Hg3+oPv1RmMZbR+sZk5+19SoEeX1RVncCw@mail.gmail.com>
References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02>
 <83bnf95vpl.fsf@gnu.org>
 <1182315476.15127.1437251677051.JavaMail.www@wwinf1g36>
 <CAGHjPP+_pjvAi9GxhqBqHXy=J7Xhph3cf-=4aPoqoxq0piUAfg@mail.gmail.com>
 <CAGHjPPLJTbJks4g7Hg3+oPv1RmMZbR+sZk5+19SoEeX1RVncCw@mail.gmail.com>
Message-ID: <1700203382.7082.1437311759766.JavaMail.www@wwinf2221>

Hello,

Thank you very much for all this information, which I didn't know and which is very useful for putting into perspective my experience with the new Windows Chinese IME that I referred to on the Mailing List.

Best regards,

Marcel

> Message of 19/07/15 15:01
> From: "gfb hjjhjh"
> To: unicode at unicode.org
> Cc:
> Subject: Re: Input methods at the age of Unicode
> 
>
I forgot to add the Unicode mailing list to the recipients of my previous mail... adding it back and resending.

---------- Forwarded message ----------
> From: "gfb hjjhjh"
> Date: Jul 19, 2015 9:38 AM
> Subject: Re: Input methods at the age of Unicode
> To: "Marcel Schneider"
> Cc:
> 

Input methods where you type the sound and pick the corresponding characters have been developed for more than 20 years by many Chinese companies. Features include prioritizing the characters offered for selection according to usage frequency; offering the best-fitting vocabulary if multiple syllables are entered together without selection, with the database constantly updated from a network database, and the word bank analyzed and personalized from social applications, the contact list, email, SMS and what you type; and, if you enter even more syllables together, offering candidates that fit natural sentence structure. For the more commonly used characters or vocabulary, entering the first Latin letter of each character's romanization is already enough for the input method to provide a list of best-fitting words, which saves typing time, as the romanization of a single Chinese character can be up to six or seven letters long. Input methods have also gained some autocorrection capability, so that even if you have not mastered Mandarin Chinese pronunciation and make some common mistakes during romanization, the program can still understand what you want to type. On the other hand, to increase speed, since typing each Chinese character directly by its romanization often involves typing up to 6 letters, people map each initial and each vowel part of a syllable onto individual keys, so that only 2 keystrokes are needed before they start selecting which characters they want. 
However, as all the methods mentioned above involve hand-eye coordination to select the word you want, those who really emphasize speed stick to some older input methods that decompose characters based on the glyph's shape and convert that into a string of keys. If designed properly, those strings can be unique for most characters; and even when codes do repeat, or when using a scheme with shorter codes that yields a higher repeat rate, people memorize the candidate number as part of the string so that they can type without looking at the screen. Typing speeds with such methods (on a regular keyboard) have been recorded at more than 220 characters per minute, which already exceeds the Chinese national standard for stenographers using specialized stenotype machines. On the other hand, it appears that some Chinese stenotype machines [in mainland China] use the sound of characters for typing, just like the methods mentioned at the beginning, and some of them even use an application compatible with the one used in desktop environments... So it's hard to say whether letting the typist rely on visual hints helps or hinders typing speed...

On 19 Jul 2015, at 4:39, "Marcel Schneider" wrote:


> On 18 Jul 2015, at 17:30, Eli Zaretskii  wrote:
> 
> > > Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST)
> > > From: Marcel Schneider 
> 
> > > You might also wish to use the Windows on-screen keyboard, which lets you see
> > > exactly what's on each key while typing on whatever physical keyboard, without
> > > any need to have the keycap labels match the layout. This on-screen keyboard is
> > > built-in, only it does not support Kana shift states.
> > 
> > That makes typing much slower, unless you already know, at least
> > approximately, where the keys are. You are talking to someone who is
> > almost touch typist in English, but cannot remember for the life of me
> > the Russian keyboard. Transliteration is the way to go in such cases,
> > and it's strange that transliteration-based input methods are not
> > readily available on Windows out of the box.
> 
> The new-style Chinese IME is a very smart tool based on transliteration. You type just the syllables as they sound in English, and you get plenty of suggestions to choose from. The old-style Chinese IME is still shipped too. I don't know Chinese, so I can't tell more, but visually I believe these tools are very efficient. Perhaps no transliteration-based input tool was built for Russian on Windows because we are meant to use the keyboard directly. Now, osk.exe should probably show on each key the letter that is on the current physical keyboard. That is what I have often missed in such UIs: you cannot make the link with the base layout as the user knows it. I would add that when the OS is in Russian, the OSK should display Cyrillic letters following the Russian keyboard when it displays a QWERTY keyboard layout. As you can keep the OSK always on top, you just look at it and see the keys you're striking.
> 
> There is also the old solution of a keymap on paper. You can open the Russian layout in MSKLC and choose a nice font, font size, window size (to get square keys; don't keep the default rectangles) and nice background colors. Then save it as a picture via File > Save as image. Open this in Paint or GIMP and add the Latin letters.
> 
> Marcel


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150719/85c79aae/attachment.html>

From doug at ewellic.org  Sun Jul 19 13:39:23 2015
From: doug at ewellic.org (Doug Ewell)
Date: Sun, 19 Jul 2015 12:39:23 -0600
Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to
 Unicode)
In-Reply-To: <mailman.1.1437325201.8488.unicode@unicode.org>
References: <mailman.1.1437325201.8488.unicode@unicode.org>
Message-ID: <DBBB8D665ED6452B89F82E55460FBB8A@DougEwell>

gfb hjjhjh <c933103 at gmail dot com> wrote:

>> I think a waving white flag is an emoji symbol for
>> "truce/surrender/come in peace", whereas a white rectangle doesn't
>> easily transmit the same idea.
>
> How about transparent flag?

I'm still not convinced this is a problem that needs to be solved. "A 
flag goes here which your system couldn't display" is all that the base 
character's glyph is trying to convey.

Proposing a new base character will ensure that this solution gets 
delayed by at least another year. Is it really worth it?

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From asmus-inc at ix.netcom.com  Sun Jul 19 18:26:52 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Sun, 19 Jul 2015 16:26:52 -0700
Subject: Stationary vs. waving flags
In-Reply-To: <DBBB8D665ED6452B89F82E55460FBB8A@DougEwell>
References: <mailman.1.1437325201.8488.unicode@unicode.org>
 <DBBB8D665ED6452B89F82E55460FBB8A@DougEwell>
Message-ID: <55AC323C.7050105@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150719/a74be55a/attachment.html>

From charupdate at orange.fr  Mon Jul 20 02:39:30 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Mon, 20 Jul 2015 09:39:30 +0200 (CEST)
Subject: Plain text custom fraction input (Child thread of: Input methods at
 the age of Unicode)
Message-ID: <1598696058.2090.1437377970835.JavaMail.www@wwinf1f21>

Hello, 

I have a concern about entering custom (vulgar) fractions in plain text, using a sequence of superscript and subscript digits separated by U+2044 FRACTION SLASH. I submitted it in PRI #297.
As I need to clear up this point for future keyboard layout usage recommendations, I would like to bring it to the attention of the Unicode Mailing List for advice and discussion.

A demo file opening in a word processor, typeset in Arial Unicode MS
typeface, is available at http://bit.ly/1DNPtf0
To view it in PDF, there is another file at http://bit.ly/1JutBGK

The following is based on http://www.unicode.org/review/pri297/feedback.html

Date/Time: Mon Apr 13 10:07:49 CDT 2015
There is some additional information about U+2044 FRACTION SLASH that I would
suggest adding to the “Fraction Slash” paragraphs in the “Other Punctuation”
subsection of §6.2, page 273 of the Standard, as well as to the Code Charts’
Fractions subheader before U+2150.

U+2044 FRACTION SLASH working together with superscripts and subscripts is so
obvious that no discussion is needed.
[Note: This proved to be wrong. I'm sorry not to have e-mailed this to the List.]
On the other hand, as fraction formatting
needs at least desktop publishing software, it is usually not a part of office
automation. It seems therefore useful to show the plain text entering method
for (so-called vulgar) fractions.

The “Number Forms” block’s “Fractions” subhead may therefore be followed by a
NOTICE_LINE like this one: “@+” [TAB] [TAB] “Fractions may be composed in
plain text on a [superscripts] 2044 [subscripts] pattern.”

On the other hand, the Fraction Slash notice in the Standard might contain the information
below (including what is already provided in the Standard).
___________________________
Fraction Slash. U+2044 FRACTION SLASH is used between digits to form numeric
fractions. It kerns with superscripts and subscripts to compose plain text
fractions. The pattern of a plain text fraction
built using the fraction slash is defined as follows: any sequence of one or
more superscript digits (U+00B9, U+00B2, U+00B3, U+2074 - U+2079, U+2070),
followed by the fraction slash, followed by any sequence of one or more
subscript digits (U+2080 - U+2089).

U+2044 FRACTION SLASH may also act as a formatting command for use with
decimal digits, and it may be used instead of U+002F SOLIDUS prior to applying
fraction formatting. The standard form of a fraction designed for formatting
is defined as follows: any sequence of one or more decimal digits (General
Category = Nd), followed by the fraction slash, followed by any sequence of
one or more decimal digits. If the fraction is to be separated from a previous
number, then a space can be used, choosing the appropriate width (normal,
thin, zero width, and so on). For example, 1 + thin space + 3 + fraction slash
+ 4 can be displayed as 1¾.

Whether they are plain text or formatted, fractions should be displayed as a
unit, such as ¾ or {unavailable glyph}. The precise choice of display can
depend on additional formatting information. If the displaying software is
incapable of mapping the fraction to a unit, then it can also be displayed as
a simple linear sequence as a fallback (for example, 3/4). For fallback
display, U+002F SOLIDUS is preferred, because the fraction slash kerns.
___________________________
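The following Python sketch gives a rough illustration of the pattern in the proposed text. It composes a plain text fraction from two integers and checks a string against the superscripts + U+2044 + subscripts pattern. The names `compose_fraction` and `PLAIN_TEXT_FRACTION` are hypothetical. Note that the Standard itself describes U+2044 as intended for use with decimal (Nd) digits, so this reflects the proposal, not current TUS text.

```python
import re

# Superscript digits 0-9: 1, 2 and 3 live in Latin-1 (U+00B9, U+00B2, U+00B3),
# the others at U+2070 and U+2074..U+2079.
SUPERSCRIPTS = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079"
SUBSCRIPTS = "".join(chr(0x2080 + d) for d in range(10))  # U+2080..U+2089
FRACTION_SLASH = "\u2044"

_TO_SUP = str.maketrans("0123456789", SUPERSCRIPTS)
_TO_SUB = str.maketrans("0123456789", SUBSCRIPTS)

# Proposed pattern: superscript digit(s), U+2044, subscript digit(s).
PLAIN_TEXT_FRACTION = re.compile(
    f"[{SUPERSCRIPTS}]+{FRACTION_SLASH}[{SUBSCRIPTS}]+")

def compose_fraction(numerator: int, denominator: int) -> str:
    """Build e.g. 3/5 as U+00B3 U+2044 U+2085."""
    return (str(numerator).translate(_TO_SUP) + FRACTION_SLASH
            + str(denominator).translate(_TO_SUB))

three_fifths = compose_fraction(3, 5)
print([f"U+{ord(c):04X}" for c in three_fifths])
print(bool(PLAIN_TEXT_FRACTION.fullmatch(three_fifths)))
```

Whether the result renders like the precomposed U+2157 depends entirely on the font.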
Date/Time: Wed Apr 22 11:26:44 CDT 2015
Opt Subject: PRI #297 Fraction slash

2044 FRACTION SLASH

In addition to my previous feedback, I would suggest adding the hint about how
to compose arbitrary fractions in plain text in another place as well. This
could be the entry of the fraction slash U+2044, more precisely the end
of the existing COMMENT_LINE, after a comma:

2044 FRACTION SLASH
= solidus (in typography)
* for composing arbitrary fractions, in plain text with superscripts and subscripts.

Thank you for your feedback.
Best regards,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150720/1f097a8f/attachment.html>

From doug at ewellic.org  Mon Jul 20 10:30:48 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 20 Jul 2015 08:30:48 -0700
Subject: Stationary vs. waving flags
Message-ID: <20150720083048.665a7a7059d7ee80bb4d670165c8327d.7dd2ebc26a.wbe@email03.secureserver.net>

Asmus Freytag (t) <asmus dash inc at ix dot netcom dot com> wrote:

>> Proposing a new base character will ensure that this solution gets
>> delayed by at least another year. Is it really worth it?
>
> Sometimes haste is a poor guide.

This is ironic, considering that all of this flag stuff belongs to the
emoji wing of Unicode, where fast-tracking of "urgently needed" cheese
wedges and hockey sticks is the norm.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From jknappen at web.de  Mon Jul 20 10:46:42 2015
From: jknappen at web.de (Jörg Knappen)
Date: Mon, 20 Jul 2015 17:46:42 +0200
Subject: Security concerns: OGHAM SPACE MARK
Message-ID: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150720/e2b4c2a1/attachment.html>

From verdy_p at wanadoo.fr  Mon Jul 20 11:40:51 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 20 Jul 2015 18:40:51 +0200
Subject: Security concerns: OGHAM SPACE MARK
In-Reply-To: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
Message-ID: <CAGa7JC0cmC+CSGv3Z205__oAyB5qzdqvhE+LJnxDVF2VA7yxeQ@mail.gmail.com>

Bank transactions do not send amounts containing operations to compute in
the amount field. They also limit the kinds of digits they accept for
interchange.
A change of sign is a different kind of transaction with different
responsibilities, so signs are prohibited; they are replaced by a separate
code for the transaction type.

So the risk may only exist when presenting a signed number to a user and
asking him to accept the transaction.

There are similar issues when amounts use grouping separators and the
decimal separator is ambiguous because the precision has as many decimal
digits as there are digits in a group (for most locales, groups are made
of 3 digits, so prices always avoid formats with 3 decimals, and
most currencies have 0 or 2 decimals of precision). This could be a problem
in locales that group digits in groups of 2. If group separators are used to
show a price to a user in a UI, it is strongly suggested to avoid anything
other than a (narrow) space. If the document will be printed, you may avoid
all separators and replace the decimal separator with the currency symbol, or
use modified typography to render the decimals (e.g. in superscript or a
smaller font size).
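Here is a minimal Python sketch of the display advice above. Amounts kept in minor units (cents) are grouped by three with U+202F NARROW NO-BREAK SPACE, so the group separator cannot be confused with the decimal separator. The comma decimal separator and the helper name `format_amount` are assumptions for illustration. Real applications would normally rely on locale data instead.

```python
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE, used as the only group separator

def format_amount(value_cents: int, decimal_sep: str = ",") -> str:
    """Format an amount given in minor units (e.g. cents) for display,
    grouping integer digits by three with U+202F and always showing
    exactly two decimals."""
    sign = "-" if value_cents < 0 else ""
    units, cents = divmod(abs(value_cents), 100)
    grouped = f"{units:,}".replace(",", NNBSP)  # 1,234,567 -> 1 234 567
    return f"{sign}{grouped}{decimal_sep}{cents:02d}"

print(format_amount(123456789))
```

Keeping amounts as integers in minor units avoids floating-point rounding. The separator choice is only a presentation concern.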

But the most common confusion when presenting prices to users is not
clearly stating whether taxes and additional fees will be applied, have been
included, or will have to be paid after the purchase when receiving the
product (e.g. buying a product in Australia from Europe: you accept the
price in AUD, knowing that there will be bank fees to process the currency
exchange; you pay the price to the seller; later your bank performs the
exchange, applies a new currency rate plus fees, and you get a
second line of payment in your bank account; then a week later you receive
the product, but to get it you must first pay the import taxes and VAT to
customs, via the postal or delivery service, sometimes plus a new fee
to the delivery service that had to advance the customs taxes and acts as
an intermediary). The total price is much higher than what was advertised.
Some sellers (notably on the Internet) do not explain clearly that these
products will cost more and what to expect, even if they target customers
in other countries in their own language as if they had a local branch in
that country.

Banks are protected from these errors, but not customers.

2015-07-20 17:46 GMT+02:00 "Jörg Knappen" <jknappen at web.de>:

> I stumbled over a very strange snippet of javascript code, where an
> apparent
> minus sign is interpreted as a space here:
>
> http://stackoverflow.com/questions/31507143/why-does-2-40-equal-42
>
> Imagine such kind of behaviour in bank transactions ...
>
> --Jörg Knappen
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150720/a7456ffe/attachment.html>

From prosfilaes at gmail.com  Tue Jul 21 00:05:11 2015
From: prosfilaes at gmail.com (David Starner)
Date: Tue, 21 Jul 2015 05:05:11 +0000
Subject: Security concerns: OGHAM SPACE MARK
In-Reply-To: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
Message-ID: <CAMZ=zj4Hn1aHfj+nh3COY57RyQ8T3TpA1b+bQRxTWRhXNL4dqg@mail.gmail.com>

It's a confusable. There's a lot of them in Unicode. Auditing source code
is hard, and if it's a concern, I suggest filtering out all non-ASCII
characters.
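Here is a minimal Python sketch of such an audit. It reports every non-ASCII character with its position, name and General Category instead of silently dropping it. The helper `report_non_ascii` is hypothetical. The sample string is an assumed reconstruction of the kind of snippet in the linked question, where the apparent minus sign is U+1680 OGHAM SPACE MARK. That character has General Category Zs, which JavaScript treats as whitespace, so the expression parses as `2+40`.

```python
import unicodedata

def report_non_ascii(source: str):
    """Yield (line, column, code point, name, category) for every
    non-ASCII character in `source` (1-based line and column)."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                yield (lineno, col, f"U+{ord(ch):04X}",
                       unicodedata.name(ch, "<unnamed>"),
                       unicodedata.category(ch))

# Assumed reconstruction: the "minus" is really U+1680 OGHAM SPACE MARK.
js = "alert(2+\u168040)"
for hit in report_non_ascii(js):
    print(hit)
```

Reporting rather than rejecting lets a reviewer decide whether each character is legitimate, for example in string literals or comments.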

If you really think it's a concern, let's be specific: what do you mean by
this kind of behavior in bank transactions? If you're worried about the
bank's JavaScript, you already have to trust code written for OS/360 that
the bank considers proprietary and to be kept deeply hidden, as if you
could read GOTO-laden PL/I anyway.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/36638c01/attachment.html>

From charupdate at orange.fr  Tue Jul 21 01:45:19 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Tue, 21 Jul 2015 08:45:19 +0200 (CEST)
Subject: Plain text custom fraction input (Part of: Input methods at the
 age of Unicode)
In-Reply-To: <1598696058.2090.1437377970835.JavaMail.www@wwinf1f21>
References: <1598696058.2090.1437377970835.JavaMail.www@wwinf1f21>
Message-ID: <2061348317.2626.1437461119877.JavaMail.www@wwinf1f21>

Entering fractions in plain text is consistent with the very core of Unicode’s purpose, which (please check if I’m right) is to empower all people on earth to get as much information as possible in readable plain text. Fractions, which ISO wanted to keep calling “vulgar”, are part of this information. So the designer of Arial Unicode MS matched precomposed fractions, superscript and subscript digits and the fraction slash so that, where an equivalent precomposed fraction exists, [superscript digit(s)] U+2044 [subscript digit(s)] looks exactly like [precomposed fraction]. I really can’t see any difference. If we look at the example in the demo files, we can be convinced that in Arial Unicode MS, U+00B3 U+2044 U+2085 ³⁄₅ is congruent with U+2157 ⅗. DejaVu Sans and DejaVu Serif and their Condensed variants are some other fonts that work. Well, a lot of other fonts don’t, because they are incomplete or for some other reason. But I cannot really draw conclusions from what I see on my machine, because my versions are incomplete.

You may test it by yourself and you are still welcome to download the samples:
.docx: http://bit.ly/1DNPtf0
.pdf: http://bit.ly/1JutBGK

The lesson I learned from this is that proportionally spaced fonts that comply fully with the Standard allow users to get nice fractions without formatting. Obviously that does not work with monospaced fonts. Nor does it look nice when the Latin-1 superscripts (¹²³) and the other superscripts and subscripts are not from the same font, as may occur in browsers but also in word processing. To run this (well, call it a trick), we must make sure to use a suitable font. But on this condition it works, and I see no reason not to do it. Even more, I do not consider it a mere trick, but normal usage.

The problem we now have to deal with is why this usage is hidden in the Standard. And I’d like to give my answer to the question right away, an answer inherent in what I wrote yesterday: the plain text custom fraction input method is not recommended in TUS *because* fraction formatting is part of desktop publishing software but not of office automation software. That may be wrong. I didn’t check whether, at some point in its history, Unicode removed plain text custom fractions from TUS. Nor can I know whether Unicode has been urged to remove this information, or not to provide it. However, a number of facts lead me to suppose that software marketing reasons are involved.

I probably need to underscore that I’m not here to disturb business, but to try to help improve user experience, worktool usefulness, and overall productivity.

Regards,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/45c45fea/attachment.html>

From richard.wordingham at ntlworld.com  Tue Jul 21 01:56:33 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 21 Jul 2015 07:56:33 +0100
Subject: Chinese Word Breaking
Message-ID: <20150721075633.2e76fcab@JRWUBU2>

I'm puzzled by a statement in UAX #29 Unicode Text Segmentation:

"In particular, the characters with the Line_Break property values of
Contingent_Break (CB), Complex_Context (SA/Southeast Asian), and
Unknown (XX) are assigned word boundary property values based on
criteria outside of the scope of this annex. That means that
satisfactory treatment of languages like Chinese or Thai requires
special handling."

Is 'Contingent_Break (CB)' an error for 'Ideographic (ID)'?  That would
make sense for Chinese, for some applications need to group ideographs
into words.

While I am on the topic, does anyone know of character level
mechanisms used to advise algorithms of the word boundaries (or lack
of boundaries) in Chinese text? 

Richard.

From charupdate at orange.fr  Tue Jul 21 03:46:33 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Tue, 21 Jul 2015 10:46:33 +0200 (CEST)
Subject: UTF-8 display (was: Re: a mug)
In-Reply-To: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12>
References: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12>
Message-ID: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21>

On 13 Jul 2015, at 11:28, I wrote:

> The only time I saw UTF-8 like on the T-shirt, was when opening UTF-8 files that didn't specify charset=UTF-8. The thing to do was to add the charset in the file header.

Now I see that this issue is much trickier. I've just stumbled over a no-display page instead of (or at the URL of) http://www-01.ibm.com/software/globalization/topics/keyboards/physical.jsp where I read:
Our apologiesâ€¦
while the source as displayed by Firefox shows:
charset=utf-8

Our apologies
(The markup comes from the header 1 tags.)

The trick is that the real HTML file as saved by Zotero contains:

Our apologies…
(with a U+2026)
and is encoded in... 
charset=windows-1252

Once I changed this to utf-8, the page displayed correctly:
Our apologies…

This may be why people are puzzled by UTF-8 to the extent we've seen.

So I would like to present my apologies to the List, and ask if anyone could help us find the real problem (browsers, web editors, or something else) and how to fix it. I don't think it's a mere HTML issue, as it concerns the Unicode Transformation Format.

Best regards,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/8397f282/attachment.html>

From albrecht.dreiheller at siemens.com  Tue Jul 21 04:12:00 2015
From: albrecht.dreiheller at siemens.com (Dreiheller, Albrecht)
Date: Tue, 21 Jul 2015 09:12:00 +0000
Subject: AW: Security concerns: OGHAM SPACE MARK
In-Reply-To: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
Message-ID: <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net>

Allowing arbitrary non-ASCII characters in programming languages will make it more difficult
to detect malicious code.
If the author really intends to deceive potential readers he will succeed.

Programming languages like JS should at least implement exclusion rules from the "Unicode Confusables Characters" list.
Otherwise such programming languages ought to be black-listed.

Albrecht.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/fc17c543/attachment.html>

From c933103 at gmail.com  Tue Jul 21 05:10:14 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Tue, 21 Jul 2015 18:10:14 +0800
Subject: Chinese Word Breaking
In-Reply-To: <20150721075633.2e76fcab@JRWUBU2>
References: <20150721075633.2e76fcab@JRWUBU2>
Message-ID: <CAGHjPP+MH6KYwLcC99X5+cxL+S0TzUgt42yMF0ONwNVCk1D7SQ@mail.gmail.com>

When you write text in modern Chinese, there are no breaks between
different words. So if you segment characters according to the
ideographic characters, what gets grouped together would be either a
clause or a sentence, or even a whole paragraph if you are handling some
older text without punctuation.

Also, that group of characters is not used solely by modern standard
Chinese. For example, in Japanese there are four-character expressions
that mix ideographs and hiragana and are generally treated as one word.
Similarly, Taiwanese (nan) users also write Latin letters together with
these ideographs to form words. In
these cases, if you change it to ID, then what you are selecting would just
be part of the word.

And at the character level you can't even tell what language a character is
written in, let alone tell which characters form a word. In
fact, in Literary Chinese (lzh), most of these characters can be considered
words in themselves.
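The characters themselves carry no word-boundary information, so segmenters typically rely on external data such as a dictionary. A common baseline, not something UAX #29 specifies, is greedy forward maximum matching. The Python sketch below uses a tiny made-up dictionary and a hypothetical `forward_max_match` helper.

```python
def forward_max_match(text: str, dictionary: set, max_len: int = 4) -> list:
    """Greedy left-to-right segmentation: at each position take the
    longest dictionary word, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if size == 1 or chunk in dictionary:
                words.append(chunk)
                i += size
                break
    return words

# Hypothetical toy dictionary; real segmenters use large lexicons.
DICT = {"我们", "喜欢", "中文", "输入法"}
print(forward_max_match("我们喜欢中文输入法", DICT))
```

Greedy matching mis-segments ambiguous strings, which is why production segmenters add statistics or models. For Literary Chinese, single-character "words" would dominate anyway.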
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/fd30f9e6/attachment.html>

From prosfilaes at gmail.com  Tue Jul 21 05:45:40 2015
From: prosfilaes at gmail.com (David Starner)
Date: Tue, 21 Jul 2015 10:45:40 +0000
Subject: Security concerns: OGHAM SPACE MARK
In-Reply-To: <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net>
References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
 <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net>
Message-ID: <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com>

On Tue, Jul 21, 2015 at 2:14 AM Dreiheller, Albrecht <
albrecht.dreiheller at siemens.com> wrote:

> If the author really intends to deceive potential readers he will succeed.
>

Possibly. Code is hard. But the Ogham space is not a real threat; it's easy
to search for and obviously a deliberate attempt to confuse.


> Programming languages like JS should at least implement exclusion rules
> from the "Unicode Confusables Characters" list.
>

Have you looked at that list? 1 and l is one pair of confusables in that
list, and while that is an incredibly classic confusable pair, it's not one
that's implementable in a programming language. а (U+0430 CYRILLIC SMALL
LETTER A) and a is another pair; but if you ban а, you've practically
banned Cyrillic identifiers completely.
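Rather than banning every confusable character, a narrower check is to flag identifiers that mix scripts. That catches a Cyrillic а hidden in a Latin word. The Python sketch below guesses a letter's script from the first word of its Unicode name. This is a crude heuristic assumed for illustration, not the UTS #39 mixed-script or confusable detection algorithm. It would also flag legitimate mixes such as Japanese kanji with hiragana.

```python
import unicodedata

def scripts_in(identifier: str) -> set:
    """Rough script guess: the first word of each letter's Unicode name,
    e.g. 'LATIN' or 'CYRILLIC'. Digits and '_' are ignored."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in identifier if ch.isalpha()}

def is_mixed_script(identifier: str) -> bool:
    """True if the letters of `identifier` come from more than one script."""
    return len(scripts_in(identifier)) > 1

for ident in ("data", "d\u0430ta", "данные"):
    print(ident, is_mixed_script(ident))
```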


>
> Otherwise such programming languages ought to be black-listed.
>

Black-listed? By whom? If you wish to make sure a set of code you control
does not use non-ASCII characters, most source-control systems will let you
reject such files from being checked in. If you want to reject JavaScript
altogether, that is also your freedom. But of all the attacks weighed
against JavaScript, I seriously doubt that this is the one that will bring
it down.

As note for confusable code, let me point out this code that someone tried
to illicitly push into the Linux CVS back in 2003:

    if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
                     retval = -EINVAL;

the all-ASCII trick being that current->uid is being set to zero, not
checked. It would be much easier to find any sort of Unicode trick than a
backdoor like that in a sufficiently large body of code.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/34687d9f/attachment.html>

From philip_chastney at yahoo.com  Tue Jul 21 07:49:52 2015
From: philip_chastney at yahoo.com (philip chastney)
Date: Tue, 21 Jul 2015 05:49:52 -0700
Subject: UTF-8 display (was: Re: a mug)
In-Reply-To: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21>
Message-ID: <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com>

so the webmaster put up the page, declaring the charset to be UTF-8...

but what charset was being used by the guy who knocked out the HTML?

it could be more complicated than that: maybe the page was produced using UTF-8, 
somebody reads the page using, say, Windows-1252, and "converts" it to UTF-8

I'm sure, with a little effort, ever more complicated scenarii could be constructed
--  it's amazing what can be achieved when arrogance and ignorance are combined

/phil



From charupdate at orange.fr  Tue Jul 21 08:45:24 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Tue, 21 Jul 2015 15:45:24 +0200 (CEST)
Subject: UTF-8 display (was: Re: a mug)
In-Reply-To: <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com>
References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21>
 <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com>
Message-ID: <457130915.11820.1437486324730.JavaMail.www@wwinf1n18>

On 21 Jul 2015, at 14:49, philip chastney 
wrote:

> so the webmaster put up the page, declaring the charset to be UTF-8...
> 
> but what charset was being used by the guy who knocked out the HTML?
> 
> it could be more complicated than that: maybe the page was produced using UTF-8, 
> somebody reads the page using, say, WIndows 1252, and "converts" it to UTF-8
> 
> I'm sure, with a little effort, ever more complicated scenarii could be constructed
> -- it's amazing what can be achieved when arrogance and ignorance are combined


I fear things have grown somewhat upside down, so I'll try to outline the real scenario:

1 - I open the page, and the horizontal ellipsis is displayed as â€¦ (of course I don't know yet that it's a horizontal ellipsis...).
2 - I remember my comment about the T-shirt and decide to check whether it's accurate. Firefox shows me the page is in UTF-8 and that there is nothing after "Our apologies".
3 - After some trial and error, I save the page in Zotero and open the folder. The only HTML file inside is declared as Windows-1252, and there is the horizontal ellipsis.
4 - I back up the original file, try modifying the charset value to utf-8 and refresh the page, and the â€¦ converts to a horizontal ellipsis.

To answer your questions, I figure that the page was written on a Windows-1252 template but without sticking with this charset. U+2026 was probably an autocorrect. So it was "produced using UTF-8", but "the webmaster" must have published it under the old charset.

The puzzling point is that Firefox tried UTF-8 and told me it was serious about it, but "ate" the U+2026, while it used the native Windows-1252 to "display" it...

I hope that some macro could enable the "webmasters" to rapidly update websites, because resolving this "funny" "scenario" has cost me some "effort" today!

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/fb05d7ce/attachment.html>

From tom at bluesky.org  Tue Jul 21 09:00:45 2015
From: tom at bluesky.org (Tom Gewecke)
Date: Tue, 21 Jul 2015 10:00:45 -0400
Subject: UTF-8 display (was: Re: a mug)
In-Reply-To: <457130915.11820.1437486324730.JavaMail.www@wwinf1n18>
References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21>
 <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com>
 <457130915.11820.1437486324730.JavaMail.www@wwinf1n18>
Message-ID: <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org>

The IBM page seems to have an ellipsis character in UTF-8, with bytes E2 80 A6.  The web server is set to force all browsers to use the encoding iso-8859-1 regardless of what charset is stipulated in the html code.  The browser uses the Win 1252 equivalents and displays â€¦

To see what a web server is forcing, if anything, you can use

http://web-sniffer.net/
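The mechanism described above can be reproduced in a few lines of Python. The UTF-8 bytes of U+2026 HORIZONTAL ELLIPSIS, decoded as Windows-1252 (which browsers use when a server declares iso-8859-1), produce the three-character mojibake, and re-encoding reverses it. This only illustrates the byte-level mix-up. It is not a general repair tool, since Python's cp1252 codec leaves a few byte values (such as 0x81) undefined.

```python
ellipsis = "\u2026"                    # U+2026 HORIZONTAL ELLIPSIS
utf8_bytes = ellipsis.encode("utf-8")  # the bytes actually served: E2 80 A6

# What a browser shows when it decodes those bytes as Windows-1252.
garbled = utf8_bytes.decode("cp1252")
print(utf8_bytes.hex(" "), "->", garbled)

# Reversing the mistake: back to the original bytes, then decode as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == ellipsis)
```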


On Jul 21, 2015, at 9:45 AM, Marcel Schneider wrote:

> 
> I fear things have grown somewhat upside down, so I'll try to outline the real scenario:
> 
> 1 - I open the page, the horizontal ellipsis is displayed as â€¦ (of course I don't know yet that it's a horizontal ellipsis...).
> 2 - I remember my comment about the T-shirt and decide to check whether it's accurate. Firefox shows me the page is in UTF-8 and that there is nothing after "Our apologies".
> 3 - After some trial and error, I save the page in Zotero and open the folder. The only HTML file inside is declared as Windows-1252, and there is the horizontal ellipsis.
> 4 - I back up the original file, try modifying the charset value to utf-8 and refresh the page, the â€¦ converts to a horizontal ellipsis.
> 
> To answer your questions, I figure out that the page was written on a Windows-1252 template but without sticking with this charset. U+2026 was probably an autocorrect. So it was "produced using UTF-8" but "the webmaster" must have published it under the old charset.
> 
> The puzzling point is that Firefox tried UTF-8 and told me he's serious, but "ate" the U+2026 while it used the native Windows-1252 to "display" it...
> 
> I hope that some macro could enable the "webmasters" to rapidly update websites, because resolving this "funny" "scenario" has cost me some "effort" today!
> 
> Marcel
> 


From doug at ewellic.org  Tue Jul 21 11:33:17 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 21 Jul 2015 09:33:17 -0700
Subject: Plain text custom fraction input
Message-ID: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>

As explained in TUS 7.0, §6.2 ("General Punctuation"), p. 273, U+2044
FRACTION SLASH is intended for use with Basic Latin digits, or other
digits with General Category = Nd. The superscript and subscript
presentation forms have General Category = No.
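The General Category distinction cited above can be checked with Python's standard `unicodedata` module. A small sketch; the sample characters are chosen for illustration:

```python
import unicodedata

# Characters to compare: two decimal digits (ASCII and Arabic-Indic),
# superscript and subscript three, and U+2044 FRACTION SLASH itself.
samples = ["3", "\u0663", "\u00b3", "\u2083", "\u2044"]

for ch in samples:
    # General Category: Nd = decimal digit, No = other number, Sm = math symbol
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```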

--
Doug Ewell | http://ewellic.org | Thornton, CO ????


From gwalla at gmail.com  Tue Jul 21 15:54:39 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Tue, 21 Jul 2015 13:54:39 -0700
Subject: Emoji: The Movie
Message-ID: <CA+p4_H2OGVxzyrU=aKo6PdixhNfgyoL1BRRoSN3fvpTLA=umrg@mail.gmail.com>

I'm not sure if this is a joke or not:
http://deadline.com/2015/07/emoji-movie-sony-pictures-animation-anthony-leondis-kung-fu-panda-secrets-of-the-masters-1201482768/

From doug at ewellic.org  Tue Jul 21 16:05:20 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 21 Jul 2015 14:05:20 -0700
Subject: Emoji: The Movie
Message-ID: <20150721140520.665a7a7059d7ee80bb4d670165c8327d.75b6c5f170.wbe@email03.secureserver.net>

Garth Wallace <gwalla at gmail dot com> wrote:

> I'm not sure if this is a joke or not:

Yes.

--
Doug Ewell | http://ewellic.org | Thornton, CO ????


From albrecht.dreiheller at siemens.com  Tue Jul 21 16:55:05 2015
From: albrecht.dreiheller at siemens.com (Dreiheller, Albrecht)
Date: Tue, 21 Jul 2015 21:55:05 +0000
Subject: AW: Security concerns: OGHAM SPACE MARK
In-Reply-To: <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com>
References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
 <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net>
 <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com>
Message-ID: <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net>

On Tue, Jul 21, 2015 at 12:46 David Starner [mailto:prosfilaes at gmail.com] wrote:

On Tue, Jul 21, 2015 at 2:14 AM Dreiheller, Albrecht <albrecht.dreiheller at siemens.com> wrote:
If the author really intends to deceive potential readers he will succeed.
Possibly. Code is hard. But the Ogham space is not a real threat; it's easy to search for and obviously a deliberate attempt to confuse.

My concern is not about the Ogham space, but about the free use of non-ASCII in programming languages in general.
Just imagine: when you decide to open a door for public traffic in a busy city with a security checkpoint, you wouldn't consider only
how to check a single person; instead, you have to consider how you would check thousands of people within one hour, if you don't plan to
close the door again.
Therefore, consider a huge software system developed in, let's say, Serbia or Russia, using Cyrillic names throughout for classes and variables.
int ?????? = ???????(?????????);  return ??????;
It might be a valuable system with some unique features and you want to evaluate the source code before you buy it.
Or the community wants to adopt it as Open Source because it has some nice features.
Looking for a deliberate attempt to confuse within this code would be like looking for a needle in a haystack, since every line has non-ASCII in it.
 Programming languages like JS should at least implement exclusion rules from the "Unicode Confusables Characters" list.
Have you looked at that list? 1 and l is one pair of confusables in that list, and while that is an incredibly classic confusable pair,
it's not one that's implementable in a programming language. а and a is another pair; but if you ban а, you've practically banned Cyrillic identifiers completely.
Of course, there are confusables within the ASCII range, but they have been well known for years, and thus are more likely to be detected.
Regarding your other example, some compilers warn if you have an assignment within an if-clause.
I used the term "exclusion rules", meaning a rule set based on the confusables list.
For example, the following code sequence
           int a;  {  int а;  a = 5;  }      (N.B. the second "а" is Cyrillic)
could be banned by a rule saying
"It's not allowed to declare a variable that is DISTINCT from others (thus not hiding them) but which is CONFUSABLY SIMILAR  to another variable in the same scope."
Another rule could demand "It's not allowed to mix two alphabets within one name".
This would not ban Cyrillic identifiers in general.
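The second rule can be approximated in a few lines of Python. This sketch is an assumption-laden stand-in: it guesses a letter's script from the first word of its Unicode character name, which is not the Unicode Script property. Real checks (such as the mixed-script detection in UTS #39) use the Script and Script_Extensions data and allow legitimate mixtures like Japanese kanji with kana, which this heuristic would wrongly flag. The function names are hypothetical.

```python
import unicodedata


def letter_scripts(identifier: str) -> set[str]:
    """Approximate the set of scripts used by the letters of an identifier.

    Heuristic (assumption): take the first word of each letter's name, e.g.
    'LATIN SMALL LETTER A' -> 'LATIN', 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'.
    Digits, underscores and other non-letters are ignored.
    """
    scripts = set()
    for ch in identifier:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier: str) -> bool:
    """Rule: two alphabets must not be mixed within one name."""
    return len(letter_scripts(identifier)) > 1


print(is_mixed_script("mybank"))                    # False
print(is_mixed_script("myb\u0430nk"))               # True: Cyrillic а among Latin
print(is_mixed_script("\u0437\u043d\u0430\u0447"))  # False: all Cyrillic
```

Note that an all-Cyrillic identifier passes, so the rule does not ban Cyrillic identifiers as such.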
Otherwise such programming languages ought to be black-listed.
Black-listed? By whom? If you wish to make sure a set of code you control does not use non-ASCII characters, most source-control systems will let you reject such files from being checked in. If you want to reject JavaScript altogether, that is also your freedom. But of all the attacks weighed against JavaScript, I seriously doubt that this is the one that will bring it down.
With "black-listed" I meant "known to be unsafe" in some way.
Just the same way as domain-registration authorities  would be  "known to be unsafe"   if they  accept or allow domain names
like    myb?nk.com   beside   mybank.com  where one has a Latin "a" and the other has a Cyrillic  "?"  in it,  thus ignoring the confusables list.
BTW,  I don't want to attack JavaScript.  It's pretty.

The fathers of ALGOL and other early languages racked their brains to avoid ambiguous semantics caused by poor syntax rules.
Today, when Unicode supersedes ASCII in some contexts, the challenges are different, but no less important.

Albrecht.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/bcc562f3/attachment.html>

From asmus-inc at ix.netcom.com  Tue Jul 21 18:06:45 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Tue, 21 Jul 2015 16:06:45 -0700
Subject: AW: Security concerns: OGHAM SPACE MARK
In-Reply-To: <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net>
References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
 <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net>
 <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com>
 <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net>
Message-ID: <55AED085.2040109@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/3945cc16/attachment.html>

From prosfilaes at gmail.com  Tue Jul 21 18:29:33 2015
From: prosfilaes at gmail.com (David Starner)
Date: Tue, 21 Jul 2015 23:29:33 +0000
Subject: Security concerns: OGHAM SPACE MARK
In-Reply-To: <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net>
References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44>
 <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net>
 <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com>
 <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net>
Message-ID: <CAMZ=zj7mUn911+Xavqr_Tj4HVsfByaXh+tTB+zmEnW9=eoBAfg@mail.gmail.com>

On Tue, Jul 21, 2015 at 2:55 PM Dreiheller, Albrecht <
albrecht.dreiheller at siemens.com> wrote:


> My concern is not about the Ogham space, but about the free usage of
non-Ascii in programming languages in general.

> Just imagine, when you decide to open a door for public traffic in busy
city with a security check point, you wouldn't consider only how to check a
single person; instead, you have to consider how you would check thousands
of people within one hour, if you don't plan to close the door again.

There is no way to check thousands of people in an hour through a door
that's a security check point. That's why few places have security check
points. That's comparable; it's very hard to check any significant body of
code at any speed, so it's a rare issue.

> Therefore, consider a huge software system written developed in, let's
say, Serbia or Russia using Cyrillic names throughout for classes and
variables.

> int ?????? = ???????(?????????); return ??????;

Then do what you need to do. Transliterate the Serbian characters, see if
it works any differently. The language (in any character set) is going to
be a large barrier for a lot of audiences, but that's what it is.

> Looking for a deliberate attempt to confuse within this code would be
like looking for a needle in a haystack, since every line has non-Ascii in
it.

Looking for a deliberate attempt to confuse in code is like looking for a
needle in a haystack. If those two lines shown in my last post had been
hidden in a million line kernel, they would have been rather hard to find,
particularly if the kernel wasn't warning-clean.

> I used a term "exclusion rules", meaning a ruleset bases on the
confusables list.

The first step is probably to implement it as a lint-type program. Then discuss it
with the compiler writers of the languages you're worried about. As I've
said above, I don't see this as a huge concern for most real-life programs,
since the attack surface is huge.
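A lint-type check for the first proposed rule (distinct but confusable names) might look like the sketch below. It compares "skeletons", the idea behind Unicode's confusables data in UTS #39, but the five-entry mapping here is a hypothetical hand-picked sample, not the real data file, and the function names are illustrative.

```python
# Hypothetical five-entry sample of Cyrillic-to-Latin lookalikes. A real lint
# tool would instead build skeletons from Unicode's confusables data (UTS #39).
CONFUSABLE_SAMPLE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(identifier: str) -> str:
    """Map each character to its lookalike so confusable names compare equal."""
    return "".join(CONFUSABLE_SAMPLE.get(ch, ch) for ch in identifier)


def confusable_pairs(identifiers: list[str]) -> list[tuple[str, str]]:
    """Return pairs of distinct identifiers that share the same skeleton."""
    by_skeleton: dict[str, list[str]] = {}
    pairs = []
    for name in dict.fromkeys(identifiers):  # drop duplicates, keep order
        key = skeleton(name)
        for earlier in by_skeleton.get(key, []):
            pairs.append((earlier, name))
        by_skeleton.setdefault(key, []).append(name)
    return pairs


# The names declared in the example above: Latin "a" and Cyrillic "а".
print(confusable_pairs(["a", "\u0430", "b"]))
```

Such a tool can run outside the compiler, which fits the suggestion that no language support is required.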

> With "black-listed" I meant "known to be unsafe" in some way.

I.e. JavaScript. C. C++. A huge amount of existing and still-in-use code is
written in C, whose buffer overruns are a notorious source of security
holes. It seems like a much better candidate to be black-listed, if anyone
were capable of such a thing.

> The fathers of ALGOL and other early languages racked their brain to
avoid ambigous semantics caused by poor syntax rules.

Published examples of ALGOL 60 are unreadable, and their correctness is very
hard to verify; a modern reader will generally have to start by reformatting
the code, and then replacing GOTOs with loops and ifs, and finding better
variable names, if they want to know what's going on.

We've increased code clarity hugely, but reading large amounts of code is
still hard, hard enough that I see stressing about deliberate deception as
a narrow market.

This is not something that really needs language support; it can be done in
compilers and editors and lint-type programs without that support.

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/514472b4/attachment.html>

From richard.wordingham at ntlworld.com  Tue Jul 21 18:33:34 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 22 Jul 2015 00:33:34 +0100
Subject: Chinese Word Breaking
In-Reply-To: <CAGHjPP+MH6KYwLcC99X5+cxL+S0TzUgt42yMF0ONwNVCk1D7SQ@mail.gmail.com>
References: <20150721075633.2e76fcab@JRWUBU2>
 <CAGHjPP+MH6KYwLcC99X5+cxL+S0TzUgt42yMF0ONwNVCk1D7SQ@mail.gmail.com>
Message-ID: <20150722003334.2b5e6b94@JRWUBU2>

On Tue, 21 Jul 2015 18:10:14 +0800
gfb hjjhjh <c933103 at gmail.com> wrote:

> When you write text in modern Chinese, there will not be any break
> between different words, and thus if you segment characters according
> to the ideographic characters, what gets grouped together would
> be either a clause or a sentence, or even a whole paragraph if you
> are handling some older text without punctuation.

I had another look at Chinese word breaking algorithms today and saw
that their practical purposes were mostly indexing and machine
translation.  Consequently, I suspect that authors have little
incentive to mark word boundaries in the texts they originate.  This
differs from the Thai situation where marking word boundaries improves
layout and spell-checking.
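For readers unfamiliar with such algorithms, forward maximum matching is one of the simplest dictionary-based segmentation techniques. The sketch below only illustrates the idea; it makes no claim about which algorithms the surveyed systems use. The four-word dictionary is a hypothetical toy, and greedy matching is known to mis-segment ambiguous strings.

```python
def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word, falling back to a single character if none matches."""
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; length 1 always succeeds.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


toy_dictionary = {"我们", "喜欢", "学习", "中文"}  # hypothetical toy lexicon
print(forward_max_match("我们喜欢学习中文", toy_dictionary))
```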

Richard.

From charupdate at orange.fr  Wed Jul 22 01:38:42 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 22 Jul 2015 08:38:42 +0200 (CEST)
Subject: UTF-8 display (was: Re: a mug)
In-Reply-To: <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org>
References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21>
 <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com>
 <457130915.11820.1437486324730.JavaMail.www@wwinf1n18>
 <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org>
Message-ID: <899256744.2177.1437547122576.JavaMail.www@wwinf1f21>

On 21 Jul 2015, at 16:00, Tom Gewecke  wrote:

> The IBM page seems to have an ellipsis character in UTF-8, with bytes E2 80 A6. The web server is set to force all browsers to use the encoding iso-8859-1 regardless of what charset is stipulated in the html code. The browser uses the Win 1252 equivalents and displays â€¦
> 
> To see what a web server is forcing, if anything, you can use
> 
> http://web-sniffer.net/


Thank you. So the file I get when saving the page is a modified one. The workaround, then, if I understand correctly, is to let web-sniffer check whether the server is forcing an inconsistent encoding:
| Content-Type: text/html;charset=ISO-8859-1
Then save the page...
| meta http-equiv="Content-Type" content="text/html; charset=windows-1252"
...and reset the charset to the value shown in the source code:
| meta http-equiv="Content-Type" content="text/html; charset=utf-8"
Then open this.
That's very useful!
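The relabelling step can be scripted. This is a minimal Python sketch that assumes, as the experiment above suggests, that the saved copy still contains the page's original UTF-8 bytes and only the `<meta>` declaration needs changing. The regular expression is a simplification covering this one declaration form, and `relabel_as_utf8` is a hypothetical name.

```python
import re

# Simplified pattern: a <meta ...> tag whose charset is windows-1252.
META_CHARSET = re.compile(rb"(<meta[^>]*charset=)windows-1252", re.IGNORECASE)


def relabel_as_utf8(html_bytes: bytes) -> bytes:
    """Rewrite only the declared charset; the text bytes are left unchanged."""
    return META_CHARSET.sub(rb"\1utf-8", html_bytes)


saved = (
    b'<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
    b"Our apologies\xe2\x80\xa6"
)
fixed = relabel_as_utf8(saved)
print(fixed.decode("utf-8"))
```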

Have a great day,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150722/7d46b03f/attachment.html>

From c933103 at gmail.com  Wed Jul 22 01:46:57 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Wed, 22 Jul 2015 14:46:57 +0800
Subject: Chinese Word Breaking
In-Reply-To: <20150722003334.2b5e6b94@JRWUBU2>
References: <20150721075633.2e76fcab@JRWUBU2>
 <CAGHjPP+MH6KYwLcC99X5+cxL+S0TzUgt42yMF0ONwNVCk1D7SQ@mail.gmail.com>
 <20150722003334.2b5e6b94@JRWUBU2>
Message-ID: <CAGHjPP+956vDSxuhYBH=UYtkrUj0RGgHFtw0U7JHvszOWLkuYw@mail.gmail.com>

Pretty much so, and IMO it is actually quite unnatural to write Chinese
with word boundaries marked. Even in cases like machine
translation, people would expect the translation engine to figure out how
characters should be grouped into words on its own, without any markup for
word boundaries, just as when you type a sentence into a machine
translator, you would not expect it to ask you or show
you which part is the subject and which part is the verb, etc.

btw, you might want to look up the GB/T 13715 standard from mainland China
(PRC) or the CNS 14366 standard from Taiwan (ROC) for standards that
discuss how to handle word segmentation when processing Chinese with
technology.
On 22 Jul 2015 at 7:37 AM, "Richard Wordingham" <richard.wordingham at ntlworld.com> wrote:

> On Tue, 21 Jul 2015 18:10:14 +0800
> gfb hjjhjh <c933103 at gmail.com> wrote:
>
> > When you write text in modern Chinese, there will not be any break
> > between different words, and thus if you segment characters according
> > to the ideographic characters, what gets grouped together would
> > be either a clause or a sentence, or even a whole paragraph if you
> > are handling some older text without punctuation.
>
> I had another look at Chinese word breaking algorithms today and saw
> that their practical purposes were mostly indexing and machine
> translation.  Consequently, I suspect that authors have little
> incentive to mark word boundaries in the texts they originate.  This
> differs from the Thai situation where marking word boundaries improves
> layout and spell-checking.
>
> Richard.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150722/a1a771df/attachment.html>

From charupdate at orange.fr  Wed Jul 22 02:00:38 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 22 Jul 2015 09:00:38 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
Message-ID: <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>

On 21 Jul 2015, at 18:42, Doug Ewell  wrote:

> As explained in TUS 7.0, §6.2 ("General Punctuation"), p. 273, U+2044
> FRACTION SLASH is intended for use with Basic Latin digits, or other
> digits with General Category = Nd. The superscript and subscript
> presentation forms have General Category = No.

What bugs me is that this kerning fraction slash is presented to us as meant to be used with plain digits, which overlap the fraction slash in proportional fonts. That recommendation is inconsistent with plain text encoding. Following TUS, anybody who uses U+2044 must use a fraction formatting feature. I know this from the time I had the valid demo version of some desktop publishing software. The feature wasn't triggered by the fraction slash; on the other hand, it worked with the common slash U+002F too. It's a formatting command like superscript or underline.

Could anybody explain to us why the font designers of Arial Unicode MS and DejaVu Serif / DejaVu Sans defined matching glyphs that allow users to compose professional-looking fractions in plain text, without any need for the high-end formatting specified in TUS? I'm inclined to believe that any proportional font that complies fully with TUS works the same way. But this fact is hidden in the Standard.

I can't believe that Unicode didn't think about this usage. If it really didn't, the invention of the fully operational fraction slash is wholly to the credit of innovative font designers. Why is this invention not being welcomed?

This is why I suggested completing exactly this section of the Standard, and why I finally decided to bring it to the attention of the Mailing List. I hope that a large majority will allow Unicode to complete this point.

Thank you for your feedback.

Have a nice day,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150722/8bb37020/attachment.html>

From charupdate at orange.fr  Wed Jul 22 02:22:58 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 22 Jul 2015 09:22:58 +0200 (CEST)
Subject: Global apostrophe solution? (Part of: A new take on the English
 apostrophe in Unicode; Keyman Developer for free?; Input methods at the age
 of Unicode)
Message-ID: <113034332.3118.1437549778585.JavaMail.www@wwinf1f21>

On Mon, Jun 15, 2015 at 10:19 AM, Mark Davis ??  wrote: 

> More seriously, it is not all so black and white. 

This applies to apostrophe recommendations too. The thread about the English apostrophe was biased because it (I) ended up discussing Unicode’s general apostrophe recommendation, while the scope of the thread was originally limited to one language. Above all, the discussion was somewhat biased by not taking into consideration the following TUS statement (§ 6.2 Punctuation Apostrophe):

| The semantics of U+2019 are therefore context dependent. For example, if surrounded by
| letters or digits on both sides, it behaves as an in-text punctuation character and does not
| separate words or lines.
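The quoted rule can be sketched as a word-matching regular expression. This Python sketch is a simplification for illustration only (Python's `\w` also matches `_`), not the full word-boundary algorithm of UAX #29:

```python
import re

# A word is a run of word characters, optionally joined by U+2019 only when
# the mark has word characters on both sides, as in the TUS rule above.
WORD = re.compile(r"\w+(?:\u2019\w+)*")


def words(text: str) -> list[str]:
    return WORD.findall(text)


print(words("don\u2019t \u2018quote\u2019 rock\u2019n\u2019roll"))
```

The trade-off is visible at word edges: a leading or trailing U+2019 (an elision like "dogs’") is dropped, because by this rule alone it cannot be told apart from a closing quotation mark.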

I may be wrong, of course, but I’m currently thinking that U+02BC is not needed to prevent word separation. As U+02BC is missing in most fonts and on all native Latin Windows keyboards, it cannot be used, even as a letter, until we have resolved some problems.
Please see the advice of User:Gholton in the very last paragraph of https://en.wikipedia.org/wiki/Talk:Gwich%27in_people

Moreover, if it exists in a given font, U+02BC mostly looks like U+2019, slanted if the latter is slanted (as in Tahoma, Segoe UI, Open Sans, Sakkal Majalla), and thus does not match some expectations as stated on a web page I already cited:
http://www.languagegeek.com/typography/apostrophes.html

The only fonts I found where U+02BC is a bit smaller than U+2019 are Linux Biolinum G, Gentium Basic, and Gentium Book Basic. If this difference in size matches the preferences of native English readers, U+02BC could be preferred in English typography.

Another bias of the apostrophe thread was that it focused on disambiguation for text processing only, whereas disambiguation is more generally an issue for human readers, which needs to be resolved at the glyph level, and which goes back very far into the past. See again http://www.languagegeek.com/typography/apostrophes.html#Anchor-Potentia-61409, the last section, where Potential Problems are resolved.

Along with adding some missing information to the Standard about disambiguating quotation quotes and scare quotes, we’ll end up with language-specific recommendations for the apostrophe, as for the quotation marks. As for the mix-up between scare quotes and quotation quotes: my last sentence yesterday contained a lot of quotes that looked like scare quotes but marked quotations. Let’s take this handy example:

> I hope that some macro could enable "webmasters" to rapidly update websites, because resolving this "funny" "scenario" has cost me some "effort" today!

I’m not going to put webmasters between scare quotes! The quotes in _"webmasters"_ indicate that I’m quoting somebody who started talking about webmasters.
That goes on with "funny", a word that is often scare-quoted, but here it is simply a quotation from “Re: a mug”, where such phenomena looked rather funny (on a mug), I was told.
Again, "scenario" and "effort" are two more quotations from the e-mail I was responding to.

Put simply: in English we should follow the example of French and German speakers, who distinguish quotations from scare quotes by using angle quotation marks for the former and comma quotation marks for the latter, even though the latter are considered “English” (I’m quoting) in France, so that French typographers in particular are reluctant to use them, thus generating exactly the same irritating mix-up where one is often unsure whether the author is serious or not. But serious journalism systematically differentiates quotations from scare quotes. This is common usage in print and web news media from roughly all publishers.

In current French and German usage, single quotes are nearly nonexistent, even though U+2019 is unambiguously an apostrophe in German. Primary quotations are always in double quotes (comma style or angle style), and a nested closing quote never looks like an apostrophe. When the goal is to help text reading and text handling, would using angle quotation marks for quotations not be a good idea? I would add that personally I consider these marks more respectful towards the authors being quoted, as well as towards readers, who need to understand unambiguously how it’s meant.

Possibly there could be different recommendations; for example, in German U+2019 is preferred for the apostrophe, in French too, and the use of U+2018 should be strongly discouraged, as it should be in English too when U+2019 is preferred for the apostrophe; otherwise, following user preferences, U+02BC can be preferred for this. U+00AB and U+00BB would be preferred for quotations, U+2039 and U+203A for nested quotations, and U+201C - U+201D for markup that does not mean a quotation. The same should be recommended for all languages that don’t already differentiate visually between the two meanings of quotation marks, because they don't already use angle quotes or comma quotes.

For input, rather than (as I had in mind) a layout with U+02BC on E00 (because this key is too peripheral for an often-used character, and the grave accent is used in TeX), a smart keyboard layout is needed, with an *apostrophe toggle* that alternately produces U+0027, U+2019, or U+02BC on the same apostrophe key, and another independent or related toggle that makes the < and > keys produce the « and » quotes. Such keyboards can be programmed using Keyman Developer. Keyman uses a powerful language to define flexible layouts, including an unlimited number of toggles, which may have more than two states. See http://www.unicode.org/mail-arch/unicode-ml/y2015-m07/0146.html
Keyman is the solution for what I expected a keyboard layout to do, which is very hard (or even impossible) to achieve with the OS-level keyboard drivers I am programming for Windows.
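To make the toggle behaviour concrete, here is a Python model of the two toggles described above. It is not Keyman's own keyboard language, and the class and method names are hypothetical; it only shows the state logic such a layout would need.

```python
class ToggleLayout:
    """Hypothetical model of an apostrophe toggle plus an angle-quote toggle."""

    # U+0027 APOSTROPHE, U+2019 RIGHT SINGLE QUOTATION MARK,
    # U+02BC MODIFIER LETTER APOSTROPHE, in toggle order.
    APOSTROPHES = ("\u0027", "\u2019", "\u02bc")

    def __init__(self) -> None:
        self.apostrophe_state = 0   # index into APOSTROPHES
        self.angle_quotes = False   # when True, < and > give « and »

    def toggle_apostrophe(self) -> None:
        # Cycle through the three states, wrapping back to U+0027.
        self.apostrophe_state = (self.apostrophe_state + 1) % len(self.APOSTROPHES)

    def toggle_angle_quotes(self) -> None:
        self.angle_quotes = not self.angle_quotes

    def output(self, key: str) -> str:
        """Return the character produced by pressing `key` in the current state."""
        if key == "'":
            return self.APOSTROPHES[self.apostrophe_state]
        if self.angle_quotes and key == "<":
            return "\u00ab"  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
        if self.angle_quotes and key == ">":
            return "\u00bb"  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
        return key


layout = ToggleLayout()
layout.toggle_apostrophe()
print(layout.output("'"))  # ’ (U+2019)
```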

As a keyboard layout framework, I recommend Keyman.

Best regards,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150722/63aad7a6/attachment.html>

From richard.wordingham at ntlworld.com  Wed Jul 22 02:52:40 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 22 Jul 2015 08:52:40 +0100
Subject: Plain text custom fraction input
In-Reply-To: <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
Message-ID: <20150722085240.00f61ba2@JRWUBU2>

On Wed, 22 Jul 2015 09:00:38 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> On 21 Jul 2015, at 18;42, Doug Ewell  wrote:
> 
> > As explained in TUS 7.0, §6.2 ("General Punctuation"), p. 273,
> > U+2044 FRACTION SLASH is intended for use with Basic Latin digits,
> > or other digits with General Category = Nd. The superscript and
> > subscript presentation forms have General Category = No.
> 
> That is was bugs me, that this kerning fraction slash is presented to
> us as to be used with plain digits, that overlap the fraction slash
> in proportional fonts. That recommendation is inconsistent with plain
> text encoding. Following TUS, anybody who uses U+2044 must use a
> fraction formatting feature. I know this from the time I'd got the
> valid demo version of some Desktop Publishing software. The feature
> wasn't flagged by the fraction slash, and on the other hand, the
> feature worked with the common slash U+002F too. It's a formatting
> command like superscript or underline.

Implementing FRACTION SLASH is fiddly, and formally it is impossible in
OpenType - the lookup tables can only cope with a finite range
of numerator and denominator lengths.  The next problem is what feature
to put it under.  Microsoft Word is notorious for preventing users from
using ligatures in Latin script text, though that restriction has been
relaxed.

One of the touted capabilities of Microsoft's Universal
Script Engine is the rendering of cartouches for Egyptian hieroglyphs.
However, the interface specification makes no mention of special
handling for them - I can only assume that the capability arises
through the enabling of certain features.  Egyptian hieroglyphs are
currently a simple script - it lacks essential support for writing the
script seen on Egyptian monuments.  (I'm not entirely sure of the
correct bidi classification of the original hieroglyphs - they should
probably be weakly right-to-left, not strongly left-to-right.  Strong
left-to-right may, however, be appropriate for most printed hieroglyphs
- I've even seen plain text hieroglyphs running left to right on a page
whose primary script is Arabic.)

Richard.


From tom at bluesky.org  Wed Jul 22 04:57:49 2015
From: tom at bluesky.org (Tom Gewecke)
Date: Wed, 22 Jul 2015 05:57:49 -0400
Subject: UTF-8 display (was: Re: a mug)
In-Reply-To: <899256744.2177.1437547122576.JavaMail.www@wwinf1f21>
References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21>
 <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com>
 <457130915.11820.1437486324730.JavaMail.www@wwinf1n18>
 <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org>
 <899256744.2177.1437547122576.JavaMail.www@wwinf1f21>
Message-ID: <3AEE94B6-3D82-477A-8B81-75A184AD1F1E@bluesky.org>

Normally you should be able to get correct display in a case like this by just going to the View > Encoding menu of your browser and switching to Unicode UTF-8.


On Jul 22, 2015, at 2:38 AM, Marcel Schneider wrote:

>  The workaround is then, if I understand well, to let web-sniffer check whether the server is forcing an unconsistent encoding:
> | Content-Type: text/html;charset=ISO-8859-1
> Then save the page...
> | meta http-equiv="Content-Type" content="text/html; charset=windows-1252"
> ...and reset the charset to the value shown in the source code:
> | meta http-equiv="Content-Type" content="text/html; charset=utf-8"
> Then open this.
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150722/62138fca/attachment.html>

From charupdate at orange.fr  Wed Jul 22 05:21:32 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 22 Jul 2015 12:21:32 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <20150722085240.00f61ba2@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
Message-ID: <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>

On 22 Jul 2015, at 09:52, Richard Wordingham  wrote:

> Implementing FRACTION SLASH is fiddly, and formally it is impossible in
> OpenType - the lookup tables can only cope with a finite range
> of numerator and denominator lengths. The next problem is what feature
> to put it under. Microsoft Word is notorious for preventing users from
> using ligatures in Latin script text, though that restriction has been
> relaxed.
> 
> One of the touted capabilities of Microsoft's Universal
> Script Engine is the rendering of cartouches for Egyptian hieroglyphs.
> However, the interface specification makes no mention of special
> handling for them - I can only assume that the capability arises
> through the enabling of certain features. Egyptian hieroglyphs are
> currently a simple script - it lacks essential support for writing the
> script seen on Egyptian monuments. (I'm not entirely sure of the
> correct bidi classification of the original hieroglyphs - they should
> probably be weakly right-to-left, not strongly left-to-right. Strong
> left-to-right may, however, be appropriate for most printed hieroglyphs
> - I've even seen plain text hieroglyphs running left to right on a page
> whose primary script is Arabic.)

We have never thought of common hieroglyphs as running other than LTR, while on monuments the great freedom of the script allows them to run in almost all directions. IMO monumental transcription is always difficult to deal with whenever exact rendering is expected. However, since Unicode's purpose is plain text encoding, we must stick with what I consider a convention in Egyptology...

...which brings us back to plain text fractions, which, by an apparent but tacit convention, we can input as an *unlimited* string of superscript digits, followed by U+2044, followed by an *unlimited* string of subscript digits. What are you referring to when talking about implementing the fraction slash? The fonts I've tested successfully are OpenType, at least in the case of Arial Unicode MS. The way the fraction slash is currently implemented was purely a font design issue, which has been brilliantly resolved:
1 - Superscript digits match numerators like they appear in precomposed fractions.
2 - Subscript digits match denominators.
3 - The fraction slash kerns accordingly.
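A minimal Python sketch of the two encodings contrasted in this thread: the superscript + U+2044 + subscript convention described above, and the form TUS recommends (ordinary Nd digits around U+2044, with styling left to the renderer). The helper names `styled_fraction` and `recommended_fraction` are hypothetical; only the code points come from the discussion.

```python
# Hypothetical helpers contrasting two plain-text encodings of a fraction.
FRACTION_SLASH = "\u2044"

# Superscript digits are scattered: U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079.
SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
# Subscript digits are contiguous: U+2080..U+2089.
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + d) for d in range(10))

TO_SUPERSCRIPT = str.maketrans("0123456789", SUPERSCRIPT_DIGITS)
TO_SUBSCRIPT = str.maketrans("0123456789", SUBSCRIPT_DIGITS)


def styled_fraction(numerator: int, denominator: int) -> str:
    """Superscript numerator + FRACTION SLASH + subscript denominator."""
    if numerator < 0 or denominator <= 0:
        raise ValueError("expects a non-negative numerator and a positive denominator")
    return (str(numerator).translate(TO_SUPERSCRIPT)
            + FRACTION_SLASH
            + str(denominator).translate(TO_SUBSCRIPT))


def recommended_fraction(numerator: int, denominator: int) -> str:
    """ASCII digits around FRACTION SLASH; the renderer does the styling."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


print(ascii(styled_fraction(22, 7)))       # '\xb2\xb2\u2044\u2087'
print(ascii(recommended_fraction(22, 7)))  # '22\u20447'
```

Both strings are plain text; the difference is whether the numerator/denominator shapes are baked into the characters or left to font features.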

If this input method is not encouraged, what's the use of U+215F FRACTION NUMERATOR ONE?

About ligatures: replacing ff, fl, ffl with ligatures is typically a rendering engine task, but for backwards compatibility the precomposed ligatures of the Alphabetic Presentation Forms block (U+FB00..U+FB4F) have been encoded in Unicode. What is the relation to plain text fractions, and why are you looking for a feature? The fraction formatting feature I mentioned becomes completely useless when users start typing custom fractions in plain text. That, I suspect, is the origin of the taboo that seems to surround this trick.

If you were to ask me whether I know hieroglyphs: well, I have only just started learning a little. But I launched this thread only for the purpose of Latin plain text: no feature, no bidi-mirroring, just plain text fractions. The skill, if there is any, lies only in getting superscripts, subscripts, and the fraction slash within reach on the keyboard. A good solution is to put them in AltGr on the numpad. So you press Left Ctrl and Left Alt together to get superscripts right on the numpad. Adding Shift, you get the subscripts. Ah, the fraction slash: just press the numpad Divide key after the last numerator digit.

That works because we can program exactly the same shift states for the numpad as for the alphanumeric block. Don't trust the comment in the C source that discourages us from integrating the numpad into the general allocation table, urging us to "put this last" (quotation). I've had no bugs from not following it. Well, it's still "last", but at the bottom of the big table!
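To make the "one big table" idea concrete, here is a small Python model of the numpad layer described above: a single dictionary keyed by (key, modifier state), where Ctrl+Alt (AltGr) on a digit gives a superscript, adding Shift gives a subscript, and AltGr on numpad Divide gives U+2044. It is only a model under stated assumptions: the key names, the bit values, and `type_keys` are hypothetical and are not the structures used in the Windows keyboard-layout C source.

```python
# Hypothetical modifier bits, for this model only.
SHIFT, CTRL, ALT = 1, 2, 4
ALTGR = CTRL | ALT

SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + d) for d in range(10))

# One unified table: (key name, modifier state) -> character produced.
NUMPAD_TABLE: dict[tuple[str, int], str] = {
    ("DIVIDE", 0): "/",
    ("DIVIDE", ALTGR): "\u2044",  # FRACTION SLASH
}
for d in range(10):
    key = f"NUMPAD{d}"
    NUMPAD_TABLE[(key, 0)] = str(d)
    NUMPAD_TABLE[(key, ALTGR)] = SUPERSCRIPT_DIGITS[d]
    NUMPAD_TABLE[(key, ALTGR | SHIFT)] = SUBSCRIPT_DIGITS[d]


def type_keys(strokes: list[tuple[str, int]]) -> str:
    """Translate a sequence of (key, modifiers) strokes into text."""
    return "".join(NUMPAD_TABLE[stroke] for stroke in strokes)


# 3/4 typed as AltGr+3, AltGr+Divide, AltGr+Shift+4
three_quarters = type_keys([("NUMPAD3", ALTGR), ("DIVIDE", ALTGR), ("NUMPAD4", ALTGR | SHIFT)])
print(ascii(three_quarters))  # '\xb3\u2044\u2084'
```

Keeping every shift state for every key in one table is what removes the need for a separate, shorter numpad table.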

Thank you for your feedback.

Have a nice afternoon,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150722/8fbe44a7/attachment.html>

From frederic.grosshans at gmail.com  Wed Jul 22 05:21:43 2015
From: frederic.grosshans at gmail.com (=?UTF-8?B?RnLDqWTDqXJpYyBHcm9zc2hhbnM=?=)
Date: Wed, 22 Jul 2015 12:21:43 +0200
Subject: Machine learning to find the meaning of emojis
Message-ID: <55AF6EB7.9070204@gmail.com>

The following post, by the Instagram engineering team, might be interesting 
for the people on this list who are interested in emoji use in the 
wild. It's an attempt to algorithmically define the meaning of emojis as 
they are used on Instagram.

http://instagram-engineering.tumblr.com/post/117889701472/emojineering-part-1-machine-learning-for-emoji

I find the synonymity of ?? with #fingerscrossed quite funny but 
understandable.

      Frédéric

PS: Found via “All Things Linguistic” aka ??? 
http://allthingslinguistic.com/post/124609017512/emojineering-part-1-machine-learning-for-emoji 
. By the way, this blog post contains the first emojis in italics I ever 
saw.

From charupdate at orange.fr  Wed Jul 22 05:42:30 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 22 Jul 2015 12:42:30 +0200 (CEST)
Subject: UTF-8 display (was: Re: a mug)
In-Reply-To: <3AEE94B6-3D82-477A-8B81-75A184AD1F1E@bluesky.org>
References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21>
 <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com>
 <457130915.11820.1437486324730.JavaMail.www@wwinf1n18>
 <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org>
 <899256744.2177.1437547122576.JavaMail.www@wwinf1f21>
 <3AEE94B6-3D82-477A-8B81-75A184AD1F1E@bluesky.org>
Message-ID: <2083521746.6980.1437561751081.JavaMail.www@wwinf1d31>

On 22 Jul 2015, at 11:58, Tom Gewecke  wrote:

> Normally you should be able to get correct display in a case like this by just going to the View > Encoding menu of your browser and switching to Unicode UTF-8.

Indeed. And now Firefox saves the page as UTF-8.
I have now found that this issue has already been dealt with at http://superuser.com/questions/765044/how-do-i-view-a-page-with-a-different-character-encoding-in-firefox

To look back briefly at the T-shirt from the parent thread: http://i1.cpcache.com/product/27297813/utf8_value_tshirt.jpg
Perhaps, like me and a user on that forum page, people have been puzzled to find "utf-8" in the page source and concluded prematurely that it's buggy and hard to deal with... while it's so easy.
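The mix-up discussed in this thread (bytes that are really UTF-8, served with a `charset=ISO-8859-1` header) is easy to reproduce and to undo. A minimal Python sketch, assuming the wrongly applied charset is ISO-8859-1; the function name `redecode` is hypothetical. ISO-8859-1 maps every byte to exactly one character, so re-encoding with it recovers the original bytes losslessly.

```python
def redecode(shown: str, wrong: str = "iso-8859-1", right: str = "utf-8") -> str:
    """Recover text that was decoded with the wrong charset.

    Re-encoding with the wrong charset gives back the original bytes,
    which are then decoded with the charset the page actually uses.
    """
    return shown.encode(wrong).decode(right)


page_bytes = "Fr\u00e9d\u00e9ric".encode("utf-8")  # what the server sends
shown = page_bytes.decode("iso-8859-1")             # what the browser displays
print(ascii(shown))            # 'Fr\xc3\xa9d\xc3\xa9ric'
print(ascii(redecode(shown)))  # 'Fr\xe9d\xe9ric'
```

With windows-1252 as the wrong charset the same idea mostly works, but a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined there, so Python's `cp1252` codec can raise an error on some UTF-8 sequences.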

Thanks a lot for your help!

Best regards,

Marcel


> Message of 22/07/15 11:58
> From: "Tom Gewecke" 
> To: "Marcel Schneider" 
> Cc: "Unicode Public" 
> Subject: Re: UTF-8 display (was: Re: a mug)
> 
>Normally you should be able to get correct display in a case like this by just going to the View > Encoding menu of your browser and switching to Unicode UTF-8.

On Jul 22, 2015, at 2:38 AM, Marcel Schneider wrote:
> The workaround, then, if I understand correctly, is to let web-sniffer check whether the server is forcing an inconsistent encoding:
> | Content-Type: text/html;charset=ISO-8859-1
> Then save the page...
> | meta http-equiv="Content-Type" content="text/html; charset=windows-1252"
> ...and reset the charset to the value shown in the source code:
> | meta http-equiv="Content-Type" content="text/html; charset=utf-8"
> Then open this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150722/6ada1071/attachment.html>

From khaledhosny at eglug.org  Wed Jul 22 08:01:48 2015
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Wed, 22 Jul 2015 15:01:48 +0200
Subject: Plain text custom fraction input
In-Reply-To: <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
Message-ID: <20150722130143.GA29225@khaled-laptop>

On Wed, Jul 22, 2015 at 09:00:38AM +0200, Marcel Schneider wrote:
> On 21 Jul 2015, at 18:42, Doug Ewell  wrote:
> 
> > As explained in TUS 7.0, §6.2 ("General Punctuation"), p. 273, U+2044
> > FRACTION SLASH is intended for use with Basic Latin digits, or other
> > digits with General Category = Nd. The superscript and subscript
> > presentation forms have General Category = No.
> 
> That is what bugs me: this kerning fraction slash is presented to
> us as intended for use with plain digits, which overlap the fraction
> slash in proportional fonts. That recommendation is inconsistent with
> plain text encoding. Following TUS, anybody who uses U+2044 must use a
> fraction formatting feature. I know this from the time I had a
> valid demo version of some desktop publishing software. The feature
> wasn't triggered by the fraction slash; on the other hand, the
> feature worked with the common slash U+002F too. It's a formatting
> command like superscript or underline.

Some layout engines, like HarfBuzz, automatically turn on the required
OpenType features for proper fraction rendering when the fraction slash
is used. If the font has ‘numr’ and ‘dnom’ features, HarfBuzz will turn
them on for the <digits><fraction slash><digits> sequence. IMHO, that is
the most Unicode-compliant approach and other engines should do the
same.
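The rule described here can be approximated in a few lines. This Python sketch is not HarfBuzz's code; it only mimics the behaviour as stated: around each U+2044, extend backwards and forwards over characters whose General Category is Nd, and mark those runs for ‘numr’ and ‘dnom’. The function name `fraction_feature_spans` is hypothetical. Testing the category rather than ASCII 0-9 is what lets the same rule cover Arabic-Indic digits, and it is also why superscript and subscript digits (category No) are not picked up.

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def fraction_feature_spans(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, feature) spans for <digits><U+2044><digits> runs.

    Digits are characters with General Category Nd, so ASCII and
    Arabic-Indic digits both qualify; superscripts/subscripts (No) do not.
    """
    spans = []
    i = 0
    while i < len(text):
        if text[i] != FRACTION_SLASH:
            i += 1
            continue
        start = i
        while start > 0 and unicodedata.category(text[start - 1]) == "Nd":
            start -= 1
        end = i + 1
        while end < len(text) and unicodedata.category(text[end]) == "Nd":
            end += 1
        if start < i and end > i + 1:  # digits on both sides of the slash
            spans.append((start, i, "numr"))
            spans.append((i + 1, end, "dnom"))
        i = end
    return spans


print(fraction_feature_spans("see 12\u2044345 ok"))  # [(4, 6, 'numr'), (7, 10, 'dnom')]
print(fraction_feature_spans("\u00b2\u2044\u2087"))  # []
```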

Regards,
Khaled

From richard.wordingham at ntlworld.com  Wed Jul 22 17:54:02 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 22 Jul 2015 23:54:02 +0100
Subject: Plain text custom fraction input
In-Reply-To: <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
Message-ID: <20150722235402.7770e30a@JRWUBU2>

On Wed, 22 Jul 2015 12:21:32 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> On 22 Jul 2015, at 09:52, Richard Wordingham  wrote:

> We never thought of common hieroglyphs otherwise as running LTR,
> while on monuments the great liberty of the script allows to run in
> almost all directions. IMO monumental transcription is always
> difficult to deal with, whenever exact rendering is expected.
> However, since Unicode's purpose is plain text encoding, we must
> stick with what I consider as a convention in egyptology...

Which means that Ancient Egyptian hieroglyphs are unencoded!  Their
default direction is right-to-left, but that's only the start of the
trouble.  The encoded hieroglyphs aren't Bidi-mirrored, so if I embed
them in a right-to-left override, I should get retrograde characters.
Now these aren't totally useless, but at present we seem to need a
duplicate set of right-to-left hieroglyphs for unstacked text.  There
is work in progress to allow normal Egyptological hieroglyphic text.
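The properties behind this complaint can be checked with Python's `unicodedata` module. Under a RIGHT-TO-LEFT OVERRIDE (U+202E), the bidi algorithm only draws a mirrored glyph for characters with Bidi_Mirrored=Yes, so hieroglyphs would be reordered but not flipped, which is the retrograde effect described above. The helper `bidi_profile` is hypothetical; the range checked is the original Unicode 5.2 block repertoire.

```python
import unicodedata

A001 = "\U00013000"  # EGYPTIAN HIEROGLYPH A001, first in the block


def bidi_profile(ch: str) -> tuple[str, str, int]:
    """Name, Bidi_Class and Bidi_Mirrored flag of one character."""
    return unicodedata.name(ch), unicodedata.bidirectional(ch), unicodedata.mirrored(ch)


print(bidi_profile(A001))  # ('EGYPTIAN HIEROGLYPH A001', 'L', 0)

# The original 1071 hieroglyphs, U+13000..U+1342E (Unicode 5.2).
hieroglyphs = [chr(cp) for cp in range(0x13000, 0x1342F)]
print(all(unicodedata.bidirectional(c) == "L" for c in hieroglyphs))  # True
print(any(unicodedata.mirrored(c) for c in hieroglyphs))              # False
```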

There seems to have been a change in the notion of what the Egyptian
scripts are.  Hieratic texts are normally printed in hieroglyphs for
general study, so it had seemed that it would be legitimate to use a
font that rendered a hieratic style rather than a hieroglyphic style.
(Some 'hieroglyphs' only occurred in the hieratic style.)  The
hieratic style is strictly right-to-left, so rendering the text in a
hieratic style would not be compliant with Unicode.  However, it seems
that the hieratic style is now a separate script, so any such
rendering would now be doubly non-compliant. 

> ...which brings us back to plain text fractions, which by an apparent
> but tacit convention we can input as an *unlimited* string of
> superscript digits, followed by U+2044, followed by an *unlimited*
> string of subscript digits. What are you referring to when talking
> about implementing the fraction slash?

If you are happy with that style, then I was wrong; I wasn't being clever
enough.  In a left to right context, the conversion of digits to the
numerator and denominator forms can progress from right to left for the
numerator by conditioning on the following character being a fraction
slash or converted digit, and similarly from left to right for the
denominator.  I'm not sure what should happen in right to left
contexts.  I've a feeling the numerator should come before the
denominator, but the bidi algorithm doesn't swap them - it keeps the
first number on the left. Note that subscript and superscript digits
are only available for those of us who use the Western Arabic digits.
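The point here is that each substitution only has to look at one neighbour, which is why a finite set of lookups can still handle numerators and denominators of any length. A Python simulation of the two passes, under the assumption that "converting" a digit just means tagging it with a `numr` or `dnom` form (a real font would swap glyphs in contextual lookups). The function name `tag_fraction_forms` is hypothetical.

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def tag_fraction_forms(text: str) -> list[tuple[str, str]]:
    """Tag each character as 'numr', 'dnom' or 'default' using only
    single-neighbour context, in two directional passes.
    Chained slashes such as 1/2/3 are not handled specially."""
    chars = list(text)
    forms = ["default"] * len(chars)
    is_digit = [unicodedata.category(c) == "Nd" for c in chars]

    # Pass 1, right to left: a digit becomes a numerator if the character
    # after it is the fraction slash or an already-converted numerator digit.
    for i in range(len(chars) - 2, -1, -1):
        if is_digit[i] and (chars[i + 1] == FRACTION_SLASH or forms[i + 1] == "numr"):
            forms[i] = "numr"

    # Pass 2, left to right: a digit becomes a denominator if the character
    # before it is the fraction slash or an already-converted denominator digit.
    for i in range(1, len(chars)):
        if is_digit[i] and (chars[i - 1] == FRACTION_SLASH or forms[i - 1] == "dnom"):
            forms[i] = "dnom"

    return list(zip(chars, forms))


print([form for _, form in tag_fraction_forms("12\u204434")])
# ['numr', 'numr', 'default', 'dnom', 'dnom']
```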

However, I believe there is a real problem for the 'nut' style, where
the numerator and denominator are separated by a horizontal line, as used
from Western Asia westwards.  I'm having trouble finding examples of
fractions using Indic scripts - apparently they originally stacked the
numerator above the denominator, but I don't know what happens nowadays.

<snip>
> If this input method is not encouraged, what's the use of U+215F
> FRACTION NUMERATOR ONE?

It's for temporarily storing a character defined in some other coding
standard.

Richard.

From c933103 at gmail.com  Thu Jul 23 00:54:45 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Thu, 23 Jul 2015 13:54:45 +0800
Subject: Plain text custom fraction input
In-Reply-To: <20150722235402.7770e30a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2>
Message-ID: <CAGHjPPJ2pBCM62eB7upaHX7HLesTY2b3FMURRG45sEG1XombTA@mail.gmail.com>

1. Isn't the 'nut' style you mentioned used in everyday English too?
2. Most of the fractions I have seen within Chinese text are in the 'nut'
style.
3. I think standards should not be written in a way that prevents users or
implementers from choosing their preferred style for representing fractions.

On 23 Jul 2015 at 6:58 AM, "Richard Wordingham" <richard.wordingham at ntlworld.com> wrote:
> > ...which brings us back to plain text fractions, which by an apparent
> > but tacit convention we can input as an *unlimited* string of
> > superscript digits, followed by U+2044, followed by an *unlimited*
> > string of subscript digits. What are you referring to when talking
> > about implementing the fraction slash?
>
> If you are happy with that style, I was wrong, I wasn't being clever
> enough.  In a left to right context, the conversion of digits to the
> numerator and denominator forms can progress from right to left for the
> numerator by conditioning on the following character being a fraction
> slash or converted digit, and similarly from left to right for the
> denominator.  I'm not sure what should happen in right to left
> contexts.  I've a feeling the numerator should come before the
> denominator, but the bidi algorithm doesn't swap them - it keeps the
> first number on the left. Note that subscript and superscript digits
> are only available for those of us who use the Western Arabic digits.
>
> However, I believe there is a real problem for the 'nut' style, where
> the numerator and denominator are separated by a horizontal line - in
> Western Asia westwards.  I'm having trouble finding examples of
> fractions using Indic scripts - apparently they originally stacked the
> numerator above the denominator, but I don't know what happens nowadays.
>
> <snip>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150723/0c42aaa9/attachment.html>

From richard.wordingham at ntlworld.com  Thu Jul 23 01:44:11 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 23 Jul 2015 07:44:11 +0100
Subject: Plain text custom fraction input
In-Reply-To: <CAGHjPPJ2pBCM62eB7upaHX7HLesTY2b3FMURRG45sEG1XombTA@mail.gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2>
 <CAGHjPPJ2pBCM62eB7upaHX7HLesTY2b3FMURRG45sEG1XombTA@mail.gmail.com>
Message-ID: <20150723074411.5880dd99@JRWUBU2>

On Thu, 23 Jul 2015 13:54:45 +0800
gfb hjjhjh <c933103 at gmail.com> wrote:

> 1. Isn't the 'nut' style you mentioned used in everyday English too?
> 2. Most of the fractions I have seen within Chinese text are in the
> 'nut' style.
> 3. I think standards should not be written in a way that prevents users
> or implementers from choosing their preferred style for representing
> fractions.

The style is left to the rendering system.  The problem I see is that
the usual shaping instructions in a font cannot handle arbitrarily long
numerators and denominators for the nut style.  Perhaps I am wrong
again.

Richard.

From haberg-1 at telia.com  Thu Jul 23 03:20:47 2015
From: haberg-1 at telia.com (Hans Aberg)
Date: Thu, 23 Jul 2015 10:20:47 +0200
Subject: Plain text custom fraction input
In-Reply-To: <20150722235402.7770e30a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2>
Message-ID: <51DE554B-3798-4252-ABE2-2CC73BA89433@telia.com>


> On 23 Jul 2015, at 00:54, Richard Wordingham <richard.wordingham at ntlworld.com> wrote:
> 
> On Wed, 22 Jul 2015 12:21:32 +0200 (CEST)
> Marcel Schneider <charupdate at orange.fr> wrote:
> 
>> On 22 Jul 2015, at 09:52, Richard Wordingham  wrote:
> 
>> We never thought of common hieroglyphs otherwise as running LTR,
>> while on monuments the great liberty of the script allows to run in
>> almost all directions. IMO monumental transcription is always
>> difficult to deal with, whenever exact rendering is expected.
>> However, since Unicode's purpose is plain text encoding, we must
>> stick with what I consider as a convention in egyptology...
> 
> Which means that Ancient Egyptian hieroglyphs are unencoded!  Their
> default direction is right-to-left, but that's only the start of the
> trouble.  The encoded hieroglyphs aren't Bidi-mirrored, so if I embed
> them in a right-to-left override, I should get retrograde characters.
> Now these aren't totally useless, but at present we seem to need a
> duplicate set of right-to-left hieroglyphs for unstacked text.  There
> is work in progress to allow normal Egyptological hieroglyphic text.

Egyptian hieroglyphs are read in the direction the heads are facing. So you need more than an RTL mapping.


From charupdate at orange.fr  Thu Jul 23 03:25:22 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 23 Jul 2015 10:25:22 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <20150722130143.GA29225@khaled-laptop>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722130143.GA29225@khaled-laptop>
Message-ID: <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>

On 22 Jul 2015, at 15:08, Khaled Hosny  wrote:

> Some layout engines, like HarfBuzz, automatically turn on the required
> OpenType features for proper fraction rendering when the fraction slash
> is used. If the font has ‘numr’ and ‘dnom’ features, HarfBuzz will turn
> them on for the <digits><fraction slash><digits> sequence. IMHO, that is
> the most Unicode-compliant approach and other engines should do the
> same.


I fully agree that every good rendering engine must implement the Unicode fraction scheme. I'm glad to learn that Firefox and LibreOffice use HarfBuzz. Moreover, as Richard Wordingham wrote yesterday, this scheme should be transposable to Arabic-Indic digits, for which, as he writes, no superscripts or subscripts are available. Furthermore, incomplete fonts (for example, ornamental fonts, which sometimes lack superscripts and subscripts because the user is expected to use the formatting tool, consistently with the ornamental purpose of the font) can be used for fractions thanks to the formatting feature. Using the fraction slash as a formatting flag considerably lightens the work.

Seen from this point of view, fraction handling as specified by Unicode is the most universal and most reliable way. On the other hand, the harmonization within fonts between superscripts and subscripts and the numerators and denominators of the precomposed fractions they contain could be purely aesthetic, without any intent of using superscripts as numerators and subscripts as denominators.

The remaining question would then be: what was the idea when, at the font design stage, the fraction slash was given left and right kerning, so that a preceding superscript digit takes exactly the place it has as part of a precomposed fraction, and a following subscript digit sits as if it were the denominator of one of the precomposed fractions? If Unicode really never targeted such a usage and always thought of the fraction slash as a mere formatting flag with a glyph to make the user aware of its presence, this kerning idea was, as I outlined yesterday, the merit of a caring and innovative font designer. (We should get some testimony; surely a Latin font designer on this List would be glad to share his experience, given that, because of the lack of Arabic superscripts and subscripts in the UCS, IMHO you were not given this particular opportunity.) Then it would be ungrateful not to make use of his invention whenever the font supports this alternate scheme in addition to the standard one.

Perhaps we should consider plain text rendering too, because many situations require that all the needed information be given in plain text. Especially in these cases, it could be useful to be able to enter fractions that look as if they were formatted. However, keyboard layout considerations may argue against officially recommending this input method, so as not to frustrate people who would complain about not having superscripts and subscripts, along with the accompanying fraction slash, right on their keyboard. Yesterday I explained that this is very easy to enter, at least on Windows (on Linux, too, we have AltGr layers on the numpad, except that these are used for the single and double arrows engraved on the keys, following the legacy implementation of the caret commands). With an appropriate Windows keyboard driver, it's enough to hold down the left Ctrl and Alt while typing the numerator on the numpad followed by the numpad slash (a key that in AltGr produces U+2044), and then to add Shift while typing the denominator.

As I outlined yesterday in this context, the default Windows keyboard driver templates contain a warning meant to prevent developers from adding more characters on the numpad. More precisely, the allocation tables are split according to the number of shift states, and the numpad allocation table has the fewest shift states of all these split tables. Moreover, a comment says to "put this last", adding some explanation based on internal processes. But experience, at least on Windows 7, has shown that the numpad, like all other keys, can be unified in *one* table containing all shift states (including the Kana shift states, up to Shift + Ctrl + Alt + Kana). This is how I've got the arrows, too: I simply press *all* the keys to the left of the spacebar, and I get single or double arrows (the latter with Shift). So I must hold down Shift with the left little finger, and Ctrl, Fn, Alt, and Kana with the four other fingers, while typing on the numpad. For fractions, it's roughly the same, except that Kana is not pressed. This may be somewhat complicated, but I do believe that using character tables for superscripts and subscripts is a less efficient input method.

As already outlined yesterday, I fear that much is done to prevent users from getting fully started with their work tools, in order to keep us prisoners of some high-end software. I do not deny that this software is sometimes, or often, indispensable at work. But I do wish that everybody could benefit from *all* efficient input methods, including those that require no more than a complete keyboard layout.

Thank you for your feedback.

Best regards,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150723/7ae686b7/attachment.html>

From moyogo at gmail.com  Thu Jul 23 03:48:47 2015
From: moyogo at gmail.com (Denis Jacquerye)
Date: Thu, 23 Jul 2015 09:48:47 +0100
Subject: Plain text custom fraction input
In-Reply-To: <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722130143.GA29225@khaled-laptop>
 <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>
Message-ID: <CAJKta0xVrZ5a7e1DDPf7ZM-s81Y9yo-K2RAh3ECqgP3nZ_OKyA@mail.gmail.com>

On Thu, Jul 23, 2015 at 9:25 AM, Marcel Schneider <charupdate at orange.fr>
 wrote:
>
>
> The remaining question would then be: What was the idea when at font
> design, the fraction slash was given left and right kerning, so that a
> preceding superscript digit will take exactly the place it has as a part of
> a precomposed fraction, and a following subscript takes place like if it
> were a denominator in one of the precomposed fractions?
>
Many font designers do not differentiate between superscripts and numerators,
or between subscripts and denominators, because it's easier to design the
glyphs once, and this can work fine in some cases.
In some fonts, the superscript and subscript figures are completely
different from the numerators and denominators, or sit at different
heights, because this is better in some cases.
In the end it's a design issue, but you cannot expect either behaviour in
every font.

Using the recommended figures with the fraction slash will not work
everywhere or with every font, but neither will abusing the superscript
and subscript figures.

-- 
Denis Moyogo Jacquerye

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150723/235a1af3/attachment.html>

From charupdate at orange.fr  Thu Jul 23 03:50:11 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 23 Jul 2015 10:50:11 +0200 (CEST)
Subject: Global apostrophe solution? (Part of: A new take on the English
 apostrophe in Unicode; Keyman Developer for free?; Input methods at the age
 of Unicode)
Message-ID: <973533516.5116.1437641411369.JavaMail.www@wwinf1f21>

As I don't know whether the apostrophe issue** has been satisfactorily resolved, I'd like to briefly check on it by making a few statements to agree or disagree with:

1 - We are all allowed to use U+02BC for the English apostrophe. U+2019 is only a de facto preference, mainly with respect to end-users and WYSIWYG word processing. Unicode is thus a user-oriented standard. However, we must also take font-related issues into consideration: U+02BC may be missing, or its shape may vary according to different expectations, as in these three sans-serif fonts (tested in LibreOffice):


2 - UAX #29 is not intended to work fine for English, so English implementations need to be tailored. Both statements are inferred from the Notes in §4.1.1. However, this tailoring is often not carried out, as we can deduce from the behavior of word processors applying the UAX #29 recommendation:
| A further complication is the use of the same character as an apostrophe
| and as a quotation mark. Therefore leading or trailing apostrophes
| are best excluded from the default definition of a word.


3 - Since in English a leading U+2019 is never a quotation mark (as opposed to Scandinavian usage), leading apostrophes should be included in the word definition, on the same footing as in-word apostrophes. Only the trailing possessive apostrophe would then still be left out. This, however, is inconsistent, so a complete tailoring of UAX #29 for English must include algorithms that treat a trailing U+2019 as a quotation mark only if it is preceded by U+2018 within a certain number of words... but even this is incomplete.
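The tailoring rules in points 2 and 3 can be sketched as a small word splitter. This is a hypothetical Python illustration, not the UAX #29 algorithm: leading and in-word U+2019 stay inside the word, and a trailing U+2019 is treated as a closing quotation mark only if an unmatched U+2018 appeared within the last `window` words (10 is an arbitrary assumption). It inherits the weakness noted above: a possessive such as "dogs’" inside a single-quoted passage would wrongly lose its apostrophe.

```python
import re

APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK used as apostrophe
OPEN_QUOTE = "\u2018"  # LEFT SINGLE QUOTATION MARK

# A word: optional leading apostrophe, letters, optional in-word apostrophes
# followed by letters, optional trailing apostrophe.
WORD = re.compile(rf"{APOSTROPHE}?[^\W\d_]+(?:{APOSTROPHE}[^\W\d_]+)*{APOSTROPHE}?")


def english_words(text: str, window: int = 10) -> list[str]:
    words: list[str] = []
    open_quote_at = None  # index in `words` where an unmatched U+2018 was seen
    previous_end = 0
    for match in WORD.finditer(text):
        if OPEN_QUOTE in text[previous_end:match.start()]:
            open_quote_at = len(words)
        word = match.group()
        if (word.endswith(APOSTROPHE) and open_quote_at is not None
                and len(words) - open_quote_at < window):
            word = word[:-1]  # trailing U+2019 closes the quotation
            open_quote_at = None
        words.append(word)
        previous_end = match.end()
    return words


print(len(english_words("she said \u2018the dogs\u2019 twice")))  # 5
```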


4 - Conversion of British single quotes to double quotes needs special processing to identify the close-quotes: applying a number of search rules and submitting each instance to the operator for validation. This routine task is very annoying but remains limited to technicians (editors, typesetters), while the disambiguation of the apostrophe would affect the public as a whole. As Mark Davis wrote on Mon, Jun 15, 2015 at 10:19 AM:

> In practice, whenever characters are essentially identical, and by that I mean that the overlap between the acceptable glyphs for each character is very high, people will inevitably mix up the characters on entry. So any processing that depends on that distinction is forced to correct the data anyway.

Consequently, the introduction of U+02BC in English usage would not produce reliable data.
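The conversion workflow from point 4 can be sketched in Python as a loop that converts every opening quote automatically and defers each U+2019 to a decision callback. This is a minimal sketch under assumptions: the `confirm` callback stands in for the human operator, and `demo_operator` is a crude, hypothetical rule used only to make the example run.

```python
LSQUO, RSQUO = "\u2018", "\u2019"  # single quotation marks
LDQUO, RDQUO = "\u201C", "\u201D"  # double quotation marks


def convert_single_to_double(text, confirm):
    """Convert British-style single quotes to double quotes.

    Every U+2018 is treated as an opening quote. Each U+2019 may be either a
    closing quote or an apostrophe, so confirm(text, index) decides; in an
    editorial workflow this callback would ask the operator.
    """
    out = []
    for i, ch in enumerate(text):
        if ch == LSQUO:
            out.append(LDQUO)
        elif ch == RSQUO and confirm(text, i):
            out.append(RDQUO)
        else:
            out.append(ch)  # apostrophes and all other characters stay as-is
    return "".join(out)


def demo_operator(text, i):
    """Hypothetical stand-in: accept U+2019 as a closing quote after punctuation."""
    return i > 0 and text[i - 1] in ",.!?"


sample = "\u2018It\u2019s the dogs\u2019 bowl,\u2019 she said."
print(convert_single_to_double(sample, demo_operator))
# “It’s the dogs’ bowl,” she said.
```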


5 - The use of angle quotation marks for quotations in English (both British and American) would eliminate the apostrophe problem and bring a number of substantial advantages:

+ Quotations, especially when consisting of single words, are better highlighted and can no longer be confused with scare quotes.

+ This may shift our psychological relationship to quotations and quoting, which could eventually improve the handling of intellectual property. A certain threat in this domain, due to word processing and the Internet, has been identified by the Roman linguist Raffaele Simone.

+ British and American English would use the same quotation convention, so quote conversion would no longer be necessary. This streamlining could facilitate exchanges by overcoming locale barriers, while each locale's «flavour» (I'm quoting, not scaring; here's my source: http://babelstone.blogspot.fr/2006/03/unicode-character-names-part-2-name-is.html) would be preserved through spelling.

+ Scare quotes would always have the same appearance, inside as well as outside quotations. Their meaning is independent of quotation, so it seems consistent that they not be affected by their environment.


6 - Additionally, U+0027 could be preferred for highlighting words, a usage found in technical documents such as the Unicode documentation. (However, even the in-word apostrophe is in most cases represented by U+0027.)
As a result, U+2018 would no longer be needed and should be strongly discouraged, at least in languages like English and French, to prevent U+2019 from being used as a quotation mark. This is far easier and more feasible than adding U+02BC to all fonts, urging users to deal with *two* different but identical-looking «squiggles» (quotation), and tracking incorrect use. Having an old and a new quotation mark convention visibly side by side would probably be less cumbersome than having two apostrophes that look identical in most complete fonts but behave differently.


7 - As an input method for angle quotation marks, we can use autocorrect until word processors implement this, along with nested-quote management. To achieve this, six entries may be required:
< → «
«< → ‹
‹< → <
> → »
»> → ›
›> → >
In Microsoft Word (which supports punctuation and symbols as autocorrect triggers), this yields the double quotes with one keystroke, the (less used) single quotes with two keystrokes, and the less-than/greater-than signs with three keystrokes.
Depending on user preferences, the latter may be dropped, and only four entries would be required:
<< ? ?
?< ? ?
>> ? ?
?> ? ?
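The six-entry scheme can be simulated to check the keystroke counts described above. This Python sketch assumes the pairs < → «, «< → ‹, ‹< → < and their > counterparts, applies at most one replacement per keystroke, and tries two-character triggers before one-character ones; it does not model how Microsoft Word actually fires autocorrect entries.

```python
# Assumed trigger -> replacement pairs for the six-entry scheme.
RULES = {
    "<": "\u00AB",        # <   -> «  (one keystroke)
    "\u00AB<": "\u2039",  # «<  -> ‹  (two keystrokes)
    "\u2039<": "<",       # ‹<  -> <  (three keystrokes)
    ">": "\u00BB",        # >   -> »
    "\u00BB>": "\u203A",  # »>  -> ›
    "\u203A>": ">",       # ›>  -> >
}
# Longer triggers must win over the one-character ones.
TRIGGERS = sorted(RULES, key=len, reverse=True)


def type_keys(keys):
    """Return the text produced by typing `keys` one character at a time."""
    buffer = ""
    for key in keys:
        buffer += key
        for trigger in TRIGGERS:
            if buffer.endswith(trigger):
                buffer = buffer[: -len(trigger)] + RULES[trigger]
                break  # at most one replacement per keystroke
    return buffer


print(type_keys("<<Hi>> and <<<"))  # ‹Hi› and <
```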

For a solution that works in *all* applications, we can program extended keyboard layouts, notably with Keyman Developer, software that I see as an important part of Unicode implementation thanks to its easy-to-understand and flexible layout programming, which matches expectations expressed soon after the first releases of the Unicode Standard.


8 - I (or even: we) still do not know why the apostrophe has not been disambiguated from one of the quotation marks, while the hyphen-minus (mentioned in the parent thread) has been (U+2010 vs U+2212). I'm not sure I buy the argument that «essential identity» (this is a derived quotation, not a scare quote!) can be deduced from glyph resemblance. And indeed, that argument has rarely been applied in Unicode history, given that the purpose is «to encode characters, not glyphs.» The following quotation from TUS does not have exactly this meaning (§1.3, p. 6): «the standard defines how characters are interpreted, not how glyphs are rendered».

In the case of «that squiggle» '’', TUS doesn't fully define how it is interpreted: it defines only whether it is a letter (U+02BC) or punctuation (U+2019), but *not* whether it is an apostrophe or a closing single quote, even though the two are essentially different (not in appearance, but in what philosophers called «essentia», that is, «the being»). They «are the same in outward form but different in essence.» To prove this to ourselves, we may look at German usage: the single quotes are U+201A and U+2018, and the apostrophe is U+2019. If the same principles had been applied, U+201A would have been merged with the comma, because we can't tell the difference: ‚,‚,‚ (the 1st, 3rd and 5th are quotation marks). And here at least, the semantics would have been legible even to computers: a leading comma is a quote, a trailing comma is a comma. The current apostrophe convention in English makes the semantics illegible.
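The letter/punctuation split mentioned above is visible in the character properties themselves. This short Python snippet uses the standard `unicodedata` module to print the General_Category and name of each character discussed; note that nothing in these properties distinguishes an apostrophe from a closing quote, which is the point being made.

```python
import unicodedata


def describe(ch):
    """Return (code point label, General_Category, character name)."""
    return f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch)


for ch in "'\u02BC\u2018\u2019\u201A,":
    print(*describe(ch))
# U+0027 Po APOSTROPHE
# U+02BC Lm MODIFIER LETTER APOSTROPHE
# U+2018 Pi LEFT SINGLE QUOTATION MARK
# U+2019 Pf RIGHT SINGLE QUOTATION MARK
# U+201A Ps SINGLE LOW-9 QUOTATION MARK
# U+002C Po COMMA
```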

The curly apostrophe's misfortune may have been to be encoded at the same time as the curly quote, whereas the (curly) comma predated its curly quote counterpart. Ultimately, the punctuation apostrophe has *not* been encoded in Unicode; hence the *original* recommendation to use the letter apostrophe, which is very consistent with English usage. Moreover, we have already learned that since 1983 the apostrophe may be considered the 27th letter of the Latin alphabet: http://unicode.org/pipermail/unicode/2015-June/001914.html


9 - By not encoding the punctuation apostrophe, Unicode could rely on typographical tradition, achieving some economies of scale and making the Standard more end-user friendly in some ways. This, however, reflects a tendency to prioritize appearance. This tendency is far from omnipresent in Unicode; it is surely very marginal, and its presence is due to the influence of the software industry, where it is naturally more widespread for economic reasons: mainly because part of user demand (among other components) treats appearance as a satisfactory good and asks for nothing more than that a given item looks fine, no matter what's behind it...

In fact, as far as the English apostrophe is concerned, the burden is shifted from input to downstream processing. Users can enter text without worrying, while on the other side, other people must work hard to fix a number of recurring problems...


Now the goal would be to find out whether part of the problem has been satisfactorily resolved, and whether there is agreement on some of the points listed above. Ted Clancy and everyone who started and responded to the parent thread are invited to share their views on the topic today.

Best regards,

Marcel

** Note for archive readers: please refer to Ted Clancy's blog post and the subsequent discussion:
http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0047.html

From charupdate at orange.fr  Thu Jul 23 04:11:32 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 23 Jul 2015 11:11:32 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <CAJKta0xVrZ5a7e1DDPf7ZM-s81Y9yo-K2RAh3ECqgP3nZ_OKyA@mail.gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722130143.GA29225@khaled-laptop>
 <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>
 <CAJKta0xVrZ5a7e1DDPf7ZM-s81Y9yo-K2RAh3ECqgP3nZ_OKyA@mail.gmail.com>
Message-ID: <977808179.5607.1437642693041.JavaMail.www@wwinf1f21>

On 23 Jul 2015, at 10:48, Denis Jacquerye wrote:

> Many font designers do not differentiate between superscript and numerator, subscript and denominator because it's easier to design glyphs once and can work fine in some cases.
> In some fonts, the superscript and subscript figures are completely different from the numerators and denominators, or are at different heights, because this is better in some cases.
> In the end it's a design issue but you cannot expect either behaviour in every font.

> Using the recommended figures with the fraction slash will not work everywhere or with every font, but abusing the superscript and subscript will not either.


Is it really an abuse to use the kerning of the fraction slash? Perhaps we should ask from which point of view it is an abuse. The vast majority of designers who have built complete fonts matched all the small digits together, as stated. Giving the fraction slash an appropriate kerning would then be a natural reflex. Font designers who did that probably won't refer to this usage as an abuse. I still suspect that this characterization comes from vendors representing high-end layout software.

I'm fully aware, however, that the plain text input method for fractions does not work with all fonts, and that it requires a font that supports this usage. This nevertheless seems to be the standard behavior of complete proportional fonts. I'm curious to see a font whose superscripts differ from its numerators. I see that this may be useful, and word processors let users choose the relative size and position of superscript and subscript formatted characters.

Thank you for this hint.

Best regards,

Marcel

From charupdate at orange.fr  Thu Jul 23 04:45:14 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 23 Jul 2015 11:45:14 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <20150722235402.7770e30a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2>
Message-ID: <323648769.6397.1437644714380.JavaMail.www@wwinf1f21>

On 23 Jul 2015, at 01:06, Richard Wordingham  wrote:

> On Wed, 22 Jul 2015 12:21:32 +0200 (CEST)
> Marcel Schneider  wrote:
> 
> > We never thought of common hieroglyphs otherwise as running LTR,
> > while on monuments the great liberty of the script allows to run in
> > almost all directions. IMO monumental transcription is always
> > difficult to deal with, whenever exact rendering is expected.
> > However, since Unicode's purpose is plain text encoding, we must
> > stick with what I consider as a convention in egyptology...
> 
> Which means that Ancient Egyptian hieroglyphs are unencoded! Their
> default direction is right-to-left,

Sorry, I didn't know that, or I must have forgotten. However, as Hans Aberg notes, they face the writing direction: I remember that by looking at the signs representing living creatures in profile, we can detect the writing direction. I don't remember, however, that we had to write ancient hieroglyphs from right to left. But one may do so without problems, except if...

> but that's only the start of the
> trouble. The encoded hieroglyphs aren't Bidi-mirrored,

That's really a pity. Hieroglyphs *must* be enabled for bidi mirroring to make the encoded characters fully usable.

> so if I embed
> them in a right-to-left override, I should get retrograde characters.
> Now these aren't totally useless, but at present we seem to need a
> duplicate set of right-to-left hieroglyphs for unstacked text. There
> is work in progress to allow normal Egyptological hieroglyphic text.
> 
> There seems to have been a change in the notion of what the Egyptian
> scripts are. Hieratic texts are normally printed in hieroglyphs for
> general study, so it had seemed that it would be legitimate to use a
> font that rendered a hieratic style rather than a hieroglyphic style.
> (Some 'hieroglyphs' only occurred in the hieratic style.) The
> hieratic style is strictly right-to-left, so rendering the text in a
> hieratic style would not be compliant with Unicode. However, it seems
> that the hieratic style is now a separate script, so any such
> rendering would now be doubly non-compliant. 
> 
> > ...which brings us back to plain text fractions, which by an apparent
> > but tacit convention we can input as an *unlimited* string of
> > superscript digits, followed by U+2044, followed by an *unlimited*
> > string of subscript digits. What are you referring to when talking
> > about implementing the fraction slash?
> 
> If you are happy with that style, I was wrong, I wasn't being clever
> enough.

It's a matter of practice! I wouldn't bother typing in super- and subscripts if I didn't have them on my keyboard layout :-)

> In a left to right context, the conversion of digits to the
> numerator and denominator forms can progress from right to left for the
> numerator by conditioning on the following character being a fraction
> slash or converted digit, and similarly from left to right for the
> denominator. I'm not sure what should happen in right to left
> contexts.

Sorry again, I wasn't really thinking about that, even when I dismissed bidi mirroring yesterday (which I soon regretted), since the keyboard layout I'm programming is intended for use with the Latin script. But I believe that the principles can be carried over to support other scripts, ideally *all* scripts.
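Richard's description above of how digits around a fraction slash are identified (leftwards from the slash for the numerator, rightwards for the denominator) can be sketched in Python. As a stand-in for the font's numerator and denominator glyphs, which a plain-text program cannot select, this hypothetical helper `mark_fractions` rewrites the digits as superscript and subscript characters, i.e. the input method Marcel describes. It handles only ASCII digits in logical order and leaves the right-to-left question open.

```python
FRACTION_SLASH = "\u2044"
DIGITS = "0123456789"
# Superscript digits are scattered: U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079.
TO_SUPER = str.maketrans(DIGITS, "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079")
# Subscript digits are contiguous: U+2080..U+2089.
TO_SUB = str.maketrans(DIGITS, "".join(chr(0x2080 + d) for d in range(10)))


def mark_fractions(text):
    """Turn ASCII digit runs adjacent to U+2044 into superscript/subscript digits."""
    chars = list(text)
    for k, ch in enumerate(chars):
        if ch != FRACTION_SLASH:
            continue
        # Numerator: walk leftwards from the slash over ASCII digits.
        i = k - 1
        while i >= 0 and chars[i] in DIGITS:
            chars[i] = chars[i].translate(TO_SUPER)
            i -= 1
        # Denominator: walk rightwards from the slash over ASCII digits.
        j = k + 1
        while j < len(chars) and chars[j] in DIGITS:
            chars[j] = chars[j].translate(TO_SUB)
            j += 1
    return "".join(chars)


print(mark_fractions("22\u20447 is close to pi"))  # ²²⁄₇ is close to pi
```

The Unicode Standard's own recommendation keeps ordinary digits around U+2044 and leaves this shaping to the renderer; the transformation is shown only to make the scanning order concrete. An ordinary slash (U+002F) is deliberately left untouched.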

> I've a feeling the numerator should come before the
> denominator, but the bidi algorithm doesn't swap them - it keeps the
> first number on the left. Note that subscript and superscript digits
> are only available for those of us who use the Western Arabic digits.

As I wrote to Khaled Hosny a few moments ago, I understand that fraction formatting is indispensable with Arabic (read: actual Arabic) digits.

> 
> However, I believe there is a real problem for the 'nut' style, where
> the numerator and denominator are separated by a horizontal line - in
> Western Asia westwards. I'm having trouble finding examples of
> fractions using Indic scripts - apparently they originally stacked the
> numerator above the denominator, but I don't know what happens nowadays.

IMHO it would be hard to input fractions in nut style using plain text or normal formatting, to the extent that we need the special math applications we know (LibreOffice's, as far as I am concerned). But that isn't plain text. With the font-supported plain text fraction input suggested here, we can unfortunately never get nut style. This is unimaginable *in plain text*.

> 
> 
> > If this input method is not encouraged, what's the use of U+215F
> > FRACTION NUMERATOR ONE?
> 
> It's for temporarily storing a character defined in some other coding
> standard.

It would be interesting to know more about this standard and what this character was used for in it, which seems hard to find out. What do you mean by "temporarily", given that Unicode code point allocations are stable? I'm very puzzled. I'd rather think that the reciprocal as a "vulgar" fraction is so important that an input facility is provided, intended to be completed with subscript digits.

Best regards,

Marcel

From frederic.grosshans at gmail.com  Thu Jul 23 05:00:06 2015
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Thu, 23 Jul 2015 12:00:06 +0200
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom
 fraction input)
In-Reply-To: <20150722235402.7770e30a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2>
Message-ID: <55B0BB26.5080601@gmail.com>

On 23/07/2015 00:54, Richard Wordingham wrote:
> Which means that Ancient Egyptian hieroglyphs are unencoded!  Their
> default direction is right-to-left, but that's only the start of the
> trouble.  The encoded hieroglyphs aren't Bidi-mirrored, so if I embed
> them in a right-to-left override, I should get retrograde characters.
The text of the standard says that they should be mirrored in this case.
Version 7.0.0 has the following comment on Egyptian hieroglyphs
(p. 424, p. 9 of the PDF):

    “When left-to-right directionality is overridden to display Egyptian
    hieroglyphic text right to left, the glyphs should be mirrored from
    those shown in the code charts.”

Similar comments are present for other historic scripts (Italic, Runic),
and also for Old North Arabian, which is encoded as RTL but “Glyphs may be
mirrored in lines when they have left-to-right directionality”. This kind
of implementation at the font level is perfectly possible and is indeed
sometimes done (see e.g. Andrew West's Anglo-Saxon runic fonts
http://babelstone.co.uk/Fonts/AngloSaxon.html).

The BidiMirrored property is not suited to this case because it is for
a few “characters such as parentheses” (Unicode 8.0.0, §4.7, p. 180 = p. 23 of
ch04.pdf), and it is designed with an LTR default in mind: it can in no way
handle the case of Old North Arabian.

Extending this property to whole scripts would be a lot of work, and it
would need to be more than the current Y/N property, since it should account
for cases where the glyphs are

 1. always mirrored (Egyptian, Italic, Runic, Greek…),
 2. sometimes mirrored (I have examples of both cases in Latin; North
    Arabian seems to be in this case too),
 3. never mirrored (Han),
 4. not exactly mirrored (as for U+2232 CLOCKWISE CONTOUR INTEGRAL
    and U+221B CUBE ROOT),
 5. and also cases where the behaviour under a change of direction is
    undefined (I have difficulty guessing what LTR Arabic or Syriac, or
    RTL Devanagari, would mean. There may be traditions for some
    complex scripts, but it makes no sense to invent a uniform behaviour
    for them).

Currently BidiMirrored=N can mean any of the above, and
BidiMirrored=Y means 1 or 4.

By the way, I think a comment should be added to §4.7 of the
standard to clarify that the BidiMirrored property is not intended for
cases like hieroglyphs or italic.

     Frédéric


From khaledhosny at eglug.org  Thu Jul 23 07:50:59 2015
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Thu, 23 Jul 2015 14:50:59 +0200
Subject: Plain text custom fraction input
In-Reply-To: <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722130143.GA29225@khaled-laptop>
 <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>
Message-ID: <20150723125059.GA26732@khaled-laptop>

On Thu, Jul 23, 2015 at 10:25:22AM +0200, Marcel Schneider wrote:
> The remaining question would then be: What was the idea when at font
> design, the fraction slash was given left and right kerning, so that a
> preceding superscript digit will take exactly the place it has as a
> part of a precomposed fraction, and a following subscript takes place
> like if it were a denominator in one of the precomposed fractions?

What says that this kerning is there for super/subscript glyphs? It can
equally (and more likely) be there for the numerator and denominator
glyphs.

Regards,
Khaled

From khaledhosny at eglug.org  Thu Jul 23 07:59:20 2015
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Thu, 23 Jul 2015 14:59:20 +0200
Subject: Plain text custom fraction input
In-Reply-To: <20150722235402.7770e30a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2>
Message-ID: <20150723125920.GC26732@khaled-laptop>

On Wed, Jul 22, 2015 at 11:54:02PM +0100, Richard Wordingham wrote:
> On Wed, 22 Jul 2015 12:21:32 +0200 (CEST)
> Marcel Schneider <charupdate at orange.fr> wrote:
> 
> > On 22 Jul 2015, at 09:52, Richard Wordingham  wrote:
> 
> > We never thought of common hieroglyphs otherwise as running LTR,
> > while on monuments the great liberty of the script allows to run in
> > almost all directions. IMO monumental transcription is always
> > difficult to deal with, whenever exact rendering is expected.
> > However, since Unicode's purpose is plain text encoding, we must
> > stick with what I consider as a convention in egyptology...
> 
> Which means that Ancient Egyptian hieroglyphs are unencoded!  Their
> default direction is right-to-left, but that's only the start of the
> trouble.  The encoded hieroglyphs aren't Bidi-mirrored, so if I embed
> them in a right-to-left override, I should get retrograde characters.

At least in OpenType, you can have mirrored glyphs in the font (which
you will need in any case) and use an 'rtlm' feature which should be
applied when the text is being typeset right-to-left (naturally or
forced).

Regards,
Khaled

From charupdate at orange.fr  Thu Jul 23 09:47:58 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 23 Jul 2015 16:47:58 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <20150723125059.GA26732@khaled-laptop>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722130143.GA29225@khaled-laptop>
 <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>
 <20150723125059.GA26732@khaled-laptop>
Message-ID: <1182631931.12000.1437662879081.JavaMail.www@wwinf1n18>

On 23 Jul 2015, at 14:57, Khaled Hosny wrote:

> On Thu, Jul 23, 2015 at 10:25:22AM +0200, Marcel Schneider wrote:
> > The remaining question would then be: What was the idea when at font
> > design, the fraction slash was given left and right kerning, so that a
> > preceding superscript digit will take exactly the place it has as a
> > part of a precomposed fraction, and a following subscript takes place
> > like if it were a denominator in one of the precomposed fractions?
> 
> What says that this kerning is there for super/subscript glyphs, it can
> be equally (and more likely) be there for the numerator and denominator
> glyphs.

You are right: the fraction slash's kerning helps the rendering engine when it is set to use the numerators and denominators. I would need to look inside a font that has Western Arabic superscripts and subscripts as well as numerator and denominator glyphs, to see whether the numerator glyphs are mapped to the superscript glyphs and the denominator glyphs to the subscript glyphs. As Denis Jacquerye wrote, even where this is done, it is not the case in all fonts; some have different glyphs for the two classes.

Fraction formatting also works when the slash is not a fraction slash but an ordinary slash. Here too it would be interesting to know whether the slash is then mapped to U+2044, or whether the rendering engine does all the work.

If the synergy between the fraction slash and the super- and subscripts is purely fortuitous, plain text fraction input would be classified as a hack, a shortcut that bypasses the proper procedure. I would be glad if that weren't true, because I think that the shortest way, if correct, is the best. Again, this shortcut is practicable only under certain circumstances.

Regards,
Marcel

From doug at ewellic.org  Thu Jul 23 11:00:46 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 23 Jul 2015 09:00:46 -0700
Subject: Plain text custom fraction input
Message-ID: <20150723090046.665a7a7059d7ee80bb4d670165c8327d.1347f3f300.wbe@email03.secureserver.net>

Sorry, everyone:

> On the other hand, the harmonization inside the fonts, between super-
> and subscripts and the numerators and denominators of the precomposed
> fractions they contain, could be purely aesthetic without any idea of
> using superscripts as numerators, subscripts as denominators. [...]

> The fraction formatting works also when the slash is not a fraction
> slash but a common slash. [...]

What you have discovered is that under certain circumstances, with
certain fonts, you can get the visual results you want by using
characters other than those recommended in the Standard -- by using
characters simply because they "look right."

This is not plain text encoding, and it is not a matter of Unicode
failing to consider a particular usage scenario or failing to "complete"
some part of the Standard. It is about having an incomplete
understanding of the Unicode Standard.

Read, listen, learn.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From kenwhistler at att.net  Thu Jul 23 11:23:23 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 23 Jul 2015 09:23:23 -0700
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <55B0BB26.5080601@gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
Message-ID: <55B114FB.2000000@att.net>


On 7/23/2015 3:00 AM, Frédéric Grosshans wrote:
>
> By the way, I think a comment should be added in the ?4.7 of the 
> standard to clarify that the BidiMirrored property is not intended for 
> cases like hieroglyphs or italic.
>
>

This eminently sensible suggestion has been passed along to the
Unicode editorial committee for consideration.

--Ken


From richard.wordingham at ntlworld.com  Thu Jul 23 13:42:50 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 23 Jul 2015 19:42:50 +0100
Subject: BidiMirrored property  and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <55B0BB26.5080601@gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
Message-ID: <20150723194250.1cc05710@JRWUBU2>

On Thu, 23 Jul 2015 12:00:06 +0200
Frédéric Grosshans <frederic.grosshans at gmail.com> wrote:

> On 23/07/2015 00:54, Richard Wordingham wrote:
> > Which means that Ancient Egyptian hieroglyphs are unencoded!  Their
> > default direction is right-to-left, but that's only the start of the
> > trouble.  The encoded hieroglyphs aren't Bidi-mirrored, so if I
> > embed them in a right-to-left override, I should get retrograde
> > characters.
> The text of the standard says that they should be mirrored in this
> case. Version 7.0.0 has the following comment on Egyptian
> hieroglyphs (p. 424, p. 9 of the PDF):
> 
>     “When left-to-right directionality is overridden to display
> Egyptian hieroglyphic text right to left, the glyphs should be
> mirrored from those shown in the code charts.”

The UCD may trump the core specification; I'm expecting to be advised
not to trust anything in the core specification.

> Similar comments are present for other historic script (Italic,
> Runic), but also Old North Arabian, which is encoded as RTL but
> “Glyphs may be mirrored in lines when they have left-to-right
> directionality”. This kind of implementation at the font level is
> perfectly possible and is indeed done sometimes (see e.g. Andrew
> West's Anglo-Saxon runic fonts
> http://babelstone.co.uk/Fonts/AngloSaxon.html).

> The BidiMirrored property is not adapted in this case because, it is
> for a few “characters such as parentheses” (Unicode 8.0.0, §4.7
> p180=pf 23 of ch04.pdf), and it is thought for a LTR default : it can
> in no way consider the case of Old North Arabian.

There had been hope until today.

> Extending this property for whole scripts would be a lot of work, and 
> should be more than a Y/N property as currently, since it should
> account for cases where the glyph are
> 
>  1. always mirrored (Egyptian, Italic, Runic, Greek…),
>  2. sometimes mirrored (I have examples of both cases in Latin. North
>     Arabian seems to be in this case too),
>  3. never mirrored (Han),
>  4. not exactly mirrored ( like for U+2232 CLOCKWISE CONTOUR INTEGRAL
>     and U+221B CUBE ROOT )
>  5. And also when the behaviour under direction change is undefined (I
>     have difficulties to guess what it means to have LTR Arabic or
>     Syriac, or RTL Devanagari. Maybe there are some traditions for
> some complex scripts, but it makes no sense to invent a uniform
> behaviour for them)
 
> Currently a BidiMirrored=N can mean anything of the above, and 
> BidiMirrored=Y means (1. or 4.).

To be precise, having reread the Bidi algorithm, in particular L4 and
HL6:

1) If resolved directionality is R and Bidi_Mirrored=Yes,
mirroring is mandatory.

2) If resolved directionality is L and bidirectional type is not R
or AL, mirroring is prohibited.

3) Otherwise, mirroring is optional.
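Richard's three-way reading of rules L4 and HL6 can be written as a small decision function. This is a sketch of *his* summary, not of the full Unicode Bidirectional Algorithm: the resolved direction is passed in rather than computed, and Python's `unicodedata.mirrored` and `unicodedata.bidirectional` supply Bidi_Mirrored and the bidirectional class. The name `mirroring_status` is illustrative.

```python
import unicodedata


def mirroring_status(ch, resolved_direction):
    """Classify glyph mirroring for `ch` given its resolved direction 'L' or 'R'."""
    if resolved_direction == "R" and unicodedata.mirrored(ch):
        return "mandatory"   # case 1: resolved R and Bidi_Mirrored=Yes
    if resolved_direction == "L" and unicodedata.bidirectional(ch) not in ("R", "AL"):
        return "prohibited"  # case 2: resolved L, bidi class neither R nor AL
    return "optional"        # case 3: everything else


hieroglyph = "\U000132B9"  # an Egyptian hieroglyph: bidi class L, Bidi_Mirrored=No
for ch, direction in [("(", "R"), (hieroglyph, "R"), (hieroglyph, "L"), ("\u05D0", "L")]:
    print(f"U+{ord(ch):04X} in {direction} context: {mirroring_status(ch, direction)}")
```

The last case (a Hebrew letter resolved to L) comes out as "optional", which is the oddity about reversed Hebrew letters noted just below.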

It's odd that a font that reverses all the Hebrew letters is compliant
with the Unicode standard.

So, I was wrong.  Not marking hieroglyphs as Bidi_Mirrored didn't stop
them being used for Ancient Egyptian in marked up text.

> By the way, I think a comment should be added in the ?4.7 of the 
> standard to clarify that the BidiMirrored property is not intended
> for cases like hieroglyphs or italic.

That is a stupid and dangerous remark.

If the hieroglyphs had had the BidiMirrored property corrected to Yes,
one could have had, in plain text, once fonts had caught up:

<U+132B9 EGYPTIAN HIEROGLYPH R008> for nṯr in normal left-to-right text
<U+202B RIGHT-TO-LEFT EMBEDDING, U+132B9, U+202C POP DIRECTIONAL
FORMATTING> for nṯr in retrograde left-to-right text

and embed whole paragraphs in <U+202B>...<U+202C> for right-to-left
text.
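The plain-text sequences listed above are easy to build programmatically. In this small Python sketch the helper name `retrograde` is hypothetical; it only wraps text in U+202B RIGHT-TO-LEFT EMBEDDING ... U+202C POP DIRECTIONAL FORMATTING, and `unicodedata.bidirectional` confirms the bidi classes of the two controls. Whether the hieroglyph is actually drawn mirrored still depends on the font and renderer.

```python
import unicodedata

RLE = "\u202B"        # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"        # POP DIRECTIONAL FORMATTING
R008 = "\U000132B9"   # the hieroglyph cited in the message


def retrograde(text):
    """Wrap text in an RLE ... PDF embedding."""
    return RLE + text + PDF


seq = retrograde(R008)
print([f"U+{ord(c):04X}" for c in seq])  # ['U+202B', 'U+132B9', 'U+202C']
print(unicodedata.bidirectional(RLE), unicodedata.bidirectional(PDF))  # RLE PDF
```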

Once your remark has been adopted in the Unicode Standard, the only
way to get consistently oriented Ancient Egyptian in plain text is to:

a) Add a complete set of right-to-left hieroglyphs.
b) Add the retrograde hieroglyphs to each set.

One hopes that Egyptian Hieroglyphs is the only script for which
mirroring or not has meaning.

Richard.


From richard.wordingham at ntlworld.com  Thu Jul 23 15:25:40 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 23 Jul 2015 21:25:40 +0100
Subject: Plain text custom fraction input
In-Reply-To: <323648769.6397.1437644714380.JavaMail.www@wwinf1f21>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2>
 <323648769.6397.1437644714380.JavaMail.www@wwinf1f21>
Message-ID: <20150723212540.0d02f7f4@JRWUBU2>

On Thu, 23 Jul 2015 11:45:14 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> On 23 Jul 2015, at 01:06, Richard Wordingham  wrote:

> IMHO it would be hard to input fractions in nut style while using
> plain text or normal formatting, to the extent that we need the
> special Maths applications we know, from LibreOffice as far as I am
> concerned. But that isn't plain text. With the font-supported plain
> text fraction input as suggested, we can never get nut style,
> unfortunately. This is unimaginable *in plain text*.

The Unicode Standard does not distinguish 'nut' style and the 'slash'-based
style.  The problem is entirely one of rendering.  A renderer could
support the 'nut' style, just as renderers typically support
underlining and strike-out with just a few numeric parameters from the
font.  'Plain text' just means no formatting commands associated with
the text - it doesn't prevent immense quantities of information being
taken from a font, but it does prevent specification of which font to
use.

> > > If this input method is not encouraged, what's the use of U+215F
> > > FRACTION NUMERATOR ONE?

> > It's for temporarily storing a character defined in some other
> > coding standard.

> It would be interesting to know more about this standard, and what
> was the use of this character in that standard, which seems to be
> hard to retrieve. What do you mean by "temporarily", given that
> Unicode code point allocations are stable?

The idea is that data is read in from an old encoding, manipulated, and
written out in the old encoding.  For long term use, it would be
better to convert the data, though conversion may have to do more than
just change the character sequence.  You are correct in that the
unconverted data may be held as such indefinitely.
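The read, manipulate, write-back cycle described here can be sketched with Python's built-in codecs. This is a minimal illustration under assumptions: cp1252 and U+00BD merely stand in for "some other coding standard" (they are not the standard that defined U+215F, which is not identified here), and fits_legacy is a hypothetical helper. It also shows why long-term conversion may have to do more than change the character sequence: once U+00BD is replaced by its compatibility decomposition, the result no longer fits the old encoding.

```python
import unicodedata

# cp1252 stands in for "some other coding standard"; it is only an example.
LEGACY = "cp1252"

# 1. Read in: decode legacy bytes (0xBD is VULGAR FRACTION ONE HALF in cp1252).
legacy_bytes = b"Add \xbd cup"
text = legacy_bytes.decode(LEGACY)

# 2. Manipulate the text as Unicode.
edited = text.replace("cup", "spoon")

# 3. Write out again in the old encoding: U+00BD maps straight back to 0xBD.
written = edited.encode(LEGACY)

# Long-term conversion: NFKC replaces U+00BD with 1, U+2044 FRACTION SLASH, 2.
converted = unicodedata.normalize("NFKC", text)


def fits_legacy(s: str) -> bool:
    """Hypothetical helper: True if s can still be written back to the legacy encoding."""
    try:
        s.encode(LEGACY)
        return True
    except UnicodeEncodeError:
        return False


print(written)  # b'Add \xbd spoon'
print(fits_legacy(converted))  # False: cp1252 has no FRACTION SLASH
```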

>  I'm very puzzled. I'd
> rather think that the inverse value as a "vulgar" fraction is so
> important that an input facility is provided, intended to be
> completed with subscript digits.

The standard answer is that in the Unicode scheme, that sort of
capability should belong to the input mechanism.  An example is
the general refusal to encode new precomposed characters.  Indeed, if
renderers supported U+2044 (rather than just treating it as an ordinary
character), input resources would be better employed supporting the
input of U+2044.
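As a rough illustration of the renderer-side approach, the Python sketch below only locates the numerator and denominator runs around U+2044 FRACTION SLASH in plain text; the actual glyph substitution would be left to the font and shaping engine. The detection rule (maximal runs of decimal digits on either side of the slash) and the name fraction_spans are assumptions for illustration, not taken from any particular renderer.

```python
from __future__ import annotations

import re

# Assumed rule: the maximal run of decimal digits (category Nd, which is what
# \d matches on str) directly before U+2044 FRACTION SLASH is the numerator,
# and the run directly after it is the denominator.
FRACTION_RE = re.compile(r"(\d+)\u2044(\d+)")


def fraction_spans(text: str) -> list[tuple[str, str, int, int]]:
    """Return (numerator, denominator, start, end) for each digits-U+2044-digits run."""
    return [(m.group(1), m.group(2), m.start(), m.end())
            for m in FRACTION_RE.finditer(text)]


sample = "Take 3\u20444 cup and 1\u20442 spoon"
for num, den, start, end in fraction_spans(sample):
    # A renderer would now shape sample[start:end] as a fraction
    # instead of showing an ordinary slash between digits.
    print(num, den, start, end)  # prints "3 4 5 8", then "1 2 17 20"
```

Because the digit runs are greedy under this rule, "121⁄2" is read as 121 over 2; a mixed number would need some separator between the whole part and the fraction.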

Richard.

From charupdate at orange.fr  Fri Jul 24 04:23:59 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 24 Jul 2015 11:23:59 +0200 (CEST)
Subject: Plain text custom fraction input
Message-ID: <303748076.8726.1437729839437.JavaMail.www@wwinf1f21>

On 23 Jul 2015, at 18:00, Doug Ewell  wrote:

> What you have discovered

Alas, I should have searched the internet before soliciting new advice and feedback, out of respect for other people's time. Indeed I "discovered" (a quotation) that for myself, but as I learned *after* my last reply yesterday, this "new" (scare quotes!) way of inputting fractions is already a more or less well-established practice. Please see the information, with my apologies, two e-mails later.

> is that under certain circumstances, with
> certain fonts, you can get the visual results you want by using
> characters other than those recommended in the Standard -- by using
> characters simply because they "look right."


The same might apply to the apostrophe, for which a quotation mark is used because it looks the same. Yesterday I criticized this practice when I wrote in the thread «Global apostrophe solution»:

>> This reflects, however, a tendency that prioritizes appearance. In Unicode this tendency is far from omnipresent; it is surely very marginal in Unicode, and its presence is due to the influence of the software industry, where that tendency is naturally more widespread for economic reasons, that is, mainly because demand on the users' side already has a component (among others) that treats appearance as a satisfactory good, asking for nothing more than that a given item looks fine, no matter what's behind it...

I now really understand that for fractions I am suggesting exactly the same thing: using characters intended as superscripts/subscripts to represent digits that are numerators/denominators, not superscripts/subscripts. From the beginning, my view was based solely on appearance, and the samples I provided use only a single font.

> 
> This is not plain text encoding, and it is not a matter of Unicode
> failing to consider a particular usage scenario or failing to "complete"
> some part of the Standard. It is about having an incomplete
> understanding of the Unicode Standard.

I'm truly far, very far from thoroughly knowing even the smallest part of the Standard, and I often started mailing when the requested information would have been at hand by simply looking it up in TUS... About plain text, I simply know, from having read it somewhere, that it is the basic purpose of Unicode. Representing fractions as U+2044 is known as a compatibility mapping, just like representing a superscript as , while (I'm still checking my knowledge...) representing a precomposed letter with a diacritic as is known as a decomposition mapping. The difference between the two ways of getting the same thing lies in plain text. With decomposition we stay in plain text, while compatibility mappings need formatting, thus leaving the field of plain text.
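The difference between the two kinds of mapping can be inspected with Python's standard unicodedata module, which exposes the decomposition field of the UCD. In this sketch (the describe helper is hypothetical), U+00E9 has an untagged canonical decomposition that NFD applies, while U+00BD and U+00B2 carry tagged compatibility mappings (<fraction>, <super>) that only NFKD applies; after NFKD the superscript's appearance is no longer recorded in the text, which is why restoring it requires formatting.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Hypothetical helper: a character's UCD decomposition and its NFD / NFKD forms."""
    mapping = unicodedata.decomposition(ch)  # "" when there is no mapping
    return {
        "name": unicodedata.name(ch),
        "mapping": mapping,
        # A leading <tag> such as <fraction> or <super> marks a compatibility
        # mapping; an untagged mapping is a canonical decomposition.
        "compatibility": mapping.startswith("<"),
        "nfd": unicodedata.normalize("NFD", ch),    # canonical decomposition only
        "nfkd": unicodedata.normalize("NFKD", ch),  # also applies compatibility mappings
    }


for ch in ("\u00e9", "\u00bd", "\u00b2"):
    info = describe(ch)
    print(info["name"], "|", info["mapping"], "|",
          " ".join(f"U+{ord(c):04X}" for c in info["nfkd"]))
```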

So in fact, what I'm suggesting for fractions is to use a decomposition rather than a compatibility mapping, and to use this decomposition scheme to compose arbitrary fractions without leaving plain text. The problem is, as you point out, that this is *not* defined in the Standard. Therefore a font can be compliant with the Standard without allowing this usage. That is the case for at least *all* monospaced fonts. By contrast, combining diacritics, for example, work in *all* Unicode-compliant fonts if the decomposition mapping is defined. Overlay combining diacritics, however, sometimes don't work well. Their usage is not defined in the Standard for decomposition (precomposed letters with overlay diacritics are not decomposed), *because* they don't always work well. From this we might infer that plain text custom fraction input is not part of TUS because it doesn't always work well.

> 
> Read, listen, learn.

Thank you for your answer. I've been given the opportunity to learn a number of things by reading the current replies and by doing some searches in the Archive. I confess, however, that I'm somewhat unprepared. Unfortunately, it's very hard for me to work through all that's required within a useful time frame.

Best regards,

Marcel

> Message of 23/07/15 18:10
> From: "Doug Ewell"
> To: "Unicode Mailing List"
> Cc: "Marcel Schneider"
> Subject: RE: Plain text custom fraction input
> 
> Sorry, everyone:
> 
> > On the other hand, the harmonization inside the fonts, between super-
> > and subscripts and the numerators and denominators of the precomposed
> > fractions they contain, could be purely esthetical without any idea of
> > using superscripts as numerators, subscripts as denominators. [...]
> 
> > The fraction formatting works also when the slash is not a fraction
> > slash but a common slash. [...]
> 
> What you have discovered is that under certain circumstances, with
> certain fonts, you can get the visual results you want by using
> characters other than those recommended in the Standard -- by using
> characters simply because they "look right."
> 
> This is not plain text encoding, and it is not a matter of Unicode
> failing to consider a particular usage scenario or failing to "complete"
> some part of the Standard. It is about having an incomplete
> understanding of the Unicode Standard.
> 
> Read, listen, learn.
> 
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
> 
> 
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150724/fcd27956/attachment.html>

From charupdate at orange.fr  Fri Jul 24 04:28:07 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 24 Jul 2015 11:28:07 +0200 (CEST)
Subject: Plain text custom fraction input
Message-ID: <1672445198.8924.1437730087572.JavaMail.www@wwinf1f21>

On 23 Jul 2015, at 22:35, Richard Wordingham  wrote:

> > IMHO it would be hard to input fractions in nut style while using
> > plain text or normal formatting, to the extent that we need the
> > special Maths applications we know, from LibreOffice as far as I am
> > concerned. But that isn't plain text. With the font-supported plain
> > text fraction input as suggested, we can never get nut style,
> > unfortunately. This is unimaginable *in plain text*.
> 
> The Unicode Standard does not distinguish 'nut' style and the 'slash'-based
> style. The problem is entirely one of rendering. A renderer could
> support the 'nut' style, just as renderers typically support
> underlining and strike-out with just a few numeric parameters from the
> font. 'Plain text' just means no formatting commands associated with
> the text - it doesn't prevent immense quantities of information being
> taken from a font, but it does prevent specification of which font to
> use.


I fully agree, even without knowing much about how a font works, precisely.


> 
> > > > If this input method is not encouraged, what's the use of U+215F
> > > > FRACTION NUMERATOR ONE?
> 
> > > It's for temporarily storing a character defined in some other
> > > coding standard.
> 
> > It would be interesting to know more about this standard, and what
> > was the use of this character in that standard, which seems to be
> > hard to retrieve. What do you mean by "temporarily", given that
> > Unicode code point allocations are stable?
> 
> The idea is that data is read in from an old encoding, manipulated, and
> written out in the old encoding. For long term use, it would be
> better to convert the data, though conversion may have to do more than
> just change the character sequence. You are correct in that the
> unconverted data may be held as such indefinitely.


Indeed Unicode was forced to encode a number of characters for the sole reason that they are part of earlier standards with which backward compatibility must be ensured. That is the case, for example, with U+0149. This character looks a bit different when input as recommended and specified in its compatibility mapping, with the letter apostrophe (U+02BC): the apostrophe sits farther from the letter than in the all-in-one apostrophe-n glyph. But that's a font issue, not a Unicode concern. As for FRACTION NUMERATOR ONE, I'm still unsure how it was used in this old standard. Perhaps subscripts were used to complete it, in which case the plain text custom fraction input we're discussing would be compliant with this legacy standard. That's very interesting.
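The compatibility mappings of both characters mentioned here can be checked with Python's unicodedata module. The sketch below (the nfkc_codepoints helper is hypothetical) also shows one consequence of the idea of completing U+215F with subscript digits: under NFKC, U+215F followed by U+2084 SUBSCRIPT FOUR folds to the same sequence as U+00BC VULGAR FRACTION ONE QUARTER. This says nothing about how the legacy standard actually used U+215F.

```python
import unicodedata


def nfkc_codepoints(s: str) -> list:
    """Hypothetical helper: code points of the NFKC form of s, as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKC", s)]


# U+0149 maps to U+02BC MODIFIER LETTER APOSTROPHE + U+006E.
print(unicodedata.decomposition("\u0149"))  # <compat> 02BC 006E
# U+215F maps to 1 + U+2044 FRACTION SLASH: a fraction still missing its denominator.
print(unicodedata.decomposition("\u215f"))  # <fraction> 0031 2044

# Completing U+215F with a subscript digit gives, after NFKC,
# the same sequence as the precomposed U+00BC.
print(nfkc_codepoints("\u215f\u2084"))  # ['U+0031', 'U+2044', 'U+0034']
print(nfkc_codepoints("\u00bc"))        # ['U+0031', 'U+2044', 'U+0034']
```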


> 
> > I'm very puzzled. I'd
> > rather think that the inverse value as a "vulgar" fraction is so
> > important that an input facility is provided, intended to be
> > completed with subscript digits.
> 
> The standard answer is that in the Unicode scheme, that sort of
> capability should belong to the input mechanism. An example is
> the general refusal to encode new precomposed characters. Indeed, if
> renderers supported U+2044 (rather than just treating it as an ordinary
> character), input resources would be better employed supporting the
> input of U+2044.


I don't deny the usefulness of automated fraction formatting triggered by detecting U+2044.

Encoding any *new* precomposed characters, or *new* characters that can be obtained by formatting existing ones, is useless and wastes resources. This is why it is refused. By contrast, plain text custom fraction input relies wholly on existing and widely implemented characters, combining them in a "new" way. This time the quotes are scare quotes, because as I learned *after* my last reply yesterday, this is already a more or less well-established practice. Please see the information, with my apologies, in the next e-mail.

Best regards,

Marcel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150724/b91ee933/attachment.html>

From charupdate at orange.fr  Fri Jul 24 04:33:46 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 24 Jul 2015 11:33:46 +0200 (CEST)
Subject: Plain text custom fraction input
Message-ID: <1326602911.9053.1437730426498.JavaMail.www@wwinf1f21>

IMHO the Plain text custom fraction input issue has so far been resolved to some extent. It's a bit complicated for me to explain. As you already know, I still lack the reflex of searching the internet first. Only after my e-mail yesterday did I do so, and I was given the link to a Microsoft Community wiki:
http://answers.microsoft.com/en-us/office/wiki/office_2013_release-word/styled-fractions-in-windows/4a07d5fa-2484-4e39-b1f3-70bb3eb0c332
where we find information written up for Microsoft Office users about inputting fractions using Unicode superscripts and subscripts along with the fraction slash. For practical use, very detailed step-by-step instructions show how to use the Special Characters dialog for this purpose, as well as how to program in VBA the addition of a huge set of AutoCorrect entries, so that the user need only type a digits-slash-digits sequence to have it converted to a plain text fraction. Macros are provided for download.
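The AutoCorrect approach described on that wiki page can be approximated outside Word. The following is a minimal Python sketch, not the wiki's VBA macros: it rewrites ASCII digits-slash-digits as superscript digits, U+2044 FRACTION SLASH and subscript digits. The regular-expression boundaries (which skip date-like runs such as 12/25/2015) and the function names are assumptions made for this illustration.

```python
import re

# Superscript digits: 1, 2 and 3 live in Latin-1 (U+00B9, U+00B2, U+00B3),
# the others in the Superscripts and Subscripts block (U+2070, U+2074..U+2079).
SUPERSCRIPTS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079")
# Subscript digits are contiguous: U+2080..U+2089.
SUBSCRIPTS = str.maketrans(
    "0123456789", "".join(chr(0x2080 + d) for d in range(10)))
FRACTION_SLASH = "\u2044"

# ASCII digits, solidus, ASCII digits, not part of a longer digit/slash run (e.g. a date).
PATTERN = re.compile(r"(?<![0-9/])([0-9]+)/([0-9]+)(?![0-9/])")


def to_styled_fraction(m) -> str:
    """Build superscript numerator + FRACTION SLASH + subscript denominator."""
    numerator, denominator = m.group(1), m.group(2)
    return (numerator.translate(SUPERSCRIPTS) + FRACTION_SLASH
            + denominator.translate(SUBSCRIPTS))


def autocorrect_fractions(text: str) -> str:
    """Rewrite each digits-slash-digits sequence as a plain text custom fraction."""
    return PATTERN.sub(to_styled_fraction, text)


print(autocorrect_fractions("Add 1/16 inch on 12/25/2015"))
```

The superscript table is irregular: a table built by simply offsetting from U+2070 would be wrong for 1, 2 and 3, since U+2071 is a superscript letter and U+2072..U+2073 are unassigned.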

From there on I fortunately understood that Microsoft must really be one of the most user-friendly IT companies, given that it allows people to publish on its websites very detailed information about how to get «styled fractions» [I'm now using angle quotation marks instead of mentioning that this is a quotation, to make sure that nobody reads a hidden meaning into the quotes; please see my suggestion in the thread «Global apostrophe solution»], that is, how to get «styled fractions» without using any formatting feature, just in Unicode-enriched plain text (by which I mean plain text using Unicode characters without any restriction), using fonts that fully implement Unicode *and* are proportional (a point that does not seem to be specifically mentioned).

In the same search, I also found another page, where a cheerful lady presents to the users of a given software product no fewer than five methods of formatting digit-slash-digit sequences as fractions, without mentioning the plain text input method by a single word. As my goal is not to blame marketing strategies, still less to criticize the work of anyone who cares about the instruction and edification of users, but to enhance user experience, neither the URL, nor the product name, nor the keywords, nor the name of the search engine are disclosed here.

I'm very sorry to bring this information so late, after (not before) soliciting feedback from the List subscribers, whom I thank for their kind replies and for the many pieces of information I would not have become aware of by just doing a search on the internet. But honestly, the correct thing would have been to start the thread by bringing in *all* the information within my reach. My apologies...


To complete this thread on fraction input, part of «Input methods at the age of Unicode», I'd like to mention one more way of using the keyboard. As far as I understand, smart keyboard frameworks, of which the only one I know is Keyman, make it possible to automate what, with Windows keyboard drivers, requires changing the shift state three times. Along with all the other useful toggles we can implement and think up, Keyman lets us create what I'm calling a Fraction toggle. Once the Fraction flag is set, the layout converts all digits to superscripts, and the slash to U+2044. The slash then sets the layout to another state in which all digits are converted to subscripts, and typing a non-digit character then sets the keyboard back to its normal state.
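The Fraction toggle described above is essentially a three-state machine. The Python sketch below models it under stated assumptions: the class, method and helper names are hypothetical (this is not Keyman's keyboard language), and a non-digit key in the numerator state is assumed to end fraction mode just as it does in the denominator state, which the description leaves open.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()       # ordinary typing
    NUMERATOR = auto()    # Fraction flag set: digits become superscripts
    DENOMINATOR = auto()  # after the slash: digits become subscripts


DIGITS = "0123456789"
SUP = dict(zip(DIGITS, "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"))
SUB = dict(zip(DIGITS, "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"))
FRACTION_SLASH = "\u2044"


class FractionToggleLayout:
    """Hypothetical model of the Fraction toggle, not an actual Keyman keyboard."""

    def __init__(self) -> None:
        self.state = State.NORMAL

    def toggle(self) -> None:
        """The Fraction toggle key: start a numerator."""
        self.state = State.NUMERATOR

    def key(self, ch: str) -> str:
        """Return the character produced by one keystroke and update the state."""
        if self.state is State.NUMERATOR:
            if ch in SUP:
                return SUP[ch]
            if ch == "/":
                self.state = State.DENOMINATOR  # slash switches to subscripts
                return FRACTION_SLASH           # and is output as U+2044
        elif self.state is State.DENOMINATOR and ch in SUB:
            return SUB[ch]
        # Any other key leaves fraction mode (assumed for the numerator state
        # too) and is output unchanged.
        self.state = State.NORMAL
        return ch


def type_with_toggle(keys: str) -> str:
    """Press the toggle once, then type keys; return the produced text."""
    layout = FractionToggleLayout()
    layout.toggle()
    return "".join(layout.key(c) for c in keys)


print(type_with_toggle("3/4 cup"))  # ³⁄₄ cup
```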

Let me point out that this works in plain text, like this: ???, ???, ??????????????????????. The font must contain the complete range of superscripts and subscripts (which it normally does when the fraction slash is present). In fonts that have different glyphs for numerators/denominators and for superscripts/subscripts, using the precomposed fractions is discouraged, for harmony and consistency, if plain text custom fractions are input in the same document.

Font designers who have created superscript and subscript digit glyphs in OpenType fonts are welcome to reveal the relationship between these and the numerator/denominator glyphs. Developers who have programmed a fraction formatting feature in a rendering engine are equally welcome to share how the common slash is given the slant of a fraction slash.

Best regards,
Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150724/8bb55698/attachment.html>

From charupdate at orange.fr  Fri Jul 24 04:53:28 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 24 Jul 2015 11:53:28 +0200 (CEST)
Subject: Plain text custom fraction input
Message-ID: <2028339936.9524.1437731608194.JavaMail.www@wwinf1f21>

Sorry, I'd forgotten to add two addressees who had responded to this thread.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150724/2fcb9a1d/attachment.html>

From frederic.grosshans at gmail.com  Fri Jul 24 04:59:23 2015
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Fri, 24 Jul 2015 11:59:23 +0200
Subject: BidiMirrored property  and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <20150723194250.1cc05710@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2>
Message-ID: <55B20C7B.5020000@gmail.com>

On 23/07/2015 20:42, Richard Wordingham wrote:
> On Thu, 23 Jul 2015 12:00:06 +0200
> Frédéric Grosshans <frederic.grosshans at gmail.com> wrote:
>
>> On 23/07/2015 00:54, Richard Wordingham wrote:
>>> Which means that Ancient Egyptian hieroglyphs are unencoded!  Their
>>> default direction is right-to-left, but that's only the start of the
>>> trouble.  The encoded hieroglyphs aren't Bidi-mirrored, so if I
>>> embed them in a right-to-left override, I should get retrograde
>>> characters.
>> The text of the standard says that they should be mirrored in this
>> case. Version 7.0.0 has the following comment on Egyptian
>> hieroglyphs (p. 424, p. 9 of the PDF):
>>
>>      “When left-to-right directionality is overridden to display
>> Egyptian hieroglyphic text right to left, the glyphs should be
>> mirrored from those shown in the code charts.”
> The UCD may trump the core specification; I'm expecting to be advised
> not to trust anything in the core specification.
Would I be wrong in saying that “which trumps which?” is a short-term
question? In the long term, a disagreement between the UCD and the core
specification is either a bug to be corrected or a misunderstanding to
be clarified.

>> Similar comments are present for other historic scripts (Italic,
>> Runic), but also for Old North Arabian, which is encoded as RTL but
>> “Glyphs may be mirrored in lines when they have left-to-right
>> directionality”. This kind of implementation at the font level is
>> perfectly possible and is indeed sometimes done (see e.g. Andrew
>> West's Anglo-Saxon runic fonts
>> http://babelstone.co.uk/Fonts/AngloSaxon.html).
>> The BidiMirrored property is not suited to this case because it is
>> for a few “characters such as parentheses” (Unicode 8.0.0, §4.7,
>> p. 180 = p. 23 of ch04.pdf), and it is designed for an LTR default: it can
>> in no way handle the case of Old North Arabian.
> There had been hope until today.
Well, there is still hope, if the BidiMirrored property is amended or
supplemented with another mechanism. What I meant is: “The current Y/N
values of BidiMirrored cannot be used for mirroring scripts which are
RTL by default, and at least one such script exists in Unicode 7.0.0.”
>> Extending this property for whole scripts would be a lot of work, and
>> should be more than a Y/N property as currently, [...]
>   
>> Currently BidiMirrored=N can mean any of the above, and
>> BidiMirrored=Y means (1. or 4.).
> To be precise, having reread the Bidi algorithm, in particular L4 and
> HL6:
>
> 1) If resolved directionality is R and Bidi_Mirrored=Yes,
> mirroring is mandatory.
>
> 2) If resolved directionality is L and bidirectional type is not R
> or AL, mirroring is prohibited.
>
> 3) Otherwise, mirroring is optional.
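These three cases translate directly into a small decision function. This Python sketch is a literal encoding of the summary above, not an implementation of the Bidirectional Algorithm: resolving each character's directionality (the hard part) is assumed to have been done already, and the names Mirroring, mirroring_status and status_for are hypothetical.

```python
import unicodedata
from enum import Enum


class Mirroring(Enum):
    MANDATORY = "mandatory"
    PROHIBITED = "prohibited"
    OPTIONAL = "optional"


def mirroring_status(resolved: str, bidi_class: str, bidi_mirrored: bool) -> Mirroring:
    """Three-way summary of L4/HL6 as stated in the message.

    resolved:      resolved directionality of the character, "L" or "R"
    bidi_class:    its Bidi_Class, e.g. "L", "R", "AL", "ON"
    bidi_mirrored: its Bidi_Mirrored property value
    """
    if resolved == "R" and bidi_mirrored:
        return Mirroring.MANDATORY   # case 1
    if resolved == "L" and bidi_class not in ("R", "AL"):
        return Mirroring.PROHIBITED  # case 2
    return Mirroring.OPTIONAL        # case 3


def status_for(ch: str, resolved: str) -> Mirroring:
    """Look up the character's properties with unicodedata and apply the summary."""
    return mirroring_status(resolved, unicodedata.bidirectional(ch),
                            bool(unicodedata.mirrored(ch)))


print(status_for("(", "R").value)           # mandatory
print(status_for("A", "L").value)           # prohibited
print(status_for("\U000132B9", "R").value)  # optional: hieroglyph under an override
print(status_for("\u05d0", "R").value)      # optional: Hebrew alef in RTL text
```

The Hebrew case shows why a font that reverses all the Hebrew letters is compliant: with Bidi_Mirrored=No and Bidi_Class=R, mirroring falls into the optional case in both directions.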
Thanks for the check.
>
> It's odd that a font that reverses all the Hebrew letters is compliant
> with the Unicode standard.
Indeed !
>> By the way, I think a comment should be added to §4.7 of the
>> standard to clarify that the BidiMirrored property is not intended
>> for cases like hieroglyphs or italic.
> That is a stupid and dangerous remark.
     My remark was about the BidiMirrored property itself; it was not
intended to mean “mirroring of ancient scripts is forbidden”. I wanted to
say “Don't trust BidiMirrored=N for ancient scripts: it does not
mean that they should not be mirrored.”
> If the hieroglyphs had had the BidiMirrored property corrected to Yes,
> one could have had, in plain text, once fonts had caught up: [...]
Agreed. But you don't need the BidiMirrored property to let the
font catch up: Andrew West's Anglo-Saxon runic fonts behave correctly
when mirrored, and are Unicode-compliant.

> Once your remark has been adopted in the Unicode Standard, the only
> way to get consistently oriented Ancient Egyptian in plain text is to:
>
> a) Add a complete set of right-to-left hieroglyphs.
> b) Add the retrograde hieroglyphs to each set.
That would be a very bad idea !
> One hopes that Egyptian Hieroglyphs is the only script for which
> mirroring or not has meaning.
You also have mirroring in Italic, Runic, Old North Arabian and probably 
many other scripts.

Let me rephrase my remark in a less “stupid and dangerous” way.

    If an LTR character has the BidiMirrored=No property, it may or may
    not be mirrored when typeset in RTL, depending on other factors.
    Specifically, the BidiMirrored property has not been specified for
    ancient LTR scripts which are mirrored when RTL or boustrophedon,
    such as Italic, Runic, Archaic Greek, Archaic Latin and Egyptian
    Hieroglyphs. Note that some RTL scripts, such as Old North Arabian,
    are mirrored when LTR.
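The property values under discussion can be listed with Python's unicodedata module (Python 3.5 or later is assumed, so that Old North Arabian, added in Unicode 7.0, is known). In this sketch every sample except the parenthesis reports Bidi_Mirrored=No, including scripts whose glyphs were mirrored in ancient practice, and Old North Arabian reports Bidi_Class=R; the choice of sample characters is arbitrary.

```python
import unicodedata

SAMPLES = [
    "\u0028",      # LEFT PARENTHESIS, a classic Bidi_Mirrored=Yes character
    "\U000132B9",  # EGYPTIAN HIEROGLYPH R008
    "\U00010300",  # OLD ITALIC LETTER A
    "\u16A0",      # first letter of the Runic block
    "\U00010A80",  # first letter of the Old North Arabian block
]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"Bidi_Class={unicodedata.bidirectional(ch)}, "
          f"Bidi_Mirrored={'Yes' if unicodedata.mirrored(ch) else 'No'}")
```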

Is that better? Once again, I agree that forbidding Ancient Egyptian
from being mirrored would be “stupid and dangerous”.

I (maybe naively) thought that the BidiMirrored=No property for
hieroglyphs, runes, etc. in the UCD was deliberate. If it was not, do you
think that the Unicode Consortium would consider some (if not all) of
the following actions:

  * accepting proposals to “BidiMirror” relevant ancient scripts with
    no modern usage
  * changing the Bidi Algorithm and the BidiMirrored property (or a
    BidiMirrored v2) to take mirrored RTL scripts into account
  * distinguishing between “never mirrored” characters (Han) and
    “sometimes mirrored, or mirroring unknown” characters (Latin? Most
    Indic? Cyrillic?)
  * looking into the security implications of all this for modern scripts

Of course, all that is a non-negligible amount of work.

     Frédéric


From richard.wordingham at ntlworld.com  Fri Jul 24 11:17:28 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 24 Jul 2015 17:17:28 +0100
Subject: BidiMirrored property  and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <20150723194250.1cc05710@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2>
Message-ID: <20150724171728.287a5e32@JRWUBU2>

On Thu, 23 Jul 2015 19:42:50 +0100
Richard Wordingham <richard.wordingham at ntlworld.com> wrote:

> If the hieroglyphs had had the BidiMirrored property corrected to Yes,
> one could have had, in plain text, once fonts had caught up:
> 
> <U+132B9 EGYPTIAN HIEROGLYPH R008> for nṯr in normal left-to-right
> text <U+202B RIGHT-TO-LEFT EMBEDDING, U+132B9, U+202C POP DIRECTIONAL
> FORMATTING> for nṯr in retrograde left-to-right text
> 
> and embed whole paragraphs in <U+202B>...<U+202C> for right-to-left
> text.

Correction: Use U+202E RIGHT-TO-LEFT OVERRIDE, not U+202B!
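The corrected sequence can be built and inspected in Python. The class_inside helper is a hypothetical and deliberately simplified reading of the Bidi Algorithm's explicit rules: under RIGHT-TO-LEFT OVERRIDE the hieroglyph's type is treated as R, whereas under RIGHT-TO-LEFT EMBEDDING it keeps its own Bidi_Class L and so stays left-to-right, which is why the override is needed. Whether the glyph is then actually mirrored still depends on Bidi_Mirrored (the premise being that it had been set to Yes) or on the font.

```python
import unicodedata

R008 = "\U000132B9"                           # EGYPTIAN HIEROGLYPH R008
RLE, RLO, PDF = "\u202B", "\u202E", "\u202C"  # embedding, override, pop

normal = R008                  # nṯr in normal left-to-right text
retrograde = RLO + R008 + PDF  # corrected sequence: override, not embedding


def class_inside(opener: str, ch: str) -> str:
    """Simplified: inside an RLO the character's type is reset to R;
    inside an RLE it keeps its own Bidi_Class (L for hieroglyphs)."""
    return "R" if opener == RLO else unicodedata.bidirectional(ch)


print([unicodedata.bidirectional(c) for c in retrograde])  # ['RLO', 'L', 'PDF']
print(class_inside(RLE, R008), class_inside(RLO, R008))    # L R
```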

Richard.


From kenwhistler at att.net  Fri Jul 24 11:28:05 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Fri, 24 Jul 2015 09:28:05 -0700
Subject: BidiMirrored property  and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <55B20C7B.5020000@gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
Message-ID: <55B26795.3020209@att.net>

On 7/24/2015 2:59 AM, Frédéric Grosshans wrote:
>
>
> Is that better ? Once again, I agree that forbidding ancient Egyptian 
> to be mirrored when  “stupid and dangerous”

I can see that this thread seems to have gone off the rails a bit.

The Unicode Standard does not forbid Egyptian hieroglyphs from being
"mirrored" in a RTL layout context. The Unicode Bidirectional Algorithm
neither requires nor forbids that. It is simply out of scope.

First there is a general issue of general mirroring of body text for some
ancient scripts, which in paleographic contexts often followed conventions
(no longer seen, except in rare edge cases) of having the direction of
glyph orientation switch depending on line orientation. This is particularly
noted in epigraphic contexts in ancient scripts of the greater Mediterranean
area, but also occurs occasionally elsewhere. This general mirroring of
body text is *not* part of Unicode plain text. There are no UCD properties
defined for this, normative or informative, with either granularity at
the per-character basis or the per-script basis. And there is no algorithm
defined in the Unicode Standard to deal with this issue of paleography.
Note that for the most part, this general mirroring is not a *bi*directional
problem at all. It is a dextroverse versus sinistroverse layout issue, as
nearly all of this kind of epigraphic text does not occur in *bi*directional
contexts at all -- but rather in text where everything goes one direction.
(Lest the nitpickers immediately cite boustrophedon -- boustrophedon is
*also* not *bi*directional text -- it is a convention that alternates
dextroverse lines with sinistroverse lines, but does not mix directions on
single lines.)

Then there is the *specific* issue of bidirectional mirroring. That is
*different*. It is a normative part of the Unicode Bidirectional Algorithm,
it is controlled in applicability by specific rules and by exact specification
of the set of characters that have the Bidi_Mirrored=Y property in the UCD.
That property applies to all paired brackets (except 2 Arabic ornate
parentheses, for legacy reasons) and a set of non-symmetric mathematical
operators (but not to arrow symbols). The applicability of bidirectional
mirroring is mandatory and required by the Unicode Bidirectional
Algorithm, and is essential in the layout of *modern* text, because of
the very general problem of the interpretation of opening and closing for
directionally oriented brackets occurring in pairs, in text where mixed
directional runs may occur together on the same line of text.
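The per-character Bidi_Mirrored property described here is exposed by Python's standard `unicodedata.mirrored()`. A small sketch (the sample characters and the helper name are our own choices) comparing paired brackets and an asymmetric operator with an arrow and two letters from scripts that were mirrored paleographically:

```python
import unicodedata

def bidi_mirrored(ch: str) -> bool:
    """Return True if the UCD gives the character Bidi_Mirrored=Yes."""
    return unicodedata.mirrored(ch) == 1

# Paired brackets and asymmetric operators, versus an arrow and two letters
# from scripts that were mirrored paleographically.
samples = ["(", "[", "<", "\u2208", "\u2190", "\u16A0", "\U00010300"]

for ch in samples:
    flag = "Y" if bidi_mirrored(ch) else "N"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: Bidi_Mirrored={flag}")
```

Only the brackets and the operator come out as Y; the arrow, the rune and the Old Italic letter are all N.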

These two concerns are *not* the same and should not be confused.
They are, however, commonly confused, because they both involve
"mirroring" of glyphs and have something to do with line layout direction.

>
> I (maybe naively) thought that the BidiMirrored=No property for 
> hieroglyphs, runes, etc. in the UCD was volunteer. 

It is not "volunteer". It is out of scope.

> If it was not, do you think that the unicode consortium would consider 
> some (if not all) of the following actions :
>
>  *   accepting proposals to “BidiMirror” relevant ancient scripts with
>    no modern usage

This will not happen.

>  * changing the BidiAlgorithm and BidiMirrored property (or
>    BidiMirroredv2) to take into account Mirrored RTL scripts

This will not happen.

>  * Distinguish between “never mirrored” characters (Han), and “Sometimes
>    mirrored, unknown mirrored” (Latin? Most Indic ? Cyrillic ?)

That is an issue for how to deal with the paleographic issues of
reversed direction body text. People can certainly head down that
direction and create databases of information about which scripts
do this, in which contexts and time periods. But it is completely
out of scope for the UBA. Note that even in scripts that have this
behavior paleographically, the occurrence of RTL versus LTR versions
may differ statistically over time and eventually die out in favor
of one direction or the other. See Old Italic. For that matter,
see ancient Greek, which had RTL, LTR, and boustrophedon, but
which eventually settled on strictly LTR layout.

--Ken

>
>


From asmusf at ix.netcom.com  Fri Jul 24 12:09:18 2015
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 24 Jul 2015 10:09:18 -0700
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <55B20C7B.5020000@gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
Message-ID: <55B2713E.4030006@ix.netcom.com>

On 7/24/2015 2:59 AM, Frédéric Grosshans wrote:
> Let me rephrase my remark in a less “stupid and dangerous” way.
>
>    If a LTR character has the BidiMirrored=No property, it may either
>    be mirrored or not when typeset in RTL, depending on other factors.
>    Specifically, the BidiMirrored property has not been specified for
>    ancient LTR scripts which are mirrored when RTL or boustrophedon,
>    like Italic, Runic, Archaic Greek, Archaic Latin, Egyptian
>    Hieroglyphs. Note that some RTL scripts, like Old North Arabian, are
>    mirrored when LTR. 

We do want "BidiMirrored=No" to be honored; for example for the arrows
and the ornate parens. And we do not want that to be overridden.

The issue with the ancient scripts (or any script used to capture
paleographic texts) seems to be primarily with letter shapes, not punctuation,
and further would apply only to unpaired forms.

A carefully written note would keep in scope all paired characters.

It would be nice if there was a property that covered them, but I'm afraid
that BidiMirroringGlyph does not cover the character pairs to use when
BidiMirrored=No and code points need to be substituted to get the RTL
layout correct. That kind of property would be useful for modern text,
e.g. to allow support for automatic re-layout from RTL to LTR and vice
versa for texts containing arrows.
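A sketch of the gap being described. The first table is a hand-copied subset of BidiMirroring.txt, the data behind Bidi_Mirroring_Glyph (the real file has many more entries). The second table and both function names are hypothetical: no such UCD property exists for arrows, which is exactly the missing piece.

```python
# Hand-copied subset of BidiMirroring.txt (Bidi_Mirroring_Glyph data).
BIDI_MIRRORING_GLYPH = {
    "(": ")", ")": "(",
    "<": ">", ">": "<",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "\u00AB": "\u00BB", "\u00BB": "\u00AB",
}

# HYPOTHETICAL: a pairing table for Bidi_Mirrored=No characters such as arrows,
# of the kind that would be needed for automatic re-layout.
HYPOTHETICAL_ARROW_SWAP = {"\u2190": "\u2192", "\u2192": "\u2190"}

def mirroring_glyph(ch):
    """Bidi_Mirroring_Glyph lookup in the subset above; None if absent."""
    return BIDI_MIRRORING_GLYPH.get(ch)

def relayout_swap(ch):
    """Character to substitute when re-laying out text in the other direction."""
    return mirroring_glyph(ch) or HYPOTHETICAL_ARROW_SWAP.get(ch, ch)

print(mirroring_glyph("\u2190"))  # None: arrows are not in BidiMirroring.txt
print("".join(relayout_swap(c) for c in "(\u2190)"))
```

Only character substitution is shown; reordering the run is left to the layout engine.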

Declaring all unpaired code points overridable "in certain contexts" or
"depending on other factors" might then work.

A./

From doug at ewellic.org  Fri Jul 24 14:13:28 2015
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 24 Jul 2015 12:13:28 -0700
Subject: Plain text custom fraction input
Message-ID: <20150724121328.665a7a7059d7ee80bb4d670165c8327d.f4e5f76ddd.wbe@email03.secureserver.net>

Marcel Schneider <charupdate at orange dot fr> wrote:

> Representing fractions as U+2044 is known as a compatibility mapping,
> equally like representing a superscript as , while (I go on checking
> my knowledge...) representing a precomposed diacriticized letter as is
> known as a decomposition mapping. The difference between the two ways
> of getting the same thing is in plain text. With decomposition we stay
> in plain text, while compatibility mappings need formatting, thus
> leaving the field of plain text. 

It's not a matter of one being plain text and the other not. Read
Section 3.7, "Decomposition" [1] to learn about canonical and
compatibility decomposition.
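The distinction is visible in the UCD decomposition field, which Python exposes as `unicodedata.decomposition()`. A short sketch (character choice ours): both kinds of mapping are plain-text character sequences. What differs is the `<tag>` prefix marking a compatibility mapping, and which normalization forms apply the mapping.

```python
import unicodedata

# A <tag> prefix in the decomposition field marks a compatibility mapping;
# no tag marks a canonical mapping.
for ch in ["\u00BD", "\u00B2", "\u00E9"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")

# Canonical decomposition (NFD) leaves the compatibility characters alone...
print(unicodedata.normalize("NFD", "\u00BD") == "\u00BD")     # True
print(unicodedata.normalize("NFD", "\u00E9") == "e\u0301")    # True
# ...while compatibility decomposition (NFKD) applies the <fraction> mapping.
print(unicodedata.normalize("NFKD", "\u00BD") == "1\u20442")  # True
```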

In general, the Glossary [2] and FAQ [3] are useful resources.

[1] http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G729
[2] http://www.unicode.org/glossary/
[3] http://www.unicode.org/faq/

--
Doug Ewell | http://ewellic.org | Thornton, CO


From richard.wordingham at ntlworld.com  Fri Jul 24 14:29:58 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 24 Jul 2015 20:29:58 +0100
Subject: BidiMirrored property  and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <55B26795.3020209@att.net>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B26795.3020209@att.net>
Message-ID: <20150724202958.5bf3399f@JRWUBU2>

On Fri, 24 Jul 2015 09:28:05 -0700
Ken Whistler <kenwhistler at att.net> wrote:

> First there is a general issue of general mirroring of body text for
> some ancient scripts, which in paleographic contexts often followed
> conventions (no longer seen, except in rare edge cases) of having the
> direction of glyph orientation switch depending on line orientation.

Direction switching is commonplace in modern didactic texts on Ancient
Egyptian.  Right-to-left text is also natural when showing how
to normalise hieratic or demotic to hieroglyphs.

> It is a dextroverse versus
> sinistroverse layout issue, as nearly all of this kind of epigraphic
> text does not occur in *bi*directional contexts at all -- but rather
> in text where everything goes one direction.

Remember that parentheses in pure Arabic or Hebrew text without numbers
are also mirrored.  The same would apply to N'Ko, where numbers are
also right-to-left.

Please remind us of the purpose of RLO and LRO.  Are you suggesting
that their use may be 'out of scope' in some contexts?

Recall Bidi rule L4:

"A character is depicted by a mirrored glyph if and
only if (a) the resolved directionality of that character is R, and (b)
the Bidi_Mirrored property value of that character is Yes.

The Bidi_Mirrored property is defined by Section 4.7, Bidi Mirrored of
[Unicode]; the property values are specified in [UCD]. This rule can be
overridden in certain cases; see HL6."
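Read literally, L4 reduces to a two-part test. A minimal sketch, assuming the resolved embedding level has already been computed by the earlier rules (not shown); the function name is ours. An odd resolved level corresponds to a resolved directionality of R.

```python
import unicodedata

def mirrored_by_l4(ch: str, resolved_level: int) -> bool:
    """Rule L4 without higher-level protocols: mirror iff the character's
    resolved directionality is R (odd resolved level) and Bidi_Mirrored=Yes."""
    return resolved_level % 2 == 1 and unicodedata.mirrored(ch) == 1

print(mirrored_by_l4("(", 1))       # parenthesis at an RTL level
print(mirrored_by_l4("\u2190", 1))  # LEFTWARDS ARROW at an RTL level
```

The first call is True and the second False; anything beyond that is left to HL6.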

The higher-level protocols are beyond the control of a supplier of
plain text.  It is not good that they may be kept secret from the
user displaying the text, as would often be the case with a
protocol that says the automatically selected font determines
whether or not to mirror.

> Note that even in scripts that have this
> behavior paleographically, the occurrence of RTL versus LTR versions
> may differ statistically over time and eventually die out in favor
> of one direction or the other. See Old Italic. For that matter,
> see ancient Greek, which had RTL, LTR, and boustrophedon, but
> which eventually settled on strictly LTR layout.

The question is about controlling mirroring when the 'abnormal'
direction (largely as defined by the UCD) is used, not whether it is
used.

Richard.

From richard.wordingham at ntlworld.com  Fri Jul 24 15:23:52 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 24 Jul 2015 21:23:52 +0100
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <55B2713E.4030006@ix.netcom.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com>
Message-ID: <20150724212352.101e5030@JRWUBU2>

On Fri, 24 Jul 2015 10:09:18 -0700
Asmus Freytag <asmusf at ix.netcom.com> wrote:

> On 7/24/2015 2:59 AM, Frédéric Grosshans wrote:
> > Let me rephrase my remark in a less “stupid and dangerous” way.
> >
> >    If a LTR character has the BidiMirrored=No property, it may
> > either be mirrored or not when typeset in RTL, depending on other
> > factors. Specifically, the BidiMirrored property has not been
> > specified for ancient LTR scripts which are mirrored when RTL or
> > boustrophedon, like Italic, Runic, Archaic Greek, Archaic Latin,
> > Egyptian Hieroglyphs. Note that some RTL scripts, like Old North
> > Arabian, are mirrored when LTR. 

> We do want "BidiMirrored=No" to be honored; for example for the
> arrows and the ornate parens. And we do not want that to be overridden.

And at present, that may be overridden in a right-to-left context!

I think Frédéric meant Bidi_Class=Left_To_Right by 'LTR', in which
case the only paired arrows included are U+2347 APL FUNCTIONAL SYMBOL
QUAD LEFTWARDS ARROW and U+2348 APL FUNCTIONAL SYMBOL QUAD
RIGHTWARDS ARROW.  It's definitely appropriate for U+101D9 PHAISTOS DISC
SIGN ARROW.

> The issue with the ancient scripts (or any script used to capture
> paleographic texts) seems to be primarily with letter shapes, not punctuation,
> and further would apply only to unpaired forms.
> 
> A carefully written note would keep in scope all paired characters.
> 
> It would be nice if there was a property that covered them, but I'm
> afraid that BidiMirroringGlyph does not cover the character pairs to
> use when BidiMirrored=No and code points need to be substituted to
> get the RTL layout correct. That kind of property would be useful for
> modern text, e.g. to allow support for automatic re-layout from RTL
> to LTR and vice versa for texts containing arrows.

Microsoft has frozen BidiMirroringGlyph.  Text rendering honours it up
to Unicode 5.1 (I think), but thereafter it's up to the font.  That may
be appropriate for some bidirectional writing systems - I dimly recall
that mirroring had a tendency to fail with some letters.

> Declaring all unpaired code points overridable "in certain contexts"
> or "depending on other factors" might then work.

I think a stronger indication is needed.  U+2044 FRACTION SLASH had
better not be overridable between European numbers or between Arabic
numbers, for with a generally linear layout the number on the left is
the numerator and the number on the right is the denominator.  Am I
missing something on the options for this character in a wider
right-to-left context?  A sequence looking like (numerator, on right)
(backslash) (denominator, on left) seems to be known in Arabic maths.
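Why the fraction slash matters here can be sketched with the UCD values: U+2044 has Bidi_Class CS and Bidi_Mirrored=No. Under rule W4 a single CS between two numbers of the same type takes that type, so a fraction resolves as one numeric run, which the UBA displays left to right even in right-to-left text. The W4 below is deliberately simplified (ES handling and the earlier W rules are omitted), and the function name is ours.

```python
import unicodedata

def apply_w4_simplified(classes):
    """Simplified rule W4: a single CS between two numbers of the same type
    (EN or AN) takes that type. ES handling and earlier rules are omitted."""
    out = list(classes)
    for i in range(1, len(classes) - 1):
        prev, cur, nxt = classes[i - 1], classes[i], classes[i + 1]
        if cur == "CS" and prev == nxt and prev in ("EN", "AN"):
            out[i] = prev
    return out

# European digits and Arabic-Indic digits around FRACTION SLASH.
for text in ["1\u20442", "\u0661\u2044\u0662"]:
    classes = [unicodedata.bidirectional(c) for c in text]
    print(classes, "->", apply_w4_simplified(classes))

print(unicodedata.mirrored("\u2044"))  # Bidi_Mirrored of FRACTION SLASH
```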

I think it is useful to gather the information together in one list,
albeit informative.

Richard.


From eliz at gnu.org  Sat Jul 25 02:14:58 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 25 Jul 2015 10:14:58 +0300
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <20150724212352.101e5030@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
Message-ID: <83380c3dzh.fsf@gnu.org>

> Date: Fri, 24 Jul 2015 21:23:52 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> On Fri, 24 Jul 2015 10:09:18 -0700
> Asmus Freytag <asmusf at ix.netcom.com> wrote:
> 
> > On 7/24/2015 2:59 AM, Frédéric Grosshans wrote:
> > > Let me rephrase my remark in a less “stupid and dangerous” way.
> > >
> > >    If a LTR character has the BidiMirrored=No property, it may
> > > either be mirrored or not when typeset in RTL, depending on other
> > > factors. Specifically, the BidiMirrored property has not been
> > > specified for ancient LTR scripts which are mirrored when RTL or
> > > boustrophedon, like Italic, Runic, Archaic Greek, Archaic Latin,
> > > Egyptian Hieroglyphs. Note that some RTL scripts, like Old North
> > > Arabian, are mirrored when LTR. 
> 
> > We do want "BidiMirrored=No" to be honored; for example for the
> > arrows and the ornate parens. And we do not want that to be overridden.
> 
> And at present, that may be overridden in a right-to-left context!

What do you mean by "overridden" in this context?  AFAIK, mirroring
indeed depends on context, but a character whose BidiMirrored
property is No will _never_ be mirrored, according to the UBA.  There
are no overrides for that property, AFAIK.

From richard.wordingham at ntlworld.com  Sat Jul 25 02:17:07 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 25 Jul 2015 08:17:07 +0100
Subject: BidiMirrored property  and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <9f243f7007da4b13aece8f6226cf3a2c@DFM-TK5MBX15-06.exchange.corp.microsoft.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B26795.3020209@att.net> <20150724202958.5bf3399f@JRWUBU2>
 <9f243f7007da4b13aece8f6226cf3a2c@DFM-TK5MBX15-06.exchange.corp.microsoft.com>
Message-ID: <20150725081707.66c86ac6@JRWUBU2>

On Fri, 24 Jul 2015 23:11:24 +0000
Murray Sargent <murrays at exchange.microsoft.com> wrote:

> Richard questions when mirroring is used. As Ken points out, in
> modern BiDi text, such as Arabic and Hebrew, the answer is given by
> the Unicode BiDi Algorithm and associated tables. In ancient scripts
> and in Boustrophedon, it's given by a higher level protocol.

Do you just mean it's determined by the font?  Please give me an actual
example of any other higher level protocol. 

So far as I am aware, in OpenType, anything beyond the Unicode 5.1 Bidi
Mirroring Glyph property actually resides in the font, in features
ltrm, ltra, rtlm and rtla. According to the documentation
(https://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl),
these features are applied automatically whenever mirroring appears
to be appropriate for a run.  I'd guess that this means for a resolved
level greater than zero.

Boustrophedon could be given by a higher level protocol.  Are there any
examples of such a higher level protocol?  There are issues with
DIY implementations - text with commas and contour integrals would come
unstuck!  More seriously, I believe there may be a minority of letters
which don't mirror in writing systems where mirroring otherwise happens.

> The UBA wasn't designed to handle mirroring for those scripts.

It wasn't designed for N'Ko or Kharoshthi either.  It just happens
to work for them as well.  What is true is that the properties of
Egyptian hieroglyphs weren't designed to work with the UBA.  It is
also true that they weren't set up to work as a script - they're
currently more like mathematical symbols.  As I've noted, there is
work in progress to enable the writing of 'plain-text' Egyptian. 

The UBA's got LRO and RLO.  While not ideal, they ought to work for
scripts where both directions are regularly used. Of course, footnote
numbers would complicate matters, but here we would be getting away
from plain text.

Richard.

From richard.wordingham at ntlworld.com  Sat Jul 25 02:44:22 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 25 Jul 2015 08:44:22 +0100
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <83380c3dzh.fsf@gnu.org>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org>
Message-ID: <20150725084422.2b96491b@JRWUBU2>

On Sat, 25 Jul 2015 10:14:58 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > Asmus Freytag <asmusf at ix.netcom.com> wrote:

> > > We do want "BidiMirrored=No" to be honored; for example for the
> > > arrows and the ornate parens. And we do not want that to be
> > > overridden.
 
> > And at present, that may be overridden in a right-to-left context!

> What do you mean by "overridden" in this context?  AFAIK, mirroring
> indeed depends on context, but a character whose BidiMirrored
> property is No will _never_ be mirrored, according to the UBA.  There
> are no overrides for that property, AFAIK.

Reread the Bidi algorithm, especially
http://www.unicode.org/reports/tr9/#L4 and
http://www.unicode.org/reports/tr9/#HL6.

In principle, I could have a higher-level protocol that mirrors lamedh
on Wednesdays, but I must follow the rules for parentheses. 

It's part of the tendency to write specifications as 'Do what you want,
but we recommend...'.  It eliminates non-compliances without increasing
compatibility.

Richard.

From eliz at gnu.org  Sat Jul 25 02:51:19 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 25 Jul 2015 10:51:19 +0300
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <20150725084422.2b96491b@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
Message-ID: <83r3nw1xqg.fsf@gnu.org>

> Date: Sat, 25 Jul 2015 08:44:22 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> On Sat, 25 Jul 2015 10:14:58 +0300
> Eli Zaretskii <eliz at gnu.org> wrote:
> 
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > > Asmus Freytag <asmusf at ix.netcom.com> wrote:
> 
> > > > We do want "BidiMirrored=No" to be honored; for example for the
> > > > arrows and the ornate parens. And we do not want that to be
> > > > overridden.
>  
> > > And at present, that may be overridden in a right-to-left context!
> 
> > What do you mean by "overridden" in this context?  AFAIK, mirroring
> > indeed depends on context, but a character whose BidiMirrored
> > property is No will _never_ be mirrored, according to the UBA.  There
> > are no overrides for that property, AFAIK.
> 
> Reread the Bidi algorithm, especially
> http://www.unicode.org/reports/tr9/#L4 and
> http://www.unicode.org/reports/tr9/#HL6.
> 
> In principle, I could have a higher-level protocol that mirrors lamedh
> on Wednesdays, but I must follow the rules for parentheses. 

I don't see how this is related.  What HL6 describes is something that
should make sense.  For example, Emacs uses '/' as a kind of
"mirrored" '\', when it needs to indicate that a line in an R2L
paragraph is continued on the next screen line.

By contrast, indiscriminately mirroring random characters that don't
really have mirrored glyphs, in the context of modern scripts, doesn't
make any sense, IMO, so it should never be done.

> It's part of the tendency to write specifications as 'Do what you want,
> but we recommend...'.  It eliminates non-compliances without increasing
> compatibility.

Just say no.

From richard.wordingham at ntlworld.com  Sat Jul 25 04:11:02 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 25 Jul 2015 10:11:02 +0100
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <83r3nw1xqg.fsf@gnu.org>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org>
Message-ID: <20150725101102.16dbf4ed@JRWUBU2>

On Sat, 25 Jul 2015 10:51:19 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > Reread the Bidi algorithm, especially
> > http://www.unicode.org/reports/tr9/#L4 and
> > http://www.unicode.org/reports/tr9/#HL6.
> > 
> > In principle, I could have a higher-level protocol that mirrors
> > lamedh on Wednesdays, but I must follow the rules for parentheses. 
> 
> I don't see how this is related.  What HL6 describes is something that
> should make sense.  For example, Emacs uses '/' as a kind of
> "mirrored" '\', when it needs to indicate that a line in an R2L
> paragraph is continued on the next screen line.

HL6 reads:

"Certain characters that do not have the Bidi_Mirrored property can also
be depicted by a mirrored glyph in specialized contexts. Such contexts
include, but are not limited to, historic scripts and associated
punctuation, private-use characters, and characters in mathematical
expressions. (See Section 7, Mirroring.) These characters are those
that fit at least one of the following conditions:

1) Characters with a resolved directionality of R
2) Characters with a resolved directionality of L and whose
bidirectional type is R or AL"
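The argument that follows can be checked mechanically. Below is a literal Python transcription of the two HL6 conditions as quoted (the function name is ours, and `resolved` is assumed to be the already-resolved direction, "L" or "R"). A Hebrew lamedh qualifies whatever its resolved direction, and so does an arrow resolved to R. Whether a higher-level protocol *should* use that latitude is what is in dispute.

```python
import unicodedata

def hl6_may_mirror(ch: str, resolved: str) -> bool:
    """Literal reading of the two HL6 conditions.
    `resolved` is the character's resolved directionality, "L" or "R"."""
    if unicodedata.mirrored(ch):
        return False  # Bidi_Mirrored=Yes characters are governed by L4 instead
    if resolved == "R":  # condition 1
        return True
    # condition 2: resolved L, but the character's own Bidi_Class is R or AL
    return resolved == "L" and unicodedata.bidirectional(ch) in ("R", "AL")

LAMED = "\u05DC"  # HEBREW LETTER LAMED, Bidi_Class R
print(hl6_may_mirror(LAMED, "R"), hl6_may_mirror(LAMED, "L"))
print(hl6_may_mirror("\u2192", "R"), hl6_may_mirror("a", "L"))
```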

The logic of my statement is as follows:

a) 'Specialised contexts' is undefined; 'specialised context' may
therefore include 'whenever I see fit'.
b) The bidirectional type of lamedh is 'R', and it will always have
a resolved directionality.  The resolved directionalities are 'L' and
'R'.
c) Therefore I may choose to mirror all lamedhs on Wednesdays.

Similarly, an arrow with a resolved directionality of R may be mirrored
if a higher level protocol so dictates.

The issue lies with the wording of condition (1).  One might expect it
to apply only to characters with a bidirectional type of L.  That should
work for text whose directionality is known when written.

It would be interesting to hear the rationale for the wording.  My
surmise is that it attempts to address text whose directionality is not
known before rendering.  The most obvious example would be where an
application is laying out boustrophedon text. The author would not
be able to correctly choose between COMMA and REVERSED COMMA (an
anachronistic example) depending on text direction if line-breaks were
not fixed.

Richard.

From eliz at gnu.org  Sat Jul 25 04:52:53 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 25 Jul 2015 12:52:53 +0300
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <20150725101102.16dbf4ed@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
Message-ID: <83mvyk1s3u.fsf@gnu.org>

> Date: Sat, 25 Jul 2015 10:11:02 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> On Sat, 25 Jul 2015 10:51:19 +0300
> Eli Zaretskii <eliz at gnu.org> wrote:
> 
> > > Reread the Bidi algorithm, especially
> > > http://www.unicode.org/reports/tr9/#L4 and
> > > http://www.unicode.org/reports/tr9/#HL6.
> > > 
> > > In principle, I could have a higher-level protocol that mirrors
> > > lamedh on Wednesdays, but I must follow the rules for parentheses. 
> > 
> > I don't see how this is related.  What HL6 describes is something that
> > should make sense.  For example, Emacs uses '/' as a kind of
> > "mirrored" '\', when it needs to indicate that a line in an R2L
> > paragraph is continued on the next screen line.
> 
> HL6 reads:
> 
> "Certain characters that do not have the Bidi_Mirrored property can also
> be depicted by a mirrored glyph in specialized contexts. Such contexts
> include, but are not limited to, historic scripts and associated
> punctuation, private-use characters, and characters in mathematical
> expressions. (See Section 7, Mirroring.) These characters are those
> that fit at least one of the following conditions:
> 
> 1) Characters with a resolved directionality of R
> 2) Characters with a resolved directionality of L and whose
> bidirectional type is R or AL"

Yes.

> The logic of my statement is as follows:
> 
> a) 'Specialised contexts' is undefined; 'specialised context' may
> therefore include 'whenever I see fit'.

No.  HLn clauses are for implementations that use their specialized
logic on top of the UBA-mandated behavior.  That logic must make
sense, in the context of the implemented functionality.  "Whenever I
see fit" doesn't fulfill that requirement, certainly not when the
implementation has anything to do with presenting human-readable text.

> b) The bidirectional type of lamedh is 'R', and it will always have
> a resolved directionality.  The resolved directionalities are 'L' and
> 'R'.

But it doesn't have a mirrored glyph, at least not in most fonts.

> c) Therefore I may choose to mirror all lamedhs on Wednesdays.

If your implementation's purpose is to illustrate random permutations
of glyphs, or artificially scrambling the text appearance, maybe.  But
if the implementation's purpose is to present a legible text using
that character in some modern script, then no, it makes no sense and
would be perceived as a bug.  Although it'd probably be rendered "not
guilty for lack of evidence" in a court of UBA law.

> Similarly, an arrow with a resolved directionality of R may be mirrored
> if a higher level protocol so dictates.

Again, you'd have to present a protocol that makes sense in the
context of the specific implementation.  Otherwise, it's a bug.

> The issue lies with the wording of condition (1).  One might expect it
> to apply only to characters with a bidirectional type of L.

I see no reason to restrict this to L characters.  I'd be interested
to hear your rationale for that.

> My surmise is that it attempts to address text whose directionality
> is not known before rendering.

Indeed, UBA mirroring is only relevant to neutral characters.

> The most obvious example would be where an application is laying out
> boustrophedon text.

I don't think so.  I agree with those who maintain that boustrophedon
is unidirectional text, and so out of scope for the UBA.

From charupdate at orange.fr  Sat Jul 25 05:38:39 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 25 Jul 2015 12:38:39 +0200 (CEST)
Subject: Plain text custom fraction input
Message-ID: <1685728552.8207.1437820719699.JavaMail.www@wwinf1e23>

On 24 Jul 2015, at 21:24, Doug Ewell  wrote:

> It's not a matter of one being plain text and the other not. Read
> Section 3.7, "Decomposition" [1] to learn about canonical and
> compatibility decomposition.
> 
> In general, the Glossary [2] and FAQ [3] are useful resources.
> 
> [1] http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G729
> [2] http://www.unicode.org/glossary/
> [3] http://www.unicode.org/faq/

Thank you, this is indeed indispensable to know. I'll try to find the time to learn thoroughly how it works and how not to misuse terminology.

Best regards,

Marcel 

P.S.: Below I'll try to restore my last e-mail as it was when I wrote it in plain text, before applying the font formatting. Using lt/gt as angle brackets is very tricky because engines may mistake them for HTML tags and make the whole thing disappear. We can type them as & l t ; and & g t ; but this is not safe. Now I'll use curly and square brackets.


> Marcel Schneider wrote:
> 
> > Representing fractions as {fraction} [digit] U+2044 [digit] is known as a compatibility mapping,
> > just like representing a superscript as {super} [digit], while (I go on checking
> > my knowledge...) representing a precomposed diacriticized letter as [letter] [combining diacritic] is
> > known as a decomposition mapping. The difference between the two ways
> > of getting the same thing is in plain text. With decomposition we stay
> > in plain text, while compatibility mappings need formatting, thus
> > leaving the field of plain text. 
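The distinction Doug points to is recorded in the Unicode Character Database itself: a compatibility mapping carries a tag such as <fraction> or <super>, while a canonical mapping has no tag. A minimal sketch using Python's standard unicodedata module; the helper name mapping_kind is illustrative only.

```python
import unicodedata

def mapping_kind(ch: str) -> str:
    """Classify a character's decomposition mapping as recorded in the UCD."""
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        return "none"
    # Compatibility mappings begin with a formatting tag such as <fraction>.
    if decomp.startswith("<"):
        return "compatibility " + decomp.split(">")[0] + ">"
    return "canonical"

for ch in ("\u00BD", "\u00B2", "\u00E9"):
    print(f"U+{ord(ch):04X}", unicodedata.decomposition(ch), "->", mapping_kind(ch))

# NFD applies only canonical mappings; NFKD also applies compatibility ones.
print(unicodedata.normalize("NFD", "\u00BD") == "\u00BD")      # True: unchanged
print(unicodedata.normalize("NFKD", "\u00BD") == "1\u20442")   # True: 1 FRACTION SLASH 2
```

Both results are ordinary character sequences: NFKD turns U+00BD into the plain-text sequence 1, U+2044, 2, and the <fraction> tag only records the formatting distinction that the mapping discards.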

From richard.wordingham at ntlworld.com  Sat Jul 25 08:36:51 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 25 Jul 2015 14:36:51 +0100
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <83mvyk1s3u.fsf@gnu.org>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org>
Message-ID: <20150725143651.059e466a@JRWUBU2>

On Sat, 25 Jul 2015 12:52:53 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > Date: Sat, 25 Jul 2015 10:11:02 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > 
> > On Sat, 25 Jul 2015 10:51:19 +0300
> > Eli Zaretskii <eliz at gnu.org> wrote:

> If your implementation's purpose is to illustrate random permutations
> of glyphs, or artificially scrambling the text appearance, maybe.

Obviously the purpose would be to demonstrate that a cart and horses
can be driven through the Unicode standard.

> But
> if the implementation's purpose is to present a legible text using
> that character in some modern script, then no, it makes no sense and
> would be perceived as a bug.  Although it'd probably be rendered "not
> guilty for lack of evidence" in a court of UBA law.

No, it should be "not guilty because acting lawfully".

> > Similarly, an arrow with a resolved directionality of R may be
> > mirrored if a higher level protocol so dictates.
> 
> Again, you'd have to present a protocol that makes sense in the
> context of the specific implementation.  Otherwise, it's a bug.

No, it's a feature. :-)  It's only a bug if there's a requirement to be
fit for purpose.  If the purpose of the implementation is to gobble up
disk space, then it's not a bug.

> > The issue lies with the wording of condition (1).  One might expect
> > it to apply only to characters with a bidirectional type of L.

> I see no reason to restrict this to L characters.  I'd be interested
> to hear your rationale for that.

A) A strong character's form in the corresponding directional context
is the form identified by the Unicode charts.  If it is of type AL or
R, it will, by definition, not be mirrored.

B) A weak or neutral character's form in the charts is the form that
occurs in the left-to-right direction.  Such a character has
Bidi-mirrored set to Yes if it has different forms for left-to-right and
right-to-left.  By rule L4, it will be mirrored if it receives a
resolved direction of R.

C) A character of type L may need to be mirrored if it receives a
resolved directionality of R.  The most notable example is Egyptian
hieroglyphs, but the same applies to Greek.

There is a definite hole in my argument for non-spacing marks; marks
used primarily in the Arabic script are shown in a form they take in a
right-to-left context.
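Point B can be made concrete. Python's unicodedata exposes the Bidi_Mirrored property but not Bidi_Mirroring_Glyph, so the sketch below hard-codes a few pairs as an assumption. The table MIRROR_PAIRS and the function apply_l4 are hypothetical. Characters that are Bidi_Mirrored but lack a partner are flagged for the renderer, in the way BidiMirroring.txt describes.

```python
import unicodedata

# Hypothetical excerpt of Bidi_Mirroring_Glyph pairs; a real implementation
# would load the full table from BidiMirroring.txt.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}

def apply_l4(chars, levels):
    """Apply rule L4 to characters with their resolved embedding levels.

    A character is mirrored if its resolved direction is R (odd level) and
    its Bidi_Mirrored property is Yes. Returns the resulting string and the
    indices of mirrored characters that have no partner, whose mirrored glyph
    the rendering system must supply itself.
    """
    out, needs_glyph = [], []
    for i, (ch, level) in enumerate(zip(chars, levels)):
        if level % 2 == 1 and unicodedata.mirrored(ch):
            if ch in MIRROR_PAIRS:
                out.append(MIRROR_PAIRS[ch])   # substitute the partner glyph
            else:
                out.append(ch)                 # renderer must mirror the glyph
                needs_glyph.append(i)
        else:
            out.append(ch)
    return "".join(out), needs_glyph

print(apply_l4("(a)", [1, 1, 1]))   # (')a(', [])
print(apply_l4("(a)", [0, 0, 0]))   # ('(a)', [])
```

Note that only the Bidi_Mirrored flag is consulted, so under this rule alone a letter of type L is never mirrored, which is the gap point C is about.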

> 
> > My surmise is that it attempts to address text whose directionality
> > is not known before rendering.
> 
> Indeed, UBA mirroring is only relevant to neutral characters.

Then how do you explain condition (2):

"Characters with a resolved directionality of L and whose
bidirectional type is R or AL"

Obviously these characters are not neutral characters.  The only way
they can acquire a resolved directionality of R is by application of
RLO. 
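The two quoted conditions can be written as a predicate. This is only a sketch. resolved_dir stands for the outcome of a prior UBA run, which is not computed here, and the name may_mirror_by_protocol is hypothetical. Condition (2) can only be met by an R or AL character whose resolved direction has been forced to L, for example by LRO.

```python
import unicodedata

def may_mirror_by_protocol(ch: str, resolved_dir: str) -> bool:
    """Return True if ch meets at least one of the two quoted conditions.

    resolved_dir is "L" or "R" and is assumed to come from a prior UBA run.
    """
    bidi_class = unicodedata.bidirectional(ch)
    cond1 = resolved_dir == "R"                                 # condition (1)
    cond2 = resolved_dir == "L" and bidi_class in ("R", "AL")   # condition (2)
    return cond1 or cond2

# A Hebrew letter forced to L (e.g. by LRO) meets condition (2);
# a Latin letter in its natural L context meets neither condition.
print(may_mirror_by_protocol("\u05D0", "L"))   # True
print(may_mirror_by_protocol("a", "L"))        # False
```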

> I don't think so.  I agree with those who maintain that boustrophedon
> is unidirectional text, and so out of scope for the UBA.

There are three main parts to the UBA:

1) Interpreting the text as nested runs of text in the same order.

2) Sorting out the left-to-right order in which to write them (L2)

3) Sorting out mirroring (L4)
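Part (2), rule L2, is short enough to sketch directly. It assumes the resolved embedding levels for one line are already available, since computing them is part (1). It then reverses runs from the highest level down to the lowest odd level. The function name reorder_line is illustrative.

```python
def reorder_line(chars: str, levels: list) -> str:
    """Return the visual (left-to-right) order of one line per UBA rule L2."""
    if not chars:
        return ""
    order = list(range(len(chars)))   # logical indices, rearranged in place
    lv = list(levels)                 # levels travel with their characters
    highest = max(lv)
    lowest = min(lv)
    lowest_odd = lowest if lowest % 2 == 1 else lowest + 1
    # From the highest level down to the lowest odd level, including levels
    # not present in the text, reverse every maximal run at that level or higher.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(lv):
            if lv[i] >= level:
                j = i
                while j < len(lv) and lv[j] >= level:
                    j += 1
                order[i:j] = order[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars[k] for k in order)

# A right-to-left line (level 1) with an embedded left-to-right run "bc" (level 2).
print(reorder_line("abcd", [1, 2, 2, 1]))   # dbca
```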

Interpreting LRO and RLO is part of (1).  I'd like to know what the
justification for having directionality overrides is.

Now, ancient boustrophedon text, to the best of my knowledge, does not
need parts 1 and 2.  Modern numerical place notation would be a problem
when writing boustrophedon.  Boustrophedon starts from the assumption
that text has an order from start to finish, but numbers in place
notation have a left and a right.

Where we may part company is in our view of Hebrew text (no Arabic
numbers) with parentheses in a right-to-left paragraph.  I think such
text is really just as unidirectional as equivalent Latin text in a
left-to-right paragraph.  However, one needs the UBA to sort out the
rendering of the parentheses in the Hebrew text.  Indeed, one may rely
on the bidi algorithm to declare the Latin example unidirectional.

If one can determine that text to be rendered boustrophedon is genuinely
'unidirectional', it seems entirely reasonable to call upon the Bidi
algorithm to sort out the mirroring of glyphs on a *line* once one has
chosen the direction of a line.

Where we may have a problem is that the Latin and Hebrew commas have
the same codepoint, *despite* having the same appearance.

What I can accept is that handling a mixture of boustrophedon,
left-to-right and right-to-left text is too much to ask of the Bidi
algorithm.  The very first problem is that of defining what would
constitute unidirectional boustrophedon text.

Richard.

From eliz at gnu.org  Sat Jul 25 09:26:14 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 25 Jul 2015 17:26:14 +0300
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text
 custom fraction input)
In-Reply-To: <20150725143651.059e466a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2>
Message-ID: <83fv4c1fg9.fsf@gnu.org>

> Date: Sat, 25 Jul 2015 14:36:51 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> > > The issue lies with the wording of condition (1).  One might expect
> > > it to apply only to characters with a bidirectional type of L.
> 
> > I see no reason to restrict this to L characters.  I'd be interested
> > to hear your rationale for that.
> 
> A) A strong character's form in the corresponding directional context
> is the form identified by the Unicode charts.  If it is of type AL or
> R, it will, by definition, not be mirrored.
> 
> B) A weak or neutral character's form in the charts is the form that
> occurs in the left-to-right direction.  Such a character has
> Bidi-mirrored set to Yes if it has different forms for left-to-right and
> right-to-left.  By rule L4, it will be mirrored if it receives a
> resolved direction of R.
> 
> C) A character of type L may need to be mirrored if it receives a
> resolved directionality of R.  The most notable example is Egyptian
> hieroglyphs, but the same applies to Greek.

Mirroring is not changing a character's shape.  It is a replacement of
a character's glyph with a glyph of a different character.

Thus, your reasons make no sense to me, because a character's shape,
any character's shape, be it L, R, AL, or anything else, is immutable.

> There is a definite hole in my argument for non-spacing marks; marks
> used primarily in the Arabic script are shown in a form they take in a
> right-to-left context.

I don't think it's a hole.  I think your interpretation of this is
entirely wrong.

> > > My surmise is that it attempts to address text whose directionality
> > > is not known before rendering.
> > 
> > Indeed, UBA mirroring is only relevant to neutral characters.
> 
> Then how do you explain condition (2):
> 
> "Characters with a resolved directionality of L and whose
> bidirectional type is R or AL"

I never saw an example of it.  Can you show something like that?

Note that those conditions are "at least one of", so they are not all
required to be true at the same time.

> Obviously these characters are not neutral characters.  The only way
> they can acquire a resolved directionality of R is by application of
> RLO.

You mean, resolved directionality of L and LRO, right?

Anyway, let's talk about a concrete example of applying this rule,
shall we?  I'm guessing this is for some very specific characters in a
script I never used.

> > I don't think so.  I agree with those who maintain that boustrophedon
> > is unidirectional text, and so out of scope for the UBA.
> 
> There are three main parts to the UBA:
> 
> 1) Interpreting the text as nested runs of text in the same order.

I take it that by this you mean resolving the level of each
character.  To me, that is the main part of the UBA; all the rest is
almost trivial.

> 2) Sorting out the left-to-right order in which to write them (L2)
> 
> 3) Sorting out mirroring (L4)
> 
> Interpreting LRO and RLO is part of (1).  I'd like to know what the
> justification for having directionality overrides is.

One justification is when you want to present characters in some
particular order that overrides their innate bidirectional properties.
For example, imagine you want to tell your readers what some
bidirectional text will look like after reordering by the UBA, and you want
to do that without relying on the UBA implementation of whatever
software is used to view your presentation.

> Where we may part company is in our view of Hebrew text (no Arabic
> numbers) with parentheses in a right-to-left paragraph.  I think such
> text is really just as unidirectional as equivalent Latin text in a
> left-to-right paragraph.

No, not as soon as numbers or Latin characters are involved, IMO.

> However, one needs the UBA to sort out the rendering of the
> parentheses in the Hebrew text.

Not really, you can short-cut it, the same as in strictly
left-to-right text.

> Indeed, one may rely on the bidi algorithm to declare the Latin
> example unidirectional.

One might, but to what purpose and goal?

> If one can determine that text to be rendered boustrophedon is genuinely
> 'unidirectional', it seems entirely reasonable to call upon the Bidi
> algorithm to sort out the mirroring of glyphs on a *line* once one has
> chosen the direction of a line.

No, not as soon as characters of different or weak/neutral
directionality are involved, IMO.

From richard.wordingham at ntlworld.com  Sat Jul 25 12:27:26 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 25 Jul 2015 18:27:26 +0100
Subject: BidiMirrored property and ancient scripts
In-Reply-To: <83fv4c1fg9.fsf@gnu.org>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2>
 <83fv4c1fg9.fsf@gnu.org>
Message-ID: <20150725182726.533d4b78@JRWUBU2>

On Sat, 25 Jul 2015 17:26:14 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > Date: Sat, 25 Jul 2015 14:36:51 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > 
> > > > The issue lies with the wording of condition (1).  One might
> > > > expect it to apply only to characters with a bidirectional type
> > > > of L.
> > 
> > > I see no reason to restrict this to L characters.  I'd be
> > > interested to hear your rationale for that.
> > 
> > A) A strong character's form in the corresponding directional
> > context is the form identified by the Unicode charts.  If it is of
> > type AL or R, it will, by definition, not be mirrored.
> > 
> > B) A weak or neutral character's form in the charts is the form that
> > occurs in the left-to-right direction.  Such a character has
> > Bidi-mirrored set to Yes if it has different forms for
> > left-to-right and right-to-left.  By rule L4, it will be mirrored
> > if it receives a resolved direction of R.
> > 
> > C) A character of type L may need to be mirrored if it receives a
> > resolved directionality of R.  The most notable example is Egyptian
> > hieroglyphs, but the same applies to Greek.
> 
> Mirroring is not changing a character's shape.  It is a replacement of
> a character's glyph with a glyph of a different character.

Mirroring is changing a glyph to one suitable for reading in the other
direction.  Note the following extract from BidiMirroring.txt in the
Unicode Character Database:

<quote>
# The following characters have no appropriate mirroring character.
# For these characters it is up to the rendering system 
#   to provide mirrored glyphs.

# 2140; DOUBLE-STRUCK N-ARY SUMMATION
# 2201; COMPLEMENT
# 2202; PARTIAL DIFFERENTIAL
<snip/>
</quote>
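The sketch below assumes the file layout shown in the extract. Data lines have the form "code; mirror # name", and the renderer-dependent characters appear as comment lines "# code; name". Under those assumptions, a small parser can separate characters that have a mirroring partner from those left to the rendering system. The sample text is a hand-made excerpt, not the full file, and the name parse_bidi_mirroring is hypothetical.

```python
import re

SAMPLE = """\
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
# The following characters have no appropriate mirroring character.
# For these characters it is up to the rendering system
#   to provide mirrored glyphs.

# 2140; DOUBLE-STRUCK N-ARY SUMMATION
# 2201; COMPLEMENT
"""

DATA_LINE = re.compile(r"^([0-9A-F]{4,6});\s*([0-9A-F]{4,6})")
NO_PAIR_LINE = re.compile(r"^#\s*([0-9A-F]{4,6});\s*(.+?)\s*$")

def parse_bidi_mirroring(text):
    """Return (pairs, no_pair): mirror partners, and mirrored chars without one."""
    pairs, no_pair = {}, {}
    for line in text.splitlines():
        m = DATA_LINE.match(line)
        if m:
            pairs[chr(int(m.group(1), 16))] = chr(int(m.group(2), 16))
            continue
        m = NO_PAIR_LINE.match(line)
        if m:  # commented-out entry: the renderer must supply the glyph
            no_pair[chr(int(m.group(1), 16))] = m.group(2)
    return pairs, no_pair

pairs, no_pair = parse_bidi_mirroring(SAMPLE)
print(pairs)     # {'(': ')', ')': '('}
print(no_pair)
```

U+2140 lands in the second dictionary: it is still subject to mirroring, but no other character's glyph is available to substitute for it.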

> Thus, your reasons make no sense to me, because a character's shape,
> any character's shape, be it L, R, AL, or anything else, is immutable.

So go back and reread.
 
> > There is a definite hole in my argument for non-spacing marks; marks
> > used primarily in the Arabic script are shown in a form they take
> > in a right-to-left context.
 
> I don't think it's a hole.  I think your interpretation of this is
> entirely wrong.

 
> > > > My surmise is that it attempts to address text whose
> > > > directionality is not known before rendering.
> > > 
> > > Indeed, UBA mirroring is only relevant to neutral characters.
> > 
> > Then how do you explain condition (2):
> > 
> > "Characters with a resolved directionality of L and whose
> > bidirectional type is R or AL"
> 
> I never saw an example of it.  Can you show something like that?

Frédéric gave the example of Old North Arabian - there are samples at
http://www.mnh.si.edu/epigraphy/e_pre-islamic/safaitic.htm


> Note that those conditions are "at least one of", so they are not all
> required to be true at the same time.

Obviously, since a character cannot simultaneously have both resolved
directions.

> > Obviously these characters are not neutral characters.  The only way
> > they can acquire a resolved directionality of R is by application of
> > RLO.
> 
> You mean, resolved directionality of L and LRO, right?

Sorry, you're correct.

> Anyway, let's talk about a concrete example of applying this rule,
> shall we?  I'm guessing this is for some very specific characters in a
> script I never used.

I rather suspect it's for all current characters in a script you never
used.  Given half a chance, a script with weak directionality will be
encoded with Bidi-class L letters.  Old North Arabian has squeezed in
as a right-to-left script.

> > > I don't think so.  I agree with those who maintain that
> > > boustrophedon is unidirectional text, and so out of scope for the
> > > UBA.
> > 
> > There are three main parts to the UBA:
> > 
> > 1) Interpreting the text as nested runs of text in the same order.
> 
> I take it that by this you mean resolving the level of each
> character.  To me, that is the main part of the UBA; all the rest is
> almost trivial.

The nesting is implied by the levels, but the levels are just a means
to store the nesting and an elegant way of storing the direction.
There is a distressing tendency of Unicode algorithms to just record
the algorithm, rather than to explain what is being done.  Perfectly
intelligible steps can end up looking like an arcane dance.

> > 2) Sorting out the left-to-right order in which to write them (L2)
> > 
> > 3) Sorting out mirroring (L4)
> > 
> > Interpreting LRO and RLO is part of (1).  I'd like to know what the
> > justification for having directionality overrides is.
> 
> One justification is when you want to present characters in some
> particular order that overrides their innate bidirectional properties.
> For example, imagine you want to tell your readers what some
> bidirectional text will look like after reordering by the UBA, and you want
> to do that without relying on the UBA implementation of whatever
> software is used to view your presentation.

Brute force layout!  That makes it seem that overriding strong types
was an error that leaves people hoping for support for switching text
direction.

> > Where we may part company is in our view of Hebrew text (no Arabic
> > numbers) with parentheses in a right-to-left paragraph.  I think
> > such text is really just as unidirectional as equivalent Latin text
> > in a left-to-right paragraph.
> 
> No, not as soon as numbers or Latin characters are involved, IMO.

My example, which your e-mail client may take as being in a
left-to-right paragraph, is:

????? ????? / ???? (?? ?????? ?????? ??? ??????)

> > However, one needs the UBA to sort out the rendering of the
> > parentheses in the Hebrew text.

> Not really, you can short-cut it, the same as in strictly
> left-to-right text.

It's the UBA that mandates that the opening and closing parentheses be
rendered like right and left parentheses respectively rather than like
left and right parentheses.  I think it may be compatible with the 
character identity for the U+0028 glyph to be marked with a tiny 'o'
regardless of whether it broadly looks like a left or a right
parenthesis. 

> > Indeed, one may rely on the bidi algorithm to declare the Latin
> > example unidirectional.
> 
> One might, but to what purpose and goal?

A right-to-left paragraph consisting of the two characters "(a" would
be bidirectional and have a parenthesis on the right; a left-to-right
paragraph with the same content would have a parenthesis on the left.

The e-mail client I'm using has no higher-level protocol to determine
whether a paragraph is left-to-right or right-to-left, but uses the
first strong character.  Notepad (Windows 7, at least) seems to have two
options - all paragraphs are left-to-right, or all paragraphs are
right-to-left.  

> > If one can determine that text to be rendered boustrophedon is
> > genuinely 'unidirectional', it seems entirely reasonable to call
> > upon the Bidi algorithm to sort out the mirroring of glyphs on a
> > *line* once one has chosen the direction of a line.
> 
> No, not as soon as characters of different or weak/neutral
> directionality are involved, IMO.

If the paragraph contains any digits, it is not genuinely
unidirectional.  If it is, and there are no unmatched PDF characters,
one can just prefix LRO or RLO to each line to get the right
directionality.  If there are strong characters of different
directionalities, then it is unlikely that the paragraph is genuinely
unidirectional.  The full tridirectional (left, right and
boustrophedon) algorithm is likely to be extremely fiddly, as well as
dependent on non-existent information.
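Richard's recipe can be sketched under his stated preconditions. Two parts are assumptions of this sketch and do not come from the message. The unidirectionality test rejects any digit of class EN or AN and any mix of L with R/AL strong characters. The PDF check rejects every PDF, which is stricter than "no unmatched PDF". Closing each line with PDF is also an added choice, so that one line's override does not leak into the next. The helper names are hypothetical.

```python
import unicodedata

LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"

def is_genuinely_unidirectional(text: str) -> bool:
    """Heuristic: no digits, and no mixture of L with R/AL strong characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    if classes & {"EN", "AN"}:
        return False
    return not ("L" in classes and classes & {"R", "AL"})

def boustrophedon_lines(lines, first_ltr=True):
    """Prefix alternating LRO/RLO overrides to lines that are already broken."""
    if any(PDF in line for line in lines):
        raise ValueError("text already contains PDF characters")
    if not is_genuinely_unidirectional("".join(lines)):
        raise ValueError("text is not genuinely unidirectional")
    out = []
    for n, line in enumerate(lines):
        ltr = (n % 2 == 0) == first_ltr
        out.append((LRO if ltr else RLO) + line + PDF)
    return out

print([repr(s) for s in boustrophedon_lines(["abc", "def", "ghi"])])
```

Under RLO every character on the line gets a resolved direction of R, so Bidi_Mirrored characters are mirrored by L4, and letters become eligible for mirroring under condition (1).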

Richard.


From eliz at gnu.org  Sat Jul 25 13:05:41 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 25 Jul 2015 21:05:41 +0300
Subject: BidiMirrored property and ancient scripts
In-Reply-To: <20150725182726.533d4b78@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2>
 <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2>
Message-ID: <831tfw15ai.fsf@gnu.org>

> Date: Sat, 25 Jul 2015 18:27:26 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> > Mirroring is not changing a character's shape.  It is a replacement of
> > a character's glyph with a glyph of a different character.
> 
> Mirroring is changing a glyph to one suitable for reading in the other
> direction.

Sorry, I disagree.

> Note the following extract from BidiMirroring.txt in the
> Unicode Character Database:
> 
> <quote>
> # The following characters have no appropriate mirroring character.
> # For these characters it is up to the rendering system 
> #   to provide mirrored glyphs.

How's that a contradiction to what I said?

> > Thus, your reasons make no sense to me, because a character's shape,
> > any character's shape, be it L, R, AL, or anything else, is immutable.
> 
> So go back and reread.

Did that; still no sense.

> > > Interpreting LRO and RLO is part of (1).  I'd like to know what the
> > > justification for having directionality overrides is.
> > 
> > One justification is when you want to present characters in some
> > particular order that overrides their innate bidirectional properties.
> > For example, imagine you want to tell your readers what some
> > bidirectional text will look like after reordering by the UBA, and you want
> > to do that without relying on the UBA implementation of whatever
> > software is used to view your presentation.
> 
> Brute force layout!  That makes it seem that overriding strong types
> was an error that leaves people hoping for support for switching text
> direction.

No, not at all.  Think various needs of presenting error messages that
quote bidirectional text, etc.  I had plenty of those problems in
Emacs.

> > > Where we may part company is in our view of Hebrew text (no Arabic
> > > numbers) with parentheses in a right-to-left paragraph.  I think
> > > such text is really just as unidirectional as equivalent Latin text
> > > in a left-to-right paragraph.
> > 
> > No, not as soon as numbers or Latin characters are involved, IMO.
> 
> My example, which your e-mail client may take as being in a
> left-to-right paragraph, is:
 
> ????? ????? / ???? (?? ?????? ?????? ??? ??????)

I'm reading this in Emacs, so the layout is R2L, as it should be.

But there are no numbers or Latin characters in this example, so it's
not what I had in mind.

> > > However, one needs the UBA to sort out the rendering of the
> > > parentheses in the Hebrew text.
> 
> > Not really, you can short-cut it, the same as in strictly
> > left-to-right text.
> 
> It's the UBA that mandates that the opening and closing parentheses be
> rendered like right and left parentheses respectively rather than like
> left and right parentheses.

Mirroring comes after layout in the UBA, as you pointed out, and the
short-cuts I mentioned are about layout, not about mirroring.

> > > Indeed, one may rely on the bidi algorithm to declare the Latin
> > > example unidirectional.
> > 
> > One might, but to what purpose and goal?
> 
> A right-to-left paragraph consisting of the two characters "(a" would
> be bidirectional and have a parenthesis on the right; a left-to-right
> paragraph with the same content would have a parenthesis on the left.

I don't see how this answers my question.

> The e-mail client I'm using has no higher-level protocol to determine
> whether a paragraph is left-to-right or right-to-left, but uses the
> first strong character.  Notepad (Windows 7, at least) seems to have two
> options - all paragraphs are left-to-right, or all paragraphs are
> right-to-left.  

Emacs has those 3 options, and it also has a higher-level protocol,
whereby the paragraph direction is only decided after an empty line.

But I still don't see how this is relevant.

From richard.wordingham at ntlworld.com  Sat Jul 25 16:15:40 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 25 Jul 2015 22:15:40 +0100
Subject: BidiMirrored property and ancient scripts
In-Reply-To: <831tfw15ai.fsf@gnu.org>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2>
 <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2>
 <831tfw15ai.fsf@gnu.org>
Message-ID: <20150725221540.72e6ee48@JRWUBU2>

On Sat, 25 Jul 2015 21:05:41 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > Date: Sat, 25 Jul 2015 18:27:26 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > 
> > > Mirroring is not changing a character's shape.  It is a
> > > replacement of a character's glyph with a glyph of a different
> > > character.
> > 
> > Mirroring is changing a glyph to one suitable for reading in the other
> > direction.
> 
> Sorry, I disagree.
> 
> > Note the following extract from BidiMirroring.txt in the
> > Unicode Character Database:
> > 
> > <quote>
> > # The following characters have no appropriate mirroring character.
> > # For these characters it is up to the rendering system 
> > #   to provide mirrored glyphs.
> 
> How's that a contradiction to what I said?

U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its glyph is
not replaced by any other character's glyph.  Or are you claiming that
left-to-right U+2140 and right-to-left U+2140 are two different
characters?

> > > Thus, your reasons make no sense to me, because a character's
> > > shape, any character's shape, be it L, R, AL, or anything else,
> > > is immutable.
> > 
> > So go back and reread.
> 
> Did that; still no sense.

Because you still seem not to understand the concept of mirroring.  It
isn't just for characters that have a Bidi_Mirroring_Glyph property
value other than <none>.

> > > > Where we may part company is in our view of Hebrew text (no
> > > > Arabic numbers) with parentheses in a right-to-left paragraph.
> > > > I think such text is really just as unidirectional as
> > > > equivalent Latin text in a left-to-right paragraph.

> > My example <snip> is:
>  
> > ????? ????? / ???? (?? ?????? ?????? ??? ??????)
> 
> > > > However, one needs the UBA to sort out the rendering of the
> > > > parentheses in the Hebrew text.
> > 
> > > Not really, you can short-cut it, the same as in strictly
> > > left-to-right text.
> > 
> > It's the UBA that mandates that the opening and closing parentheses
> > be rendered like right and left parentheses respectively rather
> > than like left and right parentheses.
> 
> Mirroring comes after layout in the UBA, as you pointed out, and the
> short-cuts I mentioned are about layout, not about mirroring.

So irrelevant.

I take it we now agree that the right shape for the parentheses for the
unidirectional right-to-left example is derived by the UBA.

> > > > Indeed, one may rely on the bidi algorithm to declare the Latin
> > > > example unidirectional.
> > > 
> > > One might, but to what purpose and goal?
> > 
> > A right-to-left paragraph consisting of the two characters "(a"
> > would be bidirectional and have a parenthesis on the right; a
> > left-to-right paragraph with the same content would have a
> > parenthesis on the left.

If there is no higher-level protocol in effect, the 'first strong
character' rule (Rules P2 and P3 of the UBA) declares that the
paragraph will be a left-to-right paragraph and will look
like "(a".  Had it been declared a right-to-left paragraph by a
higher-level protocol, it would look like "a)".  Thus the UBA has a
rôle even for unidirectional left-to-right text.
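Rules P2 and P3, as described here, fit in a few lines. This sketch assumes a single paragraph and no higher-level protocol. As P2 requires, it skips text between an isolate initiator and its matching PDI. The function name paragraph_level is illustrative.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}

def paragraph_level(text: str) -> int:
    """Return 0 (left-to-right) or 1 (right-to-left) using rules P2 and P3."""
    depth = 0
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc in ISOLATE_INITIATORS:
            depth += 1                 # skip everything up to the matching PDI
        elif bc == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0 and bc in ("L", "R", "AL"):
            return 0 if bc == "L" else 1   # P3: first strong character decides
    return 0  # no strong character found: P3 defaults to level 0

print(paragraph_level("(a"))        # 0
print(paragraph_level("(\u05D0"))   # 1
```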

Richard.


From wjgo_10009 at btinternet.com  Sat Jul 25 11:43:09 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sat, 25 Jul 2015 17:43:09 +0100 (BST)
Subject: Emoji characters for food allergens
Message-ID: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost>

Emoji characters for food allergens
An interesting document entitled
Preliminary proposal to add emoji characters for food allergens
by Hiroyuki Komatsu
was added into the UTC (Unicode Technical Committee) Document Register yesterday.
http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf
This is a welcome development.
I suggest that, in view of the importance of precision in conveying information about food allergens, the emoji characters for food allergens should be separate characters from other emoji characters. That is, they should be encoded in a separate, quite distinct block of code points far away in the character map from other emoji characters, with no dual meanings for any of the characters: a character for a food allergen should be quite separate and distinct from a character for any other meaning.
I opine that having two separate meanings for the same character, one meaning as an everyday jolly good fun meaning in a text message and one meaning as a specialist food allergen meaning could be a source of confusion. Far better to encode a separate code block with separate characters right from the start than risk needless and perhaps medically dangerous confusion in the future.
I suggest that for each allergen there be two characters.
The glyph for the first character of the pair goes from baseline to ascender.
The glyph for the second character of the pair is a copy of the glyph for the first character of the pair, augmented with a thick red line from the lower-left descender to the upper right a little above the baseline, the thick red line perhaps being at about thirty degrees from the horizontal. Thus the thick red line would go over the allergen part of the glyph, just clipping it a bit so that clarity is maintained.
The glyphs are thus for the presence of the allergen and the absence of the allergen respectively.
It is typical in the United Kingdom to label food packets not only with an ingredients list but also with a list of allergens in the food and also with a list of allergens not in the food.
For example, a particular food may contain soya yet not gluten.
Thus I opine that two characters are needed for each allergen.
I have deliberately avoided a total strike through at forty-five degrees as I opine that that could lead to problems distinguishing clearly the glyph for the absence of one allergen from the glyph for the absence of another allergen.
I have also wondered whether each glyph for an allergen should include within its glyph a number, maybe a three-digit number, so that clarity is precise.
I opine that two separate characters for each allergen is desirable rather than some solution such as having one character for each allergen and a combining strike through character. 
The two separate characters approach keeps the system straightforward to use with many software packages. The matter of expressing food allergens is far too important to become entangled in problems for everyday users.
For gluten, it might be necessary to have three distinct code points.
In the United Kingdom there is a legal difference between "gluten-free" and "no gluten-containing ingredients".
To be labelled gluten-free the product must have been tested. This is to ensure that there has been no cross-contamination of ingredients. For example, rice has no gluten, but was a particular load of rice transported in a lorry used for wheat on other days?
Yet testing is not always possible in a restaurant situation.
William Overington
25 July 2015

From gwalla at gmail.com  Sun Jul 26 00:05:21 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Sat, 25 Jul 2015 22:05:21 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost>
References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost>
Message-ID: <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com>

On Sat, Jul 25, 2015 at 9:43 AM, William_J_G Overington
<wjgo_10009 at btinternet.com> wrote:
> Emoji characters for food allergens
>
> An interesting document entitled
>
> Preliminary proposal to add emoji characters for food allergens
>
> by Hiroyuki Komatsu
>
> was added into the UTC (Unicode Technical Committee) Document Register
> yesterday.
>
> http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf
>
> This is a welcome development.

I'm skeptical. I understand the rationale, but several of the proposed
characters are essentially SMALL PILE OF BROWN DOTS and would be
difficult to distinguish at typical sizes.

From eliz at gnu.org  Sun Jul 26 10:08:00 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Sun, 26 Jul 2015 18:08:00 +0300
Subject: BidiMirrored property and ancient scripts
In-Reply-To: <20150725221540.72e6ee48@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2>
 <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2>
 <831tfw15ai.fsf@gnu.org> <20150725221540.72e6ee48@JRWUBU2>
Message-ID: <83oaizyn1r.fsf@gnu.org>

> Date: Sat, 25 Jul 2015 22:15:40 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> > > Mirroring is changing a glyph to make it suitable for reading in the other
> > > direction.
> >
> > Sorry, I disagree.
> >
> > > Note the following extract from BidiMirroring.txt in the
> > > Unicode Character Database:
> > >
> > > <quote>
> > > # The following characters have no appropriate mirroring character.
> > > # For these characters it is up to the rendering system
> > > # to provide mirrored glyphs.
> >
> > How's that a contradiction to what I said?
> 
> U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its glyph is
> not replaced by any other character's glyph. Or are you claiming that
> left-to-right U+2140 and right-to-left U+2140 are two different
> characters?

I'm saying that "providing a mirrored glyph" entails coming up with a
character whose glyph can play that role, AFAIU.

If you are saying that the "rendering system" here is the shaping
engine using the rtlm OTF feature, then you are in fact saying that
the mirroring of these characters cannot be implemented with most
fonts in wide use today, and with most shaping engines.  That would be
a very strange claim, IMO, tantamount to saying that those characters
cannot, or don't need to, be mirrored at all in most use cases.

> > > > Thus, your reasons make no sense to me, because a character's
> > > > shape, any character's shape, be it L, R, AL, or anything else,
> > > > is immutable.
> > >
> > > So go back and reread.
> >
> > Did that; still no sense.
> 
> Because you still seem not to understand the concept of mirroring.

I think you will fare much better, and actually stand a chance of
convincing others that you are right, if you assume your opponents do understand
the issues, and just happen to disagree about their interpretation, or
misinterpret what you write.

> It isn't just for characters that have a Bidi_Mirroring_Glyph
> property value other than <none>.

Only "in specialized contexts", like "historic scripts and associated
punctuation, private-use characters, and characters in mathematical
expressions" (I believe the latter is only happening in Arabic
context, if it ever does).  IOW, in extremely rare and marginal use
cases.  And all that is only in HL6, which is really a fire escape
meant for applications whose scope is beyond simple text.  That's a
far cry from boustrophedon, which was the trigger for most of this
exchange.  In all other cases:

  L4. A character is depicted by a mirrored glyph if and only if (a)
  the resolved directionality of that character is R, and (b) the
  Bidi_Mirrored property value of that character is Yes.

That's normative and unequivocal.
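Rule L4 as quoted can be read as a per-character predicate. The following Python sketch assumes the resolved embedding level of each character is already known (an odd level means resolved direction R) and uses the standard library's unicodedata.mirrored() for the Bidi_Mirrored property; the function name needs_mirrored_glyph is hypothetical. It only decides whether a mirrored glyph is required, not which glyph to use.

```python
import unicodedata


def needs_mirrored_glyph(ch: str, resolved_level: int) -> bool:
    """Sketch of UBA rule L4 for a single character.

    resolved_level is assumed to be the character's resolved embedding
    level after the explicit, weak, neutral and implicit rules; odd
    levels have resolved direction R.
    """
    is_rtl = resolved_level % 2 == 1
    # unicodedata.mirrored() returns 1 for Bidi_Mirrored=Yes, else 0.
    return is_rtl and unicodedata.mirrored(ch) == 1
```

For example, needs_mirrored_glyph("(", 1) is True and needs_mirrored_glyph("(", 0) is False. U+222B INTEGRAL also yields True at an odd level, even though it has no Bidi_Mirroring_Glyph value.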

> > > > > However, one needs the UBA to sort out the rendering of the
> > > > > parentheses in the Hebrew text.
> > >
> > > > Not really, you can short-cut it, the same as in strictly
> > > > left-to-right text.
> > >
> > > It's the UBA that mandates that the opening and closing parentheses
> > > be rendered like right and left parentheses respectively rather
> > > than like left and right parentheses.
> >
> > Mirroring comes after layout in the UBA, as you pointed out, and the
> > short-cuts I mentioned are about layout, not about mirroring.
> 
> So irrelevant.

No, not irrelevant.  You can sort out rendering of parentheses in such
text without applying the BPA, just by considering the parentheses as
neutrals.  That's one shortcut I alluded to.
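Treating parentheses as plain neutrals amounts to applying rules N1 and N2 of the UBA without the bracket-pair step (N0). A minimal Python sketch, assuming a single isolating run sequence at the paragraph level (so sos and eos both equal the paragraph direction) and bidi classes already reduced by the W rules to L, R, EN, AN or N (any neutral); resolve_neutrals is a hypothetical helper name.

```python
def _as_strong(bidi_class: str) -> str:
    # For N1, European and Arabic numbers count as strong R.
    return "R" if bidi_class in ("R", "EN", "AN") else bidi_class


def resolve_neutrals(types: list, embedding_dir: str) -> list:
    """Simplified UBA rules N1/N2 for one isolating run sequence.

    types: bidi classes after the W rules, each one of "L", "R", "EN",
    "AN", or "N" (any neutral, e.g. a parenthesis treated as ON).
    embedding_dir: "L" or "R"; also used as sos and eos, which holds
    when the run sits directly at the paragraph level.
    """
    result = list(types)
    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        # Find the end of this run of neutrals.
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        before = _as_strong(types[i - 1]) if i > 0 else embedding_dir
        after = _as_strong(types[j]) if j < len(types) else embedding_dir
        # N1: same direction on both sides; N2: otherwise embedding direction.
        resolved = before if before == after else embedding_dir
        result[i:j] = [resolved] * (j - i)
        i = j
    return result
```

For instance, resolve_neutrals(["N", "L"], "R") returns ["R", "L"]: in a right-to-left paragraph the "(" of "(a" resolves to R, so L4 then gives it a mirrored glyph.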

> I take it we now agree that the right shape for the parentheses for the
> unidirectional right-to-left example is derived by the UBA.

The mirroring is dictated by the UBA, yes.  But that just delineates
the difference between boustrophedon and bidirectional text, the
latter being subject to the UBA, while the former isn't.

> > > > > Indeed, one may rely on the bidi algorithm to declare the Latin
> > > > > example unidirectional.
> > > >
> > > > One might, but to what purpose and goal?
> > >
> > > A right-to-left paragraph consisting of the two characters "(a"
> > > would be bidirectional and have a parenthesis on the right; a
> > > left-to-right paragraph with the same content would have a
> > > parenthesis on the left.
> 
> If there is no higher-level protocol in effect, the 'first strong
> character' rule (Rules P2 and P3 of the UBA) declares that the
> paragraph will be a left-to-right paragraph and will look
> like "(a". Had it been declared a right-to-left paragraph by a
> higher-level protocol, it would look like "a)". Thus the UBA has a
> rôle even for unidirectional left-to-right text.

Once the paragraph direction is overridden by a higher-level protocol,
the text is no longer unidirectional.  Such overriding is equivalent
to enclosing the paragraph in an RLE..PDF pair, which makes the text
bidirectional by definition.
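The 'first strong character' rule cited in this exchange (P2/P3) can be sketched in Python with unicodedata.bidirectional(). This simplified version assumes the text contains no isolate initiators (LRI, RLI, FSI) and therefore does not skip isolated content as P2 requires; paragraph_level is a hypothetical name.

```python
import unicodedata


def paragraph_level(text: str) -> int:
    """Simplified UBA rules P2 and P3: 0 for LTR, 1 for RTL.

    Assumes no isolate initiators or PDI, so nothing has to be skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    # P3: with no strong character, the paragraph level defaults to 0.
    return 0
```

Here paragraph_level("(a") returns 0, so with no higher-level protocol the paragraph is left-to-right and displays as "(a"; a higher-level protocol may instead impose level 1.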

From richard.wordingham at ntlworld.com  Mon Jul 27 09:32:01 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 27 Jul 2015 15:32:01 +0100
Subject: BidiMirrored property and ancient scripts
In-Reply-To: <83oaizyn1r.fsf@gnu.org>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2>
 <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2>
 <831tfw15ai.fsf@gnu.org> <20150725221540.72e6ee48@JRWUBU2>
 <83oaizyn1r.fsf@gnu.org>
Message-ID: <20150727153201.4f7cd9e1@JRWUBU2>

On Sun, 26 Jul 2015 18:08:00 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > Date: Sat, 25 Jul 2015 22:15:40 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > 
> > > > Mirroring is changing a glyph to make it suitable for reading in the
> > > > other direction.
> > >
> > > Sorry, I disagree.
> > >
> > > > Note the following extract from BidiMirroring.txt in the
> > > > Unicode Character Database:
> > > >
> > > > <quote>
> > > > # The following characters have no appropriate mirroring character.
> > > > # For these characters it is up to the rendering system
> > > > # to provide mirrored glyphs.
> > >
> > > How's that a contradiction to what I said?
> > 
> > U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its glyph is
> > not replaced by any other character's glyph. Or are you claiming
> > that left-to-right U+2140 and right-to-left U+2140 are two different
> > characters?
> 
> I'm saying that "providing a mirrored glyph" entails coming up with a
> character whose glyph can play that role, AFAIU.

I'll take that as 'No' - the left-to-right and right-to-left forms are
the same character.  (Unicode has no consistency in this matter.)

> If you are saying that the "rendering system" here is the shaping
> engine using the rtlm OTF feature, then you are in fact saying that
> the mirroring of these characters cannot be implemented with most
> fonts in wide use today, and with most shaping engines.  That would be
> a very strange claim, IMO, tantamount to saying that those characters
> cannot, or don't need to, be mirrored at all in most use cases.

OpenType can handle it - feature rtlm effectively provides a
supplementary RTL cmap, and ltrm an LTR cmap.  It's conceivable that
DirectWrite and Uniscribe don't support it, but that's unlikely.

It looks as though the HarfBuzz implementation of OpenType also supports
mirroring for right-to-left runs, but I can't find the code subsequent
to tagging characters that weren't reversed using the
Bidi_Mirroring_Glyph property.  I have a similar lack of progress with
finding the code for fractions, which also tags characters.  Fractions
using U+2044 are supported by HarfBuzz, for all that I can't find the
code.

I can't find any evidence of AAT support.

The OpenType scheme for mirroring for right-to-left text is:

1) Apply Unicode 5.1 Bidi_Mirroring_Glyph property where applicable.

2) For other characters, apply the rtlm feature.  This is intended to be
applied character by character.

3) Apply the rtla feature to the resulting glyph sequence.

Note that the font-writer is responsible for determining whether a
character is to be mirrored at Step 2.  Also note that there is no need
for font support if all the Bidi mirrored characters it supports have
the Bidi_Mirroring_Glyph property.

There is similar logic for mirroring for left-to-right text, except
that there is no Bidi_Mirroring_Glyph support from Unicode tables.  The
decision to mirror is entirely up to the font.
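The three-step OpenType order described above can be illustrated with a Python sketch that plans, for a right-to-left run, which characters are handled by Bidi_Mirroring_Glyph (step 1) and which are left to the font's rtlm feature (step 2). Python's unicodedata exposes Bidi_Mirrored but not Bidi_Mirroring_Glyph, so the table below is a small hand-copied excerpt of BidiMirroring.txt, and rtl_mirroring_plan is a hypothetical name; a real shaping engine works on glyph IDs, not characters.

```python
import unicodedata

# Small excerpt of BidiMirroring.txt (Bidi_Mirroring_Glyph pairs); the
# real file has many more entries. unicodedata does not expose this property.
BIDI_MIRRORING_GLYPH = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
    "\u00ab": "\u00bb", "\u00bb": "\u00ab",  # guillemets
}


def rtl_mirroring_plan(run: str) -> list:
    """Return (character_to_map, apply_rtlm) pairs for a right-to-left run."""
    plan = []
    for ch in run:
        if not unicodedata.mirrored(ch):
            plan.append((ch, False))  # not Bidi_Mirrored: left untouched
        elif ch in BIDI_MIRRORING_GLYPH:
            plan.append((BIDI_MIRRORING_GLYPH[ch], False))  # step 1
        else:
            plan.append((ch, True))  # step 2: the font's rtlm decides
    # Step 3 (rtla over the whole glyph sequence) would follow in the shaper.
    return plan
```

With this split, U+222B INTEGRAL ends up flagged for rtlm, so a font that supports right-to-left mathematics would need its own right-to-left glyph for it.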

Now, you may be right about font support being lacking, just as it is
often lacking for U+2044 FRACTION SLASH.

If you still don't believe me, please explain why U+222B INTEGRAL has
Bidi_Mirrored=Yes but Bidi_Mirroring_Glyph=<none>.

> > > > > Thus, your reasons make no sense to me, because a character's
> > > > > shape, any character's shape, be it L, R, AL, or anything
> > > > > else, is immutable.
> > > >
> > > > So go back and reread.
> > >
> > > Did that; still no sense.
> > 
> > Because you still seem not to understand the concept of mirroring.
> 
> I think you will fare much better, and actually stand a chance of
> convincing others that you are right, if you assume your opponents do understand
> the issues, and just happen to disagree about their interpretation, or
> misinterpret what you write.

You won't understand my reasoning unless you accept that Bidi mirroring
can change a glyph's shape rather than substitute the glyph of another
character.  If you don't accept that, my argument will make no sense,
because you don't accept the premisses.

> > It isn't just for characters that have a Bidi_Mirroring_Glyph
> > property value other than <none>.
> 
> Only "in specialized contexts", like "historic scripts and associated
> punctuation, private-use characters, and characters in mathematical
> expressions" (I believe the latter is only happening in Arabic
> context, if it ever does).  IOW, in extremely rare and marginal use
> cases.  And all that is only in HL6, which is really a fire escape
> meant for applications whose scope is beyond simple text.

L4 calls for mandatory 'mirroring'.  Note that mirroring is not exact
mirroring.  My interpretation works for both Arabic and Hebrew.  The
UBA Rule L4 calls for some mathematical symbols to take the form
appropriate for a right-to-left context.  (HL6 allows this set
to be extended.)  However, from what you say this form depends on the
language. For example, the basic integral sign flips for Arabic maths,
but from what you say, I think not for Hebrew maths. OpenType can make
the mirrored shape dependent on the language of the text.

> That's a
> far cry from boustrophedon, which was the trigger for most of this
> exchange.  In all other cases:
> 
>   L4. A character is depicted by a mirrored glyph if and only if (a)
>   the resolved directionality of that character is R, and (b) the
>   Bidi_Mirrored property value of that character is Yes.
> 
> That's normative and unequivocal.

And therefore applies to U+222B INTEGRAL.  Formally, HL6 is
irrelevant for this character.  Now, you might wish for HL6 to be
modified to allow it not to be mirrored, but I think we can stretch the
definition of mirroring to handle it.

UBA Section 7 "Mirroring" says:

"Implementing rule L4 calls for mirrored glyphs. These glyphs may not be
exact graphical mirror images. For example, clearly an italic
parenthesis is not an exact mirror image of another - "(" is not the
mirror image of ")". Instead, mirror glyphs are those acceptable as
mirrors within the normal parameters of the font in which they are
represented."

This opens up the possibility of the degree of mirroring depending on
the language being supported.

> > > > > > However, one needs the UBA to sort out the rendering of the
> > > > > > parentheses in the Hebrew text.
> > > >
> > > > > Not really, you can short-cut it, the same as in strictly
> > > > > left-to-right text.
> > > >
> > > > It's the UBA that mandates that the opening and closing
> > > > parentheses be rendered like right and left parentheses
> > > > respectively rather than like left and right parentheses.
> > >
> > > Mirroring comes after layout in the UBA, as you pointed out, and
> > > the short-cuts I mentioned are about layout, not about mirroring.
> > 
> > So irrelevant.
> 
> No, not irrelevant.  You can sort out rendering of parentheses in such
> text without applying the BPA, just by considering the parentheses as
> neutrals.  That's one shortcut I alluded to.

> > I take it we now agree that the right shape for the parentheses for
> > the unidirectional right-to-left example is derived by the UBA.

> The mirroring is dictated by the UBA, yes.

Which was my point - the UBA applies to unidirectional text.

> But that just delineates
> the difference between boustrophedon and bidirectional text, the
> latter being subject to the UBA, while the former isn't.

I didn't say boustrophedon text was subject to the UBA.  I said a
boustrophedon renderer may modify the text to be rendered so that the
UBA will lay out the text properly.  This modification is heavily
dependent on line length.  Ideally one would lay it out line-by-line.
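A boustrophedon pre-processor of the kind described here might look like the following Python sketch. It assumes a fixed line width measured in characters (a real renderer would measure glyph advances) and wraps alternate lines in LRO..PDF and RLO..PDF so that the UBA lays out each line in the required direction; boustrophedon_lines is a hypothetical helper. The overrides only force direction: under L4 they make Bidi_Mirrored characters such as parentheses take mirrored glyphs, but ordinary letters are not mirrored.

```python
from typing import List

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def boustrophedon_lines(text: str, width: int) -> List[str]:
    """Split text into lines of `width` characters, alternating direction."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    out = []
    for n, line in enumerate(lines):
        # Even-numbered lines read left-to-right, odd ones right-to-left.
        override = LRO if n % 2 == 0 else RLO
        out.append(override + line + PDF)
    return out
```

Because the split depends on the width, changing the line length changes which characters receive RLO, which is why the modification has to be redone whenever the text is reflowed.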

> > > > > > Indeed, one may rely on the bidi algorithm to declare the
> > > > > > Latin example unidirectional.
> > > > >
> > > > > One might, but to what purpose and goal?
> > > >
> > > > A right-to-left paragraph consisting of the two characters "(a"
> > > > would be bidirectional and have a parenthesis on the right; a
> > > > left-to-right paragraph with the same content would have a
> > > > parenthesis on the left.
> > 
> > If there is no higher-level protocol in effect, the 'first strong
> > character' rule (Rules P2 and P3 of the UBA) declares that the
> > paragraph will be a left-to-right paragraph and will look
> > like "(a". Had it been declared a right-to-left paragraph by a
> > higher-level protocol, it would look like "a)". Thus the UBA has a
> > rôle even for unidirectional left-to-right text.

> Once the paragraph direction is overridden by a higher-level protocol,
> the text is no longer unidirectional.  Such overriding is equivalent
> to enclosing the paragraph in an RLE..PDF pair, which makes the text
> bidirectional by definition.

And if it isn't overridden, it is the UBA which makes it
unidirectional.  The UBA specifies the appearance of an opening
parenthesis.

Richard.


From eliz at gnu.org  Mon Jul 27 10:18:09 2015
From: eliz at gnu.org (Eli Zaretskii)
Date: Mon, 27 Jul 2015 18:18:09 +0300
Subject: BidiMirrored property and ancient scripts
In-Reply-To: <20150727153201.4f7cd9e1@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2>
 <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2>
 <831tfw15ai.fsf@gnu.org> <20150725221540.72e6ee48@JRWUBU2>
 <83oaizyn1r.fsf@gnu.org> <20150727153201.4f7cd9e1@JRWUBU2>
Message-ID: <83bnexzl1q.fsf@gnu.org>

I no longer see where this is going.  If there's still some goal,
something you think we should agree or discuss, perhaps you could
spell that out.  Otherwise, I think it's time to quit.

Some random comments:

> Date: Mon, 27 Jul 2015 15:32:01 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> Cc: unicode at unicode.org

> > > U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its glyph is
> > > not replaced by any other character's glyph. Or are you claiming
> > > that left-to-right U+2140 and right-to-left U+2140 are two different
> > > characters?
> > 
> > I'm saying that "providing a mirrored glyph" entails coming up with a
> > character whose glyph can play that role, AFAIU.
> 
> I'll take that as 'No' - the left-to-right and right-to-left forms are
> the same character.  (Unicode has no consistency in this matter.)

I don't know what is meant by "left-to-right and right-to-left forms"
here.  To me, a character has only one form.

> > If you are saying that the "rendering system" here is the shaping
> > engine using the rtlm OTF feature, then you are in fact saying that
> > the mirroring of these characters cannot be implemented with most
> > fonts in wide use today, and with most shaping engines.  That would be
> > a very strange claim, IMO, tantamount to saying that those characters
> > cannot, or don't need to, be mirrored at all in most use cases.
> 
> OpenType can handle it - feature rtlm effectively provides a
> supplementary RTL cmap, and ltrm an LTR cmap.  It's conceivable that
> DirectWrite and Uniscribe don't support it, but that's unlikely.

Most popular fonts don't, so this support is basically useless, if it
turns out to be a must.

> The decision to mirror is entirely up to the font.

Not at all.  A display engine can make those decisions on its own,
even if it consults the fonts while making those decisions.

> If you still don't believe me, please explain why U+222B INTEGRAL has
> Bidi_Mirrored=Yes but Bidi_Mirroring_Glyph=<none>.

The explanation is in the file: there's no glyph for that.

> > > Because you still seem not to understand the concept of mirroring.
> > 
> > I think you will fare much better, and actually stand a chance of
> > convincing others that you are right, if you assume your opponents do understand
> > the issues, and just happen to disagree about their interpretation, or
> > misinterpret what you write.
> 
> You won't understand my reasoning unless you accept that Bidi mirroring
> can change a glyph's shape rather than substitute the glyph of another
> character.

Try to convince me of that.

> L4 calls for mandatory 'mirroring'.  Note that mirroring is not exact
> mirroring.  My interpretation works for both Arabic and Hebrew.  The
> UBA Rule L4 calls for some mathematical symbols to take the form
> appropriate for a right-to-left context.  (HL6 allows this set
> to be extended.)  However, from what you say this form depends on the
> language. For example, the basic integral sign flips for Arabic maths,
> but from what you say, I think not for Hebrew maths.

Hebrew always typesets math left to right, so no mirroring of math
symbols, including U+222B INTEGRAL, is ever necessary.

> OpenType can make the mirrored shape dependent on the language of
> the text.

The language of the text is not always well defined, alas.

> >   L4. A character is depicted by a mirrored glyph if and only if (a)
> >   the resolved directionality of that character is R, and (b) the
> >   Bidi_Mirrored property value of that character is Yes.
> > 
> > That's normative and unequivocal.
> 
> And therefore applies to U+222B INTEGRAL.

Yes, but since there's no glyph, it's a non-issue.

> UBA Section 7 "Mirroring" says:
> 
> "Implementing rule L4 calls for mirrored glyphs. These glyphs may not be
> exact graphical mirror images. For example, clearly an italic
> parenthesis is not an exact mirror image of another - "(" is not the
> mirror image of ")". Instead, mirror glyphs are those acceptable as
> mirrors within the normal parameters of the font in which they are
> represented."
> 
> This opens up the possibility of the degree of mirroring depending on
> the language being supported.

My reading of that is that there's some freedom in choosing the shape
of the mirrored glyph, but the degree of mirroring is non-negotiable.

> > But that just delineates
> > the difference between boustrophedon and bidirectional text, the
> > latter being subject to the UBA, while the former isn't.
> 
> I didn't say boustrophedon text was subject to the UBA.  I said a
> boustrophedon renderer may modify the text to be rendered so that the
> UBA will lay out the text properly.

Given the directional overrides, this is trivial, I think.

> > > If there is no higher-level protocol in effect, the 'first strong
> > > character' rule (Rules P2 and P3 of the UBA) declares that the
> > > paragraph will be a left-to-right paragraph and will look
> > > like "(a". Had it been declared a right-to-left paragraph by a
> > > higher-level protocol, it would look like "a)". Thus the UBA has a
> > > rôle even for unidirectional left-to-right text.
> 
> > Once the paragraph direction is overridden by a higher-level protocol,
> > the text is no longer unidirectional.  Such overriding is equivalent
> > to enclosing the paragraph in an RLE..PDF pair, which makes the text
> > bidirectional by definition.
> 
> And if it isn't overridden, it is the UBA which makes it
> unidirectional.

No, it doesn't.

> The UBA specifies the appearance of an opening parenthesis.

That's bidirectional, not unidirectional.

From charupdate at orange.fr  Mon Jul 27 12:30:25 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Mon, 27 Jul 2015 19:30:25 +0200 (CEST)
Subject: Emoji characters for food allergens
Message-ID: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215>

On 26 Jul 2015, at 05:45, William_J_G Overington wrote:

> Emoji characters for food allergens
> An interesting document entitled
> Preliminary proposal to add emoji characters for food allergens
> by Hiroyuki Komatsu
> was added into the UTC (Unicode Technical Committee) Document Register yesterday.
> http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf

> This is a welcome development.

> I suggest that, in view of the importance of precision in conveying information about food allergens, the emoji characters for food allergens should be separate characters from other emoji characters. That is, encoded in a separate quite distinct block of code points far away in the character map from other emoji characters, with no dual meanings for any of the characters: a character for a food allergen should be quite separate and distinct from a character for any other meaning.

> I opine that having two separate meanings for the same character, one meaning as an everyday jolly good fun meaning in a text message and one meaning as a specialist food allergen meaning could be a source of confusion. Far better to encode a separate code block with separate characters right from the start than risk needless and perhaps medically dangerous confusion in the future.

> I suggest that for each allergen there be two characters.
> The glyph for the first character of the pair goes from baseline to ascender.
> The glyph for the second character of the pair is a copy of the glyph for the first character of the pair augmented with a thick red line from lower left descender to higher right a little above the base line, the thick red line perhaps being at about thirty degrees from the horizontal. Thus the thick red line would go over the allergen part of the glyph yet just by clipping it a bit so that clarity is maintained.
> The glyphs are thus for the presence of the allergen and the absence of the allergen respectively.

> It is typical in the United Kingdom to label food packets not only with an ingredients list but also with a list of allergens in the food and also with a list of allergens not in the food.
> For example, a particular food may contain soya yet not gluten.
> Thus I opine that two characters are needed for each allergen.

> I have deliberately avoided a total strike through at forty-five degrees as I opine that that could lead to problems distinguishing clearly the glyph for the absence of one allergen from the glyph for the absence of another allergen.

> I have also wondered whether each glyph for an allergen should include within its glyph a number, maybe a three-digit number, so that clarity is precise.

I'm not sure whether another code would facilitate the handling of these warnings. IMHO the allergen name in natural language communicates more efficiently. However, this requires identifying and learning the words before travelling to a country where another language is spoken, whereas a code point is easier to read if its meaning is at hand.

> I opine that two separate characters for each allergen is desirable rather than some solution such as having one character for each allergen and a combining strike through character.

This is consistent with the Unicode policy of not decomposing overlay diacritics in letters. Symbols, however, are intended for use with combining marks for symbols, such as U+20E0 COMBINING ENCLOSING CIRCLE BACKSLASH. We hope that the importance of the food allergen issue will lead to the implementation of an efficient system of language-independent labelling.

> The two separate characters approach keeps the system straightforward to use with many software packages. The matter of expressing food allergens is far too important to become entangled in problems for everyday users.

> For gluten, it might be necessary to have three distinct code points.
> In the United Kingdom there is a legal difference between "gluten-free" and "no gluten-containing ingredients".
> To be labelled gluten-free the product must have been tested. This is to ensure that there has been no cross-contamination of ingredients. For example, rice has no gluten, but was a particular load of rice transported in a lorry used for wheat on other days?
> Yet testing is not always possible in a restaurant situation.

All the best,

Marcel

From charupdate at orange.fr  Mon Jul 27 12:45:39 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Mon, 27 Jul 2015 19:45:39 +0200 (CEST)
Subject: Emoji characters for food allergens
In-Reply-To: <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com>
References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost>
 <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com>
Message-ID: <942647388.13410.1438019139444.JavaMail.www@wwinf2215>

On 26 Jul 2015 at 07:14, Garth Wallace wrote:

> On Sat, Jul 25, 2015 at 9:43 AM, William_J_G Overington
>  wrote:
> > Emoji characters for food allergens
> >
> > An interesting document entitled
> >
> > Preliminary proposal to add emoji characters for food allergens
> >
> > by Hiroyuki Komatsu
> >
> > was added into the UTC (Unicode Technical Committee) Document Register
> > yesterday.
> >
> > http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf
> >
> > This is a welcome development.
> 
> I'm skeptical. I understand the rationale, but several of the proposed
> characters are essentially SMALL PILE OF BROWN DOTS and would be
> difficult to distinguish at typical sizes.

Only two: buckwheat and sesame. As stated in the disclaimer, none is final. For buckwheat we could opt for an ear of buckwheat rather than a heap of grains. Alternatively, the characteristic shape of the buckwheat grain could be used, as its resemblance to a beechnut led to its German name "Buchweizen". But a single grain depicted at nearly 1:1 scale might be hard to recognize as a glyph.

Marcel



From richard.wordingham at ntlworld.com  Mon Jul 27 13:16:40 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 27 Jul 2015 19:16:40 +0100
Subject: BidiMirrored property and ancient scripts
In-Reply-To: <83bnexzl1q.fsf@gnu.org>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net>
 <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21>
 <20150722085240.00f61ba2@JRWUBU2>
 <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31>
 <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
 <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com>
 <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2>
 <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2>
 <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2>
 <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2>
 <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2>
 <831tfw15ai.fsf@gnu.org> <20150725221540.72e6ee48@JRWUBU2>
 <83oaizyn1r.fsf@gnu.org> <20150727153201.4f7cd9e1@JRWUBU2>
 <83bnexzl1q.fsf@gnu.org>
Message-ID: <20150727191640.1257b5f1@JRWUBU2>

On Mon, 27 Jul 2015 18:18:09 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> I no longer see where this is going.  If there's still some goal,
> something you think we should agree or discuss, perhaps you could
> spell that out.  Otherwise, I think it's time to quit.

It's basically to establish that for UBA-compliant bidirectional support
of some characters, a font must have both a left-to-right and a
right-to-left glyph for the character.

> Some random comments:
> 
> > Date: Mon, 27 Jul 2015 15:32:01 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > Cc: unicode at unicode.org
> 
> > > > U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its
> > > > glyph is not replaced by any other character's glyph. Or are
> > > > you claiming that left-to-right U+2140 and right-to-left U+2140
> > > > are two different characters?
> > > 
> > > I'm saying that "providing a mirrored glyph" entails coming up
> > > with a character whose glyph can play that role, AFAIU.
> > 
> > I'll take that as 'No' - the left-to-right and right-to-left forms
> > are the same character.  (Unicode has no consistency in this
> > matter.)
> 
> I don't know what is meant by "left-to-right and right-to-left forms"
> here.  To me, a character has only one form.

I trust you've just forgotten that that's not true.  Soft-dotted
characters like 'i' and 'j' lose their dot when a mark above (ccc=230)
is attached, e.g. <U+0069 LATIN SMALL LETTER I, U+1DC4 COMBINING
MACRON-ACUTE>. Indic scripts have some more spectacular variations.

In a font that supports both left-to-right and Arabic right-to-left
maths, U+222B INTEGRAL will have at least two forms, one for
left-to-right and one for right-to-left.

> > > If you are saying that the "rendering system" here is the shaping
> > > engine using the rtlm OTF feature, then you are in fact saying
> > > that the mirroring of these characters cannot be implemented with
> > > most fonts in wide use today, and with most shaping engines.
> > > That would be a very strange claim, IMO, tantamount to saying
> > > that those characters cannot, or don't need to, be mirrored at
> > > all in most use cases.

Is this an expression of disbelief, or a lament that the UBA demands
too much?  If it's a lament, I believe I've made my point.

> > OpenType can handle it - feature rtlm effectively provides a
> > supplementary RTL cmap, and ltrm an LTR cmap.  It's conceivable
> > that DirectWrite and Uniscribe don't support it, but that's
> > unlikely.
> 
> Most popular fonts don't, so this support is basically useless, if it
> turns out to be a must.

No, it's a 'shall'.  One won't be arrested for not doing it.

> > The decision to mirror is entirely up to the font.
> 
> Not at all.  A display engine can make those decisions on its own,
> even if it consults the fonts while making those decisions.

If application of the rtlm and rtla features does not change the glyph
used for U+222B INTEGRAL, then the font has refused to mirror the
character.

Now it is possible, in this circumstance, that the rendering engine
might synthesise a reflected glyph.  The font could then deceive the
rendering engine by substituting an identical glyph.

> > If you still don't believe me, please explain why U+222B INTEGRAL
> > has Bidi_Mirrored=Yes but Bidi_Mirroring_Glyph=<none>.
> 
> The explanation is in the file: there's no glyph for that.

You mean, I hope, that there's no other character with the glyph for
that rôle.
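A minimal Python sketch of the distinction being drawn here. It assumes only the standard `unicodedata` module, which exposes Bidi_Mirrored but not Bidi_Mirroring_Glyph, plus a small hand-copied excerpt of BidiMirroring.txt. When a mirrored character has a partner in that file, a display engine can substitute the partner character. When it has none, as with U+222B INTEGRAL, the right-to-left glyph has to come from the font (for example via the OpenType rtlm feature). The helper names are hypothetical.

```python
import unicodedata

# Small excerpt of BidiMirroring.txt (assumption: the real file has many
# more pairs). U+222B INTEGRAL is deliberately absent, matching its
# Bidi_Mirroring_Glyph=<none> status.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def rtl_substitute(ch: str) -> str:
    """Character to use in a right-to-left run via character swapping.

    Returns ch unchanged when it is not Bidi_Mirrored, or when it is
    mirrored but has no partner character in the table above.
    """
    if unicodedata.mirrored(ch) and ch in MIRROR_PAIRS:
        return MIRROR_PAIRS[ch]
    return ch


def needs_font_mirroring(ch: str) -> bool:
    """True if ch is Bidi_Mirrored but no partner character exists,
    so a mirrored RTL glyph must be supplied by the font itself."""
    return bool(unicodedata.mirrored(ch)) and ch not in MIRROR_PAIRS


for ch in ["(", "\u222B", "A"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "-> swap to", repr(rtl_substitute(ch)),
          "| font must mirror:", needs_font_mirroring(ch))
```

Character swapping covers only the paired cases; for the rest, the sketch can only report that the font must do the work, which is the point at issue.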


> > I didn't say boustrophedon text was subject to the UBA.  I said a
> > boustrophedon renderer may modify the text to be rendered so that
> > the UBA will layout the text properly.
> 
> Given the directional overrides, this is a trivium, I think.

Yes.  I couldn't see why you were making such a fuss about it.

> > The UBA specifies the appearance of an opening parenthesis.
> 
> That's bidirectional, not unidirectional.

There may not be any more point in arguing about what is unidirectional
and what is bidirectional.

Richard.


From gwalla at gmail.com  Mon Jul 27 13:49:47 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 27 Jul 2015 11:49:47 -0700
Subject: Olympic sports emoji
Message-ID: <CA+p4_H0UnA9-teCtZ-Sak1-Fa3j1MiJyVD4D1MLG53tWLScZcw@mail.gmail.com>

I read this proposal and was a little confused. Why aren't they
proposing the actual sports pictograms that are in use for
international events like the Olympics? Those are generally stylized
human figures shown engaging in sports, but the suggested symbols in
this proposal seem to mostly be pictures of sports equipment. It seems
like reinventing the wheel. Are the Olympic-style pictograms not felt
to be sufficiently emoji-like?

Singling out modern pentathlon, water polo, and team handball to be
encoded as ZWJ sequences instead of atomic characters also seems
arbitrary. Why would PERSON WITH BALL plus GOAL NET specifically imply
team handball? It seems like that combination covers a lot of sports.
Why should modern pentathlon require nine characters for a single
symbol?
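The arithmetic behind the count of nine can be sketched in Python, assuming the nine characters are five discipline pictographs joined by four ZERO WIDTH JOINERs (U+200D). The five code points below are existing characters used only as stand-ins. They are not the characters proposed in L2/15-196R, and `zwj_sequence` is a hypothetical helper.

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER

# Stand-ins for the five modern pentathlon disciplines (assumption:
# placeholders, not the proposal's actual characters).
PENTATHLON = [
    "\u2694",      # U+2694 CROSSED SWORDS (fencing)
    "\U0001F3CA",  # U+1F3CA SWIMMER (swimming)
    "\U0001F3C7",  # U+1F3C7 HORSE RACING (riding)
    "\U0001F52B",  # U+1F52B PISTOL (shooting)
    "\U0001F3C3",  # U+1F3C3 RUNNER (running)
]


def zwj_sequence(parts):
    """Join pictographs with ZWJ; n single-code-point parts give 2n - 1 code points."""
    return ZWJ.join(parts)


seq = zwj_sequence(PENTATHLON)
print("code points:", len(seq))
print("UTF-16 code units:", len(seq.encode("utf-16-le")) // 2)
print("UTF-8 bytes:", len(seq.encode("utf-8")))
```

A renderer with no ligature for the whole sequence would typically show the five pictographs individually, since the joiners themselves are invisible.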

From doug at ewellic.org  Mon Jul 27 15:10:06 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 27 Jul 2015 13:10:06 -0700
Subject: Olympic sports emoji
Message-ID: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net>

Garth Wallace <gwalla at gmail dot com> wrote:

> I read this proposal [L2/15-196R] and was a little confused. Why
> aren't they proposing the actual sports pictograms that are in use for
> international events like the Olympics? Those are generally stylized
> human figures shown engaging in sports, but the suggested symbols in
> this proposal seem to mostly be pictures of sports equipment. It seems
> like reinventing the wheel. Are the Olympic-style pictograms not felt
> to be sufficiently emoji-like?

The official Summer Olympics pictograms change each time the Games are
held:

http://www.olympic.org/Assets/OSC%20Section/pdf/QR_sports_pictograms_of_the_olympic_summer_games_1964_2016.pdf

Although the symbols introduced for the 1972 Munich Games were
particularly influential and are often thought to be canonical, these
symbols have been styled quite differently since 1992.

Additionally, the images are copyrighted, for the most part by the
International Olympic Committee (see page 2 of the PDF document).

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From leob at mailcom.com  Mon Jul 27 15:40:04 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Mon, 27 Jul 2015 13:40:04 -0700
Subject: Olympic sports emoji
In-Reply-To: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net>
References: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net>
Message-ID: <CAFmvRsfW-b4auFvORCTQ4QGCtBEzA9CR8xGJDyvkfgJ29Bw=mA@mail.gmail.com>

Fonts vary and can be copyrighted, no doubt, but Unicode is not about fonts.

Leo

On Mon, Jul 27, 2015 at 1:10 PM, Doug Ewell <doug at ewellic.org> wrote:
> Garth Wallace <gwalla at gmail dot com> wrote:
>
>> I read this proposal [L2/15-196R] and was a little confused. Why
>> aren't they proposing the actual sports pictograms that are in use for
>> international events like the Olympics? Those are generally stylized
>> human figures shown engaging in sports, but the suggested symbols in
>> this proposal seem to mostly be pictures of sports equipment. It seems
>> like reinventing the wheel. Are the Olympic-style pictograms not felt
>> to be sufficiently emoji-like?
>
> The official Summer Olympics pictograms change each time the Games are
> held:
>
> http://www.olympic.org/Assets/OSC%20Section/pdf/QR_sports_pictograms_of_the_olympic_summer_games_1964_2016.pdf
>
> Although the symbols introduced for the 1972 Munich Games were
> particularly influential and are often thought to be canonical, these
> symbols have been styled quite differently since 1992.
>
> Additionally, the images are copyrighted, for the most part by the
> International Olympic Committee (see page 2 of the PDF document).
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
>
>


From gwalla at gmail.com  Mon Jul 27 15:41:11 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 27 Jul 2015 13:41:11 -0700
Subject: Olympic sports emoji
In-Reply-To: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net>
References: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net>
Message-ID: <CA+p4_H2umcUvZ91hzJ66bUhGbebkDOpYgWPS8MwObZGYTLw0Bg@mail.gmail.com>

On Mon, Jul 27, 2015 at 1:10 PM, Doug Ewell <doug at ewellic.org> wrote:
> Garth Wallace <gwalla at gmail dot com> wrote:
>
>> I read this proposal [L2/15-196R] and was a little confused. Why
>> aren't they proposing the actual sports pictograms that are in use for
>> international events like the Olympics? Those are generally stylized
>> human figures shown engaging in sports, but the suggested symbols in
>> this proposal seem to mostly be pictures of sports equipment. It seems
>> like reinventing the wheel. Are the Olympic-style pictograms not felt
>> to be sufficiently emoji-like?
>
> The official Summer Olympics pictograms change each time the Games are
> held:
>
> http://www.olympic.org/Assets/OSC%20Section/pdf/QR_sports_pictograms_of_the_olympic_summer_games_1964_2016.pdf
>
> Although the symbols introduced for the 1972 Munich Games were
> particularly influential and are often thought to be canonical, these
> symbols have been styled quite differently since 1992.
>
> Additionally, the images are copyrighted, for the most part by the
> International Olympic Committee (see page 2 of the PDF document).

The style of them changes with each Games, but the identities do not.
To my mind, this is equivalent to the glyph/character distinction. The
individual Olympiad-specific images are copyrighted but not even the
IOC can copyright the idea of "stick figure playing hockey, used to
symbolize the sport of ice hockey". The UCS even includes a few of
them already: U+26F7 SKIER is the symbol for Alpine skiing.

From doug at ewellic.org  Mon Jul 27 17:12:00 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 27 Jul 2015 15:12:00 -0700
Subject: Olympic sports emoji
Message-ID: <20150727151200.665a7a7059d7ee80bb4d670165c8327d.8d64c4d981.wbe@email03.secureserver.net>

Leo Broukhis <leob at mailcom dot com> wrote:

> Fonts vary and can be copyrighted, no doubt, but Unicode is not about
> fonts.

I was going to bust out the Apple logo as an analogy to the Olympic
symbols, but apparently the Apple logo is trademarked and not merely
copyrighted, so never mind.

In any case, if this is just a character/glyph thing, then there
shouldn't be a problem using either the existing emoji or the ones
proposed in L2/15-196R for Olympic sports, since the glyphs can simply
be styled as needed.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From gwalla at gmail.com  Mon Jul 27 18:46:52 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 27 Jul 2015 16:46:52 -0700
Subject: Hentaigana and the Kana Supplement block
Message-ID: <CA+p4_H1GmoXt=ZaFPNr-hKpXKjqWi8TEvyhcSwe+4xH1Y4y_Zg@mail.gmail.com>

The recent hentaigana proposal requests that they be encoded as
Standardized Variation Sequences of hiragana. This seems like a good
idea, since fallback in the absence of font support would be to the
standard hiragana, so the results would still be readable. But where
does that leave the Kana Supplement block? That block contains only
two encoded characters, but was allocated 256 code points, presumably
for the future encoding of hentaigana. With hentaigana handled by
SVSes, it seems unlikely that many of those points would ever get
filled. I realize there's no shortage of code points in the UCS, but
still.
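The fallback behaviour described above can be modelled with a short Python sketch. It assumes the proposal's sequences use ordinary variation selectors (VS1 to VS16 at U+FE00..U+FE0F, or VS17 to VS256 at U+E0100..U+E01EF). Pairing U+3042 HIRAGANA LETTER A with VS17 below is a hypothetical example, not an assignment taken from the proposal. Removing the selector shows what a reader sees when no font supports the variant. Real renderers typically just ignore an unsupported selector rather than deleting it.

```python
def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def base_fallback(text: str) -> str:
    """Drop variation selectors, leaving the base characters a reader
    would see if no font supports the requested variant."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# Hypothetical hentaigana SVS: U+3042 HIRAGANA LETTER A + VS17.
svs = "\u3042\U000E0100"
print(len(svs), "code points ->", base_fallback(svs))
```

This is why the SVS approach degrades gracefully: the base hiragana is always present in the text, so it remains readable even without hentaigana support.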

One thing I noticed: the hentaigana proposal contains a duplicate of
an existing character. MJ090014 (? variant with mother ideograph ?)
looks like it's already encoded in the Kana Supplement block as
U+1B001 HIRAGANA LETTER ARCHAIC YE.


From markus.icu at gmail.com  Mon Jul 27 18:59:44 2015
From: markus.icu at gmail.com (Markus Scherer)
Date: Mon, 27 Jul 2015 16:59:44 -0700
Subject: Hentaigana and the Kana Supplement block
In-Reply-To: <CA+p4_H1GmoXt=ZaFPNr-hKpXKjqWi8TEvyhcSwe+4xH1Y4y_Zg@mail.gmail.com>
References: <CA+p4_H1GmoXt=ZaFPNr-hKpXKjqWi8TEvyhcSwe+4xH1Y4y_Zg@mail.gmail.com>
Message-ID: <CAN49p6pKe2NTeu0iFZMnd94HSNUCS3Eu-xR0=iXAZdk8=Cbf3w@mail.gmail.com>

On Mon, Jul 27, 2015 at 4:46 PM, Garth Wallace <gwalla at gmail.com> wrote:

> where
> does that leave the Kana Supplement block? That block contains only
> two encoded characters, but was allocated 256 code points, presumably
> for the future encoding of hentaigana. With hentaigana handled by
> SVSes, it seems unlikely that many of those points would ever get
> filled. I realize there's no shortage of code points in the UCS, but
> still.
>

I don't think the committee fills blocks with characters just because there
is space and some glyphs are related :-)

> One thing I noticed: the hentaigana proposal contains a duplicate of
> an existing character. MJ090014 (? variant with mother ideograph ?)
> looks like it's already encoded in the Kana Supplement block as
> U+1B001 HIRAGANA LETTER ARCHAIC YE.
>

Please submit this via http://www.unicode.org/reporting.html

Best regards,
markus

From gwalla at gmail.com  Tue Jul 28 00:55:09 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 27 Jul 2015 22:55:09 -0700
Subject: Fwd: Olympic sports emoji
In-Reply-To: <CA+p4_H2=tw5J_TQDAr-irvnJ0V-8Cy_Q=Hp4j928kfmTufC1=g@mail.gmail.com>
References: <20150727151200.665a7a7059d7ee80bb4d670165c8327d.8d64c4d981.wbe@email03.secureserver.net>
 <CA+p4_H2=tw5J_TQDAr-irvnJ0V-8Cy_Q=Hp4j928kfmTufC1=g@mail.gmail.com>
Message-ID: <CA+p4_H1p2LPbm7-CZjFxTFgx6STFTv2gQ425=drMQiB0H1kJig@mail.gmail.com>

(sorry, meant to send this to the list)

On Mon, Jul 27, 2015 at 3:12 PM, Doug Ewell <doug at ewellic.org> wrote:
> Leo Broukhis <leob at mailcom dot com> wrote:
>
>> Fonts vary and can be copyrighted, no doubt, but Unicode is not about
>> fonts.
>
> I was going to bust out the Apple logo as an analogy to the Olympic
> symbols, but apparently the Apple logo is trademarked and not merely
> copyrighted, so never mind.
>
> In any case, if this is just a character/glyph thing, then there
> shouldn't be a problem using either the existing emoji or the ones
> proposed in L2/15-196R for Olympic sports, since the glyphs can simply
> be styled as needed.

Would this be considered within the normal range of glyphic variation?
Would an icon of two pugilists fighting be an acceptable rendering of
a BOXING GLOVE emoji?

BTW, speaking as a martial artist myself, I have to say an empty dogi
is an odd representation for martial arts, even specifically Japanese
ones. The proposal says that it could be used for judo, karate, and
tae kwon do; it at least matches the first two (they are distinct, but
not in a way that would , and practice uniforms for TKD are similar,
but competitive TKD under WTF rules (including Olympic competition)
uses several pieces of protective equipment (helmet, gloves, chest
guard) with colored padding over the dobok.

From gwalla at gmail.com  Tue Jul 28 01:11:36 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 27 Jul 2015 23:11:36 -0700
Subject: Hentaigana and the Kana Supplement block
In-Reply-To: <CAN49p6pKe2NTeu0iFZMnd94HSNUCS3Eu-xR0=iXAZdk8=Cbf3w@mail.gmail.com>
References: <CA+p4_H1GmoXt=ZaFPNr-hKpXKjqWi8TEvyhcSwe+4xH1Y4y_Zg@mail.gmail.com>
 <CAN49p6pKe2NTeu0iFZMnd94HSNUCS3Eu-xR0=iXAZdk8=Cbf3w@mail.gmail.com>
Message-ID: <CA+p4_H2o050Qr3sqmhu1GDfAuuzXc022MMtYdpqOLEyiw3po9A@mail.gmail.com>

On Mon, Jul 27, 2015 at 4:59 PM, Markus Scherer <markus.icu at gmail.com> wrote:
> On Mon, Jul 27, 2015 at 4:46 PM, Garth Wallace <gwalla at gmail.com> wrote:
>>
>> where
>> does that leave the Kana Supplement block? That block contains only
>> two encoded characters, but was allocated 256 code points, presumably
>> for the future encoding of hentaigana. With hentaigana handled by
>> SVSes, it seems unlikely that many of those points would ever get
>> filled. I realize there's no shortage of code points in the UCS, but
>> still.
>
>
> I don't think the committee fills blocks with characters just because there
> is space and some glyphs are related :-)

Yes, but it looked like that was the intent. I'm not saying the
hentaigana should be encoded as atomic characters in that block just
because there is space; I think the SVS approach sounds like the right
one (though I'm hardly an expert on hentaigana). I'm just wondering
what's to be done with all of those code points if they won't be used
for hentaigana, since it seems unlikely that there would be many other
kana that couldn't be handled by existing characters or the proposed
SVSes. Is it possible for a block to be later renamed as something
more general to allow for some non-kana, or even to carve out some of
the empty columns for a new block? Or does the stability policy apply
to block allocations?

From everson at evertype.com  Tue Jul 28 08:00:03 2015
From: everson at evertype.com (Michael Everson)
Date: Tue, 28 Jul 2015 14:00:03 +0100
Subject: Emoji characters for food allergens
In-Reply-To: <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com>
References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost>
 <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com>
Message-ID: <474B9A36-2A8C-449A-8019-60B373459914@evertype.com>

I do NOT understand the rationale.

Emojis are not for labelling things. They're for the playful expression of emotions. 

Standardized symbols for allergens might be useful, if there were a textual use for them. 

> On 26 Jul 2015, at 06:05, Garth Wallace <gwalla at gmail.com> wrote:
> 
> On Sat, Jul 25, 2015 at 9:43 AM, William_J_G Overington
> <wjgo_10009 at btinternet.com> wrote:
>> Emoji characters for food allergens
>> 
>> An interesting document entitled
>> 
>> Preliminary proposal to add emoji characters for food allergens
>> 
>> by Hiroyuki Komatsu
>> 
>> was added into the UTC (Unicode Technical Committee) Document Register
>> yesterday.
>> 
>> http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf
>> 
>> This is a welcome development.
> 
> I'm skeptical. I understand the rationale, but several of the proposed
> characters are essentially SMALL PILE OF BROWN DOTS and would be
> difficult to distinguish at typical sizes.

Michael Everson * http://www.evertype.com/


From wjgo_10009 at btinternet.com  Tue Jul 28 05:19:21 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Tue, 28 Jul 2015 11:19:21 +0100 (BST)
Subject: Emoji characters for food allergens
In-Reply-To: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215>
References: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215>
Message-ID: <15994373.18168.1438078761970.JavaMail.defaultUser@defaultHost>

Hi Marcel

>> I have also wondered whether each glyph for an allergen should include within its glyph a number, maybe a three-digit number, so that clarity is precise.
> I'm not sure whether another code would facilitate the handling of these warnings. IMHO the allergen name in natural language is more efficient in communication. This needs however to identify and learn the words prior to travelling into a foreign language country, while a code point is more obvious to read if its meaning is at hand.

Well, a lot could be done information technology-wise to facilitate communication across the language barrier.

For example, dietary needs could be communicated in text messages, sent by email, over a mobile telephone link, or maybe thrown to a nearby device, using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically on the receiving device.

As another example, a smartphone could read an RFID tag (radio-frequency identification tag) on a supermarket shelf label about a product. The RFID tag could contain the food allergen information about the food encoded using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically on the smartphone.

Best regards,
William Overington
28 July 2015

From rscook at wenlin.com  Tue Jul 28 08:34:54 2015
From: rscook at wenlin.com (Richard Cook)
Date: Tue, 28 Jul 2015 06:34:54 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <474B9A36-2A8C-449A-8019-60B373459914@evertype.com>
References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost>
 <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com>
 <474B9A36-2A8C-449A-8019-60B373459914@evertype.com>
Message-ID: <4F3ECCC5-34D0-4D9F-9FC9-08926EB10885@wenlin.com>

On Jul 28, 2015, at 6:00 AM, Michael Everson <everson at evertype.com> allegedly wrote:
> 
> Emojis are not for labelling things. They're for the playful expression of emotions.

Is that what they're for? I thought they were (encoded) to satisfy certain device manufacturers. And, what is the emotion playfully expressed by 🍔🍟 ?


From eric.muller at efele.net  Tue Jul 28 09:48:34 2015
From: eric.muller at efele.net (Eric Muller)
Date: Tue, 28 Jul 2015 07:48:34 -0700
Subject: Toki Pona: A Language With a Hundred Words - The Atlantic
Message-ID: <55B79642.9000103@efele.net>

http://www.theatlantic.com/technology/archive/2015/07/toki-pona-smallest-language/398363/

Eric.


From doug at ewellic.org  Tue Jul 28 09:53:53 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 28 Jul 2015 07:53:53 -0700
Subject: Emoji characters for food allergens
Message-ID: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net>

Richard Cook <rscook at wenlin dot com> wrote:

> And, what is the emotion playfully expressed by 🍔🍟 ?

"I'm having a burger and fries for lunch but can't be bothered to type
all that into this text message lol"

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From rscook at wenlin.com  Tue Jul 28 10:07:37 2015
From: rscook at wenlin.com (Richard Cook)
Date: Tue, 28 Jul 2015 08:07:37 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net>
References: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net>
Message-ID: <30F31338-7226-4EB9-ABAB-B55100D7ADDC@wenlin.com>

On Jul 28, 2015, at 7:53 AM, Doug Ewell <doug at ewellic.org> wrote:
> 
> Richard Cook <rscook at wenlin dot com> wrote:
> 
>> And, what is the emotion playfully expressed by 🍔🍟 ?
> 
> "I'm having a burger and fries for lunch but can't be bothered to type
> all that into this text message lol"
> 
Is all that one emotion or two?

> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
> 
> 


From asmusf at ix.netcom.com  Tue Jul 28 10:56:33 2015
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 28 Jul 2015 08:56:33 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <30F31338-7226-4EB9-ABAB-B55100D7ADDC@wenlin.com>
References: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net>
 <30F31338-7226-4EB9-ABAB-B55100D7ADDC@wenlin.com>
Message-ID: <55B7A631.7070301@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150728/98c62581/attachment.html>

From c933103 at gmail.com  Tue Jul 28 12:46:28 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Wed, 29 Jul 2015 01:46:28 +0800
Subject: Emoji characters for food allergens
In-Reply-To: <474B9A36-2A8C-449A-8019-60B373459914@evertype.com>
References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost>
 <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com>
 <474B9A36-2A8C-449A-8019-60B373459914@evertype.com>
Message-ID: <CAGHjPP+LUEkn=w3DEEcyfwhFen7Tq=jqV2DMuQ6cenQMShz1yg@mail.gmail.com>

If these symbols are to be added to Unicode, it would probably be better to
allocate them in blocks other than the emoji blocks.

Also, it should be noted that emoji can look very different on
different platforms; see http://unicode.org/faq/emoji_dingbats.html and
http://www.unicode.org/reports/tr51/index.html#Design_Guidelines

From gwalla at gmail.com  Tue Jul 28 13:26:35 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Tue, 28 Jul 2015 11:26:35 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <474B9A36-2A8C-449A-8019-60B373459914@evertype.com>
References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost>
 <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com>
 <474B9A36-2A8C-449A-8019-60B373459914@evertype.com>
Message-ID: <CA+p4_H2+6QSLEEXWMSBrRGPamwT6uMa9iE+hjPe8XXFBrfYMAQ@mail.gmail.com>

Well, there are several emoji for various items encountered in daily
life, and I think the reasoning is that allergens are important things
to refer to because of their health effects. It's a bit of a leap to
say that means there's a need for dedicated pictograms though. I
agree, it does seem to be putting the cart before the horse.

On Tue, Jul 28, 2015 at 6:00 AM, Michael Everson <everson at evertype.com> wrote:
> I do NOT understand the rationale.
>
> Emojis are not for labelling things. They're for the playful expression of emotions.
>
> Standardized symbols for allergens might be useful, if there were a textual use for them.
>
>> On 26 Jul 2015, at 06:05, Garth Wallace <gwalla at gmail.com> wrote:
>>
>> On Sat, Jul 25, 2015 at 9:43 AM, William_J_G Overington
>> <wjgo_10009 at btinternet.com> wrote:
>>> Emoji characters for food allergens
>>>
>>> An interesting document entitled
>>>
>>> Preliminary proposal to add emoji characters for food allergens
>>>
>>> by Hiroyuki Komatsu
>>>
>>> was added into the UTC (Unicode Technical Committee) Document Register
>>> yesterday.
>>>
>>> http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf
>>>
>>> This is a welcome development.
>>
>> I'm skeptical. I understand the rationale, but several of the proposed
>> characters are essentially SMALL PILE OF BROWN DOTS and would be
>> difficult to distinguish at typical sizes.
>
> Michael Everson * http://www.evertype.com/
>
>


From doug at ewellic.org  Tue Jul 28 14:24:16 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 28 Jul 2015 12:24:16 -0700
Subject: Emoji characters for food allergens
Message-ID: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>

gfb hjjhjh <c933103 at gmail dot com> wrote:

> If these symbols are to be added to Unicode, it would probably be better
> to allocate them in blocks other than the emoji blocks.

I'm curious what this is supposed to accomplish. It's not as though
people viewing such a symbol on a screen or in print, or entering it on
a phone keypad, will know or care what its Unicode code point is, or
what other types of symbols have nearby code points.

The Miscellaneous Symbols block contains U+2620 SKULL AND CROSSBONES,
U+2623 BIOHAZARD SIGN, and U+263A WHITE SMILING FACE.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From rscook at wenlin.com  Tue Jul 28 15:07:26 2015
From: rscook at wenlin.com (Richard Cook)
Date: Tue, 28 Jul 2015 13:07:26 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <55B7A631.7070301@ix.netcom.com>
References: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net>
 <30F31338-7226-4EB9-ABAB-B55100D7ADDC@wenlin.com>
 <55B7A631.7070301@ix.netcom.com>
Message-ID: <F92549C7-A6F1-410E-9F82-336F79404E54@wenlin.com>

On Jul 28, 2015, at 8:56 AM, Asmus Freytag <asmusf at ix.netcom.com> wrote:
> 
>> On 7/28/2015 8:07 AM, Richard Cook wrote:
>>> On Jul 28, 2015, at 7:53 AM, Doug Ewell <doug at ewellic.org> wrote:
>>> Richard Cook <rscook at wenlin dot com> wrote:
>>> 
>>>> And, what is the emotion playfully expressed by 🍔🍟 ?
>>> "I'm having a burger and fries for lunch but can't be bothered to type
>>> all that into this text message lol"
>>> 
>> Is all that one emotion or two?
> 
> Remember:
> e-moji == picto-graph
> 
> and 
> 
> emoji != emoticon.
> 

hey Michael, 

You want ?? with that? ??

-R

> A./
>> 
>>> --
>>> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
>>> 
>>> 
> 

From c933103 at gmail.com  Tue Jul 28 15:26:21 2015
From: c933103 at gmail.com (gfb hjjhjh)
Date: Wed, 29 Jul 2015 04:26:21 +0800
Subject: Emoji characters for food allergens
In-Reply-To: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
Message-ID: <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>

According to http://unicode.org/faq/emoji_dingbats.html, emoji
characters do not have a single fixed meaning, which I think is not what the
original proposer wants? Or am I misunderstanding that?
On 29 Jul 2015 at 3:28 AM, "Doug Ewell" <doug at ewellic.org> wrote:

> gfb hjjhjh <c933103 at gmail dot com> wrote:
>
> > If these symbols are to be added to Unicode, it would probably be better
> > to allocate them in blocks other than the emoji blocks.
>
> I'm curious what this is supposed to accomplish. It's not as though
> people viewing such a symbol on a screen or in print, or entering it on
> a phone keypad, will know or care what its Unicode code point is, or
> what other types of symbols have nearby code points.
>
> The Miscellaneous Symbols block contains U+2620 SKULL AND CROSSBONES,
> U+2623 BIOHAZARD SIGN, and U+263A WHITE SMILING FACE.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
>
>
>

From gwalla at gmail.com  Tue Jul 28 17:27:08 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Tue, 28 Jul 2015 15:27:08 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
Message-ID: <CA+p4_H1Y2FrCKv038Q+UxMZbp0Xcy0gURndkgASO88HkEwv2QQ@mail.gmail.com>

That's what Mr. Overington wants, but he's not the original proposer.
The proposal by Hiroyuki Komatsu
<http://www.unicode.org/L2/L2015/15197r-emoji-food-allergens.pdf> does
not say anything of the sort, and by unifying some with existing
characters implies otherwise.

On Tue, Jul 28, 2015 at 1:26 PM, gfb hjjhjh <c933103 at gmail.com> wrote:
> According to http://unicode.org/faq/emoji_dingbats.html, emoji
> characters do not have a single fixed meaning, which I think is not what the
> original proposer wants? Or am I misunderstanding that?
>
> On 29 Jul 2015 at 3:28 AM, "Doug Ewell" <doug at ewellic.org> wrote:
>>
>> gfb hjjhjh <c933103 at gmail dot com> wrote:
>>
>> > If these symbols are to be added to Unicode, it would probably be better
>> > to allocate them in blocks other than the emoji blocks.
>>
>> I'm curious what this is supposed to accomplish. It's not as though
>> people viewing such a symbol on a screen or in print, or entering it on
>> a phone keypad, will know or care what its Unicode code point is, or
>> what other types of symbols have nearby code points.
>>
>> The Miscellaneous Symbols block contains U+2620 SKULL AND CROSSBONES,
>> U+2623 BIOHAZARD SIGN, and U+263A WHITE SMILING FACE.
>>
>> --
>> Doug Ewell | http://ewellic.org | Thornton, CO
>>
>>
>


From mark at kli.org  Tue Jul 28 21:21:27 2015
From: mark at kli.org (Mark Shoulson)
Date: Tue, 28 Jul 2015 22:21:27 -0400
Subject: Revenge of pIqaD
Message-ID: <55B838A7.30603@kli.org>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150728/a3fe7171/attachment.html>

From Shawn.Steele at microsoft.com  Tue Jul 28 21:50:19 2015
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Wed, 29 Jul 2015 02:50:19 +0000
Subject: Revenge of pIqaD
In-Reply-To: <55B838A7.30603@kli.org>
References: <55B838A7.30603@kli.org>
Message-ID: <BLUPR03MB13789D75233C4ECAE4F5A40F828C0@BLUPR03MB1378.namprd03.prod.outlook.com>

You missed Bing translate?  http://www.bing.com/translator/?from=en&to=tlh-Qaak&text=Success

- Shawn

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark Shoulson
Sent: Tuesday, July 28, 2015 7:21 PM
To: unicode at unicode.org; Chris Lipscombe <qurgh at wizage.net>
Subject: Revenge of pIqaD

OK!  I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and I have to report that Klingon pIqaD really is out there and getting some use, despite having been banished to the PUA.  I've seen it on a wine-bottle label (commercially produced, not someone's homebrew), on the Klingon version of the Monopoly game, a book or two (NOT published by the KLI); there are websites using it (but then there were last time I mentioned this and that didn't seem to count then), and apparently support for it on several platforms, including a smartphone keypad, to say nothing of quite a few T-shirts.  Apparently there is a small community actually using pIqaD to (*gasp*) exchange information via SMS.  I'm copying Chris Lipscombe on this email; he is better plugged in to the use of pIqaD in Real Life (don't forget to Reply All if you want to include him, since I think he isn't on the list at the moment).

What has to be done to get this encoded?  The proposal is likely still more or less what we need, and it probably has at least as much online information interchange as, say, Gondi does ("Well, what do you expect, Gondi isn't encoded yet!" "Neither is pIqaD.")  Are we ready to revisit this question again?

~mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150729/8fbb91f4/attachment.html>

From Shawn.Steele at microsoft.com  Tue Jul 28 21:53:08 2015
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Wed, 29 Jul 2015 02:53:08 +0000
Subject: Revenge of pIqaD
References: <55B838A7.30603@kli.org> 
Message-ID: <BLUPR03MB1378ACB02E3FAE9AA9E594F9828C0@BLUPR03MB1378.namprd03.prod.outlook.com>

Ooo, I forgot that means everything is in pIqaD!  http://www.microsofttranslator.com/bv.aspx?from=en&to=tlh-Qaak&a=http%3A%2F%2Fwww.cnn.com%2F

From: Shawn Steele
Sent: Tuesday, July 28, 2015 7:50 PM
To: 'Mark Shoulson' <mark at kli.org>; unicode at unicode.org; Chris Lipscombe <qurgh at wizage.net>
Subject: RE: Revenge of pIqaD

You missed Bing translate?  http://www.bing.com/translator/?from=en&to=tlh-Qaak&text=Success

- Shawn

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark Shoulson
Sent: Tuesday, July 28, 2015 7:21 PM
To: unicode at unicode.org; Chris Lipscombe <qurgh at wizage.net>
Subject: Revenge of pIqaD

OK!  I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and I have to report that Klingon pIqaD really is out there and getting some use, despite having been banished to the PUA.  I've seen it on a wine-bottle label (commercially produced, not someone's homebrew), on the Klingon version of the Monopoly game, a book or two (NOT published by the KLI); there are websites using it (but then there were last time I mentioned this and that didn't seem to count then), and apparently support for it on several platforms, including a smartphone keypad, to say nothing of quite a few T-shirts.  Apparently there is a small community actually using pIqaD to (*gasp*) exchange information via SMS.  I'm copying Chris Lipscombe on this email; he is better plugged in to the use of pIqaD in Real Life (don't forget to Reply All if you want to include him, since I think he isn't on the list at the moment).

What has to be done to get this encoded?  The proposal is likely still more or less what we need, and it probably has at least as much online information interchange as, say, Gondi does ("Well, what do you expect, Gondi isn't encoded yet!" "Neither is pIqaD.")  Are we ready to revisit this question again?

~mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150729/f3a7ee11/attachment.html>

From mark at kli.org  Tue Jul 28 21:57:40 2015
From: mark at kli.org (Mark Shoulson)
Date: Tue, 28 Jul 2015 22:57:40 -0400
Subject: Revenge of pIqaD
In-Reply-To: <BLUPR03MB13789D75233C4ECAE4F5A40F828C0@BLUPR03MB1378.namprd03.prod.outlook.com>
References: <55B838A7.30603@kli.org>
 <BLUPR03MB13789D75233C4ECAE4F5A40F828C0@BLUPR03MB1378.namprd03.prod.outlook.com>
Message-ID: <55B84124.7030108@kli.org>

For added amusement, type "Seqram" into Bing translate, translating from 
Klingon back to English, and see what you get.

~mark

On 07/28/2015 10:50 PM, Shawn Steele wrote:
>
> You missed Bing translate? 
> http://www.bing.com/translator/?from=en&to=tlh-Qaak&text=Success
>
> - Shawn
>
> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of
> Mark Shoulson
> Sent: Tuesday, July 28, 2015 7:21 PM
> To: unicode at unicode.org; Chris Lipscombe <qurgh at wizage.net>
> Subject: Revenge of pIqaD
>
> OK!  I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and 
> I have to report that Klingon pIqaD really is out there and getting 
> some use, despite having been banished to the PUA.  I've seen it on a 
> wine-bottle label (commercially produced, not someone's homebrew), on 
> the Klingon version of the Monopoly game, a book or two (NOT published 
> by the KLI); there are websites using it (but then there were last 
> time I mentioned this and that didn't seem to count then), and 
> apparently support for it on several platforms, including a smartphone 
> keypad, to say nothing of quite a few T-shirts.  Apparently there is a 
> small community actually using pIqaD to (*gasp*) exchange information 
> via SMS.  I'm copying Chris Lipscombe on this email; he is better 
> plugged in to the use of pIqaD in Real Life (don't forget to Reply 
> All if you want to include him, since I think he isn't on the list at 
> the moment).
>
> What has to be done to get this encoded?  The proposal is likely still 
> more or less what we need, and it probably has at least as much online 
> information interchange as, say, Gondi does ("Well, what do you 
> expect, Gondi isn't encoded yet!" "Neither is pIqaD.")  Are we ready 
> to revisit this question again?
>
> ~mark
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150728/daadfd0e/attachment.html>

From charupdate at orange.fr  Wed Jul 29 02:48:22 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 29 Jul 2015 09:48:22 +0200 (CEST)
Subject: Emoji characters for food allergens
In-Reply-To: <15994373.18168.1438078761970.JavaMail.defaultUser@defaultHost>
References: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215>
 <15994373.18168.1438078761970.JavaMail.defaultUser@defaultHost>
Message-ID: <1592423653.5282.1438156103036.JavaMail.www@wwinf1k37>

Hi William,

Sorry.

On 28 Jul 2015, at 12:19, William_J_G Overington  wrote: 

> Well a lot could be done information technology-wise to facilitate communication through the language barrier.

> For example in text messages, sent by email, or over a mobile telephone link or maybe thrown to a device nearby, to communicate dietary needs, using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the receiving device;

> For example, by using a smartphone by reading from an RFID tag (radio-frequency identification tag) on a shelf label in a supermarket display about a product . The RFID tag could contain the food allergen information about the food encoded using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the smartphone.


Alternately, scanning the EAN barcode on the package could give access to a database intended for food information. This requires the use of a smartphone or other compatible device.

Another use of allergen emojis would be to respond to an invitation by SMS. Somebody inviting guests to dinner at home can gather information from them about which allergens to keep off the ingredients list when cooking. This is a typical use case for emoji.

The emotions implied by food allergens are concern, fear and anxiety. But, as already discussed in this thread, emoticons/emojis need not convey an emotion, the term having become a somewhat generic name for symbols.

Best regards,

Marcel Schneider

> Message of 28/07/15 12:19
> From: "William_J_G Overington" 
> To: "Marcel Schneider" 
> Cc: gwalla at gmail.com, unicode at unicode.org, komatsu at google.com
> Subject: Re: Emoji characters for food allergens
> 
>

> Hi Marcel

> >> I have also wondered whether each glyph for an allergen should include within its glyph a number, maybe a three-digit number, so that clarity is precise.
> 
> > I'm not sure whether another code would facilitate the handling of these warnings. IMHO the allergen name in natural language communicates more efficiently. This, however, requires identifying and learning the words before travelling to a foreign-language country, while a code point is easier to read if its meaning is at hand.

> Well a lot could be done information technology-wise to facilitate communication through the language barrier.

> For example in text messages, sent by email, or over a mobile telephone link or maybe thrown to a device nearby, to communicate dietary needs, using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the receiving device;

> For example, by using a smartphone by reading from an RFID tag (radio-frequency identification tag) on a shelf label in a supermarket display about a product. The RFID tag could contain the food allergen information about the food encoded using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the smartphone.

> Best regards, 

> William Overington


> 28 July 2015


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150729/9bb54540/attachment.html>

From charupdate at orange.fr  Wed Jul 29 03:10:02 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 29 Jul 2015 10:10:02 +0200 (CEST)
Subject: Windows 10 release (is still: Re: WORD JOINER vs ZWNBSP)
Message-ID: <264002605.5965.1438157402178.JavaMail.www@wwinf1k37>

On 02 Jul 2015, at 12:22, I replied:

> However, I believe that since WJs are part of plain text, they should be properly supported in all text-handling applications. And they should be on the keyboard.

> The solution I suggest is therefore to have the word joiner (and the sequences containing it) on Ctrl+Alt or Kana, and the zero width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users working efficiently on good software may access the preferred character a bit more easily than users who must use the deprecated character because their word processor does not properly support the preferred one.


Unfortunately that doesn't work on at least one recent version of Windows. An unambiguous bug was caused by the presence of 0x2060 in the Ligatures table. It cost me a whole workday to track down, fix, and verify.

The effect of the bug was that Word, Excel, Firefox and Zotero were unstartable.

As a result, the WORD JOINER cannot be implemented on a driver based keyboard layout for general use on Windows. By contrast, the ZWNBSP can.
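As background only (this does not reproduce the Windows keyboard-driver bug described above), a minimal Python sketch using just the standard library shows why the two characters are not interchangeable in practice. U+2060 WORD JOINER and U+FEFF ZERO WIDTH NO-BREAK SPACE are both format characters (General Category Cf), but a leading U+FEFF is also read as a byte order mark, so a BOM-aware decoder such as Python's `utf-8-sig` codec silently drops it, while a leading WORD JOINER survives. The helper name `roundtrip_with_bom_aware_decoder` is illustrative, not part of any library.

```python
import unicodedata

WJ = "\u2060"      # WORD JOINER, the preferred joiner
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark

def roundtrip_with_bom_aware_decoder(text: str) -> str:
    """Encode as plain UTF-8, then decode with the BOM-stripping 'utf-8-sig' codec."""
    return text.encode("utf-8").decode("utf-8-sig")

# Both are zero-width format characters (General Category Cf).
for ch in (WJ, ZWNBSP):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# A leading ZWNBSP is taken for a BOM and removed; a leading WJ is kept.
print(repr(roundtrip_with_bom_aware_decoder(ZWNBSP + "word")))  # 'word'
print(repr(roundtrip_with_bom_aware_decoder(WJ + "word")))      # '\u2060word'
```

This ambiguity with the BOM is one reason the word-joining use of ZWNBSP was deprecated in favour of U+2060.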

Consequently, we hope that this kind of bug has been fixed in Windows 10, which is to be released today. If everybody using Windows 7 or 8 is upgraded for free, Windows 10 will become the standard and we will be able to build upon it.

It needs to be underscored that this kind of keyboard-driver-related bug is normally impossible when using Keyman. I don't see any way for the OS to detect the presence of 0x2060 in a ligatures table in order to block the full execution of the system when this character is part of a keyboard layout that is fully managed and executed by an additional framework like Keyman. Under the current circumstances, and for ease and flexibility of development and use, Keyman appears to me to be indispensable for thorough and complete Unicode implementations.

Best regards,
Marcel Schneider
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150729/5ec35f9e/attachment.html>

From pandey at umich.edu  Wed Jul 29 06:09:59 2015
From: pandey at umich.edu (Anshuman Pandey)
Date: Wed, 29 Jul 2015 07:09:59 -0400
Subject: Revenge of pIqaD
In-Reply-To: <55B838A7.30603@kli.org>
References: <55B838A7.30603@kli.org>
Message-ID: <7AB838BC-4DCD-4F39-8858-E39238F8E6B4@umich.edu>


Dear Mark and Chris,

I wonder if copyright or other IP issues might hinder the suitability of encoding Klingon, similar to the Tolkien scripts?

And to be sure, Klingon certainly does have a larger digital presence than the Gondi scripts...

All the best,
Anshu


> On Jul 28, 2015, at 10:21 PM, Mark Shoulson <mark at kli.org> wrote:
> 
> OK!  I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and I have to report that Klingon pIqaD really is out there and getting some use, despite having been banished to the PUA.  I've seen it on a wine-bottle label (commercially produced, not someone's homebrew), on the Klingon version of the Monopoly game, a book or two (NOT published by the KLI); there are websites using it (but then there were last time I mentioned this and that didn't seem to count then), and apparently support for it on several platforms, including a smartphone keypad, to say nothing of quite a few T-shirts.  Apparently there is a small community actually using pIqaD to (*gasp*) exchange information via SMS.  I'm copying Chris Lipscombe on this email; he is better plugged in to the use of pIqaD in Real Life (don't forget to Reply All if you want to include him, since I think he isn't on the list at the moment).
> 
> What has to be done to get this encoded?  The proposal is likely still more or less what we need, and it probably has at least as much online information interchange as, say, Gondi does ("Well, what do you expect, Gondi isn't encoded yet!" "Neither is pIqaD.")  Are we ready to revisit this question again?
> 
> ~mark


From wjgo_10009 at btinternet.com  Wed Jul 29 03:21:17 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 29 Jul 2015 09:21:17 +0100 (BST)
Subject: Emoji characters for food allergens
In-Reply-To: <1592423653.5282.1438156103036.JavaMail.www@wwinf1k37>
References: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215>
 <15994373.18168.1438078761970.JavaMail.defaultUser@defaultHost>
 <1592423653.5282.1438156103036.JavaMail.www@wwinf1k37>
Message-ID: <19038120.7776.1438158077110.JavaMail.defaultUser@defaultHost>

Hi Marcel
> Alternately, scanning the EAN barcode on the package could give access to a database intended for food information. This requires the use of a smartphone or other compatible device.
That is a good idea.
In which case the emoji would not need to be encoded on the package, yet would be sent by the database facility. Looking up the EAN barcode in a database and sending the results to the end user would need a two-way communication link, and that could mean queueing problems, as the database facility might be answering requests from many people.
Another possibility would be to encode the Unicode characters for the allergens contained in the food within a QR code (Quick Response Code) on the package.
Decoding could then be local, in the device being used to scan the QR code.
Both of these methods, EAN barcode and QR code, could be used to communicate through the language barrier, either by viewing the emoji, or by the emoji becoming converted to localized text in the device that is being used by the end user.
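A minimal Python sketch of the local-decoding idea, assuming the QR payload has already been extracted as a string by whatever scanner library the device uses. No allergen characters were encoded at the time of this thread, so the code points below are hypothetical Private Use Area placeholders, and the table and the function `localize_allergens` are illustrative only.

```python
# HYPOTHETICAL allergen characters: Private Use Area placeholders, NOT real assignments.
HYPOTHETICAL_ALLERGENS = {
    "\ue000": {"en": "gluten", "fr": "gluten", "de": "Gluten"},
    "\ue001": {"en": "milk", "fr": "lait", "de": "Milch"},
    "\ue002": {"en": "egg", "fr": "œuf", "de": "Ei"},
}

def localize_allergens(payload: str, lang: str) -> list[str]:
    """Return localized names for each recognized allergen character in a decoded payload.

    Characters not in the table are ignored; an unknown language falls back to English.
    """
    names = []
    for ch in payload:
        labels = HYPOTHETICAL_ALLERGENS.get(ch)
        if labels is not None:
            names.append(labels.get(lang, labels["en"]))
    return names

# Payload as it might come out of a QR decoder (hypothetical content).
print(localize_allergens("\ue000\ue002", "fr"))  # ['gluten', 'œuf']
```

Everything here runs on the device, so no round trip to a database is needed once the lookup table is installed.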
Best regards,
William Overington
29 July 2015
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150729/79bfc06d/attachment.html>

From wjgo_10009 at btinternet.com  Wed Jul 29 03:38:38 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 29 Jul 2015 09:38:38 +0100 (BST)
Subject: Emoji characters for food allergens
Message-ID: <5794935.9139.1438159118596.JavaMail.defaultUser@defaultHost>

>> Probably if these symbols are to be added to Unicode, it would be better to allocate blocks that do not belong to emoji for them. 

> I'm curious what this is supposed to accomplish. It's not as though people viewing such a symbol on a screen or in print, or entering it on a phone keypad, will know or care what its Unicode code point is, or what other types of symbols have nearby code points.

Yet some people might be using a system with an Insert Symbol... facility to prepare an email or to design a label or whatever.

In such Insert Symbol... facilities it is often the case that characters are listed in Unicode code point order.

My original purpose of suggesting separate blocks of code points was to seek to avoid a symbol relating to a food allergen having more than one meaning, one precise and medical, one or more others just everyday chat.

The issue, discussed in other posts in this thread, that the meaning of an emoji character is not precisely defined makes it very important to have separate blocks, and perhaps not even to call the characters emoji but "precise emoji" or some other new term, so as to avoid confusion in the application of the symbols.

Also, suppose that a person programming an app wishes to have the software in the app notice whatever food allergen emoji characters are in a message. Having them all within two contiguous blocks of code points would assist the programming process.
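To illustrate the programming point, here is a minimal Python sketch of the check such an app might make. With a contiguous range, membership is two integer comparisons; with characters scattered across existing blocks, the app has to carry an explicit set. Both the range and the scattered code points are hypothetical Private Use Area placeholders, not proposed assignments, and the function names are illustrative.

```python
# HYPOTHETICAL contiguous range for allergen characters (PUA placeholder, NOT a real block).
ALLERGEN_START, ALLERGEN_END = 0xE000, 0xE01F

def find_allergens_contiguous(message: str) -> list[str]:
    """Collect characters that fall inside the hypothetical contiguous range."""
    return [ch for ch in message if ALLERGEN_START <= ord(ch) <= ALLERGEN_END]

# If the same characters were scattered across blocks, an explicit set is needed instead.
SCATTERED_ALLERGENS = {"\ue000", "\ue105", "\uf0a2"}  # HYPOTHETICAL code points

def find_allergens_scattered(message: str) -> list[str]:
    """Collect characters listed in the hypothetical scattered set."""
    return [ch for ch in message if ch in SCATTERED_ALLERGENS]

print(find_allergens_contiguous("no \ue000 no \ue003 thanks"))  # ['\ue000', '\ue003']
```

Either approach works; the contiguous range is simply easier to state and to keep up to date when new characters are added to the end of the block.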

There was also an aesthetic aspect: separate blocks seem to me a better way to organize such an encoding.

William Overington

29 July 2015


From wjgo_10009 at btinternet.com  Wed Jul 29 08:42:59 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 29 Jul 2015 14:42:59 +0100 (BST)
Subject: Emoji characters for food allergens
In-Reply-To: <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
Message-ID: <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>

> As according to http://unicode.org/faq/emoji_dingbats.html , emoji characters do not have single semantics, which I think is not what the original proposer wants? Or am I misunderstanding that?
Garth Wallace has already indicated in his reply to your post that it was me, not the original proposer, who wanted single semantics.
Thank you for the link. I have followed it and read in the document what it says about single semantics.
Oh!
Well, it seems to me that something has got to give in order for "Emoji characters for food allergies" to work effectively.
The easiest thing appears to be to not call the items emoji.
I opine that a new word is needed to mean the following.
A character that looks like it is an emoji character yet has precise semantics.
There is an issue here that is, in my opinion, quite fundamental to the future of encoding items that are currently all regarded as emoji: an issue that goes far beyond the matter of encoding emoji characters for food allergens.
Communication through the language barrier is of huge importance and may become more so in the future.
Emoji seemed like a wonderful way to achieve communication through the language barrier.
Yet if semantics are not defined, then there is a problem.
Please consider the matter of text to speech in the draft Unicode Technical Report 51.
I remember years ago I was asked in this mailing list what chat means.
I think that discussing the meaning of chat is some classic Unicode cultural matter.
In English it is an informal talk between two or more people, in French it is a cat.
So the sequence of Unicode characters only has meaning in the context that they are being used.
Now the big opportunity with emoji could be to assist communication through the language barrier.
From reading about semantics in the linked document, it appears that that opportunity might be disappearing or may already have gone.
This, in my opinion, is unfortunate.
The food allergen characters could, by being precisely defined with one and only one meaning, be either an exception to the general situation or could be the start of a trend.
A name other than emoji is needed for such characters that have one and only one meaning, that meaning precisely defined.
Those characters could still be colourful and could look emoji-ish.
Maybe they could be double width so as to show their distinctiveness?
Would double width characters be a problem as regards applying them in systems such as mobile telephones at present?
Now, such precisely defined emoji could be entirely representationally pictures, yet there could also be abstract pictures and also pictures that are partly representational and partly abstract.
For example, one such character could be placed before a list of emoji characters for food allergens to indicate that a list of dietary needs follows.
For example,
My dietary need is no gluten no dairy no egg
There could be a way to indicate the following.
My diet can include soya
There is a situation that affects further discussion of some aspects of this matter, though not all aspects of this matter, as a totally symbolic representation could still be discussed.
http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0208.html
However, there is also the following.
http://www.oxforddictionaries.com/definition/english/moratorium
Please note the use of the word temporary in the definition.
So maybe all is not lost and discussion of all aspects will become possible at some future time.
William Overington
29 July 2015
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150729/7e743275/attachment.html>

From andrewcwest at gmail.com  Wed Jul 29 09:27:13 2015
From: andrewcwest at gmail.com (Andrew West)
Date: Wed, 29 Jul 2015 15:27:13 +0100
Subject: Emoji characters for food allergens
In-Reply-To: <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
Message-ID: <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com>

On 29 July 2015 at 14:42, William_J_G Overington
<wjgo_10009 at btinternet.com> wrote:
>
> For example, one such character could be placed before a list of
> emoji characters for food allergens to indicate that a list of dietary
> needs follows.
>
> For example,
>
> My dietary need is no gluten no dairy no egg
>
> There could be a way to indicate the following.
>
> My diet can include soya

There already is, you can write "My diet can include soya".

If you are likely to swell up and die if you eat a peanut (for
example), you will not want to trust your life to an emoji picture of
a peanut which could be mistaken for something else or rendered as a
square box for the recipient.  There may be a case to be made for
encoding symbols for food allergens for labelling purposes, but there
is no case for encoding such symbols as a form of symbolic language
for communication of dietary requirements.

Andrew

From doug at ewellic.org  Wed Jul 29 11:39:51 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 29 Jul 2015 09:39:51 -0700
Subject: Emoji characters for food allergens
Message-ID: <20150729093951.665a7a7059d7ee80bb4d670165c8327d.bef66cbee0.wbe@email03.secureserver.net>

Andrew West <andrewcwest at gmail dot com> wrote:

> There may be a case to be made for encoding symbols for food allergens
> for labelling purposes, but there is no case for encoding such symbols
> as a form of symbolic language for communication of dietary
> requirements.

For what little it is worth, I agree with Andrew on this.

Earlier I mentioned U+2620 SKULL AND CROSSBONES and U+2623 BIOHAZARD
SIGN, two symbols which have been in Unicode since the dawn of time.
Both of these are Level 2 emoji, according to emoji-data.txt [1], and
are accorded no special treatment, placement, or display guidelines
beyond that. While communication about food allergens is undoubtedly
important, it's hard to imagine that communication about poisons and
biohazards is any less important.

[1] http://www.unicode.org/Public/emoji/1.0//emoji-data.txt
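For illustration, a short Python check (standard library `unicodedata` only) of the three characters cited in this thread. It confirms their formal names and that all three fall within the Miscellaneous Symbols block, U+2600..U+26FF. The block bounds are hard-coded here rather than read from Blocks.txt in the Unicode Character Database. Block membership by itself says nothing about how a symbol is treated or what it means, which is the point being made.

```python
import unicodedata

# Miscellaneous Symbols block, U+2600..U+26FF (hard-coded; see Blocks.txt in the UCD).
MISC_SYMBOLS = range(0x2600, 0x2700)

for cp in (0x2620, 0x2623, 0x263A):
    ch = chr(cp)
    in_block = cp in MISC_SYMBOLS
    print(f"U+{cp:04X} {unicodedata.name(ch)} in Miscellaneous Symbols: {in_block}")
```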

--
Doug Ewell | http://ewellic.org | Thornton, CO


From richard.wordingham at ntlworld.com  Wed Jul 29 13:48:00 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 29 Jul 2015 19:48:00 +0100
Subject: Windows 10 release (is still: Re: WORD JOINER vs ZWNBSP)
In-Reply-To: <264002605.5965.1438157402178.JavaMail.www@wwinf1k37>
References: <264002605.5965.1438157402178.JavaMail.www@wwinf1k37>
Message-ID: <20150729194800.5dba3d0b@JRWUBU2>

On Wed, 29 Jul 2015 10:10:02 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> On 02 Jul 2015, at 12:22, I replied:
> 
> > However, I believe that since WJs are part of plain text, they should
> > be properly supported in all text-handling applications. And they
> > should be on the keyboard.
> 
> > The solution I suggest is therefore to have the word joiner (and
> > the sequences containing it) on Ctrl+Alt or Kana, and the zero
> > width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users
> > working efficiently on good software may access the preferred
> > character a bit easier than users who must use the deprecated
> > character because their word processor does not properly support
> > the preferred one.

> Unfortunately that doesn't work on at least one recent version of
> Windows. An unambiguous bug was caused by the presence of 0x2060 in the
> Ligatures table. It cost me a whole workday to track down, fix,
> and verify.

> The effect of the bug was that Word, Excel, Firefox and Zotero were
> unstartable.

> As a result, the WORD JOINER cannot be implemented on a driver based
> keyboard layout for general use on Windows. By contrast, the ZWNBSP
> can.

Your lament is a bit vague - I'm not sure what U+2060 is doing in a
'ligature table'.  I can say that a Windows keyboard mapping that
maps AltGr-M to WJ which was created using MSKLC on Windows 7 in April
2011 still works.

Richard.


From duerst at it.aoyama.ac.jp  Wed Jul 29 20:06:43 2015
From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J._D=c3=bcrst?=)
Date: Thu, 30 Jul 2015 10:06:43 +0900
Subject: Emoji characters for food allergens
In-Reply-To: <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com>
Message-ID: <55B978A3.1010807@it.aoyama.ac.jp>


On 2015/07/29 23:27, Andrew West wrote:
> On 29 July 2015 at 14:42, William_J_G Overington

>> My diet can include soya
>
> There already is, you can write "My diet can include soya".
>
> If you are likely to swell up and die if you eat a peanut (for
> example), you will not want to trust your life to an emoji picture of
> a peanut which could be mistaken for something else

Yes, in the worst case for something like "I like peanuts".

> or rendered as a
> square box for the recipient.  There may be a case to be made for
> encoding symbols for food allergens for labelling purposes, but there
> is no case for encoding such symbols as a form of symbolic language
> for communication of dietary requirements.
>
> Andrew
> .
>

From mark at kli.org  Wed Jul 29 20:15:45 2015
From: mark at kli.org (Mark E. Shoulson)
Date: Wed, 29 Jul 2015 21:15:45 -0400
Subject: Emoji characters for food allergens
In-Reply-To: <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com>
Message-ID: <55B97AC1.3080902@kli.org>

Indeed; depending on special emoji characters to convey a crucial 
sentence unambiguously across language barriers also treads very close to 
using those "localizable sentences" we mustn't talk about.

~mark

On 07/29/2015 10:27 AM, Andrew West wrote:
> On 29 July 2015 at 14:42, William_J_G Overington
> <wjgo_10009 at btinternet.com> wrote:
>> For example, one such character could be placed before a list of
>> emoji characters for food allergens to indicate that a list of dietary
>> needs follows.
>>
>> For example,
>>
>> My dietary need is no gluten no dairy no egg
>>
>> There could be a way to indicate the following.
>>
>> My diet can include soya
> There already is, you can write "My diet can include soya".
>
> If you are likely to swell up and die if you eat a peanut (for
> example), you will not want to trust your life to an emoji picture of
> a peanut which could be mistaken for something else or rendered as a
> square box for the recipient.  There may be a case to be made for
> encoding symbols for food allergens for labelling purposes, but there
> is no case for encoding such symbols as a form of symbolic language
> for communication of dietary requirements.
>
> Andrew


From mark at kli.org  Wed Jul 29 23:01:06 2015
From: mark at kli.org (Mark E. Shoulson)
Date: Thu, 30 Jul 2015 00:01:06 -0400
Subject: Emoji characters for food allergens
In-Reply-To: <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
Message-ID: <55B9A182.2030504@kli.org>

On 07/29/2015 09:42 AM, William_J_G Overington wrote:
>
>     The easiest thing appears to be to not call the items emoji.
>
>     I opine that a new word is needed to mean the following.
>
>     A character that looks like it is an emoji character yet has
>     precise semantics.
>

So, like, a localizable sentence character?  Something that has a 
precise, sentence-level meaning that is not linguistically determined?  
We aren't doing those here, as far as I know.


~mark

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150730/a010f638/attachment.html>

From wjgo_10009 at btinternet.com  Thu Jul 30 03:51:35 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 30 Jul 2015 09:51:35 +0100 (BST)
Subject: Emoji characters for food allergens
In-Reply-To: <55B9A182.2030504@kli.org>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <55B9A182.2030504@kli.org>
Message-ID: <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>

>> The easiest thing appears to be to not call the items emoji.
>> I opine that a new word is needed to mean the following.
>> A character that looks like it is an emoji character yet has precise semantics.
> So, like, a localizable sentence character?
Well, a localizable sentence character with an emoji-like symbol would indeed be an example of such a character.
Yet not every character that looks like it is an emoji character yet has precise semantics would be a localizable sentence.
Indeed, not every localizable sentence symbol would look like an emoji character. My research has used symbols 23 units in width by 7 units in height. 
For example, please consider an emoji symbol to mean "railway station" and, for example, please consider an emoji symbol to mean "peppermint tea".
If, for example, an emoji symbol that starts off to mean "railway station" became used to mean "transportation station" then the way to express specifically a railway station as an emoji rather than expressing just a place that may be either or both of a railway station and a bus station would become lost. 
If, for example, a symbol that starts off to mean "peppermint tea" became used to mean "herbal tea", then the way to express specifically peppermint tea as an emoji rather than expressing just a cup of herbal tea that might be peppermint or one of many other flavours of herbal tea would become lost.
The emoji characters for food allergens are not localizable sentences, yet in my opinion they do need precise definitions. They should be encoded in a separate block and given some name other than emoji, one that acknowledges that they look like emoji yet emphasises the precision of their definitions. Maybe they should be double width to avoid confusion; maybe each glyph should include a surrounding landscape-format ellipse to emphasise their difference from ordinary emoji.
> Something that has a precise, sentence-level meaning that is not
> linguistically determined?  We aren't doing those here, as far as I know.
Well, I am not a linguist and I do not fully understand that question or the comment that follows it.
I have just tried to state a problem that I feel exists and hope that people who are expert in such matters can consider it and hopefully find a solution.
William Overington
30 July 2015
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150730/4440e211/attachment.html>

From charupdate at orange.fr  Thu Jul 30 10:56:11 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 30 Jul 2015 17:56:11 +0200 (CEST)
Subject: Windows 10 release (is still: Re: WORD JOINER vs ZWNBSP)
Message-ID: <792325334.18912.1438271771971.JavaMail.www@wwinf1e26>

On Wed 29 Jul 2015, at 20:57, Richard Wordingham wrote:

> On Wed, 29 Jul 2015 10:10:02 +0200 (CEST)
> Marcel Schneider  wrote:
> 
> > On 02 Jul 2015, at 12:22, I replied:
> > 
> > > However, I believe that WJs being a part of plain text, they should
> > > be properly supported on all text handling applications. And they
> > > should be on the keyboard.
> > 
> > > The solution I suggest is therefore to have the word joiner (and
> > > the sequences containing it) on Ctrl+Alt or Kana, and the zero
> > > width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users
> > > working efficiently on good software may access the preferred
> > > character a bit easier than users who must use the deprecated
> > > character because their word processor does not properly support
> > > the preferred one.
> 
> > Unfortunately that doesn't work on at least one recent version of
> > Windows. An unambiguous bug was due to the presence of 0x2060 in the
> > Ligatures table. This has cost me a whole workday to retrieve, fix,
> > and verify.
> 
> > The effect of the bug was that Word, Excel, Firefox and Zotero were
> > unstartable.
> 
> > As a result, the WORD JOINER cannot be implemented on a driver based
> > keyboard layout for general use on Windows. By contrast, the ZWNBSP
> > can.
> 
> Your lament is a bit vague - I'm not sure what U+2060 is doing in a
> 'ligature table'. I can say that a Windows keyboard mapping that
> maps AltGr-M to WJ which was created using MSKLC on Windows 7 in April
> 2011 still works.

I'm really pleased to learn about every initiative to implement Unicode in input practice, and I take note that an MSKLC layout with U+2060 does not prevent heavy applications from starting on Windows. Indeed I wasn't very clear: in the dead-key list I can keep 0x2060 without any problem (Compose, Space, G). It is just not very fast to type that way.

The so-called ligatures, by contrast, must not be constructed with 0x2060. This, however, was the case for three items:
- A justifying no-break space emulation 0x2060 0x0020 0x2060, for use in word processors where the NBSP is not justifying, unlike in the desktop publishing and high-end editing software Philippe Verdy referred to, where U+00A0 is justifying. That U+00A0 is not justifying in word processing is consistent with the need to use it alongside punctuation in French, and with the lack of U+202F in many fonts.
- A colon with such a justifying no-break space, for use in documents that imitate a usage found in at least part of old-fashioned typography, if not in its mainstream: 0x2060 0x0020 0x2060 0x003a.
- A punctuation apostrophe emulation 0x2060 0x0027 0x2060, mapped to Kana + I.
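
For reference, the three sequences above can be spelled out character by character. The Python sketch below is illustrative only: the dictionary labels are hypothetical descriptions, and it uses nothing beyond the standard `unicodedata` module to confirm the names of U+2060 and U+FEFF.

```python
import unicodedata

WJ = "\u2060"      # WORD JOINER
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE (the deprecated alternative)

# Hypothetical labels for the three WJ-based "ligature" sequences listed above.
SEQUENCES = {
    "justifying no-break space": WJ + " " + WJ,
    "colon with justifying no-break space": WJ + " " + WJ + ":",
    "punctuation apostrophe": WJ + "'" + WJ,
}

def code_points(s: str) -> str:
    """Return the string as space-separated, 0x-prefixed code points."""
    return " ".join(f"0x{ord(c):04X}" for c in s)

if __name__ == "__main__":
    print(unicodedata.name(WJ), "/", unicodedata.name(ZWNBSP))
    for label, seq in SEQUENCES.items():
        print(f"{label}: {code_points(seq)}")
```

Each entry is a multi-character string, which is why a keyboard layout has to emit it as a sequence rather than as a single keystroke result.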

I'm about to test on another Windows edition. I wonder whether there is a real issue or not, as you suggest. Nevertheless I believe that no such bug should occur in any version or edition of Windows.

Thank you for your feedback.

Best regards,

Marcel


From charupdate at orange.fr  Thu Jul 30 12:07:36 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 30 Jul 2015 19:07:36 +0200 (CEST)
Subject: Emoji characters for food allergens
In-Reply-To: <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <55B9A182.2030504@kli.org>
 <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>
Message-ID: <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>

I'll try to respond to all, having not much time outside my main concerns, sorry.

Indeed I agree that there are limits to the automation of interhuman communication. In practice, whenever we are in contact with one another, natural language is preferable. Emoticons and other pictographs, IMHO, are intended to complement what written language cannot express in a reasonably small number of words, or to aid quick orientation. Since sooner or later we fall back on natural language anyway, using it from the beginning seems more efficient. So my poor idea of answering an invitation with a set of dietary-constraint pictographs comes down, rather, to preparing a predefined message in every language in which we expect invitations.

As for reading packaging information, it might not be enough to avoid allergens, we should pay attention to the presence of palm oil because of the useless devastation of primates' habitats while enough fallow land exists in a concerned country for palm oil production until 2050. That is just one example of how food choices are complex and require thorough awareness of numerous parameters far beyond allergens, however life-threatening those often are. Moreover, the lives of everybody on earth are threatened by imminent climate change (please see http://avaaz.org/en/ too).

The Babel issue about how to communicate in language confusion might soon be resolved, if there is no more communication at all...

Best regards,

Marcel Schneider
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150730/861eee0c/attachment.html>

From asmus-inc at ix.netcom.com  Thu Jul 30 13:45:42 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Thu, 30 Jul 2015 11:45:42 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <55B9A182.2030504@kli.org>
 <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>
 <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>
Message-ID: <55BA70D6.2070002@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150730/ec2e4604/attachment.html>

From doug at ewellic.org  Thu Jul 30 13:46:31 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 30 Jul 2015 11:46:31 -0700
Subject: Windows 10 release (is still: Re: WORD JOINER vs ZWNBSP)
Message-ID: <20150730114631.665a7a7059d7ee80bb4d670165c8327d.7c3d4b8766.wbe@email03.secureserver.net>

Marcel Schneider <charupdate at orange dot fr> wrote:

>>> Unfortunately that doesn't work on at least one recent version of
>>> Windows. An unambiguous bug was due to the presence of 0x2060 in the
>>> Ligatures table. This has cost me a whole workday to retrieve, fix,
>>> and verify.
>>>
>>> The effect of the bug was that Word, Excel, Firefox and Zotero were
>>> unstartable.
>>>
>>> As a result, the WORD JOINER cannot be implemented on a driver based
>>> keyboard layout for general use on Windows. By contrast, the ZWNBSP
>>> can.

and:

> The so-called ligatures, by contrast, must not be constructed with
> 0x2060. This however was the case of three items:
>
> - A justifying no-break space emulation 0x2060 0x0020 0x2060, for use
> in word processors where the NBSP is not justifying, unlike as in
> desktop publishing and high-end editing software as Philippe Verdy
> referred to, where U+00A0 is justifying. It not being in word
> processing is consistent with the need of using U+00A0 along with
> punctuations in French, and the lack of U+202F in many fonts.
>
> - A colon with such a justifying no-break space, for use in documents
> that imitate the usage of at least a part, if not mainstream, old-
> fashioned typography: 0x2060 0x0020 0x2060 0x003a.
>
> - A punctuation apostrophe emulation 0x2060 0x0027 0x2060, mapped to
> Kana + I.
>
> I'm about to test on another Windows Edition. I wonder if there is a
> real issue or not, as you are suggesting. Nevertheless I believe that
> no such bugs must occur in whatever version and edition of Windows.

I created, installed, and activated an MSKLC keyboard with the three WJ
sequences described above, mapped for convenience to AltGr+Z, AltGr+X,
and AltGr+C respectively (not the Kana key, which I don't have), and had
no trouble opening or using any applications on Windows 7, including the
four mentioned above (except Zotero, which I don't use). KLC source
available on request.

I wouldn't have wasted the 15 minutes but for the continuing, tiresome
rhetoric about Windows bugs.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From andrewcwest at gmail.com  Thu Jul 30 14:07:12 2015
From: andrewcwest at gmail.com (Andrew West)
Date: Thu, 30 Jul 2015 20:07:12 +0100
Subject: Emoji characters for food allergens
In-Reply-To: <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <55B9A182.2030504@kli.org>
 <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>
 <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>
Message-ID: <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com>

On 30 July 2015 at 18:07, Marcel Schneider <charupdate at orange.fr> wrote:
>
> I'll try to respond to all,

Please don't.

Andrew

From asmus-inc at ix.netcom.com  Thu Jul 30 15:56:00 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Thu, 30 Jul 2015 13:56:00 -0700
Subject: Emoji characters for food allergens
In-Reply-To: <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <55B9A182.2030504@kli.org>
 <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>
 <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>
 <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com>
Message-ID: <55BA8F60.3050104@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150730/fdc23603/attachment.html>

From wjgo_10009 at btinternet.com  Fri Jul 31 04:16:42 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 31 Jul 2015 10:16:42 +0100 (BST)
Subject: Emoji characters for food allergens
In-Reply-To: <55BA70D6.2070002@ix.netcom.com>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <55B9A182.2030504@kli.org>
 <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>
 <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>
 <55BA70D6.2070002@ix.netcom.com>
Message-ID: <12270613.13467.1438334202211.JavaMail.defaultUser@defaultHost>

>> it might not be enough to avoid allergens, we should pay attention to
>> the presence of palm oil because of the useless devastation of
>> primates' habitats while enough fallow land exists in a concerned
>> country for palm oil production until 2050
> I believe that for topics like this, there are other lists or forums
> that are more appropriate.
Well, Marcel was writing in the context of reading packaging information in a thread about emoji characters for food allergens.
Now it could perhaps be said that encoding a symbol to indicate the presence of palm oil is off-topic to the thread and that a new thread spinning off from this thread would be desirable, yet still in this mailing list.
However, it could also be said that as this thread is about emoji and food ingredients and knowing what is in a particular foodstuff that, although not strictly on-topic, it is relevant to discuss encoding a symbol to indicate the presence of palm oil in this thread.
I had considered suggesting an emoji to express that a food is vegan, yet held back as it is not an allergen issue, more a lifestyle choice.
Yet a statement that a foodstuff is suitable for a vegan diet does appear on some food packaging.
Some packages also indicate spice strength, though in my observation this appears only on products regarded as spicy in themselves, such as curries, not on products with just a little spice in the ingredients list, such as a soup.
For me, as a gluten-avoiding vegan who avoids spicy food, encoding an emoji for gluten but none for vegan or for no spice seems a gap that could reasonably be addressed while considering emoji for food allergens.
So, I thank Marcel for raising the issue of palm oil in this thread.
William Overington
31 July 2015
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150731/38b84ef9/attachment.html>

From wjgo_10009 at btinternet.com  Fri Jul 31 04:37:52 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 31 Jul 2015 10:37:52 +0100 (BST)
Subject: Emoji characters for food allergens
In-Reply-To: <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <55B9A182.2030504@kli.org>
 <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>
 <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>
 <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com>
Message-ID: <29774584.15201.1438335472835.JavaMail.defaultUser@defaultHost>

>> I'll try to respond to all,

> Please don't.

What Marcel wrote was as follows:

quote

I'll try to respond to all, having not much time outside my main concerns, sorry. 

end quote

When I first read that, and indeed when I read it again after reading Andrew's comment, I read it as Marcel wishing that he could reply individually to each of several posts in this thread, but as he was busy, he would reply in just the one post, the post he was then writing, to various points.

Thus there was no need to ask him not to do so, as he had already done it in that same post.

As someone else has decided to post supporting the request, I reply that I enjoy reading Marcel's posts and that I hope that he continues.

These are important issues for end users of encoding standards and for consumers generally as they are about food allergens and the labelling of food packaging.

A request not to post and support for a request not to post without stating any reason whatsoever is, in my opinion, unfair.

William Overington

31 July 2015


From charupdate at orange.fr  Fri Jul 31 15:51:27 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 31 Jul 2015 22:51:27 +0200 (CEST)
Subject: Windows 10 release (was: Re: WORD JOINER vs ZWNBSP)
In-Reply-To: <20150730114631.665a7a7059d7ee80bb4d670165c8327d.7c3d4b8766.wbe@email03.secureserver.net>
References: <20150730114631.665a7a7059d7ee80bb4d670165c8327d.7c3d4b8766.wbe@email03.secureserver.net>
Message-ID: <1621810209.26857.1438375887376.JavaMail.www@wwinf1j14>

On 30 Jul 2015 at 20:56, Doug Ewell  wrote:

> I created, installed, and activated an MSKLC keyboard with the three WJ
> sequences described above, mapped for convenience to AltGr+Z, AltGr+X,
> and AltGr+C respectively (not the Kana key, which I don't have), and had
> no trouble opening or using any applications on Windows 7, including the
> four mentioned above (except Zotero, which I don't use). KLC source
> available on request.
> 
> I wouldn't have wasted the 15 minutes but for the continuing, tiresome
> rhetoric about Windows bugs.

Thank you for testing. Indeed, the problem turned out to be located at another level. I'm still unsure, but the WJ now works, except that LibreOffice doesn't insert the sequence.

Sorry for my complaint.

Best regards,

Marcel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150731/8291b605/attachment.html>

From charupdate at orange.fr  Fri Jul 31 15:58:40 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 31 Jul 2015 22:58:40 +0200 (CEST)
Subject: Emoji characters for food allergens
In-Reply-To: <29774584.15201.1438335472835.JavaMail.defaultUser@defaultHost>
References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net>
 <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com>
 <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost>
 <55B9A182.2030504@kli.org>
 <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost>
 <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34>
 <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com>
 <29774584.15201.1438335472835.JavaMail.defaultUser@defaultHost>
Message-ID: <1958555392.26939.1438376320812.JavaMail.www@wwinf1j14>

On 31 Jul 2015 at 15:32, William_J_G Overington  wrote:

> A request not to post and support for a request not to post without stating any reason whatsoever is, in my opinion, unfair.

Thank you; however, I believe that Mr West's and Mr Freytag's reactions were also triggered by my hasty complaints about Microsoft. Fundamentally, I didn't respect a mailing list rule, which is to always respond to a particular request or statement and to stick with the thread. I'm sorry to have vented needlessly; nevertheless, I'm working on some precise replies, which I'll send soon.

All the best,

Marcel Schneider

> Message of 31/07/15 15:32
> From: "William_J_G Overington"
> To: komatsu at google.com, andrewcwest at gmail.com, asmus-inc at ix.netcom.com, charupdate at orange.fr
> Cc: unicode at unicode.org
> Subject: Re: Emoji characters for food allergens
> 
> >> I'll try to respond to all,
> 
> > Please don't.
> 
> What Marcel wrote was as follows:
> 
> quote
> 
> I'll try to respond to all, having not much time outside my main concerns, sorry. 
> 
> end quote
> 
> When I first read that, and indeed when I read it again after reading Andrew's comment, I read it as Marcel wishing that he could reply individually to each of several posts in this thread, but as he was busy, he would reply in just the one post, the post he was then writing, to various points.
> 
> Thus there was no need to ask him not to do so, as he had already done it in that same post.
> 
> As someone else has decided to post supporting the request, I reply that I enjoy reading Marcel's posts and that I hope that he continues.
> 
> These are important issues for end users of encoding standards and for consumers generally as they are about food allergens and the labelling of food packaging.
> 
> A request not to post and support for a request not to post without stating any reason whatsoever is, in my opinion, unfair.
> 
> William Overington
> 
> 31 July 2015
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150731/079ddeea/attachment.html>