From unicode at unicode.org  Thu Mar  1 04:56:02 2018
From: unicode at unicode.org (James Kass via Unicode)
Date: Thu, 1 Mar 2018 02:56:02 -0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <91680448.22170.1519824152519@ox.hosteurope.de>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
Message-ID: <CABPY6Z1HqP5ioOq6KiFtmHJ5OO=fhdXJ9QbSMCd_YPTTH9aqUA@mail.gmail.com>

Christoph P?per wrote,

>> There are approximately 7,000 living human languages,
>> but fewer than 100 of these languages are well-supported on computers,
>> ...
>
> Why is the announcement mentioning those numbers of languages at all?
> The script coverage of written living human languages, except
> for constructed ones, is almost complete in Unicode and rendering
> for most of them is reasonably well supported by all modern
> operating systems ...

This page ...
https://www.unicode.org/standard/unsupported.html
... lists several modern scripts which are not yet encoded.  (Hanifi
Rohingya, Gunjala Gondi, Loma, Medefaidrin, Naxi Dongba (Moso), and
Nyiakeng Puachue Hmong.)  It's noted that there are additional unencoded
"minor modern scripts" shown on the Roadmap, which implies that those
listed are also "minor".
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180301/cf18ec5a/attachment.html>

From unicode at unicode.org  Thu Mar  1 05:11:58 2018
From: unicode at unicode.org (James Kass via Unicode)
Date: Thu, 1 Mar 2018 03:11:58 -0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CABPY6Z1HqP5ioOq6KiFtmHJ5OO=fhdXJ9QbSMCd_YPTTH9aqUA@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CABPY6Z1HqP5ioOq6KiFtmHJ5OO=fhdXJ9QbSMCd_YPTTH9aqUA@mail.gmail.com>
Message-ID: <CABPY6Z3Xn7Pd3nEKErye=iB5u5X=igBTzSTv7BiEb=h_rGRQeg@mail.gmail.com>

Here's a good opening line:

"The Unicode Standard encodes scripts rather than languages."

https://www.unicode.org/standard/supported.html

But, quoting from this page:

http://www.unicode.org/consortium/aboutdonations.html

" ... and provide universal access for the world's languages?past,
present, and future. The Consortium lays the groundwork to enable
universal access by encoding the characters for the world?s languages,
..."

That's inaccurate.  Languages don't use characters, technically.  It's
more about providing universal access for the world's communication,
data, and history.  You know, the sum of mankind's knowledge that's
been digitized so far.  Unicode encodes the characters used for the
world's computer data interchange and storage systems.

Salesmen and techies have different requirements for accuracy, however.


From unicode at unicode.org  Thu Mar  1 11:04:05 2018
From: unicode at unicode.org (Tim Partridge via Unicode)
Date: Thu, 1 Mar 2018 17:04:05 +0000
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CABPY6Z3Xn7Pd3nEKErye=iB5u5X=igBTzSTv7BiEb=h_rGRQeg@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CABPY6Z1HqP5ioOq6KiFtmHJ5OO=fhdXJ9QbSMCd_YPTTH9aqUA@mail.gmail.com>,
 <CABPY6Z3Xn7Pd3nEKErye=iB5u5X=igBTzSTv7BiEb=h_rGRQeg@mail.gmail.com>
Message-ID: <MM1P12301MB1530A687ACD748380AB6BB63DEC60@MM1P12301MB1530.GBRP123.PROD.OUTLOOK.COM>

Perhaps the CLDR work the Consortium does is being referenced. That is by language on this list http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee By the time it gets to the 100th entry the Modern percentage has "room for improvement".

Regards,

Tim
________________________________________
From: Unicode [unicode-bounces at unicode.org] on behalf of James Kass via Unicode [unicode at unicode.org]
Sent: 01 March 2018 11:11
To: Unicode Public
Subject: Re: Unicode Emoji 11.0 characters now ready for adoption!

Here's a good opening line:

"The Unicode Standard encodes scripts rather than languages."

https://www.unicode.org/standard/supported.html

But, quoting from this page:

http://www.unicode.org/consortium/aboutdonations.html

" ... and provide universal access for the world's languages?past,
present, and future. The Consortium lays the groundwork to enable
universal access by encoding the characters for the world?s languages,
..."

That's inaccurate.  Languages don't use characters, technically.  It's
more about providing universal access for the world's communication,
data, and history.  You know, the sum of mankind's knowledge that's
been digitized so far.  Unicode encodes the characters used for the
world's computer data interchange and storage systems.

Salesmen and techies have different requirements for accuracy, however.


From unicode at unicode.org  Thu Mar  1 14:10:07 2018
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Thu, 01 Mar 2018 13:10:07 -0700
Subject: Unicode Emoji 11.0 characters now ready for adoption!
Message-ID: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com>

Tim Partridge wrote:

> Perhaps the CLDR work the Consortium does is being referenced. That is
> by language on this list
> http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee
> By the time it gets to the 100th entry the Modern percentage has "room
> for improvement".

I think that is a measurement of locale coverage -- whether the
collation tables and translations of "a.m." and "p.m." and "a week ago
Thursday" are correct and verified -- not character coverage.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


From unicode at unicode.org  Fri Mar  2 07:29:46 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Fri, 2 Mar 2018 14:29:46 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com>
References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com>
Message-ID: <CAJ2xs_HnR3OfiRB+SyhZ7YYb3CJD_AGG1aUD_Et4etVaBGBC2A@mail.gmail.com>

Right, Doug. I'll say a few more words.

In terms of language support, encoding of new characters in Unicode
benefits mostly digital heritage languages (via representation of historic
languages in Unicode, enabling preservation and scholarly work), although
there are some modern-use cases like Hanifi Rohingya. We do include digital
heritage under the umbrella of "digitally disadvantaged languages", but we
are not consistent in our terminology sometimes.

But encoding is just a first step. A vital first step, but just one step.

People tend to forget that adding new characters is just a part of what
Unicode does. For script support, it is just as important to have correct
Unicode algorithms and properties, such as correct values for the
Indic_Positional_Category
property (which together with the related work in with the Universal
Shaping Engine, allows for proper rendering of many languages). Behind the
scenes we have people like Ken and Laurentiu who have to dig through the
encoding proposals and fill in the many, many gaps to come up with
reasonable properties for such basic behavior as line-break.

As important as the work is on encoding, properties, and algorithms, when
we go up a level we get CLDR and ICU. Those have more impact on language
support for far more people in the world than the addition of new scripts
does. After all, approaching half of the population of the globe owns
smartphones: ICU provides programmatic access to the Unicode encoding,
properties, and algorithms, and CLDR + ICU together provide the core
language support on essentially every one of those smartphones.

But in terms of language coverage, the chart you reference (and the
corresponding
graph <http://cldr.unicode.org/index/downloads/cldr-32#TOC-Growth>) show
how very far CLDR still has to go. So we are gearing up for ways to extend
that graph: to move at least the basic coverage (the lower plateau in that
graph) to more languages, and to move basic-coverage languages up to more
in-depth coverage. We are focusing on ways to improve the CLDR survey tool
backend and frontend, since we know it currently cannot able to handle the
number of people that want to contribute, and has glitches in the UI that
make it clumsier to use than it should be.

Well, this turned out to be more than just a few words... sorry for going
on!

Mark

On Thu, Mar 1, 2018 at 9:10 PM, Doug Ewell via Unicode <unicode at unicode.org>
wrote:

> Tim Partridge wrote:
>
> > Perhaps the CLDR work the Consortium does is being referenced. That is
> > by language on this list
> > http://www.unicode.org/cldr/charts/32/supplemental/locale_
> coverage.html#ee
> > By the time it gets to the 100th entry the Modern percentage has "room
> > for improvement".
>
> I think that is a measurement of locale coverage -- whether the
> collation tables and translations of "a.m." and "p.m." and "a week ago
> Thursday" are correct and verified -- not character coverage.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180302/ad680b1a/attachment.html>

From unicode at unicode.org  Fri Mar  2 08:22:36 2018
From: unicode at unicode.org (=?UTF-8?Q?Christoph_P=C3=A4per?= via Unicode)
Date: Fri, 2 Mar 2018 15:22:36 +0100 (CET)
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com>
References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com>
Message-ID: <1399756717.44805.1520000556843@ox.hosteurope.de>

F'up2: cldr-users at unicode.org

Doug Ewell via unicode at unicode.org:
> 
> I think that is a measurement of locale coverage -- whether the
> collation tables and translations of "a.m." and "p.m." and "a week ago
> Thursday" are correct and verified -- not character coverage.

By the way, the binary `am` vs. `pm` distinction common in English and labelled `a` as a placeholder in CLDR formats is too simplistic for some languages when using the 12-hour clock (which they usually don't in written language). In German, for instance, you would always use a format with `B` instead (i.e. "morgens", "mittags", "abends", "nachts" or no identifier during daylight).

How and where can I best suggest to change this in CLDR? The B formats have their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to set `hms` etc. to the same value next time the Survey Tool is open?

In my experience, there are too few people reviewing even the "largest" languages (like German). I participated in v32 and v33, but other than me there were only contributions from (seemingly) a single employee from each of Apple, Google and Microsoft. Most improvements or corrections I suggested just got lost, i.e. nobody discussed or voted on them, so the old values remained.

From unicode at unicode.org  Fri Mar  2 09:26:18 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Fri, 2 Mar 2018 16:26:18 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <1399756717.44805.1520000556843@ox.hosteurope.de>
References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com>
 <1399756717.44805.1520000556843@ox.hosteurope.de>
Message-ID: <CAJ2xs_HtjpopcEhB8RZ4s4phACfWtmyvVGAePYOTJHhQeB3-HQ@mail.gmail.com>

No, the patterns should always have the right format. However, in the
supplemental data there is information as to the preferred data for each
language. This data isn't collected through the ST, so a ticket needs to be
filed.

In your particular case, the data has:

<hours
 preferred="H"
 allowed="H hB"
 regions="AD AM AO AT AW BE BF BJ BL BR CG CI CV DE EE FR GA GF GN GP GW HR
IL IT KZ MC MD MF MQ MZ NC NL PM PT RE RO SI SM SR ST TG TR WF YT"/>

If DE just doesn't use hB, then you can file a ticket to say that it
shouldn't be in @allowed.

Note that the format permits either regions or locales, as in:

<hours
 preferred="H"
 allowed="H h hB"
 regions="CF CM LU NP PF SC SN TF ca_ES gl_ES"/>


As to involvement, we try to encourage interaction on the forum. In some
languages those are quite active; in others not so much. (BTW, a number of
your suggestions made sense to me, but not being a native German speaker, I
don't weigh in on de.xml except for structural issues or where people seem
to miss the intent.) So people may look at the forum, disagree with the
proposal, but not respond why they disagree.


Mark

On Fri, Mar 2, 2018 at 3:22 PM, Christoph P?per via Unicode <
unicode at unicode.org> wrote:

> F'up2: cldr-users at unicode.org
>
> Doug Ewell via unicode at unicode.org:
> >
> > I think that is a measurement of locale coverage -- whether the
> > collation tables and translations of "a.m." and "p.m." and "a week ago
> > Thursday" are correct and verified -- not character coverage.
>
> By the way, the binary `am` vs. `pm` distinction common in English and
> labelled `a` as a placeholder in CLDR formats is too simplistic for some
> languages when using the 12-hour clock (which they usually don't in written
> language). In German, for instance, you would always use a format with `B`
> instead (i.e. "morgens", "mittags", "abends", "nachts" or no identifier
> during daylight).
>
> How and where can I best suggest to change this in CLDR? The B formats
> have their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to
> set `hms` etc. to the same value next time the Survey Tool is open?
>
> In my experience, there are too few people reviewing even the "largest"
> languages (like German). I participated in v32 and v33, but other than me
> there were only contributions from (seemingly) a single employee from each
> of Apple, Google and Microsoft. Most improvements or corrections I
> suggested just got lost, i.e. nobody discussed or voted on them, so the old
> values remained.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180302/e3b71468/attachment.html>

From unicode at unicode.org  Fri Mar  2 09:51:04 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Fri, 2 Mar 2018 16:51:04 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <1399756717.44805.1520000556843@ox.hosteurope.de>
References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com>
 <1399756717.44805.1520000556843@ox.hosteurope.de>
Message-ID: <CAGa7JC1c07rFvqtTZbNTQ90ZdcV4wWubwOP964OYYBncz4q1kw@mail.gmail.com>

day periods (from 00:0 to 24:00 : sometimes "night", but generally included
in "matin", then "midi", "apr?s-midi", "soir") are also used in French muct
more usefully than the ambiguous and unused am/pm Latin abbreviations that
fell compeltely out of use a few centuries ago

(side note: not sure if it was commonly abbreviated, most probably only in
written form but not spelled orally where it would read only the full latin
words in before French finally replaced the judiciary and liturgic "Late
Vulgar Latin" language that no one was really understanding correctlmy and
it was constantly creolized with the many regional vernacular oil languages
instead of following the liturgic and judiciary style; at that time, the
"ante/poste meridiem was only heard in christian masses or judiciary
documents, both full of corportative jargons, and even different from the
approximative Latin of the adminsitration; then Latin collapsed under
regional oil languages that differentiated much between each other, before
French was finally created, abandoning Latin as the sole source, but
reinventing words borrowed from Greek and adapted to the Anjou oil variant
used by ruling nobility and the neighborhood of the King and some
passionate chuch personalities that also wanted to incoporate the several
oc languages and other european languages for the diplomacy; then Frenchc
took about 2 centuries to develop before it finally burnt most regional oil
variants and nearly burnt also oc variants ; there remains some Latin
expressions in French, but only for specific/technical usages, especially
in the judiciary language, like in English; but English kept the "ante/post
meridiem" only by its abbreviations, and today, most native English
speakers don't know really what "am"  and "pm" really means).

So yes, day periods should have their own format codes. But the number of
day periods varies across languages (not really between distinct scripts of
the same language), but more importantly also across gerographic
regions/countries/territories (more than by language). CLDR would then need
more regional variants than those supported for now (ISO 3166-1 codes may
not be sufficient as BCP 47 language subtags )


2018-03-02 15:22 GMT+01:00 Christoph P?per via Unicode <unicode at unicode.org>
:

> F'up2: cldr-users at unicode.org
>
> Doug Ewell via unicode at unicode.org:
> >
> > I think that is a measurement of locale coverage -- whether the
> > collation tables and translations of "a.m." and "p.m." and "a week ago
> > Thursday" are correct and verified -- not character coverage.
>
> By the way, the binary `am` vs. `pm` distinction common in English and
> labelled `a` as a placeholder in CLDR formats is too simplistic for some
> languages when using the 12-hour clock (which they usually don't in written
> language). In German, for instance, you would always use a format with `B`
> instead (i.e. "morgens", "mittags", "abends", "nachts" or no identifier
> during daylight).
>
> How and where can I best suggest to change this in CLDR? The B formats
> have their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to
> set `hms` etc. to the same value next time the Survey Tool is open?
>
> In my experience, there are too few people reviewing even the "largest"
> languages (like German). I participated in v32 and v33, but other than me
> there were only contributions from (seemingly) a single employee from each
> of Apple, Google and Microsoft. Most improvements or corrections I
> suggested just got lost, i.e. nobody discussed or voted on them, so the old
> values remained.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180302/90c4d27f/attachment.html>

From unicode at unicode.org  Sun Mar  4 08:10:35 2018
From: unicode at unicode.org (Helena Miton via Unicode)
Date: Sun, 4 Mar 2018 15:10:35 +0100
Subject: Fonts and font sizes used in the Unicode
Message-ID: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>

Greetings. Is there a way to know which font and font size have been used
in the Unicode charts (for various writing systems)? Many thanks!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180304/a3f2719e/attachment.html>

From unicode at unicode.org  Sun Mar  4 11:12:34 2018
From: unicode at unicode.org (Markus Scherer via Unicode)
Date: Sun, 4 Mar 2018 09:12:34 -0800
Subject: Fonts and font sizes used in the Unicode
In-Reply-To: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
Message-ID: <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>

On Sun, Mar 4, 2018 at 6:10 AM, Helena Miton via Unicode <
unicode at unicode.org> wrote:

> Greetings. Is there a way to know which font and font size have been used
> in the Unicode charts (for various writing systems)? Many thanks!
>

What are you trying to do?

Many of the fonts are unique to the Unicode chart production, and are not
licensed for other uses. Some are not even generally usable.

markus
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180304/5fbda7cd/attachment.html>

From unicode at unicode.org  Sun Mar  4 12:52:26 2018
From: unicode at unicode.org (William_J_G Overington via Unicode)
Date: Sun, 4 Mar 2018 18:52:26 +0000 (GMT)
Subject: Fonts and font sizes used in the Unicode
In-Reply-To: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
Message-ID: <26364438.35351.1520189546089.JavaMail.defaultUser@defaultHost>

Helena Milton asks:

> Greetings. Is there a way to know which font and font size have been used in the Unicode charts (for various writing systems)? Many thanks!

Yes, download the PDF (Portable Document Format) code chart document to local storage.

Open the file in Adobe Reader.

Right click on the page.

On the panel that is displayed, click on Document Properties... and then on the panel that is then displayed, choose the Fonts tab.

The list of fonts used in the document is then displayed.

Copying a character from the PDF document and pasting it into WordPad may well give the point size of the font that is being used, even if the character glyph is not displayed and what is displayed is just a box with a question mark in it or other some other design of the .notdef glyph from whatever font is being used in WordPad.

William


From unicode at unicode.org  Sun Mar  4 13:49:33 2018
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Sun, 4 Mar 2018 11:49:33 -0800
Subject: Fonts and font sizes used in the Unicode
In-Reply-To: <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
Message-ID: <40df19c6-8287-a740-91f3-f00bc827b5e7@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180304/43b18ffb/attachment.html>

From unicode at unicode.org  Sun Mar  4 21:54:10 2018
From: unicode at unicode.org (fantasai via Unicode)
Date: Mon, 5 Mar 2018 12:54:10 +0900
Subject: Emoji as East Asian Width = Wide
Message-ID: <d580dd62-3be2-ee5d-8bc0-17f1e28e641b@inkedblade.net>

Why are the new emoji like U+1F600 Grinning Face EAW=Wide
when other dingbats like U+263A Smiling Face are EAW=Neutral?
This is making it difficult to have consistent formatting
across emoticons. Also, emoji aren't really CJK context only
now, are they.

https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show
https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show

~fantasai

From unicode at unicode.org  Sat Mar  3 19:32:45 2018
From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode)
Date: Sun, 4 Mar 2018 10:32:45 +0900
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
Message-ID: <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>

Hello John,

On 2018/03/01 12:31, via Unicode wrote:

> Pen, or brush and paper is much more flexible. With thousands of names 
> of people and places still not encoded I am not sure if I would describe 
> hans (simplified Chinese characters) as well supported. nor with current 
> policy which limits China with over one billion people to submitting 
> less than 500 Chinese characters a year on average, and names not being 
> all to be added, it is hard to say which decade hans will be well 
> supported.

I think this contains several misunderstandings. First, of course 
pen/brush and paper are more flexible than character encoding, but 
that's true for the Latin script, too.

Second, while I have heard that people create new characters for naming 
a baby in a traditional Han context, I haven't heard about this in a 
simplified Han context. And it's not frequent at all, the same way 
naming a baby John in the US is way more frequent than let's say Qvtwzx. 
I'd also assume that China has regulations on what characters can be 
used to name a baby, and that the parents in this age of smartphone 
communication will think at least twice before giving their baby a name 
that they cannot send to their relatives via some chat app.

Third, I cannot confirm or deny the "500 characters a year" limit, but 
I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need 
to encode more characters, everybody would find a way to handle these.

Due to the nature of your claims, it's difficult to falsify many of 
them. It would be easier to prove them (assuming they were true), so if 
you have any supporting evidence, please provide it.

Regards,   Martin.

> John Knightley


From unicode at unicode.org  Mon Mar  5 01:58:33 2018
From: unicode at unicode.org (Oren Watson via Unicode)
Date: Mon, 5 Mar 2018 02:58:33 -0500
Subject: Fwd: Emoji as East Asian Width = Wide
In-Reply-To: <CAKs2F=qqOcMjFTz_OxB2fCx3PoE8+bf0ej5X-mKR7U5R+k-D5g@mail.gmail.com>
References: <d580dd62-3be2-ee5d-8bc0-17f1e28e641b@inkedblade.net>
 <CAKs2F=qqOcMjFTz_OxB2fCx3PoE8+bf0ej5X-mKR7U5R+k-D5g@mail.gmail.com>
Message-ID: <CAKs2F=p42Q6EuzxSrKRvTpjh4amrUDc7mQVZdDhc4_RGW0hMcg@mail.gmail.com>

EAW is used in fixed-width settings to distinguish characters that should
take up one space versus two. I would also prefer that all these be
considered wide, since otherwise it causes format problems in these
settigns.
(unfortunately fixed-width appear to be largley ignored by unicode... ??)

On Sun, Mar 4, 2018 at 10:54 PM, fantasai via Unicode <unicode at unicode.org>
wrote:

> Why are the new emoji like U+1F600 Grinning Face EAW=Wide
> when other dingbats like U+263A Smiling Face are EAW=Neutral?
> This is making it difficult to have consistent formatting
> across emoticons. Also, emoji aren't really CJK context only
> now, are they.
>
> https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show
> https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show
>
> ~fantasai
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180305/4e3bf7aa/attachment.html>

From unicode at unicode.org  Mon Mar  5 02:57:11 2018
From: unicode at unicode.org (Phake Nick via Unicode)
Date: Mon, 05 Mar 2018 08:57:11 +0000
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
Message-ID: <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>

? 2018?3?5??? 13:25?Martin J. D?rst via Unicode <unicode at unicode.org> ???

> Hello John,
>
> On 2018/03/01 12:31, via Unicode wrote:
>
> > Pen, or brush and paper is much more flexible. With thousands of names
> > of people and places still not encoded I am not sure if I would describe
> > hans (simplified Chinese characters) as well supported. nor with current
> > policy which limits China with over one billion people to submitting
> > less than 500 Chinese characters a year on average, and names not being
> > all to be added, it is hard to say which decade hans will be well
> > supported.
>
> I think this contains several misunderstandings. First, of course
> pen/brush and paper are more flexible than character encoding, but
> that's true for the Latin script, too.
>

In latin script, as an example, I can simply name myself "Phake", but in
Chinese with current Unicode-based environment, it would not be possible
for me to randomly name myself using a character  ??? as I would like to.


> Second, while I have heard that people create new characters for naming
> a baby in a traditional Han context, I haven't heard about this in a
> simplified Han context. And it's not frequent at all, the same way
> naming a baby John in the US is way more frequent than let's say Qvtwzx.
> I'd also assume that China has regulations on what characters can be
> used to name a baby, and that the parents in this age of smartphone
> communication will think at least twice before giving their baby a name
> that they cannot send to their relatives via some chat app.
>

Traditional character versus simplified characters in this context is just
like Fraktur vs Antiqua. The way to write some components have been changed
and then there are also orthographical changes that make some characters no
longer comprise of same component, but they are still Chinese characters
and their usage are still unchanged. I believe there are regulations on
naming but that regulations would have be manmade to adopt to the
limitations of current computational system. Plus, once in a while I still
often heard about news that people are having difficulties in using e.g.
train booking system or banking systems due to characters that they are
using. (Although in many case those are encoded characters not supported by
system)


> Third, I cannot confirm or deny the "500 characters a year" limit, but
> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need
> to encode more characters, everybody would find a way to handle these.


> Due to the nature of your claims, it's difficult to falsify many of
> them. It would be easier to prove them (assuming they were true), so if
> you have any supporting evidence, please provide it.
>
> Regards,   Martin.
>
> > John Knightley
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180305/efb0618f/attachment.html>

From unicode at unicode.org  Mon Mar  5 05:25:17 2018
From: unicode at unicode.org (James Kass via Unicode)
Date: Mon, 5 Mar 2018 03:25:17 -0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
Message-ID: <CABPY6Z3QbRLVcj2=BDzL7LGfcMEM51kzO-LiMZGHiXmzqv1thA@mail.gmail.com>

Phake Nick wrote,

> In latin script, as an example, I can simply name myself
> "Phake", but in Chinese with current Unicode-based environment,
> it would not be possible for me to randomly name myself using
> a character  ???

Isn't that U+246E8? "??"


From unicode at unicode.org  Mon Mar  5 05:49:47 2018
From: unicode at unicode.org (Phake Nick via Unicode)
Date: Mon, 05 Mar 2018 11:49:47 +0000
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CABPY6Z3QbRLVcj2=BDzL7LGfcMEM51kzO-LiMZGHiXmzqv1thA@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <CABPY6Z3QbRLVcj2=BDzL7LGfcMEM51kzO-LiMZGHiXmzqv1thA@mail.gmail.com>
Message-ID: <CAGHjPPJNOvG1KoVY8wShSdis=GyD1P4eUkeLiJFFicEJpaHc9Q@mail.gmail.com>

ah right that's it.

2018?3?5? 19:25 ? "James Kass" <jameskasskrv at gmail.com> ???

Phake Nick wrote,


> In latin script, as an example, I can simply name myself
> "Phake", but in Chinese with current Unicode-based environment,
> it would not be possible for me to randomly name myself using
> a character  ???

Isn't that U+246E8? "??"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180305/cc993c55/attachment.html>

From unicode at unicode.org  Mon Mar  5 06:00:45 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 5 Mar 2018 13:00:45 +0100
Subject: Emoji as East Asian Width = Wide
In-Reply-To: <CAKs2F=p42Q6EuzxSrKRvTpjh4amrUDc7mQVZdDhc4_RGW0hMcg@mail.gmail.com>
References: <d580dd62-3be2-ee5d-8bc0-17f1e28e641b@inkedblade.net>
 <CAKs2F=qqOcMjFTz_OxB2fCx3PoE8+bf0ej5X-mKR7U5R+k-D5g@mail.gmail.com>
 <CAKs2F=p42Q6EuzxSrKRvTpjh4amrUDc7mQVZdDhc4_RGW0hMcg@mail.gmail.com>
Message-ID: <CAGa7JC3PAEj2YvjiuDf_aqAAEAHGARJDh6kSobdf+jo1x=2icw@mail.gmail.com>

I think that fixed-width rendering properties for East-Asian characters was
meant only for rendering letters or symbols as plain-text, not for the new
rendering with emoji styles.
If the symbols are rendered as emojis, these properties don't apply at all,
the Emojis style overrides that completely.

Note that when characters have both styles (notably the oldest dingbats),
there's a variant selector available to select the emoji (EAW ignored)
style vs. plain-text style (where EAW is suitable). Characters that have
only Emoji styles and no selectors should not have any EAW property (only
the default one applicable to all Emojis).


2018-03-05 8:58 GMT+01:00 Oren Watson via Unicode <unicode at unicode.org>:

> EAW is used in fixed-width settings to distinguish characters that should
> take up one space versus two. I would also prefer that all these be
> considered wide, since otherwise it causes format problems in these
> settigns.
> (unfortunately fixed-width appear to be largley ignored by unicode... ??)
>
> On Sun, Mar 4, 2018 at 10:54 PM, fantasai via Unicode <unicode at unicode.org
> > wrote:
>
>> Why are the new emoji like U+1F600 Grinning Face EAW=Wide
>> when other dingbats like U+263A Smiling Face are EAW=Neutral?
>> This is making it difficult to have consistent formatting
>> across emoticons. Also, emoji aren't really CJK context only
>> now, are they.
>>
>> https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show
>> https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show
>>
>> ~fantasai
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180305/8b2d7ad3/attachment.html>

From unicode at unicode.org  Mon Mar  5 09:13:00 2018
From: unicode at unicode.org (via Unicode)
Date: Mon, 05 Mar 2018 23:13:00 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
Message-ID: <4f203cff3a031ab9846afe48b34f87e6@koremail.com>

Dear All,

to simplify discussion I have split the points.

On 05.03.2018 16:57, Phake Nick via Unicode wrote:
> ? 2018?3?5??? 13:25?Martin J. D?rst via Unicode
> <unicode at unicode.org [1]> ???
>
>> Hello John,
>>
>> On 2018/03/01 12:31, via Unicode wrote:
>>
>>>Third, I cannot confirm or deny the "500 characters a year" limit, 
>>> but
>>>I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real 
>>> need
>>>to encode more characters, everybody would find a way to handle 
>>> these.

>>> Due to the nature of your claims, it's difficult to falsify many of
>>> them. It would be easier to prove them (assuming they were true), 
>>> so if
>>> you have any supporting evidence, please provide it.

Chinese characters for Unicode first go to IRG (or ISO/IEC 
JTC1/SC2/WG2/IRG) website. The limit of 500 a year for China is an 
average based on IRG #48 document regarding working set 2017 
http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf 
which explicitly states "each submission shall not exceed 1,000 
characters". The People's Republic of China as one member of IRG is 
limited to 1,000 characters, which hopefully we can all agree has a 
population of over 1,000,000,000 , therefore was limited to submitting 
at most 1,000 characters. The earliest possible date for the next 
working set is two or three years later, that is 2019 or 2020, so that's 
an average limit of either 500 or 333 characters a year.

Regards
John

>>> Regards,   Martin.


From unicode at unicode.org  Mon Mar  5 09:42:15 2018
From: unicode at unicode.org (via Unicode)
Date: Mon, 05 Mar 2018 23:42:15 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
Message-ID: <447c571bad4174b493e4bd42ee7a41f2@koremail.com>


Dear All,

here is reply to points one and two.

On 05.03.2018 16:57, Phake Nick via Unicode wrote:
> ? 2018?3?5??? 13:25?Martin J. D?rst via Unicode
> <unicode at unicode.org [1]> ???
>
>> Hello John,
>>
>> On 2018/03/01 12:31, via Unicode wrote:
>>
>> > Pen, or brush and paper is much more flexible. With thousands of
>> names
>> > of people and places still not encoded I am not sure if I would
>> describe
>> > hans (simplified Chinese characters) as well supported. nor with
>> current
>> > policy which limits China with over one billion people to
>> submitting
>> > less than 500 Chinese characters a year on average, and names not
>> being
>> > all to be added, it is hard to say which decade hans will be well
>> > supported.
>>
>> I think this contains several misunderstandings. First, of course
>> pen/brush and paper are more flexible than character encoding, but
>> thats true for the Latin script, too.
>
> In latin script, as an example, I can simply name myself "Phake", but
> in Chinese with current Unicode-based environment, it would not be
> possible for me to randomly name myself using a character? ???
> as I would like to.
>
>> Second, while I have heard that people create new characters for
>> naming
>> a baby in a traditional Han context, I havent heard about this in a
>> simplified Han context. And its not frequent at all, the same way
>> naming a baby John in the US is way more frequent than lets say
>> Qvtwzx.
>> Id also assume that China has regulations on what characters can be
>> used to name a baby, and that the parents in this age of smartphone
>> communication will think at least twice before giving their baby a
>> name
>> that they cannot send to their relatives via some chat app.
>

In most cases the answer to the above may well be the same, the 
unencoded names of people and places are not new names, but rather names 
of places and poeple in use from before Unicode and often before 
computers. In IRG #48 People's Republic of China 
http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2187ChinaActivityReport.pdf 
that states of over 3,000 names of people and places are under 
condideration for IRG working set 2017 and at least half require 
encoding. The document also list other categories of CJK ideographs 
under consideration for submission to Unicode.

Regards
John


>
>
> Links:types
> ------
> [1] mailto:unicode at unicode.org


From unicode at unicode.org  Mon Mar  5 11:03:27 2018
From: unicode at unicode.org (suzuki toshiya via Unicode)
Date: Tue, 6 Mar 2018 02:03:27 +0900
Subject: [Unicode] Re: Fonts and font sizes used in the Unicode
In-Reply-To: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
Message-ID: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>

Hi,

I remember, the front page of the code charts by
Unicode has following note:

<quote>
Fonts
The shapes of the reference glyphs used in these code
charts are not prescriptive. Considerable variation is
to be expected in actual fonts. The particular fonts
used in these charts were provided to the Unicode
Consortium by a number of different font designers,
who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.
</quote>

--

I have a question; if some people try to make a
translated version of Unicode, they should contact
all font contributors and ask for the license?
Unicode Consortium cannot give any sublicense?

If I understand correctly, ISO/IEC JTC1 hold the
copyright of the materials used in the published
documents of JTC1 standard, because they have to
permit the production of the translated version of
their standards, the reuse of the content of a spec
by another spec, etc.

Thus, I guess, it would not be so irrelevant to ask
the permission to JTC1, about the fonts used in
ISO/IEC 10646 - although it does not mean that
JTC1 would permit anything. If I'm misunderstanding,
please correct me.

Regards,
mpsuzuki

On 3/5/2018 4:49 AM, Asmus Freytag via Unicode wrote:
> On 3/4/2018 9:12 AM, Markus Scherer via Unicode wrote:
> On Sun, Mar 4, 2018 at 6:10 AM, Helena Miton via Unicode <unicode at unicode.org<mailto:unicode at unicode.org>> wrote:
> Greetings. Is there a way to know which font and font size have been used in the Unicode charts (for various writing systems)? Many thanks!
> 
> What are you trying to do?
> 
> Many of the fonts are unique to the Unicode chart production, and are not licensed for other uses. Some are not even generally usable.
> 
> markus
> 
> The editors of the Unicode charts will use any font resource that gets the job done (that is, results in a chart that correctly displays the characters in the standard). These fonts are often not production fonts, and may lack any of the many tables needed to actually display running text. They may also, as has been mentioned, be licensed solely for the purpose of publishing the standard. In some cases, they are custom built.
> 
> For most scripts, the font size is nominally set to 22pt in the main code charts, but the tool that the editors use allow a different size to be selected for any range of code points, or individual characters. There are some examples where a character is very wide or tall where it had to be scaled down individually to fit the cell.
> 
> The purpose of the code charts is *exclusively* that of helping users of the standard identify which character is encoded at what code position. They are not intended as a font resource or normative description of the glyphs. Any usage scenario that is outside the very narrow scope is unsupported and reverse engineering / extracting font resources is explicitly in violation of the terms of use.
> 
> A./
> 


From unicode at unicode.org  Mon Mar  5 11:39:41 2018
From: unicode at unicode.org (Markus Scherer via Unicode)
Date: Mon, 5 Mar 2018 09:39:41 -0800
Subject: [Unicode] Re: Fonts and font sizes used in the Unicode
In-Reply-To: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
Message-ID: <CAN49p6p5GV858fvOektvsYkswiUHueWXs7JcGnAyn++nrXy=5w@mail.gmail.com>

On Mon, Mar 5, 2018 at 9:03 AM, suzuki toshiya via Unicode <
unicode at unicode.org> wrote:

> I have a question; if some people try to make a
> translated version of Unicode, they should contact
> all font contributors and ask for the license?
> Unicode Consortium cannot give any sublicense?
>

If you want to translate the Unicode Standard or its companion standards
(UAX, UTS, ...), then please contact the Unicode Consortium.

Thus, I guess, it would not be so irrelevant to ask
> the permission to JTC1, about the fonts used in
> ISO/IEC 10646 - although it does not mean that
> JTC1 would permit anything. If I'm misunderstanding,
> please correct me.
>

The production of the ISO 10646 standard is done by the Unicode Consortium.
I am fuzzy on what exactly that means for copyright. If you need to find
out, then please contact the consortium.

markus
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180305/1e09c8ed/attachment.html>

From unicode at unicode.org  Mon Mar  5 11:40:46 2018
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Mon, 5 Mar 2018 09:40:46 -0800
Subject: CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0
 characters now ready for adoption!)
In-Reply-To: <4f203cff3a031ab9846afe48b34f87e6@koremail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <4f203cff3a031ab9846afe48b34f87e6@koremail.com>
Message-ID: <c818db03-7149-308a-7197-35ea0c611f93@att.net>

John,

I think this may be giving the list a somewhat misleading picture of the 
actual statistics for encoding of CJK unified ideographs. The "500 
characters a year" or "1000 characters a year" limits are administrative 
limits set by the IRG for national bodies (and others) submitting 
repertoire to the "working set" that the IRG then segments into chunks 
for processing to prepare new increments for actual encoding.

In point of fact, if we take 1991 as the base year, the *average* rate 
of encoding new CJK unified ideographs now stands at 3379 per annum 
(87,860 as of Unicode 10.0). By "encoding" here, I mean, final, finished 
publication of the encoded characters -- not the larger number of 
potentially unifiable submissions that eventually go into a publication 
increment. There is a gradual downward drift in that number over time, 
because of the impact on the stats of the "big bang" encoding of 42,711 
ideographs for Extension B back in 2001, but recently, the numbers have 
been quite consistent with an average incremental rate of about 3000 new 
ideographs per year:

5762 added for Extension E in 2015

7463 added for Extension F in 2017

~ 4934 to be added for Extension G, probably to be published in 2020

If you run the average calculation including Extension G, assuming 2020, 
you end up with a cumulative per annum rate of 3200, not much different 
than the calculation done as of today.

And as for the implication that China, in particular, is somehow limited 
by these numbers, one should note that the vast majority of Extension G 
is associated with Chinese sources. Although a substantial chunk is 
formally labeled with a "UK" source this time around, almost all of 
those characters represent a roll-in of systematic simplifications, of 
various sorts, associated with PRC usage. (People who want to check can 
take a look at L2/17-366R in the UTC document registry.)

--Ken


On 3/5/2018 7:13 AM, via Unicode wrote:
> Dear All,
>
> to simplify discussion I have split the points. <unicode at unicode.org [1]

>
>>
>>>
>>>
>>> On 2018/03/01 12:31, via Unicode wrote:
>>>
>>>> Third, I cannot confirm or deny the "500 characters a year" limit, but
>>>> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real 
>>>> need
>>>> to encode more characters, everybody would find a way to handle these.
>
>
> Chinese characters for Unicode first go to IRG (or ISO/IEC 
> JTC1/SC2/WG2/IRG) website. The limit of 500 a year for China is an 
> average based on IRG #48 document regarding working set 2017 
> http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf 
> which explicitly states "each submission shall not exceed 1,000 
> characters". The People's Republic of China as one member of IRG is 
> limited to 1,000 characters, which hopefully we can all agree has a 
> population of over 1,000,000,000 , therefore was limited to submitting 
> at most 1,000 characters. The earliest possible date for the next 
> working set is two or three years later, that is 2019 or 2020, so 
> that's an average limit of either 500 or 333 characters a year.
>
> Regards
> John
>
>
>
>


From unicode at unicode.org  Mon Mar  5 11:49:17 2018
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Mon, 5 Mar 2018 09:49:17 -0800
Subject: [Unicode] Re: Fonts and font sizes used in the Unicode
In-Reply-To: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
Message-ID: <b991accc-33c4-9dbe-6830-3c5b34151ca2@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180305/53d176e3/attachment.html>

From unicode at unicode.org  Mon Mar  5 12:21:23 2018
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Mon, 5 Mar 2018 10:21:23 -0800
Subject: Translating the standard (was: Re: Fonts and font sizes used in the
 Unicode)
In-Reply-To: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
Message-ID: <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>


On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:
> I have a question; if some people try to make a
> translated version of Unicode

And to add to Asmus' response, folks on the list should understand that 
even with the best of effort, the concept of a "translated version of 
Unicode" is a near impossibility. In fairly recent times, two serious 
efforts to translate *just *the core specification -- one in Japanese, 
and a somewhat later attempt for Chinese -- crashed and burned, for a 
variety of reasons. The core specification is huge, contains a lot of 
very specific technical terminology that is difficult to translate, 
along with a large collection of script- and language-specific detail, 
also hard to translate. Worse, it keeps changing, with updates now 
coming out once every year. Some large parts are stable, but it is 
impossible to predict what sections might be impacted by the next year's 
encoding decisions.

That is not including that fact that "the Unicode Standard" now also 
includes 14 separate HTML (or XHTML) annexes, all of which are also 
moving targets, along with the UCD data files, which often contain 
important information in their headers that would also require 
translation. And then, of course, there are the 2000+ pages of the 
formatted code charts, which require highly specific and very 
complicated custom tooling and font usage to produce.

It would require a dedicated (and expensive) small army of translators, 
terminologists, editors, programmers, font designers, and project 
managers to replicate all of this into another language publication -- 
and then they would have to do it again the next year, and again the 
next year, in perpetuity. Basically, given the current situation, it 
would be a fool's errand, more likely to introduce errors and 
inconsistencies than to help anybody with actual implementation.

People who want accessibility to the Unicode Standard in other languages 
need to scale down their expectations considerably, and focus on 
preparing reasonably short and succinct introductions to the terminology 
and complexity involved in the full standard. Such projects are 
feasible. But a full translation of "the Unicode Standard" simply is not.

--Ken
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180305/7bf04eb5/attachment.html>

From unicode at unicode.org  Mon Mar  5 13:19:47 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 5 Mar 2018 20:19:47 +0100
Subject: Translating the standard (was: Re: Fonts and font sizes used in
 the Unicode)
In-Reply-To: <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
Message-ID: <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>

There's been significant efforts to "translate" or more precisely "adapt"
significant parts of the standard with good presentations in Wikipedia and
various sites for scoped topics. So there are alternate charts, and instead
of translating all, the concepts are summarized, reexplained, but still
give links to the original version in English everytime more info is needed.
All UCD files don't need to be translated, they can also be automatically
processed to generate alternate presentations or datatables in other
formats. There's no value in taking efforts to translate them manually,
it's better to develop a tool that will process them in the format users
can read.

So remove the UCD files and the tables from the count, as well as sample
code (which is jsut demontrative and uses simplified non optimal
implementation to keep this code clear). We an now have separate tools or
websites presenting them and proposing commented code which is also better
performing. We have large collections of i18n libraries that were developed
for various development platforms and usage documentation in various
languages.

The only efforts is in:
* naming characters (Wikipedia is great to distribute the effort and have
articles showing relevant collections of characters and document alternate
names or disambiguate synonyms).
* the core text of the standard (section 3 about conformance and
requirements is the first thing to adapt). There's absolutely no need
however to do that as a pure translation, it can be rewritten and presented
with the goals wanted by users. Here again Wikiepdia has done significant
efforts there, in various languages
* keeping the tools developed in the previous paragraph in sync and
conformity with the standard (sync the UCD files they use).

2018-03-05 19:21 GMT+01:00 Ken Whistler via Unicode <unicode at unicode.org>:

>
> On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:
>
> I have a question; if some people try to make a
> translated version of Unicode
>
>
> And to add to Asmus' response, folks on the list should understand that
> even with the best of effort, the concept of a "translated version of
> Unicode" is a near impossibility. In fairly recent times, two serious
> efforts to translate *just *the core specification -- one in Japanese,
> and a somewhat later attempt for Chinese -- crashed and burned, for a
> variety of reasons. The core specification is huge, contains a lot of very
> specific technical terminology that is difficult to translate, along with a
> large collection of script- and language-specific detail, also hard to
> translate. Worse, it keeps changing, with updates now coming out once every
> year. Some large parts are stable, but it is impossible to predict what
> sections might be impacted by the next year's encoding decisions.
>
> That is not including that fact that "the Unicode Standard" now also
> includes 14 separate HTML (or XHTML) annexes, all of which are also moving
> targets, along with the UCD data files, which often contain important
> information in their headers that would also require translation. And then,
> of course, there are the 2000+ pages of the formatted code charts, which
> require highly specific and very complicated custom tooling and font usage
> to produce.
>
> It would require a dedicated (and expensive) small army of translators,
> terminologists, editors, programmers, font designers, and project managers
> to replicate all of this into another language publication -- and then they
> would have to do it again the next year, and again the next year, in
> perpetuity. Basically, given the current situation, it would be a fool's
> errand, more likely to introduce errors and inconsistencies than to help
> anybody with actual implementation.
>
> People who want accessibility to the Unicode Standard in other languages
> need to scale down their expectations considerably, and focus on preparing
> reasonably short and succinct introductions to the terminology and
> complexity involved in the full standard. Such projects are feasible. But a
> full translation of "the Unicode Standard" simply is not.
>
> --Ken
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180305/6619b45a/attachment.html>

From unicode at unicode.org  Tue Mar  6 02:09:49 2018
From: unicode at unicode.org (via Unicode)
Date: Tue, 06 Mar 2018 16:09:49 +0800
Subject: CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0
 characters now ready for adoption!)
In-Reply-To: <c818db03-7149-308a-7197-35ea0c611f93@att.net>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <4f203cff3a031ab9846afe48b34f87e6@koremail.com>
 <c818db03-7149-308a-7197-35ea0c611f93@att.net>
Message-ID: <506f0e2407f87dfe5159a7a66da44e3d@koremail.com>

Dear Ken,

the context of the question was how many characters in modern use are 
being encoded. Part of the answer is that there are several thousand 
Chinese characters that are names of people on places to be encoded. The 
limit of 1,000 characters a working set per member was for workings set 
2017, this is a new thing. If the same member limit is applied to future 
working sets, then the result will be that some of these characters 
identified in 2017. Some around 500 have been included in working set 
2017. Some will be included in the following working set which will most 
likely be in 2020 and if there is then also a limit of 1,000 characters 
per member then not all would be included. That would mean some would 
have to wait until 2022 before they can be submitted to IRG, which means 
at least 2027 before they are encoded. Names of pleople and places are 
not the only CJK unified ideographs that need to be encoded but they 
illustrate the problem that if future working have a 1,000 limit per 
member which submissions every 2 or 3 years, then it delay the encoding 
on CJK unified ideographs by years.

On 06.03.2018 01:40, Ken Whistler via Unicode wrote:
> John,
>
> I think this may be giving the list a somewhat misleading picture of
> the actual statistics for encoding of CJK unified ideographs. The 
> "500
> characters a year" or "1000 characters a year" limits are
> administrative limits set by the IRG for national bodies (and others)
> submitting repertoire to the "working set" that the IRG then segments
> into chunks for processing to prepare new increments for actual
> encoding.
>

Here I was refering to the number of CJK unified ideogrpahs that the 
People's Republic of China can submit to IRG, the numbers are of course 
different for CJK  unified ideographs as a whole. A limit of 1,000 a 
working set means that the number of CJK unified ideographs in the 
People's Republic of China awaiting submission to IRG is most likely to 
increase not decreases for decades to come. For other IRG members that 
still have characters to submit a limit of 1,000 a working set most 
likely leads to a decrease in the number of CJK unified ideographs 
awaiting submission over time. In short the administrative limit of 
1,000 works to a degree for most IRG members, but not for the People's 
Republic of China.

> In point of fact, if we take 1991 as the base year, the *average*
> rate of encoding new CJK unified ideographs now stands at 3379 per
> annum (87,860 as of Unicode 10.0). By "encoding" here, I mean, final,
> finished publication of the encoded characters -- not the larger
> number of potentially unifiable submissions that eventually go into a
> publication increment. There is a gradual downward drift in that
> number over time, because of the impact on the stats of the "big 
> bang"
> encoding of 42,711 ideographs for Extension B back in 2001, but
> recently, the numbers have been quite consistent with an average
> incremental rate of about 3000 new ideographs per year:
>

1991 to 2001 70,207 that is around seven thousand a year. However 2002 
to 2018 only 17,675 so around one thousand a year

> 5762 added for Extension E in 2015
>

These 5762 were submitted to IRG in 2001, so 14 years from submission 
to encoding.

> 7463 added for Extension F in 2017
>
> ~ 4934 to be added for Extension G, probably to be published in 2020
>
> If you run the average calculation including Extension G, assuming
> 2020, you end up with a cumulative per annum rate of 3200, not much
> different than the calculation done as of today.
>
> And as for the implication that China, in particular, is somehow
> limited by these numbers, one should note that the vast majority of
> Extension G is associated with Chinese sources. Although a 
> substantial
> chunk is formally labeled with a "UK" source this time around, almost
> all of those characters represent a roll-in of systematic
> simplifications, of various sorts, associated with PRC usage. (People
> who want to check can take a look at L2/17-366R in the UTC document
> registry.)
>

Extension G was before the 1,000 character per memeber limit. Whatever 
the UK characters submitted were, the largest single Chinese source was 
in fact over one thousand Zhuang characters submitted by People's 
Republic of Chhina not "systematic simplifications". It would certainly 
be incorrect to think that the vaste majority of CJK unified ideographs 
to be encoded are "systematic simplifications".

Regards
John


> --Ken
>
>
> On 3/5/2018 7:13 AM, via Unicode wrote:
>> Dear All,
>>
>> to simplify discussion I have split the points. <unicode at unicode.org 
>> [1]
>
>>
>>>
>>>>
>>>>
>>>> On 2018/03/01 12:31, via Unicode wrote:
>>>>
>>>>> Third, I cannot confirm or deny the "500 characters a year" 
>>>>> limit, but
>>>>> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a 
>>>>> real need
>>>>> to encode more characters, everybody would find a way to handle 
>>>>> these.
>>
>>
>> Chinese characters for Unicode first go to IRG (or ISO/IEC 
>> JTC1/SC2/WG2/IRG) website. The limit of 500 a year for China is an 
>> average based on IRG #48 document regarding working set 2017 
>> http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf 
>> which explicitly states "each submission shall not exceed 1,000 
>> characters". The People's Republic of China as one member of IRG is 
>> limited to 1,000 characters, which hopefully we can all agree has a 
>> population of over 1,000,000,000 , therefore was limited to submitting 
>> at most 1,000 characters. The earliest possible date for the next 
>> working set is two or three years later, that is 2019 or 2020, so 
>> that's an average limit of either 500 or 333 characters a year.
>>
>> Regards
>> John
>>
>>
>>
>>


From unicode at unicode.org  Tue Mar  6 14:52:30 2018
From: unicode at unicode.org (=?utf-8?B?IkouwqBTLiBDaG9pIg==?= via Unicode)
Date: Tue, 06 Mar 2018 12:52:30 -0800
Subject: New default emoji presentation in CSS: non-conformance with UTR 51 by
 web browsers
Message-ID: <20EA0846-0690-475A-A8B8-B79E90E1A833@icloud.com>

The W3C CSS Working Group is continuing to work on standardizing the default emoji presentation in perhaps the most ubiquitous application of Unicode today, the world wide web. Some recent logs:

https://github.com/w3c/csswg-drafts/commit/7a5e0d702b00f8d3df5f2b43c9c65d1c2a2284f6
https://github.com/w3c/csswg-drafts/issues/2304#issuecomment-369323232 <https://github.com/w3c/csswg-drafts/issues/2304#issuecomment-369323232>
Current draft at https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc <https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc>

Currently, the CSS draft specifies three values for emoji that a web author may use to style their content: auto, text, and emoji. The auto value (which is the default) leaves emoji presentation to the discretion of the web browser and system platform itself, rather than conforming strictly to UTR 51. https://github.com/w3c/csswg-drafts/issues/1223 <https://github.com/w3c/csswg-drafts/issues/1223> proposes that a strict 

If the authors or experts of UTR 51 believe that the Emoji_presentation property is useful, then they may want to chime in at Issue w3c/csswg-drafts#1223 <https://github.com/w3c/csswg-drafts/issues/1223> with their expertise. My opinion is that standardizing the default presentation is important enough to strictly conform to UTR 51. Breakage has already occurred in the past, such as when WebKit in 2015 unexpectedly switched the default presentation of U+21A9 LEFTWARDS ARROW WITH HOOK ??? from text to emoji, which unexpectedly broke existing websites such as Daring Fireball (see https://daringfireball.net/linked/2015/04/22/unicode-emoji <https://daringfireball.net/linked/2015/04/22/unicode-emoji> and also http://mts.io/2015/04/21/unicode-symbol-render-text-emoji/ <http://mts.io/2015/04/21/unicode-symbol-render-text-emoji/>).

See also https://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0016.html <https://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0016.html> and https://github.com/w3c/csswg-drafts/issues/2138 <https://github.com/w3c/csswg-drafts/issues/2138>. To give an update on this issue: The CSS WG recently resolved to make all web browsers completely ignore BCP47?s -u- extension. If the authors/experts of the BCP47 extension believe that the extension is at all useful, they still may wish to chime in at https://github.com/w3c/csswg-drafts/issues/2138 <https://github.com/w3c/csswg-drafts/issues/2138>, but https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc <https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc> has now been updated to specify the ignoring of the BCP47 extension.

Cheers,
J. S. Choi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180306/735b4270/attachment.html>

From unicode at unicode.org  Tue Mar  6 14:57:21 2018
From: unicode at unicode.org (=?utf-8?B?IkouwqBTLiBDaG9pIg==?= via Unicode)
Date: Tue, 06 Mar 2018 12:57:21 -0800
Subject: New default emoji presentation in CSS: non-conformance with UTR 51
 by web browsers
In-Reply-To: <20EA0846-0690-475A-A8B8-B79E90E1A833@icloud.com>
References: <20EA0846-0690-475A-A8B8-B79E90E1A833@icloud.com>
Message-ID: <34A34966-58C6-4DAC-8CD5-2482D8943161@icloud.com>

Apologies for the duplicate threads; I accidentally sent the email as rich text. Here?s a version without the duplicate links.

> On Mar 6, 2018, at 12:52 PM, J. S. Choi via Unicode <unicode at unicode.org> wrote:
> 
> The W3C CSS Working Group is continuing to work on standardizing the default emoji presentation in perhaps the most ubiquitous application of Unicode today, the world wide web. Some recent logs:
> 
> https://github.com/w3c/csswg-drafts/commit/7a5e0d702b00f8d3df5f2b43c9c65d1c2a2284f6
> https://github.com/w3c/csswg-drafts/issues/2304#issuecomment-369323232
> Current draft at https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc
> 
> Currently, the CSS draft specifies three values for emoji that a web author may use to style their content: auto, text, and emoji. The auto value (which is the default) leaves emoji presentation to the discretion of the web browser and system platform itself, rather than conforming strictly to UTR 51. https://github.com/w3c/csswg-drafts/issues/1223 proposes that a strict 
> 
> If the authors or experts of UTR 51 believe that the Emoji_presentation property is useful, then they may want to chime in at Issue w3c/csswg-drafts#1223 with their expertise. My opinion is that standardizing the default presentation is important enough to strictly conform to UTR 51. Breakage has already occurred in the past, such as when WebKit in 2015 unexpectedly switched the default presentation of U+21A9 LEFTWARDS ARROW WITH HOOK ??? from text to emoji, which unexpectedly broke existing websites such as Daring Fireball (see https://daringfireball.net/linked/2015/04/22/unicode-emoji and also http://mts.io/2015/04/21/unicode-symbol-render-text-emoji/).
> 
> See also https://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0016.html and https://github.com/w3c/csswg-drafts/issues/2138. To give an update on this issue: The CSS WG recently resolved to make all web browsers completely ignore BCP47?s -u- extension. If the authors/experts of the BCP47 extension believe that the extension is at all useful, they still may wish to chime in at https://github.com/w3c/csswg-drafts/issues/2138, but https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc has now been updated to specify the ignoring of the BCP47 extension.
> 
> Cheers,
> J. S. Choi


From unicode at unicode.org  Wed Mar  7 14:26:21 2018
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Wed, 7 Mar 2018 20:26:21 +0000
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
Message-ID: <20180307202621.770d1099@JRWUBU2>

On Mon, 05 Mar 2018 23:42:15 +0800
via Unicode <unicode at unicode.org> wrote:

> In most cases the answer to the above may well be the same, the 
> unencoded names of people and places are not new names,

How many new characters are being devised per year?

Richard.

From unicode at unicode.org  Wed Mar  7 15:12:41 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 7 Mar 2018 22:12:41 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <20180307202621.770d1099@JRWUBU2>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
Message-ID: <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>

So most of the growth in Han characters is caused by people inventing and
registering new sinograms for their own names, using the basic principles
of combining a phonogram and a distinctive semantic character.
It's like if we were encoding in the UCS the personal handwritten
signatures with our own choice.
Are these worth encoding ? Why can't we just encode most of them as a
sequence (phonogram, ideogram, and combining layout character) i.e. mostly
what IDS provide, except that they are descriptive but suited for the same
purpose.

Why can't those IDS be rendered as ligatures and then have those
"characters" being in fact ligatured IDS strings ?

Shouldn't the IRG better work on providing a disctionary of IDS strings
needed for people names, then allowing font providers in China to render
them as ligatures (the "representative glyph" of these ligatures would be
the official Chinese personal record for such use, and it would be enough
for the chinese administration).

After all this is what we are already doing by encoding in Unicode various
emoji sequences (then rendered as ligatures in a much more fuzzy way !)...

Shouldn't we create a variant of IDS, using combining joiners between Han
base glyphs (then possibly augmented by variant selectors if there are
significant differences on the simplification of rendered strokes for each
component) ? What is really limiting us to do that ?


2018-03-07 21:26 GMT+01:00 Richard Wordingham via Unicode <
unicode at unicode.org>:

> On Mon, 05 Mar 2018 23:42:15 +0800
> via Unicode <unicode at unicode.org> wrote:
>
> > In most cases the answer to the above may well be the same, the
> > unencoded names of people and places are not new names,
>
> How many new characters are being devised per year?
>
> Richard.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180307/71a85b5b/attachment.html>

From unicode at unicode.org  Wed Mar  7 15:35:42 2018
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Wed, 7 Mar 2018 13:35:42 -0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
Message-ID: <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>


On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
> Shouldn't we create a variant of IDS, using combining joiners between 
> Han base glyphs (then possibly augmented by variant selectors if there 
> are significant differences on the simplification of rendered strokes 
> for each component) ? What is really limiting us to do that ?
>

Ummm.... ambiguity, lack of precision, complexity of model, pushback by 
stakeholders, likely failure of uptake by most implementers, duplication 
of representation, ...

Do you think combining models of Han weren't already thought of years 
ago? They predated the original encoding of unified CJK in Unicode in 
1992. They weren't viable then, and they aren't viable now, either, 
after 26 years of Unicode implementation of unified CJK as atomic 
ideographs.

--Ken


From unicode at unicode.org  Wed Mar  7 16:04:21 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 7 Mar 2018 23:04:21 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
 <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
Message-ID: <CAGa7JC0P0vfwMyAFpJCGdUmedWSGCPodf9nHx3vZRxpNfHfiqw@mail.gmail.com>

I'm just speaking about the many yearly inventions of sinograms for
personal/proper names, not about the ues of traditional characters for
normal language.

People just start by assembling components with common rules. Then they
enhance the produced character just like we personalize signatures. But for
me, all these look like personal signatures and are not neede for formal
encoding and even these persons will accept alternate presentations if it's
just to cite them (and would not like much that you imitate their personal
signature by standardizing it in a worldwide standard: I think many of
these encodings have severe privacy issues, possibly as well copyright
issues !).


2018-03-07 22:35 GMT+01:00 Ken Whistler <kenwhistler at att.net>:

>
>
> On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
>
>> Shouldn't we create a variant of IDS, using combining joiners between Han
>> base glyphs (then possibly augmented by variant selectors if there are
>> significant differences on the simplification of rendered strokes for each
>> component) ? What is really limiting us to do that ?
>>
>>
> Ummm.... ambiguity, lack of precision, complexity of model, pushback by
> stakeholders, likely failure of uptake by most implementers, duplication of
> representation, ...
>
> Do you think combining models of Han weren't already thought of years ago?
> They predated the original encoding of unified CJK in Unicode in 1992. They
> weren't viable then, and they aren't viable now, either, after 26 years of
> Unicode implementation of unified CJK as atomic ideographs.
>
> --Ken
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180307/51e476dc/attachment.html>

From unicode at unicode.org  Wed Mar  7 16:13:31 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 7 Mar 2018 23:13:31 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC0P0vfwMyAFpJCGdUmedWSGCPodf9nHx3vZRxpNfHfiqw@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
 <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
 <CAGa7JC0P0vfwMyAFpJCGdUmedWSGCPodf9nHx3vZRxpNfHfiqw@mail.gmail.com>
Message-ID: <CAGa7JC2bEpB1WQ7F0qZ6_xZ1wGCw1G_b-0mYvXaXNDThXFEkqw@mail.gmail.com>

Note: I don't advocate "duplicate encoding" as you think. But probably the
current IDS model is not sufficient to describe characters correctly, and
that it may be augmented a bit (using variant codes or some additional
joiners or diacritics?).

But IDS strings are suitable for rendering as ligatures and this should be
permitted, and should even be the standard way to represent personal names
without making them depend on an unproved single distinctive presentation.

E.g. someone writes his name with some personal strokes and uses it as its
registered "signature"; he is then doing business or is cited in news with
simplified presentation, and the Chinese authorities also use their own
simplications. All these will designate the same person. But who is correct
for the presentation of the character ? In my opinion it is only the person
that invented it for themselve, as a personal signature, but this is not
suitable for encoding (privacy and copyright issue). All the other
presentation are legitimate, and we don't need additional encoding for it:
the ligaturing of IDS strings is sufficient even if it does not match
exactly the person's signature.


2018-03-07 23:04 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> I'm just speaking about the many yearly inventions of sinograms for
> personal/proper names, not about the ues of traditional characters for
> normal language.
>
> People just start by assembling components with common rules. Then they
> enhance the produced character just like we personalize signatures. But for
> me, all these look like personal signatures and are not neede for formal
> encoding and even these persons will accept alternate presentations if it's
> just to cite them (and would not like much that you imitate their personal
> signature by standardizing it in a worldwide standard: I think many of
> these encodings have severe privacy issues, possibly as well copyright
> issues !).
>
>
> 2018-03-07 22:35 GMT+01:00 Ken Whistler <kenwhistler at att.net>:
>
>>
>>
>> On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
>>
>>> Shouldn't we create a variant of IDS, using combining joiners between
>>> Han base glyphs (then possibly augmented by variant selectors if there are
>>> significant differences on the simplification of rendered strokes for each
>>> component) ? What is really limiting us to do that ?
>>>
>>>
>> Ummm.... ambiguity, lack of precision, complexity of model, pushback by
>> stakeholders, likely failure of uptake by most implementers, duplication of
>> representation, ...
>>
>> Do you think combining models of Han weren't already thought of years
>> ago? They predated the original encoding of unified CJK in Unicode in 1992.
>> They weren't viable then, and they aren't viable now, either, after 26
>> years of Unicode implementation of unified CJK as atomic ideographs.
>>
>> --Ken
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180307/1daf45b3/attachment.html>

From unicode at unicode.org  Wed Mar  7 16:18:01 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 7 Mar 2018 23:18:01 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC2bEpB1WQ7F0qZ6_xZ1wGCw1G_b-0mYvXaXNDThXFEkqw@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
 <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
 <CAGa7JC0P0vfwMyAFpJCGdUmedWSGCPodf9nHx3vZRxpNfHfiqw@mail.gmail.com>
 <CAGa7JC2bEpB1WQ7F0qZ6_xZ1wGCw1G_b-0mYvXaXNDThXFEkqw@mail.gmail.com>
Message-ID: <CAGa7JC0kP9LXAxNtw==HU2d5V8ONFvZpVWjB+KvDxVLyNN2FaQ@mail.gmail.com>

Additional note: the UCS will never large enough to support the personal
signatures of billions Chinese people living today or born since
milleniums, or jsut those to be born in the next century. There's a need to
represent these names using composed strings. A reasonable
compositing/ligaturing process can then present almost all of them !


2018-03-07 23:13 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> Note: I don't advocate "duplicate encoding" as you think. But probably the
> current IDS model is not sufficient to describe characters correctly, and
> that it may be augmented a bit (using variant codes or some additional
> joiners or diacritics?).
>
> But IDS strings are suitable for rendering as ligatures and this should be
> permitted, and should even be the standard way to represent personal names
> without making them depend on an unproved single distinctive presentation.
>
> E.g. someone writes his name with some personal strokes and uses it as its
> registered "signature"; he is then doing business or is cited in news with
> simplified presentation, and the Chinese authorities also use their own
> simplications. All these will designate the same person. But who is correct
> for the presentation of the character ? In my opinion it is only the person
> that invented it for themselve, as a personal signature, but this is not
> suitable for encoding (privacy and copyright issue). All the other
> presentation are legitimate, and we don't need additional encoding for it:
> the ligaturing of IDS strings is sufficient even if it does not match
> exactly the person's signature.
>
>
> 2018-03-07 23:04 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:
>
>> I'm just speaking about the many yearly inventions of sinograms for
>> personal/proper names, not about the ues of traditional characters for
>> normal language.
>>
>> People just start by assembling components with common rules. Then they
>> enhance the produced character just like we personalize signatures. But for
>> me, all these look like personal signatures and are not neede for formal
>> encoding and even these persons will accept alternate presentations if it's
>> just to cite them (and would not like much that you imitate their personal
>> signature by standardizing it in a worldwide standard: I think many of
>> these encodings have severe privacy issues, possibly as well copyright
>> issues !).
>>
>>
>> 2018-03-07 22:35 GMT+01:00 Ken Whistler <kenwhistler at att.net>:
>>
>>>
>>>
>>> On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
>>>
>>>> Shouldn't we create a variant of IDS, using combining joiners between
>>>> Han base glyphs (then possibly augmented by variant selectors if there are
>>>> significant differences on the simplification of rendered strokes for each
>>>> component) ? What is really limiting us to do that ?
>>>>
>>>>
>>> Ummm.... ambiguity, lack of precision, complexity of model, pushback by
>>> stakeholders, likely failure of uptake by most implementers, duplication of
>>> representation, ...
>>>
>>> Do you think combining models of Han weren't already thought of years
>>> ago? They predated the original encoding of unified CJK in Unicode in 1992.
>>> They weren't viable then, and they aren't viable now, either, after 26
>>> years of Unicode implementation of unified CJK as atomic ideographs.
>>>
>>> --Ken
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180307/f08a5fae/attachment.html>

From unicode at unicode.org  Wed Mar  7 17:32:02 2018
From: unicode at unicode.org (Andrew West via Unicode)
Date: Wed, 7 Mar 2018 23:32:02 +0000
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC0kP9LXAxNtw==HU2d5V8ONFvZpVWjB+KvDxVLyNN2FaQ@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
 <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
 <CAGa7JC0P0vfwMyAFpJCGdUmedWSGCPodf9nHx3vZRxpNfHfiqw@mail.gmail.com>
 <CAGa7JC2bEpB1WQ7F0qZ6_xZ1wGCw1G_b-0mYvXaXNDThXFEkqw@mail.gmail.com>
 <CAGa7JC0kP9LXAxNtw==HU2d5V8ONFvZpVWjB+KvDxVLyNN2FaQ@mail.gmail.com>
Message-ID: <CALgEMhy+NLgk6MCVsCoP+ZaCkC9Pom+Jkmxc3nhi-re2MuuvtA@mail.gmail.com>

On 7 March 2018 at 22:18, Philippe Verdy via Unicode
<unicode at unicode.org> wrote:
>
> Additional note: the UCS will never large enough to support the personal
> signatures of billions Chinese people living today or born since milleniums,
> or jsut those to be born in the next century. There's a need to represent
> these names using composed strings. A reasonable compositing/ligaturing
> process can then present almost all of them !

CJK characters invented for writing personal names are extremely rare,
and do not constitute a significant fraction of CJK ideographs
proposed for encoding. The majority of unencoded modern-use characters
in China (that are not systematic simplified forms of existing encoded
characters) are used in place names or in Chinese dialects or for
writing non-Chinese languages such as Zhuang.

Andrew

From unicode at unicode.org  Wed Mar  7 19:27:06 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Thu, 8 Mar 2018 02:27:06 +0100 (CET)
Subject: Translating the standard (was: Re: Fonts and font sizes used in
 the Unicode)
In-Reply-To: <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
Message-ID: <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>

On?Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
?
> There's been significant efforts to "translate" or more precisely "adapt" 
> significant parts of the standard with good presentations in Wikipedia and 
> various sites for scoped topics. So there are alternate charts, and instead 
> of translating all, the concepts are summarized, reexplained, but still 
> give links to the original version in English everytime more info is needed. 

Indeed one of the best uses we can make of efforts in Unicode education is
in extending and improving the Wikipedia coverage, because this is the first
place almost everybody is going to. So if a government is considering an 
investment, donating to Wikimedia and motivating a vast community seems
a really good plan. And hiring staffers for this purpose will increase reliability
of the data (given that some corporations misuse the infrastructure for PR).

> All UCD files don't need to be translated, they can also be automatically 
> processed to generate alternate presentations or datatables in other 
> formats. There's no value in taking efforts to translate them manually, 
> it's better to develop a tool that will process them in the format users 
> can read. 

The only UCD file I?d advise to fully translate is the Nameslist as being the 
source code of the Code Charts. These are indeed indispensable because of
the glyphic information they convey, that can be found nowhere else, Hence
all good secondary sources like Wikipedia link to the Unicode Charts,
The NamesList per se is useful also in that it provides a minimal amount of
information about the characters. But it lacks important hints about bidi?mirroring,
that should be compiled from yet another UCD file. The downside of generating
a holistic view is that it generally ends up in an atomic view as on a per?character
basis. Though anyway it?s up to the user to gather an overview tailored for his/her
needs. This is catered for by Chinese and Japanese versions of sites such as
www.fileformat.info.

[?]
> The only efforts is in: 
> * naming characters (Wikipedia is great to distribute the effort and have 
> articles showing relevant collections of characters and document alternate 
> names or disambiguate synonyms). 

Naming characters is a real challenge and is often running into multiple issues.
First we need to make clear for who the localization is intended: technical people
or UIs. It happened that a literal translation tuned in accordance with specialists
was then handed out to the industry for showing up on everyone?s computer,
while some core characters of the intended locale are named differently in real
life, so that students don?t encounter what they have learned at school. 
And the worst thing is that once a translation is released, image considerations
lead to seek stability even where no Unicode (ISO) policy is preventing updates.

> * the core text of the standard (section 3 about conformance and 
> requirements is the first thing to adapt). There's absolutely no need 
> however to do that as a pure translation, it can be rewritten and presented 
> with the goals wanted by users. Here again Wikiepdia has done significant 
> efforts there, in various languages 
> * keeping the tools developed in the previous paragraph in sync and 
> conformity with the standard (sync the UCD files they use). ?

Yes the biggest issue over time, as Ken wrote, is to *maintain* a translation, 
be it only the Nameslist.


Marcel


From unicode at unicode.org  Wed Mar  7 19:42:38 2018
From: unicode at unicode.org (via Unicode)
Date: Thu, 08 Mar 2018 09:42:38 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <20180307202621.770d1099@JRWUBU2>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
Message-ID: <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>

Dear Richard,

to the best of my knowledge virtually no new characters used just for 
names are under consideration, all the ones that are under consideration 
are from before this century. Some are only being submitted now, but 
that does not mean they are new in real life, just new to Unicode. Place 
names tend to be even older.

Regards
John

On 08.03.2018 04:26, Richard Wordingham via Unicode wrote:
> On Mon, 05 Mar 2018 23:42:15 +0800
> via Unicode <unicode at unicode.org> wrote:
>
>> In most cases the answer to the above may well be the same, the
>> unencoded names of people and places are not new names,
>
> How many new characters are being devised per year?
>
> Richard.


From unicode at unicode.org  Wed Mar  7 20:13:36 2018
From: unicode at unicode.org (via Unicode)
Date: Thu, 08 Mar 2018 10:13:36 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
Message-ID: <66d5823c8a89c7d6c96a39bf83adf766@koremail.com>

Dear Phillip

On 08.03.2018 05:12, Philippe Verdy via Unicode wrote:
> So most of the growth in Han characters is caused by people inventing
> and registering new sinograms for their own names, using the basic
> principles of combining a phonogram and a distinctive semantic
> character.

This is not correct. It is certainly not correct for CJK characrters 
added to Unicode, and to the best of my knowledge it one just makes up a 
new character for one's name it is now no longer possible to legally 
register it anywhere that uses Chinese characters. Take Extension F, 
over seven thousand characters of which nearly three thousand Japanese 
characters in Budhist texts, over one thousand Zhuang characters, naerly 
two thousand characters used in Korean historical texts.

Regards
John

From unicode at unicode.org  Wed Mar  7 20:32:27 2018
From: unicode at unicode.org (via Unicode)
Date: Thu, 08 Mar 2018 10:32:27 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC0kP9LXAxNtw==HU2d5V8ONFvZpVWjB+KvDxVLyNN2FaQ@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <CAGa7JC1xD-vmZAnwGVbQ0ZKT4KPoC=t8AzGdmdKO9s1scXPfEg@mail.gmail.com>
 <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
 <CAGa7JC0P0vfwMyAFpJCGdUmedWSGCPodf9nHx3vZRxpNfHfiqw@mail.gmail.com>
 <CAGa7JC2bEpB1WQ7F0qZ6_xZ1wGCw1G_b-0mYvXaXNDThXFEkqw@mail.gmail.com>
 <CAGa7JC0kP9LXAxNtw==HU2d5V8ONFvZpVWjB+KvDxVLyNN2FaQ@mail.gmail.com>
Message-ID: <6802900325b925e037f79746fc4b5b25@koremail.com>

On 08.03.2018 06:18, Philippe Verdy via Unicode wrote:
> Additional note: the UCS will never large enough to support the
> personal signatures of billions Chinese people living today or born
> since milleniums, or jsut those to be born in the next century. 
> Theres
> a need to represent these names using composed strings. A reasonable
> compositing/ligaturing process can then present almost all of them !
>

There is no such need, Chinese names are not formed in this way, if one 
just makes up a character how would others be able to read it, slight 
variants that add style to a character do not in Unicode count as new 
characters. Furthermore with government records in all computerised the 
are now strict rules on babies names in People's Reepulic of China, 
Taiwan, etc that prevent one making up new characters for names.

Whilst there are maybe a few thousand name CJK unified ideographs to 
add to UCS, there are tens of thousands of non-name CJK unified 
ideographs yet to be added.


Regards
John

From unicode at unicode.org  Thu Mar  8 02:04:40 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Thu, 8 Mar 2018 09:04:40 +0100 (CET)
Subject: Translating the standard (was: Re: Fonts and font sizes used in
 the Unicode)
In-Reply-To: <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
Message-ID: <2018652387.2179.1520496281199.JavaMail.www@wwinf1m17>

On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
[?]
> * the core text of the standard (section 3 about conformance and requirements is the first thing to adapt).
> There's absolutely no need however to do that as a pure translation, it can be rewritten and presented
> with the goals wanted by users. Here again Wikiepdia has done significant efforts there, in various languages

I don?t think there is a potential to rewrite the core specs if the goal is making an abstract, given that 
the original authors already made efforts to keep the language simple. Whenever the goal is to add
information, by contrast, e.g. about (yet) non?standard use of superscripts in Latin text, then the added
value ? clearly tagged as such ? will reward the effort.

A big part of the core spec is made of script?specific introductions designed to be balanced and handy.
Hence part of the information is provided only in the code charts, some in the annexes. Compiling it all 
and writing up more detailed articles is indeed much more interesting for readers focussing on a script.

Best regards,

Marcel


From unicode at unicode.org  Thu Mar  8 02:25:25 2018
From: unicode at unicode.org (fantasai via Unicode)
Date: Thu, 8 Mar 2018 17:25:25 +0900
Subject: Sentence_Break, Semi-colons, and Apparent Miscategorization
Message-ID: <a1344ff7-d3a4-80f3-03c1-4d88b3a70ad3@inkedblade.net>

Given that the comma and colon are categorized as SContinue,
why is the semicolon also not SContinue?

Also, why is the Greek Question Mark not categorized with
the rest of the question marks?

Why aren't the vertical presentation forms categorized with
the things they are presenting?

Thanks~
~fantasai

From unicode at unicode.org  Thu Mar  8 03:03:28 2018
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Thu, 8 Mar 2018 09:03:28 +0000
Subject: Translating the standard (was: Re: Fonts and font sizes used in
 the Unicode)
In-Reply-To: <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
Message-ID: <20180308090328.336a734f@JRWUBU2>

On Thu, 8 Mar 2018 02:27:06 +0100 (CET)
Marcel Schneider via Unicode <unicode at unicode.org> wrote:

> Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> translation, be it only the Nameslist.

For which accurately determined change bars can work wonders.  An
alternative would be paragraph identification and a list of changed
paragraphs.  The section number in TUS is too coarse for giving text
locations, and page numbers are inherently changeable.

Richard.


From unicode at unicode.org  Thu Mar  8 08:18:19 2018
From: unicode at unicode.org (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?= via Unicode)
Date: Thu, 8 Mar 2018 15:18:19 +0100
Subject: metric for block coverage
In-Reply-To: <20180217221825.wovnzpnzftpsjp37@angband.pl>
References: <20180217221825.wovnzpnzftpsjp37@angband.pl>
Message-ID: <81d9e511-f33e-c0ff-39a0-9b9ecbdc937b@gmail.com>


Hi !

 ?? I?ll just add two points to the various points raised in the 
previous conversation about block coverage :


Le 17/02/2018 ? 23:18, Adam Borowski via Unicode a ?crit?:
> Hi!
> As a part of Debian fonts team work, we're trying to improve fonts review:
> ways to organize them, add metadata, pick which fonts are installed by
> default and/or recommended to users, etc.
>
> I'm looking for a way to determine a font's coverage of available scripts.
> It's probably reasonable to do this per Unicode block.  [...]
>
> A na?ve way would be to count codepoints present in the font vs the number
> of all codepoints in the block.  Alas, there's way too much chaff for such
> an approach to be reasonable: ? or ? count the same as LATIN TURNED CAPITAL
> LETTER SAMPI WITH HORNS AND TAIL WITH SMALL LETTER X WITH CARON.
A slightly less na?ve way would be to take care of when the code-points 
ere added to Unicode, with the rough idea that the most widespread use 
characters were added first. It also adds the nice feature that this 
metric is less ambiguous for the blocks which are not yet completed.

For example, if you have a 100% coverage of
Armenian for Unicode 10.0 (which I?ll call Armenian10.0 for short), it 
only implies a coverage of 89/91=97.8% of Armenian11.0, which will see 
the addition of two characters used in Armenian dialectology (ARMENIAN 
SMALL LETTER TURNED AYB and YI WITH STROKE).
If you look at the history of Armenian Block (e.g. here 
https://en.wikipedia.org/wiki/Armenian_(Unicode_block)),
Most (84) characters where added in 1.0, A ligature was added in 1.0, 
ARMENIAN HYPHEN was added in 3.0, a currency symbol in 6.1, two 
decorative symbols in 7.0 and two characters used in dialectology are 
planned in 11.0. I guess this roughly correspond to a ranking of the 
characters from the most used to the least used.


To take your examples, both ? and ? are in unicode since 1.1 (and, I 
guess 1.0), while LATIN TURNED CAPITAL
LETTER SAMPI WITH HORNS AND TAIL WITH SMALL LETTER X WITH CARON is not 
encoded yet, so,they are not the same according to this metric...? To 
know what this means for othe Latin example, you can watch the Latin 
Extende-D block (history here 
https://en.wikipedia.org/wiki/Latin_Extended-D ) whith new characters in 
5.0, 5.1, 6.1, 7.0, 8.0, 9.0 and some accepted for 11.0 (SMALL CAPITAL 
Q, CAPITAL/SMALL LETTER U WITH STROKE), and later (15, for? Egyptology, 
Assyriology, medieval English and historical Pinyin)

Of course, this measure is only rough. A counter example is in the 
monetary symbol block, where ? U+20AC EURO SIGN (in Unicode since 2.1) 
is much more used than ? U+20A3 FRENCH FRANC SIGN encode since Unicode 
1.1 (1.0?) but that I never saw, despite living in France for more than 
four decades.
> [...]

> I don't think I'm the first to have this question.  Any suggestions?

For the Han (CJK) script, the IRG (Ideographic Rapporteur Group) defined 
a set of less than 10k essential Han characters, IICore (International 
Ideographs Core, 
https://en.wikipedia.org/wiki/International_Ideographs_Core). This is 
described in the Unihan database in the Unihan_IRGSources.txt file, 
kIICore field (https://www.unicode.org/reports/tr38/#kIICore ). This 
field also includes a letter (A,B or C) indicating a priority value and 
some regional information. For Unicode 10.0, a simple grep tells that 
there are 9810 IICore characters, 7772 of hitch pritority A, 417 
priority B and 1621 priority C.

Note that IICore has been stable (as version 2.2) since 2004, but Ken 
Lunde, from Adobe, has recently proposed an update to it 
(https://www.unicode.org/L2/L2018/18066-iicore-changes.pdf), but only in 
the region tags, neither on the priorities nor on the list of 
characters. However, reading the associated blog post of Ken Lunde, it 
seems a few characters could be added to IICore in the future.

 ?? Cheers,

 ??? ??? ??? French

From unicode at unicode.org  Thu Mar  8 08:19:24 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Thu, 8 Mar 2018 15:19:24 +0100 (CET)
Subject: Translating the standard (was: Re: Fonts and font sizes used in
 the Unicode)
Message-ID: <1023307540.11630.1520518764800.JavaMail.www@wwinf1m17>

On Thu, 8 Mar 2018 09:03:28 +0000, Richard Wordingham via Unicode wrote:
> 
> > Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> > translation, be it only the Nameslist.
> 
> For which accurately determined change bars can work wonders. An
> alternative would be paragraph identification and a list of changed
> paragraphs. The section number in TUS is too coarse for giving text
> locations, and page numbers are inherently changeable.

Adobe Illustrator doesn?t seem to support purple numbers, and Adobe Reader
seems unable to accept input of bookmarks as a go?to feature (while that must
be proper to Acrobat). Word is reported not to add lasting change bars in an 
automated way. But all that can be done in HTML ? which is not the format 
of The Unicode Standard, whose web bookmarks are fortunately published in 
separate collections. When UAXes are updated, an intermediate revision has 
all changes highlighted and remains available online. We can see delta charts 
with all changes highlighted, in PDF. Why did the Core Specification not come 
into the benefit of these facilities?

Has this already been submitted as formal feedback? 
(UTC is known for not considering feedback that has not been submitted via
the Contact form or docsubmit at unicode.org, and Mailing lists have explicit 
caveats.)

Best regards,

Marcel


From unicode at unicode.org  Thu Mar  8 09:04:44 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Thu, 8 Mar 2018 16:04:44 +0100
Subject: Sentence_Break, Semi-colons, and Apparent Miscategorization
In-Reply-To: <a1344ff7-d3a4-80f3-03c1-4d88b3a70ad3@inkedblade.net>
References: <a1344ff7-d3a4-80f3-03c1-4d88b3a70ad3@inkedblade.net>
Message-ID: <CAJ2xs_FfWHuAeqUDoDYX-82y1HX2agP8dq5D-i743AP-P6j5mQ@mail.gmail.com>

>From the first line, I guess you mean that all three questions are having
to do with the Sentence_Break property values. Namely:

http://www.unicode.org/reports/tr29/proposed.html#Table_Sentence_Break_Property_Values
http://www.unicode.org/reports/tr29/proposed.html#SContinue

Mark

On Thu, Mar 8, 2018 at 9:25 AM, fantasai via Unicode <unicode at unicode.org>
wrote:

> Given that the comma and colon are categorized as SContinue,
> why is the semicolon also not SContinue?


> Also, why is the Greek Question Mark not categorized with
> the rest of the question marks?
>

?As I recall
?,?
?both are
 because the semicolon can also represent a greek question mark (they are
canonically equivalent
?, so you can't reliably distinguish between them
).?

?BTW, here is a table of property differences for codepoint X, toNfc(X) (if
a single character) and toNfkc(X) (again, if a single character).

https://docs.google.com/spreadsheets/d/1ZExxhAujA8kX42F8KBK3okX_So7Dt5YZvyanL8dH8tM/edit#gid=0

It was a quick dump so no guarantees that all the dots are crossed. It
skips comparing properties that are purposefully different across NFC (like
Decomposition_Mapping) or different code points (like Name or Block), and
most CJK properties (ones starting with 'k').


> Why aren't the vertical presentation forms categorized with
> the things they are presenting?
>

?At least some of them are:
U+FE10 ( ? ) PRESENTATION FORM FOR VERTICAL COMMA
U+FE11 ( ? ) PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
U+FE13 ( ? ) PRESENTATION FORM FOR VERTICAL COLON
U+FE31 ( ? ) PRESENTATION FORM FOR VERTICAL EM DASH
U+FE32 ( ? ) PRESENTATION FORM FOR VERTICAL EN DASH
?

>
> Thanks~
> ~fantasai
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180308/0828d6fc/attachment.html>

From unicode at unicode.org  Thu Mar  8 03:25:53 2018
From: unicode at unicode.org (Elsebeth Flarup via Unicode)
Date: Thu, 08 Mar 2018 04:25:53 -0500
Subject: Translating the standard (was: Re: Fonts and font sizes used in
 the Unicode)
In-Reply-To: <20180308090328.336a734f@JRWUBU2>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
Message-ID: <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>

For a number of reasons I think translating the standard is a really bad idea.

As long as there are people interested in maintaining the translation, identifying deltas and easily translating just the deltas would NOT be difficult, however. Modern computer aided translation tools all use translation memories that automatically translate already translated segments and present only new/changed segments to the translator. No need for change bars etc. 

This assumes that somebody would have stewardship of the translation memory, that the people doing the translation would be willing to/capable of using the CAT tools, etc., but the technical translation technology is available to make this part of the equation not much of an issue.

There are other reasons to not do this.

Elsebeth


??

??????? Original Message ???????

On March 8, 2018 10:03 AM, Richard Wordingham via Unicode <unicode at unicode.org> wrote:

> ??
> 
> On Thu, 8 Mar 2018 02:27:06 +0100 (CET)
> 
> Marcel Schneider via Unicode unicode at unicode.org wrote:
> 
> > Yes the biggest issue over time, as Ken wrote, is to maintain a
> > 
> > translation, be it only the Nameslist.
> 
> For which accurately determined change bars can work wonders. An
> 
> alternative would be paragraph identification and a list of changed
> 
> paragraphs. The section number in TUS is too coarse for giving text
> 
> locations, and page numbers are inherently changeable.
> 
> Richard.


From unicode at unicode.org  Thu Mar  8 12:05:06 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Thu, 8 Mar 2018 19:05:06 +0100 (CET)
Subject: Translating the standard (was: Re: Fonts and font sizes used in
 the Unicode)
In-Reply-To: <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
Message-ID: <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>

On Thu, 08 Mar 2018 04:25:53 -0500, Elsebeth Flarup via Unicode wrote:
> 
> For a number of reasons I think translating the standard is a really bad idea.
> 
[?]
> 
> There are other reasons to not do this.

I assume that the reasons you are thinking of, are congruent with those that 
Ken already explained in detail in:

http://www.unicode.org/mail-arch/unicode-ml/y2018-m03/0025.html

And I think with Ken that the idea in itself isn?t bad as such, but that it is not 
feasible any longer. Everybody (supposedly) knows that the Core Spec has 
really been translated, published in a print edition, scanned into Google Books,
and is still for sale:

https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1206989878&sr=8-1

https://books.google.fr/books?
id=GgbWZNTRncsC&printsec=frontcover&dq=Andries+Patrick&hl=fr&sa=X&ved=0ahUKEwis59Cwp93ZAhUF6RQKHZ1GBlIQ6AEIKjAA#v=onepage&q
=Andries%20Patrick&f=false

OK, the version number was only half the actual one.

Best regards,

Marcel


From unicode at unicode.org  Thu Mar  8 12:27:47 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Thu, 8 Mar 2018 19:27:47 +0100
Subject: metric for block coverage
In-Reply-To: <81d9e511-f33e-c0ff-39a0-9b9ecbdc937b@gmail.com>
References: <20180217221825.wovnzpnzftpsjp37@angband.pl>
 <81d9e511-f33e-c0ff-39a0-9b9ecbdc937b@gmail.com>
Message-ID: <CAGa7JC2YdDVat4Fc4z8m-wDckUyNm6xkXHLdU+_6r9o+6Dt8ew@mail.gmail.com>

2018-03-08 15:18 GMT+01:00 Fr?d?ric Grosshans via Unicode <
unicode at unicode.org>:

> Le 17/02/2018 ? 23:18, Adam Borowski via Unicode a ?crit :
>
>> Of course, this measure is only rough. A counter example is in the
>> monetary symbol block, where ? U+20AC EURO SIGN (in Unicode since 2.1) is
>> much more used than ? U+20A3 FRENCH FRANC SIGN encode since Unicode 1.1
>> (1.0?) but that I never saw, despite living in France for more than four
>> decades.
>
> I actually saw a French franc symbol (not necessarily this one, most often
a narrowed version of the "Fr." abbreviation) only on mechanical
typewriters built in the 1960-1970's, and some IBM typewriter "balls" in
the early 1980's  also on some old printers with rotating wheels, This
narrow abbreviation was used by typists in accounting and administrative
services, typically in tabular data. It was also seen sometimes for
indicating the pricing on newspapers/magazines (but not sure it was really
a single character, as they were used along with non-monospaced fonts, and
was probably only using smaller narrow font styles, without any ligature).

I wonder if this symbol was not just outside of France, or in former
colonies before the 1960's (or created later to distinguish the French
Franc from the CFA Franc).

May be it has some use today in Africa as an abbreviation of the CFA (now
pegged to the Euro via agreement with Banque de France and the European
Commission for the amount of warranties, collected by CFA members and
France, and needed to offer this limited warranty of conversion with the
Euro on a limited exchange market subject to more restrictive policies),
but the two CFA currencies (of the BEAC or BCAO) are not fractions of Euro

The CFA-EUR conversion rates are not stable and subject to scheduled
changes ,by agreements between CFA members, France and the European
commission. And because they are not "liquid" currencies (subject to
restrictive conversions and controls), their rates against the Euro on open
markets varies constantly (but modestly) around the current designated
value decided by CFA banks members and partners: the pegged value is then
only indicative of the medium rate it should have on markets. For this
reason, most international payments and contracts are made in more liquid
major currencies (EUR, GBP, USD, CHF, ZAR, DTS, and gold ounces) but at
much more variable rates (and with higher transaction fees on open markets
than conversions between major currencies).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180308/16560d6b/attachment.html>

From unicode at unicode.org  Thu Mar  8 17:06:03 2018
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Thu, 8 Mar 2018 23:06:03 +0000
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
Message-ID: <20180308230603.3fb73ef6@JRWUBU2>

On Thu, 08 Mar 2018 09:42:38 +0800
via Unicode <unicode at unicode.org> wrote:

> to the best of my knowledge virtually no new characters used just for 
> names are under consideration, all the ones that are under
> consideration are from before this century.

What I was interested in was the rate of generation of new
CJK characters in general, not just those for names.  I appreciate that
encoding is dominated by the backlog of older characters.

Richard.

From unicode at unicode.org  Thu Mar  8 19:17:32 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Fri, 9 Mar 2018 02:17:32 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <20180308230603.3fb73ef6@JRWUBU2>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
 <20180308230603.3fb73ef6@JRWUBU2>
Message-ID: <CAGa7JC3HM6HxEHHJ=JSee1++x-PvvfUV9watZXBNF-6DCePsKg@mail.gmail.com>

This still leaves the question about how to write personal names !
IDS alone cannot represent them without enabling some "reasonable"
ligaturing (they don't have to match the exact strokes variants for optimal
placement, or with all possible simplifications).
I'm curious to know how China, Taiwan, Singapore or Japan handle this (for
official records or in banks): like our personal signatures (as digital
images), and then using a simplified official record (including the
registration of romanized names)?

2018-03-09 0:06 GMT+01:00 Richard Wordingham via Unicode <
unicode at unicode.org>:

> On Thu, 08 Mar 2018 09:42:38 +0800
> via Unicode <unicode at unicode.org> wrote:
>
> > to the best of my knowledge virtually no new characters used just for
> > names are under consideration, all the ones that are under
> > consideration are from before this century.
>
> What I was interested in was the rate of generation of new
> CJK characters in general, not just those for names.  I appreciate that
> encoding is dominated by the backlog of older characters.
>
> Richard.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/a79188ec/attachment.html>

From unicode at unicode.org  Thu Mar  8 19:22:47 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Fri, 9 Mar 2018 02:22:47 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC3HM6HxEHHJ=JSee1++x-PvvfUV9watZXBNF-6DCePsKg@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
 <20180308230603.3fb73ef6@JRWUBU2>
 <CAGa7JC3HM6HxEHHJ=JSee1++x-PvvfUV9watZXBNF-6DCePsKg@mail.gmail.com>
Message-ID: <CAGa7JC3a-sO6w_uX5qwkT9qY_9UJne+kf9owC2p7UH56VcXctw@mail.gmail.com>

As well how Chinese/Japanese post offices handle addresses written with
sinograms for personal names ? Is the expanded IDS form acceptable for
them, or do they require using Romanized addresses, or phonetic
approximations (Bopomofo in China, Kanas in Japan, Hangul in Korea) ?

2018-03-09 2:17 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> This still leaves the question about how to write personal names !
> IDS alone cannot represent them without enabling some "reasonable"
> ligaturing (they don't have to match the exact strokes variants for optimal
> placement, or with all possible simplifications).
> I'm curious to know how China, Taiwan, Singapore or Japan handle this (for
> official records or in banks): like our personal signatures (as digital
> images), and then using a simplified official record (including the
> registration of romanized names)?
>
> 2018-03-09 0:06 GMT+01:00 Richard Wordingham via Unicode <
> unicode at unicode.org>:
>
>> On Thu, 08 Mar 2018 09:42:38 +0800
>> via Unicode <unicode at unicode.org> wrote:
>>
>> > to the best of my knowledge virtually no new characters used just for
>> > names are under consideration, all the ones that are under
>> > consideration are from before this century.
>>
>> What I was interested in was the rate of generation of new
>> CJK characters in general, not just those for names.  I appreciate that
>> encoding is dominated by the backlog of older characters.
>>
>> Richard.
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/1a6d07d2/attachment.html>

From unicode at unicode.org  Fri Mar  9 04:48:04 2018
From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode)
Date: Fri, 9 Mar 2018 19:48:04 +0900
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC3HM6HxEHHJ=JSee1++x-PvvfUV9watZXBNF-6DCePsKg@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
 <20180308230603.3fb73ef6@JRWUBU2>
 <CAGa7JC3HM6HxEHHJ=JSee1++x-PvvfUV9watZXBNF-6DCePsKg@mail.gmail.com>
Message-ID: <28cece14-67ee-b0e5-52f6-34147e13c50a@it.aoyama.ac.jp>

On 2018/03/09 10:17, Philippe Verdy via Unicode wrote:
> This still leaves the question about how to write personal names !
> IDS alone cannot represent them without enabling some "reasonable"
> ligaturing (they don't have to match the exact strokes variants for optimal
> placement, or with all possible simplifications).
> I'm curious to know how China, Taiwan, Singapore or Japan handle this (for
> official records or in banks): like our personal signatures (as digital
> images), and then using a simplified official record (including the
> registration of romanized names)?

This question seems to assume more of a difference between alphabetic 
and ideographic traditions. A name in ideographs, in the same way as a 
name in alphabetic characters, is defined by the characters that are 
used, not by stuff like stroke variants, etc. And virtually all names, 
even before the introduction of computers, and even more after that, use 
reasonably frequent characters.

The difference, at least in Japan, is that some people keep the 
ideograph before simplification in their official records, but they may 
or may not insist on its use in everyday practice. In most cases, both a 
traditional and a simplified variant are available. Examples are ?/?, 
?/?, ?/?, and so on. I regularly hit such cases when grading, because 
our university database uses the formal (old) one, where students may 
not care about it and enter the new one on some system where they have 
to enter their name by themselves.

Apart from that, at least in Japan, signatures are used extremely 
rarely; it's mostly stamped seals, which are also kept as images by 
banks,...

Regards,   Martin.


From unicode at unicode.org  Fri Mar  9 04:54:18 2018
From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode)
Date: Fri, 9 Mar 2018 19:54:18 +0900
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC3a-sO6w_uX5qwkT9qY_9UJne+kf9owC2p7UH56VcXctw@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
 <20180308230603.3fb73ef6@JRWUBU2>
 <CAGa7JC3HM6HxEHHJ=JSee1++x-PvvfUV9watZXBNF-6DCePsKg@mail.gmail.com>
 <CAGa7JC3a-sO6w_uX5qwkT9qY_9UJne+kf9owC2p7UH56VcXctw@mail.gmail.com>
Message-ID: <c33e50d2-cb12-1b16-8c82-e81ba1cc3f48@it.aoyama.ac.jp>

On 2018/03/09 10:22, Philippe Verdy via Unicode wrote:
> As well how Chinese/Japanese post offices handle addresses written with
> sinograms for personal names ? Is the expanded IDS form acceptable for
> them, or do they require using Romanized addresses, or phonetic
> approximations (Bopomofo in China, Kanas in Japan, Hangul in Korea) ?

They just see the printed form, not an encoding, and therefore no IDS. 
Many addresses use handwriting, which has its own variability. 
Variations such as those covered by IDSes are easily recognizable by 
people as being the same as the 'base' character, and OCR systems, if 
they are good enough to decipher handwriting, can handle such cases, 
too. Romanized addresses will be delivered because otherwise it would be 
difficult for foreigners to send anything. Pure Kana should work in 
Japan, although the postal employee will have a second look because it's 
extremely unusual. For Korea, these days, it will be mostly Hangul; I'm 
not sure whether addresses with Hanja would incur a delay. My guess 
would be that Bopomofo wouldn't work in mainland China (might work in 
Taiwan, not sure).

Regards,   Martin.

From unicode at unicode.org  Fri Mar  9 05:09:27 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Fri, 9 Mar 2018 12:09:27 +0100
Subject: A sketch with the best-known Swiss tongue twister
Message-ID: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>

https://www.youtube.com/watch?v=QOwITNazUKg

De Papscht h?t z?Schpi?z s?Schp?kchbschtekch z?schpaat bschtellt.
literally: The Pope has [in Spiez] [the bacon cutlery] [too late] ordered.

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/7fe2cbc0/attachment.html>

From unicode at unicode.org  Fri Mar  9 05:52:33 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Fri, 9 Mar 2018 12:52:33 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
Message-ID: <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>

Is that just for Switzerland in one of the local dialectal variants ? Or
more generally Alemannic (also in Northeastern France, South Germany,
Western Austria, Liechtenstein, Northern Italy).

2018-03-09 12:09 GMT+01:00 Mark Davis ?? via Unicode <unicode at unicode.org>:

> https://www.youtube.com/watch?v=QOwITNazUKg
>
> De Papscht h?t z?Schpi?z s?Schp?kchbschtekch z?schpaat bschtellt.
> literally: The Pope has [in Spiez] [the bacon cutlery] [too late] ordered.
>
> Mark
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/57acc0b5/attachment.html>

From unicode at unicode.org  Fri Mar  9 06:23:23 2018
From: unicode at unicode.org (Otto Stolz via Unicode)
Date: Fri, 9 Mar 2018 13:23:23 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
Message-ID: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de>

2018-03-09 12:09 GMT+01:00 Mark Davis ?? via Unicode
<unicode at unicode.org <mailto:unicode at unicode.org>:
>     De Papscht h?t z?Schpi?z s?Schp?kchbschtekch z?schpaat bschtellt.
>     literally: The Pope has [in Spiez] [the bacon cutlery] [too late]
>     ordered.

Am 2018-03-09 um 12:52 schrieb Philippe Verdy via Unicode:
> Is that just for Switzerland in one of the local dialectal variants ?

Basically the same in Central Swabian (I am from Stuttgart):
   I m?en, mir h?bet s Sp?tzles-Bsteck z sp?t bstellt.
   literally: I guess, we have ordered the noodle cutlery too late.

And when my niece married a guy with the Polish surname Brzeczek
and had asked for cutlery for their wedding present, guess what we
have told them. ?

Otto

Solution:
   Zerst hemmer denkt, mir h?bet f?r die Brzeczeks s Bsteck
   z sp?t bstellt, aber n? h?ts doch no glangt.

From unicode at unicode.org  Fri Mar  9 06:24:13 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Fri, 9 Mar 2018 13:24:13 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
Message-ID: <CAJ2xs_GS_mLpzRQd9mQyiBb=Fr5S_5nAC7g=4LoNfL5kV_EpRg@mail.gmail.com>

There are definitely many dialects across Switzerland. I think that for
*this* phrase it would be roughly the same for most of the population, with
minor differences (eg 'het' vs 'h?t'). But a native speaker like Martin
would be able to say for sure.

Mark

On Fri, Mar 9, 2018 at 12:52 PM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> Is that just for Switzerland in one of the local dialectal variants ? Or
> more generally Alemannic (also in Northeastern France, South Germany,
> Western Austria, Liechtenstein, Northern Italy).
>
> 2018-03-09 12:09 GMT+01:00 Mark Davis ?? via Unicode <unicode at unicode.org>
> :
>
>> https://www.youtube.com/watch?v=QOwITNazUKg
>>
>> De Papscht h?t z?Schpi?z s?Schp?kchbschtekch z?schpaat bschtellt.
>> literally: The Pope has [in Spiez] [the bacon cutlery] [too late] ordered.
>>
>> Mark
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/f0490712/attachment.html>

From unicode at unicode.org  Fri Mar  9 06:52:54 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Fri, 9 Mar 2018 13:52:54 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
 <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de>
Message-ID: <CAGa7JC067oRNWdbUjZ=iPqb+Cs3f4XBAkL66EvUx+X9438cUWg@mail.gmail.com>

So the "best-known Swiss tongue" is still not so much known, and still
incorrectly referenced (frequently confused with "Swiss German", which is
much like standard High German, unifying with it on most aspects, with only
minor orthographic preferences such as capitalization rules or very few
Swiss-specific terms, but no alteration of the grammar and no specific
characters like in Alemanic dialects; the term "Swiss tongue" in the
context given by the video is obviously false).
Note tht Schw?bisch is way far from it. What looks more like the Swiss
dialects of Alemanic if French Alsatian, it is not "Swiss", and don't tell
Alsatians that this is "German" when there are clear differences with the
language on the other side of the Rhine River, and lot of differences with
Schw?bish (which is much more a distinct language than a dialect of
Alemannic or German). Same remark about Tyrol and Bavarian (they are
probably nearer from Schw?bish than Swiss or French Alemannic, or than
Standard High German; their difference with Schw?bish is almost like the
difference between Standard Dutch and Limburgish or West Fl?misch; Standard
Dutch, Standard German, French/Swiss Alemanic, and Schw?bisch are enough
differentiated to be distinct languages). The term "Alemannic" is way too
large, but calling it "Swiss German" is also wrong (even if its ISO 639-3
code is "gsw", probably taken from this incorrect name).

2018-03-09 13:23 GMT+01:00 Otto Stolz via Unicode <unicode at unicode.org>:

> 2018-03-09 12:09 GMT+01:00 Mark Davis ?? via Unicode
> <unicode at unicode.org <mailto:unicode at unicode.org>:
>
>>     De Papscht h?t z?Schpi?z s?Schp?kchbschtekch z?schpaat bschtellt.
>>     literally: The Pope has [in Spiez] [the bacon cutlery] [too late]
>>     ordered.
>>
>
> Am 2018-03-09 um 12:52 schrieb Philippe Verdy via Unicode:
>
>> Is that just for Switzerland in one of the local dialectal variants ?
>>
>
> Basically the same in Central Swabian (I am from Stuttgart):
>   I m?en, mir h?bet s Sp?tzles-Bsteck z sp?t bstellt.
>   literally: I guess, we have ordered the noodle cutlery too late.
>
> And when my niece married a guy with the Polish surname Brzeczek
> and had asked for cutlery for their wedding present, guess what we
> have told them. ?
>
> Otto
>
> Solution:
>   Zerst hemmer denkt, mir h?bet f?r die Brzeczeks s Bsteck
>   z sp?t bstellt, aber n? h?ts doch no glangt.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/3f033630/attachment.html>

From unicode at unicode.org  Fri Mar  9 07:40:08 2018
From: unicode at unicode.org (Tom Gewecke via Unicode)
Date: Fri, 9 Mar 2018 06:40:08 -0700
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <CAGa7JC067oRNWdbUjZ=iPqb+Cs3f4XBAkL66EvUx+X9438cUWg@mail.gmail.com>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
 <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de>
 <CAGa7JC067oRNWdbUjZ=iPqb+Cs3f4XBAkL66EvUx+X9438cUWg@mail.gmail.com>
Message-ID: <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org>


> On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode <unicode at unicode.org> wrote:
> 
> So the "best-known Swiss tongue" is still not so much known, and still incorrectly referenced (frequently confused with "Swiss German", which is much like standard High German

I think Swiss German is in fact the correct English name for the Swiss dialects, taken from the German Schweizerdeutsch.

https://en.wikipedia.org/wiki/Swiss_German

From unicode at unicode.org  Fri Mar  9 07:55:16 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Fri, 9 Mar 2018 14:55:16 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
 <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de>
 <CAGa7JC067oRNWdbUjZ=iPqb+Cs3f4XBAkL66EvUx+X9438cUWg@mail.gmail.com>
 <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org>
Message-ID: <CAGa7JC37fDRCRtWtcbhhvWiORzWXfZO45A3WrGAU0Uj+ydMBMQ@mail.gmail.com>

English Wikipedia is not a good reference for the name; the GSW wiki states
clearly another name and "Alemannic" is attested and correct for the family
of dialects.
"Schweizerdeutsch" is also wrong like "Swiss German" when it refers to
Alsatian (neither Swiss nor German for those speaking it): these
expressions only refer to "de-CH", not "gsw".

2018-03-09 14:40 GMT+01:00 Tom Gewecke via Unicode <unicode at unicode.org>:

>
> > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode <
> unicode at unicode.org> wrote:
> >
> > So the "best-known Swiss tongue" is still not so much known, and still
> incorrectly referenced (frequently confused with "Swiss German", which is
> much like standard High German
>
> I think Swiss German is in fact the correct English name for the Swiss
> dialects, taken from the German Schweizerdeutsch.
>
> https://en.wikipedia.org/wiki/Swiss_German
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/7ff38beb/attachment.html>

From unicode at unicode.org  Fri Mar  9 08:11:49 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Fri, 9 Mar 2018 15:11:49 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
 <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de>
 <CAGa7JC067oRNWdbUjZ=iPqb+Cs3f4XBAkL66EvUx+X9438cUWg@mail.gmail.com>
 <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org>
Message-ID: <CAJ2xs_Ezh3GJdYjW_UrNRitKAv-Mx_s=krQDKrevV66GP5TUDA@mail.gmail.com>

Yes, the right English names are "Swiss High German" for de-CH, and "Swiss
German" for gsw-CH.

Mark

On Fri, Mar 9, 2018 at 2:40 PM, Tom Gewecke via Unicode <unicode at unicode.org
> wrote:

>
> > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode <
> unicode at unicode.org> wrote:
> >
> > So the "best-known Swiss tongue" is still not so much known, and still
> incorrectly referenced (frequently confused with "Swiss German", which is
> much like standard High German
>
> I think Swiss German is in fact the correct English name for the Swiss
> dialects, taken from the German Schweizerdeutsch.
>
> https://en.wikipedia.org/wiki/Swiss_German
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/42f4d393/attachment.html>

From unicode at unicode.org  Fri Mar  9 08:52:16 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Fri, 9 Mar 2018 15:52:16 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <CAJ2xs_Ezh3GJdYjW_UrNRitKAv-Mx_s=krQDKrevV66GP5TUDA@mail.gmail.com>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
 <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de>
 <CAGa7JC067oRNWdbUjZ=iPqb+Cs3f4XBAkL66EvUx+X9438cUWg@mail.gmail.com>
 <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org>
 <CAJ2xs_Ezh3GJdYjW_UrNRitKAv-Mx_s=krQDKrevV66GP5TUDA@mail.gmail.com>
Message-ID: <CAGa7JC1m+54JFtub7m=Uc1SdCki1a5pTcVs6yz8iMDE4hHjLLQ@mail.gmail.com>

In summary you do not object the fact that unqualified "gsw" language code
is not (and should not be) named "Swiss German" (as it is only for
"gsw-CH", not for any other non-Swiss variants of Alemannic).

The addition of "High" is optional, unneeded in fact, as it does not remove
any ambiguity, in Germany for "de-DE", or in Switzerland for "de-CH", or in
Italian South Tyrol for "de-IT", or in Austria for "de-AT", or even for
"Standard German" (de)

Note also that Alsatian itself ("gsw-FR") is considered part of the "High
German" branch of Germanic languages !

"High German" refers to the group that includes Standard German and its
national variants ("de", "de-DE", "de-CH", "de-AT", "de-CH", "de-IT") as
well as the Alemannic group ( "gsw" , "gsw-FR", "gsw-CH"), possibly extended
(this is discutable) to Schw?bish in Germany and Hungary.

My opinion is that even the Swiss variants should be preferably named
"Swiss Alemannic" collectively, and not "Swiss German" which causes
constant confusion between "de-CH" and "gsw-CH".


2018-03-09 15:11 GMT+01:00 Mark Davis ?? via Unicode <unicode at unicode.org>:

> Yes, the right English names are "Swiss High German" for de-CH, and "Swiss
> German" for gsw-CH.
>
> Mark
>
> On Fri, Mar 9, 2018 at 2:40 PM, Tom Gewecke via Unicode <
> unicode at unicode.org> wrote:
>
>>
>> > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode <
>> unicode at unicode.org> wrote:
>> >
>> > So the "best-known Swiss tongue" is still not so much known, and still
>> incorrectly referenced (frequently confused with "Swiss German", which is
>> much like standard High German
>>
>> I think Swiss German is in fact the correct English name for the Swiss
>> dialects, taken from the German Schweizerdeutsch.
>>
>> https://en.wikipedia.org/wiki/Swiss_German
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/26c7bd6c/attachment.html>

From unicode at unicode.org  Fri Mar  9 08:58:29 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Fri, 9 Mar 2018 15:58:29 +0100 (CET)
Subject: Translating the standard (was: Re: Fonts and font sizes used in
 the Unicode)
In-Reply-To: <20180308183304.GB2050855@phare.normalesup.org>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
Message-ID: <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>

On 08/03/18 19:33, Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:
> 
> On Thu, Mar 08, 2018 at 07:05:06PM +0100, Marcel Schneider via Unicode wrote:
> > https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1206989878&sr=8-1
> 
> You?re linking to the wrong one of Patrick?s books :-) The
> translation he made of version 3.1 (not 5.0) of the core specification
> is available in full at http://hapax.qc.ca/ (?Unicode et ISO 10646 en
> fran?ais?, middle of page), as well as a few free sample chapters from
> his other book.
> 
> Best,
> 
> Arthur
> 

Indeed, thank you very much for correction, and thanks for the link.

I can tell so much that the free online chapters of Patrick Andries? translation 
of the Unicode standard were to me the first introduction, more precisely ch. 7 
(Punctuation) which I even printed out to get in touch with the various dashes 
and spaces and learn more about quotation marks. [I didn?t have internet and
took the copy home from a library.] Based on this experience, I think there isn?t 
too much extrapolation in supposing that millions of newcomers in all countries 
could use such a translation. Although the latest version of TUS is obviously more 
up?to?date, version 3.1 isn?t plain wrong at all. Hence I warmly recommend to
translate at least v3.1 ? or those chapters of v10.0 that are already in v3.1 ? 
while prompting the reader to seek further information on the Unicode website.

We note too that Patrick?s translation is annotated (footnotes in gray print) with
additional information of interest for the target locale. (Here one could mention 
that Latin script requires preformatted superscript letters for an interoperable 
representation of current text in some languages.)

Some Unicode terminology like ?bidi?mirroring? may be hard to adapt but that 
isn?t more of a challenge than any tech/science writer is facing when handling 
content that was originally produced in the United States and/or, more generally,
in English. E.g. in French we may choose from a panel of more conservative 
through less usual grammatical forms among which: ?r?flexion bidi?, ?r?flexion
bidirectonnelle?, ?bidi?reflexion? (hyphenated or not), ?r?flexible? or, simply, 
?miroir?. Anyway, every locale is expected to localize the full range of Unicode 
terminology ? unless people agree to switch to English whenever the topic is 
Unicode, even while discussing any other topic currently in Chinese or in Japanese, 
although doing so is not a problem, it?s just ethically weird.

So we look forward to the concept of a ?Unicode in Practice? textbook implemented
in Chinese and in Japanese and in any other non?English and non?French locale if it
isn?t already.

As of translating the Core spec as a whole, why did two recent attempts crash even 
before the maintenance stage, while the 3.1 project succeeded?

Some pieces of the puzzle seem to be still missing.

Best regards,

Marcel


From unicode at unicode.org  Fri Mar  9 10:21:31 2018
From: unicode at unicode.org (via Unicode)
Date: Sat, 10 Mar 2018 00:21:31 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <CAGa7JC3HM6HxEHHJ=JSee1++x-PvvfUV9watZXBNF-6DCePsKg@mail.gmail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
 <20180308230603.3fb73ef6@JRWUBU2>
 <CAGa7JC3HM6HxEHHJ=JSee1++x-PvvfUV9watZXBNF-6DCePsKg@mail.gmail.com>
Message-ID: <60d1d499b4351e3d4285692afb1d792a@koremail.com>

On 09.03.2018 09:17, Philippe Verdy via Unicode wrote:
> This still leaves the question about how to write personal names !
> IDS alone cannot represent them without enabling some "reasonable"
> ligaturing (they dont have to match the exact strokes variants for
> optimal placement, or with all possible simplifications).
> Im curious to know how China, Taiwan, Singapore or Japan handle this
> (for official records or in banks): like our personal signatures (as
> digital images), and then using a simplified official record
> (including the registration of romanized names)?
>
> 2018-03-09 0:06 GMT+01:00 Richard Wordingham via Unicode
> <unicode at unicode.org [2]>:
>
In mainliand China the full back is to use pinyin capitals without tone 
marks, so ASCII. Passport have names printed in both Chinese characters 
and capitalised pinyin, both are legally valid. ID cards which people 
get when they turn 16 have the names in printed Chinese characters only. 
So these I assume must be printed using a system that has some 
characters not in UCS. Banks certainly don't have all these extra 
characters so they use capitalised pinyin for any characters they can 
not type.

Japan in CJK Ext F had 1,645 characters which included all characters 
required for names of poeple and places. So there should be no need for 
a fallback system, Unicode is enough, now

John Knightley

From unicode at unicode.org  Fri Mar  9 10:41:35 2018
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Fri, 9 Mar 2018 08:41:35 -0800
Subject: Translating the standard
In-Reply-To: <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
Message-ID: <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>


On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote:
> As of translating the Core spec as a whole, why did two recent attempts crash even
> before the maintenance stage, while the 3.1 project succeeded?

Essentially because both the Japanese and the Chinese attempts were 
conceived of as commercial projects, which ultimately did not cost out 
for the publishers, I think. Both projects attempted limiting the scope 
of their translation to a subset of the core spec that would focus on 
East Asian topics, but the core spec is complex enough that it does not 
abridge well. And I think both projects ran into difficulties in trying 
to figure out how to deal with fonts and figures.

The Unicode 3.0 translation (and the 3.1 update) by Patrick Andries was 
a labor of love. In this arena, a labor of love is far more likely to 
succeed than a commercial translation project, because it doesn't have 
to make financial sense.

By the way, as a kind of annotation to an annotated translation, people 
should know that the 3.1 translation on Patrick's site is not a straight 
translation of 3.1, but a kind of interpreted adaptation. In particular, 
it incorporated a translation of UAX #15, Unicode Normalization Forms, 
Version 3.1.0, as a Chapter 6 of the translation, which is not the 
actual structure of Unicode 3.1. And there are other abridgements and 
alterations, where they make sense -- compare the resources section of 
the Preface, for example. This is not a knock on Patrick's excellent 
translation work, but it does illustrate the inherent difficulties of 
trying to approach a complete translation project for *any* version of 
the Unicode Standard.

--Ken


From unicode at unicode.org  Fri Mar  9 11:29:07 2018
From: unicode at unicode.org (via Unicode)
Date: Sat, 10 Mar 2018 01:29:07 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <20180308230603.3fb73ef6@JRWUBU2>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
 <20180308230603.3fb73ef6@JRWUBU2>
Message-ID: <269e1fb39a96b080981fd2748b6bfc65@koremail.com>

Dear Richard,

On 09.03.2018 07:06, Richard Wordingham via Unicode wrote:
> On Thu, 08 Mar 2018 09:42:38 +0800
> via Unicode <unicode at unicode.org> wrote:
>
>> to the best of my knowledge virtually no new characters used just 
>> for
>> names are under consideration, all the ones that are under
>> consideration are from before this century.
>
> What I was interested in was the rate of generation of new
> CJK characters in general, not just those for names.  I appreciate 
> that
> encoding is dominated by the backlog of older characters.
>

Impossible to give an accurate answer or even a reasonable guess.

As to those that would be condidates for Unicode, my guess would be not 
more than a few dozen a year. New  characters are not permitted in legal 
names. Fanasty Chinese characters used for a alien language or a mystery 
novel would not usually be suitable for encoding. Most new words in 
Chinese have more than one syllable and do not require any new 
characters. Documented increase such as scientific terms for new 
elements, flora and fauna, would seem to be not more one or two dozen a 
year.

Regards
John Knightley


> Richard.


From unicode at unicode.org  Fri Mar  9 12:46:23 2018
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Fri, 9 Mar 2018 10:46:23 -0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <269e1fb39a96b080981fd2748b6bfc65@koremail.com>
References: <5A95D192.5050608@unicode.org>
 <91680448.22170.1519824152519@ox.hosteurope.de>
 <CAGa7JC0NiU5G4v6rTMck0XdaftaHYQ+YGigU3jbtDUgBBo9d0A@mail.gmail.com>
 <83722fa3ed05a8b0989a963b3f26833a@koremail.com>
 <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp>
 <CAGHjPPL_-LOY=EvOgVsoYK1x26qEJazKi7utvttYb2J50k_3oA@mail.gmail.com>
 <447c571bad4174b493e4bd42ee7a41f2@koremail.com>
 <20180307202621.770d1099@JRWUBU2>
 <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>
 <20180308230603.3fb73ef6@JRWUBU2>
 <269e1fb39a96b080981fd2748b6bfc65@koremail.com>
Message-ID: <f242af9f-ae07-5196-c562-ff91b40fa1e1@att.net>


On 3/9/2018 9:29 AM, via Unicode wrote:
> Documented increase such as scientific terms for new elements, flora 
> and fauna, would seem to be not more one or two dozen a year. 

Indeed. Of the "urgently needed characters" added to the unified CJK 
ideographs for Unicode 11.0, two were obscure place name characters 
needed to complete mapping for the Japanese IT mandatory use of the Moji 
Joho collection.

The other three were newly standardized Chinese characters for 
superheavy elements that now have official designations by the IUPAC (as 
of December 2015): Nihonium (113), Tennessine (117) and Oganesson (118). 
The Chinese characters coined for those 3 were encoded at U+9FED, 
U+9FEC, and U+9FEB, respectively.

Oganesson, in particular, is of interest, as the heaviest known element 
produced to date. It is the subject of 1000's of hours of intense 
experimentation and of hundreds of scientific papers, but:

    ... since 2005, only five (possibly six) atoms of the nuclide ^294
    Og have been detected.


But we already have a Chinese character (pronounced ?o) for Og, and a 
standardized Unicode code point for it: U+9FEB.

Next up: unobtanium and hardtofindium

--Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/741ff65b/attachment.html>

From unicode at unicode.org  Fri Mar  9 15:19:46 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Fri, 9 Mar 2018 22:19:46 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <CAGa7JC1m+54JFtub7m=Uc1SdCki1a5pTcVs6yz8iMDE4hHjLLQ@mail.gmail.com>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
 <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de>
 <CAGa7JC067oRNWdbUjZ=iPqb+Cs3f4XBAkL66EvUx+X9438cUWg@mail.gmail.com>
 <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org>
 <CAJ2xs_Ezh3GJdYjW_UrNRitKAv-Mx_s=krQDKrevV66GP5TUDA@mail.gmail.com>
 <CAGa7JC1m+54JFtub7m=Uc1SdCki1a5pTcVs6yz8iMDE4hHjLLQ@mail.gmail.com>
Message-ID: <CAJ2xs_ErnRz=HWEKXP3AS2O61xjdsFOAcRBWzvcbHbxkbpKf1g@mail.gmail.com>

> In summary you do not object the fact that unqualified "gsw" language code

?Whether I object or not makes no? difference.

Whether for good or for bad, the gsw code (clearly originally for
German-Swiss from the code letters) has been expanded beyond the borders of
Switzerland. There are also separate codes for Schw?bisch and
Waliserd?tsch, so outside of Switzerland 'gsw' mainly extends to Elsassisch
(Alsace, ~0.5M speakers). So gsw-CH works to limit the scope to Switzerland
(~4.5M speakers).

> My opinion is that even the Swiss variants should be preferably named
"Swiss Alemannic" collectively...

That's clearly also not going to happen for the English term. Good luck
with the French equivalent...

Mark

On Fri, Mar 9, 2018 at 3:52 PM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> In summary you do not object the fact that unqualified "gsw" language code
> is not (and should not be) named "Swiss German" (as it is only for
> "gsw-CH", not for any other non-Swiss variants of Alemannic).
>
> The addition of "High" is optional, unneeded in fact, as it does not
> remove any ambiguity, in Germany for "de-DE", or in Switzerland for
> "de-CH", or in Italian South Tyrol for "de-IT", or in Austria for "de-AT",
> or even for "Standard German" (de)
>
> Note also that Alsatian itself ("gsw-FR") is considered part of the "High
> German" branch of Germanic languages !
>
> "High German" refers to the group that includes Standard German and its
> national variants ("de", "de-DE", "de-CH", "de-AT", "de-CH", "de-IT") as
> well as the Alemannic group ( "gsw" , "gsw-FR", "gsw-CH"), possibly extended
> (this is discutable) to Schw?bish in Germany and Hungary.
>
> My opinion is that even the Swiss variants should be preferably named
> "Swiss Alemannic" collectively, and not "Swiss German" which causes
> constant confusion between "de-CH" and "gsw-CH".
>
>
> 2018-03-09 15:11 GMT+01:00 Mark Davis ?? via Unicode <unicode at unicode.org>
> :
>
>> Yes, the right English names are "Swiss High German" for de-CH, and
>> "Swiss German" for gsw-CH.
>>
>> Mark
>>
>> On Fri, Mar 9, 2018 at 2:40 PM, Tom Gewecke via Unicode <
>> unicode at unicode.org> wrote:
>>
>>>
>>> > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode <
>>> unicode at unicode.org> wrote:
>>> >
>>> > So the "best-known Swiss tongue" is still not so much known, and still
>>> incorrectly referenced (frequently confused with "Swiss German", which is
>>> much like standard High German
>>>
>>> I think Swiss German is in fact the correct English name for the Swiss
>>> dialects, taken from the German Schweizerdeutsch.
>>>
>>> https://en.wikipedia.org/wiki/Swiss_German
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180309/467092cd/attachment.html>

From unicode at unicode.org  Sat Mar 10 05:26:42 2018
From: unicode at unicode.org (philip chastney via Unicode)
Date: Sat, 10 Mar 2018 11:26:42 +0000 (UTC)
Subject: A sketch with the best-known Swiss tongue twister
References: <392967520.14691005.1520681202416.ref@mail.yahoo.com>
Message-ID: <392967520.14691005.1520681202416@mail.yahoo.com>

it is not clear whether you are quoting from some agreed standard, quoting from some other authority, or constructing a classification of your own

whatever the classification, it should be descriptive, and it is best not to be too pedantic, because practice can vary from region to region, from individual to individual with the same region, and from context to context for an individual

I would make the following observations on terminology in practice:

-- the newspapers in Zurich advertised courses in "Schweizerdeutsch", meaning the contemporary spoken language

-- in Wengen (pronounced with a [w] not a [v]), I tried to explain to the man behind the counter that my ski binding needed fixing, using my best High German (with a Stuttgart accent, according to my tutor - he came from Hannover, so I don't think it was intended as a compliment)
    with a muttered "momenta", the owner dived into the back of the shop, to fetch the technician, whose skills included conversation in High German  --  I told him my problem, he told me it wasn't worth fixing, and I said, "Oh, bugger"
    at this point, they realised I was a Brit, and (at their request) we switched to English ("so much easier", the owner said)  --  for all 3 of us, High German was a foreign language

-- in Romansch-speaking St. Moritz, the hotels claim to be able to accomodate those who speak High German, as well as those who speak Swiss German (because the two languages are not always mutually intelligible)

-- the newspapers in Zurich advertised courses in "Hoch Deutsch", for those who needed to deal with foreigners

-- when I lived that way, the French-speaking population of Nancy referred to the language of their German-speaking compatriots as "platt deutsch" (the way they used the term, it did not extend any further east than Alsace)

-- in Luxemburg, the same language was referred to as Luxemburgish (or Letzeburgesch, which is Luxemburgish for "Luxemburgish ") 
    (I forget what the Belgians called the language spoken in Ostbelgien)

-- I was assured by a Luxemburgish-speaking car mechanic, with a Swiss German speaking wife, that the two languages (dialects?) were practically identical, except for the names of some household items

in short, there seems little point in making distinctions which cannot be precisely identified in practice

there appear to be significant differences between between High German and (what the natives call) Swiss German

there are far fewer significant differences between Swiss German and the other spoken Germanic languages found on the borders of Germany

/phil

--------------------------------------------
On Fri, 9/3/18, Philippe Verdy via Unicode <unicode at unicode.org> wrote:

 Subject: Re: A sketch with the best-known Swiss tongue twister
 To: "Mark Davis ??" <mark at macchiato.com>
 Cc: "Tom Gewecke" <tom at bluesky.org>, "unicode Unicode Discussion" <unicode at unicode.org>
 Date: Friday, 9 March, 2018, 2:52 PM
 
 In summary you do not object the fact that unqualified "gsw" language code is not (and should not be) named "Swiss German" (as it is only for "gsw-CH", not for any other non-Swiss variants of Alemannic).

The addition of "High" is optional, unneeded in fact, as it does not remove any ambiguity, in Germany for "de-DE", or in Switzerland for "de-CH", or in Italian South Tyrol for "de-IT", or in Austria for "de-AT", or even for "Standard German" (de)
 
 Note also that Alsatian itself ("gsw-FR") is considered part of the "High German" branch of Germanic languages !
 "High German" refers to the group that includes Standard German and its national variants ("de", "de-DE",
 "de-CH", "de-AT", "de-CH", "de-IT") as well as the Alemannic group ( "gsw"?, "gsw-FR", "gsw-CH"), possibly?extended (this is discutable) to Schw?bish in Germany and Hungary.
 
 My opinion is that even the Swiss variants should be preferably named "Swiss Alemannic" collectively, and not
 "Swiss German" which causes constant confusion between "de-CH" and "gsw-CH".
 

From unicode at unicode.org  Sat Mar 10 06:16:48 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Sat, 10 Mar 2018 13:16:48 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <392967520.14691005.1520681202416@mail.yahoo.com>
References: <392967520.14691005.1520681202416.ref@mail.yahoo.com>
 <392967520.14691005.1520681202416@mail.yahoo.com>
Message-ID: <CAGa7JC1-tbvyyoFhegzCRbUck5otpj3xPa=fqPmek738+_hsQw@mail.gmail.com>

2018-03-10 12:26 GMT+01:00 philip chastney <philip_chastney at yahoo.com>:

>
> -- when I lived that way, the French-speaking population of Nancy referred
> to the language of their German-speaking compatriots as "platt deutsch"
> (the way they used the term, it did not extend any further east than Alsace)
>

Note that  this is what you heard in Lorraine, and there's some competition
between Lorraine and Alsace. If you lived in Alsace they absolutely don't
like to have their language named "German" or "Deutsch" or "platt Deutsch",
this is "alsacien" for them and nothing else even if people in Lorraine
(that use other regional oil languages, not based on the Germanic substrate
but on Romance substrate) refer to Alsatians as "platt deutsch" with even
more confusion as it actually mean "low German" and confusing with "nds"
spoken much further to the North (North-western Germany and Netherlands)
and not at all in France, not even in the Nord department (where Flemish,
i.e. a local variant of Dutch="nl-FR" is spoken by a small aging minority
around Dunkerque and nearly extinct now everywhere in the French Flanders
and extinct now in Lille, replaced since long by the popular Lillois
variant of Picard locally named "ch'timi").
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180310/47e14590/attachment.html>

From unicode at unicode.org  Sat Mar 10 12:26:32 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Sat, 10 Mar 2018 19:26:32 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <20180310180235.GB3698923@phare.normalesup.org>
References: <392967520.14691005.1520681202416.ref@mail.yahoo.com>
 <392967520.14691005.1520681202416@mail.yahoo.com>
 <CAGa7JC1-tbvyyoFhegzCRbUck5otpj3xPa=fqPmek738+_hsQw@mail.gmail.com>
 <20180310180235.GB3698923@phare.normalesup.org>
Message-ID: <CAGa7JC38xTCdkbUYygUZAC3gbi9GC++w9hSR72=O6LmvTmfGBg@mail.gmail.com>

2018-03-10 19:02 GMT+01:00 Arthur Reutenauer <
arthur.reutenauer at normalesup.org>:

>         Philippe,
>
>   So many approximations and misinterpretations ...
>
> > Note that  this is what you heard in Lorraine, and there's some
> competition
> > between Lorraine and Alsace. If you lived in Alsace they absolutely don't
> > like to have their language named "German" or "Deutsch" or "platt
> Deutsch",
> > this is "alsacien" for them and nothing else
>
>   Condescending, are we?  This can of course be a delicate issue,
> especially if expressed insensitively, but most people are also able to
> recognise objective truths.  I never heard anyone deny that Alsatian was
> a dialect of German, except the totally misinformed.  There is even a
> good feeling of connection with the dialects beyond the border, in Baden
> in particular (not so much in Switzerland) -- and an acknowledgement
> that dialects become quite different further inland.
>
> >                                              even if people in Lorraine
> > (that use other regional oil languages, not based on the Germanic
> substrate
> > but on Romance substrate) refer to Alsatians as "platt deutsch" with even
> > more confusion as it actually mean "low German" and confusing with "nds"
> > spoken much further to the North (North-western Germany and Netherlands)
>
>   Where do I start?
>
>   1. That?s not what Philip said
>   2. There is a Germanic dialect in Lorraine, with a large number of
> speakers
>

The dialect of Lorraine with the  large number of speaker is not the one
you think about, yes it is a Romance/O?l language and not Germanic at all.

The one you are refering to is only in a very small tiny part of Lorraine
and almost extinct.

  3. Platt just means dialect in German
>   4. Nobody is confusing Lothringer Platt with Low German, except perhaps
> you
>

You are confusing it with the "parler lorrain" (as I said, "Lothringer
Platt", part of "Francique" is nearly extinct in Lorraine, this is not the
case of the "Parler lorrain", also known in Belgium as "Gaumais" and very
near from "Wallon").


>   5. If you?re going to write ?o?l languages? in English you could at
>      least put the diaeresis on the ?i?, otherwise it really looks silly
>

Sorry, my message was posted in English, I had not realized that "Oil" with
the capital would look so silly without the diaeresis and in this context,
as if we were sepaking about olives or burnable energy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180310/e6ca6cca/attachment.html>

From unicode at unicode.org  Sat Mar 10 14:44:14 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Sat, 10 Mar 2018 21:44:14 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <20180310193359.GA3818257@phare.normalesup.org>
References: <392967520.14691005.1520681202416.ref@mail.yahoo.com>
 <392967520.14691005.1520681202416@mail.yahoo.com>
 <CAGa7JC1-tbvyyoFhegzCRbUck5otpj3xPa=fqPmek738+_hsQw@mail.gmail.com>
 <20180310180235.GB3698923@phare.normalesup.org>
 <CAGa7JC38xTCdkbUYygUZAC3gbi9GC++w9hSR72=O6LmvTmfGBg@mail.gmail.com>
 <20180310193359.GA3818257@phare.normalesup.org>
Message-ID: <CAGa7JC0rVdA3PnmepRDj=4RpAxGvh6jyvfD7P+y1P0u+GfaCgQ@mail.gmail.com>

Apparently you just trust Wikipedia that uses old sources.

Very poopulated area does not mean it is populated by native speakers.
There were lots of migrants that never spoke anything than just standard
French or French slightly "creolized" with foreign languages (but these
adapations are also disappearing in younger generations of people born in
France from migrants). The "Francique" is not so popular, much less than
Alsatian (Alsace is very densily populated too) and Francique" is not the
same as Alsatian and doest not have the same mevel of protection by
cultural relation institututions (there's no national support at all only
regional initiatives or initiatives taken by municipalities to support
schools, and some museums or local universities with linguistic study
branches).

Apparently you've never been in France: regional languages have low levels
of support (lower than the support for English or Standard German or
Spanish in higher levels of education, or even Arabic, Latin and Hebrew,
sponsored by private educational institutiuons where a minimum "trunk" for
standard French is still mandatory for most domains).

I really doubt you can find 400,000 speakers of Francique in Lorraine,
except in a very narrow band near Luxembourg in rural areas in an aging
population. I've lived and worked in Nancy and Metz, and in fact almost
never heard any word in that language, only French and few reginal words.

On the opposite the Alsatian language (French Allemanic) is very vivid in
Alsace (including in Strasbourg), and not correlated with the Allemanic
languages of Switzerland and far enough from standard German to be
distinguished.
.

2018-03-10 20:33 GMT+01:00 Arthur Reutenauer <
arthur.reutenauer at normalesup.org>:

> > The dialect of Lorraine with the  large number of speaker is not the one
> > you think about, yes it is a Romance/O?l language and not Germanic at
> all.
>
>   You are not reading what I write, so you can?t know what I?m thinking.
>
> > The one you are refering to is only in a very small tiny part of Lorraine
> > and almost extinct.
>
>   Yes, and that?s the language Philip was talking about, reportedly
> called Plattdeutsch by French speakers.  What?s your source for ?almost
> extinct??  Ethnologue 20th ed. has 400,000 speakers (2013), even
> accounting for possible exaggerations that?s hardly extinct.  The ?very
> small tiny part? where it?s spoken ? 3,300km? according to The Dialects
> of Modern German (Charles Russ ed., Routledge 1990) ? is very populous
> because of the former mining industry.
>
> > You are confusing it with the "parler lorrain" (as I said, "Lothringer
> > Platt", part of "Francique" is nearly extinct in Lorraine, this is not
> the
> > case of the "Parler lorrain", also known in Belgium as "Gaumais" and very
> > near from "Wallon").
>
>   You are condescending and your pseudo-erudition gets in the way of the
> conversation.  Nobody except you mentioned Romance dialects, you just
> drifted there on your own.
>
>         Arthur
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180310/2d941f24/attachment.html>

From unicode at unicode.org  Sat Mar 10 23:04:25 2018
From: unicode at unicode.org (Keith Turner via Unicode)
Date: Sun, 11 Mar 2018 00:04:25 -0500
Subject: base1024 encoding using Unicode emojis
Message-ID: <CAGUtCHps=h+X-rBE=-X8C+eACGaL3Se6UEimgd0h+KwOj3ZifA@mail.gmail.com>

I created a neat little project based on Unicode emojis.  I thought
some on this list may find it interesting.  It encodes arbitrary data
as 1024 emojis.  The project is called Ecoji and is hosted on github
at https://github.com/keith-turner/ecoji

Below are some examples of encoding and decoding.

$ echo 'Unicode emojis are awesome!!' | ecoji
????????????????????????????????????????????????

$ echo ????????????????????????????????????????????????   | ecoji -d
Unicode emojis are awesome!!

I would eventually like to create a base4096 version when there are more emojis.

Keith


From unicode at unicode.org  Sun Mar 11 09:46:54 2018
From: unicode at unicode.org (Mathias Bynens via Unicode)
Date: Sun, 11 Mar 2018 15:46:54 +0100
Subject: base1024 encoding using Unicode emojis
In-Reply-To: <CAGUtCHps=h+X-rBE=-X8C+eACGaL3Se6UEimgd0h+KwOj3ZifA@mail.gmail.com>
References: <CAGUtCHps=h+X-rBE=-X8C+eACGaL3Se6UEimgd0h+KwOj3ZifA@mail.gmail.com>
Message-ID: <CACrCd9Pte0XCW0+JO-BejHJaKOAPhymoM1bPdsZfUURmgLY6hQ@mail.gmail.com>

Neat! Prior art:

   - https://github.com/watson/base64-emoji
   - https://github.com/nate-parrott/emojicode


On Sun, Mar 11, 2018 at 6:04 AM, Keith Turner via Unicode <
unicode at unicode.org> wrote:

> I created a neat little project based on Unicode emojis.  I thought
> some on this list may find it interesting.  It encodes arbitrary data
> as 1024 emojis.  The project is called Ecoji and is hosted on github
> at https://github.com/keith-turner/ecoji
>
> Below are some examples of encoding and decoding.
>
> $ echo 'Unicode emojis are awesome!!' | ecoji
> ????????????????????????????????????????????????
>
> $ echo ????????????????????????????????????????????????   | ecoji -d
> Unicode emojis are awesome!!
>
> I would eventually like to create a base4096 version when there are more
> emojis.
>
> Keith
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180311/ba88260b/attachment.html>

From unicode at unicode.org  Sun Mar 11 10:25:13 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Sun, 11 Mar 2018 16:25:13 +0100
Subject: base1024 encoding using Unicode emojis
In-Reply-To: <CAGUtCHps=h+X-rBE=-X8C+eACGaL3Se6UEimgd0h+KwOj3ZifA@mail.gmail.com>
References: <CAGUtCHps=h+X-rBE=-X8C+eACGaL3Se6UEimgd0h+KwOj3ZifA@mail.gmail.com>
Message-ID: <CAGa7JC0+ueZ30OXup_p2HCUkb4PBzfnJirGZKBY6a33PUikX-Q@mail.gmail.com>

Ideally, the purpose of such base-1024 encoding is to allow compacting
arbitrary data into plain-text which can be safely preserved including by
Unicode normalization and transforms by encoding like UTF-8.
But then we have a way to do that is such a way that this minimizes the
UTF-8 string sizes (Emojis is probably not the best set to use if most of
them lie in supplementary planes).

You can choose another arbitrary set of 1024 codepoints in the BMP that is
preserved by normalization (no decomposition, combining class=0) and text
filters (no controls, no whitespaces, possibly no punctuation, only letters
or digits) and which is still simple to compute with a basic algorithm not
requiring any table lookup (only a few tests for some boundary values or a
very small lookup table with 16 entries, one entry for each subset of 64
values).

As well some frequent binary data (notably runs of null bytes) should be
able to use shorter UTF-8 sequences from the ASCII set, so my opinion is
that the 64 first codes should be the same as standard Base-64, others can
be taken easily from CJK blocks, or the PUA block in the BMP, but you can
also select some blocks below the U+0800 codepoint so that they get encoded
as 2 bytes and not 3 for the rest of the BMP (and 4 bytes for most emojis,
where 10 bits become 64 bits with a huge waste of storage space in UTF-8)

So the real need it to find the smallest set of subranges with 64
consecutive codepoints with minimal values that contain only letters or
digits and where all positions are assigned with such general properties.
Emojis will unlikely be part of them ! With this goal, you can even avoid
using any PUAs (which are likely to be filtered/forbidden by some
protocols), or compatibility characters (likely to be transformed by
NFKC/NFKD).

And even within just the BMP, you could reach more than 10-bit encoding
(base-1024) and can probably find 12-bit encoding (base 4096) or more (CJK
blocks of the BMP offer wide ranges of suitable characters, as well as some
extended Latin or extended Cyrillic blocks)

If you want to use supplementary characters that are already encoded, then
you can certainly use CJK blocks in the large supplementary ideographic
plane and create a 16-bit encoding (base 65536). Only some legacy Emojis in
the BMP will be used before that.


2018-03-11 6:04 GMT+01:00 Keith Turner via Unicode <unicode at unicode.org>:

> I created a neat little project based on Unicode emojis.  I thought
> some on this list may find it interesting.  It encodes arbitrary data
> as 1024 emojis.  The project is called Ecoji and is hosted on github
> at https://github.com/keith-turner/ecoji
>
> Below are some examples of encoding and decoding.
>
> $ echo 'Unicode emojis are awesome!!' | ecoji
> ????????????????????????????????????????????????
>
> $ echo ????????????????????????????????????????????????   | ecoji -d
> Unicode emojis are awesome!!
>
> I would eventually like to create a base4096 version when there are more
> emojis.
>
> Keith
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180311/c143a3a0/attachment.html>

From unicode at unicode.org  Sun Mar 11 12:07:27 2018
From: unicode at unicode.org (Keith Turner via Unicode)
Date: Sun, 11 Mar 2018 13:07:27 -0400
Subject: base1024 encoding using Unicode emojis
In-Reply-To: <CAGa7JC0+ueZ30OXup_p2HCUkb4PBzfnJirGZKBY6a33PUikX-Q@mail.gmail.com>
References: <CAGUtCHps=h+X-rBE=-X8C+eACGaL3Se6UEimgd0h+KwOj3ZifA@mail.gmail.com>
 <CAGa7JC0+ueZ30OXup_p2HCUkb4PBzfnJirGZKBY6a33PUikX-Q@mail.gmail.com>
Message-ID: <CAGUtCHoevPWWDc9LoWy-eK7rsdg9X7O5DTyDJs5X2ZfNQmgShg@mail.gmail.com>

On Sun, Mar 11, 2018 at 11:25 AM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> Ideally, the purpose of such base-1024 encoding is to allow compacting
> arbitrary data into plain-text which can be safely preserved including by
> Unicode normalization and transforms by encoding like UTF-8.
> But then we have a way to do that is such a way that this minimizes the
> UTF-8 string sizes (Emojis is probably not the best set to use if most of
> them lie in supplementary planes).
>

Yeah, it certainly results in larger utf8 strings.  For example a sha256
hash is 112 bytes when encoded as Ecoji utf8.  For base64, sha256 is 44
bytes.

Even though its more bytes, Ecoji has less visible characters than base64
for sha256.  Ecoji has 28 visible characters and base64 44.  So that makes
me wonder which one would be quicker for a human to verify on average?
Also, which one is more accurate for a human to verify? I have no idea. For
accuracy, it seems like a lot of thought was put into the visual uniqueness
of Unicode emojis.


>
> You can choose another arbitrary set of 1024 codepoints in the BMP that is
> preserved by normalization (no decomposition, combining class=0) and text
> filters (no controls, no whitespaces, possibly no punctuation, only letters
> or digits) and which is still simple to compute with a basic algorithm not
> requiring any table lookup (only a few tests for some boundary values or a
> very small lookup table with 16 entries, one entry for each subset of 64
> values).
>
> As well some frequent binary data (notably runs of null bytes) should be
> able to use shorter UTF-8 sequences from the ASCII set, so my opinion is
> that the 64 first codes should be the same as standard Base-64, others can
> be taken easily from CJK blocks, or the PUA block in the BMP, but you can
> also select some blocks below the U+0800 codepoint so that they get encoded
> as 2 bytes and not 3 for the rest of the BMP (and 4 bytes for most emojis,
> where 10 bits become 64 bits with a huge waste of storage space in UTF-8)
>
> So the real need it to find the smallest set of subranges with 64
> consecutive codepoints with minimal values that contain only letters or
> digits and where all positions are assigned with such general properties.
> Emojis will unlikely be part of them ! With this goal, you can even avoid
> using any PUAs (which are likely to be filtered/forbidden by some
> protocols), or compatibility characters (likely to be transformed by
> NFKC/NFKD).
>
> And even within just the BMP, you could reach more than 10-bit encoding
> (base-1024) and can probably find 12-bit encoding (base 4096) or more (CJK
> blocks of the BMP offer wide ranges of suitable characters, as well as some
> extended Latin or extended Cyrillic blocks)
>
> If you want to use supplementary characters that are already encoded, then
> you can certainly use CJK blocks in the large supplementary ideographic
> plane and create a 16-bit encoding (base 65536). Only some legacy Emojis in
> the BMP will be used before that.
>
>
>
> 2018-03-11 6:04 GMT+01:00 Keith Turner via Unicode <unicode at unicode.org>:
>
>> I created a neat little project based on Unicode emojis.  I thought
>> some on this list may find it interesting.  It encodes arbitrary data
>> as 1024 emojis.  The project is called Ecoji and is hosted on github
>> at https://github.com/keith-turner/ecoji
>>
>> Below are some examples of encoding and decoding.
>>
>> $ echo 'Unicode emojis are awesome!!' | ecoji
>> ????????????????????????????????????????????????
>>
>> $ echo ????????????????????????????????????????????????   | ecoji -d
>> Unicode emojis are awesome!!
>>
>> I would eventually like to create a base4096 version when there are more
>> emojis.
>>
>> Keith
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180311/caeba64e/attachment.html>

From unicode at unicode.org  Sun Mar 11 12:32:40 2018
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Sun, 11 Mar 2018 11:32:40 -0600
Subject: base1024 encoding using Unicode emojis
In-Reply-To: <mailman.1.1520787602.23194.unicode@unicode.org>
References: <mailman.1.1520787602.23194.unicode@unicode.org>
Message-ID: <6FAF09B530744A87AA1B3086073F8748@DougEwell>

Oh, let him have a little fun. At least he's using emoji for something 
related to characters, instead of playing Mr. Potato Head.

Incidentally, more prior art on large-base encoding:
https://sites.google.com/site/markusicu/unicode/base16k

--
Doug Ewell | Thornton, CO, US | ewellic.org


From unicode at unicode.org  Sun Mar 11 13:35:11 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Sun, 11 Mar 2018 19:35:11 +0100 (CET)
Subject: Translating the standard
Message-ID: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17>

On Fri, 9 Mar 2018 08:41:35 -0800, Ken Whistler wrote:
> 
> 
> On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote:
> > As of translating the Core spec as a whole, why did two recent attempts crash even
> > before the maintenance stage, while the 3.1 project succeeded?
> 
> Essentially because both the Japanese and the Chinese attempts were 
> conceived of as commercial projects, which ultimately did not cost out 
> for the publishers, I think.

I immediately thought of these projects as government?funded initiatives, 
which is most coherent with the importance of Unicode?s work for these 
nations given that the unified CJK repertoire has always consumed the most
of the Consortium?s resources, I figure out. However, looking into early 
translations on the Unicode site, only those governments that are close to 
the United Kingdom are unveiled (or not) to have helped promote Unicode 
education.

And from the one among the three terminological vocabularies that I?m able
to parse, as well as from the 60+ What?is?Unicode translations, we gain the 
chilling impression that once the early enthusiasm had passed away, any 
level of effort dropped down to zero. To such an extent that even the link
to the translation guidelines has been removed from the first place:

http://www.unicode.org/help/translation.html
|
| Although its working language is English, the Unicode Consortium strives to reach as many people
| and organizations in as many countries as possible around the world. One way of doing that is by
| encouraging the translation of Unicode material into languages other than English.
|
| This page guides volunteers who wish to contribute a translation of any Unicode material
| they deem interesting to their local audiences.

I fail to understand why increasing complexity decreases the need to be 
widely understood. Recurrent threads show how slowly Unicode education 
is spreading among English native speakers; others incidentally complained 
about Unicode?educational issues in African countries. *Not* translating 
the Standard ? in whatever way ? won?t help steepen the curve.

Best regards,

Marcel

[To be continued; sorry for delay.]


From unicode at unicode.org  Sun Mar 11 16:14:18 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Sun, 11 Mar 2018 22:14:18 +0100 (CET)
Subject: Translating the standard
In-Reply-To: <20180311200503.GB216921@phare.normalesup.org>
References: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17>
 <20180311200503.GB216921@phare.normalesup.org>
Message-ID: <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17>

On 11/03/18 21:05, Arthur Reutenauer wrote:
> 
> On Sun, Mar 11, 2018 at 07:35:11PM +0100, Marcel Schneider via Unicode wrote:
> > I fail to understand why increasing complexity decreases the need to be 
> > widely understood.
> 
> I?m pretty sure that everybody will agree that the need gets all the
> greater as Unicode and connected technologies get more complex. But you
> can hopefully see that the cost also increases, and that?s incentive
> enough to refrain from doing it ? as it already was very costly fifteen
> years ago, it?s likely to be prohibitive today.
> 
> > Recurrent threads show how slowly Unicode education 
> > is spreading among English native speakers; others incidentally complained 
> > about Unicode?educational issues in African countries. *Not* translating 
> > the Standard ? in whatever way ? won?t help steepen the curve.
> 
> Nobody is saying ?let?s not translate the Unicode Standard?; what
> several people here have pointed out is that it pays to have more modest
> and manageable goals. Besides, you?re hinting yourself that the
> problems are not only with translation, since they also affect native
> English speakers.

Indeed, to be fair. And for implementers, documenting themselves in English 
may scarcely ever have much of a problem, no matter what?s the locale.

Today?s policy is, that we are welcome to browse Wikipedia:

http://www.unicode.org/standard/WhatIsUnicode.html

Fundamentally that?s true (although the wording could use some fixes as of 
the difference between *using* Unicode and *documenting* Unicode), and
it?s consistent with actual trends.

As of the cost ? It still seems to me that we?re far from the last word?

Best regards,

Marcel


From unicode at unicode.org  Mon Mar 12 02:39:53 2018
From: unicode at unicode.org (Alastair Houghton via Unicode)
Date: Mon, 12 Mar 2018 07:39:53 +0000
Subject: Translating the standard
In-Reply-To: <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17>
References: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17>
 <20180311200503.GB216921@phare.normalesup.org>
 <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17>
Message-ID: <A52DCFBC-6D6B-4B69-B6D4-E41ED73235AE@alastairs-place.net>

On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode <unicode at unicode.org> wrote:
> 
> Indeed, to be fair. And for implementers, documenting themselves in English 
> may scarcely ever have much of a problem, no matter what?s the locale.

Agreed.  Implementers will already understand English; you can?t write computer software without, since almost all documentation is in English, almost all computer languages are based on English, and, to be frank, a large proportion of the software market is itself English speaking.  I have yet to meet a software developer who didn?t speak English.

That?s not to say that people wouldn?t appreciate a translation of the standard, but there are, as others have pointed out, obvious maintenance problems, not to mention the issue that plagues some international institutions, namely the fact that translations are necessarily non-canonical and so those who really care about the details of the rules usually have to refer to a version in a particular language (sometimes that language might be French rather than English; very occasionally there are two versions declared, for political reasons, to both be canonical, which is obviously risky as there?s a chance they might differ subtly on some point, perhaps even because of punctuation).

In terms of widespread understanding of the standard, which is where I think translation is perhaps more important, I?m not sure translating the actual standard itself is really the way forward.  It?d be better to ensure that there are reliable translations of books like Unicode Demystified or Unicode Explained - or, quite possibly, other books aimed more at the general public rather than the software community per se.

Kind regards,

Alastair.

--
http://alastairs-place.net


From unicode at unicode.org  Mon Mar 12 02:59:48 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Mon, 12 Mar 2018 08:59:48 +0100 (CET)
Subject: Translating the standard
In-Reply-To: <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
Message-ID: <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>

On Fri, 9 Mar 2018 08:41:35 -0800, Ken Whistler wrote:
> 
> 
> On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote:
> > As of translating the Core spec as a whole, why did two recent attempts crash even
> > before the maintenance stage, while the 3.1 project succeeded?
> 
> Essentially because both the Japanese and the Chinese attempts were 
> conceived of as commercial projects, which ultimately did not cost out 
> for the publishers, I think. Both projects attempted limiting the scope 
> of their translation to a subset of the core spec that would focus on 
> East Asian topics, but the core spec is complex enough that it does not 
> abridge well. And I think both projects ran into difficulties in trying 
> to figure out how to deal with fonts and figures.

This is normally catered for by Unicode whose fonts are donated and 
licensed for the sole purpose of documenting the Standard. See FAQ.

Templates of any material to be translated are sent by Unicode, aren?t 
they? The Unicode home page reads: ?An essential part of our mission 
is to educate and engage academic and scientific communities, and 
the general public.? Therefore, translators should just have to translate 
e.g. the NamesList following Ken?s sample localization (TN #24) ? 
which is already a hard piece of work ? and send the file to Unicode, 
to get a localized version of the Code Charts. Likewise ISO/IEC 10646 
is available in a French version or at least, it should have an official 
?French version like all ISO standards.

If Unicode don?t own the tooling yet, Apple shall be happy to donate the 
funding to get Unicode in a position to fulfill their mission thoroughly,
like Apple (supposedly) donate non?trivial amounts to many vendors to 
get them remove old software from the internet.

Using such localized NamesLists with Unibook to browse the Code Charts 
locally is another question, since that supposes handing the fonts out to 
the general public. So that is clearly a non?starter. But browsing localized
Code Charts in Adobe Reader would be a nice facility.

Best regards,

Marcel


From unicode at unicode.org  Mon Mar 12 03:34:01 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Mon, 12 Mar 2018 09:34:01 +0100 (CET)
Subject: Translating the standard
Message-ID: <789893273.3371.1520843642008.JavaMail.www@wwinf1m17>

On Mon, 12 Mar 2018 07:39:53 +0000, Alastair Houghton wrote:
> 
> On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode  wrote:
> > 
> > Indeed, to be fair. And for implementers, documenting themselves in English 
> > may scarcely ever have much of a problem, no matter what?s the locale.
> 
> Agreed. Implementers will already understand English; you can?t write computer software
> without, since almost all documentation is in English, almost all computer languages are
> based on English, and, to be frank, a large proportion of the software market is itself
> English speaking. I have yet to meet a software developer who didn?t speak English.
> 
> That?s not to say that people wouldn?t appreciate a translation of the standard, but there are,
> as others have pointed out, obvious maintenance problems, not to mention the issue that
> plagues some international institutions, namely the fact that translations are necessarily
> non-canonical and so those who really care about the details of the rules usually have to refer
> to a version in a particular language (sometimes that language might be French rather than
> English; very occasionally there are two versions declared, for political reasons, to both be
> canonical, which is obviously risky as there?s a chance they might differ subtly on some point,
> perhaps even because of punctuation).

Sometimes it occurred in the EU that the French version was so sloppy it transformed the issue 
to entirely another one, but at the Unicode?ISO/IEC merger the bad will was clearly on the other 
side ?

> 
> In terms of widespread understanding of the standard, which is where I think translation is
> perhaps more important, I?m not sure translating the actual standard itself is really the way
> forward. It?d be better to ensure that there are reliable translations of books like
> Unicode Demystified or Unicode Explained - or, quite possibly, other books aimed more at
> the general public rather than the software community per se.

Good point. What we need most of all is a complete terminology, as well as full ranges of 
character names in every language, to enable people to talk about it after reading in English. 

Best regards,

Marcel


From unicode at unicode.org  Mon Mar 12 04:11:09 2018
From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode)
Date: Mon, 12 Mar 2018 18:11:09 +0900
Subject: base1024 encoding using Unicode emojis
In-Reply-To: <CAGUtCHoevPWWDc9LoWy-eK7rsdg9X7O5DTyDJs5X2ZfNQmgShg@mail.gmail.com>
References: <CAGUtCHps=h+X-rBE=-X8C+eACGaL3Se6UEimgd0h+KwOj3ZifA@mail.gmail.com>
 <CAGa7JC0+ueZ30OXup_p2HCUkb4PBzfnJirGZKBY6a33PUikX-Q@mail.gmail.com>
 <CAGUtCHoevPWWDc9LoWy-eK7rsdg9X7O5DTyDJs5X2ZfNQmgShg@mail.gmail.com>
Message-ID: <3b0f9914-9d09-5fb9-9e3e-e68493a81e8a@it.aoyama.ac.jp>

On 2018/03/12 02:07, Keith Turner via Unicode wrote:

> Yeah, it certainly results in larger utf8 strings.  For example a sha256
> hash is 112 bytes when encoded as Ecoji utf8.  For base64, sha256 is 44
> bytes.
> 
> Even though its more bytes, Ecoji has less visible characters than base64
> for sha256.  Ecoji has 28 visible characters and base64 44.  So that makes
> me wonder which one would be quicker for a human to verify on average?
> Also, which one is more accurate for a human to verify? I have no idea. For
> accuracy, it seems like a lot of thought was put into the visual uniqueness
> of Unicode emojis.

Using emoji to help people verify security information is an interesting 
idea. What I'm afraid is that even if emoji are designed with 
distinctiveness in mind, some people may have difficulties distinguish 
all the various face variants. Also, while emoji get designed so that 
in-font distinguishability is high, the same may not apply across fonts 
(e.g. if one has to compare a printed version with a version on-screen).

Regards,   Martin.


>> 2018-03-11 6:04 GMT+01:00 Keith Turner via Unicode <unicode at unicode.org>:
>>
>>> I created a neat little project based on Unicode emojis.  I thought
>>> some on this list may find it interesting.  It encodes arbitrary data
>>> as 1024 emojis.  The project is called Ecoji and is hosted on github
>>> at https://github.com/keith-turner/ecoji
>>>
>>> Below are some examples of encoding and decoding.
>>>
>>> $ echo 'Unicode emojis are awesome!!' | ecoji
>>> ????????????????????????????????????????????????
>>>
>>> $ echo ????????????????????????????????????????????????   | ecoji -d
>>> Unicode emojis are awesome!!
>>>
>>> I would eventually like to create a base4096 version when there are more
>>> emojis.


From unicode at unicode.org  Mon Mar 12 05:00:16 2018
From: unicode at unicode.org (Andrew West via Unicode)
Date: Mon, 12 Mar 2018 10:00:16 +0000
Subject: Translating the standard
In-Reply-To: <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
 <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
Message-ID: <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>

On 12 March 2018 at 07:59, Marcel Schneider via Unicode
<unicode at unicode.org> wrote:
>
> Likewise ISO/IEC 10646 is available in a French version

No it is not, and never has been.

Why don't you check your facts before making misleading statements to this list?

> or at least, it should have an official French version like all ISO standards.

That is also blatantly untrue.

Only six of the publicly available ISO standards listed at
http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
have French versions, and one has a Russian version. You will notice
that there is no French version of ISO/IEC 10646.

Andrew

From unicode at unicode.org  Mon Mar 12 06:55:54 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Mon, 12 Mar 2018 12:55:54 +0100 (CET)
Subject: Translating the standard
In-Reply-To: <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
 <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
 <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>
Message-ID: <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17>

On Mon, 12 Mar 2018 10:00:16 +0000, Andrew West wrote:
> 
> On 12 March 2018 at 07:59, Marcel Schneider via Unicode
>  wrote:
> >
> > Likewise ISO/IEC 10646 is available in a French version
> 
> No it is not, and never has been.
> 
> Why don't you check your facts before making misleading statements to this list?
> 
> > or at least, it should have an official French version like all ISO standards.
> 
> That is also blatantly untrue.
> 
> Only six of the publicly available ISO standards listed at
> http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
> have French versions, and one has a Russian version. You will notice
> that there is no French version of ISO/IEC 10646.
> 
> Andrew

Since ISO has made of standards a business, all prior versions are removed from 
the internet, so that they don?t show up even in that list (which I?d used to grab a 
free copy, just to check the differences). Because if they had public archives of the 
free standards, not having any for the pay standards would stand out even more.
This is why if you need an older version for reference, you need to find a good soul 
in the organization, who will be so kind to make a copy for you in the archives at 
the headquarters.

The last published French version of ISO/IEC 10646 ? to which you contributed ? 
is still available on Patrick?s site:

http://hapax.qc.ca/Tableaux-5.0.htm

Actually, the French version has no chief redactor, and during a time, the French 
version of the NamesList was maintained only so far as to add the new names (for 
use in ISO 14651). For Unicode 10.0.0, the French translation has been again fully 
updated to Code Charts production level:

http://hapax.qc.ca/ListeNoms-10.0.0.txt

(I?d noticed that the contributors? list has slightly shrinked without being able to 
find out why.) The Code Charts have not been produced, however (because there 
is actually no redactor?in?chief, as already stated, and also because of budget cuts 
the government is not in a position to pay the non?trivial amount of money asked 
for by Unicode for use of the fonts and/or [just trying to be as precise as I can this 
time| the owner of the tooling needed).

Having said that, I still believe that all ISO standards should have a French version,
shouldn?t they??????????:)

Best regards,

Marcel


From unicode at unicode.org  Mon Mar 12 08:58:32 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 12 Mar 2018 14:58:32 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <20180311195150.GA216921@phare.normalesup.org>
References: <392967520.14691005.1520681202416.ref@mail.yahoo.com>
 <392967520.14691005.1520681202416@mail.yahoo.com>
 <CAGa7JC1-tbvyyoFhegzCRbUck5otpj3xPa=fqPmek738+_hsQw@mail.gmail.com>
 <20180310180235.GB3698923@phare.normalesup.org>
 <CAGa7JC38xTCdkbUYygUZAC3gbi9GC++w9hSR72=O6LmvTmfGBg@mail.gmail.com>
 <20180310193359.GA3818257@phare.normalesup.org>
 <CAGa7JC0rVdA3PnmepRDj=4RpAxGvh6jyvfD7P+y1P0u+GfaCgQ@mail.gmail.com>
 <20180311195150.GA216921@phare.normalesup.org>
Message-ID: <CAGa7JC0xAA9K_L_N_55JPJ+3KSCaZ26AnoWCoCQngCGPkZNj=g@mail.gmail.com>

20  Incidentally, since you have very strong opinions on what things
>
> should and shouldn?t be called: I don?t see the phrase ?French
> Alemannic? catching on at all :-)
>
I've not used that terminology. In France this is just called "alsacien"
(Alsatian in English) and descibed as one of the Alemannic
languages/dialects, and never German, nor Swiss, nor a combination of these
!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180312/eafc40b5/attachment.html>

From unicode at unicode.org  Mon Mar 12 09:30:39 2018
From: unicode at unicode.org (Andre Schappo via Unicode)
Date: Mon, 12 Mar 2018 14:30:39 +0000
Subject: Unicode 11.0 and 12.0 Cover Design Art
Message-ID: <E497ACB0-E1CA-460C-8633-41A3EDD84021@lboro.ac.uk>


One of my project students has an art gallery as a client ? surfacegallery.org<http://surfacegallery.org> This gallery is also a focal point for a collective of local artists.

This morning I had a project meeting with this student. I suggested that surface gallery artists might like to submit entries.

I showed the Unicode character set to the student and she was well impressed. I also suggested possible cover design art.

The basic principle of my suggestions was that the artwork should be constructed from Unicode characters and only Unicode characters. My suggestions included: plants, animals, portraits, cityscape, zoo, farm ...etc... If the artists collective use my suggestions then the unicode cover artwork they submit will most definitely feature Unicode.

Recent Unicode cover artwork has not featured Unicode (well not in any way that I can determine) and I think it should and it should feature it prominently and obviously.

I do not know who or how the artwork is judged but I think it would be good if members of this list could vote on the submitted cover artwork.

Andr? Schappo


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180312/eb919758/attachment.html>

From unicode at unicode.org  Mon Mar 12 09:55:28 2018
From: unicode at unicode.org (Michel Suignard via Unicode)
Date: Mon, 12 Mar 2018 14:55:28 +0000
Subject: Translating the standard
In-Reply-To: <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
 <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
 <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>
 <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17>
Message-ID: <DM5PR1901MB21972D3B49E4008B9570E675A2D30@DM5PR1901MB2197.namprd19.prod.outlook.com>

Time to correct some facts.
The French version of ISO/IEC 10646 (2003 version) were done in a separate effort by Canada and France NBs and not within SC2 proper. National bodies are always welcome to try to transpose and translate an ISO standard. But unless this is done by the ISO Sub-committee (SC2 here) itself, this is not a long-term solution. This was almost 15 years ago. I should know, I have been project editor for 10646 since October 2000 (I started as project editor in 1997 for part-2, and been involved in both Unicode and SC2 since 1990).

Now to some alternative facts:
>Since ISO has made of standards a business, all prior versions are removed from the internet, 
>so that they don?t show up even in that list (which I?d used to grab a free copy, just to check
> the differences). Because if they had public archives of the free standards, not having any 
>for the pay standards would stand out even more.
>This is why if you need an older version for reference, you need to find a good soul in
> the organization, who will be so kind to make a copy for you in the archives at the
> headquarters.

OK, yes, the old versions are removed from the ISO site. Andrew has probably easier access to older versions than you through BSI. He has been involved directly in SC2 work for many years. The 2003 version is completely irrelevant now anyway and again was not done by the SC, there was never a project editor for a French version of 10646.

>The last published French version of ISO/IEC 10646 ? to which you contributed ? is still available on
> Patrick?s site:
>
>http://hapax.qc.ca/Tableaux-5.0.htm

The only live part of that page is the code chart and does not correspond to the 1064:2003 itself (they are in fact Unicode 5.0 charts, however close to 10646:2003 and its first 2 amendments), I am not sure the original 10646:2003 (F), and the 2 translated amendments (1 and 2) are available anywhere and are totally obsolete today anyway. Only Canada and/or Afnor may still have archived versions.

>(I?d noticed that the contributors? list has slightly shrinked without being able to find out why.)
> The Code Charts have not been produced, however (because there is actually no
> redactor?in?chief, as already stated, and also because of budget cuts the government is not in
> a position to pay the non?trivial amount of money asked for by Unicode for use of the fonts
> and/or [just trying to be as precise as I can this time| the owner of the tooling needed).

A bunch of speculation here, never was a 'redactor-in-chief' for French version, Unicode never asked for money because first of all it does not own the tool (it is licensed by the tool owner who btw does this work as a giant goodwill gesture, based on the money received and the amount of work required to get this to work). In a previous message you also made some speculation about Apple role or possibility that have no relationship with reality.

>Having said that, I still believe that all ISO standards should have a French version, shouldn?t they??

You are welcome to contribute to that. Good luck though.

On a side note, I have been working with the same team of French volunteers to revive the French name list. So, this may re-appear in the Unicode web site at some point. Because I also produce the original code chart (in cooperation with Rick McGowan) for both ISO and Unicode it is a bit easier for me (although non-trivial). It also helps that I can read the French list :-). But the names list is probably as far as you want to go, and even that requires a serious amount of work in term of terms definition and production.

Michel


From unicode at unicode.org  Mon Mar 12 11:22:25 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 12 Mar 2018 17:22:25 +0100
Subject: Unicode 11.0 and 12.0 Cover Design Art
In-Reply-To: <E497ACB0-E1CA-460C-8633-41A3EDD84021@lboro.ac.uk>
References: <E497ACB0-E1CA-460C-8633-41A3EDD84021@lboro.ac.uk>
Message-ID: <CAGa7JC1dGXBRvgXEguYygf0bg49D_Xi-wibby3d-=vT_dWD2yw@mail.gmail.com>

The problem with some recent covers is that they either
- had no meaning (not even implied), they where just marble textures.
- or where too culturally centered, showing some scripts or a specific
projection of the Earth
- I have sent a proposal something that is culturally neutral, it evokates
a chart of characters, but not using any actual glyph, and suggesting some
maps with continents/islands, but not real maps, and easily
scalable/croppable at any resolution (It should be noted that the edition
will have several volumes and that the central vertical part of the image
may be variable. as well I avoided implying an horizontal or vertical
layout, placing the grid at uneven angles (about 30 degrees so that it
scales smoothly without visible artefacts). Also the pattern used is never
repeated (all tiles are unique but share some common general aspect, as if
it was a regular structure, but still irregular shapes, never twice the
same but still aligning cleaning with a semi-regular structure). I was
inspired by the beautiful blue mosaics I saw in Portugal.

You may of course have other ideas. But characters endoded in Unicode are
now very rich (and the glyphs for representing them and combining them are
even more rich if we also add the introduction of significant colors).

And the general principle was that this was just a background texture that
should not obscure the text/titles put on top of it (so it should have low
contrasting lines, and should be mostly unicolor, and reasonably dark or
pale, still attractive (avoiding low saturation levels of grays). As
several concepts are requested for several editions, we may vary these
ideas/concepts, including on the central cover border area, where the
Unicode logos and titles in smaller fonts should also be clearly
distinctive. As well the fine prints (e.g. name of the editor, or a small
abstract text on the background side (left part of the suggested image
canvas), without necessarily having to map an uniform background panel on
it (an uniform white rectangle will be needed for getting a clear
black&white barcode, and such insert should be also not more distractive to
the titles, meaning that the titles will most probably be white if we want
to avoid the uniform background behing them, and this suggests a moderately
dark or medium-light colored texture).

Some photos may be used of course, or some assembly. but it's hard to
predict the exact placement/centering of the photo if the cover size must
be adapted to the effective size of the central border area (depending on
the number of pages of each volume and the quality/grammage of paper used
for printed pages in the book). Given the small diffusion of the book and
its price, I think that cheap paper will be used to limit production costs
and allow "printing on demand" by publishers (or directly by some online
resellers such as Amazon, if they is permitted to print books themselves
via their partner publishers in the world, to save expedition costs and
storage costs).

It is also very likely that most sales could be now for electronic editions.

Complex patterns of contrasting lines should be limited to not cover the
whole area and should still allow easy placement of large titles, at their
placement suggested by the described template, and should avoid touching
the central vertical area (border cover) as well as some places needed for
usual small prints.


2018-03-12 15:30 GMT+01:00 Andre Schappo via Unicode <unicode at unicode.org>:

>
> One of my project students has an art gallery as a client ?
> surfacegallery.org This gallery is also a focal point for a collective of
> local artists.
>
> This morning I had a project meeting with this student. I suggested that
> surface gallery artists might like to submit entries.
>
> I showed the Unicode character set to the student and she was well
> impressed. I also suggested possible cover design art.
>
> The basic principle of my suggestions was that the artwork should be
> constructed from Unicode characters and only Unicode characters. My
> suggestions included: plants, animals, portraits, cityscape, zoo, farm
> ...etc... If the artists collective use my suggestions then the unicode
> cover artwork they submit will most definitely feature Unicode.
>
> Recent Unicode cover artwork has not featured Unicode (well not in any way
> that I can determine) and I think it should and it should feature it
> prominently and obviously.
>
> I do not know who or how the artwork is judged but I think it would be
> good if members of this list could vote on the submitted cover artwork.
>
> Andr? Schappo
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180312/89a28064/attachment.html>

From unicode at unicode.org  Mon Mar 12 14:21:49 2018
From: unicode at unicode.org (Rick McGowan via Unicode)
Date: Mon, 12 Mar 2018 12:21:49 -0700
Subject: IUC 42 - abstract submission deadline extended to March 16
Message-ID: <5AA6D34D.3080702@unicode.org>

Hello everyone,

The submission deadline for IUC 42 abstracts has been extended to 
Friday, March 16.

http://www.unicodeconference.org/call-for-participation.htm

Hope you can join us in September.

Regards,
     Rick


From unicode at unicode.org  Mon Mar 12 18:31:24 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Tue, 13 Mar 2018 00:31:24 +0100
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <20180312202825.GA1207055@phare.normalesup.org>
References: <392967520.14691005.1520681202416.ref@mail.yahoo.com>
 <392967520.14691005.1520681202416@mail.yahoo.com>
 <CAGa7JC1-tbvyyoFhegzCRbUck5otpj3xPa=fqPmek738+_hsQw@mail.gmail.com>
 <20180310180235.GB3698923@phare.normalesup.org>
 <CAGa7JC38xTCdkbUYygUZAC3gbi9GC++w9hSR72=O6LmvTmfGBg@mail.gmail.com>
 <20180310193359.GA3818257@phare.normalesup.org>
 <CAGa7JC0rVdA3PnmepRDj=4RpAxGvh6jyvfD7P+y1P0u+GfaCgQ@mail.gmail.com>
 <20180311195150.GA216921@phare.normalesup.org>
 <CAGa7JC0xAA9K_L_N_55JPJ+3KSCaZ26AnoWCoCQngCGPkZNj=g@mail.gmail.com>
 <20180312202825.GA1207055@phare.normalesup.org>
Message-ID: <CAGa7JC2EnUAA1SChCtPqcJD_-m8MrmmfjnccMhHygNjiAm+6EQ@mail.gmail.com>

2018-03-12 21:28 GMT+01:00 Arthur Reutenauer <
arthur.reutenauer at normalesup.org>:

> On Mon, Mar 12, 2018 at 02:58:32PM +0100, Philippe Verdy via Unicode wrote:
> >> should and shouldn?t be called: I don?t see the phrase ?French
> >> Alemannic? catching on at all :-)
> >
> > I've not used that terminology.
>
>   That?s true, you misspelt it as ?French Allemanic?.
>

False, I've not used that expression at all ! I only cited what you wrote,
so there was no typo at all, except possibly by you (the citation above
that you wrote yourself).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180313/18772808/attachment.html>

From unicode at unicode.org  Mon Mar 12 19:18:31 2018
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Mon, 12 Mar 2018 17:18:31 -0700
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <CAGa7JC2EnUAA1SChCtPqcJD_-m8MrmmfjnccMhHygNjiAm+6EQ@mail.gmail.com>
References: <392967520.14691005.1520681202416.ref@mail.yahoo.com>
 <392967520.14691005.1520681202416@mail.yahoo.com>
 <CAGa7JC1-tbvyyoFhegzCRbUck5otpj3xPa=fqPmek738+_hsQw@mail.gmail.com>
 <20180310180235.GB3698923@phare.normalesup.org>
 <CAGa7JC38xTCdkbUYygUZAC3gbi9GC++w9hSR72=O6LmvTmfGBg@mail.gmail.com>
 <20180310193359.GA3818257@phare.normalesup.org>
 <CAGa7JC0rVdA3PnmepRDj=4RpAxGvh6jyvfD7P+y1P0u+GfaCgQ@mail.gmail.com>
 <20180311195150.GA216921@phare.normalesup.org>
 <CAGa7JC0xAA9K_L_N_55JPJ+3KSCaZ26AnoWCoCQngCGPkZNj=g@mail.gmail.com>
 <20180312202825.GA1207055@phare.normalesup.org>
 <CAGa7JC2EnUAA1SChCtPqcJD_-m8MrmmfjnccMhHygNjiAm+6EQ@mail.gmail.com>
Message-ID: <745ca474-d259-bc95-7d0d-11fdd3f43f40@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180312/6c79a7d2/attachment.html>

From unicode at unicode.org  Mon Mar 12 21:49:04 2018
From: unicode at unicode.org (=?UTF-8?B?WWlmw6FuIFfDoW5n?= via Unicode)
Date: Tue, 13 Mar 2018 11:49:04 +0900
Subject: Translating the standard
In-Reply-To: <A52DCFBC-6D6B-4B69-B6D4-E41ED73235AE@alastairs-place.net>
References: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17>
 <20180311200503.GB216921@phare.normalesup.org>
 <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17>
 <A52DCFBC-6D6B-4B69-B6D4-E41ED73235AE@alastairs-place.net>
Message-ID: <CAF5KyEwrjRDJz98Ja0OXb6Q=fZy5ZW2wafVvLzOx6q7XBi4=FA@mail.gmail.com>

2018-03-12 16:39 GMT+09:00 Alastair Houghton via Unicode <unicode at unicode.org>:
> On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode <unicode at unicode.org> wrote:
>>
>> Indeed, to be fair. And for implementers, documenting themselves in English
>> may scarcely ever have much of a problem, no matter what?s the locale.
>
> Agreed.  Implementers will already understand English; you can?t write computer software without, since almost all documentation is in English, almost all computer languages are based on English, and, to be frank, a large proportion of the software market is itself English speaking.  I have yet to meet a software developer who didn?t speak English.

Somewhat digressing from the topic, but I'd like to make some comment
on this part as I smell a persistent myth among some, hopefully small number
of, software engineers in Anglosphere.

First, the fact that computer languages are written using English
words doesn't mean that programmers are supposed to have proportional
English knowledge. Take the word of Matz, the creator of Ruby
language: "The English skill is a super-powerful rare card (in the
career path of a Japanese engineer)!" He then continue that you should
be in keeping with most up-to-date overseas info/trend in order to be
a high-tier engineer and so on. It's far from "requirement".
http://eikaiwa.dmm.com/blog/3826/

I've also read somewhere a memoir of a middle-aged programmer who was
already into BASIC in childhood. One day he thought he'd written off a
"great" program and printed it on paper, but to his surprise, an
auntie who took a look at it immediately decoded the program and
roughly understood what it was meant to do; she knew English, and he
didn't.

Programming as such, is just like a Chinese room replaced with
English, where you sit inside a cramped room night after night,
communicating with a computer by typing in English words the bulky
reference guide teaches you. Most East Asian countries are blessed
enough with a tremendous number of translated technical publications
(e.g. O'Reilly) each year, not to mention firsthand writings in their
own languages. So the documentation is easily available if you don't
speak English the language.

Second, that English is lingua franca doesn't necessarily mean the
English spoken in the wild is. The aviation industry is another field
which employs English as the common language, but they exert utmost
effort to maintain the system working. Namely, they have a controlled
word set with semantics as disambiguated as possible, called
ASD-STE100, for technical documentation, such as maintenance manuals,
to minimize errors caused by limited English knowledge. Unicode, on
the other hand, is merely written in a free style used when English
speakers who (almost) graduated from college write to English speakers
who (almost) graduated from college. Having such level of proficiency
being a non-native speaker isn't something trivial, unless someone is
constantly in contact with English-speaking community. (And
programming community isn't contained inside English-speaking
community at all.)

That said, I agree to almost everything Alastair said after. If I have
to add one more thing, a monolingual writing is usually too tightly
coupled with the language, more than engineers may believe, even if
the writer carefully chose their words to be context-neutral. Thus
it's hard job to say no more and no less than the original text in
another language, especially when exactitude matters. It's one of the
problems prevent from fully automated translation being a thing, I
guess.

Best regards,

Yifan


2018-03-12 16:39 GMT+09:00 Alastair Houghton via Unicode <unicode at unicode.org>:
> On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode <unicode at unicode.org> wrote:
>>
>> Indeed, to be fair. And for implementers, documenting themselves in English
>> may scarcely ever have much of a problem, no matter what?s the locale.
>
> Agreed.  Implementers will already understand English; you can?t write computer software without, since almost all documentation is in English, almost all computer languages are based on English, and, to be frank, a large proportion of the software market is itself English speaking.  I have yet to meet a software developer who didn?t speak English.
>
> That?s not to say that people wouldn?t appreciate a translation of the standard, but there are, as others have pointed out, obvious maintenance problems, not to mention the issue that plagues some international institutions, namely the fact that translations are necessarily non-canonical and so those who really care about the details of the rules usually have to refer to a version in a particular language (sometimes that language might be French rather than English; very occasionally there are two versions declared, for political reasons, to both be canonical, which is obviously risky as there?s a chance they might differ subtly on some point, perhaps even because of punctuation).
>
> In terms of widespread understanding of the standard, which is where I think translation is perhaps more important, I?m not sure translating the actual standard itself is really the way forward.  It?d be better to ensure that there are reliable translations of books like Unicode Demystified or Unicode Explained - or, quite possibly, other books aimed more at the general public rather than the software community per se.
>
> Kind regards,
>
> Alastair.
>
> --
> http://alastairs-place.net
>
>


From unicode at unicode.org  Tue Mar 13 02:52:57 2018
From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode)
Date: Tue, 13 Mar 2018 16:52:57 +0900
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <392967520.14691005.1520681202416@mail.yahoo.com>
References: <392967520.14691005.1520681202416.ref@mail.yahoo.com>
 <392967520.14691005.1520681202416@mail.yahoo.com>
Message-ID: <beffcdb2-f057-1b39-91a7-8ba895e31157@it.aoyama.ac.jp>

On 2018/03/10 20:26, philip chastney via Unicode wrote:

> I would make the following observations on terminology in practice:

> -- the newspapers in Zurich advertised courses in "Hoch Deutsch", for those who needed to deal with foreigners

This should probably be written 'the newspapers in Zurich advertised 
courses in "Hochdeutsch", for foreigners'. Hochdeutsch (Standard German) 
is the language used in school, and in writing, and while there may be 
some specialized courses for Swiss people who didn't do well throughout 
grade school and want to catch up, that's not what the advertisements 
are about.


> -- in Luxemburg, the same language was referred to as Luxemburgish (or Letzeburgesch, which is Luxemburgish for "Luxemburgish ")
>      (I forget what the Belgians called the language spoken in Ostbelgien)
> 
> -- I was assured by a Luxemburgish-speaking car mechanic, with a Swiss German speaking wife, that the two languages (dialects?) were practically identical, except for the names of some household items

I can't comment on this, because I don't remember to ever have listened 
to somebody speaking Letzeburgesch.

> in short, there seems little point in making distinctions which cannot be precisely identified in practice
> 
> there appear to be significant differences between between High German and (what the natives call) Swiss German
> 
> there are far fewer significant differences between Swiss German and the other spoken Germanic languages found on the borders of Germany

In terms of linguistic analysis, that may be true. But virtually every 
native Swiss German speaker would draw a clear line between Swiss German 
(including the dialect(s) spoken in the upper Valais (Oberwallis), which 
are classified differently by linguists) and other varieties such as 
Swabian, Elsatian, Vorarlbergian, or even Letzeburgesch (which I have 
never seen classified as Allemannic)).

The reason for this is not so much basic linguistics, but much more a) 
vocabulary differences ranging from food to administrative terms, and b) 
the fact that people hear many different Swiss dialects on Swiss Radio 
and Television, while that's not the case for the dialects from outside 
the borders. So in practice, Swiss German can be delineated quite 
precisely, but more from a sociolinguistic and vocabulary perspective 
than from a purely evolutionary/historic linguistic perspective.

[Disclaimer: I'm not a linguist.]

Regards,   Martin.

From unicode at unicode.org  Tue Mar 13 03:39:57 2018
From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode)
Date: Tue, 13 Mar 2018 17:39:57 +0900
Subject: A sketch with the best-known Swiss tongue twister
In-Reply-To: <CAJ2xs_GS_mLpzRQd9mQyiBb=Fr5S_5nAC7g=4LoNfL5kV_EpRg@mail.gmail.com>
References: <CAJ2xs_Hdz7qdQ4pskf_mSdvFQyPd-sCEBmsskVLghXMQO=gQPg@mail.gmail.com>
 <CAGa7JC3Q0jKdTMPwmj4wf53nN0GJoVpx3OJ1dOUHkuJPFfbv3A@mail.gmail.com>
 <CAJ2xs_GS_mLpzRQd9mQyiBb=Fr5S_5nAC7g=4LoNfL5kV_EpRg@mail.gmail.com>
Message-ID: <d75e07a1-3ec0-4646-20eb-dfe553fda7ed@it.aoyama.ac.jp>

On 2018/03/09 21:24, Mark Davis ?? wrote:
> There are definitely many dialects across Switzerland. I think that for
> *this* phrase it would be roughly the same for most of the population, with
> minor differences (eg 'het' vs 'h?t'). But a native speaker like Martin
> would be able to say for sure.

Yes indeed. The differences would be in the vowels (not necessarily 
minor, but your mileage may vary), and the difficulty of this tongue 
twister is very much on the consonants.

Regards,   Martin.

From unicode at unicode.org  Mon Mar 12 23:46:40 2018
From: unicode at unicode.org (Lisa Moore via Unicode)
Date: Mon, 12 Mar 2018 21:46:40 -0700
Subject: Unicode 11.0 and 12.0 Cover Design Art
In-Reply-To: <E497ACB0-E1CA-460C-8633-41A3EDD84021@lboro.ac.uk>
References: <E497ACB0-E1CA-460C-8633-41A3EDD84021@lboro.ac.uk>
Message-ID: <6b9ce906-6d98-4bba-3f97-8b841dbc65ce@lisamoore.us.com>

Dear Andre,

Please encourage her and other artists to make a submission. The judges 
take in many different perspectives, some more character oriented and 
some more abstract. All are welcome submissions.

Thank you,

Lisa


On 3/12/2018 7:30 AM, Andre Schappo via Unicode wrote:
> surface gallery artists might like to submit entries.
>
> I showed the Unicode character set to the student and she


From unicode at unicode.org  Tue Mar 13 13:20:55 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Tue, 13 Mar 2018 19:20:55 +0100 (CET)
Subject: Translating the standard
In-Reply-To: <DM5PR1901MB21972D3B49E4008B9570E675A2D30@DM5PR1901MB2197.namprd19.prod.outlook.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <CAN49p6qcBafkmoy43Fe96EMyiYhSWAAj_phmq-9Tkv96k059jw@mail.gmail.com>
 <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
 <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
 <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>
 <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17>
 <DM5PR1901MB21972D3B49E4008B9570E675A2D30@DM5PR1901MB2197.namprd19.prod.outlook.com>
Message-ID: <894159522.24446.1520965256249.JavaMail.www@wwinf1p23>

On Mon, 12 Mar 2018 14:55:28 +0000, Michel Suignard wrote:
> 
> Time to correct some facts.
> The French version of ISO/IEC 10646 (2003 version) were done in a separate effort by Canada and France NBs and not within SC2 proper. 
> National bodies are always welcome to try to transpose and translate an ISO standard. But unless this is done by the ISO Sub-committee
> (SC2 here) itself, this is not a long-term solution. This was almost 15 years ago. I should know, I have been project editor for 10646 since 
> October 2000 (I started as project editor in 1997 for part-2, and been involved in both Unicode and SC2 since 1990).

Then it can be referred to as ?French version of ISO/IEC 10646? but I?ve got Andrew?s point, too.

> 
> Now to some alternative facts:
> >Since ISO has made of standards a business, all prior versions are removed from the internet, 
> >so that they don?t show up even in that list (which I?d used to grab a free copy, just to check
> > the differences). Because if they had public archives of the free standards, not having any 
> >for the pay standards would stand out even more.
> >This is why if you need an older version for reference, you need to find a good soul in
> > the organization, who will be so kind to make a copy for you in the archives at the
> > headquarters.
> 
> OK, yes, the old versions are removed from the ISO site. Andrew has probably easier access to older versions than you through BSI.
> He has been involved directly in SC2 work for many years. The 2003 version is completely irrelevant now anyway and again was not
> done by the SC, there was never a project editor for a French version of 10646.

Call him whatever, how can a project thrive without a head?

I think relevance is not the only criterium in evaluating a translation. The most important would probably 
be usefulness. Older versions are an appropriate means to get in touch with Unicode, as discussed when 
some old core specs were proposed on this list.

> 
> >The last published French version of ISO/IEC 10646 ? to which you contributed ? is still available on
> > Patrick?s site:
> >
> >http://hapax.qc.ca/Tableaux-5.0.htm
> 
> The only live part of that page is the code chart and does not correspond to the 1064:2003 itself (they are in fact Unicode 5.0 charts,
> however close to 10646:2003 and its first 2 amendments), I am not sure the original 10646:2003 (F), and the 2 translated amendments
> (1 and 2) are available anywhere and are totally obsolete today anyway. Only Canada and/or Afnor may still have archived versions.

Given that for each time some benevolent people have their nameslist translation ready for print, 
they have to pay the tool and the fonts ? just plainly disgusting. 

No wonder once you get such a localized Code Charts edition printed out in PDF, it has everlasting value!

> 
> >(I?d noticed that the contributors? list has slightly shrinked without being able to find out why.)
> > The Code Charts have not been produced, however (because there is actually no
> > redactor?in?chief, as already stated, and also because of budget cuts the government is not in
> > a position to pay the non?trivial amount of money asked for by Unicode for use of the fonts
> > and/or [just trying to be as precise as I can this time| the owner of the tooling needed).
> 
> A bunch of speculation here, never was a 'redactor-in-chief' for French version, Unicode never asked for money because first of all
> it does not own the tool (it is licensed by the tool owner who btw does this work as a giant goodwill gesture, based on the money received
> and the amount of work required to get this to work).

Shame! Unicode should manage to get the funding ? no problem for Apple! (but for Microsoft who had to fire many employees) ?
so that the developer is fully paid and rewarded. Why has Unicode no unlimited license? Because of the stinginess of those corporate
members that have plenty of money to waste. I?ll save that off?topic rant but without ceasing to insist that he must be paid, fully paid
and paid back and paid in the future, the more as the Code Charts are now printed annually and grow bigger and bigger.
It?s really up to the Consortium to gather the full license fee from their corporate members for the English version and any other 
interested locale. Unicode?s claim of mission encompasses logically making available for free as many localized Code Charts and
whatever else so far as benevolent people translate the sources. 

Shouldn?t that have been clear from the beginning on?

> In a previous message you also made some speculation about Apple role or possibility that have no relationship with reality.
> 
> >Having said that, I still believe that all ISO standards should have a French version, shouldn?t they??
> 
> You are welcome to contribute to that. Good luck though.
> 
> On a side note, I have been working with the same team of French volunteers to revive the French name list. So, this may re-appear
> in the Unicode web site at some point. Because I also produce the original code chart (in cooperation with Rick McGowan) for both ISO
> and Unicode it is a bit easier for me (although non-trivial). It also helps that I can read the French list :-). But the names list is probably
> as far as you want to go, and even that requires a serious amount of work in term of terms definition and production.

Indeed. I experience how true that is. There is a lot of discordance about how to call things.
E.g. didn?t you first translate TURNED (R and so on) to RETOURN? or to TOURN?? (I furiously hate CULBUT?).

I welcome your effort in updating the French part of the Unicode site. Actually this is so outdated that it is even 
disallowed to search engines! Here:
unicode.org/robots.txt
|
| Disallow: /fr/ # obsolete pages and charts

Shame over shame.
And when a guy works over the holidays to get the mess fixed, it is ignored by the UTC.

?Best regards,

Marcel


From unicode at unicode.org  Tue Mar 13 13:38:20 2018
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Tue, 13 Mar 2018 11:38:20 -0700
Subject: Translating the standard
In-Reply-To: <894159522.24446.1520965256249.JavaMail.www@wwinf1p23>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
 <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
 <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>
 <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17>
 <DM5PR1901MB21972D3B49E4008B9570E675
 A2D30@DM5PR1901MB2197.namprd19.prod.outlook.com>
 <894159522.24446.1520965256249.JavaMail.www@wwinf1p23>
Message-ID: <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180313/8897f3df/attachment.html>

From unicode at unicode.org  Tue Mar 13 14:55:01 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Tue, 13 Mar 2018 20:55:01 +0100
Subject: Translating the standard
In-Reply-To: <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
 <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
 <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>
 <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17>
 <894159522.24446.1520965256249.JavaMail.www@wwinf1p23>
 <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com>
Message-ID: <CAGa7JC1R6uyY+p-A3QRs7rMDru+VEcRJHA_45Aa_7ajaRW6q=w@mail.gmail.com>

It is then a version of the matching standards from Canadian and French
standard bodies. This does not make a big difference, except that those
national standards (last editions in 2003) are not kept in sync with
evolutions of the ISO/IEC standard. So it can be said that this was a
version for the 2003 version of the ISO/IEC standard, supported and
sponsored by some of their national members.

2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode <unicode at unicode.org>:

> On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote:
>
> On Mon, 12 Mar 2018 14:55:28 +0000, Michel Suignard wrote:
>
> Time to correct some facts.
> The French version of ISO/IEC 10646 (2003 version) were done in a separate effort by Canada and France NBs and not within SC2 proper.
> ...
>
> Then it can be referred to as ?French version of ISO/IEC 10646? but I?ve got Andrew?s point, too.
>
> Correction: if a project is not carried out by SC2 (the proper ISO/IEC
> subcommittee) then it is not a "version" of the ISO/IEC standard.
>
> A./
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180313/029174d5/attachment.html>

From unicode at unicode.org  Tue Mar 13 16:59:50 2018
From: unicode at unicode.org (John H. Jenkins via Unicode)
Date: Tue, 13 Mar 2018 15:59:50 -0600
Subject: Unicode 11.0 and 12.0 Cover Design Art
In-Reply-To: <6b9ce906-6d98-4bba-3f97-8b841dbc65ce@lisamoore.us.com>
References: <E497ACB0-E1CA-460C-8633-41A3EDD84021@lboro.ac.uk>
 <6b9ce906-6d98-4bba-3f97-8b841dbc65ce@lisamoore.us.com>
Message-ID: <6450641E-C9F1-4EDF-8715-C22D9A1D533F@apple.com>

Maybe we should just throw in the towel and put "DON'T PANIC" on the cover in big, friendly letters. ??


From unicode at unicode.org  Tue Mar 13 18:48:51 2018
From: unicode at unicode.org (Asmus Freytag (c) via Unicode)
Date: Tue, 13 Mar 2018 16:48:51 -0700
Subject: Translating the standard
In-Reply-To: <CAGa7JC1R6uyY+p-A3QRs7rMDru+VEcRJHA_45Aa_7ajaRW6q=w@mail.gmail.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
 <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
 <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>
 <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17>
 <894159522.24446.1520965256249.JavaMail.www@wwinf1p23>
 <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com>
 <CAGa7JC1R6uyY+p-A3QRs7rMDru+VEcRJHA_45Aa_7ajaRW6q=w@mail.gmail.com>
Message-ID: <dab616f9-abaa-ded2-2ac5-38180de91e02@ix.netcom.com>

On 3/13/2018 12:55 PM, Philippe Verdy wrote:
> It is then a version of the matching standards from Canadian and 
> French standard bodies. This does not make a big difference, except 
> that those national standards (last editions in 2003) are not kept in 
> sync with evolutions of the ISO/IEC standard. So it can be said that 
> this was a version for the 2003 version of the ISO/IEC standard, 
> supported and sponsored by some of their national members.

There is a way to transpose international standards to national 
standards, but they then pick up a new designation, e.g. ANSI for US or 
DIN for German or EN for European Norm.

A./
>
> 2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode 
> <unicode at unicode.org <mailto:unicode at unicode.org>>:
>
>     On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote:
>>     On Mon, 12 Mar 2018 14:55:28 +0000, Michel Suignard wrote:
>>>     Time to correct some facts.
>>>     The French version of ISO/IEC 10646 (2003 version) were done in a separate effort by Canada and France NBs and not within SC2 proper.
>>>     ...
>>     Then it can be referred to as ?French version of ISO/IEC 10646? but I?ve got Andrew?s point, too.
>     Correction: if a project is not carried out by SC2 (the proper
>     ISO/IEC subcommittee) then it is not a "version" of the ISO/IEC
>     standard.
>
>     A./
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180313/f367935d/attachment.html>

From unicode at unicode.org  Tue Mar 13 23:37:09 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Wed, 14 Mar 2018 05:37:09 +0100 (CET)
Subject: Translating the standard
In-Reply-To: <dab616f9-abaa-ded2-2ac5-38180de91e02@ix.netcom.com>
References: <CACKGfgXWT11Er0ghKJRqk8M_rSfOsLNyqVi86F_942K3Q6OyPA@mail.gmail.com>
 <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
 <CAGa7JC0d6uYoWQQHxMCVCP9ZQP2fXmEu4iaqEPBBOSvkZeffdQ@mail.gmail.com>
 <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
 <20180308090328.336a734f@JRWUBU2>
 <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>
 <53251075.18695.1520532306737.JavaMail.www@wwinf1m17>
 <20180308183304.GB2050855@phare.normalesup.org>
 <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17>
 <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net>
 <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17>
 <CALgEMhyKV9bYC8-CwCAnAeABpRzibu3L04RzGGuLTMocE8a4yg@mail.gmail.com>
 <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17>
 <894159522.24446.1520965256249.JavaMail.www@wwinf1p23>
 <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com>
 <CAGa7JC1R6uyY+p-A3QRs7rMDru+VEcRJHA_45Aa_7ajaRW6q=w@mail.gmail.com>
 <dab616f9-abaa-ded2-2ac5-38180de91e02@ix.netcom.com>
Message-ID: <10963408.182.1521002229592.JavaMail.www@wwinf1m17>

On?Tue, 13 Mar 2018 16:48:51 -0700,?Asmus Freytag (c) via Unicode wrote:

On 3/13/2018 12:55 PM, Philippe Verdy wrote:

It is then a version of the matching standards from Canadian and French standard bodies. This does not make a big difference, except that those national standards (last editions in 2003) are not kept in sync with evolutions of the ISO/IEC standard. So it can be said that this was a version for the 2003 version of the ISO/IEC standard, supported and sponsored by some of their national members.


There is a way to transpose international standards to national standards, but they then pick up a new designation, e.g. ANSI for US or DIN for German or EN for European Norm.

A./


2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode?:


On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote:

On Mon, 12 Mar 2018 14:55:28 +0000, Michel Suignard wrote:


Time to correct some facts.
The French version of ISO/IEC 10646 (2003 version) were done in a separate effort by Canada and France NBs and not within SC2 proper. 
...


Then it can be referred to as ?French version of ISO/IEC 10646? but I?ve got Andrew?s point, too.


Correction: if a project is not carried out by SC2 (the proper ISO/IEC subcommittee) then it is not a "version" of the ISO/IEC standard.

A./
?


Thanks for correction. And I confess and apologize that on Patrick?s French Unicode 5.0 Code Charts page (
http://hapax.qc.ca/Tableaux-5.0.htm
), there is no instance of "version", although the item is referred to as "ISO 10646:2003 (F)", from which it can ordinarily be inferred that "ISO" did back the project and that it is considered as the French version of the standard.
?
I wasn?t aware that this kind of parsing the facts is somewhat informal and shouldn?t be handled on mailing lists without a caveat.?
?
That said, the French transposition of ISO/IEC 10646 was not carried out as just sort of a joint venture of Canada and France (which btw has stepped out, leaving Qu?bec alone supporting the cost of future editions! Really ugly), given that it got feedback from numerous countries, part of which was written in French, and went through a heavy ballot process. Thus, getting it changed is not easy since it was approved by the time, and any change requests should be documented and are primarily damageable as threatening stability. Name changes affecting rare characters prove to be feasible, while on the other hand, syncing the French name of U+202F with common practice and TUS is obviously more complicated, which in turn compromises usability in UIs, where we?re therefore likely to use descriptors i.e. altered names for roughly half of the characters bearing a specific name. Somehow the same rationale as for UTN #24 but somewhat less apposite given that the French transposition is not constrained by stability policies.
?
Best regards,
?
Marcel
?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180314/e6595ee5/attachment.html>

From unicode at unicode.org  Wed Mar 14 06:55:08 2018
From: unicode at unicode.org (Andre Schappo via Unicode)
Date: Wed, 14 Mar 2018 11:55:08 +0000
Subject: Translating the standard
In-Reply-To: <CAF5KyEwrjRDJz98Ja0OXb6Q=fZy5ZW2wafVvLzOx6q7XBi4=FA@mail.gmail.com>
References: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17>
 <20180311200503.GB216921@phare.normalesup.org>
 <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17>
 <A52DCFBC-6D6B-4B69-B6D4-E41ED73235AE@alastairs-place.net>
 <CAF5KyEwrjRDJz98Ja0OXb6Q=fZy5ZW2wafVvLzOx6q7XBi4=FA@mail.gmail.com>
Message-ID: <BB62C755-A250-4639-A5EC-CB826B1C9317@lboro.ac.uk>


On 13 Mar 2018, at 02:49, Yif?n W?ng via Unicode <unicode at unicode.org<mailto:unicode at unicode.org>> wrote:

Somewhat digressing from the topic, but I'd like to make some comment
on this part as I smell a persistent myth among some, hopefully small number
of, software engineers in Anglosphere.

First, the fact that computer languages are written using English
words doesn't mean that programmers are supposed to have proportional
English knowledge. Take the word of Matz, the creator of Ruby
language: "The English skill is a super-powerful rare card (in the
career path of a Japanese engineer)!" He then continue that you should
be in keeping with most up-to-date overseas info/trend in order to be
a high-tier engineer and so on. It's far from "requirement".
http://eikaiwa.dmm.com/blog/3826/

I've also read somewhere a memoir of a middle-aged programmer who was
already into BASIC in childhood. One day he thought he'd written off a
"great" program and printed it on paper, but to his surprise, an
auntie who took a look at it immediately decoded the program and
roughly understood what it was meant to do; she knew English, and he
didn't.

Programming as such, is just like a Chinese room replaced with
English, where you sit inside a cramped room night after night,
communicating with a computer by typing in English words the bulky
reference guide teaches you. Most East Asian countries are blessed
enough with a tremendous number of translated technical publications
(e.g. O'Reilly) each year, not to mention firsthand writings in their
own languages. So the documentation is easily available if you don't
speak English the language.

Second, that English is lingua franca doesn't necessarily mean the
English spoken in the wild is. The aviation industry is another field
which employs English as the common language, but they exert utmost
effort to maintain the system working. Namely, they have a controlled
word set with semantics as disambiguated as possible, called
ASD-STE100, for technical documentation, such as maintenance manuals,
to minimize errors caused by limited English knowledge. Unicode, on
the other hand, is merely written in a free style used when English
speakers who (almost) graduated from college write to English speakers
who (almost) graduated from college. Having such level of proficiency
being a non-native speaker isn't something trivial, unless someone is
constantly in contact with English-speaking community. (And
programming community isn't contained inside English-speaking
community at all.)

That said, I agree to almost everything Alastair said after. If I have
to add one more thing, a monolingual writing is usually too tightly
coupled with the language, more than engineers may believe, even if
the writer carefully chose their words to be context-neutral. Thus
it's hard job to say no more and no less than the original text in
another language, especially when exactitude matters. It's one of the
problems prevent from fully automated translation being a thing, I
guess.

Best regards,

Yifan


When it comes to program identifiers, languages such as Chinese has a huge advantage as it is a much more compact language than English. So, one can write meaningful identifier names with a small number of Chinese characters. In an all Chinese development team, producing software for the Chinese market, why not have the program identifiers written in Chinese? Or maybe this does happen?

Over the years I have talked with many Chinese students about this, and usually they tell me something like: "Our lecturers in China tell us to always use English for program identifiers".

I make use of several languages for my program identifiers ? jsfiddle.net/user/coas/fiddles<http://jsfiddle.net/user/coas/fiddles> My use of non English languages for program identifiers is somewhat random??

Andr? Schappo


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180314/900dceb7/attachment.html>

From unicode at unicode.org  Thu Mar 15 16:06:32 2018
From: unicode at unicode.org (Adam Borowski via Unicode)
Date: Thu, 15 Mar 2018 22:06:32 +0100
Subject: Unicode 11.0 and 12.0 Cover Design Art
In-Reply-To: <6450641E-C9F1-4EDF-8715-C22D9A1D533F@apple.com>
References: <E497ACB0-E1CA-460C-8633-41A3EDD84021@lboro.ac.uk>
 <6b9ce906-6d98-4bba-3f97-8b841dbc65ce@lisamoore.us.com>
 <6450641E-C9F1-4EDF-8715-C22D9A1D533F@apple.com>
Message-ID: <20180315210632.t55kom7rwh2s3xgj@angband.pl>

On Tue, Mar 13, 2018 at 03:59:50PM -0600, John H. Jenkins via Unicode wrote:
> Maybe we should just throw in the towel and put "DON'T PANIC" on the cover
> in big, friendly letters.  ??

But what script would you use?
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
??? ???? ?????? ????? ??? ????
??? ???? ?????? ????? ?? ????
??? ???? ?????? ????? ??? ????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
??????????????????????????????
??? ???? ?????? ????? ??? ????
?????? ???????? ???????????? ?????????? y???? ???????
???? ???? ?????? ?????? ???? ?????
? w? sc?t w??d ?? ???
?t w?t sc?pt ??ld ?? ???
?????? ??????? ???????????? ?????????? ?????? ???????
?t w?t sc?pt ???ld ?? ???
But what script would you use?
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? w?????? ???????????? w???????? y???? ???????
???? ???? ?????? ????? ??? ????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
?????? ???????? ???????????? ?????????? ?????? ???????
????? ???????? ???????????? ????????? ????? ??????
B?? ???? ?????? ????? ??? ????
?????? ???????? ???????????? ?????????? ?????? ???????

-- 
??????? 
??????? A dumb species has no way to open a tuna can.
??????? A smart species invents a can opener.
??????? A master species delegates.

From unicode at unicode.org  Fri Mar 16 19:56:09 2018
From: unicode at unicode.org (Ed Borgquist via Unicode)
Date: Fri, 16 Mar 2018 17:56:09 -0700
Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones
Message-ID: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net>

Hello All,

The Full Emoji List [1] had, in the past, displayed Emoji with all skin tone variants. It seems that this is no longer the case. Does anyone know if it is possible that this could return in the future?

This data was useful for myself, as scraping this data allowed for me to identify "homographic" Emoji from a variety of vendors. Additionally, I could see how vendors approached skin tone variants for difficult-to-distinguish Emoji (for example, SNOWBOARDER often features a person with no visible skin).

[1] https://unicode.org/emoji/charts/full-emoji-list.html

Kindest Regards,
?
Ed Borgquist
.WS Registry


From unicode at unicode.org  Sat Mar 17 07:19:54 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Sat, 17 Mar 2018 13:19:54 +0100
Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones
In-Reply-To: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net>
References: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net>
Message-ID: <CAJ2xs_FPWL+xgV47nEyX7Q5JPAF1EDvnpmjA4-6Y0q5xQ_hhkg@mail.gmail.com>

We were getting so much traffic on the emoji pages that we had to produce
an abbreviated version to reduce the load (without skin tones, it is about
half the size).

We are looking at improvements to the infrastructure and/or chart design
that would let us restore them, but people are busy with other Unicode
projects right now.

Mark

On Sat, Mar 17, 2018 at 1:56 AM, Ed Borgquist via Unicode <
unicode at unicode.org> wrote:

> Hello All,
>
> The Full Emoji List [1] had, in the past, displayed Emoji with all skin
> tone variants. It seems that this is no longer the case. Does anyone know
> if it is possible that this could return in the future?
>
> This data was useful for myself, as scraping this data allowed for me to
> identify "homographic" Emoji from a variety of vendors. Additionally, I
> could see how vendors approached skin tone variants for
> difficult-to-distinguish Emoji (for example, SNOWBOARDER often features a
> person with no visible skin).
>
> [1] https://unicode.org/emoji/charts/full-emoji-list.html
>
> Kindest Regards,
>
> Ed Borgquist
> .WS Registry
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180317/62581c79/attachment.html>

From unicode at unicode.org  Sat Mar 17 11:43:49 2018
From: unicode at unicode.org (Ed Borgquist via Unicode)
Date: Sat, 17 Mar 2018 09:43:49 -0700
Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones
References: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net>
 <CAJ2xs_FPWL+xgV47nEyX7Q5JPAF1EDvnpmjA4-6Y0q5xQ_hhkg@mail.gmail.com>
Message-ID: <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net>

Thanks for the information. Does Unicode make public the source images received from vendors? Or, is there somewhere else you would recommend for me to look?

 
Kindest Regards,

 
Ed Borgquist

.WS Registry

 
From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis ??
Sent: Saturday, March 17, 2018 5:20 AM
To: Ed Borgquist
Cc: Unicode Public
Subject: Re: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones

 
We were getting so much traffic on the emoji pages that we had to produce an abbreviated version to reduce the load (without skin tones, it is about half the size). 

 
We are looking at improvements to the infrastructure and/or chart design that would let us restore them, but people are busy with other Unicode projects right now.


Mark

 
On Sat, Mar 17, 2018 at 1:56 AM, Ed Borgquist via Unicode <unicode at unicode.org> wrote:

Hello All,

The Full Emoji List [1] had, in the past, displayed Emoji with all skin tone variants. It seems that this is no longer the case. Does anyone know if it is possible that this could return in the future?

This data was useful for myself, as scraping this data allowed for me to identify "homographic" Emoji from a variety of vendors. Additionally, I could see how vendors approached skin tone variants for difficult-to-distinguish Emoji (for example, SNOWBOARDER often features a person with no visible skin).

[1] https://unicode.org/emoji/charts/full-emoji-list.html

Kindest Regards,
 
Ed Borgquist
.WS Registry

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180317/3a32e700/attachment.html>

From unicode at unicode.org  Sat Mar 17 12:23:51 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Sat, 17 Mar 2018 17:23:51 +0000
Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones
In-Reply-To: <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net>
References: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net>
 <CAJ2xs_FPWL+xgV47nEyX7Q5JPAF1EDvnpmjA4-6Y0q5xQ_hhkg@mail.gmail.com>
 <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net>
Message-ID: <CAJ2xs_EQ5R5vLin8crU_D65wPC7PhoTPtjE8d5izS5qMqfPqBg@mail.gmail.com>

You can take a look at emojipedia. They have a good set of information
about emoji glyphs.

{phone}

On Sat, Mar 17, 2018, 17:44 Ed Borgquist <ed.borgquist at website.ws> wrote:

> Thanks for the information. Does Unicode make public the source images
> received from vendors? Or, is there somewhere else you would recommend for
> me to look?
>
>
>
> Kindest Regards,
>
>
>
> Ed Borgquist
>
> .WS Registry
>
>
>
> *From:* mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] *On
> Behalf Of *Mark Davis ??
> *Sent:* Saturday, March 17, 2018 5:20 AM
> *To:* Ed Borgquist
> *Cc:* Unicode Public
> *Subject:* Re: Full Emoji List Chart No Longer Displaying Emoji with
> Skin-tones
>
>
>
> We were getting so much traffic on the emoji pages that we had to produce
> an abbreviated version to reduce the load (without skin tones, it is about
> half the size).
>
>
>
> We are looking at improvements to the infrastructure and/or chart design
> that would let us restore them, but people are busy with other Unicode
> projects right now.
>
>
> Mark
>
>
>
> On Sat, Mar 17, 2018 at 1:56 AM, Ed Borgquist via Unicode <
> unicode at unicode.org> wrote:
>
> Hello All,
>
> The Full Emoji List [1] had, in the past, displayed Emoji with all skin
> tone variants. It seems that this is no longer the case. Does anyone know
> if it is possible that this could return in the future?
>
> This data was useful for myself, as scraping this data allowed for me to
> identify "homographic" Emoji from a variety of vendors. Additionally, I
> could see how vendors approached skin tone variants for
> difficult-to-distinguish Emoji (for example, SNOWBOARDER often features a
> person with no visible skin).
>
> [1] https://unicode.org/emoji/charts/full-emoji-list.html
>
> Kindest Regards,
>
> Ed Borgquist
> .WS Registry
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180317/d5ea27c6/attachment.html>

From unicode at unicode.org  Mon Mar 19 10:00:04 2018
From: unicode at unicode.org (=?UTF-8?Q?Christoph_P=C3=A4per?= via Unicode)
Date: Mon, 19 Mar 2018 16:00:04 +0100 (CET)
Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones
In-Reply-To: <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net>
References: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net>
 <CAJ2xs_FPWL+xgV47nEyX7Q5JPAF1EDvnpmjA4-6Y0q5xQ_hhkg@mail.gmail.com>
 <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net>
Message-ID: <1583983326.32942.1521471604856@ox.hosteurope.de>

Ed Borgquist:
> 
> Thanks for the information. Does Unicode make public the source images received from vendors? Or, is there somewhere else you would recommend for me to look?

Besides Emojipedia.org, you will find most current sets at <https://github.com/iamcal/emoji-data> and many old sets at <https://github.com/Crissov/original-emoji>.

From unicode at unicode.org  Fri Mar 23 08:00:36 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Fri, 23 Mar 2018 14:00:36 +0100
Subject: Unicode Utilities
Message-ID: <CAJ2xs_GShMPEH7JJuvswYtiN=4RwCTpEy_nBpNu0xs+YWgQfwg@mail.gmail.com>

For testing, the Unicode Utilities now support the Unicode beta properties
(with some caveats). Example: \p{gc?=Lu}-\p{gc=Lu}
<https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cp%7Bgc%CE%B2%3DLu%7D-%5Cp%7Bgc%3DLu%7D&g=&i=>
.

Thanks to Sascha for helping to move to different infrastructure for the
utilities...

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180323/704c8cb7/attachment.html>

From unicode at unicode.org  Mon Mar 26 11:51:55 2018
From: unicode at unicode.org (William_J_G Overington via Unicode)
Date: Mon, 26 Mar 2018 17:51:55 +0100 (BST)
Subject: Accessibility Emoji
In-Reply-To: <8570587.39967.1522077150247.JavaMail.root@webmail02.bt.ext.cpcloud.co.uk>
References: <8570587.39967.1522077150247.JavaMail.root@webmail02.bt.ext.cpcloud.co.uk>
Message-ID: <17804074.45213.1522083115554.JavaMail.defaultUser@defaultHost>

I have been looking with interest at the following publication.

Proposal For New Accessibility Emoji

by Apple Inc.

www.unicode.org/L2/L2018/18080-accessibility-emoji.pdf

I am supportive of the proposal. Indeed please have more such emoji as well.

In relation to the two dogs.

My own (limited) experience of guide dogs for people with a vision disability, just from seeing them in the street and on television is that in the United Kingdom the dogs often have a yellow protective coat with silvery strips on them so that they can be more easily seen. It may also help them being more readily recognised as each being a guide dog. The dogs tend to be of a type of dog of rather wider aspect ratio, if that is the way to put it, than the dog in the sample glyph in the proposal document. The dogs tend to be a creamy yellow colour, though there was a famous guide dog who was all black, famous as the guide dog was allowed to accompany a then Member of Parliament into the House of Commons Chamber in London.

So, while the two rod guide handle, contrasted with a floppy lead, is a good disambiguation guide for the two types of assistance dogs, I suggest that using the presence of what the proposal terms a vest for disambiguation may not be appropriate.

Also the word vest appears to have different meanings in British English and American English.

Maybe jacket might be better choice of word than vest for the standards document.

What about the colour and type of the dog? Perhaps easier to add in now than later?

What about a person with a hidden disability? Many people have a hidden disability yet do not have a service dog as the nature of the particular hidden disability or maybe hidden disabilities does not need the help of a service dog.

Should there be an emoji for a person with a hidden disability? Or maybe more than one such emoji so as to disambiguate the types of hidden disability, always remembering to have an "other hidden disability" emoji so as to include all types of hidden disability?

Those questions, and indeed the whole proposal document, lead to asking for what purposes these emoji are envisioned as becoming used?

For example, a person with a hidden disability might not like to be referred to as such, yet may like to describe himself or herself as having a hidden disability if trying to find appropriate facilities relevant to the particular disability, such as a toilet for a person with a disability with the additional facilities thereof, or seeking access to a chair or a first-aid room, or seeking help for opening a door, or maybe when requesting a special diet, such as a gluten-free diet.

How could the accessibility emoji in the proposal be used in practice?

William Overington

Monday 26 March 2018

From unicode at unicode.org  Thu Mar 29 05:38:51 2018
From: unicode at unicode.org (William_J_G Overington via Unicode)
Date: Thu, 29 Mar 2018 11:38:51 +0100 (BST)
Subject: Accessibility Emoji
In-Reply-To: <10505691.13022.1522317252542.JavaMail.root@webmail01.bt.ext.cpcloud.co.uk>
References: <10505691.13022.1522317252542.JavaMail.root@webmail01.bt.ext.cpcloud.co.uk>
Message-ID: <16510571.16509.1522319931314.JavaMail.defaultUser@defaultHost>

I have been thinking about issues around the proposal.

http://www.unicode.org/L2/L2018/18080-accessibility-emoji.pdf

There is a sentence in that document that starts as follows.

> Emoji are a universal language and a powerful tool for communication, ....

It seems to me that what is lacking with emoji are verbs and pronouns.

For example, "to be", "to have" and "to need". The verb "to need" might well be of particular importance in relation to accessibility considerations.

How could verbs be introduced into emoji? The verb "to love" can already be indicated using a heart symbol.

Should abstract designs be used? Or should emoji always be pictographic?

If abstract designs were introduced would it be possible for the standards documents to include the meanings or would the standards documents need to simply use a geometrical description and then the meanings be regarded as a higher level protocol outside of the standard?

For, if abstract emoji were introduced with the intention of them to be of use as verbs in a universal language, it would be of benefit if the meanings were in the standard.

If abstract designs were used then the meanings would need to be learned. Yet if the meanings were universal that could be a useful development.

I have wondered whether verb tenses could be usefully expressed using some of the existing combining accent characters following an emoji verb character..

For example, U+0302 COMBINING CIRCUMFLEX ACCENT to indicate that the verb is in the future tense, U+0304 COMBINING MACRON to indicate that the verb is in the present tense, U+030C COMBINING CARON to indicate that the verb is in the past tense, U+0303 COMBINING TILDE to indicate that the verb is in the conditional tense.

The desirability of pronouns was raised by a gentleman in the audience of a lecture at the Internationalization and Unicode Conference in 2015.

I tried to produce some designs. I could not find a way to do that with conventional illustrative pictures, though I did produce a set of abstract designs that could possibly be useful in application; they could be displayed in colourful emoji style yet also in monochrome without ambiguity. Yet they are abstract designs, so meanings would need to be learned rather than indicated by the picture itself. Yet if the meanings were universal, that could be useful. Should there be abstract emoji or should emoji only be conventional pictures?

William Overington

Thursday 29 March 2018


From unicode at unicode.org  Thu Mar 29 17:23:06 2018
From: unicode at unicode.org (fantasai via Unicode)
Date: Thu, 29 Mar 2018 15:23:06 -0700
Subject: Sentence_Break, Semi-colons, and Apparent Miscategorization
In-Reply-To: <CAJ2xs_FfWHuAeqUDoDYX-82y1HX2agP8dq5D-i743AP-P6j5mQ@mail.gmail.com>
References: <a1344ff7-d3a4-80f3-03c1-4d88b3a70ad3@inkedblade.net>
 <CAJ2xs_FfWHuAeqUDoDYX-82y1HX2agP8dq5D-i743AP-P6j5mQ@mail.gmail.com>
Message-ID: <42a17725-bda9-0268-0296-e88b1b3c26a3@inkedblade.net>

On 03/08/2018 07:04 AM, Mark Davis ?? wrote:
>  From the first line, I guess you mean that all three questions are having to do with the Sentence_Break property values. Namely:
> 
> http://www.unicode.org/reports/tr29/proposed.html#Table_Sentence_Break_Property_Values
> http://www.unicode.org/reports/tr29/proposed.html#SContinue

Yes.

> On Thu, Mar 8, 2018 at 9:25 AM, fantasai via Unicode <unicode at unicode.org <mailto:unicode at unicode.org>> wrote:
> 
> > Given that the comma and colon are categorized as SContinue,
> > why is the semicolon also not SContinue?
> 
> > Also, why is the Greek Question Mark not categorized with
> > the rest of the question marks?
> 
> ?As I recall?,?? both are because the semicolon can also represent a greek question mark
> (they are canonically equivalent, so you can't reliably distinguish between them).?

:/

I'm guessing this is why all other semicolons (which don't have
this problem) are also categorized as Other instead of SContinue?

Given SContinue is a set of punctuation that's ?softer? than
STerm, it seems to me it would make more sense to categorize
them all (including the Greek question mark) as SContinue,
and then allow implementations to tailor the Greek question
mark and semicolon to STerm as needed. Leaving them all under
Other means that all semicolons would have to be individually
tailored out of Other, which seems much more error-prone.

> > Why aren't the vertical presentation forms categorized with
> > the things they are presenting?
> 
> ?At least some of them are:
> U+FE10 ( ? ) PRESENTATION FORM FOR VERTICAL COMMA
> U+FE11 ( ? ) PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
> U+FE13 ( ? ) PRESENTATION FORM FOR VERTICAL COLON
> U+FE31 ( ? ) PRESENTATION FORM FOR VERTICAL EM DASH
> U+FE32 ( ? ) PRESENTATION FORM FOR VERTICAL EN DASH

Yes, but others aren't:
  ? 	U+FE12	PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
  ? 	U+FE15	PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK
  ? 	U+FE16	PRESENTATION FORM FOR VERTICAL QUESTION MARK

https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AGeneral_category%3DPo%3A%5D&g=Sentence_Break&i=

I'm also wondering about Armenian, Coptic, and Ethiopic
   * Armenian exclamation mark and question mark are Other,
     whereas Latin (ASCII) places them as STerm.
   * None of the Coptic punctuation is categorized as non-Other,
     not even the full stop which I'd expect under STerm.
   * Ethiopic comma and colon are not grouped with commas and
     colons in general under SContinue.
Were these intentionally or accidentally placed under Other?

~fantasai


From unicode at unicode.org  Thu Mar 29 19:31:48 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Fri, 30 Mar 2018 02:31:48 +0200 (CEST)
Subject: Accessibility Emoji
Message-ID: <32693654.14872.1522369908394.JavaMail.www@wwinf1h30>

William,
?
On 29/03/18 17:03 William_J_G Overington via Unicode wrote:
>?
> I have been thinking about issues around the proposal.
> http://www.unicode.org/L2/L2018/18080-accessibility-emoji.pdf
> There is a sentence in that document that starts as follows.
>?
> > Emoji are a universal language and a powerful tool for communication, ....
?
That is clearly overstating the capabilities of emoji, and ignoring the borderline?
between verbal and pictographic expression. The appropriateness of each one?
depends mainly on semantics and context. The power of emoji may rely in their?
being polysemic, escaping censorship as already discussed during past years.
>?
> It seems to me that what is lacking with emoji are verbs and pronouns.
?
Along with these, one would need more nouns, too, setting up an autonomous?
language. That however is not the goal of emoji and is outside the scope of?
Unicode.
>?
> For example, "to be", "to have" and "to need". The verb "to need" might well?
be of particular importance in relation to accessibility considerations.
?
When accessibility matters, devices may be missing, and then the symbol charts?
are most appropriate, as seen. When somebody is pointing an object, the ?need??
case is most obvious anyway. Impaired persons may use a bundle of cards including?
textual messages. None of these justifies encoding extra emoji. E.g. when somebody?
wishes a relative to buy more bread while returning from work, the appropriate number
of loaves followed by an exclamation mark and a smile or heart may do it.
>?
> How could verbs be introduced into emoji? The verb "to love" can already be indicated using a heart symbol.
?
This is the one that people are likely to be most embarrassed typing out.?
>?
> Should abstract designs be used? Or should emoji always be pictographic?
?
Yes, they should always be highly iconic, Asmus explained in detail. See:
?
http://www.unicode.org/mail-arch/unicode-ml/y2015-m08/0014.html
>?
> If abstract designs were introduced would it be possible for the standards documents to include the meanings
> or would the standards documents need to simply use a geometrical description and then the meanings be
> regarded as a higher level protocol outside of the standard?
?
On one hand, Unicode does not encode semantics; but on the other hand, on character level, semantics are?
part of the documentation accompanying a number of characters in the Charts. There is a balance between?
polysemics and disambiguation. As a thumb rule: characters are disambiguated to ensure correct processing
of the data, so far as the cost induced by handling multiple characters doesn?t outweigh the benefit.?
In putting your question, you already answered it, except that there are geometric figures encoded for UIs,?
that therefore already have a meaning, yet are mostly generically named, leaving the door open to alternate?
semantics.
>?
> For, if abstract emoji were introduced with the intention of them to be of use as verbs in a universal language,
> it would be of benefit if the meanings were in the standard.
?
But such a language has clearly been stated as being out of scope of Unicode, and we aren?t even allowed?
to further discuss that particular topic, given the mass of threads and e?mails already dedicated to it in the past.
>?
> If abstract designs were used then the meanings would need to be learned. Yet if the meanings were
> universal that could be a useful development.
?
It would not, because automatic translation tools already cater for these needs, and possibly better. See:
?
http://unicode.org/pipermail/unicode/2015-October/003005.html
>?
> I have wondered whether verb tenses could be usefully expressed using some of the existing combining
> accent characters following an emoji verb character..
?
First of all, users should be likely to adopt the scheme in a fairly predictable way. I?m ignoring actual trends?
and can only repeat what has been said on this list: communities are missing, and so is interest.?
Hence, sadly to say, there is little through no point in elaborating further.
Personally I?m poorly armed to help building a user community, as I don?t have a smartphone, while being?
very busy with more and more tasks, leaving little time for many experiments. ?Sorry.
?
Best regards,
?
Marcel
>?
> For example, U+0302 COMBINING CIRCUMFLEX ACCENT to indicate that the verb is in the future tense, U+0304 COMBINING MACRON to indicate that the verb is in the present tense, U+030C COMBINING CARON to indicate that the verb is in the past tense, U+0303 COMBINING TILDE to indicate that the verb is in the conditional tense.
>?
> The desirability of pronouns was raised by a gentleman in the audience of a lecture at the Internationalization and Unicode Conference in 2015.
>?
> I tried to produce some designs. I could not find a way to do that with conventional illustrative pictures, though I did produce a set of abstract designs that could possibly be useful in application; they could be displayed in colourful emoji style yet also in monochrome without ambiguity. Yet they are abstract designs, so meanings would need to be learned rather than indicated by the picture itself. Yet if the meanings were universal, that could be useful. Should there be abstract emoji or should emoji only be conventional pictures?
>?
> William Overington
>?
> Thursday 29 March 2018
>?
>?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20180330/a11109b5/attachment.html>