From cldr-users at unicode.org  Fri Feb  2 17:53:19 2018
From: cldr-users at unicode.org (Kip Cole via CLDR-Users)
Date: Sat, 3 Feb 2018 10:53:19 +1100
Subject: pt-BR in XML but not Json
Message-ID: <BF4BC9CA-54C4-4F40-AE89-856E477440AC@gmail.com>

I observe that the XML master data for CLDR versions 31 and 32 include the locale "pt-BR" but the json repo does not.  May I ask if:

(a) I should file an issue because the json repo should include pt-BR?
(b) the locale "pt" is considered to be Brazilian Portuguese as some googling suggests?

Many thanks,

Kip

From cldr-users at unicode.org  Sun Feb  4 08:21:06 2018
From: cldr-users at unicode.org (Rafael Xavier via CLDR-Users)
Date: Sun, 4 Feb 2018 12:21:06 -0200
Subject: pt-BR in XML but not Json
In-Reply-To: <BF4BC9CA-54C4-4F40-AE89-856E477440AC@gmail.com>
References: <BF4BC9CA-54C4-4F40-AE89-856E477440AC@gmail.com>
Message-ID: <CADdLYsoTXUcQkoZfsMO9-HCZJPmKqfMGSzRcYAYn8yue9tSEfw@mail.gmail.com>

Hi Kip,

Your item (b) is correct... You may notice that pt_BR.xml
<https://www.unicode.org/repos/cldr/trunk/common/main/pt_BR.xml> is empty,
which means it's the default content
<http://cldr.unicode.org/translation/default-content> for pt.xml
<https://www.unicode.org/repos/cldr/trunk/common/main/pt.xml>. Note
pt_PT.xml <https://www.unicode.org/repos/cldr/trunk/common/main/pt_PT.xml>
has "Portuguese as spoken in Portugal" overrides only. Others could provide
additional details and correct me if I'm wrong.
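
For anyone consuming the JSON data, the default-content relationship described above can be handled with a simple lookup fallback. What follows is only a minimal sketch, not CLDR's own resolution logic: `resolve_locale`, `has_locale` and the `available` list are hypothetical names, and the code merely drops trailing subtags, ignoring the explicit parent-locale data that CLDR also defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `tag` is one of the locales that
 * actually ship data (for example, the directories of the JSON repo). */
static int has_locale(const char *tag, const char *const *available, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tag, available[i]) == 0)
            return 1;
    }
    return 0;
}

/* Resolve `requested` (e.g. "pt-BR") to the closest available locale by
 * dropping trailing subtags one at a time ("pt-BR" -> "pt").
 * The result is written to `out`. Returns 1 on success, 0 if nothing
 * matched or `out` is too small to hold the requested tag. */
int resolve_locale(const char *requested, const char *const *available,
                   size_t n, char *out, size_t out_size)
{
    assert(requested != NULL && out != NULL);

    size_t len = strlen(requested);
    if (len + 1 > out_size)
        return 0;
    memcpy(out, requested, len + 1);

    for (;;) {
        if (has_locale(out, available, n))
            return 1;
        char *sep = strrchr(out, '-');
        if (sep == NULL)
            return 0;      /* no subtags left to drop */
        *sep = '\0';       /* drop the last subtag and try again */
    }
}
```

With an available list of "pt", "pt-PT" and "en", a request for "pt-BR" resolves to "pt", while "pt-PT" resolves to itself.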

Best,

On Fri, Feb 2, 2018 at 9:53 PM, Kip Cole via CLDR-Users <
cldr-users at unicode.org> wrote:

> I observe that the XML master data for CLDR versions 31 and 32 include the
> locale "pt-BR" but the json repo does not.  May I ask if:
>
> (a) I should file an issue because the json repo should include pt-BR?
> (b) the locale "pt" is considered to be Brazilian Portuguese as some
> googling suggests?
>
> Many thanks,
>
> Kip
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>


-- 
+55 (16) 98138-1583, skype: rxaviers
http://rafael.xavier.blog.br

From cldr-users at unicode.org  Thu Feb  8 01:01:59 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 8 Feb 2018 08:01:59 +0100 (CET)
Subject: Keyboards PRI #367 issues
Message-ID: <1296523422.1756.1518073319661.JavaMail.www@wwinf1c20>


Hello,

just joined CLDR-Users at Sarasvati's invitation:
http://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0193.html

After having posted some feedback for PRI #367,
I'm now bothered that one ticket is still unaccepted,
although it contains indispensable features:

https://unicode.org/cldr/trac/ticket/10898

And that another ticket with editorial feedback is
unaccepted:

https://unicode.org/cldr/trac/ticket/10901

...while its fellow editorial feedback (non-PRI) has
been accepted:

https://unicode.org/cldr/trac/ticket/10906

Any hints about what's wrong and how to improve
are highly welcome.

CLDR and part 7 of UTS #35 seem to be the only 
de facto industrial standard for keyboard layouts 
that is actually taken into account by the industry.
Therefore it is important that all necessary features 
do make it into UTS #35-7.

Thanks in advance.

Regards,

Marcel


From cldr-users at unicode.org  Thu Feb  8 02:14:00 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 8 Feb 2018 09:14:00 +0100 (CET)
Subject: Keyboards PRI #367 issues
Message-ID: <898802699.3342.1518077640141.JavaMail.www@wwinf1c20>


Hello,

just joined CLDR-Users on Sarasvati's invitation:
http://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0193.html

After having posted some feedback for PRI #367, 
it bothers me that two tickets are still unaccepted, 
while one of them contains indispensable features:

https://unicode.org/cldr/trac/ticket/10898

And another ticket has editorial feedback for UTS #35-7:

https://unicode.org/cldr/trac/ticket/10901

However its fellow editorial feedback (non-PRI) has 
been accepted:

https://unicode.org/cldr/trac/ticket/10906

Any hints about what's wrong and how to improve
are highly welcome.

CLDR and part 7 of UTS #35 seem to be the only 
de facto industrial standard for keyboard layouts 
that the industry actually relies upon.
Therefore it is important that all necessary features 
do make it into UTS #35-7.

Thanks in advance.

Regards,

Marcel


From cldr-users at unicode.org  Thu Feb  8 11:38:25 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Thu, 8 Feb 2018 09:38:25 -0800
Subject: Keyboards PRI #367 issues
In-Reply-To: <1296523422.1756.1518073319661.JavaMail.www@wwinf1c20>
References: <1296523422.1756.1518073319661.JavaMail.www@wwinf1c20>
Message-ID: <CAFYQx+CH1FJyRhgzEZg+tWjLnFKuA996xAnBjZ2jhyGzQC4QtQ@mail.gmail.com>

Hello, welcome to the CLDR Users list!

> After having posted some feedback for PRI #367, I'm now bothered that one
> ticket is still unaccepted, although it contains indispensable features:

There's no reason to be bothered. The ticket is in the right place. The
ticket hasn't been rejected, just not accepted yet.


On Wed, Feb 7, 2018 at 11:01 PM, Marcel Schneider via CLDR-Users <
cldr-users at unicode.org> wrote:

>
> Hello,
>
> just joined CLDR-Users at Sarasvati's invitation:
> http://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0193.html
>
> After having posted some feedback for PRI #367,
> I'm now bothered that one ticket is still unaccepted,
> although it contains indispensable features:
>
> https://unicode.org/cldr/trac/ticket/10898
>
> And that another ticket with editorial feedback is
> unaccepted:
>
> https://unicode.org/cldr/trac/ticket/10901
>
> ...while its fellow editorial feedback (non-PRI) has
> been accepted:
>
> https://unicode.org/cldr/trac/ticket/10906
>
> Any hints about what's wrong and how to improve
> are highly welcome.
>
> CLDR and part 7 of UTS #35 seem to be the only
> de facto industrial standard for keyboard layouts
> that is actually taken into account by the industry.
> Therefore it is important that all necessary features
> do make it into UTS #35-7.
>
> Thanks in advance.
>
> Regards,
>
> Marcel
>
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>

From cldr-users at unicode.org  Thu Feb  8 13:22:15 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 8 Feb 2018 20:22:15 +0100 (CET)
Subject: Keyboards PRI #367 issues
In-Reply-To: <CAFYQx+CH1FJyRhgzEZg+tWjLnFKuA996xAnBjZ2jhyGzQC4QtQ@mail.gmail.com>
References: <1296523422.1756.1518073319661.JavaMail.www@wwinf1c20>
 <CAFYQx+CH1FJyRhgzEZg+tWjLnFKuA996xAnBjZ2jhyGzQC4QtQ@mail.gmail.com>
Message-ID: <971227692.25370.1518117735893.JavaMail.www@wwinf1c20>

Hi Steven,

Thank you.

Anyway, I haven't finished yet either; I am still editing.

Regards,

Marcel

On 08/02/18 18:38, Steven R. Loomis wrote:
> Hello, welcome to the CLDR Users list!
>
> > After having posted some feedback for PRI #367, I'm now bothered that one ticket is still unaccepted, although it contains indispensable features:
>
> There's no reason to be bothered. The ticket is in the right place. The ticket hasn't been rejected, just not accepted yet.


From cldr-users at unicode.org  Fri Feb  9 04:00:44 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Fri, 9 Feb 2018 11:00:44 +0100 (CET)
Subject: CLDR ticket #10898: addition of content
Message-ID: <1503787976.6543.1518170444941.JavaMail.www@wwinf1c20>

I've summed up the rationale of using a dead-key based group selector
in a new comment on ticket #10898:

https://unicode.org/cldr/trac/ticket/10898#comment:6

We know that ISO/IEC 9995 is widely disregarded by the industry, but 
the point in using such a group selector clearly exceeds what pertains 
to that standard. 

Generic dead keys are used also on layouts not conforming to ISO 9995. 

Mapping the generic group selector on AltGr/Option + Space is unrelated 
to any standard yet, but is actually proposed as the most straightforward
and interoperable (and intuitive) solution.

Any feedback is welcome. Please feel free to comment directly on the 
ticket (and the other related ticket #10851) likewise.

Regards,

Marcel


From cldr-users at unicode.org  Tue Feb 13 20:38:09 2018
From: cldr-users at unicode.org (Martin Hosken via CLDR-Users)
Date: Wed, 14 Feb 2018 09:38:09 +0700
Subject: block to script
Message-ID: <20180214093809.6300f4bc@sil-mh8>

Dear All,

Is there a way to get from a UBlockCode to a UScriptCode?

What? Aargh! No! Surely not! I hear you cry. But hold on a second. What I'm wanting to do is to add some (not perfect) future proofing to my application. When a new character is added to a block in Unicode, one can infer the script of that character, even if the character itself is unknown, from the block. But blocks get split! Yes they do. And this isn't a perfect solution. But block splits are rare, and this solution will give me a much better chance of an unknown character being handled 'appropriately' than being sure that the run break will break and having to wait however long until the next version of Unicode is released, ICU is updated and the application updated to that version of ICU.

Hence my question :)

Yours,
Martin

From cldr-users at unicode.org  Tue Feb 13 20:55:51 2018
From: cldr-users at unicode.org (Asmus Freytag via CLDR-Users)
Date: Tue, 13 Feb 2018 18:55:51 -0800
Subject: block to script
In-Reply-To: <20180214093809.6300f4bc@sil-mh8>
References: <20180214093809.6300f4bc@sil-mh8>
Message-ID: <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com>

On 2/13/2018 6:38 PM, Martin Hosken via CLDR-Users wrote:
> Dear All,
>
> Is there a way to get from a UBlockCode to a UScriptCode?
>
> What? Aargh! No! Surely not! I hear you cry. But hold on a second. What I'm wanting to do is to add some (not perfect) future proofing to my application. When a new character is added to a block in Unicode, one can infer the script of that character, even if the character itself is unknown, from the block. But blocks get split! Yes they do. And this isn't a perfect solution. But block splits are rare, and this solution will give me a much better chance of an unknown character being handled 'appropriately' than being sure that the run break will break and having to wait however long until the next version of Unicode is released, ICU is updated and the application updated to that version of ICU.
>
> Hence my question :)

Very simply: count all the code points in the block that have a definite
script assignment that's not COMMON/INHERITED (and not unassigned).

If a single script far outweighs both the COMMON/INHERITED and any other
scripts, then "guessing" that a new character will end up with that
script assignment will give you results that are better than "random".

And even if there is a combining mark assigned to a free spot, in many 
cases, whether you treat it as INHERITED or as having the script of its 
base character assigned to it makes no big difference (think script runs 
in a complex script).

Your algorithm will detect symbol and punctuation blocks and can predict 
COMMON as a likely script value.

The best thing is that with each revision, your guesses will get better;
that is, when you upgrade your application, it will improve not only
assigned code points but the probabilistic guesses for some of the
unassigned ones as well.

As long as you are aware that it's a probabilistic gamble, you should be 
fine.
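
The counting approach described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ICU code: the `SC_*` constants are stand-ins for real `UScriptCode` values, `guess_block_script` and its percentage threshold are hypothetical, and in an ICU-based program the `scripts` array would presumably be filled by calling `uscript_getScript` on each code point of the block.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in script codes for this sketch only. */
enum { SC_UNASSIGNED = -1, SC_COMMON = 0, SC_INHERITED = 1, SC_MAX = 256 };

/* Guess the script of a block from the scripts of its code points.
 * scripts[i] is the script of the i-th code point in the block, or
 * SC_UNASSIGNED. Returns:
 *  - the most frequent specific script, if it accounts for more than
 *    threshold_pct percent of the assigned code points;
 *  - SC_COMMON if the block has assigned code points but no script
 *    dominates (symbol/punctuation blocks, or mixed blocks);
 *  - SC_UNASSIGNED if nothing in the block is assigned. */
int guess_block_script(const int *scripts, size_t n, int threshold_pct)
{
    size_t counts[SC_MAX] = {0};
    size_t assigned = 0;

    for (size_t i = 0; i < n; i++) {
        int s = scripts[i];
        if (s == SC_UNASSIGNED)
            continue;
        assert(s >= 0 && s < SC_MAX);
        counts[s]++;
        assigned++;
    }
    if (assigned == 0)
        return SC_UNASSIGNED;

    /* Find the most frequent script other than COMMON/INHERITED. */
    int best = SC_COMMON;
    size_t best_count = 0;
    for (int s = 0; s < SC_MAX; s++) {
        if (s == SC_COMMON || s == SC_INHERITED)
            continue;
        if (counts[s] > best_count) {
            best = s;
            best_count = counts[s];
        }
    }

    /* The winner must outweigh everything else by the given margin. */
    if (best_count * 100 > assigned * (size_t)threshold_pct)
        return best;
    return SC_COMMON;
}
```

Falling back to COMMON for mixed blocks is a design choice of this sketch; an application might prefer to return "unknown" there instead.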

Enjoy,

A./
>
> Yours,
> Martin
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>


From cldr-users at unicode.org  Wed Feb 14 08:42:10 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Wed, 14 Feb 2018 15:42:10 +0100
Subject: block to script
In-Reply-To: <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com>
References: <20180214093809.6300f4bc@sil-mh8>
 <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com>
Message-ID: <CAGa7JC1BWFcpA5_STd1u6vrwFjnXE5wxyN3+ePJmtgakTgL7Ag@mail.gmail.com>

We were told that blocks cannot be split into units smaller than a single
column of 16 codepoints: if any one position is assigned to a block, the
remaining codepoints in that column cannot be assigned to another block...
So unassigned positions in an allocated column should still belong to the
same block and may infer a default script property from that block (which
may turn out to be a wrong guess only if that unassigned position gets
assigned a COMMON/INHERITED script).
Note however that some characters (notably currency signs, symbols or
punctuation) sometimes get used across several scripts without necessarily
being given a COMMON/INHERITED script. Most of these symbols are
bidi-neutral and should not form complex ligatures or clusters: it means
you can almost safely assume some properties for the unassigned positions
in these allocated columns (for example to tune the default behavior of a
text rendering engine, if it ever has to render a character that was
unallocated at the time but has since been assigned and is found to be
mapped in some new font).
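
The column rule above boils down to masking off the low four bits of a code point. As a minimal sketch (the function names and the caller-supplied list of assigned code points are assumptions made for illustration, not part of any Unicode or ICU API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First code point of the 16-code-point column that contains cp. */
uint32_t column_start(uint32_t cp)
{
    return cp & ~(uint32_t)0xF;
}

/* Returns 1 if any code point in `assigned` (a caller-supplied list of
 * assigned code points, in any order) lies in the same column as cp.
 * In that case, by the rule described above, cp should belong to the
 * same block as that neighbour. Returns 0 otherwise. */
int column_has_assigned(uint32_t cp, const uint32_t *assigned, size_t n)
{
    assert(assigned != NULL || n == 0);

    uint32_t start = column_start(cp);
    for (size_t i = 0; i < n; i++) {
        if (assigned[i] >= start && assigned[i] <= start + 15)
            return 1;
    }
    return 0;
}
```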

2018-02-14 3:55 GMT+01:00 Asmus Freytag via CLDR-Users <
cldr-users at unicode.org>:

> On 2/13/2018 6:38 PM, Martin Hosken via CLDR-Users wrote:
>
>> Dear All,
>>
>> Is there a way to get from a UBlockCode to a UScriptCode?
>>
>> What? Aargh! No! Surely not! I hear you cry. But hold on a second. What
>> I'm wanting to do is to add some (not perfect) future proofing to my
>> application. When a new character is added to a block in Unicode, one can
>> infer the script of that character, even if the character itself is
>> unknown, from the block. But blocks get split! Yes they do. And this isn't
>> a perfect solution. But block splits are rare, and this solution will give
>> me a much better chance of an unknown character being handled
>> 'appropriately' than being sure that the run break will break and having to
>> wait however long until the next version of Unicode is released, ICU is
>> updated and the application updated to that version of ICU.
>>
>> Hence my question :)
>>
>
> Very simply count all the code points in the block that have a definite
> script assignment that's not COMMON/INHERITED (and not unassigned).
>
> If a single script far outweighs both the COMMON/INHERITED and any other
> scripts, then "guessing" that a new character will end up with that script
> assignments will give you results that are better than "random".
>
> And even if there is a combining mark assigned to a free spot, in many
> cases, whether you treat it as INHERITED or as having the script of its
> base character assigned to it makes no big difference (think script runs in
> a complex script).
>
> Your algorithm will detect symbol and punctuation blocks and can predict
> COMMON as a likely script value.
>
> Best thing is that for each revision, your guesses will get better, that
> is, when you upgrade your application, it will improve not only assigned
> code points but the probabilistic guesses for some of the unassigned ones
> as well.
>
> As long as you are aware that it's a probabilistic gamble, you should be
> fine.
>
> Enjoy,
>
> A./
>
>
>> Yours,
>> Martin
>> _______________________________________________
>> CLDR-Users mailing list
>> CLDR-Users at unicode.org
>> http://unicode.org/mailman/listinfo/cldr-users
>>
>>
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>

From cldr-users at unicode.org  Wed Feb 14 17:42:33 2018
From: cldr-users at unicode.org (Asmus Freytag via CLDR-Users)
Date: Wed, 14 Feb 2018 15:42:33 -0800
Subject: block to script
In-Reply-To: <CAGa7JC1BWFcpA5_STd1u6vrwFjnXE5wxyN3+ePJmtgakTgL7Ag@mail.gmail.com>
References: <20180214093809.6300f4bc@sil-mh8>
 <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com>
 <CAGa7JC1BWFcpA5_STd1u6vrwFjnXE5wxyN3+ePJmtgakTgL7Ag@mail.gmail.com>
Message-ID: <859df881-163a-193a-f2f7-095581d40e99@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/cldr-users/attachments/20180214/8118f5e0/attachment.html>

From cldr-users at unicode.org  Wed Feb 14 22:15:02 2018
From: cldr-users at unicode.org (Martin Hosken via CLDR-Users)
Date: Thu, 15 Feb 2018 11:15:02 +0700
Subject: block to script
In-Reply-To: <859df881-163a-193a-f2f7-095581d40e99@ix.netcom.com>
References: <20180214093809.6300f4bc@sil-mh8>
 <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com>
 <CAGa7JC1BWFcpA5_STd1u6vrwFjnXE5wxyN3+ePJmtgakTgL7Ag@mail.gmail.com>
 <859df881-163a-193a-f2f7-095581d40e99@ix.netcom.com>
Message-ID: <20180215111502.09f4a3de@sil-mh8>

Dear Asmus,

> The probability of block-boundary change is far less than the probability that the "guess" of the future script property for a code point turns out wrong for any of the other possible reasons. Therefore, it disappears in the noise. As long as you are willing to engage in "guessing" in the first place, small changes in probability simply don't matter.
> 
> There's also the question of whether you are better off "guessing" based on code point value alone, or whether it makes more sense to also use the surrounding context. If assembling script runs, for example, an unassigned code point in the middle of a run should have a higher probability of continuing the run when it is also in (one of) the blocks that cover the script, but unless the remainder of text is marked by large script variability, that probability should normally already be high.

This is true for the complexities of Latin/Common, but when it comes to non-Roman scripts, things become a lot clearer.
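
The context-based guess mentioned in the quoted paragraph can be sketched like this. It covers only the "continue the surrounding run" part and leaves out any block check; `SC_UNKNOWN` and `continue_runs` are hypothetical names, and the integer script codes are stand-ins rather than ICU values.

```c
#include <assert.h>
#include <stddef.h>

enum { SC_UNKNOWN = -1 };  /* stand-in for "no script known for this code point" */

/* Resolve unknown entries in place from their context: each SC_UNKNOWN
 * entry takes the script of the nearest preceding known entry, so an
 * unassigned code point in the middle of a run continues that run.
 * Unknown entries at the very start borrow from the first known entry.
 * Returns the number of entries that were changed. */
size_t continue_runs(int *scripts, size_t n)
{
    assert(scripts != NULL || n == 0);

    size_t changed = 0;
    int prev = SC_UNKNOWN;

    /* Forward pass: extend each run over the unknowns that follow it. */
    for (size_t i = 0; i < n; i++) {
        if (scripts[i] != SC_UNKNOWN) {
            prev = scripts[i];
        } else if (prev != SC_UNKNOWN) {
            scripts[i] = prev;
            changed++;
        }
    }

    /* Leading unknowns: borrow from the first known entry, if any. */
    size_t first = 0;
    while (first < n && scripts[first] == SC_UNKNOWN)
        first++;
    if (first < n) {
        for (size_t i = 0; i < first; i++) {
            scripts[i] = scripts[first];
            changed++;
        }
    }
    return changed;
}
```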

> Whether it's worth making all these guesses is questionable, but I'm willing to go along and assume that some credible scenarios might exist.

For the use cases that do call for this, things are much more clear cut.

Here is my take on a block to script list. As you can see:

1. Many blocks (34%) are full, so any value, or none at all, is just fine for them.
2. UNKNOWN vs SYMBOLS vs COMMON is ambiguous, and I've made a best guess. (30%)

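#include "unicode/uscript.h"

/* FULL marks a block with no unassigned code points left, so no guess is needed. */
#define USCRIPT_FULL USCRIPT_INVALID_CODE
#define USCRIPT_MATH USCRIPT_MATHEMATICAL_NOTATION
/* Shorthand: _(X) expands to USCRIPT_X. */
#define _(x) USCRIPT_##x

/* Script guess per block, indexed by UBlockCode. */
UScriptCode block_script[] = {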
    _(INVALID_CODE), _(FULL), _(FULL),  _(FULL),    _(FULL),    _(FULL),    _(FULL),    _(FULL),
    _(GREEK),   _(FULL),    _(ARMENIAN), _(HEBREW), _(ARABIC),  _(SYRIAC),  _(THAANA),  _(FULL),
    _(BENGALI), _(GURMUKHI), _(GUJARATI), _(ORIYA), _(TAMIL),   _(TELUGU),  _(KANNADA), _(MALAYALAM),
    _(SINHALA), _(THAI),    _(LAO),     _(TIBETAN), _(FULL),    _(GEORGIAN), _(HANGUL), _(ETHIOPIC),
    _(CHEROKEE), _(UCAS),   _(OGHAM),   _(RUNIC),   _(KHMER),   _(MONGOLIAN), _(FULL),  _(GREEK),
    _(COMMON),  _(COMMON),  _(COMMON),  _(INHERITED), _(FULL),  _(UNKNOWN), _(FULL),    _(FULL),
    _(FULL),    _(UNKNOWN), _(COMMON),  _(FULL),    _(FULL),    _(FULL),    _(FULL),    _(FULL),
    _(FULL),    _(FULL),    _(HAN),     _(HAN),     _(HAN),     _(FULL),    _(KATAKANA_OR_HIRAGANA), _(FULL),
    _(BOPOMOFO), _(HANGUL), _(FULL),    _(BOPOMOFO), _(HAN),    _(FULL),    _(HAN),     _(HAN),
    _(YI),      _(YI),      _(HANGUL),  _(UNKNOWN), _(UNKNOWN), _(UNKNOWN), _(UNKNOWN), _(HAN),
    _(UNKNOWN), _(ARABIC),  _(FULL),    _(FULL),    _(COMMON),  _(ARABIC),  _(UNKNOWN), _(UNKNOWN),
// Unicode 3.1
    _(OLD_ITALIC), _(GOTHIC), _(DESERET), _(SYMBOLS), _(SYMBOLS), _(MATH),  _(HAN),     _(HAN),
    _(UNKNOWN), _(FULL),    _(TAGALOG), _(HANUNOO), _(BUHID),   _(TAGBANWA), _(FULL),   _(FULL),
    _(FULL),    _(FULL),    _(FULL),    _(FULL),    _(FULL),    _(UNKNOWN), _(UNKNOWN), _(LIMBU),
// Unicode 4
    _(TAI_LE),  _(KHMER),   _(FULL),    _(SYMBOLS), _(FULL),    _(LINEAR_B), _(LINEAR_B), _(UNKNOWN),
    _(UGARITIC), _(FULL),   _(OSMANYA), _(CYPRIOT), _(UNKNOWN), _(FULL),    _(UNKNOWN), _(UNKNOWN),
    _(FULL),    _(BUGINESE), _(HAN),    _(INHERITED), _(COPTIC), _(ETHIOPIC), _(ETHIOPIC), _(GEORGIAN),
    _(GLAGOLITIC), _(KHAROSHTHI), _(FULL), _(NEW_TAI_LUE), _(OLD_PERSIAN), _(FULL), _(UNKNOWN), _(SYLOTI_NAGRI),
    _(TIFINAGH), _(UNKNOWN), _(NKO),    _(BALINESE), _(FULL),   _(FULL),    _(PHAGS_PA), _(PHOENICIAN),
    _(CUNEIFORM), _(CUNEIFORM), _(UNKNOWN), _(SUNDANESE), _(LEPCHA), _(OL_CHIKI), _(FULL), _(VAI),
    _(FULL),    _(SAURASHTRA), _(FULL), _(REJANG),  _(CHAM),    _(UNKNOWN), _(UNKNOWN), _(LYCIAN),
    _(CARIAN),  _(LYDIAN),  _(SYMBOLS), _(SYMBOLS), _(SAMARITAN), _(UCAS),  _(LANNA),   _(DEVANAGARI),
    _(FULL),    _(BAMUM),   _(DEVANAGARI), _(DEVANAGARI), _(HANGUL), _(JAVANESE), _(FULL), _(TAI_VIET),
    _(MEITEI_MAYEK), _(HANGUL), _(IMPERIAL_ARAMAIC), _(FULL), _(AVESTAN), _(INSCRIPTIONAL_PARTHIAN), _(INSCRIPTIONAL_PAHLAVI), _(ORKHON),
    _(UNKNOWN), _(KAITHI),  _(EGYPTIAN_HIEROGLYPHS), _(UNKNOWN), _(HAN), _(HAN), _(MANDAIC), _(BATAK),
    _(ETHIOPIC), _(BRAHMI), _(BAMUM),   _(KATAKANA_OR_HIRAGANA), _(SYMBOLS), _(SYMBOLS), _(SYMBOLS), _(SYMBOLS),
    _(SYMBOLS), _(HAN),     _(ARABIC),  _(SYMBOLS), _(CHAKMA),  _(MEITEI_MAYEK), _(MEROITIC_CURSIVE), _(FULL),
    _(MIAO),    _(SHARADA), _(SORA_SOMPENG), _(SUNDANESE), _(TAKRI), _(BASSA_VAH), _(CAUCASIAN_ALBANIAN), _(COPTIC),
    _(INHERITED), _(DUPLOYAN), _(ELBASAN), _(SYMBOLS), _(GRANTHA), _(KHOJKI), _(KHUDAWADI), _(LATIN),
    _(LINEAR_A), _(MAHAJANI), _(MANICHAEAN), _(MENDE), _(MODI), _(MRO),     _(MYANMAR), _(NABATAEAN),
    _(FULL),    _(OLD_PERMIC), _(SYMBOLS), _(PAHAWH_HMONG), _(FULL), _(PAU_CIN_HAU), _(PSALTER_PAHLAVI), _(COMMON),
    _(SIDDHAM), _(SINHALA), _(SYMBOLS), _(TIRHUTA), _(WARANG_CITI)
};
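
A table like the one above would presumably be consulted only when a character's own script is not known. The sketch below shows that lookup with plain integers so that it compiles without ICU; `script_or_block_guess` and the `SC_*` constants are hypothetical stand-ins. In an ICU-based build the inputs would likely come from `uscript_getScript` and `ublock_getCode`, with `block_script` as the table.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for this sketch only. */
enum { SC_INVALID = -1, SC_UNKNOWN = 103 };

/* Return the character's own script when it is known; otherwise fall
 * back to the guess recorded for its block.
 *  - char_script: script reported for the character, or SC_UNKNOWN
 *  - block:       index of the character's block in `table`
 *  - table:       per-block guesses; SC_INVALID means "no guess recorded" */
int script_or_block_guess(int char_script, int block,
                          const int *table, size_t table_len)
{
    assert(table != NULL);

    if (char_script != SC_UNKNOWN && char_script != SC_INVALID)
        return char_script;        /* known character: trust its own property */
    if (block < 0 || (size_t)block >= table_len)
        return SC_UNKNOWN;         /* block is newer than the table */
    if (table[block] == SC_INVALID)
        return SC_UNKNOWN;         /* full block: no guess recorded */
    return table[block];
}
```

The bounds check matters here because a newer Unicode version may define blocks beyond the end of a table built against an older one.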


> 
> A./
> 
> On 2/14/2018 6:42 AM, Philippe Verdy via CLDR-Users wrote:
> > We were told that blocks cannot be split into units smaller than a single column of 16 codepoints: if any one position is assigned to a block, the remaining codepoints in that column cannot be assigned to another block...
> > So unassigned positions in an allocated column should still belong to the same block and may infer a default script property from that block (which may turn out to be a wrong guess only if that unassigned position gets assigned a COMMON/INHERITED script).
> > Note however that some characters (notably currency signs, symbols or punctuation) sometimes get used across several scripts without necessarily being given a COMMON/INHERITED script. Most of these symbols are bidi-neutral and should not form complex ligatures or clusters: it means you can almost safely assume some properties for the unassigned positions in these allocated columns (for example to tune the default behavior of a text rendering engine, if it ever has to render a character that was unallocated at the time but has since been assigned and is found to be mapped in some new font).


From cldr-users at unicode.org  Fri Feb 16 12:34:14 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Fri, 16 Feb 2018 19:34:14 +0100 (CET)
Subject: Additional modifiers and toggles in beta
Message-ID: <1659792649.13149.1518806054787.JavaMail.www@wwinf1h28>

The PRI #367-related "Additional modifiers and toggles" are currently in beta:

https://unicode.org/cldr/trac/ticket/10851#comment:9

Various prototypes are desirable for testing. A "US-qwerty with additions"
layout now has up-to-date proposed charts (with tooltips) and is about to be
implemented for Windows and macOS:

http://charupdate.info/doc/kbenintu/

Any feedback will be welcomed, e.g. about:
- what characters are expected to be most easily accessed;
- whether CapsLock and Programmer toggle should be inverted;
- mapping of the Numbers modifier (proposed on Left Alt);
- additionally remapping Backspace for convenience or not.

The linked page shall be completed once both implementations are released.

Regards,

Marcel


From cldr-users at unicode.org  Wed Feb 21 15:26:06 2018
From: cldr-users at unicode.org (John Emmons via CLDR-Users)
Date: Wed, 21 Feb 2018 15:26:06 -0600
Subject: Currency changes in v33
Message-ID: <OFF86B87E2.5CAD2263-ON8625823B.0074FD22-8625823B.0075C089@notes.na.collabserv.com>

CLDR 33 adds one additional currency code MRU for Mauritania, replacing 
MRO as of 2018-01-01.  Localized names have been updated accordingly.

Also, the names for currencies STD/STN that were added in 32 have been 
updated to reflect the current dates.

In English:

STN = São Tomé & Príncipe Dobra
STD = São Tomé & Príncipe Dobra (1977–2017)

Regards,

John C. Emmons
Globalization Architect & Unicode CLDR TC Vice Chairman
IBM Globalization Team
e-mail: emmo at us.ibm.com



From cldr-users at unicode.org  Fri Feb 23 12:01:55 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Fri, 23 Feb 2018 19:01:55 +0100 (CET)
Subject: PRI #367 alpha for Windows
Message-ID: <1863701925.17028.1519408915410.JavaMail.www@wwinf1j01>

For CLDR Users following PRI #367:
A working model illustrating ticket #10851 is now available for Windows:

https://unicode.org/cldr/trac/ticket/10851#comment:16

Regards,

Marcel

From cldr-users at unicode.org  Sun Feb 25 11:20:21 2018
From: cldr-users at unicode.org (Mike Wesner via CLDR-Users)
Date: Sun, 25 Feb 2018 11:20:21 -0600
Subject: cldr keyboard platform questions
Message-ID: <CAB7-a5K26A7=992nk1VdEKcLQu=z7XGTo3XaYw7fP4ynKtgjLQ@mail.gmail.com>

I am interested in using the CLDR Keyboards data to create a mapping of
Unicode characters to HID keycode data for a hardware project.  The device
is intended to support some common keyboard layouts for some common platforms
such as windows, osx, linux, iOS, Android.  (With obvious restrictions: it
probably won't support all possible outputs, such as those that require caps
lock, longpress or transforms.)

I have scripts that successfully use the osx and windows data, but I have some
questions about other platforms.

1. Where is the linux layout data?  iOS?

2. For android, the _platform.xml is lacking a hardwareMap.  How do I know
what keycodes map to the ISO codes?

Thank you for any assistance you can provide.

From cldr-users at unicode.org  Sun Feb 25 13:16:16 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Sun, 25 Feb 2018 20:16:16 +0100
Subject: cldr keyboard platform questions
In-Reply-To: <CAB7-a5K26A7=992nk1VdEKcLQu=z7XGTo3XaYw7fP4ynKtgjLQ@mail.gmail.com>
References: <CAB7-a5K26A7=992nk1VdEKcLQu=z7XGTo3XaYw7fP4ynKtgjLQ@mail.gmail.com>
Message-ID: <CAJ2xs_Gh3mj15Odzr3S9=mCTv60ZjgyXRoZpsxYT-j0SiFXxXQ@mail.gmail.com>

1. The chromeos data includes a subset of the linux data. (Note that the
iOS data is older...)
2. There isn't a hardwareMap for the android platform, since it is virtual.
You could use the ISO codes to construct one.
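
Constructing such a map from the ISO codes might look like the sketch below, which translates ISO/IEC 9995 key positions (such as "C01") into USB HID keyboard usage IDs. This is an illustration only: the function name and the row tables are not part of CLDR, the tables cover just the alphanumeric block and the space bar, and they assume the conventional physical correspondence in which position C01 is the key labelled "A" on a US QWERTY keyboard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HID keyboard-page usage IDs indexed by ISO column; 0 = not covered here. */
static const uint8_t ROW_E[] = { 0x35, 0x1E, 0x1F, 0x20, 0x21, 0x22, 0x23,
                                 0x24, 0x25, 0x26, 0x27, 0x2D, 0x2E }; /* E00..E12 */
static const uint8_t ROW_D[] = { 0x00, 0x14, 0x1A, 0x08, 0x15, 0x17, 0x1C,
                                 0x18, 0x0C, 0x12, 0x13, 0x2F, 0x30 }; /* D00..D12 */
static const uint8_t ROW_C[] = { 0x00, 0x04, 0x16, 0x07, 0x09, 0x0A, 0x0B,
                                 0x0D, 0x0E, 0x0F, 0x33, 0x34 };       /* C00..C11 */
static const uint8_t ROW_B[] = { 0x64, 0x1D, 0x1B, 0x06, 0x19, 0x05, 0x11,
                                 0x10, 0x36, 0x37, 0x38 };             /* B00..B10 */

/* Map an ISO position string (row letter plus two digits, e.g. "D01")
 * to a HID usage ID. Returns 0 for malformed or uncovered positions. */
uint8_t iso_to_hid_usage(const char *pos)
{
    assert(pos != NULL);

    if (strlen(pos) != 3 ||
        pos[1] < '0' || pos[1] > '9' ||
        pos[2] < '0' || pos[2] > '9')
        return 0;

    size_t col = (size_t)(pos[1] - '0') * 10 + (size_t)(pos[2] - '0');
    switch (pos[0]) {
    case 'E': return col < sizeof ROW_E ? ROW_E[col] : 0;
    case 'D': return col < sizeof ROW_D ? ROW_D[col] : 0;
    case 'C': return col < sizeof ROW_C ? ROW_C[col] : 0;
    case 'B': return col < sizeof ROW_B ? ROW_B[col] : 0;
    case 'A': return col == 3 ? 0x2C : 0;   /* A03: space bar */
    default:  return 0;
    }
}
```

Keeping one small array per row keeps the table easy to check against a picture of the physical keyboard.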

Note that we are working on extensions of the keyboard mechanism:
http://www.unicode.org/review/pri367/

Mark

On Sun, Feb 25, 2018 at 6:20 PM, Mike Wesner via CLDR-Users <
cldr-users at unicode.org> wrote:

> I am interested in using the CLDR Keyboards data to create a mapping of
> Unicode characters to HID keycode data for a hardware project.  The device
> is intended to support some common keyboard layouts for some common platforms
> such as windows, osx, linux, iOS, Android.  (With obvious restrictions: it
> probably won't support all possible outputs, such as those that require caps
> lock, longpress or transforms.)
>
> I have scripts that successfully use the osx and windows data, but I have some
> questions about other platforms.
>
> 1. Where is the linux layout data?  iOS?
>
> 2. For android, the _platform.xml is lacking a hardwareMap.  How do I know
> what keycodes map to the ISO codes?
>
> Thank you for any assistance you can provide.
>
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>
>