From cldr-users at unicode.org Fri Feb 2 17:53:19 2018 From: cldr-users at unicode.org (Kip Cole via CLDR-Users) Date: Sat, 3 Feb 2018 10:53:19 +1100 Subject: pt-BR in XML but not Json Message-ID: I observe that the XML master data for CLDR versions 31 and 32 includes the locale "pt-BR" but the JSON repo does not. May I ask: (a) should I file an issue, because the JSON repo should include pt-BR? (b) is the locale "pt" considered to be Brazilian Portuguese, as some googling suggests? Many thanks, Kip From cldr-users at unicode.org Sun Feb 4 08:21:06 2018 From: cldr-users at unicode.org (Rafael Xavier via CLDR-Users) Date: Sun, 4 Feb 2018 12:21:06 -0200 Subject: pt-BR in XML but not Json In-Reply-To: References: Message-ID: Hi Kip, Your item (b) is correct. You may notice that pt_BR.xml is empty, which means it's the default content for pt.xml. Note that pt_PT.xml contains only the overrides for "Portuguese as spoken in Portugal". Others could provide additional details and correct me if I'm wrong. Best, On Fri, Feb 2, 2018 at 9:53 PM, Kip Cole via CLDR-Users < cldr-users at unicode.org> wrote: > I observe that the XML master data for CLDR versions 31 and 32 include the > locale "pt-BR" but the json repo does not. May I ask if: > > (a) I should file an issue because the json repo should include pt-BR? > (b) the locale "pt" is considered to be Brazilian Portuguese as some > googling suggests? > > Many thanks, > > Kip > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -- +55 (16) 98138-1583, skype: rxaviers http://rafael.xavier.blog.br
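Rafael's point about default content can be illustrated with a minimal truncation-fallback sketch in C. It assumes, as Rafael describes, that a default-content locale such as pt-BR has no separate entry of its own and that its data lives under pt. The function name resolve_locale and the list of available locales are hypothetical, and real CLDR inheritance also consults parentLocales data, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if tag appears in the (hypothetical) list of published locales. */
static int is_available(const char *tag, const char *const *available, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(available[i], tag) == 0)
            return 1;
    return 0;
}

/* Resolve a BCP 47 tag such as "pt-BR" to the closest published locale by
   dropping subtags from the right ("pt-BR" -> "pt"), ending at "root".
   Simplification: ignores CLDR parentLocales and likely-subtags data. */
void resolve_locale(const char *tag, const char *const *available, size_t n,
                    char *out, size_t out_size)
{
    assert(out_size > 0);
    strncpy(out, tag, out_size - 1);
    out[out_size - 1] = '\0';
    for (;;) {
        if (is_available(out, available, n))
            return;
        char *dash = strrchr(out, '-');
        if (dash == NULL)
            break;
        *dash = '\0'; /* truncate the last subtag and try again */
    }
    strncpy(out, "root", out_size - 1);
    out[out_size - 1] = '\0';
}
```

With a list such as {"root", "pt", "pt-PT"}, a request for pt-BR resolves to pt, which is exactly where the default content lives, while pt-PT keeps its own overrides.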
From cldr-users at unicode.org Thu Feb 8 01:01:59 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Thu, 8 Feb 2018 08:01:59 +0100 (CET) Subject: Keyboards PRI #367 issues Message-ID: <1296523422.1756.1518073319661.JavaMail.www@wwinf1c20> Hello, just joined CLDR-Users at Sarasvati's invitation: http://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0193.html After having posted some feedback for PRI #367, I'm now bothered that one ticket is still unaccepted, although it contains indispensable features: https://unicode.org/cldr/trac/ticket/10898 And that another ticket with editorial feedback is unaccepted: https://unicode.org/cldr/trac/ticket/10901 while its fellow editorial feedback (non-PRI) has been accepted: https://unicode.org/cldr/trac/ticket/10906 Any hints about what's wrong and how to improve are highly welcome. CLDR and part 7 of UTS #35 seem to be the only de facto industrial standard for keyboard layouts that is actually taken into account by the industry. Therefore it is important that all necessary features do make it into UTS #35-7. Thanks in advance. Regards, Marcel From cldr-users at unicode.org Thu Feb 8 02:14:00 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Thu, 8 Feb 2018 09:14:00 +0100 (CET) Subject: Keyboards PRI #367 issues Message-ID: <898802699.3342.1518077640141.JavaMail.www@wwinf1c20> Hello, just joined CLDR-Users on Sarasvati's invitation: http://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0193.html After having posted some feedback for PRI #367, it bothers me that two tickets are still unaccepted, while one of them contains indispensable features: https://unicode.org/cldr/trac/ticket/10898 And another ticket has editorial feedback for UTS #35-7: https://unicode.org/cldr/trac/ticket/10901 However, its fellow editorial feedback (non-PRI) has been accepted: https://unicode.org/cldr/trac/ticket/10906 Any hints about what's wrong and how to improve are highly welcome.
CLDR and part 7 of UTS #35 seem to be the only de facto industrial standard for keyboard layouts that the industry actually relies upon. Therefore it is important that all necessary features do make it into UTS #35-7. Thanks in advance. Regards, Marcel From cldr-users at unicode.org Thu Feb 8 11:38:25 2018 From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users) Date: Thu, 8 Feb 2018 09:38:25 -0800 Subject: Keyboards PRI #367 issues In-Reply-To: <1296523422.1756.1518073319661.JavaMail.www@wwinf1c20> References: <1296523422.1756.1518073319661.JavaMail.www@wwinf1c20> Message-ID: Hello, welcome to the CLDR Users list! > After having posted some feedback for PRI #367, I'm now bothered that one ticket is still unaccepted, although it contains indispensable features: There's no reason to be bothered. The ticket is in the right place. The ticket hasn't been rejected, just not accepted yet. From cldr-users at unicode.org Thu Feb 8 13:22:15 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Thu, 8 Feb 2018 20:22:15 +0100 (CET) Subject: Keyboards PRI #367 issues In-Reply-To: References: <1296523422.1756.1518073319661.JavaMail.www@wwinf1c20> Message-ID: <971227692.25370.1518117735893.JavaMail.www@wwinf1c20> Hi Steven, Thank you. Anyway, I haven't finished yet either; I'm still editing. Regards, Marcel From cldr-users at unicode.org Fri Feb 9 04:00:44 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Fri, 9 Feb 2018 11:00:44 +0100 (CET) Subject: CLDR ticket #10898: addition of content Message-ID: <1503787976.6543.1518170444941.JavaMail.www@wwinf1c20> I've summed up the rationale for using a dead-key-based group selector in a new comment on ticket #10898: https://unicode.org/cldr/trac/ticket/10898#comment:6 We know that ISO/IEC 9995 is widely disregarded by the industry, but the case for such a group selector clearly goes beyond that standard. Generic dead keys are also used on layouts not conforming to ISO 9995. Mapping the generic group selector to AltGr/Option + Space is not yet tied to any standard, but is proposed as the most straightforward, interoperable, and intuitive solution. Any feedback is welcome. Please feel free to comment directly on the ticket (and on the related ticket #10851) as well. Regards, Marcel From cldr-users at unicode.org Tue Feb 13 20:38:09 2018 From: cldr-users at unicode.org (Martin Hosken via CLDR-Users) Date: Wed, 14 Feb 2018 09:38:09 +0700 Subject: block to script Message-ID: <20180214093809.6300f4bc@sil-mh8> Dear All, Is there a way to get from a UBlockCode to a UScriptCode? What? Aargh! No! Surely not! I hear you cry. But hold on a second. What I want to do is add some (not perfect) future-proofing to my application. When a new character is added to a block in Unicode, one can infer the script of that character from the block, even if the character itself is unknown. But blocks get split! Yes they do. And this isn't a perfect solution.
But block splits are rare, and this solution will give me a much better chance of an unknown character being handled 'appropriately' than the certainty that the script run will break, followed by waiting however long it takes until the next version of Unicode is released, ICU is updated, and the application is updated to that version of ICU. Hence my question :) Yours, Martin From cldr-users at unicode.org Tue Feb 13 20:55:51 2018 From: cldr-users at unicode.org (Asmus Freytag via CLDR-Users) Date: Tue, 13 Feb 2018 18:55:51 -0800 Subject: block to script In-Reply-To: <20180214093809.6300f4bc@sil-mh8> References: <20180214093809.6300f4bc@sil-mh8> Message-ID: <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com> Very simply: count all the code points in the block that have a definite script assignment that's not COMMON/INHERITED (and not unassigned). If a single script far outweighs both COMMON/INHERITED and any other scripts, then "guessing" that a new character will end up with that script assignment will give you results that are better than "random". And even if a combining mark is assigned to a free spot, in many cases it makes little difference whether you treat it as INHERITED or as having the script of its base character (think of script runs in a complex script). Your algorithm will detect symbol and punctuation blocks and can predict COMMON as a likely script value. The best thing is that with each revision your guesses will get better: when you upgrade your application, it will improve not only the assigned code points but also the probabilistic guesses for some of the unassigned ones. As long as you are aware that it's a probabilistic gamble, you should be fine. Enjoy, A./ From cldr-users at unicode.org Wed Feb 14 08:42:10 2018 From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users) Date: Wed, 14 Feb 2018 15:42:10 +0100 Subject: block to script In-Reply-To: <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com> References: <20180214093809.6300f4bc@sil-mh8> <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com> Message-ID: We were told that blocks cannot be split into units smaller than a single column of 16 code points: if any position in a column is assigned to a block, the remaining code points in that column cannot be assigned to another block. So unassigned positions in an allocated column should still belong to the same block, and a default script property may be inferred from that block (which may turn out to be a wrong guess only if that unassigned position gets assigned a COMMON/INHERITED script). Note, however, that some characters (notably currency signs, symbols, or punctuation) are sometimes used across several scripts without necessarily being given a COMMON/INHERITED script.
Most of these symbols are bidi-neutral and should not form complex ligatures or clusters: this means you can almost safely assume some properties for the unassigned positions in these allocated columns (for example, to tune the default behavior of a text rendering engine if it ever has to render a character that was once unallocated but eventually gets assigned and is found to be mapped in some new font). From cldr-users at unicode.org Wed Feb 14 17:42:33 2018 From: cldr-users at unicode.org (Asmus Freytag via CLDR-Users) Date: Wed, 14 Feb 2018 15:42:33 -0800 Subject: block to script In-Reply-To: References: <20180214093809.6300f4bc@sil-mh8> <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com> Message-ID: <859df881-163a-193a-f2f7-095581d40e99@ix.netcom.com> An HTML attachment was scrubbed...
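Asmus's counting heuristic from earlier in this thread can be sketched in plain C. This is a minimal sketch under stated assumptions, not ICU's or Unicode's method: the per-script counts are assumed to have been tallied beforehand (for example, by looking up each assigned code point in the block with ICU's uscript_getScript()), and the function name guess_block_script, the sentinel values, and the dominance factor are all hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel results; a real table would use ICU UScriptCode values. */
#define GUESS_NONE   (-1) /* no script clearly dominates the block */
#define GUESS_COMMON (-2) /* Common/Inherited dominates: likely a symbol block */

/*
 * counts[i]        : assigned code points in the block with script index i,
 *                    excluding Common and Inherited
 * common_inherited : assigned code points in the block that are Common/Inherited
 * dominance        : factor by which the winner must exceed every rival
 *                    (an assumed tuning knob, not a Unicode-defined value)
 * Returns the dominant script index, GUESS_COMMON, or GUESS_NONE.
 */
int guess_block_script(const unsigned long *counts, size_t nscripts,
                       unsigned long common_inherited, unsigned long dominance)
{
    assert(dominance >= 1);
    size_t best = 0;
    unsigned long best_count = 0, runner_up = 0;

    /* Find the most frequent definite script and the runner-up count. */
    for (size_t i = 0; i < nscripts; i++) {
        if (counts[i] > best_count) {
            runner_up = best_count;
            best_count = counts[i];
            best = i;
        } else if (counts[i] > runner_up) {
            runner_up = counts[i];
        }
    }

    /* One script far outweighs both the other scripts and Common/Inherited. */
    if (best_count > 0 && best_count >= dominance * runner_up &&
        best_count >= dominance * common_inherited)
        return (int)best;

    /* Symbol and punctuation blocks: Common/Inherited outweighs every script. */
    if (common_inherited > 0 && common_inherited >= dominance * best_count)
        return GUESS_COMMON;

    return GUESS_NONE;
}
```

Returning GUESS_NONE for mixed blocks keeps the guess honest: as Asmus notes, this is a probabilistic gamble, so a caller would fall back to its existing handling whenever no script clearly wins.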
From cldr-users at unicode.org Wed Feb 14 22:15:02 2018 From: cldr-users at unicode.org (Martin Hosken via CLDR-Users) Date: Thu, 15 Feb 2018 11:15:02 +0700 Subject: block to script In-Reply-To: <859df881-163a-193a-f2f7-095581d40e99@ix.netcom.com> References: <20180214093809.6300f4bc@sil-mh8> <13a0a178-4f50-8cea-c0a9-47ed71f5eac8@ix.netcom.com> <859df881-163a-193a-f2f7-095581d40e99@ix.netcom.com> Message-ID: <20180215111502.09f4a3de@sil-mh8> Dear Asmus, > The probability of block-boundary change is far less than the probability that the "guess" of the future script property for a code point turns out wrong for any of the other possible reasons. Therefore, it disappears in the noise. As long as you are willing to engage in "guessing" in the first place, small changes in probability simply don't matter. > > There's also the question of whether you are better off "guessing" based on code point value alone, or whether it makes more sense to also use the surrounding context. If assembling script runs, for example, an unassigned code point in the middle of a run should have a higher probability of continuing the run when it is also in (one of) the blocks that cover the script, but unless the remainder of text is marked by large script variability, that probability should normally already be high. This is true for the complexities of Latin/Common, but when it comes to non-Roman scripts, things become a lot clearer. > Whether it's worth making all these guesses is questionable, but I'm willing to go along and assume that some credible scenarios might exist. For the use cases that do call for this, things are much more clear-cut. Here is my take on a block-to-script list. As you can see: 1. There are many (34%) full blocks, for which any value (or none) is just fine. 2. UNKNOWN vs SYMBOLS vs COMMON is ambiguous, and I've made a best guess.
(30%) #include "unicode/uscript.h" #define USCRIPT_FULL USCRIPT_INVALID_CODE #define USCRIPT_MATH USCRIPT_MATHEMATICAL_NOTATION #define _(x) USCRIPT_##x UScriptCode block_script[] = { _(INVALID_CODE), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(GREEK), _(FULL), _(ARMENIAN), _(HEBREW), _(ARABIC), _(SYRIAC), _(THAANA), _(FULL), _(BENGALI), _(GURMUKHI), _(GUJARATI), _(ORIYA), _(TAMIL), _(TELUGU), _(KANNADA), _(MALAYALAM), _(SINHALA), _(THAI), _(LAO), _(TIBETAN), _(FULL), _(GEORGIAN), _(HANGUL), _(ETHIOPIC), _(CHEROKEE), _(UCAS), _(OGHAM), _(RUNIC), _(KHMER), _(MONGOLIAN), _(FULL), _(GREEK), _(COMMON), _(COMMON), _(COMMON), _(INHERITED), _(FULL), _(UNKNOWN), _(FULL), _(FULL), _(FULL), _(UNKNOWN), _(COMMON), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(HAN), _(HAN), _(HAN), _(FULL), _(KATAKANA_OR_HIRAGANA), _(FULL), _(BOPOMOFO), _(HANGUL), _(FULL), _(BOPOMOFO), _(HAN), _(FULL), _(HAN), _(HAN), _(YI), _(YI), _(HANGUL), _(UNKNOWN), _(UNKNOWN), _(UNKNOWN), _(UNKNOWN), _(HAN), _(UNKNOWN), _(ARABIC), _(FULL), _(FULL), _(COMMON), _(ARABIC), _(UNKNOWN), _(UNKNOWN), /* Unicode 3.1 */ _(OLD_ITALIC), _(GOTHIC), _(DESERET), _(SYMBOLS), _(SYMBOLS), _(MATH), _(HAN), _(HAN), _(UNKNOWN), _(FULL), _(TAGALOG), _(HANUNOO), _(BUHID), _(TAGBANWA), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(FULL), _(UNKNOWN), _(UNKNOWN), _(LIMBU), /* Unicode 4 */ _(TAI_LE), _(KHMER), _(FULL), _(SYMBOLS), _(FULL), _(LINEAR_B), _(LINEAR_B), _(UNKNOWN), _(UGARITIC), _(FULL), _(OSMANYA), _(CYPRIOT), _(UNKNOWN), _(FULL), _(UNKNOWN), _(UNKNOWN), _(FULL), _(BUGINESE), _(HAN), _(INHERITED), _(COPTIC), _(ETHIOPIC), _(ETHIOPIC), _(GEORGIAN), _(GLAGOLITIC), _(KHAROSHTHI), _(FULL), _(NEW_TAI_LUE), _(OLD_PERSIAN), _(FULL), _(UNKNOWN), _(SYLOTI_NAGRI), _(TIFINAGH), _(UNKNOWN), _(NKO), _(BALINESE), _(FULL), _(FULL), _(PHAGS_PA), _(PHOENICIAN), _(CUNEIFORM), _(CUNEIFORM), _(UNKNOWN), _(SUNDANESE), _(LEPCHA), _(OL_CHIKI), _(FULL), _(VAI), _(FULL), _(SAURASHTRA), _(FULL),
_(REJANG), _(CHAM), _(UNKNOWN), _(UNKNOWN), _(LYCIAN), _(CARIAN), _(LYDIAN), _(SYMBOLS), _(SYMBOLS), _(SAMARITAN), _(UCAS), _(LANNA), _(DEVANAGARI), _(FULL), _(BAMUM), _(DEVANAGARI), _(DEVANAGARI), _(HANGUL), _(JAVANESE), _(FULL), _(TAI_VIET), _(MEITEI_MAYEK), _(HANGUL), _(IMPERIAL_ARAMAIC), _(FULL), _(AVESTAN), _(INSCRIPTIONAL_PARTHIAN), _(INSCRIPTIONAL_PAHLAVI), _(ORKHON), _(UNKNOWN), _(KAITHI), _(EGYPTIAN_HIEROGLYPHS), _(UNKNOWN), _(HAN), _(HAN), _(MANDAIC), _(BATAK), _(ETHIOPIC), _(BRAHMI), _(BAMUM), _(KATAKANA_OR_HIRAGANA), _(SYMBOLS), _(SYMBOLS), _(SYMBOLS), _(SYMBOLS), _(SYMBOLS), _(HAN), _(ARABIC), _(SYMBOLS), _(CHAKMA), _(MEITEI_MAYEK), _(MEROITIC_CURSIVE), _(FULL), _(MIAO), _(SHARADA), _(SORA_SOMPENG), _(SUNDANESE), _(TAKRI), _(BASSA_VAH), _(CAUCASIAN_ALBANIAN), _(COPTIC), _(INHERITED), _(DUPLOYAN), _(ELBASAN), _(SYMBOLS), _(GRANTHA), _(KHOJKI), _(KHUDAWADI), _(LATIN), _(LINEAR_A), _(MAHAJANI), _(MANICHAEAN), _(MENDE), _(MODI), _(MRO), _(MYANMAR), _(NABATAEAN), _(FULL), _(OLD_PERMIC), _(SYMBOLS), _(PAHAWH_HMONG), _(FULL), _(PAU_CIN_HAU), _(PSALTER_PAHLAVI), _(COMMON), _(SIDDHAM), _(SINHALA), _(SYMBOLS), _(TIRHUTA), _(WARANG_CITI) }; > > A./ From cldr-users at unicode.org Fri Feb 16 12:34:14 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Fri, 16 Feb 2018 19:34:14 +0100 (CET) Subject: Additional modifiers and toggles in beta Message-ID: <1659792649.13149.1518806054787.JavaMail.www@wwinf1h28> The PRI #367-related "Additional modifiers and toggles" are currently in beta: https://unicode.org/cldr/trac/ticket/10851#comment:9 Various prototypes are desirable for testing. Proposed charts for a "US-qwerty with additions" layout are now up to date (with tooltips), and the layout is about to be implemented for Windows and macOS: http://charupdate.info/doc/kbenintu/ Any feedback will be welcomed, e.g. about: • which characters are expected to be most easily accessed; • whether the CapsLock and Programmer toggles should be inverted; • mapping of the Numbers modifier (proposed on Left Alt); • whether to additionally remap Backspace for convenience.
The linked page will be completed once both implementations are released. Regards, Marcel From cldr-users at unicode.org Wed Feb 21 15:26:06 2018 From: cldr-users at unicode.org (John Emmons via CLDR-Users) Date: Wed, 21 Feb 2018 15:26:06 -0600 Subject: Currency changes in v33 Message-ID: CLDR 33 adds one additional currency code, MRU for Mauritania, replacing MRO as of 2018-01-01. Localized names have been updated accordingly. Also, the names for currencies STD/STN that were added in 32 have been updated to reflect the current dates. In English: STN = São Tomé & Príncipe Dobra STD = São Tomé & Príncipe Dobra (1977–2017) Regards, John C. Emmons Globalization Architect & Unicode CLDR TC Vice Chairman IBM Globalization Team e-mail: emmo at us.ibm.com From cldr-users at unicode.org Fri Feb 23 12:01:55 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Fri, 23 Feb 2018 19:01:55 +0100 (CET) Subject: PRI #367 alpha for Windows Message-ID: <1863701925.17028.1519408915410.JavaMail.www@wwinf1j01> For CLDR Users following PRI #367: A working model illustrating ticket #10851 is now available for Windows: https://unicode.org/cldr/trac/ticket/10851#comment:16 Regards, Marcel From cldr-users at unicode.org Sun Feb 25 11:20:21 2018 From: cldr-users at unicode.org (Mike Wesner via CLDR-Users) Date: Sun, 25 Feb 2018 11:20:21 -0600 Subject: cldr keyboard platform questions Message-ID: I am interested in using the CLDR Keyboards data to create a mapping of Unicode characters to HID keycode data for a hardware project. The device is intended to support some common keyboard layouts for common platforms such as Windows, OS X, Linux, iOS, and Android (with obvious restrictions: it probably won't support all possible outputs, such as those that require Caps Lock, long-press, or transforms). I have scripts that work successfully with the osx and windows data, but I have some questions about other platforms. 1.
Where is the linux layout data? iOS? 2. For android, the _platform.xml lacks a hardwareMap. How do I know which keycodes map to the ISO codes? Thank you for any assistance you can provide. From cldr-users at unicode.org Sun Feb 25 13:16:16 2018 From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users) Date: Sun, 25 Feb 2018 20:16:16 +0100 Subject: cldr keyboard platform questions In-Reply-To: References: Message-ID: 1. The chromeos data includes a subset of the linux data. (Note that the iOS data is older...) 2. There isn't a hardwareMap for the android platform, since it is virtual. You could use the ISO codes to construct one. Note that we are working on extensions of the keyboard mechanism: http://www.unicode.org/review/pri367/ Mark
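Mark's suggestion of constructing a hardwareMap from the ISO codes can be sketched as a lookup table for a USB HID device. HID Keyboard/Keypad usage IDs (usage page 0x07) identify physical key positions named after their US-layout legends, so such a table should not depend on the active layout. This is a partial, hypothetical sketch: the names struct iso_hid, ISO_TO_HID, and iso_to_hid are illustrative. It covers only the main alphanumeric block and assumes the ISO positions as they appear in the iso attributes of CLDR keyboard files. Any real table should be checked against the USB HID Usage Tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ISO 9995 key position -> USB HID Keyboard/Keypad usage ID (page 0x07).
   Partial table for illustration only. */
struct iso_hid { const char *iso; unsigned char usage; };

static const struct iso_hid ISO_TO_HID[] = {
    {"E00", 0x35},                                /* ` ~ (Grave Accent) */
    {"E01", 0x1E}, {"E02", 0x1F}, {"E03", 0x20}, {"E04", 0x21}, {"E05", 0x22},
    {"E06", 0x23}, {"E07", 0x24}, {"E08", 0x25}, {"E09", 0x26}, {"E10", 0x27},
    {"E11", 0x2D}, {"E12", 0x2E},                 /* - and = */
    {"D01", 0x14}, {"D02", 0x1A}, {"D03", 0x08}, {"D04", 0x15}, {"D05", 0x17},
    {"D06", 0x1C}, {"D07", 0x18}, {"D08", 0x0C}, {"D09", 0x12}, {"D10", 0x13},
    {"D11", 0x2F}, {"D12", 0x30}, {"D13", 0x31},  /* [ ] and ANSI backslash */
    {"C01", 0x04}, {"C02", 0x16}, {"C03", 0x07}, {"C04", 0x09}, {"C05", 0x0A},
    {"C06", 0x0B}, {"C07", 0x0D}, {"C08", 0x0E}, {"C09", 0x0F}, {"C10", 0x33},
    {"C11", 0x34}, {"C12", 0x32},                 /* ' and ISO Non-US # */
    {"B00", 0x64},                                /* ISO Non-US backslash */
    {"B01", 0x1D}, {"B02", 0x1B}, {"B03", 0x06}, {"B04", 0x19}, {"B05", 0x05},
    {"B06", 0x11}, {"B07", 0x10}, {"B08", 0x36}, {"B09", 0x37}, {"B10", 0x38},
    {"A03", 0x2C},                                /* space bar */
};

/* Return the HID usage ID for an ISO position such as "D01", or 0 if the
   position is not in the table (0x00 means "no event" in HID). */
unsigned char iso_to_hid(const char *iso)
{
    assert(iso != NULL);
    for (size_t i = 0; i < sizeof ISO_TO_HID / sizeof ISO_TO_HID[0]; i++)
        if (strcmp(ISO_TO_HID[i].iso, iso) == 0)
            return ISO_TO_HID[i].usage;
    return 0;
}
```

Shifted and AltGr outputs would reuse these same usage IDs, with Shift or Right Alt set in the modifier byte of the HID keyboard report. Combining the table with a layout's per-modifier iso-to-character maps would therefore give a character-to-keystroke mapping.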