From goldsmit at apple.com Mon Feb 2 17:20:32 2015
From: goldsmit at apple.com (Deborah Goldsmith)
Date: Mon, 02 Feb 2015 15:20:32 -0800
Subject: 2015a metazone update?
Message-ID: <949A5656-33D3-407F-80C1-6580B2C3AF41@apple.com>

Hi,

IANA released the 2015a version of the TZ database on Friday. Is there a timetable yet for an associated metazone update for CLDR/ICU?

Thanks,
Deborah

From rxaviers at gmail.com Mon Feb 2 20:14:38 2015
From: rxaviers at gmail.com (Rafael Xavier)
Date: Tue, 3 Feb 2015 00:14:38 -0200
Subject: tg-Cyrl-TJ and tk-TM

Hi friends,

I cannot find tg-Cyrl-TJ and tk-TM on http://unicode.org/repos/cldr/trunk/common/main/. Are they available somewhere?

Thank you in advance.

PS: I can see them listed on supplemental/likelySubtags though.

"tk": "tk-Latn-TM",
"tg": "tg-Cyrl-TJ",
"tg-Arab": "tg-Arab-PK",
"tg-PK": "tg-Arab-PK",

--
+55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
http://rafael.xavier.blog.br

From verdy_p at wanadoo.fr Mon Feb 2 21:11:37 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 3 Feb 2015 04:11:37 +0100
Subject: tg-Cyrl-TJ and tk-TM

Given that tg-Cyrl-TJ is the likely value for tg, it is also the likely value for tg-Cyrl. The CLDR data is available for tg, and it is then (from the likely value) the default for tg-Cyrl and for tg-Cyrl-TJ (and there's no need for additional data, except for tg-Arab, but this last one is also sufficient for tg-Arab-PK and even for tg-Arab-TJ or tg-Arab-UZ).

Given that tk-Latn-TM is the likely value for tk, it is also the likely value for tk-Latn. The CLDR data is available for tk, and it is then (from the likely value) the default for tk-Latn and for tk-Latn-TM (and there's no need for additional data, except possibly for tk-Arab, but this last one is also sufficient for tk-Arab-TM and even for tk-Arab-UZ).

Do you really need specialized data for tk-TM and tg-Cyrl-TJ, which are already part of the default data for tk and tg?

From verdy_p at wanadoo.fr Mon Feb 2 21:35:58 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 3 Feb 2015 04:35:58 +0100
Subject: tg-Cyrl-TJ and tk-TM

Note that this is the same case for the likely value of "en", which is "en-Latn-US" (but you could argue that "en" represents just the international form of English, without the US-specific jargon): you don't need data for "en-Latn" and "en-Latn-US" or "en-US".

Similar case for "zh", but with more branches:

(1) Its likely value is "zh-Hani-CN", but more precisely "cmn-Hans-CN" (Mandarin being the predominant language in the Chinese macrolanguage, and predominantly written with the simplified sinographic script variant). So with "zh" data you don't need additional data for "zh-Hani", "zh-Hani-CN", "zh-Hans", "zh-Hans-CN", "zh-CN", "cmn", "cmn-Hani", "cmn-Hani-CN", "cmn-Hans", "cmn-Hans-CN" or "cmn-CN".

(2) But you can have specific data for "zh-Hant", further specialized with additional data:

* for either "zh-Hani-TW" or "zh-Hant-TW", or just "zh-TW" (given that the likely script variant in Taiwan is traditional)
* for either "zh-Hani-MO" or "zh-Hant-MO", or just "zh-MO" (given that the likely script variant in Taiwan is traditional)
* for either "zh-Hani-SG" or "zh-Hans-SG", or just "zh-SG" (given that the likely script variant in Taiwan is simplified)

A lot of combinations of BCP47 subtags can be used in localization data, but CLDR data concentrates on the default for the root of all branches, and provides specialized data only for specific branches needing them (it assumes that you'll use them with the standard fallback resolution mechanism of BCP47). So you should understand how the fallback mechanism works: as soon as "likely" subtags are registered in the IANA database for BCP47, the need to make many specializations for various combinations goes away (and this is the best role of these "likely" declarations).

From mark at macchiato.com Mon Feb 2 23:45:48 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Tue, 3 Feb 2015 06:45:48 +0100
Subject: tg-Cyrl-TJ and tk-TM

Nice explanation, Philippe.

One quick note: CLDR does define 'en' to have the content for 'en-Latn-US'. However, it also supplies 'en-001', English (World), which can be used for an international form of English.

Mark

*Il meglio è l’inimico del bene*

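The point made above, that data filed under tg and tk already covers tg-Cyrl-TJ and tk-Latn-TM, can be illustrated with a minimal Python sketch of adding likely subtags. The table holds only the four entries quoted in this thread. The helper names `parse` and `add_likely_subtags` are hypothetical, and the lookup order is a simplified reading of the LDML procedure, not CLDR's or ICU's actual implementation.

```python
# Hypothetical excerpt of supplemental/likelySubtags: only the four
# entries quoted in this thread, not the full CLDR table.
LIKELY_SUBTAGS = {
    "tk": "tk-Latn-TM",
    "tg": "tg-Cyrl-TJ",
    "tg-Arab": "tg-Arab-PK",
    "tg-PK": "tg-Arab-PK",
}


def parse(tag):
    """Split a simple language[-Script][-REGION] tag into three parts."""
    parts = tag.split("-")
    language, script, region = parts[0].lower(), None, None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
    return language, script, region


def add_likely_subtags(tag):
    """Fill in a missing script and/or region from the likely-subtags table."""
    language, script, region = parse(tag)
    # Simplified lookup order: most specific key first, bare language last.
    candidates = [
        "-".join(p for p in (language, script, region) if p),
        "-".join(p for p in (language, region) if p),
        "-".join(p for p in (language, script) if p),
        language,
    ]
    for key in candidates:
        if key in LIKELY_SUBTAGS:
            _, likely_script, likely_region = parse(LIKELY_SUBTAGS[key])
            # Subtags the caller supplied win over the likely ones.
            return "-".join((language, script or likely_script, region or likely_region))
    return tag  # nothing known about this language in the excerpt
```

With this excerpt, "tg", "tg-Cyrl" and "tg-Cyrl-TJ" all maximize to the same tag, as do "tk" and "tk-TM", which is why no separate files are needed for the longer forms.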
From rxaviers at gmail.com Tue Feb 3 03:52:47 2015
From: rxaviers at gmail.com (Rafael Xavier)
Date: Tue, 3 Feb 2015 07:52:47 -0200
Subject: tg-Cyrl-TJ and tk-TM

Thank you, Philippe, for your detailed explanation about default content. But my question was a little more basic. Sorry for not being explicit about it and for causing any confusion: I cannot find either tg or tk as a main bundle. Can someone point me to their LDML files?

Thanks

From verdy_p at wanadoo.fr Tue Feb 3 06:30:31 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 3 Feb 2015 13:30:31 +0100
Subject: tg-Cyrl-TJ and tk-TM

Sorry, I repeated the word "Taiwan" three times in the second list, where it should obviously have been Taiwan, Macau and Singapore.

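The "standard fallback resolution mechanism" referred to in this exchange can be sketched as progressive truncation of the requested tag, in the spirit of the RFC 4647 lookup scheme. This is a simplified illustration: the function names and the set of available bundles are assumptions made for the example, and the real lookup has extra rules (for instance for single-letter subtags) that are omitted here.

```python
def fallback_chain(tag):
    """List the tag followed by its successively truncated forms."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def pick_bundle(requested, available):
    """Return the most specific available bundle, or "root" if none matches."""
    for candidate in fallback_chain(requested):
        if candidate in available:
            return candidate
    return "root"


# Hypothetical set of bundles shipped by an application.
AVAILABLE = {"zh", "zh-Hant", "en"}

# pick_bundle("zh-Hant-TW", AVAILABLE) walks zh-Hant-TW, zh-Hant, zh
# and stops at "zh-Hant"; pick_bundle("zh-Hans-SG", AVAILABLE) falls
# through to "zh".
```

Note that plain truncation sends "zh-TW" to "zh", so a caller would normally add the likely script to the tag before truncating it.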
From emmo at us.ibm.com Tue Feb 3 08:00:00 2015
From: emmo at us.ibm.com (John Emmons)
Date: Tue, 3 Feb 2015 08:00:00 -0600
Subject: tg-Cyrl-TJ and tk-TM

They're both in http://unicode.org/repos/cldr/trunk/seed/main/ because we don't have enough confirmed data for these locales for them to be in "common". If we could find vetters willing to contribute data for these via the survey tool, then they would move from "seed" to "main" as part of our release process.

Regards,

John C. Emmons
Globalization Architect & Unicode CLDR TC Chairman
IBM Software Group
Internet: emmo at us.ibm.com

From verdy_p at wanadoo.fr Tue Feb 3 13:28:38 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 3 Feb 2015 20:28:38 +0100
Subject: tg-Cyrl-TJ and tk-TM

About this line in supplemental data:

Most data I've seen in Fiji Hindi is romanized (including Fiji Hindi Wikipedia [1] and its MediaWiki UI, or its local page about the language itself [2], with very little content contributed there in Devanagari). At least there should also be this additional line:

But my opinion is that this should be:

--- Philippe.

[1] http://hif.wikipedia.org/
[2] http://hif.wikipedia.org/wiki/Fiji_Hindi

From rxaviers at gmail.com Tue Feb 3 13:29:44 2015
From: rxaviers at gmail.com (Rafael Xavier)
Date: Tue, 3 Feb 2015 17:29:44 -0200
Subject: tg-Cyrl-TJ and tk-TM

Oh, OK! Thank you very much, John.

A related question: is that data supposed to be in the JSON full package? I don't see them there, so I'm assuming not.

From mark at macchiato.com Tue Feb 3 13:31:41 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Tue, 3 Feb 2015 20:31:41 +0100
Subject: tg-Cyrl-TJ and tk-TM

Please file a ticket about this.

Mark

On Tue, Feb 3, 2015 at 8:28 PM, Philippe Verdy wrote:
> About this line in supplemental data:
> [...]

From markus.icu at gmail.com Tue Feb 3 16:50:52 2015
From: markus.icu at gmail.com (Markus Scherer)
Date: Tue, 3 Feb 2015 14:50:52 -0800
Subject: 2015a metazone update?
In-Reply-To: <949A5656-33D3-407F-80C1-6580B2C3AF41@apple.com>
References: <949A5656-33D3-407F-80C1-6580B2C3AF41@apple.com>

Yoshito said on another list that he is working on it now.

markus

From cameron at lumoslabs.com Wed Feb 4 11:58:12 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Wed, 4 Feb 2015 09:58:12 -0800
Subject: Hyphenation

Hey cldr-users,

It is often the case, especially on smaller screens, that long words must be hyphenated so they wrap in a natural way. As far as I can tell, the CLDR data set does not define hyphenation rules. I'm not even really sure what the hyphenation rules should be for English.

The implementation I've seen uses a dictionary - maybe it's identifying potential breaks at syllable boundaries?

Thoughts?

-Cameron

From jkorpela at cs.tut.fi Wed Feb 4 12:57:38 2015
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Wed, 04 Feb 2015 20:57:38 +0200
Subject: Hyphenation
Message-ID: <54D26BA2.4070909@cs.tut.fi>

2015-02-04, 19:58, Cameron Dutro wrote:

> It is often the case, especially on smaller screens, that long words
> must be hyphenated so they wrap in a natural way. As far as I can tell,
> the CLDR data set does not define hyphenation rules.

That is correct. And they cannot really be described using the techniques currently deployed in CLDR.

> I'm not even really
> sure what the hyphenation rules should be for English.

They vary by version of English (and by authority).

> The implementation I've seen uses a dictionary - maybe it's identifying
> potential breaks at syllable boundaries?

Some simple hyphenators are dictionary-driven. But this does not work well even for English, since any word not in the dictionary would remain unhyphenated. It does not work well at all for languages that have, say, a thousand inflected forms for each verb or noun, but may have simple algorithmic rules for hyphenation.

Hyphenation strategies vary greatly by language. At present, the best you can do is to try to find suitable hyphenation software for the languages that are relevant to you.

Yucca

From cameron at lumoslabs.com Wed Feb 4 13:24:32 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Wed, 4 Feb 2015 11:24:32 -0800
Subject: Hyphenation
In-Reply-To: <54D26BA2.4070909@cs.tut.fi>
References: <54D26BA2.4070909@cs.tut.fi>

Thanks Jukka. I did some research and found out that LibreOffice (and OpenOffice) uses a dictionary-based approach via the Hunspell project. They have dictionaries for quite a few languages. Hunspell and TeX use an algorithm developed at Stanford in a dissertation by Franklin Liang that describes the format of such dictionaries and how to identify potential hyphen locations in text. I realize this won't work for all non-dictionary words, but Liang's algorithm purportedly does work for a great many of them.

I've attached a .pdf summary of how it works. Anyway, it's a place to start. CLDR could perhaps incorporate the hyphenation dictionaries from LibreOffice since I believe they're fairly permissively licensed.

-Cameron

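For readers unfamiliar with the pattern-based approach mentioned in this thread, here is a minimal Python sketch of Liang-style hyphenation. It is not Hunspell's or TeX's code. The nine patterns are an illustrative handful chosen so that one word works, whereas a real pattern file holds thousands, and the function names and the `left_min`/`right_min` defaults are assumptions made for the example.

```python
import re

# Illustrative patterns only. A digit scores the gap at its position:
# odd values allow a break there, even values forbid one.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split 'hy3ph' into its letters 'hyph' and gap scores [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    points = [0] * (len(letters) + 1)
    index = 0
    for ch in pattern:
        if ch.isdigit():
            points[index] = int(ch)
        else:
            index += 1
    return letters, points


def hyphenate(word, patterns=PATTERNS, left_min=2, right_min=2):
    """Return the pieces of `word` between the permitted break points."""
    dotted = "." + word.lower() + "."   # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)    # scores[k] is the gap before dotted[k]
    for pattern in patterns:
        letters, points = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            # Where patterns overlap, the highest score at each gap wins.
            for offset, value in enumerate(points):
                scores[start + offset] = max(scores[start + offset], value)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for pos in range(left_min, len(word) - right_min + 1):
        # The leading dot shifts word position pos to scores[pos + 1].
        if scores[pos + 1] % 2 == 1:
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return pieces
```

Calling `hyphenate("hyphenation")` with these patterns yields the pieces "hy", "phen" and "ation". A word that matches no pattern simply comes back whole, which is the limitation raised earlier in the thread for words the patterns do not cover.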
Name: tb87nemeth.pdf
Type: application/pdf
Size: 166735 bytes
Desc: not available

From georges at mhsoftware.com Thu Feb 5 20:26:57 2015
From: georges at mhsoftware.com (George Sexton)
Date: Thu, 05 Feb 2015 19:26:57 -0700
Subject: LDML data for en_IE
Message-ID: <54D42671.9060106@mhsoftware.com>

I'm looking at the LDML data for common/main/en_IE.xml. In this file, in the gregorian section, there is only a full date format entry.

As documented, somewhat ironically, in section 4 of Unicode Technical Standard #35, Unicode Locale Data Markup Language (LDML), a lookup for dateFormatLength short should follow inheritance. Thus the value would come from en.xml, which would be:

M/d/yy

However, examining the JSON file of CLDR data, main/en-IE/ca-gregorian.js, it contains:

"short": "dd/MM/y"

I've also had a person who is a native of that country inform me that M/d/yy is not correct.

Can someone help me understand why the LDML data implicitly contains (to my understanding) an incorrect definition of the short date format for the en-IE locale?

--
George Sexton
*MH Software, Inc.*
Voice: 303 438 9585
http://www.mhsoftware.com

From rxaviers at gmail.com Fri Feb 6 09:59:20 2015
From: rxaviers at gmail.com (Rafael Xavier)
Date: Fri, 6 Feb 2015 13:59:20 -0200
Subject: LDML data for en_IE
In-Reply-To: <54D42671.9060106@mhsoftware.com>
References: <54D42671.9060106@mhsoftware.com>

> Thus the value would come from en.xml, which would be:

Shouldn't it be en_GB.xml, which is its parent locale?

From georges at mhsoftware.com Fri Feb 6 10:21:03 2015
From: georges at mhsoftware.com (George Sexton)
Date: Fri, 06 Feb 2015 09:21:03 -0700
Subject: LDML data for en_IE
References: <54D42671.9060106@mhsoftware.com>
Message-ID: <54D4E9EF.8000101@mhsoftware.com>

On 2/6/2015 8:59 AM, Rafael Xavier wrote:
>> Thus the value would come from en.xml, which would be:
>
> Shouldn't it be en_GB.xml, which is its parent locale?

Gosh, I looked through the en_IE.xml file and there's no parentLocale element in the file. Surely the standard is better than to have useful, required inheritance data squirreled away in some uselessly named file like "supplementalData.xml" in an entirely different directory?

Seriously, parentLocale should be part of the identity block in the common/main/ll_CC.xml file. Not having it there is silly.

However, it would appear you're right.

From verdy_p at wanadoo.fr Sat Feb 14 09:11:42 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 14 Feb 2015 16:11:42 +0100
Subject: LDML data for en_IE
In-Reply-To: <54D4E9EF.8000101@mhsoftware.com>
References: <54D42671.9060106@mhsoftware.com> <54D4E9EF.8000101@mhsoftware.com>

The custom inheritance of "en-IE" from "en-GB" instead of "en" is a bit questionable. It may look convenient for the current needs within CLDR data itself, but it is an exception to the default inheritance from "en" that one would expect for more general data.

I fear that inserting the inheritance of "en-IE" first via "en-GB" before "en", this could generate unexpected issues in other applications that have highly customized their own "en-GB" data (outside CLDR data) in a way not compatible with "en-IE". If there a way in CLDR data to indicate that this custom inheritance is purely internal to CLDR data and that it does not apply as a standard for all kind of data that applications could need to remain separated for "en-GB" and "en-IE" (as described and assumed *by default* in standard BCP.47 fallback resolution mechanism) If there's no way to indicate that this is a purely internal inheritance for CLDR data itsefl, we should better duplicate the necessary data entries from "en-GB" into "en-IE" and maintain them separately (both witll still inherit by default from 'en", and "en" itself from "root"). This is safer for longer term maintenance even if there is some data duplication (but most duplication is already avoided by the data already inherited by "en-GB" from "en" and by the data inherited by "en" from "root"). At least, the duplication also allows saying that instead of being inherited (so with a local draft status), that data is "confirmed" in that locale (but instead of duplicating the data value, we would just insert the entry needed only to confirm that the value in that specialization comes from another referenced locale). So in the top level element of the "en-IE" locale: and for a specific entry: M/d/yy Or something similar (for completeness only, I added above the entries for status="default" but it should be implicit with BCP47 rules and is not really needed). 
The idea is to be able to track the confirmation status with a high level of granularity (not just for the whole locale), and to maintain alternate proposals in "unconfirmed" status other than the one with "draft" status (still not formally confirmed but having the best votes for now): applications may decide to discard "unconfirmed" entries, or could use them only as alternate solutions when there is no success with normal entries with implicit confirmed status or with default status, for example when trying to parse dates with a lenient parser; a strict date input parser would always reject input not matching the implicit "confirmed" format or the "default" format.
2015-02-06 17:21 GMT+01:00 George Sexton : > > On 2/6/2015 8:59 AM, Rafael Xavier wrote: > > Thus the value would come from en.xml, which would be: > > > Shouldn't it be en_GB.xml, which is its parent locale? > > > Gosh, I looked through the en_IE.xml file and there's no parentLocale > element in the file? Surely, the standard is better than to have some > useful inheritance data that's required squirreled away in some uselessly > named file like "supplementalData.xml" in an entirely different directory. > > Seriously, parentLocale should be part of the identity block in the > common/main/ll_CC.xml file. Not having it there is silly. > > However, it would appear you're right. > > > > On Fri, Feb 6, 2015 at 12:26 AM, George Sexton > wrote: > >> I'm looking at the LDML data for common/main/en_IE.xml. In this file, >> in the gregorian section there is only a full date format entry. >> >> As documented somewhat ironically in section 4 of Unicode Technical >> Standard #35 Unicode Locale Data Markup Language (LDML), a lookup for >> dateFormatLength short should follow inheritance.
Thus the value would come >> from en.xml, which would be: >> >> >> M/d/yy >> >> >> >> However examining the JSON file of cldr data, main/en-IE/ca-gregorian.js, >> it contains: >> >> "short": "dd/MM/y" >> >> I've also had a person who is a native of that country inform me that >> M/d/yy is not correct. >> >> Can someone help me understand why the LDML data implicitly contains (to >> my understanding) an incorrect definition of the short date format for the >> en-IE locale? >> >> >> >> -- >> George Sexton >> *MH Software, Inc.* >> Voice: 303 438 9585 >> http://www.mhsoftware.com >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> >> > > > -- > +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers > http://rafael.xavier.blog.br > > > _______________________________________________ > CLDR-Users mailing listCLDR-Users at unicode.orghttp://unicode.org/mailman/listinfo/cldr-users > > > -- > George Sexton > *MH Software, Inc.* > Voice: 303 438 9585 > http://www.mhsoftware.com > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > -------------- next part -------------- An HTML attachment was scrubbed... 
From verdy_p at wanadoo.fr Sat Feb 14 09:23:02 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sat, 14 Feb 2015 16:23:02 +0100 Subject: LDML data for en_IE In-Reply-To: References: <54D42671.9060106@mhsoftware.com> <54D4E9EF.8000101@mhsoftware.com> Message-ID:
Some other examples with "confirmed" (default status) inheritances: - in the "en" locale: or equivalently: (we don't need the default inheritance, but if we want it for completeness, add: - in the "en-UM" locale (specialization for the Minor Outlying Islands of the United States): (you'll need very few exceptions, most probably only for the applicable timezones or for the ITU international region selection code for phone, and the phone number format, but "en-US" itself does not have a single timezone)
2015-02-14 16:11 GMT+01:00 Philippe Verdy : > The custom inheritance of "en-IE" from "en-GB" instead of "en", is a bit > questionable > It may look convenient only for the current needs within CLDR data itself, > but it is an exception to the default inheritance from "en" that one would > expect for more general data. > I fear that inserting the inheritance of "en-IE" first via "en-GB" before > "en", this could generate unexpected issues in other applications that have > highly customized their own "en-GB" data (outside CLDR data) in a way not > compatible with "en-IE". > If there a way in CLDR data to indicate that this custom inheritance is > purely internal to CLDR data and that it does not apply as a standard for > all kind of data that applications could need to remain separated for > "en-GB" and "en-IE" (as described and assumed *by default* in standard > BCP.47 fallback resolution mechanism) > > If there's no way to indicate that this is a purely internal inheritance > for CLDR data itself, we should better duplicate the necessary data entries > from "en-GB" into "en-IE" and maintain them separately (both will still > inherit by default from 'en", and "en" itself from "root").
This is safer > for longer term maintenance even if there is some data duplication (but > most duplication is already avoided by the data already inherited by > "en-GB" from "en" and by the data inherited by "en" from "root"). > > At least, the duplication also allows saying that instead of being > inherited (so with a local draft status), that data is "confirmed" in that > locale (but instead of duplicating the data value, we would just insert the > entry needed only to confirm that the value in that specialization comes > from another referenced locale). > > So in the top level element of the "en-IE" locale: > > > > > and for a specific entry: > > > > > > M/d/yy > > > > Or something similar (for completeness only, I added above the entries for > status="default" but it should be implicit with BCP47 rules and is not > really needed). > The idea being to be able to track with high level of granualrity (not > just for the whole locale) the confirmation status and maintain alternate > proposals in "unconfirmed" status than the one with "draft" status (still > not confirmed formally but having the best votes for now : applications may > decide to discard "unconfirmed" entries, or could use it only as alternate > solutions when there's no succes with normal entries with implicit cofirmed > status or with default status, for example when trying to parse dates with > a lenient parser; a strict date input parser would always reject input not > matching the implicit "confirmed" format or the "default" format). > > > 2015-02-06 17:21 GMT+01:00 George Sexton : > >> >> On 2/6/2015 8:59 AM, Rafael Xavier wrote: >> >> Thus the value would come from en.xml, which would be: >> >> >> Shouldn't it be en_GB.xml, which is its parent locale? >> >> >> Gosh, I looked through the en_IE.xml file and there's no parentLocale >> element in the file? 
Surely, the standard is better than to have some >> useful inheritance data that's required squirreled away in some uselessly >> named file like "supplementalData.xml" in an entirely different directory. >> >> Seriously, parentLocale should be part of the identity block in the >> common/main/ll_CC.xml file. Not having it there is silly. >> >> However, it would appear you're right. >> >> >> >> On Fri, Feb 6, 2015 at 12:26 AM, George Sexton >> wrote: >> >>> I'm looking at the LDML data for common/main/en_IE.xml. In this file, >>> in the gregorian section there is only a full date format entry. >>> >>> As documented somewhat ironically in section 4 of Unicode Technical >>> Standard #35 Unicode Locale Data Markup Language (LDML), a lookup for >>> dateFormatLength short should follow inheritance. Thus the value would come >>> from en.xml, which would be: >>> >>> >>> M/d/yy >>> >>> >>> >>> However examining the JSON file of cldr data, >>> main/en-IE/ca-gregorian.js, it contains: >>> >>> "short": "dd/MM/y" >>> >>> I've also had a person who is a native of that country inform me that >>> M/d/yy is not correct. >>> >>> Can someone help me understand why the LDML data implicitly contains (to >>> my understanding) an incorrect definition of the short date format for the >>> en-IE locale? 
>>> >>> >>> >>> -- >>> George Sexton >>> *MH Software, Inc.* >>> Voice: 303 438 9585 >>> http://www.mhsoftware.com >>> >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users >>> >>> >> >> >> -- >> +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers >> http://rafael.xavier.blog.br >> >> >> _______________________________________________ >> CLDR-Users mailing listCLDR-Users at unicode.orghttp://unicode.org/mailman/listinfo/cldr-users >> >> >> -- >> George Sexton >> *MH Software, Inc.* >> Voice: 303 438 9585 >> http://www.mhsoftware.com >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at macchiato.com Sat Feb 14 09:38:26 2015 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Sat, 14 Feb 2015 16:38:26 +0100 Subject: LDML data for en_IE In-Reply-To: <54D4E9EF.8000101@mhsoftware.com> References: <54D42671.9060106@mhsoftware.com> <54D4E9EF.8000101@mhsoftware.com> Message-ID: The way that CLDR is structured, you can't assume that the only data relevant for a locale, or for the interpretation of that locale's data, is in that locale's common/main XML file. You really have to look at the other directories in common/*. That being said, we could probably do a better job of explaining the structure in LDML, and it might be worth adding an XML comment to the top of each of the data files to point back to that, to help prevent misunderstandings. If you agree, you might file a ticket. Mark *? Il meglio ? l?inimico del bene ?* On Fri, Feb 6, 2015 at 5:21 PM, George Sexton wrote: > > On 2/6/2015 8:59 AM, Rafael Xavier wrote: > > Thus the value would come from en.xml, which would be: > > > Shouldn't it be en_GB.xml, which is its parent locale? 
> > > Gosh, I looked through the en_IE.xml file and there's no parentLocale > element in the file? Surely, the standard is better than to have some > useful inheritance data that's required squirreled away in some uselessly > named file like "supplementalData.xml" in an entirely different directory. > > Seriously, parentLocale should be part of the identity block in the > common/main/ll_CC.xml file. Not having it there is silly. > > However, it would appear you're right. > > > > On Fri, Feb 6, 2015 at 12:26 AM, George Sexton > wrote: > >> I'm looking at the LDML data for common/main/en_IE.xml. In this file, >> in the gregorian section there is only a full date format entry. >> >> As documented somewhat ironically in section 4 of Unicode Technical >> Standard #35 Unicode Locale Data Markup Language (LDML), a lookup for >> dateFormatLength short should follow inheritance. Thus the value would come >> from en.xml, which would be: >> >> >> M/d/yy >> >> >> >> However examining the JSON file of cldr data, main/en-IE/ca-gregorian.js, >> it contains: >> >> "short": "dd/MM/y" >> >> I've also had a person who is a native of that country inform me that >> M/d/yy is not correct. >> >> Can someone help me understand why the LDML data implicitly contains (to >> my understanding) an incorrect definition of the short date format for the >> en-IE locale? 
>> >> >> >> -- >> George Sexton >> *MH Software, Inc.* >> Voice: 303 438 9585 >> http://www.mhsoftware.com >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> >> > > > -- > +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers > http://rafael.xavier.blog.br > > > _______________________________________________ > CLDR-Users mailing listCLDR-Users at unicode.orghttp://unicode.org/mailman/listinfo/cldr-users > > > -- > George Sexton > *MH Software, Inc.* > Voice: 303 438 9585 > http://www.mhsoftware.com > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at macchiato.com Sat Feb 14 09:39:50 2015 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Sat, 14 Feb 2015 16:39:50 +0100 Subject: LDML data for en_IE In-Reply-To: References: <54D42671.9060106@mhsoftware.com> <54D4E9EF.8000101@mhsoftware.com> Message-ID: CLDR quite deliberately has the inheritance it does for IE and GB (although we are tweaking that in the current release; look at trunk). These are far more related, both for CLDR and other translation work, than either is with US. Mark *? Il meglio ? l?inimico del bene ?* On Sat, Feb 14, 2015 at 4:11 PM, Philippe Verdy wrote: > The custom inheritance of "en-IE" from "en-GB" instead of "en", is a bit > questionable > It may look convenint only for the current needs within CLDR data itelf, > but it is an exception to the default inheritance from "en" that one would > expect for more general data. 
> I fear that inserting the inheritance of "en-IE" first via "en-GB" before > "en", this could generate unexpected issues in other applications that have > highly customized their own "en-GB" data (outside CLDR data) in a way not > compatible with "en-IE". > If there a way in CLDR data to indicate that this custom inheritance is > purely internal to CLDR data and that it does not apply as a standard for > all kind of data that applications could need to remain separated for > "en-GB" and "en-IE" (as described and assumed *by default* in standard > BCP.47 fallback resolution mechanism) > > If there's no way to indicate that this is a purely internal inheritance > for CLDR data itsefl, we should better duplicate the necessary data entries > from "en-GB" into "en-IE" and maintain them separately (both witll still > inherit by default from 'en", and "en" itself from "root"). This is safer > for longer term maintenance even if there is some data duplication (but > most duplication is already avoided by the data already inherited by > "en-GB" from "en" and by the data inherited by "en" from "root"). > > At least, the duplication also allows saying that instead of being > inherited (so with a local draft status), that data is "confirmed" in that > locale (but instead of duplicating the data value, we would just insert the > entry needed only to confirm that the value in that specialization comes > from another referenced locale). > > So in the top level element of the "en-IE" locale: > > > > > and for a specific entry: > > > > > > M/d/yy > > > > Or something similar (for completeness only, I added above the entries for > status="default" but it should be implicit with BCP47 rules and is not > really needed). 
> The idea being to be able to track with high level of granualrity (not > just for the whole locale) the confirmation status and maintain alternate > proposals in "unconfirmed" status than the one with "draft" status (still > not confirmed formally but having the best votes for now : applications may > decide to discard "unconfirmed" entries, or could use it only as alternate > solutions when there's no succes with normal entries with implicit cofirmed > status or with default status, for example when trying to parse dates with > a lenient parser; a strict date input parser would always reject input not > matching the implicit "confirmed" format or the "default" format). > > > 2015-02-06 17:21 GMT+01:00 George Sexton : > >> >> On 2/6/2015 8:59 AM, Rafael Xavier wrote: >> >> Thus the value would come from en.xml, which would be: >> >> >> Shouldn't it be en_GB.xml, which is its parent locale? >> >> >> Gosh, I looked through the en_IE.xml file and there's no parentLocale >> element in the file? Surely, the standard is better than to have some >> useful inheritance data that's required squirreled away in some uselessly >> named file like "supplementalData.xml" in an entirely different directory. >> >> Seriously, parentLocale should be part of the identity block in the >> common/main/ll_CC.xml file. Not having it there is silly. >> >> However, it would appear you're right. >> >> >> >> On Fri, Feb 6, 2015 at 12:26 AM, George Sexton >> wrote: >> >>> I'm looking at the LDML data for common/main/en_IE.xml. In this file, >>> in the gregorian section there is only a full date format entry. >>> >>> As documented somewhat ironically in section 4 of Unicode Technical >>> Standard #35 Unicode Locale Data Markup Language (LDML), a lookup for >>> dateFormatLength short should follow inheritance. 
Thus the value would come >>> from en.xml, which would be: >>> >>> >>> M/d/yy >>> >>> >>> >>> However examining the JSON file of cldr data, >>> main/en-IE/ca-gregorian.js, it contains: >>> >>> "short": "dd/MM/y" >>> >>> I've also had a person who is a native of that country inform me that >>> M/d/yy is not correct. >>> >>> Can someone help me understand why the LDML data implicitly contains (to >>> my understanding) an incorrect definition of the short date format for the >>> en-IE locale? >>> >>> >>> >>> -- >>> George Sexton >>> *MH Software, Inc.* >>> Voice: 303 438 9585 >>> http://www.mhsoftware.com >>> >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users >>> >>> >> >> >> -- >> +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers >> http://rafael.xavier.blog.br >> >> >> _______________________________________________ >> CLDR-Users mailing listCLDR-Users at unicode.orghttp://unicode.org/mailman/listinfo/cldr-users >> >> >> -- >> George Sexton >> *MH Software, Inc.* >> Voice: 303 438 9585 >> http://www.mhsoftware.com >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> >> > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rxaviers at gmail.com Mon Feb 16 10:44:44 2015 From: rxaviers at gmail.com (Rafael Xavier) Date: Mon, 16 Feb 2015 14:44:44 -0200 Subject: Bundle Lookup In-Reply-To: References:

<548A47BE.5080900@gmail.com>

Message-ID: For the record, Language Matching documentation improvements coming up. "... To make it clear that the recommended methodology for Bundle lookup is to use Language Matching": http://www.unicode.org/cldr/trac/ticket/8067 On Wed, Jan 14, 2015 at 2:04 PM, Rafael Xavier wrote: > Hello everyone, > > It's clear to me there are docs improvements coming up. But, giving the > fact I'm still digging into it (testing LangageMatching as suggested for > bundle lookup matcher), I would like to share my findings with you. It's > still a draft. But, it has a suggestion that I think it would suite better > than LanguageMatching for bundle lookup matcher purposes. > > > https://docs.google.com/document/d/1qLbuz659VvCVhgyd08KRP0SMuqCvK9bSS3-0W-kMuuw/edit?usp=sharing > > On Fri, Dec 12, 2014 at 7:31 PM, Rafael Xavier wrote: > >> Looking forward to hearing how that shall work. >> >> Thank you very much so far. >> >> On Fri, Dec 12, 2014 at 6:27 PM, Mark Davis [image: ?]? < >> mark at macchiato.com> wrote: >>> >>> >>> >>> >>> Mark >>> >>> *? Il meglio ? l?inimico del bene ?* >>> >>> On Fri, Dec 12, 2014 at 7:48 PM, Rafael Xavier >>> wrote: >>> >>>> Mark, >>>> >>>> Giving an arbitrary locale ID, the recommended and only process to >>>> deduce its respective bundle (reliably) is through Language Matching. >>>> >>>> Is that true? >>>> >>> >>> ?As I said: " >>> That being said, often people don't understand language matching, and so >>> we are in the process of adding more information so that there is a direct >>> mapping from between locale IDs that are always considered to be >>> "identical" on a deep level, like en-GB and en-Latn-GB. >>> ?"? >>> ? >>> >>> >>>> >>>> Considering all bundles are always present, isn't there any less >>>> expensive algorithm that could be recommended? >>>> >>>> Thank you. >>>> >>>> >>>> PS: My use case is a little different. I have *n* distributions of my >>>> application. On each distribution, it's embedded with a different locale. 
>>>> So, I don't need the full power of Language Matching on what's regard >>>> having an arbitrary list of desired locales vs an aribtrary list of >>>> available locales. Anyway, I do want my application to look up for the >>>> right bundle given a locale (e.g., `zh-Hans-TW` when given `zh-TW`). >>>> >>>> On Fri, Dec 12, 2014 at 2:50 PM, Mark Davis [image: ?]? < >>>> mark at macchiato.com> wrote: >>>>> >>>>> I also want to be clear that there are two closely-related but very >>>>> different tasks. >>>>> >>>>> 1. *Inherited item lookup. *Given that you have a CLDR resource >>>>> bundle, with inheritance, where do I go to get inherited items? >>>>> >>>>> That is specified by CLDR by means of the parentLocale + truncation >>>>> algorithm, plus the alias element. (There are a few cases where we have >>>>> "Lateral Inheritance" where the specification is in the text of LDML, >>>>> such as when looking for an alt variant.) >>>>> >>>>> So back to Rafael's original question: >>>>> >>>>> 1. en-Latn-GB, and zh-TW are not CLDR bundles, so this doesn't >>>>> apply to them. >>>>> 2. en-US-u-nu-usd: the u-nu-usd doesn't select within a bundle, >>>>> but rather customizes a service that uses information in the bundle. The >>>>> item lookup (using by the currency formatting service) would be en-US >>>>> => en => root. >>>>> >>>>> >>>>> 2. *Bundle lookup. *Given a locale ID, where do I get the best >>>>> matching CLDR bundle? >>>>> >>>>> My application has a set of supported locales, and the user comes in >>>>> with a set of desired locales. What is the best bundle for that user? >>>>> >>>>> Here we are not as clear as we should be. The recommended process is in >>>>> http://www.unicode.org/reports/tr35/#LanguageMatching >>>>> >>>>> So back to Rafael's original question: >>>>> >>>>> 1. en-Latn-GB, and zh-TW. When these are looked up with Language >>>>> Matching, assuming that all the CLDR locales are available, they would >>>>> return, respectively, en-GB and zh-Hant-TW. 
>>>>> >>>>> That being said, often people don't understand language matching, and >>>>> so we are in the process of adding more information so that there is a >>>>> direct mapping from between locale IDs that are always considered to >>>>> be "identical" on a deep level, like en-GB and en-Latn-GB. >>>>> >>>>> >>>>> >>>>> Mark >>>>> >>>>> *? Il meglio ? l?inimico del bene ?* >>>>> >>>>> On Fri, Dec 12, 2014 at 5:04 PM, John Emmons wrote: >>>>> >>>>>> Yes, Edward, there is a very good reason we don't want zh-Hant to >>>>>> inherit from zh. Simply put, in situations where you have locale resources >>>>>> that aren't 100% populated, allowing zh-Hant to inherit from zh produces a >>>>>> mixture of simplified and traditional Chinese, which is acceptable to no >>>>>> one. This is what we call "cross script inheritance" in CLDR. While it >>>>>> might be acceptable to some in the case of Chinese, it is certainly a >>>>>> bigger problem in languages like Serbian, where you have both Latin and >>>>>> Cyrillic scripts in use, and you certainly don't ever want a mixture of >>>>>> Latin and Cyrillic scripts >>>>>> >>>>>> These relationships are documented in CLDR's supplemental data, where >>>>>> you have specified: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Regards, >>>>>> >>>>>> John C. Emmons >>>>>> Globalization Architect & Unicode CLDR TC Chairman >>>>>> IBM Software Group >>>>>> Internet: emmo at us.ibm.com >>>>>> >>>>>> >>>>>> [image: Inactive hide details for Edwin Hoogerbeets ---12/11/2014 >>>>>> 07:41:26 PM---Rafael, also take a look at common/supplemental/likelyS]Edwin >>>>>> Hoogerbeets ---12/11/2014 07:41:26 PM---Rafael, also take a look at >>>>>> common/supplemental/likelySubtags.xml. 
If the caller has passed you an i >>>>>> >>>>>> From: Edwin Hoogerbeets >>>>>> To: John Emmons/Austin/IBM at IBMUS, Rafael Xavier >>>>>> Cc: J?rn Zaefferer , " >>>>>> cldr-users at unicode.org" >>>>>> Date: 12/11/2014 07:41 PM >>>>>> Subject: Re: Bundle Lookup >>>>>> ------------------------------ >>>>>> >>>>>> >>>>>> >>>>>> Rafael, also take a look at common/supplemental/likelySubtags.xml. If >>>>>> the caller has passed you an incompletely specified locale, you can use >>>>>> those mappings to see if you can get to a locale for which you do have a >>>>>> string bundle. I think that is the source for the "language aliases" to >>>>>> which John was referring. >>>>>> >>>>>> John, for the last part of your example zh-TW inheritance chain, >>>>>> wouldn't you just truncate "zh-Hant" again to "zh" like in the en-GB >>>>>> example before inheriting from the root? If not, what is the reasoning >>>>>> there? Is there already a document that specifies the inheritance rules in >>>>>> CLDR? >>>>>> >>>>>> For efficiency, I can imagine you would put the common translations >>>>>> in "zh" where there is no difference between traditional and simplified, >>>>>> and other translations in "zh-Hant" or "zh-Hans" where there is. That would >>>>>> save some disk space and you could leverage linguistic bug fixes at the >>>>>> "zh" level. For other locales like "sr-Latn" and "sr-Cyrl" there would be >>>>>> nothing in common so the string bundle at the "sr" level would be >>>>>> essentially empty, but it should still appear in the inheritance chain just >>>>>> in case. >>>>>> >>>>>> Edwin >>>>>> >>>>>> >>>>>> On 12/11/2014 02:53 PM, John Emmons wrote: >>>>>> >>>>>> >>>>>> #3 is currently a problem, which we are working on. Basically, >>>>>> "Latn" needs to be stripped out because it isn't necessary. Then follow >>>>>> the normal inheritance: >>>>>> >>>>>> en-GB: en-GB ? (parentLocale) en-001 ? (truncation) en ? 
root >>>>>> >>>>>> #4 - Any unicode locale extensions are meant to identify >>>>>> particular behaviors that are desired in the context of a given locale. >>>>>> Think of them like "options". They are not meant to be used in the context >>>>>> of bundle lookups. >>>>>> >>>>>> #5 - zh_TW - Now that proper language aliases are in place ( See >>>>>> *http://unicode.org/cldr/trac/ticket/5949* >>>>>> ) >>>>>> >>>>>> zh-TW: zh-TW ? (languageAlias) zh-Hant-TW ? (truncation) zh-Hant >>>>>> (parentLocale) ? root >>>>>> >>>>>> Regards, >>>>>> >>>>>> John C. Emmons >>>>>> Globalization Architect & Unicode CLDR TC Chairman >>>>>> IBM Software Group >>>>>> Internet: *emmo at us.ibm.com* >>>>>> >>>>>> >>>>>> [image: Inactive hide details for Rafael Xavier ---12/11/2014 >>>>>> 01:02:57 PM---Friends, This is a very basic question. See below. There ar]Rafael >>>>>> Xavier ---12/11/2014 01:02:57 PM---Friends, This is a very basic question. >>>>>> See below. There are lots of documentation >>>>>> >>>>>> From: Rafael Xavier ** >>>>>> To: *"cldr-users at unicode.org"* >>>>>> ** >>>>>> Cc: J?rn Zaefferer ** >>>>>> >>>>>> Date: 12/11/2014 01:02 PM >>>>>> Subject: Bundle Lookup >>>>>> Sent by: "CLDR-Users" ** >>>>>> >>>>>> >>>>>> ------------------------------ >>>>>> >>>>>> >>>>>> >>>>>> Friends, >>>>>> >>>>>> This is a very basic question. See below. There are lots of >>>>>> documentation about locale inheritance and matching. But, it fails in same >>>>>> cases to me. >>>>>> >>>>>> * Giving a locale, what's the procedure to find the **bundle** lookup >>>>>> chain?* >>>>>> >>>>>> 1. en-US: en-US ? (truncation) en ? root >>>>>> >>>>>> This one is dead simple. No problem. >>>>>> >>>>>> 2. en-GB: en-GB ? (parentLocale) en-001 ? (truncation) en ? root >>>>>> >>>>>> This one is also dead simple. Although, documentation says en-GB >>>>>> ? en. Is it outdated or am I doing something wrong? >>>>>> >>>>>> Anyway, the ones I'm interested in knowing are: >>>>>> >>>>>> 3. en-Latn-GB >>>>>> 4. 
en-US-u-nu-usd >>>>>> 5. zh-TW >>>>>> >>>>>> Please, could someone show me what's the chain of these locales >>>>>> (and obviously explain the steps)? >>>>>> >>>>>> Thanks! >>>>>> >>>>>> -- >>>>>> *+55 (16) 98138-1582* <%2B55%20%2816%29%2098138-1582>, *+1 (415) >>>>>> 568-5854* <%2B1%20%28415%29%20568-5854>, skype: rxaviers >>>>>> *http://rafael.xavier.blog.br* >>>>>> _______________________________________________ >>>>>> CLDR-Users mailing list >>>>>> *CLDR-Users at unicode.org* >>>>>> *http://unicode.org/mailman/listinfo/cldr-users* >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> CLDR-Users mailing list >>>>>> *CLDR-Users at unicode.org* >>>>>> *http://unicode.org/mailman/listinfo/cldr-users* >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> CLDR-Users mailing list >>>>>> CLDR-Users at unicode.org >>>>>> http://unicode.org/mailman/listinfo/cldr-users >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers >>>> http://rafael.xavier.blog.br >>>> >>> >>> >> >> -- >> +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers >> http://rafael.xavier.blog.br >> > > > > -- > +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers > http://rafael.xavier.blog.br > -- +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers http://rafael.xavier.blog.br -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
From rxaviers at gmail.com Wed Feb 18 14:27:29 2015 From: rxaviers at gmail.com (Rafael Xavier) Date: Wed, 18 Feb 2015 18:27:29 -0200 Subject: Ecma-402 Proposal for fixing its LookupMatcher algorithm Message-ID:
Hello friends, I've submitted one more Ecma-402 proposal, and I would certainly like to hear your thoughts on it. In brief, I believe that the Ecma-402 LookupMatcher (9.2.2 and 9.2.3) fails to perform that task in some cases, which I describe at: https://github.com/tc39/ecma402/pull/3 Feedback is welcome, -- +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers http://rafael.xavier.blog.br
From James_Lin at symantec.com Fri Feb 27 12:14:57 2015 From: James_Lin at symantec.com (James Lin) Date: Fri, 27 Feb 2015 10:14:57 -0800 Subject: BIDI percentage sign Message-ID:
Hi, I looked through Unicode Standard Annex #9 and was unable to find out whether the percentage sign "%" should sit to the LEFT or to the RIGHT of the numeric characters. My understanding is that if the number is written in Latin or Western Arabic digits, 1 2 3 4 5 6 7 8 9 0, the "%" sign should be on the RIGHT: 12%, 54%; for Eastern Arabic digits, the "%" sign should be on the LEFT: %??? Is this correct? Thank you -James
From srl at icu-project.org Fri Feb 27 12:29:13 2015 From: srl at icu-project.org (Steven R. Loomis) Date: Fri, 27 Feb 2015 10:29:13 -0800 Subject: BIDI percentage sign In-Reply-To: References: Message-ID: <54F0B779.20704@icu-project.org>
Related: http://unicode.org/cldr/trac/ticket/7969 http://unicode.org/cldr/trac/ticket/7895 On 2/27/2015 10:14 AM, James Lin wrote: > Hi > I looked through the Unicode standard Annex #9 and unable to find out if percentage sign "%" should reside on the LEFT of the numeric character or RIGHT?
> > My understanding is if the numeric is in Latin or Western Arabic number, 1 2 3 4 5 6 7 8 9 0, "%" sign should be on the RIGHT: 12%, 54%; For Eastern Arabic, "%" sign should be on the LEFT: %??? > > Is this correct? > > Thank you > -James > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users -- IBMer but all opinions are mine. // GPG: 9731166CD8E23A83BEE7C6D3ACA5DBE1FD8FABF1 https://www.ohloh.net/accounts/srl295 // https://ssl.icu-project.org/trac/wiki/Srl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: OpenPGP digital signature URL: From James_Lin at symantec.com Fri Feb 27 12:40:41 2015 From: James_Lin at symantec.com (James Lin) Date: Fri, 27 Feb 2015 10:40:41 -0800 Subject: BIDI percentage sign In-Reply-To: <54F0B779.20704@icu-project.org> References: <54F0B779.20704@icu-project.org> Message-ID: Nice, a ticket is open. Can I get notification once this is fixed? Thanks Steven. -----Original Message----- From: CLDR-Users [mailto:cldr-users-bounces at unicode.org] On Behalf Of Steven R. Loomis Sent: Friday, February 27, 2015 10:29 AM To: cldr-users at unicode.org Subject: Re: BIDI percentage sign Related: http://unicode.org/cldr/trac/ticket/7969 http://unicode.org/cldr/trac/ticket/7895 On 2/27/2015 10:14 AM, James Lin wrote: > Hi > I looked through the Unicode standard Annex #9 and unable to find out if percentage sign "%" should reside on the LEFT of the numeric character or RIGHT? > > My understanding is if the numeric is in Latin or Western Arabic number, 1 2 3 4 5 6 7 8 9 0, "%" sign should be on the RIGHT: 12%, 54%; For Eastern Arabic, "%" sign should be on the LEFT: %??? > > Is this correct? 
> > Thank you > -James > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users -- IBMer but all opinions are mine. // GPG: 9731166CD8E23A83BEE7C6D3ACA5DBE1FD8FABF1 https://www.ohloh.net/accounts/srl295 // https://ssl.icu-project.org/trac/wiki/Srl From shervinafshar at gmail.com Fri Feb 27 12:47:05 2015 From: shervinafshar at gmail.com (Shervin Afshar) Date: Fri, 27 Feb 2015 10:47:05 -0800 Subject: BIDI percentage sign In-Reply-To: References: <54F0B779.20704@icu-project.org> Message-ID: You're added to the CC for both. ? Shervin On Fri, Feb 27, 2015 at 10:40 AM, James Lin wrote: > Nice, a ticket is open. > > Can I get notification once this is fixed? > > Thanks Steven. > > -----Original Message----- > From: CLDR-Users [mailto:cldr-users-bounces at unicode.org] On Behalf Of > Steven R. Loomis > Sent: Friday, February 27, 2015 10:29 AM > To: cldr-users at unicode.org > Subject: Re: BIDI percentage sign > > Related: > http://unicode.org/cldr/trac/ticket/7969 > http://unicode.org/cldr/trac/ticket/7895 > > > On 2/27/2015 10:14 AM, James Lin wrote: > > Hi > > I looked through the Unicode standard Annex #9 and unable to find out if > percentage sign "%" should reside on the LEFT of the numeric character or > RIGHT? > > > > My understanding is if the numeric is in Latin or Western Arabic number, > 1 2 3 4 5 6 7 8 9 0, "%" sign should be on the RIGHT: 12%, 54%; For > Eastern Arabic, "%" sign should be on the LEFT: %??? > > > > Is this correct? > > > > Thank you > > -James > > > > _______________________________________________ > > CLDR-Users mailing list > > CLDR-Users at unicode.org > > http://unicode.org/mailman/listinfo/cldr-users > > -- > IBMer but all opinions are mine. 
// GPG: > 9731166CD8E23A83BEE7C6D3ACA5DBE1FD8FABF1 > https://www.ohloh.net/accounts/srl295 // > https://ssl.icu-project.org/trac/wiki/Srl > > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Fri Feb 27 12:51:46 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Fri, 27 Feb 2015 19:51:46 +0100 Subject: BIDI percentage sign In-Reply-To: References: Message-ID: And why Mareicans are putting the currency unit symbol to the right ? It is still read *after* the amount... The only reason I see is to avoid adding an initial digit when the amount is written over a blank space. You can't add a digit after only because you also add the decimal separator and subunits, or because you write these subunits with a small fraction, or in superscript. My feeling is that this is a purely typographical tradition and it is not related to the way you read it aloud. For other measurement units, the unit symbol is placed after the number, not before. This has nothing to do with the Bidi ordering : that symbol preserves its existing ordering even if you place it after or before by the choice of the redactor and his perception of traditions. Number figures use a different system than the rest of the text. 2015-02-27 19:14 GMT+01:00 James Lin : > Hi > I looked through the Unicode standard Annex #9 and unable to find out if > percentage sign "%" should reside on the LEFT of the numeric character or > RIGHT? > > My understanding is if the numeric is in Latin or Western Arabic number, 1 > 2 3 4 5 6 7 8 9 0, "%" sign should be on the RIGHT: 12%, 54%; For Eastern > Arabic, "%" sign should be on the LEFT: %??? > > Is this correct? 
> > Thank you > -James > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shervinafshar at gmail.com Fri Feb 27 13:01:27 2015 From: shervinafshar at gmail.com (Shervin Afshar) Date: Fri, 27 Feb 2015 11:01:27 -0800 Subject: BIDI percentage sign In-Reply-To: References:

Message-ID: Mareicans?! Moroccans? Americans? Martians? ? Shervin On Fri, Feb 27, 2015 at 10:51 AM, Philippe Verdy wrote: > And why Mareicans are putting the currency unit symbol to the right ? It > is still read *after* the amount... > The only reason I see is to avoid adding an initial digit when the amount > is written over a blank space. You can't add a digit after only because you > also add the decimal separator and subunits, or because you write these > subunits with a small fraction, or in superscript. My feeling is that this > is a purely typographical tradition and it is not related to the way you > read it aloud. > For other measurement units, the unit symbol is placed after the number, > not before. This has nothing to do with the Bidi ordering : that symbol > preserves its existing ordering even if you place it after or before by the > choice of the redactor and his perception of traditions. Number figures use > a different system than the rest of the text. > > 2015-02-27 19:14 GMT+01:00 James Lin : > >> Hi >> I looked through the Unicode standard Annex #9 and unable to find out if >> percentage sign "%" should reside on the LEFT of the numeric character or >> RIGHT? >> >> My understanding is if the numeric is in Latin or Western Arabic number, >> 1 2 3 4 5 6 7 8 9 0, "%" sign should be on the RIGHT: 12%, 54%; For >> Eastern Arabic, "%" sign should be on the LEFT: %??? >> >> Is this correct? >> >> Thank you >> -James >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From verdy_p at wanadoo.fr Fri Feb 27 13:27:00 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Fri, 27 Feb 2015 20:27:00 +0100 Subject: BIDI percentage sign In-Reply-To: References:

Message-ID: Sorry for the letter inversions, Americans. 2015-02-27 20:01 GMT+01:00 Shervin Afshar : > Mareicans?! Moroccans? Americans? Martians? > > ? Shervin > > On Fri, Feb 27, 2015 at 10:51 AM, Philippe Verdy > wrote: > >> And why Mareicans are putting the currency unit symbol to the right ? It >> is still read *after* the amount... >> The only reason I see is to avoid adding an initial digit when the amount >> is written over a blank space. You can't add a digit after only because you >> also add the decimal separator and subunits, or because you write these >> subunits with a small fraction, or in superscript. My feeling is that this >> is a purely typographical tradition and it is not related to the way you >> read it aloud. >> For other measurement units, the unit symbol is placed after the number, >> not before. This has nothing to do with the Bidi ordering : that symbol >> preserves its existing ordering even if you place it after or before by the >> choice of the redactor and his perception of traditions. Number figures use >> a different system than the rest of the text. >> >> 2015-02-27 19:14 GMT+01:00 James Lin : >> >>> Hi >>> I looked through the Unicode standard Annex #9 and unable to find out if >>> percentage sign "%" should reside on the LEFT of the numeric character or >>> RIGHT? >>> >>> My understanding is if the numeric is in Latin or Western Arabic number, >>> 1 2 3 4 5 6 7 8 9 0, "%" sign should be on the RIGHT: 12%, 54%; For >>> Eastern Arabic, "%" sign should be on the LEFT: %??? >>> >>> Is this correct? >>> >>> Thank you >>> -James >>> >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users >>> >> >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shervinafshar at gmail.com Fri Feb 27 15:29:29 2015 From: shervinafshar at gmail.com (Shervin Afshar) Date: Fri, 27 Feb 2015 13:29:29 -0800 Subject: Why $ appears on the left side of value? (was: Re: BIDI percentage sign) Message-ID: As far as common knowledge goes[1], this is purely a matter of convention. But in some banking contexts, I've seen currency values written as follows: $200 USD $200 CAD $200 AUD 200$00 CVE €200 EUR ₪200 ILS My take on this is that redundancy is used here to avoid ambiguity. [1]: http://english.stackexchange.com/questions/11326/what-is-the-difference-between-20-and-20 ? Shervin On Fri, Feb 27, 2015 at 10:51 AM, Philippe Verdy wrote: > And why Mareicans are putting the currency unit symbol to the right ? It > is still read *after* the amount... > The only reason I see is to avoid adding an initial digit when the amount > is written over a blank space. You can't add a digit after only because you > also add the decimal separator and subunits, or because you write these > subunits with a small fraction, or in superscript. My feeling is that this > is a purely typographical tradition and it is not related to the way you > read it aloud. > For other measurement units, the unit symbol is placed after the number, > not before. This has nothing to do with the Bidi ordering : that symbol > preserves its existing ordering even if you place it after or before by the > choice of the redactor and his perception of traditions. Number figures use > a different system than the rest of the text. > > 2015-02-27 19:14 GMT+01:00 James Lin : > >> Hi >> I looked through the Unicode standard Annex #9 and unable to find out if >> percentage sign "%" should reside on the LEFT of the numeric character or >> RIGHT? >> >> My understanding is if the numeric is in Latin or Western Arabic number, >> 1 2 3 4 5 6 7 8 9 0, "%" sign should be on the RIGHT: 12%, 54%; For >> Eastern Arabic, "%" sign should be on the LEFT: %??? >> >> Is this correct? 
>> >> Thank you >> -James >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
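Editorial sketch, not part of the original messages: the thread above asks whether Annex #9 decides on which side of a number "%" belongs. One way to look at the raw inputs the Bidi algorithm works from is to print the Bidi class of each character involved. The snippet below assumes Python's standard `unicodedata` module; the `SAMPLES` table and the `bidi_classes` helper are hypothetical names introduced only for this illustration.

```python
import unicodedata

# Characters discussed in the thread, keyed by an illustrative label.
# (The labels and this table are assumptions made for the sketch.)
SAMPLES = {
    "percent sign": "%",
    "arabic percent sign": "\u066a",
    "european digit one": "1",
    "arabic-indic digit one": "\u0661",
    "dollar sign": "$",
}


def bidi_classes(samples):
    """Map each label to the Bidi class reported by the Unicode database."""
    return {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}


if __name__ == "__main__":
    for label, bidi_class in bidi_classes(SAMPLES).items():
        print(f"{label}: {bidi_class}")
```

The classes only describe how characters already present in a string are reordered for display. Which side of the number the sign is typed on looks like a per-locale formatting convention, which would match Philippe's remark and the CLDR tickets Steven linked.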