From unicode at unicode.org Mon Dec 2 06:01:52 2019
From: unicode at unicode.org (Costello, Roger L. via Unicode)
Date: Mon, 2 Dec 2019 12:01:52 +0000
Subject: A neat description of encoding characters
Message-ID:

From the book titled "Computer Power and Human Reason" by Joseph Weizenbaum, p.74-75

Suppose that the alphabet with which we wish to concern ourselves consists of 256 distinct symbols. Imagine that we have a deck of 256 cards, each of which has a distinct symbol of our alphabet printed on it, and, of course, such that there corresponds one card to each symbol. How many questions that can be answered "yes" or "no" would one have to ask, given one card randomly selected from the deck, in order to be able to decide which character is printed on the card?

We can certainly make the decision by asking at most 256 questions. We can somehow order the symbols and begin by asking if it is the first in our ordering, e.g., "Is it an uppercase A?" If the answer is "no," then we ask if it is the second, and so on. But if our ordering is known both to ourselves and to our respondent, there is a much more economical way of organizing our questioning. We ask whether the character we are seeking is in the first half of the set. Whatever the answer, we will have isolated a set of 128 characters among which the character we seek resides. We again ask whether it is in the first half of that smaller set, and so on. Proceeding in this way, we are bound to discover what character is printed on the selected card by asking exactly eight questions.

We could have recorded the answers we received to our questions by writing "1" whenever the answer was "yes" and "0" whenever it was "no." That record would then consist of eight so-called bits each of which is either "1" or "0". This eight-bit string is then an unambiguous representation of the character we are seeking. Moreover, each character of the whole set has a unique eight-bit representation within the same ordering.

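The questioning procedure in the passage can be sketched in a few lines of Python. This is only an illustration of the halving idea: the function names `encode` and `decode` and the use of a 0-based position in the ordering are assumptions made for the sketch, not anything taken from the book.

```python
def encode(symbol_index: int, alphabet_size: int = 256) -> str:
    """Record the yes/no answers ("1"/"0") that isolate one symbol.

    symbol_index is the symbol's 0-based position in the agreed ordering.
    """
    lo, hi = 0, alphabet_size          # candidates are positions lo .. hi-1
    answers = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if symbol_index < mid:         # "Is it in the first half?" -> yes
            answers.append("1")
            hi = mid
        else:                          # -> no, so it is in the second half
            answers.append("0")
            lo = mid
    return "".join(answers)


def decode(answers: str, alphabet_size: int = 256) -> int:
    """Replay a record of answers to recover the symbol's position."""
    lo, hi = 0, alphabet_size
    for answer in answers:
        mid = (lo + hi) // 2
        if answer == "1":
            hi = mid
        else:
            lo = mid
    return lo


if __name__ == "__main__":
    records = [encode(i) for i in range(256)]
    print(len(set(records)), {len(r) for r in records})  # 256 distinct records, all of length 8
    print(encode(65), decode(encode(65)))                # 10111110 65
```

With "yes" written as "1" for the first half, the record for a position turns out to be the bitwise complement of that position written in binary, which is one way to see that the eight answers are just a binary number in disguise.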
From unicode at unicode.org Mon Dec 2 08:49:07 2019
From: unicode at unicode.org (梁海 Liang Hai via Unicode)
Date: Mon, 2 Dec 2019 22:49:07 +0800
Subject: A neat description of encoding characters
In-Reply-To:
References:
Message-ID:

Grrr… It’s an okayish analog for binary numbers, but not really relevant to character encoding. Encoded characters are just assigned integers, which could in turn be represented in any base.

The binary nature of computers’ way of storing numbers does not have much to do with how character encoding works, unless you really want to start explaining character encoding with such basic ideas as “What is electricity?”, “What is a computer?”, …

Best,
梁海 Liang Hai
https://lianghai.github.io

From unicode at unicode.org Mon Dec 2 01:04:34 2019
From: unicode at unicode.org (विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode)
Date: Mon, 2 Dec 2019 12:34:34 +0530
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
Message-ID:

bcc: as an FYI - plz respond on the unicode mailing list as needed.

namaste!

Sanskrit has traditionally been written in a variety of scripts ranging from Sharada to Grantha. In the past two centuries, it has been written in Latin based scripts as well (please see https://en.wikipedia.org/wiki/Devanagari_transliteration ). We would like these Latin based scripts (IAST, ISO 15919, Kyoto-Harvard, ITRANS, Velthuis, SLP1, WX, National Library at Kolkata romanisation) to be included in the https://unicode.org/iso15924/iso15924-codes.html list.

The reason is that we would like to be able to present Sanskrit text in a variety of scripts and representations (see related thread) - and search engines like Google recommend using ISO 15924 to specify the script. Please guide us as to how to proceed.

--
--
Vishvas /????????

From unicode at unicode.org Mon Dec 2 09:58:53 2019
From: unicode at unicode.org (James Tauber via Unicode)
Date: Mon, 2 Dec 2019 10:58:53 -0500
Subject: A neat description of encoding characters
In-Reply-To:
References:
Message-ID:

Indeed. Unicode separates:

(1) selecting a character repertoire;
(2) assigning each character a numerical character code;
(3) choosing an encoding form to represent those character codes as code units (made up of bytes).

(2) and (3) are not conflated.

James

--
James Tauber
Eldarion | Scaife Viewer | jktauber.com (Greek Linguistics) | Modelling Music | Digital Tolkien

From unicode at unicode.org Mon Dec 2 10:40:12 2019
From: unicode at unicode.org (Roozbeh Pournader via Unicode)
Date: Mon, 2 Dec 2019 08:40:12 -0800
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To:
References:
Message-ID:

You don't need an ISO 15924 script code. You need to think in terms of BCP 47. Sanskrit in Latin would be sa-Latn.

Now, if you want to distinguish the different transcription systems for writing Sanskrit in Latin, you can apply to register a BCP 47 variant. There is also BCP 47 extension T, which may also be useful to you:

https://tools.ietf.org/html/rfc6497

From unicode at unicode.org Mon Dec 2 11:09:02 2019
From: unicode at unicode.org (Markus Scherer via Unicode)
Date: Mon, 2 Dec 2019 09:09:02 -0800
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To:
References:
Message-ID:

On Mon, Dec 2, 2019 at 8:42 AM Roozbeh Pournader via Unicode <unicode at unicode.org> wrote:

> You don't need an ISO 15924 script code. You need to think in terms of BCP
> 47. Sanskrit in Latin would be sa-Latn.

Right!

> Now, if you want to distinguish the different transcription systems for
> writing Sanskrit in Latin, you can apply to register a BCP 47 variant.
> There is also BCP 47 extension T, which may also be useful to you:
>
> https://tools.ietf.org/html/rfc6497

And that extension is administered by Unicode, with documentation and data here:
http://www.unicode.org/reports/tr35/tr35.html#t_Extension

Best regards,
markus

From unicode at unicode.org Mon Dec 2 18:59:59 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 3 Dec 2019 00:59:59 +0000
Subject: A neat description of encoding characters
In-Reply-To:
References:
Message-ID: <20191203005959.6e92e190@JRWUBU2>

On Mon, 2 Dec 2019 12:01:52 +0000
"Costello, Roger L. via Unicode" wrote:

> From the book titled "Computer Power and Human Reason" by Joseph
> Weizenbaum, p.74-75
>
> Suppose that the alphabet with which we wish to concern ourselves
> consists of 256 distinct symbols...

Why should I wish to concern myself with only one alphabet?

Richard.

From unicode at unicode.org Mon Dec 2 19:27:39 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 3 Dec 2019 01:27:39 +0000
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To:
References:
Message-ID: <20191203012739.1b98d830@JRWUBU2>

On Mon, 2 Dec 2019 09:09:02 -0800
Markus Scherer via Unicode wrote:

> On Mon, Dec 2, 2019 at 8:42 AM Roozbeh Pournader via Unicode <unicode at unicode.org> wrote:
>
> > You don't need an ISO 15924 script code. You need to think in terms
> > of BCP 47. Sanskrit in Latin would be sa-Latn.
>
> Right!
>
> > Now, if you want to distinguish the different transcription systems
> > for writing Sanskrit in Latin, you can apply to register a BCP 47
> > variant. There is also BCP 47 extension T, which may also be
> > useful to you:
> >
> > https://tools.ietf.org/html/rfc6497
>
> And that extension is administered by Unicode, with documentation and
> data here:
> http://www.unicode.org/reports/tr35/tr35.html#t_Extension

But that says that the definitions are at https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml , but all one currently gets from that is an error message 'XML Parsing Error: no element found'.

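For readers unfamiliar with extension T, the following Python sketch shows how a tag carrying a "-t-" extension breaks into pieces. It rests on a simplified reading of RFC 6497 (after the singleton "t" comes an optional source language tag, then fields that each start with a two-character key made of a letter and a digit, such as "m0"), and the helper name `split_t_extension` is made up for this illustration. It only splits on hyphens: it does not validate subtags against the IANA registry or the CLDR transform.xml data, and it does not handle other extensions following the T extension.

```python
import re

# A field key: one letter followed by one digit, e.g. "m0".
TKEY = re.compile(r"^[a-z][0-9]$")


def split_t_extension(tag: str) -> dict:
    """Split a tag into base tag, T-extension source tag, and fields.

    Hypothetical helper for illustration; purely syntactic.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "t" not in lowered:
        return {"base": tag, "source": None, "fields": {}}

    t_index = lowered.index("t")
    base = "-".join(subtags[:t_index])
    source = []    # subtags of the optional source language tag
    fields = {}    # field key -> list of value subtags
    key = None
    for subtag in lowered[t_index + 1:]:
        if TKEY.match(subtag):
            key = subtag
            fields[key] = []
        elif key is None:
            source.append(subtag)
        else:
            fields[key].append(subtag)

    return {
        "base": base,
        "source": "-".join(source) or None,
        "fields": {k: "-".join(v) for k, v in fields.items()},
    }


if __name__ == "__main__":
    # Example tag taken from RFC 6497.
    print(split_t_extension("und-Cyrl-t-und-latn-m0-ungegn-2007"))
    # A plain tag without the extension.
    print(split_t_extension("sa-Latn"))
```

Whether a particular mechanism value is actually registered still has to be checked against the CLDR data; the sketch only shows where each piece of the tag sits.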
From unicode at unicode.org Mon Dec 2 19:45:41 2019
From: unicode at unicode.org (विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode)
Date: Tue, 3 Dec 2019 07:15:41 +0530
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To: <20191203012739.1b98d830@JRWUBU2>
References: <20191203012739.1b98d830@JRWUBU2>
Message-ID:

On Tue, Dec 3, 2019 at 6:59 AM Richard Wordingham via Unicode <unicode at unicode.org> wrote:

> > And that extension is administered by Unicode, with documentation and
> > data here:
> > http://www.unicode.org/reports/tr35/tr35.html#t_Extension

Thanks for the pointers!

> But that says that the definitions are at
> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml ,
> but all one currently gets from that is an error message 'XML Parsing
> Error: no element found'.

Yes - that needs to be fixed (+markdavis at google.com - could you please?)

https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml shows iast!

The tag I would use for IAST seems to be: sa-Latn-t-sa-m0-iast (https://r12a.github.io/app-subtags/ is unable to confirm that the extension t-sa-m0-iast is all right though. Could someone confirm?)

Then, the next step seems to be to propose to add the below to https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml :
ISO 15919, Kyoto-Harvard, ITRANS, Velthuis, SLP1, WX, National Library at Kolkata romanisation
How to proceed with that?

--
--
Vishvas /????????

From unicode at unicode.org Mon Dec 2 19:58:04 2019
From: unicode at unicode.org (Markus Scherer via Unicode)
Date: Mon, 2 Dec 2019 17:58:04 -0800
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To:
References: <20191203012739.1b98d830@JRWUBU2>
Message-ID:

On Mon, Dec 2, 2019 at 5:47 PM विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode wrote:

>> But that says that the definitions are at
>> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml ,
>> but all one currently gets from that is an error message 'XML Parsing
>> Error: no element found'.
>
> Yes - that needs to be fixed (+markdavis at google.com - could you please?)
>
> https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml
> shows iast!

FYI, a working link to the version in the latest release is
https://github.com/unicode-org/cldr/blob/latest/common/bcp47/transform.xml

> The tag I would use for IAST seems to be:
> sa-Latn-t-sa-m0-iast (https://r12a.github.io/app-subtags/ is unable to
> confirm that the extension t-sa-m0-iast is all right though. Could
> someone confirm?)

I assume that the second "sa" is unnecessary, but I am not very familiar with the -t- extension.

> Then, the next step seems to be to propose to add the below to
> https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml :
> ISO 15919, Kyoto-Harvard, ITRANS, Velthuis, SLP1, WX, National Library at
> Kolkata romanisation
> How to proceed with that?

I would start with filing a CLDR ticket:
http://cldr.unicode.org/index/bug-reports

Best regards,
markus

From unicode at unicode.org Mon Dec 2 19:58:26 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Tue, 3 Dec 2019 01:58:26 +0000
Subject: A neat description of encoding characters
In-Reply-To: <20191203005959.6e92e190@JRWUBU2>
References: <20191203005959.6e92e190@JRWUBU2>
Message-ID: <186ca851-ba0b-0b25-3dff-0a81db1e39eb@gmail.com>

On 2019-12-03 12:59 AM, Richard Wordingham via Unicode wrote:
> On Mon, 2 Dec 2019 12:01:52 +0000
> "Costello, Roger L. via Unicode" wrote:
>
>> From the book titled "Computer Power and Human Reason" by Joseph
>> Weizenbaum, p.74-75
>>
>> Suppose that the alphabet with which we wish to concern ourselves
>> consists of 256 distinct symbols...
> Why should I wish to concern myself with only one alphabet?

You shouldn't. But suppose you did. That's the hypothetical set-up for the illustration.

When that book was published in 1976, that illustration may have helped some people gain a better understanding of computer encoding. Nowadays a character string might be required to produce a glyph which the user community considers to be a "character" (or letter) in its writing system. Adding variation selectors, invisible 'formatting' characters, and non-alphabetic symbols to the mix has moved computer encoding way beyond 1976.

From unicode at unicode.org Mon Dec 2 20:05:35 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 3 Dec 2019 02:05:35 +0000
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To: <20191203012739.1b98d830@JRWUBU2>
References: <20191203012739.1b98d830@JRWUBU2>
Message-ID: <20191203020535.505dff09@JRWUBU2>

On Tue, 3 Dec 2019 01:27:39 +0000
Richard Wordingham wrote:

> But that says that the definitions are at
> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml ,
> but all one currently gets from that is an error message 'XML Parsing
> Error: no element found'.

A working URI is https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml .

I'm still trying to work out what to do for IAST. Is it just:

sa-t-m0-iast

if one finds that

sa-Latn

allows too much latitude?

How does one choose between anusvara and specific consonants for homorganic nasals? Is it sa-150-t-m0-iast v. sa-IN-t-m0-iast?

Richard.

From unicode at unicode.org Mon Dec 2 20:38:55 2019
From: unicode at unicode.org (Mark E. Shoulson via Unicode)
Date: Mon, 2 Dec 2019 21:38:55 -0500
Subject: A neat description of encoding characters
In-Reply-To:
References:
Message-ID:

On 12/2/19 7:01 AM, Costello, Roger L. via Unicode wrote:
> From the book titled "Computer Power and Human Reason" by Joseph Weizenbaum, p.74-75

It's a reasonably good explanation of binary numbers and "encoding" in a more usual sense than we use it here in Unicode-land. Actually makes for a basis to move on to discussing information theory. But when Unicodites say "encoding", they mean stuff like UTF-8 vs UTF-16, which is kind of a different kettle of macaroons.

~mark

From unicode at unicode.org Mon Dec 2 20:44:15 2019
From: unicode at unicode.org (विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode)
Date: Tue, 3 Dec 2019 08:14:15 +0530
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To:
References: <20191203012739.1b98d830@JRWUBU2>
Message-ID:

On Tue, Dec 3, 2019 at 7:28 AM Markus Scherer wrote:

>> The tag I would use for IAST seems to be:
>> sa-Latn-t-sa-m0-iast (https://r12a.github.io/app-subtags/ is unable to
>> confirm that the extension t-sa-m0-iast is all right though. Could
>> someone confirm?)
>
> I assume that the second "sa" is unnecessary, but I am not very familiar
> with the -t- extension.

The example und-Cyrl-t-und-latn-m0-ungegn-2007 in https://tools.ietf.org/rfc/rfc6497.txt led me to use: sa-Latn-t-sa-Zyyy-m0-iast for my case.

>> Then, the next step seems to be to propose to add the below to
>> https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml :
>> ISO 15919, Kyoto-Harvard, ITRANS, Velthuis, SLP1, WX, National Library at
>> Kolkata romanisation
>> How to proceed with that?
>
> I would start with filing a CLDR ticket:
> http://cldr.unicode.org/index/bug-reports

Thanks! I've filed https://unicode-org.atlassian.net/browse/CLDR-13444 .

--
--
Vishvas /????????

From unicode at unicode.org Mon Dec 2 23:31:46 2019
From: unicode at unicode.org (Mark Davis ☕️ via Unicode)
Date: Tue, 3 Dec 2019 06:31:46 +0100
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To: <20191203012739.1b98d830@JRWUBU2>
References: <20191203012739.1b98d830@JRWUBU2>
Message-ID:

Filed the following, thanks Richard.

CLDR-13445 Release link for "latest" goes to zip file

From unicode at unicode.org Tue Dec 3 04:15:55 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 3 Dec 2019 10:15:55 +0000
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To: <20191203020535.505dff09@JRWUBU2>
References: <20191203012739.1b98d830@JRWUBU2> <20191203020535.505dff09@JRWUBU2>
Message-ID: <20191203101555.75a29cda@JRWUBU2>

On Tue, 3 Dec 2019 02:05:35 +0000
Richard Wordingham via Unicode wrote:

> I'm still trying to work out what to do for IAST. Is it just:
>
> sa-t-m0-iast
>
> if one finds that
>
> sa-Latn
>
> allows too much latitude?

For material that is a transcription rather than a transliteration, are there regional preferences for the homorganic nasals when writing in the writing systems generated by IAST?

> How does one choose between anusvara and specific consonants
> for homorganic nasals? Is it sa-150-t-m0-iast v. sa-IN-t-m0-iast?

As these tags, strictly speaking, define locales, I think I put the region in the wrong place. Perhaps they should be:

sa-t-m0-sa-150-Deva-iast v. sa-t-m0-sa-IN-Deva-iast

As a locale, is the latter the same as sa-t-m0-sa-IN-Mlym? I'm not sure how the preference for writing homorganic nasals varies by region and by script.

What is the scope of IAST? Does sa-t-m0-sa-Thai exist? sa-Thai seems to prefer the nasal stops to anusvara before oral stops.

The text in IAST that I encounter seems not to have anusvara before stop consonants. I believe 'sa' would naturally expand (are there non-void prescribed rules on this?) as sa-Deva-IN, so perhaps the sa-Latn I usually see is unusual as sa-t-m0-iast and the description should be expanded to at least sa-t-m0-sa-150-iast if sa-Latn is not precise enough.

Can someone advise?

Richard.

From unicode at unicode.org Tue Dec 3 05:35:11 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 3 Dec 2019 11:35:11 +0000
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To:
References: <20191203012739.1b98d830@JRWUBU2>
Message-ID: <20191203113511.66fe75ff@JRWUBU2>

I think the 'Latn' in sa-Latn-t-sa-m0-iast is unnecessary, though it partly depends on the range of the IAST transform. If the transformation can only convert to the Roman script then 'Latn' is superfluous; I'm not sure if the extension is formally enough to rule out Devanagari. On the other hand, some people seem to think that there is an IAST transformation to Cyrillic.

However, as a locale for generated text, I feel it is inadequate. Wouldn't the expansion rules generate saṃti from संति rather than santi from सन्ति for 'they are'? Or have better fonts changed Indian practice?

Richard.

From unicode at unicode.org Tue Dec 3 06:05:14 2019
From: unicode at unicode.org (विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode)
Date: Tue, 3 Dec 2019 17:35:14 +0530
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To: <20191203101555.75a29cda@JRWUBU2>
References: <20191203012739.1b98d830@JRWUBU2> <20191203020535.505dff09@JRWUBU2> <20191203101555.75a29cda@JRWUBU2>
Message-ID:

On Tue, Dec 3, 2019 at 3:48 PM Richard Wordingham via Unicode <unicode at unicode.org> wrote:

> The text in IAST that I encounter seems not to have anusvara before
> stop consonants.

That's typical. Whatever the source script (if there is one), IAST tends to be used by people who follow the Sanskrit devanAgarI conventions pretty strictly (so it ends up being transcription rather than transliteration).

> I believe 'sa' would naturally expand (are there
> non-void prescribed rules on this?) as sa-Deva-IN, so perhaps the
> sa-Latn I usually see is unusual as sa-t-m0-iast and the description
> should be expanded to at least sa-t-m0-sa-150-iast if sa-Latn is not
> precise enough.

Not sure what 150 is doing there.

--
--
Vishvas /????????

From unicode at unicode.org Tue Dec 3 06:12:23 2019
From: unicode at unicode.org (विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode)
Date: Tue, 3 Dec 2019 17:42:23 +0530
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To: <20191203113511.66fe75ff@JRWUBU2>
References: <20191203012739.1b98d830@JRWUBU2> <20191203113511.66fe75ff@JRWUBU2>
Message-ID:

On Tue, Dec 3, 2019 at 5:07 PM Richard Wordingham via Unicode <unicode at unicode.org> wrote:

> However, as a locale for generated text, I feel it is inadequate.
> Wouldn't the expansion rules generate saṃti from संति rather than santi
> from सन्ति for 'they are'?

True. I suppose that someone wanting to replicate the "anusvAra instead of nasal" shorthand in IAST would use a Dravidian source script or a non-Sanskrit source language - or ask for inclusion of a modifier after "iast" - like t-sa-m0-iast-anusavrashorthand

--
--
Vishvas /????????

From unicode at unicode.org Tue Dec 3 15:43:52 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 3 Dec 2019 21:43:52 +0000
Subject: Proposal to add Roman transliteration schemes to ISO 15924.
In-Reply-To:
References: <20191203012739.1b98d830@JRWUBU2> <20191203020535.505dff09@JRWUBU2> <20191203101555.75a29cda@JRWUBU2>
Message-ID: <20191203214352.18b23154@JRWUBU2>

On Tue, 3 Dec 2019 17:35:14 +0530
विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode wrote:

> On Tue, Dec 3, 2019 at 3:48 PM Richard Wordingham via Unicode <unicode at unicode.org> wrote:
>
> > The text in IAST that I encounter seems not to have anusvara before
> > stop consonants.
>
> That's typical.
> Whatever the source script (if there is one), IAST tends to be used by > people who follow the sanskrit devanAgarI conventions pretty strictly > (so ends up being transcription rather than transliteration.) > > I believe 'sa' would naturally expand (are there > > non-void prescribed rules on this?) as sa-Deva-IN, so perhaps the > > sa-Latn I usually see is unusual as sa-t-m0-iast and the description > > should be expanded to at least sa-t-m0-sa-150-iast if sa-Latn is not > > precise enough. > Not sure what 150 is doing there.. I read, but in an old book, that when Sanskrit was printed in Devanagari, clusters phonetically composed of nasal plus plosive were written using the nasal consonant, but in India were printed using anusvara. The Sanskrit version of the UN Declaration of Human Rights at Unicode (https://unicode.org/udhr/d/udhr_san.html) conforms to this pattern by using anusvara instead of clusters, but I don't know where the translation actually came from. Accordingly, I thought that to get clusters instead of anusvara before plosives, I should select Sanskrit as used in Europe, as opposed to Sanskrit as used in India. '150' is the region code for Europe. Richard. From unicode at unicode.org Thu Dec 5 18:29:35 2019 From: unicode at unicode.org (Markus Scherer via Unicode) Date: Thu, 5 Dec 2019 16:29:35 -0800 Subject: Fwd: ICU 66preview available In-Reply-To: References: Message-ID: Dear Unicoders, If you use ICU, then testing with ICU 66*preview* is a good way of trying out Unicode 13 *beta* . (Just please don't use these snapshots in production releases.) Best regards, markus ---------- Forwarded message --------- Dear friends and users of ICU, We are pleased to announce a preview of ICU 66. ICU 66 (scheduled for release in 2020 March) will update to Unicode 13, and include some bug fixes. This will be a low-impact release with no other significant feature additions or implementation changes. 
ICU 66preview updates to Unicode 13 beta, including new characters, scripts, emoji, and corresponding API constants. It also updates to CLDR 36.1preview with Unicode 13 updates and bug fixes.

For details please see site.icu-project.org/download/66.

Please test this preview on your platforms and report bugs and regressions by Tuesday, 2020-jan-07. Please do not use this preview in production.

The preliminary API reference documents are published on unicode-org.github.io/icu-docs/ - follow the “Dev” links there.

Best regards,
Markus Scherer for the ICU Project
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Mon Dec 16 18:50:39 2019
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Tue, 17 Dec 2019 06:20:39 +0530
Subject: NBSP supposed to stretch, right?
Message-ID:

Hello. I've just tested LibreOffice, Google Docs and MS Office on Linux, Android and Windows, and it seems that NBSP doesn't get stretched like the normal space character when justified alignment requires it.

Let me explain. I'm creating a document with the following text typeset in 12 pt Lohit Tamil with justified alignment on an A5 page with 0.5" margin all around:

??????? ????????? ?????? ???? ????????? ?????? ????????????. ???? ?????????? ???? ??? ???????. ?????? ?????????????????????? ??????? ??.

The screenshot https://sites.google.com/site/jamadagni/files/temp/nbsp-not-expanding.png may be useful to illustrate the situation. Readers may try similar sentences in any software/platform of their choice and report what happens.

Here the problem arises with the phrase ???? ??? ???????. The word ???? is an honorific applying to the following name of the sage ??? ???????, so it would seem unsightly to the reader if it goes to the previous line, so I insert an NBSP between it and the name. (Isn't there such a stylistic convention in English where Mr doesn't stand at the end of a line? I don't know.)
However, the phrase is shortly followed by a long word ??????????????????????, which is too long to fit on the same line and hence goes to the next line, thereby increasing the inter-word spacing on its previous line significantly. But the NBSP after the honorific doesn't stretch, making the word layout unsightly.

IIUC, no-break space is just that: a space that doesn't permit a line break. This says nothing about it being fixed width.

Unicode 12.0 §2.3 on p 27 (55 of PDF) says:

“Other compatibility decomposable characters are widely used characters serving essential functions. U+00A0 no-break space is one example. In these and similar cases, such as fixed-width space characters,….”

To my understanding this itself says that NBSP isn't fixed-width.

ibid §6.2 on p 265 (293 of PDF) specifically talking about spacing characters says:

“No-Break Space. U+00A0 no-break space (NBSP) is the nonbreaking counterpart of U+0020 space. It has the same width, but behaves differently for line breaking. For more information, see Unicode Standard Annex #14, “Unicode Line Breaking Algorithm.””

The wording “but behaves differently for line breaking” seems to vindicate what I understood, namely that the only difference is in line breaking behaviour. But the wording “has the same width” doesn't clearly say anything about the stretching behaviour, only about the nominal advance width given as part of font data.

I would have gone and filed this as a LibreOffice bug since that's the software I use most, but when I found this is a cross-software problem, I thought it would be best to have this discussed and documented here (and in a future version of the standard).

My expectation is that since NBSP is not intended to be a fixed width space, and the only difference intended between it and the normal U+0020 SP is in line breaking, NBSP should be treated equal to U+0020 for the purpose of stretching for justified alignment. Only then can text such as the above be formatted naturally and easily.
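
The expectation stated above (NBSP taking part in justification exactly like U+0020) can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not how LibreOffice, Word or any real layout engine works: the function name `space_widths`, the `stretch_nbsp` flag, and the idea that every character has the same natural width are all invented for the example.

```python
SPACE = "\u0020"  # SPACE
NBSP = "\u00A0"   # NO-BREAK SPACE


def space_widths(line, target_width, stretch_nbsp=True, char_width=1.0):
    """Return {index: width} for every SPACE/NBSP in `line` after justification.

    Toy model (an assumption, not a real engine): every character has the
    same natural width, and the extra width needed to reach `target_width`
    is shared equally among the stretchable spaces.
    """
    stretchable = {SPACE, NBSP} if stretch_nbsp else {SPACE}
    natural_width = len(line) * char_width
    extra = max(target_width - natural_width, 0.0)
    slots = [i for i, ch in enumerate(line) if ch in stretchable]
    share = extra / len(slots) if slots else 0.0

    widths = {}
    for i, ch in enumerate(line):
        if ch in (SPACE, NBSP):
            # A non-stretchable NBSP keeps its nominal advance width.
            widths[i] = char_width + (share if ch in stretchable else 0.0)
    return widths


# Honorific + NBSP + name, followed by two ordinary spaces.
line = "Mr" + NBSP + "Smith went home"

# NBSP stretches like SPACE: all three gaps grow by the same amount.
print(space_widths(line, target_width=24))

# NBSP kept at fixed width: the two ordinary spaces absorb all the slack.
print(space_widths(line, target_width=24, stretch_nbsp=False))
```

With `stretch_nbsp=False` the gap after the honorific stays narrow while the other gaps widen, which is the uneven layout described in the message; with the default, the slack is spread evenly.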
--
Shriramana Sharma ???????????? ???????????? ????????????????????????

From unicode at unicode.org Tue Dec 17 04:37:31 2019
From: unicode at unicode.org (QSJN 4 UKR via Unicode)
Date: Tue, 17 Dec 2019 12:37:31 +0200
Subject: Fwd: NBSP supposed to stretch, right?
In-Reply-To:
References:
Message-ID:

Agree.
By the way, it is common practice to use multiple nbsp in a row to create a larger span. In my opinion, it is wrong to replace fixed width spaces with non-breaking spaces.
Quote from Microsoft Typography Character design standards:
“The no-break space is not the same character as the figure space. The figure space is not a character defined in most computer system's current code pages. In some fonts this character's width has been defined as equal to the figure width. This is an incorrect usage of the character no-break space.”

From unicode at unicode.org Tue Dec 17 04:41:07 2019
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Tue, 17 Dec 2019 16:11:07 +0530
Subject: NBSP supposed to stretch, right?
In-Reply-To:
References:
Message-ID:

On Tue 17 Dec, 2019, 16:09 QSJN 4 UKR via Unicode, wrote:

> Agree.
> By the way, it is common practice to use multiple nbsp in a row to
> create a larger span. In my opinion, it is wrong to replace fixed
> width spaces with non-breaking spaces.
> Quote from Microsoft Typography Character design standards:
> “The no-break space is not the same character as the figure space. The
> figure space is not a character defined in most computer system's
> current code pages. In some fonts this character's width has been
> defined as equal to the figure width. This is an incorrect usage of
> the character no-break space.”
>

Sorry but I don't understand how this addresses the issue I raised.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Tue Dec 17 10:20:33 2019
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Tue, 17 Dec 2019 08:20:33 -0800
Subject: NBSP supposed to stretch, right?
In-Reply-To:
References:
Message-ID: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com>

An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Tue Dec 17 13:31:39 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Tue, 17 Dec 2019 19:31:39 +0000
Subject: NBSP supposed to stretch, right?
In-Reply-To:
References:
Message-ID: <8bb24935-7864-5114-c9c9-24db0288c288@gmail.com>

On 2019-12-17 10:37 AM, QSJN 4 UKR via Unicode wrote:
> Agree.
> By the way, it is common practice to use multiple nbsp in a row to
> create a larger span. In my opinion, it is wrong to replace fixed
> width spaces with non-breaking spaces.
> Quote from Microsoft Typography Character design standards:
> “The no-break space is not the same character as the figure space. The
> figure space is not a character defined in most computer system's
> current code pages. In some fonts this character's width has been
> defined as equal to the figure width. This is an incorrect usage of
> the character no-break space.”
>

The mention of code pages made me suspect that this quote was from an archived older web page, but it's current. Here's the link:
https://docs.microsoft.com/en-us/typography/develop/character-design-standards/whitespace

Quoting from that same page,
"Advance width rule : The advance width of the no-break space should be equal to the width of the space."

So it follows that any justification operation should treat NO-BREAK SPACE and SPACE identically.

From unicode at unicode.org Tue Dec 17 15:28:36 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 17 Dec 2019 21:28:36 +0000
Subject: NBSP supposed to stretch, right?
In-Reply-To:
References:
Message-ID: <20191217212836.43c36be7@JRWUBU2>

On Tue, 17 Dec 2019 06:20:39 +0530
Shriramana Sharma via Unicode wrote:

> Hello. I've just tested LibreOffice, Google Docs and MS Office on
> Linux, Android and Windows, and it seems that NBSP doesn't get
> stretched like the normal space character when justified alignment
> requires it.
>
> Let me explain. I'm creating a document with the following text
> typeset in 12 pt Lohit Tamil with justified alignment on an A5 page
> with 0.5" margin all around:
>
> ??????? ????????? ?????? ???? ????????? ?????? ????????????. ????
> ?????????? ???? ??? ???????. ?????? ?????????????????????? ??????? ??.
>
> The screenshot
> https://sites.google.com/site/jamadagni/files/temp/nbsp-not-expanding.png
> may be useful to illustrate the situation. Readers may try such
> similar sentences in any software/platform of their choice and report
> as to what happens.
>
> Here the problem arises with the phrase ???? ??? ???????. The word
> ???? is a honorific applying to the following name of the sage ???
> ???????, so it would seem unsightly to the reader if it goes to the
> previous line, so I insert an NBSP between it and the name. (Isn't
> there such a stylistic convention in English where Mr doesn't stand at
> the end of a line? I don't know.)

It's not widely taught in so far as it exists. I would avoid placing the word at the end in wide columns, just as I suppress line breaks in 'Figure 7' and '17 December', but I only apply it to short adjuncts. However, I would find the use of narrower spacing somewhere between acceptable and desirable.

Thai has a similar rule, where there is generally no space between title and forename, but an obligatory space between forename and surname.

To me, this is a continuation of the principle that line-breaks within phrases make them more difficult to understand.

> However, the phrase is shortly followed by a long word
> ??????????????????????, which is too long to fit on the same line and
> hence goes to the next line, thereby increasing the inter-word spacing
> on its previous line significantly. But the NBSP after the honorific
> doesn't stretch, making the word layout unsightly.

The strategies to deal with this general problem in English are hyphenation and abandoning justification. In this particular case, your text would benefit from using Knuth's algorithm for justification.

> IIUC, no-break space is just that: a space that doesn't permit a line
> break. This says nothing about it being fixed width.
>
> Unicode 12.0 §2.3 on p 27 (55 of PDF) says:

You're assuming that TUS is a standard. It's much more a collection of influential recommendations.

Richard.

From unicode at unicode.org Tue Dec 17 17:02:35 2019
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Tue, 17 Dec 2019 15:02:35 -0800
Subject: NBSP supposed to stretch, right?
In-Reply-To: <8bb24935-7864-5114-c9c9-24db0288c288@gmail.com>
References: <8bb24935-7864-5114-c9c9-24db0288c288@gmail.com>
Message-ID: <31ec3fa4-d2b9-cf61-4c9a-1106e10c6002@ix.netcom.com>

An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Tue Dec 17 19:49:02 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Wed, 18 Dec 2019 01:49:02 +0000
Subject: NBSP supposed to stretch, right?
In-Reply-To: <31ec3fa4-d2b9-cf61-4c9a-1106e10c6002@ix.netcom.com>
References: <8bb24935-7864-5114-c9c9-24db0288c288@gmail.com> <31ec3fa4-d2b9-cf61-4c9a-1106e10c6002@ix.netcom.com>
Message-ID: <32d7b6e6-a08d-cff5-81b3-43e9fdb3d9df@gmail.com>

Asmus Freytag wrote,

> And any recommendation that is not compatible with what the overwhelming
> majority of software has been doing should be ignored (or only enabled on
> explicit user input).
>
> Otherwise, you're just advocating for a massively breaking change.
It seems like the recommendations are already in place and the “overwhelming majority of software” is already disregarding them.

I don’t see the massively breaking change here. Are there any illustrations?

If legacy text containing NO-BREAK SPACE characters is popped into a justifier, the worst thing that can happen is that the text will be correctly justified under a revised application. That’s not breaking anything, it’s fixing it. Unlike changing the font-face, font size, or page width (which often results in reformatting the text), the line breaks are calculated before justification occurs.

If a string of NO-BREAK SPACE characters appears in an HTML file, the browser should proportionally adjust all of those space characters identically with the “normal” space characters. This should preserve the authorial intent.

As for pre-Unicode usage of NO-BREAK SPACE, were there ever any explicit guidelines suggesting that the normal SPACE character should expand or contract for justification but that the NO-BREAK SPACE must not expand or contract?

From unicode at unicode.org Wed Dec 18 06:42:43 2019
From: unicode at unicode.org (Marius Spix via Unicode)
Date: Wed, 18 Dec 2019 13:42:43 +0100
Subject: HEAVY EQUALS SIGN
Message-ID: <20191218134219.560eed9b@spixxi>

Unicode has a HEAVY PLUS SIGN (U+2795) and a HEAVY MINUS SIGN (U+2796). I wonder if a HEAVY EQUALS SIGN could complete that character set. This would allow emoji phrases like ?? ???= ??. (man plus cat equals love) to look typographically better, when you replace the equals sign with a new HEAVY EQUALS SIGN character. Thoughts?

Marius
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 488 bytes
Desc: Digitale Signatur von OpenPGP
URL:
From unicode at unicode.org Wed Dec 18 07:43:06 2019
From: unicode at unicode.org (Joao S. O. Bueno via Unicode)
Date: Wed, 18 Dec 2019 10:43:06 -0300
Subject: HEAVY EQUALS SIGN
In-Reply-To: <20191218134219.560eed9b@spixxi>
References: <20191218134219.560eed9b@spixxi>
Message-ID:

I think that as your object is emoji drawing, not mathematics, this request can't be justified that way.

Maybe it would make more sense to try and check whether combining modifier characters that shift the combined character into another weight/decoration/color and/or other character effects could be built, so that they could be used not only with emoji, but with all other characters.

Currently those transforms require the use of another text protocol, like HTML, or ANSI sequences for terminals, or even proprietary and ad hoc text file structures like Microsoft's .doc and .rtf (and others not that proprietary, but equally dependent on specific software to be properly rendered, like .ooxml and .odf).

Since modifier characters for color and other properties have been tried and tested in Unicode land for some emoji, the ball to have in-Unicode proper character transforms could start to roll.

Does anyone know if there is already an initiative like that? I'd like to know more about it.

(As for the O.P.: I think the way out for you now is to use an out-of-Unicode markup to select a heavier-looking font for the `+` and `=` characters.)

js
-><-

On Wed, 18 Dec 2019 at 09:42, Marius Spix via Unicode wrote:

> Unicode has a HEAVY PLUS SIGN (U+2795) and a HEAVY MINUS SIGN (U+2796).
> I wonder, if a HEAVY EQUALS SIGN could complete that character set.
> This would allow emoji phrases like ?? ???= ??. (man plus cat equals
> love) looking typographically better, when you replace the equals sign
> with a new HEAVY EQUALS SIGN character. Thoughts?
>
> Marius
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Wed Dec 18 12:12:23 2019
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Wed, 18 Dec 2019 10:12:23 -0800
Subject: NBSP supposed to stretch, right?
In-Reply-To: <32d7b6e6-a08d-cff5-81b3-43e9fdb3d9df@gmail.com>
References: <8bb24935-7864-5114-c9c9-24db0288c288@gmail.com> <31ec3fa4-d2b9-cf61-4c9a-1106e10c6002@ix.netcom.com> <32d7b6e6-a08d-cff5-81b3-43e9fdb3d9df@gmail.com>
Message-ID: <206cdb34-67b2-3a7d-8ee6-67dde3c2192a@ix.netcom.com>

An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Wed Dec 18 17:07:15 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Wed, 18 Dec 2019 23:07:15 +0000
Subject: NBSP supposed to stretch, right?
In-Reply-To: <206cdb34-67b2-3a7d-8ee6-67dde3c2192a@ix.netcom.com>
References: <8bb24935-7864-5114-c9c9-24db0288c288@gmail.com> <31ec3fa4-d2b9-cf61-4c9a-1106e10c6002@ix.netcom.com> <32d7b6e6-a08d-cff5-81b3-43e9fdb3d9df@gmail.com> <206cdb34-67b2-3a7d-8ee6-67dde3c2192a@ix.netcom.com>
Message-ID: <8a54b5c0-d0fc-bc9a-a7e0-921cd014facb@gmail.com>

U+0020 SPACE
U+00A0 NO-BREAK SPACE

These two characters are equal in every way except that one of them offers an opportunity for a line break and the other does not.

If the above statement is true, then any conformant application must treat/process/display both characters identically.

Responding to Asmus Freytag,

> Now, if someone can show us that there are widespread implementations that
> follow the above recommendation and have no interoperability issues with HTML
> then I may change my tune.

Can anyone show us that there are widespread implementations which would break if they started following the above recommendation?

Quoting from this HTML basics page,
http://www.htmlbasictutor.ca/non-breaking-space.htm

“Some browsers will ignore beyond the first instance of the non-breaking space.”
and
“Not all browsers acknowledge the additional instances of the non-breaking space.”
Fifteen or twenty years ago, we used NO-BREAK SPACE to indent paragraphs and to position text and graphics. Both of those uses are presently considered no-nos because some browsers collapse NBSPs and because there are proper ways now to accomplish these kinds of effects.

The introduction of browsers which collapsed NBSP strings broke existing web pages. Perhaps the developers of those browsers decided that SPACE and NO-BREAK SPACE are indeed identical except for line breaking.

Are there any modern mark-up language uses of SPACE vs NO-BREAK SPACE which would be broken if they follow the above recommendation?

From unicode at unicode.org Wed Dec 18 20:58:39 2019
From: unicode at unicode.org (Fred Brennan via Unicode)
Date: Thu, 19 Dec 2019 10:58:39 +0800
Subject: HEAVY EQUALS SIGN
In-Reply-To:
References: <20191218134219.560eed9b@spixxi>
Message-ID: <1735205.9fDaAcxpeS@pc>

On Wednesday, December 18, 2019 9:43:06 PM PST Joao S. O. Bueno via Unicode wrote:
> Maybe it would make more sense to try and check whether modification
> combining characters to shift the change the combined character into other
> weight/decoration/color and/or other character effects could be built, that
> could be used not only along emoji, but with all other characters.
>
> Currently those transforms require the use of another text protocol, like
> HTML, or ANSI sequences for terminal, or even proprietary and ad hoc text
> file structures like Microsoft's .doc and .rtf (and other not that
> proprietary, but equally dependent on specific software to be properly
> rendered, like .ooxml and .odf).
>
> Does anyone know if there is already an initiative like that? I'd like to
> know more about it.

There was a request like this, and it was first recommended for rejection by the Script Ad Hoc committee, and was then rejected by the Unicode Technical Committee. It wasn't for bold, it was for italic, but the reasons for its rejection apply broadly to bold, rotalic, etc.
The request was L2/19-063, “A proposal for encoding italics in plain text using Variation Selector 14,” by William Overington, submitted 2019-02-07.

Deborah Anderson, et al., recommended the request for rejection in L2/19-173, “Recommendations to UTC #159 April-May 2019 on Script Proposals”. In practice, although the UTC has the power to ignore their recommendation, they rarely ever do.

Overington tried to answer some of their concerns in L2/19-195, “Comments on comments about L2/19-063 Italics in Plain Text”. His comments did not sway the UTC and the UTC rejected the request. https://www.unicode.org/L2/L2019/19122.htm#159-C24

It's not worth writing another request for generic bold/italic in plaintext for any glyph in my humble opinion. The UTC and its subcommittees are opposed. I agree with them, so do many others.

Best,
Fred Brennan

From unicode at unicode.org Wed Dec 18 22:41:43 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Thu, 19 Dec 2019 04:41:43 +0000
Subject: HEAVY EQUALS SIGN
In-Reply-To: <20191218134219.560eed9b@spixxi>
References: <20191218134219.560eed9b@spixxi>
Message-ID:

On 2019-12-18 12:42 PM, Marius Spix via Unicode wrote:
> Unicode has a HEAVY PLUS SIGN (U+2795) and a HEAVY MINUS SIGN (U+2796).
> I wonder, if a HEAVY EQUALS SIGN could complete that character set.
> This would allow emoji phrases like ?? ???= ??. (man plus cat equals
> love) looking typographically better, when you replace the equals sign
> with a new HEAVY EQUALS SIGN character. Thoughts?
>
> Marius

?? ? ?? ? ??

From unicode at unicode.org Thu Dec 19 00:02:07 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Thu, 19 Dec 2019 06:02:07 +0000
Subject: NBSP supposed to stretch, right?
In-Reply-To:
References:
Message-ID:

On 2019-12-17 12:50 AM, Shriramana Sharma via Unicode wrote:
> I would have gone and filed this as a LibreOffice bug since that's the
> software I use most, but when I found this is a cross-software
> problem, I thought it would be best to have this discussed and
> documented here (and in a future version of the standard).

There's a bug report for the LibreOffice application here...
https://bugs.documentfoundation.org/show_bug.cgi?id=41652
...which shows an interesting history of the situation. One issue is whether to be Unicode compliant or MS-Word compliant. MS-Word had apparently corrected the bug with Word 2013 but had reverted to the incorrect behavior by the time Word 2016 rolled out. On that page it's noted that applications like InDesign, Firefox, TeX, and QuarkXPress handle U+00A0 correctly.

From unicode at unicode.org Thu Dec 19 02:28:48 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Thu, 19 Dec 2019 08:28:48 +0000
Subject: NBSP supposed to stretch, right?
In-Reply-To:
References:
Message-ID:

From our colleague’s web site,
http://jkorpela.fi/chars/spaces.html

“On web browsers, no-break spaces tended to be non-adjustable, but modern browsers generally stretch them on justification.”

Jukka Korpela then offers pointers about avoiding unwanted stretching.

and

“The change in the treatment of no-break spaces, though inconvenient, is consistent with changes in CSS specifications. For example, clause 7 Spacing of CSS Text Module Level 3 (Editor’s Draft 24 Jan. 2019) defines the no-break space, but not the fixed-width spaces, as a word-separator character, stretchable on justification.”

So it appears that there’s no interoperability problem with HTML.

It seems that the widespread breakage which Asmus Freytag mentions is limited to legacy applications which persist in treating U+00A0 as the old “hard space” such as Word. It also appears that Microsoft tried and failed to correct the problem in Word.
Perhaps they should try again. Meanwhile, in the absence of anything from Unicode more explicit than already recommended by the Standard, Shriramana Sharma might be well advised to continue to lobby the respective software people. As more applications migrate towards the correct treatment of U+00A0, they are probably already running into interoperability problems with Microsoft Word and may well have already implemented solutions.

From unicode at unicode.org Fri Dec 20 05:55:17 2019
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Fri, 20 Dec 2019 17:25:17 +0530
Subject: NBSP supposed to stretch, right?
In-Reply-To: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com>
References: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com>
Message-ID:

On 12/17/19, Asmus Freytag via Unicode wrote:
> On 12/17/2019 2:41 AM, Shriramana Sharma via Unicode wrote:
>>
>> On Tue 17 Dec, 2019, 16:09 QSJN 4 UKR via Unicode,
>> wrote:
>>>
>>> “The no-break space is not the same character as the figure space. The
>>> figure space is not a character defined in most computer system's
>>> current code pages. In some fonts this character's width has been
>>> defined as equal to the figure width. This is an incorrect usage of
>>> the character no-break space.”
>>
>>
>> Sorry but I don't understand how this addresses the issue I raised.
>
> You don't?
>
> In principle it may be true that NBSP is not fixed width, but show me
> software that doesn't treat it that way.
>
> In HTML, NBSP isn't subject to space collapse, therefore it's the go-to
> space character when you need some extra spacing that doesn't disappear.

So I never asked for NBSP to disappear. I said I want it to *stretch*. And to my mind "stretch" means to become wider than one's normal width. It doesn't include decreasing or disappearing width.

I don't expect NBSP to ever disappear, because spaces disappear only at linebreaks, and NBSP simply doesn't stand at linebreaks.

--
Shriramana Sharma ????????????
???????????? ????????????????????????

From unicode at unicode.org Fri Dec 20 09:17:37 2019
From: unicode at unicode.org (wjgo_10009@btinternet.com via Unicode)
Date: Fri, 20 Dec 2019 15:17:37 +0000 (GMT)
Subject: HEAVY EQUALS SIGN
In-Reply-To: <1735205.9fDaAcxpeS@pc>
References: <20191218134219.560eed9b@spixxi> <1735205.9fDaAcxpeS@pc>
Message-ID: <331d99a7.6b6.16f23e2b33d.Webtop.226@btinternet.com>

On the matter of my document proposing using Variation Selector 14 for requesting an italic glyph for a letter, Unicode Inc. has also published a Notice of Non-Approval.

https://www.unicode.org/alloc/nonapprovals.html

It is indeed interesting that the Notice of Non-Approval itself uses italics for emphasis in two places.

That text, at the present time, cannot be expressed in Unicode plain text with the emphasis that the Notice of Non-Approval includes.

Readers of my two original documents on the topic may like to observe that I did not in any way suggest that the specialised italic characters for some mathematical uses are a precedent for the proposal that I submitted.

Here is a link to a PDF (Portable Document Format) document produced earlier today of a song that I wrote earlier this year that mentions italics.

http://www.users.globalnet.co.uk/~ngo/a_song_of_typography.pdf

I still consider that the proposal is a good idea, but the decision has been emphatically made, so I have moved on.
William Overington

Friday 20 December 2019

From unicode at unicode.org Fri Dec 20 10:37:02 2019
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Fri, 20 Dec 2019 08:37:02 -0800
Subject: HEAVY EQUALS SIGN
In-Reply-To: <331d99a7.6b6.16f23e2b33d.Webtop.226@btinternet.com>
References: <20191218134219.560eed9b@spixxi> <1735205.9fDaAcxpeS@pc> <331d99a7.6b6.16f23e2b33d.Webtop.226@btinternet.com>
Message-ID: <460424d2-1935-b4f2-e2a6-cb3a5a8276f3@sonic.net>

On 12/20/2019 7:17 AM, wjgo_10009 at btinternet.com via Unicode wrote:
> It is indeed interesting that the Notice of Non-Approval itself uses
> italics for emphasis in two places.
>
> That text, at the present time, cannot be expressed in Unicode plain
> text with the emphasis that the Notice of Non-Approval includes.

... which was /precisely/ the point. I'm glad you noticed.

--Ken
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Fri Dec 20 18:23:39 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Sat, 21 Dec 2019 00:23:39 +0000
Subject: NBSP supposed to stretch, right?
In-Reply-To:
References: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com>
Message-ID: <20191221002339.52b1eebe@JRWUBU2>

On Fri, 20 Dec 2019 17:25:17 +0530
Shriramana Sharma via Unicode wrote:

> So I never asked for NBSP to disappear. I said I want it to *stretch*.
> And to my mind "stretch" means to become wider than one's normal
> width. It doesn't include decreasing or disappearing width.

Don't spaces sometimes shrink? I thought they did in some 'show codes' modes.

> I don't expect NBSP to ever disappear, because spaces disappear only
> at linebreaks, and NBSP simply doesn't stand at linebreaks.

I can certainly imagine someone writing "  
".

Richard.

From unicode at unicode.org Fri Dec 20 20:29:08 2019
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Sat, 21 Dec 2019 07:59:08 +0530
Subject: Not accepted by UTC but in ISO ballot?
Message-ID:

I was looking at the pipeline for something else, and for the first time I see a character category: “not accepted by the UTC but in ISO ballot” and two characters in it.

So IIUC while technically people are free to submit a document to the ISO separately without submitting to UTC, it has always been the practice to my knowledge to get a character approved by the UTC first.

Can anyone throw some light on these particular cases?

--
Shriramana Sharma ???????????? ???????????? ????????????????????????

From unicode at unicode.org Fri Dec 20 20:43:58 2019
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Sat, 21 Dec 2019 08:13:58 +0530
Subject: [EXTERNAL] Re: NBSP supposed to stretch, right?
In-Reply-To:
References: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com>
Message-ID:

On 12/21/19, Murray Sargent wrote:
> I checked with the Word team and they actually tried out stretching NBSP
> back in 2015 in the "good client" mode. But customer feedback was negative.
> The problem is that NBSP is used sometimes when stretching isn't wanted such
> as between the end of a question and the question mark or in multi-word
> trademarks or in italic expressions such as ad infinitum. Another example is
> Text « quotation » more text. One doesn't want the « and » to
> be spaced apart from "quotation" for justification purposes.
>
> Conceivably Word should offer a special justification option to stretch
> NBSP, but user feedback has revealed that it's not a good default option.

Okay, and that's very nice, meaningful feedback from actual developer+user interaction.
So the way I look at this going forward is that we have four options:

1)

With the existing single NBSP character, provide a software option to either make it flexible or inflexible, but this preference should be stored as part of the document and not the application settings, else shared documents would not preserve the layout intended by the creator.

2)

Consider that the non-stretching behaviour of wordprocessors (probably following MS Word) is correct, and encode a new NBFSP non-breaking flexible space. [I'm looking at that convenient hole at 2065.] DTP software like InDesign/TeX (and browsers like Firefox, though web content is assumed to be more fluid typographically) should then ideally conform to this and potentially break their users' documents (esp in the case of DTP).

3)

Consider that the stretching behaviour of DTP software like InDesign is correct, and encode a new FWNBSP fixed-width non-breaking space [at 2065]. Wordprocessors should then ideally conform to this and potentially break their users' documents.

4)

Leave alone the existing ambiguous behaviour of NBSP, and encode two new characters [Supplemental Punctuation has space at 2E50?] for NBFSP and FW-NBSP. Like the existing 2028 and 2029 Line and Paragraph Separators with the annotation: “may be used to represent this semantic unambiguously”.

--
Shriramana Sharma ???????????? ???????????? ????????????????????????

From unicode at unicode.org Fri Dec 20 20:46:44 2019
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Sat, 21 Dec 2019 08:16:44 +0530
Subject: [EXTERNAL] Re: NBSP supposed to stretch, right?
In-Reply-To: References: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com> Message-ID: On 12/21/19, Shriramana Sharma wrote: > 1) > > With the existing single NBSP character, provide a software option to > either make it flexible or inflexible, but this preference should be > stored as part of the document and not the application settings, else > shared documents would not preserve the layout intended by the > creator. One thing I forgot: are there any possibilities that *both* behaviours would be required in the same document? To my imagination, I who expect NBSP to be flexible won't use it between text and punctuation like those Word users, and probably they won't use it like me. -- Shriramana Sharma ???????????? ???????????? ???????????????????????? From unicode at unicode.org Fri Dec 20 21:18:48 2019 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Sat, 21 Dec 2019 08:48:48 +0530 Subject: NBSP supposed to stretch, right? In-Reply-To: <20191221002339.52b1eebe@JRWUBU2> References: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com> <20191221002339.52b1eebe@JRWUBU2> Message-ID: On 12/21/19, Richard Wordingham via Unicode wrote: > On Fri, 20 Dec 2019 17:25:17 +0530 > Shriramana Sharma via Unicode wrote: > >> I don't expect NBSP to ever disappear, because spaces disappear only >> at linebreaks, and NBSP simply doesn't stand at linebreaks. > > I can certainly imagine someone writing "  
". You don't need to go so far. Even the Unicode characters can be entered: A0 0A (which makes for a nice smiley like pattern, two ears besides two eyes ??). Obviously we are talking about *automatic* linebreaks. IIUC the point about NBSP is that *it itself* doesn't break, whereas SP breaks up and is *replaced* by a linebreak. Nobody said anything about manual linebreak characters *following* a space character, whether SP or NBSP or anything else. I also just tested and noticed something related: in my wordprocessor (LibreOffice Writer) when the cursor is near the end of a line and the horizontal space remaining on that line is less than the nominal advance width of the space, pressing space doesn't advance the cursor (or maybe it does and I don't see it) irrespective of whether the paragraph is left-aligned or justified, whereas inputting NBSP goes to the next line, pulling the word before it along with it. This is consistent with the current fixed-width NBSP behaviour of these wordprocessors. -- Shriramana Sharma ???????????? ???????????? ???????????????????????? From unicode at unicode.org Fri Dec 20 21:50:25 2019 From: unicode at unicode.org (James Kass via Unicode) Date: Sat, 21 Dec 2019 03:50:25 +0000 Subject: NBSP supposed to stretch, right? In-Reply-To: References: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com> Message-ID: On 2019-12-21 2:43 AM, Shriramana Sharma via Unicode wrote: > Ohkay and that's very nice meaningful feedback from actual > developer+user interaction. So the way I look at this going forward is > that we have four options: > > 1) > > With the existing single NBSP character, provide a software option to > either make it flexible or inflexible, but this preference should be > stored as part of the document and not the application settings, else > shared documents would not preserve the layout intended by the > creator. > > 5) Update the applications to treat NBSP correctly.? 
Process legacy data based on date/time stamp (or metadata) appropriately and offer users the option to update their legacy data algorithmically using proper non-stretching space characters such as FIGURE SPACE.

Options 1 and 5 have the advantage of not requiring the addition of yet more spacing characters to the Standard.

From unicode at unicode.org Sat Dec 21 00:27:53 2019
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Sat, 21 Dec 2019 11:57:53 +0530
Subject: Long standing problem with Vedic tone markers and post-base visarga/anusvara
Message-ID:

https://github.com/harfbuzz/harfbuzz/issues/2017 should provide the context for this.

Ever since the early days of Devanagari Unicode, scholars like me dealing with Vedic Sanskrit orthography have been experiencing this problem, but chalked it up to early days and consequent insufficient support for Vedic sequences. Even now, Vedic support even on the font side is quite limited, and we also find limitations on the software side. So I hope it's time to fix them one by one. The issue I would like to discuss now is as follows:

# SEMANTIC DISSOCIATION OF THE VISARGA FROM THE SYLLABLE

In Vedic, syllables that carry tone markers (which are mostly above-base or below-base) often have to take a visarga, which is always post-base. In this case, the sequence intuitive to native scholars like me is:

syllable + tone marker + visarga

This is because the tone marker indicates the tone of the syllable (or its vowel) and the visarga is a separate aspirated sound *after* the syllable, to which the tone marker doesn't apply. In fact, the only reason the visarga sign is analysed as a combining mark rather than a separate letter is that it is not used in isolation without a preceding syllable. Otherwise, i.e. linguistically, it doesn't modify the preceding syllable in any way. Anyhow, the point is that the tone marker should come before the visarga because it semantically applies to the preceding syllable and not the visarga.
This is all the more so since in some Vedic contexts (Sama Gana) the visarga is far separated from the syllable by other syllables like digits (themselves carrying combining marks) or spacing anusvara, as seen in examples from my Grantha proposal L2/09-372 p 40. So the visarga is semantically quite dissociated from the preceding syllable, unlike the tone marker, which is intimately associated with it.

# SAME APPLICABLE TO THE ANUSVARA

The same argument is also applicable to the anusvara, as it also represents a nasal sound separate from the preceding syllable. (The candrabindu OTOH nasalises the preceding syllable itself.) The above Grantha proposal page also shows an example where an anusvara is orthographically separated from the preceding syllable by three characters: a tone marker + avagraha + digit. L2/15-178 shows that in equivalent contexts of Devanagari the digit 0 is used as a substitute since the Devanagari anusvara is non-spacing. All this goes to show the dissociation of the anusvara from the syllable, just like the visarga, compared to tone markers. So to be consistent, even in the case of Devanagari (or any such script) where the anusvara is non-spacing, the sequence when a tone marker is also involved puts the tone marker first, as mentioned before:

syllable + tone marker + anusvara

# CURRENT SITUATION INCOMPATIBLE WITH ABOVE

However, even the simplest Vedic sequence (not involving Sama Vedic or multiple tone marker combinations) like ?????????? throws up a dotted circle, and one is expected (see developer feedback in that bug report) to input the visarga before tone markers, hoping the software is intelligent enough to skip over the visarga (or anusvara) and place the tone marker over the preceding syllable correctly. Why it is necessary to put the visarga first in input only to have to skip over it in shaping is beyond me. So it makes sense neither from a linguistic nor from a technological perspective to push the tone markers to the end of the syllable.
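To make the two competing orders concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The base letter KA and the choice of U+0951 as the tone marker are assumptions picked for illustration; they are not taken from the sample in this message. The sketch only shows that the two orders are different code point sequences and that Unicode normalization does not turn one into the other, so whichever order is chosen is what ends up stored and compared.

```python
import unicodedata

# Illustrative characters (assumed for this sketch, not taken from the message).
KA = "\u0915"       # DEVANAGARI LETTER KA, an arbitrary base letter
TONE = "\u0951"     # DEVANAGARI STRESS SIGN UDATTA, an above-base Vedic tone marker
VISARGA = "\u0903"  # DEVANAGARI SIGN VISARGA, post-base

tone_first = KA + TONE + VISARGA     # order argued for in this message
visarga_first = KA + VISARGA + TONE  # order that shaping engines currently expect


def describe(text):
    """Return one 'U+XXXX NAME' string per code point, in stored order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def unified_by_normalization(a, b):
    """True if any of the four normalization forms maps a and b to the same string."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(unicodedata.normalize(f, a) == unicodedata.normalize(f, b) for f in forms)


if __name__ == "__main__":
    for label, seq in (("tone first", tone_first), ("visarga first", visarga_first)):
        print(label)
        for line in describe(seq):
            print("   ", line)
    print("unified by normalization:", unified_by_normalization(tone_first, visarga_first))
```

Because normalization keeps the two sequences apart, text entered in one order will not match text entered in the other in a plain string comparison, which is one practical reason to settle on a single order.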
Even the developers acknowledge that non-spacing marks are normally (ie outside Indic) input before spacing ones. However, they say ?we can't support that in this particular case because this is how Microsoft does it and we have to follow suit to ensure people get the same shaping for the same input?, notwithstanding the fact that the expectation to put the visarga/anusvara first is non-sensical as explained above. So everyone is looking to Microsoft Uniscribe (or whatever its successor is) to fix things first before they can follow. I figured that if this is discussed and decided here, everyone can fix it at the same time. -- Shriramana Sharma ???????????? ???????????? ???????????????????????? From unicode at unicode.org Sat Dec 21 08:00:04 2019 From: unicode at unicode.org (Marius Spix via Unicode) Date: Sat, 21 Dec 2019 15:00:04 +0100 Subject: Aw: Not accepted by UTC but in ISO ballot? Message-ID: So, WG2 N5058, was literally a TROLL submission. > Gesendet: Samstag, 21. Dezember 2019 um 03:29 Uhr > Von: "Shriramana Sharma via Unicode" > An: "UnicoDe List" > Betreff: Not accepted by UTC but in ISO ballot? > > I was looking at the pipeline for something else, and for the first > time I see a character category: ?not accepted by the UTC but in ISO > ballot? and two characters in it. > > So IIUC while technically people are free to submit a document to the > ISO separately without submitting to UTC, it has always been the > practice to my knowledge to get a character approved by the UTC first. > > Anyone throw some light on these particular cases? > > -- > Shriramana Sharma ???????????? ???????????? ???????????????????????? > > From unicode at unicode.org Sat Dec 21 23:38:25 2019 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Sun, 22 Dec 2019 11:08:25 +0530 Subject: NBSP supposed to stretch, right? In-Reply-To: References: Message-ID: On 12/19/19, James Kass via Unicode wrote: > > There's a bug report for the LibreOffice application here... 
> https://bugs.documentfoundation.org/show_bug.cgi?id=41652 > ...which shows an interesting history of the situation. LOL two years ago almost to the date Shriramana Sharma seems to have already *quoted* the Unicode Standard on this (https://bugs.documentfoundation.org/show_bug.cgi?id=41652#c30): The Unicode standard document http://unicode.org/reports/tr14/ clearly states that: When expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width. But we have some people there on that bug saying that: While Unicode is an important standard, it's only of secondary importance to an office suite. Its primary goal is *not* creating a reference comformant implementation of the standard; rather, it should use the standard to the extent it needs to serve its users most. which is a ?? approach in my eyes but well, that's how the real world is on many things. Anyhow the above comment is continued as: And if legacy requires that some statements of standard be violated to keep existing documents intact, that should be that way, until a better design is invented and implemented, which would make possible to please both sides. This means option #1 I mentioned earlier and which seems to already have been discussed in the bug discussion: provide a per-document option or at least a Word-compatibility option as to how to treat NBSP. -- Shriramana Sharma ???????????? ???????????? ???????????????????????? From unicode at unicode.org Sun Dec 22 13:08:04 2019 From: unicode at unicode.org (=?UTF-8?Q?=22J=C3=B6rg_Knappen=22?= via Unicode) Date: Sun, 22 Dec 2019 20:08:04 +0100 Subject: Aw: Re: NBSP supposed to stretch, right? 
In-Reply-To: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com>
References: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From unicode at unicode.org Sun Dec 22 15:54:06 2019
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Mon, 23 Dec 2019 03:24:06 +0530
Subject: NBSP supposed to stretch, right?
In-Reply-To:
References: <8b6ca6cc-ea1f-c860-7b3a-2c37638df11f@ix.netcom.com>
Message-ID:

So I was wondering whether TeX only does this to the ~ input character or to the actual NBSP Unicode character too?

From unicode at unicode.org Wed Dec 25 16:40:48 2019
From: unicode at unicode.org (S. Wascher via Unicode)
Date: Wed, 25 Dec 2019 23:40:48 +0100
Subject: proofreading symbols (Korrekturzeichen) according to DIN 16511
Message-ID: <4E973834-A69F-4C75-9FAE-F6D1E280F9E2@simonwascher.info>

Hello,

after some clumsy and therefore unsuccessful efforts to find equivalents to the proofreading symbols defined in DIN 16511, I joined this list to ask where I can find these symbols or lookalikes of them in Unicode.

Here is a webpage I found that lists these symbols and their meanings:
http://www.typovia.at/index.php/desktop-publishing/din-normen/din-16511-korrekturzeichen

I am especially interested in the shapes for "Absatz" and "Wortzwischenraum".

I hope you can help me, at least by pointing me towards the best place to ask or look.

Thanks,
Simon

From unicode at unicode.org Thu Dec 26 19:49:36 2019
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Thu, 26 Dec 2019 17:49:36 -0800
Subject: Not accepted by UTC but in ISO ballot?
In-Reply-To:
References:
Message-ID: <2200d684-17f3-70af-c6d7-df342c3e567c@sonic.net>

Shriramana,

On 12/20/2019 6:29 PM, Shriramana Sharma via Unicode wrote:
> I was looking at the pipeline for something else, and for the first
> time I see a character category: "not accepted by the UTC but in ISO
> ballot"
and two characters in it. Those two characters changed status as of December 4, when the disposition of comments for CD3 was posted. They will not be part of the DIS ballot. The pipeline has now been updated to reflect that change of status. > > So IIUC while technically people are free to submit a document to the > ISO separately without submitting to UTC, it has always been the > practice to my knowledge to get a character approved by the UTC first. That is a preferred process, but doesn't always occur. The most obvious exception is that large new CJK repertoire additions are developed by the IRG and often go into ballot in ISO before the UTC takes a formal decision to approve them. CJK Extension G has now been approved for 13.0 by the UTC, but the entire block was listed in the pipeline for some time as "not accepted by UTC, but in active ISO technical ballot" once Extension G went into CD balloting. --Ken From unicode at unicode.org Fri Dec 27 09:06:03 2019 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Fri, 27 Dec 2019 20:36:03 +0530 Subject: Not accepted by UTC but in ISO ballot? In-Reply-To: <2200d684-17f3-70af-c6d7-df342c3e567c@sonic.net> References: <2200d684-17f3-70af-c6d7-df342c3e567c@sonic.net> Message-ID: Hello Ken and thanks for the reply. So I understand that the need for this category is rare but occurs nevertheless. Now I'm wondering about the similar category "not accepted by UTC, and not in ISO ballot" ? why such a character would be mentioned on the pipeline at all? On Fri, 27 Dec, 2019, 07:19 Ken Whistler, wrote: > Shriramana, > > On 12/20/2019 6:29 PM, Shriramana Sharma via Unicode wrote: > > I was looking at the pipeline for something else, and for the first > > time I see a character category: ?not accepted by the UTC but in ISO > > ballot? and two characters in it. > Those two characters changed status as of December 4, when the > disposition of comments for CD3 was posted. They will not be part of the > DIS ballot. 
The pipeline has now been updated to reflect that change of > status. > > > > So IIUC while technically people are free to submit a document to the > > ISO separately without submitting to UTC, it has always been the > > practice to my knowledge to get a character approved by the UTC first. > > That is a preferred process, but doesn't always occur. The most obvious > exception is that large new CJK repertoire additions are developed by > the IRG and often go into ballot in ISO before the UTC takes a formal > decision to approve them. CJK Extension G has now been approved for 13.0 > by the UTC, but the entire block was listed in the pipeline for some > time as "not accepted by UTC, but in active ISO technical ballot" once > Extension G went into CD balloting. > > --Ken > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Dec 27 04:34:39 2019 From: unicode at unicode.org (wjgo_10009@btinternet.com via Unicode) Date: Fri, 27 Dec 2019 10:34:39 +0000 (GMT) Subject: Videos on YouTube Message-ID: <76437c43.e1.16f46ec27d4.Webtop.56@btinternet.com> I searched on YouTube for Gutenberg Mainz and filtered for This week and I found 12 videos uploaded 3 days ago about a symposium called Alphabetica 2019. Apparently held in Amsterdam. It seems that the videos were listed for that search as the notes include "Presented in collaboration with the Institut Designlabor Gutenberg (Hochshule Mainz)," ? [and several other organizations] so both the words Gutenberg and Mainz were matched to the search. So, a serendipitous discovery. There is an interesting section in one video about Bliss and a new interesting development relating to the (possible) encoding of Bliss characters into Unicode. 
https://www.youtube.com/watch?v=mwj2KilAXmo

Here are links to two videos of continuous walks through Mainz: each of them includes the statue of Gutenberg and the outside of the Gutenberg Museum, yet they are otherwise almost non-overlapping in their routes.

https://www.youtube.com/watch?v=scjLxGh17rA
https://www.youtube.com/watch?v=izqBUQkfByw

William Overington

Friday 27 December 2019

From unicode at unicode.org Fri Dec 27 10:28:43 2019
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Fri, 27 Dec 2019 08:28:43 -0800
Subject: Not accepted by UTC but in ISO ballot?
In-Reply-To:
References: <2200d684-17f3-70af-c6d7-df342c3e567c@sonic.net>
Message-ID:

Shriramana,

That category is used to track character(s) in process that may have been approved by WG2 but are not yet in ballot, or are in contention, and may have just been dropped from ballot, but which still have sufficient visibility to be tracked.

The process is a bit rough around the edges when dealing with two separate committees with asynchronous processes, not all of whose members are in unanimous agreement about what they are moving forward on. The pipeline is a means of tracking the various statuses as the committees work to synchronize their eventual publications of new repertoire.

--Ken

On 12/27/2019 7:06 AM, Shriramana Sharma via Unicode wrote:
> Now I'm wondering about the similar category "not accepted by UTC, and
> not in ISO ballot" - why such a character would be mentioned on the
> pipeline at all?

From unicode at unicode.org Fri Dec 27 16:52:44 2019
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Fri, 27 Dec 2019 14:52:44 -0800
Subject: Twitter corrects Kwanzaa emoji
Message-ID:

An HTML attachment was scrubbed...
URL:

From unicode at unicode.org Tue Dec 31 07:13:29 2019
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Tue, 31 Dec 2019 14:13:29 +0100
Subject: emojis for mouse buttons?
Message-ID: A lot of application need to document their keymap and want to display keys. For now there are emojis for mouses (several variants: 1, 2 or 3 buttons), independently of the button actually pressed. However there's no simple emoji to represent the very common mouse click buttons used in lot of UI. But it would be good to have emojis for the left, center, and right click (showing a mouse with the correct button filled in black), instead of writing "left click" in plain text. Has it been proposed ? See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Dec 31 07:56:59 2019 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Tue, 31 Dec 2019 19:26:59 +0530 Subject: emojis for mouse buttons? In-Reply-To: References: Message-ID: Why are these called "emojis" for mouse buttons rather than just "characters" for them? On Tue, 31 Dec, 2019, 18:45 Philippe Verdy via Unicode, wrote: > A lot of application need to document their keymap and want to display > keys. > > For now there are emojis for mouses (several variants: 1, 2 or 3 buttons), > independently of the button actually pressed. > > However there's no simple emoji to represent the very common mouse click > buttons used in lot of UI. > > But it would be good to have emojis for the left, center, and right click > (showing a mouse with the correct button filled in black), instead of > writing "left click" in plain text. > > Has it been proposed ? > > See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Dec 31 09:49:53 2019 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 31 Dec 2019 16:49:53 +0100 Subject: emojis for mouse buttons? 
In-Reply-To:
References:
Message-ID:

I say "emoji" because they would belong to the subset of emoji within characters, and the existing mouse characters (which are not button-specific) are already encoded as emoji (i.e. with two styles: basic glyphs or color icons).

What is important is less the mouse than the identification of the button (left/center/right) for documenting keymaps in a UI (the documentation usually indicates the default right-hand assignment; a user may still configure the mouse driver to swap the left/right buttons).

For now the alternative is to compose a localisable string like "L" or "R" or "C", followed by the generic mouse (when documenting keymaps, the surrounding square and shading may be done outside using styling; we just need the unique symbol in a more immediately readable form than just "click").

A generic click (1st button) is sometimes represented as an arrow cursor or a hand with a pointing finger, with some radial strokes near the tip of the arrow, but it is not very distinctive when we need to explicitly distinguish the buttons. So I suggest a basic empty shape (a rounded rectangle or an ovoid like a narrow theta "θ"), with the top part split into three cells by horizontal and vertical strokes, and one of the three cells filled (representing the wire or the wireless waves is not necessary).

On Tue, 31 Dec 2019 at 14:57, Shriramana Sharma wrote:

> Why are these called "emojis" for mouse buttons rather than just
> "characters" for them?
>
> On Tue, 31 Dec, 2019, 18:45 Philippe Verdy via Unicode, <
> unicode at unicode.org> wrote:
>
>> A lot of application need to document their keymap and want to display
>> keys.
>>
>> For now there are emojis for mouses (several variants: 1, 2 or 3
>> buttons), independently of the button actually pressed.
>>
>> However there's no simple emoji to represent the very common mouse click
>> buttons used in lot of UI.
>> >> But it would be good to have emojis for the left, center, and right click >> (showing a mouse with the correct button filled in black), instead of >> writing "left click" in plain text. >> >> Has it been proposed ? >> >> See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Dec 31 10:17:08 2019 From: unicode at unicode.org (John W Kennedy via Unicode) Date: Tue, 31 Dec 2019 11:17:08 -0500 Subject: emojis for mouse buttons? In-Reply-To: References: Message-ID: <1870CC5A-587B-4248-AD82-8E64CFD5DE35@gmail.com> Operationally, one does not program for ?left? or ?right? buttons, because left-handed users are encouraged to set a switch that logically turns the mouse around, with ?Button 1? being the button worked by the index finger, no matter what side of the mouse it?s on. -- John W. Kennedy "Compact is becoming contract, Man only earns and pays." -- Charles Williams. "Bors to Elayne: On the King's Coins" > On Dec 31, 2019, at 10:52 AM, Philippe Verdy via Unicode wrote: > > ? > I say "emoji" because they would belong to the subsets of emojis, within characters, and existing mouse characters (but not button-specific) are already encoded as emojis (i.e. two styles: basic glyphs or color icons). > > What is important is less the mouse than the identification of the button (left/center/right) for documenting keymaps in UI (the documentation usually indicate the default right-hand assignment, a user may still configure the mouse driver to swap the left/right buttons). > > For now the alternative is to compose a localisable string like "L" or "R" or "C", followed by the generic mouse (when documenting keymaps, the surrounding square and shading may be done outside using styling, we just need the unique symbol in a more immediately readable way than just "click". 
> > A generic clic (1st button) is sometimes represented as an arrow cursor or hand with a pointing finger, and some radial strokes near the tip of the arrow, but it is not very distinctive when we need to explicitly disinguish the buttons, so I suggest a basic empty shape (rounded rectangle or ovoid like a narrow theta "?"), with the top part split in three cells by horizontal and vertical strokes, and one of the three cells filled (representing the wire or the wireless waves is not necessary). > > > Le mar. 31 d?c. 2019 ? 14:57, Shriramana Sharma a ?crit : >> Why are these called "emojis" for mouse buttons rather than just "characters" for them? >> >> On Tue, 31 Dec, 2019, 18:45 Philippe Verdy via Unicode, wrote: >>> A lot of application need to document their keymap and want to display keys. >>> >>> For now there are emojis for mouses (several variants: 1, 2 or 3 buttons), independently of the button actually pressed. >>> >>> However there's no simple emoji to represent the very common mouse click buttons used in lot of UI. >>> >>> But it would be good to have emojis for the left, center, and right click (showing a mouse with the correct button filled in black), instead of writing "left click" in plain text. >>> >>> Has it been proposed ? >>> >>> See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Dec 31 10:01:42 2019 From: unicode at unicode.org (wjgo_10009@btinternet.com via Unicode) Date: Tue, 31 Dec 2019 16:01:42 +0000 (GMT) Subject: emojis for mouse buttons? In-Reply-To: References: Message-ID: <620c9411.e0d.16f5cb10582.Webtop.228@btinternet.com> I read Philippe's post and I remembered the following thread that I started in the High-Logic forum. https://forum.high-logic.com/viewtopic.php?f=10&t=3818 Are these of any interest as designs? 
Best regards, William Overington Tuesday 31 December 2019 ------ Original Message ------ From: "Philippe Verdy via Unicode" To: "unicode Unicode Discussion" Sent: Tuesday, 2019 Dec 31 At 13:13 Subject: emojis for mouse buttons? A lot of application need to document their keymap and want to display keys. For now there are emojis for mouses (several variants: 1, 2 or 3 buttons), independently of the button actually pressed. However there's no simple emoji to represent the very common mouse click buttons used in lot of UI. But it would be good to have emojis for the left, center, and right click (showing a mouse with the correct button filled in black), instead of writing "left click" in plain text. Has it been proposed ? See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Dec 31 13:30:58 2019 From: unicode at unicode.org (wjgo_10009@btinternet.com via Unicode) Date: Tue, 31 Dec 2019 19:30:58 +0000 (GMT) Subject: emojis for mouse buttons? In-Reply-To: References: Message-ID: <7ec43d0f.10d1.16f5d7099d1.Webtop.45@btinternet.com> How about the following. Expand Philippe's idea of the theta shape to having a three by three grid of cells, rounded at the two lower outside corners to suggest the shape of a mouse unit. The three columns left to right referring to the left button, the centre button and the right button respectively. For each button column, there is an upper cell, a middle cell and a lower cell. A filled upper cell to mean click, a filled upper cell and a filled middle cell to mean double click, a filled lower cell to mean mouse down, a filled middle cell to mean mouse up, a filled lower cell and a filled middle cell to mean mouse down then drag. So, at present, fifteen new emoji characters. In use a mouse down then drag symbol would be used followed by a mouse up symbol later. 
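The fifteen symbols described above (three button columns times five actions) can be enumerated mechanically. The following Python sketch is only an illustration of the proposed scheme under my own assumptions: the names `BUTTONS`, `ROWS`, `ACTIONS` and `render` are hypothetical, and the ASCII rendering ("#" for a filled cell, "." for an empty one) merely stands in for real glyph designs.

```python
# Grid columns, left to right, and grid rows, top to bottom.
BUTTONS = ("left", "centre", "right")
ROWS = ("upper", "middle", "lower")

# Which cells of a button's column are filled for each action, as listed in the message.
ACTIONS = {
    "click": {"upper"},
    "double click": {"upper", "middle"},
    "mouse down": {"lower"},
    "mouse up": {"middle"},
    "mouse down then drag": {"lower", "middle"},
}


def render(button, action):
    """Return a 3x3 ASCII picture of the symbol for one button and one action."""
    column = BUTTONS.index(button)
    filled_rows = ACTIONS[action]
    lines = []
    for row in ROWS:
        cells = ["#" if (c == column and row in filled_rows) else "." for c in range(3)]
        lines.append("".join(cells))
    return "\n".join(lines)


# Every (button, action) pair mapped to its picture.
SYMBOLS = {(b, a): render(b, a) for b in BUTTONS for a in ACTIONS}

if __name__ == "__main__":
    for (button, action), picture in SYMBOLS.items():
        print(f"{button} button, {action}:")
        print(picture)
        print()
```

Since the five fill patterns are pairwise different and each lives in its own column, all fifteen pictures are distinct even in monochrome.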
The grid could be one colour and the cell fill another colour if desired, but the design would also be unambiguous in monochrome as a display default. William Overington Tuesday 31 December 2019 ------ Original Message ------ From: "Philippe Verdy via Unicode" To: "Shriramana Sharma" Cc: "unicode Unicode Discussion" Sent: Tuesday, 2019 Dec 31 At 15:49 Subject: Re: emojis for mouse buttons? I say "emoji" because they would belong to the subsets of emojis, within characters, and existing mouse characters (but not button-specific) are already encoded as emojis (i.e. two styles: basic glyphs or color icons). What is important is less the mouse than the identification of the button (left/center/right) for documenting keymaps in UI (the documentation usually indicate the default right-hand assignment, a user may still configure the mouse driver to swap the left/right buttons). For now the alternative is to compose a localisable string like "L" or "R" or "C", followed by the generic mouse (when documenting keymaps, the surrounding square and shading may be done outside using styling, we just need the unique symbol in a more immediately readable way than just "click". A generic clic (1st button) is sometimes represented as an arrow cursor or hand with a pointing finger, and some radial strokes near the tip of the arrow, but it is not very distinctive when we need to explicitly disinguish the buttons, so I suggest a basic empty shape (rounded rectangle or ovoid like a narrow theta "?"), with the top part split in three cells by horizontal and vertical strokes, and one of the three cells filled (representing the wire or the wireless waves is not necessary). Le mar. 31 d?c. 2019 ? 14:57, Shriramana Sharma > a ?crit : Why are these called "emojis" for mouse buttons rather than just "characters" for them? On Tue, 31 Dec, 2019, 18:45 Philippe Verdy via Unicode, > wrote: A lot of application need to document their keymap and want to display keys. 
For now there are emojis for mouses (several variants: 1, 2 or 3 buttons), independently of the button actually pressed. However there's no simple emoji to represent the very common mouse click buttons used in lot of UI. But it would be good to have emojis for the left, center, and right click (showing a mouse with the correct button filled in black), instead of writing "left click" in plain text. Has it been proposed ? See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts

From unicode at unicode.org Tue Dec 31 16:04:39 2019
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Tue, 31 Dec 2019 23:04:39 +0100
Subject: emojis for mouse buttons?
In-Reply-To: <7ec43d0f.10d1.16f5d7099d1.Webtop.45@btinternet.com>
References: <7ec43d0f.10d1.16f5d7099d1.Webtop.45@btinternet.com>
Message-ID:

Playing with the filling of the middle cell to mean a double click is a bad idea; it would be better to add one or two rounded borders separated from the button, or simply to display two icons in sequence for a double click.

Note that the glyphs do not necessarily have to show a mouse: it could just as well be a square with its lower third split into two or three squares, like a touchpad (see the notification icons displayed by Synaptics touchpad drivers).

The same rounded borders could also indicate the number of clicks.

Likewise, if a mouse is represented, it may or may not have a wire. Emoji-style glyphs could use more realistic 3D-like rendering with extra shadows...

On Tue, 31 Dec 2019 at 22:16, wjgo_10009 at btinternet.com via Unicode < unicode at unicode.org> wrote:

> How about the following.
> > A filled upper cell to mean click, > > a filled upper cell and a filled middle cell to mean double click, > Note that clicking and maintaining the button is just like the convention of using "+" after a key modifier before the actual key (both key may be styled separately to decorate their glyphs into a keycap, but such styling should not be applied in the distinctive glyph; there may also be emoji sequences to combine an anonymous keycap base emoji with the following characters, using joiner controls, but this is more difficult for keys whose labels are texts made of multiple letters like "End" or words like "Print Screen", after a possible Unicode symbol for keys like Page Up, Home, End, NumLock; styling the text offers better option and accessibility even if symbols are used and a whole translatable string is surrounded by deocrating styles to create a visual keycap). -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Dec 31 17:01:15 2019 From: unicode at unicode.org (James Kass via Unicode) Date: Tue, 31 Dec 2019 23:01:15 +0000 Subject: Long standing problem with Vedic tone markers and post-base visarga/anusvara In-Reply-To: References: Message-ID: <356457cb-57c2-0a0c-a6d7-148dfb2b52d1@gmail.com> On 2019-12-21 6:27 AM, Shriramana Sharma via Unicode wrote: > However, even the simplest Vedic sequence (not involving Sama Vedic or > multiple tone marker combinations) like ?????????? throws up a dotted > circle, and one is expected (see developer feedback in that bug > report) to input the visarga before tone markers, hoping the software > is intelligent enough to skip over the visarga (or anusvara) place the > tone marker over the preceding syllable correctly. Why it is necessary > to put the visarga first in input only to have to skip over it in > shaping is beyond me. ?? ??? ??? -- visarga last ???? -- " ??? -- visarga before accent (U+0954) ???? -- " ?? ??? ??? -- visarga last ???? -- " ??? 
---- visarga before svarita (U+0951)
???? ---- "

U+0951 and U+0954 have a canonical combining class of 230. Putting VISARGA (CCC=0) after those CCC=230 marks generates the dotted circle for VISARGA. Putting VISARGA before those CCC=230 marks generates the dotted circle for U+0954 but drops the dotted circle for U+0951. In both cases where VISARGA comes before, the mark positioning is broken. (Mangal font, Win 7)

As far as I can tell, the simplest solution would be for the Indic shaping engines to suppress the dotted circle for VISARGA (or ANUSVARA) where appropriate. Entering/storing VISARGA or ANUSVARA at the end of the syllable makes sense since that's where it goes, visually and logically.

From unicode at unicode.org Tue Dec 31 19:19:02 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Wed, 1 Jan 2020 01:19:02 +0000
Subject: Long standing problem with Vedic tone markers and post-base visarga/anusvara
In-Reply-To: <356457cb-57c2-0a0c-a6d7-148dfb2b52d1@gmail.com>
References: <356457cb-57c2-0a0c-a6d7-148dfb2b52d1@gmail.com>
Message-ID: <412c4d8e-0120-76ef-531a-db04e8055168@gmail.com>

A workaround until some kind of satisfactory adjustment is made might be to simply use COLON for VISARGA. Or...

VISARGA -> U+02F8 MODIFIER LETTER RAISED COLON
ANUSVARA -> U+02D9 DOT ABOVE

...as long as the font(s) included both those characters.

?? ??? ??? -- anusvara last
???? -- "
??: -- colon last
???: -- "
??? -- raised colon modifier last
???? -- "
??? -- spacing dot above last
???? -- "
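The canonical combining classes mentioned in this thread can be checked with Python's standard `unicodedata` module. The sketch below is only a lookup aid, not a statement about how any shaping engine behaves; the helper name `ccc_table` is my own. It lists the two tone marks, the visarga and anusvara, and the two stand-in characters suggested in the workaround.

```python
import unicodedata

# Characters named in the thread: two tone marks, visarga/anusvara,
# and the two spacing stand-ins suggested as a workaround.
CHARS = {
    "tone mark": "\u0951",
    "accent": "\u0954",
    "visarga": "\u0903",
    "anusvara": "\u0902",
    "visarga stand-in": "\u02F8",
    "anusvara stand-in": "\u02D9",
}


def ccc_table(chars):
    """Map each label to (code point, character name, canonical combining class)."""
    table = {}
    for label, ch in chars.items():
        table[label] = (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
    return table


if __name__ == "__main__":
    for label, (cp, name, ccc) in ccc_table(CHARS).items():
        print(f"{label:18} {cp} {name} (ccc={ccc})")
```

The tone marks report class 230, while visarga, anusvara and both stand-ins report class 0, so none of the class-0 characters is ever reordered relative to a tone mark by normalization.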