From unicode at unicode.org Mon Sep 2 21:06:34 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 3 Sep 2019 03:06:34 +0100
Subject: LDML Keyboard Descriptions and Normalisation
Message-ID: <20190903030634.15b512a2@JRWUBU2>

I'm getting conflicting indications about how the LDML keyboard description handles issues of canonical equivalence. I have one simple question which some people may be able to answer.

Is the keyboard specification intended to distinguish between keyboards that generally output:

(a) NFC text;
(b) NFD text; or
(c) Deliberately unnormalised texts?

For example, when documenting my own keyboards, I would want to distinguish between a keyboard that went to great trouble to output text in precomposed characters as opposed to one that took the easy route of outputting text in fully decomposed characters.

For a Tibetan keyboard, it would matter whether contractions were compatible with the USE (so generally *not* NFC or NFD) or in NFC or NFD.

Richard.

From unicode at unicode.org Tue Sep 3 13:03:18 2019
From: unicode at unicode.org (Andrew Glass via Unicode)
Date: Tue, 3 Sep 2019 18:03:18 +0000
Subject: LDML Keyboard Descriptions and Normalisation
In-Reply-To: <20190903030634.15b512a2@JRWUBU2>
References: <20190903030634.15b512a2@JRWUBU2>
Message-ID:

Hi Richard,

This is a good point. A keyboard that is doing transforms should specify which type of normalization it has been designed to do. I've filed a ticket to track this.

Cheers,

Andrew

-----Original Message-----
From: Unicode On Behalf Of Richard Wordingham via Unicode
Sent: 02 September 2019 19:07
To: unicode at unicode.org
Subject: LDML Keyboard Descriptions and Normalisation

I'm getting conflicting indications about how the LDML keyboard description handles issues of canonical equivalence. I have one simple question which some people may be able to answer.
Is the keyboard specification intended to distinguish between keyboards that generally output:

(a) NFC text;
(b) NFD text; or
(c) Deliberately unnormalised texts?

For example, when documenting my own keyboards, I would want to distinguish between a keyboard that went to great trouble to output text in precomposed characters as opposed to one that took the easy route of outputting text in fully decomposed characters.

For a Tibetan keyboard, it would matter whether contractions were compatible with the USE (so generally *not* NFC or NFD) or in NFC or NFD.

Richard.

From unicode at unicode.org Sat Sep 7 13:51:33 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Sat, 7 Sep 2019 19:51:33 +0100
Subject: LDML Keyboard Descriptions and Normalisation
In-Reply-To:
References: <20190903030634.15b512a2@JRWUBU2>
Message-ID: <20190907195133.32c3c5e3@JRWUBU2>

On Tue, 3 Sep 2019 18:03:18 +0000
Andrew Glass via Unicode wrote:

> Hi Richard,
>
> This is a good point. A keyboard that is doing transforms should
> specify which type of normalization it has been designed to do. I've
> filed a ticket to track this.

The ticket is https://unicode-org.atlassian.net/browse/CLDR-13273 .

My question was whether the recording capability is already there in LDML. It doesn't even need transforms - there are enough keys to support Latin-1 in NFC without any transforms.

Richard.

From unicode at unicode.org Sat Sep 7 14:02:09 2019
From: unicode at unicode.org (Cibu via Unicode)
Date: Sat, 7 Sep 2019 20:02:09 +0100
Subject: LDML Keyboard Descriptions and Normalisation
In-Reply-To: <20190907195133.32c3c5e3@JRWUBU2>
References: <20190903030634.15b512a2@JRWUBU2> <20190907195133.32c3c5e3@JRWUBU2>
Message-ID:

Slightly off topic: Is there a CLDR tool to try out transformations specified in a keyboard spec?

On Sat, Sep 7, 2019 at 7:54 PM Richard Wordingham via Unicode <unicode at unicode.org> wrote:

> On Tue, 3 Sep 2019 18:03:18 +0000
> Andrew Glass via Unicode wrote:
>
> > Hi Richard,
> >
> > This is a good point. A keyboard that is doing transforms should
> > specify which type of normalization it has been designed to do. I've
> > filed a ticket to track this.
>
> The ticket is https://unicode-org.atlassian.net/browse/CLDR-13273 .
>
> My question was whether the recording capability is already there in
> LDML. It doesn't even need transforms - there are enough keys to support
> Latin-1 in NFC without any transforms.
>
> Richard.

From unicode at unicode.org Sat Sep 7 14:41:34 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Sat, 7 Sep 2019 20:41:34 +0100
Subject: LDML Keyboard Descriptions and Normalisation
In-Reply-To:
References: <20190903030634.15b512a2@JRWUBU2> <20190907195133.32c3c5e3@JRWUBU2>
Message-ID: <20190907204134.36efe305@JRWUBU2>

On Sat, 7 Sep 2019 20:02:09 +0100
Cibu via Unicode wrote:

> Slightly off topic: Is there a CLDR tool to try out transformations
> specified in a keyboard spec?

No CLDR tool or, so far as I am aware, CLDR-endorsed tool.

Martin Hosken has put together a reference model in Python at https://github.com/keymanapp/ldml-keyboards-dev , and it seems highly likely that the model is consistent with the 'specification'. I get the strong feeling that the new sections of LDML Part 7 are an inadequate description of this model.

I don't think the model will run with Python Version 2.7.

Richard.
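
For readers less familiar with the three cases (a) to (c) in the question that opened this thread, here is a minimal Python sketch using only the standard unicodedata module. It is not taken from any LDML keyboard file or from the reference model mentioned above; the sample strings and the helper name classify are illustrative assumptions.

```python
import unicodedata

def classify(text):
    """Report which of NFC/NFD a string is already in (hypothetical helper)."""
    forms = [f for f in ("NFC", "NFD") if unicodedata.normalize(f, text) == text]
    return forms or ["unnormalised"]

# Two canonically equivalent outputs a keyboard might produce for "é".
precomposed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"     # e + COMBINING ACUTE ACCENT

print(classify(precomposed))   # ['NFC']
print(classify(decomposed))    # ['NFD']

# Combining marks in non-canonical order: U+0301 (class 230) typed before
# U+0323 (class 220). This is canonically equivalent text in neither form.
scrambled = "a\u0301\u0323"
print(classify(scrambled))     # ['unnormalised']
```

A keyboard description that recorded its intended form would let a checker of this kind flag output that falls outside it.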

From unicode at unicode.org Tue Sep 10 17:33:13 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 10 Sep 2019 23:33:13 +0100
Subject: LDML Keyboard Descriptions and Normalisation
In-Reply-To: <20190907204134.36efe305@JRWUBU2>
References: <20190903030634.15b512a2@JRWUBU2> <20190907195133.32c3c5e3@JRWUBU2> <20190907204134.36efe305@JRWUBU2>
Message-ID: <20190910233313.1a2042a9@JRWUBU2>

On Sat, 7 Sep 2019 20:41:34 +0100
Richard Wordingham via Unicode wrote:

> I don't think the model will run with Python Version 2.7.

I was wrong. It does run under Version 2.7.

Richard.

From unicode at unicode.org Thu Sep 12 07:53:45 2019
From: unicode at unicode.org (Christoph Päper via Unicode)
Date: Thu, 12 Sep 2019 14:53:45 +0200 (CEST)
Subject: Proposing mostly invisible characters
Message-ID: <839159959.209126.1568292825255@ox.hosteurope.de>

Dear Unicoders

There are some characters that have no precedent in existing encodings and are also hard to attest directly from printed sources. Can one still make a solid case for encoding those in Unicode?

I am thinking of characters that are either invisible (most of the time) or can become invisible under certain circumstances.

Precedents
----------

- HYPHEN U+2010
  is *always* rendered as a hyphen (i.e. a centered horizontal bar glyph), which may look identical to Hyphen-Minus U+002D.
- SOFT HYPHEN (SHY) U+00AD
  is *only* rendered as a hyphen *when* it appears at the end of a line.
- At least four existing math operators are *never* rendered with a visible glyph and only explicitly encode semantics where syntax is potentially ambiguous otherwise:
  * FUNCTION APPLICATION U+2061 is used where no multiplication is implied, e.g. between an alphabetic function variable and an opening parenthesis: f(x).
  * INVISIBLE TIMES U+2062 is used where multiplication by either TIMES U+00D7 or MIDDLE DOT U+00B7 is implied, e.g.
    between a number and an alphabetic variable, constant or parenthesis: 2?r(a+b)
  * INVISIBLE SEPARATOR U+2063 is used where enumeration by a COMMA U+002C or SEMICOLON U+003B (and possibly whitespace) is implied, e.g. between two single-letter variable indices: a??.
  * INVISIBLE PLUS U+2064 is used where addition by PLUS SIGN U+002B is implied, e.g. between an integer and a vulgar fraction: 1?.

Suggestions
-----------

- INVERSE SOFT HYPHEN (ISHY) or SOFT INVISIBLE HYPHEN (SIHY)
  is *always* rendered as a hyphen *unless* it appears at the end of a line.
- INVISIBLE HYPHEN (IHY) or ZERO-WIDTH HYPHEN (ZWH)
  is *never* rendered as a hyphen,
  *but* the word it appears in is treated as if it contained one at its position.
- INVERSE SOFT COMMA (ISC) or SOFT INVISIBLE COMMA (SIC)
  is *always* rendered as a comma *unless* it appears at the end of a line.
- INVISIBLE OPEN PARENTHESIS (IOP) and INVISIBLE CLOSE PARENTHESIS (ICP)
  *should not* be rendered with a visible glyph, but *may* be for inline fallback.

ISHY/SIHY is especially useful for encoding (German) noun compounds in wrapped titles, e.g. on product labeling, where hyphens are often suppressed for stylistic reasons, e.g. orthographically correct _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be rendered as _Spargel?Suppe_ and could then be encoded as _SpargelSuppe_.

Like the existing invisible math operators, IHY/ZWH is used where the presence of its visible counterpart (i.e. HYPHEN) would be required syntactically (i.e. orthographically), but can be derived from context and convention (at least by human readers). This is useful for spell-checking, line-breaking etc., e.g. for words (commercial names in particular) with internal capital letters that would otherwise break orthographic rules and that should be broken at the end of a line without a hyphen added (i.e. like ISHY/SIHY, not SHY).
This is very similar to ZERO-WIDTH SPACE (ZWSP) and WORD JOINER (WJ) indeed, except that ZWSP separates two words, where IHY/ZWH joins them into one, but unlike WJ still allows a line break.

ISC/SIC is particularly useful in wrapping table headers where a possible line break can take on the separating role of a comma.

IOP and ICP enclose mathematical expressions to override precedence of operators that would otherwise apply and they enclose textual annotation that should be displayed outside the normal row of characters, e.g. a sum in the numerator or denominator of a fraction and ruby/furigana pronunciation hints, respectively, that both *may* be rendered inline where advanced typographic functionality is unavailable and should then be parenthesized for clarity.

From unicode at unicode.org Thu Sep 12 08:34:11 2019
From: unicode at unicode.org (r12a via Unicode)
Date: Thu, 12 Sep 2019 14:34:11 +0100
Subject: The native name of Tai Viet script and language(s)
In-Reply-To: <83v9ujdsn8.fsf@gnu.org>
References: <83ef1dco05.fsf@gnu.org> <83v9ujdsn8.fsf@gnu.org>
Message-ID:

On 27/08/2019 07:33, Eli Zaretskii via Unicode wrote:
> Yes, it's an old and outdated text (Emacs is around since 1985, and
> supports multilingual text editing since 1997). Easy to fix, and I
> will fix it, but my main difficulty is with the text that uses the
> script itself, which is why I asked here. I couldn't find
> copy/paste-able text for that anywhere on the Internet.

For a list of languages for which Tai Viet script is used, see
https://scriptsource.org/cms/scripts/page.php?item_id=script_detail&key=Tavt

For some copy-pastable text you could try the Tai Dam version of the Universal Declaration of Human Rights at
https://unicode.org/udhr/d/udhr_blt.html

For an introduction to the script and the characters used for Tai Dam, you may find this useful
https://r12a.github.io/scripts/taiviet/

If you want to try typing some Tai Viet text, try
https://r12a.github.io/pickers/taiviet/

hth
ri

From unicode at unicode.org Thu Sep 12 09:29:46 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Thu, 12 Sep 2019 15:29:46 +0100
Subject: Proposing mostly invisible characters
In-Reply-To: <839159959.209126.1568292825255@ox.hosteurope.de>
References: <839159959.209126.1568292825255@ox.hosteurope.de>
Message-ID: <20190912152946.4a78dce7@JRWUBU2>

On Thu, 12 Sep 2019 14:53:45 +0200 (CEST)
Christoph Päper via Unicode wrote:

> Dear Unicoders
>
> There are some characters that have no precedent in existing
> encodings and are also hard to attest directly from printed sources.
> Can one still make a solid case for encoding those in Unicode?
>
> I am thinking of characters that are either invisible (most of the
> time) or can become invisible under certain circumstances.

> - INVISIBLE HYPHEN (IHY) or ZERO-WIDTH HYPHEN (ZWH)
> is *never* rendered as a hyphen,
> *but* the word it appears in is treated as if it contained one at
> its position.

SOFT HYPHEN is supposed to be rendered in the manner appropriate to the writing system, not necessarily like a HYPHEN. In some writing systems, such as, I gather, most very modern Lao writing systems, it has no visual indication. TUS claims that I was hallucinating when I saw word wrapping hyphens in non-scriptio continua Pali in the Tai Tham script in a Lao book. (To put it less provocatively, one needs user-level control of the rendering of soft hyphens.)

So, to make a proper case for INVISIBLE HYPHEN, you at least need evidence of a contrast between soft-hyphen and an invisible hyphen. Even then, you run the risk of being told that you should use a higher level protocol which you will have to implement yourself.

Also, so long as you don't need your text to be automatically split into words, you can use ZWSP for the function.

Richard.

From unicode at unicode.org Fri Sep 13 00:56:02 2019
From: unicode at unicode.org (Henri Sivonen via Unicode)
Date: Fri, 13 Sep 2019 08:56:02 +0300
Subject: Proposing mostly invisible characters
In-Reply-To: <839159959.209126.1568292825255@ox.hosteurope.de>
References: <839159959.209126.1568292825255@ox.hosteurope.de>
Message-ID:

On Thu, Sep 12, 2019, 15:53 Christoph Päper via Unicode wrote:

> ISHY/SIHY is especially useful for encoding (German) noun compounds in
> wrapped titles, e.g. on product labeling, where hyphens are often
> suppressed for stylistic reasons, e.g. orthographically correct
> _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be
> rendered as _Spargel?Suppe_ and could then be encoded as
> _SpargelSuppe_.
>

Why should this stylistic decision be encoded in the text content as opposed to being a policy applied at the CSS (or conceptually equivalent) layer?

From unicode at unicode.org Fri Sep 13 03:35:02 2019
From: unicode at unicode.org (Eli Zaretskii via Unicode)
Date: Fri, 13 Sep 2019 11:35:02 +0300
Subject: The native name of Tai Viet script and language(s)
In-Reply-To: (message from r12a on Thu, 12 Sep 2019 14:34:11 +0100)
References: <83ef1dco05.fsf@gnu.org> <83v9ujdsn8.fsf@gnu.org>
Message-ID: <83y2yszj9l.fsf@gnu.org>

> Cc: unicode at unicode.org
> From: r12a
> Date: Thu, 12 Sep 2019 14:34:11 +0100
>
> On 27/08/2019 07:33, Eli Zaretskii via Unicode wrote:
> > Yes, it's an old and outdated text (Emacs is around since 1985, and
> > supports multilingual text editing since 1997). Easy to fix, and I
> > will fix it, but my main difficulty is with the text that uses the
> > script itself, which is why I asked here. I couldn't find
> > copy/paste-able text for that anywhere on the Internet.
>
> For a list of languages for which Tai Viet script is used, see
> https://scriptsource.org/cms/scripts/page.php?item_id=script_detail&key=Tavt
>
> For some copy-pastable text you could try the Tai Dam version of the
> Universal Declaration of Human Rights at
> https://unicode.org/udhr/d/udhr_blt.html
>
> For an introduction to the script and the characters used for Tai Dam,
> you may find this useful
> https://r12a.github.io/scripts/taiviet/
>
> If you want to try typing some Tai Viet text, try
> https://r12a.github.io/pickers/taiviet/

Thank you for these pointers.

From unicode at unicode.org Fri Sep 13 04:09:44 2019
From: unicode at unicode.org (Christoph Päper via Unicode)
Date: Fri, 13 Sep 2019 11:09:44 +0200
Subject: Proposing mostly invisible characters
In-Reply-To:
References: <839159959.209126.1568292825255@ox.hosteurope.de>
Message-ID:

CSS Text would indeed allow this in level 4:

    .label {hyphenate-character: "";}

However, this suggests that *all* SHYs therein should not produce a hyphen glyph at the end of a line.
I guess I would then need to show that there are instances where this is not desired.

On 13 Sep 2019 at 07:59, Henri Sivonen via Unicode wrote:
>On Thu, Sep 12, 2019, 15:53 Christoph Päper via Unicode wrote:
>
>> ISHY/SIHY is especially useful for encoding (German) noun compounds in
>> wrapped titles, e.g. on product labeling, where hyphens are often
>> suppressed for stylistic reasons, e.g. orthographically correct
>> _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be
>> rendered as _Spargel?Suppe_ and could then be encoded as
>> _SpargelSuppe_.
>>
>
>Why should this stylistic decision be encoded in the text content as
>opposed to being a policy applied at the CSS (or conceptually equivalent)
>layer?

From unicode at unicode.org Fri Sep 13 09:27:21 2019
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Fri, 13 Sep 2019 07:27:21 -0700
Subject: Proposing mostly invisible characters
In-Reply-To: <839159959.209126.1568292825255@ox.hosteurope.de>
References: <839159959.209126.1568292825255@ox.hosteurope.de>
Message-ID: <7d880a0a-2ce7-c704-4419-066b588e6b52@ix.netcom.com>

An HTML attachment was scrubbed...
URL:

From unicode at unicode.org Fri Sep 13 12:50:59 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Fri, 13 Sep 2019 18:50:59 +0100
Subject: Proposing mostly invisible characters
In-Reply-To:
References: <839159959.209126.1568292825255@ox.hosteurope.de>
Message-ID: <20190913185059.24a3b7a9@JRWUBU2>

On Fri, 13 Sep 2019 08:56:02 +0300
Henri Sivonen via Unicode wrote:

> On Thu, Sep 12, 2019, 15:53 Christoph Päper via Unicode
> wrote:
>
> > ISHY/SIHY is especially useful for encoding (German) noun compounds
> > in wrapped titles, e.g. on product labeling, where hyphens are often
> > suppressed for stylistic reasons, e.g.
> > orthographically correct
> > _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_
> > (U+2010) may be rendered as _Spargel?Suppe_ and could then be
> > encoded as _SpargelSuppe_.
> >
>
> Why should this stylistic decision be encoded in the text content as
> opposed to being a policy applied at the CSS (or conceptually
> equivalent) layer?

How would you define such a property?

Richard.

From unicode at unicode.org Fri Sep 13 13:14:47 2019
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Fri, 13 Sep 2019 11:14:47 -0700
Subject: Proposing mostly invisible characters
In-Reply-To: <20190913185059.24a3b7a9@JRWUBU2>
References: <839159959.209126.1568292825255@ox.hosteurope.de> <20190913185059.24a3b7a9@JRWUBU2>
Message-ID:

An HTML attachment was scrubbed...
URL:

From unicode at unicode.org Wed Sep 25 10:18:12 2019
From: unicode at unicode.org (wjgo_10009@btinternet.com via Unicode)
Date: Wed, 25 Sep 2019 16:18:12 +0100 (BST)
Subject: QID emoji and screen readers
Message-ID: <2daa9a13.952.16d6900571e.Webtop.46@btinternet.com>

There is currently a Public Review, number 405.

http://www.unicode.org/review/pri405/

It is about the following document.

http://www.unicode.org/reports/tr51/tr51-17.html

The issue of screen readers is mentioned in the document.

I have thought of a possible solution. However I am not expert on many of the details of what is allowed and what is not allowed in Unicode text, so I am posting the idea here so that depending upon any discussion that takes place, I might send in the idea as a formal response to the Public Review, or send in a modified form based on advice provided, or just abandon the idea as unworkable.

Here is the basic idea as I suggest it at the moment, please endorse, reject, discuss, improve the idea as you think best.

Decide what text, in any Unicode characters that you wish in any language you choose, is to be the text that the screen reader speaks.

Save that text as a UTF-8 byte sequence.

Encode the UTF-8 byte sequence as a text string twice as long, in which each UTF-8 byte is written as two hexadecimal "digits", each in the range 0..9, A..F, and then use the tag version of each of those characters.

Add a U+0020 SPACE character at the front as the base character and add a cancel tag character at the end.

Include that string in the document after the QID emoji character.

With my limited knowledge of the intricacies of Unicode it seems to me that that might well solve the problem.

Screen reader software could decode the tag characters into a string and try to speak it out.

Other software would just ignore the tag characters and display the space character.

William Overington

Wednesday 25 September 2019

From unicode at unicode.org Thu Sep 26 06:21:10 2019
From: unicode at unicode.org (Fred Brennan via Unicode)
Date: Thu, 26 Sep 2019 19:21:10 +0800
Subject: On the lack of a SQUARE TB glyph
Message-ID: <2152748.07faurLzQQ@pc>

Greetings,

I can't help but notice that there is no "SQUARE TB" glyph. We have SQUARE KB, GB and MB, starting at U+3385. But no SQUARE TB? SQUARE GB is at U+3387, and U+3388 is...SQUARE CAL, ㎈, so no space was even left for it - not very future-proof!

The purpose of these glyphs is, as you know, for CJK. Perhaps terabytes were not as common when these glyphs were approved, but they are common now.

There is a clear demand for a SQUARE TB. In the font SMotoya Sinkai W55 W3, which is ©2008 株式会社 モトヤ, the glyph is unencoded and accessed via the Discretionary Ligatures (`dlig`) OpenType feature. It has name `T_B.dlig`. This same scheme is used in many other Motoya fonts, and presumably other CJK fonts. In some other fonts, the `hwid` feature can be used to get a similar effect.

SQUARE TB is likewise seen often on packaging as terabyte hard drives are now common, as is the concept of a terabyte in operating systems.
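
The layout of this corner of the CJK Compatibility block, as described above, can be checked with Python's standard unicodedata module. A minimal sketch; the helper name describe is only for illustration and is not part of any proposal:

```python
import unicodedata

def describe(cp):
    """Return (character name, NFKC compatibility mapping) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.normalize("NFKC", ch)

# U+3385..U+3387 are the three byte-unit squares; U+3388 is the next neighbour.
for cp in range(0x3385, 0x3389):
    name, plain = describe(cp)
    print(f"U+{cp:04X} {name} -> {plain!r}")
# U+3385 SQUARE KB -> 'KB'
# U+3386 SQUARE MB -> 'MB'
# U+3387 SQUARE GB -> 'GB'
# U+3388 SQUARE CAL -> 'cal'
```

The second column shows that the existing squares carry compatibility mappings to ordinary letter sequences.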

Recently new glyphs were added for the new era name, so I don't think it's a problem to add SQUARE TB. While we're at it, may as well add SQUARE PB. To be future-proof (hopefully for the next hundred years!), perhaps we ought to also add SQUARE EB, SQUARE ZB and SQUARE YB!

But even if only SQUARE TB gets in, it's worth it; I need it.

Best,
Fred Brennan

From unicode at unicode.org Thu Sep 26 07:40:28 2019
From: unicode at unicode.org (Marius Spix via Unicode)
Date: Thu, 26 Sep 2019 14:40:28 +0200
Subject: Aw: On the lack of a SQUARE TB glyph
In-Reply-To: <2152748.07faurLzQQ@pc>
References: <2152748.07faurLzQQ@pc>
Message-ID:

Unfortunately, the CJK Compatibility block is full, but U+321F in the Enclosed CJK Letters and Months seems to be free. I definitely see a use for the proposed character.

Sent: Thursday, 26 September 2019 at 13:21
From: "Fred Brennan via Unicode"
To: unicode at unicode.org
Subject: On the lack of a SQUARE TB glyph

Greetings,

I can't help but notice that there is no "SQUARE TB" glyph. We have SQUARE KB, GB and MB, starting at U+3385. But no SQUARE TB? SQUARE GB is at U+3387, and U+3388 is...SQUARE CAL, ㎈, so no space was even left for it - not very future-proof!

The purpose of these glyphs is, as you know, for CJK. Perhaps terabytes were not as common when these glyphs were approved, but they are common now.

There is a clear demand for a SQUARE TB. In the font SMotoya Sinkai W55 W3, which is ©2008 株式会社 モトヤ, the glyph is unencoded and accessed via the Discretionary Ligatures (`dlig`) OpenType feature. It has name `T_B.dlig`. This same scheme is used in many other Motoya fonts, and presumably other CJK fonts. In some other fonts, the `hwid` feature can be used to get a similar effect.

SQUARE TB is likewise seen often on packaging as terabyte hard drives are now common, as is the concept of a terabyte in operating systems.

Recently new glyphs were added for the new era name, so I don't think it's a problem to add SQUARE TB.
While we're at it, may as well add SQUARE PB. To be future-proof (hopefully for the next hundred years!), perhaps we ought to also add SQUARE EB, SQUARE ZB and SQUARE YB!

But even if only SQUARE TB gets in it's worth it, I need it.

Best,
Fred Brennan

From unicode at unicode.org Thu Sep 26 12:03:57 2019
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Thu, 26 Sep 2019 10:03:57 -0700
Subject: On the lack of a SQUARE TB glyph
Message-ID: <20190926100357.665a7a7059d7ee80bb4d670165c8327d.70f74d2107.wbe@email03.godaddy.com>

Fred Brennan wrote:

> I can't help but notice that there is no "SQUARE TB" glyph.

Marius Spix replied:

> Unfortunately, the CJK Compatibility block is full, but U+321F in the
> Enclosed CJK Letters and Months seems to be free. I definitely see a usage
> for the proposed character.

IIRC the CJK Compatibility squared characters came from a legacy character set or standard, developed at whatever point in history it was developed. So it's kind of inevitable that the set of symbols in that block might not always be up to date.

This seems like a reasonable candidate for a proposal. UTC won't add a character based on mailing-list chat, of course; they'll need a proper proposal. They'll also be the ones to decide what code point is assigned, although the proposal can politely suggest one.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From unicode at unicode.org Thu Sep 26 14:56:39 2019
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Thu, 26 Sep 2019 12:56:39 -0700
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <2152748.07faurLzQQ@pc>
References: <2152748.07faurLzQQ@pc>
Message-ID: <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net>

On 9/26/2019 4:21 AM, Fred Brennan via Unicode wrote:
> There is a clear demand for a SQUARE TB. In the font SMotoya Sinkai W55 W3,
> which is ©2008 株式会社 モトヤ, the glyph is unencoded and accessed via the
> Discretionary Ligatures (`dlig`) OpenType feature. It has name `T_B.dlig`.

Aye, there's the rub. Despite the subject of this thread, the problem is not the lack of a "glyph". This and many other particular squared forms may exist in Japanese fonts. The question then devolves to whether there is a *character* encoding issue here. What data representation and interchange issue is being raised here that requires an atomic character encoding, when the *presentation* issue can just be handled with OpenType features and already existing characters?

If the concern is about future-proofing the standard, then clearly, instead of indefinitely extending various groups of squared combinations for SI values, other technical values, etc., etc., the generative and scalable way forward is simply to let Japanese squared sequence coinages be handled with OpenType features, rather than insisting that each one come back to the UTC for one-by-one character encoding.

Note that there is a certain, systemic similarity here to the problem of extensibility of emoji, where encoding of multiple flags, of multiple skin tones, or of multiple gender representations, etc., is handled more generally by specifying how fonts need to map specified sequences into single glyphs, rather than by insisting that every meaningful combination end up encoded as an atomic character.

--Ken

From unicode at unicode.org Thu Sep 26 22:56:19 2019
From: unicode at unicode.org (Fred Brennan via Unicode)
Date: Fri, 27 Sep 2019 11:56:19 +0800
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net>
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net>
Message-ID: <8513026.Lnrcg1TJtE@pc>

On Friday, September 27, 2019 3:56:39 AM PST Ken Whistler wrote:
> On 9/26/2019 4:21 AM, Fred Brennan via Unicode wrote:
> > There is a clear demand for a SQUARE TB. In the font SMotoya Sinkai W55
> > W3,
> > which is ©2008 株式会社 モトヤ, the glyph is unencoded and accessed via the
> > Discretionary Ligatures (`dlig`) OpenType feature.
> > It has name `T_B.dlig`.
>
> Aye, there's the rub. Despite the subject of this thread, the problem is
> not the lack of a "glyph". This and many other particular squared forms
> may exist in Japanese fonts. The question then devolves to whether there
> is a *character* encoding issue here. What data representation and
> interchange issue is being raised here that requires an atomic character
> encoding, when the *presentation* issue can just be handled with
> OpenType features and already existing characters?

The purpose of Unicode is plaintext encoding, is it not? The square TB form is fundamentally no different than the square form of Reiwa, U+32FF ㋿, which was added in a hurry. The difference is that SQUARE TB's necessity and use is a slow thing which happened over years, not all of a sudden via one announcement of the Japanese government.

In plaintext SQUARE TB is fundamentally different than ASCII T followed by ASCII B. Plaintext tables (and programs generating them) and files already using SQUARE MB, SQUARE GB, etc benefit from SQUARE TB.

> Note that there is a certain, systemic similarity here to the problem of
> extensibility of emoji, where encoding of multiple flags, of multiple
> skin tones, or of multiple gender representations, etc., is handled more
> generally by specifying how fonts need to map specified sequences into
> single glyphs, rather than by insisting that every meaningful
> combination end up encoded as an atomic character.

New emoji are still being encoded. The existence of SQUARE GB leads to its use, which then leads to people wanting SQUARE TB and resorting to hacks to get it done. If you didn't want people to request more square forms you shouldn't have encoded any at all. It's too late for that.

There is no sequence of glyphs that could be logically mapped, unless you're telling me to request that the sequence T B be recommended for general interchange as SQUARE TB? That's silly.
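
The emoji comparison quoted in this message refers to sequences such as flags, where two existing characters are encoded and the font decides whether to draw them as one glyph. A minimal Python sketch of how such a sequence is formed; the function name flag_sequence is hypothetical, and whether the result displays as a single flag depends entirely on the font and renderer:

```python
def flag_sequence(region):
    """Map a two-letter region code to its pair of regional indicator symbols."""
    if len(region) != 2 or not region.isascii() or not region.isalpha():
        raise ValueError("expected two ASCII letters")
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(base + ord(c) - ord("A")) for c in region.upper())

jp = flag_sequence("JP")
print([f"U+{ord(c):04X}" for c in jp])  # ['U+1F1EF', 'U+1F1F5']
print(len(jp))                          # 2: two characters, no atomic flag character
```

Nothing comparable is currently specified for the letters T and B, which is the gap this message points to.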

> --Ken

From unicode at unicode.org Fri Sep 27 00:15:44 2019
From: unicode at unicode.org (Fred Brennan via Unicode)
Date: Fri, 27 Sep 2019 13:15:44 +0800
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <8513026.Lnrcg1TJtE@pc>
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>
Message-ID: <18279650.53bg6BaeYp@pc>

I'm sorry to write twice to the list but after some discussion on Twitter it is certain I'm going to write a request.

I only have two lingering questions.

* Does the existence of the legacy Adobe encoding Adobe-Japan1-6 shift the balance? It has a SQUARE TB at CID+8306.

https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5078.Adobe-Japan1-6.pdf

* Should I also propose SQUARE PB up to SQUARE YB? At the very least SQUARE PB in datacenter settings seems useful.

(I emailed Dr. Ken Lunde with these questions, but he's on vacation until the next meeting of the UTC. Does anyone know how much time I have before the agenda closes? When does the Script Ad Hoc meet next?)

From unicode at unicode.org Fri Sep 27 01:01:10 2019
From: unicode at unicode.org (James Kass via Unicode)
Date: Fri, 27 Sep 2019 06:01:10 +0000
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <18279650.53bg6BaeYp@pc>
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc> <18279650.53bg6BaeYp@pc>
Message-ID: <1f790b24-1ff7-e75a-af3f-40d2e88e4b35@gmail.com>

On 2019-09-27 5:15 AM, Fred Brennan via Unicode wrote:
> I only have two lingering questions.
>
> * Does the existence of the legacy Adobe encoding Adobe-Japan1-6 shift the
> balance? It has a SQUARE TB at CID+8306.
>
> https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5078.Adobe-Japan1-6.pdf

That character set also has other items not in Unicode such as numbers enclosed in squares from "0" and "00" through "100" and fractions like 3/7 and 10/11.
It was published in 2008, so it might not be considered as "legacy".

From unicode at unicode.org Fri Sep 27 01:42:22 2019
From: unicode at unicode.org (David Starner via Unicode)
Date: Thu, 26 Sep 2019 23:42:22 -0700
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <8513026.Lnrcg1TJtE@pc>
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>
Message-ID:

On Thu, Sep 26, 2019 at 8:57 PM Fred Brennan via Unicode wrote:
> The purpose of Unicode is plaintext encoding, is it not? The square TB form is
> fundamentally no different than the square form of Reiwa, U+32FF ㋿, which was
> added in a hurry. The difference is that SQUARE TB's necessity and use is a
> slow thing which happened over years, not all of a sudden via one announcement
> of the Japanese government.

Defining whether a pair of characters gets squeezed into one square is hardly a plaintext issue. The square form of Reiwa is a bit different, given its use in printing time, where there may have been an expectation that it takes up one square. It's also a new member of a tiny set, as opposed to SQUARE TB, which people have been using already in various ways.

> New emoji are still being encoded. The existence of SQUARE GB leads to its
> use, which then leads to people wanting SQUARE TB and resorting to hacks to
> get it done. If you didn't want people to request more square forms you
> shouldn't have encoded any at all. It's too late for that.

It's unlikely that not encoding any would have stopped the requests from coming, and it's not too late for them to dismiss those requests. Unicode, in order to become the one character set, had to become backward compatible with all the major legacy character sets out there. Unicode has piles and piles of frustrating compromises because of that, but it was felt that was the cost that had to be paid.
> There is no sequence of glyphs that could be logically mapped, unless you're
> telling me to request that the sequence T B be recommended for general
> interchange as SQUARE TB? That's silly.

Why is that silly? You've got an unbounded set of these; even the base prefixes EPTGMkhdmµnp (and da) crossed with bBmglWsAKNJCΩT (plus a bunch more), which is over 200 combinations without all the units, and there's some exponents encoded, so some of those will need to be encoded with exponents. And that's far from a complete list of what people might want as squares.

--
Kie ekzistas vivo, ekzistas espero.

From unicode at unicode.org Fri Sep 27 02:17:43 2019
From: unicode at unicode.org (Julian Bradfield via Unicode)
Date: Fri, 27 Sep 2019 08:17:43 +0100 (BST)
Subject: On the lack of a SQUARE TB glyph
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>
Message-ID:

On 2019-09-27, David Starner via Unicode wrote:
> On Thu, Sep 26, 2019 at 8:57 PM Fred Brennan via Unicode
> wrote:
[snip]
>> There is no sequence of glyphs that could be logically mapped, unless you're
>> telling me to request that the sequence T B be recommended for general
>> interchange as SQUARE TB? That's silly.
>
> Why is that silly? You've got an unbounded set of these; even the base
> prefixes EPTGMkhdmµnp (and da) crossed with bBmglWsAKNJCΩT (plus a
> bunch more), which is over 200 combinations without all the units, and
> there's some exponents encoded, so some of those will need to be
> encoded with exponents. And that's far from a complete list of what
> people might want as squares.

Wouldn't T B be a better sequence?

In fact, it would have been nice (especially for mathematicians) if all combining marks could have been applied to character sequences, by means of some "high precedence ZWJ" that binds more tightly than combination.
(Playing devil's advocate here, since I don't think maths is plain text:)

Or one could allow IDS to have leaf components that are any characters, not just ideographic characters, and then one could have all sorts of fun.

From unicode at unicode.org Fri Sep 27 02:29:43 2019
From: unicode at unicode.org (Yifán Wáng via Unicode)
Date: Fri, 27 Sep 2019 16:29:43 +0900
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <18279650.53bg6BaeYp@pc>
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc> <18279650.53bg6BaeYp@pc>
Message-ID:

> * Does the existence of the legacy Adobe encoding Adobe-Japan1-6 shift the
> balance? It has a SQUARE TB at CID+8306.

The code point suggests that it was already there as early as Adobe-Japan1-1 in 1993. That makes it seem rather odd that it hasn't been included like its neighbours. Perhaps it could be used as a rationale that this character is erroneously missing.

> * Should I also propose SQUARE PB up to SQUARE YB? At the very least SQUARE PB
> in datacenter settings seems useful.

It does not, however, imply that there is a general need for squared *B characters.

The situation is very different from that of Reiwa. U+32FF wasn't added for commemorative purposes, but was a critical requirement of some legacy systems that hardcode those squared era names as time control characters and would otherwise have hit a Y2019 problem without a new era symbol (that's why Unicode announced the code point well before the actual era name came out). From what I heard, even the Japanese government hadn't recognized the problem in the first place, but MS or somebody listened to their clients and chose to encode it.

So far I don't believe there's a real need, because I don't know of a case where KB, MB... are used outside representational purposes, but I may be wrong.

27 Sep 2019 (Fri)
14:17 Fred Brennan via Unicode:
>
> I'm sorry to write twice to the list but after some discussion on Twitter it
> is certain I'm going to write a request.
>
> I only have two lingering questions.
>
> * Does the existence of the legacy Adobe encoding Adobe-Japan1-6 shift the
> balance? It has a SQUARE TB at CID+8306.
>
> https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5078.Adobe-Japan1-6.pdf
>
> * Should I also propose SQUARE PB up to SQUARE YB? At the very least SQUARE PB
> in datacenter settings seems useful.
>
> (I emailed Dr. Ken Lunde with these questions, but he's on vacation until the
> next meeting of the UTC. Does anyone know how much time I have before the
> agenda closes? When does the Script Ad Hoc meet next?)

From unicode at unicode.org Fri Sep 27 09:29:13 2019
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Fri, 27 Sep 2019 07:29:13 -0700
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <18279650.53bg6BaeYp@pc>
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc> <18279650.53bg6BaeYp@pc>
Message-ID: <86fd2fc0-d3be-308c-6241-2b30738b7ee4@sonic.net>

Fred,

2 hours and 33 minutes from now (today).

But you don't need to try to synch a proposal like this to a particular script ad hoc meeting. That group meets roughly once a month, and any new proposal coming in right now wouldn't be on the Unicode 13.0 train, even if the UTC immediately agreed to it. So there isn't an immediately urgent deadline for new proposals.

--Ken

On 9/26/2019 10:15 PM, Fred Brennan via Unicode wrote:
> When does the Script Ad Hoc meet next?

From unicode at unicode.org Sun Sep 29 09:42:44 2019
From: unicode at unicode.org (Andre Schappo via Unicode)
Date: Sun, 29 Sep 2019 14:42:44 +0000
Subject: On the lack of a SQUARE TB glyph
In-Reply-To:
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>

Message-ID: <476F2C9E-E91E-4077-9373-6C8D34A7B005@lboro.ac.uk>

> Or one could allow IDS to have leaf components that are any
> characters, not just ideographic characters, and then one could have
> all sorts of fun.

I do like that idea

André Schappo

From unicode at unicode.org Sun Sep 29 11:51:07 2019
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Sun, 29 Sep 2019 10:51:07 -0600
Subject: On the lack of a SQUARE TB glyph
Message-ID: <000001d576e6$184b6e90$48e24bb0$@ewellic.org>

Fred Brennan wrote:

> The purpose of Unicode is plaintext encoding, is it not? The square TB
> form is fundamentally no different than the square form of Reiwa,
> U+32FF ㋿, which was added in a hurry. The difference is that SQUARE
> TB's necessity and use is a slow thing which happened over years, not
> all of a sudden via one announcement of the Japanese government.

I think the case you are going to have to make is that applications exist which *must* use the single-code-point character for this purpose, instead of simply being able to use U+0054 plus U+0042.

As others have stated, it was easily demonstrated that applications existed in Japan which required a single code point for the era name. That is what necessitated the acceptance, let alone fast-tracking, of U+32FF SQUARE ERA NAME REIWA.

The characters in the CJK Compatibility block were added for exactly that reason: compatibility with character encoding standards that existed prior to Unicode. There has never been any expectation that sets or sequences in that and other "compatibility" blocks would be updated continually. Compatibility with character sets created since the wide adoption of Unicode, such as in 2008, is also not guaranteed.

Earlier I wrote "[t]his seems like a reasonable candidate for a proposal," not necessarily because UTC will agree with the stated use case, but because talking about such a character on the mailing list won't get it added.
> In plaintext SQUARE TB is fundamentally different than ASCII T followed by
> ASCII B. Plaintext tables (and programs generating them) and files already
> using SQUARE MB, SQUARE GB, etc benefit from SQUARE TB.

That is something you would have to demonstrate in your proposal: that there are important processes (as in, "the government and industry and commerce depend on this") that use ㎅ ㎆ ㎇ which it would not be feasible to extend or modify to use Basic Latin TB, PB, EB, etc.

That was the case made for the Reiwa sign: that there were important processes using ㍾ ㍽ ㍼ ㍻ that could not simply use the two existing characters 令和 for Reiwa.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From unicode at unicode.org Sun Sep 29 14:24:57 2019
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Sun, 29 Sep 2019 12:24:57 -0700
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <476F2C9E-E91E-4077-9373-6C8D34A7B005@lboro.ac.uk>
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>

<476F2C9E-E91E-4077-9373-6C8D34A7B005@lboro.ac.uk>
Message-ID: <05294d22-1aff-bbbe-73a6-41e8f9a58b83@ix.netcom.com>

An HTML attachment was scrubbed...
URL:

From unicode at unicode.org Mon Sep 30 03:01:15 2019
From: unicode at unicode.org (Andre Schappo via Unicode)
Date: Mon, 30 Sep 2019 08:01:15 +0000
Subject: On the lack of a SQUARE TB glyph
In-Reply-To:
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>

Message-ID: <00F5A7DB-D409-4F71-85C8-AD14710F4698@lboro.ac.uk>

> On Sep 27, 1 Reiwa, at 08:17, Julian Bradfield via Unicode wrote:
>
> Or one could allow IDS to have leaf components that are any
> characters, not just ideographic characters, and then one could have
> all sorts of fun.

I do like this idea.

Note: This is a modified repost as I previously forgot to credit Julian as the originator

André Schappo

From unicode at unicode.org Mon Sep 30 03:32:02 2019
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Mon, 30 Sep 2019 01:32:02 -0700
Subject: On the lack of a SQUARE TB glyph
In-Reply-To: <00F5A7DB-D409-4F71-85C8-AD14710F4698@lboro.ac.uk>
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>

<00F5A7DB-D409-4F71-85C8-AD14710F4698@lboro.ac.uk>
Message-ID:

An HTML attachment was scrubbed...
URL:

From unicode at unicode.org Mon Sep 30 04:24:31 2019
From: unicode at unicode.org (Marius Spix via Unicode)
Date: Mon, 30 Sep 2019 11:24:31 +0200
Subject: Aw: Re: On the lack of a SQUARE TB glyph
In-Reply-To:
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>

<00F5A7DB-D409-4F71-85C8-AD14710F4698@lboro.ac.uk>
Message-ID:

An HTML attachment was scrubbed...
URL:

From unicode at unicode.org Mon Sep 30 07:09:03 2019
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Mon, 30 Sep 2019 13:09:03 +0100
Subject: On the lack of a SQUARE TB glyph
In-Reply-To:
References: <2152748.07faurLzQQ@pc> <86008240-925e-5e8c-2170-e9640c6dd3ad@sonic.net> <8513026.Lnrcg1TJtE@pc>

<00F5A7DB-D409-4F71-85C8-AD14710F4698@lboro.ac.uk>
Message-ID: <20190930130903.38da1d5d@JRWUBU2>

On Mon, 30 Sep 2019 01:32:02 -0700 Asmus Freytag via Unicode wrote:

> On 9/30/2019 1:01 AM, Andre Schappo via Unicode wrote:
>
> On Sep 27, 1 Reiwa, at 08:17, Julian Bradfield via Unicode
> wrote:
>
> Or one could allow IDS to have leaf components that are any
> characters, not just ideographic characters, and then one could have
> all sorts of fun.
>
> I do like this idea.
>
> Note: This is a modified repost as I previously forgot to credit
> Julian as the originator

Egyptian hieroglyphs have lay-out operators that are meant to control actual lay-out, whereas IDS operators are only meant as descriptions, and a compliant implementation need not perform the lay-out described.

Richard.

From unicode at unicode.org Mon Sep 30 10:52:20 2019
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Mon, 30 Sep 2019 08:52:20 -0700
Subject: On the lack of a SQUARE TB glyph
Message-ID: <20190930085220.665a7a7059d7ee80bb4d670165c8327d.06caa072c1.wbe@email03.godaddy.com>

I wrote:

> As others have stated, it was easily demonstrated that applications
> existed in Japan which required a single code point for the era name.
> That is what necessitated the acceptance, let alone fast-tracking, of
> U+32FF SQUARE ERA NAME REIWA.

Well, this is what I've heard, anyway. Just out of curiosity, does anyone have actual examples of such applications? This might help demonstrate why the Reiwa sign doesn't set a precedent for TB et al.

--
Doug Ewell | Thornton, CO, US | ewellic.org
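
As a side note on the "U+0054 plus U+0042" spelling discussed in this thread: the squared byte units that already exist carry compatibility decompositions to plain Basic Latin letters, so compatibility normalization turns them into the two-letter spellings. The following is a minimal Python sketch, assuming only the standard unicodedata module; the SQUARED table and the plain_spelling helper are illustrative names, not anything proposed on the list.

```python
import unicodedata

# Squared byte units that already exist in the CJK Compatibility block.
SQUARED = {
    "SQUARE KB": "\u3385",
    "SQUARE MB": "\u3386",
    "SQUARE GB": "\u3387",
}


def plain_spelling(text: str) -> str:
    """Return the NFKC form, which applies compatibility decompositions."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for name, ch in SQUARED.items():
        # decomposition() returns the <square> mapping recorded in the
        # Unicode Character Database for this code point.
        print(name, f"U+{ord(ch):04X}",
              unicodedata.decomposition(ch), plain_spelling(ch))
```

There is no SQUARE TB code point to feed in, which is the gap the thread is about; a plain "TB" is already in NFKC form and passes through unchanged.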