From junichi.chiba.bps at gmail.com Sat Oct 1 02:04:11 2016
From: junichi.chiba.bps at gmail.com (Junichi Chiba)
Date: Sat, 01 Oct 2016 07:04:11 +0000
Subject: Dates in Japanese Era Names in Unicode Standard
In-Reply-To: <59642171-c152-0863-8165-ac48ace1d9a1@it.aoyama.ac.jp>
References: <6c865cc7-8227-d72a-7794-e9fe9f3bc583@it.aoyama.ac.jp> <59642171-c152-0863-8165-ac48ace1d9a1@it.aoyama.ac.jp>
Message-ID:

> Your analysis sounds very plausible. I suggest you send an official error report using http://www.unicode.org/reporting.html.

Thank you, Martin! I sent a suggestion there together with a link to the discussion here.

On Fri, 30 Sep 2016 at 14:43 Martin J. Dürst wrote:

> Hello Junichi,
>
> Your analysis sounds very plausible. I suggest you send an official error report using http://www.unicode.org/reporting.html.
>
> Regards, Martin.
>
> On 2016/09/30 13:16, Junichi Chiba wrote:
> >> Is it possible that these eras start at midday instead of midnight?
> >> This could explain the date difference, if you do not set the time in your query
> >> (your query will assume a default time at 00:00 midnight)
> >
> > The new era starts at 00:00 midnight local time.
> > Together with the time zone difference, I assume that the cause was a simple chain of mistakes while drafting the Unicode document.
> >
> > My story:
> >
> > First, the author of Table 22-8 asks somebody to send a list of the dates.
> > For the table to work, "day" accuracy should be enough, rather than time.
> > The "day" value is thus recorded in YYYYMMDD format.
> > It is then listed in a file format, like a spreadsheet, that keeps the day value in "time" accuracy with a time zone marker.
> > As there is no intention to keep it in "time" accuracy, let's suppose that a default marker such as UTC+0 is embedded automatically.
> >
> > The spreadsheet is then sent to the author and opened in a more "Western" time zone than the one it was recorded in.
> > Upon opening the file, the dates were converted to the local time zone.
> > Specifying a more "Western" time zone results in smaller date values.
> > Thus the smaller values are picked up by the author for Table 22-8.
> >
> > In fact, all of the day values in Table 22-8 are shifted one day earlier.
> >
> > Current values:
> > U+337B square era name heisei 1989-01-07 to present day
> > U+337C square era name syouwa 1926-12-24 to 1989-01-06
> > U+337D square era name taisyou 1912-07-29 to 1926-12-23
> > U+337E square era name meizi 1867 to 1912-07-28
> >
> > Suggested correction:
> > U+337B square era name heisei 1989-01-08 to present day
> > U+337C square era name syouwa 1926-12-25 to 1989-01-07
> > U+337D square era name taisyou 1912-07-30 to 1926-12-24
> > U+337E square era name meizi 1868 to 1912-07-29
> >
> > Here are some citations.
> >
> > I will cite from the most reliable source, the law database provided by the government (in Japanese).
> > This is the actual law about when Heisei shall start:
> > http://law.e-gov.go.jp/cgi-bin/idxselect.cgi?IDX_OPT=1&H_NAME=%8C%B3%8D%86%82%F0%89%FC%82%DF%82%E9%90%AD%97%DF&H_NAME_YOMI=%82%A0&H_NO_GENGO=H&H_NO_YEAR=&H_NO_TYPE=2&H_NO_NO=&H_FILE_NAME=S64SE001&H_RYAKU=1&H_CTG=1&H_YOMI_GUN=1&H_CTG_GUN=1
> >
> >> 昭和六十四年一月七日政令第一号
> >> ...
> >> 元号を平成に改める。
> >> 附則
> >> この政令は、公布の日の翌日から施行する。
> >
> > Translation:
> >> Showa 64 January 7 Ordinance 1
> >> ...
> >> The era name shall be Heisei.
> >> Appendix
> >> This ordinance shall be effective from the day after promulgation.
> >
> > The promulgation date was January 7.
> > As Martin mentioned, Heisei started on the day after the announcement.
> > Thus Showa lasted until the very end of January 7 (midnight), and Heisei started at the very beginning of January 8.
>
> >> On the other hand, I saw places that said Showa 64 as late as July (that was when I climbed Mt. Fuji; a placard put up the year before said "closed until July Showa 64").
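The suspected spreadsheet conversion described above is easy to reproduce. A minimal sketch in Python (the actual spreadsheet software and the author's time zone are unknown; UTC-8, i.e. US Pacific Standard Time, is just one example of a more "Western" zone):

```python
from datetime import datetime, timezone, timedelta

# The day value 1989-01-08 (start of Heisei), stored with a default
# UTC+0 marker at midnight, as hypothesized in the story above.
heisei_start = datetime(1989, 1, 8, 0, 0, tzinfo=timezone.utc)

# Rendered in a more "Western" time zone, e.g. UTC-8 (US Pacific).
pacific = timezone(timedelta(hours=-8))
rendered = heisei_start.astimezone(pacific)

print(heisei_start.date())  # 1989-01-08
print(rendered.date())      # 1989-01-07 -- one day earlier, as in Table 22-8
```

The same shift applies to every date in the list, which matches the uniform one-day-earlier pattern in Table 22-8.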
> > I remember the same thing when I was a child.
> > For about half a year, many things such as application forms and street signs were still displayed in Showa. I saw passports and licenses showing expiration dates as Showa 70 or 80. Coins are minted and stocked before release, so Showa 64 coins are in circulation.
> >
> > People often carry a conversion table like:
> > 1986 : Showa 61
> > 1987 : Showa 62
> > 1988 : Showa 63
> > 1989 : Showa 64 : Heisei 1
> > 1990 : Showa 65 : Heisei 2
> > 1991 : Showa 66 : Heisei 3
> >
> > I also cite the start of Showa. This is a citation from Wikisource, another reliable source for public documents.
> > https://ja.wikisource.org/wiki/%E6%98%AD%E5%92%8C%E3%83%88%E6%94%B9%E5%85%83
> >> ??????????????????????????????????????????????????????????
> >> 御名御璽
> >> 大正十五年十二月二十五日
> >
> > Translation:
> >> In the name of the Emperor, who is given inherited sovereignty to administer state affairs, We let Taisho 15 December 25 and onward be the beginning of Showa.
> >> Signed by the Emperor
> >> Taisho 15 December 25
> >
> > As Martin mentioned, eras before Heisei were renewed in such a way that the announcement overwrote the old day.
> >
> > Here is the start of Taisho:
> > https://ja.wikisource.org/wiki/%E6%98%8E%E6%B2%BB%E5%9B%9B%E5%8D%81%E4%BA%94%E5%B9%B4%E4%B8%83%E6%9C%88%E4%B8%89%E5%8D%81%E6%97%A5%E4%BB%A5%E5%BE%8C%E3%83%B2%E6%94%B9%E3%83%A1%E3%83%86%E5%A4%A7%E6%AD%A3%E5%85%83%E5%B9%B4%E3%83%88%E7%88%B2%E3%82%B9
> >> ????????????????????????????
> >> ??????????????????????????????????????
> >> 御名御璽
> >> 明治四十五年七月三十日
> >
> > Translation:
> >> In the name of the Emperor, under the inherited spirit of sovereignty to administer state affairs with virtue, We let, regarding the ordinance enacted by the previous Emperor, Meiji 45 July 30 and onward be the beginning of Taisho.
> >> Signed by the Emperor
> >> Meiji 45 July 30
> >
> > With this law, Meiji 45 July 30 is overwritten as Taisho 1 July 30.
> >
> > Lastly, here is the start of Meiji.
> > https://ja.wikisource.org/wiki/%E4%BB%8A%E5%BE%8C%E5%B9%B4%E8%99%9F%E3%83%8F%E5%BE%A1%E4%B8%80%E4%BB%A3%E4%B8%80%E8%99%9F%E3%83%8B%E5%AE%9A%E3%83%A1%E6%85%B6%E6%87%89%E5%9B%9B%E5%B9%B4%E3%83%B2%E6%94%B9%E3%83%86%E6%98%8E%E6%B2%BB%E5%85%83%E5%B9%B4%E3%83%88%E7%88%B2%E3%82%B9%E5%8F%8A%E8%A9%94%E6%9B%B8
> >> 詔書
> >> ...?????????????????????????????
> >> 明治元年九月八日
> >
> > Translation:
> >> Imperial Edict
> >> ... Keio 4 shall be renamed Meiji 1, and from now on the tradition of frequent renaming of eras shall be limited to one era per Emperor.
> >
> > Since Meiji, eras have been renewed less frequently. It is more engineer-friendly!
> >
> > In Table 22-8, the Meiji start day is omitted.
> > The omission itself is reasonable. It avoids controversy in writing the day along the lunar calendar, which was used until midnight at the end of Meiji 5 December 2. (The next day is Meiji 6 January 1.)
> >
> > The problem here is the year shown as 1867.
> > The ordinance was released on Meiji 1 September 8 (lunar), which was October 23, 1868 (Gregorian).
> > Meiji 1 January 1 (lunar) (and Keio 4 January 1 lunar) is January 25, 1868 (Gregorian).
> > My best guess is that the author of Table 22-8 picked up the year value from a spreadsheet showing "1867-12-31" in local time, which was originally intended to show merely "1868-01".
>
> > On Thu, 29 Sep 2016 at 19:46 Martin J. Dürst wrote:
> >
>> Just a few not very closely related comments:
>>
>> On 2016/09/29 19:06, Philippe Verdy wrote:
>>> Is it possible that these eras start at midday instead of midnight? This could explain the date difference, if you do not set the time in your query (your query will assume a default time at 00:00 midnight)
>>
>> It's extremely difficult to imagine this for Japan in this day and age.
>>
>> I was in Japan when the era changed from Showa to Heisei. I remember the announcement very well, but I don't remember anything about the exact time of the cutover.
>>
>>> Many people still count the second half of the night after midnight as part of the previous day (and so will say "Saturday evening"/"Saturday night" even if it's already the first hours of Sunday).
>>
>> In Japan, that happens e.g. in displays of restaurants and bars, which may announce their opening hours as 17:30-27:00 (i.e. open until three in the morning the next day). But that's only a convention for convenience; everybody knows that it's already the next day on the calendar.
>>
>>> If you test dates and don't want to specify hours, it is highly recommended to set the default time to midday. For the Japanese eras, it's not clear at which time they really start, except for the last two eras since WW2, but setting the time to midday should give the correct result. However, there's no ambiguity during the day of the era switch, if the era is correctly specified (and not just the year number in the era).
>>
>> Yes indeed. These days, people just refer to 1989 (and any dates in it) as Heisei 1 (平成元年). This is all the easier because otherwise, an exception would be necessary for only 7 days.
>>
>> On the other hand, I saw places that said Showa 64 as late as July (that was when I climbed Mt. Fuji; a placard put up the year before said "closed until July Showa 64"). I also got some money in February or so that year and had to sign a receipt that said Showa 64 because it was printed earlier.
>>
>> The Japanese Wikipedia article, at the bottom of the 改元 (https://ja.wikipedia.org/wiki/平成#.E6.94.B9.E5.85.83) section, says that in contrast to the two earlier changes of era, the change started on the next day, in order to give engineers time for the change. That next day was a Sunday, which meant that in effect they had even more time, because most systems had to work with the new era only from Monday.
>> But I guess it must have been a busy weekend for those involved, anyway.
>>
>> To know all the details, the best thing to do would be to check the official government documents, which should be available online. But I wouldn't be surprised if they didn't specify things to the second.
>>
>> Regards, Martin.
>>
>>> 2016-09-29 5:13 GMT+02:00 Junichi Chiba :
>>>
>>>> Dear all,
>>>>
>>>> Nice to e-meet you.
>>>>
>>>> I'm looking at the latest Unicode Standard [1] listing the dates for Japanese Era Names in Table 22-8.
>>>> What I noticed is the begin and end dates for each era.
>>>> They seem to have a one-day difference from the dates that are publicly recognized in Japan.
>>>> For example, the current Heisei actually started on January 8th, 1989, after Showa ended on January 7th, 1989.
>>>>
>>>> However, the Unicode Standard says in Table 22-8:
>>>> U+337B square era name heisei 1989-01-07 to present day
>>>> U+337C square era name syouwa 1926-12-24 to 1989-01-06
>>>>
>>>> Looking at Wikipedia in Japanese [2] and English [3], you can see the exact dates for the end of Syouwa and the start of Heisei.
>>>> Could there be a certain intention behind the difference between this description and the official dates?
>>>> Is the date counted according to GMT, instead of the local date/time, for some reason?
>>>>
>>>> REFERENCE
>>>>
>>>> [1] http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf
>>>>
>>>> [2] https://ja.wikipedia.org/wiki/%E5%B9%B3%E6%88%90
>>>>> 1989????64??1?7????????????????????????????????????????????1989????64??1?7????????????????????????1?8???????????
>>>>
>>>> [3] https://en.wikipedia.org/wiki/Heisei_period
>>>>> Thus, 1989 corresponds to Shōwa 64 until 7 January and Heisei 1 ... since 8 January.
>>>>> On 7 January 1989, at 07:55 JST, the Grand Steward of Japan's Imperial Household Agency, Shōichi Fujimori, announced Emperor Hirohito's death, ...
>>>>> The Heisei era went into effect immediately upon the day after Emperor Akihito's succession to the throne on 7 January 1989.
>>>
>>
>> --
>> Martin J. Dürst
>> Department of Intelligent Information Technology
>> College of Science and Engineering
>> Aoyama Gakuin University
>> Fuchinobe 5-1-10, Chuo-ku, Sagamihara
>> 252-5258 Japan

> --
> Martin J. Dürst
> Department of Intelligent Information Technology
> College of Science and Engineering
> Aoyama Gakuin University
> Fuchinobe 5-1-10, Chuo-ku, Sagamihara
> 252-5258 Japan

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From a.lukyanov at yspu.org Sat Oct 1 03:12:15 2016
From: a.lukyanov at yspu.org (a.lukyanov)
Date: Sat, 01 Oct 2016 11:12:15 +0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <328312cd-094c-5f9b-62fd-7803e51173f8@ix.netcom.com>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <563c28fc-7772-59f6-01ae-ab99bcf64a39@cs.tut.fi> <99AC47C7-6BAC-4D76-A669-2D7743B00B69@evertype.com> <328312cd-094c-5f9b-62fd-7803e51173f8@ix.netcom.com>
Message-ID: <57EF6FDF.4070304@yspu.org>

I think that the right thing to do would be to create several new control/formatting characters, like this:

"previous character is superscript"
"previous character is subscript"
"previous character is small caps (for use in phonetic transcription only)"
"previous character is mathematical blackletter"
etc.

Then people will be able to apply these features to any character as long as their font supports it.

From khaledhosny at eglug.org Sat Oct 1 03:29:33 2016
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Sat, 1 Oct 2016 10:29:33 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi>
Message-ID: <20161001082933.GA2819@macbook>

On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
> 30.9.2016, 19:11, Leonardo Boiko wrote:
>
> > The Unicode codepoints are not intended as a place to store
> > typographically variant glyphs (much like the Unicode "italic"
> > characters aren't designed as a way of encoding italic faces).
>
> There is no disagreement on this. What I was pointing at was that when using rich text or markup, it is complicated or impossible to have typographically correct glyphs used (even when they exist), whereas the use of Unicode codepoints for subscript or superscript characters may do that in a much simpler way.

That is not generally true. In TeX you get true superscript glyphs by default. On the web you can use font features in CSS to get them as well, provided that you are using a font that supports them.

Regards,
Khaled

From jkorpela at cs.tut.fi Sat Oct 1 07:00:50 2016
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Sat, 1 Oct 2016 15:00:50 +0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161001082933.GA2819@macbook>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook>
Message-ID:

1.10.2016, 11:29, Khaled Hosny wrote:
> On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
[...]
>> What I was pointing at was that when using rich text or markup, it is complicated or impossible to have typographically correct glyphs used (even when they exist), whereas the use of Unicode codepoints for subscript or superscript characters may do that in a much simpler way.
> That is not generally true.

It is generally true, but not without exceptions.

> In TeX you get true superscript glyphs by default.

I suppose you're right, though I don't know exactly how TeX implements superscripts. I suspect the fonts that TeX normally uses do not contain (many) superscript or subscript glyph variants, but TeX might actually map e.g. ^2 in math mode to a superscript glyph for 2 (identical to the glyph for ²).

> On the web you can use font features in CSS to get them as well, provided that you are using a font that supports them.

This is a good example of my general statement. If you use the simple way in CSS, you use vertical-align set to sub or super together with a font-size setting. This is simple and "works", but it does not use subscript or superscript glyphs; it algorithmically operates on normal glyphs (and produces different results in different browsers etc.). The newer way, setting font features, is 1) much less widely known, 2) much less supported in browsers, and 3) requires extra settings to deal with browser-specific names of the relevant properties.

Yucca

From glorieul at coanda-deviation.info Sat Oct 1 08:48:59 2016
From: glorieul at coanda-deviation.info (lorieul)
Date: Sat, 01 Oct 2016 15:48:59 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To:
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook>
Message-ID: <1475329739.1352.0.camel@coanda-deviation.info>

Re,

On Fri, 2016-09-30 at 11:57 +0200, Gael Lorieul wrote:
> I wonder why only a subset of the alphabet is available as subscript and/or superscript ?

On Fri, 2016-09-30 at 17:08 +0200, "Jörg Knappen" wrote:
> They were found in older character sets and Unicode provides so-called "round-trip compatibility" to those older character sets.

Okay, I understand the context better now…
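(For illustration, the two CSS approaches Jukka contrasts above can be sketched like this. This is only a sketch: `font-variant-position` and the OpenType `sups` feature are standard, but browser and font support for them varied in 2016, which is exactly his point.)

```css
/* The simple way: synthesized superscript (scaled, shifted normal glyph) */
.sup-synthesized {
  vertical-align: super;
  font-size: 75%;
}

/* The newer way: request the font's true superscript glyphs */
.sup-font-native {
  font-variant-position: super;
  /* lower-level request for the same OpenType feature: */
  font-feature-settings: "sups" 1;
}
```

The first rule always produces something, but never a designed superscript glyph; the second only works when the font actually contains `sups` glyphs.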
On Fri, 2016-09-30 at 17:19 +0200, Philippe Verdy wrote:
> Your problem here is that "start" and "end" are not symbols/variables but actual English words. Why would this usage be restricted only to English? The same formula would need to be really translated into various languages and scripts, needing then a mapping of all letters in Latin, Greek, Cyrillic, but even also Arabic, Japanese, Chinese, Hindi...

On Fri, 2016-09-30 at 13:11 -0300, Leonardo Boiko wrote:
> The Unicode codepoints are not intended as a place to store typographically variant glyphs (much like the Unicode "italic" characters aren't designed as a way of encoding italic faces).

I understand your point…

On Fri, 2016-09-30 at 17:08 +0200, "Jörg Knappen" wrote:
> Sub- and superscripts are considered "higher level markup" and not part of plain text in Unicode. You can easily get at them using LaTeX notation or HTML tags for sub- or superscripts.

The drawback of that solution is the lack of readability in the sources. I would like to have formatting in the spirit of Markdown, i.e. formatting that is easy to read both in the sources and after HTML- or PDF- or whatever-generation. Indeed, LaTeX formulas are often not easy to decipher… Since one spends more time reading source code than documentation, it is important that the comments within the source files are also easily readable. This way, there is no need to constantly switch back and forth between the text editor and the documentation: the source code suffices by itself.

On Sat, 2016-10-01 at 11:12 +0300, a.lukyanov wrote:
> I think that the right thing to do would be to create several new control/formatting characters, like this:
>
> "previous character is superscript"
> "previous character is subscript"
> "previous character is small caps (for use in phonetic transcription only)"
> "previous character is mathematical blackletter"
> etc.
>
> Then people will be able to apply these features to any character as long as their font supports it.
That would be a nice alternative indeed.

Regards,
Gaël

From verdy_p at wanadoo.fr Sat Oct 1 09:00:35 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 1 Oct 2016 16:00:35 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To:
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook>
Message-ID:

I disagree. Fonts normally contain metrics for proper positioning of the superscript and subscript baselines and relative heights. They "may" provide additional features to override the glyphs or relative positioning if this is needed for coherence with the pre-encoded superscripts/subscripts that are mapped in the font, or to adjust the visual weights of strokes and adjust some angles, or for correct hinting on low-resolution displays.

These specific features do not need to be enabled explicitly in CSS; they should be enabled by default. Problems only occur with defective fonts that have incomplete data, and for which browsers (in fact their internal text renderers) attempt to define some reasonable defaults. This may for some time produce some incoherent styles, but this is temporary. Slowly but surely, these defects are being corrected. As Unicode encodes things for the long term, there's no need to define temporary workarounds by encoding new variants.

The existing superscripts/subscripts have been encoded for another purpose: to preserve the separate semantics of letter modifiers in plain text or in IPA as **distinct** symbols. Any other use is still possible by people hacking these characters as if they were a general way of writing superscripts/subscripts, but these are just hacks that break the identity of the represented text.
They have also been encoded for round-trip compatibility with older standards where it is impossible to determine what the intended semantics were, but also because these old characters were used on low-resolution or monospaced displays (where the more exact font metrics needed for math formulas could not be respected at all).

Even in TeX, or math formulas in general, all symbols used in superscript/subscript preserve their own identity: this is just a question of layout, where the applied style adds (but does not replace or remove) more semantics.

In summary, we should use the normal characters, including in TeX/maths. Then the layout engine will do its best with the fonts it has: it will honor their suggested metrics (if they are defined), attempt to alias some missing character mappings in fonts, or synthesize these styles using the best metrics available in the font, or computed with reasonable defaults for the scripts. And for all this you do not need more than a "sub" or "sup" element in HTML, and in TeX/MathML you just use the standard "^" or "_" layout operators.

Only at this point, if authors see that the current implementations are still not what they expected, will they attempt to hack the presentation a bit by adding some specific styles (but only as a temporary workaround, which will no longer be needed in the long term and which could cause incoherences later with updated fonts or updated text engines that would produce better and more coherent results).

2016-10-01 14:00 GMT+02:00 Jukka K. Korpela :

> 1.10.2016, 11:29, Khaled Hosny wrote:
>
>> On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
> [...]
>>> What I was pointing at was that when using rich text or markup, it is complicated or impossible to have typographically correct glyphs used (even when they exist), whereas the use of Unicode codepoints for subscript or superscript characters may do that in a much simpler way.
>>
>> That is not generally true.
>
> It is generally true, but not without exceptions.
>
>> In TeX you get true superscript glyphs by default.
>
> I suppose you're right, though I don't know exactly how TeX implements superscripts. I suspect the fonts that TeX normally uses do not contain (many) superscript or subscript glyph variants, but TeX might actually map e.g. ^2 in math mode to a superscript glyph for 2 (identical to the glyph for ²).
>
>> On the web you can use font features in CSS to get them as well, provided that you are using a font that supports them.
>
> This is a good example of my general statement. If you use the simple way in CSS, you use vertical-align set to sub or super together with a font-size setting. This is simple and "works", but it does not use subscript or superscript glyphs; it algorithmically operates on normal glyphs (and produces different results in different browsers etc.). The newer way, setting font features, is 1) much less widely known, 2) much less supported in browsers, and 3) requires extra settings to deal with browser-specific names of the relevant properties.
>
> Yucca

From haberg-1 at telia.com Sat Oct 1 09:21:47 2016
From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=)
Date: Sat, 1 Oct 2016 16:21:47 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <1475329739.1352.0.camel@coanda-deviation.info>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook> <1475329739.1352.0.camel@coanda-deviation.info>
Message-ID: <8C55D747-01EB-4AE3-8F14-E07C29F1A97E@telia.com>

> On 1 Oct 2016, at 15:48, lorieul wrote:
> Indeed, LaTeX formulas are often not easy to decipher…
One can improve readability by using more Unicode characters [1] and the unicode-math package [2], or by switching to ConTeXt, which has built-in support.

1. http://milde.users.sourceforge.net/LUCR/Math/unimathsymbols.xhtml
2. https://www.ctan.org/pkg/unicode-math

From verdy_p at wanadoo.fr Sat Oct 1 09:24:10 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 1 Oct 2016 16:24:10 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <1475329739.1352.0.camel@coanda-deviation.info>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook> <1475329739.1352.0.camel@coanda-deviation.info>
Message-ID:

2016-10-01 15:48 GMT+02:00 lorieul :

> The drawback of that solution is the lack of readability in the sources. I would like to have formatting in the spirit of Markdown, i.e. formatting that is easy to read both in the sources and after HTML- or PDF- or whatever-generation. Indeed, LaTeX formulas are often not easy to decipher… Since one spends more time reading source code than documentation, it is important that the comments within the source files are also easily readable. This way, there is no need to constantly switch back and forth between the text editor and the documentation: the source code suffices by itself.

The LaTeX markup for superscripts/subscripts is very simple ("^" and "_"), even if you need extra parentheses to surround subformulas. But in this context of math formulas, the coder should understand those math formulas in order to implement or use them correctly. A mere comment block in a source file is not the best place to explain everything. In most cases you'll use references to other documents and will use a precise terminology that is even easier to read than formulas.
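To make that markup concrete, here is what it looks like in standard LaTeX math mode (nothing beyond stock LaTeX is assumed):

```latex
% Single-character scripts need no braces; subformulas do:
$x^2 + a_1$
% Braces distinguish a nested superscript from a two-digit one:
$x^{2^2} \neq x^{22}$
% Sub- and superscripts combine freely, e.g. on an integral sign:
$\int_{t_0}^{t_1} f(t)\,dt$
```

The brace grouping is precisely what keeps multi-level scripts unambiguous in source form.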
Project management tools help collect all the pieces needed for communication between the programmers and users of modules, but source code does not replace more formal documentation.

Note also that math superscripts/subscripts need to support multiple levels of superscripts/subscripts with variable sizes. This is not possible with the Unicode-encoded characters, which are designed only for a single level, but it is not a problem for TeX, MathML or HTML. The apparent simplicity of using pre-encoded character "variants" becomes a nightmare later when parsing formulas (what does "x²²" mean: is it "(x^2)^2", i.e. "x^4", or "x^(22)"?) or when generating derived formulas. Such a problem, however, does not exist for their use only in linear plain text (for example as IPA symbols).

From guoyunhebrave at gmail.com Sat Oct 1 08:50:28 2016
From: guoyunhebrave at gmail.com (Guo Yunhe)
Date: Sat, 1 Oct 2016 16:50:28 +0300
Subject: Minimum set of Emoji characters
Message-ID:

Hi, the fontconfig project is looking for a definition of all the basic emoji characters that an emoji font must have. Is it available from the Unicode standards?

--
Guo Yunhe

From khaledhosny at eglug.org Sat Oct 1 10:37:34 2016
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Sat, 1 Oct 2016 17:37:34 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To:
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook>
Message-ID: <20161001153734.GB2923@macbook>

On Sat, Oct 01, 2016 at 03:00:50PM +0300, Jukka K. Korpela wrote:
> 1.10.2016, 11:29, Khaled Hosny wrote:
>
>> On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
> [...]
>>> What I was pointing at was that when using rich text or markup, it is complicated or impossible to have typographically correct glyphs used (even when they exist), whereas the use of Unicode codepoints for subscript or superscript characters may do that in a much simpler way.
>>
>> That is not generally true.
>
> It is generally true, but not without exceptions.
>
>> In TeX you get true superscript glyphs by default.
>
> I suppose you're right, though I don't know exactly how TeX implements superscripts. I suspect the fonts that TeX normally uses do not contain (many) superscript or subscript glyph variants, but TeX might actually map e.g. ^2 in math mode to a superscript glyph for 2 (identical to the glyph for ²).

TeX has fonts designed for use at 8pt (the size of 1st-level scripts) and 5pt (the size of 2nd-level scripts), with all the optical corrections for them to look right when scaled down. They provide all the glyphs provided by the fonts for larger sizes, so any character can be used in super- or subscripts; no special mapping is needed.

Regards,
Khaled

From mpsuzuki at hiroshima-u.ac.jp Sat Oct 1 11:19:17 2016
From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya)
Date: Sun, 02 Oct 2016 01:19:17 +0900
Subject: [Unicode] Minimum set of Emoji characters
In-Reply-To:
References:
Message-ID: <57EFE205.6080605@hiroshima-u.ac.jp>

Dear Guo,

Have you checked the thread from my post?
http://www.unicode.org/mail-arch/unicode-ml/y2016-m09/0026.html

Regards,
mpsuzuki

Guo Yunhe wrote:
> Hi, the fontconfig project is looking for a definition of all the basic emoji characters that an emoji font must have. Is it available from the Unicode standards?
From mark at macchiato.com Sun Oct 2 09:32:47 2016
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Sun, 2 Oct 2016 16:32:47 +0200
Subject: [Unicode] Minimum set of Emoji characters
In-Reply-To: <57EFE205.6080605@hiroshima-u.ac.jp>
References: <57EFE205.6080605@hiroshima-u.ac.jp>
Message-ID:

At this point, the original set of Japanese emoji has long since been surpassed. The recommendation is to support the set of emoji in the data files referenced by http://www.unicode.org/reports/tr51/. There is much more information about the various choices there.

Note that there is a proposed new version that will be discussed in early November, at http://www.unicode.org/reports/tr51/proposed.html, with additional emoji focused around gender support.

Mark

On Sat, Oct 1, 2016 at 6:19 PM, suzuki toshiya wrote:

> Dear Guo,
>
> Have you checked the thread from my post?
> http://www.unicode.org/mail-arch/unicode-ml/y2016-m09/0026.html
>
> Regards,
> mpsuzuki
>
> Guo Yunhe wrote:
> > Hi, the fontconfig project is looking for a definition of all the basic emoji characters that an emoji font must have. Is it available from the Unicode standards?

From doug at ewellic.org Mon Oct 3 12:14:48 2016
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 03 Oct 2016 10:14:48 -0700
Subject: Why incomplete subscript/superscript alphabet ?
Message-ID: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>

a.lukyanov wrote:

> I think that the right thing to do would be to create several new control/formatting characters, like this:
>
> "previous character is superscript"
> "previous character is subscript"
> "previous character is small caps (for use in phonetic transcription only)"
> "previous character is mathematical blackletter"
> etc.
>
> Then people will be able to apply these features to any character as long as their font supports it.
I happen to think this would be exactly the wrong thing to do, completely
contrary to the principles of plain text that Unicode was founded upon.
But you never know what might gain traction, so stay tuned.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From leoboiko at namakajiri.net Mon Oct 3 12:40:23 2016
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 3 Oct 2016 14:40:23 -0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID:

Besides, there are already control/formatting characters for such purposes;
several ones, even. They look like this: <sup></sup>, ^{}, \textsuperscript{},
\*{ \*} …

What's more, these powerful control/formatting characters allow one to apply
not only super/subscript and blackletter, but many more features to any
character as long as the font supports them, including bold, italics,
small-caps, optical size changes and countless others. I heartily recommend
using these special control/formatting characters, as they can considerably
*enrich* any text.

2016-10-03 14:14 GMT-03:00 Doug Ewell :

> a.lukyanov wrote:
>
> > I think that the right thing to do would be to create several new
> > control/formatting characters, like this:
> >
> > "previous character is superscript"
> > "previous character is subscript"
> > "previous character is small caps (for use in phonetic transcription
> > only)"
> > "previous character is mathematical blackletter"
> > etc
> >
> > Then people will be able to apply these features on any character as
> > long as their font supports it.
>
> I happen to think this would be exactly the wrong thing to do,
> completely contrary to the principles of plain text that Unicode was
> founded upon. But you never know what might gain traction, so stay
> tuned.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
--------------  next part --------------
An HTML attachment was scrubbed...
URL:

From jkorpela at cs.tut.fi Mon Oct 3 12:51:47 2016
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Mon, 3 Oct 2016 20:51:47 +0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID: <81691895-adde-70b9-3fe5-685a35e815f5@cs.tut.fi>

3.10.2016, 20:40, Leonardo Boiko wrote:

> Besides, there are already control/formatting characters for such
> purposes; several ones, even. They look like this: <sup></sup>, ^{},
> \textsuperscript{}, \*{ \*} …

They are not control or formatting characters. They are markup used at
higher protocol levels, in different markup systems.

Yucca

From steve at swales.us Mon Oct 3 12:59:41 2016
From: steve at swales.us (Steve Swales)
Date: Mon, 3 Oct 2016 10:59:41 -0700
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID:

> On Oct 3, 2016, at 10:14 AM, Doug Ewell wrote:
>
> a.lukyanov wrote:
>
>> I think that the right thing to do would be to create several new
>> control/formatting characters, like this:
>>
>> "previous character is superscript"
>> "previous character is subscript"
>> "previous character is small caps (for use in phonetic transcription
>> only)"
>> "previous character is mathematical blackletter"
>> etc
>>
>> Then people will be able to apply these features on any character as
>> long as their font supports it.
>
> I happen to think this would be exactly the wrong thing to do,
> completely contrary to the principles of plain text that Unicode was
> founded upon. But you never know what might gain traction, so stay
> tuned.
I guess I don't see how it is fundamentally different from other variant
selector uses within Unicode, and the ability to write properly formatted
mathematical and chemical formulas (for example) in a plain text environment
like text messaging seems like a fairly compelling use case.

-steve

From leoboiko at namakajiri.net Mon Oct 3 13:08:02 2016
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 3 Oct 2016 15:08:02 -0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <81691895-adde-70b9-3fe5-685a35e815f5@cs.tut.fi>
References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com> <81691895-adde-70b9-3fe5-685a35e815f5@cs.tut.fi>
Message-ID:

2016-10-03 14:51 GMT-03:00 Jukka K. Korpela :

> They are not control or formatting characters. They are markup used at
> higher protocol levels, in different markup systems.
>
> That's exactly the point, yes.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From neil at tonal.clara.co.uk Mon Oct 3 13:33:30 2016
From: neil at tonal.clara.co.uk (Neil Harris)
Date: Mon, 3 Oct 2016 19:33:30 +0100
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID: <95c8f288-72f9-1726-c0c9-219341ac64a1@tonal.clara.co.uk>

On 03/10/16 18:59, Steve Swales wrote:
>> On Oct 3, 2016, at 10:14 AM, Doug Ewell wrote:
>>
>> a.lukyanov wrote:
>>
>>> I think that the right thing to do would be to create several new
>>> control/formatting characters, like this:
>>>
>>> "previous character is superscript"
>>> "previous character is subscript"
>>> "previous character is small caps (for use in phonetic transcription
>>> only)"
>>> "previous character is mathematical blackletter"
>>> etc
>>>
>>> Then people will be able to apply these features on any character as
>>> long as their font supports it.
>> I happen to think this would be exactly the wrong thing to do,
>> completely contrary to the principles of plain text that Unicode was
>> founded upon. But you never know what might gain traction, so stay
>> tuned.
> I guess I don't see how it is fundamentally different from other variant selector uses within Unicode, and the ability to write properly formatted mathematical and chemical formulas (for example) in a plain text environment like text messaging seems like a fairly compelling use case.
>
> -steve
>
>
Yes, but there are already existing, well-standardized higher-level
protocols (HTML, MathML, TeX, etc.) that do exactly that. They should be
used instead, rather than trying to make Unicode something other than a
plain-text character encoding, contrary to its design principles.

Moreover, while what you describe seems superficially simple, as soon as
you try to expand it, you will find you end up with systems like this:

http://unicode.org/notes/tn28/UTN28-PlainTextMath.pdf

which are neither one thing nor the other, and which, in spite of being
proposed as a plain-text notation, actually end up being an ad-hoc
higher-level protocol anyway.

Neil

From gwalla at gmail.com Mon Oct 3 13:41:51 2016
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 3 Oct 2016 11:41:51 -0700
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID:

On Mon, Oct 3, 2016 at 10:59 AM, Steve Swales wrote:

> > On Oct 3, 2016, at 10:14 AM, Doug Ewell wrote:
> >
> > a.lukyanov wrote:
> >
> >> I think that the right thing to do would be to create several new
> >> control/formatting characters, like this:
> >>
> >> "previous character is superscript"
> >> "previous character is subscript"
> >> "previous character is small caps (for use in phonetic transcription
> >> only)"
> >> "previous character is mathematical blackletter"
> >> etc
> >>
> >> Then people will be able to apply these features on any character as
> >> long as their font supports it.
> >
> > I happen to think this would be exactly the wrong thing to do,
> > completely contrary to the principles of plain text that Unicode was
> > founded upon. But you never know what might gain traction, so stay
> > tuned.
>
> I guess I don't see how it is fundamentally different from other variant
> selector uses within Unicode, and the ability to write properly formatted
> mathematical and chemical formulas (for example) in a plain text
> environment like text messaging seems like a fairly compelling use case.
>

That would not be sufficient for properly formatted mathematical formulas.
Exponentiation alone requires an indefinite number of levels of
superscripting, and that's not even getting into things like summation,
integrals, and even the division bar, which require complex two-dimensional
positioning.

I don't think chemical formulas need any characters that aren't already
encoded, though atomic symbols are properly formatted with superscripted
mass stacked on top of subscripted atomic number, and stacking is sometimes
used with polyatomic ions (but optional AIUI, so something like Hg₂²⁺ is
acceptable and understood). If you're referring to full structural
formulas, all bets are off: those are clearly 2-dimensional diagrams.
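[Editing note: Garth's digit-and-charge examples are easy to produce mechanically from the encoded subscript/superscript characters. A minimal Python sketch follows; it is illustrative only, since it naively maps every ASCII digit, which is only correct for simple molecular formulas, not full chemical markup.]

```python
# Translation tables from ASCII to the encoded subscript/superscript forms.
SUB = str.maketrans("0123456789",
                    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089")
SUP = str.maketrans("0123456789+-",
                    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
                    "\u207a\u207b")

print("H2SO4".translate(SUB))                        # H₂SO₄
print("[ClO4]".translate(SUB) + "-".translate(SUP))  # [ClO₄]⁻
```

This is exactly the kind of plain-text rendering the thread is debating: it works for H₂SO₄ and [ClO₄]⁻, but not for stacked mass/atomic-number notation.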
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From doug at ewellic.org Mon Oct 3 13:47:09 2016
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 03 Oct 2016 11:47:09 -0700
Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?=
Message-ID: <20161003114709.665a7a7059d7ee80bb4d670165c8327d.751e97706d.wbe@email03.godaddy.com>

Steve Swales wrote:

>> I happen to think this would be exactly the wrong thing to do,
>> completely contrary to the principles of plain text that Unicode was
>> founded upon. But you never know what might gain traction, so stay
>> tuned.
>
> I guess I don't see how it is fundamentally different from other
> variant selector uses within Unicode,

Good question. Other variation selectors -- I assume this means U+FE00
through U+FE0F, plus the Plane 14 variation selectors, plus the Mongolian
and ideographic selectors -- are defined and registered for use with
specific, individual base characters.

There are a lot of combinations defined for "text style" and "emoji
style," with more probably on the way, but even in this seemingly
open-ended field, variation selectors are valid only in defined
combinations.

The concept here was to invent combining characters for superscript,
subscript, blackletter, etc. that could be applied to any base character.
This is fundamentally different from "valid only in defined combinations."

> and the ability to write properly formatted mathematical and chemical
> formulas (for example) in a plain text environment like text messaging
> seems like a fairly compelling use case.

It certainly does. That's why UTC did the extensive research, way back in
the 2000 time frame, to determine what characters were appropriate in
mathematical contexts before encoding the Mathematical Alphanumeric
Symbols.
They came up with Latin letters for a wide variety of styles, and digits,
Greek letters, and a few others for a subset of those styles, that were
agreed to have special meaning in mathematical notation. They did not make
the set open-ended, as if arbitrary characters such as & or ? had similar
special meaning.

Basic chemical formulas like H₂SO₄ or [ClO₄]⁻ can be written in plain
Unicode text. At some point the line between basic and non-basic has to be
drawn, just as with arbitrarily stacked superscripts in math, and some sort
of fancy-text solution has to take over.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From asmusf at ix.netcom.com Mon Oct 3 15:47:09 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Mon, 3 Oct 2016 13:47:09 -0700
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161003114709.665a7a7059d7ee80bb4d670165c8327d.751e97706d.wbe@email03.godaddy.com>
References: <20161003114709.665a7a7059d7ee80bb4d670165c8327d.751e97706d.wbe@email03.godaddy.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From doug at ewellic.org Mon Oct 3 16:43:04 2016
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 03 Oct 2016 14:43:04 -0700
Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?=
Message-ID: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com>

Asmus Freytag (c) wrote:

> As a result, you can write basic formulas for select compounds, but
> not all. Given that these basic formulae don't need full 2-D layout,
> this still seems like an arbitrary restriction.

Adding a carefully selected group of styled characters to the original,
carefully selected set seems perfectly reasonable, and is how Unicode has
worked for around 25 years. Is your suggestion to do that, or to throw the
doors wide open?
--
Doug Ewell | Thornton, CO, US | ewellic.org

From samjnaa at gmail.com Tue Oct 4 03:13:57 2016
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Tue, 4 Oct 2016 13:43:57 +0530
Subject: Android character picker
In-Reply-To: References: Message-ID:

Hello. Kindly advise on what is the most comprehensive and up to date
Unicode character picker for Android available. Am not able to find a good
one. Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charupdate at orange.fr Tue Oct 4 05:35:53 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Tue, 4 Oct 2016 12:35:53 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com>
Message-ID: <861342229.4994.1475577353789.JavaMail.www@wwinf1n25>

On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote:
> On 10/3/2016 11:47 AM, Doug Ewell wrote:
> > Basic chemical formulas like H₂SO₄ or [ClO₄]⁻ can be written in
> > plain Unicode text. At some point the line between basic and non-basic
> > has to be drawn, just as with arbitrarily stacked superscripts in math,
> > and some sort of fancy-text solution has to take over.
>
> UTC determined many years ago, in response to a proposal, that alpha, beta
> and gamma, common in organic chemistry, were not acceptable for encoding
> as super/subscripts.
>
> At the time, this was requested to support plain text databases used for
> regulatory purposes, where these were required as super or subscripts.
>
> Later, the beta and gamma were encoded for phonetic notation, but not the
> alpha.
>
> As a result, you can write basic formulas for select compounds, but not all.
> Given that these basic formulae don't need full 2-D layout, this still seems
> like an arbitrary restriction.
When it's about informatics, arbitrary restrictions are precisely what gets
me upset. Those limitations are, as I wrote the other day, a useless
worsening of the usability and usefulness of a product.

On Mon, 03 Oct 2016 14:43:04 -0700, Doug Ewell replied:
> Asmus Freytag (c) wrote:
>
> > As a result, you can write basic formulas for select compounds, but
> > not all. Given that these basic formulae don't need full 2-D layout,
> > this still seems like an arbitrary restriction.
>
> Adding a carefully selected group of styled characters to the original,
> carefully selected set seems perfectly reasonable, and is how Unicode
> has worked for around 25 years. Is your suggestion to do that, or to
> throw the doors wide open?

I guess there is no need to throw any door open, and I'm sure that no
suggestion to do so is included here. After the great many options that
have been discussed, it now comes down to encoding no more than one or,
say, a handful more superscripts and subscripts, to enable people to
achieve a great deal of database architecture.

Marcel

From c933103 at gmail.com Tue Oct 4 06:50:05 2016
From: c933103 at gmail.com (gfb hjjhjh)
Date: Tue, 4 Oct 2016 19:50:05 +0800
Subject: What happened to Unicode CLDR's site?
Message-ID:

Why is the site suspended by Google and how to access it now?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From daniel.buenzli at erratique.ch Tue Oct 4 06:57:19 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Tue, 4 Oct 2016 13:57:19 +0200
Subject: What happened to Unicode CLDR's site?
In-Reply-To: References: Message-ID:

On Tuesday 4 October 2016 at 13:50, gfb hjjhjh wrote:
> Why is the site suspended by Google and how to access it now?

FWIW I reported the issue today using the website's reporting form. So I
guess the answer is wait.
Best,

Daniel

From liste at secarica.ro Tue Oct 4 07:51:24 2016
From: liste at secarica.ro (Cristian =?UTF-8?B?U2VjYXLEgw==?=)
Date: Tue, 4 Oct 2016 15:51:24 +0300
Subject: What happened to Unicode CLDR's site?
In-Reply-To: References: Message-ID: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>

On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:

> Why is the site suspended by Google and how to access it now?

Just curious: Unicode = Google ? (physically)

I am asking this because by entering directly http://cldr.unicode.org
the error result belongs to Google and not to unicode.org.

Cristi

--
Cristian Secară
http://www.secărică.ro

From marc.blanchet at viagenie.ca Tue Oct 4 08:04:02 2016
From: marc.blanchet at viagenie.ca (Marc Blanchet)
Date: Tue, 04 Oct 2016 09:04:02 -0400
Subject: What happened to Unicode CLDR's site?
In-Reply-To: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
Message-ID:

On 4 Oct 2016, at 8:51, Cristian Secară wrote:

> On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:
>
>> Why is the site suspended by Google and how to access it now?
>
> Just curious: Unicode = Google ? (physically)

well, does not look like Google to me… but see below

//////////////
dig unicode.org NS
;; ANSWER SECTION:
unicode.org. 86400 IN NS nserver.euro.apple.com.
unicode.org. 86400 IN NS nserver2.apple.com.
unicode.org. 86400 IN NS nserver3.apple.com.
unicode.org. 86400 IN NS nserver.apple.com.
unicode.org. 86400 IN NS nserver.asia.apple.com.
unicode.org. 86400 IN NS nserver4.apple.com.
///////
dig unicode.org A
;; ANSWER SECTION:
unicode.org. 2757 IN A 216.97.88.9

whois 216.97.88.9
NetRange: 216.97.0.0 - 216.97.127.255
CIDR: 216.97.0.0/17
NetName: CORESPACE-4
NetHandle: NET-216-97-0-0-1
Parent: NET216 (NET-216-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS54489
Organization: CoreSpace, Inc.
(CORES-27)
RegDate: 2000-08-23
Updated: 2013-02-21
Ref: https://whois.arin.net/rest/net/NET-216-97-0-0-1

OrgName: CoreSpace, Inc.
OrgId: CORES-27
Address: 7505 John W. Carpenter Freeway
City: Dallas
StateProv: TX
PostalCode: 75247
Country: US
RegDate: 2009-08-10
Updated: 2012-04-30
Ref: https://whois.arin.net/rest/org/CORES-27
//////////

BUT:
dig cldr.unicode.org A
;; ANSWER SECTION:
cldr.unicode.org. 37687 IN CNAME ghs.google.com.
ghs.google.com. 86400 IN CNAME ghs.l.google.com.
ghs.l.google.com. 230 IN A 173.194.208.121

so cldr seems to be hosted by Google.

Marc.

>
> I am asking this because by entering directly http://cldr.unicode.org
> the error result belongs to Google and not to unicode.org.
>
> Cristi
>
> --
> Cristian Secară
> http://www.secărică.ro

From srl at icu-project.org Tue Oct 4 08:53:06 2016
From: srl at icu-project.org (Steven R. Loomis)
Date: Tue, 4 Oct 2016 06:53:06 -0700
Subject: What happened to Unicode CLDR's site?
In-Reply-To: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
Message-ID: <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>

Yes, the web content is hosted by Google Sites, a web hosting provider.

As to it being down, I understand this is being looked into.

Sent from our iPhone.

> On Oct 4, 2016, at 5:51 AM, Cristian Secară wrote:
>
> On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:
>
>> Why is the site suspended by Google and how to access it now?
>
> Just curious: Unicode = Google ? (physically)
>
> I am asking this because by entering directly http://cldr.unicode.org
> the error result belongs to Google and not to unicode.org.
>
> Cristi
>
> --
> Cristian Secară
> http://www.secărică.ro
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Tue Oct 4 11:00:18 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 4 Oct 2016 18:00:18 +0200
Subject: What happened to Unicode CLDR's site?
In-Reply-To: <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>
References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
 <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>
Message-ID:

It looks like an automated bot run by Google detected an excessive use of
bandwidth and launched the block, waiting for another subscription or
payment, even if the site was (possibly) donated by Google itself. That bot
probably does not know what it does and treats it like any other hosted
site. (Google's own usage policy is probably more enforced now: you can
host free websites, but above some threshold they will be blocked.)

Note also that it is the web hosting which is blocked, not the domain name
(hosted by Apple, who probably offered it to the Consortium).

There has probably been a lack of communication somewhere in Google, or an
administrator error that removed an exception for a site that should first
have been handled specially by a human hierarchy inside Google.

If the usage limit was exhausted, maybe this is because the site was
harvested by some malware, and I think it's reasonable to block it first
(before scanning, cleaning, restoring damaged parts from a safe backup, and
investigating which protection measures were missing or should be taken).

There are certainly people looking into what happened precisely. I hope
this is just an administrative measure that can easily be reversed and that
no damage happened to CLDR data (or to private data there about CLDR
surveyors or user authentication databases). I don't think there's damage
to the released CLDR data, but there could be losses in some recent ongoing
work.

2016-10-04 15:53 GMT+02:00 Steven R. Loomis :

> Yes, the web content is hosted by Google Sites, a web hosting provider.
>
> As to it being down, I understand this is being looked into.
>
> Sent from our iPhone.
>
> On Oct 4, 2016, at 5:51 AM, Cristian Secară wrote:
>
> On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:
>
> Why is the site suspended by Google and how to access it now?
>
>
> Just curious: Unicode = Google ? (physically)
>
> I am asking this because by entering directly http://cldr.unicode.org
> the error result belongs to Google and not to unicode.org.
>
> Cristi
>
> --
> Cristian Secară
> http://www.secărică.ro
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From doug at ewellic.org Tue Oct 4 11:25:36 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 04 Oct 2016 09:25:36 -0700
Subject: What happened to Unicode CLDR's =?UTF-8?Q?site=3F?=
Message-ID: <20161004092536.665a7a7059d7ee80bb4d670165c8327d.e4fa88a6b9.wbe@email03.godaddy.com>

It seems to be back up as of 16:23 UTC.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From srl at icu-project.org Tue Oct 4 11:56:53 2016
From: srl at icu-project.org (Steven R. Loomis)
Date: Tue, 04 Oct 2016 09:56:53 -0700
Subject: What happened to Unicode CLDR's site?
In-Reply-To: <20161004092536.665a7a7059d7ee80bb4d670165c8327d.e4fa88a6b9.wbe@email03.godaddy.com>
References: <20161004092536.665a7a7059d7ee80bb4d670165c8327d.e4fa88a6b9.wbe@email03.godaddy.com>
Message-ID: <1E394F3C-60B4-431D-8011-BB9B7B9033EF@icu-project.org>

Depending on DNS propagation, you may see minor glitches today. But the
content should all be back up.

-s

On [DATE], "[NAME]" <[ADDRESS]> wrote:

>It seems to be back up as of 16:23 UTC.
>
>--
>Doug Ewell | Thornton, CO, US | ewellic.org

From leoboiko at namakajiri.net Tue Oct 4 12:25:56 2016
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Tue, 4 Oct 2016 14:25:56 -0300
Subject: What happened to Unicode CLDR's site?
In-Reply-To: References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
 <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>
Message-ID:

The Google error message felt a bit too harsh for a web hosting client who
merely exceeded their allotted bandwidth. It made it sound like the website
was hosting something illegal.

2016-10-04 13:00 GMT-03:00 Philippe Verdy :

> It looks like an automated bot run by Google detected an excessive use of
> bandwidth and launched the block, waiting for another subscription or payment,
> even if the site was (possibly) donated by Google itself. That bot probably
> does not know what it does and treats it like any other hosted site. (Google's
> own usage policy is probably more enforced now: you can host free websites,
> but above some threshold they will be blocked.)
>
> Note also that it is the web hosting which is blocked, not the domain
> name (hosted by Apple, who probably offered it to the Consortium).
>
> There has probably been a lack of communication somewhere in Google, or an
> administrator error that removed an exception for a site that should first
> have been handled specially by a human hierarchy inside Google.
>
> If the usage limit was exhausted, maybe this is because the site was
> harvested by some malware, and I think it's reasonable to block it first
> (before scanning, cleaning, restoring damaged parts from a safe backup, and
> investigating which protection measures were missing or should be taken).
>
> There are certainly people looking into what happened precisely. I hope this
> is just an administrative measure that can easily be reversed and that no
> damage happened to CLDR data (or to private data there about CLDR surveyors
> or user authentication databases). I don't think there's damage to the
> released CLDR data, but there could be losses in some recent ongoing work.
>
> 2016-10-04 15:53 GMT+02:00 Steven R. Loomis :
>
>> Yes, the web content is hosted by Google Sites, a web hosting provider.
>>
>> As to it being down, I understand this is being looked into.
>>
>> Sent from our iPhone.
>>
>> On Oct 4, 2016, at 5:51 AM, Cristian Secară wrote:
>>
>> On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:
>>
>> Why is the site suspended by Google and how to access it now?
>>
>>
>> Just curious: Unicode = Google ? (physically)
>>
>> I am asking this because by entering directly http://cldr.unicode.org
>> the error result belongs to Google and not to unicode.org.
>>
>> Cristi
>>
>> --
>> Cristian Secară
>> http://www.secărică.ro
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Tue Oct 4 12:59:04 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 4 Oct 2016 19:59:04 +0200
Subject: What happened to Unicode CLDR's site?
In-Reply-To: References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
 <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>
Message-ID:

2016-10-04 19:25 GMT+02:00 Leonardo Boiko :

> The Google error message felt a bit too harsh for a web hosting client who
> merely exceeded their allotted bandwidth. It made it sound like the
> website was hosting something illegal.
>

It's not impossible that the site was hacked a bit somewhere and used by a
third party to host illegal content, or that some malware caused it to
generate a spike of bandwidth. Stopping the website temporarily is a safe
measure before admins can explain what is causing this unexpected excess,
and until some cleanup operations are eventually performed and some
additional security measures taken (Google itself cannot do that cleanup
without an active action by the site maintainer). However I agree that the
automatic message used by Google's blocker was very harsh.
Google can detect malware running on hosted sites and could be more
informative about the cause:
- blocked because of a security issue (without explaining more to the
public; could be a DDoS damaging the operations on other hosted websites,
or hacked content...).
- blocked until the site admins solve technical problems.
- blocked temporarily because of excess bandwidth (but no security issue
detected), without saying publicly whether this is because of failed
payments (that is private communication between the host provider and the
web service).
- blocked temporarily due to a technical problem on the hosting platform
itself.
- blocked indefinitely due to a legal constraint (such as a court order,
which may force the publication of a legal notice on a static page).

And it should provide a better contact channel for site admins, or explain
what visitors can do (if malware was hosted on the site, what they should
do themselves on their own devices).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From liste at secarica.ro Tue Oct 4 14:31:41 2016
From: liste at secarica.ro (Cristian =?UTF-8?B?U2VjYXLEgw==?=)
Date: Tue, 4 Oct 2016 22:31:41 +0300
Subject: Android character picker
In-Reply-To: References: Message-ID: <20161004223141.b05b7afd011e3875052b7f76@secarica.ro>

On Tue, 4 Oct 2016 13:43:57 +0530, Shriramana Sharma wrote:

> Kindly advise on what is the most comprehensive and up to date Unicode
> character picker for Android available.
> Am not able to find a good one.

You didn't mention which application (if any) you already tried, and what
"a good one" means by comparison.

A search for "charmap" on Google Play gives at least two results. I tried
(superficially) one of them, with which I was able to pick a group of
characters with no problem.

Cristi

--
Cristian Secară
http://www.sec?ric?.ro From duerst at it.aoyama.ac.jp Wed Oct 5 00:27:44 2016 From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J._D=c3=bcrst?=) Date: Wed, 5 Oct 2016 14:27:44 +0900 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> Message-ID: <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> On 2016/10/04 19:35, Marcel Schneider wrote: > On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote: >> Later, the beta and gamma were encoded for phonetic notation, but not the >> alpha. >> >> As a result, you can write basic formulas for select compounds, but not all. >> Given that these basic formulae don't need full 2-D layout, this still seems >> like an arbitrary restriction. > > When it?s about informatics, arbitrary restrictions are precisely what gets me > upset. Those limitations are?as I wrote the other day?a useless worsening > of the usability and usefulness of a product. This kind of "let's avoid arbitrary limitations" argument works very well for subjects that are theoretical, straightforward, and rigid in nature. Many (but not all) subjects in computer science (informatics) are indeed of such a nature. The Unicode Consortium (or more specifically, the UTC) does a lot of hard work to create theories where appropriate, and to explain them where possible. But they recognize (and we should do so, too) that in the end, writing is a *cultural* phenomenon, where straightforward, rigid theories have severe limitations. From a certain viewpoint (the chemist's in the example above), the result may look arbitrary, but from another viewpoint (the phoneticist's), it looks perfectly fine. At first, it looks like it would be easy to fix such problems, but each fix risks to introduce new arbitrariness when seen from somebody else's viewpoint. 
Getting upset won't help.

Regards, Martin.

From charupdate at orange.fr Wed Oct 5 08:57:48 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 5 Oct 2016 15:57:48 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com>
 <861342229.4994.1475577353789.JavaMail.www@wwinf1n25>
 <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp>
Message-ID: <283719302.9783.1475675868120.JavaMail.www@wwinf1f05>

On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote:
> On 2016/10/04 19:35, Marcel Schneider wrote:
>> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote:
>>
>>> Later, the beta and gamma were encoded for phonetic notation, but not the
>>> alpha.
>>>
>>> As a result, you can write basic formulas for select compounds, but not all.
>>> Given that these basic formulae don't need full 2-D layout, this still seems
>>> like an arbitrary restriction.
>>
>> When it's about informatics, arbitrary restrictions are precisely what gets me
>> upset. Those limitations are, as I wrote the other day, a useless worsening
>> of the usability and usefulness of a product.
>
> This kind of "let's avoid arbitrary limitations" argument works very
> well for subjects that are theoretical, straightforward, and rigid in
> nature. Many (but not all) subjects in computer science (informatics)
> are indeed of such a nature.
>
> The Unicode Consortium (or more specifically, the UTC) does a lot of
> hard work to create theories where appropriate, and to explain them
> where possible. But they recognize (and we should do so, too) that in
> the end, writing is a *cultural* phenomenon, where straightforward,
> rigid theories have severe limitations.
> > From a certain viewpoint (the chemist's in the example above), the > result may look arbitrary, but from another viewpoint (the > phoneticist's), it looks perfectly fine. At first, it looks like it > would be easy to fix such problems, but each fix risks to introduce new > arbitrariness when seen from somebody else's viewpoint. Getting upset > won't help. I’ve got the point, thanks. Phonetics need to write running text that is immediately legible, while a chemistry database may use particular notational conventions that work with baseline letters to be parsed on semantics or light markup for proper display in the UI. The UTC decision thus questioned the design principle of using plain text for chemical formulae. No doubt it was understood that validating this choice would have opened the door to encoding more special characters for upgrading or similar purposes. At this point I’d like to mention what I thought about since this thread was launched. The French language makes extensive use of superscripts to note abbreviations. This is not a mere styling issue, as it is in English. E.g. without superscripts, the abbreviation “nos” [numbers] is ambiguated with the pronoun “nos” [our]. The most that can be easily disambiguated is “n°” [number] with the degree sign available on the common French keyboard layout. For the anecdote: When a technician led me to discover the field “no centre mess” in the UI of my cellphone, it took me several seconds to understand “number of SMS center/centre” which is the actual meaning; but here, some additional confusion resulted from the interlanguage homograph “no”. Written words being ambiguated with one another is a common phenomenon in natural languages. Performing disambiguation is widely achieved by adding vowel signs (Hebrew) or diacritics (Latin script using languages). 
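Marcel's “nos”/“n°” example can be checked mechanically. Below is a minimal Python sketch (the assumption that search tools apply NFKC-style compatibility folding is mine, suggested by the later remark in this thread about Google's equivalence classes): preformatted superscripts disambiguate the abbreviation visually, yet still fold back to plain letters for matching.

```python
import unicodedata

def search_fold(s: str) -> str:
    # Compatibility folding, as a search engine might apply it (assumption).
    return unicodedata.normalize("NFKC", s)

abbrev = "n\u1D52\u02E2"   # "nos" written with MODIFIER LETTER SMALL O and S
pronoun = "nos"            # the French possessive pronoun, plain ASCII

print(abbrev)                          # visually distinct from the pronoun
print(search_fold(abbrev) == pronoun)  # True: both fold to "nos" for matching
```

So the superscripted form stays unambiguous to the eye without becoming invisible to folded search.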
French was disfavored in computer practice (applied informatics) during a certain time when diacritics were unavailable—on uppercase letters longer than on lowercase. AFAIK, Latin letters like “œ” and “ĳ” first gained binary existence thanks to the ISO 6937 charset, while a Dutch standards author asked his compatriots to always write “ij” with two ASCII letters, and two Frenchmen prevented the “œ” from being encoded in Latin-1 at the intended code points because of its non-existence in computer printers. But today, thanks to Unicode, that’s all over. Therefore I suggest to grant the French language full support by enabling superscript lowercase letters in order that the SUPERSCRIPT dead key that the French Standards body recommends will work for all abbreviations. There is no point about other letters than the basic alphabet superscripted, as no French abbreviation exceeds this range (despite what I believed in 2014, like many other people). Additionally I’m proposing a modifier key combination (using a new modifier key on the 105th key on ISO keyboards) to access the lowercase superscripts on live keys: Shift + Num + [letter key] → [superscript lowercase]. I can easily type “on the 105ᵗʰ key”, and so will all users in France, at least with the dead key. The missing letter is superscript q == MODIFIER LETTER SMALL Q. Actually, when Shift + Num + Q is pressed on the projects, → “q_n’existe_pas” [superscript “q” does not exist] is inserted. Karl Pentzlin had the merit of proposing the missing letter superscript q for use in French abbreviations, but the UTC must have refused by arguing from English usage and from French recommendations. These are now changing. More, as I tried to demonstrate above, one cannot always rely on such low-profile recommendations, which express more the humility and undemandingness of their author than the real practical needs and linguistic requirements. 
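The dead-key behaviour Marcel describes can be sketched in a few lines. This is a hypothetical emulation, not any shipped layout: the table covers the 25 basic letters that had preformatted superscripts as of Unicode 9.0 (i and n via the legacy SUPERSCRIPT characters, the rest via MODIFIER LETTERs), and "q", which had no such character at the time of this thread, falls back to a marker string like the one quoted above.

```python
# Hypothetical emulation of the superscript dead key described above.
BASE = "abcdefghijklmnoprstuvwxyz"  # 25 letters: "q" deliberately absent
SUPS = ("\u1D43\u1D47\u1D9C\u1D48\u1D49\u1DA0\u1D4D\u02B0\u2071\u02B2"
        "\u1D4F\u02E1\u1D50\u207F\u1D52\u1D56\u02B3\u02E2\u1D57\u1D58"
        "\u1D5B\u02B7\u02E3\u02B8\u1DBB")
DEAD_KEY = dict(zip(BASE, SUPS))

def superscript(text: str, q_marker: str = "[q_n'existe_pas]") -> str:
    """Superscript what the character set allows; flag the unencodable q."""
    return "".join(q_marker if c == "q" else DEAD_KEY.get(c, c) for c in text)

print(superscript("3e"))   # 3ᵉ
print(superscript("nos"))  # ⁿᵒˢ
```

Typing "que" through this table reproduces exactly the mixed output Marcel complains about further down: the marker for q, then superscript u and e.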
As for searchability, Google even have the mathematical alphabets in their equivalence classes, so that any request written e.g. in double-struck letters is read as if it were entered in plain ASCII. Best regards, Marcel From moyogo at gmail.com Wed Oct 5 09:17:30 2016 From: moyogo at gmail.com (Denis Jacquerye) Date: Wed, 05 Oct 2016 14:17:30 +0000 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: > There is no point about other letters than the basic alphabet superscripted, > as no French abbreviation exceeds this range (despite of what I believed > in 2014, like many other people). What does that mean? How would that help for the French vernacular 3ème, or the Spanish C.ía. You might find there are many more uses than you think. Higher level protocols can already support these. Maybe what we need is better and more general higher level protocol support. On Wed, 5 Oct 2016 at 15:01 Marcel Schneider wrote: > On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote: > > On 2016/10/04 19:35, Marcel Schneider wrote: > >> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote: > >> > >>> Later, the beta and gamma were encoded for phonetic notation, but not > the > >>> alpha. > >>> > >>> As a result, you can write basic formulas for select compounds, but > not all. > >>> Given that these basic formulae don't need full 2-D layout, this still > seems > >>> like an arbitrary restriction. > >> > >> When it’s about informatics, arbitrary restrictions are precisely what > gets me > >> upset. Those limitations are—as I wrote the other day—a useless > worsening > >> of the usability and usefulness of a product. 
> > > > This kind of "let's avoid arbitrary limitations" argument works very > > well for subjects that are theoretical, straightforward, and rigid in > > nature. Many (but not all) subjects in computer science (informatics) > > are indeed of such a nature. > > > > The Unicode Consortium (or more specifically, the UTC) does a lot of > > hard work to create theories where appropriate, and to explain them > > where possible. But they recognize (and we should do so, too) that in > > the end, writing is a *cultural* phenomenon, where straightforward, > > rigid theories have severe limitations. > > > > From a certain viewpoint (the chemist's in the example above), the > > result may look arbitrary, but from another viewpoint (the > > phoneticist's), it looks perfectly fine. At first, it looks like it > > would be easy to fix such problems, but each fix risks to introduce new > > arbitrariness when seen from somebody else's viewpoint. Getting upset > > won't help. > > I’ve got the point, thanks. Phonetics need to write running text that is > immediately legible, while a chemistry database may use particular > notational > conventions that work with baseline letters to be parsed on semantics or > light > markup for proper display in the UI. The UTC decision thus questioned the > design > principle of using plain text for chemical formulae. No doubt it was > understood > that validating this choice would have opened the door to encoding more > special > characters for upgrading or similar purposes. > > At this point I’d like to mention what I thought about since this thread > was launched. The French language makes extensive use of superscripts > to note abbreviations. This is not a mere styling issue, as it is in > English. > E.g. without superscripts, the abbreviation “nos” [numbers] is ambiguated > with > the pronoun “nos” [our]. The most that can be easily disambiguated is “n°” > [number] > with the degree sign available on the common French keyboard layout. 
> For the anecdote: When a technician led me to discover the field > “no centre mess” in the UI of my cellphone, it took me several seconds to understand > “number of SMS center/centre” which is the actual meaning; but here, some > additional > confusion resulted from the interlanguage homograph “no”. > > Written words being ambiguated with one another is a common phenomenon in > natural languages. Performing disambiguation is widely achieved by adding > vowel signs (Hebrew) or diacritics (Latin script using languages). > French was disfavored in computer practice (applied informatics) during a > certain time when diacritics were unavailable—on uppercase letters longer > than on lowercase. > AFAIK, Latin letters like “œ” and “ĳ” first gained binary existence thanks > to the ISO 6937 charset, while a Dutch standards author asked his > compatriots > to always write “ij” with two ASCII letters, and two Frenchmen prevented > the “œ” > from being encoded in Latin-1 at the intended code points because of its > non-existence in computer printers. > > But today, thanks to Unicode, that’s all over. Therefore I suggest to grant > the French language full support by enabling superscript lowercase letters > in order that the SUPERSCRIPT deadkey that the French Standards body > recommends, > will work for all abreviations. There is no point about other letters than > the basic > alphabet superscripted, as no French abbreviation exceeds this range > (despite of > what I believed in 2014, like many other people). > Additionally I’m proposing a modifier key combination (using a new > modifier key on > the 105th key on ISO keyboards) to access the lowercase superscripts on > live keys: > Shift + Num + [letter key] → [superscript lowercase]. > I can easily type “on the 105ᵗʰ key”, and so will all users in France, at > least > with the dead key. > > The missing letter is superscript q == MODIFIER LETTER SMALL Q. > Actually, when Shift + Num + Q is pressed on the projects, > → 
“q_n’existe_pas” [superscript “q” does not exist] is inserted. > > Karl Pentzlin had the merit of proposing the missing letter superscript q > for use in French abbreviations, but the UTC must have refused by arguing > from English usage and from French recommendations. These are now changing. > More, as I tried to demonstrate above, one cannot always rely on such > low-profile recommendations, which express more the humility and > undemandingness > of their author, than the real practical needs and linguistical > requirements. > > As of searchability, Google have even the mathematical alphabets in their > equivalence classes, so that any request written e.g. in doublestruck > letters > is read as if it were entered in plain ASCII. > > Best regards, > > Marcel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Schappo at lboro.ac.uk Wed Oct 5 09:37:23 2016 From: A.Schappo at lboro.ac.uk (Andre Schappo) Date: Wed, 5 Oct 2016 14:37:23 +0000 Subject: My Annual Unicode Questions Message-ID: <278CED01-10B0-4E02-A452-147A8F08D919@lboro.ac.uk> This week is the first week of the new academic year at my university. One of the modules I co-teach is entitled "Programming for the WWW" which encompasses JavaScript and DHTML. This is a first year module. There were approx 70 students in the lab practical this morning. I asked them my annual questions. Q. Who has heard of Unicode? A. Approx 20% of the class raised their hands. (Same as last year http://www.unicode.org/mail-arch/unicode-ml/y2015-m12/0073.html) Q. Who understands Unicode? A. One student raised his hand. (This is an improvement on last year as no hand was raised last year) André Schappo From martinmueller at northwestern.edu Wed Oct 5 01:35:52 2016 From: martinmueller at northwestern.edu (Martin Mueller) Date: Wed, 5 Oct 2016 06:35:52 +0000 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> Message-ID: <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> There is always a lot more history than reason in the world. That said, given that alphabets have fixed numbers, it’s weird that bits of super and subscripted letters appear in this or that limited range but that you can’t cobble a whole alphabet together in a consistent manner. If any, why not all, especially if there are only two or three dozen. On 10/4/16, 11:27 PM, "Unicode on behalf of Martin J. Dürst" wrote: On 2016/10/04 19:35, Marcel Schneider wrote: > On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote: >> Later, the beta and gamma were encoded for phonetic notation, but not the >> alpha. >> >> As a result, you can write basic formulas for select compounds, but not all. >> Given that these basic formulae don't need full 2-D layout, this still seems >> like an arbitrary restriction. > > When it’s about informatics, arbitrary restrictions are precisely what gets me > upset. Those limitations are—as I wrote the other day—a useless worsening > of the usability and usefulness of a product. This kind of "let's avoid arbitrary limitations" argument works very well for subjects that are theoretical, straightforward, and rigid in nature. Many (but not all) subjects in computer science (informatics) are indeed of such a nature. The Unicode Consortium (or more specifically, the UTC) does a lot of hard work to create theories where appropriate, and to explain them where possible. But they recognize (and we should do so, too) that in the end, writing is a *cultural* phenomenon, where straightforward, rigid theories have severe limitations. 
From a certain viewpoint (the chemist's in the example above), the result may look arbitrary, but from another viewpoint (the phoneticist's), it looks perfectly fine. At first, it looks like it would be easy to fix such problems, but each fix risks to introduce new arbitrariness when seen from somebody else's viewpoint. Getting upset won't help. Regards, Martin. From charupdate at orange.fr Wed Oct 5 10:04:05 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 5 Oct 2016 17:04:05 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: <1301337346.11235.1475679845658.JavaMail.www@wwinf1f05> On Wed, 05 Oct 2016 14:17:30 +0000, Denis Jacquerye wrote: >> There is no point about other letters than the basic alphabet superscripted, >> as no French abbreviation exceeds this range (despite of what I believed >> in 2014, like many other people). > > What does that mean? How would that help for the French vernacular > 3ème, or the Spanish C.ía. You might find > there are many more uses than you think. Higher level protocols can already > support these. > Maybe what we need is better and more general higher level protocol support. I agree with most points. — > better and more general higher level protocol support. Perhaps starting with Word not cancelling superscripting as soon as a character style is applied. — > Higher level protocols can already support these. They can even support the copyleft symbol by turning the copyright sign, as the proposer of the former indicated, with CSS (one example: [1]). — > the Spanish C.ía. You might find > there are many more uses than you think. 
Spanish and many other languages are different in that they use punctuation to note abbreviations, while in French, even the dot is prohibited in this use case. Spanish “C.ía” is intelligible even without superscripting. Having said that… maybe there remain some cases that are not covered with superscripted basic letters while they are prone to confuse people, OK. — > How would that help for the French vernacular 3ème It doesn’t, but as I wrote in parentheses (unfortunately without quoting any example of an ordinal number), this corresponds to « what I believed in 2014, like many other people ». Kind regards, Marcel [1]: http://dispoclavier.com/#h448 [last line before table caption] From kenwhistler at att.net Wed Oct 5 10:09:33 2016 From: kenwhistler at att.net (Ken Whistler) Date: Wed, 5 Oct 2016 08:09:33 -0700 Subject: My Annual Unicode Questions In-Reply-To: <278CED01-10B0-4E02-A452-147A8F08D919@lboro.ac.uk> References: <278CED01-10B0-4E02-A452-147A8F08D919@lboro.ac.uk> Message-ID: <68d6d4fd-5de9-a904-caf9-22571a869918@att.net> On 10/5/2016 7:37 AM, Andre Schappo wrote: > Q. Who understands Unicode? > A. One student raised his hand. (This is an improvement on last year as no hand was raised last year) A brave soul, indeed! After 27 years of Unicode development, and with the standard (and its accumulated ancillary standards, data, repositories, and libraries) grown so huge, it is no longer clear to me how many participants in a *UTC* meeting would raise their hands in response to that question! --Ken From doug at ewellic.org Wed Oct 5 10:20:26 2016 From: doug at ewellic.org (Doug Ewell) Date: Wed, 05 Oct 2016 08:20:26 -0700 Subject: My Annual Unicode Questions Message-ID: <20161005082026.665a7a7059d7ee80bb4d670165c8327d.e452b35eae.wbe@email03.godaddy.com> Ken Whistler wrote: >> Q. Who understands Unicode? >> A. One student raised his hand. 
(This is an improvement on last year >> as no hand was raised last year) > > After 27 years of Unicode development, and with the standard (and its > accumulated ancillary standards, data, repositories, and libraries) > grown so huge, it is no longer clear to me how many participants in a > *UTC* meeting would raise their hands in response to that question! The bar for "understands" is lower among non-experts. I would actually be considered the resident "Unicode expert" among my co-workers, which might surprise some folks and alarm others. -- Doug Ewell | Thornton, CO, US | ewellic.org From verdy_p at wanadoo.fr Wed Oct 5 10:34:02 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Wed, 5 Oct 2016 17:34:02 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: 2016-10-05 16:17 GMT+02:00 Denis Jacquerye : > > There is no point about other letters than the basic alphabet > superscripted, > > as no French abbreviation exceeds this range (despite of what I believed > > in 2014, like many other people). > > What does that mean? How would that help for the French vernacular > 3ème, or the Spanish C.ía. You might find > there are many more uses than you think. Higher level protocols can already > support these. > Maybe what we need is better and more general higher level protocol > support. > I agree, French allows abbreviating many words by appending the last new letters in superscripts. 3e is recommended but 3ème is still very frequent. As well you'll see abbreviations using é (a frequent termination for past participles, generally used with the previous consonant and possibly followed with the feminine/plural final letters, all in superscript). 
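The superscript é that Philippe mentions also illustrates the normalization subtlety raised elsewhere in this thread: there is no single code point for a superscript e with acute, so it can only be written as the modifier letter plus a combining accent. A quick sketch (rendering quality of the combination is font-dependent):

```python
import unicodedata

# Superscript "é": MODIFIER LETTER SMALL E + COMBINING ACUTE ACCENT
sup_e_acute = "\u1D49\u0301"

nfc  = unicodedata.normalize("NFC",  sup_e_acute)
nfkc = unicodedata.normalize("NFKC", sup_e_acute)

print(nfc == sup_e_acute)   # True: no precomposed form exists, NFC keeps the pair
print(nfkc == "\u00E9")     # True: compatibility folding collapses it to plain "é"
```

Note the asymmetry: canonical normalization preserves the superscript styling, while compatibility folding silently turns it into an ordinary accented baseline letter.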
Almost nobody use the preencoded superscript letters for this (notably not for "1er", or its recommended feminine form "1re", still frequently written "1ère") -------------- next part -------------- An HTML attachment was scrubbed... URL: From charupdate at orange.fr Wed Oct 5 10:44:38 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 5 Oct 2016 17:44:38 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> Message-ID: <996912232.12078.1475682278718.JavaMail.www@wwinf1f05> On Wed, 5 Oct 2016 06:35:52 +0000, Martin Mueller wrote: > There is always a lot more history than reason in the world. > That said, given that alphabets have fixed numbers, it’s weird > that bits of super and subscripted letters appear in this or > that limited range but that you can’t cobble a whole alphabet > together in a consistent manner. If any, why not all, especially > if there are only two or three dozen. They would end up in the SMP, threatening their usability on Windows keyboard layouts due to their not being defined in XML like Apple’s are, and not being able to output two UTF-16 code points by dead keys, but for IMEs this is no problem. From a more theoretical viewpoint, encoding superscripted letters as such is opposed to Unicode’s design principles, as it has already been pointed out. This is why only legacy superscripts have SUPERSCRIPT in their name. As of the scattered code point allocations, they come from the pragmatic encoding. A letter isn’t encoded as a preformatted superscript unless there are one or more precise usages, documented in the proposal. 
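Marcel's remark that only the legacy characters carry SUPERSCRIPT in their names, while the rest were encoded as MODIFIER LETTERs, is easy to verify with Python's unicodedata. Note that both naming conventions nevertheless carry the same kind of <super> compatibility decomposition (a quick sketch):

```python
import unicodedata

for ch in "\u207F\u2071\u1D49\u02B0":
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  {unicodedata.decomposition(ch)}")

# U+207F  SUPERSCRIPT LATIN SMALL LETTER N  <super> 006E
# U+2071  SUPERSCRIPT LATIN SMALL LETTER I  <super> 0069
# U+1D49  MODIFIER LETTER SMALL E  <super> 0065
# U+02B0  MODIFIER LETTER SMALL H  <super> 0068
```

So from a processing point of view the two families behave identically under compatibility normalization; the split is purely historical naming.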
To come back to my new point in this thread: I believe that in French, superscript lowercase letters have a particular function as abbreviation indicators, in the absence of any other visible sign. This viewpoint is now gaining audience, as it comes from French authorities (DGLFLF, Afnor) who are demanding the /superscript/ dead key, to write abbreviations. In French, there is a need and a demand to move this from higher level to plain text. Hence the need of the MODIFIER LETTER SMALL Q, for a proper solution. E.g., when trying to abbreviate “Bibliothèque” to “Bibque” in plain text, one will actually end up with “Bib ‘q_n’existe_pas’ᵘᵉ”. There must be such a message, otherwise users may think there is a bug in the keyboard. Once the encoding of MODIFIER LETTER SMALL Q is at the point where the new scalar value is known, this will take the place of the sequence, and first display as a notdef box. Explaining this is then a matter of documentation. I wasn’t upset about the missing superscript q. But end-users could get upset. Regards, Marcel From frederic.grosshans at gmail.com Wed Oct 5 12:02:51 2016 From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=) Date: Wed, 5 Oct 2016 19:02:51 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: Le 05/10/2016 à 15:57, Marcel Schneider a écrit : > On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote: >> On 2016/10/04 19:35, Marcel Schneider wrote: >>> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote: >>> >>>> Later, the beta and gamma were encoded for phonetic notation, but not the >>>> alpha. 
>>>> >>>> As a result, you can write basic formulas for select compounds, but not all. >>>> Given that these basic formulae don't need full 2-D layout, this still seems >>>> like an arbitrary restriction. >>> When it’s about informatics, arbitrary restrictions are precisely what gets me >>> upset. Those limitations are—as I wrote the other day—a useless worsening >>> of the usability and usefulness of a product. >> This kind of "let's avoid arbitrary limitations" argument works very >> well for subjects that are theoretical, straightforward, and rigid in >> nature. Many (but not all) subjects in computer science (informatics) >> are indeed of such a nature. >> >> The Unicode Consortium (or more specifically, the UTC) does a lot of >> hard work to create theories where appropriate, and to explain them >> where possible. But they recognize (and we should do so, too) that in >> the end, writing is a *cultural* phenomenon, where straightforward, >> rigid theories have severe limitations. >> >> From a certain viewpoint (the chemist's in the example above), the >> result may look arbitrary, but from another viewpoint (the >> phoneticist's), it looks perfectly fine. At first, it looks like it >> would be easy to fix such problems, but each fix risks to introduce new >> arbitrariness when seen from somebody else's viewpoint. Getting upset >> won't help. > I’ve got the point, thanks. Phonetics need to write running text that is > immediately legible, while a chemistry database may use particular notational > conventions that work with baseline letters to be parsed on semantics or light > markup for proper display in the UI. The UTC decision thus questioned the design > principle of using plain text for chemical formulae. No doubt it was understood > that validating this choice would have opened the door to encoding more special > characters for upgrading or similar purposes. 
I think there is a big difference between adding a few characters for a new use (chemistry formulae) and completing an obvious almost complete set. People are used to seeing the 26 basic alphabetic Latin characters (abcdefghijklmnopqrstuvwxyz) being treated preferentially by computers, but are always surprised when only one of them is treated differently. Initially, superscript letters were restricted to a few letters, and it made sense to restrict the temptation to complete the set. But now that all modifier small latin letters except q are encoded, it makes little sense. Many people use these characters (arguably wrongly) for many uses beyond IPA, and they are invariably surprised if they need q. The special status of the basic Latin alphabet means that almost no one would be surprised not to find a superscripted é, è, or ç, and adding the last missing latin basic letter q would not open the door to any more character. > > At this point I’d like to mention what I thought about since this thread > was launched. The French language makes extensive use of superscripts > to note abbreviations. [...] Therefore I suggest to grant > the French language full support by enabling superscript lowercase letters > in order that the SUPERSCRIPT deadkey that the French Standards body recommends, > will work for all abreviations. There is no point about other letters than the basic > alphabet superscripted, as no French abbreviation exceeds this range (despite of > what I believed in 2014, like many other people). Whether é (and è) are needed or not is another question. Even if it were useful (as argued by others in this thread), it brings non-trivial technical difficulties in terms of NFC/NFD. But since people are used to seeing these characters being treated differently, I think the “problem” 
of the lack of superscript composed characters is less obvious than the lack of *MODIFIER LETTER SMALL Q, in the sense that the first absence is perceived (by the Unicode naive user) as more normal than the second. Frédéric From charupdate at orange.fr Wed Oct 5 17:10:32 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 6 Oct 2016 00:10:32 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: <1791413156.18188.1475705432866.JavaMail.www@wwinf1f05> On Wed, 5 Oct 2016 19:02:51 +0200, Frédéric Grosshans wrote: Le 05/10/2016 à 15:57, Marcel Schneider a écrit : > On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote: […] >>> >>> From a certain viewpoint (the chemist's in the example above), the >>> result may look arbitrary, but from another viewpoint (the >>> phoneticist's), it looks perfectly fine. At first, it looks like it >>> would be easy to fix such problems, but each fix risks to introduce new >>> arbitrariness when seen from somebody else's viewpoint. Getting upset >>> won't help. >> I’ve got the point, thanks. Phonetics need to write running text that is >> immediately legible, while a chemistry database may use particular notational >> conventions that work with baseline letters to be parsed on semantics or light >> markup for proper display in the UI. The UTC decision thus questioned the design >> principle of using plain text for chemical formulae. No doubt it was understood >> that validating this choice would have opened the door to encoding more special >> characters for upgrading or similar purposes. 
> > I think there is a big difference between adding a few characters for a > new use (chemistry formulae) and completing an obvious almost complete > set. People are used to see the 26 basic alphabetic Latin character > (abcdefghijklmnopqrstuvwxyz) being treated preferentially by computers, > but are always surprised when only one of them is treated differently. > Initially, superscript letters where restricted to a few letter, and it > made sense to restrict the temptation to complete the set. But now that > all modifier small latin letters except q are encoded, it makes little > sense. Many people use these characters (arguably wrongly) for many uses > beyond IPA, and they are invariably surprised if they need q. The > special status of the basic Latin alphabet means that almost no one > would be surprised not to find a superscripted é, è, or ç and adding the > last missing latin basic letter q would not open the door to any more > character. > That is however exactly what I believed, that this would open that door. It seems to me as if the missing superscript q were the last key to keep that door locked (how nice an image, as the small q is somewhat key-shaped). It is as if completing that series would trigger an avalanche of superscript alphabets and symbols to be asked for encoding without any means to be refused. And, troublesome enough, this is exactly how the proposal to encode *MODIFIER LETTER SMALL Q was perceived, despite the rationale, which must have been completely misunderstood, although it seems to me to be written in good English. Thanks to Denis Jacquerye’s detailed answer to the question “Why is there no character for "superscript q" in Unicode?” [1], I got all links quickly [2][3][4]. >> >> At this point I’d like to mention what I thought about since this thread >> was launched. The French language makes extensive use of superscripts >> to note abbreviations. [...] 
Therefore I suggest to grant >> the French language full support by enabling superscript lowercase letters >> in order that the SUPERSCRIPT deadkey that the French Standards body recommends, >> will work for all abreviations. There is no point about other letters than the basic >> alphabet superscripted, as no French abbreviation exceeds this range (despite of >> what I believed in 2014, like many other people). > Whether é (and è) are needed or not is another question. Even if it were > useful (as argued by others in this thread), it brings non-trivial > technical difficulties in terms of NFC/NFD. But since people are used to > see these characters being treated differently, I think the “problem” of > the lack of superscript composed character is less obvious than the lack > of *MODIFIER LETTER SMALL Q, in the sense that the first absence is > perceived (by the Unicode naive user) as more normal than the second. I really love your point of view, I understand that it is already shared by most people, and I strongly hope that it be adopted by the UTC. Perhaps it is, as there is no notice of non-approval found in the archive. However I’d like to know the answer to the proposer at/after the UTC meeting of August 9-13, 2010 at Redmond [5]. Such requests have to be sent to this List, which is monitored by meeting participants. Regards, Marcel [1] Denis Jacquerye’s post: https://www.quora.com/Why-is-there-no-character-for-superscript-q-in-Unicode/answer/Denis-Jacquerye-1 [2] Karl Pentzlin’s proposal: http://www.unicode.org/L2/L2010/10230-modifier-q.pdf [3] A comment on behalf of Adobe Systems, written up the first day of the UTC meeting where the proposal was rejected: http://www.unicode.org/L2/L2010/10315-comment.pdf [4] Karl Pentzlin’s reply, two days later i.e. 
three days before the end of the meeting: http://www.unicode.org/L2/L2010/10316-cmts.pdf
[5] The anchor in the UTC minutes at the related Action Item: http://www.unicode.org/cgi-bin/GetL2Ref.pl?124-A146

From charupdate at orange.fr Thu Oct 6 02:21:11 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 6 Oct 2016 09:21:11 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05>
Message-ID: <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>

On Wed, 5 Oct 2016 17:34:02 +0200, Philippe Verdy wrote:
[…]
> I agree, French allows abbreviating many words by appending the last new
> letters in superscripts. 3e is recommended but 3ème
> is still very frequent. As well you'll see abbreviations using é
> (a frequent termination for past participles, generally used with the
> previous consonant and possibly followed by the feminine/plural final
> letters, all in superscript).

I never saw that. Would you show us some examples to look up? I'm curious whether they could be managed without accented superscripts. Anyway, combining diacritics should be placeable on superscripts as well.

> Almost nobody uses the preencoded superscript letters for this (notably not
> for "1er", or its recommended feminine form "1re",
> still frequently written "1ère")

They don't, because these are not on the keyboard. Trust me, I wouldn't use them either if I didn't have them on a (prototype) keyboard layout. You may say the same about the ?, the ? and ?, and so on. Why do people abbreviate "numéro" as "n°"? Because we *do have it* (the degree sign) on our keyboards.

BTW, there was another (subsequent) proposal [1], to complete with superscript q, but not only that.
At the time, the aim was to fill out the SUPERSCRIPT and SUBSCRIPT dead keys (called latching group selectors). But no trace of any UTC meeting item can be found, at least when I try the search. The known issue with this proposal is that it is part of the ISO/IEC 9995 standardization process, as it was meant to contribute to part 9 of that standard. That is an issue because Microsoft is fiercely opposed to ISO/IEC 9995. I understand Microsoft, as the standard in question is in my opinion actually suboptimal. But this is another issue, not to be discussed in this thread, nor on this Mailing List at all (except perhaps by other subscribers).

Regards, Marcel

[1] The proposal: http://www.unicode.org/L2/L2011/11208-n4068.pdf

From verdy_p at wanadoo.fr Thu Oct 6 04:16:53 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 6 Oct 2016 11:16:53 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

2016-10-06 9:21 GMT+02:00 Marcel Schneider :
> > Almost nobody uses the preencoded superscript letters for this (notably not
> > for "1er", or its recommended feminine form "1re",
> > still frequently written "1ère")
>
> They don't, because these are not on the keyboard. Trust me, I wouldn't use
> them either if I didn't have them on a (prototype) keyboard layout.

I will certainly not trust you, and you won't challenge me on that. The keyboard is definitely not the issue here. Only the degree sign on French keyboards is very frequently used (instead of the superscript o, or any final o), in: n° (numéro), d° (ditto), r° (recto), v° (verso), f° (folio).
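The preceding point, that the degree sign stands in for a raised "o" in these abbreviations, can be made concrete: Unicode has three distinct look-alike characters here, and only one of them is a modifier letter. A minimal Python sketch (an editorial illustration, not part of the original mails), using only the standard unicodedata module:

```python
import unicodedata

# Three look-alike candidates for the raised "o" in abbreviations
# such as "n°" (numéro): degree sign, masculine ordinal indicator,
# and the modifier (superscript) letter small o.
for ch in ("\u00B0", "\u00BA", "\u1D52"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  "
          f"category={unicodedata.category(ch)}")
```

Only U+1D52 is a modifier letter; the degree sign is a plain symbol, which is part of why its use in "n°" is a keyboard-driven convention rather than a semantic choice.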
For Latin ordinals: 1° (primo / premièrement), 2° (secundo / deuxièmement), the superscript o or degree sign may sometimes be dropped, but it is most often a degree sign in many encoded documents (there's no real difference from handwritten or printed text in many font styles)...

It's a common fact that these informal abbreviations (using final "ème", "ère", in superscripts or not) ARE REALLY frequently used (examples are easy to find); they are handwritten or composed in word processors or even in web editors, because it's so simple to transform them with superscripts. And this happens even though the preferred forms use the shorter abbreviations "1er", "2e", which need no accent (the same also occurs with ordinals using Roman digits). Note that the same abbreviations are ALSO found without superscripts, such as "1er", "1re" (or "1ère"), "2e" (or "2ème"; and when it is the last in a pair: "2nd", "2nde" or "2de"): this clearly demonstrates that this is just a preferred typographic style for the final letters of abbreviations, and not a separate encoding of the same letters. But not for n°, d°, r°, v°, d° (using a plain final o after the abbreviated first letters would create confusion; the degree sign is then highly preferred to the absence of superscript, even if the superscript o would be better).

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From irgendeinbenutzername at gmail.com Thu Oct 6 09:54:07 2016
From: irgendeinbenutzername at gmail.com (Charlotte Buff)
Date: Thu, 6 Oct 2016 16:54:07 +0200
Subject: Dealing with Unencodeable Characters
Message-ID: 

One of Unicode's goals is round-trip compatibility with old legacy character sets, which is why many compatibility characters that would normally have been out of scope for the standard were gathered over time. It's why Zapf Dingbats and Arabic presentation forms are in Unicode, for example.
However, there are some characters that form part of these sets yet are deliberately not encoded in Unicode because they were considered unsuitable for inclusion. The two that come to mind are the Windows logo from Wingdings and the Shibuya 109 emoji from the original Japanese vendor sets.

Given that these two have no Unicode equivalents, their source character sets are not fully compatible with Unicode, i.e. there is going to be data loss and confusion when trying to convert into or from Unicode.

If, theoretically, I wanted to convert an old Shift JIS document containing emoji to Unicode, how should I ideally handle Shibuya 109?

I remember the early emoji proposal documents originally contained "emoji compatibility symbols" which were used to map to source characters that weren't meant to be included with a specified semantic. I believe STATUE OF LIBERTY was one of those characters and was simply called EMOJI COMPATIBILITY SYMBOL-XX so that that specific landmark wouldn't strictly be part of Unicode. Obviously this approach ultimately wasn't implemented, but I wonder whether there could be designated compatibility characters for this kind of issue. Private use characters are an obvious choice, but of course their meaning is user-defined, so while all the other emoji in my Shift JIS document would receive an unambiguous Unicode mapping, Shibuya 109 would remain vague and very limited in interchange options.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From frederic.grosshans at gmail.com Thu Oct 6 09:55:32 2016
From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=)
Date: Thu, 6 Oct 2016 16:55:32 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

On 06/10/2016 at 09:21, Marcel Schneider wrote:
>
> I never saw that. Would you show us some examples to look up? I'm curious
> whether they could be managed without accented superscripts.
> Anyway, combining diacritics should be placeable on superscripts as well.

Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04). The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is nothing to add.

Frédéric

From jkorpela at cs.tut.fi Thu Oct 6 10:53:04 2016
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Thu, 6 Oct 2016 18:53:04 +0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

6.10.2016, 17:55, Frédéric Grosshans wrote:
> On 06/10/2016 at 09:21, Marcel Schneider wrote:
>>
>> I never saw that. Would you show us some examples to look up? I'm
>> curious
>> whether they could be managed without accented superscripts.
>> Anyway, combining diacritics should be placeable on superscripts as well.
> Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04)
> The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is
> nothing to add.

It's fine that it works in some environment(s), but it would be unrealistic to expect it to work generally.
In most environments, assuming the font used supports the characters involved in the first place, the result is probably a grave accent struck over the superscript e, in a rather ugly way.

Even though Unicode superscript (and subscript) characters have a lot of practical use in many contexts, this isn't really one of them. In a case like this, in most environments, and especially if you want the text to display well in different environments, the solution is to use just "3ème", perhaps with some method ("above" the character level) used to format the letters as superscript when not limited to plain text – but I'm afraid most fonts don't have a superscript glyph for "è" available, so it would usually be best to give up the superscripting idea here.

Yucca

From oren.watson at gmail.com Thu Oct 6 11:04:17 2016
From: oren.watson at gmail.com (Oren Watson)
Date: Thu, 6 Oct 2016 12:04:17 -0400
Subject: Fwd: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

---------- Forwarded message ----------
From: Oren Watson
Date: Thu, Oct 6, 2016 at 12:03 PM
Subject: Re: Why incomplete subscript/superscript alphabet ?
To: "Jukka K. Korpela"

If this is a real need, why not petition more software to allow the use of the U+8C partial line up and U+8B partial line down characters for this purpose?

On Thu, Oct 6, 2016 at 11:53 AM, Jukka K. Korpela wrote:
> 6.10.2016, 17:55, Frédéric Grosshans wrote:
>
> On 06/10/2016 at 09:21, Marcel Schneider wrote:
>>
>>> I never saw that. Would you show us some examples to look up? I'm
>>> curious
>>> whether they could be managed without accented superscripts.
>>> Anyway, combining diacritics should be placeable on superscripts as well.
>>
>> Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04)
>> The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is
>> nothing to add.
>
> It's fine that it works in some environment(s), but it would be
> unrealistic to expect it to work generally. In most environments, assuming
> the font used supports the characters involved in the first place, the
> result is probably a grave accent struck over the superscript e, in a
> rather ugly way.
>
> Even though Unicode superscript (and subscript) characters have a lot of
> practical use in many contexts, this isn't really one of them. In a case
> like this, in most environments, and especially if you want the text to
> display well in different environments, the solution is to use just "3ème",
> perhaps with some method ("above" the character level) used to format the
> letters as superscript when not limited to plain text – but I'm afraid
> most fonts don't have a superscript glyph for "è" available, so it would
> usually be best to give up the superscripting idea here.
>
> Yucca

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kenwhistler at att.net Thu Oct 6 11:27:13 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 6 Oct 2016 09:27:13 -0700
Subject: Fwd: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>

On 10/6/2016 9:04 AM, Oren Watson wrote:
> If this is a real need, why not petition more software to allow the
> use of the U+8C partial line up and U+8B partial line down characters
> for this purpose?

Because U+008C and U+008B are relics from the days when control codes were used in terminal control protocols and to drive print trains in devices like this:

https://en.wikipedia.org/wiki/Line_printer#/media/File:IBM_line_printer_1403.JPG

Their functions have been completely overtaken by markup conventions such as <sup>...</sup> and <sub>...</sub>, which *are* widely supported already, even in most email clients, ri^ght out of the b_ox.

And I suspect that Yucca's statement "so it would usually be best to give up the superscripting idea here" is intended to mean: give up on asking for a separately encoded superscript character for each Latin letter, including accented ones (or applying accents to separately encoded superscript letters). Because, after all, this stuff already just works: «3^ème» (and not «3ᵉ̀ᵐᵉ», by the way!).

--Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oren.watson at gmail.com Thu Oct 6 11:32:07 2016
From: oren.watson at gmail.com (Oren Watson)
Date: Thu, 6 Oct 2016 12:32:07 -0400
Subject: Fwd: Fwd: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
Message-ID: 

I meant: petition, say, the devs of Konsole, iTerm, xterm, etc., and other programs which deal purely in plain text, to support the 8B and 8C characters for formatting. Markup doesn't exist everywhere.

On Thu, Oct 6, 2016 at 12:27 PM, Ken Whistler wrote:
>
> On 10/6/2016 9:04 AM, Oren Watson wrote:
>
> If this is a real need, why not petition more software to allow the use of
> the U+8C partial line up and U+8B partial line down characters for this
> purpose?
>
> Because U+008C and U+008B are relics from the days when control codes were
> used in terminal control protocols and to drive print trains in devices
> like this:
>
> https://en.wikipedia.org/wiki/Line_printer#/media/File:IBM_line_printer_1403.JPG
>
> Their functions have been completely overtaken by markup conventions such
> as <sup>...</sup> and <sub>...</sub>, which *are* widely supported
> already, even in most email clients, right out of the box.
>
> And I suspect that Yucca's statement "so it would usually be best to give
> up the superscripting idea here" is intended to mean give up on asking for
> a separately encoded superscript character for each Latin letter, including
> accented ones (or applying accents to separately encoded superscript
> letters). Because, after all, this stuff already just works: «3ème» (and
> not «3ᵉ̀ᵐᵉ», by the way!).
>
> --Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charupdate at orange.fr Thu Oct 6 13:03:32 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 6 Oct 2016 20:03:32 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: <1098989413.14438.1475777012613.JavaMail.www@wwinf1f05>

On Thu, 6 Oct 2016 16:55:32 +0200, Frédéric Grosshans wrote:
[…]
>> Anyway, combining diacritics should be placeable on superscripts as well.
> Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04)
> The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is
> nothing to add.

As others pointed out, this also depends on the font. In my webmail and in my text editor, the accent displays above the m, struck across the upper edge of the superscript letter.

The French Standards body is asking for a facility on the keyboard to input the French ordinal indicator, basically a superscript e, as a plain text character: XXᵉ siècle [20th century, or since we are in it: 20ᵗʰ century]. There is no recommended use of accents when writing French ordinals. See this shocking image (the neon sign was deprecated *and* faulty): https://twitter.com/XimeLelong/status/776448216346791936

Regards, Marcel

From kenwhistler at att.net Thu Oct 6 13:03:25 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 6 Oct 2016 11:03:25 -0700
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
Message-ID: <7fdd20ef-d309-c089-e2cc-11df024da44f@att.net>

On 10/6/2016 9:32 AM, Oren Watson wrote:
> I meant, petition say the devs of Konsole, iTerm, xterm etc, and other
> programs which deal purely in plain text to support 8b and 8c
> characters for formatting. Markup doesn't exist everywhere.

Fair enough. But most actual terminals didn't support partial line advances (although line printers and electric typewriter terminals could): http://www.ccs.neu.edu/research/gpc/MSim/vona/terminal/VT100_Escape_Codes.html so there would seem to be little call for terminal emulators to do so in such cases. (And by the way, it is arguable that markup *does* exist for terminals. After all, that is what character attribute controls like ^[[1m for bold mode are all about.)

And *consoles*, which pretty much by definition do *un*formatted text, are poor contexts to try to fancy up with out-of-scope formatting requirements. In general I fail to see any significant ROI for this kind of requirement. Trying to patch up consoles with hacks to deal with Latin superscripts and subscripts is just another scheme that will run up on the rocks at the very next formatting requirement thrown at it – or for that matter, when attempting to render plain text in nearly *any* complex script encoded in Unicode.

--Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From verdy_p at wanadoo.fr Thu Oct 6 13:06:56 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 6 Oct 2016 20:06:56 +0200
Subject: Dealing with Unencodeable Characters
In-Reply-To: 
References: 
Message-ID: 

PUA characters are still used when mapping corporate logos (from Windows and Apple/MacOS) in fonts for the relevant systems. Microsoft then opted to include these corporate logos (and specific UI icons) in a separate font, also with PUA mappings, and then added new PUA fonts as needed. E.g.:

* "Segoe MDL2 Assets" on Windows 10 (even if many of these characters and symbols are also encoded separately with standard codes, only to make sure they have a coherent design and metrics instead of taking them from various random fonts). There are for example icons representing battery levels, wifi reception levels with bars, status icons for muting on/off some devices or UI services for talks, cameras, selection of screen, enabling/disabling the touch interface, displaying the state of headphones, presenting incoming phone calls or keeping them silent... and several variants of common arrows and common geometric symbols, or even some characters for the Windows calculator such as common arithmetic signs. You'll note many variants of arrow heads. Maybe these characters are also used internally as fallbacks in IE/Edge, but all this is left completely undocumented (voluntarily, in my opinion, to make sure that other users will not create and exchange documents intended to be interoperable).

* "Webdings" contains various elaborate icons that are designed to be realistic rather than symbolic, sometimes in several locale-sensitive variants (e.g. the Earth globe, centered on America, on Europe/Africa, or on Asia/Australia). Here again you'll find various arrow heads for displaying UI buttons.
* "Wingdings" and "Wingdings 2" again map various forms of arrows and arrow heads, plus some emoji, enclosed characters, and decorative characters. "Wingdings" also includes another Windows logo at position 0xFF; these fonts are not mapped to Unicode but to 8-bit code positions 0x21..0xFF.

* "Wingdings 3" uses a mix of non-Unicode mappings in 0x21..0xFF and some characters at other regular Unicode positions (in 0x2000..0x9FFF) multiple times (every block of 0x100 code positions, i.e. each glyph is mapped 128 or 129 times in that font). None of these characters have a Unicode mapping.

* You probably remember the case of the "Marlett" font created to support the UI of Windows 7 (but most positions are assigned to .notdef/"tofu") and that has position 0x57 mapped to a Windows logo.

There's also an old font "MT Extra" made by MathType (in 1996 according to its details), containing some math symbols (probably still used by some modules of the equation editor for compatibility with documents created with old versions of Office). These two fonts use only 8-bit code mappings (in 0x21..0xFF, but most of them are mapped to a .notdef/"tofu" glyph).

Such fonts are installed and used by specific software modules, at discrete font sizes, and are not even hinted (they could as well use collections of scalable vector graphics, but a single font allows these symbols to be loaded more efficiently and to be hinted for low-resolution display at small font sizes). They may still be used in other applications, but without any guarantee of interoperability or support for upgrades/downgrades across Windows versions. In fact these fonts are not really supported outside of the specific software modules needing them to render their UI. They may disappear or change significantly at any time.
2016-10-06 16:54 GMT+02:00 Charlotte Buff :
> One of Unicode's goals is round-trip compatibility with old legacy
> character sets, which is why we gathered many compatibility characters over
> time that would normally have been out of scope for the standard. It's why
> Zapf Dingbats and Arabic presentation forms are in Unicode for example.
> However, there are some characters that form part of these sets yet are
> deliberately not encoded in Unicode because they were considered unsuitable
> for inclusion. The two that come to mind are the Windows logo from
> Wingdings and the Shibuya 109 emoji from the original Japanese vendor sets.
>
> Given that these two have no Unicode equivalents, their source character
> sets are not fully compatible with Unicode, i.e. there is going to be data
> loss and confusion when trying to convert into or from Unicode.
>
> If theoretically I wanted to convert an old Shift JIS document containing
> emoji to Unicode, how should I ideally handle Shibuya 109?
>
> I remember the early emoji proposal documents originally contained "emoji
> compatibility symbols" which were used to map to source characters that
> weren't meant to be included with a specified semantic. I believe STATUE OF
> LIBERTY was one of those characters and was simply called EMOJI
> COMPATIBILITY SYMBOL-XX so that that specific landmark wouldn't strictly be
> part of Unicode. Obviously this approach ultimately wasn't implemented,
> but I wonder whether there could be designated compatibility characters for
> this kind of issue. Private use characters are an obvious choice but of
> course their meaning is user-defined, so while all other emoji in my Shift
> JIS document would receive an unambiguous Unicode mapping, Shibuya 109
> would remain vague and very limited in interchange options.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charupdate at orange.fr Thu Oct 6 13:14:13 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 6 Oct 2016 20:14:13 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
Message-ID: <1885626438.14631.1475777653900.JavaMail.www@wwinf1f05>

On Thu, 6 Oct 2016 09:27:13 -0700, Ken Whistler wrote:
[…]
> Their functions have been completely overtaken by markup conventions
> such as <sup>...</sup> and <sub>...</sub>, which *are* widely supported
> already, even in most email clients, ri^ght out of the b_ox.
>
> And I suspect that Yucca's statement "so it would usually be best to
> give up the superscripting idea here" is intended to mean give up on
> asking for a separately encoded superscript character for each Latin
> letter, including accented ones (or applying accents to separately
> encoded superscript letters). Because, after all, this stuff already
> just works: «3^ème» (and not «3ᵉ̀ᵐᵉ», by the way!).

High-level formatting in high-end mail clients is of little use when the target environment is plain text. It's still unambiguous, though.

As for superscript "è", I had asked for it as early as 2014, and I fully understood that Unicode no longer encourages proposals of any *new* precomposed characters. This was before I learned that "3ème" is not good French. These long ordinal indicators are deprecated.

Regards, Marcel

From verdy_p at wanadoo.fr Thu Oct 6 13:16:36 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 6 Oct 2016 20:16:36 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

It does not render very well: the accent is not correctly positioned vertically (far too high) above the superscript e, and it collides with the previous line of text at normal line-height, because fonts do not support this pair with proper positioning. The combination is just rendered in some "best effort" way by the text renderer of my browser. When used in the Windows UI, the accent collides with the following superscript "m".

Let's not talk about how you would superscript a "???" (very poor positioning if using combining characters) or "????" (the result would be misleading with most fonts if using combining characters) or "???" (impossible)...

2016-10-06 16:55 GMT+02:00 Frédéric Grosshans :
> On 06/10/2016 at 09:21, Marcel Schneider wrote:
>
>> I never saw that. Would you show us some examples to look up? I'm
>> curious
>> whether they could be managed without accented superscripts.
>> Anyway, combining diacritics should be placeable on superscripts as well.
>
> Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04)
> The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is nothing
> to add.
>
> Frédéric

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jkorpela at cs.tut.fi Thu Oct 6 13:20:22 2016
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Thu, 6 Oct 2016 21:20:22 +0300
Subject: Fwd: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
Message-ID: 

6.10.2016, 19:27, Ken Whistler wrote:

> Their functions have been completely overtaken by markup conventions
> such as <sup>...</sup> and <sub>...</sub>, which *are* widely supported
> already, even in most email clients, ri^ght out of the b_ox.

They are widely supported, but very widely in a typographically inferior way. This is essential especially when it comes to things like "3ème", where one might want to display the letters in superscript style as a matter of typography.

> And I suspect that Yucca's statement "so it would usually be best to
> give up the superscripting idea here" is intended to mean give up on
> asking for a separately encoded superscript character for each Latin
> letter, including accented ones

Not quite. Adding superscript characters for all Latin letters is not a good idea at all, but I was not referring to that. Instead, I suggested that in a case like "3ème", it's best to give up the idea of superscripting the letters using any techniques available now (including e.g. markup), in most situations. Flat rendering of "3ème" is better than a typographically poor rendering with superscripts.

> Because, after all, this stuff already
> just works: «3^ème» (and not «3ᵉ̀ᵐᵉ», by the way!).

It works for a rather limited range of values of "works". I'm not sure what happens in my reply... it seems that Thunderbird does something funny here. Anyway, what I saw in my Thunderbird when <sup> is used is "ème"
in a slightly reduced font in an elevated position, messing up line spacing and looking rather different from superscript glyphs designed by a typographer.

Independently of the technique used to ask software to show something as a superscript (e.g. using a superscript character code point in Unicode, using <sup>, using superscript formatting in a word processor, or using ^{...} in TeX), typographically accepted rendering must use a superscript glyph, designed by a typographer to match the overall style of the font, or maybe a sophisticated algorithm that constructs the rendering from a normal glyph.

In a sense, superscript code points make this easier: the rendering can simply pick up the corresponding glyph from the font – if it has one (a big "if"). But this is not a good argument in favor of adding such points en masse. It is, however, a good argument in favor of using existing superscript code points, like "ᵉ", with good font support.

Yucca

From kenwhistler at att.net Thu Oct 6 13:30:52 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 6 Oct 2016 11:30:52 -0700
Subject: Dealing with Unencodeable Characters
In-Reply-To: 
References: 
Message-ID: <24956e36-247f-7d70-5e81-691f320f8435@att.net>

On 10/6/2016 7:54 AM, Charlotte Buff wrote:
> If theoretically I wanted to convert an old Shift JIS document
> containing emoji to Unicode, how should I ideally handle Shibuya 109?

And the general answer to that is: convert to U+FFFD, unless you are doing something specific and know what you are doing... in which case you can use PUA or insert an image, or whatever else you need to do.
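The advice above (map an otherwise unconvertible source character to U+FFFD) is what standard codec machinery already does with a "replace" error handler. A minimal Python sketch (an editorial illustration, not part of the original mails); the byte 0x80, which has no assignment in the plain shift_jis codec, stands in here for any unmappable cell, since the real Shibuya 109 emoji lived at a carrier-specific code point not reproduced here:

```python
# Decoding Shift JIS data containing an unmappable byte:
# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER,
# preserving the rest of the text instead of failing outright.
data = b"109\x80"  # 0x80 is unassigned in plain shift_jis
text = data.decode("shift_jis", errors="replace")
print(repr(text))  # the unmappable byte becomes U+FFFD
```

The default strict handler would instead raise UnicodeDecodeError, which is the right choice when silent data loss is unacceptable.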
This is not a character *standardization* issue that requires the UTC to come up with a generic interchange solution for every pre-Unicode character encoding of everything that ever was, whether it be some oddball Shift JIS extensions that were omitted in the consensus on encoding the Japanese Carrier Emoji: http://www.unicode.org/reports/tr51/tr51-7.html#Japanese_Carrier or other odds and ends from bizarre, dead-end, disused character encodings from a previous generation. By the way, the biggest ongoing problem we deal with here is the continuing urge to proliferate font-encoded hacks for particular languages and writing systems. The text interchange problems that such schemes pose on an ongoing basis far far outweigh issues like what to do with a Shibuya 109 emoji, imo. --Ken From doug at ewellic.org Thu Oct 6 14:02:20 2016 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Oct 2016 12:02:20 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161006120220.665a7a7059d7ee80bb4d670165c8327d.f785149136.wbe@email03.godaddy.com> >> Like ?3????? ? It already works on my laptop (Thunderbird in Ubuntu >> 16.04) The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there >> is nothing to add. > > It does not render very well, the accent is not correctly positioned > vertically (far too high) above the superscript e and colliding with > the previous line of text at normal line-height, because fonts do not > support this pair with proper positioning. http://www.unicode.org/faq/char_combmark.html#12b Poor display support today is not supposed to be a rationale for permanently encoding new precomposed letters. 
-- Doug Ewell | Thornton, CO, US | ewellic.org From doug at ewellic.org Thu Oct 6 14:03:29 2016 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Oct 2016 12:03:29 -0700 Subject: Dealing with Unencodeable Characters Message-ID: <20161006120329.665a7a7059d7ee80bb4d670165c8327d.d0bdde4c26.wbe@email03.godaddy.com> > * "Wingdings", "Wingdings 2", are here again maaping various forms of > arrows and arrow heads, plus some emojis or enclosed characters, or > decorative characters. "Wingdings" also includes another Windows logo > at position 0xFF; these fonts are not mapped to Unicode but to 8-bit > code positions 0x21..0xFF. > * "Wingdings 3" uses a mix of non-Unicode mappings in 0x21..0xFF and > some characters and other regular Unicode positions (in 0x2000.. > 0X9FFF) multiple times (every block of 0x100 code positions, i.e. each > glyph is mapped 128 or 129 times in that font). None of these > characters have a Unicode mapping. It's true that the Wingdings and Webdings fonts themselves, which date back to the 1990s, are "symbol fonts" with glyphs mapped to the ASCII range. However, to clear up any possible confusion, all glyphs in these fonts have had actual Unicode mappings since version 7.0 (June 2014). -- Doug Ewell | Thornton, CO, US | ewellic.org From doug at ewellic.org Thu Oct 6 14:06:07 2016 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Oct 2016 12:06:07 -0700 Subject: Dealing with Unencodeable Characters Message-ID: <20161006120607.665a7a7059d7ee80bb4d670165c8327d.31d43928da.wbe@email03.godaddy.com> Charlotte Buff wrote: > Private use characters are an obvious choice but of course their > meaning is user-defined, so while all other emoji in my Shift JIS > document would receive an unambiguous Unicode mapping, Shibuya 109 > would remain vague and very limited in interchange options. 
But that's exactly what private-use characters were invented for: so you can represent characters in a given character encoding framework which are not encoded for some reason. Of course you need a private agreement of some kind, but it can be as simple as "Hey, everybody, in the attached document (or in any documents I create) U+FF109 means SHIBUYA 109." Private agreements don't have to be secret or limited-distribution, and they don't have to be excessively formal. Unicode rejected the "compatibility symbols" because they would have amounted to private-use characters defined by Unicode, where the formal names and definitions of the characters were not specified but, shhh, we all know what they REALLY mean. This would have been the Wrong Thing to Do on many levels. -- Doug Ewell | Thornton, CO, US | ewellic.org From verdy_p at wanadoo.fr Thu Oct 6 14:21:00 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Oct 2016 21:21:00 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <20161006120220.665a7a7059d7ee80bb4d670165c8327d.f785149136.wbe@email03.godaddy.com> References: <20161006120220.665a7a7059d7ee80bb4d670165c8327d.f785149136.wbe@email03.godaddy.com> Message-ID: 2016-10-06 21:02 GMT+02:00 Doug Ewell : > >> Like ?3????? ? It already works on my laptop (Thunderbird in Ubuntu > >> 16.04) The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there > >> is nothing to add. > > > > It does not render very well, the accent is not correctly positioned > > vertically (far too high) above the superscript e and colliding with > > the previous line of text at normal line-height, because fonts do not > > support this pair with proper positioning. > > http://www.unicode.org/faq/char_combmark.html#12b > > Poor display support today is not supposed to be a rationale for > permanently encoding new precomposed letters. 
> I've not asked for that; I just wanted to comment on the fact that using subscripts encoded for compatibility with legacy standards or specific uses (such as IPA), followed by random combining diacritics not designed for this usage, is not the way to go. Generic styling markup (appropriate for each kind of document) is the way to go. For abbreviations in plain-text files, it is often better not to even try to render these superscript styles, to use no additional markup at all, and to simply use the full range of letters for the relevant scripts. -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Thu Oct 6 14:39:01 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Oct 2016 21:39:01 +0200 Subject: Dealing with Unencodeable Characters In-Reply-To: <20161006120329.665a7a7059d7ee80bb4d670165c8327d.d0bdde4c26.wbe@email03.godaddy.com> References: <20161006120329.665a7a7059d7ee80bb4d670165c8327d.d0bdde4c26.wbe@email03.godaddy.com> Message-ID: 2016-10-06 21:03 GMT+02:00 Doug Ewell : > > * "Wingdings", "Wingdings 2", are here again mapping various forms of > > arrows and arrow heads, plus some emojis or enclosed characters, or > > decorative characters. "Wingdings" also includes another Windows logo > > at position 0xFF; these fonts are not mapped to Unicode but to 8-bit > > code positions 0x21..0xFF. > > * "Wingdings 3" uses a mix of non-Unicode mappings in 0x21..0xFF and > > some characters at other regular Unicode positions (in 0x2000.. > > 0x9FFF) multiple times (every block of 0x100 code positions, i.e. each > > glyph is mapped 128 or 129 times in that font). None of these > > characters have a Unicode mapping. > > It's true that the Wingdings and Webdings fonts themselves, which date > back to the 1990s, are "symbol fonts" with glyphs mapped to the ASCII > range.
However, to clear up any possible confusion, all glyphs in these > fonts have had actual Unicode mappings since version 7.0 (June 2014). > These mappings exist theoretically but not in these fonts themselves (notably not when there are multiple variants of the same encoded characters, notably for many arrows and arrow heads). The 3 glyphs for the Earth globe (centered on the Americas, on Europe+Africa, or on South/East Asia+Australia) are not distinguished at all in Unicode (I've not seen any sequence with variation selectors to help distinguish them, and there are some fonts showing the Earth globe centered on the Antarctic). Unicode also seems to allow the character to show a flat Mercator map centered on these positions, or other projections, as the encoded character just means "Earth". So no, the mappings are theoretical and allow wide variations that these fonts purposely want to distinguish. They are used directly, without any Unicode mapping, for internal implementation reasons, for specific meanings in specific applications, or because this makes a coherent graphical design for a UI (fonts are used for this purpose, but many applications do not need fonts for this usage: they just use collections of icons, frequently packed in a ZIP/JAR archive, or selected with CSS selectors in SVG files, or hidden in their graphic source code by directly using drawing APIs, in which they can add custom visual effects such as animations, glowing, transparency, custom superpositions and compositions, custom layouts, and interaction with user events or application events and states). Using the Unicode mappings in these fonts would not allow selecting the appropriate distinguished glyphs; the UI would become confusing or no longer usable, or would create an ugly patchwork. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From gwalla at gmail.com Thu Oct 6 14:44:05 2016 From: gwalla at gmail.com (Garth Wallace) Date: Thu, 6 Oct 2016 12:44:05 -0700 Subject: Bit arithmetic on Unicode characters? Message-ID: Other than converting between UTFs, is bit arithmetic commonly performed on Unicode characters? I was under the impression that it's a rarity if it is done at all. I've been working on a proposal for additional chess symbols used in chess problems and variant games, and I've been in communication with the World Federation for Chess Composition, which is the international organization in charge of chess problems. We have agreement on the repertoire and the text of the proposal, but the arrangement of the proposed characters within the new block is a sticking point. Some representatives of the WFCC have proposed alternate arrangements that assume there will be a need for bitwise operations to convert between the existing chess symbols in the Miscellaneous Symbols block and related symbols in the new block. I don't see the need but maybe I'm missing something. -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.paeper at crissov.de Thu Oct 6 14:48:25 2016 From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=) Date: Thu, 6 Oct 2016 21:48:25 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> Message-ID: Jukka K. Korpela : > > … the solution is to use just “3ème”, perhaps with some method (“above” the character level) used to format the letters as superscript, when not limited to plain text …
For ordinal numbers, it's relatively simple to code language-dependent glyph substitution in OpenType, which would not require any additional effort from the author: “3ème” would just work, while “3e” → “3ᵉ” would require some extra care to avoid false positives. Letter-only abbreviations, however, would only work reliably with an added marker. Many languages conventionally written in the Roman script, including English, choose an apostrophe, but inter-letter periods are also not unheard of. That means “M’me” and “M.me” could also be easily converted to “Mᵐᵉ” on a font/glyph level. If the OTF feature used is supported and active, this will work in plain-text environments, but, of course, it depends on the font. From doug at ewellic.org Thu Oct 6 15:01:01 2016 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Oct 2016 13:01:01 -0700 Subject: Dealing with Unencodeable Characters Message-ID: <20161006130101.665a7a7059d7ee80bb4d670165c8327d.becafe8546.wbe@email03.godaddy.com> Philippe Verdy wrote: > The 3 glyphs for the Earth globe (centered on the Americas, on > Europe+Africa, or on South/East Asia+Australia) are not distinguished at > all in Unicode (I've not seen any sequence with variation selectors to > help distinguish them, 0xFC through 0xFE in Webdings are: 1F30D;EARTH GLOBE EUROPE-AFRICA;So;0;ON;;;;;N;;;;; 1F30F;EARTH GLOBE ASIA-AUSTRALIA;So;0;ON;;;;;N;;;;; 1F30E;EARTH GLOBE AMERICAS;So;0;ON;;;;;N;;;;; I was asked not to publish my mapping tables (which were taken from one of the final versions of the proposal) because they wouldn't have been provided directly by Microsoft. But let me know if you need any additional mappings on a one-off basis. *All glyphs in the Wingdings and Webdings fonts have had actual Unicode mappings since version 7.0 (June 2014).* > and there are some fonts showing the Earth globe centered on the > Antarctic).
Sorry, I must have missed the part in http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0058.html where you were talking about that. -- Doug Ewell | Thornton, CO, US | ewellic.org From charupdate at orange.fr Thu Oct 6 15:12:24 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 6 Oct 2016 22:12:24 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net> Message-ID: <2088087407.16544.1475784744365.JavaMail.www@wwinf1f05> On Thu, 6 Oct 2016 21:20:22 +0300, Jukka K. Korpela wrote: > In a sense, superscript code points make this easier: the rendering can > simply pick up the corresponding glyph for the font – if it has one (a > big “if”). But this is not a good argument in favor of adding such > points en masse. It is, however, a good argument in favor of using > existing superscript code points that have good font support. The topic was mainly about completing the Latin alphabet with the missing superscript (lowercase) and subscript characters, and possibly small caps. As for me and many others, we were not asking for more than that. And IMHO, this is not too much to ask, after the many styled mathematical alphabets (bold, italic, script, fraktur, double-struck, and so on). And no diacriticised letters are required as superscripts to fully support the French language in Unicode. I like very much your recommendation of *simplicity.* On a web page or so, you can do a lot with CSS. On the other hand, every language should be able to be written in plain text following its specificities. For French, that means that superscripts as abbreviation indicators are required in plain text.
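The plain-text abbreviation indicators Marcel has in mind can be flattened back to baseline letters mechanically, because the existing modifier letters carry <super> compatibility decompositions that NFKC folds away (a minimal sketch, not from the thread, assuming Python's standard `unicodedata` module; “Mᵐᵉ” is the French Madame example discussed here):

```python
import unicodedata

# U+1D50 MODIFIER LETTER SMALL M and U+1D49 MODIFIER LETTER SMALL E carry
# <super> compatibility decompositions in the UCD, so NFKC normalization
# folds a plain-text abbreviation such as "M\u1d50\u1d49" (Mᵐᵉ) down to
# the baseline letters "Mme".
def flatten_superscripts(text: str) -> str:
    return unicodedata.normalize("NFKC", text)
```

The reverse direction (deciding which baseline letters should become superscripts again) is the hard part this argument turns on; no normalization form recovers it.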
This is not a pressing need for digits, just as it isn't in English. But it is in French for titles, common nouns, and so on. One other advantage of plain-text abbreviations with superscripts is that you are able to search-and-replace the indicators with formatted baseline letters when the layout is made up. The reverse is much harder, if not impossible once the formatting is lost. It's about the stability of the writing system. The French recommendation is *not* to use long ordinal indicators, only one or exceptionally two letters. What can be called “a hack” is using the degree sign to ape a superscript small o. This very year 2016, there *can* be an end to those workarounds, since finally, our country is about to be given several *official*, decent keyboards (keyboard layouts). Regards, Marcel From charupdate at orange.fr Thu Oct 6 15:19:35 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 6 Oct 2016 22:19:35 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> Message-ID: <541186650.16609.1475785175378.JavaMail.www@wwinf1f05> On Wed, 5 Oct 2016 06:35:52 +0000, Martin Mueller wrote: […] > That said, given that alphabets have fixed numbers, it's weird > that bits of super and subscripted letters appear in this or > that limited range but that you can't cobble a whole alphabet > together in a consistent manner. Indeed your point looked good to me, and it does again. Here's why: > If any, why not all, especially > if there are only two or three dozen. Phonetics typically use Latin script as a basis.
Just as mathematics uses bold, italic, script, sans-serif and double-struck, phonetics uses superscript, subscript, and small caps. From a Unicode viewpoint, phonetics is no less important than mathematics. Mathematicians have been granted more than a dozen complete or near-complete alphabets of preformatted characters. Phoneticists have never been granted any complete alphabet. They must always prove their needs in detail, whereas mathematicians have full liberty in choosing variables. According to my hypothesis, and while waiting, I believe that the intent of the gap kept in the superscript lowercase range is to maintain a limitation on the performance of plain text. I don't see very well how to apply Hanlon's razor here, because there seems to be a strong unwillingness to see people getting keyboards that allow them to write in plain text without being bound to high-end software. The goal seems to be to keep users dependent on a special formatting feature and to draw them away from simplicity. This results clearly from the weird arguments that were thrown against the proposal of *MODIFIER LETTER SMALL Q. The comment on behalf of Adobe bore only a slight resemblance to a comment on the proposal as such, […]. Trying to sum up: by encoding these few characters, there would indeed be a door thrown wide open. However, it has then been pointed out that there would be *no rush* through that door. Regards, Marcel From verdy_p at wanadoo.fr Thu Oct 6 15:22:58 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Oct 2016 22:22:58 +0200 Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> Message-ID: 2016-10-06 21:48 GMT+02:00 Christoph Päper : > > For ordinal numbers, it's relatively simple to code language-dependent > glyph substitution in OpenType, which would not require any additional > effort from the author: “3ème” would just work, while “3e” → “3ᵉ” would require > some extra care to avoid false positives. Letter-only abbreviations, > however, would only work reliably with an added marker. Many languages > conventionally written in the Roman script, including English, > choose an apostrophe, but inter-letter periods are also not unheard of. > That means “M’me” and “M.me” could also be easily converted to “Mᵐᵉ” on a > font/glyph level. If the OTF feature used is supported and active, this > will work in plain-text environments, but, of course, it depends on the > font. > The *standard* French abbreviation for Madame is NOT "M'me" or "M.me" but "Mme", without confusion; the superscript on the final letters "me" is optional. False positives on "3e" are extremely rare, and writing it as “3ᵉ” does not change the isolated ambiguities that could exist with a custom numbering. (For numbering section headers, the title is separated by punctuation, or there is some context, such as its presence in a numbered list, the presence of explicit words such as articles ("le 3e"), and the grammatical syntax of sentences.) But if semantics is your issue, we could insert an invisible Unicode mark of abbreviation (notably an invisible abbreviation dot, which may be rendered as a dot in some contexts where distinctions by styles cannot be used, or could be rendered by using superscripts for the letters glued after it).
We have such characters for mathematics: the invisible addition and invisible multiplication marks (to disambiguate cases in formulas, such as a number followed by a fraction: does "3 1/2" mean 3.5 or 1.5?). -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Thu Oct 6 16:00:15 2016 From: asmusf at ix.netcom.com (Asmus Freytag (c)) Date: Thu, 6 Oct 2016 14:00:15 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: Message-ID: <588d7cd6-4037-218a-5c32-3d2ddc0e2c6d@ix.netcom.com> An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Thu Oct 6 16:07:00 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Oct 2016 23:07:00 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: Message-ID: As far as we know, arithmetic is performed only on:
- subsets of decimal digits, in ASCII and for a dozen scripts, converting automatically between them using a single additive constant for the 10 digits;
- Basic Latin/ASCII, for mapping lettercases and for mapping non-decimal digits (offsetting values of 10 and above into the letters A..Z after 0..9);
- the subset of precomposed syllables in Hangul (needed also for checking canonical equivalences and for the standard NFC/NFD normalizations, and partly for implementing NFKC/NFKD normalizations and collation);
- in all other cases, this is not reliable at all (characters may still be allocated in unused slots without any relation to case mappings, e.g. the slot in the basic Greek alphabet where the final sigma is only encoded in lowercase, or the Turkic distinction of dotted I and undotted i): you'll need proper mapping tables.
- for symbols which could benefit of it (such as box-drawing characters), it is not used, except for Braille patterns, or for mapping between black and white versions of chess pieces, or mapping between comparable mahjong tiles series in their basic set (but not necessarily with the same constant in extended sets, as it would have required allocating them in more columns than strictly needed), or for ASCII letters with mapping mathematical variants of Latin letters or RIS symbols or wide variants for CJK. 2016-10-06 21:44 GMT+02:00 Garth Wallace : > Other than converting between UTFs, is bit arithmetic commonly performed > on Unicode characters? I was under the impression that it's a rarity if it > is done at all. > > I've been working on a proposal for additional chess symbols used in chess > problems and variant games, and I've been in communication with the World > Federation for Chess Composition, which is the international organization > in charge of chess problems. We have agreement on the repertoire and the > text of the proposal, but the arrangement of the proposed characters within > the new block is a sticking point. Some representatives of the WFCC have > proposed alternate arrangements that assume there will be a need for > bitwise operations to covert between the existing chess symbols in the > Miscellaneous Symbols block and related symbols in the new block. I don't > see the need but maybe I'm missing something. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.paeper at crissov.de Thu Oct 6 16:08:52 2016 From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=) Date: Thu, 6 Oct 2016 23:08:52 +0200 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> Message-ID: <543B726F-F559-44A3-9ACB-84261E77A7A2@crissov.de> Philippe Verdy : > > But if semantics is your issue, we could insert an invisible Unicode mark of abbreviation (notably an invisible abbreviation dot, which may be rendered as a dot in some contexts where distinctions by styles cannot be used, or could be rendered by using superscripts for the letters glued after it). Yes, the necessary marker I mentioned would not need to have a visible glyph. U+002E Full Stop and U+0027 Apostrophe or, preferably, U+2019 Right Single Quotation Mark (alias curly apostrophe) are just common choices in related languages and, of course, already exist. Some style guides allow or recommend omitting (some of) them: “e. g.”, “e.g.”, “eg.”, “eg”. In acronyms with non-initial capitals, in particular, they've almost died out, except in cases like “U.S.” vs. “US” vs. “ᴜꜱ” vs. “us” (next to “UK” and “UN”). U+2065 would be an obvious choice (coming right after Invisible Times, Separator and Plus). Possible names:
- Invisible Terminator (as in “Inc.”)
- Invisible Ellipsis (as in “Ltd”, “Mme”) alias Zero-Width Ellipsis
- Invisible Apostrophe (as in “Dos and Don'ts”)
- Invisible Full Stop (as in “L.L.C.”)
- Abbreviation Mark
- Contraction Mark
For “3ème” and “3e”, I could also imagine some XY Joiner character making the most sense. From kenwhistler at att.net Thu Oct 6 16:28:18 2016 From: kenwhistler at att.net (Ken Whistler) Date: Thu, 6 Oct 2016 14:28:18 -0700 Subject: Bit arithmetic on Unicode characters?
In-Reply-To: References: Message-ID: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> On 10/6/2016 12:44 PM, Garth Wallace wrote: > Some representatives of the WFCC have proposed alternate arrangements > that assume there will be a need for bitwise operations to convert > between the existing chess symbols in the Miscellaneous Symbols block > and related symbols in the new block. I don't see the need but maybe > I'm missing something. I don't think you are missing anything. Bitwise operations would certainly *not* be needed in a case like this. Small lookup and mapping tables would suffice. --Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From lorna_evans at sil.org Thu Oct 6 17:09:33 2016 From: lorna_evans at sil.org (Lorna Evans) Date: Thu, 6 Oct 2016 17:09:33 -0500 Subject: IJ with accent In-Reply-To: <57EB7849.3070908@yspu.org> References: <57EB7849.3070908@yspu.org> Message-ID: Has it been mentioned that U+0133 is not listed with the Soft_Dotted property? So, that would indicate it shouldn't have the dots removed when you do put an acute over U+0133. Lorna On 9/28/2016 2:59 AM, a.lukyanov wrote: > Dutch language writing uses the ligature IJ/ij (U+0132, U+0133). When > accented, it should take an accent on each component, like this: > > > > If one uses two separate characters (i+j), one can put an accent on > each character (íj́). > > However, if the monolithic ligature ij is used, how can one accent it > correctly? The Unicode standard does not answer this. > > Probably one should use the sequence U+0133 U+0301, with the accent > doubling automatically, but this is not implemented. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: image/png Size: 3608 bytes Desc: not available URL: From everson at evertype.com Thu Oct 6 18:01:17 2016 From: everson at evertype.com (Michael Everson) Date: Fri, 7 Oct 2016 00:01:17 +0100 Subject: IJ with accent In-Reply-To: References: <57EB7849.3070908@yspu.org> Message-ID: <55648164-7E66-40C2-8DB1-3D98E80A3EF2@evertype.com> On 6 Oct 2016, at 23:09, Lorna Evans wrote: > > Has it been mentioned that U+0133 is not listed in the Soft_Dotted properties? So, that would indicate it shouldn't have the dot removed when you do put an acute over U+0133. It ought to have that property. Michael Everson From richard.wordingham at ntlworld.com Thu Oct 6 18:32:39 2016 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Oct 2016 00:32:39 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: Message-ID: <20161007003239.5d1eee7b@JRWUBU2> On Thu, 6 Oct 2016 12:44:05 -0700 Garth Wallace wrote: > Other than converting between UTFs, is bit arithmetic commonly > performed on Unicode characters? I was under the impression that it's > a rarity if it is done at all. It's possible to use it for the bulk of case folding, especially if the program only supports a specific repertoire. For specialist tasks, exploiting arithmetic relationships make sense. I would expect that most ASCII clones are handled that way. The problem is that manually constructed lookup tables are prone to human error. Richard. From Shawn.Steele at microsoft.com Thu Oct 6 18:39:37 2016 From: Shawn.Steele at microsoft.com (Shawn Steele) Date: Thu, 6 Oct 2016 23:39:37 +0000 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007003239.5d1eee7b@JRWUBU2> References: <20161007003239.5d1eee7b@JRWUBU2> Message-ID: You can't even case Latin that way. Unless maybe you only care about English. 
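Shawn's caveat is easy to demonstrate (a minimal Python sketch, not part of the thread): bit arithmetic covers exactly the ASCII case pairs, and Latin-1 already breaks it.

```python
# ASCII upper/lower case pairs differ only in bit 0x20, so a bit trick
# works within a-z:
def ascii_upper(ch):
    return chr(ord(ch) & ~0x20) if "a" <= ch <= "z" else ch

# Outside a-z the trick misfires immediately:
#  - clearing 0x20 on U+00F7 DIVISION SIGN yields U+00D7 MULTIPLICATION
#    SIGN, which is not a letter at all;
#  - U+00FF ÿ uppercases to U+0178 Ÿ, nowhere near a 0x20 offset;
#  - U+00DF ß uppercases to the two-letter string "SS".
```

That last pair is also why a table built from the UCD, rather than arithmetic, is the usual implementation beyond ASCII.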
-----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard Wordingham Sent: Thursday, October 6, 2016 4:33 PM To: unicode at unicode.org Subject: Re: Bit arithmetic on Unicode characters? On Thu, 6 Oct 2016 12:44:05 -0700 Garth Wallace wrote: > Other than converting between UTFs, is bit arithmetic commonly > performed on Unicode characters? I was under the impression that it's > a rarity if it is done at all. It's possible to use it for the bulk of case folding, especially if the program only supports a specific repertoire. For specialist tasks, exploiting arithmetic relationships make sense. I would expect that most ASCII clones are handled that way. The problem is that manually constructed lookup tables are prone to human error. Richard. From kenwhistler at att.net Thu Oct 6 18:54:21 2016 From: kenwhistler at att.net (Ken Whistler) Date: Thu, 6 Oct 2016 16:54:21 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007003239.5d1eee7b@JRWUBU2> References: <20161007003239.5d1eee7b@JRWUBU2> Message-ID: On 10/6/2016 4:32 PM, Richard Wordingham wrote: > The > problem is that manually constructed lookup tables are prone to human > error. ... as are manually constructed algorithms that attempt to take advantage of sub-ranges of case pair adjacency in the Unicode code charts to do casing with bit arithmetic. --Ken From richard.wordingham at ntlworld.com Thu Oct 6 19:28:19 2016 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Oct 2016 01:28:19 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007003239.5d1eee7b@JRWUBU2> Message-ID: <20161007012819.684a22c6@JRWUBU2> On Thu, 6 Oct 2016 16:54:21 -0700 Ken Whistler wrote: > On 10/6/2016 4:32 PM, Richard Wordingham wrote: > > The > > problem is that manually constructed lookup tables are prone to > > human error. > > ... 
as are manually constructed algorithms that attempt to take > advantage of sub-ranges of case pair adjacency in the Unicode code > charts to do casing with bit arithmetic. Yes, it's a trade-off. The application I had in mind is converting between mathematical letter variants and their 'plain' forms. Perhaps there is just enough information in the UCD to allow exhaustive, automated tests. For _simple_ case folding, algorithmic case folding can be expanded to a list of range tests, generalising what is often done for ASCII. Obviously the testing should be repeated with each new version of Unicode, which is straightforward if the case folding is compliant with Unicode. (Turkish would be a reason for not being compliant.) Richard. From Shawn.Steele at microsoft.com Thu Oct 6 19:42:08 2016 From: Shawn.Steele at microsoft.com (Shawn Steele) Date: Fri, 7 Oct 2016 00:42:08 +0000 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007012819.684a22c6@JRWUBU2> References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: Presumably a table-based approach would merely require rerunning the table-building script from the UCD when new versions were released. -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard Wordingham Sent: Thursday, October 6, 2016 5:28 PM To: unicode at unicode.org Subject: Re: Bit arithmetic on Unicode characters? On Thu, 6 Oct 2016 16:54:21 -0700 Ken Whistler wrote: > On 10/6/2016 4:32 PM, Richard Wordingham wrote: > > The > > problem is that manually constructed lookup tables are prone to > > human error. > > ... as are manually constructed algorithms that attempt to take > advantage of sub-ranges of case pair adjacency in the Unicode code > charts to do casing with bit arithmetic. Yes, it's a trade-off. The application I had in mind is converting between mathematical letter variants and their 'plain' forms. 
Perhaps there is just enough information in the UCD to allow exhaustive, automated tests. For _simple_ case folding, algorithmic case folding can be expanded to a list of range tests, generalising what is often done for ASCII. Obviously the testing should be repeated with each new version of Unicode, which is straightforward if the case folding is compliant with Unicode. (Turkish would be a reason for not being compliant.) Richard. From oren.watson at gmail.com Thu Oct 6 20:18:15 2016 From: oren.watson at gmail.com (Oren Watson) Date: Thu, 6 Oct 2016 21:18:15 -0400 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007012819.684a22c6@JRWUBU2> References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: That application is hindered by the fact that ?????????????????????????????????????????????? are unallocated characters, forming gaps in the otherwise contiguous mathematical alphabets. On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham < richard.wordingham at ntlworld.com> wrote: > On Thu, 6 Oct 2016 16:54:21 -0700 > Ken Whistler wrote: > > > On 10/6/2016 4:32 PM, Richard Wordingham wrote: > > > The > > > problem is that manually constructed lookup tables are prone to > > > human error. > > > > ... as are manually constructed algorithms that attempt to take > > advantage of sub-ranges of case pair adjacency in the Unicode code > > charts to do casing with bit arithmetic. > > Yes, it's a trade-off. The application I had in mind is converting > between mathematical letter variants and their 'plain' forms. Perhaps > there is just enough information in the UCD to allow exhaustive, > automated tests. > > For _simple_ case folding, algorithmic case folding can be expanded to > a list of range tests, generalising what is often done for ASCII. > Obviously the testing should be repeated with each new version of > Unicode, which is straightforward if the case folding is compliant with > Unicode. 
(Turkish would be a reason for not being compliant.) > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lang.support at gmail.com Thu Oct 6 21:11:32 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Fri, 7 Oct 2016 13:11:32 +1100 Subject: font-encoded hacks Message-ID: Considering the mess that adhoc fonts create. What is the best way forward? Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? Most government translations I am seeing in Australia for Burmese are in Zawgyi, while most of the Sgaw Karen translations are routinely in legacy 8-bit fonts. Andrew On Friday, 7 October 2016, Ken Whistler wrote: > By the way, the biggest ongoing problem we deal with here is the continuing urge to proliferate font-encoded hacks for particular languages and writing systems. The text interchange problems that such schemes pose on an ongoing basis far far outweigh issues like what to do with a Shibuya 109 emoji, imo. -- Andrew Cunningham lang.support at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From duerst at it.aoyama.ac.jp Fri Oct 7 01:08:23 2016 From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J._D=c3=bcrst?=) Date: Fri, 7 Oct 2016 15:08:23 +0900 Subject: font-encoded hacks In-Reply-To: References: Message-ID: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Hello Andrew, On 2016/10/07 11:11, Andrew Cunningham wrote: > Considering the mess that adhoc fonts create. What is the best way forward? That's very clear: Use Unicode. > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? > > Most government translations I am seeing in Australia for Burmese are in > Zawgyi, while most of the Sgaw Karen translations are routinely in legacy > 8-bit fonts. Why don't you tell the Australian government? Regards, Martin. 
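The range-test approach to simple case folding described in the case-folding thread above (generalising what is often done for ASCII) can be sketched as follows. This is a minimal illustration only, not a compliant implementation: it hard-codes three contiguous ranges, whereas a real implementation would generate its ranges and exception lists from CaseFolding.txt in the UCD, and regenerate them for each new Unicode version.

```python
# Sketch: simple case folding via range tests and bit arithmetic,
# generalising the classic ASCII trick (OR-ing in 0x20 lowercases A-Z).
# Covers only three contiguous ranges for illustration; a compliant
# implementation must be generated from the UCD's CaseFolding.txt.
def simple_fold(cp: int) -> int:
    if 0x41 <= cp <= 0x5A:                    # ASCII A-Z
        return cp | 0x20
    if 0xC0 <= cp <= 0xDE and cp != 0xD7:     # Latin-1 A-grave..Thorn, excluding the multiplication sign
        return cp | 0x20
    if 0x391 <= cp <= 0x3A9 and cp != 0x3A2:  # Greek Alpha..Omega, skipping reserved U+03A2
        return cp + 0x20                      # plain addition; OR would fail above U+039F
    return cp
```

Note that the Greek range already needs addition rather than an OR, which is exactly the kind of hand-constructed detail that, as Ken points out, is prone to human error unless verified exhaustively against the UCD.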
From moyogo at gmail.com Fri Oct 7 01:42:13 2016 From: moyogo at gmail.com (Denis Jacquerye) Date: Fri, 07 Oct 2016 06:42:13 +0000 Subject: font-encoded hacks In-Reply-To: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: In many cases people resort to these hacks because it is an easier short term solution. All they have to do is use a specific font. They don't have to switch or find and install a keyboard layout and they don't have to upgrade to an OS that supports their script with Unicode properly. Because of these short term solutions it's hard for a switch to Unicode to gain proper momentum. Unfortunately, not everybody sees the long term benefit, or often they see it but cannot do it practically. Too often Unicode compliant fonts or keyboard layouts have been lacking or at least have taken much longer to be implemented. One could wonder if a technical group for keyboard layouts would help this process. On Fri, Oct 7, 2016, 07:12 Martin J. Dürst wrote: > Hello Andrew, > > On 2016/10/07 11:11, Andrew Cunningham wrote: > > Considering the mess that adhoc fonts create. What is the best way > forward? > > That's very clear: Use Unicode. > > > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? > > > > Most government translations I am seeing in Australia for Burmese are in > > Zawgyi, while most of the Sgaw Karen translations are routinely in legacy > > 8-bit fonts. > > Why don't you tell the Australian government? > > Regards, Martin. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at macchiato.com Fri Oct 7 01:54:00 2016 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Fri, 7 Oct 2016 08:54:00 +0200 Subject: font-encoded hacks In-Reply-To: References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: We do provide data for keyboard mappings in CLDR ( http://unicode.org/cldr/charts/latest/keyboards/index.html). There are some further pieces we need to put into place.

1. Provide a bulk uploader that applies our sanity-checking tests for a proposed keyboard mapping, and provides real-time feedback to users about the problems they need to fix.
2. Provide code that converts from the CLDR format into the major platforms' formats (we have the reverse direction already).
3. (Optional) Prettier charts!

Mark On Fri, Oct 7, 2016 at 8:42 AM, Denis Jacquerye wrote: > In many cases people resort to these hacks because it is an easier short > term solution. All they have to do is use a specific font. They don't have > to switch or find and install a keyboard layout and they don't have to > upgrade to an OS that supports their script with Unicode properly. Because > of these short term solutions it's hard for a switch to Unicode to gain > proper momentum. Unfortunately, not everybody sees the long term benefit, > or often they see it but cannot do it practically. > > Too often Unicode compliant fonts or keyboard layouts have been lacking or > at least have taken much longer to be implemented. > One could wonder if a technical group for keyboard layouts would help > this process. > > On Fri, Oct 7, 2016, 07:12 Martin J. Dürst wrote: > >> Hello Andrew, >> >> On 2016/10/07 11:11, Andrew Cunningham wrote: >> > Considering the mess that adhoc fonts create. What is the best way >> forward? >> >> That's very clear: Use Unicode. >> >> > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? 
>> > >> > Most government translations I am seeing in Australia for Burmese are in >> > Zawgyi, while most of the Sgaw Karen translations are routinely in >> legacy >> > 8-bit fonts. >> >> Why don't you tell the Australian government? >> >> Regards, Martin. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Fri Oct 7 02:14:07 2016 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Oct 2016 08:14:07 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: <20161007081407.52a6fa5e@JRWUBU2> On Thu, 6 Oct 2016 21:18:15 -0400 Oren Watson wrote: > On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham < > richard.wordingham at ntlworld.com> wrote: > > Yes, it's a trade-off. The application I had in mind is converting > > between mathematical letter variants and their 'plain' forms. > > Perhaps there is just enough information in the UCD to allow > > exhaustive, automated tests. > That application is hindered by the fact that > > ?????????????????????????????????????????????? are unallocated > characters, forming gaps in the otherwise contiguous mathematical > alphabets. (Aside: That written statement is illegal! -:) Yep. It's a known nuisance, which is why I suggested exhaustive tests. My email client found a font to render U+1D547 as the unwary would expect, i.e. using a glyph suitable for ℙ U+2119 DOUBLE-STRUCK CAPITAL P. I was surprised when I first saw those gaps; I would have expected characters with appropriate singleton decompositions to protect the unwary. (The idea might have come up at the time of encoding, and been dismissed with reasons.) I don't know whether the font's misrendering is an accident or is deliberate partial protection of the victims of bad character code selection. 
An old application of arithmetic was transliteration between the major Indian Indic scripts. That falls foul of Tamil and of characters that were not represented in ISCII. Richard. From gwalla at gmail.com Fri Oct 7 02:27:47 2016 From: gwalla at gmail.com (Garth Wallace) Date: Fri, 7 Oct 2016 00:27:47 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: On Thu, Oct 6, 2016 at 5:42 PM, Shawn Steele wrote: > Presumably a table-based approach would merely require rerunning the > table-building script from the UCD when new versions were released. > For casing, sure, but that's not really relevant in this context, since Unicode doesn't really address chess piece properties like white/black beyond naming conventions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From haberg-1 at telia.com Fri Oct 7 03:43:44 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Fri, 7 Oct 2016 10:43:44 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: > On 7 Oct 2016, at 09:27, Garth Wallace wrote: > > Unicode doesn't really address chess piece properties like white/black beyond naming conventions. From the formal point of view, Unicode only assigns character numbers (code points), which only get a binary representation when encoded, as with UTF-8, which makes it agree with ASCII for small numbers. The math alphabetical letters are out of order because of legacy, but that is not a problem as one will use an interface that sorts it out. These numbers are only for display to humans, and computers are nowadays fast enough to sort it out. A chess program has its own, optimized representation anyway. So possibly you might add more properties. 
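The conversion Richard has in mind, between mathematical letter variants and their 'plain' forms, can be sketched as offset arithmetic patched by an exception table for the gaps Oren mentions. The listing below is a hedged sketch under stated assumptions: it covers only the italic lowercase alphabet and its single hole, and the names are illustrative; a real converter needs every alphabet and all of the exceptional code points, verified by exhaustive automated tests against the UCD.

```python
# Sketch: mapping a Mathematical Alphanumeric Symbols letter back to
# its plain ASCII form by offset arithmetic, with an exception table
# for code points that were already encoded elsewhere (the "gaps").
# Only the italic lowercase range is shown.
ITALIC_A = 0x1D44E                 # MATHEMATICAL ITALIC SMALL A
HOLES = {0x1D455: 0x210E}          # reserved slot for italic h -> U+210E PLANCK CONSTANT

def to_plain(cp: int) -> str:
    if ITALIC_A <= cp <= ITALIC_A + 25 and cp not in HOLES:
        return chr(ord('a') + cp - ITALIC_A)
    for hole, actual in HOLES.items():   # letters encoded outside the block
        if cp == actual:
            return chr(ord('a') + hole - ITALIC_A)
    return chr(cp)                       # anything else passes through unchanged
```

Note that the unassigned code point U+1D455 itself deliberately passes through unchanged here, rather than being "helpfully" folded the way the misrendering font treats U+1D547.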
From neil at tonal.clara.co.uk Fri Oct 7 05:59:42 2016 From: neil at tonal.clara.co.uk (Neil Harris) Date: Fri, 7 Oct 2016 11:59:42 +0100 Subject: font-encoded hacks In-Reply-To: References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: <979ca47c-bc82-41ed-5ec8-9d29658791d5@tonal.clara.co.uk> On 07/10/16 07:42, Denis Jacquerye wrote: > In many cases people resort to these hacks because it is an easier short term > solution. All they have to do is use a specific font. They don't have to > switch or find and install a keyboard layout and they don't have to upgrade > to an OS that supports their script with Unicode properly. Because of these > short term solutions it's hard for a switch to Unicode to gain proper > momentum. Unfortunately, not everybody sees the long term benefit, or often > they see it but cannot do it practically. > > Too often Unicode compliant fonts or keyboard layouts have been lacking or > at least have taken much longer to be implemented. > One could wonder if a technical group for keyboard layouts would help this > process. What might also help is a reconceptualization of these hacks as being in effect non-standard character encodings: the existing software infrastructure for handling charsets could then be co-opted to convert them to (and possibly from) Unicode if desired. Neil From doug at ewellic.org Fri Oct 7 11:06:31 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 09:06:31 -0700 Subject: Bit arithmetic on Unicode =?UTF-8?Q?characters=3F?= Message-ID: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> Richard Wordingham wrote: > Yes, it's a trade-off. The application I had in mind is converting > between mathematical letter variants and their 'plain' forms. Long-time list members might remember a Windows utility I wrote to convert between normal Unicode text and Mathematical Alphanumeric Symbols. 
Andrew West (of BabelPad fame) has a similar, web-based app that also supports things like small caps and superscript. Both of these use lookup tables to do the conversions, and use algorithms only for very broad-based operations, like distinguishing the Latin-letter range in the MAS block from the Greek letters and the digits. There's no practical value in implementing conversions like this algorithmically. Maybe if there were one or two exceptions in the MAS range instead of two dozen, it might be different. > Perhaps there is just enough information in the UCD to allow > exhaustive, automated tests. I can't find anything in the UCD that distinguishes one "font variant" from another (UnicodeData.txt shown as an example):

1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D49C;MATHEMATICAL SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D4D0;MATHEMATICAL BOLD SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D504;MATHEMATICAL FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D538;MATHEMATICAL DOUBLE-STRUCK CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D56C;MATHEMATICAL BOLD FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D5A0;MATHEMATICAL SANS-SERIF CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D5D4;MATHEMATICAL SANS-SERIF BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D608;MATHEMATICAL SANS-SERIF ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D63C;MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D670;MATHEMATICAL MONOSPACE CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;

And that's probably as it should be, because UTC never intended MAS to be readily transformed to and from "plain" characters. They're supposed to be used for mathematical expressions in which styled letters have special meaning. (My utility, and I'm sure Andrew's, were written entirely tongue-in-cheek.) > My email client found a font to render U+1D547 as the unwary > would expect, i.e. using a glyph suitable for ℙ 
U+2119 DOUBLE-STRUCK > CAPITAL P. I was surprised when I first saw those gaps; I would have > expected characters with appropriate singleton decompositions to protect > the unwary. (The idea might have come up at the time of encoding, and > been dismissed with reasons.) Unifying identical characters with identical meanings, rather than creating pointless duplicates, was a major design tenet of Unicode. > I don't know whether the font's misrendering is an accident or is > deliberate partial protection of the victims of bad character code > selection. Either way, it's a bug. Users who try to render an unassigned code point should not be "protected" by showing them a glyph that the font designer thought should be there. They should be shown a .notdef glyph so they know something is wrong. -- Doug Ewell | Thornton, CO, US | ewellic.org From doug at ewellic.org Fri Oct 7 11:22:21 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 09:22:21 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Marcel Schneider wrote: > According to my hypothesis and while waiting, I believe that > the intent of the gap kept in the superscript lowercase range, > is to maintain a limitation to the performance of plain text. > I don't see very well how to apply Hanlon's razor here, because > there seems to be a strong unwillingness to see people getting > keyboards that allow them to write in plain text without being > bound to high-end software. The goal seems to be to keep the users > dependent on a special formatting feature and to draw them away > from simplicity. Hanlon's Razor doesn't apply here, because it's not a dichotomy between malice and stupidity. Unicode has a particular definition of what constitutes "plain text," and it's become evident over the past 25 years that some people have different definitions. 
That's probably never going to change (I personally don't believe the difference between black-and-white pictures of cows and color pictures of cows is a plain-text distinction), but the Unicode definition is really the one that matters in discussions like this. What doesn't help, IMHO, is to claim that UTC has some ulterior motive to restrict the applicability of plain text and manipulate users and "draw them away from simplicity." I think insinuations of evil intent need to be better-founded than that. -- Doug Ewell | Thornton, CO, US | ewellic.org From haberg-1 at telia.com Fri Oct 7 11:57:02 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Fri, 7 Oct 2016 18:57:02 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> References: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> Message-ID: <20119351-749E-4B33-8A07-79C592810CE0@telia.com> > On 7 Oct 2016, at 18:06, Doug Ewell wrote: > I can't find anything in the UCD that distinguishes one "font variant" > from another (UnicodeData.txt shown as an example):
>
> 1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D49C;MATHEMATICAL SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D4D0;MATHEMATICAL BOLD SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D504;MATHEMATICAL FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D538;MATHEMATICAL DOUBLE-STRUCK CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D56C;MATHEMATICAL BOLD FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D5A0;MATHEMATICAL SANS-SERIF CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D5D4;MATHEMATICAL SANS-SERIF BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D608;MATHEMATICAL SANS-SERIF ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D63C;MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 
1D670;MATHEMATICAL MONOSPACE CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;; > > And that's probably as it should be, because UTC never intended MAS to > be readily transformed to and from "plain" characters. They're supposed > to be used for mathematical expressions in which styled letters have > special meaning. I use them for input text files, and it is not particularly difficult. An efficient method is to use text substitutions, as available on MacOS. The resulting file is UTF-8 with the correct character, and typesetting systems like LuaTeX with ConTeXt or LaTeX/unicode-math translate it into a PDF. It is usually easy to immediately spot if a math style is wrong. Using it in the input makes one more aware of new styles that in the past were not available. From oren.watson at gmail.com Fri Oct 7 13:25:43 2016 From: oren.watson at gmail.com (Oren Watson) Date: Fri, 7 Oct 2016 14:25:43 -0400 Subject: Fwd: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: Would it be appropriate to submit an omnibus proposal for encoding all remaining English letters in subscript, small caps, and superscript in the SMP for the purpose of not arbitrarily constraining the use of Unicode for new linguistic theories and ideas, similar to the mathematical characters?

superscripted: CFQXYZ, q
subscript: A-Z, bcdfgqwyz
small capital: QX
total: 44 characters.

On Fri, Oct 7, 2016 at 12:22 PM, Doug Ewell wrote: > Marcel Schneider wrote: > > > According to my hypothesis and while waiting, I believe that > > the intent of the gap kept in the superscript lowercase range, > > is to maintain a limitation to the performance of plain text. > > I don't see very well how to apply Hanlon's razor here, because > > there seems to be a strong unwillingness to see people getting > > keyboards that allow them to write in plain text without being > > bound to high-end software. 
The goal seems to be to keep the users > > dependent on a special formatting feature and to draw them away > > from simplicity. > > Hanlon's Razor doesn't apply here, because it's not a dichotomy between > malice and stupidity. > > Unicode has a particular definition of what constitutes "plain text," > and it's become evident over the past 25 years that some people have > different definitions. That's probably never going to change (I > personally don't believe the difference between black-and-white pictures > of cows and color pictures of cows is a plain-text distinction), but the > Unicode definition is really the one that matters in discussions like > this. > > What doesn't help, IMHO, is to claim that UTC has some ulterior motive > to restrict the applicability of plain text and manipulate users and > "draw them away from simplicity." I think insinuations of evil intent > need to be better-founded than that. > > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From everson at evertype.com Fri Oct 7 13:33:09 2016 From: everson at evertype.com (Michael Everson) Date: Fri, 7 Oct 2016 19:33:09 +0100 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: On 7 Oct 2016, at 19:25, Oren Watson wrote: > > Would it be appropriate to submit an omnibus proposal for encoding all remaining English letters in subscript, small caps, and superscript in the SMP for the purpose of not arbitrarily constraining the use of Unicode for new linguistic theories and ideas, similar to the mathematical characters? > > superscripted: CFQXYZ, q I'd support these. > subscript: A-Z, bcdfgqwyz If NONE of the letters A-Z have been subscripted there's not much reason to think that's common or useful. I'd support bcdfgqwyz > small capital: QX Small capital Q is under ballot. 
The subscript Greek alpha had a very good rationale recently. Michael Everson From doug at ewellic.org Fri Oct 7 13:47:49 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 11:47:49 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161007114749.665a7a7059d7ee80bb4d670165c8327d.49430e8579.wbe@email03.godaddy.com> Oren Watson wrote: > Would it be appropriate to submit an omnibus proposal for encoding all > remaining English letters in subscript, small caps, and superscript in > the SMP for the purpose of not arbitrarily constraining the use of > Unicode for new linguistic theories and ideas, similar to the > mathematical characters? "For new theories and ideas" is a red flag. For letters in writing systems, it's traditionally been important to show how the character(s) would be used in current, real-world scenarios, not for some future, as-yet unknown purpose. It's likely that the proposals to add the existing subscript and superscript and smallcap letters were required to include such rationales. Using the math alphabets as a precedent for encoding something might not be an effective strategy, as they are often considered to be exceptional and not analogous to characters used for writing human languages. -- Doug Ewell | Thornton, CO, US | ewellic.org From kenwhistler at att.net Fri Oct 7 13:53:16 2016 From: kenwhistler at att.net (Ken Whistler) Date: Fri, 7 Oct 2016 11:53:16 -0700 Subject: Fwd: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: On 10/7/2016 11:25 AM, Oren Watson wrote: > Would it be appropriate to submit an omnibus proposal for encoding all > remaining English letters in subscript, small caps, and superscript in > the SMP for the purpose of not arbitrarily constraining the use of > Unicode for new linguistic theories and ideas, similar to the > mathematical characters? 
I don't see that the use of Unicode characters for new linguistic theories and ideas is arbitrarily constrained as it stands. So no, I don't think it makes sense to submit such a proposal on spec. I don't understand people's fascination with multiplying the encoding of the Latin alphabet A-Z over and over and over again. Modifier letters are different from the mathematical styled alphabets -- modifier letters include many letters and symbols beyond A-Z, and there isn't any clear marginal benefit in trying to "complete" their set somehow by filling in Latin alphabet encoding gaps without clear use cases. --Ken From oren.watson at gmail.com Fri Oct 7 14:32:16 2016 From: oren.watson at gmail.com (Oren Watson) Date: Fri, 7 Oct 2016 15:32:16 -0400 Subject: Fwd: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: Hmm... "filling in Latin alphabet encoding gaps without clear use cases" is exactly what was done for the blackboard bold letters. I scarcely think that a use case was submitted for every one of the blackboard bold etc letters in the mathematical set; merely the use of blackboard bold for a general purpose of denoting sets such as the naturals, reals, complex numbers etc, and the fact that arbitrary letters might be used if a mathematician desired, seems to have sufficed. I believe the same logic applies to the case of linguistics, where the use of superscripts is a common convention. On Fri, Oct 7, 2016 at 2:53 PM, Ken Whistler wrote: > > > On 10/7/2016 11:25 AM, Oren Watson wrote: > >> Would it be appropriate to submit an omnibus proposal for encoding all >> remaining English letters in subscript, small caps, and superscript in the >> SMP for the purpose of not arbitrarily constraining the use of Unicode for >> new linguistic theories and ideas, similar to the mathematical characters? 
>> >> > I don't see that the use of Unicode characters for new linguistic theories > and ideas is arbitrarily constrained as it stands. So no, I don't think it > makes sense to submit such a proposal on spec. I don't understand people's > fascination with multiplying the encoding of the Latin alphabet A-Z over > and over and over again. Modifier letters are different from the > mathematical styled alphabets -- modifier letters include many letters and > symbols beyond A-Z, and there isn't any clear marginal benefit in trying to > "complete" their set somehow by filling in Latin alphabet encoding gaps > without clear use cases. > > --Ken > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lang.support at gmail.com Fri Oct 7 15:54:11 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Sat, 8 Oct 2016 07:54:11 +1100 Subject: font-encoded hacks In-Reply-To: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: On 7 Oct 2016 17:08, "Martin J. Dürst" wrote: > > Hello Andrew, > > > On 2016/10/07 11:11, Andrew Cunningham wrote: >> >> Considering the mess that adhoc fonts create. What is the best way forward? > > > That's very clear: Use Unicode. > LOL, thanks Martin. That has been my position for a long time. > >> Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? >> >> Most government translations I am seeing in Australia for Burmese are in >> Zawgyi, while most of the Sgaw Karen translations are routinely in legacy >> 8-bit fonts. > > > Why don't you tell the Australian government? Easier to tell the state governments than the Federal government. But it is something I am working on. > > Regards, Martin. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lang.support at gmail.com Fri Oct 7 16:22:16 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Sat, 8 Oct 2016 08:22:16 +1100 Subject: font-encoded hacks In-Reply-To: References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: Hi Denis, In some ways, it was easier. But looking at each language, the issues seem to have a slightly different slant. Sgaw Karen is interesting in comparison to Burmese. There is some use of the hacked Zwekabin font by bloggers, but most content and key media still use 8-bit fonts, with little use of Unicode. The lack of uptake of Unicode fonts seems to lie in the fact that the default rendering for most Myanmar script fonts is Burmese. If Sgaw Karen etc. are supported, it is via locl features. If a Sgaw Karen user is using the font in software where they can't control the necessary OpenType features, or don't know they can and need to ... you will eventually get a perception that their language isn't supported. There are font developers among the Burmese, Mon, and Shan ethnic groups developing Unicode fonts tailored for their needs. The Burmese situation is quite different, and a topic that I have discussed often with Burmese colleagues. I have my theories. But the current resurgence of Zawgyi very much depends on the ability of mobile devices to render Myanmar Unicode, and the choices telcos and handset manufacturers make regarding system fonts. Regarding keyboards, it is interesting comparing Khmer and Burmese. Uptake of Unicode was earlier and quicker for Khmer. When Khmer keyboards were developed, the Khmer developers chose to live with the severe limitations of system-level input frameworks. It is only this year that I have started to see truly innovative research into what a Khmer input system should be. Burmese Unicode developers on the other hand were never satisfied with those limitations, and various developers looked into alternatives. 
Andrew On 7 Oct 2016 17:42, "Denis Jacquerye" wrote: > > In many cases people resort to these hacks because it is an easier short term solution. All they have to do is use a specific font. They don't have to switch or find and install a keyboard layout and they don't have to upgrade to an OS that supports their script with Unicode properly. Because of these short term solutions it's hard for a switch to Unicode to gain proper momentum. Unfortunately, not everybody sees the long term benefit, or often they see it but cannot do it practically. > > Too often Unicode compliant fonts or keyboard layouts have been lacking or at least have taken much longer to be implemented. > One could wonder if a technical group for keyboard layouts would help this process. > > > On Fri, Oct 7, 2016, 07:12 Martin J. Dürst wrote: >> >> Hello Andrew, >> >> On 2016/10/07 11:11, Andrew Cunningham wrote: >> > Considering the mess that adhoc fonts create. What is the best way forward? >> >> That's very clear: Use Unicode. >> >> > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? >> > >> > Most government translations I am seeing in Australia for Burmese are in >> > Zawgyi, while most of the Sgaw Karen translations are routinely in legacy >> > 8-bit fonts. >> >> Why don't you tell the Australian government? >> >> Regards, Martin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lang.support at gmail.com Fri Oct 7 16:26:40 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Sat, 8 Oct 2016 08:26:40 +1100 Subject: font-encoded hacks In-Reply-To: References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: Hi Mark, The converters would be interesting to see, and would be personally useful to me. But the type of keyboard layouts and input frameworks reflected in CLDR have limited bearing on issues related to the uptake of Unicode for Myanmar script. Andrew On 7 Oct 2016 17:54, "Mark Davis ☕️" 
wrote: > We do provide data for keyboard mappings in CLDR (http://unicode.org/cldr/ > charts/latest/keyboards/index.html). There are some further pieces we > need to put into place. > > 1. Provide a bulk uploader that applies our sanity-checking tests for > a proposed keyboard mapping, and provides real-time feedback to users about > the problems they need to fix. > 2. Provide code that converts from the CLDR format into the major > platforms' formats (we have the reverse direction already). > 3. (Optional) Prettier charts! > > > Mark > > On Fri, Oct 7, 2016 at 8:42 AM, Denis Jacquerye wrote: > >> In my case people resort to these hacks because it is an easier >> short-term solution. All they have to do is use a specific font. They don't have >> to switch or find and install a keyboard layout and they don't have to >> upgrade to an OS that supports their script with Unicode properly. Because >> of these short-term solutions it's hard for a switch to Unicode to gain >> proper momentum. Unfortunately, not everybody sees the long-term benefit, >> or often they see it but cannot do it practically. >> >> Too often Unicode-compliant fonts or keyboard layouts have been lacking >> or at least have taken much longer to be implemented. >> One could wonder if a technical group for keyboard layouts would help >> this process. >> >> On Fri, Oct 7, 2016, 07:12 Martin J. Dürst >> wrote: >> >>> Hello Andrew, >>> >>> On 2016/10/07 11:11, Andrew Cunningham wrote: >>> > Considering the mess that ad hoc fonts create, what is the best way >>> forward? >>> >>> That's very clear: Use Unicode. >>> >>> > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? >>> > >>> > Most government translations I am seeing in Australia for Burmese are in >>> > Zawgyi, while most of the Sgaw Karen translations are routinely in >>> legacy >>> > 8-bit fonts. >>> >>> Why don't you tell the Australian government? >>> >>> Regards, Martin. 
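[Mark's item 2 (converting from the CLDR keyboard format into platform formats) starts from the CLDR keyboard XML. A minimal sketch of reading such a file follows; the `<keyMap>`/`<map iso="…" to="…"/>` element names follow the LDML keyboard format of that era, and the tiny inline document is illustrative rather than a real locale file, so treat the exact schema as an assumption and check the LDML keyboard spec before relying on it:]

```python
# Sketch: read key mappings from a CLDR-style keyboard XML document.
# The <keyMap><map iso="..." to="..."/> structure follows the LDML
# keyboard format; the snippet below is a made-up example, not a real
# CLDR locale file.
import xml.etree.ElementTree as ET

CLDR_SNIPPET = """
<keyboard locale="en-t-k0-sample">
  <keyMap>
    <map iso="C01" to="a"/>
    <map iso="C02" to="s"/>
    <map iso="C03" to="d"/>
  </keyMap>
</keyboard>
"""

def load_keymap(xml_text: str) -> dict:
    """Return a dict from ISO key position (e.g. 'C01') to output string."""
    root = ET.fromstring(xml_text)
    return {m.get("iso"): m.get("to")
            for m in root.findall("./keyMap/map")}

keymap = load_keymap(CLDR_SNIPPET)
print(keymap["C01"])  # a
```

A converter to a platform format would then iterate over this dict and emit the platform's own layout file; the hard part, as the thread notes, is everything the flat base layer cannot express.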
>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lang.support at gmail.com Fri Oct 7 16:35:58 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Sat, 8 Oct 2016 08:35:58 +1100 Subject: font-encoded hacks In-Reply-To: <979ca47c-bc82-41ed-5ec8-9d29658791d5@tonal.clara.co.uk> References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> <979ca47c-bc82-41ed-5ec8-9d29658791d5@tonal.clara.co.uk> Message-ID: Hi Neil, I tend to prefer referring to them as Pseudo-Unicode solutions, rather than hacked fonts or ad hoc fonts, and differentiating them from legacy or 8-bit solutions. My preferred approach would be to treat them as a separate encoding. But I doubt that is likely to happen. It doesn't help that a mobile device I purchase in Australia will ship with a Unicode font installed, but the same device and model may ship with a non-Unicode font installed in Myanmar and potentially other parts of SE Asia. Andrew On 7 Oct 2016 22:04, "Neil Harris" wrote: > On 07/10/16 07:42, Denis Jacquerye wrote: > >> In my case people resort to these hacks because it is an easier >> short-term >> solution. All they have to do is use a specific font. They don't have to >> switch or find and install a keyboard layout and they don't have to >> upgrade >> to an OS that supports their script with Unicode properly. Because of >> these >> short-term solutions it's hard for a switch to Unicode to gain proper >> momentum. Unfortunately, not everybody sees the long-term benefit, or >> often >> they see it but cannot do it practically. >> >> Too often Unicode-compliant fonts or keyboard layouts have been lacking or >> at least have taken much longer to be implemented. >> One could wonder if a technical group for keyboard layouts would help >> this >> process. 
>> > > What might also help is a reconceptualization of these hacks as being in > effect non-standard character encodings: the existing software > infrastructure for handling charsets could then be co-opted to convert them > to (and possibly from) Unicode if desired. > > Neil > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Fri Oct 7 17:21:03 2016 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Oct 2016 23:21:03 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> References: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> Message-ID: <20161007232103.5e51b9bd@JRWUBU2> On Fri, 07 Oct 2016 09:06:31 -0700 "Doug Ewell" wrote: > Richard Wordingham wrote: > > Perhaps there is just enough information in the UCD to allow > > exhaustive, automated tests. > I can't find anything in the UCD that distinguishes one "font variant" > from another (UnicodeData.txt shown as an example): > 1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;; > 1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;; > 1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;; It's in that most treacherous of properties, the character's name. Richard. From doug at ewellic.org Fri Oct 7 17:31:00 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 15:31:00 -0700 Subject: Bit arithmetic on Unicode characters? Message-ID: <20161007153100.665a7a7059d7ee80bb4d670165c8327d.457ef7205b.wbe@email03.godaddy.com> Richard Wordingham wrote: >> I can't find anything in the UCD that distinguishes one "font >> variant" from another (UnicodeData.txt shown as an example): > > It's in that most treacherous of properties, the character's name. Well, "treacherous" is right. 
I'd hesitate to trust an algorithm to recognize PLANCK CONSTANT as the character name that logically fits between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I. -- Doug Ewell | Thornton, CO, US | ewellic.org From andrewcwest at gmail.com Fri Oct 7 17:41:08 2016 From: andrewcwest at gmail.com (Andrew West) Date: Fri, 7 Oct 2016 23:41:08 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007153100.665a7a7059d7ee80bb4d670165c8327d.457ef7205b.wbe@email03.godaddy.com> References: <20161007153100.665a7a7059d7ee80bb4d670165c8327d.457ef7205b.wbe@email03.godaddy.com> Message-ID: On 7 October 2016 at 23:31, Doug Ewell wrote: > > Well, "treacherous" is right. I'd hesitate to trust an algorithm to > recognize PLANCK CONSTANT as the character name that logically fits > between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I. Well, it could be picked up from that most treacherous of Unicode data files http://www.unicode.org/Public/UNIDATA/NamesList.txt Andrew From oren.watson at gmail.com Fri Oct 7 17:48:41 2016 From: oren.watson at gmail.com (Oren Watson) Date: Fri, 7 Oct 2016 18:48:41 -0400 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007153100.665a7a7059d7ee80bb4d670165c8327d.457ef7205b.wbe@email03.godaddy.com> Message-ID: Except that it states at the very start of that file "this file should not be parsed for machine-readable information." On Fri, Oct 7, 2016 at 6:41 PM, Andrew West wrote: > On 7 October 2016 at 23:31, Doug Ewell wrote: > > > > Well, "treacherous" is right. I'd hesitate to trust an algorithm to > > recognize PLANCK CONSTANT as the character name that logically fits > > between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I. > > Well, it could be picked up from that most treacherous of Unicode data > files http://www.unicode.org/Public/UNIDATA/NamesList.txt > > Andrew > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From doug at ewellic.org Fri Oct 7 17:52:53 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 15:52:53 -0700 Subject: Bit arithmetic on Unicode characters? Message-ID: <20161007155253.665a7a7059d7ee80bb4d670165c8327d.fd98cfe8d8.wbe@email03.godaddy.com> Andrew West wrote: > Well, it could be picked up from that most treacherous of Unicode data > files http://www.unicode.org/Public/UNIDATA/NamesList.txt Even then, you have: ... 1D454 MATHEMATICAL ITALIC SMALL G # 0067 latin small letter g 1D455 <reserved> x (planck constant - 210E) 1D456 MATHEMATICAL ITALIC SMALL I # 0069 latin small letter i ... The only way you can tell from this that U+210E is a mathematical italic small H is from the context of the previous character. That wouldn't bode well if the letter A were one of the exceptionally located code points. Thankfully, it never is, so this cleverness might work after all. -- Doug Ewell | Thornton, CO, US | ewellic.org From gwalla at gmail.com Fri Oct 7 23:29:10 2016 From: gwalla at gmail.com (Garth Wallace) Date: Fri, 7 Oct 2016 21:29:10 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler wrote: > > On 10/6/2016 12:44 PM, Garth Wallace wrote: > > Some representatives of the WFCC have proposed alternate arrangements that > assume there will be a need for bitwise operations to convert between the > existing chess symbols in the Miscellaneous Symbols block and related > symbols in the new block. I don't see the need but maybe I'm missing > something. > > > I don't think you are missing anything. Bitwise operations would certainly > *not* be needed in a case like this. Small lookup and mapping tables > would suffice. > > --Ken > > -------------- next part -------------- An HTML attachment was scrubbed... 
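[The offset-plus-exceptions situation Doug describes can be made concrete with a short sketch. The code points are from UnicodeData.txt: the Mathematical Italic alphabet runs from U+1D434, but U+1D455 is a reserved code point because italic small h was encoded earlier as U+210E PLANCK CONSTANT, which is exactly why pure arithmetic fails and a small exception table is needed:]

```python
# Map ASCII letters to the Mathematical Italic alphabet.
# A plain offset almost works, but U+1D455 is reserved (unassigned):
# the italic small h slot is filled by U+210E PLANCK CONSTANT, encoded
# long before the math alphabets. Hence offset arithmetic plus a tiny
# exception table, as discussed in the thread.

EXCEPTIONS = {"h": "\u210E"}  # PLANCK CONSTANT stands in for italic small h

def to_math_italic(text: str) -> str:
    out = []
    for ch in text:
        if ch in EXCEPTIONS:
            out.append(EXCEPTIONS[ch])
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

print(to_math_italic("ghi"))  # 𝑔ℎ𝑖
```

The other math alphabets have their own holes (e.g. several letterlike symbols in the italic capital range), so a real implementation would carry a larger exception table built from UnicodeData.txt.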
URL: From gwalla at gmail.com Fri Oct 7 23:36:56 2016 From: gwalla at gmail.com (Garth Wallace) Date: Fri, 7 Oct 2016 21:36:56 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: Sorry about the blank reply. Itchy trigger finger. On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler wrote: > > On 10/6/2016 12:44 PM, Garth Wallace wrote: > > Some representatives of the WFCC have proposed alternate arrangements that > assume there will be a need for bitwise operations to convert between the > existing chess symbols in the Miscellaneous Symbols block and related > symbols in the new block. I don't see the need but maybe I'm missing > something. > > > I don't think you are missing anything. Bitwise operations would certainly > *not* be needed in a case like this. Small lookup and mapping tables > would suffice. > > --Ken > > Thank you. Just to be clear, this is the proposed allocation as it stands: http://i556.photobucket.com/albums/ss7/Garth_Wallace/proposed%20characters_zps81m0frvl.png That arrangement is the result of some discussion with a representative of the WFCC. And here are the alternatives that another WFCC representative recently proposed and that prompted my question: http://i556.photobucket.com/albums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png -------------- next part -------------- An HTML attachment was scrubbed... 
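[Ken's "small lookup and mapping tables would suffice" can be illustrated with the chess symbols already encoded in Miscellaneous Symbols (U+2654 to U+265F). White-to-black conversion there happens to be a fixed offset of 6, but an explicit table is just as fast and, unlike arithmetic, keeps working when related characters are not allocated at a fixed distance; the hypothetical new-block code points from the proposal are deliberately not shown here:]

```python
# White chess pieces are U+2654..U+2659, black pieces U+265A..U+265F
# (Miscellaneous Symbols block). The white->black conversion happens to
# be a fixed +6 offset, but an explicit table stays correct even when
# related symbols are NOT a fixed distance apart, which is the point
# about mapping to a new block.

WHITE_TO_BLACK = {chr(cp): chr(cp + 6) for cp in range(0x2654, 0x265A)}

def blacken(text: str) -> str:
    """Replace every white chess piece with its black counterpart."""
    return "".join(WHITE_TO_BLACK.get(ch, ch) for ch in text)

print(blacken("\u2654\u2659"))  # ♚♟
```

If the new block were allocated, the table would simply gain entries mapping Miscellaneous Symbols pieces to their counterparts there; no bit layout of the code points needs to be constrained for that.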
References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: On 2016-10-07, Oren Watson wrote: > I scarcely think that a use case was submitted for every one of the > blackboard bold etc letters in the mathematical set; merely the use of > blackboard bold for a general purpose of denoting sets such as the > naturals, reals, complex numbers etc, and the fact that arbitrary letters > might be used if a mathematician desired, seems to have sufficed. Indeed. I happen to think the whole math alphabet thing was a dumb mistake. But even if it isn't - and incidentally in some communities there is or was a convention of using blackboard bold letters for matrices, which justifies all of them -: > I believe the same logic applies to the case of linguistics, where the use > of superscripts is a common convention. Either superscripts are being used mathematically, in which case you can use mathematical markup, or they're being used with very specific semantics, as in the phonetic modifier letters. For the latter case, there is a standard. First you get your letter recognized by the IPA, then you encode it. The IPA doesn't recognize arbitrary superscripts. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Sat Oct 8 08:28:02 2016 From: haberg-1 at telia.com (Hans Åberg) Date: Sat, 8 Oct 2016 15:28:02 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: <77F4CBD2-3C01-4D5C-9C46-D119B979C755@telia.com> > On 8 Oct 2016, at 12:03, Julian Bradfield wrote: > > I happen to think the whole math alphabet thing was a dumb > mistake. They are useful in mathematics, but other sciences may not use them. 
> But even if it isn't - and incidentally in some communities > there is or was a convention of using blackboard bold letters for > matrices, which justifies all of them -: The double-struck letters are popular among mathematicians. >> I believe the same logic applies to the case of linguistics, where the use >> of superscripts is a common convention. > > Either superscripts are being used mathematically, in which case you > can use mathematical markup, … The principle for adding stuff to Unicode, I think, was that the semantics should be expressible in a text-only file, modulo what the technology is able to express. For math, it is not known exactly what is required to express it semantically. TeX treats it as syntactic markup, for example, for superscripts and subscripts on the left-hand side, and tensor component notation. Rendering technologies have evolved, though, so from that point of view, more would be possible today. From ken.shirriff at gmail.com Sat Oct 8 10:24:59 2016 From: ken.shirriff at gmail.com (Ken Shirriff) Date: Sat, 8 Oct 2016 08:24:59 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: Looking at the image, the idea of the proposal is to include chess piece symbols in all four 90° rotations? Wouldn't it be better to do this in markup than in Unicode? I fear a combinatorial explosion if Unicode starts including all the possible orientations of characters. (Maybe there's a good reason to do this for chess; I'm just going off the image.) Ken On Fri, Oct 7, 2016 at 9:36 PM, Garth Wallace wrote: > Sorry about the blank reply. Itchy trigger finger. 
> > On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler wrote: > >> >> On 10/6/2016 12:44 PM, Garth Wallace wrote: >> >> Some representatives of the WFCC have proposed alternate arrangements >> that assume there will be a need for bitwise operations to convert between >> the existing chess symbols in the Miscellaneous Symbols block and related >> symbols in the new block. I don't see the need but maybe I'm missing >> something. >> >> >> I don't think you are missing anything. Bitwise operations would >> certainly *not* be needed in a case like this. Small lookup and mapping >> tables would suffice. >> >> --Ken >> >> > Thank you. > > Just to be clear, this is the proposed allocation as it stands: > http://i556.photobucket.com/albums/ss7/Garth_Wallace/ > proposed%20characters_zps81m0frvl.png > > That arrangement is the result of some discussion with a representative of > the WFCC. > > And here are the alternatives that another WFCC representative recently > proposed and that prompted my question: http://i556.photobucket.com/ > albums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png > -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Sat Oct 8 11:31:05 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sat, 8 Oct 2016 18:31:05 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: Markup for rotation is highly underdeveloped, and in this case for chess > it has its own semantics, it's not just a presentation feature, possibly > meant for playing on larger boards with more players than 2, and > distinguished just like there's a distinction between white and black, or > meant to signal some dangerous positions or candidate target positions for > the next moves. > I also see some additions like florettes, and elephants needed for > traditional Asian variants of the game, plus combined forms (e.g. > tower+horse) which are quite intriguing. 
There are also variants rotated 45 degrees. All those are not just meant for display on the grid of a board but in discussions about strategies. There are also combining notations added on top of chess pieces (e.g. numbering pawns that are otherwise identical, but in plain text you can still use notations with superscript digits or letters, distinguished clearly from the numbering of grid positions, or by adding some other punctuation marks). I still don't see in these images the elephants (or other pieces like immovable rocks or rivers, or special pieces added to create handicaps for one of the players). I've also seen some chess players using special queens by putting a pawn on top of another flat pawn, with more limited movements than a standard queen. There are also bishops/sorcerers/magicians, eagles, dragons, tigers/lions, rats, dogs/foxes, snakes, spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures... Chess games have a lot of variants with their supporters. Modern movies are also promoting some variants. 2016-10-08 17:24 GMT+02:00 Ken Shirriff : > Looking at the image, the idea of the proposal is to include chess piece > symbols in all four 90° rotations? Wouldn't it be better to do this in > markup than in Unicode? I fear a combinatorial explosion if Unicode starts > including all the possible orientations of characters. (Maybe there's a > good reason to do this for chess; I'm just going off the image.) > > Ken > > On Fri, Oct 7, 2016 at 9:36 PM, Garth Wallace wrote: > >> Sorry about the blank reply. Itchy trigger finger. >> >> On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler wrote: >> >>> >>> On 10/6/2016 12:44 PM, Garth Wallace wrote: >>> >>> Some representatives of the WFCC have proposed alternate arrangements >>> that assume there will be a need for bitwise operations to convert between >>> the existing chess symbols in the Miscellaneous Symbols block and related >>> symbols in the new block. 
I don't see the need but maybe I'm missing >>> something. >>> >>> >>> I don't think you are missing anything. Bitwise operations would >>> certainly *not* be needed in a case like this. Small lookup and mapping >>> tables would suffice. >>> >>> --Ken >>> >>> >> Thank you. >> >> Just to be clear, this is the proposed allocation as it stands: >> http://i556.photobucket.com/albums/ss7/Garth_Wallace/propose >> d%20characters_zps81m0frvl.png >> >> That arrangement is the result of some discussion with a representative >> of the WFCC. >> >> And here are the alternatives that another WFCC representative recently >> proposed and that prompted my question: http://i556.photobucket.com/al >> bums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jameskasskrv at gmail.com Sat Oct 8 12:57:41 2016 From: jameskasskrv at gmail.com (James Kass) Date: Sat, 8 Oct 2016 09:57:41 -0800 Subject: Noto unified font Message-ID: Google and Monotype unveil The Noto Project's unified font for all languages: https://techcrunch.com/2016/10/06/google-and-monotype-unveil-the-noto-projects-unified-font-for-all-languages/ About ten years or so ago, I recall being actively discouraged from working on the Code2xxx fonts because pan-Unicode fonts were passé, because there was no perceived need for displaying multilingual text in a coherent typeface, and that the optimal solution was for people to simply have multiple fonts targeting the users' required scripts. Ironic, isn't it? Best regards, James Kass -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From verdy_p at wanadoo.fr Sat Oct 8 14:08:07 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sat, 8 Oct 2016 21:08:07 +0200 Subject: Noto unified font In-Reply-To: References: Message-ID: Technically it is not a single font but a coherent collection of fonts made specifically for each script (some scripts having several national variants, notably for sinographs; most of them having two styles, except symbols; most of them having two weights, except symbols, which have a single weight, and sinograms, which have more...) So no, they are not "pan-Unicode". Each font in the collection however has its own metrics, best suited for each script, and they are still made to harmonize together (tested side-by-side with Latin and CJK) so they look great in multilingual documents. It would not have been possible in a single font anyway. 2016-10-08 19:57 GMT+02:00 James Kass : > Google and Monotype unveil The Noto Project's unified font for all > languages: > https://techcrunch.com/2016/10/06/google-and-monotype- > unveil-the-noto-projects-unified-font-for-all-languages/ > > About ten years or so ago, I recall being actively discouraged from > working on the Code2xxx fonts because pan-Unicode fonts were passé, because > there was no perceived need for displaying multilingual text in a coherent > typeface, and that the optimal solution was for people to simply have > multiple fonts targeting the users' required scripts. > > Ironic, isn't it? > > Best regards, > > James Kass > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charupdate at orange.fr Sat Oct 8 14:45:28 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 8 Oct 2016 21:45:28 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: <970158823.8821.1475955929007.JavaMail.www@wwinf1j26> On Fri, 07 Oct 2016 09:22:21 -0700, Doug Ewell wrote: > Marcel Schneider wrote: > >> According to my hypothesis and while waiting, I believe that >> the intent of the gap kept in the superscript lowercase range, >> is to maintain a limitation to the performance of plain text. >> I don't see very well how to apply Hanlon's razor here, because >> there seems to be a strong unwillingness to see people getting >> keyboards that allow them to write in plain text without being >> bound to high-end software. The goal seems to be to keep the users >> dependent on a special formatting feature and to draw them away >> from simplicity. > > Hanlon's Razor doesn't apply here, because it's not a dichotomy between > malice and stupidity. *If* the comment[1] on the proposal to encode *MODIFIER LETTER SMALL Q had the status of a newspaper article, I really *could* apply Hanlon's Razor, and the issue would be settled. Sadly it hasn't. More, this paper encloses the only *known* reason(s) why the UTC was drawn to reject the proposal. > > Unicode has a particular definition of what constitutes "plain text," > and it's become evident over the past 25 years that some people have > different definitions. That's probably never going to change (I > personally don't believe the difference between black-and-white pictures > of cows and color pictures of cows is a plain-text distinction), Unicode has added the distinction between text style and emoji style, and I never doubted that there are good reasons to do so. As I understand it, this allows multiplying the number of emoji without any expense of scalar values, for the streamlined implementation of an enhanced performance of plain text. 
There is a big forthcoming benefit for users all over the world, not just Latin script, or not just one language community. Or not just the international keyboard standard, if this is the point here. > but the Unicode definition is really the one that matters in discussions > like this. This is why the proposer did use it. Let's quote him: On 2010-07-13, Karl Pentzlin wrote:[2] >>> French abbreviations of single words often are done by showing >>> the last letter, phoneme, or syllable of the word as superscript, >>> instead of showing an abbreviation dot or similar. >>> As abbreviations of this kind are plain text, the abbreviation method >>> being a fixed convention like the use of punctuation marks, it is >>> desirable to have the possibility to use modifier letters in this case, >>> rather than to have to rely on markup or higher level protocols. The Unicode Standard says:[4] >>>> The relationship between appearance and content of plain text >>>> may be summarized as follows: >>>> Plain text must contain enough information to permit the text >>>> to be rendered legibly, and nothing more. >>>> The Unicode Standard encodes plain text. On 2010-08-10, Karl Pentzlin wrote:[3] >>> On the other hand: "Biblio^que" (abbreviation for French "Bibliothèque") >>> does not have the same meaning as "Biblioque" (no valid French word). >>> Thus, here the use of superscript carries semantics, and is therefore >>> plain text. > > What doesn't help, IMHO, is to claim that UTC has some ulterior motive > to restrict the applicability of plain text and manipulate users and > "draw them away from simplicity." I think insinuations of evil intent > need to be better-founded than that. First I wish to thank you for having posted this analysis, making me thus aware that the wording of my hypothesis was lacking clarity. The "unwillingness" that I've deciphered is NOT UTC's. 
I think that a clear distinction ought to be drawn between *the UTC* as a whole, whose motives in this case I've asked for and have not been given any idea of, while staying firmly convinced that it is always benevolent and eager to help all language communities to express themselves and to be recorded, and on the other hand some hypothetical kind of lobbying that led to produce the cited comment,[1] which in itself is enough to question the forces implied, and what interest they might have in keeping one language community away from fully unambiguous expression in plain text, and beyond, in not supporting the work of ISO/IEC SC35/WG1[5] for enhancement and completion of the international keyboard standard. There is also a *really long* answer in my (plain) text editor. It's finally not sent to the Unicode Mailing List. /*except on request*/ Regards, Marcel [1] The comment on the proposal: http://www.unicode.org/L2/L2010/10315-comment.pdf [2] The proposal: http://www.unicode.org/L2/L2010/10230-modifier-q.pdf [3] The proposer's comment on the comment and the proposal: http://www.unicode.org/L2/L2010/10316-cmts.pdf [4] On page 19 of TUS 9.0. [5] On Mon Jan 04 2010 - 19:37:45 CST, Karl Pentzlin wrote: > Microsoft is to be praised for its engagement in providing localized > variants of its operating system and other software, thus supporting > the cultural diversity. It is a pity that the company did not accept > the invitation to participate in the special area covered by ISO/IEC > SC35/WG1, to support their own goals there. 
Please read full discussion: http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0040.html From luke at dashjr.org Sat Oct 8 18:44:03 2016 From: luke at dashjr.org (Luke Dashjr) Date: Sat, 8 Oct 2016 23:44:03 +0000 Subject: Noto unified font In-Reply-To: References: Message-ID: <201610082344.04995.luke@dashjr.org> On Saturday, October 08, 2016 5:57:41 PM James Kass wrote: > Google and Monotype unveil The Noto Project's unified font for all > languages: > https://techcrunch.com/2016/10/06/google-and-monotype-unveil-the-noto-proje > cts-unified-font-for-all-languages/ It's unfortunate they released it under the non-free OFL license. :( From samjnaa at gmail.com Sat Oct 8 18:50:40 2016 From: samjnaa at gmail.com (Shriramana Sharma) Date: Sun, 9 Oct 2016 05:20:40 +0530 Subject: Noto unified font In-Reply-To: <201610082344.04995.luke@dashjr.org> References: <201610082344.04995.luke@dashjr.org> Message-ID: Interested to know why you think OFL is non-free... On 9 Oct 2016 05:18, "Luke Dashjr" wrote: > On Saturday, October 08, 2016 5:57:41 PM James Kass wrote: > > Google and Monotype unveil The Noto Project's unified font for all > > languages: > > https://techcrunch.com/2016/10/06/google-and-monotype- > unveil-the-noto-proje > > cts-unified-font-for-all-languages/ > > It's unfortunate they released it under the non-free OFL license. :( > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke at dashjr.org Sat Oct 8 19:00:33 2016 From: luke at dashjr.org (Luke Dashjr) Date: Sun, 9 Oct 2016 00:00:33 +0000 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> Message-ID: <201610090000.35037.luke@dashjr.org> It forbids sale of the font by itself. (I'm aware the FSF thinks there's a loophole by bundling "hello world", but I don't think this would hold up in court.) On Saturday, October 08, 2016 11:50:40 PM Shriramana Sharma wrote: > Interested to know why you think OFL is non-free... 
> > On 9 Oct 2016 05:18, "Luke Dashjr" wrote: > > On Saturday, October 08, 2016 5:57:41 PM James Kass wrote: > > > Google and Monotype unveil The Noto Project's unified font for all > > > languages: > > > https://techcrunch.com/2016/10/06/google-and-monotype-> > > > unveil-the-noto-proje > > > > > cts-unified-font-for-all-languages/ > > > > It's unfortunate they released it under the non-free OFL license. :( From samjnaa at gmail.com Sat Oct 8 19:16:37 2016 From: samjnaa at gmail.com (Shriramana Sharma) Date: Sun, 9 Oct 2016 05:46:37 +0530 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <201610090000.35037.luke@dashjr.org> Message-ID: That's your definition of non-free then... If I were a font developer and of mind to release my font for use without charge, I wouldn't want anyone else to make money out of selling it when I myself - who put the effort into preparing it - don't make money from selling it. So it protects the moral rights of the developer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jameskasskrv at gmail.com Sat Oct 8 19:20:20 2016 From: jameskasskrv at gmail.com (James Kass) Date: Sat, 8 Oct 2016 16:20:20 -0800 Subject: Noto unified font In-Reply-To: References: Message-ID: Philippe Verdy wrote, > Technically it is not a single font but a coherent collection of fonts made > specifically for each script ... In a constantly changing world, it should be a pleasant experience to be reminded that some things remain constant. Whether the Noto font family is released as one file or many, it seems that somebody considers it a worthwhile endeavor. Longtime Unicode proponents remember when complex script shaping (for example) wasn't supported. Nowadays, thanks in good part to Unicode pioneers, most everything just works "right out of the box". As it should. 
With the advent of the Noto font (or font collection), users have the option of getting a reasonable display of desired characters rather than strings of boxes or last-resort fallbacks. That's also as it should be, IMHO. Best regards, James Kass On Sat, Oct 8, 2016 at 11:08 AM, Philippe Verdy wrote: > Technically it is not a single font but a coherent collection of fonts made > specifically for each script (some scripts having several national variants, > notably for sinographs, most of them having two styles except symbols, most > of them having two weights, except symbols that have a single weight and > sinograms having more...) > > So no they are not "pan-Unicode". Each font in the collection however has > its own metrics, best suited for each script, and they are still made to > harmonize together (tested side-by-side with Latin and CJK) so they look > great in multilingual documents. It would not have been possible in a single > font anyway. > > > 2016-10-08 19:57 GMT+02:00 James Kass : >> >> Google and Monotype unveil The Noto Project's unified font for all >> languages: >> >> https://techcrunch.com/2016/10/06/google-and-monotype-unveil-the-noto-projects-unified-font-for-all-languages/ >> >> About ten years or so ago, I recall being actively discouraged from >> working on the Code2xxx fonts because pan-Unicode fonts were passé, because >> there was no perceived need for displaying multilingual text in a coherent >> typeface, and that the optimal solution was for people to simply have >> multiple fonts targeting the users' required scripts. >> >> Ironic, isn't it? >> >> Best regards, >> >> James Kass > > From gwalla at gmail.com Sat Oct 8 20:02:56 2016 From: gwalla at gmail.com (Garth Wallace) Date: Sat, 8 Oct 2016 18:02:56 -0700 Subject: Bit arithmetic on Unicode characters? 
In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: On Sat, Oct 8, 2016 at 9:31 AM, Philippe Verdy wrote: > Markup for rotation is highly underdeveloped, and in this case for chess > it has its own semantics, it's not just a presentation feature, possibly > meant for playing on larger boards with more players than 2, and > distinguished just like there's a distinction between white and black, or > meant to signal some dangerous positions or candidate target positions for > the next moves. > Not exactly. Rotation of chess piece symbols is not a presentation feature (at least as I understand the term), and isn't meant for use with multiplayer games. The rotated pieces are used in chess problems, specifically heterodox or "fairy chess" problems, where they stand in for non-standard pieces. A rotated rook, for instance, means "a piece that is not a rook but is similar in some respects"; which piece it represents specifically depends on context. Conventionally, the upside-down queen represents a "grasshopper" and the upside-down knight a "nightrider", but otherwise they are assigned on a problem-by-problem basis. This practice dates back to the early 20th century and was originally so that problem composers wouldn't have to cut new type for every new piece they invented, but is now traditional. I also see some additions like florettes, and elephants needed for > traditional Asian variants of the game, plus combined forms (e.g. > tower+horse) which are quite intriguing. > There are also variants rotated 45 degrees. > The florettes are also used in problems, as are the equihoppers (the symbol that looks a bit like a bow tie or spindle). The compound symbols are found in problems and in several common variants such as Capablanca Chess and Grand Chess. The jester's cap is similar. The elephant and fers are used in shatranj or medieval chess. > All those are not just meant for display on the grid of a board but in > discussions about strategies. 
There are also combining notations added on > top of chess pieces (e.g. numbering pawns that are otherwise identical, but > in plain text you can still use notations with superscript digits or > letters, distinguished clearly from the numbering of grid positions, or by > adding some other punctuation marks). > I haven't encountered that. It's rarely necessary to differentiate individual pawns in notation: their moves are so limited that it's usually obvious which pawn is moving, and there is a standard method of disambiguating moves by starting square if needed. > I still don't see in these images the elephants (or other pieces like > unmovable rocks or rivers, or special pieces added to create handicaps for > one of the players). I've also seen some chess players using special queens > by putting a pawn on top of another flat pawn, with more limited movements > than a standard queen. There are also bishops/sorcerers/magicians, eagles, > dragoons, tigers/lions, rats, dogs/foxes, snakes, > spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures... > Chess games have a lot of variants with their supporters. Modern movies are > also promoting some variants. > There are elephants in the proposal, using a shape found in medieval manuscripts. Rocks and rivers are board features and not found in notation. > > 2016-10-08 17:24 GMT+02:00 Ken Shirriff : >> >> Looking at the image, the idea of the proposal is to include chess piece >> symbols in all four 90° rotations? Wouldn't it be better to do this in >> markup than in Unicode? I fear a combinatorial explosion if Unicode starts >> including all the possible orientations of characters. (Maybe there's a >> good reason to do this for chess; I'm just going off the image >> >> .) >> > The proposal covers this. These have a well-established use in chess notation, which doesn't apply to non-chess symbols. Markup would be the wrong way to do this. 
It's not like, say, electronic schematics where a diode symbol may be found in any orientation but still always represents a diode: a rotated queen symbol is specifically *not a queen* but another piece entirely. Currently, fairy chess problemists rely on font hacks and PDFs (even for relatively short texts). -------------- next part -------------- An HTML attachment was scrubbed... URL: From leoboiko at namakajiri.net Sat Oct 8 21:02:56 2016 From: leoboiko at namakajiri.net (Leonardo Boiko) Date: Sat, 8 Oct 2016 23:02:56 -0300 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <201610090000.35037.luke@dashjr.org> Message-ID: That's not "his" definition of non-free. Restrictions on selling copies commercially violate the Free Software Foundation's definition of free software: https://www.gnu.org/philosophy/free-sw.html https://www.gnu.org/licenses/license-list.html#NonFreeSoftwareLicenses And also the Open Source Initiative's definition of free software: https://opensource.org/osd-annotated https://opensource.org/faq#commercial And also the Debian project's definition of free software: https://www.debian.org/social_contract#guidelines In short, every single major free software organization requires free software to allow the user complete freedom of redistribution, commercial or otherwise. Otherwise the software isn't free in the sense of giving the user freedom; it is merely free of charge. 2016-10-08 21:16 GMT-03:00 Shriramana Sharma : > That's your definition of non-free then... If I were a font developer and > of mind to release my font for use without charge, I wouldn't want anyone > else to make money out of selling it when I myself - who put the effort > into preparing it - don't make money from selling it. So it protects the > moral rights of the developer. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From luke at dashjr.org Sat Oct 8 21:50:42 2016 From: luke at dashjr.org (Luke Dashjr) Date: Sun, 9 Oct 2016 02:50:42 +0000 Subject: Noto unified font In-Reply-To: <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> Message-ID: <201610090250.44483.luke@dashjr.org> On Sunday, October 09, 2016 12:08:05 AM Harshula wrote: > On 09/10/16 10:44, Luke Dashjr wrote: > > It's unfortunate they released it under the non-free OFL license. :( > > Which alternate license would you recommend? MIT license or LGPL seem reasonable and common among free fonts. Some also choose GPL, but AFAIK it's unclear how the LGPL vs GPL differences apply to fonts. On Sunday, October 09, 2016 12:16:37 AM you wrote: > That's your definition of non-free then... If I were a font developer and > of mind to release my font for use without charge, I wouldn't want anyone > else to make money out of selling it when I myself - who put the effort > into preparing it - don't make money from selling it. So it protects the > moral rights of the developer. It's the standard definition of free software. https://www.gnu.org/philosophy/selling.en.html From harshula at hj.id.au Sat Oct 8 19:08:05 2016 From: harshula at hj.id.au (Harshula) Date: Sun, 9 Oct 2016 11:08:05 +1100 Subject: Noto unified font In-Reply-To: <201610082344.04995.luke@dashjr.org> References: <201610082344.04995.luke@dashjr.org> Message-ID: <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> On 09/10/16 10:44, Luke Dashjr wrote: > It's unfortunate they released it under the non-free OFL license. :( Which alternate license would you recommend? 
cya, # From harshula at hj.id.au Sat Oct 8 22:35:36 2016 From: harshula at hj.id.au (Harshula) Date: Sun, 9 Oct 2016 14:35:36 +1100 Subject: Noto unified font In-Reply-To: <201610090250.44483.luke@dashjr.org> References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> Message-ID: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> On 09/10/16 13:50, Luke Dashjr wrote: > On Sunday, October 09, 2016 12:08:05 AM Harshula wrote: >> On 09/10/16 10:44, Luke Dashjr wrote: >>> It's unfortunate they released it under the non-free OFL license. :( FSF appears to classify OFL as a Free license (though incompatible with the GNU GPL & FDL): https://www.gnu.org/licenses/license-list.en.html#Fonts >> Which alternate license would you recommend? > > MIT license or LGPL seem reasonable and common among free fonts. Some also > choose GPL, but AFAIK it's unclear how the LGPL vs GPL differences apply to > fonts. Interestingly, Noto project saw advantages of OFL and moved to using it, not too long ago: https://github.com/googlei18n/noto-fonts/blob/master/NEWS It seems you disagree with FSF's interpretation of the OFL and bundling Hello World as being sufficient. Are there other reasons for your preference for MIT/LGPL/GPL over OFL? > On Sunday, October 09, 2016 12:16:37 AM you wrote: >> That's your definition of non-free then... If I were a font developer and >> of mind to release my font for use without charge, I wouldn't want anyone >> else to make money out of selling it when I myself - who put the effort >> into preparing it - don't make money from selling it. So it protects the >> moral rights of the developer. Why are you attributing Shriramana Sharma's email to me? It might be clearer if you replied to his email. 
cya, # From verdy_p at wanadoo.fr Sat Oct 8 23:21:32 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 06:21:32 +0200 Subject: Noto unified font In-Reply-To: References: Message-ID: 2016-10-09 2:20 GMT+02:00 James Kass : > Philippe Verdy wrote, > > > Technically it is not a single font but a coherent collection of fonts > made > > specifically for each script ... > > In a constantly changing world, it should be a pleasant experience to > be reminded that some things remain constant. > > Whether the Noto font family is released as one file or many, it seems that > somebody considers it a worthwhile endeavor. > The major reason there are several fonts and not just one is because not all scripts have the same variants and styles (and it's not a defect of the design). And there are different requirements, for example allowing a choice between color or monochromatic emojis, or using standard (narrow) Latin from Noto Sans, or wider variants of Latin for CJK: in a stylesheet you can still customize the order even if Noto Sans will be part of all sets of families. Some variants don't make sense at all for Arabic (sans-serif and serif, but they are replaced by two traditional variants of the script); monospaced fonts are also not available for Arabic (they exist but are extremely poor), or for many Indic scripts. The purpose is not to invent new designs but to present designs that are easily read and convenient for each script (and that's why there are also more weights in the CJK fonts; for Latin, additional weights may be directly inferred from the two standard weights; maybe later there will be Latin/Greek/Cyrillic with more weights, but the need was less urgent than for CJK due to the complexity of making it readable while still preserving a coherent overall blackness/contrast). Maybe some fonts in this set could be merged, e.g. the Cherokee font could be merged with the Latin/Greek/Cyrillic font. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Sat Oct 8 23:37:24 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 06:37:24 +0200 Subject: Noto unified font In-Reply-To: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: The licence itself says it respects the 4 FSF freedoms. It also explicitly allows reselling (rule DFSG #1): http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL It is not directly compatible with the GPL in a composite product, but with LGPL there's no problem, and there's no problem if the font is clearly separable and distributed along with its licence, even if the software coming with it or the package containing it is commercial: you are allowed to detach it from the package and redistribute. Really you are challenging the licence for unfair reasons. Maybe you just think that the GPL or MIT licences are enough. Or you'd like the Public Domain (which in fact offers no protection and no long-term warranty, as it can be re-appropriated at any time by proprietary licences, even retrospectively; we see every day companies registering properties on pseudo-new technologies that are in fact inherited from the past and have been used for centuries or more by the whole of humanity; they leave some space only for today's current usages in limited scopes, but protect everything else by inventing strange concepts around the basic feature, with unfair claims, and then want to collect taxes). Also, an international public domain does not exist at all (it is always restricted by new additions to the copyright laws). Publishing something in the Public Domain is really unsafe. 
2016-10-09 5:35 GMT+02:00 Harshula : > On 09/10/16 13:50, Luke Dashjr wrote: > > On Sunday, October 09, 2016 12:08:05 AM Harshula wrote: > >> On 09/10/16 10:44, Luke Dashjr wrote: > >>> It's unfortunate they released it under the non-free OFL license. :( > > FSF appears to classify OFL as a Free license (though incompatible with > the GNU GPL & FDL): > https://www.gnu.org/licenses/license-list.en.html#Fonts > > >> Which alternate license would you recommend? > > > > MIT license or LGPL seem reasonable and common among free fonts. Some > also > > choose GPL, but AFAIK it's unclear how the LGPL vs GPL differences apply > to > > fonts. > > Interestingly, Noto project saw advantages of OFL and moved to using it, > not too long ago: > https://github.com/googlei18n/noto-fonts/blob/master/NEWS > > It seems you disagree with FSF's interpretation of the OFL and bundling > Hello World as being sufficient. Are there other reasons for your > preference for MIT/LGPL/GPL over OFL? > > > On Sunday, October 09, 2016 12:16:37 AM you wrote: > >> That's your definition of non-free then... If I were a font developer > and > >> of mind to release my font for use without charge, I wouldn't want > anyone > >> else to make money out of selling it when I myself - who put the effort > >> into preparing it - don't make money from selling it. So it protects the > >> moral rights of the developer. > > Why are you attributing Shriramana Sharma's email to me? It might be > clearer if you replied to his email. > > cya, > # > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jameskasskrv at gmail.com Sun Oct 9 00:26:11 2016 From: jameskasskrv at gmail.com (James Kass) Date: Sat, 8 Oct 2016 21:26:11 -0800 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: Philippe Verdy wrote, > The purpose is not to invent new designs but present designs > that are easily read and convenient for each script ... Based on what I've seen so far, Monotype has done a splendid job. No doubt involving plenty of design work. Philippe Verdy has outlined some of the design decisions already, and it should be noted that designing a pan-Unicode font (or font collection) for multilingual text display using easily read script-conventional glyphs probably isn't as easy as it sounds. The word "free" when applied to any product means "free of charge". "Freeware" appears to be a contraction of "free software". If so, the two terms are identical in meaning. If not, speakers of standard English would consider them so. It's too bad the promoters of "free-libre" software didn't call it "libre". Creating an artificial distinction between identical terms in order to promote a philosophy some reject smacks of Newspeak. Best regards, James Kass From luke at dashjr.org Sun Oct 9 01:17:57 2016 From: luke at dashjr.org (Luke Dashjr) Date: Sun, 9 Oct 2016 06:17:57 +0000 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: <201610090617.59735.luke@dashjr.org> On Sunday, October 09, 2016 4:37:24 AM Philippe Verdy wrote: > The licence itself says it respects the 4 FSF freedoms. It also explicitly > allows reselling (rule DFSG #1): > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL No, it doesn't. That link is just a commentary, and of no relevance to non- SIL-owned fonts. 
The actual license itself begins with the problematic restriction: 1) Neither the Font Software nor any of its individual components, in Original or Modified Versions, may be sold by itself. > It is not directly compatible with the GPL in a composite product, but with > LGPL there's no problem, LGPL doesn't work that way. It allows other software to use it without being compatible, but any component or dependency of the LGPL'd software must meet the same requirements as the GPL. > Really you are challenging the licence for unfair reasons What unfair reasons are those? My *only* concern is that it is not free. I don't even care to sell the fonts myself, but simply do not use non-free software on principle. Luke From prosfilaes at gmail.com Sun Oct 9 01:36:43 2016 From: prosfilaes at gmail.com (David Starner) Date: Sun, 09 Oct 2016 06:36:43 +0000 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: On Sat, Oct 8, 2016 at 11:07 PM James Kass wrote: > The word "free" when applied to any product means "free of charge". > Using the word "product" sort of biases your argument, does it not? "Freeware" appears to be a contraction of "free software". If so, the > two terms are identical in meaning. That's bad lexicography. A "PC" is not merely a computer that is personal. "software" is not "ware" that is "soft". The first use of the word freeware was in late 1982, and the term "free software" was used in InfoWorld in 1983 to refer to public domain software. The distinction has been around for a long time. It's too bad the promoters of > "free-libre" software didn't call it "libre". Creating an artificial > distinction between identical terms in order to promote a philosophy > some reject smacks of Newspeak. > 
That is one of the meanings of "free" in English. English is a large, confusing language with many communities with their own jargon, and for 30 years "free software" has referred to software that can be used without restriction on changing and reselling in certain English-speaking communities. Like British/American disagreements, it seems to be a problem more frequently of people getting annoyed than people getting confused. -------------- next part -------------- An HTML attachment was scrubbed... URL: From liste at secarica.ro Sun Oct 9 03:15:52 2016 From: liste at secarica.ro (Cristian Secară) Date: Sun, 9 Oct 2016 11:15:52 +0300 Subject: Noto unified font In-Reply-To: <201610090000.35037.luke@dashjr.org> References: <201610082344.04995.luke@dashjr.org> <201610090000.35037.luke@dashjr.org> Message-ID: <20161009111552.86e86c61201dfb753e0b778c@secarica.ro> On Sun, 9 Oct 2016 00:00:33 +0000, Luke Dashjr wrote: > It forbids sale of the font by itself. I would say "big deal". A font belongs merely to the "cultural" side of a project or product. In this area it is better to discourage any commercial interests in order to better serve the cultural aspects and avoid any [artificial] obstacles. So, I fail to understand why forbidding the sale of the font by itself is a problem or a bad thing. On the contrary! Cristi -- Cristian Secară http://www.secărică.ro From dzo at bisharat.net Sun Oct 9 05:05:18 2016 From: dzo at bisharat.net (dzo at bisharat.net) Date: Sun, 9 Oct 2016 10:05:18 +0000 Subject: Noto unified font In-Reply-To: References: Message-ID: <1492952671-1476007520-cardhu_decombobulator_blackberry.rim.net-289303052-@b13.c1.bise6.blackberry> James, Any thoughts about a Code 2xxx suite/family based on all the work you've already done? All, A tangential question wrt the history of computer font development: What kind of collections / repositories of old fonts are there? 
In particular, thinking of pre-Unicode "special fonts" including hacks for languages written with extended Latin characters. I understand that Chantal Enguehard has a collection of 8-bit fonts developed for African languages. Are there others? Any thoughts about a "museum" of fonts and encodings? Could have educational value in the future. Don Osborn Sent via BlackBerry by AT&T -----Original Message----- From: James Kass Sender: "Unicode" Date: Sat, 8 Oct 2016 16:20:20 To: Unicode Public Subject: Re: Noto unified font Philippe Verdy wrote, > Technically it is not a single font but a coherent collection of fonts made > specifically for each script ... In a constantly changing world, it should be a pleasant experience to be reminded that some things remain constant. Whether the Noto font family is released as one file or many, it seems that somebody considers it a worthwhile endeavor. Longtime Unicode proponents remember when complex script shaping (for example) wasn't supported. Nowadays, thanks in good part to Unicode pioneers, most everything just works "right out of the box". As it should. With the advent of the Noto font (or font collection), users have the option of getting a reasonable display of desired characters rather than strings of boxes or last resort fallbacks. That's also as it should be, IMHO. Best regards, James Kass On Sat, Oct 8, 2016 at 11:08 AM, Philippe Verdy wrote: > Technically it is not a single font but a coherent collection of fonts made > specifically for each script (some scripts having several national variants, > notably for sinographs, most of them having two styles except symbols, most > of them having two weights, except symbols that have a single weight and > sinograms having more...) > > So no they are not "pan-Unicode". 
Each font in the collection however has > its own metrics, best suited for each script, and they are still made to > harmonize together (tested side-by-side with Latin and CJK) so they look > great in multilingual documents. It would have not been possible in a single > font anyway. > > > 2016-10-08 19:57 GMT+02:00 James Kass : >> >> Google and Monotype unveil The Noto Project's unified font for all >> languages: >> >> https://techcrunch.com/2016/10/06/google-and-monotype-unveil-the-noto-projects-unified-font-for-all-languages/ >> >> About ten years or so ago, I recall being actively discouraged from >> working on the Code2xxx fonts because pan-Unicode fonts were passé, because >> there was no perceived need for displaying multilingual text in a coherent >> typeface, and that the optimal solution was for people to simply have >> multiple fonts targeting the users' required scripts. >> >> Ironic, isn't it? >> >> Best regards, >> >> James Kass > > From mark at macchiato.com Sun Oct 9 06:00:30 2016 From: mark at macchiato.com (Mark Davis ☕️) Date: Sun, 9 Oct 2016 13:00:30 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: Essentially all of the game pieces that are in Unicode were added for compatibility with existing character sets. I'm guessing that there are hundreds to thousands of possible other symbols associated with games in one way or another, or that could be dug out of instruction manuals (eg, http://www.catan.com/files/downloads/catan_5th_ed_rules_eng_150303.pdf). (Many of those would be encumbered by copyright issues, but there are no doubt others that would not.) I would recommend that any proposal for additional game symbols provide clear evidence for why those particular game symbols are required to be exchanged in plain text, in a way that many, many other possible game symbols are not. 
Mark On Sun, Oct 9, 2016 at 3:02 AM, Garth Wallace wrote: > On Sat, Oct 8, 2016 at 9:31 AM, Philippe Verdy wrote: > >> Markup for rotation is highly underdeveloped, and in this case for chess >> it has its own semantics, it's not just a presentation feature, possibly >> meant for playing on larger boards with more players than 2, and >> distinguished just like there's a distinction between white and black, or >> meant to signal some dangerous positions or candidate target positions for >> the next moves. >> > > Not exactly. Rotation of chess piece symbols is not a presentation feature > (at least as I understand the term), and isn't meant for use with > multiplayer games. The rotated pieces are used in chess problems, > specifically heterodox or "fairy chess" problems, where they stand in for > non-standard pieces. A rotated rook, for instance, means "a piece that is > not a rook but is similar in some respects"; which piece it represents > specifically depends on context. Conventionally, the upside-down queen > represents a "grasshopper" and the upside-down knight a "nightrider", but > otherwise they are assigned on a problem-by-problem basis. This practice > dates back to the early 20th century and was originally so that problem > composers wouldn't have to cut new type for every new piece they invent but > is now traditional. > > I also see some additions like florettes, and elephants needed for >> traditional Asian variants of the game, plus combined forms (e.g. >> tower+horse) which are quite intriguing. >> There are also variants rotated 45 degrees. >> > > The florettes are also used in problems, as are the equihoppers (the > symbol that looks a bit like a bow tie or spindle). The compound symbols > are found in problems and in several common variants such as Capablanca > Chess and Grand Chess. The jester's cap is similar. The elephant and fers > are used in shatranj or medieval chess. 
> > >> All those are not just meant for display on the grid of a board but in >> discussions about strategies. There are also combining notations added on >> top of chess pieces (e.g. numbering pawns that are otherwise identical, but >> in plain text you can still use notations with superscript digits or >> letters, distinguished clearly from the numbering of grid positions, or by >> adding some other punctuation marks). >> > > I haven't encountered that. It's rarely necessary to differentiate > individual pawns in notation: their moves are so limited that it's usually > obvious which pawn is moving, and there is a standard method of > disambiguating moves by starting square if needed. > > >> I still don't see in these images the elephants (or other pieces like >> unmovable rocks or rivers, or special pieces added to create handicaps for >> one of the players). I've also seen some chess players using special queens >> by putting a pawn on top of another flat pawn, with more limited movements >> than a standard queen. There are also bishops/sorcerers/magicians, eagles, >> dragoons, tigers/lions, rats, dogs/foxes, snakes, >> spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures... >> Chess games have a lot of variants with their supporters. Modern movies are >> also promoting some variants. >> > > There are elephants in the proposal, using a shape found in medieval > manuscripts. Rocks and rivers are board features and not found in notation. > > >> >> 2016-10-08 17:24 GMT+02:00 Ken Shirriff : >> >>> >>> Looking at the image, the idea of the proposal is to include chess piece >>> symbols in all four 90° rotations? Wouldn't it be better to do this in >>> markup than in Unicode? I fear a combinatorial explosion if Unicode starts >>> including all the possible orientations of characters. (Maybe there's a >>> good reason to do this for chess; I'm just going off the image >>> >>> .) >>> >> > The proposal covers this. 
These have a well-established use in chess > notation, which doesn't apply to non-chess symbols. Markup would be the > wrong way to do this. It's not like, say, electronic schematics where a > diode symbol may be found in any orientation but still always represents a > diode: a rotated queen symbol is specifically *not a queen* but another > piece entirely. > > Currently, fairy chess problemists rely on font hacks and PDFs (even for > relatively short texts). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Sun Oct 9 06:28:55 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 13:28:55 +0200 Subject: Noto unified font In-Reply-To: <201610090617.59735.luke@dashjr.org> References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> Message-ID: 2016-10-09 8:17 GMT+02:00 Luke Dashjr : > On Sunday, October 09, 2016 4:37:24 AM Philippe Verdy wrote: > > The licence itself says it respects the 4 FSF freedoms. It also > explicitly > > allows reselling (rule DFSG #1): > > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL > > No, it doesn't. That link is just a commentary, and of no relevance to non- > SIL-owned fonts. > The link is the one directly used on the Noto description page when it refers to the OFL licence. It is not saying that it is only for SIL-owned fonts. Google/Monotype would have linked to another page if needed but this is the most relevant one explicitly stated by Google on the Noto site. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oren.watson at gmail.com Sun Oct 9 07:14:50 2016 From: oren.watson at gmail.com (Oren Watson) Date: Sun, 9 Oct 2016 08:14:50 -0400 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> Message-ID: I am disappointed with Noto Mono, which covers only the Latin script and not Greek or Cyrillic, which most existing monospace fonts do cover. On Sun, Oct 9, 2016 at 7:28 AM, Philippe Verdy wrote: > > > 2016-10-09 8:17 GMT+02:00 Luke Dashjr : > >> On Sunday, October 09, 2016 4:37:24 AM Philippe Verdy wrote: >> > The licence itself says it respects the 4 FSF freedoms. It also >> explicitly >> > allows reselling (rule DFSG #1): >> > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL >> >> No, it doesn't. That link is just a commentary, and of no relevance to >> non- >> SIL-owned fonts. >> > > The link is the one directly used on the Noto description page when it > refers to the OFL licence. It is not saying that it is only for SIL-owned > fonts. Google/Monotype would have linked to another page if needed but this > is the most relevant one explicitly stated by Google on the Noto site. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From haberg-1 at telia.com Sun Oct 9 08:01:09 2016 From: haberg-1 at telia.com (Hans Åberg) Date: Sun, 9 Oct 2016 15:01:09 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: > On 9 Oct 2016, at 13:00, Mark Davis ☕️ wrote: > > Essentially all of the game pieces that are in Unicode were added for compatibility with existing character sets. I'm guessing that there are hundreds to thousands of possible other symbols associated with games in one way or another, There is http://www.chessvariants.com/. 
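[Editor's note: for reference in this thread, the chess piece symbols already in Unicode, which Mark Davis notes were added for compatibility with existing character sets, occupy U+2654..U+265F in the Miscellaneous Symbols block. A minimal Python sketch listing them; the rotated "fairy chess" pieces under discussion are not in this range:]

```python
import unicodedata

# The twelve classic chess piece symbols encoded in the
# Miscellaneous Symbols block (U+2654..U+265F): white and black
# king, queen, rook, bishop, knight, and pawn.
for cp in range(0x2654, 0x2660):
    ch = chr(cp)
    print(f"U+{cp:04X} {ch} {unicodedata.name(ch)}")
# first line printed: U+2654 ♔ WHITE CHESS KING
```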
From charupdate at orange.fr Sun Oct 9 08:25:25 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 9 Oct 2016 15:25:25 +0200 (CEST) Subject: Bit arithmetic on Unicode characters? / Re: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: <882230670.5591.1476019525557.JavaMail.www@wwinf1p27> On Sun, 9 Oct 2016 13:00:30 +0200, Mark Davis ☕️ wrote: […] > > I would recommend that any proposal for additional game symbols provide > clear evidence for why those particular game symbols are required to be > exchanged in plain text, in a way that many, many other possible game > symbols are not. I missed this point: “are required to be EXCHANGED in plain text.” Would it be possible to add this as a requirement into the relevant section of TUS, please? Indeed I can’t see any need to feed those French abbreviations into a plain text data exchange. We’d rather write them out, or use the common acronyms: “BN” for “Bibliothèque Nationale” [National Library]; “BM” for “Bibliothèque Municipale” [City Library]. However, when it comes to abbreviating “bibliothèque” or other words ending in “-que” in plain text, one step I think we could take towards disambiguation is to emit a *new* recommendation for the abbreviation dot, which *is* already used in “M.” for “Monsieur” [Mister], and also in “cf.” and other Latin abbreviations. So in plain text one could write either “Biblio.que” or “Bib.que” for “Bibliothèque” [Library]. While the official rejection rationale of *MODIFIER LETTER SMALL Q is still missing, I can now believe that it reiterated the recommendation to use markup, the more so as MS Word does not mess up line spacing when superscript formatting is applied, and as this is better-looking in Tahoma than modifier letters when used to express the semantics of an abbreviation indicator or ordinal indicator. I’ve run a test on “M^gr”, for “Monseigneur” [Monsignor], and on “3^e”. 
To avoid process garbage, I've made the results available on-line.[1] What got me really started was the bizarre “Comment” on the Proposal to encode *MODIFIER LETTER SMALL Q. What I can do now is to suggest applying some kind of quality management on both sides, so that corporate officials refrain from publishing sloppy ad-hoc papers for consideration by the UTC, and Unicode won't be reduced to accepting all and everything for archiving in the Document Register. I believe that this could be a practicable way to keep other people from getting bugged. Regards, Marcel [1] Interested subscribers are welcome to view the screenshot from: http://dispoclavier.com/French-abbrev-super-vs-modif.png and to open the Word document from: http://dispoclavier.com/French-abbrev-super-vs-modif.docx From verdy_p at wanadoo.fr Sun Oct 9 09:14:50 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 16:14:50 +0200 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> Message-ID: This was not the first priority of the project, I think. Monospace fonts were used for text input in web forms, but this old use is now fading, except probably for CJK, due to poor readability and design and the inability to handle many scripts. Monospace fonts are still used for programming languages, where code is almost always in Latin and translatable content is preferably stored in external resources. For editing the external resources there's no need of complex data structures; the format is most often linear, and you don't need any monospace font. But there are still programs created mixing code and UI text in static strings, and some limited usages in internationalized regexps (which are a sort of programming language with complex rules). I suggest that such editors should have an interface to switch instantly between a monospace and a normal font.
There are decent text editors that are friendly with Latin/CJK monospace fonts and proportional fonts for other scripts or symbols. And the Noto project is not finished: - Its monospace font can still be improved to cover more than just Latin and general punctuation. - Adding Cyrillic, Greek, and a few other scripts that work well in monospace styles (e.g. Hebrew, possibly Georgian and Armenian, or even Cherokee) would seem a good future goal (monospace fonts for Arabic are mostly horrible, except in very creative/fancy designs, even if the Arabic script is very flexible using long joining, though some complex ligatures don't fit well in a character cell). - However it is really not needed for CJK scripts (which already have their own fonts with monospace metrics), including the Japanese kanas and Bopomofo (as well as mappings for subsets of Latin/Greek/Cyrillic inherited from legacy non-Unicode charsets). But another project should now target more urgent needs: fonts with excellent typographic features for printing, advertising, and titling, to be used for finalized publications (printed or in PDFs), fonts that would be beautiful, that would better reproduce the best handwritten/painted artworks, or that would restore the best typographic traditions used for centuries. People are now starting to rediscover the beauty of these traditions, but rarely with solutions that are usable with our modern languages, which use a richer repertoire of characters (many borrowed directly from other scripts or languages), so the best-looking fonts are only designed for some limited languages (most often the major European languages, but frequently only Basic English and Classical Latin or Greek): - the serif-style fonts still need extensions of their coverage (I think this is more urgent than the monospace styles). I also like the fact that the Noto project opted for distinguishing the two major traditions for the Arabic script.
About every year there's an updated version of the set, but most often this occurs due to the extension of the universal repertoire (and it is easier to separate the designs per script, as that eases the updating process and the tests when they are just extended with some new characters, new encoded variants, or new pairs with diacritics or complex ligatures and layouts for Indic scripts). And in fact I'd like Windows Update to also include this distribution (independently of the many legacy fonts for MS Office). For now Noto Sans still competes with the "Segoe" families made for the Windows UI, but those have a limited coverage. (Maybe Noto should be installed by default with Chrome and Safari, probably also with the JRE/JDK for Java.) It is highly preferable to the older Arial, Verdana, and Times New Roman families, whose coverage is now old (but still distributed and updated with MS IE/Edge). For monospaced fonts, "Consolas" from Microsoft is still better than Noto and the older "Courier New". 2016-10-09 14:14 GMT+02:00 Oren Watson : > I am disappointed with Noto Mono, which covers only the Latin script, and not > Greek or Cyrillic, when most existing monospace fonts do. > > On Sun, Oct 9, 2016 at 7:28 AM, Philippe Verdy wrote: > >> >> >> 2016-10-09 8:17 GMT+02:00 Luke Dashjr : >> >>> On Sunday, October 09, 2016 4:37:24 AM Philippe Verdy wrote: >>> > The licence itself says it respects the 4 FSF freedoms. It also >>> explicitly >>> > allows reselling (rule DFSG #1): >>> > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL >>> >>> No, it doesn't. That link is just a commentary, and of no relevance to >>> non- >>> SIL-owned fonts. >>> >> >> The link is the one directly used on the Noto description page when it >> refers to the OFL licence. It is not saying that it is only for SIL-owned >> fonts. Google/Monotype would have linked to another page if needed but this >> is the most relevant one explicitly stated by Google on the Noto site.
>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From liste at secarica.ro Sun Oct 9 10:25:27 2016 From: liste at secarica.ro (Cristian =?UTF-8?B?U2VjYXLEgw==?=) Date: Sun, 9 Oct 2016 18:25:27 +0300 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> Message-ID: <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> On Sun, 9 Oct 2016 16:14:50 +0200, Philippe Verdy wrote: > And the Noto project is not finished : > > - Its monospace can still be improved to cover more than just Latin > and general punctuation. > - Adding Cyrillic, Greek, and a few other scripts that work well in > monospace styles (e.g. Hebrew, possibly Georgian and Armenian or even > Cherokee) would seem a good future goal I checked the NotoMono-Regular.ttf file [1]: - Greek includes the range U+0384 to U+03CE (less the reserved ones) plus U+03D1, U+03D2 and U+03D6 - Cyrillic seems to include the whole range, except for the U+0487 combining mark - Hebrew, Georgian, Armenian and Cherokee: blanks only The NotoSansMonoCJKxx range is poorer in this area, but still includes the "basic" Greek and Cyrillic. Cristi [1] from https://www.google.com/get/noto/ -- Cristian Secară http://www.secarica.ro From verdy_p at wanadoo.fr Sun Oct 9 11:12:57 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 18:12:57 +0200 Subject: Noto unified font In-Reply-To: <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> Message-ID: I meant the **complete** coverage. Basic Greek and Basic Cyrillic are not enough.
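Coverage checks like Cristian's can be scripted. A sketch: the helper functions below use only the stdlib; the fontTools call in the trailing comment is the only non-stdlib step and assumes the TTF has been downloaded from the Noto site:

```python
import unicodedata

def assigned(start, end):
    """Codepoints in [start, end] that are assigned characters
    (skips reserved ones such as U+038B in the Greek block)."""
    result = []
    for cp in range(start, end + 1):
        try:
            unicodedata.name(chr(cp))
            result.append(cp)
        except ValueError:  # unassigned codepoint has no name
            pass
    return result

def missing_from_cmap(cmap, start, end):
    """Assigned codepoints in the range that the font's cmap lacks."""
    return [cp for cp in assigned(start, end) if cp not in cmap]

# Building the cmap requires fontTools (pip install fonttools):
#   from fontTools.ttLib import TTFont
#   cmap = TTFont("NotoMono-Regular.ttf").getBestCmap()
#   print([hex(cp) for cp in missing_from_cmap(cmap, 0x0384, 0x03CE)])
```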
Also, I did not say that Hebrew, Georgian, Armenian and Cherokee were included; this was a suggestion. (Cherokee, being largely an adaptation of Latin+Greek+Cyrillic with some additional strokes for new letters, could as well be included in the default Noto Sans and could share glyphs.) 2016-10-09 17:25 GMT+02:00 Cristian Secară : > On Sun, 9 Oct 2016 16:14:50 +0200, Philippe Verdy wrote: > > > And the Noto project is not finished : > > > > - Its monospace can still be improved to cover more than just Latin > > and general punctuation. > > - Adding Cyrillic, Greek, and a few other scripts that work well in > > monospace styles (e.g. Hebrew, possibly Georgian and Armenian or even > > Cherokee) would seem a good future goal > > I checked the NotoMono-Regular.ttf file [1]: > - Greek includes range U+0384 to U+03CE (less the reserved ones) plus > U+03D1, U+03D2 and U+03D6 > - Cyrillic seems to include the whole range, except for U+0487 combining > mark > - Hebrew, Georgian, Armenian and Cherokee: blanks only > > The NotoSansMonoCJKxx range is poorer in this area, but still includes the > "basic" Greek and Cyrillic. > > Cristi > > [1] from https://www.google.com/get/noto/ > > -- > Cristian Secară > http://www.secarica.ro > -------------- next part -------------- An HTML attachment was scrubbed... URL: From moyogo at gmail.com Sun Oct 9 11:23:31 2016 From: moyogo at gmail.com (Denis Jacquerye) Date: Sun, 09 Oct 2016 16:23:31 +0000 Subject: Fwd: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: Regarding the superscript q: in some rare cases it is used to indicate pharyngealization or a pharyngeal consonant, instead of the Latin letter pharyngeal voiced fricative U+0295 ʕ, the modifier letter reversed glottal stop U+02C1 ˁ or the modifier letter small reversed glottal stop U+02E4 ˤ.
Menán du Plessis uses a modifier letter small q after a vowel in ǀXam to indicate pharyngealization of that vowel in a few papers (Notes on Qing's own languages, A century of the Specimens of Bushman Folklore, A unity hypothesis for the Southern African Khoesan languages). A superscript q is also used in the name of the Mquq?in/Brooks Peninsula Provincial Park by the Ministry of Environment of British Columbia on the dedicated page of its website, and by the Minister of Aboriginal Affairs and Northern Development Canada, the British Columbia Ministry of Aboriginal Relations and Reconciliation, and the Maa-nulth First Nations in the Maa-nulth First Nations Final Agreement Implementation Report / 2011-2012 and 2012-2013. Given the references on the Nuu-Chah-Nulth orthography that are online, it seems the superscript q is used instead of the standard orthography's Latin letter pharyngeal voiced fricative U+0295 ʕ in the transcription Mquq?in. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jameskasskrv at gmail.com Sun Oct 9 14:28:03 2016 From: jameskasskrv at gmail.com (James Kass) Date: Sun, 9 Oct 2016 11:28:03 -0800 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> Message-ID: David Starner responded: >> The word "free" when applied to any product means "free of charge". > > Using the word "product" sort of biases your argument, does it not? Webster's defines "product" as something produced by nature, industry, or art. So an apple is a product whether it's a wild apple, a cultivated apple, or a road apple. Software is also a product, and as with any product, it's either free or for sale. > ... it seems to be a problem more frequently of people getting > annoyed than people getting confused. Isn't confusion annoying?
Best regards, James Kass From verdy_p at wanadoo.fr Sun Oct 9 18:57:05 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 10 Oct 2016 01:57:05 +0200 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> Message-ID: I did not receive the message from David Starner you are quoting; it was probably not sent to this list, but I did not receive it privately either (not even in my "spam mailbox"). Anyway, I agree with your response; David Starner has a strange interpretation of this common word (notably in the context where I used it, after "any"). However, in my sense a product is the result of a process requiring active participation. The Webster definition is a bit larger (and also matches the meaning of the term "produit" in French, which also includes results of natural processes such as apples or ashes from a volcano: the term emphasizes the fact that there's a process of transformation from one state to another and that the result has an added value, though of course not necessarily a financial value by itself or a financial cost). Here we're speaking about software (or structured data), which is always the result of an active process going from an idea to some implementation and its advertising and distribution. It always has a financial cost, but this cost is already shared and spread with the means we use to access or distribute this result, or discuss and improve it.
It also has a financial cost given the time devoted to making it (time is money: if you're not paid for it, it will cost you in terms of the money you don't collect for that time not spent on other tasks; but it also means time gained by others easily using the result at low cost, which they will still have to support themselves; just to receive this email, you've paid money to your ISP and paid the bill for the electricity, and spent time on your computer, whose aging will require you to change it in some months or years when it will no longer be usable for the tools you need every day on it). Open-sourcing a software or data or graphic design, or artistic product, or a font here, is a way to share and split the costs into smaller amounts that more people can support, instead of giving all the money to a single producer who also assumes all risks when investing in it for the creation, production, distribution and support. It eliminates single points of failure or defects by allowing more freedom for replacement or servicing, with lower losses and risks taken by the participants in this process. It allows anyone to participate in fewer tasks, those that are less complex to them, and then delegate the rest to others in mutual cooperation. Generally it also allows faster development and easier adaptation by varying methods. And instead of investing time in a single activity, we invest time in many more, just when we need them or when we think we may be useful and more efficient in some limited domains. In the open-sourcing process, you have to be confident that people will help you and that you'll help them, but not just in a one-to-one relation with direct returns and timely delays (as in commercial contracts). You don't order people to do things for you, you don't pay them directly, and you are also never required to donate something in exchange immediately.
The benefits are only there because you are part of the process and because everyone gets more than what he donates (the total added value is then larger than in private commercial relations). We are not just consumers but also producers and creators in a collective work where the goal is largely focused on actual needs and usages. All people like to be creative, and it's always interesting to see many people adding their own creativity to a project, for things we would not have imagined ourselves, or to find that they have smarter solutions than ours. In fact it is for the same reason that we have developed collective laws and have governments and elected delegates, or public services, all around the world (but as opposed to them, there's no required tax to pay, no dated bills, even if we still have rules to obey: the licence terms, which we also want to be supported by collective laws protecting these terms against unfairness or abuse). 2016-10-09 21:28 GMT+02:00 James Kass : > David Starner responded: > > >> The word "free" when applied to any product means "free of charge". > > > > Using the word "product" sort of biases your argument, does it not? > > Webster's defines "product" as something produced by nature, industry, > or art. So an apple is a product whether it's a wild apple, a > cultivated apple, or a road apple. Software is also a product, and as > with any product, it's either free or for sale. > > > ... it seems to be a problem more frequently of people getting > > annoyed than people getting confused. > > Isn't confusion annoying? > > Best regards, > > James Kass > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From doug at ewellic.org Sun Oct 9 20:03:53 2016 From: doug at ewellic.org (Doug Ewell) Date: Sun, 9 Oct 2016 19:03:53 -0600 Subject: Noto unified font Message-ID: <7A89301ABEEA4CFE8254349B77B82AC4@DougEwell> Philippe Verdy wrote: > I did not receive the message from David Starner you are quoting, it > was probably not sent to this list but I did not receive it privately > (not even in my "spam mailbox"). http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0134.html -- Doug Ewell | Thornton, CO, US | ewellic.org From doug at ewellic.org Sun Oct 9 20:13:32 2016 From: doug at ewellic.org (Doug Ewell) Date: Sun, 9 Oct 2016 19:13:32 -0600 Subject: Fwd: Why incomplete subscript/superscript alphabet ? Message-ID: Denis Jacquerye wrote: > Regarding the superscript q, in some rare cases, it is used to > indicate pharyngealization or a pharyngeal consonant instead of the > Latin letter pharyngeal voiced fricative U+0295 ʕ, the modifier letter > reversed glottal stop U+02C1 ˁ or the modifier letter small reversed > glottal stop U+02E4 ˤ. > ... Sounds like good material to include in a proposal. -- Doug Ewell | Thornton, CO, US | ewellic.org From prosfilaes at gmail.com Sun Oct 9 23:33:07 2016 From: prosfilaes at gmail.com (David Starner) Date: Mon, 10 Oct 2016 04:33:07 +0000 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: On Sun, Oct 9, 2016 at 4:03 AM Mark Davis wrote: > Essentially all of the game pieces that are in Unicode were added for > compatibility with existing character sets. I'm guessing that there are > hundreds to thousands of possible other symbols associated with games in > one way or another, or that could be dug out of instruction manuals (eg, > http://www.catan.com/files/downloads/catan_5th_ed_rules_eng_150303.pdf). > (Many of those would be encumbered by copyright issues, but there are no > doubt others that would not.)
> I see two symbols used in text in that Catan manual; there's a white star (U+2606) and a twelve-pointed red star (U+2739 or U+1F7D2?). I don't see why books about games would be any different from any other book in this respect; symbols used in running text should be encoded. -------------- next part -------------- An HTML attachment was scrubbed... URL: From haberg-1 at telia.com Mon Oct 10 04:30:48 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 11:30:48 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: Message-ID: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> > On 10 Oct 2016, at 03:13, Doug Ewell wrote: > > Denis Jacquerye wrote: > >> Regarding the superscript q, in some rare cases, it is used to >> indicate pharyngealization or a pharyngeal consonant instead of the >> Latin letter pharyngeal voiced fricative U+0295 ʕ, the modifier letter >> reversed glottal stop U+02C1 ˁ or the modifier letter small reversed >> glottal stop U+02E4 ˤ. >> ... > > Sounds like good material to include in a proposal. I think that IPA might be designed for broad phonetic transcriptions [1], with a requirement to distinguish phonemes within each given language. For example, the English /l/ is thicker than the Swedish one, but in IPA there is only one symbol, as there is no phonemic distinction within either language. The alveolar click /!/ may be pronounced with or without the tongue hitting the floor of the mouth, but as there is no phonemic distinction within any given language, there is only one symbol [2]. Thus, linguists wanting to describe pronunciation in more detail are left to improvise notation. The situation is thus more like that of mathematics, where notation is somewhat in flux. 1. https://en.wikipedia.org/wiki/Phonetic_transcription 2.
https://en.wikipedia.org/wiki/Alveolar_clicks From jcb+unicode at inf.ed.ac.uk Mon Oct 10 08:24:51 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 14:24:51 +0100 Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: > I think that IPA might be designed for broad phonetic transcriptions > [1], with a requirement to distinguish phonemes within each given > language. For example, the English /l/ is thicker than the Swedish, > but in IPA, there is only one symbol, as there is no phonemic > distinction within either language. The alveolar click /!/ may be > pronounced with or without the tongue hitting the floor of the > mouth, but as there is no phonemic distinction within any given > language, there is only one symbol [2]. But the IPA has many diacritics exactly for this purpose. The velarized English coda /l/ is usually described as [l̴] with U+0334 COMBINING TILDE OVERLAY, or can be notated [lˠ] with U+02E0 MODIFIER LETTER SMALL GAMMA. The alveolar click with percussive flap hasn't made it into the standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > Thus, linguists wanting to describe pronunciation in more detail are left to improvise notation. The situation is thus more like that of mathematics, where notation is somewhat in flux. There is improvisation when you're studying something new, of course, but there's a lot of standardization. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Mon Oct 10 11:04:36 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 18:04:36 +0200 Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> Message-ID: <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> > On 10 Oct 2016, at 15:24, Julian Bradfield wrote: > > On 2016-10-10, Hans Åberg wrote: >> I think that IPA might be designed for broad phonetic transcriptions >> [1], with a requirement to distinguish phonemes within each given >> language. For example, the English /l/ is thicker than the Swedish, >> but in IPA, there is only one symbol, as there is no phonemic >> distinction within either language. The alveolar click /!/ may be >> pronounced with or without the tongue hitting the floor of the >> mouth, but as there is no phonemic distinction within any given >> language, there is only one symbol [2]. > > But the IPA has many diacritics exactly for this purpose. > The velarized English coda /l/ is usually described as [l̴] > with U+0334 COMBINING TILDE OVERLAY, or can be notated [lˠ] > with U+02E0 MODIFIER LETTER SMALL GAMMA. > > The alveolar click with percussive flap hasn't made it into the > standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. >> Thus, linguists wanting to describe pronunciation in more detail are left to improvise notation. The situation is thus more like that of mathematics, where notation is somewhat in flux. > > There is improvisation when you're studying something new, of course, > but there's a lot of standardization. The preceding discussion was dealing with additions to Unicode one by one; the question is what might be added so that linguists do not feel restrained. From everson at evertype.com Mon Oct 10 11:30:46 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 17:30:46 +0100 Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> Message-ID: On 10 Oct 2016, at 14:24, Julian Bradfield wrote: > But the IPA has many diacritics exactly for this purpose. The velarized English coda /l/ is usually described as [l̴] with U+0334 COMBINING TILDE OVERLAY, 026B ɫ LATIN SMALL LETTER L WITH MIDDLE TILDE > The alveolar click with percussive flap hasn't made it into the standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > On 10 Oct 2016, at 17:04, Hans Åberg wrote: > >> The alveolar click with percussive flap hasn't made it into the >> standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > > There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. Has neither the right shape nor the right properties. Michael Everson From verdy_p at wanadoo.fr Mon Oct 10 12:57:13 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 10 Oct 2016 19:57:13 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: 2016-10-10 18:04 GMT+02:00 Hans Åberg : > > > On 10 Oct 2016, at 15:24, Julian Bradfield > wrote: > > > > On 2016-10-10, Hans Åberg wrote: > >> I think that IPA might be designed for broad phonetic transcriptions > >> [1], with a requirement to distinguish phonemes within each given > >> language. For example, the English /l/ is thicker than the Swedish, > >> but in IPA, there is only one symbol, as there is no phonemic > >> distinction within either language. The alveolar click /!/ may be > >> pronounced with or without the tongue hitting the floor of the > >> mouth, but as there is no phonemic distinction within any given > >> language, there is only one symbol [2].
> > with U+0334 COMBINING TILDE OVERLAY, or can be notated [lˠ] > > with U+02E0 MODIFIER LETTER SMALL GAMMA. > > > > The alveolar click with percussive flap hasn't made it into the > > standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > > There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. > I disagree; IPA does not use such a confusing ligature, which would be read as a repeated click rather than a single one. Reversing the second one (and slightly kerning it, though I don't know how, to avoid the confusion with "!i", i.e. a click followed by a vowel, most probably by writing them on top of each other or slanted/italicized) is a valuable visual distinction for a single distinctive phoneme. But IPA also proposes something else when more precise distinctions are needed, for noting not just the linguistic phonemes but their precise phonetic realizations (e.g. in papers about regional speech accents), such as combining the normal phonemic symbol with a diacritic, usually placed below, such as the dental modifier U+032A that looks like a small bridge, or some arrowhead-like diacritics (U+032C caron below or U+032D circumflex below) to indicate a more precise placement of the tongue. Clicks are also pronounceable by themselves in isolation, without any vowel (in fact it's much easier to pronounce them without a vowel), but they may easily be pitched (over a small range of about 6 or 7 musical tones) instead of being vocalized. However, I've not seen any diacritics to also annotate the pitch. In Chinese, vowels are annotated with distinctive tones (but some of them variable, whereas clicks can hardly have a rising or lowering tone). The pitch is easily realized by more or less opening the mouth or by slightly closing or rounding the lips (giving an appearance of a "vowel", though they are not voiced through the mouth, as air there is usually "aspirated", but only voiced with air expelled through the nasal passages).
All this looks like technical possibilities of the human voice, appropriate for phonetic analysis but rarely for actual phonemes of languages, as they are hard to distinguish in a group of people. These distinctions are however easier to recognize within the context of complete speech along with other surrounding phonemes (Chinese may be realized on 6 or 7 musical pitch tones by anyone, but in speech only 3 are used, and the other phonemic tones are combinations of the 3 basic tones; the mapping from the 3 basic tones to musical pitch tones/frequencies is highly variable between persons depending on age, sex, body weight, health, muscular development, or handicap: the phonemic tones are subdivisions of the possibilities of all the possible realizations that a mixed group of people will want to exchange with good mutual understanding). In Unicode there are several sets of tone modifiers that are encoded as spacing modifiers (and in Pinyin they are frequently noted with standard European digits, but these have no direct relation with the musical pitch tone or even with the 3 basic pitches used to compose the phonemic tones). Chinese (but also Vietnamese) may also use diacritics above (acute, grave, circumflex, tilde...). Linguists needing internationalization use distinct symbols written after the vocalic phoneme, or just after a vowelless consonantal phoneme, or just after a neutral schwa for a neutral/unclear vowel. -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Mon Oct 10 14:42:40 2016 From: doug at ewellic.org (Doug Ewell) Date: Mon, 10 Oct 2016 12:42:40 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> Hans Åberg wrote: > I think that IPA might be designed for broad phonetic transcriptions > [1], with a requirement to distinguish phonemes within each given > language.
From the Wikipedia article you cited: "For example, one particular pronunciation of the English word little may be transcribed using the IPA as /ˈlɪtəl/ or [ˈlɪɾɫ̩]; the broad, phonemic transcription, placed between slashes, indicates merely that the word ends with phoneme /l/, but the narrow, allophonic transcription, placed between square brackets, indicates that this final /l/ ([ɫ]) is dark (velarized)." IPA can be used pretty much as broadly or as narrowly as one wishes. -- Doug Ewell | Thornton, CO, US | ewellic.org From jcb+unicode at inf.ed.ac.uk Mon Oct 10 14:43:29 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 20:43:29 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: >> On 10 Oct 2016, at 15:24, Julian Bradfield wrote: >> The alveolar click with percussive flap hasn't made it into the >> standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. !! was used by one famous Africanist, but that was before ExtIPA existed. > The preceding discussion was dealing with additions to Unicode one by one; the question is what might be added so that linguists do not feel restrained. Linguists aren't stupid, and they have no need for plain text representations of all their symbology. Linguists write in Word or LaTeX (or sometimes HTML), all of which can produce a wide range of symbols beyond the wit of Unicode. As I have remarked before, I have used "latin letter turned small capital K", for reasons that seemed good to me, and I was not one whit restrained by its absence from Unicode - nor was the journal. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
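One detail worth noting about the two spellings of the velarized l discussed earlier in the thread: the atomic U+026B and the sequence l + U+0334 are not canonically equivalent, so Unicode normalization will not unify them. A quick stdlib check:

```python
import unicodedata

atomic = "\u026b"      # LATIN SMALL LETTER L WITH MIDDLE TILDE
sequence = "l\u0334"   # l + COMBINING TILDE OVERLAY

# U+026B has no canonical decomposition, and NFC never composes
# overlay diacritics, so the two forms stay distinct.
print(unicodedata.normalize("NFD", atomic) == atomic)      # True
print(unicodedata.normalize("NFC", sequence) == sequence)  # True
print(atomic == sequence)                                  # False
```

So a search for one spelling will miss the other even in normalized text, which is part of why the choice of representation matters here.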
From haberg-1 at telia.com Mon Oct 10 15:03:42 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 22:03:42 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> Message-ID: <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> > On 10 Oct 2016, at 21:42, Doug Ewell wrote: > > Hans Åberg wrote: > >> I think that IPA might be designed for broad phonetic transcriptions >> [1], with a requirement to distinguish phonemes within each given >> language. > > From the Wikipedia article you cited: > > "For example, one particular pronunciation of the English word little > may be transcribed using the IPA as /ˈlɪtəl/ or [ˈlɪɾɫ̩]; the > broad, phonemic transcription, placed between slashes, indicates merely > that the word ends with phoneme /l/, but the narrow, allophonic > transcription, placed between square brackets, indicates that this final > /l/ ([ɫ]) is dark (velarized)." > > IPA can be used pretty much as broadly or as narrowly as one wishes. Within each language, yes, but it is not designed to capture differences between different languages or dialects. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:04:54 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:04:54 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: On 2016-10-10, Philippe Verdy wrote: > 2016-10-10 18:04 GMT+02:00 Hans Åberg : >> > On 10 Oct 2016, at 15:24, Julian Bradfield >> wrote: >> > The alveolar click with percussive flap hasn't made it into the >> > standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). >> >> There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used.
> I disagree, IPA does not use such a confusing ligature that would be read as > a repeated click and not a single one. Reversing the second one (and > slightly kerning it, though I don't know how, to avoid the confusion with > "!i", i.e. a click followed by a vowel, most probably writing them on top of > each other or slanted/italicized) is a valuable visual distinction for a > single distinctive phoneme. What confusion? ¡ is not easily confusable with i - ask the Spanish! > But IPA also proposes something else when more precise distinctions are > needed for noting not just the linguistic phonemes but their precise Did you read the bit where I said that? > Clicks are also pronounceable by themselves in isolation without any vowel > (in fact it's much easier to pronounce them without a vowel) but they may > easily be pitched (on a small range of about 6 or 7 musical tones) instead > of being vocalized. However I've not seen any diacritics to also annotate the > pitch. Because no language uses clicks this way, and phonetic alphabets are not written for composers of mouth music. If one wished to do so, one would use the standard tone indicators. > In Chinese, vowels are annotated with distinctive tones (but some of them > variable, where clicks can hardly have a rising or falling tone). The > pitch is easily realized by more or less opening the mouth or by slightly > closing or rounding the lips (giving an appearance of "vowel", though they > are not voiced through the mouth as they are usually "aspirated" there, but > only voiced within air exhaled through nasal areas). All this looks like What are you on about? > technical possibilities of the human voice, appropriate for phonetic analysis > but rarely for actual phonemes of languages as they are hard to > distinguish in a group of people. Those who learn languages natively have no problems distinguishing voiced, voiceless, aspirated, breathy, nasal, glottalized,... clicks. 
> These distinctions are however easier to recognize within the context of a > complete speech stream along with other surrounding phonemes (Chinese may be > realized on 6 or 7 musical pitch tones by anyone, but in speech only 3 are > used and the other phonemic tones are combinations of the 3 basic tones, and (a) There is no such thing as "Chinese" - there are many different languages in China, with a continuum of dialect gradations. (b) Even if you mean Mandarin, the usual notation for the five (four plus neutral) Mandarin tones uses five pitch levels to describe the contours, not three. > spacing modifiers (and in Pinyin, they are frequently noted with standard > European digits but have no direct relation with the musical pitch tone or > even with the 3 basic pitches used to compose the phonemic tones). Chinese > (but also Vietnamese) may also use diacritics above (acute, grave, > circumflex, tilde...). Linguists needing internationalization use distinct > symbols written after the vocalic phoneme or just after a vowelless > consonantal phoneme, or just after a neutral schwa for a neutral/unclear > vowel. Linguists don't need internationalization. They use IPA or other notations. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Mon Oct 10 15:09:40 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 22:09:40 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: <2BB69E14-7238-49C0-AB41-2B648320780E@telia.com> > On 10 Oct 2016, at 21:43, Julian Bradfield wrote: > Linguists aren't stupid, and they have no need for plain text > representations of all their symbology. Linguists write in Word or > LaTeX (or sometimes HTML), all of which can produce a wide range of > symbols beyond the wit of Unicode. 
> > As I have remarked before, I have used "latin letter turned small > capital K", for reasons that seemed good to me, and I was not one whit > restrained by its absence from Unicode - nor was the journal. It is possible to write math just using ASCII and TeX, which was the original idea of TeX. Is that what you want for linguistics? From everson at evertype.com Mon Oct 10 15:14:23 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 21:14:23 +0100 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> On 10 Oct 2016, at 21:04, Julian Bradfield wrote: > > Linguists don't need internationalization. They use IPA or other notations. We need reliable plain-text notation systems. Otherwise distinctions we wish to encode may be lost. Michael From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:15:34 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:15:34 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: >> On 10 Oct 2016, at 21:42, Doug Ewell wrote: >> Hans Åberg wrote: >>> I think that IPA might be designed for broad phonetic transcriptions >>> [1], with a requirement to distinguish phonemes within each given >>> language. ... >> IPA can be used pretty much as broadly or as narrowly as one wishes. > > Within each language, but it is not designed to capture differences between different languages or dialects. What do you mean? The IPA in narrow transcription is intended to provide as detailed a description of sounds as a human mind can manage. 
It doesn't care whether you're describing differences between languages or differences within languages (a distinction that is not in any case well defined). -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:24:11 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:24:11 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> Message-ID: On 2016-10-10, Michael Everson wrote: > On 10 Oct 2016, at 21:04, Julian Bradfield wrote: >> >> Linguists don't need internationalization. They use IPA or other notations. > > We need reliable plain-text notation systems. Otherwise distinctions we wish to encode may be lost. We have no need to make such distinctions in "plain text". It's convenient to have major distinctions easily accessible without font hacking, but there's no need to have every notation one might dream up forcibly incorporated into "plain text". In particular, for super/subscripts, which is where we came in, even the benighted souls using Word still typically recognize and can use LaTeX notation. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From kenwhistler at att.net Mon Oct 10 15:24:41 2016 From: kenwhistler at att.net (Ken Whistler) Date: Mon, 10 Oct 2016 13:24:41 -0700 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> Message-ID: <3c040475-bf35-09ca-0121-2dbdec31961b@att.net> On 10/10/2016 1:14 PM, Michael Everson wrote: > On 10 Oct 2016, at 21:04, Julian Bradfield wrote: >> Linguists don't need internationalization. They use IPA or other notations. > We need reliable plain-text notation systems. Otherwise distinctions we wish to encode may be lost. > > Michael > Recte: We need reliable notation systems. Otherwise distinctions we wish to represent may be lost. Whether a "reliable notation system" has to be entirely plain text in its content, or includes reliable standard means for markup, such as XML, is a matter for debate and consensus among the linguists involved. Linguists need to represent all kinds of things, and assuming that all pertinent text content of interest to them is ipso facto plain text is erroneous. --Ken From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:31:28 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:31:28 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <2BB69E14-7238-49C0-AB41-2B648320780E@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: > It is possible to write math just using ASCII and TeX, which was the original idea of TeX. Is that what you want for linguistics? I don't see the need to do everything in plain text. Long ago, I spent a great deal of time getting my editor to do semi-wysiwyg TeX maths (work later incorporated into x-symbol), but actually it's a waste of time and I've given up. Working mathematicians know LaTeX and its control sequences. Even my 12-year-old uses LaTeX control sequences to communicate with his online maths courses. 
Because phonetics has a much smaller set of symbols, I do kwaɪt laɪk biːɪŋ eɪbl tə duː ðɪs, and because they're also used in non-specialist writing, it's useful to have the symbols hacked into Unicode instead of hacked into specialist fonts. But subscripts? No need. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Mon Oct 10 15:34:39 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 22:34:39 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> Message-ID: <2869E093-6788-40DA-A646-F2DCBB9CF778@telia.com> > On 10 Oct 2016, at 22:15, Julian Bradfield wrote: > > On 2016-10-10, Hans Åberg wrote: >>> On 10 Oct 2016, at 21:42, Doug Ewell wrote: >>> Hans Åberg wrote: >>>> I think that IPA might be designed for broad phonetic transcriptions >>>> [1], with a requirement to distinguish phonemes within each given >>>> language. > ... >>> IPA can be used pretty much as broadly or as narrowly as one wishes. >> >> Within each language, but it is not designed to capture differences between different languages or dialects. > > What do you mean? The IPA in narrow transcription is intended to > provide as detailed a description of sounds as a human mind can > manage. It doesn't care whether you're describing differences between > languages or differences within languages (a distinction that is not > in any case well defined). It is designed for phonemic transcriptions, cf., https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet From verdy_p at wanadoo.fr Mon Oct 10 15:36:33 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 10 Oct 2016 22:36:33 +0200 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: 2016-10-10 22:04 GMT+02:00 Julian Bradfield : > On 2016-10-10, Philippe Verdy wrote: > > 2016-10-10 18:04 GMT+02:00 Hans Åberg : > >> > On 10 Oct 2016, at 15:24, Julian Bradfield > >> wrote: > > >> > The alveolar click with percussive flap hasn't made it into the > >> > standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > >> > >> There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. > > > I disagree, IPA does not use such a confusing ligature that would be read > as > a repeated click and not a single one. Reversing the second one (and > > slightly kerning it, though I don't know how, to avoid the confusion with > > "!i", i.e. a click followed by a vowel, most probably writing them on top > of > > each other or slanted/italicized) is a valuable visual distinction for a > > single distinctive phoneme. > > What confusion? ¡ is not easily confusable with i - ask the Spanish! > Not relevant! Here we're not speaking about punctuation between words, but inclusion within words in phonetic transcriptions, where even word separation is not always relevant and punctuation is almost absent. There's no case in Spanish with "¡" in the middle of a word. But here we're speaking about noting a consonant within words where vowels can also be expected in phonetic transcriptions. And there the confusion with a following vowel i is very likely. By contrast, IPA symbols are carefully chosen to avoid visual confusions (and that's why they only exist in a single lettercase). From everson at evertype.com Mon Oct 10 15:38:56 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 21:38:56 +0100 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> Message-ID: <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> On 10 Oct 2016, at 21:24, Julian Bradfield wrote: > >> We need reliable plain-text notation systems. Otherwise distinctions we wish to encode may be lost. > > We have no need to make such distinctions in "plain text". You mightn't. > It's convenient to have major distinctions easily accessible without > font hacking, Yes, indeed. > but there's no need to have every notation one might dream up forcibly incorporated into "plain text". Hyperbole. > In particular, for super/subscripts, which is where we came in, even > the benighted souls using Word still typically recognize and can use > LaTeX notation. I can't use LaTeX notation. I don't use that proprietary system. And don't you dare tell me that I am benighted, or using Word. Neither applies. On 10 Oct 2016, at 21:31, Julian Bradfield wrote: > On 2016-10-10, Hans Åberg wrote: >> It is possible to write math just using ASCII and TeX, which was the original idea of TeX. Is that what you want for linguistics? > > I don't see the need to do everything in plain text. Of course not. You're a programmer. (Mathematical typesetting is not my concern.) > Because phonetics has a much smaller set of symbols, I do kwaɪt laɪk > biːɪŋ eɪbl tə duː ðɪs, and because they're also used in non-specialist > writing, it's useful to have the symbols hacked into Unicode instead > of hacked into specialist fonts. > But subscripts? No need. And yet we use such things. I have an edition of the Bible I'm setting. Big book. Verse numbers. I like these to be superscript so they're unobtrusive. Damn right I use the superscript characters for these. I can process the text, export it for concordance processing, whatever, and those out-of-text notations DON'T get converted to regular digits, which I need. 
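The superscript digits Michael describes live at scattered code points: ¹ ² ³ come from Latin-1, while ⁰ and ⁴–⁹ sit in the Superscripts and Subscripts block. A minimal sketch of the mapping (the `superscript_verse` helper is hypothetical, not Michael's actual workflow):

```python
# Map ASCII digits to the dedicated Unicode superscript digits.
# U+00B9, U+00B2, U+00B3 predate the U+2070 block, so the sequence is
# irregular rather than a single contiguous run.
SUPERSCRIPT = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)

def superscript_verse(n: int) -> str:
    """Render a verse number using the superscript characters,
    so it survives as plain text through further processing."""
    return str(n).translate(SUPERSCRIPT)

print(superscript_verse(23))   # ²³
print(superscript_verse(150))  # ¹⁵⁰
```

Because the result is ordinary plain text, it round-trips through export and search without any markup layer, which is precisely the property being argued over here.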
Michael From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:52:54 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:52:54 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: On 2016-10-10, Philippe Verdy wrote: > Not relevant! Here we're not speaking about punctuation between words, but > inclusion within words in phonetic transcriptions, where even word > separation is not always relevant and punctuation is almost absent. > There's no case in Spanish with "¡" in the middle of a word. But here we're > speaking about noting a consonant within words where vowels can also be > expected in phonetic transcriptions. And there the confusion with a > following vowel i is very likely. By contrast, IPA symbols are > carefully chosen to avoid visual confusions (and that's why they only exist > in a single lettercase). and are less confusable than and , especially in a sanserif font. In both cases, the main visual cue is a descender/ascender in one letter that isn't in the other. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:58:01 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:58:01 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> Message-ID: On 2016-10-10, Michael Everson wrote: > I can't use LaTeX notation. I don't use that proprietary system. And don't you dare tell me that I am benighted, or using Word. Neither applies. 
That's an interesting use of "proprietary" you have there, but I suppose with your Alician interests, Humpty Dumpty's attitude to words may have rubbed off on you! What *do* you mean? > I have an edition of the Bible I'm setting. Big book. Verse numbers. I like these to be superscript so they're unobtrusive. Damn right I use the superscript characters for these. I can process the text, export it for concordance processing, whatever, and those out-of-text notations DON'T get converted to regular digits, which I need. If you were doing it properly, the text would be stored in a suitable markup, as would the verse numbers, and both the typesetting and the concordance processing would deal with them appropriately. No need for Unicode hacks. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Mon Oct 10 15:59:10 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 22:59:10 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <2BB69E14-7238-49C0-AB41-2B648320780E@telia.com> Message-ID: <48BBA151-5017-4357-94A1-63000F93CD34@telia.com> > On 10 Oct 2016, at 22:31, Julian Bradfield wrote: > > On 2016-10-10, Hans Åberg wrote: >> It is possible to write math just using ASCII and TeX, which was the original idea of TeX. Is that what you want for linguistics? > > I don't see the need to do everything in plain text. Long ago, I spent > a great deal of time getting my editor to do semi-wysiwyg TeX maths > (work later incorporated into x-symbol), but actually it's a waste of > time and I've given up. A fast input method is to use text substitutions together with a Unicode-capable editor generating UTF-8. Then use LuaTeX together with ConTeXt or LaTeX/unicode-math. 
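The substitution approach described here can be sketched as a plain mapping from TeX control sequences to Unicode characters. The table entries below are illustrative stand-ins, not the actual thousand-entry set mentioned in the thread:

```python
# Hypothetical, minimal substitution table; a real set covering the
# Unicode math alphanumerics would have over a thousand entries.
SUBSTITUTIONS = {
    "\\alpha":     "\u03B1",      # GREEK SMALL LETTER ALPHA
    "\\mathbb{R}": "\u211D",      # DOUBLE-STRUCK CAPITAL R
    "\\mathbf{A}": "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A
    "\\to":        "\u2192",      # RIGHTWARDS ARROW
}

def substitute(text: str) -> str:
    # Apply longer control sequences first so that a short sequence
    # (e.g. "\\to") cannot clobber a longer one that contains it.
    for seq in sorted(SUBSTITUTIONS, key=len, reverse=True):
        text = text.replace(seq, SUBSTITUTIONS[seq])
    return text

print(substitute("f: \\mathbb{R} \\to \\mathbb{R}"))  # f: ℝ → ℝ
```

An interactive editor would trigger such replacements as you type; this batch version just shows the mapping itself.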
On MacOS, it works interactively: when a matching input string is detected, it is replaced. It does not take a long time to design such a text-substitution set: I made one for all Unicode math letters, more than a thousand. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 16:01:56 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 22:01:56 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> <2869E093-6788-40DA-A646-F2DCBB9CF778@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: >> On 10 Oct 2016, at 22:15, Julian Bradfield wrote: >> What do you mean? The IPA in narrow transcription is intended to >> provide as detailed a description of sounds as a human mind can >> manage. It doesn't care whether you're describing differences between >> languages or differences within languages (a distinction that is not >> in any case well defined). > > It is designed for phonemic transcriptions, cf., > https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet It *was* designed, in 1870-something. Try reading the Handbook of the IPA. It contains many samples of languages transcribed both in a broad phonemic transcription appropriate for the language, and in a narrow phonetic transcription which should allow a competent phonetician to produce an understandable and reasonably accurate rendition of the passage. Indeed, a couple of decades ago, I participated in a public engagement event in which a few of us attempted to do exactly that. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From everson at evertype.com Mon Oct 10 16:06:29 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 22:06:29 +0100 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> Message-ID: <3D7294A8-5817-41F0-9C85-C7E3CCFE0C2A@evertype.com> On 10 Oct 2016, at 21:58, Julian Bradfield wrote: > On 2016-10-10, Michael Everson wrote: >> I can't use LaTeX notation. I don't use that proprietary system. And don't you dare tell me that I am benighted, or using Word. Neither applies. > > That's an interesting use of "proprietary" you have there, but I > suppose with your Alician interests, Humpty Dumpty's attitude to words > may have rubbed off on you! What *do* you mean? You have to have special knowledge and special software to use it. Apparently it's used to good effect in mathematics, though a great deal of TeX material appears printed and has an obvious "TeX" feel which to me looks rather ugly. In any case, TeX guys love TeX. And then there's the rest of us. >> I have an edition of the Bible I'm setting. Big book. Verse numbers. I like these to be superscript so they're unobtrusive. Damn right I use the superscript characters for these. I can process the text, export it for concordance processing, whatever, and those out-of-text notations DON'T get converted to regular digits, which I need. > > If you were doing it properly, the text would be stored in a suitable > markup, as would the verse numbers, and both the typesetting and the > concordance processing would deal with them appropriately. "Properly", sayeth the computer programmer. Sorry, Julian, but I use professional tools to typeset, and your disdain for that process isn't going to change that industry. This "suitable markup" business you're talking about is not something people outside of ivory towers actually use. > No need for Unicode hacks. Unicode has superscript digits, preserved in plain text. Do I need to do calculations with these? No. 
Do I need them to be identical to ASCII digits? No. I need them to be persistent, searchable if necessary (yes, the search is inconvenient vis-à-vis the keyboard), and preserved in plain text. Because if they're not preserved in plain text, then I may have to convert them again, which is tedious and inconvenient. Characters are safer than markup, in an instance like this. That's not using Unicode for a hack. That's using Unicode to preserve distinctions in plain text. Michael From haberg-1 at telia.com Mon Oct 10 16:20:11 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 23:20:11 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> <2869E093-6788-40DA-A646-F2DCBB9CF778@telia.com> Message-ID: > On 10 Oct 2016, at 23:01, Julian Bradfield wrote: > > On 2016-10-10, Hans Åberg wrote: >>> On 10 Oct 2016, at 22:15, Julian Bradfield wrote: >>> What do you mean? The IPA in narrow transcription is intended to >>> provide as detailed a description of sounds as a human mind can >>> manage. It doesn't care whether you're describing differences between >>> languages or differences within languages (a distinction that is not >>> in any case well defined). >> >> It is designed for phonemic transcriptions, cf., >> https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet > > It *was* designed, in 1870-something. Try reading the Handbook of the > IPA. It contains many samples of languages transcribed both in a broad > phonemic transcription appropriate for the language, and in a narrow > phonetic transcription which should allow a competent phonetician to > produce an understandable and reasonably accurate rendition of the > passage. Indeed, a couple of decades ago, I participated in a public > engagement event in which a few of us attempted to do exactly that. 
But the alveolar clicks require an extension. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 16:36:49 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 22:36:49 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> <3D7294A8-5817-41F0-9C85-C7E3CCFE0C2A@evertype.com> Message-ID: On 2016-10-10, Michael Everson wrote: > On 10 Oct 2016, at 21:58, Julian Bradfield wrote: >> That's an interesting use of "proprietary" you have there, but I .... > You have to have special knowledge and special software to use it. That's not what "proprietary" means. To quote the OED (which, by the way, is produced by an actual professional publisher, and is stored in XML, unless I'm badly mistaken), "proprietary" means "Of a product, esp. a drug or medicine: of which the manufacture or sale is restricted to a particular person or persons; (in later use) spec. marketed under and protected by patent or registered trade name." If you're typesetting your bible with no special software and no special knowledge, then you must be doing it by hand in cold metal. Somehow, I don't think you are. I suspect you're using software that is owned by somebody and marketed and protected. > Apparently it's used to good effect in mathematics, though a great > deal of TeX material appears printed and has an obvious "TeX" feel It's for printing, so of course it appears printed. The obvious TeX feel is the result of using the default style, which arises from Knuth's personal taste in mathematical typesetting, with Lamport's (abominable) taste in structural layout on top. There are tens of thousands of journals and books produced with LaTeX, in hundreds or thousands of styles. 
Among publishers you may have heard of, Addison-Wesley, CUP, Elsevier, John Benjamins, OUP, Princeton UP, Wiley all use LaTeX for a significant proportion of their output. They're all professionals. > "Properly", sayeth the computer programmer. Sorry, Julian, but I use professional tools to typeset, and your disdain for that process isn't going to change that industry. This "suitable markup" business you're talking about is not something people outside of ivory towers actually use. You're a dilettante publisher using low-end professional graphic design tools to publish. InDesign, for example, is far easier to use for far greater effect than any LaTeX-based system if you're producing magazines or posters; but it's far worse if you care about the content. > That's not using Unicode for a hack. That's using Unicode to preserve distinctions in plain text. Only because you've a priori decided that superscripts are plain text instead of extra-textual decorations. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From doug at ewellic.org Mon Oct 10 16:39:33 2016 From: doug at ewellic.org (Doug Ewell) Date: Mon, 10 Oct 2016 14:39:33 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161010143933.665a7a7059d7ee80bb4d670165c8327d.5af1a28760.wbe@email03.godaddy.com> Hans Åberg wrote: >>>> What do you mean? The IPA in narrow transcription is intended to >>>> provide as detailed a description of sounds as a human mind can >>>> manage. >>> >>> It is designed for phonemic transcriptions, cf., >>> https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet >> >> It *was* designed, in 1870-something. Try reading the Handbook of the >> IPA. 
It contains many samples of languages transcribed both in a >> broad phonemic transcription appropriate for the language, and in a >> narrow phonetic transcription which should allow a competent >> phonetician to produce an understandable and reasonably accurate >> rendition of the passage. > > But the alveolar clicks require an extension. You've found ONE instance of non-distorted speech where IPA does not distinguish between two allophones. That is very different from saying that IPA is unsuitable for phonetic transcription. -- Doug Ewell | Thornton, CO, US | ewellic.org From everson at evertype.com Mon Oct 10 16:42:00 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 22:42:00 +0100 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> <3D7294A8-5817-41F0-9C85-C7E3CCFE0C2A@evertype.com> Message-ID: On 10 Oct 2016, at 22:36, Julian Bradfield wrote: > You're a dilettante publisher using low-end professional graphic > design tools to publish. ?? Best, Michael Everson http://evertype.com/catalogue.html From frederic.grosshans at gmail.com Mon Oct 10 16:49:24 2016 From: frederic.grosshans at gmail.com (=?UTF-8?B?RnLDqWTDqXJpYyBHcm9zc2hhbnM=?=) Date: Mon, 10 Oct 2016 21:49:24 +0000 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <2BB69E14-7238-49C0-AB41-2B648320780E@telia.com> Message-ID: On Mon, 10 Oct 2016 at 22:32, Julian Bradfield wrote: > On 2016-10-10, Hans Åberg wrote: > > It is possible to write math just using ASCII and TeX, which was the > original idea of TeX. Is that what you want for linguistics? > > I don't see the need to do everything in plain text. 
Long ago, I spent > a great deal of time getting my editor to do semi-wysiwyg TeX maths > (work later incorporated into x-symbol), but actually it's a waste of > time and I've given up. Working mathematicians know LaTeX and its control > sequences. Even my 12-year-old uses LaTeX control sequences to > communicate with his online maths courses. > I am a physicist regularly using LaTeX. I actually use a LaTeX-based input method to have plain TeX math when possible. It makes TeX files and emails more readable, especially when the equations are a bit long. It also saves characters when I livetweet scientific talks (like here https://twitter.com/fgrosshans/status/780715752752029696). The possibility of having reasonable plain-text math also helps to get reasonable results when copy-pasting an equation from a PDF onto a MathJax-enabled website. Of course, full plain-text math is not possible, and I don't think anyone reasonable wants a plain-text solution even for something as common as nested exponents and indices. Rich text formats like TeX have their use case, but that doesn't mean plain-text math, with all its limitations, is useless. Frédéric From haberg-1 at telia.com Mon Oct 10 17:05:20 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Tue, 11 Oct 2016 00:05:20 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <20161010143933.665a7a7059d7ee80bb4d670165c8327d.5af1a28760.wbe@email03.godaddy.com> References: <20161010143933.665a7a7059d7ee80bb4d670165c8327d.5af1a28760.wbe@email03.godaddy.com> Message-ID: > On 10 Oct 2016, at 23:39, Doug Ewell wrote: > > Hans Åberg wrote: > >>>>> What do you mean? The IPA in narrow transcription is intended to >>>>> provide as detailed a description of sounds as a human mind can >>>>> manage. 
>>>> >>>> It is designed for phonemic transcriptions, cf., >>>> https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet >>> >>> It *was* designed, in 1870-something. Try reading the Handbook of the >>> IPA. It contains many samples of languages transcribed both in a >>> broad phonemic transcription appropriate for the language, and in a >>> narrow phonetic transcription which should allow a competent >>> phonetician to produce an understandable and reasonably accurate >>> rendition of the passage. >> >> But the alveolar clicks require an extension. > > You've found ONE instance of non-distorted speech where IPA does not > distinguish between two allophones. That is very different from saying > that IPA is unsuitable for phonetic transcription. There are others: for example, in Dutch, the letter "v" in "van" is pronounced in dialects in continuous variation between [f] and [v], depending on the timing of the fricative and the following vowel. It has become popular in some dictionaries to use [d] in AmE where BrE uses [t], but when listening, it sounds more like a [t] drawn towards [d]. The Merriam-Webster dictionary has its own system trying to capture variations. One does not really speak separate consonants and vowels; they slide over and adapt. Describing that is pretty tricky.
From mark at kli.org Mon Oct 10 17:06:48 2016 From: mark at kli.org (Mark E. Shoulson) Date: Mon, 10 Oct 2016 18:06:48 -0400 Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> <3D7294A8-5817-41F0-9C85-C7E3CCFE0C2A@evertype.com> Message-ID: <36a17a8b-57d9-8f2f-53c9-2dcf8de69aba@kli.org> On 10/10/2016 05:36 PM, Julian Bradfield wrote: > On 2016-10-10, Michael Everson wrote: > >> Apparently it's used to good effect in mathematics, though a great >> deal of TeX material appears printed and has an obvious "TeX" feel > It's for printing, so of course it appears printed. The obvious TeX > feel is the result of using the default style, which arises from > Knuth's personal taste in mathematical typesetting, with Lamport's > (abominable) taste in structural layout on top. There are tens of > thousands of journals and books produced with LaTeX, in hundreds or > thousands of styles. > > Among publishers you may have heard of, Addison-Wesley, CUP, Elsevier, > John Benjamins, OUP, Princeton UP, Wiley all use LaTeX for a > significant proportion of their output. They're all professionals. > To me, the main "TeX" feel that TeX-printed things tend to share is Knuth's distinctive Computer Modern font, not necessarily structure. You can typeset amazing things in TeX (viz. the Comparing Torah that Michael published for me); limitations there are mostly of your own making. (I haven't really been able to keep up with this thread in general, though.) ~mark
From root at unicode.org Mon Oct 10 17:13:58 2016 From: root at unicode.org (Sarasvati) Date: Mon, 10 Oct 2016 17:13:58 -0500 Subject: Why incomplete subscript/superscript alphabet ? Message-ID: <201610102213.u9AMDwmq013813@sarasvati.unicode.org> Hello everyone. The level of discourse in this thread is beginning to deteriorate. Please rein in some of the excesses or the thread may have to be terminated.
Regards from your, -- Sarasvati
From jcb+unicode at inf.ed.ac.uk Tue Oct 11 03:39:06 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Tue, 11 Oct 2016 09:39:06 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <20161010143933.665a7a7059d7ee80bb4d670165c8327d.5af1a28760.wbe@email03.godaddy.com> Message-ID: On 2016-10-10, Hans Åberg wrote: > There are others, for example, in Dutch, the letter "v" in "van" > is pronounced in dialects in continuous variation between [f] and > [v] depending on the timing of the fricative and the following > vowel. Continuous variation is a universal truth of language. The IPA has mechanisms for describing crude differences in voicing, but if you're working at the level of, say, a difference between 0 ms and 20 ms in average voice onset time, you need to be using numbers and instruments, not symbols and the ear. The most extreme attempt I know to extend the IPA to fine phonetic detail is Canepari's book, with lots of symbols not in Unicode (I think... it's a long while since I looked at it). It's completely ignored, because the level of detail he attempts to represent is well beyond the reproducible abilities of phoneticians unaided by acoustic analysis. > It has become popular in some dictionaries to use [d] in the > AmE where the BrE uses [t], but when listening, it sounds more like > a [t] drawn towards [d]. Are you talking about American flapping, where a /t/ between vowels is realized as [ɾ]? I'd be surprised if any very serious dictionaries use [d] to represent that - can you give an example? > One does not really speak separate consonants and vowels, but they slide over and adapt. Describing that is pretty tricky. This is also a universal truth of language! But it doesn't stop us making sensible abstractions, and notating them symbolically. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
From c933103 at gmail.com Tue Oct 11 04:21:12 2016 From: c933103 at gmail.com (gfb hjjhjh) Date: Tue, 11 Oct 2016 17:21:12 +0800 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: After some research, there is already a MediaWiki extension named ids that does exactly what I asked about (https://www.mediawiki.org/wiki/Extension:Ids). The only problem is that ⿻ is still not yet supported by the system. Now the question is whether this extension can become something integrated into a font. 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke : > > On Aug 4, 2016, at 2:45 PM, gfb hjjhjh wrote: > > That Wikipedia page also has a section named "Ideographic Description > Sequences", which is exactly about forming sequences based on those ideographic > description characters > > > As I understand it, such sequences may provide a "description" of kanji > useful for some purposes, but are not sufficient to properly "render" them. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ruland at luckymail.com Tue Oct 11 07:54:54 2016 From: ruland at luckymail.com (Charlie Ruland) Date: Tue, 11 Oct 2016 14:54:54 +0200 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: <76fd2a2d-5097-6b9e-1a24-d9d607b8852e@luckymail.com> This MediaWiki extension reminds me of svghanzi.appspot.com/ . If you don't understand the Russian instructions, read Creating Characters by SVG by John Pasden. gfb hjjhjh wrote: > After some research, there is already a MediaWiki extension named > ids that does exactly what I asked about > (https://www.mediawiki.org/wiki/Extension:Ids). The only problem > is that ⿻ is still not yet supported by the system. Now the question is > whether this extension can become something integrated into a font.
> > 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke >: > > >> On Aug 4, 2016, at 2:45 PM, gfb hjjhjh > > wrote: >> >> That Wikipedia page also has a section named "Ideographic >> Description Sequences", which is exactly about forming sequences based on >> those ideographic description characters >> >> > > As I understand it, such sequences may provide a "description" of > kanji useful for some purposes, but are not sufficient to > properly "render" them. > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From verdy_p at wanadoo.fr Tue Oct 11 09:27:05 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Tue, 11 Oct 2016 16:27:05 +0200 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: Actually that extension for now only has data tuned for Traditional Chinese, and does not implement the full set of IDS mappings (not the complete Unicode repertoire), but it contains really many mappings for IDS strings that have no Unicode encoding. Only very few ideographic sources are used (not all those listed in the Unihan database), and only two "True" variants are supported (for some characters) in the database, but only one is returned by the current renderer implementation in Java. Some mappings exist in two versions: a generic one using some undecomposed strokes/parts (from the Unicode repertoire), and an expanded one where some strokes are further decomposed (but using Traditional Chinese rules). In many mappings, the two IDS are identical. The generic mapping is used to handle many cases using overstriking IDS decompositions (which are not further decomposed in the "expanded" IDS).
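The IDS strings discussed here follow a simple prefix grammar: each Ideographic Description Character (U+2FF0..U+2FFB, as of Unicode 9) is an operator whose operands follow it, with ⿲ (U+2FF2) and ⿳ (U+2FF3) taking three operands and the rest taking two. A minimal parsing sketch in Python (not the extension's actual code, which is written in Java):

```python
# Validate/parse an Ideographic Description Sequence.
# U+2FF2 and U+2FF3 are ternary operators; the other IDCs are binary.
ARITY = {chr(cp): 3 if cp in (0x2FF2, 0x2FF3) else 2
         for cp in range(0x2FF0, 0x2FFC)}

def parse_ids(chars, i=0):
    """Parse one IDS subtree starting at index i; return (tree, next_i)."""
    c = chars[i]
    if c in ARITY:
        node = [c]
        i += 1
        for _ in range(ARITY[c]):
            sub, i = parse_ids(chars, i)
            node.append(sub)
        return node, i
    return c, i + 1  # any non-IDC character is a leaf component

# "⿰氵工" describes 江 (U+6C5F): water radical beside 工.
tree, end = parse_ids(list("⿰氵工"))
assert end == 3 and tree == ["⿰", "氵", "工"]
```

A renderer like the one in the extension would then walk this tree, laying out each subtree in the subregion that its operator assigns to it.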
The database it contains is still in development, though, and its schema cannot really handle locale-specific variants, or additional variants that are encoded in Unicode, unless they have a mapping in the CNS encoding (the database contains a snapshot of the CNS-to-Big5 and CNS-to-Unicode conversion tables, but they are not indexed and probably not used by the Java engine; I suppose they are there only to allow registering the composite glyphs that have been mapped to an IDS). All IDS are then mapped into a dozen or so virtual fonts (with a numeric id between 0 and 13) and a glyph ID (assigned in the PUA range of the BMP; font 0 is special, as it contains all the base glyphs needed to compose all other virtual fonts). But for now this database contains no instruction for more precise placement or resizing of components: the placement is performed using generic rules from the IDS itself (plus some rules implemented in the Java code for adjusting specific strokes depending on their placement, and for adjusting the relative stroke weights in the composition), and that's probably why the overstriking IDS (with ⿻) cannot be processed: instead they are mapped directly to a NULL Unicode entry if needed, or left undecomposed in both the generic IDS and the extended IDS. It's interesting, though. But to adapt the code to Japanese or Korean, you'll need to extend the current schema, notably in the main table containing the list of all supported IDS (generic plus expanded), as it allows only a single mapping to Unicode (or NULL if there's no such encoding) and has no column for specifying a localisation variant or ideographic source (such as a dictionary, book, regional standard, or epoch). ---- Note that when viewing these IDS strings, I've seen that Chrome really has a problem displaying the IDS symbols (probably because of incorrect autohinting): the dotted squares become random forms at usual font sizes (12px or less) and just display garbage.
It may be caused by some fonts on my Windows 10 system. You need to zoom in on the page to get a correct view of IDS strings. When looking into the Chrome console, I see that the symbols are taken from a couple of system fonts (provided by Windows). Normally the IDS symbols are very simple in design, and even if they are dotted and can be quirky to adjust at small sizes (to keep the dots from disappearing or merging into line segments), my opinion is that hinting for these symbols is simply bad in Windows fonts, or uses some proprietary techniques in the OpenType renderer of Windows that are not supported by the font renderer of Chrome. Those symbols should be correct at the most common font sizes used on the web. In plain-text editors, the glyphs are correct at reasonable font sizes, but the top dotted border of these symbols is most often truncated (probably extending too high above the line height, and probably using incorrect metrics). 2016-10-11 11:21 GMT+02:00 gfb hjjhjh : > After some research, there is already a MediaWiki extension named ids > that does exactly what I asked about (https://www.mediawiki.org/wiki/Extension:Ids). The only problem is that ⿻ is still not yet > supported by the system. Now the question is whether this extension can become > something integrated into a font. > > 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke : >> >> On Aug 4, 2016, at 2:45 PM, gfb hjjhjh wrote: >> >> That Wikipedia page also has a section named "Ideographic Description >> Sequences", which is exactly about forming sequences based on those ideographic >> description characters >> >> >> As I understand it, such sequences may provide a "description" of kanji >> useful for some purposes, but are not sufficient to properly "render" them. >> > > -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From bobbytung at wanderer.tw Tue Oct 11 07:39:52 2016 From: bobbytung at wanderer.tw (=?UTF-8?B?6JGj56aP6IiI?=) Date: Tue, 11 Oct 2016 20:39:52 +0800 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: <4403576340395054382@unknownmsgid> The ids extension can dynamically compose parts with IDS into an SVG displayed on MediaWiki. I know the team that implemented this function in Taiwan. They are working on a Taiwanese dictionary containing several Hanzi not encoded in Unicode. Bobby Tung On 11 Oct 2016 at 5:27 PM, gfb hjjhjh wrote: After some research, there is already a MediaWiki extension named ids that does exactly what I asked about (https://www.mediawiki.org/wiki/Extension:Ids). The only problem is that ⿻ is still not yet supported by the system. Now the question is whether this extension can become something integrated into a font. 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke : > > On Aug 4, 2016, at 2:45 PM, gfb hjjhjh wrote: > > That Wikipedia page also has a section named "Ideographic Description > Sequences", which is exactly about forming sequences based on those ideographic > description characters > > > As I understand it, such sequences may provide a "description" of kanji > useful for some purposes, but are not sufficient to properly "render" them. > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From dzo at bisharat.net Tue Oct 11 10:52:20 2016 From: dzo at bisharat.net (dzo at bisharat.net) Date: Tue, 11 Oct 2016 15:52:20 +0000 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on Message-ID: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> Of possible interest - I noted recently the continued use of "3" for "ɛ" in tweets & some web content about a pair of Ghanaian plays whose titles include the Ga language term "Wogbɛ jɛkɛ."
See http://niamey.blogspot.com/2016/10/wogb-jk-ghanaian-language-input-support.html The problem is input systems, not availability of fonts as it once was. Keyboard layouts exist for Ga and other Ghanaian languages, and these enable typing the needed extended Latin characters. But a number of them, including possibly all for mobile devices, work by substituting selected key assignments, which in the case of multilingual text would apparently mean switching keyboards to accommodate characters not present in both/all languages used. Not ideal. What are the possibilities of extended keyboard options on mobile devices for extended Latin characters to facilitate multilingual text composition? What is current thinking / practice wrt expanding virtual keyboards? This gets beyond Unicode proper to ISO/IEC 9995 and perhaps ISO/IEC 14755, so may be beyond the scope of the list. Any responses off-list I can summarize if of wider interest. Thanks in advance for any info. Don Osborn Sent via BlackBerry by AT&T
From prosfilaes at gmail.com Tue Oct 11 11:26:08 2016 From: prosfilaes at gmail.com (David Starner) Date: Tue, 11 Oct 2016 16:26:08 +0000 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> References: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> Message-ID: On Tue, Oct 11, 2016 at 8:55 AM wrote: > What is current thinking / practice wrt expanding virtual keyboards? > I'm just a user here, and that of the English and Esperanto keyboards on Android, but given that swipe input and autocorrect both depend on knowing what language is being entered, it seems unlikely that virtual keyboards are going to evolve towards being better at multilingual input. -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From doug at ewellic.org Tue Oct 11 11:48:00 2016 From: doug at ewellic.org (Doug Ewell) Date: Tue, 11 Oct 2016 09:48:00 -0700 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on Message-ID: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> Don Osborn wrote: > What are the possibilities of extended keyboard options on mobile > devices for extended Latin characters to facilitate multilingual text > composition? What is current thinking / practice wrt expanding virtual > keyboards? > > This gets beyond Unicode proper to ISO/IEC 9995 and perhaps ISO/IEC > 14755, so may be beyond the scope of the list. Any responses off-list > I can summarize if of wider interest. You mentioned mobile devices, but also mentioned ISO/IEC 9995 and 14755, which seem to deal primarily with computer keyboards. On Windows, John Cowan's Moby Latin keyboard [1] allows the input of more than 800 non-ASCII characters, including the two mentioned in your post (ɔ and ɛ): AltGr+p, o 0254 LATIN SMALL LETTER OPEN O AltGr+p, e 025B LATIN SMALL LETTER OPEN E Moby Latin is a strict superset of the standard U.S. English keyboard; that is, none of the standard keystrokes were redefined, unlike keyboards such as United States-International, which tend to redefine keys for ASCII characters that look like diacritical marks, making adoption difficult. There are also versions of Moby based on the standard U.K. keyboard.
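Sequences like AltGr+p, o behave as chained prefix keys: the first keystroke selects a table, the second selects a character. A minimal sketch of that lookup in Python (not Moby Latin's actual implementation, which ships as a Windows keyboard layout; only the two mappings quoted in the message are included, and the table name is illustrative):

```python
# Two-keystroke prefix-key lookup: AltGr+p selects this table,
# then a second key selects the extended Latin character.
PREFIX_P = {  # reached via AltGr+p (assumed table name)
    "o": "\u0254",  # LATIN SMALL LETTER OPEN O, ɔ
    "e": "\u025B",  # LATIN SMALL LETTER OPEN E, ɛ
}

def type_sequence(prefix_table, key):
    """Resolve a two-keystroke sequence to a character, or None."""
    return prefix_table.get(key)

assert type_sequence(PREFIX_P, "o") == "ɔ"
assert type_sequence(PREFIX_P, "e") == "ɛ"
```

Because the prefix key is otherwise unused, every standard keystroke keeps its usual meaning, which is what makes this approach a strict superset of the base layout.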
[1] http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html -- Doug Ewell | Thornton, CO, US | ewellic.org
From charupdate at orange.fr Wed Oct 12 01:45:33 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 12 Oct 2016 08:45:33 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> References: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> Message-ID: <436090000.1081.1476254733252.JavaMail.www@wwinf1h22> On Tue, 11 Oct 2016 15:52:20 +0000, dzo_at_bisharat.net wrote: > Of possible interest - I noted recently the continued use of "3" for "ɛ" in tweets > & some web content about a pair of Ghanaian plays whose titles include the Ga > language term "Wogbɛ jɛkɛ." > > See http://niamey.blogspot.com/2016/10/wogb-jk-ghanaian-language-input-support.html > > The problem is input systems, not availability of fonts as it once was. Keyboard > layouts exist for Ga and other Ghanaian languages, and these enable typing needed > extended Latin characters. But a number of them, including possibly all for mobile > devices, work by substituting selected key assignments, which in the case of > multilingual text would apparently mean switching keyboards to accommodate > characters not present in both/all languages used. Not ideal. > > What are the possibilities of extended keyboard options on mobile devices for > extended Latin characters to facilitate multilingual text composition? What is > current thinking / practice wrt expanding virtual keyboards? > > This gets beyond Unicode proper to ISO/IEC 9995 and perhaps ISO/IEC > 14755, so may be beyond the scope of the list. Any responses off-list > I can summarize if of wider interest.
One way to deal with increased sets of directly accessed letters is to map the extended letters on the digits row, and to toggle between a language layout without directly accessed digits and an ASCII layout, and to do this not via the system facility, but with a hard-coded toggle on key E00. This way I'm catering for French, [1] and I plan to derive a Malian layout from it, but for Ga one has to start from the US-English layout, except where French has been adopted in Ghana. I see no particular challenges in starting from whatever layout to implement this (including Vietnamese and Lithuanian, where digits are *already* on the third level), when the users are interested in a change for enhancement; but in adding an extra row to a cellphone on-screen keyboard I do see several. Kind regards, Marcel [1] http://dispoclavier.com/#i0
From zelpahd at gmail.com Wed Oct 12 05:58:30 2016 From: zelpahd at gmail.com (zelpa) Date: Wed, 12 Oct 2016 21:58:30 +1100 Subject: Emoji end goal Message-ID: So what exactly is the end goal for emoji? First we had the Fitzpatrick skin modifiers, now there's the proposal for gendered emoji sequences using ZWJ. There was even the proposal for the hair colour modifier in TR 53. So what is the true end goal? Will we one day be able to display our Fallout 4 character with a single emoji and 60 modifiers? And honestly, who is asking for these additions? Does anybody WANT a hair colour modifier? Seems to me like the consortium might just be pandering to a few silly requests (by people who have no actual idea what unicode is) to get media attention. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From leoboiko at gmail.com Wed Oct 12 08:47:01 2016 From: leoboiko at gmail.com (Leonardo Boiko) Date: Wed, 12 Oct 2016 10:47:01 -0300 Subject: Emoji end goal In-Reply-To: References: Message-ID: Yes, the end goal of the Unicode Consortium is media attention by way of virtue signaling.
For every online article about emoji modifiers, each individual member of the Consortium earns a fifty-Euro bonus from our masters, the global feminist cultural-Marxist Jewish conspiracy, for our support in propagating political correctness and ultimately implementing the UN's One World Government. In fact, the end goal for emoji (as originally planned by Gramsci and Adorno in UAX #1922) is to be the mandatory Newspeak-style writing system of the NWO, so as to brainwash citizens away from scientific truths like race realism or the sociobiology of gender. As soon as WOMAN + ZWJ + President Hillary finish assassinating the last remaining ASCII reactionaries, full emoji deployment will be in order, and we'll indoctrinate every child to internalize standard Communist dogma such as "all ethnicities deserve equal representation in media" or "all combinations of genders and professions should be considered equally valid". The lead experiments at Tumblr and Instagram were very successful, proving that emoji have great potential as tools of indoctrination. 2016/10/12 10:02 "zelpa" : > So what exactly is the end goal for emoji? First we had the Fitzpatrick > skin modifiers, now there's the proposal for gendered emoji sequences using > ZWJ. There was even the proposal for the hair colour modifier in TR 53. So > what is the true end goal? Will we one day be able to display our Fallout 4 > character with a single emoji and 60 modifiers? And honestly, who is asking > for these additions? Does anybody WANT a hair colour modifier? Seems to me > like the consortium might just be pandering to a few silly requests (by > people who have no actual idea what unicode is) to get media attention. > -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From zelpahd at gmail.com Wed Oct 12 08:55:39 2016 From: zelpahd at gmail.com (zelpa) Date: Thu, 13 Oct 2016 00:55:39 +1100 Subject: Emoji end goal In-Reply-To: References: Message-ID: > "all ethnicities deserve equal representation in media" or "all combinations of genders and professions should be considered equally" I wasn't aware that bald yellow people were a race, sorry. If anything, adding the skin tone modifiers has made me feel LESS included; what if I don't fit into one of the 5 categories? What if I drank too much colloidal silver and have blue skin? Sure would be nice to be able to express an emotion without also expressing my gender and race. What a wacky world that would be. And as for the professions? As I've said on the mailing list in the past, the current proposal makes it IMPOSSIBLE to display certain professions as gender-neutral. Is that really a step forward? Can we not just have gender-neutral, race-neutral emoji? Is that really too much to ask? On Thu, Oct 13, 2016 at 12:47 AM, Leonardo Boiko wrote: > Yes, the end goal of the Unicode Consortium is media attention by way of > virtue signaling. For every online article about emoji modifiers, each > individual member of the Consortium earns a fifty-Euro bonus from our > masters, the global feminist cultural-Marxist Jewish conspiracy, for our > support in propagating political correctness and ultimately implementing > the UN's One World Government. In fact, the end goal for emoji (as originally > planned by Gramsci and Adorno in UAX #1922) is to be the mandatory > Newspeak-style writing system of the NWO, so as to brainwash citizens away > from scientific truths like race realism or the sociobiology of gender.
As > soon as WOMAN + ZWJ + President Hillary finish assassinating the last > remaining ASCII reactionaries, full emoji deployment will be in order, and > we'll indoctrinate every child to internalize standard Communist dogma such > as "all ethnicities deserve equal representation in media" or "all > combinations of genders and professions should be considered equally > valid". The lead experiments at Tumblr and Instagram were very successful, > proving that emoji have great potential as tools of indoctrination. > > 2016/10/12 10:02 "zelpa" : > >> So what exactly is the end goal for emoji? First we had the Fitzpatrick >> skin modifiers, now there's the proposal for gendered emoji sequences using >> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >> what is the true end goal? Will we one day be able to display our Fallout 4 >> character with a single emoji and 60 modifiers? And honestly, who is asking >> for these additions? Does anybody WANT a hair colour modifier? Seems to me >> like the consortium might just be pandering to a few silly requests (by >> people who have no actual idea what unicode is) to get media attention. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From 637275 at gmail.com Wed Oct 12 10:17:22 2016 From: 637275 at gmail.com (Rebecca T) Date: Wed, 12 Oct 2016 11:17:22 -0400 Subject: Emoji end goal In-Reply-To: References: Message-ID: Well, I think it's definitely important to have representation and expression for people of all skin tones and genders even in things like emoji. I think we're rapidly reaching a limit for variation sequences, and I'm personally not begging for hair color modifiers (although I would welcome them). I do worry a bit about the burden of supporting emoji on new systems. Drawing thousands (not that anyone can even count how many emoji there are) is a significant burden on developers creating new systems, and the alternative (tofu) isn't appealing.
There is Symbola (which leaves something to be desired, to say the least), and the graphical solutions, like Apple's image-based or Microsoft's layered-vector approach, have non-trivial implementations (stuff I wouldn't want to take care of if I were creating a new system). I guess what I'm saying is: does anyone want to extend Unifont into the astral planes? On Wednesday, October 12, 2016, zelpa wrote: >> So what exactly is the end goal for emoji? First we had the Fitzpatrick >> skin modifiers, now there's the proposal for gendered emoji sequences using >> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >> what is the true end goal? Will we one day be able to display our Fallout 4 >> character with a single emoji and 60 modifiers? And honestly, who is asking >> for these additions? Does anybody WANT a hair colour modifier? Seems to me >> like the consortium might just be pandering to a few silly requests (by >> people who have no actual idea what unicode is) to get media attention. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From oren.watson at gmail.com Wed Oct 12 11:17:36 2016 From: oren.watson at gmail.com (Oren Watson) Date: Wed, 12 Oct 2016 12:17:36 -0400 Subject: Emoji end goal Message-ID: I am the maker of a similar project to Unifont, albeit a work in progress (see link below), and I certainly won't be supporting anything more than gender-neutral, race-neutral emoji. This is due to technical considerations: I don't plan on having colors in my font. The GNU Unifont project already has many emoji, but they also are not colored. On the other hand, emoji are far from the most technically challenging category of characters in Unicode. http://www.orenwatson.be/fontdemo.htm On Wed, Oct 12, 2016 at 11:17 AM, Rebecca T <637275 at gmail.com> wrote: > Well, I think it's definitely important to have representation and > expression for people of all skin tones and genders even in things like > emoji.
> > I think we're rapidly reaching a limit for variation sequences, and I'm > personally not begging for hair color modifiers (although I would welcome > them). > > I do worry a bit about the burden of supporting emoji on new systems. > Drawing thousands (not that anyone can even count how many emoji there are) > is a significant burden on developers creating new systems, and the > alternative (tofu) isn't appealing. There is Symbola (which leaves > something to be desired, to say the least), and the graphical solutions, > like Apple's image-based or Microsoft's layered-vector approach, have > non-trivial implementations (stuff I wouldn't want to take care of if I were > creating a new system). > > I guess what I'm saying is: does anyone want to extend Unifont into the > astral planes? > > On Wednesday, October 12, 2016, zelpa wrote: > >> So what exactly is the end goal for emoji? First we had the Fitzpatrick >> skin modifiers, now there's the proposal for gendered emoji sequences using >> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >> what is the true end goal? Will we one day be able to display our Fallout 4 >> character with a single emoji and 60 modifiers? And honestly, who is asking >> for these additions? Does anybody WANT a hair colour modifier? Seems to me >> like the consortium might just be pandering to a few silly requests (by >> people who have no actual idea what unicode is) to get media attention. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From doug at ewellic.org Wed Oct 12 11:31:20 2016 From: doug at ewellic.org (Doug Ewell) Date: Wed, 12 Oct 2016 09:31:20 -0700 Subject: Emoji end goal Message-ID: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> Leonardo Boiko wrote: Gosh, even I wouldn't have gone that far.
-- Doug Ewell | Thornton, CO, US | ewellic.org
From 637275 at gmail.com Wed Oct 12 11:56:01 2016 From: 637275 at gmail.com (Rebecca T) Date: Wed, 12 Oct 2016 12:56:01 -0400 Subject: Emoji end goal In-Reply-To: References: Message-ID: Sure, and kanji have romanisations, but that doesn't make the Latin alphabet language-neutral. And yes, emoji were supposed to be language-neutral, but all the implementers made them default to male. I think you have an *argument* with skin-tone neutrality, but I think you'd be hard-pressed to find any POC who think the Fitzpatrick modifiers were a mistake. Also, the "what if my skin was blue" argument is a red herring: nobody has blue skin, so it's a moot point. However, if you do find yourself drinking silver, I suggest U+1F922 🤢 Nauseated Face. On Wednesday, October 12, 2016, zelpa wrote: > >"all ethnicities deserve equal representation in media" or "all > combinations of genders and professions should be considered equally" > I wasn't aware that bald yellow people were a race, sorry. If anything, > adding the skin tone modifiers has made me feel LESS included; what if I > don't fit into one of the 5 categories? What if I drank too much colloidal > silver and have blue skin? Sure would be nice to be able to express an > emotion without also expressing my gender and race. What a wacky world > that would be. And as for the professions? As I've said on the mailing list > in the past, the current proposal makes it IMPOSSIBLE to display certain > professions as gender-neutral. Is that really a step forward? Can we not > just have gender-neutral, race-neutral emoji? Is that really too much to > ask? > > > On Thu, Oct 13, 2016 at 12:47 AM, Leonardo Boiko > wrote: > >> Yes, the end goal of the Unicode Consortium is media attention by way of >> virtue signaling.
For every online article about emoji modifiers, each >> individual member of the Consortium earns a fifty-Euro bonus from our >> masters, the global feminist cultural-Marxist Jewish conspiracy, for our >> support in propagating political correctness and ultimately implementing >> ONU's One World Government. In fact, the end goal for emoji (as originally >> planned by Gramsci and Adorno in UAX #1922) is to be the mandatory >> Newspeak-style writing system of the NWO, so as to brainwash citizens away >> from scientific truths like race realism or the sociobiology of gender. As >> soon as WOMAN+ ZWJ+President Hillary finish assassinating the last >> remaining ASCII reactionaries, full emoji deployment will be in order, and >> we'll indoctrinate every child to internalize standard Communist dogma such >> as "all ethnicities deserve equal representation in media" or "all >> combinations of genders and professions should be considered equally >> valid". The lead experiments at Tumblr and Instagram were very successful, >> proving that emoji have great potential as tools of indoctrination. >> >> 2016/10/12 10:02 "zelpa" : >> >>> So what exactly is the end goal for emoji? First we had the fitzpatrick >>> skin modifiers, now there's the proposal for gendered emoji sequences using >>> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >>> what is the true end goal? Will we one day be able to display our Fallout 4 >>> character with a single emoji and 60 modifiers? And honestly, who is asking >>> for these additions? Does anybody WANT a hair colour modifier? Seems to me >>> like the consortium might just be pandering to a few silly requests (by >>> people who have no actual idea what unicode is) to get media attention. >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
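[Editor's note: stepping back from the satire, the mechanics being debated in this thread are simple at the codepoint level. A skin-tone modifier is just a codepoint that immediately follows a base emoji, and a gendered profession is a ZWJ sequence. A minimal Python illustration (this sketch is not from any message above; the second sequence is one of the gendered-profession ZWJ sequences proposed around this time):]

```python
# Base emoji + skin-tone modifier: U+1F44B WAVING HAND followed by
# U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4. Two codepoints, one glyph.
waving_medium = "\U0001F44B\U0001F3FD"

# Gendered profession as a ZWJ sequence: U+1F469 WOMAN, U+200D ZERO WIDTH
# JOINER, U+2695 STAFF OF AESCULAPIUS, U+FE0F VARIATION SELECTOR-16
# (the "woman health worker" sequence).
woman_health_worker = "\U0001F469\u200D\u2695\uFE0F"

def codepoints(s):
    """Show a string as space-separated U+XXXX labels."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

print(codepoints(waving_medium))        # U+1F44B U+1F3FD
print(codepoints(woman_health_worker))  # U+1F469 U+200D U+2695 U+FE0F
```

[A renderer that understands the sequence draws a single glyph; one that does not falls back to the individual pieces, which is why vendor support keeps coming up in this thread.]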
URL: From verdy_p at wanadoo.fr Wed Oct 12 12:09:58 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Wed, 12 Oct 2016 19:09:58 +0200 Subject: Emoji end goal In-Reply-To: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> References: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> Message-ID: I think that emojis at the minimum should all be displayable in isolation, without being required to form pseudo-ligatures or to use colors. Skin colors can still be displayed with a patchwork-like rectangle after it and could still use monochromatic pattern fills. The number of combinations is exploding and most of them are in fact not evident at all (or are highly culturally oriented). Emojis should remain simple, showing basic shapes, but I don't see why they could not differentiate a man or a woman, independently of the ligatures that may be created with them (using a completely invented ad hoc "orthography" that actually follows no standard at all and does not match cultural differences or the way we perceive the associations, which are limiting their semantic interpretation in a more and more restricted way). We certainly don't have enough history in using emojis for creating and standardizing such a pseudo-orthography. Emojis remain a new pseudo-language, but they reuse a typography based on visible symbols that have a long cultural tradition with other cultural meanings and many unexpected semantics that don't work with the current associations created. So in fact I only support very few associations: - associating two "Flag" pseudo-letters (but a rendering should still be OK if the emojis just show the actual letters within a left or right part of a frame for a flag, without attempting to combine them into an actual colored flag, which will need to evolve with time).
- associating skin color emojis after an emoji for a real human person or person face (no need for this in fictional characters or for coloring other parts such as hands, fingers, eyes, hair, nose...) In all cases, colors should always remain an option. Please keep emojis simple and always usable in isolation, leaving their interpretation and associations only to reading humans according to their local culture and social interactions. The way they are used now is in fact abusing the initial goal of Unicode encoding, which is to not encode according to specific languages or culture, and not break their basic semantics by mixing them into something that is not clearly separable and does not carry the same amount of semantics. 2016-10-12 18:31 GMT+02:00 Doug Ewell : > Leonardo Boiko wrote: > > > > Gosh, even I wouldn't have gone that far. > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From 637275 at gmail.com Wed Oct 12 13:45:00 2016 From: 637275 at gmail.com (Rebecca T) Date: Wed, 12 Oct 2016 14:45:00 -0400 Subject: Emoji end goal In-Reply-To: References: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> Message-ID: Agreed. I think a good response to "that'd _double_ the codepoints, so we should just add a ligature" is "if it would be such a burden to implement that you don't want to use space in the charts for what are, fundamentally, hundreds of *semantically different* ideographs, why are we dumping that burden onto vendors?" On Wed, Oct 12, 2016 at 1:09 PM, Philippe Verdy wrote: > I think that emojis at the minimum should all be displayable in isolation, > without being required to form pseudo-ligatures or to use colors. Skin > colors can still be displayed with a patchwork-like rectangle after it and > could still use monochromatic pattern fills.
The number of combinations is > exploding and most of them are in fact not evident at all (or are highly > culturally oriented). > > Emojis should remain simple, showing basic shapes, but I don't see why they > could not differentiate a man or a woman, independently of the ligatures > that may be created with them (using a completely invented ad hoc > "orthography" that actually follows no standard at all and does not match > cultural differences or the way we perceive the associations, which are > limiting their semantic interpretation in a more and more restricted way). > > We certainly don't have enough history in using emojis for creating and > standardizing such a pseudo-orthography. Emojis remain a new pseudo-language, > but they reuse a typography based on visible symbols that have a long > cultural tradition with other cultural meanings and many unexpected > semantics that don't work with the current associations created. > > So in fact I only support very few associations: > - associating two "Flag" pseudo-letters (but a rendering should still be > OK if the emojis just show the actual letters within a left or right part > of a frame for a flag, without attempting to combine them into an actual > colored flag, which will need to evolve with time). > - associating skin color emojis after an emoji for a real human person or > person face (no need for this in fictional characters or for coloring other parts > such as hands, fingers, eyes, hair, nose...) > > In all cases, colors should always remain an option. Please keep emojis > simple and always usable in isolation, leaving their interpretation and > associations only to reading humans according to their local culture and > social interactions. The way they are used now is in fact abusing the > initial goal of Unicode encoding, which is to not encode according to > specific languages or culture, and not break their basic semantics by > mixing them into something that is not clearly separable and does not carry > the same amount of semantics. > > 2016-10-12 18:31 GMT+02:00 Doug Ewell : > >> Leonardo Boiko wrote: >> >> >> >> Gosh, even I wouldn't have gone that far. >> >> -- >> Doug Ewell | Thornton, CO, US | ewellic.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prosfilaes at gmail.com Wed Oct 12 15:14:31 2016 From: prosfilaes at gmail.com (David Starner) Date: Wed, 12 Oct 2016 20:14:31 +0000 Subject: Emoji end goal In-Reply-To: References: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> Message-ID: On Wed, Oct 12, 2016 at 11:48 AM Rebecca T <637275 at gmail.com> wrote: > Agreed. I think a good response to "that'd _double_ the codepoints, so we > should just add a ligature" is "if it would be such a burden to implement > that you don't want to use space in the charts for what are, fundamentally, > hundreds of *semantically different* ideographs, why are we dumping that > burden onto vendors?" > Because the vendors want it. There's far more people who can and will implement emoji completely than who support all Han ideographs or many ancient scripts. If you don't want to support it because it's too big a burden, then don't. If you don't have that option because your users are demanding it, then Unicode is successfully providing the options the users want, and if that feature is too much of a burden for you to support, perhaps the problem is that you picked a problem you couldn't feasibly solve. I'd compare OSes. An operating system is probably about a man-year of work, until you have all these problems with people wanting fancy font support and graphical user interfaces and both IPv4 and IPv6 support and reading CDs and audio support and all this ridiculous stuff. (A real OS supports either punch cards or a keyboard for input, and outputs to a line printer.)
Today, pretty much only a major megacorp can make an OS from scratch, and even Google used the Linux kernel and Java to simplify making Android. You could blame Unicode for a small part of that, but Unicode isn't making you implement Unicode in your OS; your users are making that demand. -------------- next part -------------- An HTML attachment was scrubbed... URL: From irgendeinbenutzername at gmail.com Wed Oct 12 15:40:20 2016 From: irgendeinbenutzername at gmail.com (Charlotte Buff) Date: Wed, 12 Oct 2016 22:40:20 +0200 Subject: Emoji end goal Message-ID: On Wed, 12 Oct 2016 20:14:31 +0000 David Starner > wrote: > Because the vendors want it. I wouldn't say so in general. Emoji fonts are far more work than regular black-and-white vectors and I honestly believe that vendors with PNG-based fonts like Apple and Google are slowly reaching the point where they can no longer reasonably support any more emoji because their font sizes would just blow up. I have noticed that recently vendors have become quite picky about which emoji they want to support, going so far as blocking the addition of new symbol characters to the UCS entirely, rather than just refusing to give them emoji presentation once added. (Why they still thought the hundreds of new gendered emoji were a good idea is another question.) It's not like back in Unicode 7 when Apple and friends happily added half of Webdings to their colorful emoji fonts for no apparent reason. I think vendors really don't want to spend their time and effort on emoji anymore. Things like hair colors are pretty much unfeasible for anyone besides Microsoft, but as soon as there is some kind of semi-official Unicode mechanism for that, users will *demand* that you follow through and implement all possible variants. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From oren.watson at gmail.com Wed Oct 12 16:40:16 2016 From: oren.watson at gmail.com (Oren Watson) Date: Wed, 12 Oct 2016 17:40:16 -0400 Subject: Emoji end goal In-Reply-To: References: Message-ID: I think ultimately there isn't an end goal. Unlike most of the other languages/scripts that unicode supports, emoji is currently in a state of rapid, decentralized, and asynchronous evolution and development, with various companies and communities contributing new ideas every year. It doesn't have an end goal because it isn't a project with a single entity or leader who defines its direction, as for example Esperanto was. On Wed, Oct 12, 2016 at 6:58 AM, zelpa wrote: > So what exactly is the end goal for emoji? First we had the fitzpatrick > skin modifiers, now there's the proposal for gendered emoji sequences using > ZWJ. There was even the proposal for the hair colour modifier in TR 53. So > what is the true end goal? Will we one day be able to display our Fallout 4 > character with a single emoji and 60 modifiers? And honestly, who is asking > for these additions? Does anybody WANT a hair colour modifier? Seems to me > like the consortium might just be pandering to a few silly requests (by > people who have no actual idea what unicode is) to get media attention. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From c933103 at gmail.com Thu Oct 13 03:39:52 2016 From: c933103 at gmail.com (gfb hjjhjh) Date: Thu, 13 Oct 2016 16:39:52 +0800 Subject: Emoji end goal In-Reply-To: References: Message-ID: So, according to the emoji FAQ, the end goal of emoji is to have no emoji? Or something like Softbank's escape sequence? >Q: What is the longer term plan for emoji? >A: The Unicode Consortium encourages the use of embedded graphics (a.k.a. "stickers") as a longer-term solution, since they allow much more freedom of expression. See Longer Term Solutions in UTR #51.
By the way, is it just me, or are the original Japanese carrier emoji, specifically those provided by DoCoMo, still not completely encoded in Unicode? I counted the number of i-mode emoji listed on Japanese Wikipedia in the TRON code section, and there are apparently more emoji listed there than are in Unicode, but I don't know which ones are missing. 2016-10-13 5:40 GMT+08:00 Oren Watson : > I think ultimately there isn't an end goal. Unlike most of the other > languages/scripts that unicode supports, emoji is currently in a state of > rapid, decentralized, and asynchronous evolution and development, with > various companies and communities contributing new ideas every year. It > doesn't have an end goal because it isn't a project with a single entity or > leader who defines its direction, as for example Esperanto was. > > On Wed, Oct 12, 2016 at 6:58 AM, zelpa wrote: > >> So what exactly is the end goal for emoji? First we had the fitzpatrick >> skin modifiers, now there's the proposal for gendered emoji sequences using >> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >> what is the true end goal? Will we one day be able to display our Fallout 4 >> character with a single emoji and 60 modifiers? And honestly, who is asking >> for these additions? Does anybody WANT a hair colour modifier? Seems to me >> like the consortium might just be pandering to a few silly requests (by >> people who have no actual idea what unicode is) to get media attention. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gwalla at gmail.com Thu Oct 13 10:35:44 2016 From: gwalla at gmail.com (Garth Wallace) Date: Thu, 13 Oct 2016 08:35:44 -0700 Subject: Emoji end goal In-Reply-To: References: Message-ID: On Thu, Oct 13, 2016 at 1:39 AM, gfb hjjhjh wrote: > So, according to the emoji FAQ, the end goal of emoji is to > have no emoji? Or something like Softbank's escape sequence? > >Q: What is the longer term plan for emoji?
> >A: The Unicode Consortium encourages the use of embedded graphics (a.k.a. > "stickers") as a longer-term solution, since they allow much more freedom > of expression. See Longer Term Solutions > in UTR #51. > > By the way, is it just me, or are the original Japanese carrier emoji, specifically > those provided by DoCoMo, still not completely encoded in Unicode? I > counted the number of i-mode emoji listed on Japanese Wikipedia in the TRON > code section, and there are apparently more emoji listed there than are in > Unicode, but I don't know which ones are missing. > Shibuya 109 was left out because AIUI, unlike the other landmarks, it's private property. Are there any others? -------------- next part -------------- An HTML attachment was scrubbed... URL: From harshula at hj.id.au Thu Oct 13 21:08:18 2016 From: harshula at hj.id.au (Harshula) Date: Fri, 14 Oct 2016 13:08:18 +1100 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: Philippe, I presume your response was intended for Luke. If not, you may want to re-read the thread. On 09/10/16 15:37, Philippe Verdy wrote: > The licence itself says it respects the 4 FSF freedoms. It also > explicitly allows reselling (rule DFSG #1): > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL > > It is not directly compatible with the GPL in a composite product, but > with LGPL there's no problem, and there's no problem if the font is > clearly separable and distributed along with its licence, even if the > software coming with it or the package containing it is commercial: you > are allowed to detach it from the package and redistribute. > > Really you are challenging the licence for unfair reasons. > Maybe you just think that the GPL or MIT licences are enough.
> > Or you'd like the Public Domain (which in fact offers no protection and > no long term warranty, as it can be re-appropriated at any time by > proprietary licences, even retrospectively; every day we see companies > registering properties on pseudo-new technologies that are in fact > inherited from the past and have been used for centuries or more by the > whole of humanity; they leave some space only for today's current usages in > limited scopes, but protect everything else by inventing some strange > concepts around the basic feature, with unfair claims, and then want to > collect taxes). Also an international public domain does not exist at > all (it is always restricted by new additions to the copyright laws). > Publishing something in the Public Domain is really unsafe. > > 2016-10-09 5:35 GMT+02:00 Harshula >: > > On 09/10/16 13:50, Luke Dashjr wrote: > >> On Sunday, October 09, 2016 12:08:05 AM Harshula wrote: > >> On 09/10/16 10:44, Luke Dashjr wrote: > >>> It's unfortunate they released it under the non-free OFL license. :( > > FSF appears to classify OFL as a Free license (though incompatible with > the GNU GPL & FDL): > https://www.gnu.org/licenses/license-list.en.html#Fonts > > > >> Which alternate license would you recommend? > > > > MIT license or LGPL seem reasonable and common among free fonts. Some also > > choose GPL, but AFAIK it's unclear how the LGPL vs GPL differences apply to > > fonts. > > Interestingly, Noto project saw advantages of OFL and moved to using it, > not too long ago: > https://github.com/googlei18n/noto-fonts/blob/master/NEWS > > > It seems you disagree with FSF's interpretation of the OFL and bundling > Hello World as being sufficient. Are there other reasons for your > preference for MIT/LGPL/GPL over OFL? > > > On Sunday, October 09, 2016 12:16:37 AM you wrote: > >> That's your definition of non-free then...
If I were a font developer and > >> of mind to release my font for use without charge, I wouldn't want anyone > >> else to make money out of selling it when I myself - who put the effort > >> into preparing it - don't make money from selling it. So it protects the > >> moral rights of the developer. > > Why are you attributing Shriramana Sharma's email to me? It might be > clearer if you replied to his email. > > cya, > # From charupdate at orange.fr Fri Oct 14 08:17:14 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 14 Oct 2016 15:17:14 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: References: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> Message-ID: <700499764.8011.1476451035080.JavaMail.www@wwinf1f34> On Tue, 11 Oct 2016 15:52:20 +0000, Don Osborn wrote: [...] > > The problem is input systems, not availability of fonts as it once was. Keyboard > layouts exist for Ga and other Ghanaian languages, and these enable typing needed > extended Latin characters. But a number of them, including possibly all for mobile > devices, work by substituting selected key assignments, which in the case of > multilingual text would apparently mean switching keyboards to accommodate > characters not present in both/all languages used. Not ideal. > [...] AIUI, what is drawing people away from being able to efficiently input Extended Latin alongside Basic Latin is the fear of becoming unable to efficiently input digits as soon as these don't show up in the Base shift state any longer. Thus IMHO it could be interesting for many more of the world's languages to see that there is a good reason to depart from the typical layout pattern that has the digits in the Base shift state, and to see that this is in practice feasible inside the system input framework, which doesn't have so many of the severe limitations that are often pointed out.
These mainly result from the appearance that the Windows keyboarding framework is given in the MSKLC UI, while the author of this useful software himself invited his users to expand the features by using the included Keyboard Table Generation Tool (Unicode) 3.40. So do I, FWIW. While still being very busy with the French keyboard layouts that I'm working on, I'm already able to share one more feature for keyboards that have the 102nd/105th key, next to left Shift. It is obtained by mapping on this key e.g. the 0x10 modifier, and by allocating this new level to an emulated numerical keypad with hex digits beside Arabic digits, a comma key beside the decimal separator dot key, double and triple zero keys, the zero doubled on VK_0 to complete and to facilitate input of binary numbers, with % and $ and much more, and U+202F on the space bar. In many languages, this is used as a thousands separator, and in all languages before the unit (as in "1,234.56 $", where the space is U+202F). This new "Num" modifier is optional, as is the extra key proper to ISO keyboards. But I strongly recommend always adding the extra toggle I've already mentioned, on key E00 (or instead of Capitals Lock if this is disliked in the target locale). I believe that such keyboards will address the issue. Best regards, Marcel From doug at ewellic.org Fri Oct 14 10:33:48 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 14 Oct 2016 08:33:48 -0700 Subject: Emoji end goal Message-ID: <20161014083348.665a7a7059d7ee80bb4d670165c8327d.3cdcef3df5.wbe@email03.godaddy.com> gfb hjjhjh wrote: > So, according to the emoji FAQ, > the end goal of emoji is to have no emoji? Or something like > Softbank's escape sequence? > >> Q: What is the longer term plan for emoji? >> A: The Unicode Consortium encourages the use of embedded graphics >> (a.k.a. "stickers") as a longer-term solution, since they allow much >> more freedom of expression. See Longer Term Solutions >> in UTR #51.
There is a new emoji proposal [1] that cites the existence of "many apps and sticker packs" with the proposed image as one rationale for encoding it as a character. If ESC accepts this rationale, then the passage in UTR #51 cited above will not only be incorrect, it will have been turned on its ear. [1] http://www.unicode.org/L2/L2016/16280-breastfeeding-emoji.pdf -- Doug Ewell | Thornton, CO, US | ewellic.org From mjansche at google.com Fri Oct 14 12:07:23 2016 From: mjansche at google.com (Martin Jansche) Date: Fri, 14 Oct 2016 18:07:23 +0100 Subject: Ambiguity(?) in Sinhala named sequences Message-ID: For Sinhala, the following named sequences are defined (for good reasons):

SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA
SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB
SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D

I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll write Ya for 0DBA and Ra for 0DBB. Note that these give rise to two potentially ambiguous codepoint strings, namely

0DBB 0DCA 200D 0DBA
0DBB 0DCA 200D 0DBB

I'll concentrate on the first, as all arguments apply to the second one analogously. At first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible parses:

0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya
0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya

First question: Does the standard give any guidance as to which one is the intended parse? The section on Sinhala in the Unicode Standard is silent about this. Is there a general principle I'm missing? Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not used and is considered incorrect, suggesting that the second parse (Repaya+Ya) should be the default interpretation of this sequence. However, SLS 1134 does not address the potential ambiguity of this sequence explicitly and the description there could be read as informative, not normative. Second question: Given that one parse of this sequence should be the default, how does one represent the non-default parse?
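[Editor's note: the ambiguity can be checked mechanically. A small Python sketch (not from the standard or from this message) matches the first problematic string against the named sequences; both a suffix match (Yansaya) and a prefix match (Repaya) succeed, so a parser or renderer must pick one by some external rule:]

```python
# Named sequences from the list above, as tuples of codepoints.
SEQUENCES = {
    "YANSAYA":       (0x0DCA, 0x200D, 0x0DBA),
    "RAKAARAANSAYA": (0x0DCA, 0x200D, 0x0DBB),
    "REPAYA":        (0x0DBB, 0x0DCA, 0x200D),
}

# The first ambiguous string: 0DBB 0DCA 200D 0DBA.
text = (0x0DBB, 0x0DCA, 0x200D, 0x0DBA)

# Parse 1: Ra + Yansaya -- the named sequence is a suffix of the string.
assert text[-3:] == SEQUENCES["YANSAYA"]

# Parse 2: Repaya + Ya -- the named sequence is a prefix of the string.
assert text[:3] == SEQUENCES["REPAYA"]

print("both parses match: the string is ambiguous without an external rule")
```

[The same check succeeds for the second string, 0DBB 0DCA 200D 0DBB, with Rakaaraansaya in place of Yansaya.]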
In most cases one can guess what the intended meaning is, but I suspect this is somewhat of a gray area. In practice, trying to render these problematic sequences and their neighbors in HarfBuzz with a variety of fonts results in a variety of outcomes (including occasionally unexpected glyph choices). If the meaning of these sequences is not well defined, that would partly explain the variation across fonts. Am I missing something fundamental? If not, it seems this issue should be called out explicitly in some part of the standard. Regards, -- martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Fri Oct 14 13:09:28 2016 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Fri, 14 Oct 2016 11:09:28 -0700 Subject: Ambiguity(?) in Sinhala named sequences In-Reply-To: References: Message-ID: <7c3ceb6b-c7b9-834d-3bf2-83037c3faeb0@ix.netcom.com> This is an interesting question. It seems the task of parsing a text into sequences depends on the purpose. Not all sequences of interest are named and, in the general case, not all attempts at parsing may be unique. In this case, it looks like the named sequences would correspond to a specific (ligated) glyph that matches a user-perceived unit of the writing system. Such a parsing task is akin to scanning, for example, strings using the Latin script for ligatures - while trying to emulate the rules that were in effect during days of hot metal typesetting for certain languages. For example, it wasn't enough to know that a certain cluster of letters might have a ligature glyph; one would also have to know whether the cluster straddled a (compound) word boundary or not. Just knowing the specification of ligated sequences alone would not be enough to identify a correct parse. Such rules, however, are usually not part of the Unicode standard. The situation here is similar; the standard simply specifies that a certain sequence of code points has a collective name.
In case of ambiguities, you'll have to turn to external sources to resolve them. Now, if this is the only such ambiguity (or one of a very small number) and if identification of the correct sequence is essential for selecting the correct rendering, I don't see why the script description for Sinhala couldn't be augmented to discuss that issue. In which case, the way to proceed is to assemble the full set of facts and submit them to the UTC using the reporting form on the website. A./ On 10/14/2016 10:07 AM, Martin Jansche wrote: > For Sinhala, the following named sequences are defined (for good reasons): > > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D > > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll > write Ya for 0DBA and Ra for 0DBB. > > Note that these give rise to two potentially ambiguous codepoint > strings, namely > > 0DBB 0DCA 200D 0DBA > 0DBB 0DCA 200D 0DBB > > I'll concentrate on the first, as all arguments apply to the second > one analogously. > > At first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible > parses: > > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya > > First question: Does the standard give any guidance as to which one is > the intended parse? The section on Sinhala in the Unicode Standard is > silent about this. Is there a general principle I'm missing? > > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not > used and is considered incorrect, suggesting that the second parse > (Repaya+Ya) should be the default interpretation of this sequence. > However, SLS 1134 does not address the potential ambiguity of this > sequence explicitly and the description there could be read as > informative, not normative. > > Second question: Given that one parse of this sequence should be the > default, how does one represent the non-default parse?
> > In most cases one can guess what the intended meaning is, but I > suspect this is somewhat of a gray area. In practice, trying to render > these problematic sequences and their neighbors in HarfBuzz with a > variety of fonts results in a variety of outcomes (including > occasionally unexpected glyph choices). If the meaning of these > sequences is not well defined, that would partly explain the variation > across fonts. > > Am I missing something fundamental? If not, it seems this issue should > be called out explicitly in some part of the standard. > > Regards, > -- martin From charupdate at orange.fr Sun Oct 16 12:08:59 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 16 Oct 2016 19:08:59 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> Message-ID: <1995960297.7266.1476637739730.JavaMail.www@wwinf1f09> On 11 Oct 2016 09:48:00 -0700, Doug Ewell wrote: [...] > > You mentioned mobile devices, but also mentioned ISO/IEC 9995 and 14755, > which seem to deal primarily with computer keyboards. > > On Windows, John Cowan's Moby Latin keyboard [1] allows the input of > more than 800 non-ASCII characters, including the two mentioned in your > post (ɔ and ɛ): > > AltGr+p, o 0254 LATIN SMALL LETTER OPEN O > AltGr+p, e 025B LATIN SMALL LETTER OPEN E > > Moby Latin is a strict superset of the standard U.S. English keyboard; > that is, none of the standard keystrokes were redefined, unlike > keyboards such as United States-International which tend to redefine > keys for ASCII characters that look like diacritical marks, making > adoption difficult. There are also versions of Moby based on the > standard U.K. keyboard. > > [1] > http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html > U.S.
Moby Latin and Whacking Latin keyboard driver packages are not available any more. What happened? Neither can John Cowan's home page be accessed: http://home.ccil.org/%7Ecowan/XML/ Though the Chester County Interlink host is not down. Still the ReadMe can be accessed, from another domain: http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html From charupdate at orange.fr Sun Oct 16 12:31:46 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 16 Oct 2016 19:31:46 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> Message-ID: <2082406741.7551.1476639106475.JavaMail.www@wwinf1f09> I guess that Moby Latin is now being reengineered, see: http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html#vietnamese "These assignments are considered temporary, and will be reconsidered when the Microsoft program used to generate Moby Latin can handle serial dead keys." Obviously the Microsoft program used to generate will be KbdUTool, the Microsoft Keyboard Table Generation Tool (Unicode). I'm so glad that now what many people were waiting for, serial dead keys, is going to become a common feature on Windows. All the best, Marcel On Sun, 16 Oct 2016 19:08:59 +0200 (CEST), I wrote: [...] > U.S. Moby Latin and Whacking Latin keyboard driver packages > are not available any more. What happened? > Neither can John Cowan's home page be accessed: > http://home.ccil.org/%7Ecowan/XML/ > Though the Chester County Interlink host is not down. > Still the ReadMe can be accessed, from another domain: > http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html From mark at kli.org Sun Oct 16 13:25:34 2016 From: mark at kli.org (Mark E.
Shoulson) Date: Sun, 16 Oct 2016 14:25:34 -0400 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <1995960297.7266.1476637739730.JavaMail.www@wwinf1f09> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> <1995960297.7266.1476637739730.JavaMail.www@wwinf1f09> Message-ID: I have the rare good fortune to see John Cowan on a near-daily basis (except this month, with all the Jewish Holidays); I'll forward your message on. ~mark On 10/16/2016 01:08 PM, Marcel Schneider wrote: > On 11 Oct 2016 09:48:00 -0700, Doug Ewell wrote: > […] >> You mentioned mobile devices, but also mentioned ISO/IEC 9995 and 14755, >> which seem to deal primarily with computer keyboards. >> >> On Windows, John Cowan's Moby Latin keyboard [1] allows the input of >> more than 800 non-ASCII characters, including the two mentioned in your >> post (ɔ and ɛ): >> >> AltGr+p, o 0254 LATIN SMALL LETTER OPEN O >> AltGr+p, e 025B LATIN SMALL LETTER OPEN E >> >> Moby Latin is a strict superset of the standard U.S. English keyboard; >> that is, none of the standard keystrokes were redefined, unlike >> keyboards such as United States-International which tend to redefine >> keys for ASCII characters that look like diacritical marks, making >> adoption difficult. There are also versions of Moby based on the >> standard U.K. keyboard. >> >> [1] >> http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html >> > U.S. Moby Latin and Whacking Latin keyboard driver packages > are not available any more. What happened? > Neither can John Cowan's home page be accessed: > http://home.ccil.org/%7Ecowan/XML/ > Though the Chester County Interlink host is not down. 
> Still the ReadMe can be accessed, from another domain: > http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html From doug at ewellic.org Sun Oct 16 13:25:27 2016 From: doug at ewellic.org (Doug Ewell) Date: Sun, 16 Oct 2016 12:25:27 -0600 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <2082406741.7551.1476639106475.JavaMail.www@wwinf1f09> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> <2082406741.7551.1476639106475.JavaMail.www@wwinf1f09> Message-ID: <194ACB1C3362402CB0CC03D79BFF3A8F@DougEwell> Marcel Schneider wrote: > I guess that Moby Latin is now being reengineered, see: > > http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html#vietnamese That's Caoimhín Ó Donnaíle's mirror of the readme file for Whacking Latin, the UK version of Moby Latin. I don't see anything about re-engineering it, but maybe I missed something. > Obviously the Microsoft program used to generate will be > KbdUTool, the Microsoft Keyboard Table Generation Tool (Unicode). Yes, via MSKLC. > I'm so glad that now what many people were waiting for, > serial dead keys, is going to become a common feature > on Windows. I would be glad to see that too, but where do you see that on the referenced page? All I see is John's original text about working around the MSKLC limitation. If you want to work directly with KbdUTool to get serial dead keys, bypassing MSKLC, here is Kaplan's post from 2011 on how to do this. 
Be sure to read all the warnings twice: http://archives.miloush.net/michkap/archive/2011/04/16/10154700.html -- Doug Ewell | Thornton, CO, US | ewellic.org From charupdate at orange.fr Sun Oct 16 15:59:01 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 16 Oct 2016 22:59:01 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <194ACB1C3362402CB0CC03D79BFF3A8F@DougEwell> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> <2082406741.7551.1476639106475.JavaMail.www@wwinf1f09> <194ACB1C3362402CB0CC03D79BFF3A8F@DougEwell> Message-ID: <1031266513.10647.1476651541641.JavaMail.www@wwinf1f09> On Sun, 16 Oct 2016 14:25:34 -0400, Mark E. Shoulson wrote: > I have the rare good fortune to see John Cowan on a near-daily basis > (except this month, with all the Jewish Holidays); I'll forward your > message on. Thank you. On Sun, 16 Oct 2016 12:25:27 -0600, Doug Ewell wrote: > Marcel Schneider wrote: > > > I guess that Moby Latin is now being reengineered, see: > > > > http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html#vietnamese > > That's Caoimhín Ó Donnaíle's mirror of the readme file for Whacking > Latin, the UK version of Moby Latin. I don't see anything about > re-engineering it, but maybe I missed something. Right, it isn't talking about re-engineering. "Reconsidered" is not re-engineered. Though I still guess that the author is doing much more now. I remembered this sentence from having read it when you'd shared Moby Latin here. > > > Obviously the Microsoft program used to generate will be > > KbdUTool, the Microsoft Keyboard Table Generation Tool (Unicode). > > Yes, via MSKLC. Then there would be nothing to be reconsidered. I've in mind using the -s flag to generate the C sources, then setting these read-only once edited. 
> > > I'm so glad that now what many people were waiting for, > > serial dead keys, is going to become a common feature > > on Windows. > > I would be glad to see that too, but where do you see that on the > referenced page? All I see is John's original text about working around > the MSKLC limitation. By experiencing the current use of KbdUTool (via a script in batch that I've written with a comfortable UI for end-users), I feel I am in a position to extrapolate this from John Cowan's wording of the disclaimer: "These assignments are considered temporary, and will be reconsidered when the Microsoft program used to generate Moby Latin can handle serial dead keys." It doesn't say what program. Just "the Microsoft program used." If today, this variable is set to 'KbdUTool' instead of 'MSKLC', then suddenly the Microsoft program "can handle serial dead keys." > > If you want to work directly with KbdUTool to get serial dead keys, > bypassing MSKLC, here is Kaplan's post from 2011 on how to do this. Be > sure to read all the warnings twice: > > http://archives.miloush.net/michkap/archive/2011/04/16/10154700.html Thank you for this link. This is what I should refer to when citing the feature. There is the test issue, which seems rather daunting. Does a working layout driver prove that there is no known bug? I'm actually using such a working layout driver. E.g. pressing the Acute dead key twice, then "o", inserts "?". I'd suggest not doing this in the .klc file; it is too complicated beneath its apparent simplicity, because the diacritic doesn't show up on each line. Kind regards, Marcel From harshula at hj.id.au Sun Oct 16 18:15:57 2016 From: harshula at hj.id.au (Harshula) Date: Mon, 17 Oct 2016 10:15:57 +1100 Subject: Ambiguity(?) 
in Sinhala named sequences In-Reply-To: References: Message-ID: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> Hi Martin, On 15/10/16 04:07, Martin Jansche wrote: > For Sinhala, the following named sequences are defined (for good reasons): > > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D > > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll > write Ya for 0DBA and Ra for 0DBB. > > Note that these give rise to two potentially ambiguous codepoint > strings, namely > > 0DBB 0DCA 200D 0DBA > 0DBB 0DCA 200D 0DBB > > I'll concentrate on the first, as all arguments apply to the second one > analogously. > > At a first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible parses: > > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya > > First question: Does the standard give any guidance as to which one is > the intended parse? The section on Sinhala in the Unicode Standard is > silent about this. Is there a general principle I'm missing? > > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not > used and is considered incorrect, suggesting that the second parse > (Repaya+Ya) should be the default interpretation of this sequence. > However, SLS 1134 does not address the potential ambiguity of this > sequence explicitly and the description there could be read as > informative, not normative. 1) re: 0DBB 0DCA 200D 0DBA SLS 1134 was updated in 2011 (The latest public version I could find is v3.41. This extract is the same in v3.6.): https://sourceforge.net/p/sinhala/mailman/attachment/4D957C56.5050204 at cse.mrt.ac.lk/1/ "1. The yansaya is not used following the letter ?. e.g.: the spelling ??????? is incorrect." If the above is insufficient, it's best to discuss the issue with Harsha (CC'd) and Ruvan (CC'd). 2) re: 0DBB 0DCA 200D 0DBB Harsha & Ruvan can clarify this too. 
cya, # > Second question: Given that one parse of this sequence should be the > default, how does one represent the non-default parse? > > In most cases one can guess what the intended meaning is, but I suspect > this is somewhat of a gray area. In practice, trying to render these > problematic sequences and their neighbors in HarfBuzz with a variety of > fonts results in a variety of outcomes (including occasionally > unexpected glyph choices). If the meaning of these sequences is not well > defined, that would partly explain the variation across fonts. > > Am I missing something fundamental? If not, it seems this issue should > be called out explicitly in some part of the standard. > > Regards, > -- martin From cibucj at gmail.com Sun Oct 16 22:12:54 2016 From: cibucj at gmail.com (=?UTF-8?B?4LS44LS/4LSs4LWBIOKAjA==?=) Date: Mon, 17 Oct 2016 04:12:54 +0100 Subject: Ambiguity(?) in Sinhala named sequences In-Reply-To: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> References: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> Message-ID: Hi Martin, Isn't this question analogous to asking whether the layout engine should use the C1-conjoining form or the C2-conjoining form for a sequence in any Indic script? That is, whether C1 should form a glyph while C2 keeps its independent form, or vice versa. (Potentially there can be more forms, that is, a full ligature and an explicit Virama form.) If the question you asked is equivalent, then the answer is traditionally left to the font to decide. BTW, even for a given C1 and C2 in a given script, a font can potentially choose a different answer based on its purpose/character, like a font for the Malayalam traditional script vs. a font for the reformed script. 
regards, Cibu On Mon, Oct 17, 2016 at 12:15 AM, Harshula wrote: > Hi Martin, > > On 15/10/16 04:07, Martin Jansche wrote: > > For Sinhala, the following named sequences are defined (for good > reasons): > > > > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA > > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB > > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D > > > > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll > > write Ya for 0DBA and Ra for 0DBB. > > > > Note that these give rise to two potentially ambiguous codepoint > > strings, namely > > > > 0DBB 0DCA 200D 0DBA > > 0DBB 0DCA 200D 0DBB > > > > I'll concentrate on the first, as all arguments apply to the second one > > analogously. > > > > At a first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible > parses: > > > > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya > > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya > > > > First question: Does the standard give any guidance as to which one is > > the intended parse? The section on Sinhala in the Unicode Standard is > > silent about this. Is there a general principle I'm missing? > > > > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not > > used and is considered incorrect, suggesting that the second parse > > (Repaya+Ya) should be the default interpretation of this sequence. > > However, SLS 1134 does not address the potential ambiguity of this > > sequence explicitly and the description there could be read as > > informative, not normative. > > 1) re: 0DBB 0DCA 200D 0DBA > > SLS 1134 was updated in 2011 (The latest public version I could find is > v3.41. This extract is the same in v3.6.): > https://sourceforge.net/p/sinhala/mailman/attachment/ > 4D957C56.5050204 at cse.mrt.ac.lk/1/ > > "1. The yansaya is not used following the letter ?. e.g.: the spelling > ??????? is incorrect." > > If the above is insufficient, it's best to discuss the issue with Harsha > (CC'd) and Ruvan (CC'd). 
> > 2) re: 0DBB 0DCA 200D 0DBB > > Harsha & Ruvan can clarify this too. > > cya, > # > > > > Second question: Given that one parse of this sequence should be the > > default, how does one represent the non-default parse? > > > > In most cases one can guess what the intended meaning is, but I suspect > > this is somewhat of a gray area. In practice, trying to render these > > problematic sequences and their neighbors in HarfBuzz with a variety of > > fonts results in a variety of outcomes (including occasionally > > unexpected glyph choices). If the meaning of these sequences is not well > > defined, that would partly explain the variation across fonts. > > > > Am I missing something fundamental? If not, it seems this issue should > > be called out explicitly in some part of the standard. > > > > Regards, > > -- martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mjansche at google.com Mon Oct 17 09:58:13 2016 From: mjansche at google.com (Martin Jansche) Date: Mon, 17 Oct 2016 15:58:13 +0100 Subject: Ambiguity(?) in Sinhala named sequences In-Reply-To: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> References: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> Message-ID: Thanks for the pointer to the 2011 version of SLS 1134. After reading that and discussing further with Cibu, here's a tentative proposal: * The most logical[*] interpretation of the sequence 0DBB 0DCA 200D 0DBA is as Repaya+Ya. A standard (Unicode and/or SLS) should call this out explicitly. ([*]Logical: In other scripts, including Devanagari, Myanmar, etc. similar types of modifiers that logically precede a letter are represented in this way, sometimes without ZWJ or with a different character in lieu of ZWJ. Also this interpretation plays well alongside a hypothetical alternative encoding of Yansaya using a single codepoint.) * A standard (Unicode and/or SLS) should specify how Ra+Yansaya should be encoded. 
SLS 1134 points out that Ra+Yansaya is an incorrect spelling, yet in order to make this point it has to show the glyph sequence for Ra+Yansaya. So there is clearly some need to be able to render this, even if it's only at this meta-linguistic level. Plus SLS 1134 is very explicit that e.g. keyboarding should allow for letter combinations to be entered even if they are not practically useful. One possible way of encoding Ra+Yansaya is 0DBB 200C 0DCA 200D 0DBA, i.e. Ra ZWNJ Yansaya. This renders as intended in HarfBuzz with NotoSansSinhala, but not with LBhashitaComplex. If we had a clear directive regarding how Ra+Yansaya should be represented, we could work on getting fonts updated. * Everything about 0DBB 0DCA 200D 0DBA also applies to 0DBB 0DCA 200D 0DBB. This is much less relevant in practice, but the same arguments about ambiguity apply and should be resolved in the same way. Regards, -- martin On Mon, Oct 17, 2016 at 12:15 AM, Harshula wrote: > Hi Martin, > > On 15/10/16 04:07, Martin Jansche wrote: > > For Sinhala, the following named sequences are defined (for good > reasons): > > > > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA > > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB > > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D > > > > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll > > write Ya for 0DBA and Ra for 0DBB. > > > > Note that these give rise to two potentially ambiguous codepoint > > strings, namely > > > > 0DBB 0DCA 200D 0DBA > > 0DBB 0DCA 200D 0DBB > > > > I'll concentrate on the first, as all arguments apply to the second one > > analogously. > > > > At a first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible > parses: > > > > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya > > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya > > > > First question: Does the standard give any guidance as to which one is > > the intended parse? The section on Sinhala in the Unicode Standard is > > silent about this. 
Is there a general principle I'm missing? > > > > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not > > used and is considered incorrect, suggesting that the second parse > > (Repaya+Ya) should be the default interpretation of this sequence. > > However, SLS 1134 does not address the potential ambiguity of this > > sequence explicitly and the description there could be read as > > informative, not normative. > > 1) re: 0DBB 0DCA 200D 0DBA > > SLS 1134 was updated in 2011 (The latest public version I could find is > v3.41. This extract is the same in v3.6.): > https://sourceforge.net/p/sinhala/mailman/attachment/ > 4D957C56.5050204 at cse.mrt.ac.lk/1/ > > "1. The yansaya is not used following the letter ?. e.g.: the spelling > ??????? is incorrect." > > If the above is insufficient, it's best to discuss the issue with Harsha > (CC'd) and Ruvan (CC'd). > > 2) re: 0DBB 0DCA 200D 0DBB > > Harsha & Ruvan can clarify this too. > > cya, > # > > > > Second question: Given that one parse of this sequence should be the > > default, how does one represent the non-default parse? > > > > In most cases one can guess what the intended meaning is, but I suspect > > this is somewhat of a gray area. In practice, trying to render these > > problematic sequences and their neighbors in HarfBuzz with a variety of > > fonts results in a variety of outcomes (including occasionally > > unexpected glyph choices). If the meaning of these sequences is not well > > defined, that would partly explain the variation across fonts. > > > > Am I missing something fundamental? If not, it seems this issue should > > be called out explicitly in some part of the standard. > > > > Regards, > > -- martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Mon Oct 17 11:52:48 2016 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 17 Oct 2016 09:52:48 -0700 Subject: Ambiguity(?) 
in Sinhala named sequences In-Reply-To: References: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> Message-ID: An HTML attachment was scrubbed... URL: From sr.erickson at gmail.com Fri Oct 21 12:11:36 2016 From: sr.erickson at gmail.com (seth erickson) Date: Fri, 21 Oct 2016 10:11:36 -0700 Subject: Historical question about 'universal signs' Message-ID: Greetings Unicoders, I'm trying to find information (for research purposes) about a character set mentioned in Joseph Becker's 1988 draft proposal [1]: "In 1978, the initial proposal for a set of 'Universal Signs' was made by Bob Belleville at Xerox PARC. Many persons contributed ideas to the development of a new encoding design. Beginning in 1980, these efforts evolved into the Xerox Character Code Standard (XCCS) [...]" XCCS is fairly well documented but I'm having trouble finding anything about the proposal by Bob Belleville. Any pointers would be appreciated. Thanks, Seth Erickson PhD student Department of Information Studies University of California, Los Angeles [1] http://unicode.org/history/unicode88.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Sun Oct 23 12:01:29 2016 From: doug at ewellic.org (Doug Ewell) Date: Sun, 23 Oct 2016 11:01:29 -0600 Subject: XCCS (was: Historical question about 'universal signs') In-Reply-To: References: Message-ID: seth erickson wrote: > XCCS is fairly well documented That hasn't been my experience. I'd be interested in any links you can forward that go beyond "Unicode built on" or "drew ideas from" or "was influenced by" XCCS. Thanks, -- Doug Ewell | Thornton, CO, US | ewellic.org From sr.erickson at gmail.com Mon Oct 24 23:20:06 2016 From: sr.erickson at gmail.com (seth erickson) Date: Mon, 24 Oct 2016 21:20:06 -0700 Subject: XCCS (was: Historical question about 'universal signs') In-Reply-To: References: Message-ID: See pg. 57-63 of this: Xerox. (1985). 
*Xerox System Network Architecture: General Information Manual* (No. XNSG 068504). Retrieved from http://archive.org/details/bitsavers_xeroxxnsXNNetworkArchitectureGeneralInformationMan_10024221 SE On Sun, Oct 23, 2016 at 10:01 AM, Doug Ewell wrote: > seth erickson wrote: > > XCCS is fairly well documented >> > > That hasn't been my experience. I'd be interested in any links you can > forward that go beyond "Unicode built on" or "drew ideas from" or "was > influenced by" XCCS. > > Thanks, > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shizhao at gmail.com Thu Oct 27 06:48:23 2016 From: shizhao at gmail.com (shi zhao) Date: Thu, 27 Oct 2016 11:48:23 +0000 Subject: about China Font Bank Message-ID: from http://www.nytimes.com/2016/10/25/opinion/chinas-digital-soft-power-play.html?_r=0 This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. The project highlights 100,000 characters from the country's 56 ethnic minorities, and another 100,000 rare and ancient characters from China's written corpus. Deploying almost 30 companies, institutions and universities, it's the largest state-funded digitization project ever undertaken. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john at mitre.org Thu Oct 27 09:13:35 2016 From: john at mitre.org (Burger, John D.) 
Date: Thu, 27 Oct 2016 10:13:35 -0400 Subject: about China Font Bank In-Reply-To: References: Message-ID: <328CAB25-BC48-40E8-8AAC-D3156AA55940@mitre.org> Language Log has a good article on this, including reactions from several sinographers: http://languagelog.ldc.upenn.edu/nll/?p=29034 - JB > On Oct 27, 2016, at 07:48, shi zhao wrote: > > from http://www.nytimes.com/2016/10/25/opinion/chinas-digital-soft-power-play.html?_r=0 > > This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. > > The project highlights 100,000 characters from the country's 56 ethnic minorities, and another 100,000 rare and ancient characters from China's written corpus. Deploying almost 30 companies, institutions and universities, it's the largest state-funded digitization project ever undertaken. -------------- next part -------------- An HTML attachment was scrubbed... URL: