From junichi.chiba.bps at gmail.com Sat Oct 1 02:04:11 2016
From: junichi.chiba.bps at gmail.com (Junichi Chiba)
Date: Sat, 01 Oct 2016 07:04:11 +0000
Subject: Dates in Japanese Era Names in Unicode Standard
In-Reply-To: <59642171-c152-0863-8165-ac48ace1d9a1@it.aoyama.ac.jp>
References: <6c865cc7-8227-d72a-7794-e9fe9f3bc583@it.aoyama.ac.jp> <59642171-c152-0863-8165-ac48ace1d9a1@it.aoyama.ac.jp>
Message-ID:

> Your analysis sounds very plausible. I suggest you send an official error report using http://www.unicode.org/reporting.html.

Thank you, Martin! I sent a suggestion there together with a link to the discussion here.

On Fri, 30 Sep 2016 at 14:43 Martin J. Dürst wrote:

> Hello Junichi,
>
> Your analysis sounds very plausible. I suggest you send an official error report using http://www.unicode.org/reporting.html.
>
> Regards, Martin.
>
> On 2016/09/30 13:16, Junichi Chiba wrote:
> >> Is it possible that these eras start at midday instead of midnight?
> >> This could explain the date difference, if you do not set the time in your query
> >> (your query will assume a default time at 00:00 midnight)
> >
> > The new era starts at 00:00 midnight local time.
> > Together with the time zone difference, I assume that the cause was a simple chain of mistakes while drafting the Unicode document.
> >
> > My story:
> >
> > First, the author of Table 22-8 asks somebody to send a list of the dates.
> > For the table to work, "day" accuracy should be enough, rather than time.
> > The "day" value is thus recorded in YYYYMMDD format.
> > It is then listed in a file format, like a spreadsheet, that keeps the day value in "time" accuracy with a time zone marker.
> > As there is no intention to keep it in "time" accuracy, let's suppose that a default marker such as UTC+0 is embedded automatically.
> >
> > The spreadsheet is then sent to the author and opened in a more "Western" time zone than the one it was recorded in.
> > Upon opening the file, the dates were converted to the local time zone.
> > Specifying a more "Western" time zone results in smaller date values.
> > Thus the smaller values are picked up by the author for Table 22-8.
> >
> > In fact, all of the day values in Table 22-8 are shifted one day earlier.
> >
> > Current values:
> > U+337B square era name heisei 1989-01-07 to present day
> > U+337C square era name syouwa 1926-12-24 to 1989-01-06
> > U+337D square era name taisyou 1912-07-29 to 1926-12-23
> > U+337E square era name meizi 1867 to 1912-07-28
> >
> > Suggested correction:
> > U+337B square era name heisei 1989-01-08 to present day
> > U+337C square era name syouwa 1926-12-25 to 1989-01-07
> > U+337D square era name taisyou 1912-07-30 to 1926-12-24
> > U+337E square era name meizi 1868 to 1912-07-29
> >
> > Here are some citations.
> >
> > I will cite from the most reliable source, the law database provided by the government (in Japanese).
> > This is the actual law about when Heisei shall start:
> > http://law.e-gov.go.jp/cgi-bin/idxselect.cgi?IDX_OPT=1&H_NAME=%8C%B3%8D%86%82%F0%89%FC%82%DF%82%E9%90%AD%97%DF&H_NAME_YOMI=%82%A0&H_NO_GENGO=H&H_NO_YEAR=&H_NO_TYPE=2&H_NO_NO=&H_FILE_NAME=S64SE001&H_RYAKU=1&H_CTG=1&H_YOMI_GUN=1&H_CTG_GUN=1
> >
> >> 昭和六十四年一月七日政令第一号
> >> ...
> >> 元号を平成に改める。
> >> 附則
> >> この政令は、公布の日の翌日から施行する。
> >
> > Translation:
> >> Showa 64 January 7 Ordinance 1
> >> ...
> >> The era name shall be Heisei.
> >> Appendix
> >> This ordinance shall be effective from the day after promulgation.
> >
> > The promulgation date was January 7.
> > As Martin mentioned, Heisei started on the day after the announcement.
> > Thus Showa lasted until the very end of January 7 (midnight), and Heisei started at the very beginning of January 8.
>
> >> On the other hand, I saw places that said Showa 64 as late as July (that was when I climbed Mt. Fuji; a placard put up the year before said "closed until July Showa 64").
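The suspected spreadsheet conversion described above is easy to reproduce. A minimal sketch in Python (the actual spreadsheet software and the author's time zone are unknown; UTC-8, i.e. US Pacific Standard Time, is just one example of a more "Western" zone):

```python
from datetime import datetime, timezone, timedelta

# The day value 1989-01-08 (start of Heisei), stored with a default
# UTC+0 marker at midnight, as hypothesized in the story above.
heisei_start = datetime(1989, 1, 8, 0, 0, tzinfo=timezone.utc)

# Rendered in a more "Western" time zone, e.g. UTC-8 (US Pacific).
pacific = timezone(timedelta(hours=-8))
rendered = heisei_start.astimezone(pacific)

print(heisei_start.date())  # 1989-01-08
print(rendered.date())      # 1989-01-07 -- one day earlier, as in Table 22-8
```

The same shift applies to every date in the list, which matches the uniform one-day-earlier pattern in Table 22-8.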
> > I remember the same thing when I was a child.
> > For about half a year, many things such as application forms and street signs were still displayed in Showa. I saw passports and licenses showing expiration dates as Showa 70 or 80. Coins are minted and stocked before release, so Showa 64 coins are in circulation.
> >
> > People often carry a conversion table like:
> > 1986 : Showa 61
> > 1987 : Showa 62
> > 1988 : Showa 63
> > 1989 : Showa 64 : Heisei 1
> > 1990 : Showa 65 : Heisei 2
> > 1991 : Showa 66 : Heisei 3
> >
> > I also cite the start of Showa. This is a citation from Wikisource, another reliable source for public documents.
> > https://ja.wikisource.org/wiki/%E6%98%AD%E5%92%8C%E3%83%88%E6%94%B9%E5%85%83
> >> ??????????????????????????????????????????????????????????
> >> 御名御璽
> >> 大正十五年十二月二十五日
> >
> > Translation:
> >> In the name of the Emperor, who is given inherited sovereignty to administer state affairs, We let Taisho 15 December 25 and onward be the beginning of Showa.
> >> Signed by the Emperor
> >> Taisho 15 December 25
> >
> > As Martin mentioned, eras before Heisei were renewed in such a way that the announcement overwrote the old day.
> >
> > Here is the start of Taisho:
> > https://ja.wikisource.org/wiki/%E6%98%8E%E6%B2%BB%E5%9B%9B%E5%8D%81%E4%BA%94%E5%B9%B4%E4%B8%83%E6%9C%88%E4%B8%89%E5%8D%81%E6%97%A5%E4%BB%A5%E5%BE%8C%E3%83%B2%E6%94%B9%E3%83%A1%E3%83%86%E5%A4%A7%E6%AD%A3%E5%85%83%E5%B9%B4%E3%83%88%E7%88%B2%E3%82%B9
> >> ????????????????????????????
> >> ??????????????????????????????????????
> >> 御名御璽
> >> 明治四十五年七月三十日
> >
> > Translation:
> >> In the name of the Emperor, under the inherited spirit of sovereignty to administer state affairs with virtue, We let, regarding the ordinance enacted by the previous Emperor, Meiji 45 July 30 and onward be the beginning of Taisho.
> >> Signed by the Emperor
> >> Meiji 45 July 30
> >
> > With this law, Meiji 45 July 30 is overwritten as Taisho 1 July 30.
> >
> > Lastly, here is the start of Meiji.
> > https://ja.wikisource.org/wiki/%E4%BB%8A%E5%BE%8C%E5%B9%B4%E8%99%9F%E3%83%8F%E5%BE%A1%E4%B8%80%E4%BB%A3%E4%B8%80%E8%99%9F%E3%83%8B%E5%AE%9A%E3%83%A1%E6%85%B6%E6%87%89%E5%9B%9B%E5%B9%B4%E3%83%B2%E6%94%B9%E3%83%86%E6%98%8E%E6%B2%BB%E5%85%83%E5%B9%B4%E3%83%88%E7%88%B2%E3%82%B9%E5%8F%8A%E8%A9%94%E6%9B%B8
> >> 詔書
> >> ...?????????????????????????????
> >> 明治元年九月八日
> >
> > Translation:
> >> Imperial Edict
> >> ... Keio 4 shall be renamed Meiji 1, and from now on the tradition of frequent renaming of eras shall be limited to one era per Emperor.
> >
> > Since Meiji, eras have been renewed less frequently. It is more engineer-friendly!
> >
> > In Table 22-8, the Meiji start day is omitted.
> > The omission itself is reasonable. It avoids controversy in writing the day along the lunar calendar, which was used until midnight at the end of Meiji 5 December 2. (The next day is Meiji 6 January 1.)
> >
> > The problem here is the year shown as 1867.
> > The ordinance was released on Meiji 1 September 8 (lunar), which was October 23, 1868 (Gregorian).
> > Meiji 1 January 1 (lunar) (and Keio 4 January 1 lunar) is January 25, 1868 (Gregorian).
> > My best guess is that the author of Table 22-8 picked up the year value from a spreadsheet showing "1867-12-31" in local time, which was originally intended to show merely "1868-01".
>
> > On Thu, 29 Sep 2016 at 19:46 Martin J. Dürst wrote:
> >
>> Just a few not very closely related comments:
>>
>> On 2016/09/29 19:06, Philippe Verdy wrote:
>>> Is it possible that these eras start at midday instead of midnight? This could explain the date difference, if you do not set the time in your query (your query will assume a default time at 00:00 midnight)
>>
>> It's extremely difficult to imagine this for Japan in this day and age.
>>
>> I was in Japan when the era changed from Showa to Heisei. I remember the announcement very well, but I don't remember anything about the exact time of the cutover.
>>
>>> Many people still count the second half of the night after midnight as part of the previous day (and so will say "Saturday evening"/"Saturday night" even if it's already the first hours of Sunday).
>>
>> In Japan, that happens e.g. in displays of restaurants and bars, which may announce their opening hours as 17:30-27:00 (i.e. open until three in the morning the next day). But that's only a convention for convenience; everybody knows that it's already the next day on the calendar.
>>
>>> If you test dates and don't want to specify hours, it is highly recommended to set the default time to midday. For the Japanese eras, it's not clear at which time they really start, except for the last two eras since WW2, but setting the time to midday should give the correct result. However, there's no ambiguity during the day of the era switch, if the era is correctly specified (and not just the year number in the era).
>>
>> Yes indeed. These days, people just refer to 1989 (and any dates in it) as Heisei 1 (平成元年). This is all the easier because otherwise, an exception would be necessary for only 7 days.
>>
>> On the other hand, I saw places that said Showa 64 as late as July (that was when I climbed Mt. Fuji; a placard put up the year before said "closed until July Showa 64"). I also got some money in February or so that year and had to sign a receipt that said Showa 64 because it was printed earlier.
>>
>> The Japanese Wikipedia article, at the bottom of the 改元 (https://ja.wikipedia.org/wiki/平成#.E6.94.B9.E5.85.83) section, says that in contrast to the two earlier changes of era, the change started on the next day, in order to give engineers time for the change. That next day was a Sunday, which meant that in effect they had even more time, because most systems had to work with the new era only from Monday.
>> But I guess it must have been a busy weekend for those involved, anyway.
>>
>> To know all the details, the best thing to do would be to check the official government documents, which should be available online. But I wouldn't be surprised if they didn't specify things to the second.
>>
>> Regards, Martin.
>>
>>> 2016-09-29 5:13 GMT+02:00 Junichi Chiba :
>>>
>>>> Dear all,
>>>>
>>>> Nice to e-meet you.
>>>>
>>>> I'm looking at the latest Unicode Standard [1] listing the dates for Japanese Era Names in Table 22-8.
>>>> What I noticed is the begin and end dates for each era.
>>>> They seem to have a one-day difference from the dates that are publicly recognized in Japan.
>>>> For example, the current Heisei actually started on January 8th, 1989, after Showa ended on January 7th, 1989.
>>>>
>>>> However, the Unicode Standard says in Table 22-8:
>>>> U+337B square era name heisei 1989-01-07 to present day
>>>> U+337C square era name syouwa 1926-12-24 to 1989-01-06
>>>>
>>>> Looking at Wikipedia in Japanese [2] and English [3], you can see the exact dates for the end of Syouwa and the start of Heisei.
>>>> Could there be a certain intention behind the difference between this description and the official dates?
>>>> Is the date counted according to GMT, instead of the local date/time, for some reason?
>>>>
>>>> REFERENCE
>>>>
>>>> [1] http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf
>>>>
>>>> [2] https://ja.wikipedia.org/wiki/%E5%B9%B3%E6%88%90
>>>>> 1989????64??1?7????????????????????????????????????????????1989????64??1?7????????????????????????1?8???????????
>>>>
>>>> [3] https://en.wikipedia.org/wiki/Heisei_period
>>>>> Thus, 1989 corresponds to Shōwa 64 until 7 January and Heisei 1 ... since 8 January.
>>>>> On 7 January 1989, at 07:55 JST, the Grand Steward of Japan's Imperial Household Agency, Shōichi Fujimori, announced Emperor Hirohito's death, ...
>>>>> The Heisei era went into effect immediately upon the day after Emperor Akihito's succession to the throne on 7 January 1989.
>>>
>>
>> --
>> Martin J. Dürst
>> Department of Intelligent Information Technology
>> College of Science and Engineering
>> Aoyama Gakuin University
>> Fuchinobe 5-1-10, Chuo-ku, Sagamihara
>> 252-5258 Japan

> --
> Martin J. Dürst
> Department of Intelligent Information Technology
> College of Science and Engineering
> Aoyama Gakuin University
> Fuchinobe 5-1-10, Chuo-ku, Sagamihara
> 252-5258 Japan

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From a.lukyanov at yspu.org Sat Oct 1 03:12:15 2016
From: a.lukyanov at yspu.org (a.lukyanov)
Date: Sat, 01 Oct 2016 11:12:15 +0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <328312cd-094c-5f9b-62fd-7803e51173f8@ix.netcom.com>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <563c28fc-7772-59f6-01ae-ab99bcf64a39@cs.tut.fi> <99AC47C7-6BAC-4D76-A669-2D7743B00B69@evertype.com> <328312cd-094c-5f9b-62fd-7803e51173f8@ix.netcom.com>
Message-ID: <57EF6FDF.4070304@yspu.org>

I think that the right thing to do would be to create several new control/formatting characters, like this:

"previous character is superscript"
"previous character is subscript"
"previous character is small caps (for use in phonetic transcription only)"
"previous character is mathematical blackletter"
etc.

Then people will be able to apply these features to any character as long as their font supports it.

From khaledhosny at eglug.org Sat Oct 1 03:29:33 2016
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Sat, 1 Oct 2016 10:29:33 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi>
Message-ID: <20161001082933.GA2819@macbook>

On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
> 30.9.2016, 19:11, Leonardo Boiko wrote:
>
> > The Unicode codepoints are not intended as a place to store
> > typographically variant glyphs (much like the Unicode "italic"
> > characters aren't designed as a way of encoding italic faces).
>
> There is no disagreement on this. What I was pointing at was that when using rich text or markup, it is complicated or impossible to have typographically correct glyphs used (even when they exist), whereas the use of Unicode codepoints for subscript or superscript characters may do that in a much simpler way.

That is not generally true. In TeX you get true superscript glyphs by default. On the web you can use font features in CSS to get them as well, provided that you are using a font that supports them.

Regards,
Khaled

From jkorpela at cs.tut.fi Sat Oct 1 07:00:50 2016
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Sat, 1 Oct 2016 15:00:50 +0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161001082933.GA2819@macbook>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook>
Message-ID:

1.10.2016, 11:29, Khaled Hosny wrote:
> On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
[...]
>> What I was pointing at was that when using rich text or markup, it is complicated or impossible to have typographically correct glyphs used (even when they exist), whereas the use of Unicode codepoints for subscript or superscript characters may do that in a much simpler way.
> That is not generally true.

It is generally true, but not without exceptions.

> In TeX you get true superscript glyphs by default.

I suppose you're right, though I don't know exactly how TeX implements superscripts. I suspect the fonts that TeX normally uses do not contain (many) superscript or subscript glyph variants, but TeX might actually map e.g. ^2 in math mode to a superscript glyph for 2 (identical to the glyph for ²).

> On the web you can use font features in CSS to get them as well, provided that you are using a font that supports them.

This is a good example of my general statement. If you use the simple way in CSS, you use vertical-align set to sub or super together with a font-size setting. This is simple and "works", but it does not use subscript or superscript glyphs; it algorithmically operates on normal glyphs (and produces different results in different browsers etc.). The newer way, setting font features, is 1) much less widely known, 2) much less supported in browsers, and 3) requires extra settings to deal with browser-specific names of the relevant properties.

Yucca

From glorieul at coanda-deviation.info Sat Oct 1 08:48:59 2016
From: glorieul at coanda-deviation.info (lorieul)
Date: Sat, 01 Oct 2016 15:48:59 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To:
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook>
Message-ID: <1475329739.1352.0.camel@coanda-deviation.info>

Re,

On Fri, 2016-09-30 at 11:57 +0200, Gael Lorieul wrote:
> I wonder why only a subset of the alphabet is available as subscript and/or superscript ?

On Fri, 2016-09-30 at 17:08 +0200, "Jörg Knappen" wrote:
> They were found in older character sets and Unicode provides so-called "round-trip compatibility" to those older character sets.

Okay, I understand the context better now…
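(For illustration, the two CSS approaches Jukka contrasts above can be sketched like this. This is only a sketch: `font-variant-position` and the OpenType `sups` feature are standard, but browser and font support for them varied in 2016, which is exactly his point.)

```css
/* The simple way: synthesized superscript (scaled, shifted normal glyph) */
.sup-synthesized {
  vertical-align: super;
  font-size: 75%;
}

/* The newer way: request the font's true superscript glyphs */
.sup-font-native {
  font-variant-position: super;
  /* lower-level request for the same OpenType feature: */
  font-feature-settings: "sups" 1;
}
```

The first rule always produces something, but never a designed superscript glyph; the second only works when the font actually contains `sups` glyphs.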
On Fri, 2016-09-30 at 17:19 +0200, Philippe Verdy wrote:
> Your problem here is that "start" and "end" are not symbols/variables but actual English words. Why would this usage be restricted only to English? The same formula would need to be really translated into various languages and scripts, needing then a mapping of all letters in Latin, Greek, Cyrillic, but even also Arabic, Japanese, Chinese, Hindi...

On Fri, 2016-09-30 at 13:11 -0300, Leonardo Boiko wrote:
> The Unicode codepoints are not intended as a place to store typographically variant glyphs (much like the Unicode "italic" characters aren't designed as a way of encoding italic faces).

I understand your point…

On Fri, 2016-09-30 at 17:08 +0200, "Jörg Knappen" wrote:
> Sub- and superscripts are considered "higher level markup" and not part of plain text in Unicode. You can easily get at them using LaTeX notation or HTML tags for sub- or superscripts.

The drawback of that solution is the lack of readability in the sources. I would like to have formatting in the spirit of Markdown, i.e. formatting that is easy to read both in the sources and after HTML- or PDF- or whatever-generation. Indeed, LaTeX formulas are often not easy to decipher… Since one spends more time reading source code than documentation, it is important that the comments within the source files are also easily readable. This way, there is no need to constantly switch back and forth between the text editor and the documentation: the source code suffices by itself.

On Sat, 2016-10-01 at 11:12 +0300, a.lukyanov wrote:
> I think that the right thing to do would be to create several new control/formatting characters, like this:
>
> "previous character is superscript"
> "previous character is subscript"
> "previous character is small caps (for use in phonetic transcription only)"
> "previous character is mathematical blackletter"
> etc.
>
> Then people will be able to apply these features to any character as long as their font supports it.
That would be a nice alternative indeed.

Regards,
Gaël

From verdy_p at wanadoo.fr Sat Oct 1 09:00:35 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 1 Oct 2016 16:00:35 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To:
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook>
Message-ID:

I disagree. Fonts normally contain metrics for proper positioning of the superscript and subscript baselines and relative heights. They "may" provide additional features to override the glyphs or relative positioning if this is needed for coherence with the pre-encoded superscripts/subscripts that are mapped in the font, or to adjust the visual weights of strokes and adjust some angles, or for correct hinting on low-resolution displays.

These specific features do not need to be enabled explicitly in CSS; they should be enabled by default. Problems only occur with defective fonts that have incomplete data, and for which browsers (in fact their internal text renderers) attempt to define some reasonable defaults. This may for some time produce some incoherent styles, but this is temporary. Slowly but surely, these defects are being corrected. As Unicode encodes things for the long term, there's no need to define temporary workarounds by encoding new variants.

The existing superscripts/subscripts have been encoded for another purpose: to preserve the separate semantics of letter modifiers in plain text or in IPA as **distinct** symbols. Any other use is still possible by people hacking these characters as if they were a general way of writing superscripts/subscripts, but these are just hacks that break the identity of the represented text.
They have also been encoded for round-trip compatibility with older standards where it is impossible to determine what the intended semantics were, but also because these old characters were used on low-resolution or monospaced displays (where the more exact font metrics needed for math formulas could not be respected at all).

Even in TeX, or math formulas in general, all symbols used in superscript/subscript preserve their own identity: this is just a question of layout, where the applied style adds (but does not replace or remove) more semantics.

In summary, we should use the normal characters, including in TeX/maths. Then the layout engine will do its best with the fonts it has: it will honor their suggested metrics (if they are defined), attempt to alias some missing character mappings in fonts, or synthesize these styles using the best metrics available in the font, or computed with reasonable defaults for the scripts. And for all this you do not need more than a "sub" or "sup" element in HTML, and in TeX/MathML you just use the standard "^" or "_" layout operators.

Only at this point, if authors see that the current implementations are still not what they expected, will they attempt to hack the presentation a bit by adding some specific styles (but only as a temporary workaround, which will no longer be needed in the long term and which could cause incoherences later with updated fonts or updated text engines that would produce better and more coherent results).

2016-10-01 14:00 GMT+02:00 Jukka K. Korpela :

> 1.10.2016, 11:29, Khaled Hosny wrote:
>
>> On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
> [...]
>>> What I was pointing at was that when using rich text or markup, it is complicated or impossible to have typographically correct glyphs used (even when they exist), whereas the use of Unicode codepoints for subscript or superscript characters may do that in a much simpler way.
>>
>> That is not generally true.
>
> It is generally true, but not without exceptions.
>
>> In TeX you get true superscript glyphs by default.
>
> I suppose you're right, though I don't know exactly how TeX implements superscripts. I suspect the fonts that TeX normally uses do not contain (many) superscript or subscript glyph variants, but TeX might actually map e.g. ^2 in math mode to a superscript glyph for 2 (identical to the glyph for ²).
>
>> On the web you can use font features in CSS to get them as well, provided that you are using a font that supports them.
>
> This is a good example of my general statement. If you use the simple way in CSS, you use vertical-align set to sub or super together with a font-size setting. This is simple and "works", but it does not use subscript or superscript glyphs; it algorithmically operates on normal glyphs (and produces different results in different browsers etc.). The newer way, setting font features, is 1) much less widely known, 2) much less supported in browsers, and 3) requires extra settings to deal with browser-specific names of the relevant properties.
>
> Yucca

From haberg-1 at telia.com Sat Oct 1 09:21:47 2016
From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=)
Date: Sat, 1 Oct 2016 16:21:47 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <1475329739.1352.0.camel@coanda-deviation.info>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook> <1475329739.1352.0.camel@coanda-deviation.info>
Message-ID: <8C55D747-01EB-4AE3-8F14-E07C29F1A97E@telia.com>

> On 1 Oct 2016, at 15:48, lorieul wrote:
> Indeed, LaTeX formulas are often not easy to decipher…
One can improve readability by using more Unicode characters [1] and the unicode-math package [2], or by switching to ConTeXt, which has built-in support.

1. http://milde.users.sourceforge.net/LUCR/Math/unimathsymbols.xhtml
2. https://www.ctan.org/pkg/unicode-math

From verdy_p at wanadoo.fr Sat Oct 1 09:24:10 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 1 Oct 2016 16:24:10 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <1475329739.1352.0.camel@coanda-deviation.info>
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook> <1475329739.1352.0.camel@coanda-deviation.info>
Message-ID:

2016-10-01 15:48 GMT+02:00 lorieul :

> The drawback of that solution is the lack of readability in the sources. I would like to have formatting in the spirit of Markdown, i.e. formatting that is easy to read both in the sources and after HTML- or PDF- or whatever-generation. Indeed, LaTeX formulas are often not easy to decipher… Since one spends more time reading source code than documentation, it is important that the comments within the source files are also easily readable. This way, there is no need to constantly switch back and forth between the text editor and the documentation: the source code suffices by itself.

The LaTeX markup for superscripts/subscripts is very simple ("^" and "_"), even if you need extra parentheses to surround subformulas. But in this context of math formulas, the coder should understand those math formulas in order to implement or use them correctly. A mere comment block in a source file is not the best place to explain everything. In most cases you'll use references to other documents and will use a precise terminology that is even easier to read than formulas.
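To make that markup concrete, here is what it looks like in standard LaTeX math mode (nothing beyond stock LaTeX is assumed):

```latex
% Single-character scripts need no braces; subformulas do:
$x^2 + a_1$
% Braces distinguish a nested superscript from a two-digit one:
$x^{2^2} \neq x^{22}$
% Sub- and superscripts combine freely, e.g. on an integral sign:
$\int_{t_0}^{t_1} f(t)\,dt$
```

The brace grouping is precisely what keeps multi-level scripts unambiguous in source form.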
Project management tools help collect all the pieces needed for communication between the programmers and users of modules, but source code does not replace more formal documentation.

Note also that math superscripts/subscripts need to support multiple levels of superscripts/subscripts with variable sizes. This is not possible with the Unicode-encoded characters, which are designed only for a single level, but it is not a problem for TeX, MathML or HTML. The apparent simplicity of using pre-encoded character "variants" becomes a nightmare later when parsing formulas (what does "x²²" mean: is it "(x^2)^2", i.e. "x^4", or "x^(22)"?) or when generating derived formulas. Such a problem, however, does not exist for their use only in linear plain text (for example as IPA symbols).

From guoyunhebrave at gmail.com Sat Oct 1 08:50:28 2016
From: guoyunhebrave at gmail.com (Guo Yunhe)
Date: Sat, 1 Oct 2016 16:50:28 +0300
Subject: Minimum set of Emoji characters
Message-ID:

Hi, the fontconfig project is looking for a definition of all the basic emoji characters that an emoji font must have. Is it available from the Unicode standards?

--
Guo Yunhe

From khaledhosny at eglug.org Sat Oct 1 10:37:34 2016
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Sat, 1 Oct 2016 17:37:34 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To:
References: <4bec7eba-d3bb-d6e3-5869-1929e17bc8a4@coanda-deviation.info> <65dc0e3c-011d-dba4-6126-5a7ff9596fd2@cs.tut.fi> <19524b6c-15d8-37e8-78a3-dee1d774c4a0@cs.tut.fi> <20161001082933.GA2819@macbook>
Message-ID: <20161001153734.GB2923@macbook>

On Sat, Oct 01, 2016 at 03:00:50PM +0300, Jukka K. Korpela wrote:
> 1.10.2016, 11:29, Khaled Hosny wrote:
>
>> On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
> [...]
>>> What I was pointing at was that when using rich text or markup, it is complicated or impossible to have typographically correct glyphs used (even when they exist), whereas the use of Unicode codepoints for subscript or superscript characters may do that in a much simpler way.
>>
>> That is not generally true.
>
> It is generally true, but not without exceptions.
>
>> In TeX you get true superscript glyphs by default.
>
> I suppose you're right, though I don't know exactly how TeX implements superscripts. I suspect the fonts that TeX normally uses do not contain (many) superscript or subscript glyph variants, but TeX might actually map e.g. ^2 in math mode to a superscript glyph for 2 (identical to the glyph for ²).

TeX has fonts designed for use at 8pt (the size of 1st-level scripts) and 5pt (the size of 2nd-level scripts), with all the optical corrections for them to look right when scaled down. They provide all the glyphs provided by the fonts for larger sizes, so any character can be used in super- or subscripts; no special mapping is needed.

Regards,
Khaled

From mpsuzuki at hiroshima-u.ac.jp Sat Oct 1 11:19:17 2016
From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya)
Date: Sun, 02 Oct 2016 01:19:17 +0900
Subject: [Unicode] Minimum set of Emoji characters
In-Reply-To:
References:
Message-ID: <57EFE205.6080605@hiroshima-u.ac.jp>

Dear Guo,

Have you checked the thread from my post?
http://www.unicode.org/mail-arch/unicode-ml/y2016-m09/0026.html

Regards,
mpsuzuki

Guo Yunhe wrote:
> Hi, the fontconfig project is looking for a definition of all the basic emoji characters that an emoji font must have. Is it available from the Unicode standards?
From mark at macchiato.com Sun Oct 2 09:32:47 2016
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Sun, 2 Oct 2016 16:32:47 +0200
Subject: [Unicode] Minimum set of Emoji characters
In-Reply-To: <57EFE205.6080605@hiroshima-u.ac.jp>
References: <57EFE205.6080605@hiroshima-u.ac.jp>
Message-ID:

At this point, the original set of Japanese emoji has long since been surpassed. The recommendation is to support the set of emoji in the data files referenced by http://www.unicode.org/reports/tr51/. There is much more information about the various choices there.

Note that there is a proposed new version that will be discussed in early November, at http://www.unicode.org/reports/tr51/proposed.html, with additional emoji focused around gender support.

Mark

On Sat, Oct 1, 2016 at 6:19 PM, suzuki toshiya wrote:

> Dear Guo,
>
> Have you checked the thread from my post?
> http://www.unicode.org/mail-arch/unicode-ml/y2016-m09/0026.html
>
> Regards,
> mpsuzuki
>
> Guo Yunhe wrote:
> > Hi, the fontconfig project is looking for a definition of all the basic emoji characters that an emoji font must have. Is it available from the Unicode standards?

From doug at ewellic.org Mon Oct 3 12:14:48 2016
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 03 Oct 2016 10:14:48 -0700
Subject: Why incomplete subscript/superscript alphabet ?
Message-ID: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>

a.lukyanov wrote:

> I think that the right thing to do would be to create several new control/formatting characters, like this:
>
> "previous character is superscript"
> "previous character is subscript"
> "previous character is small caps (for use in phonetic transcription only)"
> "previous character is mathematical blackletter"
> etc.
>
> Then people will be able to apply these features to any character as long as their font supports it.
I happen to think this would be exactly the wrong thing to do, completely
contrary to the principles of plain text that Unicode was founded upon.
But you never know what might gain traction, so stay tuned.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From leoboiko at namakajiri.net Mon Oct 3 12:40:23 2016
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 3 Oct 2016 14:40:23 -0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID:

Besides, there are already control/formatting characters for such purposes;
several ones, even. They look like this: <sup></sup>, ^{}, \textsuperscript{},
\*{ \*} …

What's more, these powerful control/formatting characters allow one to apply
not only super/subscript and blackletter, but many more features to any
character as long as the font supports them, including bold, italics,
small-caps, optical size changes and countless others. I heartily recommend
using these special control/formatting characters, as they can considerably
*enrich* any text.

2016-10-03 14:14 GMT-03:00 Doug Ewell :

> a.lukyanov wrote:
>
> > I think that the right thing to do would be to create several new
> > control/formatting characters, like this:
> >
> > "previous character is superscript"
> > "previous character is subscript"
> > "previous character is small caps (for use in phonetic transcription
> > only)"
> > "previous character is mathematical blackletter"
> > etc
> >
> > Then people will be able to apply these features on any character as
> > long as their font supports it.
>
> I happen to think this would be exactly the wrong thing to do,
> completely contrary to the principles of plain text that Unicode was
> founded upon. But you never know what might gain traction, so stay
> tuned.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
--------------  next part --------------
An HTML attachment was scrubbed...
URL:

From jkorpela at cs.tut.fi Mon Oct 3 12:51:47 2016
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Mon, 3 Oct 2016 20:51:47 +0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID: <81691895-adde-70b9-3fe5-685a35e815f5@cs.tut.fi>

3.10.2016, 20:40, Leonardo Boiko wrote:

> Besides, there are already control/formatting characters for such
> purposes; several ones, even. They look like this: <sup></sup>, ^{},
> \textsuperscript{}, \*{ \*} …

They are not control or formatting characters. They are markup used at
higher protocol levels, in different markup systems.

Yucca

From steve at swales.us Mon Oct 3 12:59:41 2016
From: steve at swales.us (Steve Swales)
Date: Mon, 3 Oct 2016 10:59:41 -0700
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID:

> On Oct 3, 2016, at 10:14 AM, Doug Ewell wrote:
>
> a.lukyanov wrote:
>
>> I think that the right thing to do would be to create several new
>> control/formatting characters, like this:
>>
>> "previous character is superscript"
>> "previous character is subscript"
>> "previous character is small caps (for use in phonetic transcription
>> only)"
>> "previous character is mathematical blackletter"
>> etc
>>
>> Then people will be able to apply these features on any character as
>> long as their font supports it.
>
> I happen to think this would be exactly the wrong thing to do,
> completely contrary to the principles of plain text that Unicode was
> founded upon. But you never know what might gain traction, so stay
> tuned.
I guess I don't see how it is fundamentally different from other variant
selector uses within Unicode, and the ability to write properly formatted
mathematical and chemical formulas (for example) in a plain text environment
like text messaging seems like a fairly compelling use case.

-steve

From leoboiko at namakajiri.net Mon Oct 3 13:08:02 2016
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 3 Oct 2016 15:08:02 -0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <81691895-adde-70b9-3fe5-685a35e815f5@cs.tut.fi>
References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com> <81691895-adde-70b9-3fe5-685a35e815f5@cs.tut.fi>
Message-ID:

2016-10-03 14:51 GMT-03:00 Jukka K. Korpela :

> They are not control or formatting characters. They are markup used at
> higher protocol levels, in different markup systems.
>
> That's exactly the point, yes.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From neil at tonal.clara.co.uk Mon Oct 3 13:33:30 2016
From: neil at tonal.clara.co.uk (Neil Harris)
Date: Mon, 3 Oct 2016 19:33:30 +0100
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID: <95c8f288-72f9-1726-c0c9-219341ac64a1@tonal.clara.co.uk>

On 03/10/16 18:59, Steve Swales wrote:
>> On Oct 3, 2016, at 10:14 AM, Doug Ewell wrote:
>>
>> a.lukyanov wrote:
>>
>>> I think that the right thing to do would be to create several new
>>> control/formatting characters, like this:
>>>
>>> "previous character is superscript"
>>> "previous character is subscript"
>>> "previous character is small caps (for use in phonetic transcription
>>> only)"
>>> "previous character is mathematical blackletter"
>>> etc
>>>
>>> Then people will be able to apply these features on any character as
>>> long as their font supports it.
>> I happen to think this would be exactly the wrong thing to do,
>> completely contrary to the principles of plain text that Unicode was
>> founded upon. But you never know what might gain traction, so stay
>> tuned.
> I guess I don't see how it is fundamentally different from other variant selector uses within Unicode, and the ability to write properly formatted mathematical and chemical formulas (for example) in a plain text environment like text messaging seems like a fairly compelling use case.
>
> -steve
>
>
Yes, but there are already existing, well-standardized higher-level
protocols (HTML, MathML, TeX, etc.) that do exactly that. They should be
used instead, rather than trying to make Unicode something other than a
plain-text character encoding, contrary to its design principles.

Moreover, while what you describe seems superficially simple, as soon as
you try to expand it, you will find you end up with systems like this:

http://unicode.org/notes/tn28/UTN28-PlainTextMath.pdf

which are neither one thing nor the other, and which, in spite of being
proposed as a plain-text notation, actually end up being an ad-hoc
higher-level protocol anyway.

Neil

From gwalla at gmail.com Mon Oct 3 13:41:51 2016
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 3 Oct 2016 11:41:51 -0700
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <20161003101448.665a7a7059d7ee80bb4d670165c8327d.cfdeb41a21.wbe@email03.godaddy.com>
Message-ID:

On Mon, Oct 3, 2016 at 10:59 AM, Steve Swales wrote:

> > On Oct 3, 2016, at 10:14 AM, Doug Ewell wrote:
> >
> > a.lukyanov wrote:
> >
> >> I think that the right thing to do would be to create several new
> >> control/formatting characters, like this:
> >>
> >> "previous character is superscript"
> >> "previous character is subscript"
> >> "previous character is small caps (for use in phonetic transcription
> >> only)"
> >> "previous character is mathematical blackletter"
> >> etc
> >>
> >> Then people will be able to apply these features on any character as
> >> long as their font supports it.
> >
> > I happen to think this would be exactly the wrong thing to do,
> > completely contrary to the principles of plain text that Unicode was
> > founded upon. But you never know what might gain traction, so stay
> > tuned.
>
> I guess I don't see how it is fundamentally different from other variant
> selector uses within Unicode, and the ability to write properly formatted
> mathematical and chemical formulas (for example) in a plain text
> environment like text messaging seems like a fairly compelling use case.
>

That would not be sufficient for properly formatted mathematical formulas.
Exponentiation alone requires an indefinite number of levels of
superscripting, and that's not even getting into things like summation,
integrals, and even the division bar, which require complex two-dimensional
positioning.

I don't think chemical formulas need any characters that aren't already
encoded, though atomic symbols are properly formatted with superscripted
mass stacked on top of subscripted atomic number, and stacking is sometimes
used with polyatomic ions (but optional AIUI, so something like Hg₂²⁺ is
acceptable and understood). If you're referring to full structural
formulas, all bets are off: those are clearly 2-dimensional diagrams.
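[Editing note: Garth's digit-and-charge examples are easy to produce mechanically from the encoded subscript/superscript characters. A minimal Python sketch follows; it is illustrative only, since it naively maps every ASCII digit, which is only correct for simple molecular formulas, not full chemical markup.]

```python
# Translation tables from ASCII to the encoded subscript/superscript forms.
SUB = str.maketrans("0123456789",
                    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089")
SUP = str.maketrans("0123456789+-",
                    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
                    "\u207a\u207b")

print("H2SO4".translate(SUB))                        # H₂SO₄
print("[ClO4]".translate(SUB) + "-".translate(SUP))  # [ClO₄]⁻
```

This is exactly the kind of plain-text rendering the thread is debating: it works for H₂SO₄ and [ClO₄]⁻, but not for stacked mass/atomic-number notation.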
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From doug at ewellic.org Mon Oct 3 13:47:09 2016
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 03 Oct 2016 11:47:09 -0700
Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?=
Message-ID: <20161003114709.665a7a7059d7ee80bb4d670165c8327d.751e97706d.wbe@email03.godaddy.com>

Steve Swales wrote:

>> I happen to think this would be exactly the wrong thing to do,
>> completely contrary to the principles of plain text that Unicode was
>> founded upon. But you never know what might gain traction, so stay
>> tuned.
>
> I guess I don't see how it is fundamentally different from other
> variant selector uses within Unicode,

Good question. Other variation selectors -- I assume this means U+FE00
through U+FE0F, plus the Plane 14 variation selectors, plus the Mongolian
and ideographic selectors -- are defined and registered for use with
specific, individual base characters.

There are a lot of combinations defined for "text style" and "emoji
style," with more probably on the way, but even in this seemingly
open-ended field, variation selectors are valid only in defined
combinations.

The concept here was to invent combining characters for superscript,
subscript, blackletter, etc. that could be applied to any base character.
This is fundamentally different from "valid only in defined combinations."

> and the ability to write properly formatted mathematical and chemical
> formulas (for example) in a plain text environment like text messaging
> seems like a fairly compelling use case.

It certainly does. That's why UTC did the extensive research, way back in
the 2000 time frame, to determine what characters were appropriate in
mathematical contexts before encoding the Mathematical Alphanumeric
Symbols.
They came up with Latin letters for a wide variety of styles, and digits,
Greek letters, and a few others for a subset of those styles, that were
agreed to have special meaning in mathematical notation. They did not make
the set open-ended, as if arbitrary characters such as & or ? had similar
special meaning.

Basic chemical formulas like H₂SO₄ or [ClO₄]⁻ can be written in plain
Unicode text. At some point the line between basic and non-basic has to be
drawn, just as with arbitrarily stacked superscripts in math, and some sort
of fancy-text solution has to take over.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From asmusf at ix.netcom.com Mon Oct 3 15:47:09 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Mon, 3 Oct 2016 13:47:09 -0700
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161003114709.665a7a7059d7ee80bb4d670165c8327d.751e97706d.wbe@email03.godaddy.com>
References: <20161003114709.665a7a7059d7ee80bb4d670165c8327d.751e97706d.wbe@email03.godaddy.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From doug at ewellic.org Mon Oct 3 16:43:04 2016
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 03 Oct 2016 14:43:04 -0700
Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?=
Message-ID: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com>

Asmus Freytag (c) wrote:

> As a result, you can write basic formulas for select compounds, but
> not all. Given that these basic formulae don't need full 2-D layout,
> this still seems like an arbitrary restriction.

Adding a carefully selected group of styled characters to the original,
carefully selected set seems perfectly reasonable, and is how Unicode has
worked for around 25 years. Is your suggestion to do that, or to throw the
doors wide open?
--
Doug Ewell | Thornton, CO, US | ewellic.org

From samjnaa at gmail.com Tue Oct 4 03:13:57 2016
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Tue, 4 Oct 2016 13:43:57 +0530
Subject: Android character picker
In-Reply-To: References: Message-ID:

Hello. Kindly advise on what is the most comprehensive and up to date
Unicode character picker for Android available. Am not able to find a good
one. Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charupdate at orange.fr Tue Oct 4 05:35:53 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Tue, 4 Oct 2016 12:35:53 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com>
Message-ID: <861342229.4994.1475577353789.JavaMail.www@wwinf1n25>

On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote:
> On 10/3/2016 11:47 AM, Doug Ewell wrote:
> > Basic chemical formulas like H₂SO₄ or [ClO₄]⁻ can be written in
> > plain Unicode text. At some point the line between basic and non-basic
> > has to be drawn, just as with arbitrarily stacked superscripts in math,
> > and some sort of fancy-text solution has to take over.
>
> UTC determined many years ago, in response to a proposal, that alpha, beta
> and gamma, common in organic chemistry, were not acceptable for encoding
> as super/subscripts.
>
> At the time, this was requested to support plain text databases used for
> regulatory purposes, where these were required as super or subscripts.
>
> Later, the beta and gamma were encoded for phonetic notation, but not the
> alpha.
>
> As a result, you can write basic formulas for select compounds, but not all.
> Given that these basic formulae don't need full 2-D layout, this still seems
> like an arbitrary restriction.
When it's about informatics, arbitrary restrictions are precisely what gets
me upset. Those limitations are, as I wrote the other day, a useless
worsening of the usability and usefulness of a product.

On Mon, 03 Oct 2016 14:43:04 -0700, Doug Ewell replied:
> Asmus Freytag (c) wrote:
>
> > As a result, you can write basic formulas for select compounds, but
> > not all. Given that these basic formulae don't need full 2-D layout,
> > this still seems like an arbitrary restriction.
>
> Adding a carefully selected group of styled characters to the original,
> carefully selected set seems perfectly reasonable, and is how Unicode
> has worked for around 25 years. Is your suggestion to do that, or to
> throw the doors wide open?

I guess there is no need to throw any door open, and I'm sure that no
suggestion to do so is included here. After the great many options that
have been discussed, it now comes down to encoding no more than one or,
say, a handful more superscripts and subscripts, to enable people to
achieve a great deal of database architecture.

Marcel

From c933103 at gmail.com Tue Oct 4 06:50:05 2016
From: c933103 at gmail.com (gfb hjjhjh)
Date: Tue, 4 Oct 2016 19:50:05 +0800
Subject: What happened to Unicode CLDR's site?
Message-ID:

Why is the site suspended by Google and how to access it now?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From daniel.buenzli at erratique.ch Tue Oct 4 06:57:19 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Tue, 4 Oct 2016 13:57:19 +0200
Subject: What happened to Unicode CLDR's site?
In-Reply-To: References: Message-ID:

On Tuesday 4 October 2016 at 13:50, gfb hjjhjh wrote:
> Why is the site suspended by Google and how to access it now?

FWIW I reported the issue today using the website's reporting form. So I
guess the answer is wait.
Best,

Daniel

From liste at secarica.ro Tue Oct 4 07:51:24 2016
From: liste at secarica.ro (Cristian =?UTF-8?B?U2VjYXLEgw==?=)
Date: Tue, 4 Oct 2016 15:51:24 +0300
Subject: What happened to Unicode CLDR's site?
In-Reply-To: References: Message-ID: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>

On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:

> Why is the site suspended by Google and how to access it now?

Just curious: Unicode = Google ? (physically)

I am asking this because by entering directly http://cldr.unicode.org
the error result belongs to Google and not to unicode.org.

Cristi

--
Cristian Secară
http://www.secărică.ro

From marc.blanchet at viagenie.ca Tue Oct 4 08:04:02 2016
From: marc.blanchet at viagenie.ca (Marc Blanchet)
Date: Tue, 04 Oct 2016 09:04:02 -0400
Subject: What happened to Unicode CLDR's site?
In-Reply-To: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
Message-ID:

On 4 Oct 2016, at 8:51, Cristian Secară wrote:

> On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:
>
>> Why is the site suspended by Google and how to access it now?
>
> Just curious: Unicode = Google ? (physically)

well, does not look like Google to me… but see below

//////////////
dig unicode.org NS
;; ANSWER SECTION:
unicode.org. 86400 IN NS nserver.euro.apple.com.
unicode.org. 86400 IN NS nserver2.apple.com.
unicode.org. 86400 IN NS nserver3.apple.com.
unicode.org. 86400 IN NS nserver.apple.com.
unicode.org. 86400 IN NS nserver.asia.apple.com.
unicode.org. 86400 IN NS nserver4.apple.com.
///////
dig unicode.org A
;; ANSWER SECTION:
unicode.org. 2757 IN A 216.97.88.9

whois 216.97.88.9
NetRange: 216.97.0.0 - 216.97.127.255
CIDR: 216.97.0.0/17
NetName: CORESPACE-4
NetHandle: NET-216-97-0-0-1
Parent: NET216 (NET-216-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS54489
Organization: CoreSpace, Inc.
(CORES-27)
RegDate: 2000-08-23
Updated: 2013-02-21
Ref: https://whois.arin.net/rest/net/NET-216-97-0-0-1

OrgName: CoreSpace, Inc.
OrgId: CORES-27
Address: 7505 John W. Carpenter Freeway
City: Dallas
StateProv: TX
PostalCode: 75247
Country: US
RegDate: 2009-08-10
Updated: 2012-04-30
Ref: https://whois.arin.net/rest/org/CORES-27
//////////

BUT:
dig cldr.unicode.org A
;; ANSWER SECTION:
cldr.unicode.org. 37687 IN CNAME ghs.google.com.
ghs.google.com. 86400 IN CNAME ghs.l.google.com.
ghs.l.google.com. 230 IN A 173.194.208.121

so cldr seems to be hosted by Google.

Marc.

>
> I am asking this because by entering directly http://cldr.unicode.org
> the error result belongs to Google and not to unicode.org.
>
> Cristi
>
> --
> Cristian Secară
> http://www.secărică.ro

From srl at icu-project.org Tue Oct 4 08:53:06 2016
From: srl at icu-project.org (Steven R. Loomis)
Date: Tue, 4 Oct 2016 06:53:06 -0700
Subject: What happened to Unicode CLDR's site?
In-Reply-To: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
Message-ID: <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>

Yes, the web content is hosted by Google Sites, a web hosting provider.

As to it being down, I understand this is being looked into.

Sent from our iPhone.

> On Oct 4, 2016, at 5:51 AM, Cristian Secară wrote:
>
> On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:
>
>> Why is the site suspended by Google and how to access it now?
>
> Just curious: Unicode = Google ? (physically)
>
> I am asking this because by entering directly http://cldr.unicode.org
> the error result belongs to Google and not to unicode.org.
>
> Cristi
>
> --
> Cristian Secară
> http://www.secărică.ro
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Tue Oct 4 11:00:18 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 4 Oct 2016 18:00:18 +0200
Subject: What happened to Unicode CLDR's site?
In-Reply-To: <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>
References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
 <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>
Message-ID:

It looks like an automated bot run by Google detected an excessive use of
bandwidth and launched the block, waiting for another subscription or
payment, even if the site was (possibly) donated by Google itself. That bot
probably does not know what it does and treats it like any other hosted
site. (Google's own usage policy is probably more enforced now: you can
host free websites, but above some threshold they will be blocked.)

Note also that it is the web hosting which is blocked, not the domain name
(hosted by Apple, who probably offered it to the Consortium).

There has probably been a lack of communication somewhere in Google, or an
administrator error that removed an exception for a site that should first
have been handled specially by a human hierarchy inside Google.

If the usage limit was exhausted, maybe this is because the site was
harvested by some malware, and I think it's reasonable to block it first
(before scanning, cleaning, restoring damaged parts from a safe backup, and
investigating which protection measures were missing or should be taken).

There are certainly people looking into what happened precisely. I hope
this is just an administrative measure that can easily be reversed and that
no damage happened to CLDR data (or to private data there about CLDR
surveyors or user authentication databases). I don't think there's damage
to the released CLDR data, but there could be losses in some recent ongoing
work.

2016-10-04 15:53 GMT+02:00 Steven R. Loomis :

> Yes, the web content is hosted by Google Sites, a web hosting provider.
>
> As to it being down, I understand this is being looked into.
>
> Sent from our iPhone.
>
> On Oct 4, 2016, at 5:51 AM, Cristian Secară wrote:
>
> On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:
>
> Why is the site suspended by Google and how to access it now?
>
>
> Just curious: Unicode = Google ? (physically)
>
> I am asking this because by entering directly http://cldr.unicode.org
> the error result belongs to Google and not to unicode.org.
>
> Cristi
>
> --
> Cristian Secară
> http://www.secărică.ro
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From doug at ewellic.org Tue Oct 4 11:25:36 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 04 Oct 2016 09:25:36 -0700
Subject: What happened to Unicode CLDR's =?UTF-8?Q?site=3F?=
Message-ID: <20161004092536.665a7a7059d7ee80bb4d670165c8327d.e4fa88a6b9.wbe@email03.godaddy.com>

It seems to be back up as of 16:23 UTC.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From srl at icu-project.org Tue Oct 4 11:56:53 2016
From: srl at icu-project.org (Steven R. Loomis)
Date: Tue, 04 Oct 2016 09:56:53 -0700
Subject: What happened to Unicode CLDR's site?
In-Reply-To: <20161004092536.665a7a7059d7ee80bb4d670165c8327d.e4fa88a6b9.wbe@email03.godaddy.com>
References: <20161004092536.665a7a7059d7ee80bb4d670165c8327d.e4fa88a6b9.wbe@email03.godaddy.com>
Message-ID: <1E394F3C-60B4-431D-8011-BB9B7B9033EF@icu-project.org>

Depending on DNS propagation, you may see minor glitches today. But the
content should all be back up.

-s

On [DATE], "[NAME]" <[ADDRESS]> wrote:

>It seems to be back up as of 16:23 UTC.
>
>--
>Doug Ewell | Thornton, CO, US | ewellic.org

From leoboiko at namakajiri.net Tue Oct 4 12:25:56 2016
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Tue, 4 Oct 2016 14:25:56 -0300
Subject: What happened to Unicode CLDR's site?
In-Reply-To: References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
 <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>
Message-ID:

The Google error message felt a bit too harsh for a web hosting client who
merely exceeded their allotted bandwidth. It made it sound like the website
was hosting something illegal.

2016-10-04 13:00 GMT-03:00 Philippe Verdy :

> It looks like an automated bot run by Google detected an excessive use of
> bandwidth and launched the block, waiting for another subscription or payment,
> even if the site was (possibly) donated by Google itself. That bot probably
> does not know what it does and treats it like any other hosted site. (Google's
> own usage policy is probably more enforced now: you can host free websites,
> but above some threshold they will be blocked.)
>
> Note also that it is the web hosting which is blocked, not the domain
> name (hosted by Apple, who probably offered it to the Consortium).
>
> There has probably been a lack of communication somewhere in Google, or an
> administrator error that removed an exception for a site that should first
> have been handled specially by a human hierarchy inside Google.
>
> If the usage limit was exhausted, maybe this is because the site was
> harvested by some malware, and I think it's reasonable to block it first
> (before scanning, cleaning, restoring damaged parts from a safe backup, and
> investigating which protection measures were missing or should be taken).
>
> There are certainly people looking into what happened precisely. I hope this
> is just an administrative measure that can easily be reversed and that no
> damage happened to CLDR data (or to private data there about CLDR surveyors
> or user authentication databases). I don't think there's damage to the
> released CLDR data, but there could be losses in some recent ongoing work.
>
> 2016-10-04 15:53 GMT+02:00 Steven R. Loomis :
>
>> Yes, the web content is hosted by Google Sites, a web hosting provider.
>>
>> As to it being down, I understand this is being looked into.
>>
>> Sent from our iPhone.
>>
>> On Oct 4, 2016, at 5:51 AM, Cristian Secară wrote:
>>
>> On Tue, 4 Oct 2016 19:50:05 +0800, gfb hjjhjh wrote:
>>
>> Why is the site suspended by Google and how to access it now?
>>
>>
>> Just curious: Unicode = Google ? (physically)
>>
>> I am asking this because by entering directly http://cldr.unicode.org
>> the error result belongs to Google and not to unicode.org.
>>
>> Cristi
>>
>> --
>> Cristian Secară
>> http://www.secărică.ro
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Tue Oct 4 12:59:04 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 4 Oct 2016 19:59:04 +0200
Subject: What happened to Unicode CLDR's site?
In-Reply-To: References: <20161004155124.54513e4d0d5f5a50f1e70f23@secarica.ro>
 <1A74E2DA-F27E-4695-A963-6F164B1A4D1E@icu-project.org>
Message-ID:

2016-10-04 19:25 GMT+02:00 Leonardo Boiko :

> The Google error message felt a bit too harsh for a web hosting client who
> merely exceeded their allotted bandwidth. It made it sound like the
> website was hosting something illegal.
>

It's not impossible that the site was hacked a bit somewhere and used by a
third party to host illegal content, or that some malware caused it to
generate a spike of bandwidth. Stopping the website temporarily is a safe
measure before admins can explain what is causing this unexpected excess,
and until some cleanup operations are eventually performed and some
additional security measures taken (Google itself cannot do that cleanup
without an active action by the site maintainer). However I agree that the
automatic message used by Google's blocker was very harsh.
Google can detect malware running on hosted sites and could be more
informative about the cause:
- blocked because of a security issue (without explaining more to the
public; could be a DDoS damaging the operations on other hosted websites,
or hacked content...).
- blocked until the site admins solve technical problems.
- blocked temporarily because of excess bandwidth (but no security issue
detected), without saying publicly whether this is because of failed
payments (that is private communication between the host provider and the
web service).
- blocked temporarily due to a technical problem on the hosting platform
itself.
- blocked indefinitely due to a legal constraint (such as a court order,
which may force the publication of a legal notice on a static page).

And it should provide a better contact channel for site admins, or explain
what visitors can do (if malware was hosted on the site, what they should
do themselves on their own devices).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From liste at secarica.ro Tue Oct 4 14:31:41 2016
From: liste at secarica.ro (Cristian =?UTF-8?B?U2VjYXLEgw==?=)
Date: Tue, 4 Oct 2016 22:31:41 +0300
Subject: Android character picker
In-Reply-To: References: Message-ID: <20161004223141.b05b7afd011e3875052b7f76@secarica.ro>

On Tue, 4 Oct 2016 13:43:57 +0530, Shriramana Sharma wrote:

> Kindly advise on what is the most comprehensive and up to date Unicode
> character picker for Android available.
> Am not able to find a good one.

You didn't mention which application (if any) you already tried, and what
"a good one" means by comparison.

A search for "charmap" on Google Play gives at least two results. I tried
(superficially) one of them, with which I was able to pick a group of
characters with no problem.

Cristi

--
Cristian Secară
http://www.sec?ric?.ro From duerst at it.aoyama.ac.jp Wed Oct 5 00:27:44 2016 From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J._D=c3=bcrst?=) Date: Wed, 5 Oct 2016 14:27:44 +0900 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> Message-ID: <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> On 2016/10/04 19:35, Marcel Schneider wrote: > On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote: >> Later, the beta and gamma were encoded for phonetic notation, but not the >> alpha. >> >> As a result, you can write basic formulas for select compounds, but not all. >> Given that these basic formulae don't need full 2-D layout, this still seems >> like an arbitrary restriction. > > When it?s about informatics, arbitrary restrictions are precisely what gets me > upset. Those limitations are?as I wrote the other day?a useless worsening > of the usability and usefulness of a product. This kind of "let's avoid arbitrary limitations" argument works very well for subjects that are theoretical, straightforward, and rigid in nature. Many (but not all) subjects in computer science (informatics) are indeed of such a nature. The Unicode Consortium (or more specifically, the UTC) does a lot of hard work to create theories where appropriate, and to explain them where possible. But they recognize (and we should do so, too) that in the end, writing is a *cultural* phenomenon, where straightforward, rigid theories have severe limitations. From a certain viewpoint (the chemist's in the example above), the result may look arbitrary, but from another viewpoint (the phoneticist's), it looks perfectly fine. At first, it looks like it would be easy to fix such problems, but each fix risks to introduce new arbitrariness when seen from somebody else's viewpoint. 
Getting upset won't help.

Regards, Martin.

From charupdate at orange.fr Wed Oct 5 08:57:48 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 5 Oct 2016 15:57:48 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com>
 <861342229.4994.1475577353789.JavaMail.www@wwinf1n25>
 <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp>
Message-ID: <283719302.9783.1475675868120.JavaMail.www@wwinf1f05>

On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote:
> On 2016/10/04 19:35, Marcel Schneider wrote:
>> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote:
>>
>>> Later, the beta and gamma were encoded for phonetic notation, but not the
>>> alpha.
>>>
>>> As a result, you can write basic formulas for select compounds, but not all.
>>> Given that these basic formulae don't need full 2-D layout, this still seems
>>> like an arbitrary restriction.
>>
>> When it's about informatics, arbitrary restrictions are precisely what gets me
>> upset. Those limitations are, as I wrote the other day, a useless worsening
>> of the usability and usefulness of a product.
>
> This kind of "let's avoid arbitrary limitations" argument works very
> well for subjects that are theoretical, straightforward, and rigid in
> nature. Many (but not all) subjects in computer science (informatics)
> are indeed of such a nature.
>
> The Unicode Consortium (or more specifically, the UTC) does a lot of
> hard work to create theories where appropriate, and to explain them
> where possible. But they recognize (and we should do so, too) that in
> the end, writing is a *cultural* phenomenon, where straightforward,
> rigid theories have severe limitations.
> > From a certain viewpoint (the chemist's in the example above), the > result may look arbitrary, but from another viewpoint (the > phoneticist's), it looks perfectly fine. At first, it looks like it > would be easy to fix such problems, but each fix risks to introduce new > arbitrariness when seen from somebody else's viewpoint. Getting upset > won't help. I’ve got the point, thanks. Phonetics need to write running text that is immediately legible, while a chemistry database may use particular notational conventions that work with baseline letters to be parsed on semantics or light markup for proper display in the UI. The UTC decision thus questioned the design principle of using plain text for chemical formulae. No doubt it was understood that validating this choice would have opened the door to encoding more special characters for upgrading or similar purposes. At this point I’d like to mention what I thought about since this thread was launched. The French language makes extensive use of superscripts to note abbreviations. This is not a mere styling issue, as it is in English. E.g. without superscripts, the abbreviation “nos” [numbers] is ambiguated with the pronoun “nos” [our]. The most that can be easily disambiguated is “n°” [number] with the degree sign available on the common French keyboard layout. For the anecdote: When a technician led me to discover the field “no centre mess” in the UI of my cellphone, it took me several seconds to understand “number of SMS center/centre” which is the actual meaning; but here, some additional confusion resulted from the interlanguage homograph “no”. Written words being ambiguated with one another is a common phenomenon in natural languages. Performing disambiguation is widely achieved by adding vowel signs (Hebrew) or diacritics (Latin script using languages). 
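Marcel's “nos”/“n°” example can be checked mechanically. Below is a minimal Python sketch (the assumption that search tools apply NFKC-style compatibility folding is mine, suggested by the later remark in this thread about Google's equivalence classes): preformatted superscripts disambiguate the abbreviation visually, yet still fold back to plain letters for matching.

```python
import unicodedata

def search_fold(s: str) -> str:
    # Compatibility folding, as a search engine might apply it (assumption).
    return unicodedata.normalize("NFKC", s)

abbrev = "n\u1D52\u02E2"   # "nos" written with MODIFIER LETTER SMALL O and S
pronoun = "nos"            # the French possessive pronoun, plain ASCII

print(abbrev)                          # visually distinct from the pronoun
print(search_fold(abbrev) == pronoun)  # True: both fold to "nos" for matching
```

So the superscripted form stays unambiguous to the eye without becoming invisible to folded search.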
French was disfavored in computer practice (applied informatics) during a certain time when diacritics were unavailable—on uppercase letters longer than on lowercase. AFAIK, Latin letters like “œ” and “ĳ” first gained binary existence thanks to the ISO 6937 charset, while a Dutch standards author asked his compatriots to always write “ij” with two ASCII letters, and two Frenchmen prevented the “œ” from being encoded in Latin-1 at the intended code points because of its non-existence in computer printers. But today, thanks to Unicode, that’s all over. Therefore I suggest to grant the French language full support by enabling superscript lowercase letters in order that the SUPERSCRIPT dead key that the French Standards body recommends will work for all abbreviations. There is no point about other letters than the basic alphabet superscripted, as no French abbreviation exceeds this range (despite what I believed in 2014, like many other people). Additionally I’m proposing a modifier key combination (using a new modifier key on the 105th key on ISO keyboards) to access the lowercase superscripts on live keys: Shift + Num + [letter key] → [superscript lowercase]. I can easily type “on the 105ᵗʰ key”, and so will all users in France, at least with the dead key. The missing letter is superscript q == MODIFIER LETTER SMALL Q. Actually, when Shift + Num + Q is pressed on the projects, → “q_n’existe_pas” [superscript “q” does not exist] is inserted. Karl Pentzlin had the merit of proposing the missing letter superscript q for use in French abbreviations, but the UTC must have refused by arguing from English usage and from French recommendations. These are now changing. More, as I tried to demonstrate above, one cannot always rely on such low-profile recommendations, which express more the humility and undemandingness of their author than the real practical needs and linguistic requirements. 
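The dead-key behaviour Marcel describes can be sketched in a few lines. This is a hypothetical emulation, not any shipped layout: the table covers the 25 basic letters that had preformatted superscripts as of Unicode 9.0 (i and n via the legacy SUPERSCRIPT characters, the rest via MODIFIER LETTERs), and "q", which had no such character at the time of this thread, falls back to a marker string like the one quoted above.

```python
# Hypothetical emulation of the superscript dead key described above.
BASE = "abcdefghijklmnoprstuvwxyz"  # 25 letters: "q" deliberately absent
SUPS = ("\u1D43\u1D47\u1D9C\u1D48\u1D49\u1DA0\u1D4D\u02B0\u2071\u02B2"
        "\u1D4F\u02E1\u1D50\u207F\u1D52\u1D56\u02B3\u02E2\u1D57\u1D58"
        "\u1D5B\u02B7\u02E3\u02B8\u1DBB")
DEAD_KEY = dict(zip(BASE, SUPS))

def superscript(text: str, q_marker: str = "[q_n'existe_pas]") -> str:
    """Superscript what the character set allows; flag the unencodable q."""
    return "".join(q_marker if c == "q" else DEAD_KEY.get(c, c) for c in text)

print(superscript("3e"))   # 3ᵉ
print(superscript("nos"))  # ⁿᵒˢ
```

Typing "que" through this table reproduces exactly the mixed output Marcel complains about further down: the marker for q, then superscript u and e.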
As for searchability, Google even have the mathematical alphabets in their equivalence classes, so that any request written e.g. in double-struck letters is read as if it were entered in plain ASCII. Best regards, Marcel From moyogo at gmail.com Wed Oct 5 09:17:30 2016 From: moyogo at gmail.com (Denis Jacquerye) Date: Wed, 05 Oct 2016 14:17:30 +0000 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: > There is no point about other letters than the basic alphabet superscripted, > as no French abbreviation exceeds this range (despite of what I believed > in 2014, like many other people). What does that mean? How would that help for the French vernacular 3ème, or the Spanish C.ía. You might find there are many more uses than you think. Higher level protocols can already support these. Maybe what we need is better and more general higher level protocol support. On Wed, 5 Oct 2016 at 15:01 Marcel Schneider wrote: > On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote: > > On 2016/10/04 19:35, Marcel Schneider wrote: > >> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote: > >> > >>> Later, the beta and gamma were encoded for phonetic notation, but not > the > >>> alpha. > >>> > >>> As a result, you can write basic formulas for select compounds, but > not all. > >>> Given that these basic formulae don't need full 2-D layout, this still > seems > >>> like an arbitrary restriction. > >> > >> When it’s about informatics, arbitrary restrictions are precisely what > gets me > >> upset. Those limitations are—as I wrote the other day—a useless > worsening > >> of the usability and usefulness of a product. 
> > > > This kind of "let's avoid arbitrary limitations" argument works very > > well for subjects that are theoretical, straightforward, and rigid in > > nature. Many (but not all) subjects in computer science (informatics) > > are indeed of such a nature. > > > > The Unicode Consortium (or more specifically, the UTC) does a lot of > > hard work to create theories where appropriate, and to explain them > > where possible. But they recognize (and we should do so, too) that in > > the end, writing is a *cultural* phenomenon, where straightforward, > > rigid theories have severe limitations. > > > > From a certain viewpoint (the chemist's in the example above), the > > result may look arbitrary, but from another viewpoint (the > > phoneticist's), it looks perfectly fine. At first, it looks like it > > would be easy to fix such problems, but each fix risks to introduce new > > arbitrariness when seen from somebody else's viewpoint. Getting upset > > won't help. > > I’ve got the point, thanks. Phonetics need to write running text that is > immediately legible, while a chemistry database may use particular > notational > conventions that work with baseline letters to be parsed on semantics or > light > markup for proper display in the UI. The UTC decision thus questioned the > design > principle of using plain text for chemical formulae. No doubt it was > understood > that validating this choice would have opened the door to encoding more > special > characters for upgrading or similar purposes. > > At this point I’d like to mention what I thought about since this thread > was launched. The French language makes extensive use of superscripts > to note abbreviations. This is not a mere styling issue, as it is in > English. > E.g. without superscripts, the abbreviation “nos” [numbers] is ambiguated > with > the pronoun “nos” [our]. The most that can be easily disambiguated is “n°” > [number] > with the degree sign available on the common French keyboard layout. 
> For the anecdote: When a technician led me to discover the field > “no centre mess” in the UI of my cellphone, it took me several seconds to understand > “number of SMS center/centre” which is the actual meaning; but here, some > additional > confusion resulted from the interlanguage homograph “no”. > > Written words being ambiguated with one another is a common phenomenon in > natural languages. Performing disambiguation is widely achieved by adding > vowel signs (Hebrew) or diacritics (Latin script using languages). > French was disfavored in computer practice (applied informatics) during a > certain time when diacritics were unavailable—on uppercase letters longer > than on lowercase. > AFAIK, Latin letters like “œ” and “ĳ” first gained binary existence thanks > to the ISO 6937 charset, while a Dutch standards author asked his > compatriots > to always write “ij” with two ASCII letters, and two Frenchmen prevented > the “œ” > from being encoded in Latin-1 at the intended code points because of its > non-existence in computer printers. > > But today, thanks to Unicode, that’s all over. Therefore I suggest to grant > the French language full support by enabling superscript lowercase letters > in order that the SUPERSCRIPT deadkey that the French Standards body > recommends, > will work for all abreviations. There is no point about other letters than > the basic > alphabet superscripted, as no French abbreviation exceeds this range > (despite of > what I believed in 2014, like many other people). > Additionally I’m proposing a modifier key combination (using a new > modifier key on > the 105th key on ISO keyboards) to access the lowercase superscripts on > live keys: > Shift + Num + [letter key] → [superscript lowercase]. > I can easily type “on the 105ᵗʰ key”, and so will all users in France, at > least > with the dead key. > > The missing letter is superscript q == MODIFIER LETTER SMALL Q. > Actually, when Shift + Num + Q is pressed on the projects, > → 
“q_n’existe_pas” [superscript “q” does not exist] is inserted. > > Karl Pentzlin had the merit of proposing the missing letter superscript q > for use in French abbreviations, but the UTC must have refused by arguing > from English usage and from French recommendations. These are now changing. > More, as I tried to demonstrate above, one cannot always rely on such > low-profile recommendations, which express more the humility and > undemandingness > of their author, than the real practical needs and linguistical > requirements. > > As of searchability, Google have even the mathematical alphabets in their > equivalence classes, so that any request written e.g. in doublestruck > letters > is read as if it were entered in plain ASCII. > > Best regards, > > Marcel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Schappo at lboro.ac.uk Wed Oct 5 09:37:23 2016 From: A.Schappo at lboro.ac.uk (Andre Schappo) Date: Wed, 5 Oct 2016 14:37:23 +0000 Subject: My Annual Unicode Questions Message-ID: <278CED01-10B0-4E02-A452-147A8F08D919@lboro.ac.uk> This week is the first week of the new academic year at my university. One of the modules I co-teach is entitled "Programming for the WWW" which encompasses JavaScript and DHTML. This is a first year module. There were approx 70 students in the lab practical this morning. I asked them my annual questions. Q. Who has heard of Unicode? A. Approx 20% of the class raised their hands. (Same as last year http://www.unicode.org/mail-arch/unicode-ml/y2015-m12/0073.html) Q. Who understands Unicode? A. One student raised his hand. (This is an improvement on last year as no hand was raised last year) André Schappo From martinmueller at northwestern.edu Wed Oct 5 01:35:52 2016 From: martinmueller at northwestern.edu (Martin Mueller) Date: Wed, 5 Oct 2016 06:35:52 +0000 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> Message-ID: <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> There is always a lot more history than reason in the world. That said, given that alphabets have fixed numbers, it’s weird that bits of super and subscripted letters appear in this or that limited range but that you can’t cobble a whole alphabet together in a consistent manner. If any, why not all, especially if there are only two or three dozen. On 10/4/16, 11:27 PM, "Unicode on behalf of Martin J. Dürst" wrote: On 2016/10/04 19:35, Marcel Schneider wrote: > On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote: >> Later, the beta and gamma were encoded for phonetic notation, but not the >> alpha. >> >> As a result, you can write basic formulas for select compounds, but not all. >> Given that these basic formulae don't need full 2-D layout, this still seems >> like an arbitrary restriction. > > When it’s about informatics, arbitrary restrictions are precisely what gets me > upset. Those limitations are—as I wrote the other day—a useless worsening > of the usability and usefulness of a product. This kind of "let's avoid arbitrary limitations" argument works very well for subjects that are theoretical, straightforward, and rigid in nature. Many (but not all) subjects in computer science (informatics) are indeed of such a nature. The Unicode Consortium (or more specifically, the UTC) does a lot of hard work to create theories where appropriate, and to explain them where possible. But they recognize (and we should do so, too) that in the end, writing is a *cultural* phenomenon, where straightforward, rigid theories have severe limitations. 
From a certain viewpoint (the chemist's in the example above), the result may look arbitrary, but from another viewpoint (the phoneticist's), it looks perfectly fine. At first, it looks like it would be easy to fix such problems, but each fix risks to introduce new arbitrariness when seen from somebody else's viewpoint. Getting upset won't help. Regards, Martin. From charupdate at orange.fr Wed Oct 5 10:04:05 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 5 Oct 2016 17:04:05 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: <1301337346.11235.1475679845658.JavaMail.www@wwinf1f05> On Wed, 05 Oct 2016 14:17:30 +0000, Denis Jacquerye wrote: >> There is no point about other letters than the basic alphabet superscripted, >> as no French abbreviation exceeds this range (despite of what I believed >> in 2014, like many other people). > > What does that mean? How would that help for the French vernacular > 3ème, or the Spanish C.ía. You might find > there are many more uses than you think. Higher level protocols can already > support these. > Maybe what we need is better and more general higher level protocol support. I agree with most points. — > better and more general higher level protocol support. Perhaps starting with Word not cancelling superscripting as soon as a character style is applied. — > Higher level protocols can already support these. They can even support the copyleft symbol by turning the copyright sign, as the proposer of the former indicated, with CSS (one example: [1]). — > the Spanish C.ía. You might find > there are many more uses than you think. 
Spanish and many other languages are different in that they use punctuation to note abbreviations, while in French, even the dot is prohibited in this use case. Spanish “C.ía” is intelligible even without superscripting. Having said that… maybe there remain some cases that are not covered with superscripted basic letters while they are prone to confuse people, OK. — > How would that help for the French vernacular 3ème It doesn’t, but as I wrote in parentheses (unfortunately without quoting any example of an ordinal number), this corresponds to « what I believed in 2014, like many other people ». Kind regards, Marcel [1]: http://dispoclavier.com/#h448 [last line before table caption] From kenwhistler at att.net Wed Oct 5 10:09:33 2016 From: kenwhistler at att.net (Ken Whistler) Date: Wed, 5 Oct 2016 08:09:33 -0700 Subject: My Annual Unicode Questions In-Reply-To: <278CED01-10B0-4E02-A452-147A8F08D919@lboro.ac.uk> References: <278CED01-10B0-4E02-A452-147A8F08D919@lboro.ac.uk> Message-ID: <68d6d4fd-5de9-a904-caf9-22571a869918@att.net> On 10/5/2016 7:37 AM, Andre Schappo wrote: > Q. Who understands Unicode? > A. One student raised his hand. (This is an improvement on last year as no hand was raised last year) A brave soul, indeed! After 27 years of Unicode development, and with the standard (and its accumulated ancillary standards, data, repositories, and libraries) grown so huge, it is no longer clear to me how many participants in a *UTC* meeting would raise their hands in response to that question! --Ken From doug at ewellic.org Wed Oct 5 10:20:26 2016 From: doug at ewellic.org (Doug Ewell) Date: Wed, 05 Oct 2016 08:20:26 -0700 Subject: My Annual Unicode Questions Message-ID: <20161005082026.665a7a7059d7ee80bb4d670165c8327d.e452b35eae.wbe@email03.godaddy.com> Ken Whistler wrote: >> Q. Who understands Unicode? >> A. One student raised his hand. 
(This is an improvement on last year >> as no hand was raised last year) > > After 27 years of Unicode development, and with the standard (and its > accumulated ancillary standards, data, repositories, and libraries) > grown so huge, it is no longer clear to me how many participants in a > *UTC* meeting would raise their hands in response to that question! The bar for "understands" is lower among non-experts. I would actually be considered the resident "Unicode expert" among my co-workers, which might surprise some folks and alarm others. -- Doug Ewell | Thornton, CO, US | ewellic.org From verdy_p at wanadoo.fr Wed Oct 5 10:34:02 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Wed, 5 Oct 2016 17:34:02 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: 2016-10-05 16:17 GMT+02:00 Denis Jacquerye : > > There is no point about other letters than the basic alphabet > superscripted, > > as no French abbreviation exceeds this range (despite of what I believed > > in 2014, like many other people). > > What does that mean? How would that help for the French vernacular > 3ème, or the Spanish C.ía. You might find > there are many more uses than you think. Higher level protocols can already > support these. > Maybe what we need is better and more general higher level protocol > support. > I agree, French allows abbreviating many words by appending the last new letters in superscripts. 3e is recommended but 3ème is still very frequent. As well you'll see abbreviations using é (a frequent termination for past participles, generally used with the previous consonant and possibly followed with the feminine/plural final letters, all in superscript). 
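The superscript é that Philippe mentions also illustrates the normalization subtlety raised elsewhere in this thread: there is no single code point for a superscript e with acute, so it can only be written as the modifier letter plus a combining accent. A quick sketch (rendering quality of the combination is font-dependent):

```python
import unicodedata

# Superscript "é": MODIFIER LETTER SMALL E + COMBINING ACUTE ACCENT
sup_e_acute = "\u1D49\u0301"

nfc  = unicodedata.normalize("NFC",  sup_e_acute)
nfkc = unicodedata.normalize("NFKC", sup_e_acute)

print(nfc == sup_e_acute)   # True: no precomposed form exists, NFC keeps the pair
print(nfkc == "\u00E9")     # True: compatibility folding collapses it to plain "é"
```

Note the asymmetry: canonical normalization preserves the superscript styling, while compatibility folding silently turns it into an ordinary accented baseline letter.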
Almost nobody use the preencoded superscript letters for this (notably not for "1er", or its recommended feminine form "1re", still frequently written "1ère") -------------- next part -------------- An HTML attachment was scrubbed... URL: From charupdate at orange.fr Wed Oct 5 10:44:38 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 5 Oct 2016 17:44:38 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> Message-ID: <996912232.12078.1475682278718.JavaMail.www@wwinf1f05> On Wed, 5 Oct 2016 06:35:52 +0000, Martin Mueller wrote: > There is always a lot more history than reason in the world. > That said, given that alphabets have fixed numbers, it’s weird > that bits of super and subscripted letters appear in this or > that limited range but that you can’t cobble a whole alphabet > together in a consistent manner. If any, why not all, especially > if there are only two or three dozen. They would end up in the SMP, threatening their usability on Windows keyboard layouts due to their not being defined in XML like Apple’s are, and not being able to output two UTF-16 code points by dead keys, but for IMEs this is no problem. From a more theoretical viewpoint, encoding superscripted letters as such is opposed to Unicode’s design principles, as it has already been pointed out. This is why only legacy superscripts have SUPERSCRIPT in their name. As of the scattered code point allocations, they come from the pragmatic encoding. A letter isn’t encoded as a preformatted superscript unless there are one or more precise usages, documented in the proposal. 
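Marcel's remark that only the legacy characters carry SUPERSCRIPT in their names, while the rest were encoded as MODIFIER LETTERs, is easy to verify with Python's unicodedata. Note that both naming conventions nevertheless carry the same kind of <super> compatibility decomposition (a quick sketch):

```python
import unicodedata

for ch in "\u207F\u2071\u1D49\u02B0":
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  {unicodedata.decomposition(ch)}")

# U+207F  SUPERSCRIPT LATIN SMALL LETTER N  <super> 006E
# U+2071  SUPERSCRIPT LATIN SMALL LETTER I  <super> 0069
# U+1D49  MODIFIER LETTER SMALL E  <super> 0065
# U+02B0  MODIFIER LETTER SMALL H  <super> 0068
```

So from a processing point of view the two families behave identically under compatibility normalization; the split is purely historical naming.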
To come back to my new point in this thread: I believe that in French, superscript lowercase letters have a particular function as abbreviation indicators, in the absence of any other visible sign. This viewpoint is now gaining audience, as it comes from French authorities (DGLFLF, Afnor) who are demanding the /superscript/ dead key, to write abbreviations. In French, there is a need and a demand to move this from higher level to plain text. Hence the need of the MODIFIER LETTER SMALL Q, for a proper solution. E.g., when trying to abbreviate “Bibliothèque” to “Bibque” in plain text, one will actually end up with “Bib ‘q_n’existe_pas’ᵘᵉ”. There must be such a message, otherwise users may think there is a bug in the keyboard. Once the encoding of MODIFIER LETTER SMALL Q is at the point where the new scalar value is known, this will take the place of the sequence, and first display as a notdef box. Explaining this is then a matter of documentation. I wasn’t upset about the missing superscript q. But end-users could get upset. Regards, Marcel From frederic.grosshans at gmail.com Wed Oct 5 12:02:51 2016 From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=) Date: Wed, 5 Oct 2016 19:02:51 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: Le 05/10/2016 à 15:57, Marcel Schneider a écrit : > On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote: >> On 2016/10/04 19:35, Marcel Schneider wrote: >>> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote: >>> >>>> Later, the beta and gamma were encoded for phonetic notation, but not the >>>> alpha. 
>>>> >>>> As a result, you can write basic formulas for select compounds, but not all. >>>> Given that these basic formulae don't need full 2-D layout, this still seems >>>> like an arbitrary restriction. >>> When it’s about informatics, arbitrary restrictions are precisely what gets me >>> upset. Those limitations are—as I wrote the other day—a useless worsening >>> of the usability and usefulness of a product. >> This kind of "let's avoid arbitrary limitations" argument works very >> well for subjects that are theoretical, straightforward, and rigid in >> nature. Many (but not all) subjects in computer science (informatics) >> are indeed of such a nature. >> >> The Unicode Consortium (or more specifically, the UTC) does a lot of >> hard work to create theories where appropriate, and to explain them >> where possible. But they recognize (and we should do so, too) that in >> the end, writing is a *cultural* phenomenon, where straightforward, >> rigid theories have severe limitations. >> >> From a certain viewpoint (the chemist's in the example above), the >> result may look arbitrary, but from another viewpoint (the >> phoneticist's), it looks perfectly fine. At first, it looks like it >> would be easy to fix such problems, but each fix risks to introduce new >> arbitrariness when seen from somebody else's viewpoint. Getting upset >> won't help. > I’ve got the point, thanks. Phonetics need to write running text that is > immediately legible, while a chemistry database may use particular notational > conventions that work with baseline letters to be parsed on semantics or light > markup for proper display in the UI. The UTC decision thus questioned the design > principle of using plain text for chemical formulae. No doubt it was understood > that validating this choice would have opened the door to encoding more special > characters for upgrading or similar purposes. 
I think there is a big difference between adding a few characters for a new use (chemistry formulae) and completing an obvious almost complete set. People are used to seeing the 26 basic alphabetic Latin characters (abcdefghijklmnopqrstuvwxyz) being treated preferentially by computers, but are always surprised when only one of them is treated differently. Initially, superscript letters were restricted to a few letters, and it made sense to restrict the temptation to complete the set. But now that all modifier small latin letters except q are encoded, it makes little sense. Many people use these characters (arguably wrongly) for many uses beyond IPA, and they are invariably surprised if they need q. The special status of the basic Latin alphabet means that almost no one would be surprised not to find a superscripted é, è, or ç, and adding the last missing latin basic letter q would not open the door to any more character. > > At this point I’d like to mention what I thought about since this thread > was launched. The French language makes extensive use of superscripts > to note abbreviations. [...] Therefore I suggest to grant > the French language full support by enabling superscript lowercase letters > in order that the SUPERSCRIPT deadkey that the French Standards body recommends, > will work for all abreviations. There is no point about other letters than the basic > alphabet superscripted, as no French abbreviation exceeds this range (despite of > what I believed in 2014, like many other people). Whether é (and è) are needed or not is another question. Even if it were useful (as argued by others in this thread), it brings non-trivial technical difficulties in terms of NFC/NFD. But since people are used to seeing these characters being treated differently, I think the “problem” 
of the lack of superscript composed characters is less obvious than the lack of *MODIFIER LETTER SMALL Q, in the sense that the first absence is perceived (by the Unicode naive user) as more normal than the second. Frédéric From charupdate at orange.fr Wed Oct 5 17:10:32 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 6 Oct 2016 00:10:32 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> Message-ID: <1791413156.18188.1475705432866.JavaMail.www@wwinf1f05> On Wed, 5 Oct 2016 19:02:51 +0200, Frédéric Grosshans wrote: Le 05/10/2016 à 15:57, Marcel Schneider a écrit : > On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote: […] >>> >>> From a certain viewpoint (the chemist's in the example above), the >>> result may look arbitrary, but from another viewpoint (the >>> phoneticist's), it looks perfectly fine. At first, it looks like it >>> would be easy to fix such problems, but each fix risks to introduce new >>> arbitrariness when seen from somebody else's viewpoint. Getting upset >>> won't help. >> I’ve got the point, thanks. Phonetics need to write running text that is >> immediately legible, while a chemistry database may use particular notational >> conventions that work with baseline letters to be parsed on semantics or light >> markup for proper display in the UI. The UTC decision thus questioned the design >> principle of using plain text for chemical formulae. No doubt it was understood >> that validating this choice would have opened the door to encoding more special >> characters for upgrading or similar purposes. 
> > I think there is a big difference between adding a few characters for a > new use (chemistry formulae) and completing an obvious almost complete > set. People are used to see the 26 basic alphabetic Latin character > (abcdefghijklmnopqrstuvwxyz) being treated preferentially by computers, > but are always surprised when only one of them is treated differently. > Initially, superscript letters where restricted to a few letter, and it > made sense to restrict the temptation to complete the set. But now that > all modifier small latin letters except q are encoded, it makes little > sense. Many people use these characters (arguably wrongly) for many uses > beyond IPA, and they are invariably surprised if they need q. The > special status of the basic Latin alphabet means that almost no one > would be surprised not to find a superscripted é, è, or ç and adding the > last missing latin basic letter q would not open the door to any more > character. > That is however exactly what I believed, that this would open that door. It seems to me as if the missing superscript q were the last key to keep that door locked (how nice an image, as the small q is somewhat key-shaped). It is as if completing that series would trigger an avalanche of superscript alphabets and symbols to be asked for encoding without any means to be refused. And, troublesome enough, this is exactly how the proposal to encode *MODIFIER LETTER SMALL Q was perceived, despite the rationale, which must have been completely misunderstood, although it seems to me to be written in good English. Thanks to Denis Jacquerye’s detailed answer to the question “Why is there no character for "superscript q" in Unicode?” [1], I got all links quickly [2][3][4]. >> >> At this point I’d like to mention what I thought about since this thread >> was launched. The French language makes extensive use of superscripts >> to note abbreviations. [...] 
Therefore I suggest to grant >> the French language full support by enabling superscript lowercase letters >> in order that the SUPERSCRIPT deadkey that the French Standards body recommends, >> will work for all abreviations. There is no point about other letters than the basic >> alphabet superscripted, as no French abbreviation exceeds this range (despite of >> what I believed in 2014, like many other people). > Whether é (and è) are needed or not is another question. Even if it were > useful (as argued by others in this thread), it brings non-trivial > technical difficulties in terms of NFC/NFD. But since people are used to > see these characters being treated differently, I think the “problem” of > the lack of superscript composed character is less obvious than the lack > of *MODIFIER LETTER SMALL Q, in the sense that the first absence is > perceived (by the Unicode naive user) as more normal than the second. I really love your point of view, I understand that it is already shared by most people, and I strongly hope that it be adopted by the UTC. Perhaps it is, as there is no notice of non-approval found in the archive. However I’d like to know the answer to the proposer at/after the UTC meeting of August 9-13, 2010 at Redmond [5]. Such requests have to be sent to this List, which is monitored by meeting participants. Regards, Marcel [1] Denis Jacquerye’s post: https://www.quora.com/Why-is-there-no-character-for-superscript-q-in-Unicode/answer/Denis-Jacquerye-1 [2] Karl Pentzlin’s proposal: http://www.unicode.org/L2/L2010/10230-modifier-q.pdf [3] A comment on behalf of Adobe Systems, written up the first day of the UTC meeting where the proposal was rejected: http://www.unicode.org/L2/L2010/10315-comment.pdf [4] Karl Pentzlin’s reply, two days later i.e. 
three days before the end of the meeting: http://www.unicode.org/L2/L2010/10316-cmts.pdf
[5] The anchor in the UTC minutes at the related Action Item: http://www.unicode.org/cgi-bin/GetL2Ref.pl?124-A146

From charupdate at orange.fr Thu Oct 6 02:21:11 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 6 Oct 2016 09:21:11 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05>
Message-ID: <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>

On Wed, 5 Oct 2016 17:34:02 +0200, Philippe Verdy wrote:
[…]
> I agree, French allows abbreviating many words by appending the last new
> letters in superscripts. 3e is recommended but 3ème
> is still very frequent. As well you'll see abbreviations using é
> (a frequent termination for past participles, generally used with the
> previous consonant and possibly followed by the feminine/plural final
> letters, all in superscript).

I never saw that. Would you show us some examples to look up? I'm curious whether they could be managed without accented superscripts. Anyway, combining diacritics should be placeable on superscripts as well.

> Almost nobody uses the preencoded superscript letters for this (notably not
> for "1er", or its recommended feminine form "1re",
> still frequently written "1ère")

They don't, because these are not on the keyboard. Trust me, I wouldn't use them either if I didn't have them on a (prototype) keyboard layout. You may say the same about the ?, the ? and ?, and so on. Why do people abbreviate "numéro" as "n°"? Because we *do have it* (the degree sign) on our keyboards.

BTW, there was another (subsequent) proposal [1], to complete with superscript q, but not only that.
At the time, the aim was to fill out the SUPERSCRIPT and SUBSCRIPT dead keys (called latching group selectors). But no trace of any UTC meeting item can be found, at least when I try the search. The known issue with this proposal is that it is part of the ISO/IEC 9995 standardization process, as it was meant to contribute to part 9 of that standard. That is an issue because Microsoft is fiercely opposed to ISO/IEC 9995. I understand Microsoft, as the standard in question is in my opinion actually suboptimal. But this is another issue, not to be discussed in this thread, nor on this Mailing List at all (except perhaps by other subscribers).

Regards, Marcel

[1] The proposal: http://www.unicode.org/L2/L2011/11208-n4068.pdf

From verdy_p at wanadoo.fr Thu Oct 6 04:16:53 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 6 Oct 2016 11:16:53 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

2016-10-06 9:21 GMT+02:00 Marcel Schneider :
> > Almost nobody uses the preencoded superscript letters for this (notably not
> > for "1er", or its recommended feminine form "1re",
> > still frequently written "1ère")
>
> They don't, because these are not on the keyboard. Trust me, I wouldn't use
> them either if I didn't have them on a (prototype) keyboard layout.

I will certainly not trust you, and you won't challenge me on that. The keyboard is definitely not the issue here. Only the degree sign on French keyboards is very frequently used (instead of the superscript o, or any final o), in: n° (numéro), d° (ditto), r° (recto), v° (verso), f° (folio).
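The preceding point, that the degree sign stands in for a raised "o" in these abbreviations, can be made concrete: Unicode has three distinct look-alike characters here, and only one of them is a modifier letter. A minimal Python sketch (an editorial illustration, not part of the original mails), using only the standard unicodedata module:

```python
import unicodedata

# Three look-alike candidates for the raised "o" in abbreviations
# such as "n°" (numéro): degree sign, masculine ordinal indicator,
# and the modifier (superscript) letter small o.
for ch in ("\u00B0", "\u00BA", "\u1D52"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  "
          f"category={unicodedata.category(ch)}")
```

Only U+1D52 is a modifier letter; the degree sign is a plain symbol, which is part of why its use in "n°" is a keyboard-driven convention rather than a semantic choice.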
For Latin ordinals: 1° (primo / premièrement), 2° (secundo / deuxièmement), the superscript o or degree sign may sometimes be dropped, but it is most often a degree sign in many encoded documents (there's no real difference from handwritten or printed text in many font styles)...

It's a common fact that these informal abbreviations (using final "ème", "ère", in superscripts or not) ARE REALLY frequently used (examples are easy to find); they are handwritten or composed in word processors or even in web editors, because it's so simple to transform them with superscripts. And this happens even though the preferred forms use the shorter abbreviations "1er", "2e", which need no accent (the same also occurs with ordinals using Roman digits). Note that the same abbreviations are ALSO found without superscripts, such as "1er", "1re" (or "1ère"), "2e" (or "2ème"; and when it is the last in a pair: "2nd", "2nde" or "2de"): this clearly demonstrates that this is just a preferred typographic style for the final letters of abbreviations, and not a separate encoding of the same letters. But not for n°, d°, r°, v°, d° (using a plain final o after the abbreviated first letters would create confusion; the degree sign is then highly preferred to the absence of superscript, even if the superscript o would be better).

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From irgendeinbenutzername at gmail.com Thu Oct 6 09:54:07 2016
From: irgendeinbenutzername at gmail.com (Charlotte Buff)
Date: Thu, 6 Oct 2016 16:54:07 +0200
Subject: Dealing with Unencodeable Characters
Message-ID: 

One of Unicode's goals is round-trip compatibility with old legacy character sets, which is why many compatibility characters that would normally have been out of scope for the standard were gathered over time. It's why Zapf Dingbats and Arabic presentation forms are in Unicode, for example.
However, there are some characters that form part of these sets yet are deliberately not encoded in Unicode because they were considered unsuitable for inclusion. The two that come to mind are the Windows logo from Wingdings and the Shibuya 109 emoji from the original Japanese vendor sets.

Given that these two have no Unicode equivalents, their source character sets are not fully compatible with Unicode, i.e. there is going to be data loss and confusion when trying to convert into or from Unicode.

If, theoretically, I wanted to convert an old Shift JIS document containing emoji to Unicode, how should I ideally handle Shibuya 109?

I remember the early emoji proposal documents originally contained "emoji compatibility symbols" which were used to map to source characters that weren't meant to be included with a specified semantic. I believe STATUE OF LIBERTY was one of those characters and was simply called EMOJI COMPATIBILITY SYMBOL-XX so that that specific landmark wouldn't strictly be part of Unicode. Obviously this approach ultimately wasn't implemented, but I wonder whether there could be designated compatibility characters for this kind of issue. Private use characters are an obvious choice, but of course their meaning is user-defined, so while all the other emoji in my Shift JIS document would receive an unambiguous Unicode mapping, Shibuya 109 would remain vague and very limited in interchange options.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From frederic.grosshans at gmail.com Thu Oct 6 09:55:32 2016
From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=)
Date: Thu, 6 Oct 2016 16:55:32 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

On 06/10/2016 at 09:21, Marcel Schneider wrote:
>
> I never saw that. Would you show us some examples to look up? I'm curious
> whether they could be managed without accented superscripts.
> Anyway, combining diacritics should be placeable on superscripts as well.

Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04). The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is nothing to add.

Frédéric

From jkorpela at cs.tut.fi Thu Oct 6 10:53:04 2016
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Thu, 6 Oct 2016 18:53:04 +0300
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

6.10.2016, 17:55, Frédéric Grosshans wrote:
> On 06/10/2016 at 09:21, Marcel Schneider wrote:
>>
>> I never saw that. Would you show us some examples to look up? I'm
>> curious
>> whether they could be managed without accented superscripts.
>> Anyway, combining diacritics should be placeable on superscripts as well.
> Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04)
> The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is
> nothing to add.

It's fine that it works in some environment(s), but it would be unrealistic to expect it to work generally.
In most environments, assuming the font used supports the characters involved in the first place, the result is probably a grave accent struck over the superscript e, in a rather ugly way.

Even though Unicode superscript (and subscript) characters have a lot of practical use in many contexts, this isn't really one of them. In a case like this, in most environments, and especially if you want the text to display well in different environments, the solution is to use just "3ème", perhaps with some method ("above" the character level) used to format the letters as superscript when not limited to plain text – but I'm afraid most fonts don't have a superscript glyph for "è" available, so it would usually be best to give up the superscripting idea here.

Yucca

From oren.watson at gmail.com Thu Oct 6 11:04:17 2016
From: oren.watson at gmail.com (Oren Watson)
Date: Thu, 6 Oct 2016 12:04:17 -0400
Subject: Fwd: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

---------- Forwarded message ----------
From: Oren Watson
Date: Thu, Oct 6, 2016 at 12:03 PM
Subject: Re: Why incomplete subscript/superscript alphabet ?
To: "Jukka K. Korpela"

If this is a real need, why not petition more software to allow the use of the U+8C partial line up and U+8B partial line down characters for this purpose?

On Thu, Oct 6, 2016 at 11:53 AM, Jukka K. Korpela wrote:
> 6.10.2016, 17:55, Frédéric Grosshans wrote:
>
> On 06/10/2016 at 09:21, Marcel Schneider wrote:
>>
>>> I never saw that. Would you show us some examples to look up? I'm
>>> curious
>>> whether they could be managed without accented superscripts.
>>> Anyway, combining diacritics should be placeable on superscripts as well.
>>
>> Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04)
>> The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is
>> nothing to add.
>
> It's fine that it works in some environment(s), but it would be
> unrealistic to expect it to work generally. In most environments, assuming
> the font used supports the characters involved in the first place, the
> result is probably a grave accent struck over the superscript e, in a
> rather ugly way.
>
> Even though Unicode superscript (and subscript) characters have a lot of
> practical use in many contexts, this isn't really one of them. In a case
> like this, in most environments, and especially if you want the text to
> display well in different environments, the solution is to use just "3ème",
> perhaps with some method ("above" the character level) used to format the
> letters as superscript when not limited to plain text – but I'm afraid
> most fonts don't have a superscript glyph for "è" available, so it would
> usually be best to give up the superscripting idea here.
>
> Yucca

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kenwhistler at att.net Thu Oct 6 11:27:13 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 6 Oct 2016 09:27:13 -0700
Subject: Fwd: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>

On 10/6/2016 9:04 AM, Oren Watson wrote:
> If this is a real need, why not petition more software to allow the
> use of the U+8C partial line up and U+8B partial line down characters
> for this purpose?

Because U+008C and U+008B are relics from the days when control codes were used in terminal control protocols and to drive print trains in devices like this:

https://en.wikipedia.org/wiki/Line_printer#/media/File:IBM_line_printer_1403.JPG

Their functions have been completely overtaken by markup conventions such as <sup>...</sup> and <sub>...</sub>, which *are* widely supported already, even in most email clients, ri^ght out of the b_ox.

And I suspect that Yucca's statement "so it would usually be best to give up the superscripting idea here" is intended to mean: give up on asking for a separately encoded superscript character for each Latin letter, including accented ones (or applying accents to separately encoded superscript letters). Because, after all, this stuff already just works: «3^ème» (and not «3ᵉ̀ᵐᵉ», by the way!).

--Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oren.watson at gmail.com Thu Oct 6 11:32:07 2016
From: oren.watson at gmail.com (Oren Watson)
Date: Thu, 6 Oct 2016 12:32:07 -0400
Subject: Fwd: Fwd: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
Message-ID: 

I meant: petition, say, the devs of Konsole, iTerm, xterm, etc., and other programs which deal purely in plain text, to support the 8B and 8C characters for formatting. Markup doesn't exist everywhere.

On Thu, Oct 6, 2016 at 12:27 PM, Ken Whistler wrote:
>
> On 10/6/2016 9:04 AM, Oren Watson wrote:
>
> If this is a real need, why not petition more software to allow the use of
> the U+8C partial line up and U+8B partial line down characters for this
> purpose?
>
> Because U+008C and U+008B are relics from the days when control codes were
> used in terminal control protocols and to drive print trains in devices
> like this:
>
> https://en.wikipedia.org/wiki/Line_printer#/media/File:IBM_line_printer_1403.JPG
>
> Their functions have been completely overtaken by markup conventions such
> as <sup>...</sup> and <sub>...</sub>, which *are* widely supported
> already, even in most email clients, right out of the box.
>
> And I suspect that Yucca's statement "so it would usually be best to give
> up the superscripting idea here" is intended to mean give up on asking for
> a separately encoded superscript character for each Latin letter, including
> accented ones (or applying accents to separately encoded superscript
> letters). Because, after all, this stuff already just works: «3ème» (and
> not «3ᵉ̀ᵐᵉ», by the way!).
>
> --Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charupdate at orange.fr Thu Oct 6 13:03:32 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 6 Oct 2016 20:03:32 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: <1098989413.14438.1475777012613.JavaMail.www@wwinf1f05>

On Thu, 6 Oct 2016 16:55:32 +0200, Frédéric Grosshans wrote:
[…]
>> Anyway, combining diacritics should be placeable on superscripts as well.
> Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04)
> The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is
> nothing to add.

As others pointed out, this also depends on the font. In my webmail and in my text editor, the accent displays above the m, struck across the upper edge of the superscript letter.

The French Standards body is asking for a facility on the keyboard to input the French ordinal indicator, basically a superscript e, as a plain text character: XXᵉ siècle [20th century, or since we are in it: 20ᵗʰ century]. There is no recommended use of accents when writing French ordinals. See this shocking image (the neon sign was deprecated *and* faulty): https://twitter.com/XimeLelong/status/776448216346791936

Regards, Marcel

From kenwhistler at att.net Thu Oct 6 13:03:25 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 6 Oct 2016 11:03:25 -0700
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
Message-ID: <7fdd20ef-d309-c089-e2cc-11df024da44f@att.net>

On 10/6/2016 9:32 AM, Oren Watson wrote:
> I meant, petition say the devs of Konsole, iTerm, xterm etc, and other
> programs which deal purely in plain text to support 8b and 8c
> characters for formatting. Markup doesn't exist everywhere.

Fair enough. But most actual terminals didn't support partial line advances (although line printers and electric typewriter terminals could): http://www.ccs.neu.edu/research/gpc/MSim/vona/terminal/VT100_Escape_Codes.html so there would seem to be little call for terminal emulators to do so in such cases. (And by the way, it is arguable that markup *does* exist for terminals. After all, that is what character attribute controls like ^[[1m for bold mode are all about.)

And *consoles*, which pretty much by definition do *un*formatted text, are poor contexts to try to fancy up with out-of-scope formatting requirements. In general I fail to see any significant ROI for this kind of requirement. Trying to patch up consoles with hacks to deal with Latin superscripts and subscripts is just another scheme that will run up on the rocks at the very next formatting requirement thrown at it – or for that matter, when attempting to render plain text in nearly *any* complex script encoded in Unicode.

--Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From verdy_p at wanadoo.fr Thu Oct 6 13:06:56 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 6 Oct 2016 20:06:56 +0200
Subject: Dealing with Unencodeable Characters
In-Reply-To: 
References: 
Message-ID: 

PUA characters are still used when mapping corporate logos (from Windows and Apple/MacOS) in fonts for the relevant systems. Microsoft then opted to include these corporate logos (and specific UI icons) in a separate font, also with PUA mappings, and then added new PUA fonts as needed. E.g.:

* "Segoe MDL2 Assets" on Windows 10 (even if many of these characters and symbols are also encoded separately with standard codes, only to make sure they have a coherent design and metrics instead of taking them from various random fonts). There are for example icons representing battery levels, wifi reception levels with bars, status icons for muting on/off some devices or UI services for talks, cameras, selection of screen, enabling/disabling the touch interface, displaying the state of headphones, presenting incoming phone calls or keeping them silent... and several variants of common arrows and common geometric symbols, or even some characters for the Windows calculator such as common arithmetic signs. You'll note many variants of arrow heads. Maybe these characters are also used internally as fallbacks in IE/Edge, but all this is left completely undocumented (voluntarily, in my opinion, to make sure that other users will not create and exchange documents intended to be interoperable).

* "Webdings" contains various elaborate icons that are designed to be realistic rather than symbolic, sometimes in several locale-sensitive variants (e.g. the Earth globe, centered on America, on Europe/Africa, or on Asia/Australia). Here again you'll find various arrow heads for displaying UI buttons.
* "Wingdings" and "Wingdings 2" again map various forms of arrows and arrow heads, plus some emoji, enclosed characters, and decorative characters. "Wingdings" also includes another Windows logo at position 0xFF; these fonts are not mapped to Unicode but to 8-bit code positions 0x21..0xFF.

* "Wingdings 3" uses a mix of non-Unicode mappings in 0x21..0xFF and some characters at other regular Unicode positions (in 0x2000..0x9FFF) multiple times (every block of 0x100 code positions, i.e. each glyph is mapped 128 or 129 times in that font). None of these characters have a Unicode mapping.

* You probably remember the case of the "Marlett" font created to support the UI of Windows 7 (but most positions are assigned to .notdef/"tofu") and that has position 0x57 mapped to a Windows logo.

There's also an old font "MT Extra" made by MathType (in 1996 according to its details), containing some math symbols (probably still used by some modules of the equation editor for compatibility with documents created with old versions of Office). These two fonts use only 8-bit code mappings (in 0x21..0xFF, but most of them are mapped to a .notdef/"tofu" glyph).

Such fonts are installed and used by specific software modules, at discrete font sizes, and are not even hinted (they could as well use collections of scalable vector graphics, but a single font allows these symbols to be loaded more efficiently and to be hinted for low-resolution display at small font sizes). They may still be used in other applications, but without any guarantee of interoperability or support for upgrades/downgrades across Windows versions. In fact these fonts are not really supported outside of the specific software modules needing them to render their UI. They may disappear or change significantly at any time.
2016-10-06 16:54 GMT+02:00 Charlotte Buff :
> One of Unicode's goals is round-trip compatibility with old legacy
> character sets, which is why we gathered many compatibility characters over
> time that would normally have been out of scope for the standard. It's why
> Zapf Dingbats and Arabic presentation forms are in Unicode for example.
> However, there are some characters that form part of these sets yet are
> deliberately not encoded in Unicode because they were considered unsuitable
> for inclusion. The two that come to mind are the Windows logo from
> Wingdings and the Shibuya 109 emoji from the original Japanese vendor sets.
>
> Given that these two have no Unicode equivalents, their source character
> sets are not fully compatible with Unicode, i.e. there is going to be data
> loss and confusion when trying to convert into or from Unicode.
>
> If theoretically I wanted to convert an old Shift JIS document containing
> emoji to Unicode, how should I ideally handle Shibuya 109?
>
> I remember the early emoji proposal documents originally contained "emoji
> compatibility symbols" which were used to map to source characters that
> weren't meant to be included with a specified semantic. I believe STATUE OF
> LIBERTY was one of those characters and was simply called EMOJI
> COMPATIBILITY SYMBOL-XX so that that specific landmark wouldn't strictly be
> part of Unicode. Obviously this approach ultimately wasn't implemented,
> but I wonder whether there could be designated compatibility characters for
> this kind of issue. Private use characters are an obvious choice but of
> course their meaning is user-defined, so while all other emoji in my Shift
> JIS document would receive an unambiguous Unicode mapping, Shibuya 109
> would remain vague and very limited in interchange options.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charupdate at orange.fr Thu Oct 6 13:14:13 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 6 Oct 2016 20:14:13 +0200 (CEST)
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
Message-ID: <1885626438.14631.1475777653900.JavaMail.www@wwinf1f05>

On Thu, 6 Oct 2016 09:27:13 -0700, Ken Whistler wrote:
[…]
> Their functions have been completely overtaken by markup conventions
> such as <sup>...</sup> and <sub>...</sub>, which *are* widely supported
> already, even in most email clients, ri^ght out of the b_ox.
>
> And I suspect that Yucca's statement "so it would usually be best to
> give up the superscripting idea here" is intended to mean give up on
> asking for a separately encoded superscript character for each Latin
> letter, including accented ones (or applying accents to separately
> encoded superscript letters). Because, after all, this stuff already
> just works: «3^ème» (and not «3ᵉ̀ᵐᵉ», by the way!).

High-level formatting in high-end mail clients is of little use when the target environment is plain text. It's still unambiguous, though.

As for superscript "è", I had asked for it as early as 2014, and I fully understood that Unicode no longer encourages proposals of any *new* precomposed characters. This was before I learned that "3ème" is not good French. These long ordinal indicators are deprecated.

Regards, Marcel

From verdy_p at wanadoo.fr Thu Oct 6 13:16:36 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 6 Oct 2016 20:16:36 +0200
Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: 
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05>
Message-ID: 

It does not render very well: the accent is not correctly positioned vertically (far too high) above the superscript e, and it collides with the previous line of text at normal line-height, because fonts do not support this pair with proper positioning. The combination is just rendered in some "best effort" way by the text renderer of my browser. When used in the Windows UI, the accent collides with the following superscript "m".

Let's not talk about how you would superscript a "???" (very poor positioning if using combining characters) or "????" (the result would be misleading with most fonts if using combining characters) or "???" (impossible)...

2016-10-06 16:55 GMT+02:00 Frédéric Grosshans :
> On 06/10/2016 at 09:21, Marcel Schneider wrote:
>
>> I never saw that. Would you show us some examples to look up? I'm
>> curious
>> whether they could be managed without accented superscripts.
>> Anyway, combining diacritics should be placeable on superscripts as well.
>
> Like «3ᵉ̀ᵐᵉ»? It already works on my laptop (Thunderbird on Ubuntu 16.04)
> The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is nothing
> to add.
>
> Frédéric

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jkorpela at cs.tut.fi Thu Oct 6 13:20:22 2016
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Thu, 6 Oct 2016 21:20:22 +0300
Subject: Fwd: Why incomplete subscript/superscript alphabet ?
In-Reply-To: <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net>
Message-ID: 

6.10.2016, 19:27, Ken Whistler wrote:

> Their functions have been completely overtaken by markup conventions
> such as <sup>...</sup> and <sub>...</sub>, which *are* widely supported
> already, even in most email clients, ri^ght out of the b_ox.

They are widely supported, but very widely in a typographically inferior way. This is essential especially when it comes to things like "3ème", where one might want to display the letters in superscript style as a matter of typography.

> And I suspect that Yucca's statement "so it would usually be best to
> give up the superscripting idea here" is intended to mean give up on
> asking for a separately encoded superscript character for each Latin
> letter, including accented ones

Not quite. Adding superscript characters for all Latin letters is not a good idea at all, but I was not referring to that. Instead, I suggested that in a case like "3ème", it's best to give up the idea of superscripting the letters using any techniques available now (including e.g. markup), in most situations. Flat rendering of "3ème" is better than a typographically poor rendering with superscripts.

> Because, after all, this stuff already
> just works: «3^ème» (and not «3ᵉ̀ᵐᵉ», by the way!).

It works for a rather limited range of values of "works". I'm not sure what happens in my reply... it seems that Thunderbird does something funny here. Anyway, what I saw in my Thunderbird when <sup> is used is "ème"
in a slightly reduced font in an elevated position, messing up line spacing and looking rather different from superscript glyphs designed by a typographer.

Independently of the technique used to ask software to show something as a superscript (e.g. using a superscript character code point in Unicode, using <sup>, using superscript formatting in a word processor, or using ^{...} in TeX), typographically accepted rendering must use a superscript glyph, designed by a typographer to match the overall style of the font, or maybe a sophisticated algorithm that constructs the rendering from a normal glyph.

In a sense, superscript code points make this easier: the rendering can simply pick up the corresponding glyph from the font – if it has one (a big "if"). But this is not a good argument in favor of adding such points en masse. It is, however, a good argument in favor of using existing superscript code points, like "ᵉ", with good font support.

Yucca

From kenwhistler at att.net Thu Oct 6 13:30:52 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 6 Oct 2016 11:30:52 -0700
Subject: Dealing with Unencodeable Characters
In-Reply-To: 
References: 
Message-ID: <24956e36-247f-7d70-5e81-691f320f8435@att.net>

On 10/6/2016 7:54 AM, Charlotte Buff wrote:
> If theoretically I wanted to convert an old Shift JIS document
> containing emoji to Unicode, how should I ideally handle Shibuya 109?

And the general answer to that is: convert to U+FFFD, unless you are doing something specific and know what you are doing... in which case you can use PUA or insert an image, or whatever else you need to do.
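The advice above (map an otherwise unconvertible source character to U+FFFD) is what standard codec machinery already does with a "replace" error handler. A minimal Python sketch (an editorial illustration, not part of the original mails); the byte 0x80, which has no assignment in the plain shift_jis codec, stands in here for any unmappable cell, since the real Shibuya 109 emoji lived at a carrier-specific code point not reproduced here:

```python
# Decoding Shift JIS data containing an unmappable byte:
# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER,
# preserving the rest of the text instead of failing outright.
data = b"109\x80"  # 0x80 is unassigned in plain shift_jis
text = data.decode("shift_jis", errors="replace")
print(repr(text))  # the unmappable byte becomes U+FFFD
```

The default strict handler would instead raise UnicodeDecodeError, which is the right choice when silent data loss is unacceptable.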
This is not a character *standardization* issue that requires the UTC to come up with a generic interchange solution for every pre-Unicode character encoding of everything that ever was, whether it be some oddball Shift JIS extensions that were omitted in the consensus on encoding the Japanese Carrier Emoji: http://www.unicode.org/reports/tr51/tr51-7.html#Japanese_Carrier or other odds and ends from bizarre, dead-end, disused character encodings from a previous generation. By the way, the biggest ongoing problem we deal with here is the continuing urge to proliferate font-encoded hacks for particular languages and writing systems. The text interchange problems that such schemes pose on an ongoing basis far far outweigh issues like what to do with a Shibuya 109 emoji, imo. --Ken From doug at ewellic.org Thu Oct 6 14:02:20 2016 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Oct 2016 12:02:20 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161006120220.665a7a7059d7ee80bb4d670165c8327d.f785149136.wbe@email03.godaddy.com> >> Like ?3????? ? It already works on my laptop (Thunderbird in Ubuntu >> 16.04) The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there >> is nothing to add. > > It does not render very well, the accent is not correctly positioned > vertically (far too high) above the superscript e and colliding with > the previous line of text at normal line-height, because fonts do not > support this pair with proper positioning. http://www.unicode.org/faq/char_combmark.html#12b Poor display support today is not supposed to be a rationale for permanently encoding new precomposed letters. 
-- Doug Ewell | Thornton, CO, US | ewellic.org From doug at ewellic.org Thu Oct 6 14:03:29 2016 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Oct 2016 12:03:29 -0700 Subject: Dealing with Unencodeable Characters Message-ID: <20161006120329.665a7a7059d7ee80bb4d670165c8327d.d0bdde4c26.wbe@email03.godaddy.com> > * "Wingdings", "Wingdings 2", are here again maaping various forms of > arrows and arrow heads, plus some emojis or enclosed characters, or > decorative characters. "Wingdings" also includes another Windows logo > at position 0xFF; these fonts are not mapped to Unicode but to 8-bit > code positions 0x21..0xFF. > * "Wingdings 3" uses a mix of non-Unicode mappings in 0x21..0xFF and > some characters and other regular Unicode positions (in 0x2000.. > 0X9FFF) multiple times (every block of 0x100 code positions, i.e. each > glyph is mapped 128 or 129 times in that font). None of these > characters have a Unicode mapping. It's true that the Wingdings and Webdings fonts themselves, which date back to the 1990s, are "symbol fonts" with glyphs mapped to the ASCII range. However, to clear up any possible confusion, all glyphs in these fonts have had actual Unicode mappings since version 7.0 (June 2014). -- Doug Ewell | Thornton, CO, US | ewellic.org From doug at ewellic.org Thu Oct 6 14:06:07 2016 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Oct 2016 12:06:07 -0700 Subject: Dealing with Unencodeable Characters Message-ID: <20161006120607.665a7a7059d7ee80bb4d670165c8327d.31d43928da.wbe@email03.godaddy.com> Charlotte Buff wrote: > Private use characters are an obvious choice but of course their > meaning is user-defined, so while all other emoji in my Shift JIS > document would receive an unambiguous Unicode mapping, Shibuya 109 > would remain vague and very limited in interchange options. 
But that's exactly what private-use characters were invented for: so you can represent characters in a given character encoding framework which are not encoded for some reason. Of course you need a private agreement of some kind, but it can be as simple as "Hey, everybody, in the attached document (or in any documents I create) U+FF109 means SHIBUYA 109." Private agreements don't have to be secret or limited-distribution, and they don't have to be excessively formal. Unicode rejected the "compatibility symbols" because they would have amounted to private-use characters defined by Unicode, where the formal names and definitions of the characters were not specified but, shhh, we all know what they REALLY mean. This would have been the Wrong Thing to Do on many levels. -- Doug Ewell | Thornton, CO, US | ewellic.org From verdy_p at wanadoo.fr Thu Oct 6 14:21:00 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Oct 2016 21:21:00 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <20161006120220.665a7a7059d7ee80bb4d670165c8327d.f785149136.wbe@email03.godaddy.com> References: <20161006120220.665a7a7059d7ee80bb4d670165c8327d.f785149136.wbe@email03.godaddy.com> Message-ID: 2016-10-06 21:02 GMT+02:00 Doug Ewell : > >> Like ?3????? ? It already works on my laptop (Thunderbird in Ubuntu > >> 16.04) The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there > >> is nothing to add. > > > > It does not render very well, the accent is not correctly positioned > > vertically (far too high) above the superscript e and colliding with > > the previous line of text at normal line-height, because fonts do not > > support this pair with proper positioning. > > http://www.unicode.org/faq/char_combmark.html#12b > > Poor display support today is not supposed to be a rationale for > permanently encoding new precomposed letters. 
> I've not asked for that; I just wanted to comment on the fact that using subscripts encoded for compatibility with legacy standards or specific uses (such as IPA), followed by random combining diacritics not designed for this usage, is not the way to go. Generic styling markup (appropriate for each kind of document) is the way to go. For abbreviations in plain-text files, it is often better not to even try to render these superscript styles, to use no additional markup at all, and to simply use the full range of letters for the relevant scripts. -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Thu Oct 6 14:39:01 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Oct 2016 21:39:01 +0200 Subject: Dealing with Unencodeable Characters In-Reply-To: <20161006120329.665a7a7059d7ee80bb4d670165c8327d.d0bdde4c26.wbe@email03.godaddy.com> References: <20161006120329.665a7a7059d7ee80bb4d670165c8327d.d0bdde4c26.wbe@email03.godaddy.com> Message-ID: 2016-10-06 21:03 GMT+02:00 Doug Ewell : > > * "Wingdings", "Wingdings 2", are here again mapping various forms of > > arrows and arrow heads, plus some emojis or enclosed characters, or > > decorative characters. "Wingdings" also includes another Windows logo > > at position 0xFF; these fonts are not mapped to Unicode but to 8-bit > > code positions 0x21..0xFF. > > * "Wingdings 3" uses a mix of non-Unicode mappings in 0x21..0xFF and > > some characters at other regular Unicode positions (in 0x2000.. > > 0x9FFF) multiple times (every block of 0x100 code positions, i.e. each > > glyph is mapped 128 or 129 times in that font). None of these > > characters have a Unicode mapping. > > It's true that the Wingdings and Webdings fonts themselves, which date > back to the 1990s, are "symbol fonts" with glyphs mapped to the ASCII > range.
However, to clear up any possible confusion, all glyphs in these > fonts have had actual Unicode mappings since version 7.0 (June 2014). > These mappings exist theoretically but not in these fonts themselves (notably not when there are multiple variants of the same encoded characters, notably for many arrows and arrow heads). The 3 glyphs for the Earth globe (centered on the Americas, on Europe+Africa, or on South/East Asia+Australia) are not distinguished at all in Unicode (I've not seen any sequence with variation selectors to help distinguish them, and there are some fonts showing the Earth globe centered on the Antarctic). Unicode also seems to allow the character to show a flat Mercator map centered on these positions, or other projections, as the encoded character just means "Earth". So no, the mappings are theoretical and allow wide variations that these fonts purposely want to distinguish. They are used directly, without any Unicode mapping, for internal implementation reasons, for specific meanings in specific applications, or because this makes a coherent graphical design for a UI (fonts are used for this purpose, but many applications do not need fonts for this usage: they just use collections of icons, frequently packed in a ZIP/JAR archive, or selected with CSS selectors in SVG files, or hidden in their graphic source code by directly using drawing APIs, in which they can add custom visual effects such as animations, glowing, transparency, custom superpositions and compositions, custom layouts, and interaction with user events or application events and states). Using the Unicode mappings in these fonts would not allow selecting the appropriate distinguished glyphs; the UI would become confusing or no longer usable, or would create an ugly patchwork. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From gwalla at gmail.com Thu Oct 6 14:44:05 2016 From: gwalla at gmail.com (Garth Wallace) Date: Thu, 6 Oct 2016 12:44:05 -0700 Subject: Bit arithmetic on Unicode characters? Message-ID: Other than converting between UTFs, is bit arithmetic commonly performed on Unicode characters? I was under the impression that it's a rarity if it is done at all. I've been working on a proposal for additional chess symbols used in chess problems and variant games, and I've been in communication with the World Federation for Chess Composition, which is the international organization in charge of chess problems. We have agreement on the repertoire and the text of the proposal, but the arrangement of the proposed characters within the new block is a sticking point. Some representatives of the WFCC have proposed alternate arrangements that assume there will be a need for bitwise operations to convert between the existing chess symbols in the Miscellaneous Symbols block and related symbols in the new block. I don't see the need but maybe I'm missing something. -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.paeper at crissov.de Thu Oct 6 14:48:25 2016 From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=) Date: Thu, 6 Oct 2016 21:48:25 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> Message-ID: Jukka K. Korpela : > > … the solution is to use just “3ème”, perhaps with some method (“above” the character level) used to format the letters as superscript, when not limited to plain text …
For ordinal numbers, it's relatively simple to code language-dependent glyph substitution in OpenType, which would not require any additional effort from the author: “3ème” would just work, while “3e” → “3ᵉ” would require some extra care to avoid false positives. Letter-only abbreviations, however, would only work reliably with an added marker. Many languages conventionally written in the Roman script, including English, choose an apostrophe, but inter-letter periods are also not unheard of. That means “M’me” and “M.me” could also be easily converted to “Mᵐᵉ” on a font/glyph level. If the OTF feature used is supported and active, this will work in plain-text environments, but, of course, it depends on the font. From doug at ewellic.org Thu Oct 6 15:01:01 2016 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Oct 2016 13:01:01 -0700 Subject: Dealing with Unencodeable Characters Message-ID: <20161006130101.665a7a7059d7ee80bb4d670165c8327d.becafe8546.wbe@email03.godaddy.com> Philippe Verdy wrote: > The 3 glyphs for the Earth globe (centered on the Americas, on > Europe+Africa, or on South/East Asia+Australia) are not distinguished at > all in Unicode (I've not seen any sequence with variation selectors to > help distinguish them, 0xFC through 0xFE in Webdings are: 1F30D;EARTH GLOBE EUROPE-AFRICA;So;0;ON;;;;;N;;;;; 1F30F;EARTH GLOBE ASIA-AUSTRALIA;So;0;ON;;;;;N;;;;; 1F30E;EARTH GLOBE AMERICAS;So;0;ON;;;;;N;;;;; I was asked not to publish my mapping tables (which were taken from one of the final versions of the proposal) because they wouldn't have been provided directly by Microsoft. But let me know if you need any additional mappings on a one-off basis. *All glyphs in the Wingdings and Webdings fonts have had actual Unicode mappings since version 7.0 (June 2014).* > and there are some fonts showing the Earth globe centered on the > Antarctic).
Sorry, I must have missed the part in http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0058.html where you were talking about that. -- Doug Ewell | Thornton, CO, US | ewellic.org From charupdate at orange.fr Thu Oct 6 15:12:24 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 6 Oct 2016 22:12:24 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> <3185cc2d-d397-c46b-3b7a-5aaca74ed38e@att.net> Message-ID: <2088087407.16544.1475784744365.JavaMail.www@wwinf1f05> On Thu, 6 Oct 2016 21:20:22 +0300, Jukka K. Korpela wrote: > In a sense, superscript code points make this easier: the rendering can > simply pick up the corresponding glyph for the font – if it has one (a > big “if”). But this is not a good argument in favor of adding such > points en masse. It is, however, a good argument in favor of using > existing superscript code points that have good font support. The topic was mainly about completing the Latin alphabet with the missing superscript (lowercase) and subscript characters, and possibly small caps. As for me and many others, we were not asking for more than that. And IMHO, this is not too much to ask, after the many styled mathematical alphabets (bold, italic, script, fraktur, double-struck, and so on). And no diacriticised letters are required as superscripts to fully support the French language in Unicode. I like very much your recommendation of *simplicity.* On a web page or so, you can do a lot with CSS. On the other hand, every language should be able to be written in plain text following its specificities. For French, that means that superscripts as abbreviation indicators are required in plain text.
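The plain-text abbreviation indicators Marcel has in mind can be flattened back to baseline letters mechanically, because the existing modifier letters carry <super> compatibility decompositions that NFKC folds away (a minimal sketch, not from the thread, assuming Python's standard `unicodedata` module; “Mᵐᵉ” is the French Madame example discussed here):

```python
import unicodedata

# U+1D50 MODIFIER LETTER SMALL M and U+1D49 MODIFIER LETTER SMALL E carry
# <super> compatibility decompositions in the UCD, so NFKC normalization
# folds a plain-text abbreviation such as "M\u1d50\u1d49" (Mᵐᵉ) down to
# the baseline letters "Mme".
def flatten_superscripts(text: str) -> str:
    return unicodedata.normalize("NFKC", text)
```

The reverse direction (deciding which baseline letters should become superscripts again) is the hard part this argument turns on; no normalization form recovers it.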
This is not a pressing need for digits, just as it isn't in English. But it is in French for titles, common nouns, and so on. One other advantage of plain-text abbreviations with superscripts is that you are able to search-and-replace the indicators with formatted baseline letters when the layout is made up. The reverse is much harder, if not impossible once the formatting is lost. It's about the stability of the writing system. The French recommendation is *not* to use long ordinal indicators, only one or exceptionally two letters. What can be called “a hack” is using the degree sign to ape a superscript small o. This very year 2016, there *can* be an end to those workarounds, since finally, our country is about to be given several *official*, decent keyboards (keyboard layouts). Regards, Marcel From charupdate at orange.fr Thu Oct 6 15:19:35 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 6 Oct 2016 22:19:35 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <8E654A01-688D-4F5C-9BAC-B7B209BFDAE5@northwestern.edu> Message-ID: <541186650.16609.1475785175378.JavaMail.www@wwinf1f05> On Wed, 5 Oct 2016 06:35:52 +0000, Martin Mueller wrote: […] > That said, given that alphabets have fixed numbers, it's weird > that bits of super and subscripted letters appear in this or > that limited range but that you can't cobble a whole alphabet > together in a consistent manner. Indeed your point looked good to me, and it does again. Here's why: > If any, why not all, especially > if there are only two or three dozen. Phonetics typically use Latin script as a basis.
Just as mathematics uses bold, italic, script, sans-serif and double-struck, phonetics uses superscript, subscript, and small caps. From a Unicode viewpoint, phonetics is no less important than mathematics. Mathematicians have been granted more than a dozen complete or near-complete alphabets of preformatted characters. Phoneticists have never been granted any complete alphabet. They must always prove their needs in detail, whereas mathematicians have full liberty in choosing variables. According to my hypothesis, and while waiting, I believe that the intent of the gap kept in the superscript lowercase range is to maintain a limitation on the performance of plain text. I don't see very well how to apply Hanlon's razor here, because there seems to be a strong unwillingness to see people getting keyboards that allow them to write in plain text without being bound to high-end software. The goal seems to be to keep users dependent on a special formatting feature and to draw them away from simplicity. This results clearly from the weird arguments that were thrown against the proposal of *MODIFIER LETTER SMALL Q. The comment on behalf of Adobe bore only a slight resemblance to a comment on the proposal as such, […]. Trying to sum up: by encoding these few characters, there would indeed be a door thrown wide open. However, it has then been pointed out that there would be *no rush* through that door. Regards, Marcel From verdy_p at wanadoo.fr Thu Oct 6 15:22:58 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Oct 2016 22:22:58 +0200 Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> Message-ID: 2016-10-06 21:48 GMT+02:00 Christoph Päper : > > For ordinal numbers, it's relatively simple to code language-dependent > glyph substitution in OpenType, which would not require any additional > effort from the author: “3ème” would just work, while “3e” → “3ᵉ” would require > some extra care to avoid false positives. Letter-only abbreviations, > however, would only work reliably with an added marker. Many languages > conventionally written in the Roman script, including English, > choose an apostrophe, but inter-letter periods are also not unheard of. > That means “M’me” and “M.me” could also be easily converted to “Mᵐᵉ” on a > font/glyph level. If the OTF feature used is supported and active, this > will work in plain-text environments, but, of course, it depends on the > font. > The *standard* French abbreviation for Madame is NOT "M'me" or "M.me" but "Mme", without confusion; the superscript on the final letters "me" is optional. False positives on "3e" are extremely rare, and writing it as “3ᵉ” does not change the isolated ambiguities that could exist with a custom numbering. (For numbering section headers, the title is separated by punctuation, or there is some context, such as its presence in a numbered list, the presence of explicit words such as articles ("le 3e"), and the grammatical syntax of sentences.) But if semantics is your issue, we could insert an invisible Unicode mark of abbreviation (notably an invisible abbreviation dot, which may be rendered as a dot in some contexts where distinctions by styles cannot be used, or could be rendered by using superscripts for the letters glued after it).
We have such characters for mathematics: the invisible addition and invisible multiplication marks (to disambiguate cases in formulas, such as a number followed by a fraction: does "3 1/2" mean 3.5 or 1.5?). -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Thu Oct 6 16:00:15 2016 From: asmusf at ix.netcom.com (Asmus Freytag (c)) Date: Thu, 6 Oct 2016 14:00:15 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: Message-ID: <588d7cd6-4037-218a-5c32-3d2ddc0e2c6d@ix.netcom.com> An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Thu Oct 6 16:07:00 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Oct 2016 23:07:00 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: Message-ID: As far as we know, arithmetic is performed only on:
- subsets of decimal digits, in ASCII and for a dozen scripts, converting automatically between them using a single additive constant for the 10 digits;
- Basic Latin/ASCII, for mapping lettercases and for mapping non-decimal digits (offsetting values of 10 and above into the letters A..Z after 0..9);
- the subset of precomposed syllables in Hangul (needed also for checking canonical equivalences and for the standard NFC/NFD normalizations, and partly for implementing NFKC/NFKD normalizations and collation);
- in all other cases, this is not reliable at all (characters may still be allocated in unused slots without any relation to case mappings, e.g. the slot in the basic Greek alphabet where the final sigma is only encoded in lowercase, or the Turkic distinction of dotted I and undotted i): you'll need proper mapping tables.
- for symbols which could benefit of it (such as box-drawing characters), it is not used, except for Braille patterns, or for mapping between black and white versions of chess pieces, or mapping between comparable mahjong tiles series in their basic set (but not necessarily with the same constant in extended sets, as it would have required allocating them in more columns than strictly needed), or for ASCII letters with mapping mathematical variants of Latin letters or RIS symbols or wide variants for CJK. 2016-10-06 21:44 GMT+02:00 Garth Wallace : > Other than converting between UTFs, is bit arithmetic commonly performed > on Unicode characters? I was under the impression that it's a rarity if it > is done at all. > > I've been working on a proposal for additional chess symbols used in chess > problems and variant games, and I've been in communication with the World > Federation for Chess Composition, which is the international organization > in charge of chess problems. We have agreement on the repertoire and the > text of the proposal, but the arrangement of the proposed characters within > the new block is a sticking point. Some representatives of the WFCC have > proposed alternate arrangements that assume there will be a need for > bitwise operations to covert between the existing chess symbols in the > Miscellaneous Symbols block and related symbols in the new block. I don't > see the need but maybe I'm missing something. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.paeper at crissov.de Thu Oct 6 16:08:52 2016 From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=) Date: Thu, 6 Oct 2016 23:08:52 +0200 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: References: <20161003144304.665a7a7059d7ee80bb4d670165c8327d.995051a4fd.wbe@email03.godaddy.com> <861342229.4994.1475577353789.JavaMail.www@wwinf1n25> <92360e6c-a3a8-28a6-e666-3d2612fee14b@it.aoyama.ac.jp> <283719302.9783.1475675868120.JavaMail.www@wwinf1f05> <451253030.1751.1475738472001.JavaMail.www@wwinf1f05> Message-ID: <543B726F-F559-44A3-9ACB-84261E77A7A2@crissov.de> Philippe Verdy : > > But if semantics is your issue, we could insert an invisible Unicode mark of abbreviation (notably an invisible abbreviation dot, which may be rendered as a dot in some contexts where distinctions by styles cannot be used, or could be rendered by using superscripts for the letters glued after it). Yes, the necessary marker I mentioned would not need to have a visible glyph. U+002E Full Stop and U+0027 Apostrophe or, preferably, U+2019 Right Single Quotation Mark (alias curly apostrophe) are just common choices in related languages and, of course, already exist. Some style guides allow or recommend omitting (some of) them: “e. g.”, “e.g.”, “eg.”, “eg”. In acronyms with non-initial capitals, in particular, they've almost died out, except in cases like “U.S.” vs. “US” vs. “ᴜꜱ” vs. “us” (next to “UK” and “UN”). U+2065 would be an obvious choice (coming right after Invisible Times, Separator and Plus). Possible names:
- Invisible Terminator (as in “Inc.”)
- Invisible Ellipsis (as in “Ltd”, “Mme”) alias Zero-Width Ellipsis
- Invisible Apostrophe (as in “Dos and Don'ts”)
- Invisible Full Stop (as in “L.L.C.”)
- Abbreviation Mark
- Contraction Mark
For “3ème” and “3e”, I could also imagine some XY Joiner character making the most sense. From kenwhistler at att.net Thu Oct 6 16:28:18 2016 From: kenwhistler at att.net (Ken Whistler) Date: Thu, 6 Oct 2016 14:28:18 -0700 Subject: Bit arithmetic on Unicode characters?
In-Reply-To: References: Message-ID: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> On 10/6/2016 12:44 PM, Garth Wallace wrote: > Some representatives of the WFCC have proposed alternate arrangements > that assume there will be a need for bitwise operations to convert > between the existing chess symbols in the Miscellaneous Symbols block > and related symbols in the new block. I don't see the need but maybe > I'm missing something. I don't think you are missing anything. Bitwise operations would certainly *not* be needed in a case like this. Small lookup and mapping tables would suffice. --Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From lorna_evans at sil.org Thu Oct 6 17:09:33 2016 From: lorna_evans at sil.org (Lorna Evans) Date: Thu, 6 Oct 2016 17:09:33 -0500 Subject: IJ with accent In-Reply-To: <57EB7849.3070908@yspu.org> References: <57EB7849.3070908@yspu.org> Message-ID: Has it been mentioned that U+0133 is not listed with the Soft_Dotted property? So, that would indicate it shouldn't have the dots removed when you do put an acute over U+0133. Lorna On 9/28/2016 2:59 AM, a.lukyanov wrote: > Dutch language writing uses the ligature IJ/ij (U+0132, U+0133). When > accented, it should take an accent on each component, like this: > > > > If one uses two separate characters (i+j), one can put an accent on > each character (íj́). > > However, if the monolithic ligature ij is used, how can one accent it > correctly? The Unicode standard does not answer this. > > Probably one should use the sequence U+0133 U+0301, with the accent > doubling automatically, but this is not implemented. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: image/png Size: 3608 bytes Desc: not available URL: From everson at evertype.com Thu Oct 6 18:01:17 2016 From: everson at evertype.com (Michael Everson) Date: Fri, 7 Oct 2016 00:01:17 +0100 Subject: IJ with accent In-Reply-To: References: <57EB7849.3070908@yspu.org> Message-ID: <55648164-7E66-40C2-8DB1-3D98E80A3EF2@evertype.com> On 6 Oct 2016, at 23:09, Lorna Evans wrote: > > Has it been mentioned that U+0133 is not listed in the Soft_Dotted properties? So, that would indicate it shouldn't have the dot removed when you do put an acute over U+0133. It ought to have that property. Michael Everson From richard.wordingham at ntlworld.com Thu Oct 6 18:32:39 2016 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Oct 2016 00:32:39 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: Message-ID: <20161007003239.5d1eee7b@JRWUBU2> On Thu, 6 Oct 2016 12:44:05 -0700 Garth Wallace wrote: > Other than converting between UTFs, is bit arithmetic commonly > performed on Unicode characters? I was under the impression that it's > a rarity if it is done at all. It's possible to use it for the bulk of case folding, especially if the program only supports a specific repertoire. For specialist tasks, exploiting arithmetic relationships make sense. I would expect that most ASCII clones are handled that way. The problem is that manually constructed lookup tables are prone to human error. Richard. From Shawn.Steele at microsoft.com Thu Oct 6 18:39:37 2016 From: Shawn.Steele at microsoft.com (Shawn Steele) Date: Thu, 6 Oct 2016 23:39:37 +0000 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007003239.5d1eee7b@JRWUBU2> References: <20161007003239.5d1eee7b@JRWUBU2> Message-ID: You can't even case Latin that way. Unless maybe you only care about English. 
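Shawn's caveat is easy to demonstrate (a minimal Python sketch, not part of the thread): bit arithmetic covers exactly the ASCII case pairs, and Latin-1 already breaks it.

```python
# ASCII upper/lower case pairs differ only in bit 0x20, so a bit trick
# works within a-z:
def ascii_upper(ch):
    return chr(ord(ch) & ~0x20) if "a" <= ch <= "z" else ch

# Outside a-z the trick misfires immediately:
#  - clearing 0x20 on U+00F7 DIVISION SIGN yields U+00D7 MULTIPLICATION
#    SIGN, which is not a letter at all;
#  - U+00FF ÿ uppercases to U+0178 Ÿ, nowhere near a 0x20 offset;
#  - U+00DF ß uppercases to the two-letter string "SS".
```

That last pair is also why a table built from the UCD, rather than arithmetic, is the usual implementation beyond ASCII.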
-----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard Wordingham Sent: Thursday, October 6, 2016 4:33 PM To: unicode at unicode.org Subject: Re: Bit arithmetic on Unicode characters? On Thu, 6 Oct 2016 12:44:05 -0700 Garth Wallace wrote: > Other than converting between UTFs, is bit arithmetic commonly > performed on Unicode characters? I was under the impression that it's > a rarity if it is done at all. It's possible to use it for the bulk of case folding, especially if the program only supports a specific repertoire. For specialist tasks, exploiting arithmetic relationships make sense. I would expect that most ASCII clones are handled that way. The problem is that manually constructed lookup tables are prone to human error. Richard. From kenwhistler at att.net Thu Oct 6 18:54:21 2016 From: kenwhistler at att.net (Ken Whistler) Date: Thu, 6 Oct 2016 16:54:21 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007003239.5d1eee7b@JRWUBU2> References: <20161007003239.5d1eee7b@JRWUBU2> Message-ID: On 10/6/2016 4:32 PM, Richard Wordingham wrote: > The > problem is that manually constructed lookup tables are prone to human > error. ... as are manually constructed algorithms that attempt to take advantage of sub-ranges of case pair adjacency in the Unicode code charts to do casing with bit arithmetic. --Ken From richard.wordingham at ntlworld.com Thu Oct 6 19:28:19 2016 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Oct 2016 01:28:19 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007003239.5d1eee7b@JRWUBU2> Message-ID: <20161007012819.684a22c6@JRWUBU2> On Thu, 6 Oct 2016 16:54:21 -0700 Ken Whistler wrote: > On 10/6/2016 4:32 PM, Richard Wordingham wrote: > > The > > problem is that manually constructed lookup tables are prone to > > human error. > > ... 
as are manually constructed algorithms that attempt to take > advantage of sub-ranges of case pair adjacency in the Unicode code > charts to do casing with bit arithmetic. Yes, it's a trade-off. The application I had in mind is converting between mathematical letter variants and their 'plain' forms. Perhaps there is just enough information in the UCD to allow exhaustive, automated tests. For _simple_ case folding, algorithmic case folding can be expanded to a list of range tests, generalising what is often done for ASCII. Obviously the testing should be repeated with each new version of Unicode, which is straightforward if the case folding is compliant with Unicode. (Turkish would be a reason for not being compliant.) Richard. From Shawn.Steele at microsoft.com Thu Oct 6 19:42:08 2016 From: Shawn.Steele at microsoft.com (Shawn Steele) Date: Fri, 7 Oct 2016 00:42:08 +0000 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007012819.684a22c6@JRWUBU2> References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: Presumably a table-based approach would merely require rerunning the table-building script from the UCD when new versions were released. -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard Wordingham Sent: Thursday, October 6, 2016 5:28 PM To: unicode at unicode.org Subject: Re: Bit arithmetic on Unicode characters? On Thu, 6 Oct 2016 16:54:21 -0700 Ken Whistler wrote: > On 10/6/2016 4:32 PM, Richard Wordingham wrote: > > The > > problem is that manually constructed lookup tables are prone to > > human error. > > ... as are manually constructed algorithms that attempt to take > advantage of sub-ranges of case pair adjacency in the Unicode code > charts to do casing with bit arithmetic. Yes, it's a trade-off. The application I had in mind is converting between mathematical letter variants and their 'plain' forms. 
Perhaps there is just enough information in the UCD to allow exhaustive, automated tests. For _simple_ case folding, algorithmic case folding can be expanded to a list of range tests, generalising what is often done for ASCII. Obviously the testing should be repeated with each new version of Unicode, which is straightforward if the case folding is compliant with Unicode. (Turkish would be a reason for not being compliant.) Richard. From oren.watson at gmail.com Thu Oct 6 20:18:15 2016 From: oren.watson at gmail.com (Oren Watson) Date: Thu, 6 Oct 2016 21:18:15 -0400 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007012819.684a22c6@JRWUBU2> References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: That application is hindered by the fact that ?????????????????????????????????????????????? are unallocated characters, forming gaps in the otherwise contiguous mathematical alphabets. On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham < richard.wordingham at ntlworld.com> wrote: > On Thu, 6 Oct 2016 16:54:21 -0700 > Ken Whistler wrote: > > > On 10/6/2016 4:32 PM, Richard Wordingham wrote: > > > The > > > problem is that manually constructed lookup tables are prone to > > > human error. > > > > ... as are manually constructed algorithms that attempt to take > > advantage of sub-ranges of case pair adjacency in the Unicode code > > charts to do casing with bit arithmetic. > > Yes, it's a trade-off. The application I had in mind is converting > between mathematical letter variants and their 'plain' forms. Perhaps > there is just enough information in the UCD to allow exhaustive, > automated tests. > > For _simple_ case folding, algorithmic case folding can be expanded to > a list of range tests, generalising what is often done for ASCII. > Obviously the testing should be repeated with each new version of > Unicode, which is straightforward if the case folding is compliant with > Unicode. 
(Turkish would be a reason for not being compliant.) > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lang.support at gmail.com Thu Oct 6 21:11:32 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Fri, 7 Oct 2016 13:11:32 +1100 Subject: font-encoded hacks Message-ID: Considering the mess that adhoc fonts create. What is the best way forward? Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? Most government translations I am seeing in Australia for Burmese are in Zawgyi, while most of the Sgaw Karen translations are routinely in legacy 8-bit fonts. Andrew On Friday, 7 October 2016, Ken Whistler wrote: > By the way, the biggest ongoing problem we deal with here is the continuing urge to proliferate font-encoded hacks for particular languages and writing systems. The text interchange problems that such schemes pose on an ongoing basis far far outweigh issues like what to do with a Shibuya 109 emoji, imo. -- Andrew Cunningham lang.support at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From duerst at it.aoyama.ac.jp Fri Oct 7 01:08:23 2016 From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J._D=c3=bcrst?=) Date: Fri, 7 Oct 2016 15:08:23 +0900 Subject: font-encoded hacks In-Reply-To: References: Message-ID: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Hello Andrew, On 2016/10/07 11:11, Andrew Cunningham wrote: > Considering the mess that adhoc fonts create. What is the best way forward? That's very clear: Use Unicode. > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? > > Most government translations I am seeing in Australia for Burmese are in > Zawgyi, while most of the Sgaw Karen translations are routinely in legacy > 8-bit fonts. Why don't you tell the Australian government? Regards, Martin. 
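The range-test approach to simple case folding described in the case-folding thread above (generalising what is often done for ASCII) can be sketched as follows. This is a minimal illustration only, not a compliant implementation: it hard-codes three contiguous ranges, whereas a real implementation would generate its ranges and exception lists from CaseFolding.txt in the UCD, and regenerate them for each new Unicode version.

```python
# Sketch: simple case folding via range tests and bit arithmetic,
# generalising the classic ASCII trick (OR-ing in 0x20 lowercases A-Z).
# Covers only three contiguous ranges for illustration; a compliant
# implementation must be generated from the UCD's CaseFolding.txt.
def simple_fold(cp: int) -> int:
    if 0x41 <= cp <= 0x5A:                    # ASCII A-Z
        return cp | 0x20
    if 0xC0 <= cp <= 0xDE and cp != 0xD7:     # Latin-1 A-grave..Thorn, excluding the multiplication sign
        return cp | 0x20
    if 0x391 <= cp <= 0x3A9 and cp != 0x3A2:  # Greek Alpha..Omega, skipping reserved U+03A2
        return cp + 0x20                      # plain addition; OR would fail above U+039F
    return cp
```

Note that the Greek range already needs addition rather than an OR, which is exactly the kind of hand-constructed detail that, as Ken points out, is prone to human error unless verified exhaustively against the UCD.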
From moyogo at gmail.com Fri Oct 7 01:42:13 2016 From: moyogo at gmail.com (Denis Jacquerye) Date: Fri, 07 Oct 2016 06:42:13 +0000 Subject: font-encoded hacks In-Reply-To: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: In many cases people resort to these hacks because it is an easier short term solution. All they have to do is use a specific font. They don't have to switch or find and install a keyboard layout and they don't have to upgrade to an OS that supports their script with Unicode properly. Because of these short term solutions it's hard for a switch to Unicode to gain proper momentum. Unfortunately, not everybody sees the long term benefit, or often they see it but cannot do it practically. Too often Unicode compliant fonts or keyboard layouts have been lacking or at least have taken much longer to be implemented. One could wonder if a technical group for keyboard layouts would help this process. On Fri, Oct 7, 2016, 07:12 Martin J. Dürst wrote: > Hello Andrew, > > On 2016/10/07 11:11, Andrew Cunningham wrote: > > Considering the mess that adhoc fonts create. What is the best way > forward? > > That's very clear: Use Unicode. > > > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? > > > > Most government translations I am seeing in Australia for Burmese are in > > Zawgyi, while most of the Sgaw Karen translations are routinely in legacy > > 8-bit fonts. > > Why don't you tell the Australian government? > > Regards, Martin. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at macchiato.com Fri Oct 7 01:54:00 2016 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Fri, 7 Oct 2016 08:54:00 +0200 Subject: font-encoded hacks In-Reply-To: References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: We do provide data for keyboard mappings in CLDR ( http://unicode.org/cldr/charts/latest/keyboards/index.html). There are some further pieces we need to put into place.

1. Provide a bulk uploader that applies our sanity-checking tests for a proposed keyboard mapping, and provides real-time feedback to users about the problems they need to fix.
2. Provide code that converts from the CLDR format into the major platforms' formats (we have the reverse direction already).
3. (Optional) Prettier charts!

Mark On Fri, Oct 7, 2016 at 8:42 AM, Denis Jacquerye wrote: > In many cases people resort to these hacks because it is an easier short > term solution. All they have to do is use a specific font. They don't have > to switch or find and install a keyboard layout and they don't have to > upgrade to an OS that supports their script with Unicode properly. Because > of these short term solutions it's hard for a switch to Unicode to gain > proper momentum. Unfortunately, not everybody sees the long term benefit, > or often they see it but cannot do it practically. > > Too often Unicode compliant fonts or keyboard layouts have been lacking or > at least have taken much longer to be implemented. > One could wonder if a technical group for keyboard layouts would help > this process. > > On Fri, Oct 7, 2016, 07:12 Martin J. Dürst wrote: > >> Hello Andrew, >> >> On 2016/10/07 11:11, Andrew Cunningham wrote: >> > Considering the mess that adhoc fonts create. What is the best way >> forward? >> >> That's very clear: Use Unicode. >> >> > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? 
>> > >> > Most government translations I am seeing in Australia for Burmese are in >> > Zawgyi, while most of the Sgaw Karen translations are routinely in >> legacy >> > 8-bit fonts. >> >> Why don't you tell the Australian government? >> >> Regards, Martin. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Fri Oct 7 02:14:07 2016 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Oct 2016 08:14:07 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: <20161007081407.52a6fa5e@JRWUBU2> On Thu, 6 Oct 2016 21:18:15 -0400 Oren Watson wrote: > On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham < > richard.wordingham at ntlworld.com> wrote: > > Yes, it's a trade-off. The application I had in mind is converting > > between mathematical letter variants and their 'plain' forms. > > Perhaps there is just enough information in the UCD to allow > > exhaustive, automated tests. > That application is hindered by the fact that > > ?????????????????????????????????????????????? are unallocated > characters, forming gaps in the otherwise contiguous mathematical > alphabets. (Aside: That written statement is illegal! -:) Yep. It's a known nuisance, which is why I suggested exhaustive tests. My email client found a font to render U+1D547 as the unwary would expect, i.e. using a glyph suitable for ℙ U+2119 DOUBLE-STRUCK CAPITAL P. I was surprised when I first saw those gaps; I would have expected characters with appropriate singleton decompositions to protect the unwary. (The idea might have come up at the time of encoding, and been dismissed with reasons.) I don't know whether the font's misrendering is an accident or is deliberate partial protection of the victims of bad character code selection. 
An old application of arithmetic was transliteration between the major Indian Indic scripts. That falls foul of Tamil and of characters that were not represented in ISCII. Richard. From gwalla at gmail.com Fri Oct 7 02:27:47 2016 From: gwalla at gmail.com (Garth Wallace) Date: Fri, 7 Oct 2016 00:27:47 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: On Thu, Oct 6, 2016 at 5:42 PM, Shawn Steele wrote: > Presumably a table-based approach would merely require rerunning the > table-building script from the UCD when new versions were released. > For casing, sure, but that's not really relevant in this context, since Unicode doesn't really address chess piece properties like white/black beyond naming conventions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From haberg-1 at telia.com Fri Oct 7 03:43:44 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Fri, 7 Oct 2016 10:43:44 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007003239.5d1eee7b@JRWUBU2> <20161007012819.684a22c6@JRWUBU2> Message-ID: > On 7 Oct 2016, at 09:27, Garth Wallace wrote: > > Unicode doesn't really address chess piece properties like white/black beyond naming conventions. From the formal point of view, Unicode only assigns character numbers (code points), which only get a binary representation when encoded, as with UTF-8, which makes it agree with ASCII for small numbers. The math alphabetical letters are out of order because of legacy, but that is not a problem as one will use an interface that sorts it out. These numbers are only for display to humans, and computers are nowadays fast enough to sort it out. A chess program has its own, optimized representation anyway. So possibly you might add more properties. 
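The conversion Richard has in mind, between mathematical letter variants and their 'plain' forms, can be sketched as offset arithmetic patched by an exception table for the gaps Oren mentions. The listing below is a hedged sketch under stated assumptions: it covers only the italic lowercase alphabet and its single hole, and the names are illustrative; a real converter needs every alphabet and all of the exceptional code points, verified by exhaustive automated tests against the UCD.

```python
# Sketch: mapping a Mathematical Alphanumeric Symbols letter back to
# its plain ASCII form by offset arithmetic, with an exception table
# for code points that were already encoded elsewhere (the "gaps").
# Only the italic lowercase range is shown.
ITALIC_A = 0x1D44E                 # MATHEMATICAL ITALIC SMALL A
HOLES = {0x1D455: 0x210E}          # reserved slot for italic h -> U+210E PLANCK CONSTANT

def to_plain(cp: int) -> str:
    if ITALIC_A <= cp <= ITALIC_A + 25 and cp not in HOLES:
        return chr(ord('a') + cp - ITALIC_A)
    for hole, actual in HOLES.items():   # letters encoded outside the block
        if cp == actual:
            return chr(ord('a') + hole - ITALIC_A)
    return chr(cp)                       # anything else passes through unchanged
```

Note that the unassigned code point U+1D455 itself deliberately passes through unchanged here, rather than being "helpfully" folded the way the misrendering font treats U+1D547.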
From neil at tonal.clara.co.uk Fri Oct 7 05:59:42 2016 From: neil at tonal.clara.co.uk (Neil Harris) Date: Fri, 7 Oct 2016 11:59:42 +0100 Subject: font-encoded hacks In-Reply-To: References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: <979ca47c-bc82-41ed-5ec8-9d29658791d5@tonal.clara.co.uk> On 07/10/16 07:42, Denis Jacquerye wrote: > In many cases people resort to these hacks because it is an easier short term > solution. All they have to do is use a specific font. They don't have to > switch or find and install a keyboard layout and they don't have to upgrade > to an OS that supports their script with Unicode properly. Because of these > short term solutions it's hard for a switch to Unicode to gain proper > momentum. Unfortunately, not everybody sees the long term benefit, or often > they see it but cannot do it practically. > > Too often Unicode compliant fonts or keyboard layouts have been lacking or > at least have taken much longer to be implemented. > One could wonder if a technical group for keyboard layouts would help this > process. What might also help is a reconceptualization of these hacks as being in effect non-standard character encodings: the existing software infrastructure for handling charsets could then be co-opted to convert them to (and possibly from) Unicode if desired. Neil From doug at ewellic.org Fri Oct 7 11:06:31 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 09:06:31 -0700 Subject: Bit arithmetic on Unicode =?UTF-8?Q?characters=3F?= Message-ID: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> Richard Wordingham wrote: > Yes, it's a trade-off. The application I had in mind is converting > between mathematical letter variants and their 'plain' forms. Long-time list members might remember a Windows utility I wrote to convert between normal Unicode text and Mathematical Alphanumeric Symbols. 
Andrew West (of BabelPad fame) has a similar, web-based app that also supports things like small caps and superscript. Both of these use lookup tables to do the conversions, and use algorithms only for very broad-based operations, like distinguishing the Latin-letter range in the MAS block from the Greek letters and the digits. There's no practical value in implementing conversions like this algorithmically. Maybe if there were one or two exceptions in the MAS range instead of two dozen, it might be different. > Perhaps there is just enough information in the UCD to allow > exhaustive, automated tests. I can't find anything in the UCD that distinguishes one "font variant" from another (UnicodeData.txt shown as an example):

1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D49C;MATHEMATICAL SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D4D0;MATHEMATICAL BOLD SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D504;MATHEMATICAL FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D538;MATHEMATICAL DOUBLE-STRUCK CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D56C;MATHEMATICAL BOLD FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D5A0;MATHEMATICAL SANS-SERIF CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D5D4;MATHEMATICAL SANS-SERIF BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D608;MATHEMATICAL SANS-SERIF ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D63C;MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D670;MATHEMATICAL MONOSPACE CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;

And that's probably as it should be, because UTC never intended MAS to be readily transformed to and from "plain" characters. They're supposed to be used for mathematical expressions in which styled letters have special meaning. (My utility, and I'm sure Andrew's, were written entirely tongue-in-cheek.) > My email client found a font to render U+1D547 as the unwary > would expect, i.e. using a glyph suitable for ℙ 
U+2119 DOUBLE-STRUCK > CAPITAL P. I was surprised when I first saw those gaps; I would have > expected characters with appropriate singleton decompositions to protect > the unwary. (The idea might have come up at the time of encoding, and > been dismissed with reasons.) Unifying identical characters with identical meanings, rather than creating pointless duplicates, was a major design tenet of Unicode. > I don't know whether the font's misrendering is an accident or is > deliberate partial protection of the victims of bad character code > selection. Either way, it's a bug. Users who try to render an unassigned code point should not be "protected" by showing them a glyph that the font designer thought should be there. They should be shown a .notdef glyph so they know something is wrong. -- Doug Ewell | Thornton, CO, US | ewellic.org From doug at ewellic.org Fri Oct 7 11:22:21 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 09:22:21 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Marcel Schneider wrote: > According to my hypothesis and while waiting, I believe that > the intent of the gap kept in the superscript lowercase range, > is to maintain a limitation to the performance of plain text. > I don't see very well how to apply Hanlon's razor here, because > there seems to be a strong unwillingness to see people getting > keyboards that allow them to write in plain text without being > bound to high-end software. The goal seems to be to keep the users > dependent on a special formatting feature and to draw them away > from simplicity. Hanlon's Razor doesn't apply here, because it's not a dichotomy between malice and stupidity. Unicode has a particular definition of what constitutes "plain text," and it's become evident over the past 25 years that some people have different definitions. 
That's probably never going to change (I personally don't believe the difference between black-and-white pictures of cows and color pictures of cows is a plain-text distinction), but the Unicode definition is really the one that matters in discussions like this. What doesn't help, IMHO, is to claim that UTC has some ulterior motive to restrict the applicability of plain text and manipulate users and "draw them away from simplicity." I think insinuations of evil intent need to be better-founded than that. -- Doug Ewell | Thornton, CO, US | ewellic.org From haberg-1 at telia.com Fri Oct 7 11:57:02 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Fri, 7 Oct 2016 18:57:02 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> References: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> Message-ID: <20119351-749E-4B33-8A07-79C592810CE0@telia.com> > On 7 Oct 2016, at 18:06, Doug Ewell wrote: > I can't find anything in the UCD that distinguishes one "font variant" > from another (UnicodeData.txt shown as an example):
>
> 1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D49C;MATHEMATICAL SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D4D0;MATHEMATICAL BOLD SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D504;MATHEMATICAL FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D538;MATHEMATICAL DOUBLE-STRUCK CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D56C;MATHEMATICAL BOLD FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D5A0;MATHEMATICAL SANS-SERIF CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D5D4;MATHEMATICAL SANS-SERIF BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D608;MATHEMATICAL SANS-SERIF ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D63C;MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 
1D670;MATHEMATICAL MONOSPACE CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;; > > And that's probably as it should be, because UTC never intended MAS to > be readily transformed to and from "plain" characters. They're supposed > to be used for mathematical expressions in which styled letters have > special meaning. I use them for input text files, and it is not particularly difficult. An efficient method is to use text substitutions, as available on MacOS. The resulting file is UTF-8 with the correct character, and typesetting systems like LuaTeX with ConTeXt or LaTeX/unicode-math translate it into a PDF. It is usually easy to immediately spot if a math style is wrong. Using it in the input makes one more aware of new styles that in the past were not available. From oren.watson at gmail.com Fri Oct 7 13:25:43 2016 From: oren.watson at gmail.com (Oren Watson) Date: Fri, 7 Oct 2016 14:25:43 -0400 Subject: Fwd: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: Would it be appropriate to submit an omnibus proposal for encoding all remaining English letters in subscript, small caps, and superscript in the SMP for the purpose of not arbitrarily constraining the use of Unicode for new linguistic theories and ideas, similar to the mathematical characters?

superscripted: CFQXYZ, q
subscript: A-Z, bcdfgqwyz
small capital: QX
total: 44 characters.

On Fri, Oct 7, 2016 at 12:22 PM, Doug Ewell wrote: > Marcel Schneider wrote: > > > According to my hypothesis and while waiting, I believe that > > the intent of the gap kept in the superscript lowercase range, > > is to maintain a limitation to the performance of plain text. > > I don't see very well how to apply Hanlon's razor here, because > > there seems to be a strong unwillingness to see people getting > > keyboards that allow them to write in plain text without being > > bound to high-end software. 
The goal seems to be to keep the users > > dependent on a special formatting feature and to draw them away > > from simplicity. > > Hanlon's Razor doesn't apply here, because it's not a dichotomy between > malice and stupidity. > > Unicode has a particular definition of what constitutes "plain text," > and it's become evident over the past 25 years that some people have > different definitions. That's probably never going to change (I > personally don't believe the difference between black-and-white pictures > of cows and color pictures of cows is a plain-text distinction), but the > Unicode definition is really the one that matters in discussions like > this. > > What doesn't help, IMHO, is to claim that UTC has some ulterior motive > to restrict the applicability of plain text and manipulate users and > "draw them away from simplicity." I think insinuations of evil intent > need to be better-founded than that. > > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From everson at evertype.com Fri Oct 7 13:33:09 2016 From: everson at evertype.com (Michael Everson) Date: Fri, 7 Oct 2016 19:33:09 +0100 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: On 7 Oct 2016, at 19:25, Oren Watson wrote: > > Would it be appropriate to submit an omnibus proposal for encoding all remaining English letters in subscript, small caps, and superscript in the SMP for the purpose of not arbitrarily constraining the use of Unicode for new linguistic theories and ideas, similar to the mathematical characters? > > superscripted: CFQXYZ, q I'd support these. > subscript: A-Z, bcdfgqwyz If NONE of the letters A-Z have been subscripted there's not much reason to think that's common or useful. I'd support bcdfgqwyz > small capital: QX Small capital Q is under ballot. 
The subscript Greek alpha had a very good rationale recently. Michael Everson From doug at ewellic.org Fri Oct 7 13:47:49 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 11:47:49 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161007114749.665a7a7059d7ee80bb4d670165c8327d.49430e8579.wbe@email03.godaddy.com> Oren Watson wrote: > Would it be appropriate to submit an omnibus proposal for encoding all > remaining English letters in subscript, small caps, and superscript in > the SMP for the purpose of not arbitrarily constraining the use of > Unicode for new linguistic theories and ideas, similar to the > mathematical characters? "For new theories and ideas" is a red flag. For letters in writing systems, it's traditionally been important to show how the character(s) would be used in current, real-world scenarios, not for some future, as-yet unknown purpose. It's likely that the proposals to add the existing subscript and superscript and smallcap letters were required to include such rationales. Using the math alphabets as a precedent for encoding something might not be an effective strategy, as they are often considered to be exceptional and not analogous to characters used for writing human languages. -- Doug Ewell | Thornton, CO, US | ewellic.org From kenwhistler at att.net Fri Oct 7 13:53:16 2016 From: kenwhistler at att.net (Ken Whistler) Date: Fri, 7 Oct 2016 11:53:16 -0700 Subject: Fwd: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: On 10/7/2016 11:25 AM, Oren Watson wrote: > Would it be appropriate to submit an omnibus proposal for encoding all > remaining English letters in subscript, small caps, and superscript in > the SMP for the purpose of not arbitrarily constraining the use of > Unicode for new linguistic theories and ideas, similar to the > mathematical characters? 
I don't see that the use of Unicode characters for new linguistic theories and ideas is arbitrarily constrained as it stands. So no, I don't think it makes sense to submit such a proposal on spec. I don't understand people's fascination with multiplying the encoding of the Latin alphabet A-Z over and over and over again. Modifier letters are different from the mathematical styled alphabets -- modifier letters include many letters and symbols beyond A-Z, and there isn't any clear marginal benefit in trying to "complete" their set somehow by filling in Latin alphabet encoding gaps without clear use cases. --Ken From oren.watson at gmail.com Fri Oct 7 14:32:16 2016 From: oren.watson at gmail.com (Oren Watson) Date: Fri, 7 Oct 2016 15:32:16 -0400 Subject: Fwd: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: Hmm... "filling in Latin alphabet encoding gaps without clear use cases" is exactly what was done for the blackboard bold letters. I scarcely think that a use case was submitted for every one of the blackboard bold etc letters in the mathematical set; merely the use of blackboard bold for a general purpose of denoting sets such as the naturals, reals, complex numbers etc, and the fact that arbitrary letters might be used if a mathematician desired, seems to have sufficed. I believe the same logic applies to the case of linguistics, where the use of superscripts is a common convention. On Fri, Oct 7, 2016 at 2:53 PM, Ken Whistler wrote: > > > On 10/7/2016 11:25 AM, Oren Watson wrote: > >> Would it be appropriate to submit an omnibus proposal for encoding all >> remaining English letters in subscript, small caps, and superscript in the >> SMP for the purpose of not arbitrarily constraining the use of Unicode for >> new linguistic theories and ideas, similar to the mathematical characters? 
>> >> > I don't see that the use of Unicode characters for new linguistic theories > and ideas is arbitrarily constrained as it stands. So no, I don't think it > makes sense to submit such a proposal on spec. I don't understand people's > fascination with multiplying the encoding of the Latin alphabet A-Z over > and over and over again. Modifier letters are different from the > mathematical styled alphabets -- modifier letters include many letters and > symbols beyond A-Z, and there isn't any clear marginal benefit in trying to > "complete" their set somehow by filling in Latin alphabet encoding gaps > without clear use cases. > > --Ken > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lang.support at gmail.com Fri Oct 7 15:54:11 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Sat, 8 Oct 2016 07:54:11 +1100 Subject: font-encoded hacks In-Reply-To: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: On 7 Oct 2016 17:08, "Martin J. Dürst" wrote: > > Hello Andrew, > > > On 2016/10/07 11:11, Andrew Cunningham wrote: >> >> Considering the mess that adhoc fonts create. What is the best way forward? > > > That's very clear: Use Unicode. > LOL, thanks Martin. That has been my position for a long time. > >> Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? >> >> Most government translations I am seeing in Australia for Burmese are in >> Zawgyi, while most of the Sgaw Karen translations are routinely in legacy >> 8-bit fonts. > > > Why don't you tell the Australian government? Easier to tell the state governments than the Federal government. But it is something I am working on. > > Regards, Martin. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lang.support at gmail.com Fri Oct 7 16:22:16 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Sat, 8 Oct 2016 08:22:16 +1100 Subject: font-encoded hacks In-Reply-To: References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: Hi Denis, In some ways, it was easier. But looking at each language, the issues seem to have a slightly different slant. Sgaw Karen is interesting in comparison to Burmese. There is some use of the hacked Zwekabin font by bloggers, but most content and key media still use 8-bit fonts, with little use of Unicode. The lack of uptake of Unicode fonts seems to lie in the fact that the default rendering for most Myanmar script fonts is Burmese. If Sgaw Karen etc. are supported, it is via locl features. If a Sgaw Karen user is using the font in software where they can't control the necessary OpenType features, or don't know they can and need to ... you will eventually get a perception that their language isn't supported. There are font developers among the Burmese, Mon, and Shan ethnic groups developing Unicode fonts tailored for their needs. The Burmese situation is quite different, and a topic that I have discussed often with Burmese colleagues. I have my theories. But the current resurgence of Zawgyi very much depends on the ability of mobile devices to render Myanmar Unicode, and the choices telcos and handset manufacturers make regarding system fonts. Regarding keyboards, it is interesting comparing Khmer and Burmese. Uptake of Unicode was earlier and quicker for Khmer. When Khmer keyboards were developed, the Khmer developers chose to live with the severe limitations of system-level input frameworks. It is only this year that I have started to see truly innovative research into what a Khmer input system should be. Burmese Unicode developers on the other hand were never satisfied with those limitations, and various developers looked into alternatives. 
Andrew On 7 Oct 2016 17:42, "Denis Jacquerye" wrote: > > In many cases people resort to these hacks because it is an easier short term solution. All they have to do is use a specific font. They don't have to switch or find and install a keyboard layout and they don't have to upgrade to an OS that supports their script with Unicode properly. Because of these short term solutions it's hard for a switch to Unicode to gain proper momentum. Unfortunately, not everybody sees the long term benefit, or often they see it but cannot do it practically. > > Too often Unicode compliant fonts or keyboard layouts have been lacking or at least have taken much longer to be implemented. > One could wonder if a technical group for keyboard layouts would help this process. > > > On Fri, Oct 7, 2016, 07:12 Martin J. Dürst wrote: >> >> Hello Andrew, >> >> On 2016/10/07 11:11, Andrew Cunningham wrote: >> > Considering the mess that adhoc fonts create. What is the best way forward? >> >> That's very clear: Use Unicode. >> >> > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? >> > >> > Most government translations I am seeing in Australia for Burmese are in >> > Zawgyi, while most of the Sgaw Karen translations are routinely in legacy >> > 8-bit fonts. >> >> Why don't you tell the Australian government? >> >> Regards, Martin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lang.support at gmail.com Fri Oct 7 16:26:40 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Sat, 8 Oct 2016 08:26:40 +1100 Subject: font-encoded hacks In-Reply-To: References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> Message-ID: Hi Mark, The converters would be interesting to see, and would be personally useful to me. But the type of keyboard layouts and input frameworks reflected in CLDR have limited bearing on issues related to the uptake of Unicode for Myanmar script. Andrew On 7 Oct 2016 17:54, "Mark Davis ☕️" 
wrote: > We do provide data for keyboard mappings in CLDR (http://unicode.org/cldr/ > charts/latest/keyboards/index.html). There are some further pieces we > need to put into place. > > 1. Provide a bulk uploader that applies our sanity-checking tests for > a proposed keyboard mapping, and provides real-time feedback to users about > the problems they need to fix. > 2. Provide code that converts from the CLDR format into the major > platforms' formats (we have the reverse direction already). > 3. (Optional) Prettier charts! > > > Mark > > On Fri, Oct 7, 2016 at 8:42 AM, Denis Jacquerye wrote: > >> In my case people resort to these hacks because it is an easier >> short-term solution. All they have to do is use a specific font. They don't have >> to switch or find and install a keyboard layout and they don't have to >> upgrade to an OS that supports their script with Unicode properly. Because >> of these short-term solutions it's hard for a switch to Unicode to gain >> proper momentum. Unfortunately, not everybody sees the long-term benefit, >> or often they see it but cannot do it practically. >> >> Too often Unicode-compliant fonts or keyboard layouts have been lacking >> or at least have taken much longer to be implemented. >> One could wonder if a technical group for keyboard layouts would help >> this process. >> >> On Fri, Oct 7, 2016, 07:12 Martin J. Dürst >> wrote: >> >>> Hello Andrew, >>> >>> On 2016/10/07 11:11, Andrew Cunningham wrote: >>> > Considering the mess that ad hoc fonts create, what is the best way >>> forward? >>> >>> That's very clear: Use Unicode. >>> >>> > Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk? >>> > >>> > Most government translations I am seeing in Australia for Burmese are in >>> > Zawgyi, while most of the Sgaw Karen translations are routinely in >>> legacy >>> > 8-bit fonts. >>> >>> Why don't you tell the Australian government? >>> >>> Regards, Martin. 
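[Mark's item 2 (converting from the CLDR keyboard format into platform formats) starts from the CLDR keyboard XML. A minimal sketch of reading such a file follows; the `<keyMap>`/`<map iso="…" to="…"/>` element names follow the LDML keyboard format of that era, and the tiny inline document is illustrative rather than a real locale file, so treat the exact schema as an assumption and check the LDML keyboard spec before relying on it:]

```python
# Sketch: read key mappings from a CLDR-style keyboard XML document.
# The <keyMap><map iso="..." to="..."/> structure follows the LDML
# keyboard format; the snippet below is a made-up example, not a real
# CLDR locale file.
import xml.etree.ElementTree as ET

CLDR_SNIPPET = """
<keyboard locale="en-t-k0-sample">
  <keyMap>
    <map iso="C01" to="a"/>
    <map iso="C02" to="s"/>
    <map iso="C03" to="d"/>
  </keyMap>
</keyboard>
"""

def load_keymap(xml_text: str) -> dict:
    """Return a dict from ISO key position (e.g. 'C01') to output string."""
    root = ET.fromstring(xml_text)
    return {m.get("iso"): m.get("to")
            for m in root.findall("./keyMap/map")}

keymap = load_keymap(CLDR_SNIPPET)
print(keymap["C01"])  # a
```

A converter to a platform format would then iterate over this dict and emit the platform's own layout file; the hard part, as the thread notes, is everything the flat base layer cannot express.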
>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lang.support at gmail.com Fri Oct 7 16:35:58 2016 From: lang.support at gmail.com (Andrew Cunningham) Date: Sat, 8 Oct 2016 08:35:58 +1100 Subject: font-encoded hacks In-Reply-To: <979ca47c-bc82-41ed-5ec8-9d29658791d5@tonal.clara.co.uk> References: <1c980be4-3d1c-1737-f57c-03b8a5ad4ecc@it.aoyama.ac.jp> <979ca47c-bc82-41ed-5ec8-9d29658791d5@tonal.clara.co.uk> Message-ID: Hi Neil, I tend to prefer referring to them as Pseudo-Unicode solutions, rather than hacked fonts or ad hoc fonts, and differentiating them from legacy or 8-bit solutions. My preferred approach would be to treat them as a separate encoding. But I doubt that is likely to happen. It doesn't help that a mobile device I purchase in Australia will ship with a Unicode font installed, but the same device and model may ship with a non-Unicode font installed in Myanmar and potentially other parts of SE Asia. Andrew On 7 Oct 2016 22:04, "Neil Harris" wrote: > On 07/10/16 07:42, Denis Jacquerye wrote: > >> In my case people resort to these hacks because it is an easier >> short-term >> solution. All they have to do is use a specific font. They don't have to >> switch or find and install a keyboard layout and they don't have to >> upgrade >> to an OS that supports their script with Unicode properly. Because of >> these >> short-term solutions it's hard for a switch to Unicode to gain proper >> momentum. Unfortunately, not everybody sees the long-term benefit, or >> often >> they see it but cannot do it practically. >> >> Too often Unicode-compliant fonts or keyboard layouts have been lacking or >> at least have taken much longer to be implemented. >> One could wonder if a technical group for keyboard layouts would help >> this >> process. 
>> > > What might also help is a reconceptualization of these hacks as being in > effect non-standard character encodings: the existing software > infrastructure for handling charsets could then be co-opted to convert them > to (and possibly from) Unicode if desired. > > Neil > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Fri Oct 7 17:21:03 2016 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Oct 2016 23:21:03 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> References: <20161007090631.665a7a7059d7ee80bb4d670165c8327d.7700fa085f.wbe@email03.godaddy.com> Message-ID: <20161007232103.5e51b9bd@JRWUBU2> On Fri, 07 Oct 2016 09:06:31 -0700 "Doug Ewell" wrote: > Richard Wordingham wrote: > > Perhaps there is just enough information in the UCD to allow > > exhaustive, automated tests. > I can't find anything in the UCD that distinguishes one "font variant" > from another (UnicodeData.txt shown as an example): > 1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;; > 1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;; > 1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;; It's in that most treacherous of properties, the character's name. Richard. From doug at ewellic.org Fri Oct 7 17:31:00 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 15:31:00 -0700 Subject: Bit arithmetic on Unicode characters? Message-ID: <20161007153100.665a7a7059d7ee80bb4d670165c8327d.457ef7205b.wbe@email03.godaddy.com> Richard Wordingham wrote: >> I can't find anything in the UCD that distinguishes one "font >> variant" from another (UnicodeData.txt shown as an example): > > It's in that most treacherous of properties, the character's name. Well, "treacherous" is right. 
I'd hesitate to trust an algorithm to recognize PLANCK CONSTANT as the character name that logically fits between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I. -- Doug Ewell | Thornton, CO, US | ewellic.org From andrewcwest at gmail.com Fri Oct 7 17:41:08 2016 From: andrewcwest at gmail.com (Andrew West) Date: Fri, 7 Oct 2016 23:41:08 +0100 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <20161007153100.665a7a7059d7ee80bb4d670165c8327d.457ef7205b.wbe@email03.godaddy.com> References: <20161007153100.665a7a7059d7ee80bb4d670165c8327d.457ef7205b.wbe@email03.godaddy.com> Message-ID: On 7 October 2016 at 23:31, Doug Ewell wrote: > > Well, "treacherous" is right. I'd hesitate to trust an algorithm to > recognize PLANCK CONSTANT as the character name that logically fits > between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I. Well, it could be picked up from that most treacherous of Unicode data files http://www.unicode.org/Public/UNIDATA/NamesList.txt Andrew From oren.watson at gmail.com Fri Oct 7 17:48:41 2016 From: oren.watson at gmail.com (Oren Watson) Date: Fri, 7 Oct 2016 18:48:41 -0400 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <20161007153100.665a7a7059d7ee80bb4d670165c8327d.457ef7205b.wbe@email03.godaddy.com> Message-ID: Except that it states at the very start of that file "this file should not be parsed for machine-readable information." On Fri, Oct 7, 2016 at 6:41 PM, Andrew West wrote: > On 7 October 2016 at 23:31, Doug Ewell wrote: > > > > Well, "treacherous" is right. I'd hesitate to trust an algorithm to > > recognize PLANCK CONSTANT as the character name that logically fits > > between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I. > > Well, it could be picked up from that most treacherous of Unicode data > files http://www.unicode.org/Public/UNIDATA/NamesList.txt > > Andrew > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From doug at ewellic.org Fri Oct 7 17:52:53 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 07 Oct 2016 15:52:53 -0700 Subject: Bit arithmetic on Unicode characters? Message-ID: <20161007155253.665a7a7059d7ee80bb4d670165c8327d.fd98cfe8d8.wbe@email03.godaddy.com> Andrew West wrote: > Well, it could be picked up from that most treacherous of Unicode data > files http://www.unicode.org/Public/UNIDATA/NamesList.txt Even then, you have: ... 1D454 MATHEMATICAL ITALIC SMALL G # 0067 latin small letter g 1D455 <reserved> x (planck constant - 210E) 1D456 MATHEMATICAL ITALIC SMALL I # 0069 latin small letter i ... The only way you can tell from this that U+210E is a mathematical italic small H is from the context of the previous character. That wouldn't bode well if the letter A were one of the exceptionally located code points. Thankfully, it never is, so this cleverness might work after all. -- Doug Ewell | Thornton, CO, US | ewellic.org From gwalla at gmail.com Fri Oct 7 23:29:10 2016 From: gwalla at gmail.com (Garth Wallace) Date: Fri, 7 Oct 2016 21:29:10 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler wrote: > > On 10/6/2016 12:44 PM, Garth Wallace wrote: > > Some representatives of the WFCC have proposed alternate arrangements that > assume there will be a need for bitwise operations to convert between the > existing chess symbols in the Miscellaneous Symbols block and related > symbols in the new block. I don't see the need but maybe I'm missing > something. > > > I don't think you are missing anything. Bitwise operations would certainly > *not* be needed in a case like this. Small lookup and mapping tables > would suffice. > > --Ken > > -------------- next part -------------- An HTML attachment was scrubbed... 
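[The offset-plus-exceptions situation Doug describes can be made concrete with a short sketch. The code points are from UnicodeData.txt: the Mathematical Italic alphabet runs from U+1D434, but U+1D455 is a reserved code point because italic small h was encoded earlier as U+210E PLANCK CONSTANT, which is exactly why pure arithmetic fails and a small exception table is needed:]

```python
# Map ASCII letters to the Mathematical Italic alphabet.
# A plain offset almost works, but U+1D455 is reserved (unassigned):
# the italic small h slot is filled by U+210E PLANCK CONSTANT, encoded
# long before the math alphabets. Hence offset arithmetic plus a tiny
# exception table, as discussed in the thread.

EXCEPTIONS = {"h": "\u210E"}  # PLANCK CONSTANT stands in for italic small h

def to_math_italic(text: str) -> str:
    out = []
    for ch in text:
        if ch in EXCEPTIONS:
            out.append(EXCEPTIONS[ch])
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

print(to_math_italic("ghi"))  # 𝑔ℎ𝑖
```

The other math alphabets have their own holes (e.g. several letterlike symbols in the italic capital range), so a real implementation would carry a larger exception table built from UnicodeData.txt.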
URL: From gwalla at gmail.com Fri Oct 7 23:36:56 2016 From: gwalla at gmail.com (Garth Wallace) Date: Fri, 7 Oct 2016 21:36:56 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: Sorry about the blank reply. Itchy trigger finger. On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler wrote: > > On 10/6/2016 12:44 PM, Garth Wallace wrote: > > Some representatives of the WFCC have proposed alternate arrangements that > assume there will be a need for bitwise operations to convert between the > existing chess symbols in the Miscellaneous Symbols block and related > symbols in the new block. I don't see the need but maybe I'm missing > something. > > > I don't think you are missing anything. Bitwise operations would certainly > *not* be needed in a case like this. Small lookup and mapping tables > would suffice. > > --Ken > > Thank you. Just to be clear, this is the proposed allocation as it stands: http://i556.photobucket.com/albums/ss7/Garth_Wallace/proposed%20characters_zps81m0frvl.png That arrangement is the result of some discussion with a representative of the WFCC. And here are the alternatives that another WFCC representative recently proposed and that prompted my question: http://i556.photobucket.com/albums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png -------------- next part -------------- An HTML attachment was scrubbed... 
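[Ken's "small lookup and mapping tables would suffice" can be illustrated with the chess symbols already encoded in Miscellaneous Symbols (U+2654 to U+265F). White-to-black conversion there happens to be a fixed offset of 6, but an explicit table is just as fast and, unlike arithmetic, keeps working when related characters are not allocated at a fixed distance; the hypothetical new-block code points from the proposal are deliberately not shown here:]

```python
# White chess pieces are U+2654..U+2659, black pieces U+265A..U+265F
# (Miscellaneous Symbols block). The white->black conversion happens to
# be a fixed +6 offset, but an explicit table stays correct even when
# related symbols are NOT a fixed distance apart, which is the point
# about mapping to a new block.

WHITE_TO_BLACK = {chr(cp): chr(cp + 6) for cp in range(0x2654, 0x265A)}

def blacken(text: str) -> str:
    """Replace every white chess piece with its black counterpart."""
    return "".join(WHITE_TO_BLACK.get(ch, ch) for ch in text)

print(blacken("\u2654\u2659"))  # ♚♟
```

If the new block were allocated, the table would simply gain entries mapping Miscellaneous Symbols pieces to their counterparts there; no bit layout of the code points needs to be constrained for that.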
References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: On 2016-10-07, Oren Watson wrote: > I scarcely think that a use case was submitted for every one of the > blackboard bold etc letters in the mathematical set; merely the use of > blackboard bold for a general purpose of denoting sets such as the > naturals, reals, complex numbers etc, and the fact that arbitrary letters > might be used if a mathematician desired, seems to have sufficed. Indeed. I happen to think the whole math alphabet thing was a dumb mistake. But even if it isn't - and incidentally in some communities there is or was a convention of using blackboard bold letters for matrices, which justifies all of them -: > I believe the same logic applies to the case of linguistics, where the use > of superscripts is a common convention. Either superscripts are being used mathematically, in which case you can use mathematical markup, or they're being used with very specific semantics, as in the phonetic modifier letters. For the latter case, there is a standard. First you get your letter recognized by the IPA, then you encode it. The IPA doesn't recognize arbitrary superscripts. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Sat Oct 8 08:28:02 2016 From: haberg-1 at telia.com (Hans Åberg) Date: Sat, 8 Oct 2016 15:28:02 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: <77F4CBD2-3C01-4D5C-9C46-D119B979C755@telia.com> > On 8 Oct 2016, at 12:03, Julian Bradfield wrote: > > I happen to think the whole math alphabet thing was a dumb > mistake. They are useful in mathematics, but other sciences may not use them. 
> But even if it isn't - and incidentally in some communities > there is or was a convention of using blackboard bold letters for > matrices, which justifies all of them -: The double-struck letters are popular among mathematicians. >> I believe the same logic applies to the case of linguistics, where the use >> of superscripts is a common convention. > > Either superscripts are being used mathematically, in which case you > can use mathematical markup, … The principle for adding stuff to Unicode, I think, was that the semantics should be expressible in a text-only file, modulo what the technology is able to express. For math, it is not known exactly what is required to express it semantically. TeX treats it as syntactic markup, for example, for superscripts and subscripts on the left-hand side, and tensor component notation. Rendering technologies have evolved, though, so from that point of view, more would be possible today. From ken.shirriff at gmail.com Sat Oct 8 10:24:59 2016 From: ken.shirriff at gmail.com (Ken Shirriff) Date: Sat, 8 Oct 2016 08:24:59 -0700 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: Looking at the image, the idea of the proposal is to include chess piece symbols in all four 90° rotations? Wouldn't it be better to do this in markup than in Unicode? I fear a combinatorial explosion if Unicode starts including all the possible orientations of characters. (Maybe there's a good reason to do this for chess; I'm just going off the image.) Ken On Fri, Oct 7, 2016 at 9:36 PM, Garth Wallace wrote: > Sorry about the blank reply. Itchy trigger finger. 
> > On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler wrote: > >> >> On 10/6/2016 12:44 PM, Garth Wallace wrote: >> >> Some representatives of the WFCC have proposed alternate arrangements >> that assume there will be a need for bitwise operations to convert between >> the existing chess symbols in the Miscellaneous Symbols block and related >> symbols in the new block. I don't see the need but maybe I'm missing >> something. >> >> >> I don't think you are missing anything. Bitwise operations would >> certainly *not* be needed in a case like this. Small lookup and mapping >> tables would suffice. >> >> --Ken >> >> > Thank you. > > Just to be clear, this is the proposed allocation as it stands: > http://i556.photobucket.com/albums/ss7/Garth_Wallace/ > proposed%20characters_zps81m0frvl.png > > That arrangement is the result of some discussion with a representative of > the WFCC. > > And here are the alternatives that another WFCC representative recently > proposed and that prompted my question: http://i556.photobucket.com/ > albums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png > -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Sat Oct 8 11:31:05 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sat, 8 Oct 2016 18:31:05 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: Markup for rotation is highly underdeveloped, and in this case for chess > it has its own semantics, it's not just a presentation feature, possibly > meant for playing on larger boards with more players than 2, and > distinguished just like there's a distinction between white and black, or > meant to signal some dangerous positions or candidate target positions for > the next moves. > I also see some additions like florettes, and elephants needed for > traditional Asian variants of the game, plus combined forms (e.g. > tower+horse) which are quite intriguing. 
There are also variants rotated 45 degrees. All those are not just meant for display on the grid of a board but in discussions about strategies. There are also combining notations added on top of chess pieces (e.g. numbering pawns that are otherwise identical, but in plain text you can still use notations with superscript digits or letters, distinguished clearly from the numbering of grid positions, or by adding some other punctuation marks). I still don't see in these images the elephants (or other pieces like immovable rocks or rivers, or special pieces added to create handicaps for one of the players). I've also seen some chess players using special queens by putting a pawn on top of another flat pawn, with more limited movements than a standard queen. There are also bishops/sorcerers/magicians, eagles, dragons, tigers/lions, rats, dogs/foxes, snakes, spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures... Chess games have a lot of variants with their supporters. Modern movies are also promoting some variants. 2016-10-08 17:24 GMT+02:00 Ken Shirriff : > Looking at the image, the idea of the proposal is to include chess piece > symbols in all four 90° rotations? Wouldn't it be better to do this in > markup than in Unicode? I fear a combinatorial explosion if Unicode starts > including all the possible orientations of characters. (Maybe there's a > good reason to do this for chess; I'm just going off the image.) > > Ken > > On Fri, Oct 7, 2016 at 9:36 PM, Garth Wallace wrote: > >> Sorry about the blank reply. Itchy trigger finger. >> >> On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler wrote: >> >>> >>> On 10/6/2016 12:44 PM, Garth Wallace wrote: >>> >>> Some representatives of the WFCC have proposed alternate arrangements >>> that assume there will be a need for bitwise operations to convert between >>> the existing chess symbols in the Miscellaneous Symbols block and related >>> symbols in the new block. 
I don't see the need but maybe I'm missing >>> something. >>> >>> >>> I don't think you are missing anything. Bitwise operations would >>> certainly *not* be needed in a case like this. Small lookup and mapping >>> tables would suffice. >>> >>> --Ken >>> >>> >> Thank you. >> >> Just to be clear, this is the proposed allocation as it stands: >> http://i556.photobucket.com/albums/ss7/Garth_Wallace/propose >> d%20characters_zps81m0frvl.png >> >> That arrangement is the result of some discussion with a representative >> of the WFCC. >> >> And here are the alternatives that another WFCC representative recently >> proposed and that prompted my question: http://i556.photobucket.com/al >> bums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jameskasskrv at gmail.com Sat Oct 8 12:57:41 2016 From: jameskasskrv at gmail.com (James Kass) Date: Sat, 8 Oct 2016 09:57:41 -0800 Subject: Noto unified font Message-ID: Google and Monotype unveil The Noto Project's unified font for all languages: https://techcrunch.com/2016/10/06/google-and-monotype-unveil-the-noto-projects-unified-font-for-all-languages/ About ten years or so ago, I recall being actively discouraged from working on the Code2xxx fonts because pan-Unicode fonts were passé, because there was no perceived need for displaying multilingual text in a coherent typeface, and that the optimal solution was for people to simply have multiple fonts targeting the users' required scripts. Ironic, isn't it? Best regards, James Kass -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From verdy_p at wanadoo.fr Sat Oct 8 14:08:07 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sat, 8 Oct 2016 21:08:07 +0200 Subject: Noto unified font In-Reply-To: References: Message-ID: Technically it is not a single font but a coherent collection of fonts made specifically for each script (some scripts having several national variants, notably for sinographs; most of them having two styles, except symbols; most of them having two weights, except symbols, which have a single weight, and sinograms, which have more...) So no, they are not "pan-Unicode". Each font in the collection however has its own metrics, best suited for each script, and they are still made to harmonize together (tested side-by-side with Latin and CJK) so they look great in multilingual documents. It would not have been possible in a single font anyway. 2016-10-08 19:57 GMT+02:00 James Kass : > Google and Monotype unveil The Noto Project's unified font for all > languages: > https://techcrunch.com/2016/10/06/google-and-monotype- > unveil-the-noto-projects-unified-font-for-all-languages/ > > About ten years or so ago, I recall being actively discouraged from > working on the Code2xxx fonts because pan-Unicode fonts were passé, because > there was no perceived need for displaying multilingual text in a coherent > typeface, and that the optimal solution was for people to simply have > multiple fonts targeting the users' required scripts. > > Ironic, isn't it? > > Best regards, > > James Kass > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charupdate at orange.fr Sat Oct 8 14:45:28 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 8 Oct 2016 21:45:28 +0200 (CEST) Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: <970158823.8821.1475955929007.JavaMail.www@wwinf1j26> On Fri, 07 Oct 2016 09:22:21 -0700, Doug Ewell wrote: > Marcel Schneider wrote: > >> According to my hypothesis and while waiting, I believe that >> the intent of the gap kept in the superscript lowercase range, >> is to maintain a limitation to the performance of plain text. >> I don't see very well how to apply Hanlon's razor here, because >> there seems to be a strong unwillingness to see people getting >> keyboards that allow them to write in plain text without being >> bound to high-end software. The goal seems to be to keep the users >> dependent on a special formatting feature and to draw them away >> from simplicity. > > Hanlon's Razor doesn't apply here, because it's not a dichotomy between > malice and stupidity. *If* the comment[1] on the proposal to encode *MODIFIER LETTER SMALL Q had the status of a newspaper article, I really *could* apply Hanlon's Razor, and the issue would be settled. Sadly it hasn't. More, this paper encloses the only *known* reason(s) why the UTC was drawn to reject the proposal. > > Unicode has a particular definition of what constitutes "plain text," > and it's become evident over the past 25 years that some people have > different definitions. That's probably never going to change (I > personally don't believe the difference between black-and-white pictures > of cows and color pictures of cows is a plain-text distinction), Unicode has added the distinction between text style and emoji style, and I never doubted that there are good reasons to do so. As I understand it, this allows multiplying the number of emoji without any expense of scalar values, for the streamlined implementation of an enhanced performance of plain text. 
There is a big forthcoming benefit for users all over the world, not just Latin script, or not just one language community. Or not just the international keyboard standard, if this is the point here. > but the Unicode definition is really the one that matters in discussions > like this. This is why the proposer did use it. Let's quote him: On 2010-07-13, Karl Pentzlin wrote:[2] >>> French abbreviations of single words often are done by showing >>> the last letter, phoneme, or syllable of the word as superscript, >>> instead of showing an abbreviation dot or similar. >>> As abbreviations of this kind are plain text, the abbreviation method >>> being a fixed convention like the use of punctuation marks, it is >>> desirable to have the possibility to use modifier letters in this case, >>> rather than to have to rely on markup or higher level protocols. The Unicode Standard says:[4] >>>> The relationship between appearance and content of plain text >>>> may be summarized as follows: >>>> Plain text must contain enough information to permit the text >>>> to be rendered legibly, and nothing more. >>>> The Unicode Standard encodes plain text. On 2010-08-10, Karl Pentzlin wrote:[3] >>> On the other hand: "Biblio^que" (abbreviation for French "Bibliothèque") >>> does not have the same meaning as "Biblioque" (no valid French word). >>> Thus, here the use of superscript carries semantics, and is therefore >>> plain text. > > What doesn't help, IMHO, is to claim that UTC has some ulterior motive > to restrict the applicability of plain text and manipulate users and > "draw them away from simplicity." I think insinuations of evil intent > need to be better-founded than that. First I wish to thank you for having posted this analysis, making me thus aware that the wording of my hypothesis was lacking clarity. The "unwillingness" that I've deciphered is NOT UTC's. 
I think that a clear distinction ought to be drawn between *the UTC* as a whole, whose motives in this case I've asked for and have not been given any idea of, while staying firmly convinced that it is always benevolent and eager to help all language communities to express themselves and to be recorded, and on the other hand some hypothetical kind of lobbying that led to produce the cited comment,[1] which in itself is enough to question the forces implied, and what interest they might have in keeping one language community away from fully unambiguous expression in plain text, and beyond, in not supporting the work of ISO/IEC SC35/WG1[5] for enhancement and completion of the international keyboard standard. There is also a *really long* answer in my (plain) text editor. It's finally not sent to the Unicode Mailing List. /*except on request*/ Regards, Marcel [1] The comment on the proposal: http://www.unicode.org/L2/L2010/10315-comment.pdf [2] The proposal: http://www.unicode.org/L2/L2010/10230-modifier-q.pdf [3] The proposer's comment on the comment and the proposal: http://www.unicode.org/L2/L2010/10316-cmts.pdf [4] On page 19 of TUS 9.0. [5] On Mon Jan 04 2010 - 19:37:45 CST, Karl Pentzlin wrote: > Microsoft is to be praised for its engagement in providing localized > variants of its operating system and other software, thus supporting > the cultural diversity. It is a pity that the company did not accept > the invitation to participate in the special area covered by ISO/IEC > SC35/WG1, to support their own goals there. 
Please read full discussion: http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0040.html From luke at dashjr.org Sat Oct 8 18:44:03 2016 From: luke at dashjr.org (Luke Dashjr) Date: Sat, 8 Oct 2016 23:44:03 +0000 Subject: Noto unified font In-Reply-To: References: Message-ID: <201610082344.04995.luke@dashjr.org> On Saturday, October 08, 2016 5:57:41 PM James Kass wrote: > Google and Monotype unveil The Noto Project's unified font for all > languages: > https://techcrunch.com/2016/10/06/google-and-monotype-unveil-the-noto-proje > cts-unified-font-for-all-languages/ It's unfortunate they released it under the non-free OFL license. :( From samjnaa at gmail.com Sat Oct 8 18:50:40 2016 From: samjnaa at gmail.com (Shriramana Sharma) Date: Sun, 9 Oct 2016 05:20:40 +0530 Subject: Noto unified font In-Reply-To: <201610082344.04995.luke@dashjr.org> References: <201610082344.04995.luke@dashjr.org> Message-ID: Interested to know why you think OFL is non-free... On 9 Oct 2016 05:18, "Luke Dashjr" wrote: > On Saturday, October 08, 2016 5:57:41 PM James Kass wrote: > > Google and Monotype unveil The Noto Project's unified font for all > > languages: > > https://techcrunch.com/2016/10/06/google-and-monotype- > unveil-the-noto-proje > > cts-unified-font-for-all-languages/ > > It's unfortunate they released it under the non-free OFL license. :( > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke at dashjr.org Sat Oct 8 19:00:33 2016 From: luke at dashjr.org (Luke Dashjr) Date: Sun, 9 Oct 2016 00:00:33 +0000 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> Message-ID: <201610090000.35037.luke@dashjr.org> It forbids sale of the font by itself. (I'm aware the FSF thinks there's a loophole by bundling "hello world", but I don't think this would hold up in court.) On Saturday, October 08, 2016 11:50:40 PM Shriramana Sharma wrote: > Interested to know why you think OFL is non-free... 
> > On 9 Oct 2016 05:18, "Luke Dashjr" wrote: > > On Saturday, October 08, 2016 5:57:41 PM James Kass wrote: > > > Google and Monotype unveil The Noto Project's unified font for all > > > languages: > > > https://techcrunch.com/2016/10/06/google-and-monotype-> > > > unveil-the-noto-proje > > > > > cts-unified-font-for-all-languages/ > > > > It's unfortunate they released it under the non-free OFL license. :( From samjnaa at gmail.com Sat Oct 8 19:16:37 2016 From: samjnaa at gmail.com (Shriramana Sharma) Date: Sun, 9 Oct 2016 05:46:37 +0530 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <201610090000.35037.luke@dashjr.org> Message-ID: That's your definition of non-free then... If I were a font developer and of mind to release my font for use without charge, I wouldn't want anyone else to make money out of selling it when I myself - who put the effort into preparing it - don't make money from selling it. So it protects the moral rights of the developer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jameskasskrv at gmail.com Sat Oct 8 19:20:20 2016 From: jameskasskrv at gmail.com (James Kass) Date: Sat, 8 Oct 2016 16:20:20 -0800 Subject: Noto unified font In-Reply-To: References: Message-ID: Philippe Verdy wrote, > Technically it is not a single font but a coherent collection of fonts made > specifically for each script ... In a constantly changing world, it should be a pleasant experience to be reminded that some things remain constant. Whether the Noto font family is released as one file or many, it seems that somebody considers it a worthwhile endeavor. Longtime Unicode proponents remember when complex script shaping (for example) wasn't supported. Nowadays, thanks in good part to Unicode pioneers, most everything just works "right out of the box". As it should. 
With the advent of the Noto font (or font collection), users have the option of getting a reasonable display of desired characters rather than strings of boxes or last-resort fallbacks. That's also as it should be, IMHO. Best regards, James Kass On Sat, Oct 8, 2016 at 11:08 AM, Philippe Verdy wrote: > Technically it is not a single font but a coherent collection of fonts made > specifically for each script (some scripts having several national variants, > notably for sinographs, most of them having two styles except symbols, most > of them having two weights, except symbols that have a single weight and > sinograms having more...) > > So no they are not "pan-Unicode". Each font in the collection however has > its own metrics, best suited for each script, and they are still made to > harmonize together (tested side-by-side with Latin and CJK) so they look > great in multilingual documents. It would not have been possible in a single > font anyway. > > > 2016-10-08 19:57 GMT+02:00 James Kass : >> >> Google and Monotype unveil The Noto Project's unified font for all >> languages: >> >> https://techcrunch.com/2016/10/06/google-and-monotype-unveil-the-noto-projects-unified-font-for-all-languages/ >> >> About ten years or so ago, I recall being actively discouraged from >> working on the Code2xxx fonts because pan-Unicode fonts were passé, because >> there was no perceived need for displaying multilingual text in a coherent >> typeface, and that the optimal solution was for people to simply have >> multiple fonts targeting the users' required scripts. >> >> Ironic, isn't it? >> >> Best regards, >> >> James Kass > > From gwalla at gmail.com Sat Oct 8 20:02:56 2016 From: gwalla at gmail.com (Garth Wallace) Date: Sat, 8 Oct 2016 18:02:56 -0700 Subject: Bit arithmetic on Unicode characters? 
In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: On Sat, Oct 8, 2016 at 9:31 AM, Philippe Verdy wrote: > Markup for rotation is highly underdeveloped, and in this case for chess > it has its own semantics, it's not just a presentation feature, possibly > meant for playing on larger boards with more players than 2, and > distinguished just like there's a distinction between white and black, or > meant to signal some dangerous positions or candidate target positions for > the next moves. > Not exactly. Rotation of chess piece symbols is not a presentation feature (at least as I understand the term), and isn't meant for use with multiplayer games. The rotated pieces are used in chess problems, specifically heterodox or "fairy chess" problems, where they stand in for non-standard pieces. A rotated rook, for instance, means "a piece that is not a rook but is similar in some respects"; which piece it represents specifically depends on context. Conventionally, the upside-down queen represents a "grasshopper" and the upside-down knight a "nightrider", but otherwise they are assigned on a problem-by-problem basis. This practice dates back to the early 20th century and was originally so that problem composers wouldn't have to cut new type for every new piece they invented, but is now traditional. I also see some additions like florettes, and elephants needed for > traditional Asian variants of the game, plus combined forms (e.g. > tower+horse) which are quite intriguing. > There are also variants rotated 45 degrees. > The florettes are also used in problems, as are the equihoppers (the symbol that looks a bit like a bow tie or spindle). The compound symbols are found in problems and in several common variants such as Capablanca Chess and Grand Chess. The jester's cap is similar. The elephant and fers are used in shatranj or medieval chess. > All those are not just meant for display on the grid of a board but in > discussions about strategies. 
There are also combining notations added on > top of chess pieces (e.g. numbering pawns that are otherwise identical, but > in plain text you can still use notations with superscript digits or > letters, distinguished clearly from the numbering of grid positions, or by > adding some other punctuation marks). > I haven't encountered that. It's rarely necessary to differentiate individual pawns in notation: their moves are so limited that it's usually obvious which pawn is moving, and there is a standard method of disambiguating moves by starting square if needed. > I still don't see in these images the elephants (or other pieces like > unmovable rocks or rivers, or special pieces added to create handicaps for > one of the players). I've also seen some chess players using special queens > by putting a pawn on top of another flat pawn, with more limited movements > than a standard queen. There are also bishops/sorcerers/magicians, eagles, > dragoons, tigers/lions, rats, dogs/foxes, snakes, > spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures... > Chess games have a lot of variants with their supporters. Modern movies are > also promoting some variants. > There are elephants in the proposal, using a shape found in medieval manuscripts. Rocks and rivers are board features and not found in notation. > > 2016-10-08 17:24 GMT+02:00 Ken Shirriff : >> >> Looking at the image, the idea of the proposal is to include chess piece >> symbols in all four 90° rotations? Wouldn't it be better to do this in >> markup than in Unicode? I fear a combinatorial explosion if Unicode starts >> including all the possible orientations of characters. (Maybe there's a >> good reason to do this for chess; I'm just going off the image >> >> .) >> > The proposal covers this. These have a well-established use in chess notation, which doesn't apply to non-chess symbols. Markup would be the wrong way to do this. 
It's not like, say, electronic schematics where a diode symbol may be found in any orientation but still always represents a diode: a rotated queen symbol is specifically *not a queen* but another piece entirely. Currently, fairy chess problemists rely on font hacks and PDFs (even for relatively short texts). -------------- next part -------------- An HTML attachment was scrubbed... URL: From leoboiko at namakajiri.net Sat Oct 8 21:02:56 2016 From: leoboiko at namakajiri.net (Leonardo Boiko) Date: Sat, 8 Oct 2016 23:02:56 -0300 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <201610090000.35037.luke@dashjr.org> Message-ID: That's not "his" definition of non-free. Restrictions on selling copies commercially violate the Free Software Foundation's definition of free software: https://www.gnu.org/philosophy/free-sw.html https://www.gnu.org/licenses/license-list.html#NonFreeSoftwareLicenses And also the Open Source Initiative's definition of free software: https://opensource.org/osd-annotated https://opensource.org/faq#commercial And also the Debian project's definition of free software: https://www.debian.org/social_contract#guidelines In short, every single major free software organization requires free software to allow the user complete freedom of redistribution, commercial or otherwise. Otherwise the software isn't free in the sense of giving the user freedom; it is merely free of charge. 2016-10-08 21:16 GMT-03:00 Shriramana Sharma : > That's your definition of non-free then... If I were a font developer and > of mind to release my font for use without charge, I wouldn't want anyone > else to make money out of selling it when I myself - who put the effort > into preparing it - don't make money from selling it. So it protects the > moral rights of the developer. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From luke at dashjr.org Sat Oct 8 21:50:42 2016 From: luke at dashjr.org (Luke Dashjr) Date: Sun, 9 Oct 2016 02:50:42 +0000 Subject: Noto unified font In-Reply-To: <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> Message-ID: <201610090250.44483.luke@dashjr.org> On Sunday, October 09, 2016 12:08:05 AM Harshula wrote: > On 09/10/16 10:44, Luke Dashjr wrote: > > It's unfortunate they released it under the non-free OFL license. :( > > Which alternate license would you recommend? MIT license or LGPL seem reasonable and common among free fonts. Some also choose GPL, but AFAIK it's unclear how the LGPL vs GPL differences apply to fonts. On Sunday, October 09, 2016 12:16:37 AM you wrote: > That's your definition of non-free then... If I were a font developer and > of mind to release my font for use without charge, I wouldn't want anyone > else to make money out of selling it when I myself - who put the effort > into preparing it - don't make money from selling it. So it protects the > moral rights of the developer. It's the standard definition of free software. https://www.gnu.org/philosophy/selling.en.html From harshula at hj.id.au Sat Oct 8 19:08:05 2016 From: harshula at hj.id.au (Harshula) Date: Sun, 9 Oct 2016 11:08:05 +1100 Subject: Noto unified font In-Reply-To: <201610082344.04995.luke@dashjr.org> References: <201610082344.04995.luke@dashjr.org> Message-ID: <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> On 09/10/16 10:44, Luke Dashjr wrote: > It's unfortunate they released it under the non-free OFL license. :( Which alternate license would you recommend? 
cya, # From harshula at hj.id.au Sat Oct 8 22:35:36 2016 From: harshula at hj.id.au (Harshula) Date: Sun, 9 Oct 2016 14:35:36 +1100 Subject: Noto unified font In-Reply-To: <201610090250.44483.luke@dashjr.org> References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> Message-ID: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> On 09/10/16 13:50, Luke Dashjr wrote: > On Sunday, October 09, 2016 12:08:05 AM Harshula wrote: >> On 09/10/16 10:44, Luke Dashjr wrote: >>> It's unfortunate they released it under the non-free OFL license. :( FSF appears to classify OFL as a Free license (though incompatible with the GNU GPL & FDL): https://www.gnu.org/licenses/license-list.en.html#Fonts >> Which alternate license would you recommend? > > MIT license or LGPL seem reasonable and common among free fonts. Some also > choose GPL, but AFAIK it's unclear how the LGPL vs GPL differences apply to > fonts. Interestingly, Noto project saw advantages of OFL and moved to using it, not too long ago: https://github.com/googlei18n/noto-fonts/blob/master/NEWS It seems you disagree with FSF's interpretation of the OFL and bundling Hello World as being sufficient. Are there other reasons for your preference for MIT/LGPL/GPL over OFL? > On Sunday, October 09, 2016 12:16:37 AM you wrote: >> That's your definition of non-free then... If I were a font developer and >> of mind to release my font for use without charge, I wouldn't want anyone >> else to make money out of selling it when I myself - who put the effort >> into preparing it - don't make money from selling it. So it protects the >> moral rights of the developer. Why are you attributing Shriramana Sharma's email to me? It might be clearer if you replied to his email. 
cya, # From verdy_p at wanadoo.fr Sat Oct 8 23:21:32 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 06:21:32 +0200 Subject: Noto unified font In-Reply-To: References: Message-ID: 2016-10-09 2:20 GMT+02:00 James Kass : > Philippe Verdy wrote, > > > Technically it is not a single font but a coherent collection of fonts > made > > specifically for each script ... > > In a constantly changing world, it should be a pleasant experience to > be reminded that some things remain constant. > > Whether the Noto font family is released as one file or many, it seems that > somebody considers it a worthwhile endeavor. > The major reason there are several fonts and not just one is because not all scripts have the same variants and styles (and it's not a defect of the design). And there are different requirements, for example allowing a choice between color or monochromatic emojis, or using standard (narrow) Latin from Noto Sans, or wider variants of Latin for CJK: in a stylesheet you can still customize the order even if Noto Sans will be part of all sets of families. Some variants don't make sense at all for Arabic (sans-serif and serif, but they are replaced by two traditional variants of the script); monospaced fonts are also not available for Arabic (they exist but are extremely poor), or for many Indic scripts. The purpose is not to invent new designs but to present designs that are easily read and convenient for each script (and that's why there are also more weights in the CJK fonts; for Latin, additional weights may be directly inferred from the two standard weights; maybe later there will be Latin/Greek/Cyrillic with more weights, but the need was less urgent than for CJK due to the complexity of making it readable while still preserving a coherent overall blackness/contrast). Maybe some fonts in this set could be merged, e.g. the Cherokee font could be merged with the Latin/Greek/Cyrillic font. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Sat Oct 8 23:37:24 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 06:37:24 +0200 Subject: Noto unified font In-Reply-To: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: The licence itself says it respects the 4 FSF freedoms. It also explicitly allows reselling (rule DFSG #1): http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL It is not directly compatible with the GPL in a composite product, but with LGPL there's no problem, and there's no problem if the font is clearly separable and distributed along with its licence, even if the software coming with it or the package containing it is commercial: you are allowed to detach it from the package and redistribute. Really you are challenging the licence for unfair reasons. Maybe you just think that the GPL or MIT licences are enough. Or you'd like the Public Domain (which in fact offers no protection and no long-term warranty, as it can be re-appropriated at any time by proprietary licences, even retrospectively; we see every day companies registering properties on pseudo-new technologies that are in fact inherited from the past and have been used for centuries or more by the whole of humanity; they leave some space only for today's current usages in limited scopes, but protect everything else by inventing strange concepts around the basic feature, with unfair claims, and then want to collect taxes). Also, an international public domain does not exist at all (it is always restricted by new additions to the copyright laws). Publishing something in the Public Domain is really unsafe. 
2016-10-09 5:35 GMT+02:00 Harshula : > On 09/10/16 13:50, Luke Dashjr wrote: > > On Sunday, October 09, 2016 12:08:05 AM Harshula wrote: > >> On 09/10/16 10:44, Luke Dashjr wrote: > >>> It's unfortunate they released it under the non-free OFL license. :( > > FSF appears to classify OFL as a Free license (though incompatible with > the GNU GPL & FDL): > https://www.gnu.org/licenses/license-list.en.html#Fonts > > >> Which alternate license would you recommend? > > > > MIT license or LGPL seem reasonable and common among free fonts. Some > also > > choose GPL, but AFAIK it's unclear how the LGPL vs GPL differences apply > to > > fonts. > > Interestingly, Noto project saw advantages of OFL and moved to using it, > not too long ago: > https://github.com/googlei18n/noto-fonts/blob/master/NEWS > > It seems you disagree with FSF's interpretation of the OFL and bundling > Hello World as being sufficient. Are there other reasons for your > preference for MIT/LGPL/GPL over OFL? > > > On Sunday, October 09, 2016 12:16:37 AM you wrote: > >> That's your definition of non-free then... If I were a font developer > and > >> of mind to release my font for use without charge, I wouldn't want > anyone > >> else to make money out of selling it when I myself - who put the effort > >> into preparing it - don't make money from selling it. So it protects the > >> moral rights of the developer. > > Why are you attributing Shriramana Sharma's email to me? It might be > clearer if you replied to his email. > > cya, > # > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jameskasskrv at gmail.com Sun Oct 9 00:26:11 2016 From: jameskasskrv at gmail.com (James Kass) Date: Sat, 8 Oct 2016 21:26:11 -0800 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: Philippe Verdy wrote, > The purpose is not to invent new designs but present designs > that are easily read and convenient for each script ... Based on what I've seen so far, Monotype has done a splendid job. No doubt involving plenty of design work. Philippe Verdy has outlined some of the design decisions already, and it should be noted that designing a pan-Unicode font (or font collection) for multilingual text display using easily read script-conventional glyphs probably isn't as easy as it sounds. The word "free" when applied to any product means "free of charge". "Freeware" appears to be a contraction of "free software". If so, the two terms are identical in meaning. If not, speakers of standard English would consider them so. It's too bad the promoters of "free-libre" software didn't call it "libre". Creating an artificial distinction between identical terms in order to promote a philosophy some reject smacks of Newspeak. Best regards, James Kass From luke at dashjr.org Sun Oct 9 01:17:57 2016 From: luke at dashjr.org (Luke Dashjr) Date: Sun, 9 Oct 2016 06:17:57 +0000 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: <201610090617.59735.luke@dashjr.org> On Sunday, October 09, 2016 4:37:24 AM Philippe Verdy wrote: > The licence itself says it respects the 4 FSF freedoms. It also explicitly > allows reselling (rule DFSG #1): > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL No, it doesn't. That link is just a commentary, and of no relevance to non- SIL-owned fonts. 
The actual license itself begins with the problematic restriction: 1) Neither the Font Software nor any of its individual components, in Original or Modified Versions, may be sold by itself. > It is not directly compatible with the GPL in a composite product, but with > LGPL there's no problem, LGPL doesn't work that way. It allows other software to use it without being compatible, but any component or dependency of the LGPL'd software must meet the same requirements as the GPL. > Really you are challenging the licence for unfair reasons What unfair reasons are those? My *only* concern is that it is not free. I don't even care to sell the fonts myself, but simply do not use non-free software on principle. Luke From prosfilaes at gmail.com Sun Oct 9 01:36:43 2016 From: prosfilaes at gmail.com (David Starner) Date: Sun, 09 Oct 2016 06:36:43 +0000 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: On Sat, Oct 8, 2016 at 11:07 PM James Kass wrote: > The word "free" when applied to any product means "free of charge". > Using the word "product" sort of biases your argument, does it not? "Freeware" appears to be a contraction of "free software". If so, the > two terms are identical in meaning. That's bad lexicography. A "PC" is not merely a computer that is personal. "software" is not "ware" that is "soft". The first use of the word freeware was in late 1982, and the term "free software" was used in InfoWorld in 1983 to refer to public domain software. The distinction has been around for a long time. It's too bad the promoters of > "free-libre" software didn't call it "libre". Creating an artificial > distinction between identical terms in order to promote a philosophy > some reject smacks of Newspeak. > 
That is one of the meanings of "free" in English. English is a large, confusing language with many communities with their own jargon, and for 30 years "free software" has referred to software that can be used without restriction on changing and reselling in certain English-speaking communities. Like British/American disagreements, it seems to be a problem more frequently of people getting annoyed than people getting confused. -------------- next part -------------- An HTML attachment was scrubbed... URL: From liste at secarica.ro Sun Oct 9 03:15:52 2016 From: liste at secarica.ro (Cristian Secară) Date: Sun, 9 Oct 2016 11:15:52 +0300 Subject: Noto unified font In-Reply-To: <201610090000.35037.luke@dashjr.org> References: <201610082344.04995.luke@dashjr.org> <201610090000.35037.luke@dashjr.org> Message-ID: <20161009111552.86e86c61201dfb753e0b778c@secarica.ro> On Sun, 9 Oct 2016 00:00:33 +0000, Luke Dashjr wrote: > It forbids sale of the font by itself. I would say "big deal". A font belongs merely to the "cultural" side of a project or product. In this area it is better to discourage any commercial interests in order to better serve the cultural aspects and avoid any [artificial] obstacles. So, I fail to understand why forbidding the sale of the font by itself is a problem or a bad thing. On the contrary! Cristi -- Cristian Secară http://www.secărică.ro From dzo at bisharat.net Sun Oct 9 05:05:18 2016 From: dzo at bisharat.net (dzo at bisharat.net) Date: Sun, 9 Oct 2016 10:05:18 +0000 Subject: Noto unified font In-Reply-To: References: Message-ID: <1492952671-1476007520-cardhu_decombobulator_blackberry.rim.net-289303052-@b13.c1.bise6.blackberry> James, Any thoughts about a Code 2xxx suite/family based on all the work you've already done? All, A tangential question wrt the history of computer font development: What kind of collections / repositories of old fonts are there? 
In particular, thinking of pre-Unicode "special fonts" including hacks for languages written with extended Latin characters. I understand that Chantal Enguehard has a collection of 8-bit fonts developed for African languages. Are there others? Any thoughts about a "museum" of fonts and encodings? Could have educational value in the future. Don Osborn Sent via BlackBerry by AT&T -----Original Message----- From: James Kass Sender: "Unicode" Date: Sat, 8 Oct 2016 16:20:20 To: Unicode Public Subject: Re: Noto unified font Philippe Verdy wrote, > Technically it is not a single font but a coherent collection of fonts made > specifically for each script ... In a constantly changing world, it should be a pleasant experience to be reminded that some things remain constant. Whether the Noto font family is released as one file or many, it seems that somebody considers it a worthwhile endeavor. Longtime Unicode proponents remember when complex script shaping (for example) wasn't supported. Nowadays, thanks in good part to Unicode pioneers, most everything just works "right out of the box". As it should. With the advent of the Noto font (or font collection), users have the option of getting a reasonable display of desired characters rather than strings of boxes or last resort fallbacks. That's also as it should be, IMHO. Best regards, James Kass On Sat, Oct 8, 2016 at 11:08 AM, Philippe Verdy wrote: > Technically it is not a single font but a coherent collection of fonts made > specifically for each script (some scripts having several national variants, > notably for sinographs, most of them having two styles except symbols, most > of them having two weights, except symbols that have a single weight and > sinograms having more...) > > So no they are not "pan-Unicode". 
Each font in the collection however has > its own metrics, best suited for each script, and they are still made to > harmonize together (tested side-by-side with Latin and CJK) so they look > great in multilingual documents. It would have not been possible in a single > font anyway. > > > 2016-10-08 19:57 GMT+02:00 James Kass : >> >> Google and Monotype unveil The Noto Project's unified font for all >> languages: >> >> https://techcrunch.com/2016/10/06/google-and-monotype-unveil-the-noto-projects-unified-font-for-all-languages/ >> >> About ten years or so ago, I recall being actively discouraged from >> working on the Code2xxx fonts because pan-Unicode fonts were passé, because >> there was no perceived need for displaying multilingual text in a coherent >> typeface, and that the optimal solution was for people to simply have >> multiple fonts targeting the users' required scripts. >> >> Ironic, isn't it? >> >> Best regards, >> >> James Kass > > From mark at macchiato.com Sun Oct 9 06:00:30 2016 From: mark at macchiato.com (Mark Davis ☕️) Date: Sun, 9 Oct 2016 13:00:30 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: Essentially all of the game pieces that are in Unicode were added for compatibility with existing character sets. I'm guessing that there are hundreds to thousands of possible other symbols associated with games in one way or another, or that could be dug out of instruction manuals (eg, http://www.catan.com/files/downloads/catan_5th_ed_rules_eng_150303.pdf). (Many of those would be encumbered by copyright issues, but there are no doubt others that would not.) I would recommend that any proposal for additional game symbols provide clear evidence for why those particular game symbols are required to be exchanged in plain text, in a way that many, many other possible game symbols are not. 
Mark On Sun, Oct 9, 2016 at 3:02 AM, Garth Wallace wrote: > On Sat, Oct 8, 2016 at 9:31 AM, Philippe Verdy wrote: > >> Markup for rotation is highly underdeveloped, and in this case for chess >> it has its own semantics, it's not just a presentation feature, possibly >> meant for playing on larger boards with more players than 2, and >> distinguished just like there's a distinction between white and black, or >> meant to signal some dangerous positions or candidate target positions for >> the next moves. >> > > Not exactly. Rotation of chess piece symbols is not a presentation feature > (at least as I understand the term), and isn't meant for use with > multiplayer games. The rotated pieces are used in chess problems, > specifically heterodox or "fairy chess" problems, where they stand in for > non-standard pieces. A rotated rook, for instance, means "a piece that is > not a rook but is similar in some respects"; which piece it represents > specifically depends on context. Conventionally, the upside-down queen > represents a "grasshopper" and the upside-down knight a "nightrider", but > otherwise they are assigned on a problem-by-problem basis. This practice > dates back to the early 20th century and was originally so that problem > composers wouldn't have to cut new type for every new piece they invent but > is now traditional. > > I also see some additions like florettes, and elephants needed for >> traditional Asian variants of the game, plus combined forms (e.g. >> tower+horse) which are quite intriguing. >> There are also variants rotated 45 degrees. >> > > The florettes are also used in problems, as are the equihoppers (the > symbol that looks a bit like a bow tie or spindle). The compound symbols > are found in problems and in several common variants such as Capablanca > Chess and Grand Chess. The jester's cap is similar. The elephant and fers > are used in shatranj or medieval chess. 
> > >> All those are not just meant for display on the grid of a board but in >> discussions about strategies. There are also combining notations added on >> top of chess pieces (e.g. numbering pawns that are otherwise identical, but >> in plain text you can still use notations with superscript digits or >> letters, distinguished clearly from the numbering of grid positions, or by >> adding some other punctuation marks). >> > > I haven't encountered that. It's rarely necessary to differentiate > individual pawns in notation: their moves are so limited that it's usually > obvious which pawn is moving, and there is a standard method of > disambiguating moves by starting square if needed. > > >> I still don't see in these images the elephants (or other pieces like >> unmovable rocks or rivers, or special pieces added to create handicaps for >> one of the players). I've also seen some chess players using special queens >> by putting a pawn on top of another flat pawn, with more limited movements >> than a standard queen. There are also bishops/sorcerers/magicians, eagles, >> dragoons, tigers/lions, rats, dogs/foxes, snakes, >> spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures... >> Chess games have a lot of variants with their supporters. Modern movies are >> also promoting some variants. >> > > There are elephants in the proposal, using a shape found in medieval > manuscripts. Rocks and rivers are board features and not found in notation. > > >> >> 2016-10-08 17:24 GMT+02:00 Ken Shirriff : >> >>> >>> Looking at the image, the idea of the proposal is to include chess piece >>> symbols in all four 90° rotations? Wouldn't it be better to do this in >>> markup than in Unicode? I fear a combinatorial explosion if Unicode starts >>> including all the possible orientations of characters. (Maybe there's a >>> good reason to do this for chess; I'm just going off the image >>> >>> .) >>> >> > The proposal covers this. 
These have a well-established use in chess > notation, which doesn't apply to non-chess symbols. Markup would be the > wrong way to do this. It's not like, say, electronic schematics where a > diode symbol may be found in any orientation but still always represents a > diode: a rotated queen symbol is specifically *not a queen* but another > piece entirely. > > Currently, fairy chess problemists rely on font hacks and PDFs (even for > relatively short texts). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Sun Oct 9 06:28:55 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 13:28:55 +0200 Subject: Noto unified font In-Reply-To: <201610090617.59735.luke@dashjr.org> References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> Message-ID: 2016-10-09 8:17 GMT+02:00 Luke Dashjr : > On Sunday, October 09, 2016 4:37:24 AM Philippe Verdy wrote: > > The licence itself says it respects the 4 FSF freedoms. It also > explicitly > > allows reselling (rule DFSG #1): > > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL > > No, it doesn't. That link is just a commentary, and of no relevance to non- > SIL-owned fonts. > The link is the one directly used on the Noto description page when it refers to the OFL licence. It is not saying that it is only for SIL-owned fonts. Google/Monotype would have linked to another page if needed but this is the most relevant one explicitly stated by Google on the Noto site. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oren.watson at gmail.com Sun Oct 9 07:14:50 2016 From: oren.watson at gmail.com (Oren Watson) Date: Sun, 9 Oct 2016 08:14:50 -0400 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> Message-ID: I am disappointed with Noto Mono, which covers only the Latin script and not Greek or Cyrillic, which most existing monospace fonts do cover. On Sun, Oct 9, 2016 at 7:28 AM, Philippe Verdy wrote: > > > 2016-10-09 8:17 GMT+02:00 Luke Dashjr : > >> On Sunday, October 09, 2016 4:37:24 AM Philippe Verdy wrote: >> > The licence itself says it respects the 4 FSF freedoms. It also >> explicitly >> > allows reselling (rule DFSG #1): >> > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL >> >> No, it doesn't. That link is just a commentary, and of no relevance to >> non- >> SIL-owned fonts. >> > > The link is the one directly used on the Noto description page when it > refers to the OFL licence. It is not saying that it is only for SIL-owned > fonts. Google/Monotype would have linked to another page if needed but this > is the most relevant one explicitly stated by Google on the Noto site. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From haberg-1 at telia.com Sun Oct 9 08:01:09 2016 From: haberg-1 at telia.com (Hans Åberg) Date: Sun, 9 Oct 2016 15:01:09 +0200 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: > On 9 Oct 2016, at 13:00, Mark Davis ☕️ wrote: > > Essentially all of the game pieces that are in Unicode were added for compatibility with existing character sets. I'm guessing that there are hundreds to thousands of possible other symbols associated with games in one way or another, There is http://www.chessvariants.com/. 
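[Editor's note: for reference in this thread, the chess piece symbols already in Unicode, which Mark Davis notes were added for compatibility with existing character sets, occupy U+2654..U+265F in the Miscellaneous Symbols block. A minimal Python sketch listing them; the rotated "fairy chess" pieces under discussion are not in this range:]

```python
import unicodedata

# The twelve classic chess piece symbols encoded in the
# Miscellaneous Symbols block (U+2654..U+265F): white and black
# king, queen, rook, bishop, knight, and pawn.
for cp in range(0x2654, 0x2660):
    ch = chr(cp)
    print(f"U+{cp:04X} {ch} {unicodedata.name(ch)}")
# first line printed: U+2654 ♔ WHITE CHESS KING
```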
From charupdate at orange.fr Sun Oct 9 08:25:25 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 9 Oct 2016 15:25:25 +0200 (CEST) Subject: Bit arithmetic on Unicode characters? / Re: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: <882230670.5591.1476019525557.JavaMail.www@wwinf1p27> On Sun, 9 Oct 2016 13:00:30 +0200, Mark Davis ☕️ wrote: […] > > I would recommend that any proposal for additional game symbols provide > clear evidence for why those particular game symbols are required to be > exchanged in plain text, in a way that many, many other possible game > symbols are not. I missed this point: “are required to be EXCHANGED in plain text.” Would it be possible to add this as a requirement into the relevant section of TUS, please? Indeed I can’t see any need to feed those French abbreviations into a plain text data exchange. We’d rather write them out, or use the common acronyms: “BN” for “Bibliothèque Nationale” [National Library]; “BM” for “Bibliothèque Municipale” [City Library]. However, when it comes to abbreviating “bibliothèque” or other words ending in “-que” in plain text, one step I think we could take towards disambiguation is to emit a *new* recommendation for the abbreviation dot, which *is* already used in “M.” for “Monsieur” [Mister], and also in “cf.” and other Latin abbreviations. So in plain text one could write either “Biblio.que” or “Bib.que” for “Bibliothèque” [Library]. While the official rejection rationale of *MODIFIER LETTER SMALL Q is still missing, I can now believe that it reiterated the recommendation to use markup, the more so as MS Word does not mess up line spacing when superscript formatting is applied, and as this is better-looking in Tahoma than modifier letters when used to express the semantics of an abbreviation indicator or ordinal indicator. I’ve run a test on “M^gr”, for “Monseigneur” [Monsignor], and on “3^e”. 
To avoid process garbage, I've made the results available on-line.[1] What got me really started was the bizarre “Comment” on the Proposal to encode *MODIFIER LETTER SMALL Q. What I can do now is to suggest applying some kind of quality management on both sides, so that corporate officials refrain from publishing sloppy ad-hoc papers for consideration by the UTC, and Unicode won't be reduced to accepting all and everything for archiving in the Document Register. I believe that this could be a practicable way to keep other people from getting bugged. Regards, Marcel [1] Interested subscribers are welcome to view the screenshot from: http://dispoclavier.com/French-abbrev-super-vs-modif.png and to open the Word document from: http://dispoclavier.com/French-abbrev-super-vs-modif.docx From verdy_p at wanadoo.fr Sun Oct 9 09:14:50 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 16:14:50 +0200 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> Message-ID: This was not the first priority of the project, I think. Monospace fonts were used for text input in web forms, but this old use is now fading, except probably for CJK, due to poor readability and design and the inability to handle many scripts. Monospace fonts are still used for programming languages, where code is almost always in Latin and translatable content is preferably stored in external resources. For editing the external resources there's no need of complex data structures; the format is most often linear, and you don't need any monospace font. But there are still programs created mixing code and UI text in static strings, and some limited usages in internationalized regexps (which are a sort of programming language with complex rules). I suggest that such editors should have an interface to switch instantly between a monospace and a normal font.
There are decent text editors that are friendly with Latin/CJK monospace fonts and proportional fonts for other scripts or symbols. And the Noto project is not finished: - Its monospace font can still be improved to cover more than just Latin and general punctuation. - Adding Cyrillic, Greek, and a few other scripts that work well in monospace styles (e.g. Hebrew, possibly Georgian and Armenian, or even Cherokee) would seem a good future goal (monospace fonts for Arabic are mostly horrible, except in very creative/fancy designs, even if the Arabic script is very flexible using long joining, though some complex ligatures don't fit well in a character cell). - However it is really not needed for CJK scripts (which already have their own fonts with monospace metrics), including the Japanese kanas and Bopomofo (as well as mappings for subsets of Latin/Greek/Cyrillic inherited from legacy non-Unicode charsets). But another project should now target more urgent needs: fonts with excellent typographic features for printing, advertising, and titling, to be used for finalized publications (printed or in PDFs), fonts that would be beautiful, that would better reproduce the best handwritten/painted artworks, or that would restore the best typographic traditions used for centuries. People are now starting to rediscover the beauty of these traditions, but rarely with solutions that are usable with our modern languages, which use a richer repertoire of characters (many borrowed directly from other scripts or languages), so the best-looking fonts are only designed for some limited languages (most often the major European languages, but frequently only Basic English and Classical Latin or Greek): - the serif-style fonts still need extensions of their coverage (I think this is more urgent than the monospace styles). I also like the fact that the Noto project opted for distinguishing the two major traditions for the Arabic script.
About every year there's an updated version of the set, but most often this occurs due to the extension of the universal repertoire (and it is easier to separate the designs per script, as that eases the updating process and the tests when they are just extended with some new characters, new encoded variants, or new pairs with diacritics or complex ligatures and layouts for Indic scripts). And in fact I'd like Windows Update to also include this distribution (independently of the many legacy fonts for MS Office). For now Noto Sans still competes with the "Segoe" families made for the Windows UI, but those have a limited coverage. (Maybe Noto should be installed by default with Chrome and Safari, probably also with the JRE/JDK for Java.) It is highly preferable to the older Arial, Verdana, and Times New Roman families, whose coverage is now old (but still distributed and updated with MS IE/Edge). For monospaced fonts, "Consolas" from Microsoft is still better than Noto and the older "Courier New". 2016-10-09 14:14 GMT+02:00 Oren Watson : > I am disappointed with Noto Mono, which covers only the Latin script, and not > Greek or Cyrillic, when most existing monospace fonts do. > > On Sun, Oct 9, 2016 at 7:28 AM, Philippe Verdy wrote: > >> >> >> 2016-10-09 8:17 GMT+02:00 Luke Dashjr : >> >>> On Sunday, October 09, 2016 4:37:24 AM Philippe Verdy wrote: >>> > The licence itself says it respects the 4 FSF freedoms. It also >>> explicitly >>> > allows reselling (rule DFSG #1): >>> > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL >>> >>> No, it doesn't. That link is just a commentary, and of no relevance to >>> non- >>> SIL-owned fonts. >>> >> >> The link is the one directly used on the Noto description page when it >> refers to the OFL licence. It is not saying that it is only for SIL-owned >> fonts. Google/Monotype would have linked to another page if needed but this >> is the most relevant one explicitly stated by Google on the Noto site.
>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From liste at secarica.ro Sun Oct 9 10:25:27 2016 From: liste at secarica.ro (Cristian =?UTF-8?B?U2VjYXLEgw==?=) Date: Sun, 9 Oct 2016 18:25:27 +0300 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> Message-ID: <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> On Sun, 9 Oct 2016 16:14:50 +0200, Philippe Verdy wrote: > And the Noto project is not finished : > > - Its monospace can still be improved to cover more than just Latin > and general punctuation. > - Adding Cyrillic, Greek, and a few other scripts that work well in > monospace styles (e.g. Hebrew, possibly Georgian and Armenian or even > Cherokee) would seem a good future goal I checked the NotoMono-Regular.ttf file [1]: - Greek includes the range U+0384 to U+03CE (less the reserved ones) plus U+03D1, U+03D2 and U+03D6 - Cyrillic seems to include the whole range, except for the U+0487 combining mark - Hebrew, Georgian, Armenian and Cherokee: blanks only The NotoSansMonoCJKxx range is poorer in this area, but still includes the "basic" Greek and Cyrillic. Cristi [1] from https://www.google.com/get/noto/ -- Cristian Secară http://www.secarica.ro From verdy_p at wanadoo.fr Sun Oct 9 11:12:57 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sun, 9 Oct 2016 18:12:57 +0200 Subject: Noto unified font In-Reply-To: <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> Message-ID: I meant the **complete** coverage. Basic Greek and Basic Cyrillic are not enough.
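Coverage checks like Cristian's can be scripted. A sketch: the helper functions below use only the stdlib; the fontTools call in the trailing comment is the only non-stdlib step and assumes the TTF has been downloaded from the Noto site:

```python
import unicodedata

def assigned(start, end):
    """Codepoints in [start, end] that are assigned characters
    (skips reserved ones such as U+038B in the Greek block)."""
    result = []
    for cp in range(start, end + 1):
        try:
            unicodedata.name(chr(cp))
            result.append(cp)
        except ValueError:  # unassigned codepoint has no name
            pass
    return result

def missing_from_cmap(cmap, start, end):
    """Assigned codepoints in the range that the font's cmap lacks."""
    return [cp for cp in assigned(start, end) if cp not in cmap]

# Building the cmap requires fontTools (pip install fonttools):
#   from fontTools.ttLib import TTFont
#   cmap = TTFont("NotoMono-Regular.ttf").getBestCmap()
#   print([hex(cp) for cp in missing_from_cmap(cmap, 0x0384, 0x03CE)])
```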
Also, I did not say that Hebrew, Georgian, Armenian and Cherokee were included; this was a suggestion. (Cherokee, being largely an adaptation of Latin+Greek+Cyrillic with some additional strokes for new letters, could as well be included in the default Noto Sans and could share glyphs.) 2016-10-09 17:25 GMT+02:00 Cristian Secară : > On Sun, 9 Oct 2016 16:14:50 +0200, Philippe Verdy wrote: > > > And the Noto project is not finished : > > > > - Its monospace can still be improved to cover more than just Latin > > and general punctuation. > > - Adding Cyrillic, Greek, and a few other scripts that work well in > > monospace styles (e.g. Hebrew, possibly Georgian and Armenian or even > > Cherokee) would seem a good future goal > > I checked the NotoMono-Regular.ttf file [1]: > - Greek includes range U+0384 to U+03CE (less the reserved ones) plus > U+03D1, U+03D2 and U+03D6 > - Cyrillic seems to include the whole range, except for U+0487 combining > mark > - Hebrew, Georgian, Armenian and Cherokee: blanks only > > The NotoSansMonoCJKxx range is poorer in this area, but still includes the > "basic" Greek and Cyrillic. > > Cristi > > [1] from https://www.google.com/get/noto/ > > -- > Cristian Secară > http://www.secarica.ro > -------------- next part -------------- An HTML attachment was scrubbed... URL: From moyogo at gmail.com Sun Oct 9 11:23:31 2016 From: moyogo at gmail.com (Denis Jacquerye) Date: Sun, 09 Oct 2016 16:23:31 +0000 Subject: Fwd: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161007092221.665a7a7059d7ee80bb4d670165c8327d.002e682fe0.wbe@email03.godaddy.com> Message-ID: Regarding the superscript q: in some rare cases it is used to indicate pharyngealization or a pharyngeal consonant, instead of the Latin letter pharyngeal voiced fricative U+0295 ʕ, the modifier letter reversed glottal stop U+02C1 ˁ or the modifier letter small reversed glottal stop U+02E4 ˤ.
Menán du Plessis uses a modifier letter small q after a vowel in ǀXam to indicate pharyngealization of that vowel in a few papers (Notes on Qing's own languages, A century of the Specimens of Bushman Folklore, A unity hypothesis for the Southern African Khoesan languages). A superscript q is also used in the name of the Mquq?in/Brooks Peninsula Provincial Park by the Ministry of Environment of British Columbia on the dedicated page of its website, and by the Minister of Aboriginal Affairs and Northern Development Canada, the British Columbia Ministry of Aboriginal Relations and Reconciliation, and the Maa-nulth First Nations in the Maa-nulth First Nations Final Agreement Implementation Report / 2011-2012 and 2012-2013. Given the references on the Nuu-Chah-Nulth orthography that are online, it seems the superscript q is used instead of the standard orthography's Latin letter pharyngeal voiced fricative U+0295 ʕ in the transcription Mquq?in. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jameskasskrv at gmail.com Sun Oct 9 14:28:03 2016 From: jameskasskrv at gmail.com (James Kass) Date: Sun, 9 Oct 2016 11:28:03 -0800 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> Message-ID: David Starner responded: >> The word "free" when applied to any product means "free of charge". > > Using the word "product" sort of biases your argument, does it not? Webster's defines "product" as something produced by nature, industry, or art. So an apple is a product whether it's a wild apple, a cultivated apple, or a road apple. Software is also a product, and as with any product, it's either free or for sale. > ... it seems to be a problem more frequently of people getting > annoyed than people getting confused. Isn't confusion annoying?
Best regards, James Kass From verdy_p at wanadoo.fr Sun Oct 9 18:57:05 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 10 Oct 2016 01:57:05 +0200 Subject: Noto unified font In-Reply-To: References: <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> <201610090617.59735.luke@dashjr.org> <20161009182527.021ac487b2f1dec8e66ac6ec@secarica.ro> Message-ID: I did not receive the message from David Starner you are quoting; it was probably not sent to this list, but I did not receive it privately either (not even in my "spam mailbox"). Anyway, I agree with your response; David Starner has a strange interpretation of this common word (notably in the context where I used it, after "any"). However, in my sense a product is the result of a process requiring active participation. The Webster definition is a bit larger (and also matches the meaning of the term "produit" in French, which also includes results of natural processes such as apples or ashes from a volcano: the term emphasizes the fact that there's a process of transformation from one state to another and that the result has an added value, though of course not necessarily a financial value by itself or a financial cost). Here we're speaking about software (or structured data), which is always the result of an active process going from an idea to some implementation and its advertising and distribution. It always has a financial cost, but this cost is already shared and spread with the means we use to access or distribute this result, or discuss and improve it.
It also has a financial cost given the time devoted to making it (time is money: if you're not paid for it, it will cost you in terms of the money you don't collect for that time not spent on other tasks; but it also means time gained by others easily using the result at low cost, which they will still have to support themselves; just to receive this email, you've paid money to your ISP and paid the bill for the electricity, and spent time on your computer, whose aging will require you to change it in some months or years when it will no longer be usable for the tools you need every day on it). Open-sourcing a software or data or graphic design, or artistic product, or a font here, is a way to share and split the costs into smaller amounts that more people can support, instead of giving all the money to a single producer who also assumes all risks when investing in it for the creation, production, distribution and support. It eliminates single points of failure or defects by allowing more freedom for replacement or servicing, with lower losses and risks taken by the participants in this process. It allows anyone to participate in fewer tasks, those that are less complex to them, and then delegate the rest to others in mutual cooperation. Generally it also allows faster development and easier adaptation by varying methods. And instead of investing time in a single activity, we invest time in many more, just when we need them or when we think we may be useful and more efficient in some limited domains. In the open-sourcing process, you have to be confident that people will help you and that you'll help them, but not just in a one-to-one relation with direct returns and timely delays (as in commercial contracts). You don't order people to do things for you, you don't pay them directly, and you are also never required to donate something in exchange immediately.
The benefits are only there because you are part of the process and because everyone gets more than what he donates (the total added value is then larger than in private commercial relations). We are not just consumers but also producers and creators in a collective work where the goal is largely focused on actual needs and usages. All people like to be creative, and it's always interesting to see many people adding their own creativity to a project, for things we would not have imagined ourselves, or to find that they have smarter solutions than ours. In fact it is for the same reason that we have developed collective laws and have governments and elected delegates, or public services, all around the world (but as opposed to them, there's no required tax to pay, no dated bills, even if we still have rules to obey: the licence terms, which we also want to be supported by collective laws protecting these terms against unfairness or abuse). 2016-10-09 21:28 GMT+02:00 James Kass : > David Starner responded: > > >> The word "free" when applied to any product means "free of charge". > > > > Using the word "product" sort of biases your argument, does it not? > > Webster's defines "product" as something produced by nature, industry, > or art. So an apple is a product whether it's a wild apple, a > cultivated apple, or a road apple. Software is also a product, and as > with any product, it's either free or for sale. > > > ... it seems to be a problem more frequently of people getting > > annoyed than people getting confused. > > Isn't confusion annoying? > > Best regards, > > James Kass > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From doug at ewellic.org Sun Oct 9 20:03:53 2016 From: doug at ewellic.org (Doug Ewell) Date: Sun, 9 Oct 2016 19:03:53 -0600 Subject: Noto unified font Message-ID: <7A89301ABEEA4CFE8254349B77B82AC4@DougEwell> Philippe Verdy wrote: > I did not receive the message from David Starner you are quoting, it > was probably not sent to this list but I did not receive it privately > (not even in my "spam mailbox"). http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0134.html -- Doug Ewell | Thornton, CO, US | ewellic.org From doug at ewellic.org Sun Oct 9 20:13:32 2016 From: doug at ewellic.org (Doug Ewell) Date: Sun, 9 Oct 2016 19:13:32 -0600 Subject: Fwd: Why incomplete subscript/superscript alphabet ? Message-ID: Denis Jacquerye wrote: > Regarding the superscript q, in some rare cases, it is used to > indicate pharyngealization or a pharyngeal consonant instead of the > Latin letter pharyngeal voiced fricative U+0295 ʕ, the modifier letter > reversed glottal stop U+02C1 ˁ or the modifier letter small reversed > glottal stop U+02E4 ˤ. > ... Sounds like good material to include in a proposal. -- Doug Ewell | Thornton, CO, US | ewellic.org From prosfilaes at gmail.com Sun Oct 9 23:33:07 2016 From: prosfilaes at gmail.com (David Starner) Date: Mon, 10 Oct 2016 04:33:07 +0000 Subject: Bit arithmetic on Unicode characters? In-Reply-To: References: <3a9d909b-1b66-2614-0cd2-2e1207963642@att.net> Message-ID: On Sun, Oct 9, 2016 at 4:03 AM Mark Davis wrote: > Essentially all of the game pieces that are in Unicode were added for > compatibility with existing character sets. I'm guessing that there are > hundreds to thousands of possible other symbols associated with games in > one way or another, or that could be dug out of instruction manuals (eg, > http://www.catan.com/files/downloads/catan_5th_ed_rules_eng_150303.pdf). > (Many of those would be encumbered by copyright issues, but there are no > doubt others that would not.)
> I see two symbols used in text in that Catan manual; there's a white star (U+2606) and a twelve-pointed red star (U+2739 or U+1F7D2?). I don't see why books about games would be any different from any other book in this respect; symbols used in running text should be encoded. -------------- next part -------------- An HTML attachment was scrubbed... URL: From haberg-1 at telia.com Mon Oct 10 04:30:48 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 11:30:48 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: Message-ID: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> > On 10 Oct 2016, at 03:13, Doug Ewell wrote: > > Denis Jacquerye wrote: > >> Regarding the superscript q, in some rare cases, it is used to >> indicate pharyngealization or a pharyngeal consonant instead of the >> Latin letter pharyngeal voiced fricative U+0295 ʕ, the modifier letter >> reversed glottal stop U+02C1 ˁ or the modifier letter small reversed >> glottal stop U+02E4 ˤ. >> ... > > Sounds like good material to include in a proposal. I think that IPA might be designed for broad phonetic transcriptions [1], with a requirement to distinguish phonemes within each given language. For example, the English /l/ is thicker than the Swedish one, but in IPA there is only one symbol, as there is no phonemic distinction within either language. The alveolar click /!/ may be pronounced with or without the tongue hitting the floor of the mouth, but as there is no phonemic distinction within any given language, there is only one symbol [2]. Thus, linguists wanting to describe pronunciation in more detail are left to improvise notation. The situation is thus more like that of mathematics, where notation is somewhat in flux. 1. https://en.wikipedia.org/wiki/Phonetic_transcription 2.
https://en.wikipedia.org/wiki/Alveolar_clicks From jcb+unicode at inf.ed.ac.uk Mon Oct 10 08:24:51 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 14:24:51 +0100 Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: > I think that IPA might be designed for broad phonetic transcriptions > [1], with a requirement to distinguish phonemes within each given > language. For example, the English /l/ is thicker than the Swedish, > but in IPA, there is only one symbol, as there is no phonemic > distinction within either language. The alveolar click /!/ may be > pronounced with or without the tongue hitting the floor of the > mouth, but as there is no phonemic distinction within any given > language, there is only one symbol [2]. But the IPA has many diacritics exactly for this purpose. The velarized English coda /l/ is usually described as [l̴] with U+0334 COMBINING TILDE OVERLAY, or can be notated [lˠ] with U+02E0 MODIFIER LETTER SMALL GAMMA. The alveolar click with percussive flap hasn't made it into the standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > Thus, linguists wanting to describe pronunciation in more detail are left to improvise notation. The situation is thus more like that of mathematics, where notation is somewhat in flux. There is improvisation when you're studying something new, of course, but there's a lot of standardization. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Mon Oct 10 11:04:36 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 18:04:36 +0200 Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> Message-ID: <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> > On 10 Oct 2016, at 15:24, Julian Bradfield wrote: > > On 2016-10-10, Hans Åberg wrote: >> I think that IPA might be designed for broad phonetic transcriptions >> [1], with a requirement to distinguish phonemes within each given >> language. For example, the English /l/ is thicker than the Swedish, >> but in IPA, there is only one symbol, as there is no phonemic >> distinction within either language. The alveolar click /!/ may be >> pronounced with or without the tongue hitting the floor of the >> mouth, but as there is no phonemic distinction within any given >> language, there is only one symbol [2]. > > But the IPA has many diacritics exactly for this purpose. > The velarized English coda /l/ is usually described as [l̴] > with U+0334 COMBINING TILDE OVERLAY, or can be notated [lˠ] > with U+02E0 MODIFIER LETTER SMALL GAMMA. > > The alveolar click with percussive flap hasn't made it into the > standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. >> Thus, linguists wanting to describe pronunciation in more detail are left to improvise notation. The situation is thus more like that of mathematics, where notation is somewhat in flux. > > There is improvisation when you're studying something new, of course, > but there's a lot of standardization. The preceding discussion was dealing with additions to Unicode one by one; the question is what might be added so that linguists do not feel restrained. From everson at evertype.com Mon Oct 10 11:30:46 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 17:30:46 +0100 Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> Message-ID: On 10 Oct 2016, at 14:24, Julian Bradfield wrote: > But the IPA has many diacritics exactly for this purpose. The velarized English coda /l/ is usually described as [l̴] with U+0334 COMBINING TILDE OVERLAY, 026B ɫ LATIN SMALL LETTER L WITH MIDDLE TILDE > The alveolar click with percussive flap hasn't made it into the standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > On 10 Oct 2016, at 17:04, Hans Åberg wrote: > >> The alveolar click with percussive flap hasn't made it into the >> standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > > There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. Has neither the right shape nor the right properties. Michael Everson From verdy_p at wanadoo.fr Mon Oct 10 12:57:13 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 10 Oct 2016 19:57:13 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: 2016-10-10 18:04 GMT+02:00 Hans Åberg : > > > On 10 Oct 2016, at 15:24, Julian Bradfield > wrote: > > > > On 2016-10-10, Hans Åberg wrote: > >> I think that IPA might be designed for broad phonetic transcriptions > >> [1], with a requirement to distinguish phonemes within each given > >> language. For example, the English /l/ is thicker than the Swedish, > >> but in IPA, there is only one symbol, as there is no phonemic > >> distinction within either language. The alveolar click /!/ may be > >> pronounced with or without the tongue hitting the floor of the > >> mouth, but as there is no phonemic distinction within any given > >> language, there is only one symbol [2].
> > with U+0334 COMBINING TILDE OVERLAY, or can be notated [lˠ] > > with U+02E0 MODIFIER LETTER SMALL GAMMA. > > > > The alveolar click with percussive flap hasn't made it into the > > standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > > There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. > I disagree; IPA does not use such a confusing ligature, which would be read as a repeated click rather than a single one. Reversing the second one (and slightly kerning it, though I don't know how, to avoid the confusion with "!i", i.e. a click followed by a vowel, most probably by writing them on top of each other or slanted/italicized) is a valuable visual distinction for a single distinctive phoneme. But IPA also proposes something else when more precise distinctions are needed, for noting not just the linguistic phonemes but their precise phonetic realizations (e.g. in papers about regional speech accents), such as combining the normal phonemic symbol with a diacritic, usually placed below, such as the dental modifier U+032A that looks like a small bridge, or some arrowhead-like diacritics (U+032C caron below or U+032D circumflex below) to indicate a more precise placement of the tongue. Clicks are also pronounceable by themselves in isolation, without any vowel (in fact it's much easier to pronounce them without a vowel), but they may easily be pitched (over a small range of about 6 or 7 musical tones) instead of being vocalized. However, I've not seen any diacritics to also annotate the pitch. In Chinese, vowels are annotated with distinctive tones (but some of them variable, whereas clicks can hardly have a rising or lowering tone). The pitch is easily realized by more or less opening the mouth or by slightly closing or rounding the lips (giving an appearance of a "vowel", though they are not voiced through the mouth, as air there is usually "aspirated", but only voiced with air expelled through the nasal passages).
All this looks like technical possibilities of the human voice, appropriate for phonetic analysis but rarely for actual phonemes of languages, as they are hard to distinguish in a group of people. These distinctions are however easier to recognize within the context of complete speech along with other surrounding phonemes (Chinese may be realized on 6 or 7 musical pitch tones by anyone, but in speech only 3 are used, and the other phonemic tones are combinations of the 3 basic tones; the mapping from the 3 basic tones to musical pitch tones/frequencies is highly variable between persons depending on age, sex, body weight, health, muscular development, or handicap: the phonemic tones are subdivisions of the possibilities of all the possible realizations that a mixed group of people will want to exchange with good mutual understanding). In Unicode there are several sets of tone modifiers that are encoded as spacing modifiers (and in Pinyin they are frequently noted with standard European digits, but these have no direct relation with the musical pitch tone or even with the 3 basic pitches used to compose the phonemic tones). Chinese (but also Vietnamese) may also use diacritics above (acute, grave, circumflex, tilde...). Linguists needing internationalization use distinct symbols written after the vocalic phoneme, or just after a vowelless consonantal phoneme, or just after a neutral schwa for a neutral/unclear vowel. -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Mon Oct 10 14:42:40 2016 From: doug at ewellic.org (Doug Ewell) Date: Mon, 10 Oct 2016 12:42:40 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> Hans Åberg wrote: > I think that IPA might be designed for broad phonetic transcriptions > [1], with a requirement to distinguish phonemes within each given > language.
From the Wikipedia article you cited: "For example, one particular pronunciation of the English word little may be transcribed using the IPA as /ˈlɪtəl/ or [ˈlɪɾɫ̩]; the broad, phonemic transcription, placed between slashes, indicates merely that the word ends with phoneme /l/, but the narrow, allophonic transcription, placed between square brackets, indicates that this final /l/ ([ɫ]) is dark (velarized)." IPA can be used pretty much as broadly or as narrowly as one wishes. -- Doug Ewell | Thornton, CO, US | ewellic.org From jcb+unicode at inf.ed.ac.uk Mon Oct 10 14:43:29 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 20:43:29 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: >> On 10 Oct 2016, at 15:24, Julian Bradfield wrote: >> The alveolar click with percussive flap hasn't made it into the >> standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. !! was used by one famous Africanist, but that was before ExtIPA existed. > The preceding discussion was dealing with additions to Unicode one by one; the question is what might be added so that linguists do not feel restrained. Linguists aren't stupid, and they have no need for plain text representations of all their symbology. Linguists write in Word or LaTeX (or sometimes HTML), all of which can produce a wide range of symbols beyond the wit of Unicode. As I have remarked before, I have used "latin letter turned small capital K", for reasons that seemed good to me, and I was not one whit restrained by its absence from Unicode - nor was the journal. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
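One detail worth noting about the two spellings of the velarized l discussed earlier in the thread: the atomic U+026B and the sequence l + U+0334 are not canonically equivalent, so Unicode normalization will not unify them. A quick stdlib check:

```python
import unicodedata

atomic = "\u026b"      # LATIN SMALL LETTER L WITH MIDDLE TILDE
sequence = "l\u0334"   # l + COMBINING TILDE OVERLAY

# U+026B has no canonical decomposition, and NFC never composes
# overlay diacritics, so the two forms stay distinct.
print(unicodedata.normalize("NFD", atomic) == atomic)      # True
print(unicodedata.normalize("NFC", sequence) == sequence)  # True
print(atomic == sequence)                                  # False
```

So a search for one spelling will miss the other even in normalized text, which is part of why the choice of representation matters here.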
From haberg-1 at telia.com Mon Oct 10 15:03:42 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 22:03:42 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> Message-ID: <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> > On 10 Oct 2016, at 21:42, Doug Ewell wrote: > > Hans Åberg wrote: > >> I think that IPA might be designed for broad phonetic transcriptions >> [1], with a requirement to distinguish phonemes within each given >> language. > > From the Wikipedia article you cited: > > "For example, one particular pronunciation of the English word little > may be transcribed using the IPA as /ˈlɪtəl/ or [ˈlɪɾɫ̩]; the > broad, phonemic transcription, placed between slashes, indicates merely > that the word ends with phoneme /l/, but the narrow, allophonic > transcription, placed between square brackets, indicates that this final > /l/ ([ɫ]) is dark (velarized)." > > IPA can be used pretty much as broadly or as narrowly as one wishes. Within each language, yes, but it is not designed to capture differences between different languages or dialects. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:04:54 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:04:54 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: On 2016-10-10, Philippe Verdy wrote: > 2016-10-10 18:04 GMT+02:00 Hans Åberg : >> > On 10 Oct 2016, at 15:24, Julian Bradfield >> wrote: >> > The alveolar click with percussive flap hasn't made it into the >> > standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). >> >> There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used.
> I disagree, IPA does not use such a confusing ligature that would be read as > a repeated click and not a single one. Reversing the second one (and > slightly kerning it, though I don't know how, to avoid the confusion with > "!i", i.e. a click followed by a vowel, most probably writing them on top of > each other or slanted/italicized) is a valuable visual distinction for a > single distinctive phoneme. What confusion? ¡ is not easily confusable with i - ask the Spanish! > But IPA also proposes something else when more precise distinctions are > needed for noting not just the linguistic phonemes but their precise Did you read the bit where I said that? > Clicks are also pronounceable by themselves in isolation without any vowel > (in fact it's much easier to pronounce them without a vowel) but they may > easily be pitched (on a small range of about 6 or 7 musical tones) instead > of being vocalized. However I've not seen any diacritics to also annotate the > pitch. Because no language uses clicks this way, and phonetic alphabets are not written for composers of mouth music. If one wished to do so, one would use the standard tone indicators. > In Chinese, vowels are annotated with distinctive tones (but some of them > variable, where clicks can hardly have a rising or falling tone). The > pitch is easily realized by more or less opening the mouth or by slightly > closing or rounding the lips (giving an appearance of "vowel", though they > are not voiced through the mouth as they are usually "aspirated" there, but > only voiced within air exhaled through nasal areas). All this looks like What are you on about? > technical possibilities of the human voice, appropriate for phonetic analysis > but rarely for actual phonemes of languages as they are hard to > distinguish in a group of people. Those who learn languages natively have no problems distinguishing voiced, voiceless, aspirated, breathy, nasal, glottalized,... clicks. 
> These distinctions are however easier to recognize within the context of a > complete speech stream along with other surrounding phonemes (Chinese may be > realized on 6 or 7 musical pitch tones by anyone, but in speech only 3 are > used and the other phonemic tones are combinations of the 3 basic tones, and (a) There is no such thing as "Chinese" - there are many different languages in China, with a continuum of dialect gradations. (b) Even if you mean Mandarin, the usual notation for the five (four plus neutral) Mandarin tones uses five pitch levels to describe the contours, not three. > spacing modifiers (and in Pinyin, they are frequently noted with standard > European digits but have no direct relation with the musical pitch tone or > even with the 3 basic pitches used to compose the phonemic tones). Chinese > (but also Vietnamese) may also use diacritics above (acute, grave, > circumflex, tilde...). Linguists needing internationalization use distinct > symbols written after the vocalic phoneme or just after a vowelless > consonantal phoneme, or just after a neutral schwa for a neutral/unclear > vowel. Linguists don't need internationalization. They use IPA or other notations. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Mon Oct 10 15:09:40 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 22:09:40 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: <2BB69E14-7238-49C0-AB41-2B648320780E@telia.com> > On 10 Oct 2016, at 21:43, Julian Bradfield wrote: > Linguists aren't stupid, and they have no need for plain text > representations of all their symbology. Linguists write in Word or > LaTeX (or sometimes HTML), all of which can produce a wide range of > symbols beyond the wit of Unicode. 
> > As I have remarked before, I have used "latin letter turned small > capital K", for reasons that seemed good to me, and I was not one whit > restrained by its absence from Unicode - nor was the journal. It is possible to write math just using ASCII and TeX, which was the original idea of TeX. Is that what you want for linguistics? From everson at evertype.com Mon Oct 10 15:14:23 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 21:14:23 +0100 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> On 10 Oct 2016, at 21:04, Julian Bradfield wrote: > > Linguists don't need internationalization. They use IPA or other notations. We need reliable plain-text notation systems. Otherwise distinctions we wish to encode may be lost. Michael From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:15:34 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:15:34 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: >> On 10 Oct 2016, at 21:42, Doug Ewell wrote: >> Hans Åberg wrote: >>> I think that IPA might be designed for broad phonetic transcriptions >>> [1], with a requirement to distinguish phonemes within each given >>> language. ... >> IPA can be used pretty much as broadly or as narrowly as one wishes. > > Within each language, but it is not designed to capture differences between different languages or dialects. What do you mean? The IPA in narrow transcription is intended to provide as detailed a description of sounds as a human mind can manage. 
It doesn't care whether you're describing differences between languages or differences within languages (a distinction that is not in any case well defined). -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:24:11 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:24:11 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> Message-ID: On 2016-10-10, Michael Everson wrote: > On 10 Oct 2016, at 21:04, Julian Bradfield wrote: >> >> Linguists don't need internationalization. They use IPA or other notations. > > We need reliable plain-text notation systems. Otherwise distinctions we wish to encode may be lost. We have no need to make such distinctions in "plain text". It's convenient to have major distinctions easily accessible without font hacking, but there's no need to have every notation one might dream up forcibly incorporated into "plain text". In particular, for super/subscripts, which is where we came in, even the benighted souls using Word still typically recognize and can use LaTeX notation. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From kenwhistler at att.net Mon Oct 10 15:24:41 2016 From: kenwhistler at att.net (Ken Whistler) Date: Mon, 10 Oct 2016 13:24:41 -0700 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> Message-ID: <3c040475-bf35-09ca-0121-2dbdec31961b@att.net> On 10/10/2016 1:14 PM, Michael Everson wrote: > On 10 Oct 2016, at 21:04, Julian Bradfield wrote: >> Linguists don't need internationalization. They use IPA or other notations. > We need reliable plain-text notation systems. Otherwise distinctions we wish to encode may be lost. > > Michael > Recte: We need reliable notation systems. Otherwise distinctions we wish to represent may be lost. Whether a "reliable notation system" has to be entirely plain text in its content, or includes reliable standard means for markup, such as XML, is a matter for debate and consensus among the linguists involved. Linguists need to represent all kinds of things, and assuming that all pertinent text content of interest to them is ipso facto plain text is erroneous. --Ken From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:31:28 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:31:28 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <2BB69E14-7238-49C0-AB41-2B648320780E@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: > It is possible to write math just using ASCII and TeX, which was the original idea of TeX. Is that what you want for linguistics? I don't see the need to do everything in plain text. Long ago, I spent a great deal of time getting my editor to do semi-wysiwyg TeX maths (work later incorporated into x-symbol), but actually it's a waste of time and I've given up. Working mathematicians know LaTeX and its control sequences. Even my 12-year-old uses LaTeX control sequences to communicate with his online maths courses. 
Because phonetics has a much smaller set of symbols, I do kwaɪt laɪk biːɪŋ eɪbl tə duː ðɪs, and because they're also used in non-specialist writing, it's useful to have the symbols hacked into Unicode instead of hacked into specialist fonts. But subscripts? No need. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Mon Oct 10 15:34:39 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 22:34:39 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> Message-ID: <2869E093-6788-40DA-A646-F2DCBB9CF778@telia.com> > On 10 Oct 2016, at 22:15, Julian Bradfield wrote: > > On 2016-10-10, Hans Åberg wrote: >>> On 10 Oct 2016, at 21:42, Doug Ewell wrote: >>> Hans Åberg wrote: >>>> I think that IPA might be designed for broad phonetic transcriptions >>>> [1], with a requirement to distinguish phonemes within each given >>>> language. > ... >>> IPA can be used pretty much as broadly or as narrowly as one wishes. >> >> Within each language, but it is not designed to capture differences between different languages or dialects. > > What do you mean? The IPA in narrow transcription is intended to > provide as detailed a description of sounds as a human mind can > manage. It doesn't care whether you're describing differences between > languages or differences within languages (a distinction that is not > in any case well defined). It is designed for phonemic transcriptions, cf., https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet From verdy_p at wanadoo.fr Mon Oct 10 15:36:33 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 10 Oct 2016 22:36:33 +0200 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: 2016-10-10 22:04 GMT+02:00 Julian Bradfield : > On 2016-10-10, Philippe Verdy wrote: > > 2016-10-10 18:04 GMT+02:00 Hans Åberg : > >> > On 10 Oct 2016, at 15:24, Julian Bradfield > >> wrote: > > >> > The alveolar click with percussive flap hasn't made it into the > >> > standard IPA, but in ExtIPA it's [!¡] (preferably kerned together). > >> > >> There is ‼ DOUBLE EXCLAMATION MARK U+203C which perhaps might be used. > > > I disagree, IPA does not use such a confusing ligature that would be read > as > a repeated click and not a single one. Reversing the second one (and > > slightly kerning it, though I don't know how, to avoid the confusion with > > "!i", i.e. a click followed by a vowel, most probably writing them on top > of > > each other or slanted/italicized) is a valuable visual distinction for a > > single distinctive phoneme. > > What confusion? ¡ is not easily confusable with i - ask the Spanish! > Not relevant! Here we're not speaking about punctuation between words, but inclusion within words in phonetic transcriptions, where even word separation is not always relevant and punctuation is almost absent. There's no case in Spanish with "¡" in the middle of a word. But here we're speaking about noting a consonant within words where vowels can also be expected in phonetic transcriptions. And there the confusion with a following vowel i is very likely. By contrast, IPA symbols are carefully chosen to avoid visual confusions (and that's why they only exist in a single lettercase). From everson at evertype.com Mon Oct 10 15:38:56 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 21:38:56 +0100 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> Message-ID: <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> On 10 Oct 2016, at 21:24, Julian Bradfield wrote: > >> We need reliable plain-text notation systems. Otherwise distinctions we wish to encode may be lost. > > We have no need to make such distinctions in "plain text". You mightn't. > It's convenient to have major distinctions easily accessible without > font hacking, Yes, indeed. > but there's no need to have every notation one might dream up forcibly incorporated into "plain text". Hyperbole. > In particular, for super/subscripts, which is where we came in, even > the benighted souls using Word still typically recognize and can use > LaTeX notation. I can't use LaTeX notation. I don't use that proprietary system. And don't you dare tell me that I am benighted, or using Word. Neither applies. On 10 Oct 2016, at 21:31, Julian Bradfield wrote: > On 2016-10-10, Hans Åberg wrote: >> It is possible to write math just using ASCII and TeX, which was the original idea of TeX. Is that what you want for linguistics? > > I don't see the need to do everything in plain text. Of course not. You're a programmer. (Mathematical typesetting is not my concern.) > Because phonetics has a much smaller set of symbols, I do kwaɪt laɪk > biːɪŋ eɪbl tə duː ðɪs, and because they're also used in non-specialist > writing, it's useful to have the symbols hacked into Unicode instead > of hacked into specialist fonts. > But subscripts? No need. And yet we use such things. I have an edition of the Bible I'm setting. Big book. Verse numbers. I like these to be superscript so they're unobtrusive. Damn right I use the superscript characters for these. I can process the text, export it for concordance processing, whatever, and those out-of-text notations DON'T get converted to regular digits, which I need. 
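The superscript digits Michael describes live at scattered code points: ¹ ² ³ come from Latin-1, while ⁰ and ⁴–⁹ sit in the Superscripts and Subscripts block. A minimal sketch of the mapping (the `superscript_verse` helper is hypothetical, not Michael's actual workflow):

```python
# Map ASCII digits to the dedicated Unicode superscript digits.
# U+00B9, U+00B2, U+00B3 predate the U+2070 block, so the sequence is
# irregular rather than a single contiguous run.
SUPERSCRIPT = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)

def superscript_verse(n: int) -> str:
    """Render a verse number using the superscript characters,
    so it survives as plain text through further processing."""
    return str(n).translate(SUPERSCRIPT)

print(superscript_verse(23))   # ²³
print(superscript_verse(150))  # ¹⁵⁰
```

Because the result is ordinary plain text, it round-trips through export and search without any markup layer, which is precisely the property being argued over here.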
Michael From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:52:54 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:52:54 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> Message-ID: On 2016-10-10, Philippe Verdy wrote: > Not relevant! Here we're not speaking about punctuation between words, but > inclusion within words in phonetic transcriptions, where even word > separation is not always relevant and punctuation is almost absent. > There's no case in Spanish with "¡" in the middle of a word. But here we're > speaking about noting a consonant within words where vowels can also be > expected in phonetic transcriptions. And there the confusion with a > following vowel i is very likely. By contrast, IPA symbols are > carefully chosen to avoid visual confusions (and that's why they only exist > in a single lettercase). and are less confusable than and , especially in a sanserif font. In both cases, the main visual cue is a descender/ascender in one letter that isn't in the other. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 15:58:01 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 21:58:01 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> Message-ID: On 2016-10-10, Michael Everson wrote: > I can't use LaTeX notation. I don't use that proprietary system. And don't you dare tell me that I am benighted, or using Word. Neither applies. 
That's an interesting use of "proprietary" you have there, but I suppose with your Alician interests, Humpty Dumpty's attitude to words may have rubbed off on you! What *do* you mean? > I have an edition of the Bible I'm setting. Big book. Verse numbers. I like these to be superscript so they're unobtrusive. Damn right I use the superscript characters for these. I can process the text, export it for concordance processing, whatever, and those out-of-text notations DON'T get converted to regular digits, which I need. If you were doing it properly, the text would be stored in a suitable markup, as would the verse numbers, and both the typesetting and the concordance processing would deal with them appropriately. No need for Unicode hacks. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From haberg-1 at telia.com Mon Oct 10 15:59:10 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 22:59:10 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <2BB69E14-7238-49C0-AB41-2B648320780E@telia.com> Message-ID: <48BBA151-5017-4357-94A1-63000F93CD34@telia.com> > On 10 Oct 2016, at 22:31, Julian Bradfield wrote: > > On 2016-10-10, Hans Åberg wrote: >> It is possible to write math just using ASCII and TeX, which was the original idea of TeX. Is that what you want for linguistics? > > I don't see the need to do everything in plain text. Long ago, I spent > a great deal of time getting my editor to do semi-wysiwyg TeX maths > (work later incorporated into x-symbol), but actually it's a waste of > time and I've given up. A fast input method is to use text substitutions together with a Unicode-capable editor generating UTF-8. Then use LuaTeX together with ConTeXt or LaTeX/unicode-math. 
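The substitution approach described here can be sketched as a plain mapping from TeX control sequences to Unicode characters. The table entries below are illustrative stand-ins, not the actual thousand-entry set mentioned in the thread:

```python
# Hypothetical, minimal substitution table; a real set covering the
# Unicode math alphanumerics would have over a thousand entries.
SUBSTITUTIONS = {
    "\\alpha":     "\u03B1",      # GREEK SMALL LETTER ALPHA
    "\\mathbb{R}": "\u211D",      # DOUBLE-STRUCK CAPITAL R
    "\\mathbf{A}": "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A
    "\\to":        "\u2192",      # RIGHTWARDS ARROW
}

def substitute(text: str) -> str:
    # Apply longer control sequences first so that a short sequence
    # (e.g. "\\to") cannot clobber a longer one that contains it.
    for seq in sorted(SUBSTITUTIONS, key=len, reverse=True):
        text = text.replace(seq, SUBSTITUTIONS[seq])
    return text

print(substitute("f: \\mathbb{R} \\to \\mathbb{R}"))  # f: ℝ → ℝ
```

An interactive editor would trigger such replacements as you type; this batch version just shows the mapping itself.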
On MacOS, it works interactively: when a matching input string is detected, it is replaced. It does not take a long time to design such a text-substitution set: I made one for all Unicode math letters, more than a thousand. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 16:01:56 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 22:01:56 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> <2869E093-6788-40DA-A646-F2DCBB9CF778@telia.com> Message-ID: On 2016-10-10, Hans Åberg wrote: >> On 10 Oct 2016, at 22:15, Julian Bradfield wrote: >> What do you mean? The IPA in narrow transcription is intended to >> provide as detailed a description of sounds as a human mind can >> manage. It doesn't care whether you're describing differences between >> languages or differences within languages (a distinction that is not >> in any case well defined). > > It is designed for phonemic transcriptions, cf., > https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet It *was* designed, in 1870-something. Try reading the Handbook of the IPA. It contains many samples of languages transcribed both in a broad phonemic transcription appropriate for the language, and in a narrow phonetic transcription which should allow a competent phonetician to produce an understandable and reasonably accurate rendition of the passage. Indeed, a couple of decades ago, I participated in a public engagement event in which a few of us attempted to do exactly that. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From everson at evertype.com Mon Oct 10 16:06:29 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 22:06:29 +0100 Subject: Why incomplete subscript/superscript alphabet ? 
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> Message-ID: <3D7294A8-5817-41F0-9C85-C7E3CCFE0C2A@evertype.com> On 10 Oct 2016, at 21:58, Julian Bradfield wrote: > On 2016-10-10, Michael Everson wrote: >> I can't use LaTeX notation. I don't use that proprietary system. And don't you dare tell me that I am benighted, or using Word. Neither applies. > > That's an interesting use of "proprietary" you have there, but I > suppose with your Alician interests, Humpty Dumpty's attitude to words > may have rubbed off on you! What *do* you mean? You have to have special knowledge and special software to use it. Apparently it's used to good effect in mathematics, though a great deal of TeX material appears printed and has an obvious "TeX" feel which to me looks rather ugly. In any case, TeX guys love TeX. And then there's the rest of us. >> I have an edition of the Bible I'm setting. Big book. Verse numbers. I like these to be superscript so they're unobtrusive. Damn right I use the superscript characters for these. I can process the text, export it for concordance processing, whatever, and those out-of-text notations DON'T get converted to regular digits, which I need. > > If you were doing it properly, the text would be stored in a suitable > markup, as would the verse numbers, and both the typesetting and the > concordance processing would deal with them appropriately. "Properly", sayeth the computer programmer. Sorry, Julian, but I use professional tools to typeset, and your disdain for that process isn't going to change that industry. This "suitable markup" business you're talking about is not something people outside of ivory towers actually use. > No need for Unicode hacks. Unicode has superscript digits, preserved in plain text. Do I need to do calculations with these? No. 
Do I need them to be identical to ASCII digits? No. I need them to be persistent, searchable if necessary (yes, the search is inconvenient vis-à-vis the keyboard), and preserved in plain text. Because if they're not preserved in plain text, then I may have to convert them again, which is tedious and inconvenient. Characters are safer than markup, in an instance like this. That's not using Unicode for a hack. That's using Unicode to preserve distinctions in plain text. Michael From haberg-1 at telia.com Mon Oct 10 16:20:11 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Mon, 10 Oct 2016 23:20:11 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <20161010124240.665a7a7059d7ee80bb4d670165c8327d.61fa206381.wbe@email03.godaddy.com> <0014AEA5-7A0B-41B4-9C1D-FEF915AF39A4@telia.com> <2869E093-6788-40DA-A646-F2DCBB9CF778@telia.com> Message-ID: > On 10 Oct 2016, at 23:01, Julian Bradfield wrote: > > On 2016-10-10, Hans Åberg wrote: >>> On 10 Oct 2016, at 22:15, Julian Bradfield wrote: >>> What do you mean? The IPA in narrow transcription is intended to >>> provide as detailed a description of sounds as a human mind can >>> manage. It doesn't care whether you're describing differences between >>> languages or differences within languages (a distinction that is not >>> in any case well defined). >> >> It is designed for phonemic transcriptions, cf., >> https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet > > It *was* designed, in 1870-something. Try reading the Handbook of the > IPA. It contains many samples of languages transcribed both in a broad > phonemic transcription appropriate for the language, and in a narrow > phonetic transcription which should allow a competent phonetician to > produce an understandable and reasonably accurate rendition of the > passage. Indeed, a couple of decades ago, I participated in a public > engagement event in which a few of us attempted to do exactly that. 
But the alveolar clicks require an extension. From jcb+unicode at inf.ed.ac.uk Mon Oct 10 16:36:49 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Mon, 10 Oct 2016 22:36:49 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> <3D7294A8-5817-41F0-9C85-C7E3CCFE0C2A@evertype.com> Message-ID: On 2016-10-10, Michael Everson wrote: > On 10 Oct 2016, at 21:58, Julian Bradfield wrote: >> That's an interesting use of "proprietary" you have there, but I .... > You have to have special knowledge and special software to use it. That's not what "proprietary" means. To quote the OED (which, by the way, is produced by an actual professional publisher, and is stored in XML, unless I'm badly mistaken), "proprietary" means "Of a product, esp. a drug or medicine: of which the manufacture or sale is restricted to a particular person or persons; (in later use) spec. marketed under and protected by patent or registered trade name." If you're typesetting your bible with no special software and no special knowledge, then you must be doing it by hand in cold metal. Somehow, I don't think you are. I suspect you're using software that is owned by somebody and marketed and protected. > Apparently it's used to good effect in mathematics, though a great > deal of TeX material appears printed and has an obvious "TeX" feel It's for printing, so of course it appears printed. The obvious TeX feel is the result of using the default style, which arises from Knuth's personal taste in mathematical typesetting, with Lamport's (abominable) taste in structural layout on top. There are tens of thousands of journals and books produced with LaTeX, in hundreds or thousands of styles. 
Among publishers you may have heard of, Addison-Wesley, CUP, Elsevier, John Benjamins, OUP, Princeton UP, Wiley all use LaTeX for a significant proportion of their output. They're all professionals. > "Properly", sayeth the computer programmer. Sorry, Julian, but I use professional tools to typeset, and your disdain for that process isn't going to change that industry. This "suitable markup" business you're talking about is not something people outside of ivory towers actually use. You're a dilettante publisher using low-end professional graphic design tools to publish. InDesign, for example, is far easier to use for far greater effect than any LaTeX-based system if you're producing magazines or posters; but it's far worse if you care about the content. > That's not using Unicode for a hack. That's using Unicode to preserve distinctions in plain text. Only because you've a priori decided that superscripts are plain text instead of extra-textual decorations. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From doug at ewellic.org Mon Oct 10 16:39:33 2016 From: doug at ewellic.org (Doug Ewell) Date: Mon, 10 Oct 2016 14:39:33 -0700 Subject: Why incomplete subscript/superscript alphabet =?UTF-8?Q?=3F?= Message-ID: <20161010143933.665a7a7059d7ee80bb4d670165c8327d.5af1a28760.wbe@email03.godaddy.com> Hans Åberg wrote: >>>> What do you mean? The IPA in narrow transcription is intended to >>>> provide as detailed a description of sounds as a human mind can >>>> manage. >>> >>> It is designed for phonemic transcriptions, cf., >>> https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet >> >> It *was* designed, in 1870-something. Try reading the Handbook of the >> IPA. 
It contains many samples of languages transcribed both in a >> broad phonemic transcription appropriate for the language, and in a >> narrow phonetic transcription which should allow a competent >> phonetician to produce an understandable and reasonably accurate >> rendition of the passage. > > But the alveolar clicks require an extension. You've found ONE instance of non-distorted speech where IPA does not distinguish between two allophones. That is very different from saying that IPA is unsuitable for phonetic transcription. -- Doug Ewell | Thornton, CO, US | ewellic.org From everson at evertype.com Mon Oct 10 16:42:00 2016 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Oct 2016 22:42:00 +0100 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> <3D7294A8-5817-41F0-9C85-C7E3CCFE0C2A@evertype.com> Message-ID: On 10 Oct 2016, at 22:36, Julian Bradfield wrote: > You're a dilettante publisher using low-end professional graphic > design tools to publish. ?? Best, Michael Everson http://evertype.com/catalogue.html From frederic.grosshans at gmail.com Mon Oct 10 16:49:24 2016 From: frederic.grosshans at gmail.com (=?UTF-8?B?RnLDqWTDqXJpYyBHcm9zc2hhbnM=?=) Date: Mon, 10 Oct 2016 21:49:24 +0000 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <2BB69E14-7238-49C0-AB41-2B648320780E@telia.com> Message-ID: On Mon, 10 Oct 2016 at 22:32, Julian Bradfield wrote: > On 2016-10-10, Hans Åberg wrote: > > It is possible to write math just using ASCII and TeX, which was the > original idea of TeX. Is that what you want for linguistics? > > I don't see the need to do everything in plain text. 
Long ago, I spent > a great deal of time getting my editor to do semi-wysiwyg TeX maths > (work later incorporated into x-symbol), but actually it's a waste of > time and I've given up. Working mathematicians know LaTeX and its control > sequences. Even my 12-year-old uses LaTeX control sequences to > communicate with his online maths courses. > I am a physicist regularly using LaTeX. I actually use a LaTeX-based input method to have plain TeX math when possible. It makes TeX files and emails more readable, especially when the equations are a bit long. It also saves characters when I livetweet scientific talks (like here https://twitter.com/fgrosshans/status/780715752752029696). The possibility of having reasonable plain-text math also helps to get reasonable results when copy-pasting an equation from a PDF onto a MathJax-enabled website. Of course, full plain-text math is not possible, and I don't think anyone reasonable wants a plain-text solution even for something as common as nested exponents and indices. Rich text formats like TeX have their use case, but that doesn't mean plain-text math, with all its limitations, is useless. Frédéric From haberg-1 at telia.com Mon Oct 10 17:05:20 2016 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Tue, 11 Oct 2016 00:05:20 +0200 Subject: Why incomplete subscript/superscript alphabet ? In-Reply-To: <20161010143933.665a7a7059d7ee80bb4d670165c8327d.5af1a28760.wbe@email03.godaddy.com> References: <20161010143933.665a7a7059d7ee80bb4d670165c8327d.5af1a28760.wbe@email03.godaddy.com> Message-ID: > On 10 Oct 2016, at 23:39, Doug Ewell wrote: > > Hans Åberg wrote: > >>>>> What do you mean? The IPA in narrow transcription is intended to >>>>> provide as detailed a description of sounds as a human mind can >>>>> manage. 
>>>> >>>> It is designed for phonemic transcriptions, cf., >>>> https://en.wikipedia.org/wiki/History_of_the_International_Phonetic_Alphabet >>> >>> It *was* designed, in 1870-something. Try reading the Handbook of the >>> IPA. It contains many samples of languages transcribed both in a >>> broad phonemic transcription appropriate for the language, and in a >>> narrow phonetic transcription which should allow a competent >>> phonetician to produce an understandable and reasonably accurate >>> rendition of the passage. >> >> But the alveolar clicks require an extension. > > You've found ONE instance of non-distorted speech where IPA does not > distinguish between two allophones. That is very different from saying > that IPA is unsuitable for phonetic transcription. There are others: for example, in Dutch, the letter "v" in "van" is pronounced in dialects in continuous variation between [f] and [v], depending on the timing of the fricative and the following vowel. It has become popular in some dictionaries to use [d] in AmE where BrE uses [t], but when listening, it sounds more like a [t] drawn towards [d]. The Merriam-Webster dictionary has its own system trying to capture variations. One does not really speak separate consonants and vowels; they slide over and adapt. Describing that is pretty tricky.
From mark at kli.org Mon Oct 10 17:06:48 2016 From: mark at kli.org (Mark E. Shoulson) Date: Mon, 10 Oct 2016 18:06:48 -0400 Subject: Why incomplete subscript/superscript alphabet ?
In-Reply-To: References: <107E881C-5B0F-42B6-9C32-91F7FB2CFEC4@telia.com> <6E9FEDAB-D75B-4831-9036-E67732741E1E@telia.com> <31ECF7B9-0C54-4C5C-A74A-0880ED5F4787@evertype.com> <14408930-1A4C-48BB-9CAC-2365620AD9C4@evertype.com> <3D7294A8-5817-41F0-9C85-C7E3CCFE0C2A@evertype.com> Message-ID: <36a17a8b-57d9-8f2f-53c9-2dcf8de69aba@kli.org> On 10/10/2016 05:36 PM, Julian Bradfield wrote: > On 2016-10-10, Michael Everson wrote: > >> Apparently it's used to good effect in mathematics, though a great >> deal of TeX material appears printed and has an obvious "TeX" feel > It's for printing, so of course it appears printed. The obvious TeX > feel is the result of using the default style, which arises from > Knuth's personal taste in mathematical typesetting, with Lamport's > (abominable) taste in structural layout on top. There are tens of > thousands of journals and books produced with LaTeX, in hundreds or > thousands of styles. > > Among publishers you may have heard of, Addison-Wesley, CUP, Elsevier, > John Benjamins, OUP, Princeton UP, Wiley all use LaTeX for a > significant proportion of their output. They're all professionals. > To me, the main "TeX" feel that TeX-printed things tend to share is Knuth's distinctive Computer Modern font, not necessarily structure. You can typeset amazing things in TeX (viz. the Comparing Torah that Michael published for me); limitations there are mostly of your own making. (I haven't really been able to keep up with this thread in general, though.) ~mark
From root at unicode.org Mon Oct 10 17:13:58 2016 From: root at unicode.org (Sarasvati) Date: Mon, 10 Oct 2016 17:13:58 -0500 Subject: Why incomplete subscript/superscript alphabet ? Message-ID: <201610102213.u9AMDwmq013813@sarasvati.unicode.org> Hello everyone. The level of discourse in this thread is beginning to deteriorate. Please rein in some of the excesses or the thread may have to be terminated.
Regards from your, -- Sarasvati
From jcb+unicode at inf.ed.ac.uk Tue Oct 11 03:39:06 2016 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Tue, 11 Oct 2016 09:39:06 +0100 (BST) Subject: Why incomplete subscript/superscript alphabet ? References: <20161010143933.665a7a7059d7ee80bb4d670165c8327d.5af1a28760.wbe@email03.godaddy.com> Message-ID: On 2016-10-10, Hans Åberg wrote: > There are others, for example, in Dutch, the letter "v" in "van" > is pronounced in dialects in continuous variation between [f] and > [v] depending on the timing of the fricative and the following > vowel. Continuous variation is a universal truth of language. The IPA has mechanisms for describing crude differences in voicing, but if you're working at the level of, say, a difference between 0 ms and 20 ms in average voice onset time, you need to be using numbers and instruments, not symbols and the ear. The most extreme attempt I know to extend the IPA to fine phonetic detail is Canepari's book, with lots of symbols not in Unicode (I think... it's a long while since I looked at it). It's completely ignored, because the level of detail he attempts to represent is well beyond the reproducible abilities of phoneticians unaided by acoustic analysis. > It has become popular in some dictionaries to use [d] in the > AmE where the BrE uses [t], but when listening, it sounds more like > a [t] drawn towards [d]. Are you talking about American flapping, where a /t/ between vowels is realized as [ɾ]? I'd be surprised if any very serious dictionaries use [d] to represent that - can you give an example? > One does not really speak separate consonants and vowels, but they slide over and adapt. Describing that is pretty tricky. This is also a universal truth of language! But it doesn't stop us making sensible abstractions, and notating them symbolically. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
From c933103 at gmail.com Tue Oct 11 04:21:12 2016 From: c933103 at gmail.com (gfb hjjhjh) Date: Tue, 11 Oct 2016 17:21:12 +0800 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: After some research, there is already a MediaWiki extension named ids that does exactly what I asked about (https://www.mediawiki.org/wiki/Extension:Ids). The only problem is that ⿻ is still not yet supported by the system. Now the question is whether this extension can become something integrated into a font. 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke : > > On Aug 4, 2016, at 2:45 PM, gfb hjjhjh wrote: > > That Wikipedia page also has a section named "Ideographic Description > Sequences", which is exactly about forming sequences based on those ideographic > description characters > > > As I understand it, such sequences may provide a "description" of kanji > useful for some purposes, but are not sufficient to properly "render" them. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ruland at luckymail.com Tue Oct 11 07:54:54 2016 From: ruland at luckymail.com (Charlie Ruland) Date: Tue, 11 Oct 2016 14:54:54 +0200 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: <76fd2a2d-5097-6b9e-1a24-d9d607b8852e@luckymail.com> This MediaWiki extension reminds me of svghanzi.appspot.com/ . If you don't understand the Russian instructions, read Creating Characters by SVG by John Pasden. gfb hjjhjh wrote: > After some research, there is already a MediaWiki extension named > ids that does exactly what I asked about > (https://www.mediawiki.org/wiki/Extension:Ids). The only problem > is that ⿻ is still not yet supported by the system. Now the question is > whether this extension can become something integrated into a font.
> > 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke >: > > >> On Aug 4, 2016, at 2:45 PM, gfb hjjhjh > > wrote: >> >> That Wikipedia page also has a section named "Ideographic >> Description Sequences", which is exactly about forming sequences based on >> those ideographic description characters >> >> > > As I understand it, such sequences may provide a "description" of > kanji useful for some purposes, but are not sufficient to > properly "render" them. > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From verdy_p at wanadoo.fr Tue Oct 11 09:27:05 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Tue, 11 Oct 2016 16:27:05 +0200 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: Actually that extension for now only has data tuned for Traditional Chinese, and does not implement the full set of IDS mappings (not the complete Unicode repertoire), but it contains really many mappings for IDS strings that have no Unicode encoding. Only very few ideographic sources are used (not all those listed in the Unihan database), and only two "True" variants are supported (for some characters) in the database, but only one is returned by the current renderer implementation in Java. Some mappings exist in two versions: a generic one using some undecomposed strokes/parts (from the Unicode repertoire), and an expanded one where some strokes are further decomposed (but using Traditional Chinese rules). In many mappings, the two IDS are identical. The generic mapping is used to handle many cases using overstriking IDS decompositions (which are not further decomposed in the "expanded" IDS).
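The IDS strings discussed here follow a simple prefix grammar: each Ideographic Description Character (U+2FF0..U+2FFB, as of Unicode 9) is an operator whose operands follow it, with ⿲ (U+2FF2) and ⿳ (U+2FF3) taking three operands and the rest taking two. A minimal parsing sketch in Python (not the extension's actual code, which is written in Java):

```python
# Validate/parse an Ideographic Description Sequence.
# U+2FF2 and U+2FF3 are ternary operators; the other IDCs are binary.
ARITY = {chr(cp): 3 if cp in (0x2FF2, 0x2FF3) else 2
         for cp in range(0x2FF0, 0x2FFC)}

def parse_ids(chars, i=0):
    """Parse one IDS subtree starting at index i; return (tree, next_i)."""
    c = chars[i]
    if c in ARITY:
        node = [c]
        i += 1
        for _ in range(ARITY[c]):
            sub, i = parse_ids(chars, i)
            node.append(sub)
        return node, i
    return c, i + 1  # any non-IDC character is a leaf component

# "⿰氵工" describes 江 (U+6C5F): water radical beside 工.
tree, end = parse_ids(list("⿰氵工"))
assert end == 3 and tree == ["⿰", "氵", "工"]
```

A renderer like the one in the extension would then walk this tree, laying out each subtree in the subregion that its operator assigns to it.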
The database it contains is still in development, though, and its schema cannot really handle locale-specific variants, or additional variants that are encoded in Unicode, unless they have a mapping in the CNS encoding (the database contains a snapshot of the CNS-to-Big5 and CNS-to-Unicode conversion tables, but they are not indexed and probably not used by the Java engine; I suppose they are there only to allow registering the composite glyphs that have been mapped to an IDS). All IDS are then mapped into a dozen or so virtual fonts (with a numeric id between 0 and 13) and a glyph ID (assigned in the PUA range of the BMP; font 0 is special, as it contains all the base glyphs needed to compose all other virtual fonts). But for now this database contains no instruction for more precise placement or resizing of components: the placement is performed using generic rules from the IDS itself (plus some rules implemented in the Java code for adjusting specific strokes depending on their placement, and for adjusting the relative stroke weights in the composition), and that's probably why the overstriking IDS (with ⿻) cannot be processed: instead they are mapped directly to a NULL Unicode entry if needed, or left undecomposed in both the generic IDS and the extended IDS. It's interesting, though. But to adapt the code to Japanese or Korean, you'll need to extend the current schema, notably in the main table containing the list of all supported IDS (generic plus expanded), as it allows only a single mapping to Unicode (or NULL if there's no such encoding) and has no column for specifying a localisation variant or ideographic source (such as a dictionary, book, regional standard, or epoch). ---- Note that when viewing these IDS strings, I've seen that Chrome really has a problem displaying the IDS symbols (probably because of incorrect autohinting): the dotted squares become random forms at usual font sizes (12px or less) and just display garbage.
It may be caused by some fonts on my Windows 10 system. You need to zoom in on the page to get a correct view of IDS strings. When looking into the Chrome console, I see that the symbols are taken from a couple of system fonts (provided by Windows). Normally the IDS symbols are very simple in design, and even if they are dotted and can be quirky to adjust at small sizes (to keep the dots from disappearing or merging into line segments), my opinion is that hinting for these symbols is simply bad in Windows fonts, or uses some proprietary techniques in the OpenType renderer of Windows that are not supported by the font renderer of Chrome. Those symbols should be correct at the most common font sizes used on the web. In plain-text editors, the glyphs are correct at reasonable font sizes, but the top dotted border of these symbols is most often truncated (probably extending too high above the line height, and probably using incorrect metrics). 2016-10-11 11:21 GMT+02:00 gfb hjjhjh : > After some research, there is already a MediaWiki extension named ids > that does exactly what I asked about (https://www.mediawiki.org/wiki/Extension:Ids). The only problem is that ⿻ is still not yet > supported by the system. Now the question is whether this extension can become > something integrated into a font. > > 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke : >> >> On Aug 4, 2016, at 2:45 PM, gfb hjjhjh wrote: >> >> That Wikipedia page also has a section named "Ideographic Description >> Sequences", which is exactly about forming sequences based on those ideographic >> description characters >> >> >> As I understand it, such sequences may provide a "description" of kanji >> useful for some purposes, but are not sufficient to properly "render" them. >> > > -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From bobbytung at wanderer.tw Tue Oct 11 07:39:52 2016 From: bobbytung at wanderer.tw (=?UTF-8?B?6JGj56aP6IiI?=) Date: Tue, 11 Oct 2016 20:39:52 +0800 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: <4403576340395054382@unknownmsgid> The ids extension can dynamically compose parts with IDS into an SVG displayed on MediaWiki. I know the team that implemented this function in Taiwan. They are working on a Taiwanese dictionary containing several Hanzi not encoded in Unicode. Bobby Tung On 11 Oct 2016 at 5:27 PM, gfb hjjhjh wrote: After some research, there is already a MediaWiki extension named ids that does exactly what I asked about (https://www.mediawiki.org/wiki/Extension:Ids). The only problem is that ⿻ is still not yet supported by the system. Now the question is whether this extension can become something integrated into a font. 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke : > > On Aug 4, 2016, at 2:45 PM, gfb hjjhjh wrote: > > That Wikipedia page also has a section named "Ideographic Description > Sequences", which is exactly about forming sequences based on those ideographic > description characters > > > As I understand it, such sequences may provide a "description" of kanji > useful for some purposes, but are not sufficient to properly "render" them. > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From dzo at bisharat.net Tue Oct 11 10:52:20 2016 From: dzo at bisharat.net (dzo at bisharat.net) Date: Tue, 11 Oct 2016 15:52:20 +0000 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on Message-ID: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> Of possible interest - I noted recently the continued use of "3" for "ɛ" in tweets & some web content about a pair of Ghanaian plays whose titles include the Ga language term "Wogbɛ jɛkɛ."
See http://niamey.blogspot.com/2016/10/wogb-jk-ghanaian-language-input-support.html The problem is input systems, not availability of fonts as it once was. Keyboard layouts exist for Ga and other Ghanaian languages, and these enable typing the needed extended Latin characters. But a number of them, including possibly all for mobile devices, work by substituting selected key assignments, which in the case of multilingual text would apparently mean switching keyboards to accommodate characters not present in both/all languages used. Not ideal. What are the possibilities of extended keyboard options on mobile devices for extended Latin characters to facilitate multilingual text composition? What is current thinking / practice wrt expanding virtual keyboards? This gets beyond Unicode proper to ISO/IEC 9995 and perhaps ISO/IEC 14755, so may be beyond the scope of the list. Any responses off-list I can summarize if of wider interest. Thanks in advance for any info. Don Osborn Sent via BlackBerry by AT&T
From prosfilaes at gmail.com Tue Oct 11 11:26:08 2016 From: prosfilaes at gmail.com (David Starner) Date: Tue, 11 Oct 2016 16:26:08 +0000 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> References: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> Message-ID: On Tue, Oct 11, 2016 at 8:55 AM wrote: > What is current thinking / practice wrt expanding virtual keyboards? > I'm just a user here, and that of the English and Esperanto keyboards on Android, but given that swipe input and autocorrect both depend on knowing what language is being entered, it seems unlikely that virtual keyboards are going to evolve towards being better at multilingual input. -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From doug at ewellic.org Tue Oct 11 11:48:00 2016 From: doug at ewellic.org (Doug Ewell) Date: Tue, 11 Oct 2016 09:48:00 -0700 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on Message-ID: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> Don Osborn wrote: > What are the possibilities of extended keyboard options on mobile > devices for extended Latin characters to facilitate multilingual text > composition? What is current thinking / practice wrt expanding virtual > keyboards? > > This gets beyond Unicode proper to ISO/IEC 9995 and perhaps ISO/IEC > 14755, so may be beyond the scope of the list. Any responses off-list > I can summarize if of wider interest. You mentioned mobile devices, but also mentioned ISO/IEC 9995 and 14755, which seem to deal primarily with computer keyboards. On Windows, John Cowan's Moby Latin keyboard [1] allows the input of more than 800 non-ASCII characters, including the two mentioned in your post (ɔ and ɛ): AltGr+p, o 0254 LATIN SMALL LETTER OPEN O AltGr+p, e 025B LATIN SMALL LETTER OPEN E Moby Latin is a strict superset of the standard U.S. English keyboard; that is, none of the standard keystrokes were redefined, unlike keyboards such as United States-International, which tend to redefine keys for ASCII characters that look like diacritical marks, making adoption difficult. There are also versions of Moby based on the standard U.K. keyboard.
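Sequences like AltGr+p, o behave as chained prefix keys: the first keystroke selects a table, the second selects a character. A minimal sketch of that lookup in Python (not Moby Latin's actual implementation, which ships as a Windows keyboard layout; only the two mappings quoted in the message are included, and the table name is illustrative):

```python
# Two-keystroke prefix-key lookup: AltGr+p selects this table,
# then a second key selects the extended Latin character.
PREFIX_P = {  # reached via AltGr+p (assumed table name)
    "o": "\u0254",  # LATIN SMALL LETTER OPEN O, ɔ
    "e": "\u025B",  # LATIN SMALL LETTER OPEN E, ɛ
}

def type_sequence(prefix_table, key):
    """Resolve a two-keystroke sequence to a character, or None."""
    return prefix_table.get(key)

assert type_sequence(PREFIX_P, "o") == "ɔ"
assert type_sequence(PREFIX_P, "e") == "ɛ"
```

Because the prefix key is otherwise unused, every standard keystroke keeps its usual meaning, which is what makes this approach a strict superset of the base layout.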
[1] http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html -- Doug Ewell | Thornton, CO, US | ewellic.org
From charupdate at orange.fr Wed Oct 12 01:45:33 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 12 Oct 2016 08:45:33 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> References: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> Message-ID: <436090000.1081.1476254733252.JavaMail.www@wwinf1h22> On Tue, 11 Oct 2016 15:52:20 +0000, dzo_at_bisharat.net wrote: > Of possible interest - I noted recently the continued use of "3" for "ɛ" in tweets > & some web content about a pair of Ghanaian plays whose titles include the Ga > language term "Wogbɛ jɛkɛ." > > See http://niamey.blogspot.com/2016/10/wogb-jk-ghanaian-language-input-support.html > > The problem is input systems, not availability of fonts as it once was. Keyboard > layouts exist for Ga and other Ghanaian languages, and these enable typing needed > extended Latin characters. But a number of them, including possibly all for mobile > devices, work by substituting selected key assignments, which in the case of > multilingual text would apparently mean switching keyboards to accommodate > characters not present in both/all languages used. Not ideal. > > What are the possibilities of extended keyboard options on mobile devices for > extended Latin characters to facilitate multilingual text composition? What is > current thinking / practice wrt expanding virtual keyboards? > > This gets beyond Unicode proper to ISO/IEC 9995 and perhaps ISO/IEC > 14755, so may be beyond the scope of the list. Any responses off-list > I can summarize if of wider interest.
One way to deal with increased sets of directly accessed letters is to map the extended letters on the digits row, and to toggle between a language layout without directly accessed digits and an ASCII layout, and to do this not via the system facility, but with a hard-coded toggle on key E00. This way I'm catering for French, [1] and I plan to derive a Malian layout from it, but for Ga one has to start from the US-English layout, except where French has been adopted in Ghana. I see no particular challenges in starting from whatever layout to implement this (including Vietnamese and Lithuanian, where digits are *already* on the third level), when the users are interested in a change for enhancement; but in adding an extra row to a cellphone on-screen keyboard I do see several. Kind regards, Marcel [1] http://dispoclavier.com/#i0
From zelpahd at gmail.com Wed Oct 12 05:58:30 2016 From: zelpahd at gmail.com (zelpa) Date: Wed, 12 Oct 2016 21:58:30 +1100 Subject: Emoji end goal Message-ID: So what exactly is the end goal for emoji? First we had the Fitzpatrick skin modifiers, now there's the proposal for gendered emoji sequences using ZWJ. There was even the proposal for the hair colour modifier in TR 53. So what is the true end goal? Will we one day be able to display our Fallout 4 character with a single emoji and 60 modifiers? And honestly, who is asking for these additions? Does anybody WANT a hair colour modifier? Seems to me like the consortium might just be pandering to a few silly requests (by people who have no actual idea what unicode is) to get media attention. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From leoboiko at gmail.com Wed Oct 12 08:47:01 2016 From: leoboiko at gmail.com (Leonardo Boiko) Date: Wed, 12 Oct 2016 10:47:01 -0300 Subject: Emoji end goal In-Reply-To: References: Message-ID: Yes, the end goal of the Unicode Consortium is media attention by way of virtue signaling.
For every online article about emoji modifiers, each individual member of the Consortium earns a fifty-Euro bonus from our masters, the global feminist cultural-Marxist Jewish conspiracy, for our support in propagating political correctness and ultimately implementing the UN's One World Government. In fact, the end goal for emoji (as originally planned by Gramsci and Adorno in UAX #1922) is to be the mandatory Newspeak-style writing system of the NWO, so as to brainwash citizens away from scientific truths like race realism or the sociobiology of gender. As soon as WOMAN + ZWJ + President Hillary finish assassinating the last remaining ASCII reactionaries, full emoji deployment will be in order, and we'll indoctrinate every child to internalize standard Communist dogma such as "all ethnicities deserve equal representation in media" or "all combinations of genders and professions should be considered equally valid". The lead experiments at Tumblr and Instagram were very successful, proving that emoji have great potential as tools of indoctrination. 2016/10/12 10:02 "zelpa" : > So what exactly is the end goal for emoji? First we had the Fitzpatrick > skin modifiers, now there's the proposal for gendered emoji sequences using > ZWJ. There was even the proposal for the hair colour modifier in TR 53. So > what is the true end goal? Will we one day be able to display our Fallout 4 > character with a single emoji and 60 modifiers? And honestly, who is asking > for these additions? Does anybody WANT a hair colour modifier? Seems to me > like the consortium might just be pandering to a few silly requests (by > people who have no actual idea what unicode is) to get media attention. > -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From zelpahd at gmail.com Wed Oct 12 08:55:39 2016 From: zelpahd at gmail.com (zelpa) Date: Thu, 13 Oct 2016 00:55:39 +1100 Subject: Emoji end goal In-Reply-To: References: Message-ID: > "all ethnicities deserve equal representation in media" or "all combinations of genders and professions should be considered equally" I wasn't aware that bald yellow people were a race, sorry. If anything, adding the skin tone modifiers has made me feel LESS included; what if I don't fit into one of the 5 categories? What if I drank too much colloidal silver and have blue skin? Sure would be nice to be able to express an emotion without also expressing my gender and race. What a wacky world that would be. And as for the professions? As I've said on the mailing list in the past, the current proposal makes it IMPOSSIBLE to display certain professions as gender-neutral. Is that really a step forward? Can we not just have gender-neutral, race-neutral emoji? Is that really too much to ask? On Thu, Oct 13, 2016 at 12:47 AM, Leonardo Boiko wrote: > Yes, the end goal of the Unicode Consortium is media attention by way of > virtue signaling. For every online article about emoji modifiers, each > individual member of the Consortium earns a fifty-Euro bonus from our > masters, the global feminist cultural-Marxist Jewish conspiracy, for our > support in propagating political correctness and ultimately implementing > the UN's One World Government. In fact, the end goal for emoji (as originally > planned by Gramsci and Adorno in UAX #1922) is to be the mandatory > Newspeak-style writing system of the NWO, so as to brainwash citizens away > from scientific truths like race realism or the sociobiology of gender.
As > soon as WOMAN + ZWJ + President Hillary finish assassinating the last > remaining ASCII reactionaries, full emoji deployment will be in order, and > we'll indoctrinate every child to internalize standard Communist dogma such > as "all ethnicities deserve equal representation in media" or "all > combinations of genders and professions should be considered equally > valid". The lead experiments at Tumblr and Instagram were very successful, > proving that emoji have great potential as tools of indoctrination. > > 2016/10/12 10:02 "zelpa" : > >> So what exactly is the end goal for emoji? First we had the Fitzpatrick >> skin modifiers, now there's the proposal for gendered emoji sequences using >> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >> what is the true end goal? Will we one day be able to display our Fallout 4 >> character with a single emoji and 60 modifiers? And honestly, who is asking >> for these additions? Does anybody WANT a hair colour modifier? Seems to me >> like the consortium might just be pandering to a few silly requests (by >> people who have no actual idea what unicode is) to get media attention. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From 637275 at gmail.com Wed Oct 12 10:17:22 2016 From: 637275 at gmail.com (Rebecca T) Date: Wed, 12 Oct 2016 11:17:22 -0400 Subject: Emoji end goal In-Reply-To: References: Message-ID: Well, I think it's definitely important to have representation and expression for people of all skin tones and genders even in things like emoji. I think we're rapidly reaching a limit for variation sequences, and I'm personally not begging for hair color modifiers (although I would welcome them). I do worry a bit about the burden of supporting emoji on new systems. Drawing thousands (not that anyone can even count how many emoji there are) is a significant burden on developers creating new systems, and the alternative (tofu) isn't appealing.
There is Symbola (which leaves something to be desired, to say the least), and the graphical solutions, like Apple's image-based or Microsoft's layered-vector approach, have non-trivial implementations (stuff I wouldn't want to take care of if I were creating a new system). I guess what I'm saying is: does anyone want to extend Unifont into the astral planes? On Wednesday, October 12, 2016, zelpa wrote: >> So what exactly is the end goal for emoji? First we had the Fitzpatrick >> skin modifiers, now there's the proposal for gendered emoji sequences using >> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >> what is the true end goal? Will we one day be able to display our Fallout 4 >> character with a single emoji and 60 modifiers? And honestly, who is asking >> for these additions? Does anybody WANT a hair colour modifier? Seems to me >> like the consortium might just be pandering to a few silly requests (by >> people who have no actual idea what unicode is) to get media attention. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From oren.watson at gmail.com Wed Oct 12 11:17:36 2016 From: oren.watson at gmail.com (Oren Watson) Date: Wed, 12 Oct 2016 12:17:36 -0400 Subject: Emoji end goal Message-ID: I am the maker of a similar project to Unifont, albeit a work in progress (see link below), and I certainly won't be supporting anything more than gender-neutral, race-neutral emoji. This is due to technical considerations: I don't plan on having colors in my font. The GNU Unifont project already has many emoji, but they also are not colored. On the other hand, emoji are far from the most technically challenging category of characters in Unicode. http://www.orenwatson.be/fontdemo.htm On Wed, Oct 12, 2016 at 11:17 AM, Rebecca T <637275 at gmail.com> wrote: > Well, I think it's definitely important to have representation and > expression for people of all skin tones and genders even in things like > emoji.
> > I think we're rapidly reaching a limit for variation sequences, and I'm > personally not begging for hair color modifiers (although I would welcome > them). > > I do worry a bit about the burden of supporting emoji on new systems. > Drawing thousands (not that anyone can even count how many emoji there are) > is a significant burden on developers creating new systems, and the > alternative (tofu) isn't appealing. There is Symbola (which leaves > something to be desired, to say the least), and the graphical solutions, > like Apple's image-based or Microsoft's layered-vector approach, have > non-trivial implementations (stuff I wouldn't want to take care of if I were > creating a new system). > > I guess what I'm saying is: does anyone want to extend Unifont into the > astral planes? > > On Wednesday, October 12, 2016, zelpa wrote: > >> So what exactly is the end goal for emoji? First we had the Fitzpatrick >> skin modifiers, now there's the proposal for gendered emoji sequences using >> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >> what is the true end goal? Will we one day be able to display our Fallout 4 >> character with a single emoji and 60 modifiers? And honestly, who is asking >> for these additions? Does anybody WANT a hair colour modifier? Seems to me >> like the consortium might just be pandering to a few silly requests (by >> people who have no actual idea what unicode is) to get media attention. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From doug at ewellic.org Wed Oct 12 11:31:20 2016 From: doug at ewellic.org (Doug Ewell) Date: Wed, 12 Oct 2016 09:31:20 -0700 Subject: Emoji end goal Message-ID: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> Leonardo Boiko wrote: Gosh, even I wouldn't have gone that far.
-- Doug Ewell | Thornton, CO, US | ewellic.org
From 637275 at gmail.com Wed Oct 12 11:56:01 2016 From: 637275 at gmail.com (Rebecca T) Date: Wed, 12 Oct 2016 12:56:01 -0400 Subject: Emoji end goal In-Reply-To: References: Message-ID: Sure, and kanji have romanisations, but that doesn't make the Latin alphabet language-neutral. And yes, emoji were supposed to be language-neutral, but all the implementers made them default to male. I think you have an *argument* with skin-tone neutrality, but I think you'd be hard-pressed to find any POC who think the Fitzpatrick modifiers were a mistake. Also, the "what if my skin was blue" argument is a red herring: nobody has blue skin, so it's a moot point. However, if you do find yourself drinking silver, I suggest U+1F922 🤢 Nauseated Face. On Wednesday, October 12, 2016, zelpa wrote: > >"all ethnicities deserve equal representation in media" or "all > combinations of genders and professions should be considered equally" > I wasn't aware that bald yellow people were a race, sorry. If anything, > adding the skin tone modifiers has made me feel LESS included; what if I > don't fit into one of the 5 categories? What if I drank too much colloidal > silver and have blue skin? Sure would be nice to be able to express an > emotion without also expressing my gender and race. What a wacky world > that would be. And as for the professions? As I've said on the mailing list > in the past, the current proposal makes it IMPOSSIBLE to display certain > professions as gender-neutral. Is that really a step forward? Can we not > just have gender-neutral, race-neutral emoji? Is that really too much to > ask? > > > On Thu, Oct 13, 2016 at 12:47 AM, Leonardo Boiko > wrote: > >> Yes, the end goal of the Unicode Consortium is media attention by way of >> virtue signaling.
For every online article about emoji modifiers, each >> individual member of the Consortium earns a fifty-Euro bonus from our >> masters, the global feminist cultural-Marxist Jewish conspiracy, for our >> support in propagating political correctness and ultimately implementing >> ONU's One World Government. In fact, the end goal for emoji (as originally >> planned by Gramsci and Adorno in UAX #1922) is to be the mandatory >> Newspeak-style writing system of the NWO, so as to brainwash citizens away >> from scientific truths like race realism or the sociobiology of gender. As >> soon as WOMAN+ ZWJ+President Hillary finish assassinating the last >> remaining ASCII reactionaries, full emoji deployment will be in order, and >> we'll indoctrinate every child to internalize standard Communist dogma such >> as "all ethnicities deserve equal representation in media" or "all >> combinations of genders and professions should be considered equally >> valid". The lead experiments at Tumblr and Instagram were very successful, >> proving that emoji have great potential as tools of indoctrination. >> >> 2016/10/12 10:02 "zelpa" : >> >>> So what exactly is the end goal for emoji? First we had the fitzpatrick >>> skin modifiers, now there's the proposal for gendered emoji sequences using >>> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >>> what is the true end goal? Will we one day be able to display our Fallout 4 >>> character with a single emoji and 60 modifiers? And honestly, who is asking >>> for these additions? Does anybody WANT a hair colour modifier? Seems to me >>> like the consortium might just be pandering to a few silly requests (by >>> people who have no actual idea what unicode is) to get media attention. >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
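[Editor's note: stepping back from the satire, the mechanics being debated in this thread are simple at the codepoint level. A skin-tone modifier is just a codepoint that immediately follows a base emoji, and a gendered profession is a ZWJ sequence. A minimal Python illustration (this sketch is not from any message above; the second sequence is one of the gendered-profession ZWJ sequences proposed around this time):]

```python
# Base emoji + skin-tone modifier: U+1F44B WAVING HAND followed by
# U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4. Two codepoints, one glyph.
waving_medium = "\U0001F44B\U0001F3FD"

# Gendered profession as a ZWJ sequence: U+1F469 WOMAN, U+200D ZERO WIDTH
# JOINER, U+2695 STAFF OF AESCULAPIUS, U+FE0F VARIATION SELECTOR-16
# (the "woman health worker" sequence).
woman_health_worker = "\U0001F469\u200D\u2695\uFE0F"

def codepoints(s):
    """Show a string as space-separated U+XXXX labels."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

print(codepoints(waving_medium))        # U+1F44B U+1F3FD
print(codepoints(woman_health_worker))  # U+1F469 U+200D U+2695 U+FE0F
```

[A renderer that understands the sequence draws a single glyph; one that does not falls back to the individual pieces, which is why vendor support keeps coming up in this thread.]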
URL: From verdy_p at wanadoo.fr Wed Oct 12 12:09:58 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Wed, 12 Oct 2016 19:09:58 +0200 Subject: Emoji end goal In-Reply-To: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> References: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> Message-ID: I think that emojis at the minimum should all be displayable in isolation, without being required to form pseudo-ligatures or to use colors. Skin colors can still be displayed with a patchwork-like rectangle after it and could still use monochromatic pattern fills. The number of combinations is exploding and most of them are in fact not evident at all (or are highly culturally oriented). Emojis should remain simple, showing basic shapes, but I don't see why they could not differentiate a man or a woman, independently of the ligatures that may be created with them (using a completely invented ad hoc "orthography" that actually follows no standard at all and does not match cultural differences or the way we perceive the associations, which are limiting their semantic interpretation in a more and more restricted way). We certainly don't have enough history in using emojis for creating and standardizing such a pseudo-orthography. Emojis remain a new pseudo-language, but they reuse a typography based on visible symbols that have a long cultural tradition with other cultural meanings and many unexpected semantics that don't work with the current associations created. So in fact I only support very few associations: - associating two "Flag" pseudo-letters (but a rendering should still be OK if the emojis just show the actual letters within a left or right part of a frame for a flag, without attempting to combine them into an actual colored flag, which will need to evolve with time).
- associating skin color emojis after an emoji for a real human person or person face (no need for this in fictional characters or for coloring other parts such as hands, fingers, eyes, hair, nose...) In all cases, colors should always remain an option. Please keep emojis simple and always usable in isolation, leaving their interpretation and associations only to reading humans according to their local culture and social interactions. The way they are used now is in fact abusing the initial goal of Unicode encoding, which is to not encode according to specific languages or culture, and not break their basic semantics by mixing them into something that is not clearly separable and does not carry the same amount of semantics. 2016-10-12 18:31 GMT+02:00 Doug Ewell : > Leonardo Boiko wrote: > > > > Gosh, even I wouldn't have gone that far. > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From 637275 at gmail.com Wed Oct 12 13:45:00 2016 From: 637275 at gmail.com (Rebecca T) Date: Wed, 12 Oct 2016 14:45:00 -0400 Subject: Emoji end goal In-Reply-To: References: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> Message-ID: Agreed. I think a good response to "that'd _double_ the codepoints, so we should just add a ligature" is "if it would be such a burden to implement that you don't want to use space in the charts for what are, fundamentally, hundreds of *semantically different* ideographs, why are we dumping that burden onto vendors?" On Wed, Oct 12, 2016 at 1:09 PM, Philippe Verdy wrote: > I think that emojis at the minimum should all be displayable in isolation, > without being required to form pseudo-ligatures or to use colors. Skin > colors can still be displayed with a patchwork-like rectangle after it and > could still use monochromatic pattern fills.
The number of combinations is > exploding and most of them are in fact not evident at all (or are highly > culturally oriented). > > Emojis should remain simple, showing basic shapes, but I don't see why they > could not differentiate a man or a woman, independently of the ligatures > that may be created with them (using a completely invented ad hoc > "orthography" that actually follows no standard at all and does not match > cultural differences or the way we perceive the associations, which are > limiting their semantic interpretation in a more and more restricted way). > > We certainly don't have enough history in using emojis for creating and > standardizing such a pseudo-orthography. Emojis remain a new pseudo-language, > but they reuse a typography based on visible symbols that have a long > cultural tradition with other cultural meanings and many unexpected > semantics that don't work with the current associations created. > > So in fact I only support very few associations: > - associating two "Flag" pseudo-letters (but a rendering should still be > OK if the emojis just show the actual letters within a left or right part > of a frame for a flag, without attempting to combine them into an actual > colored flag, which will need to evolve with time). > - associating skin color emojis after an emoji for a real human person or > person face (no need for this in fictional characters or for coloring other parts > such as hands, fingers, eyes, hair, nose...) > > In all cases, colors should always remain an option. Please keep emojis > simple and always usable in isolation, leaving their interpretation and > associations only to reading humans according to their local culture and > social interactions. The way they are used now is in fact abusing the > initial goal of Unicode encoding, which is to not encode according to > specific languages or culture, and not break their basic semantics by > mixing them into something that is not clearly separable and does not carry > the same amount of semantics. > > 2016-10-12 18:31 GMT+02:00 Doug Ewell : > >> Leonardo Boiko wrote: >> >> >> >> Gosh, even I wouldn't have gone that far. >> >> -- >> Doug Ewell | Thornton, CO, US | ewellic.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prosfilaes at gmail.com Wed Oct 12 15:14:31 2016 From: prosfilaes at gmail.com (David Starner) Date: Wed, 12 Oct 2016 20:14:31 +0000 Subject: Emoji end goal In-Reply-To: References: <20161012093120.665a7a7059d7ee80bb4d670165c8327d.58ec0c61e7.wbe@email03.godaddy.com> Message-ID: On Wed, Oct 12, 2016 at 11:48 AM Rebecca T <637275 at gmail.com> wrote: > Agreed. I think a good response to "that'd _double_ the codepoints, so we > should just add a ligature" is "if it would be such a burden to implement > that you don't want to use space in the charts for what are, fundamentally, > hundreds of *semantically different* ideographs, why are we dumping that > burden onto vendors?" > Because the vendors want it. There's far more people who can and will implement emoji completely than who support all Han ideographs or many ancient scripts. If you don't want to support it because it's too big a burden, then don't. If you don't have that option because your users are demanding it, then Unicode is successfully providing the options the users want, and if that feature is too much of a burden for you to support, perhaps the problem is that you picked a problem you couldn't feasibly solve. I'd compare OSes. An operating system is probably about a man-year of work, until you have all these problems with people wanting fancy font support and graphical user interfaces and both IPv4 and IPv6 support and reading CDs and audio support and all this ridiculous stuff. (A real OS supports either punch cards or a keyboard for input, and outputs to a line printer.)
Today, pretty much only a major megacorp can make an OS from scratch, and even Google used the Linux kernel and Java to simplify making Android. You could blame Unicode for a small part of that, but Unicode isn't making you implement Unicode in your OS; your users are making that demand. -------------- next part -------------- An HTML attachment was scrubbed... URL: From irgendeinbenutzername at gmail.com Wed Oct 12 15:40:20 2016 From: irgendeinbenutzername at gmail.com (Charlotte Buff) Date: Wed, 12 Oct 2016 22:40:20 +0200 Subject: Emoji end goal Message-ID: On Wed, 12 Oct 2016 20:14:31 +0000 David Starner > wrote: > Because the vendors want it. I wouldn't say so in general. Emoji fonts are far more work than regular black-and-white vectors and I honestly believe that vendors with PNG-based fonts like Apple and Google are slowly reaching the point where they can no longer reasonably support any more emoji because their font sizes would just blow up. I have noticed that recently vendors have become quite picky about which emoji they want to support, going so far as blocking the addition of new symbol characters to the UCS entirely, rather than just refusing to give them emoji presentation once added. (Why they still thought the hundreds of new gendered emoji were a good idea is another question.) It's not like back in Unicode 7 when Apple and friends happily added half of Webdings to their colorful emoji fonts for no apparent reason. I think vendors really don't want to spend their time and effort on emoji anymore. Things like hair colors are pretty much unfeasible for anyone besides Microsoft, but as soon as there is some kind of semi-official Unicode mechanism for that, users will *demand* that you follow through and implement all possible variants. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From oren.watson at gmail.com Wed Oct 12 16:40:16 2016 From: oren.watson at gmail.com (Oren Watson) Date: Wed, 12 Oct 2016 17:40:16 -0400 Subject: Emoji end goal In-Reply-To: References: Message-ID: I think ultimately there isn't an end goal. Unlike most of the other languages/scripts that unicode supports, emoji is currently in a state of rapid, decentralized, and asynchronous evolution and development, with various companies and communities contributing new ideas every year. It doesn't have an end goal because it isn't a project with a single entity or leader who defines its direction, as for example Esperanto was. On Wed, Oct 12, 2016 at 6:58 AM, zelpa wrote: > So what exactly is the end goal for emoji? First we had the fitzpatrick > skin modifiers, now there's the proposal for gendered emoji sequences using > ZWJ. There was even the proposal for the hair colour modifier in TR 53. So > what is the true end goal? Will we one day be able to display our Fallout 4 > character with a single emoji and 60 modifiers? And honestly, who is asking > for these additions? Does anybody WANT a hair colour modifier? Seems to me > like the consortium might just be pandering to a few silly requests (by > people who have no actual idea what unicode is) to get media attention. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From c933103 at gmail.com Thu Oct 13 03:39:52 2016 From: c933103 at gmail.com (gfb hjjhjh) Date: Thu, 13 Oct 2016 16:39:52 +0800 Subject: Emoji end goal In-Reply-To: References: Message-ID: So, according to the emoji FAQ, the end goal of emoji is to have no emoji? Or something like Softbank's escape sequence? >Q: What is the longer term plan for emoji? >A: The Unicode Consortium encourages the use of embedded graphics (a.k.a. "stickers") as a longer-term solution, since they allow much more freedom of expression. See Longer Term Solutions in UTR #51.
By the way, is it just me, or are the original Japanese carrier emoji, specifically those provided by DoCoMo, still not completely encoded in Unicode? I counted the number of i-mode emoji listed on Japanese Wikipedia in the TRON code section, and there are apparently more emoji listed there than are in Unicode, but I don't know which ones are missing. 2016-10-13 5:40 GMT+08:00 Oren Watson : > I think ultimately there isn't an end goal. Unlike most of the other > languages/scripts that unicode supports, emoji is currently in a state of > rapid, decentralized, and asynchronous evolution and development, with > various companies and communities contributing new ideas every year. It > doesn't have an end goal because it isn't a project with a single entity or > leader who defines its direction, as for example Esperanto was. > > On Wed, Oct 12, 2016 at 6:58 AM, zelpa wrote: > >> So what exactly is the end goal for emoji? First we had the fitzpatrick >> skin modifiers, now there's the proposal for gendered emoji sequences using >> ZWJ. There was even the proposal for the hair colour modifier in TR 53. So >> what is the true end goal? Will we one day be able to display our Fallout 4 >> character with a single emoji and 60 modifiers? And honestly, who is asking >> for these additions? Does anybody WANT a hair colour modifier? Seems to me >> like the consortium might just be pandering to a few silly requests (by >> people who have no actual idea what unicode is) to get media attention. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gwalla at gmail.com Thu Oct 13 10:35:44 2016 From: gwalla at gmail.com (Garth Wallace) Date: Thu, 13 Oct 2016 08:35:44 -0700 Subject: Emoji end goal In-Reply-To: References: Message-ID: On Thu, Oct 13, 2016 at 1:39 AM, gfb hjjhjh wrote: > So, according to the emoji FAQ, the end goal of emoji is to > have no emoji? Or something like Softbank's escape sequence? > >Q: What is the longer term plan for emoji?
> >A: The Unicode Consortium encourages the use of embedded graphics (a.k.a. > "stickers") as a longer-term solution, since they allow much more freedom > of expression. See Longer Term Solutions > in UTR #51. > > By the way, is it just me, or are the original Japanese carrier emoji, specifically > those provided by DoCoMo, still not completely encoded in Unicode? I > counted the number of i-mode emoji listed on Japanese Wikipedia in the TRON > code section, and there are apparently more emoji listed there than are in > Unicode, but I don't know which ones are missing. > Shibuya 109 was left out because AIUI, unlike the other landmarks, it's private property. Are there any others? -------------- next part -------------- An HTML attachment was scrubbed... URL: From harshula at hj.id.au Thu Oct 13 21:08:18 2016 From: harshula at hj.id.au (Harshula) Date: Fri, 14 Oct 2016 13:08:18 +1100 Subject: Noto unified font In-Reply-To: References: <201610082344.04995.luke@dashjr.org> <8930ff14-647d-757a-1329-e6e2a14a89a7@hj.id.au> <201610090250.44483.luke@dashjr.org> <53b1e87d-89c7-095d-0676-979305eb1a54@hj.id.au> Message-ID: Philippe, I presume your response was intended for Luke. If not, you may want to re-read the thread. On 09/10/16 15:37, Philippe Verdy wrote: > The licence itself says it respects the 4 FSF freedoms. It also > explicitly allows reselling (rule DFSG #1): > http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL > > It is not directly compatible with the GPL in a composite product, but > with LGPL there's no problem, and there's no problem if the font is > clearly separable and distributed along with its licence, even if the > software coming with it or the package containing it is commercial: you > are allowed to detach it from the package and redistribute. > > Really you are challenging the licence for unfair reasons. > Maybe you just think that the GPL or MIT licences are enough.
> > Or you'd like the Public Domain (which in fact offers no protection and > no long term warranty, as it can be re-appropriated at any time by > proprietary licences, even retrospectively; every day we see companies > registering properties on pseudo-new technologies that are in fact > inherited from the past and have been used for centuries or more by the > whole of humanity; they leave some space only for today's current usages in > limited scopes, but protect everything else by inventing some strange > concepts around the basic feature, with unfair claims, and then want to > collect taxes). Also an international public domain does not exist at > all (it is always restricted by new additions to the copyright laws). > Publishing something in the Public Domain is really unsafe. > > 2016-10-09 5:35 GMT+02:00 Harshula >: > > On 09/10/16 13:50, Luke Dashjr wrote: > >> On Sunday, October 09, 2016 12:08:05 AM Harshula wrote: > >> On 09/10/16 10:44, Luke Dashjr wrote: > >>> It's unfortunate they released it under the non-free OFL license. :( > > FSF appears to classify OFL as a Free license (though incompatible with > the GNU GPL & FDL): > https://www.gnu.org/licenses/license-list.en.html#Fonts > > > >> Which alternate license would you recommend? > > > > MIT license or LGPL seem reasonable and common among free fonts. Some also > > choose GPL, but AFAIK it's unclear how the LGPL vs GPL differences apply to > > fonts. > > Interestingly, Noto project saw advantages of OFL and moved to using it, > not too long ago: > https://github.com/googlei18n/noto-fonts/blob/master/NEWS > > > It seems you disagree with FSF's interpretation of the OFL and bundling > Hello World as being sufficient. Are there other reasons for your > preference for MIT/LGPL/GPL over OFL? > > > On Sunday, October 09, 2016 12:16:37 AM you wrote: > >> That's your definition of non-free then...
If I were a font developer and > >> of mind to release my font for use without charge, I wouldn't want anyone > >> else to make money out of selling it when I myself - who put the effort > >> into preparing it - don't make money from selling it. So it protects the > >> moral rights of the developer. > > Why are you attributing Shriramana Sharma's email to me? It might be > clearer if you replied to his email. > > cya, > # From charupdate at orange.fr Fri Oct 14 08:17:14 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 14 Oct 2016 15:17:14 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: References: <1233897248-1476201141-cardhu_decombobulator_blackberry.rim.net-1791110519-@b13.c1.bise6.blackberry> Message-ID: <700499764.8011.1476451035080.JavaMail.www@wwinf1f34> On Tue, 11 Oct 2016 15:52:20 +0000, Don Osborn wrote: [...] > > The problem is input systems, not availability of fonts as it once was. Keyboard > layouts exist for Ga and other Ghanaian languages, and these enable typing needed > extended Latin characters. But a number of them, including possibly all for mobile > devices, work by substituting selected key assignments, which in the case of > multilingual text would apparently mean switching keyboards to accommodate > characters not present in both/all languages used. Not ideal. > [...] AIUI, what is drawing people away from being able to efficiently input Extended Latin alongside Basic Latin is the fear of becoming unable to efficiently input digits as soon as these don't show up in the Base shift state any longer. Thus IMHO it could be interesting for many more of the world's languages to see that there is a good reason to depart from the typical layout pattern that has the digits in the Base shift state, and to see that this is in practice feasible inside the system input framework, which doesn't have so many of the severe limitations that are often pointed out.
These mainly result from the appearance that the Windows keyboarding framework is given in the MSKLC UI, while the author of this useful software himself invited his users to expand the features by using the included Keyboard Table Generation Tool (Unicode) 3.40. So do I, FWIW. While still being very busy with the French keyboard layouts that I'm working on, I'm already able to share one more feature for keyboards that have the 102nd/105th key, next to left Shift. It is obtained by mapping on this key e.g. the 0x10 modifier, and by allocating this new level to an emulated numerical keypad with hex digits beside Arabic digits, a comma key beside the decimal separator dot key, double and triple zero keys, the zero doubled on VK_0 to complete and to facilitate input of binary numbers, with % and $ and much more, and U+202F on the space bar. In many languages, this is used as a thousands separator, and in all languages before the unit (as in "1,234.56 $", where the space is U+202F). This new "Num" modifier is optional, as is the extra key proper to ISO keyboards. But I strongly recommend always adding the extra toggle I've already mentioned, on key E00 (or instead of Capitals Lock if this is disliked in the target locale). I believe that such keyboards will address the issue. Best regards, Marcel From doug at ewellic.org Fri Oct 14 10:33:48 2016 From: doug at ewellic.org (Doug Ewell) Date: Fri, 14 Oct 2016 08:33:48 -0700 Subject: Emoji end goal Message-ID: <20161014083348.665a7a7059d7ee80bb4d670165c8327d.3cdcef3df5.wbe@email03.godaddy.com> gfb hjjhjh wrote: > So, according to the emoji FAQ, > the end goal of emoji is to have no emoji? Or something like > Softbank's escape sequence? > >> Q: What is the longer term plan for emoji? >> A: The Unicode Consortium encourages the use of embedded graphics >> (a.k.a. "stickers") as a longer-term solution, since they allow much >> more freedom of expression. See Longer Term Solutions >> in UTR #51.
There is a new emoji proposal [1] that cites the existence of "many apps and sticker packs" with the proposed image as one rationale for encoding it as a character. If ESC accepts this rationale, then the passage in UTR #51 cited above will not only be incorrect, it will have been turned on its ear. [1] http://www.unicode.org/L2/L2016/16280-breastfeeding-emoji.pdf -- Doug Ewell | Thornton, CO, US | ewellic.org From mjansche at google.com Fri Oct 14 12:07:23 2016 From: mjansche at google.com (Martin Jansche) Date: Fri, 14 Oct 2016 18:07:23 +0100 Subject: Ambiguity(?) in Sinhala named sequences Message-ID: For Sinhala, the following named sequences are defined (for good reasons):

SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA
SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB
SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D

I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll write Ya for 0DBA and Ra for 0DBB. Note that these give rise to two potentially ambiguous codepoint strings, namely

0DBB 0DCA 200D 0DBA
0DBB 0DCA 200D 0DBB

I'll concentrate on the first, as all arguments apply to the second one analogously. At first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible parses:

0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya
0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya

First question: Does the standard give any guidance as to which one is the intended parse? The section on Sinhala in the Unicode Standard is silent about this. Is there a general principle I'm missing? Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not used and is considered incorrect, suggesting that the second parse (Repaya+Ya) should be the default interpretation of this sequence. However, SLS 1134 does not address the potential ambiguity of this sequence explicitly and the description there could be read as informative, not normative. Second question: Given that one parse of this sequence should be the default, how does one represent the non-default parse?
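[Editor's note: the ambiguity can be checked mechanically. A small Python sketch (not from the standard or from this message) matches the first problematic string against the named sequences; both a suffix match (Yansaya) and a prefix match (Repaya) succeed, so a parser or renderer must pick one by some external rule:]

```python
# Named sequences from the list above, as tuples of codepoints.
SEQUENCES = {
    "YANSAYA":       (0x0DCA, 0x200D, 0x0DBA),
    "RAKAARAANSAYA": (0x0DCA, 0x200D, 0x0DBB),
    "REPAYA":        (0x0DBB, 0x0DCA, 0x200D),
}

# The first ambiguous string: 0DBB 0DCA 200D 0DBA.
text = (0x0DBB, 0x0DCA, 0x200D, 0x0DBA)

# Parse 1: Ra + Yansaya -- the named sequence is a suffix of the string.
assert text[-3:] == SEQUENCES["YANSAYA"]

# Parse 2: Repaya + Ya -- the named sequence is a prefix of the string.
assert text[:3] == SEQUENCES["REPAYA"]

print("both parses match: the string is ambiguous without an external rule")
```

[The same check succeeds for the second string, 0DBB 0DCA 200D 0DBB, with Rakaaraansaya in place of Yansaya.]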
In most cases one can guess what the intended meaning is, but I suspect this is somewhat of a gray area. In practice, trying to render these problematic sequences and their neighbors in HarfBuzz with a variety of fonts results in a variety of outcomes (including occasionally unexpected glyph choices). If the meaning of these sequences is not well defined, that would partly explain the variation across fonts. Am I missing something fundamental? If not, it seems this issue should be called out explicitly in some part of the standard. Regards, -- martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Fri Oct 14 13:09:28 2016 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Fri, 14 Oct 2016 11:09:28 -0700 Subject: Ambiguity(?) in Sinhala named sequences In-Reply-To: References: Message-ID: <7c3ceb6b-c7b9-834d-3bf2-83037c3faeb0@ix.netcom.com> This is an interesting question. It seems the task of parsing a text into sequences depends on the purpose. Not all sequences of interest are named and, in the general case, not all attempts at parsing may be unique. In this case, it looks like the named sequences would correspond to a specific (ligated) glyph that matches a user-perceived unit of the writing system. Such a parsing task is akin to scanning, for example, strings using the Latin script for ligatures - while trying to emulate the rules that were in effect during days of hot metal typesetting for certain languages. For example, it wasn't enough to know that a certain cluster of letters might have a ligature glyph; one would also have to know whether the cluster straddled a (compound) word boundary or not. Just knowing the specification of ligated sequences alone would not be enough to identify a correct parse. Such rules, however, are usually not part of the Unicode standard. The situation here is similar; the standard simply specifies that a certain sequence of code points has a collective name.
In case of ambiguities, you'll have to turn to external sources to resolve them. Now, if this is the only such ambiguity (or one of a very small number) and if identification of the correct sequence is essential for selecting the correct rendering, I don't see why the script description for Sinhala couldn't be augmented to discuss that issue. In which case, the way to proceed is to assemble the full set of facts and submit them to the UTC using the reporting form on the website. A./ On 10/14/2016 10:07 AM, Martin Jansche wrote: > For Sinhala, the following named sequences are defined (for good reasons): > > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D > > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll > write Ya for 0DBA and Ra for 0DBB. > > Note that these give rise to two potentially ambiguous codepoint > strings, namely > > 0DBB 0DCA 200D 0DBA > 0DBB 0DCA 200D 0DBB > > I'll concentrate on the first, as all arguments apply to the second > one analogously. > > At first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible > parses: > > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya > > First question: Does the standard give any guidance as to which one is > the intended parse? The section on Sinhala in the Unicode Standard is > silent about this. Is there a general principle I'm missing? > > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not > used and is considered incorrect, suggesting that the second parse > (Repaya+Ya) should be the default interpretation of this sequence. > However, SLS 1134 does not address the potential ambiguity of this > sequence explicitly and the description there could be read as > informative, not normative. > > Second question: Given that one parse of this sequence should be the > default, how does one represent the non-default parse?
> > In most cases one can guess what the intended meaning is, but I > suspect this is somewhat of a gray area. In practice, trying to render > these problematic sequences and their neighbors in HarfBuzz with a > variety of fonts results in a variety of outcomes (including > occasionally unexpected glyph choices). If the meaning of these > sequences is not well defined, that would partly explain the variation > across fonts. > > Am I missing something fundamental? If not, it seems this issue should > be called out explicitly in some part of the standard. > > Regards, > -- martin From charupdate at orange.fr Sun Oct 16 12:08:59 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 16 Oct 2016 19:08:59 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> Message-ID: <1995960297.7266.1476637739730.JavaMail.www@wwinf1f09> On 11 Oct 2016 09:48:00 -0700, Doug Ewell wrote: [...] > > You mentioned mobile devices, but also mentioned ISO/IEC 9995 and 14755, > which seem to deal primarily with computer keyboards. > > On Windows, John Cowan's Moby Latin keyboard [1] allows the input of > more than 800 non-ASCII characters, including the two mentioned in your > post (ɔ and ɛ): > > AltGr+p, o 0254 LATIN SMALL LETTER OPEN O > AltGr+p, e 025B LATIN SMALL LETTER OPEN E > > Moby Latin is a strict superset of the standard U.S. English keyboard; > that is, none of the standard keystrokes were redefined, unlike > keyboards such as United States-International which tend to redefine > keys for ASCII characters that look like diacritical marks, making > adoption difficult. There are also versions of Moby based on the > standard U.K. keyboard. > > [1] > http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html > U.S.
Moby Latin and Whacking Latin keyboard driver packages are not available any more. What happened? Neither can John Cowan's home page be accessed: http://home.ccil.org/%7Ecowan/XML/ Though the Chester County Interlink host is not down. Still the ReadMe can be accessed, from another domain: http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html From charupdate at orange.fr Sun Oct 16 12:31:46 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 16 Oct 2016 19:31:46 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> Message-ID: <2082406741.7551.1476639106475.JavaMail.www@wwinf1f09> I guess that Moby Latin is now being reengineered, see: http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html#vietnamese "These assignments are considered temporary, and will be reconsidered when the Microsoft program used to generate Moby Latin can handle serial dead keys." Obviously the Microsoft program used to generate will be KbdUTool, the Microsoft Keyboard Table Generation Tool (Unicode). I'm so glad that now what many people were waiting for, serial dead keys, is going to become a common feature on Windows. All the best, Marcel On Sun, 16 Oct 2016 19:08:59 +0200 (CEST), I wrote: [...] > U.S. Moby Latin and Whacking Latin keyboard driver packages > are not available any more. What happened? > Neither can John Cowan's home page be accessed: > http://home.ccil.org/%7Ecowan/XML/ > Though the Chester County Interlink host is not down. > Still the ReadMe can be accessed, from another domain: > http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html From mark at kli.org Sun Oct 16 13:25:34 2016 From: mark at kli.org (Mark E.
Shoulson) Date: Sun, 16 Oct 2016 14:25:34 -0400 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <1995960297.7266.1476637739730.JavaMail.www@wwinf1f09> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> <1995960297.7266.1476637739730.JavaMail.www@wwinf1f09> Message-ID: I have the rare good fortune to see John Cowan on a near-daily basis (except this month, with all the Jewish Holidays); I'll forward your message on. ~mark On 10/16/2016 01:08 PM, Marcel Schneider wrote: > On 11 Oct 2016 09:48:00 -0700, Doug Ewell wrote: > […] >> You mentioned mobile devices, but also mentioned ISO/IEC 9995 and 14755, >> which seem to deal primarily with computer keyboards. >> >> On Windows, John Cowan's Moby Latin keyboard [1] allows the input of >> more than 800 non-ASCII characters, including the two mentioned in your >> post (ɔ and ɛ): >> >> AltGr+p, o 0254 LATIN SMALL LETTER OPEN O >> AltGr+p, e 025B LATIN SMALL LETTER OPEN E >> >> Moby Latin is a strict superset of the standard U.S. English keyboard; >> that is, none of the standard keystrokes were redefined, unlike >> keyboards such as United States-International which tend to redefine >> keys for ASCII characters that look like diacritical marks, making >> adoption difficult. There are also versions of Moby based on the >> standard U.K. keyboard. >> >> [1] >> http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html >> > U.S. Moby Latin and Whacking Latin keyboard driver packages > are not available any more. What happened? > Neither can John Cowan's home page be accessed: > http://home.ccil.org/%7Ecowan/XML/ > Though the Chester County Interlink host is not down. 
> Still the ReadMe can be accessed, from another domain: > http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html From doug at ewellic.org Sun Oct 16 13:25:27 2016 From: doug at ewellic.org (Doug Ewell) Date: Sun, 16 Oct 2016 12:25:27 -0600 Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <2082406741.7551.1476639106475.JavaMail.www@wwinf1f09> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> <2082406741.7551.1476639106475.JavaMail.www@wwinf1f09> Message-ID: <194ACB1C3362402CB0CC03D79BFF3A8F@DougEwell> Marcel Schneider wrote: > I guess that Moby Latin is now being reengineered, see: > > http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html#vietnamese That's Caoimhín Ó Donnaíle's mirror of the readme file for Whacking Latin, the UK version of Moby Latin. I don't see anything about re-engineering it, but maybe I missed something. > Obviously the Microsoft program used to generate will be > KbdUTool, the Microsoft Keyboard Table Generation Tool (Unicode). Yes, via MSKLC. > I'm so glad that now what many people were waiting for, > serial dead keys, is going to become a common feature > on Windows. I would be glad to see that too, but where do you see that on the referenced page? All I see is John's original text about working around the MSKLC limitation. If you want to work directly with KbdUTool to get serial dead keys, bypassing MSKLC, here is Kaplan's post from 2011 on how to do this. 
Be sure to read all the warnings twice: http://archives.miloush.net/michkap/archive/2011/04/16/10154700.html -- Doug Ewell | Thornton, CO, US | ewellic.org From charupdate at orange.fr Sun Oct 16 15:59:01 2016 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 16 Oct 2016 22:59:01 +0200 (CEST) Subject: Wogb3 j3k3: Pre-Unicode substitutions for extended characters live on In-Reply-To: <194ACB1C3362402CB0CC03D79BFF3A8F@DougEwell> References: <20161011094800.665a7a7059d7ee80bb4d670165c8327d.0d1531102f.wbe@email03.godaddy.com> <2082406741.7551.1476639106475.JavaMail.www@wwinf1f09> <194ACB1C3362402CB0CC03D79BFF3A8F@DougEwell> Message-ID: <1031266513.10647.1476651541641.JavaMail.www@wwinf1f09> On Sun, 16 Oct 2016 14:25:34 -0400, Mark E. Shoulson wrote: > I have the rare good fortune to see John Cowan on a near-daily basis > (except this month, with all the Jewish Holidays); I'll forward your > message on. Thank you. On Sun, 16 Oct 2016 12:25:27 -0600, Doug Ewell wrote: > Marcel Schneider wrote: > > > I guess that Moby Latin is now being reengineered, see: > > > > http://www.smo.uhi.ac.uk/gaidhlig/sracan/Whacking/MobyLatinKeyboard.html#vietnamese > > That's Caoimhín Ó Donnaíle's mirror of the readme file for Whacking > Latin, the UK version of Moby Latin. I don't see anything about > re-engineering it, but maybe I missed something. Right, it isn't talking about re-engineering. "Reconsidered" is not re-engineered. Though I still guess that the author is doing much more now. I remembered this sentence from having read it when you'd shared Moby Latin here. > > > Obviously the Microsoft program used to generate will be > > KbdUTool, the Microsoft Keyboard Table Generation Tool (Unicode). > > Yes, via MSKLC. Then there would be nothing to be reconsidered. I've in mind using the -s flag to generate the C sources, then setting these read-only once edited. 
> > > I'm so glad that now what many people were waiting for, > > serial dead keys, is going to become a common feature > > on Windows. > > I would be glad to see that too, but where do you see that on the > referenced page? All I see is John's original text about working around > the MSKLC limitation. By experiencing the current use of KbdUTool (via a script in batch that I've written with a comfortable UI for end-users), I feel I am in a position to extrapolate this from John Cowan's wording of the disclaimer: "These assignments are considered temporary, and will be reconsidered when the Microsoft program used to generate Moby Latin can handle serial dead keys." It doesn't say what program. Just "the Microsoft program used." If today, this variable is set to 'KbdUTool' instead of 'MSKLC', then suddenly the Microsoft program "can handle serial dead keys." > > If you want to work directly with KbdUTool to get serial dead keys, > bypassing MSKLC, here is Kaplan's post from 2011 on how to do this. Be > sure to read all the warnings twice: > > http://archives.miloush.net/michkap/archive/2011/04/16/10154700.html Thank you for this link. This is what I should refer to when citing the feature. There is the test issue, which seems rather daunting. Does a working layout driver prove that there is no known bug? I'm actually using such a working layout driver. E.g. pressing the Acute dead key twice, then "o", inserts "?". I'd suggest not doing this in the .klc file; it is too complicated beneath its apparent simplicity, because the diacritic doesn't show up on each line. Kind regards, Marcel From harshula at hj.id.au Sun Oct 16 18:15:57 2016 From: harshula at hj.id.au (Harshula) Date: Mon, 17 Oct 2016 10:15:57 +1100 Subject: Ambiguity(?) 
in Sinhala named sequences In-Reply-To: References: Message-ID: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> Hi Martin, On 15/10/16 04:07, Martin Jansche wrote: > For Sinhala, the following named sequences are defined (for good reasons): > > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D > > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll > write Ya for 0DBA and Ra for 0DBB. > > Note that these give rise to two potentially ambiguous codepoint > strings, namely > > 0DBB 0DCA 200D 0DBA > 0DBB 0DCA 200D 0DBB > > I'll concentrate on the first, as all arguments apply to the second one > analogously. > > At a first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible parses: > > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya > > First question: Does the standard give any guidance as to which one is > the intended parse? The section on Sinhala in the Unicode Standard is > silent about this. Is there a general principle I'm missing? > > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not > used and is considered incorrect, suggesting that the second parse > (Repaya+Ya) should be the default interpretation of this sequence. > However, SLS 1134 does not address the potential ambiguity of this > sequence explicitly and the description there could be read as > informative, not normative. 1) re: 0DBB 0DCA 200D 0DBA SLS 1134 was updated in 2011 (The latest public version I could find is v3.41. This extract is the same in v3.6.): https://sourceforge.net/p/sinhala/mailman/attachment/4D957C56.5050204 at cse.mrt.ac.lk/1/ "1. The yansaya is not used following the letter ?. e.g.: the spelling ??????? is incorrect." If the above is insufficient, it's best to discuss the issue with Harsha (CC'd) and Ruvan (CC'd). 2) re: 0DBB 0DCA 200D 0DBB Harsha & Ruvan can clarify this too. 
cya, # > Second question: Given that one parse of this sequence should be the > default, how does one represent the non-default parse? > > In most cases one can guess what the intended meaning is, but I suspect > this is somewhat of a gray area. In practice, trying to render these > problematic sequences and their neighbors in HarfBuzz with a variety of > fonts results in a variety of outcomes (including occasionally > unexpected glyph choices). If the meaning of these sequences is not well > defined, that would partly explain the variation across fonts. > > Am I missing something fundamental? If not, it seems this issue should > be called out explicitly in some part of the standard. > > Regards, > -- martin From cibucj at gmail.com Sun Oct 16 22:12:54 2016 From: cibucj at gmail.com (=?UTF-8?B?4LS44LS/4LSs4LWBIOKAjA==?=) Date: Mon, 17 Oct 2016 04:12:54 +0100 Subject: Ambiguity(?) in Sinhala named sequences In-Reply-To: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> References: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> Message-ID: Hi Martin, Isn't this question analogous to asking whether the layout engine should use the C1-conjoining form or the C2-conjoining form for a sequence in any Indic script? That is, whether C1 should form a glyph while C2 keeps its independent form, or vice versa. (Potentially there can be more forms, that is, a full ligature and an explicit Virama form.) If the question you asked is equivalent, then the answer is traditionally left to the font to decide. BTW, even for a given C1 and C2 in a given script, a font can potentially choose a different answer based on its purpose/character, like a font for the Malayalam traditional script vs. a font for the reformed script. 
regards, Cibu On Mon, Oct 17, 2016 at 12:15 AM, Harshula wrote: > Hi Martin, > > On 15/10/16 04:07, Martin Jansche wrote: > > For Sinhala, the following named sequences are defined (for good > reasons): > > > > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA > > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB > > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D > > > > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll > > write Ya for 0DBA and Ra for 0DBB. > > > > Note that these give rise to two potentially ambiguous codepoint > > strings, namely > > > > 0DBB 0DCA 200D 0DBA > > 0DBB 0DCA 200D 0DBB > > > > I'll concentrate on the first, as all arguments apply to the second one > > analogously. > > > > At a first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible > parses: > > > > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya > > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya > > > > First question: Does the standard give any guidance as to which one is > > the intended parse? The section on Sinhala in the Unicode Standard is > > silent about this. Is there a general principle I'm missing? > > > > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not > > used and is considered incorrect, suggesting that the second parse > > (Repaya+Ya) should be the default interpretation of this sequence. > > However, SLS 1134 does not address the potential ambiguity of this > > sequence explicitly and the description there could be read as > > informative, not normative. > > 1) re: 0DBB 0DCA 200D 0DBA > > SLS 1134 was updated in 2011 (The latest public version I could find is > v3.41. This extract is the same in v3.6.): > https://sourceforge.net/p/sinhala/mailman/attachment/ > 4D957C56.5050204 at cse.mrt.ac.lk/1/ > > "1. The yansaya is not used following the letter ?. e.g.: the spelling > ??????? is incorrect." > > If the above is insufficient, it's best to discuss the issue with Harsha > (CC'd) and Ruvan (CC'd). 
> > 2) re: 0DBB 0DCA 200D 0DBB > > Harsha & Ruvan can clarify this too. > > cya, > # > > > > Second question: Given that one parse of this sequence should be the > > default, how does one represent the non-default parse? > > > > In most cases one can guess what the intended meaning is, but I suspect > > this is somewhat of a gray area. In practice, trying to render these > > problematic sequences and their neighbors in HarfBuzz with a variety of > > fonts results in a variety of outcomes (including occasionally > > unexpected glyph choices). If the meaning of these sequences is not well > > defined, that would partly explain the variation across fonts. > > > > Am I missing something fundamental? If not, it seems this issue should > > be called out explicitly in some part of the standard. > > > > Regards, > > -- martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mjansche at google.com Mon Oct 17 09:58:13 2016 From: mjansche at google.com (Martin Jansche) Date: Mon, 17 Oct 2016 15:58:13 +0100 Subject: Ambiguity(?) in Sinhala named sequences In-Reply-To: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> References: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> Message-ID: Thanks for the pointer to the 2011 version of SLS 1134. After reading that and discussing further with Cibu, here's a tentative proposal: * The most logical[*] interpretation of the sequence 0DBB 0DCA 200D 0DBA is as Repaya+Ya. A standard (Unicode and/or SLS) should call this out explicitly. ([*]Logical: In other scripts, including Devanagari, Myanmar, etc. similar types of modifiers that logically precede a letter are represented in this way, sometimes without ZWJ or with a different character in lieu of ZWJ. Also this interpretation plays well alongside a hypothetical alternative encoding of Yansaya using a single codepoint.) * A standard (Unicode and/or SLS) should specify how Ra+Yansaya should be encoded. 
SLS 1134 points out that Ra+Yansaya is an incorrect spelling, yet in order to make this point it has to show the glyph sequence for Ra+Yansaya. So there is clearly some need to be able to render this, even if it's only at this meta-linguistic level. Plus SLS 1134 is very explicit that e.g. keyboarding should allow for letter combinations to be entered even if they are not practically useful. One possible way of encoding Ra+Yansaya is 0DBB 200C 0DCA 200D 0DBA, i.e. Ra ZWNJ Yansaya. This renders as intended in HarfBuzz with NotoSansSinhala, but not with LBhashitaComplex. If we had a clear directive regarding how Ra+Yansaya should be represented, we could work on getting fonts updated. * Everything about 0DBB 0DCA 200D 0DBA also applies to 0DBB 0DCA 200D 0DBB. This is much less relevant in practice, but the same arguments about ambiguity apply and should be resolved in the same way. Regards, -- martin On Mon, Oct 17, 2016 at 12:15 AM, Harshula wrote: > Hi Martin, > > On 15/10/16 04:07, Martin Jansche wrote: > > For Sinhala, the following named sequences are defined (for good > reasons): > > > > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA > > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB > > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D > > > > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll > > write Ya for 0DBA and Ra for 0DBB. > > > > Note that these give rise to two potentially ambiguous codepoint > > strings, namely > > > > 0DBB 0DCA 200D 0DBA > > 0DBB 0DCA 200D 0DBB > > > > I'll concentrate on the first, as all arguments apply to the second one > > analogously. > > > > At a first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible > parses: > > > > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya > > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya > > > > First question: Does the standard give any guidance as to which one is > > the intended parse? The section on Sinhala in the Unicode Standard is > > silent about this. 
Is there a general principle I'm missing? > > > > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not > > used and is considered incorrect, suggesting that the second parse > > (Repaya+Ya) should be the default interpretation of this sequence. > > However, SLS 1134 does not address the potential ambiguity of this > > sequence explicitly and the description there could be read as > > informative, not normative. > > 1) re: 0DBB 0DCA 200D 0DBA > > SLS 1134 was updated in 2011 (The latest public version I could find is > v3.41. This extract is the same in v3.6.): > https://sourceforge.net/p/sinhala/mailman/attachment/ > 4D957C56.5050204 at cse.mrt.ac.lk/1/ > > "1. The yansaya is not used following the letter ?. e.g.: the spelling > ??????? is incorrect." > > If the above is insufficient, it's best to discuss the issue with Harsha > (CC'd) and Ruvan (CC'd). > > 2) re: 0DBB 0DCA 200D 0DBB > > Harsha & Ruvan can clarify this too. > > cya, > # > > > > Second question: Given that one parse of this sequence should be the > > default, how does one represent the non-default parse? > > > > In most cases one can guess what the intended meaning is, but I suspect > > this is somewhat of a gray area. In practice, trying to render these > > problematic sequences and their neighbors in HarfBuzz with a variety of > > fonts results in a variety of outcomes (including occasionally > > unexpected glyph choices). If the meaning of these sequences is not well > > defined, that would partly explain the variation across fonts. > > > > Am I missing something fundamental? If not, it seems this issue should > > be called out explicitly in some part of the standard. > > > > Regards, > > -- martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Mon Oct 17 11:52:48 2016 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 17 Oct 2016 09:52:48 -0700 Subject: Ambiguity(?) 
in Sinhala named sequences In-Reply-To: References: <9c737258-14c4-092d-d0fe-3d1f1ca8f10a@hj.id.au> Message-ID: An HTML attachment was scrubbed... URL: From sr.erickson at gmail.com Fri Oct 21 12:11:36 2016 From: sr.erickson at gmail.com (seth erickson) Date: Fri, 21 Oct 2016 10:11:36 -0700 Subject: Historical question about 'universal signs' Message-ID: Greetings Unicoders, I'm trying to find information (for research purposes) about a character set mentioned in Joseph Becker's 1988 draft proposal [1]: "In 1978, the initial proposal for a set of 'Universal Signs' was made by Bob Belleville at Xerox PARC. Many persons contributed ideas to the development of a new encoding design. Beginning in 1980, these efforts evolved into the Xerox Character Code Standard (XCCS) [...]" XCCS is fairly well documented but I'm having trouble finding anything about the proposal by Bob Belleville. Any pointers would be appreciated. Thanks, Seth Erickson PhD student Department of Information Studies University of California, Los Angeles [1] http://unicode.org/history/unicode88.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Sun Oct 23 12:01:29 2016 From: doug at ewellic.org (Doug Ewell) Date: Sun, 23 Oct 2016 11:01:29 -0600 Subject: XCCS (was: Historical question about 'universal signs') In-Reply-To: References: Message-ID: seth erickson wrote: > XCCS is fairly well documented That hasn't been my experience. I'd be interested in any links you can forward that go beyond "Unicode built on" or "drew ideas from" or "was influenced by" XCCS. Thanks, -- Doug Ewell | Thornton, CO, US | ewellic.org From sr.erickson at gmail.com Mon Oct 24 23:20:06 2016 From: sr.erickson at gmail.com (seth erickson) Date: Mon, 24 Oct 2016 21:20:06 -0700 Subject: XCCS (was: Historical question about 'universal signs') In-Reply-To: References: Message-ID: See pg. 57-63 of this: Xerox. (1985). 
*Xerox System Network Architecture: General Information Manual* (No. XNSG 068504). Retrieved from http://archive.org/details/bitsavers_xeroxxnsXNNetworkArchitectureGeneralInformationMan_10024221 SE On Sun, Oct 23, 2016 at 10:01 AM, Doug Ewell wrote: > seth erickson wrote: > > XCCS is fairly well documented >> > > That hasn't been my experience. I'd be interested in any links you can > forward that go beyond "Unicode built on" or "drew ideas from" or "was > influenced by" XCCS. > > Thanks, > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shizhao at gmail.com Thu Oct 27 06:48:23 2016 From: shizhao at gmail.com (shi zhao) Date: Thu, 27 Oct 2016 11:48:23 +0000 Subject: about China Font Bank Message-ID: from http://www.nytimes.com/2016/10/25/opinion/chinas-digital-soft-power-play.html?_r=0 This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. The project highlights 100,000 characters from the country's 56 ethnic minorities, and another 100,000 rare and ancient characters from China's written corpus. Deploying almost 30 companies, institutions and universities, it's the largest state-funded digitization project ever undertaken. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john at mitre.org Thu Oct 27 09:13:35 2016 From: john at mitre.org (Burger, John D.) 
Date: Thu, 27 Oct 2016 10:13:35 -0400 Subject: about China Font Bank In-Reply-To: References: Message-ID: <328CAB25-BC48-40E8-8AAC-D3156AA55940@mitre.org> Language Log has a good article on this, including reactions from several sinographers: http://languagelog.ldc.upenn.edu/nll/?p=29034 - JB > On Oct 27, 2016, at 07:48, shi zhao wrote: > > from http://www.nytimes.com/2016/10/25/opinion/chinas-digital-soft-power-play.html?_r=0 > > This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. > > The project highlights 100,000 characters from the country's 56 ethnic minorities, and another 100,000 rare and ancient characters from China's written corpus. Deploying almost 30 companies, institutions and universities, it's the largest state-funded digitization project ever undertaken. -------------- next part -------------- An HTML attachment was scrubbed... URL: