From kenwhistler at att.net Fri Feb 3 16:54:29 2017 From: kenwhistler at att.net (Ken Whistler) Date: Fri, 3 Feb 2017 14:54:29 -0800 Subject: Indic Syllabic Category of U+11134 CHAKMA MAAYYAA In-Reply-To: <20170203223513.43b8ac8e@JRWUBU2> References: <20170203020033.235ee723@JRWUBU2> <20170203223513.43b8ac8e@JRWUBU2> Message-ID: <1d80fa62-783e-203b-b8bc-da3d8322f711@att.net> Richard, On 2/3/2017 2:35 PM, Richard Wordingham wrote: > Except that the added annotation "also used distinctly as a gemination > mark which can occur with vowels" also applies to U+103A MYANMAR SIGN > ASAT. TUS 9.0 Section 16.3 Myanmar calls the base 'double-acting' > rather than 'geminate', but it's pretty much the same thing. ASAT also > has functions that are unrelated to the presence of closed syllables - > Brahmi length mark in the compound vowel symbol for AU and part of a > Karen tone mark. Why don't you drop that in the feedback hopper, so the UTC sees and reviews it in May. It should be unproblematic to add a simple annotation like that to the names list -- and if you have a suggestion for updating the text in Section 16.3 for this, that would be good, too. --Ken From richard.wordingham at ntlworld.com Sat Feb 4 15:54:11 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 4 Feb 2017 21:54:11 +0000 Subject: Regex for the USE to Handle Tai Tham Message-ID: <20170204215411.1a447de1@JRWUBU2> I'm not sure if this is the right forum for the question; if not, please advise me where I should take the problem for public discussion. The immediate problem is that the Universal Shaping Engine (USE) uses a regular expression for Indic orthographic syllables that doesn't cover the common CVC orthographic syllables of the Tai Tham script, let alone the rarer CVCVC orthographic syllables. 
In his paper earlier this year, 'Making fonts for the Universal Shaping Engine' (available at http://tiro.com/John/Universal_Shaping_Engine_TYPOLabs.pdf), John Hudson reported,

"It's called the Universal Shaping Engine, then, not because it shapes all scripts, but because it uses a universal model. Of course, as soon as you declare that you have a universal model, someone comes along with an exception to that model. In this case, the exception is the Tai Tham or Lanna script of northern Thailand, which uses subjoined consonants in ways that may compress multiple syllables into a single cluster, causing recursion in cluster analysis. It remains to be seen whether Tai Tham can be accommodated with exception code in the Universal Shaping Engine, or will need to be passed to a script-specific engine."

Does anyone know what the problem is that caused the complaint that Tai Tham needs "recursion in cluster analysis"?

For syllables without a dangling stacking control code, the regular expression is similar (see https://www.microsoft.com/typography/OpenTypeDev/USE/intro.htm#clustervalidation for the precise form) to

    base subscript* vowel* final*

where

    subscript = medial | consonant_subjoined | subjoiner consonant
    subjoiner = virama | coeng
    final = final_consonant

I have omitted various modifiers for clarity.

Now, the obvious generalisation to cover the Tai Tham script (and, incidentally, the Khmer script) is

    base (subscript* vowel* final2*)*

where

    final2 = final | subjoiner consonant

Now, I see iteration here, but we had it before, so I don't know what the problematic 'recursion' is. I can make various guesses. Perhaps the regex needs to be 'unambiguous'. Perhaps it needs to be 'deterministic', i.e. each character can be matched to an element of the regex as soon as encountered. Perhaps the problem is just that the regex encourages backtracking. These possible issues all seem soluble, so please, someone, what is the problem?

Richard.
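The two patterns in the message above can be tried out with a minimal sketch, written here in Python. The one-letter class codes (B, M, S, H, C, V, F) and the names ORIGINAL, GENERALISED, FLAT and accepts are hypothetical, chosen only for this illustration; they are not the category names of the USE specification, and the modifiers omitted in the message are omitted here as well. Because (a* b* c*)* denotes the same language as (a | b | c)*, the generalised pattern can also be written without nested iteration, which may be relevant to the guesses about ambiguity and backtracking.

```python
import re

# Hypothetical one-letter codes for the character classes in the message:
#   B base, M medial, S consonant_subjoined, H subjoiner (virama or coeng),
#   C consonant, V vowel, F final_consonant
SUBSCRIPT = "(?:M|S|HC)"   # medial | consonant_subjoined | subjoiner consonant
FINAL = "F"                # final_consonant
FINAL2 = "(?:F|HC)"        # final | subjoiner consonant

# base subscript* vowel* final*
ORIGINAL = re.compile(f"B{SUBSCRIPT}*V*{FINAL}*")

# base (subscript* vowel* final2*)*
GENERALISED = re.compile(f"B(?:{SUBSCRIPT}*V*{FINAL2}*)*")

# The same language without nested iteration: (a* b* c*)* == (a|b|c)*
FLAT = re.compile("B(?:M|S|HC|V|F)*")


def accepts(pattern, classes):
    """Return True if the whole string of class codes matches the pattern."""
    return pattern.fullmatch(classes) is not None


if __name__ == "__main__":
    # CV with a subjoined consonant, CVC, and CVCVC as class-code strings.
    for cluster in ("BHCVF", "BVHC", "BVHCVHC"):
        print(cluster,
              accepts(ORIGINAL, cluster),
              accepts(GENERALISED, cluster),
              accepts(FLAT, cluster))
```

In this encoding a CVC cluster such as "BVHC" is rejected by the original pattern, because a subjoined consonant may not follow the vowel, and accepted by the generalised one. The nested form lets "HC" be read either as a subscript or as a final2, so it is ambiguous; the flat form has a single reading for each token.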
From A.Schappo at lboro.ac.uk Sun Feb 5 06:48:21 2017
From: A.Schappo at lboro.ac.uk (Andre Schappo)
Date: Sun, 5 Feb 2017 12:48:21 +0000
Subject: RFCs go Unicode
Message-ID: <3EFA81C3-1AA6-4690-89D7-80D5533668B0@lboro.ac.uk>

RFC 7997: The Use of Non-ASCII Characters in RFCs
https://tools.ietf.org/pdf/rfc7997.pdf

I especially like the recommendation to allow person names to be written in their native scripts. The native script form is placed first and thus treated as primary, as it should be IMHO. The romanised/ASCII form is placed second and bracketed, and thus treated as secondary, as it should be IMHO.

eg from section 3.2

?? (Q. Wu)

André Schappo

From mark at macchiato.com Sun Feb 5 09:59:49 2017
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sun, 5 Feb 2017 07:59:49 -0800
Subject: RFCs go Unicode
In-Reply-To: <3EFA81C3-1AA6-4690-89D7-80D5533668B0@lboro.ac.uk>
References: <3EFA81C3-1AA6-4690-89D7-80D5533668B0@lboro.ac.uk>
Message-ID:

That's great news. It will be so much clearer to be able to have examples with the real characters in them, and to be able to acknowledge the work of authors with the real forms of their names.

Mark

On Sun, Feb 5, 2017 at 4:48 AM, Andre Schappo wrote:

> RFC 7997: The Use of Non-ASCII Characters in RFCs
> https://tools.ietf.org/pdf/rfc7997.pdf
>
> I especially like the recommendation to allow person names to be written in their native scripts. The native script form is placed first and thus treated as primary, as it should be IMHO. The romanised/ASCII form is placed second and bracketed, and thus treated as secondary, as it should be IMHO.
>
> eg from section 3.2
>
> ?? (Q. Wu)
>
> André Schappo
From eric.muller at efele.net Tue Feb 7 12:08:49 2017
From: eric.muller at efele.net (Eric Muller)
Date: Tue, 7 Feb 2017 10:08:49 -0800
Subject: Bengla syllables <... 09BF 09BE> and <... 09BF 09C0>
Message-ID:

In looking at the wiki{pedia,book.source,tionary} corpus for Bengla, I see a relatively large number of syllables with <... 09BF 09BE> or <... 09BF 09C0>. I checked a couple of sources, and I did not find them listed anywhere as being normally used.

Are they in normal use or are those all typos?

I did not find any occurrence in the Assamese corpus.

Thanks,
Eric.

The syllables (o is the number of occurrences):

From manish at mozilla.com Tue Feb 7 14:22:44 2017
From: manish at mozilla.com (Manish Goregaokar)
Date: Tue, 7 Feb 2017 12:22:44 -0800
Subject: Bengla syllables <... 09BF 09BE> and <... 09BF 09C0>
In-Reply-To:
References:
Message-ID:

Not a Bangla speaker, but they look like typos to me too. Only certain vowel diacritics double up in Indic languages (e.g. anusvaras). I'm not sure how you would even pronounce such sounds. I suppose such combinations of diacritics could be used to represent diphthongs in words from other languages, but some of these diphthongs already exist in the regular script.

I found things like this[1] on wikisource which seems like an OCR of some really garbled text. The text does indeed seem like it has additional vowel diacritics, but that could also be a scanning glitch. The same word appears twice in the document, but once in the text.

Another sequence I found in [2][3] seems to only happen when the text is really garbled. All of these documents have random Latin stuff interspersed in the OCR, and sometimes Devanagari. [2] even has a Han character at the end.

I think it's just an OCR algorithm handling garbled Bangla text poorly. Such an algorithm might have a tendency to produce certain specific invalid sequences like the ones listed in your email.

Might want to double-check with a native Bangla speaker.
Thanks, -Manish [1]: https://bn.wikisource.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%A4%E0%A6%BE:%E0%A6%B0%E0%A6%BE%E0%A6%AE%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%A3%E0%A6%AE%E0%A7%8D%E2%80%8C_-_%E0%A6%AA%E0%A6%9E%E0%A7%8D%E0%A6%9A%E0%A6%BE%E0%A6%A8%E0%A6%A8_%E0%A6%A4%E0%A6%B0%E0%A7%8D%E0%A6%95%E0%A6%B0%E0%A6%A4%E0%A7%8D%E0%A6%A8.pdf/%E0%A7%A7%E0%A7%A9%E0%A7%A7%E0%A7%A7 [2]: https://bn.wikisource.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%A4%E0%A6%BE:%E0%A6%AC%E0%A6%BF%E0%A6%B6%E0%A7%8D%E0%A6%AC%E0%A6%95%E0%A7%8B%E0%A6%B7_%E0%A6%A8%E0%A6%AC%E0%A6%AE_%E0%A6%96%E0%A6%A3%E0%A7%8D%E0%A6%A1.djvu/%E0%A7%AD%E0%A7%AD%E0%A7%A6 [3]: https://bn.wikisource.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%A4%E0%A6%BE:%E0%A6%B6%E0%A6%BF%E0%A6%95%E0%A7%8D%E0%A6%B7%E0%A6%BE%E0%A6%AC%E0%A6%BF%E0%A6%A7%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%95_%E0%A6%AA%E0%A7%8D%E0%A6%B0%E0%A6%B8%E0%A7%8D%E0%A6%A4%E0%A6%BE%E0%A6%AC.pdf/%E0%A7%A7%E0%A7%AD%E0%A7%AE -Manish On Tue, Feb 7, 2017 at 10:08 AM, Eric Muller wrote: > In looking at the wiki{pedia,book.source,tionary} corpus for Bengla, I see a > relatively large number of syllables with <... 09BF 09BE> or <... 09BF > 09C0>. I checked a couple of sources, and I did not find them listed > anywhere as being normally used. > > Are they in normal use or are those all typos? > > I did not find any occurrence in the Assamese corpus. > > Thanks, > Eric. > > The syllables (o is the number of occurrences): > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > o='54'/> > > > > > > > > > > > > > > > > > o='93'/> > o='171'/> > > o='238'/> > o='79'/> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > o='75'/> > > > > > o='157'/> > > > > > > > > > o='125'/> > o='118'/> > o='58'/> > > > From richard.wordingham at ntlworld.com Tue Feb 7 15:46:13 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 7 Feb 2017 21:46:13 +0000 Subject: Bengla syllables <... 09BF 09BE> and <... 
09BF 09C0> In-Reply-To: References: Message-ID: <20170207214613.21adb970@JRWUBU2> On Tue, 7 Feb 2017 12:22:44 -0800 Manish Goregaokar wrote: > I found things like this[1] on wikisource which seems like an OCR of > some really garbled text. The text does indeed seem like it has > additional vowel diacritics, but that could also be a scanning glitch. > The same word appears twice in the document, but once in the text. In particular, the two sequences look like misinterpreted U+09CB BENGALI VOWEL SIGN O and U+09CC BENGALI VOWEL SIGN AU, which would account for their high frequency. The OCRed texts cited by Manish seem to be in acute need of manual correction. Richard. From asmusf at ix.netcom.com Tue Feb 7 21:53:45 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Tue, 7 Feb 2017 19:53:45 -0800 Subject: Bengla syllables <... 09BF 09BE> and <... 09BF 09C0> In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hkkejfcljlndkpdl.png Type: image/png Size: 3143 bytes Desc: not available URL: From manish at mozilla.com Tue Feb 7 23:38:26 2017 From: manish at mozilla.com (Manish Goregaokar) Date: Tue, 7 Feb 2017 21:38:26 -0800 Subject: Bengla syllables <... 09BF 09BE> and <... 09BF 09C0> In-Reply-To: References: Message-ID: > The very first one ???? (0995 09BF 09C0) had 1090 hits and shows up in a book of short stories: That's bad OCR, that's an apostrophe, a Ka, and an E, with the apostrophe being interpreted as a matra somehow. I bet there are only a couple of OCR algorithms out there handling Bangla. Indic scripts aren't something you can OCR glyph by glyph in such a straightforward way due to ligatures, so these algorithms are probably noticing components of a character and producing it. It sees a preceding line and the curve above, and interprets that as an I. It also sees the proceeding line and curve above, and interprets that as an EE. 
It then just puts the two together. It shouldn't, but it does. Given a small set of OCR algorithms I think it's reasonable to assume that such aberrations would be common across outputs -- so hundreds of hits for a typo doesn't sound out of the ordinary to me. > Tried a random one: ??? (0998 09BF 09BE) I went through the results for ??? (0998 09BF 09BE). Most occurrences are actually ????? (0998 09A8 09CD 099F 09BE), "ghanta" which can mean "hour" or "bell". Reasonably common word. These documents don't look scanned -- the text isn't garbled or anything, but it could be a cleaned up scanned document because I copied out some more of the text and there were similar aberrations all over the place. For example, in [1] the letter ? ("ba") is used frequently, but is written with a fancier script where it has an extra line through it. Many occurrences of it have been interpreted as sequences of vowel diacritics. The last line of the second-last stanza on page 5 has an absolutely ridiculous number of consecutive diacritics in the PDF text. [1]: http://yousigma.com/religionandphilosophy/poojasloka/Sri%20Hari%20Kathamruta%20Sara%20Datta%20Swatantrya%20Sandhi%20(Sri%20Jagannatha%20Vittala%20Dasaru)%20-%20Assamese.pdf -Manish On Tue, Feb 7, 2017 at 7:53 PM, Asmus Freytag wrote: > On 2/7/2017 10:08 AM, Eric Muller wrote: > > In looking at the wiki{pedia,book.source,tionary} corpus for Bengla, I > see a relatively large number of syllables with <... 09BF 09BE> or <... > 09BF 09C0>. I checked a couple of sources, and I did not find them listed > anywhere as being normally used. > > Are they in normal use or are those all typos? > > Tried a random one: ??? (0998 09BF 09BE) and got 385 hits in google. > Would surprise me if all of these were typos. > > The very first one ???? (0995 09BF 09C0) had 1090 hits and shows up in a > book of short stories: > > where it starts a paragraph. > > A./ > > > I did not find any occurrence in the Assamese corpus. > > Thanks, > Eric. 
> > The syllables (o is the number of occurrences):

From asmusf at ix.netcom.com Tue Feb 7 23:48:19 2017
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Tue, 7 Feb 2017 21:48:19 -0800
Subject: Bengla syllables <... 09BF 09BE> and <... 09BF 09C0>
In-Reply-To:
References:
Message-ID: <18fafe41-3abe-c8c5-49db-664c3bb7588c@ix.netcom.com>

On 2/7/2017 9:38 PM, Manish Goregaokar wrote:
> > The very first one ???? (0995 09BF 09C0) had 1090 hits and shows up in a book of short stories:
>
> That's bad OCR, that's an apostrophe, a Ka, and an E, with the apostrophe being interpreted as a matra somehow.

Interesting suggestion. Would explain a lot.

A./

> I bet there are only a couple of OCR algorithms out there handling Bangla. Indic scripts aren't something you can OCR glyph by glyph in such a straightforward way due to ligatures, so these algorithms are probably noticing components of a character and producing it. It sees a preceding line and the curve above, and interprets that as an I. It also sees the proceeding line and curve above, and interprets that as an EE. It then just puts the two together. It shouldn't, but it does.
>
> Given a small set of OCR algorithms I think it's reasonable to assume that such aberrations would be common across outputs -- so hundreds of hits for a typo doesn't sound out of the ordinary to me.
>
> > Tried a random one: ঘিা
(0998 09BF 09BE) > > I went through the results for ??? (0998 09BF 09BE). Most occurrences > are actually ????? (0998 09A8 09CD 099F 09BE), "ghanta" which can mean > "hour" or "bell". Reasonably common word. These documents don't look > scanned -- the text isn't garbled or anything, but it could be a > cleaned up scanned document because I copied out some more of the text > and there were similar aberrations all over the place. For example, in > [1] the letter ? ("ba") is used frequently, but is written with a > fancier script where it has an extra line through it. Many occurrences > of it have been interpreted as sequences of vowel diacritics. The last > line of the second-last stanza on page 5 has an absolutely ridiculous > number of consecutive diacritics in the PDF text. > > > [1]: > http://yousigma.com/religionandphilosophy/poojasloka/Sri%20Hari%20Kathamruta%20Sara%20Datta%20Swatantrya%20Sandhi%20(Sri%20Jagannatha%20Vittala%20Dasaru)%20-%20Assamese.pdf > > > > -Manish > > On Tue, Feb 7, 2017 at 7:53 PM, Asmus Freytag > wrote: > > On 2/7/2017 10:08 AM, Eric Muller wrote: >> In looking at the wiki{pedia,book.source,tionary} corpus for >> Bengla, I see a relatively large number of syllables with <... >> 09BF 09BE> or <... 09BF 09C0>. I checked a couple of sources, and >> I did not find them listed anywhere as being normally used. >> >> Are they in normal use or are those all typos? > Tried a random one: ??? (0998 09BF 09BE) and got 385 hits in google. > Would surprise me if all of these were typos. > > The very first one ???? (0995 09BF 09C0) had 1090 hits and shows > up in a book of short stories: > > where it starts a paragraph. > > A./ > >> >> I did not find any occurrence in the Assamese corpus. >> >> Thanks, >> Eric. 
>> The syllables (o is the number of occurrences):
>>
>> s='ত্ত্বিা' o='54'
>> s='ন্ত্রিা' o='93'
>> s='ন্ত্রিী' o='171'
>> s='ন্দ্রিা' o='238'
>> s='ন্দ্রিী' o='79'
>> s='ষ্ট্যিা' o='75'
>> s='স্ট্রিী' o='157'
>> s='ৰ্ত্তিা' o='125'
>> s='ৰ্ত্তিী' o='118'
>> s='ৰ্ম্মিা' o='58'
>> [remaining entries not preserved]

From richard.wordingham at ntlworld.com Wed Feb 8 00:40:27 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 8 Feb 2017 06:40:27 +0000
Subject: Bengla syllables <... 09BF 09BE> and <... 09BF 09C0>
In-Reply-To:
References:
Message-ID: <20170208064027.2dc73cc9@JRWUBU2>

On Tue, 7 Feb 2017 19:53:45 -0800 Asmus Freytag wrote:

> On 2/7/2017 10:08 AM, Eric Muller wrote:
> In looking at the wiki{pedia,book.source,tionary} corpus for Bengla, I see a relatively large number of syllables with <... 09BF 09BE> or <... 09BF 09C0>. I checked a couple of sources, and I did not find them listed anywhere as being normally used.
>
> Are they in normal use or are those all typos?

> Tried a random one: ঘিা (0998 09BF 09BE) and got 385 hits in google.
> Would surprise me if all of these were typos.

From the dotted circles and unassigned characters, I'm beginning to think they're all OCR errors or typos. There does seem to be the odd typo around.
Tracking the more promising looking pages down, I found mostly OCR errors, but I did find one apparent typo - or conceivably genuine spelling. > The very first one ???? (0995 09BF 09C0) had 1090 hits and shows up > in a book of short stories: > > where it starts a paragraph. Well done. In the first entry I found on Google, http://sarbaharapath.com/wp-content/uploads/2016/05/%E0%A6%B8%E0%A6%BF%E0%A6%B0%E0%A6%BE%E0%A6%9C-%E0%A6%B8%E0%A6%BF%E0%A6%95%E0%A6%A6%E0%A6%BE%E0%A6%B0-%E0%A6%B0%E0%A6%9A%E0%A6%A8%E0%A6%BE_%E0%A6%AC%E0%A6%BF%E0%A6%AD%E0%A6%BF%E0%A6%A8%E0%A7%8D%E0%A6%A8-%E0%A6%B8%E0%A7%8D%E0%A6%A4%E0%A6%B0%E0%A7%87%E0%A6%B0-%E0%A6%B8%E0%A6%BE%E0%A6%82%E0%A6%97%E0%A6%A0%E0%A6%A8%E0%A6%BF%E0%A6%95-%E0%A6%8F%E0%A6%95%E0%A6%95%E0%A7%87%E0%A6%B0-%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A6%BE%E0%A6%AA%E0%A6%A4%E0%A7%8D%E0%A6%A4%E0%A6%BE%E0%A6%AE%E0%A7%82%E0%A6%B2%E0%A6%95-%E0%A6%A6%E0%A6%BE%E0%A7%9F%E0%A6%BF%E0%A6%A4%E0%A7%8D%E0%A6%AC.pdf , it appears to be an atrocious OCR error for ??? . However, if you're referring to Rabindranath Tagore's Golpo Samagra, via Google on https://books.google.co.uk/books?id=F8LfBwAAQBAJ, it is again a misreading by OCR, this time for the more forgivable '?? , which is why it occurs at the starts of paragraphs! Richard. From manish at mozilla.com Wed Feb 8 23:18:10 2017 From: manish at mozilla.com (Manish Goregaokar) Date: Wed, 8 Feb 2017 21:18:10 -0800 Subject: Bengla syllables <... 09BF 09BE> and <... 09BF 09C0> In-Reply-To: References: Message-ID: > For example, in [1] the letter ? ("ba") is used frequently, but is written with a fancier script where it has an extra line through it. I just realized that my interpretation here was wrong; that letter is actually the Assamese ?, and the document is in Assamese. I assume the OCR algorithm isn't aware of Assamese-only characters in this case. -Manish On Tue, Feb 7, 2017 at 9:38 PM, Manish Goregaokar wrote: > > The very first one ???? 
(0995 09BF 09C0) had 1090 hits and shows up in a > book of short stories: > > That's bad OCR, that's an apostrophe, a Ka, and an E, with the apostrophe > being interpreted as a matra somehow. > > I bet there are only a couple of OCR algorithms out there handling Bangla. > Indic scripts aren't something you can OCR glyph by glyph in such a > straightforward way due to ligatures, so these algorithms are probably > noticing components of a character and producing it. It sees a preceding > line and the curve above, and interprets that as an I. It also sees the > proceeding line and curve above, and interprets that as an EE. It then just > puts the two together. It shouldn't, but it does. > > Given a small set of OCR algorithms I think it's reasonable to assume that > such aberrations would be common across outputs -- so hundreds of hits for > a typo doesn't sound out of the ordinary to me. > > > Tried a random one: ??? (0998 09BF 09BE) > > I went through the results for ??? (0998 09BF 09BE). Most occurrences are > actually ????? (0998 09A8 09CD 099F 09BE), "ghanta" which can mean "hour" > or "bell". Reasonably common word. These documents don't look scanned -- > the text isn't garbled or anything, but it could be a cleaned up scanned > document because I copied out some more of the text and there were similar > aberrations all over the place. For example, in [1] the letter ? ("ba") is > used frequently, but is written with a fancier script where it has an extra > line through it. Many occurrences of it have been interpreted as sequences > of vowel diacritics. The last line of the second-last stanza on page 5 has > an absolutely ridiculous number of consecutive diacritics in the PDF text. 
>
> [1]: http://yousigma.com/religionandphilosophy/poojasloka/Sri%20Hari%20Kathamruta%20Sara%20Datta%20Swatantrya%20Sandhi%20(Sri%20Jagannatha%20Vittala%20Dasaru)%20-%20Assamese.pdf
>
> -Manish
>
> On Tue, Feb 7, 2017 at 7:53 PM, Asmus Freytag wrote:
>
>> On 2/7/2017 10:08 AM, Eric Muller wrote:
>>
>> In looking at the wiki{pedia,book.source,tionary} corpus for Bengla, I see a relatively large number of syllables with <... 09BF 09BE> or <... 09BF 09C0>. I checked a couple of sources, and I did not find them listed anywhere as being normally used.
>>
>> Are they in normal use or are those all typos?
>>
>> Tried a random one: ঘিা (0998 09BF 09BE) and got 385 hits in google. Would surprise me if all of these were typos.
>>
>> The very first one ???? (0995 09BF 09C0) had 1090 hits and shows up in a book of short stories:
>>
>> where it starts a paragraph.
>>
>> A./
>>
>> I did not find any occurrence in the Assamese corpus.
>>
>> Thanks,
>> Eric.
>>
>> The syllables (o is the number of occurrences):

From moyogo at gmail.com Thu Feb 9 07:03:07 2017
From: moyogo at gmail.com (Denis Jacquerye)
Date: Thu, 09 Feb 2017 13:03:07 +0000
Subject: Bengla syllables <... 09BF 09BE> and <...
09BF 09C0>
In-Reply-To:
References:
Message-ID:

In some cases, this seems to be intentional and some kind of alternative to using য় <09AF 09BC>.

See for example https://bn.wikipedia.org/wiki/??????_???????????_???_????? where ??????????? (with <09B0 09BF 09AF 09BC 09BE>) is used for 'Victoria' on most lines and ????????? (with <09B0 09BF 09BE>) on another line, or https://bn.wikipedia.org/wiki/????_???? where ?????????? (with <09B0 09BF 09AF 09BC 09BE>) is used for 'variant' on some lines and ???????? (with <09B0 09BF 09BE>) on other lines.

I guess using য়া <09AF 09BC 09BE> is better, but that's just my guess.

On Tue, 7 Feb 2017 at 13:14 Eric Muller wrote:

> In looking at the wiki{pedia,book.source,tionary} corpus for Bengla, I see a relatively large number of syllables with <... 09BF 09BE> or <... 09BF 09C0>. I checked a couple of sources, and I did not find them listed anywhere as being normally used.
>
> Are they in normal use or are those all typos?
>
> I did not find any occurrence in the Assamese corpus.
>
> Thanks,
> Eric.
>
> The syllables (o is the number of occurrences):

From richard.wordingham at ntlworld.com Sat Feb 11 18:45:45 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 12 Feb 2017 00:45:45 +0000
Subject: Bengla syllables <... 09BF 09BE> and <...
09BF 09C0> In-Reply-To: References: Message-ID: <20170212004545.6dd94a92@JRWUBU2> On Tue, 7 Feb 2017 21:38:26 -0800 Manish Goregaokar wrote: > I went through the results for ??? (0998 09BF 09BE). Most occurrences > are actually ????? (0998 09A8 09CD 099F 09BE), "ghanta" which can > mean "hour" or "bell". Reasonably common word. These documents don't > look scanned -- the text isn't garbled or anything, but it could be a > cleaned up scanned document because I copied out some more of the > text and there were similar aberrations all over the place. I think OCR problems aren't the only cause. I had a detailed look at a PDF generated using Version 5.90 of the TrueType-outline Vrinda font (available with Windows 7, at least), found on the web at http://www.bsci-intl.org/sites/default/files/Terms%20of%20Implementation%20for%20Business%20Partners-Producers_2014_BN.pdf . The looking included decompressing the compressed streams in the file, which I haven't yet automated. The font name is visible in uncompressed part of the PDF. There was very little in the way of 'ActualText', so it seems that the actual text has to be deduced using ToUnicode entries. I looked for text allegedly matching ???? according to the Firefox (Version 51.0.1 as prepared for Ubuntu Xenial) 'preview' and analysed the second occurrence. It was on line 4 of p3, which I identified by a position in the page of y=684.22. The problem was that the ToUnicode mapping (Object 283 within the PDF) said glyph 0x0107 (=263) was for U+09BF BENGALI VOWEL SIGN I, whereas according to the cmap tables for the font, it is for U+09AE BENGALI LETTER MA, which is what I saw in the text as displayed. Now, although the font's post table has glyph names, I'm not sure that the semantics of names like "bn_ma" are defined. I suspect something went wrong when the PDF generator deduced corresponding characters from the GSUB table, though it could be a subsetting problem. Richard. 
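The search that started this thread, for U+09BF followed by U+09BE or U+09C0, can be sketched as follows. This is a minimal illustration in Python, not the tool used to produce the counts quoted earlier; the names SUSPECT and suspect_sequences are hypothetical. It only flags the sequences and cannot tell whether a hit comes from OCR, a typo, or a faulty ToUnicode mapping of the kind described above.

```python
import re
from collections import Counter

# U+09BF BENGALI VOWEL SIGN I immediately followed by
# U+09BE BENGALI VOWEL SIGN AA or U+09C0 BENGALI VOWEL SIGN II
SUSPECT = re.compile("\u09BF[\u09BE\u09C0]")


def suspect_sequences(text):
    """Count each suspect vowel-sign pair, keyed by its code points in hex."""
    counts = Counter()
    for match in SUSPECT.finditer(text):
        key = " ".join(f"{ord(ch):04X}" for ch in match.group())
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Code point sequences quoted in the thread: <0998 09BF 09BE>,
    # <0995 09BF 09C0>, and the ordinary word <0998 09A8 09CD 099F 09BE>.
    sample = ("\u0998\u09BF\u09BE "
              "\u0995\u09BF\u09C0 "
              "\u0998\u09A8\u09CD\u099F\u09BE")
    print(dict(suspect_sequences(sample)))
```

Text extracted from a PDF can be passed to the same function, which makes it easy to compare a suspect document against a clean one before looking at glyph mappings by hand.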
From christoph.paeper at crissov.de Mon Feb 13 03:39:41 2017
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Mon, 13 Feb 2017 10:39:41 +0100
Subject: WAP Pictogram Specification as Emoji Source
In-Reply-To:
References:
Message-ID: <8AD54B62-9A0C-4DC7-B617-9EDE2B6A6EC3@crissov.de>

Christoph Päper 2016-01-07:
>
> I just discovered the WAP Pictogram specification (WAP-213-WAPInterPic), last published in April 2001 and updated in November 2001.

There was a subsequent update by OMA in 2006, but it didn't change any pictogram.

> Reading through section 7 Pictogram Set, it's obvious that WAP pictograms have been unified with Japanese (i-mode) emojis upon their encoding in Unicode 6+.

There were references to WAP in L2/07-257 and L2/08-081, for instance.

> However, the mapping is not obvious in all cases and I think there are some pictograms that have been omitted / forgotten or could have better annotation,

I've put my best guesses for equivalent emojis onto Github:

I still think there are some entries without proper corresponding emoji. Would that suffice as a reason for adding them?

- animal/beetle - unless same as animal/ladybird 🐞
- emotion/shakenHeart - unless beating heart 💓 is appropriate
- emotion/shine - Sparkles ✨ may or may not be appropriate
- map/park - maybe Fountain ⛲️, not obvious
- map/zoo - perhaps any non-pet animal
- music/rest - is that what 〽️ or 〰️ means?
- sport/scuba

From mark at macchiato.com Mon Feb 13 05:19:44 2017
From: mark at macchiato.com (Mark Davis ☕️)
Date: Mon, 13 Feb 2017 06:19:44 -0500
Subject: WAP Pictogram Specification as Emoji Source
In-Reply-To: <8AD54B62-9A0C-4DC7-B617-9EDE2B6A6EC3@crissov.de>
References: <8AD54B62-9A0C-4DC7-B617-9EDE2B6A6EC3@crissov.de>
Message-ID:

Given the status of WAP, I don't think there is any particular need to seek compatibility for it. On the other hand, it, like other sources, can certainly be mined for ideas.
For example, the topic of SCUBA has certainly come up, and I suspect one could make a good case for the expected frequency being similar to some other sports. See http://unicode.org/emoji/selection.html for how to submit.

Mark

On Mon, Feb 13, 2017 at 4:39 AM, Christoph Päper <christoph.paeper at crissov.de> wrote:

> Christoph Päper 2016-01-07:
> >
> > I just discovered the WAP Pictogram specification (WAP-213-WAPInterPic), last published in April 2001 and updated in November 2001.
>
> There was a subsequent update by OMA in 2006, but it didn't change any pictogram.
>
> > Reading through section 7 Pictogram Set, it's obvious that WAP pictograms have been unified with Japanese (i-mode) emojis upon their encoding in Unicode 6+.
>
> There were references to WAP in L2/07-257 and L2/08-081, for instance.
>
> > However, the mapping is not obvious in all cases and I think there are some pictograms that have been omitted / forgotten or could have better annotation,
>
> I've put my best guesses for equivalent emojis onto Github:
> references/WAP%20Pictogram.tsv>
>
> I still think there are some entries without proper corresponding emoji. Would that suffice as a reason for adding them?
>
> - animal/beetle - unless same as animal/ladybird 🐞
> - emotion/shakenHeart - unless beating heart 💓 is appropriate
> - emotion/shine - Sparkles ✨ may or may not be appropriate
> - map/park - maybe Fountain ⛲️, not obvious
> - map/zoo - perhaps any non-pet animal
> - music/rest - is that what 〽️ or 〰️ means?
> - sport/scuba
From kenwhistler at att.net Mon Feb 13 10:29:14 2017 From: kenwhistler at att.net (Ken Whistler) Date: Mon, 13 Feb 2017 08:29:14 -0800 Subject: WAP Pictogram Specification as Emoji Source In-Reply-To: <8AD54B62-9A0C-4DC7-B617-9EDE2B6A6EC3@crissov.de> References: <8AD54B62-9A0C-4DC7-B617-9EDE2B6A6EC3@crissov.de> Message-ID: <5d284a0f-29fb-5a87-d61b-ed5a2bec4370@att.net> I can't speak to the missing emoji mappings, but... On 2/13/2017 1:39 AM, Christoph Päper wrote: > - music/rest – is that what 〽️ or 〰️ means? The first of those is presumably U+303D PART ALTERNATION MARK, and the second is probably the notorious U+3030 WAVY DASH. So not emoji at all. --Ken From christoph.paeper at crissov.de Mon Feb 13 15:26:40 2017 From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=) Date: Mon, 13 Feb 2017 22:26:40 +0100 Subject: WAP Pictogram Specification as Emoji Source In-Reply-To: <5d284a0f-29fb-5a87-d61b-ed5a2bec4370@att.net> References: <8AD54B62-9A0C-4DC7-B617-9EDE2B6A6EC3@crissov.de> <5d284a0f-29fb-5a87-d61b-ed5a2bec4370@att.net> Message-ID: <702C7694-DE2D-4580-8C52-B8DD9F87F3E0@crissov.de> Ken Whistler : > On 2/13/2017 1:39 AM, Christoph Päper wrote: >> - music/rest – is that what 〽️ or 〰️ means? > > The first of those is presumably U+303D PART ALTERNATION MARK, and the second is probably the notorious U+3030 WAVY DASH. So not emoji at all. You’re right about the code points, but – huh? 
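[Editorial sketch, not part of the original messages.] The two code points named in the exchange above can be checked against the Unicode Character Database that ships with Python. This is a minimal sketch using the standard `unicodedata` module; the helper name `describe` is hypothetical and introduced only for illustration.

```python
import unicodedata


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    # unicodedata.name() looks the character up in Python's bundled UCD.
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


# The two code points identified in the thread.
for cp in (0x303D, 0x3030):
    print(describe(cp))
```

The lookup only confirms the character names; whether a given platform renders these characters with emoji presentation is a separate question that this sketch does not address.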
From kenwhistler at att.net Mon Feb 13 15:37:28 2017 From: kenwhistler at att.net (Ken Whistler) Date: Mon, 13 Feb 2017 13:37:28 -0800 Subject: WAP Pictogram Specification as Emoji Source In-Reply-To: <702C7694-DE2D-4580-8C52-B8DD9F87F3E0@crissov.de> References: <8AD54B62-9A0C-4DC7-B617-9EDE2B6A6EC3@crissov.de> <5d284a0f-29fb-5a87-d61b-ed5a2bec4370@att.net> <702C7694-DE2D-4580-8C52-B8DD9F87F3E0@crissov.de> Message-ID: <55e8f43a-d3ed-40f1-d102-a671119c451a@att.net> On 2/13/2017 1:26 PM, Christoph Päper wrote: > Ken Whistler : >> On 2/13/2017 1:39 AM, Christoph Päper wrote: >>> - music/rest – is that what 〽️ or 〰️ means? >> The first of those is presumably U+303D PART ALTERNATION MARK, and the second is probably the notorious U+3030 WAVY DASH. So not emoji at all. > You’re right about the code points, but – huh? > > > Well, the emoji experts can step in -- presumably because they were in the cell phone SJis extensions that were mapped for emoji originally? Ah, yes, indeedy: 3030;F9AE;; 303D;;;F76C ... from EmojiSources.txt. First one a DoCoMo mapping; second one a SoftBank mapping. --Ken From christoph.paeper at crissov.de Mon Feb 13 16:06:27 2017 From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=) Date: Mon, 13 Feb 2017 23:06:27 +0100 Subject: WAP Pictogram Specification as Emoji Source In-Reply-To: References: <8AD54B62-9A0C-4DC7-B617-9EDE2B6A6EC3@crissov.de> Message-ID: Mark Davis ☕️ : > > Given the status of WAP, I don't think there is any particular need to seek compatibility for it. Much of WAP/WML is deprecated by recent OMA specs, but Pictogram is still an (optional) part of Browsing V2.4 from 2011 – or more specifically: Mobile Application Environment Specification (MAE). Nevertheless, I think it’s OMA’s job – not the UTC’s – to seek a smooth transition of their standards to state-of-the-art technology. > On the other hand, it – like other sources – 
can certainly be mined for ideas. Indeed, but some sources one could mine will leave one in confusion: There have been several characters in drafts over the past ten or so years that seem to absolutely make sense, but haven’t made it for reasons mostly opaque to the public, e.g. Kangaroo in L2/09-114 = N3607. (This one, I think, was dismissed back then because it wasn’t part of the original Japanese sets nor W?dings, but it’s not clear why it didn’t reappear since, when other new animals have been added.) > For example, the topic of SCUBA has certainly come up, and I suspect one could make a good case for the expected frequency being similar to some other sports. Would it be a profession (Diver), an activity (Diving) or an object (Mask, Snorkel, Tanks, Fins, …) though? This is easy to get wrong: the gender-aware Mage emoji coming to Unicode 10, for instance, is probably less useful than a Magic Wand (which is a common icon in image editors and thus became part of graphic artist jargon) or a generic Magic emoji (arguably covered by Sparkles ✨ already). From richard.wordingham at ntlworld.com Fri Feb 24 16:32:02 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 24 Feb 2017 22:32:02 +0000 Subject: Sanskrit -e/o a- Sandhi in Devanagari Message-ID: <20170224223202.36224fc5@JRWUBU2> The usual form of this sandhi in modern Sanskrit is described as the a- dropping and being replaced by avagraha. If word boundaries are represented by SPACE, am I correct in believing that the change in codepoints is: becomes I ask because I have seen lines starting with avagraha, though within a line there seems not to be a space before avagraha. (I am ignoring didactic writing which shows sandhi effects but leaves a space between the original words.) Richard. 
From samjnaa at gmail.com Fri Feb 24 19:43:05 2017 From: samjnaa at gmail.com (Shriramana Sharma) Date: Sat, 25 Feb 2017 07:13:05 +0530 Subject: Sanskrit -e/o a- Sandhi in Devanagari In-Reply-To: <20170224223202.36224fc5@JRWUBU2> References: <20170224223202.36224fc5@JRWUBU2> Message-ID: This seems quite reasonable. On 25 Feb 2017 04:06, "Richard Wordingham" wrote: > The usual form of this sandhi in modern Sanskrit is described as the a- > dropping and being replaced by avagraha. If word boundaries are > represented by SPACE, am I correct in believing that the change in > codepoints is: > > becomes DEVANAGARI SIGN AVAGRAHA> > > I ask because I have seen lines starting with avagraha, though within a > line there seems not to be a space before avagraha. (I am ignoring > didactic writing which shows sandhi effects but leaves a space between > the original words.) > > Richard. From richard.wordingham at ntlworld.com Tue Feb 28 01:37:10 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 28 Feb 2017 07:37:10 +0000 Subject: Northern Khmer on iPhone Message-ID: <20170228073710.75af64d4@JRWUBU2> Does iPhone support the use of Northern Khmer in Thai script? I would count an interface in Thai as support. The reason I ask is that I tried entering the word กฺี <U+0E01 THAI CHARACTER KO KAI, U+0E3A THAI CHARACTER PHINTHU, U+0E35 THAI CHARACTER SARA II> 'he' and got a dotted circle. I also got a dotted circle for the alternative spelling . This might be an application issue. The application I was using was Line. Richard. 
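[Editorial sketch, not part of the original messages.] The three-character sequence discussed in this thread (KO KAI, PHINTHU, SARA II) can be built and inspected with Python's standard `unicodedata` module. The code point U+0E01 for KO KAI is taken from the Unicode character database rather than from the surviving text of the message, and the helper name `code_points` is hypothetical.

```python
import unicodedata

# Sequence as named in the thread: KO KAI, PHINTHU, SARA II.
# U+0E01 for KO KAI is an assumption supplied from the UCD.
word = "\u0E01\u0E3A\u0E35"


def code_points(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME' (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in code_points(word):
    print(line)

# NFC leaves the sequence unchanged, so normalization on input would
# not by itself reorder the two marks.
print(unicodedata.normalize("NFC", word) == word)
```

This only shows that the string is well-formed and stable under NFC; it says nothing about why a particular renderer inserts a dotted circle, which depends on the shaping engine and font in use.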
From richard.wordingham at ntlworld.com Tue Feb 28 15:00:56 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 28 Feb 2017 21:00:56 +0000 Subject: Northern Khmer on iPhone In-Reply-To: <20170228073710.75af64d4@JRWUBU2> References: <20170228073710.75af64d4@JRWUBU2> Message-ID: <20170228210056.6e56fcf9@JRWUBU2> On Tue, 28 Feb 2017 07:37:10 +0000 Richard Wordingham wrote: > Does iPhone support the use of Northern Khmer in Thai script? I would > count an interface in Thai as support. > > The reason I ask is that I tried entering the word กฺี <U+0E01 THAI > CHARACTER KO KAI, U+0E3A THAI CHARACTER PHINTHU, U+0E35 THAI CHARACTER > SARA II> 'he' and got a dotted circle. I also got a dotted circle for > the alternative spelling . It's been suggested to me that this is just a font issue. Unfortunately, it seems that one can't change the font without jailbreaking the phone. Do Android systems permit written communication in Northern Khmer in the Thai script? Richard. From verdy_p at wanadoo.fr Tue Feb 28 16:09:05 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Tue, 28 Feb 2017 23:09:05 +0100 Subject: Northern Khmer on iPhone In-Reply-To: <20170228210056.6e56fcf9@JRWUBU2> References: <20170228073710.75af64d4@JRWUBU2> <20170228210056.6e56fcf9@JRWUBU2> Message-ID: Various Android devices make it easy to install new fonts: at least it works for the built-in UI of the configuration panels themselves or some default apps, but typically it does not work with many apps, which will still use the stock default fonts. The user's preselected font, however, is generally respected in apps with input forms for typing and receiving text, or if the app UI uses only the default menu toolkits. 
If custom widgets or custom layouts are used, this won't work, as these apps are only tested for specific font designs with precise metrics, even if these fonts are "scalable", given the comfortable margins around UI elements that are needed in all cases for any touch interface. Some Android devices offer alternative fonts by integrating their own gallery of additional fonts (free or paid). As for stock fonts used in the Android UI, the Noto fonts are almost always preinstalled on devices, but will be updated only with OS upgrades, even if these fonts are widely distributed online. These Noto fonts, however, have very limited and simplified designs for general use. Games need far more variety of style and emphasis than just changing font sizes and colors, or choosing between roman and italic styles or between two weights, and they need precise positioning of text so that the rest of the non-textual space stays usable for playing with enough distinctions. It's notoriously difficult to scale both graphics and text at the same time, notably when bitmap graphics are not easily scalable and you can't predict the layout needed to fit text depending on the language and script used, especially on small screens (even if they have higher pixel density and better contrast, because of the limitations of your eyes and the size of your fingers). In most cases, this pushes app developers to reduce text in their UI and (ab)use icons... Text is definitely not a priority for them and finally nobody cares about developing alternate fonts for smartphones: default stock fonts will be enough if they fit the basic need for the language users want to use and will be rarely updated, unless they buy a new phone with a newer version of the OS featuring better stock fonts. 
From richard.wordingham at ntlworld.com Tue Feb 28 16:45:43 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 28 Feb 2017 22:45:43 +0000 Subject: Northern Khmer on iPhone In-Reply-To: References: <20170228073710.75af64d4@JRWUBU2> <20170228210056.6e56fcf9@JRWUBU2> Message-ID: <20170228224543.75de36df@JRWUBU2> On Tue, 28 Feb 2017 23:09:05 +0100 Philippe Verdy wrote: > ... default stock fonts will be enough if they fit the basic > need for the language users want to use and will be rarely updated, > unless they buy a new phone with a newer version of the OS featuring > better stock fonts. I'm not sure that that applies to minority languages. I'm currently exploring the hypothesis that there is very little in the way of Northern Khmer on the web in the Thai script because input methods or rendering prevent or penalise (e.g. by dotted circles) its use. I am therefore interested in how compatible it is with mobile phones. Chatting with family and childhood friends is one place where using one's mother tongue might make good sense. Richard. From haberg-1 at telia.com Tue Feb 28 16:57:00 2017 From: haberg-1 at telia.com (=?utf-8?Q?Hans_=C3=85berg?=) Date: Tue, 28 Feb 2017 23:57:00 +0100 Subject: Northern Khmer on iPhone In-Reply-To: <20170228210056.6e56fcf9@JRWUBU2> References: <20170228073710.75af64d4@JRWUBU2> <20170228210056.6e56fcf9@JRWUBU2> Message-ID: > On 28 Feb 2017, at 22:00, Richard Wordingham wrote: > > On Tue, 28 Feb 2017 07:37:10 +0000 > Richard Wordingham wrote: > >> Does iPhone support the use of Northern Khmer in Thai script? I would >> count an interface in Thai as support. > > It's been suggested to me that this is just a font issue. > Unfortunately, it seems that one can't change the font without > jailbreaking the phone. A search for "iphone install fonts" gives hits for apps, and also a site that sells them and installs via Safari. 
From lang.support at gmail.com Tue Feb 28 18:06:38 2017 From: lang.support at gmail.com (Andrew Cunningham) Date: Wed, 1 Mar 2017 11:06:38 +1100 Subject: Northern Khmer on iPhone In-Reply-To: <20170228224543.75de36df@JRWUBU2> References: <20170228073710.75af64d4@JRWUBU2> <20170228210056.6e56fcf9@JRWUBU2> <20170228224543.75de36df@JRWUBU2> Message-ID: On iOS it is fairly straightforward to arrange solutions for minority languages. Android has always been a challenge. Older versions of Android might not have rendering support for the script. Most handset manufacturers don't allow users to change fonts. A couple of handset manufacturers allow users to change between preinstalled fonts and in some cases allow installation of fonts via licensed solutions like flipfont. There are a few apps available that allow you to install additional fonts. But changing the fonts is still device dependent unless you jailbreak the handset. If you want to discuss specific devices or approaches, it is easiest to do so off-list. Andrew On Wednesday, 1 March 2017, Richard Wordingham < richard.wordingham at ntlworld.com> wrote: > On Tue, 28 Feb 2017 23:09:05 +0100 > Philippe Verdy wrote: > >> ... default stock fonts will be enough if they fit the basic >> need for the language users want to use and will be rarely updated, >> unless they buy a new phone with a newer version of the OS featuring >> better stock fonts. > > I'm not sure that that applies to minority languages. I'm currently > exploring the hypothesis that there is very little in the way of > Northern Khmer on the web in the Thai script because input methods or > rendering prevent or penalise (e.g. by dotted circles) its use. I am > therefore interested in how compatible it is with mobile phones. > Chatting with family and childhood friends is one place where using > one's mother tongue might make good sense. > > Richard. > -- Andrew Cunningham lang.support at gmail.com 
From verdy_p at wanadoo.fr Tue Feb 28 19:31:37 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Wed, 1 Mar 2017 02:31:37 +0100 Subject: Northern Khmer on iPhone In-Reply-To: References: <20170228073710.75af64d4@JRWUBU2> <20170228210056.6e56fcf9@JRWUBU2> <20170228224543.75de36df@JRWUBU2> Message-ID: 2017-03-01 1:06 GMT+01:00 Andrew Cunningham : > Android has always been a challenge. > A couple of handset manufacturers allow users to change between > preinstalled fonts and in some cases allow installation of fonts via > licensed solutions like flipfont. > LG offers that possibility directly from its prepackaged online app store, which offers many fonts. You can even upgrade built-in fonts, notably the Noto and Roboto families (removing them will restore the fonts prebuilt in the device ROM, just as you can "uninstall" updates to built-in apps to restore their stock version). Fonts are a viable market, but most font sales are for "funny/decorative" styles (most often with very limited coverage outside basic Latin), or go to application developers who license them for inclusion and deployment in application packages, or to graphic designers and advertising networks for titling and logography on websites, in books and the press, or for merchandising (clothes...): full coverage for plain text and many languages is not needed. For generic text and UI on smartphones, a single uniform and minimalist style such as Noto is generally preferred (some manufacturers preinstall additional fonts and use them by default in device settings; many include a serif family with limited coverage, such as Times, a few other legacy sans-serif fonts such as basic versions of Helvetica, Arial or Verdana, and a single monospaced font with very low coverage, to be used only in boot environments or debugging consoles). 
Very few users will switch their UI to use a more decorative style, but if they do it, they'll also change it frequently in their settings (and will also be the most frequent customers for alternate font styles available to them at tiny prices on mobile app markets). There are far more users adjusting the default font size rather than font family or styles.