Bengali syllables <... 09BF 09BE> and <... 09BF 09C0>

Richard Wordingham richard.wordingham at ntlworld.com
Wed Feb 8 00:40:27 CST 2017


On Tue, 7 Feb 2017 19:53:45 -0800
Asmus Freytag <asmusf at ix.netcom.com> wrote:

> On 2/7/2017 10:08 AM, Eric Muller wrote:
> > In looking at the wiki{pedia,book.source,tionary} corpus for Bengali,
> > I see a relatively large number of syllables with <... 09BF 09BE> or
> > <... 09BF 09C0>. I checked a couple of sources, and I did not find
> > them listed anywhere as being normally used.
> > 
> > Are they in normal use or are those all typos?
> 
> Tried a random one: ঘিা (0998 09BF 09BE) and got 385 hits in google.
> Would surprise me if all of these were typos.

From the dotted circles and unassigned characters, I'm beginning to
think they're all OCR errors or typos.  There does seem to be the odd
typo around.  Tracking the more promising looking pages down, I found
mostly OCR errors, but I did find one apparent typo - or conceivably
genuine spelling.
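
For anyone wanting to repeat the check on their own text, here is a minimal Python sketch of how the two sequences might be located. The function name and the sample string are my own illustrations, not part of any corpus tool mentioned in this thread.

```python
import re

# U+09BF (BENGALI VOWEL SIGN I) immediately followed by
# U+09BE (VOWEL SIGN AA) or U+09C0 (VOWEL SIGN II):
# the two sequences under discussion.
SUSPECT = re.compile("\u09bf[\u09be\u09c0]")

def find_suspect_sequences(text):
    """Return (offset, matched string) pairs for each suspect sequence."""
    return [(m.start(), m.group()) for m in SUSPECT.finditer(text)]

# Hypothetical sample: two suspect syllables, then an ordinary <0995 09C0>.
sample = "\u0998\u09bf\u09be \u0995\u09bf\u09c0 \u0995\u09c0"
hits = find_suspect_sequences(sample)
for offset, seq in hits:
    print(offset, " ".join(f"{ord(c):04X}" for c in seq))
```

This only flags candidates; whether a given hit is an OCR error, a typo or a genuine spelling still has to be judged by looking at the page.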

> The very first one কিী‎ (0995 09BF 09C0) had 1090 hits and shows up
> in a book of short stories:
> 
> where it starts a paragraph.

Well done.  In the first entry I found on Google,
http://sarbaharapath.com/wp-content/uploads/2016/05/%E0%A6%B8%E0%A6%BF%E0%A6%B0%E0%A6%BE%E0%A6%9C-%E0%A6%B8%E0%A6%BF%E0%A6%95%E0%A6%A6%E0%A6%BE%E0%A6%B0-%E0%A6%B0%E0%A6%9A%E0%A6%A8%E0%A6%BE_%E0%A6%AC%E0%A6%BF%E0%A6%AD%E0%A6%BF%E0%A6%A8%E0%A7%8D%E0%A6%A8-%E0%A6%B8%E0%A7%8D%E0%A6%A4%E0%A6%B0%E0%A7%87%E0%A6%B0-%E0%A6%B8%E0%A6%BE%E0%A6%82%E0%A6%97%E0%A6%A0%E0%A6%A8%E0%A6%BF%E0%A6%95-%E0%A6%8F%E0%A6%95%E0%A6%95%E0%A7%87%E0%A6%B0-%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A6%BE%E0%A6%AA%E0%A6%A4%E0%A7%8D%E0%A6%A4%E0%A6%BE%E0%A6%AE%E0%A7%82%E0%A6%B2%E0%A6%95-%E0%A6%A6%E0%A6%BE%E0%A7%9F%E0%A6%BF%E0%A6%A4%E0%A7%8D%E0%A6%AC.pdf ,
it appears to be an atrocious OCR error for কমী <U+0995, U+09AE,
U+09C0>.

However, if you're referring to Rabindranath Tagore's Golpo Samagra,
via Google on https://books.google.co.uk/books?id=F8LfBwAAQBAJ, it is
again a misreading by OCR, this time for the more forgivable 'কী
<U+0027 APOSTROPHE, U+0995, U+09C0>, which is why it occurs at the
starts of paragraphs!
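
To make the three readings easier to compare, the following small Python sketch lists the code points of each using the standard unicodedata module. The helper name describe and the variable names are hypothetical, chosen only for this illustration.

```python
import unicodedata

def describe(s):
    """List each character of s as 'U+XXXX CHARACTER NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

ocr_reading = "\u0995\u09bf\u09c0"    # the sequence found by the search
intended_first = "\u0995\u09ae\u09c0"  # reading suggested for the PDF
intended_second = "'\u0995\u09c0"      # reading suggested for the book

for label, s in [("OCR", ocr_reading),
                 ("PDF", intended_first),
                 ("book", intended_second)]:
    print(label, describe(s))
```

In the second case the OCR output differs from the suggested reading only in the character beside KA: a vowel sign I where the apostrophe should be.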

Richard.


