Bengali syllables <... 09BF 09BE> and <... 09BF 09C0>

Richard Wordingham richard.wordingham at ntlworld.com
Tue Feb 7 15:46:13 CST 2017


On Tue, 7 Feb 2017 12:22:44 -0800
Manish Goregaokar <manish at mozilla.com> wrote:

> I found things like this[1] on wikisource which seems like an OCR of
> some really garbled text. The text does indeed seem like it has
> additional vowel diacritics, but that could also be a scanning glitch.
> The same word appears twice in the document, but once in the text.

In particular, the two sequences look like misinterpreted U+09CB
BENGALI VOWEL SIGN O and U+09CC BENGALI VOWEL SIGN AU, which would
account for their high frequency.  The OCRed texts cited by
Manish seem to be in acute need of manual correction.
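
For anyone who wants to check this against the data, here is a rough Python
sketch (mine, not anything Manish ran). It uses the standard unicodedata
module to print the canonical decompositions of U+09CB and U+09CC next to
the two suspect sequences. The helper names decompose, describe and
find_suspects are made up for illustration, and the pairing of each suspect
sequence with an intended vowel sign is only my hypothesis above, not an
established fact.

```python
import re
import unicodedata

# Hypothesised mapping: suspect OCR output -> vowel sign probably intended.
# This pairing is an assumption, not something confirmed by the source texts.
SUSPECT_TO_INTENDED = {
    "\u09bf\u09be": "\u09cb",  # VOWEL SIGN I + VOWEL SIGN AA, perhaps meant O
    "\u09bf\u09c0": "\u09cc",  # VOWEL SIGN I + VOWEL SIGN II, perhaps meant AU
}

SUSPECT_RE = re.compile("|".join(map(re.escape, SUSPECT_TO_INTENDED)))


def decompose(ch):
    """Return the canonical (NFD) decomposition of a character."""
    return unicodedata.normalize("NFD", ch)


def describe(s):
    """List each code point in s with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


def find_suspects(text):
    """Return (offset, sequence) for every suspect sequence found in text."""
    return [(m.start(), m.group()) for m in SUSPECT_RE.finditer(text)]


for suspect, intended in SUSPECT_TO_INTENDED.items():
    print("suspect: ", describe(suspect))
    print("intended:", describe(decompose(intended)))
```

Both two-part vowel signs decompose to the left-side U+09C7 followed by a
right-side mark, and U+09BF is likewise rendered to the left of the
consonant, which is the sort of confusion I have in mind. Running
find_suspects over a page of OCRed text would give a crude count of how
often each sequence turns up.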

Richard.

