Choosing the Set of Renderable Strings

Richard Wordingham via Unicode unicode at unicode.org
Mon May 14 02:47:55 CDT 2018


On Sun, 13 May 2018 22:15:10 -0800
James Kass via Unicode <unicode at unicode.org> wrote:

> Richard Wordingham asked,
> 
> > Is this a reasonable approach to allowing both collation
> > and suppressing needless homographs?  My contribution to
> > the rendering is only the provision of a font.
> 
> If anything about this approach was unreasonable, one of the experts
> on this list would probably have pointed it out by now.

Not necessarily; some may still be recovering from the recent UTC
meeting.  Moreover, it took many years before we were told that there
was no character to suppress word boundaries wrongly deduced by Thai
breaking algorithms.  The character we had been using, U+2060 WORD
JOINER, is apparently only for suppressing line breaks. 

> Riding along with the insertion of the dotted circles by the USE
> enables the actual users to see immediately that the text needs to be
> modified in order to render reasonably on that system with the shaping
> engine and font selected.  If users consider any such insertion
> inappropriate, then it's feedback time.

The massive failure of USE was reported within hours of USE being
announced on the Unicode forum.  So far there has only been tinkering,
and an encouragement of bad spelling.  For example, roughly 23% or more
of Northern Thai monosyllables can be rendered only by clear
misspelling; see the results at
http://www.wrdingham.co.uk/lanna/random_test.htm.  The USE specification
brushes over this with the statement, "Note: Tai Tham support is
limited to mono-syllabic clusters", which gives the misleading
impression that mono-syllabic clusters are supported.  Basically,
support is limited to (C)+(V)* clusters with a liberal interpretation
of C and V.  Crw and Cry aren't supported either.
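
As a rough illustration of the (C)+(V)* shape mentioned above, the
following Python sketch tests whether a sequence of class labels fits
that pattern.  The labels "C" and "V" and the function name are
assumptions made for illustration only; they are not the actual USE
character categories, and the mapping from Tai Tham characters to
labels is left out entirely.

```python
import re

# Hypothetical, simplified class labels: "C" for a consonant-like
# character and "V" for a vowel-like mark.  These are NOT the real
# USE categories; they only mirror the (C)+(V)* shape in the text.
SIMPLE_CLUSTER = re.compile(r"C+V*")


def fits_simple_cluster(classes: str) -> bool:
    """Return True if a string of class labels matches (C)+(V)*."""
    return SIMPLE_CLUSTER.fullmatch(classes) is not None


if __name__ == "__main__":
    for labels in ("C", "CCV", "CVV", "CVC", "V"):
        print(labels, fits_simple_cluster(labels))
```

Under these assumptions, a label sequence such as CVC, with a
consonant following a vowel, falls outside the pattern, as does a
sequence that starts with a vowel.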

At the moment, one is generally better off using a Thai hack font that
uses paiyannoi to toggle between the various forms and placements of
Tai Tham characters.  That has the advantage that the text is still
intelligible when you have no font that renders it as Tai Tham.  The
main limitation of such schemes is in plain text.

> > ... and it is frequently desirable for a font to be able
> > to display its own name.
> 
> Does the font name have to be in a Latin-based script?

PostScript certainly gets unhappy if there isn't an ASCII name for it;
I don't know the requirements for the various PDF generators.

Richard.



