Implementation of ideographic description characters

Philippe Verdy verdy_p at wanadoo.fr
Tue Oct 11 09:27:05 CDT 2016


Actually, that extension currently only has data tuned for Traditional
Chinese and does not implement the full set of IDS mappings (it does not
cover the complete Unicode repertoire), but it does contain a large number
of mappings for IDS strings that have no Unicode encoding. Only very few
ideographic sources are used (not all those listed in the Unihan database),
and only two "true" variants are supported (for some characters) in the
database, of which only one is returned by the current renderer
implementation in Java.

Some mappings exist in two versions: a generic one using some undecomposed
strokes/parts (from the Unicode repertoire), and an expanded one where some
strokes are further decomposed (but using Traditional Chinese rules). In
many mappings, the two IDS are identical. The generic mapping is used to
handle many cases using overstriking IDS decompositions (which are not
further decomposed in the "expanded" IDS).
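
To make the generic/expanded distinction concrete, here is a minimal sketch
of how an expanded IDS could be derived from a generic one. It is not taken
from the extension's code: the `DECOMP` table, the `expand` function and the
sample entries are assumptions for illustration only. Components whose own
decomposition starts with the overlaid operator are deliberately left
undecomposed.

```python
# Hypothetical decomposition table: component -> its own IDS.
# The entries are illustrative samples, not data from the extension.
DECOMP = {
    "盟": "⿱明皿",
    "明": "⿰日月",
    "東": "⿻木日",  # overstriking decomposition, kept as a single part
}

OVERLAID = "\u2ffb"  # the overlaid operator ⿻


def expand(ids, table):
    """Recursively replace components by their IDS (prefix notation).

    A component is left as-is when it has no entry in the table or when
    its decomposition is an overstriking one. Assumes the table has no
    cycles.
    """
    out = []
    for ch in ids:
        sub = table.get(ch)
        if sub is None or sub.startswith(OVERLAID):
            out.append(ch)
        else:
            out.append(expand(sub, table))
    return "".join(out)


generic = "⿱明皿"
expanded = expand(generic, DECOMP)
print(generic, "->", expanded)
```

Because IDS use prefix notation, substituting a complete IDS for a single
operand always yields a well-formed IDS, so no re-parsing is needed here.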

The database it contains is still in development, though, and its schema
cannot really handle locale-specific variants, or additional variants that
are encoded in Unicode, unless they have a mapping in the CNS encoding.
(The database contains a snapshot of the CNS to Big5 and CNS to Unicode
conversion tables, but they are not indexed and probably not used by the
engine written in Java; I suppose they are only there to allow
registering the composite glyphs that have been mapped to an IDS.)

Then all IDS are mapped to one of about a dozen virtual fonts (with a
numeric id between 0 and 13) and a glyph ID (assigned in the PUA range of
the BMP; font 0 is special as it contains all the base glyphs needed to
compose all other virtual fonts).
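
As an illustration of that addressing scheme, here is a small sketch that
turns a running glyph index into a (virtual font id, PUA code point) pair.
The sequential allocation and the `slot_for` function are my assumptions; I
have not checked how the extension actually assigns its glyph IDs. The only
facts used are the font id range above and the BMP private use area,
U+E000..U+F8FF.

```python
PUA_START = 0xE000                   # first code point of the BMP PUA
PUA_END = 0xF8FF                     # last code point of the BMP PUA
PUA_SIZE = PUA_END - PUA_START + 1   # 6400 code points per virtual font


def slot_for(index, first_font=1, last_font=13):
    """Map a 0-based composite glyph index to (font_id, code_point).

    Assumed policy: font 0 is reserved for base glyphs, composites fill
    fonts 1..13 in order, each font using the whole PUA range.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    font_offset, glyph_offset = divmod(index, PUA_SIZE)
    font_id = first_font + font_offset
    if font_id > last_font:
        raise ValueError("no virtual font left for this glyph index")
    return font_id, PUA_START + glyph_offset


font_id, cp = slot_for(6400)
print(font_id, f"U+{cp:04X}")
```

Under this assumed policy, 13 composite fonts give room for 13 x 6400
composed glyphs before the scheme runs out of slots.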

But for now this database contains no instructions for more precise
placement or resizing of components: the placement is performed using
generic rules derived from the IDS itself (plus some rules implemented in
the Java code for adjusting specific strokes depending on their placement,
and for adjusting the relative stroke weights in the composition). That is
probably why the overstriking IDS (with ⿻) cannot be processed: instead
they are mapped directly to a NULL Unicode entry if needed, or left
undecomposed in both the generic IDS and the expanded IDS.
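
To show what "generic rules from the IDS itself" can mean, here is a
minimal sketch, not the Java engine's algorithm: it parses an IDS in prefix
notation and splits the unit square equally between the operands of ⿰, ⿱,
⿲ and ⿳. The names `parse` and `layout` and the equal-split rule are
assumptions. The surrounding operators (⿴ to ⿺) are left out only to keep
the sketch short; ⿻ is rejected because the IDS alone gives no usable
geometry for overlaid parts.

```python
from fractions import Fraction

ARITY = {"⿰": 2, "⿱": 2, "⿲": 3, "⿳": 3}   # operators handled here
HORIZONTAL = {"⿰", "⿲"}                     # split left to right


def parse(ids, pos=0):
    """Parse prefix-notation IDS; return (tree, next_pos).

    A tree is either a single component character or a tuple
    (operator, [children]).
    """
    ch = ids[pos]
    if "\u2ff4" <= ch <= "\u2ffb":
        # Surround operators and the overlaid operator: not handled.
        raise NotImplementedError(f"operator {ch} has no generic rule here")
    if ch in ARITY:
        children = []
        pos += 1
        for _ in range(ARITY[ch]):
            child, pos = parse(ids, pos)
            children.append(child)
        return (ch, children), pos
    return ch, pos + 1


def layout(tree, box=(Fraction(0), Fraction(0), Fraction(1), Fraction(1))):
    """Return [(component, (x, y, width, height)), ...] with y growing down."""
    if isinstance(tree, str):
        return [(tree, box)]
    op, children = tree
    x, y, w, h = box
    n = len(children)
    placed = []
    for i, child in enumerate(children):
        if op in HORIZONTAL:
            sub = (x + w * i / n, y, w / n, h)
        else:
            sub = (x, y + h * i / n, w, h / n)
        placed.extend(layout(child, sub))
    return placed


tree, end = parse("⿱⿰日月皿")
for component, (x, y, w, h) in layout(tree):
    print(component, x, y, w, h)
```

Equal splits are obviously too crude for real glyphs, which is exactly
where per-mapping placement and resizing instructions would be needed.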

It's interesting, though. But to adapt the code to Japanese or Korean,
you'll need to extend the current schema, notably the main table
containing the list of all supported IDS (generic plus expanded), as it
allows only a single mapping to Unicode (or NULL if there's no such
encoding) and has no column for specifying a localisation variant or
ideographic source (such as a dictionary, book, regional standard, or epoch).
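
A possible shape for such an extension, sketched with SQLite from Python.
The table name, the column names and the sample rows are all hypothetical
(I am not reproducing the extension's actual schema); the point is only
that adding `locale` and `source` columns to the key lets one IDS carry
several mappings instead of a single one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ids_mapping (
        ids_generic  TEXT NOT NULL,
        ids_expanded TEXT NOT NULL,
        unicode_cp   INTEGER,                 -- NULL if not encoded
        locale       TEXT NOT NULL DEFAULT 'zh-Hant',
        source       TEXT NOT NULL DEFAULT '',
        PRIMARY KEY (ids_generic, locale, source)
    )
""")

# Illustrative rows only; the source labels are placeholders.
rows = [
    ("⿰日月", "⿰日月", 0x660E, "zh-Hant", "CNS"),
    ("⿰日月", "⿰日月", 0x660E, "ja", "JIS"),
]
conn.executemany("INSERT INTO ids_mapping VALUES (?, ?, ?, ?, ?)", rows)


def lookup(conn, ids, locale):
    """Return [(unicode_cp, source), ...] for one IDS in one locale."""
    cur = conn.execute(
        "SELECT unicode_cp, source FROM ids_mapping "
        "WHERE ids_generic = ? AND locale = ? ORDER BY source",
        (ids, locale),
    )
    return cur.fetchall()


print(lookup(conn, "⿰日月", "ja"))
```

With the locale and source in the primary key, the same IDS can be
registered once per regional standard without overwriting the existing
Traditional Chinese entry.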

----

Note that when viewing these IDS strings, I've seen that Chrome really has
a problem displaying the IDS symbols (probably because of incorrect
autohinting): the dotted squares turn into random forms at usual font sizes
(12px or less) and just display as garbage. It may be caused by some fonts
on my Windows 10 system. You need to zoom in on the page to get a correct
view of IDS strings. Looking in the Chrome console, I see that the symbols
are taken from a couple of system fonts (provided by Windows). Normally the
IDS symbols are very simple in design, and even if they are dotted and can
be tricky to adjust at small sizes (to keep dots from disappearing or
merging into line segments), my opinion is that the hinting for these
symbols is simply bad in the Windows fonts, or uses some proprietary
techniques in the OpenType renderer of Windows that are not supported by
the font renderer of Chrome. Those symbols should render correctly at the
most common font sizes used on the web. In plain-text editors, the glyphs
are correct at reasonable font sizes, but the top dotted border of these
symbols is most often truncated (probably because it extends too high above
the line height, and probably because of incorrect metrics).



2016-10-11 11:21 GMT+02:00 gfb hjjhjh <c933103 at gmail.com>:

> After some research, I found there is already a MediaWiki extension named
> Ids that does exactly what I asked about
> (https://www.mediawiki.org/wiki/Extension:Ids). The only problem is that
> ⿻ is still not supported by the system. Now the question is whether this
> extension can become something integrated into a font.
>
> 2016-08-05 3:26 GMT+08:00 Thomas H Gewecke <thgewecke at mac.com>:
>
>>
>> On Aug 4, 2016, at 2:45 PM, gfb hjjhjh <c933103 at gmail.com> wrote:
>>
>> That Wikipedia page also has a section named "Ideographic Description
>> Sequences", which is exactly about forming sequences based on those
>> ideographic description characters.
>>
>>
>> As I understand it, such sequences may provide a “description” of kanji
>> useful for some purposes,  but are not sufficient to properly “render” them.
>>
>
>