IDC's versus Egyptian format controls

Richard Wordingham via Unicode unicode at
Fri Feb 16 18:48:10 CST 2018

On Fri, 16 Feb 2018 15:25:22 -0800
James Kass via Unicode <unicode at> wrote:

> Some people studying Han characters use the IDCs to illustrate the
> ideographs and their components for various purposes.  For example:
> U-0002A8B8 𪢸 ⿰土土
> U-0002A8B9 𪢹 ⿰土凡
> U-0002A8BA 𪢺 ⿱夂土
> U-0002A8BB 𪢻 ⿰土亡
> U-0002A8BC 𪢼 ⿰土无
> U-0002A8BD 𪢽 ⿰土冇
> U-0002A8BE 𪢾 ⿰土攴
> U-0002A8BF 𪢿 ⿰土月
> U-0002A8C0 𪣀 ⿰土化
> U-0002A8C1 𪣁 ⿰土丰
> It would probably be disconcerting if the display of those
> sequences changed into their respective characters overnight.

And it would be extremely disconcerting if this post was suddenly
rendered in mediaeval black letters, but in theory that could happen.

One can argue that once the compound ideographs have been encoded, the
IDS should no longer be interpreted.  However, I think it will be
difficult to do this in practice.
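
To make concrete what 'interpreting' an IDS involves, here is a minimal
Python sketch that parses the prefix notation into a tree and converts a
label such as U-0002A8B8 into the character it names.  The function
names are my own, purely for illustration, and I assume only the twelve
IDCs U+2FF0..U+2FFB, of which U+2FF2 and U+2FF3 take three operands and
the rest take two.

```python
# Sketch only: covers the IDCs U+2FF0..U+2FFB and does no validation
# beyond what indexing gives for free (a truncated IDS raises IndexError).

# U+2FF2 and U+2FF3 take three operands; the other IDCs take two.
TERNARY_IDCS = {"\u2ff2", "\u2ff3"}


def is_idc(ch):
    """True if ch is one of the IDCs U+2FF0..U+2FFB."""
    return "\u2ff0" <= ch <= "\u2ffb"


def parse_ids(text, pos=0):
    """Parse one IDS starting at text[pos].

    Returns (tree, next_pos), where tree is either a single character or
    a tuple (idc, operand, operand[, operand]).
    """
    ch = text[pos]
    pos += 1
    if not is_idc(ch):
        return ch, pos
    arity = 3 if ch in TERNARY_IDCS else 2
    operands = []
    for _ in range(arity):
        node, pos = parse_ids(text, pos)
        operands.append(node)
    return (ch, *operands), pos


def from_u_notation(label):
    """Turn a label such as 'U-0002A8B8' into the character it names."""
    return chr(int(label[2:], 16))
```

With this, parse_ids("⿰土凡") yields the tree ("⿰", "土", "凡"), and
from_u_notation("U-0002A8B9") yields the encoded character that the
sequence describes.  Deciding that the two are 'the same' still needs a
lookup table from IDS to code point, which is the hard part.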

> Such
> usage might be limited to scholars and students, and a desire for
> default composition might outweigh scholarly concerns,

The lack of mix-and-match control of the font choices for 'plain text'
presentations is disappointing.  We probably need a pair of OpenType
features, one to discourage and one to encourage interpretation of
IDSes. For web pages and PDFs one should be able to specify the font or
fonts, and OpenType features are increasingly being supported.

> but IMHO to say
> that 'doing it reasonably well at the font level would be a lot of
> work' is a vast understatement.

That was my first thought, but I had worried that I might have been
overestimating.  For the examples you give above, I strongly suspect
that Code2001 already contains the requisite glyph halves.

There is another possible use of the latitude given by TUS 5.0 to 10.0
and possibly earlier.  I can certainly imagine a case where someone
writes a font so that an unencoded character may be manipulated like any
other character.  He has two choices - he can put it in the PUA, or he
can make it the ligature for the IDS.  If he chooses the former, and
then the text and font are separated, the recipient of the text is left
with tofu for the character.  If he chooses the latter, the recipient of
the text would at least have the IDS.  I think the latter outcome is
the better outcome.
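
The difference between the two choices can be shown with a toy model.
This is not how any real shaping engine works; the render function, the
glyph name and the choice of U+E000 as the PUA code point are all made
up for illustration.  A 'font' here is just a ligature table plus the
set of characters it has glyphs for.

```python
TOFU = "\u25a1"  # stand-in for the missing-glyph box


def render(text, ligatures, cmap):
    """Toy renderer: longest-match ligature lookup, then per-character fallback.

    ligatures: dict mapping a character sequence to a glyph name
    cmap:      set of characters the font has glyphs for
    """
    out = []
    i = 0
    keys = sorted(ligatures, key=len, reverse=True)  # try longest sequences first
    while i < len(text):
        for seq in keys:
            if text.startswith(seq, i):
                out.append("<" + ligatures[seq] + ">")
                i += len(seq)
                break
        else:
            # No ligature matched here: show the character itself, or tofu.
            ch = text[i]
            out.append(ch if ch in cmap else TOFU)
            i += 1
    return "".join(out)


# Characters an ordinary CJK font is assumed to cover in this toy model.
BASIC = {"\u2ff0", "\u571f", "\u51e1"}  # IDC U+2FF0 and the two components

IDS_TEXT = "\u2ff0\u571f\u51e1"  # the unencoded character written as an IDS
PUA_TEXT = "\ue000"              # the same character put in the PUA

# With the author's font, both choices display the custom glyph.
with_font_ids = render(IDS_TEXT, {IDS_TEXT: "custom"}, BASIC)
with_font_pua = render(PUA_TEXT, {PUA_TEXT: "custom"}, BASIC)

# Once text and font are separated, only the IDS version stays legible.
without_font_ids = render(IDS_TEXT, {}, BASIC)
without_font_pua = render(PUA_TEXT, {}, BASIC)
```

In this model without_font_ids is the IDS itself, whereas
without_font_pua is a single tofu box.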

