IDCs versus Egyptian format controls

Richard Wordingham via Unicode unicode at
Sat Feb 17 03:43:58 CST 2018

On Fri, 16 Feb 2018 18:05:41 -0800
James Kass via Unicode <unicode at> wrote:

> Richard Wordingham wrote:
> > One can argue that once the compound ideographs have been encoded,
> > the IDS should no longer be interpreted.  
> Wouldn't that break existing data?  If this sort of thing were done at
> OS or app level, it might be possible to replace the IDS string with
> the appropriate character upon file save in some kind of automatic
> fashion.  But I'd sure hate for that to happen to any of my text files
> without warning.

TUS allows one to use an IDS in place of an unencoded character, but
not in place of an encoded character.  Once the character is encoded,
the IDS substitutions should be weeded out.  Of course, there is the
problem that upgrades to a new version of Unicode can be a mosaic
process, with data tables, fonts and rendering engines out of alignment.
At least it's a graceful break, unlike what is likely to happen with PUA
mappings, which may simply vanish or, worse, change.

Ideally, plain searching (as opposed to search and replace) would use a
collation that treats a character and its IDS as equivalent.  There may
be a problem in that two distinct characters could have the same IDS.
Search with automatic replacement is a bigger problem.
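As a rough illustration of search that equates a character with its IDS, here is a minimal Python sketch, assuming only the original twelve IDCs (U+2FF0..U+2FFB, of which U+2FF2 and U+2FF3 take three components and the rest two).  The table IDS_TABLE and the functions ids_length, tokenize, keys and find are hypothetical names invented for this example, and the entry for "P"/"Q" is a made-up stand-in for two distinct encoded characters sharing one IDS.  Tokens match when their candidate sets intersect; that relation is not transitive (the shared IDS matches both P and Q, yet P does not match Q), which shows why a shared IDS does not fit neatly into a true collation equivalence.

```python
# Sketch only: IDC arities for the original twelve IDCs, U+2FF0..U+2FFB.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT
IDC_ARITY["\u2ff3"] = 3  # IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW

# Hypothetical IDS -> encoded-character table.
IDS_TABLE = {
    "\u2ff0女子": {"好"},
    "\u2ff0日月": {"明"},
    # Hypothetical: "P" and "Q" stand in for two distinct encoded
    # characters that happen to share the description below.
    "\u2ff0甲乙": {"P", "Q"},
}


def ids_length(text: str, start: int) -> int:
    """Length (in code points) of the IDS or single character at `start`."""
    needed = 1  # components still to be consumed
    i = start
    while needed:
        if i >= len(text):
            raise ValueError("truncated IDS")
        needed += IDC_ARITY.get(text[i], 0) - 1
        i += 1
    return i - start


def tokenize(text: str) -> list:
    """Split text into tokens, keeping each complete IDS as one token."""
    tokens, i = [], 0
    while i < len(text):
        n = ids_length(text, i)
        tokens.append(text[i:i + n])
        i += n
    return tokens


def keys(token: str) -> set:
    """Candidate characters a token may stand for."""
    if len(token) > 1:  # an IDS: look it up, else it only matches itself
        return IDS_TABLE.get(token, {token})
    return {token}


def find(text: str, query: str) -> list:
    """Token offsets in `text` where `query` matches, equating IDS and character."""
    t, q = tokenize(text), tokenize(query)
    return [
        i for i in range(len(t) - len(q) + 1)
        if all(keys(a) & keys(b) for a, b in zip(t[i:i + len(q)], q))
    ]
```

For example, find("你\u2ff0女子嗎", "好") and find("你好嗎", "\u2ff0女子") both return [1], while find("P", "Q") returns [] even though each of P and Q matches the shared IDS.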

I strongly suspect that the rule not to use an IDS in place of an
encoded character would only be enforced in an input method.  There is
the very common interpretation that 'should' in the principal clause of
a requirement cancels the requirement, the formal justification being
that compliance would be too much work.  Enforcing the rule for an
encoded character that the user's fonts or software do not yet support
would be a hostile act.
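A minimal sketch of how an input method might apply the rule without being hostile, assuming a hypothetical table IDS_TO_CHAR and a caller-supplied predicate `supported` that stands in for whatever font or renderer coverage check is actually available; neither is a real API.  An IDS is replaced only when it maps to an encoded character that the environment can display; otherwise the IDS is committed unchanged.

```python
from typing import Callable, Dict

# Original twelve IDCs; U+2FF2 and U+2FF3 are ternary, the rest binary.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3
IDC_ARITY["\u2ff3"] = 3

# Hypothetical IDS -> encoded-character table known to the input method.
IDS_TO_CHAR: Dict[str, str] = {"\u2ff0女子": "好", "\u2ff0日月": "明"}


def ids_length(text: str, start: int) -> int:
    """Length (in code points) of the IDS or single character at `start`."""
    needed, i = 1, start
    while needed:
        if i >= len(text):
            raise ValueError("truncated IDS")
        needed += IDC_ARITY.get(text[i], 0) - 1
        i += 1
    return i - start


def commit(text: str, supported: Callable[[str], bool]) -> str:
    """Replace known IDSs with encoded characters, but only supported ones."""
    out, i = [], 0
    while i < len(text):
        n = ids_length(text, i)
        token = text[i:i + n]
        ch = IDS_TO_CHAR.get(token) if n > 1 else None
        # Keep the IDS when there is no encoded equivalent or it can't be shown.
        out.append(ch if ch is not None and supported(ch) else token)
        i += n
    return "".join(out)
```

With supported = lambda c: c == "好", commit("\u2ff0女子\u2ff0日月", supported) yields "好\u2ff0日月": the second IDS survives because its encoded character is treated as unsupported.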

