Standardised Variation Sequences with Toggles

Ken Whistler kenwhistler at
Sun Aug 16 14:08:34 CDT 2015

On 8/16/2015 3:20 AM, Richard Wordingham wrote:
> The view of the Unicode Technical committee appears to be that the
> Unicode Character Database (UCD) takes priority over the core text of
> the Unicode Standard in case of conflict.  (Please advise if I have
> misunderstood; I only have the core text and samples of past behaviour
> to go on, neither of which appears to be binding.)


That means that if a data file states, e.g.,

200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;

thus *defining* the General_Category of ZWSP to be Cf, and we then
find that, due to some oversight in editing (of what is now a very
large core specification, plus over a dozen annexes), somebody goofed
up and happened to refer to ZWSP as gc=Zs, the data file *wins*.
An editorial oversight or a typo in the text of the core specification
should not be taken as somehow legalistically trumping the data file,
just because somebody finds it "written in the standard".
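
To make that concrete, here is a minimal sketch in Python of what "the
data file wins" looks like for an implementer. The function names and the
"claimed in prose" value are hypothetical, made up for illustration; the
field positions are those of UnicodeData.txt (field 0 is the code point,
field 1 the name, field 2 the General_Category). Python's own unicodedata
module is used only as a cross-check.

```python
import unicodedata

# The UnicodeData.txt record quoted above: 15 semicolon-separated fields.
UCD_LINE = "200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;"


def parse_unicode_data_line(line):
    """Pull the first three fields out of a UnicodeData.txt record."""
    fields = line.split(";")
    return {
        "code_point": int(fields[0], 16),
        "name": fields[1],
        "general_category": fields[2],
    }


def resolve_category(data_file_value, prose_value):
    """Hypothetical helper: when the two disagree, the data file wins."""
    if prose_value != data_file_value:
        # The prose is treated as an erratum to report, not as a rival source.
        return data_file_value
    return prose_value


record = parse_unicode_data_line(UCD_LINE)

# Assumed for illustration: a stray mention of gc=Zs somewhere in the text.
claimed_in_prose = "Zs"

effective = resolve_category(record["general_category"], claimed_in_prose)
print(f"U+{record['code_point']:04X} {record['name']}: gc={effective}")

# Cross-check against the UCD snapshot that ships with Python.
assert unicodedata.category(chr(record["code_point"])) == effective
```

The point of the sketch is only that an implementation should be driven
from the parsed data, with any conflicting prose handled as a bug report.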


This should not, IMO, be taken as occasion for general worrying
about the status of data files and the core specification. (In most cases,
the core specification is simply underspecified because the research,
writing and editing for it are under-resourced.)

> One possibility would be to change the text from
> ~ A856 FE00 <U+A856, FE00> phags-pa letter reversed shaping small a
> to
> ~ A856 FE00 phags-pa letter reversed shaping small a
>    • Toggles between <U+A856> and <U+A856, FE00>; see core text for
>    contextual shaping.
> where text in '<...>' is rendered as a string, not echoed as ASCII.
> However, that reads clumsily.  Can people suggest improvements?

Yes, a notice at the top:

@+ For details about the implementation of variation sequences in Phags-pa,
please refer to the Phags-pa section of the core specification.

