Proposal to add standardized variation sequences for chess notation

Sun Apr 2 03:53:22 CDT 2017

On Sun, 2 Apr 2017 08:19:00 +0200
Michael Everson <everson at evertype.com> wrote:

> > On 1 Apr 2017, at 23:49, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> > 
> > I think it's all about sizing so that white or black cells will
> > align, independently of the piece that may be within it.  
> 
> A white knight may stand alone in text, in which case no variation
> exists for display beyond the base glyph in the font. 
> 
> A white knight may need to be represented with specific
> em-square-related metrics within the font, in two variations, one
> with no fill on the background of the em-square indicating the piece
> on a white square, and one with a (typically ///-shaped) fill
> indicating the piece on a black square. 

I'm uneasy about the semantics of the sequences.  To take the extreme
example, <U+2657, U+FE00> would be the white bishop on a (or 'with
its') white square while <U+2657, U+FE01> would be the white bishop on
a black square. Perhaps someone can show me evidence from mathematical
symbols or Japanese kanji that such semantic modifications are perfectly
acceptable.

I have related unease about the glyph of <U+2656, U+FE01>, intended for
a white rook on a black square, being used in text where the meaning is
'white rook' regardless of what square it is on.
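To make the sequences under discussion concrete, here is a minimal Python sketch that spells them out. The assignment of U+FE00 to the white square and U+FE01 to the black square follows the proposal as I read it above and is not part of any published standard; the variable names and the describe() helper are mine, for illustration only.

```python
import unicodedata

WHITE_BISHOP = "\u2657"  # U+2657 WHITE CHESS BISHOP
VS1 = "\uFE00"           # U+FE00 VARIATION SELECTOR-1
VS2 = "\uFE01"           # U+FE01 VARIATION SELECTOR-2

# Proposed (not standardised) meanings, as assumed in this message.
bishop_on_white_square = WHITE_BISHOP + VS1
bishop_on_black_square = WHITE_BISHOP + VS2


def describe(sequence):
    """List each code point of the sequence with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


for seq in (bishop_on_white_square, bishop_on_black_square):
    print(describe(seq))
```

Both sequences share the same base character, so a process that ignores variation selectors sees only 'white bishop' in either case.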

If these variation sequences are accepted, I hope that the intention
that they contribute to producing presentable populated chess
boards in plain text will be captured in at least the Unicode
Standard.  I can see issues with line-spacing, which I believe is
formally out of control in true plain text.

If my unease is well-founded, then I think we have a case for two
combining marks akin to U+20E3 COMBINING ENCLOSING KEYCAP.
Unfortunately, that would not be as simple to use (or define) as the
proposed variation sequences.

I'm also bothered by the purposes of the Format 14 'cmap' subtable.  For
each supported variation selector, it has a direct and an indirect
mapping of base character to glyph.  The direct mapping maps the
character to the glyph to be used when qualified by the variation
selector; that makes perfect sense.  The indirect mapping gives a list
of characters for which the 'default' glyph is to be used, i.e. the
cmap subtables that do not directly support variation selectors are to
be used.  As I understand it, this list will not in general include
every character that the font supports but for which it has no
non-default variation sequence mapping.  Thus if a font supports the use
of U+E0100 VARIATION SELECTOR-17, the table for U+E0100 will make no
mention of U+0030, for <U+0030, U+E0100> is not a standardised
variation sequence, and therefore no font should support it.

I believe the purpose of having the indirect mapping is so that one can
query whether a font explicitly supports a sequence.  Thus the cmap
distinguishes two cases:

1) <U+82A6, U+E0100> is explicitly catered for, and uses the same
glyph as unqualified U+82A6.

2) <U+82A6, U+E0100> is not catered for.  If one needs to be sure of
having its distinctive form, another font must be used.
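The lookup as I understand it can be sketched in Python as follows. This is a toy in-memory model and not a parser for the real binary table: the dictionaries, the glyph names and the lookup() function are all hypothetical, and the entry for <U+2656, U+FE01> assumes the proposed chess sequence rather than anything standardised.

```python
from typing import Dict, Optional

# Hypothetical ordinary cmap: character -> default glyph name.
BASE_CMAP: Dict[int, str] = {
    0x0030: "zero",
    0x82A6: "uni82A6",
    0x2656: "rook.white",
}

# Hypothetical model of the Format 14 data: one record per variation
# selector, holding the 'indirect' list (use the default glyph) and
# the 'direct' mapping (use a specific glyph).
UVS: Dict[int, dict] = {
    0xE0100: {
        "default": {0x82A6},   # sequence supported, same glyph as base
        "non_default": {},     # no sequences with their own glyph
    },
    0xFE01: {
        "default": set(),
        "non_default": {0x2656: "rook.white.darksquare"},
    },
}


def lookup(base: int, selector: int) -> Optional[str]:
    """Return the glyph for <base, selector>, or None if the font
    does not explicitly cater for that sequence."""
    record = UVS.get(selector)
    if record is None:
        return None                    # selector not supported at all
    if base in record["non_default"]:
        return record["non_default"][base]
    if base in record["default"]:
        return BASE_CMAP[base]         # explicit, but default glyph
    return None                        # sequence not catered for


print(lookup(0x82A6, 0xE0100))  # case 1: explicit, default glyph
print(lookup(0x0030, 0xE0100))  # not listed, so not catered for
```

The point of the model is that a None result is distinguishable from 'supported with the default glyph', which is the distinction drawn in cases 1 and 2.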

If I have understood the intended use correctly, then we need another
variation sequence to explicitly specify a glyph of U+2656 suitable for
use in plain-looking running text, analogous to <U+0032, U+FE0E> for a
text-style '2'.  A renderer can then ask whether a font supports plain
text white rooks, as opposed to providing one dimensioned for
assembling chess boards.

Richard.