Proposal to add standardized variation sequences for chess notation

Sun Apr 2 04:57:49 CDT 2017

On 2 Apr 2017, at 10:53, Richard Wordingham <richard.wordingham at ntlworld.com> wrote:

>> A white knight may need to be represented with specific em-square-related metrics within the font, in two variations, one with no fill on the background of the em-square indicating the piece on a white square, and one with a (typically ///-shaped) fill indicating the piece on a black square. 
> 
> I'm uneasy about the semantics of the sequences.

There aren’t any, not really. This isn’t a problem.

> To take the extreme example, <U+2657, U+FE00> would be the white bishop on a (or ‘with its') white square

No, “on a” is correct. It’s a display mode. The semantics (such as they are) of a white knight have to do with its position in the 8 x 8 chessboard matrix (which can be parsed in plain text, which is why this proposal is useful in terms of chessboard data). In Figure 3, even in the inadequately formatted examples the plain text arrangement of the squares and chess characters can be read and understood. Proper formatting requires shadowing when a chess piece is on a black square, but whether that square is (in algebraic notation) c6 or d7 is a matter of the matrix, not the font display. This proposal permits a regular 

> while <U+2657, U+FE01> would be the white bishop on a black square. Perhaps someone can show me evidence from mathematical symbols or Japanese kanji that such semantic modifications are perfectly acceptable.

There isn’t a semantic modification. It’s a graphic modification. Even the emoji VSes regulate display, not semantics. Thus:

2194 ↔ LEFT RIGHT ARROW 
	= z notation relation 
	⁓ 2194 FE0E  text style 
	⁓ 2194 FE0F  emoji style

This is a matter of display, of font glyph selection. 

> I have related unease about the glyph of <U+2656, U+FE01>, intended for a white rook on a black square, being used in text where the meaning is 'white rook' regardless of what square it is on.

That sequence selects a glyph in the font which draws the white rook surrounded by the diagonal lines which indicate a black square. It has no semantics but what you read into it. (OK, it “means” the rook could logically be on a1, c1, e1, g1, b2, d2, f2, h2, a3, c3, e3, g3, b4, d4, f4, h4 etc., but this is nothing to feel uneasy about.)
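
The list of squares above follows a simple parity rule, which can be sketched as below. This is only an illustration: the function names are hypothetical, and the pairing of U+FE00 with white squares and U+FE01 with black squares is taken from the sequences proposed in this thread.

```python
# Variation selectors as used in the sequences discussed in this thread.
ON_WHITE = "\uFE00"  # VS1: piece drawn on a white square
ON_BLACK = "\uFE01"  # VS2: piece drawn on a black (hatched) square


def is_black_square(square: str) -> bool:
    """True for a1, c1, b2, ...: file number plus rank number is even."""
    file_number = ord(square[0]) - ord("a") + 1  # a=1 ... h=8
    rank_number = int(square[1])
    return (file_number + rank_number) % 2 == 0


def piece_on(piece: str, square: str) -> str:
    """Return the piece followed by the selector matching the square colour."""
    return piece + (ON_BLACK if is_black_square(square) else ON_WHITE)


# All black squares, rank by rank: a1, c1, e1, g1, b2, d2, ...
black_squares = [f + str(r) for r in range(1, 9) for f in "abcdefgh"
                 if is_black_square(f + str(r))]
```

Here piece_on("\u2656", "a1") gives the white rook followed by U+FE01, because a1 is a black square. The selector follows from the position in the matrix, not the other way round.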

And what, pray, is the alternative?

A full matrix like this:

▗▁▁▁▁▁▁▁▁▖
▕□︀▨︁□︀▨︁□︀▨︁♞︀▨︁▏
▕▨︁□︀▨︁□︀▨︁□︀▨︁□︀▏
▕□︀▨︁♔︀▨︁□︀▨︁□︀▨︁▏
▕▨︁□︀▨︁□︀▨︁♘︀▨︁□︀▏
▕□︀▨︁□︀▨︁♚︀▨︁□︀▨︁▏
▕▨︁□︀▨︁□︀▨︁□︀▨︁□︀▏
▕□︀▨︁□︀♙︁♛︀▨︁□︀▨︁▏
▕▨︁□︀♕︁□︀▨︁♖︀▨︁□︀▏
▝▔▔▔▔▔▔▔▔▘

… is plain text which can be parsed, and because it uses Unicode characters (rather than the ASCII and Symbol font hacks described in §2 of the proposal) it can be sent and received with fidelity and can be displayed nicely with a conformant font. 
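
As a rough sketch of what "can be parsed" means in practice, the following reads a board like the one above into algebraic coordinates. It rests on assumptions not stated in the proposal: the top line of the diagram is taken to be rank 8, the left column file a, and the function name parse_board is hypothetical. Frame characters and variation selectors are simply skipped, since they are neither squares nor pieces.

```python
EMPTY = {"\u25A1", "\u25A8"}  # WHITE SQUARE, SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL


def is_cell(ch: str) -> bool:
    """A cell is an empty-square character or a chess piece U+2654..U+265F."""
    return ch in EMPTY or "\u2654" <= ch <= "\u265F"


def parse_board(text: str) -> dict:
    """Map algebraic squares to piece characters; empty squares are omitted."""
    ranks = []
    for line in text.splitlines():
        cells = [ch for ch in line if is_cell(ch)]  # drops frame and selectors
        if len(cells) == 8:
            ranks.append(cells)
    if len(ranks) != 8:
        raise ValueError("expected 8 ranks of 8 squares")
    position = {}
    for row, cells in enumerate(ranks):
        rank = 8 - row  # assumption: the first rank line is rank 8
        for col, ch in enumerate(cells):
            if ch not in EMPTY:
                position["abcdefgh"[col] + str(rank)] = ch
    return position


# The same position as the diagram above, written here without selectors.
sample = "\n".join([
    "▗▁▁▁▁▁▁▁▁▖",
    "▕□▨□▨□▨♞▨▏",
    "▕▨□▨□▨□▨□▏",
    "▕□▨♔▨□▨□▨▏",
    "▕▨□▨□▨♘▨□▏",
    "▕□▨□▨♚▨□▨▏",
    "▕▨□▨□▨□▨□▏",
    "▕□▨□♙♛▨□▨▏",
    "▕▨□♕□▨♖▨□▏",
    "▝▔▔▔▔▔▔▔▔▘",
])
position = parse_board(sample)
```

The result places the black knight on g8 and the white rook on f1, and the same mapping comes out whether or not the selectors are present in the text.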

> If these variation sequences are accepted, I hope that the intention that they contribute to producing presentable populated chess boards in plain text will be captured in at least the Unicode Standard.

My intention would be to re-format the proposal document as a UTN for guidance to implementors, if that’s what you mean. 

> I can see issues with line-spacing, which I believe is formally out of control in true plain text.

So is the font rendering. The board which I have pasted in above in this e-mail doesn’t look great in Everson Mono (which is what I use to view my plain-text e-mail) because I haven’t added the sequences to that font yet, but it is legible. And I can cut and paste it into another document where I have more font control. Yes, some control over line-spacing might be needed in some environments for optimum results. That’s why the proposal says things like “set in Ludus in 24 points with 26-point leading” where relevant. 

> If my unease is well-founded, then I think we have a case for two combining marks akin to U+20E3 COMBINING ENCLOSING KEYCAP. Unfortunately, that would not be as simple to use (or define) as the proposed variation sequences.

We could add some *COMBINING WHITE GAME SQUARE FILTER and *COMBINING BLACK GAME SQUARE FILTER, but this does not simplify matters. First, you would have to decide what base character to use for the squares on which no characters stand. I think that the proposed 25A1 WHITE SQUARE and 25A8 SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL make better sense because in environments where the OpenType features cannot be supplied the plain text is still legible, if not beautiful. 

Your suggestion is not going to alter the burden on the font with regard to display.

> I'm also bothered by the purposes of the Format 14 'cmap' subtable.  

I took this text from a successful proposal dealing with variation selectors by Ken Lunde of Adobe. I am not attached to it. To me, the instruction:

	sub uni2654 uniFE00 by uni2654FE00 ;
	sub uni2654 uniFE01 by uni2654FE01 ;

is all that I implemented, and the result was what was expected.
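
Rules of this shape could be generated rather than typed by hand. The sketch below assumes the same glyph-naming pattern as the two rules above and assumes the range U+2654..U+265F (the twelve chess pieces); the function name vs_rules is hypothetical, and nothing here says which OpenType feature the rules belong in.

```python
def vs_rules(first=0x2654, last=0x265F, selectors=(0xFE00, 0xFE01)):
    """Build one substitution rule per (chess piece, variation selector) pair."""
    rules = []
    for cp in range(first, last + 1):
        for vs in selectors:
            # Glyph names follow the pattern used above: uni<base><selector>.
            rules.append(f"sub uni{cp:04X} uni{vs:04X} by uni{cp:04X}{vs:04X} ;")
    return rules


rules = vs_rules()
print("\n".join(rules))
```

With the assumed defaults this yields 24 rules, the first two being exactly the pair quoted above.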

> For each supported variation selector, it has a direct and an indirect mapping of base character to glyph. The direct mapping maps the character to the glyph to be used when qualified by the variation selector; that makes perfect sense.  The indirect mapping gives a list of characters for which the 'default' glyph is to be used, i.e. the cmap subtables that do not directly support variation selectors are to be used.  As I understand it, this list will in general not include all the characters supported by the font but without a non-default variation sequence mapping.  Thus if a font supports the use of U+E0100 VARIATION SELECTOR-17, the table for U+E0100 will not have mention of U+0030, for <U+0030, U+E0100> is not a standardised variation sequence, and therefore no font should support it.

Um, I don’t understand a word of what you’ve said here, whatever you mean by “direct mapping” and “indirect mapping”. All I know is that I used OpenType rules in my font to get sequences to point to certain glyphs, and the result works as intended. You can see this in the proposal document. 
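
For anyone following the thread, the distinction drawn in the quoted paragraph can be modelled very roughly as below. This is a simplified, hypothetical model in plain dictionaries, not a reader for real font data: the OpenType specification calls the two lists the Non-Default UVS table ("direct") and the Default UVS table ("indirect"), and every glyph name and table entry here is an assumption made up for illustration.

```python
# Hypothetical contents of an ordinary cmap subtable (no selectors involved).
BASE_CMAP = {0x2654: "uni2654", 0x82A6: "uni82A6", 0x0030: "zero"}

# "Direct" mapping: selector -> {base character -> special glyph}.
NON_DEFAULT_UVS = {
    0xFE00: {0x2654: "uni2654FE00"},
    0xFE01: {0x2654: "uni2654FE01"},
}

# "Indirect" mapping: selector -> bases whose ordinary glyph is explicitly
# declared to be the right one for that sequence.
DEFAULT_UVS = {0xE0100: {0x82A6}}


def lookup(base, selector):
    """Return (glyph name, whether the font explicitly supports the sequence)."""
    glyph = NON_DEFAULT_UVS.get(selector, {}).get(base)
    if glyph is not None:
        return glyph, True
    if base in DEFAULT_UVS.get(selector, set()):
        return BASE_CMAP[base], True
    # Not listed anywhere: fall back to the plain glyph, support unknown.
    return BASE_CMAP.get(base), False
```

In this model lookup(0x82A6, 0xE0100) and lookup(0x0030, 0xE0100) both return an ordinary glyph, but only the first reports explicit support, which is the query the quoted text has in mind.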

> I believe the purpose of having the indirect mapping is so that one can query whether a font explicitly supports a sequence.  Thus the cmap distinguishes two cases:
> 
> 1) <U+82A6, U+E0100> is explicitly catered for, and uses the same glyph as unqualified U+82A6.

I’m not using E0100. And I’m not using CJK character 芦. I did not propose sequences for unqualified chess pieces because I didn’t see any reason why there should be a benefit for it. If there is some genuine benefit, obviously the sequences in my proposal could be altered from

2654 FE00; Chesspiece on white; # WHITE CHESS KING
2654 FE01; Chesspiece on black; # WHITE CHESS KING

(that is:

	sub uni2654 uniFE00 by uni2654FE00 ;
	sub uni2654 uniFE01 by uni2654FE01 ;)

to

2654 FE00; Unqualified chesspiece; # WHITE CHESS KING
2654 FE01; Chesspiece on white; # WHITE CHESS KING
2654 FE02; Chesspiece on black; # WHITE CHESS KING

(that is:

	sub uni2654 uniFE00 by uni2654 ;
	sub uni2654 uniFE01 by uni2654FE01 ;
	sub uni2654 uniFE02 by uni2654FE02 ;)

But I didn’t see any need for that, since 2654 is already the unqualified chesspiece. If there’s a formal need for triplets rather than couplets here, I’ll conform to it, but that seems to be incidental to the robustness of the proposal.

> 2) <U+82A6, U+E0100> is not catered for.  If one needs to be sure of having its distinctive form, another font must be used.
> 
> If I have understood the intended use correctly, then we need another variation sequence to explicitly specify a glyph of U+2656 suitable for use in plain-looking running text, analogous to <U+0032, U+FE0E> for a text-style '2'.  A renderer can then ask whether a font supports plain text white rooks, as opposed to providing one dimensioned for assembling chess boards.

If a font doesn’t support a glyph or a sequence, then operating systems substitute other glyphs or the .notdef glyph or whatever, no?

Michael Everson